Querying Datasets with Dataset.search
Setup
import geodesic
geodesic.set_active_project('tutorials')
Creating the Dataset
We will use the EPA's Facility Registry Service (FRS) - Facilities dataset. It is one of the datasets in the Homeland Infrastructure Foundation-Level Data (HIFLD) , hosted as an ArcGIS online service. Additional information on the FRS and its component datasets is available at the EPA website .
url = "https://geodata.epa.gov/arcgis/rest/services/OEI/FRS_INTERESTS/MapServer/7"
ds = geodesic.Dataset.from_arcgis_layer(name='epa-frs-facilities', url=url)
ds.save()
# output:
dataset:*:*:*:epa-frs-facilities
Searching the Dataset
To start, let's simply run geodesic.boson.dataset.Dataset.search without
any parameters. This will return the first 10 records in the dataset, by default.
We can change the maximum number of results with the limit
parameter. Setting limit
to None
will return all the results.
ds.search()
By default, the results are a geopandas GeoDataFrame
. We can inspect a given feature using
ordinary dataframe methods.
ds.search().iloc[0]
Refining the Search
We can refine our search using various keyword parameters, such as bbox
(to specify a bounding
box),
datetime
(to specify a date range), intersects
(to specify a geometry), and others.
bbox
Let's start by specifying a bounding box. We will use the bounding box of the state of Wyoming, USA.
from geodesic import SearchReturnType
bbox = [-111.056888, 41.00187, -104.052247, 45.005304]
epa_frs_features_wy = ds.search(limit=None, bbox=bbox)
Now, let's put those features on a map.
from geodesic import mapping
m = mapping.Map(center=[43, -107.5], zoom=5)
m.add_feature_collection('EPA FRS sites', epa_frs_features_wy)
m
intersects
Suppose instead we want to search for features in a state that is not rectangular. We can specify a
geometry to search for features that intersect with it. Here, we will use the geometry of New York
State.
You can download the geometry from the
US Census Bureau .
We will use the file cb_2018_us_state_500k.zip
, extract files from the zip archive, and load the
geometry of New York.
import geopandas as gpd
shapefile_path = "./<zip_extract_destination_folder>/cb_2018_us_state_500k.shp"
gdf = gpd.read_file(shapefile_path)
ny_geometry = gdf.loc[gdf['NAME'] == 'New York', 'geometry'].squeeze()
ny_geometry
Now, we can use this geometry to search for features that intersect with it.
epa_frs_features_ny = ds.search(intersects=ny_geometry, limit=1000)
m = mapping.Map(center=[43, -75], zoom=5)
m.add_feature_collection('EPA FRS sites', epa_frs_features_ny)
m
filter
Search also supports CQL filters. Here we return only the sites that are active and are points where a substance is released in the county of New York, with accuracy values of 30 or higher.
from geodesic import cql
filter=cql.CQLFilter.logical_and(
cql.CQLFilter.eq("COUNTY_NAME", "NEW YORK"),
cql.CQLFilter.eq("REF_POINT_DESC", "POINT WHERE SUBSTANCE IS RELEASED"),
cql.CQLFilter.eq("ACTIVE_STATUS", "OPERATING"),
cql.CQLFilter.gte("ACCURACY_VALUE", 30),
)
active_sites = ds.search(
limit=None,
filter=filter
)
m = mapping.Map(center=[40.75, -73.98], zoom=12)
m.add_feature_collection('EPA FRS active sites where substance is released - high confidence', active_sites)
m