Querying Datasets with Dataset.search#
Problem#
You want to query a Boson Dataset.
Solution#
Use Boson’s Dataset search()
method.
Setup#
Start by running through the basic setup in Geodesic Basics, if you have not already done so. Then set the active project.
import geodesic
geodesic.set_active_project('cookbook-examples')
Creating the Dataset#
We will use the EPA’s Facility Registry Service (FRS) - Facilities dataset. It is one of the datasets in the Homeland Infrastructure Foundation-Level Data (HIFLD), hosted as an ArcGIS online service. Additional information on the FRS and its component datasets is available at the EPA website.
url = "https://geodata.epa.gov/arcgis/rest/services/OEI/FRS_INTERESTS/MapServer/7"
ds = geodesic.Dataset.from_arcgis_layer(name='epa-frs-facilities', url=url)
ds.save()
# output:
dataset:*:*:*:epa-frs-facilities
Searching the Dataset#
To start, let’s simply run search()
without any parameters. This will return the first 10 records in the dataset, by default. We can change the maximum number of results with the limit
parameter. Setting limit
to None
will return all the results.
ds.search()
By default, the results are a geopandas GeoDataFrame
. We can inspect a given feature using ordinary dataframe methods.
ds.search().iloc[0]
Refining the Search#
We can refine our search using various keyword parameters, such as bbox
(to specify a bounding box), datetime
(to specify a date range), intersects
(to specify a geometry), and others.
bbox#
Let’s start by specifying a bounding box. We will use the bounding box of the state of Wyoming, USA.
from geodesic import SearchReturnType
bbox = [-111.056888, 41.00187, -104.052247, 45.005304]
epa_frs_features_wy = ds.search(limit=None, bbox=bbox)
Now, let’s put those features on a map.
from geodesic import mapping
m = mapping.Map(center=[43, -107.5], zoom=5)
m.add_feature_collection('EPA FRS sites', epa_frs_features_wy)
m
intersects#
Suppose instead we want to search for features in a state that is not rectangular. We can specify a geometry to search for features that intersect with it. Here, we will use the geometry of New York State. You can download the geometry from the US Census Bureau. We will use the file cb_2018_us_state_500k.zip, extract files from the zip archive, and load the geometry of New York.
import geopandas as gpd
shapefile_path = "./<zip_extract_destination_folder>/cb_2018_us_state_500k.shp"
gdf = gpd.read_file(shapefile_path)
ny_geometry = gdf.loc[gdf['NAME'] == 'New York', 'geometry'].squeeze()
ny_geometry
Now, we can use this geometry to search for features that intersect with it.
epa_frs_features_ny = ds.search(intersects=ny_geometry, limit=1000)
m = mapping.Map(center=[43, -75], zoom=5)
m.add_feature_collection('EPA FRS sites', epa_frs_features_ny)
m
filter#
Search also supports CQL filters. Here we return only the sites that are active and are points where a substance is released in the county of New York, with accuracy values of 30 or higher.
from geodesic import cql
filter=cql.CQLFilter.logical_and(
cql.CQLFilter.eq("COUNTY_NAME", "NEW YORK"),
cql.CQLFilter.eq("REF_POINT_DESC", "POINT WHERE SUBSTANCE IS RELEASED"),
cql.CQLFilter.eq("ACTIVE_STATUS", "OPERATING"),
cql.CQLFilter.gte("ACCURACY_VALUE", 30),
)
active_sites = ds.search(
limit=None,
filter=filter
)
m = mapping.Map(center=[40.75, -73.98], zoom=12)
m.add_feature_collection('EPA FRS active sites where substance is released - high confidence', active_sites)
m