Skip to main content

Querying Datasets with Dataset.search

Setup

import geodesic
geodesic.set_active_project('tutorials')


Creating the Dataset

We will use the EPA's Facility Registry Service (FRS) - Facilities dataset. It is one of the datasets in the Homeland Infrastructure Foundation-Level Data (HIFLD) , hosted as an ArcGIS online service. Additional information on the FRS and its component datasets is available at the EPA website .

url = "https://geodata.epa.gov/arcgis/rest/services/OEI/FRS_INTERESTS/MapServer/7"
ds = geodesic.Dataset.from_arcgis_layer(name='epa-frs-facilities', url=url)
ds.save()

# output:
dataset:*:*:*:epa-frs-facilities

Searching the Dataset

To start, let's simply run geodesic.boson.dataset.Dataset.search without any parameters. This will return the first 10 records in the dataset, by default.
We can change the maximum number of results with the limit parameter. Setting limit to None will return all the results.

ds.search()

Image

By default, the results are a geopandas GeoDataFrame. We can inspect a given feature using ordinary dataframe methods.

ds.search().iloc[0]

Image

We can refine our search using various keyword parameters, such as bbox (to specify a bounding box),
datetime (to specify a date range), intersects (to specify a geometry), and others.

bbox

Let's start by specifying a bounding box. We will use the bounding box of the state of Wyoming, USA.

from geodesic import SearchReturnType

bbox = [-111.056888, 41.00187, -104.052247, 45.005304]
epa_frs_features_wy = ds.search(limit=None, bbox=bbox)

Now, let's put those features on a map.

from geodesic import mapping

m = mapping.Map(center=[43, -107.5], zoom=5)
m.add_feature_collection('EPA FRS sites', epa_frs_features_wy)
m

Image

intersects

Suppose instead we want to search for features in a state that is not rectangular. We can specify a geometry to search for features that intersect with it. Here, we will use the geometry of New York State.
You can download the geometry from the US Census Bureau . We will use the file cb_2018_us_state_500k.zip, extract files from the zip archive, and load the geometry of New York.

import geopandas as gpd

shapefile_path = "./<zip_extract_destination_folder>/cb_2018_us_state_500k.shp"
gdf = gpd.read_file(shapefile_path)

ny_geometry = gdf.loc[gdf['NAME'] == 'New York', 'geometry'].squeeze()

ny_geometry

Image

Now, we can use this geometry to search for features that intersect with it.

epa_frs_features_ny = ds.search(intersects=ny_geometry, limit=1000)

m = mapping.Map(center=[43, -75], zoom=5)
m.add_feature_collection('EPA FRS sites', epa_frs_features_ny)
m

Image

filter

Search also supports CQL filters. Here we return only the sites that are active and are points where a substance is released in the county of New York, with accuracy values of 30 or higher.

from geodesic import cql

filter=cql.CQLFilter.logical_and(
cql.CQLFilter.eq("COUNTY_NAME", "NEW YORK"),
cql.CQLFilter.eq("REF_POINT_DESC", "POINT WHERE SUBSTANCE IS RELEASED"),
cql.CQLFilter.eq("ACTIVE_STATUS", "OPERATING"),
cql.CQLFilter.gte("ACCURACY_VALUE", 30),
)

active_sites = ds.search(
limit=None,
filter=filter
)

m = mapping.Map(center=[40.75, -73.98], zoom=12)
m.add_feature_collection('EPA FRS active sites where substance is released - high confidence', active_sites)
m

Image