Querying Datasets with Dataset.search¶
Problem¶
You want to query a Boson Dataset.
Solution¶
Use Boson’s Dataset.search()
method.
Setup¶
Start by running through the basic setup in Geodesic Basics, if you have not already done so. Then set the active project.
>>> import geodesic
>>> geodesic.set_active_project('docs-example')
Creating the Dataset¶
We will use the EPA’s Facility Registry Service (FRS) - Facilities dataset. It is one of the datasets in the Homeland Infrastructure Foundation-Level Data (HIFLD), hosted as an ArcGIS online service. Additional information on the FRS and its component datasets is available at the EPA website.
>>> url = "https://geodata.epa.gov/arcgis/rest/services/OEI/FRS_INTERESTS/MapServer/7"
>>> ds = geodesic.Dataset.from_arcgis_layer(name='epa-frs-facilities', url=url)
>>> ds.save()
Searching the Dataset¶
To start, let’s simply run search without any parameters. This will return the first 10 records in the dataset, by default. We can change the maximum number of results with the limit parameter. Setting limit to None will return all the results.
>>> ds.search()
We can inspect each feature by clicking on its card.
Or, we can bypass the card and inspect the feature directly as a dictionary.
>>> dict(ds.search()['features'][0])
{'type': 'Feature',
'id': '1',
'geometry': {'type': 'Point',
'coordinates': [-110.72743157131411, 41.307046190216276]},
'properties': {'ACCURACY_VALUE': 30,
'ACTIVE_STATUS': 'OPERATING',
'CITY_NAME': 'UINTA',
'COLLECT_MTH_DESC': 'ADDRESS MATCHING-HOUSE NUMBER',
'COUNTY_NAME': 'UINTA',
'CREATE_DATE': 1643846400000,
'EPA_REGION_CODE': '08',
'FAC_URL': 'https://ofmpub.epa.gov/frs_public2/fii_query_detail.disp_program_facility?p_registry_id=110071199459',
'FEDERAL_AGENCY_NAME': None,
'FEDERAL_LAND_IND': None,
'FED_FACILITY_CODE': None,
'FIPS_CODE': '56041',
'HUC8_CODE': '14040108',
'HUC_12': None,
'INTEREST_TYPE': 'AIR MINOR',
'KEY_FIELD': 'EIS19877211',
'LAST_REPORTED_DATE': 1660694400000,
'LATITUDE83': 41.30704,
'LOCATION_ADDRESS': '30,18N,112W',
'LONGITUDE83': -110.72742,
'OBJECTID': 1,
'PGM_REPORT_URL': 'no data yet',
'PGM_SYS_ACRNM': 'EIS',
'PGM_SYS_ID': '19877211',
'POSTAL_CODE': '82930',
'PRIMARY_NAME': 'PORTER HOLLOW FEDERAL 12-3',
'PROGRAM_URL': None,
'PUBLIC_IND': 'Y',
'REF_POINT_DESC': 'ENTRANCE POINT OF A FACILITY OR STATION',
'REGISTRY_ID': '110071199459',
'STATE_CODE': 'WY',
'UPDATE_DATE': None},
'bbox': [-110.72743157131411,
41.307046190216276,
-110.72743157131411,
41.307046190216276],
'collection': '7'}
Refining the Search¶
We can refine our search using various keyword parameters, such as bbox (to specify a bounding box), datetime (to specify a date range), intersects (to specify a geometry), and others.
bbox¶
Let’s start by specifying a bounding box. We will use the bounding box of the state of Wyoming, USA.
>>> bbox = [-111.056888, 41.00187, -104.052247, 45.005304]
>>> epa_frs_features_wy = ds.search(limit=None, bbox=bbox)
Now, let’s put those features on a map.
>>> from geodesic import mapping
>>> from ipywidgets import Layout
>>> layout = Layout(height='500px')
>>> m = mapping.Map(center=[43, -107.5], zoom=5, layout=layout)
>>> m.add_feature_collection('EPA FRS sites', epa_frs_features_wy)
>>> m
intersects¶
Suppose instead we want to search for features in a state that is not rectangular. We can specify a geometry to search for features that intersect with it. Here, we will use the geometry of New York State. You can download the geometry from the US Census Bureau. We will use the file cb_2020_us_state_500k.zip, extract files from the zip archive, and load the geometry of New York.
>>> import geopandas as gpd
>>> shapefile_path = "./<zip_extract_destination_folder>/cb_2018_us_state_500k.shp"
>>> gdf = gpd.read_file(shapefile_path)
>>> ny_geometry = gdf.loc[gdf['NAME'] == 'New York', 'geometry'].squeeze()
>>> ny_geometry
Now, we can use this geometry to search for features that intersect with it.
>>> epa_frs_features_ny = ds.search(intersects=ny_geometry, limit=1000)
>>> m = mapping.Map(center=[43, -75], zoom=6, layout=layout)
>>> m.add_feature_collection('EPA FRS sites', epa_frs_features_ny)
>>> m
filter¶
Search also supports CQL filters. Here we return only the sites that are active and are points where a substance is released in the county of New York, with accuracy values of 30 or higher.
>>> from geodesic import cql
>>> filter=cql.CQLFilter.logical_and(
cql.CQLFilter.eq("COUNTY_NAME", "NEW YORK"),
cql.CQLFilter.eq("REF_POINT_DESC", "POINT WHERE SUBSTANCE IS RELEASED"),
cql.CQLFilter.eq("ACTIVE_STATUS", "OPERATING"),
cql.CQLFilter.gte("ACCURACY_VALUE", 30),
)
>>> active_sites = ds.search(
limit=None,
filter=filter,
)
>>> m = mapping.Map(center=[40.75, -73.98], zoom=12, layout=layout)
>>> m.add_feature_collection('EPA FRS active sites where substance is released - high confidence', active_sites)
>>> m