Doing a Spatial Join with Boson#
Problem#
You want to join two datasets where they overlap spatially, using Boson.
Solution#
Use the Boson spatial join provider. All Boson Datasets have a join()
method that can be used to join two datasets. Set the spatial_join
parameter to True
to perform a spatial join.
Setup#
Start by seting up the basic geodesic workflow in python.
import geodesic
geodesic.set_active_project('cookbook-examples')
Load the Building Footprint Dataset#
Next, we must load or create the datasets we wish to join. In this case, we are going to join two datasets that are already loaded into the system - building footprints from Microsoft Plantary Computer and the town boundaries for Maine from ArcGIS Living Atlas. The resulting dataset will have the building footprints and heights from Microsoft Plantary Computer, and the town boundaries and names from ArcGIS Living Atlas. Let’s first take a look at the building footprints dataset.
maine_footprints_ds = geodesic.get_dataset('microsoft-building-footprints-maine')
maine_footprints = maine_footprints_ds.search(limit=10)
maine_footprints

We can share this dataset to an ArcGIS map or a web map by generating a share token and getting a URL from it. Here, we will share the building footprints dataset to an ipyleaflet map. To see an example of sharing a dataset to an ArcGIS map, see Working with Multiple Data Sources in your Environment of Choice - ArcGIS Online
from geodesic import mapping
from ipyleaflet import VectorTileLayer
maine_footprints_share_token = maine_footprints_ds.share_as_ogc_tiles_service()
maine_footprints_url = maine_footprints_share_token.get_ogc_vector_tile_url()
footprints_layer = VectorTileLayer(name="Building Footprints", url=maine_footprints_url)
m = mapping.Map(center=(43.66,-70.25), zoom=14)
m.add_layer()
m

Load the Town Boundaries Dataset#
Now, let’s load the town boundaries dataset.
maine_towns_ds = geodesic.get_dataset('maine_towns')
maine_towns = maine_towns_ds.search(limit=10)
maine_towns

As you can see, the town boundaries dataset contains the names of the towns in Maine, along with their geometries. There are also some other fields that we are not interested in for this example, such as last_edited_user. We will drop those fields in the join step. Let’s display the town boundaries on a map:
maine_towns_url = maine_towns_ds.share_as_ogc_tiles_service().get_ogc_vector_tile_url()
towns_layer = VectorTileLayer(name="Maine Towns", url=maine_towns_url)
m = mapping.Map(center=(43.66,-70.25), zoom=11)
m.add_layer(towns_layer)
m

Perform the Spatial Join#
Now that we have loaded both datasets, we can perform the spatial join, using the Dataset.join() method. We set the field to join on as None, and the spatial_join keyword to True. We will join the building footprints dataset as the left dataset with the town boundaries dataset as the right. The use_geometry parameter determines which dataset’s geometry to use for the join (‘right’ or ‘left’). We will use the building footprints dataset’s geometry for the join by specifying use_geometry=’left’. The drop_fields and right_drop_fields parameters allow you to drop fields from the left and right datasets, respectively, that you don’t want in the joined dataset. We will drop unnecessary columns from the town boundaries dataset with the right_drop_fields keyword.
maine_footprints_towns_joined = maine_footprints_ds.join(
name='maine-footprints-towns-joined',
right_dataset=maine_towns_ds,
field=None,
spatial_join=True,
right_drop_fields=['created_date', 'created_user', 'last_edited_date', 'last_edited_user', 'id'],
use_geometry='left',
skip_initialize=True,
)
maine_footprints_towns_joined.save()
Note
For large “right” datasets, set skip_initialize=True
. This will skip the initialization of the right dataset, which can be time-consuming for large datasets. As a result, all joins will be dynamic.
You must query with a bounding box or geometry; a search without either will not work. Also take care not to query with a bounding box that will return a large number of features on the right dataset. Should the nature of the datasets make dynamic joins problematic, you can run a Tesseract job to create a spatially joined (non-dynamic) dataset. See the Tesseract Job Guide for more information.
Let’s take a look at the joined dataset:
portland_bbox = [-70.3150, 43.5875, -70.1800, 43.7200]
maine_footprints_towns_joined_gdf = maine_footprints_towns_joined.search(bbox=portland_bbox, limit=10)
maine_footprints_towns_joined_gdf

And there you have it - a dynamic spatial join between two datasets using Boson.