Doing a Spatial Join with Boson
Use the Boson spatial join provider. All Boson Datasets have a
~geodesic.boson.dataset.Dataset.join
method that can be used to join two datasets. Set the
spatial_join
parameter to True
to perform a spatial join.
Setup
Start by creating a project, if you don't already have one set up, then set it as the active project.
import geodesic
geodesic.set_active_project('tutorials')
Load the Building Footprint Dataset
Next, we must load or create the datasets we wish to join. In this case, we are going to join two datasets that are already loaded into the system - building footprints from Microsoft Plantary Computer and the town boundaries for Maine from ArcGIS Living Atlas. The resulting dataset will have the building footprints and heights from Microsoft Plantary Computer, and the town boundaries and names from ArcGIS Living Atlas. Let's first take a look at the building footprints dataset.
maine_footprints_ds = geodesic.get_dataset('microsoft-building-footprints-maine')
maine_footprints = maine_footprints_ds.search(limit=10)
maine_footprints
We can share this dataset to an ArcGIS map or a web map by generating a share token and getting a URL from it. Here, we will share the building footprints dataset to an ipyleaflet map. To see an example of sharing a dataset to an ArcGIS map, see Working with Multiple Data Sources in your Environment of Choice - ArcGIS Online
from geodesic import mapping
from ipyleaflet import VectorTileLayer
maine_footprints_share_token = maine_footprints_ds.share_as_ogc_tiles_service()
maine_footprints_url = maine_footprints_share_token.get_ogc_vector_tile_url()
footprints_layer = VectorTileLayer(name="Building Footprints", url=maine_footprints_url)
m = mapping.Map(center=(43.66,-70.25), zoom=14)
m.add_layer()
m
Load the Town Boundaries Dataset
Now, let's load the town boundaries dataset.
maine_towns_ds = geodesic.get_dataset('maine_towns')
maine_towns = maine_towns_ds.search(limit=10)
maine_towns
As you can see, the town boundaries dataset contains the names of the towns in Maine, along with
their geometries. There are also some other fields that we are not interested in for this example,
such as last_edited_user
. We will drop those fields in the join step. Let's display the town
boundaries on a map:
maine_towns_url = maine_towns_ds.share_as_ogc_tiles_service().get_ogc_vector_tile_url()
towns_layer = VectorTileLayer(name="Maine Towns", url=maine_towns_url)
m = mapping.Map(center=(43.66,-70.25), zoom=11)
m.add_layer(towns_layer)
m
Perform the Spatial Join
Now that we have loaded both datasets, we can perform the spatial join, using the Dataset.join()
method. We set the field
to join on as None
, and the spatial_join
keyword to True
. We will
join the building footprints dataset as the left dataset with the town boundaries dataset as the
right. The use_geometry
parameter determines which dataset's geometry to use for the join ('right'
or 'left'). We will use the building footprints dataset's geometry for the join by specifying
use_geometry='left'
. The drop_fields
and right_drop_fields
parameters allow you to drop fields
from the left and right datasets, respectively, that you don't want in the joined dataset. We will
drop unnecessary columns from the town boundaries dataset with the right_drop_fields
keyword.
maine_footprints_towns_joined = maine_footprints_ds.join(
name='maine-footprints-towns-joined',
right_dataset=maine_towns_ds,
field=None,
spatial_join=True,
right_drop_fields=['created_date', 'created_user', 'last_edited_date', 'last_edited_user', 'id'],
use_geometry='left',
skip_initialize=True,
)
maine_footprints_towns_joined.save()
skip_initialize=True
. This will skip the initialization ofthe right dataset, which can be time-consuming for large datasets. As a result, all joins will be
dynamic. You must query with a bounding box or geometry; a search without either will not work. Also
take care not to query with a bounding box that will return a large number of features on the right
dataset. Should the nature of the datasets make dynamic joins problematic, you can run a Tesseract
job to create a spatially joined (non-dynamic) dataset. See the
:doc:Tesseract Job Guide <../../guides/advanced_guides/tesseract_jobs>
for more information.
Let's take a look at the joined dataset:
portland_bbox = [-70.3150, 43.5875, -70.1800, 43.7200]
maine_footprints_towns_joined_gdf = maine_footprints_towns_joined.search(bbox=portland_bbox, limit=10)
maine_footprints_towns_joined_gdf
And there you have it - a dynamic spatial join between two datasets using Boson.