Boson Overview
Why Boson?
Data in the real world are fragmented. As data scientists, we often spend a significant amount of time accessing and combining remote datasets to perform an analysis or build a model. This is hard enough with traditional data; spatiotemporal data compounds the problem even further. Data scientists typically work with their datasets in several different ways:
- Local Files
- Database Connections
- External APIs
- Simulations/Data Generation
Data scientists also spend a large fraction of their day just getting data and working it into a format that’s easy to work with: a numpy array, a pandas DataFrame, and similar. The problem is even harder when it’s time to take a model into production to make a business impact. Wouldn’t it be great if there were a way to unify how we access data?
Boson is the geospatial service mesh. It unifies the way geospatial data is accessed, and does so in a completely horizontally scalable way. As we discussed previously, Geodesic works on a decentralized data model: we don’t need to own, control, or even have direct access to a dataset in order to use it. We place the power in your hands, and Boson is the primary tool with which we do so.
Boson, fundamentally, is a proxy that can serve many APIs and data sources THROUGH many different APIs. To give a concrete example, suppose you are used to working with the ArcGIS REST API. This would normally limit you to data exposed by ArcGIS layers or by the small number of APIs that implement the GeoServices REST Specification. Perhaps you are comfortable working with Google Earth Engine instead. Google Earth Engine (GEE) provides a multi-petabyte catalog of geospatial data and is free for academic use (commercial licenses are available). GEE exposes data through its own proprietary REST API and expects users to work through either its JavaScript client or that REST API. As a final example, consider the growing list of STAC APIs for public data, such as the public USGS Landsat catalog, the NASA Earthdata CMR, and many others.
These three examples all contain valuable spatiotemporal data, but each is accessed through a different API. Through Boson, we can unify the way we access these data sources. Rather than converting them all to our own proprietary API, which would add yet another standard to the mix, Boson translates between them on both the request and the response side. This means you can serve Google Earth Engine data through GeoServices APIs, GeoServices APIs through STAC APIs, and an extensible list of similar translations.
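Here is a minimal plain-Python sketch of what request-side and response-side translation might look like. It is not Boson’s implementation, and both function names are hypothetical.

The sketch makes several assumptions:
- The GeoServices-style query uses the simple comma-separated envelope syntax for `geometry`, together with `geometryType=esriGeometryEnvelope` and `resultRecordCount`.
- Its coordinates are already WGS84 longitude/latitude.
- It maps only those parameters onto the `bbox` and `limit` fields of a STAC item search.
- On the way back, it converts a GeoJSON Point feature (for example, from an OGC API: Features collection) into an Esri JSON feature.

```python
from typing import Any


def geoservices_query_to_stac_search(params: dict[str, str]) -> dict[str, Any]:
    """Hypothetical request-side translation: GeoServices-style query -> STAC item search.

    Only a bounding-box envelope and a record count are handled; spatial
    references, `where` clauses, time filters, etc. are ignored in this sketch.
    """
    body: dict[str, Any] = {}
    if params.get("geometryType") == "esriGeometryEnvelope" and "geometry" in params:
        # Simple envelope syntax: "xmin,ymin,xmax,ymax" (assumed WGS84 lon/lat)
        xmin, ymin, xmax, ymax = (float(v) for v in params["geometry"].split(","))
        body["bbox"] = [xmin, ymin, xmax, ymax]
    if "resultRecordCount" in params:
        body["limit"] = int(params["resultRecordCount"])
    return body


def geojson_point_to_esri_feature(feature: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical response-side translation: GeoJSON Point feature -> Esri JSON feature."""
    x, y = feature["geometry"]["coordinates"][:2]
    return {
        # GeoJSON allows "properties": null, so fall back to an empty dict
        "attributes": dict(feature.get("properties") or {}),
        "geometry": {"x": x, "y": y},
    }


query = {
    "geometry": "-105.3,39.9,-105.1,40.1",
    "geometryType": "esriGeometryEnvelope",
    "resultRecordCount": "10",
}
print(geoservices_query_to_stac_search(query))
# {'bbox': [-105.3, 39.9, -105.1, 40.1], 'limit': 10}
```

A real translation layer has to handle many more cases, including polygon geometries, ring orientation, spatial references, and paging. The shape of the problem stays the same, though: parse one API’s request, issue another API’s request, and reshape the response.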
Servicers and Providers
You talk to Boson through a Servicer, and Boson retrieves external data through a Provider. Boson has a growing list of servicers and providers that let you connect your datasets and work with them. The core Boson software is the same in every case; the choice of servicer and provider defines the configuration for a particular request. The current list of servicers and providers is as follows:
Servicers
STAC/OGC API: Features
Esri GeoServices
GDAL Raster Operations
Providers
STAC/OGC API: Features
Esri GeoServices
Google Earth Engine
Elasticsearch
Cloud Raster Data (S3/GCS/Azure Blob)
Tesseract (Geodesic)
Remote HTTP/gRPC
This list continues to grow based on the priorities of our users, but it already gives you access to petabytes of geospatial data. STAC and GeoServices appear on both the servicer and provider lists, yet a servicer does not need to know HOW to get data from the provider, and a provider doesn’t need to know much about how it is being queried. Servicers and providers communicate with each other through a thin, abstract middle layer: Boson itself.
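The separation between servicers and providers can be sketched as two small interfaces that share only an API-neutral request type and response type. Every name here is hypothetical and illustrative, not Boson’s real classes: `Query`, `FeatureSet`, `Provider`, `InMemoryProvider`, and `StacStyleServicer`. The in-memory provider stands in for a remote API.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Protocol


@dataclass
class Query:
    """Hypothetical API-neutral request handed from a servicer to a provider."""
    bbox: Optional[tuple[float, float, float, float]] = None  # (xmin, ymin, xmax, ymax)
    limit: int = 10


@dataclass
class FeatureSet:
    """Hypothetical API-neutral response handed back from a provider."""
    features: list[dict[str, Any]] = field(default_factory=list)


class Provider(Protocol):
    """Anything with a matching `search` method can act as a provider."""
    def search(self, query: Query) -> FeatureSet: ...


class InMemoryProvider:
    """Toy provider that filters local point records instead of calling a remote API."""

    def __init__(self, points: list[dict[str, Any]]) -> None:
        self.points = points

    def search(self, query: Query) -> FeatureSet:
        hits = []
        for p in self.points:
            if query.bbox is not None:
                xmin, ymin, xmax, ymax = query.bbox
                if not (xmin <= p["x"] <= xmax and ymin <= p["y"] <= ymax):
                    continue  # point falls outside the requested box
            hits.append(p)
        return FeatureSet(features=hits[: query.limit])


class StacStyleServicer:
    """Toy servicer that speaks a STAC-like dialect (bbox/limit in, GeoJSON out)."""

    def __init__(self, provider: Provider) -> None:
        self.provider = provider  # the servicer never needs to know which provider this is

    def handle(self, params: dict[str, Any]) -> dict[str, Any]:
        # Request side: translate STAC-style parameters into the neutral Query.
        bbox = tuple(params["bbox"]) if "bbox" in params else None
        query = Query(bbox=bbox, limit=int(params.get("limit", 10)))
        result = self.provider.search(query)
        # Response side: render the neutral FeatureSet as a GeoJSON FeatureCollection.
        return {
            "type": "FeatureCollection",
            "features": [
                {
                    "type": "Feature",
                    "geometry": {"type": "Point", "coordinates": [p["x"], p["y"]]},
                    "properties": {k: v for k, v in p.items() if k not in ("x", "y")},
                }
                for p in result.features
            ],
        }


provider = InMemoryProvider([{"x": 0.5, "y": 0.5, "name": "a"}, {"x": 5.0, "y": 5.0, "name": "b"}])
collection = StacStyleServicer(provider).handle({"bbox": [0, 0, 1, 1]})
```

Because the servicer depends only on the `Provider` protocol, you could replace the in-memory provider with one that calls a remote API and leave the servicer unchanged. That is the decoupling between servicers and providers described in the paragraph above.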
Boson Datasets
Before we get further into how Boson works, let’s discuss the concept of a dataset in Geodesic. A dataset is a type of object that is stored in the Geodesic knowledge graph. Like all objects in Entanglement, a dataset has many of the same basic fields, but it has stricter constraints on what must exist for it to be valid in Geodesic. These constraints ensure that a dataset, once created, is usable with all services in Geodesic. Fortunately, if a dataset works with Boson, it should work with all services, because Boson is not only how the Python API accesses data but also how our services access data internally. The full JSON Schema for the dataset object is available in the Entanglement docs. The piece of a dataset most worth calling attention to is the boson_config. This is, fundamentally, what tells Boson how to configure itself when accessing a dataset. The boson_config provides a few pieces of information:
- provider_name: simply the name of the provider that can talk to the downstream API/raw data
- url: the remote path to the data. This is the API URL, the cloud bucket storing the data, or some other reference to the data.
- properties: a list of provider-specific properties that say something about this dataset.
- credential: the name of a Credential in Geodesic that Boson will use to access this dataset.
There are a few other fields as well, but they are less important. When we say a Boson dataset is “stored in the graph”, this is what we mean. This is all Boson needs to access remote data.
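As a rough illustration, the fields listed above could be mirrored in Python as follows. This is a hypothetical sketch, not the Geodesic Python API. The following are all assumptions:
- the class name `BosonConfig` and the `validate` helper
- the provider name `"stac"`, the placeholder URL, the example property, and the credential name
- treating `credential` as optional, on the guess that public data may not need one

The authoritative definition is the dataset JSON Schema in the Entanglement docs.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class BosonConfig:
    """Hypothetical mirror of the boson_config fields described in the text.

    Field names follow the prose; the real schema may contain more fields
    and different requirements.
    """
    provider_name: str  # which provider talks to the downstream API/raw data
    url: str  # API URL, cloud bucket, or other reference to the data
    properties: dict[str, Any] = field(default_factory=dict)  # provider-specific settings
    credential: Optional[str] = None  # name of a Credential stored in Geodesic (assumed optional)

    def validate(self) -> None:
        """Hypothetical check that the minimum required fields are present."""
        if not self.provider_name:
            raise ValueError("provider_name is required")
        if not self.url:
            raise ValueError("url is required")


config = BosonConfig(
    provider_name="stac",  # hypothetical provider identifier
    url="https://example.com/stac/v1",  # placeholder remote API URL
    properties={"collection": "my-collection"},  # hypothetical provider-specific property
    credential="my-stac-api-key",  # hypothetical name of a Credential in Geodesic
)
config.validate()
```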
So far we’ve kept this fairly abstract; the next section makes it more concrete by walking through some examples of using Boson.
Why “Boson”?
In particle physics, a boson (named after theoretical physicist and mathematician Satyendra Nath Bose) is a type of particle; the fundamental bosons include the particles that carry forces. The most familiar boson is the photon (aka “light”). The photon mediates the electromagnetic interaction and is responsible for nearly everything we directly experience in the world. Other bosons include the Higgs boson, the W and Z bosons of the weak force, and the gluons of the strong force. Some theories propose another boson, the graviton, which would be responsible for the gravitational force.
We chose the name Boson for two reasons. First, because bosons bind the universe together, we see Boson as the glue that can unify all geospatial data infrastructure. The second reason takes some explanation.
The above picture shows a “Feynman diagram”. The solid lines with arrows represent matter particles, and the wavy line in the middle is a boson. The diagram represents a possible interaction between two matter particles mediated by a boson. It also represents the way Boson works internally: the particle on the left represents the servicer, the particle on the right represents the provider, and Boson sits in the middle and mediates the exchange. In Geodesic, Boson disentangles the two representations, allowing them to interact in ways they never could before.