Data Mesh Overview

Why a Data Mesh?

Data in the real world are fragmented. As data scientists, we often spend a significant amount of time accessing and combining remote datasets to perform an analysis or build a model. This is hard enough with traditional data, and spatiotemporal data compounds the problem. Many data scientists are used to accessing their data in several different ways:

  • Local Files
  • Database Connections
  • External APIs
  • Simulations/Data Generation

Data scientists also spend a large fraction of their day just getting data and working it into a convenient format: a numpy array, a pandas DataFrame, or similar. The problem is even more difficult when it’s time to take a model into production to make business impact. Wouldn’t it be great if there were a way to unify the way we access data?

Boson is Geodesic's geospatial service mesh. It unifies the way geospatial data is accessed, and does so in a scalable way. Geodesic works on a decentralized data model: we don’t need to own, control, or even have direct access to a dataset in order to use it. We place the power in your hands, and Boson is the primary tool with which we do so.

Boson, fundamentally, is a proxy that can serve various APIs and data sources THROUGH many different API interfaces. To give a concrete example, let’s say you are used to working with the ArcGIS REST API. This would normally mean you are limited to working with data exposed by ArcGIS layers or a small number of APIs that implement the GeoServices REST Specification. Perhaps you are also comfortable working with Google Earth Engine. Google Earth Engine (GEE) provides a multi-petabyte catalog of geospatial data and is free for academic usage (commercial licenses are available). GEE exposes data through its proprietary REST API and prefers that users work through either its JavaScript or Python client libraries. To give one final example, consider the growing list of STAC APIs for public data. These might include the public USGS Landsat Catalog, the NASA Earthdata CMR, and many others.

These three examples all contain valuable spatiotemporal data, but they are all accessed through different APIs. With Boson, we can unify the way we access these data sources. We don’t do this by converting them to our own proprietary API, which would add yet another standard into the mix, but by translating them on both the request and response side. This means you can serve Google Earth Engine data through GeoServices APIs, GeoServices APIs through STAC APIs, and an extensible list of similar translations.
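To make the idea of translating on both the request and response side more concrete, here is a minimal Python sketch of serving a GeoServices layer through a STAC-style search. This is not Boson's implementation. The function names are hypothetical, only a handful of parameters are handled (`bbox` and `limit` on the STAC side; `geometry`, `geometryType`, `inSR`, `spatialRel`, and `resultRecordCount` on the GeoServices side), and only point geometries in WGS84 are assumed.

```python
def stac_search_to_geoservices_query(search: dict) -> dict:
    """Request side: map a few STAC item-search parameters onto
    GeoServices layer-query parameters (hypothetical helper)."""
    # Start from a query that matches every feature and returns all fields.
    params = {"where": "1=1", "outFields": "*", "f": "json"}

    bbox = search.get("bbox")
    if bbox is not None:
        # Assumes a 2D bbox: [xmin, ymin, xmax, ymax] in WGS84.
        xmin, ymin, xmax, ymax = bbox
        params["geometry"] = f"{xmin},{ymin},{xmax},{ymax}"
        params["geometryType"] = "esriGeometryEnvelope"
        params["spatialRel"] = "esriSpatialRelIntersects"
        params["inSR"] = "4326"

    if "limit" in search:
        params["resultRecordCount"] = str(search["limit"])

    return params


def esri_point_to_geojson(feature: dict) -> dict:
    """Response side: convert one Esri JSON point feature into a
    GeoJSON Feature (hypothetical helper, points only)."""
    geometry = feature["geometry"]
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [geometry["x"], geometry["y"]],
        },
        "properties": dict(feature.get("attributes", {})),
    }
```

A caller would pass the translated parameters to the layer's query endpoint and run each returned feature through the response-side converter, so the client only ever sees the interface it asked for.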

Why “Boson”?

In particle physics, a Boson (named after theoretical physicist and mathematician Satyendra Nath Bose) is a type of fundamental particle that carries a force. The most familiar Boson is the photon (aka “light”). The photon mediates the electromagnetic interaction and is responsible for nearly everything we directly experience in the world. Other Bosons include the Higgs Boson, the W/Z Bosons of the weak force, and gluons of the strong force.

We chose the name Boson because Bosons bind the universe together. In a similar way, Geodesic's Boson is the glue that can unify all geospatial data infrastructure.

[Figure: Feynman diagram]

The above picture shows a “Feynman Diagram”. The solid lines with arrows represent matter particles and the wavy line in the middle is a Boson. This diagram represents a possible interaction between two matter particles mediated by a Boson. It also represents the way Boson works internally. The particle on the left represents the servicer, and the particle on the right represents the provider. Boson sits in the middle and mediates the exchange. In Geodesic’s Boson, this disentangles the two representations, allowing them to interact in ways they never could before.

Servicers and Providers

As depicted above, Boson connects servicers and providers. Servicers allow users to request data and serve it in a particular form. Providers are the data sources. Essentially, Boson allows the user to request data from virtually any source, in virtually any format. Boson has a growing list of servicers and providers that allow you to connect your datasets and work with them. The current list of servicers and providers is as follows:

Servicers

  • STAC/OGC API: Features

  • Esri GeoServices

  • GDAL Raster Operations

Providers

  • STAC/OGC API: Features

  • Esri GeoServices

  • Google Earth Engine

  • Tabular (GeoJSON, GeoParquet, csv, etc.)

  • Elasticsearch

  • Cloud Raster Data (S3/GCS/Azure Blob)

  • Tesseract (Geodesic's planetary scale compute engine)

  • Remote HTTP/gRPC - users can deploy custom providers for interacting with other APIs

This list is continually growing based on the priorities of our users, but this already gives you access to petabytes of geospatial data. A servicer does not need to know HOW to get the data from the provider and a provider doesn’t need to know much about how it’s being queried. Both servicers and providers communicate with each other through Boson.
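The decoupling described above can be sketched in a few lines of Python. Every name here (`Provider`, `Servicer`, `Mesh`, `InMemoryProvider`, `GeoJSONServicer`) is hypothetical and chosen for illustration; none of it is Geodesic's API. The point is only that the servicer never touches the data source and the provider never sees the output format.

```python
from __future__ import annotations

from typing import Protocol


class Provider(Protocol):
    """A data source: knows how to fetch records, not how they are served."""

    def search(self, bbox: tuple[float, float, float, float]) -> list[dict]: ...


class Servicer(Protocol):
    """An output interface: knows how to present records, not where they live."""

    def render(self, records: list[dict]) -> dict: ...


class InMemoryProvider:
    """Stand-in provider holding point records as plain dicts with lon/lat keys."""

    def __init__(self, records: list[dict]) -> None:
        self.records = records

    def search(self, bbox: tuple[float, float, float, float]) -> list[dict]:
        xmin, ymin, xmax, ymax = bbox
        return [
            r for r in self.records
            if xmin <= r["lon"] <= xmax and ymin <= r["lat"] <= ymax
        ]


class GeoJSONServicer:
    """Stand-in servicer that presents records as a GeoJSON FeatureCollection."""

    def render(self, records: list[dict]) -> dict:
        features = [
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
                "properties": {k: v for k, v in r.items() if k not in ("lon", "lat")},
            }
            for r in records
        ]
        return {"type": "FeatureCollection", "features": features}


class Mesh:
    """Mediator: the only component that knows about both sides."""

    def __init__(self) -> None:
        self.providers: dict[str, Provider] = {}
        self.servicers: dict[str, Servicer] = {}

    def add_provider(self, name: str, provider: Provider) -> None:
        self.providers[name] = provider

    def add_servicer(self, name: str, servicer: Servicer) -> None:
        self.servicers[name] = servicer

    def request(self, servicer_name: str, provider_name: str, bbox) -> dict:
        # Fetch from the provider, then hand the records to the servicer.
        records = self.providers[provider_name].search(bbox)
        return self.servicers[servicer_name].render(records)
```

Because the two sides only meet inside the mediator, adding a new provider makes it available through every existing servicer, and vice versa, without changing either.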

Boson Datasets

A Boson dataset is a type of object that is stored in the Geodesic knowledge graph, called Entanglement. A dataset shares the basic fields common to all objects in Entanglement, but it has stricter constraints on what must exist for it to be valid in Geodesic. These constraints ensure that a dataset, once created, is usable with all services in Geodesic. The full JSON Schema for the dataset object is available in the knowledge graph documentation. The most important piece of a dataset is the boson_config. This is, fundamentally, what tells Boson how to configure itself when accessing a dataset. The boson_config provides a few pieces of information:

  • provider_name - simply the name of the provider that can talk to the downstream API/raw data

  • url - the remote path to the data. This is the API URL, the cloud bucket storing the data, or some other reference to the data.

  • properties - a list of provider-specific properties that say something about this dataset.

  • credential - the name of a Credential in Geodesic that Boson will use to access this dataset.
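
As a rough illustration, a boson_config can be pictured as a small mapping with the four fields above. The sketch below is plain Python, not the Geodesic client. The provider name, URL, property, and credential name are made-up placeholders, and the `validate_boson_config` helper is hypothetical. It also assumes that `properties` is a key/value mapping and that `credential` is optional; the dataset JSON Schema remains the authoritative definition.

```python
# Assumption: these two fields are always needed to locate the data.
REQUIRED_FIELDS = ("provider_name", "url")


def validate_boson_config(config: dict) -> list[str]:
    """Return a list of problems found in a boson_config-like mapping.

    Hypothetical helper: the real constraints live in the dataset JSON Schema.
    """
    problems = []

    for field in REQUIRED_FIELDS:
        value = config.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"{field} must be a non-empty string")

    # Assumption: provider-specific properties are a key/value mapping.
    if not isinstance(config.get("properties", {}), dict):
        problems.append("properties must be a mapping")

    # Assumption: credential is optional, e.g. for public data.
    credential = config.get("credential")
    if credential is not None and not isinstance(credential, str):
        problems.append("credential must be the name of a Credential")

    return problems


# Placeholder values only; none of these refer to a real provider or endpoint.
example_config = {
    "provider_name": "stac",
    "url": "https://example.com/stac/v1",
    "properties": {"collection": "example-collection"},
    "credential": "my-stac-credential",
}
```

Calling `validate_boson_config(example_config)` returns an empty list, while a mapping with a missing provider name or an empty URL returns one message per problem.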

When a Boson dataset is created, it is automatically “stored in the graph”. That is, a dataset object is created in Entanglement, which can be connected to other objects in the knowledge graph, such as events, entities, observables, and other datasets. We will discuss the knowledge graph further in the next section.