Adding a Remote Provider#

Problem#

You want to use Boson to connect to data which is not currently supported by a built-in provider.

Solution#

We will build a remote provider that simulates lightning strikes. Keep in mind that remote providers can also be used to access data from an API. For a more complete example, see the API Wrapper Remote Provider Template.

Setup#

Remote providers are built using the Boson SDK and are stored in a Docker image.

To build a remote provider, you will need to have Docker installed on your machine.

Create a new directory for your remote provider and create files with the following structure:

.
├── Dockerfile
├── requirements.txt
└── provider.py

We will go over the meaning of each file in the following sections.

Provider#

The provider (provider.py) is the main file that contains the logic for your remote provider. We will start by importing the necessary libraries.

from typing import List
import logging

from boson.http import serve
import geodesic
import numpy as np
from shapely.geometry import Point
from boson.boson_core_pb2 import Property
import geopandas as gpd

logger = logging.getLogger(__name__)

Lightning Strike Simulator#

This simulator generates random lightning strikes using a simple Poisson process. In code, we will create a class called LightningSimulator with a search method that returns a FeatureCollection of lightning strikes. The simulation itself ignores the request parameters, but all of them are available in the search kwargs.

class LightningSimulator:
    def __init__(self) -> None:
        """Here we are just setting up some fake storms to generate some points around.

        We are just using some 2D Gaussian distributions to simulate the storms with
        storm center points and storm sizes in both dimensions.
        """
        self.storm_centers = [
            (-102.557373, 37.195331),
            (-76.609039, 40.702505),
            (-118.748474, 44.112240),
        ]
        self.storm_sizes = [(4, 2.5), (5, 3), (1, 2.5)]

    def simulate(self) -> List[geodesic.Feature]:
        """Simulate lightning strikes for Gaussian shaped storms."""
        feature_list = []
        for storm_center, storm_size in zip(self.storm_centers, self.storm_sizes):
            try:
                # Poisson-distributed base count, scaled by the two storm size parameters.
                n_strikes = int(np.random.poisson(5) * storm_size[0] * storm_size[1])
                # Storm centers are (longitude, latitude) pairs.
                lons = np.random.normal(storm_center[0], storm_size[0], n_strikes)
                lats = np.random.normal(storm_center[1], storm_size[1], n_strikes)
            except Exception as e:
                logger.critical(f"Error: {e}")
                raise
            for lon, lat in zip(lons, lats):
                feature_list.append(
                    geodesic.Feature(
                        **{
                            # GeoJSON coordinates are ordered [longitude, latitude].
                            "geometry": {"type": "Point", "coordinates": [lon, lat]},
                            "properties": {
                                "strike": np.random.choice(["ground", "cloud"], p=[0.2, 0.8]),
                                "energy": np.abs(np.random.normal(5, 2)),
                            },
                        }
                    )
                )

        return feature_list

    def search(self, **kwargs) -> geodesic.FeatureCollection:
        """Implements the Boson Search endpoint."""
        logger.info("Simulating lightning strikes.")
        features = self.simulate()
        return geodesic.FeatureCollection(features=features)
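
To see the sampling step on its own, the following is a minimal sketch that draws strikes for a single storm outside of the class. It assumes NumPy's seeded Generator API (np.random.default_rng) in place of the global np.random functions used above, so that repeated runs give the same result. The function name sample_storm and the seed value are illustrative and not part of the Boson SDK.

```python
import numpy as np


def sample_storm(center, size, rng, rate=5):
    """Draw strike coordinates for one Gaussian-shaped storm.

    center is (longitude, latitude); size holds the standard deviation
    in each dimension. Hypothetical helper, for illustration only.
    """
    # Poisson-distributed base count, scaled by the two storm size parameters.
    n_strikes = int(rng.poisson(rate) * size[0] * size[1])
    lons = rng.normal(center[0], size[0], n_strikes)
    lats = rng.normal(center[1], size[1], n_strikes)
    return lons, lats


rng = np.random.default_rng(42)  # fixed seed so repeated runs match
lons, lats = sample_storm((-102.557373, 37.195331), (4, 2.5), rng)
print(len(lons), "strikes")
```

Passing the generator in as an argument keeps the randomness under the caller's control, which makes the simulator easier to test.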

Once we have our class, we can create an instance of it and pass it to the serve function.

sim = LightningSimulator()
app = serve(search_func=sim.search)
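
Because the request parameters arrive as keyword arguments, a search function can honor them when that is useful. The sketch below shows one way this could look, using plain GeoJSON-style dictionaries so that it runs without the SDK. The limit key matches the search example in the Notes section of this page; the bbox key and its [min_lon, min_lat, max_lon, max_lat] layout are assumptions made for illustration, so check which parameters your provider actually receives.

```python
def filter_features(features, **kwargs):
    """Apply optional 'bbox' and 'limit' request parameters to point features.

    Hypothetical helper: 'bbox' as [min_lon, min_lat, max_lon, max_lat] is an
    assumed parameter layout, not confirmed Boson behavior.
    """
    bbox = kwargs.get("bbox")
    limit = kwargs.get("limit")

    if bbox is not None:
        min_lon, min_lat, max_lon, max_lat = bbox
        features = [
            f
            for f in features
            if min_lon <= f["geometry"]["coordinates"][0] <= max_lon
            and min_lat <= f["geometry"]["coordinates"][1] <= max_lat
        ]
    if limit is not None:
        features = features[:limit]
    return {"type": "FeatureCollection", "features": features}


strikes = [
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [lon, lat]}, "properties": {}}
    for lon, lat in [(-102.5, 37.2), (-76.6, 40.7), (-118.7, 44.1)]
]
result = filter_features(strikes, bbox=[-110.0, 30.0, -70.0, 45.0], limit=10)
print(len(result["features"]))
```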

Requirements#

The requirements file is used to install the necessary libraries for the remote provider. The requirements file should look like this:

boson-sdk
geopandas
numpy
shapely

Dockerfile#

The Dockerfile is used to build the Docker image that will contain the remote provider. The Dockerfile should look like this:

FROM python:3.8-slim-buster as builder

# Update base container install
RUN apt-get update \
    && apt-get install -y python3-pip \
    && apt-get install -y git \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get clean

RUN python3 -m pip install pip --upgrade

WORKDIR /app
COPY requirements.txt .
RUN pip install --user -r requirements.txt

FROM python:3.8-slim-buster as runner
COPY --from=builder /root/.local /root/.local

WORKDIR /app
COPY provider.py .
ENV PATH=/root/.local/bin:$PATH
CMD /usr/local/bin/python3 -u -m uvicorn --host 0.0.0.0 --port ${PORT} --log-level trace provider:app
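
The CMD line reads the port from the PORT environment variable. Cloud Run sets this variable for the container, but a plain local run does not. As an alternative to relying on the shell to expand ${PORT}, a small launcher could fall back to a default value. This is a sketch only: the resolve_port helper and the default of 8000 are assumptions, and the Dockerfile's CMD would have to be changed to call the launcher.

```python
import os


def resolve_port(default: int = 8000) -> int:
    """Return the port from the PORT environment variable, or a default."""
    raw = os.environ.get("PORT", "").strip()
    return int(raw) if raw else default


def main() -> None:
    # Imported here so the helper above can be used without uvicorn installed.
    import uvicorn

    uvicorn.run("provider:app", host="0.0.0.0", port=resolve_port(), log_level="trace")
```

Calling main() from the container's entry point would start the same app that the CMD line starts.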

Running the Remote Provider#

To build the Docker image, run the following command:

docker build -t lightning-simulator:v0.0.1 .

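To run the Docker image, run the following command. The CMD in the Dockerfile reads the port from the PORT environment variable, so set it explicitly when running locally:

docker run -e PORT=8000 -p 8000:8000 lightning-simulator:v0.0.1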

The remote provider is now running locally on port 8000, but that doesn’t do much good for Boson. To make it accessible to Boson, you will need to run the image in the cloud.

Google Cloud Run lets us run the image in the cloud. To deploy to it, you will need the Google Cloud SDK installed on your machine.

Before deploying, you need to push the image to Google’s Artifact Registry. To do this, tag the image with the Artifact Registry URL and then push it.

Once the image is pushed to the Artifact Registry, you can deploy it to Google Cloud Run.

gcloud run deploy lightning-simulator \
    --image [ARTIFACT_REGISTRY_URL]/lightning-simulator:v0.0.1

After the image is deployed, you will get a URL that you can use to access the remote provider. Give this URL to Boson to access the data.

import geodesic

remote_provider = geodesic.entanglement.Dataset.from_remote_provider(
    url="https://lightning-simulator-azwzjbkrwq-uc.a.run.app",
    name="Lightning Simulator",
    description="Simulates lightning strikes.",
)
remote_provider.save()

# search for lightning strikes
remote_provider.search()

Notes#

Example arguments#

The serve function is the entry point for the remote provider. It takes five callable arguments (functions):

  • extent: A function that returns the extent of the data. This can be left empty if the data's spatial extent is not smaller than the entire globe.

    def extent():
        return {
            "spatial": {
                "bbox": [[-180.0, -90.0, 180.0, 90.0]]
            },
            "temporal": {
                "interval": [[None, None]]
            }
        }
    
  • dataset_info_func: A function that returns the dataset information. This can also be left blank, in which case it is filled in automatically from the extent and other metadata.

  • queryables_func: A function that returns the queryables of the data.

    def queryables():
        return {
            "include_photos": Property(
                title="Photos",
                type="boolean",
            )
        }
    
  • pixels_func: A function that returns the pixels given the request parameters. This is one of the two main methods that can be implemented (the other is search). If the remote provider does not have any pixel data, this method can be left blank.

    def pixels(**kwargs):
        include_photos = kwargs.get("include_photos", False)
        if include_photos:
            return {
                "photos": np.random.randint(0, 10, 10)
            }
        return {
            "some_name": np.random.randint(0, 100, 100)
        }
    
  • search_func: A function that returns the search results given the request parameters. If the remote provider does not have any search data, then this method can be left blank.

    def search(**kwargs):
        n = kwargs.get("limit", 10)
        return gpd.GeoDataFrame(
            {
                "geometry": [
                    Point(np.random.uniform(-180, 180), np.random.uniform(-90, 90))
                    for _ in range(n)
                ]
            }
        )
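
If the data covers less than the whole globe, the extent function can report a tighter bounding box. The following sketch computes one from a list of GeoJSON-style point features and returns it in the same shape as the extent example above. The helper name extent_from_points is hypothetical, and the temporal interval is left open as in that example.

```python
def extent_from_points(features):
    """Build an extent dictionary from GeoJSON-style point features."""
    lons = [f["geometry"]["coordinates"][0] for f in features]
    lats = [f["geometry"]["coordinates"][1] for f in features]
    return {
        "spatial": {"bbox": [[min(lons), min(lats), max(lons), max(lats)]]},
        "temporal": {"interval": [[None, None]]},
    }


storm_centers = [(-102.557373, 37.195331), (-76.609039, 40.702505), (-118.748474, 44.112240)]
features = [
    {"type": "Feature", "geometry": {"type": "Point", "coordinates": [lon, lat]}, "properties": {}}
    for lon, lat in storm_centers
]
print(extent_from_points(features))
```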
    

GeoPandas#

You can return either a GeoDataFrame or a FeatureCollection from the search method. If you return a GeoDataFrame, Boson will automatically convert it to a FeatureCollection, which is useful if you want to use the GeoPandas library to manipulate the data before returning it. If you return a FeatureCollection, Boson will use it as is.
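
For comparison, here is a rough sketch of the structure involved when features are built by hand: each row becomes a GeoJSON Feature with a Point geometry, and the features are wrapped in a FeatureCollection. It uses plain dictionaries rather than geodesic or GeoPandas objects. The helper name and the lon/lat keys are made up for the example, and it is not the conversion Boson performs internally.

```python
def rows_to_feature_collection(rows):
    """Turn dict rows with 'lon' and 'lat' keys into a GeoJSON FeatureCollection."""
    features = []
    for row in rows:
        # Everything except the coordinates becomes a feature property.
        properties = {k: v for k, v in row.items() if k not in ("lon", "lat")}
        features.append(
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
                "properties": properties,
            }
        )
    return {"type": "FeatureCollection", "features": features}


rows = [
    {"lon": -102.5, "lat": 37.2, "strike": "ground", "energy": 4.8},
    {"lon": -76.6, "lat": 40.7, "strike": "cloud", "energy": 6.1},
]
collection = rows_to_feature_collection(rows)
print(collection["features"][0]["properties"])
```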