.. _tesseract-models:

Tesseract Models with the Python SDK
====================================

The Tesseract Compute Engine lets you scale spatio-temporal analysis to very large datasets and fuse datasets that are in different formats and locations with a simple job description. Arbitrary processing or models run inside Docker containers. In this tutorial we will cover how to create the Docker image that does the processing. The Tesseract Python SDK helps you build Docker containers that can be used with the Tesseract Compute Engine.

Overview
--------

To use a Docker image in Tesseract, follow these steps:

1. Install the SDK
2. Create a Python script that will be the entrypoint
3. Implement two required functions: `inference` and `get_model_info`
4. Build, validate and push the Docker image to a registry

We will go over all of these steps with a simple example that will serve as a template for building other, more complex models.

Installation
------------

The SDK can be installed with pip:

.. code:: bash

    pip install tesseract-sdk

Test that the install worked by running:

.. code:: bash

    tesseract-sdk --help

This will allow us to use the tesseract CLI to validate our image.

Python Script
-------------

The Python script will be the entrypoint to the Docker image. It will be called by the Tesseract Compute Engine and is responsible for loading the model and running inference or any other processing you would like to do on the inputs. The script only needs two functions: `inference` and `get_model_info`. The `inference` function is called for each chunk of data to be processed, and the `get_model_info` function lets Tesseract know what inputs and outputs to expect so that some basic validation can be performed. Let's look at a minimal example script called `calculate_ndvi.py`:

.. code:: python

    import logging
    from tesseract import serve
    import numpy as np


    def inference(assets: dict, logger: logging.Logger, **kwargs) -> dict:
        logger.info("Running my custom calculate ndvi model")
        # Red band of landsat. Can also use $0 to get the first input.
        red = assets['landsat'][0, 0, :, :].astype('float32')
        # NIR band of landsat.
        nir = assets['landsat'][0, 1, :, :].astype('float32')
        # Calculate NDVI. A small term is added to avoid division by zero.
        ndvi = (nir - red) / (nir + red + 1e-30)
        # Reshape to match the declared model output.
        ndvi = ndvi.reshape(1, 1, 1024, 1024)
        return {'ndvi': ndvi}


    def get_model_info() -> dict:
        return {
            "inputs": [{
                "name": "landsat",
                "dtype": "uint16",
                "shape": [1, 2, 1024, 1024]
            }],
            "outputs": [{
                "name": "ndvi",
                "dtype": "float32",
                "shape": [1, 1, 1024, 1024]
            }]
        }


    if __name__ == "__main__":
        serve(inference, get_model_info)

Let's look at what each part of this script is doing. First, the `inference` function can take several arguments; in this case we are using `assets` and `logger`.

The `assets` argument is a dictionary that contains all of the inputs to the model. Assets can be accessed in two ways: by the input name, or by position in the Tesseract job. In this example we access the input by its name, `landsat`; because it is the first input of the job, it can also be accessed by position as $0. Each key in `assets` holds a 4D numpy array with one chunk of data from the input. The array always has dimensions [time, bands, height, width]. That is, the first dimension is always the number of time steps, the second is the number of bands, and the last two are height and width, or y and x. In the example we are using a non-temporal dataset, so the time dimension is 1.

The `logger` argument is a Python logger that can be used to write information to the Tesseract logs. These logs are accessible from the Tesseract job after running. We will look at this later.

The processing done in the `inference` function is very simple: it computes the normalized difference vegetation index (NDVI) from the red and near-infrared (NIR) bands of the input.
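
As a rough illustration of the array layout described above, the following sketch builds a synthetic chunk with NumPy and applies the same NDVI arithmetic outside of Tesseract. The `fake_assets` dictionary and the random pixel values are made up for illustration; only the [time, bands, height, width] layout and the `landsat` key come from the example script.

```python
import numpy as np

# Hypothetical stand-in for the `assets` dict that Tesseract would pass in.
# The pixel values are random and carry no physical meaning.
rng = np.random.default_rng(0)
fake_assets = {
    "landsat": rng.integers(0, 10000, size=(1, 2, 1024, 1024), dtype=np.uint16)
}

chunk = fake_assets["landsat"]
# Dimensions are always [time, bands, height, width].
time_steps, bands, height, width = chunk.shape

# Cast to float32 before the arithmetic so uint16 subtraction cannot wrap.
red = chunk[0, 0, :, :].astype("float32")
nir = chunk[0, 1, :, :].astype("float32")

# Same formula as in calculate_ndvi.py.
ndvi = (nir - red) / (nir + red + 1e-30)

# Restore the 4D layout: one time step, one band.
ndvi = ndvi.reshape(1, 1, height, width)

print(chunk.shape, chunk.dtype)
print(ndvi.shape, ndvi.dtype)
```

Running the arithmetic on a synthetic array like this is a quick way to confirm the output shape and dtype before wiring the function into `serve`.
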
The NDVI example is simple, but it shows that we can perform any processing we want on the input assets. The output of the `inference` function is a dictionary with the results as numpy arrays, similar to the `assets` input. In this case we are returning a single output called `ndvi`. Notice that its second dimension has size 1 because the two input bands are combined into a single NDVI band.

The `get_model_info` function tells Tesseract what inputs and outputs to expect. This is used to validate the model before running it. The inputs and outputs are defined as lists of dictionaries. Each dictionary has three keys: `name`, `dtype` and `shape`. The `name` is the name of the input or output. The `dtype` is the data type of the input or output. Any numpy dtype strings will work for these (e.g. 'float32', '` tutorial.
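
One way to catch mismatches early is to compare what `inference` returns with what `get_model_info` declares before building the image. The sketch below is a local sanity check written with plain NumPy. `check_outputs` is a hypothetical helper, not part of the Tesseract SDK, and the validation that Tesseract itself performs may differ.

```python
import numpy as np


def get_model_info() -> dict:
    # Same declaration as in calculate_ndvi.py.
    return {
        "inputs": [{"name": "landsat", "dtype": "uint16", "shape": [1, 2, 1024, 1024]}],
        "outputs": [{"name": "ndvi", "dtype": "float32", "shape": [1, 1, 1024, 1024]}],
    }


def check_outputs(outputs: dict, model_info: dict) -> list:
    """Hypothetical helper: return a list of mismatches (empty if none)."""
    problems = []
    for spec in model_info["outputs"]:
        name = spec["name"]
        if name not in outputs:
            problems.append(f"missing output '{name}'")
            continue
        array = outputs[name]
        # Compare the declared shape with the actual array shape.
        if list(array.shape) != list(spec["shape"]):
            problems.append(
                f"'{name}' has shape {list(array.shape)}, expected {spec['shape']}"
            )
        # np.dtype() accepts any NumPy dtype string, such as 'float32'.
        if array.dtype != np.dtype(spec["dtype"]):
            problems.append(
                f"'{name}' has dtype {array.dtype}, expected {spec['dtype']}"
            )
    return problems


good = {"ndvi": np.zeros((1, 1, 1024, 1024), dtype="float32")}
bad = {"ndvi": np.zeros((1, 2, 1024, 1024), dtype="float64")}

print(check_outputs(good, get_model_info()))  # no mismatches
print(check_outputs(bad, get_model_info()))   # shape and dtype mismatches
```

Returning a list of problems rather than raising on the first one makes it easier to see every mismatch in a single run.
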