Skip to main content

Adding A STAC Collection Dataset

In this example, we will be accessing Sentinel 2 analysis-ready data from Earth Search by Element 84 .
The Sentinel 2 satellite observes the earth in 13 spectral bands once every 5 days. We will be working with the Sentinel-2 Level 2A data, which has been
processed to remove atmospheric effects. The resulting data product has 10 bands and a weekly cadence. We will use Boson to access the data and produce an
RGB image of a section of Philadelphia, Pennsylvania, USA.

This dataset and many others can be found on the STAC Index . Note that the geodesic.boson.dataset.Dataset.from_stac_collection
provider requires that the STAC Collection is hosted on a STAC API endpoint, and follows the STAC specification. Specifically, the STAC URL must be of the
form https://<root_url>/collections/<collection_id>.

Setup

First, we need to do some initial setup. We start by importing geodesic:

import geodesic

If you haven't yet, you will need to authenticate geodesic using the following command:

geodesic.authenticate()

We need to set the active project to ensure that our dataset is saved to the correct project.
You can fetch a dictionary of existing project that you can access by running

geodesic.account.projects.get_projects. \ If you are the owner of the project you want to use, you can set the active project by running:
geodesic.set_active_project('project_name'). If you are not the owner of the project, you will need \ to set the active project by providing the uid: geodesic.set_active_project(uid). Note that you will \ also need read and/or write permissions for the project in question. Finally, if you want to create a new project, \ you can do so by running:

proj = geodesic.create_project(
name="tutorials",
alias="tutorial Demos Project",
description="Project to hold demos and examples for our tutorial doc pages",
set_as_active=True,
)




Creating The Provider

Datasets in Geodesic provide a method,

geodesic.boson.dataset.Dataset.from_stac_collection which makes adding a STAC collection dataset extremely straightforward. \ To add the Sentinel 2 Level 2A dataset, all we need is the url to the STAC Catalog. The first argument to the \

geodesic.boson.dataset.Dataset.from_stac_collection method is a user-specified \ name for the dataset, and the second argument is the url for the STAC Catalog. We simply run:

stac_url = "https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs"

ds = geodesic.Dataset.from_stac_collection(
name='sentinel-2',
url=stac_url
)

ds.save()

#output:
dataset:*:*:*:sentinel-2

Testing The Provider

Now to run a quick test to ensure that the provider is working. Let's run a simple search to check that features are returned

ds.search(limit=3)

This will return the first three features from the STAC collection in a geopandas GeoDataFrame.

Image

We can also inspect the Dataset with the

geodesic.boson.dataset.Dataset.dataset_info method. Let's look at the Blue band.

ds.dataset_info()['description']

# output:
'Sentinel-2a and Sentinel-2b imagery, processed to Level 2A (Surface Reflectance) and converted to Cloud-Optimized GeoTIFFs'


ds.dataset_info()['collections'][0]['item_assets']['B02']

# output:
{'title': 'Band 2 (blue)',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'roles': ['data'],
'gsd': 10,
'eo:bands': [{'name': 'B02',
'common_name': 'blue',
'center_wavelength': 0.4966,
'full_width_half_max': 0.098}]}


Creating an Image from the Dataset

We are making an RGB image, so we will pull the red, green, and blue bands from the dataset.
We also need to specify the bounding box of the area we want to visualize, and set the pixel size.

asset_bands = [
{'asset': 'B04', 'bands': [0]}, #red
{'asset': 'B03', 'bands': [0]}, #green
{'asset': 'B02', 'bands': [0]}, #blue
]


The resolution is 20m per pixel in the red band, and 10m per pixel for the green and blue bands. We set the pixel size to 10m by 10m, and Boson will handle the resampling.

pixsize = [10, 10]

We will look at a section of Philadelphia, Pennsylvania, USA in September of 2020, so we will set the bounding box, start_date and end_date accordingly.

import datetime

bbox = [-75.207767, 39.934486, -75.155411, 39.970806]
start_date = datetime.datetime(2020, 9, 29)
end_date = start_date + datetime.timedelta(days=7)

We can now pull the image data using Boson. For retrieving raster data,
we use the Dataset's geodesic.boson.dataset.Dataset.get_pixels method.

boson_output = ds.get_pixels(
bbox=bbox,
datetime=(start_date, end_date),
bbox_crs="EPSG:4326",
output_crs='EPSG:3857',
asset_bands=asset_bands,
pixel_size=pixsize,
)

The resulting raster is a numpy array with shape of 3-bands by x-pixels by y-pixels. The first dimension represents the three color bands.
We need to move them to the last dimension in order to display the image. We also need to normalize the pixel values to range between 0 and 1.

import numpy as np
import matplotlib.pyplot as plt

def prep_image(boson_output):
'''
This function accepts the numpy array from our boson request,
and outputs a numpy array that can be displayed as an RGB image.
Specifically, it moves the bands dimension from the first dimension
to the last dimension, and normalizes the data to be between 0 and 1.
'''

rgb_data = np.transpose(boson_output, (1, 2, 0))
rgb_data = (rgb_data - np.min(rgb_data)) / (np.max(rgb_data) - np.min(rgb_data))

return rgb_data

rgb_image = prep_image(boson_output)
plt.imshow(rgb_image)
plt.show()


Image

This image has some scattered clouds. Let's grab some more images and make a composite to remove the clouds. We will grab images for 3 weeks, starting on 9/5/2020,
and make a composite image by taking the median of the three images.

start_date_list = [datetime.datetime(2020, 9, 5) + datetime.timedelta(days=7 * time_step) for time_step in range(3)]

rgb_images = []

for start_date in start_date_list:

end_date = start_date + datetime.timedelta(days=7)

boson_output = ds.get_pixels(bbox=bbox,
datetime=[start_date, end_date],
content_type='raw',
bbox_crs='EPSG:4326',
output_crs='EPSG:3857',
asset_bands=asset_bands,
pixel_size=pixsize)

rgb_image = prep_image(boson_output)
rgb_images.append(rgb_image)

start_date = end_date


composite_image = np.median(rgb_images, axis=0)
plt.imshow(composite_image)
plt.show()

Image

In this cloudless image, you can see the Schuylkill River dividing West Philadelphia from Center City. The large white rectangle to the east is the Pennsylvania Convention Center.