Accessing raster data with Dataset.get_pixels#

Problem#

You want to access raster data from a Boson Dataset.

Solution#

In this example, we will be accessing Sentinel 2 analysis-ready data from Earth Search by Element 84. The Sentinel 2 satellites together observe the earth in 13 spectral bands, with each satellite capturing imagery once every 5 days. We will be working with the Sentinel-2 Level 2A data, which has been processed to remove atmospheric effects. The resulting data product has imagery for every 2-4 days in 10 spectral bands. We will use Boson to access the data and examine the NDVI (Normalized Difference Vegetation Index) over some corn fields in Idaho. NDVI is a measure of crop health in an area, and is calculated from the red and near-infrared bands of the satellite data. We can calculate NDVI on the fly using the band arithmetic middleware, making the process of accessing and analyzing the data incredibly simple.

Setup#

The process of accessing Sentinel 2 data and saving it as a Boson Dataset is covered in the Adding a STAC Collection Dataset cookbook. Please follow that cookbook up through the “Testing the Provider” section to create the dataset.

Creating a View of the Dataset With Middleware#

To calculate the NDVI, we need to access the red and near-infrared bands of the Sentinel 2 data. We can do this by creating a view of the dataset and adding the band arithmetic middleware on the view.

The normalized difference middleware is an alias for the band arithmetic middleware, which makes it easier to calculate the NDVI and other normalized differences. Normalized difference is calculated as:

normalized difference=B1B2B1+B2\text{normalized difference} = \frac{B1 - B2}{B1 + B2}

where B1 and B2 are the values of the two bands. In this case, we will use the red band (B04) and the near-infrared band (B08) to calculate the NDVI.

In the normalized difference middleware, we specify the bands we want to use for the calculation in the asset_bands_1 and asset_bands_2 parameters. In the asset_name parameter, we specify the name of the new asset that will be created. With the middleware, the sentinel-2-ndvi view dataset will contain an additional asset called ndvi that contains the normalized difference between Sentinel 2 bands B04 (red) and B08 (nir), also known as the NDVI.

Creating a View of the Dataset with Middleware#
from geodesic.boson import PixelsTransform, AssetBands

ndvi_ds = ds.view(
    name="sentinel-2-ndvi",
    middleware=[
        PixelsTransform.normalized_difference(
            asset_band_1=AssetBands(asset="B04", bands=[0]),  # red
            asset_band_2=AssetBands(asset="B08", bands=[0]),  # nir
            asset_name="ndvi",
        )
    ]
)
ndvi_ds.save()

The resolution is 20m per pixel in the red band, and 10m per pixel in the NIR. We set the pixel size to 10m by 10m, and Boson will handle the resampling.

Creating an Image - Setting the pixel size#
pixsize = [10, 10]

We will look at some corn fields in Idaho, USA in September of 2020, so we set the bounding box, start_date and end_date accordingly.

Creating an Image - Setting the bounding box and datetime#
import datetime

bbox = [-116.730895,43.550601,-116.718450,43.561362]

start_date = datetime.datetime(2020, 9, 1)
end_date = start_date + datetime.timedelta(days=7)

We can now pull the image data using Boson.

Creating an Image - Getting the image data#
boson_output = ndvi_ds.get_pixels(
    bbox=bbox,
    datetime=(start_date, end_date),
    bbox_crs="EPSG:4326",
    output_crs="EPSG:3857",
    asset_bands=AssetBands(asset="ndvi", bands=[0]),
    pixel_size=pixsize,
)

The result is a numpy array that is 1-band x 165-pixels x 139-pixels. It contains the NDVI values for the area of the corn fields in Idaho.

Displaying the image#
import matplotlib.pyplot as plt

plt.imshow(boson_output[0])
plt.title('NDVI of Corn Field in Idaho')
plt.colorbar()
plt.show()
../../_images/ndvi_image.png

Analyzing NDVI with Boson#

We can examine the NDVI over time by pulling weekly images and calculating the median NDVI for each image.

NDVI Time Series#
import numpy as np

start_date_list = [datetime.datetime(2020, 1, 1) + datetime.timedelta(days=time_step * 7) for time_step in range(52)]

ndvi_images = []

for start_date in start_date_list:

    end_date = start_date + datetime.timedelta(days=7)

    boson_output = ndvi_ds.get_pixels(bbox=bbox,
                    datetime=[start_date, end_date],
                    bbox_crs="EPSG:4326",
                    output_crs="EPSG:3857",
                    asset_bands=AssetBands(asset="ndvi", bands=[0]),
                    pixel_size=pixsize)

    ndvi_images.append(boson_output[0])

    start_date = end_date

ndvi_median_series = [np.median(ndvi_image) for ndvi_image in ndvi_images]

# Plot the NDVI time series
plt.plot(start_date_list, ndvi_median_series)
plt.xlabel('Date')
plt.ylabel('NDVI')
plt.title('Weekly NDVI - 2020')

plt.show()
../../_images/ndvi_time_series.png

Get Pixel Options#

Now that we have accessed and analyzed the data, let’s look more closely at some of the options we can use with the Dataset.get_pixels() method.

pixel_size vs. shape#

Above, we set the pixel size to 10m by 10m. This means that each pixel in the resulting image represents a 10m by 10m area on the ground. Suppose, however, that we need the array to be a particular shape. We can set the shape parameter to specify the number of pixels in the x and y directions, and Boson will automatically resample the data to fit the new shape. Only one of pixel_size or shape can be set.

output_shape = [100, 100]

start_date = datetime.datetime(2020, 9, 1)
end_date = start_date + datetime.timedelta(days=7)

boson_output = ndvi_ds.get_pixels(
    bbox=bbox,
    datetime=(start_date, end_date),
    bbox_crs="EPSG:4326",
    output_crs="EPSG:3857",
    asset_bands=AssetBands(asset="ndvi", bands=[0]),
    shape=output_shape
)

plt.imshow(boson_output[0])
plt.title('NDVI of Corn Field in Idaho')
plt.colorbar()

plt.show()
../../_images/ndvi_image_shape.png

output_crs#

Thus far, for the output coordinate reference system (CRS), we have been using EPSG:3857, which is the Web Mercator projection, also called WGS 84. This is the default, and is the most common projection for web maps. But suppose we want to use a more accurate CRS for Idaho, such as the Idaho Transverse Mercator projection, EPSG:8826. We can set the output_crs parameter to this value. Boson will automatically reproject the data to the new CRS. In order to compare the new output to the original NDVI image, we will need to set the shape parameter to match the original image shape, as the number of pixels in the x and y directions will change.

Using a different output_crs#
start_date = datetime.datetime(2020, 9, 1)
end_date = start_date + datetime.timedelta(days=7)

boson_output_8826 = ndvi_ds.get_pixels(
    bbox=bbox,
    datetime=(start_date, end_date),
    bbox_crs="EPSG:4326",
    output_crs="EPSG:8826",
    asset_bands=AssetBands(asset="ndvi", bands=[0]),
    shape=boson_output.shape
)

difference_image = boson_output_8826 - boson_output

plt.imshow(difference_image[0])
plt.title('NDVI Residuals between EPSG:8826 and EPSG:3857')
plt.colorbar()
plt.show()
../../_images/ndvi_residuals_output_crs.png

resampling#

By default, Boson will capture all images that intersect the bounding box and datetime range, and resample them to the output pixel size. Its default behavior is to use the nearest neighbor resampling method. We can change this behavior by setting the resampling parameter to a different value. The resampling methods are based on those found in GDAL, and include nearest, bilinear, cubic, cubicspline, lanczos, average, mode, max, min, median, q1, q3, and sum. For this example, we will zoom out a bit to see the effect of the different resampling methods.

Using a different resampling method#
start_date = datetime.datetime(2020, 9, 1)
end_date = start_date + datetime.timedelta(days=14)
bbox = [-116.765895, 43.515601, -116.68345, 43.596362]

fig, ax = plt.subplots(3, 5, figsize=(12, 6))

resampling_methods = ['nearest', 'bilinear', 'cubic', 'cubicspline', 'lanczos', 'average', 'mode', 'max', 'min', 'median', 'q1', 'q3', 'sum']

for i, method in enumerate(resampling_methods):
    boson_output = ndvi_ds.get_pixels(
        bbox=bbox,
        datetime=(start_date, end_date),
        bbox_crs="EPSG:4326",
        output_crs="EPSG:3857",
        asset_bands=AssetBands(asset="ndvi", bands=[0]),
        pixel_size=pixsize,
        resampling=method
        )
    ax[i//5, i%5].imshow(boson_output[0])
    ax[i//5, i%5].set_title(method)
    ax[i//5, i%5].axis('off')

ax[2, 3].axis('off')
ax[2, 4].axis('off')
fig.suptitle('NDVI with Different Resampling Methods')
plt.show()
../../_images/ndvi_resampling_methods.png