
Spatiotemporal Data Primer

In this primer we will cover the basics of spatiotemporal data, the unique challenges and opportunities that come with this type of data, and the standard data formats that are commonly used. If you are already familiar with spatiotemporal data, you can skip this section and move on.

What is Spatiotemporal Data?

When we say spatiotemporal data, we mean data that has a space component, a time component, both, or neither. That is to say, we consider all data to be spatiotemporal. Obviously we can't take advantage of spatial or temporal components that a dataset doesn't have, but by treating all data as spatiotemporal we can build systems that are more flexible and can handle a wider variety of complex data types.

Types of Spatiotemporal Data

Spatiotemporal data can be broken down into two main categories: raster and feature. These words may be unfamiliar to you, but you are likely already familiar with the concepts. Raster data is data that is represented as a grid of values, like an image or a numpy array. Feature data is data that can be represented as rows in a table, like a CSV file or a database table. Feature data also has a column that represents its geometry. This geometry can be points, lines, or polygons. Feature data is also called vector data. The terms raster and feature are commonly used in Geographic Information Systems (GIS) and you will often see them used in the context of spatial data. We may switch between the terms raster and array, or feature and table, but each pair refers to the same concept.

An example of raster/image/array data.

An example of feature/table/vector data.

Although we often talk about raster and feature data, these are just the common image and table types you are used to, with some extra information that tells us where they belong on the Earth. More on that later.

What Makes Data Spatial

As we mentioned earlier, spatial data is just like the normal types of data that you are used to, except that it has some extra information that tells us where it belongs on the Earth. This information is called the spatial reference or coordinate reference system (CRS). There are many different spatial references that are used for different purposes, but they all serve the same basic function. Often you will find that different datasets have different spatial references. This can make it very difficult to work with multiple datasets at once, but this is one of the problems that the Geodesic Platform is designed to solve.

For raster data, the spatial reference tells us how big each pixel is and where the top left corner of the image is located on the Earth. As many of you know, the Earth is round. This poses problems when trying to take rectangular data like images and put them on the Earth. The array data in an image must be stretched and warped to fit the Earth's surface. This is called a map projection. We don't need to get into the details of projections or CRSs here, but know that there are good reasons to pick one or another, so just picking a single one for everyone to work with is not a good solution. Unfortunately, this means we often need to convert between these different coordinate systems and projections in order to work with data.
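As a minimal sketch of how pixel size and the top left corner locate a raster, the function below converts a row and column index into coordinates in the raster's CRS. It assumes a north-up raster with no rotation; the function name `pixel_to_world` and the origin and pixel-size values are made up for illustration and do not come from any particular library.

```python
def pixel_to_world(row, col, origin_x, origin_y, pixel_width, pixel_height):
    """Return the CRS coordinates of the top left corner of a pixel.

    Assumes a north-up raster with no rotation (hypothetical helper).
    """
    x = origin_x + col * pixel_width
    # Rows count downward from the top edge, so y decreases as row increases.
    y = origin_y - row * pixel_height
    return x, y


# Hypothetical raster: top left corner at (500000, 4100000) in a projected
# CRS measured in meters, with 10 m by 10 m pixels.
corner = pixel_to_world(0, 0, 500000.0, 4100000.0, 10.0, 10.0)
inside = pixel_to_world(100, 250, 500000.0, 4100000.0, 10.0, 10.0)

print(corner)  # (500000.0, 4100000.0)
print(inside)  # (502500.0, 4099000.0)
```

The same row and column would land somewhere else entirely if the origin or pixel size were expressed in a different CRS, which is why converting between coordinate systems comes up so often.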

Feature data has similar problems. The geometry of a feature or vector is still defined as coordinate pairs for each point, and these points still need to be placed on the surface of the Earth. We also get new, fun problems when dealing with vector data. When you draw a line between two points on the Earth, it doesn't actually follow a straight line but a curve, which means that measuring distances can be tricky. This is because the Earth is round. At least if evidence-based science and mathematics are to be believed.
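To illustrate why distances are tricky, here is a small sketch that uses the haversine formula to measure the curved (great-circle) distance between two longitude/latitude points. It treats the Earth as a perfect sphere with a mean radius of 6371 km, which is an approximation; the function name `haversine_km` is chosen here for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical Earth


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The same one-degree step in longitude covers very different ground
# distances depending on latitude.
at_equator = haversine_km(0.0, 0.0, 1.0, 0.0)
at_60_north = haversine_km(0.0, 60.0, 1.0, 60.0)

print(round(at_equator, 1))   # roughly 111 km
print(round(at_60_north, 1))  # roughly half of that
```

Treating longitude and latitude as flat x and y values would report both steps as the same length, so any distance measured directly in degrees should be viewed with suspicion.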

More on Raster Data

When we talk about an image, we are generally talking about an array that is a grid of values in the X and Y directions, but also has a third dimension that contains the color information. This 3rd axis most often just has 3 values, red, green, and blue for a color image. In the realm of spatial data, we call these "bands", so for a color image we would say it has 3 bands. These values also normally would be either integer numbers from 0 to 255 or floating point numbers from 0 to 1. This covers most of the images that you are used to working with if you haven't worked with spatial data.

When dealing with spatial rasters however, especially remote sensing data, you will often see many more than 3 bands. For instance, Sentinel-2 imagery has 12 or 13 spectral bands depending on the processing level, plus 10 or so quality or other metadata bands. These bands represent much more than the visible spectrum and include things like infrared and aerosol measurements. These bands also tend not to be in the same range as the 0-255 or 0-1 of common image formats. They could be any precision of integer or float, and even single bits in some cases.
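The difference between an everyday image and a multi-band raster can be sketched with numpy arrays. The example below is illustrative only: the 12-band scene is filled with random numbers, the 0 to 10000 value range is an assumed scale, and the band indices picked for display are placeholders rather than the band order of any real sensor.

```python
import numpy as np

# An ordinary color image: rows x columns x 3 bands, 8-bit integers in 0-255.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)

# A hypothetical 12-band scene stored as 16-bit unsigned integers.
rng = np.random.default_rng(0)
scene = rng.integers(0, 10000, size=(4, 4, 12), dtype=np.uint16)

print(rgb.shape, rgb.dtype)      # (4, 4, 3) uint8
print(scene.shape, scene.dtype)  # (4, 4, 12) uint16

# To view the scene as a normal picture, pick three bands and rescale them
# to 0-1 floats. The indices and the divisor are assumptions for this sketch.
display = scene[:, :, [3, 2, 1]].astype(np.float32) / 10000.0
display = np.clip(display, 0.0, 1.0)

print(display.shape, display.dtype)  # (4, 4, 3) float32
```

Checking the dtype and value range of a raster before displaying or combining it is a cheap way to avoid washed-out or clipped results.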

More on Feature Data

As we discussed earlier, feature data is usually represented as a table of values. This could be a CSV file, a database table, or any other tabular data format. The geometry of the data is represented as a column in the table, often called GEOMETRY. This column will contain the coordinates of the geometry in some format. Often this geometry will be in a format called GeoJSON, which is a JSON format that contains the coordinates of the geometry and some metadata about the geometry. This format is very common in the web mapping world and is easy to work with in most programming languages. There are many other formats that are used for storing geometry data, such as Well Known Text (WKT) and Well Known Binary (WKB).
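As a small illustration, the same point geometry can be written as GeoJSON or as WKT. The sketch below parses a GeoJSON point with Python's standard `json` module and converts it to WKT by hand. The helper `point_to_wkt` is hypothetical and handles only simple points, and the coordinates are made-up values in longitude, latitude order.

```python
import json

# A point geometry written as GeoJSON text (coordinates are lon, lat).
geojson_text = '{"type": "Point", "coordinates": [-105.27, 40.01]}'
geometry = json.loads(geojson_text)


def point_to_wkt(geom):
    """Convert a GeoJSON Point dict to WKT (hypothetical helper, points only)."""
    if geom["type"] != "Point":
        raise ValueError("only Point geometries are handled in this sketch")
    x, y = geom["coordinates"][:2]
    return f"POINT ({x} {y})"


print(geometry["type"])        # Point
print(point_to_wkt(geometry))  # POINT (-105.27 40.01)
```

In practice a geometry library would handle these conversions for every geometry type; the point here is only that the formats carry the same coordinates in different wrappers.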

Each row in the table represents a single feature, although the geometry of that feature could be a single point or polygon, or could be a multipart geometry such as a multipolygon. Often the entire table of features is called a Feature Collection. These feature collections should have only one type of geometry in them, and since the features are all in the same table, every feature has the same schema; that is, the columns of the table are the same for every row.
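A feature collection can be sketched as a GeoJSON-style dictionary, with a quick check that every feature shares one geometry type and one set of columns. The feature names and values below are invented for illustration, and `summarize` is a hypothetical helper rather than part of any library.

```python
# A tiny GeoJSON-style feature collection with made-up features.
collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-105.27, 40.01]},
            "properties": {"name": "site_a", "elevation_m": 1655},
        },
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-104.99, 39.74]},
            "properties": {"name": "site_b", "elevation_m": 1609},
        },
    ],
}


def summarize(fc):
    """Return the distinct geometry types and property schemas in a collection."""
    geometry_types = {f["geometry"]["type"] for f in fc["features"]}
    schemas = {tuple(sorted(f["properties"])) for f in fc["features"]}
    return geometry_types, schemas


geometry_types, schemas = summarize(collection)
print(len(geometry_types) == 1)  # True: a single geometry type
print(len(schemas) == 1)         # True: every row has the same columns
```

If either set contains more than one entry, the collection mixes geometry types or schemas and may need to be split before it behaves like a single table.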