
Adding A CSV Dataset

In this example, we will be using the US Census Bureau's Centers of Population dataset, specifically at the county level. This dataset gives a point location for the population centroid of each county in the US in a CSV format. Files are available broken down by state, but we will be using the file for the entire US available at https://www2.census.gov/geo/docs/reference/cenpop2020/county/CenPop2020_Mean_CO.txt.

For the purposes of this demo, we have saved a copy of this file to a public bucket at gs://geodesic-public-data/CenPop2020_Mean_CO.csv, but you might want to copy the file to your own cloud storage (e.g., S3 or Google Cloud Storage) to better replicate your use case. If you do, substitute your own cloud storage URL for the one used in the examples.
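Before registering a copy of the file, it can help to confirm that it has the columns the later steps rely on. The following is a minimal sketch using only the Python standard library. The helper name missing_columns and the inline sample are hypothetical, the header layout is an assumption about the file, and the values are placeholders rather than real census figures.

```python
import csv
import io

# Hypothetical sample that mimics the assumed column layout of the file.
# The values are placeholders, not real census figures.
SAMPLE = """STATEFP,COUNTYFP,COUNAME,STNAME,POPULATION,LATITUDE,LONGITUDE
99,001,Example,Example State,12345,+40.000000,-75.000000
99,003,Sample,Example State,67890,+41.500000,-74.250000
"""

# Columns used later in this tutorial (coordinates plus the filter fields).
REQUIRED = ("STNAME", "POPULATION", "LATITUDE", "LONGITUDE")


def missing_columns(text, required=REQUIRED):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    return [name for name in required if name not in header]


print(missing_columns(SAMPLE))                      # nothing missing
print(missing_columns("A,B\n1,2\n", ("A", "LAT")))  # reports the absent column
```

An empty list means every expected column is present. For a real file you would read the text from your own copy instead of the inline sample.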

Setup

First, some initial setup. We start by importing geodesic:

import geodesic

If you haven't yet, you will need to authenticate geodesic using the following command:

geodesic.authenticate()

This process is covered in more detail in the Quickstart.

We need to set the active project to ensure that our dataset is saved to the correct project. You can fetch a dictionary of the existing projects that you can access by running geodesic.account.projects.get_projects. If you are the owner of the project you want to use, you can set the active project by passing its name to geodesic.account.projects.set_active_project. If you are not the owner of the project, you will need to set the active project by passing the uid instead. Note that you will also need read and/or write permissions for the project in question. Finally, if you want to create a new project, you can do so by running:

proj = geodesic.create_project(
    name="tutorials",
    alias="tutorial Demos Project",
    description="Project to hold demos and examples for our tutorial doc pages",
    set_as_active=True,
)


Creating The Provider

The geodesic Python API provides a method, geodesic.boson.dataset.Dataset.from_csv, which makes adding a CSV dataset straightforward. To add our population centers dataset, we run:

ds = geodesic.Dataset.from_csv(
    name='uscb-pop-centers',
    url='gs://geodesic-public-data/CenPop2020_Mean_CO.csv',
    index_data=True,
    crs='EPSG:4326',
    x_field='LONGITUDE',
    y_field='LATITUDE',
)
ds.save()

#output:
dataset:*:*:*:uscb-pop-centers

The geodesic.Dataset.from_csv method can also be used with non-geospatial data. You can simply leave out the x_field and y_field arguments if they are not present in your dataset.
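To illustrate what the x_field and y_field arguments express, here is a standalone sketch that turns CSV rows into GeoJSON-style features using only the standard library. It is not how geodesic implements from_csv. The function rows_to_features and the sample rows are hypothetical, and the values are placeholders. When the coordinate fields are omitted, each row becomes a feature with no geometry, which mirrors the non-geospatial case described above.

```python
import csv
import io

# Hypothetical rows with placeholder values (not real census data).
SAMPLE = """COUNAME,STNAME,POPULATION,LATITUDE,LONGITUDE
Example,Example State,12345,+40.000000,-75.000000
Sample,Example State,67890,+41.500000,-74.250000
"""


def rows_to_features(text, x_field=None, y_field=None):
    """Convert CSV text to a list of GeoJSON-style feature dicts.

    If both x_field and y_field are given, each row gets a Point geometry
    in (x, y) = (longitude, latitude) order; otherwise geometry is None.
    """
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        geometry = None
        if x_field and y_field:
            geometry = {
                "type": "Point",
                "coordinates": [float(row[x_field]), float(row[y_field])],
            }
        features.append(
            {"type": "Feature", "geometry": geometry, "properties": dict(row)}
        )
    return features


spatial = rows_to_features(SAMPLE, x_field="LONGITUDE", y_field="LATITUDE")
tabular = rows_to_features(SAMPLE)  # no coordinate fields: attributes only
print(spatial[0]["geometry"])
print(tabular[0]["geometry"])
```

Note that the coordinate order is x first (longitude), then y (latitude), which is easy to swap by accident when the columns are named LATITUDE and LONGITUDE.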

Testing The Provider

Now let's run a quick test to ensure that the provider is working. We run a simple search to check that features are returned:

ds.search(limit=3)

This will return the first three features from the CSV in a geopandas GeoDataFrame.

Image

As a slightly less trivial example, let's search for the counties in New York state with populations of at least one million:

from geodesic import cql

feats = ds.search(
    limit=10,
    filter=cql.CQLFilter.logical_and(
        cql.CQLFilter.eq('STNAME', 'New York'),
        cql.CQLFilter.gte('POPULATION', 1000000)
    )
)
feats

This should return seven features:

Image
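The filter above combines an equality test and a greater-than-or-equal test with a logical AND. As a rough illustration of that logic only, the same condition can be written in plain Python against a list of row dictionaries. The matches helper and the sample rows are hypothetical, with placeholder names and populations; the actual search is evaluated by the dataset, not by code like this.

```python
# Hypothetical rows with placeholder names and populations.
rows = [
    {"COUNAME": "Alpha", "STNAME": "New York", "POPULATION": "1500000"},
    {"COUNAME": "Beta", "STNAME": "New York", "POPULATION": "999999"},
    {"COUNAME": "Gamma", "STNAME": "New Jersey", "POPULATION": "2000000"},
    {"COUNAME": "Delta", "STNAME": "New York", "POPULATION": "1000000"},
]


def matches(props, state="New York", min_population=1_000_000):
    """True when STNAME equals `state` AND POPULATION >= `min_population`."""
    return props["STNAME"] == state and int(props["POPULATION"]) >= min_population


selected = [row["COUNAME"] for row in rows if matches(row)]
print(selected)
```

Because the comparison is greater-than-or-equal, a row with a population of exactly 1,000,000 is included, while a row in another state is excluded regardless of its population.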

Finally, if you have installed the relevant dependencies, you can use the geodesic mapping utilities to visualize these features on a map using the following lines:

from geodesic import mapping

m = mapping.Map(center=[40.956961, -73.591128], zoom=9)
m.add_feature_collection('pop_centers', feats)
m

If you use the map to navigate to the area of New York, you should see the features from our search:

Image
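The map center above is hard-coded. If you would rather derive it from the returned points, a simple option is to average their coordinates. The sketch below assumes the points are available as (longitude, latitude) pairs and that the map expects [latitude, longitude], as in the center used above. The map_center helper and the sample points are hypothetical, and a plain arithmetic mean is only a rough center, adequate for a small region that does not cross the antimeridian.

```python
def map_center(points):
    """Average (lon, lat) pairs and return the result as [lat, lon]."""
    if not points:
        raise ValueError("need at least one point")
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return [sum(lats) / len(lats), sum(lons) / len(lons)]


# Hypothetical points in (longitude, latitude) order.
points = [(-74.0, 40.5), (-73.0, 41.5)]
center = map_center(points)
print(center)  # [41.0, -73.5]
```

The resulting list could then be passed as the center argument in place of the literal coordinates.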