Dataset

class geodesic.entanglement.dataset.Dataset(**obj)[source]

Bases: geodesic.entanglement.object.Object

Allows interaction with SeerAI datasets.

Dataset provides a way to interact with datasets in the SeerAI platform.

Parameters

**obj (dict) – Dictionary with all properties in the dataset.

Variables
  • alias (str) – Alternative name for the dataset. This name has fewer restrictions on characters and should be human readable.

item

(dict) - the contents of the dataset definition

Descriptor: _DictDescr

alias

(str) - the alias of this object, anything you wish it to be

Descriptor: _StringDescr

data_api

(str) - the api to access the data

Descriptor: _StringDescr

item_type

(str) - the type of items contained in this dataset (e.g. ‘raster’ or ‘features’)

Descriptor: _StringDescr

item_assets

(dict, Asset) - information about assets contained in this dataset

Descriptor: _AssetsDescr

extent

(Extent, dict) - spatiotemporal extent of this Dataset

Descriptor: _TypeConstrainedDescr

services

(str) - list of services that expose the data for this dataset

Descriptor: _ListDescr

providers

list of providers for this dataset

Descriptor: _ListDescr

stac_extensions

list of STAC extensions this dataset uses

Descriptor: _ListDescr

links

list of links for this dataset

Descriptor: _ListDescr

metadata

(dict) - arbitrary metadata for this dataset

Descriptor: _DictDescr

boson_config

(BosonConfig, dict) - boson configuration for this dataset

Descriptor: BosonDescr

version

(str) - the version string for this dataset

Descriptor: _StringDescr

create()[source]

Creates a new Dataset in Entanglement

Returns

self

Raises

requests.HTTPError – If this failed to create or if the dataset already exists

save()[source]

Updates an existing Dataset in Entanglement.

Returns

self

Raises

requests.HTTPError – If this failed to save.
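
Examples

A minimal sketch of creating and later updating a Dataset. The properties passed to the constructor here (name, alias) are illustrative, not a required set:

>>> ds = Dataset(name="my-dataset", alias="My Dataset")  # illustrative properties
>>> ds.create()  # raises requests.HTTPError if the dataset already exists
>>> ds.alias = "My Renamed Dataset"
>>> ds.save()    # updates the existing Dataset in Entanglement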

search(bbox=None, datetime=None, limit=10, page_size=500, intersects=None, collections=None, ids=None, filter=None, fields=None, sortby=None, method='POST', extra_params={})[source]

Search the dataset for items.

Search this service’s OGC Features or STAC API.

Parameters
  • bbox – The spatial extent for the query as a bounding box. Example: [-180, -90, 180, 90]

  • datetime – The temporal extent for the query formatted as a list: [start, end].

  • limit – The maximum number of items to return in the query. If None, will page through all results

  • page_size – If retrieving all items, this page size will be used for the subsequent requests

  • intersects – a geometry to use in the query

  • collections – a list of collections to search

  • ids – a list of feature/item IDs to filter to

  • filter – a CQL2 filter. This is supported by most datasets but will not work for others.

  • fields – a list of fields to include/exclude. Included fields should be prefixed by ‘+’ and excluded fields by ‘-’. Alternatively, a dict with ‘include’/’exclude’ lists may be provided

  • sortby – a list of sortby objects, which are dicts containing “field” and “direction”. Direction may be one of “asc” or “desc”. Not supported by all datasets

  • method – the HTTP method - POST is the default and usually should be left alone unless a server doesn’t support it

  • extra_params – a dict of additional parameters that will be passed along on the request.

Returns

A geodesic.stac.FeatureCollection with all items in the dataset matching the query.

Examples

A query on the sentinel-2-l2a dataset with a given bounding box and time range. Additionally, you can apply filters on the properties of the items

>>> bbox = geom.bounds
>>> date_range = (datetime.datetime(2020, 12, 1), datetime.datetime.now())
>>> ds.search(
...          bbox=bbox,
...          datetime=date_range,
...          filter=CQLFilter.lte("properties.eo:cloud_cover", 10.0)
... )
query(bbox=None, datetime=None, limit=10, page_size=500, intersects=None, collections=None, ids=None, filter=None, fields=None, sortby=None, method='POST', extra_params={})

Deprecated in 1.0.0

A deprecated alias for search(). It accepts the same parameters and returns the same geodesic.stac.FeatureCollection; see search() above for the full parameter descriptions and examples.
get_pixels(*, bbox, datetime=None, pixel_size=None, shape=None, pixel_dtype=<class 'numpy.float32'>, bbox_crs='EPSG:4326', output_crs='EPSG:3857', resampling='nearest', no_data=None, content_type='raw', asset_bands=[], filter={}, compress=True, bbox_srs=None, output_srs=None, input_nodata=None, output_nodata=None)[source]

get pixel data or an image from this Dataset

get_pixels gets requested pixels from a dataset by calling Boson. This method returns either a numpy array or the bytes of an image file (jpg, png, gif, or tiff). If the content_type is “raw”, this will return a numpy array; otherwise it will return the requested image format as bytes that can be written to a file. Where possible, a COG will be returned for the TIFF format, but this is not guaranteed.

Parameters
  • bbox – a bounding box to export as imagery (xmin, ymin, xmax, ymax)

  • datetime – a start and end datetime to query against. Imagery will be filtered to this range and mosaicked.

  • pixel_size – the x/y pixel size of the output imagery, as a list (e.g. [x_size, y_size]). This should be specified in units of the output spatial reference.

  • shape – the shape of the output image (rows, cols). Either this or the pixel_size must be specified, but not both.

  • pixel_dtype – a numpy datatype or string descriptor in numpy format (e.g. <f4) of the output. Most, but not all basic dtypes are supported.

  • bbox_crs – the spatial reference of the bounding bbox, as a string. May be EPSG:<code>, WKT, Proj4, ProjJSON, etc.

  • output_crs – the spatial reference of the output pixels.

  • resampling – a string to select the resampling method.

  • no_data – in the source imagery, what value should be treated as no data?

  • content_type – the image format. Default is “raw”, which sends raw image bytes that will be converted into a numpy array. If “jpg”, “png”, “gif”, or “tiff”, returns the bytes of an image file instead, which can be written directly to disk.

  • asset_bands – a list containing dictionaries with the keys “asset” and “bands”. Asset should point to an asset in the dataset, and “bands” should list band indices (0-indexed) or band names.

  • filter – a CQL2 JSON filter to filter images that will be used for the resulting output.

  • compress – compress bytes when transferring. This will usually, but not always, improve performance

  • input_nodata (deprecated) – in the source imagery, what value should be treated as no data?

  • output_nodata (deprecated) – what value should be set as the nodata value in the resulting dataset. Only meaningful for tiff outputs.

  • bbox_srs (deprecated) – the spatial reference of the bounding bbox, as a string. May be EPSG:<code>, WKT, Proj4, ProjJSON, etc.

  • output_srs (deprecated) – the spatial reference of the output pixels.

Returns

a numpy array or bytes of an image file.
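
Examples

A minimal sketch of exporting a mosaic as a numpy array. The asset name and band indices in asset_bands are hypothetical and depend on the dataset:

>>> pixels = ds.get_pixels(
...          bbox=[-109.05, 37.0, -102.04, 41.0],
...          shape=(512, 512),
...          asset_bands=[{"asset": "image", "bands": [0, 1, 2]}],  # hypothetical asset/bands
... )

Since content_type defaults to “raw”, the result is a numpy array; passing content_type=”tiff” would instead return image bytes that can be written to disk.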

warp(*, bbox, datetime=None, pixel_size=None, shape=None, pixel_dtype=<class 'numpy.float32'>, bbox_crs='EPSG:4326', output_crs='EPSG:3857', resampling='nearest', no_data=None, content_type='raw', asset_bands=[], filter={}, compress=True, bbox_srs=None, output_srs=None, input_nodata=None, output_nodata=None)

Deprecated in 1.0.0

A deprecated alias for get_pixels(). It accepts the same parameters and returns the same result (a numpy array or bytes of an image file); see get_pixels() above for the full parameter descriptions.

dataset_info()[source]

returns information about this Dataset

view(name, bbox=None, intersects=None, datetime=None, collections=None, ids=None, filter=None, asset_bands=[], feature_limit=None, middleware={}, cache={}, tile_options={}, domain=None, category=None, type=None, project=None, **kwargs)[source]

creates a curated view of a Dataset

This method creates a new Dataset that is a “view” of an existing dataset. This allows the user to provide a set of persistent filters to a Dataset as a separate Object. A view may also be saved in a different Project than the original. The applied filters affect both query/search and get_pixels requests: the final request processed will be the intersection of the view parameters with the query.

Parameters
  • name – name of the view Dataset

  • bbox – The spatial extent for the query as a bounding box. Example: [-180, -90, 180, 90]

  • intersects – a geometry to use in the query

  • datetime – The temporal extent for the query formatted as a list: [start, end].

  • collections – a list of collections to search

  • ids – a list of feature/item IDs to filter to

  • filter – a CQL2 filter. This is supported by most datasets but will not work for others.

  • asset_bands – a list of asset/bands combinations to filter this Dataset to

  • feature_limit – if specified, overrides the max_page_size of this Dataset

  • middleware – configure any boson middleware to be applied to the new dataset.

  • cache – configure caching for this dataset

  • tile_options – configure tile options for this dataset

  • domain – domain of the resulting Object

  • category – category of the resulting Object

  • type – the type of the resulting Object

  • project – a new project to save this view to. If None, inherits from the parent Dataset
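
Examples

A minimal sketch of creating a persistent, filtered view. The view name is arbitrary, CQLFilter usage mirrors the search() example above, and saving the returned Dataset is assumed to work like the from_* constructors:

>>> low_cloud = ds.view(
...          name="my-dataset-low-cloud",
...          bbox=[-109.05, 37.0, -102.04, 41.0],
...          filter=CQLFilter.lte("properties.eo:cloud_cover", 10.0)
... )
>>> low_cloud.save()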

union(name, others=[], feature_limit=1000, project=None, ignore_duplicate_fields=False, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

creates a union of this dataset with a list of others

Creates a new Dataset that is the union of this Dataset with a list of others. If others is an empty list, this creates a union of a dataset with itself, which is essentially a virtual copy of the original endowed with any capabilities Boson adds.

See: geodesic.entanglement.dataset.new_union_dataset()

Parameters
  • name – the name of the new Dataset

  • others – a list of Datasets to union

  • feature_limit – the max size of a results page from a query/search

  • project – the name of the project this will be assigned to

  • ignore_duplicate_fields – if True, duplicate fields across providers will be ignored

  • middleware – configure any boson middleware to be applied to the new dataset.

  • cache – configure caching for this dataset

  • tile_options – configure tile options for this dataset
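
Examples

A minimal sketch of unioning this Dataset with another. Here other_ds is assumed to be another Dataset object, and saving the result follows the same pattern as the from_* constructors:

>>> combined = ds.union(
...          name="my-union-dataset",
...          others=[other_ds],  # other_ds is an assumed, pre-existing Dataset
...          feature_limit=1000
... )
>>> combined.save()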

share(servicer, ttl=None)[source]

Shares a dataset, producing a token that will allow unauthenticated users to run a proxied boson request

Parameters
  • servicer – The name of the servicer to use in the boson request.

  • ttl – The time until the dataset’s token should expire, as either a timedelta object or a number of seconds. Defaults to -1 (no expiration) if not provided.

Raises

requests.HTTPError – If the user is not permitted to access the dataset or if an error occurred

Returns

a share token created by Ted and its corresponding data
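
Examples

A minimal sketch with a one-day expiration; the servicer name here is hypothetical and depends on which servicers expose this dataset:

>>> import datetime
>>> token = ds.share(servicer="geoservices", ttl=datetime.timedelta(days=1))  # hypothetical servicer name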

share_as_arcgis_service(ttl=None)[source]

Share a dataset as a GeoServices/ArcGIS service

Parameters

ttl – The time until the dataset’s token should expire, as either a timedelta object or a number of seconds. Defaults to -1 (no expiration) if not provided.

Raises

requests.HTTPError – If the user is not permitted to access the dataset or if an error occurred

Returns

a share token created by Ted and its corresponding data

share_as_ogc_tiles_service(ttl=None)[source]

Share a dataset as an OGC Tiles service

Parameters

ttl – The time until the dataset’s token should expire, as either a timedelta object or a number of seconds. Defaults to -1 (no expiration) if not provided.

Raises

requests.HTTPError – If the user is not permitted to access the dataset or if an error occurred

Returns

a share token created by Ted and its corresponding data

command(command, **kwargs)[source]

issue a command to this dataset’s provider

Commands can be used to perform operations on a dataset such as reindexing. Most commands run in the background and return immediately. If a command is successfully submitted, this returns the message {“success”: True}; otherwise it raises an exception with the error message.

Parameters
  • command – the name of the command to issue. Providers supporting “reindex” will accept this command.

  • **kwargs – additional arguments passed to this command.
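
Examples

A minimal sketch issuing the “reindex” command named above, for providers that support it:

>>> ds.command("reindex")
{'success': True}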

static from_snowflake_table(name, account, database, table, credential, schema='public', warehouse=None, id_column=None, geometry_column=None, datetime_column=None, feature_limit=8000, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

create a Dataset from a Snowflake table.

This method creates a new Dataset from an existing Snowflake table.

Parameters
  • name – name of the Dataset

  • account – Snowflake account name

  • database – Snowflake database that contains the table

  • table – name of the Snowflake table

  • credential – name of a credential to access the table. Either basic auth or an oauth2 refresh token is supported

  • schema – Snowflake schema the table resides in

  • warehouse – name of the Snowflake warehouse to use

  • id_column – name of the column containing a unique identifier. Integer IDs preferred, but not required

  • geometry_column – name of the column containing the primary geometry for spatial filtering.

  • datetime_column – name of the column containing the primary datetime field for temporal filtering.

  • feature_limit – max number of results to return in a single page from a search

  • middleware – configure any boson middleware to be applied to the new dataset.

  • cache – configure caching for this dataset

  • tile_options – configure tile options for this dataset
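
Examples

A minimal sketch; the account, database, table, column, and credential names are all hypothetical:

>>> ds = Dataset.from_snowflake_table(
...          name="my-snowflake-dataset",
...          account="my-account",
...          database="MY_DB",
...          table="MY_TABLE",
...          credential="my-snowflake-creds",
...          geometry_column="GEOM",
...          datetime_column="OBSERVED_AT"
... )
>>> ds.save()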

static from_arcgis_item(name, item_id, arcgis_instance='https://www.arcgis.com/', credential=None, layer_id=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', gis=None, **kwargs)[source]

creates a new Dataset from an ArcGIS Online/Enterprise item

Parameters
  • name – name of the Dataset to create

  • item_id – the item ID of the ArcGIS Item Referenced

  • arcgis_instance – the base url of the ArcGIS Online or Enterprise root. Defaults to AGOL, MUST be specified for ArcGIS Enterprise instances

  • credential – the name or uid of a credential required to access this. Currently, this must be the client credentials of an ArcGIS OAuth2 Application. Public layers do not require credentials.

  • layer_id – an integer layer ID to subset a service’s set of layers.

  • middleware – configure any boson middleware to be applied to the new dataset.

  • cache – configure caching for this dataset

  • tile_options – configure tile options for this dataset

  • gis – the logged-in arcgis.gis.GIS to use to access the metadata for this item. If this is not specified, the active GIS is used to access secure content.

Returns

a new Dataset.

Examples

>>> ds = Dataset.from_arcgis_item(
...          name="my-dataset",
...          item_id="abc123efghj34234kxlk234joi",
...          credential="my-arcgis-creds"
... )
>>> ds.save()
static from_arcgis_layer(name, url, arcgis_instance='https://www.arcgis.com', credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', gis=None, **kwargs)[source]

creates a new Dataset from an ArcGIS Online/Enterprise Service URL

Parameters
  • name – name of the Dataset to create

  • url – the URL of the Feature, Image, or Map Server. This is the layer url, not the Service url. Only the specified layer will be available to the dataset

  • arcgis_instance – the base url of the ArcGIS Online or Enterprise root. Defaults to AGOL, MUST be specified for ArcGIS Enterprise instances

  • credential – the name or uid of a credential required to access this. Currently, this must be the client credentials of an ArcGIS OAuth2 Application. Public layers do not require credentials.

  • middleware – configure any boson middleware to be applied to the new dataset.

  • cache – configure caching for this dataset

  • tile_options – configure tile options for this dataset

  • gis – the logged-in arcgis.gis.GIS to use to access the metadata for this item. If this is not specified, the active GIS is used to access secure content.

Returns

a new Dataset.

Examples

>>> ds = Dataset.from_arcgis_layer(
...          name="my-dataset",
...          url="https://services9.arcgis.com/ABC/arcgis/rest/services/SomeLayer/FeatureServer/0",
...          credential="my-arcgis-creds"
... )
>>> ds.save()
static from_arcgis_service(name, url, arcgis_instance='https://www.arcgis.com', credential=None, layer_id=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', gis=None, **kwargs)[source]

creates a new Dataset from an ArcGIS Online/Enterprise Service URL

Parameters
  • name – name of the Dataset to create

  • url – the URL of the Feature, Image, or Map Server. This is not the layer url, but the Service url. Layers will be enumerated and all accessible from this dataset.

  • arcgis_instance – the base url of the ArcGIS Online or Enterprise root. Defaults to AGOL, MUST be specified for ArcGIS Enterprise instances

  • credential – the name or uid of a credential required to access this. Currently, this must be the client credentials of an ArcGIS OAuth2 Application. Public layers do not require credentials.

  • layer_id – an integer layer ID to subset a service’s set of layers.

  • middleware – configure any boson middleware to be applied to the new dataset.

  • cache – configure caching for this dataset

  • tile_options – configure tile options for this dataset

  • gis – the logged-in arcgis.gis.GIS to use to access the metadata for this item. If this is not specified, the active GIS is used to access secure content.

Returns

a new Dataset.

Examples

>>> ds = Dataset.from_arcgis_service(
...          name="my-dataset",
...          url="https://services9.arcgis.com/ABC/arcgis/rest/services/SomeLayer/FeatureServer",
...          credential="my-arcgis-creds"
... )
>>> ds.save()
static from_stac_collection(name, url, credential=None, item_type='raster', middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

Create a new Dataset from a STAC Collection

Parameters
  • name – name of the Dataset to create

  • url – the url to the collection (either STAC API or OGC API: Features)

  • credential – name or uid of the credential to access the API

  • item_type – what type of items does this contain? “raster” for raster data, “features” for features; other types, such as point_cloud, may be specified but do not alter current internal functionality.

  • middleware – configure any boson middleware to be applied to the new dataset.

  • cache – configure caching for this dataset

  • tile_options – configure tile options for this dataset

Returns

a new Dataset.

Examples

>>> ds = Dataset.from_stac_collection(
...          name="landsat-c2l2alb-sr-usgs",
...          url="https://landsatlook.usgs.gov/stac-server/collections/landsat-c2l2alb-sr"
...)
>>> ds.save()
static from_bucket(name, url, pattern=None, region=None, datetime_field=None, start_datetime_field=None, end_datetime_field=None, oriented=False, credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

Creates a new Dataset from a Cloud Storage Bucket (S3/GCP/Azure)

Parameters
  • name – name of the Dataset to create

  • url – the url to the bucket, including the prefix (ex. s3://my-bucket/myprefix, gs://my-bucket/myprefix, …)

  • pattern – a regex to filter for files to index

  • region – for S3 buckets, the region where the bucket is

  • datetime_field – the name of the metadata key on the file to find a timestamp

  • start_datetime_field – the name of the metadata key on the file to find a start timestamp

  • end_datetime_field – the name of the metadata key on the file to find an end timestamp

  • oriented – Is this oriented imagery? If so, EXIF data will be parsed for geolocation. Anything missing location info will be dropped.

  • credential – the name or uid of the credential to access the bucket.

  • middleware – configure any boson middleware to be applied to the new dataset.

  • kwargs – other metadata that will be set on the Dataset, such as description, alias, etc

Returns

a new Dataset.

Examples

>>> ds = Dataset.from_bucket(
...          name="bucket-dataset",
...          url="s3://my-bucket/myprefix",
...          pattern=r".*\.tif",
...          region="us-west-2",
...          datetime_field="TIFFTAG_DATETIME",
...          oriented=False,
...          credential="my-iam-user",
...          description="my dataset is the bomb"
...)
>>> ds.save()
static from_google_earth_engine(name, asset, credential, folder='projects/earthengine-public/assets', url='https://earthengine-highvolume.googleapis.com', middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

Creates a new Dataset from a Google Earth Engine Asset

Parameters
  • name – name of the Dataset to create

  • asset – the asset in GEE to use (ex. ‘LANDSAT/LC09/C02/T1_L2’)

  • credential – the credential to access this, a Google Earth Engine GCP Service Account. Future versions will allow the use of an oauth2 refresh token or other credential types.

  • folder – by default this is the Earth Engine public folder, but you can specify another folder if needed to point to legacy data or personal projects.

  • url – the GEE url to use, defaults to the recommended high volume endpoint.

  • kwargs – other metadata that will be set on the Dataset, such as description, alias, etc

  • middleware – configure any boson middleware to be applied to the new dataset.

  • cache – configure caching for this dataset

  • tile_options – configure tile options for this dataset

Returns

a new Dataset.

Examples

>>> ds = Dataset.from_google_earth_engine(
...          name="landsat-9-c2-gee",
...          asset="LANDSAT/LC09/C02/T1_L2",
...          credential="google-earth-engine-svc-account",
...          description="my dataset is the bomb"
...)
>>> ds.save()
static from_elasticsearch_index(name, url, index_pattern, credential=None, storage_credential=None, datetime_field='properties.datetime', geometry_field='geometry', geometry_type='geo_shape', id_field='_id', data_api='features', item_type='other', feature_limit=2000, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

create a new Dataset from an elasticsearch index containing geojson features or STAC items

Parameters
  • name – name of the Dataset to create

  • url – the DNS name or IP of the elasticsearch host to connect to.

  • index_pattern – an elasticsearch index name or index pattern

  • credential – name of the Credential object to use. Currently, this only supports basic auth (username/password).

  • storage_credential – the name of the Credential object to use for storage if any of the data referenced in the index requires a credential to access (e.g. cloud storage for STAC)

  • datetime_field – the field that is used to search by datetime in the elasticsearch index.

  • geometry_field – the name of the field that contains the geometry

  • geometry_type – the type of the geometry field, either geo_shape or geo_point

  • id_field – the name of the field to use as an ID field

  • data_api – the data API, either ‘stac’ or ‘features’

  • item_type – the type of item. If it’s a stac data_api, then it should describe what the data is

  • feature_limit – the max number of features the service will return per page.

  • middleware – configure any boson middleware to be applied to the new dataset.

  • **kwargs – other arguments that will be used to create the collection and provider config.

Returns

A new Dataset. Must call .save() for it to be usable.
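
Examples

A minimal sketch; the host, index pattern, and credential name are hypothetical:

>>> ds = Dataset.from_elasticsearch_index(
...          name="my-es-dataset",
...          url="https://my-es-host:9200",
...          index_pattern="features-*",
...          credential="my-es-creds"
... )
>>> ds.save()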

static from_csv(name, url, index_data=True, crs='EPSG:4326', x_field='CoordX', y_field='CoordY', z_field='CoordZ', geom_field='WKT', feature_limit=1000, region=None, credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

create a new Dataset from a CSV file in cloud storage

Parameters
  • name – name of the Dataset to create

  • url – the URL/URI of the data. Can be a cloud storage URI such as s3://<bucket>/key, gs://

  • index_data – if true, the data will be copied and spatially indexed for more efficient queries

  • crs – a string coordinate reference for the data

  • (x/y/z)_field – the field name for the x/y/z fields

  • geom_field – the field name containing the geometry in well known text (WKT) or hex encoded well known binary (WKB).

  • feature_limit – the max number of features this will return per page

  • region – for S3 buckets, the region where the bucket is

  • credential – the name of the credential object needed to access this data.

  • middleware – configure any boson middleware to be applied to the new dataset.
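
Examples

A minimal sketch indexing a CSV with x/y coordinate columns; the bucket, file, field, and credential names are hypothetical:

>>> ds = Dataset.from_csv(
...          name="my-csv-dataset",
...          url="s3://my-bucket/points.csv",
...          x_field="lon",
...          y_field="lat",
...          region="us-west-2",
...          credential="my-iam-user"
... )
>>> ds.save()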

static from_tabular_data(name, url, index_data=True, crs='EPSG:4326', feature_limit=1000, region=None, credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

create a new Dataset from a vector file in cloud storage.

This can be a Shapefile, GeoJSON Feature Collection, FlatGeobuf, or one of several other formats.

Parameters
  • name – name of the Dataset to create

  • url – the URL/URI of the data. Can be a cloud storage URI such as s3://<bucket>/key, gs://

  • index_data – if true, the data will be copied and spatially indexed for more efficient queries

  • crs – a string coordinate reference for the data

  • feature_limit – the max number of features this will return per page

  • region – for S3 buckets, the region where the bucket is

  • credential – the name of the credential object needed to access this data.

  • middleware – configure any boson middleware to be applied to the new dataset.

  • cache – configure caching for this dataset

  • tile_options – configure tile options for this dataset
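
Examples

A minimal sketch for a FlatGeobuf file in cloud storage; the bucket, file, and credential names are hypothetical:

>>> ds = Dataset.from_tabular_data(
...          name="my-vector-dataset",
...          url="s3://my-bucket/data.fgb",
...          region="us-west-2",
...          credential="my-iam-user"
... )
>>> ds.save()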

static from_geoparquet(name, url, feature_limit=1000, datetime_field='datetime', return_geometry_properties=False, expose_partitions_as_layer=True, update_existing_index=True, credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

creates a dataset from Hive-partitioned GeoParquet files in cloud storage

Hive-partitioned GeoParquet is a particular convention typically used when writing data out from a parallel process (such as Tesseract or Apache Spark) or when the individual file sizes or row counts are too large. This provider indexes these partitions spatially to optimize query performance. Hive-partitioned parquet is organized like this, and we require this structure:

prefix/<root>.parquet
      /key=value_1/<partition-00001>.parquet
      /key=value_2/<partition-00002>.parquet
      …
      /key=value_m/<partition-n>.parquet

“root” and “partition-xxxxx” can be any names provided both have the parquet suffix. Any number of key/value pairs are allowed in Hive-partitioned data. This can also point to a single parquet file.

Parameters
  • name – name of the Dataset to create

  • url – the path to the <root>.parquet. Format depends on the storage backend.

  • feature_limit – the max number of features that this provider will allow returned by a single query.

  • datetime_field – if the data is time enabled, this is the name of the datetime field.

  • return_geometry_properties – if True, will compute and return geometry properties along with the features.

  • expose_partitions_as_layer – this will create a collection/layer in this Dataset that simply has the partition bounding box and count of features within. Can be used as a simple heatmap

  • update_existing_index – if the data has been indexed in our scheme by a separate process, set to False to use that instead, otherwise this will index the parquet data in the bucket before you are able to query it.

  • credential – the name of the credential to access the data in cloud storage.

  • middleware – configure any boson middleware to be applied to the new dataset.

  • cache – configure caching for this dataset

  • tile_options – configure tile options for this dataset

  • **kwargs – additional arguments that will be used to create the STAC collection, Dataset description, alias, etc.
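
Examples

A minimal sketch pointing at the <root>.parquet of a Hive-partitioned layout; the URL and credential name are hypothetical:

>>> ds = Dataset.from_geoparquet(
...          name="my-parquet-dataset",
...          url="s3://my-bucket/prefix/root.parquet",
...          datetime_field="datetime",
...          credential="my-iam-user"
... )
>>> ds.save()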

static from_remote_provider(name, url, data_api='features', transport_protocol='http', feature_limit=2000, credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

Creates a dataset from a server implementing the Boson remote provider interface

The Boson Remote Provider interface may be implemented using the Boson Python SDK (https://pypi.org/project/boson-sdk/). The provider must be hosted somewhere and this connects Boson to a remote provider.

Remote Providers may either implement the Search or the Pixels endpoint (or both).

Parameters
  • name – name of the Dataset to create

  • url – URL of the server implementing the interface

  • data_api – either ‘features’ or ‘raster’.

  • transport_protocol – either ‘http’ or ‘grpc’

  • credential – the name of the credential to access the api.

  • middleware – configure any boson middleware to be applied to the new dataset.

  • cache – configure caching for this dataset

  • tile_options – configure tile options for this dataset

  • **kwargs – additional arguments that will be used to create the STAC collection, Dataset description, alias, etc.
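
Examples

A minimal sketch connecting to a remote provider built with the Boson Python SDK; the URL and credential name are hypothetical:

>>> ds = Dataset.from_remote_provider(
...          name="my-remote-dataset",
...          url="https://my-provider.example.com",
...          data_api="features",
...          credential="my-api-creds"
... )
>>> ds.save()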
