Dataset

class geodesic.entanglement.dataset.Dataset(**obj)[source]

Bases: geodesic.entanglement.object.Object

Allows interaction with SeerAI datasets.

Dataset provides a way to interact with datasets in SeerAI.

Parameters

**obj (dict) – Dictionary with all properties in the dataset.

Variables
  • alias (str) – Alternative name for the dataset. This name has fewer restrictions on characters and should be human readable.

item

(dict) - the contents of the dataset definition

Descriptor: _DictDescr

alias

(str) - the alias of this object, anything you wish it to be

Descriptor: _StringDescr

data_api

(str) - the api to access the data

Descriptor: _StringDescr

item_type

(str) - the type of items this dataset contains (for example, “raster” or “features”)

Descriptor: _StringDescr

item_assets

(dict, Asset) - information about assets contained in this dataset

Descriptor: _AssetsDescr

extent

(Extent, dict) - spatiotemporal extent of this Dataset

Descriptor: _TypeConstrainedDescr

services

(list of str) - list of services that expose the data for this dataset

Descriptor: _ListDescr

providers

list of providers for this dataset

Descriptor: _ListDescr

stac_extensions

list of STAC extensions this dataset uses

Descriptor: _ListDescr

links

list of links

Descriptor: _ListDescr

metadata

(dict) - arbitrary metadata for this dataset

Descriptor: _DictDescr

boson_config

(BosonConfig, dict) - boson configuration for this dataset

Descriptor: BosonDescr

version

(str) - the version string for this dataset

Descriptor: _StringDescr

create()[source]

Creates a new Dataset in Entanglement

Returns

self

Raises

requests.HTTPError – If this failed to create or if the dataset already exists

save()[source]

Updates an existing Dataset in Entanglement.

Returns

self

Raises

requests.HTTPError – If this failed to save.

search(bbox=None, datetime=None, limit=10, page_size=500, intersects=None, collections=None, ids=None, filter=None, fields=None, sortby=None, method='POST', extra_params={})[source]

Search the dataset for items.

Search this service’s OGC Features or STAC API.

Parameters
  • bbox (Optional[List]) – The spatial extent for the query as a bounding box. Example: [-180, -90, 180, 90]

  • datetime (Optional[Union[List, Tuple]]) – The temporal extent for the query formatted as a list: [start, end].

  • limit (Optional[Union[bool, int]]) – The maximum number of items to return in the query. If None, will page through all results

  • page_size (Optional[int]) – If retrieving all items, this page size will be used for the subsequent requests

  • intersects (Optional[object]) – a geometry to use in the query

  • collections (Optional[List[str]]) – a list of collections to search

  • ids (Optional[List[str]]) – a list of feature/item IDs to filter to

  • filter (Optional[Union[geodesic.cql.CQLFilter, dict]]) – a CQL2 filter. This is supported by most datasets but will not work for others.

  • fields (Optional[dict]) – a list of fields to include/exclude. Included fields should be prefixed by ‘+’ and excluded fields by ‘-’. Alternatively, a dict with ‘include’/‘exclude’ lists may be provided

  • sortby (Optional[dict]) – a list of sortby objects, which are dicts containing “field” and “direction”. Direction may be one of “asc” or “desc”. Not supported by all datasets

  • method (str) – the HTTP method. POST is the default and should usually be left alone unless a server doesn’t support it.

  • extra_params (Optional[dict]) – a dict of additional parameters that will be passed along on the request.
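The fields and sortby shapes described in the parameters above can be sketched in plain Python. This is only an illustration of the documented argument formats: build_fields and build_sortby are hypothetical helpers that are not part of geodesic, and the property names are made up.

```python
def build_fields(include=(), exclude=()):
    """Return both documented forms of the `fields` argument."""
    # List form: '+' marks included fields, '-' marks excluded fields.
    as_list = [f"+{name}" for name in include] + [f"-{name}" for name in exclude]
    # Dict form: explicit 'include' and 'exclude' lists.
    as_dict = {"include": list(include), "exclude": list(exclude)}
    return as_list, as_dict


def build_sortby(field, direction="asc"):
    """Return a list with one sortby object: a dict with 'field' and 'direction'."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    return [{"field": field, "direction": direction}]


# Hypothetical field names, for illustration only.
fields_list, fields_dict = build_fields(
    include=["properties.datetime"], exclude=["assets"]
)
sortby = build_sortby("properties.datetime", "desc")
```

Either form of fields, together with the sortby list, would then be passed as keyword arguments to search.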

Returns

A geodesic.stac.FeatureCollection with all items in the dataset matching the query.

Examples

A query on the sentinel-2-l2a dataset with a given bounding box and time range. Additionally, you can apply filters on the parameters in the items

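>>> import datetime
>>> from geodesic.cql import CQLFilter
>>> # geom is an existing geometry with a .bounds attribute; ds is a Dataset
>>> bbox = geom.bounds
>>> date_range = (datetime.datetime(2020, 12, 1), datetime.datetime.now())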
>>> ds.search(
...          bbox=bbox,
...          datetime=date_range,
...          filter=CQLFilter.lte("properties.eo:cloud_cover", 10.0)
... )

query(bbox=None, datetime=None, limit=10, page_size=500, intersects=None, collections=None, ids=None, filter=None, fields=None, sortby=None, method='POST', extra_params={})

Deprecated in 1.0.0

Search the dataset for items.

Search this service’s OGC Features or STAC API.

Args:

  • bbox – The spatial extent for the query as a bounding box. Example: [-180, -90, 180, 90]

  • datetime – The temporal extent for the query formatted as a list: [start, end].

  • limit – The maximum number of items to return in the query. If None, will page through all results

  • page_size – If retrieving all items, this page size will be used for the subsequent requests

  • intersects – a geometry to use in the query

  • collections – a list of collections to search

  • ids – a list of feature/item IDs to filter to

  • filter – a CQL2 filter. This is supported by most datasets but will not work for others.

  • fields – a list of fields to include/exclude. Included fields should be prefixed by ‘+’ and excluded fields by ‘-’. Alternatively, a dict with ‘include’/‘exclude’ lists may be provided

  • sortby – a list of sortby objects, which are dicts containing “field” and “direction”. Direction may be one of “asc” or “desc”. Not supported by all datasets

  • method – the HTTP method. POST is the default and should usually be left alone unless a server doesn’t support it.

  • extra_params – a dict of additional parameters that will be passed along on the request.

Returns:

A geodesic.stac.FeatureCollection with all items in the dataset matching the query.

Examples:

A query on the sentinel-2-l2a dataset with a given bounding box and time range. Additionally, you can apply filters on the parameters in the items

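>>> import datetime
>>> from geodesic.cql import CQLFilter
>>> # geom is an existing geometry with a .bounds attribute; ds is a Dataset
>>> bbox = geom.bounds
>>> date_range = (datetime.datetime(2020, 12, 1), datetime.datetime.now())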
>>> ds.search(
...          bbox=bbox,
...          datetime=date_range,
...          filter=CQLFilter.lte("properties.eo:cloud_cover", 10.0)
... )

get_pixels(*, bbox, datetime=None, pixel_size=None, shape=None, pixel_dtype=<class 'numpy.float32'>, bbox_crs='EPSG:4326', output_crs='EPSG:3857', resampling='nearest', no_data=None, content_type='raw', asset_bands=[], filter={}, compress=True, bbox_srs=None, output_srs=None, input_nodata=None, output_nodata=None)[source]

get pixel data or an image from this Dataset

get_pixels gets requested pixels from a dataset by calling Boson. This method returns either a numpy array or the bytes of an image file (jpg, png, gif, or tiff). If the content_type is “raw”, this will return a numpy array; otherwise it will return the requested image format as bytes that can be written to a file. Where possible, a COG will be returned for the tiff format, but this is not guaranteed.

Parameters
  • bbox (list) – a bounding box to export as imagery (xmin, ymin, xmax, ymax)

  • datetime (Optional[Union[List, Tuple]]) – a start and end datetime to query against. Imagery will be filtered to between this range and mosaiced.

  • pixel_size (Optional[list]) – a list of the x/y pixel size of the output imagery. This list needs to have length equal to the number of bands. This should be specified in the output spatial reference.

  • shape (Optional[list]) – the shape of the output image (rows, cols). Either this or the pixel_size must be specified, but not both.

  • pixel_dtype (Union[numpy.dtype, str]) – a numpy datatype or string descriptor in numpy format (e.g. <f4) of the output. Most, but not all basic dtypes are supported.

  • bbox_crs (str) – the spatial reference of the bounding bbox, as a string. May be EPSG:<code>, WKT, Proj4, ProjJSON, etc.

  • output_crs (str) – the spatial reference of the output pixels.

  • resampling (str) – a string to select the resampling method.

  • no_data (Optional[Any]) – in the source imagery, what value should be treated as no data?

  • content_type (str) – the image format. Default is “raw” which sends raw image bytes that will be converted into a numpy array. If “jpg”, “gif”, or “tiff”, returns the bytes of an image file instead, which can directly be written to disk.

  • asset_bands (List[geodesic.boson.asset_bands.AssetBands]) – a list containing dictionaries with the keys “asset” and “bands”. Asset should point to an asset in the dataset, and “bands” should list band indices (0-indexed) or band names.

  • filter (dict) – a CQL2 JSON filter to filter images that will be used for the resulting output.

  • compress (bool) – compress bytes when transferring. This will usually, but not always, improve performance

  • input_nodata (deprecated) – in the source imagery, what value should be treated as no data?

  • output_nodata (deprecated) – what value should be set as the nodata value in the resulting dataset. Only meaningful for tiff outputs.

  • bbox_srs (deprecated) – the spatial reference of the bounding bbox, as a string. May be EPSG:<code>, WKT, Proj4, ProjJSON, etc.

  • output_srs (deprecated) – the spatial reference of the output pixels.

Returns

a numpy array or bytes of an image file.
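The rule that exactly one of pixel_size and shape may be given, and the documented asset_bands layout, can be sketched as follows. The helpers grid_kwargs and approx_shape are hypothetical and not part of geodesic. The shape estimate assumes the bounding box and the pixel size are expressed in the same units, which is not the case for the default bbox_crs and output_crs, and the asset name "red" is made up.

```python
import math


def grid_kwargs(bbox, pixel_size=None, shape=None):
    """Build grid-related keyword arguments, enforcing 'one or the other'."""
    if (pixel_size is None) == (shape is None):
        raise ValueError("specify exactly one of pixel_size or shape")
    kwargs = {"bbox": list(bbox)}
    if pixel_size is not None:
        kwargs["pixel_size"] = list(pixel_size)
    else:
        kwargs["shape"] = list(shape)
    return kwargs


def approx_shape(bbox, pixel_size):
    """Estimate (rows, cols); assumes bbox and pixel_size share the same units."""
    xmin, ymin, xmax, ymax = bbox
    x_size, y_size = pixel_size
    rows = math.ceil((ymax - ymin) / y_size)
    cols = math.ceil((xmax - xmin) / x_size)
    return rows, cols


# Documented layout: dicts with "asset" and "bands" keys (0-indexed band indices).
asset_bands = [{"asset": "red", "bands": [0]}]  # hypothetical asset name

kwargs = grid_kwargs((0, 0, 1000, 500), pixel_size=(10, 10))
kwargs["asset_bands"] = asset_bands
```

A dictionary built this way could be unpacked into a get_pixels call, since all of its parameters are keyword-only.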

warp(*, bbox, datetime=None, pixel_size=None, shape=None, pixel_dtype=<class 'numpy.float32'>, bbox_crs='EPSG:4326', output_crs='EPSG:3857', resampling='nearest', no_data=None, content_type='raw', asset_bands=[], filter={}, compress=True, bbox_srs=None, output_srs=None, input_nodata=None, output_nodata=None)

Deprecated in 1.0.0

get pixel data or an image from this Dataset

get_pixels gets requested pixels from a dataset by calling Boson. This method returns either a numpy array or the bytes of an image file (jpg, png, gif, or tiff). If the content_type is “raw”, this will return a numpy array; otherwise it will return the requested image format as bytes that can be written to a file. Where possible, a COG will be returned for the tiff format, but this is not guaranteed.

Args:

  • bbox – a bounding box to export as imagery (xmin, ymin, xmax, ymax)

  • datetime – a start and end datetime to query against. Imagery will be filtered to between this range and mosaiced.

  • pixel_size – a list of the x/y pixel size of the output imagery. This list needs to have length equal to the number of bands. This should be specified in the output spatial reference.

  • shape – the shape of the output image (rows, cols). Either this or the pixel_size must be specified, but not both.

  • pixel_dtype – a numpy datatype or string descriptor in numpy format (e.g. <f4) of the output. Most, but not all basic dtypes are supported.

  • bbox_crs – the spatial reference of the bounding bbox, as a string. May be EPSG:<code>, WKT, Proj4, ProjJSON, etc.

  • output_crs – the spatial reference of the output pixels.

  • resampling – a string to select the resampling method.

  • no_data – in the source imagery, what value should be treated as no data?

  • content_type – the image format. Default is “raw” which sends raw image bytes that will be converted into a numpy array. If “jpg”, “gif”, or “tiff”, returns the bytes of an image file instead, which can directly be written to disk.

  • asset_bands – a list containing dictionaries with the keys “asset” and “bands”. Asset should point to an asset in the dataset, and “bands” should list band indices (0-indexed) or band names.

  • filter – a CQL2 JSON filter to filter images that will be used for the resulting output.

  • compress – compress bytes when transferring. This will usually, but not always, improve performance

  • input_nodata (deprecated) – in the source imagery, what value should be treated as no data?

  • output_nodata (deprecated) – what value should be set as the nodata value in the resulting dataset. Only meaningful for tiff outputs.

  • bbox_srs (deprecated) – the spatial reference of the bounding bbox, as a string. May be EPSG:<code>, WKT, Proj4, ProjJSON, etc.

  • output_srs (deprecated) – the spatial reference of the output pixels.

Returns:

a numpy array or bytes of an image file.

dataset_info()[source]

returns information about this Dataset

view(name, bbox=None, intersects=None, datetime=None, collections=None, ids=None, filter=None, asset_bands=[], feature_limit=None, middleware={}, cache={}, tile_options={}, domain=None, category=None, type=None, project=None, **kwargs)[source]

creates a curated view of a Dataset

This method creates a new Dataset that is a “view” of an existing dataset. This allows the user to provide a set of persistent filters to a Dataset as a separate Object. A view may also be saved in a different Project than the original. The applied filters affect both queries and get_pixels requests. The final request processed will be the intersection of the view parameters with the query.

Parameters
  • name (str) – name of the view Dataset

  • bbox (Optional[Union[List, Tuple]]) – The spatial extent for the query as a bounding box. Example: [-180, -90, 180, 90]

  • intersects (Optional[object]) – a geometry to use in the query

  • datetime (Optional[Union[List, Tuple]]) – The temporal extent for the query formatted as a list: [start, end].

  • collections (Optional[List[str]]) – a list of collections to search

  • ids (Optional[List[str]]) – a list of feature/item IDs to filter to

  • filter (Optional[Union[geodesic.cql.CQLFilter, dict]]) – a CQL2 filter. This is supported by most datasets but will not work for others.

  • asset_bands (list) – a list of asset/bands combinations to filter this Dataset to

  • feature_limit (Optional[int]) – if specified, overrides the max_page_size of this Dataset

  • middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.

  • cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset

  • tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset

  • domain (Optional[str]) – domain of the resulting Object

  • category (Optional[str]) – category of the resulting Object

  • type (Optional[str]) – the type of the resulting Object

  • project (Optional[str]) – a new project to save this view to. If None, inherits from the parent Dataset
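To illustrate what "the intersection of the view parameters with the query" means for the spatial and temporal filters, here is a minimal sketch in plain Python. It is conceptual only: how Boson actually combines the filters is not described on this page, and intersect_bbox and intersect_range are hypothetical helpers.

```python
import datetime


def intersect_bbox(a, b):
    """Intersect two (xmin, ymin, xmax, ymax) boxes; None if they do not overlap."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return [xmin, ymin, xmax, ymax]


def intersect_range(a, b):
    """Intersect two (start, end) datetime ranges; None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


# The view's persistent filters versus one incoming query.
view_bbox = [-10, -10, 10, 10]
query_bbox = [0, 0, 20, 20]
effective_bbox = intersect_bbox(view_bbox, query_bbox)

view_range = (datetime.datetime(2020, 1, 1), datetime.datetime(2021, 1, 1))
query_range = (datetime.datetime(2020, 6, 1), datetime.datetime(2022, 1, 1))
effective_range = intersect_range(view_range, query_range)
```

Under this reading, a query that lies entirely outside the view's extent would match nothing.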

union(name, others=[], feature_limit=1000, project=None, ignore_duplicate_fields=False, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

creates a union of this dataset with a list of others

Creates a new Dataset that is the union of this Dataset with a list of others. If others is an empty list, this creates a union of a dataset with itself, which is essentially a virtual copy of the original endowed with any capabilities Boson adds.

See: geodesic.entanglement.dataset.new_union_dataset()

join(name, key, right_dataset, right_key, drop_duplicates=False, drop_fields=[], right_drop_fields=[], suffix='_left', right_suffix='_right', use_geometry='left', feature_limit=1000, project=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

creates a left join of this dataset with another dataset

See: geodesic.entanglement.dataset.new_join_dataset()

Parameters
  • name (str) – the name of the new Dataset

  • key (str) – the name of the field in this dataset to join on. This key must exist for there to be output. An error will be thrown if the key does not exist for 50% of the features in a query.

  • right_dataset (geodesic.entanglement.dataset.Dataset) – the dataset to join with

  • right_key (str) – the name of the field in the right dataset to join on.

  • drop_fields (List[str]) – a list of fields to drop from this dataset

  • right_drop_fields (List[str]) – a list of fields to drop from the right dataset

  • suffix (str) – the suffix to append to fields from this dataset

  • right_suffix (str) – the suffix to append to fields from the right dataset

  • use_geometry (str) – which geometry to use in the join. “left” will use the left dataset’s geometry, “right” will use the right dataset’s geometry

  • drop_duplicates (bool) – if True, duplicate fields across providers will be ignored

  • feature_limit (int) – the max size of a results page from a query/search

  • project (Optional[Union[geodesic.account.projects.Project, str]]) – the name of the project this will be assigned to

  • ignore_duplicate_fields – if True, duplicate fields across providers will be ignored

  • middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.

  • cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset

  • tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset
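The left-join semantics and the role of the two suffixes can be illustrated on plain lists of dicts. This sketch is conceptual and is not how Boson implements the join. In particular, it assumes that suffixes are applied only to field names that occur on both sides, which this page does not confirm. left_join is a hypothetical helper.

```python
def left_join(left, right, key, right_key, suffix="_left", right_suffix="_right"):
    """Left join two lists of dicts, keeping every left row."""
    # Index the right rows by their join key.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)

    joined = []
    for left_row in left:
        # A left row without a match is still emitted (left join).
        for right_row in index.get(left_row[key], [None]):
            right_row = right_row or {}
            merged = {}
            for name, value in left_row.items():
                # Assumption: only colliding names receive a suffix.
                merged[name + suffix if name in right_row else name] = value
            for name, value in right_row.items():
                merged[name + right_suffix if name in left_row else name] = value
            joined.append(merged)
    return joined


left_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
right_rows = [{"fid": 1, "name": "x"}]
result = left_join(left_rows, right_rows, key="id", right_key="fid")
```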

share(servicer, ttl=None)[source]

Shares a dataset, producing a token that will allow unauthenticated users to run a proxied boson request

Parameters
  • servicer (str) – The name of the servicer to use in the boson request.

  • ttl (Optional[Union[datetime.timedelta, int, float]]) – The time until the dataset’s token should expire, given either as a timedelta object or as a number of seconds. Defaults to -1 (no expiration) if not provided.

Raises

requests.HTTPError – If the user is not permitted to access the dataset or if an error occurred

Returns

a share token created by Ted and its corresponding data
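The accepted ttl forms can be reduced to a number of seconds as in the sketch below. ttl_to_seconds is a hypothetical helper written for illustration; it mirrors the documented behavior (timedelta or seconds, -1 when omitted) but is not the library's own code.

```python
import datetime


def ttl_to_seconds(ttl=None):
    """Normalize a ttl given as None, a timedelta, or a number of seconds."""
    if ttl is None:
        return -1  # documented default: no expiration
    if isinstance(ttl, datetime.timedelta):
        return ttl.total_seconds()
    return float(ttl)


one_hour = ttl_to_seconds(datetime.timedelta(hours=1))
ninety_seconds = ttl_to_seconds(90)
never_expires = ttl_to_seconds()
```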

share_as_arcgis_service(ttl=None)[source]

Share a dataset as a GeoServices/ArcGIS service

Parameters

ttl (Optional[Union[datetime.timedelta, int, float]]) – The time until the dataset’s token should expire, given either as a timedelta object or as a number of seconds. Defaults to -1 (no expiration) if not provided.

Raises

requests.HTTPError – If the user is not permitted to access the dataset or if an error occurred

Returns

a share token created by Ted and its corresponding data

share_as_ogc_tiles_service(ttl=None)[source]

Share a dataset as an OGC Tiles service

Parameters

ttl (Optional[Union[datetime.timedelta, int, float]]) – The time until the dataset’s token should expire, given either as a timedelta object or as a number of seconds. Defaults to -1 (no expiration) if not provided.

Raises

requests.HTTPError – If the user is not permitted to access the dataset or if an error occurred

Returns

a share token created by Ted and its corresponding data

command(command, **kwargs)[source]

issue a command to this dataset’s provider

Commands can be used to perform operations on a dataset such as reindexing. Most commands run in the background and will return immediately. If a command is successfully submitted, this should return a message {“success”: True}, otherwise it will raise an exception with the error message.

Parameters
  • command (str) – the name of the command to issue. Providers supporting “reindex” will accept this command.

  • **kwargs – additional arguments passed to this command.

reindex(timeout=None)[source]

issue a reindex command to this dataset’s provider

Reindexes a dataset. This will reindex the dataset in the background and will return immediately. If reindexing is kicked off successfully, this will return a message {“success”: True}; otherwise it will raise an exception with the error message.

Parameters

timeout (Optional[Union[datetime.timedelta, str]]) – the maximum time to wait for the reindexing to complete. If None, will use the default timeout of 30 minutes.

static from_snowflake_table(name, account, database, table, credential, schema='public', warehouse=None, id_column=None, geometry_column=None, datetime_column=None, feature_limit=8000, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

create a Dataset from a Snowflake table.

This method creates a new Dataset from an existing Snowflake table.

Parameters
  • name (str) – name of the Dataset

  • account (str) – Snowflake account string, formatted as <orgname>-<account_name>. Ref url: https://docs.snowflake.com/en/user-guide/admin-account-identifier#using-an-account-name-as-an-identifier

  • database (str) – Snowflake database that contains the table

  • table (str) – name of the Snowflake table

  • credential (str) – name of a credential to access table. Either basic auth or oauth2 refresh token are supported

  • schema (str) – Snowflake schema the table resides in

  • warehouse (Optional[str]) – name of the Snowflake warehouse to use

  • id_column (Optional[str]) – name of the column containing a unique identifier. Integer IDs preferred, but not required

  • geometry_column (Optional[str]) – name of the column containing the primary geometry for spatial filtering.

  • datetime_column (Optional[str]) – name of the column containing the primary datetime field for temporal filtering.

  • feature_limit (int) – max number of results to return in a single page from a search

  • middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.

  • cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset

  • tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset
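As a small illustration of the <orgname>-<account_name> account format, the sketch below assembles keyword arguments using the parameter names documented above. snowflake_account is a hypothetical helper, and every value shown (organization, account, database, table, and credential names) is a placeholder.

```python
def snowflake_account(orgname, account_name):
    """Build an account string in the documented <orgname>-<account_name> form."""
    if not orgname or not account_name:
        raise ValueError("both orgname and account_name are required")
    return f"{orgname}-{account_name}"


# Placeholder values; the keys are the documented parameter names.
table_kwargs = {
    "name": "my-snowflake-dataset",
    "account": snowflake_account("myorg", "myaccount"),
    "database": "MY_DB",
    "table": "MY_TABLE",
    "credential": "my-snowflake-creds",
    "schema": "public",
}
```

Such a dictionary could be unpacked into Dataset.from_snowflake_table.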

static from_arcgis_item(name, item_id, arcgis_instance='https://www.arcgis.com/', credential=None, layer_id=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', gis=None, **kwargs)[source]

creates a new Dataset from an ArcGIS Online/Enterprise item

Parameters
  • name (str) – name of the Dataset to create

  • item_id (str) – the item ID of the ArcGIS Item Referenced

  • arcgis_instance (str) – the base url of the ArcGIS Online or Enterprise root. Defaults to AGOL, MUST be specified for ArcGIS Enterprise instances

  • credential – the name or uid of a credential required to access this. Currently, this must be the client credentials of an ArcGIS OAuth2 Application. Public layers do not require credentials.

  • layer_id (Optional[int]) – an integer layer ID to subset a service’s set of layers.

  • middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.

  • cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset

  • tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset

  • gis – the logged in arcgis.gis.GIS to use to access the metadata for this item. To access secure content, if this is not specified, the active GIS is used.

Returns

a new Dataset.

Examples

>>> ds = Dataset.from_arcgis_item(
...          name="my-dataset",
...          item_id="abc123efghj34234kxlk234joi",
...          credential="my-arcgis-creds"
... )
>>> ds.save()

static from_arcgis_layer(name, url, arcgis_instance='https://www.arcgis.com', credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', gis=None, **kwargs)[source]

creates a new Dataset from an ArcGIS Online/Enterprise Service URL

Parameters
  • name (str) – name of the Dataset to create

  • url (str) – the URL of the Feature, Image, or Map Server. This is the layer url, not the Service url. Only the specified layer will be available to the dataset

  • arcgis_instance (str) – the base url of the ArcGIS Online or Enterprise root. Defaults to AGOL, MUST be specified for ArcGIS Enterprise instances

  • credential – the name or uid of a credential required to access this. Currently, this must be the client credentials of an ArcGIS OAuth2 Application. Public layers do not require credentials.

  • middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.

  • cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset

  • tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset

  • gis – the logged in arcgis.gis.GIS to use to access the metadata for this item. To access secure content, if this is not specified, the active GIS is used.
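The difference between a layer URL and a service URL is the trailing integer layer ID. The sketch below separates the two with plain string handling. split_layer_url is a hypothetical helper, not part of geodesic, and it assumes the layer ID is always the last path segment.

```python
def split_layer_url(url):
    """Split an ArcGIS layer URL into (service_url, layer_id)."""
    trimmed = url.rstrip("/")
    base, _, last = trimmed.rpartition("/")
    if last.isdigit():
        return base, int(last)
    # No trailing integer: treat the whole URL as a service URL.
    return trimmed, None


layer_url = (
    "https://services9.arcgis.com/ABC/arcgis/rest/services/SomeLayer/FeatureServer/0"
)
service_url, layer_id = split_layer_url(layer_url)
```

In this reading, the full URL is what from_arcgis_layer expects, while the service URL (optionally with layer_id) corresponds to from_arcgis_service.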

Returns

a new Dataset.

Examples

>>> ds = Dataset.from_arcgis_layer(
...          name="my-dataset",
...          url="https://services9.arcgis.com/ABC/arcgis/rest/services/SomeLayer/FeatureServer/0",
...          credential="my-arcgis-creds"
... )
>>> ds.save()

static from_arcgis_service(name, url, arcgis_instance='https://www.arcgis.com', credential=None, layer_id=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', gis=None, **kwargs)[source]

creates a new Dataset from an ArcGIS Online/Enterprise Service URL

Parameters
  • name (str) – name of the Dataset to create

  • url (str) – the URL of the Feature, Image, or Map Server. This is not the layer url, but the Service url. Layers will be enumerated and all accessible from this dataset.

  • arcgis_instance (str) – the base url of the ArcGIS Online or Enterprise root. Defaults to AGOL, MUST be specified for ArcGIS Enterprise instances

  • credential – the name or uid of a credential required to access this. Currently, this must be the client credentials of an ArcGIS OAuth2 Application. Public layers do not require credentials.

  • layer_id (Optional[int]) – an integer layer ID to subset a service’s set of layers.

  • middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.

  • cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset

  • tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset

  • gis – the logged in arcgis.gis.GIS to use to access the metadata for this item. To access secure content, if this is not specified, the active GIS is used.

Returns

a new Dataset.

Examples

>>> ds = Dataset.from_arcgis_service(
...          name="my-dataset",
...          url="https://services9.arcgis.com/ABC/arcgis/rest/services/SomeLayer/FeatureServer",
...          credential="my-arcgis-creds"
... )
>>> ds.save()

static from_stac_collection(name, url, credential=None, item_type='raster', middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

Create a new Dataset from a STAC Collection

Parameters
  • name (str) – name of the Dataset to create

  • url (str) – the url to the collection (either STAC API or OGC API: Features)

  • credential – name or uid of the credential to access the API

  • item_type (str) – the type of items this contains: “raster” for raster data, “features” for features. Other types, such as point_cloud, may be specified but do not alter current internal functionality.

  • middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.

  • cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset

  • tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset

Returns

a new Dataset.

Examples

>>> ds = Dataset.from_stac_collection(
...          name="landsat-c2l2alb-sr-usgs",
...          url="https://landsatlook.usgs.gov/stac-server/collections/landsat-c2l2alb-sr"
...)
>>> ds.save()

static from_bucket(name, url, pattern=None, region=None, datetime_field=None, start_datetime_field=None, end_datetime_field=None, oriented=False, credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

Creates a new Dataset from a Cloud Storage Bucket (S3/GCP/Azure)

Parameters
  • name (str) – name of the Dataset to create

  • url (str) – the url to the bucket, including the prefix (ex. s3://my-bucket/myprefix, gs://my-bucket/myprefix, …)

  • pattern (Optional[str]) – a regex to filter for files to index

  • region (Optional[str]) – for S3 buckets, the region where the bucket is

  • datetime_field (Optional[str]) – the name of the metadata key on the file to find a timestamp

  • start_datetime_field (Optional[str]) – the name of the metadata key on the file to find a start timestamp

  • end_datetime_field (Optional[str]) – the name of the metadata key on the file to find an end timestamp

  • oriented (bool) – Is this oriented imagery? If so, EXIF data will be parsed for geolocation. Anything missing location info will be dropped.

  • credential (Optional[str]) – the name or uid of the credential to access the bucket.

  • middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.

  • kwargs – other metadata that will be set on the Dataset, such as description, alias, etc
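Before indexing a bucket, it can help to check a candidate pattern against a few sample object keys. The sketch below uses Python's re module; whether the service applies the regex as a full match, a prefix match, or a search is not stated on this page, so the pattern is anchored explicitly with $. The keys are made up.

```python
import re

# Anchored so that ".tiff" or ".tif.aux.xml" keys are not picked up by accident.
pattern = re.compile(r".*\.tif$")

# Hypothetical object keys under the bucket prefix.
sample_keys = [
    "myprefix/scene_001.tif",
    "myprefix/scene_002.tiff",
    "myprefix/scene_001.tif.aux.xml",
    "myprefix/readme.txt",
]

matched = [key for key in sample_keys if pattern.match(key)]
```

Only the first key survives this filter; loosening the pattern (for example `r".*\.tiff?$"`) would also accept the second.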

Returns

a new Dataset.

Examples

>>> ds = Dataset.from_bucket(
...          name="bucket-dataset",
...          url="s3://my-bucket/myprefix",
...          pattern=r".*\.tif",
...          region="us-west-2",
...          datetime_field="TIFFTAG_DATETIME",
...          oriented=False,
...          credential="my-iam-user",
...          description="my dataset is the bomb"
...)
>>> ds.save()

static from_google_earth_engine(name, asset, credential, folder='projects/earthengine-public/assets', url='https://earthengine-highvolume.googleapis.com', middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

Creates a new Dataset from a Google Earth Engine Asset

Parameters
  • name (str) – name of the Dataset to create

  • asset (str) – the asset in GEE to use (ex. ‘LANDSAT/LC09/C02/T1_L2’)

  • credential (str) – the credential to access this, a Google Earth Engine GCP Service Account. Future versions will allow the use of an oauth2 refresh token or other credential types.

  • folder (str) – by default this is the Earth Engine public folder, but you can specify another folder if needed to point to legacy data or personal projects.

  • url (str) – the GEE url to use, defaults to the recommended high volume endpoint.

  • kwargs – other metadata that will be set on the Dataset, such as description, alias, etc

  • middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.

  • cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset

  • tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset

Returns

a new Dataset.

Examples

>>> ds = Dataset.from_google_earth_engine(
...          name="landsat-9-c2-gee",
...          asset="LANDSAT/LC09/C02/T1_L2",
...          credential="google-earth-engine-svc-account",
...          description="my dataset is the bomb"
...)
>>> ds.save()

static from_elasticsearch_index(name, url, index_pattern, credential=None, storage_credential=None, datetime_field='properties.datetime', geometry_field='geometry', geometry_type='geo_shape', id_field='_id', data_api='features', item_type='other', feature_limit=2000, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

create a new Dataset from an elasticsearch index containing geojson features or STAC items

Parameters
  • name (str) – name of the Dataset to create

  • url (str) – the DNS name or IP of the elasticsearch host to connect to.

  • index_pattern (str) – an elasticsearch index name or index pattern

  • credential (Optional[str]) – name of the Credential object to use. Currently, this only supports basic auth (username/password).

  • storage_credential (Optional[str]) – the name of the Credential object to use for storage if any of the data referenced in the index requires a credential to access (e.g. cloud storage for STAC)

  • datetime_field (str) – the field that is used to search by datetime in the elasticsearch index.

  • geometry_field (str) – the name of the field that contains the geometry

  • geometry_type (str) – the type of the geometry field, either geo_shape or geo_point

  • id_field (str) – the name of the field to use as an ID field

  • data_api (str) – the data API, either ‘stac’ or ‘features’

  • item_type (str) – the type of item. If it’s a stac data_api, then it should describe what the data is

  • feature_limit (int) – the max number of features the service will return per page.

  • middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.

  • **kwargs – other arguments that will be used to create the collection and provider config.

Returns

A new Dataset. Must call .save() for it to be usable.
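
As a rough illustration of how these parameters fit together, the sketch below collects them in a dictionary and defers the actual call to a function. The host, index pattern, Dataset name, and credential name are hypothetical placeholders, and the helper create_es_dataset is not part of the library. Running it assumes geodesic is installed and authenticated.

```python
# Hypothetical values: host, index pattern, and credential name are placeholders.
es_args = {
    "name": "my-es-features",
    "url": "elasticsearch.example.com",      # DNS name or IP of the host
    "index_pattern": "features-*",           # index name or pattern
    "credential": "my-es-basic-auth",        # name of a basic-auth Credential object
    "datetime_field": "properties.datetime",
    "geometry_field": "geometry",
    "geometry_type": "geo_shape",            # or "geo_point"
    "data_api": "features",                  # or "stac"
    "feature_limit": 2000,
}


def create_es_dataset(args):
    """Create and save the Dataset (needs geodesic and a valid login)."""
    from geodesic.entanglement.dataset import Dataset

    ds = Dataset.from_elasticsearch_index(**args)
    ds.save()  # the Dataset is not usable until it has been saved
    return ds
```

Calling create_es_dataset(es_args) would then create the Dataset and save it in one step.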

static from_csv(name, url, index_data=True, crs='EPSG:4326', x_field='CoordX', y_field='CoordY', z_field='CoordZ', geom_field='WKT', datetime_field=None, feature_limit=1000, region=None, credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

create a new Dataset from a CSV file in cloud storage

Parameters
  • name (str) – name of the Dataset to create

  • url (str) – the URL/URI of the data. Can be a cloud storage URI such as s3://<bucket>/key, gs://

  • index_data (bool) – if true, the data will be copied and spatially indexed for more efficient queries

  • crs (str) – a string coordinate reference for the data

  • x_field, y_field, z_field (str) – the names of the fields containing the x, y, and z coordinates

  • geom_field (str) – the field name containing the geometry in well known text (WKT) or hex encoded well known binary (WKB).

  • feature_limit (int) – the max number of features this will return per page

  • datetime_field (Optional[str]) – if the data is time enabled, this is the name of the datetime field. The datetime must be RFC3339 formatted.

  • region (Optional[str]) – for S3 buckets, the region where the bucket is

  • credential (Optional[str]) – the name of the credential object needed to access this data.

  • middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.
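
To make the expected file layout concrete, the sketch below writes a small CSV whose geometry column is named WKT (the default geom_field) and whose timestamp column is RFC3339 formatted. The column names id and observed, the coordinates, and the Dataset name are illustrative assumptions. The helper create_csv_dataset is hypothetical; it assumes geodesic is installed and that, as with the other constructors, .save() must be called afterwards.

```python
import csv
import io
from datetime import datetime, timezone

# Two illustrative rows: a WKT geometry plus an RFC3339 timestamp.
rows = [
    {"id": 1, "WKT": "POINT (-105.27 40.01)",
     "observed": datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc).isoformat()},
    {"id": 2, "WKT": "POINT (-104.99 39.74)",
     "observed": datetime(2023, 1, 2, 12, 0, tzinfo=timezone.utc).isoformat()},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "WKT", "observed"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()  # upload this content to cloud storage first


def create_csv_dataset(url):
    """Create a Dataset from the uploaded CSV (needs geodesic and a login)."""
    from geodesic.entanglement.dataset import Dataset

    # geom_field defaults to "WKT", matching the column written above.
    ds = Dataset.from_csv(
        name="my-csv-points",  # hypothetical Dataset name
        url=url,
        datetime_field="observed",
    )
    ds.save()  # assumed to be required, as for the other constructors
    return ds
```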

static from_tabular_data(name, url, index_data=True, crs='EPSG:4326', feature_limit=1000, datetime_field=None, region=None, credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

create a new Dataset from a vector file in cloud storage.

The file can be a Shapefile, GeoJSON Feature Collection, FlatGeobuf, or one of several other formats.

Parameters
  • name (str) – name of the Dataset to create

  • url (str) – the URL/URI of the data. Can be a cloud storage URI such as s3://<bucket>/key, gs://

  • index_data (bool) – if true, the data will be copied and spatially indexed for more efficient queries

  • crs (str) – a string coordinate reference for the data

  • feature_limit (int) – the max number of features this will return per page

  • datetime_field (Optional[str]) – if the data is time enabled, this is the name of the datetime field. The datetime field must be RFC3339 formatted.

  • region (Optional[str]) – for S3 buckets, the region where the bucket is

  • credential (Optional[str]) – the name of the credential object needed to access this data.

  • middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.

  • cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset

  • tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset

static from_geoparquet(name, url, feature_limit=1000, datetime_field='datetime', return_geometry_properties=False, expose_partitions_as_layer=True, update_existing_index=True, credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

creates a dataset from Hive-partitioned GeoParquet files in cloud storage

Hive-partitioned GeoParquet is a convention typically used when writing data out from a parallel process (such as Tesseract or Apache Spark) or when individual file sizes or row counts are too large. This provider indexes these partitions spatially to optimize query performance. Hive-partitioned Parquet is organized as follows, and this structure is required:

prefix/<root>.parquet
      /key=value_1/<partition-00001>.parquet
      /key=value_2/<partition-00002>.parquet
      /…
      /key=value_m/<partition-n>.parquet

“root” and “partition-xxxxx” can be any names, provided both have the .parquet suffix. Any number of key/value pairs is allowed in Hive-partitioned data. The URL can also point to a single Parquet file.
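
The key=value convention can be illustrated with a few lines of standard-library Python. This is only a sketch of how partition values are encoded in object keys, not the provider's indexing logic. The keys year and month, the file names, and the helper partition_values are hypothetical.

```python
from pathlib import PurePosixPath


def partition_values(key):
    """Return the key=value pairs encoded in a Hive-partitioned object key."""
    pairs = {}
    for part in PurePosixPath(key).parts[:-1]:  # skip the file name itself
        if "=" in part:
            name, value = part.split("=", 1)
            pairs[name] = value
    return pairs


# Hypothetical object keys following the layout described above.
keys = [
    "prefix/root.parquet",
    "prefix/year=2023/month=01/partition-00001.parquet",
    "prefix/year=2023/month=02/partition-00002.parquet",
]

for key in keys:
    print(key, partition_values(key))
```

The root file carries no partition values, while each partition file inherits one value per key=value directory above it.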

Parameters
  • name (str) – name of the Dataset to create

  • url (str) – the path to the <root>.parquet. Format depends on the storage backend.

  • feature_limit (int) – the max number of features that this provider will allow returned by a single query.

  • datetime_field (str) – if the data is time enabled, this is the name of the datetime field. This is the name of a column in the parquet dataset that will be used for time filtering. Must be RFC3339 formatted in order to work.

  • return_geometry_properties (bool) – if True, will compute and return geometry properties along with the features.

  • expose_partitions_as_layer (bool) – this will create a collection/layer in this Dataset that simply has the partition bounding box and count of features within. Can be used as a simple heatmap

  • update_existing_index (bool) – if the data has been indexed in our scheme by a separate process, set to False to use that instead, otherwise this will index the parquet data in the bucket before you are able to query it.

  • credential (Optional[str]) – the name of the credential to access the data in cloud storage.

  • middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.

  • cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset

  • tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset

  • **kwargs – additional arguments that will be used to create the STAC collection, Dataset description, alias, etc.

static from_remote_provider(name, url, data_api='features', transport_protocol='http', additional_properties={}, feature_limit=2000, credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]

Creates a dataset from a server implementing the Boson remote provider interface

The Boson Remote Provider interface may be implemented using the Boson Python SDK (https://pypi.org/project/boson-sdk/). The provider must be hosted somewhere and this connects Boson to a remote provider.

Remote Providers may implement the Search endpoint, the Pixels endpoint, or both.

Parameters
  • name (str) – name of the Dataset to create

  • url (str) – URL of the server implementing the interface

  • data_api (str) – either ‘features’ or ‘raster’.

  • transport_protocol (str) – either ‘http’ or ‘grpc’

  • additional_properties (dict) – additional properties to set on the dataset

  • feature_limit (int) – the max number of features that this provider will allow returned in a single page.

  • credential (Optional[str]) – the name of the credential to access the api.

  • middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.

  • cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset

  • tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset

  • **kwargs – additional arguments that will be used to create the STAC collection, Dataset description, alias, etc.
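
Because data_api and transport_protocol each accept only two values, it can help to check them before calling the constructor. The sketch below does this client-side. The helpers remote_provider_args and create_remote_dataset are hypothetical and not part of the library, and the final call assumes geodesic is installed and that .save() is needed as with the other constructors.

```python
ALLOWED_DATA_APIS = {"features", "raster"}
ALLOWED_TRANSPORTS = {"http", "grpc"}


def remote_provider_args(name, url, data_api="features",
                         transport_protocol="http", feature_limit=2000):
    """Validate the documented options and return keyword arguments."""
    if data_api not in ALLOWED_DATA_APIS:
        raise ValueError(f"data_api must be one of {sorted(ALLOWED_DATA_APIS)}")
    if transport_protocol not in ALLOWED_TRANSPORTS:
        raise ValueError(
            f"transport_protocol must be one of {sorted(ALLOWED_TRANSPORTS)}"
        )
    return {
        "name": name,
        "url": url,
        "data_api": data_api,
        "transport_protocol": transport_protocol,
        "feature_limit": feature_limit,
    }


def create_remote_dataset(**kwargs):
    """Create and save the Dataset (needs geodesic and a valid login)."""
    from geodesic.entanglement.dataset import Dataset

    ds = Dataset.from_remote_provider(**remote_provider_args(**kwargs))
    ds.save()  # assumed to be required, as for the other constructors
    return ds
```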
