Dataset¶
- class geodesic.entanglement.dataset.Dataset(**obj)[source]¶
Bases: geodesic.entanglement.object.Object
Allows interaction with SeerAI datasets.
Dataset provides a way to interact with datasets in the SeerAI.
- Parameters
**obj (dict) – Dictionary with all properties in the dataset.
- Variables
alias (str) – Alternative name for the dataset. This name has fewer restrictions on characters and should be human readable.
- hash¶
(str) - hash of this dataset
Descriptor:
_StringDescr
- alias¶
(str) - the alias of this object, anything you wish it to be
Descriptor:
_StringDescr
- data_api¶
(str) - the api to access the data
Descriptor:
_StringDescr
- item_type¶
(str) - the type of items in this dataset (e.g. 'raster' or 'features')
Descriptor:
_StringDescr
- item_assets¶
(dict, Asset) - information about assets contained in this dataset
Descriptor:
_AssetsDescr
- extent¶
(Extent, dict) - spatiotemporal extent of this Dataset
Descriptor:
_TypeConstrainedDescr
- services¶
(str) - list of services that expose the data for this dataset
Descriptor:
_ListDescr
- providers¶
list of providers for this dataset
Descriptor:
_ListDescr
- stac_extensions¶
list of STAC extensions this dataset uses
Descriptor:
_ListDescr
- links¶
list of links
Descriptor:
_ListDescr
- metadata¶
(dict) - arbitrary metadata for this dataset
Descriptor:
_DictDescr
- boson_config¶
(BosonConfig, dict) - boson configuration for this dataset
Descriptor:
BosonDescr
- save()[source]¶
Create or update a Dataset in Boson.
- Returns
self
- Raises
requests.HTTPError – If this failed to save.
- create()¶
Deprecated in 1.0.0
Create or update a Dataset in Boson.
- Returns
self
- Raises
requests.HTTPError – If this failed to save.
- search(bbox=None, datetime=None, limit=10, page_size=500, intersects=None, collections=None, ids=None, filter=None, fields=None, sortby=None, method='POST', return_type=None, extra_params={})[source]¶
Search the dataset for items.
Search this service’s OGC Features or STAC API.
- Parameters
bbox (Optional[List]) – The spatial extent for the query as a bounding box. Example: [-180, -90, 180, 90]
datetime (Union[List, Tuple]) – The temporal extent for the query formatted as a list: [start, end].
limit (Optional[Union[bool, int]]) – The maximum number of items to return in the query. If None, will page through all results
page_size (Optional[int]) – If retrieving all items, this page size will be used for the subsequent requests
intersects (Optional[object]) – a geometry to use in the query
collections (Optional[List[str]]) – a list of collections to search
ids (Optional[List[str]]) – a list of feature/item IDs to filter to
filter (Optional[Union[CQLFilter, dict]]) – a CQL2 filter. This is supported by most datasets but will not work for others.
fields (Optional[dict]) – a list of fields to include/exclude. Included fields should be prefixed by '+' and excluded fields by '-'. Alternatively, a dict with 'include'/'exclude' lists may be provided
sortby (Optional[dict]) – a list of sortby objects, which are dicts containing “field” and “direction”. Direction may be one of “asc” or “desc”. Not supported by all datasets
method (str) – the HTTP method. POST is the default and usually should be left alone unless a server doesn't support it
return_type (SearchReturnType) – the type of object to return. Either a FeatureCollection or a GeoDataFrame
extra_params (Optional[dict]) – a dict of additional parameters that will be passed along on the request.
- Returns
A geodesic.stac.FeatureCollection with all items in the dataset matching the query.
Examples
A query on the sentinel-2-l2a dataset with a given bounding box and time range. Additionally, you can apply filters on the parameters in the items
>>> bbox = geom.bounds
>>> date_range = (datetime.datetime(2020, 12, 1), datetime.datetime.now())
>>> ds.search(
...     bbox=bbox,
...     datetime=date_range,
...     filter=CQLFilter.lte("properties.eo:cloud_cover", 10.0)
... )
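CQLFilter helpers like CQLFilter.lte build plain CQL2 JSON documents, and search() accepts a raw dict in their place since the filter parameter is typed Union[CQLFilter, dict]. A minimal sketch of the comparison above as hand-built CQL2 JSON (that CQLFilter.lte serializes to exactly this document is an assumption; the op/args layout follows the OGC CQL2-JSON encoding):

```python
# Sketch: a CQL2 JSON comparison built by hand. Passing a dict like this
# as the `filter` argument is equivalent in spirit to CQLFilter.lte;
# the exact serialization CQLFilter produces is an assumption here.
def cql2_lte(prop, value):
    """A <= comparison in CQL2 JSON form."""
    return {"op": "<=", "args": [{"property": prop}, value]}

cloud_filter = cql2_lte("properties.eo:cloud_cover", 10.0)
```

Building the dict directly is useful when a helper for a particular operator is unavailable.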
- query(bbox=None, datetime=None, limit=10, page_size=500, intersects=None, collections=None, ids=None, filter=None, fields=None, sortby=None, method='POST', return_type=None, extra_params={})¶
Deprecated in 1.0.0
Search the dataset for items.
Search this service’s OGC Features or STAC API.
- Args:
bbox: The spatial extent for the query as a bounding box. Example: [-180, -90, 180, 90]
datetime: The temporal extent for the query formatted as a list: [start, end].
limit: The maximum number of items to return in the query. If None, will page through all results
page_size: If retrieving all items, this page size will be used for the subsequent requests
intersects: a geometry to use in the query
collections: a list of collections to search
ids: a list of feature/item IDs to filter to
filter: a CQL2 filter. This is supported by most datasets but will not work for others.
fields: a list of fields to include/exclude. Included fields should be prefixed by '+' and excluded fields by '-'. Alternatively, a dict with 'include'/'exclude' lists may be provided
sortby: a list of sortby objects, which are dicts containing "field" and "direction". Direction may be one of "asc" or "desc". Not supported by all datasets
method: the HTTP method. POST is the default and usually should be left alone unless a server doesn't support it
return_type: the type of object to return. Either a FeatureCollection or a GeoDataFrame
extra_params: a dict of additional parameters that will be passed along on the request.
- Returns:
A geodesic.stac.FeatureCollection with all items in the dataset matching the query.
- Examples:
A query on the sentinel-2-l2a dataset with a given bounding box and time range. Additionally, you can apply filters on the parameters in the items
>>> bbox = geom.bounds
>>> date_range = (datetime.datetime(2020, 12, 1), datetime.datetime.now())
>>> ds.search(
...     bbox=bbox,
...     datetime=date_range,
...     filter=CQLFilter.lte("properties.eo:cloud_cover", 10.0)
... )
- get_pixels(*, bbox, datetime=None, pixel_size=None, shape=None, pixel_dtype=<class 'numpy.float32'>, bbox_crs='EPSG:4326', output_crs='EPSG:3857', resampling='nearest', no_data=None, content_type='raw', asset_bands=[], filter={}, compress=True, bbox_srs=None, output_srs=None, input_nodata=None, output_nodata=None)[source]¶
get pixel data or an image from this Dataset
get_pixels gets requested pixels from a dataset by calling Boson. This method returns either a numpy array or the bytes of an image file (jpg, png, gif, or tiff). If the content_type is "raw", this will return a numpy array; otherwise it will return the requested image format as bytes that can be written to a file. Where possible, a COG will be returned for the tiff format, but this is not guaranteed.
- Parameters
bbox (list) – a bounding box to export as imagery (xmin, ymin, xmax, ymax)
datetime (Optional[Union[List, Tuple]]) – a start and end datetime to query against. Imagery will be filtered to between this range and mosaiced.
pixel_size (Optional[list]) – a list of the x/y pixel size of the output imagery. This list needs to have length equal to the number of bands. This should be specified in the output spatial reference.
shape (Optional[list]) – the shape of the output image (rows, cols). Either this or the pixel_size must be specified, but not both.
pixel_dtype (Union[numpy.dtype, str]) – a numpy datatype or string descriptor in numpy format (e.g. <f4) of the output. Most, but not all basic dtypes are supported.
bbox_crs (str) – the spatial reference of the bounding bbox, as a string. May be EPSG:<code>, WKT, Proj4, ProjJSON, etc.
output_crs (str) – the spatial reference of the output pixels.
resampling (str) – a string to select the resampling method.
no_data (Optional[Any]) – in the source imagery, what value should be treated as no data?
content_type (str) – the image format. Default is “raw” which sends raw image bytes that will be converted into a numpy array. If “jpg”, “gif”, or “tiff”, returns the bytes of an image file instead, which can directly be written to disk.
asset_bands (List[geodesic.boson.asset_bands.AssetBands]) – a list containing dictionaries with the keys “asset” and “bands”. Asset should point to an asset in the dataset, and “bands” should list band indices (0-indexed) or band names.
filter (dict) – a CQL2 JSON filter to filter images that will be used for the resulting output.
compress (bool) – compress bytes when transferring. This will usually, but not always, improve performance
input_nodata (deprecated) – in the source imagery, what value should be treated as no data?
output_nodata (deprecated) – what value should be set as the nodata value in the resulting dataset. Only meaningful for tiff outputs.
bbox_srs (deprecated) – the spatial reference of the bounding bbox, as a string. May be EPSG:<code>, WKT, Proj4, ProjJSON, etc.
output_srs (deprecated) – the spatial reference of the output pixels.
- Returns
a numpy array or bytes of an image file.
Examples
>>> # Get a numpy array of pixels from sentinel-2-l2a
>>> bbox = [-109.050293, 36.993778, -102.030029, 41.004775]  # roughly the state of Colorado
>>> range = (datetime(2020, 1, 1), datetime(2020, 2, 1))
>>> # The RGB bands of sentinel-2-l2a are B04, B03, B02
>>> bands = [AssetBands(asset="B04", bands=[0]), AssetBands(asset="B03", bands=[0]), AssetBands(asset="B02", bands=[0])]
>>> pixels = ds.get_pixels(bbox=bbox, datetime=range, pixel_size=(1000, 1000), bands=bands, output_srs="EPSG:3857", bbox_srs="EPSG:4326")
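Either pixel_size or shape must be given, never both, because one determines the other for a fixed bounding box. A sketch of that arithmetic (the helper is hypothetical and Boson's exact rounding behavior is an assumption):

```python
import math

def shape_from_pixel_size(bbox, pixel_size):
    """Rows/cols implied by a bbox (xmin, ymin, xmax, ymax) and an
    (x, y) pixel size, both expressed in the output CRS units.
    Rounding up is an assumption; Boson may round differently.
    """
    xmin, ymin, xmax, ymax = bbox
    px, py = pixel_size
    cols = math.ceil((xmax - xmin) / px)
    rows = math.ceil((ymax - ymin) / py)
    return rows, cols

# A 100 km x 50 km box in a metric CRS at 1000 m pixels -> (50, 100)
shape_from_pixel_size((0, 0, 100_000, 50_000), (1000, 1000))
```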
- warp(*, bbox, datetime=None, pixel_size=None, shape=None, pixel_dtype=<class 'numpy.float32'>, bbox_crs='EPSG:4326', output_crs='EPSG:3857', resampling='nearest', no_data=None, content_type='raw', asset_bands=[], filter={}, compress=True, bbox_srs=None, output_srs=None, input_nodata=None, output_nodata=None)¶
Deprecated in 1.0.0
get pixel data or an image from this Dataset
get_pixels gets requested pixels from a dataset by calling Boson. This method returns either a numpy array or the bytes of an image file (jpg, png, gif, or tiff). If the content_type is "raw", this will return a numpy array; otherwise it will return the requested image format as bytes that can be written to a file. Where possible, a COG will be returned for the tiff format, but this is not guaranteed.
- Args:
bbox: a bounding box to export as imagery (xmin, ymin, xmax, ymax)
datetime: a start and end datetime to query against. Imagery will be filtered to between this range and mosaiced.
pixel_size: a list of the x/y pixel size of the output imagery. This list needs to have length equal to the number of bands. This should be specified in the output spatial reference.
shape: the shape of the output image (rows, cols). Either this or the pixel_size must be specified, but not both.
pixel_dtype: a numpy datatype or string descriptor in numpy format (e.g. <f4) of the output. Most, but not all, basic dtypes are supported.
bbox_crs: the spatial reference of the bounding bbox, as a string. May be EPSG:<code>, WKT, Proj4, ProjJSON, etc.
output_crs: the spatial reference of the output pixels.
resampling: a string to select the resampling method.
no_data: in the source imagery, what value should be treated as no data?
content_type: the image format. Default is "raw" which sends raw image bytes that will be converted into a numpy array. If "jpg", "gif", or "tiff", returns the bytes of an image file instead, which can directly be written to disk.
asset_bands: a list containing dictionaries with the keys "asset" and "bands". Asset should point to an asset in the dataset, and "bands" should list band indices (0-indexed) or band names.
filter: a CQL2 JSON filter to filter images that will be used for the resulting output.
compress: compress bytes when transferring. This will usually, but not always, improve performance
input_nodata (deprecated): in the source imagery, what value should be treated as no data?
output_nodata (deprecated): what value should be set as the nodata value in the resulting dataset. Only meaningful for tiff outputs.
bbox_srs (deprecated): the spatial reference of the bounding bbox, as a string. May be EPSG:<code>, WKT, Proj4, ProjJSON, etc.
output_srs (deprecated): the spatial reference of the output pixels.
- Returns:
a numpy array or bytes of an image file.
- Examples:
>>> # Get a numpy array of pixels from sentinel-2-l2a
>>> bbox = [-109.050293, 36.993778, -102.030029, 41.004775]  # roughly the state of Colorado
>>> range = (datetime(2020, 1, 1), datetime(2020, 2, 1))
>>> # The RGB bands of sentinel-2-l2a are B04, B03, B02
>>> bands = [AssetBands(asset="B04", bands=[0]), AssetBands(asset="B03", bands=[0]), AssetBands(asset="B02", bands=[0])]
>>> pixels = ds.get_pixels(bbox=bbox, datetime=range, pixel_size=(1000, 1000), bands=bands, output_srs="EPSG:3857", bbox_srs="EPSG:4326")
- view(name, bbox=None, intersects=None, datetime=None, collections=None, ids=None, filter=None, asset_bands=[], feature_limit=None, middleware={}, cache={}, tile_options={}, domain=None, category=None, type=None, project=None, **kwargs)[source]¶
creates a curated view of a Dataset
This method creates a new Dataset that is a "view" of an existing dataset. This allows the user to provide a set of persistent filters to a Dataset as a separate Object. A view may also be saved in a different Project than the original. The applied filters affect both query and get_pixels requests. The final request processed will be the intersection of the view parameters with the query.
- Parameters
name (str) – name of the view Dataset
bbox (Optional[Union[List, Tuple]]) – The spatial extent for the query as a bounding box. Example: [-180, -90, 180, 90]
intersects (Optional[object]) – a geometry to use in the query
datetime (Optional[Union[List, Tuple]]) – The temporal extent for the query formatted as a list: [start, end].
collections (Optional[List[str]]) – a list of collections to search
ids (Optional[List[str]]) – a list of feature/item IDs to filter to
filter (Optional[Union[geodesic.cql.CQLFilter, dict]]) – a CQL2 filter. This is supported by most datasets but will not work for others.
asset_bands (list) – a list of asset/bands combinations to filter this Dataset to
feature_limit (Optional[int]) – if specified, overrides the max_page_size of this Dataset
middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.
cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset
tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset
domain (Optional[str]) – domain of the resulting Object
category (Optional[str]) – category of the resulting Object
type (Optional[str]) – the type of the resulting Object
project (Optional[str]) – a new project to save this view to. If None, inherits from the parent Dataset
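The statement above that the final request is the intersection of the view parameters with the query can be sketched for the bbox case (how Boson combines the two server-side is an assumption; this helper is illustrative only):

```python
def intersect_bbox(view_bbox, query_bbox):
    """Intersect two (xmin, ymin, xmax, ymax) boxes.
    Returns None when the query falls entirely outside the view,
    in which case a search against the view would match nothing."""
    xmin = max(view_bbox[0], query_bbox[0])
    ymin = max(view_bbox[1], query_bbox[1])
    xmax = min(view_bbox[2], query_bbox[2])
    ymax = min(view_bbox[3], query_bbox[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return [xmin, ymin, xmax, ymax]

# A view pinned to roughly Colorado clips a wider query down to itself.
intersect_bbox([-109.05, 36.99, -102.03, 41.0], [-120, 30, -100, 45])
```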
- union(name, others=[], feature_limit=1000, project=None, ignore_duplicate_fields=False, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]¶
creates a union of this dataset with a list of others
Creates a new Dataset that is the union of this Dataset with a list of others. If others is an empty list, this creates a union of a dataset with itself, which is essentially a virtual copy of the original endowed with any capabilities Boson adds.
See:
geodesic.entanglement.dataset.new_union_dataset()
- Parameters
name (str) – the name of the new Dataset
others (List[Dataset]) – a list of Datasets to union
feature_limit (int) – the max size of a results page from a query/search
project (Optional[Union[geodesic.account.projects.Project, str]]) – the name of the project this will be assigned to
ignore_duplicate_fields (bool) – if True, duplicate fields across providers will be ignored
middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.
cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset
tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset
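ignore_duplicate_fields only matters when the member datasets declare overlapping field names. A rough sketch of the field-merging decision it controls (the actual Boson behavior, including whether duplicates raise an error when the flag is False, is an assumption):

```python
def merge_fields(field_lists, ignore_duplicate_fields=False):
    """Combine per-dataset field lists for a union, keeping first-seen
    order. Raising on a duplicate when the flag is False is an assumed
    failure mode, not documented Boson behavior."""
    seen, merged = set(), []
    for fields in field_lists:
        for f in fields:
            if f in seen:
                if not ignore_duplicate_fields:
                    raise ValueError(f"duplicate field: {f}")
                continue
            seen.add(f)
            merged.append(f)
    return merged

merge_fields([["id", "name"], ["name", "value"]], ignore_duplicate_fields=True)
# -> ["id", "name", "value"]
```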
- join(name, key, right_dataset, right_key, drop_duplicates=False, drop_fields=[], right_drop_fields=[], suffix='_left', right_suffix='_right', use_geometry='left', feature_limit=1000, project=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]¶
creates a left join of this dataset with another dataset
See:
geodesic.entanglement.dataset.new_join_dataset()
- Parameters
name (str) – the name of the new Dataset
key (str) – the name of the field in this dataset to join on. This key must exist for there to be output. An error will be thrown if the key does not exist for 50% of the features in a query.
right_dataset (Dataset) – the dataset to join with
right_key (str) – the name of the field in the right dataset to join on.
drop_fields (List[str]) – a list of fields to drop from this dataset
right_drop_fields (List[str]) – a list of fields to drop from the right dataset
suffix (str) – the suffix to append to fields from this dataset
right_suffix (str) – the suffix to append to fields from the right dataset
use_geometry (str) – which geometry to use in the join. “left” will use the left dataset’s geometry, “right” will use the right dataset’s geometry
drop_duplicates (bool) – if True, duplicate fields across providers will be ignored
feature_limit (int) – the max size of a results page from a query/search
project (Optional[Union[geodesic.account.projects.Project, str]]) – the name of the project this will be assigned to
middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.
cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset
tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset
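The join semantics described above (left join on a key, suffixes applied to colliding field names) can be sketched in plain Python over feature property dicts. This illustrates the semantics only; it is not Boson's implementation, and the helper is hypothetical:

```python
def left_join(left, right, key, right_key, right_suffix="_right"):
    """Left-join two lists of property dicts. Left features whose key
    has no match on the right keep only their own fields, mirroring
    the documented requirement that the key must exist for there to
    be joined output."""
    index = {r[right_key]: r for r in right if right_key in r}
    out = []
    for feat in left:
        row = dict(feat)
        match = index.get(feat.get(key))
        if match is not None:
            for k, v in match.items():
                # Colliding names from the right side get the suffix.
                row[k + right_suffix if k in feat else k] = v
        out.append(row)
    return out
```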
Shares a dataset, producing a token that will allow unauthenticated users to run a proxied boson request
- Parameters
servicer (str) – The name of the servicer to use in the boson request.
ttl (Optional[Union[datetime.timedelta, int, float]]) – The time until the dataset's token should expire, either a timedelta object or a number of seconds. Defaults to -1 (no expiration) if not provided.
- Raises
requests.HTTPError – If the user is not permitted to access the dataset or if an error occurred
- Returns
a share token created by Ted and its corresponding data
Share a dataset as a GeoServices/ArcGIS service
- Parameters
ttl (Optional[Union[datetime.timedelta, int, float]]) – The time until the dataset's token should expire, either a timedelta object or a number of seconds. Defaults to -1 (no expiration) if not provided.
- Raises
requests.HTTPError – If the user is not permitted to access the dataset or if an error occurred
- Returns
a share token created by Ted and its corresponding data
Share a dataset as an OGC Tiles service
- Parameters
ttl (Optional[Union[datetime.timedelta, int, float]]) – The time until the dataset's token should expire, either a timedelta object or a number of seconds. Defaults to -1 (no expiration) if not provided.
- Raises
requests.HTTPError – If the user is not permitted to access the dataset or if an error occurred
- Returns
a share token created by Ted and its corresponding data
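All three share variants accept ttl as either a datetime.timedelta or a number of seconds, defaulting to no expiration. A sketch of that normalization (the -1 sentinel is as documented above; the helper itself is hypothetical):

```python
from datetime import timedelta

def normalize_ttl(ttl=None):
    """Return a ttl in whole seconds; -1 means the token never expires."""
    if ttl is None:
        return -1
    if isinstance(ttl, timedelta):
        return int(ttl.total_seconds())
    return int(ttl)

normalize_ttl(timedelta(hours=1))  # 3600 seconds
```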
- command(command, **kwargs)[source]¶
issue a command to this dataset’s provider
Commands can be used to perform operations on a dataset such as reindexing. Most commands run in the background and will return immediately. If a command is successfully submitted, this should return a message {"success": True}; otherwise it will raise an exception with the error message.
- Parameters
command (str) – the name of the command to issue. Providers supporting “reindex” will accept this command.
**kwargs – additional arguments passed to this command.
- reindex(timeout=None)[source]¶
issue a reindex command to this dataset’s provider
Reindexes a dataset. This will reindex the dataset in the background and return immediately. If kicking off the reindexing is successful, this will return a message {"success": True}; otherwise it will raise an exception with the error message.
- Parameters
timeout (Optional[Union[datetime.timedelta, str]]) – the maximum time to wait for the reindexing to complete. If None, will use the default timeout of 30 minutes.
- clear_store(prefix=None)[source]¶
clears the persistent store for this dataset
Some data, such as cached files, indices, and tiles remain in the store. Boson isn’t always able to recognize when data is stale. This can be called to clear out the persistent store for this dataset.
- Parameters
prefix (Optional[str]) – if specified, only keys with this prefix will be cleared
- clear_tile_cache(cache_prefix='default')[source]¶
clears the tile cache for this dataset
- Parameters
cache_prefix (str) – if specified, only the specified cache will be cleared. "default" is most common and refers to the tiles with no additional filtering applied. Beneath this key is the Tile Matrix Set used, so by default, all tiles for all tile matrix sets will be cleared
- static from_snowflake_table(name, account, database, table, credential, schema='public', warehouse=None, id_column=None, geometry_column=None, datetime_column=None, feature_limit=8000, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]¶
create a Dataset from a Snowflake table.
This method creates a new Dataset from an existing Snowflake table.
- Parameters
name (str) – name of the Dataset
account (str) – Snowflake account string, formatted as <orgname>-<account_name>. Ref url: https://docs.snowflake.com/en/user-guide/admin-account-identifier#using-an-account-name-as-an-identifier
database (str) – Snowflake database that contains the table
table (str) – name of the Snowflake table
credential (str) – name of a credential to access table. Either basic auth or oauth2 refresh token are supported
schema (str) – Snowflake schema the table resides in
warehouse (Optional[str]) – name of the Snowflake warehouse to use
id_column (Optional[str]) – name of the column containing a unique identifier. Integer IDs preferred, but not required
geometry_column (Optional[str]) – name of the column containing the primary geometry for spatial filtering.
datetime_column (Optional[str]) – name of the column containing the primary datetime field for temporal filtering.
feature_limit (int) – max number of results to return in a single page from a search
middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.
cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset
tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset
- static from_arcgis_item(name, item_id, arcgis_instance='https://www.arcgis.com', credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]¶
creates a new Dataset from an ArcGIS Online/Enterprise item
- Parameters
name (str) – name of the Dataset to create
item_id (str) – the item ID of the ArcGIS Item Referenced
arcgis_instance (str) – the base url of the ArcGIS Online or Enterprise root. Defaults to AGOL, MUST be specified for ArcGIS Enterprise instances
credential (Optional[str]) – the name or uid of a credential required to access this. Currently, this must be the client credentials of an ArcGIS OAuth2 Application. Public layers do not require credentials.
layer_id – an integer layer ID to subset a service’s set of layers.
middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.
cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset
tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset
- Returns
a new Dataset
Examples
>>> ds = Dataset.from_arcgis_item(
...     name="my-dataset",
...     item_id="abc123efghj34234kxlk234joi",
...     credential="my-arcgis-creds"
... )
>>> ds.save()
- static from_arcgis_layer(name, url, arcgis_instance='https://www.arcgis.com', credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]¶
creates a new Dataset from an ArcGIS Online/Enterprise Service URL
- Parameters
name (str) – name of the Dataset to create
url (str) – the URL of the Feature, Image, or Map Server. This is the layer url, not the Service url. Only the specified layer will be available to the dataset
arcgis_instance (str) – the base url of the ArcGIS Online or Enterprise root. Defaults to AGOL, MUST be specified for ArcGIS Enterprise instances
credential (Optional[str]) – the name or uid of a credential required to access this. Currently, this must be the client credentials of an ArcGIS OAuth2 Application. Public layers do not require credentials.
middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.
cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset
tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset
- Returns
a new Dataset.
Examples
>>> ds = Dataset.from_arcgis_layer(
...     name="my-dataset",
...     url="https://services9.arcgis.com/ABC/arcgis/rest/services/SomeLayer/FeatureServer/0",
...     credential="my-arcgis-creds"
... )
>>> ds.save()
- static from_arcgis_service(name, url, arcgis_instance='https://www.arcgis.com', credential=None, layer_id=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]¶
creates a new Dataset from an ArcGIS Online/Enterprise Service URL
- Parameters
name (str) – name of the Dataset to create
url (str) – the URL of the Feature, Image, or Map Server. This is not the layer url, but the Service url. Layers will be enumerated and all accessible from this dataset.
arcgis_instance (str) – the base url of the ArcGIS Online or Enterprise root. Defaults to AGOL, MUST be specified for ArcGIS Enterprise instances
credential (Optional[str]) – the name or uid of a credential required to access this. Currently, this must be the client credentials of an ArcGIS OAuth2 Application. Public layers do not require credentials.
layer_id (Optional[int]) – an integer layer ID to subset a service’s set of layers.
middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.
cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset
tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset
gis – the logged in arcgis.gis.GIS to use to access the metadata for this item. To access secure content, if this is not specified, the active GIS is used.
- Returns
a new Dataset.
Examples
>>> ds = Dataset.from_arcgis_service(
...     name="my-dataset",
...     url="https://services9.arcgis.com/ABC/arcgis/rest/services/SomeLayer/FeatureServer",
...     credential="my-arcgis-creds"
... )
>>> ds.save()
- static from_stac_collection(name, url, credential=None, item_type='raster', middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]¶
Create a new Dataset from a STAC Collection
- Parameters
name (str) – name of the Dataset to create
url (str) – the url to the collection (either STAC API or OGC API: Features)
credential – name or uid of the credential to access the API
item_type (str) – what type of items does this contain? “raster” for raster data, “features” for features, other types, such as point_cloud may be specified, but doesn’t alter current internal functionality.
middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.
cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset
tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset
- Returns
a new Dataset.
Examples
>>> ds = Dataset.from_stac_collection(
...     name="landsat-c2l2alb-sr-usgs",
...     url="https://landsatlook.usgs.gov/stac-server/collections/landsat-c2l2alb-sr"
... )
>>> ds.save()
- static from_bucket(name, url, pattern=None, region=None, datetime_field=None, start_datetime_field=None, end_datetime_field=None, oriented=False, credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]¶
Creates a new Dataset from a Cloud Storage Bucket (S3/GCP/Azure)
- Parameters
name (str) – name of the Dataset to create
url (str) – the url to the bucket, including the prefix (ex. s3://my-bucket/myprefix, gs://my-bucket/myprefix, …)
pattern (Optional[str]) – a regex to filter for files to index
region (Optional[str]) – for S3 buckets, the region where the bucket is
datetime_field (Optional[str]) – the name of the metadata key on the file to find a timestamp
start_datetime_field (Optional[str]) – the name of the metadata key on the file to find a start timestamp
end_datetime_field (Optional[str]) – the name of the metadata key on the file to find an end timestamp
oriented (bool) – Is this oriented imagery? If so, EXIF data will be parsed for geolocation. Anything missing location info will be dropped.
credential (Optional[str]) – the name or uid of the credential to access the bucket.
middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.
kwargs – other metadata that will be set on the Dataset, such as description, alias, etc
- Returns
a new Dataset.
Examples
>>> ds = Dataset.from_bucket(
...     name="bucket-dataset",
...     url="s3://my-bucket/myprefix",
...     pattern=r".*\.tif",
...     region="us-west-2",
...     datetime_field="TIFFTAG_DATETIME",
...     oriented=False,
...     credential="my-iam-user",
...     description="my dataset is the bomb"
... )
>>> ds.save()
- static from_google_earth_engine(name, asset, credential, folder='projects/earthengine-public/assets', url='https://earthengine-highvolume.googleapis.com', middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]¶
Creates a new Dataset from a Google Earth Engine Asset
- Parameters
name (str) – name of the Dataset to create
asset (str) – the asset in GEE to use (ex. ‘LANDSAT/LC09/C02/T1_L2’)
credential (str) – the credential to access this, a Google Earth Engine GCP Service Account. Future will allow the use of a oauth2 refresh token or other.
folder (str) – by default this is the earth engine public, but you can specify another folder if needed to point to legacy data or personal projects.
url (str) – the GEE url to use, defaults to the recommended high volume endpoint.
kwargs – other metadata that will be set on the Dataset, such as description, alias, etc
middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.
cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset
tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset
- Returns
a new Dataset.
Examples
>>> ds = Dataset.from_google_earth_engine(
...     name="landsat-9-c2-gee",
...     asset="LANDSAT/LC09/C02/T1_L2",
...     credential="google-earth-engine-svc-account",
...     description="my dataset is the bomb"
... )
>>> ds.save()
- static from_elasticsearch_index(name, url, index_pattern, credential=None, storage_credential=None, datetime_field='properties.datetime', geometry_field='geometry', geometry_type='geo_shape', id_field='_id', data_api='features', item_type='other', feature_limit=2000, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]¶
create a new Dataset from an elasticsearch index containing geojson features or STAC items
- Parameters
name (str) – name of the Dataset to create
url (str) – the DNS name or IP of the elasticsearch host to connect to.
index_pattern (str) – an elasticsearch index name or index pattern
credential (Optional[str]) – name of the Credential object to use. Currently, this only supports basic auth (username/password).
storage_credential (Optional[str]) – the name of the Credential object to use for storage if any of the data referenced in the index requires a credential to access (e.g. cloud storage for STAC)
datetime_field (str) – the field used to search by datetime in the Elasticsearch index.
geometry_field (str) – the name of the field that contains the geometry
geometry_type (str) – the type of the geometry field, either geo_shape or geo_point
id_field (str) – the name of the field to use as an ID field
data_api (str) – the data API, either ‘stac’ or ‘features’
item_type (str) – the type of item. If data_api is ‘stac’, this should describe what the data is
feature_limit (int) – the max number of features the service will return per page.
middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.
**kwargs – other arguments that will be used to create the collection and provider config.
- Returns
A new Dataset. Must call .save() for it to be usable.
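The defaults for datetime_field and geometry_field (‘properties.datetime’ and ‘geometry’) are dotted paths into each indexed document. A minimal sketch, independent of geodesic, of how such dotted paths resolve against a GeoJSON feature (the feature contents here are illustrative):

```python
def resolve(doc, dotted_path):
    """Walk a dict along a dotted path such as 'properties.datetime'."""
    for key in dotted_path.split("."):
        doc = doc[key]
    return doc

# An illustrative GeoJSON feature as it might be stored in the index.
feature = {
    "geometry": {"type": "Point", "coordinates": [-105.0, 40.0]},
    "properties": {"datetime": "2023-06-01T12:00:00Z"},
}

print(resolve(feature, "properties.datetime"))  # -> 2023-06-01T12:00:00Z
print(resolve(feature, "geometry")["type"])     # -> Point
```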
- static from_csv(name, url, index_data=True, crs='EPSG:4326', x_field='CoordX', y_field='CoordY', z_field='CoordZ', geom_field='WKT', datetime_field=None, feature_limit=1000, region=None, credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]¶
Create a new Dataset from a CSV file in cloud storage
- Parameters
name (str) – name of the Dataset to create
url (str) – the URL/URI of the data. Can be a cloud storage URI such as s3://&lt;bucket&gt;/key or gs://&lt;bucket&gt;/key
index_data (bool) – if true, the data will be copied and spatially indexed for more efficient queries
crs (str) – a string coordinate reference for the data
(x/y/z)_field – the field name for the x/y/z fields
geom_field (str) – the field name containing the geometry in well known text (WKT) or hex encoded well known binary (WKB).
feature_limit (int) – the max number of features this will return per page
datetime_field (Optional[str]) – if the data is time enabled, this is the name of the datetime field. The datetime must be RFC3339 formatted.
region (Optional[str]) – for S3 buckets, the region where the bucket is
credential (Optional[str]) – the name of the credential object needed to access this data.
middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.
cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset
tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset
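A minimal, geodesic-independent sketch of a CSV laid out to match from_csv’s default field names (x_field='CoordX', y_field='CoordY'); the column names other than the coordinate fields and all values are illustrative:

```python
import csv
import io

# A CSV whose coordinate columns match from_csv's defaults, with an
# RFC3339-formatted datetime column that datetime_field could point at.
raw = io.StringIO(
    "id,CoordX,CoordY,datetime\n"
    "1,-105.0,40.0,2023-06-01T12:00:00Z\n"
)
rows = list(csv.DictReader(raw))
print(rows[0]["CoordX"], rows[0]["CoordY"])  # -> -105.0 40.0
```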
- static from_tabular_data(name, url, index_data=True, crs='EPSG:4326', feature_limit=1000, datetime_field=None, region=None, credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]¶
Create a new Dataset from a vector file in cloud storage.
This can be a Shapefile, GeoJSON Feature Collection, FlatGeobuf, or one of several other formats
- Parameters
name (str) – name of the Dataset to create
url (str) – the URL/URI of the data. Can be a cloud storage URI such as s3://&lt;bucket&gt;/key or gs://&lt;bucket&gt;/key
index_data (bool) – if true, the data will be copied and spatially indexed for more efficient queries
crs (str) – a string coordinate reference for the data
feature_limit (int) – the max number of features this will return per page
datetime_field (Optional[str]) – if the data is time enabled, this is the name of the datetime field. The datetime field must be RFC3339 formatted.
region (Optional[str]) – for S3 buckets, the region where the bucket is
credential (Optional[str]) – the name of the credential object needed to access this data.
middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.
cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset
tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset
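The datetime_field values are documented to require RFC3339 formatting. A short sketch of producing such a timestamp with the standard library (the date itself is illustrative):

```python
from datetime import datetime, timezone

# Build a timezone-aware timestamp and render it in RFC3339 form,
# using the "Z" suffix for UTC.
ts = datetime(2023, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
rfc3339 = ts.isoformat().replace("+00:00", "Z")
print(rfc3339)  # -> 2023-06-01T12:00:00Z
```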
- static from_geoparquet(name, url, feature_limit=1000, datetime_field='datetime', return_geometry_properties=False, expose_partitions_as_layer=True, update_existing_index=True, credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]¶
Creates a Dataset from Hive-partitioned GeoParquet files in cloud storage
Hive-partitioned GeoParquet is a convention typically used when writing data out from a parallel process (such as Tesseract or Apache Spark), or when the individual file sizes or row counts would otherwise be too large. This provider indexes these partitions spatially to optimize query performance. Hive-partitioned parquet is organized like this, and we require this structure:
- prefix/&lt;root&gt;.parquet
- prefix/key=value_1/&lt;partition-00001&gt;.parquet
- prefix/key=value_2/&lt;partition-00002&gt;.parquet
- …
- prefix/key=value_m/&lt;partition-n&gt;.parquet
“root” and “partition-xxxxx” can be named anything, provided both have the parquet suffix. Any number of key/value pairs is allowed in Hive-partitioned data. This can also point to a single parquet file.
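A sketch of the layout above, built locally with empty placeholder files (real data would be parquet partitions; all names here are illustrative):

```python
import pathlib
import tempfile

# Recreate the documented Hive-partitioned structure under a temp prefix.
root = pathlib.Path(tempfile.mkdtemp()) / "prefix"
root.mkdir(parents=True)
(root / "root.parquet").touch()
for i, value in enumerate(["value_1", "value_2"], start=1):
    part_dir = root / f"key={value}"
    part_dir.mkdir()
    (part_dir / f"partition-{i:05d}.parquet").touch()

# Everything matching *.parquet under the prefix is what gets indexed.
parts = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.parquet"))
print(parts)
```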
- Parameters
name (str) – name of the Dataset to create
url (str) – the path to the <root>.parquet. Format depends on the storage backend.
feature_limit (int) – the max number of features that this provider will allow returned by a single query.
datetime_field (str) – if the data is time enabled, the name of the column in the parquet dataset that will be used for time filtering. Values must be RFC3339 formatted in order to work.
return_geometry_properties (bool) – if True, will compute and return geometry properties along with the features.
expose_partitions_as_layer (bool) – this will create a collection/layer in this Dataset that simply has the partition bounding box and count of features within. Can be used as a simple heatmap
update_existing_index (bool) – if the data has already been indexed in our scheme by a separate process, set to False to use the existing index; otherwise this will index the parquet data in the bucket before it can be queried.
credential (Optional[str]) – the name of the credential to access the data in cloud storage.
middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.
cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset
tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset
**kwargs – additional arguments that will be used to create the STAC collection, Dataset description, alias, etc.
- static from_remote_provider(name, url, data_api='features', transport_protocol='http', additional_properties={}, feature_limit=2000, credential=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]¶
Creates a dataset from a server implementing the Boson remote provider interface
The Boson Remote Provider interface may be implemented using the Boson Python SDK (https://pypi.org/project/boson-sdk/). The provider must be hosted somewhere reachable by Boson; this method connects Boson to that remote provider.
Remote Providers may either implement the Search or the Pixels endpoint (or both).
- Parameters
name (str) – name of the Dataset to create
url (str) – URL of the server implementing the interface
data_api (str) – either ‘features’ or ‘raster’.
transport_protocol (str) – either ‘http’ or ‘grpc’
additional_properties (dict) – additional properties to set on the dataset
feature_limit (int) – the max number of features that this provider will allow returned in a single page.
credential (Optional[str]) – the name of the credential to access the api.
middleware (geodesic.boson.middleware.MiddlewareConfig) – configure any boson middleware to be applied to the new dataset.
cache (geodesic.boson.boson.CacheConfig) – configure caching for this dataset
tile_options (geodesic.boson.tile_options.TileOptions) – configure tile options for this dataset
**kwargs – additional arguments that will be used to create the STAC collection, Dataset description, alias, etc.
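A hypothetical sketch of the arguments you might assemble for from_remote_provider, checked against the documented allowed values; the name and URL are illustrative, not real endpoints:

```python
# Illustrative arguments for a remote provider exposing a Search endpoint
# over HTTP. Only 'features'/'raster' and 'http'/'grpc' are documented.
remote_args = dict(
    name="my-remote-provider",
    url="https://provider.example.com",
    data_api="features",        # provider implements Search
    transport_protocol="http",  # or "grpc"
    feature_limit=2000,
)
assert remote_args["data_api"] in ("features", "raster")
assert remote_args["transport_protocol"] in ("http", "grpc")
```

With a running provider, these would be passed straight through, e.g. `Dataset.from_remote_provider(**remote_args)`, followed by `.save()`.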