geodesic.boson.dataset.Dataset.from_cloud_hosted_imagery#

static Dataset.from_cloud_hosted_imagery(name, url=None, regex_pattern=None, glob_pattern=None, region=None, s3_endpoint=None, datetime_field=None, start_datetime_field=None, end_datetime_field=None, datetime_filename_pattern=None, start_datetime_filename_pattern=None, end_datetime_filename_pattern=None, metadata_pattern=None, match_full_path=False, orthorectification_altitude=None, feature_limit=2000, oriented=False, no_data=None, credential=None, pattern=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]#

Creates a new Dataset from imagery hosted in a Cloud Storage Bucket (S3/GCP/Azure).

Parameters:
  • name (str) – name of the Dataset to create

  • url (str | None) – the url to the bucket, including the prefix (ex. s3://my-bucket/myprefix, gs://my-bucket/myprefix, …)

  • regex_pattern (str | None) – a regex pattern to filter for files to index (e.g. .*\.tif)

  • glob_pattern (str | None) – a glob pattern to filter for files to index (e.g. *.tif)

  • region (str | None) – for S3 buckets, the region where the bucket is

  • s3_endpoint (str | None) – for S3 buckets, the endpoint to use (e.g. https://data.source.coop). If not provided, will use the default

  • datetime_field (str | None) – the name of the metadata key on the file to find a timestamp

  • start_datetime_field (str | None) – the name of the metadata key on the file to find a start timestamp

  • end_datetime_field (str | None) – the name of the metadata key on the file to find an end timestamp

  • datetime_filename_pattern (str | None) – a regex pattern to extract a datetime from the filename

  • start_datetime_filename_pattern (str | None) – a regex pattern to extract a start datetime from the filename

  • end_datetime_filename_pattern (str | None) – a regex pattern to extract an end datetime from the filename

  • metadata_pattern (str | None) – a regex pattern to extract metadata from the filename

  • match_full_path (bool) – if True, will match the full path/key of the file to the datetime and metadata patterns. If False, will only match the filename.

  • orthorectification_altitude (float | None) – the altitude in meters (above mean sea level) to use for orthorectification. If not provided, will use the mean sea level. Not needed if imagery is orthorectified.

  • feature_limit (int) – the max number of features to return in a single page from a search. Defaults to 2000

  • oriented (bool) – if True, the imagery is treated as oriented imagery: EXIF data will be parsed for geolocation, and any file missing location info will be dropped.

  • no_data (list | tuple | None) – values to be treated as “no data” in the source imagery

  • pattern (str | None) – (DEPRECATED: use regex_pattern or glob_pattern instead) a regex to filter for files to index

  • credential (str | None) – the name or uid of the credential to access the bucket.

  • middleware (MiddlewareConfig | list) – configure any boson middleware to be applied to the new dataset.

  • cache (CacheConfig) – configure caching for this dataset

  • tile_options (TileOptions) – configure tile options for this dataset

  • domain (str) – domain of the resulting Dataset

  • category (str) – category of the resulting Dataset

  • type (str) – the type of the resulting Dataset

  • **kwargs – additional properties to set on the new Dataset
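Before wiring a filter into the dataset, it can help to sanity-check regex_pattern or glob_pattern locally. Whether Boson anchors its regex matching is not specified here, so the sketch below only illustrates standard Python semantics with hypothetical object keys:

```python
import re
from fnmatch import fnmatch

# Hypothetical object keys under the bucket prefix
keys = ["scene_001.tif", "scene_001.tif.aux.xml", "readme.txt"]

# Shell-style glob: "*.tif" matches only keys ending in ".tif"
glob_hits = [k for k in keys if fnmatch(k, "*.tif")]

# Unanchored regex: r".*\.tif" also matches the sidecar ".tif.aux.xml" key;
# add a "$" anchor if you only want keys that end in ".tif"
regex_hits = [k for k in keys if re.search(r".*\.tif", k)]

print(glob_hits)   # ['scene_001.tif']
print(regex_hits)  # ['scene_001.tif', 'scene_001.tif.aux.xml']
```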

Returns:

a new Dataset.

Return type:

Dataset

Examples

>>> ds = Dataset.from_cloud_hosted_imagery(
...         name="bucket-dataset",
...         url="s3://my-bucket/myprefix",
...         glob_pattern="*.tif",
...         region="us-west-2",
...         datetime_field="TIFFTAG_DATETIME",
...         oriented=False,
...         credential="my-iam-user",
...         description="my dataset is the bomb"
... )
>>> ds.stage()
>>> # Staging is optional, but is a useful tool for validating configuration
>>> ds.save()
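When timestamps live in the object key rather than in image metadata, datetime_filename_pattern can extract them instead of datetime_field. The capture-group convention Boson expects is not documented in this section, so the following is only a sketch of testing a candidate regex against a hypothetical key layout before using it:

```python
import re
from datetime import datetime

# Hypothetical key layout: scene_20230115T103000.tif
pattern = r"scene_(?P<datetime>\d{8}T\d{6})\.tif"

m = re.search(pattern, "scene_20230115T103000.tif")
ts = datetime.strptime(m.group("datetime"), "%Y%m%dT%H%M%S")
print(ts.isoformat())  # 2023-01-15T10:30:00
```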
>>> # Extract 'quadkey' and 'filename' as additional properties from the path using
>>> # a regular expression with named capture groups. If you need to match the entire
>>> # path, also set `match_full_path=True`
>>> #
>>> # Example image path: s3://my-bucket/031311100221/12/34/56.tif
>>> # We recommend testing your expression against the expected path using a tool such
>>> # as https://regex101.com
>>> #
>>> ds = Dataset.from_cloud_hosted_imagery(
...         name="bucket-dataset",
...         url="s3://my-bucket",
...         regex_pattern=r".*\.tif",
...         metadata_pattern=r"\/(?P<quadkey>\d{12})\/.*\/(?P<filename>.*)\.tif",
...         match_full_path=True,
...         oriented=False,
...         credential="my-iam-user",
...         description="my dataset has extra properties"
... )
>>> ds.save()
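The metadata_pattern above can be verified locally against the example path before saving the dataset; each named capture group becomes a property value:

```python
import re

path = "s3://my-bucket/031311100221/12/34/56.tif"
pattern = r"\/(?P<quadkey>\d{12})\/.*\/(?P<filename>.*)\.tif"

# The named groups carry the extracted properties
m = re.search(pattern, path)
print(m.groupdict())  # {'quadkey': '031311100221', 'filename': '56'}
```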