geodesic.boson.dataset.Dataset.from_cloud_hosted_imagery#
- static Dataset.from_cloud_hosted_imagery(name, url=None, regex_pattern=None, glob_pattern=None, region=None, s3_endpoint=None, datetime_field=None, start_datetime_field=None, end_datetime_field=None, datetime_filename_pattern=None, start_datetime_filename_pattern=None, end_datetime_filename_pattern=None, metadata_pattern=None, match_full_path=False, orthorectification_altitude=None, feature_limit=2000, oriented=False, no_data=None, credential=None, pattern=None, middleware={}, cache={}, tile_options={}, domain='*', category='*', type='*', **kwargs)[source]#
Creates a new Dataset from imagery hosted in a Cloud Storage Bucket (S3/GCP/Azure).
- Parameters:
name (str) – name of the Dataset to create
url (str | None) – the url to the bucket, including the prefix (ex. s3://my-bucket/myprefix, gs://my-bucket/myprefix, …)
regex_pattern (str | None) – a regex pattern to filter for files to index (e.g. .*\.tif)
glob_pattern (str | None) – a glob pattern to filter for files to index (e.g. *.tif)
region (str | None) – for S3 buckets, the region where the bucket is
s3_endpoint (str | None) – for S3 buckets, the endpoint to use (e.g. https://data.source.coop). If not provided, will use the default
datetime_field (str | None) – the name of the metadata key on the file to find a timestamp
start_datetime_field (str | None) – the name of the metadata key on the file to find a start timestamp
end_datetime_field (str | None) – the name of the metadata key on the file to find an end timestamp
datetime_filename_pattern (str | None) – a regex pattern to extract a datetime from the filename
start_datetime_filename_pattern (str | None) – a regex pattern to extract a start datetime from the filename
end_datetime_filename_pattern (str | None) – a regex pattern to extract an end datetime from the filename
metadata_pattern (str | None) – a regex pattern to extract metadata from the filename
match_full_path (bool) – if True, will match the full path/key of the file to the datetime and metadata patterns. If False, will only match the filename.
orthorectification_altitude (float | None) – the altitude in meters (above mean sea level) to use for orthorectification. If not provided, will use the mean sea level. Not needed if imagery is orthorectified.
feature_limit (int) – the max number of features to return in a single page from a search. Defaults to 2000
oriented (bool) – Is this oriented imagery? If so, EXIF data will be parsed for geolocation. Anything missing location info will be dropped.
no_data (list | tuple | None) – a list of no data values to be treated as “no data” in source imagery
pattern (str | None) – (DEPRECATED: use regex_pattern or glob_pattern instead) a regex to filter for files to index
credential (str | None) – the name or uid of the credential to access the bucket.
middleware (MiddlewareConfig | list) – configure any boson middleware to be applied to the new dataset.
cache (CacheConfig) – configure caching for this dataset
tile_options (TileOptions) – configure tile options for this dataset
domain (str) – domain of the resulting Dataset
category (str) – category of the resulting Dataset
type (str) – the type of the resulting Dataset
**kwargs – additional properties to set on the new Dataset
- Returns:
a new Dataset.
- Return type:
Dataset
Examples
>>> ds = Dataset.from_cloud_hosted_imagery(
...     name="bucket-dataset",
...     url="s3://my-bucket/myprefix",
...     glob_pattern=r"*.tif",
...     region="us-west-2",
...     datetime_field="TIFFTAG_DATETIME",
...     oriented=False,
...     credential="my-iam-user",
...     description="my dataset is the bomb"
... )
>>> ds.stage()
>>> # Staging is optional, but is a useful tool for validating configuration
>>> ds.save()
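When the timestamp lives in the filename rather than in the image metadata, datetime_filename_pattern takes a regex in place of datetime_field. As a rough sketch (the filename layout and pattern below are hypothetical, not from the geodesic docs), such a pattern can be sanity-checked locally with Python's re module before creating the dataset:

```python
import re
from datetime import datetime

# Hypothetical filename layout: scene_20230614T102030.tif
pattern = r"scene_(?P<dt>\d{8}T\d{6})\.tif"

match = re.search(pattern, "scene_20230614T102030.tif")
assert match is not None

# Parse the captured group to confirm the pattern isolates a valid timestamp
dt = datetime.strptime(match.group("dt"), "%Y%m%dT%H%M%S")
print(dt.isoformat())  # 2023-06-14T10:20:30
```

Exactly how Boson parses the captured value server-side is not shown here; this only verifies that the regex captures something strptime-parseable.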
>>> # Extract 'quadkey' and 'filename' as additional properties from the path using
>>> # a regular expression with named capture groups. If you need to match the entire
>>> # path, also set `match_full_path=True`
>>> #
>>> # Example image path: s3://my-bucket/031311100221/12/34/56.tif
>>> # We recommend testing your expression against the expected path using a tool such
>>> # as https://regex101.com
>>> #
>>> ds = Dataset.from_cloud_hosted_imagery(
...     name="bucket-dataset",
...     url="s3://my-bucket",
...     regex_pattern=r".*\.tif",
...     metadata_pattern=r"\/(?P<quadkey>\d{12})\/.*\/(?P<filename>.*)\.tif",
...     match_full_path=True,
...     oriented=False,
...     credential="my-iam-user",
...     description="my dataset has extra properties"
... )
>>> ds.save()
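The docstring recommends testing the expression on regex101.com; the same check can be scripted with Python's re module. This sketch verifies that the metadata_pattern from the example above captures the expected groups from the example path (a local approximation only; how Boson applies the pattern internally is not shown here):

```python
import re

# metadata_pattern and example path taken from the example above
pattern = r"\/(?P<quadkey>\d{12})\/.*\/(?P<filename>.*)\.tif"
path = "s3://my-bucket/031311100221/12/34/56.tif"

match = re.search(pattern, path)
assert match is not None
print(match.group("quadkey"))   # 031311100221
print(match.group("filename"))  # 56
```

Note that re.search scans the whole string, which mirrors matching against the full path as enabled by match_full_path=True.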