Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiband Raster Support #107

Closed
julianblue opened this issue Mar 22, 2023 · 10 comments · Fixed by #111
Closed

Multiband Raster Support #107

julianblue opened this issue Mar 22, 2023 · 10 comments · Fixed by #111

Comments

@julianblue
Copy link

@Kirill888 , is there support for loading mutliband cogs ind odc-stac ?
Screenshot 2023-03-22 at 09 29 09

For example, the visual band is multi band and only loads a single band.

Thanks in advance.

@Kirill888
Copy link
Member

odc-stac does support multi-band images but it requires stac items to use eo extension and describe all the bands.

But this is mainly for "hyper-spectral" type of data stored within a single TIFF image, and not for "RGB(A)" assets, eo:bands should still work for those too.

Can you provide an example stac item from which screenshot above was generated.

@Kirill888
Copy link
Member

@julianblue rgb(a) sources introduce a bit of a conundrum actually, as they can be seen as either

  1. 3 or 4 separate bands with common footprint and aligned pixels (t,y,x coords on 3/4 DataArrays)
  2. As as single "visual band" (t,y,x,band coords on a single DataArray)

I'm guessing your expectation for the output would be option (2)? Currently only option (1) is supported and only if eo:bands are present on the asset being loaded. I'm not aware of any STAC extension mechanism to indicate that loading data in t,y,x,band format should be preferred as an asset contains rgb(a) imagery. Maybe @gadomski or @TomAugspurger can comment on the STAC side of things?

@gadomski
Copy link
Contributor

I'm not aware of any STAC extension mechanism to indicate that loading data in t,y,x,band format should be preferred as an asset contains rgb(a) imagery.

I think the closest we get is the visual role, which is described as:

An asset that is a full resolution version of the data, processed for visual use (RGB only, often sharpened (pan-sharpened and/or using an unsharp mask)).

A second huersitc would be to look at eo:bands->common_name and prefer a three-band representation if red, green, and blue were present.

But I think I would prefer the visual role as the preferred way to say "this asset should be loaded as RGB imagery".

@julianblue
Copy link
Author

julianblue commented Mar 27, 2023

odc-stac does support multi-band images but it requires stac items to use eo extension and describe all the bands.

But this is mainly for "hyper-spectral" type of data stored within a single TIFF image, and not for "RGB(A)" assets, eo:bands should still work for those too.

Can you provide an example stac item from which screenshot above was generated.

The example I mentioned was for the true color image of a sentinel 2-l2a tile. https://stacindex.org/catalogs/earth-search#/item/43bjKKcJQfxYaT1ir3Ep6uENfjEoQrjkzhd2/NopKk2m24tJYiekqbLp72LWk57p1ZsViRLdV3HiFw13eHu73QTYZUKAB?t=2#9/-6.825703/133.001187
Ideally the returned xarray object would have the visual raster listed as a single band/data variable, although I see the complications of inconsistent band dimensions through all data variables.

To extend this question, I am especially looking to load multi band COGS though odc-stac. For example, RGBNIR Planet scope data. What would a stac item need to look like to load this data via odc-stac ?

@Kirill888
Copy link
Member

Kirill888 commented Mar 27, 2023

@julianblue I had a deeper look at this and refreshed my memory

  • My previous comment about eo:bands was wrong, rather it needs to have raster:bands extension, which is not present on the visual band in that collection, even though eo:bands ARE defined (@gadomski is this by design or an omission?).
{'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/S/TF/2023/3/S2A_30STF_20230322_0_L2A/TCI.tif',
 'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
 'title': 'True color image',
 'eo:bands': [{'name': 'red',
   'common_name': 'red',
   'description': 'Red (band 4)',
   'center_wavelength': 0.665,
   'full_width_half_max': 0.038},
  {'name': 'green',
   'common_name': 'green',
   'description': 'Green (band 3)',
   'center_wavelength': 0.56,
   'full_width_half_max': 0.045},
  {'name': 'blue',
   'common_name': 'blue',
   'description': 'Blue (band 2)',
   'center_wavelength': 0.49,
   'full_width_half_max': 0.098}],
 'proj:shape': [10980, 10980],
 'proj:transform': [10, 0, 199980, 0, -10, 4100040],
 'roles': ['visual']}

In comparison a single channel green asset, does define both eo:bands and raster:bands:

{'href': 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/S/TF/2023/3/S2A_30STF_20230322_0_L2A/B03.tif',
 'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
 'title': 'Green (band 3) - 10m',
 'eo:bands': [{'name': 'green',
   'common_name': 'green',
   'description': 'Green (band 3)',
   'center_wavelength': 0.56,
   'full_width_half_max': 0.045}],
 'gsd': 10,
 'proj:shape': [10980, 10980],
 'proj:transform': [10, 0, 199980, 0, -10, 4100040],
 'raster:bands': [{'nodata': 0,
   'data_type': 'uint16',
   'bits_per_sample': 15,
   'spatial_resolution': 10,
   'scale': 0.0001,
   'offset': -0.1}],
 'roles': ['data', 'reflectance']}
  • After patching item to include raster:bands, I was able to load all three visual bands, BUT there is an error in code and it ends up loading the first band into all three, but that's an easy fix.

Problem is here:

asset_name, _ = bk
asset = _assets.get(asset_name)
if asset is None:
continue
grid_name = band2grid.get(asset_name, "default")
geobox: Optional[GeoBox] = _get_grid(grid_name, asset) if has_proj else None
uri = asset.get_absolute_href()
if uri is None:
raise ValueError(
f"Can not determine absolute path for band: {asset_name}"
) # pragma: no cover (https://github.com/stac-utils/pystac/issues/754)
bands[bk] = RasterSource(uri=uri, geobox=geobox, meta=meta)

band index gets ignored and all the bands from the same asset end up loading the first band of the image.

from odc.stac import load as stac_load, __version__
from pystac import Item
from pystac.extensions.raster import RasterExtension
from odc.stac import load as stac_load
from odc.stac._mdtools import parse_item

print(f"odc.stac=={__version__}")

item = Item.from_file("https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a/items/S2A_30STF_20230322_0_L2A")

# inject raster:bands
dd = item.to_dict()
dd['assets']['visual']['raster:bands'] = [{'data_type': 'uint8'}]*3
item_patched = Item.from_dict(dd)

assert RasterExtension.ext(item.assets['visual']).bands is None
assert RasterExtension.ext(item_patched.assets['visual']).bands is not None

p = parse_item(item)
p_ = parse_item(item_patched)

print("Original")
display(p.collection.bands, p.bands)
print("\nPatched")
display(p_.collection.bands, p_.bands)

Error in patched parsed data is here:

('visual',  1): RasterSource(uri='https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/S/TF/2023/3/S2A_30STF_20230322_0_L2A/TCI.tif',
band=1, subdataset=None, geobox=GeoBox((10980, 10980), Affine(10.0, 0.0, 199980.0,
        0.0, -10.0, 4100040.0), CRS('EPSG:32630')), meta=RasterBandMetadata(data_type='uint8', nodata=None, unit='1')),
 ('visual',  2): RasterSource(uri='https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/S/TF/2023/3/S2A_30STF_20230322_0_L2A/TCI.tif', 
band=1, subdataset=None, geobox=GeoBox((10980, 10980), Affine(10.0, 0.0, 199980.0,
        0.0, -10.0, 4100040.0), CRS('EPSG:32630')), meta=RasterBandMetadata(data_type='uint8', nodata=None, unit='1')),
 ('visual',  3): RasterSource(uri='https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/30/S/TF/2023/3/S2A_30STF_20230322_0_L2A/TCI.tif',
band=1, subdataset=None, geobox=GeoBox((10980, 10980), Affine(10.0, 0.0, 199980.0,
        0.0, -10.0, 4100040.0), CRS('EPSG:32630')), meta=RasterBandMetadata(data_type='uint8', nodata=None, unit='1')),

band=1 for all 3 visual bands, but should be 1,2,3.

@Kirill888
Copy link
Member

After patching data and patching code I was able to load rgb

xx = stac_load(
    [item_patched],
    bands=("visual.1", "visual.2", "visual.3"),
    chunks={},
    resolution=160,
)
rgb = xx.to_array("band").transpose("time", "y", "x", "band")
display(rgb)
rgb.compute().plot.imshow(col='time');

image

Kirill888 added a commit that referenced this issue Mar 28, 2023
- Band index was always set to 1 on the parsed item
- Conversion to datacube dataset also had an issue with band index:
  it was adding +1 to an already 1 based index.
Kirill888 added a commit that referenced this issue Mar 28, 2023
- Band index was always set to 1 on the parsed item
- Conversion to datacube dataset also had an issue with band index:
  it was adding +1 to an already 1 based index.
@julianblue
Copy link
Author

This is awesome @Kirill888 !
Just for clarification, is it best practice to have both eo:bands and raster:bands as properties ? Does odc-stac require both to create the xarray object ?

@Kirill888
Copy link
Member

@julianblue

Just for clarification, is it best practice to have both eo:bands and raster:bands as properties ? Does odc-stac require both to create the xarray object ?

This page describes what extensions are used by odc-stac when available:

https://odc-stac.readthedocs.io/en/latest/stac-best-practice.html

odc-stac aspires to be able to load data from any valid STAC collection with GeoTIFF assets, in practice that might not be always possible. STAC by itself guarantees very little beyond a link to the file, timestamp and "approximate" region covered. One can always inspect the image files to figure out everything that raster:bands and proj:* provide. When you are processing thousands of files together to form a data-cube, extracting that metadata from each asset ahead of time is not really feasible (slow, and in case of remote Dask cluster might not be even possible due to access rights), so moving that information into STAC is the best practice.

When extensions are not supplied, rather than inspecting all the assets to figure out things like dtype, nodata, native resolution, width/height and projection of the assets odc-stac either requires information from the user (desired CRS and resolution of the output), or defaults to something reasonable (dtype=float32, 1 single band per asset). There is also user configuration, which is basically an override for missing or incorrect metadata.

We could potentially introduce a mode where we inspect one of the items from a collection and then assume that all of them are like that, this should at least work for dtype, nodata and resolution, and maybe CRS (will need to detect "families" of projections, i.e. UTM).

@julianblue
Copy link
Author

@Kirill888 ,

another question. For this example:

from pystac import Item
import pystac
import odc.stac

item = pystac.read_file(
    "https://planetarycomputer.microsoft.com/api/stac/v1/collections/naip/items/fl_m_2608004_nw_17_060_20191215_20200113"
 )
dd = item.to_dict()
dd['assets']['image']['raster:bands'] = [{'data_type': 'uint8'}]*4
item_patched = Item.from_dict(dd)
ds3 = odc.stac.load([item_patched], chunks={}, resolution=100)
ds3.data_vars

which returns:

Data variables:
    image    (time, y, x) uint8 dask.array<chunksize=(1, 75, 67), meta=np.ndarray>
    Green    (time, y, x) uint8 dask.array<chunksize=(1, 75, 67), meta=np.ndarray>
    Blue     (time, y, x) uint8 dask.array<chunksize=(1, 75, 67), meta=np.ndarray>
    NIR      (time, y, x) uint8 dask.array<chunksize=(1, 75, 67), meta=np.ndarray>

I think there needs to be some modification here ?

odc-stac/odc/stac/_model.py

Lines 126 to 137 in dc9f5d5

def _norm_key(self, k: BandKey) -> str:
asset, idx = k
if idx == 1:
return asset
# if any alias references this key as first choice return that
for alias, (_k, *_) in self.aliases.items():
if _k == k:
return alias
# Finaly use . notation
return f"{asset}.{idx}"

Ideally it should be image.red, image.green ... correct ?

Thanks in advance.

@Kirill888
Copy link
Member

@julianblue in STAC the same "common_name" can be attached to multiple bands of different assets so there is potential for clashing of output bands, so we default to asset name which is guaranteed to be unique. But this approach obviously breaks down for multi-band assets. You can supply bands = ["Red", "Green", "Blue", "NIR"] and it will pick the right band from the right assets and name output data vars properly, but this would break if there are other assets using the same common_name as then there is an ambiguity as to which "red" you mean.

I think logic should check if asset pointed by k has more bands than 1 and act differently

- if idx == 1:
+ if idx == 1 and self.band_count[asset] == 1:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants