Merge pull request #38 from Jena-Earth-Observation-School/feature/loading

Allow overriding of default loading parameters
maawoo authored Dec 13, 2023
2 parents 63f8063 + a7c0c13 commit 780bafd
Showing 7 changed files with 252 additions and 39 deletions.
51 changes: 51 additions & 0 deletions docs/content/02_Getting_Started/01_00_Data_Access.md
@@ -76,6 +76,57 @@ defining an area of interest (e.g., using https://geojson.io/). Develop your
workflow on a small subset of the data before scaling up.
```

### Advanced: Overriding the default loading parameters

```{warning}
The following section is only relevant if you deliberately want to override the
default loading parameters and if you know what you are doing. If you are not
sure, please skip this section and just be happy with the default values 🙂 or
get in contact with me to discuss your use case.
```

All data products except MSWEP are loaded internally with the
[`odc.stac.load`](https://odc-stac.readthedocs.io/en/latest/_api/odc.stac.load.html#odc-stac-load)
function. As mentioned above, some loading parameters are set to default values
to make this package beginner-friendly and easier to use. Specifically, the
following defaults are used:
- `crs='EPSG:4326'`
- `resolution=0.0002`
- `resampling='bilinear'`
- `chunks={'time': -1, 'latitude': 'auto', 'longitude': 'auto'}`

The default values for `crs` and `resolution`, for example, are the native CRS
and resolution of the Sentinel-1 RTC and Sentinel-2 L2A products (for most bands
of the latter, at least). The `resampling` parameter is only relevant if a data
product needs to be reprojected. Note that it is overridden by default for the
SANLC product to use `resampling='nearest'`, which is a better choice for
categorical data. The `chunks` parameter controls how the data is split when it
is loaded lazily with Dask (see {ref}`xarray-dask-intro` for more information).
The default values have been chosen to work well for time series analysis
(`'time': -1` keeps the whole time dimension in a single chunk) and to be
memory-efficient (`'auto'` lets Dask choose the chunk size along the spatial
dimensions based on its default).
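
To see why `resampling='nearest'` suits categorical data such as a land cover
map, consider a one-dimensional toy case. The sketch below is plain Python and
is not the resampling code used by `odc.stac.load`; the class codes and the
helper names `linear_sample` and `nearest_sample` are made up for illustration.

```python
def linear_sample(values, pos):
    """Linearly interpolate `values` at the fractional index `pos`."""
    lo = int(pos)
    hi = min(lo + 1, len(values) - 1)
    frac = pos - lo
    return values[lo] * (1 - frac) + values[hi] * frac


def nearest_sample(values, pos):
    """Return the value of the cell closest to the fractional index `pos`."""
    idx = min(int(pos + 0.5), len(values) - 1)
    return values[idx]


# Hypothetical class codes: 1 = water, 5 = forest (there is no class 3).
land_cover = [1, 5]

interpolated = linear_sample(land_cover, 0.5)
nearest = nearest_sample(land_cover, 0.5)

print(interpolated)  # 3.0, a code that does not correspond to any class
print(nearest)       # 5, always one of the input classes
```

Interpolation can invent class codes that do not exist in the legend, whereas
nearest-neighbour sampling only ever returns values already present in the data.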

If you want to override these defaults or pass additional parameters that
influence the loading process, provide the `override_defaults` parameter to the
`load_product` function. This parameter must be a dictionary whose keys are
parameter names of the
[`odc.stac.load`](https://odc-stac.readthedocs.io/en/latest/_api/odc.stac.load.html#odc-stac-load)
function and whose values are the desired settings. Keys that `odc.stac.load`
does not accept as keyword-only arguments raise a `ValueError`. You can also
override only some of the defaults and keep the rest unchanged. The following
simple example overrides only the default `resolution` when loading the
Sentinel-1 RTC product:

```{code-block} python
from sdc.load import load_product

# Override only the resolution; all other defaults stay unchanged
override_defaults = {"resolution": 0.0001}
s1_data = load_product(product="s1_rtc",
                       vec="/path/to/my_area_of_interest.geojson",
                       time_range=("2020-01-01", "2021-01-01"),
                       override_defaults=override_defaults)
```
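
Partial overriding can be pictured as a plain dictionary update. The following
is a minimal standalone sketch and not the package's own code: `DEFAULTS` and
`merge_params` are hypothetical names, and the default values are simply copied
from the list above.

```python
# Defaults as listed in the documentation above.
DEFAULTS = {
    "crs": "EPSG:4326",
    "resolution": 0.0002,
    "resampling": "bilinear",
    "chunks": {"time": -1, "latitude": "auto", "longitude": "auto"},
}


def merge_params(defaults, override_defaults=None):
    """Return a copy of `defaults` with the given keys replaced."""
    params = dict(defaults)  # shallow copy, so DEFAULTS itself is not modified
    if override_defaults is not None:
        params.update(override_defaults)
    return params


params = merge_params(DEFAULTS, {"resolution": 0.0001})
print(params["resolution"])  # 0.0001 (overridden)
print(params["resampling"])  # bilinear (default kept)
```

With a plain `update`, as in this sketch, a nested value such as `chunks` is
replaced as a whole rather than merged key by key, so an override for `chunks`
should spell out every dimension.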

(xarray-dask-intro)=
## Xarray, Dask and lazy loading

45 changes: 37 additions & 8 deletions sdc/load.py
@@ -12,7 +12,8 @@ def load_product(product: str,
 time_range: Optional[tuple[str, str]] = None,
 time_pattern: Optional[str] = None,
 s2_apply_mask: bool = True,
-sanlc_year: Optional[int] = None
+sanlc_year: Optional[int] = None,
+override_defaults: Optional[dict] = None
 ) -> Dataset | DataArray:
"""
Load data products available in the SALDi Data Cube (SDC).
@@ -49,6 +50,17 @@ def load_product(product: str,
Currently supported years are:
- 2018
- 2020
override_defaults : dict, optional
Dictionary of loading parameters to override the default parameters with.
Partial overriding is possible, i.e. only override a specific parameter while
keeping the others at their default values. For an overview of allowed
parameters, see documentation of `odc.stac.load`:
https://odc-stac.readthedocs.io/en/latest/_api/odc.stac.load.html#odc-stac-load
If `None` (default), the default parameters will be used:
- crs: 'EPSG:4326'
- resolution: 0.0002
- resampling: 'bilinear'
- chunks: {'time': -1, 'latitude': 'auto', 'longitude': 'auto'}
Returns
-------
@@ -57,7 +69,7 @@ def load_product(product: str,
 """
 if vec.lower() in ['site01', 'site02', 'site03', 'site04', 'site05', 'site06']:
 if product in ['s1_rtc', 's1_surfmi', 's2_l2a']:
-print("WARNING: Loading data for an entire SALDi site will likely result "
+print("[WARNING] Loading data for an entire SALDi site will likely result "
 "in performance issues as it will load data from multiple tiles. "
 "Only do so if you know what you are doing and have optimized your "
 "workflow! It is recommended to start with a small subset to test "
@@ -66,24 +78,41 @@ def load_product(product: str,
 else:
 bounds = fiona.open(vec, 'r').bounds

+if override_defaults is not None:
+print("[WARNING] Overriding default loading parameters is only recommended for "
+"advanced users. Start with the default parameters and only override "
+"them if you know what you are doing.")
+if product == 'mswep':
+print("[INFO] Overriding default loading parameters is currently not "
+"supported for the MSWEP product. Default parameters will be used "
+"instead.")
+
 kwargs = {'bounds': bounds,
 'time_range': time_range,
 'time_pattern': time_pattern}

 if product == 's1_rtc':
-ds = prod.load_s1_rtc(**kwargs)
+ds = prod.load_s1_rtc(override_defaults=override_defaults,
+**kwargs)
 elif product == 's1_surfmi':
-ds = prod.load_s1_surfmi(**kwargs)
+ds = prod.load_s1_surfmi(override_defaults=override_defaults,
+**kwargs)
 elif product == 's1_coh':
-ds = prod.load_s1_coherence(**kwargs)
+ds = prod.load_s1_coherence(override_defaults=override_defaults,
+**kwargs)
 elif product == 's2_l2a':
-ds = prod.load_s2_l2a(apply_mask=s2_apply_mask, **kwargs)
+ds = prod.load_s2_l2a(apply_mask=s2_apply_mask,
+override_defaults=override_defaults,
+**kwargs)
 elif product == 'sanlc':
-ds = prod.load_sanlc(bounds=bounds, year=sanlc_year)
+ds = prod.load_sanlc(bounds=bounds,
+year=sanlc_year,
+override_defaults=override_defaults)
 elif product == 'mswep':
 ds = prod.load_mswep(**kwargs)
 elif product == 'cop_dem':
-ds = prod.load_copdem(bounds=bounds)
+ds = prod.load_copdem(bounds=bounds,
+override_defaults=override_defaults)
 else:
 raise ValueError(f'Product {product} not supported')

34 changes: 34 additions & 0 deletions sdc/products/_ancillary.py
@@ -1,5 +1,6 @@
from copy import deepcopy
from pathlib import Path
import inspect

from typing import Any
from pystac import Catalog, Collection, Item
@@ -50,6 +51,39 @@ def common_params() -> dict[str, Any]:
"chunks": {'time': -1, 'latitude': 'auto', 'longitude': 'auto'}}


def override_common_params(params: dict[str, Any],
verbose: bool = True,
**kwargs: Any
) -> dict[str, Any]:
"""
Overrides the common parameters with the provided keyword arguments.
Parameters
----------
params : dict
Dictionary of parameters to override.
verbose : bool
Whether to print the parameters after overriding.
**kwargs : Any
Keyword arguments to override the parameters with.
Returns
-------
dict
A dictionary of the overridden parameters.
"""
from odc.stac import load as odc_stac_load
allowed = inspect.getfullargspec(odc_stac_load).kwonlyargs

for key in kwargs:
if key not in allowed:
raise ValueError(f"Parameter '{key}' is not allowed.")
params.update(kwargs)
if verbose:
print(f"[INFO] odc.stac.load parameters: {params}")
return params


def convert_asset_hrefs(list_stac_obj: list[Catalog | Collection | Item],
href_type: str
) -> list[Catalog | Collection | Item] | list[None]:
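
The key check in `override_common_params` above relies on
`inspect.getfullargspec(...).kwonlyargs`, which lists only the keyword-only
parameters of a function. The sketch below illustrates that behaviour with a
stand-in function so that it runs without `odc-stac` installed; `fake_load` and
`validate_overrides` are hypothetical names and the signature of `fake_load` is
invented, not the real signature of `odc.stac.load`.

```python
import inspect


def fake_load(items, bands=None, *, crs=None, resolution=None, chunks=None):
    """Hypothetical stand-in for a loader with keyword-only options."""
    return {"crs": crs, "resolution": resolution, "chunks": chunks}


def validate_overrides(func, overrides):
    """Raise ValueError for keys that are not keyword-only arguments of `func`."""
    allowed = inspect.getfullargspec(func).kwonlyargs
    for key in overrides:
        if key not in allowed:
            raise ValueError(f"Parameter '{key}' is not allowed.")
    return overrides


print(inspect.getfullargspec(fake_load).kwonlyargs)
# ['crs', 'resolution', 'chunks']

validate_overrides(fake_load, {"resolution": 0.0001})  # passes silently

try:
    validate_overrides(fake_load, {"resolutoin": 0.0001})  # misspelled key
except ValueError as err:
    print(err)  # Parameter 'resolutoin' is not allowed.
```

In this sketch, parameters that can also be passed positionally (`items` and
`bands`) are not part of `kwonlyargs`, so they would be rejected as overrides
as well. A check of this kind catches misspelled keys before they reach the
loader.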
24 changes: 20 additions & 4 deletions sdc/products/copdem.py
@@ -4,14 +4,16 @@
 from rasterio.enums import Resampling
 from xrspatial import slope, aspect

+from typing import Optional
 from xarray import DataArray

 from sdc.products import _ancillary as anc
 from sdc.products import _query as query


-def load_copdem(bounds: tuple[float, float, float, float]
-) -> DataArray:
+def load_copdem(bounds: tuple[float, float, float, float],
+override_defaults: Optional[dict] = None
+) -> DataArray:
"""
Loads the Copernicus 30m GLO DEM (COP-DEM) data product for an area of interest.
@@ -20,6 +22,17 @@ def load_copdem(bounds: tuple[float, float, float, float]
bounds: tuple of float
The bounding box of the area of interest in the format (minx, miny, maxx, maxy).
Will be used to filter the STAC Catalog for intersecting STAC Collections.
override_defaults : dict, optional
Dictionary of loading parameters to override the default parameters with.
Partial overriding is possible, i.e. only override a specific parameter while
keeping the others at their default values. For an overview of allowed
parameters, see documentation of `odc.stac.load`:
https://odc-stac.readthedocs.io/en/latest/_api/odc.stac.load.html#odc-stac-load
If `None` (default), the default parameters will be used:
- crs: 'EPSG:4326'
- resolution: 0.0002
- resampling: 'bilinear'
- chunks: {'time': -1, 'latitude': 'auto', 'longitude': 'auto'}
Returns
-------
@@ -31,9 +44,12 @@ def load_copdem(bounds: tuple[float, float, float, float]

 catalog = Catalog.from_file(anc.get_catalog_path(product=product))
 _, items = query.filter_stac_catalog(catalog=catalog, bbox=bounds)
-
+
+params = anc.common_params()
+if override_defaults is not None:
+params = anc.override_common_params(params=params, **override_defaults)
 da = odc_stac_load(items=items, bands=bands, bbox=bounds, dtype='float32',
-**anc.common_params())
+**params)
 da = da.height.squeeze()

# Calculate slope and aspect