Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic time selection #2055

Merged
merged 30 commits into from
Feb 17, 2025
Merged
Show file tree
Hide file tree
Changes from 26 commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
afddae6
Add time selection between dynamic doy
Dec 19, 2024
0593440
Fixing doctsting issue of mask_between_doys
Dec 19, 2024
5ec7ec6
Fix the inclusion of bounds for int doys
Jan 20, 2025
2512024
Merge branch 'main' into dynamic-time-selection
Jan 22, 2025
ac76379
Add details to AUTHORS.rst
Jan 22, 2025
2e40c9f
Add details to zenodo file
Jan 22, 2025
0c815b9
Add a check to ensure that doy bounds have same freq in mask_between_…
Jan 28, 2025
c52cd63
Fix bug when masking between doys on dataset
Jan 28, 2025
36b1862
Edit freq error message in mask_between_doys
Jan 28, 2025
ac20249
Rewrite mask_between_doys to add spatial dims support
aulemahal Jan 31, 2025
ab2fb7b
Merge branch 'main' into dynamic-time-selection
aulemahal Feb 3, 2025
ca47063
Merge branch 'main' into dynamic-time-selection
Zeitsperre Feb 5, 2025
5c91ef3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 5, 2025
775f9ed
Add tests - fix doc - fix end < start case for bounds
aulemahal Feb 11, 2025
4584b7c
Merge branch 'main' into dynamic-time-selection
aulemahal Feb 11, 2025
fccf1b5
Add changelog line
aulemahal Feb 11, 2025
0cea4d1
Merge branch 'main' into dynamic-time-selection
Zeitsperre Feb 11, 2025
bf6bc07
merge and add doy bounds support in expected_count
aulemahal Feb 12, 2025
f87a61d
merge
aulemahal Feb 12, 2025
243c3be
Merge branch 'main' into dynamic-time-selection
Zeitsperre Feb 12, 2025
19f3f42
invert the problem, stop dropping when indexing
aulemahal Feb 12, 2025
dd07766
Merge branch 'main' into dynamic-time-selection
aulemahal Feb 12, 2025
90e696d
Merge branch 'dynamic-time-selection' of github.com:baptistehamon/xcl…
aulemahal Feb 12, 2025
9fb56a2
upd nb example
aulemahal Feb 12, 2025
3446d36
add simple tests for DA indexing
aulemahal Feb 12, 2025
f75e559
Merge branch 'main' into dynamic-time-selection
Zeitsperre Feb 13, 2025
38415e9
add tests with include_bounds
aulemahal Feb 17, 2025
3f44a35
Merge branch 'main' into dynamic-time-selection
aulemahal Feb 17, 2025
cc767b5
Error on drop True and array-like doy-bounds
aulemahal Feb 17, 2025
a96ad8d
Merge branch 'dynamic-time-selection' of github.com:baptistehamon/xcl…
aulemahal Feb 17, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .zenodo.json
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,11 @@
"name": "Lehner, Sebastian",
"affiliation": "GeoSphere Austria, Vienna, Austria",
"orcid": "0000-0002-7562-8172"
},
{
"name": "Hamon, Baptiste",
"affiliation": "University of Canterbury, Christchurch, New Zealand",
"orcid": "0009-0007-4530-9772"
}
],
"keywords": [
Expand Down
1 change: 1 addition & 0 deletions AUTHORS.rst
Original file line number Diff line number Diff line change
Expand Up @@ -48,3 +48,4 @@ Contributors
* Adrien Lamarche `@LamAdr <https://github.com/LamAdr>`_
* Faisal Mahmood <[email protected]> <[email protected]> `@faimahsho <https://github.com/faimahsho>`_
* Sebastian Lehner <[email protected]> `@seblehner <https://github.com/seblehner>`_
* Baptiste Hamon <[email protected]> `@baptistehamon <https://github.com/baptistehamon>`_
8 changes: 5 additions & 3 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ Changelog

v0.55.0 (unreleased)
--------------------
Contributors to this version: Juliette Lavoie (:user:`juliettelavoie`), Trevor James Smith (:user:`Zeitsperre`), Sascha Hofmann (:user:`saschahofmann`), Pascal Bourgault (:user:`aulemahal`), Éric Dupuis (:user:`coxipi`).
Contributors to this version: Juliette Lavoie (:user:`juliettelavoie`), Trevor James Smith (:user:`Zeitsperre`), Sascha Hofmann (:user:`saschahofmann`), Pascal Bourgault (:user:`aulemahal`), Éric Dupuis (:user:`coxipi`), Baptiste Hamon (:user:`baptistehamon`).

Breaking changes
^^^^^^^^^^^^^^^^
Expand All @@ -29,16 +29,18 @@ New features and enhancements
* ``xclim.testing.helpers.test_timeseries`` now accepts a `calendar` argument that is forwarded to ``xr.cftime_range``. (:pull:`2019`).
* New ``xclim.indices.fao_allen98``, exporting the FAO-56 Penman-Monteith equation for potential evapotranspiration (:issue:`2004`, :pull:`2067`).
* Missing values method "pct" and "at_least_n" now accept a new "subfreq" option that allows to compute the missing mask in two steps. When given, the algorithm is applied at this "subfreq" resampling frequency first and then the result is resampled at the target indicator frequency. In the output, a period is invalid if any of its subgroup where flagged as invalid by the chosen method. (:pull:`2058`, :issue:`1820`).
* `rv_continuous` functions can now be given directly as the `dist` argument in ``standardized_precipitation_index`` and ``standardized_precipitation_evapotranspiration_index`` indicators. This includes `lmoments3` distribution when specifying `method="PWM"`. (:issue:`2043`, :pull:`2045`).
* ``rv_continuous`` functions can now be given directly as the ``dist`` argument in ``standardized_precipitation_index`` and ``standardized_precipitation_evapotranspiration_index`` indicators. This includes `lmoments3` distribution when specifying ``method="PWM"``. (:issue:`2043`, :pull:`2045`).
* Time selection in ``xclim.core.calendar.select_time`` and the ``**indexer`` argument of indicators now supports day-of-year bounds given as DataArrays with spatial and/or temporal dimensions. (:issue:`1987`, :pull:`2055`).

Internal changes
^^^^^^^^^^^^^^^^
* `sphinx-codeautolink` and `pygments` have been temporarily pinned due to breaking API changes. (:pull:`2030`).
* Adjusted the ``TestOfficialYaml`` test to use a dynamic method for finding the installed location of `xclim`. (:pull:`2028`).
* Adjusted two tests for better handling when running in Windows environments. (:pull:`2057`).
* Refactor of the ``xclim.core.missing`` module, usage of the ``Missing`` objects has been broken. (:pull:`2058`, :issue:`1820`, :issue:`2000`).
* Refactor of the ``xclim.core.missing`` module, usage of the ``Missing`` objects has been broken. (:pull:`2058`, :pull:`2055`, :pull:`2076`, :issue:`1820`, :issue:`2000`).
- Objects are initialized with their options and then called with the data, input frequency, target frequency and indexer.
- Subclasses receive non-resampled DataArray in their ``is_missing`` methods.
- Subclasses receive the array of valid timesteps ``valid`` instead of ``null``, the invalid ones.
- ``MissingWMO`` now uses ``xclim.indices.helpers.resample_map`` which should greatly improve performance in a dask context.
* There is now a warning stating that `fitkwargs` are not employed when using the `lmoments3` distribution. One exception is the use of `'floc'` which is allowed with the gamma distribution. `'floc'` is used to shift the distribution before computing fitting parameters with the `lmoments3` distribution since ``loc=0`` is always assumed in the library. (:issue:`2043`, :pull:`2045`).

Expand Down
5 changes: 3 additions & 2 deletions docs/notebooks/customize.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,7 @@
"\n",
"Finally, it is possible for advanced users to register their own methods. Xclim's missing methods are in fact class-based. To create a custom missing class, one should implement a subclass of `xclim.core.checks.MissingBase` and override at least the `is_missing` method. This method should take the following arguments:\n",
"\n",
"- `null`, a `DataArray` of the mask of invalid values in the input data array (with the same time coordinate as the raw data).\n",
"- `valid`, a `DataArray` of the mask of valid values in the input data array (with the same time coordinate as the raw data).\n",
"- `count`, `DataArray` of the number of days in each resampled periods\n",
"- `freq`, the resampling frequency.\n",
"\n",
Expand Down Expand Up @@ -185,8 +185,9 @@
" def __init__(self, max_n: int = 5):\n",
" super().__init__(max_n=max_n)\n",
"\n",
" def is_missing(self, null, count, freq):\n",
" def is_missing(self, valid, count, freq):\n",
" \"\"\"Return a boolean mask where True values are for elements that are considered missing and masked on the output.\"\"\"\n",
" null = ~valid\n",
" return (\n",
" null.resample(time=freq).map(longest_run, dim=\"time\")\n",
" >= self.options[\"max_n\"]\n",
Expand Down
146 changes: 130 additions & 16 deletions src/xclim/core/calendar.py
Original file line number Diff line number Diff line change
Expand Up @@ -1037,7 +1037,7 @@ def doy_to_days_since(
# 2cases:
# val is a day in the same year as its index : da - offset
# val is a day in the next year : da + doy_max - offset
out = xr.where(dac > base_doy, dac, dac + doy_max) - start_doy
out = xr.where(dac >= base_doy, dac, dac + doy_max) - start_doy
out.attrs.update(da.attrs)
if start is not None:
out.attrs.update(units=f"days after {start}")
Expand Down Expand Up @@ -1117,12 +1117,136 @@ def days_since_to_doy(
return out.convert_calendar(base_calendar).rename(da.name)


def _get_doys(start: int, end: int, inclusive: tuple[bool, bool]):
"""
Get the day of year list from start to end.

Parameters
----------
start : int
Start day of year.
end : int
End day of year.
inclusive : 2-tuple of booleans
Whether the bounds should be inclusive or not.

Returns
-------
np.ndarray
Array of day of year between the start and end.
"""
if start <= end:
doys = np.arange(start, end + 1)
else:
doys = np.concatenate((np.arange(start, 367), np.arange(0, end + 1)))
if not inclusive[0]:
doys = doys[1:]
if not inclusive[1]:
doys = doys[:-1]
return doys


def mask_between_doys(
da: xr.DataArray,
doy_bounds: tuple[int | xr.DataArray, int | xr.DataArray],
include_bounds: tuple[bool, bool] = [True, True],
) -> xr.DataArray | xr.Dataset:
"""
Mask the data outside the day of year bounds.

Parameters
----------
da : xr.DataArray or xr.Dataset
Input data. It must have a time coordinate.
doy_bounds : 2-tuple of integers or DataArray
The bounds as (start, end) of the period of interest expressed in day-of-year, integers going from
1 (January 1st) to 365 or 366 (December 31st).
If DataArrays are passed, they must have the same coordinates on the dimensions they share.
They may have a time dimension, in which case the masking is done independently for each period defined by the coordinate,
which means the time coordinate must have an inferable frequency (see :py:func:`xr.infer_freq`).
Timesteps of the input not appearing in the time coordinate of the bounds are masked as "outside the bounds".
Missing values (nan) in the start and end bounds default to 1 and 366 respectively in the non-temporal case
and to open bounds (the start and end of the period) in the temporal case.
include_bounds : 2-tuple of booleans
Whether the bounds of `doy_bounds` should be inclusive or not.

Returns
-------
xr.DataArray
Boolean array with the same time coordinate as `da` and any other dimension present on the bounds.
True value inside the period of interest and False outside.
"""
if isinstance(doy_bounds[0], int) and isinstance(doy_bounds[1], int): # Simple case
mask = da.time.dt.dayofyear.isin(_get_doys(*doy_bounds, include_bounds))
else:
start, end = doy_bounds
# convert ints to DataArrays
if isinstance(start, int):
start = xr.full_like(end, start)
elif isinstance(end, int):
end = xr.full_like(start, end)
# Ensure they both have the same dims
# align join='exact' will fail on common but different coords, broadcast will add missing coords
start, end = xr.broadcast(*xr.align(start, end, join="exact"))

if not include_bounds[0]:
start += 1
if not include_bounds[1]:
end -= 1

if "time" in start.dims:
freq = xr.infer_freq(start.time)
# Convert the doy bounds to a duration since the beginning of each period defined in the bound's time coordinate
# Also ensures the bounds share the sime time calendar as the input
# Any missing value is replaced with the min/max of possible values
calkws = dict(
calendar=da.time.dt.calendar, use_cftime=(da.time.dtype == "O")
)
start = doy_to_days_since(start.convert_calendar(**calkws)).fillna(0)
end = doy_to_days_since(end.convert_calendar(**calkws)).fillna(366)

out = []
# For each period, mask the days since between start and end
for base_time, indexes in da.resample(time=freq).groups.items():
group = da.isel(time=indexes)

if base_time in start.time:
start_d = start.sel(time=base_time)
end_d = end.sel(time=base_time)

# select days between start and end for group
days = (group.time - base_time).dt.days
days = days.where(days >= 0)
mask = (days >= start_d) & (days <= end_d)
else: # This group has no defined bounds : put False in the mask
# Array with the same shape as the "mask" in the other case : broadcast of time and bounds dims
template = xr.broadcast(
group.time.dt.day, start.isel(time=0, drop=True)
)[0]
mask = xr.full_like(template, False, dtype="bool")
out.append(mask)
mask = xr.concat(out, dim="time")
else: # Only "Spatial" dims, we can't constrain as in days since, so there are two cases
doys = da.time.dt.dayofyear # for readability
# Any missing value is replaced with the min/max of possible values
start = start.fillna(1)
end = end.fillna(366)
mask = xr.where(
start <= end,
# case 1 : start <= end, ROI is within a calendar year
(doys >= start) & (doys <= end),
# case 2 : start > end, ROI crosses the new year
~((doys > end) & (doys < start)),
)
return mask


def select_time(
da: xr.DataArray | xr.Dataset,
drop: bool = False,
season: str | Sequence[str] | None = None,
month: int | Sequence[int] | None = None,
doy_bounds: tuple[int, int] | None = None,
doy_bounds: tuple[int | xr.DataArray, int | xr.DataArray] | None = None,
date_bounds: tuple[str, str] | None = None,
include_bounds: bool | tuple[bool, bool] = True,
) -> DataType:
Expand All @@ -1143,9 +1267,10 @@ def select_time(
One or more of 'DJF', 'MAM', 'JJA' and 'SON'.
month : int or sequence of int, optional
Sequence of month numbers (January = 1 ... December = 12).
doy_bounds : 2-tuple of int, optional
doy_bounds : 2-tuple of int or xr.DataArray, optional
The bounds as (start, end) of the period of interest expressed in day-of-year, integers going from
1 (January 1st) to 365 or 366 (December 31st).
1 (January 1st) to 365 or 366 (December 31st). If a combination of int and xr.DataArray is given,
the int day-of-year corresponds to the year of the xr.DataArray.
If calendar awareness is needed, consider using ``date_bounds`` instead.
date_bounds : 2-tuple of str, optional
The bounds as (start, end) of the period of interest expressed as dates in the month-day (%m-%d) format.
Expand Down Expand Up @@ -1187,17 +1312,6 @@ def select_time(
if N == 0:
return da

def _get_doys(_start, _end, _inclusive):
if _start <= _end:
_doys = np.arange(_start, _end + 1)
else:
_doys = np.concatenate((np.arange(_start, 367), np.arange(0, _end + 1)))
if not _inclusive[0]:
_doys = _doys[1:]
if not _inclusive[1]:
_doys = _doys[:-1]
return _doys

if isinstance(include_bounds, bool):
include_bounds = (include_bounds, include_bounds)

Expand All @@ -1212,7 +1326,7 @@ def _get_doys(_start, _end, _inclusive):
mask = da.time.dt.month.isin(month)

elif doy_bounds is not None:
mask = da.time.dt.dayofyear.isin(_get_doys(*doy_bounds, include_bounds))
mask = mask_between_doys(da, doy_bounds, include_bounds)

elif date_bounds is not None:
# This one is a bit trickier.
Expand Down
Loading
Loading