-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prototype of bounded dataarray functionality #737
base: main
Are you sure you want to change the base?
Conversation
@@ -173,7 +173,7 @@ def average( | |||
Using custom weights for averaging: | |||
|
|||
>>> # The shape of the weights must align with the data var. | |||
>>> self.weights = xr.DataArray( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This documentation update is more generic than this PR and should probably be implemented separately...
da = self._dataarray.copy() | ||
ds = boundedDataArray_to_dataset(da) | ||
# get data_var key | ||
data_var = da.name | ||
# pass on call to spatial averager | ||
ds_sa = ds.spatial.average( | ||
data_var, | ||
axis=axis, | ||
weights=weights, | ||
keep_weights=keep_weights, | ||
lat_bounds=lat_bounds, | ||
lon_bounds=lon_bounds, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the template: take your "bounded dataarray", convert it back to a dataset, and then call the existing xcdat API (passing along all the underlying arguments).
A shortfall, beyond replicating most (but not all) documentation, is that if we change the dataset API, we need to make sure to update it here, too. Maybe there is a fancy way to pass in **kwargs
that can do this automatically (probably not)?
except: # noqa: E722 | ||
continue | ||
# return dataarray with bounds | ||
da = xr.DataArray(ds[key], coords=coords, dims=ds[key].dims, attrs=ds[key].attrs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pasting the lat_bnds
coordinate / dataarray that is embedded in the "bounded dataarray" – I'm wondering if this could take on a nicer form than this random dtype: ('lower', '<f8'), ('upper', '<f8')
. I'm guessing if we had other bound types, we could make this more generic, e.g., ('lower_left', '<f8'), ('lower_right', '<f8'), ('upper_right', '<f8', ('upper_left', '<f8'))
.
da.lat_bnds
<xarray.DataArray 'lat_bnds' (lat: 64)> Size: 1kB
array([(-90. , -86.57774751), (-86.57774751, -83.75702878),
(-83.75702878, -80.95502019), (-80.95502019, -78.15834785),
...
( 78.15834785, 80.95502019), ( 80.95502019, 83.75702878),
( 83.75702878, 86.57774751), ( 86.57774751, 90. )],
dtype=[('lower', '<f8'), ('upper', '<f8')])
Coordinates:
- lat (lat) float64 512B -87.86 -85.1 -82.31 -79.53 ... 82.31 85.1 87.86
lat_bnds (lat) [('lower', '<f8'), ('upper', '<f8')] 1kB (-90.0, -86.5777...
Another thing to note is that Xarray does not have a native API for generating bounds. Pitching our bounds generation functionalities might also be a good idea, considering I think they're generic enough (except for time frequency based on CF calendar). |
@tomvothecoder – I asked ChatGPT if there was a better way to approach this problem and it yielded a slightly different approach for including the bounds in the import xarray as xr
import xcdat as xc
@xr.register_dataarray_accessor("cf_bounds")
class CFBoundsAccessor:
def __init__(self, xarray_obj):
self._obj = xarray_obj
# Initialize or load bounds as a dictionary from attributes.
self._bounds = self._obj.attrs.get("bounds", {})
@property
def bounds(self):
"""Return the entire dictionary of bounds."""
if not self._bounds:
raise AttributeError("No bounds are attached to this DataArray.")
return self._bounds
def set_bounds(self, bounds_dict):
"""
Set multiple bounds at once.
Parameters
----------
bounds_dict : dict
A dictionary mapping names (e.g., 'lat_bnds', 'lon_bnds', 'time_bnds')
to their respective bounds arrays.
"""
if not isinstance(bounds_dict, dict):
raise TypeError("bounds_dict must be a dictionary.")
self._bounds = bounds_dict
self._obj.attrs["bounds"] = bounds_dict
def get_bounds(self, key):
"""
Retrieve the bounds for a given coordinate or bounds name.
Parameters
----------
key : str
The key corresponding to the bounds you want to retrieve.
Returns
-------
The bounds array associated with the given key.
"""
try:
return self._bounds[key]
except KeyError:
raise KeyError(f"No bounds available for key '{key}'.")
def add_bound(self, key, bound_array):
"""
Add or update a single bounds entry.
Parameters
----------
key : str
The key for the bounds (e.g., 'lat_bnds').
bound_array : array-like
The bounds array corresponding to the key.
"""
self._bounds[key] = bound_array
self._obj.attrs["bounds"] = self._bounds
ds = xc.open_dataset('tas_Amon_MIROC-ES2L_historical_r2i1p1f2_gn_185001-201412.nc')
tas = ds['tas']
tas.cf_bounds.set_bounds({
"lat_bnds": ds.lat_bnds,
"lon_bnds": ds.lon_bnds
})
print(tas.cf_bounds.get_bounds("lat_bnds"))
print(tas.cf_bounds.get_bounds("lon_bnds"))
I like this approach (generally; I'd change some thing) because it seems to store the bounds in their native I also asked ChatGPT: once we have a dataarray with bounds included, how can we best leverage our existing functions to perform operations (e.g., spatial averaging). It basically came back with:
|
Description
This PR is meant to prototype and explore avenues for enabling xcdat functionality to operate on dataarray objects. The initial PR takes the approach of a) embedding bounds in a "bounded dataarray" (following this xarray exchange), b) converting the bounded dataarray to a dataset to call xcdat functionality (spatial averaging was prototyped).
This approach seems to work fine and is fast (converting between datasets/dataarrays is extremely fast), but it does require some manual re-implementation of functions and documentation. It also is only currently prototyped for rectilinear grids (but maybe a similar approach could be followed for other grid types). Last, it overwrites
xr.Dataset.__call__
with a function that converts the dataset to a bounded dataarray, which might be inadvisable (we could avoid this if we needed to, by calling the underlying function directly, e.g.,tas = xc.to_bounded_dataarray(ds, "tas")
).Alternate approached could be explored (in this PR or separate PRs):
Regardless, it might be useful to attend an xarray meeting to discuss what they think makes sense.
Checklist
If applicable: