Should the zarr backend support NCZarr conventions? #6374
Comments
For Unidata and netcdf, I think the situation is briefly this. In netcdf-4, dimensions are named objects that can "reside" inside groups.
So base dimension names (e.g. "z") can occur in different groups and can represent different dimension objects (with different sizes). It is possible to reference any dimension using fully-qualified-names (FQNs) such as "/g1/y". NCZarr captures this information by recording fully qualified names as special keys. If XArray is to be extended to support the equivalent of groups and distinct sets of dimensions are going to be supported in different groups, then some equivalent of the netcdf FQN is going to be needed. One final note. In netcdf, the dimension size is declared once and associated with a name.
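To make the FQN idea concrete, here is a minimal sketch of how the same base name can resolve to different dimension objects depending on the group. The `Group` class and `resolve_dim` helper are hypothetical illustrations, not part of netCDF, NCZarr, or xarray.

```python
# Hypothetical in-memory model of nested groups; not the netCDF API.
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    dims: dict = field(default_factory=dict)      # base name -> size
    children: dict = field(default_factory=dict)  # child name -> Group

    def add_group(self, name):
        child = Group(name)
        self.children[name] = child
        return child


def resolve_dim(root, fqn):
    """Return the size of the dimension named by an FQN such as '/g1/y'."""
    if not fqn.startswith("/"):
        raise ValueError(f"not a fully qualified name: {fqn!r}")
    # Everything before the last component is the group path.
    *group_path, base = fqn.strip("/").split("/")
    group = root
    for part in group_path:
        group = group.children[part]
    return group.dims[base]


root = Group("")
root.dims["z"] = 10
g1 = root.add_group("g1")
g1.dims["z"] = 3  # same base name, different dimension object
g1.dims["y"] = 5

print(resolve_dim(root, "/z"))     # 10
print(resolve_dim(root, "/g1/z"))  # 3
print(resolve_dim(root, "/g1/y"))  # 5
```

Because "z" alone is ambiguous here, any reader that keeps only base names cannot tell "/z" from "/g1/z".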
Thanks for the details, @DennisHeimbigner. But my reading of what you outline is that for some nczarr datasets, xarray will be able to open them. Correct? If so, there were always likely to be follow-ons to this issue when/if we identify critical edge cases. Perhaps for the moment, though, we can focus here on what we want to enable and what can be done straightforwardly. That likely makes this more a question for @shoyer, @jhamman, @rabernat et al. (sorry, no way to
CC @pydata/xarray
Thanks, @max-sixty! Guess it just doesn't complete for those outside the org.
My opinion is that we should not try to support the nczarr conventions directly. Xarray already supports nczarr via netCDF4. If netCDF4 can open the Zarr store, then Xarray can read it. Supporting nczarr directly would require lots of custom logic within xarray. That's because nczarr introduces several additional metadata files that are not part of the zarr spec. These additional metadata files break the abstractions through which xarray interacts with zarr; working around this requires going under the hood and accessing the store object directly (rather than the zarr groups and arrays). I would turn this question around and ask: if netCDF4 supports access to these datasets directly, what's the advantage of xarray bypassing netCDF4 and opening them directly? If there are significant performance benefits, I would be more likely to consider it worthwhile.
At the moment, NCZarr format files (as opposed to pure Zarr format files
@DennisHeimbigner I think it would be great to standardize NCZarr as a super-set of the "Xarray-Zarr" standard! I think Xarray should indeed be able to read such files. If you want to read a sub-group, you can read the sub-group in a separate call to
@rabernat I would not be opposed to adding support inside Xarray for reading NCZarr data, specifically to understand NCZarr's encoding of dimension names when using Zarr-Python. This wouldn't give 100% compatibility with NCZarr, but it would be very close (maybe just with incorrect dtypes for attributes) with a minimal amount of work. I don't think it would be a big deal to look for
Sure, to be clear, my hesitancy is mostly just around being reluctant to maintain more complexity in our zarr interface. If there is momentum to implement and maintain this compatibility, I am definitely not opposed. 🚀
I guess I was not clear. If you are willing to lose netcdf specific metadata,
@malmans2 can chime in with his experience, but it seems that from the user point-of-view, not needing to know if something is an xarray-zarr or a nczarr would be kinder of us. Plus as said below, I do think it puts us on the path to defining a common spec.
Mea culpa. I wasn't clear enough about the intent from my side at least, namely to support loading ARRAY_DIMENSIONS (or some other necessary subset) from nczarr rather than its entirety.
I'll add as an aside that work on the subgroups (i.e. datatree) is progressing in case any consideration needs to be included now rather than later.
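As a rough sketch of the "necessary subset" idea discussed above: a reader could prefer the xarray `_ARRAY_DIMENSIONS` attribute and fall back to NCZarr's own dimension references only when it is absent. The store below is a plain dict standing in for a Zarr v2 key/value store. The `_NCZARR_ARRAY` key and its `dimrefs` field are assumptions about how NCZarr records FQNs and should be checked against the NCZarr documentation. `get_dimension_names` is a hypothetical helper, not an xarray or zarr function.

```python
import json
import posixpath

DIMENSION_KEY = "_ARRAY_DIMENSIONS"  # xarray's Zarr attribute
NCZARR_KEY = "_NCZARR_ARRAY"         # assumed NCZarr key inside .zarray


def get_dimension_names(store, array_path):
    """Return dimension names for one array in a Zarr v2-style key/value store."""
    attrs = json.loads(store.get(f"{array_path}/.zattrs", b"{}"))
    if DIMENSION_KEY in attrs:
        return list(attrs[DIMENSION_KEY])
    # Fallback: read NCZarr's dimension references (assumed layout).
    zarray = json.loads(store[f"{array_path}/.zarray"])
    try:
        dimrefs = zarray[NCZARR_KEY]["dimrefs"]
    except KeyError:
        raise KeyError(f"no dimension names found for {array_path!r}") from None
    # Keep only the base name of each FQN, e.g. "/g1/y" -> "y".
    return [posixpath.basename(ref) for ref in dimrefs]


store = {
    "temp/.zarray": json.dumps(
        {"shape": [2, 3], NCZARR_KEY: {"dimrefs": ["/time", "/x"]}}
    ).encode(),
    "temp/.zattrs": b"{}",
    "pres/.zarray": json.dumps({"shape": [2]}).encode(),
    "pres/.zattrs": json.dumps({DIMENSION_KEY: ["time"]}).encode(),
}

print(get_dimension_names(store, "temp"))  # ['time', 'x']
print(get_dimension_names(store, "pres"))  # ['time']
```

Dropping the group prefix is lossy: it is only safe when every referenced dimension lives in the same group as the variable, which is the limitation raised earlier in the thread.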
re: pydata/xarray#6374

As a result of a discussion about Xarray (see above issue), I decided to turn on the xarray convention for NCZarr datasets where possible so that xarray can read a larger set of nczarr generated datasets.

This causes the following changes:
* If the user wants to generate a pure zarr file, then the mode "zarr" must be explicitly used; it is no longer the case that "mode=xarray" or "mode=noxarray" implies "mode=zarr".
* It is still the case that "mode=noxarray" will turn off the XArray convention.

The following conditions will cause "_ARRAY_DIMENSIONS" to not be written:
* The variable is not in the root group.
* Any dimension referenced by the variable is not in the root group.
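The conditions listed in this commit message can be read as a simple predicate. The sketch below is only an illustration of that rule in Python. It is not the netcdf-c implementation, and the function names and the way the mode is passed are hypothetical.

```python
def in_root_group(fqn):
    """True if an FQN like '/x' names an object directly in the root group."""
    return fqn.startswith("/") and len(fqn) > 1 and fqn.count("/") == 1


def writes_array_dimensions(var_fqn, dim_fqns, mode=()):
    """Whether _ARRAY_DIMENSIONS would be written under the stated rules."""
    if "noxarray" in mode:
        # "mode=noxarray" turns the XArray convention off entirely.
        return False
    # Both the variable and every dimension it references must be in the root group.
    return in_root_group(var_fqn) and all(in_root_group(d) for d in dim_fqns)


print(writes_array_dimensions("/temp", ["/time", "/x"]))             # True
print(writes_array_dimensions("/g1/temp", ["/time"]))                # False
print(writes_array_dimensions("/temp", ["/g1/y"]))                   # False
print(writes_array_dimensions("/temp", ["/time"], mode=("noxarray",)))  # False
```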
Adding support for reading I'm not sure whether it is better to (i) add direct support for
I made a recent change to this so that where possible, all NCZarr files contain the
Thanks! #6420 looks at
As it currently stands, it is also not possible to write a zarr that follows the GDAL ZARR driver conventions. Writing the
@wankoelias could you kindly open a new issue for writing GDAL ZARR?
Can you elaborate? What API are you using to do the write: python, netcdf-c, or what?
This error message comes from Xarray and can be triggered by calling Line 162 in facafac
I don't think netCDF-C needs to be involved at all, which is why I suggested opening a separate issue.
What is your issue?
As part of the CZI EOSS4 grant, at B-Open we are keen on improving xarray/zarr cross-community conventions. It looks like xarray's "zarr" backend does not support Unidata NCZarr conventions while NetCDF-c>=4.8.1 supports xarray conventions.
Currently, it is possible to open an NCZarr data source using the "netcdf4" backend only. Here is an example:
Would the community benefit from a "zarr" backend compatible with NCZarr as well? @rsignell-usgs and @rabernat suggested discussing this within the xarray community, but I don't think a dedicated issue has been opened yet.
cc: @alexamici @aurghs @joshmoore