Document Xarray zarr encoding conventions #4047
Looks great, thank you!
Thanks @rabernat
I have a couple of questions about _ARRAY_DIMENSIONS.
Thanks for the useful questions @DennisHeimbigner
Yes, correct
Yes, correct as well. Understanding how this works requires me to describe some xarray internals. When decoding a Dataset, each array is decoded as an xarray.Variable. According to those docs, "a single Variable object is not fully described outside the context of its parent Dataset". The Zarr decoding process returns a Variable, which is basically a tuple of dimension names, data, and attributes.
Once the variables have all been decoded, we put them together into a Dataset object. At that point, if a dimension has inconsistent sizes across the different variables, an error will be raised. So far we haven't encountered this situation, because the Zarr data we read has usually also been written by Xarray, so it is consistent. But you could definitely hack a Zarr store by hand to break this consistency, rendering it un-decodable by Xarray.
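To make the consistency requirement concrete, here is a minimal sketch in plain Python. It is not xarray's actual implementation: the function name `merge_dimension_sizes` and the `(dims, shape)` pairs are hypothetical stand-ins for decoded variables, where `dims` plays the role of the list stored under _ARRAY_DIMENSIONS.

```python
def merge_dimension_sizes(variables):
    """Collect one size per dimension name, failing on any conflict.

    `variables` maps a variable name to a (dims, shape) pair. This is a
    hypothetical stand-in for decoded variables, not an xarray type.
    """
    sizes = {}
    for name, (dims, shape) in variables.items():
        # Every axis of the array needs exactly one dimension name.
        if len(dims) != len(shape):
            raise ValueError(
                f"{name!r}: {len(dims)} dimension names for a {len(shape)}-d array"
            )
        for dim, size in zip(dims, shape):
            # The first variable to use a dimension fixes its size;
            # later variables must agree with it.
            if sizes.setdefault(dim, size) != size:
                raise ValueError(
                    f"{name!r}: dimension {dim!r} has size {size}, "
                    f"expected {sizes[dim]}"
                )
    return sizes


consistent = {
    "temperature": (["time", "lat"], (4, 3)),
    "time": (["time"], (4,)),
}
sizes = merge_dimension_sizes(consistent)

# A hand-edited store where "time" disagrees between two arrays.
broken = {
    "temperature": (["time", "lat"], (4, 3)),
    "time": (["time"], (5,)),
}
```

Calling `merge_dimension_sizes(broken)` raises a `ValueError`, which mirrors the situation described above: each array decodes fine on its own, and the conflict only shows up when the variables are combined.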
I hoped this was clear in the documentation I wrote, which is now live here: http://xarray.pydata.org/en/latest/internals.html#zarr-encoding-specification. What I said was
An "array attribute" has a specific meaning in Zarr: it is the user metadata associated with an individual array. So the _ARRAY_DIMENSIONS attribute is stored alongside the array's other user attributes. As you pointed out on the last call, there are clearly some downsides to having chosen to store this important property with the rest of the user metadata (.zattrs). However, it allowed us to move forward without any changes to the Zarr spec.
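As an illustration, the sketch below shows what this looks like from a reader's point of view, using only the standard library. The `.zattrs` content, the `units` attribute, and the helper name `split_dims_and_attrs` are made up for the example; only the _ARRAY_DIMENSIONS key comes from the encoding discussed here.

```python
import json

# Hypothetical contents of one array's .zattrs file: the dimension names
# sit next to ordinary user attributes in the same JSON object.
zattrs_text = '{"_ARRAY_DIMENSIONS": ["time", "lat", "lon"], "units": "K"}'


def split_dims_and_attrs(text, key="_ARRAY_DIMENSIONS"):
    """Separate the dimension names from the remaining user attributes."""
    attrs = json.loads(text)
    if key not in attrs:
        # Without the key there is no way to name the array's axes.
        raise KeyError(f"array attributes are missing {key!r}")
    dims = tuple(attrs.pop(key))
    return dims, attrs


dims, attrs = split_dims_and_attrs(zattrs_text)
```

Here `dims` holds the three dimension names and `attrs` holds only the remaining user metadata. A reader that does not know about the key would simply see it as one more user attribute, which is the downside mentioned above.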
When we implemented the Zarr backend, we made some ad hoc choices about how to encode NetCDF data in Zarr. At this stage, it would be useful to document this encoding explicitly. I decided to put it on the "Xarray Internals" page, but I'm open to moving it if folks feel it fits better elsewhere.
cc @jeffdlb, @WardF, @DennisHeimbigner