Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dimension_names missing in dict if None #2663

Closed
dcherian opened this issue Jan 6, 2025 · 8 comments
Closed

dimension_names missing in dict if None #2663

dcherian opened this issue Jan 6, 2025 · 8 comments

Comments

@dcherian
Copy link
Contributor

dcherian commented Jan 6, 2025

Why does Zarr remove "dimension_names" if it is None?

# if `dimension_names` is `None`, we do not include it in
# the metadata document
if out_dict["dimension_names"] is None:
out_dict.pop("dimension_names")

This kind of inconsistency is irritating to work around downstream. For example, see this icechunk test https://github.com/earth-mover/icechunk/blob/fd8f7c23ab613b4ff4452c3f8d49bf988ea21321/icechunk-python/tests/test_stateful_repo_ops.py#L372-L374

            # FIXME: zarr omits this if None?
            if "dimension_names" not in expected:
                actual.pop("dimension_names")
@rabernat
Copy link
Contributor

rabernat commented Jan 6, 2025

Agree. If it's in the spec, why not just always include it in the metadata?

@jhamman
Copy link
Member

jhamman commented Jan 6, 2025

dimensions_names is not a required key (according to the current v3 spec: https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#dimension-names) so I'm guessing this was a micro optimization to limit the size of the metadata object.

If specified, must be an array of strings or null objects with the same length as shape.

I'd be fine always writing it from zarr-python but I think readers need to be prepared to find metadata docs without it.

@d-v-b
Copy link
Contributor

d-v-b commented Jan 7, 2025

I also find it annoying, and I would 100% prefer a metadata document with a stable type signature. However, the spec is clear that dimension_names is either an array of str | Null, or unset.

Specifies dimension names, e.g. ["x", "y", "z"]. If specified, must be
an array of strings or null objects with the same length as shape. An
unnamed dimension is indicated by the null object. If dimension_names is
not specified, all dimensions are unnamed.

@d-v-b
Copy link
Contributor

d-v-b commented Jan 7, 2025

I assume we are past the point of fixing this wart in the zarr v3 spec. Given that nobody noticed this problem in the PR that added it 😉 , I wonder what we could have done differently there.

Perhaps requiring that ALL metadata be described via rows in a table, where each row has the type (name: str, type: JSON, required: bool, description: str) (similar to the geoparquet spec) would have made it more obvious that this field was not required?

@normanrz
Copy link
Member

normanrz commented Jan 8, 2025

I think that was deliberate. At least @jbms pointed it out to me while I was hacking on zarrita: scalableminds/zarrita#10

@dcherian
Copy link
Contributor Author

dcherian commented Jan 8, 2025

Right, it is as the spec decrees sigh.

@dcherian dcherian closed this as completed Jan 8, 2025
@jbms
Copy link

jbms commented Jan 8, 2025

Is the specific issue that there is difficulty in representing this type with a Python TypedDict? Does NotRequired address this?

@d-v-b
Copy link
Contributor

d-v-b commented Jan 8, 2025

in my experience the friction comes from needing to special-case dimension_names when serializing metadata to JSON

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants