0.8 filespec updates #555

Closed
ivirshup opened this issue Apr 15, 2021 · 0 comments

0.8 filespec updates

I intend to expand on this a bit

Specs

Each element will now be written with a "spec": a set of attributes that tell us what the element is supposed to be. This is already the case for a number of elements, but it is not universal. The required attributes for a spec are encoding-type and encoding-version.
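As a rough illustration of what "having a spec" means, here is a minimal sketch. A plain dict stands in for the attributes of an HDF5 or zarr group, the helper name has_spec is hypothetical, and the values "dataframe" and "0.2.0" are only illustrative.

```python
from typing import Mapping

# The two attributes named above as required for a spec.
REQUIRED_SPEC_KEYS = ("encoding-type", "encoding-version")


def has_spec(attrs: Mapping[str, object]) -> bool:
    """Return True if the attributes carry both required spec keys."""
    return all(key in attrs for key in REQUIRED_SPEC_KEYS)


# A plain dict stands in for the attrs of an on-disk group (assumption).
dataframe_attrs = {"encoding-type": "dataframe", "encoding-version": "0.2.0"}
legacy_attrs = {}  # an element written without a spec

print(has_spec(dataframe_attrs))
print(has_spec(legacy_attrs))
```

With this check, a reader only needs the attributes of a group to decide whether it knows what the element is, independent of where the group sits in the file.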

What does this allow

This makes it possible to read elements regardless of where they are. For example, this is what allows dataframes to be anywhere in the object (e.g. uns), since we can identify from the hdf5 group how we are supposed to read it into memory.

This also opens a path for anndata extensions. That is, we can allow third parties to register their own specs and methods so they can read and write whatever types they want. Some basic examples of this can be found here.

Why encoding-version

Sometimes we make mistakes, or write data in a way that doesn't allow a new operation in the future. encoding-version gives a controlled way to iterate on what we can do with data on disk, and to specify whether operations are allowed on older formats of data.
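One way this gating could look is sketched below. Everything here is an assumption for illustration: the operation name "partial_read", the minimum version "0.2.0", and the helpers parse_version and supports are hypothetical and not part of anndata.

```python
from typing import Mapping, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '0.2.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


# Hypothetical table: the oldest encoding-version that supports each operation.
MIN_VERSION_FOR = {"partial_read": "0.2.0"}


def supports(attrs: Mapping[str, str], operation: str) -> bool:
    """Check whether an element's encoding-version allows the operation."""
    stored = parse_version(attrs["encoding-version"])
    required = parse_version(MIN_VERSION_FOR[operation])
    return stored >= required


old_elem = {"encoding-type": "dataframe", "encoding-version": "0.1.0"}
new_elem = {"encoding-type": "dataframe", "encoding-version": "0.2.0"}

print(supports(old_elem, "partial_read"))
print(supports(new_elem, "partial_read"))
```

The point of the sketch is only that the version travels with the data, so the decision can be made per element rather than per file.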

The registry

To map between methods and elements there is a registry. Technically there are two registries: one for writing and one for reading. The write_registry recognizes objects by their type and dispatches to the appropriate writing method. The read registry reads the IOSpec of an object and finds the right reading method.
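The two-registry idea can be sketched in a few lines. This is a toy model under stated assumptions, not the actual anndata implementation: the IOSpec fields, the decorators register_write and register_read, the functions write_elem and read_elem, and the dict-based "storage" are all hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, NamedTuple


class IOSpec(NamedTuple):
    """Hypothetical spec: the two required attributes as one hashable key."""
    encoding_type: str
    encoding_version: str


# Writers are looked up by Python type, readers by the spec found on disk.
write_registry: Dict[type, Callable[[Any], Dict[str, Any]]] = {}
read_registry: Dict[IOSpec, Callable[[Dict[str, Any]], Any]] = {}


def register_write(typ: type):
    def decorator(func):
        write_registry[typ] = func
        return func
    return decorator


def register_read(spec: IOSpec):
    def decorator(func):
        read_registry[spec] = func
        return func
    return decorator


STRING_SPEC = IOSpec("string", "0.2.0")  # illustrative values


@register_write(str)
def write_string(value: str) -> Dict[str, Any]:
    # A dict with "attrs" and "data" stands in for an on-disk element.
    attrs = {
        "encoding-type": STRING_SPEC.encoding_type,
        "encoding-version": STRING_SPEC.encoding_version,
    }
    return {"attrs": attrs, "data": value}


@register_read(STRING_SPEC)
def read_string(stored: Dict[str, Any]) -> str:
    return stored["data"]


def write_elem(obj: Any) -> Dict[str, Any]:
    # Exact type lookup: subclasses are deliberately not matched here.
    return write_registry[type(obj)](obj)


def read_elem(stored: Dict[str, Any]) -> Any:
    attrs = stored["attrs"]
    spec = IOSpec(attrs["encoding-type"], attrs["encoding-version"])
    return read_registry[spec](stored)


stored = write_elem("hello")
restored = read_elem(stored)
```

Because write_elem looks up the exact type, a subclass of str would raise KeyError in this sketch unless it is registered itself. A third party could add support for its own type by calling the same two decorators.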

Questions

  • Is it backend specific? E.g. different registries for h5ad and zarr?
  • Is it a problem that we aren't using subtyping? Currently it seems like this actually solves some problems.
  • How do we handle the more dynamic types? E.g. lists?

Backwards compat

I think it's time to start throwing some warnings for old files. Any element read without a spec in its attributes will start throwing a warning telling people to update their files.
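A minimal sketch of such a check, assuming a plain dict for the element's attributes. The function name check_spec, the message text, and the choice of FutureWarning as the category are assumptions for illustration, not decisions made in this issue.

```python
import warnings
from typing import Mapping

REQUIRED_SPEC_KEYS = ("encoding-type", "encoding-version")


def check_spec(attrs: Mapping[str, object], name: str) -> bool:
    """Warn if an element lacks a spec; return True when the spec is present."""
    missing = [key for key in REQUIRED_SPEC_KEYS if key not in attrs]
    if missing:
        warnings.warn(
            f"Element {name!r} was written without {', '.join(missing)}. "
            "Consider re-writing this file with a newer version.",
            FutureWarning,  # assumed category
            stacklevel=2,
        )
        return False
    return True


# An element with a full spec passes silently.
ok = check_spec({"encoding-type": "dataframe", "encoding-version": "0.2.0"}, "obs")
```

The old element can still be read after the warning; the check only flags it.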

Future directions

  • Partial IO
  • Modifications