Each element will now be written with a "spec": a set of attributes that tells us what the element is supposed to be. This is already the case for a number of elements, but not universally. The required attributes for a spec are encoding-type and encoding-version.
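As a rough illustration of what a spec carries, here is a minimal sketch. The attribute keys encoding-type and encoding-version come from the text above; the IOSpec field names, the helper functions, the plain dict standing in for a group's attributes, and the "0.2.0" version string are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Mapping, MutableMapping


@dataclass(frozen=True)
class IOSpec:
    """The two required spec attributes (field names are illustrative)."""
    encoding_type: str
    encoding_version: str


def write_spec(attrs: MutableMapping[str, str], spec: IOSpec) -> None:
    """Store the spec in an element's attributes."""
    attrs["encoding-type"] = spec.encoding_type
    attrs["encoding-version"] = spec.encoding_version


def read_spec(attrs: Mapping[str, str]) -> IOSpec:
    """Recover the spec from an element's attributes."""
    return IOSpec(attrs["encoding-type"], attrs["encoding-version"])


# A plain dict stands in for the attributes of an on-disk group or dataset.
attrs = {}
write_spec(attrs, IOSpec("dataframe", "0.2.0"))
```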
What does this allow
This makes it possible to read elements regardless of where they are. For example, this has allowed dataframes to be stored anywhere in the object (e.g. uns), since we can identify from the HDF5 group how we are supposed to read it into memory.
This also opens a path for anndata extensions. That is, we can allow third parties to register their own specs and methods so they can read and write whatever types they want. Some basic examples of this can be found here.
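The location independence described above can be sketched as follows. The reader looks only at an element's encoding-type, never at its path, so a dataframe nested inside uns is read the same way as one at the top level. The Group and Table classes and the "dict" and "array" type names are stand-ins invented for this example, not the on-disk format.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Stand-in for an on-disk node: attributes, named children, raw payload."""
    attrs: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)
    payload: object = None


@dataclass
class Table:
    """Stand-in for an in-memory dataframe."""
    columns: dict


def read_elem(group):
    """Dispatch on the spec alone; the element's location is never consulted."""
    kind = group.attrs.get("encoding-type")
    if kind == "dataframe":
        return Table({name: read_elem(col) for name, col in group.children.items()})
    if kind == "dict":
        return {name: read_elem(child) for name, child in group.children.items()}
    if kind == "array":
        return list(group.payload)
    raise ValueError(f"no reader for encoding-type {kind!r}")


column = Group(attrs={"encoding-type": "array"}, payload=[1, 2, 3])
frame = Group(attrs={"encoding-type": "dataframe"}, children={"score": column})
uns = Group(attrs={"encoding-type": "dict"}, children={"results": frame})

nested = read_elem(uns)["results"]  # dataframe found inside uns
top_level = read_elem(frame)        # same group read directly
```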
Why encoding-version
Sometimes we make mistakes, or write data in a way that doesn't allow a new operation in the future. The version gives us a controlled way to iterate on what we can do with data on disk, and to specify which operations are allowed on older formats of the data.
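One way to picture version gating is a lookup from (encoding-type, operation) to the minimum encoding-version that supports it. This is only a sketch: the operation name "append_rows", the version numbers, and the helper names are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version such as '0.2.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


# Hypothetical table: the oldest encoding-version on which an operation is allowed.
MIN_VERSION_FOR = {
    ("dataframe", "append_rows"): "0.2.0",
}


def supports(attrs: dict, operation: str) -> bool:
    """Return True if the element's on-disk format allows the operation."""
    required = MIN_VERSION_FOR.get((attrs["encoding-type"], operation))
    if required is None:
        return False  # unknown operations are refused
    return parse_version(attrs["encoding-version"]) >= parse_version(required)


old = {"encoding-type": "dataframe", "encoding-version": "0.1.0"}
new = {"encoding-type": "dataframe", "encoding-version": "0.2.0"}
```

Under these assumptions, supports(old, "append_rows") is refused while supports(new, "append_rows") is allowed, without ever inspecting the data itself.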
The registry
To map between methods and elements there is a registry. Technically there are two registries, one for writing and one for reading. The write registry recognizes objects by their type and dispatches to the appropriate writing method. The read registry reads the IOSpec of an object and finds the right reading method.
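The two registries can be sketched with a pair of dictionaries and registration decorators. This is an illustration under stated assumptions, not the actual implementation: the decorator names, the dict-based store, the "string" spec, and its version are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IOSpec:
    encoding_type: str
    encoding_version: str


WRITE_REGISTRY = {}  # Python type -> (IOSpec, writer function)
READ_REGISTRY = {}   # IOSpec -> reader function


def register_write(typ, spec):
    def decorator(func):
        WRITE_REGISTRY[typ] = (spec, func)
        return func
    return decorator


def register_read(spec):
    def decorator(func):
        READ_REGISTRY[spec] = func
        return func
    return decorator


def write_elem(store, key, value):
    # Exact type lookup: subclasses are deliberately not matched.
    spec, writer = WRITE_REGISTRY[type(value)]
    store[key] = {
        "attrs": {
            "encoding-type": spec.encoding_type,
            "encoding-version": spec.encoding_version,
        },
        "data": writer(value),
    }


def read_elem(store, key):
    entry = store[key]
    spec = IOSpec(entry["attrs"]["encoding-type"], entry["attrs"]["encoding-version"])
    return READ_REGISTRY[spec](entry["data"])


STRING_SPEC = IOSpec("string", "0.2.0")  # hypothetical spec


@register_write(str, STRING_SPEC)
def write_string(value):
    return value.encode("utf-8")


@register_read(STRING_SPEC)
def read_string(data):
    return data.decode("utf-8")


store = {}  # a dict stands in for the file
write_elem(store, "title", "pbmc3k")
```

Because the write side looks up type(value) exactly, a subclass of str would raise a KeyError here unless it is registered separately. A third party could register its own type and spec with the same two decorators.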
Questions
Is it backend specific? E.g. different registries for h5ad and zarr?
Is it a problem that we aren't using subtyping? Currently it seems like this actually solves some problems.
How do we handle the more dynamic types? E.g. lists?
Backwards compat
I think it's time to start throwing some warnings for old files. Anything read without specs in the attributes will start throwing warnings telling people to update their files.
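A warning for spec-less elements might look something like the sketch below. The warning class, the function name, and the message wording are assumptions; only the condition (missing spec attributes) comes from the text above.

```python
import warnings


class OldFormatWarning(UserWarning):
    """Hypothetical warning category for elements written without a spec."""


def check_spec(attrs: dict, name: str):
    """Return (encoding-type, encoding-version), or warn and return None."""
    if "encoding-type" not in attrs or "encoding-version" not in attrs:
        warnings.warn(
            f"Element {name!r} was written without a spec. "
            "Please rewrite this file to update it to the current format.",
            OldFormatWarning,
            stacklevel=2,
        )
        return None
    return attrs["encoding-type"], attrs["encoding-version"]
```

Callers that get None back can fall back to whatever legacy reading logic applies, so old files stay readable while users are nudged to update them.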
Future directions
Partial IO
Modifications
0.8 filespec updates
I intend to expand on this a bit