0.8 filespec updates #555

ivirshup · 2021-04-15T08:55:54Z

0.8 filespec updates

I intend to expand on this a bit.

Specs

Every element will now be written with a "spec": a set of attributes that tells us what the element is supposed to be. This is already the case for a number of elements, but not for all of them. The required attributes for a spec are `encoding-type` and `encoding-version`.
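Roughly, writing a spec just means setting those two attributes on the element. Here is a minimal sketch, with a plain dict standing in for the attribute set of an HDF5 or zarr group (with h5py the same keys would go into `group.attrs`). `IOSpec`, `write_spec` and `read_spec` are hypothetical names for illustration, and the version string is made up.

```python
from dataclasses import dataclass
from typing import Any, Dict


# Hypothetical container for a spec; the two fields mirror the two
# required attributes. A sketch, not anndata's actual class.
@dataclass(frozen=True)
class IOSpec:
    encoding_type: str
    encoding_version: str


def write_spec(attrs: Dict[str, Any], spec: IOSpec) -> None:
    """Attach a spec to an element's attributes (a dict stands in for HDF5/zarr attrs)."""
    attrs["encoding-type"] = spec.encoding_type
    attrs["encoding-version"] = spec.encoding_version


def read_spec(attrs: Dict[str, Any]) -> IOSpec:
    """Recover the spec; raises KeyError if the element was written without one."""
    return IOSpec(attrs["encoding-type"], attrs["encoding-version"])


attrs: Dict[str, Any] = {}
write_spec(attrs, IOSpec("dataframe", "0.2.0"))  # illustrative version string
print(attrs)             # {'encoding-type': 'dataframe', 'encoding-version': '0.2.0'}
print(read_spec(attrs))  # IOSpec(encoding_type='dataframe', encoding_version='0.2.0')
```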

What does this allow

This makes it possible to read elements regardless of where they are stored. For example, it allows dataframes to appear anywhere in the object (e.g. in `uns`), since the HDF5 group's attributes tell us how the group should be read into memory.
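To illustrate why this works regardless of location, here is a sketch that walks a toy hierarchy (nested dicts standing in for HDF5 groups; the layout and version strings are assumptions) and identifies every element purely from its attributes. A dataframe nested inside `uns` is picked up exactly like the one at `obs`.

```python
from typing import Any, Dict, Iterator, Tuple

# A toy on-disk layout: each node has "attrs" and optional "children".
# This dict stands in for an HDF5/zarr hierarchy; the layout is hypothetical.
store: Dict[str, Any] = {
    "attrs": {"encoding-type": "anndata", "encoding-version": "0.1.0"},
    "children": {
        "obs": {"attrs": {"encoding-type": "dataframe", "encoding-version": "0.2.0"}},
        "uns": {
            "attrs": {"encoding-type": "dict", "encoding-version": "0.1.0"},
            "children": {
                "results": {"attrs": {"encoding-type": "dataframe", "encoding-version": "0.2.0"}},
            },
        },
    },
}


def walk(node: Dict[str, Any], path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, encoding-type) for every node, using only its attributes."""
    yield path or "/", node["attrs"].get("encoding-type", "<no spec>")
    for name, child in node.get("children", {}).items():
        yield from walk(child, f"{path}/{name}")


# The dataframe under uns is found exactly like the one at obs.
dataframes = [p for p, t in walk(store) if t == "dataframe"]
print(dataframes)  # ['/obs', '/uns/results']
```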

This also opens a path for anndata extensions: third parties could register their own specs and methods to read and write whatever types they want. Some basic examples of this can be found here.

Why `encoding-version`

Sometimes we make mistakes, or write data in a way that rules out some new operation in the future. Versioning each encoding gives us a controlled way to iterate on what we can do with data on disk, and to specify whether a given operation is allowed on older formats of the data.
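A sketch of how a reader might use `encoding-version` to gate an operation. The operation ("partial reads") and the minimum versions in `PARTIAL_READ_SINCE` are hypothetical. Versions are parsed into integer tuples because comparing them as strings would put "0.10.0" before "0.2.0".

```python
from typing import Optional, Tuple

# Hypothetical table: the minimum encoding-version of each encoding-type at
# which some newer operation (here, "partial reads") is supported.
PARTIAL_READ_SINCE = {"csr_matrix": (0, 1, 0), "dataframe": (0, 2, 0)}


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn '0.2.0' into (0, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def supports_partial_read(attrs: dict) -> bool:
    """Decide from an element's spec whether the operation is allowed."""
    minimum: Optional[Tuple[int, ...]] = PARTIAL_READ_SINCE.get(attrs.get("encoding-type"))
    if minimum is None:
        return False  # unknown type, or no spec at all
    return parse_version(attrs["encoding-version"]) >= minimum


print(supports_partial_read({"encoding-type": "dataframe", "encoding-version": "0.2.0"}))  # True
print(supports_partial_read({"encoding-type": "dataframe", "encoding-version": "0.1.0"}))  # False
```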

The registry

To map between elements and their reading/writing methods there is a registry, or technically two: one for writing and one for reading. The `write_registry` recognizes objects by their type and dispatches to the appropriate writing method. The read registry reads the `IOSpec` of an element and finds the right reading method.

Questions

Is it backend specific? E.g. different registries for h5ad and zarr?
Is it a problem that we aren't using subtyping? Currently it seems like this actually solves some problems.
How do we handle the more dynamic types? E.g. lists?

Backwards compat

I think it's time to start throwing warnings for old files. Any element read without a spec in its attributes will trigger a warning telling people to update their files.
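One way the read path could do this, sketched with hypothetical names (`read_with_compat_check`, `read_legacy`) and a plain dict as the element. The choice of `FutureWarning` is an assumption; it is shown to end users by default, unlike `DeprecationWarning`, which Python hides by default outside of `__main__`.

```python
import warnings
from typing import Any, Dict


def read_legacy(elem: Dict[str, Any]) -> Any:
    # Hypothetical fallback for old, spec-less data.
    return elem["data"]


def read_with_compat_check(elem: Dict[str, Any]) -> Any:
    """Read an element, warning if it was written without a spec."""
    attrs = elem.get("attrs", {})
    if "encoding-type" not in attrs or "encoding-version" not in attrs:
        warnings.warn(
            "Element was written without encoding-type/encoding-version "
            "attributes; consider re-writing the file with a newer version.",
            FutureWarning,
            stacklevel=2,
        )
        return read_legacy(elem)
    return elem["data"]  # a real reader would dispatch on the spec here


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = read_with_compat_check({"attrs": {}, "data": [1, 2, 3]})

print(value)                        # [1, 2, 3]
print(caught[0].category.__name__)  # FutureWarning
```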

Future directions

Partial IO
Modifications


ivirshup mentioned this issue Apr 15, 2021

Specs for all elements #554

Merged


ivirshup pinned this issue Apr 19, 2021

ivirshup closed this as completed Jan 11, 2022

gtca mentioned this issue May 31, 2022

Feature request - var_names/obs_names as fixed-sized types (integer or bytes) #777

Open

ivirshup unpinned this issue Feb 6, 2023
