Allow OME-Zarr for non-microscopy data #1704

Open
balbasty opened this issue Feb 13, 2024 · 13 comments

Comments

@balbasty
Contributor

balbasty commented Feb 13, 2024

BIDS accepts ome-zarr data in its microscopy extension, but not in its main MRI-related specification.

In most cases, MRI volumes are small enough that chunked formats do not make sense, but MRIs acquired in ex vivo human brains can become quite large. For example, these 120 um isotropic MRIs are about 5 GB each. This is particularly problematic for web-based viewing, as most (all?) viewers load the entire volume into GPU memory and have a hard memory limit. The MRI linked above is 1600x1400x640 and I am not sure that niiview would be able to display it. Even if it could, having to download the entire file before showing anything makes this impractical.

Could we allow OME-Zarr files in the main spec?

The main problem I see is that, in its current form, the OME-NGFF specification does not allow storing most of the metadata that live in the NIfTI header -- most importantly the qform and sform. As an alternative, we have drafted a very lightweight supplement to OME-Zarr, namely NIfTI-Zarr, which only requires dumping the NIfTI header under .zattrs["nifti"]["base64"] (using base64 encoding). This makes the NIfTI metadata very easy to decode for any library that has access to a base64 decoder and a NIfTI parser. We have reference implementations in Python and Julia.
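As a rough illustration of the base64 idea, here is a minimal stdlib-only sketch. It is not the NIfTI-Zarr reference implementation: `make_minimal_header` is a hypothetical helper that fills in only a few NIfTI-1 fields (little-endian, `n+1` magic), and the shape and voxel size are taken from the volume mentioned above.

```python
import base64
import json
import struct


def make_minimal_header(shape, voxel_size):
    """Hypothetical helper: a 348-byte NIfTI-1 header with only a few fields set."""
    hdr = bytearray(348)
    struct.pack_into("<i", hdr, 0, 348)                # sizeof_hdr
    dim = [len(shape), *shape] + [1] * (7 - len(shape))
    struct.pack_into("<8h", hdr, 40, *dim)             # dim[8]
    pixdim = [1.0, *voxel_size] + [0.0] * (7 - len(voxel_size))
    struct.pack_into("<8f", hdr, 76, *pixdim)          # pixdim[8]
    hdr[344:348] = b"n+1\x00"                          # NIfTI-1 single-file magic
    return bytes(hdr)


header = make_minimal_header((1600, 1400, 640), (0.12, 0.12, 0.12))

# Store the raw header as base64 text inside the JSON attributes.
zattrs = {"nifti": {"base64": base64.b64encode(header).decode("ascii")}}
text = json.dumps(zattrs)

# A reader only needs a JSON parser, a base64 decoder and a NIfTI parser.
decoded = base64.b64decode(json.loads(text)["nifti"]["base64"])
ndim, *dims = struct.unpack_from("<8h", decoded, 40)
print(dims[:ndim])
```

The decoded bytes are identical to the original header, so any existing NIfTI header parser could be pointed at them unchanged.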

I guess this might be somewhat related to #197

@satra @martindurant @yarikoptic

@effigies
Collaborator

This seems like a sensible intermediate between a novel format and the limitations of NIfTI. Is there still a NIfTI working group whose blessing we should seek?

@martindurant

I was tagged so that I could mention kerchunk, which could provide a way to read directly from the .nii.gz files in parallel and chunk-wise (chunking limited to the largest dimension) by pretending to be a zarr dataset without rewriting the data at all. This is already possible for uncompressed data, and gzip would take some work ( pauldmccarthy/indexed_gzip#112 ). Having done that, you could create a global virtual zarr dataset over all of the files in the archive.
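To make the "pretending to be a zarr dataset" idea concrete, here is a hand-built sketch of a version-1 reference set for an uncompressed NIfTI file. It does not call kerchunk itself. `nifti_refs` is a hypothetical helper, and it assumes a 3D volume, data starting at byte 352 (typical for a single-file NIfTI-1 without extensions), Fortran-ordered voxels, and one chunk per slab along the last (slowest-varying) axis.

```python
import json
import struct


def nifti_refs(url, shape, dtype, itemsize, vox_offset=352):
    """Hypothetical helper: map each z-slab of a raw NIfTI file to a zarr chunk."""
    nx, ny, nz = shape
    slab = nx * ny * itemsize  # bytes in one contiguous z-slab
    zarray = {
        "zarr_format": 2,
        "shape": [nx, ny, nz],
        "chunks": [nx, ny, 1],
        "dtype": dtype,
        "order": "F",          # NIfTI stores the first index fastest
        "compressor": None,    # the bytes are read as-is from the file
        "filters": None,
        "fill_value": 0,
    }
    refs = {".zarray": json.dumps(zarray), ".zattrs": "{}"}
    for z in range(nz):
        # Each chunk key points at [file, byte offset, byte length].
        refs[f"0.0.{z}"] = [url, vox_offset + z * slab, slab]
    return {"version": 1, "refs": refs}


# Fake 2x3x4 int16 "file": 352 header bytes followed by values 0..23.
fake_file = bytes(352) + struct.pack("<24h", *range(24))
refs = nifti_refs("volume.nii", (2, 3, 4), "<i2", 2)

# Fetch only the third slab by byte range, without touching the rest.
_, offset, length = refs["refs"]["0.0.2"]
values = struct.unpack("<6h", fake_file[offset:offset + length])
print(values)
```

No voxel data is rewritten; the reference set only records where each chunk already lives in the original file.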

If you also want subsampled pyramid (OME) data, you still need to create and store those somewhere, but of course they would be much smaller. The format could be zarr or something else, and kerchunk could present the whole lot as a single zarr dataset.

Note: all this only works in python, but https://observablehq.com/@manzt/ome-tiff-as-filesystemreference (appears to have gone stale) presented a prototype in-browser JS viz of exactly the same thing.

@martindurant

> dumping the nifti header under .zattrs["nifti"]["base64"]

Mild comment: this seems an odd choice to me (but I don't know the domain). The nice thing about JSON metadata is that you can trivially read it, so why not (also?) include the fields, e.g., as interpreted by nibabel dict(nibabel.nifti1.Nifti1Header)?
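For illustration, a sketch of what expanding a few header fields into plain JSON could look like without depending on any particular library. The field names are borrowed from the NIfTI-1 C struct and are an assumption, not an agreed schema; `header_to_json_fields` is hypothetical and assumes a little-endian header.

```python
import json
import struct


def header_to_json_fields(hdr: bytes) -> dict:
    """Hypothetical helper: expand a few NIfTI-1 header fields into JSON-friendly values."""
    ndim, *dims = struct.unpack_from("<8h", hdr, 40)             # dim[8]
    qform_code, sform_code = struct.unpack_from("<2h", hdr, 252)
    srow = struct.unpack_from("<12f", hdr, 280)                  # srow_x, srow_y, srow_z
    descrip = hdr[148:228].rstrip(b"\x00").decode("ascii", "replace")
    return {
        "dim": dims[:ndim],
        "descrip": descrip,
        "qform_code": qform_code,
        "sform_code": sform_code,
        "sform": [list(srow[0:4]), list(srow[4:8]), list(srow[8:12]),
                  [0.0, 0.0, 0.0, 1.0]],
    }


# Build a toy header: 256^3 volume, voxel size 2, sform set.
hdr = bytearray(348)
struct.pack_into("<8h", hdr, 40, 3, 256, 256, 256, 1, 1, 1, 1)
struct.pack_into("<2h", hdr, 252, 0, 1)
struct.pack_into("<12f", hdr, 280,
                 2.0, 0.0, 0.0, -256.0,
                 0.0, 2.0, 0.0, -256.0,
                 0.0, 0.0, 2.0, -256.0)

fields = header_to_json_fields(bytes(hdr))
print(json.dumps(fields, indent=2))
```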

@yarikoptic
Collaborator

I agree with @martindurant that the reason for preserving the NIfTI header in its original binary form is unclear. I can only guess that the rationale was to facilitate a 1-to-1 binary round trip (nifti-zarr-nifti), but I do not think that is much needed or desired. JSON would have been a much better choice there. Continuing on:

> OME-NGFF specification does not allow storing most of the metadata that live in the nifti header -- most importantly the qform and sform

I wonder if NIfTI-Zarr considered adopting a superset of the NIfTI + .json sidecar fields already defined by BIDS, to be included within OME-Zarr?
From the other side -- shouldn't we in BIDS converge on harmonizing the metadata in the sidecar JSON file so that it also covers metadata people rely on getting directly from NIfTI (i.e. sform/qform, AcquisitionMatrixExtent, ...)?

Also attn @matthew-brett as he was into "new imaging format" considerations.

Overall this sounds like a separate and large topic, so we might want a dedicated issue for it. But it also feels like a prerequisite for a complete answer to this one. As @balbasty noted, OME-Zarr lacks the needed metadata; we apparently lack it even in sidecar files, and it is unlikely that we would accept some ad-hoc (not "agreed upon or already widely used") solution within OME-Zarr.

@martindurant

> OME-Zarr lacks needed metadata

Do you mean it lacks required fields, or that the metadata structure doesn't allow for the kind of information you want to preserve? I find in other fields (particularly earth observation and climatology), a huge amount of complex metadata is stored in zarrs.

@balbasty
Contributor Author

I think @yarikoptic means that the OME-NGFF specification (which formalizes OME metadata) does not currently support affine transforms (nor other NIfTI metadata). I know they are working on a spatial transform supplement, but it will most likely take time to find a consensus, and even more time for something like NIfTI and OME to converge.
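A small sketch of why this matters, assuming only the `scale` and `translation` transform types from OME-NGFF 0.4. `as_scale_translation` is a hypothetical helper, and axis-order conventions (OME-Zarr arrays are often stored z, y, x) are ignored here for brevity.

```python
def as_scale_translation(affine, tol=1e-6):
    """Hypothetical helper: express an affine as scale + translation, or return None."""
    n = len(affine) - 1
    for i in range(n):
        for j in range(n):
            # Any rotation or shear shows up as a non-zero off-diagonal term.
            if i != j and abs(affine[i][j]) > tol:
                return None
    return [
        {"type": "scale", "scale": [affine[i][i] for i in range(n)]},
        {"type": "translation", "translation": [affine[i][n] for i in range(n)]},
    ]


axis_aligned = [
    [2.0, 0.0, 0.0, -256.0],
    [0.0, 2.0, 0.0, -256.0],
    [0.0, 0.0, 2.0, -256.0],
    [0.0, 0.0, 0.0, 1.0],
]
# An illustrative oblique voxel-to-RAS matrix (slightly rotated acquisition).
oblique = [
    [0.7984, 0.0497, 0.0045, -92.2098],
    [-0.0497, 0.7984, 0.0057, -96.8662],
    [-0.0042, -0.0060, 0.8000, -122.4309],
    [0.0, 0.0, 0.0, 1.0],
]

print(as_scale_translation(axis_aligned))
print(as_scale_translation(oblique))  # None: not representable without an affine type
```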

I actually feel quite strongly about keeping the binary form of the header:

  • In the draft, we do allow alternate json-like fields to be stored (we use the "SHOULD" word for that), even though I do not love the metadata duplication.
  • Keeping the binary header is what makes this format "not really a new format". A NIfTI header is not a dictionary: it does not name fields, it only specifies how to interpret the values. Converting it to a json-like object means agreeing on field names, which I'd rather avoid.
  • Keeping the binary header and forcing the json fields to be compatible ensures that the metadata can be converted back into NIfTI without loss. Otherwise, nothing prevents someone from, e.g., storing a description that is more than 80 characters long.
  • I know it was just an example, but I would really not like the specification to rely on a specific implementation (dict(nibabel.nifti1.Nifti1Header)). It makes it look like a "python format" and would hamper its adoption. Right now, anyone can implement a NIfTI/OME-Zarr reader relatively easily (at least the filesystem-backed version).
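As a sketch of the compatibility constraint in the third point: the `descrip` field occupies 80 bytes at offset 148 of a NIfTI-1 header, so a JSON copy can be checked against it. `check_descrip` and the idea of a JSON field holding the description are illustrative assumptions, not text from the draft.

```python
def check_descrip(json_descrip: str, header: bytes) -> bool:
    """Hypothetical check: the JSON description must fit in, and match, the binary header."""
    encoded = json_descrip.encode("utf-8")
    if len(encoded) > 80:
        return False  # would be truncated when converting back to NIfTI
    stored = header[148:228].rstrip(b"\x00")  # descrip[80], NUL-padded
    return stored == encoded


# Toy header with only the description filled in.
text = b"ex vivo, 120 um isotropic"
hdr = bytearray(348)
hdr[148:148 + len(text)] = text
header = bytes(hdr)

print(check_descrip("ex vivo, 120 um isotropic", header))  # consistent
print(check_descrip("x" * 81, header))                     # too long for NIfTI
```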

It's a personal view, but I also don't love relying on sidecar json files. A big reason why NIfTI worked, in a social sense, is that it is a single file format. OME-Zarr can be seen as a "single directory" format (when filesystem-backed) so also works well in that respect.

@satra
Collaborator

satra commented Feb 14, 2024

@martindurant and @yarikoptic - the expanded form of the binary header in json form is also included in the zarr metadata in @balbasty's example.

@martindurant

My opinions on metadata were merely suggestions; you people know better than me, especially in other languages (although zarr is >90% python).

> A big reason why NIfTI worked, in a social sense, is that it is a single file format. OME-Zarr can be seen as a "single directory" format (when filesystem-backed) so also works well in that respect.

This does get less tractable for bigger data, and when directly accessing the data remotely. Cloud-native workflows require reading as few bytes of the data (from dandi, ipfs, s3, whatever) as necessary, with some manner of parallelism. nifti clearly comes from an age of "download everything you need before starting", which in many cases needs an online parameter search service to pick the right files. zarr can index over all parameter space (with kerchunk, or if converting the whole dataset) and has concurrency/chunking/parallelism built in.
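A back-of-the-envelope sketch of the "as few bytes as necessary" point, using the 1600x1400x640 volume from the top of the thread. The 64x64x64 chunk shape is an assumption chosen for illustration, and `chunks_touched` is a hypothetical helper.

```python
def chunks_touched(shape, chunks, start, stop):
    """Count the chunks overlapping the half-open region [start, stop) on each axis."""
    count = 1
    for size, c, lo, hi in zip(shape, chunks, start, stop):
        hi = min(hi, size)
        first = lo // c        # index of the first chunk touched
        last = (hi - 1) // c   # index of the last chunk touched
        count *= last - first + 1
    return count


shape = (1600, 1400, 640)
chunks = (64, 64, 64)  # assumed chunk shape

# A viewer asking for a single axial slice at z = 320.
touched = chunks_touched(shape, chunks, (0, 0, 320), (1600, 1400, 321))
total = chunks_touched(shape, chunks, (0, 0, 0), shape)
print(touched, total, touched / total)
```

Under these assumptions a single slice touches about a tenth of the chunks, and a multiscale pyramid would reduce that further for zoomed-out views.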

I know people who have heard of zarr know these things, but still worth pointing out!

@effigies
Collaborator

effigies commented Feb 16, 2024

Another approach to the B64-in-JSON encoding for the header would be to create a nifti array that is just the literal NIfTI header:

import nibabel as nb
import numpy as np
import zarr

# Toy image: 256^3 float32 volume with a voxel size of 2 along each axis
img = nb.Nifti1Image(np.zeros((256,256,256), dtype='f4'), np.diag((2, 2, 2, 1)))

# Store the voxel data and the raw NIfTI header as two arrays in one group
root = zarr.group(store='/tmp/img.nii.zarr')
root.array(name='0', data=img.dataobj)
root.array(name='nifti', data=img.header.binaryblock)

# Round trip: rebuild the header from the stored bytes, then the image
rt_header = nb.Nifti1Header(binaryblock=np.asanyarray(root['nifti']).tobytes())
round_trip = nb.Nifti1Image(
    root['0'],
    affine=rt_header.get_best_affine(),
    header=rt_header,
)

Playing with a real file, I was able to read the header with mri_info:

❯ mri_info /tmp/nii.zarr/nifti/0
Volume information for /tmp/nii.zarr/nifti/0
          type: nii
    dimensions: 208 x 300 x 320
   voxel sizes: 0.800000, 0.800000, 0.800000
          type: FLOAT (3)
           fov: 166.400
           dof: 1
        xstart: -83.2, xend: 83.2
        ystart: -120.0, yend: 120.0
        zstart: -128.0, zend: 128.0
            TR: 2400.00 msec, TE: 0.00 msec, TI: 0.00 msec, flip angle: 0.00 degrees
       nframes: 1
       PhEncDir: UNKNOWN
       FieldStrength: 0.000000
ras xform present
    xform info: x_r =   0.9981, y_r =   0.0621, z_r =   0.0057, c_r =    -0.9937
              : x_a =  -0.0622, y_a =   0.9980, z_a =   0.0072, c_a =    18.6450
              : x_s =  -0.0052, y_s =  -0.0075, z_s =   1.0000, c_s =     4.2302
Orientation   : RAS
Primary Slice Direction: axial

voxel to ras transform:
                0.7984   0.0497   0.0045   -92.2098
               -0.0497   0.7984   0.0057   -96.8662
               -0.0042  -0.0060   0.8000  -122.4309
                0.0000   0.0000   0.0000     1.0000

voxel-to-ras determinant 0.512

ras to voxel transform:
                1.2476  -0.0777  -0.0065   106.7157
                0.0776   1.2476  -0.0094   126.8562
                0.0071   0.0090   1.2499   154.5524
                0.0000   0.0000   0.0000     1.0000

Any tool that can currently parse a NIfTI header should be able to work with this structure.

@martindurant

@effigies , that's quite clever :)

@yarikoptic
Collaborator

Indeed quite sneaky! The example needs a little fixing, though:

❯ python tryniftihdr.py
Traceback (most recent call last):
  File "/tmp/tryniftihdr.py", line 11, in <module>
    rt_header = nb.Nifti1Header(binaryblock=np.asanyarray(nifti).tobytes())
                                                          ^^^^^
NameError: name 'nifti' is not defined

@effigies
Collaborator

Fixed. Though the goal was to be lazy, not sneaky.

@yarikoptic
Collaborator

Being sneaky while being lazy is a true super-power! ;)

6 participants