WIP zarr backend #98

donghekang · 2019-07-15T21:18:32Z

Motivation

Add support for Zarr DirectoryStore I/O

How to test the behavior?

See the included tests

ToDo

Dependencies

Update sources to latest HDMF dev
Remove dependency on six in Zarr backend
Update requirements to the current Zarr and numcodecs version
Make the import of Zarr optional so that HDMF works even without Zarr installed, i.e., only the explicit import of the Zarr backend classes should fail if Zarr is not installed. Update: In src/hdmf/utils.py, src/hdmf/io/__init__.py, tests/unit/test_io_zarr.py, etc.
PR #446 (Add get_docval_macro method) adds a get_docval_macro function. We should remove the __get_docval_macro that has been added in this PR and update the code to use the new one from HDMF.
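The optional-import item above could be approached roughly as follows. This is a minimal sketch of the general pattern, not the code in this PR: the helper names `optional_import`, `ZARR_AVAILABLE`, and `require_zarr` are hypothetical.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Hypothetical module-level flag: the rest of the package can import
# cleanly whether or not the optional dependency is present.
zarr = optional_import("zarr")
ZARR_AVAILABLE = zarr is not None


def require_zarr():
    """Fail at the point of use, not when the package itself is imported."""
    if not ZARR_AVAILABLE:
        raise ImportError(
            "The Zarr backend requires the 'zarr' package to be installed."
        )
    return zarr
```

With this pattern, only code paths that call `require_zarr()` (for example, the constructor of a Zarr backend class) fail when Zarr is missing.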

Features

Implement core Zarr I/O backend and utilities
Builder.written flag has been removed and built status should be tracked in the I/O backend
Remove logic to support export to Zarr from HDF5 and use new export functions instead
Add support to allow the use of Zarr-supported compressors and filters for datasets
Raise NotImplementedError for RegionReferences and mention this in the docs
Add support for AbstractDataChunkIterator (also exhaust_dci flag has been moved to the HDF5 backend and removed from HDMFIO)
Add support for RegionReferences (Optional)
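To illustrate the idea behind iterative (chunk-by-chunk) data write mentioned in the list above, here is a small, self-contained sketch. It does not use HDMF's AbstractDataChunkIterator API; `iter_chunks` and `write_iteratively` are hypothetical stand-ins, and a plain Python list plays the role of the dataset.

```python
def iter_chunks(source, chunk_size):
    """Yield (selection, values) pairs, one chunk at a time."""
    buffer = []
    start = 0
    for value in source:
        buffer.append(value)
        if len(buffer) == chunk_size:
            yield slice(start, start + chunk_size), buffer
            start += chunk_size
            buffer = []
    if buffer:  # final, possibly shorter chunk
        yield slice(start, start + len(buffer)), buffer


def write_iteratively(target, chunk_iter):
    """Write each chunk into its slot in `target`, growing it as needed."""
    for selection, values in chunk_iter:
        if selection.stop > len(target):
            target.extend([None] * (selection.stop - len(target)))
        target[selection] = values
    return target


# The source is a generator, so the full data never has to exist in memory
# before writing starts.
stream = (i * i for i in range(10))
dataset = write_iteratively([], iter_chunks(stream, chunk_size=4))
```

A real backend would write each chunk to the on-disk array instead of a list, but the control flow is similar: the writer only needs to know where each chunk goes.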

Testing and Documentation

Add unit tests for iterative data write
Update CHANGELOG.md
Check that we can convert an NWB file from HDF5 --> Zarr --> HDF5
Add tests for export between HDF5 and Zarr and vice versa
Tests for export from ZarrToZarr are in TestExportZarrToZarr. Export of links seems to fail. Re-enable and debug the tests.
Check resolve references properly #179 to see if we need to update reference handling
Resolve TODO items listed in the code
Add example for HDF5 to Zarr conversion (and vice versa) to the export docs.

Checklist

Have you checked our Contributing document?
Have you ensured the PR description clearly describes the problem and the solution?
Is your contribution compliant with our coding style? This can be checked by running flake8 from the source directory.
Have you checked to ensure that there aren't other open Pull Requests for the same change?
Have you included the relevant issue number using #XXX notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close the corresponding issue.

oruebel · 2019-07-15T21:25:38Z

NeurodataWithoutBorders/pynwb#300
NeurodataWithoutBorders/pynwb#230

bendichter · 2019-07-15T21:33:03Z

@kangDH thanks for getting this going! A bunch of us are really excited about incorporating alternative backends!

bendichter · 2019-07-15T22:10:01Z

@kangDH I think for requirements it might make more sense to have Zarr and other optional backends be included as extras
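As a rough illustration of the extras suggestion, a setup.py could declare the optional backend dependencies with setuptools' `extras_require`. This is only a sketch: the extra name "zarr" and the install command are assumptions, and the package list is taken from the dependencies named in the PR description.

```python
# Hypothetical excerpt from a setup.py; the extra name "zarr" is an assumption.
extras_require = {
    "zarr": ["zarr", "numcodecs"],
}

# This dict would be passed to setuptools as:
#     setup(..., extras_require=extras_require)
# and users would opt in to the backend with something like:
#     pip install "hdmf[zarr]"
```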

donghekang · 2019-07-15T22:27:17Z

> @kangDH I think for requirements it might make more sense to have Zarr and other optional backends be included as extras

Because we import zarr in utils.py, I think zarr should be installed by default.

bendichter · 2019-07-16T22:12:20Z

@kangDH it looks like you developed this in Python 2. Could you change to Python 2/3 syntax, e.g., changing print a to print(a)? Also, we use flake8 for managing code style, which is pretty strict. Check the results there to see where it has issues.

oruebel · 2019-07-16T22:30:04Z

I'm working on the fixes for Python 3

oruebel · 2019-07-17T00:52:44Z

@kangDH the PR donghekang#1 against your branch should fix the test failures for Python 2/3 and flake8. Can you please review it and merge it into your branch?

bendichter · 2020-09-04T00:34:04Z

@oruebel Since Zarr only supports chunked datasets, how does this backend handle writing unchunked datasets? Does it write the dataset as one big chunk?

oruebel · 2020-09-04T01:33:32Z

In that case the chunks parameter will be set to False when calling the require_dataset function of zarr.hierarchy.Group. I believe this means that Zarr will store the array in a single block, but I'd need to double check what Zarr actually does in this case when the file is written to disk.

bendichter · 2020-09-04T02:04:08Z

That could be a problem for large datasets because a user would not be able to read a section of it without reading the entire dataset into memory
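The concern can be made concrete with a little arithmetic. The sketch below is not part of the backend; `chunks_touched` is a hypothetical helper that counts how many chunks a rectangular read overlaps, and the array shape is made up for illustration. It assumes every overlapped chunk is loaded in full, so the element count is an upper bound.

```python
import math


def chunks_touched(shape, chunks, selection):
    """Count chunks (and elements) that must be loaded to read `selection`.

    shape, chunks: tuples of ints (array shape and chunk shape).
    selection: one (start, stop) pair per dimension, stop exclusive.
    """
    n_chunks = 1
    for dim, c, (start, stop) in zip(shape, chunks, selection):
        if not 0 <= start < stop <= dim:
            raise ValueError("selection is out of bounds")
        first = start // c          # index of first chunk overlapped
        last = (stop - 1) // c      # index of last chunk overlapped
        n_chunks *= last - first + 1
    return n_chunks, n_chunks * math.prod(chunks)


shape = (1_000_000, 64)             # illustrative dataset shape
rows = ((0, 100), (0, 64))          # read only the first 100 rows

one_chunk = chunks_touched(shape, shape, rows)        # (1, 64000000)
chunked = chunks_touched(shape, (10_000, 64), rows)   # (1, 640000)
```

With a single chunk, reading 100 rows loads all 64 million elements; with 10,000-row chunks the same read loads 640,000.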

oruebel · 2020-09-04T02:37:42Z

The default option for chunking is currently set to True in ZarrIO. We can run some tests with some NWB files. The conversion should be fairly simple:

import os

from pynwb import NWBHDF5IO, NWBZarrIO

infile = "H19.28.012.11.05-2.nwb"
outfile = "test_zarr_" + os.path.basename(infile)

# Read the NWB file from HDF5
h5r = NWBHDF5IO(infile, 'r', load_namespaces=False)
f = h5r.read()

# Write the same container to Zarr, reusing the manager from the HDF5 reader
zw = NWBZarrIO(outfile,
               mode='w',
               manager=h5r.manager,
               chunking=True)
zw.write(f, cache_spec=True)

# Close both I/O objects
zw.close()
h5r.close()

oruebel · 2022-02-16T07:21:50Z

Closing in favor of the new PR #696

zarr backend

52f5160

zarr requirement

68a83e6

zarr backend doc

25622bf

oruebel requested a review from ajtritt July 15, 2019 22:41

Add missing numcodec requirement

b6b49d0

oruebel added 6 commits July 16, 2019 15:31

Use six texttypes in Zarr IO for Python 3 compliance

180906d

Update print functions for Python3 compliance

57c39fd

Fix flake8 in src. Fix missing import in zarr_utils. Fix wrong variable name in zarr_tools

29587af

Updated list of available/missing features for the Zarr backend

d915a1d

Replaced dict creation with dict literal

a6644be

Cleaned up ZarrIO tests to fix flake8 and replace print with asserts

0d77ee0

donghekang and others added 12 commits July 16, 2019 19:24

Merge pull request #1 from kangdh/fix/zarriopy3

ce53a91

Fix/zarriopy3

Add support for writing the namespace schema

87d47d5

Moved get_source_name in NamespaceIOHelper in case we need to overwrite it

1548671

Moved __get_types function for namespaces in tests to make it reusable for Zarr tests

79f3f9c

Added support to load namespaces from ZarrIO

ecc6d32

Merge pull request #2 from kangdh/zarr/addspecio

6fe9806

Add support for writing the namespace schema

Add missing zarr_dtype attribute. Fix write of numpy and bytestring valued attributes

e476bb5

Add write of reference attributes

503f73d

Fix read scalars in ZarrIO

943b3fb

Fix encoding of byte-string attributes as utf-8 strings

6a9f3df

Support read of object references to groups. Support read of object references from attributes. Fix write of object reference attributes

2df3132

Fix flake8

4c525e6

oruebel added 3 commits September 3, 2020 16:38

Add unit tests for ZarrIO.write_attributes and rename the function

57bb485

Cleanup imports for IO tests

452696d

Remove six and unused GroupBuilderTestCase class from test_io_zarr.py

a1aaf10

oruebel added 3 commits September 3, 2020 18:48

Make object codec used for ZarrIO settable

21717f5

Add initial test harness for testing convert between HDF5 and Zarr

d8aa512

Add documentation for the test_io_convert test harness

099960d

oruebel and others added 10 commits September 3, 2020 21:44

Add support for array_data in CSRMatrix and add unit tests and convert tests for CSRMatrix

c8d739f

Merge branch 'dev' into 1.0.3-zarr

634971d

Updates hdmf TestCase class to support comparing with Zarr containers and remove dependency on h5py

01a44f8

Updated test convert harness to allow configuration of container comparison options

ade69cc

Make manager configurable in convert test harness

73981b2

Added additional conversion tests using the Foo example

4456d5f

Updates HDF5IO.get_types to correctly determine the data type for bytes type data

fd64ca0

Test and fix converting HDF5 with external links to Zarr

2ca0c04

Add code for future test case

c728468

Revert link logic to fix broken CI test

cab0013

rly mentioned this pull request Sep 9, 2020

Refactor HDF5IO.write_dataset to be more readable #428

Merged

6 tasks

oruebel mentioned this pull request Oct 29, 2020

Add get_docval_macro method #446

Merged

6 tasks

oruebel added 5 commits February 15, 2022 22:17

Merge branch 'dev' into 1.0.3-zarr

36ec57d

Fix test failures due to update to latest HDMF dev

b5ee52e

Attempt to fix tests by updating Zarr dependencies

0efd5bc

Fix flake8 in src/

85f5c19

Fix flake8 in tests

61b1233

oruebel mentioned this pull request Feb 16, 2022

Zarr Backend #696

Closed

25 tasks

oruebel closed this Feb 16, 2022
