Fix bugs in ZarrIO to support roundtrip from HDF5 to Zarr #4

Merged
merged 1 commit into 1.0.3-zarr from bugfix/nwbzarrio
Jul 23, 2019


@oruebel oruebel commented Jul 23, 2019

Motivation

!!! Missing features:

  • ZarrIO.write_dataset
    • H5ReferenceDatasets are currently not written when converting from HDF5 to Zarr
    • np.bytes_ datasets should be written as string type datasets rather than JSON arrays

Changes made in this PR

  • Replace build.written with def __builder_written_to_zarr_io. This is a bit hacky, but when we roundtrip from HDF5 (i.e., read from HDF5 and write to Zarr), the builders are already marked as written. This should ideally be resolved in the BuildManager.
  • Fix write of attributes:
    • Ensure numpy arrays with more than one element are correctly written as attributes
    • Ensure np.bytes_ (in addition to python bytes) are converted to utf-8 string arrays to make them JSON serializable
    • Ensure attributes with references to Builders are written correctly
  • Handle np.dtypes when checking whether a dtype is a reference
  • Handle np.dtypes when determining the data type
  • For datasets, write np.dtypes as objects (via JSON) only if the np.dtype is not a np.number subtype
  • When writing object datasets, convert np.bytes_ to utf8 strings to make sure they are JSON serializable
  • In ZarrIO.listfill, set the codec (i.e., JSON) when serializing objects
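
To illustrate the attribute and object-dataset fixes above, here is a minimal sketch of the bytes-to-UTF-8 conversion that makes values JSON serializable. The helper name to_json_safe and the sample attributes are hypothetical and are not the code in ZarrIO. The sketch relies only on the facts that np.bytes_ subclasses Python bytes and that NumPy arrays and scalars expose tolist().

```python
import json


def to_json_safe(value):
    """Recursively convert a value into something json.dumps accepts.

    Hypothetical helper for illustration only; not the ZarrIO implementation.
    """
    # NumPy arrays and scalars expose tolist(), which yields plain Python objects
    if hasattr(value, "tolist"):
        value = value.tolist()
    # np.bytes_ subclasses Python bytes, so a single check covers both
    if isinstance(value, bytes):
        return value.decode("utf-8")
    # Containers are converted element by element
    if isinstance(value, (list, tuple)):
        return [to_json_safe(v) for v in value]
    if isinstance(value, dict):
        return {k: to_json_safe(v) for k, v in value.items()}
    return value


# Made-up attributes of the kind that fail in json.dumps without conversion
attrs = {"description": b"roundtrip test", "labels": [b"a", b"b"], "rate": 30.0}
encoded = json.dumps(to_json_safe(attrs))
```

Calling json.dumps(attrs) directly would raise a TypeError because bytes are not JSON serializable, which is the failure mode the attribute fixes in this PR address.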

How to test the behavior?

Using PyNWB from NeurodataWithoutBorders/pynwb#1018

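from pynwb import NWBHDF5IO, NWBZarrIO  # NWBZarrIO comes from NeurodataWithoutBorders/pynwb#1018
# Read the NWB file from HDF5
h5r = NWBHDF5IO('H19.28.012.11.05-2.nwb', 'r')
f = h5r.read()
# Write the same in-memory file to Zarr, reusing the reader's BuildManager
zw = NWBZarrIO('test_zarr_nwb', 'w', manager=h5r.manager)
zw.write(f)
# Release both backends once the write has finished
zw.close()
h5r.close()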

Checklist

  • Have you checked our Contributing document?
  • Have you ensured the PR description clearly describes problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked by running flake8 from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using #XXX notation, where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close the corresponding issue.

@oruebel oruebel merged commit a011681 into 1.0.3-zarr Jul 23, 2019
@oruebel oruebel deleted the bugfix/nwbzarrio branch July 23, 2019 16:14