Extension to the HDF5 chunks API #309

Closed
davidhassell opened this issue Aug 6, 2024 · 0 comments · Fixed by #310
Labels
  • enhancement: New feature or request
  • netCDF read: Relating to reading netCDF datasets
  • netCDF write: Relating to writing netCDF datasets
  • performance: Relating to speed and memory performance


Currently (v1.11.1.0), the treatment of HDF5 chunking is inadequate in several respects:

  • Chunking can only be set on a per-Data object basis
  • Chunking can only be defined by explicitly setting the chunks shape on each axis
  • Chunking is ignored in an output file unless native compression is on
  • Chunks from an input file are not stored

A more comprehensive and flexible API is needed:

  • cfdm.write should chunk by default, and have a keyword argument (hdf5_chunks) to configure the default chunking.
  • cfdm.read should, by default, store the HDF5 chunking on the returned data, so that it will be used when writing out to a new netCDF4 file.
  • Setting an HDF5 chunking strategy should be more intuitive. E.g. it should be easy to "chunk the time axis by 12 elements, leaving all other axes unchunked": f.nc_set_hdf_chunksizes({'T': 12})
  • Setting HDF5 chunksizes should follow the Dask API for defining its computational chunk sizes. E.g. f.nc_set_hdf_chunksizes("8 MiB")
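
As a rough illustration of the last two points, the sketch below shows one way a per-axis dictionary or a byte-size string could be resolved into an HDF5 chunk shape. It is not cfdm code: parse_bytes, chunks_from_dict and chunks_from_bytes are hypothetical helpers, axis identities are assumed to be plain strings, and the simple halving strategy is only a stand-in for whatever Dask-like algorithm the PR ends up using.

```python
import math
import re

# Hypothetical helpers for illustration only; none of these are part of cfdm.
_UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}


def parse_bytes(text):
    """Convert a string such as "8 MiB" to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", text)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"Can't parse byte size: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def chunks_from_dict(shape, axes, sizes):
    """Chunk the named axes, leaving every other axis unchunked.

    An unchunked axis gets a single chunk spanning the whole axis.
    """
    unknown = set(sizes) - set(axes)
    if unknown:
        raise ValueError(f"Unknown axes: {sorted(unknown)}")
    # A requested chunk size is capped at the axis length
    return tuple(min(sizes.get(axis, n), n) for axis, n in zip(axes, shape))


def chunks_from_bytes(shape, itemsize, limit):
    """Find a chunk shape whose size in bytes does not exceed the limit."""
    max_elements = parse_bytes(limit) // itemsize
    if max_elements < 1:
        raise ValueError("Byte limit is smaller than one array element")
    chunks = list(shape)
    # Assumed strategy: halve the largest chunk dimension until the chunk fits
    while math.prod(chunks) > max_elements:
        i = chunks.index(max(chunks))
        chunks[i] = math.ceil(chunks[i] / 2)
    return tuple(chunks)


# Example: a (time, latitude, longitude) array of 8-byte floats
shape = (120, 73, 96)
axes = ("T", "Y", "X")

print(chunks_from_dict(shape, axes, {"T": 12}))  # (12, 73, 96)
print(chunks_from_bytes(shape, 8, "1 MiB"))      # (60, 37, 48)
```

With an "8 MiB" limit the same array fits in a single chunk, so chunks_from_bytes returns the full shape unchanged.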

PR to follow.

davidhassell added the enhancement, performance, netCDF write and netCDF read labels on Aug 6, 2024