libaec / szip support #420
Also, szip support via libaec was added to conda-forge's build of HDF5 in conda-forge/hdf5-feedstock#179.
I don't know about the merits of sz versus any other compressor, but if there is data out there using it, then at least kerchunk would like numcodecs to be able to deal with it. So we need a little Cython code to link with the available binary? We will probably not want numcodecs to explicitly depend on libaec.
Although I can see that szlib.h is included with libaec via conda-forge, there are no instructions on how to call the function that seems to be the one we want, SZ_BufftoBuffDecompress.
Note that there are two shared libraries, libaec and libsz.

I think you want the second one, libsz, which provides the szlib-compatible API?
Are you looking for documentation of the function?

Here's the link to the HDF5 SZIP filter showing usage of SZ_BufftoBuffDecompress.
Does this help?

#include <stdio.h>
#include <stdint.h>
#include <szlib.h>
int main(void) {
printf("SZIP demo test\n______________\n\n");
// Pixel type
typedef uint32_t pixel_t;
// Status, check for errors
int status;
// INPUT DATA
const size_t inbuf_len = SZ_MAX_PIXELS_PER_SCANLINE*4;
const size_t inbuf_nbytes = SZ_MAX_PIXELS_PER_SCANLINE*4*sizeof(pixel_t);
pixel_t inbuf[inbuf_len];
for(int i = 0; i < SZ_MAX_PIXELS_PER_SCANLINE*4; ++i) {
inbuf[i] = i % 16;
}
// SZIP PARAMETERS
SZ_com_t sz_params;
//little endian
sz_params.options_mask = SZ_LSB_OPTION_MASK;
sz_params.bits_per_pixel = sizeof(pixel_t)*8;
sz_params.pixels_per_block = SZ_MAX_PIXELS_PER_BLOCK;
sz_params.pixels_per_scanline = SZ_MAX_PIXELS_PER_SCANLINE;
// COMPRESSION
size_t compressed_buf_nbytes = SZ_MAX_PIXELS_PER_SCANLINE*4*sizeof(pixel_t);
char compressed_buf[compressed_buf_nbytes];
status = SZ_BufftoBuffCompress(
compressed_buf, &compressed_buf_nbytes,
inbuf, inbuf_nbytes,
&sz_params
);
printf("SZ_BufftoBuffCompress status: %d\n", status);
printf("inbuf_nbytes: %zd\n", inbuf_nbytes);
printf("compressed_buf_nbytes: %zd\n\n", compressed_buf_nbytes);
if (status != SZ_OK) {
return status;
}
// DECOMPRESSION
size_t decompressed_buf_nbytes = inbuf_nbytes;
char decompressed_buf[decompressed_buf_nbytes];
status = SZ_BufftoBuffDecompress(
decompressed_buf, &decompressed_buf_nbytes,
compressed_buf, compressed_buf_nbytes,
&sz_params
);
printf("status: %d\n", status);
printf("compressed_buf_nbytes: %zd\n", compressed_buf_nbytes);
printf("decompressed_buf_nbytes: %zd\n", decompressed_buf_nbytes);
if (status != SZ_OK) {
return status;
}
return 0;
}

To compile and run, link against libsz, for example gcc szip_demo.c -o szip_demo -lsz (adding include and library paths for your libaec installation if needed).
Thank you, that is helpful. Are those sz_params values guaranteed to be like that in any HDF5 file, or is that something stored in the HDF5 filter metadata? Same question for knowing how big the decompressed buffer should be.
In HDF5, a chunk compressed with the szip filter stores the number of bytes of the decompressed buffer in the first four bytes of the chunk as a little-endian unsigned 32-bit integer. Note that the practice of storing extra metadata at the beginning of a chunk is common in HDF5. For example, the LZ4 filter uses the first 8 bytes to store the original size of the buffer, followed by 4 bytes encoding the block size.
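To make this concrete, here is a minimal Python sketch (not numcodecs code) of decoding one szip-compressed HDF5 chunk via ctypes and libsz. The library filename, and treating the 4-byte size header as always present, are assumptions based on the discussion above.

import ctypes
import struct

# Layout of SZ_com_t from szlib.h (libaec's szlib-compatible API).
class SZ_com_t(ctypes.Structure):
    _fields_ = [
        ("options_mask", ctypes.c_int),
        ("bits_per_pixel", ctypes.c_int),
        ("pixels_per_block", ctypes.c_int),
        ("pixels_per_scanline", ctypes.c_int),
    ]

# Assumed library name; this is platform dependent (e.g. "libsz.dylib" on macOS).
libsz = ctypes.CDLL("libsz.so.2")
libsz.SZ_BufftoBuffDecompress.argtypes = [
    ctypes.c_void_p,                  # dest
    ctypes.POINTER(ctypes.c_size_t),  # destLen: in = capacity, out = bytes written
    ctypes.c_void_p,                  # source
    ctypes.c_size_t,                  # sourceLen
    ctypes.POINTER(SZ_com_t),         # param
]
libsz.SZ_BufftoBuffDecompress.restype = ctypes.c_int

def decode_hdf5_szip_chunk(chunk: bytes, params: SZ_com_t) -> bytes:
    # HDF5's szip filter stores the decompressed size in the first four
    # bytes of the chunk, little endian; the compressed stream follows.
    (nbytes,) = struct.unpack("<I", chunk[:4])
    dest = ctypes.create_string_buffer(nbytes)
    dest_len = ctypes.c_size_t(nbytes)
    status = libsz.SZ_BufftoBuffDecompress(
        dest, ctypes.byref(dest_len),
        chunk[4:], len(chunk) - 4,
        ctypes.byref(params),
    )
    if status != 0:  # SZ_OK == 0 in szlib.h
        raise RuntimeError(f"SZ_BufftoBuffDecompress returned {status}")
    return dest.raw[:dest_len.value]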
Some of the parameters are also configurable by the user. See the HDF5 documentation for H5Pset_szip.
I mean at read time, when these must be fixed. Perhaps easiest would be for me to get my hands on an sz-containing HDF5 file and see what info is available.
It's pretty easy to create such an HDF5 file with conda-forge's h5py now. Conda environment:
In [1]: import h5py
In [2]: with h5py.File("test.h5", "w") as h5f:
...: ds = h5f.create_dataset("test", (1024, 1024), chunks=(16,16), dtype="uint16", compression="szip", compression_opts=('ec',32))
...: ds[:] = 1
In [3]: with h5py.File("test.h5", "r") as h5f:
...: ds = h5f["test"]
...: print(ds.compression)
...: print(ds.compression_opts)
...: dcpl = ds.id.get_create_plist()
...: print(dcpl.get_filter(0))
...:
szip
('ec', 32)
(4, 1, (141, 32, 16, 256), b'szip')
You can use the low-level API, as above, to read back the filter's parameters.

The order of the parameters comes from the HDF5 szip filter source, H5Zszip.c.
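For illustration, a short sketch of mapping those cd_values onto the szlib parameter names used in the C demo above. The ordering (options_mask, pixels_per_block, bits_per_pixel, pixels_per_scanline) is my reading of H5Zszip.c, so treat it as an assumption to verify against that source.

# Hypothetical helper: unpack the cd_values tuple from dcpl.get_filter(0),
# e.g. (141, 32, 16, 256) above. Field order assumed from H5Zszip.c.
def szip_params_from_cd_values(cd_values):
    options_mask, pixels_per_block, bits_per_pixel, pixels_per_scanline = cd_values
    return {
        "options_mask": options_mask,                # 141 = RAW|LSB|EC|ALLOW_K13 bits (my decomposition)
        "bits_per_pixel": bits_per_pixel,            # 16 for the uint16 dataset above
        "pixels_per_block": pixels_per_block,        # 32, from compression_opts=("ec", 32)
        "pixels_per_scanline": pixels_per_scanline,  # 256, derived by HDF5 from the chunk shape
    }

print(szip_params_from_cd_values((141, 32, 16, 256)))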
I have installed h5py from conda-forge and from pip, but trying to create a dataset with compression="szip" fails for me. I have h5py 3.8.0 and libaec 1.0.6 (and I do see libsz.dylib in my lib/).
(nvm, had to update HDF5 too)
FWIW, the next version of imagecodecs will include a SZIP codec. It already has an AEC codec.
Thanks @cgohlke. Does that mean you have been, and will be, working on a Cython-based version?
If imagecodecs is getting SZIP, I don't think we mind how it is implemented :)
Yes. Imagecodecs is Cython-based and includes numcodecs-compatible codecs.
It's ready for release but depends on libjpeg-turbo 3, which is currently in beta...
Imagecodecs v2023.3.16 is available on PyPI (conda-forge will have to wait for libjpeg-turbo 3) and includes a numcodecs-compatible SZIP codec based on an implementation in Cython.
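For anyone landing here later, a hypothetical usage sketch: the Szip class name, its import path, and its parameter names below are assumptions on my part, so check the imagecodecs documentation for the actual API.

# Hypothetical sketch: class name and parameters are assumed, not verified.
import numpy as np
from imagecodecs.numcodecs import Szip  # assumed import path

codec = Szip(
    options_mask=141,  # parameter values as read from the HDF5 filter above
    pixels_per_block=32,
    bits_per_pixel=16,
    pixels_per_scanline=256,
)
data = np.ones((16, 16), dtype="uint16").tobytes()
assert bytes(codec.decode(codec.encode(data))) == data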
Thank you, @cgohlke.
NASA Earth Observing System (EOS) uses szip compression:
https://www.earthdata.nasa.gov/esdis/esco/standards-and-practices/hdf-eos5
It is an integrated compression codec in HDF5:
https://portal.hdfgroup.org/display/HDF5/Szip+Compression+in+HDF+Products
Deutsches Klimarechenzentrum (DKRZ) has a freely available implementation under a 2-Clause BSD License:
https://gitlab.dkrz.de/k202009/libaec
GitHub mirror:
https://github.com/MathisRosenhauer/libaec
Conda-forge has libaec binaries here:
https://anaconda.org/conda-forge/libaec/files
They are produced by the following feedstock:
https://github.com/conda-forge/libaec-feedstock
Based on the widespread existing use of SZIP and the availability of the libaec implementation, I recommend that numcodecs support SZIP / AEC compression.