Access to filter tools? #107
Is there an intention to add the ability to use the filter capabilities of H5Z on datasets, such as the checksum or compression filters? For an example of what I mean, see https://docs.h5py.org/en/stable/high/group.html#h5py.Group.create_dataset, which provides optional arguments for compression, shuffle, fletcher32, etc.

Interestingly, I noticed that reading files with filters in place (such as files written with the h5py tools above) already works automatically thanks to the baseline HDF5 library, so support is halfway there.

Comments
Hi, rereading my post, let me clarify: a file with filters on its datasets that I created in h5py could be opened just fine in C# with these tools.
Yeah, got it. But a newly created H5 file in C# (this library) cannot have filters set explicitly right now.
Exactly right. My own reading leads me to think this isn't something that can be set retroactively; the dataset must be created with the filter. And even if it could be set retroactively, the file size wouldn't change without a repack (e.g. with the stock h5repack tool).
Correct. But maybe a separate conversion tool could recreate those files if needed.
I've looked through h5py and this source a bit more to figure out the mechanism, and I've determined two things:
I think it's easier than that, since the H5D.create method for datasets accepts additional property-list parameters:

```csharp
long dcPropertyId = -1;
long lcPropertyId = -1;
long datasetId = -1;

// Dataset-creation property list; filters require a chunked layout.
dcPropertyId = H5P.create(H5P.DATASET_CREATE);
H5P.set_chunk(dcPropertyId, 1, new ulong[] { chunkLength });
H5P.set_shuffle(dcPropertyId);
H5P.set_deflate(dcPropertyId, 7);

// Link-creation property list so intermediate groups are created as needed.
lcPropertyId = H5P.create(H5P.LINK_CREATE);
H5P.set_create_intermediate_group(lcPropertyId, 1);

// loc, name, type, and space ids elided here.
datasetId = H5D.create(.., .., .., .., lcPropertyId, dcPropertyId);
```

I haven't tested it yet, but this (or a variation of this code example) may enable it.
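For context, here is a minimal end-to-end sketch of that approach against HDF.PInvoke, untested; the file name, dataset path, chunk size, and compression level are illustrative assumptions, not anything prescribed by this library:

```csharp
using System;
using System.Runtime.InteropServices;
using HDF.PInvoke;

internal static class FilteredDatasetSketch
{
    internal static void Main()
    {
        double[] data = new double[1000];
        for (int i = 0; i < data.Length; i++) { data[i] = i; }

        long fileId = H5F.create("filtered.h5", H5F.ACC_TRUNC);
        long spaceId = H5S.create_simple(1, new ulong[] { (ulong)data.Length }, null);

        // Dataset-creation property list: chunking is mandatory for filters.
        long dcplId = H5P.create(H5P.DATASET_CREATE);
        H5P.set_chunk(dcplId, 1, new ulong[] { 100 });
        H5P.set_shuffle(dcplId);     // byte-shuffle, improves deflate ratios
        H5P.set_deflate(dcplId, 7);  // gzip level 7 (illustrative)

        // Link-creation property list so intermediate groups are created.
        long lcplId = H5P.create(H5P.LINK_CREATE);
        H5P.set_create_intermediate_group(lcplId, 1);

        long dsetId = H5D.create(fileId, "/group/sub/doubles",
                                 H5T.NATIVE_DOUBLE, spaceId, lcplId, dcplId);

        // H5D.write takes an IntPtr, so pin the managed array first.
        GCHandle hnd = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            H5D.write(dsetId, H5T.NATIVE_DOUBLE, H5S.ALL, H5S.ALL,
                      H5P.DEFAULT, hnd.AddrOfPinnedObject());
        }
        finally
        {
            hnd.Free();
        }

        H5D.close(dsetId);
        H5P.close(lcplId);
        H5P.close(dcplId);
        H5S.close(spaceId);
        H5F.close(fileId);
    }
}
```

Reading such a file back should need no extra work, since (as noted above) registered filters are applied transparently by the HDF5 library on read.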
Is there a way to still use "WriteObject" when creating datasets in that way? For example, could a custom "Filters" attribute assign the Hdf5ReadWrite attribute to skip any writes, and then your custom dataset creation with deflate/shuffle could sneak the dataset into the already-created group?
@gaschd I don't completely understand what you are asking. Do you mean rewriting an existing dataset?
I think that in order to use filter tools, a chunked dataset is needed. Currently, arrays written with "WriteObject" always use contiguous storage, so I'd suggest introducing an attribute to indicate chunking:

```csharp
private class TestClassWithArray
{
    public double[] TestDoubles { get; set; }

    [ChunkedDataset(new ulong[] { 10, 50 })]
    public double[] TestDoublesChunked { get; set; }

    [ChunkedDataset(new ulong[] { 10, 50 },
        HDF.PInvoke.H5Z.filter_t.DEFLATE,
        HDF.PInvoke.H5Z.filter_t.SHUFFLE)]
    public double[] TestDoublesChunkedFiltered { get; set; }
}
```

If the attribute also has filters like shuffle and deflate defined, they could be applied when creating the chunked dataset, without the need to convert the files afterwards.
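To make that proposal concrete, here is a minimal sketch of what such an attribute could look like (hypothetical: neither ChunkedDatasetAttribute nor any reflection support for it exists in the library yet):

```csharp
using System;
using HDF.PInvoke;

// Hypothetical marker attribute: the writer would look for it via
// reflection and switch from contiguous to chunked dataset creation,
// applying the requested H5Z filters to the dataset-creation plist.
[AttributeUsage(AttributeTargets.Property)]
public sealed class ChunkedDatasetAttribute : Attribute
{
    public ulong[] ChunkDimensions { get; }
    public H5Z.filter_t[] Filters { get; }

    public ChunkedDatasetAttribute(ulong[] chunkDimensions,
                                   params H5Z.filter_t[] filters)
    {
        ChunkDimensions = chunkDimensions;
        Filters = filters;
    }
}
```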
@gaschd now I understand. :) |