
Access to filter tools? #107

Open
ChasenRanger opened this issue Feb 5, 2021 · 11 comments


@ChasenRanger

ChasenRanger commented Feb 5, 2021

Is there an intention to add the ability to use the H5Z filter capabilities on datasets, such as the checksum or compression filters? For an example of what I mean, see https://docs.h5py.org/en/stable/high/group.html#h5py.Group.create_dataset, which provides optional arguments such as compression, shuffle, and fletcher32.
Interestingly, I noticed that reading files with filters in place (such as those written with the above h5py tools) already works automatically thanks to the baseline HDF5 library, so the feature is halfway done.

@LiorBanai (Owner)

Hi,
I was not familiar with the H5Z namespace.
From a quick reading it does seem to be supported in the underlying library, so I'll read more and see if it is possible to add it.

@ChasenRanger (Author)

Rereading my post, let me clarify that a file with filters on its datasets, which I created in h5py, could be opened just fine in C# with these tools.

@LiorBanai (Owner)

Yeah, got it. But a newly created H5 file in C# (this library) cannot set them explicitly right now.

@ChasenRanger (Author)

Exactly right. My own reading leads me to think the filters won't be something that can be set retroactively; the dataset must be created with the filter. Even if they could be set retroactively, the file size wouldn't automatically change without a repack.

@LiorBanai (Owner)

Correct. But maybe a different conversion tool could recreate those files if needed.
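
For reference, the HDF Group's h5repack utility is one such tool: it rewrites a file while applying new chunking and filter settings. A minimal invocation that applies shuffle plus gzip level 7 to every dataset might look like:

    h5repack -f SHUF -f GZIP=7 input.h5 output.h5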

@ChasenRanger (Author)

I've looked through h5py and this source a bit more to figure out the mechanism, and I've determined two things:
First, the function at line 105 (and particularly line 135 below it) of Hdf5Utils.cs is where the H5D call is, which means adding the filters is perhaps more work than I initially suspected.
Second, after trying far too long to follow the thread of calls that sets up the dcpl_id long used in this case, I still have no idea how certain parameters are assembled.
The second point has so far stymied my attempts to manually replicate the filters, but I only just figured out that the H5Z filters are actually invoked through H5P, which is the ultimate reason for this update comment. I am not yet sure how to use the Dataset Creation Properties, but I at least realize they are the pertinent thing to look at. Here is a relevant link for the sake of completeness.

@LiorBanai (Owner)

I think it's easier than that, since the H5D.create method for datasets accepts additional property-list parameters:

                // Create a dataset creation property list and attach the filters to it.
                long dcPropertyId = H5P.create(H5P.DATASET_CREATE);
                H5P.set_shuffle(dcPropertyId);
                H5P.set_deflate(dcPropertyId, 7);
                // Filters require a chunked layout; chunkLength is chosen by the caller.
                H5P.set_chunk(dcPropertyId, 1, new ulong[] { chunkLength });
                long lcPropertyId = H5P.create(H5P.LINK_CREATE);
                H5P.set_create_intermediate_group(lcPropertyId, 1);
                long datasetId = H5D.create(.., .., .., .., lcPropertyId, dcPropertyId);

I haven't tested it yet, but this (or a variation of this code example) may enable it.
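
For a fuller picture, here is a minimal self-contained sketch along those lines, assuming HDF.PInvoke and a one-dimensional double array (the file name, dataset path, and chunk size are illustrative, not part of the library):

    using System;
    using System.Runtime.InteropServices;
    using HDF.PInvoke;

    class FilteredDatasetSketch
    {
        static void Main()
        {
            var data = new double[1000];
            for (int i = 0; i < data.Length; i++) data[i] = i;

            long fileId = H5F.create("filtered.h5", H5F.ACC_TRUNC);

            // Dataset creation property list: filters require a chunked layout.
            long dcplId = H5P.create(H5P.DATASET_CREATE);
            H5P.set_chunk(dcplId, 1, new ulong[] { 100 });
            H5P.set_shuffle(dcplId);    // byte-shuffle to improve compression
            H5P.set_deflate(dcplId, 7); // gzip, level 7

            long spaceId = H5S.create_simple(1, new ulong[] { (ulong)data.Length }, null);
            long datasetId = H5D.create(fileId, "/filtered", H5T.NATIVE_DOUBLE, spaceId,
                                        H5P.DEFAULT, dcplId, H5P.DEFAULT);

            // Pin the managed array so the native library can read from it.
            GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
            H5D.write(datasetId, H5T.NATIVE_DOUBLE, H5S.ALL, H5S.ALL,
                      H5P.DEFAULT, handle.AddrOfPinnedObject());
            handle.Free();

            H5D.close(datasetId);
            H5S.close(spaceId);
            H5P.close(dcplId);
            H5F.close(fileId);
        }
    }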

@gaschd

gaschd commented Jul 13, 2022

Is there a way to still use "WriteObject" when creating datasets that way? For example, could a custom "Filters" attribute mark a property with the Hdf5ReadWrite attribute to skip the normal writes, and then use your custom dataset creation with deflate/shuffle to sneak the dataset into the created group?

@LiorBanai (Owner)

@gaschd I don't completely understand what you are asking. Do you mean rewriting an existing dataset?

@gaschd

gaschd commented Aug 3, 2022

I think that in order to use filter tools, a chunked dataset is needed. Currently, arrays written with "WriteObject" always use contiguous storage.

So I'd suggest introducing an attribute to indicate that:

    private class TestClassWithArray
    {
        public double[] TestDoubles { get; set; }

        [ChunkedDataset(new ulong[] { 10, 50 })]
        public double[] TestDoublesChunked { get; set; }

        [ChunkedDataset(new ulong[] { 10, 50 }, HDF.PInvoke.H5Z.filter_t.DEFLATE, HDF.PInvoke.H5Z.filter_t.SHUFFLE)]
        public double[] TestDoublesChunkedFiltered { get; set; }
    }

If the attribute also defines filters like shuffle and deflate, they could be applied when the chunked dataset is created, without the need to convert the files afterwards.
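
A minimal sketch of what such an attribute could look like, assuming HDF.PInvoke's H5Z.filter_t enum (the attribute itself is hypothetical and not part of the library):

    using System;
    using HDF.PInvoke;

    // Hypothetical attribute; the name and properties are illustrative only.
    [AttributeUsage(AttributeTargets.Property)]
    public sealed class ChunkedDatasetAttribute : Attribute
    {
        public ulong[] ChunkDimensions { get; }
        public H5Z.filter_t[] Filters { get; }

        public ChunkedDatasetAttribute(ulong[] chunkDimensions, params H5Z.filter_t[] filters)
        {
            ChunkDimensions = chunkDimensions;
            Filters = filters;
        }
    }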

@LiorBanai (Owner)

LiorBanai commented Aug 3, 2022

@gaschd now I understand. :)
I'll see when I can implement this.
