
Access to filter tools? #107

Open
ChasenRanger opened this issue Feb 5, 2021 · 11 comments


@ChasenRanger

ChasenRanger commented Feb 5, 2021

Is there an intention to add the ability to use the H5Z filter capabilities on datasets, such as the checksum or compression filters? For an example of what I mean, see https://docs.h5py.org/en/stable/high/group.html#h5py.Group.create_dataset, which provides optional arguments such as compression, shuffle, and fletcher32.
Interestingly, I noticed that reading files with filters in place (such as those written with the above h5py tools) already works automatically thanks to the baseline HDF5 library, so the feature is halfway done.

@LiorBanai (Owner)

Hi,
I was not familiar with the H5Z namespace.
From a quick reading it does seem to be supported in the underlying library, so I'll read more and see if it is possible to add it.

@ChasenRanger (Author)

Rereading my post, let me clarify that a file with filters on its datasets, which I created in h5py, could be opened just fine in C# with these tools.

@LiorBanai (Owner)

Yeah, got it. But a newly created H5 file in C# (this library) cannot set them explicitly right now.

@ChasenRanger (Author)

Exactly right. My own reading leads me to think the filters won't be something that can be set retroactively; the dataset must be created with the filter. Even if they could be set retroactively, the file size wouldn't automatically change without a repack.

@LiorBanai (Owner)

Correct. But maybe a different conversion tool could recreate those files if needed.
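
For reference, the HDF Group's h5repack utility is one such tool: it rewrites a file while applying new chunking and filter settings. A minimal invocation that applies shuffle plus gzip level 7 to every dataset might look like:

    h5repack -f SHUF -f GZIP=7 input.h5 output.h5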

@ChasenRanger (Author)

I've looked through h5py and this source a bit more to figure out the mechanism, and I've determined two things:
First, the function at line 105 (and particularly line 135 below it) of Hdf5Utils.cs is where the H5D call is, which means adding the filters is perhaps more work than I initially suspected.
Second, after trying far too long to follow the thread of calls that sets up the dcpl_id long used in this case, I still have no idea how certain parameters are assembled.
The second point has so far stymied my attempts to manually replicate the filters, but I only just figured out that the H5Z filters are actually invoked through H5P, which is the ultimate reason for this update comment. I am not yet sure how to use the Dataset Creation Properties, but I at least realize they are the pertinent thing to look at. Here is a relevant link for the sake of completeness.

@LiorBanai (Owner)

I think it's easier than that, since the H5D.create method for datasets accepts additional property-list parameters:

                // Create a dataset creation property list and attach the filters to it.
                long dcPropertyId = H5P.create(H5P.DATASET_CREATE);
                H5P.set_shuffle(dcPropertyId);
                H5P.set_deflate(dcPropertyId, 7);
                // Filters require a chunked layout; chunkLength is chosen by the caller.
                H5P.set_chunk(dcPropertyId, 1, new ulong[] { chunkLength });
                long lcPropertyId = H5P.create(H5P.LINK_CREATE);
                H5P.set_create_intermediate_group(lcPropertyId, 1);
                long datasetId = H5D.create(.., .., .., .., lcPropertyId, dcPropertyId);

I haven't tested it yet, but this (or a variation of this code example) may enable it.
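
For a fuller picture, here is a minimal self-contained sketch along those lines, assuming HDF.PInvoke and a one-dimensional double array (the file name, dataset path, and chunk size are illustrative, not part of the library):

    using System;
    using System.Runtime.InteropServices;
    using HDF.PInvoke;

    class FilteredDatasetSketch
    {
        static void Main()
        {
            var data = new double[1000];
            for (int i = 0; i < data.Length; i++) data[i] = i;

            long fileId = H5F.create("filtered.h5", H5F.ACC_TRUNC);

            // Dataset creation property list: filters require a chunked layout.
            long dcplId = H5P.create(H5P.DATASET_CREATE);
            H5P.set_chunk(dcplId, 1, new ulong[] { 100 });
            H5P.set_shuffle(dcplId);    // byte-shuffle to improve compression
            H5P.set_deflate(dcplId, 7); // gzip, level 7

            long spaceId = H5S.create_simple(1, new ulong[] { (ulong)data.Length }, null);
            long datasetId = H5D.create(fileId, "/filtered", H5T.NATIVE_DOUBLE, spaceId,
                                        H5P.DEFAULT, dcplId, H5P.DEFAULT);

            // Pin the managed array so the native library can read from it.
            GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
            H5D.write(datasetId, H5T.NATIVE_DOUBLE, H5S.ALL, H5S.ALL,
                      H5P.DEFAULT, handle.AddrOfPinnedObject());
            handle.Free();

            H5D.close(datasetId);
            H5S.close(spaceId);
            H5P.close(dcplId);
            H5F.close(fileId);
        }
    }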

@gaschd

gaschd commented Jul 13, 2022

Is there a way to still use "WriteObject" when creating datasets that way? For example, could a custom "Filters" attribute mark a property with the Hdf5ReadWrite attribute to skip the normal writes, and then use your custom dataset creation with deflate/shuffle to sneak the dataset into the created group?

@LiorBanai (Owner)

@gaschd I don't completely understand what you are asking. Do you mean rewriting an existing dataset?

@gaschd

gaschd commented Aug 3, 2022

I think that in order to use filter tools, a chunked dataset is needed. Currently, arrays written with "WriteObject" always use contiguous storage.

So I'd suggest introducing an attribute to indicate that:

    private class TestClassWithArray
    {
        public double[] TestDoubles { get; set; }

        [ChunkedDataset(new ulong[] { 10, 50 })]
        public double[] TestDoublesChunked { get; set; }

        [ChunkedDataset(new ulong[] { 10, 50 }, HDF.PInvoke.H5Z.filter_t.DEFLATE, HDF.PInvoke.H5Z.filter_t.SHUFFLE)]
        public double[] TestDoublesChunkedFiltered { get; set; }
    }

If the attribute also defines filters like shuffle and deflate, they could be applied when the chunked dataset is created, without the need to convert the files afterwards.
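
A minimal sketch of what such an attribute could look like, assuming HDF.PInvoke's H5Z.filter_t enum (the attribute itself is hypothetical and not part of the library):

    using System;
    using HDF.PInvoke;

    // Hypothetical attribute; the name and properties are illustrative only.
    [AttributeUsage(AttributeTargets.Property)]
    public sealed class ChunkedDatasetAttribute : Attribute
    {
        public ulong[] ChunkDimensions { get; }
        public H5Z.filter_t[] Filters { get; }

        public ChunkedDatasetAttribute(ulong[] chunkDimensions, params H5Z.filter_t[] filters)
        {
            ChunkDimensions = chunkDimensions;
            Filters = filters;
        }
    }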

@LiorBanai (Owner)

LiorBanai commented Aug 3, 2022

@gaschd now I understand. :)
I'll see when I can implement this.
