
netcdf built with hdf5 1.10.8 error after 'nc_def_var_fletcher32' #2189

Closed
abhibaruah opened this issue Jan 19, 2022 · 23 comments · Fixed by #2730

@abhibaruah

NetCDF version: v4.8.1
OS: Linux 10
Compiler: gcc 8.3.0

I am a developer at The MathWorks, Inc. For MATLAB, we have been building netcdf against hdf5 1.8.12; we are now trying to move netcdf's hdf5 dependency to v1.10.8.

I built netcdf successfully with hdf5 1.10.8. However, while qualifying the build I started seeing some of our tests fail, which I have narrowed down to an issue with 'nc_def_var_fletcher32'.

I am attaching a C++ repro below, which works fine when netcdf 4.8.1 is built with hdf5 1.8.12.

However, when netcdf 4.8.1 is built with hdf5 1.10.8, the final 'nc_close' call returns error code -101 (NC_EHDFERR).
The error does not occur if the datatype (xtype) in 'nc_def_var' is anything other than NC_STRING.
The error also does not occur if the call to 'nc_def_var_fletcher32' is removed.

Any help in resolving this would be highly appreciated.

// Repro: create a netCDF-4 file, define an NC_STRING variable, enable
// the fletcher32 checksum, and close.
// (Assumed build line: g++ repro.cpp -o test -lnetcdf)
#include <iostream>
#include "netcdf.h"

void checkErrorCode(int status, const char* message) {
    if (status != NC_NOERR) {
        std::cout << "Error code: " << status << " from " << message << std::endl;
    }
}

int main() {

    int ncid;
    int retval;

    retval = nc_create("myfile.nc", NC_NETCDF4, &ncid);
    checkErrorCode(retval, "nc_create");

    // Define a dimension of length 100.
    int dimid_a;
    int dimlen = 100;
    retval = nc_def_dim(ncid, "a", dimlen, &dimid_a);
    checkErrorCode(retval, "nc_def_dim for 'a'");

    // Define a variable of type NC_STRING; the failure occurs only for
    // this (variable-length) type.
    int varid;
    int dimids[1] = {dimid_a};
    retval = nc_def_var(ncid, "var", NC_STRING, 1, dimids, &varid);
    checkErrorCode(retval, "nc_def_var");

    // Enable the fletcher32 checksum filter on the variable
    // (NC_FLETCHER32 == 1, equivalent to the original 'true').
    retval = nc_def_var_fletcher32(ncid, varid, NC_FLETCHER32);
    checkErrorCode(retval, "nc_def_var_fletcher32");

    // With hdf5 1.10.8 this reports error -101 (NC_EHDFERR).
    retval = nc_close(ncid);
    checkErrorCode(retval, "nc_close");

    return retval;
}
@WardF (Member) commented Jan 19, 2022

Thanks; taking a look at it now, I will also see about getting 1.10.8 into our test matrix.

@WardF (Member) commented Jan 19, 2022

What platform are you observing this failure on?

WardF self-assigned this Jan 19, 2022
WardF added this to the 4.9.0 milestone Jan 19, 2022
@WardF (Member) commented Jan 19, 2022

Never mind my question; I see it in the header of the issue you opened. Sorry, and thanks. :)

@WardF (Member) commented Jan 19, 2022

Observing this issue on OSX as well.

@WardF (Member) commented Jan 19, 2022

On Linux, with debug and logging turned on, I am observing the following.

TL;DR: H5Dcreate2 is failing for some reason.

wfisher@ubuntudev-arm:~/Desktop/github-test$ ./test 
				log_level changed to 5
		HDF5_def_dim: ncid 0x10000 name a len 100
				nc4_find_dim: dimid 0
		NC4_def_var: name var type 12 ndims 1
				dimid[0] 0
				nc4_get_typelen_mem xtype: 12
		nc_inq_atomic_type: typeid 12
				nc4_type_new: size 8 name string assignedid 12
				nc4_find_dim: dimid 0
				allocating array of 1 size_t to hold chunksizes for var var
				nc4_find_default_chunksizes2: name var dim 0 DEFAULT_CHUNK_SIZE 4194304 num_values 100.000000 type_size 8 chunksize 100
				total_chunk_size 800.000000
				nc4_get_typelen_mem xtype: 12
				new varid 0
		nc_def_var_extra: ncid 0x10000 varid 0
			extra: default: chunksizes for var varsizes=100
	NC4_close: ncid 0x10000
			nc4_close_hdf5_file: h5->path myfile.nc abort 0
			sync_netcdf4_file
		*** NetCDF-4 Internal Metadata: int_ncid 0x0 ext_ncid 0x10000
		FILE - path: myfile.nc cmode: 0x11009 parallel: 0 redef: 0 fill_mode: 0 no_write: 0 next_nc_grpid: 1
		 GROUP - / nc_grpid: 0 nvars: 1 natts: 0
		 DIMENSION - dimid: 0 name: a len: 100 unlimited: 0
		 VARIABLE - varid: 0 name: var ndims: 1 dimids: 0 storage: chunked
			nc4_rec_write_groups_types: grp->hdr.name /
			nc4_rec_write_metadata: grp->hdr.name /, bad_coord_order 0
				nc4_create_dim_wo_var: creating dim a
				nc4_create_dim_wo_var: about to H5Dcreate1 a dimscale dataset a
				write_netcdf4_dimid: writing secret dimid 0
				write_var: writing var var
			var_create_dataset:: name var
				var_create_dataset: about to H5Dcreate2 dataset var of type 0x13e
ERROR: file nc4hdf.c, line 988.
NetCDF: HDF error
Error -101: nc_close was received

@WardF (Member) commented Jan 19, 2022

For comparison, the following succeeds (netCDF 4.8.1, HDF5 1.8.21)

wfisher@ubuntudev-arm:~/Desktop/github-test$ ./test 
				log_level changed to 5
		HDF5_def_dim: ncid 0x10000 name a len 100
				nc4_find_dim: dimid 0
		NC4_def_var: name var type 12 ndims 1
				dimid[0] 0
				nc4_get_typelen_mem xtype: 12
		nc_inq_atomic_type: typeid 12
				nc4_type_new: size 8 name string assignedid 12
				nc4_find_dim: dimid 0
				allocating array of 1 size_t to hold chunksizes for var var
				nc4_find_default_chunksizes2: name var dim 0 DEFAULT_CHUNK_SIZE 4194304 num_values 100.000000 type_size 8 chunksize 100
				total_chunk_size 800.000000
				nc4_get_typelen_mem xtype: 12
				new varid 0
		nc_def_var_extra: ncid 0x10000 varid 0
			extra: default: chunksizes for var varsizes=100
	NC4_close: ncid 0x10000
			nc4_close_hdf5_file: h5->path myfile.nc abort 0
			sync_netcdf4_file
		*** NetCDF-4 Internal Metadata: int_ncid 0x0 ext_ncid 0x10000
		FILE - path: myfile.nc cmode: 0x11009 parallel: 0 redef: 0 fill_mode: 0 no_write: 0 next_nc_grpid: 1
		 GROUP - / nc_grpid: 0 nvars: 1 natts: 0
		 DIMENSION - dimid: 0 name: a len: 100 unlimited: 0
		 VARIABLE - varid: 0 name: var ndims: 1 dimids: 0 storage: chunked
			nc4_rec_write_groups_types: grp->hdr.name /
			nc4_rec_write_metadata: grp->hdr.name /, bad_coord_order 0
				nc4_create_dim_wo_var: creating dim a
				nc4_create_dim_wo_var: about to H5Dcreate1 a dimscale dataset a
				write_netcdf4_dimid: writing secret dimid 0
				write_var: writing var var
			var_create_dataset:: name var
				var_create_dataset: about to H5Dcreate2 dataset var of type 0x3000060
		attach_dimscales: attaching scale for dimid 0 to var var
					NC4_write_ncproperties
			nc4_rec_grp_HDF5_del: grp->name /
			closing HDF5 dataset 83886081
				nc4_rec_grp_HDF5_del: closing group /
			nc4_close_netcdf4_file: h5->path myfile.nc abort 0
					NC4_clear_provenance
			nc4_rec_grp_del: grp->name /
				var_free: deleting var var
				nc4_type_free: deleting type string
				dim_free: deleting dim a

@WardF (Member) commented Jan 19, 2022

Note to self: I've narrowed this down to a regression between HDF5 1.10.3 (works) and 1.10.7 (fails). It is as yet unclear to me whether fletcher32 on a variable of variable-length datatype is valid in the HDF5 API. I'll investigate further tomorrow.

@WardF (Member) commented Jan 20, 2022

Confirmed that HDF5 1.10.6 appears to work, so the regression arrived with 1.10.7; I'll have to dig into the release notes.

@WardF (Member) commented Jan 20, 2022

From the 1.10.7 release notes, I see the following entry. It is confusing, since at a glance it seems to indicate the reverse of what we are observing; it will merit further review.

   - Creation of dataset with optional filter

      When the combination of type, space, etc doesn't work for filter
      and the filter is optional, it was supposed to be skipped but it was
      not skipped and the creation failed.

      A fix is applied to allow the creation of a dataset in such
      situation, as specified in the user documentation.

      (BMR - 2020/8/13, HDFFV-10933)

@WardF (Member) commented Jan 20, 2022

I'm still looking into why this is happening, but the bottom line may be that it is beyond our control in this circumstance. It should not, however, fail silently; we'll have to put some thought into how best to handle this at the netcdf layer.

@DennisHeimbigner (Collaborator)

We could put a check in nc_def_var_filter so that an attempt to add a filter to a string- or vlen-typed variable causes the call to fail.
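
A minimal sketch of what such a check could look like, using only public netCDF inquiry calls; the helper name and placement are illustrative, not the actual netcdf-c patch:

#include <netcdf.h>

/* Hypothetical helper: returns NC_EFILTER if varid's type is
 * variable-length (string or VLEN), NC_NOERR if a filter is allowed. */
static int
check_filter_allowed(int ncid, int varid)
{
    nc_type xtype;
    int stat = nc_inq_vartype(ncid, varid, &xtype);
    if (stat != NC_NOERR) return stat;
    if (xtype == NC_STRING)
        return NC_EFILTER;                  /* variable-length string */
    if (xtype >= NC_FIRSTUSERTYPEID) {      /* user-defined type */
        int klass;
        stat = nc_inq_user_type(ncid, xtype, NULL, NULL, NULL, NULL, &klass);
        if (stat != NC_NOERR) return stat;
        if (klass == NC_VLEN)
            return NC_EFILTER;              /* VLEN data is pointers */
    }
    return NC_NOERR;
}

nc_def_var_filter would call this before recording the filter. (This simple version does not follow compound types; the eventual fix discussed below also treats a compound that transitively contains a string or VLEN as variable-length.)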

@WardF (Member) commented Jan 20, 2022

@DennisHeimbigner Agreed. I'm uncertain (and can't switch gears at the moment to think it through) whether we want to remove the filter definition when the combination doesn't work and silently proceed, or whether we want to fail with an informative message. The question is whether we want to break a workflow with a message that explains why, or simply retain the previous behavior (one layer up).

@WardF (Member) commented Jan 20, 2022

@abhibaruah For what it's worth, it looks like this test may need some alteration on your end regardless of how we proceed: from my testing, any HDF5 version 1.10.7 or later will produce the behavior you're seeing.

@DennisHeimbigner (Collaborator)

I interpret the HDF5 release notes to mean that if the filter cannot be applied, it is now skipped, whereas before it was not skipped and was executed anyway. My interpretation is that this relates to a function in the filter API called "can_apply", which is invoked before a filter is used, to indicate whether the filter should work on the specific variable. So for fletcher32, this function is probably defined to indicate that the filter cannot apply to a variable of variable-length type. However, the HDF5 library was ignoring this and applying the filter anyway; the fix in 1.10.7 probably causes the library to pay attention to the output of can_apply(). I am not sure when can_apply() is called.
Does that interpretation fit the symptoms?
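
For reference, a can_apply callback has the H5Z_can_apply_func_t signature and is registered in the can_apply field of a filter's H5Z_class2_t. A minimal sketch of the kind of check Dennis describes (illustrative only; not HDF5's actual fletcher32 code):

#include <hdf5.h>

/* Return 0 ("cannot apply") for variable-length data, 1 otherwise.
 * HDF5 calls this before the filter is added to the pipeline. */
static htri_t
my_can_apply(hid_t dcpl_id, hid_t type_id, hid_t space_id)
{
    (void)dcpl_id;
    (void)space_id;
    if (H5Tdetect_class(type_id, H5T_VLEN) > 0)
        return 0;   /* VLEN, possibly nested inside a compound */
    if (H5Tget_class(type_id) == H5T_STRING && H5Tis_variable_str(type_id) > 0)
        return 0;   /* variable-length string */
    return 1;
}

Whether the library fails dataset creation or silently skips the filter when can_apply returns 0 depends on whether the filter was registered as mandatory or optional, which turns out to be the crux of this issue (see below).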

@abhibaruah (Author)

Hello Ward and Dennis,
Thank you for working on this.
My understanding from the discussion above is that from hdf5 1.10.7 onwards, invoking a filter on a variable of variable-length type will fail.
However, this behavior seems to exist only with fletcher32. I tried the netcdf function 'nc_def_var_deflate' on a string-typed variable, and it did not throw any error (I am unsure whether the filter is applied or simply ignored).

I wonder if the erroring behavior with fletcher32 might have been an oversight on the part of hdf5.
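
For comparison, the deflate variant described here amounts to replacing the fletcher32 call in the repro above with the following sketch (same ncid, varid, and checkErrorCode as in the repro):

/* nc_def_var_deflate(ncid, varid, shuffle, deflate, deflate_level) */
retval = nc_def_var_deflate(ncid, varid, 0, 1, 5);
checkErrorCode(retval, "nc_def_var_deflate");   /* prints nothing: NC_NOERR */

retval = nc_close(ncid);
checkErrorCode(retval, "nc_close");             /* also succeeds with 1.10.8 */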

@abhibaruah (Author)

Meanwhile, I have also reached out to the HDF Group and forwarded them the link to this discussion. Hopefully they can help us identify next steps, and whether the behavior in the hdf5 layer is intended or a bug.

@DennisHeimbigner (Collaborator)

My suspicion is that the other filters either do not have a can_apply() function, or their function does not forbid variable-length types. If correct, then what gets compressed is not the actual data but rather the pointers to the variable-length data: the pointer to a string, or the .p field of an hvl_t object.
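
A small standalone illustration of that point (not library code): what the filter pipeline would actually see for string and VLEN elements is pointer-sized records, not the referenced data.

#include <stdio.h>
#include <hdf5.h>   /* for hvl_t */

int main(void) {
    /* An NC_STRING buffer is an array of char* ... */
    const char *strings[2] = { "hello", "world" };

    /* ... and an HDF5 VLEN element is an hvl_t: a {len, p} pair
     * whose .p field points at data stored elsewhere. */
    int payload[3] = { 1, 2, 3 };
    hvl_t vl;
    vl.len = 3;
    vl.p = payload;

    /* A checksum or compressor fed this buffer would operate on
     * sizeof(char*) or sizeof(hvl_t) bytes per element: the pointers. */
    printf("%zu bytes per string element, %zu per VLEN element\n",
           sizeof strings[0], sizeof vl);
    return 0;
}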

@abhibaruah (Author)

You are correct, Dennis.
Here is what Elena from THG replied:
"Yes, behavior changed. Filtering for variable-size types was disabled at some point by Quincey."

She also confirmed that the erroring behavior is seen only with fletcher32, since the checksum filter is mandatory. All other filters are silently ignored when used with variables of variable-length datatypes.

Here is what Elena wrote:
"See H5ocpl.c file where gzip and fletcher filters are registered (i.e., H5Pset_deflate and H5Pset_fletcher32). The first filter is optional while checksum is mandatory."

@abhibaruah (Author)

> @DennisHeimbigner Agreed. I'm uncertain (and can't switch gears at the moment to think it through) whether we want to remove the filter definition when the combination doesn't work and silently proceed, or whether we want to fail with an informative message. The question is whether we want to break a workflow with a message that explains why, or simply retain the previous behavior (one layer up).

So, with reference to the two options in Ward's comment above, which will the netcdf library go with: ignoring the filter and proceeding, or failing with a more informative message?

@DennisHeimbigner (Collaborator)

Frankly, I have no idea how to disentangle this; it needs some thought.

@DennisHeimbigner (Collaborator)

After some discussion, we have decided to treat this as an error.

DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Feb 19, 2022
re: Unidata#2189

Compression of a variable whose type is variable-length
fails for all current filters. This is because at some point
the compression buffer will contain pointers to the data instead
of the actual data, and compressing pointers is of course meaningless.

The PR changes the behavior of nc_def_var_filter so that it
fails with error NC_EFILTER if an attempt is made to add a filter
to a variable whose type is variable-length.

A variable is variable-length if it is of type string or VLEN,
or transitively (via a compound type) contains a string or VLEN.

Also added a test case for this.

## Misc Changes
1. Turn off a number of debugging statements
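
The commit's notion of "transitively variable-length" can be sketched as a recursive walk over compound fields (hypothetical helper, not the actual netcdf-c code):

#include <netcdf.h>

/* Returns 1 if xtype is, or (via compound fields) contains, a string
 * or VLEN; 0 if fixed-size; a negative netCDF error code on failure. */
static int
type_is_varsized(int ncid, nc_type xtype)
{
    if (xtype == NC_STRING) return 1;
    if (xtype < NC_FIRSTUSERTYPEID) return 0;   /* other atomics are fixed */

    int klass;
    size_t nfields = 0;
    if (nc_inq_user_type(ncid, xtype, NULL, NULL, NULL, &nfields, &klass))
        return NC_EBADTYPE;
    if (klass == NC_VLEN) return 1;
    if (klass == NC_COMPOUND) {
        for (size_t f = 0; f < nfields; f++) {  /* recurse into each field */
            nc_type ftype;
            if (nc_inq_compound_fieldtype(ncid, xtype, (int)f, &ftype))
                return NC_EBADTYPE;
            int r = type_is_varsized(ncid, ftype);
            if (r != 0) return r;
        }
    }
    return 0;   /* enum, opaque, or all-fixed compound */
}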
@DennisHeimbigner (Collaborator)

"Fixed" by #2231

WardF closed this as completed Feb 22, 2022
DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Jun 21, 2023
re: Discussion Unidata#2554
re: PR Unidata#2231
re: Issue Unidata#2189

After some discussion, the issue of applying filters to variables
whose type is not fixed-size was resolved as follows:
1. A call to nc_def_var_filter will ignore such filters, but will issue a log warning.
2. Loading (from an existing file) a variable whose type is not fixed-size and which has filters will cause the variable to be suppressed.

This PR enforces those rules.

### Misc. Other changes
* Add a test case to test the vlen change.
* Make some minor clean-ups in various cmake and automake files.
* Remove unused test
DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Aug 3, 2023
re: PR Unidata#2716
re: Issue Unidata#2189

The basic change is to make use of the fact that HDF5 automatically suppresses optional filters when an attempt is made to apply them to variable-length typed arrays.
This means that e.g. ncdump or nccopy will properly see meaningful data.
Note that if a filter is defined as HDF5 mandatory, then the corresponding variable will be suppressed and will be invisible to ncdump and nccopy.
This functionality is also propagated to NCZarr.

This PR makes some minor changes to PR Unidata#2716 as follows:
* Move the test for filter X variable-length from dfilter.c down into the dispatch table functions.
* Make all filters for HDF5 optional rather than mandatory so that the built-in HDF5 test for filter X variable-length will be invoked.

The test case for this was expanded to verify that the filters are defined, but suppressed.
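
With these changes in place, the original repro is expected to run cleanly: nc_def_var_fletcher32 on the NC_STRING variable is recorded but suppressed by HDF5 as an optional filter, and nc_close returns NC_NOERR instead of NC_EHDFERR. A post-fix check against the repro's ncid and varid might look like this (a sketch; exact inquiry behavior depends on the netcdf-c version):

int fletcher32 = -1;
retval = nc_inq_var_fletcher32(ncid, varid, &fletcher32);
checkErrorCode(retval, "nc_inq_var_fletcher32");
/* The commit's test verifies the filter is "defined, but suppressed",
 * so the definition is expected to remain visible here while HDF5
 * skips the checksum when data is actually written. */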