Chunking fail for String variable with new netcdf-c package library (4.9.0) #1420

Open
charlycou opened this issue Feb 24, 2025 · 5 comments
Labels
bug Something isn't working

Comments

@charlycou

charlycou commented Feb 24, 2025

Versions impacted by the bug

netcdf-java : 5.7.0
netcdf-c : 4.9.0 (packed with libnetcdf19)

What went wrong?

  • Problem Statement: When NetCDF-C native library version 4.9.0 or later is installed, NetCDF-Java fails unexpectedly while writing a netCDF-4 file that uses chunking. The issue does not occur with version 4.7.4.
  • Expected Behavior: The tests should pass regardless of the NetCDF-C library version.
  • Actual Behavior: The tests fail when using NetCDF-C version 4.9.0 or later.

Writing the netCDF-4 file fails when writing the variable

String qualityFlags(dateEnd=1051776);
  :flag_meanings = "missing_value Checked Estimated_from_bucket,valid_over_the_whole_period";
  :standard_name = "quality_flag";
  :flag_values = "11 9 12";

at

ret = nc4.nc_def_var_deflate(g4.grpid, varid, shuffle, deflate, deflateLevel);

The error is raised for this String variable but not for the Double variables in the same file. It may be related to Unidata/netcdf4-python#1205, but according to that discussion the error occurs when compression is applied to a variable-length String variable, which does not seem to be my case.

Relevant stack trace

java.io.IOException: NetCDF: Filter error: bad id or parameters or duplicate filter

	at ucar.nc2.jni.netcdf.Nc4Iosp.createVariable(Nc4Iosp.java:2546)
	at ucar.nc2.jni.netcdf.Nc4Iosp.createGroup(Nc4Iosp.java:2458)
	at ucar.nc2.jni.netcdf.Nc4Iosp.createGroup(Nc4Iosp.java:2469)
	at ucar.nc2.jni.netcdf.Nc4Iosp.createGroup(Nc4Iosp.java:2469)
	at ucar.nc2.jni.netcdf.Nc4Iosp.createGroup(Nc4Iosp.java:2469)
	at ucar.nc2.jni.netcdf.Nc4Iosp.create(Nc4Iosp.java:2370)
	at ucar.nc2.write.NetcdfFormatWriter.<init>(NetcdfFormatWriter.java:337)
	at ucar.nc2.write.NetcdfFormatWriter.<init>(NetcdfFormatWriter.java:42)
	at ucar.nc2.write.NetcdfFormatWriter$Builder.build(NetcdfFormatWriter.java:267)
	at fr.theialand.insitu.data.netcdf.creation.service.NetcdfBuilderService.writeNetcdfFromParquetFileInput(NetcdfBuilderService.java:210)
	at fr.theialand.insitu.data.netcdf.creation.service.NetcdfBuilderServiceTest.writeNetcdfFromParquetFileInput(NetcdfBuilderServiceTest.java:112)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1597)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1597)

Relevant log messages

8922 [main] INFO  ucar.nc2.jni.netcdf.Nc4wrapper  - trace: nc_def_var  ret=0 args=(65539,qualityFlags,12,1,[I@217b0952,int@0x7f68c9d22be0=0x6 (6))
8922 [main] INFO  ucar.nc2.jni.netcdf.Nc4wrapper  - trace: nc_def_var_chunking ret=0 args=(65539,6,0,[Lucar.nc2.jni.netcdf.SizeT;@3fd9e01c)
8922 [main] INFO  ucar.nc2.jni.netcdf.Nc4wrapper  - trace: nc_def_var_deflate ret=-132 args=(65539,6,1,1,5)
8922 [main] INFO  ucar.nc2.jni.netcdf.Nc4wrapper  - trace: nc_strerror ret=NetCDF: Filter error: bad id or parameters or duplicate filter args=(-132)
8924 [main] INFO  ucar.nc2.jni.netcdf.Nc4wrapper  - trace: nc_close ret=0 args=(65536)

To reproduce the issue: https://github.com/charlycou/netcdf4-java-chunking-issue

@charlycou charlycou added the bug Something isn't working label Feb 24, 2025
@charlycou charlycou changed the title Chunking fail for String variable when new netcdf-c package library (4.9.0) Chunking fail for String variable with new netcdf-c package library (4.9.0) Feb 24, 2025
@lesserwhirls
Collaborator

Greetings @charlycou! Thank you for your detailed report. I have confirmed with the netCDF-C team that this is indeed the same issue you pointed to in the issue on the netCDF4-python repository.

When using netCDF-Java to write netCDF-4 files, we apply a default chunking strategy, which does add compression. To fix this particular bug, we will need to tweak the chunk strategies we include in netCDF-Java to not set a deflate level greater than zero when writing variables that are of type String or are variable length.
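
The rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchType`, `DeflateSketch`, and `deflateLevel` are hypothetical names and not the netCDF-Java API, and the level 5 is simply the value visible in the `nc_def_var_deflate` trace in the report.

```java
// Hypothetical stand-in for a variable's data type; the real netCDF-Java
// classes are different and richer than this.
enum SketchType { DOUBLE, INT, STRING }

final class DeflateSketch {
  // Level taken from the log trace in the report: nc_def_var_deflate(..., 1, 1, 5).
  static final int DEFAULT_DEFLATE_LEVEL = 5;

  // Returns 0 (no compression) for String or variable-length variables,
  // and the default level for everything else.
  static int deflateLevel(SketchType type, boolean variableLength) {
    if (type == SketchType.STRING || variableLength) {
      return 0;
    }
    return DEFAULT_DEFLATE_LEVEL;
  }

  public static void main(String[] args) {
    System.out.println("STRING -> " + deflateLevel(SketchType.STRING, false));
    System.out.println("DOUBLE -> " + deflateLevel(SketchType.DOUBLE, false));
  }
}
```

With a level of zero, a writer would presumably skip the `nc_def_var_deflate` call, or pass deflate off, for such variables; how that is wired into the actual chunking strategies is not shown here.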

@charlycou
Author

charlycou commented Feb 25, 2025

Thank you for your answer, @lesserwhirls. In that case, adding a condition to the isChunked method of the Nc4ChunkingDefault class should do the trick.

  @Override
  public boolean isChunked(Variable v) {
    if (v.isUnlimited())
      return true;
    // if (getChunkAttribute(v) != null) return true;

    if (v.getDataType().equals(DataType.STRING) || v.isVariableLength())
      return false;

    long size = v.getSize() * v.getElementSize();
    return (size > minVariableSize);
  }

But is String the only DataType for which chunking should be excluded?

@lesserwhirls
Collaborator

I think we will also have to handle STRUCTURE specially, since a STRUCTURE should not contain a STRING either (I think). I also think we'll need to add a check in the getDeflateLevel(Variable v) method of Nc4ChunkingStrategy.java to return zero for these cases. I am getting other failures when I make these changes locally, so I'm still digging.

@DennisHeimbigner
Collaborator

In netcdf-c, I had to write a recursive routine that walked all the types to look for variable length types.
For example you might have a struct in struct in struct with a string field in the innermost struct.
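
A recursive walk of that kind might look something like the sketch below. It is not the netcdf-c routine and does not use netCDF-Java types: `TypeNode`, `VlenWalk`, and `containsVariableLength` are hypothetical names that only model a type as either a leaf or a struct with fields.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a type: a leaf (possibly variable length, such as a
// string) or a struct that holds further fields.
final class TypeNode {
  final String name;
  final boolean variableLength;
  final List<TypeNode> fields;

  private TypeNode(String name, boolean variableLength, List<TypeNode> fields) {
    this.name = name;
    this.variableLength = variableLength;
    this.fields = fields;
  }

  static TypeNode leaf(String name, boolean variableLength) {
    return new TypeNode(name, variableLength, Collections.<TypeNode>emptyList());
  }

  static TypeNode struct(String name, TypeNode... fields) {
    return new TypeNode(name, false, Arrays.asList(fields));
  }
}

final class VlenWalk {
  // True if the type itself, or any field at any nesting depth, is variable length.
  static boolean containsVariableLength(TypeNode type) {
    if (type.variableLength) {
      return true;
    }
    for (TypeNode field : type.fields) {
      if (containsVariableLength(field)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // struct in struct in struct, with a string field in the innermost struct
    TypeNode inner = TypeNode.struct("inner", TypeNode.leaf("label", true));
    TypeNode outer = TypeNode.struct("outer", TypeNode.struct("middle", inner));
    System.out.println(containsVariableLength(outer));
  }
}
```

A check on only the top-level data type would miss the nested string in this example, which is why the walk descends into every field.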

@lesserwhirls
Collaborator

Thanks for that tip, @DennisHeimbigner! I don't know whether we have a sample file with that level of complexity, but I should make sure we hit that case.
