_FillValue from a classic netCDF file does not end up in the kerchunk json #385

noaaroland · 2023-11-01T13:42:44Z

So I spent some time reading through all of the discussion of #177 and it seems like I'm having the exact same problem with a classic netCDF file (not an HDF file under the hood). I am using these versions:

kerchunk version =  0.2.2
xarray version =  2023.10.1

I using this file: wget https://data.pmel.noaa.gov/aclim/thredds/fileServer/B10K-K20_CORECFS/Level1/1995-1999/B10K-K20_CORECFS_1995-1999_average_Jel.nc

When I open the netCDF using xarray, the "fill values" are set to nan:

ds1 = xr.open_dataset('./B10K-K20_CORECFS_1995-1999_average_Jel.nc')
ds1.Jel[100,0,250,175].values

array(nan, dtype=float32)

And I naively kerchunk it with this code, read the result with xarray, the "file value" is set to 1.e37:

fs1 = fsspec.filesystem('file')
ncchunks = NetCDF3ToZarr('./B10K-K20_CORECFS_1995-1999_average_Jel.nc')
with fs1.open('./B10K-K20_CORECFS_1995-1999_average_Jel.json', 'wb') as f:
        f.write(ujson.dumps(ncchunks.translate()).encode())
ds2 = xr.open_dataset('./B10K-K20_CORECFS_1995-1999_average_Jel.json')
ds2.Jel[100,0,250,175].values

reference://
array(1.e+37, dtype=float32)

When I write from the netCDF xarray data set to zarr, the "fill value" is preserved when reading the resulting zarr.

ds1.to_zarr('./B10K-K20_CORECFS_1995-1999_average_Jel.zarr')
ds3 = xr.open_dataset('./B10K-K20_CORECFS_1995-1999_average_Jel.zarr', engine='zarr')
ds3.Jel[100,0,250,175].values

array(nan, dtype=float32)

The use of decode_cf=True didn't change anything regarding the value of the "fill value".

The text was updated successfully, but these errors were encountered:

noaaroland · 2023-11-02T21:18:35Z

For what it's worth, if I start with same netCDF file above and first convert it to netCDF4, then kerchunk it using SingleHdf5ToZarr the "fill value" is converted to a nan as expected when reading the resulting json file with xarray.

ncks --fl_fmt=netcdf4_classic B10K-K20_CORECFS_1995-1999_average_Jel.nc B10K-K20_CORECFS_1995-
1999_average_Jel.nc4

u = 'B10K-K20_CORECFS_1995-1999_average_Jel.nc4'
so = dict(mode='rb', default_fill_cache=False, default_cache_type='first')
with fs1.open(u, **so) as infile:
    h5chunks = SingleHdf5ToZarr(infile, u, inline_threshold=300)
    with fs1.open('B10K-K20_CORECFS_1995-1999_average_Jel_4.json', 'wb') as f:
        f.write(ujson.dumps(h5chunks.translate()).encode())

ds4 = xr.open_dataset('B10K-K20_CORECFS_1995-1999_average_Jel_4.json', decode_cf=True)
ds4.Jel[100,0,250,175].values

reference://
B10K-K20_CORECFS_1995-1999_average_Jel_4.json
B10K-K20_CORECFS_1995-1999_average_Jel_4.json
B10K-K20_CORECFS_1995-1999_average_Jel_4.json
array(nan, dtype=float32)

martindurant · 2023-11-03T00:33:50Z

In netCDF3,oy, we have

                fill = var._attributes.get("missing_value", None)
...
            arr.attrs.update(
                {
                    k: v.decode() if isinstance(v, bytes) else str(v)
                    for k, v in var._attributes.items()
                    if k not in ["_FillValue", "missing_value"]
                }
            )

so perhaps the first line should consider both possible versions of fill as done in the second block?

-                fill = var._attributes.get("missing_value", None)
+                fill = var._attributes.get("missing_value", None) or var._attributes.get("_FillValue", None)

noaaroland · 2023-11-03T16:54:02Z

I tested this change yesterday and it worked as I expected, so considering the _FillValue as well as the missing in whatever way makes sense to you would likely solve my problem.

diff --git a/kerchunk/netCDF3.py b/kerchunk/netCDF3.py
index 01f573c..8a521d5 100644
--- a/kerchunk/netCDF3.py
+++ b/kerchunk/netCDF3.py
@@ -172,6 +172,8 @@ class NetCDF3ToZarr(netcdf_file):
                 # simple array block
                 # TODO: chance to sub-chunk
                 fill = var._attributes.get("missing_value", None)
+                if fill is None:
+                    fill = var._attributes.get("_FillValue", None)
                 if fill is not None and var.data.dtype.kind == "f":
                     fill = float(fill)
                 if fill is not None and var.data.dtype.kind == "i":
@@ -221,6 +223,8 @@ class NetCDF3ToZarr(netcdf_file):

                 # TODO: avoid this code repeat
                 fill = var._attributes.get("missing_value", None)
+                if fill is None:
+                    fill = var._attributes.get("_FillValue", None)
                 if fill is not None and base.kind == "f":
                     fill = float(fill)
                 if fill is not None and base.kind == "i":

martindurant · 2023-11-03T17:12:20Z

Let's make it a PR

noaaroland mentioned this issue Nov 3, 2023

Get fill from _FillValue if netCDF file does not have missing_value a... #389

Merged

martindurant closed this as completed in #389 Nov 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

_FillValue from a classic netCDF file does not end up in the kerchunk json #385

_FillValue from a classic netCDF file does not end up in the kerchunk json #385

noaaroland commented Nov 1, 2023

noaaroland commented Nov 2, 2023

martindurant commented Nov 3, 2023

noaaroland commented Nov 3, 2023

martindurant commented Nov 3, 2023

_FillValue from a classic netCDF file does not end up in the kerchunk json #385

_FillValue from a classic netCDF file does not end up in the kerchunk json #385

Comments

noaaroland commented Nov 1, 2023

noaaroland commented Nov 2, 2023

martindurant commented Nov 3, 2023

noaaroland commented Nov 3, 2023

martindurant commented Nov 3, 2023