Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NetCDF3 dtype coercion for unsigned integer types #4014

Closed
blsqr opened this issue Apr 29, 2020 · 2 comments · Fixed by #4018
Closed

Add NetCDF3 dtype coercion for unsigned integer types #4014

blsqr opened this issue Apr 29, 2020 · 2 comments · Fixed by #4018

Comments

@blsqr
Copy link
Contributor

blsqr commented Apr 29, 2020

xr.Dataset.to_netcdf does not seem to support writing data with unsigned integer dtypes, uint32, uint64 etc.. This seems to be the case for both scipy-based output formats, NETCDF3_CLASSIC and NETCDF3_64BIT.

It seems like dtype coercions for int64 and bool are done automatically for NetCDF3 in the xarray.netcdf3 module.
Shouldn't data of signed dtypes then also be coerced, i.e. to their signed equivalent?

MCVE Code Sample

import numpy as np
import xarray as xr

da = xr.DataArray(np.array([1,2,3], dtype='uint64'))

# The following all fail:
da.to_netcdf("foo")  # default format: scipy NETCDF3_CLASSIC
da.to_netcdf("bar", format='NETCDF3_64BIT')
da.astype('uint32').to_netcdf("baz")
da.astype('uint16').to_netcdf("spam")

# This works:
da.astype('int64').to_netcdf("working64")  # is coerced
da.astype('int32').to_netcdf("working32")  # works natively

Importantly, this is with the netcdf4 python package not being installed, in which case that package would be used for writing rather than scipy's netcdf.

Expected Output

NetCDF3 file is written with an appropriately coerced data format, e.g. as done with int64.

Alternatively, writing data fails for all dtypes that would natively be unsupported, including int64 and bool.

Problem Description

Given that the infrastructure for coercion is already in place, it seems more consistent to me to apply coercion to all cases where it would lead to to_netcdf method calls succeeding rather than failing, not only to int64 and bool.

Ideally, coercion would happen towards another unsigned integer type.
However, writing uint32 seems not to be possible, so it's not a 64bit/32bit issue.
While the NetCDF Format Specification declares as only unsigned integer type NON_NEG, which I presume to be equivalent to uint32, writing unsigned integers seems not possible via scipy's NetCDF3 writer. Thus, the only viable coercion, as far as I see, would be to signed equivalents.

I see no new cast safety implications here, because the existing coerce_nc3_dtype function already checks if the original and cast arrays compare equivalently.

Versions

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.6 (default, Dec 30 2019, 19:38:26)
[Clang 11.0.0 (clang-1100.0.33.16)]
python-bits: 64
OS: Darwin
OS-release: 19.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.15.1
pandas: 1.0.1
numpy: 1.18.3
scipy: 1.4.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: 3.1.3
cartopy: None
seaborn: None
numbagg: None
setuptools: 45.1.0
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: 2.4.1

@blsqr
Copy link
Contributor Author

blsqr commented Apr 29, 2020

I'd be happy to open a PR, if such a behaviour is desired.

It seems like extending the _nc3_dtype_coercions dict (and extending tests) is all that needs to be done here?

Suggestion: Add all uint types supported by numpy:

_nc3_dtype_coercions = {
    "int64": "int32",
    "bool": "int8",
    "uint64": "int32",
    "uint32": "int32",
    "uint16": "int16",
    "uint8": "int8",
}

@dcherian
Copy link
Contributor

This sounds reasonable to me

blsqr added a commit to blsqr/xarray that referenced this issue May 2, 2020
dcherian added a commit that referenced this issue May 20, 2020
* In netcdf3 backend, also coerce unsigned integer dtypes

* Adjust test for netcdf3 rountrip to include coercion

This might be a bit too general for what is required at this point,
though ... 🤔

* Add test for failing dtype coercion

* Add What's New entry for issue #4014 and PR #4018

* Move netcdf3-specific test to NetCDF3Only class

Also uses a class variable for definition of netcdf3 formats now.

Co-authored-by: Deepak Cherian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants