Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Xarray operations produce read-only array #3813

Open
djhoese opened this issue Feb 28, 2020 · 7 comments
Open

Xarray operations produce read-only array #3813

djhoese opened this issue Feb 28, 2020 · 7 comments

Comments

@djhoese
Copy link
Contributor

djhoese commented Feb 28, 2020

I've turned on testing my Satpy package with unstable or pre-releases of some of our dependencies including numpy and xarray. I've found one error so far where in previous versions of xarray it was possible to assign to the numpy array taken from a DataArray.

MCVE Code Sample

import numpy as np
import dask.array as da
import xarray as xr

data = np.arange(15, 301, 15).reshape(2, 10)
data_arr = xr.DataArray(data, dims=('y', 'x'), attrs={'test': 'test'})
data_arr = data_arr.copy()
data_arr = data_arr.expand_dims('bands')
data_arr['bands'] = ['L']
n_arr = np.asarray(data_arr.data)
n_arr[n_arr == 45] = 5

Which results in:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-90dae37dd808> in <module>
----> 1 n_arr = np.asarray(data_arr.data); n_arr[n_arr == 45] = 5

ValueError: assignment destination is read-only

Expected Output

A writable array. No error.

Problem Description

If this is expected new behavior then so be it, but wanted to check with the xarray devs before I tried to work around it.

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 5.3.0-7629-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.3

xarray: 0.15.1.dev21+g20e6236f
pandas: 1.1.0.dev0+630.gedcf1c8f8
numpy: 1.19.0.dev0+acba244
scipy: 1.5.0.dev0+f614064
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.3
cfgrib: None
iris: None
bottleneck: None
dask: 2.11.0+13.gfcc500c2
distributed: 2.11.0+7.g0d7a31ad
matplotlib: 3.2.0rc3
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 45.2.0.post20200209
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: 2.4.3

@max-sixty
Copy link
Collaborator

Thanks for the report @djhoese . Confirmed in 0.15.0 too.

Do you know which of those steps causes the array to become non-writeable?

@dcherian
Copy link
Contributor

dcherian commented Feb 29, 2020

It's usually expand_dims.

We should add an FAQ for little things like this.

@max-sixty
Copy link
Collaborator

#2891

@djhoese
Copy link
Contributor Author

djhoese commented Feb 29, 2020

@max-sixty That's exactly it. What's really weird for this is that the original code in Satpy is using a dask array and not a numpy array. It seemed very strange to both copy the DataArray (.copy()), convert the dask array to a numpy array (np.asarray), and then still get a read-only array.

I can understand how xarray would treat numpy arrays and dask arrays the same when it comes to this, but coming from outside the project it is very surprising that a dask array would be marked as read-only when it was used to just create a "new" numpy array.

Feel free to close this or use it as a marker to clarify some documentation or error messages as mentioned in #2891.

@max-sixty
Copy link
Collaborator

Cheers @djhoese

Yes, let's leave this open. Two follow-ups from #2891 (comment); we would welcome PRs:

  • Mention the work-around of using .copy() in the error message xarray prints when an array is read-only.

  • Add a copy argument to expand_dims, so users can write copy=True if they want a writeable result.

@bradyrx
Copy link
Contributor

bradyrx commented Jul 7, 2020

FYI, this is also seen on xr.apply_ufunc, but only when vectorize=True. It seems like ndarrays write switch are turned off when vectorize=True. This is also solved by .copy(), which is good anways to avoid mutating the original ndarrays. Perhaps also a copy=bool could be added to apply_ufunc to create copies of the ndarrays? I'd be happy to lead that PR if it makes sense.

Example:

def match_nans(a, b):
"""Pairwise matching of nans between two time series."""
   # Try with and without `.copy` commands.
   # a = a.copy()
   # b = b.copy()
    if np.isnan(a).any() or np.isnan(b).any():
        idx = np.logical_or(np.isnan(a), np.isnan(b))
        a[idx], b[idx] = np.nan, np.nan
    return a, b

A = xr.DataArray(np.random.rand(10, 5), dims=['time', 'space'])
B = xr.DataArray(np.random.rand(10, 5), dims=['time', 'space'])
A[0, 1] = np.nan
B[5, 0] = np.nan

xr.apply_ufunc(match_nans,
               A,
               B,
               input_core_dims=[['time'], ['time']],
               output_core_dims=[['time'], ['time']],
               # Try with and without vectorize.
               vectorize=True,)

@rbavery
Copy link

rbavery commented Mar 30, 2021

I ran into the same issue as @bradyrx with writable arrays and apply_ufunc. An addition to the docs FAQ or apply_ufunc docs would help clarify that you can't write to the array inputs in the ufunc. I also want to +1 @bradyrx 's idea of making an arg to apply_ufunc that copies the input arrays to handle this common use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants