xr.DataSet.expand_dims axis option doesn't work #7456

Closed
4 tasks done
cmdupuis3 opened this issue Jan 19, 2023 · 15 comments

@cmdupuis3

What happened?

When I try to change the position of a new dimension added with expand_dims by setting the axis option, nothing happens.

What did you expect to happen?

I would expect this option to add new dimensions in the position I selected, as the documentation describes. I would expect setting axis=0 to give a result like this:

Frozen({'yomama': 1, 'a': 3, 'b': 3})

Minimal Complete Verifiable Example

import xarray as xr

da = xr.DataArray([[1,2,3],[4,5,6],[7,8,9]], coords={'a':[1,2,3], 'b':[1,2,3]})
ds = xr.Dataset({'da':da})
ds1 = ds.expand_dims('yomama', axis=0)
print(ds1.dims)
ds2 = ds.expand_dims('yomama', axis=2)
print(ds2.dims)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

Frozen({'a': 3, 'b': 3, 'yomama': 1})
Frozen({'a': 3, 'b': 3, 'yomama': 1})

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.133+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 2022.10.0
pandas: 1.5.0
numpy: 1.23.3
scipy: 1.9.1
netCDF4: 1.6.1
pydap: installed
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.3
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: None
rasterio: 1.3.2
cfgrib: 0.9.10.2
iris: None
bottleneck: 1.3.5
dask: 2022.10.0
distributed: 2022.10.0
matplotlib: 3.6.1
cartopy: 0.21.0
seaborn: 0.12.0
numbagg: None
fsspec: 2022.8.2
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.6.0
numpy_groupies: 0.9.19
setuptools: 65.5.0
pip: 22.3
conda: None
pytest: 7.1.3
IPython: 8.5.0
sphinx: None

@cmdupuis3 added the bug and needs triage labels Jan 19, 2023
@kmuehlbauer
Contributor

@cmdupuis3 dimensions are not given in any particular order in the Dataset. There could be two DataArrays with reversed dimensions, for instance.

You would need to inspect the DataArrays:

import xarray as xr

da = xr.DataArray([[1,2,3],[4,5,6],[7,8,9]], coords={'a':[1,2,3], 'b':[1,2,3]})
ds = xr.Dataset({'da':da})
ds0 = ds.expand_dims('yomama', axis=0)
print(ds0.dims)
print(ds0.da.dims)
ds1 = ds.expand_dims('yomama', axis=1)
print(ds1.dims)
print(ds1.da.dims)
ds2 = ds.expand_dims('yomama', axis=2)
print(ds2.dims)
print(ds2.da.dims)

Frozen({'a': 3, 'b': 3, 'yomama': 1})
('yomama', 'a', 'b')
Frozen({'a': 3, 'b': 3, 'yomama': 1})
('a', 'yomama', 'b')
Frozen({'a': 3, 'b': 3, 'yomama': 1})
('a', 'b', 'yomama')
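
To illustrate the point about per-variable order, here is a minimal sketch with two variables whose dimensions are reversed relative to each other. The variable names u and v and the dimension name sample are made up for the example, and the shapes are arbitrary.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: "u" is stored as (a, b), "v" as (b, a).
ds = xr.Dataset(
    {
        "u": (("a", "b"), np.zeros((3, 2))),
        "v": (("b", "a"), np.zeros((2, 3))),
    }
)

# axis=0 is applied to each data variable separately.
expanded = ds.expand_dims("sample", axis=0)
per_variable_dims = {name: da.dims for name, da in expanded.data_vars.items()}
print(per_variable_dims)
# {'u': ('sample', 'a', 'b'), 'v': ('sample', 'b', 'a')}
```

Because a and b already sit in different positions in u and v, there is no single dataset-wide axis order for ds.dims to report.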

@keewis
Collaborator

keewis commented Jan 19, 2023

I wonder if we shouldn't recommend using expand_dims without axis plus a transpose afterwards if we care about dimension order? Most of xarray's functions work without making assumptions about the dimension order, and I don't think expand_dims should, either (though I might be missing something, of course)
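
A minimal sketch of that suggestion, assuming a small dataset like the one in the report (the dimension name sample is a placeholder): add the dimension first, then state the order explicitly with transpose.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"da": (("a", "b"), np.arange(9).reshape(3, 3))},
    coords={"a": [1, 2, 3], "b": [1, 2, 3]},
)

expanded = ds.expand_dims("sample")

# Name every dimension to pin the full order...
middle = expanded.transpose("a", "sample", "b")
# ...or use an ellipsis to move one dimension and keep the rest as they are.
last = expanded.transpose(..., "sample")

print(middle["da"].dims)  # ('a', 'sample', 'b')
print(last["da"].dims)    # ('a', 'b', 'sample')
```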

@dcherian added bug usage question needs triage topic-documentation contrib-good-first-issue and removed needs triage bug labels Jan 19, 2023
@dcherian
Contributor

I think it might be enough to describe this thoroughly with examples in the docstring, though I do like the solution of recommending transpose.

@cmdupuis3
Author

cmdupuis3 commented Jan 19, 2023

I mean that's fine, but in that case, the documentation is very misleading

@dcherian
Contributor

the documentation is very misleading

Updating the docstring would be a fairly easy and impactful PR if you're up for it!

@cmdupuis3
Author

Yeah, I could put something together. It'll probably have to wait until next week though.

@cmdupuis3
Author

cmdupuis3 commented Jan 19, 2023

EDIT: Lots of confusion below about nothing, plz disregard

Okay, regardless of expected behavior here, my particular use-case requires that I transpose these dimensions. Can someone show me a way to do this? I tried to explain the xarray point of view to Keras, but Keras is really not interested ;)

I tried something like ds.expand_dims("sample").transpose('sample','nlat','nlon') to complete futility, probably something to do with the Frozen stuff if I had to guess.

@maxrjones
Contributor

Okay, regardless of expected behavior here, my particular use-case requires that I transpose these dimensions. Can someone show me a way to do this? I tried to explain the xarray point of view to Keras, but Keras is really not interested ;)

I tried something like ds.expand_dims("sample").transpose('sample','nlat','nlon') to complete futility, probably something to do with the Frozen stuff if I had to guess.

The transpose method should change the dimension order of each array in the dataset. One particularly important point from Kai's comment above is that ds.dims does not tell you the axis order of the DataArrays in the Dataset. Can you please describe how the DataArray dimension order reported by the code below differs from your expectations?

for var in ds.data_vars:
    print(ds[var].sizes)
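
For the downstream use case mentioned above, where a library expects a plain array with a fixed axis order, one option is to transpose by name and then take the NumPy array from each variable. This is a sketch only: the variable name sst and the grid sizes are invented, and nlat and nlon are taken from the snippet quoted above.

```python
import numpy as np
import xarray as xr

# Hypothetical 4 x 5 field; "sst" is a placeholder variable name.
ds = xr.Dataset({"sst": (("nlat", "nlon"), np.zeros((4, 5)))})

batch = ds.expand_dims("sample").transpose("sample", "nlat", "nlon")

for name in batch.data_vars:
    print(name, dict(batch[name].sizes))
# sst {'sample': 1, 'nlat': 4, 'nlon': 5}

# The NumPy array follows the DataArray's own dimension order.
array = batch["sst"].values
print(array.shape)  # (1, 4, 5)
```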

@cmdupuis3
Author

cmdupuis3 commented Jan 19, 2023

Nvm, my use case isn't what I thought it was, but I'll push the issue a bit.

So I'm not disputing anything about what these functions actually do now. The issue I have is that the functions here treat the dimension order of a Dataset as if it's arbitrary, but calling [] on a Dataset slices it in a decidedly non-arbitrary way. It turns out that [] actually does care about which axis you select if you call expand_dims first and index with an integer like [0]. I think this inconsistency is what's confusing to me atm.

@maxrjones
Contributor

I'm not an xarray developer, but my guess is that your argument is why positional indexing/slicing is not available for datasets.

As for the specific case of using the axis parameter of expand_dims, I think it is useful when the user is either confident about the axis order in each DataArray or will use label-based operations, so that axis order doesn't matter. I was curious, so I did a quick comparison of the speed of using this parameter versus a subsequent transpose operation:

import numpy as np
import xarray as xr

shape = (10, 50, 100, 200)
ds = xr.Dataset(
    {
        "foo": (["time", "x", "y", "z"], np.random.rand(*shape)),
        "bar": (["time", "x", "y", "z"], np.random.randint(0, 10, shape)),
    },
    {
        "time": (["time"], np.arange(shape[0])),
        "x": (["x"], np.arange(shape[1])),
        "y": (["y"], np.arange(shape[2])),
        "z": (["z"], np.arange(shape[3])),
    },
)

%%timeit -r 4
ds1 = ds.expand_dims("sample", axis=1)

38.1 µs ± 76 ns per loop (mean ± std. dev. of 4 runs, 10,000 loops each)

%%timeit -r 4
ds2 = ds.expand_dims("sample").transpose("time", "sample", "x", "y", "z")

172 µs ± 612 ns per loop (mean ± std. dev. of 4 runs, 10,000 loops each)
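
The two calls timed above should produce the same result. A quick check on a smaller dataset (sizes chosen arbitrarily here to keep it fast) might look like the sketch below. It uses the standard-library timeit module in place of the notebook magic and makes no claim about the numbers on any particular machine.

```python
import timeit

import numpy as np
import xarray as xr

shape = (2, 3, 4, 5)  # small, arbitrary sizes for a quick check
ds = xr.Dataset(
    {"foo": (["time", "x", "y", "z"], np.random.rand(*shape))},
    {"time": (["time"], np.arange(shape[0]))},
)

with_axis = ds.expand_dims("sample", axis=1)
with_transpose = ds.expand_dims("sample").transpose("time", "sample", "x", "y", "z")

# Both routes give the same per-variable dimension order and the same data.
print(with_axis["foo"].dims)
print(with_axis.identical(with_transpose))

# Rough timing of each route; results vary by machine and xarray version.
t_axis = timeit.timeit(lambda: ds.expand_dims("sample", axis=1), number=100)
t_transpose = timeit.timeit(
    lambda: ds.expand_dims("sample").transpose("time", "sample", "x", "y", "z"),
    number=100,
)
print(f"axis: {t_axis:.4f} s, transpose: {t_transpose:.4f} s")
```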

@cmdupuis3
Author

Okay, I think I get the philosophy now. However, indexing a Dataset with an integer actually does work. If performance is the goal, shouldn't something like ds[0] throw a warning or an error?

@maxrjones
Contributor

Okay, I think I get the philosophy now. However, indexing a Dataset with an integer actually does work. If performance is the goal, shouldn't something like ds[0] throw a warning or an error?

Can you share your code for this? I would interpret that as meaning you have a variable in your dataset mapped to an integer key, which is allowed as a hashable type but can cause problems with downstream packages.

@cmdupuis3
Author

I was thinking something like this:

    da = xr.DataArray([[1,2,3],[4,5,6],[7,8,9]], coords={'a':[1,2,3], 'b':[1,2,3]})
    ds = xr.Dataset({'da':da})
    ds1 = ds.expand_dims('yomama', axis=0)
    print(ds1[0].dims)
    ds2 = ds.expand_dims('yomama', axis=2)
    print(ds2[0].dims)

...but this throws an error (like it should). I think I must be reading my code wrong lol
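
A small sketch of the behavior described above, using the same toy data with explicit dims: integer keys on a Dataset are looked up as variable names, so ds[0] fails, while positional selection is available by dimension name through isel or on an individual DataArray. The dimension name sample is a placeholder.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"da": (("a", "b"), np.arange(9).reshape(3, 3))})
front = ds.expand_dims("sample", axis=0)
back = ds.expand_dims("sample", axis=2)

# A Dataset treats the key as a variable name, and no variable is named 0.
try:
    front[0]
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# Selecting by dimension name does not depend on axis order.
print(front.isel(sample=0)["da"].dims)  # ('a', 'b')
print(back.isel(sample=0)["da"].dims)   # ('a', 'b')

# Positional indexing on a DataArray does depend on it.
print(front["da"][0].dims)  # ('a', 'b')
print(back["da"][0].dims)   # ('b', 'sample')
```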

@kmuehlbauer
Contributor

Closing this now. Feel free to reopen.
