Deep copying a datatree somehow duplicates data onto child nodes #9454

Closed
TomNicholas opened this issue Sep 8, 2024 · 2 comments · Fixed by #9457
Labels
bug topic-DataTree Related to the implementation of a DataTree class

TomNicholas commented Sep 8, 2024

What happened?

Deep copying a DataTree object somehow duplicates data onto a child node (!)

What did you expect to happen?

Copying to return an identical subtree.

Minimal Complete Verifiable Example

In [15]: dt = DataTree(data=xr.Dataset(coords={'x': ('x', [0, 1])}))

In [16]: print(dt)
<xarray.DataTree>
Group: /
    Dimensions:  (x: 2)
    Coordinates:
      * x        (x) int64 16B 0 1
    Data variables:
        *empty*

In [17]: dt['c'] = DataTree()

In [18]: print(dt)
<xarray.DataTree>
Group: /
│   Dimensions:  (x: 2)
│   Coordinates:
│     * x        (x) int64 16B 0 1
│   Data variables:
│       *empty*
└── Group: /c

In [19]: dt.copy(deep=True)
Out[19]: 
<xarray.DataTree>
Group: /
│   Dimensions:  (x: 2)
│   Coordinates:
│     * x        (x) int64 16B 0 1
│   Data variables:
│       *empty*
└── Group: /c
        Dimensions:  (x: 2)
        Coordinates:
          * x        (x) int64 16B 0 1
        Data variables:
            *empty*

Anything else we need to know?

No response

Environment

main

TomNicholas added the bug and topic-DataTree labels Sep 8, 2024

TomNicholas commented

Wait I'm being an idiot - the coordinates appearing in both groups is expected behaviour now that we have coordinate inheritance.

What's weird is that they don't appear in the child node when we print(dt)! This implies some bug with coordinate inheritance / handling of empty nodes (cc @shoyer).
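
For readers unfamiliar with coordinate inheritance, here is a toy model in plain Python. It is only a sketch of the idea and not xarray's implementation: the `Node` class and its `own_coords` attribute are made up for illustration. Each node stores only its own coordinates, and the view of a node also shows everything defined on its ancestors.

```python
from collections import ChainMap


class Node:
    """Hypothetical tree node for illustration; not xarray's DataTree."""

    def __init__(self, coords=None, parent=None):
        self.own_coords = dict(coords or {})  # stored on this node only
        self.parent = parent

    @property
    def coords(self):
        # Inherited view: own coordinates first, then each ancestor's.
        maps = []
        node = self
        while node is not None:
            maps.append(node.own_coords)
            node = node.parent
        return ChainMap(*maps)


root = Node(coords={"x": [0, 1]})
child = Node(parent=root)

print(dict(child.coords))  # the child sees the parent's "x"
print(child.own_coords)    # but stores nothing itself
```

Under this model, a repr that shows the inherited view would list `x` under both groups, which matches the behaviour described as expected above.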

shoyer commented Sep 8, 2024

I think I know what's going on... stay tuned for a fix

shoyer added a commit to shoyer/xarray that referenced this issue Sep 8, 2024
Fixes pydata#9454

Previously, we were copying parent coordinates/dimensions onto all
child nodes. This is not obvious in the current repr, but you can see
it from looking at the private `._node_coord_variables` and
`._node_dims`.

To make the use of `_to_dataset_view()` a little more obvious, I've added
a required boolean `inherited` argument.
@shoyer shoyer closed this as completed in 40666b2 Sep 9, 2024
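
The failure mode described in the commit message can be sketched with a toy tree in plain Python. This is an assumption-laden illustration, not the actual xarray code: `Node`, `to_view`, and `deep_copy` are hypothetical names, and the `inherited` flag is only loosely modelled on the argument mentioned for `_to_dataset_view()`. Copying from the inherited view writes the parent's coordinates into every child, while copying from the node-local view keeps children as they were.

```python
import copy


class Node:
    """Hypothetical tree node for illustration; not xarray's DataTree."""

    def __init__(self, coords=None):
        self.own_coords = dict(coords or {})
        self.parent = None
        self.children = {}

    def add_child(self, name, child):
        child.parent = self
        self.children[name] = child

    def to_view(self, inherited):
        # `inherited` is required so every caller states which view it wants.
        merged = {}
        if inherited and self.parent is not None:
            merged.update(self.parent.to_view(inherited=True))
        merged.update(self.own_coords)
        return merged


def deep_copy(node, inherited):
    """Copy a subtree; inherited=True reproduces the duplication bug."""
    new = Node(copy.deepcopy(node.to_view(inherited=inherited)))
    for name, child in node.children.items():
        new.add_child(name, deep_copy(child, inherited))
    return new


root = Node({"x": [0, 1]})
root.add_child("c", Node())

buggy = deep_copy(root, inherited=True)
fixed = deep_copy(root, inherited=False)

print(buggy.children["c"].own_coords)  # parent's "x" was duplicated onto the child
print(fixed.children["c"].own_coords)  # child stays empty
print(fixed.children["c"].to_view(inherited=True))  # "x" still visible via inheritance
```

Making the flag a required argument, rather than giving it a default, forces each call site to choose between the two views explicitly, which is the design point the commit message makes.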
hollymandel pushed a commit to hollymandel/xarray that referenced this issue Sep 23, 2024
* Fix inheritance in DataTree.copy()

* typing error

* add missing inherited argument

* Apply suggestions from code review

Co-authored-by: Tom Nicholas

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* tweaks to from_dict

* add issue link

---------

Co-authored-by: Tom Nicholas
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>