Repo layout proposal #10089

max-sixty · 2025-03-02T22:32:03Z

What is your issue?

As part of the efforts described in #10039, I added #10088, and noticed the repo layout has arguably not kept up with the code growth over the past decade. This isn't the most pressing issue, but it does make the returns to refactors lower, since we're moving lines from 11K LOC files to 1K LOC files, rather than anything smaller.

(Even if you think LLMs aren't that useful / aren't going to get better / etc., these changes would still make the repo easier for people to navigate...)

In particular, about 2/3 of our code is in xarray/core: 66873 LOC, vs 97118 LOC in all of xarray.
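For anyone who wants to check the split on their own checkout, here is a rough sketch of one way to count lines per top-level directory. It counts every line of every `.py` file, blank lines and comments included, which may not match how the figures above were produced; the function names are made up for this example.

```python
from collections import Counter
from pathlib import Path


def count_loc_by_subpackage(package_root: Path) -> Counter:
    """Count lines in every .py file, grouped by first-level directory.

    Files that sit directly under package_root are grouped under ".".
    """
    totals: Counter = Counter()
    for path in package_root.rglob("*.py"):
        relative = path.relative_to(package_root)
        group = relative.parts[0] if len(relative.parts) > 1 else "."
        with path.open(encoding="utf-8", errors="replace") as handle:
            totals[group] += sum(1 for _ in handle)
    return totals


def share(totals: Counter, group: str) -> float:
    """Fraction of all counted lines that fall in `group`."""
    grand_total = sum(totals.values())
    return totals[group] / grand_total if grand_total else 0.0
```

Run from the repo root, `share(count_loc_by_subpackage(Path("xarray")), "core")` would give the fraction of lines living under `xarray/core`. With the numbers quoted above, 66873 / 97118 is roughly 0.69.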

I can imagine splitting this up into a few categories:

compat — dask_array_*, npcompat, pdcompat, array_api_compat
compute / computation — computation, arithmetic, nanops, weighted, the curvefit that's currently in dataset, rolling, rolling_exp, maybe missing
reshape / align / merge (need a better name) — merge, alignment, concat

I'd propose making each of those a path within xarray/. That gives more freedom to create new files within those paths than we have now, where a new file means adding to a very long list of files in xarray/core.
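To make the grouping concrete, the categories above could be written down as a small lookup table. This is only an illustration of the proposal, not an agreed layout: `PROPOSED_LAYOUT`, `proposed_path`, and the directory name `structure` (standing in for the "need a better name" group) are all placeholders, and the items marked "maybe" or living inside another file (`missing`, `curvefit`) are left out.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from proposed directory to module-name patterns.
# "structure" is a placeholder for the reshape / align / merge group.
PROPOSED_LAYOUT = {
    "compat": ["dask_array_*", "npcompat", "pdcompat", "array_api_compat"],
    "computation": [
        "computation",
        "arithmetic",
        "nanops",
        "weighted",
        "rolling",
        "rolling_exp",
    ],
    "structure": ["merge", "alignment", "concat"],
}


def proposed_path(module: str) -> str:
    """Return where a module currently in xarray/core would live.

    Modules that match no pattern stay in xarray/core.
    """
    for package, patterns in PROPOSED_LAYOUT.items():
        if any(fnmatchcase(module, pattern) for pattern in patterns):
            return f"xarray/{package}/{module}.py"
    return f"xarray/core/{module}.py"
```

Under these assumptions `proposed_path("merge")` gives `xarray/structure/merge.py`, while a module that isn't listed, such as `dataset`, stays at `xarray/core/dataset.py`.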

I'm not sure how much disruption this could cause to existing PRs. I think if we land the changes as commits which mostly just move files, then git will mostly handle merges well. We can start slowly and see how it goes...


dcherian · 2025-03-03T16:40:37Z

I agree. One specific suggestion: breaking apply_ufunc.py out of computation.py is easy, and shouldn't be disruptive.
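One way to keep a move like this from breaking downstream imports is to leave a forwarding attribute hook behind in the old module. The sketch below shows the mechanism (a module-level `__getattr__`, as described in PEP 562) on throwaway in-memory modules. The module names `demo_computation` and `demo_apply_ufunc` and the helper `make_forwarding_module` are invented for the demo, and nothing here is meant to describe what xarray actually does or will do.

```python
import importlib
import sys
import types
import warnings


def make_forwarding_module(
    old_name: str, new_name: str, moved: frozenset
) -> types.ModuleType:
    """Build a stand-in for `old_name` that forwards `moved` names to `new_name`."""
    shim = types.ModuleType(old_name)

    def __getattr__(attr: str):
        # Only called when normal attribute lookup on the module fails.
        if attr in moved:
            warnings.warn(
                f"{old_name}.{attr} has moved to {new_name}.{attr}",
                DeprecationWarning,
                stacklevel=2,
            )
            return getattr(importlib.import_module(new_name), attr)
        raise AttributeError(f"module {old_name!r} has no attribute {attr!r}")

    shim.__getattr__ = __getattr__
    return shim


# Demo with made-up module names; nothing here touches a real package.
new_home = types.ModuleType("demo_apply_ufunc")
new_home.apply_ufunc = lambda func, *args: func(*args)  # toy stand-in
sys.modules["demo_apply_ufunc"] = new_home

old_home = make_forwarding_module(
    "demo_computation", "demo_apply_ufunc", frozenset({"apply_ufunc"})
)
sys.modules["demo_computation"] = old_home
```

After this, `old_home.apply_ufunc` still resolves (with a `DeprecationWarning`), while any other missing attribute raises `AttributeError` as usual. In a real package the same effect would come from defining `def __getattr__(name): ...` at the top level of the old file rather than building the module by hand.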

TomNicholas · 2025-03-04T16:21:23Z

I also support this.

> reshape / align / merge (need a better name)

manipulation? restructuring?

There's another opportunity for refactoring that could split up large files in #9203.

Related is the general issue of scope creep within the main repository. I think at some point we should revisit the idea of splitting out as much non-core functionality as possible into a separate package (there was a very old issue about this that I'm struggling to find right now - proposing "xr-scipy"). Distinguishing between crucial things such as apply_ufunc and other things that currently live in computation.py such as curvefit would be an important step towards that.

max-sixty added the needs triage label Mar 2, 2025

dcherian removed the needs triage label Mar 3, 2025

TomNicholas added the topic-internals label Mar 4, 2025
