Repo layout proposal #10089

Open
max-sixty opened this issue Mar 2, 2025 · 2 comments

@max-sixty
Collaborator

max-sixty commented Mar 2, 2025

What is your issue?

As part of the efforts described in #10039, I added #10088, and noticed the repo layout has arguably not kept up with the code growth over the past decade. This isn't the most pressing issue, but it does make the returns to refactors lower, since we're moving lines from 11K LOC files to 1K LOC files, rather than anything smaller.

(Even if you think LLMs aren't that useful, aren't going to get better, etc., these changes would still make the repo easier for people to navigate...)

In particular, roughly 2/3 of our code is in xarray/core: 66873 LOC, out of 97118 LOC in xarray as a whole.
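For anyone who wants to reproduce a breakdown like this, here is a minimal sketch. The function name `loc_by_subpackage` is made up for illustration, and it counts physical lines (blank lines and comments included), which is an assumption; the figures above may have been computed differently, so the numbers need not match exactly.

```python
from collections import Counter
from pathlib import Path


def loc_by_subpackage(root: str) -> Counter:
    """Count physical lines in *.py files, grouped by top-level subdirectory.

    Files that sit directly under `root` are grouped under ".".
    Blank lines and comments are counted too (an assumption of this sketch).
    """
    root_path = Path(root)
    totals: Counter = Counter()
    for path in root_path.rglob("*.py"):
        rel = path.relative_to(root_path)
        group = rel.parts[0] if len(rel.parts) > 1 else "."
        with path.open(encoding="utf-8", errors="replace") as handle:
            totals[group] += sum(1 for _ in handle)
    return totals
```

Calling `loc_by_subpackage("xarray")` from a checkout of the repo and comparing the `"core"` entry against `sum(counts.values())` would give the share of code living in xarray/core.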

I can imagine splitting this up into a few categories:

  • compat — dask_array_*, npcompat, pdcompat, array_api_compat
  • compute / computation — computation, arithmetic, nanops, weighted, the curvefit that's currently in dataset, rolling, rolling_exp, maybe missing
  • reshape / align / merge (need a better name) — merge, alignment, concat
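To make the grouping above concrete, here is a rough sketch of it as data, plus a helper that turns a list of files in xarray/core into planned moves. Several things here are assumptions rather than decisions: `plan_moves` and `PROPOSED_LAYOUT` are illustrative names, `structure` is only a placeholder for the third category (which still needs a name), and `missing` is left out because it was only a "maybe". `curvefit` does not appear because it is a method inside dataset rather than a file.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Target package -> module-name patterns, taken from the list above.
PROPOSED_LAYOUT = {
    "compat": ["dask_array_*", "npcompat", "pdcompat", "array_api_compat"],
    "computation": [
        "computation", "arithmetic", "nanops", "weighted",
        "rolling", "rolling_exp",
    ],
    # Placeholder name: the issue has not settled on one.
    "structure": ["merge", "alignment", "concat"],
}


def plan_moves(core_files, src="xarray/core", dst_root="xarray"):
    """Return (old_path, new_path) pairs for files matching the layout.

    Files that match no pattern are left where they are.
    """
    moves = []
    for filename in sorted(core_files):
        stem = PurePosixPath(filename).stem
        for package, patterns in PROPOSED_LAYOUT.items():
            if any(fnmatchcase(stem, pattern) for pattern in patterns):
                moves.append(
                    (f"{src}/{filename}", f"{dst_root}/{package}/{filename}")
                )
                break
    return moves
```

Keeping the plan as plain data makes it easy to review and adjust the grouping before any file is touched.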

I'd propose making each of those a path within xarray/. That gives more freedom to add new files within those paths; today, a new file means adding to an already very long list of files in xarray/core.

I'm not sure how much disruption this would cause to existing PRs. I think if we land the changes as commits that mostly just move files, then git will mostly handle merges well. We can start slowly and see how it goes...
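As a sketch of what a "move-only" commit could look like in practice: git does not record renames, it detects them from content similarity, so a commit that only moves files gives rename detection the best chance when open PRs are later merged or rebased. The helper below is hypothetical (`apply_moves` is not an existing script in the repo); it takes (old, new) path pairs and either prints or runs the corresponding `git mv` commands.

```python
import subprocess
from pathlib import Path


def git_mv_commands(moves):
    """Build one `git mv` argument list per (old_path, new_path) pair."""
    return [["git", "mv", old, new] for old, new in moves]


def apply_moves(moves, repo=".", dry_run=True):
    """Print the planned `git mv` commands, or run them when dry_run is False.

    No file contents are edited here; import fixes would go in a
    separate follow-up commit so that this one stays a pure move.
    """
    commands = git_mv_commands(moves)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # `git mv` needs the destination directory to exist already.
            Path(repo, command[3]).parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(command, cwd=repo, check=True)
    return commands
```

Running it with `dry_run=True` first gives a reviewable list of moves before anything changes in the working tree.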

max-sixty added the "needs triage" label Mar 2, 2025
dcherian removed the "needs triage" label Mar 3, 2025
@dcherian
Contributor

dcherian commented Mar 3, 2025

I agree. One specific suggestion: breaking apply_ufunc.py out of computation.py is easy, and shouldn't be disruptive.
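One common way to keep such a split non-disruptive is to have the old module re-export the moved name, so existing imports keep working. Whether xarray would keep a re-export like this is an assumption, not something decided in this thread. The sketch below demonstrates the pattern on a throwaway package called `toypkg` (a made-up name) rather than on xarray itself, and its toy `apply_ufunc` is only a stand-in for the real function.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_toy_package(root: Path) -> None:
    """Write a tiny package where computation.py re-exports apply_ufunc."""
    pkg = root / "toypkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    # New home of the function after the split.
    (pkg / "apply_ufunc.py").write_text(
        "def apply_ufunc(func, *args):\n"
        "    return func(*args)\n"
    )
    # Old module keeps the name importable from its previous location.
    (pkg / "computation.py").write_text(
        "from toypkg.apply_ufunc import apply_ufunc\n"
        "__all__ = ['apply_ufunc']\n"
    )


def old_and_new_paths_agree() -> bool:
    """Import via both paths and check they resolve to the same object."""
    with tempfile.TemporaryDirectory() as tmp:
        build_toy_package(Path(tmp))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            new = importlib.import_module("toypkg.apply_ufunc")
            old = importlib.import_module("toypkg.computation")
            return old.apply_ufunc is new.apply_ufunc
        finally:
            # Clean up so the toy package does not linger in the session.
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.split(".")[0] == "toypkg"]:
                del sys.modules[name]
```

Because both import paths return the same function object, code that imports from the old location behaves identically after the move.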

@TomNicholas
Member

I also support this.

> reshape / align / merge (need a better name)

manipulation? restructuring?

There's another opportunity for refactoring that could split up large files in #9203.

Related is the general issue of scope creep within the main repository. I think at some point we should revisit the idea of splitting out as much non-core functionality as possible into a separate package (there was a very old issue about this that I'm struggling to find right now, proposing "xr-scipy"). Distinguishing between crucial things such as apply_ufunc and other things that currently live in computation.py, such as curvefit, would be an important step towards that.
