As part of the efforts described in #10039, I added #10088, and noticed the repo layout has arguably not kept up with the code growth over the past decade. This isn't the most pressing issue, but it does make the returns to refactors lower, since we're moving lines from 11K LOC files to 1K LOC files, rather than anything smaller.
(Even if you think LLMs aren't that useful / aren't going to get better / etc., these changes would still make the repo easier for people to navigate...)
In particular, about 2/3 of our code is in xarray/core: 66873 LOC, vs 97118 LOC in xarray as a whole.
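For anyone who wants to check the split on their own checkout, here is a rough sketch of one way to count it. It counts every line in every `.py` file, blank lines and comments included, which may not be how the figures above were produced, so the totals could differ. The function names are made up for illustration.

```python
from pathlib import Path


def count_loc(root: Path, pattern: str = "*.py") -> int:
    """Count all lines (including blanks and comments) in files under root."""
    total = 0
    for path in root.rglob(pattern):
        if path.is_file():
            with path.open(encoding="utf-8", errors="replace") as f:
                total += sum(1 for _ in f)
    return total


def loc_share(package_root: Path, subdir: str) -> float:
    """Fraction of package_root's lines that live under package_root / subdir."""
    whole = count_loc(package_root)
    part = count_loc(package_root / subdir)
    return part / whole if whole else 0.0


if __name__ == "__main__":
    pkg = Path("xarray")  # assumes the script is run from the repository root
    if pkg.is_dir():
        print(f"xarray/core share: {loc_share(pkg, 'core'):.0%}")
```

With the numbers quoted above, 66873 / 97118 comes to roughly 69%.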
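I can imagine splitting this up into a few categories:
- dask_array_*, npcompat, pdcompat, array_api_compat
- computation, arithmetic, nanops, weighted, the curvefit that's currently in dataset, rolling, rolling_exp, maybe missing
- merge, alignment, concat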
I'd propose having each of those be paths within xarray/. Then there's more freedom to make new files within those paths relative to the current state, where a new file means adding onto a very long list of files in xarray/core.
I'm not sure how much disruption that would cause to existing PRs. I think if we land the changes as commits which mostly just move files, then git will mostly handle merges well. We can start slowly and see how it goes...
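A minimal sketch of what "commits which mostly just move files" could look like, using a few of the modules mentioned in this issue. The target directory names (`compat`, `computation`, `structure`) are placeholders invented for the example, not an agreed layout, and the helper only prints shell commands rather than running them. Git detects renames by content similarity, so keeping each move free of other edits should give it the best chance of following the file.

```python
from pathlib import PurePosixPath

# Hypothetical target packages: the directory names are placeholders for
# illustration, and only a subset of the modules from the issue is listed.
PLAN = {
    "compat": ["npcompat", "pdcompat", "array_api_compat"],
    "computation": ["arithmetic", "nanops", "weighted", "rolling", "rolling_exp"],
    "structure": ["merge", "alignment", "concat"],
}


def move_commands(plan, src="xarray/core", dst_root="xarray"):
    """Return shell commands that move each module into its new package."""
    cmds = []
    for package, modules in plan.items():
        target_dir = PurePosixPath(dst_root) / package
        # git mv needs the destination directory to exist first
        cmds.append(f"mkdir -p {target_dir}")
        for module in modules:
            old = PurePosixPath(src) / f"{module}.py"
            new = target_dir / f"{module}.py"
            cmds.append(f"git mv {old} {new}")
    return cmds


if __name__ == "__main__":
    for cmd in move_commands(PLAN):
        print(cmd)
```

Landing one group per commit (or per PR) would fit the "start slowly and see how it goes" approach.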
There's another opportunity for refactoring that could split up large files in #9203.
Related is the general issue of scope creep within the main repository. I think at some point we should revisit the idea of splitting out as much non-core functionality as possible into a separate package (there was a very old issue about this that I'm struggling to find right now, proposing "xr-scipy"). Distinguishing between crucial things such as apply_ufunc and other things that currently live in computation.py, such as curvefit, would be an important step towards that.