Move data files out of repo #20

Open
ruestefa opened this issue May 23, 2022 · 2 comments

Comments

@ruestefa
Contributor

  • PyFlexPlot version: v1.0.9
  • Python version: any
  • Operating System: any

Description

NetCDF files used in tests/slow are stored in the repo (tests/data), in both original and reduced form. This is not ideal, as they make the repo unnecessarily big and git doesn't handle binary files very well (AFAIK). They should instead be stored elsewhere and fetched before running the tests (e.g., with a get_test_data.sh script).
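
A minimal sketch of what such a fetch step could look like, written in Python rather than as the proposed get_test_data.sh. It assumes the data sits in a directory reachable from the machine running the tests; the function name `fetch_test_data` and the `*.nc` pattern are hypothetical and not part of PyFlexPlot.

```python
import shutil
from pathlib import Path
from typing import List


def fetch_test_data(src: Path, dest: Path, pattern: str = "*.nc") -> List[Path]:
    """Copy files matching ``pattern`` from ``src`` into ``dest``.

    Hypothetical helper: the directory layout below ``src`` is preserved,
    and files that already exist in ``dest`` are left untouched.
    Returns the list of newly copied files.
    """
    if not src.is_dir():
        raise FileNotFoundError(f"test data source not found: {src}")
    copied = []
    for src_file in sorted(src.rglob(pattern)):
        if not src_file.is_file():
            continue
        dest_file = dest / src_file.relative_to(src)
        if dest_file.exists():
            continue  # already fetched earlier
        dest_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dest_file)
        copied.append(dest_file)
    return copied
```

It could be called once before the slow tests, e.g. `fetch_test_data(Path(source_dir), Path("tests/data"))`, where `source_dir` is wherever the files end up being stored. Skipping existing files keeps repeated runs cheap.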

As only the reduced files are used in the tests, as a first step only the original files could be moved out of the repo, which would not necessitate any changes to the tests.

Once moved out, the repo could be cleansed of the binary files still in its history in order to reduce the size of the repo. However, this should be done with care, as it would break the tests for old revisions (the data would have to be fetched manually or by checking out the get_test_data.sh script from a new revision). Alternatively, a new repo could be created that starts w/o test data and the current repo could be archived.
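
To judge whether cleansing the history is worth the trouble, it may help to first see how much space the data files take up in the working tree. A rough sketch (the helper `size_by_suffix` is hypothetical); note that this is only a proxy, since git stores objects compressed and keeps every past revision of a changed binary file:

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict


def size_by_suffix(root: Path) -> Dict[str, int]:
    """Return the total size in bytes of all files below ``root``, per file suffix."""
    totals = defaultdict(int)
    for path in root.rglob("*"):
        if path.is_file():
            totals[path.suffix] += path.stat().st_size
    return dict(totals)
```

Running `size_by_suffix(Path("tests/data"))` and looking at the `".nc"` entry would give a first impression of how much the NetCDF files contribute.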

@pirmink
Collaborator

pirmink commented May 23, 2022

At CSCS, I copied the NetCDF files to /store/mch/msopr/pyflexplot_testdata.
I added a test case for an extra-large cloud passing over the date line from both sides in /store/mch/msopr/pyflexplot_testdata/ifs-hres. However, this could be removed again, as a corresponding test case has been added as a fast test by @ruestefa.

@ruestefa
Contributor Author

Once the data files have been moved out, creating a new repo and archiving the current one would be a good approach to cleanly separate the old state (with data files) from the present-and-future state (w/o data files) w/o losing access to the git history if necessary. This wouldn't even necessitate removing the data files from history (unless the "history" version of the repo is deemed too big, that is), since the new repo would start with a clean slate.
