Fix caching of gt_cache directories on CircleCI #327
Dockerfile
@@ -10,9 +10,11 @@ RUN apt-get update && apt-get install -y make \
     netcdf-bin \
     libnetcdf-dev \
     python3 \
-    python3-pip
+    python3-pip \
I will revert the changes to this file, they didn't help.
cpu mpi savepoint tests now pass in 5 minutes when fully cached, instead of 1.5h.
paths:
- .gt_cache
- .gt_cache_000000
I thought GT_CACHE_ROOT would specify the cache directory?
We don't set GT_CACHE_ROOT for this plan, though?
Right, it looks like it is only set for the 54-rank MPI test.
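The two paths in the hunk above (.gt_cache and .gt_cache_000000) are listed by hand. As a minimal sketch of how one might check which cache directories actually exist in a checkout before deciding what to list, the helper below collects every directory whose name starts with .gt_cache. The function name and the idea that further suffixed directories may appear are assumptions for illustration, not something this PR states.

```python
from pathlib import Path
from typing import List


def find_gt_cache_dirs(root: Path) -> List[Path]:
    """Return directories directly under `root` whose name starts with `.gt_cache`.

    Hypothetical helper: it only inspects names, and makes no claim about
    how gt4py chooses or numbers its cache directories.
    """
    # Plain files with a matching name are ignored; only directories are cached.
    return sorted(p for p in root.glob(".gt_cache*") if p.is_dir())
```

Running it on the repository root after a test run would show whether the hand-written list in the CI config covers everything that was created.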
@@ -24,6 +24,7 @@ PULL ?=True
 DEV ?=y
 CHECK_CHANGED_SCRIPT=$(CWD)/changed_from_main.py
 CONTAINER_CMD?=docker
+SAVEPOINT_SETUP=pip3 list && python3 -m gt4py.gt_src_manager install
I think at some point this was part of the Dockerfile? Also, I see that this is called during environment setup in CircleCI; does that not work?
The complicated case is the CircleCI test, where we:
- Cannot run this outside of the container, because the environment is not set up or installed there (I could add it just to run this command, but that feels like overkill)
- Are running the MPI-parallel tests, so if the process itself tries to clone these sources, the clone happens 6 times in parallel, which leads to errors
- Need to bind-mount our directory into the container so that we can retain the caches, but doing so overwrites the gt4py directory (including the gridtools sources) with our directory, which does not have them
I could turn this into a ?= and modify it only for the CircleCI tests, if that would be better? Normally this isn't an issue because we have, or can get, the gridtools sources on our local filesystem copy of gt4py.
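The ordering constraint in the comment above (install the sources once, serially, and only then start the parallel ranks) can be sketched as a single shell line composed for the container. The setup string is copied from the SAVEPOINT_SETUP line in the diff; the function name, the mpirun invocation, and the test path are hypothetical, since the PR does not show the actual test command.

```python
import shlex
from typing import List

# Copied from the SAVEPOINT_SETUP line added to the Makefile in this PR.
SAVEPOINT_SETUP = "pip3 list && python3 -m gt4py.gt_src_manager install"


def build_container_command(
    test_args: List[str], ranks: int = 6, setup: str = SAVEPOINT_SETUP
) -> str:
    """Compose one shell line: serial setup first, then the MPI-parallel tests."""
    # Hypothetical launcher line; the real test command is not shown in the PR.
    mpi_cmd = ["mpirun", "-np", str(ranks), "python3", "-m", "pytest", *test_args]
    # `&&` ensures the ranks only start once the single install has succeeded,
    # so no rank has to clone the sources itself.
    return f"{setup} && {shlex.join(mpi_cmd)}"
```

Because the setup runs inside the container after the bind mount is in place, the installed sources end up in the mounted directory, which is the situation the comment describes.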
Okay, just to double check: if we already had gridtools installed, this wouldn't do anything, right?
Jenkins run failed at https://jenkins.ginko.ch/job/pace_PR/2852/; relaunching to check whether the failure is persistent or random.
launch jenkins
Purpose
This PR fixes an issue where gt4py cache directories were not cached between executions on CircleCI.
Infrastructure changes:
- Run python3 -m gt4py.gt_src_manager install before running tests inside docker, to avoid issues arising from git cloning these sources in an MPI-parallel context

Checklist
Before submitting this PR, please make sure:
pace-util, HISTORY has been updated