Fix caching of gt_cache directories on CircleCI #327

Merged 6 commits on Sep 16, 2022
.circleci/config.yml: 17 changes (11 additions, 6 deletions)
@@ -88,7 +88,9 @@ jobs:
command: git submodule status external/gt4py | awk '{print $1;}' > gt4py_version.txt
- restore_cache:
keys:
- - v1-gt_cache-serial-<<parameters.backend>>-{{ checksum "gt4py_version.txt" }}
+ - v3-gt_cache-serial-<<parameters.backend>>-{{ checksum "gt4py_version.txt" }}
- restore_cache:
keys:
- v1-savepoints-{{ checksum "Makefile.data_download" }}
- run:
name: build image
@@ -97,12 +99,13 @@ jobs:
- run:
name: run tests
command: |
- TEST_ARGS="--backend=<<parameters.backend>> -v -s" DEV=n make savepoint_tests
+ TEST_ARGS="--backend=<<parameters.backend>> -v -s" DEV=y make savepoint_tests
no_output_timeout: 3h
- save_cache:
- key: v1-gt_cache-serial-<<parameters.backend>>-{{ checksum "gt4py_version.txt" }}
+ key: v3-gt_cache-serial-<<parameters.backend>>-{{ checksum "gt4py_version.txt" }}
paths:
- .gt_cache
+ - .gt_cache_000000
elynnwu (Collaborator), Sep 16, 2022:

I thought GT_CACHE_ROOT would specify the cache directory?

Collaborator (PR author):

We don't set GT_CACHE_ROOT for this plan, though?

Collaborator:

Right, it looks like it's only set for running the 54-rank MPI test.

- save_cache:
key: v1-savepoints-{{ checksum "Makefile.data_download" }}
paths:
@@ -139,7 +142,9 @@ jobs:
command: git submodule status external/gt4py | awk '{print $1;}' > gt4py_version.txt
- restore_cache:
keys:
- - v1-gt_cache-<<parameters.backend>>-{{ checksum "gt4py_version.txt" }}
+ - v2-gt_cache-<<parameters.backend>>-{{ checksum "gt4py_version.txt" }}
- restore_cache:
keys:
- v1-savepoints-{{ checksum "Makefile.data_download" }}
- run:
name: build image
@@ -148,10 +153,10 @@ jobs:
- run:
name: run tests
command: |
- TEST_ARGS="--backend=<<parameters.backend>> -v -s" DEV=n make savepoint_tests_mpi
+ TEST_ARGS="--backend=<<parameters.backend>> -v -s" DEV=y make savepoint_tests_mpi
no_output_timeout: 3h
- save_cache:
- key: v1-gt_cache-<<parameters.backend>>-{{ checksum "gt4py_version.txt" }}
+ key: v2-gt_cache-<<parameters.backend>>-{{ checksum "gt4py_version.txt" }}
paths:
- .gt_cache
- .gt_cache_000000
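How the cache keys above change can be sketched in shell. The config writes the output of `git submodule status external/gt4py | awk '{print $1;}'` to `gt4py_version.txt`, and `{{ checksum "gt4py_version.txt" }}` folds that file into the key, so a new gt4py commit yields a new key. The sketch below assumes a made-up status line and uses `sha256sum` as a stand-in for CircleCI's checksum (the real encoding differs); `extract_version` and `cache_key` are hypothetical helper names. Because CircleCI does not overwrite a cache whose key already exists, bumping the prefix (`v1-` to `v3-` or `v2-`) makes `save_cache` write a fresh cache instead of keeping the one saved under the old key.

```shell
#!/usr/bin/env bash
# Sketch: derive a gt_cache key roughly the way the CircleCI config does.
# Assumptions: the status line below is made up, and sha256sum stands in
# for CircleCI's {{ checksum }} template (its actual encoding differs).
set -euo pipefail

# `git submodule status` prints "<flag><sha1> <path> (<describe>)", where
# <flag> is a space (recorded commit checked out), "+" (different commit)
# or "-" (not initialized). awk drops the leading space but keeps "+"/"-".
extract_version() {
  awk '{print $1;}'
}

# Hypothetical helper mirroring the serial job's key layout:
# <prefix>-gt_cache-serial-<backend>-<hash of version file>
cache_key() {
  local prefix=$1 backend=$2 version_file=$3
  local digest
  digest=$(sha256sum "$version_file" | awk '{print $1}')
  echo "${prefix}-gt_cache-serial-${backend}-${digest}"
}

version_file=$(mktemp)
echo " 0123456789abcdef0123456789abcdef01234567 external/gt4py (heads/main)" \
  | extract_version > "$version_file"

echo "recorded gt4py commit: $(cat "$version_file")"
# Same gt4py commit, different prefix: the keys differ, so a v3 lookup
# cannot hit a cache that was saved under v1.
cache_key v1 numpy "$version_file"
cache_key v3 numpy "$version_file"
```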

Makefile: 13 changes (7 additions, 6 deletions)
@@ -24,6 +24,7 @@ PULL ?=True
DEV ?=y
CHECK_CHANGED_SCRIPT=$(CWD)/changed_from_main.py
CONTAINER_CMD?=docker
+ SAVEPOINT_SETUP=pip3 list && python3 -m gt4py.gt_src_manager install
Collaborator:

I think at some point this was part of the Dockerfile? Also, I see that this is called during the environment setup in CircleCI; does that not work?

Collaborator (PR author):

The complication is that in the CircleCI tests, we:

  • Cannot run this outside of the container, because the environment is not set up / installed there (I could add it just to run this command, but that feels like overkill).
  • Are running the MPI-parallel tests, so if the test process itself tries to clone these sources, it happens 6 times in parallel, which leads to errors.
  • Need to bind-mount our directory into the container so that we can retain the caches, but doing so overwrites the gt4py directory (including the gridtools sources) with our copy, which does not have these sources.
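The second point is why `$(SAVEPOINT_SETUP)` sits before `$(MPIRUN_CALL)` inside the `bash -c` string: it runs once in the container's shell, whereas anything the test process does itself is repeated on every rank. A toy sketch, assuming background jobs as stand-ins for six MPI ranks and an appended log line as a stand-in for cloning the gridtools sources (`fake_setup`, `fake_test`, `per_rank` and `launch_ranks` are hypothetical names); it only counts setup runs and does not reproduce the actual clone race:

```shell
#!/usr/bin/env bash
# Toy model: background jobs stand in for MPI ranks, and appending a line
# to a log file stands in for cloning the gridtools sources.
set -euo pipefail

log=$(mktemp)

fake_setup() { echo "setup" >> "$log"; }
fake_test() { :; }  # placeholder for pytest running on one rank

# Start n copies of a command concurrently, loosely like `mpirun -n n`.
launch_ranks() {
  local n=$1
  shift
  local r
  for ((r = 0; r < n; r++)); do
    "$@" &
  done
  wait
}

# Each "rank" does its own setup before testing.
per_rank() {
  fake_setup
  fake_test
}

# Setup inside every rank: six concurrent "clones".
: > "$log"
launch_ranks 6 per_rank
echo "setup inside each rank: $(( $(wc -l < "$log") )) runs"

# Setup once in the parent shell, then launch the ranks
# (the ordering used by "$(SAVEPOINT_SETUP) && ... $(MPIRUN_CALL) ...").
: > "$log"
fake_setup
launch_ranks 6 fake_test
echo "setup before launch: $(( $(wc -l < "$log") )) runs"
```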

I could turn this into a ?= and modify it only for the CircleCI tests, if that would be better? Normally this isn't an issue because we have/can get the gridtools sources on our local filesystem copy of gt4py.
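The `?=` variant mentioned above could look like this; it is a sketch, not what the PR merged (the PR uses a plain `=`). With `?=`, a `SAVEPOINT_SETUP` value already present in the environment wins over the Makefile default, so only the CircleCI job would need to export its own setup command. A command-line assignment such as `make SAVEPOINT_SETUP=...` overrides even a plain `=`. The sketch assumes GNU make is installed and writes a throwaway Makefile with a hypothetical `show_setup` target that only prints the variable:

```shell
#!/usr/bin/env bash
# Sketch of the "?=" variant, assuming GNU make is on the PATH.
# show_setup is a hypothetical target that only prints SAVEPOINT_SETUP.
set -euo pipefail

makefile=$(mktemp)
# printf expands \t into the tab that make requires before recipe lines;
# $(SAVEPOINT_SETUP) stays literal because of the single quotes.
printf 'SAVEPOINT_SETUP ?= pip3 list && python3 -m gt4py.gt_src_manager install\nshow_setup:\n\t@echo "$(SAVEPOINT_SETUP)"\n' > "$makefile"

# No override: the Makefile default is used.
make -s -f "$makefile" show_setup

# Environment override (e.g. exported only in the CircleCI job): ?= keeps it.
SAVEPOINT_SETUP="pip3 list" make -s -f "$makefile" show_setup
```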

Collaborator:

Okay, just to double-check: if we already had gridtools installed, this wouldn't do anything, right?


VOLUMES ?=

@@ -103,11 +104,11 @@ test_util:

savepoint_tests: build
TARGET=dycore $(MAKE) get_test_data
- $(CONTAINER_CMD) $(CONTAINER_FLAGS) bash -c "pip3 list && cd $(PACE_PATH) && pytest --data_path=$(EXPERIMENT_DATA_RUN)/dycore/ $(TEST_ARGS) $(FV3CORE_THRESH_ARGS) $(PACE_PATH)/fv3core/tests/savepoint"
+ $(CONTAINER_CMD) $(CONTAINER_FLAGS) bash -c "$(SAVEPOINT_SETUP) && cd $(PACE_PATH) && pytest --data_path=$(EXPERIMENT_DATA_RUN)/dycore/ $(TEST_ARGS) $(FV3CORE_THRESH_ARGS) $(PACE_PATH)/fv3core/tests/savepoint"

savepoint_tests_mpi: build
TARGET=dycore $(MAKE) get_test_data
- $(CONTAINER_CMD) $(CONTAINER_FLAGS) bash -c "pip3 list && cd $(PACE_PATH) && $(MPIRUN_CALL) python3 -m mpi4py -m pytest --maxfail=1 --data_path=$(EXPERIMENT_DATA_RUN)/dycore/ $(TEST_ARGS) $(FV3CORE_THRESH_ARGS) -m parallel $(PACE_PATH)/fv3core/tests/savepoint"
+ $(CONTAINER_CMD) $(CONTAINER_FLAGS) bash -c "$(SAVEPOINT_SETUP) && cd $(PACE_PATH) && $(MPIRUN_CALL) python3 -m mpi4py -m pytest --maxfail=1 --data_path=$(EXPERIMENT_DATA_RUN)/dycore/ $(TEST_ARGS) $(FV3CORE_THRESH_ARGS) -m parallel $(PACE_PATH)/fv3core/tests/savepoint"

dependencies.svg: dependencies.dot
dot -Tsvg $< -o $@
@@ -119,21 +120,21 @@ constraints.txt: driver/setup.py dsl/setup.py fv3core/setup.py physics/setup.py

physics_savepoint_tests: build
TARGET=physics $(MAKE) get_test_data
- $(CONTAINER_CMD) $(CONTAINER_FLAGS) bash -c "pip3 list && cd $(PACE_PATH) && pytest --data_path=$(EXPERIMENT_DATA_RUN)/physics/ $(TEST_ARGS) $(PHYSICS_THRESH_ARGS) $(PACE_PATH)/physics/tests/savepoint"
+ $(CONTAINER_CMD) $(CONTAINER_FLAGS) bash -c "$(SAVEPOINT_SETUP) && cd $(PACE_PATH) && pytest --data_path=$(EXPERIMENT_DATA_RUN)/physics/ $(TEST_ARGS) $(PHYSICS_THRESH_ARGS) $(PACE_PATH)/physics/tests/savepoint"

physics_savepoint_tests_mpi: build
TARGET=physics $(MAKE) get_test_data
- $(CONTAINER_CMD) $(CONTAINER_FLAGS) bash -c "pip3 list && cd $(PACE_PATH) && $(MPIRUN_CALL) python -m mpi4py -m pytest --maxfail=1 --data_path=$(EXPERIMENT_DATA_RUN)/physics/ $(TEST_ARGS) $(PHYSICS_THRESH_ARGS) -m parallel $(PACE_PATH)/physics/tests/savepoint"
+ $(CONTAINER_CMD) $(CONTAINER_FLAGS) bash -c "$(SAVEPOINT_SETUP) && cd $(PACE_PATH) && $(MPIRUN_CALL) python -m mpi4py -m pytest --maxfail=1 --data_path=$(EXPERIMENT_DATA_RUN)/physics/ $(TEST_ARGS) $(PHYSICS_THRESH_ARGS) -m parallel $(PACE_PATH)/physics/tests/savepoint"

test_main: build
- $(CONTAINER_CMD) $(CONTAINER_FLAGS) bash -c "pip3 list && cd $(PACE_PATH) && pytest $(TEST_ARGS) $(PACE_PATH)/tests/main"
+ $(CONTAINER_CMD) $(CONTAINER_FLAGS) bash -c "$(SAVEPOINT_SETUP) && cd $(PACE_PATH) && pytest $(TEST_ARGS) $(PACE_PATH)/tests/main"

test_mpi_54rank:
mpirun -n 54 $(MPIRUN_ARGS) python3 -m mpi4py -m pytest tests/mpi_54rank

driver_savepoint_tests_mpi: build
TARGET=driver $(MAKE) get_test_data
- $(CONTAINER_CMD) $(CONTAINER_FLAGS) bash -c "pip3 list && cd $(PACE_PATH) && $(MPIRUN_CALL) python -m mpi4py -m pytest --maxfail=1 --data_path=$(EXPERIMENT_DATA_RUN)/driver/ $(TEST_ARGS) $(PHYSICS_THRESH_ARGS) -m parallel $(PACE_PATH)/physics/tests/savepoint"
+ $(CONTAINER_CMD) $(CONTAINER_FLAGS) bash -c "$(SAVEPOINT_SETUP) && cd $(PACE_PATH) && $(MPIRUN_CALL) python -m mpi4py -m pytest --maxfail=1 --data_path=$(EXPERIMENT_DATA_RUN)/driver/ $(TEST_ARGS) $(PHYSICS_THRESH_ARGS) -m parallel $(PACE_PATH)/physics/tests/savepoint"

docs: ## generate Sphinx HTML documentation
$(MAKE) -C docs html