PIP cache should cache the installed packages as well #330
Hello @crabhi, thanks for your request! |
It would also be nice to follow https://github.com/actions/cache#outputs and provide an output |
This pattern affects more languages ( |
For pip, AFAIR there are no postinstall scripts, so this would not be an issue. |
I'm experimenting with this at the moment, and caching site-packages (read: pip output) isn't straightforward either; for instance, binary wrappers (black, ...) won't work ( |
Hey, this feature was merged today and should be part of an upcoming release |
but the |
Oh you're talking |
I have a case where building packages for PyPy (grpcio, grpcio-tools) takes about 6 minutes, which is far too slow to introduce a matrix. If anyone has a manual example using actions/cache, please share it. |
I was creating a Python venv and caching that directory, but I hit an issue where it was broken once restored (the behaviour was inconsistent). I currently have a job that takes ~6 min to complete, 4 min of which is spent installing pip packages. Effective caching of installed packages would be a great boost. |
Could you share the workflow so people can take a look at it? I think it's possible to work around this until the feature lands. |
```yaml
- uses: actions/checkout@v3
- id: setup_python
  uses: actions/setup-python@v3
  with:
    python-version: 3.7
- id: python_cache
  uses: actions/cache@v3
  with:
    path: venv
    key: pip-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}
- if: steps.python_cache.outputs.cache-hit != 'true'
  run: |
    python3 -m venv venv
- run: |
    venv/bin/python3 -m pip install -r requirements.txt
```

This worked quite well for the most part, except that after a while I started getting errors like these:
|
@rashidnhm have you tried debugging this issue? It seems like the problem may not be in this action. |
Weirdly enough, I have not been able to reproduce the issue. To fix it, I simply removed the venv code, then recreated and re-cached it. I'm not even sure what caused it in the first place; my only thought is that the cache somehow got corrupted and kept being restored. Really can't say. For now I've kept the code I sent above, and it has been working well since, with no other issues. |
Ok, nice. The code seemed fine, so that was strange. I'd only advise you not to run pip install on a cache hit: once the cache is hit, you don't want to modify it in any way, to avoid corruption. |
So I have done quite a deep dive into the venv corruption issue, and I believe I know what happened and how to avoid it. The Python version changed between when my cache was created and when it was restored, and I had a generic restore key that matched the old cache key. See the detailed explanation below. This is how my YAML file looked when I hit this error:

```yaml
# BAD CONFIG, DO NOT USE (illustrative purposes only)
- uses: actions/checkout@v3
- id: setup_python
  uses: actions/setup-python@v3
  with:
    python-version: 3.7
- id: python_cache
  uses: actions/cache@v3
  with:
    path: venv
    key: pip-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}
    # The bare `pip-` restore key below is what caused the issue
    restore-keys: |
      pip-${{ steps.setup_python.outputs.python-version }}-
      pip-
- if: steps.python_cache.outputs.cache-hit != 'true'
  run: |
    python3 -m venv venv
- run: |
    venv/bin/python3 -m pip install -r requirements.txt
```

When this workflow first ran and saved the venv to the cache, the latest release of Python 3.7 was 3.7.12, so the venv it created had symlinks to 3.7.12. A few days later, when the workflow ran again, the latest release of Python 3.7 was 3.7.13. Notice that my workflow doesn't pin the Python patch version, so … However, my restore key …

The resolution is to ensure that the output of setup-python is always part of the cache key, so that any change in Python version (even a patch bump) creates a new cache key. This is the code I have now, and it has been working well without any issues. I have also applied the advice @dhvcc gave in the comment above: the venv is not touched if there is a cache hit.

```yaml
- uses: actions/checkout@v3
- id: setup_python
  uses: actions/setup-python@v3
  with:
    python-version: 3.7
- id: python_cache
  uses: actions/cache@v3
  with:
    path: venv
    key: pip-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}
- if: steps.python_cache.outputs.cache-hit != 'true'
  run: |
    # Check whether a venv exists (restored from secondary keys, if any) and delete it
    # You might not need this if you only have one primary key for the venv cache
    # I kept it in my code as a fail-safe
    if [ -d "venv" ]; then rm -rf venv; fi
    # Re-create the venv
    python3 -m venv venv
    # Install dependencies
    venv/bin/python3 -m pip install -r requirements.txt
```
|
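As an optional extra fail-safe, here is a minimal sketch (not part of the workflow above) of a step that checks, on an exact cache hit, whether the restored venv's interpreter still starts. It relies on the fact that on Linux/macOS `python3 -m venv` links `venv/bin/python3` back to the interpreter it was created from, so a venv restored against a Python install that no longer exists fails to run. It reuses the `python_cache` step id and `requirements.txt` from the workflow above; the `import sys` probe is just an assumed cheap health check.

```yaml
# Hypothetical fail-safe step, placed after the python_cache step.
- if: steps.python_cache.outputs.cache-hit == 'true'
  run: |
    # If the restored venv cannot start its interpreter, rebuild it from scratch.
    if ! venv/bin/python3 -c 'import sys' > /dev/null 2>&1; then
      echo "Restored venv is broken; re-creating it"
      rm -rf venv
      python3 -m venv venv
      venv/bin/python3 -m pip install -r requirements.txt
    fi
```

Keep in mind that actions/cache does not re-save the cache when the primary key was an exact hit, so a venv rebuilt by this step is not persisted; including the full Python version in the key remains the actual fix.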
Hi, @rashidnhm 👋 Thanks a lot for such a detailed explanation, it should help others who encountered such issues. |
Any news on a flag to cache the installed packages, and not only the downloaded ones, with |
What exactly do you mean by that? A bit more context would help avoid misunderstandings. |
Sorry, @dhvcc, if I didn't make myself clear.
It would be great if the installed packages could be cached as well (the purpose of this issue #330) through |
The setup-python action caches only downloads, so the install step is always needed. Open issue for an installed-packages cache: actions/setup-python#330
@Avasam, possibly; at the very least, less strain on PyPI. We should also test with both small and large numbers of dependencies. |
Just wanted to add an anecdote from my own experience. TorchGeo has a long list of dependencies: Install times without caching vary quite a bit by OS and Python version:
We first tried using the cache feature of setup-python:

```yaml
- name: Set up python
  uses: actions/setup-python@…
  with:
    python-version: ${{ matrix.python-version }}
    cache: 'pip'
    cache-dependency-path: |
      requirements/required.txt
      requirements/datasets.txt
      requirements/tests.txt
- name: Install pip dependencies
  run: pip install -r requirements/required.txt -r requirements/datasets.txt -r requirements/tests.txt
```

Not only do install times not significantly improve, in many cases they're actually worse!
Finally, we tried the setup proposed in this blog that manually caches the entire Python installation:

```yaml
- name: Set up python
  uses: actions/setup-python@…
  with:
    python-version: ${{ matrix.python-version }}
- name: Cache dependencies
  uses: actions/cache@…
  id: cache
  with:
    path: ${{ env.pythonLocation }}
    key: ${{ env.pythonLocation }}-${{ hashFiles('requirements/required.txt') }}-${{ hashFiles('requirements/datasets.txt') }}-${{ hashFiles('requirements/tests.txt') }}
- name: Install pip dependencies
  if: steps.cache.outputs.cache-hit != 'true'
  run: pip install -r requirements/required.txt -r requirements/datasets.txt -r requirements/tests.txt
```

This resulted in significantly faster installation times, which could likely be improved further by caching only the site-packages directory:
Apparently slower Windows caching is a known issue: actions/cache#752. So yes, if setup-python also cached installed packages, that would be awesome! |
In hindsight, this is a bad idea: many tools, like black or flake8, also install files into bin, so we'll at least need to cache bin too. |
I addressed this point a while ago (above) - recap here:
So, instead of invoking |
That's a decent workaround, but I don't think it's realistic to expect all users to change how they invoke other steps later in their workflow. I think we would have to cache bin too, possibly everything. A bonus of caching everything is that we have to install Python from a cache anyway. |
Most definitely not a catch-all! To be honest, I'm not confident there's a straightforward solution. |
The workaround from @adamjstewart does seem to work wonders! But I think a standard implementation in this repository would be a great addition. Any updates on it from the dev team? |
This right here has been a life saver for me - I toiled over this caching for so long, but this got me there!! Thank you so so so much!! |
@adamjstewart, thanks for your awesome post; it really reduced my build times. It's definitely possible to achieve your idea of caching only the necessary items instead of the entire Python build, by caching site-packages and bin (to keep the executables from packages like black, ruff, etc.). In my case, my two dependency files are classic pip-tools
Finally, I run the pip install step conditionally, based on the cache-hit result, with the following:
Mileage may vary for small dependency lists, depending on the network speed of the cache restore. |
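Since the snippets from that comment did not survive here, the following is only a hedged sketch of the "site-packages + bin" idea, not the commenter's actual workflow. It assumes a Linux or macOS runner, where setup-python exposes its install prefix as `env.pythonLocation`, puts console scripts in `bin`, and puts packages in `lib/python3.11/site-packages`; the Python version `3.11`, the step ids, and `requirements.txt` are placeholders.

```yaml
- id: setup_python
  uses: actions/setup-python@v4
  with:
    python-version: "3.11"
- id: site_cache
  uses: actions/cache@v3
  with:
    # Cache the installed packages plus their console scripts (black, ruff, ...),
    # not the whole interpreter. The site-packages path assumes a POSIX layout.
    path: |
      ${{ env.pythonLocation }}/bin
      ${{ env.pythonLocation }}/lib/python3.11/site-packages
    # Include the full Python version so a patch bump never restores stale files.
    key: site-${{ runner.os }}-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}
- if: steps.site_cache.outputs.cache-hit != 'true'
  run: python -m pip install -r requirements.txt
```

Keying on the full `python-version` output matters here for the same reason as in the venv discussion: scripts in `bin` carry shebangs pointing at the exact interpreter path.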
@adamjstewart Thanks for the cache example, it really helps. :-) I have one more suggestion, which I adopted because I want to avoid re-downloading all pip dependencies every time any of them changes: introduce a second cache just for pip's download cache.

```yaml
- name: "Python: Setup Python"
  uses: actions/setup-python@…
  with:
    python-version: 3.10.13
- name: "Cache: Cache Python"
  id: python-cache
  uses: actions/cache@…
  with:
    path: ${{env.pythonLocation}}
    key: ${{env.pythonLocation}}-${{hashFiles('requirements-dev.txt')}}-${{hashFiles('requirements.txt')}}
- name: "Shell: Get pip cache dir"
  id: pip-cache-dir
  if: steps.python-cache.outputs.cache-hit != 'true'
  run: |
    python -m pip install -U pip
    pip install -U wheel
    echo "pip-cache-dir=$(pip cache dir)" >> "${GITHUB_OUTPUT}"
- name: "Cache: Cache pip"
  if: steps.python-cache.outputs.cache-hit != 'true'
  uses: actions/cache@…
  with:
    path: ${{steps.pip-cache-dir.outputs.pip-cache-dir}}
    key: 3.10-${{hashFiles('requirements-dev.txt')}}-${{hashFiles('requirements.txt')}}
    restore-keys: |
      3.10-${{hashFiles('requirements-dev.txt')}}-
      3.10-
- name: "Shell: Install pip dependencies"
  if: steps.python-cache.outputs.cache-hit != 'true'
  run: pip install -r requirements-dev.txt
```
|
Unfortunately, the built-in cache functionality only supports dependencies downloaded by the package manager, not the whole Python installation as expected. I made an example that caches both the downloaded libraries and the binaries (…):

```yaml
- name: Cache dependencies
  uses: actions/cache@…
  id: cache
  with:
    path: ${{ runner.tool_cache }}/Python/${{ inputs.python-version }} # e.g. /opt/hostedtoolcache/Python/3.11.6
    key: ${{ runner.tool_cache }}/Python/${{ inputs.python-version }}/${{ runner.arch }}-${{ hashFiles('requirements.txt') }}
- name: Set up Python env
  uses: actions/setup-python@v4
  with:
    token: ${{ secrets.xyz }} # if needed
    python-version: ${{ inputs.python-version }}
- name: Install pip dependencies
  shell: bash
  run: |
    pip install -r requirements.txt
```

The key is on the … The console output with the cache should be:
Note: the above was tested on a self-hosted runner. |
This works pretty well once the cache is populated; however, populating the cache seems to take 13 minutes with our dependencies, not just on initial creation but also whenever the cache is busted. This is running
Has anyone had any luck with a similar approach that handles partial restores more cleanly (without using pipenv, which is what I've typically used for caching in the past)? |
Description:
Currently, `setup-python` caches only the `~/.cache/pip` directory to avoid redownloads. However, it doesn't cache the installed packages. As some packages have lengthy installation steps, this leads to delays in builds.

You can see the current behaviour, for example, in https://github.com/crabhi/setup-python-cache-test/actions/runs/1789016634 (or in the attached build.txt): the `pip install` output shows "Collecting" and "Installing" instead of "Requirement already satisfied" for all packages.

Justification:
For example, installing the `ansible` package takes well over a minute even if it's already downloaded.

Are you willing to submit a PR?
Yes, I can try.