Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch from --user to venv for PROD image and enable uv #37796

Merged
merged 1 commit into from
Mar 6, 2024

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Feb 29, 2024

This PR introduces a joint way to treat the .local (--user) folder
as both - venv and --user package installation. It fixes a number
of problems the --user installation created us in the past and
does it in fully backwards compatible way.

This improves both "production" use for end user as well as local
iteration on the PROD image during tests - but also for CI.

Improvements for "end user":

  • user does not have to use pip install --user to install new
    packages any more and it is not enabled by default with PIP_USER
    flag.

  • users can use uv to install packages when they extend the image
    (but it's not obligatory - pip continues working as it did)

  • users can use uv to build custom production image, which gives
    40%-50% saving for image build time compring to pip.

  • python -m venv --system-site-packages continues to use the
    .local packages from the .local installation (and not uses
    them if --system-site-packages is not used) - so we have full
    compatibility with previous images.

Improvements for development:

  • when image is built from sources (no --use-docker-context-files
    are specified), airflow is installed in --editable mode, which
    means that airflow + all providers are installed locally from
    airflow sources, not from packages - which means that both
    airflow and providers have the latest version inside the
    prod image.

  • when local sources changes and you want to run k8s tests locally,
    it is now WAY faster (several minutes) to iterate with your changes
    because you do not have to rebuild the base image - the only thing
    needed is to copy sources to the PROD image to "/opt/airflow" which
    is where editable installlation is done from. You only need to
    rebuild the image if dependencies change.

  • By default uv is used for local source build for k8s tests so
    even if you have to rebuild it, it is way faster (60%-80%) during
    iterating with the image.

CI/DEV tooling improvements:

  • this PR switches to use uv by default for most prod images we
    build in CI, but it adds a check if the image still builds with pip.

  • we also switch to more PEP standard way of installing packages
    from local filesystem (package-name @ file:///FILE)

Fixes: #37785
Fixes: #37815


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member Author

potiuk commented Feb 29, 2024

Aftere a looong time I've finally found a way how to get rid of the --user flag for PROD airflow lmage and keep backwards compatibility. I attempted it already in the past #19189 (loong time ago( and could not find a good way to keep compatibility. Mainly it was about python -m venv --system-site-packages creting venv dynamically with preinstalled airflow and all its dependencies. This was not working well when airflow was just a virtualenv.

But now I revised it - in order to bring uv on board for production image (uv does not support --user flag) and I found out that simply recreating a ~/.local directory as venv, adding ~/.local/bin to the PATH and setting VIRTUAL_ENV variable pointing to ~/.local/ does the job well. It has all the properties we need:

  • nicely installs all packages locally without the need to have root access
  • allows to create new venvs: empty (default) or with installed airflow (when --system-site-packages are used)
  • works with uv out-of-the-box (thoug uv venv acts a little differently than python -m venv because uv still does not consider --user / .local directories as special
  • allows to build our images with uv which brings 40% - 55% percent improvements in build times.

The request to add --user flag to --uv is in astral-sh/uv#2077 - but we do not need it any more to handle that case.

Copy link
Collaborator

@aritra24 aritra24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a first pass looks good, will take another look a little bit later

@potiuk potiuk force-pushed the use-uv-for-prod-image branch from a3008e0 to ee83e85 Compare February 29, 2024 19:42
@potiuk potiuk force-pushed the use-uv-for-prod-image branch 5 times, most recently from ed154b1 to bb84fa4 Compare February 29, 2024 23:28
Copy link
Member

@Lee-W Lee-W left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left 2 nitpicks. But overall looks good!

@potiuk
Copy link
Member Author

potiuk commented Mar 1, 2024

Yeah. That one should be ready to go - and save some of the CI time.

Once this is merged, I plan to rewrite the whole "scripts/docker" scripts in Python and make it much cleaner - for now it's just a way to get it working for PROD image. So if anyone wants to suggest that, the answer is: yes, as a follow up :)

@potiuk potiuk force-pushed the use-uv-for-prod-image branch 2 times, most recently from 198f9bb to a252fb5 Compare March 1, 2024 09:40
@potiuk potiuk force-pushed the use-uv-for-prod-image branch 9 times, most recently from e0856d1 to 5a0c94f Compare March 3, 2024 09:19
@potiuk
Copy link
Member Author

potiuk commented Mar 3, 2024

OK. Green and heavily tested ... Would love some reviews ;) - > it's much smaller than the original one :)

Copy link
Contributor

@o-nikolas o-nikolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot here 😅 But overall LGTM. Mostly doc/comment feedback and a question or two.

@potiuk potiuk force-pushed the use-uv-for-prod-image branch from 89b26a2 to bdf8eb2 Compare March 5, 2024 22:04
@potiuk potiuk added the default versions only When assigned to PR - only default python version is used for CI tests label Mar 5, 2024
@potiuk potiuk force-pushed the use-uv-for-prod-image branch from 904648f to 5390295 Compare March 5, 2024 22:41
This PR introduces a joint way to treat the .local (--user) folder
as both - venv and `--user` package installation. It fixes a number
of problems the `--user` installation created us in the past and
does it in fully backwards compatible way.

This improves both "production" use for end user as well as local
iteration on the PROD image during tests - but also for CI.

Improvements for "end user":

* user does not have to use `pip install --user` to install new
  packages any more and it is not enabled by default with PIP_USER
  flag.

* users can use uv to install packages when they extend the image
  (but it's not obligatory - pip continues working as it did)

* users can use `uv` to build custom production image, which gives
  40%-50% saving for image build time compring to `pip`.

* python -m venv --system-site-packages continues to use the
  .local packages from the .local installation (and not uses
  them if --system-site-packages is not used) - so we have full
  compatibility with previous images.

Improvements for development:

* when image is built from sources (no --use-docker-context-files
  are specified), airflow is installed in --editable mode, which
  means that airflow + all providers are installed locally from
  airflow sources, not from packages - which means that both
  airflow and providers have the latest version inside the
  prod image.

* when local sources changes and you want to run k8s tests locally,
  it is now WAY faster (several minutes) to iterate with your changes
  because you do not have to rebuild the base image - the only thing
  needed is to copy sources to the PROD image to "/opt/airflow" which
  is where editable installlation is done from. You only need to
  rebuild the image if dependencies change.

* By default `uv` is used for local source build for k8s tests so
  even if you have to rebuild it, it is way faster (60%-80%) during
  iterating with the image.

CI/DEV tooling improvements:

* this PR switches to use `uv` by default for most prod images we
  build in CI, but it adds a check if the image still builds with `pip`.

* we also switch to more PEP standard way of installing packages
  from local filesystem (package-name @ file:///FILE)

Fixes: #37785
Fixes: #37815

Update contributing-docs/testing/k8s_tests.rst

Co-authored-by: Niko Oliveira <[email protected]>

Update contributing-docs/testing/k8s_tests.rst

Co-authored-by: Niko Oliveira <[email protected]>

Update docs/docker-stack/build.rst

Co-authored-by: Niko Oliveira <[email protected]>

Update docs/docker-stack/build.rst

Co-authored-by: Niko Oliveira <[email protected]>

Update docs/docker-stack/build.rst

Co-authored-by: Niko Oliveira <[email protected]>

Update docs/docker-stack/build.rst

Co-authored-by: Niko Oliveira <[email protected]>

Update docs/docker-stack/build.rst

Co-authored-by: Niko Oliveira <[email protected]>

Update scripts/docker/install_airflow.sh

Co-authored-by: Niko Oliveira <[email protected]>

Update docs/docker-stack/changelog.rst

Co-authored-by: Niko Oliveira <[email protected]>

Update docs/docker-stack/build.rst

Co-authored-by: Niko Oliveira <[email protected]>
@potiuk potiuk force-pushed the use-uv-for-prod-image branch from 5390295 to dd6753f Compare March 5, 2024 23:25
@potiuk
Copy link
Member Author

potiuk commented Mar 6, 2024

Failing tests fixed in #37923

@potiuk potiuk merged commit a1717a6 into main Mar 6, 2024
63 of 72 checks passed
@potiuk
Copy link
Member Author

potiuk commented Mar 6, 2024

cc: @jedcunningham - PROD image build with breeze should now contain airflow + latest providers.

@jedcunningham jedcunningham deleted the use-uv-for-prod-image branch March 6, 2024 01:23
@utkarsharma2 utkarsharma2 added this to the Airflow 2.8.3 milestone Mar 6, 2024
@utkarsharma2 utkarsharma2 added the type:misc/internal Changelog: Misc changes that should appear in change log label Mar 6, 2024
@ephraimbuddy ephraimbuddy added changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) and removed type:misc/internal Changelog: Misc changes that should appear in change log labels Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools area:production-image Production image improvements and fixes changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) default versions only When assigned to PR - only default python version is used for CI tests kind:documentation
Projects
None yet
6 participants