Optimize CI/PROD image waiting and verification in CI workflow #35856

Merged
merged 1 commit into from
Nov 26, 2023


potiuk
Member

@potiuk potiuk commented Nov 25, 2023

Currently both the "wait-for-ci-images" and "preview-constraints" jobs wait for images to be built, which means that each of them takes a running worker slot (public runner) just to wait while the image is being built. Also, the "verify-image" job is run as part of "wait-for-image", which adds a delay between the images being downloaded and the dependent jobs starting.

This PR optimizes it quite a bit:

  • The preview-constraints job now depends on "wait-for-ci-images". This means that only one slot will be busy while waiting for images.

  • Both the CI and PROD verify-image commands in breeze got the --run-in-parallel set of flags, which allow the verification to happen for all images in parallel.

  • Image verification is added as a separate step in jobs that already need to pull the images to do other work. For the CI image it is "Preview constraints" and for the PROD image it is "Test Docker compose job". Because verification no longer runs as part of the "wait for image" jobs, the other jobs can start faster, and a failure in image verification does not block other tests from running.

  • In the "in-workflow-build" case, "wait-for-ci-images" does not have to be run at all, because there it depends on the in-workflow build-ci-images job: if that job completes, we know the image is already built and we do not have to wait for it separately. So far we had to run it in order to add the --verify flag to verify the images. With a separate job we can run it in parallel to all the other waiting jobs.

  • The names of jobs and the dependencies between them are also updated, along with the CI documentation and its diagrams of how CI workflows work. The diagrams are cleaned up, verified, and updated. The separate diagram for the scheduled build has been removed, as it was essentially the same as the "canary build". A paragraph describing every type of workflow was added to give the diagrams more context.
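
The parallel verification mentioned in the list above can be sketched in Python. This is only an illustration of the idea of checking several images concurrently and reporting each result separately. It is not how breeze implements --run-in-parallel; the function names, the image names, and the `fake_verify` check are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def verify_images_in_parallel(
    image_names: Iterable[str],
    verify_one: Callable[[str], None],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run verify_one for every image concurrently and collect a per-image status.

    verify_one is a hypothetical callback that raises on failure; it stands in
    for whatever check is run against a single image.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all checks first so they run side by side.
        futures = {name: pool.submit(verify_one, name) for name in image_names}
        for name, future in futures.items():
            error = future.exception()  # blocks until this particular check is done
            results[name] = "ok" if error is None else f"failed: {error}"
    return results


def fake_verify(image_name: str) -> None:
    # Stand-in check for illustration only: pretend the 3.9 image is broken.
    if image_name.endswith("3.9"):
        raise RuntimeError("import check failed")


if __name__ == "__main__":
    statuses = verify_images_in_parallel(
        ["ci-python3.8", "ci-python3.9", "ci-python3.10"], fake_verify
    )
    for name, status in statuses.items():
        print(name, status)
```

One failing image does not stop the remaining checks, which mirrors the goal of not letting a verification failure block unrelated work.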



@boring-cyborg boring-cyborg bot added area:dev-tools area:production-image Production image improvements and fixes labels Nov 25, 2023
@potiuk potiuk force-pushed the optimize-image-wait-verify branch 5 times, most recently from 5e93483 to fb0a155 Compare November 26, 2023 00:16
@potiuk potiuk marked this pull request as ready for review November 26, 2023 00:16
@potiuk
Member Author

potiuk commented Nov 26, 2023

The diagrams at https://github.com/apache/airflow/blob/optimize-image-wait-verify/CI_DIAGRAMS.md and the description at https://github.com/apache/airflow/blob/optimize-image-wait-verify/CI.rst are up-to-date and describe the proposed changes to the CI workflow very accurately, so if someone would like to dive into the details, I recommend them as an explanation.
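
As a rough companion to those diagrams, the dependency change can be modelled as a small graph in Python. This is a simplified sketch, not the real workflow definition: the job names "wait-for-ci-images" and "preview-constraints" come from the PR description, "tests" is a placeholder for any downstream test job, and the model assumes every job in the graph needs the image.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it depends on ("needs").
BEFORE = {
    "wait-for-ci-images": set(),
    "preview-constraints": set(),  # waited for the image on its own
    "tests": {"wait-for-ci-images"},
}
AFTER = {
    "wait-for-ci-images": set(),
    "preview-constraints": {"wait-for-ci-images"},  # now chained behind the waiter
    "tests": {"wait-for-ci-images"},
}


def jobs_holding_a_slot_while_waiting(needs):
    """Jobs with no upstream dependency start immediately and occupy a runner
    until the image shows up; chained jobs are not scheduled until then."""
    return sorted(job for job, upstream in needs.items() if not upstream)


def start_order(needs):
    """Return one valid order in which the jobs can start."""
    return list(TopologicalSorter(needs).static_order())


if __name__ == "__main__":
    print("before:", jobs_holding_a_slot_while_waiting(BEFORE))
    print("after: ", jobs_holding_a_slot_while_waiting(AFTER))
    print("order: ", start_order(AFTER))
```

In this model two jobs hold a runner while waiting before the change and only one after it, which is the saving the PR description refers to.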

@potiuk potiuk force-pushed the optimize-image-wait-verify branch 11 times, most recently from 102e106 to 9f206c2 Compare November 26, 2023 01:02
@potiuk potiuk force-pushed the optimize-image-wait-verify branch 4 times, most recently from 4b63d7c to 70d94de Compare November 26, 2023 02:29
@potiuk potiuk force-pushed the optimize-image-wait-verify branch from 70d94de to 7b85629 Compare November 26, 2023 02:30
@potiuk potiuk merged commit d368488 into main Nov 26, 2023
potiuk added a commit to potiuk/airflow that referenced this pull request Nov 26, 2023
Small follow-up after apache#35856 - cache building uses constraints
so rather than describing it, we add arrow-dependency
potiuk added a commit that referenced this pull request Nov 26, 2023
potiuk added a commit to potiuk/airflow that referenced this pull request Dec 7, 2023
The change apache#35856 optimized the waiting time before PROD image
builds start - rather than waiting for full constraints generation,
the PROD image build just used the source constraints generated quickly,
right after building the CI image. This is fine for main because there
we install airflow and packages using constraints from sources, but
for release branches we use the provider constraints - in order
to be able to install providers from PyPI rather than from sources.

This means that on release branches we have to wait for constraints
generation to complete before we start building PROD images - because
we need to download the constraints generated there to use them.

Unfortunately GitHub Actions does not support conditional dependencies
based on where the workflow is run - so instead we have to
effectively duplicate the PROD build steps and skip steps in them instead.
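
The branch-dependent behaviour described above can be sketched as a small decision function. This is a sketch under stated assumptions, not the workflow's actual logic: the job names `build-prod-images-early` and `build-prod-images-after-constraints`, the labels "source" and "provider", and the rule that every branch other than main is a release branch are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ProdBuildPlan:
    constraints_kind: str                  # "source" or "provider" (illustrative labels)
    wait_for_constraints_generation: bool  # must the PROD build wait for that job?


def plan_prod_build(branch: str) -> ProdBuildPlan:
    """Pick the constraints a PROD build uses; any branch other than main is
    treated as a release branch here, which is a simplifying assumption."""
    if branch == "main":
        # Source constraints are available right after the CI image is built.
        return ProdBuildPlan("source", wait_for_constraints_generation=False)
    # Release branches need the provider constraints from the generation job.
    return ProdBuildPlan("provider", wait_for_constraints_generation=True)


def prod_jobs_to_run(branch: str) -> Dict[str, bool]:
    """Model the 'duplicate the job and skip one copy' workaround: both copies
    exist in the workflow, and exactly one of them does real work."""
    plan = plan_prod_build(branch)
    return {
        "build-prod-images-early": not plan.wait_for_constraints_generation,
        "build-prod-images-after-constraints": plan.wait_for_constraints_generation,
    }


if __name__ == "__main__":
    for branch in ("main", "v2-8-test"):
        print(branch, plan_prod_build(branch), prod_jobs_to_run(branch))
```

Exactly one of the two duplicated jobs does real work for any given branch, which is the effect the skipped steps are meant to achieve.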
potiuk added a commit that referenced this pull request Dec 7, 2023
ephraimbuddy pushed a commit that referenced this pull request Dec 7, 2023
@Taragolis Taragolis deleted the optimize-image-wait-verify branch December 27, 2023 10:33
Labels
area:dev-tools area:production-image Production image improvements and fixes
3 participants