Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drone: use emptyDir for /var/lib/docker filesystem and prevent repetitive docker pulls #6145

Merged
merged 15 commits into from
Mar 30, 2021

Conversation

webvictim
Copy link
Contributor

@webvictim webvictim commented Mar 25, 2021

Explicitly mounts Docker-in-Docker's /var/lib/docker using an emptyDir, which will be backed by the worker's SSD filesystem. This will remove a layer of abstraction so we're not writing a Docker overlay filesystem inside a container which is itself running on an overlay filesystem.

Also adds a check when running in CI to see whether a Docker image with a matching name:tag is already present and avoid pulling if so. This prevents an edge case I introduced recently where a second image pull will reset the image digest after a changed build. The effect is described below.

Current pipeline:

  • make buildbox
    • triggers docker pull buildbox
    • triggers docker build buildbox
      • if cache matches: this is basically a no-op
      • if cache doesn't match: buildbox gets rebuilt
  • make lint
    • triggers docker pull buildbox again (which will change the digest back if docker build changed it)
    • triggers docker build buildbox again
      • if cache matches: no-op
      • if cache doesn't match: buildbox needlessly gets rebuilt again
  • make test (same thing happens)
  • make integration (same thing happens again)

New workflow with this PR (changes in bold):

  • make buildbox
    • triggers docker pull buildbox as there is no image with that name:tag locally
    • triggers docker build buildbox
      • if cache matches: this is basically a no-op
      • if cache doesn't match: buildbox gets rebuilt
  • make lint
    • no longer triggers docker pull buildbox again as there is a matching image locally
    • triggers docker build buildbox again
      • this will always be a no-op using cache
  • make test (uses local image)
  • make integration (uses local image)

@webvictim webvictim self-assigned this Mar 25, 2021
@webvictim webvictim force-pushed the gus/dind-filesystem branch from 7d7371e to b6371e7 Compare March 25, 2021 17:07
@webvictim webvictim force-pushed the gus/dind-filesystem branch from 07e76d0 to e40fac7 Compare March 25, 2021 19:26
@webvictim webvictim requested review from awly and a-palchikov March 25, 2021 20:10
@webvictim webvictim marked this pull request as ready for review March 25, 2021 20:10
@webvictim webvictim force-pushed the gus/dind-filesystem branch 2 times, most recently from 0cdebdf to 8993bef Compare March 25, 2021 20:23
@webvictim webvictim enabled auto-merge (squash) March 25, 2021 20:24
@webvictim webvictim changed the title drone: Use tmpfs for /var/lib/docker filesystem drone: use emptyDir for /var/lib/docker filesystem and prevent repetitive docker pulls Mar 25, 2021
@webvictim webvictim disabled auto-merge March 25, 2021 20:24
@webvictim webvictim enabled auto-merge (squash) March 25, 2021 20:25
@@ -109,7 +109,7 @@ buildbox:
.PHONY:buildbox-fips
buildbox-fips:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: is it possible to squash all these permutations of the buildbox with something like:

buildbox-fips: BUILDBOX="$(BUILDBOX_FIPS)"
buildbox-fips: BUILDBOX_NAME="$(BUILDBOX_FIPS_NAME)"
buildbox-fips: buildbox

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do this lots of different ways before I started doing it this way, but it doesn't seem possible without removing the buildbox target as a dependency of release (which would probably hamper people's ability to use this locally). Each buildbox has a different Dockerfile and slightly different build arguments too which makes it harder.

There's probably a much smarter way to do this that I'm missing.

@awly
Copy link
Contributor

awly commented Mar 25, 2021

I'm not seeing a huge difference in pipeline runtimes compared to a few open PRs I checked.
Will this change show up as faster pipelines or only as lower bandwidth usage?

@webvictim
Copy link
Contributor Author

It should be marginally faster. Using in-memory tmpfs would be faster still, but I'm not confident that we wouldn't exhaust the amount of worker RAM Kubernetes allows you to use for that (which is 50%)

I expect this to excel in situations where we have multiple test runs going at once - it should improve reliability and consistency.

I think some of the slowdowns we're seeing are also related to Drone committing test logs back to its database, which will need some downtime to fix.

@webvictim webvictim force-pushed the gus/dind-filesystem branch 3 times, most recently from 2bb93de to 05bf0f5 Compare March 30, 2021 16:57
Copy link
Contributor

@russjones russjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot.

@webvictim webvictim force-pushed the gus/dind-filesystem branch from e77143d to bc35df2 Compare March 30, 2021 21:19
@webvictim webvictim merged commit 71ef02f into master Mar 30, 2021
@webvictim webvictim deleted the gus/dind-filesystem branch March 30, 2021 21:32
pierrebeaucamp pushed a commit that referenced this pull request Mar 31, 2021
…e/dynamodb-gsi-autoscaling

* 'master' of github.com:gravitational/teleport: (41 commits)
  Refactor ssh.ClientConfig used by tctl and API clients to use the first valid principal as User.
  Update Architecture Overview With Link To User Roles (#6224)
  Add `lint-api` target and fix lint errors (#6169)
  ssh: fix relogin with jumphosts (#6213)
  drone: use emptyDir for /var/lib/docker filesystem and prevent repetitive docker pulls (#6145)
  Remove ARM64 FIPS builds (#6236)
  tsh Profile SSH certs fix (#6214)
  mfa: fix gRPC unimplemented check in cert reissue
  Open Sources Access Controls Docs (#6188) (#6217)
  add PAM environment with interpolation support
  Cache per-cluster SSH certificates under ~/.tsh (#5938)
  add special resource type for access plugin data
  Enable DynamoDB autoscaling on global secondary indices (#6112)
  darwin fips builds (#5866)
  kube: add kubernetes_labels to role JSON schema
  mfa: send username instead of SSH login name in MFA cert request
  fix nil slice bug
  RFD 16: Add a section on `tctl rm` resetting resources back to defaults (#5673)
  Update application access docs (#6055) (#6137)
  Bump linux FIPS builds to use go1.16.2b7 release (#6143)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants