Build rust code only once per platform in a CI run #7047

benjyw · 2019-01-09T04:33:55Z

Creates greater symmetry between Linux and OSX.
Each platform has a bootstrap stage job that builds the native engine, a single-platform pants.pex and fs_util.
The jobs in the test stage use the prebuilt pants.pex for the relevant platform (and/or the native_engine.so embedded in it).
The Travis cache config has been refactored so that the bootstrap jobs only cache dirs related to building rust code, and test jobs only cache dirs related to running pants (e.g., the ivy2 cache).
Removes the -k option from ci.sh. It was never used in practice (e.g., ccdf523 inverted the sense of the flag, but didn't change what it actually does, which seems like ample evidence that it isn't needed).
Adds a -f option to release.sh, to build fs_util. Now a dry-run release no longer automatically builds it. This allows us to build it only in the bootstrap stage.
Ensures that each shard has a separate Travis cache.
The .travis.yml changes utilize a little-known but very useful feature of mustache, where if a partial is invoked on a line by itself then on substitution, every line in it receives the indentation of that line.

benjyw · 2019-01-09T04:36:54Z

Reviewers: please ensure that the logic is sound, particularly around the rust builds. It probably makes sense to examine the CI run for this PR, to double-check that it does what we think it does.

benjyw · 2019-01-09T04:40:44Z

Note that there is probably some beneficial fine-tuning of the Travis caches that we can do in a followup change. Right now I erred on the side of caching less than before, because our caches were huge, and the benefit of those is questionable.

illicitonion

This looks great, thanks! Particularly the cache splitting and shrinking. As you note, I suspect we'll want to add some more things back to the caches, but we should consider that on a case by case basis :)

(In particular, I've been experimenting with avoiding the bootstrap shard re-compiling rust when rust code hasn't changed; I've got it mostly working in #7044 - just one more kink to work out :))

illicitonion · 2019-01-09T09:13:25Z

.travis.yml

+          -v "${HOME}:/travis/home"
+          -v "${TRAVIS_BUILD_DIR}:/travis/workdir"
+          travis_ci:latest
+          sh -c "git clean -xfd && ./build-support/bin/ci.sh && ./build-support/bin/release.sh -f"


Why do we git clean -xfd here and on the next shard?

I'm not sure. It was there from before. Probably unnecessary, since we work on a clean clone?

Yeah, let's drop it unless we can justify it; it can only confuse things...

illicitonion · 2019-01-09T09:14:35Z

.travis.yml

+      env:
+        - CACHE_NAME=macos10.12sanity
+      script:
+        - MODE=debug ./build-support/bin/travis-ci.sh -bm


Follow up TODO: swap -b from "don't bootstrap" to "do bootstrap" so it's only present for the bootstrap shards (I'll happily do it, but just noting)

If and when you do, note that the bootstrapping, and ONLY the bootstrapping, should run before we set RUN_PANTS_FROM_PEX=1. Any other logic must follow this line, or else that logic will likely end up bootstrapping again.

illicitonion · 2019-01-09T09:16:22Z

.travis.yml

@@ -221,325 +259,359 @@ matrix:
          repo: pantsbuild/pants

    # Deploy Pex Unstable to s3
-    - name: "Deploy Pants PEX Unstable"
+    - name: "Deploy unstable multiplatform pants.pex"


Idle question for the future (definitely not for now): Would it make sense to merge the stable and unstable pex publish jobs? The difference between the two pexes is basically a handful of strings and hashes, right?

I don't actually know. I was out of the loop when this was originally implemented.

illicitonion · 2019-01-09T09:29:06Z

.travis.yml

-      name: "Rust Clippy on Linux"
+      name: "Linux Rust Clippy"
+      env:
+        - CACHE_NAME=linuxclippy


This job should use the native_engine_cache_config (and maybe share it with the linux bootstrap shard)

Per the travis documentation, sharing caches between jobs is a no-no. Will add the cache config though.

Is this job recompiling a lot of rust code that the bootstrap job already compiled? Would it make sense to move the clippy run to there?

It's re-compiling all of it, but in a way which is different (both cheaper and more expensive) than the bootstrap job. It's doing compiler-integrated linting, which adds extra instrumentation (more expensive), but also doesn't actually do full codegen or linking (cheaper). It probably makes sense to merge it with the rust test running shard, actually...

illicitonion · 2019-01-09T09:29:58Z

.travis.yml

-      name: "Rust Tests Linux"
+      name: "Linux Rust tests"
+      env:
+        - CACHE_NAME=linuxrusttests


This job should use the native_engine_cache_config (and maybe share it with the linux bootstrap shard)

See above. Again, is this recompiling unnecessarily? Is it better to move these tests into the bootstrap shard. Or - is it possible to propagate the bootstrapped rust build state via S3 (like we do for the pex) so that this job isn't starting from a clean rust workspace?

These tests take a non-trivial amount of time, so we probably don't want to merge them into the bootstrap shard...

The build state we'd want to share is basically the entire cache (~/.cache/pants/rust + src/rust/engine/target), but it's probably noticeably simpler for us to just re-do the work in this shard than to do custom cross-shard caching...

How much time might it save to run these on that build state, rather than on clean state? I mean, the whole thing is ~12 minutes from clean right now... And clippy is about the same, AFAICT. And that's without the native_engine_cache_config. So you're probably right that it's not worth worrying about...

illicitonion · 2019-01-09T09:32:15Z

.travis.yml

-    - name: "Rust + Platform-specific Tests OSX"
-      os: osx
+    - <<: *default_osx_test_config
+      name: "OSX Rust + platform-specific tests"


We should split this shard back into two; one for rust tests (ci.sh -e) which uses the rust cache config, and possibly shares its cache with the bootstrap shard, and one which runs platform-specific tests (ci.sh -z) and uses the run cache config.

The shards were merged because the cost of bootstrapping rust was high, but now that that's ameliorated, we should split them so that they can benefit from more appropriate caching.

On this note, should we split Self checks, lint, and JVM tests into two shards? Self checks and lint seem like one logical shard, and JVM tests another logical shard. Will make it easier for us to fail more quickly if there's a linting error.

Makes sense, but I'd rather do that in a separate change, since this one is already hefty.

Also, as above, we can't share Travis caches, but we can maybe share the built rust state via S3? I don't know enough about what that state is in order to do so correctly, but either way, let's address in a followup?

illicitonion · 2019-01-09T09:44:43Z

.travis.yml

+    # TODO: Figure out why we have such large caches (2-7GB) and try to trim them.
+    timeout: 500
+    directories:
+      - ${HOME}/.cache/pants/rust/cargo


We could shave 100M off this dir by replacing this with:

- ${HOME}/.cache/pants/rust/cargo/.crates.toml - ${HOME}/.cache/pants/rust/cargo/git - ${HOME}/.cache/pants/rust/cargo/registry - ${HOME}/.cache/pants/rust/cargo/bin/cargo-ensure-installed - ${HOME}/.cache/pants/rust/cargo/bin/cargo-ensure-prefix

at the expense of:

Hard coding some internals of cargo that may change

Hard-coding a list of things we cargo install (so that if we add more in the future, we'd need to update this list, or they'd be re-compiled)

I'm not sure what the cost of 100M here is, so can't really comment on the trade-off

I think is safe to cache ${HOME}/.cache/pants/python_cache, which has subfolders for interpreters and for requirements. This will allow us to avoid redownloading Python requirements every time, as we cache the wheels. Same with interpreters, as it will cache wheel, setuptools, and the python executable.

The two situtations where the cache would be invalid are:

modified requirements.txt. Our cache code will know that the cache is invalidated, though, so this is fine.

changing the name of the venv folder, as done in Allow Pants to run with Python 3 via ./pants3 script #6959. Here, we can manually delete the cache, get the PR merged, and then the new venv folder names will be cached correctly.

Edit: after inspecting Travis, I think it's okay and possibly preferable to keep these out of the cache, as the only two shards that will benefit are the native engine builders.

Makes sense, but again - let's experiment with this in a followup change whose only purpose is further cache tweaking? This was a tough change, and I'm leery of perturbing it any further when it's already a win as-is.

Eric-Arellano

Crazy to see how much faster CI is now. Thanks Benjy! Cool how PEX makes this all possible.

Eric-Arellano · 2019-01-09T10:36:14Z

.travis.yml

+    # TODO: Figure out why we have such large caches (2-7GB) and try to trim them.
+    timeout: 500
+    directories:
+      - ${HOME}/.cache/pants/rust/cargo


I think is safe to cache ${HOME}/.cache/pants/python_cache, which has subfolders for interpreters and for requirements. This will allow us to avoid redownloading Python requirements every time, as we cache the wheels. Same with interpreters, as it will cache wheel, setuptools, and the python executable.

The two situtations where the cache would be invalid are:

modified requirements.txt. Our cache code will know that the cache is invalidated, though, so this is fine.

changing the name of the venv folder, as done in Allow Pants to run with Python 3 via ./pants3 script #6959. Here, we can manually delete the cache, get the PR merged, and then the new venv folder names will be cached correctly.

Edit: after inspecting Travis, I think it's okay and possibly preferable to keep these out of the cache, as the only two shards that will benefit are the native engine builders.

.travis.yml

Eric-Arellano · 2019-01-09T10:41:38Z

.travis.yml

+      services:
+        - docker
+      env:
+        - CACHE_NAME=linuxpexbuild


Nit: linux_pex_build is easier on the eye

The python interpreter was recently unsafe to cache because the call site makes a symlink. Practically fine if you use the system /usr/bin/python or something, but breaks for us since we bootstrap our interpreter from a virtualenv.

I haven't looked to see if that behavior has changed in the last couple months, but I ended up moving the interpreter cache under .pants.d. That allows me to continue caching all of XDG_CACHE_HOME/pants.

Since this is for Pants CI, caching is likely fine, but FYI

Eric-Arellano · 2019-01-09T10:43:42Z

.travis.yml

      # We request the oldest image we can (corresponding to OSX 10.11) for maximum compatibility.
      # We use 10.11 as a minimum to avoid https://github.com/rust-lang/regex/issues/489.
      # See: https://docs.travis-ci.com/user/reference/osx/#OS-X-Version
      osx_image: xcode8
+      env:
+        - CACHE_NAME=macospexbuild


Nit: macos_pex_build

Eric-Arellano · 2019-01-09T10:47:42Z

.travis.yml

-    - name: "Rust + Platform-specific Tests OSX"
-      os: osx
+    - <<: *default_osx_test_config
+      name: "OSX Rust + platform-specific tests"


On this note, should we split Self checks, lint, and JVM tests into two shards? Self checks and lint seem like one logical shard, and JVM tests another logical shard. Will make it easier for us to fail more quickly if there's a linting error.

.travis.yml

build-support/bin/release.sh

benjyw · 2019-01-09T14:22:15Z

I've handled the comments that are easily addressable here, but I'd prefer to punt on some of the more intricate ones (splitting jobs, tweaking the cache dirs, possibly propagating rust build state from job to job) to a followup change, as this one was already several days in the making, and I'd like to checkpoint the progress so far.

illicitonion · 2019-01-09T14:25:55Z

Awesome, let's merge! :D Thanks!

Eric-Arellano · 2019-01-09T14:30:52Z

Agreed on breaking out the proposed changes in separate PRs. This is a good scope and looks good to merge!

I would still prefer renaming all the cache names to use snake_case as it's more readable than nospacing and it mirrors Python conventions.

illicitonion · 2019-01-09T14:57:42Z

Shard 34 seems to have appeared, called just "Ruby", and I can't work out what it is from the config... Any ideas?

src/python/pants/releases/packages.py

build-support/travis/travis.yml.mustache

Prior to #7047 and #7012, a significant portion of CI was spent on repeatedly bootstrapping Pants. So, we combined some jobs that did not have much logical connection in order to save CI time. In general, it is acceptable to combine unrelated jobs into the same shard when there are benefits, as it does still avoid some shard startup cost. Here, however, we get some benefits from splitting these two shards into four. * "Lint, self-checks, and JVM tests" -> "Lint and self checks" + "JVM tests" - Lint fails more quickly. This is one of the shards most likely to fail, so we want to surface the failure quickly. Meanwhile, we do not often modify JVM code, so JVM tests are far less likely to fail from the average PR. - Allows us to move JVM tests later in the CI run, as again they are less likely to fail from the average PR. * "OSX Rust + platform-specific tests" -> "OSX Rust tests" + "OSX platform specific tests" - Allows us to mark "OSX Rust tests" as no pex, which saves some CI time by not installing `aws-cli` and simplifies `.travis.yml` to not need `py2` and `py3` variants of the shard. - Allows us to extract a common `base_rust_tests` config between `linux_rust_tests` and `osx_rust_tests`. - Thanks to the extracted `base_rust_tests`, we now setup `native_engine_cache` for the `osx_rust_tests` shard, which we did not have before. This caches the rust bootstrap.

Thanks to the changes made in pantsbuild#7047 and pantsbuild#7012, we went from bootstrapping in nearly every shard to only bootstrapping in the first 4 shards. So, we flip the meaning to be a positive rather than negative flag.

Thanks to the changes made in #7047 and #7012, we went from bootstrapping in nearly every shard to only bootstrapping in the first 4 shards. So, we flip the meaning to be a positive rather than negative flag.

benjyw requested review from illicitonion, jsirois, cosmicexplorer, Eric-Arellano and stuhood January 9, 2019 04:34

Make CI more efficient.

01cd87d

benjyw force-pushed the ci_consolidation2 branch from 5fcd0f1 to 01cd87d Compare January 9, 2019 04:35

illicitonion approved these changes Jan 9, 2019

View reviewed changes

Eric-Arellano approved these changes Jan 9, 2019

View reviewed changes

benjyw added 2 commits January 9, 2019 09:22

Address code review comments.

7aff577

Remove git clean fdx.

697f8ac

jsirois approved these changes Jan 9, 2019

View reviewed changes

src/python/pants/releases/packages.py Outdated Show resolved Hide resolved

build-support/travis/travis.yml.mustache Outdated Show resolved Hide resolved

build-support/travis/travis.yml.mustache Outdated Show resolved Hide resolved

benjyw added 2 commits January 9, 2019 10:49

WIP

0da572f

WIP

428ee9a

benjyw merged commit 5acaf84 into pantsbuild:master Jan 10, 2019

benjyw deleted the ci_consolidation2 branch January 10, 2019 17:20

This was referenced Jan 16, 2019

Split out unrelated CI jobs into their own distinct shards #7090

Merged

Cache ./pants bootstrapping in CI using stages #6016

Closed

Eric-Arellano mentioned this pull request Jan 17, 2019

Make ci.sh's -b bootstrap flag a positive flag #7096

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Build rust code only once per platform in a CI run #7047

Build rust code only once per platform in a CI run #7047

benjyw commented Jan 9, 2019

benjyw commented Jan 9, 2019

benjyw commented Jan 9, 2019

illicitonion left a comment

illicitonion Jan 9, 2019

benjyw Jan 9, 2019

illicitonion Jan 9, 2019

illicitonion Jan 9, 2019

benjyw Jan 9, 2019

illicitonion Jan 9, 2019

benjyw Jan 9, 2019

illicitonion Jan 9, 2019

benjyw Jan 9, 2019

illicitonion Jan 9, 2019

illicitonion Jan 9, 2019

benjyw Jan 9, 2019

illicitonion Jan 9, 2019

benjyw Jan 9, 2019

illicitonion Jan 9, 2019

Eric-Arellano Jan 9, 2019

benjyw Jan 9, 2019

illicitonion Jan 9, 2019

Eric-Arellano Jan 9, 2019

benjyw Jan 9, 2019

Eric-Arellano left a comment

Eric-Arellano Jan 9, 2019

Eric-Arellano Jan 9, 2019

mateor Jan 9, 2019

Eric-Arellano Jan 9, 2019

Eric-Arellano Jan 9, 2019

benjyw commented Jan 9, 2019

illicitonion commented Jan 9, 2019

Eric-Arellano commented Jan 9, 2019

illicitonion commented Jan 9, 2019

Build rust code only once per platform in a CI run #7047

Build rust code only once per platform in a CI run #7047

Conversation

benjyw commented Jan 9, 2019

benjyw commented Jan 9, 2019

benjyw commented Jan 9, 2019

illicitonion left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Eric-Arellano left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

benjyw commented Jan 9, 2019

illicitonion commented Jan 9, 2019

Eric-Arellano commented Jan 9, 2019

illicitonion commented Jan 9, 2019