Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove layer caching in favor of just running cargo #41

Merged
merged 4 commits into from
Apr 21, 2021

Conversation

dmikusa
Copy link
Contributor

@dmikusa dmikusa commented Apr 18, 2021

  • Remove buildpack's layer caching. Caching for the layer could be difficult, so removing it just leaves things up to Cargo which is the most accurate
  • Creates two folders under the cargo cache layer, target/ and home/. The former is where build files go while the latter is where cargo cached downloads, like from crates.io go.
  • At the moment, a layer cached by the lifecycle does not preserve file mtimes. They are squashed in the name of reproducible builds. To get around this, the buildpack will preserve the mtimes of everything install, after cargo runs & then restore them the next time before cargo runs. This keeps consistent file mtimes, which is necessary for cargo to work properly.
  • Removes unnecessary directories from cargo's home directory, based on https://doc.rust-lang.org/cargo/guide/cargo-home.html#caching-the-cargo-home-in-ci, which saves space on the cache layer.

Resolves: #24 and #25

- Remove buildpack's layer caching. Caching for the layer could be difficult, so removing it just leaves things up to Cargo which is the most accurate
- Creates two folders under the cargo cache layer, `target/` and `home/`. The former is where build files go while the latter is where `cargo` cached downloads, like from crates.io go.
- At the moment, a layer cached by the lifecycle does not preserve file mtimes. They are squashed in the name of reproducible builds. To get around this, the buildpack will preserve the mtimes of everything install, after cargo runs & then restore them the next time before cargo runs. This keeps consistent file mtimes, which is necessary for cargo to work properly.
- Removes unnecessary directories from cargo's home directory, based on https://doc.rust-lang.org/cargo/guide/cargo-home.html#caching-the-cargo-home-in-ci, which saves space on the cache layer.

Resolves: #24 and #25
@dmikusa dmikusa requested a review from a team as a code owner April 18, 2021 04:50
…restoring if the mtimes.json file doesn't exist, it won't the first time we run.
@ForestEckhardt
Copy link
Contributor

I am struggling to understand what the mtimes Preserver and Restorer do and more so how what they do helps in a rebuild. If you could give me some pointers or clarity it would be much appreciated!

@dmikusa
Copy link
Contributor Author

dmikusa commented Apr 20, 2021

I am struggling to understand what the mtimes Preserver and Restorer do and more so how what they do helps in a rebuild. If you could give me some pointers or clarity it would be much appreciated!

Sure. So by default, the lifecycle doesn't not keep file modification times on a cached layer. It squashes everything to one standard date in the name of reproducible builds. This does not play well with cargo. When cargo runs, it uses the file modification times to make determinations about what needs to build.

When I removed the layer caching, which was broken, that exposed this issue. Essentially what happens is that cargo install runs the first time and is OK. It builds everything. Then on a subsequent build, it will decide to build a random assortment of stuff. This is because basically all of the files have the same modification date, so it breaks cargo's algorithm to determine what gets rebuilt. At first, I thought this was just impacting the speed of rebuilds, but it also can cause things that need to be built to not be built, which is bad. It is basically totally broken.

I opened an issue with the lifecycle team about this: buildpacks/lifecycle#580 Because it seems odd that a layer only being used for cached/built time data should have the file modification times squashed. That layer doesn't impact the launch image, so it's not really impacting reproducible builds.

Anyway, to work around this in the meantime, the best thing I could come up with is to preserve & restore the file modification times. So prior to running cargo, the buildpack will look for a json file & read the contents. It is a list of files and the modification times. It restores all of those times. Then after cargo runs, it generates a new file with the new list of files and their modification times.

In my testing, this worked very well. It added little overhead and it resulted in correct builds with cargo. Files I modified would get rebuilt and files not changed would not be rebuilt.

@dmikusa
Copy link
Contributor Author

dmikusa commented Apr 20, 2021

PS. there's one minor issue with the tests on this PR. I need to fix the time zones so they are consistent. It works locally because they are set for EST, but when it runs in CI they are UTC, so it fails. I'm hoping to do that soon, but it should be done before we merge.

@ForestEckhardt
Copy link
Contributor

Alright, that tracks from what I saw then thanks! I think that the work around sound reasonable. I will happily merge this once the unit tests are passing on this PR.

@dmikusa
Copy link
Contributor Author

dmikusa commented Apr 21, 2021

👍 tests are switched to UTC & all passing now

@ForestEckhardt ForestEckhardt merged commit e919823 into main Apr 21, 2021
@ForestEckhardt ForestEckhardt deleted the gh_issue_24 branch April 21, 2021 13:39
@dmikusa dmikusa mentioned this pull request Apr 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate if caching actually helps & how much
2 participants