
Retry the cleanup of downloadAndExtract #24295

Closed

Conversation

sthornington
Contributor

This helps the HTTP downloader better cope with filesystems where unlink is non-atomic.

@github-actions github-actions bot added team-ExternalDeps External dependency handling, remote repositories, WORKSPACE file. awaiting-review PR is awaiting review from an assigned reviewer labels Nov 12, 2024
@sthornington
Contributor Author

This fixes #23687 and maybe #20013. It would be wonderful if we could get this back-ported to 7.x as well, since it fixes a pretty big problem for us.

@sthornington sthornington changed the title retry the cleanup of downloadAndExtract Retry the cleanup of downloadAndExtract Nov 12, 2024
downloadDirectory.deleteTree();
// Retry the deleteTree for a while, if necessary, to cope with filesystems where unlinks are not atomic
Instant start = Instant.now();
Instant deadline = start.plus(Duration.ofSeconds(5));
Contributor Author

An easy objection is that 5 seconds is a long time, and I am not doing any sleeping or backoff in this loop. My reasoning is that previously this was an immediate abort of the entire build, so the time or CPU spent retrying something that was previously completely unhandled is a bit moot. In a few tests in our environment, 90% of these cases resolved in ~100ms, but occasionally the file deletes were not reflected for over 500ms, so I figured "better safe than sorry", and I didn't want to make this diff any more complicated than it already was.
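The retry described above can be sketched as follows. This is a minimal, self-contained illustration of the deadline-based loop with no sleep or backoff; `TreeDeleter` and `deleteTreeWithRetries` are hypothetical stand-ins for this sketch, not Bazel's actual Path API:

```java
import java.io.IOException;
import java.time.Duration;
import java.time.Instant;

public class DeleteTreeRetry {
  /** Hypothetical stand-in for the object whose deleteTree may transiently fail. */
  @FunctionalInterface
  interface TreeDeleter {
    void deleteTree() throws IOException;
  }

  /**
   * Keeps retrying deleteTree until it succeeds or a 5-second deadline passes.
   * There is deliberately no sleep or backoff: the alternative behavior was an
   * immediate abort of the whole build, so busy retrying is considered acceptable.
   */
  static void deleteTreeWithRetries(TreeDeleter dir) throws IOException {
    Instant deadline = Instant.now().plus(Duration.ofSeconds(5));
    while (true) {
      try {
        dir.deleteTree();
        return; // the tree is gone, we are done
      } catch (IOException e) {
        if (Instant.now().isAfter(deadline)) {
          throw e; // deadline exceeded: surface the last failure
        }
        // Otherwise fall through and retry immediately.
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Simulate an NFS-like delete that only becomes visible on the third try.
    int[] attempts = {0};
    deleteTreeWithRetries(() -> {
      if (++attempts[0] < 3) {
        throw new IOException("unlink not yet reflected");
      }
    });
    System.out.println("succeeded on attempt " + attempts[0]);
  }
}
```

On a healthy filesystem the first `deleteTree` call succeeds and the loop exits immediately, so the common path pays no extra cost.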

Collaborator

Could you give an example of such a filesystem? What exactly does it mean for unlink to not be atomic?

Contributor Author
@sthornington Nov 12, 2024

Some distributed NFS appliances have this property, and they ack the delete before all the caches are updated. The symptom is that you rm a file, but it may not be absent in an immediately following directory listing.

Collaborator

Another thought: Could we download to a temporary directory outside the repo that we can clean up asynchronously?

Contributor Author

That would work too. I was actually surprised the download was done into output_dir, but I figured it was due to some hermeticity argument.

Contributor Author

Is there anything I can do to move this along? It solves a massive problem for us, and I am sure for others who use similar filesystems.

Member

is deleteTree the only call that needs this special treatment?

either way, I think I'd prefer if you could extract this special handling into a separate method (deleteTreeWithRetries or something), and add some documentation explaining the rationale (preferably also linking to the GH issue). That way, it's less implementation burden for us.

Contributor Author

It's not really me using this function; it's the dependency downloading of http_archive and all the crate downloading of rules_rust. But if what you mean is that I should simply wrap the retries into a function that downloadAndExtract uses instead, I am happy to do that right now.

Contributor Author
@sthornington Nov 14, 2024

For now I am adding it in the StarlarkBaseExternalContext since Path.java seems pretty closely aligned with the file system calls...

Member

thanks, this looks good to me.

It's not really me using this function

I understand that -- it's just that having the extra logic specific to this call of deleteTree invites the question of "why isn't this done elsewhere". Extracting the logic into a separate method makes it easier to apply the same retry logic to other call sites.

@sthornington
Contributor Author

I actually tested this change in release-7.4.1 but I was not sure which branch was best to submit the PR on. It doesn't look like this section has changed between 7.4.1 and the head.

@sthornington sthornington force-pushed the simon_httpretry_onmaster branch from ed999d4 to 67822d1 Compare November 14, 2024 21:43
@sthornington sthornington force-pushed the simon_httpretry_onmaster branch from 67822d1 to 14f697f Compare November 14, 2024 22:03
@Wyverald Wyverald added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Nov 15, 2024
@github-actions github-actions bot removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Nov 19, 2024
@sthornington sthornington deleted the simon_httpretry_onmaster branch December 2, 2024 19:43
@sthornington
Contributor Author

Is there anything I can do to get this into a 7.4.2?

@sthornington
Contributor Author


@Wyverald sorry to ping you - if I go through all the steps outlined in https://github.com/bazelbuild/continuous-integration/blob/master/docs/release-playbook.md, can I cherry-pick this to a 7.4.2 and release it? This checklist is so daunting that I assumed it was for Google employees.

@sthornington
Contributor Author

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Dec 10, 2024
@iancha1992
Member

@bazel-io fork 8.0.1

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Dec 10, 2024
bazel-io pushed a commit to bazel-io/bazel that referenced this pull request Dec 10, 2024
This helps the HTTP downloader better cope with filesystems where unlink is non-atomic.

Closes bazelbuild#24295.

PiperOrigin-RevId: 697920434
Change-Id: I91b4dbf07a2efdca07c0310e15aac5f4d89c4091
github-merge-queue bot pushed a commit that referenced this pull request Dec 11, 2024
This helps the HTTP downloader better cope with filesystems where unlink
is non-atomic.

Closes #24295.

PiperOrigin-RevId: 697920434
Change-Id: I91b4dbf07a2efdca07c0310e15aac5f4d89c4091

Commit
99a27f6

Co-authored-by: Simon Thornington <[email protected]>
@iancha1992
Member

@bazel-io fork 7.5.0

bazel-io pushed a commit to bazel-io/bazel that referenced this pull request Dec 12, 2024
This helps the HTTP downloader better cope with filesystems where unlink is non-atomic.

Closes bazelbuild#24295.

PiperOrigin-RevId: 697920434
Change-Id: I91b4dbf07a2efdca07c0310e15aac5f4d89c4091
iancha1992 pushed a commit that referenced this pull request Dec 16, 2024
This helps the HTTP downloader better cope with filesystems where unlink
is non-atomic.

Closes #24295.

PiperOrigin-RevId: 697920434
Change-Id: I91b4dbf07a2efdca07c0310e15aac5f4d89c4091

Commit
99a27f6

Co-authored-by: Simon Thornington <[email protected]>
ramil-bitrise pushed a commit to bitrise-io/bazel that referenced this pull request Dec 18, 2024
This helps the HTTP downloader better cope with filesystems where unlink is non-atomic.

Closes bazelbuild#24295.

PiperOrigin-RevId: 697920434
Change-Id: I91b4dbf07a2efdca07c0310e15aac5f4d89c4091
Labels
team-ExternalDeps External dependency handling, remote repositories, WORKSPACE file.