Retry the cleanup of downloadAndExtract #24295
Conversation
downloadDirectory.deleteTree();
// Retry the deleteTree a while, if necessary, to cope with filesystems where unlinks are not atomic
Instant start = Instant.now();
Instant deadline = start.plus(Duration.ofSeconds(5));
An easy objection is that 5 seconds is a long time and I am not doing any sleeping or backoff in this loop, but my reasoning is: previously, this would be an immediate abort of the entire build, so the length of time or CPU usage of retrying something that was previously completely unhandled is a bit moot. In a few tests in our environment, 90% of these circumstances are resolved in ~100ms, but occasionally the file deletes are not reflected for over 500ms, so I figured "better safe than sorry", and I didn't want to make this diff any more complicated than it already was.
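For concreteness, a minimal sketch of the loop shape being described, assuming Bazel's vfs `Path` (whose `deleteTree()` throws `IOException` on failure); the method name and structure here are illustrative, not the exact patch:

```java
import com.google.devtools.build.lib.vfs.Path;
import java.io.IOException;
import java.time.Duration;
import java.time.Instant;

/**
 * Deletes the tree rooted at {@code dir}, retrying for up to 5 seconds to
 * cope with filesystems where unlinks are not atomic. No sleep or backoff:
 * most retries resolve in ~100ms, and a failure here previously aborted
 * the entire build, so busy-retrying is still a strict improvement.
 */
static void deleteTreeWithRetries(Path dir) throws IOException {
  Instant deadline = Instant.now().plus(Duration.ofSeconds(5));
  while (true) {
    try {
      dir.deleteTree();
      return; // success: the tree is gone
    } catch (IOException e) {
      if (Instant.now().isAfter(deadline)) {
        throw e; // still failing after the deadline; surface the error
      }
      // Otherwise retry immediately; stale entries usually clear quickly.
    }
  }
}
```

With a helper shaped like this, the cleanup in `downloadAndExtract` would reduce to a single `deleteTreeWithRetries(downloadDirectory)` call.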
Could you give an example of such a filesystem? What exactly does it mean for unlink to not be atomic?
Some distributed NFS appliances have this property: they ack the delete before all the caches are updated. The symptom is that you rm a file, but it may still appear in an immediately following directory listing.
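As a hypothetical illustration of that symptom (the mount path and file name below are made up; on an ordinary local filesystem this prints false):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class NonAtomicUnlinkDemo {
  public static void main(String[] args) throws IOException {
    Path dir = Paths.get("/mnt/nfs/scratch"); // assumed distributed-NFS mount
    Path file = dir.resolve("victim.txt");
    Files.createFile(file);

    Files.delete(file); // the appliance acks the unlink here...

    // ...but the entry may still show up in an immediately following
    // listing while the appliance's caches catch up, which is what makes
    // a subsequent deleteTree of the parent directory fail spuriously.
    try (Stream<Path> entries = Files.list(dir)) {
      boolean stillListed = entries.anyMatch(file::equals);
      System.out.println("still listed after delete: " + stillListed);
    }
  }
}
```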
Another thought: Could we download to a temporary directory outside the repo that we can clean up asynchronously?
That would work too. I was actually surprised the download was done into output_dir, but I figured it was due to some hermeticity argument.
Is there anything I can do to move this along? It solves a massive problem for us, and I am sure for others who use similar filesystems.
Is `deleteTree` the only call that needs this special treatment? Either way, I think I'd prefer if you could extract this special handling into a separate method (`deleteTreeWithRetries` or something), and add some documentation explaining the rationale (preferably also linking to the GH issue). That way, it's less implementation burden for us.
It's not really me using this function, it's the dependency downloading of `http_archive` and all the crate downloading of `rules_rust`. But if what you mean is that I should simply wrap up the retries into a function which is instead used by `downloadAndExtract`, I am happy to do that. I can do that right now, one minute.
For now I am adding it in `StarlarkBaseExternalContext`, since `Path.java` seems pretty closely aligned with the file system calls...
Thanks, this looks good to me.

> It's not really me using this function

I understand that -- it's just that having the extra logic specific to this call of `deleteTree` invites the question of "why isn't this done elsewhere". Extracting the logic into a separate method makes it easier to apply the same retry logic to other call sites.
I actually tested this change in release-7.4.1, but I was not sure which branch was best to submit the PR on. It doesn't look like this section has changed between 7.4.1 and head.
Force-pushed from ed999d4 to 67822d1.
… unlink is non-atomic, refactored
Force-pushed from 67822d1 to 14f697f.
Is there anything I can do to get this into a 7.4.2?
@Wyverald sorry to ping you - if I go through all the steps outlined in https://github.com/bazelbuild/continuous-integration/blob/master/docs/release-playbook.md, can I cherry-pick this to a 7.4.2 and release it? This checklist is so daunting I assumed it was for Google employees.
@bazel-io flag
@bazel-io fork 8.0.1
This helps the HTTP downloader better cope with filesystems where unlink is non-atomic.

Closes #24295.

PiperOrigin-RevId: 697920434
Change-Id: I91b4dbf07a2efdca07c0310e15aac5f4d89c4091

Commit 99a27f6
Co-authored-by: Simon Thornington <[email protected]>

@bazel-io fork 7.5.0