Cleanup exported repos during sync failures #3282
Conversation
Thanks, this seems like a nice fix! I am reviewing all of the open PRs to see which ones can be included in the next Flux v1 release. If you can rebase and also amend the commit with --signoff for the DCO, to satisfy DCObot, then I can try to incorporate your change. This seems like a good candidate to include as a bugfix. As Flux v2 approaches feature parity, we hope your needs can be met by the new version, but if there are outstanding issues that can still be solved easily in Flux v1, my aim is to help with that.
The branch was force-pushed from 1c55caf to 61c2944.
@nairb774 Can you check out the failing test on #3421? It looks like it is yours. I ran the CI multiple times and unfortunately I'm not quite adept enough at Go yet to see for myself why this test sometimes fails. Is there some nondeterministic behavior that can be made safe by blocking somewhere? (If you happen to know why it fails, can you say whether the failure is in the test itself, or in a shipping part of the daemon?)
The specific failure I'm asking you to look at is here: https://github.com/fluxcd/flux/pull/3421/checks?check_run_id=1874454916 (it only surfaces on about 30% of runs). That must be some kind of race condition, perhaps (a guess) a non-blocking filesystem operation?
(There is another failure in the link checker, but it's not related; it looks to be a remote service that is down.)
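If the flakiness really does come from the test asserting on a cleanup that happens asynchronously, one generic way to make it deterministic is to block and poll for the expected state with a deadline instead of asserting immediately. The sketch below is only an illustration of that idea under those assumptions; the helper name, package name, and timeout are hypothetical and are not part of the Flux test suite:

```go
package gitsync_test

import (
	"os"
	"testing"
	"time"
)

// waitForRemoval polls until dir no longer exists or the deadline passes.
// Asserting immediately after triggering a sync can race with cleanup that
// happens on another goroutine or only after the git process exits.
func waitForRemoval(t *testing.T, dir string, timeout time.Duration) {
	t.Helper()
	deadline := time.Now().Add(timeout)
	for {
		if _, err := os.Stat(dir); os.IsNotExist(err) {
			return // cleanup finished
		}
		if time.Now().After(deadline) {
			t.Fatalf("directory %q still present after %s", dir, timeout)
		}
		time.Sleep(50 * time.Millisecond)
	}
}
```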
This still seems like a valuable change, because it prevents Flux from eating up ephemeral storage and potentially being targeted for termination when the cluster node hosting Flux is under storage pressure. But with the flaky test failure as it stands, I won't be able to merge it. Hopefully you have been able to move on to Flux v2 and are no longer experiencing this issue. If you'd like to revisit this, we can reopen it. Thanks for your contribution. (Closing for now)
Hi, I think we're having this same issue with
Several functions which generate a clone of the repo can result in the repository being left behind when an error is triggered. This shores up some of those failure paths to prevent the storage leaks. Signed-off-by: Brian Atkinson <[email protected]>
The branch was force-pushed from eb25639 to 2cb0fd2.
Linking back to #2713, where discussion about this PR is ongoing now.
I haven't been able to reproduce the "flaky test" that I think I was describing in the previous round of reviews on this PR. I will try a few more times, as it would be really inconvenient to merge a flaky test. I also haven't read this change from end to end.
There it is.
We cannot merge a flaky test, as it will complicate future releases. A commenter in #2713 indicated that this issue is still a problem for them. @nairb774 I will see if I can find the issue in the test that causes it to sometimes fail, and fix it up so this can be merged for release. Do you have any idea what the issue might be?
This project is in migration and security support only, so unfortunately this PR won't be merged. We recommend users migrate to Flux 2 at their earliest convenience. More information about the Flux 2 transition timetable can be found at: https://fluxcd.io/docs/migration/timetable/.
Several functions which generate a clone of the repo can result in the repository being left behind when an error is triggered. This shores up some of those failure paths to prevent the storage leaks.
One of our deployments of FluxCD was found to be eating up a bunch of space in the `/tmp` dir. There were many folders like `/tmp/flux-workingXXXXXXX` that seemed to be hanging around. This appeared to be the result of Flux being able to pull the repository while the path it was configured to read from had been removed: each failure caused a new copy of the repository to be generated on disk. This shores up some of the exit paths to do a better job of cleaning that data up.
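As a rough illustration of the pattern the PR describes, and not the actual Flux code (the function, package, and parameter names below are hypothetical), each error path after the temporary working directory is created removes it before returning, so failed syncs stop accumulating `/tmp/flux-working*` directories:

```go
package gitsync

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// cloneForSync creates a temporary working clone and checks that the
// configured path exists in it. Every failure path removes the directory
// before returning, so the clone is not left behind in /tmp.
func cloneForSync(repoURL, subPath string) (_ string, err error) {
	workDir, mkErr := os.MkdirTemp("", "flux-working")
	if mkErr != nil {
		return "", mkErr
	}
	// If anything below fails, delete the half-built clone instead of leaking it.
	defer func() {
		if err != nil {
			os.RemoveAll(workDir)
		}
	}()

	if out, cloneErr := exec.Command("git", "clone", repoURL, workDir).CombinedOutput(); cloneErr != nil {
		err = fmt.Errorf("clone failed: %v: %s", cloneErr, out)
		return "", err
	}
	// A configured path missing from the repo was one observed failure mode;
	// previously an error at this point left the clone on disk.
	if _, statErr := os.Stat(filepath.Join(workDir, subPath)); statErr != nil {
		err = fmt.Errorf("configured path %q not found in clone: %w", subPath, statErr)
		return "", err
	}
	return workDir, nil
}
```

Keying a deferred `os.RemoveAll` on the named error return keeps every early exit covered, which is the essence of "shoring up the exit paths" described above.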