Skip to content
This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Cleanup exported repos during sync failures #3282

Closed
wants to merge 1 commit into from

Conversation

nairb774
Copy link

@nairb774 nairb774 commented Sep 8, 2020

Several functions which generate a clone of the repo can result in the repository being left behind when an error is triggered. This shores up some of those failure paths to prevent the storage leaks.

One of our deployments of FluxCD was found to be eating up a bunch of space in the /tmp dir. There were many folders of like /tmp/flux-workingXXXXXXX that seemed to be hanging out. This seemed to be the result of Flux being able to pull the repository, but the path it was configured to read from had been removed. Each failure caused a new copy of the repository to be generated on disk. This shores up some of the exit paths to do a better job attempting to clean the data up.

@kingdonb
Copy link
Member

Thanks, this seems like a nice fix!

I am reviewing all of the open PRs to see which ones can be included in a next Flux v1 release.

If you can rebase and also amend the commit with a --signoff for the DCO, to satisfy DCObot, then I can try to incorporate your change. This seems like a good candidate to include as a bugfix. As Flux v2 approaches feature parity, we hope your needs are able to be met with the new version, but if there are outstanding issues that can still be solved easily in Flux v1, my aim is to help with that.

@kingdonb kingdonb added this to the 1.21.2 milestone Feb 9, 2021
@kingdonb kingdonb mentioned this pull request Feb 9, 2021
@kingdonb
Copy link
Member

kingdonb commented Feb 9, 2021

I included this in #3421, with a note that closes this PR.

If #3421 is approved by maintainers, it will merge and this PR will have been included in the next release. 👍

@kingdonb
Copy link
Member

@nairb774 Can you check out the failing test on #3421? It looks like it is yours.

I ran the CI multiple times before now and unfortunately I'm not quite adept enough at Go just yet to see for myself why this test sometimes fails. Is there some nondeterministic behavior that can be made safe by blocking somewhere? (If you happen to know why it fails, can you say if the failure is in the test itself, or is it a failure in a shipping part of the daemon?)

@kingdonb
Copy link
Member

kingdonb commented Feb 11, 2021

The specific failure I'm asking for you to look at is here: https://github.com/fluxcd/flux/pull/3421/checks?check_run_id=1874454916

It only surfaces on about 30% of runs. Must be some kind of race condition, like (guess) a non-blocking filesystem operation?

ok  	github.com/fluxcd/flux/pkg/event	0.050s
--- FAIL: TestExportFailsCheckout (0.13s)
    export_test.go:97: Found "flux-working984482503" in "/tmp" - failed to cleanup when returning an error
FAIL
FAIL	github.com/fluxcd/flux/pkg/git	1.315s
ok  	github.com/fluxcd/flux/pkg/git/gittest	1.774s

(There is another failure in the link checker but it's not related, looks to be a remote service that is down.)

@kingdonb kingdonb modified the milestones: 1.21.2, 1.21.3 Feb 11, 2021
@kingdonb kingdonb removed this from the 1.22.1 milestone Mar 2, 2021
@kingdonb
Copy link
Member

kingdonb commented Mar 2, 2021

This still seems like a valuable change because it prevents Flux from eating up ephemeral storage and potentially being targeted for termination in cases where the cluster node hosting Flux is experiencing storage pressure.

But with the flaky test failure as it is, I am sure I won't be able to merge it. Hopefully you have been able to move on to Flux v2 and are no longer experiencing this issue. If you'd like to revisit this, we can reopen it.

Thanks for your contribution. (Closing for now)

@pgendreau
Copy link

Hi,
I am having this specific issue with version 1.24.2. I can't figure out if this fix has been merged or if it is still waiting on something. Could you update me? Thank you

@caleygoff-invitae
Copy link

I think we're having this same issue with flux:1.24.1 also.

@kingdonb kingdonb reopened this Dec 14, 2021
Several functions which generate a clone of the repo can result in the
repository being left behind when an error is triggered. This shores up
some of those failure paths to prevent the storage leaks.

Signed-off-by: Brian Atkinson <[email protected]>
@kingdonb
Copy link
Member

Linking back to #2713 where discussion about this PR is ongoing now

@kingdonb
Copy link
Member

I haven't been able to reproduce the "flaky test" that I think I was describing from the previous round of reviews on this PR.

Will try a few more times, as it will be really inconvenient to merge a flaky test. I also haven't read this change from end to end.

@yebyen
Copy link
Contributor

yebyen commented Dec 16, 2021

There it is

@kingdonb
Copy link
Member

kingdonb commented Jan 5, 2022

We cannot merge a flaky test, as it will complicate future releases. Commenter in #2713 indicated that this issue is still a problem for them. @nairb774 I will see if I can find the issue in the test that causes it to sometimes fail, and fix it up so this can be merged for release. Do you have any idea what might be the issue?

@pjbgf
Copy link
Member

pjbgf commented Jul 26, 2022

This project is in Migration and security support only, so unfortunately this PR won't be merged. We recommend users to migrate to Flux 2 at their earliest convenience.

More information about the Flux 2 transition timetable can be found at: https://fluxcd.io/docs/migration/timetable/.

@pjbgf pjbgf closed this Jul 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants