easily retry failed sonobuoy tests #326

Closed
Tracked by #267
webern opened this issue Feb 25, 2022 · 3 comments
Labels: in-progress (Somebody is working on this); in-review (A PR is open that will close this issue); priority/high (Something that we need to do as soon as possible); sonobuoy (Testing Bottlerocket with Sonobuoy); test-agent (Related to a test agent component)

Comments

@webern
Contributor

webern commented Feb 25, 2022

Sonobuoy sometimes fails when it shouldn't, and rerunning the failed tests sometimes gets them to pass.

Should we make it an option when kicking off the test that sonobuoy will automatically retry failed tests?

If so, should it behave differently for quick mode?

We have seen these failures which appear to be flakes:

[2022-02-25T03:26:57Z INFO  sonobuoy_test_agent] Initializing Sonobuoy test agent...
time="2022-02-25T03:26:57Z" level=info msg="created object" name=sonobuoy namespace= resource=namespaces
time="2022-02-25T03:26:57Z" level=info msg="created object" name=sonobuoy-serviceaccount namespace=sonobuoy resource=serviceaccounts
time="2022-02-25T03:26:57Z" level=info msg="created object" name=sonobuoy-serviceaccount-sonobuoy namespace= resource=clusterrolebindings
time="2022-02-25T03:26:57Z" level=info msg="created object" name=sonobuoy-serviceaccount-sonobuoy namespace= resource=clusterroles
time="2022-02-25T03:26:57Z" level=info msg="created object" name=sonobuoy-config-cm namespace=sonobuoy resource=configmaps
time="2022-02-25T03:26:57Z" level=info msg="created object" name=sonobuoy-plugins-cm namespace=sonobuoy resource=configmaps
time="2022-02-25T03:26:58Z" level=info msg="created object" name=sonobuoy namespace=sonobuoy resource=pods
time="2022-02-25T03:26:58Z" level=info msg="created object" name=sonobuoy-aggregator namespace=sonobuoy resource=services
202202250327_sonobuoy_33ddaa6a-74e1-4a78-bfd6-f994a7dbd234.tar.gz
time="2022-02-25T03:32:59Z" level=info msg=deleted kind=namespace namespace=sonobuoy
time="2022-02-25T03:32:59Z" level=info msg=deleted kind=clusterrolebindings
time="2022-02-25T03:32:59Z" level=info msg=deleted kind=clusterroles
bash-4.2$ isengard 359404537045 Administrator exec -- kubectl logs upgrade-downgrade-aarch64-aws-k8s-120-test-1-initial-d9jj5
[2022-02-25T03:27:18Z INFO  sonobuoy_test_agent] Initializing Sonobuoy test agent...
time="2022-02-25T03:27:19Z" level=info msg="created object" name=sonobuoy namespace= resource=namespaces
time="2022-02-25T03:27:19Z" level=info msg="created object" name=sonobuoy-serviceaccount namespace=sonobuoy resource=serviceaccounts
time="2022-02-25T03:27:19Z" level=info msg="created object" name=sonobuoy-serviceaccount-sonobuoy namespace= resource=clusterrolebindings
time="2022-02-25T03:27:19Z" level=info msg="created object" name=sonobuoy-serviceaccount-sonobuoy namespace= resource=clusterroles
time="2022-02-25T03:27:19Z" level=info msg="created object" name=sonobuoy-config-cm namespace=sonobuoy resource=configmaps
time="2022-02-25T03:27:19Z" level=info msg="created object" name=sonobuoy-plugins-cm namespace=sonobuoy resource=configmaps
time="2022-02-25T03:27:19Z" level=info msg="created object" name=sonobuoy namespace=sonobuoy resource=pods
time="2022-02-25T03:27:19Z" level=info msg="created object" name=sonobuoy-aggregator namespace=sonobuoy resource=services
time="2022-02-25T03:33:19Z" level=error msg="error attempting to run sonobuoy: waiting for run to finish: pod entered a fatal terminal status: failed"
time="2022-02-25T03:33:19Z" level=info msg=deleted kind=namespace namespace=sonobuoy
time="2022-02-25T03:33:19Z" level=info msg=deleted kind=clusterrolebindings
time="2022-02-25T03:33:19Z" level=info msg=deleted kind=clusterroles
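As a rough aid for triaging logs like these: the first run printed a results tarball name, while the second logged a single `level=error` line before cleanup. The sketch below pulls error-level entries out of Sonobuoy's key=value output. The function names `parse_logfmt` and `error_messages` are hypothetical and are not part of the test agent, and the parsing assumes the quoting style seen in the lines above.

```python
import shlex
from typing import Dict, List


def parse_logfmt(line: str) -> Dict[str, str]:
    """Parse one key=value log line into a dict (hypothetical helper).

    shlex handles values wrapped in double quotes, e.g. msg="created object".
    Lines with unbalanced quotes are treated as unparseable and yield {}.
    """
    try:
        tokens = shlex.split(line)
    except ValueError:
        return {}
    fields: Dict[str, str] = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:  # keep only key=value tokens; empty values such as namespace= are allowed
            fields[key] = value
    return fields


def error_messages(log_text: str) -> List[str]:
    """Return the msg field of every level=error line in the log text."""
    messages = []
    for line in log_text.splitlines():
        if not line.startswith("time="):
            continue  # skip agent banner lines, tarball names, and shell prompts
        fields = parse_logfmt(line)
        if fields.get("level") == "error":
            messages.append(fields.get("msg", ""))
    return messages
```

Run against the second log above, this would surface only the "pod entered a fatal terminal status" message, which is the line an operator needs when deciding whether a rerun is worthwhile.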
@webern added this to the sprint-12 milestone Feb 25, 2022
@webern added the test-agent and sonobuoy labels Feb 25, 2022
@webern added the priority/high label Mar 3, 2022
@webern
Contributor Author

webern commented Mar 3, 2022

> Should we make it an option when kicking off the test that sonobuoy will automatically retry failed tests?

I think the answer to this question is "no". I'm concerned that if we automate the rerunning of failed tests, we will hide these failures from human operators. I think it would be better to give the human operator a way to rerun failed sonobuoy tests on demand.

@webern
Contributor Author

webern commented Mar 3, 2022

Design proposal "Option 1":

For this to work, the original test run must have been started with the --keep-running flag set to true.

  • We create a CLI command, e.g. testsys run sonobuoy --rerun-failed.
  • We add a rerun_failed flag to the sonobuoy test-agent configuration. Normally it is false.
  • The command obtains the sonobuoy tarball from the still-running test-agent.
  • The CLI then creates a new test, e.g. original-test-name-rerun-failed, with the rerun_failed flag set to true.
  • The CLI waits until the pod is running, then places the tarball in a known location (either agreed upon by convention or named in the configuration).
  • When the sonobuoy test agent starts up and sees that rerun_failed is true, it goes into a loop where it checks that file path. If no file is there, it keeps looping; once the file (the tarball with the test failures in it) appears, it uses it to invoke rerun failed.
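The agent-side part of the steps above could look roughly like the following sketch. It is not the test agent's implementation: `wait_for_tarball`, `sonobuoy_args`, the polling interval, and the timeout are all hypothetical names and values chosen for illustration. It also assumes the installed Sonobuoy version supports `sonobuoy run --rerun-failed <tarball>`.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional


def wait_for_tarball(
    path: Path,
    poll_interval: float = 5.0,
    timeout: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Path:
    """Loop until a non-empty file exists at `path`, then return it.

    Hypothetical helper. With timeout=None it loops forever, as in the
    proposal; a timeout is offered so a forgotten upload cannot hang the pod.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        if path.is_file() and path.stat().st_size > 0:
            return path
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"no results tarball appeared at {path}")
        sleep(poll_interval)


def sonobuoy_args(rerun_failed: bool, tarball: Optional[Path] = None) -> List[str]:
    """Build the sonobuoy command line for a normal run or a rerun of failures."""
    args = ["sonobuoy", "run", "--wait"]
    if rerun_failed:
        if tarball is None:
            raise ValueError("rerun_failed requires a results tarball")
        # Assumption: this Sonobuoy version accepts --rerun-failed <tarball>.
        args += ["--rerun-failed", str(tarball)]
    return args
```

One caveat with this approach: a size check cannot tell a finished copy from a partial one, so the CLI would probably need to write the tarball under a temporary name and rename it into place once the copy completes.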

@webern modified the milestones: sprint-12, sprint-13 Mar 8, 2022
@webern changed the title from "automatically retry failed sonobuoy tests" to "easily retry failed sonobuoy tests" Mar 8, 2022
@ecpullen self-assigned this Mar 15, 2022
@ecpullen added the in-review and in-progress labels Mar 15, 2022
@ecpullen
Contributor

Closed by #352
