Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track down test flakes #337

Merged
merged 13 commits into from
Oct 17, 2022
Merged

Track down test flakes #337

merged 13 commits into from
Oct 17, 2022

Conversation

squaremo
Copy link
Contributor

@squaremo squaremo commented Oct 10, 2022

I'm using this PR to try and get the tests in better shape, since they seem to flake on default branch and elsewhere. (Granted, that may in part be because of #331; but it would be good to be able to let the parallelisation and randomisation stand.)

  • Port to Ginkgo v2 and use DeferCleanup
  • Rearrange so AWS is not required for everything
  • Use file backend unless specifically testing against the service

@squaremo squaremo changed the title Port to Ginkgo v2 Track down test flakes Oct 10, 2022
This updates Ginkgo, the test frameworks, to v2. It is largely the same,
except 1. custom reporters are even more deprecated (I removed the code
which set up a special test reporter; this should get rid of the
deprecation message); 2. we can now use DeferCleanup, which is a clearer
than having "matching" code in BeforeEach/AfterEach.

I've rewritten the cleanup code which used `toDelete []string`. Because
the invocation was outside a BeforeEach/It body, it executed during the
tree construction phase[1] rather than when the tests were run, so it
didn't actually clean anything up.

[1]: https://onsi.github.io/ginkgo/#mental-model-how-ginkgo-traverses-the-spec-hierarchy

Signed-off-by: Michael Bridgen <[email protected]>
They don't need to be finalised, but having them present might foul
other test cases (they are all lumped into the same namespace, at
present). Specifically: there is an AfterEach in
stack_controller_test.go which removes _all_ stacks in the namespace,
then waits until there are none (they have all been processed) -- if an
individual stack gets stuck, finalising or otherwise, this will take a
very long time to time out, and the failure is not easily connected to
the stuck stack.

Signed-off-by: Michael Bridgen <[email protected]>
The cleanup procedure for tests in stack_controller_test.go deletes all
stacks in the namespace, then waits for them to disappear from the API
results (i.e., they have been finalised and removed from the
cache). This has the problem that it can foul on other tests, if they
don't tidy up their own stacks. It will catch when a test case has
simply neglected to delete a stack it created, but not when a stack
fails to be finalised, and in the latter case, it takes a long time to
fail, and is difficult to connect to the actual problem.

On balance I think it's better to remove a source of mysterious
failures, at the expense of less chance of noticing if a test case leaks
resources.

Signed-off-by: Michael Bridgen <[email protected]>
This file has what are more or less end-to-end tests, which are few and
each quite long-running. Here I have rearranged the tests so that set-up
and tear-down is clearer, and so they are putting fewer things to the
test at a time (and there is further this can go).

I have also made most of the tests use the file backend rather than the
service, in the expectation that they will complete faster. And, rather
than using the v1alpha1 and v1 types throughout, I've made an explicit
test covering their compatibility, and used v1 everywhere else.

Signed-off-by: Michael Bridgen <[email protected]>
I find that on my laptop, running two test nodes (each one a control
plane instance of `{etcd, kube-apiserver}`) is more reliable than trying
for four. Sharing the control plane and isolating with namespaces would
help, but in the meantime to make this a bit more convenient, this
commit makes the number of nodes a variable in the Makefile.

    make test TEST_NODES=2

(.. and, incidentally, --randomize-all is the Ginkgo v2 spelling of
--randomizeAllSpecs)

Signed-off-by: Michael Bridgen <[email protected]>
@lblackstone lblackstone marked this pull request as ready for review October 17, 2022 20:05
@lblackstone lblackstone merged commit 098f40b into master Oct 17, 2022
@lblackstone lblackstone deleted the test-flake-hunting branch October 17, 2022 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants