Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix watchdog usage in controller unit tests #250

Merged
merged 3 commits into from
Jan 27, 2025

Conversation

slintes
Copy link
Member

@slintes slintes commented Jan 24, 2025

Why we need this PR

Controller unit tests are broken. They have false positives because they all use the same watchdog.
Also, one unit test is testing for the wrong result.

Changes made

  • fix error handling in peers component
  • add unit test for watchdog
  • fixed issues in controller unit test

For details see commit messages.

Which issue(s) this PR fixes

Fixes issues with #238

Test plan

There are new and fixed tests

Do not exit the Start method with an error in case of a peers update
problem, it
- potentially breaks watchdog related unit tests, because the
  k8s manager will stop all components, which disarms the watchdog
- will cause a pod restart in production for no reason

Retry with a quicker interval instead.

Signed-off-by: Marc Sluiter <[email protected]>
Copy link
Contributor

openshift-ci bot commented Jan 24, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

openshift-ci bot commented Jan 24, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@slintes
Copy link
Member Author

slintes commented Jan 24, 2025

/test 4.17-openshift-e2e

@novasbc
Copy link
Contributor

novasbc commented Jan 24, 2025

Looking at this briefly on my phone and it looks very familiar to what I was working on trying to accomplish on my local branch, and was having issues getting fully correct!

Thanks for putting this in, I should be able to rebase off this and I think it will do the trick.

Regards,
Mark

quickUpdateWorkerPeersError := p.updateWorkerPeers(ctx)
quickUpdateControlPlanePeersError := p.updateControlPlanePeers(ctx)
if quickUpdateWorkerPeersError == nil && quickUpdateControlPlanePeersError == nil {
quickCancel()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, does this cancel the parent context as well? That would be bad!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't, we're good here. Would be terrifying if a child can cancel the parent's and with that all sibling's contexts, right? 🙈

Copy link
Contributor

@clobrano clobrano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just few typos (in old code probably, but we moved to a new file, so it is worth fixing it I think)

wd.Stop()
})

It("should be triggered and and not be fed anymore", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "and" is repeated twice.

})

Context("Watchdog started", func() {
It("should be feeded", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's old code, but I think it's a typo: s/feeded/fed. Same below

})
})

Context("Triggered watchdog resetted", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: resetted/reset

wd.Reset()
})

It("should be armed and and be feeded", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "add" repeated twice

This is the right place for testing feeding

Signed-off-by: Marc Sluiter <[email protected]>
@clobrano
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jan 24, 2025
@slintes
Copy link
Member Author

slintes commented Jan 24, 2025

/hold

@mshitrit in case you want to review

@slintes slintes marked this pull request as ready for review January 24, 2025 17:26
@openshift-ci openshift-ci bot requested review from clobrano and mshitrit January 24, 2025 17:26
- Do not test watchdog feeding (is in new watchdog unit tests now)
- Test watchdog status instead
- Reset watchdog before each test! Otherwise we get false positives,
  because earlier tests triggered the watchdog already!
- Fix "Unhealthy node without api-server access" test: the watchdog
  should NOT be triggered!

Signed-off-by: Marc Sluiter <[email protected]>
@slintes
Copy link
Member Author

slintes commented Jan 27, 2025

/hold cancel

@openshift-merge-bot openshift-merge-bot bot merged commit 16052be into medik8s:main Jan 27, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants