
chore: fix assertAlertmanagersAreResilientToDisruption() #454

Conversation

simonpasquier
Contributor

I suspect that the test was flaky because it uses the address of the loop variable (see https://go.dev/blog/loopvar-preview).
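
For context, a minimal illustration of the pre-Go 1.22 pitfall referenced above (this is not the operator's actual test code, just a sketch): the loop declares a single variable, so taking its address on every iteration makes all stored pointers alias the value left by the last iteration.

```go
// Before Go 1.22, &p is the same address on every iteration, so every stored
// pointer ends up referring to the last element. Go 1.22 gives each iteration
// its own copy of the loop variable.
package main

import "fmt"

func main() {
	pods := []string{"alertmanager-0", "alertmanager-1", "alertmanager-2"}

	var refs []*string
	for _, p := range pods {
		refs = append(refs, &p)
	}

	for _, r := range refs {
		// Before Go 1.22: prints "alertmanager-2" three times.
		fmt.Println(*r)
	}
}
```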

@simonpasquier simonpasquier requested a review from a team as a code owner April 5, 2024 10:43
@simonpasquier simonpasquier requested review from sthaha and slashpai and removed request for a team April 5, 2024 10:43
@openshift-ci openshift-ci bot requested review from danielmellado and jan--f April 5, 2024 10:44

openshift-ci bot commented Apr 5, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: simonpasquier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Apr 5, 2024
@danielmellado
Contributor

/lgtm

Seems like we'll be seeing a few of these Go 1.22 issues around.

I suspect that the test was flaky because it uses the address of the
loop variable (see https://go.dev/blog/loopvar-preview).

Signed-off-by: Simon Pasquier <[email protected]>
@simonpasquier simonpasquier force-pushed the fix-assertAlertmanagersAreResilientToDisruption branch from fcae7ed to f4558d4 Compare April 5, 2024 12:12

openshift-ci bot commented Apr 5, 2024

@simonpasquier: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/observability-operator-e2e | f4558d4 | link | true | /test observability-operator-e2e |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@simonpasquier
Contributor Author

/skip

@simonpasquier
Contributor Author

simonpasquier commented Apr 5, 2024

After all, I'm not sure that this was the bug, because the job failed on the same test. I'm trying an updated version where all evictions are executed serially and the errors are checked afterwards. Let's see if it helps, but IIUC we're still subject to race conditions if the evicted pods become ready before the last eviction... The more robust version would be to mark all nodes as unschedulable before the test.
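
A rough sketch of that last idea, assuming client-go and a hypothetical helper name (this is not code from the PR): cordon every node before the disruption test so that evicted Alertmanager pods cannot be rescheduled and become ready again while later evictions are still in flight, then uncordon afterwards.

```go
// Hypothetical helper: mark every node unschedulable and return a function
// that restores schedulability after the test.
package e2e

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

func cordonAllNodes(ctx context.Context, client kubernetes.Interface) (uncordon func() error, err error) {
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	setUnschedulable := func(name string, value bool) error {
		patch := []byte(fmt.Sprintf(`{"spec":{"unschedulable":%t}}`, value))
		_, err := client.CoreV1().Nodes().Patch(ctx, name, types.MergePatchType, patch, metav1.PatchOptions{})
		return err
	}
	// Cordon every node so evicted pods have nowhere to be rescheduled.
	for _, node := range nodes.Items {
		if err := setUnschedulable(node.Name, true); err != nil {
			return nil, err
		}
	}
	// The returned closure uncordons the same nodes once the test is done.
	return func() error {
		for _, node := range nodes.Items {
			if err := setUnschedulable(node.Name, false); err != nil {
				return err
			}
		}
		return nil
	}, nil
}
```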

@danielmellado
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Apr 5, 2024
@simonpasquier simonpasquier merged commit 1fc0a34 into rhobs:main Apr 8, 2024
9 of 11 checks passed
@simonpasquier simonpasquier deleted the fix-assertAlertmanagersAreResilientToDisruption branch April 8, 2024 10:09
@lihongyan1
Contributor

The case failed again, and it even failed in the upstream CI; see
https://github.com/rhobs/observability-operator/actions/runs/8660157870/job/23747502573?pr=448

@simonpasquier
Contributor Author

@lihongyan1 indeed, I've seen the test fail. I tried to simulate the eviction scenario manually, but the PDB always kicked in and rejected the last eviction. I know that @jan--f investigated too, but I wonder whether there's a flaw in the test or a bug in Kubernetes 🤔
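
For reference, a hedged sketch of what that manual simulation might look like with client-go's eviction API (the names here are illustrative, not the operator's test code): when honoring the PodDisruptionBudget would leave too few pods available, the API server rejects the eviction with a 429 Too Many Requests error.

```go
// Illustrative only: evict a pod via the eviction subresource and report
// whether the PodDisruptionBudget blocked it.
package e2e

import (
	"context"

	policyv1 "k8s.io/api/policy/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func evictPod(ctx context.Context, client kubernetes.Interface, namespace, podName string) (blockedByPDB bool, err error) {
	eviction := &policyv1.Eviction{
		ObjectMeta: metav1.ObjectMeta{Name: podName, Namespace: namespace},
	}
	err = client.PolicyV1().Evictions(namespace).Evict(ctx, eviction)
	if apierrors.IsTooManyRequests(err) {
		// The PDB rejected the eviction; the pod keeps running.
		return true, nil
	}
	return false, err
}
```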
