ci: increase goldpinger replicas from 4 to 8 and add topology constraints #3376

Merged
2 commits merged into master from alew/increase-datapath-goldpinger-replicas
Feb 3, 2025

@QxBytes QxBytes commented Jan 29, 2025

Reason for Change:

Attempts to fix this issue in our pipelines: https://msazure.visualstudio.com/One/_build/results?buildId=113599589&view=logs&j=b1429627-982b-5d7c-f874-2297f1590463&t=f4ed6e9d-3ea8-54bd-90c0-ce2d41a370dc

It looks like one node can become unbalanced (e.g., both metrics pods, gateway keepers, etc. all land on the same node), so new pods are all scheduled on the other node to balance resources. This PR increases the number of goldpinger replicas from 4 to 8, which makes it more likely that at least two pods land on each node (a hard requirement in the test). It also adds a topology constraint to encourage pods to spread across the nodes.
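As a rough illustration of why more replicas help, here is a toy calculation. It assumes exactly two nodes and that each pod lands on either node independently with equal probability. Both are simplifying assumptions: the real scheduler is not random and, as described above, can be biased toward one node. The function name `prob_min_per_node` is hypothetical.

```python
from math import comb


def prob_min_per_node(replicas: int, min_per_node: int = 2) -> float:
    """Probability that each of two nodes gets at least `min_per_node` pods.

    Toy model (an assumption, not how the scheduler behaves): every pod
    independently picks one of the two nodes with probability 1/2.
    """
    # k is the number of pods on the first node; the second node gets replicas - k.
    favorable = sum(
        comb(replicas, k)
        for k in range(min_per_node, replicas - min_per_node + 1)
    )
    return favorable / 2 ** replicas


for n in (4, 8):
    print(f"{n} replicas: {prob_min_per_node(n):.3f}")
# 4 replicas: 0.375
# 8 replicas: 0.930
```

Under this model, going from 4 to 8 replicas raises the chance of meeting the two-pods-per-node requirement substantially, but it does not guarantee it, which is why the topology constraint is added as well.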

Issue Fixed:

Requirements:

Notes:
Successful runs:
https://msazure.visualstudio.com/One/_build/results?buildId=113216481
https://msazure.visualstudio.com/One/_build/results?buildId=113216663
https://msazure.visualstudio.com/One/_build/results?buildId=113217222
https://msazure.visualstudio.com/One/_build/results?buildId=113258554
https://msazure.visualstudio.com/One/_build/results?buildId=113266372

@QxBytes QxBytes added the ci (Infra or tooling) label Jan 29, 2025
@QxBytes QxBytes self-assigned this Jan 29, 2025

QxBytes commented Jan 30, 2025

/azp run Azure Container Networking PR

Azure Pipelines successfully started running 1 pipeline(s).

@QxBytes QxBytes marked this pull request as ready for review January 30, 2025 01:03
@QxBytes QxBytes requested a review from a team as a code owner January 30, 2025 01:03
@QxBytes QxBytes requested a review from rajvinar January 30, 2025 01:03

@jpayne3506 jpayne3506 left a comment

👍 Very few scenarios still use goldpinger. Do you have a schedule for a set of automated runs to see if this was effective or not?

QxBytes commented Jan 30, 2025

> 👍 Very few scenarios still use goldpinger. Do you have a schedule for a set of automated runs to see if this was effective or not?

I triggered several runs from the pipeline page (you might need to scroll down on https://msazure.visualstudio.com/One/_build?definitionId=95007&_a=summary; see the ones with commit message "try topology constraints to balance"), and none failed with the error this PR intends to fix. The only failures were for other reasons. It also looks like this issue may be present in the v1.5.x release train: https://msazure.visualstudio.com/One/_build/results?buildId=113892380&view=logs&j=b1429627-982b-5d7c-f874-2297f1590463&t=f4ed6e9d-3ea8-54bd-90c0-ce2d41a370dc

@jpayne3506

Most dualstack runs are failing now because goldpinger pods are landing on the same node. Let's get this merged.
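For reference, the failure mode described here can be expressed as a small check over a pod-to-node mapping. This is only a sketch: the function names, pod names, and node names are hypothetical, and the actual test in the repository may be implemented differently. The `skew` value mirrors the quantity that a Kubernetes topology spread constraint bounds with `maxSkew`, assuming that is the kind of constraint this PR adds.

```python
from collections import Counter


def pods_per_node(placements: dict[str, str], nodes: list[str]) -> dict[str, int]:
    """Count pods on every node, including nodes that received no pods."""
    counts = Counter(placements.values())
    return {node: counts.get(node, 0) for node in nodes}


def meets_minimum(placements: dict[str, str], nodes: list[str], min_per_node: int = 2) -> bool:
    """True if every node hosts at least `min_per_node` pods."""
    return all(c >= min_per_node for c in pods_per_node(placements, nodes).values())


def skew(placements: dict[str, str], nodes: list[str]) -> int:
    """Difference between the most and least loaded node."""
    counts = pods_per_node(placements, nodes).values()
    return max(counts) - min(counts)


nodes = ["node-a", "node-b"]  # hypothetical node names

# All four pods on one node: the situation described above.
lopsided = {f"goldpinger-{i}": "node-a" for i in range(4)}
print(meets_minimum(lopsided, nodes), skew(lopsided, nodes))  # False 4

# Eight pods split evenly across the two nodes.
balanced = {f"goldpinger-{i}": nodes[i % 2] for i in range(8)}
print(meets_minimum(balanced, nodes), skew(balanced, nodes))  # True 0
```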

@jpayne3506 jpayne3506 added the release/latest (Change affects latest release train), needs-backport (Change needs to be backported to previous release trains), and release/1.5 (Change affects v1.5 release train) labels Jan 31, 2025
@rbtr rbtr added this pull request to the merge queue Jan 31, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 1, 2025
@rbtr rbtr added this pull request to the merge queue Feb 1, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Feb 1, 2025
@rbtr rbtr added this pull request to the merge queue Feb 3, 2025
Merged via the queue into master with commit 661bddd Feb 3, 2025
98 of 99 checks passed
@rbtr rbtr deleted the alew/increase-datapath-goldpinger-replicas branch February 3, 2025 18:21
Labels
ci: Infra or tooling.
needs-backport: Change needs to be backported to previous release trains
release/latest: Change affects latest release train
release/1.5: Change affects v1.5 release train
4 participants