
Recommend Ping Pong strategy as default for Canary with Traffic Routing using ALB. #2864

Open
2 tasks done
rajeshetty87 opened this issue Jul 3, 2023 · 1 comment

rajeshetty87 commented Jul 3, 2023

Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug

Using the Canary strategy with traffic routing through an AWS ALB Ingress causes availability issues at the end of a rollout.
The rollout progresses successfully through all steps, and at the end Rollouts switches the service selector labels.
This triggers two actions by the ALB Ingress Controller:

  1. Registers the targets from the Canary Target Groups to the Stable Target Groups.
  2. Updates the weights on the ALB listener from 0% -> 100% for Stable and 100% -> 0% for Canary.

These two actions are not synchronized. When the listener update (action 2) completes before the target registration (action 1), the ALB sends traffic to an empty target group, which returns 503 errors and causes an availability drop.
The issue is mostly observed in swim lanes with high TPS and many targets (>300); low-traffic swim lanes don't show a similar issue.
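The ordering problem above can be illustrated with a toy model. This is a minimal sketch, not the controller's code: the tick granularity, the 300 pods, the 1,000 requests per tick, and the function name `simulate_cutover` are all hypothetical. It also assumes the canary target group stays populated during the cutover window. Any request the listener sends to a target group with zero registered targets is counted as a 503.

```python
def simulate_cutover(register_at: int, weights_at: int, ticks: int = 10,
                     rps: int = 1000, pods: int = 300) -> int:
    """Toy model: count requests routed to an empty target group.

    register_at: tick at which action 1 (new pods registered in the
                 stable target group) completes.
    weights_at:  tick at which action 2 (listener flips to 100% stable,
                 0% canary) completes.
    """
    failed = 0
    for t in range(ticks):
        targets = {
            "stable": pods if t >= register_at else 0,  # action 1
            # Assumption: canary group keeps its targets during this window.
            "canary": pods,
        }
        if t >= weights_at:  # action 2
            weights = {"stable": 100, "canary": 0}
        else:
            weights = {"stable": 0, "canary": 100}
        for group, weight in weights.items():
            if targets[group] == 0:
                # The ALB has nowhere to send these requests: 503.
                failed += rps * weight // 100
    return failed


# Listener flips at tick 1, registration only finishes at tick 3.
print(simulate_cutover(register_at=3, weights_at=1))  # 2000
# Registration finishes first: no requests hit an empty group.
print(simulate_cutover(register_at=1, weights_at=3))  # 0
```

In this model the number of failed requests grows with both the traffic rate and the gap between the two actions, which is consistent with the report that high-TPS swim lanes with many targets are the ones affected.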

This is a known issue with Canary with traffic routing and has been widely discussed in these threads (#2061, #1283, #1453). The Ping Pong feature was introduced to resolve it, and details are available here.

This issue requests making the Ping Pong strategy the default, rather than simple Canary, when using Canary with traffic routing and AWS ALB.
This would save time for teams performing the switch and steer them toward the correct solution (Ping Pong) instead of Canary with traffic routing, which does not provide zero-downtime deployments.

To Reproduce

Steps to reproduce

  • Create a namespace with ingress on AWS ALB
  • Create rollout with Canary w/ traffic routing strategy.
  • Send at least 70K TPS over the ingress and have >=300 Pods in stable target group
  • Do a rollout with Canary steps of 0-25-50-75 (the exact weights don't matter)
  • Once all the steps are completed, look for 503s in the ALB logs
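The last step can be partly automated. This is a minimal sketch, assuming the standard space-separated ALB access log format, where the ninth field (index 8) is `elb_status_code` and the second field is the ISO 8601 timestamp. The function name `count_503s_per_minute` and the sample entry are hypothetical. It buckets ALB-generated 503s by minute so a spike around the end of the rollout stands out.

```python
from collections import Counter
from typing import Iterable

# Index of elb_status_code in a space-separated ALB access log entry
# (type, time, elb, client:port, target:port, three timings, status).
ELB_STATUS_FIELD = 8


def count_503s_per_minute(lines: Iterable[str]) -> Counter:
    """Count entries whose elb_status_code is 503, keyed by minute."""
    per_minute: Counter = Counter()
    for line in lines:
        # Only the leading unquoted fields are needed, so a plain split
        # avoids parsing the quoted request and user-agent fields.
        fields = line.split(maxsplit=ELB_STATUS_FIELD + 1)
        if len(fields) <= ELB_STATUS_FIELD:
            continue  # skip blank or truncated lines
        if fields[ELB_STATUS_FIELD] == "503":
            per_minute[fields[1][:16]] += 1  # e.g. "2023-07-03T14:41"
    return per_minute


sample = [
    'https 2023-07-03T14:41:57.123456Z app/my-alb/abc 10.0.0.1:1234 - '
    '-1 -1 -1 503 - 100 200 "GET https://example.com:443/ HTTP/1.1" "curl/8.0"',
]
print(count_503s_per_minute(sample))
```

ALB writes access logs to S3 as gzip files, so a downloaded log could be fed in with `gzip.open(path, "rt")`. A 503 whose `target:port` field is `-` indicates the ALB found no target to forward to, which matches the empty-target-group symptom described above.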

Screenshots

(Screenshot taken 2023-07-03 at 2:41:57 PM; image not available.)

Version

v1.5.1

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

rajeshetty87 added the "bug" (Something isn't working) label on Jul 3, 2023.
zachaller changed the title from "Recommend Ping Pong strategy as default for Canary with Traffic Routing." to "Recommend Ping Pong strategy as default for Canary with Traffic Routing using ALB." on Jul 29, 2023.
github-actions (Contributor) commented:

This issue is stale because it has been open 60 days with no activity.
