Canary w/ Traffic Routing with AWS ALB Ingress causes availability issues at the end of rollout.
The rollout progresses successfully through all steps and at the end, service selector labels are switched by Rollouts.
This triggers two actions by the ALB Ingress Controller:
1. Registers the targets from the Canary Target Groups to the Stable Target Groups.
2. Updates the weights on the ALB listener from 0 -> 100% for Stable and 100 -> 0% for Canary.
The two actions above don't necessarily happen in sync. When the listener update (Action #2) lands before the target registration (Action #1), the ALB sends traffic to an empty target group, causing 503 errors and a drop in availability.
The issue is mostly observed in swim-lanes with high TPS and many targets (>300), whereas low-traffic swim-lanes don't show a similar issue.
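To make the ordering problem concrete, here is a toy model of the two actions, not the controller's actual code. The `TargetGroup` class, the `simulate` helper, and the starting state (Stable at weight 0 with no registered targets, Canary at weight 100 with 300 targets) are assumptions for illustration. It relies only on the fact that an ALB answers 503 for requests forwarded to a target group with no registered targets.

```python
from dataclasses import dataclass


@dataclass
class TargetGroup:
    name: str
    weight: int   # listener forward weight (0-100)
    targets: int  # number of registered targets


def expected_503_fraction(groups):
    """Share of traffic forwarded to target groups that have no targets."""
    total = sum(g.weight for g in groups)
    if total == 0:
        return 1.0
    empty = sum(g.weight for g in groups if g.targets == 0)
    return empty / total


def simulate(order):
    """Apply the two end-of-rollout actions in the given order.

    Returns the 503 fraction observed after each action.
    Starting state is an assumption, not taken from the controller.
    """
    stable = TargetGroup("stable", weight=0, targets=0)
    canary = TargetGroup("canary", weight=100, targets=300)
    fractions = []
    for action in order:
        if action == "register":   # Action #1: targets land in Stable
            stable.targets = canary.targets
        elif action == "shift":    # Action #2: listener weights flip
            stable.weight, canary.weight = 100, 0
        fractions.append(expected_503_fraction([stable, canary]))
    return fractions


print(simulate(["register", "shift"]))  # safe ordering
print(simulate(["shift", "register"]))  # racy ordering
```

In this model the racy ordering sends all traffic to an empty Stable group until registration catches up. With more targets to register, that window is plausibly longer, which would match the observation that only large, high-TPS swim-lanes are affected.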
This is a known issue with Canary with Traffic Routing and has been widely discussed in these threads (#2061, #1283, #1453). The solution to this issue was the Ping Pong feature, and details are available here.
The request in this bug is to make the Ping Pong strategy the default, over Simple Canary, when using Canary w/ Traffic Routing + AWS ALB. This will save time for folks performing the switch and help them evaluate the correct solution (Ping Pong) instead of Canary w/ Traffic Routing, which does not provide zero-downtime deployments.
To Reproduce
Steps to reproduce
Create a namespace with an ingress on AWS ALB.
Create a rollout with the Canary w/ traffic routing strategy.
Send at least 70K TPS over the ingress and have >=300 Pods in the stable target group.
Do a rollout with Canary steps of 0-25-50-75 (the exact weights don't matter).
Once all the steps are completed, look for 503s in the ALB logs.
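For the last step, a small script can count the relevant entries instead of scanning by eye. This is a minimal sketch, assuming the standard space-separated ALB access log layout in which `elb_status_code` and `target_status_code` are the 9th and 10th fields. The function name `count_alb_503` and the sample lines are made up for illustration. It counts only 503s where the target status is `-`, meaning the ALB generated the response itself rather than relaying one from a pod.

```python
def count_alb_503(lines):
    """Count log entries where the ALB itself returned 503.

    Assumes the standard ALB access log field order:
    type, time, elb, client:port, target:port,
    request_processing_time, target_processing_time,
    response_processing_time, elb_status_code, target_status_code, ...
    """
    count = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        elb_status, target_status = fields[8], fields[9]
        # target_status is "-" when no target produced a response
        if elb_status == "503" and target_status == "-":
            count += 1
    return count


# Hypothetical sample entries, shortened for readability.
sample = [
    'https 2023-07-29T00:00:00.000000Z app/my-alb/abc 10.0.0.1:1234 - '
    '-1 -1 -1 503 - 100 200 "GET https://example.com:443/ HTTP/1.1"',
    'https 2023-07-29T00:00:01.000000Z app/my-alb/abc 10.0.0.1:1235 '
    '10.0.1.5:8080 0.001 0.002 0.000 200 200 100 200 '
    '"GET https://example.com:443/ HTTP/1.1"',
]
print(count_alb_503(sample))
```

ALB access logs are delivered to S3 as gzip files, so in practice the lines would come from something like `gzip.open(path, "rt")` over the files covering the end of the rollout.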
Screenshots
Version
v1.5.1
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.
zachaller changed the title from "Recommend Ping Pong strategy as default for Canary with Traffic Routing." to "Recommend Ping Pong strategy as default for Canary with Traffic Routing using ALB." on Jul 29, 2023