-
Notifications
You must be signed in to change notification settings - Fork 904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Infinite loop with PreviewReplicaCount set #308
Conversation
rolloutReplicas := defaults.GetReplicasOrDefault(rollout.Spec.Replicas) | ||
if *(newRS.Spec.Replicas) == rolloutReplicas { | ||
// Scaling not required. | ||
return false, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the same if statement in scaleReplicaSetAndRecordEvent() (https://github.com/argoproj/argo-rollouts/blob/master/rollout/sync.go#L295) so this check is not needed.
// Scaling not required. | ||
return false, nil | ||
} | ||
if *(newRS.Spec.Replicas) > rolloutReplicas { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, if someone set their previewReplicaCount to greater than the .Spec.Replica
, the controller would get into an infinite loop as it jump between the two values as a result of this check
@@ -5,7 +5,7 @@ jobs: | |||
docker: | |||
# CircleCI Go images available at: https://hub.docker.com/r/circleci/golang/ | |||
- image: circleci/golang:1.13.1 | |||
resource_class: large |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was getting repeated failed workflows with the following error
/usr/local/go/pkg/tool/linux_amd64/link: signal: killed
https://app.circleci.com/jobs/github/argoproj/argo-rollouts/651
https://app.circleci.com/jobs/github/argoproj/argo-rollouts/652
https://app.circleci.com/jobs/github/argoproj/argo-rollouts/650
This Stackoverflow suggested it was a resourcing issue: https://stackoverflow.com/questions/52129612/golang-circleci-2-0-test-failing-with-signal-killed
Codecov Report
@@ Coverage Diff @@
## master #308 +/- ##
==========================================
+ Coverage 83.82% 83.88% +0.05%
==========================================
Files 66 66
Lines 6042 6027 -15
==========================================
- Hits 5065 5056 -9
+ Misses 701 698 -3
+ Partials 276 273 -3
Continue to review full report at Codecov.
|
* Fix Infinite loop with PreviewReplicaCount set * Use xlarge instead of large
* Fix Infinite loop with PreviewReplicaCount set * Use xlarge instead of large
Solves #301 by removing the scaling preview ReplicaSet section in the
scaleBlueGreen
method. That section would scale up the preview ReplicaSet to the.spec.Replicas
count without taking the rollout pause status into account (https://github.com/argoproj/argo-rollouts/compare/fix-infinite-loop?expand=1#diff-cfad87918ee0d3175ebd8e47f5efd159L309 vs https://github.com/argoproj/argo-rollouts/blob/master/utils/replicaset/replicaset.go#L89). The rollout should only scale to the.spec.Replicas
count when the rollout is paused and the ScaleUpPreviewCheckPoint is set to true.The
scaleBlueGreen
method also scales the new ReplicaSet. Since the new ReplicaSet is the same as the preview ReplicaSet, the controller scales it to the.spec.strategy.bluegreen.previewReplicaCount
count (as the rollout is still paused). These two updates on the same ReplicaSet cause an infinite loop as they fight over the.spec.Replica
value.We did not hit this bug earlier since this code only happened while the scaling. With v0.6, the controller leverages that code path whenever the rollout is paused.
In addition, I added some tests to confirm the behavior of when to set the ScaleUpPreviewCheckPoint value.