✨ Implement MachineDeployment rolloutAfter support #7053
Conversation
Have to take a look at this :-)
From a quick glance, the current changes make sense to me, although they touch the hashing code that @fabriziopandini was looking at for in-place propagation of labels and annotations.

Fair 👍 so better to hold this and adapt depending on what in-place propagation may change.

Yup. +/- ideally consider what we want to do in this PR during the implementation of in-place mutation so it fits nicely.
```go
// see https://github.com/kubernetes/kubernetes/issues/40415
// Besides only considering MachineSets which have an equivalent MachineTemplateSpec, we choose the MachineSet
// which has the most recent RolloutAfter annotation set (if any) or, as a second criterion, the oldest one.
sort.Sort(MachineSetsByRolloutAfterAnnotationAndCreationTimestamp(msList))
```
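For illustration, here is a minimal sketch of what such a comparator could look like. The type name matches the call above, but the annotation key and the exact field handling are assumptions, not the PR's actual code:

```go
import (
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// MachineSetsByRolloutAfterAnnotationAndCreationTimestamp sorts by the most
// recent rolloutAfter annotation first, then by the oldest creation
// timestamp, then by name. The annotation key below is a placeholder.
type MachineSetsByRolloutAfterAnnotationAndCreationTimestamp []*clusterv1.MachineSet

func (o MachineSetsByRolloutAfterAnnotationAndCreationTimestamp) Len() int      { return len(o) }
func (o MachineSetsByRolloutAfterAnnotationAndCreationTimestamp) Swap(i, j int) { o[i], o[j] = o[j], o[i] }

func (o MachineSetsByRolloutAfterAnnotationAndCreationTimestamp) Less(i, j int) bool {
	// Most recent rolloutAfter annotation wins; RFC 3339 timestamps in the
	// same format sort lexicographically, and an unset annotation ("") loses.
	ri := o[i].Annotations["machinedeployment.clusters.x-k8s.io/rollout-after"]
	rj := o[j].Annotations["machinedeployment.clusters.x-k8s.io/rollout-after"]
	if ri != rj {
		return ri > rj
	}
	// Second criterion: the oldest MachineSet.
	if !o[i].CreationTimestamp.Equal(&o[j].CreationTimestamp) {
		return o[i].CreationTimestamp.Before(&o[j].CreationTimestamp)
	}
	// Final tie-breaker: name.
	return o[i].Name < o[j].Name
}
```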
I was looking at this logic in the context of Node label propagation / in-place upgrades, and I noticed that this approach can cause turbulence in the Cluster, because it leads to picking one of the matching MS without taking into account where the Machines are. So IMO the sort criteria should be modified to pick the MS with the most Machines on it (*), e.g. along the lines of the sketch below.

This could probably also simplify the entire logic by dropping the annotation on the MS; the rollout would then be triggered by the `if` in the next `for` loop, which drops a MS once rolloutAfter has passed.

(*) this could be a separate PR that we merge as a prerequisite of this one
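A rough, purely hypothetical illustration of that idea (`sortByMostMachines` is an invented helper, and approximating "most Machines" via `status.replicas` is an assumption):

```go
import (
	"sort"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// sortByMostMachines prefers the MachineSet that currently owns the most
// Machines (approximated here by status.replicas) to minimize churn, using
// the creation timestamp as a tie-breaker.
func sortByMostMachines(msList []*clusterv1.MachineSet) {
	sort.Slice(msList, func(i, j int) bool {
		if msList[i].Status.Replicas != msList[j].Status.Replicas {
			return msList[i].Status.Replicas > msList[j].Status.Replicas
		}
		return msList[i].CreationTimestamp.Before(&msList[j].CreationTimestamp)
	})
}
```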
```diff
@@ -254,6 +262,42 @@ func (r *Reconciler) getNewMachineSet(ctx context.Context, d *clusterv1.MachineD
 	return createdMS, err
 }
 
+func generateMachineSetName(d *clusterv1.MachineDeployment, now *metav1.Time) (string, string, error) {
```
Hash is currently used as:

- a UID to identify Machines belonging to the MS
- a unique suffix for the MS name

Given that, I'm really wondering if we should drop the current spew/hash logic and simply use a random string plus a check that verifies the random string is not already taken by an existing MS (for this MD). It seems the code could be re-entrant this way as well, and we could get rid of all this complex logic (*) ... see the sketch below.

@vincepri @enxebre @sbueringer opinions?

(*) this could be a separate PR that we merge as a prerequisite of this one
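To make the proposal concrete, a sketch under stated assumptions: the helper name, the retry bound, and the way existing names are passed in are all invented here.

```go
import (
	"fmt"

	utilrand "k8s.io/apimachinery/pkg/util/rand"
)

// generateUniqueMachineSetName appends a random 5-character suffix and
// retries until the name is not already taken by an existing MachineSet of
// this MachineDeployment (hypothetical sketch, not existing code).
func generateUniqueMachineSetName(baseName string, taken map[string]bool) (string, error) {
	const maxAttempts = 10
	for i := 0; i < maxAttempts; i++ {
		name := fmt.Sprintf("%s-%s", baseName, utilrand.String(5))
		if !taken[name] {
			return name, nil
		}
	}
	return "", fmt.Errorf("could not find a free MachineSet name for %q after %d attempts", baseName, maxAttempts)
}
```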
@fabriziopandini Sorry, I missed the mention somehow.

Sounds fine to me, assuming we can make this re-entrant (I didn't look at the code in detail to see how this would be achieved).

+100 to making this a separate PR, independent of this work and the propagation work.

Would be nice to get rid of the hash early in the v1.4 cycle to give us time to discover potential side effects.
This is gonna be replaced by #7053, so closing in favor of it.
What this PR does / why we need it:
If the reconciliation time is after spec.rolloutAfter, then a rollout should happen or has already happened.

A new MachineSet will be created the first time the reconciliation time is after spec.rolloutAfter. Otherwise, the oldest MachineSet with a creation timestamp newer than the lastRolloutAfter annotation is picked.

If a new MachineSet is required because the reconciliation time is past spec.rolloutAfter, the rolloutAfter time is included when computing the hash for the MachineSet name. This way the new MachineSet's name does not clash with the existing MachineSet that has the same template, and the rollout can be orchestrated as usual. A sketch of the decision follows right below.

Co-authored-by: Enxebre [email protected]
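A minimal sketch of the decision described above, assuming the new `spec.rolloutAfter` field is a `*metav1.Time`; the helper name and its inputs are assumptions, not the PR's exact code:

```go
import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// rolloutAfterRequiresNewMachineSet reports whether rolloutAfter forces the
// creation of a fresh MachineSet. newestMS is the newest MachineSet matching
// the MachineDeployment's template, or nil if none exists yet.
func rolloutAfterRequiresNewMachineSet(d *clusterv1.MachineDeployment, newestMS *clusterv1.MachineSet, now metav1.Time) bool {
	// No rolloutAfter set, or it is still in the future: nothing to do.
	if d.Spec.RolloutAfter == nil || now.Before(d.Spec.RolloutAfter) {
		return false
	}
	// A MachineSet created after rolloutAfter means the rollout already
	// happened; otherwise a new MachineSet is needed.
	return newestMS == nil || newestMS.CreationTimestamp.Before(d.Spec.RolloutAfter)
}
```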
Compared to the previous PR at #4596 I made the following changes:

- Changed the `generateMachineSetName` func to not append another hash to the name, because this would extend the Machine object name, which could cause other unexpected issues for providers / machines due to the extra length. Instead I decided to recalculate the hash using the same information plus the rolloutAfter value (see the sketch after this list).
- `MachineDeployment.Spec.RolloutAfter` now gets added to the MachineSet when it is created. With that, the sorting algorithm returns the MachineSet using the following sort criteria:
  1. `> lastRolloutAnnotation`
  2. `< creationTimestamp`
  3. `< Name`
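A simplified sketch of folding rolloutAfter into the name hash (the helper is an assumption; the actual PR builds on the existing spew-based hashing rather than the plain `fmt`/`fnv` shown here):

```go
import (
	"fmt"
	"hash/fnv"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// computeNameHash hashes the MachineTemplateSpec as before, but folds in
// rolloutAfter once the reconciliation time has passed it, so the new
// MachineSet gets a name distinct from the existing one with the same template.
func computeNameHash(template clusterv1.MachineTemplateSpec, rolloutAfter *metav1.Time, now metav1.Time) uint32 {
	hasher := fnv.New32a()
	fmt.Fprintf(hasher, "%v", template)
	if rolloutAfter != nil && now.After(rolloutAfter.Time) {
		// Folding in the fixed rolloutAfter timestamp (not "now") keeps the
		// hash deterministic per rollout, so reconciliation stays re-entrant.
		fmt.Fprintf(hasher, "%s", rolloutAfter.UTC().Format(time.RFC3339))
	}
	return hasher.Sum32()
}
```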
Which issue(s) this PR fixes (optional, in `fixes #<issue_number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):

Fixes #4536
Additional information
Current sorting algorithm:

1. `< creationTimestamp`
2. `< Name`

Table to determine all kinds of cases (I hope this does not cause more confusion than not having this info; it did help to find the correct implementation):

Reduced table by Case:

Case description: