✨ Implement MachineDeployment rolloutAfter support #7053
Conversation
Have to take a look at this :-)
From a quick glance, the current changes make sense to me, although they touch the hashing code that @fabriziopandini was looking at for in-place propagation of labels and annotations.

Fair 👍 so better to hold this and adapt depending on what in-place propagation may change.

Yup. +/- ideally consider what we want to do in this PR during the implementation of in-place mutation so it fits nicely.
```go
// see https://github.com/kubernetes/kubernetes/issues/40415
// Besides only considering MachineSets which have an equivalent MachineTemplateSpec, we choose the MachineSet
// which has the most recent RolloutAfter annotation set (if any) or, as a second criterion, the oldest one.
sort.Sort(MachineSetsByRolloutAfterAnnotationAndCreationTimestamp(msList))
```
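For illustration, here is a minimal sketch of what such a comparator could look like. The type name matches the call above, but the annotation key and the exact field handling are assumptions, not the PR's actual code:

```go
import (
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// MachineSetsByRolloutAfterAnnotationAndCreationTimestamp sorts by the most
// recent rolloutAfter annotation first, then by the oldest creation
// timestamp, then by name. The annotation key below is a placeholder.
type MachineSetsByRolloutAfterAnnotationAndCreationTimestamp []*clusterv1.MachineSet

func (o MachineSetsByRolloutAfterAnnotationAndCreationTimestamp) Len() int      { return len(o) }
func (o MachineSetsByRolloutAfterAnnotationAndCreationTimestamp) Swap(i, j int) { o[i], o[j] = o[j], o[i] }

func (o MachineSetsByRolloutAfterAnnotationAndCreationTimestamp) Less(i, j int) bool {
	// Most recent rolloutAfter annotation wins; RFC 3339 timestamps in the
	// same format sort lexicographically, and an unset annotation ("") loses.
	ri := o[i].Annotations["machinedeployment.clusters.x-k8s.io/rollout-after"]
	rj := o[j].Annotations["machinedeployment.clusters.x-k8s.io/rollout-after"]
	if ri != rj {
		return ri > rj
	}
	// Second criterion: the oldest MachineSet.
	if !o[i].CreationTimestamp.Equal(&o[j].CreationTimestamp) {
		return o[i].CreationTimestamp.Before(&o[j].CreationTimestamp)
	}
	// Final tie-breaker: name.
	return o[i].Name < o[j].Name
}
```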
I was looking at this logic in the context of Node label propagation / in-place upgrades, and I noticed that this approach can cause turbulence in the Cluster, because it leads to picking one of the matching MS without taking into account where the Machines are. So IMO the sort criteria should be modified to pick the MS with the most Machines on it (*), e.g. along the lines of the sketch below.

This could probably also simplify the entire logic by dropping the annotation on the MS; the rollout would then be triggered by the `if` in the next `for` loop, which drops a MS once rolloutAfter has passed.

(*) this could be a separate PR that we merge as a prerequisite of this one
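A rough, purely hypothetical illustration of that idea (`sortByMostMachines` is an invented helper, and approximating "most Machines" via `status.replicas` is an assumption):

```go
import (
	"sort"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// sortByMostMachines prefers the MachineSet that currently owns the most
// Machines (approximated here by status.replicas) to minimize churn, using
// the creation timestamp as a tie-breaker.
func sortByMostMachines(msList []*clusterv1.MachineSet) {
	sort.Slice(msList, func(i, j int) bool {
		if msList[i].Status.Replicas != msList[j].Status.Replicas {
			return msList[i].Status.Replicas > msList[j].Status.Replicas
		}
		return msList[i].CreationTimestamp.Before(&msList[j].CreationTimestamp)
	})
}
```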
```diff
@@ -254,6 +262,42 @@ func (r *Reconciler) getNewMachineSet(ctx context.Context, d *clusterv1.MachineD
 	return createdMS, err
 }
 
+func generateMachineSetName(d *clusterv1.MachineDeployment, now *metav1.Time) (string, string, error) {
```
Hash is currently used as:

- a UID to identify Machines belonging to the MS
- a unique suffix for the MS name

Given that, I'm really wondering if we should drop the current spew/hash logic and simply use a random string plus a check that verifies the random string is not already taken by an existing MS (for this MD). It seems the code could be re-entrant this way as well, and we could get rid of all this complex logic (*) ... see the sketch below.

@vincepri @enxebre @sbueringer opinions?

(*) this could be a separate PR that we merge as a prerequisite of this one
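To make the proposal concrete, a sketch under stated assumptions: the helper name, the retry bound, and the way existing names are passed in are all invented here.

```go
import (
	"fmt"

	utilrand "k8s.io/apimachinery/pkg/util/rand"
)

// generateUniqueMachineSetName appends a random 5-character suffix and
// retries until the name is not already taken by an existing MachineSet of
// this MachineDeployment (hypothetical sketch, not existing code).
func generateUniqueMachineSetName(baseName string, taken map[string]bool) (string, error) {
	const maxAttempts = 10
	for i := 0; i < maxAttempts; i++ {
		name := fmt.Sprintf("%s-%s", baseName, utilrand.String(5))
		if !taken[name] {
			return name, nil
		}
	}
	return "", fmt.Errorf("could not find a free MachineSet name for %q after %d attempts", baseName, maxAttempts)
}
```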
@fabriziopandini Sorry, I missed the mention somehow.

Sounds fine to me, assuming we can make this re-entrant (I didn't look at the code in detail to see how this would be achieved).

+100 to making this a separate PR, independent of this work and the propagation work.

Would be nice to get rid of the hash early in the v1.4 cycle to give us time to discover potential side effects.
This is gonna be replaced by #7053, so closing in favor of it.
What this PR does / why we need it:
If the reconciliation time is after spec.rolloutAfter, then a rollout should happen or has already happened.

A new MachineSet will be created the first time the reconciliation time is after spec.rolloutAfter. Otherwise, the oldest MachineSet with a creation timestamp newer than the lastRolloutAfter annotation is picked.

If a new MachineSet is required because the reconciliation time is past spec.rolloutAfter, the rolloutAfter time is included when computing the hash for the MachineSet name. This way the new MachineSet's name does not clash with the existing MachineSet that has the same template, and the rollout can be orchestrated as usual. A sketch of the decision follows right below.

Co-authored-by: Enxebre [email protected]
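A minimal sketch of the decision described above, assuming the new `spec.rolloutAfter` field is a `*metav1.Time`; the helper name and its inputs are assumptions, not the PR's exact code:

```go
import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// rolloutAfterRequiresNewMachineSet reports whether rolloutAfter forces the
// creation of a fresh MachineSet. newestMS is the newest MachineSet matching
// the MachineDeployment's template, or nil if none exists yet.
func rolloutAfterRequiresNewMachineSet(d *clusterv1.MachineDeployment, newestMS *clusterv1.MachineSet, now metav1.Time) bool {
	// No rolloutAfter set, or it is still in the future: nothing to do.
	if d.Spec.RolloutAfter == nil || now.Before(d.Spec.RolloutAfter) {
		return false
	}
	// A MachineSet created after rolloutAfter means the rollout already
	// happened; otherwise a new MachineSet is needed.
	return newestMS == nil || newestMS.CreationTimestamp.Before(d.Spec.RolloutAfter)
}
```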
Compared to the previous PR at #4596 I made the following changes:

- Changed the `generateMachineSetName` func to not append another hash to the name, because this would extend the Machine object name, which could cause other unexpected issues for providers / machines due to the extra length. Instead I decided to recalculate the hash using the same information plus the rolloutAfter value (see the sketch after this list).
- `MachineDeployment.Spec.RolloutAfter` now gets added to the MachineSet when it is created. With that, the sorting algorithm returns the MachineSet using the following sort criteria:
  1. `> lastRolloutAnnotation`
  2. `< creationTimestamp`
  3. `< Name`
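A simplified sketch of folding rolloutAfter into the name hash (the helper is an assumption; the actual PR builds on the existing spew-based hashing rather than the plain `fmt`/`fnv` shown here):

```go
import (
	"fmt"
	"hash/fnv"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// computeNameHash hashes the MachineTemplateSpec as before, but folds in
// rolloutAfter once the reconciliation time has passed it, so the new
// MachineSet gets a name distinct from the existing one with the same template.
func computeNameHash(template clusterv1.MachineTemplateSpec, rolloutAfter *metav1.Time, now metav1.Time) uint32 {
	hasher := fnv.New32a()
	fmt.Fprintf(hasher, "%v", template)
	if rolloutAfter != nil && now.After(rolloutAfter.Time) {
		// Folding in the fixed rolloutAfter timestamp (not "now") keeps the
		// hash deterministic per rollout, so reconciliation stays re-entrant.
		fmt.Fprintf(hasher, "%s", rolloutAfter.UTC().Format(time.RFC3339))
	}
	return hasher.Sum32()
}
```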
Which issue(s) this PR fixes (optional, in `fixes #<issue_number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):

Fixes #4536
Additional information
Current sorting algorithm:

1. `< creationTimestamp`
2. `< Name`

Table to determine all kinds of cases (I hope this does not cause more confusion than not having this info; it did help to find the correct implementation):

Reduced table by Case:

Case description: