Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: backup kubelet config for "upgrade node" and "upgrade apply" #114695

Merged
merged 4 commits into from
Feb 7, 2023

Conversation

chendave
Copy link
Member

@chendave chendave commented Dec 26, 2022

test result with kinder: https://gist.github.com/chendave/64f7825cf7faa7f30774fe6c48053c21

/kind feature

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

kubelet config file will be backed up to `/etc/kubernetes/tmp/` folder with `kubeadm-kubelet-config` append with a random suffix as the filename 

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 26, 2022
@k8s-ci-robot k8s-ci-robot added area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 26, 2022
@chendave
Copy link
Member Author

/hold
need more self-testing, but I believe this pr will fix the issue we found.

/cc @neolit123 @pacoxu

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 26, 2022
@k8s-ci-robot k8s-ci-robot requested a review from pacoxu December 26, 2022 09:58
@chendave chendave force-pushed the fix_cross_move branch 3 times, most recently from 4a2a5ed to f773f74 Compare December 27, 2022 08:38
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 27, 2022
@chendave
Copy link
Member Author

/hold cancel
this is ready for review now.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 27, 2022
@chendave chendave force-pushed the fix_cross_move branch 3 times, most recently from 0ad6b18 to cc42ad6 Compare December 28, 2022 05:55
@chendave
Copy link
Member Author

/test pull-kubernetes-e2e-gce

@pacoxu
Copy link
Member

pacoxu commented Jan 3, 2023

I am working on some old PR follow-ups this week and will update some KEPs in this week and next. I will get back to this later this week or next then.
/assign

@chendave
Copy link
Member Author

chendave commented Jan 4, 2023

need a rebase since #114719 was merged.

// Write the configuration for the kubelet down to disk and print the generated manifests instead if dry-running.
// If not dry-running, the kubelet config file will be backed up to /etc/kubernetes/tmp/ dir, so that it could be
// recovered if there is anything goes wrong.
upgrade.WriteKubeletConfigFiles(cfg, data.PatchesDir(), dryRun, data.OutputWriter())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we handle the return err here?

err := upgrade.WriteKubeletConfigFiles(cfg, data.PatchesDir(), dryRun, data.OutputWriter())

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! addressed.

chendave and others added 2 commits February 2, 2023 11:43
This addresses the TODO item so that the old kubelet config file could
be recovered if something goes wrong.

Signed-off-by: Dave Chen <[email protected]>
Co-authored-by: Paco Xu <[email protected]>
The root cause for that error is because `rename` doesn't work
across different mount points.

The kubelet config file and back up directory are mounted to
different file system in kinder environment.

```
df /var/lib/kubelet/config.yaml | tail -n1 | awk '{print $1}'
/dev/sda2

df /etc/kubernetes/tmp/kubeadm-kubelet-configxxx | tail -n1 | awk '{print $1}'
overlay
```

Call `cp` instead of `rename` to back up the kubelet file would fix
that issue.

Signed-off-by: Dave Chen <[email protected]>
The error was ingored which means if anything wrong from `PrintDryRunFiles`,
it was sliently ignored.

Signed-off-by: Dave Chen <[email protected]>
@chendave
Copy link
Member Author

chendave commented Feb 2, 2023

@pacoxu pls see https://github.com/kubernetes/kubernetes/compare/f6b04fe5d4e168067d978440af4464134f57caa4..484d0d542e3adbcbda08a48236daabcfe1649504 for the diff.

added the check for privileged user so that the test can only run with the root user. Your comments leads me to think about the regression we have: #114824 👍

Back up kubelet config file for `kubeadm upgrade apply`, some code
refactoring is done to de-dup some redundant code logic.

Signed-off-by: Dave Chen <[email protected]>
@pacoxu
Copy link
Member

pacoxu commented Feb 2, 2023

/priority important-soon
/triage accepted

https://github.com/kubernetes/kubernetes/compare/f6b04fe5d4e168067d978440af4464134f57caa4..484d0d542e3adbcbda08a48236daabcfe1649504 for the diff.

added the check for privileged user so that the test can only run with the root user. Your comments leads me to think about the regression we have: #114824 👍

I am not sure if this needs to be fixed or not, and is this the best practice to skip it like this?

Overall
/lgtm

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 2, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 2bb78489bb0588d545d9c542de3d6353e3440aeb

@chendave
Copy link
Member Author

chendave commented Feb 2, 2023

I am not sure if this needs to be fixed or not, and is this the best practice to skip it like this?

my understanding is that upgrade needs the user to be root, so it is non-sense to run those test if the user is not root, besides, I was thinking to create the dir which every user has the permission to write on all the arches & os, but I don't think its possible.

@neolit123
Copy link
Member

/hold
/approve

please test with kinder and the related e2e test workflow before merging / removing hold.
docs on how to use kinder can be found in the k/kubeadm repo

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 7, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chendave, neolit123

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 7, 2023
@neolit123
Copy link
Member

neolit123 commented Feb 7, 2023

actually i realize it's not so simple to test uncommited code with kinder....that is because the tool is mostly designed for testing artifacts that are already build from k/k commits in CI.

alternatively just remove the hold watch the e2e test signal in the next 2 days and revert / update if needed:
https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm

@chendave
Copy link
Member Author

chendave commented Feb 7, 2023

@neolit123 I tested it locally, and here is the link of the result I pasted in the desc,

test result with kinder: https://gist.github.com/chendave/64f7825cf7faa7f30774fe6c48053c21

@chendave
Copy link
Member Author

chendave commented Feb 7, 2023

/hold cancel

let's see if the CI job is happy.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 7, 2023
@pacoxu
Copy link
Member

pacoxu commented Feb 7, 2023

/retest

@k8s-ci-robot k8s-ci-robot merged commit 8e20eff into kubernetes:master Feb 7, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Feb 7, 2023
@chendave chendave deleted the fix_cross_move branch February 7, 2023 10:34
@pacoxu
Copy link
Member

pacoxu commented Feb 7, 2023

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-kubeadm-kinder-patches-latest/1622937946548080640
This passed after merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants