
Issues during upgrade 4.5 to 4.6 #397

Closed · llomgui opened this issue Nov 29, 2020 · 13 comments
Labels: lifecycle/rotten, triage/needs-information

Comments

@llomgui

llomgui commented Nov 29, 2020

Hello,

I tried to update 4.5.0-0.okd-2020-10-15-235428 to 4.6.0-0.okd-2020-11-27-200126.
I had multiple issues due to being behind a proxy.

First, I allowed storage.googleapis.com through the proxy so the 4.6 image could be validated. The error was: Unable to apply 4.6.0-0.okd-2020-11-27-200126: the image may not be safe to use.

Then I had an issue due to rpm-ostree getting a timeout; it was not using the proxy.
I followed coreos/rpm-ostree#762 (comment) to solve it.
Is it possible to manage this via Ignition files?
Can we specify a mirror to use instead of trying them all twice (ports 80/443)?
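
For anyone else behind a proxy, here is a minimal sketch of the drop-in from coreos/rpm-ostree#762, assuming a hypothetical proxy at http://proxy.example.com:3128 (replace with your own):

# systemd drop-in so rpm-ostreed picks up the proxy environment
sudo mkdir -p /etc/systemd/system/rpm-ostreed.service.d
sudo tee /etc/systemd/system/rpm-ostreed.service.d/proxy.conf <<'EOF'
[Service]
Environment="http_proxy=http://proxy.example.com:3128"
Environment="https_proxy=http://proxy.example.com:3128"
EOF
sudo systemctl daemon-reload
sudo systemctl restart rpm-ostreed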

The last issue was GCP rewriting the hostname.
Fix: sudo hostnamectl set-hostname okdmaster1a.example.com
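
If the hostname keeps getting rewritten, a rough check, assuming the culprit is a unit named gcp-hostname.service (a guess; the exact unit name may differ on your release):

systemctl status gcp-hostname.service     # hypothetical unit name; check whether it ran or failed
sudo systemctl mask gcp-hostname.service  # keep it from rewriting the hostname on the next boot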

@thurcombe

Confirmed: 4.5 to 4.6 is a problem with rpm-ostree, so a workaround is required. But I guess once your cluster is at 4.6 the Fedora repos should be disabled, and therefore this won't be a problem going forward.

@vrutkovs
Member

rpm-ostree getting timeout

Which timeout is it? We're disabling all available repos before proceeding to update, as all necessary RPMs are already included in machine-os-content. Please collect a must-gather.
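
For reference, a must-gather can be collected and attached with something like the following (the destination directory name is just an example):

oc adm must-gather --dest-dir=./must-gather-upgrade      # dumps cluster state into the given directory
tar czf must-gather-upgrade.tar.gz must-gather-upgrade   # archive it for attaching to the issue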

@vrutkovs added the triage/needs-information label on Nov 29, 2020
@thurcombe

thurcombe commented Nov 29, 2020

My 4.5.0-0.okd-2020-10-15-235428 to 4.6.0-0.okd-2020-11-27-200126 upgrade failed with the same issue. I added the drop-in to set the proxy environment variables, which allowed my upgrade to proceed.

Just poked at one of my masters post-upgrade; these repos are still enabled:

sh-5.0# grep enabled=1 /etc/yum.repos.d/*
/etc/yum.repos.d/fedora-cisco-openh264.repo:enabled=1
/etc/yum.repos.d/fedora-updates-archive.repo:enabled=1
/etc/yum.repos.d/fedora-updates.repo:enabled=1
/etc/yum.repos.d/fedora.repo:enabled=1

As per our discussion in #389, this didn't seem to be an issue on a fresh 4.6 UPI install.

Same check on my 4.6 cluster that was built from scratch:

sh-5.0# grep enabled=1 /etc/yum.repos.d/*
sh-5.0# 

@bobby0724

Is the 4.5 to 4.6 upgrade really supported?

@vrutkovs
Copy link
Member

Just poked at one of my masters post-upgrade; these repos are still enabled:

Right, that seems to be a bug: OKD doesn't need enabled repos when updating, as it ships all RPMs with it. This is resolved during a fresh install (we disable all Fedora repos by default), but it is still open on update, and the proxy env is not set there. @thurcombe, mind filing a separate bug for that?
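
Until that is fixed, a rough per-node workaround (my own assumption, not an official fix) is to flip the leftover repos off by hand on each upgraded node:

# disable any Fedora repos left enabled after the upgrade
sudo sed -i 's/^enabled=1/enabled=0/' /etc/yum.repos.d/*.repo
grep enabled=1 /etc/yum.repos.d/*    # should now return nothing, matching a fresh 4.6 install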

@vrutkovs
Member

First, I allowed storage.googleapis.com through the proxy so the 4.6 image could be validated. The error was: Unable to apply 4.6.0-0.okd-2020-11-27-200126: the image may not be safe to use.

That's expected: CVO checks image signatures stored on GCS. Seems to be a docs bug.
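
For anyone checking whether they hit the same signature-verification failure, the message shows up on the ClusterVersion object; a quick sketch, assuming the default object name version:

oc adm upgrade                                                      # shows the current update and any error
oc get clusterversion version -o jsonpath='{.status.conditions}'    # raw conditions, where the failure message appears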

The last issue was GCP rewriting the hostname.
Fix: sudo hostnamectl set-hostname okdmaster1a.example.com

Is the cluster installed on GCP, or is this service being run mistakenly (#396)?

@thurcombe

Is the cluster installed on GCP or this service being run mistakenly (#396)?

Just for info: after our previous discussion regarding a fresh 4.6 install, I noted that gcp hostname was listed as a failed unit in my UPI. I figured it was a non-issue but am mentioning it here in case it helps. I'll raise a new defect for the repo problem.

@llomgui
Author

llomgui commented Nov 30, 2020

Another issue is MountVolume.SetUp failed for volume "var-lib-tuned-profiles-data" : stat /var/lib/kubelet/pods/687ac4e3-cb54-41d2-a31c-6c7d36d4be74/volumes/kubernetes.io~configmap/var-lib-tuned-profiles-data: no such file or directory

Apparently the tuned pods cannot mount their configmap volume after an upgrade.
The configmap is the default one, empty.

The solution is to delete all the pods; then they will be running again.
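
For completeness, a sketch of that cleanup, assuming the tuned pods live in the openshift-cluster-node-tuning-operator namespace:

oc -n openshift-cluster-node-tuning-operator delete pods --all    # the operator recreates them
oc -n openshift-cluster-node-tuning-operator get pods             # they should come back Running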

@giatule

giatule commented Jan 5, 2021

I got the same issues and tried to delete all pods, but the openshift-console pods are still in CrashLoopBackOff.

(screenshots attached)

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label on Apr 5, 2021
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on May 5, 2021
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci

openshift-ci bot commented Jun 5, 2021

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci bot closed this as completed on Jun 5, 2021