Bug 1772154: RHCOS: Bump to 43.81.201911192044.0 for CRI-O bug fix #2666

Merged
openshift-merge-robot merged 1 commit into openshift:master
Nov 20, 2019

Conversation

stbenjam
Member

The current version of RHCOS pinned in the installer has a bug where
CRI-O is overriding the PodIP for the baremetal IPI platform's
containers that use host networking, resulting in cluster installation
failing.

fixes #2646

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 13, 2019
@stbenjam stbenjam changed the title Bugzilla 1772154: RHCOS: Bump to 43.81.201911131545.0 for CRI-O bug fix Bug 1772154: RHCOS: Bump to 43.81.201911131545.0 for CRI-O bug fix Nov 13, 2019
@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Nov 13, 2019
@openshift-ci-robot
Contributor

@stbenjam: This pull request references Bugzilla bug 1772154, which is invalid:

  • expected dependent Bugzilla bug 1771623 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is CLOSED (CURRENTRELEASE) instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1772154: RHCOS: Bump to 43.81.201911131545.0 for CRI-O bug fix

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
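
The validity rule reported in the bot message above can be sketched roughly as follows. This is only an illustration of the check as the message describes it: the allowed states are copied from the message, while the function name and the input shape are hypothetical and do not reflect the bot's actual implementation.

```python
# Hypothetical sketch of the dependent-bug state check described in the bot message.
# The allowed states come from the message; everything else is an assumption.
ALLOWED_DEPENDENT_STATES = ("VERIFIED", "RELEASE_PENDING", "CLOSED (ERRATA)")


def dependent_bug_problems(dependents):
    """Return one message per dependent bug whose state is not allowed.

    `dependents` maps a Bugzilla bug ID to its state string.
    """
    problems = []
    for bug_id, state in dependents.items():
        if state not in ALLOWED_DEPENDENT_STATES:
            problems.append(
                f"expected dependent Bugzilla bug {bug_id} to be in one of the "
                f"following states: {', '.join(ALLOWED_DEPENDENT_STATES)}, "
                f"but it is {state} instead"
            )
    return problems


# The situation reported in this PR: the dependent bug was CLOSED (CURRENTRELEASE).
for problem in dependent_bug_problems({1771623: "CLOSED (CURRENTRELEASE)"}):
    print(problem)
```

An empty result would correspond to the "valid" verdict; any entry corresponds to the "invalid" verdict with its explanation.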

@stbenjam
Member Author

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Nov 13, 2019
@openshift-ci-robot
Contributor

@stbenjam: This pull request references Bugzilla bug 1772154, which is valid. The bug has been moved to the POST state.

In response to this:

/bugzilla refresh


@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Nov 13, 2019
@stbenjam
Member Author

/cc @hardys

@stbenjam
Member Author

/assign @abhinavdahiya

@abhinavdahiya
Contributor

/test e2e-azure

@abhinavdahiya
Contributor

/test e2e-gcp

@abhinavdahiya
Contributor

/test e2e-metal

@abhinavdahiya
Contributor

/test e2e-vsphere

@jcpowermac
Contributor

/test e2e-vsphere

@jcpowermac
Contributor

Probably related to all the failures. At least in vSphere, see screenshot...
image

@stbenjam
Member Author

stbenjam commented Nov 14, 2019

@openshift/openshift-team-red-hat-coreos Hello, any chance someone could look at the above failures on vSphere related to coreos-cryptfs? 🙏

@stbenjam
Member Author

/test e2e-vsphere

@stbenjam
Member Author

vSphere is failing due to a bug in coreos-cryptfs; it may be mistakenly concluding that the VM has a TPM. Other platforms seem OK (after fixing the outstanding compression issues).

[root@localhost ~]# mount /dev/sdb4 /mnt/core/
mount: /mnt/core: unknown filesystem type 'crypto_LUKS'.
[root@localhost ~]# cryptsetup luksOpen /dev/sdb4 crypted_sda4
Enter passphrase for /dev/sdb4: Error reading passphrase from terminal.

We'll need a new RHCOS build once this is fixed.
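
For anyone reproducing the diagnosis above, a minimal sketch of checking whether a partition carries a LUKS header without trying to mount it. It relies on the LUKS on-disk format, where the primary header starts with the 6-byte magic `LUKS\xba\xbe` followed by a big-endian 16-bit version. The function names are illustrative, and `/dev/sdb4` is simply the device from the transcript.

```python
import struct

# First 6 bytes of a LUKS1/LUKS2 primary header.
LUKS_MAGIC = b"LUKS\xba\xbe"


def luks_version(header: bytes):
    """Return the LUKS version if `header` starts with a LUKS header, else None."""
    if len(header) < 8 or header[:6] != LUKS_MAGIC:
        return None
    # The version is a big-endian 16-bit integer right after the magic.
    (version,) = struct.unpack(">H", header[6:8])
    return version


def device_luks_version(path: str):
    """Read the first 8 bytes of a block device or image file and inspect them.

    Reading a block device usually requires root.
    """
    with open(path, "rb") as dev:
        return luks_version(dev.read(8))
```

On the debug host this could be called as `device_luks_version("/dev/sdb4")`; a non-`None` result is consistent with the `crypto_LUKS` type that `mount` reported.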

@stbenjam
Member Author

/honk

@openshift-ci-robot
Contributor

@stbenjam:
goose image

In response to this:

/honk


@stbenjam
Member Author

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 14, 2019
@stbenjam stbenjam changed the title Bug 1772154: RHCOS: Bump to 43.81.201911131545.0 for CRI-O bug fix Bug 1772154: RHCOS: Bump to 43.81.201911131833.0 for CRI-O bug fix Nov 14, 2019
@stbenjam
Member Author

/test e2e-vsphere

@stbenjam
Member Author

/label platform/baremetal

Checking Metal3 CI.

@openshift-ci-robot
Contributor

@stbenjam: The label(s) /label platform/baremetal cannot be applied. These labels are supported: platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga

In response to this:

/label platform/baremetal

Checking Metal3 CI.


@stbenjam
Member Author

stbenjam commented Nov 20, 2019

@stbenjam: The label(s) /label platform/baremetal cannot be applied. These labels are supported: platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga

I kicked off Metal3 CI manually, but why are we still seeing this? platform/baremetal is in that list...

/honk

@openshift-ci-robot
Contributor

@stbenjam:
goose image

In response to this:

@stbenjam: The label(s) /label platform/baremetal cannot be applied. These labels are supported: platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga
I kicked off Metal3 CI manually, but why are we still seeing this? platform/baremetal is in that list...
/honk


@stbenjam
Member Author

Looks like vsphere is still broken. @cgwalters Do you know if the encryption fixes include the problem we were seeing with VMware disks?

@hardys

hardys commented Nov 20, 2019

I kicked off Metal3 CI manually, but why are we still seeing this? platform/baremetal is in that list...

This CI passed, not sure why the metal3ci comment didn't appear here though...

@cgwalters
Member

The vSphere failures are test failures though, not cluster bring-up.
/test e2e-vsphere

@cgwalters
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 20, 2019
@jcpowermac
Contributor

jcpowermac commented Nov 20, 2019

Those are the same test failures I have seen lately. They are not related to this PR.
/lgtm

@sdodson
Member

sdodson commented Nov 20, 2019

/approve

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, JAORMX, jcpowermac, sdodson, stbenjam

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 20, 2019
@stbenjam
Member Author

Thank you all!!

@stbenjam
Member Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 20, 2019
@openshift-ci-robot
Contributor

@stbenjam: The following tests failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-metal | 12d450119a815bcfd06257264fd7f78c25be6374 | link | /test e2e-metal |
| ci/prow/e2e-azure | 12d450119a815bcfd06257264fd7f78c25be6374 | link | /test e2e-azure |
| ci/prow/e2e-vsphere | 3c05621 | link | /test e2e-vsphere |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


@openshift-merge-robot openshift-merge-robot merged commit 5db8dbc into openshift:master Nov 20, 2019
@openshift-ci-robot
Contributor

@stbenjam: All pull requests linked via external trackers have merged. Bugzilla bug 1772154 has been moved to the MODIFIED state.

In response to this:

Bug 1772154: RHCOS: Bump to 43.81.201911192044.0 for CRI-O bug fix


@stbenjam stbenjam deleted the bump-rhcos branch November 20, 2019 16:26
jhixson74 pushed a commit to jhixson74/installer that referenced this pull request Dec 6, 2019
The current version of RHCOS pinned in the installer has a bug where
CRI-O is overriding the PodIP for the baremetal IPI platform's
containers that use host networking, resulting in cluster installation
failing.

Bumping to 43.81.201911192044.0 also brings in some fixes for Azure
FIPS and encryption [1].

Generated with:

  $ hack/update-rhcos-bootimage.py https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.3/43.81.201911192044.0/x86_64/meta.json

Fixes openshift#2646

[1]: openshift#2666 (comment)
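
The `meta.json` URL passed to `hack/update-rhcos-bootimage.py` in the commit message above appears to follow a `<stream>/<build>/<arch>/meta.json` layout. A minimal sketch of building and parsing such URLs, assuming that layout holds in general (it is inferred from this single URL, and the helper names are hypothetical, not part of the installer):

```python
from urllib.parse import urlparse

# Base URL copied from the command in the commit message.
BASE = "https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases"


def meta_url(stream: str, build: str, arch: str = "x86_64") -> str:
    """Build a meta.json URL, assuming <stream>/<build>/<arch>/meta.json."""
    return f"{BASE}/{stream}/{build}/{arch}/meta.json"


def build_from_meta_url(url: str) -> str:
    """Extract the build ID from a meta.json URL of the assumed layout."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected tail of the path: <stream>/<build>/<arch>/meta.json
    if len(parts) < 4 or parts[-1] != "meta.json":
        raise ValueError(f"unexpected meta.json URL layout: {url}")
    return parts[-3]


url = meta_url("rhcos-4.3", "43.81.201911192044.0")
print(url)
print(build_from_meta_url(url))
```

The printed URL can then be passed to the script exactly as in the command above.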
wking added a commit to wking/openshift-installer that referenced this pull request Dec 11, 2019
We suspect 3c05621 (RHCOS: Bump to 43.81.201911192044.0 for CRI-O
bug fix, 2019-11-13, openshift#2666) made etcd sad, with a jump in leader
elections and 'etcdserver: request timed out' [1].  Not clear on why
yet, but here's trying the older RHCOS to see how it plays.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1775878
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.