equinixmetal: Exclude weird devices from flatcar installation #364

Merged (3 commits) on Aug 30, 2022

@krnowak krnowak commented Aug 29, 2022

The s3.xlarge.x86 instances have some NVMe disks attached, and these are the smallest of all the available disks by size in bytes. Since we pass the -s flag to flatcar-install (which makes the script pick the smallest disk to install Flatcar on), they get picked. For some reason, they can't be booted from. The NVMe devices have major number 259, whereas the other disks have major number 8.

There were two ways to fix the issue: either exclude major 259 or include only major 8. I went with the former.
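To make the selection problem concrete, here is a minimal Go sketch of "pick the smallest disk, skipping excluded majors". It is not the flatcar-install implementation; the `Disk` type, the `smallestDisk` function, and the disk sizes are all hypothetical and exist only for illustration.

```go
package main

import "fmt"

// Disk is a hypothetical, simplified view of a block device.
type Disk struct {
	Name      string
	Major     int // device major number (8 and 259 in this PR)
	SizeBytes uint64
}

// smallestDisk returns the smallest disk whose major number is not in
// excludedMajors. The second result is false when no disk qualifies.
func smallestDisk(disks []Disk, excludedMajors map[int]bool) (Disk, bool) {
	var best Disk
	found := false
	for _, d := range disks {
		if excludedMajors[d.Major] {
			continue // skip devices with an excluded major
		}
		if !found || d.SizeBytes < best.SizeBytes {
			best = d
			found = true
		}
	}
	return best, found
}

func main() {
	// Made-up sizes: the only property that matters is that the
	// NVMe disk is the smallest one.
	disks := []Disk{
		{Name: "sda", Major: 8, SizeBytes: 480_000_000_000},
		{Name: "sdb", Major: 8, SizeBytes: 8_000_000_000_000},
		{Name: "nvme0n1", Major: 259, SizeBytes: 256_000_000_000},
	}

	if d, ok := smallestDisk(disks, nil); ok {
		fmt.Println("without exclusion:", d.Name)
	}
	if d, ok := smallestDisk(disks, map[int]bool{259: true}); ok {
		fmt.Println("excluding major 259:", d.Name)
	}
}
```

With these made-up inputs, the unfiltered selection lands on the NVMe disk, while excluding major 259 moves it to the smallest major-8 disk. An include-only-8 filter would give the same result here, but would also drop any other device class that might show up later.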

While looking for the fix, I also did some small cleanup.

The commit messages still need to be rewritten.

I also had a commit that drops the fiddling with the AlwaysPXE flag and uses the reinstall action instead. This might be worth considering, but it's certainly slower, since all the disks seem to be zeroed. In the case of the s3.xlarge.x86 instance, the reinstallation caused the timeout to be triggered. Maybe something for a follow-up PR.

Fixes flatcar/Flatcar#834

  • Changelog entries added in the respective changelog/ directory (user-facing change, bug fix, security fix, update)

I think we forgot to destroy the device in one place. Instead of
trying to remember to add the destruction in every fail case, just
destroy the device by default and skip doing it only if everything
succeeds.
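
The cleanup approach described in this commit message (destroy by default, skip only on success) maps naturally onto a deferred function guarded by a success flag. The following is a rough Go sketch of that pattern, not the code from this PR; `device`, `provision`, and `errStep` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errStep is a hypothetical error used to simulate a failing setup step.
var errStep = errors.New("step failed")

// device is a hypothetical stand-in for a provisioned machine.
type device struct {
	id        string
	destroyed bool
}

func (d *device) destroy() { d.destroyed = true }

// provision creates a device and runs the given setup steps on it.
// The device is destroyed by default; the cleanup is skipped only
// when every step has succeeded.
func provision(id string, steps ...func(*device) error) (*device, error) {
	dev := &device{id: id}
	success := false
	defer func() {
		if !success {
			dev.destroy()
		}
	}()

	for _, step := range steps {
		if err := step(dev); err != nil {
			// No explicit destroy call needed: the deferred
			// function handles every failure path.
			return nil, fmt.Errorf("provisioning %s: %w", id, err)
		}
	}

	success = true
	return dev, nil
}

func main() {
	var failed *device
	_, err := provision("example-1", func(d *device) error {
		failed = d
		return errStep
	})
	fmt.Println("error:", err, "destroyed:", failed.destroyed)

	dev, err := provision("example-2")
	fmt.Println("error:", err, "destroyed:", dev.destroyed)
}
```

The benefit is that a newly added failure path cannot forget the cleanup, because the only way to keep the device is to reach the line that sets the success flag.
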
I don't know if NVMe disks are a recent addition to s3.xlarge.x86
instances, or maybe those disks got shrunk, but the flatcar-install
script was picking one of them to install Flatcar on. For some reason,
the boot agent is not able to boot from them; maybe some additional
setup is needed, which is missing. Fortunately, the script already had
an option for excluding devices by major number, so use this
functionality. That way, the script installs the OS on one of the
`/dev/sdX` disks, which are bootable.

For device majors see the kernel documentation at:
https://www.kernel.org/doc/Documentation/admin-guide/devices.txt
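
As a small illustration of the major numbers involved: on Linux, `/sys/block/<name>/dev` holds a device's numbers as a `major:minor` string, and the kernel's device list assigns block major 8 to SCSI disk devices and block major 259 to the "Block Extended Major". The Go sketch below parses such strings; `parseDevNumbers` and `describeMajor` are hypothetical helpers, and the input strings are made-up examples rather than output captured from an s3.xlarge.x86 machine.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseDevNumbers parses a "major:minor" string, the format used by
// /sys/block/<name>/dev on Linux. Surrounding whitespace is ignored.
func parseDevNumbers(s string) (major, minor int, err error) {
	majStr, minStr, ok := strings.Cut(strings.TrimSpace(s), ":")
	if !ok {
		return 0, 0, fmt.Errorf("malformed device number %q", s)
	}
	if major, err = strconv.Atoi(majStr); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(minStr); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// describeMajor covers only the two block majors mentioned in this PR.
func describeMajor(major int) string {
	switch major {
	case 8:
		return "SCSI disk devices (/dev/sdX)"
	case 259:
		return "Block Extended Major (the NVMe disks in this PR)"
	default:
		return "not covered by this sketch"
	}
}

func main() {
	// Made-up sample contents of /sys/block/<name>/dev files.
	for _, s := range []string{"8:0\n", "259:0\n"} {
		major, minor, err := parseDevNumbers(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("major=%d minor=%d: %s\n", major, minor, describeMajor(major))
	}
}
```
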
krnowak commented Aug 30, 2022

CI is still running, but tests for s3.xlarge.x86 have already succeeded: http://jenkins.infra.kinvolk.io:8080/job/container/job/test/2168/

@krnowak krnowak force-pushed the krnowak/exclude-weird-devices branch from c398f3b to 6be664e Compare August 30, 2022 07:11
@krnowak krnowak marked this pull request as ready for review August 30, 2022 07:11
@krnowak krnowak requested a review from a team August 30, 2022 07:11
@tormath1 tormath1 left a comment

Nice catch!

@krnowak krnowak merged commit 1115976 into flatcar-master Aug 30, 2022
@krnowak krnowak deleted the krnowak/exclude-weird-devices branch August 30, 2022 08:38
Successfully merging this pull request may close these issues.

equinix metal s3.xlarge.x86 won't boot
2 participants