VirtualBox: Waiting for root device booting the image standalone #842

Closed
dariopb opened this issue Mar 11, 2020 · 12 comments
Labels
area/core Issues core to the OS (variant independent) status/icebox Things we think would be nice but are not prioritized type/enhancement New feature or request

Comments

@dariopb

dariopb commented Mar 11, 2020

I compiled the code and generated the img successfully as per the BUILDING instructions.

Image I'm using:

build/bottlerocket-aws-k8s-1.15-x86_64-0.3.1-00000000.img
commit 24ed37c

What I expected to happen:
Image to boot correctly and mount the root filesystem.

What actually happened:
The image seems to boot correctly but then hangs at "Waiting for root device /dev/dm-0". I'm using VirtualBox for now.

The root filesystem does appear to be present in the image:

Disk /home/dario/projects/bottlerocket/tmp/br.img: 2147483648B
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start        End          Size        File system  Name                     Flags
 1      1048576B     5242879B     4194304B                 BIOS-BOOT                bios_grub
 2      5242880B     47185919B    41943040B   ext2         BOTTLEROCKET-BOOT-A
 3      47185920B    1011875839B  964689920B  ext2         BOTTLEROCKET-ROOT-A
 4      1011875840B  1022361599B  10485760B                BOTTLEROCKET-HASH-A
 5      1022361600B  1053818879B  31457280B                BOTTLEROCKET-RESERVED-A
 6      1053818880B  1095761919B  41943040B                BOTTLEROCKET-BOOT-B
 7      1095761920B  2060451839B  964689920B               BOTTLEROCKET-ROOT-B
 8      2060451840B  2070937599B  10485760B                BOTTLEROCKET-HASH-B
 9      2070937600B  2102394879B  31457280B                BOTTLEROCKET-RESERVED-B
10      2102394880B  2146435583B  44040704B   ext4         BOTTLEROCKET-PRIVATE

dario@dario-PC:~/projects/bottlerocket/tmp$ sudo mount -o loop,offset=47185920 br.img /mnt/tmp
dario@dario-PC:~/projects/bottlerocket/tmp$ ls /mnt/tmp
bin   dev  home  lib64  lost+found  mnt  proc  run   srv  tmp  var
boot  etc  lib   local  media       opt  root  sbin  sys  usr  x86_64-bottlerocket-linux-gnu

Is there something else to do to be able to boot the image outside of AWS?

@iliana
Contributor

iliana commented Mar 11, 2020

I’ll try to reproduce this today but that all looks right to me. Just to confirm, does partition 4 (HASH-A) have non-zero data?

If you’re able to get full kernel logs it might point to a dm-verity setup issue. If you attach a serial console you should be able to get them as text output.
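For anyone wanting to answer the HASH-A question without mounting anything, here is a minimal Python sketch. The offsets are copied from the partition table in the report above; the helper name `region_has_data` is hypothetical, and the sketch assumes a raw disk image.

```python
def region_has_data(path, offset, size, chunk_size=1 << 20):
    """Return True if any byte in [offset, offset + size) of the file is non-zero."""
    remaining = size
    with open(path, "rb") as f:
        f.seek(offset)
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # reached end of file before the region ended
            if chunk.count(0) != len(chunk):
                return True  # at least one byte in this chunk is non-zero
            remaining -= len(chunk)
    return False


# Start and size of BOTTLEROCKET-HASH-A, copied from the partition table above.
HASH_A_START = 1011875840
HASH_A_SIZE = 10485760
```

Calling `region_has_data("br.img", HASH_A_START, HASH_A_SIZE)` should return `True` if the verity hash partition was populated during the build.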

@iliana
Contributor

iliana commented Mar 11, 2020

Actually, I think it might have to do with the virtualized disk device used. Can you let us know what the hardware configuration for the VM you’re using is?

Bottlerocket doesn’t have an initramfs, so any driver needed to mount the root filesystem must be built into the kernel (=y). We’re probably missing the relevant ones for environments outside of EC2, even though they’re otherwise being built as kernel modules.
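A rough way to check this is to scan a kernel config for options that are needed at boot but are not `=y`. This is only a sketch: the function names are hypothetical, and the sample fragment is made up for illustration rather than copied from the Bottlerocket config.

```python
def parse_kernel_config(text):
    """Map CONFIG_* names to their values, skipping comments and blank lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def not_built_in(options, required):
    """Return the required options that are not 'y' (built as modules or unset)."""
    return [name for name in required if options.get(name) != "y"]


# Hypothetical config fragment, for illustration only.
sample = """
CONFIG_VIRTIO_BLK=y
CONFIG_ATA=m
# CONFIG_ATA_PIIX is not set
"""

required = ["CONFIG_VIRTIO_BLK", "CONFIG_ATA", "CONFIG_ATA_PIIX"]
print(not_built_in(parse_kernel_config(sample), required))
# prints ['CONFIG_ATA', 'CONFIG_ATA_PIIX']
```

Anything this reports for a storage controller the VM actually uses would explain a root device that never appears, since there is no initramfs to load the module from.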

@dariopb
Author

dariopb commented Mar 11, 2020

I've attached the full boot log. This attempt uses an IDE controller of type PIIX4, but I have also tried the other controller types VirtualBox supports (IDE, SATA, virtio-scsi).

[    0.000000] DMI: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[    0.000000] Hypervisor detected: KVM
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr 3201001, primary cpu clock
[    0.000000] kvm-clock: using sched offset of 2820761146 cycles
...
[    0.059145] Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz root=/dev/dm-0 rootwait ro init=/sbin/preinit console=tty0 console=ttyS0 random.trust_cpu=on selinux=1 enforcing=1 systemd.log_target=journal-or-kmsg systemd.log_color=0 net.ifnames=0 biosdevname=0 dm_verity.max_bios=-1 dm_verity.dev_wait=1 "dm-mod.create=root,,,ro,0 1884160 verity 1 PARTUUID=37795ed7-89f2-4632-8000-246494ab9f36/PARTNROFF=1 PARTUUID=37795ed7-89f2-4632-8000-246494ab9f36/PARTNROFF=2        4096 4096 235520 1 sha256 b983a5702ff8b0103171843c6a166c7d4f7a0daea2a2d1d06d1cc458a2e268bf 581cd465c170d6f5fcefc40ca57f0133c4ef478aa3fbf5764bd61e7ae13ded0a 1 restart_on_corruption"
...
[    0.429194] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
[    0.429931] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
[    0.430725] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
[    0.431440] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
[    0.432290] pci 0000:00:02.0: [15ad:0405] type 00 class 0x030000
...
[    0.582532] device-mapper: init: waiting for all devices to be available before creating mapped devices
[    0.584994] device-mapper: table: 252:0: verity: Data device lookup failed
[    0.585870] device-mapper: ioctl: error adding target to table
[    0.586913] md: Waiting for all devices to be available before autodetect
[    0.588019] md: If you don't use raid, use raid=noautodetect
[    0.588848] md: Autodetecting RAID arrays.
[    0.589360] md: autorun ...
[    0.589801] md: ... autorun DONE.
[    0.590443] Waiting for root device /dev/dm-0...

123.log
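To make the `dm-mod.create` argument on the kernel command line above easier to read, here is a small Python sketch that splits it into named fields and sanity-checks the sizes. The field order is my reading of the kernel's dm-init and dm-verity table formats, so treat it as an assumption. The function name is hypothetical, and it only handles a single device with a single verity target.

```python
SECTOR_SIZE = 512  # device-mapper tables count in 512-byte sectors


def parse_dm_create(value):
    """Split a single-device, single-target dm-mod.create value into fields."""
    name, uuid, minor, flags, table = value.split(",", 4)
    fields = table.split()
    start, length, target = int(fields[0]), int(fields[1]), fields[2]
    if target != "verity":
        raise ValueError("this sketch only handles verity targets")
    keys = ["version", "data_dev", "hash_dev", "data_block_size",
            "hash_block_size", "num_data_blocks", "hash_start_block",
            "algorithm", "root_digest", "salt"]
    verity = dict(zip(keys, fields[3:13]))
    for key in ("data_block_size", "hash_block_size",
                "num_data_blocks", "hash_start_block"):
        verity[key] = int(verity[key])
    return {"name": name, "flags": flags, "start": start, "length": length,
            "verity": verity, "optional": fields[13:]}


# Value copied from the boot log above, with the terminal line wraps joined.
value = (
    "root,,,ro,0 1884160 verity 1 "
    "PARTUUID=37795ed7-89f2-4632-8000-246494ab9f36/PARTNROFF=1 "
    "PARTUUID=37795ed7-89f2-4632-8000-246494ab9f36/PARTNROFF=2 "
    "4096 4096 235520 1 sha256 "
    "b983a5702ff8b0103171843c6a166c7d4f7a0daea2a2d1d06d1cc458a2e268bf "
    "581cd465c170d6f5fcefc40ca57f0133c4ef478aa3fbf5764bd61e7ae13ded0a "
    "1 restart_on_corruption"
)

parsed = parse_dm_create(value)
mapped_bytes = parsed["length"] * SECTOR_SIZE
data_bytes = parsed["verity"]["num_data_blocks"] * parsed["verity"]["data_block_size"]
print(mapped_bytes, data_bytes)  # prints 964689920 964689920
```

Both numbers equal the 964689920B size of BOTTLEROCKET-ROOT-A in the partition table, so the table itself looks self-consistent. That fits the "Data device lookup failed" message pointing at the kernel not finding the `PARTUUID=` device at all, rather than at a malformed table.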

@iliana
Contributor

iliana commented Mar 11, 2020

It looks like we may want to add this to packages/kernel/config-bottlerocket:

CONFIG_ATA=y
CONFIG_ATA_PIIX=y

If you like you can add that, do a build, and let us know how it goes.

@iliana iliana added the type/bug Something isn't working label Mar 11, 2020
@dariopb
Author

dariopb commented Mar 12, 2020

I added the ATA settings and the disk is now identified as ATA (before it was legacy IDE), but unfortunately device-mapper still doesn't find it.

[    0.659003] device-mapper: init: waiting for all devices to be available before creating mapped devices
[    0.804554] ata1.00: ATA-6: VBOX HARDDISK, 1.0, max UDMA/133
[    0.805142] ata1.00: 4194304 sectors, multi 128: LBA 
[    0.805800] ata2.00: ATAPI: VBOX CD-ROM, 1.0, max UDMA/133
[    0.806855] scsi 0:0:0:0: Direct-Access     ATA      VBOX HARDDISK    1.0  PQ: 0 ANSI: 5
[    0.807907] scsi 1:0:0:0: CD-ROM            VBOX     CD-ROM           1.0  PQ: 0 ANSI: 5
[    0.809047] device-mapper: table: 252:0: verity: Data device lookup failed
[    0.809723] device-mapper: ioctl: error adding target to table
[    0.810536] md: Waiting for all devices to be available before autodetect
[    0.811356] md: If you don't use raid, use raid=noautodetect
[    0.812199] md: Autodetecting RAID arrays.
[    0.812656] md: autorun ...
[    0.812981] md: ... autorun DONE.
[    0.813362] Waiting for root device /dev/dm-0...

ll2.log

@anguslees

anguslees commented Mar 13, 2020

Fwiw, I was getting a similar error to the above with qemu. Newer qemu versions support nvme disks directly, and this boots ok!

(Optional: I couldn't easily install a new enough qemu, so I ran it from a container.)

docker run --rm -ti --device=/dev/kvm -v /tmp:/tmp --entrypoint=/bin/bash tianon/qemu:4.2

Then:

qemu-system-x86_64 -m 1G -netdev user,id=net0 -device e1000,id=net0 -drive file=/tmp/bottlerocket-aws-k8s-1.15-x86_64-0.3.1-b459526.img,if=none,id=root -device nvme,drive=root,serial=1234 -drive file=/tmp/bottlerocket-aws-k8s-1.15-x86_64-0.3.1-b459526-data.img,if=none,id=data -device nvme,drive=data,serial=5678 -nographic

Edit: Oh duh, bottlerocket supports virtio out of the box, so that's even easier. No need for the newer version of qemu (works with 2.11.1), just use:

qemu-system-x86_64 -m 1G -netdev user,id=net0 -device e1000,id=net0 -drive file=/tmp/bottlerocket-aws-k8s-1.15-x86_64-0.3.1-b459526.img,if=virtio -drive file=/tmp/bottlerocket-aws-k8s-1.15-x86_64-0.3.1-b459526-data.img,if=virtio

@mdaniel

mdaniel commented Mar 13, 2020

I had to assign a net with -netdev user,id=net0,net=192.168.76.0/24 to keep it from hanging at the DHCP step on boot, but after that it fired right up. Thank you so very much!

@bcressey bcressey added this to the v0.3.2 milestone Mar 24, 2020
@zmrow zmrow assigned zmrow and jahkeup and unassigned zmrow Apr 14, 2020
@zmrow zmrow removed this from the v0.3.2 milestone Apr 14, 2020
@jahkeup
Member

jahkeup commented May 14, 2020

Two issues have been identified so far with getting VirtualBox to boot a Bottlerocket image. The first can be resolved immediately through the kernel config; the second needs further investigation (kernel I/O errors plus VirtualBox VERR errors, see below).

The first issue is that the "modern" virtio controller and driver, virtio-scsi, requires additional modules that aren't built in with the current kernel config. Of the virtio block host-side drivers, VirtualBox only supports virtio-scsi, so it needs to be configured and included in the kernel build. The second issue is that even with such a kernel, the VM still does not start up successfully and begins to spew block device I/O errors.

To address the first issue, virtio-scsi needs to be built into the kernel, with something like:

CONFIG_SCSI=y
CONFIG_SCSI_VIRTIO=y
CONFIG_BLK_DEV_SD=y

This configuration does move the boot process further along (a harmless message is logged about a failed attempt to configure LRO on a virtio-net adapter), but then the VM can no longer write to its data volume, and I/O errors cascade to both attached disks. VirtualBox reports these as VERRs as well.

Trimmed console log

...
[  OK  ] Found device HARDDISK BOTTLEROCKET-DATA.
         Starting Prepare Local Directory (/local)...
[    1.598150]  sdb: sdb1
[    1.602093] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input6
[    1.604640]  sdb: sdb1
[    1.606095] cryptd: max_cpu_qlen set to 1000
[    1.626151] AVX2 version of gcm_enc/dec engaged.
[    1.626523] AES CTR mode by8 optimization enabled
[    1.638399] mousedev: PS/2 mouse device common for all mice
[    1.706934] kvm: Nested Virtualization enabled
[  OK  ] Finished udev Wait for Complete Device Initialization.
[    1.720867]  sdb: sdb1
[    1.742038] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: 
[    1.745632] EXT4-fs (sdb1): resizing filesystem from 261632 to 261632 blocks
[  OK  ] Finished Prepare Local Directory (/local).
         Mounting Opt Directory (/opt)...
         Mounting Var Directory (/var)...
[  OK  ] Mounted Var Directory (/var).
         Mounting Private Directory (/var/lib/bottlerocket)...
         Starting Flush Journal to Persistent Storage...
[  OK  ] Mounted Opt Directory (/opt).
[    1.770348] systemd-journald[978]: Received client request to flush runtime journal.
[  OK  ] Finished Flush Journal to Persistent Storage.
[    1.773296] EXT4-fs (sda10): mounted filesystem with ordered data mode. Opts: 
[  OK  ] Mounted Private Directory (/var/lib/bottlerocket).
[  OK  ] Reached target Local File Systems.
...
[    2.558735] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[  OK  ] Finished wicked managed network interfaces.
[  OK  ] Reached target Network.
[  OK  ] Reached target Network is Online.
         Starting Bottlerocket userdata configuration system...
[    7.086863] sd 0:0:0:0: [sda] tag#76 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_SENSE
[    7.087608] sd 0:0:0:0: [sda] tag#76 Sense Key : Medium Error [current] 
[    7.088119] sd 0:0:0:0: [sda] tag#76 Add. Sense: Write error
[    7.088581] sd 0:0:0:0: [sda] tag#76 CDB: Write(10) 2a 00 00 3e de 80 00 01 a0 00
[    7.089151] blk_update_request: I/O error, dev sda, sector 4120192 op 0x1:(WRITE) flags 0x0 phys_seg 52 prio class 0
[    7.089807] EXT4-fs warning (device sda10): ext4_end_bio:309: I/O error 10 writing to inode 22 (offset 0 size 4096 starting block 515025)
[    7.090585] Buffer I/O error on device sda10, logical block 1744
[    7.091157] EXT4-fs warning (device sda10): ext4_end_bio:309: I/O error 10 writing to inode 25 (offset 0 size 4096 starting block 515026)
...
[    7.099923] Buffer I/O error on device sda10, logical block 1752
[    7.100313] EXT4-fs warning (device sda10): ext4_end_bio:309: I/O error 10 writing to inode 38 (offset 0 size 4096 starting block 515034)
[    7.101095] Buffer I/O error on device sda10, logical block 1753
[    7.101512] sd 0:0:1:0: [sdb] tag#77 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_SENSE
[    7.102059] sd 0:0:1:0: [sdb] tag#77 Sense Key : Medium Error [current] 
[    7.102483] sd 0:0:1:0: [sdb] tag#77 Add. Sense: Write error
[    7.102853] sd 0:0:1:0: [sdb] tag#77 CDB: Write(10) 2a 00 00 08 08 08 00 00 c8 00
[    7.103339] blk_update_request: I/O error, dev sdb, sector 526344 op 0x1:(WRITE) flags 0x800 phys_seg 25 prio class 0
[    7.104010] Aborting journal on device sdb1-8.

Around this time, an error is reported in the vm's VBox.log:

Trimmed VBox.log

00:00:06.261474 NAT: DHCP offered IP address 10.0.2.15
00:00:10.304198 VD#0: Write (0 bytes left) returned rc=VERR_INVALID_PARAMETER
00:00:10.304221 SCSI#0: Write at offset 2109538304 (212992 bytes left) returned rc=VERR_INVALID_PARAMETER
00:00:10.304277 VD#1: Write (0 bytes left) returned rc=VERR_INVALID_PARAMETER
00:00:10.304280 SCSI#1: Write at offset 269488128 (102400 bytes left) returned rc=VERR_INVALID_PARAMETER
00:00:10.319827 VD#0: Write (0 bytes left) returned rc=VERR_INVALID_PARAMETER
00:00:10.319834 SCSI#0: Write at offset 2109538304 (212992 bytes left) returned rc=VERR_INVALID_PARAMETER

Searching for these errors turns up forum posts asking how much disk space is available, among other troubleshooting and debugging questions. I have enough disk space, so that isn't the issue. Another suggestion from the VBox forums is to use a different disk container type, preferably a fixed-size disk in a format native to VirtualBox. For the repro above I'm using .qcow2, and switching disk container formats did not change the observed behavior. The disks still became unreadable and unusable, so the investigation continues!
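As a cross-check that the kernel and VBox.log are describing the same failed writes, the `CDB: Write(10)` bytes from the console log can be decoded and compared with the offsets VirtualBox reports. A minimal Python sketch, assuming the standard WRITE(10) layout (opcode 0x2A, big-endian LBA in bytes 2-5, transfer length in bytes 7-8) and 512-byte sectors; the function name is hypothetical.

```python
SECTOR_SIZE = 512  # logical sector size of the attached disks


def decode_write10(cdb_hex):
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) command")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


# CDBs copied from the trimmed console log above.
for label, cdb in [("sda", "2a 00 00 3e de 80 00 01 a0 00"),
                   ("sdb", "2a 00 00 08 08 08 00 00 c8 00")]:
    lba, blocks = decode_write10(cdb)
    print(label, lba, lba * SECTOR_SIZE, blocks * SECTOR_SIZE)
# prints:
# sda 4120192 2109538304 212992
# sdb 526344 269488128 102400
```

The byte offsets and lengths line up with the `SCSI#0` (offset 2109538304, 212992 bytes) and `SCSI#1` (offset 269488128, 102400 bytes) entries in VBox.log, and the sda offset falls inside partition 10 (BOTTLEROCKET-PRIVATE), which matches the sda10 ext4 errors. This only confirms the two logs agree; it does not explain why VirtualBox returns VERR_INVALID_PARAMETER.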

@gregdek gregdek added the status/needs-triage Pending triage or re-evaluation label Dec 29, 2020
@gregdek gregdek removed the status/needs-triage Pending triage or re-evaluation label Jan 14, 2021
@gregdek gregdek added this to the techdebt milestone Apr 1, 2021
@etungsten etungsten changed the title Waiting for root device booting the image standalone VirtualBox: Waiting for root device booting the image standalone Oct 19, 2021
@stmcginnis stmcginnis added status/needs-triage Pending triage or re-evaluation and removed priority/p1 labels Dec 1, 2022
@stmcginnis
Contributor

@foersleo would you be able to take a look and see if this is something we could address (or maybe already did)?

@stmcginnis stmcginnis added type/enhancement New feature or request area/core Issues core to the OS (variant independent) status/icebox Things we think would be nice but are not prioritized and removed type/bug Something isn't working status/needs-triage Pending triage or re-evaluation labels Dec 14, 2022
@stmcginnis stmcginnis removed this from the techdebt milestone Dec 14, 2022
@yeazelm
Contributor

yeazelm commented Dec 30, 2022

I grabbed the metal-k8s-1.24-x86_64-v1.11.1 image from the repo and was able to get it to boot with VirtualBox on an Ubuntu host. It moves past the "Waiting for root device /dev/dm-0" line, so I think we can resolve this issue. One note for those following along: I used a net.toml of:

version = 2 

[enp0s3]
dhcp4 = true

This got the networking going for the default NIC from VirtualBox.

@btiernay

@yeazelm thanks for reporting back. Curious as to the full setup you used as I'm trying to get this working on macOS.

My setup is as follows:

(Details: three screenshots attached, not reproduced here.)

Any ideas of what I may need to adjust? Thanks! 🙇

@btiernay

Got it working on macOS and then spent some time trying to automate for the benefit of others 🎉 !

https://gist.github.com/btiernay/5e4d62b126f28962cd008094e867e9a2

If this is useful, happy to create some project documentation around it.
