Unable to install on 4k drive #385

Closed
bcg62 opened this issue Feb 12, 2020 · 38 comments · Fixed by coreos/coreos-assembler#1130
Labels
jira for syncing to jira platform/metal

Comments

@bcg62

bcg62 commented Feb 12, 2020

Bug

Host Operating System Version

31.20200127.3.0

Target Operating System Version

31.20200127.3.0

coreos-installer Version

coreos-installer 0.1.2

Expected Behavior

The boot drive is partitioned.

Actual Behavior

Clearing partition table
Error: install failed

Reproduction Steps

coreos-installer install /dev/sda --ignition /tmp/coreos-installer-j3u2Cx --firstboot-args rd.neednet=1 ip=dhcp  --image-url http://cache/assets/fedora-coreos/fedora-coreos-31.20200127.3.0-metal.x86_64.raw.xz

Is there a way to get better debug information to find out why or how this is failing?

coreos-installer install /dev/sda --ignition /tmp/coreos-installer-j3u2Cx --firstboot-args rd.neednet=1 ip=dhcp  --image-url http://cache/assets/fedora-coreos/fedora-coreos-31.20200127.3.0-metal.x86_64.raw.xz
Downloading image from http://cache/assets/fedora-coreos/fedora-coreos-31.20200127.3.0-metal.x86_64.raw.xz
Downloading signature from http://cache/assets/fedora-coreos/fedora-coreos-31.20200127.3.0-metal.x86_64.raw.xz.sig
gpg: Signature made Mon Feb 10 23:11:29 2020 UTC
gpg:                using RSA key 50CB390B3C3359C4
gpg: Good signature from "Fedora (31) <[email protected]>" [ultimate]
> Read disk 434.8 MiB/434.8 MiB (100%)
Error: couldn't find boot device for /dev/sda
Clearing partition table
Error: install failed
@bcg62
Author

bcg62 commented Feb 12, 2020

same behavior on coreos-installer 0.1.3-alpha.0

@jlebon
Member

jlebon commented Feb 12, 2020

You can try booting with coreos.inst.skip_reboot (or just running the install interactively on the prompt) and verify yourself the state of the disk, whether the partitions show up. Hmm, though in that case it doesn't help that coreos-installer nukes the partition table on failure.

Wonder if somehow we didn't wait long enough for the partitions to show up. Is this reproducible? We might just need to bump the timeout. Or maybe we should change the strategy so we keep looping/waiting in udev_settle until we actually detect partitions from the target device coming online? /cc @bgilbert
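The "keep looping until the partitions actually show up" idea can be sketched as a poll with a deadline. This is only an illustration, not coreos-installer's actual code: the `wait_until` helper, the timeout values, and the `/dev/disk/by-partlabel/boot` path are all assumptions.

```rust
use std::path::Path;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `probe` until it returns true or `timeout` elapses.
/// Returns true if the condition was observed before the deadline.
fn wait_until<F: FnMut() -> bool>(mut probe: F, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if probe() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}

fn main() {
    // Hypothetical probe: succeed once udev has created the symlink
    // for the partition labelled "boot".
    let label_path = Path::new("/dev/disk/by-partlabel/boot");
    let found = wait_until(
        || label_path.exists(),
        Duration::from_secs(2),
        Duration::from_millis(100),
    );
    println!("boot partition visible: {}", found);
}
```

A deadline-based loop makes the total wait explicit, whereas a fixed retry count depends on how long each iteration happens to take.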

@bcg62
Author

bcg62 commented Feb 12, 2020

thanks @jlebon, re-running manually has the same result.

here is some inspection post failure:

[root@90-e2-ba-7d-9c-e1 ~]# lsblk
NAME  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0   7:0    0 546.1M  0 loop /sysroot
sda     8:0    0   1.1T  0 disk
sdb     8:16   0  21.8T  0 disk
[root@90-e2-ba-7d-9c-e1 ~]# lsblk --pair
NAME="loop0" MAJ:MIN="7:0" RM="0" SIZE="546.1M" RO="0" TYPE="loop" MOUNTPOINT="/sysroot"
NAME="sda" MAJ:MIN="8:0" RM="0" SIZE="1.1T" RO="0" TYPE="disk" MOUNTPOINT=""
NAME="sdb" MAJ:MIN="8:16" RM="0" SIZE="21.8T" RO="0" TYPE="disk" MOUNTPOINT=""
[root@90-e2-ba-7d-9c-e1 ~]# fdisk -l
Disk /dev/sdb: 21.84 TiB, 23994103234560 bytes, 5857935360 sectors
Disk model: MR9270-8i
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sda: 1.9 TiB, 1199705161728 bytes, 292896768 sectors
Disk model: MR9270-8i
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/loop0: 546.7 MiB, 572596224 bytes, 1118352 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
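As a sanity check on the figures above, the byte counts fdisk prints can be reproduced from the sector counts and the 4096-byte logical sector size. The sketch below only redoes that arithmetic; `capacity_bytes` is a name invented for illustration. By this arithmetic /dev/sda comes to roughly 1.09 TiB, which agrees with the 1.1T that lsblk reports, so the "1.9 TiB" in the fdisk output looks like a display quirk of the tool rather than a different disk size (that interpretation is an assumption).

```rust
const TIB: f64 = 1_099_511_627_776.0; // 2^40 bytes

/// Capacity in bytes for a given sector count and logical sector size.
fn capacity_bytes(sectors: u64, sector_size: u64) -> u64 {
    sectors * sector_size
}

fn main() {
    // Sector counts copied from the fdisk output in this comment.
    let disks = [("/dev/sda", 292_896_768u64), ("/dev/sdb", 5_857_935_360u64)];
    for &(dev, sectors) in disks.iter() {
        let bytes = capacity_bytes(sectors, 4096);
        println!("{}: {} bytes = {:.2} TiB", dev, bytes, bytes as f64 / TIB);
    }
}
```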

@jlebon
Member

jlebon commented Feb 12, 2020

Right yeah, inspecting in post won't work because the installer clears the partition table.

If you're in a position to compile Rust code, you could give this PR a try: coreos/coreos-installer#155.

@bcg62
Author

bcg62 commented Feb 12, 2020

Grabbed an strace as well.

https://gist.github.com/bcg62/91f0339a3b925b17c609d945afcd0bc2

I'll see if I can build and test the patch. thanks.

@bcg62
Author

bcg62 commented Feb 12, 2020

more info but still failed.

[root@90-e2-ba-7d-9c-e1 tmp]# ./coreos-installer install /dev/sda --ignition /tmp/coreos-installer-Nvpzfz --firstboot-args rd.neednet=1 --image-url http://cache/assets/fedora-coreos/fedora-coreos-31.20200127.3.0-metal.x86_64.raw.xz
Downloading image from http://cache/assets/fedora-coreos/fedora-coreos-31.20200127.3.0-metal.x86_64.raw.xz
Downloading signature from http://cache/assets/fedora-coreos/fedora-coreos-31.20200127.3.0-metal.x86_64.raw.xz.sig
gpg: Signature made Mon Feb 10 23:11:29 2020 UTC
gpg:                using RSA key 50CB390B3C3359C4
gpg: Good signature from "Fedora (31) <[email protected]>" [ultimate]
> Read disk 434.8 MiB/434.8 MiB (100%)
Waiting for partition label boot to come onlineError: timed out waiting for boot partition to show up
Clearing partition table
Error: install failed

@bcg62
Author

bcg62 commented Feb 12, 2020

I increased the retry count:

-    for _ in (0..10).rev() {
+    for _ in (0..1000).rev() {

During the loop I took some more inspections:

# lsblk
NAME  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0   7:0    0 546.1M  0 loop /sysroot
sda     8:0    0   1.1T  0 disk
sdb     8:16   0  21.8T  0 disk
# fdisk -l /dev/sda
GPT PMBR size mismatch (5455871 != 292896767) will be corrected by write.
Disk /dev/sda: 1.9 TiB, 1199705161728 bytes, 292896768 sectors
Disk model: MR9270-8i
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device     Boot Start       End   Sectors  Size Id Type
/dev/sda1           1 292896767 292896767  1.1T ee GPT
# gdisk -l /dev/sda
GPT fdisk (gdisk) version 1.0.4

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: not present

Creating new GPT entries in memory.
Disk /dev/sda: 292896768 sectors, 1.1 TiB
Model: MR9270-8i
Sector size (logical/physical): 4096/4096 bytes
Disk identifier (GUID): 2C6BF6B3-1B71-42F8-8BE6-F0C6CA6BA385
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 5
First usable sector is 6, last usable sector is 292896762
Partitions will be aligned on 256-sector boundaries
Total free space is 292896757 sectors (1.1 TiB)

Number  Start (sector)    End (sector)  Size       Code  Name

@jlebon
Member

jlebon commented Feb 12, 2020

Hmm, what kind of setup are you installing on?

@bcg62
Author

bcg62 commented Feb 12, 2020

bare-metal, attempting an install via pxe boot.

EFI, megaraid-sas-9270-8i, HGST HUC101812CS4200 4k drives

Are you looking for anything specific?

@jlebon
Member

jlebon commented Feb 12, 2020

4k drives

Ahh I think that might be the issue. My understanding is that 4k disks will require separate metal images which we do not yet produce. Our current metal images assume traditional 512b sector-sized drives.
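One way to see why a 512-byte-sector image is not recognized on a 4k-native disk: the GPT header sits at LBA 1, so its byte offset depends on the sector size. An image laid out for 512-byte sectors has its "EFI PART" signature at byte 512, while a device with 4096-byte sectors is searched at byte 4096. The protective MBR at LBA 0 is at byte 0 in both views, which is consistent with the earlier gdisk output ("MBR: protective", "GPT: not present"). The sketch below simulates this on an in-memory buffer; the helper names are made up for illustration, and it is not how the kernel or gdisk is implemented.

```rust
/// The GPT header lives at LBA 1, so its byte offset equals the sector size.
fn gpt_header_offset(sector_size: usize) -> usize {
    sector_size
}

/// Returns true if an "EFI PART" signature is found where a device with
/// `sector_size`-byte sectors would expect the GPT header.
fn has_gpt_at(disk: &[u8], sector_size: usize) -> bool {
    let off = gpt_header_offset(sector_size);
    disk.get(off..off + 8) == Some(&b"EFI PART"[..])
}

fn main() {
    // Simulate the start of an image laid out for 512-byte sectors.
    let mut image = vec![0u8; 16 * 1024];
    let off = gpt_header_offset(512);
    image[off..off + 8].copy_from_slice(b"EFI PART");

    println!("512-byte view finds GPT:  {}", has_gpt_at(&image, 512));
    println!("4096-byte view finds GPT: {}", has_gpt_at(&image, 4096));
}
```

The same unit mismatch would explain the "GPT PMBR size mismatch" warning from fdisk: the protective MBR written from the image records a size counted in 512-byte sectors, which fdisk then compares against the disk's count of 4096-byte sectors.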

@jlebon jlebon changed the title debug partitioning during install Unabl to install on 4k drive Feb 12, 2020
@jlebon jlebon changed the title Unabl to install on 4k drive Unable to install on 4k drive Feb 12, 2020
@jlebon
Member

jlebon commented Feb 12, 2020

Ahh yup, this was discussed here: #18 (comment).

Transferring this ticket to the main tracker.

@jlebon jlebon transferred this issue from coreos/coreos-installer Feb 12, 2020
@bcg62
Author

bcg62 commented Feb 13, 2020

Reading through #18, it seems like there isn't currently a workaround?

Is there a way or documentation on how someone might be able to build their own 4k compat images?

@dustymabe
Member

Ahh I think that might be the issue. My understanding is that 4k disks will require separate metal images which we do not yet produce. Our current metal images assume traditional 512b sector-sized drives.

I almost wonder if we could handle this case as part of the "complex root devices" work. i.e. rather than make a separate metal image we act like we would if someone requested a different filesystem on ignition during boot. There's probably a million reasons why this wouldn't work, but I'd really like to not have to ship a separate set of metal images :(

@jlebon
Member

jlebon commented Feb 13, 2020

I started on this in coreos/coreos-assembler#1130. If you're feeling adventurous and want to test that patch, you can build cosa with it, and then build FCOS as described in the cosa README. :)

I almost wonder if we could handle this case as part of the "complex root devices" work. i.e. rather than make a separate metal image we act like we would if someone requested a different filesystem on ignition during boot.

For 4k native setups, I don't think this is doable on first boot, because the system wouldn't even be able to boot in the first place. But it might be doable to do at install time via coreos-installer. E.g. it could detect that the target disk is 4k and translate the metal image on the fly? Not sure how feasible that is though. My understanding is that it's not just a bit-level transformation, but would require changing the filesystem level itself since it assumes 512b sectors (see #18 (comment)). Worth exploring the idea though with some storage folks.

@darkmuggle
Contributor

For 4k native setups, I don't think this is doable on first boot, because the system wouldn't even be able to boot in the first place. But it might be doable to do at install time via coreos-installer. E.g. it could detect that the target disk is 4k and translate the metal image on the fly? Not sure how feasible that is though. My understanding is that it's not just a bit-level transformation, but would require changing the filesystem level itself since it assumes 512b sectors (see #18 (comment)). Worth exploring the idea though with some storage folks.

I agree with @jlebon here. The installer will splat the disk down and then the complicated disk setup work won't even come into play and, worse, prepping /boot is out of scope for complicated disks. For the time being, we'll need a new disk target.

@bgilbert
Contributor

coreos/coreos-installer#167 teaches coreos-installer to detect this case and fail with a clearer error message.
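A check of this kind could look roughly like the sketch below. It is not the code from that PR: it reads the logical sector size from sysfs (`/sys/block/<dev>/queue/logical_block_size`) using only the standard library, and the function names, the hard-coded `sda`, the assumed 512-byte image sector size, and the error wording are all hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads the logical sector size the kernel reports for a block device,
/// e.g. from `/sys/block/sda/queue/logical_block_size`.
fn logical_sector_size(sys_block: &Path, dev_name: &str) -> io::Result<u32> {
    let path = sys_block.join(dev_name).join("queue/logical_block_size");
    let text = fs::read_to_string(path)?;
    text.trim()
        .parse::<u32>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Hypothetical check: refuse to write an image built for one sector size
/// onto a device that uses another.
fn check_sector_size(image_sector_size: u32, device_sector_size: u32) -> Result<(), String> {
    if image_sector_size == device_sector_size {
        Ok(())
    } else {
        Err(format!(
            "image was built for {}-byte sectors but the target device uses {}-byte sectors",
            image_sector_size, device_sector_size
        ))
    }
}

fn main() {
    // Assumes the target is /dev/sda and the image uses 512-byte sectors.
    match logical_sector_size(Path::new("/sys/block"), "sda") {
        Ok(size) => match check_sector_size(512, size) {
            Ok(()) => println!("sector sizes match ({} bytes)", size),
            Err(msg) => eprintln!("Error: {}", msg),
        },
        Err(e) => eprintln!("could not read sector size: {}", e),
    }
}
```

Failing before any data is written avoids the confusing "couldn't find boot device" path seen at the top of this issue.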

@bgilbert
Contributor

Just did a quick experiment with losetup --sector-size 4096 and ext4, XFS, and FAT32. XFS filesystems created for a 512-byte sector fail to mount from a 4k-sector disk:

XFS (loop0p1): device supports 4096 byte sectors (not 512)

FAT32 likewise fails:

FAT-fs (loop0p3): logical sector size too small for device (logical sector size = 512)

ext4 works fine, and all three are fine if created on a 4k-sector disk but mounted from a 512-byte sector disk.

That's not an argument that we should just use 4k sectors for everything, though. XFS, at least, is presumably relying on atomic sector writes.

jlebon added a commit to jlebon/coreos-assembler that referenced this issue Mar 11, 2020
First, add a new `buildextend-metal4k` command to create 4k disk images.
Then, teach `kola` and `cosa run` to read these images.

To test:

    cosa run -I metal4k

One potentially controversial bit here is that this requires a newer
libguestfs which isn't in f31 yet, so we pull it from f32 for now.

Closes: coreos/fedora-coreos-tracker#385
jlebon added a commit to jlebon/coreos-assembler that referenced this issue Mar 11, 2020
First, add a new `buildextend-metal4k` command to create 4k disk images.
Then, teach `kola` and `cosa run` to read these images.

To test:

    host$ cosa run -I metal4k
    ...
    vm$ sudo fdisk -l /dev/vda
    ...
    Sector size (logical/physical): 4096 bytes / 4096 bytes
    ...

One potentially controversial bit here is that this requires a newer
libguestfs which isn't in f31 yet, so we pull it from f32 for now.

Closes: coreos/fedora-coreos-tracker#385
jlebon added a commit to jlebon/coreos-assembler that referenced this issue Mar 12, 2020
First, add a new `buildextend-metal4k` command to create 4k disk images.
Then, teach `kola` and `cosa run` to read these images.

To test:

    host$ cosa run -I metal4k
    ...
    vm$ sudo fdisk -l /dev/vda
    ...
    Sector size (logical/physical): 4096 bytes / 4096 bytes
    ...

One potentially controversial bit here is that this requires a newer
libguestfs which isn't in f31 yet, so we pull it from f32 for now.

Closes: coreos/fedora-coreos-tracker#385
@jlebon
Member

jlebon commented Mar 19, 2020

Hmm, or maybe a cleaner alternative is having a metal.4k object, which inside has "raw.xz". OK, did that in coreos/fedora-coreos-releng-automation#87.

@lucab
Contributor

lucab commented Mar 20, 2020

@jlebon from the point of view of coreos-installer list-stream (the final consumer of all of this), how do the 4k artifacts fit in the arch-platform-format tuple?

jlebon added a commit to jlebon/fedora-coreos-releng-automation that referenced this issue Mar 20, 2020
@jlebon
Member

jlebon commented Mar 20, 2020

@lucab In coreos/fedora-coreos-releng-automation#87, I went back to just using 4k.raw.xz. So the list-stream output would look like this:

Architecture  Platform   Format
x86_64        aliyun     qcow2.xz
x86_64        aws        vmdk.xz
x86_64        azure      vhd.xz
x86_64        exoscale   qcow2.xz
x86_64        gcp        tar.gz
x86_64        metal      iso
x86_64        metal      pxe
x86_64        metal      raw.xz
x86_64        metal      4k.raw.xz
x86_64        openstack  qcow2.xz
x86_64        qemu       qcow2.xz
x86_64        vmware     ova
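To illustrate how a consumer might use the arch-platform-format tuple with the extra `4k.raw.xz` row, here is a small sketch that parses a table shaped like the one above and picks the metal artifact for a given sector size. The `Artifact` type, the function names, and the selection rule are assumptions for illustration; this is not how coreos-installer selects images.

```rust
/// One row of the table: architecture, platform, format.
#[derive(Debug, PartialEq)]
struct Artifact<'a> {
    arch: &'a str,
    platform: &'a str,
    format: &'a str,
}

/// Parses whitespace-separated rows, skipping the header line.
fn parse_table<'a>(text: &'a str) -> Vec<Artifact<'a>> {
    text.lines()
        .skip(1)
        .filter_map(|line| {
            let mut cols = line.split_whitespace();
            Some(Artifact {
                arch: cols.next()?,
                platform: cols.next()?,
                format: cols.next()?,
            })
        })
        .collect()
}

/// Hypothetical rule: 4096-byte-sector targets get `4k.raw.xz`, others `raw.xz`.
fn metal_format_for(sector_size: u32) -> &'static str {
    if sector_size == 4096 {
        "4k.raw.xz"
    } else {
        "raw.xz"
    }
}

/// Finds the metal artifact matching the architecture and sector size.
fn pick_metal<'a>(
    artifacts: &'a [Artifact<'a>],
    arch: &str,
    sector_size: u32,
) -> Option<&'a Artifact<'a>> {
    let want = metal_format_for(sector_size);
    artifacts
        .iter()
        .find(|a| a.arch == arch && a.platform == "metal" && a.format == want)
}

fn main() {
    let table = "Architecture  Platform   Format\n\
                 x86_64        metal      raw.xz\n\
                 x86_64        metal      4k.raw.xz\n\
                 x86_64        qemu       qcow2.xz\n";
    let artifacts = parse_table(table);
    println!("{:?}", pick_metal(&artifacts, "x86_64", 4096));
}
```

Keeping the 4k image as just another format value means the tuple itself does not need a fourth field.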

@dustymabe
Member

There are a few things left to do before it's all plumbed through:

  • plumb metal4k through all the metadata and website
  • adjust coreos-installer to automatically use the metal4k image

Some work has happened since then. Is the above still the current status?

@jlebon
Member

jlebon commented Mar 30, 2020

Yes, this is done. The download page already lists 4k images but I've submitted a patch to give it a better label and have it listed near the regular raw image: https://pagure.io/fedora-web/websites/pull-request/104

@jlebon jlebon closed this as completed Mar 30, 2020
@bgilbert
Contributor

Let's leave this open until coreos-installer can automatically select 4k images and/or we've documented how to install them.

@bgilbert bgilbert reopened this Mar 30, 2020
@jlebon
Member

jlebon commented Mar 30, 2020

Oh right, I totally forgot about that part. (I actually did start on that, which is probably why I got mixed up and thought it was done already!)

@bgilbert
Contributor

coreos-installer feature request: coreos/coreos-installer#201

@jlebon
Member

jlebon commented Mar 31, 2020

Let's also add CI coverage to the list of things missing before closing this out completely.

@jlebon
Member

jlebon commented Apr 8, 2020

Kola support for testing metal4k added in coreos/coreos-assembler#1312 and part of coreos-assembler CI. coreos-installer support and CI testing added in coreos/coreos-installer#203 and pipeline testing in coreos/fedora-coreos-pipeline#218.

So we just need a new coreos-installer release now. @dustymabe Is the new procedure here to leave this open until that happens or do we close it and tag it with something?

@dustymabe
Member

The fix for this landed upstream. It is now pending a testing stream release.

@dustymabe dustymabe added the status/pending-testing-release Fixed upstream. Waiting on a testing release. label Apr 10, 2020
@dustymabe
Member

we're still waiting on that coreos-installer release for this right?

@jlebon
Member

jlebon commented Apr 27, 2020

we're still waiting on that coreos-installer release for this right?

Yes, correct.

@dustymabe
Member

The fix for this went into testing stream release 31.20200505.2.0. Please try out the new release and report issues.

@dustymabe dustymabe added status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. and removed status/pending-testing-release Fixed upstream. Waiting on a testing release. labels May 8, 2020
@dustymabe
Member

The fix for this went into stable stream release 31.20200505.3.0.

@dustymabe dustymabe removed the status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. label May 20, 2020
@majorinche

majorinche commented Jun 2, 2020

Is this fixed? I still encounter this issue when installing the OS from a PXE environment.

the pxelinux.cfg file is:
DEFAULT pxeboot
TIMEOUT 20
PROMPT 0
LABEL pxeboot
KERNEL fedora-coreos-31.20200505.3.0-live-kernel-x86_64
APPEND ip=dhcp rd.neednet=1 initrd=fedora-coreos-31.20200505.3.0-live-initramfs.x86_64.img coreos.inst=yes coreos.inst.ignition_url=http://10.30.0.109/example.ign coreos.inst.image_url=http://10.30.0.109/fedora-coreos-31.20200505.3.0-metal.x86_64.raw.xz coreos.inst.install_dev=/dev/sda coreos.inst.insecure=yes
IPAPPEND 2

The reported error is still:

coreos-installer install /dev/sda --ignition /tmp/coreos-installer-hmtgYF --firstboot-args rd.neednet=1 ip=dhcp BOOTIF=01-3c-2a-1d-da-44 --image-url http://10.30.0.109/fedora-coreos-31.20200505.3.0-metal.x86_64.raw.xz --insecure

Error: parsing arguments
Caused by: getting sector size of /dev/sda
Caused by: opening "/dev/sda"
Caused by: No such file or directory (os error 2)

@dustymabe
Member

dustymabe commented Jun 2, 2020

Assuming /dev/sda exists on your machine, that error message could clearly use some work.

I see you are using the wrong metal image for a 4k sector drive. Could you try with the 4k metal image? fedora-coreos-31.20200517.3.0-metal4k.x86_64.raw.xz

@amyonatan

How did you build the image with a 4K sector size?

@bgilbert
Contributor

cosa buildextend-metal4k

8 participants