This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

CoreOS Alpha installation fails on HP ProLiant #307

Closed
mazad01 opened this issue Mar 16, 2015 · 11 comments

@mazad01

mazad01 commented Mar 16, 2015

Setup:
-HP ProLiant BL660c Gen8
-PXE-Boot to CoreOS live, then install with cloud-config

hostname ~ # coreos-install -C alpha -c ./cloud-config -d /dev/sda
Checking availability of "local-file"
Fetching user-data from datasource of type "local-file"
Downloading the signature for http://alpha.release.core-os.net/amd64-usr/618.0.0/coreos_production_image.bin.bz2...
2014-08-03 03:01:01 URL:http://alpha.release.core-os.net/amd64-usr/618.0.0/coreos_production_image.bin.bz2.sig [543/543] -> "/tmp/coreos-install.xlXoewhW53/coreos_production_image.bin.bz2.sig" [1]
Downloading, writing and verifying coreos_production_image.bin.bz2...
2014-08-03 03:01:30 URL:http://alpha.release.core-os.net/amd64-usr/618.0.0/coreos_production_image.bin.bz2 [133634322/133634322] -> "-" [1]
#GPG Success
/dev/sda: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/sda: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/sda: calling ioctl to re-read partition table: Device or resource busy
hostname ~ #

And on the iLO screen:

[image: 2]

After rebooting, a red screen appears with the illegal opcode spew. Any ideas?

@marineam marineam changed the title CoreOS Alpha installation fails (GPT header error) CoreOS Alpha installation fails on HP ProLian Mar 16, 2015
@marineam marineam changed the title CoreOS Alpha installation fails on HP ProLian CoreOS Alpha installation fails on HP ProLiant Mar 16, 2015
@marineam

The GPT errors are harmless; they are simply caused by the disk being larger than the base disk image that coreos-install wrote to it. CoreOS automatically fixes that up on first boot.

I presume the issue itself is related to GRUB on this particular hardware.

@marineam

Oh, I missed one extra detail in your report: wipefs got triggered.

/dev/sda: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/sda: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/sda: calling ioctl to re-read partition table: Device or resource busy

wipefs gets called to invalidate the partition tables if writing the image fails, but for some reason no error is being reported in your output. The code in question is:

echo "Downloading, writing and verifying ${IMAGE_NAME}..."
declare -a EEND
if ! wget --inet4-only --no-verbose -O - "${IMAGE_URL}" \
    | tee >(bunzip2 --stdout >"${DEVICE}") \
    | gpg --batch --trusted-key "${GPG_LONG_ID}" \
        --verify "${WORKDIR}/${SIG_NAME}" -
then
    EEND=(${PIPESTATUS[@]})
    [ ${EEND[0]} -ne 0 ] && echo "${EEND[0]}: Download of ${IMAGE_NAME} did not complete" >&2
    [ ${EEND[1]} -ne 0 ] && echo "${EEND[1]}: Cannot expand ${IMAGE_NAME} to ${DEVICE}" >&2
    [ ${EEND[2]} -ne 0 ] && echo "${EEND[2]}: GPG signature verification failed for ${IMAGE_NAME}" >&2
    wipefs --all --backup "${DEVICE}"
    exit 1
fi

Not sure why you aren't getting an error; perhaps ${PIPESTATUS[@]} is not behaving the way we expected it to. I'm hoping the invalid opcode error is related and not something else entirely, but in any case, after wipefs is called it shouldn't be possible to boot from the disk. One would generally get a "no OS" sort of error.

Also I'm guessing errors from bunzip2 --stdout >"${DEVICE}" are not actually going to be handled properly, but that is likely not the issue here.

This script really just needs to get replaced :(
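
A minimal bash sketch of the two concerns above, using stand-in commands instead of wget, bunzip2 and gpg. It is not the coreos-install code; the function names are hypothetical. It shows that PIPESTATUS has to be copied immediately after the pipeline, and that a command running inside >( ... ) is not a pipeline stage, so its failure never shows up in PIPESTATUS.

```shell
#!/usr/bin/env bash
# Sketch only: stand-in commands replace wget, bunzip2 and gpg.

# Run a three-stage pipeline whose middle stage fails and print the
# per-stage exit codes as a space-separated string.
pipeline_statuses() {
    local -a codes
    true | false | true
    codes=("${PIPESTATUS[@]}")   # copy before any other command overwrites it
    echo "${codes[*]}"
}

# The writer inside >( ... ) exits with status 7, but it is not one of the
# pipeline stages, so PIPESTATUS only reports echo, tee and cat.
procsub_statuses() {
    local -a codes
    echo data | tee >(cat >/dev/null; exit 7) | cat >/dev/null
    codes=("${PIPESTATUS[@]}")
    echo "${codes[*]}"
}

pipeline_statuses   # prints: 0 1 0
procsub_statuses    # prints: 0 0 0
```

In the second case every reported status is 0 even though the writer failed, which matches the worry that errors from bunzip2 --stdout >"${DEVICE}" would go unnoticed.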

@mazad01
Author

mazad01 commented Mar 17, 2015

That's what I was expecting too, a missing OS type of error. Here's the red screen if it helps.
[image: capture]

@jl-montes

I've encountered a similar issue with BL blade hardware and PXE booting.

Can you share the value of the 'localboot' param in your pxelinux.cfg/default file?
If it's localboot 0, try localboot -1 and re-initiate the PXE boot afterwards.

@mazad01
Author

mazad01 commented Mar 17, 2015

@jl-montes I've also read that on various forums but didn't think to try it, because I completely disabled PXE boot after CoreOS installed. After doing that, I was getting a missing-OS type of error from the hard disk. But after making several changes in the system BIOS (date, array default boot volume, NIC disabling), along with setting localboot -1 in the pxelinux.cfg default file, it installed successfully and the reboot brought up the GRUB menu.

@marineam

Yeah, booting from local disk via PXE is a known issue, but in this case the root cause is coreos-install silently failing without telling us why.

@marineam

@mazad01 so a second run of coreos-install succeeded? I guess we can consider your particular issue resolved, but I'll note this as another pain point when prioritizing the rewrite of coreos-install into a more reliable form.

@mazad01
Author

mazad01 commented Mar 17, 2015

Yes, a second run succeeded after the environment changes listed above.

@marineam

Ok, resolving. Filed a new bug #308 for coreos-install's poor error reporting. It should have been clear that the install failed and how it failed before you rebooted and got the scary invalid opcode.

@mazad01
Author

mazad01 commented Mar 17, 2015

Thanks @marineam !

@mazad01
Author

mazad01 commented Mar 17, 2015

Also, for those wondering, here is the step-by-step process I followed.

Things I changed:

-Hardcoded the correct time in the BIOS. The time in iLO was wrong, so an incorrect time was being passed to the BIOS. Because of this, the following error appeared:

gpg: Signature made Thu Mar 12 00:34:18 2015 UTC using RSA key ID E5676EFC
gpg: key E5676EFC was created 3107583 seconds in the future (time warp or clock problem)
gpg: key E5676EFC was created 3107583 seconds in the future (time warp or clock problem)
gpg: Can't check signature: Time conflict
2: GPG signature verification failed for coreos_production_image.bin.bz2

-Recreated the HDD array and enabled LUN boot in the drive configuration menu.
Not sure if that fixed it, but it wasn't enabled in the previous install attempts. Maybe it needs to be enabled on this specific hardware?

-Created a /tftpboot/pxelinux.cfg/01-MAC-ADDRESS file on the TFTP server. The only difference between that file and /tftpboot/pxelinux.cfg/default is the localboot line, which I changed to localboot -1. This is a good way to test host-specific configs, since PXELINUX looks for files named after specific device identifiers (MAC addresses in our case) before falling back to default.

-Denied the 2nd NIC's reservation on the DHCP server. Sometimes the IP wouldn't resolve, so ssh attempts to the host failed. The 2nd NIC was essentially conflicting with the 1st NIC. Probably not important; more likely a local environment issue.
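
For the clock problem in the first item, a rough pre-install check might look like the sketch below. It is an illustration, not part of coreos-install: the function name is hypothetical, and SIG_EPOCH is simply the signature date from the gpg output above (Thu Mar 12 00:34:18 2015 UTC) converted to epoch seconds. The assumption is that a clock earlier than the signature date cannot be right for verifying that image.

```shell
#!/usr/bin/env bash
# Sketch only: flag a system clock that is behind a known reference time.

# Epoch seconds for "Thu Mar 12 00:34:18 2015 UTC", the signature date in the log.
SIG_EPOCH=1426120458

# Succeeds when the clock (first argument, epoch seconds) is at or after the
# reference (second argument); otherwise reports the gap on stderr and fails.
clock_is_plausible() {
    local now=$1 reference=$2
    if (( now < reference )); then
        echo "clock is $(( (reference - now) / 86400 )) day(s) behind the reference" >&2
        return 1
    fi
}

clock_is_plausible "$(date -u +%s)" "$SIG_EPOCH" \
    || echo "fix the BIOS/iLO time before running coreos-install"
```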

So when this host PXE'd, it loaded that file I created instead of default, and the rest of the steps are pretty much generic.
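
The per-host file described above could be generated with something like the following sketch. The function names are hypothetical, and it assumes the default file contains a line of the form "localboot N". PXELINUX names the per-MAC file 01- followed by the MAC address in lowercase hex separated by dashes, where 01 is the Ethernet hardware type.

```shell
#!/usr/bin/env bash
# Sketch only: derive the per-host PXELINUX config name from a MAC address
# and copy the default config with the localboot value changed.

# "AA:BB:CC:DD:EE:FF" -> "01-aa-bb-cc-dd-ee-ff"
pxelinux_mac_filename() {
    local mac
    mac=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | tr ':' '-')
    echo "01-${mac}"
}

# Write <cfg_dir>/01-<mac> from <cfg_dir>/default, rewriting any
# "localboot N" line to "localboot -1" and leaving other lines untouched.
write_host_config() {
    local cfg_dir=$1 mac=$2
    sed -E 's/^([[:space:]]*localboot[[:space:]]+).*/\1-1/' \
        "$cfg_dir/default" >"$cfg_dir/$(pxelinux_mac_filename "$mac")"
}

# Example call (the MAC address here is made up):
# write_host_config /tftpboot/pxelinux.cfg 00:11:22:AA:BB:CC
```

Keeping the change in a per-MAC file means other hosts that still boot from default are unaffected while the setting is being tested.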


hostname ~ # curl -O cloud-config path
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7221  100  7221    0     0  2344k      0 --:--:-- --:--:-- --:--:-- 7051k
hostname ~ # coreos-install -C alpha -c ./cloud-config -d /dev/sda
Checking availability of "local-file"
Fetching user-data from datasource of type "local-file"
Downloading the signature for http://alpha.release.core-os.net/amd64-usr/618.0.0/coreos_production_image.bin.bz2...
2015-03-17 15:11:20 URL:http://alpha.release.core-os.net/amd64-usr/618.0.0/coreos_production_image.bin.bz2.sig [543/543] -> "/tmp/coreos-install.zD0yIdlCWG/coreos_production_image.bin.bz2.sig" [1]
Downloading, writing and verifying coreos_production_image.bin.bz2...
2015-03-17 15:11:49 URL:http://alpha.release.core-os.net/amd64-usr/618.0.0/coreos_production_image.bin.bz2 [133634322/133634322] -> "-" [1]
gpg: Signature made Thu Mar 12 00:34:18 2015 UTC using RSA key ID E5676EFC
gpg: key 93D2DCB4 marked as ultimately trusted
gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
gpg: Good signature from "CoreOS Buildbot (Offical Builds) <[email protected]>" [ultimate]
Installing cloud-config...
Success! CoreOS alpha 618.0.0 is installed on /dev/sda
hostname ~ #

After reboot, the default option was chosen and the boot succeeded.

This is hostname.domain.com (Linux x86_64 3.19.0) 19:50:23
#SSH Info
eno1: IP and MAC
eno10:
eno2:  MAC
eno9:

hostname login:
