This repository has been archived by the owner on Oct 29, 2024. It is now read-only.

Try to work around Debian kernel weirdness #238

Closed
wants to merge 1 commit into from

rincebrain
Contributor

Looking at this, the build fails later because zlib1g-dev never got installed: the one huge apt install failed as a whole because of this:

+ sudo -E apt-get --yes install linux-headers-5.10.0-0.bpo.8-cloud-arm64 zlib1g-dev uuid-dev libblkid-dev libselinux-dev xfslibs-dev libattr1-dev libacl1-dev libudev-dev libdevmapper-dev libssl-dev libaio-dev libffi-dev libelf-dev libmount-dev libpam0g-dev pamtester python-dev python-setuptools python-cffi python-packaging python3 python3-dev python3-setuptools python3-cffi libcurl4-openssl-dev python3-packaging python-distlib python3-distlib
Reading package lists...
Building dependency tree...
Reading state information...
Package linux-headers-5.10.0-0.bpo.8-cloud-arm64 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'linux-headers-5.10.0-0.bpo.8-cloud-arm64' has no installation candidate

Which is curious, because the buildslave AMI itself is running 4.19.0-11-arm64, and the x86_64 testslave is getting a non-bpo kernel...
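One way to confirm this kind of mismatch on a builder is to ask apt whether the headers package for the running kernel has a candidate at all. The following is a minimal sketch, not part of this PR: `has_candidate` is a hypothetical helper that reads `apt-cache policy` output on stdin and relies on apt printing `Candidate: (none)` when nothing is installable.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): reads `apt-cache policy <pkg>`
# output on stdin and succeeds only if a real candidate version is listed.
has_candidate() {
    awk '/^ *Candidate:/ { found = 1; if ($2 == "(none)") bad = 1 }
         END { exit ((found && !bad) ? 0 : 1) }'
}

# Intended use on a builder (assumes a Debian system with apt available):
#   pkg="linux-headers-$(uname -r)"
#   apt-cache policy "$pkg" | has_candidate \
#       || echo "$pkg has no installation candidate" >&2
```

Empty output from `apt-cache policy`, which is what apt prints for a package it does not know at all, is also treated as "no candidate".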

Unfortunately, I can't see into the AMI instance that gets spawned, so I can't tell which AMI it was running or why it has a BPO kernel when the x86_64 one does not. All I can tell is that the runs that succeeded before really were using those headers and not, say, the kernel version the buildbot has. Since the buildslave's AMIs are the only ones I see mentioned in the repo, I can't see where it would be getting a flawed one like that...

So here's a workaround. Note that this should not be merged as-is - if this is going to happen, the relevant package files should be rehosted somewhere and that location pointed to instead, because otherwise the snapshot "mirror" is probably going to start banning hosts for hitting it too much.

For some reason, the arm64 testbot is booting an old version of a
-backports kernel, and failing to find headers because it's, well,
old.

So let's shove that into a separate command, and add an attempted
workaround to try to install the right thing anyway, until we figure
out why the AMI is doing something backwards...

Signed-off-by: Rich Ercolani <[email protected]>
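The "separate command" idea from the commit message can be sketched roughly as below. This is an illustration only, not the actual diff: `install_split` and the `APT_INSTALL` variable are hypothetical names, and the sketch only tolerates a failed headers install rather than fetching the old package from anywhere.

```shell
#!/bin/sh
# Installer command; overridable so the sketch can be dry-run
# with APT_INSTALL=echo. The default mirrors the command in the log above.
: "${APT_INSTALL:=sudo -E apt-get --yes install}"

# Hypothetical helper: install linux-headers-* packages in their own
# command so a missing headers package cannot block everything else.
install_split() {
    headers=""
    others=""
    for pkg in "$@"; do
        case "$pkg" in
            linux-headers-*) headers="$headers $pkg" ;;
            *)               others="$others $pkg" ;;
        esac
    done
    # Word splitting on $others and $headers is intentional here.
    if [ -n "$others" ]; then
        $APT_INSTALL $others || return 1
    fi
    if [ -n "$headers" ]; then
        $APT_INSTALL $headers \
            || echo "warning: could not install:$headers" >&2
    fi
    return 0
}
```

With this split, a failure on the headers package only produces a warning, while a failure on any of the regular build dependencies such as zlib1g-dev still aborts.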
@behlendorf
Contributor

When the buildbot spins up an AMI it calls the bb-bootstrap.sh script which installs and configures buildbot, does some minimal additional configuration, and then installs the latest kernel. For the Debian aarch64 AMIs it also switches to using the linux-image-cloud-arm64 repository for the newer kernel then reboots on to it.

This sure looks like a problem with the linux-image-cloud-arm64 repository on Buster not providing a matching linux-image and linux-headers package. It's quite easy to reproduce using the latest AMI for Buster.

One option which would probably work is to move to Debian Bullseye now that it's been released. There doesn't appear to be any issue with the repositories there. However, we may run into other minor issues. For example, it does look like some of the names of packages we need to install have changed.

Presumably the Debian linux-image-cloud-arm64 repository on Buster will get fixed at some point, which would resolve this.

@rincebrain
Contributor Author

That's curious. (Why does it need the newer kernel?)

Sigh. I've filed a bug against Debian.

In the interim, we could just fetch the old package, as I suggested, or hop to Debian 11, which would also work.

@behlendorf
Contributor

My recollection is the newer kernel was primarily to get better performance in ec2. We could probably do without.

Let me try and roll things forward to Debian 11. A test build went fine, the CI is idle, and we should move forward anyway.

@behlendorf
Contributor

behlendorf commented Oct 29, 2021

Moving forward didn't go as smoothly as I'd hoped. In the end I opted to make two CI changes to resolve these failures. I switched back to the default kernel for aarch64, and I updated the bootstrap script so we always reboot on to the newest kernel. In practice we were already doing this for almost all of the builders anyway.

5b320ac Always reboot Linux builders on to the latest kernel
2f2927f Revert "bb-bootstrap, install a newer kernel on buster arm64"
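For readers without the bootstrap script at hand, the "always reboot on to the newest kernel" check could look something like the sketch below. It is an assumption about the general shape, not the code from 5b320ac: `newest_kernel` and `needs_reboot` are hypothetical helpers, and GNU `sort -V` only approximates dpkg's version ordering.

```shell
#!/bin/sh
# Hypothetical helper: print the highest kernel version read from stdin,
# using GNU sort's version ordering.
newest_kernel() {
    sort -V | tail -n 1
}

# Hypothetical helper: succeed if the newest installed kernel (stdin)
# differs from the running one ($1), i.e. a reboot would change kernels.
needs_reboot() {
    newest=$(newest_kernel)
    [ -n "$newest" ] && [ "$newest" != "$1" ]
}

# Intended use on a builder (assumes one directory per installed kernel
# under /lib/modules, as on Debian):
#   ls /lib/modules | needs_reboot "$(uname -r)" && sudo reboot
```

Keeping the comparison separate from the reboot itself makes it easy to check on a builder which kernel the script would pick before letting it restart anything.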

@behlendorf
Contributor

Closing. Switching to the default kernel worked as expected and resolved this issue.

@behlendorf behlendorf closed this Oct 29, 2021