Segment fault with Ubuntu 24.04 20250120.5.0 #11471

Closed
3 of 16 tasks
RadxaYuntian opened this issue Jan 26, 2025 · 40 comments

@RadxaYuntian

RadxaYuntian commented Jan 26, 2025

Current workaround

GitHub has refused to acknowledge this problem. The current workaround can be found here.

Description

We have a scheduled job that runs every Sunday. It failed today, with no code changes in the last 2 weeks.

After checking the build log, it always fails at a dkms package installation. Once the workflow file was changed to print the dkms log, the error was always a gcc segmentation fault.

Changing the runner environment to ubuntu-22.04 fixed the segmentation fault. The action still failed, but only because of the change we made to investigate this issue.

What may be unusual in our case is that we use binfmt to run aarch64 gcc in a devcontainer, because the final output is an aarch64 system image. So this is not a natively running gcc failing.
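
For anyone who wants to surface the DKMS log from a failed run in a similar way, here is a minimal sketch of a failure-only step. The workflow name, the `./build.sh` command, and the search root are assumptions for illustration and not the actual change made in the rsdk repository. It also assumes the generated root filesystem is still on disk when the step runs. The only detail taken from this report is that the log lives under `/var/lib/dkms/<module>/<version>/build/`.

```yaml
# Sketch only: job layout and build command are hypothetical.
name: build
on:
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - name: Build image (hypothetical command)
        run: ./build.sh
      - name: Print DKMS make.log on failure
        if: failure()
        run: |
          # Search the workspace (including any generated rootfs) for DKMS logs.
          sudo find . -type f -path '*/var/lib/dkms/*' -name make.log \
            -print -exec cat {} \;
```

Running the step only on `failure()` keeps successful runs quiet while still capturing the compiler output when the build breaks.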

Platforms affected

  • Azure DevOps
  • GitHub Actions - Standard Runners
  • GitHub Actions - Larger Runners

Runner images affected

  • Ubuntu 20.04
  • Ubuntu 22.04
  • Ubuntu 24.04
  • macOS 12
  • macOS 13
  • macOS 13 Arm64
  • macOS 14
  • macOS 14 Arm64
  • macOS 15
  • macOS 15 Arm64
  • Windows Server 2019
  • Windows Server 2022
  • Windows Server 2025

Image version and build link

20250120.5.0

Is it regression?

20250105.1.0: https://github.com/RadxaOS-SDK/rsdk/actions/runs/12848906725

Expected behavior

DKMS installs successfully without a gcc segfault.

Actual behavior

gcc segfault:

   2025-01-26 07:47:36,252 bdebstrap ERROR: mmdebstrap failed with exit code 25. See above for details.
  
  /workspaces/rsdk
  
  DKMS make.log for radxa-overlays-0.1.20 for kernel 6.1.68-2-stable (aarch64)
  Sun Jan 26 07:47:16 UTC 2025
  make: Entering directory '/usr/src/linux-headers-6.1.68-2-stable'
  Segmentation fault (core dumped)
  warning: the compiler differs from the one used to build the kernel
    The kernel was built by: aarch64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110
    You are using:           gcc (Debian 12.2.0-14) 12.2.0
    CC [M]  /var/lib/dkms/radxa-overlays/0.1.20/build/radxa-overlays.o
    DTC     /var/lib/dkms/radxa-overlays/0.1.20/build/arch/arm64/boot/dts/amlogic/overlays/meson-g12-disable-gpu.dtbo
    DTC     /var/lib/dkms/radxa-overlays/0.1.20/build/arch/arm64/boot/dts/amlogic/overlays/meson-g12-disable-hdmi.dtbo
    DTC     /var/lib/dkms/radxa-overlays/0.1.20/build/arch/arm64/boot/dts/rockchip/overlays/radxa-s0-ext-antenna.dtbo
  gcc: internal compiler error: Segmentation fault signal terminated program cc1
  Please submit a full bug report, with preprocessed source (by using -freport-bug).
  See <file:///usr/share/doc/gcc-12/README.Bugs> for instructions.
  make[2]: *** [scripts/Makefile.lib:409: /var/lib/dkms/radxa-overlays/0.1.20/build/arch/arm64/boot/dts/rockchip/overlays/radxa-s0-ext-antenna.dtbo] Error 4
  make[1]: *** [scripts/Makefile.build:500: /var/lib/dkms/radxa-overlays/0.1.20/build/arch/arm64/boot/dts/rockchip/overlays] Error 2
  make[1]: *** Waiting for unfinished jobs....

Repro steps

  1. Clone https://github.com/RadxaOS-SDK/rsdk
  2. Cherry-pick RadxaYuntian/rsdk@090908a to print the dkms log
  3. Trigger workflow_dispatch for build.yaml
@RadxaYuntian
Author

RadxaYuntian commented Jan 26, 2025

The gcc version Debian 12.2.0-14 was released on 2023/01/08, so the last successful run (2025/01/19) and today's failed run are both using the same version in the devcontainer.

@deviantintegral

I can confirm this as well at https://github.com/pbkhrv/rtl_433-hass-addons/actions/runs/12972957498/job/36181006667. That job is compiling aarch64 in Docker under QEMU (I know, proper cross compiling would be better, but this is what the official Home Assistant builder action does so 🤷 ).

Is there a way to specify the runner image version to a previous 24.04 release to confirm the regression?
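
Pinning aside, a run can at least record which image it landed on, which helps when comparing a passing run with a failing one. The sketch below assumes the hosted runner exports the `ImageOS` and `ImageVersion` environment variables. That is an assumption about the runner environment and is not confirmed anywhere in this thread, so the step falls back to `unknown` if they are unset. The workflow and job names are hypothetical.

```yaml
# Sketch only: assumes the hosted runner exports ImageOS and ImageVersion.
name: show-runner-image
on:
  workflow_dispatch:
jobs:
  show:
    runs-on: ubuntu-24.04
    steps:
      - name: Print runner image version
        run: |
          echo "Runner image ${ImageOS:-unknown} ${ImageVersion:-unknown}"
```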

@woblerr

woblerr commented Jan 26, 2025

Same problem with buildx for linux/arm64 via QEMU: https://github.com/woblerr/docker-pgbackrest/actions/runs/12965488407/job/36165276019#step:7:2658

Rolling back to the ubuntu-22.04 runner solved the problem.

@MyreMylar

Chiming in to say that we have been seeing segfaults on our pygame-ce test runners in the ppc64le architecture build since getting version 20250120.5.0. Perhaps related, our s390x architecture build also reports that it can no longer detect the GNU compiler type.

As @deviantintegral says it would be nice to have a way to roll back to a previous runner image to isolate the problem.

@RaviAkshintala
Contributor

Hi @RadxaYuntian, thank you for bringing this issue to our attention. We will investigate and update you afterwards.

stevenhorsman added a commit to stevenhorsman/cloud-api-adaptor that referenced this issue Jan 27, 2025
Due to an
[issue](actions/runner-images#11471)
with Ubuntu 24.04 20250120.5.0 runner image
we have been seeing failures in our multi-arch images for
the last few days which is blocking the release. I assume that
the issue is something related to qemu, so downgrade to 22.04
until this issue is resolved.

Signed-off-by: stevenhorsman <[email protected]>
@BrianPugh

I'm also having very similar issues in tamp when using cibuildwheel to build python wheels for ppc64le and aarch64 targets.

@rtobar

rtobar commented Jan 28, 2025

Same issue here with gcc segfault, but in my case I saw it both with ubuntu-latest and ubuntu-20.04. Updating/downgrading to ubuntu-22.04 solved it as mentioned by other people.

stevenhorsman added a commit to confidential-containers/cloud-api-adaptor that referenced this issue Jan 28, 2025
charlesomer referenced this issue in mikebrady/shairport-sync Jan 28, 2025
@visstro

visstro commented Feb 13, 2025

tonistiigi/binfmt:latest had a large version bump just 2 days ago - tonistiigi/binfmt#165 (comment).
There is a new bug report tracking segfault issues: tonistiigi/binfmt#240

Looks like the solution for some people is to pin tonistiigi/binfmt:qemu-v7.0.0-28.

In my case tonistiigi/binfmt:qemu-v8.1.5 is still working fine (as I already noted in a previous comment).
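
For readers who register QEMU through docker/setup-qemu-action, a minimal sketch of pinning the v8.1.5 tag mentioned above could look like the following. The runner label, workflow name, and the final check step are assumptions for illustration; this comment does not say which runner image the v8.1.5 tag was tested on, so treat the combination as something to verify in your own workflow.

```yaml
# Sketch only: the runner label and the check step are assumptions.
name: build-arm64
on:
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3
        with:
          # Pin the emulator image instead of following :latest.
          image: tonistiigi/binfmt:qemu-v8.1.5
          platforms: arm64
      - name: Confirm that arm64 containers start under emulation
        run: docker run --rm --platform linux/arm64 alpine uname -m
```

Pinning a specific tag means a later bump of `:latest` cannot silently change the emulator between two otherwise identical runs.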

RadxaYuntian added a commit to RadxaOS-SDK/rsdk that referenced this issue Feb 17, 2025
@RadxaYuntian
Author

The segfault now happens even on 22.04 when using docker.io/tonistiigi/binfmt:latest.

22.04 + QEMU v7 seems to be fine.

baszoetekouw added a commit to OpenConext/OpenConext-BaseContainers that referenced this issue Feb 17, 2025
- add arm64 builds (using ubuntu-22.04 runners for now, until actions/runner-images#11471 is fixed)
- use new fancy cache feature of docker/build-push-action
- run actions for all branches, but only push for main
@ashwin153

ashwin153 commented Feb 17, 2025

Switching to ubuntu-22.04 and pinning to tonistiigi/binfmt:qemu-v7.0.0-28 worked for me.

    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3
        with:
          image: tonistiigi/binfmt:qemu-v7.0.0-28

soulgalore added a commit to sitespeedio/sitespeed.io that referenced this issue Feb 18, 2025
PastaPastaPasta added a commit to dashpay/dash that referenced this issue Feb 18, 2025
…`jammy`), pin QEMU to avoid segfault

20524e4 fix: pin version of QEMU to avoid segfault (Kittywhiskers Van Gogh)
c8ab705 revert: update containers and CI to use Ubuntu 24.04 LTS (`noble`) (Kittywhiskers Van Gogh)
2bcc90a revert: use default non-root user `ubuntu` introduced in Ubuntu 22.10 (Kittywhiskers Van Gogh)

Pull request description:

  ## Additional Information

  * There is a regression in `noble` that has caused a build failure in trying to package our release container ([build](https://github.com/dashpay/dash/actions/runs/13376032021)) linked to [docker/setup-qemu-action#198](docker/setup-qemu-action#198).

  * In light of this, all non-CI containers and jobs have been moved back to Ubuntu 22.04 (`jammy`) though it seems that downgrading alone may be insufficient (see [actions/runner-images#11471](actions/runner-images#11471 (comment))).
    * To remedy this we are pinning the version of QEMU as suggested in [tonistiigi/binfmt#240](tonistiigi/binfmt#240 (comment))

  ## Checklist:

  - [x] I have performed a self-review of my own code
  - [x] I have commented my code, particularly in hard-to-understand areas **(note: N/A)**
  - [x] I have added or updated relevant unit/integration/functional/e2e tests **(note: N/A)**
  - [x] I have made corresponding changes to the documentation **(note: N/A)**
  - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_

ACKs for top commit:
  UdjinM6:
    utACK 20524e4

Tree-SHA512: 26bb2cd55a0267b56f86938d97ddfa32f0cdd8a2786c0366eedbcddf706e38b6af93cd29ab98ba420cbdbd112561ded61e2dba906c4b233ad737f24730f58ddc
PastaPastaPasta added a commit to dashpay/dash-dev-branches that referenced this issue Feb 19, 2025
@amotl

amotl commented Feb 19, 2025

Dear @ashwin153,

thank you so much for your suggestion. We needed to apply that fix to fix the OCI build on ARM.

With kind regards,
Andreas.

NB: Shall this ticket be re-opened? Do participants of this conversation have any other suggestions or news on this topic?

@WyriHaximus

> NB: Shall this ticket be re-opened? Do participants of this conversation have any other suggestions or news on this topic?

IMHO yes, until it's all fully working again without having to pin versions, downgrade to an older Ubuntu version, etc.

Brooooooklyn added a commit to napi-rs/napi-rs that referenced this issue Feb 20, 2025
Brooooooklyn added a commit to napi-rs/napi-rs that referenced this issue Feb 20, 2025
Xynnn007 added a commit to Xynnn007/kbs that referenced this issue Feb 21, 2025
Due to actions/runner-images#11471 we see
similar problems when using ubuntu24.04 to do cross build for arm64
images.

A workaround is to downgrade to ubuntu22.04.

Fixes confidential-containers#715

Signed-off-by: Xynnn007 <[email protected]>
Xynnn007 added a commit to Xynnn007/kbs that referenced this issue Feb 21, 2025
Due to actions/runner-images#11471 we see
similar problems when using ubuntu24.04 to do cross build for arm64
images.

A workaround is to downgrade to ubuntu22.04 and use Qemu v7.

Fixes confidential-containers#715

Signed-off-by: Xynnn007 <[email protected]>
Xynnn007 added a commit to Xynnn007/kbs that referenced this issue Feb 21, 2025
Due to actions/runner-images#11471 we see
similar problems when using ubuntu24.04 to do cross build for arm64
images.

A workaround is to downgrade to ubuntu22.04 and use Qemu v8.

Fixes confidential-containers#715

Signed-off-by: Xynnn007 <[email protected]>
@amotl

amotl commented Feb 22, 2025

> Shall this ticket be re-opened?

> IMHO yes until it's all fully working again without having to pin versions.

@kishorekumar-anchala: Based on the information above, can you please re-open this ticket, if you agree?
