MinGW GCC LTO internal compiler error when linking: cannot read 'LTO_section_decls' #102867

Closed
akien-mga opened this issue Feb 14, 2025 · 7 comments · Fixed by #103077 · May be fixed by #103176

@akien-mga
Member

Tested versions

System information

Fedora 41, mingw64-gcc-14.2.1-3.fc41, mingw64-headers-12.0.0-3.fc41, mingw64-binutils-2.42-2.fc41

Issue description

This is the bug report I promised in #102506. It describes the actual bug that #102506 just hacks around; that PR should be reverted and the bug fixed properly. This is very likely a GCC or binutils bug, not a Godot bug.

Reproducing this bug (for now) requires reverting #102506 locally, as reintroducing that otherwise unnecessary code seems to prevent the LTO internal compiler error. It's all fairly brittle though, and I'm not confident the error won't come back as we add or change other code, hence why this is a release blocker.

When compiling current master (b607110) with MinGW-GCC, with #102506 reverted, using the following command:

scons p=windows target=editor arch=x86_64 production=yes module_mono_enabled=yes

(mono module and LTO are important)

The linking step fails with:

lto1: internal compiler error: cannot read 'LTO_section_decls' from /tmp/ccf5RNMk.ltrans115.o
Please submit a full bug report, with preprocessed source (by using -freport-bug).
See <http://bugzilla.redhat.com/bugzilla> for instructions.
make: *** [/tmp/cct9c3iK.mk:347: /tmp/ccf5RNMk.ltrans115.ltrans.o] Error 1
make: *** Waiting for unfinished jobs....
lto-wrapper: fatal error: make returned 2 exit status
compilation terminated.
/usr/lib/gcc/x86_64-w64-mingw32/14.2.1/../../../../x86_64-w64-mingw32/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

Or this:

lto1: error: two or more sections for .gnu.lto__ZZL39_register_variant_builtin_methods_arrayvEN17Method_Array_back17get_argument_typeEi.lto_priv.0.37928630.b9f437417adc9a80
(null):0: confused by earlier errors, bailing out
make: *** [/tmp/cc3CZrbQ.mk:347: /tmp/ccmO0Wsr.ltrans115.ltrans.o] Error 1
make: *** Waiting for unfinished jobs....
lto-wrapper: fatal error: make returned 2 exit status
compilation terminated.
/usr/lib/gcc/x86_64-w64-mingw32/14.2.1/../../../../x86_64-w64-mingw32/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
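Both logs point at the same LTRANS unit, ltrans115 (hpvb later traces the failure to the 115th partition). When comparing runs, a small helper like the following can pull the partition indices out of a saved link log. It is a hypothetical sketch, not part of Godot's tooling: ltrans_partitions is a made-up name, and grep -o assumes GNU or BSD grep.

```shell
#!/bin/sh
# Hypothetical helper: print the LTRANS partition index(es) named in a link log
# read from stdin, e.g. "/tmp/ccf5RNMk.ltrans115.o" -> 115.
ltrans_partitions() {
    # Keep every "ltrans<digits>" token (".ltrans.o" has no digits and is skipped),
    # strip the "ltrans" prefix, then de-duplicate in numeric order.
    grep -o 'ltrans[0-9][0-9]*' | sed 's/^ltrans//' | sort -n -u
}
```

For example, run the scons command above with 2>&1 | tee build.log appended, then ltrans_partitions < build.log.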

I debugged this a bit with @bruvzg, who could also reproduce it with mingw-gcc on macOS with GCC 14.2.0. He tried -freport-bug, as the error message advises, but that seems to prevent the bug from occurring.

@Repiteo noted that the second error is similar to the one in #92585 (note: x86_32 there, while here it's on x86_64), which we just worked around at the time by updating the base Fedora version to get newer mingw/gcc/binutils.

But here we're already using the latest versions provided by Fedora 41. mingw-headers and mingw-gcc on Fedora 42 aren't newer, but mingw-binutils is at 2.43.1, which might be worth testing against.

Since the error popped up between 4.4.beta2 and 4.4.beta3, I bisected it and landed on #102179. There's evidently nothing egregious in that patch that should trigger an LTO crash. Through trial and error, I managed to find a workaround in #102506: a minimal partial revert of #102179. Reintroducing some of that code, even in a path that is never executed (though it could be, as it's gated by an undocumented project setting), seems to prevent the crash.

We also noticed that leftover LTO objects from a previous build can also trigger the issue. But even after a git clean -fxd, the issue is still reproducible when reverting #102179.

Obviously, the problem is deeper so we need to dig. That's where I page @hpvb as the resident expert in debugging GCC bugs :D

Steps to reproduce

Set up latest mingw with GCC 14.2.

git revert e12a424bc569ac5ae9ff2944a8df7fc11b157d71
git clean -fxd
scons p=windows target=editor arch=x86_64 production=yes module_mono_enabled=yes

Minimal reproduction project (MRP)

n/a

@akien-mga
Member Author

But here we're using the latest version provided on Fedora 41. mingw-headers and mingw-gcc on Fedora 42 aren't newer, but mingw-binutils is at 2.43.1, might be worth testing against.

So I actually tested reproducing the issue in a fedora:42 podman container, and it seems fixed there. So it might have been a bug in binutils 2.42 that was fixed in 2.43.1 already.

If that's the case, we could update official builds to Fedora 42 right away, or backport binutils 2.43.1.
That wouldn't solve the problem for end users whose MinGW distribution doesn't ship binutils 2.43.1 yet.
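To check which binutils a given MinGW toolchain actually ships before comparing results across distributions, a small version check along these lines may help. This is a hypothetical sketch: the helper names and the MINGW_LD override are made up, x86_64-w64-mingw32-ld is assumed to be the prefixed linker installed by the MinGW binutils package, sort -V needs GNU coreutils (or a BSD sort that supports it), and the first line of ld --version differs between distributions.

```shell
#!/bin/sh
# Hypothetical helpers for comparing MinGW binutils versions.

# Print the first dotted version number found on stdin,
# e.g. "GNU ld version 2.42-2.fc41" -> "2.42".
first_version() {
    grep -o '[0-9][0-9]*\.[0-9][0-9.]*' | head -n 1
}

# Succeed if version $1 >= version $2, using sort -V (version) ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether the MinGW linker is at least binutils 2.43.1.
# MINGW_LD can point at another linker binary (assumed name by default).
check_mingw_binutils() {
    ld_version=$(${MINGW_LD:-x86_64-w64-mingw32-ld} --version | head -n 1 | first_version)
    if version_ge "$ld_version" 2.43.1; then
        echo "binutils $ld_version (>= 2.43.1)"
    else
        echo "binutils $ld_version (< 2.43.1)"
    fi
}
```

Note that a version check alone cannot settle the question: as discussed further down the thread, the bug was also reproduced with binutils 2.43.1.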

@bruvzg
Member

bruvzg commented Feb 14, 2025

So it might have been a bug in binutils 2.42 that was fixed in 2.43.1 already.

The macOS MinGW build I was able to reproduce it with seems to be using binutils 2.43.1.

Sources for the MinGW package are:

url "https://downloads.sourceforge.net/project/mingw-w64/mingw-w64/mingw-w64-release/mingw-w64-v12.0.0.tar.bz2"
sha256 "cc41898aac4b6e8dd5cffd7331b9d9515b912df4420a3a612b5ea2955bbeed2f"

url "https://ftp.gnu.org/gnu/binutils/binutils-2.43.1.tar.bz2"
sha256 "becaac5d295e037587b63a42fad57fe3d9d7b83f478eb24b67f9eec5d0f1872f"

url "https://ftp.gnu.org/gnu/gcc/gcc-14.2.0/gcc-14.2.0.tar.xz"
sha256 "a7b39bc69cbf9e25826c5a60ab26477001f7c08d85cec04bc0e29cabed6f3cc9"

@akien-mga
Member Author

macOS MinGW build I was able to reproduce it with seems to be using binutils 2.43.1.

Yeah, I also tested in a fedora:41 podman container with the mingw-binutils packages from Fedora 42 installed (mingw-binutils-generic-2.43.1-3.fc42.x86_64.rpm and mingw64-binutils-2.43.1-3.fc42.x86_64.rpm), and I can still reproduce the bug.

So there might be another component in Fedora 42 that's different, or it's just a coincidence.

@hpvb
Member

hpvb commented Feb 17, 2025

I've been able to reproduce this, I'm doing some work with gdb on the wrapper to see if I can work out what's going on.

@hpvb
Member

hpvb commented Feb 17, 2025

I have not been able to figure out exactly what the problem is, but I have been able to build Godot successfully with LTO using the following ccflags and linkflags: -fno-use-linker-plugin -fwhole-program. This should result in optimization at least similar to normal LTO.

For now, I recommend using this while I try to figure out with the GCC folks what is going on.
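The workaround flags can be passed through Godot's custom-flag build options. A minimal sketch, assuming the ccflags and linkflags SCons options mentioned above and reusing the build command from the reproduction steps; build_with_workaround and the SCONS override (handy for a dry run with SCONS=echo) are hypothetical names, not part of Godot.

```shell
#!/bin/sh
# Workaround flags suggested above, applied to both compilation and linking.
WORKAROUND_FLAGS="-fno-use-linker-plugin -fwhole-program"

# Hypothetical wrapper around the reproduction build command.
# Set SCONS=echo to print the arguments instead of building.
build_with_workaround() {
    ${SCONS:-scons} p=windows target=editor arch=x86_64 production=yes \
        module_mono_enabled=yes \
        ccflags="$WORKAROUND_FLAGS" linkflags="$WORKAROUND_FLAGS"
}
```

Run build_with_workaround from the root of the Godot checkout after sourcing the snippet.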

@bruvzg
Member

bruvzg commented Feb 18, 2025

Retested with current master + reverted e12a424, and it seems to no longer fail on macOS (same toolchain), so it's pretty random (probably a timing-sensitive race condition).

@hpvb
Member

hpvb commented Feb 18, 2025

@bruvzg it is not likely timing sensitive. The problem reproduces with or without using multiple threads.

The problem is almost certainly in the "Whole Program Analysis" (WPA) phase of LTO. This is the phase that partitions the LTO work into smaller objects to be processed in series or in parallel. It seems to partition the work incorrectly, which then leaves the 115th partition unlinkable.

The issue is that almost any change to the build causes the problem to go away. For instance, just turning LTO off for literally any single .a or .o file causes the link to succeed.

Because of this the problem is turning out to be very hard to debug. But I'm afraid that it is not a race condition.
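If WPA partitioning is at fault, changing GCC's partitioning algorithm should change the outcome. The following is a hypothetical experiment, not something tried in this thread: it rebuilds with GCC's documented -flto-partition= option (for example one, which puts the whole program in a single LTRANS partition, or balanced, GCC's default) passed through the linkflags build option, assuming WPA honors the option when it is given at link time. try_partitioning and the CLEAN_CMD, SCONS and LOG_DIR overrides are made-up names for illustration.

```shell
#!/bin/sh
# Hypothetical experiment: rebuild once per LTO partitioning algorithm
# and report which ones link successfully.
try_partitioning() {
    # Logs go outside the source tree because git clean -fxd deletes untracked files.
    log_dir=${LOG_DIR:-/tmp}
    for alg in "$@"; do
        # Stale LTO objects from a previous build can affect the result,
        # so start every run from a clean tree (override with CLEAN_CMD).
        ${CLEAN_CMD:-git clean -fxd}
        if ${SCONS:-scons} p=windows target=editor arch=x86_64 production=yes \
            module_mono_enabled=yes linkflags="-flto-partition=$alg" \
            >"$log_dir/godot-lto-$alg.log" 2>&1; then
            echo "$alg: ok"
        else
            echo "$alg: FAILED (see $log_dir/godot-lto-$alg.log)"
        fi
    done
}
```

For example, try_partitioning balanced one compares the default partitioning against a single partition. Given how sensitive this bug is to unrelated changes, a passing run would be suggestive rather than conclusive.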

akien-mga added a commit to akien-mga/godot that referenced this issue Feb 20, 2025
…gram`

- Works around and closes godotengine#102867.
- Works around and closes godotengine#102982.

Co-authored-by: Hein-Pieter van Braam-Stewart <[email protected]>
rt9391 pushed a commit to rt9391/rt9391godot2 that referenced this issue Feb 21, 2025