Mandrel cannot dynamically load awt shared libraries from OpenJDK distro #487
Comments
Can we disable AWT tests for 23.x for the time being until this is resolved? |
FWIW I tried copying the so files from the GraalVM build to the Mandrel build and it still fails, so to me it doesn't look like an issue with the so files. |
Maybe the shim generation? |
If I am interpreting the logs correctly
On the contrary, when using Mandrel it fails to find the symbol in
which makes me think that this is not an issue related to the .so files from the JDK being copied/generated along the image. The reason the symbol doesn't end up in Note that the shim generation in Mandrel and GraalVM is responsible for the "generation" (copying actually) of:
Lines 249 to 252 in 7c66772
While it also generates the dummy/empty:
Lines 296 to 301 in 7c66772
the actual contents (at least the reachable ones) of which are expected to be statically linked in |
When I grab LabsJDK jdk-17.0.7+4 and build it, e.g. with
...and then build Mandrel with it, using Graal's fairly recent master a81ecbe, and then use that to run the reproducer, it fails in the same way as if I built with vanilla JDK. So it's probably not about what is different in LabsJDK but rather about how it was built.
|
Thanks for that. Yes, that matches my understanding. The weird thing is this empty |
Thanks for verifying this, @Karm. |
@jerboaa Can you see anything interesting here?
Source: https://github.com/graalvm/labs-openjdk-17/blob/master/.github/workflows/build-linux.yml Comparing to mine:
I have zlib bundled, they have the system one... Not sure what the make labsjdk goal does though. EDIT:
And then use the
No dice. |
I wanted to get Mandrel or Graal with libawt with debuginfo, Mandrel cannot build with LabsJDK Debug, so I built with LabsJDK and shamelessly replaced lib directory later, with what I found in LabsJDK debug. The resulting Mandrel distro is unable to build my Quarkus app, so that's a failure. Yet the reason why is a linker failing on:
What is more interesting to me is that the whole linker command contains also our infamous
in the log:
|
That's done here: Line 186 in 770da72
|
It's only the static libraries which seem to make a difference. I've stopped the native image build, collected the link command and relevant files so I can do further experiments. The link command is this:
If I point this ( FWIW, the linker version script doesn't seem to matter. I've already tried with this base-case:
|
Experimenting with the above I ended up with the following working command:
The moment I use either of:
from the Mandrel build it fails. Inspecting |
Thanks for this, yes some flag when compiling the OpenJDK static libs seems to cause this. We "just" need to figure out which it is ;-) |
@zakkak I'm curious, how did you manage to see this? Which tool did you use and how? Thanks! |
Redirected the output of
That's proving pretty hard so far, because I can't find the |
It's these switches apparently: |
https://bugs.openjdk.org/browse/JDK-8239563 is related, but I need to figure out how to use that infra to produce the desired result. |
And here it is: graalvm/labs-openjdk-17@f5100f0 |
just did the same for 20 :) graalvm/labs-openjdk-20@be6b24f The thing is that @Karm said that when building labsjdk on his own he still sees the failures, which doesn't seem to align with the fact that the change is in the repo :/ |
He used an earlier tag, which doesn't have that change: #487 (comment) |
Oh I see, he didn't grab a |
I'll do some verification and then try to bring this upstream. |
So apparently we need to change how the static builds we use in Mandrel are being built. While @jerboaa works on achieving this, I experimented with a hacky alternative that allows us to change the visibility of the symbols post-build. This is meant to be a possible workaround to get things working until we have a proper fix. The goal is to unhide the symbols we need (anything listed in the export.list provided by GraalVM/Mandrel to the linker script). Thanks to this SO post we can achieve this doing the following:
Prerequisite
To do the unhiding we will use a small tool called rebind:
git clone https://github.com/BR903/ELFkickers
cd ELFkickers/rebind
make
export PATH=$PATH:$(pwd)/
Patch the libraries
# unarchive the static library
mkdir tmp-dir
cd tmp-dir
ar x /path/to/libjava.a # (might need to apply this to other libs as well)
# for symbol in symbols from export.list;
# do
for f in *.o;
do
rebind --visibility default $f $symbol;
done
# done
ar r /path/to/libjava.a *.o |
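The outer loop in the steps above is left as a comment. A minimal sketch of how it could be filled in follows, assuming the symbols have first been extracted into a file with one plain symbol name per line (a hypothetical format; the real export.list may need preprocessing). The `unhide_symbols` function and the `REBIND` override are illustrative names, not part of Mandrel or ELFkickers; the `rebind` invocation itself is the one shown above.

```shell
# Sketch: run `rebind --visibility default` for every symbol/object pair.
# Assumption: the symbols file holds one plain symbol name per line.
# REBIND is a hypothetical override, e.g. REBIND=echo for a dry run.
unhide_symbols() {
    symbols_file=$1
    shift
    while IFS= read -r symbol; do
        # Skip blank lines in the symbols file.
        [ -n "$symbol" ] || continue
        for f in "$@"; do
            "${REBIND:-rebind}" --visibility default "$f" "$symbol"
        done
    done < "$symbols_file"
}
```

Inside the unarchived directory this could be called as `unhide_symbols symbols.txt *.o` before re-archiving with `ar r`. Setting `REBIND=echo` first only prints the arguments that would be passed, which may be useful to review the list before touching the object files.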
Upstream bug for the visibility change: https://bugs.openjdk.org/browse/JDK-8304871 |
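To check whether a given static library still contains hidden symbols, before or after a visibility change, one option is to filter the symbol table printed by binutils `readelf -sW`. The sketch below assumes the standard column layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name); `hidden_symbols` is a hypothetical helper name and the library path is a placeholder.

```shell
# Sketch: read `readelf -sW` output on stdin and print the names of all
# symbols whose visibility column (6th field) is HIDDEN.
hidden_symbols() {
    awk 'NF >= 8 && $6 == "HIDDEN" { print $8 }'
}

# Possible usage (placeholder path, requires binutils):
#   readelf -sW /path/to/libjava.a | hidden_symbols | sort -u
```

If a symbol that the linker is expected to export shows up in this list, that would be consistent with the failure discussed in this thread.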
Integrated in mainline. JDK 20 and 17 backports in progress. Edit: Fix is going to be in OpenJDK |
Yep, fix will be in |
With this mandrel build:
The reproducer passes for me. That is, it's solved with any EA build equal or later than |
(Milestone: sorry for the noise, twitchy fingers...) |
This issue appears to be stale because it has been open 30 days with no activity. This issue will be closed in 7 days unless |
Since this comment, 23.0 mandrel release date has changed to June. Are you OK with closing this? |
Hmm. O.K. We can clone JDK 20 specific one? |
Yes I believe cloning this and making the clone JDK 20 specific, adding an affects/23.0 label, and assigning it to milestone 23.0.1.0-Final should be the way to go. |
The fix is in JDK 20. It'll get released in July. |
This is a followup on
The relevant change in GraalVM's behavior is: oracle#4921
TL;DR:
.so libraries load just fine when Mandrel is built with LabsJDK distro ✔️
.so libs cannot be loaded, dlopen returns null pointer, when Mandrel is built with OpenJDK Temurin distro ❌
Steps to reproduce
With this tiny reproducer app:
I am unable to make it work at runtime with OpenJDK Temurin based Mandrel. It fails at dlopen (although the file is really there and it seems found and loaded by the native dl).
Steps to reproduce - Temurin OpenJDK ❌
(-g -O0 -H:+TrackNodeSourcePosition is just to use gdb, no impact on reproducibility of the issue)
It fails to load needed .so libs:
Full log: mandrel-openjdk-log.txt [910K]
I am using a home built Mandrel, mandrel-23.0-SNAPSHOT.tar.xz [195M],
made of GraalVM a81ecbe with this diff:
Peek into the log
You can see that the native dlopen failed, but not because it wouldn't find the file. It rather failed to load its dependencies' symbols, IMHO.
See the mandrel-openjdk-log.txt [910K] between line
and line
You can find this:
With gdb (gdb-session.txt [3.9K]), one can see that it is trying to find any suitable libawt.so, and it might look like the file is just not found. The file is found all right the first time, it just fails to load.
LabsJDK ✔️
Building exactly the same Mandrel distro as mentioned above, this time with LabsJDK 17.0.7+4-jvmci-23.0-b09-linux-amd64 makes it work.
See the full successful log: mandrel-labs-log.txt [4.0M]
Between lines:
and
The AWT lib was successfully loaded.
I am not sure at the moment, whether we need a change on the Temurin side, in how it builds those shared objects or whether we could adjust the shim generation on the Mandrel side...
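One way to narrow this down, sketched here under assumptions, is to list the symbols a shared object expects its loader to provide and compare them with what the native executable actually exports. The helper below parses the output of binutils `nm -D --undefined-only`; `undefined_symbols` is a hypothetical name and the library path is a placeholder.

```shell
# Sketch: read `nm -D --undefined-only` output on stdin and print the
# undefined symbol names, dropping any @VERSION suffix.
undefined_symbols() {
    awk '$1 == "U" { sub(/@.*/, "", $2); print $2 }'
}

# Possible usage (placeholder path, requires binutils):
#   nm -D --undefined-only /path/to/libawt.so | undefined_symbols
```

Any name in this list that neither the executable nor another loaded library exports would make dlopen fail even though the file itself is found.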
FYI @zakkak @jerboaa