jdk_jfr tests fail on jdk8 hotspot xLinux #3253

Open
Haroon-Khel opened this issue Jan 21, 2022 · 6 comments
Labels
release triage · triage required (Issue needs deeper triage to determine which repo to move issue into)

Comments

@Haroon-Khel
Contributor

The following tests from jdk_jfr in the extended openjdk test suite fail on xLinux

https://ci.adoptopenjdk.net/job/Test_openjdk8_hs_extended.openjdk_x86-64_linux_testList_1/30/testReport/

jdk/jfr/event/compiler/TestCompilerCompile.java.TestCompilerCompile

[2022-01-19T14:19:55.882Z] TEST RESULT: Error. Program 
`/home/jenkins/workspace/Test_openjdk8_hs_extended.openjdk_x86-64_linux_testList_1/openjdkbinary/j2sdk-image/bin/java' 
timed out (timeout set to 960000ms, elapsed time including timeout handling was 960365ms).

jdk/jfr/event/runtime/TestShutdownEvent.java.TestShutdownEvent

[2022-01-19T14:27:29.508Z] TEST RESULT: Error. Program 
`/home/jenkins/workspace/Test_openjdk8_hs_extended.openjdk_x86-64_linux_testList_1/openjdkbinary/j2sdk-image/bin/java' 
timed out (timeout set to 960000ms, elapsed time including timeout handling was 960366ms).

jdk/jfr/event/runtime/TestShutdownEvent.java.TestShutdownEvent has been excluded on aarch64 Linux, but not yet on xLinux: #3007

@smlambert
Contributor

Deep history shows this test passes occasionally. Can we verify the status of the jdk_jfr tests from the October release on this platform? If we hit the same issues with the last release (and reran to see them pass), we will do the same with this release. Try grinding on the machine the test is shown to pass on in the deep history view (test-docker-ubuntu2010-x64-3). If it passes, what is special about that machine in relation to what this test is doing (look at the test code)? If it fails, run with ITERATIONS=10 to see what the pass rate is.

We need to take extra action to try to figure out the root cause (test / infra / product) and raise an issue in the appropriate location.

If it is a test issue:
we should determine whether this is a 'flaky' test (JBS report for a test-material issue).

If it is an infra issue:
determine whether it needs a particular machine setup, and raise/fix the infra requirements (adoptium/infrastructure issue).

If it is an intermittent product issue:
raise a JBS report for the JFR issue.

@smlambert added the 'triage required' label on Jan 21, 2022
@Haroon-Khel
Contributor Author

Rerunning on the machine it last passed on, test-docker-ubuntu2010-x64-3, https://ci.adoptopenjdk.net/job/Grinder/3263/console

@jiekang
Contributor

jiekang commented Jan 25, 2022

@Haroon-Khel @smlambert

The deep history log provides a dump when the timeout occurs:
https://trss.adoptium.net/output/test?id=61e848cba7df1a2924028e9a

TestCompilerCompile has:

[2022-01-19T14:46:27.450Z] "MainThread" #9 prio=5 os_prio=0 tid=0x00007f0e2c12a800 nid=0x158efa waiting on condition [0x00007f0df70ee000]
[2022-01-19T14:46:27.450Z]    java.lang.Thread.State: TIMED_WAITING (sleeping)
[2022-01-19T14:46:27.450Z] 	at java.lang.Thread.sleep(Native Method)
[2022-01-19T14:46:27.450Z] 	at jdk.test.lib.Utils.waitForCondition(Utils.java:530)
[2022-01-19T14:46:27.450Z] 	at jdk.test.lib.Utils.waitForCondition(Utils.java:499)
[2022-01-19T14:46:27.450Z] 	at jdk.jfr.event.compiler.TestCompilerCompile.doTest(TestCompilerCompile.java:85)
[...]
[2022-01-19T14:46:27.450Z] Warning: 'NoSuchMethodError' on register of sun.hotspot.WhiteBox::NMTNewArena(J)J
[2022-01-19T14:46:27.450Z] Warning: 'NoSuchMethodError' on register of sun.hotspot.WhiteBox::NMTFreeArena(J)V
[2022-01-19T14:46:27.450Z] Warning: 'NoSuchMethodError' on register of sun.hotspot.WhiteBox::NMTArenaMalloc(JJ)V

which is the test waiting for the request to compile the method via WhiteBox.
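
For context, a minimal sketch of that wait pattern (an assumed shape, not the actual TestCompilerCompile source; the method names, compile level, and timeout below are illustrative) could look like this:

// Hypothetical sketch: the test enqueues a method for compilation through
// WhiteBox and then polls with jdk.test.lib.Utils.waitForCondition (the
// Thread.sleep frames in the dump) until the VM reports it as compiled.
// Assumes the WhiteBox classes are on the boot classpath and the JVM runs
// with -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI.
import java.lang.reflect.Method;
import jdk.test.lib.Utils;
import sun.hotspot.WhiteBox;

public class CompileWaitSketch {
    public static void main(String[] args) throws Exception {
        WhiteBox wb = WhiteBox.getWhiteBox();
        Method target = CompileWaitSketch.class.getDeclaredMethod("payload");
        wb.enqueueMethodForCompilation(target, 4); // compile level is illustrative
        // If the compile request is never serviced, this loop only ends when
        // its timeout expires; in the real test the jtreg timeout fires first.
        boolean compiled = Utils.waitForCondition(() -> wb.isMethodCompiled(target), 60_000L);
        System.out.println("compiled = " + compiled);
    }

    static void payload() { }
}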

TestShutdownEvent has:

[2022-01-19T14:54:31.488Z] "MainThread" #9 prio=5 os_prio=0 tid=0x00007fd7dc12a000 nid=0x159ece in Object.wait() [0x00007fd787db3000]
[2022-01-19T14:54:31.488Z]    java.lang.Thread.State: WAITING (on object monitor)
[2022-01-19T14:54:31.488Z] 	at java.lang.Object.wait(Native Method)
[2022-01-19T14:54:31.488Z] 	- waiting on <0x00007fd7c309e3a0> (a java.lang.UNIXProcess)
[2022-01-19T14:54:31.488Z] 	at java.lang.Object.wait(Object.java:502)
[2022-01-19T14:54:31.488Z] 	at java.lang.UNIXProcess.waitFor(UNIXProcess.java:395)
[2022-01-19T14:54:31.488Z] 	- locked <0x00007fd7c309e3a0> (a java.lang.UNIXProcess)
[2022-01-19T14:54:31.488Z] 	at jdk.test.lib.process.ProcessTools.getOutput(ProcessTools.java:98)
[2022-01-19T14:54:31.488Z] 	at jdk.test.lib.process.OutputAnalyzer.<init>(OutputAnalyzer.java:48)
[2022-01-19T14:54:31.488Z] 	at jdk.test.lib.process.ProcessTools.executeProcess(ProcessTools.java:412)
[2022-01-19T14:54:31.488Z] 	at jdk.jfr.event.runtime.TestShutdownEvent.runSubtest(TestShutdownEvent.java:82)

where it's waiting for the subprocess it launched to finish. However, the stdout has:

[2022-01-19T14:54:31.493Z] ...
[2022-01-19T14:54:31.493Z] Output overflow:
[2022-01-19T14:54:31.493Z] JT Harness has limited the test output to the text
[2022-01-19T14:54:31.493Z] at the beginning and the end, so that you can see how the
[2022-01-19T14:54:31.493Z] test began, and how it completed.
[2022-01-19T14:54:31.493Z] 
[2022-01-19T14:54:31.493Z] If you need to see more of the output from the test,
[2022-01-19T14:54:31.493Z] set the system property javatest.maxOutputSize to a higher
[2022-01-19T14:54:31.493Z] value. The current value is 100000
[2022-01-19T14:54:31.493Z] ...

So we can't see exactly which subprocess was stuck.
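
For reference, a rough sketch (assumed shape, not the actual TestShutdownEvent source; the subtest class name and flag are placeholders) of the launch-and-wait pattern in the stack above:

// Hypothetical sketch: ProcessTools.executeProcess wraps the child JVM in an
// OutputAnalyzer, whose construction ends up in process.waitFor() -- the
// WAITING-on-UNIXProcess frame in the dump. A hung child therefore hangs the
// parent test until jtreg times it out.
import jdk.test.lib.process.OutputAnalyzer;
import jdk.test.lib.process.ProcessTools;

public class SubprocessWaitSketch {
    public static void main(String[] args) throws Exception {
        // The real test builds its command line from the subtest being run;
        // the class name and flag here are purely illustrative.
        ProcessBuilder pb = ProcessTools.createJavaProcessBuilder(
                "-XX:+FlightRecorder", "SomeShutdownSubtest");
        OutputAnalyzer out = ProcessTools.executeProcess(pb); // blocks in waitFor()
        out.shouldHaveExitValue(0);
    }
}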

I'd consider them both test/infra issues rather than product issues.

@smlambert
Contributor

Thanks for this assessment @jiekang - based on it, and the fact that this test passes intermittently on some machines (test-docker-ubuntu2010-x64-3 has both passed and failed before, which also potentially indicates infra/test issues), we will not block the release on these failures, but indeed we should figure out which subprocess is stuck and why...

@Haroon-Khel - could it be the case that there are some processes lying around that then block some of this test's subprocesses from running? And when we do a cleanup of the machines (static docker containers), we see one passing run before the machine is back in a state where this test fails again.

@Haroon-Khel
Contributor Author

could it be the case that there are some processes lying around that then block some of this test's subprocesses from running? And when we do a cleanup of the machines (static docker containers), we see one passing run before the machine is back in a state where this test fails again.

I'll take a look

@Haroon-Khel
Contributor Author

Haroon-Khel commented Jan 25, 2022

On test-docker-ubuntu2010-x64-3 there was a rogue Test_openjdk17_hs_extended.openjdk_x86-64_linux_testList_1 process from Jan 18 still running.

Also, I noticed that test-docker-ubuntu2010-x64-3 is actually test-docker-fedora35-x64-1. I replaced test-docker-ubuntu2010-x64-3 with test-docker-ubuntu2110-x64-1 a few weeks back and must have forgotten to remove the Jenkins node for test-docker-ubuntu2010-x64-3; then, when setting up test-docker-fedora35-x64-1, I gave it the same port on the dockerhost. That may explain why the test no longer passes on test-docker-ubuntu2010-x64-3, and it hints towards this being a machine-specific problem.

Anyway, rerunning on test-docker-fedora35-x64-1: https://ci.adoptopenjdk.net/job/Grinder/3291/console
