runtime: TestSegv/Segv failure with 'unknown pc' on linux/ppc64le #52963
coredump shows the command that failed is I'm confused by the stack shows "net.(*file).getLineFromData+0x58", which is an offset to |
@prattmic @cherrymui any advice? |
The signal PC 0x3fe45b3c1a is weird. It doesn't look like a Go PC. If you have a core dump, can you check what that address is? Does it belong to (say) a memory mapping of a C shared library? |
No mapping around 0x3fe45b3c1a.
|
Just to double check, that is from the core dump of the exact crash shown above, not a different run, correct? Because those mappings seem similar enough for 0x3fe45b3c1a to be the randomized load address in a different run. |
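The mapping check discussed above can be sketched in Go as a lookup over a `/proc/<pid>/maps` style listing. This is only an illustration: the `sampleMaps` text, the paths in it, and the helper names `parseMaps` and `findMapping` are made up for the example and are not taken from the core dump in this issue. Only the two addresses come from the thread.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// mapping is one address range from a /proc/<pid>/maps style listing.
type mapping struct {
	start, end uint64 // half-open range [start, end)
	name       string // backing file, empty for anonymous mappings
}

// sampleMaps is hypothetical data, NOT the mappings from the real core dump.
const sampleMaps = `00010000-00150000 r-xp 00000000 fe:01 1234 /tmp/testprogcgo
3fa0000000-3fa0120000 r-xp 00000000 fe:01 5678 /lib/libc.so.6
`

// parseMaps parses lines of the form "start-end perms offset dev inode [path]".
func parseMaps(text string) ([]mapping, error) {
	var out []mapping
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		lo, hi, ok := strings.Cut(fields[0], "-")
		if !ok {
			return nil, fmt.Errorf("bad address range %q", fields[0])
		}
		start, err := strconv.ParseUint(lo, 16, 64)
		if err != nil {
			return nil, err
		}
		end, err := strconv.ParseUint(hi, 16, 64)
		if err != nil {
			return nil, err
		}
		name := ""
		if len(fields) >= 6 {
			name = fields[5]
		}
		out = append(out, mapping{start: start, end: end, name: name})
	}
	return out, sc.Err()
}

// findMapping reports the mapping that contains addr, if any.
func findMapping(maps []mapping, addr uint64) (mapping, bool) {
	for _, m := range maps {
		if addr >= m.start && addr < m.end {
			return m, true
		}
	}
	return mapping{}, false
}

func main() {
	maps, err := parseMaps(sampleMaps)
	if err != nil {
		panic(err)
	}
	// The signal PC and the LR value quoted in this thread.
	for _, addr := range []uint64{0x3fe45b3c1a, 0x10f0f2} {
		if m, ok := findMapping(maps, addr); ok {
			fmt.Printf("%#x is inside %s\n", addr, m.name)
		} else {
			fmt.Printf("%#x is not in any mapping\n", addr)
		}
	}
}
```

Against a real crash, the same lookup would be fed the mappings recorded in the core dump rather than the sample text.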
Yes, it's from coredumpctl, and here is the file. I've tried running the test 100 times, with no luck reproducing it. |
Thanks. Then the address is not a PC at all. It's unclear to me how we get there, or how it gets into the signal context. |
Debug logs. Something is weird.
This frame.lr != 0, so it should be two defer of frame.sp. IMHO there are three possible explanations:
|
Where did you see that? I didn't see it.
This line is printed from the signal handler, which indicates the signal context's PC is 0x3fe45b3c1a. For such a bad PC I don't think the traceback code can do anything useful. The question is why we get such a PC in the signal context. |
Just my understanding of the stack frame:
which aligns with the frame struct
The code seems fine (why sigcode=0?) runtime/defs_linux_riscv64.go
|
No. That hex dump is not the stkframe structure. They are unrelated. Also, traceback probably doesn't matter here. We start with a bad signal PC in the first place.
sigcode=0 (i.e. _SI_USER) is expected. The test is testing a user-sent SIGSEGV. Hmmm, if the ucontext fields are wrong, I'd think many things would go wrong, not just this failure. |
You're right about ucontext fields. |
The LR is 0x10f0f2. Could you look at the core and see what that address is? It is not a Go PC either. But it belongs to the program's mapping. PLT stub? |
The original binary had been deleted, so I rebuilt it at commit 99d6300. go/src/runtime/cgo/gcc_linux_riscv64.c Lines 32 to 38 in fd0ffed
|
Thanks. Could you disassemble the function and see if it is a PC immediately after a call instruction? It looks like it is. It is calling into libc (or libpthread). Maybe the C library is doing something weird (especially when it is manipulating the signal mask)? |
Here is the binary testprogcgo.zip
|
I think a couple of recent failures on the ppc64le builder are related. I turned off ASLR on the ppc64le builder and, while trying to reproduce on the ppc64le VM, eventually hit a similar failure. 2022-09-08T21:16:39-a9a3982/linux-ppc64le-buildlet The PC is within libc's pthread_sigmask, called from _cgo_sys_thread_start. |
Change https://go.dev/cl/430375 mentions this issue: |
Indeed, the failure mode at least matches the same pattern. It's not clear to me whether the underlying cause is the same.
CC @golang/ppc64 |
These look very similar. In the above debugging and in Paul's debugging, the PC ended up in the pthread_sigmask code. Also, in all the logs where it fails, the stack trace shows that the goroutine for Segv.func1 is calling preemption code. If I run the test manually and it passes, I don't see preempt code in the stack trace. I don't know all the details of what happens during preemption. Perhaps another goroutine or thread starts running, so the test ends up sending the SEGV signal to the wrong one, which happens to be running C code. Then the Go backtrace doesn't always work, since slot 0 in the stack frame is the backchain for the stack and not a PC as it is in Go code, and that commonly results in the unknown PC error. |
Found new dashboard test flakes for:
2022-08-10 17:59 linux-ppc64le-power9osu go@9f8685f4 runtime.TestSegv (log)
2022-09-08 21:16 linux-ppc64le-buildlet go@a9a39822 runtime.TestSegv (log)
2022-09-21 15:30 linux-ppc64le-power9osu go@e246cf62 runtime.TestSegv (log)
|
Found new dashboard test flakes for:
2023-01-17 19:54 linux-ppc64le-power10osu go@b003ee49 runtime.TestSegv (log)
|
Found new dashboard test flakes for:
2023-01-31 17:44 linux-ppc64le-power10osu go@7d7fd6d3 runtime.TestSegv (log)
|
Found new dashboard test flakes for:
2023-02-27 19:11 linux-ppc64le-power9osu go@132fae93 runtime.TestSegv (log)
|
Found new dashboard test flakes for:
2023-03-22 11:02 linux-ppc64le-buildlet go@9b6231a1 runtime.TestSegv (log)
|
Found new dashboard test flakes for:
2023-05-02 15:36 linux-ppc64le-buildlet go@2d83b646 runtime.TestSegv (log)
|
Found new dashboard test flakes for:
2023-05-12 12:35 linux-ppc64le-power10osu go@b26c3927 runtime.TestSegv (log)
|
Change https://go.dev/cl/500535 mentions this issue: |
greplogs -l -e 'FAIL: TestSegv/Segv .*(?:\n[ ]{8}.*)*unknown pc' --since=2022-01-01
2022-05-16T19:48:35-99d6300/linux-riscv64-unmatched
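To illustrate what the pattern passed to `-e` above selects, here is a minimal sketch assuming the expression is interpreted with Go `regexp` syntax. The two log snippets and the helper name `hasUnknownPC` are hypothetical and are not copied from the builder logs. The pattern requires the `FAIL: TestSegv/Segv` line to be followed only by lines indented with eight spaces until `unknown pc` appears.

```go
package main

import (
	"fmt"
	"regexp"
)

// unknownPC is the pattern from the greplogs command in this issue.
var unknownPC = regexp.MustCompile(`FAIL: TestSegv/Segv .*(?:\n[ ]{8}.*)*unknown pc`)

// failingLog is a made-up log where "unknown pc" appears inside the
// indented output block that follows the FAIL line.
const failingLog = "--- FAIL: TestSegv/Segv (0.01s)\n" +
	"        hypothetical test output line\n" +
	"        runtime: unknown pc 0x3fe45b3c1a\n"

// otherLog is a made-up log where "unknown pc" appears only after the
// indented block has ended, so it is not attributed to TestSegv/Segv.
const otherLog = "--- FAIL: TestSegv/Segv (0.01s)\n" +
	"        some other failure\n" +
	"FAIL\n" +
	"unknown pc mentioned elsewhere\n"

// hasUnknownPC reports whether log contains a TestSegv/Segv failure whose
// indented output mentions "unknown pc".
func hasUnknownPC(log string) bool {
	return unknownPC.MatchString(log)
}

func main() {
	fmt.Println(hasUnknownPC(failingLog))
	fmt.Println(hasUnknownPC(otherLog))
}
```

The `(?:\n[ ]{8}.*)*` group is what keeps the match confined to the failing subtest's own output instead of any later mention of `unknown pc` in the same log.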
Compare #50979, #47537.
(attn @golang/riscv64; CC @golang/runtime)