
Known profiling issues where unwinding fails #56332

Open
IanButterworth opened this issue Oct 25, 2024 · 5 comments

@IanButterworth
Member

IanButterworth commented Oct 25, 2024

Experience

Profile.print() shows most of the samples hidden as C frames, and printing with C=true shows those frames with no depth, e.g.:

  127╎127   @julialib/libopenblas64_.so:?  dgemm_beta_ZEN
 5910╎5910  @julialib/libopenblas64_.so:?  dgemm_kernel_ZEN

Reason

Samples that show up as C frames with no depth in the Profile tree indicate that the unwinder failed to unwind from library code back through Julia's code.

Potential causes

I'm just collecting comments from Slack here. It might make sense to open dedicated issues, or just PRs to fix these.

  1. LLVM libunwind doesn't support unwinding during function prologues/epilogues

  2. The AArch64 platform prohibits asynchronous unwinding due to a bug in the spec

  3. Apple compiles most of its libraries with only compact unwind info, which by design cannot encode asynchronous unwinding (leaving that information out is what makes it compact)

    • Get Apple to fix this on their end?
    • Implement workarounds?

Quoting @vtjnash

We can make some moderately educated guesses based on inspecting the registers and disassembling the code there, which is the usual approach to this.
E.g. if we see the instruction is a syscall, we can guess that this is libc and that we can skip this frame (just set PC=LR).
Other times we can hope that FP holds the current frame pointer, so popping that gives the old {FP, LR}.

  4. The dgemm kernels in OpenBLAS are missing correct unwind info. (#56327, "Empty profile information on v1.11", looked like this issue but was actually a short-lived bug where profiling only collected samples on thread 1.)
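A rough sketch of the two fallback guesses quoted from @vtjnash, written in C for AArch64. Everything here is hypothetical: regs_t, fake_mem_t, and fallback_step are illustrative names, not Julia's or libunwind's API, and memory is simulated so the sketch runs standalone. It assumes the standard AArch64 frame record layout (saved FP at [FP], saved LR at [FP + 8]) and treats `svc #0` as the syscall instruction; a real unwinder would read the instruction at PC itself rather than take it as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* AArch64 encoding of `svc #0`, the instruction that traps into the kernel. */
#define AARCH64_SVC0 0xD4000001u

/* Hypothetical register snapshot, e.g. copied out of a signal context. */
typedef struct {
    uint64_t pc, fp, lr;
} regs_t;

/* Hypothetical stand-in for the sampled thread's memory: 8-byte words
   starting at address `base`, so the sketch runs without a real stack. */
typedef struct {
    uint64_t base;
    const uint64_t *words;
    size_t nwords;
} fake_mem_t;

/* Reads one aligned 8-byte word; fails instead of faulting on bad addresses. */
static bool read_u64(const fake_mem_t *m, uint64_t addr, uint64_t *out) {
    if (addr < m->base || (addr - m->base) % 8 != 0)
        return false;
    uint64_t i = (addr - m->base) / 8;
    if (i >= m->nwords)
        return false;
    *out = m->words[i];
    return true;
}

/* One guessed unwind step for a PC that has no usable unwind info.
   `insn_at_pc` is the instruction word at r->pc (passed in for the sketch).
   Updates `r` to describe the caller and returns true, or returns false
   when no caller can be recovered. */
static bool fallback_step(const fake_mem_t *m, uint32_t insn_at_pc, regs_t *r) {
    assert(m != NULL && r != NULL);

    /* Guess 1: stopped on a syscall, so assume a leaf libc wrapper that has
       not pushed a frame and return straight to LR (PC = LR). */
    if (insn_at_pc == AARCH64_SVC0) {
        if (r->lr == 0)
            return false;
        r->pc = r->lr;
        r->lr = 0; /* the caller's own LR is unknown after this step */
        return true;
    }

    /* Guess 2: hope FP points at a valid frame record {saved FP at [FP],
       saved LR at [FP + 8]} and pop it to get the old {FP, LR}. */
    uint64_t saved_fp, saved_lr;
    if (r->fp == 0 || !read_u64(m, r->fp, &saved_fp) ||
        !read_u64(m, r->fp + 8, &saved_lr))
        return false;
    /* The stack grows down, so the caller's record must be at a higher
       address; a saved FP of 0 marks the outermost frame. */
    if (saved_lr == 0 || (saved_fp != 0 && saved_fp <= r->fp))
        return false;
    r->pc = saved_lr;
    r->fp = saved_fp;
    r->lr = 0;
    return true;
}
```

Each call recovers at most one caller. The checks on the saved FP (zero or strictly increasing, since the stack grows down) keep a wrong guess from looping; a real implementation would also need to confirm that every address it reads is mapped before dereferencing it.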

Please feel free to refine this summary directly or suggest changes.

IanButterworth changed the title from "Known profiling issues" to "Known profiling issues where unwinding fails" on Oct 25, 2024
@topolarity
Member

@IanButterworth is (4) totally resolved at this point?

It seems like it's still pretty easy to end up with top-level dgemm_* entries if you do @profile peakflops()

@IanButterworth
Member Author

I don't think the gemm thing is fixed. I'm not aware of a PR. I think the issue got confused with a threading bug.

@mlechu
Contributor

mlechu commented Jan 23, 2025

@IanButterworth Chiming in because I was looking at this—do you know of any library with this issue other than OpenBLAS?

@gbaraldi
Member

System libraries like libc for example

@topolarity
Member

System libraries like libc for example

There's not much we can do about those right now, right? (due to (3), presumably)

It'd be especially useful to have a test case for (1), if we think we need that ported for AArch64
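For (1), one thing a test could check is the caller chain recovered from a sample taken at a function's first instruction, or in an epilogue after the frame record has already been popped. The toy model below, in C, is not LLVM libunwind and reads no real unwind info; frame_record_t, backtrace_toy, and the record_not_pushed flag are hypothetical. It only illustrates why such PCs need special handling on AArch64: at those points the return address is only in LR and FP still points at the caller's frame record, so a plain frame-record walk silently drops the immediate caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an AArch64 frame record: the {saved FP, saved LR} pair
   stored at address `addr` by a prologue like `stp x29, x30, [sp, #-16]!`. */
typedef struct {
    uint64_t addr, saved_fp, saved_lr;
} frame_record_t;

/* Hypothetical register snapshot of the sampled thread. */
typedef struct {
    uint64_t pc, fp, lr;
} regs_t;

static const frame_record_t *find_record(const frame_record_t *recs, size_t n,
                                         uint64_t addr) {
    for (size_t i = 0; i < n; i++)
        if (recs[i].addr == addr)
            return &recs[i];
    return NULL;
}

/* Writes up to `max` code addresses (sampled PC first, then return
   addresses) into `out` and returns how many were written.
   `record_not_pushed` means the current function's frame record is not on
   the stack (first prologue instruction, or an epilogue after
   `ldp x29, x30`): the return address is then only in LR, and FP still
   points at the caller's record. */
static size_t backtrace_toy(const frame_record_t *recs, size_t n, regs_t r,
                            bool record_not_pushed, uint64_t *out, size_t max) {
    assert(out != NULL || max == 0);
    size_t depth = 0;
    if (depth < max)
        out[depth++] = r.pc;
    /* Without this step the immediate caller is silently skipped. */
    if (record_not_pushed && depth < max && r.lr != 0)
        out[depth++] = r.lr;
    /* Walk the chain of frame records; `max` also bounds cyclic chains. */
    while (depth < max && r.fp != 0) {
        const frame_record_t *rec = find_record(recs, n, r.fp);
        if (rec == NULL || rec->saved_lr == 0)
            break;
        out[depth++] = rec->saved_lr;
        r.fp = rec->saved_fp;
    }
    return depth;
}
```

For a stack main → C → F sampled at F's first instruction, the walk that ignores LR reports F, then a return address in main, then main's caller, and never C. A real test would instead sample a PC inside an actual prologue and compare the unwinder's output against the known call chain.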
