Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix trampoline on PPC #38980

Merged
merged 3 commits into from
Dec 25, 2020
Merged

Fix trampoline on PPC #38980

merged 3 commits into from
Dec 25, 2020

Conversation

vchuravy
Copy link
Member

In my earlier tests I was checking on a machine with powepc64le as arch, thus us skipping the inclusion of trampolines.

extern void (*func)();

void main() {
  (*func)(); 
}

yields

       .globl main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
.LCF0:
0:      addis 2,12,.TOC.-.LCF0@ha
        addi 2,2,.TOC.-.LCF0@l
        .localentry     main,.-main
        mflr 0
        std 0,16(1)
        std 31,-8(1)
        stdu 1,-112(1)
        .cfi_def_cfa_offset 112
        .cfi_offset 65, 16
        .cfi_offset 31, -8
        mr 31,1
        .cfi_def_cfa_register 31
        std 2,24(1)
        addis 9,2,.LC0@toc@ha
        ld 9,.LC0@toc@l(9)
        ld 9,0(9)
        mr 12,9
        mtctr 12
        bctrl
        ld 2,24(1)

So the important changes I made were:

  1. Add a global entry that calculates r2 correctly
  2. Load the function pointer out of the toc, based on r2

Things that confuse me:

  1. The ABI docs and the C example say that I should do de reference the TOC entry instead of directly using it,
    but I observed that the pointer obtained is already the correct one
  2. I had to remove std 2,24(1); \ which is odd, since IIUC it's optional but including it caused segmentation faults down the line.

This passes tests on my development machine.

@staticfloat do you recall what you based the original implementation on?

cc: @xorJane

@vchuravy vchuravy added system:powerpc PowerPC backport 1.6 Change should be backported to release-1.6 labels Dec 23, 2020
@vchuravy vchuravy added this to the 1.6 blockers milestone Dec 23, 2020
@vchuravy
Copy link
Member Author

@staticfloat I added an error check to the initialization and surprisingly it get's triggered on aarch64/macos/windows

ERROR: Unable to load jl_compile_extern_c from libjulia-internalmake[1]: *** [/buildworker/worker/package_linuxaarch64/build/usr/lib/julia/corecompiler.ji] Error 1

Copy link
Member

@staticfloat staticfloat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you recall what you based the original implementation on?

Keno and I were mimicking the PLT disassembly from other projects, such as libgit2.so:

$ objdump -d libgit2.so | grep -E "^[0-9a-f]+.*plt_call.*:" -A 4 | head
0000000000016400 <0000001b.plt_call.__gmon_start__>:
   16400:       18 00 41 f8     std     r2,24(r1)
   16404:       60 8b 82 e9     ld      r12,-29856(r2)
   16408:       a6 03 89 7d     mtctr   r12
   1640c:       20 04 80 4e     bctr

That being said, our method of calculating the pointer value was haphazard and we went with the first reasonable thing that compiled and seemed to work. I trust your tests more than what we put together, but perhaps if @Keno takes a look, he'll have an opinion.

I added an error check to the initialization and surprisingly it get's triggered on aarch64/macos/windows

If I'm reading this properly, this is because jl_compile_extern_c is incompletely exported; it appears in jl_exported_funcs.inc and yet is not marked as JL_DLLEXPORT in julia.h, it's only declared in julia_internal.h. Man, it's almost like whoever wrote jl_exported_funcs.inc had no idea what they were doing..... ;)

@vchuravy vchuravy force-pushed the vc/error_on_not_loading branch from c5197f6 to 47546b5 Compare December 24, 2020 03:02
@vchuravy vchuravy marked this pull request as ready for review December 24, 2020 03:02
@vchuravy
Copy link
Member Author

That being said, our method of calculating the pointer value was haphazard and we went with the first reasonable thing that compiled and seemed to work.

Fair enough, the linker actually optimizes the toc lookup I wrote to something that looks very similar to what you wrote, so I suspect we only needed the global entry point and remove the store of the r2.

This actually raises another point. The code I encountered this on was

ccall(:jl_toplevel_eval_in, Any, (Any, Any),
, and we ended up using the symbol exported by the CLI and not the libjulia-internal. @vtjnash should we re-order the lookup priority? Otherwise most runtime calls will have to go through the trampoline.

@vchuravy
Copy link
Member Author

@staticfloat My check seems to conservative :)

src/processor_x86.cpp
7:extern "C" JL_DLLEXPORT void jl_cpuid(int32_t CPUInfo[4], int32_t InfoType)
30:extern "C" JL_DLLEXPORT void jl_cpuidex(int32_t CPUInfo[4], int32_t InfoType, int32_t subInfoType)

@Keno
Copy link
Member

Keno commented Dec 25, 2020

CI is unhappy with this

@vchuravy
Copy link
Member Author

CI is unhappy with this

I split out the verification into a separate draft PR #38994, since that is orthogonal to this PR.

@Keno Keno merged commit 29d5d60 into master Dec 25, 2020
@Keno Keno deleted the vc/error_on_not_loading branch December 25, 2020 07:36
KristofferC pushed a commit that referenced this pull request Dec 27, 2020
* [Make] Normalize ppc64le to powerpc64le

* [CLI] Fix trampoline on PowerPC

1. Add global entry
2. Don't store `r2` on parent frame
3. Conservative load from toc

* remove jl_compile_extern_c from exported funcs

(cherry picked from commit 29d5d60)
@KristofferC KristofferC mentioned this pull request Dec 27, 2020
26 tasks
@KristofferC KristofferC removed the backport 1.6 Change should be backported to release-1.6 label Jan 6, 2021
staticfloat pushed a commit that referenced this pull request Jan 15, 2021
* [Make] Normalize ppc64le to powerpc64le

* [CLI] Fix trampoline on PowerPC

1. Add global entry
2. Don't store `r2` on parent frame
3. Conservative load from toc

* remove jl_compile_extern_c from exported funcs

(cherry picked from commit 29d5d60)
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this pull request May 4, 2021
* [Make] Normalize ppc64le to powerpc64le

* [CLI] Fix trampoline on PowerPC

1. Add global entry
2. Don't store `r2` on parent frame
3. Conservative load from toc

* remove jl_compile_extern_c from exported funcs
staticfloat pushed a commit that referenced this pull request Dec 23, 2022
* [Make] Normalize ppc64le to powerpc64le

* [CLI] Fix trampoline on PowerPC

1. Add global entry
2. Don't store `r2` on parent frame
3. Conservative load from toc

* remove jl_compile_extern_c from exported funcs

(cherry picked from commit 29d5d60)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants