Upgrade to LLVM v8.0.1 #32712

Merged
merged 3 commits into master from sf/llvm8
Nov 20, 2019

staticfloat
Member

@staticfloat staticfloat commented Jul 27, 2019

Upgrade to LLVM v8.0.1, with BinaryBuilder tarballs to match.

edit: closes #31921

@staticfloat staticfloat requested a review from vchuravy July 27, 2019 20:15
@staticfloat
Member Author

staticfloat commented Jul 27, 2019

Welp, at least the whitespace check passed.

  • We need to update the analyzegc pass to look at the right header location, as well as change some of the source.
  • The LLVM build system seems to only install libLLVM.so, while libjulia has linked against libLLVM-8.so. Interesting.
  • Win32 is segfaulting during bootstrap.

@vchuravy
Member

Thanks Elliot!

The LLVM build system seems to only install libLLVM.so, while libjulia has linked against libLLVM-8.so. Interesting.

Aren't we pulling that from llvm-config?

@vchuravy
Member

I fixed the source for GCChecker just now.

@vchuravy
Member

@Keno added three patches for wasm that we will need to pull as well.

julia/deps/llvm.mk

Lines 442 to 444 in 442d159

$(eval $(call LLVM_PATCH,llvm-6.0-D63688-wasm-isLocal))
$(eval $(call LLVM_PATCH,llvm-6.0-D64032-cmake-cross))
$(eval $(call LLVM_PATCH,llvm-6.0-D64225-cmake-cross2))

@staticfloat
Member Author

staticfloat commented Jul 29, 2019

We aren't building the WASM target at the moment. That is a holdover from when LLVMBuilder was used for LLVM.jl rather than for base Julia, and having WASM enabled caused compatibility issues. We could enable it, though. Should I rebuild the tarballs with WASM enabled?

@Keno
Member

Keno commented Jul 29, 2019

The wasm code generator is unlikely to be mature enough for our use until at least LLVM 9.0 (maybe 10.0). However, we do still link the LLVM support libraries, so we need those patches even absent the wasm target.

@vchuravy
Member

AnalyzeGC now fails in interesting ways: it finds one unrooted thing and one null pointer dereference (locally it found two unrooted things). @Keno, can you take a look at whether those are genuine or false positives?

@tshort
Contributor

tshort commented Aug 2, 2019

It'd be nice to get in the address space patches for WebAssembly from #32734.

@tshort
Contributor

tshort commented Aug 2, 2019

I'd also like to see the WASM target included.

@staticfloat
Member Author

When talking about the WASM backend with @Keno and @vchuravy, I was under the impression that it's pretty broken (for our purposes) until at least LLVM v9+. Why do you want the WASM backend for LLVM v8?

@tshort
Contributor

tshort commented Aug 2, 2019

I'd love to hear more about what's broken and what improvements are coming. For playing with basic static compilation, what's in v8 may be sufficient. That said, waiting for v9 is fine (it'll be here pretty soon).

Note that a source compilation of master Julia has the address-space patches and the WebAssembly target for v8.

@Keno
Member

Keno commented Aug 2, 2019

There were many bugs between LLVM 8 and LLVM 9. LLVM 9 seems fairly stable, but needs at least https://reviews.llvm.org/D65463 and https://reviews.llvm.org/D65470 in addition when fed with the LLVM IR that julia generates.

@vchuravy
Member

vchuravy commented Aug 14, 2019

The win32 segfault is readily reproducible:

Thread 1 received signal SIGSEGV, Segmentation fault.
0x09a50013 in japi1_top-level scope_0 ()
(gdb) bt
#0  0x09a50013 in japi1_top-level scope_0 ()
#1  0x6ca69e79 in jl_fptr_args (f=0x0, args=0x0, nargs=0, m=0x7e26590) at /home/User/julia/src/gf.c:1809
#2  0x6ca6abba in _jl_invoke (F=0x0, args=0x0, nargs=0, mfunc=0x7e36650, world=1) at /home/User/julia/src/gf.c:2049
#3  0x6ca6ac45 in jl_invoke (F=0x0, args=0x0, nargs=0, mfunc=0x7e36650) at /home/User/julia/src/gf.c:2056
#4  0x6caa6973 in jl_toplevel_eval_flex (m=0x7e40010, e=0x7e31d30, fast=1, expanded=1) at /home/User/julia/src/toplevel.c:808
#5  0x6ca7474d in jl_parse_eval_all (fname=0x6cd79bc3 <szclass_table+1411> "boot.jl", content=0x0, contentlen=0, inmodule=0x7e40010) at /home/User/julia/src/ast.c:873
#6  0x6caa6dcd in jl_load (module=0x7e40010, fname=0x6cd79bc3 <szclass_table+1411> "boot.jl") at /home/User/julia/src/toplevel.c:878
#7  0x6ca871fb in _julia_init (rel=JL_IMAGE_JULIA_HOME) at /home/User/julia/src/init.c:785
#8  0x6ca8835e in julia_init__threading (rel=JL_IMAGE_JULIA_HOME) at /home/User/julia/src/task.c:229
#9  0x00401e97 in wmain (argc=1, argv=0x6367028, envp=0x6376f70) at /home/User/julia/ui/repl.c:211
#10 0x0040139d in __tmainCRTStartup () at /usr/src/debug/mingw64-i686-runtime-6.0.0-1/crt/crtexe.c:334
#11 0x76290419 in KERNEL32!BaseThreadInitThunk () from /cygdrive/c/Windows/System32/KERNEL32.DLL
#12 0x770a662d in ntdll!RtlGetAppContainerNamedObjectPath () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
#13 0x770a65fd in ntdll!RtlGetAppContainerNamedObjectPath () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
#14 0x00000000 in ?? ()
(gdb) disassemble
Dump of assembler code for function japi1_top-level scope_0:
   0x09a50000 <+0>:     push   %ebp
   0x09a50001 <+1>:     mov    %esp,%ebp
   0x09a50003 <+3>:     push   %ebx
   0x09a50004 <+4>:     push   %edi
   0x09a50005 <+5>:     push   %esi
   0x09a50006 <+6>:     and    $0xfffffff0,%esp
   0x09a50009 <+9>:     sub    $0x70,%esp
   0x09a5000c <+12>:    mov    0xc(%ebp),%eax
   0x09a5000f <+15>:    mov    0x38(%esp),%ecx
=> 0x09a50013 <+19>:    mov    0x68cc5004(%ecx),%edx
   0x09a50019 <+25>:    mov    (%edx),%edx
   0x09a5001b <+27>:    xor    %ebp,%edx
   0x09a5001d <+29>:    mov    %edx,0x68(%esp)
   0x09a50021 <+33>:    xorps  %xmm0,%xmm0
   0x09a50024 <+36>:    movaps %xmm0,0x40(%esp)
   0x09a50029 <+41>:    movl   $0x0,0x50(%esp)
   0x09a50031 <+49>:    mov    %eax,0x3c(%esp)
   0x09a50035 <+53>:    mov    $0x6cabb170,%eax
(gdb) info registers
eax            0x0      0
ecx            0xcbf820 13367328
edx            0x0      0
ebx            0x8ebde6f        149675631
esp            0xcbf7c0 0xcbf7c0
ebp            0xcbf848 0xcbf848
esi            0x76c9d8e        124558734
edi            0x358    856
eip            0x9a50013        0x9a50013 <japi1_top-level scope_0+19>
eflags         0x10206  [ PF IF RF ]
cs             0x23     35
ss             0x2b     43
ds             0x2b     43
es             0x2b     43
fs             0x53     83
gs             0x2b     43
        0x68cc5000 - 0x68cc53b4 is .bss in /home/User/win32/usr/bin/libssp-0.dll
(gdb) p/x ($ecx +  0x68cc5004)
$22 = 0x69984824

which is outside any mapped region, hence the segfault.
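
The address arithmetic above can be double-checked with a short sketch. The constants are copied from the gdb session; the helper names (`effective_address`, `in_range`) are made up for illustration, and the end of the `.bss` range is treated as exclusive, which is an assumption.

```python
# Values copied from the gdb session in this comment.
ECX = 0x00CBF820                              # `info registers`: ecx
DISPLACEMENT = 0x68CC5004                     # from `mov 0x68cc5004(%ecx),%edx`
BSS_START, BSS_END = 0x68CC5000, 0x68CC53B4   # .bss of libssp-0.dll


def effective_address(base: int, disp: int) -> int:
    """Return base + disp with 32-bit wraparound, as in i386 addressing."""
    return (base + disp) & 0xFFFFFFFF


def in_range(addr: int, start: int, end: int) -> bool:
    """True if addr lies in [start, end); the exclusive end is an assumption."""
    return start <= addr < end


addr = effective_address(ECX, DISPLACEMENT)
print(hex(addr))                                   # 0x69984824
print(in_range(addr, BSS_START, BSS_END))          # False
print(in_range(DISPLACEMENT, BSS_START, BSS_END))  # True
```

The displacement on its own lands inside the libssp `.bss`, while displacement plus `%ecx` does not. One reading, not confirmed here, is that an absolute address was written where the code expected a register-relative offset.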

@vtjnash /@Keno does this trigger any memories?

--edit:
Ahah! With LLVM assertions enabled:

    JULIA /home/User/win32/usr/lib/julia/corecompiler.ji
Relocation type not implemented yet!
UNREACHABLE executed at /workspace/srcdir/llvm-8.0.1.src/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:359!
make[1]: *** [/home/User/julia/sysimage.mk:60: /home/User/win32/usr/lib/julia/corecompiler.ji] Error 3

The unimplemented relocation type seems to be R_386_GOT32.

--- edit 2:
with JULIA_LLVM_ARGS=-debug-only="dyld"

Parse symbols:
emitSection SectionID: 0 Name: .text obj addr: 0ab5fa10 new addr: 0b050000 DataSize: 637 StubBufSize: 0 Allocate: 637
        Type: 4 Name: japi1_top-level scope_0 SID: 0 Offset: 00000000 flags: 66
Parse relocations:
        SectionID: 0
                RelType: 3 Addend: 0 TargetName: __stack_chk_guard
                SectionID: 0 Offset: 21
                RelType: 3 Addend: 0 TargetName: jl_world_counter
                SectionID: 0 Offset: 91
                RelType: 3 Addend: 0 TargetName: jl_global#1
                SectionID: 0 Offset: 102
                RelType: 2 Addend: 0 TargetName: jl_copy_ast
                SectionID: 0 Offset: 129
                RelType: 3 Addend: 0 TargetName: jl_sym#meta3
                SectionID: 0 Offset: 152
                RelType: 3 Addend: 0 TargetName: jl_sym#nospecialize4
                SectionID: 0 Offset: 160
                RelType: 3 Addend: 0 TargetName: jl_sym#x5
                SectionID: 0 Offset: 168
                RelType: 2 Addend: 0 TargetName: jl_f__expr
                SectionID: 0 Offset: 221
                RelType: 3 Addend: 0 TargetName: jl_global#6
                SectionID: 0 Offset: 244
                RelType: 2 Addend: 0 TargetName: jl_copy_ast
                SectionID: 0 Offset: 263
                RelType: 3 Addend: 0 TargetName: jl_sym#block7
                SectionID: 0 Offset: 286
                RelType: 3 Addend: 0 TargetName: jl_global#8
                SectionID: 0 Offset: 294
                RelType: 3 Addend: 0 TargetName: jl_global#9
                SectionID: 0 Offset: 302
                RelType: 2 Addend: 0 TargetName: jl_f__expr
                SectionID: 0 Offset: 363
                RelType: 3 Addend: 0 TargetName: jl_sym#=10
                SectionID: 0 Offset: 386
                RelType: 2 Addend: 0 TargetName: jl_f__expr
                SectionID: 0 Offset: 435
                RelType: 3 Addend: 0 TargetName: jl_global#11
                SectionID: 0 Offset: 464
                RelType: 2 Addend: 0 TargetName: jl_f__expr
                SectionID: 0 Offset: 509
                RelType: 3 Addend: 0 TargetName: jl_global#13
                SectionID: 0 Offset: 532
                RelType: 3 Addend: 0 TargetName: jlplt_jl_toplevel_eval_in_15_got
                SectionID: 0 Offset: 540
                RelType: 3 Addend: 0 TargetName: jl_global#16
                SectionID: 0 Offset: 565
                RelType: 3 Addend: 0 TargetName: __stack_chk_guard
                SectionID: 0 Offset: 596
                RelType: 2 Addend: 0 TargetName: __stack_chk_fail
                SectionID: 0 Offset: 621
emitSection SectionID: 1 Name: .eh_frame obj addr: 0ab5fc90 new addr: 0b250000 DataSize: 60 StubBufSize: 0 Allocate: 60
        SectionID: 1
                RelType: 2 Addend: 0 TargetName:
                This is section symbol
                SectionID: 1 Offset: 32
Reassigning address for section 1 (.eh_frame): 0x000000000b250000 -> 0x000000000b150000
Reassigning address for section 0 (.text): 0x000000000b050000 -> 0x000000000af50000
----- Contents of section .text before relocations -----
0x000000000af50000: 55 89 e5 53 57 56 83 e4 f0 83 ec 70 8b 45 0c 8b
0x000000000af50010: 4c 24 38 8b 91 00 00 00 00 8b 12 31 ea 89 54 24
0x000000000af50020: 68 0f 57 c0 0f 29 44 24 40 c7 44 24 50 00 00 00
0x000000000af50030: 00 89 44 24 3c b8 70 b1 ab 6c ff d0 89 c1 c7 44
0x000000000af50040: 24 40 06 00 00 00 8b 10 89 54 24 44 8d 54 24 40
0x000000000af50050: 89 10 8b 50 04 8b 74 24 38 8b be 00 00 00 00 8b
0x000000000af50060: 1f 89 58 04 8b 9e 00 00 00 00 8b 1b 89 e6 89 1e
0x000000000af50070: 89 44 24 34 89 4c 24 30 89 54 24 2c 89 7c 24 28
0x000000000af50080: e8 fc ff ff ff 8b 4c 24 28 8b 11 8b 74 24 34 89
0x000000000af50090: 56 04 8b 54 24 38 8b ba 00 00 00 00 8b 3f 8b 9a
0x000000000af500a0: 00 00 00 00 8b 1b 8b 8a 00 00 00 00 8b 09 89 44
0x000000000af500b0: 24 50 89 7c 24 54 89 5c 24 58 89 4c 24 5c 89 e1
0x000000000af500c0: 8d 7c 24 54 89 79 04 c7 41 08 03 00 00 00 c7 01
0x000000000af500d0: 00 00 00 00 89 44 24 24 89 7c 24 20 e8 fc ff ff
0x000000000af500e0: ff 8b 4c 24 28 8b 11 8b 74 24 34 89 56 04 8b 54
0x000000000af500f0: 24 38 8b ba 00 00 00 00 8b 3f 89 44 24 4c 89 e3
0x000000000af50100: 89 3b 89 44 24 1c e8 fc ff ff ff 8b 4c 24 28 8b
0x000000000af50110: 11 8b 74 24 34 89 56 04 8b 54 24 38 8b ba 00 00
0x000000000af50120: 00 00 8b 1f 8b 8a 00 00 00 00 8b 09 8b 92 00 00
0x000000000af50130: 00 00 8b 12 89 44 24 48 89 5c 24 54 89 4c 24 58
0x000000000af50140: 8b 4c 24 1c 89 4c 24 5c 89 54 24 60 89 44 24 64
0x000000000af50150: 89 e0 8b 54 24 20 89 50 04 c7 40 08 05 00 00 00
0x000000000af50160: c7 00 00 00 00 00 89 7c 24 18 e8 fc ff ff ff 8b
0x000000000af50170: 4c 24 28 8b 11 8b 74 24 34 89 56 04 8b 54 24 38
0x000000000af50180: 8b ba 00 00 00 00 8b 3f 89 44 24 48 89 7c 24 54
0x000000000af50190: 8b 7c 24 24 89 7c 24 58 89 44 24 5c 89 e0 8b 5c
0x000000000af501a0: 24 20 89 58 04 c7 40 08 03 00 00 00 c7 00 00 00
0x000000000af501b0: 00 00 e8 fc ff ff ff 8b 4c 24 28 8b 11 8b 74 24
0x000000000af501c0: 34 89 56 04 8b 54 24 18 8b 3a 8b 5c 24 38 8b 8b
0x000000000af501d0: 00 00 00 00 8b 09 89 44 24 48 89 7c 24 54 89 4c
0x000000000af501e0: 24 58 89 44 24 5c 89 e0 8b 4c 24 20 89 48 04 c7
0x000000000af501f0: 40 08 03 00 00 00 c7 00 00 00 00 00 e8 fc ff ff
0x000000000af50200: ff 8b 4c 24 28 8b 11 8b 74 24 34 89 56 04 8b 54
0x000000000af50210: 24 38 8b ba 00 00 00 00 8b 3f 8b 9a 00 00 00 00
0x000000000af50220: 8b 1b 89 44 24 48 89 e1 89 41 04 89 39 ff d3 8b
0x000000000af50230: 4c 24 38 8b 91 00 00 00 00 8b 12 8b 74 24 30 8b
0x000000000af50240: 7c 24 2c 89 7e 04 8b 7c 24 44 89 3e 8b 7c 24 68
0x000000000af50250: 31 ef 8b 99 00 00 00 00 8b 1b 29 fb 89 44 24 14
0x000000000af50260: 89 54 24 10 89 5c 24 0c 75 02 eb 05 e8 fc ff ff
0x000000000af50270: ff 8b 44 24 10 8d 65 f4 5e 5f 5b 5d c3
----- Contents of section .eh_frame before relocations -----
0x000000000b150000: 14 00 00 00 00 00 00 00 01 7a 52 00 01 7c 08 01
0x000000000b150010: 1b 0c 04 04 88 01 00 00 1c 00 00 00 1c 00 00 00
0x000000000b150020: 00 00 00 00 7d 02 00 00 00 41 0e 08 85 02 42 0d
0x000000000b150030: 05 49 86 05 87 04 83 03 00 00 00 00
Resolving relocations Name: jl_global#8 0x8d08e60
Relocation type not implemented yet!
UNREACHABLE executed at /home/User/julia/deps/srccache/llvm-8.0.1/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:359!

RelType: 3 is R_386_GOT32
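
For reading the log above, here is a rough sketch that maps the numeric `RelType` values to their i386 ELF names. The numbers 0 to 4 follow the System V i386 psABI; the regex, the function name, and the `SAMPLE` excerpt are illustrative only. It also checks that relocation offset 21 falls on the zeroed 32-bit displacement of the `8b 91` load at offset 19, using the first two rows of the `.text` dump.

```python
import re

# i386 ELF relocation type numbers (System V ABI, i386 supplement).
R_386_NAMES = {
    0: "R_386_NONE",
    1: "R_386_32",
    2: "R_386_PC32",
    3: "R_386_GOT32",
    4: "R_386_PLT32",
}

# Matches the "RelType: N Addend: A TargetName: sym" lines of the dyld log.
RELOC_LINE = re.compile(
    r"RelType:[ \t]*(\d+)[ \t]+Addend:[ \t]*(-?\d+)[ \t]+TargetName:[ \t]*(\S*)"
)


def decode_relocations(log_text):
    """Return (type_name, target) pairs for each RelType line in the log."""
    pairs = []
    for match in RELOC_LINE.finditer(log_text):
        rel_type = int(match.group(1))
        name = R_386_NAMES.get(rel_type, f"unknown({rel_type})")
        pairs.append((name, match.group(3)))
    return pairs


# Two entries copied from the log above.
SAMPLE = """\
RelType: 3 Addend: 0 TargetName: __stack_chk_guard
SectionID: 0 Offset: 21
RelType: 2 Addend: 0 TargetName: jl_copy_ast
SectionID: 0 Offset: 129
"""
decoded = decode_relocations(SAMPLE)
print(decoded)
# [('R_386_GOT32', '__stack_chk_guard'), ('R_386_PC32', 'jl_copy_ast')]

# First 32 bytes of .text, copied from the dump above.
TEXT_HEAD = bytes.fromhex(
    "55 89 e5 53 57 56 83 e4 f0 83 ec 70 8b 45 0c 8b"
    "4c 24 38 8b 91 00 00 00 00 8b 12 31 ea 89 54 24"
)
print(TEXT_HEAD[19:21].hex())  # 8b91 (mov disp32(%ecx),%edx)
print(TEXT_HEAD[21:25].hex())  # 00000000 (placeholder patched by the relocation)
```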

@vchuravy
Member

vchuravy commented Aug 14, 2019

We also need to add the patches we figured out when fixing BinaryBuilder. Otherwise people who build from source on MinGW (e.g. me) will run into the same problems again.

@vchuravy
Member

vchuravy commented Aug 23, 2019

Okay, now both Win32 and Win64 fail with:

ERROR: could not load library "C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\lib\julia\sys.dll"
%1 is not a valid Win32 application.
gflags -i julia.exe +sls
cdb julia.exe
> g
0b6c:05cc @ 12495953 - LdrpProcessWork - ERROR: Unable to load DLL: "C:\cygwin\home\vchuravy\julia\usr\lib\julia\sys.dll", Parent Module: "(null)", Status: 0xc000007b

Status 0xc000007b is STATUS_INVALID_IMAGE_FORMAT
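
As a small aid for reading such codes, the sketch below splits an NTSTATUS value into its bit fields (2-bit severity, customer bit, 12-bit facility, 16-bit code). The function name and dictionary keys are made up for illustration, and nothing here says why the image was rejected.

```python
def decode_ntstatus(status: int) -> dict:
    """Split a 32-bit NTSTATUS value into its documented bit fields."""
    return {
        "severity": (status >> 30) & 0x3,    # 0 success, 1 info, 2 warning, 3 error
        "customer": (status >> 29) & 0x1,    # 0 means Microsoft-defined
        "facility": (status >> 16) & 0xFFF,
        "code": status & 0xFFFF,
    }


fields = decode_ntstatus(0xC000007B)
print(fields)
# {'severity': 3, 'customer': 0, 'facility': 0, 'code': 123}
```

So 0xc000007b is an error-severity, Microsoft-defined status with code 0x7b, which is the value the loader reports above.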

@Keno
Member

Keno commented Sep 11, 2019

@vchuravy asked me to leave this here:

diff --git a/src/jltypes.c b/src/jltypes.c
index 66aeeae..4728e63 100644
--- a/src/jltypes.c
+++ b/src/jltypes.c
@@ -1043,7 +1043,7 @@ static void check_datatype_parameters(jl_typename_t *tn, jl_value_t **params, si
 arraylist_t partial_inst;
 int inside_typedef = 0;

-static jl_value_t *extract_wrapper(jl_value_t *t)
+static jl_value_t *extract_wrapper(jl_value_t *t JL_PROPAGATES_ROOT)
 {
     t = jl_unwrap_unionall(t);
     if (jl_is_datatype(t))

@vchuravy
Member

Yay! (We might want to consider going straight to LLVM 9 though, which was just released today)

@staticfloat
Member Author

I was talking to Keno about that; he seemed happier to be on an X.Y.1 release than an X.Y.0 release: less chance of bugs, and not that many things in LLVM 9 that we're interested in. Talking to Tim though, I hear there are GPU goodies that we might be interested in.

@staticfloat
Member Author

I think something is going wrong with sys.dll on windows?

@staticfloat
Member Author

Huh, in the rebase I just did, a bunch of patch commits disappeared, but I suppose that is because the patches have already been merged into master?

@vchuravy
Member

Talking to Tim though, I hear there are GPU goodies that we might be interested in.

In particular, I am after decent debug information for GPU kernels and better profiling.

Huh, in the rebase I just did, a bunch of patch commits disappeared, but I suppose that is because the patches have already been merged into master?

Jup, x-ref #33018

I think something is going wrong with sys.dll on windows?

I would hope not, did the BB apply llvm7-revert-D44485?

@staticfloat
Member Author

I would hope not, did the BB apply llvm7-revert-D44485 ?

Applying patch /workspace/srcdir/llvm_patches/0016-llvm7-revert-D44485.patch
patching file lib/MC/WinCOFFObjectWriter.cpp
Hunk #1 succeeded at 681 (offset -9 lines).

Looks like it did to me.

@staticfloat
Member Author

Oh, huh, the Windows buildbots made it past bootstrap this time. I don't know what that other error message I was seeing was; let's blame buildbot.

@vchuravy
Member

@Keno can you take another look at analyzegc?

vtjnash and others added 3 commits November 18, 2019 20:28
Fixes some missing roots identified by the analysis pass,
and clarifies other code to avoid false-positive errors.
@vtjnash
Member

vtjnash commented Nov 19, 2019

anyone know why it can't find the c++ headers? if not, I'll just put that commit on a new PR

@vchuravy
Member

anyone know why it can't find the c++ headers? if not, I'll just put that commit on a new PR

Probably better to do that, since it is unrelated to this PR.

@KristofferC
Member

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@vtjnash vtjnash merged commit 10463bb into master Nov 20, 2019
@vtjnash vtjnash deleted the sf/llvm8 branch November 20, 2019 18:07
@vchuravy
Member

Yay! Onwards to LLVM 9.

@chriselrod
Contributor

A tangible benefit* of LLVM 9: while the LLVM 8 documentation mentions expandload and compressstore, both of these intrinsics crashed on a Haswell cluster with the message

LLVM ERROR: Cannot select: 0x2a43b50: v4f64,ch = masked_load<(load 32 from %ir.ptr.i)> 0x26e3c78, 0x2fb7538, 0x2a3fcf8, 0x2ed9e40, /home/c285497/.julia/dev/SIMDPirates/src/memory.jl:823 @[ /home/c285497/.julia/dev/SIMDPirates/src/memory.jl:802 ]

Yet these functions work on a build with LLVM 9.

julia> versioninfo()
Julia Version 1.4.0-DEV.513
Commit 8f7855a* (2019-11-21 01:58 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.0 (ORCJIT, haswell)
Environment:
  JULIA_NUM_THREADS = 24

julia> @time using SIMDPirates
  1.459145 seconds (1.64 M allocations: 88.784 MiB, 1.63% gc time)

julia> x = ntuple(Val(4)) do i Core.VecElement(randn()) end
(VecElement{Float64}(0.15844536654536248), VecElement{Float64}(1.3029855351761224), VecElement{Float64}(-0.5349914246588564), VecElement{Float64}(0.4026832110654877))

julia> y = collect(1.0:99.0);

julia> SIMDPirates.expandload!(Vec{4,Float64}, pointer(y), UInt8(5))
(VecElement{Float64}(1.0), VecElement{Float64}(0.0), VecElement{Float64}(2.0), VecElement{Float64}(0.0))

julia> SIMDPirates.compressstore!(pointer(y), x, UInt8(5))

julia> y'
1×99 LinearAlgebra.Adjoint{Float64,Array{Float64,1}}:
 0.158445  -0.534991  3.0  4.0  5.0  6.0  7.0  8.0  9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  25.0  26.0  27.0  28.0  29.0  30.0  31.0  32.0  33.0  34.0  35.0  36.0  37.0  38.0  39.0  40.0  …  60.0  61.0  62.0  63.0  64.0  65.0  66.0  67.0  68.0  69.0  70.0  71.0  72.0  73.0  74.0  75.0  76.0  77.0  78.0  79.0  80.0  81.0  82.0  83.0  84.0  85.0  86.0  87.0  88.0  89.0  90.0  91.0  92.0  93.0  94.0  95.0  96.0  97.0  98.0  99.0

julia> @code_native debuginfo=:none SIMDPirates.expandload!(Vec{4,Float64}, pointer(y), UInt8(5))
        .text
        vmovd   %edx, %xmm0
        andl    $1, %edx
        vmovd   %edx, %xmm2
        movabsq $.rodata.cst16, %rax
        vmovdqa (%rax), %xmm1
        vpbroadcastd    %xmm0, %xmm0
        vpand   %xmm1, %xmm0, %xmm0
        vpcmpeqd        %xmm1, %xmm0, %xmm0
        vpsrld  $31, %xmm0, %xmm1
        vpextrb $0, %xmm2, %eax
        testb   %al, %al
        je      L73
        vmovq   (%rsi), %xmm0           # xmm0 = mem[0],zero
        addq    $8, %rsi
        vpextrb $4, %xmm1, %eax
        cmpb    $1, %al
        jne     L101
        jmp     L87
L73:
        vpxor   %xmm0, %xmm0, %xmm0
        vpextrb $4, %xmm1, %eax
        cmpb    $1, %al
        jne     L101
L87:
        vmovhps (%rsi), %xmm0, %xmm2    # xmm2 = xmm0[0,1],mem[0,1]
        vpblendd        $15, %ymm2, %ymm0, %ymm0 # ymm0 = ymm2[0,1,2,3],ymm0[4,5,6,7]
        addq    $8, %rsi
L101:
        vpextrb $8, %xmm1, %eax
        cmpb    $1, %al
        je      L122
        vpextrb $12, %xmm1, %eax
        cmpb    $1, %al
        je      L152
L121:
        retq
L122:
        vextracti128    $1, %ymm0, %xmm2
        vmovlps (%rsi), %xmm2, %xmm2    # xmm2 = mem[0,1],xmm2[2,3]
        vinserti128     $1, %xmm2, %ymm0, %ymm0
        addq    $8, %rsi
        vpextrb $12, %xmm1, %eax
        cmpb    $1, %al
        jne     L121
L152:
        vextracti128    $1, %ymm0, %xmm1
        vmovhps (%rsi), %xmm1, %xmm1    # xmm1 = xmm1[0,1],mem[0,1]
        vinserti128     $1, %xmm1, %ymm0, %ymm0
        retq
        nopl    (%rax)

julia> @code_native debuginfo=:none SIMDPirates.compressstore!(pointer(y), x, UInt8(5))
        .text
        vmovd   %esi, %xmm1
        andl    $1, %esi
        vmovd   %esi, %xmm2
        movabsq $.rodata.cst16, %rax
        vmovdqa (%rax), %xmm3
        vpbroadcastd    %xmm1, %xmm1
        vpand   %xmm3, %xmm1, %xmm1
        vpcmpeqd        %xmm3, %xmm1, %xmm1
        vpsrld  $31, %xmm1, %xmm1
        vpextrb $0, %xmm2, %eax
        testb   %al, %al
        jne     L93
        vpextrb $4, %xmm1, %eax
        cmpb    $1, %al
        je      L111
L63:
        vpextrb $8, %xmm1, %eax
        vextractf128    $1, %ymm0, %xmm0
        cmpb    $1, %al
        je      L135
L79:
        vpextrb $12, %xmm1, %eax
        cmpb    $1, %al
        je      L153
L89:
        vzeroupper
        retq
L93:
        vmovlps %xmm0, (%rdi)
        addq    $8, %rdi
        vpextrb $4, %xmm1, %eax
        cmpb    $1, %al
        jne     L63
L111:
        vmovhps %xmm0, (%rdi)
        addq    $8, %rdi
        vpextrb $8, %xmm1, %eax
        vextractf128    $1, %ymm0, %xmm0
        cmpb    $1, %al
        jne     L79
L135:
        vmovlps %xmm0, (%rdi)
        addq    $8, %rdi
        vpextrb $12, %xmm1, %eax
        cmpb    $1, %al
        jne     L89
L153:
        vmovhps %xmm0, (%rdi)
        vzeroupper
        retq
        nopw    %cs:(%rax,%rax)
        nopl    (%rax,%rax)

Haswell obviously isn't one of the targets that support efficient expand loads or compress stores, but it's nice to have things work.

*Impacting approximately 0% of users.
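
For anyone unfamiliar with the two operations, here is a scalar reference model in Python of what the REPL session above shows. It is only a sketch of the semantics, not SIMDPirates' or LLVM's implementation; the function names, the `fill` parameter, and the rounded `x` values are assumptions chosen to match the transcript.

```python
def expandload(memory, mask, width, fill=0.0):
    """Read consecutive elements of memory into the lanes whose mask bit is set.

    Lanes with a clear mask bit get `fill` (zero here, matching the transcript).
    """
    lanes = []
    cursor = 0
    for lane in range(width):
        if (mask >> lane) & 1:
            lanes.append(memory[cursor])
            cursor += 1
        else:
            lanes.append(fill)
    return tuple(lanes)


def compressstore(memory, vector, mask):
    """Write the lanes whose mask bit is set to consecutive slots of memory.

    Returns the number of elements stored.
    """
    cursor = 0
    for lane, value in enumerate(vector):
        if (mask >> lane) & 1:
            memory[cursor] = value
            cursor += 1
    return cursor


y = [float(i) for i in range(1, 100)]          # like collect(1.0:99.0)
print(expandload(y, 0b0101, 4))                # (1.0, 0.0, 2.0, 0.0)

x = (0.158445, 1.302986, -0.534991, 0.402683)  # rounded from the transcript
stored = compressstore(y, x, 0b0101)
print(stored, y[:3])                           # 2 [0.158445, -0.534991, 3.0]
```

With mask 5 (binary 0101) only lanes 0 and 2 are active, which is why the load returns the first two memory elements in lanes 0 and 2, and the store overwrites only the first two slots of `y`.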

@vtjnash
Member

vtjnash commented Nov 21, 2019

This was only for LLVM 8. For LLVM 9, you want #33916.

Successfully merging this pull request may close these issues.

Misattributed profile information
8 participants