Illegal instruction (core dumped) when running in qemu #22338

Open
abhisheksinghrgit opened this issue Jun 28, 2019 · 19 comments
Labels
high priority module: cpu CPU specific problem (e.g., perf, algorithm) module: crash Problem manifests as a hard crash, as opposed to a RuntimeError module: vectorization Related to SIMD vectorization, e.g., Vec256 triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@abhisheksinghrgit

abhisheksinghrgit commented Jun 28, 2019

I installed PyTorch with Anaconda, using the command from the PyTorch website:

conda install pytorch-cpu torchvision-cpu -c pytorch

Running my app then fails with "Illegal instruction (core dumped)".

When I start the app under gdb, I get the following error. I suspect something is wrong with libcaffe2.so:

Thread 1 "python" received signal SIGILL, Illegal instruction.
0x00007fffa8ec2486 in THFloatVector_normal_fill_AVX2 () from /U01/anaconda3/envs/pyt1/lib/python3.7/site-packages/torch/lib/libcaffe2.so

Machine configuration:

The machine runs Ubuntu 18.04.2.

root@USANZV-MER-DMA485:~# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           16
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               13
Model name:          QEMU Virtual CPU version (cpu64-rhel6)
Stepping:            3
CPU MHz:             2294.598
BogoMIPS:            4589.19
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
NUMA node0 CPU(s):   0-15
Flags:               fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm nopl cpuid pni cx16 hypervisor lahf_lm abm pti
root@USANZV-MER-DMA485:~#
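The faulting function is an AVX2 kernel (THFloatVector_normal_fill_AVX2), yet the Flags line above advertises neither avx nor avx2. Below is a minimal Python sketch for checking which of the relevant SIMD flags a (virtual) CPU advertises. It assumes a Linux guest where the output of lscpu or /proc/cpuinfo is available; the helper names and the chosen flag list are illustrative, not part of PyTorch.

```python
from pathlib import Path

# SIMD-related flags, spelled the way Linux prints them in /proc/cpuinfo and lscpu.
SIMD_FLAGS = ("sse4_1", "sse4_2", "avx", "fma", "avx2", "avx512f")


def parse_flags(text):
    """Return the set of CPU flags found in an lscpu or /proc/cpuinfo dump."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # lscpu prints "Flags:", /proc/cpuinfo prints "flags\t\t:".
        if sep and key.strip().lower() == "flags":
            return set(value.split())
    return set()


def missing_simd_flags(text):
    """List the entries of SIMD_FLAGS that the dump does not advertise."""
    flags = parse_flags(text)
    return [flag for flag in SIMD_FLAGS if flag not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        # Only the first "flags" line is read; all cores normally report the same set.
        print("missing:", missing_simd_flags(cpuinfo.read_text()))
```

Fed the Flags line above, it reports every listed extension, including avx and avx2, as missing.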

cc @ezyang @gchanan @zou3519 @VitalyFedyunin

@ezyang ezyang added the module: vectorization Related to SIMD vectorization, e.g., Vec256 label Jun 28, 2019
@ezyang
Contributor

ezyang commented Jun 28, 2019

Hi, if possible, could you run the program under gdb and, at the SIGILL site, run disas to show which instructions were executing at that point, and post the output?
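A sketch that builds a non-interactive gdb invocation collecting what is asked for here: it runs the script until it stops (for example on SIGILL), then prints the instructions at the program counter, the whole-function disassembly (where gdb marks the current instruction with `=>`) and a backtrace. The helper name and the script name "app.py" are placeholders; the gdb options used (`-batch`, `-ex`, `--args`) are standard.

```python
import shlex
import sys


def gdb_sigill_command(script, *script_args):
    """Build a batch-mode gdb command that runs `script` with this Python
    interpreter and dumps the faulting instructions once the program stops."""
    return [
        "gdb", "-batch",
        "-ex", "run",        # run until the program stops on a signal or exits
        "-ex", "x/5i $pc",   # the instruction at the program counter and the next few
        "-ex", "disas",      # disassembly of the current function
        "-ex", "bt",         # backtrace
        "--args", sys.executable, script, *script_args,
    ]


if __name__ == "__main__":
    # "app.py" is a placeholder for the script that crashes.
    print(" ".join(shlex.quote(part) for part in gdb_sigill_command("app.py")))
```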

@ezyang ezyang changed the title "Illegal instruction (core dumped)" to "Illegal instruction (core dumped) when running in qemu" Jun 28, 2019
@ezyang
Contributor

ezyang commented Jun 28, 2019

cc @cpuhrsch this looks like some sort of cpuinfo(?) problem when running under qemu.

@ezyang ezyang added high priority triage review module: crash Problem manifests as a hard crash, as opposed to a RuntimeError module: cpu CPU specific problem (e.g., perf, algorithm) triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Jun 28, 2019
@abhisheksinghrgit
Author

abhisheksinghrgit commented Jun 28, 2019

I tried adding the avx flag to the virtual CPU, but I still get the same issue.

(pyt1) webtech@USA:/U01/code$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 16
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 6
Model name: QEMU Virtual CPU version 0.12.1
Stepping: 3
CPU MHz: 2294.598
BogoMIPS: 4589.19
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
NUMA node0 CPU(s): 0-15
Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl cpuid pni cx16 popcnt avx hypervisor lahf_lm abm pti

The output when running under gdb:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffd9424700 (LWP 2248)]
[New Thread 0x7fffd8c23700 (LWP 2249)]
[New Thread 0x7fffd6422700 (LWP 2250)]
[New Thread 0x7fffd1c21700 (LWP 2251)]
[New Thread 0x7fffd1420700 (LWP 2252)]
[New Thread 0x7fffcec1f700 (LWP 2253)]
[New Thread 0x7fffcc41e700 (LWP 2254)]
[New Thread 0x7fffc7c1d700 (LWP 2255)]
[New Thread 0x7fffc541c700 (LWP 2256)]
[New Thread 0x7fffc4c1b700 (LWP 2257)]
[New Thread 0x7fffc041a700 (LWP 2258)]
[New Thread 0x7fffbfc19700 (LWP 2259)]
[New Thread 0x7fffbd418700 (LWP 2260)]
[New Thread 0x7fffb8c17700 (LWP 2261)]
[New Thread 0x7fffb6416700 (LWP 2262)]
[New Thread 0x7fff8957c780 (LWP 2269)]
[New Thread 0x7fff88d7a800 (LWP 2270)]
[New Thread 0x7fff88578880 (LWP 2271)]
[New Thread 0x7fff87d76900 (LWP 2272)]
[New Thread 0x7fff87574980 (LWP 2273)]
[New Thread 0x7fff86d72a00 (LWP 2274)]
[New Thread 0x7fff86570a80 (LWP 2275)]
[New Thread 0x7fff85d6eb00 (LWP 2276)]
[New Thread 0x7fff8556cb80 (LWP 2277)]
[New Thread 0x7fff84d6ac00 (LWP 2278)]
[New Thread 0x7fff84568c80 (LWP 2279)]
[New Thread 0x7fff83d66d00 (LWP 2280)]
[New Thread 0x7fff83564d80 (LWP 2281)]
[New Thread 0x7fff82d62e00 (LWP 2282)]
[New Thread 0x7fff82560e80 (LWP 2283)]
/U01/anaconda3/envs/pyt1/lib/python3.7/site-packages/torch/serialization.py:454: SourceChangeWarning: source code of class 'torch.nn.modules.loss.CrossEntropyLoss' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)

Thread 1 "python" received signal SIGILL, Illegal instruction.
0x00007fffa8ec06d0 in THFloatVector_fill_AVX () from /U01/anaconda3/envs/pyt1/lib/python3.7/site-packages/torch/lib/libcaffe2.so

@ezyang
Contributor

ezyang commented Jul 1, 2019

Can you run disas at the failure site, so we can see what the problem is?

@ezyang
Contributor

ezyang commented Jul 1, 2019

Can you perhaps try adding avx2, since the SIGILL is probably on an avx2 instruction?

In any case, this is a bug.

@gchanan
Contributor

gchanan commented Aug 22, 2019

Removing high priority until we see a case outside of qemu or get a reproduction.

@benoitmartin88

Hi,
I am experiencing the same issue.

Here is my CPU info:

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 13
Model name:            QEMU Virtual CPU version 1.5.3
Stepping:              3
CPU MHz:               2399.996
BogoMIPS:              4799.99
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
NUMA node0 CPU(s):     0-3
Flags:                 fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 ht syscall nx lm rep_good nopl pni cx16 hypervisor lahf_lm abm

And the gdb output:

Thread 1 "python" received signal SIGILL, Illegal instruction.
0x00007fff9eaea4d0 in THFloatVector_normal_fill_AVX2 () from [...]/python3.7/site-packages/torch/lib/libtorch.so

Any news on this?

@ezyang
Copy link
Contributor

ezyang commented Sep 30, 2019

@benoitmartin88 We deprioritized this issue because we weren't able to easily get a reproduction. Do you think you could give us more detailed information about your qemu setup?

@benoitmartin88

@ezyang Unfortunately, I do not have more information on the qemu setup, as it is part of a continuous integration system that I use (most probably VMs). I will try to find out more about the setup being used and let you know as soon as I have something.

@benoitmartin88

@ezyang I'm afraid I won't be able to get any more information on the qemu setup. That said, it probably has to do with the fact that my qemu virtual CPU does not support the AVX2 instruction set, so this should be reproducible with qemu configured without avx2.

@ezyang
Contributor

ezyang commented Oct 2, 2019

Thanks. We'll give that a try.

@fmassa
Member

fmassa commented Jan 27, 2020

This issue has been showing up again for some users; see pytorch/vision#1782 and #29371 for examples.

Here is the stack trace from pytorch/vision#1782:

Thread 1 "python" received signal SIGILL, Illegal instruction.
0x00007ffff0772fcc in THFloatVector_normal_fill_AVX2 () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
(gdb) bt
#0 0x00007ffff0772fcc in THFloatVector_normal_fill_AVX2 () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#1 0x00007ffff0522269 in THFloatTensor_normal () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#2 0x00007ffff03fc29f in at::native::legacy::cpu::th_normal(at::Tensor&, double, double, at::Generator*) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#3 0x00007ffff0392e94 in at::CPUType::(anonymous namespace)::normal_(at::Tensor&, double, double, at::Generator*) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#4 0x00007ffff23537ae in torch::autograd::VariableType::(anonymous namespace)::normal_(at::Tensor&, double, double, at::Generator*) ()
from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so
#5 0x00007ffff568ee6a in torch::autograd::THPVariable_normal_(_object*, _object*, _object*) () from /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#6 0x00005555556bca04 in _PyMethodDef_RawFastCallKeywords () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Objects/call.c:694
#7 0x00005555556d432f in _PyMethodDescr_FastCallKeywords () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Objects/descrobject.c:288
#8 0x0000555555728b1c in call_function (kwnames=0x0, oparg=3, pp_stack=) at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:4593
#9 _PyEval_EvalFrameDefault () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:3110
#10 0x00005555556bbf7b in function_code_fastcall (globals=, nargs=3, args=, co=)
at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Objects/call.c:283
#11 _PyFunction_FastCallKeywords () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Objects/call.c:408
#12 0x0000555555724156 in call_function (kwnames=0x0, oparg=, pp_stack=) at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:4616
#13 _PyEval_EvalFrameDefault () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:3124
#14 0x0000555555668729 in _PyEval_EvalCodeWithName () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:3930
#15 0x00005555556bc207 in _PyFunction_FastCallKeywords () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Objects/call.c:433
#16 0x000055555572521f in call_function (kwnames=0x7fffd9564460, oparg=, pp_stack=)
at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:4616
#17 _PyEval_EvalFrameDefault () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:3139
#18 0x0000555555668a0a in _PyEval_EvalCodeWithName () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:3930
#19 0x0000555555669865 in _PyFunction_FastCallDict () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Objects/call.c:376
#20 0x0000555555689313 in _PyObject_Call_Prepend () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Objects/call.c:908
#21 0x00005555556d372a in slot_tp_init () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Objects/typeobject.c:6636
#22 0x00005555556d4287 in type_call () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Objects/typeobject.c:971
#23 0x000055555567b06e in PyObject_Call () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Objects/call.c:245
#24 0x0000555555725c8f in do_call_core (kwdict=0x7fffd6c64fa0, callargs=0x7ffff6c3fa10, func=0x555557315f70) at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:4645
#25 _PyEval_EvalFrameDefault () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:3191
#26 0x0000555555668729 in _PyEval_EvalCodeWithName () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:3930
#27 0x0000555555669865 in _PyFunction_FastCallDict () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Objects/call.c:376
#28 0x0000555555725c8f in do_call_core (kwdict=0x7ffff6d8cdc0, callargs=0x7fffd6c5f370, func=0x7fffd955ddd0) at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:4645
#29 _PyEval_EvalFrameDefault () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:3191
#30 0x0000555555668729 in _PyEval_EvalCodeWithName () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:3930
#31 0x00005555556bc207 in _PyFunction_FastCallKeywords () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Objects/call.c:433
#32 0x00005555557289b9 in call_function (kwnames=0x0, oparg=, pp_stack=) at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:4616
#33 _PyEval_EvalFrameDefault () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:3093
#34 0x0000555555668729 in _PyEval_EvalCodeWithName () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:3930
#35 0x0000555555669654 in PyEval_EvalCodeEx () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:3959
#36 0x000055555566967c in PyEval_EvalCode (co=, globals=, locals=)
at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/ceval.c:524
#37 0x000055555577fcb4 in run_mod () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/pythonrun.c:1035
#38 0x000055555578a191 in PyRun_FileExFlags () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/pythonrun.c:988
#39 0x000055555578a383 in PyRun_SimpleFileExFlags () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Python/pythonrun.c:429
#40 0x000055555578b475 in pymain_run_file (p_cf=0x7fffffffdc10, filename=0x5555558c1700 L"test.py", fp=0x555555909950)
at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Modules/main.c:428
#41 pymain_run_filename (cf=0x7fffffffdc10, pymain=0x7fffffffdd20) at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Modules/main.c:1607
#42 pymain_run_python (pymain=0x7fffffffdd20) at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Modules/main.c:2868
#43 pymain_main () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Modules/main.c:3029
#44 0x000055555578b59c in _Py_UnixMain () at /home/conda/feedstock_root/build_artifacts/python_1578433408510/work/Modules/main.c:3064
#45 0x00007ffff77e6b97 in __libc_start_main (main=0x5555556492a0, argc=2, argv=0x7fffffffde78, init=, fini=, rtld_fini=, stack_end=0x7fffffffde68)
at ../csu/libc-start.c:310
#46 0x0000555555733b50 in _start () at ../sysdeps/x86_64/elf/start.S:103
(gdb)
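Frames #0 to #5 show the crash is reached through the in-place normal_ method (THPVariable_normal_ → THFloatTensor_normal → THFloatVector_normal_fill_AVX2). For a CI check, or for comparing machines, a sketch that runs a snippet in a child interpreter and reports whether it was killed by SIGILL, so the calling process survives. The helper name is hypothetical, it is POSIX-only, and using normal_ as the trigger is an assumption based on this backtrace.

```python
import signal
import subprocess
import sys


def crashes_with_sigill(code, env=None, timeout=600):
    """Run `code` in a fresh Python interpreter and return True if the child
    was killed by SIGILL. POSIX only."""
    proc = subprocess.run([sys.executable, "-c", code], env=env, timeout=timeout)
    # subprocess reports "terminated by signal N" as a returncode of -N.
    return proc.returncode == -signal.SIGILL
```

For example, `crashes_with_sigill("import torch; torch.empty(1000).normal_()")` would be expected to return True on an affected setup, provided the trigger assumption holds; ordinary failures such as exceptions or non-zero exits return False.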

@Tridet

Tridet commented Jan 30, 2020

I ran into the same problem with torch 1.1.0. Here is the trace.
CPU:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           142
Model name:                      Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
Stepping:                        10
CPU MHz:                         2989.918
CPU max MHz:                     4000.0000
CPU min MHz:                     400.0000
BogoMIPS:                        3999.93
Virtualization:                  VT-x
L1d cache:                       32K
L1i cache:                       32K
L2 cache:                        256K
L3 cache:                        8192K
NUMA node0 CPU(s):               0-7
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx 

gdb disas:

(gdb) disas
Thread 1 "python" received signal SIGILL, Illegal instruction.
0x00007fffbe810cdb in mkldnn::impl::scales_t::set(int, int, float const*) () from /home/USERNAME/anaconda3/lib/python3.7/site-packages/torch/lib/libcaffe2.so
(gdb) disas
Dump of assembler code for function _ZN6mkldnn4impl8scales_t3setEiiPKf:
   0x00007fffbe810c70 <+0>:     lea    0x8(%rsp),%r10
   0x00007fffbe810c75 <+5>:     and    $0xffffffffffffffc0,%rsp
   0x00007fffbe810c79 <+9>:     pushq  -0x8(%r10)
   0x00007fffbe810c7d <+13>:    push   %rbp
   0x00007fffbe810c7e <+14>:    mov    %rsp,%rbp
   0x00007fffbe810c81 <+17>:    push   %r13
   0x00007fffbe810c83 <+19>:    push   %r12
   0x00007fffbe810c85 <+21>:    mov    %rcx,%r12
   0x00007fffbe810c88 <+24>:    push   %r10
   0x00007fffbe810c8a <+26>:    push   %rbx
   0x00007fffbe810c8b <+27>:    mov    %rdi,%rbx
   0x00007fffbe810c8e <+30>:    lea    0x10(%rbx),%r13
   0x00007fffbe810c92 <+34>:    sub    $0x50,%rsp
   0x00007fffbe810c96 <+38>:    mov    0x8(%rdi),%rdi
   0x00007fffbe810c9a <+42>:    cmp    %r13,%rdi
   0x00007fffbe810c9d <+45>:    je     0x7fffbe810cb6 <_ZN6mkldnn4impl8scales_t3setEiiPKf+70>
   0x00007fffbe810c9f <+47>:    test   %rdi,%rdi
   0x00007fffbe810ca2 <+50>:    je     0x7fffbe810cb6 <_ZN6mkldnn4impl8scales_t3setEiiPKf+70>
   0x00007fffbe810ca4 <+52>:    mov    %edx,-0x38(%rbp)
   0x00007fffbe810ca7 <+55>:    mov    %esi,-0x34(%rbp)
   0x00007fffbe810caa <+58>:    addr32 callq 0x7fffbe816f80 <_ZN6mkldnn4impl4freeEPv>
   0x00007fffbe810cb0 <+64>:    mov    -0x38(%rbp),%edx
   0x00007fffbe810cb3 <+67>:    mov    -0x34(%rbp),%esi
   0x00007fffbe810cb6 <+70>:    mov    %r13,0x8(%rbx)
   0x00007fffbe810cba <+74>:    mov    %esi,(%rbx)
   0x00007fffbe810cbc <+76>:    mov    %edx,0x4(%rbx)
   0x00007fffbe810cbf <+79>:    cmp    $0x1,%esi
   0x00007fffbe810cc2 <+82>:    jne    0x7fffbe810d08 <_ZN6mkldnn4impl8scales_t3setEiiPKf+152>
   0x00007fffbe810cc4 <+84>:    lea    0x4(%r12),%rax
   0x00007fffbe810cc9 <+89>:    cmp    %rax,%r13
   0x00007fffbe810ccc <+92>:    jae    0x7fffbe810cdb <_ZN6mkldnn4impl8scales_t3setEiiPKf+107>
   0x00007fffbe810cce <+94>:    lea    0x50(%rbx),%rax
   0x00007fffbe810cd2 <+98>:    cmp    %rax,%r12
   0x00007fffbe810cd5 <+101>:   jb     0x7fffbe810f38 <_ZN6mkldnn4impl8scales_t3setEiiPKf+712>
=> 0x00007fffbe810cdb <+107>:   vbroadcastss (%r12),%zmm0
   0x00007fffbe810ce2 <+114>:   vmovups %zmm0,0x10(%rbx)
   0x00007fffbe810cec <+124>:   vzeroupper 
   0x00007fffbe810cef <+127>:   xor    %eax,%eax
   0x00007fffbe810cf1 <+129>:   add    $0x50,%rsp
   0x00007fffbe810cf5 <+133>:   pop    %rbx
   0x00007fffbe810cf6 <+134>:   pop    %r10
   0x00007fffbe810cf8 <+136>:   pop    %r12
   0x00007fffbe810cfa <+138>:   pop    %r13
   0x00007fffbe810cfc <+140>:   pop    %rbp
   0x00007fffbe810cfd <+141>:   lea    -0x8(%r10),%rsp
   0x00007fffbe810d01 <+145>:   retq   
   0x00007fffbe810d02 <+146>:   nopw   0x0(%rax,%rax,1)
   0x00007fffbe810d08 <+152>:   movslq %esi,%rsi
   0x00007fffbe810d0b <+155>:   lea    0x0(,%rsi,4),%rdi
   0x00007fffbe810d13 <+163>:   mov    $0x40,%esi
   0x00007fffbe810d18 <+168>:   addr32 callq 0x7fffbe816f30 <_ZN6mkldnn4impl6mallocEmi>
   0x00007fffbe810d1e <+174>:   mov    %rax,%rdi
   0x00007fffbe810d21 <+177>:   mov    %rax,0x8(%rbx)
   0x00007fffbe810d25 <+181>:   mov    $0x1,%eax
   0x00007fffbe810d2a <+186>:   test   %rdi,%rdi
   0x00007fffbe810d2d <+189>:   je     0x7fffbe810cf1 <_ZN6mkldnn4impl8scales_t3setEiiPKf+129>
   0x00007fffbe810d2f <+191>:   mov    (%rbx),%r8d
   0x00007fffbe810d32 <+194>:   test   %r8d,%r8d
   0x00007fffbe810d35 <+197>:   jle    0x7fffbe810cef <_ZN6mkldnn4impl8scales_t3setEiiPKf+127>
   0x00007fffbe810d37 <+199>:   lea    0x40(%rdi),%rax
   0x00007fffbe810d3b <+203>:   cmp    %rax,%r12
   0x00007fffbe810d3e <+206>:   lea    0x40(%r12),%rax
   0x00007fffbe810d43 <+211>:   setae  %dl
   0x00007fffbe810d46 <+214>:   cmp    %rax,%rdi
   0x00007fffbe810d49 <+217>:   setae  %al
   0x00007fffbe810d4c <+220>:   or     %al,%dl
   0x00007fffbe810d4e <+222>:   je     0x7fffbe810ff0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+896>
   0x00007fffbe810d54 <+228>:   cmp    $0x16,%r8d
   0x00007fffbe810d58 <+232>:   jbe    0x7fffbe810ff0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+896>
   0x00007fffbe810d5e <+238>:   mov    %r12,%rcx
   0x00007fffbe810d61 <+241>:   lea    -0x1(%r8),%edx
   0x00007fffbe810d65 <+245>:   shr    $0x2,%rcx
   0x00007fffbe810d69 <+249>:   neg    %rcx
   0x00007fffbe810d6c <+252>:   and    $0xf,%ecx
   0x00007fffbe810d6f <+255>:   lea    0xf(%rcx),%eax
   0x00007fffbe810d72 <+258>:   cmp    %eax,%edx
   0x00007fffbe810d74 <+260>:   jb     0x7fffbe811042 <_ZN6mkldnn4impl8scales_t3setEiiPKf+978>
   0x00007fffbe810d7a <+266>:   test   %ecx,%ecx
   0x00007fffbe810d7c <+268>:   je     0x7fffbe811079 <_ZN6mkldnn4impl8scales_t3setEiiPKf+1033>
   0x00007fffbe810d82 <+274>:   vmovss (%r12),%xmm0
   0x00007fffbe810d88 <+280>:   vmovss %xmm0,(%rdi)
   0x00007fffbe810d8c <+284>:   cmp    $0x1,%ecx
   0x00007fffbe810d8f <+287>:   je     0x7fffbe811020 <_ZN6mkldnn4impl8scales_t3setEiiPKf+944>
   0x00007fffbe810d95 <+293>:   vmovss 0x4(%r12),%xmm0
   0x00007fffbe810d9c <+300>:   vmovss %xmm0,0x4(%rdi)
   0x00007fffbe810da1 <+305>:   cmp    $0x2,%ecx
   0x00007fffbe810da4 <+308>:   je     0x7fffbe811038 <_ZN6mkldnn4impl8scales_t3setEiiPKf+968>
   0x00007fffbe810daa <+314>:   vmovss 0x8(%r12),%xmm0
   0x00007fffbe810db1 <+321>:   vmovss %xmm0,0x8(%rdi)
   0x00007fffbe810db6 <+326>:   cmp    $0x3,%ecx
   0x00007fffbe810db9 <+329>:   je     0x7fffbe811049 <_ZN6mkldnn4impl8scales_t3setEiiPKf+985>
   0x00007fffbe810dbf <+335>:   vmovss 0xc(%r12),%xmm0
   0x00007fffbe810dc6 <+342>:   vmovss %xmm0,0xc(%rdi)
   0x00007fffbe810dcb <+347>:   cmp    $0x4,%ecx
   0x00007fffbe810dce <+350>:   je     0x7fffbe811053 <_ZN6mkldnn4impl8scales_t3setEiiPKf+995>
   0x00007fffbe810dd4 <+356>:   vmovss 0x10(%r12),%xmm0
   0x00007fffbe810ddb <+363>:   vmovss %xmm0,0x10(%rdi)
   0x00007fffbe810de0 <+368>:   cmp    $0x5,%ecx
   0x00007fffbe810de3 <+371>:   je     0x7fffbe81105d <_ZN6mkldnn4impl8scales_t3setEiiPKf+1005>
   0x00007fffbe810de9 <+377>:   vmovss 0x14(%r12),%xmm0
   0x00007fffbe810df0 <+384>:   vmovss %xmm0,0x14(%rdi)
   0x00007fffbe810df5 <+389>:   cmp    $0x6,%ecx
   0x00007fffbe810df8 <+392>:   je     0x7fffbe81106f <_ZN6mkldnn4impl8scales_t3setEiiPKf+1023>
   0x00007fffbe810dfe <+398>:   vmovss 0x18(%r12),%xmm0
   0x00007fffbe810e05 <+405>:   vmovss %xmm0,0x18(%rdi)
   0x00007fffbe810e0a <+410>:   cmp    $0x7,%ecx
   0x00007fffbe810e0d <+413>:   je     0x7fffbe811080 <_ZN6mkldnn4impl8scales_t3setEiiPKf+1040>
   0x00007fffbe810e13 <+419>:   vmovss 0x1c(%r12),%xmm0
   0x00007fffbe810e1a <+426>:   vmovss %xmm0,0x1c(%rdi)
   0x00007fffbe810e1f <+431>:   cmp    $0x8,%ecx
   0x00007fffbe810e22 <+434>:   je     0x7fffbe81108a <_ZN6mkldnn4impl8scales_t3setEiiPKf+1050>
   0x00007fffbe810e28 <+440>:   vmovss 0x20(%r12),%xmm0
   0x00007fffbe810e2f <+447>:   vmovss %xmm0,0x20(%rdi)
   0x00007fffbe810e34 <+452>:   cmp    $0x9,%ecx
   0x00007fffbe810e37 <+455>:   je     0x7fffbe81102a <_ZN6mkldnn4impl8scales_t3setEiiPKf+954>
   0x00007fffbe810e3d <+461>:   vmovss 0x24(%r12),%xmm0
   0x00007fffbe810e44 <+468>:   vmovss %xmm0,0x24(%rdi)
   0x00007fffbe810e49 <+473>:   cmp    $0xa,%ecx
   0x00007fffbe810e4c <+476>:   je     0x7fffbe811094 <_ZN6mkldnn4impl8scales_t3setEiiPKf+1060>
   0x00007fffbe810e52 <+482>:   vmovss 0x28(%r12),%xmm0
   0x00007fffbe810e59 <+489>:   vmovss %xmm0,0x28(%rdi)
   0x00007fffbe810e5e <+494>:   cmp    $0xb,%ecx
   0x00007fffbe810e61 <+497>:   je     0x7fffbe81109e <_ZN6mkldnn4impl8scales_t3setEiiPKf+1070>
   0x00007fffbe810e67 <+503>:   vmovss 0x2c(%r12),%xmm0
   0x00007fffbe810e6e <+510>:   vmovss %xmm0,0x2c(%rdi)
   0x00007fffbe810e73 <+515>:   cmp    $0xc,%ecx
   0x00007fffbe810e76 <+518>:   je     0x7fffbe8110a8 <_ZN6mkldnn4impl8scales_t3setEiiPKf+1080>
   0x00007fffbe810e7c <+524>:   vmovss 0x30(%r12),%xmm0
   0x00007fffbe810e83 <+531>:   vmovss %xmm0,0x30(%rdi)
   0x00007fffbe810e88 <+536>:   cmp    $0xd,%ecx
   0x00007fffbe810e8b <+539>:   je     0x7fffbe8110b2 <_ZN6mkldnn4impl8scales_t3setEiiPKf+1090>
   0x00007fffbe810e91 <+545>:   vmovss 0x34(%r12),%xmm0
   0x00007fffbe810e98 <+552>:   vmovss %xmm0,0x34(%rdi)
   0x00007fffbe810e9d <+557>:   cmp    $0xe,%ecx
   0x00007fffbe810ea0 <+560>:   je     0x7fffbe8110bc <_ZN6mkldnn4impl8scales_t3setEiiPKf+1100>
   0x00007fffbe810ea6 <+566>:   vmovss 0x38(%r12),%xmm0
   0x00007fffbe810ead <+573>:   mov    $0xf,%edx
   0x00007fffbe810eb2 <+578>:   vmovss %xmm0,0x38(%rdi)
   0x00007fffbe810eb7 <+583>:   nopw   0x0(%rax,%rax,1)
   0x00007fffbe810ec0 <+592>:   mov    %r8d,%r11d
   0x00007fffbe810ec3 <+595>:   xor    %esi,%esi
   0x00007fffbe810ec5 <+597>:   xor    %eax,%eax
   0x00007fffbe810ec7 <+599>:   sub    %ecx,%r11d
   0x00007fffbe810eca <+602>:   shl    $0x2,%rcx
   0x00007fffbe810ece <+606>:   mov    %r11d,%r10d
   0x00007fffbe810ed1 <+609>:   lea    (%r12,%rcx,1),%r9
   0x00007fffbe810ed5 <+613>:   add    %rdi,%rcx
   0x00007fffbe810ed8 <+616>:   shr    $0x4,%r10d
   0x00007fffbe810edc <+620>:   nopl   0x0(%rax)
   0x00007fffbe810ee0 <+624>:   vmovaps (%r9,%rsi,1),%zmm0
   0x00007fffbe810ee7 <+631>:   add    $0x1,%eax
   0x00007fffbe810eea <+634>:   vmovups %zmm0,(%rcx,%rsi,1)
   0x00007fffbe810ef1 <+641>:   add    $0x40,%rsi
   0x00007fffbe810ef5 <+645>:   cmp    %eax,%r10d
   0x00007fffbe810ef8 <+648>:   ja     0x7fffbe810ee0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+624>
   0x00007fffbe810efa <+650>:   mov    %r11d,%eax
   0x00007fffbe810efd <+653>:   and    $0xfffffff0,%eax
   0x00007fffbe810f00 <+656>:   add    %eax,%edx
   0x00007fffbe810f02 <+658>:   cmp    %eax,%r11d
   0x00007fffbe810f05 <+661>:   je     0x7fffbe811067 <_ZN6mkldnn4impl8scales_t3setEiiPKf+1015>
   0x00007fffbe810f0b <+667>:   vzeroupper 
   0x00007fffbe810f0e <+670>:   movslq %edx,%rdx
   0x00007fffbe810f11 <+673>:   nopl   0x0(%rax)
   0x00007fffbe810f18 <+680>:   vmovss (%r12,%rdx,4),%xmm0
   0x00007fffbe810f1e <+686>:   vmovss %xmm0,(%rdi,%rdx,4)
   0x00007fffbe810f23 <+691>:   add    $0x1,%rdx
   0x00007fffbe810f27 <+695>:   cmp    %edx,%r8d
   0x00007fffbe810f2a <+698>:   jg     0x7fffbe810f18 <_ZN6mkldnn4impl8scales_t3setEiiPKf+680>
   0x00007fffbe810f2c <+700>:   jmpq   0x7fffbe810cef <_ZN6mkldnn4impl8scales_t3setEiiPKf+127>
   0x00007fffbe810f31 <+705>:   nopl   0x0(%rax)
   0x00007fffbe810f38 <+712>:   vmovss (%r12),%xmm0
   0x00007fffbe810f3e <+718>:   vmovss %xmm0,0x10(%rbx)
   0x00007fffbe810f43 <+723>:   vmovss (%r12),%xmm0
   0x00007fffbe810f49 <+729>:   vmovss %xmm0,0x14(%rbx)
   0x00007fffbe810f4e <+734>:   vmovss (%r12),%xmm0
   0x00007fffbe810f54 <+740>:   vmovss %xmm0,0x18(%rbx)
   0x00007fffbe810f59 <+745>:   vmovss (%r12),%xmm0
   0x00007fffbe810f5f <+751>:   vmovss %xmm0,0x1c(%rbx)
   0x00007fffbe810f64 <+756>:   vmovss (%r12),%xmm0
   0x00007fffbe810f6a <+762>:   vmovss %xmm0,0x20(%rbx)
   0x00007fffbe810f6f <+767>:   vmovss (%r12),%xmm0
   0x00007fffbe810f75 <+773>:   vmovss %xmm0,0x24(%rbx)
   0x00007fffbe810f7a <+778>:   vmovss (%r12),%xmm0
   0x00007fffbe810f80 <+784>:   vmovss %xmm0,0x28(%rbx)
   0x00007fffbe810f85 <+789>:   vmovss (%r12),%xmm0
   0x00007fffbe810f8b <+795>:   vmovss %xmm0,0x2c(%rbx)
   0x00007fffbe810f90 <+800>:   vmovss (%r12),%xmm0
   0x00007fffbe810f96 <+806>:   vmovss %xmm0,0x30(%rbx)
   0x00007fffbe810f9b <+811>:   vmovss (%r12),%xmm0
   0x00007fffbe810fa1 <+817>:   vmovss %xmm0,0x34(%rbx)
   0x00007fffbe810fa6 <+822>:   vmovss (%r12),%xmm0
   0x00007fffbe810fac <+828>:   vmovss %xmm0,0x38(%rbx)
   0x00007fffbe810fb1 <+833>:   vmovss (%r12),%xmm0
   0x00007fffbe810fb7 <+839>:   vmovss %xmm0,0x3c(%rbx)
   0x00007fffbe810fbc <+844>:   vmovss (%r12),%xmm0
   0x00007fffbe810fc2 <+850>:   vmovss %xmm0,0x40(%rbx)
   0x00007fffbe810fc7 <+855>:   vmovss (%r12),%xmm0
   0x00007fffbe810fcd <+861>:   vmovss %xmm0,0x44(%rbx)
   0x00007fffbe810fd2 <+866>:   vmovss (%r12),%xmm0
   0x00007fffbe810fd8 <+872>:   vmovss %xmm0,0x48(%rbx)
   0x00007fffbe810fdd <+877>:   vmovss (%r12),%xmm0
   0x00007fffbe810fe3 <+883>:   vmovss %xmm0,0x4c(%rbx)
   0x00007fffbe810fe8 <+888>:   jmpq   0x7fffbe810cef <_ZN6mkldnn4impl8scales_t3setEiiPKf+127>
   0x00007fffbe810fed <+893>:   nopl   (%rax)
   0x00007fffbe810ff0 <+896>:   lea    -0x1(%r8),%eax
   0x00007fffbe810ff4 <+900>:   lea    0x4(,%rax,4),%rdx
   0x00007fffbe810ffc <+908>:   xor    %eax,%eax
   0x00007fffbe810ffe <+910>:   xchg   %ax,%ax
   0x00007fffbe811000 <+912>:   vmovss (%r12,%rax,1),%xmm0
   0x00007fffbe811006 <+918>:   vmovss %xmm0,(%rdi,%rax,1)
   0x00007fffbe81100b <+923>:   add    $0x4,%rax
   0x00007fffbe81100f <+927>:   cmp    %rdx,%rax
   0x00007fffbe811012 <+930>:   jne    0x7fffbe811000 <_ZN6mkldnn4impl8scales_t3setEiiPKf+912>
   0x00007fffbe811014 <+932>:   jmpq   0x7fffbe810cef <_ZN6mkldnn4impl8scales_t3setEiiPKf+127>
   0x00007fffbe811019 <+937>:   nopl   0x0(%rax)
   0x00007fffbe811020 <+944>:   mov    $0x1,%edx
   0x00007fffbe811025 <+949>:   jmpq   0x7fffbe810ec0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+592>
   0x00007fffbe81102a <+954>:   mov    $0x9,%edx
   0x00007fffbe81102f <+959>:   jmpq   0x7fffbe810ec0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+592>
   0x00007fffbe811034 <+964>:   nopl   0x0(%rax)
   0x00007fffbe811038 <+968>:   mov    $0x2,%edx
   0x00007fffbe81103d <+973>:   jmpq   0x7fffbe810ec0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+592>
   0x00007fffbe811042 <+978>:   xor    %edx,%edx
   0x00007fffbe811044 <+980>:   jmpq   0x7fffbe810f0e <_ZN6mkldnn4impl8scales_t3setEiiPKf+670>
   0x00007fffbe811049 <+985>:   mov    $0x3,%edx
   0x00007fffbe81104e <+990>:   jmpq   0x7fffbe810ec0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+592>
   0x00007fffbe811053 <+995>:   mov    $0x4,%edx
   0x00007fffbe811058 <+1000>:  jmpq   0x7fffbe810ec0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+592>
   0x00007fffbe81105d <+1005>:  mov    $0x5,%edx
   0x00007fffbe811062 <+1010>:  jmpq   0x7fffbe810ec0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+592>
   0x00007fffbe811067 <+1015>:  vzeroupper 
   0x00007fffbe81106a <+1018>:  jmpq   0x7fffbe810cef <_ZN6mkldnn4impl8scales_t3setEiiPKf+127>
   0x00007fffbe81106f <+1023>:  mov    $0x6,%edx
   0x00007fffbe811074 <+1028>:  jmpq   0x7fffbe810ec0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+592>
   0x00007fffbe811079 <+1033>:  xor    %edx,%edx
   0x00007fffbe81107b <+1035>:  jmpq   0x7fffbe810ec0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+592>
   0x00007fffbe811080 <+1040>:  mov    $0x7,%edx
   0x00007fffbe811085 <+1045>:  jmpq   0x7fffbe810ec0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+592>
   0x00007fffbe81108a <+1050>:  mov    $0x8,%edx
   0x00007fffbe81108f <+1055>:  jmpq   0x7fffbe810ec0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+592>
   0x00007fffbe811094 <+1060>:  mov    $0xa,%edx
   0x00007fffbe811099 <+1065>:  jmpq   0x7fffbe810ec0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+592>
   0x00007fffbe81109e <+1070>:  mov    $0xb,%edx
   0x00007fffbe8110a3 <+1075>:  jmpq   0x7fffbe810ec0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+592>
   0x00007fffbe8110a8 <+1080>:  mov    $0xc,%edx
   0x00007fffbe8110ad <+1085>:  jmpq   0x7fffbe810ec0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+592>
   0x00007fffbe8110b2 <+1090>:  mov    $0xd,%edx
   0x00007fffbe8110b7 <+1095>:  jmpq   0x7fffbe810ec0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+592>
   0x00007fffbe8110bc <+1100>:  mov    $0xe,%edx
   0x00007fffbe8110c1 <+1105>:  jmpq   0x7fffbe810ec0 <_ZN6mkldnn4impl8scales_t3setEiiPKf+592>

EDIT:
I upgraded to torch 1.4.0 with conda install pytorch torchvision cpuonly -c pytorch, and then it worked. (After that I got a SIGKILL because the process needed more memory, but that was an unrelated problem, which I managed to resolve.)
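In the dump above, the `=>` line marks `vbroadcastss (%r12),%zmm0`. The %zmm registers are 512 bits wide and exist only on CPUs with AVX-512, which the i7-8550U does not implement (it tops out at AVX2), so this SIGILL looks like an AVX-512 code path in MKL-DNN (`mkldnn::impl::scales_t::set`) running on a machine without AVX-512, rather than the TH AVX/AVX2 kernels from the qemu reports. A rough Python heuristic for triaging such dumps by reading the widest vector register off one disassembly line; the function name is hypothetical.

```python
import re

# AT&T-syntax vector registers: %xmmN (128-bit), %ymmN (256-bit), %zmmN (512-bit).
_VECTOR_REG = re.compile(r"%([xyz])mm\d+")
_WIDTH_BITS = {"x": 128, "y": 256, "z": 512}


def widest_vector_register_bits(disas_line):
    """Return the widest vector register width (in bits) used by one line of
    gdb disassembly, or None if the line uses no vector registers.

    Heuristic only: 512 bits implies AVX-512 and 256 bits implies at least AVX,
    but narrower operands can also come from AVX-512 (EVEX) encodings."""
    widths = [_WIDTH_BITS[kind] for kind in _VECTOR_REG.findall(disas_line)]
    return max(widths, default=None)
```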

@ezyang
Contributor

ezyang commented Feb 3, 2020

Maybe we should offer some way to manually override the instruction selection.

@peterbell10
Collaborator

> Maybe we should offer some way to manually override the instruction selection

I think exporting ATEN_CPU_CAPABILITY=default should work:

auto envar = std::getenv("ATEN_CPU_CAPABILITY");
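To make the override concrete, here is a simplified Python model of the selection logic in the function containing this getenv call, as I read the source at the time: a recognised value of ATEN_CPU_CAPABILITY ("default", "avx" or "avx2") wins; otherwise AVX2 is chosen only if both AVX2 and FMA3 are reported, then AVX, then the default kernels. This is an illustration, not ATen's implementation: the real code asks the cpuinfo library (which queries the CPU directly) rather than taking a /proc/cpuinfo-style flag set, and newer releases may accept additional values.

```python
import os


def select_cpu_capability(cpu_flags, env=None):
    """Simplified model (NOT ATen's code) of how the kernel set is picked.

    cpu_flags: set of /proc/cpuinfo-style flag names, e.g. {"sse2", "avx"}.
    Returns "default", "avx" or "avx2".
    """
    env = os.environ if env is None else env
    override = env.get("ATEN_CPU_CAPABILITY")
    if override in ("default", "avx", "avx2"):
        # A recognised override wins over detection.
        return override
    # ATen warns about unrecognised values; this model silently ignores them.
    if "avx2" in cpu_flags and "fma" in cpu_flags:
        return "avx2"
    if "avx" in cpu_flags:
        return "avx"
    return "default"
```

Because detection follows whatever the CPU reports, the override is the only lever when a virtual CPU advertises features it cannot actually execute.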

@peterbell10
Collaborator

Actually, that's not quite right. It looks like TH has its own CPU capability detection and, with it, its own environment variables:

evar = getenv("TH_NO_AVX2");

So you would actually need to set TH_NO_AVX=1 and TH_NO_AVX2=1 as well.

@ezyang Would it make sense to have TH call at::native::get_cpu_capability to avoid that duplication? It might also be more robust, since ATen doesn't make the cpuid calls itself and just uses cpuinfo.
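Combining the two workarounds above, a sketch that launches a script with all three variables set, so that both ATen and the older TH detection fall back to non-AVX kernels. Setting them in the child's environment, rather than from inside an already running process, avoids depending on when each library reads them. The helper names and "train.py" are placeholders; TH_NO_AVX and TH_NO_AVX2 only matter for builds that predate #34088 and should simply go unread afterwards.

```python
import os
import subprocess
import sys

# Overrides discussed in this thread.
NO_SIMD_ENV = {
    "ATEN_CPU_CAPABILITY": "default",  # ATen: use the non-vectorized kernels
    "TH_NO_AVX": "1",                  # TH (older builds): skip AVX kernels
    "TH_NO_AVX2": "1",                 # TH (older builds): skip AVX2 kernels
}


def env_without_simd(base=None):
    """Return a copy of `base` (default: os.environ) with the overrides added."""
    env = dict(os.environ if base is None else base)
    env.update(NO_SIMD_ENV)
    return env


def run_without_simd(script, *args):
    """Run `script` with the current interpreter and the overrides applied."""
    return subprocess.run([sys.executable, script, *args], env=env_without_simd())
```

The shell equivalent is `ATEN_CPU_CAPABILITY=default TH_NO_AVX=1 TH_NO_AVX2=1 python train.py`.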

@ezyang
Contributor

ezyang commented Mar 2, 2020

@peterbell10 Yes, absolutely. This is a nice find.

facebook-github-bot pushed a commit that referenced this issue Mar 4, 2020
Summary:
As per #22338 (comment), this removes the AVX detection code from TH. Now the environment variable `ATEN_CPU_CAPABILITY` is the only setting needed to disable AVX/AVX2.
Pull Request resolved: #34088

Differential Revision: D20236039

Pulled By: ezyang

fbshipit-source-id: eecec64b41a7a6ca7e42c1c2762032eb47af535c
ttumiel pushed a commit to ttumiel/pytorch that referenced this issue Mar 4, 2020
Summary:
As per pytorch#22338 (comment), this removes the AVX detection code from TH. Now the environment variable `ATEN_CPU_CAPABILITY` is the only setting needed to disable AVX/AVX2.
Pull Request resolved: pytorch#34088

Differential Revision: D20236039

Pulled By: ezyang

fbshipit-source-id: eecec64b41a7a6ca7e42c1c2762032eb47af535c
@mattip
Collaborator

mattip commented Apr 19, 2020

Other than more clearly documenting the need for ATEN_CPU_CAPABILITY=default under qemu, is there anything more to be done here?

@ezyang
Contributor

ezyang commented Apr 20, 2020

I think that, if possible, it would be good for capability detection to automatically discover the qemu situation and "do the right thing". If qemu is straight up reporting the wrong capabilities there may not be very much we can do though.


8 participants