Illegal instruction (core dumped) when running in qemu #22338
Comments
Hi, if possible, can you run the program under gdb and then at the SIGILL site, run |
cc @cpuhrsch this looks like some sort of cpuinfo(?) problem when running under qemu. |
I tried adding avx but still the same issue.
(pyt1) webtech@USA:/U01/code$ lscpu
The stack after running with gdb:
[Thread debugging using libthread_db enabled]
Thread 1 "python" received signal SIGILL, Illegal instruction. |
Can you disas at the failure site? So we can see what the problem is. |
Can you perhaps try adding avx2, since the SIGILL is probably on an avx2 instruction? In any case, this is a bug. |
Removing high priority until we see a case outside of qemu or get a reproduction. |
Hi, here is my CPU info:
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 13
Model name: QEMU Virtual CPU version 1.5.3
Stepping: 3
CPU MHz: 2399.996
BogoMIPS: 4799.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
NUMA node0 CPU(s): 0-3
Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 ht syscall nx lm rep_good nopl pni cx16 hypervisor lahf_lm abm
And gdb:
Thread 1 "python" received signal SIGILL, Illegal instruction.
0x00007fff9eaea4d0 in THFloatVector_normal_fill_AVX2 () from [...]/python3.7/site-packages/torch/lib/libtorch.so
Any news on this? |
@benoitmartin88 We deprioritized this issue because we weren't able to easily get a reproduction. Do you think you could give us more detailed information about your qemu setup? |
@ezyang Unfortunately I do not have more information on the qemu setup, as it is part of a continuous integration setup that I use (most probably VMs). I will try to find out more about the setup and let you know as soon as I have something. |
@ezyang I'm afraid I won't be getting any more info on the qemu setup. That being said, it probably has to do with the fact that my qemu virtual cpu does not have the avx2 instruction set. This would probably be reproducible if qemu was configured to not use avx2. |
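For anyone who wants to confirm the same situation on their own VM, here is a minimal sketch (not from the thread) that reads the CPU feature flags the guest reports and lists which of `avx` / `avx2` are absent. The helper names `cpu_flags` and `missing_simd` are hypothetical, and the sketch assumes a Linux guest where `/proc/cpuinfo` exists; it also accepts `lscpu`-style text such as the output posted above.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo- or lscpu-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # /proc/cpuinfo uses "flags", lscpu uses "Flags"; accept both.
        if sep and key.strip().lower() == "flags":
            return set(value.split())
    return set()


def missing_simd(flags, wanted=("avx", "avx2")):
    """Return the entries of `wanted` that are not in `flags`, in order."""
    return [name for name in wanted if name not in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not on Linux, or /proc is unavailable: nothing to inspect.
        text = ""
    print("missing:", missing_simd(cpu_flags(text)))
```

If `avx2` shows up as missing while gdb stops inside a function whose name ends in `_AVX2`, that matches the symptom described in this issue.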
Thanks. We'll give that a try. |
This issue has been showing up again for some users, see pytorch/vision#1782 and #29371 for an example. Here is the stacktrace from pytorch/vision#1782
|
I faced the same problem, with torch 1.1.0, here's the trace.
gdb disas : (gdb) disas
EDIT : |
Maybe we should offer some way to manually override the instruction selection |
I think exporting
|
Actually, that's not quite right. It looks like pytorch/aten/src/TH/vector/simd.h Line 144 in 327e94f
So, you would actually need TH_NO_AVX=1 TH_NO_AVX2=1 as well.
@ezyang Would it make sense to have |
@peterbell10 Yes, absolutely. This is a nice find. |
Summary: As per #22338 (comment), this removes the AVX detection code from TH. Now the environment variable `ATEN_CPU_CAPABILITY` is the only setting needed to disable AVX/AVX2.
Pull Request resolved: #34088
Differential Revision: D20236039
Pulled By: ezyang
fbshipit-source-id: eecec64b41a7a6ca7e42c1c2762032eb47af535c
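As a rough illustration of using that variable, the sketch below builds an environment with `ATEN_CPU_CAPABILITY` set and launches a script in a child interpreter, so the setting is in place before the library loads. The value `"default"` is an assumption about what a given build accepts (it is not stated in this thread), and `env_without_avx` and `run_script` are hypothetical helper names.

```python
import os
import subprocess
import sys


def env_without_avx(base_env=None):
    """Return a copy of the environment with AVX/AVX2 kernels disabled.

    Assumption: the build honours ATEN_CPU_CAPABILITY and treats the value
    "default" as "no AVX/AVX2"; check this against your installed version.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ATEN_CPU_CAPABILITY"] = "default"
    return env


def run_script(script_path):
    """Run `script_path` in a child Python process with the override applied."""
    return subprocess.call([sys.executable, script_path], env=env_without_avx())
```

Setting the variable in the parent shell before starting Python should have the same effect; the child-process form just makes it hard to set it too late.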
Other than more clearly documenting the need for |
I think that, if possible, it would be good for capability detection to automatically discover the qemu situation and "do the right thing". If qemu is straight up reporting the wrong capabilities there may not be very much we can do though. |
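One way to picture "do the right thing" is a small launcher-side helper that maps the flags the guest reports to a capability level. This is only a sketch under assumptions: the level names `"avx2"`, `"avx"` and `"default"` are assumed values for `ATEN_CPU_CAPABILITY`, `pick_capability` is a hypothetical name, and the real dispatch logic may check additional features. It also cannot help when the reported flags are themselves wrong.

```python
def pick_capability(flags):
    """Map a collection of CPU flag names to an assumed capability level.

    The returned strings are assumptions about accepted values of
    ATEN_CPU_CAPABILITY, not a documented contract.
    """
    flags = set(flags)
    if "avx2" in flags:
        return "avx2"
    if "avx" in flags:
        return "avx"
    return "default"


# Flags copied from the lscpu output posted earlier in this issue.
reported = (
    "fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 "
    "clflush mmx fxsr sse sse2 ht syscall nx lm rep_good nopl pni cx16 "
    "hypervisor lahf_lm abm"
).split()
level = pick_capability(reported)  # no avx/avx2 in the list, so "default"
```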
I installed PyTorch using Anaconda, following the instructions on the PyTorch website:
conda install pytorch-cpu torchvision-cpu -c pytorch
I'm getting Illegal instruction (core dumped)
When I start the app using gdb, I get the following error.
I guess something is wrong with libcaffe2.so
Thread 1 "python" received signal SIGILL, Illegal instruction.
0x00007fffa8ec2486 in THFloatVector_normal_fill_AVX2 () from /U01/anaconda3/envs/pyt1/lib/python3.7/site-packages/torch/lib/libcaffe2.so
Machine configuration:
Machine is Ubuntu 18.04.2
cc @ezyang @gchanan @zou3519 @VitalyFedyunin