This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

mxnet-cu90mkl does not work on g2.8xlarge instance #12817

Closed
mirocody opened this issue Oct 12, 2018 · 9 comments

@mirocody

Description

Got a core dump when importing mxnet (mxnet-cu90mkl) on a g2.8xlarge instance.

Environment info (Required)

----------Python Info----------
('Version :', '2.7.15')
('Compiler :', 'GCC 7.2.0')
('Build :', ('default', 'May 1 2018 23:32:55'))
('Arch :', ('64bit', ''))
------------Pip Info-----------
('Version :', '10.0.1')
('Directory :', '/home/ec2-user/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/pip')
----------MXNet Info-----------
Illegal instruction

Note:
The hardware info below was collected with mxnet 1.3.0post0 installed, because with mxnet-cu90mkl the script stops at the MXNet Info step.
----------System Info----------
('Platform :', 'Linux-4.14.72-68.55.amzn1.x86_64-x86_64-with-glibc2.2.5')
('system :', 'Linux')
('node :', 'ip-172-31-23-121')
('release :', '4.14.72-68.55.amzn1.x86_64')
('version :', '#1 SMP Fri Sep 28 21:14:54 UTC 2018')
----------Hardware Info----------
('machine :', 'x86_64')
('processor :', 'x86_64')
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Stepping: 7
CPU MHz: 2599.851
BogoMIPS: 5270.04
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-15
NUMA node1 CPU(s): 16-31

Package used (Python/R/Scala/Julia):
Python

Error Message:

Illegal instruction

Minimum reproducible example

python
import mxnet

Steps to reproduce

(Paste the commands you ran that produced the error.)

What have you tried to solve it?

Installing mxnet 1.3.0post0 works, but mxnet-cu90mkl does not work at all.

@piyushghai
Contributor

@mxnet-label-bot [Build, Installation, Breaking]

@marcoabreu
Contributor

Did you compile MXNet from source? It looks like you're using a published version of MXNet that was compiled for a CPU (?) instruction set that's not supported by the CPU in a G2 instance (i.e., the CPU is too old).

This should be resolved if you compile from source. Could you give that a shot please?

@piyushghai
Contributor

@mirocody Was this issue resolved for you?

@KellenSunderland
Contributor

Agree with Marco, looks like the error is related to AVX vector instructions being used that aren't available on the E5-2670.
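The thread does not establish which instruction set extension is actually at fault. As a minimal sketch, assuming a Linux host that exposes `/proc/cpuinfo`, the following lists which common vector extensions a CPU does not advertise. The function names and the default feature list are illustrative assumptions, not part of MXNet.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # All logical CPUs normally report the same flags, so the
            # first "flags" line is enough for a quick check.
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, wanted=("avx", "avx2", "fma", "avx512f")):
    """Return the entries of `wanted` that the CPU does not advertise."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in flags]


def missing_features_on_this_host(path="/proc/cpuinfo"):
    """Convenience wrapper; assumes a Linux host where `path` exists."""
    with open(path) as handle:
        return missing_features(handle.read())
```

Running `missing_features_on_this_host()` on both the working and the failing instance and comparing the results would show whether the two CPUs differ in any of these extensions.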

@lupesko
Contributor

lupesko commented Oct 21, 2018

@mirocody bouncing again.
I suggest you try these steps:
(1) Try the non-MKL version.
(2) If it still fails, try building from source.

@mirocody
Author

Thanks for the response. I will try compiling from source, but I want to make sure: is there any plan to fix the pip package so that it works on g2? We need to know this so we can discuss our plan for DLAMI.

@mirocody
Author

Thanks @marcoabreu @KellenSunderland @lupesko for your suggestions.
Update:
After trying different versions of the packages, I made the following observations:

  1. Both the cu90-mkl and cu90 versions work on the E5-2650
  2. Both the cu90-mkl and cu90 versions fail on the E5-2670

Currently we have no plan to build from source specifically for g2 (whose instances use both E5-2670 and E5-2650 CPUs). We need to find a way to make it work on the E5-2670 too.

@apeforest
Contributor

apeforest commented Oct 26, 2018

@mirocody This problem has been fixed in the upcoming 1.3.1 release.
You can verify with pip install mxnet-cu90mkl --pre
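Because an illegal instruction kills the interpreter outright, one way to run this kind of verification is to attempt the import in a child process and inspect how it exited. This is a minimal sketch, assuming Python 3.5 or newer (the original report used Python 2.7) and a POSIX host; the helper names are hypothetical and not part of MXNet.

```python
import signal
import subprocess
import sys


def import_in_subprocess(module_name, python=sys.executable):
    """Import `module_name` in a child interpreter and return its exit code.

    On POSIX, a negative return code -N means the child was killed by
    signal N, so a crash does not take down the calling process.
    """
    result = subprocess.run(
        [python, "-c", "import " + module_name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def describe(returncode):
    """Turn a child return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "failed with exit code %d" % returncode
```

For example, `describe(import_in_subprocess("mxnet"))` would be expected to report `killed by SIGILL` on an affected machine and `ok` once a working package is installed.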

@mirocody
Author

Verified, thanks. I am closing this issue.


6 participants