This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

mkldnn is not properly installed #15294

Closed
hubutui opened this issue Jun 20, 2019 · 20 comments

Comments

@hubutui

hubutui commented Jun 20, 2019

Description

libmklml_intel.so and libmklml_gnu.so are missing from the final installation; only libmkldnn.so is installed. However, libmkldnn.so needs libmklml_intel.so. I'm not sure whether libmklml_gnu.so is needed too.

Environment info (Required)

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 48 bits virtual
CPU(s):              24
On-line CPU(s) list: 0-23
Thread(s) per core:  2
Core(s) per socket:  6
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Stepping:            2
CPU MHz:             1200.000
CPU max MHz:         2600.0000
CPU min MHz:         1200.0000
BogoMIPS:            4801.92
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            15360K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts
----------Python Info----------
Version      : 3.7.3
Compiler     : GCC 8.2.1 20181127
Build        : ('default', 'Mar 26 2019 21:43:19')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
No corresponding pip install for current python.
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
Platform     : Linux-4.18.5-arch1-1-ARCH-x86_64-with-arch
system       : Linux
node         : hubutui
release      : 4.18.5-arch1-1-ARCH
version      : #1 SMP PREEMPT Fri Aug 24 12:48:58 UTC 2018
----------Hardware Info----------
machine      : x86_64
processor    : 
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0062 sec, LOAD: 0.4946 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.2516 sec, LOAD: 0.7011 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0924 sec, LOAD: 0.7636 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0213 sec, LOAD: 0.9804 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0024 sec, LOAD: 0.2850 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0035 sec, LOAD: 0.0291 sec.

Package used (Python/R/Scala/Julia):
python 3.7

Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio):
clang 8.0.0
MXNet commit hash:
(Paste the output of git rev-parse HEAD here.)
4d96671
Build config:
(Paste the content of config.mk, or the build command.)

cmake \
    -DBUILD_CPP_EXAMPLES=OFF \
    -DBUILD_TESTING=OFF \
    -DCMAKE_BUILD_TYPE:String=Release \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DCMAKE_EXE_LINKER_FLAGS=$(pkg-config --libs blas lapacke cblas) \
    -DCMAKE_INSTALL_LIBDIR:PATH=lib \
    -DCMAKE_INSTALL_PREFIX:PATH=/usr \
    -DCMAKE_SHARED_LINKER_FLAGS=$(pkg-config --libs blas lapacke cblas) \
    -DUSE_BLAS=open \
    -DUSE_MKLDNN:BOOL=ON \
    -DUSE_MKLML_MKL=OFF \
    -DUSE_NCCL:BOOL=OFF \
    -DUSE_CUDA:BOOL=OFF \
    -DUSE_CUDNN:BOOL=OFF \
    -DUSE_OPENCV:BOOL=ON \
    -GNinja \
    ..
ninja
ninja install

Error Message:

ldd /usr/lib/libmkldnn.so
	linux-vdso.so.1 (0x00007ffd4e325000)
	libcblas.so.3 => /usr/lib/libcblas.so.3 (0x00007f433c10f000)
	libmklml_intel.so => not found
	libiomp5.so => /usr/lib/libiomp5.so (0x00007f433c030000)
	libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f433be48000)
	libm.so.6 => /usr/lib/libm.so.6 (0x00007f433bd02000)
	libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f433bce8000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007f433bb23000)
	/usr/lib64/ld-linux-x86-64.so.2 (0x00007f433d151000)
	libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f433a8ea000)
	libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f433a8c9000)
	libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f433a8c4000)
	libgomp.so.1 => /usr/lib/libgomp.so.1 (0x00007f433a88b000)
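
A minimal sketch of how the unresolved entries could be pulled out of an ldd listing like the one above. The helper name missing_libs is hypothetical (it is not part of MXNet or MKL-DNN); it only filters text on stdin and assumes the usual "name => not found" format that ldd prints.

```shell
# Hypothetical helper: print the names of libraries that ldd could not resolve.
# Reads ldd output on stdin; relies on the "name => not found" line format.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example usage (the path is the one from this report):
#   ldd /usr/lib/libmkldnn.so | missing_libs
```

For the listing above this would report libmklml_intel.so only, which matches the single "not found" line.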

Minimum reproducible example

(If you are using your own code, please provide a short script that reproduces the error. Otherwise, please provide link to the existing example.)

Steps to reproduce

(Paste the commands you ran that produced the error.)

  1. build with the config mentioned above
  2. ldd /usr/lib/libmkldnn.so

What have you tried to solve it?

  1. Manually installed the missing mkldnn .so files, and it seems to work.

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.

@hubutui
Author

hubutui commented Jun 20, 2019

Also, I had to add -llapacke to the linker flags. Otherwise, I get something like:

la_op.cc:(.text._Z12linalg_orglqIN7mshadow3cpuEdEvRKNS0_6TensorIT_Li2ET0_EERKNS2_IS3_Li1ES4_EEPNS0_6StreamIS3_EE[_Z12linalg_orglqIN7mshadow3cpuEdEvRKNS0_6TensorIT_Li2ET0_EERKNS2_IS3_Li1ES4_EEPNS0_6StreamIS3_EE]+0x4f): undefined reference to `LAPACKE_dorglq'

It seems that lapacke is not properly linked when using mkl-dnn.
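
As a rough sketch, the unresolved symbols can be collected from a linker log like the one above and then checked against the library that should provide them. The helper name undefined_refs is hypothetical, and the liblapacke.so path in the usage comment is an assumption about where the library lives on the system.

```shell
# Hypothetical helper: list the unique symbol names from
# "undefined reference to `symbol'" lines in a linker log read on stdin.
undefined_refs() {
    grep -o 'undefined reference to `[A-Za-z0-9_]*' | sed 's/.*`//' | sort -u
}

# Example usage:
#   ninja 2>&1 | undefined_refs
# Then check whether a library exports one of the symbols (path is an assumption):
#   nm -D --defined-only /usr/lib/liblapacke.so | grep LAPACKE_dorglq
```

If the symbol shows up in the nm output but the link still fails, the library is most likely just missing from the link line, which is consistent with -llapacke being needed here.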

@TaoLv
Member

TaoLv commented Jun 20, 2019

@NeoZhangJianyu please help take a look at the cmake build. Thanks!

@leleamol
Contributor

@mxnet-label-bot add [Build, MKLDNN]

@NeoZhangJianyu
Contributor

@hubutui MKLML will be compiled when MKLDNN is enabled, even with -DUSE_MKLML_MKL=OFF.

  1. Could you run the following command in the build folder?
find . -name '*mklml_intel.so'
./3rdparty/mkldnn/external/mklml_lnx_2019.0.5.20190502/lib/libmklml_intel.so
./build/mklml/mklml_lnx_2019.0.5.20190502/lib/libmklml_intel.so
  2. Could you share the build log?

@hubutui
Author

hubutui commented Jun 25, 2019

I used the PKGBUILD to build mxnet in an Arch chroot environment a few days ago, so the build directory was deleted automatically. But here is the complete build log for these four packages (mxnet, mxnet-mkl, mxnet-cuda, mxnet-cuda-mkl). Note that for the build log I used gcc 8 instead of clang. To reproduce:

  1. Set up an Arch Linux environment with the Arch Linux Chinese Community Repository (ArchLinuxCN) enabled, maybe using Docker.
  2. Install devtools-cn-git, which is available in the ArchLinuxCN repo.
  3. Download the PKGBUILD into a directory named mxnet.
  4. cd mxnet, then run archlinuxcn-x86_64-build -- -- --nocheck. Yes, we need to skip the check step, because it would fail.
    Building mxnet can take up to 2 hours, depending on your hardware and internet speed.

mxnet-1.5.0.rc1-5-x86_64-prepare.log is the log of prepare function in the PKGBUILD.
mxnet-1.5.0.rc1-5-x86_64-build.log is the log of build function in the PKGBUILD, aka the build log, corresponding to the ninja -v or make step.

mxnet-1.5.0.rc1-5-x86_64-package_mxnet.log
mxnet-1.5.0.rc1-5-x86_64-package_mxnet-cuda.log
mxnet-1.5.0.rc1-5-x86_64-package_mxnet-cuda-mkl.log
mxnet-1.5.0.rc1-5-x86_64-package_mxnet-mkl.log
These are the logs of the package functions for the mxnet, mxnet-mkl, mxnet-cuda, and mxnet-cuda-mkl packages, corresponding to the ninja install or make install step.

@hubutui
Author

hubutui commented Jun 26, 2019

Hmm, it seems that mkl-dnn links to the mklml libs provided on the mkl-dnn release page when building. DownloadMKLML.cmake downloads the mklml libs to build mkl-dnn. This behavior is controlled by MKLDNN_USE_MKL. So, in the final installation step of mxnet, mklml should be installed too, but it's not.
Currently, mxnet uses mkl-dnn 0.19. However, according to the latest mkl-dnn docs, MKLDNN_USE_MKL will not be available in the next release. And actually, mkl-dnn doesn't need mklml at all at build time; we could use the OpenMP runtime from clang or gcc itself instead.
So, in my opinion, we have two options to solve this issue:

  1. update mkl-dnn to the v1.0rc release, which does not need mklml
  2. maintain the current version of mkl-dnn, but build mkl-dnn without mklml

@TaoLv
Member

TaoLv commented Jun 26, 2019

So, in the final installation step of mxnet, mklml should be installed too, but it's not.

This sounds like an issue. Can you elaborate? We haven't received an issue report about that before.

update mkl-dnn to the v1.0rc release, which does not need mklml

We will definitely update MKL-DNN to v1.0 in the future. But it's not as easy as changing the commit id. v1.0 is not compatible with the v0.x versions, so many things need to be changed.

maintain the current version of mkl-dnn, but build mkl-dnn without mklml

That will hurt performance.

@hubutui
Author

hubutui commented Jun 26, 2019

You could check the build log, or reproduce it as described above. If I do not manually install mklml as https://github.com/archlinuxcn/repo/blob/master/archlinuxcn/mxnet/PKGBUILD does, libmklml_intel.so and libiomp5.so are missing from the final installation. Also, the libiomp5.so provided by mklml conflicts with the openmp package, which installs libiomp5.so in the same place. A PR was created to solve the openmp issue, but it might not be merged. That's another issue that has been discussed for a long time without any resolution so far.

@yinghu5
Contributor

yinghu5 commented Jul 5, 2019

Hi butui,
Thank you very much for your report. I was able to reproduce the problem on Linux with the CMake install system:

cmake ..
make -j && make install
I also did some investigation.
The root cause appears to be that the CMake install mechanism strips the build-tree RPATH by default, so the runtime dependency libmklml_intel.so (which coexists with Intel OpenMP) or libmklml_gnu.so (which coexists with gomp) can no longer be found.
For example, after running cmake, you can see the code below in cmake_install.cmake:

file(RPATH_CHANGE
     FILE "${file}"
     OLD_RPATH "/home/yhu5/mxnet/incubator-mxnet/build/mklml/mklml_lnx_2019.0.5.20190502/lib:"
     # NEW_RPATH " ")
     NEW_RPATH "$ENV{DESTDIR}${CMAKE_INSTALL_PREFIX}/lib:")
if(CMAKE_INSTALL_DO_STRIP)
  execute_process(COMMAND "/usr/bin/strip" "${file}")
endif()
This is where the build-tree path is removed.
So one basic solution may be:

  1. Rewrite CMakeLists.txt to make sure cmake installs libmklml and sets the install RPATH, for example set(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_PREFIX}/lib").
    This would require a new PR for the libmklml install.

I discussed this with tao.lv and, as you know, there are some changes to libmklml planned in the later design. So I would suggest keeping your current workaround, or using this one:

  2. Run cmake, make -j, and make install as usual, then manually export the runtime library path:
    export LD_LIBRARY_PATH=<path to the mklml libs>

What do you think? Please feel free to let us know if the workaround is OK.
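
The LD_LIBRARY_PATH workaround could be sketched roughly as below. The function name prepend_ld_path is hypothetical, and the directory in the usage comment is only a placeholder for wherever the mklml libs actually live; the sketch simply avoids adding the same directory twice.

```shell
# Hypothetical helper: put a directory at the front of LD_LIBRARY_PATH,
# unless it is already listed there.
prepend_ld_path() {
    dir=$1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, leave the variable unchanged
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Example usage (placeholder path, adjust to the real mklml lib directory):
#   prepend_ld_path /path/to/mklml/lib
```

Note that this only affects processes started from the current shell, so it is a per-session workaround rather than a fix for the installed package.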

@hubutui
Author

hubutui commented Jul 5, 2019

Currently, my workaround is to install the mkldnn libs to /usr/lib/mxnet/mkldnn and patch them with an RPATH, see https://github.com/archlinuxcn/repo/blob/master/archlinuxcn/mxnet/PKGBUILD#L170. You might want to consider installing the 3rdparty deps (including headers and libs) into a /usr/lib/mxnet subfolder to avoid potential conflicts.
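
A rough sketch of this kind of workaround, not the actual PKGBUILD steps (those are only linked, not shown in this thread). It assumes the patchelf tool is installed; the function name install_private_libs and the paths in the usage comment are hypothetical. An RPATH of $ORIGIN tells the dynamic loader to look in the directory that contains the library itself.

```shell
# Hypothetical helper: copy libraries into a private directory and set the
# RPATH of each copy to $ORIGIN, so the copies can find each other there.
# Assumes patchelf is available on PATH.
install_private_libs() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for lib in "$@"; do
        cp "$lib" "$dest/" || return 1
        # Single quotes keep $ORIGIN literal; the loader expands it at run time.
        patchelf --set-rpath '$ORIGIN' "$dest/$(basename "$lib")" || return 1
    done
}

# Example usage (placeholder source paths):
#   install_private_libs /usr/lib/mxnet/mkldnn libmkldnn.so libmklml_intel.so
```

Anything that loads these libraries from outside that directory, such as libmxnet.so, would presumably still need its own RPATH entry pointing at the private directory.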

@pengzhao-intel
Contributor

@hubutui can our documentation resolve the issue? If not, which part is missing?
https://mxnet.incubator.apache.org/versions/master/tutorials/mkldnn/MKLDNN_README.html

@hubutui
Author

hubutui commented Aug 6, 2019

@pengzhao-intel I'm not sure. I don't build with the Makefile; I use cmake.

@pengzhao-intel
Contributor

@hubutui is this issue resolved now?

@hubutui
Author

hubutui commented Nov 2, 2019

@pengzhao-intel Yeah, after bumping mkldnn to version 1.0, I can successfully build mxnet now. Are there any plans to install the mkldnn .so files to a subfolder of mxnet, say /usr/lib/mxnet/mkldnn? And the mkldnn header files as well? Or could we compile mxnet against the system's mkldnn? There is an mkl-dnn package available in the arch4edu repo, but not yet in the official Arch Linux repo.

@pengzhao-intel
Contributor

@pengzhao-intel Yeah, after bumping mkldnn to version 1.0, I can successfully build mxnet now. Are there any plans to install the mkldnn .so files to a subfolder of mxnet, say /usr/lib/mxnet/mkldnn? And the mkldnn header files as well? Or could we compile mxnet against the system's mkldnn? There is an mkl-dnn package available in the arch4edu repo, but not yet in the official Arch Linux repo.

It's great that the problem is fixed in the latest code. Currently, MKLDNN lives in the submodule directory. Using the system's mkldnn is not feasible, since the API is not compatible across versions, for example from v0.x to v1.x.

@hubutui
Author

hubutui commented Nov 3, 2019

OK, I see. To avoid potential file conflicts with mkl-dnn, I will need to manually install the mkldnn .so files to /usr/lib/mxnet/mkldnn and patch the rpath for now.

@TaoLv
Member

TaoLv commented Nov 3, 2019

@hubutui Do you think it's still a problem if we statically link mkldnn.a into mxnet? I know there was an effort in mxnet community for that.

@pengzhao-intel
Contributor

Static linking is used in master, so I think the problem is fixed. Closing.

@pengzhao-intel
Contributor

#16731
