OpenMP Error #17641
Comments
That is because, unfortunately, OpenMP doesn't ship a pkg-config file or CMake config files for interacting with CMake. It seems to need its own CMake module to find it.
Same issue: #17366
The same thing happens with intel-dnnl. Some distros already provide an intel-dnnl package, but MXNet forces a download of the sources again.
@cjolivier01 you previously vetoed changing the OMP configuration in the CMake build, due to a race condition that had not been fixed. As that has been fixed, are you OK with proceeding to prefer system OMP for the CMake build by default? Or what is your recommendation? The static build should still statically build OMP.
@sl1pkn07 given the rapid development of intel-dnnl, MXNet expects a fixed version of intel-dnnl. It's quite unlikely that the system provides that particular version, but patches to improve detection are welcome. Do you want to contribute a PR? But let's track this in a separate issue.
btw, the cmake files set the minimum cmake version to 3.13, but the default cmake install on Ubuntu 18.04 is cmake 3.10. Does anyone know what the deal is with 3.13? Ubuntu 18.04 is a pretty widely-used release...
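For anyone hitting this on a stock 18.04 box, a quick way to check whether the installed cmake meets the 3.13 minimum mentioned above is to parse the output of `cmake --version`. This is only a minimal sketch: the helper names `parse_cmake_version`, `meets_minimum` and `installed_cmake_version` are made up for illustration and are not part of MXNet.

```python
import re
import subprocess

MINIMUM = (3, 13)  # minimum cmake version quoted in this thread


def parse_cmake_version(text):
    """Extract (major, minor, patch) from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no cmake version found in: %r" % text)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def meets_minimum(version, minimum=MINIMUM):
    """Compare only as many components as the minimum specifies."""
    return tuple(version[:len(minimum)]) >= tuple(minimum)


def installed_cmake_version():
    """Run the cmake found on PATH (hypothetical helper, not called here)."""
    result = subprocess.run(
        ["cmake", "--version"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_cmake_version(result.stdout)
```

Calling `meets_minimum(installed_cmake_version())` on the build machine then tells you whether a newer cmake has to be installed before configuring.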
Not actually, since there is no legitimate reason to remove it.
MKL
Just
Speed up the developer build. There is no need to build LLVM OpenMP if a system OpenMP is present.
openmp is like a 4-5-second build. On my desktop machine it's < 3:
I installed MKL, but it does not appear to pick it up. Is there a way to force it?
Actually, I don't see this behavior when it does pull in mkl/pulling in the other omp (this is Ubuntu 18.04):
I don't see libmkl_rt.so pulling in libiomp5:
I think
Linking in any version of omp statically would probably be a bad idea, since startup order would be important.
Clearly it does not:
Yes, that's why
and with
Can you supply a script to reproduce this error? I am not able to reproduce it.
I'm using system openmp and no mkl-dnnl. Sorry @icemelon9?
@sl1pkn07 please open a separate issue for your problem. This issue is about MKL.
@icemelon9 please provide a reproducer to trigger the error message.
Sorry about the late response. Here's the script to reproduce the error message.
import numpy as np
import mxnet as mx
a = mx.nd.array(np.random.uniform(size=(1024, 128)).astype('float32'))
b = mx.nd.array(np.random.uniform(size=(128, 1024)).astype('float32'))
c = mx.nd.dot(a, b)
c.wait_to_read()
The following shows the shared libraries used by libmxnet on my machine.
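One way to check which OpenMP runtimes a shared library links against is to filter `ldd` output for the usual runtime names. The sketch below is only an illustration: `find_openmp_runtimes` and `ldd_output` are hypothetical helpers, and the name pattern assumes the common file names libgomp (GNU), libomp (LLVM) and libiomp5 (Intel).

```python
import re
import subprocess

# Assumed file names of the common OpenMP runtimes:
# GNU (libgomp), LLVM (libomp), Intel (libiomp5).
_OMP_PATTERN = re.compile(r"\blib(?:gomp|iomp5|omp)\.so[\w.]*")


def find_openmp_runtimes(ldd_text):
    """Return the sorted, de-duplicated OpenMP library names in ldd output."""
    found = set()
    for line in ldd_text.splitlines():
        match = _OMP_PATTERN.search(line)
        if match:
            found.add(match.group(0))
    return sorted(found)


def ldd_output(library_path):
    """Run ldd on a library (Linux only; hypothetical helper, not called here)."""
    result = subprocess.run(
        ["ldd", library_path],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

If `find_openmp_runtimes(ldd_output(path_to_libmxnet))` returns more than one entry, the library links against several OpenMP runtimes at once, which is the situation this issue describes.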
(pytorch) [chriso@chriso-ripper:~/src/mxnet (master)]python
Python 3.6.10 |Anaconda, Inc.| (default, Jan 7 2020, 21:14:29)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import mxnet as mx
>>>
>>> a = mx.nd.array(np.random.uniform(size=(1024, 128)).astype('float32'))
>>> b = mx.nd.array(np.random.uniform(size=(128, 1024)).astype('float32'))
>>> c = mx.nd.dot(a, b)
>>> c.wait_to_read()
>>> exit()
Still can't reproduce.
CI can also reproduce this issue. I switched CI to testing CMake builds instead of Makefile builds in #17645 and the Python MKLDNN + MKL pipeline fails with this issue: Log of test failure, Raw log of test failure, and Raw log of build.
@cjolivier01 the build log contains the output of the cmake configuration. That pipeline relies on the following build
Here is the cmake log.
Is this with the latest 2020 version of MKL?
@TaoLv which version of gcc?
@cjolivier01 How can we stop clang from pulling in libomp? It doesn't have an effect when static linking, as the symbols are already resolved, but it would be better to not pull libomp in in the first place.
Stopping clang/others from linking to omp seems like the tail wagging the dog. I think we should consider other options, such as making the static mkl build work, or somehow stopping mkl from being so "clever".
It's 4.8.5 on centos. Do you want me to try a higher version or exclude opencv from the build?
@leezu, with this flag on, I would expect only MKL libraries to be statically linked while the omp runtime is dynamically linked to mxnet.so. That's how we handle the omp linkage of DNNL.
@leezu what's the error when statically linking the MKL libraries?
@leezu, previously we thought Intel was not distributing an iomp static library: #8532 (comment). But from the linked issue, even if we fix the omp runtime conflict inside mxnet, we may still encounter conflicts in downstream projects.
For me it was some link error on some secondary thing like the cpp unit tests or something like that. libmxnet.so built successfully and the test script was successful. Probably not too hard to fix.
Thank you @cjolivier01. That's exactly what I just observed.
@TaoLv @cjolivier01 it's not hard to fix. If you look above, I posted the patch to fix it 12 hours ago. An improved version of that patch is in #17645
Thanks @leezu! It seems that we have reached a consensus on how to address this issue?
There's a lot of stuff in that PR, I would prefer a more targeted PR.
I will post a PR in the next day or two that addresses this and also the clang issue, as well as transitive omp dependencies which may also cause the error due to mkl behaving foolishly.
@cjolivier01 the PR consists of two commits. Only the second commit is related to omp and implements the conclusion from the discussion in this issue. I have removed the second commit and disabled testing the MKL cmake builds. I look forward to your improved fix, thanks for contributing that.
The following Makefile based builds are preserved
1) staticbuild scripts
2) Docs builds. Language binding specific build logic requires further changes
3) Jetson build. Jetpack 3.3 toolchain based on Cuda 9.0 causes 'Internal Compiler Error (codegen): "there was an error in verifying the lgenfe output!"' errors with cmake. This seems to be a known issue in Cuda 9.0 and we need to update Jetpack toolchain to work around it.
4) MKL builds. Waiting for fix of #17641
All Makefile based builds are marked with a "Makefile" postfix in the title.
Improvements to CMake build
- Enable -Werror for RelWithDebInfo build in analogy to "make DEV=1" build
- Add USE_LIBJPEG_TURBO to CMake build
- Improve finding Python 3 executable
Changes to CI setup
- Install protobuf and zmq where missing
- Install up-to-date CMake on Centos 7
- Don't use RelWithDebInfo on Android builds, as gcc 4.9 throws -Wdelete-non-virtual-dtor
Code changes
- Disable warnings introduced by GCC7 via #pragma GCC diagnostic
@cjolivier01 thanks for volunteering to contribute the PR! Do you have any status update?
@cjolivier01 please prioritize the PR, as this affects other users. For example #17733. Let me know if I may resubmit the MKL static linkage commit earlier included in #17645.
yes, just submit the static linkage
* Fix MKL static link & default to static link on unix
Fixes #17641
* Test cmake MKL build on CI
Description
The compiled MXNet library links against duplicate OpenMP runtimes, both libomp and libiomp.
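To confirm that two runtimes really end up loaded in the same process, one option on Linux is to look at `/proc/self/maps` after importing the library. The sketch below is an assumption-laden illustration, not MXNet tooling: `openmp_runtimes_in_maps` and `read_own_maps` are hypothetical helpers, and the mapping from file name to runtime family covers only libgomp, libomp and libiomp5.

```python
import os
import re

# Assumed mapping from library file name prefix to OpenMP runtime family.
_OMP_FAMILIES = {"libgomp": "GNU", "libomp": "LLVM", "libiomp5": "Intel"}
_OMP_FILE = re.compile(r"^(libgomp|libiomp5|libomp)\.so")


def openmp_runtimes_in_maps(maps_text):
    """Map runtime family -> set of backing file paths found in maps text."""
    runtimes = {}
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        match = _OMP_FILE.match(os.path.basename(path))
        if match:
            family = _OMP_FAMILIES[match.group(1)]
            runtimes.setdefault(family, set()).add(path)
    return runtimes


def read_own_maps():
    """Read the memory map of the current process (Linux only)."""
    with open("/proc/self/maps") as handle:
        return handle.read()
```

After `import mxnet`, calling `openmp_runtimes_in_maps(read_own_maps())` and getting more than one family back may indicate the duplicate linkage described here.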
Error Message
(Paste the complete error message. Please also include stack trace by setting environment variable DMLC_LOG_STACK_TRACE_DEPTH=10 before running your script.)
To Reproduce
I have both the Intel MKL and MKLDNN libraries installed on Ubuntu 18.04. Compiling MXNet with the following config leads to the error shown above.
What have you tried to solve it?
After I deleted 3rdparty/openmp and recompiled MXNet, this error no longer occurs.
Environment
Ubuntu 18.04, installed with Intel MKL and MKLDNN library.