Upgrading MKLDNN to 1.0 causes performance regression. #16891
@TaoLv @pengzhao-intel @zixuanweeei @samskalicky
@mxnet-label-bot add [R1.6.0]
@leleamol How did you install the mxnet package, from source or from the nightly build? If you built from source, could you please share the make line as well? #16555 removed the libiomp5 library from the MXNet default build to comply with Apache License requirements. That could be the reason for this issue, but I still need to reproduce it to confirm. If possible, could you please try to build mxnet with
Our test results: #16845 (comment)
@TaoLv I built the mxnet package from source. I followed the instructions in the README.md; I just put them in script form for quicker execution, as sketched below. To build the mkl variant, invoke the script with "mkl" as the command line parameter.
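For reference, a minimal sketch of such a build script, assuming the standard MXNet 1.x Makefile flags (the actual script was attached to the original comment and is not reproduced here):

```bash
#!/usr/bin/env bash
# Hypothetical build script; flag names follow the MXNet 1.x Makefile,
# but the exact script from the original comment was not captured.
set -e
VARIANT=${1:-openblas}   # pass "mkl" to build the mkl variant

cd incubator-mxnet       # assumes the repo is already cloned
if [ "$VARIANT" = "mkl" ]; then
    make -j "$(nproc)" USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=mkl
else
    make -j "$(nproc)" USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=openblas
fi
```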
@zachgk assign [@apeforest ]
I ran CPU tests on both v1.5.x and v1.6.x (MKL-DNN + OpenBLAS), but no regression was found. I tried to use build.sh but it failed with: CMake Error at simd/CMakeLists.txt:41 (enable_language). Throughput numbers for v1.5.x and v1.6.x (OMP=36) were comparable.
Considering @rongzha1's comment, I don't consider this issue to be a blocker for the 1.6 release. Please comment if you disagree, @leleamol @samskalicky.
@ptrendx @rongzha1 @PatricZhao thanks for looking into this, but the issue is not resolved until we verify by running the script @leleamol shared. The build.sh is the script used to generate the pip wheels; building with make doesn't follow the same steps and won't reproduce the problem. If you can't reproduce the build using the same scripts, I can share a pre-built pip wheel with you separately.
Regarding the following error:
you can install with
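The command itself did not survive here. A guess, assuming the enable_language failure at simd/CMakeLists.txt:41 comes from the NASM assembler that libjpeg-turbo's SIMD build requires (neither detail is confirmed in the thread):

```bash
# Assumption: simd/CMakeLists.txt belongs to libjpeg-turbo, whose SIMD code
# is assembled with NASM; installing it lets enable_language(ASM_NASM) succeed.
sudo apt-get update && sudo apt-get install -y nasm
```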
Hi @samskalicky, I used the AWS Deep Learning AMI on a c5.18xlarge instance with Ubuntu 14.04, the same as yours.
I cannot reproduce the performance regression issue. Details:
The results were as follows: I also used this build command:
Hi @TaoLv, is there an ETA for fixing this issue? It's causing quite some concern around here. Thanks, Omar
Added a script for easy repro. To run, see the sketch below:
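The script itself is not reproduced here. A plausible invocation, assuming the tarball unpacks to a Gluon-style image_classification.py (the script name and flags are assumptions, not confirmed):

```bash
# Hypothetical repro steps; the real script ships in image_classification.tar.gz.
tar -xzf image_classification.tar.gz
python image_classification.py --model resnet18_v2 --dataset cifar10 \
    --batch-size 128 --epochs 5   # compare the reported samples/sec across builds
```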
@oorqueda @samskalicky @leleamol As mentioned in #16891 (comment), I suspect the regression is caused by the removal of libiomp5.so. To verify, please try applying the patch below to
And then build MXNet with:
If that's confirmed, I don't think we have any way to avoid the regression in the pip packages, as removing libiomp5.so is a requirement from Apache. Please refer to #15544. Thanks!
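A lighter-weight variant of the same experiment, sketched under the assumption that an MKL installation provides libiomp5.so at the path below: preloading it swaps the OpenMP runtime at load time, with no patch or rebuild needed.

```bash
# Sketch: force Intel's OpenMP runtime ahead of the bundled one to test whether
# the OMP runtime (rather than the BLAS library) explains the gap.
# The library path is an assumption and varies across MKL installs.
export LD_PRELOAD=/opt/intel/lib/intel64/libiomp5.so
python image_classification.py --model resnet18_v2 --dataset cifar10
```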
@leleamol could you help confirm the current test status based on our feedback?
Retried with this patch after installing MKL BLAS with https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_mkl.sh and got these results: Average Throughput: 1663.49 samples/sec
@NihalHarish thanks for verifying. @TaoLv
If it was omitted by mistake, and since it is required, I could push a PR for the same.
Thanks.
@ChaiBapchya The file is used to build the mxnet-mkl pip package. If you want to change its configuration, I think you need to raise a proposal on dev@.
What is the status of this issue? From the conversation it seems to me that the Intel folks think it is not an issue (or at least that it is unavoidable), while the Amazon folks are concerned about it. Is that accurate? If so, how does it affect the 1.6 release? Should I go ahead and make the RC despite this issue, or is there active work going on to fix it?
@TaoLv are you saying that we should keep the current config where we build the mkl flavor with openblas:
The mkl flavor packages are always built with USE_BLAS=openblas. We can change that to MKL BLAS if we are allowed to include a dependency with a Category X license [1] in MXNet convenience releases.
Thanks @TaoLv, I was able to rebuild and reproduce Nihal's results:
The root cause of this performance regression is the difference in BLAS libraries (switching from MKL BLAS to OpenBLAS) together with the removal of the libiomp5.so library. The next step is to determine how we want to proceed: do we continue with OpenBLAS and take the performance hit, or, as @TaoLv mentioned, can we use the Category X licensed dependency?
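For anyone comparing builds, one way to confirm which BLAS and OpenMP runtimes a given binary actually links (a diagnostic sketch; adjust the library path to your build):

```bash
# libgomp = GNU OpenMP, libiomp5 = Intel OpenMP, libomp = LLVM OpenMP.
# Grep the shared-library dependencies of the built libmxnet.so:
ldd /path/to/libmxnet.so | grep -Ei 'omp|blas|mkl'
```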
Hi @TaoLv, @samskalicky, Intel MKL-DNN includes a GEMM implementation that is comparable in performance to Intel MKL. Is using
@TaoLv @pengzhao-intel Are there features in MXNet that require MKL as the BLAS library? I was able to find this line: I'm rereading the previous comment and now I'm confused:
Is the performance difference coming from using Intel's OpenMP library (libiomp5), or from using the MKL BLAS library itself and routines like GEMM (as @vpirogov mentions)?
@vpirogov @samskalicky Although MKL BLAS may also have a positive impact on the case demonstrated above, I think the main gap comes from the different OMP runtimes. Setting
@samskalicky The code you referred to will not be called in the ResNet18 case. Most of the computation in ResNet18 goes to DNNL.
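The "Setting" above appears truncated; it plausibly refers to OpenMP environment tuning. A sketch of settings commonly recommended for MKL-DNN CPU workloads (values are illustrative, not taken from the thread):

```bash
# Illustrative OpenMP tuning for a c5.18xlarge (36 physical cores):
export OMP_NUM_THREADS=36                         # one thread per physical core
export KMP_AFFINITY=granularity=fine,compact,1,0  # honored by libiomp5/libomp, not gomp
```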
@TaoLv, is anything preventing us from using the LLVM OpenMP runtime (libomp)? It is pretty much an open-source version of libiomp5.
@vpirogov We can do that. My only concern is its interoperability. Also, from the MXNet perspective, we would need to move the release process from make to cmake, which I don't think can be done within the 1.6.0 release schedule.
What do you mean by interoperability exactly?
@TaoLv To get closure on this topic, would it be possible to move the discussion forward?
@vpirogov @ChaiBapchya The interoperability means:
You are right that when different OpenMP runtimes are used in the same application, there is potential for interoperability issues. For this particular discussion it's important to note that the interoperability considerations are the same for libiomp5 and libomp. From that perspective, using libomp does not introduce any issues beyond what MXNet had before (i.e., with libiomp5).
@vpirogov, yes, that's true: libomp and libiomp5 should have the same interoperability issues. From this perspective, the current release build solution (makefile + gomp) sounds like a safer choice, though it has relatively worse performance. I assume gomp has better interoperability than the other two runtimes, but that may not be true.
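For reference, a quick way to spot the hazard all of these runtimes share (two OpenMP libraries mapped into one process), sketched against a hypothetical running training job:

```bash
# List every OpenMP runtime mapped into a running process; seeing libgomp
# together with libiomp5/libomp signals a likely interoperability problem.
PID=$(pgrep -f image_classification.py | head -n1)   # hypothetical target process
grep -Eio 'lib(g|i)?omp[^ /]*\.so[^ ]*' "/proc/$PID/maps" | sort -u
```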
@samskalicky and all,
From my side, I prefer the first option. What's your opinion?
Hi @pengzhao-intel, in MXNet 2.0, CMake is planned to be the only build system: https://github.com/apache/incubator-mxnet/projects/18#card-30594044 Would that address the cons of Option 2?
It's a good chance to make the system clean :)
Closing, since the fix has already been included with the latest MKLDNN version.
Description
The change that upgraded MKLDNN to 1.0 caused training throughput (images/sec) to drop by about 200 images/sec.
Error Message
During training, throughput (images/sec) dropped to around 1300 images/sec.
Prior to this change, throughput was in the range of 1500-1530 images/sec.
To Reproduce
The attached gzip file contains the training script, which trains the resnet18_v2 network on the CIFAR-10 dataset.
image_classification.tar.gz
The above numbers were measured on a C5.18xlarge Ubuntu instance.
Steps to reproduce
The sample output looks like the following:
Environment