Fall back when sparse arrays are passed to MKLDNN-enabled operators #11664
Conversation
tests/python/mkl/test_mkldnn.py
Outdated
@@ -240,5 +240,135 @@ def check_batchnorm_training(stype):
    for stype in stypes:
        check_batchnorm_training(stype)

@with_seed()
I don't think these tests are mkldnn-specific; they should rather go in test_operators or test_operators_sparse.
@eric-haibin-lin @haojin2
As we discussed earlier, we should still have such a test in place for exercising the fallback logic with USE_MKLDNN=1. Depending on the logic in InferStorage, we may need an extra test in test_sparse_operator/test_operator.
Right, but these tests can then be placed in the regular operator tests, where the fallback will be implicitly tested. I don't see any mkldnn-specific code that would justify not running them with the plain MXNet or CUDA backend.
I had the same discussion in another PR from shufan (on mobile ATM). Would you mind joining in over there?
I definitely would like to; would you mind providing a pointer to that PR?
Thanks
Hello @luobao-intel, welcome to MXNet and thanks for your contribution!
Why not call …
This is the final fix for the fallback problem (function calls).
Conflicts: tests/python/mkl/test_mkldnn.py
The final commit deletes the incorrect test case I had added, and this commit follows the unified code structure of batchnorm. @zheng-da
src/operator/nn/activation.cc
Outdated
}
if (dev_mask == mshadow::cpu::kDevMask && SupportMKLDNNAct(param))
  return ElemwiseStorageType<1, 1, false, false, false>(
      attrs, dev_mask, dispatch_mode, in_attrs, out_attrs);
If it's CPU and MKLDNN activation is supported, should dispatch_mode be kFComputeEx?
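For context, here is a minimal, self-contained sketch of what the reviewer is alluding to — explicitly selecting the MKLDNN execution path in storage inference. All names and constants here (InferMKLDNNStorage, kCPUDevMask, kDefaultStorage) are simplified stand-ins, not the PR's actual code:

```cpp
#include <vector>

// Simplified stand-ins for MXNet's real types and constants.
enum class DispatchMode { kFCompute, kFComputeEx, kFComputeFallback };
constexpr int kCPUDevMask = 1;      // stand-in for mshadow::cpu::kDevMask
constexpr int kDefaultStorage = 0;  // stand-in for mxnet::kDefaultStorage

// Choose the dispatch mode for an MKLDNN-capable operator.
inline bool InferMKLDNNStorage(int dev_mask, bool support_mkldnn,
                               DispatchMode* dispatch_mode,
                               std::vector<int>* in_attrs,
                               std::vector<int>* out_attrs) {
  bool all_dense = true;
  for (int stype : *in_attrs)
    if (stype != kDefaultStorage) all_dense = false;

  // MKLDNN operators produce default-storage outputs.
  for (int& stype : *out_attrs) stype = kDefaultStorage;

  if (!all_dense) {
    // Sparse input: densify the inputs and run the plain kernel.
    *dispatch_mode = DispatchMode::kFComputeFallback;
  } else if (dev_mask == kCPUDevMask && support_mkldnn) {
    *dispatch_mode = DispatchMode::kFComputeEx;  // take the MKLDNN path
  } else {
    *dispatch_mode = DispatchMode::kFCompute;    // ordinary dense kernel
  }
  return true;
}
```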
Thanks for reviewing again. @zheng-da
src/operator/nn/activation.cc
Outdated
if (param.act_type != activation::kReLU) {
  CHECK_EQ(in_attrs->size(), 3U);
  ret = ElemwiseStorageType<3, 1, false, false, false>(
      attrs, dev_mask, dispatch_mode, in_attrs, out_attrs);
I still have the same question here: can dispatch_mode be kFComputeEx? ElemwiseStorageType only uses kFComputeEx for sparse storage types.
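To illustrate the reviewer's point, here is a simplified sketch (hypothetical, not MXNet's actual implementation) of how an ElemwiseStorageType-style helper decides the dispatch mode. It only lands on kFComputeEx when a sparse storage type is involved, so all-dense inputs would never reach the MKLDNN (FComputeEx) code path through it:

```cpp
#include <vector>

enum class DispatchMode { kFCompute, kFComputeEx, kFComputeFallback };
constexpr int kDefaultStorage = 0;  // stand-in for mxnet::kDefaultStorage

// Simplified elementwise storage-type decision: sparse inputs may go to
// FComputeEx (if the op supports that sparse type) or fall back; all-dense
// inputs always take the plain FCompute kernel.
inline DispatchMode ElemwiseDispatch(const std::vector<int>& in_attrs,
                                     bool op_supports_sparse) {
  for (int stype : in_attrs) {
    if (stype != kDefaultStorage) {
      return op_supports_sparse ? DispatchMode::kFComputeEx
                                : DispatchMode::kFComputeFallback;
    }
  }
  return DispatchMode::kFCompute;  // all dense: never the MKLDNN path
}
```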
Please benchmark the performance with this modification to make sure there isn't a performance regression.
@marcoabreu, if cuDNN and MKLDNN are both used, the behavior is unchanged. This PR only covers scenarios that use MKLDNN alone.
@marcoabreu I did some investigation and found that activation doesn't support FComputeEx<GPU>, so FInferStorageType should be used for MKLDNN only, just like other ops. I refactored the code; please review. Thanks.
@luobao-intel please add the performance data. @zheng-da @marcoabreu please review again and merge if there are no further comments.
The training and inference performance results verifying that the fallback changes cause no regression are shown below. The data was collected on commit 3ea67a7.
The inference comparison:
Thanks for providing the performance results.
@marcoabreu any other comments? Could you help merge?
NDArray data = inputs_[i];
inputs.emplace_back(data.shape(), ctx, false, data.dtype());
if (data.IsMKLDNNData() && data.IsView())
  data = data.Reorder2Default();
Why do you remove the code here?
I think the original code is correct.
if (is_excluded.get(attrs.op, false)) {
  LOG(WARNING) << attrs.op->name << " not checked. TExcludeMKLDNNDebug flag present";
  return;
}
Why is this removed?
Sorry, I didn't mean to remove the code. It was caused by a bad merge. I'll fix it.
@azai91 could you please review the code?
…pache#11664)
* softmax_fallbach
* Fallback Amend: This is the final rectify for fallback problem (functions call)
* Lint amend
* test_try
* Patch for test fail
* Pooling amend
* Delete non_rectified_operation_test
* fallback_normal
* Fixed_dispatch
* activation-amend
* activation second
* activation backward
* activate_try
* activation_debug
* Act change.
* test_random
* mkldnn choice
* format_modify
* rebase
Description
Currently, MKLDNN-enabled operators such as convolution and pooling can't handle sparse arrays correctly. The reason is that the storage inference of these operators doesn't return the right dispatch mode (see #11448).
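As a rough illustration of the intended behavior (hypothetical stub types, not the code in this PR): once storage inference returns the fallback dispatch mode, the executor can densify any sparse inputs and reuse the operator's ordinary dense kernel.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for NDArray, just enough to show the idea.
struct Array {
  bool is_sparse;
  Array ToDefault() const { return Array{false}; }  // densify a sparse array
};

using DenseKernel =
    std::function<void(const std::vector<Array>&, std::vector<Array>*)>;

// Fallback execution: convert every sparse input to default (dense)
// storage, then run the operator's plain dense implementation.
inline void FallBackCompute(const DenseKernel& fcompute,
                            const std::vector<Array>& inputs,
                            std::vector<Array>* outputs) {
  std::vector<Array> dense_inputs;
  dense_inputs.reserve(inputs.size());
  for (const auto& in : inputs)
    dense_inputs.push_back(in.is_sparse ? in.ToDefault() : in);
  fcompute(dense_inputs, outputs);  // run the plain dense kernel
}
```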
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments
@pengzhao-intel @zheng-da @xinyu-intel