Conversation
So it will support only 3D to 5D? What's the limitation here?
Looks good.
There is a bug when the dimension of inputs is 2.
…-mxnet into support_5d_sync_batchnorm
There seems to be a bug in SyncBatchNorm when spatial_shape is 1x1, 1xn or nx1. I am checking it.
It seems that the bug has been addressed, although I do not know the specific reason yet. I will add a test for multi-output.
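The shapes discussed above (2D input, and spatial shapes such as 1x1, 1xn and nx1) can be enumerated systematically for a test. The following is only a sketch: `candidate_shapes` and its parameters are hypothetical names, not part of MXNet or of this PR's test suite.

```python
from itertools import product


def candidate_shapes(batch=2, channels=3, n=4, max_spatial_dims=3):
    """Yield (N, C, *spatial) shapes where every spatial axis is 1 or n.

    Hypothetical helper: with max_spatial_dims=3 it covers 2D to 5D inputs,
    including the degenerate 1x1, 1xn and nx1 spatial shapes.
    """
    for k in range(max_spatial_dims + 1):
        # repeat=0 yields a single empty tuple, i.e. the plain 2D (N, C) case
        for spatial in product((1, n), repeat=k):
            yield (batch, channels) + spatial


if __name__ == "__main__":
    for shape in candidate_shapes():
        print(shape)
```

Each yielded shape could then be fed to the layer under test, so the degenerate cases are not forgotten when a new input rank is added.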
@zhreshold @szha Hi! I have updated the PR and added adequate unit tests.
If @szha has no complaints, I can merge it in 24 hours.
Thanks @wkcn, this is merged!
* support SyncBatchNorm5D
* fix
* update testcase and reformat code
* retrigger CI
* update test case
* test
* Retrigger CI
* disable cudnn for batchnorm
* fix BatchNorm(cudnn)
* fix build
* Remove a testcase
* Update sync_batch_norm-inl.h
* update unittest
* update unittest
* update test
* fix test
* change atol and rtol
* BN(cudnn) 5d
* update test
* test
* Testing
* Update batch_norm.cu
* test cudnnoff
* Update test_operator.py
* update BN! : )
input2grad.asnumpy(), atol=atol, rtol=rtol)

cfgs = [(1, False)]
num_gpus = mx.context.num_gpus()
This line requires a GPU to be present when CUDA is installed; otherwise it throws this error:
======================================================================
ERROR: test_gluon.test_sync_batchnorm
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/travis/build/dmlc/mxnet-distro/mxnet-build/tests/python/unittest/common.py", line 177, in test_new
orig_test(*args, **kwargs)
File "/home/travis/build/dmlc/mxnet-distro/mxnet-build/tests/python/unittest/test_gluon.py", line 693, in test_sync_batchnorm
num_gpus = mx.context.num_gpus()
File "/home/travis/build/dmlc/mxnet-distro/mxnet/context.py", line 258, in num_gpus
check_call(_LIB.MXGetGPUCount(ctypes.byref(count)))
File "/home/travis/build/dmlc/mxnet-distro/mxnet/base.py", line 254, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
MXNetError: [11:47:54] include/mxnet/base.h:427: Check failed: e == cudaSuccess (30 vs. 0) : CUDA: unknown error
Stack trace:
[bt] (0) /home/travis/build/dmlc/mxnet-distro/mxnet/libmxnet.so(+0x4b60fb) [0x7f8d608830fb]
[bt] (1) /home/travis/build/dmlc/mxnet-distro/mxnet/libmxnet.so(+0x2440eec) [0x7f8d6280deec]
[bt] (2) /home/travis/build/dmlc/mxnet-distro/mxnet/libmxnet.so(MXGetGPUCount+0x19) [0x7f8d6280df79]
[bt] (3) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f8d9a2e1c7c]
[bt] (4) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x1fc) [0x7f8d9a2e15ac]
[bt] (5) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48e) [0x7f8d9a4f85fe]
[bt] (6) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x15f9e) [0x7f8d9a4f9f9e]
[bt] (7) /usr/bin/python(PyEval_EvalFrameEx+0x965) [0x4c84a5]
[bt] (8) /usr/bin/python(PyEval_EvalCodeEx+0x2ac) [0x4cfedc]
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1179889124 to reproduce.
--------------------- >> end captured logging << ---------------------
----------------------------------------------------------------------
Can you please move this test to tests/python/gpu/test_gluon_contrib_gpu.py? @wkcn @zhreshold
I don't know why an unknown CUDA error was raised.
https://github.com/apache/incubator-mxnet/blob/master/include/mxnet/base.h#L424
I was testing it on a platform without a GPU but with CUDA installed. In any case, the test seems misplaced.
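For illustration, a test that must also run on machines with CUDA installed but no GPU could guard the device query instead of calling it directly. This is a minimal sketch and not what the PR does: `safe_gpu_count` is a hypothetical helper, and it takes the query as an argument so that it runs without MXNet installed.

```python
def safe_gpu_count(query):
    """Return the GPU count reported by query(), or 0 if the query fails.

    Hypothetical helper: `query` is any zero-argument callable, for example
    mx.context.num_gpus from the traceback above.
    """
    try:
        return int(query())
    except Exception:
        # e.g. the MXNetError raised when the CUDA call itself fails
        return 0


# Assumed usage inside a test, with MXNet imported as mx:
#     num_gpus = safe_gpu_count(mx.context.num_gpus)
#     if num_gpus == 0:
#         return  # nothing to test without a GPU
```

Swallowing every exception also hides genuine driver problems, so moving the test into the GPU test directory, as requested in the review, remains the cleaner option.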
Description
Hi there!
Currently, SyncBatchNorm doesn't support 5+D input.
This PR fixes it.
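As background, batch normalization statistics are per-channel reductions over every axis except the channel axis, so the same computation applies to inputs of any rank. The sketch below shows this on a flat row-major buffer. It is a pure-Python illustration that assumes an (N, C, *spatial) layout; `channel_stats` is a hypothetical name and this is not the MXNet kernel. It needs Python 3.8+ for `math.prod`.

```python
from math import prod


def channel_stats(flat, shape):
    """Per-channel mean and biased variance of a row-major (N, C, *spatial) buffer."""
    n, c = shape[0], shape[1]
    inner = prod(shape[2:])  # 1 when there are no spatial axes (2D input)
    sums = [0.0] * c
    squares = [0.0] * c
    for b in range(n):
        for ch in range(c):
            start = (b * c + ch) * inner
            block = flat[start:start + inner]
            sums[ch] += sum(block)
            squares[ch] += sum(v * v for v in block)
    count = n * inner
    means = [s / count for s in sums]
    # Biased variance via E[x^2] - (E[x])^2
    variances = [sq / count - m * m for sq, m in zip(squares, means)]
    return means, variances


if __name__ == "__main__":
    # 5D input with shape (N=2, C=2, D=1, H=1, W=2)
    data = [float(v) for v in range(8)]
    print(channel_stats(data, (2, 2, 1, 1, 2)))
```

Because all spatial axes collapse into a single `inner` length, a 5D input is handled exactly like a 3D one, and the 2D case falls out with `inner == 1`.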
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
key for SyncBatchNorm
test_sync_batchnorm: tests/python/gpu/test_gluon_gpu.py
Comments