This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Failing unit tests on aarch64 #20289

Open
mseth10 opened this issue May 20, 2021 · 1 comment

Comments

@mseth10
Contributor

mseth10 commented May 20, 2021

Description

While enabling the CD pipeline for AArch64 (#20288), we observed that 4 unit tests fail or error out. These tests are skipped in the CD pipeline for now and tracked in this issue. The CD pipeline currently uses Python 3.6 (more failures were observed with Python 3.7).
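A minimal sketch of how such a platform-specific skip could look, assuming the tests are plain nose-style functions and that the architecture is detected with `platform.machine()`. The decorator `skip_on_aarch64` and the helper `is_aarch64` are hypothetical; the actual change in #20288 may exclude these tests differently (for example, in the pipeline scripts).

```python
import functools
import platform
import unittest

# platform.machine() reports "aarch64" on 64-bit Arm Linux and "arm64" on macOS.
_AARCH64_NAMES = {"aarch64", "arm64"}


def is_aarch64(machine=None):
    """Return True if the given (or current) machine string is 64-bit Arm."""
    name = (machine if machine is not None else platform.machine()).lower()
    return name in _AARCH64_NAMES


def skip_on_aarch64(reason):
    """Hypothetical decorator: skip a test on AArch64 and record why."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            if is_aarch64():
                # nose reports unittest.SkipTest as a skipped test, not a failure.
                raise unittest.SkipTest(reason)
            return test_func(*args, **kwargs)
        return wrapper
    return decorator


@skip_on_aarch64("Tracked in #20289: fails on aarch64 in the CD pipeline")
def test_multibox_target_op():
    pass  # the original test body would go here
```

Keeping the issue number in the skip reason makes each skipped test easy to trace back to this issue in the CI logs.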

Error Message

The following are the error logs from the test CD pipeline:

[2021-05-19T22:59:56.273Z] ======================================================================
[2021-05-19T22:59:56.273Z] ERROR: test_contrib_operator.test_multibox_target_op
[2021-05-19T22:59:56.273Z] ----------------------------------------------------------------------
[2021-05-19T22:59:56.273Z] Traceback (most recent call last):
[2021-05-19T22:59:56.273Z]   File "/usr/local/lib/python3.6/dist-packages/nose/case.py", line 198, in runTest
[2021-05-19T22:59:56.273Z]     self.test(*self.arg)
[2021-05-19T22:59:56.273Z]   File "/work/mxnet/tests/python/unittest/test_contrib_operator.py", line 285, in test_multibox_target_op
[2021-05-19T22:59:56.273Z]     assert_allclose(loc_target.asnumpy(), expected_loc_target, rtol=1e-5, atol=1e-5)
[2021-05-19T22:59:56.273Z]   File "/work/mxnet/python/mxnet/ndarray/ndarray.py", line 2571, in asnumpy
[2021-05-19T22:59:56.273Z]     ctypes.c_size_t(data.size)))
[2021-05-19T22:59:56.273Z]   File "/work/mxnet/python/mxnet/base.py", line 246, in check_call
[2021-05-19T22:59:56.273Z]     raise get_last_ffi_error()
[2021-05-19T22:59:56.273Z] mxnet.base.MXNetError: Traceback (most recent call last):
[2021-05-19T22:59:56.273Z]   [bt] (9) /usr/lib/aarch64-linux-gnu/libstdc++.so.6(+0xd148c) [0xffffadbf448c]
[2021-05-19T22:59:56.273Z]   [bt] (8) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)>, std::shared_ptr<dmlc::ManualEvent> > > >::_M_run()+0x48) [0xffff0cb5e808]
[2021-05-19T22:59:56.273Z]   [bt] (7) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0x114) [0xffff0cb60084]
[2021-05-19T22:59:56.273Z]   [bt] (6) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x11c) [0xffff0cb5f3bc]
[2021-05-19T22:59:56.273Z]   [bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x88c710) [0xffff0cb54710]
[2021-05-19T22:59:56.273Z]   [bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::imperative::PushOperator(mxnet::OpStatePtr const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, mxnet::DispatchMode)::{lambda(mxnet::RunContext)#4}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x44) [0xffff0cc17fb4]
[2021-05-19T22:59:56.273Z]   [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::imperative::PushOperator(mxnet::OpStatePtr const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, mxnet::DispatchMode)::{lambda(mxnet::RunContext, mxnet::engine::CallbackOnComplete)#3}::operator()(mxnet::RunContext, mxnet::engine::CallbackOnComplete) const+0x28c) [0xffff0cc17c5c]
[2021-05-19T22:59:56.273Z]   [bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::op::MultiBoxTargetOp<mshadow::cpu, float>::Forward(mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0x18a4) [0xffff0d0e0434]
[2021-05-19T22:59:56.273Z]   [bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(void mshadow::MultiBoxTargetForward<float>(mshadow::Tensor<mshadow::cpu, 2, float> const&, mshadow::Tensor<mshadow::cpu, 2, float> const&, mshadow::Tensor<mshadow::cpu, 2, float> const&, mshadow::Tensor<mshadow::cpu, 2, float> const&, mshadow::Tensor<mshadow::cpu, 3, float> const&, mshadow::Tensor<mshadow::cpu, 3, float> const&, mshadow::Tensor<mshadow::cpu, 4, float> const&, float, float, float, float, int, mxnet::Tuple<float> const&)+0x11a0) [0xffff0d0d0330]
[2021-05-19T22:59:56.273Z]   [bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x74) [0xffff0c9f8364]
[2021-05-19T22:59:56.273Z]   File "../src/operator/contrib/multibox_target.cc", line 235
[2021-05-19T22:59:56.273Z] MXNetError: Check failed: temp.size() >= num_negative (0 vs. 1) : 
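The failed check reports that the vector of negative-mining candidates (`temp`) is empty while one negative (`num_negative`) is requested. The sketch below is a simplified pure-Python model of that invariant, inferred from the check message rather than copied from `multibox_target.cc`: the candidate rule (an unmatched anchor whose best overlap is below a threshold), the way `num_negative` is derived, and all function names are assumptions. Under this model the check fires whenever no unmatched anchor qualifies, including when overlaps are NaN, since `NaN < threshold` is false.

```python
def count_negative_candidates(best_overlaps, is_positive, threshold):
    """Count unmatched anchors whose best overlap is below `threshold`.

    Assumed rule: an anchor is a mining candidate only if it was not matched
    to a ground-truth box and its overlap is < threshold. A NaN overlap never
    qualifies, because NaN < threshold evaluates to False.
    """
    return sum(
        1
        for overlap, positive in zip(best_overlaps, is_positive)
        if not positive and overlap < threshold
    )


def requested_negatives(num_positive, num_anchors, mining_ratio):
    """Assumed rule: ratio * positives, capped by the number of unmatched anchors."""
    return min(int(num_positive * mining_ratio), num_anchors - num_positive)


def check_negative_mining(best_overlaps, is_positive, threshold, mining_ratio):
    """Mirror of the failing check: candidates must cover the request."""
    num_positive = sum(is_positive)
    candidates = count_negative_candidates(best_overlaps, is_positive, threshold)
    wanted = requested_negatives(num_positive, len(best_overlaps), mining_ratio)
    if candidates < wanted:
        raise RuntimeError(
            f"Check failed: temp.size() >= num_negative ({candidates} vs. {wanted})"
        )
    return candidates, wanted
```

For example, one positive anchor plus one unmatched anchor with overlap 0.6, a threshold of 0.5 and a ratio of 3 gives the same `(0 vs. 1)` numbers as the log, while lowering that overlap to 0.1 passes the check.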

[2021-05-19T22:59:56.273Z] ======================================================================
[2021-05-19T22:59:56.273Z] ERROR: test_numpy_interoperability.test_np_array_function_protocol
[2021-05-19T22:59:56.273Z] ----------------------------------------------------------------------
[2021-05-19T22:59:56.273Z] Traceback (most recent call last):
[2021-05-19T22:59:56.273Z]   File "/usr/local/lib/python3.6/dist-packages/nose/case.py", line 198, in runTest
[2021-05-19T22:59:56.273Z]     self.test(*self.arg)
[2021-05-19T22:59:56.273Z]   File "/work/mxnet/tests/python/unittest/common.py", line 218, in test_new
[2021-05-19T22:59:56.273Z]     orig_test(*args, **kwargs)
[2021-05-19T22:59:56.273Z]   File "/work/mxnet/python/mxnet/util.py", line 297, in _with_np_shape
[2021-05-19T22:59:56.273Z]     return func(*args, **kwargs)
[2021-05-19T22:59:56.273Z]   File "/work/mxnet/python/mxnet/util.py", line 481, in _with_np_array
[2021-05-19T22:59:56.273Z]     return func(*args, **kwargs)
[2021-05-19T22:59:56.273Z]   File "/work/mxnet/python/mxnet/numpy_dispatch_protocol.py", line 59, in _run_with_array_func_proto
[2021-05-19T22:59:56.273Z]     .format(func.__name__, str(e)))
[2021-05-19T22:59:56.273Z] RuntimeError: Running function test_np_array_function_protocol with NumPy array function protocol failed with exception arrays used as indices must be of integer (or boolean) type
[2021-05-19T22:59:56.273Z] common: INFO: Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=382051259 to reproduce.
[2021-05-19T22:59:56.273Z] common: WARNING: Error seen with seeded test, use MXNET_TEST_SEED=1596024483 to reproduce.
[2021-05-19T22:59:56.273Z] --------------------- >> end captured logging << ---------------------
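The exception wrapped by `_run_with_array_func_proto` matches the message NumPy raises when an array of floats is used as an index. The snippet below uses plain NumPy only and is not taken from the test; it reproduces that message and shows the usual fix of casting the index array to an integer dtype. Which call inside the test produced the float index is not visible in the log.

```python
import numpy as np

data = np.arange(10) * 10
float_idx = np.array([1.0, 3.0])  # float dtype, e.g. the result of arithmetic


def take(values, idx):
    """Index `values` with `idx`, returning NumPy's error message on failure."""
    try:
        return values[idx]
    except IndexError as err:
        return str(err)


print(take(data, float_idx))                  # NumPy's IndexError message
print(take(data, float_idx.astype(np.intp)))  # [10 30]
```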
[2021-05-19T22:59:56.273Z] ======================================================================
[2021-05-19T22:59:56.273Z] ERROR: test_numpy_op.test_np_delete
[2021-05-19T22:59:56.273Z] ----------------------------------------------------------------------
[2021-05-19T22:59:56.273Z] Traceback (most recent call last):
[2021-05-19T22:59:56.273Z]   File "/usr/local/lib/python3.6/dist-packages/nose/case.py", line 198, in runTest
[2021-05-19T22:59:56.273Z]     self.test(*self.arg)
[2021-05-19T22:59:56.273Z]   File "/work/mxnet/tests/python/unittest/common.py", line 218, in test_new
[2021-05-19T22:59:56.273Z]     orig_test(*args, **kwargs)
[2021-05-19T22:59:56.273Z]   File "/work/mxnet/python/mxnet/util.py", line 297, in _with_np_shape
[2021-05-19T22:59:56.273Z]     return func(*args, **kwargs)
[2021-05-19T22:59:56.273Z]   File "/work/mxnet/python/mxnet/util.py", line 481, in _with_np_array
[2021-05-19T22:59:56.273Z]     return func(*args, **kwargs)
[2021-05-19T22:59:56.273Z]   File "/work/mxnet/tests/python/unittest/test_numpy_op.py", line 3636, in test_np_delete
[2021-05-19T22:59:56.273Z]     expected_ret = _np.delete(a.asnumpy(), obj_onp, axis=axis)
[2021-05-19T22:59:56.273Z]   File "<__array_function__ internals>", line 6, in delete
[2021-05-19T22:59:56.273Z]   File "/usr/local/lib/python3.6/dist-packages/numpy/lib/function_base.py", line 4406, in delete
[2021-05-19T22:59:56.273Z]     keep[obj,] = False
[2021-05-19T22:59:56.273Z] IndexError: index 0 is out of bounds for axis 0 with size 0
[2021-05-19T22:59:56.273Z] -------------------- >> begin captured logging << --------------------
[2021-05-19T22:59:56.273Z] common: WARNING: Error seen with seeded test, use MXNET_TEST_SEED=39792435 to reproduce.
[2021-05-19T22:59:56.273Z] --------------------- >> end captured logging << ---------------------
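The `test_np_delete` error comes from the reference NumPy call (`_np.delete`), not from MXNet: NumPy rejects deleting index 0 from an axis of length 0. A minimal reproduction with plain NumPy, assuming a NumPy release in which out-of-bounds indices passed to `np.delete` raise `IndexError` (older releases emitted a `DeprecationWarning` for out-of-bounds index arrays and ignored them), could look like this. The helper `try_delete` is hypothetical.

```python
import numpy as np

# An input whose axis 0 has length 0, mirroring the empty axis in the error.
empty = np.zeros((0, 3))


def try_delete(arr, obj, axis):
    """Run np.delete and return ("ok", result) or ("error", message)."""
    try:
        return "ok", np.delete(arr, obj, axis=axis)
    except IndexError as err:
        return "error", str(err)


status, detail = try_delete(empty, [0], axis=0)
print(status, detail)
```

On releases that raise, this prints `error` followed by an `index 0 is out of bounds ...` message like the one in the log, which is consistent with the failure depending on the installed NumPy version.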
[2021-05-19T22:59:56.273Z] ======================================================================
[2021-05-19T22:59:56.273Z] FAIL: test_ndarray.test_ndarray_fluent
[2021-05-19T22:59:56.273Z] ----------------------------------------------------------------------
[2021-05-19T22:59:56.273Z] Traceback (most recent call last):
[2021-05-19T22:59:56.273Z]   File "/usr/local/lib/python3.6/dist-packages/nose/case.py", line 198, in runTest
[2021-05-19T22:59:56.273Z]     self.test(*self.arg)
[2021-05-19T22:59:56.273Z]   File "/work/mxnet/tests/python/unittest/common.py", line 218, in test_new
[2021-05-19T22:59:56.273Z]     orig_test(*args, **kwargs)
[2021-05-19T22:59:56.273Z]   File "/work/mxnet/tests/python/unittest/test_ndarray.py", line 1313, in test_ndarray_fluent
[2021-05-19T22:59:56.273Z]     check_fluent_regular(func, {})
[2021-05-19T22:59:56.273Z]   File "/work/mxnet/tests/python/unittest/test_ndarray.py", line 1308, in check_fluent_regular
[2021-05-19T22:59:56.273Z]     assert almost_equal(regular.asnumpy(), fluent.asnumpy(), equal_nan=equal_nan)
[2021-05-19T22:59:56.273Z] AssertionError: 
[2021-05-19T22:59:56.273Z] -------------------- >> begin captured logging << --------------------
[2021-05-19T22:59:56.273Z] common: WARNING: Error seen with seeded test, use MXNET_TEST_SEED=690006707 to reproduce.
[2021-05-19T22:59:56.273Z] --------------------- >> end captured logging << ---------------------
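The `test_ndarray_fluent` failure ends in a bare `AssertionError` with no message, so the log does not show which element diverged. As a debugging aid, the hypothetical helper below (pure Python, not part of MXNet's `test_utils`) compares two flat sequences with the tolerance rule used by `numpy.isclose` (defaults borrowed from it), optionally treating NaN as equal like the `equal_nan` flag in the traceback, and reports the first mismatch instead of only `False`.

```python
import math


def first_mismatch(expected, actual, rtol=1e-5, atol=1e-8, equal_nan=False):
    """Return None if all elements match, else (index, expected, actual).

    A length mismatch is reported as ("length", len(expected), len(actual)).
    Finite values use the rule |a - e| <= atol + rtol * |e|.
    """
    if len(expected) != len(actual):
        return ("length", len(expected), len(actual))
    for i, (e, a) in enumerate(zip(expected, actual)):
        if math.isnan(e) or math.isnan(a):
            if equal_nan and math.isnan(e) and math.isnan(a):
                continue
            return (i, e, a)
        if math.isinf(e) or math.isinf(a):
            # Infinities match only if they are identical (same sign).
            if e != a:
                return (i, e, a)
            continue
        if abs(a - e) > atol + rtol * abs(e):
            return (i, e, a)
    return None


def assert_close(expected, actual, **kwargs):
    """Assert closeness with a message pointing at the first bad element."""
    mismatch = first_mismatch(expected, actual, **kwargs)
    assert mismatch is None, f"first mismatch (index, expected, actual): {mismatch}"
```

Passing the flattened `regular.asnumpy()` and `fluent.asnumpy()` results (for example via `.ravel().tolist()`) to `assert_close` would turn the empty assertion into one that names the diverging index and values.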

To Reproduce

https://cwiki.apache.org/confluence/display/MXNET/Reproducing+test+results
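The captured logging above names the seeds to use when reproducing (`MXNET_TEST_SEED`, and for the interoperability test also `MXNET_MODULE_SEED`). A small sketch that turns those into a `nosetests` invocation is below, assuming the nose runner seen in the tracebacks and nose's `file.py:test_name` selection syntax; `repro_command` is a hypothetical helper, and the linked wiki page remains the authoritative procedure.

```python
import os
import shlex


def repro_command(test_file, test_name, test_seed=None, module_seed=None):
    """Build (argv, env) to rerun one nose test with the logged seeds."""
    env = dict(os.environ)
    if test_seed is not None:
        env["MXNET_TEST_SEED"] = str(test_seed)
    if module_seed is not None:
        env["MXNET_MODULE_SEED"] = str(module_seed)
    argv = ["nosetests", "--verbose", f"{test_file}:{test_name}"]
    return argv, env


# Seed copied from the captured logging of test_np_delete.
argv, env = repro_command(
    "tests/python/unittest/test_numpy_op.py", "test_np_delete", test_seed=39792435
)
print("MXNET_TEST_SEED=" + env["MXNET_TEST_SEED"], " ".join(shlex.quote(a) for a in argv))
# To actually run it from the repository root: subprocess.run(argv, env=env, check=False)
```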

What have you tried to solve it?

  1. These errors don't seem to be build-related and might be due to the Python version, so I tried running the tests with different Python versions, but got even more errors with Python 3.7: https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fsrivrohi-release-job-v1.x/detail/srivrohi-release-job-v1.x/72/pipeline/82
@mseth10
Contributor Author

mseth10 commented Jul 5, 2021

In #20392, the Docker base image was changed from Ubuntu 14 to CentOS 7, and Python 3.8 and NumPy 1.17.3 were installed. With these changes, most of the above tests pass and are re-enabled in that PR, which confirms that the NumPy version was the cause of those failures.
Only one test still fails, test_contrib_operator.test_multibox_target_op, with the following error log:

[2021-07-03T11:05:44.813Z] Traceback (most recent call last):
[2021-07-03T11:05:44.813Z]   File "/opt/rh/rh-python38/root/usr/local/lib/python3.8/site-packages/nose/case.py", line 198, in runTest
[2021-07-03T11:05:44.813Z]     self.test(*self.arg)
[2021-07-03T11:05:44.813Z]   File "/work/mxnet/tests/python/unittest/test_contrib_operator.py", line 284, in test_multibox_target_op
[2021-07-03T11:05:44.813Z]     assert_allclose(loc_target.asnumpy(), expected_loc_target, rtol=1e-5, atol=1e-5)
[2021-07-03T11:05:44.813Z]   File "/work/mxnet/python/mxnet/ndarray/ndarray.py", line 2568, in asnumpy
[2021-07-03T11:05:44.813Z]     check_call(_LIB.MXNDArraySyncCopyToCPU(
[2021-07-03T11:05:44.813Z]   File "/work/mxnet/python/mxnet/base.py", line 246, in check_call
[2021-07-03T11:05:44.813Z]     raise get_last_ffi_error()
[2021-07-03T11:05:44.813Z] mxnet.base.MXNetError: Traceback (most recent call last):
[2021-07-03T11:05:44.813Z]   [bt] (9) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x5b75e9c) [0xffffaa8d3e9c]
[2021-07-03T11:05:44.813Z]   [bt] (8) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)>, std::shared_ptr<dmlc::ManualEvent> > > >::_M_run()+0x34) [0xffffa560ae54]
[2021-07-03T11:05:44.813Z]   [bt] (7) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0x104) [0xffffa560c944]
[2021-07-03T11:05:44.813Z]   [bt] (6) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x10c) [0xffffa560bc3c]
[2021-07-03T11:05:44.813Z]   [bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x8a27f0) [0xffffa56007f0]
[2021-07-03T11:05:44.813Z]   [bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::imperative::PushOperator(mxnet::OpStatePtr const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, mxnet::DispatchMode)::{lambda(mxnet::RunContext)#4}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x44) [0xffffa56c2ce4]
[2021-07-03T11:05:44.813Z]   [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::imperative::PushOperator(mxnet::OpStatePtr const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, mxnet::DispatchMode)::{lambda(mxnet::RunContext, mxnet::engine::CallbackOnComplete)#3}::operator()(mxnet::RunContext, mxnet::engine::CallbackOnComplete) const+0x274) [0xffffa56c29b4]
[2021-07-03T11:05:44.813Z]   [bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::op::MultiBoxTargetOp<mshadow::cpu, float>::Forward(mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0x18b4) [0xffffa5b9f8a4]
[2021-07-03T11:05:44.813Z]   [bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(void mshadow::MultiBoxTargetForward<float>(mshadow::Tensor<mshadow::cpu, 2, float> const&, mshadow::Tensor<mshadow::cpu, 2, float> const&, mshadow::Tensor<mshadow::cpu, 2, float> const&, mshadow::Tensor<mshadow::cpu, 2, float> const&, mshadow::Tensor<mshadow::cpu, 3, float> const&, mshadow::Tensor<mshadow::cpu, 3, float> const&, mshadow::Tensor<mshadow::cpu, 4, float> const&, float, float, float, float, int, mxnet::Tuple<float> const&)+0x1150) [0xffffa5b8f9d0]
[2021-07-03T11:05:44.813Z]   [bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x60) [0xffffa54a66e0]
[2021-07-03T11:05:44.813Z]   File "../src/operator/contrib/multibox_target.cc", line 235
[2021-07-03T11:05:44.813Z] MXNetError: Check failed: temp.size() >= num_negative (0 vs. 1) : 
