[MXNET-244] Work around likely compiler bug on nested inlines and temporary acces… #13535
Force-pushed from 7ceac60 to c4af669.
Why are these files in the src/operator/ dir? (Seems strange to me, as the rest of the files in there belong to operators)
@jlcontreras good point, but that was there already. I think it's because LAPACK is used in some operators. Do you have a better suggestion?
Difficult to say, maybe common/?
@jlcontreras do you think it makes sense to keep it in a different PR?
@@ -375,7 +364,10 @@ inline void flip(int m, int n, DType *b, int ldb, DType *a, int lda) {

MXNET_LAPACK_CWRAPPER3(ssyevd, float)
MXNET_LAPACK_CWRAPPER3(dsyevd, double)

#undef MXNET_LAPACK_CWRAPPER1
Why is this needed?
To avoid the macro being defined again in the implementation.
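A minimal sketch of the `#undef` pattern discussed here, assuming a hypothetical `DEMO_LAPACK_WRAPPER` macro and made-up function names (not the actual MXNet macros): the header removes its helper macro once it has generated the wrappers, so an implementation file can define a macro with the same name and a different body without a redefinition diagnostic.

```cpp
#include <string>

// Hypothetical helper macro: generates one wrapper function per routine name.
#define DEMO_LAPACK_WRAPPER(name, type) \
  inline std::string name##_signature() { return #name "(" #type "*)"; }

DEMO_LAPACK_WRAPPER(ssyevd, float)
DEMO_LAPACK_WRAPPER(dsyevd, double)

// End of the "header" part: drop the helper so it does not leak into
// whatever includes this file.
#undef DEMO_LAPACK_WRAPPER

// The "implementation file" part can now reuse the name with a different
// body. Without the #undef above, this would be a macro redefinition.
#define DEMO_LAPACK_WRAPPER(name, type) \
  std::string name##_stub_signature() { return "stub " #name "(" #type "*)"; }

DEMO_LAPACK_WRAPPER(sgelqf, float)

#undef DEMO_LAPACK_WRAPPER
```

Calling `ssyevd_signature()` here yields the string built by the first macro body, while `sgelqf_stub_signature()` comes from the second one.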
src/operator/c_lapack_api.h
Outdated
@@ -325,41 +325,30 @@ inline void flip(int m, int n, DType *b, int ldb, DType *a, int lda) {
#else

// use pragma message instead of warning
/*
Why comment this out? Maybe it should go to c_lapack_api.cc?
Good point
Force-pushed from c4af669 to 2f7d5d0.
LGTM. Don't think that it makes sense to move the c_lapack_api to a different directory now. There are pros and cons about where it should sit, but no immediate benefit from moving.
@mxnet-label-bot [pr-awaiting-merge]
Force-pushed from 1e5df93 to 609595a.
Any idea why the CI failed? Looks like a timeout, right?
Force-pushed from 609595a to 69adf38.
@KellenSunderland I guess we can merge now. Thank you.
@szha @KellenSunderland @tqchen could we please merge this? It's needed to fix crashes in some scenarios. Thank you.
…porary acces… (apache#13535)
* Work around likely compiler bug on nested inlines and temporary access to stream
* Don't compile khatri_rao tests if we don't have LAPACK
* Address CR comment
…s to stream
Description
This fixes a segfault on ARMv7 that occurs when the wrappers for unavailable LAPACK functions throw a fatal error.
Addresses #13342
We no longer crash as before; the test now fails, which is expected.
I think this is a compiler bug triggered by the temporary LogMessageFatal that is created when the macro at https://github.com/apache/incubator-mxnet/blob/master/src/operator/c_lapack_api.h#L369 is expanded, combined with too many inlined functions. The segfault occurs inside the ostream code in libc and could be related to the lifetime of the temporary that holds a reference to the stream, which is a corner case of the temporary lifetime rules.
Not inlining the functions solves the problem.
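As a rough illustration of the workaround described above, and not the actual MXNet or dmlc code, the sketch below declares a failing stub without `inline` and defines it once out of line. `FatalMessage`, `lapack_stub_ssyevd`, and the error text are hypothetical names chosen for this example; the logger is a simplified stand-in that throws an exception, and it is held in a named local variable, not a temporary, so its lifetime is unambiguous.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a fatal logger: collects a message through
// operator<< and reports it by throwing from Raise(). It only mimics the
// streaming usage of a real fatal-logging class.
class FatalMessage {
 public:
  template <typename T>
  FatalMessage &operator<<(const T &value) {
    stream_ << value;
    return *this;
  }
  [[noreturn]] void Raise() const { throw std::runtime_error(stream_.str()); }

 private:
  std::ostringstream stream_;
};

// What would live in the header: a plain declaration, no `inline`.
int lapack_stub_ssyevd(int n, float *a, int lda);

// What would live in the .cc file: one out-of-line definition, so the
// error path is compiled once and not expanded into every caller.
int lapack_stub_ssyevd(int n, float *a, int lda) {
  (void)n;
  (void)a;
  (void)lda;
  FatalMessage msg;  // named object: it lives until the end of the function
  msg << "Built without LAPACK; ssyevd is not available.";
  msg.Raise();
}
```

A caller that reaches the stub gets a `std::runtime_error` carrying the message, which matches the behavior described in this PR: the operation fails cleanly, with no crash inside the stream code.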