[Fp16] MIOpen integration or layout transpose issue with FP16_INF and FP16_MAX #2496
hotfix is here:
The root cause: in this case the wrw kernel uses atomics, and the solver then runs a cast kernel that does:

```c
_FLOAT_SRC temp_src = *(src + sindex + srcOffset);
#if MIOPEN_SRC_TYPE == 3 && MIOPEN_DST_TYPE == 4
    temp_src *= alpha;
    *(dst + dindex + dstOffset) = float_to_bfloat16(temp_src);
#else
    /* Any scaled value at or above MAX_VAL is clamped down to MAX_VAL,
       so an infinite result is written out as MAX instead of INF. */
    bool over_flow = (alpha * ((float)temp_src)) >= ((float)MAX_VAL);
    *(dst + dindex + dstOffset) =
        (_FLOAT_DST)(over_flow ? MAX_VAL : alpha * ((float)temp_src));
#endif
}
```

@junliume I think someone needs to evaluate why the clamp to MAX is needed here.
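For illustration, here is a small standalone C model of that clamp (the alpha and input values are assumed; this is not the MIOpen kernel itself), showing how an infinite intermediate result gets written out as MAX_VAL:

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    const float MAX_VAL  = 65504.0f; /* largest finite fp16 value */
    const float alpha    = 1.0f;     /* assumed scaling factor */
    float       temp_src = INFINITY; /* e.g. an overflowed accumulation */

    /* Same test as the kernel: INFINITY >= MAX_VAL is true, ... */
    int over_flow = (alpha * temp_src) >= MAX_VAL;

    /* ... so the clamp emits 65504 where a plain fp16 conversion
       would have produced INF. */
    printf("%f\n", over_flow ? MAX_VAL : alpha * temp_src);
    return 0;
}
```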
@junliume @JehandadKhan @carlushuang First of all, it should be noted that the convolution code we generate is focused on performance, not IEEE-754 correctness. Convolutions should provide maximum performance, provided that the accuracy is sufficient for the neural networks to function correctly. That is why the OpenCL kernels (for example) are compiled with relaxed floating-point math options.

Please also note that the result of MAX instead of INF can be interpreted as a difference of 1 ULP, which is sufficient accuracy for convolutions ;) So the test that checks for INF in the output is questionable.

On the other hand, if we want to make the test pass, fixing the cast kernel (as triaged by @carlushuang) is the way to go. But please note that while this resolves the immediate problem, we cannot always guarantee the expected special-number result (INF) due to the above.

WRT the implementation of the full-blown fix: git blame shows that
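To make the 1-ULP remark concrete, here is a standalone C sketch (not MIOpen code) comparing the IEEE-754 binary16 encodings:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* IEEE-754 binary16 bit patterns:
       0x7BFF = 65504.0, the largest finite value (FP16_MAX)
       0x7C00 = +infinity (FP16_INF)
       The encodings are adjacent integers, so the two values are
       exactly one representable step (1 ULP) apart. */
    const uint16_t fp16_max = 0x7BFF;
    const uint16_t fp16_inf = 0x7C00;
    printf("distance in ULPs: %d\n", fp16_inf - fp16_max); /* prints 1 */
    return 0;
}
```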
@carlushuang Thanks for triaging the issue!
I want to make two points. Second, whatever the reasoning was for that clamp into the finite range, we can probably agree that applying it only on the positive side can't possibly be correct. From the frameworks' perspective, it would be best if the cast kernel clamped into the finite range for forward convolutions but not backward ones. If that is too hard, then it should not clamp at all.
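A minimal sketch of what that direction-dependent policy could look like, assuming a hypothetical is_forward flag and modeling fp16 with _Float16 (requires a recent Clang/GCC); this is an illustration, not the actual cast kernel:

```c
/* Hypothetical helper; the name and the is_forward flag are invented. */
#define FP16_MAX_VAL 65504.0f

static _Float16 cast_with_policy(float src, float alpha, int is_forward)
{
    float scaled = alpha * src;
    if (is_forward)
    {
        /* Forward: clamp into the finite range on BOTH sides
           (the existing kernel clamps only the positive side). */
        if (scaled >  FP16_MAX_VAL) scaled =  FP16_MAX_VAL;
        if (scaled < -FP16_MAX_VAL) scaled = -FP16_MAX_VAL;
    }
    /* Backward: plain conversion; overflow naturally becomes +/-INF. */
    return (_Float16)scaled;
}
```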
@JehandadKhan and @atamazov (CC: @carlushuang) could we change the logic and only "clamp" in the forward direction? Then we could let it run through a whole round of staging and try to identify any issues long before the next release.
@ekuznetsov139 Thanks for the explanation of the frameworks' requirement. @JehandadKhan @carlushuang I can work on this if no one else is doing it yet. The plan is to implement the 3rd and 4th bullets from #2496 (comment). It seems like we have little time. Q: shall we add a run-time or build-time parameter?
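For reference, the two options could look roughly like this (all names here are invented for illustration, not actual MIOpen parameters):

```c
/* (a) Build-time: the policy is baked in when the kernel source is
   compiled, e.g. via "-DCAST_CLAMP=1"; zero run-time cost, but one
   more kernel-binary variant to build and cache. */
#ifndef CAST_CLAMP
#define CAST_CLAMP 1
#endif

/* (b) Run-time: the host passes a flag down to the kernel; a single
   binary serves both policies at the cost of a branch per element. */
static float apply_clamp(float v, float max_val, int clamp_enabled)
{
    return (clamp_enabled && v >= max_val) ? max_val : v;
}
```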
Sure, and this seems the best solution from the frameworks' POV.
@junliume @JehandadKhan The fix is implemented in #2538. The fix is very conservative (minimal functional changes, because the release branching date is near). We should consider another option: remove the clamping controls (introduced in that PR) and keep clamping permanent, but replace clamping to MAX with clamping to INF. This seems mathematically correct (and can't break #2496 again), and should slightly improve performance, but let's do that after release branching. Therefore I recommend keeping this ticket open (after merging #2538) but lowering its urgency, and then closing it after implementing the clamping to INF.
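A sketch of that later alternative: saturate to infinity of the correct sign instead of the largest finite value, which matches what a plain IEEE-754 narrowing conversion produces (again an illustration, not the PR's code):

```c
#include <math.h>

/* Maps out-of-range values to +/-INF instead of +/-MAX, so the cast
   can no longer turn an infinite result into a finite one. */
static float clamp_to_inf_for_fp16(float v)
{
    const float FP16_MAX_VAL = 65504.0f;
    if (v >  FP16_MAX_VAL) return  INFINITY;
    if (v < -FP16_MAX_VAL) return -INFINITY;
    return v;
}
```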
@junliume @ekuznetsov139 The fix is merged into
[Observations]
On the gfx90a platform, with the attached input (input.txt), the IGEMM kernel gave output that differs from the expected output of direct convolution.

[Analysis]
The mismatch is between FP16_INF (expected from direct convolution) and FP16_MAX (produced by the IGEMM path); see the discussion above.

@carlushuang could you help comment on this issue if anything is missing? Thanks!