-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MIOpenDriver gemm is broken and MIOpenDriver gemmfp16 is disabled without a reason #2505
Comments
@AngryLoki Thanks for the proposed fix. Would you like to contribute to MIOpen yourself or would you like us to use your code and create a PR? /cc @junliume |
@atamazov , could you do it, please? Maybe there is a better way to convert template type to miopen-type (also I forgot to remove |
@AngryLoki I see, Ok. How urgent and important is this for you? |
@atamazov , not urgent, I will add a patch to 5.7.1 ebuild and just would like to see this patch eventually removed. |
@AngryLoki, have you tried to run |
Without #2558 it can't pass anything since dataType is not initialized. I tried tiny problems (2x3 * 3x2) - fp32 passed, fp16 did not. |
@CAHEK7 , With patch from initial comment, gemmfp16 verifies for me with following result: # gemm
$ ./bin/MIOpenDriver gemm --verify 1
MIOpenDriver gemm --verify 1
MIOpen(HIP): Info [get_device_name] Raw device name: gfx1036
MIOpen(HIP): Info [Handle] stream: 0x5577070fb440, device_id: 0
Forward GEMM Verifies on CPU and GPU (err=0.000000)
# gemmfp16
$ ./bin/MIOpenDriver gemmfp16 --verify 1
MIOpenDriver gemmfp16 --verify 1
MIOpen(HIP): Info [get_device_name] Raw device name: gfx1036
MIOpen(HIP): Info [Handle] stream: 0x55a62bfc8440, device_id: 0
Forward GEMM Verifies on CPU and GPU (err=0.000139)
$ ./bin/MIOpenDriver gemmfp16 --verify 1 -m 1024 -n 1024 -k 1024
MIOpenDriver gemmfp16 --verify 1 -m 1024 -n 1024 -k 1024
MIOpen(HIP): Info [get_device_name] Raw device name: gfx1036
MIOpen(HIP): Info [Handle] stream: 0x558e86238440, device_id: 0
Forward GEMM Verifies on CPU and GPU (err=0.000111) I think this err is normal for half-floats, right? If you see huge error, maybe you encounter ROCm/rocBLAS#1350 (failure to do basic fp16 conversions on CPU), it has pending PR in LLVM, but I use a quick workaround with |
Yep, I've seen significantly bigger error. |
@CAHEK7 @AngryLoki rocblas is working fine as a backend for fp16 convolutions. It seems like we need to look a bit deeper on what is going on. Current status is that "MIOpenDriver gemm is broken" part of the problem can be fixed soon, while "MIOpenDriver gemmfp16 is disabled without a reason" requires some more work than initially expected. I am going to takeover the latter from @CAHEK7. |
Status update:
|
Hi,
While testing build result of MIOpen 5.7.1 ebuild, I noticed that
MIOpenDriver gemm
fails in https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-5.7.1/src/gemm_v2.cpp#L672 after falling through switch above it with uninitialized gemm_desc.dataType.Additionally I noticed that GemmDriver also works fine with float16. It looks like
gemmfp16
flag was commented out since initial commit.Here is the fix I've checked AngryLoki@717861b
Feel free to apply it as is or partially up to your decision.
The text was updated successfully, but these errors were encountered: