Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MIOpenDriver gemm is broken and MIOpenDriver gemmfp16 is disabled without a reason #2505

Closed
AngryLoki opened this issue Nov 3, 2023 · 11 comments · Fixed by #2592
Closed

Comments

@AngryLoki
Copy link
Contributor

Hi,

While testing build result of MIOpen 5.7.1 ebuild, I noticed that MIOpenDriver gemm fails in https://github.com/ROCmSoftwarePlatform/MIOpen/blob/rocm-5.7.1/src/gemm_v2.cpp#L672 after falling through switch above it with uninitialized gemm_desc.dataType.

Additionally I noticed that GemmDriver also works fine with float16. It looks like gemmfp16 flag was commented out since initial commit.

Here is the fix I've checked AngryLoki@717861b
Feel free to apply it as is or partially up to your decision.

@atamazov
Copy link
Contributor

atamazov commented Nov 4, 2023

@AngryLoki Thanks for the proposed fix. Would you like to contribute to MIOpen yourself or would you like us to use your code and create a PR?

/cc @junliume

@AngryLoki
Copy link
Contributor Author

@atamazov , could you do it, please? Maybe there is a better way to convert template type to miopen-type (also I forgot to remove // TODO half is not supported in gemm in main.cpp...)

@atamazov
Copy link
Contributor

atamazov commented Nov 4, 2023

@AngryLoki I see, Ok. How urgent and important is this for you?

@AngryLoki
Copy link
Contributor Author

@atamazov , not urgent, I will add a patch to 5.7.1 ebuild and just would like to see this patch eventually removed.

@CAHEK7
Copy link
Contributor

CAHEK7 commented Nov 28, 2023

@AngryLoki, have you tried to run gemmfp16 with verification (-V 1)?

@atamazov
Copy link
Contributor

@CAHEK7 I would expect gemmfp16 to pass validation even without #2558 but only if GEMM problem under test is very small.

@CAHEK7
Copy link
Contributor

CAHEK7 commented Nov 29, 2023

Without #2558 it can't pass anything since dataType is not initialized. I tried tiny problems (2x3 * 3x2) - fp32 passed, fp16 did not.

@AngryLoki
Copy link
Contributor Author

@CAHEK7 ,

With patch from initial comment, gemmfp16 verifies for me with following result:

# gemm
$ ./bin/MIOpenDriver gemm --verify 1
MIOpenDriver gemm --verify 1
MIOpen(HIP): Info [get_device_name] Raw device name: gfx1036
MIOpen(HIP): Info [Handle] stream: 0x5577070fb440, device_id: 0
Forward GEMM Verifies on CPU and GPU (err=0.000000)

# gemmfp16
$ ./bin/MIOpenDriver gemmfp16 --verify 1
MIOpenDriver gemmfp16 --verify 1
MIOpen(HIP): Info [get_device_name] Raw device name: gfx1036
MIOpen(HIP): Info [Handle] stream: 0x55a62bfc8440, device_id: 0
Forward GEMM Verifies on CPU and GPU (err=0.000139)

$ ./bin/MIOpenDriver gemmfp16 --verify 1 -m 1024 -n 1024 -k 1024
MIOpenDriver gemmfp16 --verify 1 -m 1024 -n 1024 -k 1024
MIOpen(HIP): Info [get_device_name] Raw device name: gfx1036
MIOpen(HIP): Info [Handle] stream: 0x558e86238440, device_id: 0
Forward GEMM Verifies on CPU and GPU (err=0.000111)

I think this err is normal for half-floats, right? If you see huge error, maybe you encounter ROCm/rocBLAS#1350 (failure to do basic fp16 conversions on CPU), it has pending PR in LLVM, but I use a quick workaround with -mf16c or equivalent in miopen CXXFLAGS.

@CAHEK7
Copy link
Contributor

CAHEK7 commented Nov 29, 2023

Yep, I've seen significantly bigger error.
It seems we can safely enable it only when rocBLAS will be fixed, since under the hood we use rocBLAS for GEMM. @AngryLoki thanks for pointing to that issue, it saved a lot of time.

@atamazov

@atamazov
Copy link
Contributor

@CAHEK7 @AngryLoki rocblas is working fine as a backend for fp16 convolutions. It seems like we need to look a bit deeper on what is going on.

Current status is that "MIOpenDriver gemm is broken" part of the problem can be fixed soon, while "MIOpenDriver gemmfp16 is disabled without a reason" requires some more work than initially expected. I am going to takeover the latter from @CAHEK7.

@atamazov
Copy link
Contributor

Status update:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment