[Bug]: rocBLAS fails tests badly in FP16 for distro packages #1350
Comments
@littlewu2508 , |
To Reproduce
This result comes from running
On a Gentoo system, you can replace the default repo with this experimental branch, then build and test rocBLAS:
cd /var/db/repos
mv gentoo{,.bak}
git clone -b rocm-5.6 https://github.com/littlewu2508/gentoo.git
echo 'ACCEPT_KEYWORDS="~amd64"' > /etc/portage/make.conf
mkdir -p /etc/portage/env /etc/portage/package.use
echo 'FEATURES=test' > /etc/portage/env/test.conf
echo 'sci-libs/rocBLAS test.conf' >> /etc/portage/package.env
emerge "=sci-libs/rocBLAS-5.6.0"
Expected behavior
All tests pass.
Log-files
The complete build-and-test log is
Environment
There are two environments:
MI210
Radeon VII
|
@littlewu2508, I tried to follow some steps to unmask it, but had no luck. I am not very familiar with the Gentoo environment. Any pointers on how to proceed further? I was not able to reproduce this issue using ROCm 5.6 on Ubuntu. |
Sorry I made a mistake in reproducing steps. Try adding
If you're using the official ROCm stack shipped by repo.radeon.com with an upstream kernel installed, you shouldn't encounter this issue. I could not reproduce it either on Debian 12 with the .deb packages from repo.radeon.com installed. So I guess it is Gentoo's use of upstream LLVM that causes the discrepancies. |
@littlewu2508, would you be able to try some of the suggestions from the ROCm team provided in rocFFT issue #439? To reproduce the error, you could use the sample program provided here in the Gentoo environment. You could also try this suggestion to verify whether it resolves the issue. |
Thank you very much for these suggestions. I have also reproduced the |
@littlewu2508, the Fedora fix for half precision is below: |
Thank you! Is this patch submitted to llvm-project upstream? |
CMAKE_TRY_COMPILE_TARGET_TYPE defaults to EXECUTABLE, which causes any feature detection code snippet without a main function to fail, so we need to make sure it gets explicitly set to STATIC_LIBRARY.
Bug: ROCm/rocFFT#439
Bug: ROCm/rocBLAS#1350
Bug: https://bugs.gentoo.org/916069
Closes: #69842
Reviewed by: thesamesam, mgorny
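For readers unfamiliar with the variable named in the commit message, here is a minimal sketch of how CMAKE_TRY_COMPILE_TARGET_TYPE can be set explicitly when configuring a CMake project by hand. The helper name configure_cmd and the source and build directory names are hypothetical placeholders; this illustrates the setting only and is not the change made by the referenced commit.

```shell
#!/bin/sh
# Hypothetical helper: print a cmake configure command that forces
# try-compile feature checks to build a static library, so snippets
# without a main() function can still compile successfully.
# CMAKE_TRY_COMPILE_TARGET_TYPE is a real CMake variable; the
# directory arguments are placeholders supplied by the caller.
configure_cmd() {
    src_dir=$1
    build_dir=$2
    printf 'cmake -S %s -B %s -DCMAKE_TRY_COMPILE_TARGET_TYPE=STATIC_LIBRARY\n' \
        "$src_dir" "$build_dir"
}

# Dry run: show the command instead of executing it.
configure_cmd . build
```

The same value can also be set inside a CMakeLists.txt or toolchain file before the feature checks run; which location is appropriate depends on the project.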
@littlewu2508, |
Describe the bug
Distro rocBLAS-5.6.0 (compiled with upstream llvm-16) fails many FP16-related tests. The failures are seen on both MI210 and Radeon VII. Details can be found in the gzipped test logs:
MI210-test.log.gz
RadeonVII-test.log.gz
The build logs are also attached:
MI210-build.log.gz
RadeonVII-build.log.gz
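To skim the attached test logs for failures without unpacking them, something like the following sketch may help. It assumes the logs contain GoogleTest-style output, where failing tests are marked with "[  FAILED  ]"; the helper name list_failed is hypothetical, and the real log lines may carry extra prefixes from the build system.

```shell
#!/bin/sh
# Hypothetical helper: print every line of a gzipped log that carries
# GoogleTest's failure marker. Assumes GoogleTest-style output.
list_failed() {
    gzip -dc "$1" | grep -F '[  FAILED  ]'
}
```

For example, `list_failed MI210-test.log.gz` would print the matching lines. GoogleTest normally reports a failing test both when it runs and again in the final summary, so the same test name can appear more than once.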