Step to newer rocm versions and vendor independent naming #292

Merged (10 commits) on Sep 20, 2024

td-mpcdf
Contributor

This PR

  • adapts to the new include directory structure of ROCm (while still supporting the old one)
  • renames a CMake variable from a CUDA-specific name to the vendor-independent GTENSOR_GPU_ARCHITECTURES
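
For readers unfamiliar with this kind of compatibility shim, here is a minimal sketch of two common ways to support both include layouts: a version gate and a `__has_include` probe. It is not taken from this PR's diff. The helper names, the `DEMO_` macro, and the choice of rocBLAS as the example header are assumptions made for illustration, and the 5.5 cutoff simply follows the discussion in this thread.

```cpp
// Sketch only: gtensor's real headers may gate the includes differently.

// Hypothetical helper: true when a ROCm release is expected to use the
// reorganized include layout (5.5 and later, per the discussion here).
constexpr bool uses_new_rocm_include_layout(int major, int minor)
{
  return major > 5 || (major == 5 && minor >= 5);
}

// Probe which rocBLAS header spelling the compiler can find. The header is
// not actually included, so this file also builds on machines without ROCm.
#if defined(__has_include)
#  if __has_include(<rocblas/rocblas.h>)
#    define DEMO_ROCBLAS_HEADER "rocblas/rocblas.h" // new layout
#  elif __has_include(<rocblas.h>)
#    define DEMO_ROCBLAS_HEADER "rocblas.h" // old layout
#  endif
#endif
#ifndef DEMO_ROCBLAS_HEADER
#  define DEMO_ROCBLAS_HEADER "" // no rocBLAS header on the include path
#endif

// Returns the detected header spelling, or an empty string if none was found.
constexpr const char* detected_rocblas_header()
{
  return DEMO_ROCBLAS_HEADER;
}

static_assert(uses_new_rocm_include_layout(5, 5), "5.5 uses the new layout");
static_assert(!uses_new_rocm_include_layout(5, 4), "5.4 uses the old layout");
```

A version gate matches the "ifs" mentioned in the review below, while the `__has_include` probe avoids depending on version macros at all. Which of the two fits better depends on how the build locates ROCm.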

@bd4 bd4 requested review from germasch, bd4 and gmerlo September 19, 2024 14:10
Contributor

bd4 commented Sep 20, 2024

Have you tested this with both ROCm 5.5 and ROCm < 5.5? The include change happened in 5.5. I tried changing the ifs to check for 5.5 and later, and the tests still fail to compile even with -Wno-error: GoogleTest insists on compiling with -Werror, and that applies to the gtensor tests, which trigger a bunch of ROCm warnings. The way -Werror is handled in C++ is kind of garbage; it breaks things with new releases and causes major headaches. I think that even if we use the correct new includes, ROCm's own headers in 5.5 still include deprecated headers internally, which triggers the warnings.

Given that GENE is probably the only user of this, do we need to support ROCm < 5.5, or even < 6? If we do, I want to make sure we can verify that it works; otherwise we can just assume the new include paths and not worry about backward compatibility.

Contributor

bd4 commented Sep 20, 2024

Got it working with ROCm 5.5; it was a configuration issue on my end. The directory structure changed in 5.5, and it has backward compatibility, but that backward compatibility is fragile. In particular, HIP_PATH needs to be set to the same value as ROCM_PATH instead of ROCM_PATH/hip, which avoids all the warnings from the distributed header files.

Contributor

@bd4 bd4 left a comment

Looks good, thank you for the contribution!

We can also consider removing the < 5.5 compatibility in the future.

@bd4 bd4 merged commit f769ce5 into wdmapp:main Sep 20, 2024
18 checks passed
@bd4 bd4 mentioned this pull request Sep 20, 2024