Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

{compiler}[AOMP/11.12-0] Add AOMP compiler module #12018

Closed
wants to merge 5 commits into from

Conversation

nordmoen
Copy link
Contributor

This change adds an AOMP compiler module to EasyBuild.

The setup has been tested and is working on both AMD and Nvidia GPUs.

The patch is necessary to remove some hard coded behavior on RHEL8 and to get the build process to respect EasyBuild --parallel option. The reason for the older GCC version is to conform with what AOMP expects when building, it could probably work with a newer version, but I have not tried. The same goes for gcccuda, AOMP recommends CUDA version 10.

@nordmoen nordmoen changed the title Add AOMP compiler module {compiler}[AOMP/11.12-0] Add AOMP compiler module Jan 18, 2021
@migueldiascosta
Copy link
Member

@nordmoen thank you for your contribution!

quick note while starting to test, this requires libelf if elfutils-libelf-devel or equivalent is not installed - you can point to the one in #12048 once that's merged or add the easyconfig to this PR

@migueldiascosta migueldiascosta added this to the 4.x milestone Jan 26, 2021
@nordmoen
Copy link
Contributor Author

I'll add the dependency once it is merged =]

@migueldiascosta
Copy link
Member

@boegelbot please test @ generoso

@boegelbot
Copy link
Collaborator

@migueldiascosta: Request for testing this PR well received on generoso

PR test command 'EB_PR=12018 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_12018 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 13705

Test results coming soon (I hope)...

- notification for comment with ID 767360274 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Copy link
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
generoso-x-5 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/2adfa2a67ca5e8f11b71644c929c1337 for a full test report.

@migueldiascosta
Copy link
Member

Test report by @migueldiascosta
FAILED
Build succeeded for 0 out of 2 (2 easyconfigs in total)
sms - Linux centos linux 7.6.1810, x86_64, AMD EPYC 7601 32-Core Processor (zen), Python 2.7.5
See https://gist.github.com/29ae97e128926e9b446fe1983d3687fa for a full test report.

@migueldiascosta
Copy link
Member

@nordmoen w.r.t. to my failed test above, it seems that at least libomptarget is including the system header files instead of the ones in GCCcore, and I suppose this isn't a problem for you or in generoso because those have more recent system headers (e.g., centos 8 vs centos 7)

In file included from /dev/shm/build/AOMP/11.12-0/GCCcore-9.3.0/aomp11/amd-llvm-project/openmp/libomptarget/src/device.cpp:13:
/dev/shm/build/AOMP/11.12-0/GCCcore-9.3.0/aomp11/amd-llvm-project/openmp/libomptarget/src/device.h:95:43: error: too few template arguments for class template 'less'
typedef std::set<HostDataToTargetTy, std::less<>> HostDataToTargetListTy;
                                          ^
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../include/c++/4.8.5/bits/stl_function.h:231:12: note: template is declared here
    struct less : public binary_function<_Tp, _Tp, bool>

@nordmoen
Copy link
Contributor Author

@nordmoen w.r.t. to my failed test above, it seems that at least libomptarget is including the system header files instead of the ones in GCCcore, and I suppose this isn't a problem for you or in generoso because those have more recent system headers (e.g., centos 8 vs centos 7)

In file included from /dev/shm/build/AOMP/11.12-0/GCCcore-9.3.0/aomp11/amd-llvm-project/openmp/libomptarget/src/device.cpp:13:
/dev/shm/build/AOMP/11.12-0/GCCcore-9.3.0/aomp11/amd-llvm-project/openmp/libomptarget/src/device.h:95:43: error: too few template arguments for class template 'less'
typedef std::set<HostDataToTargetTy, std::less<>> HostDataToTargetListTy;
                                          ^
/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../include/c++/4.8.5/bits/stl_function.h:231:12: note: template is declared here
    struct less : public binary_function<_Tp, _Tp, bool>

I'm looking around in the build setup, but I can't really find anything where it should be hardcoded and not set correctly by loading GCCcore.

@nordmoen
Copy link
Contributor Author

@migueldiascosta I discovered that additional build tools were needed, with one of them being pkg-config which could account for the wrong link (?). Could you test the new changes with your setup?

@migueldiascosta
Copy link
Member

migueldiascosta commented Jan 28, 2021

@nordmoen libomptarget is being built with the supplied clang, so the problem is likely this: #11868 (comment)...

Since we are not building clang (et al), I suppose we need a way of always passing --gcc-toolchain=$EBROOTGCCCORE to clang (et al) (@boegel, @SebastianAchilles ?)

@SebastianAchilles
Copy link
Member

Yes, as far as I understood you have to use --gcc-toolchain=$EBROOTGCCCORE in order to use the correct gcccore from the easybuild installation. Otherwise the gcc from the system will be used (if found).

@nordmoen
Copy link
Contributor Author

nordmoen commented Feb 2, 2021

Looking through the build sources of AOMP it seems that several places need to change to have this take effect. We could of course do this in a patch, but maybe there is a better way?

@migueldiascosta
Copy link
Member

maybe a compiler wrapper... @boegel ?

@boegel
Copy link
Member

boegel commented Mar 24, 2021

@nordmoen My apologies for never replying to your hail for help/input, this got totally snowed under...

We can definitely put lightweight wrappers in place to make sure the right GCC installation is picked up, if there's no better alternative, but then we'll need to implement a small easyblock for this I guess.
It's a bit frustrating that Clang doesn't just pick up the first gcc it finds in $PATH...

That said, is there any particular reason why we're not building AOMP from source, and configure the build with -DGCC_INSTALL_PREFIX (cfr. @SebastianAchilles' comment in #11868 (comment)).

edit: At 2nd look, maybe this is a from-source build, but then why aren't we using the procedure outlined at https://github.com/ROCm-Developer-Tools/aomp/blob/master/docs/RELEASESOURCEINSTALL.md?

Maybe it's better to discuss this interactively in EasyBuild Slack?

# support, e.g. '50,75'.
preinstallopts += ' AOMP_BUILD_CUDA=1 '
preinstallopts += 'CUDA="$CUDA_ROOT" '
preinstallopts += 'NVPTXGPUS="70"'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be configurable. Does it support 7.0 or sm_70 format? Then you can use the templates cuda_compute_capabilities etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the pre-configured targets, I assumed one had to use whole numbers only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, then I'd say an EasyBlock is better here. I'm afraid short of patching the EC (e.g. via hooks) this would limit the GPU usability unusually, especially after we introduced the --cuda-compute-capabilities option
This has the additional benefit that those hard-coded, repeated preinstallopts can be unified and handled better (e.g. detecting presence of CUDA)

@nordmoen
Copy link
Contributor Author

nordmoen commented Apr 6, 2021

It seems that AOMP is deprecated in favor of built in support in ROCm (as per the official documentation).

Because of this I don't see the value in adding this package, especially when it seems like there is quite a lot work to get it set up properly.

@nordmoen nordmoen closed this Apr 6, 2021
@nordmoen nordmoen reopened this Apr 21, 2021
@nordmoen
Copy link
Contributor Author

Just had a meeting with AMD where they confirmed that AOMP is the official way to use OpenMP directives with AMD GPUs.

nordmoen pushed a commit to nordmoen/easybuild-easyconfigs that referenced this pull request May 20, 2021
This is an updated version of the PR easybuilders#12018. This PR updates the version
to `13.0-2`. In addition this PR adds an improved patch which should
force the built modules to pick up the EasyBuild installed version of
`gcc`.

This module requires the `aomp.py` EasyBlock to build.
@nordmoen
Copy link
Contributor Author

Closing in favor of #12909

@nordmoen nordmoen closed this May 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants