Compiler toolchain compatibility for transition to new HPC #157
Rui forwarded this info: …

With Andrew's advice, the above issue has been fixed by changing ice_ocean_timestep from 600 to 300.
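For anyone making the same change, it amounts to something like the following (a sketch only; the assumption that ice_ocean_timestep is set in accessom2.nml in the control directory should be checked against your own configuration):

```sh
# Halve the ice-ocean coupling timestep from 600 s to 300 s.
# File name and formatting are assumptions; edit by hand if in doubt.
sed -i 's/ice_ocean_timestep *= *600/ice_ocean_timestep = 300/' accessom2.nml
```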
Another issue on 01deg_jra55_iaf:
bash-4.1$ more access-om2.out
&OCEAN_SOLO_NML /
Reading zbgc_nml
MPI_ABORT was invoked on rank 5744 in communicator MPI_COMM_WORLD
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You'll need to change …
I executed the example from the cosima-om2 repository. Comparing it with the one from https://github.com/COSIMA/01deg_jra55_iaf, I can see some differences in config.yaml:
Yes, I'm in the process of updating everything in https://github.com/OceansAus/access-om2, including these control dirs. If you want to try the very latest (bleeding edge) config, use branch …
The example from https://github.com/COSIMA/01deg_jra55_iaf works for all builds using OpenMPI v1, 2, 3 and 4, as the number of CPU cores used for CICE is consistent between config.yaml and cice_in.nml. Again, I also needed to change the original value of ice_ocean_timestep from 450 to 300 to avoid the errors that happened for 01deg_jra55_ryf.
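A rough way to make that consistency check, if it helps (the key names and paths below are assumptions for illustration, not the exact layout of these control directories):

```sh
# Compare the ice core count in the payu config against CICE's own namelist.
grep -n 'ncpus' config.yaml        # per-submodel core counts
grep -in 'nprocs' ice/cice_in.nml  # processor count CICE expects
```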
Comment from #127 in December:
I gather we may also need to migrate to a newer netcdf library on gadi. I suppose something in the latest 4.6.x series would be most future-proof. There's some discussion here: COSIMA/libaccessom2#24
@aekiss NetCDF 4.7.0 has been out for a while now (since the start of May this year), so I'd suggest looking into using that one...
Thanks, but 4.6.1 seems to be the newest module on raijin.
If you want the new version, just send an e-mail to the helpdesk and someone will install it for you :-).
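For reference, checking what's installed and loading a newer version would look roughly like this (the 4.7.0 version string is only an example of what might be requested, not a module confirmed to exist on the system):

```sh
module avail netcdf        # list the netCDF modules currently installed
module load netcdf/4.7.0   # hypothetical: load a newer version once the helpdesk installs it
```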
Note that modules are loaded in numerous places, which would all need updating:
Have I missed anything?
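One way to hunt for any stragglers (the search root is illustrative; point it at wherever the source trees and control dirs are checked out):

```sh
# Find every hard-coded module load across the checked-out repositories.
grep -rn 'module load' ~/access-om2
```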
#178 builds with … I guess we can close this issue now?
gadi now has OpenMPI 4.0.2 installed, which is the latest release: https://www.open-mpi.org/software/ompi/v4.0/
I've changed these … so that the … The new gadi builds with OpenMPI 4.0.2 are here: …
I haven't tested whether they run.
I do think it is worthwhile upgrading to OpenMPI 4.0.2 before pushing this to … So this change does not, a priori, mean more stable performance with the tenth. The 1 and 0.25 degree configurations seem to be fine.
Is there a compelling reason to hard-code the OpenMPI version? I'd suggest keeping it up to date with the latest version so that you don't get surprised as new versions are released and existing ones are deprecated or removed.
It is a very complex collection of code, model configuration and build environment. Keeping the build environment as stable as possible takes out one possible culprit when things stop working.
Just noting that intel-compiler/2020.0.166 is now installed on Gadi, whereas we are using intel-compiler/2019.5.281. Presumably there's no reason to switch to the newer compiler?
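For context, the pinned toolchain being discussed amounts to something like the following module loads (a sketch; the authoritative module names and versions are whatever the build scripts actually specify):

```sh
module load intel-compiler/2019.5.281
module load openmpi/4.0.2
```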
I generally wouldn't change things unless necessary; it just adds another possible thing to go wrong. It is pretty trivial, so I'd suggest bedding down new code/forcing versions and then upgrading that stuff later when comparisons can be easily made. There is a confounding factor that you may be reluctant to change anything once OMIP-style runs have started, so up to you I guess.
Are we ready to close this issue now? AFAIK the gadi transition is now complete.
NCI is installing a new peak HPC, called gadi. The new machine will not support the 1.x series of OpenMPI, and current builds use 1.10.2. We will need to migrate to a new version of OpenMPI, which will also require a new version of the Intel Fortran compiler. This issue is a collection point for information about tests that have been performed, so as to not duplicate effort.
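A quick way to confirm which OpenMPI a given build environment actually provides (standard Open MPI and environment-modules commands; the output shown is illustrative):

```sh
mpirun --version      # e.g. "mpirun (Open MPI) 1.10.2"
module avail openmpi  # list the OpenMPI modules installed on the machine
```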