UFS-WM regression test failures: spack stack PR #1707 #708
Comments
I'm currently exploring this on Gaea using the cpld_2threads_p8 case. I tried reverting a number of libraries, including jasper+zlib, to no avail. Interestingly, when I swap out a number of libraries for hpc-stack ones, I can get my cpld_2threads_p8 results to match the cpld_control_p8 baseline results based on hpc-stack (20230804). I'll keep updating here as I find out more.
As a general observation, the main differences between spack-stack and hpc-stack are:
I have not been able to find any difference in terms of which packages are built with OpenMP support, which seemed like a logical thing to check based on the inconsistency between the cpld_control_p8 and cpld_2threads_p8 cases.
I can get the cpld_2threads_p8 test on Gaea to successfully match against the hpc-stack (20230804) baseline results if I swap the hpc-stack esmf library into my spack-stack installation but leave everything else exactly the same. For what it's worth, the references in libesmf.a to other libraries (NetCDF, PIO) are identical between the spack-stack (failing) and hpc-stack (succeeding) versions of the library, and in my test that passes, I'm linking entirely against the spack-stack versions of NetCDF and PIO. Long story short, I'm pretty confident the problem is with the build of the ESMF library itself and not its dependencies. Looking at the libesmf.a symbols (
I've got cpld_2threads_p8 running successfully on Gaea with a pure spack-stack installation by making just a couple of build setting changes to ESMF. The most critical one appears to be adding "-fp-model precise" to the F90 and CXX flags. I also fiddled with a few other settings, so I'll keep testing to make sure that the "-fp-model precise" change is the only tweak needed. If that does it, then we should either update our spack-stack configuration (use
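For context on why a floating-point build flag can affect bit-for-bit results: unless told otherwise, an optimizing compiler may reorder floating-point arithmetic, and reordering changes rounding. The sketch below is plain Python, not ESMF or compiler code, and the function names are hypothetical. It only illustrates that summation order alone can change the last bits of a result.

```python
import struct


def bits(x: float) -> str:
    """Return the IEEE 754 double bit pattern of x as a hex string."""
    return struct.pack(">d", x).hex()


def sum_left_to_right(values):
    """Accumulate the values in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_right_to_left(values):
    """Accumulate the same values in reverse order."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total


# Illustrative inputs only; any values that round differently would do.
values = [0.1, 0.2, 0.3]
forward = sum_left_to_right(values)
backward = sum_right_to_left(values)

print(forward, bits(forward))
print(backward, bits(backward))
print("bit-for-bit identical:", bits(forward) == bits(backward))
```

The two sums are mathematically equal but are not bitwise identical, which is the kind of difference a bit-for-bit regression comparison flags.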
Sounds like progress!
Solved. I can get these RTs to match the hpc-stack-based baseline data by adding
Awesome!
The
the tests reported here by @jkbk2004 now pass (see attached log; each test was compared against a baseline created with spack-stack PR 1707 w/ alex's test stack with the esmf/mapl fixes). @climbfuji i think we can go ahead and perform alex's modifications as described in his email for all machines w/ the ufs-pio env.
Thanks @ulmononian - can you create a list of machines that need the updated installation? Since I was on vacation when the chained environments were created, I don't know for sure which machines received those updates.
@jkbk2004 @climbfuji since these tests now pass on gaea c4 in a spack-stack - spack-stack comparison (i.e. the method suggested by jong in the issue description), can we close this issue?
The UFS transitioned to spack-stack two weeks ago after @AlexanderRichert-NOAA and @ulmononian figured out the b4b reproducibility issues. We can close this issue as completed.
Describe the bug
A reproducibility problem was found during regression testing with ufs-community/ufs-weather-model#1707.
To Reproduce
Steps to reproduce the behavior: create a baseline for those cases and run the regression tests on Gaea, e.g. ./rt.sh -c -e -a [account name] and ./rt.sh -m -e -a [account name]
Expected behavior
The nccmp results show differences in fields.
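As an illustration of what a bit-for-bit field comparison checks, here is a minimal Python sketch. It is not nccmp and does not read NetCDF files; the function name, field names, and values are all made up for the example.

```python
import struct


def differing_fields(baseline, candidate):
    """Return names of fields whose values are not bitwise identical.

    Both arguments map a field name to a list of floats. Values are
    compared through their IEEE 754 double encoding, so even a
    last-bit difference is reported.
    """
    diffs = []
    for name in sorted(baseline):
        a = baseline[name]
        b = candidate.get(name)
        # A missing field or a length mismatch counts as a difference.
        if b is None or len(a) != len(b):
            diffs.append(name)
            continue
        packed_a = struct.pack(f">{len(a)}d", *a)
        packed_b = struct.pack(f">{len(b)}d", *b)
        if packed_a != packed_b:
            diffs.append(name)
    return diffs


# Hypothetical data: field_b differs only by floating-point rounding.
baseline = {"field_a": [280.0, 281.5], "field_b": [0.6]}
candidate = {"field_a": [280.0, 281.5], "field_b": [0.1 + 0.2 + 0.3]}

print(differing_fields(baseline, candidate))
```

Comparing encoded bytes rather than using a tolerance is deliberate here: a regression test that expects reproducible output treats any differing bit as a failure.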
System:
Gaea; we suspect similar behavior on Acorn as well.
Additional context
We didn't see any problem when we merged in the latest library update PR ufs-community/ufs-weather-model#1745. One suggestion is to revert the jasper and zlib updates, set exactly the same library options as the current UFS-WM develop, and start debugging from there.