Remove requirement for esmf and mapl debug versions, remove DEBUG_LINKMPI #1681
Conversation
The PR that is about to be merged, #1658, makes major revisions to the smoke implementation and adds a debug test for it.
I ran
|
The new version hasn't been merged to develop yet, so that isn't the one you tested.
Good to know that it's not my PR that causes the smoke test problems!
I ran the test in the develop branch about an hour ago, with whatever was in develop at that time.
The cpld_debug_p8 does this, too. This has been a known issue for quite a while: #1432
I sincerely hope that if you run it again after the PR is merged, you will see it pass.
Thanks, Sam. Once upon a time, all tests passed on all platforms ("an old man complaining" ;-) ).
Actually, the cpld_control_p8 was broken for a long time, for pretty much as long as that test has existed. The problem has gotten steadily worse, and it's at the point where three tries is usually not enough.
@SamuelTrahanNOAA Thanks for creating the issue to report the problem. @jkbk2004 Do we know when the problem started and in which PRs the cpld_debug_p8 failed many times with the GNU compiler? I agree with Dom, the test used to run fine. This is a surprise.
The problem is much older than the issue I submitted. I just got fed up with it one day and submitted an issue.
The update to the smoke code and regression tests has been merged. You should update your branches and try again. Hopefully, your smoke issue will be gone.
…r-model into feature/remove-debug-stuff
@SamuelTrahanNOAA I can confirm (for GNU, didn't test Intel) that after pulling in develop (after your smoke update was merged) only this test fails:
|
@climbfuji Was the cpld_control_p8 a timeout failure as well? If so, I think we can still consider this ready for the commit queue.
@BrianCurtis-NOAA If Dom confirms that the failed cpld_control_p8 was a timeout on hera.gnu, I think this might be a temporary work-around since this test is failing consistently on hera.gnu now.
I can't test today, though, because Hera is down.
Yes, it was:
|
I just pulled in develop.
Thanks, @BrianCurtis-NOAA. Starting CI now.
Automated RT Failure Notification
@FernandoAndrade-NOAA let me add orion label again with locally sticking in
Automated RT Failure Notification
@jkbk2004, looks like there's still an issue with Orion. I'll update when my manual run finishes.
on-behalf-of @ufs-community <[email protected]>
Due to a combination of recent WCOSS2 and UFSWM changes, the wallclock limit needs to be bumped to 45 minutes for compiles to succeed.
All tests are done. We can start the merging process. No dependencies. Please go ahead with final reviews and approvals.
Thanks everyone for this quick turnaround, highly appreciated!
@climbfuji I thought this commit meant we were no longer compiling with the debug ESMF library. When I look at the PET logs in the PR branch I'm preparing, I still see the message in the log
This looks like the ESMF being used is the debug version?
Gerhard gave me a clue where to look for where this is being triggered. In CMEPS, we have
|
Description
Fixes #1680
Fixes #330
Note on expected/unexpected baseline changes. I ran the regression tests on Hera and thought I'd see changes in the results for at least some of the DEBUG tests. Instead I found this:
Hera/Intel
All tests passed against the existing baseline, except one test that does not use the DEBUG build and therefore the result change doesn't make sense to me (maybe something wrong with the test/code tested itself):
I reran this test and it failed with the same b4b mismatch.
Hera/GNU
All tests passed against the existing baseline, except:
- rrfs_smoke_conus13km_hrrr_warm: same test as for Intel, does not use the DEBUG build and therefore the result change doesn't make sense to me (maybe something wrong with the test/code tested itself)?
- cpld_control_p8: failed because it timed out (exceeded walltime) consistently. How does this complete within the walltime when using debug builds of ESMF and MAPL? It seems to hang in the first UFS Aerosols step:
Logs of rt runs on Hera attached here: rt_hera_intel_gnu_pr1681.tar.gz
Top of commit queue on: TBD
n/a - no changes to any of the submodules
Input data additions/changes
Anticipated changes to regression tests:
Subcomponents involved:
Combined with PR's (If Applicable):
Commit Queue Checklist:
Linked PR's and Issues:
Fixes #1680
Fixes #330
Testing Day Checklist:
Testing Log (for CM's):