-
Notifications
You must be signed in to change notification settings - Fork 568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize do loops #979
optimize do loops #979
Conversation
* use npa when filling va at the end of pdlib_explict_block * re-indent w3wavemd after removal of if-block (from wise pr) * trailing whitespace cleanup (from wise pr)
More exciting work, thank you @DeniseWorthen. The matrix.comp for #975 is running now so that PR should be ready for merging later this morning if no issues. I'm looking forward to seeing the speed improvements from this PR! |
@DeniseWorthen, the dependency #975 was just merged. I'll start the regtests for this now. |
@DeniseWorthen could you please update your branch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Denise,
missed that one, thanks for optimizing this part!
Cheers
Aron
I re-ran my tests using 4x as many tasks for the WAV model. The improvement is still there, but is reduced (as expected): current code: The ESMF Profile gives the following values for the WAV RunPhase1 (min time, mean time, max time): current code: 385.2281 187.5005 806.7036 So these are reductions of ~4.5% in wall clock, and ~10% in mean run phase time. |
Thanks for the update @DeniseWorthen. The testing will finish this morning and then complete the review this afternoon if there's no problems. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review Pass
Regtests Pass
matrixCompFull.txt
matrixDiff.txt
matrixCompSummary.txt
- Summary of non-identicals shows only the known non-b4b cases with unstructured mod_defs.
**********************************************************************
********************* non-identical cases ****************************
**********************************************************************
mww3_test_03/./work_PR1_MPI_e (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e_c (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e (1 files differ)
mww3_test_03/./work_PR2_UQ_MPI_e (1 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d2 (15 files differ)
mww3_test_03/./work_PR1_MPI_d2 (14 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2_c (17 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c (16 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2 (10 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2 (14 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e_c (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2 (16 files differ)
ww3_ta1/./work_UPD0F_U (0 files differ)
ww3_tp2.10/./work_MPI_OMPH (6 files differ)
ww3_tp2.16/./work_MPI_OMPH (4 files differ)
ww3_tp2.17/./work_ma (1 files differ)
ww3_tp2.17/./work_a (1 files differ)
ww3_tp2.17/./work_mc1 (1 files differ)
ww3_tp2.17/./work_mb (1 files differ)
ww3_tp2.17/./work_mc (1 files differ)
ww3_tp2.17/./work_ma1 (1 files differ)
ww3_tp2.17/./work_c (1 files differ)
ww3_tp2.17/./work_b (1 files differ)
ww3_tp2.19/./work_1B_a (1 files differ)
ww3_tp2.19/./work_1A_a (1 files differ)
ww3_tp2.19/./work_1C_a (1 files differ)
ww3_ufs1.3/./work_a (3 files differ)
**********************************************************************
************************ identical cases *****************************
Thanks again, @DeniseWorthen for this excellent catch. |
@MatthewMasarik-NOAA Thanks. Could you or Jessica please create a PR to update the ufs-weather-model branch? |
Yes, we will do that today or tomorrow at the latest. |
Pull Request Summary
Reorders the loops in
PDLIB_EXPLICIT_BLOCK
for efficiency.Description
Reorders several loops across
ip=1,npa
orip=1,np
so that the inner most index varies the fastest.Note: this PR was built on top of the changes in PR #975 and so depends on that PR being merged first.
Issue(s) addressed
Commit Message
Reorders some loops in PDLIB_EXPLICIT_BLOCK for efficiency.
Check list
Testing