Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize do loops #979

Merged

Conversation

DeniseWorthen
Copy link
Contributor

@DeniseWorthen DeniseWorthen commented Apr 10, 2023

Pull Request Summary

Reorders the loops in PDLIB_EXPLICIT_BLOCK for efficiency.

Description

Reorders several loops across ip=1,npa or ip=1,np so that the inner most index varies the fastest.
Note: this PR was built on top of the changes in PR #975 and so depends on that PR being merged first.

Issue(s) addressed

Commit Message

Reorders some loops in PDLIB_EXPLICIT_BLOCK for efficiency.

Check list

Testing

  • How were these changes tested?
  • Are the changes covered by regression tests? (If not, why? Do new tests need to be added?)
  • Have the matrix regression tests been run (if yes, please note HPC and compiler)?
  • Please indicate the expected changes in the regression test output, (Note the list of known non-identical tests.)
  • Please provide the summary output of matrix.comp (matrix.Diff.txt, matrixCompFull.txt and matrixCompSummary.txt):

DeniseWorthen and others added 3 commits April 7, 2023 12:05
* use npa when filling va at the end of pdlib_explict_block
* re-indent w3wavemd after removal of if-block (from wise pr)
* trailing whitespace cleanup (from wise pr)
@DeniseWorthen DeniseWorthen changed the title Feature/optimizeloop dev optimize do loops Apr 10, 2023
@MatthewMasarik-NOAA
Copy link
Collaborator

More exciting work, thank you @DeniseWorthen. The matrix.comp for #975 is running now so that PR should be ready for merging later this morning if no issues. I'm looking forward to seeing the speed improvements from this PR!

@MatthewMasarik-NOAA
Copy link
Collaborator

@DeniseWorthen, the dependency #975 was just merged. I'll start the regtests for this now.

@MatthewMasarik-NOAA
Copy link
Collaborator

@DeniseWorthen could you please update your branch?

Copy link
Collaborator

@aronroland aronroland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Denise,
missed that one, thanks for optimizing this part!
Cheers
Aron

@DeniseWorthen
Copy link
Contributor Author

More exciting work, thank you @DeniseWorthen. The matrix.comp for #975 is running now so that PR should be ready for merging later this morning if no issues. I'm looking forward to seeing the speed improvements from this PR!

I re-ran my tests using 4x as many tasks for the WAV model. The improvement is still there, but is reduced (as expected):

current code: The total amount of wall time = 870.122403
reordered loops: The total amount of wall time = 830.824244

The ESMF Profile gives the following values for the WAV RunPhase1 (min time, mean time, max time):

current code: 385.2281 187.5005 806.7036
reordered loops: 364.6094 170.1290 763.8478

So these are reductions of ~4.5% in wall clock, and ~10% in mean run phase time.

@MatthewMasarik-NOAA
Copy link
Collaborator

Thanks for the update @DeniseWorthen. The testing will finish this morning and then complete the review this afternoon if there's no problems.

Copy link
Collaborator

@MatthewMasarik-NOAA MatthewMasarik-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review Pass

Regtests Pass
matrixCompFull.txt
matrixDiff.txt
matrixCompSummary.txt

  • Summary of non-identicals shows only the known non-b4b cases with unstructured mod_defs.
**********************************************************************           
********************* non-identical cases ****************************           
**********************************************************************           
mww3_test_03/./work_PR1_MPI_e                     (1 files differ)               
mww3_test_03/./work_PR3_UQ_MPI_e_c                     (1 files differ)          
mww3_test_03/./work_PR3_UNO_MPI_e                     (1 files differ)           
mww3_test_03/./work_PR2_UQ_MPI_e                     (1 files differ)            
mww3_test_03/./work_PR2_UNO_MPI_d2                     (15 files differ)         
mww3_test_03/./work_PR1_MPI_d2                     (14 files differ)             
mww3_test_03/./work_PR3_UNO_MPI_d2_c                     (17 files differ)       
mww3_test_03/./work_PR3_UQ_MPI_d2_c                     (16 files differ)        
mww3_test_03/./work_PR3_UNO_MPI_d2                     (10 files differ)         
mww3_test_03/./work_PR2_UQ_MPI_d2                     (14 files differ)          
mww3_test_03/./work_PR3_UQ_MPI_e                     (1 files differ)            
mww3_test_03/./work_PR3_UNO_MPI_e_c                     (1 files differ)         
mww3_test_03/./work_PR3_UQ_MPI_d2                     (16 files differ)          
ww3_ta1/./work_UPD0F_U                     (0 files differ)                      
ww3_tp2.10/./work_MPI_OMPH                     (6 files differ)                  
ww3_tp2.16/./work_MPI_OMPH                     (4 files differ)                  
ww3_tp2.17/./work_ma                     (1 files differ)                        
ww3_tp2.17/./work_a                     (1 files differ)                         
ww3_tp2.17/./work_mc1                     (1 files differ)                       
ww3_tp2.17/./work_mb                     (1 files differ)                        
ww3_tp2.17/./work_mc                     (1 files differ)                        
ww3_tp2.17/./work_ma1                     (1 files differ)                       
ww3_tp2.17/./work_c                     (1 files differ)                         
ww3_tp2.17/./work_b                     (1 files differ)                         
ww3_tp2.19/./work_1B_a                     (1 files differ)                      
ww3_tp2.19/./work_1A_a                     (1 files differ)                      
ww3_tp2.19/./work_1C_a                     (1 files differ)                      
ww3_ufs1.3/./work_a                     (3 files differ)                         
                                                                                 
**********************************************************************           
************************ identical cases *****************************

@MatthewMasarik-NOAA
Copy link
Collaborator

Thanks again, @DeniseWorthen for this excellent catch.

@MatthewMasarik-NOAA MatthewMasarik-NOAA merged commit 11d4f67 into NOAA-EMC:develop Apr 12, 2023
@DeniseWorthen
Copy link
Contributor Author

@MatthewMasarik-NOAA Thanks. Could you or Jessica please create a PR to update the ufs-weather-model branch?

@JessicaMeixner-NOAA
Copy link
Collaborator

Yes, we will do that today or tomorrow at the latest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

loops in explict block are in wrong order
4 participants