fix issue with scaling over instances #329

Merged: 2 commits merged into ESCOMP:master from ninst_scaling on Dec 15, 2022

Conversation

jedwards4b
Collaborator

Description of changes

Remove a performance bottleneck in ensemble_driver that caused initialization time to scale linearly with the number of ensemble members.

Specific notes

Contributors other than yourself, if any:

CMEPS Issues Fixed (include github issue #): Fixes #326

Are changes expected to change answers? (specify if bfb, different at roundoff, more substantial)

Any User Interface Changes (namelist or namelist defaults changes)?

Testing performed

Testing performed if application target is CESM:

  • (recommended) CIME_DRIVER=nuopc scripts_regression_tests.py
    • machines:
    • details (e.g. failed tests):
  • (recommended) CESM testlist_drv.xml
    • machines and compilers:
    • details (e.g. failed tests):
  • (optional) CESM prealpha test
    • machines and compilers
    • details (e.g. failed tests):
  • (other) please describe in detail
    • machines and compilers
    • details (e.g. failed tests):

Testing performed if application target is UFS-coupled:

  • (recommended) UFS-coupled testing
    • description:
    • details (e.g. failed tests):

Testing performed if application target is UFS-HAFS:

  • (recommended) UFS-HAFS testing
    • description:
    • details (e.g. failed tests):

Hashes used for testing:

  • CESM:
  • UFS-coupled, then umbrella repository to check out and associated hash:
    • repository to check out:
    • branch/hash:
  • UFS-HAFS, then umbrella repository to check out and associated hash:
    • repository to check out:
    • branch/hash:

@jedwards4b jedwards4b self-assigned this Dec 14, 2022
@jedwards4b
Collaborator Author

I tested the C5 case and saw a considerable improvement over what @fischer-ncar reported. I would appreciate it if you and @fischer-ncar could confirm this result.

Region               PETs  PEs  Count  Mean (s)  Min (s)   Min PET  Max (s)   Max PET
[ESMF]               540   540  1      217.2501  215.4383  176      217.9992  5
[ensemble] Init 1    540   540  1      204.3150  202.3066  206      205.1041  96
[ESM0004] IPDv02p3   108   108  1      166.4674  166.4671  324      166.4693  395
[LND] IPDv01p3       108   108  1      120.8491  120.8489  344      120.8492  388

@fischer-ncar
Contributor

I'm cleaning out my office this morning. So I'll give it a try this afternoon.

@fischer-ncar
Contributor

This is what I'm seeing for C2 and C5. So I can confirm a considerable improvement.

Region                                                                       PETs   PEs    Count    Mean (s)    Min (s)     Min PET Max (s)     Max PET
[ensemble] Init 1                                                            216    216    1        226.9003    226.8974    109     226.9062    18   
[ensemble] Init 1                                                            540    540    1        222.5143    222.5101    113     222.5175    216
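
For anyone who wants to repeat this comparison, here is a minimal sketch that parses rows in the profile summary format shown above and compares the mean `[ensemble] Init 1` time across runs. It rests on two assumptions that are not stated in this thread: that C2 and C5 are the 2-instance and 5-instance cases, and that each instance uses 108 PETs (taken from the `[ESM0004]` row in the earlier comment, where 540 / 108 = 5). The function name `parse_profile_rows` and the constant `PETS_PER_INSTANCE` are made up for this example.

```python
import re

# Rows copied from the comment above.
ROWS = """\
[ensemble] Init 1    216    216    1    226.9003    226.8974    109    226.9062    18
[ensemble] Init 1    540    540    1    222.5143    222.5101    113    222.5175    216
"""

# Assumption: 108 PETs per instance, inferred from the [ESM0004] row (108 PETs).
PETS_PER_INSTANCE = 108

# Region name followed by: PETs, PEs, Count, Mean, Min, Min PET, Max, Max PET.
# The float columns anchor the match, so a region name that ends in a digit
# (such as "Init 1") is not confused with the PETs column.
ROW_RE = re.compile(
    r"^(?P<region>.+?)\s+(?P<pets>\d+)\s+(?P<pes>\d+)\s+(?P<count>\d+)"
    r"\s+(?P<mean>\d+\.\d+)\s+(?P<min>\d+\.\d+)\s+(?P<min_pet>\d+)"
    r"\s+(?P<max>\d+\.\d+)\s+(?P<max_pet>\d+)\s*$"
)


def parse_profile_rows(text):
    """Return one dict per data row; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m is None:
            continue  # header or blank line
        rows.append({
            "region": m.group("region"),
            "pets": int(m.group("pets")),
            "pes": int(m.group("pes")),
            "count": int(m.group("count")),
            "mean": float(m.group("mean")),
            "min": float(m.group("min")),
            "max": float(m.group("max")),
        })
    return rows


if __name__ == "__main__":
    rows = parse_profile_rows(ROWS)
    base = rows[0]
    for row in rows:
        ninst = row["pets"] // PETS_PER_INSTANCE  # assumed instance count
        ratio = row["mean"] / base["mean"]
        print(f"{row['region']}: {ninst} instances, "
              f"mean {row['mean']:.4f} s, ratio to first row {ratio:.3f}")
```

Under those assumptions, linear scaling would put the 5-instance init time at about 2.5 times the 2-instance time, whereas the two means above are nearly identical.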

@jedwards4b jedwards4b merged commit 78448ba into ESCOMP:master Dec 15, 2022
@jedwards4b jedwards4b deleted the ninst_scaling branch December 15, 2022 22:32
Development

Successfully merging this pull request may close these issues.

Multi-instances init time scaling linearly.