ESMF managed threading job submission issue with large number of cores on wcoss2 #1764

Closed
junwang-noaa opened this issue May 22, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@junwang-noaa
Collaborator

Description

When a job requests a large number of cores (>20000), the C768 coupled test with ESMF managed threading fails to submit. The error message does not appear when the test runs on the same number of cores with a single thread per task using traditional threading.

To Reproduce:

  1. Run the C768 coupled test with layout 32x32 using 4 threads, and ocean and ice with 1 thread each.
  2. The following error message appears and the job cannot run: "start RPC: RPC timeout after 30s, received 38 of 129 responses".
  3. The same test with atm layout 64x64, where atm, ocean, and ice all use 1 thread, starts to run.
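
For reference, the two configurations above can be compared by core count with a rough sketch. It assumes the atmosphere runs on a 6-tile cubed-sphere grid, so the compute task count is 6 x layout_x x layout_y, and that each task occupies one core per thread. The function name is hypothetical, and the estimate ignores the ocean, ice, and any write-component tasks, so it is not the full job size.

```python
# Rough core-count estimate for the atmosphere compute tasks only.
# Assumption: 6 cubed-sphere tiles, one core per thread per task.
CUBED_SPHERE_TILES = 6


def atm_compute_cores(layout_x, layout_y, threads):
    """Return cores used by the atmosphere compute tasks (hypothetical helper)."""
    tasks = CUBED_SPHERE_TILES * layout_x * layout_y
    return tasks * threads


# Step 1: layout 32x32 with 4 threads per task.
threaded = atm_compute_cores(32, 32, 4)
# Step 3: layout 64x64 with 1 thread per task.
single = atm_compute_cores(64, 64, 1)

print(threaded, single)
```

Under these assumptions both configurations request the same number of atmosphere cores, which matches the description: the failure depends on how the cores are requested, not on the total.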

Additional context

Output

@junwang-noaa junwang-noaa added the bug Something isn't working label May 22, 2023
@junwang-noaa junwang-noaa self-assigned this Jun 5, 2023
@junwang-noaa junwang-noaa moved this from In Progress to Blocked in Model infrastructure FY23Q3-Q4 Jul 3, 2023
@junwang-noaa
Collaborator Author

There are updates in ESMF 8.5.0. We will test UFS WM with ESMF 8.5.0 on C5.

@junwang-noaa
Collaborator Author

Since we are starting to use traditional threading for high-resolution runs, I will close this issue until the ESMF library provides a solution for ESMF managed threading.
