
Placeholder issue seeking guidance for optimizing model run time on WCOSS2 for RRFS application #1792

Open
MatthewPyle-NOAA opened this issue Jun 8, 2023 · 2 comments
Labels
question Further information is requested

Comments

MatthewPyle-NOAA (Collaborator) commented Jun 8, 2023

We are looking for guidance on optimizing run-time speed on WCOSS2 for a couple of different node count configurations for a large regional domain. One is for a small node count (~13 nodes) used in making 1 hour forecasts within the EnKF system (and writing a restart file at the 1 h time). The other is for a large node count (~110 nodes) used in making 60 hour forecasts with hourly history output (and eventually 15 minute history output for the first ~18 hours), and (eventually) 6 h restart writes.

Input files and a run script have been collected under /lfs/h2/emc/lam/noscrub/Matthew.Pyle/rrfs_optimization_update/ on Cactus and Dogwood.

The job_card.sh_6hfore_fullnode script is currently set up to run a 6 h forecast using a 61 node configuration, but can easily switch to 76, 97, or 110 node configurations. It would be a great help if the initialization time (currently several minutes at large node counts) could be reduced and the integration speed improved.
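For reference, the resource request for the 61-node configuration might look roughly like the following PBS job card. This is a hedged sketch, not the contents of the actual script: the queue name, walltime, cores per node (128 is assumed for WCOSS2), and executable name are all placeholders.

```shell
#PBS -N rrfs_6hfore
#PBS -q dev                                # queue name is an assumption
#PBS -l select=61:ncpus=128:mpiprocs=128   # 61 nodes; 128 cores/node assumed
#PBS -l place=vscatter:excl                # spread chunks, exclusive nodes
#PBS -l walltime=01:00:00                  # placeholder walltime

cd "$PBS_O_WORKDIR"
mpiexec -n $(( 61 * 128 )) ./ufs_model     # executable name is a placeholder
```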

The job_card.sh_1hfore_fullnode script runs a 1 h forecast, writing out a restart file at the end. This general configuration is for the EnKF system, which will run either 30 members concurrently (each member using 12-15 nodes) or two batches of 15 members (each member running on 25-30 nodes). In either scenario, we would ideally have all of these forecasts complete within ~10 minutes.

GeorgeVandenberghe-NOAA (Collaborator) commented:

How much I/O does each member do, particularly on input? What are the file names, how large are they, and how fast do they need to be read? There is evidence the WCOSS2 filesystem itself is being crushed by this input when ensembles are started; if so, the I/O needs a redesign. I am late to this investigation, but am aware of FMS work to mitigate it.
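To answer the size-and-speed question above, one could probe each member's input files directly. The function below is a hedged sketch (not from the scripts in this issue): it reports the size and sequential read time of each file passed to it, which gives a rough per-member baseline for the aggregate load an ensemble startup places on the filesystem. Note that repeat runs may hit the page cache and read much faster than the first.

```shell
# Sketch: report size and sequential read time for candidate input files.
probe_read() {
  for f in "$@"; do
    size_mb=$(( $(stat -c %s "$f") / 1000000 ))   # file size in MB (GNU stat)
    start=$(date +%s.%N)
    dd if="$f" of=/dev/null bs=4M 2>/dev/null     # full sequential read
    end=$(date +%s.%N)
    secs=$(awk -v a="$start" -v b="$end" 'BEGIN{printf "%.2f", b - a}')
    echo "$f: ${size_mb} MB read in ${secs} s"
  done
}

# Example usage (the path is a placeholder for a real RRFS input file):
# probe_read /lfs/h2/emc/lam/noscrub/.../gfs_data.tile7.halo0.nc
```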

GeorgeVandenberghe-NOAA (Collaborator) commented:

FMS work is described in NOAA-GFDL/FMS#1322
