Expected behavior
The WE2E test grid_RRFS_NA_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta is the most expensive test in the test suite, so it is not currently run as part of the comprehensive suite. When it was last tested (prior to merging #686), this test succeeded, and it was expected to succeed in the develop branch.
Current behavior
As part of PR #676, we ran all tests, including this one, and noticed it was not succeeding. After some debugging we realized the failures were not due to those changes: the test was also failing at the top of develop. The run_post tasks fail with a number of error messages, ending in a segmentation fault.
Machines affected
This has only been tested on Hera. It is unclear whether this will occur on other platforms.
Steps To Reproduce
Run WE2E test grid_RRFS_NA_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta (this is a large test, uses ~1500 core hours on Hera)
Observe failure
To save core hours, reference failures in log files here on disk on Hera: /scratch2/BMC/fv3lam/kavulich/UFS/workdir/test_develop/expt_dirs/grid_RRFS_NA_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta/
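For anyone checking the existing logs rather than rerunning the test, a minimal sketch of a scan for the segmentation fault is below. The `find_segfault_logs` helper, the `*.log` filename pattern, and the exact search string are assumptions for illustration; the real log file names and error text in that directory may differ.

```python
from pathlib import Path


def find_segfault_logs(expt_dir, pattern="*.log", needle="segmentation fault"):
    """Return (path, line number, line text) for the first match in each log.

    `pattern` and `needle` are assumed values; adjust them to the actual
    log naming and error wording in the experiment directory.
    """
    hits = []
    for path in sorted(Path(expt_dir).rglob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log has odd bytes in it
        with path.open(errors="replace") as fh:
            for lineno, line in enumerate(fh, 1):
                if needle in line.lower():
                    hits.append((str(path), lineno, line.strip()))
                    break  # one hit per file is enough to flag it
    return hits
```

Calling `find_segfault_logs(...)` on the experiment directory above would list which task logs mention a segmentation fault, which may help confirm that only the run_post tasks are affected.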
Detailed Description of Fix
Giving the task more processors (12 nodes vs 8 nodes) results in success, so this seems likely to be an out-of-memory issue. I will address this in my next test reorg PR.
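The out-of-memory reasoning can be illustrated with a rough sketch. It assumes the post-processing data is divided evenly across nodes and ignores any fixed per-process overhead; neither assumption has been verified for run_post, and the `relative_per_node_load` helper is hypothetical.

```python
def relative_per_node_load(old_nodes, new_nodes):
    """Ratio of per-node data share after vs. before changing the node count.

    Assumes an even split of the data across nodes (an idealization).
    """
    if old_nodes <= 0 or new_nodes <= 0:
        raise ValueError("node counts must be positive")
    # Each node holds 1/nodes of the data, so the ratio is old/new.
    return old_nodes / new_nodes


# Going from 8 to 12 nodes: each node holds about two thirds of what it did.
ratio = relative_per_node_load(8, 12)
print(f"per-node share after change: {ratio:.2f} of the original")
```

Under these assumptions, moving from 8 to 12 nodes cuts each node's share by about a third, which is consistent with a memory limit being the cause but does not prove it.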