[develop] Jet switch from CentOS to Rocky #1045
With the new Slurm and Rocoto, some HPSS jobs were failing due to lack of memory. Memory was increased from 2GB to 4GB.
The fundamental tests were run on Jet Rocky8 and all successfully passed:
----------------------------------------------------------------------------------------------------
Experiment name                                                   | Status   | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 | COMPLETE |  10.56
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_ | COMPLETE |  14.39
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240 | COMPLETE |   7.49
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot | COMPLETE |  16.27
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024022 | COMPLETE |  34.57
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240229185 | COMPLETE |  25.68
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024022918551 | COMPLETE |  23.57
----------------------------------------------------------------------------------------------------
Total                                                             | COMPLETE | 132.53
Approving now.
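As a quick cross-check of the reported total, the per-experiment core hours from the Jet Rocky8 table can be summed with a few lines of Python. This is only a sketch: the experiment names are copied as they appear in the table (truncated), and `Decimal` is used to avoid floating-point rounding noise.

```python
from decimal import Decimal

# Core hours copied from the Jet Rocky8 results table
# (experiment names are truncated exactly as in the table).
core_hours = {
    "grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2": "10.56",
    "nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_": "14.39",
    "grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240": "7.49",
    "grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot": "16.27",
    "grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024022": "34.57",
    "grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240229185": "25.68",
    "grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024022918551": "23.57",
}

# Sum as exact decimals and compare against the total reported in the table.
total = sum(Decimal(hours) for hours in core_hours.values())
print(f"Total core hours: {total}")  # prints "Total core hours: 132.53"
```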
The fundamental tests were also run successfully on Jet using CentOS.
Built the SRW App on Rocky 8 using the changes from this PR and ensured the changes worked by running this case: /lfs4/HFIP/hfv3gfs/Edward.Snyder/PR_1045/expt_dirs/grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Fundamental tests ran successfully on Jet (xjet).
Ran the tests, with the following changes made to allow testing for Rocky 8 OS:
modulefiles/build_jet_intel.lua:
uncommented
prepend_path("MODULEPATH","/mnt/lfs4/HFIP/hfv3gfs/role.epic/spack-stack/spack-stack-1.5.0/envs/unified-env-rocky8/install/modulefiles/Core")
commented out
prepend_path("MODULEPATH","/mnt/lfs4/HFIP/hfv3gfs/role.epic/spack-stack/spack-stack-1.5.0/envs/unified-env/install/modulefiles/Core")
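For anyone repeating this test, the modulefile swap described above could be scripted rather than edited by hand. The sketch below is an assumption-laden illustration, not part of this PR: `swap_modulepath` is a hypothetical helper, and it assumes each `prepend_path` line sits on its own line and is disabled with a leading Lua `--` comment marker.

```python
import re

# Paths copied from the two prepend_path lines quoted above.
ROCKY8_PATH = "/mnt/lfs4/HFIP/hfv3gfs/role.epic/spack-stack/spack-stack-1.5.0/envs/unified-env-rocky8/install/modulefiles/Core"
CENTOS_PATH = "/mnt/lfs4/HFIP/hfv3gfs/role.epic/spack-stack/spack-stack-1.5.0/envs/unified-env/install/modulefiles/Core"


def swap_modulepath(text, enable, disable):
    """Uncomment the line that references `enable`; comment out the one for `disable`.

    Hypothetical helper: assumes a disabled line starts with the Lua comment marker `--`.
    """
    out = []
    for line in text.splitlines():
        # The line with any leading comment marker (and leading whitespace) removed.
        body = re.sub(r"^\s*--\s*", "", line)
        is_commented = line.lstrip().startswith("--")
        if f'"{enable}"' in body and is_commented:
            line = body          # uncomment
        elif f'"{disable}"' in body and not is_commented:
            line = "--" + line   # comment out
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    path = "modulefiles/build_jet_intel.lua"
    with open(path) as f:
        original = f.read()
    # Print the result instead of overwriting the file, so the change can be reviewed first.
    print(swap_modulepath(original, enable=ROCKY8_PATH, disable=CENTOS_PATH))
```

The quoted-path match keeps the two environments apart: the CentOS path is not a substring of the Rocky 8 path, because `unified-env/` and `unified-env-rocky8/` differ before the next slash.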
In ./ush/machine/jet.yaml, set all the partitions to xjet:
PARTITION_DEFAULT: xjet
...
PARTITION_FCST: xjet
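The partition change could likewise be applied with a small script. This is a rough sketch under stated assumptions: `set_partitions` is a hypothetical helper, it treats every key beginning with `PARTITION_` as a partition setting (only `PARTITION_DEFAULT` and `PARTITION_FCST` are shown above), and it assumes simple one-line `KEY: value` entries. It rewrites the text with a regular expression rather than a YAML parser, so any trailing comment on a matched line is dropped.

```python
import re


def set_partitions(yaml_text, partition="xjet"):
    """Set every one-line `PARTITION_*: value` entry to `partition`.

    Hypothetical helper: plain-text rewrite, not a YAML-aware edit.
    """
    pattern = re.compile(r"^([ \t]*PARTITION_\w+):.*$", flags=re.MULTILINE)
    return pattern.sub(lambda m: f"{m.group(1)}: {partition}", yaml_text)


if __name__ == "__main__":
    path = "ush/machine/jet.yaml"
    with open(path) as f:
        original = f.read()
    # Print the result instead of overwriting the file, so the change can be reviewed first.
    print(set_partitions(original))
```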
The Hera Jenkins tests failed due to the system coming down yesterday for maintenance. These tests have been requeued. There was also a failure on Jet. The
Once the Hera tests complete, this PR can be merged.
The Hera Intel tests were run on Rocky8 and all tests passed.
Unfortunately, while running the WE2E tests with Rocky8 on Hera GNU, the issue that you noted during the UFS apps and components coordination meeting showed up - all tests are failing due to using srun and not being able to find
We will need to hope that the tests are able to run over the weekend on CentOS and no longer sit in the queue.
Given that Hera GNU tests are just sitting in queue for days and the inability to run Hera GNU on Rocky8, the successful run of the Hera Intel and the rest of the platforms will be enough to get this work merged. Since Rocky8 will be the default package of the nodes following today's update, I will go ahead and set the spack-stack path to point at the rocky8 location and change the
….lua and set PARTITION_FCST=xjet in ush/machine/jet.yaml
The rerun of the Jenkins tests on Jet had one failure,
None of the changes made in this PR would cause this issue. Using rocotorewind/rocotoboot allowed the failed task to pass.
Moving forward with merging this PR now.
DESCRIPTION OF CHANGES:
Jet is switching from CentOS to Rocky OS.
Type of change
TESTS CONDUCTED:
ISSUE:
Solves issue #1044
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR: