[DO NOT MERGE] Weekly CI Tests Friday Feb 09, 2024 #2305

Closed
wants to merge 1 commit into from

@emcbot emcbot commented Feb 9, 2024

[DO NOT MERGE] Weekly CI Tests Friday Feb 09, 2024

@emcbot emcbot marked this pull request as draft February 9, 2024 22:47
@emcbot emcbot added CI/CD Issue related to CI/CD CI-Orion-Ready **CM use only** PR is ready for CI testing on Orion CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion and removed CI-Orion-Ready **CM use only** PR is ready for CI testing on Orion labels Feb 9, 2024
@emcbot
Author

emcbot commented Feb 9, 2024

CI Update on Orion at 02/09/24 04:48:06 PM
============================================
Cloning and Building global-workflow PR: 2305
with PID: 418550 on host: Orion-login-1

@emcbot emcbot added CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress and removed CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion labels Feb 9, 2024
@emcbot
Author

emcbot commented Feb 9, 2024

Automated global-workflow Testing Results:

Machine: Orion
Start: Fri Feb  9 16:51:36 CST 2024 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Build: Completed at 02/09/24 05:48:09 PM
Case setup: Completed for experiment C384_atm3DVar_941f419a
Case setup: Completed for experiment C384C192_hybatmda_941f419a
Case setup: Completed for experiment C384_S2SWA_941f419a

@emcbot emcbot added CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed and removed CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress labels Feb 10, 2024
@emcbot
Author

emcbot commented Feb 10, 2024

Experiment C384_atm3DVar_941f419a  *** FAILED *** on Orion
Experiment C384_atm3DVar_941f419a  with 3 tasks failed at 02/09/24 06:20:18 PM
Error logs:
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040118/gdasanal.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040118/gdasfcst.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040118/gfsanal.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040200/gdasanal.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040200/gdasfcst.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040200/gfsanal.log
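
The report above counts 3 failed tasks but lists 6 log files. Grouping the paths by cycle shows the same three task names under each of two cycles. The following is a minimal Python sketch of that grouping; it assumes the `.../logs/<cycle>/<task>.log` layout visible in the paths above, and the helper name `group_logs_by_cycle` is hypothetical, not part of global-workflow.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_logs_by_cycle(log_paths):
    """Map each cycle directory name to the sorted task names logged under it.

    Assumes each path ends in .../logs/<cycle>/<task>.log (inferred from the
    bot comment, not from any documented contract).
    """
    grouped = defaultdict(list)
    for raw in log_paths:
        path = PurePosixPath(raw.strip())
        cycle = path.parent.name  # e.g. "2023040118"
        task = path.stem          # e.g. "gdasanal"
        grouped[cycle].append(task)
    return {cycle: sorted(tasks) for cycle, tasks in grouped.items()}


# Paths copied from the failure report above.
LOG_ROOT = (
    "/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/"
    "C384_atm3DVar_941f419a/logs"
)
logs = [
    f"{LOG_ROOT}/2023040118/gdasanal.log",
    f"{LOG_ROOT}/2023040118/gdasfcst.log",
    f"{LOG_ROOT}/2023040118/gfsanal.log",
    f"{LOG_ROOT}/2023040200/gdasanal.log",
    f"{LOG_ROOT}/2023040200/gdasfcst.log",
    f"{LOG_ROOT}/2023040200/gfsanal.log",
]

by_cycle = group_logs_by_cycle(logs)
for cycle, tasks in sorted(by_cycle.items()):
    print(cycle, tasks)
```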

@WalterKolczynski-NOAA
Contributor

Experiment C384_atm3DVar_941f419a  *** FAILED *** on Orion
Experiment C384_atm3DVar_941f419a  with 3 tasks failed at 02/09/24 06:20:18 PM
Error logs:
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040118/gdasanal.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040118/gdasfcst.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040118/gfsanal.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040200/gdasanal.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040200/gdasfcst.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040200/gfsanal.log

No idea what happened here. Some of these logs are gone. The fcst job hit the wallclock. Modules hanging again?

@emcbot emcbot added CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera CI-Hera-Running **Bot use only** CI testing on Hera for this PR is in-progress and removed CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera labels Feb 10, 2024
@emcbot
Author

emcbot commented Feb 10, 2024

Running experiments: C48_ATM on Hera
Built against system [] in directory:
/scratch1/NCEPDEV/global/Terry.McGuinness/GFS_CI_ROOT_JENKINS/Jenkins/workspace/flow_EMC-Global-Pipeline_PR-2305/TESTDIR/gfs
With the experiment in directory:
/scratch1/NCEPDEV/global/Terry.McGuinness/GFS_CI_ROOT_JENKINS/Jenkins/workspace/flow_EMC-Global-Pipeline_PR-2305/TESTDIR/RUNTESTS/

@emcbot
Author

emcbot commented Feb 10, 2024

Running experiments: C48_S2SWA_gefs on Hera
Built against system [] in directory:
/scratch1/NCEPDEV/global/Terry.McGuinness/GFS_CI_ROOT_JENKINS/Jenkins/workspace/flow_EMC-Global-Pipeline_PR-2305/TESTDIR/gfs
With the experiment in directory:
/scratch1/NCEPDEV/global/Terry.McGuinness/GFS_CI_ROOT_JENKINS/Jenkins/workspace/flow_EMC-Global-Pipeline_PR-2305/TESTDIR/RUNTESTS/

@emcbot
Author

emcbot commented Feb 10, 2024

Running experiments: C48_S2SW on Hera
Built against system [] in directory:
/scratch1/NCEPDEV/global/Terry.McGuinness/GFS_CI_ROOT_JENKINS/Jenkins/workspace/flow_EMC-Global-Pipeline_PR-2305/TESTDIR/gfs
With the experiment in directory:
/scratch1/NCEPDEV/global/Terry.McGuinness/GFS_CI_ROOT_JENKINS/Jenkins/workspace/flow_EMC-Global-Pipeline_PR-2305/TESTDIR/RUNTESTS/

@emcbot
Author

emcbot commented Feb 10, 2024

Running experiments: C96_atm3DVar on Hera
Built against system [] in directory:
/scratch1/NCEPDEV/global/Terry.McGuinness/GFS_CI_ROOT_JENKINS/Jenkins/workspace/flow_EMC-Global-Pipeline_PR-2305/TESTDIR/gfs
With the experiment in directory:
/scratch1/NCEPDEV/global/Terry.McGuinness/GFS_CI_ROOT_JENKINS/Jenkins/workspace/flow_EMC-Global-Pipeline_PR-2305/TESTDIR/RUNTESTS/

@emcbot
Author

emcbot commented Feb 10, 2024

SUCCESS running experiments: C48_ATM on Hera

@emcbot
Author

emcbot commented Feb 10, 2024

SUCCESS running experiments: C48_S2SWA_gefs on Hera

@emcbot
Author

emcbot commented Feb 10, 2024

SUCCESS running experiments: C48_S2SW on Hera

@emcbot
Author

emcbot commented Feb 10, 2024

SUCCESS running experiments: C96_atm3DVar on Hera

@emcbot emcbot added CI-Hera-Passed **Bot use only** CI testing on Hera for this PR has completed successfully and removed CI-Hera-Running **Bot use only** CI testing on Hera for this PR is in-progress labels Feb 10, 2024
@emcbot
Author

emcbot commented Feb 10, 2024

CI SUCCESS Hera at 02 10 02:07:54

Built and ran in directory /scratch1/NCEPDEV/global/Terry.McGuinness/GFS_CI_ROOT_JENKINS/Jenkins/workspace/flow_EMC-Global-Pipeline_PR-2305/TESTDIR

@emcbot
Author

emcbot commented Feb 10, 2024

CI SUCCESS Hera at 02 10 02:07:56

Built and ran in directory /scratch1/NCEPDEV/global/Terry.McGuinness/GFS_CI_ROOT_JENKINS/Jenkins/workspace/flow_EMC-Global-Pipeline_PR-2305/TESTDIR

@emcbot
Author

emcbot commented Feb 10, 2024

CI SUCCESS Hera at 02 10 02:07:57

Built and ran in directory /scratch1/NCEPDEV/global/Terry.McGuinness/GFS_CI_ROOT_JENKINS/Jenkins/workspace/flow_EMC-Global-Pipeline_PR-2305/TESTDIR

@emcbot
Author

emcbot commented Feb 10, 2024

CI SUCCESS Hera at 02 10 02:07:58

Built and ran in directory /scratch1/NCEPDEV/global/Terry.McGuinness/GFS_CI_ROOT_JENKINS/Jenkins/workspace/flow_EMC-Global-Pipeline_PR-2305/TESTDIR

@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Orion-Ready **CM use only** PR is ready for CI testing on Orion and removed CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed labels Feb 10, 2024
@emcbot emcbot added CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion and removed CI-Orion-Ready **CM use only** PR is ready for CI testing on Orion labels Feb 10, 2024
@emcbot
Author

emcbot commented Feb 10, 2024

CI Update on Orion at 02/10/24 02:24:41 AM
============================================
Cloning and Building global-workflow PR: 2305
with PID: 48296 on host: Orion-login-1

@emcbot emcbot added CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress and removed CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion labels Feb 10, 2024
@emcbot
Author

emcbot commented Feb 10, 2024

Automated global-workflow Testing Results:

Machine: Orion
Start: Sat Feb 10 02:28:31 CST 2024 on Orion-login-1.HPC.MsState.Edu
---------------------------------------------------
Build: Completed at 02/10/24 03:25:01 AM
Case setup: Completed for experiment C384_atm3DVar_941f419a
Case setup: Completed for experiment C384C192_hybatmda_941f419a
Case setup: Completed for experiment C384_S2SWA_941f419a

@emcbot emcbot added CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed and removed CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress labels Feb 10, 2024
@emcbot
Author

emcbot commented Feb 10, 2024

Experiment C384_atm3DVar_941f419a  *** FAILED *** on Orion
Experiment C384_atm3DVar_941f419a  with 1 tasks failed at 02/10/24 04:00:17 AM
Error logs:
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040118/gdasfcst.log

@DavidHuber-NOAA
Contributor

Looks like the gdasfcst timed out while attempting to write the 9-hour surface forecast file. No errors were reported. The timestamp of the gdas.t18z.sfcf009.nc file is the same as the timeout timestamp, so I don't think it stalled. Moreover, the gdas.t18z.atmf009.nc file timestamp is also the same, further suggesting that it wasn't taking an inordinate amount of time to write the surface file. The forecast steps were all reasonable as well (mostly less than 2s, with only 1 longer than 10s). So I think there was a delay in setting up the forecast.

Should this be rebooted or should we wait until next weekend?
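
The timestamp comparison described above can be sketched in Python as follows. This is a minimal illustration, not the check that was actually run: the helper name `written_near`, the one-minute tolerance, and the idea of passing the timeout time as a `datetime` are all assumptions.

```python
import os
from datetime import datetime, timedelta


def written_near(path, reference, tolerance=timedelta(minutes=1)):
    """Return True if the file's last-modified time is within `tolerance`
    of `reference` (a naive local-time datetime).

    A file last written right at the timeout suggests the job was still
    producing output when it was killed, rather than hanging beforehand.
    """
    mtime = datetime.fromtimestamp(os.path.getmtime(path))
    return abs(mtime - reference) <= tolerance
```

To use it, call `written_near()` once for the `sfcf009` file and once for the `atmf009` file, passing the timeout time read from the job log. If both return `True`, the model was writing output up to the wallclock limit, which is consistent with a slow start rather than a stalled write.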

@TerrenceMcGuinness-NOAA
Collaborator

@DavidHuber-NOAA I'll rewind that job, update the CI state database, and reset the label.

@TerrenceMcGuinness-NOAA
Collaborator

mterry (Orion-login-1) C384C192_hybatmda_941f419a $ rocotorewind -v 10 -d C384C192_hybatmda_941f419a.db -w C384C192_hybatmda_941f419a.xml -c 202304020000 -t gdasanal
02/12/24 11:44:32 CST :: C384C192_hybatmda_941f419a.xml :: Cycle 202304020000, Task gdasanal, jobid=16548531, in state DEAD (FAILED), ran for 88.0 seconds, exit status=9, try=2 (of 2)
02/12/24 11:44:32 CST :: C384C192_hybatmda_941f419a.xml :: Cycle 202304020000, Task gfsanal, jobid=16548532, in state DEAD (FAILED), ran for 29.0 seconds, exit status=9, try=2 (of 2)
202304020000: Rewind tasks for 202304020000 in state "activated" since 2024-02-10 09:30:12
202304020000: gdasanal: rewinding dead job.
202304020000: gdasanal: deleting all records of this job.
mterry (Orion-login-1) C384C192_hybatmda_941f419a $ 
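
The session above rewinds only `gdasanal`, although `gfsanal` is also reported as DEAD. The following is a hedged Python sketch that assembles one `rocotorewind` command per task, using only the flags shown in the session (`-v`, `-d`, `-w`, `-c`, `-t`). The helper name `rewind_commands` is hypothetical, the `<experiment>.db` and `<experiment>.xml` file naming is inferred from this one session, and the sketch only prints the commands rather than running them.

```python
import shlex


def rewind_commands(experiment, cycle, tasks, verbosity=10):
    """Build one rocotorewind argument list per task.

    Assumes the database and workflow files are named <experiment>.db and
    <experiment>.xml in the current directory, as in the session above.
    """
    commands = []
    for task in tasks:
        commands.append([
            "rocotorewind", "-v", str(verbosity),
            "-d", f"{experiment}.db",
            "-w", f"{experiment}.xml",
            "-c", cycle,
            "-t", task,
        ])
    return commands


commands = rewind_commands(
    "C384C192_hybatmda_941f419a", "202304020000", ["gdasanal", "gfsanal"]
)
for cmd in commands:
    # Print for review; a maintainer would decide whether to run each one.
    print(shlex.join(cmd))
```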

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA added CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress and removed CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed labels Feb 12, 2024
@emcbot emcbot added CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed and removed CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress labels Feb 12, 2024
@emcbot
Author

emcbot commented Feb 12, 2024

Experiment C384_atm3DVar_941f419a *** FAILED *** on Orion
Experiment C384_atm3DVar_941f419a with 1 tasks failed at 02/10/24 04:00:17 AM
Error logs:
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040118/gdasfcst.log
Experiment C384_atm3DVar_941f419a *** FAILED *** on Orion
Experiment C384_atm3DVar_941f419a with 2 tasks failed at 02/12/24 11:50:18 AM
Error logs:
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040118/gdasanal.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040118/gdasfcst.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040200/gdasanal.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040200/gdasfcst.log

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA added CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress and removed CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed labels Feb 12, 2024
@emcbot emcbot added CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed and removed CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress labels Feb 12, 2024
@emcbot
Author

emcbot commented Feb 12, 2024

Experiment C384_atm3DVar_941f419a  *** FAILED *** on Orion
Experiment C384_atm3DVar_941f419a  with 1 tasks failed at 02/10/24 04:00:17 AM
Error logs:
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040118/gdasfcst.log
Experiment C384_atm3DVar_941f419a  *** FAILED *** on Orion
Experiment C384_atm3DVar_941f419a  with 2 tasks failed at 02/12/24 11:50:18 AM
Error logs:
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040118/gdasanal.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040118/gdasfcst.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040200/gdasanal.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384_atm3DVar_941f419a/logs/2023040200/gdasfcst.log
Experiment C384C192_hybatmda_941f419a  *** FAILED *** on Orion
Experiment C384C192_hybatmda_941f419a  with 2 tasks failed at 02/12/24 12:20:47 PM
Error logs:
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384C192_hybatmda_941f419a/logs/2023040118/gdasfcst.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384C192_hybatmda_941f419a/logs/2023040118/gfsanal.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384C192_hybatmda_941f419a/logs/2023040200/gdasfcst.log
/work2/noaa/stmp/GFS_CI_ROOT/ORION/PR/2305/RUNTESTS/COMROOT/C384C192_hybatmda_941f419a/logs/2023040200/gfsanal.log

Labels
CI/CD Issue related to CI/CD CI-Hera-Passed **Bot use only** CI testing on Hera for this PR has completed successfully CI-Orion-Failed **Bot use only** CI testing on Orion for this PR has failed

4 participants