[develop] Feature cicd scorecard metric #1079

Merged

Collaborator

@BruceKropp-Raytheon BruceKropp-Raytheon commented Apr 24, 2024

DESCRIPTION OF CHANGES:

  1. Update the CI/CD scripts to include skill-score metric output so that the follow-on metrics collection job can display it on the metrics dashboard.
  2. Update the Jenkinsfile to fix the post() section that calls the follow-on metrics collection job, so that the job is called only once at the end, regardless of whether any platform's builds or tests fail independently.
  3. Update the Jenkinsfile to skip platform Nodes that appear to be offline, rather than putting them in the launch queue. This also means we can re-add the NOAAcloud platforms to the list of possible Nodes to attempt. They will be skipped if they are not online.
  4. Update Jenkinsfile to include timeout limits on Build stage and Test stage, so they don't run forever.
  5. Update Jenkinsfile to allow seeing timestamps in the Jenkins console log.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

Three separate runs to verify SUCCESS even without executing all E2E tests
I. Fundamental Tests
II. skill-score test only
III. none

  • hera.intel - this was passing until a system administrator changed the system module version for nco. A new PR corrects this.
  • hera.gnu
  • orion.intel - Unreliable results. There appear to be underlying compile/module issues on this platform.
  • hercules.intel
  • derecho.intel
  • gaea.intel - tests remain in QUEUED state forever
  • jet.intel - this was passing until Jet went into LFS maintenance
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins - a new Jenkinsfile and CI/CD scripts to include skill-score metric
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

DEPENDENCIES:

DOCUMENTATION:

No new changes to the way the Jenkins job is launched.

ISSUE:

The previous update to the Jenkinsfile added a trigger for another job that collects metrics from all the target platforms. The trigger was initially placed in a final stage of the pipeline, but it needs to be only in the post() section of the pipeline so that it is triggered once at the end.

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain). - No changes to the way the Jenkins job is launched.
  • My changes generate no new warnings - except where noted above in TESTS CONDUCTED section
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test (intel)
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

Edward Snyder

Collaborator

@EdwardSnyder-NOAA EdwardSnyder-NOAA left a comment

Changes look good to me. The srw_test.sh script passed on PW AWS, and the experiment can be found here: /contrib/Edward.Snyder/ss-ci/ufs-srweather-app/expt_dirs/grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0

To run srw_test.sh, these variables were exported:

  • export WORKSPACE=$PWD
  • export SRW_PLATFORM=noaacloud
  • export SRW_COMPILER=intel
  • export SRW_PROJECT=ca-epic
  • export SRW_WE2E_COMPREHENSIVE_TESTS=FALSE
  • export SRW_WE2E_SINGLE_TEST=skill-score
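
As a rough sketch, the exports above could be collected into a small wrapper script. The comment does not say where srw_test.sh lives, so the path `.cicd/scripts/srw_test.sh` below is an assumption, as is the variable name `SRW_TEST_SCRIPT`; the guard only reports when nothing is found at that path.

```shell
#!/usr/bin/env bash
# Sketch only: reproduces the exports listed in the review comment above.

export WORKSPACE="$PWD"                       # root of the checkout
export SRW_PLATFORM=noaacloud
export SRW_COMPILER=intel
export SRW_PROJECT=ca-epic
export SRW_WE2E_COMPREHENSIVE_TESTS=FALSE     # do not run the comprehensive suite
export SRW_WE2E_SINGLE_TEST=skill-score       # run only the skill-score test

# ASSUMPTION: location of the script relative to the checkout; adjust as needed.
SRW_TEST_SCRIPT="${WORKSPACE}/.cicd/scripts/srw_test.sh"

if [ -x "${SRW_TEST_SCRIPT}" ]; then
  "${SRW_TEST_SCRIPT}"
else
  # Nothing is executed when the assumed path does not exist.
  echo "srw_test.sh not found at ${SRW_TEST_SCRIPT}; variables are set but nothing was run" >&2
fi
```

Run from the top of the checkout so that `WORKSPACE` resolves to the repository root.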

Collaborator

@MichaelLueken MichaelLueken left a comment

@BruceKropp-Raytheon -

These changes look good to me. I was also able to successfully run the coverage tests using the Jenkins scripts on Hera Intel:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Peru_12km_20240424161412                            COMPLETE              29.64
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019061200_2024042  COMPLETE               5.91
get_from_HPSS_ics_GDAS_lbcs_GDAS_fmt_netcdf_2022040400_ensemble_2  COMPLETE            1436.48
get_from_HPSS_ics_HRRR_lbcs_RAP_20240424161416                     COMPLETE              14.37
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               5.93
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20  COMPLETE              12.60
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP_20240424161419  COMPLETE               9.85
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2_20240  COMPLETE               6.20
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_202404  COMPLETE             436.04
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_20240424  COMPLETE             577.13
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_HRRR_202404241  COMPLETE             870.05
pregen_grid_orog_sfc_climo_20240424161425                          COMPLETE               7.54
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            3411.74

and Jet:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
community_20240424163221                                           COMPLETE              19.09
custom_ESGgrid_20240424163223                                      COMPLETE              24.73
custom_ESGgrid_Great_Lakes_snow_8km_20240424163225                 COMPLETE              23.12
custom_GFDLgrid_20240424163227                                     COMPLETE              10.40
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018_202404  COMPLETE              11.63
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h_20  COMPLETE              80.78
get_from_HPSS_ics_RAP_lbcs_RAP_20240424163230                      COMPLETE              17.00
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240424163232  COMPLETE             627.43
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20  COMPLETE              65.27
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               9.09
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_2024  COMPLETE             921.15
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1809.69

Approving PR now.

@MichaelLueken MichaelLueken added the run_we2e_jenkins_coverage_tests label (SRW App automated CI testing with modified Jenkinsfile) Apr 25, 2024
@MichaelLueken
Collaborator

The Jenkins tests successfully passed for Derecho, Gaea, Hera GNU, Hera Intel, and Hercules. There are currently issues with the Jenkins runner on Jet, but I was able to successfully run the coverage tests manually on Jet yesterday as part of my review. The nodes on Orion are currently down due to a cooling issue that was encountered last night.

I will now go ahead and merge this PR.

@MichaelLueken MichaelLueken merged commit 527b242 into ufs-community:develop Apr 25, 2024
2 of 3 checks passed
Labels
  • enhancement - New feature or request
  • run_we2e_jenkins_coverage_tests - SRW App automated CI testing with modified Jenkinsfile
3 participants