
Re-implement updated auto-rts for NCAR fork #101

Merged
56 commits merged on Oct 30, 2023

Changes from all commits — 56 commits
e9980c1
add auto-rt to my forked repository
dustinswales Aug 29, 2022
be6f5a6
Initial commit
dustinswales Sep 23, 2022
4e55b1c
Auto-RTs working
dustinswales Oct 4, 2022
c8e86d5
Starting removal of hard-coding of test parameters (machine, workdir).
mkavulich Jul 5, 2023
dea25d9
I think this will work! Adding workdir as a command line argument all…
mkavulich Sep 27, 2023
2bf3cdc
maybe this will work
mkavulich Jul 20, 2023
9869ed6
I somehow did something weird...hopefully now it succeeds.
mkavulich Jul 20, 2023
0cd3f0d
Okay, got to the ECflow step, let's see what happens next
mkavulich Jul 20, 2023
43bb1d4
argh! another dumb error
mkavulich Jul 20, 2023
41c555d
Finally figured out how to manually tests properly...this is close to…
mkavulich Jul 21, 2023
b59b926
One more fix
mkavulich Jul 21, 2023
40e26b5
Updating paths, almost there!
mkavulich Jul 21, 2023
6f2af79
things are compiling and starting to run. Just need to get them to fi…
mkavulich Jul 21, 2023
f9622de
This hack should get it working. A better solution will be to pass th…
mkavulich Jul 21, 2023
2fabe88
Incorrect input data and baseline directories
mkavulich Jul 21, 2023
4c90704
Last bad directory (HAH)
mkavulich Jul 21, 2023
67f9976
Add back label-remover, testing appears to work on Hera!
mkavulich Jul 21, 2023
e5e079c
Update paths for Cheyenne
mkavulich Jul 21, 2023
fcc3a5b
Need to purge module environment, my default environment has conda wh…
mkavulich Jul 25, 2023
2ea3840
Put module purge in wrong place; need it prior to building.
mkavulich Aug 23, 2023
2c59a02
Final fixes to Cheyenne build; now working!
mkavulich Aug 23, 2023
55a3649
Fix stuff that got messed up in rebase
mkavulich Aug 24, 2023
b4dc8dd
Need to specify account, and running rocoto vs ecflow depends on plat…
mkavulich Aug 25, 2023
494c5d5
Apparently Cheyenne *does* support ECFlow, I just missed a different …
mkavulich Aug 25, 2023
c36a056
Provide account as command line argument
mkavulich Aug 25, 2023
231434d
Going rogue here: removing ncar-specific script and going straight fo…
mkavulich Sep 13, 2023
916efdd
Fix usage statement
mkavulich Aug 30, 2023
45a35d3
Restore old flag letters for rt.sh, want to keep things as back-compa…
mkavulich Aug 30, 2023
155c0d1
Pull out all machine-specific settings into their own source files un…
mkavulich Aug 30, 2023
c67ab50
Turns out you could already overwrite the working directory, but only…
mkavulich Aug 30, 2023
3fd441e
Forgot to add new option to getopts list
mkavulich Aug 30, 2023
be2f0be
No reason to try to create STMP directory if we aren't creating a new…
mkavulich Aug 30, 2023
19289b0
Need to define NEW_BASELINE even if it isn't needed *sigh*
mkavulich Aug 30, 2023
5a103d7
Fix specification of baseline and input locations for hera
mkavulich Sep 13, 2023
fa4c938
Fix baseline location
mkavulich Aug 30, 2023
f97d96d
Update for latest hera baseline
mkavulich Aug 30, 2023
92f426c
Working on Hera, needs refinement of python hard-coding
mkavulich Sep 13, 2023
f5d65f1
Paths for Cheyenne
mkavulich Sep 13, 2023
46ec486
Fix Cheyenne data location
mkavulich Sep 14, 2023
0537c1b
Pass through baseline and new baseline locations as command-line argu…
mkavulich Sep 14, 2023
4c6f62a
More improvements and configurability to whole system
mkavulich Sep 15, 2023
560c7d7
Apparently nested `source` calls doesnt work, need to re-think this...
mkavulich Sep 16, 2023
05ba826
Restore some default variables in rt.sh
mkavulich Sep 16, 2023
527b72b
I was wrong, nested sourcing DOES work, I was just passing the wrong …
mkavulich Sep 16, 2023
87f7284
Relative paths will be the death of me
mkavulich Sep 16, 2023
ae013ab
More python-side improvements
mkavulich Sep 20, 2023
7946885
Getting to the final stretch here! Implement yaml config file to supp…
mkavulich Sep 20, 2023
b523dc5
I think I have all the major features that I want! Now to test on Her…
mkavulich Sep 20, 2023
d2620a2
Update input data for Hera
mkavulich Sep 21, 2023
ff76d7c
Handle "workdir" as a job_obj element rather than referencing the
mkavulich Sep 27, 2023
9d5542d
Post-rebase fixes:
mkavulich Sep 27, 2023
5598dd0
Forgot to initialize new variable RTPWD_NEW_BASELINE
mkavulich Sep 27, 2023
6a97b95
pass "new baseline" command-line arg
mkavulich Oct 9, 2023
a85bc65
Fix logfile parsing problem with test failures
mkavulich Oct 10, 2023
fc745c4
Remove module purge per PR comments
mkavulich Oct 11, 2023
055ae96
[AutoRT] hera.intel Job Completed.
mkavulich Oct 11, 2023
19 changes: 19 additions & 0 deletions tests/auto/README.rt_auto.yaml
@@ -0,0 +1,19 @@
# This README describes the format of rt_auto.yaml, which is read by rt_auto.py in order to get git- and GitHub-related settings, and optionally to provide arguments (to avoid having to type long command-line arguments).
# The required section is named 'git', and has two sub-sections, one required, and one optional:
# config (required): Specifies user.email and user.name settings tied to the user's GitHub account
# github (optional): Specifies the repository and base branch (i.e. the destination of the PR) to check
# An example configuration can be found below:
args:
machine: cheyenne
account: P48503002
workdir: /glade/p/ral/jntp/CCPP_regression_testing/NCAR_ufs-weather-model/run/
envfile: machine/cheyenne.ncar # The default machine file is machine/{machine_name}, but you can specify your own
additional_args: -n control_p8 intel # Specify any additional options you'd like to pass to rt.sh; here we specify to only run the "control_p8" test for intel compilers
git:
config:
user.email: [email protected]
user.name: test_user
github:
org: NCAR
repo: ufs-weather-model
base: main
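
For reference, a minimal sketch of how a script such as rt_auto.py could read a file in this format. The use of PyYAML and the variable names below are assumptions for illustration; only the file layout comes from the README above.

import yaml

with open('rt_auto.yaml') as f:
    cfg = yaml.safe_load(f)

args = cfg.get('args', {})             # optional: machine, account, workdir, envfile, additional_args
git_config = cfg['git']['config']      # required: user.email, user.name
github = cfg['git'].get('github', {})  # optional: org, repo, base

print(args.get('machine'), git_config['user.name'], github.get('repo'))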
164 changes: 76 additions & 88 deletions tests/auto/jobs/bl.py
@@ -7,61 +7,77 @@

def run(job_obj):
logger = logging.getLogger('BL/RUN')
workdir, rtbldir, blstore = set_directories(job_obj)
pr_repo_loc, repo_dir_str = clone_pr_repo(job_obj, workdir)
new_baseline, blstore = set_directories(job_obj)
pr_repo_loc, repo_dir_str = clone_pr_repo(job_obj)
bldate = get_bl_date(job_obj, pr_repo_loc)
bldir = f'{blstore}/develop-{bldate}/{job_obj.compiler.upper()}'
bldir = f'{blstore}/main-{bldate}/{job_obj.compiler.upper()}'
bldirbool = check_for_bl_dir(bldir, job_obj)
run_regression_test(job_obj, pr_repo_loc)
post_process(job_obj, pr_repo_loc, repo_dir_str, rtbldir, bldir)
post_process(job_obj, pr_repo_loc, repo_dir_str, new_baseline, bldir, bldate, blstore)


def set_directories(job_obj):
logger = logging.getLogger('BL/SET_DIRECTORIES')
if job_obj.machine == 'hera':
workdir = '/scratch1/NCEPDEV/nems/emc.nemspara/autort/pr'
blstore = '/scratch1/NCEPDEV/nems/emc.nemspara/RT/NEMSfv3gfs'
rtbldir = '/scratch1/NCEPDEV/stmp4/emc.nemspara/FV3_RT/'\
workdir = ''
blstore = ''
new_baseline = ''
machine = job_obj.clargs.machine
if machine == 'hera':
rt_dir = '/scratch1/NCEPDEV/nems/emc.nemspara/'
workdir = f'{rt_dir}/autort/pr'
blstore = f'{rt_dir}/RT/NEMSfv3gfs'
new_baseline = f'{rt_dir}/FV3_RT/'\
f'REGRESSION_TEST_{job_obj.compiler.upper()}'
elif job_obj.machine == 'jet':
workdir = '/lfs4/HFIP/h-nems/emc.nemspara/autort/pr'
blstore = '/lfs4/HFIP/h-nems/emc.nemspara/RT/NEMSfv3gfs/'
rtbldir = '/lfs4/HFIP/h-nems/emc.nemspara/RT_BASELINE/'\
elif machine == 'jet':
rt_dir = '/lfs4/HFIP/h-nems/emc.nemspara/'
workdir = f'{rt_dir}/autort/pr'
blstore = f'{rt_dir}/RT/NEMSfv3gfs'
new_baseline = '{rt_dir}/RT_BASELINE/'\
f'emc.nemspara/FV3_RT/REGRESSION_TEST_{job_obj.compiler.upper()}'
elif job_obj.machine == 'gaea':
elif machine == 'gaea':
workdir = '/lustre/f2/pdata/ncep/emc.nemspara/autort/pr'
blstore = '/lustre/f2/pdata/ncep_shared/emc.nemspara/RT/NEMSfv3gfs'
rtbldir = '/lustre/f2/scratch/emc.nemspara/FV3_RT/'\
new_baseline = '/lustre/f2/scratch/emc.nemspara/FV3_RT/'\
f'REGRESSION_TEST_{job_obj.compiler.upper()}'
elif job_obj.machine == 'orion':
elif machine == 'orion':
workdir = '/work/noaa/nems/emc.nemspara/autort/pr'
blstore = '/work/noaa/nems/emc.nemspara/RT/NEMSfv3gfs'
rtbldir = '/work/noaa/stmp/bcurtis/stmp/bcurtis/FV3_RT/'\
new_baseline = '/work/noaa/stmp/bcurtis/stmp/bcurtis/FV3_RT/'\
f'REGRESSION_TEST_{job_obj.compiler.upper()}'
elif job_obj.machine == 'cheyenne':
elif machine == 'cheyenne':
workdir = '/glade/scratch/dtcufsrt/autort/tests/auto/pr'
blstore = '/glade/p/ral/jntp/GMTB/ufs-weather-model/RT/NEMSfv3gfs'
rtbldir = '/glade/scratch/dtcufsrt/FV3_RT/'\
new_baseline = '/glade/scratch/dtcufsrt/FV3_RT/'\
f'REGRESSION_TEST_{job_obj.compiler.upper()}'
else:
logger.critical(f'Machine {job_obj.machine} is not supported for this job')
raise KeyError

logger.info(f'machine: {job_obj.machine}')
logger.info(f'workdir: {workdir}')
if not job_obj.clargs.workdir:
job_obj.workdir = workdir
if job_obj.clargs.baseline:
blstore = job_obj.clargs.baseline
if job_obj.clargs.new_baseline:
new_baseline = job_obj.clargs.new_baseline

logger.info(f'machine: {machine}')
logger.info(f'workdir: {job_obj.workdir}')
logger.info(f'blstore: {blstore}')
logger.info(f'rtbldir: {rtbldir}')
logger.info(f'new_baseline: {new_baseline}')

if not job_obj.workdir or not blstore or not new_baseline:
logger.critical(f'One of workdir, blstore, or new_baseline has not been specified')
logger.critical(f'Provide these on the command line or specify a supported machine')
raise KeyError

return workdir, rtbldir, blstore

return new_baseline, blstore


def check_for_bl_dir(bldir, job_obj):
logger = logging.getLogger('BL/CHECK_FOR_BL_DIR')
logger.info('Checking if baseline directory exists')
if os.path.exists(bldir):
logger.critical(f'Baseline dir: {bldir} exists. It should not, yet.')
job_obj.comment_text_append(f'{bldir}\n Exists already. '
'It should not yet. Please delete.')
job_obj.comment_text_append(f'[BL] ERROR: Baseline location exists before '
f'creation:\n{bldir}')
raise FileExistsError
return False

@@ -75,39 +91,23 @@ def create_bl_dir(bldir, job_obj):
raise FileNotFoundError


#def get_bl_date(job_obj):
# logger = logging.getLogger('BL/GET_BL_DATE')
# for line in job_obj.preq_dict['preq'].body.splitlines():
# if 'BL_DATE:' in line:
# bldate = line
# bldate = bldate.replace('BL_DATE:', '')
# bldate = bldate.replace(' ', '')
# if len(bldate) != 8:
# print(f'Date: {bldate} is not formatted YYYYMMDD')
# raise ValueError
# logger.info(f'BL_DATE: {bldate}')
# bl_format = '%Y%m%d'
# try:
# datetime.datetime.strptime(bldate, bl_format)
# except ValueError:
# logger.info(f'Date {bldate} is not formatted YYYYMMDD')
# raise ValueError
# return bldate
# logger.critical('"BL_DATE:YYYYMMDD" needs to be in the PR body.'\
# 'On its own line. Stopping')
# raise ValueError


def run_regression_test(job_obj, pr_repo_loc):
logger = logging.getLogger('BL/RUN_REGRESSION_TEST')

rt_command = 'cd tests'
rt_command += f' && export RT_COMPILER="{job_obj.compiler}"'
if job_obj.workdir:
rt_command += f' && export RUNDIR_ROOT={job_obj.workdir}'
if job_obj.clargs.new_baseline:
rt_command += f' && export NEW_BASELINE={job_obj.clargs.new_baseline}'
rt_command += f' && /bin/bash --login ./rt.sh -e -a {job_obj.clargs.account} -c -p {job_obj.clargs.machine} -n control_p8 intel'
if job_obj.compiler == 'gnu':
rt_command = [[f'export RT_COMPILER="{job_obj.compiler}" && cd tests '
'&& /bin/bash --login ./rt.sh -e -c -l rt_gnu.conf',
pr_repo_loc]]
elif job_obj.compiler == 'intel':
rt_command = [[f'export RT_COMPILER="{job_obj.compiler}" && cd tests '
'&& /bin/bash --login ./rt.sh -e -c', pr_repo_loc]]
job_obj.run_commands(logger, rt_command)
rt_command += f' -l rt_gnu.conf'
if job_obj.clargs.envfile:
rt_command += f' -s {job_obj.clargs.envfile}'
rt_command += f' {job_obj.clargs.additional_args}'

job_obj.run_commands(logger, [[rt_command, pr_repo_loc]])


def remove_pr_data(job_obj, pr_repo_loc, repo_dir_str, rt_dir):
@@ -119,28 +119,27 @@ def remove_pr_data(job_obj, pr_repo_loc, repo_dir_str, rt_dir):
job_obj.run_commands(logger, rm_command)


def clone_pr_repo(job_obj, workdir):
def clone_pr_repo(job_obj):
''' clone the GitHub pull request repo, via command line '''
logger = logging.getLogger('BL/CLONE_PR_REPO')
repo_name = job_obj.preq_dict['preq'].head.repo.name
branch = job_obj.preq_dict['preq'].head.ref
git_url = job_obj.preq_dict['preq'].head.repo.html_url.split('//')
git_url = f'{git_url[0]}//${{ghapitoken}}@{git_url[1]}'
logger.debug(f'GIT URL: {git_url}')
git_ssh_url = job_obj.preq_dict['preq'].head.repo.ssh_url
logger.debug(f'GIT SSH_URL: {git_ssh_url}')
logger.info('Starting repo clone')
repo_dir_str = f'{workdir}/'\
repo_dir_str = f'{job_obj.workdir}/'\
f'{str(job_obj.preq_dict["preq"].id)}/'\
f'{datetime.datetime.now().strftime("%Y%m%d%H%M%S")}'
pr_repo_loc = f'{repo_dir_str}/{repo_name}'
job_obj.comment_text_append(f'Repo location: {pr_repo_loc}')
job_obj.comment_text_append(f'[BL] Repo location: {pr_repo_loc}')
create_repo_commands = [
[f'mkdir -p "{repo_dir_str}"', os.getcwd()],
[f'git clone -b {branch} {git_url}', repo_dir_str],
[f'git clone -b {branch} {git_ssh_url}', repo_dir_str],
['git submodule update --init --recursive',
f'{repo_dir_str}/{repo_name}'],
['git config user.email "[email protected]"',
[f'git config user.email {job_obj.gitargs["config"]["user.email"]}',
f'{repo_dir_str}/{repo_name}'],
['git config user.name "Brian Curtis"',
[f'git config user.name {job_obj.gitargs["config"]["user.name"]}',
f'{repo_dir_str}/{repo_name}']
]

@@ -150,37 +149,31 @@ def clone_pr_repo(job_obj, workdir):
return pr_repo_loc, repo_dir_str


def post_process(job_obj, pr_repo_loc, repo_dir_str, rtbldir, bldir):
def post_process(job_obj, pr_repo_loc, repo_dir_str, new_baseline, bldir, bldate, blstore):
logger = logging.getLogger('BL/MOVE_RT_LOGS')
rt_log = f'tests/RegressionTests_{job_obj.machine}'\
f'.{job_obj.compiler}.log'
rt_log = f'tests/logs/RegressionTests_{job_obj.clargs.machine}.log'
filepath = f'{pr_repo_loc}/{rt_log}'
rt_dir, logfile_pass = process_logfile(job_obj, filepath)
if logfile_pass:
create_bl_dir(bldir, job_obj)
move_bl_command = [[f'mv {rtbldir}/* {bldir}/', pr_repo_loc]]
if job_obj.machine == 'orion':
move_bl_command.append([f'/bin/bash --login adjust_permissions.sh orion develop-{bldate}', blstore])
move_bl_command = [[f'mv {new_baseline}/* {bldir}/', pr_repo_loc]]
job_obj.run_commands(logger, move_bl_command)
job_obj.comment_text_append('Baseline creation and move successful')
job_obj.comment_text_append('[BL] Baseline creation and move successful')
logger.info('Starting RT Job')
rt.run(job_obj)
logger.info('Finished with RT Job')
remove_pr_data(job_obj, pr_repo_loc, repo_dir_str, rt_dir)


def get_bl_date(job_obj, pr_repo_loc):
logger = logging.getLogger('BL/UPDATE_RT_SH')
logger = logging.getLogger('BL/UPDATE_RT_NCAR_SH')
BLDATEFOUND = False
with open(f'{pr_repo_loc}/tests/rt.sh', 'r') as f:
with open(f'{pr_repo_loc}/tests/bl_date.ncar.conf', 'r') as f:
for line in f:
if 'BL_DATE=' in line:
logger.info('Found BL_DATE in line')
BLDATEFOUND = True
bldate = line
bldate = line.split('=')[1].strip()
bldate = bldate.rstrip('\n')
bldate = bldate.replace('BL_DATE=', '')
bldate = bldate.strip(' ')
logger.info(f'bldate is "{bldate}"')
logger.info(f'Type bldate: {type(bldate)}')
bl_format = '%Y%m%d'
@@ -190,9 +183,7 @@ def get_bl_date(job_obj, pr_repo_loc):
logger.info(f'Date {bldate} is not formatted YYYYMMDD')
raise ValueError
if not BLDATEFOUND:
job_obj.comment_text_append('BL_DATE not found in rt.sh.'
'Please manually edit rt.sh '
'with BL_DATE={bldate}')
job_obj.comment_text_append('[BL] ERROR: Variable "BL_DATE" not found in rt.sh.')
job_obj.job_failed(logger, 'get_bl_date()')
logger.info('Finished get_bl_date')

@@ -208,20 +199,17 @@ def process_logfile(job_obj, logfile):
for line in f:
if all(x in line for x in fail_string_list):
# if 'FAIL' in line and 'Test' in line:
job_obj.comment_text_append(f'{line.rstrip(chr(10))}')
job_obj.comment_text_append(f'[BL] Error: {line.rstrip(chr(10))}')
elif 'working dir' in line and not rt_dir:
logger.info(f'Found "working dir" in line: {line}')
rt_dir = os.path.split(line.split()[-1])[0]
logger.info(f'It is: {rt_dir}')
job_obj.comment_text_append(f'Please manually delete: '
f'{rt_dir}')
elif 'SUCCESSFUL' in line:
logger.info('RT Successful')
return rt_dir, True
logger.critical(f'Log file exists but is not complete')
job_obj.job_failed(logger, f'{job_obj.preq_dict["action"]}')
else:
logger.critical(f'Could not find {job_obj.machine}'
f'.{job_obj.compiler} '
f'{job_obj.preq_dict["action"]} log')
logger.critical(f'Could not find {job_obj.clargs.machine}.{job_obj.compiler} '
f'{job_obj.preq_dict["action"]} log: {logfile}')
raise FileNotFoundError
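
Taken together, the updated get_bl_date() reads the baseline date from tests/bl_date.ncar.conf rather than rt.sh. A self-contained sketch of that parsing, based only on the diff above (the file name and the "BL_DATE=YYYYMMDD" format come from the code shown; error handling beyond the date check is illustrative):

import datetime

def read_bl_date(conf_path):
    """Return the YYYYMMDD baseline date found in a bl_date.ncar.conf-style file."""
    with open(conf_path) as f:
        for line in f:
            if 'BL_DATE=' in line:
                bldate = line.split('=')[1].strip()
                # strptime raises ValueError if the date is not formatted YYYYMMDD
                datetime.datetime.strptime(bldate, '%Y%m%d')
                return bldate
    raise ValueError(f'BL_DATE not found in {conf_path}')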