-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stat Analysis Implementation for global-workflow #8
base: develop
Are you sure you want to change the base?
Conversation
@CoryMartin-NOAA you will need to add Ed and Andrew to be able to review whenever you get a chance. |
@ADCollard @EdwardSafford-NOAA feel free to take a look at this. I'll do the same and once we've done a review/test, we will open the PR with the authoritative global-workflow repo. |
This reverts commit 899b20b.
# Get list of .nc4 files | ||
# obs_space_paths = glob.glob(os.path.join(ob_dir_str, "*.{nc,nc4}")) # THIS SHOULD WORK BUT ISNT, glob patterns introduced in Python 3.9 | ||
nc_paths = glob.glob(os.path.join(ob_dir_str, "*.nc")) | ||
nc4_paths = glob.glob(os.path.join(ob_dir_str, "*.nc4")) | ||
obs_space_paths = nc_paths + nc4_paths | ||
|
||
# Temporary. Create condition check here for available jcb algorithms? | ||
if OB == 'snow': | ||
obs_space_paths = glob.glob(os.path.join(ob_dir_str, "diag_ims_snow_*.nc")) | ||
|
||
# This grabs the obspace string from the .nc4 files, however not all are perfect. Need solution. | ||
self.task_config.OBSPACES_LIST = ['_'.join(os.path.basename(path).split('_')[1:3]) for path in obs_space_paths] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cory and I were discussing this offline. Maybe would be best to have some sort of configuration file to avoid grabbing the wrong files. Also not all file names follow the same format mentioned in comment on line 133. This would remove the temporary fix on lines 130-131.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is too rigid and prone to failure, this and the above globs are risky. We should have a list of files that are expected based on some configuration file
# Gunzip .nc files | ||
logger.info("Gunzip files from tar file") | ||
gz_files = glob.glob(os.path.join(ob_dir_str, "*.gz")) | ||
logger.info(f"Gunzip files: {gz_files}") | ||
|
||
for diagfile in gz_files: | ||
output_file = os.path.join(ob_dir_str, os.path.basename(diagfile)[:-3]) | ||
with gzip.open(diagfile, 'rb') as f_in: | ||
with open(output_file, 'wb') as f_out: | ||
f_out.write(f_in.read()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I need to have a check after this to make sure there are files in the tar file. If not, move on to next ob. What are you thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a check if it is non-empty is necessary
@@ -105,6 +105,11 @@ elif [[ "${step}" = "atmanlfv3inc" ]]; then | |||
export NTHREADS_ATMANLFV3INC=${NTHREADSmax} | |||
export APRUN_ATMANLFV3INC="${APRUN_default} --cpus-per-task=${NTHREADS_ATMANLFV3INC}" | |||
|
|||
elif [[ "${step}" = "anlstat" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will need similar entries for the other machines when this goes into a PR, not just HERA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevindougherty-noaa we need FOO.env files for each supported machine (GAEA, HERCULES, ORION, WCOSS2) not just HERA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass of comments, @kevindougherty-noaa let me know if you need any clarifications or want to discuss anything
jobs/JGLOBAL_ANALYSIS_STATS
Outdated
############################################## | ||
|
||
# Generate COM variables from templates | ||
YMD=${PDY} HH=${cyc} declare_from_tmpl -rx COM_OBS COM_CHEM_ANALYSIS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Workflow will need these modified to be COMIN
not COM
, also I suggest a cleaner declaration so that we can add each 'component' on a new line.
jobs/JGLOBAL_ANALYSIS_STATS
Outdated
cat "${pgmout}" | ||
fi | ||
|
||
exit 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no EOL
parm/config/gfs/config.anlstat
Outdated
export JCB_BASE_YAML="${PARMgfs}/gdas/stat/aero/jcb-base.yaml.j2" | ||
export JCB_ALGO_YAML="${PARMgfs}/gdas/jcb-algorithms/anlstat.yaml.j2" | ||
export JEDIEXE=${HOMEgfs}/sorc/gdas.cd/build/bin/ioda-stats.x | ||
export STAT_OUTDIR="${COMOUT}/${net}/${run.yyyymmdd}/${hh}/products/stats/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this gets replaced by COMOUT_ATMOS_ANALYSIS_STATS
etc. or something similar
parm/config/gfs/config.anlstat
Outdated
export JEDIEXE=${HOMEgfs}/sorc/gdas.cd/build/bin/ioda-stats.x | ||
export STAT_OUTDIR="${COMOUT}/${net}/${run.yyyymmdd}/${hh}/products/stats/" | ||
|
||
echo "END: config.anlstat" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no EOL
'npx_ges': _res + 1, | ||
'npy_ges': _res + 1, | ||
'npz_ges': self.task_config.LEVS - 1, | ||
'npz': self.task_config.LEVS - 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we care about the number of levels?
|
||
# Loop through ob space list | ||
for OB in self.task_config.STAT_OBS: | ||
logger.info(f"Working on current observation: {OB}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is snow and aero right? then it's not an 'observation'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be a good replacement for this? Maybe anl
to fit with your above suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I'd suggest something like f"Working on current component: {anl}"
# Gunzip .nc files | ||
logger.info("Gunzip files from tar file") | ||
gz_files = glob.glob(os.path.join(ob_dir_str, "*.gz")) | ||
logger.info(f"Gunzip files: {gz_files}") | ||
|
||
for diagfile in gz_files: | ||
output_file = os.path.join(ob_dir_str, os.path.basename(diagfile)[:-3]) | ||
with gzip.open(diagfile, 'rb') as f_in: | ||
with open(output_file, 'wb') as f_out: | ||
f_out.write(f_in.read()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a check if it is non-empty is necessary
# Get list of .nc4 files | ||
# obs_space_paths = glob.glob(os.path.join(ob_dir_str, "*.{nc,nc4}")) # THIS SHOULD WORK BUT ISNT, glob patterns introduced in Python 3.9 | ||
nc_paths = glob.glob(os.path.join(ob_dir_str, "*.nc")) | ||
nc4_paths = glob.glob(os.path.join(ob_dir_str, "*.nc4")) | ||
obs_space_paths = nc_paths + nc4_paths | ||
|
||
# Temporary. Create condition check here for available jcb algorithms? | ||
if OB == 'snow': | ||
obs_space_paths = glob.glob(os.path.join(ob_dir_str, "diag_ims_snow_*.nc")) | ||
|
||
# This grabs the obspace string from the .nc4 files, however not all are perfect. Need solution. | ||
self.task_config.OBSPACES_LIST = ['_'.join(os.path.basename(path).split('_')[1:3]) for path in obs_space_paths] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is too rigid and prone to failure, this and the above globs are risky. We should have a list of files that are expected based on some configuration file
|
||
@logit(logger) | ||
def finalize(self, jedi_dict_key: str) -> None: | ||
"""Finalize a statistic analysis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to reword these, as 'finalize a statistic analysis' doesn't make sense
tagging @EdwardSafford-NOAA for awareness |
Co-authored-by: Cory Martin <[email protected]>
Co-authored-by: Cory Martin <[email protected]>
Co-authored-by: Cory Martin <[email protected]>
@CoryMartin-NOAA @EdwardSafford-NOAA I have addressed the following requests made for the accompanying scripts. However, I would like to open the discussion on how best to handle Cory's comment on how a config file should be created for the issue in the |
@kevindougherty-noaa can you put a YAML in the config directory that is something like this:
that then gets templated by Jinja? Would that be viable? |
jobs/JGLOBAL_ANALYSIS_STATS
Outdated
YMD=${PDY} HH=${cyc} declare_from_tmpl -rx COMIN_OBS:COM_OBS_TMPL \ | ||
COMOUT_CONF:COM_CONF_TMPL \ | ||
COMIN_ATMOS_ANALYSIS:COM_ATMOS_ANALYSIS_TMPL \ | ||
COMIN_OCEAN_ANALYSIS:COM_OCEAN_ANALYSIS_TMPL \ | ||
COMIN_CHEM_ANALYSIS:COM_CHEM_ANALYSIS_TMPL \ | ||
COMIN_SNOW_ANALYSIS:COM_SNOW_ANALYSIS_TMPL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to double check to make sure these work.
Below are the appropriate changes to run a stat analysis job in the global-workflow.
Additional changes to GDASApp are also needed for
jcb-gdas
andjcb-algorithms
for these scripts to run completely successfully.