Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stat Analysis Implementation for global-workflow #8

Open
wants to merge 47 commits into
base: develop
Choose a base branch
from

Conversation

kevindougherty-noaa
Copy link
Collaborator

Below are the appropriate changes to run a stat analysis job in the global-workflow.

Additional changes to GDASApp are also needed for jcb-gdas and jcb-algorithms for these scripts to run completely successfully.

@kevindougherty-noaa
Copy link
Collaborator Author

@CoryMartin-NOAA you will need to add Ed and Andrew to be able to review whenever you get a chance.

@CoryMartin-NOAA
Copy link
Owner

@ADCollard @EdwardSafford-NOAA feel free to take a look at this. I'll do the same and once we've done a review/test, we will open the PR with the authoritative global-workflow repo.

This was referenced Dec 16, 2024
@kevindougherty-noaa kevindougherty-noaa marked this pull request as ready for review January 28, 2025 16:29
Comment on lines 123 to 134
# Get list of .nc4 files
# obs_space_paths = glob.glob(os.path.join(ob_dir_str, "*.{nc,nc4}")) # THIS SHOULD WORK BUT ISNT, glob patterns introduced in Python 3.9
nc_paths = glob.glob(os.path.join(ob_dir_str, "*.nc"))
nc4_paths = glob.glob(os.path.join(ob_dir_str, "*.nc4"))
obs_space_paths = nc_paths + nc4_paths

# Temporary. Create condition check here for available jcb algorithms?
if OB == 'snow':
obs_space_paths = glob.glob(os.path.join(ob_dir_str, "diag_ims_snow_*.nc"))

# This grabs the obspace string from the .nc4 files, however not all are perfect. Need solution.
self.task_config.OBSPACES_LIST = ['_'.join(os.path.basename(path).split('_')[1:3]) for path in obs_space_paths]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cory and I were discussing this offline. Maybe would be best to have some sort of configuration file to avoid grabbing the wrong files. Also not all file names follow the same format mentioned in comment on line 133. This would remove the temporary fix on lines 130-131.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is too rigid and prone to failure, this and the above globs are risky. We should have a list of files that are expected based on some configuration file

Comment on lines 112 to 121
# Gunzip .nc files
logger.info("Gunzip files from tar file")
gz_files = glob.glob(os.path.join(ob_dir_str, "*.gz"))
logger.info(f"Gunzip files: {gz_files}")

for diagfile in gz_files:
output_file = os.path.join(ob_dir_str, os.path.basename(diagfile)[:-3])
with gzip.open(diagfile, 'rb') as f_in:
with open(output_file, 'wb') as f_out:
f_out.write(f_in.read())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I need to have a check after this to make sure there are files in the tar file. If not, move on to next ob. What are you thoughts?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a check if it is non-empty is necessary

@@ -105,6 +105,11 @@ elif [[ "${step}" = "atmanlfv3inc" ]]; then
export NTHREADS_ATMANLFV3INC=${NTHREADSmax}
export APRUN_ATMANLFV3INC="${APRUN_default} --cpus-per-task=${NTHREADS_ATMANLFV3INC}"

elif [[ "${step}" = "anlstat" ]]; then
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will need similar entries for the other machines when this goes into a PR, not just HERA

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevindougherty-noaa we need FOO.env files for each supported machine (GAEA, HERCULES, ORION, WCOSS2) not just HERA

Copy link
Owner

@CoryMartin-NOAA CoryMartin-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass of comments, @kevindougherty-noaa let me know if you need any clarifications or want to discuss anything

##############################################

# Generate COM variables from templates
YMD=${PDY} HH=${cyc} declare_from_tmpl -rx COM_OBS COM_CHEM_ANALYSIS
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Workflow will need these modified to be COMIN not COM, also I suggest a cleaner declaration so that we can add each 'component' on a new line.

cat "${pgmout}"
fi

exit 0
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no EOL

export JCB_BASE_YAML="${PARMgfs}/gdas/stat/aero/jcb-base.yaml.j2"
export JCB_ALGO_YAML="${PARMgfs}/gdas/jcb-algorithms/anlstat.yaml.j2"
export JEDIEXE=${HOMEgfs}/sorc/gdas.cd/build/bin/ioda-stats.x
export STAT_OUTDIR="${COMOUT}/${net}/${run.yyyymmdd}/${hh}/products/stats/"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this gets replaced by COMOUT_ATMOS_ANALYSIS_STATS etc. or something similar

export JEDIEXE=${HOMEgfs}/sorc/gdas.cd/build/bin/ioda-stats.x
export STAT_OUTDIR="${COMOUT}/${net}/${run.yyyymmdd}/${hh}/products/stats/"

echo "END: config.anlstat"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no EOL

'npx_ges': _res + 1,
'npy_ges': _res + 1,
'npz_ges': self.task_config.LEVS - 1,
'npz': self.task_config.LEVS - 1,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we care about the number of levels?


# Loop through ob space list
for OB in self.task_config.STAT_OBS:
logger.info(f"Working on current observation: {OB}")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is snow and aero right? then it's not an 'observation'

Copy link
Collaborator Author

@kevindougherty-noaa kevindougherty-noaa Jan 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be a good replacement for this? Maybe anl to fit with your above suggestion?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I'd suggest something like f"Working on current component: {anl}"

Comment on lines 112 to 121
# Gunzip .nc files
logger.info("Gunzip files from tar file")
gz_files = glob.glob(os.path.join(ob_dir_str, "*.gz"))
logger.info(f"Gunzip files: {gz_files}")

for diagfile in gz_files:
output_file = os.path.join(ob_dir_str, os.path.basename(diagfile)[:-3])
with gzip.open(diagfile, 'rb') as f_in:
with open(output_file, 'wb') as f_out:
f_out.write(f_in.read())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a check if it is non-empty is necessary

Comment on lines 123 to 134
# Get list of .nc4 files
# obs_space_paths = glob.glob(os.path.join(ob_dir_str, "*.{nc,nc4}")) # THIS SHOULD WORK BUT ISNT, glob patterns introduced in Python 3.9
nc_paths = glob.glob(os.path.join(ob_dir_str, "*.nc"))
nc4_paths = glob.glob(os.path.join(ob_dir_str, "*.nc4"))
obs_space_paths = nc_paths + nc4_paths

# Temporary. Create condition check here for available jcb algorithms?
if OB == 'snow':
obs_space_paths = glob.glob(os.path.join(ob_dir_str, "diag_ims_snow_*.nc"))

# This grabs the obspace string from the .nc4 files, however not all are perfect. Need solution.
self.task_config.OBSPACES_LIST = ['_'.join(os.path.basename(path).split('_')[1:3]) for path in obs_space_paths]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is too rigid and prone to failure, this and the above globs are risky. We should have a list of files that are expected based on some configuration file


@logit(logger)
def finalize(self, jedi_dict_key: str) -> None:
"""Finalize a statistic analysis
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to reword these, as 'finalize a statistic analysis' doesn't make sense

@CoryMartin-NOAA
Copy link
Owner

tagging @EdwardSafford-NOAA for awareness

@CoryMartin-NOAA
Copy link
Owner

@RussTreadon-NOAA FYI

@kevindougherty-noaa
Copy link
Collaborator Author

@CoryMartin-NOAA @EdwardSafford-NOAA I have addressed the following requests made for the accompanying scripts. However, I would like to open the discussion on how best to handle Cory's comment on how a config file should be created for the issue in the stat_analysis.py file. Where and how should this config file be stored? What is the best method to use this? Like Cory mentioned, globs can be risky and all the files do not follow the same naming structures. What are your thoughts?

@CoryMartin-NOAA
Copy link
Owner

@kevindougherty-noaa can you put a YAML in the config directory that is something like this:

atmos:
  obs spaces:
  - name:
     input file:
     config yaml:
aero:
  obs spaces:
  - name:
     input file:
     config yaml:

that then gets templated by Jinja? Would that be viable?

Comment on lines 16 to 21
YMD=${PDY} HH=${cyc} declare_from_tmpl -rx COMIN_OBS:COM_OBS_TMPL \
COMOUT_CONF:COM_CONF_TMPL \
COMIN_ATMOS_ANALYSIS:COM_ATMOS_ANALYSIS_TMPL \
COMIN_OCEAN_ANALYSIS:COM_OCEAN_ANALYSIS_TMPL \
COMIN_CHEM_ANALYSIS:COM_CHEM_ANALYSIS_TMPL \
COMIN_SNOW_ANALYSIS:COM_SNOW_ANALYSIS_TMPL
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to double check to make sure these work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants