cam6_3_118: Integrating HEMCO emissions component in CESM #560

Merged


@jimmielin (Member) commented Apr 5, 2022

Please refer to the companion issue #504 for previous discussion.

An updated code description, along with compile and run instructions, is given below:

Intro

HEMCO is a versatile emissions and data component whose primary purpose is processing emissions data online, with inventory selection, masking, and scaling performed as described by a configuration file (see Lin et al., 2021, Figure 2). Emissions data are ingested from individual netCDF files for different inventories and combined into a collection of 3-D emissions containers for each chemical species. HEMCO is also capable of processing generic data containers that are used to provide data (e.g., UV albedo) for GEOS-Chem chemistry within CESM (working PR: #484).

The motivation for implementing HEMCO as an online emissions data source is to remove the need to reprocess offline emission datasets for each simulation, instead allowing users to select emission inventories on the fly, set their hierarchy/precedence and effective geographical region, and apply scale factors online. The original netCDF input data do not need to be modified; subsetting and regridding to the target simulation domain are performed by HEMCO automatically.
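To make the selection, hierarchy, masking, and scaling concrete, here is a minimal Python sketch (not HEMCO's Fortran code) of how several inventories for one species might be combined. It assumes a simplified version of HEMCO's rules: within a category, a higher-hierarchy inventory overwrites lower ones wherever its mask applies, and separate categories are summed. The Inventory class and combine_emissions function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np


@dataclass
class Inventory:
    """One emission inventory for a single species (simplified)."""
    name: str
    category: int                      # different categories are summed
    hierarchy: int                     # higher hierarchy wins within a category
    flux: np.ndarray                   # flux on the HEMCO grid, e.g. kg/m2/s
    scale: float = 1.0                 # uniform online scale factor
    mask: Optional[np.ndarray] = None  # 1 where the inventory applies, else 0


def combine_emissions(inventories: List[Inventory]) -> np.ndarray:
    """Combine inventories by category and hierarchy (illustrative only)."""
    shape = inventories[0].flux.shape
    total = np.zeros(shape)
    for cat in sorted({inv.category for inv in inventories}):
        layer = np.zeros(shape)
        members = [inv for inv in inventories if inv.category == cat]
        # Apply lower hierarchies first so that higher ones overwrite them.
        for inv in sorted(members, key=lambda i: i.hierarchy):
            region = (np.ones(shape, dtype=bool) if inv.mask is None
                      else inv.mask.astype(bool))
            layer = np.where(region, inv.flux * inv.scale, layer)
        total += layer
    return total
```

For example, a global inventory (category 1, hierarchy 1) plus a regional inventory masked to one cell (category 1, hierarchy 2) yields the regional value in that cell and the global value elsewhere; a category-2 inventory is then added on top. Changing an inventory or its scale factor only changes the inputs to this combination, not the netCDF files.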

In its current implementation, HEMCO within CESM can provide emissions and data to GEOS-Chem chemistry (#484) and CAM-chem chemistry.

High level code description

The HEMCO-to-CESM interface (HEMCO_CESM) has been implemented as a module within CAM residing under src/hemco. The most recent version of the code is available at: https://github.com/jimmielin/HEMCO_CESM but will eventually be hosted under ESCOMP/HEMCO_CESM. It includes:

  • the main HEMCO source code as an external sourced from the HEMCO repository maintained by the GEOS-Chem Support Team (https://github.com/geoschem/HEMCO/),
  • a main driver (hemco_interface) designed to be called from control/cam_comp.F90, and
  • supporting routines (hco_cam_exports) designed to export data from the HEMCO interface.

HEMCO in CESM runs on its own separate grid and MPI decomposition (as initialized by hco_esmf_grid), currently limited to a rectilinear latitude-longitude grid (see limitations), and regrids its computed data to the CAM grid as described by cam_physics_mesh in recent versions of CAM.

HEMCO can compute 3-D emissions. ESMF regridding is horizontal and performed layer by layer; vertical regridding is performed only when files are read into HEMCO_CESM.
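As an illustration of layer-by-layer horizontal regridding, the sketch below applies one 2-D weight matrix to every vertical level independently, so the level count is unchanged and no vertical mixing occurs. It is a Python sketch under stated assumptions: the horizontal cells are flattened to one index, a dense NumPy array stands in for the regridding weights that HEMCO_CESM obtains from ESMF, and regrid_layers is a hypothetical name.

```python
import numpy as np


def regrid_layers(field_src, weights):
    """Horizontally regrid a 3-D field of shape (nlev, n_src), level by level.

    weights has shape (n_dst, n_src); row i holds the fractional
    contributions of each source cell to destination cell i.
    """
    nlev = field_src.shape[0]
    field_dst = np.empty((nlev, weights.shape[0]))
    for k in range(nlev):
        # The same horizontal weights are used for every level.
        field_dst[k] = weights @ field_src[k]
    return field_dst
```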

Emissions and data computed are written into the CAM physics buffer (prefixed by HCO_) and retrieved by any CAM component that requires this data.

The version of code submitted as pull request here does not have any GEOS-Chem dependencies and can be considered independently of the #484 PR for GEOS-Chem chemistry.

Requirements

Apart from the main HEMCO source code, the following is required:

  • USE_ESMF_LIB=true is required at compile time, and env_mach_specific.xml needs to point to an ESMF version supporting netCDF I/O (ncdfio, not defio). This may already be the default in recent versions of CAM.

Limitations and points of discussion

  • HEMCO uses its own input module, using "vanilla" netCDF routines. HEMCO supports the implementation of different input modules (e.g., for operation within MAPL for the GEOS model, HEMCO uses ExtData). Due to time limitations it is not certain when HEMCO will support CDEPS as the input module.
  • For HEMCO to replace the current CAM-chem emissions module, the input datasets presently used by CAM-chem for emissions need to be ported to HEMCO as a combination of input files (where these emission inventories are not already available in HEMCO; see Lin et al., 2021, Table 1) and configuration files that combine the inventories into a result consistent with presently used configurations (e.g., CEDS + BB4CMIP6). Responsibility for maintaining the input datasets and configuration files would also need to be discussed.

Build instructions

Build instructions for a test case with CAM-chem support:

  • Check out ESCOMP/CESM at master.
  • Include the modified CAM external: edit Externals.cfg to source CAM from the testing branch, which is based on cam_development (as of now, cam6_3_095):
[cam]
branch = HEMCO-CESM_rebased_on_cam6_3_045
protocol = git 
repo_url = https://github.com/CESM-GC/CAM
local_path = components/cam
externals = Externals_CAM.cfg
required = True
  • ./manage_externals/checkout_externals
  • cd cime/scripts
  • ./create_newcase --case ~/hemco_dev_202201 --compset FCSD_HCO --res f09_f09_mg17 --run-unsupported --mach cheyenne --project project_number_here (change --case and --project as appropriate). HEMCO variants of common chemistry compsets, ending in _HCO, have been implemented; these add %HEMCO to the compset long name.
  • ./case.setup and ./case.build
  • You may also want to empty ext_frc_specifier and srf_emis_specifier in the atm namelist. They are ignored when HEMCO_CESM is enabled, but emptying them allows the pre-run phase to skip detection of these files.
  • Run the case for testing (./case.submit).

Notes

  • Compsets including HEMCO have been added with the suffix _HCO and long name equivalent %HEMCO.
  • HEMCO is now always built with CESM, with minimal impact on compile time; however, the ESMF library must now be included (USE_ESMF_LIB=true).
  • HEMCO is configured as a runtime option using use_hemco = .true. in the physics control namelist.
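The runtime settings above can be scripted. The Python helper below is a hypothetical sketch that appends use_hemco and, optionally, the emptied ext_frc_specifier and srf_emis_specifier entries to a case's user_nl_cam. It assumes the usual CESM convention of writing `name = value` lines in user_nl_cam and letting build-namelist place them in the right namelist group; enable_hemco and HEMCO_SETTINGS are illustrative names.

```python
from pathlib import Path

# Settings described in this PR; emptying the specifiers is optional.
HEMCO_SETTINGS = {
    "use_hemco": ".true.",
    "srf_emis_specifier": "''",
    "ext_frc_specifier": "''",
}


def enable_hemco(case_dir):
    """Append HEMCO settings to <case_dir>/user_nl_cam unless already set."""
    path = Path(case_dir) / "user_nl_cam"
    existing = path.read_text() if path.exists() else ""
    already_set = {line.split("=")[0].strip()
                   for line in existing.splitlines() if "=" in line}
    new_lines = [f"{name} = {value}" for name, value in HEMCO_SETTINGS.items()
                 if name not in already_set]
    with path.open("a") as f:
        if existing and not existing.endswith("\n"):
            f.write("\n")  # keep each setting on its own line
        for line in new_lines:
            f.write(line + "\n")
    return new_lines
```

Running it twice is harmless: the second call finds the settings already present and appends nothing.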

End notes and implementation roadmap

The target branch has been rebased onto cam6_3_095, the latest cam_development tag as of March 7, 2023. Changes include:

  • A runtime option, use_hemco, in the physics control namelist to enable HEMCO at runtime.
  • HEMCO namelist options.
  • _HCO compsets for common chemistry compsets, adding %HEMCO to the compset long name. This long name triggers the -hemco configure option used to set up namelist entries for HEMCO, but these namelist entries can also be added in the future.
  • Everything else is self-contained within the hemco external (in src/hemco) and the HEMCO external (in src/hemco/HEMCO), which contains HEMCO code shared by all models.

We suggest implementing HEMCO within CESM in phases: first, bring HEMCO into CESM so that GEOS-Chem can be operational; then, provide a transitional period in which CAM-chem users can optionally use HEMCO; finally, move all CAM-chem compsets to full HEMCO usage.

Thank you for your help and please do not hesitate to provide any feedback.

…om Haipeng Lin

Feat: Implement HEMCO_CESM within CESM-GC (Initial attempt in syncing CAM sources)
Feat: HEMCO_CESM code integration (full). Add mo_sim_dat and mo_tracname
Fix: Call set_sim_dat from chemistry.F90 to pass solsym to HEMCO_CESM.
Fix: Do not assign other variables within mo_sim_dat -- to discuss
Fix: Comment out mo_sim_dat except solsym, expand solsym size to nTracersMax
*This might not be needed after all as HEMCO_CESM now reads tracer names from chem_mods. But fixing this so it doesn't infinite loop.
@fvitt left a comment

Does HEMCO need to be a configure/build option? Can it be a runtime option? Does HEMCO work when CAM is on an unstructured grid?

bld/namelist_files/namelist_definition.xml (outdated, resolved)
src/chemistry/mozart/chemistry.F90 (outdated, resolved)
src/chemistry/mozart/mo_extfrc.F90 (outdated, resolved)
src/chemistry/mozart/mo_extfrc.F90 (outdated, resolved)
src/chemistry/mozart/mo_gas_phase_chemdr.F90 (outdated, resolved)
src/chemistry/mozart/mo_gas_phase_chemdr.F90 (outdated, resolved)
src/chemistry/mozart/mo_srf_emissions.F90 (outdated, resolved)
src/chemistry/mozart/mo_srf_emissions.F90 (outdated, resolved)
@jimmielin (Member Author) commented Jul 27, 2022

Hi @fvitt, thanks for the review. I'll update the code based on your comments.

Does HEMCO need to be a configure/build option? Can it be a runtime option?

HEMCO can be a runtime option. However, we need to make sure emissions are either handled by mo_extfrc and mo_srf_emissions, or HEMCO, never both, to avoid double-counting of emissions. Should we introduce a runtime option to select which emissions are used?

Note also that GEOS-Chem chemistry works only with HEMCO (@lizziel), so error handling is needed to ensure HEMCO is enabled whenever GEOS-Chem chemistry is used.
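The two rules described here (emissions come from the CAM-chem emission modules or from HEMCO, never both; GEOS-Chem chemistry requires HEMCO) reduce to a small consistency check. This Python sketch is hypothetical: the function name, the chem_package strings "geoschem" and "camchem", and the choice to treat leftover emission specifiers as a warning are assumptions, not CAM's actual error handling.

```python
def check_emissions_config(chem_package, use_hemco,
                           srf_emis_specifier=(), ext_frc_specifier=()):
    """Return (errors, warnings) for an emissions configuration."""
    errors, warnings = [], []
    if chem_package == "geoschem" and not use_hemco:
        # GEOS-Chem chemistry has no emissions path other than HEMCO.
        errors.append("GEOS-Chem chemistry requires use_hemco = .true.")
    if use_hemco and (srf_emis_specifier or ext_frc_specifier):
        # With HEMCO on, the specifiers are ignored (no double-counting),
        # so flag them to tell the user they have no effect.
        warnings.append("srf_emis_specifier/ext_frc_specifier are ignored "
                        "when use_hemco = .true.")
    return errors, warnings
```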

Does HEMCO work when CAM is on an unstructured grid?

Yes; it has been tested with the ne30 unstructured grid, and with f09_f09_mg17 and f19_f19_mg17 for regular grids.

Regarding comments in review:

Should use pbuf passed into subroutine rather than a module level pointer to pbuf2d
and
Rather than cluttering up mo_extfrc with HEMCO specific code, it would be preferable to have a HEMCO specific module for setting forcings.

I will add a new HEMCO module to be included in chemistry for purposes of providing emissions to CAM-chem. The source changes would then be in mo_setext.F90 and chemistry.F90 to skip calls to extfrc_set and set_srf_emissions_time and retrieve fluxes from HEMCO instead.

I will commit updates to reflect this in a few days. Thanks!

@cacraigucar cacraigucar changed the title [WIP] Integrating HEMCO emissions component in CESM Integrating HEMCO emissions component in CESM Jul 27, 2022
@cacraigucar cacraigucar requested a review from gold2718 August 2, 2022 15:33
@jimmielin (Member Author)

Thanks for your comments. I have updated the code to reflect requested changes.

HEMCO-specific code that retrieves emissions from the physics buffer is now in a separate module, src/chemistry/mozart/hco_cc_emissions.F90. This module implements two subroutines:

  • subroutine hco_set_srf_emissions( lchnk, ncol, sflx, pbuf )
  • subroutine hco_set_extfrc( lchnk, zint, frcing, ncol, pbuf )

These replace set_srf_emissions (from mo_srf_emissions.F90) and extfrc_set (from mo_extfrc.F90) when the compile option HEMCO_CESM is set, which prevents double-counting of emissions.
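The switch between the two code paths can be pictured as follows: with HEMCO enabled, surface fluxes come only from the HCO_-prefixed physics-buffer fields; otherwise only the existing CAM-chem routine runs. In this Python sketch a dict stands in for the physics buffer, and the function names, the callback argument, and the field naming (HCO_ followed by the species name) are assumptions for illustration.

```python
import numpy as np


def set_surface_fluxes(species, ncol, pbuf, use_hemco, legacy_set_srf_emissions):
    """Fill sflx(ncol, nspecies) from exactly one emissions source."""
    sflx = np.zeros((ncol, len(species)))
    if use_hemco:
        for m, name in enumerate(species):
            field = pbuf.get("HCO_" + name)
            if field is not None:  # species without HEMCO emissions stay zero
                sflx[:, m] = field[:ncol]
    else:
        # Stand-in for the original set_srf_emissions path.
        legacy_set_srf_emissions(sflx)
    return sflx
```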

Right now HEMCO-CESM is still implemented as a compile-time option. I can move it to a runtime option in the namelist in future commits if we want to go in this direction.

@cacraigucar cacraigucar removed the request for review from gold2718 August 23, 2022 15:58
src/chemistry/mozart/chemistry.F90 (outdated, resolved)
src/chemistry/mozart/chemistry.F90 (outdated, resolved)
src/chemistry/mozart/hco_cc_emissions.F90 (outdated, resolved)
src/chemistry/mozart/hco_cc_emissions.F90 (resolved)
src/chemistry/mozart/hco_cc_emissions.F90 (outdated, resolved)
src/chemistry/mozart/mo_extfrc.F90 (outdated, resolved)
src/chemistry/mozart/mo_srf_emissions.F90 (outdated, resolved)
src/control/cam_comp.F90 (outdated, resolved)
src/control/cam_comp.F90 (outdated, resolved)
src/control/cam_comp.F90 (outdated, resolved)
- Changed C-preprocessor constants to positively match HEMCO_CESM
- Converted tab indentation to spaces for consistency
- Removed unnecessary HEMCO_CESM check in mo_srf_emissions.F90 - now same file as upstream
- Added phase= dummy argument call for HCOI_Chunk_Run.
… allocating

and updating pbuf fields for safety.
@jimmielin (Member Author)

Thank you @fvitt for the comments and edits. I have updated the pull request with the corresponding updates.

@jimmielin (Member Author)

I've made an additional update to the PR:

The &hemco namelist now includes hemco_grid_xdim and hemco_grid_ydim settings to specify the HEMCO intermediate grid resolution. Previously, the resolution was hardcoded in src/hemco/hemco_interface.F90, which was not optimal.

Defaults are provided for 1.9x2.5, 0.9x1.25 and ne30np4 grids in namelist_defaults.xml.
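For orientation, the sketch below converts hemco_grid_xdim and hemco_grid_ydim into grid spacing and cell centers. It assumes a uniform global latitude-longitude grid whose cells span 360° in longitude (starting at -180°) and 180° in latitude edge to edge; HEMCO_CESM's actual grid construction may differ (for example at the poles), and hemco_grid is a hypothetical name.

```python
def hemco_grid(xdim, ydim):
    """Spacing and cell centers (degrees) of a uniform global lat-lon grid."""
    if xdim <= 0 or ydim <= 0:
        raise ValueError("hemco_grid_xdim and hemco_grid_ydim must be positive")
    dlon = 360.0 / xdim
    dlat = 180.0 / ydim
    lons = [-180.0 + dlon * (i + 0.5) for i in range(xdim)]
    lats = [-90.0 + dlat * (j + 0.5) for j in range(ydim)]
    return dlon, dlat, lons, lats
```

For example, hemco_grid_xdim = 288 gives a longitude spacing of 1.25°, the same as the 0.9x1.25 grid.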

Additionally, the HEMCO-CESM interface version used is now tagged (hemco-cesm1_0_hemco3_5_0) instead of tracking the development branch.

Thanks,
Haipeng

@fvitt left a comment

more questions

bld/config_files/definition.xml (outdated, resolved)
src/chemistry/mozart/mo_extfrc.F90 (outdated, resolved)
src/chemistry/mozart/mo_extfrc.F90 (outdated, resolved)
src/control/cam_comp.F90 (outdated, resolved)
src/control/cam_comp.F90 (outdated, resolved)
src/control/cam_comp.F90 (outdated, resolved)
src/control/cam_comp.F90 (outdated, resolved)
@jimmielin (Member Author)

Hi @fvitt, @cacraigucar, is there any way I can obtain the absolute path of inputdata at runtime?

There are two relevant lines in the HEMCO configuration file that need to be revised:

ROOT:                        /glade/p/univ/umit0034/ExtData/HEMCO
...
DiagnFile:                   /glade/p/cesmdata/cseg/inputdata/atm/cam/geoschem/emis/HEMCO_Diagn.3_5_0.c230307.rc

We can solve this by:

  • Changing the HEMCO configuration file handling so that a token (maybe {CESM_INPUTDATA}) can be replaced by the correct inputdata path (/glade/p/cesmdata/cseg/inputdata) at runtime.
  • Soft-linking /glade/p/cesmdata/cseg/inputdata/atm/cam/geoschem/emis/ExtData/HEMCO to /glade/p/univ/umit0034/ExtData/HEMCO so the full set of HEMCO input files is accessible through a path under inputdata. The files in /glade/p/univ/umit0034/ExtData/HEMCO may be too large to host under cesmdata/cseg/inputdata for now. I think we have not yet decided how these files will be maintained in coordination between the CESM and GEOS-Chem communities (the full dataset is currently hosted at WUSTL at http://geoschemdata.wustl.edu/ExtData/HEMCO/).
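The first option could work roughly as below: the inputdata root is substituted for the proposed {CESM_INPUTDATA} token, and the expanded copy is written to the run directory before HEMCO reads it. This Python sketch only illustrates the idea; expand_inputdata_token and write_run_config are hypothetical names.

```python
from pathlib import Path

TOKEN = "{CESM_INPUTDATA}"  # token name as proposed above (hypothetical)


def expand_inputdata_token(config_text, inputdata_root):
    """Substitute the inputdata root for every occurrence of TOKEN."""
    return config_text.replace(TOKEN, inputdata_root.rstrip("/"))


def write_run_config(template_path, run_path, inputdata_root):
    """Write an expanded copy of the configuration for HEMCO to read."""
    text = Path(template_path).read_text()
    Path(run_path).write_text(expand_inputdata_token(text, inputdata_root))
```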

@fvitt commented Jun 15, 2023

Hi @fvitt, @cacraigucar, is there any way I can obtain the absolute path of inputdata at runtime?

@jimmielin ,
You can add a namelist variable for the HEMCO input root, such as 'hemco_data_root'. 'DiagnFile' could be moved from the hemco config file to the namelist and named something like 'hemco_DiagnFile'. The absolute paths will be set at build-namelist time.

It would be nice if there is a minimal set of HEMCO input files for CAM-Chem that can be easily downloaded to other machines. The data volume under /glade/p/univ/umit0034/ExtData/HEMCO is quite large.

@jimmielin (Member Author)

Thanks @fvitt, I can work on this now. Just to confirm:

The absolute paths will be set at build-namelist time.

If I set hemco_data_root as a relative path in the default (e.g., atm/cam/geoschem/emis/ExtData/HEMCO) and set it up with input_pathname="abs" in the definition, the namelist generation will transform this to an absolute path (/glade/p/cesmdata/cseg/inputdata/atm/cam/geoschem/emis/ExtData/HEMCO), is that correct?

Then it should be easy for me to make HEMCO_CESM provide that to HEMCO instead.

I'll add two namelist variables, hemco_data_root and hemco_diagn_file (named consistently with hemco_config_file).

It would be nice if there is a minimal set of HEMCO input files for CAM-Chem that can be easily downloaded to other machines.

We've explored this idea before but didn't follow through. I will look for reasonable defaults for the compsets we added. We will probably still need a few years' worth of data, hopefully within a few GB.

@fvitt commented Jun 15, 2023

If I set hemco_data_root as a relative path in the default (e.g., atm/cam/geoschem/emis/ExtData/HEMCO) and set it up with input_pathname="abs" in the definition, the namelist generation will transform this to an absolute path (/glade/p/cesmdata/cseg/inputdata/atm/cam/geoschem/emis/ExtData/HEMCO), is that correct?

Correct. For an example see "Nudge_Path" in bld/namelist_files/namelist_definition.xml.
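A simplified Python sketch of the resolution described above, assuming that a relative default is joined to the inputdata root (DIN_LOC_ROOT in CIME) and that an absolute value passes through unchanged; resolve_input_pathname is a hypothetical helper, not build-namelist's own code.

```python
import posixpath


def resolve_input_pathname(value, din_loc_root):
    """Turn a namelist default relative to inputdata into an absolute path."""
    if not value:
        raise ValueError("empty pathname")
    if posixpath.isabs(value):
        return value
    return posixpath.join(din_loc_root, value)
```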

@jimmielin (Member Author)

Thanks @fvitt, I have added these features into HEMCO-CESM and updated the external version to hemco-cesm1_2_0_hemco3_6_2_cesm. The namelist variables hemco_data_root and hemco_diagn_file will control the root data directory and diagnostics file location, so they will be machine-agnostic.

  1. I have created a new version of the HEMCO configuration file at /glade/work/hplin/hco_inputdata/HEMCO_Config.CC.CEDS_AEIC19.NEx.c230615.rc (note it doesn't have any Cheyenne-specific paths now). Could you please copy it to /glade/p/cesmdata/cseg/inputdata/atm/cam/geoschem/emis?

I think these old files can also be deleted now:

-r--r--r-- 1 fvitt cseg 534910 May 23 15:57 HEMCO_Config.CC.CEDS_AEIC19.c230517.rc
-r--r--r-- 1 fvitt cseg 534933 May 23 16:25 HEMCO_Config.CC.CEDS_AEIC19.c230523.rc
-r--r--r-- 1 fvitt cseg 534933 Jun  9 08:02 HEMCO_Config.CC.CEDS_AEIC19.NEx.c230608.rc
-r--r--r-- 1 fvitt cseg 534910 Mar  7 15:27 HEMCO_Config_FCnudged.CC.c230306.rc
-r--r--r-- 1 fvitt cseg 534910 Mar  7 15:27 HEMCO_Config_FCnudged.CC.interp.c230307.rc

So we just keep HEMCO_Config.CC.CEDS_AEIC19.NEx.c230615.rc and HEMCO_Diagn.3_5_0.c230307.rc.

  2. For the new data root to work, /glade/p/univ/umit0034/ExtData/HEMCO has to be linked from /glade/p/cesmdata/cseg/inputdata/atm/cam/geoschem/emis/ExtData/HEMCO. The extra ExtData folder is intentional, to keep the folder hierarchy consistent with GEOS-Chem data conventions.

  3. I will work on finding a minimal set of data that at least runs the _HCO compsets out of the box, and will report back soon.
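The namelist variables make the configuration file machine-agnostic because their values take precedence over the ROOT and DiagnFile settings written in the file. The Python sketch below applies such overrides to `Key: value` settings lines like those quoted earlier in this thread; override_settings is hypothetical, and HEMCO_CESM is described as providing these values to HEMCO itself rather than rewriting the file.

```python
import re

# A settings line: optional indent, key starting with a letter, colon, value.
SETTING = re.compile(r"^(\s*)([A-Za-z][\w ]*?):(\s+)(\S.*)$")


def override_settings(config_text, overrides):
    """Replace the values of selected 'Key: value' settings lines."""
    out = []
    for line in config_text.splitlines():
        m = SETTING.match(line)
        if m and m.group(2) in overrides:
            indent, key, gap = m.group(1), m.group(2), m.group(3)
            line = f"{indent}{key}:{gap}{overrides[key]}"
        out.append(line)  # data lines (numeric IDs, '-->' flags) are untouched
    return "\n".join(out) + "\n"
```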

Thanks!

@gold2718 (Collaborator)

Sorry for the late question but I've been wondering if all this work has to be done at run time.
If the config file could be parsed at build namelist (cime_config/buildnml) time, the files for the current run could be identified and added to Buildconf/cam.input_data_list. Then the usual process (check_input_data) would find the files in various servers (the WUSTL server could easily be added) to make sure they are available for the run. Finding the files at runtime would then be easier, similar to how other files are found and opened in CESM.
Is this a completely dumb idea?
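A rough Python sketch of this idea follows. It assumes a simplified view of the HEMCO configuration format in which the third column of each data line (lines starting with a numeric ID) is a source file path that may contain $ROOT and $YYYY tokens, and it assumes cam.input_data_list takes one `name = path` entry per line. The function names and the hemco_file_N entry names are hypothetical; other date tokens and the rest of the format are not handled.

```python
def hemco_input_files(config_text, root, years):
    """List netCDF files referenced by data lines of a HEMCO config."""
    files = set()
    for line in config_text.splitlines():
        cols = line.split()
        # Data lines start with an integer ID; column 3 is the source file.
        if len(cols) < 3 or not cols[0].isdigit() or ".nc" not in cols[2]:
            continue
        path = cols[2].replace("$ROOT", root)
        if "$YYYY" in path:
            files.update(path.replace("$YYYY", f"{y:04d}") for y in years)
        else:
            files.add(path)
    return sorted(files)


def input_data_list_lines(files):
    """Format files as 'name = path' entries for an input data list."""
    return [f"hemco_file_{i} = {f}" for i, f in enumerate(files, start=1)]
```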

@jimmielin (Member Author)

Hi @gold2718, thanks for the comment - that is a great idea and something we want to move towards. However it's probably not feasible at the moment.

The current infrastructure for HEMCO to identify the files needed for a run is a "dry-run" option, which steps through all the time steps, simulates reading the files (only checking whether they exist), and prints a list of files that are present or missing. However, this currently relies on the GEOS-Chem model to drive it, and there is no lightweight version of this program.
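Conceptually, the dry run steps through the simulation period, works out which file each time step would read, and records whether that file exists. A Python sketch of that loop, assuming monthly time steps and a file-name template with $YYYY and $MM tokens; dry_run is a hypothetical name, and this is not HEMCO's implementation.

```python
import os
from datetime import date


def dry_run(template, start: date, end: date):
    """Map each monthly file needed from start to end to whether it exists."""
    report = {}
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        path = template.replace("$YYYY", f"{y:04d}").replace("$MM", f"{m:02d}")
        report.setdefault(path, os.path.exists(path))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return report
```

Listing the entries whose value is False gives the "not present" files.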

We could also parse the configuration file directly, but it would take time to develop, and we have plans to move to YAML or other open formats in the future, so we don't want to develop this for the current config file format which will be superseded soon.

So definitely in the future we want to integrate into check_input_data by having a script to parse the HEMCO config file & the namelists and list the necessary files, but the infrastructure is not ready yet. So we have to do this juggling at run time at the moment.

@fvitt commented Jul 6, 2023

@jimmielin I am getting the following error from check_input_data in my latest CAM-CHEM-HEMCO tests:

Model cam missing file hemco_data_root = '/glade/p/cesmdata/cseg/inputdata/atm/cam/geoschem/emis/ExtData/HEMCO'

Any ideas?

@brian-eaton (Collaborator)

@fvitt, I think this error is due to including the attribute input_pathname="abs" in the namelist definition. If you remove that then that variable won't be put into the list of input datafiles.

@jimmielin (Member Author)

Hi @brian-eaton, would build-namelist still automatically make it an absolute path if input_pathname="abs" is removed? We specify <hemco_data_root>atm/cam/geoschem/emis/ExtData/HEMCO</hemco_data_root> in namelist_defaults_cam.xml, so it's relative to inputdata. Thanks!

@jimmielin (Member Author)

Hi @fvitt, does this definition look correct for specifying a path relative to inputdata? It looks like check_input_data is looking for a file, but this is a directory path. Maybe I configured it wrong.

<entry id="hemco_data_root" type="char*256" input_pathname="abs" category="hemco"
       group="hemco_nl" valid_values="" >
Full pathname of HEMCO data root for use in reading HEMCO input files. (e.g., 'atm/cam/geoschem/emis/ExtData/HEMCO').
Default: set by build-namelist.
</entry>

@fvitt commented Jul 7, 2023

@jimmielin and @brian-eaton
I needed to create a link to /glade/p/univ/umit0034/ExtData.
I think we should keep input_pathname="abs" in the hemco_data_root definition so that the check happens.
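Creating and checking such a link can be scripted. The Python sketch below creates the link only if nothing exists at that path and then confirms it resolves to a directory; ensure_data_link is hypothetical, and the paths in the usage note simply mirror this thread.

```python
import os


def ensure_data_link(target, link_path):
    """Create link_path -> target if absent; return True if it is a directory."""
    if not os.path.lexists(link_path):
        os.makedirs(os.path.dirname(link_path), exist_ok=True)
        os.symlink(target, link_path, target_is_directory=True)
    return os.path.isdir(link_path)  # follows the link
```

For example: ensure_data_link("/glade/p/univ/umit0034/ExtData", "/glade/p/cesmdata/cseg/inputdata/atm/cam/geoschem/emis/ExtData").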

@brian-eaton (Collaborator)

@fvitt, you're right. I forgot how the variables that specify a root filepath work.

@fvitt commented Jul 10, 2023

@jimmielin @lkemmons @cacraigucar
If there are no objections, I would like to move forward with merging this into cam_development, with the understanding that HEMCO will not be supported until the restart/reproducibility issues are resolved.

@lkemmons (Collaborator) commented Jul 10, 2023 via email

@jimmielin (Member Author)

Thanks @fvitt! I have no objections either. I am working on the restart/reproducibility issues (waiting for some patches from upstream GEOS-Chem/HEMCO) and will soon create a minimal set of input data that can be distributed and that works with the provided compsets out of the box.

@peverwhee peverwhee changed the title Integrating HEMCO emissions component in CESM cam6_3_118: Integrating HEMCO emissions component in CESM Jul 14, 2023
@fvitt fvitt merged commit 69c5b1c into ESCOMP:cam_development Jul 17, 2023
gold2718 pushed a commit to gold2718/CAM that referenced this pull request Aug 16, 2023
Merge pull request ESCOMP#560 from CESM-GC/HEMCO-CESM_rebased_on_cam6_3_045

cam6_3_118: Integrating HEMCO emissions component in CESM

ESCOMP commit: 69c5b1c