Discussion: Fixing ObsPack diagnostics in GEOS-Chem and/or standardizing ObsPack file inputs #2328

Open
Tracked by #2312
yantosca opened this issue Jun 14, 2024 · 9 comments
Labels: category: Discussion (An extended discussion of a particular topic); topic: Diagnostics (Related to output diagnostic data); topic: Input Data (Related to input data)

@yantosca

Your name

Bob Yantosca

Your affiliation

Harvard + GCST

Please provide a clear and concise description of your question or discussion topic.

The ObsPack diagnostic relies on input files from NOAA that often have inconsistent dimensions, depending on which sites they come from. This issue can be a central point of discussion for efforts to fix outstanding issues in the ObsPack diagnostic in GEOS-Chem.

There are also a few Python packages for prepping ObsPack inputs that we could try to add to the community folder of GCPy. We can also keep track of these development efforts here.

Tagging @eastjames @jhaskinsPhD @alli-moon

@yantosca yantosca added topic: Input Data Related to input data topic: Diagnostics Related to output diagnostic data category: Discussion An extended discussion of a particular topic labels Jun 14, 2024
@yantosca yantosca self-assigned this Jun 14, 2024
@yantosca yantosca changed the title Discussion: FIxing ObsPack diagnostics in GEOS-Chem and/or standardizing ObsPack file inputs Discussion: Fixing ObsPack diagnostics in GEOS-Chem and/or standardizing ObsPack file inputs Jun 14, 2024
@jhaskinsPhD

Hi Folks,

I last had Obspack working for me in v13.3.4, but have gotten errors using it in GCClassic v14.0.0, v14.2.3 and v14.3.0. The error I get looks like this:

********************************************
* B e g i n   T i m e   S t e p p i n g !! *
********************************************

---> DATE: 2013/06/01  UTC: 00:00
 HEMCO already called for this timestep. Returning.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

In Ncrd_1d_Char #2:  NetCDF: Start+count exceeds dimension bound
     65536         6

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Code stopped from DO_ERR_OUT (in module NcdfUtil/m_do_err_out.F90) 

This is an error that was encountered in one of the netCDF I/O modules,
which indicates an error in writing to or reading from a netCDF file!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

real 51.28
user 441.31
sys 5.71
srun: error: notch201: task 0: Exited with exit code 231

I think this is coming from a call to m_do_err_out.F90 that I added when we made the update to include the Obspack wildcard. With that change, the run throws an error if it can't find or read the Obspack input files when the diagnostic is turned on. Before that, it would actually complete the run and just not save any Obspack output, without an error. The error itself seems to be an issue with the expected dimensions of the inputs.

@eastjames has code to convert the default Obspack files into a format readable by GEOS-Chem here and I have code to make your own Obspack files for GEOS-Chem sampling here.

Between the two repos, I think we have a way to build Obspack files for GEOS-Chem input that are consistent with the documentation on the required Obspack inputs on ReadTheDocs, but I haven't done enough digging to figure out why exactly the newer versions are throwing this error.

I know that at some point between v13.0.0 and v13.3.4, the required Obspack input files changed from requiring a "time components" input as a [YYYY, MM, DD, HH, mm, SS] list to a 'time' input in units of seconds since 1970-01-01 00:00:00, and the code got a bit more picky about the types of the required inputs (e.g. requiring the Obspack ID as an S200 byte string). The default Obspack files seem to have both of these inputs (in some of the files), but as someone pointed out at IGC11, the default variables in the different types of Obspack files vary a bit. I think there must be something in the GEOS-Chem code that reads in the files that is expecting one of these time components in a different format than we're giving it now... So, some info on what exactly it's expecting would be valuable.
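
To make the two time conventions concrete, here is a minimal Python sketch of the conversion from a "time components" list to seconds since 1970-01-01 00:00:00. It assumes the components are in UTC. The function name `components_to_epoch` is hypothetical and is not taken from GEOS-Chem or from either preprocessing repo.

```python
import calendar


def components_to_epoch(components):
    """Convert [YYYY, MM, DD, HH, mm, SS] to seconds since 1970-01-01 00:00:00.

    Assumption: the components are given in UTC.
    """
    year, month, day, hour, minute, second = components
    # calendar.timegm treats the tuple as UTC (time.mktime would use local time).
    return calendar.timegm((year, month, day, hour, minute, second, 0, 0, 0))


# Start of the SOAS example day used in this thread
print(components_to_epoch([2013, 6, 1, 0, 0, 0]))  # 1370044800
```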

In my repo, I've uploaded an example Obspack input file here, which I created using my obspack_io.py script for sampling at the SOAS site on June 1st, 2013. It did work in previous versions, but is now throwing the error above when I turn the diagnostic on. When I compared it to a file I generated with @eastjames's repo from more classic Obspack files, it seemed entirely consistent with the required inputs listed on ReadTheDocs, and I thought it was just an error with my code until IGC11.

@yantosca

Thanks @jhaskinsPhD for this. I think complicating matters is that we had to switch from the netCDF-F77 to the netCDF-F90 interface (which was required by CESM). This shouldn't matter, but who knows. I will try to replicate your error with the sample ObsPack file.

For your reference, here is some info about netCDF strings.

@yantosca

Hi @jhaskinsPhD and @eastjames. I was looking at the example netCDF file from the https://github.com/jhaskinsPhD/gcpy_campaigns repository (which I believe represents the most recent ObsPack format) and I get this output:

$ ncdump -cts obspack_input.20130601.nc | grep obs
1:netcdf obspack_input.20130601 {
3:      obs = 48 ;
6:      int64 obs(obs) ;
7:              obs:long_name = "Sample latitude" ;
8:              obs:_Storage = "chunked" ;
9:              obs:_ChunkSizes = 1024LL ;
10:             obs:_Endianness = "little" ;
11:             obs:_Storage = "contiguous" ;
12:             obs:_Endianness = "little" ;
13:     int64 time(obs) ;
23:     float latitude(obs) ;
33:     float longitude(obs) ;
43:     float altitude(obs) ;
54:     char obspack_id(obs, string200) ;
55:             obspack_id:long_name = "Unique ObsPack observation id" ;
56:             obspack_id:comment = "Unique observation id string that includes obs_id, dataset_id and obspack_num." ;
57:             obspack_id:_Storage = "chunked" ;
58:             obspack_id:_ChunkSizes = 1LL ;
59:             obspack_id:_DeflateLevel = 5LL ;
60:             obspack_id:_Storage = "contiguous" ;
61:     int64 CT_sampling_strategy(obs) ;
79: obs = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 

But in one of my older obspack data files that I had used for testing, I get this output:

$ ncdump -cts obspack_co2_1_OCO2MIP_2018-11-28.2018092 | grep obs
61:netcdf obspack_co2_1_OCO2MIP_2018-11-28.20180926 {
3:      obs = UNLIMITED ; // (662 currently)
7:      int obs(obs) ;
8:              obs:long_name = "obs" ;
9:              obs:_Storage = "chunked" ;
10:             obs:_ChunkSizes = 1024 ;
11:             obs:_Endianness = "little" ;
12:     int time(obs) ;
20:     int model_sample_window_start(obs) ;
28:     int model_sample_window_end(obs) ;
36:     float latitude(obs) ;
44:     float longitude(obs) ;
52:     float altitude(obs) ;
61:     float value(obs) ;
70:     double time_decimal(obs) ;
79:     int time_components(obs, calendar_components) ;
88:     char obspack_id(obs, string_of_200chars) ;
89:             obspack_id:long_name = "Unique ObsPack observation id" ;
90:             obspack_id:comment = "Unique observation id string that includes obs_id, dataset_id and obspack_num." ;
91:             obspack_id:_Storage = "chunked" ;
92:             obspack_id:_ChunkSizes = 1, 200 ;
93:             obspack_id:_DeflateLevel = 5 ;
94:     int obs_flag(obs) ;
95:             obs_flag:units = "binary" ;
96:             obs_flag:_FillValue = -9 ;
97:             obs_flag:long_name = "obspack flag" ;
98:             obs_flag:comment = "Determined by data provider (1: large spatial scale representation; 0: local/regional influence)" ;
99:             obs_flag:_Storage = "chunked" ;
100:            obs_flag:_ChunkSizes = 662 ;
101:            obs_flag:_DeflateLevel = 5 ;
102:            obs_flag:_Endianness = "little" ;
103:    int CT_sampling_strategy(obs) ;
111:    float CT_MDM(obs) ;
120:    float CT_RMSE(obs) ;
129:    int CT_assim(obs) ;
138:    int CT_may_reject(obs) ;
146:    int CT_may_localize(obs) ;
170: obs = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 

As you can see, the time variable has been updated from int (aka INTEGER) to int64 (aka INTEGER*8) between the older and newer file. I suspect this has to do with the Year 2038 problem. In other words, a 4-byte integer won't be sufficient to store Linux time values (seconds since 1970) after the year 2038. Then you need to use an 8-byte integer. Maybe they updated the format of the data files to be proactive.
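
As a quick illustration of the Year 2038 limit, this Python sketch prints the last UTC timestamp that a signed 4-byte integer can hold as seconds since 1970. The helper `fits_int32` is a hypothetical name used only for this example.

```python
from datetime import datetime, timezone

# Largest value a signed 4-byte integer can hold
INT32_MAX = 2**31 - 1

# Last representable instant when time is stored as 4-byte seconds since 1970
last = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
print(last.isoformat())  # 2038-01-19T03:14:07+00:00


def fits_int32(epoch_seconds):
    """Return True if the value can be stored in a signed 4-byte integer."""
    return -2**31 <= epoch_seconds <= INT32_MAX


print(fits_int32(1370044800))     # True  (2013-06-01 00:00:00 UTC)
print(fits_int32(INT32_MAX + 1))  # False (one second past the limit)
```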

Maybe a long-term fix would be to add logic into GEOS-Chem to test the type of the time variable and then read it into the appropriately-typed variable. I'm not sure I have time to do that now but it's something to think about.
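
A rough sketch of that type test, written in Python rather than in the Fortran of the ObsPack module: it scans the CDL header text that `ncdump -h` prints (without the line-number prefixes shown above) and reports how many bytes the time variable needs. The function names are hypothetical, and the int to 4-byte and int64 to 8-byte mapping follows the INTEGER and INTEGER*8 correspondence noted above.

```python
import re

# Matches a CDL variable declaration such as "int64 time(obs) ;"
_DECL = re.compile(
    r"^[ \t]*(\w+)[ \t]+(\w+)[ \t]*\(([^)]*)\)[ \t]*;", re.MULTILINE
)

# netCDF type name -> width in bytes of the matching integer variable
_INT_WIDTHS = {"int": 4, "int64": 8}


def variable_types(cdl_header):
    """Map each variable name in a CDL header to its netCDF type name."""
    return {name: nc_type for nc_type, name, _dims in _DECL.findall(cdl_header)}


def time_width_bytes(cdl_header):
    """Return 4 or 8 depending on how 'time' is declared in the header."""
    nc_type = variable_types(cdl_header).get("time")
    if nc_type not in _INT_WIDTHS:
        raise ValueError(f"unsupported or missing time type: {nc_type!r}")
    return _INT_WIDTHS[nc_type]


older = "variables:\n\tint obs(obs) ;\n\tint time(obs) ;\n"
newer = "variables:\n\tint64 obs(obs) ;\n\tint64 time(obs) ;\n"
print(time_width_bytes(older), time_width_bytes(newer))  # 4 8
```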

@yantosca

yantosca commented Dec 5, 2024

Hi @jhaskinsPhD @eastjames: It seems like as long as people use one of the preprocessing packages for ObsPack then files can be produced that will work in GEOS-Chem "out-of-the-box".

I guess the larger question is, are folks OK with using the preprocessors? Or is there a need to rewrite the ObsPack module so that it can handle the newer files w/o preprocessing (such as for input into the IMI or another application)? If the former, then I can close out this issue. If the latter, we may need to think about this a bit further. Also tagging @msulprizio @lizziel.

@lizziel

lizziel commented Dec 6, 2024

My two cents is it would be nice for GEOS-Chem to work with ObsPack files directly if the new format is stable. We could reach out to the ObsPack developers to find out the motivation for the format change and whether the new format is planned to be permanent for the long-term. If yes, we could update GEOS-Chem to use the new format moving forward. Perhaps the tools to convert the new files to the older format used by GEOS-Chem could be reverse-engineered to convert old files to the new format. The tools for this could be included in the ObsPack directory in GEOS-Chem if the authors are okay with that. Maybe someone from the community could even make that tool.


yantosca commented Dec 6, 2024

That's a good idea @lizziel. I'll reach out.

@yantosca

yantosca commented Dec 6, 2024

@lizziel @msulprizio @eastjames @jhaskinsPhD: I think I may know what is going on here. So the original ObsPack diagnostic in GEOS-Chem was sent to us by Andy Jacobson, who was working with the CO2 GLOBALVIEW_plus files. These files have the following variables:

netcdf obspack_co2_1_GLOBALVIEWplus_v6.0_2020-09-11.20190408 {
dimensions:
	obs = UNLIMITED ; // (3111 currently)
	calendar_components = 6 ;
	string_of_200chars = 200 ;
	string_of_500chars = 500 ;
variables:
	int obs(obs) ;
	int time(obs) ;
	int model_sample_window_start(obs) ;
	int model_sample_window_end(obs) ;
	float latitude(obs) ;
	float longitude(obs) ;
	float altitude(obs) ;
	float intake_height(obs) ;
	float elevation(obs) ;
	float value(obs) ;
	double time_decimal(obs) ;
	int time_components(obs, calendar_components) ;
	char obspack_id(obs, string_of_200chars) ;
	int obs_flag(obs) ;
	int CT_sampling_strategy(obs) ;
		CT_sampling_strategy:_FillValue = -9 ;
		CT_sampling_strategy:long_name = "model sampling strategy" ;
		CT_sampling_strategy:values = "How to sample model. 1=4-hour avg; 2=1-hour avg; 3=90-min avg; 4=instantaneous" ;
	float CT_MDM(obs) ;
	float CT_RMSE(obs) ;
	int CT_assim(obs) ;
	int CT_may_reject(obs) ;
	int CT_may_localize(obs) ;
	char ccgg_evn(obs, string_of_500chars) ;

but the latest GLOBALVIEWplus product for CH4 uses these variables:

netcdf ch4_zsf_surface-insitu_25_allvalid {
dimensions:
	obs = UNLIMITED ; // (23762 currently)
	string_of_100chars = 100 ;
	calendar_components = 6 ;
	string_of_10chars = 10 ;
	string_of_200chars = 200 ;
	dim_concerns = 6 ;
variables:
	int time(obs) ;
	int start_time(obs) ;
	int midpoint_time(obs) ;
	char datetime(obs, string_of_100chars) ;
	double time_decimal(obs) ;
	int time_components(obs, calendar_components) ;
	int solartime_components(obs, calendar_components) ;
	float value(obs) ;
	int nvalue(obs) ;
	float value_std_dev(obs) ;
	float latitude(obs) ;
	float longitude(obs) ;
	float altitude(obs) ;
	float elevation(obs) ;
	float intake_height(obs) ;
	char qcflag(obs, string_of_10chars) ;
	char instrument(obs, string_of_100chars) ;
	char source_id(obs, string_of_200chars) ;
	int obs_flag(obs) ;
	int assimilation_concerns(obs, dim_concerns) ;
	int obspack_num(obs) ;
	char obspack_id(obs, string_of_200chars) ;

So it looks like the format is totally different between the CO2 product and the CH4 product. This is why GEOS-Chem can't parse the latest CH4 files w/o pre-processing.
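
To see the mismatch at a glance, this small Python sketch compares the variable names copied from the two headers above. The set names are only for illustration.

```python
# Variable names copied from the CO2 GLOBALVIEWplus header above
co2_vars = {
    "obs", "time", "model_sample_window_start", "model_sample_window_end",
    "latitude", "longitude", "altitude", "intake_height", "elevation",
    "value", "time_decimal", "time_components", "obspack_id", "obs_flag",
    "CT_sampling_strategy", "CT_MDM", "CT_RMSE", "CT_assim",
    "CT_may_reject", "CT_may_localize", "ccgg_evn",
}

# Variable names copied from the CH4 header above
ch4_vars = {
    "time", "start_time", "midpoint_time", "datetime", "time_decimal",
    "time_components", "solartime_components", "value", "nvalue",
    "value_std_dev", "latitude", "longitude", "altitude", "elevation",
    "intake_height", "qcflag", "instrument", "source_id", "obs_flag",
    "assimilation_concerns", "obspack_num", "obspack_id",
}

only_in_co2 = sorted(co2_vars - ch4_vars)
only_in_ch4 = sorted(ch4_vars - co2_vars)

print("Only in CO2 file:", only_in_co2)
print("Only in CH4 file:", only_in_ch4)
```

The first list contains the obs coordinate variable, the model_sample_window_* variables, all of the CT_* variables, and ccgg_evn.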

In particular, GEOS-Chem uses the "CT_sampling_strategy" field, which determines how the data should be averaged in the atmospheric model. The preprocessing scripts add this variable in the processed files (among other variables).
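
For reference, the codes listed in the CT_sampling_strategy:values attribute above can be decoded as in the Python sketch below. The names are hypothetical, and representing "instantaneous" as a zero-length window is an assumption made for this example, not something taken from the GEOS-Chem code.

```python
# Codes from the attribute text:
# "1=4-hour avg; 2=1-hour avg; 3=90-min avg; 4=instantaneous"
SAMPLING_WINDOW_SECONDS = {
    1: 4 * 3600,  # 4-hour average
    2: 1 * 3600,  # 1-hour average
    3: 90 * 60,   # 90-minute average
    4: 0,         # instantaneous (assumption: zero-length window)
}


def averaging_window_seconds(code):
    """Return the averaging window, in seconds, for a CT_sampling_strategy code."""
    try:
        return SAMPLING_WINDOW_SECONDS[code]
    except KeyError:
        raise ValueError(f"unknown CT_sampling_strategy code: {code}") from None


print(averaging_window_seconds(1))  # 14400
print(averaging_window_seconds(3))  # 5400
```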

I wonder if the eventual solution would be to have 2 parsing routines in ObsPack, one for CO2 and one for CH4.

I will also reach out to ObsPack support to ask why the CT_sampling_strategy isn't in the CH4 files and how best to replicate that.

@eastjames

Hi @yantosca, thanks for catching this. I can confirm that we have to infer the CT_sampling_strategy from the measurement time fields in the CH4 obspack files. That is done on these lines in my code for processing CH4 obspack data. @hannahnesser originally shared that part of the code with me so she may have insight. If ObsPack can provide that natively with CH4 files, that would help.

@yantosca

@eastjames @jhaskinsPhD @lizziel @msulprizio: I heard back from NOAA ObsPack support:

The CT_ variables in the CO2 GV+ are model derived data from our CarbonTracker CO2 product, reflecting data assimilation, sampling, withholding and other data choices used while running the model. We make these available for use by other modelers and to make comparisons easier. CarbonTracker CH4 is a relatively new model product being developed by Lori Bruhwiler & Youmi Oh (cc’d). I’ll let them reply on future development plans and whether they intend to make similar model information available for use by the community. If so, we would likely employ a similar strategy for the CH4_GLOBALVIEWplus and include them using the same variable names.

So perhaps we would have the CT_sampling_strategy field available in a future release of ObsPack. I'm inclined to let this go for now and just use the Python preprocessor since this is a workable solution (and since we have many other pressing demands on GCST time at the moment). But let me know what you all think.
