Bugfix: Fix memory management issues by replacing variable length arrays with STL vectors and arrays #3075
Comments
This problem is due to a buffer overflow when using a fixed-length character array. Here is the contents of the …
That long string consists of 568 characters. However, line 170 of this … So this array overflow is very likely the source of the segfault!
… attribute value in a string rather than a fixed-length character array for which overflow may occur.
I am testing a fix for this on seneca in …
If all the tests run without error and no differences are flagged, I'll submit a PR to merge this change into the … Note that we could also consider updating this line of the …
…g to using -O2 optimization since that's how we configure installations on supported platforms. This makes the testing environment more similar to the deployed versions. And we've found some bugs due to unexpected behavior when compiled with -O2 optimization.
Note that running the unit tests on seneca with
The output file contains corrupted diagnostics names, whereas the non-optimized diagnostic names look fine:
…an a pointer to temporary memory to solve the problem with diagnostic names in unit_tc_pairs.xml when compiled with optimization enabled.
…ers rather than local variables which go out of scope.
…ignment operator is used when the == comparison operator is needed.
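The assignment-versus-comparison mix-up named in the last entry above is easy to reproduce in isolation. The following is a minimal sketch, not the actual code from vx_bool_calc/tokenizer.cc; the function names and the `end_marker` value are hypothetical and chosen only for illustration.

```cpp
// Hypothetical illustration; not MET's tokenizer code.

// Buggy form: '=' assigns end_marker to c and then tests the assigned
// value. Because end_marker is 0, the condition is always false.
bool is_end_buggy(int c) {
    const int end_marker = 0;
    if (c = end_marker) return true;   // BUG: assignment, not comparison
    return false;
}

// Fixed form: '==' compares c against the end marker.
bool is_end_fixed(int c) {
    const int end_marker = 0;
    return c == end_marker;
}
```

Most compilers flag the buggy form with a warning (for example `-Wparentheses` in GCC), so enabling warnings is a cheap way to catch this class of error.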
* Per #3075, update get_att_value_chars() utility function to store the attribute value in a string rather than a fixed-length character array for which overflow may occur.
* Per #3075, switch from compiling MET in Docker using the -g debug flag to using -O2 optimization since that's how we configure installations on supported platforms. This makes the testing environment more similar to the deployed versions. And we've found some bugs due to unexpected behavior when compiled with -O2 optimization.
* Per #3075, remove accidentally committed log file
* Per #3075, update TrackInfo::diag_name() to return a string rather than a pointer to temporary memory to solve the problem with diagnostic names in unit_tc_pairs.xml when compiled with optimization enabled.
* Per #3075, update read_netcdf_logic() to store pointers to class members rather than local variables which go out of scope.
* Per #3075, don't need to use local variables at all.
* Per #3075, switch to using STL vectors for memory management
* Per #3075, reimplement month_name_to_m() with stl strings to avoid variable length arrays.
* Per #3075, update MetNcFile::readFile() to use stl vectors instead of variable length arrays
* Per #3075, update NcCfFile member functions to use stl vectors instead of variable length arrays
* Per #3075, update is_netcdf_file() to use stl vectors instead of variable length arrays
* Per #3075, update 3d_conv.cc to use stl vectors instead of variable length arrays
* Per #3075, update the vx_util library to use stl vectors instead of variable length arrays
* Per #3075, update ensemble_stat to use stl vectors instead of variable length arrays
* Per #3075, update decode_lat_lon() to use stl vectors instead of variable length arrays
* Per #3075, update grid_diag to use stl vectors instead of variable length arrays
* Per #3075, update ioda2nc to use stl vectors instead of variable length arrays
* Per #3075, update madis2nc to use stl vectors instead of variable length arrays
* Per #3075, update mode_graphics to use stl vectors instead of variable length arrays
* Per #3075, update the vx_nc_obs library to use stl vectors instead of variable length arrays
* Per #3075, update plot_point_obs to use stl vectors instead of variable length arrays
* Per #3075, update point_stat to use stl vectors instead of variable length arrays
* Per #3075, update wavelet_stat to use stl vectors instead of variable length arrays
* Per #3075, no real code change, just whitespace
* Per #3075, removing commented out code
* Per #3075, need to add 2 to account for time_count being initialized to -1. An array of length 0 is different from a vector of length 0.
* Per #3075, can't use 2D vectors to read data from NetCDF files into a contiguous block of memory.
* Per #3075, can't use 2D vectors to read data from NetCDF files into a contiguous block of memory.
* Per #3075, update looping logic
* Per #3075, eliminate all instances of vector<vector<type>> since it's not stored in contiguous memory and therefore not useful for reading data from the NetCDF files.
* Per #3075, bit more madis2nc changes.
* Per #3075, fix Nx typo
* Per #3075, fix chaNetCDF attribute character type
* Per #3075, minor changes to satisfy SonarQube findings.
* Per #3075, when sizing vectors of type <char> add one for the trailing null.
* Per #3075, remove debugging code.
* Per #3075, unit_ioda2nc.xml fails when compiled with Intel since there are issues parsing NC_STRING attribute types. Reverting back to the previous logic from main_v12.0 since that works.
* Per #3750, back out the change to using -O2 in development.docker. With it, differences are flagged by GHA. I'd like to make sure all the changes on this branch cause NO differences before switching to using -O2... most likely in the develop branch rather than main_v12.0.
* Per #3075, getting segfault from point2grid. Null terminating character vectors after reading NetCDF attributes just to be safe.
* Unrelated to #3075, only whitespace changes.
* Per #3075, fix logic of the write_nc(...) function so that all variable attributes are added and defined prior to writing the data for that variable. Writing attributes AFTER the data, as we had been doing, causes unexpected failures, as found when compiled with Intel.
* Per #3075, update args to write_nc(...) to minimize regression test diffs.
* Per #3075, fix madis2nc i_buf definition problem.
* Per #3075, more refinement of i_buf definition in madis2nc for acars and raob inputs.
* Per #3075, remove FFLAGS from development.docker because there's no good reason to add it.
* replace raw array with std array
* replace raw array with std array in 2 other places
* Per #3075, fix clear bug in vx_bool_calc/tokenizer.cc where the = assignment operator is used when the == comparison operator is needed.
--------- Co-authored-by: George McCabe <[email protected]>
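Two points from the commit list above lend themselves to a short sketch: a `vector<vector<type>>` is not one contiguous block, so a flat vector with manual indexing is the usual replacement when a C-style reader fills memory in one call, and a `char` vector that receives attribute text needs one extra element for the trailing null. The code below is an assumption-laden illustration, not MET code; `fill_block()` is a hypothetical stand-in for a NetCDF read call, and `read_grid()` and `read_text_attribute()` are invented names.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a C-style reader that fills one contiguous
// block of n values (a real NetCDF call is deliberately not shown).
void fill_block(float *dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = static_cast<float>(i);
}

// Read an ny-by-nx grid into a single flat vector.
// Element (y, x) lives at index y * nx + x.
std::vector<float> read_grid(std::size_t ny, std::size_t nx) {
    std::vector<float> data(ny * nx);   // one contiguous allocation
    if (!data.empty()) fill_block(data.data(), data.size());
    return data;
}

// Copy att_len bytes of attribute text (not null-terminated) into a string.
std::string read_text_attribute(const char *raw, std::size_t att_len) {
    std::vector<char> buf(att_len + 1, '\0');   // +1 for the trailing null
    std::memcpy(buf.data(), raw, att_len);
    return std::string(buf.data());
}
```

Unlike a zero-length variable length array, a zero-length vector is well defined, but `data()` on an empty vector should not be dereferenced, which is why the sketch guards the read with `empty()`.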
* use custom GitHub Action to trigger METplus use cases
* Updating values
* Bugfix #3020 main_v12.0 grid_stat_seeps (#3022)
* Per #3020, add missing GridStatNcOutInfo::do_seeps flag and use it to determine if SEEPS information should be written to the Grid-Stat NetCDF matched pairs output file.
* Unrelated to #3020, fix broken NetCDF cf-conventions links in the User's Guide.
* Per #3020, no real changes. Just whitespace
* Update to reflect usage of oneAPI compilers
* Updating file to reflect usage of oneAPI compilers
* Hotfix to the main_v12.0 branch after PR #3022 fixed a SEEPS bug. The GridStatConfig_SEEPS config file needs to be updated with nc_pairs_flag.seeps = TRUE in order for the same output to be produced by the unit tests.
* Adding In Memoriam
* Feature #3032 main_v12.0 docs data type (#3040)
* Per #3032, add data type column to all of the output tables
* Per #3032, remove the first row from each output table since its info is repeated from the table name. Additional changes for consistency and accuracy in column names.
* Update docs/Users_Guide/gsi-tools.rst Co-authored-by: Julie Prestopnik <[email protected]>
--------- Co-authored-by: Julie Prestopnik <[email protected]>
* Making a superficial change in the main_v12.0 branch to trigger GHA to create and push an updated test output image.
* Feature #3033 v12.0.0 (#3042)
* Per #3033, update version info, consolidate release notes, and add upgrade instructions.
* Per #3033, remove all instances of 'Bugfix: ' from the release notes since it's redundant with the dropdown name
* Per #3030, based on request from Randy Pierce, also add MTD header columns to met_header_columns_v12.0.txt to make it easier to parse the output from MET.
* Per #3033, fix typo and correct alignment in table
* Update install_met_env.acorn Removing reference to beta version
* Update install_met_env.cactus Remove references to beta version
* Update install_met_env.cactus Update paths for eckit and atlas
* Update install_met_env.wcoss2 Remove beta references
* Fix typo, missing one * to make SciPy bold in appendixF.rst
* Per #3051, update unit tests so that installed files are found relative to MET_BASE (<install_loc>/share/met) and other files that are only in the MET repo are found relative to MET_TEST_BASE (MET/internal/test_unit). Also remove MET_BUILD_BASE env var (#3052)
* Bugfix #3054 main_v12.0 parusr (#3068)
* Per #3054, fix PARUSR BUFRLIB error by solving the upstream reference to temporary memory returned by c_str(). Store a copy of the temporary variable name in a string rather than a pointer to temporary memory. Note that I checked all other calls to c_str() in pb2nc.cc and found these 2 instances to be the only problematic ones. All others are used as arguments to functions for which a copy is made.
* Unrelated to #3054, but discovered while investigating the dtcenter/METplus#2875 discussion, the PairBase::calc_obs_summary() function loops over map entries and attempts to update the mapped 'summary_val' value. However, the current version only updates it in a copy and not what's actually in the map. This changes how we loop over the map to actually update its contents. Note that the only impact is fixing a log file to accurately report the 'summary_val'. So this is really a logging bug.
* Per #3054, revert emplace_back() to its original push_back() to make the bugfix diffs as limited as possible.
* Per #3054, correct bugfix in PairBase::calc_obs_summary() in pair_base.cc
--------- Co-authored-by: MET Tools Test Account <[email protected]>
* Per #3070, updates for the 12.0.1 bugfix release. (#3071)
* Updating file for 12.0.1 installation for NCO
* Updating to 12.0.1 for NCO
* Update and rename 12.0.0_acorn to 12.0.1_acorn for NCO
* Rename 12.0.0.lua_wcoss2 to 12.0.1.lua_wcoss2 for NCO
* Update 12.0.0_hercules
* Update install_met_env.hercules
* Update compiler and MET version in install_met_env.orion
* Update compiler and MET version in 12.0.0_orion
* Bugfix #3075 main_v12.0 optimization (#3076)
--------- Co-authored-by: George McCabe <[email protected]> Co-authored-by: Julie Prestopnik <[email protected]> Co-authored-by: John Halley Gotway <[email protected]> Co-authored-by: MET Tools Test Account <[email protected]> Co-authored-by: metplus-bot <[email protected]>
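The PairBase::calc_obs_summary() entry above describes a loop that updated a copy of each map entry rather than the entry itself. A minimal sketch of that pattern follows; `ObsSummary`, `update_copy()`, and `update_in_place()` are hypothetical names used for illustration and are not taken from pair_base.cc.

```cpp
#include <map>
#include <string>

// Hypothetical record type; MET's actual structure is not shown here.
struct ObsSummary {
    double summary_val = 0.0;
};

// Buggy form: 'auto entry' copies each key/value pair, so the assignment
// changes the copy and the map itself is left untouched.
void update_copy(std::map<std::string, ObsSummary> &m, double v) {
    for (auto entry : m) entry.second.summary_val = v;
}

// Fixed form: 'auto &entry' binds to the element stored in the map.
void update_in_place(std::map<std::string, ObsSummary> &m, double v) {
    for (auto &entry : m) entry.second.summary_val = v;
}
```

The two loops differ by a single `&`, which is why this kind of bug tends to surface only when a downstream value, such as a logged summary, looks wrong.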
…ed in the #3076 pull request. The logic for converting to camel case was in the wrong order and the details of this are described in this comment: #3078 (comment)
…owing for an empty input in the statement_list.
* Per #3077, back out the yystate patching logic added to Makefile.am for MET #2408 to allow for the parsing of an empty configuration file. This has caused a 'shift/reduce conflict' warning message from the Intel compiler, is flagged as a problem by '-fsanitize=address', and very well may be causing the sporadic yyerror failures described in #3077. Recommend testing with this change to see if the yyerrors go away, but also re-testing MET #2408 to assess the handling of empty configuration files.
* Per #3075, update config.tab.yy/.cc to allow for empty inputs by allowing for an empty input in the statement_list.
* Per #3075, adding a hotfix to main_v12.0 that should have been included in the #3076 pull request. The logic for converting to camel case was in the wrong order and the details of this are described in this comment: #3078 (comment)
* Bugfix #3077 main_v12.0 yyerror (#3083)
* Per #3077, back out the yystate patching logic added to Makefile.am for MET #2408 to allow for the parsing of an empty configuration file. This has caused a 'shift/reduce conflict' warning message from the Intel compiler, is flagged as a problem by '-fsanitize=address', and very well may be causing the sporadic yyerror failures described in #3077. Recommend testing with this change to see if the yyerrors go away, but also re-testing MET #2408 to assess the handling of empty configuration files.
* Per #3075, update config.tab.yy/.cc to allow for empty inputs by allowing for an empty input in the statement_list.
* Feature #3081 v12.0.2 (#3085)
* Per #3081, roll the MET version number from 12.0.1 to 12.0.2 and add release notes.
* Per #3081, improve the title for issue #3075
--------- Co-authored-by: George McCabe <[email protected]> Co-authored-by: Julie Prestopnik <[email protected]> Co-authored-by: John Halley Gotway <[email protected]> Co-authored-by: MET Tools Test Account <[email protected]> Co-authored-by: metplus-bot <[email protected]>
* Per #3087, update logic in VarInfoNcMet::set_magic(...) to actually store the requested level string to allow for discriminating between multiple U/V vertical level matches.
* Unrelated to #3087, delete unneeded 'int errno;' local variable from temp_file.cc that caused an unexpected compilation error with GCC 9.4.0 on Ubuntu as described in the dtcenter/METplus#2897 discussion.
* Per #3087, tweak logic to handle '*' and resolve regression test differences.
* Per #3087, add regrid_data_plane and grid_stat unit test to demonstrate creating vector pairs at multiple levels from NetCDF input files.
* Per #3087, forgot to add the Grid-Stat config file needed for the new unit test.
* Per #3075, refine name and logic of the new tests.
* Per #3087, update ConcatString class to simplify from a pointer to a string to just a string itself. This is based on SonarQube code smells, but the implementation is much simpler and easier to maintain.
* Per #3087, drive down a few more SonarQube code smells.
* Per #3087, back out the ConcatString changes to switch from enum to enum class and the use of explicit since those had huge and wide-ranging impacts. Touching that many files is not worth it to reduce SonarQube code smells.
* Per #3087, modify the existing point2grid_pb2nc_big_input test in unit_point2grid.xml by switching from requesting the 'Z2' level to using '*', like all the other similar point2grid tests. Note that I DID actually test to confirm that 'Z2' and '*' produce the same result. So specifying Z2 does NOT actually filter the obs data as you'd expect it would. With this change, the diff of the output from the test should go away for PR #3088.
* Unrelated to #3087, but pointed out by @j-opatz, removing an outdated line from the Ensemble-Stat chapter of the MET User's Guide referencing the 'ens' dictionary which was removed at the same time Gen-Ens-Prod was created.
Describe the Problem
Multiple runtime problems were discovered when MET was compiled and tested with the -O2 level of optimization in support of creating a METplus build through Conda. This issue is to resolve these runtime problems related to optimization. It will only be complete when (1) all MET unit tests run to completion with no error and (2) no significant differences remain between the output from MET compiled with optimization and without.
Additional recommendations:
* … (-O2) enabled to more closely match how it is run in production.
* … -O1, -O2, and -O3.
  * -O1, 3 files have size diffs > 1% and real diffs are flagged in 1 NetCDF file:
  * -O2, same 4 diff files as -O1 plus 15 more related to BCLP derivation in TC-Pairs.
  * -O3, same 19 diff files as -O2.
  * FFLAGS did NOT solve the BCLP derivation diffs.
* … -O1, -O2, and -O3.
  * unit_modis.xml and unit_lidar2nc.xml are skipped because those tools are not compiled.
  * unit_tc_diag.xml and unit_python.xml are skipped because of issues running Python.
  * unit_point2grid.xml segfaults on the TEST: point2grid_GOES_16_AOD_TO_G212_GAUSSIAN test:
    * -O1, ...
    * -O2, ...
    * -O3, ...
The runtime problems, along with their fixes, are described below:
* unit_pb2nc.xml fails as described in MET issue "Bugfix: Fix the PARUSR BUFRLIB failure when PB2NC is compiled with -O2 optimization" #3054 and fixed by pull request "Bugfix #3054 main_v12.0 parusr" #3068. This was due to referencing temporary memory that was impacted by optimization.
* unit_point2grid.xml segfaults on the point2grid_GOES_16_ADP test.
The point2grid_GOES_16_ADP unit test fails to run when installed via conda. I suspect that this is due to optimization being turned on, similar to the bug from #3054.
Here is an example command that fails on seneca:
This produces a segfault when compiled with the -O2 optimization flag.
This problem has been fixed on the bugfix branch for this issue and was due to a buffer overflow, reading > 512 characters into a buffer of that size.
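The overflow described above, more than 512 characters read into a 512-character buffer, can be sketched as follows. This is an illustration under stated assumptions rather than the get_att_value_chars() implementation; `copy_attribute()` and `copy_bounded()` are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Unsafe pattern, shown only as a comment:
//   char buf[512];
//   std::strcpy(buf, att_value);   // writes past buf when the value
//                                  // needs 512 or more characters

// Safer: std::string allocates as much space as the value needs.
std::string copy_attribute(const char *att_value) {
    return att_value ? std::string(att_value) : std::string();
}

// If a fixed-size buffer is unavoidable, truncate explicitly and report it.
// Returns true when src fit completely, false when it was truncated
// or when an argument was invalid.
bool copy_bounded(char *dst, std::size_t dst_size, const char *src) {
    if (dst == nullptr || dst_size == 0 || src == nullptr) return false;
    std::size_t n = std::strlen(src);
    const bool fits = n < dst_size;
    if (!fits) n = dst_size - 1;    // leave room for the null terminator
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return fits;
}
```

Writing past the end of an array is undefined behavior, so it may appear harmless in a debug build and only crash once the optimizer changes the stack layout, which is consistent with the segfault showing up under -O2.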
* unit_tc_pairs.xml produces bad output in the TCDIAG lines, where the diagnostic names contain garbage strings.
While tc_pairs completes without error, the unit test validation logic fails when reading the corrupted strings. This was fixed by updating the TrackInfo::diag_name() accessor function to return a string copy of the diagnostic name rather than a const char * pointer to temporary memory.
* unit_point2grid.xml aborts on the point2grid_2D_time test on line 442 of nc_cf_file.cc. The debugger indicates that the NetCDF groupId value being read is corrupted:
When it should have the same groupId as _xDim (confirmed by debugging the non-optimized version):
The problem lies in the read_netcdf_grid() function in nc_cf_file.cc where the _xDim and _yDim pointers were pointing to a local variable. The fix is to search for and point them to class variables that won't go out of scope.
* … -g -O2. However, comparing to the output from the unoptimized nightly build runs reveals differences, including differences in the number of TCST lines written by tc_pairs:
However, all the differences in these 2 files are for the BCLP model, which is computed using NHC-provided FORTRAN code. So it's possible an optimization problem also resides in the FORTRAN code within MET. For the time being, I'll retest WITHOUT setting FFLAGS='-O2' to see if differences remain.
* unit_tc_diag.xml segfaults the same way on Intel builds for all optimization levels. The problem occurs in the Python diagnostic computation step.
* unit_python.xml also fails on the python_plot_point_obs_met_nc_to_pandas test after completing many successful tests.
These Python failures could very well be related to us linking to this /nrit/ral/met-python3/bin/python3.10 version of Python built with GNU. Perhaps we need to link to an Intel Python build instead? For the time being, I've just skipped over unit_tc_diag.xml and unit_python.xml.
Expected Behavior
All MET unit tests should run to completion and create numerically equivalent output regardless of the optimization level.
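Two of the failures described earlier in this issue, the corrupted diagnostic names and the corrupted groupId, come down to pointers that outlive the object they refer to, which is why results varied with the optimization level. The sketch below shows the safe form of each pattern. `Track`, `Dim`, and `GridFile` are hypothetical simplifications and do not reproduce TrackInfo or NcCfFile.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplification of a track class with diagnostic names.
class Track {
public:
    void add_diag(const std::string &name) { diag_names_.push_back(name); }

    // Dangling pattern, shown only as a comment: the temporary string is
    // destroyed at the end of the return statement, so the pointer dangles.
    //   const char *diag_name(std::size_t i) const {
    //       return std::string(diag_names_[i]).c_str();
    //   }

    // Safe: return the name by value so the caller owns its own copy.
    std::string diag_name(std::size_t i) const {
        return i < diag_names_.size() ? diag_names_[i] : std::string();
    }

private:
    std::vector<std::string> diag_names_;
};

// Hypothetical dimension record.
struct Dim {
    int group_id;
};

// Keeps a pointer to a dimension owned by the class, not to a local copy.
class GridFile {
public:
    explicit GridFile(std::vector<Dim> dims) : dims_(std::move(dims)) {}
    GridFile(const GridFile &) = delete;             // copies would leave
    GridFile &operator=(const GridFile &) = delete;  // x_dim_ pointing away

    // Search the member vector and point x_dim_ at the matching element.
    bool select_x_dim(int group_id) {
        for (auto &d : dims_) {
            if (d.group_id == group_id) { x_dim_ = &d; return true; }
        }
        return false;
    }

    const Dim *x_dim() const { return x_dim_; }

private:
    std::vector<Dim> dims_;        // never resized after construction
    const Dim *x_dim_ = nullptr;
};
```

In both cases the unsafe version can appear to work in an unoptimized build because the freed stack or heap memory has not yet been reused, so matching output across optimization levels is a useful check that such lifetime bugs are gone.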
Environment
Describe your runtime environment:
Reproduced on the MET development machine named seneca.
To Reproduce
Describe the steps to reproduce the behavior:
* … CFLAGS, CXXFLAGS, and FFLAGS set to include -O2.
* … bin/unit_test.sh to run all of the MET unit tests.
Relevant Deadlines
List relevant project deadlines here or state NONE.
Funding Source
2702701 to support conda builds.
Define the Metadata
Assignee
Labels
Milestone and Projects
Define Related Issue(s)
Consider the impact to the other METplus components.
No impacts.
Bugfix Checklist
See the METplus Workflow for details.
Branch name:
bugfix_<Issue Number>_main_<Version>_<Description>
Pull request:
bugfix <Issue Number> main_<Version> <Description>
Select: Reviewer(s) and Development issue
Select: Milestone as the next bugfix version
Select: Coordinated METplus-X.Y Support project for support of the current coordinated release
Branch name:
bugfix_<Issue Number>_develop_<Description>
Pull request:
bugfix <Issue Number> develop <Description>
Select: Reviewer(s) and Development issue
Select: Milestone as the next official version
Select: MET-X.Y.Z Development project for development toward the next official release