History crashes when there are more than 3 output files #332

Closed
jimmielin opened this issue Dec 10, 2024 · 0 comments

What happened?

The model crashes with the following error when more than 3 history file slices are written:

 1:  Opened file 2412.sima.dev.FKESSLER.intel.cam.h1i.0001-01-01-03600.nc to write
 1:          130
 1:  Opened file 2412.sima.dev.FKESSLER.intel.cam.h1i.0001-01-01-07200.nc to write
 1:          131
 1:  Opened file 2412.sima.dev.FKESSLER.intel.cam.h1i.0001-01-01-10800.nc to write
 1:          132
 1:
 1:
 1:
 1:
 1:
 1:
 1: forrtl: severe (408): fort: (7): Attempt to use pointer FILE_DESC when it is not associated with a target
 1:
 1: Image              PC                Routine            Line        Source
 1: cesm.exe           000000000047FBC5  cam_abortutils_mp          79  cam_abortutils.F90
 1: cesm.exe           00000000004E4090  cam_pio_utils_mp_        1254  cam_pio_utils.F90
 1: cesm.exe           0000000000E6EF7D  cam_hist_file_mp_         991  cam_hist_file.F90
 1: cesm.exe           00000000004963DD  cam_history_mp_hi         211  cam_history.F90
 1: cesm.exe           000000000048235A  cam_comp_mp_cam_t         489  cam_comp.F90
 1: cesm.exe           0000000000460DEF  atm_comp_nuopc_mp        1170  atm_comp_nuopc.F90

What are the steps to reproduce the bug?

  1. Check out the latest CAM-SIMA (you might have to merge PR #316, "Implements cam_thermo_water_update and CCPPized check_energy", to allow the build)
  2. Create any case with history output (FKESSLER)
  3. ./xmlchange DEBUG=true
  4. ./case.setup and build
  5. Use the following history configuration to ensure there are more than 3 slices written:
       hist_add_inst_fields; h1: T,U,V
       hist_max_frames; h1: 1
       hist_output_frequency; h1: 1*nhours
       hist_precision;h1:REAL64
  6. Submit case

What CAM-SIMA hash were you using?

f999707

What machine were you running CAM-SIMA on?

CISL machine (e.g. cheyenne)

What compiler were you using?

Intel

Path to a case directory, if applicable

No response

Will you be addressing this bug yourself?

Yes

Extra info

It appears that the linked-list pooling logic in cam_register_open_file (src/utils/cam_abortutils.F90) causes the second file's entry to lose its %file_desc association.

If a debug printout is added above the check:

      do while (associated(of_ptr))

         ! debug here
         if(.not. associated(of_ptr%file_desc)) then
             write(6,*) of_ptr%file_name
         endif
         ! / debug here

         if (file%fh == of_ptr%file_desc%fh) then ! <---- segfault location
            call endrun(subname//': Cannot register '//trim(file_name)//', file already open as '//trim(of_ptr%file_name))
         end if
         of_ptr => of_ptr%next
      end do

The output shows that, starting from the second file, the list entry's file_desc pointer is no longer associated:

 1:  !!! debug hplin: file_desc unalloc, fn
 1:  2412.sima.dev.FKESSLER.intel.cam.h1i.0001-01-01-07200.nc
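The debug check above can be folded into the traversal so that a dangling entry is reported instead of dereferenced. The following is a minimal, self-contained sketch of that idea, not the fix that went into the PR. The types `file_desc_t` and `open_file_t`, the file names, and the handle values are hypothetical stand-ins for the real definitions in cam_abortutils.F90. The guard is written as a nested `if`/`else if` because Fortran does not guarantee short-circuit evaluation of `.and.`/`.or.`.

```fortran
program guarded_list_walk
   implicit none

   ! Hypothetical stand-in for the PIO file descriptor type
   type :: file_desc_t
      integer :: fh = -1
   end type file_desc_t

   ! Hypothetical stand-in for one node of the open-file registry
   type :: open_file_t
      type(file_desc_t), pointer :: file_desc => null()
      character(len=64)          :: file_name = ''
      type(open_file_t), pointer :: next => null()
   end type open_file_t

   type(open_file_t), pointer :: head, node, of_ptr
   type(file_desc_t), target  :: fd1, fd3
   type(file_desc_t)          :: new_file
   integer                    :: n_dangling
   logical                    :: already_open

   fd1%fh = 130
   fd3%fh = 132
   new_file%fh = 133

   ! Build a three-node list whose middle node has lost its descriptor,
   ! mimicking the state reported in this issue
   allocate(head)
   head%file_name = 'file_1.nc'
   head%file_desc => fd1
   allocate(head%next)
   node => head%next
   node%file_name = 'file_2.nc'   ! file_desc left unassociated on purpose
   allocate(node%next)
   node => node%next
   node%file_name = 'file_3.nc'
   node%file_desc => fd3

   n_dangling   = 0
   already_open = .false.
   of_ptr => head
   do while (associated(of_ptr))
      if (.not. associated(of_ptr%file_desc)) then
         ! Report the dangling entry instead of dereferencing it
         n_dangling = n_dangling + 1
         write(*, '(2a)') 'dangling file_desc: ', trim(of_ptr%file_name)
      else if (new_file%fh == of_ptr%file_desc%fh) then
         already_open = .true.
      end if
      of_ptr => of_ptr%next
   end do

   write(*, '(a,i0)') 'dangling entries: ', n_dangling
   write(*, '(a,l1)') 'already open: ', already_open
end program guarded_list_walk
```

A guard like this only avoids the crash at the comparison. The underlying question of why the pooled node loses its `file_desc` still has to be answered in the pooling logic itself.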

There are a couple of issues with the pooling code that could be improved for robustness. A PR will follow this issue.

@jimmielin jimmielin added the bug Something isn't working correctly label Dec 10, 2024
@jimmielin jimmielin self-assigned this Dec 10, 2024
jimmielin added a commit that referenced this issue Dec 16, 2024
… with history output (#333)

Tag name (required for release branches):
Originator(s): @jimmielin 

Description (include the issue title, and the keyword ['closes',
'fixes', 'resolves'] followed by the issue number):

- Fixes #331 (`max_mdims` used before defined)
- Fixes #332 (unassociated `of%file_desc` in `cam_register_open_file`
leading to crash with >2 history files)

Describe any changes made to build system: N/A

Describe any changes made to the namelist: N/A

List any changes to the defaults for the input datasets (e.g. boundary
datasets): N/A

List all files eliminated and why: N/A

List all files added and what they do: N/A

List all existing files that have been modified, and describe the
changes:
(Helpful git command: `git diff --name-status
development...<your_branch_name>`)

```
Fixes #331
M       src/history/cam_hist_file.F90

Fixes #332
M       src/utils/cam_abortutils.F90

```

If there are new failures (compared to the
`test/existing-test-failures.txt` file),
have them OK'd by the gatekeeper, note them here, and add them to the
file.
If there are baseline differences, include the test and the reason for
the diff. What is the nature of the change? Roundoff?

derecho/intel/aux_sima:

derecho/gnu/aux_sima:

If this changes climate describe any run(s) done to evaluate the new
climate in enough detail that it(they) could be reproduced:

CAM-SIMA date used for the baseline comparison tests if different than
latest:
@github-project-automation github-project-automation bot moved this from To Do to Done in CAM Development Dec 16, 2024