We are running UFS on WCOSS2 HPE/Cray systems to evaluate the RRTMGP radiation scheme and find that it fails in the mo_gas_concentrat routine in the longwave code if more than one OMP thread is used. The shortwave code (and the other RRTMGP routines executed before it) runs without error.
Steps to Reproduce
Build UFS with OpenMP enabled
Run with OMP_NUM_THREADS=1: the run completes successfully
Run with OMP_NUM_THREADS=2: the code segfaults in the mo_gas_concentrat routine
78: [h24c21:2818 :0:2884] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
56: [h24c12:18836:0:18888] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
41: [h24c12:18821:0:18894] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
68: forrtl: severe (153): allocatable array or pointer is not allocated
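The reproduction steps above can be scripted. The following is a minimal sketch, not the actual UFS launch procedure: the command passed to `run_with_threads` is a placeholder (a real run would use the site's job launcher and `fv3.exe`), and both function names are hypothetical. It only shows how to set `OMP_NUM_THREADS` per run and label the exit status.

```python
import os
import signal
import subprocess
import sys


def run_with_threads(cmd, num_threads):
    """Run cmd with OMP_NUM_THREADS set; return (returncode, stdout)."""
    # Copy the current environment and override only the thread count.
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    return result.returncode, result.stdout


def describe_exit(returncode):
    """Label a return code; on POSIX a negative code means 'killed by signal'."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exit code " + str(returncode)


if __name__ == "__main__":
    # Placeholder command (assumption): replace with the real model launch line.
    cmd = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
    for n in (1, 2):
        rc, out = run_with_threads(cmd, n)
        print(n, describe_exit(rc), out.strip())
```

When the model is started through an MPI launcher, the launcher may report its own exit code instead of the signal, so the log lines remain the more reliable evidence of a segfault.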
@pj-gdit Thanks for bringing this to our attention!
I have a fix in place and will open a PR soon, just waiting to combine with some other RRTMGP related cleanup.
I downloaded the modified source, recompiled, and ran some 1- and 2-thread OMP tests with our c96 UFS case. Threading now runs successfully, but I noticed that the answers change on every run when the number of threads is greater than 1. The old RRTMG code does not exhibit this: its answers are bit-reproducible on every run and across different numbers of threads. There must be a race condition somewhere within RRTMGP. I will explore this some more. Thanks again.
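One simple way to check the bit reproducibility described above is to compare output files from two runs byte for byte. This is a sketch under assumptions: the function names are hypothetical, and it presumes the output files carry no run-specific metadata (for example an embedded timestamp), which would make the digests differ even when the fields are identical.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def bit_identical(path_a, path_b):
    """True if the two files have exactly the same bytes (by digest)."""
    return file_digest(path_a) == file_digest(path_b)
```

Running the same case twice with the same thread count and comparing the outputs separates run-to-run variation from differences that only appear when the thread count changes.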
Output
Traceback and errors from UFS
68: fv3.exe 00000000043CEE02 Unknown Unknown Unknown
68: fv3.exe 000000000393039E mo_gas_concentrat 288 mo_gas_concentrations.F90
68: fv3.exe 00000000038066A7 rrtmgp_lw_main_mp 300 rrtmgp_lw_main.F90
68: fv3.exe 0000000003429255 ccpp_fv3_gfs_v17_ 518 ccpp_FV3_GFS_v17_p8_rrtmgp_radiation_cap.F90
68: fv3.exe 00000000030E13B6 ccpp_static_api_m 943 ccpp_static_api.F90
68: fv3.exe 00000000030DDDA2 ccpp_driver_mp_cc 188 CCPP_driver.F90