Various bugs/problems found running MOM6 on raijin in Aus #198

Closed
nichannah opened this issue Jul 10, 2015 · 8 comments

Comments

@nichannah
Collaborator

This is a catch-all issue for problems that I've noticed while running MOM6 on raijin in Aus. I'm also able to run valgrind on the compute nodes there, which may reveal further issues; I'll post those here.

  • make_alloc() is not supported by the default gcc/gfortran version (4.4.7) on raijin. There is an assert in the code that checks for this but the error message could be improved.
  • line 1656 in MOM_set_diffusivity.F90 causes a 'floating invalid' exception. Floating point exceptions when using -fpe0 #194 will fix this. Interestingly, the approach of making sure that the denominator is at least G%H_subroundoff works on gaea but not on raijin.
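
For readers unfamiliar with the guard mentioned in the second bullet, here is a minimal sketch of flooring a denominator with a tiny positive value. The actual expression at line 1656 is not shown in this thread, so the variable names and the value of h_subroundoff below are hypothetical stand-ins, not the MOM6 code.

```fortran
program guard_denominator
  implicit none
  ! Hypothetical stand-in for G%H_subroundoff; the real value is set by MOM6.
  real, parameter :: h_subroundoff = 1.0e-30
  real :: num, den, ratio

  num = 0.0
  den = 0.0

  ! Unguarded, num/den would be 0/0, which raises a floating invalid
  ! exception when floating point traps are enabled (e.g. -fpe0).
  ! Flooring the denominator keeps the division well defined.
  ratio = num / max(den, h_subroundoff)

  ! Self-check: 0 divided by a positive number is exactly 0.
  if (ratio /= 0.0) error stop "unexpected ratio"
  print *, 'guarded ratio = ', ratio
end program guard_denominator
```

This only illustrates the general technique; it does not explain why the guard behaved differently on gaea and raijin.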
@nichannah
Collaborator Author

  • Various MEKE% variables may be garbage at the edges. For example, in parameterizations/lateral/MOM_MEKE.F90:1190, MEKE%Kh is initialised over the range j=js-1,je+1, i=is-1,ie+1 using barotrFac2, but barotrFac2 is only defined over j=js,je, i=is,ie. The same goes for bottomFac2 and the variables initialised from it. On reflection, it might only be a problem with MEKE%Kh.
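
The pattern described above can be reproduced in a toy program. This is a sketch under stated assumptions: the domain bounds, the array names fac2 and Kh, and the factor 100.0 are all hypothetical, and the halo is poisoned with NaN so that unset points are easy to count.

```fortran
program halo_read
  use, intrinsic :: ieee_arithmetic, only : ieee_value, ieee_quiet_nan, ieee_is_nan
  implicit none
  ! Hypothetical computational domain with a one-point halo around it.
  integer, parameter :: is = 2, ie = 5, js = 2, je = 5
  real :: fac2(is-1:ie+1, js-1:je+1)   ! stands in for barotrFac2
  real :: Kh(is-1:ie+1, js-1:je+1)     ! stands in for MEKE%Kh
  integer :: i, j

  ! Poison the whole array so points that are never set stand out.
  fac2(:,:) = ieee_value(1.0, ieee_quiet_nan)

  ! fac2 is only defined on the computational domain ...
  do j = js, je ; do i = is, ie
    fac2(i,j) = 1.0
  enddo ; enddo

  ! ... but Kh is computed over the wider range, so it reads unset halo values.
  do j = js-1, je+1 ; do i = is-1, ie+1
    Kh(i,j) = 100.0 * fac2(i,j)
  enddo ; enddo

  ! Self-checks: the interior is valid, the halo rows and columns are not.
  if (any(ieee_is_nan(Kh(is:ie, js:je)))) error stop "interior should be valid"
  if (.not. all(ieee_is_nan(Kh(is-1, :)))) error stop "halo should be unset"
  print *, 'unset points in Kh: ', count(ieee_is_nan(Kh))
end program halo_read
```

In the real model the halo holds whatever was in memory rather than NaN, which is why a tool such as valgrind is needed to spot the problem.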

@adcroft
Collaborator

adcroft commented Jul 14, 2015

Concerning MEKE: is the issue that the loops at L439, L446, L452 do not need to be wide, since there is a halo update at L457?

If reducing those loop ranges to the computational domain works under valgrind then I think the model solutions will be unchanged (because of the halo update).
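
A rough one-dimensional sketch of that reasoning: compute on the computational domain only, then let a halo update fill the edge points. The routine toy_halo_update is a hypothetical single-process stand-in that copies values periodically; it is not the MOM6 halo update, and the bounds and values are made up for illustration.

```fortran
program narrow_then_update
  use, intrinsic :: ieee_arithmetic, only : ieee_value, ieee_quiet_nan, ieee_is_nan
  implicit none
  integer, parameter :: is = 1, ie = 4   ! hypothetical computational domain
  real :: Kh(is-1:ie+1)                  ! one halo point on each side
  integer :: i

  ! Poison the array so any point left unset is detectable.
  Kh(:) = ieee_value(1.0, ieee_quiet_nan)

  ! Narrow loop: only the computational domain is computed.
  do i = is, ie
    Kh(i) = 10.0 * real(i)
  enddo

  ! The halo update supplies the edge values afterwards.
  call toy_halo_update(Kh)

  ! Self-check: after the update no point is left unset.
  if (any(ieee_is_nan(Kh))) error stop "halo still unset"
  print *, Kh

contains

  ! Hypothetical periodic halo fill for a single process.
  subroutine toy_halo_update(a)
    real, intent(inout) :: a(is-1:ie+1)
    a(is-1) = a(ie)
    a(ie+1) = a(is)
  end subroutine toy_halo_update

end program narrow_then_update
```

The halo values written by a wide loop would be overwritten by the update anyway, which is why narrowing the loop should leave the answers unchanged, provided nothing reads the halo before the update.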

@nichannah
Collaborator Author

OK, I'll give that a try. Thanks

@adcroft
Collaborator

adcroft commented Jul 14, 2015

Just realized that LmixScale is used at L465 in a wider loop so that needs to be calculated wide as you suggested earlier...

@adcroft
Collaborator

adcroft commented Jul 14, 2015

Something like this...

% git diff -U0 src/parameterizations/lateral/MOM_MEKE.F90
diff --git a/src/parameterizations/lateral/MOM_MEKE.F90 b/src/parameterizations/lateral/MOM_MEKE.F90
index 562c22f..0ff7360 100644
--- a/src/parameterizations/lateral/MOM_MEKE.F90
+++ b/src/parameterizations/lateral/MOM_MEKE.F90
@@ -439 +439 @@ subroutine step_forward_MEKE(MEKE, h, SN_u, SN_v, visc, dt, G, CS)
-          do j=js-1,je+1 ; do i=is-1,ie+1
+          do j=js,je ; do i=is,ie
@@ -446 +446 @@ subroutine step_forward_MEKE(MEKE, h, SN_u, SN_v, visc, dt, G, CS)
-          do j=js-1,je+1 ; do i=is-1,ie+1
+          do j=js,je ; do i=is,ie
@@ -452 +452 @@ subroutine step_forward_MEKE(MEKE, h, SN_u, SN_v, visc, dt, G, CS)
-        do j=js-1,je+1 ; do i=is-1,ie+1
+        do j=js,je ; do i=is,ie
@@ -636 +636 @@ subroutine MEKE_lengthScales(CS, MEKE, G, SN_u, SN_v, &
-  do j=js,je ; do i=is,ie
+  do j=js-1,je+1 ; do i=is-1,ie+1

@nichannah
Collaborator Author

When running OM4_025 on raijin:

forrtl: error (63): output conversion error, unit -5, file Internal Formatted Write
Image PC Routine Line Source
MOM6 00000000054E649A Unknown Unknown Unknown
MOM6 00000000054E4F96 Unknown Unknown Unknown
MOM6 0000000005490D10 Unknown Unknown Unknown
MOM6 000000000542DC5E Unknown Unknown Unknown
MOM6 000000000542D19F Unknown Unknown Unknown
MOM6 000000000546E8AF Unknown Unknown Unknown
MOM6 00000000007035F7 ice_grid_mod_mp_s 573 ice_grid_mod.F90
MOM6 00000000006F8293 ice_grid_mod_mp_s 459 ice_grid_mod.F90
MOM6 00000000018ACC05 ice_model_mod_mp_ 3732 ice_model.F90
MOM6 0000000001ACB443 coupler_main_IP_c 1447 coupler_main.F90
MOM6 0000000001ABBACC MAIN__ 428 coupler_main.F90
MOM6 000000000040D36C Unknown Unknown Unknown
libc.so.6 00007F88B0754D5D Unknown Unknown Unknown
MOM6 000000000040D269 Unknown Unknown Unknown

The same thing in the ocean:

forrtl: error (63): output conversion error, unit -5, file Internal Formatted Write
Image PC Routine Line Source
MOM6 00000000054E31DA Unknown Unknown Unknown
MOM6 00000000054E1CD6 Unknown Unknown Unknown
MOM6 000000000548DA50 Unknown Unknown Unknown
MOM6 000000000542A99E Unknown Unknown Unknown
MOM6 0000000005429EDF Unknown Unknown Unknown
MOM6 000000000546B5EF Unknown Unknown Unknown
MOM6 00000000007DAAC5 mom_grid_initiali 440 MOM_grid_initialize.F90
MOM6 00000000007D8BAE mom_grid_initiali 399 MOM_grid_initialize.F90
MOM6 0000000001E1A528 mom_fixed_initial 93 MOM_fixed_initialization.F90
MOM6 00000000033586BF mom_mp_initialize 1962 MOM.F90
MOM6 0000000001266DE9 ocean_model_mod_m 226 ocean_model_MOM.F90
MOM6 0000000001AC8917 coupler_main_IP_c 1466 coupler_main.F90
MOM6 0000000001AB8814 MAIN__ 428 coupler_main.F90
MOM6 000000000040D36C Unknown Unknown Unknown
libc.so.6 00007FB134F25D5D Unknown Unknown Unknown
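
For context, an output conversion error on an internal formatted write typically means a value did not fit the field width given in the format. The statements at the lines named in the traces are not shown in this thread, so the sketch below is only an assumed illustration of that class of problem, with a hypothetical buffer name and value.

```fortran
program internal_write_width
  implicit none
  character(len=8) :: mesg   ! hypothetical message buffer
  integer :: ios

  ! 12345 needs five digits but the edit descriptor I3 allows only three.
  ! Many compilers fill the field with asterisks; with run-time output
  ! conversion checking enabled, Intel Fortran can report error 63 instead.
  write(mesg, '(I3)', iostat=ios) 12345
  print *, 'iostat = ', ios, ' text = "', trim(mesg), '"'

  ! I0 uses the minimum width needed, so the value always fits.
  write(mesg, '(I0)') 12345
  if (trim(mesg) /= '12345') error stop "I0 write did not round-trip"
  print *, 'with I0: "', trim(mesg), '"'
end program internal_write_width
```

Widening the field or switching to a minimum-width descriptor such as I0 are two common remedies; which one suits the grid initialisation code depends on how the message is used.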

@nichannah
Collaborator Author

forrtl: error (73): floating divide by zero
Image PC Routine Line Source
MOM6 0000000001C96B7A ice_dyn_cgrid_mp_ 687 ice_dyn_cgrid.F90
MOM6 00000000016EC0B5 ice_model_mod_mp_ 1968 ice_model.F90
MOM6 0000000001525A41 ice_model_mod_mp_ 134 ice_model.F90
MOM6 0000000001AB8626 MAIN__ 783 coupler_main.F90
MOM6 000000000040D36C Unknown Unknown Unknown
libc.so.6 00007F21EC6EAD5D Unknown Unknown Unknown
MOM6 000000000040D269 Unknown Unknown Unknown

adcroft referenced this issue in NOAA-GFDL/SIS2 Jul 15, 2015
- In NOAA-GFDL/MOM6#198, @nicjhan reported problems running OM4_025
  with intel compiler in debug mode. This fixes the "output conversion
  error, unit -5, file Internal Formatted Write" error mentioned in that
  issue for MOM.
- No answer changes.
adcroft added a commit that referenced this issue Jul 15, 2015
- In #198, @nicjhan reported problems running OM4_025 with intel
  compiler in debug mode. This fixes the "output conversion error,
  unit -5, file Internal Formatted Write" error mentioned in that
  issue for MOM.
- No answer changes.
@nichannah
Collaborator Author

I am closing this issue and breaking the remaining problems up into separate issues. They are:

make_alloc: #202
floating invalid: #194
MEKE%Kh: #203

The formatting issues have been fixed and merged by @adcroft.

marshallward pushed a commit to OlgaSergienko/MOM6 that referenced this issue Nov 10, 2022
* Setup OBC segments for COBALT/OBGC tracers

    - These are updates required to setup OBC segments for OBGC tracers.
    - Since COBALT package has more than 50 tracers using the MOM6 table
      mechanism for setting up OBC segments is not feasible. Rather, this
      update delegates such setup to mechanisms used in ocean_BGS tracers
      leaving MOM6 mechanism for native tracers intact.
    - Fixed issues caught by MOM6 githubCI

* Add capability to change obc segment update period

- COBALT tracers do not need as frequent segment bc updates and can
  use a larger update period to speed up the model.
  This commit introduces a new parameter DT_OBC_SEG_UPDATE_OBGC
  that can be adjusted for obc segment update period.
- This commit applies the change only to BGC tracers but can easily
  be changed to apply for all.

* Insert missing US%T_to_sec

- The unit conversion factor was missing causing a crash in a newer test.

* Updates from Andrew Ross

- Avoid low initial values in the tracer reservoirs

* Per Andrew Ross review

* corrected indentation per review

* Avoid using module vars per review request

- Reviewer asked to avoid using module variables with "save" attributes.
- This commit hides the module variables inside the existing OBC type.

* Coding style corrections per review

* Modification per review: do_not_log if .not.associated(CS%OBC)

Co-authored-by: Robert Hallberg <[email protected]>