Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compiler error in framework/posix.F90 #230

Closed
jedwards4b opened this issue Dec 10, 2022 · 25 comments
Closed

compiler error in framework/posix.F90 #230

jedwards4b opened this issue Dec 10, 2022 · 25 comments

Comments

@jedwards4b
Copy link
Collaborator

Using the cray compiler cce/15.0.0 on gust I am seeing an error in file posix.F90

ftn-1279 ftn: ERROR SIGLONGJMP_POSIX, File = ../../../../../work/jedwards/sandboxes/cesm2_x_alpha/components/mom/MOM6/src/framework/posix.F90, Line = 191, Column = 31
Procedure "longjmp" is defined at line 178 (/glade/work/jedwards/sandboxes/cesm2_x_alpha/components/mom/MOM6/src/framework/posix.F90). The type of this argument does not agree with dummy argument "ENV".

@jedwards4b
Copy link
Collaborator Author

@marshallward
Copy link

This probably needs something similar to the sigsetjmp handler, see NOAA-GFDL#196

I can prepare a patch if necessary.

@marshallward
Copy link

marshallward commented Dec 10, 2022

This has also been reported by people on very old Linux kernels distros where longjmp was not available in libc.

@jedwards4b
Copy link
Collaborator Author

@marshallward thank you, a patch would be great.

@marshallward
Copy link

@jedwards4b I have prepared a branch here

https://github.com/marshallward/MOM6/tree/posix_jmp_fixes

However, this is based on the dev/gfdl branch at NOAA-GFDL/MOM6. I don't think this can be easily applied to the dev/ncar branch, which is missing some changes to these files.

I could produce a similar patch for dev/ncar, but this would likely tangle up the histories of the repositories, which we generally try to avoid.

If this isn't urgent, then we could update dev/ncar through our standard process via main at mom-ocean/MOM6, but this requires review from partners and can be rather slow. Optionally, dev/ncar could try to update against main, which may be close enough to dev/gfdl to accept this patch.

If none of those options work, then we look into some other solutions.

@marshallward
Copy link

I should add, this patch just disables the function for standard builds.

The error does suggest a different problem with mismatched function signatures, and maybe we could fix this another way. But it would probably take longer to understand this, and we already needed to disable those functions for other reasons.

@gustavo-marques
Copy link
Collaborator

We having a pending PR to update dev/ncar against main.

@marshallward
Copy link

The configure.ac patch was incompatible with #229, but the other files went through. Maybe I could figure something out.

@jedwards4b
Copy link
Collaborator Author

@marshallward Thank you for the patch but I don't think that it will work. I've sent a test code to the cce compiler developers to see if they understand the issue.

@marshallward
Copy link

The patch replaces the function with a local inert function. Do you know of any reason that this won't work for you?

We have cray compilers on our systems so I will test it out.

@jedwards4b
Copy link
Collaborator Author

Because the issue isn't the function itself, it's the definition of the input variable env which the inclusion of the inert function does not change.

@marshallward
Copy link

True but the current version assumes consistency between the system longjmp and the locally defined wrapper. The patch is matching calls to a new function (longjmp_missing), which should guarantee a match (even if not matching the libc signature).

I believe the problem is the definition of jmp_buf which this patch should avert. But if you don't want to test this then I can try it locally.

@jedwards4b
Copy link
Collaborator Author

I would be happy to test.

@jedwards4b
Copy link
Collaborator Author

@marshallward have you had a chance to work on this further? I was about to open a ticket against the cce compiler when a college pointed out this from the fortran standard:

@jedwards4b
Copy link
Collaborator Author

A procedure defined by means of Fortran shall not invoke setjmp or longjmp (ISO/IEC 9899:1999, 7.13). If a procedure defined by means other than Fortran invokes setjmp or longjmp, that procedure shall not cause any procedure defined by means of Fortran to be invoked. A procedure defined by means of Fortran shall not be invoked as a signal handler (ISO/IEC 9899:1999, 7.14.1).

@marshallward
Copy link

Yes, we're breaking that bit of the standard, but the patch will disable this function, and will only re-enable them if they are confirmed to be available and work correctly. On our end, this is handled by autoconf. (The interesting part is that it seems to work fine in GNU, Intel, and PGI/Nvidia, but that is probably a separate discussion).

I have not tested this on our Cray compilers since I was waiting to hear how you went with the patch, but I can try to find some time and get back to you.

@jedwards4b
Copy link
Collaborator Author

I'm sorry I thought that you were reworking the patch - do you have something I should try?

@marshallward
Copy link

marshallward commented Dec 22, 2022

This was the branch: https://github.com/marshallward/MOM6/tree/posix_jmp_fixes

I think it will work, since it disables setjmp etc as the default config, which is the correct standard-conforming solution, as you point out. But of course its hard to predict if it will.

BTW I should have probably mentioned that we do not test in Cray because there are other parts which failed to work correctly. This all predates me, so I don't know if it's still true.

How urgent is this on your end? If you do not need it right away, we can push this through our usual process to the main branch. Also, it would be useful to know if we need to add Cray in our test suite.

@marshallward
Copy link

I can reproduce this on my system, although it is a warning on my end:

ftn -DPACKAGE_NAME=\"MOM6\" -DPACKAGE_TARNAME=\"mom6\" -DPACKAGE_VERSION=\"\ \" -DPACKAGE_STRING=\"MOM6\ \ \" -DPACKAGE_BUGREPORT=\"https://github.com/NOAA-GFDL/MOM6/issues\" -DPACKAGE_URL=\"https://github.com/NOAA-GFDL/MOM6\" -DHAVE_MPI=1 -DSIGSETJMP_NAME=\"__sigsetjmp\" -DHAVE_MPI=1 -DSIZEOF_JMP_BUF=200 -DSIZEOF_SIGJMP_BUF=200 -I../../ac/../ac/deps/include -g -O0 -s real64  -c ../../config_src/external/ODA_hooks/ocean_da_types.F90 

  subroutine siglongjmp_posix(env, val) bind(c, name="longjmp")
                              ^                                 
ftn-1279 crayftn: WARNING SIGLONGJMP_POSIX, File = ../../src/framework/posix.F90, Line = 191, Column = 31 
  Procedure "longjmp" is defined at line 178 (../../src/framework/posix.F90).  The type of this argument does not agree with dummy argument "ENV".

function sigsetjmp_missing(env, savesigs) result(rc) bind(c)
                                                 ^           
ftn-287 crayftn: WARNING SIGSETJMP_MISSING, File = ../../src/framework/posix.F90, Line = 352, Column = 50 
  The result of function name "RC" in the function subprogram is not defined.

The function is only used during unit testing, so would not cause an issue in actual simulations.

I am using cce/11.0.4, perhaps these were promoted to errors in cce/15 ?

@marshallward
Copy link

I've had a closer look, and I think this is just a bug on our end, which Cray caught:

subroutine siglongjmp_posix(env, val) bind(c, name="longjmp")
...
end subroutine siglongjmp_posix

The name= is wrong and Cray is objecting. If I rename this to siglongjmp then the warning goes away for me.

@marshallward
Copy link

This patch is much simpler (also addresses the other warning):

diff --git a/src/framework/posix.F90 b/src/framework/posix.F90
index 142d7634e..e5ec0e60d 100644
--- a/src/framework/posix.F90
+++ b/src/framework/posix.F90
@@ -188,7 +188,7 @@ interface
 
   !> C interface to POSIX siglongjmp()
   !! Users should use the Fortran-defined siglongjmp() function.
-  subroutine siglongjmp_posix(env, val) bind(c, name="longjmp")
+  subroutine siglongjmp_posix(env, val) bind(c, name="siglongjmp")
     ! #include <setjmp.h>
     ! int siglongjmp(jmp_buf env, int val);
     import :: sigjmp_buf, c_int
@@ -360,6 +360,9 @@ function sigsetjmp_missing(env, savesigs) result(rc) bind(c)
   print '(a)', 'ERROR: sigsetjmp() is not implemented in this build.'
   print '(a)', 'Recompile with autoconf or -DSIGSETJMP_NAME=\"<symbol name>\".'
   error stop
+
+  ! NOTE: Compilers may expect a return value, even if it is unreachable
+  rc = -1
 end function sigsetjmp_missing
 
 end module posix

If it works, we can talk about how to best put it into the source tree.

@jedwards4b
Copy link
Collaborator Author

Yes this works in cce/15.0.0

@marshallward
Copy link

@gustavo-marques I can send this as a PR to dev/ncar and we can pick it up when you guys submit to main down the road. Would that work for you and @jedwards4b ?

@jedwards4b
Copy link
Collaborator Author

@marshallward works for me - thank you.

@marshallward
Copy link

#233

Thanks for patiently working through this. I've been expecting problems with those functions for a while now, but luckily this was not (yet) a serious one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants