Error with MOM5 with current main #352
Comments
Well, I ran MOM5 at NAS on this "day of no NCCS" and it's duplicable. To try and debug, first, I added some prints in that chunk of code above:
write(*,*) "counts(1): ", counts(1)
write(*,*) "counts(2): ", counts(2)
write(*,*) "iec: ", iec, "isc: ", isc
write(*,*) "jec: ", jec, "jsc: ", jsc
write(*,*) "IM :", IM
write(*,*) "JM :", JM
Here
and on a bad run (
So for some reason, MOM/FMS is returning...no grid? Or something. And again...we did not touch MOM5. Or FMS. And Ricardo's App changes are pretty boring. I did a test where I made an experiment in |
Looks like a problem with decomposition. isc, jsc, iec, jec are the start and end indices of the compute domain. IM, JM are the size of the domain. |
@yvikhlya I agree but...nothing changed. My hope was that My current fear is that this is one of those "We changed the memory state and things are running differently" things. As in, we can add a print statement in MOM5 somewhere and all will work again! |
@mathomp4 |
We will get back to this issue on a later date. @mathomp4 closing it for now, please feel free to reopen if needed. Thanks! |
Seems like ocean_model_init did not run properly and grid dimensions which are returned by mom4_get_dimensions are junk. Now, why is that? |
It may be easier or faster to know why using 1-deg resolution that @mathomp4 says has the same problem. |
I already have 0.25 degree set up and an interactive session, so I don't see a need to switch to 1 degree. |
The last successful run with MOM5 I did was with v10.14.1 about 2 years ago. Something got broken since then. |
@mathomp4 Do you have any suggestions on how to debug this? I can't think of anything better than putting printouts inside of ocean_model_init. |
@yvikhlya Not really. When this happened I was just confused. It just sort of "happened" one night and the only changes I could see were whitespace changes! It was like all of a sudden the system decided to do this. I suppose one possible thought is to try a run with GNU? Maybe it will show a different error? I am not sure. |
@mathomp4 Unrelated issue, but I can't push stuff to GitHub today. I have my SSH RSA key uploaded to GitHub and I was always able to push without a password, but today it asks me for a password and then says that I need an access token. How do you use GitHub these days? |
If you are seeing "access token", you might have cloned the https URL instead of the SSH. You can run If that happened, you can switch your remote url with:
where you change that to whichever repo you are in. Now, if you are like me and never want an HTTPS url from github ever again, you can run:
and from now on, git will always clone with SSH from github even if you accidentally pass it an HTTPS one! |
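The commands from this reply did not survive in the thread, so the following is only a sketch of the usual git approach, not necessarily what was originally posted. It assumes a placeholder repository URL (`GEOS-ESM/GEOSgcm`) and works in a scratch repository so that no real checkout or global setting is touched.

```shell
# Scratch repository so nothing here touches a real checkout.
repo=$(mktemp -d)
git -C "$repo" init -q

# Assumed example: a remote that was cloned over HTTPS by mistake.
# The repository name is a placeholder.
git -C "$repo" remote add origin https://github.com/GEOS-ESM/GEOSgcm.git

# Option 1: switch just this remote to its SSH form.
git -C "$repo" remote set-url origin git@github.com:GEOS-ESM/GEOSgcm.git
url_after_set=$(git -C "$repo" remote get-url origin)

# Option 2: keep the HTTPS URL but have git rewrite it to SSH.
# --local limits the rule to this repository; --global would apply it everywhere.
git -C "$repo" remote set-url origin https://github.com/GEOS-ESM/GEOSgcm.git
git -C "$repo" config --local url."git@github.com:".insteadOf "https://github.com/"
url_after_rewrite=$(git -C "$repo" remote get-url origin)

echo "after set-url:   $url_after_set"
echo "after insteadOf: $url_after_rewrite"
```

Running `git remote -v` in an existing clone shows which form the remote currently uses.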
@mathomp4 Thanks! That was it. |
Well, 2 suggestions:
|
@mathomp4 There is something wrong here. A printout from MOM5 run:
P.S. Just verified that it runs MOM_GEOS5PlugMod.F90 (MOM5), but |
Hmm! Maybe that shared object lib/ DSO stuff hitting us again? |
We might need to add back |
@mathomp4 Could you remind me how to use LD_PRELOAD in csh? It works in bash for me:
But gives error in csh:
|
@yvikhlya You have to use
|
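The exact commands exchanged here were lost, so this is a hedged sketch of how `LD_PRELOAD` is typically set in bash versus csh. The library path and the `./model.x` executable are placeholders; the thread does not say which library is preloaded.

```shell
# Hypothetical path: the thread does not say which library is preloaded.
preload_lib="/path/to/libexample.so"

# bash/sh, whole session: export the variable before launching the model.
# Done in a subshell here so the setting does not leak into this shell.
exported_value=$( export LD_PRELOAD="$preload_lib"; echo "$LD_PRELOAD" )
echo "bash export sees: $exported_value"

# bash/sh, single command (./model.x is a placeholder executable):
#   LD_PRELOAD=/path/to/libexample.so ./model.x
#
# csh/tcsh does not accept the VAR=value command prefix. Equivalents there:
#   setenv LD_PRELOAD /path/to/libexample.so            # whole session
#   env LD_PRELOAD=/path/to/libexample.so ./model.x     # single command
```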
LD_PRELOAD works! MOM5 initialized correctly. If this is the solution we are going to use, we need to update gcm_run.j and submit a PR (I can do it). The model crashed in the land component though, with error:
This is a whole separate issue, something is wrong with restarts which we generated with @sanAkel last week. I am investigating this issue. |
Nice! I suppose a simple "If MOM5, add LD_PRELOAD" can work.
Ouch. Yeah. That's when I start asking around! |
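The "If MOM5, add LD_PRELOAD" idea could look roughly like the sketch below. It is written in POSIX sh only to illustrate the logic; gcm_run.j itself is a csh script, so the real change would use csh `if` and `setenv` syntax. The helper name, the model-name strings, and the library path are all hypothetical.

```shell
# Hypothetical helper: print the LD_PRELOAD value to use for a given ocean model.
# The model names and the library path are placeholders, not taken from gcm_run.j.
preload_for_ocean() {
    ocean_name=$1
    preload_lib=$2
    if [ "$ocean_name" = "MOM5" ]; then
        # Only MOM5 runs get the preload.
        echo "$preload_lib"
    else
        # Any other ocean model runs without it.
        echo ""
    fi
}

mom5_preload=$(preload_for_ocean MOM5 /path/to/libexample.so)
mom6_preload=$(preload_for_ocean MOM6 /path/to/libexample.so)
echo "MOM5 -> '${mom5_preload}'"
echo "MOM6 -> '${mom6_preload}'"
```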
I can confirm that works for both:
|
The current GEOSgcm main (as of today) seems to have an issue running MOM5. But as near as I can tell nothing has fundamentally changed with MOM5! There were some changes to GEOSgcm_App from Ricardo, but I tested those in a MOM5 run yesterday and it seemed to work. I also did an xxdiff between a working run from last night to the current test and pretty much all the differences are in whitespace!
To wit, as an example in this file (NOTE: the SLURM file name will change every night):
the error is:
Looking in GEOSgcm_GridComp:
the line in question (877) is:
ASSERT_(counts(1)==IM)
This error did not happen in MAPL develop tests last night, so I can't blame MAPL.
I will probably be consulting @yvikhlya and @sanAkel about this.