-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Evp kernel_v2 #252
Evp kernel_v2 #252
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these changes are all isolated and shouldn't make any difference to simulations as long as the new namelist flag is set to turn them off. However we are a few days from releasing CICE v6 and I'm a little hesitant to include this much new code, which most of us don't have experience running or maintaining. I propose that we wait until after the v6 release to begin working with this new capability. I'm open to counterarguments, though, since @mhrib et al have been working with it for a while now!
I agree this should not be part of the v6 release, but I think we should be able to review, test, and merge this to the trunk after the release. |
Hi Elisabeth et al.,
I made the PR, as we just updated CICE to master and still want to include the new kernel. As I did that, and because there has been many updates since I made the first PR, I found it naturally to update the PR.
As it is implemented, it will not use the kernel unless evp_kernel_ver is actively changed / set to "2" within the ice_in name list.
It sounds reasonable to wait until the official release is done in order to have more eyes on the code and more tests
However, one could consider my isolated question within the PR:
Is "strocnx,strocny" used at all from subroutine "stepu"? It seems to be re-calculated in evp_finish routine and not used at all during the evp-loop.
Rgs
Mads
…________________________________________
From: Elizabeth Hunke <[email protected]>
Sent: Tuesday, November 27, 2018 6:26:11 PM
To: CICE-Consortium/CICE
Cc: Mads Hvid Ribergaard; Mention
Subject: Re: [CICE-Consortium/CICE] Evp kernel_v2 (#252)
@eclare108213 commented on this pull request.
It looks like these changes are all isolated and shouldn't make any difference to simulations as long as the new namelist flag is set to turn them off. However we are a few days from releasing CICE v6 and I'm a little hesitant to include this much new code, which most of us don't have experience running or maintaining. I propose that we wait until after the v6 release to begin working with this new capability. I'm open to counterarguments, though, since @mhrib<https://github.com/mhrib> et al have been working with it for a while now!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#252 (review)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AkUFxbCq4SfbKP2mNgxejU3f_yVqJWrVks5uzXWzgaJpZM4Yve0F>.
|
Agree, I can also update it to master when the release is official. It should be straight forward...
Rgs
Mads
…________________________________________
From: Tony Craig <[email protected]>
Sent: Tuesday, November 27, 2018 7:08:09 PM
To: CICE-Consortium/CICE
Cc: Mads Hvid Ribergaard; Mention
Subject: Re: [CICE-Consortium/CICE] Evp kernel_v2 (#252)
I agree this should not be part of the v6 release, but I think we should be able to review, test, and merge this to the trunk after the release.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#252 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AkUFxSSIMTeJmf3mYw1mnG3uVEN4YKFRks5uzX-JgaJpZM4Yve0F>.
|
You are correct about this. The calculation of ice-ocean stress has changed, and now the portion that's in stepu is no longer needed. That and the old, commented-out code in dyn_finish that went with it could be removed. Thanks for pointing it out!
famous last words... |
Updates for RASM compatibility
doc mods for CICE release
* Update ice_dyn_eap.F90 * merge in hh eap_efficient branches * fix declaration of variables for stress_rdg interpolate feature * update icepack and cleanup log file * Update ice_dyn_eap.F90 * merge in hh eap_efficient branches * fix declaration of variables for stress_rdg interpolate feature * update icepack and cleanup log file * backup icepack * remove CAM_ICE and BARRIERS from cice.settings
update license file
correction to license file
* added subroutines for slotted cylinder test, added ice_data_type to namelist * finalized advection test * fixed wrong syntax for logical not * Corrected description of dxrect, dyrect Documentation said they were in meters but in the code they are in centimeters * Added documentation for boxslotcyl test Also : minor correction in the documentation of the box2001 test, added link to box2001 doc from Table of namelist options * Moved velocity initialization for boxslotcyl test to ice_init.F90 * revert ice_dyn_shared.F90 to CICE master version * add test to base_suite; fix LANL machine files * fixed options file for gbox80 and gbox128 following introduction of ice_data_type namelist flag * resolved conflict in ug_case_settings.rst * change dxrect, dyrect units to cm We ought to fix this in the code so that the units are m... cm is left over from POP grid files.
Some documentation fixes
I have now updated the EVP kernel to master.
+ added timers for "evp_1d" and "evp_2d"
+ removed "strcnx,strcny" in stepu, as it is also calculated in dyn_finish()
rgs
Mads
…________________________________________
From: Tony Craig <[email protected]>
Sent: Thursday, January 17, 2019 18:06
To: CICE-Consortium/CICE
Cc: Mads Hvid Ribergaard; Mention
Subject: Re: [CICE-Consortium/CICE] Evp kernel_v2 (#252)
Thanks Mads, that all sounds good. I agree an issue about removing the dead code is a good idea, and I would be happy for that code to be removed in this PR and for the issue to be associated with this PR.
I will play around with the new feature. I am still not totally clear about the new cpps. Does the compiler automatically set those cpps if openmp/openacc is turned on during compilation? Is that a standard that we can rely on with all compilers/machines? If not, I don't see other logic that would trigger the cpps automatically and we may want to address that. I also see that it depends on omp_lib. Is that something we can count on being on all machines and all compilers? I suspect the answer is yes to both, just not familiar with the current standard. I thought I've used omp_get_num_threads without a use statement before, but maybe I'm remembering wrong.
I continue to be worried about the OpenMP calls in the model. We have only turned off ones that have created problems that we detected (like not bit-for-bit results with different pes/threads). But apparently, there are still questions about others. Not sure how to best move forward to validate the OpenMP.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#252 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AkUFxSn2Ir-zVTUrTX9poIwFnx8087S1ks5vEK2AgaJpZM4Yve0F>.
|
Where are we with this PR? Is it ready to be reviewed? @apcraig are you doing that? Let me know if my help is needed. |
I am running some tests now. We have another problem in that the update to master was not done as a rebase so all the master changes are appearing in this commit. I think we believe it will not affect the PR merge but it does make it difficult to review. We might want to consider making a new branch off the consortium master and then merging in this branch on top. That's what a rebase pull would have done. For now, I'll proceed with testing though. |
Test results are here, https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks, hash aa6de33. Looks like some box configurations changed answers for intel, pgi, and gnu but not cray. Otherwise everything is ok. Do we need to understand why some of the box configurations have different answers? |
Hi Tony,
I did "git pull --rebase upstream master" and afterwards changed a few things in one file, that did not merge out-of-the-box.
Should I do it in another fashion, or make a new PR?
I am absolutely not an expert in git, but if you could give me some hints I may be able to fix it so it will keep the comments etc?
Rgs
Mads
…________________________________________
From: Tony Craig <[email protected]>
Sent: Sunday, January 27, 2019 1:52:02 AM
To: CICE-Consortium/CICE
Cc: Mads Hvid Ribergaard; Mention
Subject: Re: [CICE-Consortium/CICE] Evp kernel_v2 (#252)
I am running some tests now.
We have another problem in that the update to master was not done as a rebase so all the master changes are appearing in this commit. I think we believe it will not affect the PR merge but it does make it difficult to review. We might want to consider making a new branch off the consortium master and then merging in this branch on top. That's what a rebase pull would have done. For now, I'll proceed with testing though.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#252 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AkUFxWD2QVBjZoFlHDCnk86nkgKS7Be3ks5vHPgygaJpZM4Yve0F>.
|
@mhrib I look a bit closer. You are correct, it looks like you did a rebase based on the log on your branch. I expected this would result in a cleaner representation of the files changed in the PR, but it doesn't. That's a bit frustrating. Lets leave this branch as is for the moment. |
@apcraig: Note, that there is also changes in mpi/serial scatter_global and gather_global_ext making them able to handle integers and logicals for passing /icetmask, iceumask) matrices. This is quite some lines - however very much similar to the existing gather/scatter routines. The gather_global_ext has been transformed into an interface.
…________________________________________
From: Tony Craig <[email protected]>
Sent: Sunday, January 27, 2019 20:48
To: CICE-Consortium/CICE
Cc: Mads Hvid Ribergaard; Mention
Subject: Re: [CICE-Consortium/CICE] Evp kernel_v2 (#252)
@mhrib<https://github.com/mhrib> I look a bit closer. You are correct, it looks like you did a rebase based on the log on your branch. I expected this would result in a cleaner representation of the files changed in the PR, but it doesn't. That's a bit frustrating. Lets leave this branch as is for the moment.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#252 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AkUFxWNnFplTaQe1-HXO0J9GgPXe_C4gks5vHgJ3gaJpZM4Yve0F>.
|
@apcraig: Not sure I understand the differences of your test runs. I assume all did compile, but is it different box-decompositions that fails for some compilers and not for others?
If so, the errors could also stems from OMP problems within other routines. This can be checked "simply" (sed command) by comment-out all OMP statements in all routines except for the ones in ice_dyn_evp_1d.F90
…________________________________________
From: Tony Craig <[email protected]>
Sent: Sunday, January 27, 2019 05:11
To: CICE-Consortium/CICE
Cc: Mads Hvid Ribergaard; Mention
Subject: Re: [CICE-Consortium/CICE] Evp kernel_v2 (#252)
Test results are here, https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks, hash aa6de33<aa6de33>. Looks like some box configurations changed answers for intel, pgi, and gnu but not cray. Otherwise everything is ok. Do we need to understand why some of the box configurations have different answers?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#252 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AkUFxRIHW3lOXspS-8HN7cFWUdYDUf4Aks5vHSbYgaJpZM4Yve0F>.
|
Getting back to the box test failures. The model runs fine, the problem is the answer is different for several (but not all) box cases compared to the current master. This PR is bit-for-bit for all non-box cases with all compilers. I will look a bit further. But I doubt it's openmp as that is not likely to affect just the box cases and not others. I think the thing that jumps out at me is the boundary conditions. The box case has different boundary conditions than the global grids, at least in the "longitudinal" direction where non-box cases wrap around. Could there be a difference/error in the implementation for cases with non-wrap-around boundary conditions? I will try to look into this a bit more. @mhrib Is it possible for you to run some of the cases. You can see all the test results here, https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks under hash aa6de33. You can see that 5 tests failed bit-for-bit comparison on each pgi, intel, gnu. But they all passed on cray. That's also a bit odd. It could be a compiler/omp thing or something else. Is the omp behaving differently in this code compared to the master. We test all mpi, openmp, and mixed pretty carefully and compare answers between them all when we can. That is how we have detected other openmp problems in the past. I'm not saying there aren't problems, but I just don't understand why it only affects a few of the box cases and only on 3/4 compilers on gordon. |
I found a good failing test. I cut below a case where openmp is off, we are running the gbox128 on 8x1 tasks x threads, diagnostic written every timestep. Current master:
Current version:
So this suggests it's not openmp. It also suggests the initial divergence is roundoff. If we beleive these changes might be introducing a roundoff error (order or operations or similar), then maybe this is all fine. I still wonder why this only shows up on some box cases and not other cases. |
Let me summarize where we are. With evp_kernel_ver=0, results are bit-for-bit for most tests against the current master. This is running full test suites on gordon for 4 compilers. A subset of box tests are NOT bit-for-bit on 3/4 compilers. Rerunning the failed box tests with the debug flag (reduced optimization and run time checks) on both master and this PR results in bit-for-bit identical answers. It seems the changes in the answers in the box test is caused by some compiler optimization as a results of the code changes. This might be associated with the evp kernel changes (although @mhrib makes a case it shouldn't) or it might be associated with some of the code cleanup. We could look into this further or we could accept it. Personally, I am comfortable with this outcome as it stands. I believe we've shown the answers are roundoff different (see above gbox128 diff) as a result of compiler optimization and that we can make this bit-for-bit if we reduce compiler optimization. I think based on these results, we could merge this PR. evp_kernel_ver=0 will be the default setting. Separately, there is an effort to test and validate the evp_kernel_ver=2. The same test suite on gordon was run with the new kernel on. Results can be found https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks, hash aa6de33...+evpk=2. Three to four tests fail on each compiler, and they are the same tests across the compilers. Looking at the intel results, https://github.com/CICE-Consortium/Test-Results/wiki/aa6de33f19.gordon.pgi.190128.235649, there are four failures.
Again, many tests passed, but these 4 failures need to be debugged. In addition, the qc test relies on the gx1 configuration, so the qc testing comparing evp_kernel_ver=2 to 0 could not be done. So, the outstanding tasks are
I have reviewed the code diffs by doing a diff in two sandboxes (the file diffs in the PR contain a bunch of changes from the master merge). I am fine with the current code. And I am comfortable with the evp_kernel_ver=0 implementation. I propose we merge this PR now with the understanding that we'd create a new issue that identified the outstanding tasks above still to be done. @eclare108213, are you OK with that approach. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with caveat that the evp_kernel_ver=2 setting still needs some debugging, testing and validation work. We would create a new issue to address those outstanding tasks.
This was merged as #278, exact same code changes just on a clean branch. |
#205 updated to present master.
New vectorized EVP kernel:
Possible code correction/reduction??: affects "stepu" routines in: evp_kernel1d.F90 AND default code ice_dyn_shared.F90:
--o--
Namelist:
&dynamics_nml
evp_kernel_ver = 0 ! 0: CICE (default) , 2: kernel_v2
Environment:
OMP_NUM_THREADS
Option: REAL4 internally instead of REAL8:
mv evp_kernel1d.F90 evp_kernel1d_r8.F90
cat evp_kernel1d_r8.F90 | sed s/DBL_KIND/REAL_KIND/g > evp_kernel1d.F90
--o--
Developer(s):
Mads Hvid Ribergaard and Jacob Weismann Poulsen, DMI
Please suggest code Pull Request reviewers in the column at right.
@eclare108213
Are the code changes bit for bit, different at roundoff level, or more substantial?
In most cases it should be "bit-to-bit". Depending on how the code is translated during compilation and which math libs are used.
Does this PR create or have dependencies on Icepack or any other models?
Nothing other than CICE normally have
Is the documentation being updated with this PR?
A few lines added in developers guide: doc/source/developer_guide/dg_dynamics.rst
If not, does the documentation need to be updated separately at a later time?
Other Relevant Details:
Basic structure
evp_copyin() : gather
evp_kernel() : loop stress/stepu/halo_update
evp_copyout() : scatter
There is a natural "overhead" associated with gather/scatter and defining vectors. Therefore it becomes more efficient for "large" setups.
Affected files:
./cicecore/cicedynB/dynamics/evp_kernel1d.F90 (The core, new file)
./cicecore/cicedynB/dynamics/ice_dyn_evp.F90 (switch between CICE/new kernels)
./cicecore/cicedynB/dynamics/ice_dyn_shared.F90 (namelist)
./cicecore/cicedynB/general/ice_init.F90 (namelist)
./cicecore/cicedynB/infrastructure/comm/mpi/ice_gather_scatter.F90 ()
./cicecore/cicedynB/infrastructure/comm/serial/ice_gather_scatter.F90 ()
(*) Updated gather_scatter_ext to take care of integers + logicals (icetmask,iceumask)
Possible OMP issues in these files (see more below):
./cicecore/cicedynB/analysis/ice_diagnostics.F90 (2x)
./cicecore/cicedynB/general/ice_init.F90
./cicecore/drivers/cice/CICE_RunMod.F90
Update since Fast vectorized EVP kernel #205 and maybe solved (to be tested):
./cicecore/cicedynB/analysis/ice_history.F90
./cicecore/cicedynB/dynamics/ice_transport_driver.F90 (2x)
./cicecore/cicedynB/dynamics/ice_transport_remap.F90
Update since Fast vectorized EVP kernel #205: OMP -> TCXOMP
./cicecore/cicedynB/dynamics/ice_dyn_eap.F90 - TCXOMP
./cicecore/cicedynB/dynamics/ice_dyn_evp.F90 - TCXOMP
OpenMP Issues ??:
Maybe OpenMP raises?. I have not tested it carefully, but a previous CICE version shows some raises. I have added a comment line just before the OMPs, that I did comment out last time. But all OMP's stays un-commented in this PR.
Search for "!MHRI: CHECK THIS OMP"
NOTE: I did only un-comment the OMPs to check its runs smoothly. Ie. this only means, that there is possible a thread-issue in one of the files - not necessary all of them.