Evp kernel_v2 #252

Closed
wants to merge 66 commits into from

mhrib
Contributor

@mhrib mhrib commented Nov 22, 2018

This is #205 updated to the present master.

  • Added "evp_1d"+"evp_2d" timer
  • Removed "strocnx,strocny" from stepu subroutine in dyn_evp_shared

New vectorized EVP kernel:

  • Vectorized: operates on contiguous vectors.
  • Runs OpenMP on the GLOBAL grid on one single node, i.e. it reduces communication and avoids HALO updates of every single block in every iteration.
  • Able to run CICE in mixed MPI+OpenMP mode on several nodes. The EVP iterations are done on the node holding "my_task==master_task".
  • Works for "regular" cyclic grids, e.g. GX3, i.e. grids where a "circular" (wrap-around) halo update is sufficient.
  • Not tested for tripole grids, and it probably does not work there. A stop is introduced in the code.
  • Namelist: switch between the "CICE standard" EVP and this EVP via the namelist. The default is evp_kernel_ver=0, which is the "CICE standard" version.
  • Use the environment variable "OMP_NUM_THREADS" to select the number of threads used by the evp_kernel.
  • There is a natural "overhead" associated with gather/scatter and with defining the vectors. Therefore the kernel becomes more efficient for "large" setups.
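The "contiguous vectors" idea can be illustrated with a minimal Python sketch. It assumes that the kernel keeps only active (ice) grid points in 1D arrays and precomputes neighbour indices. With `cyclic_i=True` the east neighbour wraps around, as in a "circular" halo update on a grid like gx3. With `cyclic_i=False` (closed boundaries) edge points get no neighbour. The names `pack_active_points` and `east_difference` are hypothetical and do not reflect the actual data layout in evp_kernel1d.F90.

```python
from typing import List, Tuple


def pack_active_points(
    mask: List[List[bool]], cyclic_i: bool
) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    """Map active cells of an ny x nx mask to contiguous 1D indices.

    Returns the (j, i) position of every active point and, per point, the
    1D index of its east and north neighbours (-1 when the neighbour is
    inactive or lies beyond a closed boundary).
    """
    ny, nx = len(mask), len(mask[0])
    index = [[-1] * nx for _ in range(ny)]  # 2D -> 1D lookup table
    points: List[Tuple[int, int]] = []
    for j in range(ny):
        for i in range(nx):
            if mask[j][i]:
                index[j][i] = len(points)
                points.append((j, i))

    east: List[int] = []
    north: List[int] = []
    for j, i in points:
        ie = i + 1
        if ie == nx:
            ie = 0 if cyclic_i else -1  # wrap only on cyclic (circular) grids
        east.append(index[j][ie] if ie >= 0 else -1)
        north.append(index[j + 1][i] if j + 1 < ny else -1)  # closed in j
    return points, east, north


def east_difference(u: List[float], east: List[int]) -> List[float]:
    """u(east) - u over active points only: no 2D mask test in the loop."""
    return [u[e] - u[k] if e >= 0 else 0.0 for k, e in enumerate(east)]


mask = [[True, True, False],
        [True, True, True]]
points, east, north = pack_active_points(mask, cyclic_i=True)
print(east)
print(east_difference([1.0, 2.0, 3.0, 5.0, 8.0], east))
```

Setting `cyclic_i=False` leaves the last column without an east neighbour. This is the kind of boundary handling that differs between wrap-around grids and closed (box-type) domains.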

Possible code correction/reduction: this affects the "stepu" routines in both evp_kernel1d.F90 and the default code in ice_dyn_shared.F90:

  • Is "strocnx,strocny" from subroutine "stepu" used at all? Is it needed there? It is also calculated in subroutine "dyn_finish" shortly after the EVP loop. Therefore I have removed the calculation in this EVP kernel, but it can easily be re-introduced (using slightly more memory) by searching for "!strocn".

--o--

Namelist:
&dynamics_nml
evp_kernel_ver = 0 ! 0: CICE (default) , 2: kernel_v2

Environment:
OMP_NUM_THREADS
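The run-time choice described above can be mirrored in a small, hypothetical Python sketch. It assumes two things based on the description: only versions 0 and 2 are valid, and version 2 stops on tripole grids. The function names and the fallback thread count are illustrative assumptions. Real OpenMP runtimes pick their own default when OMP_NUM_THREADS is unset, typically the number of cores.

```python
import os

# Hypothetical mirror of the namelist switch; the real logic is Fortran.
SUPPORTED_KERNELS = {0: "CICE standard EVP", 2: "vectorized EVP kernel_v2"}


def select_evp_kernel(evp_kernel_ver: int, tripole: bool) -> str:
    """Validate evp_kernel_ver (hypothetical checks based on the description)."""
    if evp_kernel_ver not in SUPPORTED_KERNELS:
        raise ValueError(f"unsupported evp_kernel_ver = {evp_kernel_ver}")
    if evp_kernel_ver == 2 and tripole:
        # kernel_v2 is untested on tripole grids, so the code stops instead
        raise RuntimeError("evp_kernel_ver = 2 is not supported on tripole grids")
    return SUPPORTED_KERNELS[evp_kernel_ver]


def requested_threads(default: int = 1) -> int:
    """Thread count from OMP_NUM_THREADS; `default` if unset or invalid."""
    value = os.environ.get("OMP_NUM_THREADS", "").strip()
    return int(value) if value.isdigit() and int(value) > 0 else default


print(select_evp_kernel(0, tripole=True))
print(requested_threads())
```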

Option: REAL4 internally instead of REAL8:

  • Not implemented as a namelist switch, but it can be tried out with this "poor man's option":
    mv evp_kernel1d.F90 evp_kernel1d_r8.F90
    sed 's/DBL_KIND/REAL_KIND/g' evp_kernel1d_r8.F90 > evp_kernel1d.F90
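A small Python sketch can show what the REAL_KIND substitution costs in precision. It emulates single precision with `struct`, which is only an illustration and not anything in CICE. REAL4 rounding introduces relative errors of up to 2^-24 ≈ 6e-8, compared with 2^-53 ≈ 1.1e-16 for REAL8. A REAL4 kernel should therefore not be expected to be bit-for-bit with the REAL8 one.

```python
import struct


def to_real4(x: float) -> float:
    """Round a REAL8 (double) value to the nearest REAL4 (single) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


REAL4_UNIT_ROUNDOFF = 2.0 ** -24  # max relative rounding error of REAL4
REAL8_UNIT_ROUNDOFF = 2.0 ** -53  # max relative rounding error of REAL8

x = 0.12393981696376018  # a REAL8 rms ice speed (m/s), as printed in the logs
rel_err = abs(to_real4(x) - x) / x
print(f"relative error after REAL4 rounding: {rel_err:.1e}")
```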

--o--

  • Developer(s):
    Mads Hvid Ribergaard and Jacob Weismann Poulsen, DMI

  • Please suggest code Pull Request reviewers in the column at right.
    @eclare108213

  • Are the code changes bit for bit, different at roundoff level, or more substantial?
    In most cases it should be bit-for-bit, depending on how the code is translated during compilation and which math libraries are used.

  • Does this PR create or have dependencies on Icepack or any other models?
    No dependencies beyond what CICE normally has.

  • Is the documentation being updated with this PR?
    A few lines were added to the developer guide: doc/source/developer_guide/dg_dynamics.rst

  • If not, does the documentation need to be updated separately at a later time?

  • Other Relevant Details:

Basic structure
evp_copyin() : gather
evp_kernel() : loop stress/stepu/halo_update
evp_copyout() : scatter
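The gather / iterate / scatter structure can be sketched in Python. The sketch assumes equally sized blocks without halo cells and a serial gather onto one task. `ndte` stands for the number of EVP subcycles, and the `step` callback is a placeholder for the stress/stepu/halo_update loop body. In the PR itself, the gather presumably goes through MPI onto master_task (the PR extends gather_scatter_ext in ice_gather_scatter.F90). The kernel also works on 1D vectors rather than a 2D array, so this is only a structural analogy.

```python
from typing import Callable, List

Field2D = List[List[float]]  # hypothetical: one 2D array per block, no halos


def gather(blocks: List[Field2D], nblocks_x: int) -> Field2D:
    """Assemble equally sized blocks (row-major block order) into a global field."""
    ny_b, nx_b = len(blocks[0]), len(blocks[0][0])
    nblocks_y = len(blocks) // nblocks_x
    glob = [[0.0] * (nx_b * nblocks_x) for _ in range(ny_b * nblocks_y)]
    for n, blk in enumerate(blocks):
        bj, bi = divmod(n, nblocks_x)
        for j in range(ny_b):
            for i in range(nx_b):
                glob[bj * ny_b + j][bi * nx_b + i] = blk[j][i]
    return glob


def scatter(glob: Field2D, nblocks_x: int, ny_b: int, nx_b: int) -> List[Field2D]:
    """Split the global field back into blocks, in the same order as gather()."""
    nblocks_y = len(glob) // ny_b
    return [
        [[glob[bj * ny_b + j][bi * nx_b + i] for i in range(nx_b)] for j in range(ny_b)]
        for bj in range(nblocks_y)
        for bi in range(nblocks_x)
    ]


def run_evp(blocks: List[Field2D], nblocks_x: int, ndte: int,
            step: Callable[[Field2D], Field2D]) -> List[Field2D]:
    glob = gather(blocks, nblocks_x)   # evp_copyin: once per call
    for _ in range(ndte):              # evp_kernel: subcycles on the global grid
        glob = step(glob)              # stress + stepu + halo update, no MPI
    return scatter(glob, nblocks_x, len(blocks[0]), len(blocks[0][0]))  # evp_copyout
```

In this structure, gather and scatter are paid once per call while the loop body runs `ndte` times. This is consistent with the note that the kernel pays off mainly for large setups.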


  • Affected files:
    ./cicecore/cicedynB/dynamics/evp_kernel1d.F90 (The core, new file)
    ./cicecore/cicedynB/dynamics/ice_dyn_evp.F90 (switch between CICE/new kernels)
    ./cicecore/cicedynB/dynamics/ice_dyn_shared.F90 (namelist)
    ./cicecore/cicedynB/general/ice_init.F90 (namelist)
    ./cicecore/cicedynB/infrastructure/comm/mpi/ice_gather_scatter.F90 (*)
    ./cicecore/cicedynB/infrastructure/comm/serial/ice_gather_scatter.F90 (*)
    (*) Updated gather_scatter_ext to take care of integers + logicals (icetmask,iceumask)

  • Possible OMP issues in these files (see more below):
    ./cicecore/cicedynB/analysis/ice_diagnostics.F90 (2x)
    ./cicecore/cicedynB/general/ice_init.F90
    ./cicecore/drivers/cice/CICE_RunMod.F90
    Updated since "Fast vectorized EVP kernel" #205 and maybe solved (to be tested):
    ./cicecore/cicedynB/analysis/ice_history.F90
    ./cicecore/cicedynB/dynamics/ice_transport_driver.F90 (2x)
    ./cicecore/cicedynB/dynamics/ice_transport_remap.F90
    Updated since "Fast vectorized EVP kernel" #205: OMP -> TCXOMP
    ./cicecore/cicedynB/dynamics/ice_dyn_eap.F90 - TCXOMP
    ./cicecore/cicedynB/dynamics/ice_dyn_evp.F90 - TCXOMP

  • OpenMP Issues ??:
    Maybe there are OpenMP race conditions? I have not tested it carefully, but a previous CICE version showed some races. I have added a comment line just before each OMP directive that I commented out last time, but all OMP directives remain un-commented (active) in this PR.
    Search for "!MHRI: CHECK THIS OMP"

NOTE: I only commented out these OMPs to check that the model runs smoothly. This only means that there is possibly a threading issue in one of the files, not necessarily in all of them.

@mhrib mhrib mentioned this pull request Nov 26, 2018
Contributor

@eclare108213 eclare108213 left a comment


It looks like these changes are all isolated and shouldn't make any difference to simulations as long as the new namelist flag is set to turn them off. However we are a few days from releasing CICE v6 and I'm a little hesitant to include this much new code, which most of us don't have experience running or maintaining. I propose that we wait until after the v6 release to begin working with this new capability. I'm open to counterarguments, though, since @mhrib et al have been working with it for a while now!

@apcraig
Contributor

apcraig commented Nov 27, 2018

I agree this should not be part of the v6 release, but I think we should be able to review, test, and merge this to the trunk after the release.

@mhrib
Contributor Author

mhrib commented Nov 27, 2018 via email

@mhrib
Contributor Author

mhrib commented Nov 27, 2018 via email

@eclare108213
Contributor

However, one could consider my isolated question within the PR: Is "strocnx,strocny" used at all from subroutine "stepu"? It seems to be re-calculated in evp_finish routine and not used at all during the evp-loop.

You are correct about this. The calculation of ice-ocean stress has changed, and now the portion that's in stepu is no longer needed. That and the old, commented-out code in dyn_finish that went with it could be removed. Thanks for pointing it out!

It should be straight forward...

famous last words...

eclare108213 and others added 17 commits November 27, 2018 15:41
* Update ice_dyn_eap.F90

* merge in hh eap_efficient branches

* fix declaration of variables for stress_rdg interpolate feature

* update icepack and cleanup log file

* Update ice_dyn_eap.F90

* merge in hh eap_efficient branches

* fix declaration of variables for stress_rdg interpolate feature

* update icepack and cleanup log file

* backup icepack

* remove CAM_ICE and BARRIERS from cice.settings
* added subroutines for slotted cylinder test, added ice_data_type to namelist

* finalized advection test

* fixed wrong syntax for logical not

* Corrected description of dxrect, dyrect

Documentation said they were in meters but in the code they are in centimeters

* Added documentation for boxslotcyl test

Also: minor correction in the documentation of the box2001 test; added a link to the box2001 doc from the table of namelist options

* Moved velocity initialization for boxslotcyl test to ice_init.F90

* revert ice_dyn_shared.F90 to CICE master version

* add test to base_suite; fix LANL machine files

* fixed options file for gbox80 and gbox128 following introduction of ice_data_type namelist flag

* resolved conflict in ug_case_settings.rst

* change dxrect, dyrect units to cm

We ought to fix this in the code so that the units are m; cm is left over from POP grid files.

@mhrib
Contributor Author

mhrib commented Jan 24, 2019 via email

@eclare108213
Contributor

Where are we with this PR? Is it ready to be reviewed? @apcraig are you doing that? Let me know if my help is needed.

@apcraig
Contributor

apcraig commented Jan 27, 2019

I am running some tests now.

We have another problem: the update to master was not done as a rebase, so all the master changes appear in this PR. We believe this will not affect the PR merge, but it does make the PR difficult to review. We might want to consider making a new branch off the consortium master and then merging this branch on top of it; that's what a rebase pull would have done. For now, I'll proceed with testing.

@apcraig
Contributor

apcraig commented Jan 27, 2019

Test results are here, https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks, hash aa6de33. Looks like some box configurations changed answers for intel, pgi, and gnu but not cray. Otherwise everything is ok. Do we need to understand why some of the box configurations have different answers?

@mhrib
Contributor Author

mhrib commented Jan 27, 2019 via email

@apcraig
Contributor

apcraig commented Jan 27, 2019

@mhrib I looked a bit closer. You are correct; based on the log on your branch, it looks like you did a rebase. I expected this would result in a cleaner representation of the files changed in the PR, but it doesn't. That's a bit frustrating. Let's leave this branch as is for the moment.

@mhrib
Contributor Author

mhrib commented Jan 28, 2019 via email

@mhrib
Contributor Author

mhrib commented Jan 28, 2019 via email

@apcraig
Contributor

apcraig commented Jan 28, 2019

Getting back to the box test failures: the model runs fine; the problem is that the answers differ from the current master for several (but not all) box cases. This PR is bit-for-bit for all non-box cases with all compilers. I will look a bit further, but I doubt it's OpenMP, as that is not likely to affect just the box cases and not the others.

The thing that jumps out at me is the boundary conditions. The box cases have different boundary conditions than the global grids, at least in the "longitudinal" direction, where the non-box cases wrap around. Could there be a difference/error in the implementation for cases with non-wrap-around boundary conditions? I will try to look into this a bit more.

@mhrib Is it possible for you to run some of the cases? You can see all the test results here, https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks under hash aa6de33. Five tests failed the bit-for-bit comparison on each of pgi, intel, and gnu, but they all passed on cray. That's also a bit odd. It could be a compiler/OpenMP thing or something else. Is OpenMP behaving differently in this code compared to master? We test MPI, OpenMP, and mixed configurations pretty carefully and compare answers between them all when we can; that is how we have detected other OpenMP problems in the past. I'm not saying there aren't problems, but I just don't understand why it only affects a few of the box cases, and only on 3 of the 4 compilers on gordon.

@apcraig
Contributor

apcraig commented Jan 28, 2019

I found a good failing test. Below is a case with OpenMP off: gbox128 on 8x1 (tasks x threads), with diagnostics written every timestep.

Current master:

istep1:         1    idate:  19970101    sec:      3600
  Initial forcing data year =         1997
  Final   forcing data year =         1997
  Current forcing data year =         1997
  
 Finished writing ./history/iceh_ic.1997-01-01-00000.nc
                                             Arctic                 Antarctic
total ice area  (km^2) =    7.14237020674198773E+06   0.00000000000000000E+00
total ice extent(km^2) =    1.21635000000000000E+07   0.00000000000000000E+00
total ice volume (m^3) =    1.42960388367532070E+13   0.00000000000000000E+00
total snw volume (m^3) =    1.42853391809006396E+12   0.00000000000000000E+00
tot kinetic energy (J) =    1.04308509283625344E+14   0.00000000000000000E+00
rms ice speed    (m/s) =        0.12393981696376018       0.00000000000000000
average albedo         =        1.00000000000000000       0.00000000000000000
max ice volume     (m) =        1.99371521192846801  ************************
max ice speed    (m/s) =        0.17140107227822357  ************************
max strength    (kN/m) =      155.95968782447957324  ************************
 ----------------------------
arwt rain h2o kg in dt =    0.00000000000000000E+00   0.00000000000000000E+00
arwt snow h2o kg in dt =    0.00000000000000000E+00   0.00000000000000000E+00
arwt evap h2o kg in dt =    1.91513434279816513E+10   0.00000000000000000E+00
arwt frzl h2o kg in dt =    0.00000000000000000E+00   0.00000000000000000E+00
arwt frsh h2o kg in dt =   -1.03046549289842930E+13   0.00000000000000000E+00
arwt ice mass (kg)     =    1.31094676133026900E+16   0.00000000000000000E+00
arwt snw mass (kg)     =    4.71416192969721125E+14   0.00000000000000000E+00
arwt tot mass (kg)     =    1.35808838062724120E+16   0.00000000000000000E+00
arwt tot mass chng(kg) =    1.03238062724120000E+13   0.00000000000000000E+00
arwt water flux        =    1.03238062724122754E+13   0.00000000000000000E+00
 (=rain+snow+evap+frzl-fresh)  
water flux error       =    2.02778132062958380E-17   0.00000000000000000E+00
 ----------------------------
arwt atm heat flux (W) =   -5.14185247756211328E+13   0.00000000000000000E+00
arwt ocn heat flux (W) =   -1.17835525201700000E+14   0.00000000000000000E+00
arwt frzl heat flux(W) =    0.00000000000000000E+00   0.00000000000000000E+00
arwt tot energy    (J) =   -4.76571858097386029E+21   0.00000000000000000E+00
arwt net heat      (J) =    2.39101201533883936E+17   0.00000000000000000E+00
arwt tot energy chng(J)=    2.39101201534156800E+17   0.00000000000000000E+00
arwt heat error        =    5.72555838880945994E-17   0.00000000000000000E+00
 ----------------------------
arwt incoming sw (W)   =    0.00000000000000000E+00   0.00000000000000000E+00
arwt absorbed sw (W)   =    0.00000000000000000E+00   0.00000000000000000E+00
arwt swdn error        =    0.00000000000000000E+00   0.00000000000000000E+00
 ----------------------------
total brine tr (m^3)   =    0.00000000000000000E+00   0.00000000000000000E+00
arwt salt mass (kg)    =    5.24378704532107578E+13   0.00000000000000000E+00
arwt salt mass chng(kg)=    4.12240532107600021E+10   0.00000000000000000E+00
arwt salt flx in dt(kg)=   -4.12240532107647934E+10   0.00000000000000000E+00
arwt salt flx error    =   -9.13702201141072666E-17   0.00000000000000000E+00
 ----------------------------
                          
       Lat, Long                     89.9   23.6           71.8 -122.4
  my_task, iblk, i, j            4    1    3    8          3    1   33    4
 ----------atm----------
air temperature (C)    =      -20.14999999999997726     -20.14999999999997726
specific humidity      =        0.00060000000000000       0.00060000000000000
snowfall (m)           =        0.00000000000000000       0.00000000000000000
rainfall (m)           =        0.00000000000000000       0.00000000000000000
shortwave radiation sum=        0.00000000000000000       0.00000000000000000
longwave radiation     =      180.00000000000000000     180.00000000000000000
 ----------ice----------
area fraction          =        0.01170127822002812       0.99160553732165613
avg ice thickness (m)  =        2.00157353786122405       2.00168379704297283
avg snow depth (m)     =        0.20000772644486051       0.20001128813199259
avg salinity (ppt)     =        2.32551636526343097       2.32551636526344252
avg brine thickness (m)=        0.00000000000000000       0.00000000000000000
surface temperature(C) =      -22.28674009109586862     -22.34808630976561261
absorbed shortwave flx =        0.00000000000000000       0.00000000000000000
outward longwave flx   =     -222.66327458031167907    -223.12825998110193382
sensible heat flx      =       34.12714109554898556      33.80801285663843458
latent heat flx        =        2.04113905604011636       2.08830094793734666
subl/cond (m ice)      =        0.00000278470764555       0.00000284904149604
top melt (m)           =        0.00000000000000000       0.00000000000000000
bottom melt (m)        =        0.00000000000000000       0.00000000000000000
lateral melt (m)       =        0.00000000000000000       0.00000000000000000
new ice (m)            =        0.00000000000000000       0.00000000000000000
congelation (m)        =        0.00001843989680427       0.00156739122836325
snow-ice (m)           =        0.00000000000000000       0.00000000000000000
snow change (m)        =        0.00000000000000000       0.00000000000000000
effective dhi (m)      =       -0.00001653115563983      -0.00730676288514998
effective dhs (m)      =       -0.00000340394671341      -0.00088644916147890
intnl enrgy chng(W/m^2)=       -3.35542558169561556    -831.78274165365428416
 ----------ocn----------
sst (C)                =       -1.90458264992426463      -1.90458264992426463
sss (ppt)              =       34.00000000000000000      34.00000000000000000
freezing temp (C)      =       -1.90458264992426463      -1.90458264992426463
heat used (W/m^2)      =       16.50273655272619777      16.81543837335271618

Current version:

istep1:         1    idate:  19970101    sec:      3600
  Initial forcing data year =         1997
  Final   forcing data year =         1997
  Current forcing data year =         1997
  
 Finished writing ./history/iceh_ic.1997-01-01-00000.nc
                                             Arctic                 Antarctic
total ice area  (km^2) =    7.14237020674198866E+06   0.00000000000000000E+00
total ice extent(km^2) =    1.21635000000000000E+07   0.00000000000000000E+00
total ice volume (m^3) =    1.42960388367532070E+13   0.00000000000000000E+00
total snw volume (m^3) =    1.42853391809006396E+12   0.00000000000000000E+00
tot kinetic energy (J) =    1.04308509283625297E+14   0.00000000000000000E+00
rms ice speed    (m/s) =        0.12393981696376015       0.00000000000000000
average albedo         =        0.99999999999999989       0.00000000000000000
max ice volume     (m) =        1.99371521192844514  ************************
max ice speed    (m/s) =        0.17140107227822357  ************************
max strength    (kN/m) =      155.95968782447957324  ************************
 ----------------------------
arwt rain h2o kg in dt =    0.00000000000000000E+00   0.00000000000000000E+00
arwt snow h2o kg in dt =    0.00000000000000000E+00   0.00000000000000000E+00
arwt evap h2o kg in dt =    1.91513434279816513E+10   0.00000000000000000E+00
arwt frzl h2o kg in dt =    0.00000000000000000E+00   0.00000000000000000E+00
arwt frsh h2o kg in dt =   -1.03046549289842930E+13   0.00000000000000000E+00
arwt ice mass (kg)     =    1.31094676133026900E+16   0.00000000000000000E+00
arwt snw mass (kg)     =    4.71416192969721125E+14   0.00000000000000000E+00
arwt tot mass (kg)     =    1.35808838062724120E+16   0.00000000000000000E+00
arwt tot mass chng(kg) =    1.03238062724120000E+13   0.00000000000000000E+00
arwt water flux        =    1.03238062724122754E+13   0.00000000000000000E+00
 (=rain+snow+evap+frzl-fresh)  
water flux error       =    2.02778132062958380E-17   0.00000000000000000E+00
 ----------------------------
arwt atm heat flux (W) =   -5.14185247756211328E+13   0.00000000000000000E+00
arwt ocn heat flux (W) =   -1.17835525201699937E+14   0.00000000000000000E+00
arwt frzl heat flux(W) =    0.00000000000000000E+00   0.00000000000000000E+00
arwt tot energy    (J) =   -4.76571858097385924E+21   0.00000000000000000E+00
arwt net heat      (J) =    2.39101201533883712E+17   0.00000000000000000E+00
arwt tot energy chng(J)=    2.39101201535205376E+17   0.00000000000000000E+00
arwt heat error        =    2.77327327987109579E-16   0.00000000000000000E+00
 ----------------------------
arwt incoming sw (W)   =    0.00000000000000000E+00   0.00000000000000000E+00
arwt absorbed sw (W)   =    0.00000000000000000E+00   0.00000000000000000E+00
arwt swdn error        =    0.00000000000000000E+00   0.00000000000000000E+00
 ----------------------------
total brine tr (m^3)   =    0.00000000000000000E+00   0.00000000000000000E+00
arwt salt mass (kg)    =    5.24378704532107578E+13   0.00000000000000000E+00
arwt salt mass chng(kg)=    4.12240532107600021E+10   0.00000000000000000E+00
arwt salt flx in dt(kg)=   -4.12240532107647934E+10   0.00000000000000000E+00
arwt salt flx error    =   -9.13702201141072666E-17   0.00000000000000000E+00
 ----------------------------
                          
       Lat, Long                     89.9   23.6           71.8 -122.4
  my_task, iblk, i, j            4    1    3    8          3    1   33    4
 ----------atm----------
air temperature (C)    =      -20.14999999999997726     -20.14999999999997726
specific humidity      =        0.00060000000000000       0.00060000000000000
snowfall (m)           =        0.00000000000000000       0.00000000000000000
rainfall (m)           =        0.00000000000000000       0.00000000000000000
shortwave radiation sum=        0.00000000000000000       0.00000000000000000
longwave radiation     =      180.00000000000000000     180.00000000000000000
 ----------ice----------
area fraction          =        0.01170127822002812       0.99160553732165624
avg ice thickness (m)  =        2.00157353786122405       2.00168379704297283
avg snow depth (m)     =        0.20000772644486051       0.20001128813199259
avg salinity (ppt)     =        2.32551636526343097       2.32551636526344252
avg brine thickness (m)=        0.00000000000000000       0.00000000000000000
surface temperature(C) =      -22.28674009109586862     -22.34808630976561261
absorbed shortwave flx =        0.00000000000000000       0.00000000000000000
outward longwave flx   =     -222.66327458031167907    -223.12825998110190540
sensible heat flx      =       34.12714109554898556      33.80801285663842748
latent heat flx        =        2.04113905604011636       2.08830094793734622
subl/cond (m ice)      =        0.00000278470764555       0.00000284904149604
top melt (m)           =        0.00000000000000000       0.00000000000000000
bottom melt (m)        =        0.00000000000000000       0.00000000000000000
lateral melt (m)       =        0.00000000000000000       0.00000000000000000
new ice (m)            =        0.00000000000000000       0.00000000000000000
congelation (m)        =        0.00001843989680427       0.00156739122836325
snow-ice (m)           =        0.00000000000000000       0.00000000000000000
snow change (m)        =        0.00000000000000000       0.00000000000000000
effective dhi (m)      =       -0.00001653115563983      -0.00730676288514975
effective dhs (m)      =       -0.00000340394671341      -0.00088644916147887
intnl enrgy chng(W/m^2)=       -3.35542558169561556    -831.78274165358811842
 ----------ocn----------
sst (C)                =       -1.90458264992426463      -1.90458264992426463
sss (ppt)              =       34.00000000000000000      34.00000000000000000
freezing temp (C)      =       -1.90458264992426463      -1.90458264992426463
heat used (W/m^2)      =       16.50273655272619777      16.81543837335268776

The diffs are

< total ice area  (km^2) =    7.14237020674198773E+06   0.00000000000000000E+00
---
> total ice area  (km^2) =    7.14237020674198866E+06   0.00000000000000000E+00
354,357c355,358
< tot kinetic energy (J) =    1.04308509283625344E+14   0.00000000000000000E+00
< rms ice speed    (m/s) =        0.12393981696376018       0.00000000000000000
< average albedo         =        1.00000000000000000       0.00000000000000000
< max ice volume     (m) =        1.99371521192846801  ************************
---
> tot kinetic energy (J) =    1.04308509283625297E+14   0.00000000000000000E+00
> rms ice speed    (m/s) =        0.12393981696376015       0.00000000000000000
> average albedo         =        0.99999999999999989       0.00000000000000000
> max ice volume     (m) =        1.99371521192844514  ************************
375c376
< arwt ocn heat flux (W) =   -1.17835525201700000E+14   0.00000000000000000E+00
---
> arwt ocn heat flux (W) =   -1.17835525201699937E+14   0.00000000000000000E+00
377,380c378,381
< arwt tot energy    (J) =   -4.76571858097386029E+21   0.00000000000000000E+00
< arwt net heat      (J) =    2.39101201533883936E+17   0.00000000000000000E+00
< arwt tot energy chng(J)=    2.39101201534156800E+17   0.00000000000000000E+00
< arwt heat error        =    5.72555838880945994E-17   0.00000000000000000E+00
---
> arwt tot energy    (J) =   -4.76571858097385924E+21   0.00000000000000000E+00
> arwt net heat      (J) =    2.39101201533883712E+17   0.00000000000000000E+00
> arwt tot energy chng(J)=    2.39101201535205376E+17   0.00000000000000000E+00
> arwt heat error        =    2.77327327987109579E-16   0.00000000000000000E+00
403c404
< area fraction          =        0.01170127822002812       0.99160553732165613
---
> area fraction          =        0.01170127822002812       0.99160553732165624
410,412c411,413
< outward longwave flx   =     -222.66327458031167907    -223.12825998110193382
< sensible heat flx      =       34.12714109554898556      33.80801285663843458
< latent heat flx        =        2.04113905604011636       2.08830094793734666
---
> outward longwave flx   =     -222.66327458031167907    -223.12825998110190540
> sensible heat flx      =       34.12714109554898556      33.80801285663842748
> latent heat flx        =        2.04113905604011636       2.08830094793734622
421,423c422,424
< effective dhi (m)      =       -0.00001653115563983      -0.00730676288514998
< effective dhs (m)      =       -0.00000340394671341      -0.00088644916147890
< intnl enrgy chng(W/m^2)=       -3.35542558169561556    -831.78274165365428416
---
> effective dhi (m)      =       -0.00001653115563983      -0.00730676288514975
> effective dhs (m)      =       -0.00000340394671341      -0.00088644916147887
> intnl enrgy chng(W/m^2)=       -3.35542558169561556    -831.78274165358811842
428c429
< heat used (W/m^2)      =       16.50273655272619777      16.81543837335271618
---
> heat used (W/m^2)      =       16.50273655272619777      16.81543837335268776

So this suggests it's not OpenMP. It also suggests the initial divergence is roundoff. If we believe these changes might be introducing a roundoff difference (order of operations or similar), then maybe this is all fine. I still wonder why this only shows up in some box cases and not in other cases.
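A Python sketch can quantify how small these differences are. It counts the units in the last place (ULPs) separating each master/PR pair, using the standard trick of reinterpreting IEEE-754 doubles as ordered integers. The values are copied from the gbox128 diagnostics in this thread, and the selection of fields is arbitrary. Residual diagnostics such as "arwt heat error" are differences of large numbers that amplify roundoff, so they are left out.

```python
import struct


def ordered_bits(x: float) -> int:
    """Map a float64 to an integer so that adjacent doubles differ by 1."""
    (i,) = struct.unpack("<q", struct.pack("<d", x))
    return i if i >= 0 else -(i & 0x7FFFFFFFFFFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable doubles between a and b."""
    return abs(ordered_bits(a) - ordered_bits(b))


# (master, PR) pairs copied from the gbox128 diagnostics
pairs = {
    "total ice area":       (7.14237020674198773e06, 7.14237020674198866e06),
    "tot kinetic energy":   (1.04308509283625344e14, 1.04308509283625297e14),
    "average albedo":       (1.00000000000000000, 0.99999999999999989),
    "max ice volume":       (1.99371521192846801, 1.99371521192844514),
    "area fraction (pt 2)": (0.99160553732165613, 0.99160553732165624),
    "heat used (pt 2)":     (16.81543837335271618, 16.81543837335268776),
}

for name, (a, b) in pairs.items():
    rel = abs(a - b) / abs(a)
    print(f"{name:22s} ulps={ulp_distance(a, b):5d} rel={rel:.1e}")
```

Most of these fields differ by one to a few ULPs. The largest relative difference among them is around 1e-14 (max ice volume, roughly 100 ULPs). This fits the reading that the divergence starts at roundoff level.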

@apcraig
Contributor

apcraig commented Jan 31, 2019

Let me summarize where we are.

With evp_kernel_ver=0, results are bit-for-bit against the current master for most tests. This is from running the full test suites on gordon with 4 compilers. A subset of box tests are NOT bit-for-bit on 3 of the 4 compilers. Rerunning the failed box tests with the debug flag (reduced optimization and run-time checks) on both master and this PR gives bit-for-bit identical answers. It seems the change in answers in the box tests is caused by compiler optimization reacting to the code changes. This might be associated with the evp kernel changes (although @mhrib makes a case that it shouldn't be) or with some of the code cleanup. We could look into this further, or we could accept it. Personally, I am comfortable with this outcome as it stands. I believe we've shown that the answers differ at roundoff level (see the gbox128 diff above) as a result of compiler optimization, and that we can make them bit-for-bit by reducing compiler optimization. Based on these results, I think we could merge this PR, with evp_kernel_ver=0 as the default setting.
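Compiler optimization alone can change answers because floating-point addition is not associative. Any reordering of a sum, for example when a compiler vectorizes a loop or reassociates an expression under aggressive optimization flags, can change the last bits. A minimal Python illustration follows; which reordering the compilers actually applied in the box cases is not known.

```python
import math

# Same three numbers, two groupings: the results differ in the last bit.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a, b, a == b)

# Summation order matters even more when magnitudes differ widely.
values = [1.0e16, 1.0, -1.0e16, 1.0]
left_to_right = 0.0
for v in values:
    left_to_right += v          # 1.0e16 + 1.0 rounds back to 1.0e16
reordered = 0.0
for v in sorted(values, key=abs):
    reordered += v              # small terms first, so nothing is lost
print(left_to_right, reordered, math.fsum(values))
```

Reduced optimization (as with the debug flag) tends to keep the source order of operations. This is consistent with the observation that master and this PR agree bit-for-bit in debug builds.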

Separately, there is an effort to test and validate the evp_kernel_ver=2. The same test suite on gordon was run with the new kernel on. Results can be found https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks, hash aa6de33...+evpk=2. Three to four tests fail on each compiler, and they are the same tests across the compilers. Looking at the intel results, https://github.com/CICE-Consortium/Test-Results/wiki/aa6de33f19.gordon.pgi.190128.235649, there are four failures.

  • restart gbox128 4x2. This test runs but fails to restart exactly.
  • restart gx1 40x4 droundrobin medium. This test fails with "(abort_ice) error = (horizontal_remap)ERROR: bad departure points" on the first timestep.
  • restart gx3 16x2x5x10x20 drakeX2. This test fails with "(abort_ice) error = (horizontal_remap)ERROR: bad departure points" on the first timestep.
  • restart tx1 40x4 dsectrobin medium. This test fails gracefully in the evp kernel. tx1 is not supported yet.

Again, many tests passed, but these 4 failures need to be debugged. In addition, the qc test relies on the gx1 configuration, so the qc testing comparing evp_kernel_ver=2 to 0 could not be done.
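A simplified version of the sanity check a remapping scheme performs makes the "bad departure points" abort easier to interpret. The sketch below assumes a CFL-style test that flags any cell whose departure distance |u*dt| exceeds one grid cell. The actual test in horizontal_remap (ice_transport_remap.F90) is more detailed, and the function name is hypothetical. Under this assumption, an abort on the first timestep would suggest unrealistically large velocities coming out of the dynamics rather than a problem in the transport scheme itself.

```python
from typing import List


def bad_departure_points(u: List[float], dx: List[float], dt: float) -> List[int]:
    """Indices where an upstream departure point lies more than one cell away.

    u: velocity component (m/s), dx: cell width (m), dt: time step (s).
    Simplified, hypothetical stand-in for the check in horizontal_remap.
    """
    return [k for k, (uk, dxk) in enumerate(zip(u, dx)) if abs(uk * dt) > dxk]


dt = 3600.0                     # 1-hour time step (s)
dx = [1.0e5, 1.0e5, 1.0e5]      # ~100 km cells (m), illustrative only
u = [0.17, -0.5, 30.0]          # two realistic speeds and one blown-up speed (m/s)
print(bad_departure_points(u, dx, dt))
```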

So, the outstanding tasks are

  • debug the 4 failures noted above
  • run the qc test comparing evp_kernel_ver=0 to evp_kernel_ver=2. This requires gx1 (one of the failing tests)
  • update documentation
  • produce and document timing information comparing evp_kernel_ver=0 and 2.
  • add evp_kernel_ver=2 tests to the test suite
  • maybe do a little cleanup on ice_dyn_evp_1d.F90 to make the code a little more readable (breaks between subroutines and such)

I have reviewed the code diffs by diffing two sandboxes (the file diffs in the PR contain a bunch of changes from the master merge). I am fine with the current code, and I am comfortable with the evp_kernel_ver=0 implementation. I propose we merge this PR now, with the understanding that we'd create a new issue that identifies the outstanding tasks above. @eclare108213, are you OK with that approach?

@apcraig apcraig self-requested a review January 31, 2019 21:10
Contributor

@apcraig apcraig left a comment


Approved with caveat that the evp_kernel_ver=2 setting still needs some debugging, testing and validation work. We would create a new issue to address those outstanding tasks.

@apcraig
Contributor

apcraig commented Feb 2, 2019

This was merged as #278, exact same code changes just on a clean branch.

@apcraig apcraig closed this Feb 2, 2019
@mhrib mhrib deleted the EVP_kernelv2 branch November 8, 2023 13:45