Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bit for bit error in OpenMP IEVA #1442

Merged
merged 1 commit into from
Mar 27, 2021
Merged

Conversation

davegill
Copy link
Contributor

@davegill davegill commented Mar 25, 2021

TYPE: bug fix

KEYWORDS: IEVA, OpenMP

SOURCE: internal

DESCRIPTION OF CHANGES:
Problem:
With IEVA activated, differences appear between OpenMP results with OMP_NUM_THREADS > 1 and any of the following:

  1. serial results
  2. MPI results
  3. OpenMP results with a single thread

Solution:
Working backwards, the computation in WRF (pre-IEVA code) computed the full MU field only on the mass-point tile size:

DO j = jts, jte-1
DO i = its, ite-1

We extend the computation one grid cell to the left and right:

DO j = jts-1, jte-1
DO i = its-1, ite-1

Since WRF previously did not use those values, that is not a problem to have additional rows and columns of valid data inside of the halo region.

This is a follow-on PR to 4412521 #1373 "Implicit Explicit Vertical Advection (IEVA)".

LIST OF MODIFIED FILES:
M dyn_em/module_big_step_utilities_em.F

TESTS CONDUCTED:

  1. I used a simple Jan 2000 case, with 60 levels, 30-km resolution, and a 20*dx time step. This caused calls to advect_u_implicit and advect_v_implicit in the first time step. Without the mods, the code generated different results depending on the number of OpenMP threads. With the mods, the results are bit-for-bit for OpenMP with the standard y-only decomposition and with a manual x-only decomposition.

Below is a figure of the differences of the V field after the first time step (before the modification). This plot is the difference of the same executable using two different OMP_NUM_THREADS values. After the mod, the results are bit-for-bit.
Screen Shot 2021-03-25 at 4 05 09 PM

Before the mods, during the first time step, the following diffs were apparent along the OpenMP boundaries:

Diffing np=1/wrfout_d01_2000-01-24_12:10:00 np=6/wrfout_d01_2000-01-24_12:10:00
 Next Time 2000-01-24_12:10:00
     Field   Ndifs    Dims       RMS (1)            RMS (2)     DIGITS    RMSE     pntwise max
         U     49384    3   0.2158843112E+02   0.2158843113E+02   9   0.5344E-05   0.3589E-05
         V     61738    3   0.1834835473E+02   0.1834835712E+02   6   0.1045E-03   0.2183E-03
         W    139132    3   0.4977466348E-01   0.4977466098E-01   7   0.3382E-05   0.4809E-03
        PH     66955    3   0.2327166773E+04   0.2327166753E+04   8   0.1078E-02   0.7572E-05
         T      4838    3   0.7925254902E+02   0.7925254902E+02  12   0.9349E-05   0.2484E-05
       THM      4812    3   0.7921679023E+02   0.7921679023E+02  12   0.9289E-05   0.2484E-05
        MU      1286    2   0.1460135950E+04   0.1460135956E+04   8   0.1203E-02   0.5148E-05
         P      6737    3   0.6512715435E+03   0.6512716390E+03   6   0.2086E-01   0.8162E-03
    QVAPOR     26582    3   0.2913825518E-02   0.2913825518E-02   9   0.4536E-09   0.5671E-05
    QCLOUD       429    3   0.6474288021E-05   0.6474289263E-05   6   0.3257E-09   0.3024E-03
      QICE       715    3   0.4136477606E-05   0.4136463263E-05   5   0.1303E-09   0.1757E-03
     QNICE       676    3   0.4164261806E+06   0.4164261805E+06   9   0.1341E+00   0.1125E-05
    RAINNC        94    2   0.3158246772E-02   0.3158239178E-02   5   0.9447E-07   0.1558E-03
    SNOWNC        94    2   0.3158246772E-02   0.3158239178E-02   5   0.9447E-07   0.1558E-03
        SR         1    2   0.3353836226E+00   0.3353836226E+00   9   0.9006E-09   0.5960E-07
  1. Wei successfully tested a separate case with 1x16 and 16x1 OpenMP decompositions, where there were bit-for-bit diffs without the mods.
  2. Jenkins tests are all PASS.

modified:   dyn_em/module_big_step_utilities_em.F
@davegill
Copy link
Contributor Author

@weiwangncar @dudhia @louiswicker
This works for a couple of small cases for me, and for a more reasonable case for Wei. We are heading towards optimism.

@davegill
Copy link
Contributor Author

@Plantain
For the IEVA code, here's a small bug fix that addresses the OpenMP bit-for-bit differences.

@weiwangncar
Copy link
Collaborator

@louiswicker You may want to see if this helps with the 'bug' you were chasing a week or two ago.

@davegill
Copy link
Contributor Author

jenkins results:

Please find result of the WRF regression test cases in the attachment. This build is for Commit ID: 7361a031b36e9f887cfc9941dda823363491871c, requested by: davegill for PR: https://github.com/wrf-model/WRF/pull/1442. For any query please send e-mail to David Gill.

    Test Type              | Expected  | Received |  Failed
    = = = = = = = = = = = = = = = = = = = = = = = =  = = = =
    Number of Tests        : 19           18
    Number of Builds       : 48           46
    Number of Simulations  : 163           161        0
    Number of Comparisons  : 103           102        0

    Failed Simulations are: 
    None
    Which comparisons are not bit-for-bit: 
    None

@louiswicker
Copy link
Contributor

Wow - thanks Dave - will try it tonight!

@davegill
Copy link
Contributor Author

@weiwangncar
Wei,
This is ready for a review. It is a small change. It fixes the problem OpenMP problem. The solution introduces valid data into the halo of memory sized "mu" fields.

@louiswicker
Copy link
Contributor

louiswicker commented Mar 26, 2021

Is there a way for me to pull this fixed code since its not merged?

NVM - I figured out how to do this from dave.gills repo

@weiwangncar
Copy link
Collaborator

@louiswicker Yes, Lou. Do this
git clone https://github.com/davegill/WRF.git
Once you have it, do 'git checkout ieva_bf' to see Dave's change.

@louiswicker
Copy link
Contributor

louiswicker commented Mar 26, 2021 via email

@davegill davegill merged commit c25b4d9 into wrf-model:develop Mar 27, 2021
vlakshmanan-scala pushed a commit to scala-computing/WRF that referenced this pull request Apr 4, 2024
TYPE: bug fix

KEYWORDS: IEVA, OpenMP

SOURCE: internal

DESCRIPTION OF CHANGES:
Problem:
With IEVA activated, differences appear between OpenMP results with OMP_NUM_THREADS > 1 and any of the following:
1. serial results
2. MPI results
3. OpenMP results with a single thread

Solution:
Working backwards, the computation in WRF (pre-IEVA code) computed the full MU field only on the mass-point tile size: 
```
DO j = jts, jte-1
DO i = its, ite-1
```
We extend the computation one grid cell to the left and right:
```
DO j = jts-1, jte-1
DO i = its-1, ite-1
```
Since WRF previously did not use those values, that is not a problem to have additional rows and columns of valid data inside of the halo region.

This is a follow-on PR to 4412521 wrf-model#1373 "Implicit Explicit Vertical Advection (IEVA)".

LIST OF MODIFIED FILES: 
M dyn_em/module_big_step_utilities_em.F

TESTS CONDUCTED: 
1. I used a simple Jan 2000 case, with 60 levels, 30-km resolution, and a 20*dx time step. This caused calls to `advect_u_implicit` and `advect_v_implicit` in the first time step. Without the mods, the code generated different results depending on the number of OpenMP threads. With the mods, the results are bit-for-bit for OpenMP with the standard y-only decomposition and with a manual x-only decomposition. 

Below is a figure of the differences of the V field after the first time step (before the modification). This plot is the difference of the same executable using two different OMP_NUM_THREADS values. After the mod, the results are bit-for-bit.
<img width="1152" alt="Screen Shot 2021-03-25 at 4 05 09 PM" src="https://user-images.githubusercontent.com/12666234/112549911-291e8e80-8d84-11eb-8b03-1e1ea50ef731.png">

Before the mods, during the first time step, the following diffs were apparent along the OpenMP boundaries:
```
Diffing np=1/wrfout_d01_2000-01-24_12:10:00 np=6/wrfout_d01_2000-01-24_12:10:00
 Next Time 2000-01-24_12:10:00
     Field   Ndifs    Dims       RMS (1)            RMS (2)     DIGITS    RMSE     pntwise max
         U     49384    3   0.2158843112E+02   0.2158843113E+02   9   0.5344E-05   0.3589E-05
         V     61738    3   0.1834835473E+02   0.1834835712E+02   6   0.1045E-03   0.2183E-03
         W    139132    3   0.4977466348E-01   0.4977466098E-01   7   0.3382E-05   0.4809E-03
        PH     66955    3   0.2327166773E+04   0.2327166753E+04   8   0.1078E-02   0.7572E-05
         T      4838    3   0.7925254902E+02   0.7925254902E+02  12   0.9349E-05   0.2484E-05
       THM      4812    3   0.7921679023E+02   0.7921679023E+02  12   0.9289E-05   0.2484E-05
        MU      1286    2   0.1460135950E+04   0.1460135956E+04   8   0.1203E-02   0.5148E-05
         P      6737    3   0.6512715435E+03   0.6512716390E+03   6   0.2086E-01   0.8162E-03
    QVAPOR     26582    3   0.2913825518E-02   0.2913825518E-02   9   0.4536E-09   0.5671E-05
    QCLOUD       429    3   0.6474288021E-05   0.6474289263E-05   6   0.3257E-09   0.3024E-03
      QICE       715    3   0.4136477606E-05   0.4136463263E-05   5   0.1303E-09   0.1757E-03
     QNICE       676    3   0.4164261806E+06   0.4164261805E+06   9   0.1341E+00   0.1125E-05
    RAINNC        94    2   0.3158246772E-02   0.3158239178E-02   5   0.9447E-07   0.1558E-03
    SNOWNC        94    2   0.3158246772E-02   0.3158239178E-02   5   0.9447E-07   0.1558E-03
        SR         1    2   0.3353836226E+00   0.3353836226E+00   9   0.9006E-09   0.5960E-07
```

2. Wei successfully tested a separate case with 1x16 and 16x1 OpenMP decompositions, where there were bit-for-bit diffs without the mods.
3. Jenkins tests are all PASS.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants