Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MEKE: No-division implementation of src_btm_drag #9

Conversation

marshallward
Copy link

This PR contains two commits. The first eliminates divisions from the calculation of src_btm_drag in Strang splitting. The second removes a redundant if-block in the second-stage of Strang splitting.

Re-indentation of the second commit can make the PR a bit more complicated to read, so it might be easier to inspect the individual commits.

Commit messages are posted below.


MEKE: No-division implementation of src_btm_drag

This patch modifies the calculation of src_btm_drag to avoid
additional division operations.

We first note the following variable renames:

  • Isfac is now damping, since it describes the net damping or
    reduction of any damped fields. sfac has been removed since it no
    longer appears in the expressions.

  • ldamping is now damp_rate. This is primarily to avoid confusion
    with damping, but also to more accurately describe its role.

  • ldamping_Strang1 is replaced to damp_rate_s1. It is used for a
    similar purpose (cacheing of values from the first stage) but now
    holds a slightly different value.

The following modifications were made.

  1. The conditional split of sdt into sdt_damping substeps is
    quantified by a new term, damp_step which is equal to 0.5 or 1.
    sdt_damp is computed as sdt * damp_step.

    The presence of sdt_damp / sdt in our expressions is replaced with
    damp_step, which avoids an extra division.

  2. The first expression (using new notation) was originally

     sfac = 1 + sdt_damp * damp_rate
     D = M * (1 - sfac) / (sdt * sfac)
    

    where D is src_btm_drag and M is MEKE_current(i,j)

    This has been transformed to

     D = - M * (sdt_damp * damp_rate)
             / (sdt * (1 + sdt_damp * damp_rate))
    
       = - M * (sdt_damp / sft) * damp_rate
             * (1 / (1 + sdt_damp * damp_rate))
    
       = - M * damp_step * damp_rate * damping
    

    This new expression for D no longer requires a division.

  3. In the second stage expression for src_btm_drag, we again use
    damp_rate to replace ldamping. We also use damp_rate1 to
    denote the original value of ldaming_Strang1.

     sfac = (1 + sdt_damp * damp_rate1) * (1 + sdt_damp * damp_rate)
     D = M * (1 - sfac) / (sdt * sfac)
    

    Using damping1 to denote the damping from the first stage, D can
    be transformed as follows.

     D = -M * (sdt_damp * damp_rate + sdt_damp * damp_rate1
                 + sdt_damp**2 * damp_rate * damp_rate1)
       / (sdt * (1 + sdt_damp * damp_rate) * (1 + sdt_damp * damp_rate1))
    
       = -M * (sdt_damp / sdt) * (1 / (1 + sdt_damp * damp_rate))
         * (damp_rate + damp_rate1 + sdt_damp * damp_rate * damp_rate1)
         / (1 + sdt_damp * damp_rate1)
    
       = -M * damp_step * damping
         * (damp_rate * (1 + sdt_damp * damp_rate1) + damp_rate1)
         / (1 + sdt_damp * damp_rate1)
    
       = -M * damp_step * damping * (damp_rate + damp_rate1 * damping1)
    

    We now store damp_rate1 * damping1 in damp_rate_s1(:,:) rather
    than just damping1, and can reduce this expression to

     D = -M * damp_step * damping * (damp_rate + damp_rate_s1(i,j))
    

    which requires no division. It also requires no additional
    read or writes.

Assuming there are no mistakes here, this should only cause negligible
bitwise differences in the results.

Also, comments have been added which explain that we cannot modify the
existing expressions for MEKE%MEKE, since they would alter the bitwise
results of existing runs.


In the second stage of Strang splitting, there is a check for sdt > sdt_damping, following a check for nonzero KH or K4.

However, sdt_damp can only be less than sdt when these are nonzero.
Otherwise, sdt_damping equals sdt.

Best as I can tell, this check is unnecessary and potentially
problematic for optimization. This patch removes the redundant check.

This patch modifies the calculation of `src_btm_drag` to avoid
additional division operations.

We first note the following variable renames:

* `Isfac` is now `damping`, since it describes the net damping or
  reduction of any damped fields.  `sfac` has been removed since it no
  longer appears in the expressions.

* `ldamping` is now `damp_rate`.  This is primarily to avoid confusion
  with `damping`, but also to more accurately describe its role.

* `ldamping_Strang1` is replaced to `damp_rate_s1`.  It is used for a
  similar purpose (cacheing of values from the first stage) but now
  holds a slightly different value.

The following modifications were made.

1. The conditional split of `sdt` into `sdt_damping` substeps is
   quantified by a new term, `damp_step` which is equal to 0.5 or 1.
   `sdt_damp` is computed as `sdt * damp_step`.

   The presence of `sdt_damp / sdt` in our expressions is replaced with
   `damp_step`, which avoids an extra division.

2. The first expression (using new notation) was originally

    sfac = 1 + sdt_damp * damp_rate
    D = M * (1 - sfac) / (sdt * sfac)

   where `D` is `src_btm_drag` and `M` is `MEKE_current(i,j)`

   This has been transformed to

    D = - M * (sdt_damp * damp_rate)
            / (sdt * (1 + sdt_damp * damp_rate))

      = - M * (sdt_damp / sft) * damp_rate
            * (1 / (1 + sdt_damp * damp_rate))

      = - M * damp_step * damp_rate * damping

   This new expression for `D` no longer requires a division.

3. In the second stage expression for `src_btm_drag`, we again use
   `damp_rate` to replace `ldamping`.  We also use `damp_rate1` to
   denote the original value of `ldaming_Strang1`.

    sfac = (1 + sdt_damp * damp_rate1) * (1 + sdt_damp * damp_rate)
    D = M * (1 - sfac) / (sdt * sfac)

   Using `damping1` to denote the damping from the first stage, `D` can
   be transformed as follows.

    D = -M * (sdt_damp * damp_rate + sdt_damp * damp_rate1
                + sdt_damp**2 * damp_rate * damp_rate1)
      / (sdt * (1 + sdt_damp * damp_rate) * (1 + sdt_damp * damp_rate1))

      = -M * (sdt_damp / sdt) * (1 / (1 + sdt_damp * damp_rate))
        * (damp_rate + damp_rate1 + sdt_damp * damp_rate * damp_rate1)
        / (1 + sdt_damp * damp_rate1)

      = -M * damp_step * damping
        * (damp_rate * (1 + sdt_damp * damp_rate1) + damp_rate1)
        / (1 + sdt_damp * damp_rate1)

      = -M * damp_step * damping * (damp_rate + damp_rate1 * damping1)

   We now store `damp_rate1 * damping1` in `damp_rate_s1(:,:)` rather
   than just `damping1`, and can reduce this expression to

    D = -M * damp_step * damping * (damp_rate + damp_rate_s1(i,j))

   which requires no division.  It also requires no additional
   read or writes.

Assuming there are no mistakes here, this should only cause negligible
bitwise differences in the results.

Also, comments have been added which explain that we cannot modify the
existing expressions for `MEKE%MEKE`, since they would alter the bitwise
results of existing runs.
In the second stage of Strang splitting, there is a check for `sdt >
sdt_damping`, following a check for nonzero `KH` or `K4`.

However, `sdt_damp` can only be less than `sdt` when these are nonzero.
Otherwise, `sdt_damping` equals `sdt`.

Best as I can tell, this check is unnecessary and potentially
problematic for optimization.  This patch removes the redundant check.
@marshallward
Copy link
Author

marshallward commented Sep 8, 2024

There are some significant changes to the equations here, so probably needs a thorough review from @ElizabethYankovsky or @Wendazhang33 if possible.

Also: I was able to confirm that replacing MEKE%MEKE / (1 + sdt_damp * damp_rate) with MEKE%MEKE * damping change the answer at a bitwise level, so those expressions do need to be kept for now.

@marshallward
Copy link
Author

marshallward commented Sep 8, 2024

I think these regressions are not a problem. We would expect that replacement of the divisions would shuffle the bits around. Seems reassuring that the answers are mostly the same, however.

(Even more reassuring that the testing suite detected and caught the regression!)

@ElizabethYankovsky ElizabethYankovsky merged commit 4869911 into ElizabethYankovsky:grace_backscatter Sep 17, 2024
8 of 10 checks passed
@marshallward marshallward deleted the backscat_nodiv branch January 14, 2025 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants