-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MEKE: No-division implementation of src_btm_drag #9
MEKE: No-division implementation of src_btm_drag #9
Conversation
This patch modifies the calculation of `src_btm_drag` to avoid additional division operations. We first note the following variable renames: * `Isfac` is now `damping`, since it describes the net damping or reduction of any damped fields. `sfac` has been removed since it no longer appears in the expressions. * `ldamping` is now `damp_rate`. This is primarily to avoid confusion with `damping`, but also to more accurately describe its role. * `ldamping_Strang1` is replaced to `damp_rate_s1`. It is used for a similar purpose (cacheing of values from the first stage) but now holds a slightly different value. The following modifications were made. 1. The conditional split of `sdt` into `sdt_damping` substeps is quantified by a new term, `damp_step` which is equal to 0.5 or 1. `sdt_damp` is computed as `sdt * damp_step`. The presence of `sdt_damp / sdt` in our expressions is replaced with `damp_step`, which avoids an extra division. 2. The first expression (using new notation) was originally sfac = 1 + sdt_damp * damp_rate D = M * (1 - sfac) / (sdt * sfac) where `D` is `src_btm_drag` and `M` is `MEKE_current(i,j)` This has been transformed to D = - M * (sdt_damp * damp_rate) / (sdt * (1 + sdt_damp * damp_rate)) = - M * (sdt_damp / sft) * damp_rate * (1 / (1 + sdt_damp * damp_rate)) = - M * damp_step * damp_rate * damping This new expression for `D` no longer requires a division. 3. In the second stage expression for `src_btm_drag`, we again use `damp_rate` to replace `ldamping`. We also use `damp_rate1` to denote the original value of `ldaming_Strang1`. sfac = (1 + sdt_damp * damp_rate1) * (1 + sdt_damp * damp_rate) D = M * (1 - sfac) / (sdt * sfac) Using `damping1` to denote the damping from the first stage, `D` can be transformed as follows. D = -M * (sdt_damp * damp_rate + sdt_damp * damp_rate1 + sdt_damp**2 * damp_rate * damp_rate1) / (sdt * (1 + sdt_damp * damp_rate) * (1 + sdt_damp * damp_rate1)) = -M * (sdt_damp / sdt) * (1 / (1 + sdt_damp * damp_rate)) * (damp_rate + damp_rate1 + sdt_damp * damp_rate * damp_rate1) / (1 + sdt_damp * damp_rate1) = -M * damp_step * damping * (damp_rate * (1 + sdt_damp * damp_rate1) + damp_rate1) / (1 + sdt_damp * damp_rate1) = -M * damp_step * damping * (damp_rate + damp_rate1 * damping1) We now store `damp_rate1 * damping1` in `damp_rate_s1(:,:)` rather than just `damping1`, and can reduce this expression to D = -M * damp_step * damping * (damp_rate + damp_rate_s1(i,j)) which requires no division. It also requires no additional read or writes. Assuming there are no mistakes here, this should only cause negligible bitwise differences in the results. Also, comments have been added which explain that we cannot modify the existing expressions for `MEKE%MEKE`, since they would alter the bitwise results of existing runs.
In the second stage of Strang splitting, there is a check for `sdt > sdt_damping`, following a check for nonzero `KH` or `K4`. However, `sdt_damp` can only be less than `sdt` when these are nonzero. Otherwise, `sdt_damping` equals `sdt`. Best as I can tell, this check is unnecessary and potentially problematic for optimization. This patch removes the redundant check.
There are some significant changes to the equations here, so probably needs a thorough review from @ElizabethYankovsky or @Wendazhang33 if possible. Also: I was able to confirm that replacing |
I think these regressions are not a problem. We would expect that replacement of the divisions would shuffle the bits around. Seems reassuring that the answers are mostly the same, however. (Even more reassuring that the testing suite detected and caught the regression!) |
4869911
into
ElizabethYankovsky:grace_backscatter
This PR contains two commits. The first eliminates divisions from the calculation of
src_btm_drag
in Strang splitting. The second removes a redundant if-block in the second-stage of Strang splitting.Re-indentation of the second commit can make the PR a bit more complicated to read, so it might be easier to inspect the individual commits.
Commit messages are posted below.
MEKE: No-division implementation of src_btm_drag
This patch modifies the calculation of
src_btm_drag
to avoidadditional division operations.
We first note the following variable renames:
Isfac
is nowdamping
, since it describes the net damping orreduction of any damped fields.
sfac
has been removed since it nolonger appears in the expressions.
ldamping
is nowdamp_rate
. This is primarily to avoid confusionwith
damping
, but also to more accurately describe its role.ldamping_Strang1
is replaced todamp_rate_s1
. It is used for asimilar purpose (cacheing of values from the first stage) but now
holds a slightly different value.
The following modifications were made.
The conditional split of
sdt
intosdt_damping
substeps isquantified by a new term,
damp_step
which is equal to 0.5 or 1.sdt_damp
is computed assdt * damp_step
.The presence of
sdt_damp / sdt
in our expressions is replaced withdamp_step
, which avoids an extra division.The first expression (using new notation) was originally
where
D
issrc_btm_drag
andM
isMEKE_current(i,j)
This has been transformed to
This new expression for
D
no longer requires a division.In the second stage expression for
src_btm_drag
, we again usedamp_rate
to replaceldamping
. We also usedamp_rate1
todenote the original value of
ldaming_Strang1
.Using
damping1
to denote the damping from the first stage,D
canbe transformed as follows.
We now store
damp_rate1 * damping1
indamp_rate_s1(:,:)
ratherthan just
damping1
, and can reduce this expression towhich requires no division. It also requires no additional
read or writes.
Assuming there are no mistakes here, this should only cause negligible
bitwise differences in the results.
Also, comments have been added which explain that we cannot modify the
existing expressions for
MEKE%MEKE
, since they would alter the bitwiseresults of existing runs.
In the second stage of Strang splitting, there is a check for
sdt > sdt_damping
, following a check for nonzeroKH
orK4
.However,
sdt_damp
can only be less thansdt
when these are nonzero.Otherwise,
sdt_damping
equalssdt
.Best as I can tell, this check is unnecessary and potentially
problematic for optimization. This patch removes the redundant check.