
Mean opacity for variable kappa based on piecewise-power-law approximation #626

Merged: 50 commits merged into development on May 13, 2024

Conversation

@chongchonghe (Contributor) commented May 2, 2024

Description

This PR implements a piecewise-power-law approximation for multigroup radiation transfer. We calculate the Planck-mean, energy-mean, and flux-mean opacities for an arbitrary function kappa(nu, T, rho) based on piecewise-power-law fits to kappa(nu), E(nu), and F(nu). The user enables this routine by setting OpacityModel to OpacityModel::piecewisePowerLaw. See the examples in test_radhydro_shock_multigroup.cpp and test_radhydro_pulse_MG_int.hpp. The original approach, in which the opacities are defined via ComputePlanckOpacity, ComputeFluxMeanOpacity, etc., is retained by setting OpacityModel to OpacityModel::user, which is the default.

We calculate the power-law slope of a radiation quantity via the following relations:

$$ \alpha_{i} = {\rm minmod}(s_{i-1/2}, s_{i+1/2}) $$

where

$$ s_{i+1/2} = \frac{\ln \frac{X_{i+1}}{\nu_{i+3/2} - \nu_{i+1/2}} - \ln \frac{X_i}{\nu_{i+1/2} - \nu_{i-1/2}}}{\ln \sqrt{\nu_{i+1/2} \nu_{i+3/2}} - \ln \sqrt{\nu_{i-1/2} \nu_{i+1/2}}} $$

I will try to construct a better estimate of the power-law slope in a separate PR. The user can override this behaviour by defining their own RadSystem<problem_t>::ComputeRadQuantityExponents; see the example in test_radhydro_pulse_MG_int.cpp.
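To make the recipe concrete, here is a minimal stand-alone sketch of this slope estimate (the array layout and function names are illustrative assumptions, not the actual RadSystem interface; `X[g]` is the radiation quantity integrated over group `g`, and `nu[g]`, `nu[g+1]` are its edges):

```cpp
#include <cmath>

// minmod limiter: returns 0 if the two slopes disagree in sign, else the smaller-magnitude one
auto minmod(double a, double b) -> double
{
    if (a * b <= 0.0) {
        return 0.0;
    }
    return std::abs(a) < std::abs(b) ? a : b;
}

// s_{g+1/2}: log-log slope of the mean spectral density across the interface between
// groups g and g+1, using geometric bin centers sqrt(nu_left * nu_right)
auto slope_at_interface(const double *X, const double *nu, int g) -> double
{
    const double dens_lo = X[g] / (nu[g + 1] - nu[g]);
    const double dens_hi = X[g + 1] / (nu[g + 2] - nu[g + 1]);
    const double nuc_lo = std::sqrt(nu[g] * nu[g + 1]);
    const double nuc_hi = std::sqrt(nu[g + 1] * nu[g + 2]);
    return (std::log(dens_hi) - std::log(dens_lo)) / (std::log(nuc_hi) - std::log(nuc_lo));
}

// alpha_g = minmod(s_{g-1/2}, s_{g+1/2})
auto quantity_exponent(const double *X, const double *nu, int g) -> double
{
    return minmod(slope_at_interface(X, nu, g - 1), slope_at_interface(X, nu, g));
}
```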

The power-law fit to $\kappa(\nu)$ can be specified in two ways. The first method is to specify, for every photon bin, its power-law slope and its value at the lower bin edge, given $T$ and $\rho$. The second method is to specify an exact definition of $\kappa(\nu, T, \rho)$; a power-law fit to $\kappa(\nu)$ is then constructed on the fly as follows:

$$ \begin{gather} \kappa_{L}^i = \kappa(\nu_{i-1/2}, T, \rho) \\ \alpha^i_{\kappa} = \frac{\ln(\kappa(\nu_{i+1/2}, T, \rho) / \kappa(\nu_{i-1/2},T,\rho))}{\ln(\nu_{i+1/2} / \nu_{i-1/2})} \\ \kappa_{{\rm fit}}^i(\nu) = \kappa_{L}^i \left( \frac{\nu}{\nu_{i-1/2}} \right)^{\alpha_{\kappa}^i} \end{gather} $$

This is certainly not the most accurate approach, since it is not volume-conservative, but it is simple, and simplicity matters because the fit is recomputed for every cell. An alternative is to construct $\kappa_{L,i}$ and $\alpha_{\kappa,i}$ once at initialization if $\kappa$ does not depend on $T$ or $\rho$.
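For illustration, a minimal sketch of the second method's on-the-fly fit (the struct, function names, and the callable signature for kappa are illustrative assumptions, not the PR's actual interface):

```cpp
#include <cmath>

struct PowerLawFit {
    double kappa_lower; // kappa_L^i: opacity at the lower group edge
    double alpha;       // alpha_kappa^i: power-law slope within the group
};

// Fit kappa(nu) in group g by evaluating the user-supplied kappa(nu, T, rho)
// at the two group edges and taking the power law through those two points.
template <typename KappaFn>
auto fit_kappa_in_group(KappaFn &&kappa, const double *nu_edges, int g, double T, double rho) -> PowerLawFit
{
    const double k_lo = kappa(nu_edges[g], T, rho);
    const double k_hi = kappa(nu_edges[g + 1], T, rho);
    const double alpha = std::log(k_hi / k_lo) / std::log(nu_edges[g + 1] / nu_edges[g]);
    return {k_lo, alpha};
}

// kappa_fit^i(nu) = kappa_L^i * (nu / nu_{i-1/2})^{alpha_kappa^i}
auto eval_fit(const PowerLawFit &fit, double nu, double nu_lower_edge) -> double
{
    return fit.kappa_lower * std::pow(nu / nu_lower_edge, fit.alpha);
}
```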

Related issues

N/A

Checklist

Before this pull request can be reviewed, all of these tasks should be completed. Denote completed tasks with an x inside the square brackets [ ] in the Markdown source below:

  • I have added a description (see above).
  • I have added a link to any related issues (see above).
  • I have read the Contributing Guide.
  • I have added tests for any new physics that this PR adds to the code.
  • I have tested this PR on my local computer and all tests pass.
  • I have manually triggered the GPU tests with the magic comment /azp run.
  • I have requested a reviewer for this PR.

@github-actions bot left a comment


clang-tidy made some suggestions

src/radiation_system.hpp (suggestion resolved)

Azure Pipelines successfully started running 5 pipeline(s).

@chongchonghe (Contributor, Author)

/azp run


Azure Pipelines successfully started running 5 pipeline(s).

@chongchonghe (Contributor, Author)

> Is there any way this can be done with an approximate logarithm function? The issue is that transcendental functions on GPUs have to use special hardware units that are not able to process as many threads simultaneously as the normal arithmetic units can, so there can be a severe performance cost.
> If this is not possible (and I think it might not be), can you benchmark the radiation solver performance before/after this PR on either AMD GPUs or NVIDIA GPUs (or both)?

Yes. $\log(x)$ can be approximated with

$$ \log(x) = \lim_{a \to 0} \frac{x^a - 1}{a} $$

By setting $a = 10^{-7}$, the relative error is below 4e-6 for $x$ between 1e-30 and 1e30, according to Mathematica. @BenWibking Do you think this is worth doing?
I was doing the opposite in ComputeGroupMeanOpacity: I need to evaluate this expression, and when a < 1e-8 I switch to using std::log(x). See here:

part1 = std::log(radBoundaryRatios[g]);

Unfortunately, I think std::pow has the same issue, so I don't think that will help.

Any approximate form should either be a rational polynomial interpolant, or else use std::ldexp and std::frexp, which can be used to approximately compute 2^x and log base 2: https://en.cppreference.com/w/cpp/numeric/math/ldexp.

In that case, I'll leave that for future work. If you can approximate log_2(x), then you can approximate log(x) with log(x) = log_2(x) * log(2).
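For reference, a minimal sketch of the frexp-based direction suggested above (illustration only, not code from this PR; the truncated series and its accuracy bound are assumptions):

```cpp
#include <cmath>
#include <cstdio>

// Approximate ln(x) without transcendental instructions: split x = m * 2^e with
// std::frexp (m in [0.5, 1)), then use a short atanh series for ln(m).
auto approx_log(double x) -> double
{
    int e = 0;
    const double m = std::frexp(x, &e);
    const double z = (m - 1.0) / (m + 1.0);
    const double z2 = z * z;
    // truncated atanh series; absolute error in ln(m) is below ~1.5e-4 for m in [0.5, 1)
    const double ln_m = 2.0 * z * (1.0 + z2 / 3.0 + z2 * z2 / 5.0);
    return static_cast<double>(e) * 0.6931471805599453 + ln_m; // e * ln(2) + ln(m)
}

auto main() -> int
{
    std::printf("approx ln(10)   = %.6f (std::log gives %.6f)\n", approx_log(10.0), std::log(10.0));
    std::printf("approx ln(1e-5) = %.6f (std::log gives %.6f)\n", approx_log(1.0e-5), std::log(1.0e-5));
    return 0;
}
```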

For now, I'll do a quick benchmark as you suggested.

@chongchonghe (Contributor, Author)

The 3D GPU test is taking a very long time. This PR is fine: pulse_MG_int takes 2.5 hours on gadi. However, the old pulse_MG test is projected to take 100 hours to finish. I suspect this is because of its tight resid_tol of 1e-13, which I changed to 1e-11 in this PR. I have now adjusted the resid_tol of the old pulse_MG test and will wait until it finishes to compare with the new test.

@chongchonghe (Contributor, Author)

OK. I did a benchmark on gadi gpuvolta of the old pulse-MG test (bin-centered opacity) and the new pulse-MG-int test (bin-integrated opacity with the piecewise power law). The elapsed times are 45 min vs 37 min. The new test is faster, but I realized this is not a fair comparison: the old test runs two simulations, each with 4 photon groups, while the new test runs two simulations, one with 4 photon groups and one with 1 group (the exact solution, where kappa is integrated over the whole spectrum). The amount of computation is roughly proportional to nGroups, so the elapsed times per group are: old: 45/8 = 5.625 min, new: 37/5 = 7.4 min. The difference is not large, so I wouldn't worry too much about transcendental functions on the GPU.

@github-actions bot left a comment


clang-tidy made some suggestions

@chongchonghe (Contributor, Author)

/azp run


Azure Pipelines successfully started running 5 pipeline(s).

@BenWibking (Collaborator) commented May 10, 2024

> OK. I did a benchmark on gadi gpuvolta of the old pulse-MG test (bin-centered opacity) and the new pulse-MG-int test (bin-integrated opacity with the piecewise power law). The elapsed times are 45 min vs 37 min. The new test is faster, but I realized this is not a fair comparison: the old test runs two simulations, each with 4 photon groups, while the new test runs two simulations, one with 4 photon groups and one with 1 group (the exact solution, where kappa is integrated over the whole spectrum). The amount of computation is roughly proportional to nGroups, so the elapsed times per group are: old: 45/8 = 5.625 min, new: 37/5 = 7.4 min. The difference is not large, so I wouldn't worry too much about transcendental functions on the GPU.

Can you run a test with a 128^3 grid (with blocking_factor=128 and max_grid=128)? For a performance benchmark, it is not necessary to run the whole problem; $\sim 10$ timesteps are enough.

@chongchonghe (Contributor, Author)

@BenWibking OK, here are the test results on gadi with 1 GPU, using a 128^3 grid run for 10 time steps. Both the total run time and the μs/zone-update are very similar, so the conclusion is that this PR introduces no significant performance regression.

Old, bin-centered approach, opacity_model = OpacityModel::user:

Performance figure-of-merit: 0.7211298889 μs/zone-update [1.386712734 Mupdates/s]
Zone-updates on level 0: 20971520

Relative L1 error norm = 1.646387658e-05
elapsed time: 1459.157303 seconds.

New, opacity_model = OpacityModel::piecewisePowerLaw:

Performance figure-of-merit: 0.8290508373 μs/zone-update [1.206198649 Mupdates/s]
Zone-updates on level 0: 20971520

Relative L1 error norm = 1.60272655e-05
elapsed time: 1394.70028 seconds.

With 4 GPUs, the performance is similar, too. These runs take about 4x less wall time, although the reported μs/zone-update is larger.

Old:

Performance figure-of-merit: 17.29543015 μs/zone-update [0.05781874122 Mupdates/s]
Zone-updates on level 0: 20971520

New:

Performance figure-of-merit: 16.46246597 μs/zone-update [0.06074424098 Mupdates/s]
Zone-updates on level 0: 20971520

@BenWibking (Collaborator) commented May 12, 2024

Ok, that seems reasonable to me.

As a final check, can you see how many iterations per zone there are on average for your benchmark runs?

When I tested the radiation code in the methods paper using 128^3 grids, I got 40 million updates/sec/GPU. However, that problem was optically thin, so the number of iterations should have been close to 1.

@chongchonghe (Contributor, Author) commented May 12, 2024

> Ok, that seems reasonable to me.
>
> As a final check, can you see how many iterations per zone there are on average for your benchmark runs?
>
> When I tested the radiation code in the methods paper using 128^3 grids, I got 40 million updates/sec/GPU. However, that problem was optically thin, so the number of iterations should have been close to 1.

@BenWibking Do you mean the number of Newton-Raphson iterations? How would you check that? I don't see a global counter for it.

@BenWibking (Collaborator)

> Ok, that seems reasonable to me.
> As a final check, can you see how many iterations per zone there are on average for your benchmark runs?
> When I tested the radiation code in the methods paper using 128^3 grids, I got 40 million updates/sec/GPU. However, that problem was optically thin, so the number of iterations should have been close to 1.
>
> @BenWibking Do you mean the number of Newton-Raphson iterations? How would you check that? I don't see a global counter for it.

Yes. There currently isn't one, but there is one for the number of cooling iterations, which you could very nearly copy and paste from:

amrex::Print() << fmt::format("\tcooling substeps (per cell): avg {}, max {}\n", navg, nmax);
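For concreteness, a minimal host-side sketch of such a diagnostic (purely illustrative; the per-cell counts, variable names, and the printed label are assumptions, and the GPU reduction that a real counter would need is omitted):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

int main()
{
    // stand-in for per-cell Newton-Raphson iteration counts gathered during the update
    std::vector<int> iters_per_cell = {3, 4, 2, 5, 3};

    long nsum = 0;
    int nmax = 0;
    for (int n : iters_per_cell) {
        nsum += n;
        nmax = std::max(nmax, n);
    }
    const double navg = static_cast<double>(nsum) / static_cast<double>(iters_per_cell.size());

    // report average and maximum, mirroring the cooling-substep diagnostic above
    std::printf("\tNewton-Raphson iterations (per cell): avg %.2f, max %d\n", navg, nmax);
    return 0;
}
```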

@chongchonghe (Contributor, Author)

> Ok, that seems reasonable to me.
> As a final check, can you see how many iterations per zone there are on average for your benchmark runs?
> When I tested the radiation code in the methods paper using 128^3 grids, I got 40 million updates/sec/GPU. However, that problem was optically thin, so the number of iterations should have been close to 1.
>
> @BenWibking Do you mean the number of Newton-Raphson iterations? How would you check that? I don't see a global counter for it.
>
> Yes. There currently isn't one, but there is one for the number of cooling iterations, which you could very nearly copy and paste from:
>
> amrex::Print() << fmt::format("\tcooling substeps (per cell): avg {}, max {}\n", navg, nmax);

OK, I'll try that later. Gadi is currently down.

@chongchonghe (Contributor, Author)

/azp run


Azure Pipelines successfully started running 5 pipeline(s).

@chongchonghe (Contributor, Author)

I did the same test with the RadForce problem instead. This is in the optically-thin limit, and the performance is 8.3 Mupdates/s. Note that this is done with 2 photon groups, and the computational complexity is expected to scale linearly with the number of photon groups (with a ~2.5x jump from 1 group to 2 groups due to the additional setup for multigroup, e.g. interpolating the Planck integral). This brings this test to roughly 20 Mupdates/s/GPU per group. The same applies to the RadhydroPulseMGint problem presented above, which uses 4 groups.

Old:

Performance figure-of-merit: 0.119798939 μs/zone-update [8.347319334 Mupdates/s]
Zone-updates on level 0: 20971520

New (piecewise power law):

Performance figure-of-merit: 0.1201915345 μs/zone-update [8.320053521 Mupdates/s]
Zone-updates on level 0: 20971520

@BenWibking (Collaborator)

> I did the same test with the RadForce problem instead. This is in the optically-thin limit, and the performance is 8.3 Mupdates/s. Note that this is done with 2 photon groups, and the computational complexity is expected to scale linearly with the number of photon groups (with a ~2.5x jump from 1 group to 2 groups due to the additional setup for multigroup, e.g. interpolating the Planck integral). This brings this test to roughly 20 Mupdates/s/GPU per group. The same applies to the RadhydroPulseMGint problem presented above, which uses 4 groups.
>
> Old:
>
> Performance figure-of-merit: 0.119798939 μs/zone-update [8.347319334 Mupdates/s]
> Zone-updates on level 0: 20971520
>
> New (piecewise power law):
>
> Performance figure-of-merit: 0.1201915345 μs/zone-update [8.320053521 Mupdates/s]
> Zone-updates on level 0: 20971520

Ok, I am satisfied with this.

(Sometime in the future, we should figure out why it's a factor of ~2 slower than the version from the original paper.)

@chongchonghe (Contributor, Author)

@markkrumholz Should we merge this PR? I think the only thing I haven't addressed is the piecewise power-law reconstruction of the radiation quantities, which I haven't finished and will leave for the next PR.

@markkrumholz (Collaborator)

> @markkrumholz Should we merge this PR? I think the only thing I haven't addressed is the piecewise power-law reconstruction of the radiation quantities, which I haven't finished and will leave for the next PR.

Will look today and approve. It’s first thing in the morning here.

@markkrumholz added this pull request to the merge queue May 13, 2024
Merged via the queue into development with commit 66da980 May 13, 2024
21 checks passed
@chongchonghe deleted the chongchong/kappa-int-v8.2 branch September 27, 2024 03:44