Reflect via indices outside boundaries #3

Status: Open, 5 commits to be merged into base: main

@elisno elisno commented Jun 21, 2021

This PR adds HighDimPDE_reflect_outs, an alternative implementation of HighDimPDE._reflect/HighDimPDE._reflect_GPU. It computes out (out1 and out2), along with rtemp/rmin, exclusively from the indices where b "lies outside the boundary of [s,e]^d".
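As a rough illustration of the index-based idea, here is a minimal sketch, not the code in this PR. The function name first_hit is hypothetical, and it assumes that a lies inside [s,e]^d, that out1/out2 hold the indices where b is below s / above e, and that rmin is the smallest fraction of the segment from a to b at which a boundary is crossed.

```julia
# Hypothetical helper (not the PR's implementation): find the first boundary
# crossing of the segment a -> b, looking only at coordinates of b that lie
# outside [s, e]. Assumes every coordinate of a is inside [s, e].
function first_hit(a::AbstractVector, b::AbstractVector, s::Real, e::Real)
    out1 = findall(<(s), b)   # indices where b is below the lower bound
    out2 = findall(>(e), b)   # indices where b is above the upper bound
    rmin, imin = Inf, 0       # imin == 0 means "b is already inside"
    for i in out1
        rtemp = (a[i] - s) / (a[i] - b[i])   # fraction of the step until x_i == s
        if rtemp < rmin
            rmin, imin = rtemp, i
        end
    end
    for i in out2
        rtemp = (e - a[i]) / (b[i] - a[i])   # fraction of the step until x_i == e
        if rtemp < rmin
            rmin, imin = rtemp, i
        end
    end
    return rmin, imin, out1, out2
end

rmin, imin, out1, out2 = first_hit([0.5, 0.5], [1.5, 0.25], 0.0, 1.0)
```

The loops touch only length(out1) + length(out2) entries rather than all d coordinates, which is where any saving would come from when few coordinates leave the domain.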

It should be possible to reuse these indices inside the while loop, but repeated reflections along the same dimension are trickier to handle. Such repeated reflections appear to be relatively uncommon, so one workaround is to recompute out1, out2 and out only when they occur in a given iteration.

_swap_boundary_outs! doesn't seem to work as expected yet; the idea behind it is to avoid recomputing out1 and out2 within the while loop.
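To make the repeated-reflection case concrete, here is a minimal baseline sketch that simply recomputes the outside indices on every iteration. It is an assumption-laden illustration rather than the PR's code: reflect_sketch is a hypothetical name, a is assumed to start inside [s,e]^d, and each reflection mirrors a single coordinate across the boundary it crossed.

```julia
# Hypothetical baseline (not the PR's implementation): reflect the segment
# a -> b back into [s, e]^d, recomputing the outside indices each iteration.
function reflect_sketch(a::AbstractVector, b::AbstractVector, s::Real, e::Real)
    a, b = float.(a), float.(b)                  # floating-point working copies
    while true
        out = findall(x -> x < s || x > e, b)    # recomputed every iteration
        isempty(out) && return b
        rmin, imin = Inf, 0
        for i in out                             # earliest crossing among outside coords
            bound = b[i] < s ? s : e
            rtemp = (bound - a[i]) / (b[i] - a[i])
            if rtemp < rmin
                rmin, imin = rtemp, i
            end
        end
        a = a .+ rmin .* (b .- a)                # move a to the hit point
        bound = b[imin] < s ? s : e
        b[imin] = 2 * bound - b[imin]            # mirror that coordinate of b
    end
end

single = reflect_sketch([0.5, 0.5], [1.5, 0.25], 0.0, 1.0)
repeated = reflect_sketch([0.5], [2.5], 0.0, 1.0)   # crosses e, then s
```

In the second call the mirrored coordinate lands below s after the first reflection, so the same index moves from the "above e" set to the "below s" set. That is the situation in which cached index sets would need updating.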

I added benchmarks comparing the reflect methods on d-dimensional trajectories. Note that they are run with batch_size=1; see the added code in profiling/reflect.jl.

Benchmark results:
d = 2
CPU       0.000003 seconds (7 allocations: 384 bytes)
GPU       0.000003 seconds (13 allocations: 768 bytes)
Index     0.000003 seconds (14 allocations: 864 bytes)

d = 4
CPU       0.000001 seconds (6 allocations: 384 bytes)
GPU       0.000002 seconds (13 allocations: 768 bytes)
Index     0.000002 seconds (14 allocations: 864 bytes)

d = 8
CPU       0.000004 seconds (21 allocations: 2.391 KiB)
GPU       0.000071 seconds (121 allocations: 6.172 KiB)
Index     0.000002 seconds (14 allocations: 928 bytes)

d = 16
CPU       0.000003 seconds (21 allocations: 3.328 KiB)
GPU       0.000024 seconds (121 allocations: 6.891 KiB)
Index     0.000002 seconds (14 allocations: 928 bytes)

d = 32
CPU       0.000004 seconds (56 allocations: 16.688 KiB)
GPU       0.000064 seconds (391 allocations: 27.625 KiB)
Index     0.000003 seconds (14 allocations: 1024 bytes)

d = 64
CPU       0.000024 seconds (154 allocations: 90.469 KiB)
GPU       0.000194 seconds (1.15 k allocations: 116.906 KiB)
Index     0.000004 seconds (14 allocations: 1.281 KiB)

d = 128
CPU       0.000060 seconds (336 allocations: 376.688 KiB)
GPU       0.000476 seconds (2.55 k allocations: 410.719 KiB)
Index     0.000009 seconds (14 allocations: 1.812 KiB)

d = 256
CPU       0.000252 seconds (582 allocations: 1.198 MiB)
GPU       0.001089 seconds (4.61 k allocations: 1.855 MiB)
Index     0.000018 seconds (17 allocations: 15.141 KiB)

d = 512
CPU       0.000682 seconds (1.19 k allocations: 4.774 MiB)
GPU       0.002394 seconds (9.48 k allocations: 5.790 MiB)
Index     0.000055 seconds (35 allocations: 40.375 KiB)

d = 1024
CPU       0.005550 seconds (2.28 k allocations: 18.064 MiB, 60.49% gc time)
GPU       0.005773 seconds (18.22 k allocations: 18.740 MiB)
Index     0.000177 seconds (44 allocations: 63.281 KiB)

d = 2048
CPU       0.012779 seconds (4.74 k allocations: 74.535 MiB, 25.15% gc time)
GPU       0.020421 seconds (37.87 k allocations: 70.653 MiB, 14.94% gc time)
Index     0.000683 seconds (71 allocations: 147.672 KiB)

d = 4096
CPU       0.040447 seconds (18.67 k allocations: 292.342 MiB, 12.00% gc time)
GPU       0.058996 seconds (82.66 k allocations: 263.911 MiB, 11.60% gc time)
Index     0.002649 seconds (143 allocations: 470.703 KiB)

d = 8192
CPU       0.155976 seconds (37.49 k allocations: 1.145 GiB, 13.94% gc time)
GPU       0.225138 seconds (165.99 k allocations: 1.008 GiB, 10.21% gc time)
Index     0.010904 seconds (268 allocations: 1.351 MiB)

d = 16384
CPU       0.685121 seconds (74.25 k allocations: 4.534 GiB, 13.60% gc time)
GPU       0.908984 seconds (328.80 k allocations: 3.938 GiB, 9.77% gc time)
Index     0.041428 seconds (501 allocations: 3.839 MiB)

d = 32768
Index     0.169528 seconds (1.03 k allocations: 15.046 MiB)

d = 65536
Index     0.698473 seconds (2.22 k allocations: 62.908 MiB, 0.27% gc time)

d = 131072
Index     2.817615 seconds (4.25 k allocations: 239.533 MiB, 0.18% gc time)
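The timings above are in the format printed by Julia's @time macro. For readers who want to run a similar sweep over d, the following is a minimal sketch of such a loop, not the contents of profiling/reflect.jl. The names time_over_dims and clamp_stub are hypothetical, and clamp_stub is only a stand-in to be replaced with the reflect function under test.

```julia
using Random

# Stand-in for the function under test (hypothetical; substitute the real one).
clamp_stub(a, b, s, e) = clamp.(b, s, e)

# Time f(a, b, s, e) once per dimension d on a random segment a -> b.
function time_over_dims(f; dims = 2 .^ (1:4), s = 0.0, e = 1.0)
    rng = MersenneTwister(0)
    results = Dict{Int,Float64}()
    for d in dims
        a = s .+ (e - s) .* rand(rng, d)   # start point inside [s, e]^d
        b = a .+ randn(rng, d)             # end point, typically outside
        f(a, b, s, e)                      # warm-up call so compilation is not timed
        results[d] = @elapsed f(a, b, s, e)
    end
    return results
end

results = time_over_dims(clamp_stub)
```

A single timed call per d is noisy at microsecond scales, so repeated sampling (for example with BenchmarkTools.jl) would give steadier numbers for the small-d rows.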
