Don't unbroadcast some cases which don't need broadcasting #973

Conversation
Yeah, this seems too specific of an optimisation to be acceptable. It also seems to need some testing with the JuliaLang PR you mentioned.
Great, if you have a GPU handy, this would be a useful thing to do.
A 30% speedup for such common operations with a few lines of code seems great.
It looks like CUDA.jl supports …

```julia
julia> @btime gradient((x,y) -> sum(x .* y), $(rand(100,100)), pi);
  20.125 μs (7 allocations: 234.62 KiB)  # master
  6.108 μs (3 allocations: 78.22 KiB)    # this PR + FillArrays PR

julia> @btime gradient((x,y) -> sum(x ./ y), $(rand(100,100)), pi);
  19.667 μs (7 allocations: 234.62 KiB)  # master
  6.781 μs (3 allocations: 78.22 KiB)    # this PR + FillArrays PR
```

This is trimmed down from a bigger attempt to do this for more broadcasts. Things like …

Edit -- here's what the full …
Simple test case:
In fact the …
unbroadcast using mapreduce, sometimes
This is a small optimisation of what I think are fairly common broadcasts:
For best effect it will need JuliaLang/julia#39053, which you can simulate via …
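To make concrete which broadcasts are meant, the benchmarks in this thread all mix an array with a scalar. A minimal self-contained sketch of why that case is cheap: for `z = x .* y` with scalar `y`, the pullback of `y` collapses to a single reduction over `x`, with no array-shaped gradient needed.

```julia
# For z = x .* y with scalar y, the cotangent of y is one number:
x  = rand(100, 100)
dz = ones(size(x))    # cotangent of sum(z)

dy = sum(dz .* x)     # what the generic unbroadcast computes (via a temporary)
@assert dy ≈ sum(x)   # equal to a single plain reduction over x
```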