Mapreduce is 10x slower than loop. #38558
Perhaps this is because:

```julia
julia> @btime f($x,$y);
  22.352 μs (1 allocation: 10.19 KiB)

julia> @btime f2($x,$y);
  2.178 μs (0 allocations: 0 bytes)

julia> using MappedArrays

julia> function f3(x,y)
           reduce(+, mappedarray(==, x, y))
       end
f3 (generic function with 1 method)

julia> @btime f3($x,$y);
  4.934 μs (0 allocations: 0 bytes)
```

Still not quite as good as the loop, but almost there.
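For readers unfamiliar with MappedArrays.jl: `mappedarray(==, x, y)` builds a lazy wrapper that applies `==` element-by-element on access, so the reduction never materializes an intermediate array. A minimal sketch (the variable names here are illustrative, not taken from the thread):

```julia
using MappedArrays

x = rand(1:3, 1000)
y = rand(1:3, 1000)

m = mappedarray(==, x, y)    # lazy: no Bool array is materialized
m[1] == (x[1] == y[1])       # elements are computed on demand
reduce(+, m)                 # counts matching positions without a temporary
```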
If Generators were faster we could just delete the specialization for multiple arrays and use the definition at line 288 (as of commit 19b3ec5).
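For context, the trade-off being discussed is roughly the following. This is a schematic sketch only, not Base's actual definitions (the function names here are made up for illustration):

```julia
# Schematic only, not Base's exact code.

# A specialization for multiple arrays that materializes the mapped result
# before reducing -- this is where the O(n) allocation comes from:
mapreduce_materialized(f, op, A...; kw...) = reduce(op, map(f, A...); kw...)

# A Generator-based fallback avoids the intermediate array, but currently
# reduces more slowly than a hand-written loop:
mapreduce_lazy(f, op, A...; kw...) = reduce(op, (f(a...) for a in zip(A...)); kw...)
```

With `==` and `+`, the first version allocates a temporary array of length `n`; the second allocates nothing but pays the Generator's iteration overhead.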
Generators are like generalized MappedArrays, but unfortunately they have never quite gotten the same performance. Working on their performance might be the best way to tackle this issue.
Yes please. Generators are super useful and I use them all the time (much more than higher-order functions).
I get 1.3 μs for the loop, 10.4 μs for mapreduce, and 7.2 μs for the generator version (no allocations). So generators are already faster in this case, but yes, they need to be faster still.
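The exact "generator version" timed above isn't shown in this excerpt; a plausible form (an assumption, not quoted from the thread) is:

```julia
# One plausible shape of the generator-based version:
f_gen(x, y) = sum(xi == yi for (xi, yi) in zip(x, y))
```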
As a reasonable goalpost, here's one more timing comparison:

For me the gap between the explicit loop and MappedArrays is very small.
That would be pretty easy to do automatically, right?
Partially addresses JuliaLang#38558
@timholy is there any good reason why Generators are slow? Is there low-hanging fruit? If you have a place to point me, I'd be glad to take a look.
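For anyone picking this up, a minimal way to see the Generator overhead in isolation (an illustrative micro-benchmark, not taken from the thread):

```julia
using BenchmarkTools

x = rand(1000)
@btime sum($x)                # reduction over the array itself
@btime sum(xi for xi in $x)   # reduction over a Generator wrapping the same data
```

The second call goes through the generic iterate-based reduction path rather than the specialized array method, which is typically the slower of the two.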
- A faster version is implemented.
- mapreduce seems slow, see JuliaLang/julia#38558.
- Now it returns an SVector.
- A new function get_xyz_uvw(m) has been implemented. It preallocates the matrices xyz and uvw and then does a manual (threaded) loop to fill the matrices. As seen in the previous commit, mapreduce is slow (JuliaLang/julia#38558), so we have chosen this "manual" approach.
- A new function get_VTKconec(m) has been implemented. It follows the same ideas as get_xyz_uvw.
- Finally, the function write_vtk(m, filename) has been modified accordingly.
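The commit message above describes a preallocate-then-fill pattern. A generic sketch of that pattern follows; the function name, argument type, and 3×n layout are illustrative assumptions, not code from the referenced package:

```julia
using Base.Threads

# Illustrative sketch only (not the actual get_xyz_uvw code): preallocate the
# output, then fill it with a threaded loop instead of building it via mapreduce.
function collect_points(coords::Vector{<:AbstractVector{<:Real}})
    n = length(coords)
    xyz = Matrix{Float64}(undef, 3, n)       # allocate the output once
    @threads for i in 1:n
        @inbounds xyz[:, i] .= coords[i]     # assumes length(coords[i]) == 3
    end
    return xyz
end
```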
It's been a while, but I think I get reasonable performance from this:

```julia
f1(x, y) = mapreduce(==, +, x, y)

function f2(x, y)
    total = 0
    for i in eachindex(x, y)
        total += @inbounds x[i] == y[i]
    end
    return total
end

function mapreduce_same(f, op, A::Vararg{AbstractArray,N}; kw...) where {N}
    tup_f(i) = f(ntuple(j -> @inbounds(A[j][i]), Val(N))...)
    mapreduce(tup_f, op, eachindex(A...); kw...)
end
f3(x, y) = mapreduce_same(==, +, x, y; init=0)
```

My benchmarks give

```julia
julia> @btime f1($x, $y);
  2.489 μs (1 allocation: 10.19 KiB)

julia> @btime f2($x, $y);
  1.250 μs (0 allocations: 0 bytes)

julia> @btime f3($x, $y);
  1.380 μs (0 allocations: 0 bytes)
```

There are a lot of problems with this, like breaking current behaviour for (at least) a mixture of …

```julia
function mapreduce_cart(f, op, A::Vararg{AbstractArray,N}; kw...) where {N}
    tup_f(i) = f(ntuple(j -> @inbounds(A[j][i]), Val(N))...)
    inds = eachindex(A...)
    mapreduce(tup_f, op, LinearIndices(inds); kw...)
end
f4(x, y) = mapreduce_cart(==, +, x, y; init=0)
```

Performance is degraded and the function now allocates.

```julia
julia> @btime f4($x, $y);
  1.550 μs (3 allocations: 80 bytes)
```

Why does this happen?
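The question above is left open in the thread. One way to start poking at it (a suggestion, not an answer from the thread) is to compare the index objects that `mapreduce` receives in the two versions:

```julia
# A starting point for investigating the extra allocations: the index types differ.
x = rand(1:3, 1000); y = rand(1:3, 1000)

inds = eachindex(x, y)          # Base.OneTo{Int64} for two Vectors
wrapped = LinearIndices(inds)   # the same indices wrapped in a LinearIndices array

typeof(inds)      # Base.OneTo{Int64}
typeof(wrapped)   # LinearIndices{1, Tuple{Base.OneTo{Int64}}}
```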
@wheeheee that's not a bad idea, similar to MappedArrays but in a few lines. Am not sure what …

```julia
function mapreduce_same_reshape(f, op, A::Vararg{AbstractArray,N}; kw...) where {N}
    tup_f(i) = f(ntuple(j -> @inbounds(A[j][i]), Val(N))...)
    # inds = reshape(eachindex(A...), axes(A[1]))  # this change allows for dims=2 etc.
    ei = eachindex(A...)  # when this is CartesianIndices, reshaping it makes it slow, and isn't necessary
    inds = ei isa AbstractUnitRange ? reshape(ei, axes(A[1])) : ei
    mapreduce(tup_f, op, inds; kw...)
end

mapreduce_same_reshape(*, +, [1 2; 3 4], [5 6; 7 8]; dims=1) == [26 44]
```

Benchmarks for all the above functions are here: https://gist.github.com/mcabbott/4746e69f321909c3ba209518dc0447bb
For the life of me, I can't remember what I used the …
A Slack conversation led me to the realization that currently the following two functions have very different speeds.
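(The two function definitions themselves are not preserved in this excerpt. A plausible reconstruction, based on the issue title and the f1/f2 shown in a later comment, is:)

```julia
# Presumed shape of the two functions being compared (reconstructed, not the
# original issue text): a one-line mapreduce versus an explicit loop.
f(x, y) = mapreduce(==, +, x, y)

function f2(x, y)
    total = 0
    for i in eachindex(x, y)
        total += x[i] == y[i]
    end
    return total
end
```

With `x` and `y` as equal-length vectors, `f` allocates an intermediate array while `f2` does not, which matches the benchmark numbers quoted earlier in the thread.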
This is clearly unfortunate as the `mapreduce` version is much clearer. Is there any way we can make this case for `mapreduce` not have `O(n)` allocation? (Or otherwise be faster?)