More efficient mapreduce(f, op, A, B, C; init) #41001

Open · wants to merge 10 commits into base: master

Conversation

@mcabbott (Contributor) commented on May 29, 2021

This is an attempt to make at least some cases of mapreduce(f, op, A, B, C) faster. At present all cases just call reduce(op, map(f, A, B, C)).

This PR generalises _mapreducedim! to allow several arrays, which is enough to make reductions with init fast. When init is not given, generalising reducedim_init looks delicate and largely orthogonal, so it is left for future work.

For scalar reductions, it appears to be faster to call only(_mapreducedim!(...)) when init is given. When init is not given, it uses zip, exactly as in #39053. That isn't as fast: it saves memory, but not always time, compared to the existing reduce(op, map(f, ...)). More could be done here too.
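To illustrate the approach, here is a rough standalone sketch with a hypothetical name, not the PR's actual internals: when init is supplied, the destination can be pre-filled with init and a single loop can reduce f applied elementwise to several same-shape arrays, avoiding the intermediate array that map would allocate.

function mapreducedim_sketch(f, op, init, dims, A::AbstractArray, Bs::AbstractArray...)
    # size 1 along the reduced dimensions, otherwise the size of A
    sz = ntuple(d -> d in dims ? 1 : size(A, d), ndims(A))
    R = fill(init, sz)
    for I in CartesianIndices(A)
        J = CartesianIndex(ntuple(d -> d in dims ? 1 : I[d], ndims(A)))
        @inbounds R[J] = op(R[J], f(A[I], map(B -> B[I], Bs)...))
    end
    return R
end

mapreducedim_sketch(/, +, 0.0, 1, rand(100, 100), rand(100, 100))  # ≈ mapreduce(/, +, A, B; dims=1, init=0.0)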

Times, with dims:

julia> @btime reduce(+, map(+, $(rand(100,100)), $(rand(100,100))), dims=1);
  4.315 μs (3 allocations: 79.08 KiB)

julia> @btime mapreduce(/, +, $(rand(100,100)), $(rand(100,100)), dims=1); # dims=1 uses fast linear path
  4.202 μs (3 allocations: 79.08 KiB)  # unchanged, without init
  9.708 μs (3 allocations: 79.08 KiB)  # test without mapreduce_impl option, to compare

julia> @btime mapreduce(/, +, $(rand(100,100)), $(rand(100,100)), dims=1, init=0.0);
  266.750 μs (40090 allocations: 627.27 KiB)  # with vararg mapreduce_impl, first attempt
  4.869 μs (301 allocations: 5.56 KiB)        # with explicit N==2 version of mapreduce_impl
  2.273 μs (1 allocation: 896 bytes)          # with ith_all --  this PR

julia> @btime mapreduce(/, +, $(rand(100,100)), $(rand(100,100)), dims=2); # dims=2 uses a CartesianIndices path
  5.910 μs (7 allocations: 79.16 KiB)

julia> @btime mapreduce(/, +, $(rand(100,100)), $(rand(100,100)), dims=2, init=0.0);
  2.755 μs (5 allocations: 976 bytes)       # with explicit if N==2 ...
  2.750 μs (5 allocations: 976 bytes)       # with ith_all --  this PR

Times, scalar reduction:

julia> @btime reduce(+, map(/, $(rand(100,100)), $(rand(100,100))));  # behaviour on master
  5.736 μs (2 allocations: 78.20 KiB)

julia> @btime mapreduce(/, +, $(rand(100,100)), $(rand(100,100)));
  9.291 μs (0 allocations: 0 bytes)    # with zip --  this PR, or #39053
  5.340 μs (1 allocation: 16 bytes)    # with _mapreduce, ultimately mapreduce_impl, removed for now

julia> @btime mapreduce(/, +, $(rand(100,100)), $(rand(100,100)); init=0.0);
  2.713 μs (9 allocations: 368 bytes)  # with only(..., dims=all), ultimately mapreduce_impl
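
The last timing relies on reducing over every dimension and then unwrapping with only. Paraphrased with public API (so, unlike the PR, the intermediate array from map is not avoided here):

A, B = rand(100, 100), rand(100, 100)
only(reduce(+, map(/, A, B); dims=(1, 2), init=0.0))  # same value as mapreduce(/, +, A, B; init=0.0), up to summation order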

Times on the example from #38558, on my computer:

julia> summary(x)
"10240-element Vector{Float64}"

julia> @btime reduce(+, map(==, $x, $y));  # behaviour of f(x,y) on master
  5.423 μs (1 allocation: 10.19 KiB)

julia> @btime f($x,$y);  # mapreduce, turned into zip by this PR (or #39053)
  5.028 μs (0 allocations: 0 bytes)

julia> @btime f2($x,$y);  # explicit loop
  1.975 μs (0 allocations: 0 bytes)

julia> @btime f3($x,$y);  # using MappedArrays
  2.042 μs (0 allocations: 0 bytes)

julia> @btime f4($x,$y);  # with zip
  5.028 μs (0 allocations: 0 bytes)

julia> @btime mapreduce(==, +, $x, $y, init=0);  # with init, for this PR
  2.528 μs (6 allocations: 288 bytes)
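
For context, here are plausible definitions of the helpers benchmarked above; these are my guesses reconstructed from the comments, and #38558 may define them differently (f3 assumes the MappedArrays.jl package is installed):

using MappedArrays: mappedarray                     # only needed for f3

f(x, y) = mapreduce(==, +, x, y)                    # the mapreduce this PR turns into zip

function f2(x, y)                                   # explicit loop
    s = 0
    @inbounds @simd for i in eachindex(x, y)
        s += x[i] == y[i]
    end
    return s
end

f3(x, y) = sum(mappedarray(==, x, y))               # lazy wrapper from MappedArrays.jl
f4(x, y) = sum(a == b for (a, b) in zip(x, y))      # generator over zip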

I haven't checked tests closely, nor thought about whether more are needed.

Comment on lines +362 to +401
_mapreduce_dim_vararg(f, op, init, As::Tuple, dims) =
_mapreduce_dim_vararg(f, op, init, As, dims, IteratorSize(Generator(f, As...)))
@mcabbott (Contributor, Author) commented:
The only purpose of Generator(f, As...) is for the traits HasShape etc. Perhaps there's a better way.

Some weird test cases:

mapreduce(/,+,[1 2 3; 4 5 6],[7,8,9,10],dims=2,init=0.0)
mapreduce(/,+,[1 2 3; 4 5 6],[7,8,9,10],dims=1,init=0.0)
mapreduce(/,+,[1 2 3; 4 5 6],rand(2,3,1),dims=1,init=0.0)
mapreduce(/,+,rand(2,3),rand(2,3,1),dims=2,init=0.0)
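
For reference, the trait lookup mentioned above; the printed value reflects my understanding of current Base behaviour, not output produced with this PR:

julia> Base.IteratorSize(Base.Generator(/, rand(2, 3), rand(2, 3)))
Base.HasShape{2}()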

@mcabbott (Contributor, Author) commented:
A bolder possibility would be to declare those to be errors. This would fit with restricting map(f, A, B) to size(A) == size(B), instead of this zip behaviour. For tuples, this is still how map acts; in the discussion at #42216, most people seem to prefer that to the zip behaviour.
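
The behaviours being contrasted, with abbreviated output (these reflect current behaviour as I understand it, not this PR):

julia> map(+, [1, 2, 3], [10, 20])   # arrays: zip-like, silently truncates
2-element Vector{Int64}:
 11
 22

julia> [1, 2, 3] .+ [10, 20]         # broadcast: mismatched sizes are an error
ERROR: DimensionMismatch: ...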

Comment on lines +349 to +388
_mapreduce_dim_vararg(f, op, init, As::Tuple, ::Colon) =
only(_mapreduce_dim_vararg(f, op, init, As, 1:maximum(ndims, As)))
@mcabbott (Contributor, Author) commented:
What I've called _mapreduce_dim_vararg could simply be methods of _mapreduce_dim. It would be tidiest to re-order its arguments from _mapreduce_dim(f, op, init, A, dims) to _mapreduce_dim(f, op, init, dims, A, Bs...). Would this be acceptable?

According to https://juliahub.com/ui/Search?q=_mapreduce_dim&type=code there are 3 packages which overload _mapreduce_dim, so perhaps it's not a disaster to change it.
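
A toy illustration of the proposed argument order, with a hypothetical name rather than the PR's code: once dims comes before the arrays, a single Vararg method covers any number of arrays.

demo_mapreduce_dim(f, op, init, ::Colon, A, Bs...) =
    mapfoldl(Base.splat(f), op, zip(A, Bs...); init = init)

demo_mapreduce_dim(/, +, 0.0, :, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0])  # 1/4 + 2/5 + 3/6 = 1.15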

Comment on lines +259 to +270
# vararg version
@noinline function mapreduce_impl(f::F, op, A::AbstractArrayOrBroadcasted,
ifirst::Integer, ilast::Integer, blksize::Int,
Bs::Vararg{AbstractArrayOrBroadcasted,N}) where {F,N}
Xs = (A, Bs...)
if ifirst == ilast
@inbounds x1 = ith_all(ifirst, Xs)
@mcabbott (Contributor, Author) commented:
Could this be combined with the original version? It's a bit of an ugly signature, mapreduce_impl(f, op, A, i, j, b, Bs...); I'm not sure how hard it would be to change it so that all the arrays come last.
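
For reference, ith_all is essentially the recursive tuple-indexing helper already used by map! in Base; roughly (paraphrased, so treat the exact definition as an approximation):

ith_all_sketch(i, ::Tuple{}) = ()
Base.@propagate_inbounds ith_all_sketch(i, as::Tuple) =
    (as[1][i], ith_all_sketch(i, Base.tail(as))...)

ith_all_sketch(3, ([10, 20, 30], [1.0, 2.0, 3.0]))  # (30, 3.0)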

@dkarrasch added the performance (Must go faster) label on May 29, 2021
@oscardssmith (Member) commented:
is this ready to merge?

@mcabbott (Contributor, Author) commented:
I believe it works, but it's quite a thicket of functions, and it's possible that someone who knows it better may prefer a different approach.

For instance, mapreduce(f, g, A, B, C) could go to reduce(g, broadcasted(f, A, B, C)), maybe that would allow more re-use, including the case when init is not given. IIRC complete reductions of lazy Broadcasted objects were made fast, but with dims they are still slow.

If we do want this approach, I highlighted places above where I wasn't sure this was the best choice of signature.
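
A sketch of the reduce(g, broadcasted(f, A, B, C)) alternative mentioned above, using the existing lazy-broadcast reduction path (its behaviour here is my understanding of current Base, not something this PR adds):

using Base.Broadcast: broadcasted, instantiate

A, B = rand(100, 100), rand(100, 100)
sum(instantiate(broadcasted(/, A, B)))            # complete reduction, no materialised intermediate
# sum(instantiate(broadcasted(/, A, B)); dims=1)  # the dims case, which is the part that is still slow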

@oscardssmith (Member) commented:
As I understand it, this isn't valid, since map doesn't do the same things as broadcasted.

@mcabbott (Contributor, Author) commented:
It might be easy to first check for cases that broadcasting allows and map does not (like extending dimensions). Cases where map stops early, like zip, would be harder, although it's not so clear to me whether those are intentional anyway; see #41001 (comment).

@N5N3 (Member) commented on Jan 10, 2022:
Fusing into a Broadcasted here seems like the better choice to me. But for performance reasons, I guess the final result would be a lightweight MappedArrays.jl in Base.
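
As a sketch of the kind of lightweight wrapper that suggests (hypothetical code, not part of Base or this PR):

struct MappedPair{T,N,F,A<:AbstractArray,B<:AbstractArray} <: AbstractArray{T,N}
    f::F
    a::A
    b::B
end
MappedPair(f, a::AbstractArray{<:Any,N}, b::AbstractArray{<:Any,N}) where {N} =
    MappedPair{typeof(f(first(a), first(b))), N, typeof(f), typeof(a), typeof(b)}(f, a, b)
Base.size(m::MappedPair) = size(m.a)
Base.@propagate_inbounds Base.getindex(m::MappedPair, i::Int...) = m.f(m.a[i...], m.b[i...])

# reductions then reuse the ordinary single-array machinery, with no copy
x, y = rand(10_240), rand(10_240)
sum(MappedPair(==, x, y))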
