avoid intermediate map allocations in multi-arg mapreduce #55301

Draft: wants to merge 2 commits into base: master

@mbauman (Member) commented Jul 29, 2024

Now that the mapreduce infrastructure (mostly) supports broadcasted objects, we can reduce over a lazy `Broadcasted` on array-likes instead of materializing an intermediate `map`. This exercises the internals more thoroughly, identifies a few more places where we need to use `::AbstractArrayOrBroadcasted`, and requires a bugfix for offset axes.
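A minimal sketch of the difference, using only Base's `Broadcast.broadcasted` and `Broadcast.instantiate`. The helper names `eager_mapreduce` and `broadcast_mapreduce` are hypothetical, and this is not the code in this PR. The eager form materializes `map(f, A, B)` before reducing; the lazy form reduces over a `Broadcasted` object that computes each `f(a, b)` only when the reduction asks for it:

```julia
using Base.Broadcast: broadcasted, instantiate

# Eager (hypothetical helper): `map` allocates an intermediate array first.
eager_mapreduce(f, op, A, B) = reduce(op, map(f, A, B))

# Lazy (hypothetical helper): `broadcasted` builds an unevaluated Broadcasted,
# `instantiate` computes and checks its axes, and `reduce` evaluates f(a, b)
# element by element without storing the mapped values in an array.
function broadcast_mapreduce(f, op, A, B)
    bc = instantiate(broadcasted(f, A, B))
    return reduce(op, bc)
end

x = [1, 2, 3, 4]
y = [4, 3, 2, 1]
eager_mapreduce(*, +, x, y)      # 4 + 6 + 6 + 4 == 20
broadcast_mapreduce(*, +, x, y)  # also 20
```

The two agree when `A` and `B` share the same axes; when the axes differ, `map` and broadcasting follow different rules.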

Fixes #38558

local benchmarks
julia> using BenchmarkTools

julia> function f(x, y)
           return @inbounds mapreduce(==,+,x, y)
       end
f (generic function with 1 method)

julia> function f2(x, y)
           total = 0
           @inbounds for i in 1:length(x)
               total += x[i] == y[i]
           end
           return total
       end
f2 (generic function with 1 method)

julia> x = randn(10240); y = similar(x);

julia> @btime f2(x,y)
  1.988 μs (0 allocations: 0 bytes)
0

julia> @btime f(x,y)
  2.023 μs (0 allocations: 0 bytes)
0

Alternative to: #41001 (cc @mcabbott)

Fixes #53417

More benchmarking from #53417
julia> x = randn((512,512));

julia> y = randn((512, 512));

julia> g(x) = x^2
g (generic function with 1 method)

julia> @time mapreduce(g,+,x)
  0.031427 seconds (90.73 k allocations: 4.607 MiB, 24.14% gc time, 99.23% compilation time)
262233.95257231314

julia> @time mapreduce(g,+,x)
  0.000064 seconds (1 allocation: 16 bytes)
262233.95257231314

julia> f(x,y) = x * y
f (generic function with 1 method)

julia> mapreduce(f, +, x, y);

julia> @time mapreduce(f, +, x, y);
  0.000746 seconds (3 allocations: 80 bytes)

@mbauman added the fold (sum, maximum, reduce, foldl, etc.) label Jul 29, 2024
@mbauman requested review from nsajko and N5N3 July 29, 2024 17:02
@N5N3 (Member) left a comment

The change to mapreduce is a breaking one, as `map(f, c...)` has no length constraint when the inputs are 1-dimensional:

julia> mapreduce(+, +, 1:3, 1:2) == mapreduce(+, +, 1:2, 1:2)
true

The rest seems good, though; see also #41054.
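To make the incompatibility concrete: for 1-d inputs of different lengths, `map` (and therefore the current multi-argument `mapreduce`) truncates to the shortest input, whereas instantiating a lazy `Broadcasted` over the same inputs throws a `DimensionMismatch`. The vectors below are arbitrary examples:

```julia
using Base.Broadcast: broadcasted, instantiate

A = [1, 2, 3]
B = [10, 20]

# `map` stops when the shortest 1-d input is exhausted: A[3] is never used.
truncated = map(+, A, B)        # [11, 22]
total = mapreduce(+, +, A, B)   # 11 + 22 == 33

# Broadcasting requires compatible axes (lengths 3 and 2 are not), so
# instantiating a lazy Broadcasted over the same inputs throws instead.
broadcast_threw = try
    instantiate(broadcasted(+, A, B))
    false
catch err
    err isa DimensionMismatch || rethrow()
end
```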

Review thread on base/broadcast.jl (resolved)
@mbauman (Member Author) commented Jul 29, 2024

mapreduce(+, +, 1:3, 1:2) == mapreduce(+, +, 1:2, 1:2)

😭

Ooof, this is why we can't have nice things. Pretty wildly, this behavior doesn't look to be tested. I could've sworn we had an absurdly long megathread on this (mis)feature, but I can't find it at the moment. It is documented in the generic map...

For multiple collection arguments, apply f elementwise, and stop when any of them is exhausted.

But for the specific method I'm changing here, there's a slightly different spec:

When acting on multi-dimensional arrays of the same ndims, they must all have the same axes, and the answer will too.

I suppose the easy answer is just to fall back to the allocating `map` for the different-ndims case.
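A minimal sketch of that fallback as a standalone function. `fallback_mapreduce` is hypothetical, not Base's `mapreduce` or the code in this PR. Arguments whose `ndims` differ go through the allocating `map`. Arguments with equal `ndims` must have identical axes, because otherwise broadcasting would expand singleton dimensions where `map` would not. Everything else is reduced lazily. Throwing `DimensionMismatch` on an axes mismatch is an assumption of this sketch:

```julia
using Base.Broadcast: broadcasted, instantiate

# Hypothetical helper sketching the fallback idea; not Base's implementation.
function fallback_mapreduce(f, op, A::AbstractArray, Bs::AbstractArray...; kwargs...)
    # Different ndims: keep the existing (allocating) `map` semantics, which
    # iterate linearly and stop when the shortest argument is exhausted.
    if any(B -> ndims(B) != ndims(A), Bs)
        return reduce(op, map(f, A, Bs...); kwargs...)
    end
    # Same ndims: require identical axes so broadcasting cannot expand
    # singleton dimensions and change the result relative to `map`.
    all(B -> axes(B) == axes(A), Bs) ||
        throw(DimensionMismatch("all arguments must have the same axes"))
    # Lazy path: reduce over an instantiated Broadcasted, no intermediate array.
    return reduce(op, instantiate(broadcasted(f, A, Bs...)); kwargs...)
end

fallback_mapreduce(*, +, [1, 2, 3, 4], [4, 3, 2, 1])    # 20, lazy path
fallback_mapreduce(+, +, [1 2; 3 4], [10, 20, 30, 40])  # 110, falls back to map
```

With this sketch, equal-`ndims` vectors of different lengths throw, whereas the current `mapreduce(+, +, [1, 2, 3], [10, 20])` truncates and returns `33`.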

@mcabbott (Contributor) commented

I think I added the documentation at some point to match the reality... which, IIRC, was introduced basically by accident in 1.5 and went untested for ages. On 1.0, `map` gave an error instead:

julia> VERSION
v"1.0.5"

julia> map(+, 1:2, 1:3)
ERROR: DimensionMismatch("dimensions must match")

@mbauman (Member Author) commented Jul 29, 2024

Yeah, but the iterator version has done it "forever":

julia> VERSION
v"1.0.5"

julia> map(+, Iterators.drop(1:5, 1), Iterators.drop(1:5, 2))
3-element Array{Int64,1}:
 5
 7
 9

It's somewhat related to #46707, but I still can't find the discussion I was thinking of.

Commit: this now matches the documentation, but still is not *exactly* the previous behavior
@mbauman (Member Author) commented Jul 30, 2024

So this is still slightly more restrictive than what we had before. Let's see what Nanosoldier says:

@nanosoldier runtests()

@KristofferC (Member) commented

@nanosoldier runtests()

Diff excerpt:
-     reduce(op, map(f, A, B...); kw...)
+ function mapreduce(f, op, A::AbstractArrayOrBroadcasted, B::AbstractArrayOrBroadcasted...; kwargs...)
+     Adims = ndims(A)
+     if any(b->Adims != ndims(b), B)

Review comment (Member):

Suggested change
-     if any(b->Adims != ndims(b), B)
+     if Adims != 1 && any(b->Adims != ndims(b), B)

+         return reduce(op, map(f, A, B...); kwargs...)
+     end
+     Aax = axes(A)
+     all(b->Aax==axes(b), B) || throw(ArgumentError("all arguments must have the same axes"))
Review comment (Member):

Perhaps DimensionMismatch is better than ArgumentError here?
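One data point for this suggestion, checked with plain Base calls (`error_type` is a hypothetical helper for this illustration only): `map` over same-`ndims` arrays with different axes and instantiating a lazy `Broadcasted` over them both already raise `DimensionMismatch`, so raising the same type in the new axes check would be consistent with both:

```julia
using Base.Broadcast: broadcasted, instantiate

# Hypothetical helper: run `thunk` and return the type of any exception it throws.
function error_type(thunk)
    try
        thunk()
        return nothing
    catch err
        return typeof(err)
    end
end

A = ones(2, 2)
B = ones(2, 3)   # same ndims as A, different axes

map_error = error_type(() -> map(+, A, B))
broadcast_error = error_type(() -> instantiate(broadcasted(+, A, B)))
```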

@@ -87,7 +94,7 @@ end

  # initialization when computing minima and maxima requires a little care
  for (f1, f2, initval, typeextreme) in ((:min, :max, :Inf, :typemax), (:max, :min, :(-Inf), :typemin))
-     @eval function reducedim_init(f, op::typeof($f1), A::AbstractArray, region)
+     @eval function reducedim_init(f, op::typeof($f1), A::AbstractArrayOrBroadcasted, region)

Review comment (Member):

This change seems incomplete, as the `view` call on L106 has no `Broadcasted` support. We might need:

reduced_view(A::AbstractArray, ri) = view(A, ri...)
reduced_view(bc::Broadcasted, ri) = Broadcasted(bc.style, bc.f, bc.args, ri)
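The gap can be seen directly. The `reduced_view` below only mirrors the suggested `AbstractArray` method as a standalone illustration, not Base code: `view` works on arrays but has no method for a `Broadcasted`, so a code path that calls `view` on the reduced argument needs a separate `Broadcasted` method such as the one suggested above:

```julia
using Base.Broadcast: broadcasted, instantiate

# Mirrors the suggested AbstractArray method (illustration only).
reduced_view(A::AbstractArray, ri) = view(A, ri...)

A = [1 2 3; 4 5 6]
bc = instantiate(broadcasted(*, A, 10))   # lazy 10 .* A, never materialized

# For an array, this is a non-copying view of the first row.
first_row = reduced_view(A, (1:1, 1:3))

# `view` is only defined for arrays, so calling it on a Broadcasted
# raises a MethodError.
view_works_on_bc = try
    view(bc, 1:1, 1:3)
    true
catch err
    err isa MethodError ? false : rethrow()
end
```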

@nanosoldier (Collaborator) commented

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

@mbauman marked this pull request as draft July 31, 2024 22:09

@mbauman (Member Author) commented Jul 31, 2024

I'll put this on hold for now; #55318 will unblock the hard part here.

Labels: fold (sum, maximum, reduce, foldl, etc.)

Successfully merging this pull request may close these issues:
mapreduce with multiple arrays allocates
Mapreduce is 10x slower than loop.

5 participants