Support applying plans by column? (Or is it really 300x slower??) #216
Internally, FFTW doesn't actually have an in-place DCT algorithm IIRC, so it allocates a temporary buffer each time you execute the plan, in order to perform the transform out-of-place and copy back the result. With the matrix plan it does this only once, but with your column-by-column approach it happens on every call. Of course, you could use an out-of-place plan and allocate the buffers yourself.
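A minimal sketch of that last suggestion (my own illustration, not code from this thread): build an out-of-place plan and preallocate the output buffer yourself, so repeated executions don't allocate at all.

```julia
using FFTW, LinearAlgebra

n = 10
# plan_r2r (without the `!`) gives an out-of-place plan
F = FFTW.plan_r2r(Vector{Float64}(undef, n), FFTW.REDFT00)

x = randn(n)
y = similar(x)   # preallocated output buffer, reused across calls
mul!(y, F, x)    # writes the transform of x into y; no temporary buffer needed
```

Since the plan is out-of-place, `mul!(y, F, x)` never has to copy results back into `x`, which is exactly what the hidden temporary in the in-place case was for.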
TFW you've been using in-place transforms to be "non-allocating" when you should have been using out-of-place transforms 🤦♂️ Can confirm it's a more reasonable 3x slowdown (due to better multithreading??)
julia> n = 10; F = FFTW.plan_r2r(Vector{Float64}(undef,n), FFTW.REDFT00, 1);
julia> function plmul!(Y, F, X)
@inbounds for j in 1:size(X,2)
mul!(view(Y,:,j), F, view(X,:,j))
end
Y
end
plmul! (generic function with 1 method)
julia> Y = randn(n,1000); X = randn(n,1000); @btime plmul!(Y, F, copy(X));
277.529 μs (2 allocations: 78.20 KiB)
Any thoughts on whether …
It's not threading by default. Probably the slowdown is because FFTW's in-place matrix plan only allocates a buffer big enough to hold a single column, and re-uses it for all of the columns?
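For comparison, a matrix plan transforms every column in a single call, amortizing any internal buffering over the whole array. A self-contained sketch (my own, mirroring the setup used earlier in the thread):

```julia
using FFTW

n, m = 10, 1000
X = randn(n, m)

# Plan the REDFT00 transform along dimension 1 (i.e. down each column)
Fmat = FFTW.plan_r2r(Matrix{Float64}(undef, n, m), FFTW.REDFT00, 1)

Y = Fmat * X   # one call transforms all 1000 columns
```

Each column of `Y` matches what a vector plan would produce column-by-column, but FFTW gets to schedule the whole batch at once.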
What I was thinking is not actually possible in practice since the memory alignment may be off: julia> mul!(view(randn(5,2),:,2), FFTW.plan_r2r(Vector{Float64}(undef,5), FFTW.REDFT10), view(randn(5,2),:,2))
ERROR: ArgumentError: FFTW plan applied to array with wrong memory alignment
Stacktrace:
[1] assert_applicable(p::FFTW.r2rFFTWPlan{Float64, (5,), false, 1, UnitRange{Int64}}, X::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
@ FFTW ~/.julia/packages/FFTW/pHa9y/src/fft.jl:430
[2] assert_applicable(p::FFTW.r2rFFTWPlan{Float64, (5,), false, 1, UnitRange{Int64}}, X::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, Y::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
@ FFTW ~/.julia/packages/FFTW/pHa9y/src/fft.jl:435
[3] mul!(y::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, p::FFTW.r2rFFTWPlan{Float64, (5,), false, 1, UnitRange{Int64}}, x::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
@ FFTW ~/.julia/packages/FFTW/pHa9y/src/fft.jl:876
[4] top-level scope
@ REPL[13]:1
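One possible way around that alignment restriction (an assumption worth verifying, since it gives up FFTW's SIMD-alignment optimizations) is to create the plan with the FFTW.UNALIGNED flag, which should make it applicable to views at arbitrary offsets:

```julia
using FFTW, LinearAlgebra

n = 5
# UNALIGNED tells FFTW not to assume any particular memory alignment,
# at some potential cost in transform speed
F = FFTW.plan_r2r(Vector{Float64}(undef, n), FFTW.REDFT10; flags=FFTW.UNALIGNED)

A = randn(n, 2)
B = similar(A)
mul!(view(B, :, 2), F, view(A, :, 2))  # column views no longer trip the alignment check
```

The column views are still contiguous (stride 1), so they match the vector plan's layout; only their base-address alignment differs, which the UNALIGNED flag is meant to tolerate.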
I was surprised to learn two things:
NaN (possibly being special-cased):
So originally I was going to propose adding support for applying a vector plan to a matrix, but now I'm worried it's too slow. Though the scale of the slowness seems unbelievable, so perhaps I've just done something wrong.