
@batch slower than Threads.@threads #32

Open
ParadaCarleton opened this issue Jul 15, 2021 · 13 comments

@ParadaCarleton

I tried replacing Threads.@threads with @batch to see if it would improve performance on smaller workloads for a function. The change seems to introduce some kind of data race (not present with Threads.@threads), producing many NaN values and incorrect results, which were caught by the tests. The relevant loop is here. I'm not sure what the cause is; the for loop iterates over calculations for each data point, and each of these should be completely independent of the calculations for every other data point.
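A minimal sketch of the substitution being described (the function, names, and workload below are placeholder stand-ins, not the actual ParetoSmooth code):

    using Polyester  # provides @batch

    # Placeholder per-point calculation; every iteration is independent,
    # so the macro swap should not change the results.
    function transform!(results, data)
        # Previously: Threads.@threads for i in eachindex(data)
        @batch for i in eachindex(data)
            results[i] = exp(data[i])  # stand-in for the real per-point work
        end
        return results
    end

    data = randn(1000)
    transform!(similar(data), data)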

@chriselrod
Member

Can you provide a minimal example I can run?

@ParadaCarleton
Author

ParadaCarleton commented Jul 15, 2021

> Can you provide a minimal example I can run?

I'll try, but I expect it to be quite difficult since the tests are based on comparing the results of a fairly complex calculation to those produced by another program in R; I'd have to build tests for much smaller subsets of the calculations. Replacing Threads.@threads with @batch per=thread in the loop I linked generates the problem, but I understand that's not really "Minimal."

@chriselrod
Member

> Replacing Threads.@threads with @batch per=thread in the loop I linked generates the problem, but I understand that's not really "Minimal."

While not ideal, having code to call psis would help.

@chriselrod
Member

chriselrod commented Jul 15, 2021

What I'd do from there is store post_sample_size, r_eff, and weights and hopefully produce a minimal example from that.

    @inbounds Threads.@threads for i in eachindex(tail_length)
        tail_length[i] = def_tail_length(post_sample_size, r_eff[i])
        ξ[i] = @views ParetoSmooth.do_psis_i!(weights[i,:], tail_length[i])
    end

A trick for storing things is defining this in your REPL:

julia> const ARGS = Ref{Any}()

and then (using Revise.jl), edit your package to say

    Main.ARGS[] = copy.((post_sample_size, r_eff, weights))
    @inbounds Threads.@threads for i in eachindex(tail_length)
        tail_length[i] = def_tail_length(post_sample_size, r_eff[i])
        ξ[i] = @views ParetoSmooth.do_psis_i!(weights[i,:], tail_length[i])
    end

Now, calling the function will store those values into the ARGS defined in your REPL.
This is a neat trick for getting at internal data structures that may otherwise be hard to extract from deeply nested functions.
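One way to then get the captured values out of the REPL is the Serialization standard library; a sketch (the file name is arbitrary):

    julia> using Serialization

    julia> serialize("captured_args.jls", ARGS[])  # after calling the function once

    julia> post_sample_size, r_eff, weights = deserialize("captured_args.jls")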

@ParadaCarleton
Author

> > Replacing Threads.@threads with @batch per=thread in the loop I linked generates the problem, but I understand that's not really "Minimal."
>
> While not ideal, having code to call psis would help.

The tests in the test file should do what you need, in that case; the test folder contains an example log-likelihood array used for testing, as well as the saved results of the same calculations in R.

@ParadaCarleton
Author

> What I'd do from there is store post_sample_size, r_eff, and weights and hopefully produce a minimal example from that. [...]

I see; how do I get those to you after that?

@chriselrod
Member

chriselrod commented Jul 15, 2021

> I see; how do I get those to you after that?

If they're big, then there isn't really a convenient way. I'll look at the test suite.

@ParadaCarleton
Author

ParadaCarleton commented Jul 15, 2021

> > I see; how do I get those to you after that?
>
> If they're big, then there isn't really a convenient way.

Yeah, weights is pretty big (big enough to be inconvenient, at least; 32 x 1000).

@ParadaCarleton
Author

Something that might be worth noting: I tried to parallelize this with FLoops.jl as well, and it complained about weights being a boxed variable, which might create data races. I don't know what that means, but it might be useful to you. I figured this was a problem specific to FLoops rather than an inherent feature of the code that would produce data races, since the tests all passed when using Threads.@threads, indicating near-equality between the R and Julia outputs.
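For context, a minimal illustration of what a "boxed variable" means in Julia (illustrative code, not from this thread): a variable that a closure captures and that is also reassigned gets wrapped in a mutable Core.Box, and concurrent tasks sharing that box can race.

    # `x` is captured by the closure and reassigned afterwards, so Julia
    # wraps it in a mutable Core.Box; all reads and writes go through the box.
    function boxed_example()
        x = 1
        f = () -> x   # the closure captures x
        x = 2         # reassignment after capture forces x into a Core.Box
        return f()    # returns 2, read through the shared box
    end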

@chriselrod
Member

chriselrod commented Jul 15, 2021

I don't think that'd be a problem.

The first thing I'll look at is whether Polyester/StrideArraysCore has a problem with views, either with the macro or with StrideArraysCore's custom view handling.

It'll be at least a couple hours before I can look at this.
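A sketch of the kind of minimal check this suggests, comparing @batch over row views against a serial reduction (the workload and names are illustrative stand-ins, not code from the thread):

    using Polyester

    function rowsums_batch(A)
        out = zeros(size(A, 1))
        @batch for i in axes(A, 1)
            out[i] = sum(@view A[i, :])  # each iteration reads one row view
        end
        return out
    end

    A = randn(32, 1000)
    @assert rowsums_batch(A) ≈ vec(sum(A, dims = 2))  # must match the serial answer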

@chriselrod
Member

Just as an update, I fixed a bug in StrideArraysCore so that it now gets the correct answer, but @threads is several times faster than @batch for me. I still need to look into why.
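A sketch of how such a timing comparison might look with BenchmarkTools.jl, mirroring the view-based stand-in above rather than the actual psis workload:

    using BenchmarkTools, Polyester

    function rowsums_threads!(out, A)
        Threads.@threads for i in axes(A, 1)
            out[i] = sum(@view A[i, :])
        end
    end

    function rowsums_batch!(out, A)
        @batch for i in axes(A, 1)
            out[i] = sum(@view A[i, :])
        end
    end

    A = randn(32, 1000)
    out = zeros(32)
    @btime rowsums_threads!($out, $A)
    @btime rowsums_batch!($out, $A)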

@ParadaCarleton
Author

> Just as an update, I fixed a bug in StrideArraysCore so that it now gets the correct answer, but @threads is several times faster than @batch for me. I still need to look into why.

Cool, thanks! That's surprising -- I would have expected it to be, at worst, the same speed.

chriselrod changed the title from "@batch results in incorrect calculations (While Threads.@threads doesn't)" to "@batch slower than Threads.@threads" on Aug 28, 2021
@chriselrod
Member

I changed the name of the issue to reflect the updated status.
