
@batch slower than Threads.@threads #32

Open
ParadaCarleton opened this issue Jul 15, 2021 · 13 comments

@ParadaCarleton

I tried replacing Threads.@threads with @batch to see if it would improve performance on smaller workloads for a function. The change seems to introduce some kind of data race (not present with Threads.@threads), producing many NaN values and incorrect results, which were caught by the tests. The relevant loop is here. I'm not sure what the cause is; the for loop iterates over calculations for each data point, and each of these should be completely independent of the calculations for every other data point.
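A minimal sketch of the substitution being described (the function, names, and workload below are placeholder stand-ins, not the actual ParetoSmooth code):

    using Polyester  # provides @batch

    # Placeholder per-point calculation; every iteration is independent,
    # so the macro swap should not change the results.
    function transform!(results, data)
        # Previously: Threads.@threads for i in eachindex(data)
        @batch for i in eachindex(data)
            results[i] = exp(data[i])  # stand-in for the real per-point work
        end
        return results
    end

    data = randn(1000)
    transform!(similar(data), data)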

@chriselrod
Member

Can you provide a minimal example I can run?

@ParadaCarleton
Author

ParadaCarleton commented Jul 15, 2021

> Can you provide a minimal example I can run?

I'll try, but I expect it to be quite difficult since the tests are based on comparing the results of a fairly complex calculation to those produced by another program in R; I'd have to build tests for much smaller subsets of the calculations. Replacing Threads.@threads with @batch per=thread in the loop I linked generates the problem, but I understand that's not really "Minimal."

@chriselrod
Member

> Replacing Threads.@threads with @batch per=thread in the loop I linked generates the problem, but I understand that's not really "Minimal."

While not ideal, having code to call psis would help.

@chriselrod
Member

chriselrod commented Jul 15, 2021

What I'd do from there is store post_sample_size, r_eff, and weights and hopefully produce a minimal example from that.

    @inbounds Threads.@threads for i in eachindex(tail_length)
        tail_length[i] = def_tail_length(post_sample_size, r_eff[i])
        ξ[i] = @views ParetoSmooth.do_psis_i!(weights[i,:], tail_length[i])
    end

A trick for storing things is defining this in your REPL:

julia> const ARGS = Ref{Any}()

and then (using Revise.jl), edit your package to say

    Main.ARGS[] = copy.((post_sample_size, r_eff, weights))
    @inbounds Threads.@threads for i in eachindex(tail_length)
        tail_length[i] = def_tail_length(post_sample_size, r_eff[i])
        ξ[i] = @views ParetoSmooth.do_psis_i!(weights[i,:], tail_length[i])
    end

Now, calling the function will store those values into the ARGS defined in your REPL.
This is a neat trick for getting at internal data structures that may otherwise be hard to extract from deeply nested functions.
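One way to then get the captured values out of the REPL is the Serialization standard library; a sketch (the file name is arbitrary):

    julia> using Serialization

    julia> serialize("captured_args.jls", ARGS[])  # after calling the function once

    julia> post_sample_size, r_eff, weights = deserialize("captured_args.jls")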

@ParadaCarleton
Author

> > Replacing Threads.@threads with @batch per=thread in the loop I linked generates the problem, but I understand that's not really "Minimal."
>
> While not ideal, having code to call psis would help.

The tests in the test file should do what you need, in that case; the test folder contains an example log-likelihood array used for testing, as well as the saved results of the same calculations in R.

@ParadaCarleton
Author

> What I'd do from there is store post_sample_size, r_eff, and weights and hopefully produce a minimal example from that. [...]

I see; how do I get those to you after that?

@chriselrod
Member

chriselrod commented Jul 15, 2021

> I see; how do I get those to you after that?

If they're big, then there isn't really a convenient way. I'll look at the test suite.

@ParadaCarleton
Author

ParadaCarleton commented Jul 15, 2021

> > I see; how do I get those to you after that?
>
> If they're big, then there isn't really a convenient way.

Yeah, weights is pretty big (big enough to be inconvenient, at least; 32 x 1000).

@ParadaCarleton
Author

Something that might be worth noting: I tried to parallelize this with FLoops.jl as well, and it complained about weights being a boxed variable, which might create data races. I don't know what that means, but it might be useful to you. I figured this was a problem specific to FLoops rather than an inherent feature of the code that would produce data races, since the tests all passed when using Threads.@threads, indicating near-equality between the R and Julia outputs.
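For context, a minimal illustration of what a "boxed variable" means in Julia (illustrative code, not from this thread): a variable that a closure captures and that is also reassigned gets wrapped in a mutable Core.Box, and concurrent tasks sharing that box can race.

    # `x` is captured by the closure and reassigned afterwards, so Julia
    # wraps it in a mutable Core.Box; all reads and writes go through the box.
    function boxed_example()
        x = 1
        f = () -> x   # the closure captures x
        x = 2         # reassignment after capture forces x into a Core.Box
        return f()    # returns 2, read through the shared box
    end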

@chriselrod
Member

chriselrod commented Jul 15, 2021

I don't think that'd be a problem.

The first thing I'll look at is whether Polyester/StrideArraysCore has a problem with views, either with the macro or with StrideArraysCore's custom view handling.

It'll be at least a couple hours before I can look at this.
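A sketch of the kind of minimal check this suggests, comparing @batch over row views against a serial reduction (the workload and names are illustrative stand-ins, not code from the thread):

    using Polyester

    function rowsums_batch(A)
        out = zeros(size(A, 1))
        @batch for i in axes(A, 1)
            out[i] = sum(@view A[i, :])  # each iteration reads one row view
        end
        return out
    end

    A = randn(32, 1000)
    @assert rowsums_batch(A) ≈ vec(sum(A, dims = 2))  # must match the serial answer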

@chriselrod
Member

Just as an update, I fixed a bug in StrideArraysCore so that it now gets the correct answer, but @threads is several times faster than @batch for me. I still need to look into why.
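A sketch of how such a timing comparison might look with BenchmarkTools.jl, mirroring the view-based stand-in above rather than the actual psis workload:

    using BenchmarkTools, Polyester

    function rowsums_threads!(out, A)
        Threads.@threads for i in axes(A, 1)
            out[i] = sum(@view A[i, :])
        end
    end

    function rowsums_batch!(out, A)
        @batch for i in axes(A, 1)
            out[i] = sum(@view A[i, :])
        end
    end

    A = randn(32, 1000)
    out = zeros(32)
    @btime rowsums_threads!($out, $A)
    @btime rowsums_batch!($out, $A)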

@ParadaCarleton
Author

> Just as an update, I fixed a bug in StrideArraysCore so that it now gets the correct answer, but @threads is several times faster than @batch for me. I still need to look into why.

Cool, thanks! That's surprising -- I would have expected it to be, at worst, the same speed.

chriselrod changed the title from "@batch results in incorrect calculations (While Threads.@threads doesn't)" to "@batch slower than Threads.@threads" on Aug 28, 2021
@chriselrod
Member

I changed the name of the issue to reflect the updated status.
