Evaluate parallelization #1
What's up with the jumps in running time in the parallel case? E.g., the running time does not increase at all between 2^5 and 2^6.
The speedup seems great! As Marcin says, there are some steps with unexpected jumps in running time. A possible explanation is that some degree of parallelisation only kicks in once a runtime threshold is reached (just guessing). Another thing to take into account: when benchmarking code that does as little work as the first few entries of the table, it is best to run that code many times (say …). Two questions:
The measurements come from a single round; I assumed these results already show a clear speedup for large input instances. I can run multiple rounds to get more statistically significant results, though.
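A minimal multi-round timing sketch, assuming plain `std::time::Instant` rather than a benchmarking crate; `median_time` and `rounds` are hypothetical names, not part of this PR. Taking the median over many runs damps the one-off noise that makes single-round timings of small inputs jump around.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `rounds` times and returns the median wall-clock duration
/// (the upper median when `rounds` is even).
/// `rounds` is a hypothetical knob: more rounds give a more stable estimate.
fn median_time<F: FnMut()>(rounds: usize, mut f: F) -> Duration {
    assert!(rounds > 0, "need at least one round");
    let mut samples: Vec<Duration> = (0..rounds)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    // Duration implements Ord, so the samples can be sorted directly.
    samples.sort();
    samples[rounds / 2]
}
```

For example, `median_time(101, || { /* call the function under test */ })` reports the median of 101 runs. Wrapping inputs and results in `std::hint::black_box` helps keep the compiler from optimizing the measured work away.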
@Antonio95 you are right, it definitely makes sense to zip the arrays for readability.
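The code under discussion is not shown in this thread, so this is a hypothetical illustration, assuming the two arrays are the coordinates `r` and `x` of an equality-polynomial evaluation over a toy prime field (`P = 998_244_353` is an arbitrary stand-in for the real field type). Zipping makes the pairing of coordinates explicit instead of relying on a shared index:

```rust
/// Toy prime modulus standing in for the real field (hypothetical choice).
const P: u64 = 998_244_353;

/// One factor of eq(r, x): r*x + (1 - r)*(1 - x) mod P, for r, x < P.
fn eq_term(r: u64, x: u64) -> u64 {
    let one_minus = |a: u64| (1 + P - a) % P;
    (r * x % P + one_minus(r) * one_minus(x) % P) % P
}

/// Indexed version: works, but the reader must check both slices use the same `i`.
fn eq_eval_indexed(r: &[u64], x: &[u64]) -> u64 {
    assert_eq!(r.len(), x.len());
    let mut acc = 1;
    for i in 0..r.len() {
        acc = acc * eq_term(r[i], x[i]) % P;
    }
    acc
}

/// Zipped version: pairs each coordinate of `r` with the matching one of `x`.
fn eq_eval_zipped(r: &[u64], x: &[u64]) -> u64 {
    assert_eq!(r.len(), x.len());
    r.iter().zip(x).fold(1, |acc, (&ri, &xi)| acc * eq_term(ri, xi) % P)
}
```

Both functions compute the same product; the zipped form also maps directly onto rayon's parallel `zip`, which keeps a later switch to `into_par_iter` small.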
Branch force-pushed from 8cfc291 to ae3ed75.
New benchmarks with respect to the original implementation, computed using
I wonder if there are any tricks that would allow you to use the non-parallel version for inputs smaller than 2^7.
Also make sure that the polynomial definition used here is the non-redundant one (due to microsoft#9).
The obsolete poly definition is in
Cool. You can remove the obsolete definition in this PR or in a follow-up; up to you.
Yeah, looks good. Could you repost the updated benches?
Alright, I guess it does make sense to do it here.
Sorry, I didn't mean to re-request review. The benchmarks did not change much; I'm running them again, though.
Unfortunately, any trick would require a hard-coded check on the array length, falling back to the sequential version when the array is too small. This is a known problem with parallelizing small instances (rayon-rs/rayon#648).
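A minimal sketch of that hard-coded size check, assuming a toy prime field (`P = 998_244_353`) in place of the real field type. To stay dependency-free it uses `std::thread::scope` as a stand-in for rayon's `into_par_iter`; `PAR_THRESHOLD = 1 << 7` is a hypothetical cutoff echoing the 2^7 size mentioned above, not a measured value, and all function names are illustrative.

```rust
use std::thread;

/// Toy prime modulus standing in for the real field (hypothetical choice).
const P: u64 = 998_244_353;
/// Hypothetical cutoff: below this length, thread overhead outweighs the work.
const PAR_THRESHOLD: usize = 1 << 7;

/// One factor of eq(r, x): r*x + (1 - r)*(1 - x) mod P, for r, x < P.
fn eq_term(r: u64, x: u64) -> u64 {
    let one_minus = |a: u64| (1 + P - a) % P;
    (r * x % P + one_minus(r) * one_minus(x) % P) % P
}

/// Sequential product of all factors.
fn eq_eval_seq(r: &[u64], x: &[u64]) -> u64 {
    r.iter().zip(x).fold(1, |acc, (&ri, &xi)| acc * eq_term(ri, xi) % P)
}

/// Parallel product: each scoped thread multiplies one chunk,
/// then the partial products are multiplied together.
fn eq_eval_par(r: &[u64], x: &[u64], threads: usize) -> u64 {
    assert_eq!(r.len(), x.len());
    let threads = threads.max(1);
    // `chunks(0)` would panic, so keep the chunk size at least 1.
    let chunk = ((r.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = r
            .chunks(chunk)
            .zip(x.chunks(chunk))
            .map(|(rc, xc)| s.spawn(move || eq_eval_seq(rc, xc)))
            .collect();
        handles
            .into_iter()
            .fold(1, |acc, h| acc * h.join().expect("worker panicked") % P)
    })
}

/// Dispatches on input length: sequential for small inputs, parallel otherwise.
fn eq_eval(r: &[u64], x: &[u64]) -> u64 {
    assert_eq!(r.len(), x.len());
    if r.len() < PAR_THRESHOLD {
        eq_eval_seq(r, x)
    } else {
        let threads = thread::available_parallelism().map_or(1, |n| n.get());
        eq_eval_par(r, x, threads)
    }
}
```

Splitting into chunks is valid because multiplication mod `P` is associative and commutative, so per-chunk partial products can be combined in any order. The cost is exactly the drawback raised here: the cutoff is a magic number that may need retuning per machine.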
Right, so it's as we all thought. Thanks for pointing to a source for this!
New benches:
Top.
Then we can open a PR directly against upstream, I think.
Fix tabulation
Add missing import
Fix random `Fp` slice creation
Fix formatting
Branch force-pushed from c3b8b3e to c87e6ae.
Branch force-pushed from c87e6ae to 3ccd639.
The current implementation of `EqPolynomial::evaluate` is not parallelized. Using rayon's `into_par_iter` improves the function's performance. The following timing benchmarks show the performance of the function with and without parallelization:
On my machine, parallelizing the function yields a speedup of roughly 4x, which is already quite significant.
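Assuming `EqPolynomial::evaluate` computes the standard multilinear equality polynomial used in Spartan-style code (its definition is not shown in this thread), the function is a product of $\ell$ independent factors, and any split of that product into contiguous chunks gives the same result:

```math
\widetilde{eq}(x, r) = \prod_{i=1}^{\ell} \bigl( x_i r_i + (1 - x_i)(1 - r_i) \bigr)
= \underbrace{\prod_{i=1}^{k} \bigl( x_i r_i + (1 - x_i)(1 - r_i) \bigr)}_{\text{chunk } 1}
\cdot \underbrace{\prod_{i=k+1}^{\ell} \bigl( x_i r_i + (1 - x_i)(1 - r_i) \bigr)}_{\text{chunk } 2},
\qquad 0 \le k \le \ell
```

This is what lets a parallel iterator hand each chunk to a different thread and multiply the partial products at the end; for small $\ell$, the scheduling overhead can exceed the cost of the few multiplications being saved.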