Issue 68: Refactor existing modular arithmetic code #80
Conversation
Looks great otherwise! I'll just do a quick `cargo bench u64` once I'm able to build, to double-check for perf changes as you suggested, though it seems like there shouldn't be any.
LGTM! I ran the u64 benches a few times on your branch & master, and saw some 5-10% swings in both directions. I don't think performance was actually affected; most likely my laptop is just being inconsistent.
So after all that, I am now somewhat doubtful that rustc optimises const generic code properly. I finished implementing Montgomery squaring last night and saw that, with const generics, the code was almost twice as slow as Montgomery multiplication, whereas when I simply replaced the const generic parameter with the number 4 (for the Tweedle* curves) in the same code, it took only about 60% of the time of Montgomery multiplication, which is much more in line with expectations (and a pretty great result in itself!). I'll have to investigate a bit further to see whether other people have had performance problems with const generics; I assumed they would work much like non-type template parameters in C++, but there's no way we should be seeing such huge performance differences if that were the case.

Our options might be (i) return to the previous situation with explicit specialisations to limb lengths 4 and 6, or (ii) use macros instead. Let's discuss further at the next catch-up.
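For concreteness, here is a minimal sketch of the two shapes being compared; the function names and the trivial body are made up for illustration, not the actual squaring code:

```rust
// Illustrative only: the same loop written with a const generic bound
// and with a hard-coded limb count. In principle both monomorphise to
// identical machine code; the measurements above suggest that, for the
// squaring code at least, they do not.
fn sum_limbs_generic<const N: usize>(a: &[u64; N]) -> u64 {
    let mut acc = 0u64;
    for i in 0..N {
        acc = acc.wrapping_add(a[i]);
    }
    acc
}

fn sum_limbs_fixed(a: &[u64; 4]) -> u64 {
    let mut acc = 0u64;
    for i in 0..4 {
        acc = acc.wrapping_add(a[i]);
    }
    acc
}
```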
Interesting! Sorry I didn't think of this in advance, but I think the issue may be with loop unrolling. Maybe we could build a proc macro, something like `#[unroll_const_generics(max = 6)]`, that would replace

```rust
for i in 0..n {
    body(i);
}
```

with

```rust
if 0 < n { body(0); }
if 1 < n { body(1); }
if 2 < n { body(2); }
if 3 < n { body(3); }
if 4 < n { body(4); }
if 5 < n { body(5); }
// Fallback in case a loop has more than 6 iterations.
for i in 6..n {
    body(i);
}
```

Then I think we should end up with the unrolled code we want, after the compiler eliminates the dead code and the always-true conditionals. Admittedly it would be a huge hack, and it might slow down compilation (though probably not that badly for small-ish values of `max`).
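Until such a macro exists, the expansion can be written out by hand. As a sketch of what the generated code might look like, here it is applied to a made-up toy function that doubles each limb in place (this compiles today with const generics):

```rust
// Hand-written version of the expansion sketched above. The const-folded
// `i < N` guards are what we would rely on the optimiser to eliminate.
fn double_in_place<const N: usize>(limbs: &mut [u64; N]) {
    if 0 < N { limbs[0] = limbs[0].wrapping_shl(1); }
    if 1 < N { limbs[1] = limbs[1].wrapping_shl(1); }
    if 2 < N { limbs[2] = limbs[2].wrapping_shl(1); }
    if 3 < N { limbs[3] = limbs[3].wrapping_shl(1); }
    if 4 < N { limbs[4] = limbs[4].wrapping_shl(1); }
    if 5 < N { limbs[5] = limbs[5].wrapping_shl(1); }
    // Fallback in case a loop has more than 6 iterations.
    for i in 6..N {
        limbs[i] = limbs[i].wrapping_shl(1);
    }
}
```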
We actually started with macros, but eventually got rid of them here. As I recall, we ran into a couple of little annoyances. One was that some of our macros would produce dead code given certain parameters; e.g. in subtraction, we had a sometimes-empty for loop to check limbs which only existed in one operand. (Of course we could have written a separate macro for subtraction with different limb counts, or suppressed the warnings.) Another little thing was that we ended up with lists of macro invocations which were slightly annoying to maintain. Nothing that major though, so we could definitely consider returning to that approach.

Perhaps another option to consider would be just removing the BLS curve, so that we only need to deal with ~256-bit fields. There's a chance that we might go back to BLS (Aztec and Matter Labs are going in that direction), but I think it's relatively low.
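For reference, a rough reconstruction of what that macro-based approach tends to look like; the macro and function names here are made up, not the old code:

```rust
// Illustrative reconstruction: one macro defining the function body,
// invoked once per supported limb count.
macro_rules! impl_add_assign {
    ($name:ident, $n:expr) => {
        fn $name(a: &mut [u64; $n], b: &[u64; $n]) {
            let mut carry = false;
            for i in 0..$n {
                let (sum, c1) = a[i].overflowing_add(b[i]);
                let (sum, c2) = sum.overflowing_add(carry as u64);
                a[i] = sum;
                carry = c1 | c2;
            }
        }
    };
}

// The per-width invocation lists mentioned above:
impl_add_assign!(add_assign_4, 4);
impl_add_assign!(add_assign_6, 6);
```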
I think this seems OK to merge. I was a bit confused since our main method (…)
I investigated a bit more on this topic. It seems rustc does unroll loops with const generic parameters to some extent, but for some reason my squaring code does not pass the compiler's internal threshold for code that's suitable to unroll. I'm fairly confident that the simple functions like … Indeed, the …
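If the unrolling heuristics can't be coaxed, one possible middle ground (a sketch only, with hypothetical helper names and placeholder bodies, not a worked implementation) is to keep the const-generic entry point but dispatch the hot limb counts to fixed-size paths, so the compiler sees a literal bound where it matters:

```rust
use std::convert::TryInto;

// `square_generic` stands in for the const-generic fallback and
// `square_fixed_4` for a hand-tuned 4-limb path; both bodies are
// placeholders so the sketch compiles.
fn square_generic<const N: usize>(a: &[u64; N]) -> [u64; N] {
    *a
}

fn square_fixed_4(a: &[u64; 4]) -> [u64; 4] {
    *a
}

fn square<const N: usize>(a: &[u64; N]) -> [u64; N] {
    if N == 4 {
        // The check on N makes these conversions infallible.
        let fixed: &[u64; 4] = a.as_slice().try_into().unwrap();
        let mut out = [0u64; N];
        out.copy_from_slice(&square_fixed_4(fixed));
        return out;
    }
    square_generic(a)
}
```

Since `N` is a constant at monomorphisation time, the `if N == 4` branch should be resolved by the compiler, so the dispatch itself costs nothing at runtime.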
This PR addresses the easy parts of #68. Almost all of the functions that were specialised to 4 or 6 limbs have been replaced with a single version that takes a const generic parameter `N` that can be any positive integer. There are a lot of changes, but they are all very repetitive and straightforward, I think.

I say "almost all" of the functions because it became clear that more significant restructuring will be necessary to refactor all the repeated `montgomery_multiplication` implementations. Specifically, the dependence on the representation is more explicit in that function (though it is of course present in the other arithmetic functions too). I have a solution in mind for tidying this all up that I will implement eventually (it's already partly done), but at this stage I think it's a better idea for me to do some actual performance work on the arithmetic. :)

Speaking of performance, it's important that we don't introduce any performance regressions with these refactoring PRs. I couldn't detect any performance issues with this PR on my setup, but the reviewer should double-check this on their setup!
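To give reviewers a feel for the shape of the change, here is a hedged illustration with a made-up function, not the crate's actual API:

```rust
// Before (illustrative): one copy of each arithmetic function per limb count.
fn is_zero_4(a: &[u64; 4]) -> bool {
    a.iter().all(|&limb| limb == 0)
}
fn is_zero_6(a: &[u64; 6]) -> bool {
    a.iter().all(|&limb| limb == 0)
}

// After (illustrative): a single const-generic version covering every width.
fn is_zero<const N: usize>(a: &[u64; N]) -> bool {
    a.iter().all(|&limb| limb == 0)
}
```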
(The commit log incorporates all the commits from my previous PR, because this one was built on top of it; the messages can be ignored, and it will merge fine. Sorry for the noise; I'll rebase properly next time.)