Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: msm spawn more go routines if scalar contain small values for first chunk #76

Merged
merged 5 commits into from
Sep 21, 2021

Conversation

gbotrel
Copy link
Collaborator

@gbotrel gbotrel commented Sep 20, 2021

In few instances, inputs to the msm (MultiExp) may have a significant number of 0 and 1.

For large instances, the cost of filtering these values is high since we must allocate large area of memory and copy.

Zeroes are not too costly, and in practice we don't seem to suffer from high number of branch misprediction. Each go routine processing a chunk of bit is going to test the digit it gets, if it's zero, it does nothing (and all the other go routines running at the same time do nothing too, since all the digits are 0).

However, "1" values means the first go routine (processing the c lower bits of the scalars) is going to do more work that the other go routines (who only hit "0" digits).

To workaround since (avoiding memory copy); we count the number of scalar where only the c-lowest bits are set. If this represent more than 10% of the total msm instance, we spawn 2 go routines (and process half of the scalars in each) for the msm.

This is likely not an optimal solution and we may iterate to improve complexity for all cases since:

  • we now allocate 1 extra set of buckets
  • the way we determine c, the size of the bit-window could change if we took into account these small values and the zeroes.

However, in practice, it allows the msm instance in such cases to finish faster, since most of the go routines will finish at the same time.

@gbotrel gbotrel merged commit bf7349f into develop Sep 21, 2021
@gbotrel gbotrel deleted the msm-ones branch September 21, 2021 17:26
Copy link
Collaborator

@yelhousni yelhousni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the choice of C is not theoretically optimal but in practice this leads to nice speedups especially when there are many 1 scalars.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants