Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

⚡️ A Cheaper Sqrt Function #212

Merged
merged 7 commits into from
May 12, 2022

Conversation

Gaussian-Process
Copy link

Description

This modifies the existing sqrt implementation to remove the final three iterations from the binary search and add a linear estimate given the intermediate value. The estimate should be within a factor of ~3 of the true sqrt, which is still enough that 7 Babylonian method iterations suffice.

Checklist

Ensure you completed all of the steps below before submitting your pull request:

  • Ran forge snapshot?
  • Ran npm run lint?
  • Ran forge test?

Pull requests with an incomplete checklist will be thrown out.

@@ -141,7 +141,7 @@ FixedPointMathLibTest:testMulWadDownEdgeCases() (gas: 886)
FixedPointMathLibTest:testMulWadUp() (gas: 959)
FixedPointMathLibTest:testMulWadUpEdgeCases() (gas: 1002)
FixedPointMathLibTest:testRPow() (gas: 2142)
FixedPointMathLibTest:testSqrt() (gas: 2537)
FixedPointMathLibTest:testSqrt() (gas: 2156)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amazing

z := shr(12, mul(z, add(y, 65536))) // A multiply by 3 is saved from the initial z := 3
// The estimate sqrt(x) = (181/1024) * (x+1) is off by a factor of ~2.63 both when x=1
// and when x = 256 or 1/256. In the worst case, this needs seven Babylonian iterations.
z := shr(18, mul(z, add(y, 65536))) // A multiply is saved from the initial z := 181
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there overflow risk here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, since y is guaranteed to be smaller than 2^136 after the first branch above. I've made this clearer in a comment

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonderful

// The estimate sqrt(x) = (181/1024) * (x+1) is off by a factor of ~2.83 both when x=1
// and when x = 256 or 1/256. In the worst case, this needs seven Babylonian iterations.
// There is no overflow risk here since y < 2^136 after the first branch above.
z := shr(18, mul(z, add(y, 65536))) // A multiply is saved from the initial z := 181
Copy link
Owner

@transmissions11 transmissions11 Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgive me for the stupid questions but where do 18 and 65536 come from? where's the 1024 thats mentioned in the comment?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make this clearer in a comment too

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome ty!

// Correctness can be checked exhaustively for x < 256, so we assume y >= 256.
// Then z*sqrt(y) is within sqrt(257)/sqrt(256) of sqrt(x), or about 20bps.

// The estimate sqrt(x) = (181/1024) * (x+1) is off by a factor of ~2.83 both when x=1
Copy link
Owner

@transmissions11 transmissions11 Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am i missing something or is it only off by ~0.646 at 1?

https://www.desmos.com/calculator/k63qkvmg4m

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and a lot more at 256

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so e.g. at x=1, the estimate is 0.3535, which is ~1/2.83 of the correct value, i.e. off by a multiplicative factor of 2.83
I'll make this a bit clearer in a comment

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhhh gotcha my bad not familiar with that terminology

z := shr(18, mul(z, add(y, 65536))) // A multiply is saved from the initial z := 181

// Run the Babylonian method seven times. This should be enough given initial estimate.
// Possibly with a quadratic/cubic polynomial above we could get 4-6.
Copy link
Owner

@transmissions11 transmissions11 Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats the reason not to do this haha? esp if we could keep the iterations and remove more parts of the binary search (the jumps for the ifs are probably the most expensive part here)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I think removing even one more binary search branch is a bit tricky - that would increase the range we need to estimate sqrt on from [1/256, 256) to [1/65536, 65536).

In that range, the best linear estimate is then off by a factor of ~12 or so, and would need 10 Babylonian iterations
If a quadratic estimate was within a factor of ~5.5 across the whole range, 8 Babylonian iterations would be enough, and maybe that's worth it?

I don't know a good way to find the best quadratic/cubic polynomials, so I'm not sure what their error bounds would look like

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeahh that bounds increase sounds like a PITA, this is already great can explore that at another time hehe

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried one fewer binary branches, and the linear estimate 45/1024 with max error factor in [1/65536, 65536) of smaller than 11.4, which I think is just enough for 9 Babylonian iterations to suffice, but it looks like it uses slightly more gas unfortunately

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah interesting, thanks for investigating!

@transmissions11
Copy link
Owner

amazing work thank you so much @Gaussian-Process, will get this merged asap!

@transmissions11 transmissions11 changed the base branch from main to v7 May 12, 2022 22:05
@transmissions11 transmissions11 changed the title A Cheaper Sqrt Function ⚡️ A Cheaper Sqrt Function May 12, 2022
@transmissions11 transmissions11 merged commit e1677c9 into transmissions11:v7 May 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants