-
Notifications
You must be signed in to change notification settings - Fork 672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
⚡️ A Cheaper Sqrt Function #212
⚡️ A Cheaper Sqrt Function #212
Conversation
@@ -141,7 +141,7 @@ FixedPointMathLibTest:testMulWadDownEdgeCases() (gas: 886) | |||
FixedPointMathLibTest:testMulWadUp() (gas: 959) | |||
FixedPointMathLibTest:testMulWadUpEdgeCases() (gas: 1002) | |||
FixedPointMathLibTest:testRPow() (gas: 2142) | |||
FixedPointMathLibTest:testSqrt() (gas: 2537) | |||
FixedPointMathLibTest:testSqrt() (gas: 2156) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amazing
z := shr(12, mul(z, add(y, 65536))) // A multiply by 3 is saved from the initial z := 3 | ||
// The estimate sqrt(x) = (181/1024) * (x+1) is off by a factor of ~2.63 both when x=1 | ||
// and when x = 256 or 1/256. In the worst case, this needs seven Babylonian iterations. | ||
z := shr(18, mul(z, add(y, 65536))) // A multiply is saved from the initial z := 181 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there overflow risk here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, since y is guaranteed to be smaller than 2^136 after the first branch above. I've made this clearer in a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wonderful
// The estimate sqrt(x) = (181/1024) * (x+1) is off by a factor of ~2.83 both when x=1 | ||
// and when x = 256 or 1/256. In the worst case, this needs seven Babylonian iterations. | ||
// There is no overflow risk here since y < 2^136 after the first branch above. | ||
z := shr(18, mul(z, add(y, 65536))) // A multiply is saved from the initial z := 181 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forgive me for the stupid questions but where do 18 and 65536 come from? where's the 1024
thats mentioned in the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make this clearer in a comment too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome ty!
src/utils/FixedPointMathLib.sol
Outdated
// Correctness can be checked exhaustively for x < 256, so we assume y >= 256. | ||
// Then z*sqrt(y) is within sqrt(257)/sqrt(256) of sqrt(x), or about 20bps. | ||
|
||
// The estimate sqrt(x) = (181/1024) * (x+1) is off by a factor of ~2.83 both when x=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
am i missing something or is it only off by ~0.646 at 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and a lot more at 256
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so e.g. at x=1, the estimate is 0.3535, which is ~1/2.83 of the correct value, i.e. off by a multiplicative factor of 2.83
I'll make this a bit clearer in a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahhhh gotcha my bad not familiar with that terminology
z := shr(18, mul(z, add(y, 65536))) // A multiply is saved from the initial z := 181 | ||
|
||
// Run the Babylonian method seven times. This should be enough given initial estimate. | ||
// Possibly with a quadratic/cubic polynomial above we could get 4-6. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whats the reason not to do this haha? esp if we could keep the iterations and remove more parts of the binary search (the jumps for the ifs are probably the most expensive part here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I think removing even one more binary search branch is a bit tricky - that would increase the range we need to estimate sqrt on from [1/256, 256) to [1/65536, 65536).
In that range, the best linear estimate is then off by a factor of ~12 or so, and would need 10 Babylonian iterations
If a quadratic estimate was within a factor of ~5.5 across the whole range, 8 Babylonian iterations would be enough, and maybe that's worth it?
I don't know a good way to find the best quadratic/cubic polynomials, so I'm not sure what their error bounds would look like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yeahh that bounds increase sounds like a PITA, this is already great can explore that at another time hehe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried one fewer binary branches, and the linear estimate 45/1024 with max error factor in [1/65536, 65536) of smaller than 11.4, which I think is just enough for 9 Babylonian iterations to suffice, but it looks like it uses slightly more gas unfortunately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah interesting, thanks for investigating!
amazing work thank you so much @Gaussian-Process, will get this merged asap! |
Description
This modifies the existing sqrt implementation to remove the final three iterations from the binary search and add a linear estimate given the intermediate value. The estimate should be within a factor of ~3 of the true sqrt, which is still enough that 7 Babylonian method iterations suffice.
Checklist
Ensure you completed all of the steps below before submitting your pull request:
forge snapshot
?npm run lint
?forge test
?Pull requests with an incomplete checklist will be thrown out.