-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String conversion from decimal (#108) #122
Conversation
Sorry, I missed that this PR existed. It is still marked wip, but I guess it's more or less done? |
@holiman no problem, thanks for taking the time. yeah, it's marked as WIP because I am not exactly sure if this is what everyone prefers (in terms of the sql stuff), and performance isn't super optimized. if you're fine with that for now, I can mark as done and finalize. otherwise let me know if there is anything you would like me to change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking ok. I think we should also add a fuzzer which compares the SetString
in Int
against big.Int
SetString
.
conversion.go
Outdated
// Value implements the database/sql/driver Valuer interface. | ||
// It encodes a string, because that is what postgres uses for its numeric type | ||
func (src Int) Value() (driver.Value, error) { | ||
return string(src.ToBig().String()) + "e0", nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why string(..)
on a string
? And why the concatenation of e0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no clue why i cast a string to string, will fix lol.
The e0 is to ensure that a large number is properly passed and understood as as "numeric/decimal". postgres wasn't happy when i fed it very large base 10 integer strings without the e0. postgres is able to downcast integers with exponential notation that are below size int64 to bigint transparently, so this felt like the most inclusive solution.
https://www.postgresql.org/docs/current/sql-syntax-lexical.html section 4.1.2.6. Numeric Constants
that said, I don't use any SQL-style DB's that aren't postgresql compatible. Not sure how MariaDB/Spanner will treat this
i've added a fuzz test for base10 using go's built in fuzzer. It runs when go test runs. It checks that
I've also changed SetString to closer match the big.Int interface, and changed the errors returned by FromBase10 let me know if you want me to change / add / fix anything! |
conversion.go
Outdated
if b == nil { | ||
return true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to get a verdict, I checked what big
does:
func TestFoobar(t *testing.T) {
new(big.Int).Set(nil)
}
==>
--- FAIL: TestFoobar (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
So, this is just defensive programming, let's remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool. just removed.
fix scan
Update conversion.go
Sorry for being so slow on this, and thanks for keeping it up! Regarding the API, with this PR, it becomes
And a few more. The odd one out is
Also, isn't |
I pushed a minor change to make the API more coherent |
base10.go
Outdated
iv := 19 | ||
c := 0 | ||
if len(bs) >= (iv * 4) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a bit of description what happens inside this method -- what algorithm you're following?
Doesn't have to be a lot, but a brief description.
Also, what does iv
mean? I associate the term with cryptographic input vector, but this is probably something else?
Also, is the unrolled loop really worth it? If there's no tangible gain, I'd prefer to have it in a shorter form
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol. i think iv originally meant "index value" or something stupid like that.
I've cleaned up the function and added some comments - let me know if it's still unclear
base10_test.go
Outdated
} | ||
// if its too large, ignore it also | ||
if val.Cmp(max256.ToBig()) > 0 { | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If >256, we should have a non-nil err
. Should check that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking pretty good, I might do some finishing touches before merge
base10.go
Outdated
// however, the last number will always be below 19 characters, so i=0 is dealt with as special case | ||
for i := 4; i >= 1; i-- { | ||
// check if the length of the string is larger than cutLength * i | ||
if len(bs) >= (cutLength * i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverse the check, so you can unindent the clause
if len(bs) < (cutLength * i){
continue
}
Although, I wonder if it wouldn't make sense to start a better index. Something like this (might be off-by-one):
for i := len(bs) / 19; ...
Also, cutLength is a constant. It's ok to have it as a var
, but I don't know, we could just also inline 19
. I don't know which I prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch on the loop, my bad! switched it to use continue and start at len(bs)/cutLength
I don't think it's off by one since if it's below 19, we would want it to be 0, and ignore the loop altogether, while otherwise, we are looking for the floor. if it's 19, it will perfectly match, and the stragglers case will get short circuited
cutLength is only used in this function - to me it makes more sense to leave it inside the function, unless other things start reusing that number 19.
Co-authored-by: Martin Holst Swende <[email protected]>
Co-authored-by: Martin Holst Swende <[email protected]>
i was thinking about this a little bit. @holiman what do you feel about parsing strings with "-" prefix as the two's complement? (so remove the -, parse the number, then Neg it) it would be convenient, and i don't feel it is out of scope for the package. (it already supports sdiv, smod, neg, abs, etc) of course, there is no way to tell if a string should be marshaled into negative or not, so i think it should always marshal to string as positive, but being able to parse negative string input doesn't seem like a super awful idea. |
I'm not keen on it. Also, I think encoders/decoders should come in pairs, and as you say, "there is no way to tell if a string should be marshaled into negative or not, so i think it should always marshal to string as positive" -- I take that as a sign that it's not a good fit. Also, I saw that
So if we merge this PR, then we can never implement that interface, because they are mutually exclusive.... :/ But I think that's ok, just wanted to highlight it |
The coverage is now back at 100%, and I changed the circle-ci fuzzer to use golang native fuzzing. TODO there is to make our old fuzzer (fuzzTernaryData etc) compatible with the native format, so it's run on circle. And then to make them all work with oss-fuzz, and we can scrap the dvyakov fuzzers. But that's not for this PR. @chfast want to take a look before we merge this? |
|
good points - totally agree. I think where negatives are needed - application level can deal with it. (i went through our codebase, it was only 3 places of, so not a big deal) |
if splt[1] == "0" { | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1e0
should be 1
, not 0
... ?
Not sure what 0e0
is defined as, need to check that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't that what this will return? assume X = splt[0], Y = splt[1]
I think XeY woudl be X * 10 ^ Y
so 0e0
should be 0 * 10^0 = 0
In the previous lines, i do dst.SetFromDecimal(X)
then if Y == 0
, then i should just return, since X*10^0=X
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, yes you're right
conversion.go
Outdated
exp.Exp(NewInt(10), exp) | ||
dst.Mul(dst, exp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Exp
we have no overflow checks,
// Exp sets z = base**exponent mod 2**256, and returns z.
For Mul
, we do have MulOverflow
, so we could check that and return error if it does not fit. But it feels a bit wonky to just silently fail the conversion (because of the Exp
) without signalling an error.
IMO, it would have been ok if the failure-mode had been:
- The result is
X mod 2^256
, whereX
is the correct result.
But I think the possible failures are different .. (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you are right here, they will be different.
i think the popular postgresql scanner will error anyways if the value is too large to be scanned in, so I think that it is fine to do the same.
I am not really fond of the whole scientific notation-parsing in |
Thanks for the hard work! |
hi, i was in need of two things
i saw that issue #108 was a thing, and so I created a fromBase10 and incomplete SetString(s string, base int) (err error), since it can't do binary (0b prefix)
not really sure where to post this, but i thought a PR would be good at least so you could see the code changes. i don't really know what interface you are targeting, maybe setstring should return a bool to match big.Int?
big.Int's setString has an alloc in it, so I thought i would get rid of it since i saw the library was promising 0 alloc. i did a super stupid and naive implementation for base 10 parsing, since it was something i needed in another project i was working on, and I just needed something that worked. i've been using this fork and it's been working fine (at least for now)
if you are curious, the benchmarks are here. so it's better than std but it's still not great... i'll be working on improving it, will probably require a rethink lol
base16 is the existing hex implementation.