-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removes the crate's dependency on BigInt
, BigUint
#767
Conversation
This PR limits the range of integers that can be read and written to those that can be represented by the `i128` type. This reduces the size of `Int`, `UInt`, `Decimal`, and `Timestamp`, `RawValueRef`, and `ValueRef`. It also allows those types to implement `Copy`. It also allows us to remove several custom methods to compare across mixed `i64`/`BigInt` and `u64`/`BigUint` representations for equality and ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ PR Tour
# We need at least 1.65 for GATs[1] and 1.67 for `ilog`[2] | ||
# [1] https://blog.rust-lang.org/2022/11/03/Rust-1.65.0.html | ||
# [2] https://blog.rust-lang.org/2023/01/26/Rust-1.67.0.html#stabilized-apis | ||
rust-version = "1.67" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Small bump to the minimum Rust version to enable us to use the stabilized ilog10
method for counting decimal digits in integers.
@@ -62,7 +63,6 @@ chrono = { version = "0.4", default-features = false, features = ["clock", "std" | |||
delegate = "0.12.0" | |||
thiserror = "1.0" | |||
nom = "7.1.1" | |||
num-bigint = "0.4.3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ 👋
@@ -99,7 +99,7 @@ pub fn criterion_benchmark(c: &mut Criterion) { | |||
b.iter(|| { | |||
let mut encoded_length: usize = 0; | |||
for value in &unsigned_values { | |||
encoded_length += black_box(FlexUInt::write_u64(&mut output, *value).unwrap()); | |||
encoded_length += black_box(FlexUInt::write(&mut output, *value).unwrap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ The FlexUInt::write
method can now accept any int size. I ran the encoding_primitives
benchmark and confirmed that this is the same or faster than the u64
-specific method in all cases.
src/binary/decimal.rs
Outdated
magnitude: UInt { | ||
data: UIntData::U64(0), | ||
}, | ||
magnitude: UInt::ZERO, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ I added ZERO
and NEGATIVE_ZERO
const
values to the Decimal
and Coefficient
types for easier reference.
src/binary/uint.rs
Outdated
// TODO: copy from the slice and use from_be_bytes | ||
let mut magnitude: u128 = 0; | ||
for &byte in uint_bytes { | ||
let byte = u128::from(byte); | ||
magnitude <<= 8; | ||
magnitude |= byte; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ I moved this from the (now gone) BigUInt::from_bytes_be
method. As the TODO
notes, it could still be optimized, but that's for another PR.
fn write_repr_decimal(&mut self, value: Option<Decimal>) -> IonResult<()>; | ||
fn write_repr_timestamp(&mut self, value: Option<Timestamp>) -> IonResult<()>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Several small changes to ion-hash based on this update.
self.push_byte(0xE3); // Biased FlexUInt follows | ||
FlexUInt::encode_u64(self.encoding_buffer, symbol_id as u64 - 65_792) | ||
} | ||
let _ = FlexUInt::write(self.encoding_buffer, symbol_id - 65_792).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This is encoding to a Vec
, so it can't fail.
), | ||
-40, | ||
), | ||
Decimal::new(Int::from(27182818284590452353602874713526624i128), -40), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ I switched this test value out for a marginally shorter one that fits in an i128
.
/// Extract the integer and fractional parts and the exponents of this as a `(BigInt, (BigInt, i64))` | ||
/// | ||
/// ```ignore | ||
/// # use num_bigint::BigInt; | ||
/// # use ion_rs::Decimal; | ||
/// let d1: Decimal = Decimal::new(123456789, -2); | ||
/// let (int_part, (frac_part, exponent)) = d1.into_parts(); | ||
/// assert_eq!(int_part, BigInt::from(1234567u64)); | ||
/// assert_eq!(frac_part, BigInt::from(89)); | ||
/// assert_eq!(exponent, -2); | ||
/// ``` | ||
pub(crate) fn into_parts(self) -> (BigInt, (BigInt, i64)) { | ||
let magnitude: BigInt = self.coefficient.try_into().unwrap(); | ||
if self.exponent.is_zero() { | ||
(magnitude, (BigInt::zero(), 0)) | ||
} else if self.exponent.is_negative() { | ||
let divisor = BigInt::from(10u64).pow((-self.exponent) as u32); | ||
let (i, f) = magnitude.div_rem(&divisor); | ||
(i, (f, self.exponent)) | ||
} else { | ||
let multiplicand = BigInt::from(10u64).pow(self.exponent as u32); | ||
(magnitude * multiplicand, (BigInt::zero(), 0)) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This and related methods existed solely to enable conversion of decimal components to/from BigInt
.
} | ||
|
||
#[test] | ||
fn test_difference_by_parts_lossy() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Removed tests for the also-removed difference_by_parts_lossy
& co.
src/binary/decimal.rs
Outdated
|
||
#[test] | ||
fn oversized_decimal() { | ||
let mut buf = vec![]; | ||
// Even though our output stream is a growable Vec, the encoder uses a | ||
// fixed-size, stack-allocated buffer to encode the body of the decimal. | ||
// If it's asked to encode a decimal that won't fit (>32 bytes), it should | ||
// return an error. | ||
let precise_pi = Decimal::new( | ||
BigUint::from_str("31415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679").unwrap(), | ||
-99 | ||
); | ||
let result = buf.encode_decimal_value(&precise_pi); | ||
assert!(result.is_err()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to keep this test and assert that it is an error? Do we elsewhere have a test that verifies the new encoding error behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sent me down a useful rabbit hole.
Because I had removed the constructor that accepted a BigUint
coefficient, it became statically impossible to trigger this case. However, because the Coefficient
type was a (UInt, Sign)
tuple, it was possible for it to store magnitudes outside the i128
range but inside the u128
range.
In my latest commits I've modified Coefficient
to an (Int, Sign)
pair where the Int
's sign information always agrees with Sign
except when the coefficient is a negative zero. This is essentially the same logic as before but with a constructor that's now capped at Int::MAX
. This had the nice side effect of also eliminating the need for constructors that take an explicit Sign
parameter because that would only ever be necessary to construct a -0
. There are now Coefficient::new(some_int_value)
and Coefficient::negative_zero()
constructors.
All of that too say, it's now statically impossible to programmatically construct a Decimal
that's too big.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great simplification. Particularly for the Decimal and Timestamp implementations. Love it.
I'm not concerned for the potential limiting of decimals and timestamps, but for Ion Integers I don't have a clear picture of what a users options are (could be none) when they have a value stream with an "overlarge" integer. I'm good with an answer like "they can access a vec of the two's complement bytes from this API", and am likely fine with they can't. But I'm not familiar enough with ion-rust to see it from this PR. Thanks.
For the moment, there's not a (stable) option. An upcoming release will expose the Longer-term, I've been thinking about augmenting the |
Co-authored-by: Joshua Barr <[email protected]>
This PR limits the range of integers that can be read and written to those that can be represented by the
i128
type.This reduces the size of
Int
,UInt
,Decimal
,Timestamp
,RawValueRef
, andValueRef
. It also allows those types to implementCopy
.It also allows us to remove several custom methods to compare across mixed
i64
/BigInt
andu64
/BigUint
representations for equality and ordering.Here's the benchmark output following this change:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.