Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements reading/writing FlexInt, FlexUInt #690

Merged
merged 18 commits into from
Jan 10, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion .github/workflows/miri.yml
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,6 @@ jobs:
if: github.event_name == 'push' || github.event.pull_request.head.repo.full_name != 'amazon-ion/ion-rust'
strategy:
matrix:
# See https://github.com/amazon-ion/ion-rust/issues/353
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Issue #353 was fixed in #688, which changed the CI task back to windows-latest.

os: [ubuntu-latest, windows-latest, macos-latest]
# build and test for experimental features with miri
features: ['experimental']
Expand Down
4 changes: 0 additions & 4 deletions .github/workflows/rust.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,6 @@ jobs:
checks: write

steps:
- name: Remove MSys64 MingW64 Binaries
if: runner.os == 'Windows'
# remove this because there is a bad libclang.dll that confuses bindgen
run: Remove-Item -LiteralPath "C:\msys64\mingw64\bin" -Force -Recurse
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This was necessary for ion-c-sys, which was removed.

- name: Install Dependencies
if: runner.os == 'Windows'
run: choco install llvm -y
Expand Down
5 changes: 5 additions & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -85,11 +85,16 @@ walkdir = "2.3"
test-generator = "0.3"
memmap = "0.7.0"
criterion = "0.5.1"
rand = "0.8.5"

[[bench]]
name = "read_many_structs"
harness = false

[[bench]]
name = "encoding_primitives"
harness = false

[profile.release]
lto = true
codegen-units = 1
Expand Down
213 changes: 213 additions & 0 deletions benches/encoding_primitives.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,213 @@
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use rand::prelude::StdRng;
use rand::{distributions::Uniform, Rng, SeedableRng};
use std::io;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This file is a suite of benchmarks comparing the time needed to read or write 10k random integers using VarUInt, VarInt, FlexUInt, or FlexInt encoding.


use ion_rs::{FlexInt, FlexUInt, ImmutableBuffer, IonResult, VarInt, VarUInt};

// Rather than store a set of test values, we hardcode a seed value and generate the same set
// on each run.
const RNG_SEED: u64 = 1024;

// The number of values (signed or unsigned) that will be read or written in each benchmark.
const NUM_VALUES: usize = 10_000;

fn generate_unsigned_values(min: u64, max: u64) -> Vec<u64> {
let mut rng = StdRng::seed_from_u64(RNG_SEED);
let range = Uniform::new(min, max);

(0..NUM_VALUES).map(|_| rng.sample(range)).collect()
}

fn generate_signed_values(min: i64, max: i64) -> Vec<i64> {
let mut rng = StdRng::seed_from_u64(RNG_SEED);
let range = Uniform::new(min, max);

(0..NUM_VALUES).map(|_| rng.sample(range)).collect()
}

pub fn criterion_benchmark(c: &mut Criterion) {
println!("# Values: {NUM_VALUES}");

// TODO: For now, these benchmarks only write values that can be serialized in 8 bytes or fewer.
// This is because `VarUInt` has a bug[1] that causes it to encode very large u64s incorrectly.
// [1]: https://github.com/amazon-ion/ion-rust/issues/689
let unsigned_values = generate_unsigned_values(u64::MIN, (2 << 49) - 1);
let signed_values = generate_signed_values(-2 << 49, (2 << 49) - 1);

// Roundtrip all of the values as 1.1 encoding primitives as a correctness/sanity check.
// Save the encoded bytes of each value sequence; we'll check its length at the end of each
// benchmark as another sanity check. VarUInt/FlexUint and VarInt/FlexInt are the same size.
let encoded_var_uints = roundtrip_var_uint_test(&unsigned_values).unwrap();
let encoded_var_ints = roundtrip_var_int_test(&signed_values).unwrap();
let encoded_flex_uints = roundtrip_flex_uint_test(&unsigned_values).unwrap();
let encoded_flex_ints = roundtrip_flex_int_test(&signed_values).unwrap();

let mut binary_1_0_group = c.benchmark_group("binary 1.0");
binary_1_0_group.bench_function("write VarUInt", |b| {
// `io::sink()` is an implementation of `io::Write` that simply discards the provided bytes
// and declares success, analogous to `/dev/null`. This minimizes the I/O logic being
// measured in each benchmark.
let mut output = io::sink();
b.iter(|| {
let mut encoded_length: usize = 0;
for value in &unsigned_values {
encoded_length += black_box(VarUInt::write_u64(&mut output, *value).unwrap());
}
assert_eq!(encoded_length, encoded_flex_uints.len());
})
});
binary_1_0_group.bench_function("read VarUInt", |b| {
b.iter(|| {
let mut decoded_length: usize = 0;
let mut input = ImmutableBuffer::new(encoded_var_uints.as_slice());
for _ in 0..unsigned_values.len() {
let (var_uint, remaining) = input.read_var_uint().unwrap();
input = remaining;
decoded_length += var_uint.size_in_bytes();
}
assert_eq!(decoded_length, encoded_var_uints.len());
})
});
binary_1_0_group.bench_function("write VarInt", |b| {
let mut output = io::sink();
b.iter(|| {
let mut encoded_length: usize = 0;
for value in &signed_values {
encoded_length += black_box(VarInt::write_i64(&mut output, *value).unwrap());
}
assert_eq!(encoded_length, encoded_flex_ints.len());
})
});
binary_1_0_group.bench_function("read VarInt", |b| {
b.iter(|| {
let mut decoded_length: usize = 0;
let mut input = ImmutableBuffer::new(encoded_var_ints.as_slice());
for _ in 0..unsigned_values.len() {
let (var_int, remaining) = input.read_var_int().unwrap();
input = remaining;
decoded_length += var_int.size_in_bytes();
}
assert_eq!(decoded_length, encoded_var_ints.len());
})
});
binary_1_0_group.finish();

let mut binary_1_1_group = c.benchmark_group("binary 1.1");
binary_1_1_group.bench_function("write FlexUInt", |b| {
let mut output = io::sink();
b.iter(|| {
let mut encoded_length: usize = 0;
for value in &unsigned_values {
encoded_length += black_box(FlexUInt::write_u64(&mut output, *value).unwrap());
}
assert_eq!(encoded_length, encoded_flex_uints.len());
})
});
binary_1_1_group.bench_function("read FlexUInt", |b| {
b.iter(|| {
let mut decoded_length: usize = 0;
let mut input = ImmutableBuffer::new(encoded_flex_uints.as_slice());
for _ in 0..unsigned_values.len() {
let (flex_uint, remaining) = input.read_flex_uint().unwrap();
input = remaining;
decoded_length += flex_uint.size_in_bytes();
}
assert_eq!(decoded_length, encoded_flex_uints.len());
})
});
binary_1_1_group.bench_function("write FlexInt", |b| {
let mut output = io::sink();
b.iter(|| {
let mut encoded_length: usize = 0;
for value in &signed_values {
encoded_length += black_box(FlexInt::write_i64(&mut output, *value).unwrap());
}
assert_eq!(encoded_length, encoded_flex_ints.len());
})
});
binary_1_1_group.bench_function("read FlexInt", |b| {
b.iter(|| {
let mut decoded_length: usize = 0;
let mut input = ImmutableBuffer::new(encoded_flex_ints.as_slice());
for _ in 0..unsigned_values.len() {
let (flex_int, remaining) = input.read_flex_int().unwrap();
input = remaining;
decoded_length += flex_int.size_in_bytes();
}
assert_eq!(decoded_length, encoded_flex_ints.len());
})
});
binary_1_1_group.finish();
}

fn roundtrip_var_uint_test(unsigned_values: &[u64]) -> IonResult<Vec<u8>> {
println!("Roundtripping unsigned values as VarUInts to check for correctness.");
let mut encoded_values_buffer = Vec::new();
for value in unsigned_values {
VarUInt::write_u64(&mut encoded_values_buffer, *value)?;
}
let mut decoded_values = Vec::new();
let mut input = ImmutableBuffer::new(encoded_values_buffer.as_slice());
for _ in 0..unsigned_values.len() {
let (var_uint, remaining) = input.read_var_uint()?;
input = remaining;
decoded_values.push(var_uint.value() as u64);
}
assert_eq!(decoded_values.as_slice(), unsigned_values);
Ok(encoded_values_buffer)
}

fn roundtrip_var_int_test(signed_values: &[i64]) -> IonResult<Vec<u8>> {
println!("Roundtripping signed values as VarInts to check for correctness.");
let mut encoded_values_buffer = Vec::new();
for value in signed_values {
VarInt::write_i64(&mut encoded_values_buffer, *value)?;
}
let mut decoded_values = Vec::new();
let mut input = ImmutableBuffer::new(encoded_values_buffer.as_slice());
for _ in 0..signed_values.len() {
let (var_int, remaining) = input.read_var_int()?;
input = remaining;
decoded_values.push(var_int.value());
}
assert_eq!(decoded_values.as_slice(), signed_values);
Ok(encoded_values_buffer)
}

fn roundtrip_flex_uint_test(unsigned_values: &[u64]) -> IonResult<Vec<u8>> {
println!("Roundtripping unsigned values as FlexUInts to check for correctness.");
let mut encoded_values_buffer = Vec::new();
for value in unsigned_values {
FlexUInt::write_u64(&mut encoded_values_buffer, *value)?;
}
let mut decoded_values = Vec::new();
let mut input = ImmutableBuffer::new(encoded_values_buffer.as_slice());
for _ in 0..unsigned_values.len() {
let (flex_uint, remaining) = input.read_flex_uint()?;
input = remaining;
decoded_values.push(flex_uint.value());
}
assert_eq!(decoded_values.as_slice(), unsigned_values);
Ok(encoded_values_buffer)
}

fn roundtrip_flex_int_test(signed_values: &[i64]) -> IonResult<Vec<u8>> {
println!("Roundtripping signed values as FlexInts to check for correctness.");
let mut encoded_values_buffer = Vec::new();
for value in signed_values {
FlexInt::write_i64(&mut encoded_values_buffer, *value)?;
}
let mut decoded_values = Vec::new();
let mut input = ImmutableBuffer::new(encoded_values_buffer.as_slice());
for _ in 0..signed_values.len() {
let (flex_int, remaining) = input.read_flex_int()?;
input = remaining;
decoded_values.push(flex_int.value());
}
assert_eq!(decoded_values.as_slice(), signed_values);
Ok(encoded_values_buffer)
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
2 changes: 1 addition & 1 deletion src/binary/non_blocking/type_descriptor.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ use crate::IonType;
/// [Typed Value Formats](https://amazon-ion.github.io/ion-docs/docs/binary.html#typed-value-formats)
/// section of the binary Ion spec.
#[derive(Copy, Clone, Debug, PartialEq)]
pub(crate) struct TypeDescriptor {
pub struct TypeDescriptor {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This PR makes a couple of types pub while leaving their module pub(crate) or even private. A later change in this diff re-exports these pub types if the experimental-lazy-reader feature is enabled, allowing the benchmarks to use some of these implementation details.

pub ion_type_code: IonTypeCode,
pub ion_type: Option<IonType>,
pub length_code: u8,
Expand Down
1 change: 0 additions & 1 deletion src/element/builders.rs
Original file line number Diff line number Diff line change
Expand Up @@ -372,7 +372,6 @@ macro_rules! ion_seq {
}

use crate::{List, SExp};
pub use {ion_list, ion_sexp, ion_struct};

#[cfg(test)]
mod tests {
Expand Down
2 changes: 0 additions & 2 deletions src/element/writer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,7 @@
use crate::result::IonResult;

use crate::ion_writer::IonWriter;
pub use crate::Format::*;
use crate::IonType;
pub use crate::TextKind::*;
use crate::{Element, Value};

/// Serializes [`Element`] instances into some kind of output sink.
Expand Down
21 changes: 19 additions & 2 deletions src/lazy/binary/immutable_buffer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@ use crate::binary::var_int::VarInt;
use crate::binary::var_uint::VarUInt;
use crate::lazy::binary::encoded_value::EncodedValue;
use crate::lazy::binary::raw::value::LazyRawBinaryValue;
use crate::lazy::encoder::binary::v1_1::flex_int::FlexInt;
use crate::lazy::encoder::binary::v1_1::flex_uint::FlexUInt;
use crate::result::IonFailure;
use crate::types::UInt;
use crate::{Int, IonError, IonResult, IonType};
Expand All @@ -33,7 +35,7 @@ const MAX_INT_SIZE_IN_BYTES: usize = 2048;
///
/// Methods that `peek` at the input stream do not return a copy of the buffer.
#[derive(PartialEq, Clone, Copy)]
pub(crate) struct ImmutableBuffer<'a> {
pub struct ImmutableBuffer<'a> {
// `data` is a slice of remaining data in the larger input stream.
// `offset` is the position in the overall input stream where that slice begins.
//
Expand Down Expand Up @@ -131,7 +133,7 @@ impl<'a> ImmutableBuffer<'a> {

/// Reads the first byte in the buffer and returns it as a [TypeDescriptor].
#[inline]
pub fn peek_type_descriptor(&self) -> IonResult<TypeDescriptor> {
pub(crate) fn peek_type_descriptor(&self) -> IonResult<TypeDescriptor> {
if self.is_empty() {
return IonResult::incomplete("a type descriptor", self.offset());
}
Expand All @@ -157,6 +159,21 @@ impl<'a> ImmutableBuffer<'a> {
}
}

/// Reads a [`FlexInt`] from the buffer.
pub fn read_flex_int(self) -> ParseResult<'a, FlexInt> {
zslayton marked this conversation as resolved.
Show resolved Hide resolved
let flex_int = FlexInt::read(self.bytes(), self.offset())?;
let remaining = self.consume(flex_int.size_in_bytes());
Ok((flex_int, remaining))
}

/// Reads a [`FlexUInt`] from the buffer.
#[inline]
pub fn read_flex_uint(self) -> ParseResult<'a, FlexUInt> {
zslayton marked this conversation as resolved.
Show resolved Hide resolved
let flex_uint = FlexUInt::read(self.bytes(), self.offset())?;
let remaining = self.consume(flex_uint.size_in_bytes());
Ok((flex_uint, remaining))
}

/// Reads a `VarUInt` encoding primitive from the beginning of the buffer. If it is successful,
/// returns an `Ok(_)` containing its [VarUInt] representation.
///
Expand Down
Loading
Loading