Initial raw lazy text reader (top-level nulls, bools, ints) #609

Merged
merged 3 commits into from
Aug 10, 2023
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -66,7 +66,7 @@ num-bigint = "0.4.3"
 num-integer = "0.1.44"
 num-traits = "0.2"
 arrayvec = "0.7"
-smallvec = "1.9.0"
+smallvec = {version ="1.9.0", features = ["const_generics"]}
Contributor Author

🗺️ The const_generics feature of the smallvec crate provides trait implementations for all sizes of backing array ([u8; N] in our case) rather than just 0-32 and several powers of two beyond. It's a feature because smallvec predates const generics and didn't want to force a breaking change.
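
To illustrate what "trait implementations for all sizes of backing array" means, here is a minimal std-only sketch. It does not use `smallvec`; the `Backing` trait and `inline_capacity` function are hypothetical names invented for this example.

```rust
/// Hypothetical trait standing in for the kind of trait a crate implements
/// for its backing arrays. The name is an assumption made for this sketch.
trait Backing {
    /// Number of bytes that fit in the array.
    fn capacity() -> usize;
}

// With const generics, one blanket impl covers `[u8; N]` for every `N`.
// Without them, a crate has to list the supported sizes one by one
// (typically through a macro), so an unlisted size has no impl at all.
impl<const N: usize> Backing for [u8; N] {
    fn capacity() -> usize {
        N
    }
}

/// Works for any array length because the impl above is generic over `N`.
fn inline_capacity<A: Backing>() -> usize {
    A::capacity()
}
```

A call such as `inline_capacity::<[u8; 21]>()` compiles here even though 21 is not a "round" size, which is the property the feature flag is enabled for.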

 digest = { version = "0.9", optional = true }
 sha2 = { version = "0.9", optional = true }

19 changes: 0 additions & 19 deletions src/lazy/binary/encoding.rs
Contributor Author

🗺️ This file was moved to its parent directory, not deleted. I'll call out its new location when it appears later in the diff.

This file was deleted.

1 change: 0 additions & 1 deletion src/lazy/binary/mod.rs
@@ -2,6 +2,5 @@ mod encoded_value;
 pub mod immutable_buffer;
 pub mod raw;

-pub(crate) mod encoding;
 #[cfg(test)]
 pub(crate) mod test_utilities;
2 changes: 1 addition & 1 deletion src/lazy/binary/raw/mod.rs
@@ -1,5 +1,5 @@
 pub mod annotations_iterator;
-pub mod lazy_raw_sequence;
 pub mod reader;
+pub mod sequence;
 pub mod r#struct;
 pub mod value;
2 changes: 1 addition & 1 deletion src/lazy/binary/raw/reader.rs
@@ -1,7 +1,7 @@
-use crate::lazy::binary::encoding::BinaryEncoding;
 use crate::lazy::binary::immutable_buffer::ImmutableBuffer;
 use crate::lazy::binary::raw::value::LazyRawBinaryValue;
 use crate::lazy::decoder::LazyRawReader;
+use crate::lazy::encoding::BinaryEncoding;
 use crate::lazy::raw_stream_item::RawStreamItem;
 use crate::result::IonFailure;
 use crate::IonResult;
@@ -1,10 +1,10 @@
-use crate::lazy::binary::encoding::BinaryEncoding;
 use crate::lazy::binary::immutable_buffer::ImmutableBuffer;
 use crate::lazy::binary::raw::annotations_iterator::RawBinaryAnnotationsIterator;
 use crate::lazy::binary::raw::reader::DataSource;
 use crate::lazy::binary::raw::value::LazyRawBinaryValue;
 use crate::lazy::decoder::private::LazyContainerPrivate;
 use crate::lazy::decoder::LazyRawSequence;
+use crate::lazy::encoding::BinaryEncoding;
 use crate::{IonResult, IonType};
 use std::fmt;
 use std::fmt::{Debug, Formatter};
2 changes: 1 addition & 1 deletion src/lazy/binary/raw/struct.rs
@@ -1,10 +1,10 @@
-use crate::lazy::binary::encoding::BinaryEncoding;
 use crate::lazy::binary::immutable_buffer::ImmutableBuffer;
 use crate::lazy::binary::raw::annotations_iterator::RawBinaryAnnotationsIterator;
 use crate::lazy::binary::raw::reader::DataSource;
 use crate::lazy::binary::raw::value::LazyRawBinaryValue;
 use crate::lazy::decoder::private::{LazyContainerPrivate, LazyRawFieldPrivate};
 use crate::lazy::decoder::{LazyRawField, LazyRawStruct};
+use crate::lazy::encoding::BinaryEncoding;
 use crate::lazy::raw_value_ref::RawValueRef;
 use crate::raw_symbol_token_ref::AsRawSymbolTokenRef;
 use crate::{IonResult, RawSymbolTokenRef};
16 changes: 12 additions & 4 deletions src/lazy/binary/raw/value.rs
@@ -1,13 +1,13 @@
 use crate::binary::int::DecodedInt;
 use crate::binary::uint::DecodedUInt;
 use crate::lazy::binary::encoded_value::EncodedValue;
-use crate::lazy::binary::encoding::BinaryEncoding;
 use crate::lazy::binary::immutable_buffer::ImmutableBuffer;
 use crate::lazy::binary::raw::annotations_iterator::RawBinaryAnnotationsIterator;
-use crate::lazy::binary::raw::lazy_raw_sequence::LazyRawBinarySequence;
 use crate::lazy::binary::raw::r#struct::LazyRawBinaryStruct;
+use crate::lazy::binary::raw::sequence::LazyRawBinarySequence;
 use crate::lazy::decoder::private::LazyRawValuePrivate;
 use crate::lazy::decoder::LazyRawValue;
+use crate::lazy::encoding::BinaryEncoding;
 use crate::lazy::raw_value_ref::RawValueRef;
 use crate::result::IonFailure;
 use crate::types::SymbolId;
@@ -35,7 +35,7 @@ impl<'a> Debug for LazyRawBinaryValue<'a> {
     fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
         write!(
             f,
-            "LazyRawValue {{\n val={:?},\n buf={:?}\n}}\n",
+            "LazyRawBinaryValue {{\n val={:?},\n buf={:?}\n}}\n",
Contributor Author

🗺️ LazyRawBinaryValue used to be the only kind of lazy value and was called LazyRawValue. Now LazyRawValue is the trait and this type is LazyRawBinaryValue.

             self.encoded_value, self.input
         )
     }
@@ -54,6 +54,10 @@ impl<'data> LazyRawValue<'data, BinaryEncoding> for LazyRawBinaryValue<'data> {
         self.ion_type()
     }

+    fn is_null(&self) -> bool {
+        self.is_null()
Contributor Author

🗺️ is_null is a method on the LazyRawValue trait. This implementation delegates the call to the is_null method that lives on the LazyRawBinaryValue concrete type.

This arrangement produces a small amount of extra code (there are two fn is_nulls), but means that consumer code that works explicitly with LazyRawBinaryValue doesn't have to import a trait just to use its methods.
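
The delegation works because Rust's method resolution picks an inherent method ahead of a trait method with the same name, so the trait impl calls the concrete type's method rather than itself. A minimal sketch of the pattern follows; `RawValue`, `BinaryValue`, `null_flag`, and `null_via_trait` are hypothetical stand-ins, not the types in this PR.

```rust
/// Hypothetical stand-in for the `LazyRawValue` trait.
trait RawValue {
    fn is_null(&self) -> bool;
}

/// Hypothetical stand-in for `LazyRawBinaryValue`.
struct BinaryValue {
    null_flag: bool,
}

impl BinaryValue {
    /// Inherent method: callers holding a `BinaryValue` can use it
    /// without importing the `RawValue` trait.
    pub fn is_null(&self) -> bool {
        self.null_flag
    }
}

impl RawValue for BinaryValue {
    fn is_null(&self) -> bool {
        // Method resolution prefers the inherent `is_null` over the trait
        // method, so this call delegates instead of recursing.
        self.is_null()
    }
}

/// Generic code reaches the same logic through the trait.
fn null_via_trait<V: RawValue>(value: &V) -> bool {
    value.is_null()
}
```

The cost is the duplicated signature; the benefit is that both `value.is_null()` on the concrete type and generic code bounded by the trait reach a single implementation.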

+    }
+
     fn annotations(&self) -> RawBinaryAnnotationsIterator<'data> {
         self.annotations()
     }
@@ -70,6 +74,10 @@ impl<'data> LazyRawBinaryValue<'data> {
         self.encoded_value.ion_type()
     }

+    pub fn is_null(&self) -> bool {
+        self.encoded_value.header().is_null()
+    }
+
     /// Returns `true` if this value has a non-empty annotations sequence; otherwise, returns `false`.
     fn has_annotations(&self) -> bool {
         self.encoded_value.has_annotations()
@@ -118,7 +126,7 @@ impl<'data> LazyRawBinaryValue<'data> {
     /// [`LazyRawBinarySequence`] or [`LazyStruct`](crate::lazy::struct::LazyStruct)
     /// that can be traversed to access the container's contents.
     pub fn read(&self) -> ValueParseResult<'data, BinaryEncoding> {
-        if self.encoded_value.header().is_null() {
+        if self.is_null() {
             let raw_value_ref = RawValueRef::Null(self.ion_type());
             return Ok(raw_value_ref);
         }
1 change: 1 addition & 0 deletions src/lazy/decoder.rs
@@ -62,6 +62,7 @@ pub trait LazyRawValue<'data, D: LazyDecoder<'data>>:
     private::LazyRawValuePrivate<'data> + Clone + Debug
 {
     fn ion_type(&self) -> IonType;
+    fn is_null(&self) -> bool;
     fn annotations(&self) -> D::AnnotationsIterator;
     fn read(&self) -> IonResult<RawValueRef<'data, D>>;
 }
133 changes: 133 additions & 0 deletions src/lazy/encoding.rs
@@ -0,0 +1,133 @@
+use crate::lazy::binary::raw::annotations_iterator::RawBinaryAnnotationsIterator;
+use crate::lazy::binary::raw::r#struct::LazyRawBinaryStruct;
+use crate::lazy::binary::raw::reader::LazyRawBinaryReader;
+use crate::lazy::binary::raw::sequence::LazyRawBinarySequence;
+use crate::lazy::binary::raw::value::LazyRawBinaryValue;
+use crate::lazy::decoder::private::{LazyContainerPrivate, LazyRawFieldPrivate};
+use crate::lazy::decoder::{LazyDecoder, LazyRawField, LazyRawSequence, LazyRawStruct};
+use crate::lazy::raw_value_ref::RawValueRef;
+use crate::lazy::text::raw::reader::LazyRawTextReader;
+use crate::lazy::text::value::LazyRawTextValue;
+use crate::{IonResult, IonType, RawSymbolTokenRef};
+use std::marker::PhantomData;
+
+// These types derive trait implementations in order to allow types that contain them
+// to also derive trait implementations.
+
+/// The Ion 1.0 binary encoding.
+#[derive(Clone, Debug)]
+pub struct BinaryEncoding;
+
+/// The Ion 1.0 text encoding.
+#[derive(Clone, Debug)]
+pub struct TextEncoding;
+
+impl<'data> LazyDecoder<'data> for BinaryEncoding {
+    type Reader = LazyRawBinaryReader<'data>;
+    type Value = LazyRawBinaryValue<'data>;
+    type Sequence = LazyRawBinarySequence<'data>;
+    type Struct = LazyRawBinaryStruct<'data>;
+    type AnnotationsIterator = RawBinaryAnnotationsIterator<'data>;
+}
Comment on lines +25 to +31
Contributor Author

🗺️ As mentioned earlier in the diff, encoding.rs was moved here from the binary directory. The BinaryEncoding struct and this impl existed before this PR, but appear as additions here because of the move. The TextEncoding struct and the content below this impl (line 33+) are new.


+// === Placeholders ===
+// The types below will need to be properly defined in order for the lazy text reader to be complete.
+// They exist to satisfy various trait definitions.
Comment on lines +33 to +35
Contributor Author

🗺️ At this point, we have a LazyDecoder trait, a complete binary impl of that trait, and the very beginnings of the text impl. I've created placeholder ToDo* types that allow the text impl to technically implement the trait. I'll replace them with actual implementations in follow-on PRs.
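
The placeholder technique can be shown in isolation. In the sketch below, a trait with associated types is implemented using a stub whose methods call `todo!()`: the stub type-checks wherever the trait bound is required, but panics if it is ever invoked. The names (`Decoder`, `Sequence`, `ToDoSequence`, `Text`, `first_value`) are hypothetical and far simpler than the real `LazyDecoder` trait.

```rust
/// Hypothetical trait that a decoder's sequence type must implement.
trait Sequence {
    fn len(&self) -> usize;
}

/// Hypothetical, trimmed-down decoder trait with associated types.
trait Decoder {
    type Value: std::fmt::Debug;
    type Sequence: Sequence;

    /// Parses a single top-level value, if the input holds one.
    fn first_value(input: &str) -> Option<Self::Value>;
}

/// Placeholder: satisfies the `Sequence` bound so the decoder impl compiles,
/// but panics with "not yet implemented" if `len` is called.
#[derive(Debug, Clone)]
struct ToDoSequence;

impl Sequence for ToDoSequence {
    fn len(&self) -> usize {
        todo!()
    }
}

/// Marker type for the partially implemented decoder.
struct Text;

impl Decoder for Text {
    type Value = bool;
    type Sequence = ToDoSequence;

    // Only the part that is actually implemented does real work.
    fn first_value(input: &str) -> Option<bool> {
        match input.trim() {
            "true" => Some(true),
            "false" => Some(false),
            _ => None,
        }
    }
}
```

This keeps the trait surface stable while the implementation lands in stages: code paths that avoid the stubbed type work today, and the `todo!()` calls mark exactly what remains.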

+#[derive(Debug, Clone)]
+pub struct ToDoTextSequence;
+
+impl<'data> LazyContainerPrivate<'data, TextEncoding> for ToDoTextSequence {
+    fn from_value(_value: LazyRawTextValue<'data>) -> Self {
+        todo!()
+    }
+}
+
+impl<'data> LazyRawSequence<'data, TextEncoding> for ToDoTextSequence {
+    type Iterator = Box<dyn Iterator<Item = IonResult<LazyRawTextValue<'data>>>>;
+
+    fn annotations(&self) -> ToDoTextAnnotationsIterator<'data> {
+        todo!()
+    }
+
+    fn ion_type(&self) -> IonType {
+        todo!()
+    }
+
+    fn iter(&self) -> Self::Iterator {
+        todo!()
+    }
+
+    fn as_value(&self) -> &<TextEncoding as LazyDecoder<'data>>::Value {
+        todo!()
+    }
+}
+
+#[derive(Debug, Clone)]
+pub struct ToDoTextStruct;
+
+#[derive(Debug, Clone)]
+pub struct ToDoTextField;
+
+impl<'data> LazyRawFieldPrivate<'data, TextEncoding> for ToDoTextField {
+    fn into_value(self) -> LazyRawTextValue<'data> {
+        todo!()
+    }
+}
+
+impl<'data> LazyRawField<'data, TextEncoding> for ToDoTextField {
+    fn name(&self) -> RawSymbolTokenRef<'data> {
+        todo!()
+    }
+
+    fn value(&self) -> &LazyRawTextValue<'data> {
+        todo!()
+    }
+}
+
+impl<'data> LazyContainerPrivate<'data, TextEncoding> for ToDoTextStruct {
+    fn from_value(_value: <TextEncoding as LazyDecoder>::Value) -> Self {
+        todo!()
+    }
+}
+
+impl<'data> LazyRawStruct<'data, TextEncoding> for ToDoTextStruct {
+    type Field = ToDoTextField;
+    type Iterator = Box<dyn Iterator<Item = IonResult<ToDoTextField>>>;
+
+    fn annotations(&self) -> ToDoTextAnnotationsIterator<'data> {
+        todo!()
+    }
+
+    fn find(&self, _name: &str) -> IonResult<Option<LazyRawTextValue<'data>>> {
+        todo!()
+    }
+
+    fn get(&self, _name: &str) -> IonResult<Option<RawValueRef<'data, TextEncoding>>> {
+        todo!()
+    }
+
+    fn iter(&self) -> Self::Iterator {
+        todo!()
+    }
+}
+
+#[derive(Debug, Clone)]
+pub struct ToDoTextAnnotationsIterator<'data> {
+    spooky: &'data PhantomData<()>,
+}
+
+impl<'data> Iterator for ToDoTextAnnotationsIterator<'data> {
+    type Item = IonResult<RawSymbolTokenRef<'data>>;
+
+    fn next(&mut self) -> Option<Self::Item> {
+        todo!()
+    }
+}
+
+impl<'data> LazyDecoder<'data> for TextEncoding {
+    type Reader = LazyRawTextReader<'data>;
+    type Value = LazyRawTextValue<'data>;
+    type Sequence = ToDoTextSequence;
+    type Struct = ToDoTextStruct;
+    type AnnotationsIterator = ToDoTextAnnotationsIterator<'data>;
+}
2 changes: 2 additions & 0 deletions src/lazy/mod.rs
@@ -3,12 +3,14 @@

 pub mod binary;
 pub mod decoder;
+pub(crate) mod encoding;
 pub mod raw_stream_item;
 pub mod raw_value_ref;
 pub mod reader;
 pub mod sequence;
 pub mod r#struct;
 pub mod system_reader;
 pub mod system_stream_item;
+pub mod text;
 pub mod value;
 pub mod value_ref;
8 changes: 8 additions & 0 deletions src/lazy/raw_value_ref.rs
@@ -69,6 +69,14 @@ impl<'data, D: LazyDecoder<'data>> RawValueRef<'data, D> {
         }
     }

+    pub fn expect_i64(self) -> IonResult<i64> {
Contributor Author

🗺️ Without this method, "get-int-as-i64-or-IonError" is expressed as:

let int = raw_reader
  .next()?
  .expect_value()? // It's not an IVM or end-of-stream
  .expect_int()?   // The value is an Int
  .expect_i64()?;  // The Int fits in an i64

This reduces it to:

let int = raw_reader
  .next()?
  .expect_value()?
  .expect_i64()?;

and is consistent with both Element and Int, each of which has an expect_i64() method.
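
For readers without the crate at hand, the shape of such a shortcut can be sketched with a small standalone enum. Everything here is hypothetical: `ValueRef` stands in for `RawValueRef`, `i128` stands in for Ion's `Int`, and `String` stands in for `IonError`. Unlike the PR's method, which matches on the `Int` variant directly, this version is written in terms of `expect_int` to make the two-step check explicit.

```rust
/// Hypothetical stand-in for `RawValueRef`, reduced to three variants.
#[derive(Debug, PartialEq)]
enum ValueRef {
    Null,
    Bool(bool),
    Int(i128),
}

impl ValueRef {
    /// Step 1: the value must be an int at all.
    fn expect_int(self) -> Result<i128, String> {
        match self {
            ValueRef::Int(i) => Ok(i),
            other => Err(format!("expected an int, found {other:?}")),
        }
    }

    /// Steps 1 and 2 combined: the value must be an int AND fit in an i64.
    fn expect_i64(self) -> Result<i64, String> {
        let int = self.expect_int()?;
        i64::try_from(int).map_err(|_| format!("{int} does not fit in an i64"))
    }
}
```

Both failure modes (wrong type, out of range) surface through the same `Result`, so the caller's `?` chain loses one link without losing either check.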

+        if let RawValueRef::Int(i) = self {
+            i.expect_i64()
+        } else {
+            IonResult::decoding_error("expected an i64 (int)")
+        }
+    }
+
     pub fn expect_float(self) -> IonResult<f64> {
         if let RawValueRef::Float(f) = self {
             Ok(f)
2 changes: 1 addition & 1 deletion src/lazy/reader.rs
@@ -1,8 +1,8 @@
 use crate::binary::constants::v1_0::IVM;
 use crate::element::reader::ElementReader;
 use crate::element::Element;
-use crate::lazy::binary::encoding::BinaryEncoding;
 use crate::lazy::decoder::LazyDecoder;
+use crate::lazy::encoding::BinaryEncoding;
 use crate::lazy::system_reader::LazySystemReader;
 use crate::lazy::value::LazyValue;
 use crate::result::IonFailure;
2 changes: 1 addition & 1 deletion src/lazy/sequence.rs
@@ -1,5 +1,5 @@
-use crate::lazy::binary::encoding::BinaryEncoding;
 use crate::lazy::decoder::{LazyDecoder, LazyRawSequence, LazyRawValue};
+use crate::lazy::encoding::BinaryEncoding;
 use crate::lazy::value::{AnnotationsIterator, LazyValue};
 use crate::{Annotations, Element, IntoAnnotatedElement, Sequence, Value};
 use crate::{IonError, IonResult, IonType, SymbolTable};
2 changes: 1 addition & 1 deletion src/lazy/struct.rs
@@ -1,7 +1,7 @@
 use crate::element::builders::StructBuilder;
-use crate::lazy::binary::encoding::BinaryEncoding;
 use crate::lazy::decoder::private::{LazyRawFieldPrivate, LazyRawValuePrivate};
 use crate::lazy::decoder::{LazyDecoder, LazyRawStruct};
+use crate::lazy::encoding::BinaryEncoding;
 use crate::lazy::value::{AnnotationsIterator, LazyValue};
 use crate::lazy::value_ref::ValueRef;
 use crate::result::IonFailure;
2 changes: 1 addition & 1 deletion src/lazy/system_reader.rs
@@ -1,4 +1,4 @@
-use crate::lazy::binary::encoding::BinaryEncoding;
+use crate::lazy::encoding::BinaryEncoding;
 use crate::result::IonFailure;
 use crate::{IonResult, IonType, RawSymbolTokenRef, SymbolTable};

33 changes: 33 additions & 0 deletions src/lazy/text/as_utf8.rs
Contributor Author

🗺️ This PR deals with two types that are effectively wrappers around a &[u8]: TextBufferView and SmallVec. Most of the time we don't need to view them as validated UTF-8 &strs--we can rely on the grammar to reject anything invalid--but sometimes we do. This trait defines an extension method that streamlines the "view these bytes as a &str or raise an IonError" task.
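
The extension-trait idiom described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's code: `Utf8Error` is a hypothetical replacement for `IonError`, a plain `usize` offset replaces `Position`, `Vec<u8>` replaces `SmallVec`, and `BufferView` is a hypothetical stand-in for `TextBufferView`. The shared `bytes_as_utf8` helper is also an addition of this sketch.

```rust
/// Hypothetical error type carrying the offset at which decoding failed.
#[derive(Debug, PartialEq)]
struct Utf8Error {
    offset: usize,
}

/// Extension trait: view a byte container as `&str` or report an error.
trait AsUtf8 {
    fn as_utf8(&self, offset: usize) -> Result<&str, Utf8Error>;
}

/// Shared helper so that each impl is a one-liner.
fn bytes_as_utf8(bytes: &[u8], offset: usize) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes).map_err(|_| Utf8Error { offset })
}

impl AsUtf8 for Vec<u8> {
    fn as_utf8(&self, offset: usize) -> Result<&str, Utf8Error> {
        bytes_as_utf8(self, offset)
    }
}

/// Hypothetical stand-in for a borrowed view over input bytes.
struct BufferView<'a> {
    bytes: &'a [u8],
}

impl<'a> AsUtf8 for BufferView<'a> {
    fn as_utf8(&self, offset: usize) -> Result<&str, Utf8Error> {
        bytes_as_utf8(self.bytes, offset)
    }
}
```

Because validation happens only when `as_utf8` is called, code that never needs a `&str` pays nothing for it, which matches the "rely on the grammar most of the time" approach the comment describes.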

@@ -0,0 +1,33 @@
+use crate::lazy::text::buffer::TextBufferView;
+use crate::position::Position;
+use crate::result::DecodingError;
+use crate::{IonError, IonResult};
+use smallvec::SmallVec;
+
+/// Attempts to validate a byte sequence as UTF-8 text. If the data is not valid UTF-8, returns
+/// an [`IonError`].
+///
+/// The provided `position` is added to the `IonError` that is constructed if the data is not valid.
+pub(crate) trait AsUtf8 {
+    fn as_utf8(&self, position: impl Into<Position>) -> IonResult<&str>;
+}
+
+impl<const N: usize> AsUtf8 for SmallVec<[u8; N]> {
+    fn as_utf8(&self, position: impl Into<Position>) -> IonResult<&str> {
+        std::str::from_utf8(self.as_ref()).map_err(|_| {
+            let decoding_error =
+                DecodingError::new("encountered invalid UTF-8").with_position(position);
+            IonError::Decoding(decoding_error)
+        })
+    }
+}
+
+impl<'data> AsUtf8 for TextBufferView<'data> {
+    fn as_utf8(&self, position: impl Into<Position>) -> IonResult<&str> {
+        std::str::from_utf8(self.bytes()).map_err(|_| {
+            let decoding_error =
+                DecodingError::new("encountered invalid UTF-8").with_position(position);
+            IonError::Decoding(decoding_error)
+        })
+    }
+}