-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial raw lazy text reader (top-level nulls, bools, ints) #609
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #609 +/- ##
==========================================
- Coverage 82.39% 81.72% -0.67%
==========================================
Files 111 118 +7
Lines 20048 20969 +921
Branches 20048 20969 +921
==========================================
+ Hits 16518 17137 +619
- Misses 1912 2193 +281
- Partials 1618 1639 +21
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ PR tour
@@ -66,7 +66,7 @@ num-bigint = "0.4.3" | |||
num-integer = "0.1.44" | |||
num-traits = "0.2" | |||
arrayvec = "0.7" | |||
smallvec = "1.9.0" | |||
smallvec = {version ="1.9.0", features = ["const_generics"]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ The const_generics
feature of the smallvec
crate provides trait implementations for all sizes of backing array ([u8; N]
in our case) rather than just 0-32 and several powers of two beyond. It's a feature because smallvec
predates const generics and didn't want to force a breaking change.
src/lazy/binary/encoding.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This file was moved to the parent directory, not deleted. It appears again later in the diff.
@@ -35,7 +35,7 @@ impl<'a> Debug for LazyRawBinaryValue<'a> { | |||
fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result { | |||
write!( | |||
f, | |||
"LazyRawValue {{\n val={:?},\n buf={:?}\n}}\n", | |||
"LazyRawBinaryValue {{\n val={:?},\n buf={:?}\n}}\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ LazyRawBinaryValue
used to be the only kind of lazy value and was called LazyRawValue
. Now LazyRawValue
is the trait and this type is LazyRawBinaryValue
.
@@ -54,6 +54,10 @@ impl<'data> LazyRawValue<'data, BinaryEncoding> for LazyRawBinaryValue<'data> { | |||
self.ion_type() | |||
} | |||
|
|||
fn is_null(&self) -> bool { | |||
self.is_null() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️is_null
is a method on the LazyRawValue
trait. This implementation delegates the call to the is_null
method that lives on the LazyRawBinaryValue
concrete type.
This arrangement produces a small amount of extra code (there are two fn is_null
s), but means that consumer code that works explicitly with LazyRawBinaryValue
doesn't have to import a trait just to use its methods.
impl<'data> LazyDecoder<'data> for BinaryEncoding { | ||
type Reader = LazyRawBinaryReader<'data>; | ||
type Value = LazyRawBinaryValue<'data>; | ||
type Sequence = LazyRawBinarySequence<'data>; | ||
type Struct = LazyRawBinaryStruct<'data>; | ||
type AnnotationsIterator = RawBinaryAnnotationsIterator<'data>; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ As mentioned earlier in the diff, encoding.rs
was moved here from the binary
directory. The BinaryEncoding
struct and this impl
existed before this PR, but appear as additions here because of the move. The TextEncoding
struct and the content below this impl
(line 33+) are new.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This PR deals with two types that are effectively wrappers around a &[u8]
: TextBufferView
and SmallVec
. Most of the time we don't need to view them as validated UTF-8 &str
s--we can rely on the grammar to reject anything invalid--but sometimes we do. This trait defines an extension method that streamlines the "view these bytes as a &str
or raise an IonError
" task.
self.data.is_empty() | ||
} | ||
|
||
pub fn match_whitespace(self) -> IonMatchResult<'data> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Note that only a handful of the match_
parsing methods are pub
. Most are helper-parsers.
// impl<'data> From<InvalidInputError<'data>> for IonError { | ||
// fn from(value: InvalidInputError) -> Self { | ||
// dbg!(&value.backtrace); | ||
// let mut message = String::from(value.description().unwrap_or("invalid text Ion syntax")); | ||
// if let Some(label) = value.label { | ||
// message.push_str(" while "); | ||
// message.push_str(label.as_ref()); | ||
// } | ||
// let position = Position::with_offset(value.input.offset()).with_length(value.input.len()); | ||
// let decoding_error = DecodingError::new(message).with_position(position); | ||
// IonError::Decoding(decoding_error) | ||
// } | ||
// } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ I'll remove this obsolete code after the PR is reviewed. Pushing code as I write the PR tour comments causes some number of the comments to be discarded 😐.
use crate::IonType; | ||
|
||
#[test] | ||
fn test_top_level() -> IonResult<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This test demonstrates the total capability introduced in this PR. Namely, a LazyRawTextReader
that can read top-level, unannotated nulls, bools, and ints in a text stream.
/// If only part of the value is in the input buffer, calls to [`LazyRawTextValue::read`] (which examines | ||
/// bytes beyond the value's header) may return [`IonError::Incomplete`](crate::result::IonError::Incomplete). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ I have a lingering reference to "the value's header" that was a copy/paste error from this type's binary counterpart. I'll fix that before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ couple more PR tour comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Much of this was copied over from the old text reader implementation's error handling code and then updated.
pub(crate) trait AddContext<'data, T> { | ||
fn with_context( | ||
self, | ||
label: impl Into<Cow<'static, str>>, | ||
input: TextBufferView<'data>, | ||
) -> IonResult<(TextBufferView<'data>, T)>; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ I plan to add a doc comment to this trait before merging. I intentionally avoided providing an Into<IonError>
impl for IonParseError
because some of the nom error kinds that bubble up as IonParseError
don't retain context. This trait is an easy way for methods that could surface without context to require more information before being passed on to the user.
@@ -6,12 +7,23 @@ use thiserror::Error; | |||
#[error("{description}")] | |||
pub struct DecodingError { | |||
description: Cow<'static, str>, | |||
position: Option<Position>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Suggestion) Add comment regarding why position is optional here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add this in a follow-on PR to avoid needing to modify the queue.
} | ||
|
||
/// Matches a single scalar value or the beginning of a container. | ||
pub fn match_value(self) -> IonParseResult<'data, LazyRawTextValue<'data>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Question) Should this be called read_value
as it returns a parsed value? Same for match_top_level
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was reserving the read_*
names for methods that produce one of the RawValueRef::*
variants. The LazyRawTextValue
that this returns has its own read()
method that serves that purpose.
This adds an initial implementation of a
LazyRawTextReader
that can yieldLazyRawTextValue
s. At present it is limited to reading top-levelnull
s,bool
s, andint
s of any radix. Support for other types, annotations, and containers will be added in follow-on PRs.This implementation splits the parsing process for each value into two phases:
This split makes it possible for the reader to avoid the cost of validation and materialization for values that the user simply skips over. To minimize rework done in the two phases, the matching phase records information it discovers along the way (subfield offsets, radix, etc) so it is available if and when the reading phase begins.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.