Adds ion-tests integration for the lazy reader #639
Conversation
🗺️ PR tour
src/lazy/binary/immutable_buffer.rs
🗺️ Previously, the binary reader had a monolithic peek_value method that contained logic for reading fields, annotations, NOPs, and the values themselves. I refactored this so that each has its own code path with clearer invariants. This made it easier to track down a NOP-related buffer indexing problem.
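To illustrate the shape of that split, here is a toy sketch (not the actual ion-rust code; the helper name and details are hypothetical) of NOP handling living in its own routine with a single, narrow job:

```rust
// Toy sketch of one split-out code path (hypothetical, not ion-rust's actual
// implementation). In binary Ion 1.0, a type descriptor with type code 0 and a
// length nibble of 0-13 is a NOP pad; this helper only skips that simple form
// and leaves anything else (including VarUInt-length pads) to the caller.
fn consume_nop_padding(mut input: &[u8]) -> &[u8] {
    while let Some(&descriptor) = input.first() {
        let type_code = descriptor >> 4;
        let length = (descriptor & 0x0F) as usize;
        if type_code != 0 || length >= 14 || input.len() < 1 + length {
            break;
        }
        input = &input[1 + length..];
    }
    input
}

fn main() {
    // Two one-byte NOP pads (0x00) followed by the start of an int (0x21 0x05).
    let bytes = [0x00u8, 0x00, 0x21, 0x05];
    assert_eq!(consume_nop_padding(&bytes), &bytes[2..]);
}
```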
self.buffer = buffer;
// If the value we read doesn't start where we began reading, there was a NOP.
let num_nop_bytes = lazy_value.input.offset() - buffer.offset();
self.buffer = buffer.consume(num_nop_bytes);
🗺️ This was the buffer indexing bug I referred to in an earlier comment.
-        // Skip the type descriptor
-        let input = self.input.consume(1);
+        // Skip the type descriptor and length bytes
+        let input = ImmutableBuffer::new(self.value_body()?);
🗺️ This placeholder impl lived longer than it was supposed to, and always slipped past the tests because I didn't dummy up any decimals that were > 13 bytes long.
src/lazy/reader.rs
🗺️ I tried to standardize the reader level names a bit more. There are three levels: Raw, System, and Application. Each level has three supported decoders: Text, Binary, or Any, where Any uses enum dispatch to abstract over Text or Binary.

Raw: LazyRawBinaryReader, LazyRawTextReader, LazyRawAnyReader
System: LazySystemBinaryReader, LazySystemTextReader, LazySystemAnyReader
Application: LazyApplicationBinaryReader, LazyApplicationTextReader, LazyApplicationAnyReader

Because users will almost always interact with the Application level, that is the default, and type aliases are provided to allow the level to be omitted: LazyBinaryReader, LazyTextReader. Similarly, most users won't want to have to specify the format, so they can just use the LazyReader to get a LazyApplicationAnyReader.
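A compiling toy sketch of that alias layer (the encoding types and struct body below are stand-ins for illustration, not the crate's actual decoder machinery):

```rust
// Stand-in encoding/decoder types; the real crate has its own LazyDecoder machinery.
struct BinaryEncoding;
struct TextEncoding;
#[allow(dead_code)]
enum AnyEncoding {
    Binary(BinaryEncoding),
    Text(TextEncoding),
}

// One reader struct per level, parameterized by its decoder.
struct LazyApplicationReader<D> {
    _decoder: D,
}

// Format-specific aliases let callers omit the level name entirely...
type LazyBinaryReader = LazyApplicationReader<BinaryEncoding>;
type LazyTextReader = LazyApplicationReader<TextEncoding>;

// ...and plain `LazyReader` defaults to the Application level with `Any` dispatch.
type LazyReader = LazyApplicationReader<AnyEncoding>;

fn main() {
    let _binary: LazyBinaryReader = LazyApplicationReader { _decoder: BinaryEncoding };
    let _any: LazyReader = LazyApplicationReader {
        _decoder: AnyEncoding::Text(TextEncoding),
    };
}
```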
@@ -107,7 +124,7 @@ impl<'data, D: LazyDecoder<'data>> LazySystemReader<'data, D> {
             return Ok(false);
         }
         if let Some(symbol_ref) = lazy_value.annotations().next() {
-            return Ok(symbol_ref? == ION_SYMBOL_TABLE);
+            return Ok(symbol_ref?.matches_sid_or_text(3, "$ion_symbol_table"));
🗺️ Now that there's a text reader, we cannot simply look at symbol IDs when comparing annotations or field names.
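A minimal sketch of why the comparison needs both forms (the RawSymbolTokenRef shape below is simplified for illustration and is not the crate's exact definition): a binary stream hands back symbol IDs, while a text stream hands back inline text, so an annotation check has to accept either.

```rust
// Simplified stand-in for a raw symbol token: binary Ion yields symbol IDs,
// text Ion yields the symbol's text.
enum RawSymbolTokenRef<'a> {
    SymbolId(usize),
    Text(&'a str),
}

impl<'a> RawSymbolTokenRef<'a> {
    // True if this token is the given symbol ID *or* the given text.
    fn matches_sid_or_text(&self, sid: usize, text: &str) -> bool {
        match self {
            RawSymbolTokenRef::SymbolId(s) => *s == sid,
            RawSymbolTokenRef::Text(t) => *t == text,
        }
    }
}

fn main() {
    // $ion_symbol_table is symbol ID 3 in the system symbol table.
    let from_binary = RawSymbolTokenRef::SymbolId(3);
    let from_text = RawSymbolTokenRef::Text("$ion_symbol_table");
    assert!(from_binary.matches_sid_or_text(3, "$ion_symbol_table"));
    assert!(from_text.matches_sid_or_text(3, "$ion_symbol_table"));
}
```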
if let RawStreamItem::VersionMarker(major, minor) = matched {
    if (major, minor) != (1, 0) {
        return IonResult::decoding_error(format!(
            "Ion version {major}.{minor} is not supported"
        ));
    }
}
🗺️ There are ion-tests which expect the reader to reject future Ion versions.
src/lazy/text/raw/reader.rs
@@ -191,8 +199,8 @@ mod tests {
         // Second item
         2 /*comment before comma*/,
         // Third item
-        3
-        ]
+        3 ,]
🗺️ I added a trailing comma while debugging.
    .slice_to_end(1)
    .match_optional_comments_and_whitespace()
    .with_context("skipping a list's trailing comma", input_after_ws)?;
}
🗺️ If a list had a trailing comma, this would break.
    .slice_to_end(1)
    .match_optional_comments_and_whitespace()
    .with_context("skipping a list's trailing comma", input_after_ws)?;
}
🗺️ If the struct had a trailing comma, this would break.
        let value: Value = value_ref.try_into()?;
        Ok(value.into())
    }
}
🗺️ This enables: let element: Element = reader.next()?.read()?.try_into()?;
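For example, under the assumption that the lazy reader's module path and constructor look roughly like the sketch below (the lazy API was still experimental at the time of this PR, so treat the paths and signatures as guesses rather than a verified ion-rs example), reading the first top-level value as an Element becomes a one-liner:

```rust
// Sketch only: the module paths and constructor are assumptions about the
// experimental lazy API, not a verified ion-rs example.
use ion_rs::element::Element;
use ion_rs::lazy::reader::LazyBinaryReader;
use ion_rs::IonResult;

fn first_element(ion_bytes: &[u8]) -> IonResult<Element> {
    let mut reader = LazyBinaryReader::new(ion_bytes)?;
    // The TryFrom impl this comment points at lets the ValueRef returned by
    // read() convert directly into an owned Element.
    let element: Element = reader.next()?.read()?.try_into()?;
    Ok(element)
}
```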
🗺️ Restoring PR tour comments that GitHub decided to remove/hide as "outdated."
Oh, I missed this comment when I was reviewing the code. However, I think my question is still valid. Why do we need the encoding distinction at every level of reader?
Overall it looks good. I do have one question though.

You've made a "Binary", "Text", and "Any" implementation for the "Raw", "System", and "Application" readers. Is there any reason that the text and binary distinction has to be pushed up that far? Would it be possible to create a RawAnyReader that is an enum of binary and text raw readers so that higher levels of readers can have just one implementation?
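A compiling toy of the enum-dispatch idea being suggested (all names here are placeholders, not the crate's API): the raw text and binary readers are wrapped once, and every higher level holds only the wrapper.

```rust
// Placeholder stand-ins for the raw readers and the items they produce.
struct LazyRawBinaryReader;
struct LazyRawTextReader;
#[derive(Debug, PartialEq)]
struct RawStreamItem;

impl LazyRawBinaryReader {
    fn next(&mut self) -> RawStreamItem { RawStreamItem }
}
impl LazyRawTextReader {
    fn next(&mut self) -> RawStreamItem { RawStreamItem }
}

// One enum abstracts over both raw readers, so the System and Application
// readers would only ever need a single implementation built on top of it.
enum RawAnyReader {
    Binary(LazyRawBinaryReader),
    Text(LazyRawTextReader),
}

impl RawAnyReader {
    fn next(&mut self) -> RawStreamItem {
        match self {
            RawAnyReader::Binary(r) => r.next(),
            RawAnyReader::Text(r) => r.next(),
        }
    }
}

fn main() {
    let mut reader = RawAnyReader::Text(LazyRawTextReader);
    assert_eq!(reader.next(), RawStreamItem);
}
```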
@@ -55,11 +56,11 @@ use crate::{IonError, IonResult};
 ///# Ok(())
 ///# }
 /// ```
-pub struct LazyReader<'data, D: LazyDecoder<'data>> {
+pub struct LazyApplicationReader<'data, D: LazyDecoder<'data>> {
Doc comments need to be updated. It still refers to this as a "binary" reader.
@@ -357,7 +366,7 @@ impl<'data> TextBufferView<'data> {
     /// input bytes where the field name is found, and the value.
     pub fn match_struct_field_name_and_value(
         self,
-    ) -> IonParseResult<'data, ((MatchedSymbol, Range<usize>), LazyRawTextValue<'data>)> {
+    ) -> IonParseResult<'data, ((MatchedFieldName, Range<usize>), LazyRawTextValue<'data>)> {
{'''f''' '''o''' '''o''': '''b''' '''a''' '''r'''}
🤯
@@ -701,7 +722,7 @@ impl<'data> TextBufferView<'data> {
 
     /// Matches the three parts of an int--its base, its sign, and its digits--without actually
     /// constructing an Int from them.
-    fn match_int(self) -> IonParseResult<'data, MatchedInt> {
+    pub fn match_int(self) -> IonParseResult<'data, MatchedInt> {
Just curious—why does this need to be pub now?
I wanted each of the methods that matched a full Ion type to be reusable from outside the buffer (but not outside the crate without more consideration).
Adds ion-tests integration for the lazy reader. Also addresses a number of bugs surfaced by the conformance tests.

Fixes issue #636.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.