-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for delimited containers to the 1.1 binary reader #787
Add support for delimited containers to the 1.1 binary reader #787
Conversation
Found an issue while rebasing that's causing unit tests to fail. Fixing that before opening this up to review. |
0387b31
to
fd917bd
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #787 +/- ##
==========================================
- Coverage 80.60% 76.98% -3.63%
==========================================
Files 140 131 -9
Lines 28209 28369 +160
Branches 28209 28369 +160
==========================================
- Hits 22737 21839 -898
- Misses 3641 4754 +1113
+ Partials 1831 1776 -55 ☔ View full report in Codecov by Sentry. |
if opcode.is_delimited() { | ||
self.peek_delimited_container(opcode) | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The peek_delimited_sequence
and peek_delimited_struct
methods reproduce a lot of the code below this point. Would it be possible to refactor them to return their expression cache and the length consumed so they could still share the EncodedValue
construction code in this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed this comment in the previous commit, following up quick to address it.
} else if let Some(value) = ImmutableBuffer::peek_sequence_value_expr(input)? { | ||
offsets.push(input.offset()); | ||
input = input.consume(value.range().len()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
peek_sequence_value_expr(input)
is fully lexing the value, is there a reason you decided to store the offset rather than the LazyRawValueExpr
(value
) itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a lifetime issue with the values when I first started working on it. Now that I think about it though, I had some mismatched lifetimes that I ended up fixing later that could have been the problem. I'l revisit.
src/lazy/binary/raw/v1_1/sequence.rs
Outdated
if offsets.len() <= 1 { | ||
None | ||
} else { | ||
let offset = offsets.first().unwrap(); // Safety: Already tested that there's > 1 item. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To flatten the method a bit, you could do:
let offset = offsets.first()?; // <-- Early returns on `None`.
src/lazy/binary/raw/v1_1/struct.rs
Outdated
FlexSymValue::Opcode(Opcode{ opcode_type: OpcodeType::DelimitedContainerClose, ..}) => { | ||
return Ok(None) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly more concise:
FlexSymValue::Opcode(o) if o.opcode_type == DelimitedContainerClose => {...}
Or it might be nice to have an is_delimited_end()
helper method to complement the is_delimited_container()
method?
Opcode(o) if o.is_delimited_end() => {...},
enum SymAddressFieldName<'top> { | ||
ModeChange, | ||
FieldName(LazyRawBinaryFieldName_1_1<'top>), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
src/lazy/binary/raw/v1_1/struct.rs
Outdated
) -> IonResult<Option<(LazyRawFieldExpr<'top, BinaryEncoding_1_1>, usize)>> { | ||
let mut buffer = input; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could/should this method return a ParseResult<LazyRawFieldExpr<_>>
instead so the caller gets the remaining input? (And can deduce the length as desired.)
EDIT: I see now that this is being used to populate the offsets cache. I think this is another place where it would make sense to store the LazyRawFieldExpr
in the cache instead of just the offset to avoid later rework.
src/lazy/binary/raw/v1_1/struct.rs
Outdated
if offsets.len() <= 1 { | ||
None | ||
} else { | ||
let offset = offsets.first().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let offset = offsets.first()?;
(0xF, 0x0) => (DelimitedContainerClose, low_nibble, None), | ||
(0xF, 0x1) => (ListDelimited, low_nibble, Some(IonType::List)), | ||
(0xF, 0x2) => (SExpressionDelimited, low_nibble, Some(IonType::SExp)), | ||
(0xF, 0x3) => (StructDelimited, low_nibble, Some(IonType::Struct)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this jump table. I've found myself wondering if we should add a higher-level OpcodeKind
enum alongside the OpcodeType
that captures some of the natural groupings that currently require testing for several OpcodeType
variants. For example, having:
DelimitedContainer
for0xF1
/0xF2
/0xF3
EExp
for0x00 .. 0x5F
- etc
No TODO
here, just thinking out loud.
…ypes; Fix delimited struct delimiters
…lify some expressions
05d0296
to
127bf4c
Compare
src/lazy/binary/raw/v1_1/reader.rs
Outdated
@@ -712,6 +713,57 @@ mod tests { | |||
Ok(()) | |||
} | |||
|
|||
#[test] | |||
fn nested_sequence() -> IonResult<()> { | |||
let ion_data: &[u8] = &[0xF1, 0x61, 0x01, 0xF1, 0x61, 0x02, 0xF0, 0x61, 0x03, 0xF0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment with the text equivalent?
src/lazy/binary/raw/v1_1/struct.rs
Outdated
struct_type: match opcode_type { | ||
// TODO: Delimited struct handling | ||
OpcodeType::Struct => StructType::SymbolAddress, | ||
// bytes_to_skip: 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover?
@@ -125,6 +129,14 @@ impl Opcode { | |||
self.low_nibble | |||
} | |||
|
|||
pub fn is_delimited(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe is_delimited_start
for symmetry with is_delimited_end
?
src/lazy/binary/raw/v1_1/struct.rs
Outdated
pub struct RawBinaryStructIterator_1_1<'top> { | ||
source: ImmutableBuffer<'top>, | ||
bytes_to_skip: usize, | ||
struct_type: StructType, | ||
// bytes_to_skip: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover?
…ader_delimited_containers
Can we merge this or is there something blocking it? |
Just had to push up the merge from main. CI failed, looking into that now. |
CI fixed, and feedback addressed. Merging. |
Issue #, if available: #665
Description of changes:
This PR adds delimited container support for the binary 1.1 reader.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.