Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for delimited containers to the 1.1 binary reader #787

Merged

Conversation

nirosys
Copy link
Contributor

@nirosys nirosys commented Jun 7, 2024

Issue #, if available: #665

Description of changes:
This PR adds delimited container support for the binary 1.1 reader.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@nirosys
Copy link
Contributor Author

nirosys commented Jun 17, 2024

Found an issue while rebasing that's causing unit tests to fail. Fixing that before opening this up to review.

@nirosys nirosys force-pushed the rgiliam/1_1_binary_reader_delimited_containers branch from 0387b31 to fd917bd Compare June 18, 2024 23:39
Copy link

codecov bot commented Jun 18, 2024

Codecov Report

Attention: Patch coverage is 77.92553% with 83 lines in your changes missing coverage. Please review.

Project coverage is 76.98%. Comparing base (fa3167d) to head (fd917bd).
Report is 75 commits behind head on main.

Current head fd917bd differs from pull request most recent head f89c91f

Please upload reports for the commit f89c91f to get more accurate results.

Files Patch % Lines
src/lazy/binary/raw/v1_1/reader.rs 71.64% 0 Missing and 38 partials ⚠️
src/lazy/binary/raw/v1_1/immutable_buffer.rs 82.03% 7 Missing and 16 partials ⚠️
src/lazy/binary/raw/v1_1/struct.rs 86.15% 6 Missing and 3 partials ⚠️
src/lazy/binary/raw/v1_1/sequence.rs 75.75% 8 Missing ⚠️
src/lazy/binary/raw/v1_1/type_descriptor.rs 44.44% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #787      +/-   ##
==========================================
- Coverage   80.60%   76.98%   -3.63%     
==========================================
  Files         140      131       -9     
  Lines       28209    28369     +160     
  Branches    28209    28369     +160     
==========================================
- Hits        22737    21839     -898     
- Misses       3641     4754    +1113     
+ Partials     1831     1776      -55     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nirosys nirosys marked this pull request as ready for review June 18, 2024 23:55
@nirosys nirosys requested a review from zslayton June 18, 2024 23:55
Comment on lines 437 to 473
if opcode.is_delimited() {
self.peek_delimited_container(opcode)
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The peek_delimited_sequence and peek_delimited_struct methods reproduce a lot of the code below this point. Would it be possible to refactor them to return their expression cache and the length consumed so they could still share the EncodedValue construction code in this method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed this comment in the previous commit, following up quick to address it.

Comment on lines 330 to 334
} else if let Some(value) = ImmutableBuffer::peek_sequence_value_expr(input)? {
offsets.push(input.offset());
input = input.consume(value.range().len());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

peek_sequence_value_expr(input) is fully lexing the value, is there a reason you decided to store the offset rather than the LazyRawValueExpr (value) itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a lifetime issue with the values when I first started working on it. Now that I think about it though, I had some mismatched lifetimes that I ended up fixing later that could have been the problem. I'l revisit.

Comment on lines 168 to 171
if offsets.len() <= 1 {
None
} else {
let offset = offsets.first().unwrap(); // Safety: Already tested that there's > 1 item.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To flatten the method a bit, you could do:

let offset = offsets.first()?; // <-- Early returns on `None`.

Comment on lines 184 to 186
FlexSymValue::Opcode(Opcode{ opcode_type: OpcodeType::DelimitedContainerClose, ..}) => {
return Ok(None)
}
Copy link
Contributor

@zslayton zslayton Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly more concise:

FlexSymValue::Opcode(o) if o.opcode_type == DelimitedContainerClose => {...}

Or it might be nice to have an is_delimited_end() helper method to complement the is_delimited_container() method?

Opcode(o) if o.is_delimited_end() => {...},

Comment on lines +136 to +142
enum SymAddressFieldName<'top> {
ModeChange,
FieldName(LazyRawBinaryFieldName_1_1<'top>),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

Comment on lines 247 to 253
) -> IonResult<Option<(LazyRawFieldExpr<'top, BinaryEncoding_1_1>, usize)>> {
let mut buffer = input;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could/should this method return a ParseResult<LazyRawFieldExpr<_>> instead so the caller gets the remaining input? (And can deduce the length as desired.)

EDIT: I see now that this is being used to populate the offsets cache. I think this is another place where it would make sense to store the LazyRawFieldExpr in the cache instead of just the offset to avoid later rework.

Comment on lines 301 to 304
if offsets.len() <= 1 {
None
} else {
let offset = offsets.first().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let offset = offsets.first()?;

Comment on lines +79 to +82
(0xF, 0x0) => (DelimitedContainerClose, low_nibble, None),
(0xF, 0x1) => (ListDelimited, low_nibble, Some(IonType::List)),
(0xF, 0x2) => (SExpressionDelimited, low_nibble, Some(IonType::SExp)),
(0xF, 0x3) => (StructDelimited, low_nibble, Some(IonType::Struct)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this jump table. I've found myself wondering if we should add a higher-level OpcodeKind enum alongside the OpcodeType that captures some of the natural groupings that currently require testing for several OpcodeType variants. For example, having:

  • DelimitedContainer for 0xF1/0xF2/0xF3
  • EExp for 0x00 .. 0x5F
  • etc

No TODO here, just thinking out loud.

@nirosys nirosys force-pushed the rgiliam/1_1_binary_reader_delimited_containers branch from 05d0296 to 127bf4c Compare July 9, 2024 21:19
@nirosys nirosys requested a review from zslayton July 9, 2024 21:47
@@ -712,6 +713,57 @@ mod tests {
Ok(())
}

#[test]
fn nested_sequence() -> IonResult<()> {
let ion_data: &[u8] = &[0xF1, 0x61, 0x01, 0xF1, 0x61, 0x02, 0xF0, 0x61, 0x03, 0xF0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment with the text equivalent?

struct_type: match opcode_type {
// TODO: Delimited struct handling
OpcodeType::Struct => StructType::SymbolAddress,
// bytes_to_skip: 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover?

@@ -125,6 +129,14 @@ impl Opcode {
self.low_nibble
}

pub fn is_delimited(&self) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe is_delimited_start for symmetry with is_delimited_end?

pub struct RawBinaryStructIterator_1_1<'top> {
source: ImmutableBuffer<'top>,
bytes_to_skip: usize,
struct_type: StructType,
// bytes_to_skip: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover?

@zslayton
Copy link
Contributor

Can we merge this or is there something blocking it?

@nirosys
Copy link
Contributor Author

nirosys commented Aug 12, 2024

Can we merge this or is there something blocking it?

Just had to push up the merge from main. CI failed, looking into that now.

@nirosys
Copy link
Contributor Author

nirosys commented Aug 12, 2024

CI fixed, and feedback addressed. Merging.

@nirosys nirosys merged commit bc2e976 into amazon-ion:main Aug 12, 2024
22 of 26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants