Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds LazyRawTextReader support for reading lists #617

Merged
merged 16 commits into from
Aug 23, 2023
Merged

Adds LazyRawTextReader support for reading lists #617

merged 16 commits into from
Aug 23, 2023

Conversation

zslayton
Copy link
Contributor

@zslayton zslayton commented Aug 3, 2023

Builds on outstanding PRs #609, #612, #613, #614 and #616.

Adds support for reading lists to the LazyRawTextReader.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link
Contributor Author

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ PR Tour

// Get as much of the sequence's body as is available in the input buffer.
// Reading a child value may fail as `Incomplete`
let buffer_slice = self.value.available_body();
RawSequenceIterator::new(buffer_slice)
RawBinarySequenceIterator::new(buffer_slice)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Now that there are both binary and text readers, RawSequenceIterator has become the RawBinarySequenceIterator.

@@ -33,34 +34,6 @@ impl<'data> LazyDecoder<'data> for BinaryEncoding {
// === Placeholders ===
// The types below will need to be properly defined in order for the lazy text reader to be complete.
// The exist to satisfy various trait definitions.
#[derive(Debug, Clone)]
pub struct ToDoTextSequence;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This PR defines the LazyRawTextSequence type and removes the stubbed-out ToDoTextSequence type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This file contains the new parsing logic needed to detect the beginning of a list, a value in the list, and the end of the list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ All changes in this file stem from the change on line 175; see the TODO comment.

}
}

impl<'a> Debug for LazyRawTextSequence<'a> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This was copied from the binary sequence's impl. It tries to read the sequence's contents on a best-effort basis. We may be able to think up something more robust, but as a Debug impl on an experimental API, I'm ok with this for the moment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion—can we just use self.value.input.data (assuming it's valid utf-8) and if it fails, then we fallback to the Debug impl of the TextBufferView? This seems like an approach that could work for all lazy text values. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Opened #641 to track this.

Ok((_remaining, None)) => None,
Err(e) => e
.with_context("reading the next list value", self.input)
.transpose(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This call to transpose() is the ToIteratorOutput trait's extension method.

Comment on lines 175 to 178
// TODO: Decide what (if anything) to store here.
// Storing any collection of bytes or ranges means that this type cannot implement Copy,
// which in turn means MatchedValue and EncodedTextValue also cannot implement Copy.
// We probably also don't want to heap allocate just to match the long string.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I see it, a long string is basically a list of strings that gets treated as a single string. I would lean toward storing nothing here, just like MatchedValue::List. While we probably do need a heap allocation to read the string, I agree that we should not be performing heap allocations to match the string (or any value, really) since that is unnecessarily eager.

}
}

impl<'a> Debug for LazyRawTextSequence<'a> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion—can we just use self.value.input.data (assuming it's valid utf-8) and if it fails, then we fallback to the Debug impl of the TextBufferView? This seems like an approach that could work for all lazy text values. What do you think?

Base automatically changed from lazy-symbols to main August 23, 2023 00:31
@codecov
Copy link

codecov bot commented Aug 23, 2023

Codecov Report

Patch coverage: 52.06% and project coverage change: -0.09% ⚠️

Comparison is base (dc8579d) 81.72% compared to head (0883ec5) 81.64%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #617      +/-   ##
==========================================
- Coverage   81.72%   81.64%   -0.09%     
==========================================
  Files         119      120       +1     
  Lines       21778    21876      +98     
  Branches    21778    21876      +98     
==========================================
+ Hits        17799    17860      +61     
- Misses       2331     2366      +35     
- Partials     1648     1650       +2     
Files Changed Coverage Δ
src/lazy/encoding.rs 0.00% <ø> (ø)
src/lazy/text/encoded_value.rs 64.76% <0.00%> (-0.63%) ⬇️
src/lazy/text/matched.rs 69.80% <ø> (ø)
src/lazy/text/parse_result.rs 25.56% <0.00%> (-1.00%) ⬇️
src/lazy/text/raw/sequence.rs 25.00% <25.00%> (ø)
src/lazy/text/value.rs 36.36% <66.66%> (+4.10%) ⬆️
src/lazy/text/raw/reader.rs 92.98% <86.66%> (-0.48%) ⬇️
src/lazy/binary/raw/sequence.rs 60.93% <100.00%> (ø)
src/lazy/text/buffer.rs 88.73% <100.00%> (+0.37%) ⬆️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zslayton zslayton merged commit cb1042a into main Aug 23, 2023
@zslayton zslayton deleted the lazy-lists branch August 23, 2023 00:44
@zslayton zslayton self-assigned this Aug 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants