Adds `ion-tests` integration for the lazy reader #639

zslayton · 2023-09-06T18:25:33Z

Adds ion-tests integration for the lazy reader. Also addresses a number of bugs surfaced by the conformance tests.

Fixes issue #636.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

codecov · 2023-09-06T18:29:41Z

Codecov Report

Patch coverage is 92.14% of modified lines.

Files Changed	Coverage
src/lazy/struct.rs	`ø`
src/lazy/value.rs	`ø`
src/lazy/value_ref.rs	`0.00%`
src/lazy/text/raw/struct.rs	`83.33%`
src/lazy/binary/immutable_buffer.rs	`86.32%`
src/lazy/text/raw/sequence.rs	`88.23%`
src/lazy/text/matched.rs	`93.67%`
src/lazy/system_reader.rs	`94.73%`
src/lazy/text/buffer.rs	`95.03%`
src/lazy/binary/raw/reader.rs	`100.00%`
... and 8 more

zslayton

🗺️ PR tour

zslayton · 2023-09-06T18:28:41Z

src/lazy/binary/immutable_buffer.rs

🗺️ Previously, the binary reader had a monolithic peek_value method that contained logic for reading fields, annotations, NOPs, and the values themselves. I refactored this so that each has its own code path with clearer invariants. This made it easier to track down a NOP-related buffer indexing problem.

zslayton · 2023-09-06T18:33:40Z

src/lazy/binary/raw/reader.rs

-        self.buffer = buffer;
+        // If the value we read doesn't start where we began reading, there was a NOP.
+        let num_nop_bytes = lazy_value.input.offset() - buffer.offset();
+        self.buffer = buffer.consume(num_nop_bytes);


🗺️ This was the buffer indexing bug I referred to in an earlier comment.
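To illustrate the offset arithmetic in the fix above, here is a minimal sketch. The `Buffer` type, `skip_nops`, and `nop_bytes_before_value` are hypothetical stand-ins written for this example, not the crate's `ImmutableBuffer` API. The sketch also only handles one-byte NOP pads (`0x00`), whereas Ion 1.0 allows longer ones.

```rust
/// Hypothetical stand-in for the reader's immutable buffer: a byte slice plus
/// the absolute stream offset of its first byte.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Buffer<'a> {
    data: &'a [u8],
    offset: usize,
}

impl<'a> Buffer<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Buffer { data, offset: 0 }
    }

    /// Absolute position of this view's first byte within the stream.
    pub fn offset(&self) -> usize {
        self.offset
    }

    /// Returns a new view that starts `n` bytes later in the stream.
    pub fn consume(&self, n: usize) -> Self {
        Buffer {
            data: &self.data[n..],
            offset: self.offset + n,
        }
    }
}

/// Skips one-byte NOP pads (0x00) and returns the view positioned at the next value.
pub fn skip_nops(buffer: Buffer<'_>) -> Buffer<'_> {
    let mut current = buffer;
    while current.data.first() == Some(&0x00) {
        current = current.consume(1);
    }
    current
}

/// If the value does not start where reading began, the difference is NOP padding.
pub fn nop_bytes_before_value(start: Buffer<'_>, value_start: Buffer<'_>) -> usize {
    value_start.offset() - start.offset()
}
```

The point of the fix is that the reader's saved buffer must advance past the padding as well, otherwise later offsets are computed relative to the wrong starting position.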

zslayton · 2023-09-06T18:34:44Z

src/lazy/binary/raw/value.rs

-        // Skip the type descriptor
-        let input = self.input.consume(1);
+        // Skip the type descriptor and length bytes
+        let input = ImmutableBuffer::new(self.value_body()?);


🗺️ This placeholder impl lived longer than it was supposed to, and always slipped past the tests because I didn't dummy up any decimals that were > 13 bytes long.
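For context on why 13 bytes is the threshold: in Ion 1.0 binary, the low nibble of the type descriptor holds the length for values up to 13 bytes, and a nibble of 14 means a VarUInt length follows. The sketch below computes how many header bytes precede a value's body. `header_length` is a hypothetical helper written for this example (the PR itself uses `value_body()`), and it ignores type-specific exceptions such as bools and sorted structs.

```rust
/// Returns the number of header bytes (type descriptor plus any VarUInt length
/// bytes) that precede the body of an Ion 1.0 binary value, or `None` if the
/// input is truncated. Hypothetical helper; type-specific exceptions are ignored.
pub fn header_length(bytes: &[u8]) -> Option<usize> {
    let type_descriptor = *bytes.first()?;
    let length_code = type_descriptor & 0x0F;
    if length_code != 14 {
        // Lengths 0 through 13 (and the null marker, 15) fit in the descriptor itself.
        return Some(1);
    }
    // A length code of 14 means a VarUInt follows. Its final byte has the high bit set.
    let var_uint_len = bytes[1..].iter().position(|&b| (b & 0x80) != 0)? + 1;
    Some(1 + var_uint_len)
}
```

Skipping a fixed single byte, as the old placeholder did, is only correct in the first branch. A decimal of 14 or more bytes takes the second branch, which is why short test values never exposed the problem.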

zslayton · 2023-09-06T18:40:36Z

src/lazy/reader.rs

🗺️ I tried to standardize the reader level names a bit more. There are three levels: Raw, System, and Application. Each level has three supported decoders: Text, Binary, or Any, where Any uses enum dispatch to abstract over Text or Binary.

LazyRawBinaryReader, LazyRawTextReader, LazyRawAnyReader

LazySystemBinaryReader, LazySystemTextReader, LazySystemAnyReader

LazyApplicationBinaryReader, LazyApplicationTextReader, LazyApplicationAnyReader

Because users will almost always interact with the Application level, that is the default and type aliases are provided to allow the level to be omitted.

LazyBinaryReader, LazyTextReader

Similarly, most users won't want to have to specify the format, so they can just use the LazyReader to get a LazyApplicationAnyReader.
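The alias layering described above could look roughly like the following sketch. The marker types, the `Decoder` trait, and the `format` method are assumptions made for illustration. The real reader is generic over `LazyDecoder<'data>` and carries a `'data` lifetime, both omitted here.

```rust
use std::marker::PhantomData;

/// Hypothetical marker types standing in for the crate's decoders.
pub struct TextEncoding;
pub struct BinaryEncoding;
pub struct AnyEncoding;

/// Hypothetical, simplified stand-in for the decoder trait.
pub trait Decoder {
    const NAME: &'static str;
}
impl Decoder for TextEncoding {
    const NAME: &'static str = "text";
}
impl Decoder for BinaryEncoding {
    const NAME: &'static str = "binary";
}
impl Decoder for AnyEncoding {
    const NAME: &'static str = "any";
}

/// One generic implementation per level; the format is a type parameter.
pub struct LazyApplicationReader<D: Decoder> {
    _decoder: PhantomData<D>,
}

impl<D: Decoder> LazyApplicationReader<D> {
    pub fn new() -> Self {
        LazyApplicationReader { _decoder: PhantomData }
    }

    pub fn format(&self) -> &'static str {
        D::NAME
    }
}

// Fully qualified names: level and format are both spelled out.
pub type LazyApplicationBinaryReader = LazyApplicationReader<BinaryEncoding>;
pub type LazyApplicationTextReader = LazyApplicationReader<TextEncoding>;
pub type LazyApplicationAnyReader = LazyApplicationReader<AnyEncoding>;

// Application is the default level, so the level can be omitted.
pub type LazyBinaryReader = LazyApplicationBinaryReader;
pub type LazyTextReader = LazyApplicationTextReader;

// Most users do not care about the format either.
pub type LazyReader = LazyApplicationAnyReader;
```

Under this arrangement the aliases add no new implementations. They only name particular instantiations of the one generic reader at each level.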

zslayton · 2023-09-06T18:42:40Z

src/lazy/system_reader.rs

@@ -107,7 +124,7 @@ impl<'data, D: LazyDecoder<'data>> LazySystemReader<'data, D> {
            return Ok(false);
        }
        if let Some(symbol_ref) = lazy_value.annotations().next() {
-            return Ok(symbol_ref? == ION_SYMBOL_TABLE);
+            return Ok(symbol_ref?.matches_sid_or_text(3, "$ion_symbol_table"));


🗺️ Now that there's a text reader, we cannot simply look at symbol IDs when comparing annotations or field names.
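A minimal sketch of the comparison this implies: a raw symbol token may carry either a symbol ID (typical in binary) or text (typical in text Ion), so a match has to accept either form. The `RawSymbol` enum is a hypothetical stand-in for the crate's symbol reference type. Only the method name `matches_sid_or_text` comes from the diff above.

```rust
/// Hypothetical stand-in for a raw symbol token: either an unresolved
/// symbol ID or literal text, depending on how it was encoded.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RawSymbol<'a> {
    SymbolId(usize),
    Text(&'a str),
}

impl<'a> RawSymbol<'a> {
    /// Returns true if this token is the given symbol ID or the given text.
    pub fn matches_sid_or_text(&self, sid: usize, text: &str) -> bool {
        match self {
            RawSymbol::SymbolId(s) => *s == sid,
            RawSymbol::Text(t) => *t == text,
        }
    }
}
```

With this shape, `$3` read from binary and `$ion_symbol_table` read from text both identify a local symbol table annotation.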

zslayton · 2023-09-06T19:12:35Z

src/lazy/text/raw/reader.rs

+        if let RawStreamItem::VersionMarker(major, minor) = matched {
+            if (major, minor) != (1, 0) {
+                return IonResult::decoding_error(format!(
+                    "Ion version {major}.{minor} is not supported"
+                ));
+            }
+        }


🗺️ There are ion-tests which expect the reader to reject future Ion versions.

zslayton · 2023-09-06T19:13:02Z

src/lazy/text/raw/reader.rs

@@ -191,8 +199,8 @@ mod tests {
            // Second item
            2 /*comment before comma*/,
            // Third item
-            3
-        ]
+            3 ,]


🗺️ I added a trailing comma while debugging.

zslayton · 2023-09-06T19:13:35Z

src/lazy/text/raw/sequence.rs

+                .slice_to_end(1)
+                .match_optional_comments_and_whitespace()
+                .with_context("skipping a list's trailing comma", input_after_ws)?;
+        }


🗺️ If a list had a trailing comma, this would break.

zslayton · 2023-09-06T19:15:47Z

src/lazy/text/raw/struct.rs

+                .slice_to_end(1)
+                .match_optional_comments_and_whitespace()
+                .with_context("skipping a list's trailing comma", input_after_ws)?;
+        }


🗺️ If the struct had a trailing comma, this would break.

zslayton · 2023-09-06T19:17:02Z

src/lazy/value_ref.rs

+        let value: Value = value_ref.try_into()?;
+        Ok(value.into())
+    }
+}


🗺️ This enables: let element: Element = reader.next()?.read()?.try_into()?;
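The conversion chain in the diff (borrowed value, then owned value, then element) can be sketched as follows. `ValueRef`, `Value`, and `Element` here are simplified, hypothetical stand-ins with two variants each, and the error type is `Utf8Error` purely for illustration. The crate's own types and error handling differ.

```rust
use std::convert::{TryFrom, TryInto};
use std::str::Utf8Error;

/// Hypothetical borrowed view of a value whose text has not been validated yet.
#[derive(Debug, PartialEq)]
pub enum ValueRef<'a> {
    Int(i64),
    Text(&'a [u8]),
}

/// Hypothetical owned value.
#[derive(Debug, PartialEq)]
pub enum Value {
    Int(i64),
    String(String),
}

/// Hypothetical owned element wrapping a value.
#[derive(Debug, PartialEq)]
pub struct Element {
    pub value: Value,
}

impl From<Value> for Element {
    fn from(value: Value) -> Self {
        Element { value }
    }
}

impl<'a> TryFrom<ValueRef<'a>> for Value {
    type Error = Utf8Error;

    /// Materializing lazily read data can fail, so the conversion is fallible.
    fn try_from(value_ref: ValueRef<'a>) -> Result<Self, Self::Error> {
        Ok(match value_ref {
            ValueRef::Int(i) => Value::Int(i),
            ValueRef::Text(bytes) => Value::String(std::str::from_utf8(bytes)?.to_owned()),
        })
    }
}

impl<'a> TryFrom<ValueRef<'a>> for Element {
    type Error = Utf8Error;

    /// Mirrors the two-step body shown in the diff: ValueRef -> Value -> Element.
    fn try_from(value_ref: ValueRef<'a>) -> Result<Self, Self::Error> {
        let value: Value = value_ref.try_into()?;
        Ok(value.into())
    }
}
```

Because the standard library provides `TryInto` for every `TryFrom` impl, callers can finish a chain of fallible reads with a single `.try_into()?`.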

zslayton

🗺️ Restoring PR tour comments that GitHub decided to remove/hide as "outdated."

zslayton · 2023-09-06T19:48:52Z

src/lazy/binary/immutable_buffer.rs

🗺️ Previously, the binary reader had a monolithic peek_value method that contained logic for reading fields, annotations, NOPs, and the values themselves. I refactored this so that each has its own code path with clearer invariants. This made it easier to track down a NOP-related buffer indexing problem.

zslayton · 2023-09-06T19:49:53Z

src/lazy/reader.rs

🗺️ I tried to normalize the reader level names a bit more. There are three levels: Raw, System, and Application. Each level has three supported decoders: Text, Binary, or Any, where Any uses enum dispatch to abstract over Text or Binary.

LazyRawBinaryReader, LazyRawTextReader, LazyRawAnyReader

LazySystemBinaryReader, LazySystemTextReader, LazySystemAnyReader

LazyApplicationBinaryReader, LazyApplicationTextReader, LazyApplicationAnyReader

Because users will almost always interact with the Application level, that is the default and type aliases are provided to allow the level to be omitted.

LazyBinaryReader, LazyTextReader

Similarly, most users won't want to have to specify the format, so they can just use the LazyReader to get a LazyApplicationAnyReader.

Oh, I missed this comment when I was reviewing the code. However, I think my question is still valid. Why do we need the encoding distinction at every level of reader?

popematt

Overall it looks good. I do have one question though.

You've made a "Binary", "Text", and "Any" implementation for the "Raw", "System", and "Application" readers. Is there any reason that the text and binary distinction has to be pushed up that far? Would it be possible to create a RawAnyReader that is an enum of binary and text raw readers so that higher levels of readers can have just one implementation?

popematt · 2023-09-07T18:20:34Z

src/lazy/reader.rs

@@ -55,11 +56,11 @@ use crate::{IonError, IonResult};
 ///# Ok(())
 ///# }
 /// ```
-pub struct LazyReader<'data, D: LazyDecoder<'data>> {
+pub struct LazyApplicationReader<'data, D: LazyDecoder<'data>> {


Doc comments need to be updated. It still refers to this as a "binary" reader.

popematt · 2023-09-07T18:26:37Z

src/lazy/text/buffer.rs

@@ -357,7 +366,7 @@ impl<'data> TextBufferView<'data> {
    /// input bytes where the field name is found, and the value.
    pub fn match_struct_field_name_and_value(
        self,
-    ) -> IonParseResult<'data, ((MatchedSymbol, Range<usize>), LazyRawTextValue<'data>)> {
+    ) -> IonParseResult<'data, ((MatchedFieldName, Range<usize>), LazyRawTextValue<'data>)> {


{'''f''' '''o''' '''o''': '''b''' '''a''' '''r'''}

🤯

popematt · 2023-09-07T18:27:40Z

src/lazy/text/buffer.rs

@@ -701,7 +722,7 @@ impl<'data> TextBufferView<'data> {

    /// Matches the three parts of an int--its base, its sign, and its digits--without actually
    /// constructing an Int from them.
-    fn match_int(self) -> IonParseResult<'data, MatchedInt> {
+    pub fn match_int(self) -> IonParseResult<'data, MatchedInt> {


Just curious—why does this need to be pub now?

I wanted each of the methods that matched a full Ion type to be reusable from outside the buffer (but not outside the crate without more consideration).

zslayton · 2023-09-07T20:43:09Z

> Overall it looks good. I do have one question though.
>
> You've made a "Binary", "Text", and "Any" implementation for the "Raw", "System", and "Application" readers. Is there any reason that the text and binary distinction has to be pushed up that far? Would it be possible to create a RawAnyReader that is an enum of binary and text raw readers so that higher levels of readers can have just one implementation?

The old reader uses a Box<dyn IonReader> to abstract over the raw reader and format, and unfortunately it adds a really substantial amount of overhead. A quick test I threw together showed it was about 25% slower than using static dispatch with a single format. The Any readers use enum dispatch, and my assumption is that it will be far faster than dyn (branch predictors are great!), but I haven't quantified the performance impact yet. If it's a slowdown, users focused on performance will probably want the option to work with a format-specific reader.

Note that LazyApplicationReader<D> is pub no matter what, so we're really just talking about what type aliases to provide.
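To make the two dispatch strategies concrete, here is a minimal sketch contrasting an enum-dispatched reader with a `Box<dyn ...>` one. The `Reader` trait, the reader structs, and both constructor functions are hypothetical and far simpler than the crate's readers. The sketch shows only the mechanism and makes no claim about relative performance.

```rust
/// Hypothetical, minimal reader interface.
pub trait Reader {
    fn format(&self) -> &'static str;
}

pub struct BinaryReader;
pub struct TextReader;

impl Reader for BinaryReader {
    fn format(&self) -> &'static str {
        "binary"
    }
}

impl Reader for TextReader {
    fn format(&self) -> &'static str {
        "text"
    }
}

/// Enum dispatch: the set of formats is closed, so each call is a `match`
/// that forwards to a statically known implementation.
pub enum AnyReader {
    Binary(BinaryReader),
    Text(TextReader),
}

impl Reader for AnyReader {
    fn format(&self) -> &'static str {
        match self {
            AnyReader::Binary(r) => r.format(),
            AnyReader::Text(r) => r.format(),
        }
    }
}

/// The Ion 1.0 binary version marker.
const BINARY_IVM: [u8; 4] = [0xE0, 0x01, 0x00, 0xEA];

/// Chooses a reader by checking for the binary version marker.
pub fn any_reader_for(data: &[u8]) -> AnyReader {
    if data.starts_with(&BINARY_IVM) {
        AnyReader::Binary(BinaryReader)
    } else {
        AnyReader::Text(TextReader)
    }
}

/// Dynamic dispatch: the same choice, but each call goes through a vtable
/// on a heap-allocated trait object.
pub fn boxed_reader_for(data: &[u8]) -> Box<dyn Reader> {
    if data.starts_with(&BINARY_IVM) {
        Box::new(BinaryReader)
    } else {
        Box::new(TextReader)
    }
}
```

A caller who already knows the format can use `BinaryReader` or `TextReader` directly and avoid both the `match` and the vtable, which is the option the format-specific aliases keep open.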

zslayton added 30 commits July 24, 2023 16:54

Top-level nulls, bools, ints

e0a83d8

Consolidate impls of AsUtf8 w/helper fn

89f79aa

Improved TextBufferView docs, removed DataSource

840be4d

Adds lazy text floats

5db1ff0

Adds LazyRawTextReader support for comments

07d4a70

Adds LazyRawTextReader support for reading strings

181e0a5

clippy fixes

357ca8f

Fix a couple of unit tests

716ff34

Less ambitious float eq comparison

e29fec5

Adds LazyRawTextReader support for reading symbols

8f79a36

Adds more doc comments

4cb9b2b

More doc comments

54470d2

Adds LazyRawTextReader support for reading lists

78014e7

Adds LazyRawTextReader support for structs

a6a3aa8

More doc comments

4fc9078

Adds LazyRawTextReader support for reading IVMs

11174ac

Initial impl of a LazyRawAnyReader

719dbaa

Improved comments.

f603872

Adds LazyRawTextReader support for annotations

4696ca5

Adds lazy reader support for timestamps

c7129ac

Lazy reader support for s-expressions

44435ea

Fixed doc comments

d50e05b

Fix internal doc link

8283422

Adds lazy reader support for decimals

0f01099

Fixed bad unit test example case

b60f1fe

clippy fixes

915c83a

Adds lazy reader support for blobs

fe922ff

Adds lazy reader support for long strings

066ddd8

Merged long string matcher tests into overall string tests

c58e5f0

wip

6b5ce1c

zslayton added 7 commits September 1, 2023 16:55

Merge main, complete support for clobs

e45ec35

clippy suggestion

a3f8a21

Adds lazy reader support for clobs

62be7c9

clippy suggestion

0eacd3a

Fix newline normalization, add unit tests

175009d

comment cleanup

3421393

Adds ion-tests integration for the lazy reader.

7efd52b

zslayton changed the base branch from main to lazy-clobs September 6, 2023 18:32

zslayton commented Sep 6, 2023

cleanup

0a127b7

zslayton marked this pull request as ready for review September 6, 2023 19:27

zslayton requested review from jobarr-amzn, popematt and desaikd September 6, 2023 19:28

zslayton commented Sep 6, 2023

zslayton mentioned this pull request Sep 7, 2023

Adds lazy reader support for reading clobs #638

Merged

Base automatically changed from lazy-clobs to main September 7, 2023 12:07

Merge remote-tracking branch 'origin/main' into lazy-reader-ion-tests

5958ac8

popematt approved these changes Sep 7, 2023

zslayton merged commit 700e983 into main Sep 7, 2023

zslayton deleted the lazy-reader-ion-tests branch September 7, 2023 20:45

zslayton added a commit that referenced this pull request Sep 7, 2023

Feedback from PR #639

6cf0936

zslayton added a commit that referenced this pull request Sep 7, 2023

Incorporates pending feedback from lazy reader PRs (#642)

ec91888

Feedback from PRs: * #609 * #614 * #616 * #619 * #620 * #627 * #628 * #638 * #639
