-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds read/write support for system eexps, placeholders for remaining system macros #862
Conversation
* Macros now impleement `Send`, allowing them to be shared across threads. This makes it possible for all readers and writers to share the same system macro table instance instead of allocating their own copy. * Adds a `SystemMacroAddress` type that is range-checked at creation. * Adds a `SystemAddress` variant to the `MacroIdRef` enum. * Adds `Opcode::from_byte` support for `SystemEExpression`. * `Parameter` and `Macro` now use `CompactString` to store their names, skipping all heap allocations in the common case.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #862 +/- ##
==========================================
- Coverage 77.83% 77.82% -0.01%
==========================================
Files 136 136
Lines 34021 34199 +178
Branches 34021 34199 +178
==========================================
+ Hits 26479 26617 +138
- Misses 5602 5626 +24
- Partials 1940 1956 +16 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ PR Tour 🧭
@@ -53,6 +53,7 @@ base64 = "0.12" | |||
# chrono < 0.5 brings in a deprecated version of the `time` crate via `oldtime` feature by default | |||
# this makes it explicitly not do this as there is an advisory warning against this: | |||
# See: https://github.com/chronotope/chrono/issues/602 | |||
compact_str = "0.8.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ There are many crates offering an optimized small string type--here's a comparison of the available options.
This one is actively developed and stack-allocates strings up to 24 bytes long, which is enough to cover most parameter and macro names.
benches/read_many_structs.rs
Outdated
'timestamp': timestamp, | ||
'threadId': thread_id, | ||
'threadName': (make_string "scheduler-thread-" thread_name), | ||
'timestamp': (%timestamp), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This benchmark had broken following recent changes to TDL syntax.
.unwrap() | ||
.read_all_elements()?; | ||
|
||
let mut text_1_0_group = c.benchmark_group("text 1.0"); | ||
// Visit each top level value in the stream without reading it. | ||
text_1_0_group.bench_function("scan all", |b| { | ||
b.iter(|| { | ||
let mut reader = Reader::new(v1_1::Text, text_1_0_data.as_slice()).unwrap(); | ||
let mut reader = Reader::new(v1_0::Text, text_1_0_data.as_slice()).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wouldn't have mattered much if at all, but we were using a v1_1 text reader to read 1.0 text.
@@ -726,6 +726,10 @@ impl BinaryValueWriter_1_1<'_, '_> { | |||
MacroIdRef::LocalAddress(_address) => { | |||
todo!("macros with addresses greater than {MAX_20_BIT_ADDRESS}"); | |||
} | |||
MacroIdRef::SystemAddress(address) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Binary write logic for system macro invocations.
@@ -25,7 +25,10 @@ pub struct TextAnnotatedValueWriter_1_1<'value, W: Write> { | |||
} | |||
|
|||
impl<'value, W: Write + 'value> AnnotatableWriter for TextValueWriter_1_1<'value, W> { | |||
type AnnotatedValueWriter<'a> = TextAnnotatedValueWriter_1_1<'a, W> where Self: 'a; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ New rustfmt
release has new opinions.
thread_local! { | ||
/// An instance of the Ion 1.1 system macro table is lazily instantiated once per thread, | ||
/// minimizing the number of times macro compilation occurs. | ||
/// | ||
/// The thread-local instance holds `Rc` references to its macro names and macro definitions, | ||
/// making its contents inexpensive to `clone()` and reducing the number of duplicate `String`s | ||
/// being allocated over time. | ||
/// | ||
/// Each time the user constructs a reader, it clones the thread-local copy of the system macro | ||
/// table. | ||
pub static ION_1_1_SYSTEM_MACROS: MacroTable = MacroTable::construct_system_macro_table(); | ||
} | ||
/// A lazily initialized singleton instance of the Ion 1.1 system macro table. | ||
/// | ||
/// This instance is shared by all readers, minimizing the number of times that macro compilation | ||
/// and heap allocation occurs. | ||
pub static ION_1_1_SYSTEM_MACROS: LazyLock<MacroTable> = | ||
LazyLock::new(MacroTable::construct_system_macro_table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Originally, each reader or writer would instantiate its own copy of the system macro table. This involved several heap allocations and some compilation logic that could feel quite heavyweight for a short-lived stream.
To avoid paying this cost for every reader and writer, I made the system macro table a thread-local value. Only the first reader or writer on a thread would pay the initialization cost, and Macro
and Parameter
instances could continue using lightweight Rc
s to share data within the thread.
However, this introduced a problem: unlike static
values that live for the duration of the program, thread locals only live for the duration of their thread; they do not have a named lifetime. You can only access a thread local value via a closure, and you cannot retain a reference to it. As a result, it wasn't possible to get a reference (&
) to a system macro from the thread local system macro table—only an Rc
.
As a compromise, I used a thread local to build the first system macro table—performing the necessary macro compilation steps—and then had subsequent readers and writers clone()
that instance of the table. This allowed most readers and writers to avoid the cost of compiling system macros, but they still had to pay for the heap allocations. Not terribly satisfying.
Recently, in Rust v1.80, the LazyLock
type was stabilized. It uses synchronization primitives to manage initializing the value, but all accesses thereafter can get a &
reference for 'free'.
By switching the internals of the Macro
type from Rc<str>
to CompactString
and Rc<Macro>
to Arc<Macro>
, I was able to create an immutable, lazily-initialized global singleton system macro table. References to the system macros are now trivial to get and can be reused across all readers and writers.
num_annotations: 0, | ||
}), | ||
}, | ||
ExpansionAnalysis::single_application_value(IonType::String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Note the use of named ExpansionAnalysis
constructors for clarity.
@@ -707,8 +938,9 @@ impl MacroTable { | |||
} | |||
} | |||
|
|||
// TODO: Remove this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Once we implement the encoding module sequence, we will no longer initialize the local macro table by cloning the system macro table.
@@ -731,6 +963,9 @@ impl MacroTable { | |||
match id { | |||
MacroIdRef::LocalName(name) => self.macro_with_name(name), | |||
MacroIdRef::LocalAddress(address) => self.macro_at_address(address), | |||
MacroIdRef::SystemAddress(system_address) => { | |||
ION_1_1_SYSTEM_MACROS.macro_at_address(system_address.as_usize()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Example of getting a &
reference to a system macro.
@@ -1177,8 +1177,49 @@ impl<'top> TextBuffer<'top> { | |||
Ok((exp_body_after_id, id)) | |||
} | |||
|
|||
pub fn match_system_eexp_id(self) -> IonParseResult<'top, MacroIdRef<'top>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Text parsing logic for $ion::<name>
and $ion::<addr>
. Later on we can generalize this for any qualified macro reference.
@@ -127,6 +127,7 @@ impl<'top, D: Decoder> EExpression<'top, D> { | |||
environment = self.new_evaluation_environment()?; | |||
MacroExpansionKind::Template(TemplateExpansion::new(template_ref)) | |||
} | |||
MacroKind::ToDo => todo!("system macro {}", invoked_macro.name().unwrap()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Good idea :D
Builds on PR #861.
MacroTable
now implementsSend
, allowing instances to be shared across threads. This made it possible for the system symbol table to be a singleton for all readers. To achieve this:Rc
references to otherMacro
instances are nowArc
. This incurs higher overhead for new macro definitions at compile time, but no overhead at usage time.Reader
andWriter
instantiation benefit by not having to create a new system macro table instance.Rc<str>
instances used to store parameter and macro names have been replaced byCompactString
, which doesn't allocate for strings that are<=24
bytes long.$ion::<addr>
and$ion::<name>
e-expressions.0xEF
system e-expressions.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.