diff --git a/rfcs/2021-12-20-10517-llvm-backend-for-vrl.md b/rfcs/2021-12-20-10517-llvm-backend-for-vrl.md
new file mode 100644
index 0000000000000..064392f2aa580
--- /dev/null
+++ b/rfcs/2021-12-20-10517-llvm-backend-for-vrl.md
@@ -0,0 +1,815 @@
+# RFC 10517 - 2021-12-20 - LLVM Backend for VRL
+
+Performance is a key aspect of VRL. We aim to provide a language that is
+["extremely fast and efficient"](https://github.com/vectordotdev/vector/blob/f1404bea186ba83c4426a32bbef3f633c17cf4d2/website/cue/reference/remap/features/compilation.cue#L6)
+and
+["ergonomically safe in that it makes it difficult to create slow or buggy VRL programs"](https://github.com/vectordotdev/vector/blob/f1404bea186ba83c4426a32bbef3f633c17cf4d2/website/cue/reference/remap/features/ergonomic_safety.cue#L4).
+Moving towards this goal, we propose to speed up general program execution by
+using LLVM to eliminate the runtime overhead that is currently associated with
+interpreting a VRL program.
+
+## Common Misconceptions
+
+Below we present a list of statements that have commonly come up when
+discussing the current execution model of VRL. We illustrate where these
+thoughts come from and why the reality behind them is often counter-intuitive.
+
+### "VRL programs are compiled to and run as native Rust code" [↪](https://github.com/vectordotdev/vector/blob/f1404bea186ba83c4426a32bbef3f633c17cf4d2/website/cue/reference/remap/features/compilation.cue#L4)
+
+Vector's codebase consists entirely of code written in Rust, so one might be
+inclined to conclude that anything running inside of Vector is running as
+"native Rust" code. This is true from a computability point of view - VRL
+processes events in a way that is semantically indistinguishable from
+transformations that were hand-written in Rust. However, execution time is
+determined by implementation details, not just by the semantic definition of
+the computation.
+
+In particular, moving from "we implement a mechanism that can perform
+sufficiently general computation within our program to execute program logic"
+to "we implement a program that executes program logic" removes a hidden layer
+of indirection and makes a surprisingly large
+[difference](#the-time-spent-within-the-vrl-interpretervm-itself-is-small-removing-this-overhead-can-hardly-result-in-significant-performance-improvements),
+even if the aforementioned mechanism is written in a high-performance language.
+
+In its current implementation, "VRL programs are compiled to a representation
+that is interpreted by native Rust code" would be a more fitting description.
+
+### "VRL programs are extremely fast and efficient, with performance characteristics very close to Rust itself" [↪](https://github.com/vectordotdev/vector/blob/f1404bea186ba83c4426a32bbef3f633c17cf4d2/website/cue/reference/remap/features/compilation.cue#L6)
+
+Many code paths taken during execution of a VRL program have been compiled by
+Rust, e.g. when inserting or deleting paths of a VRL value, parsing JSON or
+matching regular expressions. These paths are highly optimized and don't incur
+any runtime overhead on top of what the Rust compiler is able to produce.
+
+However, the top-level control flow of a VRL program is orchestrated at
+runtime and therefore differs from a semantically equivalent transformation
+that has been implemented in Rust. The main difference is that in a compiled
+Rust program, the control flow between expressions is known statically (except
+for conditionals and error handling).
+We elaborate on the **disproportionate** effects this has on performance
+further
+[below](#the-time-spent-within-the-vrl-interpretervm-itself-is-small-removing-this-overhead-can-hardly-result-in-significant-performance-improvements).
+
+### "VRL has no runtime [...]" [↪](https://github.com/vectordotdev/vector/blob/f1404bea186ba83c4426a32bbef3f633c17cf4d2/website/cue/reference/remap/features/compilation.cue#L7)
+
+There is no way to point the CPU's instruction pointer at a VRL
+[`Program`](https://github.com/vectordotdev/vector/blob/f1404bea186ba83c4426a32bbef3f633c17cf4d2/lib/vrl/compiler/src/program.rs#L5-L10)
+to execute it. Instead, execution relies on a runtime that interprets a VRL
+program by implementing a
+[`resolve`](https://github.com/vectordotdev/vector/blob/f1404bea186ba83c4426a32bbef3f633c17cf4d2/lib/vrl/compiler/src/expression.rs#L51-L56)
+method for each expression. This fits the very definition of
+["_any behavior not directly attributable to the program itself_"](https://en.wikipedia.org/wiki/Runtime_system#Overview)
+and even exists as
+[`Runtime`](https://github.com/vectordotdev/vector/blob/80268ee9b66def9e8cba848b29371013b7cd8b9c/lib/vrl/core/src/runtime.rs#L11-L15)
+in our code.
+
+### "The time spent within the VRL interpreter/VM itself is small, removing this overhead can hardly result in significant performance improvements"
+
+When inspecting flamegraphs of VRL program execution, one can see that, as a
+very rough estimate, no more than 25% of the time is spent in the interpreter
+itself without progressing the program state. Removing that share shortens the
+total runtime to 75%, a speedup of 1/0.75 ≈ 1.33x. So how could removing this
+overhead ever yield a performance increase bigger than 33%?
+
+The answer has largely to do with the
+[memory bottleneck in the von Neumann architecture](https://en.wikipedia.org/wiki/Von_Neumann_architecture#Von_Neumann_bottleneck)
+and the mechanisms modern CPU architectures employ to mitigate it.
+
+The CPU can improve execution speed of subsequent CPU instructions by using
+[instruction pipelining](https://en.wikipedia.org/wiki/Instruction_pipelining),
+as long as these instructions are not fragmented between unpredictable paths.
+When conditional branches exist but are heavily biased, the CPU can
+[speculatively execute](https://en.wikipedia.org/wiki/Branch_predictor)
+instructions and read/write from main memory to hide the
+[latency of memory access](https://en.wikipedia.org/wiki/Memory_hierarchy#Examples),
+which is _orders of magnitude_ higher than accessing CPU registers/caches or
+executing arithmetic operations.
+
+In the current execution model, the CPU is not able to predict any control
+flow at the boundaries between VRL (sub)expressions, severely limiting CPU
+utilization by [stalling](https://en.wikipedia.org/wiki/CPU_cache#CPU_stalls)
+the CPU.
+
+### "Optimizing for single core performance is not as important when one can resort to parallelism first"
+
+When a problem looks
+[embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel),
+squeezing out single-core performance may not seem worthwhile, since adding
+more threads would seemingly always have an outsized effect.
+
+However, even a small number of synchronization points can have a detrimental
+impact on optimal performance. According to
+[Amdahl's law](https://en.wikipedia.org/wiki/Amdahl%27s_law), even when only 5%
+of a program cannot be parallelized, the maximum performance increase with an
+_infinite_ number of threads is capped at 20x.
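+
+To see where the 20x figure comes from: with a parallelizable fraction `p` of
+the program and `N` threads, Amdahl's law bounds the achievable speedup by
+
+```latex
+S(N) = \frac{1}{(1 - p) + \frac{p}{N}},
+\qquad
+\lim_{N \to \infty} S(N) = \frac{1}{1 - p} = \frac{1}{0.05} = 20
+\quad \text{for } p = 0.95.
+```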
+
+### "LLVM is a virtual machine"
+
+Judging from its name, one might assume that LL**VM** stands for "... virtual
+machine"[^1] and that it's merely a more general and sophisticated VM
+implementation than our upcoming special-purpose VRL virtual machine.
+
+However, even though LLVM provides a virtual instruction set architecture, it
+is an intermediate representation that exists only during the compilation
+process between high-level language and machine code, **without** any
+interpretation at runtime.
+
+One essential part of LLVM is its optimization passes that act on LLVM IR,
+e.g. by inlining functions, merging code branches, promoting memory access to
+register access, performing constant folding, batching allocations and more.
+
+Rust uses LLVM to emit machine code, and we intend to employ the exact same
+technique.
+
+## Context / Cross cutting concerns
+
+There's ongoing work on implementing a VM for VRL:
+[#10011](https://github.com/vectordotdev/vector/pull/10011). While it reduces
+the interpretation overhead compared to the current expression traversal, it
+doesn't eliminate the overhead entirely. More importantly, it doesn't
+fundamentally improve behavior for speculative execution / branch prediction,
+since the CPU can't predict the next instruction in the interpreter loop.
+
+## Scope
+
+### In scope
+
+Introducing a new execution model to VRL that directly executes machine code
+without runtime interpretation overhead, which users can opt into.
+
+### Out of scope
+
+Overall, this is an experiment to gauge how much performance can be gained by
+using LLVM. We will not roll out the new backend for production use yet and
+need to investigate the specific security and performance needs of our users
+going forward.
+
+Any optimization that applies to all execution models for VRL (traversal, VM
+and LLVM) is out of scope for this consideration, e.g. improving access paths
+to VRL
+[`Value`](https://github.com/vectordotdev/vector/blob/f1404bea186ba83c4426a32bbef3f633c17cf4d2/lib/vrl/compiler/src/value.rs#L21-L32)s.
+
+## Pain
+
+Performance investigations of various Vector topologies suggested that the
+single-core performance of VRL is a bottleneck in many cases.
+
+## Proposal
+
+### User Experience
+
+The semantics of VRL stay **unchanged**. Any case where a VRL program is not
+strictly faster in the LLVM execution model than under traversal or the VM is
+considered a definite bug.
+
+This is an unconditional win for user experience.
+
+### Introduction to LLVM
+
+To get familiar with what LLVM IR looks like and how its code generation
+builder works, I recommend reading through the official tutorial
+["Kaleidoscope: Code generation to LLVM IR"](https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl03.html).
+
+There exists an adapted version of the
+[LLVM Kaleidoscope tutorial in Rust](https://github.com/TheDan64/inkwell/blob/master/examples/kaleidoscope/main.rs)
+for [`inkwell`](https://github.com/TheDan64/inkwell), a crate that exposes a
+safe wrapper around [LLVM's C API](https://llvm.org/doxygen/group__LLVMC.html).
+
+Another great reference is Mukul Rathi's
+["A Complete Guide to LLVM for Programming Language Creators"](https://mukulrathi.com/create-your-own-programming-language/llvm-ir-cpp-api-tutorial/).
+
+Godbolt's [Compiler Explorer](https://godbolt.org/) is a great way to
+understand how compilers emit LLVM IR.
+E.g. running
+
+```rust
+#[no_mangle]
+pub extern "C" fn foo(n: i32) -> i32 {
+    n * 42
+}
+```
+
+through the compiler and setting the `rustc` argument to `--emit=llvm-ir -O`
+emits
+
+```llvm
+define i32 @foo(i32 %n) unnamed_addr #0 !dbg !6 {
+  %0 = mul i32 %n, 42, !dbg !10
+  ret i32 %0, !dbg !11
+}
+```
+
+Running `rustc ./program.rs --crate-type=lib --emit=llvm-ir -O` locally will
+accomplish the same.
+
+### Implementation
+
+On a high level, the goal is to produce executable machine code for a VRL
+program by emitting LLVM IR. When Vector launches, the VRL program is parsed,
+translated to LLVM IR, compiled to machine code via LLVM and dynamically
+loaded into the running process. The resulting `vrl_execute` function symbol
+is then resolved from the binary and called for each event to be transformed.
+
+Instead of recursively calling
+[`resolve`](https://github.com/vectordotdev/vector/blob/f1404bea186ba83c4426a32bbef3f633c17cf4d2/lib/vrl/compiler/src/expression.rs#L51-L56)
+on an
+[`Expression`](https://github.com/vectordotdev/vector/blob/f1404bea186ba83c4426a32bbef3f633c17cf4d2/lib/vrl/compiler/src/expression.rs#L50),
+we add an `emit_llvm` method to the trait:
+
+```rust
+/// Emit LLVM IR that computes the `Value` for this expression.
+fn emit_llvm<'ctx>(
+    &self,
+    state: &crate::state::Compiler,
+    context: &mut crate::llvm::Context<'ctx>,
+) -> Result<(), String>;
+```
+
+where `Context` is defined as
+
+```rust
+pub struct Context<'ctx> {
+    context: &'ctx inkwell::context::Context,
+    execution_engine: inkwell::execution_engine::ExecutionEngine<'ctx>,
+    module: inkwell::module::Module<'ctx>,
+    builder: inkwell::builder::Builder<'ctx>,
+    function: inkwell::values::FunctionValue<'ctx>,
+    context_ref: inkwell::values::PointerValue<'ctx>,
+    result_ref: inkwell::values::PointerValue<'ctx>,
+    ...
+}
+```
+
+We will preserve the existing `resolve` methods for the time being since they
+provide a great reference for the current semantics of VRL and can serve as a
+target for automated correctness tests.
+
+By convention, each expression can call `context.result_ref()` to get an LLVM
+`PointerValue` with a pointer to where the
+[`Resolved`](https://github.com/vectordotdev/vector/blob/f1404bea186ba83c4426a32bbef3f633c17cf4d2/lib/vrl/compiler/src/expression.rs#L48)
+value should be stored.
+
+The result pointer can be temporarily changed using `context.set_result_ref()`.
+This mechanism allows a parent expression to call `emit_llvm` on a child while
+controlling where the machine code emitted for the child expression stores its
+result. This is useful e.g. when emitting a binary operation where both of its
+operands need to be computed first.
+
+Calling `context.context_ref()` returns a reference to the VRL
+[`Context`](https://github.com/vectordotdev/vector/blob/f1404bea186ba83c4426a32bbef3f633c17cf4d2/lib/vrl/compiler/src/context.rs#L5-L9)
+that is provided by any Vector component that uses VRL internally.
+
+For anything less trivial than emitting branches or calling functions, we want
+to leverage the Rust compiler. For one thing, this means we don't have to
+concern ourselves with memory layout and aren't forced to define an FFI, as
+long as we only use basic integer types or pointers/references. It also
+provides us with Rust's memory safety guarantees for large parts of the
+emitted LLVM IR.
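+
+To make the loading step described above concrete, here is a minimal sketch
+(not the actual implementation) of how the compiled module could be JIT-loaded
+in-process and its `vrl_execute` symbol resolved via `inkwell`. The import
+paths for VRL's `Context` and `Resolved` types are assumed for illustration:
+
+```rust
+use inkwell::execution_engine::JitFunction;
+use inkwell::module::Module;
+use inkwell::OptimizationLevel;
+use vrl_compiler::{Context, Resolved};
+
+// Signature of the generated entry point: the VRL `Context` and a slot for
+// the `Resolved` result are passed by pointer, as described above.
+type VrlExecuteFn = unsafe extern "C" fn(*mut Context, *mut Resolved);
+
+fn run_compiled(module: &Module, ctx: &mut Context, result: &mut Resolved) {
+    // JIT-compile the module in-process; `Aggressive` roughly corresponds to
+    // running LLVM's -O3 pass pipeline.
+    let engine = module
+        .create_jit_execution_engine(OptimizationLevel::Aggressive)
+        .expect("failed to create JIT execution engine");
+
+    // Resolve the `vrl_execute` symbol from the freshly emitted machine code.
+    let execute: JitFunction<VrlExecuteFn> =
+        unsafe { engine.get_function("vrl_execute") }.expect("symbol not found");
+
+    // Call the generated function, once per event to be transformed.
+    unsafe { execute.call(ctx, result) };
+}
+```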
+
+Specifically, the idea is to compose VRL's functionality entirely via the
+following LLVM instructions only:
+
+- `alloca` stack allocations
+- `br` conditional and unconditional branching
+- `call`s to Rust stubs
+- `global`s for constants
+
+This keeps the implementation quite maintainable for Rust programmers, even
+those with only a superficial understanding of LLVM, while the emitted LLVM IR
+can still be sufficiently optimized by LLVM's optimization passes.
+
+Most expressions rely predominantly on precompiled Rust and build LLVM IR
+similar to the following:
+
+```rust
+let fn_ident = "vrl_resolved_initialize";
+let fn_impl = ctx
+    .module()
+    .get_function(fn_ident)
+    .ok_or(format!(r#"failed to get "{}" function"#, fn_ident))?;
+ctx.builder()
+    .build_call(fn_impl, &[result_ref.into()], fn_ident);
+```
+
+which will emit the LLVM instruction
+
+```llvm
+call void @vrl_resolved_initialize(%"std::result::Result"* %result)
+```
+
+This call refers to a function compiled by Rust into the same LLVM module.
+However, no actual call needs to survive into the machine code: since the
+whole source code is known, LLVM can inline and optimize the implementation.
+This way of emitting LLVM IR is merely a very convenient way to stitch
+together code fragments.
+
+For temporary values needed e.g. in binary operations, we can allocate
+uninitialized stack values:
+
+```rust
+let result_temp_ref = ctx.build_alloca_resolved("temp")?;
+```
+
+which will emit the LLVM instruction
+
+```llvm
+%temp = alloca %"std::result::Result", align 8
+```
+
+It is our responsibility to initialize and drop the value accordingly. This
+can be accomplished by calling the implementations of `vrl_resolved_initialize`
+and `vrl_resolved_drop` as shown further below. To facilitate safe usage, we
+can expose a module builder API where allocating a temporary value immediately
+inserts a call to `vrl_resolved_initialize`, and a call to `vrl_resolved_drop`
+once the builder value is dropped from the scope. In combination with the
+[`llvm.lifetime.start`](https://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic)
+and
+[`llvm.lifetime.end`](https://llvm.org/docs/LangRef.html#llvm-lifetime-end-intrinsic)
+intrinsics, this should guard us against use of uninitialized values or usage
+after the value has been dropped.
+
+VRL constants can be moved into the LLVM module by consuming the constant
+value of type `T` on the Rust side and transmuting it to `[i8]` written to an
+LLVM global. This is safe since Rust's semantics allow all types to be moved
+in memory unless they are pinned (`Pin`). Writing constants into the LLVM
+module has the benefit of allowing LLVM to apply constant folding at compile
+time. To guarantee that the resources that may have been allocated on the Rust
+side for creating the VRL constant are cleaned up properly, we transmute it
+back to `T` when unloading the LLVM module and drop it accordingly.
+
+Below we show a preliminary, work-in-progress excerpt of the precompiled
+functions. The LLVM module will be initialized with the resulting bitcode.
+Consequently, these function symbols may no longer exist at runtime if they
+are optimized out by LLVM.
+
+```rust
+#[no_mangle]
+pub extern "C" fn vrl_resolved_initialize(result: *mut Resolved) {
+    unsafe { result.write(Ok(Value::Null)) };
+}
+
+#[no_mangle]
+pub extern "C" fn vrl_resolved_drop(result: *mut Resolved) {
+    drop(unsafe { result.read() });
+}
+
+#[no_mangle]
+pub extern "C" fn vrl_resolved_is_err(result: &mut Resolved) -> bool {
+    result.is_err()
+}
+
+#[no_mangle]
+pub extern "C" fn vrl_resolved_boolean_is_true(result: &Resolved) -> bool {
+    result.as_ref().unwrap().as_boolean().unwrap()
+}
+
+#[no_mangle]
+pub extern "C" fn vrl_expression_assignment_target_insert_external_impl(
+    ctx: &mut Context,
+    path: &LookupBuf,
+    resolved: &Resolved,
+) {
+    let value = resolved.as_ref().unwrap().clone();
+    let _ = ctx.target_mut().insert(path, value);
+}
+
+#[no_mangle]
+pub extern "C" fn vrl_expression_literal_impl(value: &Value, result: &mut Resolved) {
+    *result = Ok(value.clone());
+}
+
+#[no_mangle]
+pub extern "C" fn vrl_expression_op_eq_impl(rhs: &mut Resolved, result: &mut Resolved) {
+    let rhs = std::mem::replace(rhs, Ok(Value::Null));
+    *result = match (result.clone(), rhs) {
+        (Ok(lhs), Ok(rhs)) => Ok(Value::Boolean(rhs == lhs)),
+        _ => unimplemented!(),
+    };
+}
+
+#[no_mangle]
+pub extern "C" fn vrl_expression_query_target_external_impl(
+    context: &mut Context,
+    path: &LookupBuf,
+    result: &mut Resolved,
+) {
+    *result = Ok(context
+        .target()
+        .get(path)
+        .ok()
+        .flatten()
+        .unwrap_or(Value::Null));
+}
+```
+
+With the precompiled library, we can emit code in terms of it by utilizing
+stack allocations, branches and function calls only. E.g. the LLVM IR for the
+following VRL program:
+
+```vrl
+if .status == 123 {
+  .foo = "bar"
+}
+```
+
+would look like this:
+
+```llvm
+; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn
+define void @vrl_execute(%"vrl_compiler::Context"* noalias nocapture align 8 dereferenceable(32) %context, %"std::result::Result"* noalias nocapture align 8 dereferenceable(88) %result) unnamed_addr #55 {
+start:
+  br label %if_statement_begin
+
+if_statement_begin:                               ; preds = %start
+  br label %"op_==_begin"
+
+"op_==_begin":                                    ; preds = %if_statement_begin
+  call void @vrl_expression_query_target_external_impl(%"vrl_compiler::Context"* %context, %"lookup_buf::LookupBuf"* bitcast ([32 x i8]* @status to %"lookup_buf::LookupBuf"*), %"std::result::Result"* %result)
+  %rhs = alloca %"std::result::Result", align 8
+  call void @vrl_resolved_initialize(%"std::result::Result"* %rhs)
+  br label %literal_begin
+
+literal_begin:                                    ; preds = %"op_==_begin"
+  call void @vrl_expression_literal_impl(%"memmem::SearcherKind"* bitcast ([40 x i8]* @"123" to %"memmem::SearcherKind"*), %"std::result::Result"* %rhs)
+  call void @vrl_expression_op_eq_impl(%"std::result::Result"* %rhs, %"std::result::Result"* %result)
+  call void @vrl_resolved_drop(%"std::result::Result"* %rhs)
+  %vrl_resolved_boolean_is_true = call i1 @vrl_resolved_boolean_is_true(%"std::result::Result"* %result)
+  br i1 %vrl_resolved_boolean_is_true, label %if_statement_if_branch, label %if_statement_else_branch
+
+if_statement_end:                                 ; preds = %if_statement_else_branch, %block_end
+  ret void
+
+if_statement_if_branch:                           ; preds = %literal_begin
+  br label %block_begin
+
+if_statement_else_branch:                         ; preds = %literal_begin
+  br label %if_statement_end
+
+block_begin:                                      ; preds = %if_statement_if_branch
+  br label %assignment_single_begin
+
+block_end:                                        ; preds = %block_next, %block_error
+  br label %if_statement_end
+
+block_error:                                      ; preds = %assignment_single_end
+  br label %block_end
+
+assignment_single_begin:                          ; preds = %block_begin
+  br label %literal_begin1
+
+assignment_single_end:                            ; preds = %literal_begin1
+  %vrl_resolved_is_err = call i1 @vrl_resolved_is_err(%"std::result::Result"* %result)
+  br i1 %vrl_resolved_is_err, label %block_error, label %block_next
+
+literal_begin1:                                   ; preds = %assignment_single_begin
+  call void @vrl_expression_literal_impl(%"memmem::SearcherKind"* bitcast ([40 x i8]* @"\22bar\22" to %"memmem::SearcherKind"*), %"std::result::Result"* %result)
+  call void @vrl_expression_assignment_target_insert_external_impl(%"vrl_compiler::Context"* %context, %"lookup_buf::LookupBuf"* bitcast ([32 x i8]* @foo to %"lookup_buf::LookupBuf"*), %"std::result::Result"* %result)
+  br label %assignment_single_end
+
+block_next:                                       ; preds = %assignment_single_end
+  br label %block_end
+}
+```
+
+After running several LLVM optimization passes over the LLVM IR:
+
+```llvm
+; Function Attrs: nofree norecurse nosync nounwind readnone uwtable willreturn
+define void @vrl_execute(%142* noalias nocapture align 8 dereferenceable(32) %0, %752* noalias nocapture align 8 dereferenceable(88) %1) unnamed_addr #87 personality i32 (i32, i32, i64, %462*, %9*)* @rust_eh_personality {
+  %3 = alloca %529*, align 8
+  %4 = alloca %752, align 8
+  %5 = alloca [5 x i64], align 8
+  %6 = alloca %135, align 8
+  %7 = alloca %116, align 8
+  %8 = alloca %135, align 8
+  tail call void @vrl_expression_query_target_external_impl(%142* nonnull %0, %74* bitcast ([32 x i8]* @16146 to %74*), %752* nonnull %1) #104
+  %9 = alloca %752, align 8
+  %10 = getelementptr inbounds %752, %752* %9, i64 0, i32 0
+  store i64 0, i64* %10, align 8
+  %11 = getelementptr inbounds %752, %752* %9, i64 0, i32 1
+  %12 = bitcast [10 x i64]* %11 to i8*
+  store i8 8, i8* %12, align 8
+  tail call void @llvm.experimental.noalias.scope.decl(metadata !99596)
+  %13 = bitcast [5 x i64]* %5 to i8*
+  call void @llvm.lifetime.start.p0i8(i64 40, i8* nonnull %13), !noalias !99596
+  %14 = bitcast [5 x i64]* %5 to %135*
+  call fastcc void @17203(%135* noalias nocapture nonnull dereferenceable(40) %14, %135* nonnull align 8 dereferenceable(40) bitcast ([40 x i8]* @16147 to %135*)) #104, !noalias !99596
+  %15 = bitcast [10 x i64]* %11 to %135*
+  invoke fastcc void @17183(%135* nonnull %15)
+          to label %18 unwind label %16
+
+common.resume:                                    ; preds = %49, %16
+  %common.resume.op = phi { i8*, i32 } [ %17, %16 ], [ %50, %49 ]
+  resume { i8*, i32 } %common.resume.op
+
+16:                                               ; preds = %2
+  %17 = landingpad { i8*, i32 }
+          cleanup
+  store i64 0, i64* %10, align 8, !alias.scope !99596
+  call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(40) %12, i8* noundef nonnull align 8 dereferenceable(40) %13, i64 40, i1 false)
+  br label %common.resume
+
+18:                                               ; preds = %2
+  store i64 0, i64* %10, align 8, !alias.scope !99596
+  call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(40) %12, i8* noundef nonnull align 8 dereferenceable(40) %13, i64 40, i1 false)
+  call void @llvm.lifetime.end.p0i8(i64 40, i8* nonnull %13), !noalias !99596
+  call void @vrl_expression_op_eq_impl(%752* nonnull %9, %752* nonnull %1) #104
+  %19 = bitcast %752* %4 to i8*
+  call void @llvm.lifetime.start.p0i8(i64 88, i8* nonnull %19)
+  %20 = bitcast %752* %9 to i8*
+  call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(88) %19, i8* noundef nonnull align 8 dereferenceable(88) %20, i64 88, i1 false) #104
+  %21 = getelementptr inbounds %752, %752* %4, i64 0, i32 0
+  %22 = load i64, i64* %21, align 8, !range !220, !alias.scope !99599
+  %23 = icmp eq i64 %22, 0
+  %24 = getelementptr inbounds %752, %752* %4, i64 0, i32 1
+  br i1 %23, label %25, label %27
+
+25:                                               ; preds = %18
+  %26 = bitcast [10 x i64]* %24 to %135*
+  call fastcc void @17183(%135* nonnull %26) #104
+  br label %29
+
+27:                                               ; preds = %18
+  %28 = bitcast [10 x i64]* %24 to %529*
+  call void @17184(%529* nonnull %28)
+  br label %29
+
+29:                                               ; preds = %25, %27
+  call void @llvm.lifetime.end.p0i8(i64 88, i8* nonnull %19)
+  %30 = getelementptr %752, %752* %1, i64 0, i32 0
+  %31 = load i64, i64* %30, align 8, !range !220
+  %32 = getelementptr inbounds %752, %752* %1, i64 0, i32 1
+  %33 = icmp eq i64 %31, 0
+  br i1 %33, label %38, label %34
+
+34:                                               ; preds = %29
+  %35 = bitcast %529** %3 to i8*
+  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %35), !noalias !99602
+  %36 = bitcast %529** %3 to [10 x i64]**
+  store [10 x i64]* %32, [10 x i64]** %36, align 8, !noalias !99602
+  %37 = bitcast %529** %3 to {}*
+  call void @_ZN4core6result13unwrap_failed17h0f27636d1d025391E([0 x i8]* noalias nonnull readonly align 1 bitcast (<{ [43 x i8] }>* @13883 to [0 x i8]*), i64 43, {}* nonnull align 1 %37, [3 x i64]* noalias readonly align 8 dereferenceable(24) bitcast (<{ i8*, [16 x i8], i8*, [0 x i8] }>* @6297 to [3 x i64]*), %71* noalias nonnull readonly align 8 dereferenceable(24) bitcast (<{ i8*, [16 x i8] }>* @6302 to %71*)) #104
+  unreachable
+
+38:                                               ; preds = %29
+  %39 = bitcast [10 x i64]* %32 to %135*
+  %40 = bitcast [10 x i64]* %32 to i8*
+  %41 = load i8, i8* %40, align 8, !range !1540
+  %42 = icmp eq i8 %41, 3
+  %43 = getelementptr inbounds %135, %135* %39, i64 0, i32 1, i64 0
+  %44 = load i8, i8* %43, align 1
+  %45 = select i1 %42, i8 %44, i8 2
+  br label %NodeBlock
+
+NodeBlock:                                        ; preds = %38
+  %Pivot = icmp slt i8 %45, 2
+  br i1 %Pivot, label %LeafBlock, label %LeafBlock21
+
+LeafBlock21:                                      ; preds = %NodeBlock
+  %SwitchLeaf22 = icmp eq i8 %45, 2
+  br i1 %SwitchLeaf22, label %46, label %NewDefault
+
+LeafBlock:                                        ; preds = %NodeBlock
+  %SwitchLeaf = icmp eq i8 %45, 0
+  br i1 %SwitchLeaf, label %47, label %NewDefault
+
+46:                                               ; preds = %LeafBlock21
+  call void @_ZN4core9panicking5panic17h367b69984712bd50E([0 x i8]* noalias nonnull readonly align 1 bitcast (<{ [43 x i8] }>* @13881 to [0 x i8]*), i64 43, %71* noalias nonnull readonly align 8 dereferenceable(24) bitcast (<{ i8*, [16 x i8] }>* @6303 to %71*)) #104
+  unreachable
+
+47:                                               ; preds = %LeafBlock, %71
+  ret void
+
+NewDefault:                                       ; preds = %LeafBlock21, %LeafBlock
+  br label %48
+
+48:                                               ; preds = %NewDefault
+  call void @llvm.experimental.noalias.scope.decl(metadata !99605)
+  call void @llvm.lifetime.start.p0i8(i64 40, i8* nonnull %13), !noalias !99605
+  call fastcc void @17203(%135* noalias nocapture nonnull dereferenceable(40) %14, %135* nonnull align 8 dereferenceable(40) bitcast ([40 x i8]* @16148 to %135*)) #104, !noalias !99605
+  invoke fastcc void @17183(%135* nonnull %39)
+          to label %51 unwind label %49
+
+49:                                               ; preds = %48
+  %50 = landingpad { i8*, i32 }
+          cleanup
+  store i64 0, i64* %30, align 8, !alias.scope !99605
+  call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(40) %40, i8* noundef nonnull align 8 dereferenceable(40) %13, i64 40, i1 false)
+  br label %common.resume
+
+51:                                               ; preds = %48
+  store i64 0, i64* %30, align 8, !alias.scope !99605
+  call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(40) %40, i8* noundef nonnull align 8 dereferenceable(40) %13, i64 40, i1 false)
+  call void @llvm.lifetime.end.p0i8(i64 40, i8* nonnull %13), !noalias !99605
+  call void @llvm.experimental.noalias.scope.decl(metadata !99608)
+  %52 = getelementptr inbounds %135, %135* %8, i64 0, i32 0
+  call void @llvm.lifetime.start.p0i8(i64 40, i8* nonnull %52), !noalias !99611
+  call fastcc void @17203(%135* noalias nocapture nonnull dereferenceable(40) %8, %135* nonnull align 8 dereferenceable(40) %39) #104, !noalias !99611
+  %53 = bitcast %116* %7 to i8*
+  call void @llvm.lifetime.start.p0i8(i64 24, i8* nonnull %53), !noalias !99611
+  %54 = getelementptr inbounds %142, %142* %0, i64 0, i32 0, i32 0
+  %55 = load {}*, {}** %54, align 8, !alias.scope !99613, !noalias !99616, !nonnull !1
+  %56 = getelementptr inbounds %142, %142* %0, i64 0, i32 0, i32 1
+  %57 = load [3 x i64]*, [3 x i64]** %56, align 8, !alias.scope !99613, !noalias !99616, !nonnull !1
+  %58 = getelementptr inbounds %135, %135* %6, i64 0, i32 0
+  call void @llvm.lifetime.start.p0i8(i64 40, i8* nonnull %58), !noalias !99611
+  call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(40) %58, i8* noundef nonnull align 8 dereferenceable(40) %52, i64 40, i1 false), !noalias !99611
+  %59 = getelementptr inbounds [3 x i64], [3 x i64]* %57, i64 0, i64 4
+  %60 = bitcast i64* %59 to void (%116*, {}*, %74*, %135*)**
+  %61 = load void (%116*, {}*, %74*, %135*)*, void (%116*, {}*, %74*, %135*)** %60, align 8, !invariant.load !1, !noalias !99611, !nonnull !1
+  call void %61(%116* noalias nocapture nonnull sret(%116) dereferenceable(24) %7, {}* nonnull align 1 %55, %74* noalias nonnull readonly align 8 dereferenceable(32) bitcast ([32 x i8]* @16149 to %74*), %135* noalias nocapture nonnull dereferenceable(40) %6) #104, !noalias !99608
+  call void @llvm.lifetime.end.p0i8(i64 40, i8* nonnull %58), !noalias !99611
+  %62 = getelementptr inbounds %116, %116* %7, i64 0, i32 0
+  %63 = load {}*, {}** %62, align 8, !noalias !99611
+  %64 = icmp eq {}* %63, null
+  %65 = bitcast {}* %63 to i8*
+  br i1 %64, label %71, label %66
+
+66:                                               ; preds = %51
+  %67 = getelementptr inbounds %116, %116* %7, i64 0, i32 1, i64 0
+  %68 = load i64, i64* %67, align 8, !noalias !99611
+  %69 = icmp eq i64 %68, 0
+  br i1 %69, label %71, label %70
+
+70:                                               ; preds = %66
+  call void @__rust_dealloc(i8* nonnull %65, i64 %68, i64 1) #104, !noalias !99608
+  br label %71
+
+71:                                               ; preds = %51, %66, %70
+  call void @llvm.lifetime.end.p0i8(i64 24, i8* nonnull %53), !noalias !99611
+  call void @llvm.lifetime.end.p0i8(i64 40, i8* nonnull %52), !noalias !99611
+  br label %47
+}
+```
+
+Note the batched stack allocations, inlining of function calls and
+consolidation of control flow.
+
+The behavior that occurs when panicking inside the Rust stubs can be
+controlled by linking either the `panic_unwind*.bc` or `panic_abort*.bc`
+files, analogous to
+[setting the `panic` key in `Cargo.toml`](https://doc.rust-lang.org/book/ch09-01-unrecoverable-errors-with-panic.html#unwinding-the-stack-or-aborting-in-response-to-a-panic).
+We use the same strategy as our Vector binary, which currently uses the
+default "unwind".
+
+### Testing Strategy
+
+Unit tests: Add tests for the code generation of each expression in isolation,
+making sure that the emitted LLVM IR passes static analysis and the generated
+code produces the expected result. This covers edge cases specific to each
+expression.
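+
+A test of this kind could look roughly as follows. This is only a sketch: the
+`compile_to_module`, `run_jit` and `run_interpreter` helpers (and the `event`
+fixture) are hypothetical names for machinery the tests would need to provide,
+the `compile` call is simplified, and `Module::verify` is LLVM's IR verifier
+as exposed by `inkwell`:
+
+```rust
+#[test]
+fn op_eq_emits_valid_ir() {
+    // Compile a minimal program that exercises exactly one expression kind.
+    let program = compile(r#".status == 123"#).unwrap();
+    let module = compile_to_module(&program);
+
+    // Static analysis: LLVM's verifier rejects structurally invalid IR.
+    module.verify().expect("emitted LLVM IR must be well-formed");
+
+    // Correctness: the machine code must agree with the `resolve`-based
+    // interpreter, which serves as the reference semantics.
+    assert_eq!(run_jit(&module, event()), run_interpreter(&program, event()));
+}
+```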
+
+Behavior tests: Make sure that the existing test corpus living in
+[`lib/vrl/tests/tests`](https://github.com/vectordotdev/vector/blob/master/lib/vrl/tests/tests)
+and
+[`tests/behavior/transforms/remap.toml`](https://github.com/vectordotdev/vector/blob/master/tests/behavior/transforms/remap.toml)
+passes when using the LLVM-based execution engine.
+
+Benchmark tests: Run micro-benchmarks that compare the runtime of VRL scripts
+of varying complexity for each execution mode. The LLVM-based approach should
+conceptually always be the fastest. Should we discover a case where that
+doesn't hold, we can examine it closely to see where the other execution
+engines apply optimizations that we left out.
+
+Soak tests: Run end-to-end tests to observe the impact on overall performance
+in a pipeline. Again, the LLVM-based approach should be fastest in every case.
+The overall speedup will also largely depend on how heavy the remap script is
+and on whether other components bottleneck the pipeline.
+
+Fuzz tests: Run VRL programs that combine arbitrary, automatically generated
+expressions. This can uncover faults in fringe edge cases that only occur when
+specific expressions interact with each other. We can use the existing
+execution modes to cross-validate for correctness.
+
+Manual review: While static analysis tools and automated tests prevent a
+certain class of bugs, there are still many logic errors that can occur in
+`unsafe` code and lead to memory corruption when invariants are violated. As
+one measure, we can save LLVM IR in textual form in
+[`lib/vrl/tests/tests/expressions`](https://github.com/vectordotdev/vector/blob/master/lib/vrl/tests/tests/expressions)
+such that manual verification of generated code can be incorporated into the
+pull request review process. We might also add review guidelines that e.g.
+require adding an `unsafe` label to the pull request (ideally this can be
+automated) and request reviewers to explicitly acknowledge that they have
+reviewed the unsafe blocks in question.
+
+## Rationale
+
+As long as the single-core performance of VRL is the bottleneck of a topology,
+general performance improvements to VRL are extremely valuable, as they equate
+to an equally sized performance improvement for the entire topology.
+
+We want to live up to the
+[performance guarantees](https://github.com/vectordotdev/vector/blob/f1404bea186ba83c4426a32bbef3f633c17cf4d2/website/cue/reference/remap/features/compilation.cue#L4-L8)
+outlined in VRL's list of features.
+
+Every VRL program benefits from the reduced runtime overhead, without us
+needing to optimize any specific use case. Consistent execution speed is
+important to build trust in the language.
+
+Being able to execute our log transformation DSL at speeds which would
+otherwise only be attainable by hand-writing Rust programs will strengthen a
+key value proposition of Vector: best-in-class performance.
+
+## Drawbacks
+
+By generating machine code via LLVM, we are no longer (largely) immune to
+memory violations. To reduce the error surface as much as possible, we employ
+industry practices such as fuzz-testing VRL programs and static analysis via
+LLVM, and we rely on the Rust compiler for any non-trivial code fragments.
+That being said, memory safety guarantees always rely on a small amount of
+trust that cannot be automatically verified - this time we ourselves are
+responsible for maintaining a small set of invariants, instead of being able
+to defer to a third party for correctness. We still provide an inherently
+memory safe language to the user.
+
+Producing LLVM bitcode for Rust's `std` library is guarded behind the
+[`-Z build-std`](https://doc.rust-lang.org/cargo/reference/unstable.html#build-std)
+flag and only available on the nightly compiler toolchain. We need `std` to
+fully link our precompiled LLVM bitcode. It's possible to circumvent the
+nightly requirement by setting the `RUSTC_BOOTSTRAP=1` environment variable,
+such that we have a `std` that is built using the same Rust and LLVM version
+as Vector. We isolate the usage of this hack by building a separate crate with
+`std` only, and link it to the library bitcode in a build step.
+
+Statically linking LLVM to the Vector binary adds roughly 9 MB, in addition to
+the precompiled bitcode that needs to be included with the binary. If this is
+a concern, we can consider shipping binaries with the LLVM feature disabled.
+
+While LLVM is a widely used framework within the industry, working with it
+requires rather specialized knowledge about compiler construction. However,
+there exists plenty of publicly accessible material on code generation using
+LLVM, some of which I linked to [above](#introduction-to-llvm). In addition,
+we should take great care to document and test this part of the codebase
+extraordinarily well.
+
+## Alternatives
+
+### Compile to Rust
+
+Using Rust as a compilation target would require us to
+
+- ship a Rust compiler and its libraries
+- ship Vector source code and its dependent crates
+
+which would be hundreds of MB and therefore infeasible.
+
+### Compile to C
+
+Using C as a compilation target would require us to
+
+- ship a C compiler
+
+while
+
+- not having any better safety guarantees
+- not being able to inline functions, and therefore missing optimization
+  potential
+
+and would therefore not provide any significant benefits over using LLVM
+directly.
+
+### Compile to WebAssembly
+
+Using WebAssembly as a compilation target would require us to
+
+- ship a Wasm runtime
+- copy data in and out of WebAssembly or use `mmap`ing techniques which would
+  constrain in which memory regions event data must reside
+
+On the upside, WebAssembly provides a higher abstraction level and semantics
+that allow executing untrusted code safely, albeit at the cost of slower
+execution speed.
+
+Tangentially related to this consideration stands the fact that we recently
+dropped support for the WebAssembly transform.
+
+### Compile to Bytecode
+
+As mentioned in the context, we are currently moving forward with a VM for VRL.
+Compared to the current execution model and an LLVM-based approach, the VM
+provides a middle ground for execution speed, memory safety and sophistication.
+
+Weighing the benefits depends on the real-world performance of both approaches.
+
+## Plan Of Attack
+
+Incremental steps to execute this change. These will be converted to issues
+after the RFC is approved:
+
+- Submit a PR with spike-level code _roughly_ demonstrating the change:
+  [#10442](https://github.com/vectordotdev/vector/pull/10442).
+- Extract a core library from VRL for exposing its types with minimal
+  dependencies, necessary to reduce the size of the precompiled bitcode.
+- Get feature parity close enough to run first soak tests against the current
+  execution model, to get a first peek at end-to-end performance.
+- Define conventions around optional, named and compiled function arguments.
+- Refine code generation by taking into account type information.
+- Add unit tests for each expression in isolation.
+- Add fuzz tests that cross-validate results of all three execution modes.
+- Investigate whether heap allocations use the same strategy as our main
+  Vector binary and are covered by our regular performance analysis tools.
+
+---
+
+[^1]:
+    It certainly doesn't help that "LLVM was originally an initialism for Low
+    Level Virtual Machine". However, the "LLVM abbreviation has officially
+    been removed to avoid confusion, as LLVM has evolved into an umbrella
+    project that has little relationship to what most current developers think
+    of as (more specifically) process virtual machines."
+    [↪](https://en.wikipedia.org/wiki/LLVM#History)