Replace LLVM with Cranelift #674

Closed
yorickpeterse opened this issue Dec 20, 2023 · 26 comments
Labels: compiler (Changes related to the compiler), feature (New things to add to Inko, such as a new standard library module)

Comments

@yorickpeterse (Collaborator) commented Dec 20, 2023

Description

We currently use LLVM, mainly because at the time of switching to a native code compiler, Cranelift wasn't quite ready yet. I think it has since gained the features we need (e.g. checked integer arithmetic). Switching would significantly simplify distribution and improve compile times. This issue is meant to keep track of what to keep in mind, what we'd have to change, and so on.

It's important to keep in mind that we're not going to support both LLVM and Cranelift. Instead, the long-term goal is to write optimizations ourselves and use Cranelift (or whatever else) just to turn the result into machine code.

Related work

No response

Related issues

Missing Cranelift features

@yorickpeterse yorickpeterse added feature New things to add to Inko, such as a new standard library module compiler Changes related to the compiler labels Dec 20, 2023
@yorickpeterse (Collaborator Author)

One issue I encountered last time, and which still seems present, is that it's not clear how one would produce basic debug information (enough to produce a stacktrace). I asked about this in bytecodealliance/wasmtime#5141, but this didn't yield a simple/clear answer.

@yorickpeterse (Collaborator Author)

We use structure returns in a few places, mainly to interface with the Rust runtime. Cranelift seems to support this, though it's not clear to what degree.
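For reference, Cranelift models this with ArgumentPurpose::StructReturn on a parameter. A minimal sketch, assuming a 64-bit pointer type and the SystemV calling convention (neither of which is necessarily what Inko would use):

```rust
use cranelift_codegen::ir::{types, AbiParam, ArgumentPurpose, Signature};
use cranelift_codegen::isa::CallConv;

// A signature for a function that returns a large struct through a
// caller-provided pointer (sret). Pointer width and calling convention
// are assumptions, not Inko's actual ABI.
fn struct_return_signature() -> Signature {
    let mut sig = Signature::new(CallConv::SystemV);
    sig.params
        .push(AbiParam::special(types::I64, ArgumentPurpose::StructReturn));
    sig.params.push(AbiParam::new(types::I64)); // a regular argument
    sig
}
```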

@yorickpeterse (Collaborator Author)

Tail calls are currently not supported: bytecodealliance/wasmtime#1065. This doesn't matter much though as we don't make use of them, and LLVM's support for them is spotty as well.

@yorickpeterse (Collaborator Author)

LLVM has the getelementptr instruction to automatically calculate field offsets. In Cranelift you need to manually calculate these and pass them to the load() instruction. I'm not sure this is much of an issue though, as we can calculate these at compile-time and probably just pass them as a u32/whatever literal to load().
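A sketch of what that looks like, assuming a FunctionBuilder named builder and an example layout of { i64, i64 }:

```rust
use cranelift_codegen::ir::{types, MemFlags, Value};
use cranelift_frontend::FunctionBuilder;

// Load field 1 of a struct { i64, i64 } through a base pointer. The byte
// offset (8) is something we'd compute at compile time and pass to load()
// as an immediate; the layout here is just an example.
fn load_second_field(builder: &mut FunctionBuilder, base: Value) -> Value {
    let offset: i32 = 8;
    builder.ins().load(types::I64, MemFlags::new(), base, offset)
}
```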

@yorickpeterse (Collaborator Author)

For stack slots, Cranelift has the methods declare_var, def_var, and use_var.

declare_var declares a variable and its type, def_var assigns a value to it, and use_var reads it. Since use_var also loads the value of the variable, no extra load() is needed. Cranelift then turns this into SSA form for you.
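As a minimal sketch (API as of late 2023; the FunctionBuilder is assumed to be mid-construction):

```rust
use cranelift_codegen::ir::{types, Value};
use cranelift_frontend::{FunctionBuilder, Variable};

// Declare an i64 variable, assign to it, and read it back. use_var yields
// the current value directly; Cranelift handles the SSA construction.
fn variable_example(builder: &mut FunctionBuilder) -> Value {
    let var = Variable::from_u32(0);
    builder.declare_var(var, types::I64);
    let ten = builder.ins().iconst(types::I64, 10);
    builder.def_var(var, ten);
    builder.use_var(var)
}
```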

@yorickpeterse (Collaborator Author)

Digging through the Cranelift code and associated projects, it seems producing debug info is still painful, requiring knowledge of raw DWARF data and generating it by hand through the gimli crate. I don't have such knowledge at the moment, so it will likely take a while before we can swap out LLVM for Cranelift.

@yorickpeterse (Collaborator Author)

Per https://docs.rs/cranelift-codegen/latest/cranelift_codegen/ir/trait.InstBuilder.html#method.atomic_load and related methods, it seems Cranelift enforces sequentially consistent ordering for atomics. This could have a rather significant impact, as we use atomics in a few places where we neither need nor want sequential consistency.
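For illustration, atomic_load takes a type, flags, and an address, but no ordering argument; sequential consistency is implied:

```rust
use cranelift_codegen::ir::{types, MemFlags, Value};
use cranelift_frontend::FunctionBuilder;

// There is no ordering parameter to pick e.g. acquire/release semantics.
fn seqcst_load(builder: &mut FunctionBuilder, addr: Value) -> Value {
    builder.ins().atomic_load(types::I64, MemFlags::new(), addr)
}
```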

@yorickpeterse (Collaborator Author)

https://github.com/bytecodealliance/wasmtime/blob/b583c54fda13b53dea362861125dd1e2ced1381d/cranelift/codegen/src/isa/x64/lower.isle#L3213-L3223 states that at least on x86, atomics are just regular loads and stores (optionally with a fence for stores). That's not too bad, but it would be nice to have precise control over this.

@yorickpeterse (Collaborator Author)

Cranelift doesn't have an equivalent of llvm.powi. This means we need to use a runtime function call for that.
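Such a runtime fallback could be as simple as an extern "C" wrapper around Rust's own powi; the function name here is hypothetical:

```rust
// Hypothetical runtime function that generated code would call instead of
// the llvm.powi intrinsic.
#[no_mangle]
pub extern "C" fn inko_float_powi(base: f64, exponent: i32) -> f64 {
    base.powi(exponent)
}
```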

@yorickpeterse (Collaborator Author) commented Dec 22, 2023

To define constant values, it seems you have to use declare_data_in_data to define it, then use DataDescription::define() to write the value.

You then "import" to the global into a function using declare_data_in_func, and load it using global_value, at least by the looks of it.

@yorickpeterse (Collaborator Author)

Calling a function is done as follows:

  • Build the instruction using e.g. call
  • Obtain the results using inst_results (see the sketch below)
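A sketch of both steps, assuming the FuncRef and argument values already exist:

```rust
use cranelift_codegen::ir::{FuncRef, Value};
use cranelift_frontend::FunctionBuilder;

// call() returns the instruction; the produced values are retrieved
// separately via inst_results().
fn call_example(builder: &mut FunctionBuilder, callee: FuncRef, arg: Value) -> Value {
    let call = builder.ins().call(callee, &[arg]);
    builder.inst_results(call)[0]
}
```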

@yorickpeterse (Collaborator Author)

Function imports are done using declare_function to import the signature, followed by declare_func_in_func to reference it.
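A sketch, with a made-up function name and signature (module is any cranelift_module::Module implementation):

```rust
use cranelift_codegen::ir::{types, AbiParam, FuncRef};
use cranelift_frontend::FunctionBuilder;
use cranelift_module::{Linkage, Module};

// Import an external function and make it referenceable from the function
// currently being built. The name and signature are made up.
fn import_example(module: &mut impl Module, builder: &mut FunctionBuilder) -> FuncRef {
    let mut sig = module.make_signature();
    sig.params.push(AbiParam::new(types::I64));
    sig.returns.push(AbiParam::new(types::I64));
    let func_id = module
        .declare_function("inko_example", Linkage::Import, &sig)
        .unwrap();
    module.declare_func_in_func(func_id, builder.func)
}
```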

@yorickpeterse (Collaborator Author)

In the LLVM backend, we rely on the mem2reg pass to construct the IR in SSA form. This requires allocating temporaries using alloca() at the start of a function. For this we use llvm::builder::Builder::new_stack_slot(), which injects the alloca() calls into the start of the function, regardless of where we currently are.

In Cranelift this doesn't seem necessary, as one can just call declare_var at will. This means we could probably get rid of new_stack_slot entirely.
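For comparison, the entry-block alloca dance on the LLVM side looks roughly like this (a sketch against an inkwell 0.2-era API, where build_alloca returns the value directly; the caller repositions the builder afterwards):

```rust
use inkwell::basic_block::BasicBlock;
use inkwell::builder::Builder;
use inkwell::types::BasicTypeEnum;
use inkwell::values::PointerValue;

// Rough sketch of the entry-block alloca pattern new_stack_slot() relies
// on: emit the alloca at the very start of the entry block, regardless of
// where code generation currently is, so mem2reg can promote it.
fn entry_block_alloca<'ctx>(
    builder: &Builder<'ctx>,
    entry: BasicBlock<'ctx>,
    ty: BasicTypeEnum<'ctx>,
) -> PointerValue<'ctx> {
    match entry.get_first_instruction() {
        Some(first) => builder.position_before(&first),
        None => builder.position_at_end(entry),
    }
    builder.build_alloca(ty, "slot")
}
```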

@yorickpeterse (Collaborator Author)

Jump tables/switches are done using br_table. The jump tables are created using create_jump_table.

I believe our Switch instruction always uses monotonically increasing numbers that start at zero, so we should be able to lower that directly to the br_table instruction.

@bjorn3 commented Dec 22, 2023

I believe our Switch instruction always uses monotonically increasing numbers that start at zero

If they don't, you can use cranelift_frontend::Switch to generate a combination of jump tables and a decision tree, as sketched below. It is almost certainly not optimal, but it should save some effort, and any optimizations to it would benefit cg_clif too :)
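A sketch of what using it looks like (the blocks and the discriminant Value are assumed to exist):

```rust
use cranelift_codegen::ir::{Block, Value};
use cranelift_frontend::{FunctionBuilder, Switch};

// Dispatch on `discriminant`, falling through to `otherwise` when no
// entry matches. Switch emits jump tables and/or comparisons as needed.
fn switch_example(
    builder: &mut FunctionBuilder,
    discriminant: Value,
    zero: Block,
    one: Block,
    otherwise: Block,
) {
    let mut switch = Switch::new();
    switch.set_entry(0, zero);
    switch.set_entry(1, one);
    switch.emit(builder, discriminant, otherwise);
}
```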

@bjorn3 commented Dec 22, 2023

To define constant values, it seems you have to use declare_data_in_data to define it, then use DataDescription::define() to write the value.

declare_data_in_data is for referencing a global variable from another global variable. To define one, you first use declare_data and then define_data with a DataDescription as the argument. The DataDescription can be filled in with DataDescription::define. You can then use write_data_addr and write_function_addr, with the results of declare_data_in_data and declare_func_in_data respectively, to reference other global variables and functions from this global variable.
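Putting those steps together (the constant's name and contents are made up; the pointer type is passed in):

```rust
use cranelift_codegen::ir::{Type, Value};
use cranelift_frontend::FunctionBuilder;
use cranelift_module::{DataDescription, Linkage, Module};

// Define a constant and load its address inside a function.
fn constant_example(
    module: &mut impl Module,
    builder: &mut FunctionBuilder,
    pointer_type: Type,
) -> Value {
    let data_id = module
        .declare_data("example_constant", Linkage::Local, false, false)
        .unwrap();
    let mut desc = DataDescription::new();
    desc.define(Box::new(*b"hello")); // the raw bytes of the constant
    module.define_data(data_id, &desc).unwrap();

    // "Import" the global into the current function, then take its address:
    let gv = module.declare_data_in_func(data_id, builder.func);
    builder.ins().global_value(pointer_type, gv)
}
```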

@yorickpeterse (Collaborator Author)

@bjorn3 Thanks for the details! 😃

@yorickpeterse (Collaborator Author)

Per Zulip and the documentation: the Value type only represents simple types such as integers and floats. For stack allocated structures, one needs to use create_sized_stack_slot and related methods and types. The stack slots registered this way are not reused through some sort of clever mechanism, so we'd need to come up with a scheme of our own for reusing slots that are no longer live, otherwise we might end up using more stack space than necessary.

I'm not sure how we'd implement that though, because when generating stack loads/stores it's not clear whether the slot is still used afterwards. This would probably require some form of liveness analysis.
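A sketch of the stack slot API (sizes and offsets are examples; StackSlotData::new took a kind and a byte size as of late 2023):

```rust
use cranelift_codegen::ir::{types, StackSlotData, StackSlotKind, Value};
use cranelift_frontend::FunctionBuilder;

// Allocate a 16-byte slot for a structure, store into its first field and
// load from its second.
fn stack_slot_example(builder: &mut FunctionBuilder, value: Value) -> Value {
    let data = StackSlotData::new(StackSlotKind::ExplicitSlot, 16);
    let slot = builder.create_sized_stack_slot(data);
    builder.ins().stack_store(value, slot, 0);
    builder.ins().stack_load(types::I64, slot, 8)
}
```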

@yorickpeterse (Collaborator Author) commented Dec 23, 2023

Related to that:

We'll need to manually generate the correct signatures for functions that take or return structures. This in turn needs to take the structure sizes into account, as structures <= 8 bytes are passed directly while larger ones use an extra argument.

We could simplify things by dropping support for structure returns/arguments, but this requires adjusting the runtime library in a whole bunch of places, which I'd like to avoid.
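A sketch of how the signature generation could branch on the structure size (the 8-byte cut-off is taken from the comment above, and I64 stands in for both the direct value and the pointer; a real ABI needs more care):

```rust
use cranelift_codegen::ir::{types, AbiParam, ArgumentPurpose, Signature};

// Return small structs directly; pass an sret pointer for larger ones.
fn add_struct_return(sig: &mut Signature, size: u64) {
    if size <= 8 {
        sig.returns.push(AbiParam::new(types::I64));
    } else {
        sig.params
            .push(AbiParam::special(types::I64, ArgumentPurpose::StructReturn));
    }
}
```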

@yorickpeterse (Collaborator Author)

I think my biggest concern with Cranelift so far is that it's perhaps a bit too low-level. For example, we need to calculate structure sizes, field offsets, etc. ourselves. Because of this, I suspect the amount of code needed in the backend would roughly equal that of our current LLVM backend, as we'd simplify things in one area but complicate them in another. In particular, we have a bunch of code to specify structure sizes and the like so that LLVM can calculate the right offsets; I had initially hoped that most of this could go away with Cranelift, but it turns out we'd have to keep most of it.

I think Cranelift would really benefit from a slightly more high-level wrapper that abstracts over the various platform differences and what not, basically LLVM--, but such a library doesn't exist unfortunately.

I'm going to experiment with doing LLVM generation in parallel, to see if/to what degree that's even possible with Inkwell. I'll also need to really think hard if Cranelift is worth the trouble in its current state.

@yorickpeterse (Collaborator Author) commented Dec 24, 2023

Running LLVM/Inkwell in parallel appears tricky. In spite of some of the types being Sync, it appears that even if most of the data needed is created on a per-thread basis, Inkwell/LLVM segfaults. I'm guessing this is because we do share some StructType structures between threads in my quick hack, and those are probably mutated in some way.

Unfortunately, due to how Inkwell/LLVM is set up, we can't really work around sharing that particular data as it defines all the structure sizes and what not, and creating those for every thread is a waste.

If we do create that data on a per-thread basis, and ignore the code potentially being incorrect for a moment, compile times for the standard library tests go down from 3.3 seconds to 1.15 seconds with 8 threads. That's only a roughly 3x speedup from 8x the concurrency, which is disappointing. Note that for this experiment the queue is just a Mutex<Vec<_>> and we could do better, but I'd be surprised if changing that yields another 5x improvement.

@yorickpeterse (Collaborator Author)

To add to the above: Inkwell exposes basically everything as immutable types/methods, even though LLVM mutates things under the hood. This means that even if we got things working in parallel, it may cause problems in the future if LLVM starts to mutate/share data where this isn't expected.

@yorickpeterse (Collaborator Author)

Regarding stack slots: based on how we use structure returns right now, and the fact that Inko values are heap allocated (apart from Int/Float/etc), I think the lack of stack slot reuse isn't a big deal for the time being.

@yorickpeterse (Collaborator Author)

Another challenge: our FFI supports variadic function arguments, something that's needed to interface with certain C libraries (e.g. cURL). Per this issue Cranelift has no support for this whatsoever, and per this comment you basically have to emulate it.

I'm starting to lean more towards sticking with LLVM for the time being, and focusing more on making it parallel and incremental where possible, seeing as transitioning to Cranelift is likely going to require a lot of work.

@yorickpeterse yorickpeterse added this to the Future ideas milestone Dec 24, 2023
@yorickpeterse yorickpeterse removed this from the Future ideas milestone Aug 5, 2024
@yorickpeterse (Collaborator Author)

I'm going to close this, as there are no plans to do this for at least the next several years, and Cranelift lacks various features that we need (as discussed in this thread).
