x64 new backend: port ABI implementation to shared infrastructure with AArch64. #2142
Conversation
Force-pushed from a1f6d22 to 719705b
Subscribe to Label Action — cc @bnjbvr
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the
Force-pushed from 3c6f7a5 to 347b781
This is really hard to review in a single pass without reviewing the rest of the refactoring, so @julian-seward1 would probably be a better reviewer here. Most of my remarks are about the traits introduced in the previous refactoring, so feel free to defer addressing those to a later refactoring. As long as you've tested it with SpiderMonkey and it passes all the tests, I'm fine with it.
cranelift/codegen/src/isa/x64/abi.rs
Outdated

    Inst::EpiloguePlaceholder
}

fn gen_add_imm(into_reg: Writable<Reg>, from_reg: Reg, imm: u64) -> SmallVec<[Self::I; 4]> {
I wasn't there when the trait was designed, but: would it make sense to pass a context, or the consumer of these insts directly here? (I remember some cases where it wasn't possible to do so, because it was used in both the lowering and code emission contexts, so maybe that's irrelevant.)
This would avoid the smallvec allocation entirely here.
I thought about this, but (as you suspected) this is called in contexts where we have no LowerCtx or other direct consumer of callbacks. We could statically monomorphize on a closure argument that receives instructions, but that seems unnecessarily complex and would duplicate code in the binary; I made sure to size the SmallVecs so the x64 and aarch64 cases should fit in their inline storage, so the cost is (just) copying a few Insts instead. Happy to go the other way though if you'd prefer!
cranelift/codegen/src/isa/x64/abi.rs
Outdated

fn get_fixed_tmp_reg() -> Reg {
    // Use a caller-save register for this. Note that we need not exclude it
    // from regalloc on x64 because `gen_add_imm()` above never clobbers a
    // scratch register. Thus the sequence ends up being: gen stack limit
Just reading this doesn't help understanding why it's fine to do so, which makes me think the trait functions should be redesigned so the temp reg is also returned from gen_add_imm, or something like this.
Thanks for pointing this out; yeah, it is a bit convoluted. I've updated the documentation on the trait definition to clearly describe the requirements placed on the machine backend's implementation and register choices, and I've renamed get_fixed_tmp_reg() to get_stacklimit_reg() to make its purpose more explicit.
I thought for a bit about whether there might be a better way to do this, but I haven't managed to find one, unless we push the whole "compute this GlobalValue using only caller-saves" logic into each machine backend. gen_add_imm() could return info about what it clobbers, but that still doesn't help the machine-independent implementation choose another register on its own. Really, the machine-backend author has to choose fixed scratch registers and a corresponding add-with-large-immediate lowering (on RISCs at least) and just provide them to the machine-independent part.
On x64, anyway, we can always have a full 32-bit immediate on an add, so the trickier requirements are moot; all we need is some arbitrary caller-save register here. Hopefully simple enough, but I'm open to other ideas :-)
Force-pushed from d94f180 to 5f71a03
Updated -- thanks! This has been verified to work with SpiderMonkey as well.
(I'm happy to give it a look another day, unless Julian beats me to it; I really need to read the rest of the refactoring anyway :-))
It's a great cleanup, thanks a lot! (And sorry about the very long review time.)
It's likely we could even share more of the compute_args_loc logic by having some kind of architecture-agnostic CallingConvention ABI, and then each architecture could have its own SystemVCallingConvention / BaldrdashCallingConvention impl, but let's think about this later :-)
Force-pushed from 5f71a03 to c3b6cdc
…h AArch64.

Previously, in bytecodealliance#2128, we factored out a common "vanilla 64-bit ABI" implementation from the AArch64 ABI code, with the idea that this should be largely compatible with x64. This PR alters the new x64 backend to make use of the shared infrastructure, removing the duplication that existed previously. The generated code is nearly (not exactly) the same; the only difference relates to how the clobber-save region is padded in the prologue.

This also changes some register allocations in the aarch64 code because call support in the shared ABI infra now passes a temp vreg in, rather than requiring use of a fixed, non-allocable temp; tests have been updated, and the runtime behavior is unchanged.
Force-pushed from c3b6cdc to e8f772c