-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cranelift aarch64 backend: use constant pool to defer constant emission to the end, unless out of range #1549
Comments
For reference, there are other things to think about with constant pools; see #1385. |
There is a set of constants that might be particularly problematic - those generated for There is also another problem that is quite similar - lack of deferred emission of instructions, in particular traps. Consider integer division; currently the AArch64 backend generates:
The natural expectation is that most of the time the I think that both issues (the one in the previous paragraph and constant pools) could be solved simultaneously. BTW the limited range of literal loads could be extended by emitting a combination of |
@akirilov-arm no, we haven't considered anything like a GOT, though at some point we do need to fill out support for relocation types like this. Certainly the early/bringup work for the aarch64 backend was the JIT use-case, where GOTs are not really used (or at least, e.g. SpiderMonkey has a table of wasm functions and IIRC globals but it generates custom code for accesses); so we've gotten away with the relatively simple set of relocation types so far. Longer-term, I think we ought to just emit the proper relocations for e.g. GOT references; we'll have to wire through the relocation-type support to the relevant backends, including
This is an interesting one: my first instinct is that it might be simpler to handle this case directly during lowering, by keeping a list of deferred trap-paths (combination of |
This is done, right? |
We actually currently synthesize up to 64-bit values inline with |
@akirilov-arm: yes, sorry, that update somehow slipped off my radar -- thanks very much for the detailed evaluation! I agree that based on that update, we should probably at least replace the inline-constant cases with constant pool usages; the branch-around seems to have a significant impact. |
@cfallin This is a side question, but do we have a mechanism to share a constant between several users instead of rematerializing it each time? This is particularly relevant to the bitmask extraction operations, in which case the constants are always the same. Of course, the increased register pressure is a trade-off to consider. Also, some of the proposals to the Wasm SIMD specification (see WebAssembly/simd#395 for example) are made with such a capability in mind. |
@akirilov-arm we can share within a single function at least: @abrown's work in #2328 added deduplication of We can't dedup across function bodies; that might be nice to have, but it would require a bit more plumbing due to our separate function compilation (which enables other things like parallelization). In principle we could have an "rodata object" that we aggregate from all |
I think #2468 may be somewhat applicable to this thread; I wanted an ergonomic way to add constants during lowering ( |
No, I don't think so. As for the existing deduplication - if I understand correctly, it avoids keeping several copies of the same data inside the pool, but each use of the data still requires a literal load. I was thinking more in the direction of keeping the constant data live in a register and avoiding memory operations; this approach might be relevant even for more complicated integer constants, say those that need more than 2 instructions to materialize (arbitrary limit - I don't have hard data to justify it). |
Ah, OK. In theory, at least for constant duplication at the CLIF level, this may be covered by CSE. Constants that arise during lowering will not be CSE'd, so either post-lowering opts or constant rematerialization at the regalloc level could help. We might get to either or both of these someday. A constant pool with dedup is still better than literal loads from separate constant locations if we're going to do multiple literal loads (perhaps because opts are disabled or we haven't implemented remat yet); so these are complimentary I think. |
Currently, to avoid issues with (i) the PC-relative addressing range of
LDR
, and (ii) the fact that the constant pool needs to be relocatable, the aarch64 backend emits constants inline with code and branches around them. This is obviously suboptimal for code-density and branchiness reasons.To handle the first issue, we should use the
ConstantPool
to collect constants, as the old backend does, and then modify our emission logic to defer constant emission to the end unless we're about to go out of range of pending constant references, in which case we emit a "constant island".To handle the second issue, we need to emit constant-pool relocations.
It's still unclear how we can handle the intersection of these two, i.e., when constant references become out-of-range because the client relocates the constant pool further away. One known use case of the relocatable constant pool is for SpiderMonkey to insert its own epilogue into Wasm functions; there, at least, we can bound how much further away the constant pool will be, so perhaps we can just have a slightly-conservative limit for when we must emit a constant island.
The text was updated successfully, but these errors were encountered: