RFC: Add a new #[instruction_set(...)] attribute for supporting per-function instruction set changes #2867
I think it would be helpful to give some examples of why someone might want to use this attribute. |
It touches upon that a bit:
More specifically, T32 code is 16 bits per operation while A32 code is 32 bits per operation. This makes a fairly big difference in code size, and since many ARM devices might have only a 16-bit bus for some parts of the system this can even make a big difference in CPU cycles taken to run the program. |
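The bus argument above can be put into rough numbers. The following is a back-of-the-envelope sketch, not something from the RFC: the function name `bus_transfers` is hypothetical, and the model assumes one instruction per fetch with no caches or prefetch. It also ignores that Thumb code usually needs somewhat more instructions than ARM code to do the same work.

```rust
/// Bus transfers needed to fetch `count` instructions of `instr_bits` each
/// over a bus that is `bus_bits` wide (simplified, hypothetical model).
fn bus_transfers(count: u64, instr_bits: u64, bus_bits: u64) -> u64 {
    // Round up: a 32-bit instruction needs two trips over a 16-bit bus.
    let per_instr = (instr_bits + bus_bits - 1) / bus_bits;
    count * per_instr
}

fn main() {
    let count = 1000;
    let a32 = bus_transfers(count, 32, 16);
    let t32 = bus_transfers(count, 16, 16);
    println!("A32: {} transfers, {} bytes of code", a32, count * 4);
    println!("T32: {} transfers, {} bytes of code", t32, count * 2);
}
```

Under these assumptions the same number of instructions costs twice as many fetches in A32 as in T32 on a 16-bit bus, which is the effect the quoted passage describes.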
RISC-V is mentioned in the RFC as well since it does have the "C" extension for the compressed instruction set, which can be found here: https://riscv.org/specifications/isa-spec-pdf/ in Chapter 16 |
Before reading the content I thought |
But unlike Thumb, RVC is an ISA extension that just adds more instructions rather than replacing the ISA entirely. So it can (and should) be handled through the existing |
Some notes from initial review:
|
|
If we're sticking with the overall semantic notion and just renaming, I'd maybe go for
That's not particularly clear from the RFC. I would like to see some elaboration on this aspect. From reading the text, I primarily understood this as an internal aspect of functions, not as something affecting the signature, particularly because the RFC talks about "shims" and whatnot. This is different from
    extern "C" fn foo() {}
    const X: fn() = foo; //~ ERROR expected "Rust" fn, found "C" fn
Presumably Rust would need to insert shims when dealing with different (Also, please point out some documentation in the LLVM LangRef in the RFC.)
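To make the contrast concrete, here is a minimal sketch that is not taken from the RFC text. It assumes the attribute leaves a function's type unchanged, so functions compiled for different instruction sets coerce to the same `fn` pointer type, whereas `extern "C"` changes the pointer type. The attribute is written with the `arm::a32` / `arm::t32` spelling accepted by stable Rust, wrapped in `cfg_attr` so the example also builds on non-ARM hosts. The function names are hypothetical.

```rust
// The ABI is part of a function pointer's type.
extern "C" fn c_abi(x: u32) -> u32 {
    x + 1
}

// Assumption: the target is an ARM chip with both instruction sets
// (for example `armv4t-none-eabi`); on other hosts the attribute is skipped.
#[cfg_attr(target_arch = "arm", instruction_set(arm::a32))]
fn in_a32(x: u32) -> u32 {
    x + 1
}

#[cfg_attr(target_arch = "arm", instruction_set(arm::t32))]
fn in_t32(x: u32) -> u32 {
    x * 2
}

/// Calls every function in `table` through a plain Rust `fn` pointer.
fn apply_all(table: &[fn(u32) -> u32], x: u32) -> Vec<u32> {
    table.iter().map(|f| f(x)).collect()
}

fn main() {
    // An `extern "C"` function needs an `extern "C"` pointer type.
    let c_ptr: extern "C" fn(u32) -> u32 = c_abi;
    // let bad: fn(u32) -> u32 = c_abi; // rejected: mismatched ABI

    // Both annotated functions share the ordinary `fn(u32) -> u32` type.
    let results = apply_all(&[in_a32, in_t32], 3);
    println!("{} {:?}", c_ptr(3), results);
}
```

If the instruction set were part of the signature, the array passed to `apply_all` would not type-check, which is the design question being raised here.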
The main difference is that intrinsics and target feature settings are already established categories of target specific mechanisms. This makes this RFC different than e.g. introducing a new target feature or a new intrinsic. That is why I find it important to answer e.g. (as noted in the RFC):
Can you elaborate on how that interacts with the calling convention? |
Sure. On chips that support both a32 and t32 code there's a specific bit within the CPU status register that determines if the program counter address should be used to read a single 32-bit value (a32) or one to two 16-bit value(s) (t32). Depending on the "thumb" bit the CPU will perform the appropriate read and take appropriate action. Of course, the bit patterns between a32 and t32 are totally non-compatible. If the CPU is reading code from one isa while having the bit set (or not) for the other you'll get either the wrong legal instructions or just illegal instructions (UB either way). This is all sorted out by having specific forms of branch instruction that let you enable or disable the bit (if needed) when making calls. The linker performs the task of generating the "interwork" shims during the linking process so that branches go to the correct location and also perform the correct transition as necessary. |
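The state switch described above can be modeled in a few lines. This is a toy simulation rather than real hardware behavior in full: it assumes the common interworking convention (used by branches such as ARM's `BX`) in which the lowest bit of the branch target selects the state and is cleared to form the actual fetch address. The type and function names are hypothetical.

```rust
/// Which instruction decoder the CPU is currently using.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IsaState {
    Arm,   // a32: one 32-bit fetch per instruction
    Thumb, // t32: one or two 16-bit fetches per instruction
}

/// Toy model of an interworking branch: bit 0 of `target` selects the
/// state, and the fetch address is `target` with that bit cleared.
fn interworking_branch(target: u32) -> (IsaState, u32) {
    if target & 1 == 1 {
        (IsaState::Thumb, target & !1)
    } else {
        (IsaState::Arm, target)
    }
}

/// Width of a single instruction fetch unit in the given state.
fn fetch_unit_bits(state: IsaState) -> u32 {
    match state {
        IsaState::Arm => 32,
        IsaState::Thumb => 16,
    }
}

fn main() {
    for target in [0x0800_0100u32, 0x0800_0201] {
        let (state, pc) = interworking_branch(target);
        println!(
            "branch to {:#010x}: state {:?}, pc {:#010x}, fetch unit {} bits",
            target,
            state,
            pc,
            fetch_unit_bits(state)
        );
    }
}
```

A plain branch that ignored the low bit would leave the state unchanged, which is why the linker has to pick the right branch form or generate a shim.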
Hello I'm Lokathor and welcome to my TED talk. Everyone please be sure to thank @Centril for asking me to give this presentation. The target audience for this post is T-Lang specifically. Others can read it too of course, and I hope you all enjoy it, but the text here will be largely conversational and probably not suitable for direct inclusion into this RFC or into any particular Rust documentation.
A Primer On A32 / T32 Code
Some of the CPUs in the ARM chip family support more than one form of machine code. This is not really like the
Before the ARMv4T series there was just one flavor of assembly / machine code for ARM chips, naturally called "ARM code". Starting with ARMv4T they added a "Thumb code" flavor as well. (There's also a "Thumb2" extension supported in even later chips.) The assembly text of Thumb code is intentionally as close as possible to the assembly text of ARM code, but the binary encoding of the instructions is totally different.
Thumb code can't use as many of the registers, and it can't even do all the operations the CPU is capable of performing, but since ARM chips are usually used for embedded stuff, the code space savings actually are a huge deal. Also, since the bus from the storage to the CPU might be only a 16-bit bus using smaller opcodes has a runtime speed effect as well. The CPU literally stalls while waiting for the "second half" of each 32-bit ARM opcode to transfer across the bus.
"Thumb code is typically 65% of the size of the ARM code, and provides 160% of the performance of ARM code when running on a 16-bit memory system." --ARM7TDMI Technical Reference Manual 1.2.2. The Thumb instruction set
As embedded developers, we would naturally like to compile as much of the program as possible in Thumb code to get this advantage. However, the CPU generally boots in ARM state, and some parts of the code might also be required to be jumped to in ARM state because the chip is just built that way, so we cannot program all of the program just in Thumb code. Also, as I mentioned above, ARM code has access to more registers at once and can perform more kinds of operation than Thumb can, so even select parts from the "normal" portion of the program might be better written using ARM code.
Reference-level explanation
Precisely the way that this works is both simple and clever:
That's it. Code objects generated by LLVM store the address of a given label as either even or odd, so the object files continue to know if each part is ARM or Thumb (link to the ARM ELF spec, check 5.5.3). The linkers for ARM targets can adjust function calls so that calls from ARM to Thumb and back can use
Motivation (yes it's out of order from the official template)
Currently, Rust supports two target groups for ARM devices (many of which are tier2!):
For these targets, all code of the entire program is restricted to only
FAQ
|
I think this is a useful attribute to have: it is useful for embedded systems and mirrors existing GCC/Clang functionality with |
So the main reason you would use this attribute is:
And the way you would use it is:
? |
Are there use cases where you'd want to do the inverse, i.e. compile for one of the "arm" targets and mark some functions as
How does this work with libraries? Would some ARM libraries always want to be compiled with one or the other isa in any ARM application? Would it ever make sense for a cross-platform library (e.g. for a hash function) to say things like "if I'm being used on ARM, this part should use thumb isa"? Or do the only embedded programs where this matters use very few libraries or vendor/fork when it matters so it's not a concern? |
I think the most common use case would be to have t32 be the default and explicitly annotating certain hot functions as a32 for performance (or functions that use instructions not available to thumb via intrinsics/inline asm). I can't think of a good use case for opting into t32 when a32 is the default. I do not expect any of these attributes to be used by cross-platform libraries, and if an embedded project really needs a function in an external library to be compiled for a32 then they will most likely fork that library. |
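As a sketch of that common case: the crate is built for a Thumb-default target and a single hot function opts into a32. The attribute is written here with the `arm::a32` spelling that stable Rust accepts, which differs from the shorter spelling in this PR's summary, and it is guarded with `cfg_attr` so the example also compiles on non-ARM hosts. The function names and the checksum itself are hypothetical, and whether a32 is actually faster for a given function has to be measured.

```rust
// Hot inner loop: opt into a32 on ARM targets that support both
// instruction sets (assumption: e.g. a `thumbv4t-none-eabi` build).
#[cfg_attr(target_arch = "arm", instruction_set(arm::a32))]
fn checksum_hot(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &b| acc.rotate_left(5) ^ u32::from(b))
}

// Cold code keeps the target default (t32 on a thumb target), so it
// stays small. Calling across the boundary needs no special syntax.
fn is_valid(data: &[u8], expected: u32) -> bool {
    checksum_hot(data) == expected
}

fn main() {
    let packet = [1u8, 2];
    println!("checksum = {}", checksum_hot(&packet));
    println!("valid = {}", is_valid(&packet, 34));
}
```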
@Diggsey: yes. |
Are there any other architectures using this trick with multiple instruction decoding modes in a single application-level execution? What terminology do they use? |
Inlining an |
I feel like this needs to comment on the ABI for the generated function as well, and whether blocks can be annotated with
What is the future scope for this kind of attribute? Will an x86
Would there ever be a case where additional instructions are required to switch the processor to the new ISA before running the code? If this is allowed, should there be
The current use case seems extremely tailored to thumb, and definitely needs more elaboration on how this not only makes sense on ARM, but also on x86 and other architectures. It also needs to elaborate why this should be an attribute instead of a separate target. |
Well I think I explained the "why an attribute and not a target?" question fairly well already. In short, we have both forms as targets already. What's needed is the ability to intermix things. |
Indeed, this comment thread has very thoroughly explained why the proposed feature is the way it is, but the RFC text is still missing most of that reasoning, which is what I think @clarfon meant by "it also needs to elaborate..." |
Yes, I was basing my response on the RFC text, and had only skimmed the comments. The RFC as it stands is quite bare. |
One place |
|
Definitely. It will probably take a huge amount of work to implement cross-ISA calls in LLVM, so we may want to use a wasm-bindgen kind of approach meanwhile. |
Nominating for mention at lang team meeting: @Lokathor mentioned to me that this RFC had reached a point where more feedback would be useful, and they were curious whether there is a potential @rust-lang/lang sponsor. |
Primarily this provides more clarifications as to what's specified, and also an example of "what this looks like in use".
Based on discussion in the language team meeting today:
|
I've been thinking about the possible interactions with inline asm and come up with the following proposal:
Basically, if your asm depends on |
(to be clear, the lang team is essentially waiting to see responses to the comments that @joshtriplett posted up above). (Although it is worth noting that the RFC text has been updated with changes that are responses to those comments. I will attempt to review the lang team comments and compare against the current state of the RFC to see if there are any outstanding issues beyond the questions regarding the interaction between inline asm and |
@pnkfelix I just reviewed the updated RFC, and I think that the updates have indeed addressed all the outstanding concerns, including documenting the unresolved question. If you agree, then I think this is ready for P-FCP. |
I believe all of the lang team's concerns here have been fully addressed. @rfcbot merge |
Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
Should we make the attribute imply
Also, do we expect to create a project group to pursue this implementation and see this through? @Lokathor are you interested in pursuing that, for example? |
I think it's fine as an unresolved question. On LLVM having separate features enabled is generally enough to block function inlining as is, but being able to inline the code into the caller isn't fundamentally a bad idea as long as you're not using inline asm, so being able to do it some day would be nice. I could help coordinate a person or persons doing the implementation work but I don't know enough of the compiler to do this myself in any kind of timely manner. My guess is that this is probably a somewhat smaller change overall, and might not call for a whole group. |
What possible reason would there be for a blanket ban on inlining? Clearly, changing the ISA variant by inlining or other transformations can cause problems (due to inline asm or external constraints like "this is an interrupt handler => needs A32"), but inlining between functions using the same ISA should be 100% fine and desirable. The situation is similar to In fact, since the A32/T32 switch is encoded as |
@rfcbot reviewed OK, I'm convinced regarding inlining, thanks. I overlooked the fact that you would have multiple functions within the same "instruction set" invoking one another. Regarding formation of a project group, @Lokathor, my intent and hope is that this is not a "high overhead" activity, it's just basically a way for us to track progress, and to have a dedicated zulip stream for any discussion (in other words, a group of 1 or 2 people seems fine). But perhaps this RFC will proceed under the "old process" of creating a tracking issue with no particular tracking. We can discuss/decide that separately. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. The RFC will be merged soon. |
Huzzah! The @rust-lang/lang team has decided to accept this RFC. You can follow along with development on the tracking issue. |
This RFC proposes a new function attribute, #[instruction_set(...)]. The minimal initial implementation will provide #[instruction_set(a32)] and #[instruction_set(t32)] on ARM targets, corresponding respectively to disabling and enabling the LLVM feature thumb-mode for the annotated function.