Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add a new #[instruction_set(...)] attribute for supporting per-function instruction set changes #2867

Merged
merged 5 commits into from
Jul 24, 2020

Conversation

ketsuban
Copy link
Contributor

@ketsuban ketsuban commented Feb 16, 2020

This RFC proposes a new function attribute, #[instruction_set(...)]. The minimal initial implementation will provide #[instruction_set(a32)] and #[instruction_set(t32)] on ARM targets, corresponding respectively to disabling and enabling the LLVM feature thumb-mode for the annotated function.

Render

@Lokathor
Copy link
Contributor

Rendered

@Diggsey
Copy link
Contributor

Diggsey commented Feb 16, 2020

I think it would be helpful to give some examples of why someone might want to use this attribute.

@Lokathor
Copy link
Contributor

It touches upon that a bit:

ARM targets have a denser but less feature-packed instruction set named T32 alongside the normal A32.

More specifically, T32 code is 16 bits per operation while A32 code is 32 bits per operation. This makes a fairly big difference in code size, and since many ARM devices might have only a 16-bit bus for some parts of the system this can even make a big difference in CPU cycles taken to run the program.

@repnop
Copy link

repnop commented Feb 17, 2020

RISC-V is mentioned in the RFC as well since it does have the "C" extension for the compressed instruction set, which can be found here: https://riscv.org/specifications/isa-spec-pdf/ in Chapter 16

@kennytm
Copy link
Member

kennytm commented Feb 17, 2020

Before reading the content I thought #[isa] means "is a" 😓.

@hanna-kruppe
Copy link

RISC-V is mentioned in the RFC as well since it does have the "C" extension for the compressed instruction set, which can be found here: https://riscv.org/specifications/isa-spec-pdf/ in Chapter 16

But unlike Thumb, RVC is a ISA extension that just adds more instructions rather than replacing the ISA entirely. So it can (and should) be handled through the existing target_feature mechanisms.

@Centril Centril added T-lang Relevant to the language team, which will review and decide on the RFC. A-attributes Proposals relating to attributes A-target Target related proposals & ideas A-target_feature target_feature related proposals & ideas labels Feb 17, 2020
@Centril
Copy link
Contributor

Centril commented Feb 17, 2020

Some notes from initial review:

  • I agree with @kennytm (RFC: Add a new #[instruction_set(...)] attribute for supporting per-function instruction set changes #2867 (comment)) that #[isa] isn't suggestive of what this attribute does. I think something more verbose would be warranted in this case to illuminate what it means.

  • What guarantees does #[isa] actually give the user? From the sound of it, it's an optimization hint? If so, we could tuck this into #[optimize] (Tracking issue for RFC 2412, "The optimize attribute" rust#54882).

  • I'm concerned that this seems to be adding quite a domain-specific / niche feature to the language that might only end up benefiting a single family of targets, and that this is a lot of complexity relative to the generality of the feature. Attempts to make this more generally useful, either now, or in the future, would be welcome I think.

  • What would the interactions with e.g. Cranelift be?

  • Functions are inlined across ISA boundaries as if the #[isa] attribute did not exist.

    This needs some elaboration, preferably with examples of what this means. This does seem to confirm the notion that this is a hint (see the point re. #[optimize]) as opposed to something with semantic guarantees.

@Lokathor
Copy link
Contributor

  • Name: Sure, any ideas?
  • Guarantees: The generated function must use the assembly / machine code flavor of the designated isa. Failure to do so would effectively change the overall calling convention of the function. If an isa attribute is given then the compiler must respect it, the same as the compiler couldn't ignore extern "C" in a function signature.
  • Domain-specific: This is similar to many intrinsics and target feature settings. Mostly specific to a particular architecture because other architectures don't use this technique of having two froms of machine code. It is basically a requirement for practical ergonomic use of Rust in many embedded contexts. I believe this was on the embedded-wg wishlist for all of 2019.
  • Cranelift: This affects code generation, but only on a function level. In the "worst case", alternative isa functions can be placed in separate translation units and compiled into separate objects. The final interwork adjustments are made by the linker.
  • Inlining: This is effectively a calling convention adjustment, so if a function is inlined then the attribute would have no other effect.

@Centril
Copy link
Contributor

Centril commented Feb 17, 2020

  • Name: Sure, any ideas?

If we're sticking with the overall semantic notion and just renaming, I'd maybe go for instruction_set_arch or some variation of that.

  • Guarantees: The generated function must use the assembly / machine code flavor of the designated isa. Failure to do so would effectively change the overall calling convention of the function. If an isa attribute is given then the compiler must respect it, the same as the compiler couldn't ignore extern "C" in a function signature.

That's not particularly clear from the RFC. I would like to see some elaboration on this aspect. From reading the text, I primarily understood this as an internal aspect of functions, not as something affecting the signature, particularly because the RFC talks about "shims" and whatnot. This is different from extern "C" which has an observable difference in the type system:

extern "C" fn foo() {}
const X: fn() = foo; //~ ERROR expected "Rust" fn, found "C" fn

Presumably Rust would need to insert shims when dealing with different #[isa]s and function pointers.

(Also, please point out some documentation in the LLVM LangRef in the RFC.)

  • Domain-specific: This is similar to many intrinsics and target feature settings. Mostly specific to a particular architecture because other architectures don't use this technique of having two froms of machine code. It is basically a requirement for practical ergonomic use of Rust in many embedded contexts. I believe this was on the embedded-wg wishlist for all of 2019.

The main difference is that instrinsics and target feature settings are already established categories of target specific mechanisms. This makes this RFC different than e.g. introducing a new target feature or a new intrinsic. That is why I find it important to answer e.g. (as noted in the RFC):

Are there any presently-supported architectures with a mechanism like A32/T32 which #[isa] could support?

  • Cranelift: This affects code generation, but only on a function level. In the "worst case", alternative isa functions can be placed in separate translation units and compiled into separate objects. The final interwork adjustments are made by the linker.

Can you elaborate on how that interacts with the calling convention?

@Lokathor
Copy link
Contributor

Lokathor commented Feb 17, 2020

Sure. On chips that support both a32 and t32 code there's a specific bit within the CPU status register that determines if the program counter address should be used to read a single 32-bit value (a32) or one to two 16-bit value(s) (t32). Depending on the "thumb" bit the CPU will perform the appropriate read and take appropriate action. Of course, the bit patterns between a32 and t32 are totally non-compatible. If the CPU is reading code from one isa while having the bit set (or not) for the other you'll get either the wrong legal instructions or just illegal instructions (UB either way).

This is all sorted out by having specific forms of branch instruction that let you enable or disable the bit (if needed) when making calls. The linker performs the task of generating the "interwork" shims during the linking process so that branches go to the correct location and also perform the correct transition as necessary.

@Lokathor
Copy link
Contributor

Hello I'm Lokathor and welcome to my TED talk. Everyone please be sure to thank @Centril for asking me to give this presentation.

This target audience for this post is T-Lang specifically. Others can read it too of course, and I hope you all enjoy it, but the text here will be largely conversational and probably not suitable for direct inclusion into this RFC or into any particular Rust documentation.

A Primer On A32 / T32 Code

Some of the CPUs in the ARM chip family support more than one form of machine code. This is not really like the x86 and x86_64 split, where a single binary is built using entirely one machine code format or the other, and an x86_64 CPU can just also run an old x86 binary. I'm talking about two different machine code formats within a single binary and the CPU can jump back and forth between both the two during the program's execution.

Before the ARMv4T series there was just one flavor of assembly / machine code for ARM chips, naturally called "ARM code". Starting with ARMv4T they added a "Thumb code" flavor as well. (There's also a "Thumb2" extension supported in even later chips.) The assembly text of Thumb code is intentionally as close as possible to the assembly text of ARM code, but the binary encoding of the instructions is totally different.

  • ARM code is align4, with an encoding format that always uses 32 bits per opcode.
  • Thumb code is align2, with an alternate encoding format that uses (mostly) 16 bits per opcode (some "Thumb2" instructions are encoded as two 16-bit values).

Thumb code can't use as many of the registers, and it can't even do all the operations the CPU is capable of performing, but since ARM chips are usually used for embedded stuff, the code space savings actually are a huge deal. Also, since the bus from the storage to the CPU might be only a 16-bit bus using smaller opcodes has a runtime speed effect as well. The CPU literally stalls while waiting for the "second half" of each 32-bit ARM opcode to transfer across the bus.

"Thumb code is typically 65% of the size of the ARM code, and provides 160% of the performance of ARM code when running on a 16-bit memory system." --ARM7TDMI Technical Reference Manual 1.2.2. The Thumb instruction set

As embedded developers, we would naturally like to compile as much of the program as possible in Thumb code to get this advantage.

However, the CPU generally boots in ARM state, and some parts of the code might also be required to be jumped to in ARM state because the chip is just built that way, so we cannot program all of the program just in Thumb code.

Also, as I mentioned above, ARM code has access to more registers at once and can perform more kinds of operation than Thumb can, so even select parts from the "normal" portion of the program might be better written using ARM code.

Reference-level explanation

Precisely the way that this works is both simple and clever:

  • There's a bit in the CPU status register, the "thumb bit", and if it's on then the CPU is in "thumb state", and if it's off then the CPU is in "ARM state".
  • Whenever the CPU performs a bx (branch-exchange) or blx (branch-link-exchange) instruction, the CPU exchanges the least bit of the target address with the status register thumb bit.
  • The lowest bit of the program counter is always ignored for purposes of actually getting the opcode, so a target address of 0x0800_ABC0 and 0x0800_ABC1 will both lead to the same position in the code, and the low bit says if that code is to be used as ARM or Thumb.

That's it.

Code objects generated by LLVM store the address of a given label as either even or odd, so the object files continue to know if each part is ARM or Thumb (link to the ARM ELF spec, check 5.5.3). The linkers for ARM targets can adjust function calls so that calls from ARM to Thumb and back can use bx and blx as appropriate (eg: GNU ld calls the flag -mthumb-interwork), again based on the address of the jump target.

Motivation (yes it's out of order from the official template)

Currently, Rust supports two target groups for ARM devices (many of which are tier2!):

  • There are targets with names that start with arm, such as arm-unknown-linux-musleabihf or armv5te-unknown-linux-gnueabi or others
  • There are targets with names that start with thumb, such as thumbv6m-none-eabi, thumbv7neon-linux-androideabi, and so on.

For these targets, all code of the entire program is restricted to only a32 (ARM targets) or t32 (Thumb targets). LLVM allows you to specify, per function, that you would like a particular format (link to what looks to be the PR that added this, and that shows the C usage as LLVM tests). This RFC allows you to tell LLVM to code generate a function using a specific format.

FAQ

  • Is this an optimization hint?

    • No, to be useful it must be an absolute guarantee that tagged functions are generated using the correct code format. It affects the inline assembly that you write, and it even affects the code's ability to be used at all in select situations.
  • Is this an alternate ABI, like C-unwind and C-nounwind? / Does this need to show up in the type system?

    • Nope. LLVM, the linker, and the CPU all carry around the correct info about functions and function pointers based on the lowest bit of the address. The type of the function itself doesn't actually change. The inter-procedure call format also doesn't change (link if you wanna read about that one).

@Amanieu
Copy link
Member

Amanieu commented Feb 19, 2020

I think this is a useful attribute to have, it is useful for embedded systems and mirrors existing GCC/Clang functionality with __attribute__((target)). However it is entirely specific to ARM targets so it should at the very least have "arm" somewhere in the name.

@Diggsey
Copy link
Contributor

Diggsey commented Feb 19, 2020

So the main reason you would use this attribute is:

  1. You're writing code for an embedded arm device.
  2. You want your code to mostly compile to the "thumb" instruction set.
  3. You have specific functions (interrupt handlers or some such) that will be called by the CPU in ARM mode.

And the way you would use it is:

  1. Compile your code for one of the "thumb" targets.
  2. Mark those specific functions as #[isa = "arm"].

?

@Ixrec
Copy link
Contributor

Ixrec commented Feb 19, 2020

Are there use cases where you'd want to do the inverse, i.e. compile for one of the "arm" targets and mark some functions as #[isa = "thumb"]?

How does this work with libraries? Would some ARM libraries always want to be compiled with one or the other isa in any ARM application? Would it ever make sense for a cross-platform library (e.g. for a hash function) to say things like "if I'm being used on ARM, this part should use thumb isa"? Or do the only embedded programs where this matters use very few libraries or vendor/fork when it matters so it's not a concern?

@Amanieu
Copy link
Member

Amanieu commented Feb 19, 2020

I think the most common use case would be to have t32 be the default and explicitly annotating certain hot functions as a32 for performance (or functions that use instructions not available to thumb via intrinsics/inline asm). I can't think of a good use case for opting into t32 when a32 is the default.

I do not expect any of these attributes to be used by cross-platform libraries, and if an embedded project really needs a function in an external library to be compiled for a32 then they will most likely fork that library.

@Lokathor
Copy link
Contributor

  • t32 default and select a32 functions is also my assumption for how this would be used in practice.
  • I also agree that if you need to specify this you'll probably pull in your own copy of a function, library authors in general should not bother thinking about this when writing a general purpose library.
  • No one specifically asked this but I do not think it should be a compilation error to use this in code being compiled for a non-ARM target because that makes cargo test needlessly tricky to use. I'd compare it to how the #![window_subsystem()] attribute just "does nothing" on linux and mac.

@Diggsey: yes.

@petrochenkov
Copy link
Contributor

petrochenkov commented Feb 19, 2020

#[isa] -> #[instruction_set] perhaps?
It's both more precise if you know what this is about, and less ambiguous if you don't.

Are there any other architectures using this trick with multiple instruction decoding modes in single application-level execution? What terminology they use?

@programmerjake
Copy link
Member

Inlining an A32 function with inline assembly that uses an A32-only instruction into a T32 function can break code, so it does affect inlining.

@clarfonthey
Copy link

I feel like this needs to comment on the ABI for the generated function as well, and whether blocks can be annotated with #[isa].

What is the future scope for this kind of attribute? Will an x86 #[isa = "mov"] ever be supported?

Would there ever be a case where additional instructions are required to switch the processor to the new ISA before running the code? If this is allowed, should there be #[isa = "protected"], #[isa = "real"], #[isa = "long"], etc. provided for x86?

The current use case seems extremely tailored to thumb, and definitely needs more elaboration on how this not only makes sense on ARM, but also on x86 and other architectures. It also needs to elaborate why this should be an attribute instead of a separate target.

@Lokathor
Copy link
Contributor

Well I think I explained the "why an attribute and not a target?" question fairly well already. In short, we have both forms as targets already. What's needed is the ability to intermix things.

@Ixrec
Copy link
Contributor

Ixrec commented Feb 22, 2020

Indeed, this comment thread has very thoroughly explained why the proposed feature is the way it is, but the RFC text is still missing most of that reasoning, which is what I think @clarfon meant by "it also needs to elaborate..."

@clarfonthey
Copy link

Yes, I was basing my response on the RFC text, and had only skimmed the comments. The RFC as is is quite bare.

@programmerjake
Copy link
Member

One place #[isa] might also help is with inter-ISA calls when rustc and LLVM eventually gain support for that. We (Libre-SOC -- formerly Libre-RISCV) are currently building a hybrid PowerPC/RISC-V processor that supports calling between ISAs. Admittedly, rustc and/or LLVM will need some rearchitecting before it can be done without some custom linking steps.

@Lokathor
Copy link
Contributor

  1. Ketsuban doesn't have the time to continue the PR and so has added me as a maintainer on the PR, so I'll try to get edits done soon.

  2. @programmerjake So I suppose that'd be a "future possibilities"?

@programmerjake
Copy link
Member

  1. @programmerjake So I suppose that'd be a "future possibilities"?

Definitely. It will probably take a huge amount of work to implement cross-ISA calls in LLVM, so we may want to use a wasm-bindgen kind of approach meanwhile.

@nikomatsakis
Copy link
Contributor

Nominating for mention at lang team meeting: @Lokathor mentioned to me that this RFC had reached a point where more feedback would be useful, and they were curious whether there is a potential @rust-lang/lang sponsor.

Primarily this provides more clarifications as to what's specified, and also an example of "what this looks like in use".
@joshtriplett
Copy link
Member

Based on discussion in the language team meeting today:

  • The instruction_set attribute shouldn't implicitly include the behavior of cfg_attr. It should be a hard error if you use it on an architecture that doesn't support the instruction set.
    • Ideally this hard error should hint to the user that they should use cfg_attr.
  • If we're going to scope instruction set names to make them unambiguous, there was some discussion about using :: for that (e.g. arm::t32).
  • This will make the syntax somewhat more verbose, but it would be easy enough to provide simplified attributes like #[thumb] that expand to the more verbose #[cfg_attr(..., instruction_set(...))].
  • The issue of interactions between this feature, inline assembly, and function inlining raise some concerns. Based on some experimentation from @Amanieu, it sounds like LLVM might do the right thing here. The consensus in the meeting was definitely that we should try to do the right thing if at all possible; there was a concern of how well we can guarantee that. This also raises questions of whether and how inline assembly should be labeled, such as identifying Thumb vs ARM inline assembly.
    • The proposal in the meeting was to add this as an unresolved issue that needs settling before stabilization.

@Amanieu
Copy link
Member

Amanieu commented Apr 24, 2020

I've been thinking about the possible interactions with inline asm and come up with the following proposal:

  • If a function is marked with #[instruction_set] then inline asm in that function will use that instruction set.
  • If inline asm is used in a function that is not marked with #[instruction_set] then it is up to the compiler what instruction set is used. This may vary depending on the optimization level and inlining.

Basically, if your asm depends on #[instruction_set] then it is your responsibility to mark the function containing it appropriately. This does mean that the compiler will need to prevent inlining of functions with incompatible instruction sets if they use asm!.

@pnkfelix
Copy link
Member

pnkfelix commented May 18, 2020

(to be clear, the lang team is essentially waiting to see responses to the comments that @joshtriplett posted up above).

(Although it is worth noting that the RFC text has been updated with changes that are responses to those comments. I will attempt to review the lang team comments and compare against the current state of the RFC to see if there are any outstanding issues beyond the questions regarding the interaction between inline asm and #[instruction_set(..)]

@joshtriplett
Copy link
Member

@pnkfelix I just reviewed the updated RFC, and I think that the updates have indeed addressed all the outstanding concerns, including documenting the unresolved question. If you agree, then I think this is ready for P-FCP.

@joshtriplett
Copy link
Member

I believe all of the lang team's concerns here have been fully addressed.

@rfcbot merge

@rfcbot
Copy link
Collaborator

rfcbot commented Jun 15, 2020

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. labels Jun 15, 2020
@nikomatsakis
Copy link
Contributor

Should we make the attribute imply #[inline(never)]? Are we happy to just leave that as an "unresolved question" for the future?

Also, do we expect to create a project group to pursue this implementation and see this through? @Lokathor are you interested in pursuing that, for example?

@Lokathor
Copy link
Contributor

I think it's fine as an unresolved question. On LLVM having separate features enabled is generally enough to block function inlining as is, but being able to inline the code into the caller isn't fundamentally a bad idea as long as you're not using inline asm, so being able to do it some day would be nice.

I could help coordinate a person or persons doing the implementation work but I don't know enough of the compiler to do this myself in any kind of timely manner. My guess is that this is probably a somewhat smaller change overall, and might not call for a whole group.

@hanna-kruppe
Copy link

Should we make the attribute imply #[inline(never)]? Are we happy to just leave that as an "unresolved question" for the future?

What possible reason would there be for a blanket ban on inlining? Clearly, changing the ISA variant by inlining or other transformations can cause problems (due to inline asm or external constraints like "this is an interrupt handler => needs A32"), but inlining between function using the same ISA should be 100% fine and desirable. The situation is similar to target_feature, where inlining between functions with different "modes" is tricky due to the "mode switch" being function-level rather than more fine-grained, but any code transformation that takes care to avoid those issues is OK.

In fact, since the A32/T32 switch is encoded as target_feature in LLVM IR and LLVM already takes care to avoid inlining problems due to differing target_feature sets, rustc shouldn't have to do anything special to get the correct behavior. (MIR inlining will also have to take care, but again this applies to ordinary target features like AVX as well.)

@nikomatsakis
Copy link
Contributor

@rfcbot reviewed

OK, I'm convinced regarding inlining, thanks. I overlooked the fact that you would have multiple functions within the same "instruction set" invoking one another.

Regarding formation of a project group, @Lokathor, my intent and hope is that this is not a "high overhead" activity, it's just basically a way for us to track progress, and to have a dedicated zulip stream for any discussion (in other words, a group of 1 or 2 people seems fine). But perhaps this RFC will proceed under the "old process" of creating a tracking issue with no particular tracking. We can discuss/decide that separately.

@rfcbot rfcbot added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Jun 29, 2020
@rfcbot
Copy link
Collaborator

rfcbot commented Jun 29, 2020

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Jun 29, 2020
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Jul 9, 2020
@rfcbot
Copy link
Collaborator

rfcbot commented Jul 9, 2020

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

@nikomatsakis
Copy link
Contributor

Huzzah! The @rust-lang/lang team has decided to accept this RFC. You can follow along with development on the tracking issue.

@ketsuban ketsuban deleted the isa-attribute branch July 24, 2020 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-attributes Proposals relating to attributes A-target_feature target_feature related proposals & ideas A-target Target related proposals & ideas disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.