[JIT] Add legacy extended EVEX encoding and EVEX.ND/NF feature to x64 emitter backend #108796

Ruihan-Yin · 2024-10-11T17:13:53Z

Overview

This PR builds on #106557 and is the first one to cover the APX extended EVEX encoding.

This PR adds extended EVEX encoding for legacy instructions that are promoted to the EVEX encoding space. Currently, only instructions with the new data destination (NDD, EVEX.ND) feature are covered in the PR. ~~We plan to cover the encoding and instructions for flag suppression (EVEX.NF) in follow-up PRs~~.

EVEX.ND covered instructions:

INC, DEC, NOT, NEG, 

ADD, SUB, AND, OR, XOR, 

SAL, SAR, SHL, SHR, RCL, RCR, ROL, ROR, 

CMOVcc, IMUL(0xAF).

EVEX.NF covered instructions:

 INC, DEC, NEG, ADD, SUB, AND, OR, XOR, SAL, IMUL, IDIV, MUL, DIV,

ROL, ROR, RCL, RCR, SHL, SHR, SAR

TZCNT, LZCNT, POPCNT,

ANDN, BEXTR, BLSI, BLSMSK, BLSR.

Specification

The EVEX extension of legacy instructions is one of the changes APX makes to the original EVEX prefix to accommodate its new ISA features and instructions. This part of the extension promotes legacy instructions into the EVEX encoding space and gives them features such as EGPR access, new data destination, zero upper, and flag suppression.

As shown in the figure, some bits in the original EVEX prefix have been repurposed: EVEX.b becomes EVEX.ND, and the first bit of EVEX.aaa becomes EVEX.NF. Some other bits have become reserved and must be 0. The promoted legacy instructions also take a new legacy map index, map 4, so EVEX.bits[18:16] (the EVEX.mmm field) is 100b.

All promoted legacy instructions should follow this encoding scheme. For instructions that do not use the register-extension bits to access the upper registers, EVEX.R4, X4, B4, R3, X3, and B3 should be kept at logical 0 (that is, 0, or 1 where the bit is defined in inverted form).
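To make the bit layout concrete, here is a minimal sketch of how the 4-byte prefix could be assembled for a promoted legacy instruction whose operands are all low registers. It is not the emitter code from this PR: `ApxEvexFields` and `buildApxEvexPrefix` are hypothetical names, and the exact bit positions (inverted R3/X3/B3/R4/X4/V4 and vvvv, the NDD register carried in vvvv) are an assumption based on a reading of the APX specification, so they should be checked against the manual.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical description of the fields that vary when no EGPR is involved.
struct ApxEvexFields
{
    bool    w;   // EVEX.W: 64-bit operand size
    uint8_t pp;  // EVEX.pp: 00 = none, 01 = 0x66 operand-size prefix
    uint8_t ndd; // register number of the new data destination (0..15), 0 if unused
    bool    nd;  // EVEX.ND: the instruction has a new data destination
    bool    nf;  // EVEX.NF: suppress flag updates
};

std::array<uint8_t, 4> buildApxEvexPrefix(const ApxEvexFields& f)
{
    assert(f.pp <= 3 && f.ndd <= 15);

    // Byte 1: ~R3 ~X3 ~B3 ~R4 | B4 | mmm.
    // Register-extension bits stay at logical 0 (stored as 1 where inverted);
    // mmm = 100b selects map 4.
    const uint8_t p0 = 0xF0 | 0x04;

    // Byte 2: W | ~vvvv | ~X4 | pp.
    const uint8_t vvvvInverted = static_cast<uint8_t>(~f.ndd & 0x0F);
    const uint8_t p1 =
        static_cast<uint8_t>((f.w ? 0x80 : 0x00) | (vvvvInverted << 3) | 0x04 | f.pp);

    // Byte 3: 000 | ND | ~V4 | NF | 00.
    const uint8_t p2 = static_cast<uint8_t>((f.nd ? 0x10 : 0x00) | 0x08 | (f.nf ? 0x04 : 0x00));

    return {0x62, p0, p1, p2};
}
```

Under this assumed layout, a 32-bit NDD form with destination register 0 gets the prefix bytes 62 F4 7C 18, and setting only NF gives 62 F4 7C 0C.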

Design

As stated above, this PR covers the encoding changes needed for the EVEX extension of legacy instructions and support for EVEX.ND.

The bulk of the changes are in the backend emitter; some changes are added to code generation as the entry point for the NDD-form optimization.

One design point worth calling out: we separated the EVEX encoding path for legacy instructions from the original EVEX path, and the new emit path is guarded by TakesApxExtendedEvexPrefix. The main reason is that the legacy extension part of APX-EVEX breaks the assumption that EVEX is only for SIMD instructions and only appears on SIMD instruction emit paths, an assumption the JIT verifies with many assertion checks. To let the original checks hold as much as possible, we chose either to establish a stand-alone branch for extended legacy instructions on paths that have no legacy encoding, or to reuse the existing legacy encoding path with some prefix work.

Optimization & Performance

The asmdiff results below show a code size regression: using the EVEX.ND feature increases code size. In detail, the NDD form introduces at most a 2-byte regression per instruction. This is expected, as we are using one instruction with a 4-byte prefix to replace 2 legacy instructions that are normally 2 bytes each. This creates a tradeoff between code size and instruction count, and we will contribute a series of follow-up tuning work to teach the JIT to use this feature wisely, maximizing the performance gain while controlling the code size regression.

To help tune the feature, we added an optimization knob for NDD: JitEnableAPXNDD. The NDD optimization currently applies to a few binary and unary instructions when the target register differs from the source operands. Using this feature more wisely will need more tuning work in the future, so we plan to have an individual tuning knob for each feature APX provides, such as NDD and NF.
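The 2-byte figure can be reproduced with a small size model. This is only a sketch under simplifying assumptions: register-to-register forms, one-byte opcodes, and no immediates or memory operands. The function names are made up for illustration.

```cpp
#include <cassert>

// Size of a legacy register-register instruction with a one-byte opcode:
// optional REX prefix + opcode + ModRM.
constexpr int legacyRegRegSize(bool needsRex)
{
    return (needsRex ? 1 : 0) + 1 + 1;
}

// Size of the same operation in promoted form:
// 4-byte EVEX prefix + opcode + ModRM.
constexpr int evexRegRegSize()
{
    return 4 + 1 + 1;
}

// Code size change when "mov dst, src1; op dst, src2" (two legacy
// instructions) is replaced by a single NDD instruction.
constexpr int nddSizeDelta(bool needsRex)
{
    return evexRegRegSize() - 2 * legacyRegRegSize(needsRex);
}

static_assert(nddSizeDelta(false) == 2, "two 2-byte instructions become one 6-byte instruction");
static_assert(nddSizeDelta(true) == 0, "two 3-byte instructions become one 6-byte instruction");
```

In this model the worst case is the 32-bit form with low registers (+2 bytes), while forms that already need a REX prefix on both legacy instructions break even.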

Testing

Note: The testing plan for the APX work was discussed in #106557; please refer to that PR for details. Only results and comments are posted in this PR.

Results are posted separately below.

Follow-up plans

After this PR, we will continue to complete the APX-EVEX support for EVEX.NF for legacy/VEX instructions, and further APX-EVEX support for VEX/EVEX instructions.

Edit:
We eventually decided to cover the EVEX.NF feature within this PR as well. This feature is enabled at the encoding level only, and there will be no active surface for it until the related codegen work is done.

In summary, this PR covers all the changes needed to enable the EVEX.ND/NF features, plus the required register encoding, but it is not intended to provide full coverage of this area.

Ruihan-Yin · 2024-10-11T17:14:14Z

1. Emitter unit tests

JIT disassembler outputs:

LLVM disassembler outputs:

No disassembly diff observed:

Ruihan-Yin · 2024-10-11T17:14:36Z

2. SuperPMI

Verification with SuperPMI:

asmdiffs:
Diffs are based on 2,830,588 contexts (1,185,269 MinOpts, 1,645,319 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 11 (0.00%)

Diff JIT options: JitBypassAPXCheck=1

Overall (+330,453 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
aspnet.run.windows.x64.checked.mch	49,406,065	+17,322	-0.62%
benchmarks.run.windows.x64.checked.mch	12,230,572	+12,326	-0.78%
benchmarks.run_pgo.windows.x64.checked.mch	40,192,955	+34,649	-0.73%
benchmarks.run_tiered.windows.x64.checked.mch	17,606,620	+8,076	-0.87%
coreclr_tests.run.windows.x64.checked.mch	409,086,766	+40,588	-0.42%
libraries.crossgen2.windows.x64.checked.mch	45,250,222	+15,319	-1.03%
libraries.pmi.windows.x64.checked.mch	63,022,393	+26,233	-1.10%
libraries_tests.run.windows.x64.Release.mch	336,307,360	+113,846	-0.62%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	147,986,092	+48,675	-0.47%
realworld.run.windows.x64.checked.mch	11,552,911	+5,010	-0.76%
smoke_tests.nativeaot.windows.x64.checked.mch	5,023,568	+8,409	-1.45%

MinOpts (+17,921 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
aspnet.run.windows.x64.checked.mch	23,379,337	+552	-0.38%
benchmarks.run.windows.x64.checked.mch	588	+2	-0.43%
benchmarks.run_pgo.windows.x64.checked.mch	18,796,230	+1,116	-0.46%
benchmarks.run_tiered.windows.x64.checked.mch	13,707,415	+945	-0.45%
coreclr_tests.run.windows.x64.checked.mch	287,081,075	+8,891	-0.44%
libraries.crossgen2.windows.x64.checked.mch	1,705	+2	-0.43%
libraries.pmi.windows.x64.checked.mch	112,961	+2	-0.43%
libraries_tests.run.windows.x64.Release.mch	203,705,533	+5,515	-0.42%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	10,696,900	+896	-0.12%

FullOpts (+312,532 bytes)

Collection	Base size (bytes)	Diff size (bytes)	PerfScore in Diffs
aspnet.run.windows.x64.checked.mch	26,026,728	+16,770	-0.63%
benchmarks.run.windows.x64.checked.mch	12,229,984	+12,324	-0.78%
benchmarks.run_pgo.windows.x64.checked.mch	21,396,725	+33,533	-0.74%
benchmarks.run_tiered.windows.x64.checked.mch	3,899,205	+7,131	-0.98%
coreclr_tests.run.windows.x64.checked.mch	122,005,691	+31,697	-0.42%
libraries.crossgen2.windows.x64.checked.mch	45,248,517	+15,317	-1.03%
libraries.pmi.windows.x64.checked.mch	62,909,432	+26,231	-1.10%
libraries_tests.run.windows.x64.Release.mch	132,601,827	+108,331	-0.64%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	137,289,192	+47,779	-0.48%
realworld.run.windows.x64.checked.mch	11,139,943	+5,010	-0.76%
smoke_tests.nativeaot.windows.x64.checked.mch	5,022,597	+8,409	-1.45%

tpdiff:

Diff JIT options: JitBypassAPXCheck=1

Overall (+0.27% to +0.60%)

Collection	PDIFF
aspnet.run.windows.x64.checked.mch	+0.41%
benchmarks.run.windows.x64.checked.mch	+0.27%
benchmarks.run_pgo.windows.x64.checked.mch	+0.35%
benchmarks.run_tiered.windows.x64.checked.mch	+0.60%
coreclr_tests.run.windows.x64.checked.mch	+0.53%
libraries.crossgen2.windows.x64.checked.mch	+0.38%
libraries.pmi.windows.x64.checked.mch	+0.30%
libraries_tests.run.windows.x64.Release.mch	+0.46%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	+0.32%
realworld.run.windows.x64.checked.mch	+0.29%
smoke_tests.nativeaot.windows.x64.checked.mch	+0.27%

MinOpts (+0.82% to +1.08%)

Collection	PDIFF
aspnet.run.windows.x64.checked.mch	+1.08%
benchmarks.run.windows.x64.checked.mch	+0.92%
benchmarks.run_pgo.windows.x64.checked.mch	+1.04%
benchmarks.run_tiered.windows.x64.checked.mch	+1.05%
coreclr_tests.run.windows.x64.checked.mch	+0.82%
libraries.crossgen2.windows.x64.checked.mch	+1.02%
libraries.pmi.windows.x64.checked.mch	+0.82%
libraries_tests.run.windows.x64.Release.mch	+1.07%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	+0.90%
realworld.run.windows.x64.checked.mch	+1.02%
smoke_tests.nativeaot.windows.x64.checked.mch	+0.95%

FullOpts (+0.22% to +0.38%)

Collection	PDIFF
aspnet.run.windows.x64.checked.mch	+0.27%
benchmarks.run.windows.x64.checked.mch	+0.27%
benchmarks.run_pgo.windows.x64.checked.mch	+0.22%
benchmarks.run_tiered.windows.x64.checked.mch	+0.24%
coreclr_tests.run.windows.x64.checked.mch	+0.31%
libraries.crossgen2.windows.x64.checked.mch	+0.38%
libraries.pmi.windows.x64.checked.mch	+0.30%
libraries_tests.run.windows.x64.Release.mch	+0.26%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch	+0.31%
realworld.run.windows.x64.checked.mch	+0.28%
smoke_tests.nativeaot.windows.x64.checked.mch	+0.27%

dotnet-policy-service · 2024-10-11T17:14:54Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Ruihan-Yin · 2024-10-11T17:14:56Z

3. JIT unit tests

DOTNET_JitBypassAPXCheck = 0:

DOTNET_JitBypassAPXCheck = 1:

Ruihan-Yin · 2024-10-11T17:15:18Z

4. Supplement files:

To see detailed diffs, please refer to the following files (they are too large to display on GitHub):

asm:
asmdiff_summary.md
(~100+ assertion failures related to ISA checks, expected due to the CPUID updates).
asm.log

tpdiff:
tpdiff_summary.md
(~500+ assertion failures related to the pipeline itself - map-key missing, should not be related to the changes.)
tpdiff.log

Update comments.
Merge the REX2 changes into the original legacy emit path
bug fix: Set REX2.W with correct mask code.
register encoding and prefix emitting logics.
Add REX2 prefix emit logic bug fixes
Add Stress mode for REX2 encoding and some bug fixes
resolve comments: 1. add assertion check for UD opcodes. 2. add checks for EGPRs.
Add REX2 to emitOutputAM, and let LEA to be REX2 compatible.
Add REX2.X encoding for SIB byte
Bug fixes: add REX2 prefix on the path in RI where MOV is specially handled.
Enable REX2 encoding for `movups`
fixed bugs in REX2 prefix emitting logic when working with map 1 instructions, and enabled REX2 for POPCNT
legacy map index-er bug fixes
some clean-up
Adding initial APX unit testing path.
Adding a coredistools dll that has LLVM APX disasm capability. It must be copied into a CORE_ROOT manually.
clean up work for REX2
narrow the REX2 scope to `sub` only
some clean up based on the comments.
bug fix
resolve comment

- SV path is mostly for debugging purposes Added encoding unit tests for instructions with immediates

Code refactoring: AddX86PrefixIfNeeded.

… missing in JIT, may indicate these instructions are not being used in JIT, drop them for now.

tannergooding · 2025-01-24T17:15:37Z

Looks like there's still a merge conflict. I've started the secondary review however, so if that gets resolved I expect we can get this merged today

tannergooding · 2025-01-24T18:28:54Z

src/coreclr/jit/codegenxarch.cpp

+        if (GetEmitter()->DoJitUseApxNDD(ins) && (targetReg != operandReg))
+        {
+            GetEmitter()->emitIns_R_R(ins, emitTypeSize(operand), targetReg, operandReg, INS_OPTS_EVEX_nd);
+        }
+        else
+        {
+            inst_Mov(targetType, targetReg, operandReg, /* canSkip */ true);
+            inst_RV(ins, targetReg, targetType);
+        }


This general pattern is repeated quite a lot (with some variations), so I wonder if we should have a helper like I added for SIMD.

For example, we have emitIns_SIMD_R_R_R which looks like: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/emitxarch.cpp#L8855-L8880 (other variations exist for handling things like memory operands or immediate; and higher level helpers like genHWIntrinsic_R_R_RM exist for determining which of the variations to call between emitIns_SIMD_R_R_R, emitIns_SIMD_R_R_A, emitIns_SIMD_R_R_C, and emitIns_SIMD_R_R_S)

This lets us correctly represent any SIMD dst = src1 op src2 operation given the raw registers and then internally handles the RMW consideration, so that the rest of codegen can remain simpler and more readable.

In this case, for example, it seems like we "could" have simplified this down to something like:

GetEmitter()->emitIns_BASE_R_R(ins, emitTypeSize(operand), targetReg, operandReg);

and then had this helper make the distinction of handling APX, NDD, inserting the Mov for the regular case, etc.

Presumably this would also make the diffs for other APX support much simpler as well, since we have fewer centralized helpers to update.
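A toy model of such a helper might look like the following. Everything here is hypothetical (`ToyEmitter`, `emitUnary`, and the string output are stand-ins, not JIT APIs); the only point is that the NDD versus mov-plus-RMW decision lives in one place instead of in every codegen call site.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for the emitter: it only records the instructions it would emit.
struct ToyEmitter
{
    bool                     useApxNdd = false; // stands in for DoJitUseApxNDD(ins)
    std::vector<std::string> emitted;

    static std::string reg(int n)
    {
        return "r" + std::to_string(n);
    }

    // Hypothetical helper in the spirit of the suggested emitIns_BASE_R_R:
    // the caller asks for "target = ins(operand)" and the helper decides
    // between the NDD form and the mov + read-modify-write form.
    void emitUnary(const std::string& ins, int targetReg, int operandReg)
    {
        assert(targetReg >= 0 && operandReg >= 0);

        if (useApxNdd && (targetReg != operandReg))
        {
            // One instruction that writes a separate destination register.
            emitted.push_back(ins + " " + reg(targetReg) + ", " + reg(operandReg) + " ; NDD");
            return;
        }

        // Regular case: copy first (skipped when the registers already match),
        // then operate in place.
        if (targetReg != operandReg)
        {
            emitted.push_back("mov " + reg(targetReg) + ", " + reg(operandReg));
        }
        emitted.push_back(ins + " " + reg(targetReg));
    }
};
```

With `useApxNdd` off, `emitUnary("not", 1, 2)` records a `mov` followed by `not`; with it on, the same call records a single NDD instruction, and the caller does not change.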

tannergooding · 2025-01-24T18:42:42Z

src/coreclr/jit/codegenxarch.cpp

+            // reg3 = op1 op op2 without extra mov
+
+            // see if it can be optimized by inc/dec
+            if (oper == GT_ADD && op2->isContainedIntOrIImmed() && !treeNode->gtOverflowEx())


The handling here of ADD into INC/DEC is also repeated in multiple locations, so probably another place where having centralized helpers is beneficial and ensures we're not missing it anywhere.

It's much better as a peephole in emit than something codegen must directly consider, IMO.
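As a rough illustration of what a centralized version of that check could look like, here is a minimal sketch. The enum and function are hypothetical and much simpler than the real emitter; the overflow condition mirrors the `!treeNode->gtOverflowEx()` test in the snippet above, since inc and dec leave the carry flag untouched.

```cpp
#include <cassert>
#include <cstdint>

enum class Ins
{
    Add,
    Inc,
    Dec
};

// Hypothetical emit-time peephole: "add reg, 1" and "add reg, -1" can be
// emitted as inc/dec, but only when no overflow check depends on the flags
// that a real add would produce.
constexpr Ins selectAddForm(int64_t imm, bool needsOverflowCheck)
{
    if (needsOverflowCheck)
    {
        return Ins::Add;
    }
    if (imm == 1)
    {
        return Ins::Inc;
    }
    if (imm == -1)
    {
        return Ins::Dec;
    }
    return Ins::Add;
}
```

Keeping the decision in one function means every path that emits an add with an immediate gets the same treatment, which is the concern raised in the comment.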

tannergooding · 2025-01-24T18:43:53Z

src/coreclr/jit/codegenxarch.cpp

@@ -4406,23 +4469,23 @@ void CodeGen::genCodeForLockAdd(GenTreeOp* node)
        if (imm == 1)
        {
            // inc [addr]
-            GetEmitter()->emitIns_AR(INS_inc, size, addr->GetRegNum(), 0);
+            GetEmitter()->emitIns_AR(INS_inc_no_evex, size, addr->GetRegNum(), 0);


nit: We should probably keep the existing name since it's the baseline instruction. We should rather give the APX-specific variant a new name, like INS_inc_apx or similar, to help ensure other paths don't accidentally use the wrong one.

The changes here were made because the LOCK prefix cannot be used with an EVEX-prefixed instruction, but it is legal with a REX2-prefixed instruction. This happens in very limited cases with inc, dec, and, or.

I definitely agree with the idea that the new naming variants should point to the instructions with new features and only be used when new features such as EGPRs, NDD, and NF are needed. But I will probably need to preserve the REX2 functionality in the original INS_inc to get EGPR support. That might stray a bit from the semantics of the names INS_inc/INS_inc_apx. Will that be acceptable?

But I will probably need to preserve the REX2 functionality in the original INS_inc to get EGPRs support.

It's definitely fine for an instruction like INS_inc to allow opportunistic lightup for the REX2 encoding; we have the same thing where INS_addps is used for legacy, vex, and evex all together for example.

The main consideration is simply that we don't want the "good name" like INS_inc to be the thing that requires higher level checks (i.e. requires checking APX is supported). Such a case would inevitably cause issues down the road because someone thinks it is simply the inc instruction that's been around for 40+ years now.

If there must be two different entries for the same instruction because the opcodes conflict, then names like INS_inc and INS_inc_apx sound good to me. However, if it's just a restriction that something like LOCK can't use the EVEX encoding, and the opcode and base information otherwise remain the same, that sounds like we don't actually need "two instructions" defined; it is rather something that LSRA handles in the allowed registers and codegen handles in the INS_OPTS it passes down.
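One way to picture handling this through options rather than separate instruction entries is sketched below. The types and the `chooseEncoding` function are hypothetical and heavily simplified; the real emitter considers many more conditions. The sketch only encodes the two facts stated in this thread: LOCK cannot be combined with the EVEX form, and REX2 remains available for EGPR access.

```cpp
#include <cassert>

enum class Encoding
{
    Legacy,
    Rex2,
    ApxEvex
};

// Hypothetical per-instruction request, roughly what codegen might express
// through instruction options.
struct InsRequest
{
    bool hasLockPrefix; // instruction is emitted with LOCK
    bool usesEgpr;      // an operand is one of the extended GPRs
    bool wantsNddOrNf;  // caller asked for an EVEX-only feature
};

// Pick an encoding for a single instruction-table entry such as INS_inc.
inline Encoding chooseEncoding(const InsRequest& r)
{
    if (r.hasLockPrefix)
    {
        // LOCK is not legal with the EVEX form, so EVEX-only features
        // must not be requested together with it.
        assert(!r.wantsNddOrNf);
        return r.usesEgpr ? Encoding::Rex2 : Encoding::Legacy;
    }
    if (r.wantsNddOrNf)
    {
        return Encoding::ApxEvex;
    }
    return r.usesEgpr ? Encoding::Rex2 : Encoding::Legacy;
}
```

In this shape the plain name keeps its baseline meaning, and the EVEX form is reached only when the caller explicitly asks for NDD or NF.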

tannergooding · 2025-01-24T18:44:22Z

src/coreclr/jit/codegenxarch.cpp

@@ -4449,7 +4512,7 @@ void CodeGen::genLockedInstructions(GenTreeOp* node)

    if (node->OperIs(GT_XORR, GT_XAND))
    {
-        const instruction ins = node->OperIs(GT_XORR) ? INS_or : INS_and;
+        const instruction ins = node->OperIs(GT_XORR) ? INS_or_no_evex : INS_and_no_evex;


Same comment on dec, add, or, etc.

tannergooding · 2025-01-24T18:50:37Z

src/coreclr/jit/emitxarch.cpp

+    // TODO-Xarch-apx: we have special stress mode for REX2 on non-compatible machine, that will
+    //                 force UseRex2Encoding return true regardless of the CPUID results.


Is this comment necessary? I don't believe we have such a comment for the EVEX path.

I also thought we weren't enforcing it regardless of the CPUID; but rather were allowing it to be set where supported and using AltJit to get ISAs like APX enabled so disassembly can be gotten

No, I will remove it. This was for some internal testing at the initial stage, which has since been deprecated.

tannergooding · 2025-01-24T18:56:57Z

src/coreclr/jit/emitxarch.cpp

+                        assert(hasEvexPrefix(code));
+                        code = AddRexWPrefix(id, code);
+                    }
+                    if ((ins != INS_lzcnt_evex) && (ins != INS_tzcnt_evex) && (ins != INS_popcnt_evex))


These are rather lzcnt_apx, not strictly "evex" right? That is, there are other lzcnt/popcnt instructions for SIMD under a different name, so perhaps the base ISA is better than the encoding here?

It is strictly for apx-promoted-evex; REX2 does not have this issue, though. Alternatively, we could name it something like INS_lzcnt_apx_evex?

This should come down to the same consideration as #108796 (comment)

If we have a unique opcode situation (like the manual calls out) then having a new instruction defined is reasonable. However, if the opcode is the same and its really just special handling of prefixes or features like embedded masking that is only available in a newer ISA, then that rather seems like something that is implicitly handled in emit or other places instead.

The screenshot of the manual you gave, for example, is referring to the opcode as being the actual single-byte opcode + additional information like prefixes. Where-as the JIT really just considers the main opcode byte and automatically extracts any required prefixes into the relevant bit positions.

lzcnt is a case that actually has a new main opcode byte, as it changes the "main opcode byte" from 0xBD to instead be 0xF5. The legacy encoding also has a required prefix of 0xF3 and 0x0F; where-as the APX encoding uses 66 + MAP4, so they are clearly incompatible and require separate instruction table entries to be handled.

tannergooding · 2025-01-24T19:00:11Z

Changes overall LGTM. However, I have some concerns about the naming/renaming of some instructions that are liable to cause issues down the road. -- I'd like to see these handled before merging

There was then a separate suggestion about defining helpers to extract and hide the differences of BASE vs APX encoding from codegen. -- I don't think these must be handled before merging, but I expect doing them now may simplify other work we're going to be doing here and so at the very least logging an issue and aiming for that to be done as part of the overall APX work is goodness; as it will help avoid places where we miss such optimizations.

Ruihan-Yin · 2025-01-24T19:55:42Z

Thanks for the feedback! I will make changes for naming and helpers in this PR together, also I left some thoughts on the naming issue, would appreciate it if we can discuss more on that.

tannergooding

LGTM. Thanks for handling some of the comments, this is a lot easier to follow now and the code reuse helps build confidence in it being correct.

CC. @dotnet/jit-contrib for secondary review of the additional cleanup

BruceForstall · 2025-01-29T23:13:46Z

@Ruihan-Yin There are a few asm diffs where not arguments change from 32-bit registers to smaller registers (al, ax, etc.):

https://dev.azure.com/dnceng-public/public/_build/results?buildId=932515&view=ms.vss-build-web.run-extensions-tab

Can you explain why there are diffs? Shouldn't this PR be "no diff"?

Ruihan-Yin · 2025-01-29T23:51:22Z

@Ruihan-Yin There are a few asm diffs where not arguments change from 32-bit registers to smaller registers (al, ax, etc.):

https://dev.azure.com/dnceng-public/public/_build/results?buildId=932515&view=ms.vss-build-web.run-extensions-tab

Can you explain why there are diffs? Shouldn't this PR be "no diff"?

https://github.com/dotnet/runtime/pull/108796/files#diff-63dc452244e1b3fea66bfdc746d83c26f866ef153966fd708585df5428e49093R745
The discrepancy should be due to emitTypeSize() and emitActualTypeSize() returning different values for small types, e.g. BYTE or USHORT. emitTypeSize() returns the "real" size, 1 or 2, and emitActualTypeSize() returns 4.

I'm actually a bit confused about when to use each of these 2 helpers; it would be useful to know, as the choice leads to different codegen.
I did see this diff locally and didn't get the time to bring it up here. Thanks for pointing it out.

Edit:
I will change the size helper back to emitActualTypeSize since as mentioned this PR is not intended to introduce any asm diff when the tuning knob is off, but I would still appreciate it if some thoughts on the question I had above can be shared. :)

BruceForstall · 2025-01-30T00:35:06Z

emitActualTypeSize and emitTypeSize only differ in the byte/short integer types. There's some sloppiness in the JIT as a result: since in many (or even most) cases the argument is known to NOT be a byte/short type, either can be used. In actuality, in most cases emitActualTypeSize is probably what should be used. The exception would be loads/stores, when the precise size matters.
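The difference can be pictured with a toy model. The enum and the two functions below are simplified stand-ins, not the JIT's definitions; they only reproduce the behaviour described in this thread, where small integer types report 1 or 2 bytes from one helper and 4 from the other.

```cpp
#include <cassert>

enum class VarType
{
    Byte,
    UByte,
    Short,
    UShort,
    Int,
    Long
};

// Toy model of emitTypeSize: the exact in-memory size of the type.
constexpr int toyTypeSize(VarType t)
{
    switch (t)
    {
        case VarType::Byte:
        case VarType::UByte:
            return 1;
        case VarType::Short:
        case VarType::UShort:
            return 2;
        case VarType::Int:
            return 4;
        default:
            return 8; // VarType::Long
    }
}

// Toy model of emitActualTypeSize: small integers are widened to 4 bytes,
// matching how they are normally held in registers.
constexpr int toyActualTypeSize(VarType t)
{
    return (toyTypeSize(t) < 4) ? 4 : toyTypeSize(t);
}
```

For a `not` on a UBYTE operand, the first helper would select a 1-byte operand size (an `al`-style register) and the second a 4-byte one (`eax`), which matches the kind of diff reported above.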

Ruihan-Yin · 2025-01-30T00:36:31Z

I see, thanks for the explanation!

Ruihan-Yin · 2025-01-30T04:49:56Z

The failure looks very similar to #84911; rerunning the CI.

BruceForstall · 2025-01-30T17:48:44Z

@Ruihan-Yin Thanks for all the work!

Ruihan-Yin · 2025-01-30T17:49:48Z

Thanks for reviews and suggestions!

* main: (31 commits) More native AOT Pri-1 test tree bring up (dotnet#111994) Fix BigInteger outerloop test (dotnet#111841) JIT: Run 3-opt once across all regions (dotnet#111989) JIT: Check for profile consistency throughout JIT backend (dotnet#111684) [JIT] Add legacy extended EVEX encoding and EVEX.ND/NF feature to x64 emitter backend (dotnet#108796) [iOS][globalization] Fix IndexOf on empty strings on iOS to return -1 (dotnet#111898) System.Speech: Use intellisense xml from dotnet-api-docs (dotnet#111983) [mono][mini] Disable inlining if we encounter class initialization failure (dotnet#111754) [main] Update dependencies from dotnet/roslyn (dotnet#111946) Update dependencies from https://github.com/dotnet/arcade build 20250129.2 (dotnet#111996) Try changing the ICustomQueryInterface implementation to always return NotHandled instead of Failed to defer back to the ComWrappers impl. (dotnet#111978) Combined dependency update (dotnet#111852) Replace OPTIMIZE_FOR_SIZE with feature switch (dotnet#111743) Fix failed assertion 'FPbased == FPbased2' (dotnet#111787) Add remark to `ConditionalSelect` (dotnet#111945) JIT: fix try region cloning when try is nested in a handler (dotnet#111975) Use IRootFunctions in Tensor.StdDev (dotnet#110641) Remove zlib dependencies from Docker containers (dotnet#111939) Avoid `Unsafe.As` for `Memory<T>` and `ReadOnlyMemory<T>` conversion (dotnet#111023) Cleanup membarrier portability (dotnet#111943) ...

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 11, 2024

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Oct 11, 2024

DeepakRajendrakumaran mentioned this pull request Oct 11, 2024

[JIT] [APX] Enable additional General Purpose Registers. #108799

Open

BruceForstall added the apx Related to the Intel Advanced Performance Extensions (APX) label Oct 15, 2024

BruceForstall mentioned this pull request Oct 15, 2024

Intel architecture improvements for .NET 10 #108869

Open

36 tasks

Ruihan-Yin force-pushed the apx-evex-legacy branch from 0bd4680 to c4b162d Compare November 19, 2024 19:49

Ruihan-Yin added 13 commits November 19, 2024 12:05

resolve comments

d1afc68

refactor register encoding for REX2

2335aa3

merge REX2 path to legacy path

6578c58

Enable REX2 in more instructions.

01eeb80

Avoid repeatedly estimate the size of REX2 prefix

690aee3

Enable REX2 encoding on RI and SV path

31d7fb4

- SV path is mostly for debugging purposes Added encoding unit tests for instructions with immediates

Add rex2 support to rotate and shift.

a995878

CR session.

74aacf6

Testing infra updates: assert REX2 is enabled.

c330927

Code refactoring: AddX86PrefixIfNeeded.

revert rcl_N and rcr_N, tp and latency data for these instructions is…

fbf20d1

… missing in JIT, may indicate these instructions are not being used in JIT, drop them for now.

partially enable REX2 on emitOutputAM, case covered: R_AR and AR_R.

ea02e70

Adding unit tests.

c74b801

Ruihan-Yin added 3 commits January 24, 2025 09:44

Merge remote-tracking branch 'origin/main' into apx-evex-legacy-jan

521f978

Resolve merge error.

b48e3d1

formatting

5340893

tannergooding reviewed Jan 24, 2025

Ruihan-Yin added 2 commits January 28, 2025 13:44

resolve comments

8dd0b62

formatting

f424766

Ruihan-Yin requested a review from tannergooding January 29, 2025 18:36

tannergooding approved these changes Jan 29, 2025

use the right size for neg and not code gen.

f18e5f0

Ruihan-Yin closed this Jan 30, 2025

Ruihan-Yin reopened this Jan 30, 2025

Ruihan-Yin requested a review from BruceForstall January 30, 2025 17:20

BruceForstall approved these changes Jan 30, 2025

BruceForstall merged commit f652094 into dotnet:main Jan 30, 2025
123 checks passed
