Update jit for new likely class records #51664
cc @dotnet/jit-contrib

Unfortunately, assessing diffs here is a bit tricky. PMI doesn't see this profile data (because we disable R2R for PMI), and SPMI does not have the method context info for the methods that would have diffs. Using the latter we can ballpark how many methods would have diffs; it looks like maybe 2000 or so across all the collections.

I am going to hack PMI to not disable R2R and see if we see anything useful there. I will also try crossgen2-based PMI. |
See #51643 for context. |
Here are the R2R-enabled PMI diffs.
|
Sample regression diff (showing GDV kicking in and enabling some inlining):
before
; Assembly listing for method TypeExtensions:IsPrimitive(Type):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; 1 inlinees with PGO data; 0 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 4, 4 ) ref -> rcx class-hnd
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M56356_IG01:
;; bbWeight=1 PerfScore 0.00
G_M56356_IG02:
mov rax, qword ptr [rcx]
mov rax, qword ptr [rax+112]
mov rax, qword ptr [rax+40]
;; bbWeight=1 PerfScore 6.00
G_M56356_IG03:
rex.jmp rax
;; bbWeight=1 PerfScore 2.00
after
; Assembly listing for method TypeExtensions:IsPrimitive(Type):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; 1 inlinees with PGO data; 3 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 6, 3.52) ref -> rcx class-hnd
; V01 OutArgs [V01 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
; V02 tmp1 [V02,T02] ( 2, 0.97) int -> rax "guarded devirt return temp"
;* V03 tmp2 [V03 ] ( 0, 0 ) ref -> zero-ref class-hnd exact "guarded devirt this exact temp"
; V04 tmp3 [V04,T01] ( 2, 1.94) ubyte -> rcx "Inlining Arg"
;
; Lcl frame size = 40
G_M56356_IG01:
sub rsp, 40
;; bbWeight=1 PerfScore 0.25
G_M56356_IG02:
mov rax, 0xD1FFAB1E
cmp qword ptr [rcx], rax
jne SHORT G_M56356_IG05
;; bbWeight=1 PerfScore 3.25
G_M56356_IG03:
call RuntimeTypeHandle:GetCorElementType(RuntimeType):ubyte
movzx rcx, al
mov eax, 1
shl eax, cl
test eax, 0xD1FFAB1E
setne al
movzx rax, al
movzx rax, al
;; bbWeight=0.49 PerfScore 2.55
G_M56356_IG04:
add rsp, 40
ret
;; bbWeight=0.49 PerfScore 0.61
G_M56356_IG05:
mov rax, qword ptr [rcx]
mov rax, qword ptr [rax+112]
mov rax, qword ptr [rax+40]
;; bbWeight=0.01 PerfScore 0.09
G_M56356_IG06:
add rsp, 40
rex.jmp rax |
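For readers less used to reading JIT listings, the shape of the "after" code can be sketched in C++. This is only a rough analogy: the class names, the `kPrimitiveMask` value, and the helper functions are hypothetical (the listing shows only the placeholder constant 0xD1FFAB1E), and the real transformation happens in the JIT's IR rather than in source code. The idea is that a type test against the likely class guards a direct, inlinable call, with the original virtual call kept as the fallback.

```cpp
#include <cstdint>
#include <typeinfo>

// Hypothetical stand-in for the bit mask tested in the listing, which only
// shows the placeholder 0xD1FFAB1E.
constexpr std::uint32_t kPrimitiveMask = 0x3FFCu;

struct Type {
    virtual ~Type() = default;
    virtual bool IsPrimitive() const = 0;
};

// Plays the role of the "likely class" recorded in the profile data.
struct RuntimeType final : Type {
    std::uint8_t elementType;
    explicit RuntimeType(std::uint8_t e) : elementType(e) {}
    bool IsPrimitive() const override {
        // The & 31 mirrors the 5-bit shift-count masking of `shl eax, cl`.
        return ((1u << (elementType & 31)) & kPrimitiveMask) != 0;
    }
};

struct OtherType : Type {
    bool IsPrimitive() const override { return false; }
};

// "before": a plain virtual call; nothing can be inlined.
bool isPrimitiveBefore(const Type& t) { return t.IsPrimitive(); }

// "after": guarded devirtualization. The exact-type check corresponds to the
// method table compare in IG02; the guarded path is the inlined body.
bool isPrimitiveAfter(const Type& t) {
    if (typeid(t) == typeid(RuntimeType)) {
        const auto& rt = static_cast<const RuntimeType&>(t);
        return ((1u << (rt.elementType & 31)) & kPrimitiveMask) != 0;
    }
    return t.IsPrimitive(); // cold fallback, like IG05/IG06
}
```

The code grows (hence the size regression in the diff), but the likely path no longer pays for an indirect call.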
crossgen2 jit-diffs is hitting the assert from #10821 in SPC:
This happens without these changes too. |
The superpmi CI test is hitting some replay failures... investigating
|
Failure is on replay of During collection we see:
and during replay
and so during replay we issue a

The replay data looks similar, just the high part of the address is different, so perhaps we're still not recording the right amount of data for a PGO payload, or something else is corrupting the data.

Dumping the raw MC for that method, I see
so the method gets likely class data at offset 4. So the profileBufSize should be 12, not 8. Hmm. The bug is here:

    if (pInSchema[i].Offset >= maxOffset)
    {
        maxOffset = pInSchema[i].Offset + pInSchema[i].Count * sizeof(uintptr_t);
    }

We process the first record and set |
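The 8-versus-12 arithmetic can be reproduced with a small standalone sketch. Everything here is an assumption made for illustration: the two-record layout (a 4-byte entry at offset 0 followed by a pointer-sized likely class entry at offset 4), the `ElemSize` field, the fixed `kPointerSize` of 8, and the function names. It is one plausible reading of the comment above, not necessarily the fix that landed in this PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed 64-bit target, so sizeof(uintptr_t) in the original snippet is 8.
constexpr std::size_t kPointerSize = 8;

// Hypothetical, simplified schema record; the real schema has more fields.
struct SchemaRecord {
    std::size_t Offset;   // byte offset of the record's data in the buffer
    std::size_t Count;    // number of elements
    std::size_t ElemSize; // bytes per element (illustrative field)
};

// Mirrors the logic quoted above: every element is assumed pointer-sized,
// and a record is considered only if it starts at or past the current max.
std::size_t bufSizeOriginal(const std::vector<SchemaRecord>& schema) {
    std::size_t maxOffset = 0;
    for (const auto& r : schema) {
        if (r.Offset >= maxOffset) {
            maxOffset = r.Offset + r.Count * kPointerSize;
        }
    }
    return maxOffset;
}

// Alternative sketch: take the furthest end of any record, using each
// record's own element size.
std::size_t bufSizeByExtent(const std::vector<SchemaRecord>& schema) {
    std::size_t end = 0;
    for (const auto& r : schema) {
        end = std::max(end, r.Offset + r.Count * r.ElemSize);
    }
    return end;
}

// Assumed layout: 4-byte entry at offset 0, 8-byte class handle at offset 4.
std::vector<SchemaRecord> exampleSchema() {
    return {{0, 1, 4}, {4, 1, 8}};
}
```

With this layout, `bufSizeOriginal` overestimates the first record as ending at 8, then skips the second record because its offset (4) is below that, so the result stays at 8. `bufSizeByExtent` yields 12, matching the size quoted in the comment.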
@dotnet/jit-contrib ping |
I think the current implementation of getLikelyClass, as a raw function export from the JIT, is unacceptable. It needs to either be a member of the JIT-EE interface (maybe a member of ICorJitCompiler? or maybe ICorJitCompiler should have a kind of QueryInterface that would give back some kind of ICorJitPgoUtility interface), or be in a different dll. The JIT code should have full access to the jithost (so config variables work, memory allocation works), etc.
@BruceForstall frankly I agree with you. If someone has the bandwidth to suggest what the api should actually look like and tweak the jit side of the fence to provide/use the api, I can take the cost for integration into crossgen2. The ergonomics of what's needed in the jit though are rather alien to me, so I'd really rather not drive defining that interface. For instance, I tried to do better when I put this together in the first place, and I was not successful in coming up with a way to integrate with the JIT's memory allocation apis. I eventually gave up and made a stack allocated fixed size buffer which was "good enough" to solve the functional problems I was working on. |
Fix the jit to consume the new "LikelyClass" records seen from crossgen-processed PGO data.