Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't resolve direct methods outside of the version bubble #38366

Closed
wants to merge 2 commits into from

Conversation

trylek
Copy link
Member

@trylek trylek commented Jun 25, 2020

This change fixes 66 TypeGeneratorTests that were previously failing due to inability to locate the associated MethodDesc.

/cc: @dotnet/crossgen-contrib

@trylek trylek changed the title WIP: Don't resolve direct methods outside of the version bubble Don't resolve direct methods outside of the version bubble Jun 25, 2020
@@ -1244,6 +1244,13 @@ private void getFieldInfo(ref CORINFO_RESOLVED_TOKEN pResolvedToken, CORINFO_MET
//

targetMethod = methodAfterConstraintResolution;
if (!_compilation.CompilationModuleGroup.VersionsWithMethodBody(targetMethod))
{
// This loosely corresponds to the bit around
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the specific bit that this corresponds to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that the undo of the constraint resolution takes place at line 1301,

methodToken = pResolvedToken->token;

The line I quoted before corresponds to the "same version bubble" check.

// This loosely corresponds to the bit around
// https://github.com/dotnet/runtime/blob/9380bbf37bb0ba19c7e2671359875c93686b400c/src/coreclr/src/vm/zapsig.cpp#L1278
// where we're resorting to use the original token for the sake of version resiliency.
targetMethod = originalMethod;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to undo other effects of the constraint resolution? E.g. set resolvedConstraint back to false?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it some more it may be better to push this a bit up to the place where we handle the constraint resolution in general so that we only conditionally apply it when the resolved method belongs to the same version bubble. I'll fix that together with other PR feedback based on my attempts at clarifications below.

@davidwrighton
Copy link
Member

@trylek, could please explain which tests specifically are fixed by this change, and what the observed effect of the change is? In particular I'm interested in seeing the impact on the r2rdump disassembly for a case which changes.

@davidwrighton
Copy link
Member

In general this fix looks wrong, as it might result in making an interface dispatch instead of a constrained dispatch when making a constrained dispatch on a structure defined in another module, or something else unfortunate.

@trylek
Copy link
Member Author

trylek commented Jun 25, 2020

@davidwrighton, @jkotas - Thanks for your immediate responses. One example test failing without this change is

Loader\classloader\TypeGeneratorTests\TypeGeneratorTest1\Generated1\Generated1.cmd

Diagnostic output from the failure looks like this:

...
===================== Struct Constrained Interface Calls Test =====================
M.MyStruct51.T)W>(!!W 'inst', string exp)
   # count = 4
   # expectedResultsArray.Length = 4
      # expectedResultsArray[0] = 'MyStruct51::Method0.405()'
      # expectedResultsArray[1] = 'MyStruct51::Method1.MI.407()'
      # expectedResultsArray[2] = 'MyStruct51::Method2.MI.409()'
      # expectedResultsArray[3] = 'MyStruct51::Method3.MI.411()'
    -> EXPECTED: MyStruct51::Method0.405()
    -> GOT:      MyStruct51::Method0.405()
    -> EXPECTED: MyStruct51::Method1.MI.407()
    -> GOT:      MyStruct51::Method1.MI.407()
    -> EXPECTED: MyStruct51::Method2.MI.409()
    -> GOT:      MyStruct51::Method2.MI.409()
    -> EXPECTED: MyStruct51::Method3.MI.411()
    -> GOT:      MyStruct51::Method3.MI.411()
Assert failure(PID 24572 [0x00005ffc], Thread: 36640 [0x8f20]): false
CORECLR! MethodDesc::FindOrCreateAssociatedMethodDesc + 0x1F9F (0x00007ff9`b8c1edab)
CORECLR! ZapSig::DecodeMethod + 0x5A1 (0x00007ff9`b8c50281)
CORECLR! ZapSig::DecodeMethod + 0xF0 (0x00007ff9`b8c5051c)
CORECLR! LoadDynamicInfoEntry + 0x882 (0x00007ff9`b88891fa)
CORECLR! Module::FixupNativeEntry + 0x28B (0x00007ff9`b88cdc5b)
CORECLR! Module::FixupDelayListAux + 0x485 (0x00007ff9`b8ac473d)
CORECLR! ReadyToRunInfo::GetEntryPoint + 0x388 (0x00007ff9`b8ac667c)
CORECLR! MethodDesc::GetPrecompiledR2RCode + 0xD8 (0x00007ff9`b88f28c0)
CORECLR! MethodDesc::GetPrecompiledCode + 0xBE (0x00007ff9`b88f2626)
CORECLR! MethodDesc::PrepareILBasedCode + 0x20D (0x00007ff9`b88f5c3d)
    File: D:\git\runtime6\src\coreclr\src\vm\genmeth.cpp Line: 810
    Image: d:\git\runtime6\artifacts\tests\coreclr\Windows_NT.x64.Checked\Tests\Core_Root\CoreRun.exe
Expected: 100
Actual: -1073740286
END EXECUTION - FAILED

In the test source file Generated1.il, the test fails at line 468,

call void Framework::M.MyStruct51.A<valuetype MyStruct51`1<class BaseClass0>>(!!0,string) leave.s LV1

in the call to

				call void Framework::M.MyStruct51.A>(!!0,string) leave.s LV1

The method source starts at line 218 in the same file:

.method static void M.MyStruct51.A<(valuetype MyStruct51`1<class BaseClass0>)W>(!!W 'inst', string exp) cil managed {

	.method static void M.MyStruct51.A<(valuetype MyStruct51`1)W>(!!W 'inst', string exp) cil managed {
		.maxstack 9
		.locals init (string[] actualResults)
		ldc.i4.s    4
		newarr      string
		stloc.s     actualResults
		ldarg.1
		ldstr "M.MyStruct51.A<(valuetype MyStruct51`1)W>(!!W 'inst', string exp)"
		ldc.i4.s 4
		ldloc.s      actualResults
		ldc.i4.s     0
		ldarga.s     0
		constrained. valuetype MyStruct51`1
		callvirt     instance string IBase0::Method0()
		stelem.ref

As you can easily see, it starts out with a constrained virtual call to IBase0.Method0(). CG1 R2RDump for the beginning of the method looks like this:

void Framework.M.MyStruct51.A>(MyStruct51`1<__Canon>, string)
Handle: 0x06000017
...
    5850: 57                    push    rdi
                                UWOP_PUSH_NONVOL RDI(7)
    5851: 56                    push    rsi
                                UWOP_PUSH_NONVOL RSI(6)
    5852: 53                    push    rbx
                                UWOP_PUSH_NONVOL RBX(3)
    5853: 48 83 ec 20           sub     rsp, 32
                                UWOP_ALLOC_SMALL 32
    5857: 48 89 54 24 48        mov     qword ptr [rsp + 72], rdx
    585c: 49 8b f0              mov     rsi, r8
    585f: b9 04 00 00 00        mov     ecx, 4
    5864: ff 15 16 b8 ff ff     call    qword ptr [0x1080]    // string[] (NEW_ARRAY)
    586a: 48 8b f8              mov     rdi, rax
    586d: 48 8b 15 54 ba ff ff  mov     rdx, qword ptr [0x12c8] // MyStruct51`1 (TYPE_DICTIONARY)
    5874: 48 8d 4c 24 48        lea     rcx, [rsp + 72]
    5879: ff 15 39 b9 ff ff     call    qword ptr [0x11b8]    // string IBase0.Method0() @ MyStruct51`1 (METHOD_ENTRY)

GC2 R2RDump for the same method looks like this:

void Framework.M.MyStruct51.A>(MyStruct51`1<__Canon>, string)
Handle: 0x06000017
...
    2806: 48 89 54 24 48        mov     qword ptr [rsp + 72], rdx
    280b: 49 8b f0              mov     rsi, r8
    280e: b9 04 00 00 00        mov     ecx, 4
    2813: ff 15 07 3a 00 00     call    qword ptr [0x6220]    // string[] (NEW_ARRAY)
    2819: 48 8b f8              mov     rdi, rax
    281c: 48 8b 15 ed 3b 00 00  mov     rdx, qword ptr [0x6410] // MyStruct51`1 (TYPE_DICTIONARY)
    2823: 48 8d 4c 24 48        lea     rcx, [rsp + 72]
    2828: ff 15 4a 3a 00 00     call    qword ptr [0x6278]    // string MyStruct51`1<__Canon>.Method0() @ MyStruct51`1 (METHOD_ENTRY)

When I was stepping through the CoreRun execution, it seems to me that FindOrCreateAssociatedMethodDesc steps through base class chain starting at MyStruct51`1<__Canon> and so it misses the actual Method0 as that is on the IBase0 interface. I may have missed something in the relevant logic, I just tried to make sure that the generated code looks the same. I have actually recently found another bug I'll describe in another comment as this is getting too lengthy ;-).

@trylek
Copy link
Member Author

trylek commented Jun 25, 2020

For the other bug, that affects six tests under the TypeGeneratorTests, numbers 612, 613, 614, 681, 682, and 683. Its manifestation in the log is the following:

========================== Constrained Calls Test ==========================
...
M.IBase1.A<(class IBase1`1)W>(!!W 'inst', string exp)
   # count = 3
   # expectedResultsArray.Length = 3
      # expectedResultsArray[0] = 'G2_C21::Method4.4929()'
      # expectedResultsArray[1] = 'G2_C21::Method5.4930()'
      # expectedResultsArray[2] = 'G2_C21::Method6.4931()'
    -> EXPECTED: G2_C21::Method4.4929()
    -> GOT:      G2_C21::Method4.4929()
    -> EXPECTED: G2_C21::Method5.4930()
    -> GOT:      G2_C21::Method5.4930()
    -> EXPECTED: G2_C21::Method6.4931()
    -> GOT:      G2_C21::Method6.4931()
Unhandled exception. System.BadImageFormatException: An attempt was made to load a program with an incorrect format. (0x8007000B (COR_E_BADIMAGEFORMAT))
   at Framework.M.G3_C1084.T[T0,W](W inst, String exp)
   at Framework.ConstrainedCallsTest()
   at Framework.Main()
Expected: 100
Actual: -532462766
END EXECUTION - FAILED
FAILED

In this case the problem is caused by the fact that our handling of pResolvedToken.hClass in resolveToken and of exactType in ceeInfoGetCallInfo is slightly imprecise from CG1 perspective: for a MemberRef token, resolveToken should set hClass to the type corresponding to the Parent of the MemberReference, not to the OwningType of the resolved method.

Subtle differences between these two may arise in case the MemberReference is a method on instantiated type and its lookup in the method tables finds it in a base class. In such case, CG1 retains the "declaring" type as the exactType whereas we blindly set hClass to the method OwningType which is the base type in this case. That later makes us generate a slightly different codegen than CG1, for instance in the test 612:

CG1:

void Framework.M.G3_C1084.T<__Canon, __Canon>(__Canon, string)
Handle: 0x0600001B
...
    a272: ff 15 d8 71 ff ff     call    qword ptr [0x1450]    // string G3_C1084`1<__Canon>.ClassMethod1326() (VIRTUAL_ENTRY)

Whereas CG2 generates:

void Framework.M.G3_C1084.T<__Canon, __Canon>(__Canon, string)
Handle: 0x0600001B
...
    61e2: ff 15 50 af 00 00     call    qword ptr [0x11138]   // string G1_C6`2.ClassMethod1326() (VIRTUAL_ENTRY)

I have a partial local fix but I'm still struggling with properly instatiating the exact type...

@trylek
Copy link
Member Author

trylek commented Jun 25, 2020

For further reference, this is the list of the "sixty six tests" - each line corresponds to a contiguous interval of test numbers:

1
14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
62 63 64 65 66 67 68 69 70
171
192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215
292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307

@trylek
Copy link
Member Author

trylek commented Jun 26, 2020

I chatted offline with DavidWr. My primary motivation for this PR has been to raise visibility of these remaining corner cases where CG2 codegen doesn't yet fully comply with CoreCLR runtime requirements as indicated by the Pri1 test failures, not to merge some hasty hotfix in. I'm going to spend the remaining time before I go on vacation on Saturday updating this PR by adding minimized repro examples of the two bugs I've found to the Crossgen2 smoke test and hopefully proposing a more coherent compiler change to fix them (I'm still hitting several issues in my local testing). The title bug is specific to the small version (non-composite) bubble build mode, the other bug I described in the thread related to exactType imprecision is common to composite and non-composite build mode.

@trylek trylek force-pushed the AssociatedMethodBug branch from 70e0758 to 1f5e8a2 Compare July 9, 2020 11:37
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants