Improve startup and warmup performance for invoke by reducing emit usage #109901

Open
wants to merge 28 commits into
base: main

Member

@steveharter steveharter commented Nov 17, 2024

This increases startup and warmup performance by:

  • Re-using previously emitted invoke stubs by:
    • Changing from the Call opcode to the Calli opcode when possible. Calli takes a function pointer as a stack parameter, while Call (and Callvirt) take a method handle as an operand. Because that handle is embedded in the IL, IL generated with Call is coupled to one specific method and can't be re-used, whereas a method emitted with Calli can be re-used as long as the signature is compatible. See Change IL Emit invoke path to use function pointers and OpCodes.Calli #75357.
    • Changing from the Newobj opcode to calling the internal GetUninitializedObject() and then calling the constructor. Since Newobj, like Call, is based on a method handle, it prevented re-use of emitted methods. Constructors can now be cached and re-used the same way as non-constructor methods. This also makes the following issue obsolete: Have ConstructorInfo use Activator.CreateInstance() instead of emit for default constructors #78917.
    • Caching of previously generated methods, along with the hashcode lookup, is based on "compatible" methods. This and the use of Calli apply only to methods that can't be overridden, since Calli does not support vtables and adding vtable support before the Calli would be too slow. The compatibility rules are nuanced, but basically methods are compatible if they have the same number of parameters and the same parameter\return types, except that all reference type parameters can be treated as object, which helps with normalization.
  • Avoid using emit for specific well-known methods called during CoreClr startup. See issue EventSource use of EventAttribute triggers dynamic methods in attribute usage #90405.
    • We can improve on this after this PR so that we don't have to hard-code these well-known methods - see notes in the code and the comments here about adding an intrinsic to call function pointers on instances. This means many other methods, such as all property getters\setters on reference types, will not need to be emitted.
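
To illustrate why a Calli-based stub can be shared, here is a minimal sketch, not the PR's actual stub generator. It emits one DynamicMethod for the shape (int, int) -> int and passes the target in as a function pointer. The names CalliStubDemo, CreateSharedStub, and PointerTo are hypothetical, and the sketch only covers static methods.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical demo type; none of these names come from the PR.
public static class CalliStubDemo
{
    public static int Add(int a, int b) => a + b;
    public static int Multiply(int a, int b) => a * b;

    // Emits a single stub usable for any static (int, int) -> int method.
    // The target is not embedded in the IL; it arrives as the third argument.
    public static Func<int, int, IntPtr, int> CreateSharedStub()
    {
        var stub = new DynamicMethod(
            "InvokeStub_int_int_int",
            typeof(int),
            new[] { typeof(int), typeof(int), typeof(IntPtr) },
            typeof(CalliStubDemo).Module,
            skipVisibility: true);

        ILGenerator il = stub.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);   // first int argument
        il.Emit(OpCodes.Ldarg_1);   // second int argument
        il.Emit(OpCodes.Ldarg_2);   // function pointer of the target
        il.EmitCalli(
            OpCodes.Calli,
            CallingConventions.Standard,
            typeof(int),
            new[] { typeof(int), typeof(int) },
            null);
        il.Emit(OpCodes.Ret);

        return (Func<int, int, IntPtr, int>)stub.CreateDelegate(
            typeof(Func<int, int, IntPtr, int>));
    }

    // Looks up a public static method on this type and returns its entry point.
    public static IntPtr PointerTo(string methodName)
    {
        MethodInfo method = typeof(CalliStubDemo).GetMethod(methodName)
            ?? throw new MissingMethodException(methodName);
        return method.MethodHandle.GetFunctionPointer();
    }
}
```

Calling `CreateSharedStub()` once and then invoking the result with `(3, 4, PointerTo("Add"))` and `(3, 4, PointerTo("Multiply"))` yields 7 and 12 from the same emitted method. A Call-based stub would need one emitted method per target.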

This PR also changes the heuristics of when emit is used:

  • Today we do not emit the invoke stub until the 2nd call on the same method (we use the interpreted path on the 1st call) which helps with startup\warmup and with methods only called once.
  • This PR instead uses emit on the 1st call unless it is one of the well-known startup cases or a compatible method was previously emitted. This means the interpreted path for CoreClr is not used by default any longer, but the feature switch Switch.System.Reflection.ForceInterpretedInvoke is still supported. For CoreClr, we could remove the interpreted path pending findings -- the interpreted path is a large amount of code and also doesn't work with certain scenarios such as debugging in Visual Studio, so it would be a win to remove it.
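
The new first-call heuristic can be summarized in a small decision function. This is only a sketch of the rules as described above; the enum, the method name, and the order in which the conditions are checked are assumptions, not the PR's code.

```csharp
// Hypothetical names; the precedence of the checks is an assumption.
public enum InvokeStrategy
{
    Interpreted,        // feature switch forces the interpreted path
    WellKnownFastPath,  // hard-coded startup method, no emit needed
    ReuseCachedStub,    // a compatible stub was emitted earlier
    EmitNewStub         // emit + JIT on the 1st call
}

public static class InvokeHeuristic
{
    public static InvokeStrategy Choose(
        bool forceInterpreted,
        bool isWellKnownStartupMethod,
        bool compatibleStubCached)
    {
        if (forceInterpreted) return InvokeStrategy.Interpreted;
        if (isWellKnownStartupMethod) return InvokeStrategy.WellKnownFastPath;
        if (compatibleStubCached) return InvokeStrategy.ReuseCachedStub;
        return InvokeStrategy.EmitNewStub;
    }
}
```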

Performance notes:

  • Cold start cases in a "hello world" console app scenario save ~5% of startup time by reducing emit + JIT cost.
  • Due to the caching of compatible methods, the number of emitted methods will be significantly lower over the lifetime of an application if it uses reflection across many types. This helps to reduce memory and improve gen2 collections.
  • Throughput will remain about the same - there will be minor differences due to the use of CallI and function pointers as well as refactoring. Some benchmarks will be slightly slower, and some slightly faster.
  • Some basic notes of emit vs. interpreted:
    • Say invoking an existing IL-emitted, JIT'd method costs 1x.
    • Invoking via the interpreted path costs ~5x-15x.
      • Note that the interpreted path has been optimized over the last 2 releases (in the managed code), so this is a current relative comparison.
    • Emitting a method + JIT costs 3000x.
    • Using a 10x delta between interpreted and emitted invocation (say 11x for interpreted minus 1x for the emitted method), we get 3000x / 10x = 300, meaning an emitted method needs to be called ~300 times to break even; after that it is ~10x faster. The large 3000x cost is avoided both by re-using emitted methods and by avoiding emit for well-known cases during startup. Note again, however, that we now emit on the 1st call instead of the 2nd.
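
The break-even arithmetic above can be written out as a small helper. This is a sketch using the relative costs quoted in the notes; the type and method names are hypothetical.

```csharp
// Hypothetical helper; costs are relative units as used in the notes above.
public static class BreakEven
{
    // Number of calls after which the one-time emit + JIT cost is paid back
    // by the per-call saving of the emitted stub over the interpreted path.
    public static double CallsToBreakEven(
        double emitAndJitCost,
        double interpretedCallCost,
        double emittedCallCost)
    {
        return emitAndJitCost / (interpretedCallCost - emittedCallCost);
    }
}
```

With the figures from the notes, `BreakEven.CallsToBreakEven(3000, 11, 1)` gives 300. At the two ends of the quoted 5x-15x interpreted range, the same formula gives 750 calls and roughly 214 calls respectively.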

Jit logging

BEFORE:

   1: JIT compiled System.Threading.Thread:GetThreadStaticsBase() [Tier0, IL size=18, code size=34]
   2: JIT compiled System.Guid:FormatGuidVector128Utf8(System.Guid,ubyte) [Tier0, IL size=310, code size=546]
   3: JIT compiled System.HexConverter:AsciiToHexVector128(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]) [Tier0, IL size=76, code size=377]
   4: JIT compiled System.Runtime.Intrinsics.Vector128:ShuffleUnsafe(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]) [Tier0, IL size=29, code size=47]
   5: JIT compiled System.RuntimeType+IGenericCacheEntry`1[System.__Canon]:CreateAndCache(System.RuntimeType) [Instrumented Tier0, IL size=165, code size=1081]
   6: JIT compiled System.Buffers.SearchValues:TryGetSingleRange[ushort](System.ReadOnlySpan`1[ushort],byref,byref) [Tier-0 switched to FullOpts, IL size=294, code size=386]
   7: JIT compiled System.Buffers.AsciiCharSearchValues`2[System.Buffers.IndexOfAnyAsciiSearcher+Default,System.Buffers.SearchValues+FalseConst]:IndexOfAny(System.ReadOnlySpan`1[ushort]) [Tier0, IL size=30, code size=98]
   8: JIT compiled System.Buffers.IndexOfAnyAsciiSearcher:IndexOfAny[System.Buffers.IndexOfAnyAsciiSearcher+DontNegate,System.Buffers.IndexOfAnyAsciiSearcher+Default,System.Buffers.SearchValues+FalseConst](byref,int,byref) [Tier0, IL size=9, code size=45]
   9: JIT compiled System.Buffers.IndexOfAnyAsciiSearcher:IndexOfAnyCore[int,System.Buffers.IndexOfAnyAsciiSearcher+DontNegate,System.Buffers.IndexOfAnyAsciiSearcher+Default,System.Buffers.SearchValues+FalseConst,System.Buffers.IndexOfAnyAsciiSearcher+IndexOfAnyResultMapper`1[short]](byref,int,byref) [Instrumented Tier0, IL size=572, code size=1399]
  10: JIT compiled System.Buffers.IndexOfAnyAsciiSearcher:IndexOfAnyLookup[System.Buffers.IndexOfAnyAsciiSearcher+DontNegate,System.Buffers.IndexOfAnyAsciiSearcher+Default,System.Buffers.SearchValues+FalseConst](System.Runtime.Intrinsics.Vector256`1[short],System.Runtime.Intrinsics.Vector256`1[short],System.Runtime.Intrinsics.Vector256`1[ubyte]) [Tier0, IL size=41, code size=181]
  11: JIT compiled System.Buffers.IndexOfAnyAsciiSearcher+Default:PackSources(System.Runtime.Intrinsics.Vector256`1[ushort],System.Runtime.Intrinsics.Vector256`1[ushort]) [Tier0, IL size=18, code size=49]
  12: JIT compiled System.Buffers.IndexOfAnyAsciiSearcher:IndexOfAnyLookupCore[System.Buffers.SearchValues+FalseConst](System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]) [Tier0, IL size=77, code size=188]
  13: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Level(System.Object,System.Object,ulong) [FullOpts, IL size=25, code size=27]
  14: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Message(System.Object,System.Object,ulong) [FullOpts, IL size=25, code size=28]
  15: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Task(System.Object,System.Object,ulong) [FullOpts, IL size=25, code size=27]
  16: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Opcode(System.Object,System.Object,ulong) [FullOpts, IL size=25, code size=27]
  17: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Version(System.Object,System.Object,ulong) [FullOpts, IL size=25, code size=28]
  18: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Keywords(System.Object,System.Object,ulong) [FullOpts, IL size=25, code size=28]
  19: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Level(System.Object,System.Span`1[System.Object]) [FullOpts, IL size=36, code size=40]
  20: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Message(System.Object,System.Span`1[System.Object]) [FullOpts, IL size=26, code size=37]
  21: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Task(System.Object,System.Span`1[System.Object]) [FullOpts, IL size=36, code size=40]
  22: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Opcode(System.Object,System.Span`1[System.Object]) [FullOpts, IL size=36, code size=40]
  23: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Version(System.Object,System.Span`1[System.Object]) [FullOpts, IL size=36, code size=41]
  24: JIT compiled (dynamicClass):InvokeStub_EventAttribute.set_Keywords(System.Object,System.Span`1[System.Object]) [FullOpts, IL size=36, code size=41]

GetTotalMemory:733088
GetTotalAllocatedBytes:731192
GetTotalMemory after gc:278736

ELAPSED MS TO START CONSOLE APP: 68 (average)

AFTER:

   1: JIT compiled System.Threading.Thread:GetThreadStaticsBase() [Tier0, IL size=18, code size=34]
   2: JIT compiled System.Guid:FormatGuidVector128Utf8(System.Guid,ubyte) [Tier0, IL size=310, code size=546]
   3: JIT compiled System.HexConverter:AsciiToHexVector128(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]) [Tier0, IL size=76, code size=377]
   4: JIT compiled System.Runtime.Intrinsics.Vector128:ShuffleUnsafe(System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]) [Tier0, IL size=29, code size=47]
   5: JIT compiled System.RuntimeType+IGenericCacheEntry`1[System.__Canon]:CreateAndCache(System.RuntimeType) [Instrumented Tier0, IL size=165, code size=1081]
   6: JIT compiled System.Buffers.SearchValues:TryGetSingleRange[ushort](System.ReadOnlySpan`1[ushort],byref,byref) [Tier-0 switched to FullOpts, IL size=294, code size=386]
   7: JIT compiled System.Buffers.AsciiCharSearchValues`2[System.Buffers.IndexOfAnyAsciiSearcher+Default,System.Buffers.SearchValues+FalseConst]:IndexOfAny(System.ReadOnlySpan`1[ushort]) [Tier0, IL size=30, code size=98]
   8: JIT compiled System.Buffers.IndexOfAnyAsciiSearcher:IndexOfAny[System.Buffers.IndexOfAnyAsciiSearcher+DontNegate,System.Buffers.IndexOfAnyAsciiSearcher+Default,System.Buffers.SearchValues+FalseConst](byref,int,byref) [Tier0, IL size=9, code size=45]
   9: JIT compiled System.Buffers.IndexOfAnyAsciiSearcher:IndexOfAnyCore[int,System.Buffers.IndexOfAnyAsciiSearcher+DontNegate,System.Buffers.IndexOfAnyAsciiSearcher+Default,System.Buffers.SearchValues+FalseConst,System.Buffers.IndexOfAnyAsciiSearcher+IndexOfAnyResultMapper`1[short]](byref,int,byref) [Instrumented Tier0, IL size=572, code size=1399]
  10: JIT compiled System.Buffers.IndexOfAnyAsciiSearcher:IndexOfAnyLookup[System.Buffers.IndexOfAnyAsciiSearcher+DontNegate,System.Buffers.IndexOfAnyAsciiSearcher+Default,System.Buffers.SearchValues+FalseConst](System.Runtime.Intrinsics.Vector256`1[short],System.Runtime.Intrinsics.Vector256`1[short],System.Runtime.Intrinsics.Vector256`1[ubyte]) [Tier0, IL size=41, code size=181]
  11: JIT compiled System.Buffers.IndexOfAnyAsciiSearcher+Default:PackSources(System.Runtime.Intrinsics.Vector256`1[ushort],System.Runtime.Intrinsics.Vector256`1[ushort]) [Tier0, IL size=18, code size=49]
  12: JIT compiled System.Buffers.IndexOfAnyAsciiSearcher:IndexOfAnyLookupCore[System.Buffers.SearchValues+FalseConst](System.Runtime.Intrinsics.Vector256`1[ubyte],System.Runtime.Intrinsics.Vector256`1[ubyte]) [Tier0, IL size=77, code size=188]

GetTotalMemory:708648
GetTotalAllocatedBytes:706824
GetTotalMemory after gc:276120

ELAPSED MS TO START CONSOLE APP: 65 (average)

Invoke Benchmarks
| Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Method0_NoParms | Job-NZRJJY | \RFP_AFTER\corerun.exe | 7.901 ns | 0.1487 ns | 0.1461 ns | 7.832 ns | 7.757 ns | 8.281 ns | 1.00 | 0.03 | - | - | NA |
| Method0_NoParms | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 9.768 ns | 0.1867 ns | 0.1917 ns | 9.704 ns | 9.548 ns | 10.090 ns | 1.24 | 0.03 | - | - | NA |
| Method0_NoParms_MethodInvoker | Job-NZRJJY | \RFP_AFTER\corerun.exe | 4.348 ns | 0.0856 ns | 0.0840 ns | 4.326 ns | 4.236 ns | 4.519 ns | 1.00 | 0.03 | - | - | NA |
| Method0_NoParms_MethodInvoker | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 5.083 ns | 0.1009 ns | 0.0944 ns | 5.088 ns | 4.959 ns | 5.243 ns | 1.17 | 0.03 | - | - | NA |
| StaticMethod4_arrayNotCached_int_string_struct_class | Job-NZRJJY | \RFP_AFTER\corerun.exe | 43.653 ns | 1.3541 ns | 1.5594 ns | 43.378 ns | 40.872 ns | 46.095 ns | 1.00 | 0.05 | 0.0098 | 104 B | 1.00 |
| StaticMethod4_arrayNotCached_int_string_struct_class | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 39.242 ns | 1.3658 ns | 1.5728 ns | 39.269 ns | 36.775 ns | 41.797 ns | 0.90 | 0.05 | 0.0098 | 104 B | 1.00 |
| StaticMethod5_arrayNotCached_int_string_struct_class_bool | Job-NZRJJY | \RFP_AFTER\corerun.exe | 58.148 ns | 1.9428 ns | 2.2373 ns | 57.517 ns | 55.204 ns | 63.050 ns | 1.00 | 0.05 | 0.0128 | 136 B | 1.00 |
| StaticMethod5_arrayNotCached_int_string_struct_class_bool | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 53.039 ns | 1.8366 ns | 2.1150 ns | 52.872 ns | 49.520 ns | 56.596 ns | 0.91 | 0.05 | 0.0130 | 136 B | 1.00 |
| StaticMethod4_int_string_struct_class | Job-NZRJJY | \RFP_AFTER\corerun.exe | 32.213 ns | 0.5856 ns | 0.5478 ns | 32.183 ns | 31.245 ns | 33.118 ns | 1.00 | 0.02 | - | - | NA |
| StaticMethod4_int_string_struct_class | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 24.181 ns | 0.3833 ns | 0.3398 ns | 24.189 ns | 23.598 ns | 24.756 ns | 0.75 | 0.02 | - | - | NA |
| StaticMethod4_int_string_struct_class_MethodInvoker | Job-NZRJJY | \RFP_AFTER\corerun.exe | 11.901 ns | 0.1517 ns | 0.1419 ns | 11.840 ns | 11.723 ns | 12.167 ns | 1.00 | 0.02 | - | - | NA |
| StaticMethod4_int_string_struct_class_MethodInvoker | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 11.404 ns | 0.1341 ns | 0.1189 ns | 11.374 ns | 11.250 ns | 11.663 ns | 0.96 | 0.01 | - | - | NA |
| StaticMethod4_int_string_struct_class_MethodInvokerWithSpan | Job-NZRJJY | \RFP_AFTER\corerun.exe | 14.772 ns | 0.1207 ns | 0.1008 ns | 14.750 ns | 14.609 ns | 15.002 ns | 1.00 | 0.01 | - | - | NA |
| StaticMethod4_int_string_struct_class_MethodInvokerWithSpan | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 12.082 ns | 0.2015 ns | 0.1885 ns | 12.055 ns | 11.731 ns | 12.384 ns | 0.82 | 0.01 | - | - | NA |
| StaticMethod4_ByRefParams_int_string_struct_class | Job-NZRJJY | \RFP_AFTER\corerun.exe | 110.731 ns | 1.3235 ns | 1.2380 ns | 110.916 ns | 108.507 ns | 112.723 ns | 1.00 | 0.02 | 0.0042 | 48 B | 1.00 |
| StaticMethod4_ByRefParams_int_string_struct_class | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 88.206 ns | 0.8195 ns | 0.6843 ns | 88.356 ns | 86.975 ns | 89.300 ns | 0.80 | 0.01 | 0.0043 | 48 B | 1.00 |
| StaticMethod4_ByRefParams_int_string_struct_class_MethodInvoker | Job-NZRJJY | \RFP_AFTER\corerun.exe | 72.451 ns | 1.6013 ns | 1.8441 ns | 72.552 ns | 69.641 ns | 74.968 ns | 1.00 | 0.04 | 0.0045 | 48 B | 1.00 |
| StaticMethod4_ByRefParams_int_string_struct_class_MethodInvoker | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 79.030 ns | 1.2487 ns | 1.1069 ns | 79.062 ns | 76.658 ns | 81.099 ns | 1.09 | 0.03 | 0.0046 | 48 B | 1.00 |
| StaticMethod5_ByRefParams_int_string_struct_class_bool | Job-NZRJJY | \RFP_AFTER\corerun.exe | 146.490 ns | 2.2497 ns | 1.9943 ns | 145.968 ns | 144.088 ns | 151.078 ns | 1.00 | 0.02 | 0.0064 | 72 B | 1.00 |
| StaticMethod5_ByRefParams_int_string_struct_class_bool | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 129.627 ns | 2.5337 ns | 2.6019 ns | 129.785 ns | 126.652 ns | 134.166 ns | 0.89 | 0.02 | 0.0067 | 72 B | 1.00 |
| Ctor0_NoParams | Job-NZRJJY | \RFP_AFTER\corerun.exe | 11.444 ns | 0.3741 ns | 0.4308 ns | 11.390 ns | 10.724 ns | 12.487 ns | 1.00 | 0.05 | 0.0061 | 64 B | 1.00 |
| Ctor0_NoParams | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 9.322 ns | 0.4161 ns | 0.4625 ns | 9.208 ns | 8.787 ns | 10.406 ns | 0.82 | 0.05 | 0.0061 | 64 B | 1.00 |
| Ctor0_NoParams_ConstructorInvoker | Job-NZRJJY | \RFP_AFTER\corerun.exe | 6.897 ns | 0.2156 ns | 0.2307 ns | 6.896 ns | 6.419 ns | 7.323 ns | 1.00 | 0.05 | 0.0061 | 64 B | 1.00 |
| Ctor0_NoParams_ConstructorInvoker | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 7.439 ns | 0.5270 ns | 0.6069 ns | 7.193 ns | 6.696 ns | 8.602 ns | 1.08 | 0.09 | 0.0061 | 64 B | 1.00 |
| Ctor0_NoParams_Reinvoke | Job-NZRJJY | \RFP_AFTER\corerun.exe | 7.688 ns | 0.1035 ns | 0.0968 ns | 7.729 ns | 7.546 ns | 7.833 ns | 1.00 | 0.02 | - | - | NA |
| Ctor0_NoParams_Reinvoke | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 40.375 ns | 0.5346 ns | 0.5001 ns | 40.289 ns | 39.723 ns | 41.317 ns | 5.25 | 0.09 | - | - | NA |
| Ctor0_ActivatorCreateInstance_NoParams | Job-NZRJJY | \RFP_AFTER\corerun.exe | 7.519 ns | 0.2008 ns | 0.2232 ns | 7.554 ns | 7.184 ns | 7.943 ns | 1.00 | 0.04 | 0.0061 | 64 B | 1.00 |
| Ctor0_ActivatorCreateInstance_NoParams | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 6.843 ns | 0.1775 ns | 0.1899 ns | 6.820 ns | 6.510 ns | 7.335 ns | 0.91 | 0.04 | 0.0061 | 64 B | 1.00 |
| Ctor4_int_string_struct_class | Job-NZRJJY | \RFP_AFTER\corerun.exe | 35.615 ns | 0.7383 ns | 0.8503 ns | 35.905 ns | 34.106 ns | 37.059 ns | 1.00 | 0.03 | 0.0060 | 64 B | 1.00 |
| Ctor4_int_string_struct_class | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 29.951 ns | 1.1401 ns | 1.3130 ns | 29.848 ns | 28.287 ns | 32.622 ns | 0.84 | 0.04 | 0.0061 | 64 B | 1.00 |
| Ctor4_int_string_struct_class_ConstructorInvoker | Job-NZRJJY | \RFP_AFTER\corerun.exe | 15.954 ns | 0.6178 ns | 0.7114 ns | 15.852 ns | 14.832 ns | 17.314 ns | 1.00 | 0.06 | 0.0061 | 64 B | 1.00 |
| Ctor4_int_string_struct_class_ConstructorInvoker | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 13.922 ns | 0.3645 ns | 0.4051 ns | 13.893 ns | 13.268 ns | 14.626 ns | 0.87 | 0.05 | 0.0061 | 64 B | 1.00 |
| Ctor4_ActivatorCreateInstance | Job-NZRJJY | \RFP_AFTER\corerun.exe | 249.778 ns | 4.6488 ns | 4.3485 ns | 250.662 ns | 242.854 ns | 256.168 ns | 1.00 | 0.02 | 0.0407 | 432 B | 1.00 |
| Ctor4_ActivatorCreateInstance | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 247.633 ns | 3.6411 ns | 3.4059 ns | 247.807 ns | 242.153 ns | 253.467 ns | 0.99 | 0.02 | 0.0410 | 432 B | 1.00 |
| Property_Get_int | Job-NZRJJY | \RFP_AFTER\corerun.exe | 11.517 ns | 0.3846 ns | 0.4429 ns | 11.517 ns | 10.947 ns | 12.339 ns | 1.00 | 0.05 | 0.0023 | 24 B | 1.00 |
| Property_Get_int | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 11.955 ns | 0.2372 ns | 0.2636 ns | 11.830 ns | 11.633 ns | 12.398 ns | 1.04 | 0.04 | 0.0023 | 24 B | 1.00 |
| Property_Get_class | Job-NZRJJY | \RFP_AFTER\corerun.exe | 8.501 ns | 0.1201 ns | 0.1123 ns | 8.472 ns | 8.375 ns | 8.684 ns | 1.00 | 0.02 | - | - | NA |
| Property_Get_class | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 9.947 ns | 0.1034 ns | 0.0916 ns | 9.943 ns | 9.810 ns | 10.138 ns | 1.17 | 0.02 | - | - | NA |
| Property_Set_int | Job-NZRJJY | \RFP_AFTER\corerun.exe | 13.307 ns | 0.1905 ns | 0.1688 ns | 13.293 ns | 13.047 ns | 13.639 ns | 1.00 | 0.02 | 0.0023 | 24 B | 1.00 |
| Property_Set_int | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 14.808 ns | 0.4008 ns | 0.4289 ns | 14.664 ns | 14.193 ns | 15.651 ns | 1.11 | 0.03 | 0.0023 | 24 B | 1.00 |
| Property_Set_class | Job-NZRJJY | \RFP_AFTER\corerun.exe | 10.823 ns | 0.0987 ns | 0.0771 ns | 10.830 ns | 10.668 ns | 10.926 ns | 1.00 | 0.01 | - | - | NA |
| Property_Set_class | Job-PAMZLT | \RFP_BEFORE\corerun.exe | 11.168 ns | 0.2097 ns | 0.1961 ns | 11.178 ns | 10.929 ns | 11.555 ns | 1.03 | 0.02 | - | - | NA |

fixes #90405
fixes #75357
fixes #78917

@janvorli
Member

@steveharter do you plan to add support for other well known signatures than the zero arg with return value and one arg with void return ones? Based on our past discussion, we would like to eventually add ones that are on startup path of common types of apps like e.g. aspnet ones. Or were these mostly just those two basic cases?

@steveharter
Member Author

> @steveharter do you plan to add support for other well known signatures than the zero arg with return value and one arg with void return ones? Based on our past discussion, we would like to eventually add ones that are on startup path of common types of apps like e.g. aspnet ones. Or were these mostly just those two basic cases?

That approach (zero arg + one arg) was for property getters\setters. However, it didn't work for instance methods, since function pointers don't support instance methods (sometimes it worked for simple methods, but other times it caused heap corruption), so I had to change it to a hard-coded list of specific types and methods used during startup. That list could be extended to other types based on the applications we want to check that are sensitive to startup\warmup (e.g. ASP.NET, WinForms).

However, that is somewhat untenable, so we should really get function pointers to support instance methods; I created the language proposal at dotnet/csharplang#8709. This would automatically cover a large share of all methods, perhaps up to 50%, since it would include most property getters and setters and other simple methods.

Another approach, compatible with the function pointer + instance methods above, is to add an API that lets the application specify its own delegate for a method that it does not want to cause emit\jit. E.g.:

public class MethodBase
{
    // We would support only our built-in delegates that are for property getters\setters, <= 4 args, byref args, etc.
+    void SetInvokeImplementation(Delegate invokeDelegate);
}
   

@jkotas
Member

jkotas commented Dec 3, 2024

> function pointers to support instance methods,

We can emit the function-pointer based thunks for arbitrary instance methods as runtime intrinsics. We do not need first class C# support for instance method function pointers to make it work.


public static int GetHashCode(Type? declaringType, Type[] parameterTypes, Type returnType)
{
    int hashcode = 0;
Member

I think this impl can be simplified by using HashCode. The explicit use of int.Rotate* to construct a new hash seems unnecessary when we have utilities to combine and construct a well defined new hash.

Member

HashCode is a randomized hash code. For this one, I think we may rather be heading toward a non-randomized, stable hash code, which would allow us to create a hashtable of precompiled signatures at build time.

Member

Ah. That makes sense. Perhaps a comment on that expectation is warranted then?

Member Author

Currently the hashcode is based on combining several calls to System.Type.GetHashCode(), so it is not stable. It would have to use Type.FullName. I can make that change here, but since it is not needed yet I think I should take the original suggestion of using HashCode.Combine(). I checked its perf against the current implementation (rotate \ xor) and it's a wash.

Member Author

Also with a Type.FullName hashcode, I'd have to find or implement a fast stable version of that since the default is randomized.
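
For reference, a stable (non-randomized) signature hash along these lines could look like the sketch below. It is not the PR's InvokeSignatureInfo code: it uses FNV-1a over Type.FullName as one possible stable string hash, and it assumes the simplified normalization rule that every non-value type is treated as object (byref and pointer types are ignored here). All names are hypothetical.

```csharp
using System;

// Hypothetical helper; FNV-1a is one possible choice of stable hash.
public static class StableSignatureHash
{
    private const uint FnvOffsetBasis = 2166136261;
    private const uint FnvPrime = 16777619;

    // FNV-1a over UTF-16 code units. Deterministic across processes,
    // unlike string.GetHashCode(), which is randomized per process.
    private static uint Fnv1a(uint hash, string text)
    {
        foreach (char c in text)
        {
            hash = unchecked((hash ^ c) * FnvPrime);
        }
        return hash;
    }

    // Assumed normalization: anything that is not a value type becomes object.
    private static Type Normalize(Type type) =>
        type.IsValueType ? type : typeof(object);

    public static int Compute(Type returnType, params Type[] parameterTypes)
    {
        uint hash = Fnv1a(FnvOffsetBasis, Name(Normalize(returnType)));
        foreach (Type parameterType in parameterTypes)
        {
            hash = Fnv1a(hash, ",");  // separator between type names
            hash = Fnv1a(hash, Name(Normalize(parameterType)));
        }
        return unchecked((int)hash);
    }

    // FullName is null for open generic parameters; fall back to Name.
    private static string Name(Type type) => type.FullName ?? type.Name;
}
```

Under this normalization, `Compute(typeof(void), typeof(string))` and `Compute(typeof(void), typeof(Uri))` return the same value, so two setters taking different reference types would land in the same bucket. Equality of the normalized signatures would still need to be checked separately.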

public Type ReturnType => _returnType;
public bool IsStatic => _isStatic;

public static bool AlternativeEquals(in InvokeSignatureInfoKey @this, InvokeSignatureInfo signatureInfo)
Member

@AaronRobinsonMSFT Jan 30, 2025

We should avoid the @this pattern. It is rarely used in SPCL and I personally dislike it. The English language has enough words that we can avoid using keywords. In this case, I would suggest inst, instance, or key.

Member Author

Sure thanks.

@steveharter changed the title from "[draft] Improve startup and warmup performance for invoke by reducing emit usage" to "Improve startup and warmup performance for invoke by reducing emit usage" Jan 30, 2025