Cached interface dispatch for coreclr #111771

davidwrighton · 2025-01-24T00:18:20Z

Enabling cached interface dispatch as an option for CoreCLR (this should reduce memory usage and remove RWX pages, at the cost of reduced performance).

The current implementation is only enabled in release builds for Apple platforms with restrictions on code generation.
On Debug/Checked builds of X64/Arm64 platforms, the feature can be enabled by setting the DOTNET_UseCachedInterfaceDispatch environment variable to 1. (NOTE: Enabling this feature requires a processor that supports 128-bit compare-and-swap, which has implications for Linux X64 builds, and would have implications for LoongArch/RISC-V if we enable the code there.)
The strategy is to re-use the existing VirtualCallStubManager infrastructure for all non-code-generation driven lookups, but to replace the stub generation logic with the CachedInterfaceDispatch paths from NativeAOT.
In addition, to support this, we need to extend the size of a dispatch cell embedded in R2R images, so various parts of that logic can now generate double-pointer-aligned dispatch cells when commanded. Infrastructure to set the right behavior when targeting Apple platforms has not yet been implemented, although the general-purpose support is in place.
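The 128-bit compare-and-swap requirement and the double-pointer-aligned cells both follow from a dispatch cell holding two pointer-sized fields that must change together. Below is a minimal sketch, assuming a hypothetical `DispatchCellSketch` with a stub pointer and a cache pointer (loosely modeled on the stub/cache pair in NativeAOT's `InterfaceDispatchCell`; the names and layout are illustrative, not CoreCLR's). When the compiler can inline a 16-byte CAS (`__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16` is defined, e.g. x86-64 with `-mcx16`, or Arm64), both fields are swapped in one step; otherwise the sketch falls back to a lock only so that it still builds and runs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <mutex>

// Hypothetical two-pointer dispatch cell: the stub to call and the cache it uses.
// alignas(16) mirrors the double-pointer alignment a 16-byte CAS requires.
struct alignas(16) DispatchCellSketch {
    uintptr_t stub;
    uintptr_t cache;
};

// Atomically replaces (expectedStub, expectedCache) with (newStub, newCache).
// Returns false, leaving the cell untouched, if its contents no longer match.
bool UpdateCellSketch(DispatchCellSketch* cell,
                      uintptr_t expectedStub, uintptr_t expectedCache,
                      uintptr_t newStub, uintptr_t newCache)
{
    assert(reinterpret_cast<uintptr_t>(cell) % 16 == 0);
#if defined(__SIZEOF_INT128__) && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
    // One 16-byte CAS: cmpxchg16b on x86-64 (needs -mcx16), LDXP/STXP or CASP on Arm64.
    static_assert(sizeof(DispatchCellSketch) == sizeof(unsigned __int128),
                  "cell must be exactly 16 bytes");
    DispatchCellSketch expectedCell{expectedStub, expectedCache};
    DispatchCellSketch newCell{newStub, newCache};
    unsigned __int128 expected, desired;
    std::memcpy(&expected, &expectedCell, sizeof expected);
    std::memcpy(&desired, &newCell, sizeof desired);
    return __sync_bool_compare_and_swap(
        reinterpret_cast<unsigned __int128*>(cell), expected, desired);
#else
    // Sketch-only fallback so the example builds without an inline 16-byte CAS.
    static std::mutex lock;
    std::lock_guard<std::mutex> guard(lock);
    if (cell->stub != expectedStub || cell->cache != expectedCache)
        return false;
    cell->stub = newStub;
    cell->cache = newCache;
    return true;
#endif
}
```

Because the CAS fails instead of overwriting when another thread has already updated the cell, a caller would typically re-read the cell and either retry or accept the winner's value.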

Known issues addressed before making a non-draft PR


dotnet-policy-service · 2025-01-24T00:19:23Z

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

…te that this requires adding the -mcx16 switch to clang, so that cmpxchg16b instruction gets generated, which is an increase in the baseline CPU required by CoreCLR on Linux, and isn't likely to be OK for shipping publicly


am11 · 2025-02-04T15:38:37Z

src/coreclr/clrfeatures.cmake

+else()
+  set(FEATURE_CORECLR_VIRTUAL_STUB_DISPATCH 1)


Suggested change

-  set(FEATURE_CORECLR_CACHED_INTERFACE_DISPATCH 1)
-else()
-  set(FEATURE_CORECLR_VIRTUAL_STUB_DISPATCH 1)
+  set(FEATURE_CACHED_INTERFACE_DISPATCH 1)
+else()
+  set(FEATURE_VIRTUAL_STUB_DISPATCH 1)

src/coreclr/clrfeatures.cmake

jkotas · 2025-02-04T17:12:33Z

src/coreclr/jit/compiler.h

-                reg     = REG_R11;
-                regMask = RBM_R11;
-            }
+            reg     = REG_R11;


https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/clr-abi.md#hidden-parameters should be updated too.

AAGGHH... Reading this has sent me down the rabbit hole of sadness, which has told me that I need to move both CoreCLR and NativeAOT to use r10, not r11, as that is the only way to keep CFG from exploding, sadly. (Or I can keep the difference between CoreCLR and NativeAOT.)

I like the idea of getting these unified as you may guess.

Or rather... this won't break stuff, but the codegen for CFG is required to be a bit non-ideal in NativeAOT. OTOH, some quick testing suggests that this doesn't actually affect the codegen in any case, since our CFG generation logic isn't what I'd call great right now, so I'd like to keep moving everything to r11. If we want to use r10, we can swap back all of the amd64 stuff at once.

jkotas · 2025-02-06T16:24:38Z

src/coreclr/vm/jitinterface.cpp

+            DispatchToken token = VirtualCallStubManager::GetTokenFromFromOwnerAndSlot(ownerType, slot);
+
+            INTERFACE_DISPATCH_CACHED_OR_VSD(
+                return FALSE; // R2R interface dispatch currently only supports fixups with a single pointer, return FALSE to skip using the method


The PR has changes to add this support. Did you hit any roadblocks with making this actually work?

I did not. But the support is only for fixups that are used directly for dispatch. This code path is used when gathering a fixup as a function pointer to use directly, such as one embedded into a delegate or something. I believe this logic may only actually be used for R2R versioning, where we replace a non-virtual method with a virtual method and construct a delegate, but I'm not entirely sure. At the very least I should write a clearer comment. I'll also see if I can actually trigger this scenario. It's quite possible that the existing R2R behavior with VSD is not compliant with our latest VM logic.

This is actually completely dead code. The VIRTUAL_ENTRY_SLOT path was specific to NGEN. (So are the VIRTUAL_ENTRY_DEF/REF_TOKEN paths, but those might actually be useful for R2R as a load simplification, so I'm not touching those.) I'm going to delete the logic that implements VIRTUAL_ENTRY_SLOT.

src/coreclr/nativeaot/Runtime/amd64/CachedInterfaceDispatchAot.S

src/coreclr/debug/daccess/request.cpp

janvorli · 2025-02-06T19:45:59Z

src/coreclr/minipal/minipal.h

@@ -76,3 +76,16 @@ class VMToOSInterface
    //  true if it succeeded, false if it failed
    static bool ReleaseRWMapping(void* pStart, size_t size);
 };
+
+#if defined(HOST_64BIT) && defined(FEATURE_CACHED_INTERFACE_DISPATCH)


It would be nice if we added the InterlockedCompareExchange128 to the VMToOSInterface above and implemented it for Unix / Windows in the respective subfolders.
When I started this minipal, the intent was that everything platform-specific would go there. Once we get rid of the coreclr PAL, this was meant to be the only place for platform-specific code.

Unfortunately, this is a case where we want a shared set of PAL APIs between CoreCLR and NativeAOT, and NativeAOT hasn't moved towards the VMToOSInterface idea (and this change is already WAY too big to add that to it). I'd welcome some rationalization after this is all merged.

janvorli · 2025-02-06T19:47:42Z

src/coreclr/nativeaot/Runtime/CMakeLists.txt

@@ -1,9 +1,11 @@
 set(GC_DIR ../../gc)
+set(SHARED_RUNTIME_DIR ../../shared_runtime)


A nit - we don't have any folders that have names with underscores, it would be nice to stay consistent.

I'd like to hear opinions on the directory structure from several people here. @AaronRobinsonMSFT might also be interested to comment on this.

jit and gc are shared too, but they do not have shared_ in the name. I am not a fan of the shared_ prefix.

Maybe just call it runtime?

runtime seems appropriate. I agree that a prefix would be nice, but I agree with @janvorli's consistency argument. @jkotas's dislike of shared_ is something I share too. I think shared is also dangerous as it breeds a dumping ground akin to utils.

runtime it is. I'll make the changes soon.

I'll be leaving the CMake variable name for the directory as SHARED_RUNTIME_DIR, though, as RUNTIME_DIR is already taken for src/coreclr/nativeaot/Runtime.

Rename RUNTIME_DIR -> NATIVEAOT_RUNTIME_DIR and SHARED_RUNTIME_DIR -> RUNTIME_DIR? There are only 25 instances of RUNTIME_DIR spread over 4 files.

src/coreclr/vm/virtualcallstub.cpp

src/coreclr/nativeaot/Runtime/amd64/MiscStubs.asm

AaronRobinsonMSFT · 2025-02-07T08:42:10Z

src/coreclr/clrfeatures.cmake

+  # Allow 16 byte compare-exchange (cmpxchg16b)
+  add_compile_options($<${FEATURE_CORECLR_CACHED_INTERFACE_DISPATCH}:-mcx16>)
+endif()


Is this check for both UNIX/AMD64 and if FEATURE_CORECLR_CACHED_INTERFACE_DISPATCH is set?

I'm definitely not a CMake expert, but I think this syntax means if FEATURE_CORECLR_CACHED_INTERFACE_DISPATCH is non-zero, then apply the flag. If so, can't we remove the explicit if?

My pushback here: if we want to expand this, at present we need to update two locations where FEATURE_CORECLR_CACHED_INTERFACE_DISPATCH is set instead of one.

You can combine all the options into the generator expression, but it usually becomes pretty unreadable with all the $< that get accumulated. So I usually use it just for the build configs or stuff where the expression is simple.

I agree with that. My point is about the need for the if check at all. Isn't the generator already conditioned on ${FEATURE_CORECLR_CACHED_INTERFACE_DISPATCH} being true? If so, why do we need the check above it?

Ah, that's right, I've missed this point. We don't need the if around that.

We don't want to set this flag on Windows or when targeting any other processor type than X64. So we still need the if. FEATURE_CORECLR_CACHED_INTERFACE_DISPATCH is sometimes enabled for Windows and it is really intended to be used for Arm64 in most cases.

So you're saying that FEATURE_CORECLR_CACHED_INTERFACE_DISPATCH is valid without the 16 byte compare exchange?

No, I'm saying that the -mcx16 switch is only valid/useful when targeting X64 on Unix platforms. I can't find ANY documentation that hints that this switch is useful on architectures other than X64. On Arm64, the baseline hardware we support requires 16-byte compare-and-swap, and the needed instructions are unconditionally available (via libatomic they will be either the ldaxp/stxp pair or the caspal instruction, and on Mac/iOS the compiler will always generate the caspal instruction). On Windows, the -mcx16 switch generates a compiler warning and isn't useful, as all hardware that runs supported versions of Windows has the 16-byte compare-exchange instruction.
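To make the baseline-CPU point concrete: on x86 the presence of `cmpxchg16b` is reported by CPUID leaf 1, ECX bit 13. Below is a minimal sketch of a startup check, assuming GCC or Clang (`<cpuid.h>` and `__get_cpuid` are compiler-provided; MSVC would use `__cpuid` from `<intrin.h>` instead, not shown). `HasCmpxchg16bSketch` is a hypothetical name, and the Arm64 branch simply encodes the statement above that the supported Arm64 baseline always has 16-byte compare-and-swap.

```cpp
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

// Hypothetical startup check for 16-byte compare-and-swap support.
bool HasCmpxchg16bSketch()
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                 // CPUID leaf 1 not available
    return (ecx & (1u << 13)) != 0;   // CPUID.01H:ECX bit 13 = CMPXCHG16B
#elif defined(__aarch64__)
    return true;                      // per the thread: the Arm64 baseline always has it
#else
    return false;                     // unknown target: assume not available
#endif
}
```

Note that `-mcx16` only lets the compiler emit the instruction; it does not check for it at run time, which is why it effectively raises the minimum CPU for Linux X64 builds.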

AaronRobinsonMSFT · 2025-02-07T08:43:27Z

src/coreclr/debug/daccess/dacdbiimpl.cpp

        if (pVcsMgr->cache_entry_heap != NULL) heapsToEnumerate.Push(pVcsMgr->cache_entry_heap);
+#endif


Suggested change

-#endif
+#endif // FEATURE_VIRTUAL_STUB_DISPATCH

AaronRobinsonMSFT · 2025-02-07T08:53:57Z

src/coreclr/debug/daccess/request.cpp

@@ -3613,14 +3613,16 @@ ClrDataAccess::TraverseVirtCallStubHeap(CLRDATA_ADDRESS pAppDomain, VCSHeapType
                break;

            case CacheEntryHeap:
+#ifdef FEATURE_VIRTUAL_STUB_DISPATCH
                pLoaderHeap = pVcsMgr->cache_entry_heap;


Can we get a small comment here on why the CacheEntryHeap is not used in the caching logic? I'm ignorant of this area and it feels odd that the VirtualCallStubManager is used, but ignored in some cases. Perhaps an assert or some other breadcrumb to indicate why this isn't appropriate.

We could use it, and possibly should, but as I noted in the PR description, I intentionally disabled this feature since it was more work to enable, and this change is already too big for easy development. If we decide that this sort of caching is advantageous, it would be appropriate in a separate PR to enable it for the cached interface dispatch scenario.
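One way to leave the breadcrumb requested above, sketched under assumptions: keep the `case CacheEntryHeap:` label (as the diff does), but have the cached-interface-dispatch configuration return no heap, with a comment explaining why. The types below (`VcsHeapTypeSketch`, `VcsManagerSketch`, `SelectHeapSketch`) are hypothetical stand-ins for the real `VCSHeapType` and `VirtualCallStubManager`, and `FEATURE_VIRTUAL_STUB_DISPATCH` is treated as an ordinary preprocessor switch.

```cpp
// Hypothetical stand-ins for the DAC-side heap enum and the stub manager.
enum VcsHeapTypeSketch { IndcellHeapSketch, CacheEntryHeapSketch };

struct VcsManagerSketch {
    void* indcell_heap = nullptr;
#ifdef FEATURE_VIRTUAL_STUB_DISPATCH
    void* cache_entry_heap = nullptr;
#endif
};

// Returns the loader heap to traverse, or nullptr when this build flavor has none.
void* SelectHeapSketch(const VcsManagerSketch& mgr, VcsHeapTypeSketch type)
{
    switch (type)
    {
    case IndcellHeapSketch:
        return mgr.indcell_heap;
    case CacheEntryHeapSketch:
#ifdef FEATURE_VIRTUAL_STUB_DISPATCH
        return mgr.cache_entry_heap;
#else
        // Cached interface dispatch does not use the VSD cache-entry heap (that
        // caching was intentionally left disabled), so there is nothing to report.
        return nullptr;
#endif
    }
    return nullptr;
}
```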

AaronRobinsonMSFT · 2025-02-07T08:54:51Z

src/coreclr/debug/daccess/request.cpp

@@ -3663,7 +3665,9 @@ static const char *LoaderAllocatorLoaderHeapNames[] =
    "FixupPrecodeHeap",
    "NewStubPrecodeHeap",
    "IndcellHeap",
+#ifdef FEATURE_VIRTUAL_STUB_DISPATCH


This adds to the confusion from above considering case CacheEntryHeap: remains, but we define out the string.

AaronRobinsonMSFT · 2025-02-07T08:55:07Z

src/coreclr/debug/daccess/request.cpp

    "CacheEntryHeap",
+#endif


Suggested change

-#endif
+#endif // FEATURE_VIRTUAL_STUB_DISPATCH

AaronRobinsonMSFT · 2025-02-07T08:55:15Z

src/coreclr/debug/daccess/request.cpp

                pLoaderHeaps[i++] = HOST_CDADDR(pVcsMgr->cache_entry_heap);
+#endif


Suggested change

-#endif
+#endif // FEATURE_VIRTUAL_STUB_DISPATCH

AaronRobinsonMSFT · 2025-02-07T09:39:55Z

src/coreclr/vm/contractimpl.h

+    }
+    static inline DispatchToken FromCachedInterfaceDispatchToken(UINT_PTR token)
+    {
+        return DispatchToken(token >> 1);


Suggested change

-        return DispatchToken(token >> 1);
+        _ASSERTE(IsCachedInterfaceDispatchToken(token));
+        return DispatchToken(token >> 1);
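The `>> 1` in `FromCachedInterfaceDispatchToken` suggests the token is stored shifted left by one bit, leaving the low bit free as a tag. Below is a minimal sketch of that round trip, assuming (this is not confirmed by the diff) that the tag is a set low bit and that `IsCachedInterfaceDispatchToken` tests it. All names carry a `Sketch` suffix because they are hypothetical stand-ins for the real `DispatchToken` helpers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for DispatchToken; only the raw value matters here.
struct DispatchTokenSketch {
    uintptr_t value;
};

// Assumption: a set low bit marks a cached-interface-dispatch encoding.
inline bool IsCachedInterfaceDispatchTokenSketch(uintptr_t token)
{
    return (token & 1) != 0;
}

inline uintptr_t ToCachedInterfaceDispatchTokenSketch(DispatchTokenSketch token)
{
    // The top bit of the original value is lost; real tokens are assumed to leave it clear.
    return (token.value << 1) | 1;
}

inline DispatchTokenSketch FromCachedInterfaceDispatchTokenSketch(uintptr_t token)
{
    assert(IsCachedInterfaceDispatchTokenSketch(token)); // the check suggested in review
    return DispatchTokenSketch{token >> 1};
}
```

With the assertion in place, an untagged value fails fast in checked builds instead of silently decoding to the wrong token.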

AaronRobinsonMSFT · 2025-02-07T09:42:10Z

src/coreclr/vm/dynamicmethod.cpp

+                if (cellCacheHeader != NULL)
+                {
+                    InterfaceDispatch_DiscardCacheHeader(cellCacheHeader);
+                    pDispatchCell->m_pCache = 0;


This is odd. We have a getter, GetCache(), above, but then we unilaterally manipulate the field. We should either always access the field or do it through methods/abstractions.
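A minimal sketch of the encapsulation suggested above, using hypothetical names (`DispatchCellCacheSketch`, `ClearCache`, `DiscardCacheHeaderSketch`, `ReleaseCellCacheSketch`) rather than the PR's actual types: the cache field becomes private, reads go through `GetCache()`, and the reset goes through `ClearCache()`, so the discard path never writes the field directly.

```cpp
#include <cstdint>

// Hypothetical cell type: the cache pointer is only reachable through accessors.
class DispatchCellCacheSketch {
public:
    explicit DispatchCellCacheSketch(uintptr_t cache) : m_pCache(cache) {}
    uintptr_t GetCache() const { return m_pCache; }
    void ClearCache() { m_pCache = 0; }

private:
    uintptr_t m_pCache;
};

// Hypothetical stand-in for InterfaceDispatch_DiscardCacheHeader; it only counts calls.
static int g_discardCount = 0;
void DiscardCacheHeaderSketch(uintptr_t /*header*/) { ++g_discardCount; }

// Same shape as the dynamicmethod.cpp snippet, but without touching the field directly.
void ReleaseCellCacheSketch(DispatchCellCacheSketch& cell)
{
    uintptr_t header = cell.GetCache();
    if (header != 0)
    {
        DiscardCacheHeaderSketch(header);
        cell.ClearCache();
    }
}
```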

AaronRobinsonMSFT · 2025-02-07T09:42:59Z

src/coreclr/vm/excep.cpp

+#ifdef FEATURE_CACHED_INTERFACE_DISPATCH
+    if (VirtualCallStubManager::isCachedInterfaceDispatchStubAVLocation(f_IP))
+        return TRUE;
+#endif


Suggested change

-#endif
+#endif // FEATURE_CACHED_INTERFACE_DISPATCH

src/coreclr/vm/prestub.cpp

AaronRobinsonMSFT · 2025-02-07T09:53:56Z

src/coreclr/nativeaot/Runtime/CMakeLists.txt


davidwrighton added 10 commits January 8, 2025 15:47

Move the cached interface dispatch code into a shared region

2b2ca52

Split cached interface dispatch up into a component which is focussed on shared things, and parts that are not shared

4892674

It builds for X64, VTable stuff isn't probably correct, but its basically in place

d69528f

Add indirection cell helper so that VSD and CachedInterfaceDispatch can be switched between

5f1f2b5

Ready to try running things. R2R not yet supported. Virtual delegates not yet supported

976bf83

Initialize CachedInterfaceDispatch at startup

652930c

AMD64 seems to work

39a2574

Arm64 Windows assembly written and factored amd64 to be similar

4c0865c

Allow there to be flavors of the build which do not build cached interface dispatch

645c487

Make it possible for some OS/Architecture sets to have cached interface dispatch or virtual stub dispatch

921631a

dotnet-issue-labeler bot added the area-VM-coreclr label Jan 24, 2025

dotnet-policy-service bot assigned davidwrighton Jan 24, 2025

build-analysis bot mentioned this pull request Jan 24, 2025

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008


davidwrighton closed this Jan 24, 2025

davidwrighton added 5 commits January 24, 2025 09:46

Fix X86 build

73b0b26

Get Linux Arm64 and Amd64 into a possibly good state

2cdd955

Enable building cached interface dispatch for Linux arm64

df393d9

Add AVLocation for the VTable helper which wasn't present in the NativeAOT cached interface dispatch implementation (as it isn't actually used) Update IsIPinVirtualStub to check the AVLocations, not the stub entry points

cce3bcb

davidwrighton reopened this Jan 24, 2025

davidwrighton added 7 commits January 24, 2025 23:50

Merge branch 'main' of github.com:dotnet/runtime into cached_interface_dispatch_for_coreclr

4bbcdaa

Fix musl build failure

c320e1d

Handle missed RhpVTableOffsetDispatchAVLocation case

361588a

Move RiscV stub dispatch logic to the same place as everything else

24e78b2

Fix assertion issue with collectible assemblies

5b0e5ac

Reduce InterfaceDispatchCell size from 4 pointers to 2, and actually hook up the VTable offset logic and such (vtable paths are untested)

fa7826a

Use the isCachedInterfaceDispatchStubAVLocation helper where appropriate

f1c2c65

build-analysis bot mentioned this pull request Jan 28, 2025

slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883


Enable using cached interface dispatch in R2R

36c9cc0

- Enable generating double pointer indirection cells in R2R files using command line switch.
- Fix VTableOffset calculation
- Add logic in ExternalMethodFixupWorker to handle the double pointer indirection cells.

davidwrighton added 2 commits January 31, 2025 12:03

Merge branch 'main' into cached_interface_dispatch_for_coreclr

6e5a394

Fix x64 stub dispatch code to use the right register, and switch to allocating the memory for the dispatch using the LoaderHeap Also tweak a collectible assembly test to actually use cached interface dispatch

48b5009

am11 reviewed Feb 1, 2025

src/coreclr/shared_runtime/CachedInterfaceDispatch.cpp Outdated

am11 reviewed Feb 1, 2025

src/coreclr/pal/inc/pal.h

am11 reviewed Feb 1, 2025

src/coreclr/pal/inc/pal.h Outdated

davidwrighton added 2 commits February 3, 2025 11:50

Merge branch 'main' of https://github.com/dotnet/runtime into cached_interface_dispatch_for_coreclr

ad64a55

Add environment variable to control use of cached dispatch for testing scenarios

fa72602

davidwrighton changed the title ~~[DRAFT] Cached interface dispatch for coreclr~~ Cached interface dispatch for coreclr Feb 3, 2025

davidwrighton marked this pull request as ready for review February 3, 2025 23:46

davidwrighton requested a review from MichalStrehovsky as a code owner February 3, 2025 23:46

davidwrighton requested review from janvorli and jkotas February 3, 2025 23:46

am11 reviewed Feb 4, 2025

src/coreclr/clrfeatures.cmake Outdated

am11 reviewed Feb 4, 2025

Fix interface stepping for cached interface dispatch

08ac0a1

janvorli reviewed Feb 6, 2025

src/coreclr/clrfeatures.cmake Outdated

jkotas reviewed Feb 6, 2025

janvorli reviewed Feb 6, 2025

src/coreclr/nativeaot/Runtime/amd64/CachedInterfaceDispatchAot.S

janvorli reviewed Feb 6, 2025

src/coreclr/debug/daccess/request.cpp Outdated

janvorli reviewed Feb 6, 2025

src/coreclr/vm/virtualcallstub.cpp Outdated

janvorli reviewed Feb 6, 2025

src/coreclr/nativeaot/Runtime/amd64/MiscStubs.asm Outdated

Respond to most of the feedback

006537d

AaronRobinsonMSFT reviewed Feb 7, 2025

davidwrighton added 3 commits February 7, 2025 16:22

Feedback and fixes

2ff4d2a

Merge branch 'main' of https://github.com/dotnet/runtime into cached_interface_dispatch_for_coreclr

6cb2b78

Use runtime as the directory to hold stuff shared between NativeAOT and coreclr

70dacc0

build-analysis bot mentioned this pull request Feb 8, 2025

System.Numerics.Tensors.Tests.ConvertTests.ConvertChecked failing with System.OverflowException #112286
