-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce the number of forced MethodTable
s
#79594
Conversation
bool thrown = false; | ||
try | ||
{ | ||
_ = t.TypeHandle; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still keep the metadata around because metadata got figured out during scanning, but we can check for the presence of the MethodTable
by looking at the TypeHandle
property.
The Type
is damaged like this because we obtained it through unsafe reflection code. Legit code should not be able to see damaged types like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also struggling with the relationship between the new node and MetadataManager.GetDependenciesDueToReflectability
.
Naively, with the new node I would expect that this method is called from the new node and not from various other places.
namespace ILCompiler.DependencyAnalysis | ||
{ | ||
/// <summary> | ||
/// Represents a type that is visible from reflection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a comment about the difference between ReflectableTypeNode and TypeMetadataNode?
Naively I would expect that if type is reflectable it needs metadata, but currently it's the other way round - TypeMetadataNode adds ReflectableTypeNode. So that must mean that there's a case where type is reflectable but we don't have metadata for it.
IEnumerable<TypeDesc> forcedTypes, | ||
IEnumerable<ReflectableEntity<TypeDesc>> reflectableTypes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between forcedTypes
and reflectableTypes
- per the change description these should be basically identical now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reflectableTypes
are types that are eligible to be reflection visible, but are not rooted. forcedTypes
are rooted.
I forgot to mention that we have a concept of reflection blocked types - types that produce a special System.Type
at runtime.
reflectableTypes
basically helps compilation phase to decide whether something should be blocked or not.
It unfortunately doesn't match well with the below reflectableFields
and reflectableMethods
because types have the unfortunate property of becoming visible to reflection just because they exist (unless blocked).
Let me see if I have the right mental model here. When I was talking about this topic with @sbomer a while back I noted that there's a difference between Is this basically the same idea? That just grabbing a typeof doesn't actually keep the MethodTable or any of the rest of the reflection infrastructure? |
I'll try to explain with a different example. Consider following code: GetO().ToString();
[BarAttribute]
struct Foo
{
public override void ToString() { ... }
}
static object GetO() => (object)default(Foo); Now consider we compiled this with optimizations on. When we're in the scanning phase we need to figure out the whole worldview, including reflection analysis. So we scan But now we go into compilation. RyuJIT sees Now, why do we need to force You may have also noticed that if we do inline |
Grabbing a I'm leveraging a RyuJIT optimization in this - if RyuJIT sees |
I think the terminology here is very confusing. We have "reflectable method" and "reflectable field" (in AnalysisBasedMetadataManager), these are:
We have "reflectable type" (in AnalysisBasedMetadataManager), these are:
We have now new
And in the description of this PR there's also the term "reflected type" (as oppose to "reflectable type") which is used for the types for which we've seen reflection so those with I think we should clean this up - and add a comment somewhere which clearly describes what the terms are used for. Otherwise the change looks good - functionally it works and will save size (nice!!), it's just that the code is really hard to read given the current naming scheme. Side node: Would be great to have some kind of a doc describing the nodes and what is their meaning and most importantly what is their relationship. In this area for example |
a78b7d6
to
12923d4
Compare
The problem is really only that for fields/methods, they only get visible from reflection if one does an explicit gesture towards that. So we don't have many of those in the system and all of them are deliberate and actually mean "someone did a gesture to make them available from reflection". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - the comments are great.
I still think this is rather complicated and really hard to understand by just reading the code, but this makes it better :-)
I looked if we can make it so that there's only one list of types that feeds into AnalysisBasedMetadataManager, but we currently need two lists due to the mode where each assembly can be compiled into a single obj file. In this mode, the fact that we generated a methodtable doesn't mean we should also generate reflection metadata (this is due to generics that generate methodtables anywhere, but metadata only goes to the one obj file that corresponds to the definition of the uninstantiated type. |
Thanks for trying! |
If I understand correctly, scanning might keep things alive that later turn dead because codegen optimizes usages away. This can cause unnecessary data to be retained. I understand that the size of the unneeded data is meaningful. Crazy idea: How about running the pipeline twice? What's kept from the first run is just the information what actually will end up being alive. The code output is simply dropped. And in the second run, the actual output is created. But this time, it is precisely known, what will live. This obviously has a performance cost so it could be an opt-in mode for use cases in which every percent in size counts. This seems like a sledgehammer solution but it could be very general. Does this make sense? |
Yes, eventually the scanning phase would be done by RyuJIT - it can better predict when things will be devirtualized. We don't run RyuJIT for scanning right now because we don't have a way to run it without actually generating the code. We could certainly run RyuJIT twice and just throw away all the generated code first time. But right now the benefit would be quite limited. We can get a lot more benefit (without the compile time regression drawback) if we do it properly and also allow RyuJIT to collect data it will find useful to do codegen later - but RyuJIT needs to add first class support for running as a scanner. |
When we're doing an optimized build, we go over the whole program twice:
In both phases, we assume any type with a "constructed"
MethodTable
is visible to reflection because one can just callobject.GetType()
and reflect on stuff. We need to pass a list of constructedMethodTable
s from the first phase to the second phase because someMethodTable
s could be the result of reflection analysis and we need to make sure they're compiled.But crucially, up until now we didn't really track which
MethodTable
s are actual reflection roots and which ones just showed up in the dependency graph because the analyzed program happened to use it. We don't actually need to pass the latter ones as roots to compilation because the compilation phase is going to figure them out if they're needed anyway and if the compilation doesn't come up with some, that's fine because one wouldn't be able to callobject.GetType
on those anyway, because they're not actually part of the program.Passing all of the MethodTables we saw from scanning to compilation is actually size bloat because scanning overapproximates things (by necessity, since it doesn't have a whole program view).
In this pull request I'm introducing
ReflectedTypeNode
to modelMethodTable
s that are actual targets of reflection. Only those will get passed as roots to the compilation phase. From now on we need to be mindful of how we refer to types. If a reference to a type is a result of non-code dependency, we should useReflectedType
to model it.Saves about 1.2% (32 kB) in size on a Hello World.
I'm seeing it helps in two patterns:
runtime/src/libraries/System.Private.CoreLib/src/System/Byte.cs
Lines 1088 to 1093 in b1a2080
RyuJIT is able to eliminate the dead code in the if branch, but we were still rooting the type within the branch from the box.
The second pattern seems to be around RyuJIT devirtualizing things and preventing a box, which now eliminates the
MethodTable
.Cc @dotnet/ilc-contrib