-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Stop using LIST nodes for CALL arg lists #26392
Conversation
This is pretty large but most changes are rather simple:
There are few more notable changes:
Another notable change is that this (and all similar |
@dotnet/jit-contrib |
@jashook @BruceForstall Do you have any idea what exactly failed in the CoreFX Linux check? Apparently all jobs have failed but I can't find much about what happened. In a Helix log (https://helix.dot.net/api/2019-06-17/jobs/2d58505d-56e6-46a6-a07b-4bc8754fc017/workitems/System.Threading.Channels.Tests/console) I see entries like:
Test run started and failed immediately without a trace? |
The real failure looks to be higher up:
Looks like for some reason the dotnet CLI didn't get copied/installed properly. I set the failures to rerun |
@mikedn is this PR ready for review or it is wip? |
It should be good for review, I was just hoping that the CI will get green but I see that it failed again. Still. it looks like those tests didn't even run so it's probably a CI issue rather than something wrong with my change. |
Yeah, I'm not 100% sure how it all works but it looks to me that the normal and CoreFX test builds have conflicting artifact names so they overwrite each over. On a successful PR build I see in the run tests jobs:
but here it's
so failed CoreFX run probably ended up downloading the wrong artifact. |
The VN memory usage improvement in the "largest method" reported in mem stats is more than 2x. I was expecting improvements due to VNs for LIST nodes no longer being generated but this seemed just amazing so I tried to figure which method is this. It looks like it's coreclr/src/System.Private.CoreLib/shared/System/Globalization/CultureData.cs Lines 153 to 285 in 4aa65f0
So yeah, it's basically a long list of calls. And the memory stats are collected from a checked corelib (the rel VM doesn't dump the stats even if the rel JIT is compiled with mem stats enabled) so there are even more calls in that method thanks to some debug stuff. --- D:\Projects\jit\foo.asm 2019-08-31 19:14:14.000000000 +-0300
+++ D:\Projects\jit\boo.asm 2019-08-31 19:19:30.000000000 +-0300
@@ -171826,21 +169650,14 @@
-N003 [007204] ADD => $1c5 {norm=$1c4 {ADD($47, $181)}, exc=$100 {HelperMultipleExc()}}
- fgCurMemoryVN[GcHeap] assigned for GTF_IND_VOLATILE - read at [007205] to VN: $2083.
-N004 [007205] IND => $2006 {norm=$2084 {2084}, exc=$10c( {NullPtrExc($1c4)}, {HelperMultipleExc()})}
-N005 [010024] LCL_VAR V515 tmp515 d:1 => $2084 {2084}
-N006 [010025] ASG => $2006 {norm=$2084 {2084}, exc=$10c( {NullPtrExc($1c4)}, {HelperMultipleExc()})}
-N007 [010028] ARGPLACE => $2086 {2086}
-N008 [010030] ARGPLACE => $2087 {2087}
-N009 [007207] LIST => $1fbe {LIST($2087, $0)}
-N010 [007208] LIST => $1fbf {LIST($2086, $1fbe)}
-N011 [010026] LCL_VAR V515 tmp515 u:1 (last use) => $2084 {2084}
-N012 [007192] LCL_VAR V386 tmp386 u:1 (last use) => <l:$1c3f {ByrefExposedLoad($43, $109, $1f7e)}, c:$1f7f {1f7f}>
-N013 [007193] LCL_VAR V387 tmp387 u:1 (last use) => <l:$1c3f {ByrefExposedLoad($43, $109, $1f7e)}, c:$2081 {2081}>
-N014 [010031] LIST => <l:$2141 {LIST($1c3f, $0)}, c:$2140 {LIST($2081, $0)}>
-N015 [010029] LIST => <l:$2143 {LIST($1c3f, $2141)}, c:$2142 {LIST($1f7f, $2140)}>
-N016 [010027] LIST => <l:$2145 {LIST($2084, $2143)}, c:$2144 {LIST($2084, $2142)}>
-VN of ARGPLACE tree [010028] updated to $2084 {2084}
-VN of ARGPLACE tree [010030] updated to <l:$1c3f {ByrefExposedLoad($43, $109, $1f7e)}, c:$1f7f {1f7f}>
-N009 [007207] LIST => <l:$2141 {LIST($1c3f, $0)}, c:$2146 {LIST($1f7f, $0)}>
-N010 [007208] LIST => <l:$2148 {LIST($2084, $2141)}, c:$2147 {LIST($2084, $2146)}>
- fgCurMemoryVN[GcHeap] assigned for CALL at [007206] to VN: $2088.
-N017 [007206] CALLV ind => $VN.Void
+N003 [005539] ADD => $1c5 {norm=$1c4 {ADD($47, $181)}, exc=$100 {HelperMultipleExc()}}
+ fgCurMemoryVN[GcHeap] assigned for GTF_IND_VOLATILE - read at [005540] to VN: $1483.
+N004 [005540] IND => $1406 {norm=$1484 {1484}, exc=$10c( {NullPtrExc($1c4)}, {HelperMultipleExc()})}
+N005 [007461] LCL_VAR V515 tmp515 d:1 => $1484 {1484}
+N006 [007462] ASG => $1406 {norm=$1484 {1484}, exc=$10c( {NullPtrExc($1c4)}, {HelperMultipleExc()})}
+N007 [007464] ARGPLACE => $1486 {1486}
+N008 [007465] ARGPLACE => $1487 {1487}
+N009 [007463] LCL_VAR V515 tmp515 u:1 (last use) => $1484 {1484}
+N010 [005529] LCL_VAR V386 tmp386 u:1 (last use) => <l:$11ff {ByrefExposedLoad($43, $109, $13be)}, c:$13bf {13bf}>
+N011 [005530] LCL_VAR V387 tmp387 u:1 (last use) => <l:$11ff {ByrefExposedLoad($43, $109, $13be)}, c:$1481 {1481}>
+VN of ARGPLACE tree [007464] updated to $1484 {1484}
+VN of ARGPLACE tree [007465] updated to <l:$11ff {ByrefExposedLoad($43, $109, $13be)}, c:$13bf {13bf}>
+ fgCurMemoryVN[GcHeap] assigned for CALL at [005541] to VN: $1488.
+N012 [005541] CALLV ind => $VN.Void |
newArgs = &newArgObjp; | ||
oldArgObjp.Current() = oldCall->gtCallObjp; | ||
oldArgs = &oldArgObjp; | ||
newArgObjp.SetNode(newCall->gtCallObjp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old code was broken, it was setting Current
to gtCallArgs
instead of gtCallObjp
which made no sense. I don't think this code is ever hit, it's only used when calls are cloned and that's not exactly common. The only case I know about is loop hoisting but only a few helper calls can be considered invariant and hoisted. And those helper calls do not have this
...
Speaking of this
, I find it extremely odd that it's stored separately from rest of the args, it gtCallObjp
. It complicates things and as seen here this can lead to potential bugs.
seems odd. maybe all output is suppressed for some reason? |
It looks like I was (not) seeing things, works fine now. With a release corelib memory usage improvement is a bit lower but similar - 3.05%. Anyway, I tried with other assemblies (System.Windows.Form, Microsoft.CodeAnalysis.CSharp) to be sure that corelib isn't somehow special and the number are similar or even higher - 4.55% on Microsoft.CodeAnalysis.CSharp. It's normal that smaller IR results in less memory and CPU usage but I wasn't expecting to be over 3%. Oh well, simply removing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, TP and memory win is amazing, thank you!
}; | ||
|
||
GenTree* gtCallObjp; // The instance argument ('this' pointer) | ||
Use* gtCallArgs; // The list of arguments in original evaluation order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit confused by:
Use* gtCallArgs; // The list of arguments
and
UseList Args()
they are both lists but they are used differently, there are few places where we access gtCallArgs
and it takes me a few secs to understand why.
We could think of how to separate them better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I would very much prefer to make gtCallArgs
private but there are a lot more places that need to be changed to achieve that. I also don't like the "manual" linked list manipulation that's all over the place but that again requires a lot more changes.
I hope I'll be able to make other improvements around GenTreeCall
data structures (get rid of GTF_LATE_ARG
, ReplaceCallOperand
, GT_ARGPLACE
and gtCallLateArgs
) and then perhaps do some more polishing as well because some things would just be easier.
Here it's probably also worth pointing out that keeping the args in an array rather than a linked list would make parts of the code cleaner. Unfortunately we have a few places were we need to insert "hidden" args so keeping the existing linked list approach was simpler and safer.
@@ -311,25 +309,23 @@ int LinearScan::BuildCall(GenTreeCall* call) | |||
} | |||
} | |||
|
|||
#ifdef DEBUG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, looks like the code here is old and the comment is not valid anymore, because we are not counting anything. The same for lsraxarch.
// Return Value:
// The number of sources consumed by this node.
//
int LinearScan::BuildCall(GenTreeCall* call)
Is it the number of registers consumed by this node?
@CarolEidt ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I forgot to mention this part and the similar one in lsraxarch.cpp. It does look like at least the comment is out of date and maybe the code is no longer necessary as well. But it doesn't seem like a good idea to remove asserts while doing a large refactoring so for now I made only minimal changes.
} | ||
break; | ||
|
||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
}
else
{
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, don't know. I usually don't like to add else
when the if
branch returns. Besides, the subsequent scoping is only required because we're in a switch. Perhaps I should just pulled out all the comapre code in a GenTreeCall::Equals
function like I did for PHIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this needn't be an else
, and it might be nice to pull it out, but I don't feel that's necessary.
@@ -14742,7 +14748,7 @@ void Compiler::impImportBlockCode(BasicBlock* block) | |||
if (!usingReadyToRunHelper) | |||
#endif | |||
{ | |||
args = gtNewArgList(op1, op2); | |||
GenTreeCall::Use* args = gtNewCallArgs(op1, op2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub doesn't really support reviewing changes in a file with more than 12000 lines, especially at the end of such files. It shows me that args
is unused here and you have to check the raw file to see the use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha ha, didn't noticed that an entire section of the file is actually missing from the diff. GitHub is great in many ways but there's still room for improvement :)
I should have reverted this variable declaration change. Initially I did more cleanup but then I saw the size of the PR I reverted a bunch of stuff but missed a few cases. Oh well, I absolutely hate variables declared on the other side of the planet. This is probably one the worst "features" of the old C versions that required variables to be declared upfront.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub doesn't really support reviewing changes in a file with more than 12000 lines
Maybe we need to figure out how to break this file up a bit!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestions are all requested comment updates. If you're willing to make those I think it would be great (and assuming they're the only changes made we wouldn't need to wait for the full round of CI tests to complete before merging).
@@ -808,10 +808,13 @@ Compiler::AssertionDsc* Compiler::optGetAssertion(AssertionIndex assertIndex) | |||
* if they don't care about it. Refer overloaded method optCreateAssertion. | |||
* | |||
*/ | |||
AssertionIndex Compiler::optCreateAssertion(GenTree* op1, GenTree* op2, optAssertionKind assertionKind) | |||
AssertionIndex Compiler::optCreateAssertion(GenTree* op1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this method didn't have a header to begin with, it would be good to add one, especially since you've added an arg here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added comment headers to all affected assertionprop functions. However, I also cleaned up things a bit because I don't want to waste my time adding comments about useless overloads and parameters - assertion
parameter isn't needed because everyone just passes the address of a local variable created solely for this purpose.
Of course, it would be even better to split optCreateAssertion
into smaller functions such that helperCallArgs
parameter isn't needed but that doesn't belong in this PR.
} | ||
break; | ||
|
||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this needn't be an else
, and it might be nice to pull it out, but I don't feel that's necessary.
@@ -14742,7 +14748,7 @@ void Compiler::impImportBlockCode(BasicBlock* block) | |||
if (!usingReadyToRunHelper) | |||
#endif | |||
{ | |||
args = gtNewArgList(op1, op2); | |||
GenTreeCall::Use* args = gtNewCallArgs(op1, op2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub doesn't really support reviewing changes in a file with more than 12000 lines
Maybe we need to figure out how to break this file up a bit!
src/jit/importer.cpp
Outdated
BOOL Compiler::impInlineIsGuaranteedThisDerefBeforeAnySideEffects(GenTree* additionalTreesToBeEvaluatedBefore, | ||
GenTree* variableBeingDereferenced, | ||
InlArgInfo* inlArgInfo) | ||
BOOL Compiler::impInlineIsGuaranteedThisDerefBeforeAnySideEffects(GenTree* additionalTreeToBeEvaluatedBefore, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth updating the function header of this method to clarify that both additionalTreeToBeEvaluatedBefore
as well as the argList
will be examined for side-effects. (Also the comment needs to be changed to use additionalTreeToBeEvaluatedBefore
(singular 'Tree').
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment header and renamed some parameters in the process.
additionalTreeToBeEvaluatedBefore
is unnecessarily long as "before" isn't really relevant - the order in which various trees are searched for side effects doesn't matter.variableBeingDereferenced
was misleading, there was no guarantee that it is actually a variable. That's checked inimpInlineIsThis
.
@@ -428,19 +428,19 @@ class IndirectCallTransformer | |||
// | |||
void AddHiddenArgument(GenTreeCall* fatCall, GenTree* hiddenArgument) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-existing, but it would be worthwhile to mention in the function header that the hidden argument is always add just after the return buffer, if there is one, and otherwise prepended to the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can also be appended to the list depending on USER_ARGS_COME_LAST
. I believe that the function would be clear enough and did not require a comment, if it weren't for the "manual" linked list manipulation that happens in the ret buffer arg case. So I cleaned up the "manual" linked list manipulation instead of adding a comment. Anyway it was messed up - instead of simply inserting a new use for the hidden argument it was also recreating the use of the ret buffer arg.
GenTreeArgList** insertionPoint = &call->gtCallArgs; | ||
for (; *insertionPoint != nullptr; insertionPoint = &(*insertionPoint)->Rest()) | ||
GenTreeCall::Use** insertionPoint = &call->gtCallArgs; | ||
for (; *insertionPoint != nullptr; insertionPoint = &((*insertionPoint)->NextRef())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took a second look to determine that, by construction, insertionPoint
cannot be null. It would definitely be worth a comment to that effect (at first I thought it merited a noway_assert
until I realized that couldn't happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this kind of code isn't easy to read but it's only 3 lines and this is not the only place where it occurs so I don't think adding a comment is a solution. The solution would be to add a function that does this operation. I considered adding something like gtAppendNewCallArg
but I don't think it's a good idea due to its potential for misuse - it has to search for the end of the list every time it is called so if it ends up being used to append multiple arguments it's not going to be good for perf.
I hope to make something better once I also change GT_FIELD_LIST
to not be a list. It's a bit of a problem to abstract linked list operations in this case (single linked list with only the head being stored) but there are some ways to do this that are better than having this kind of low level list manipulation code all over the place (something similar to a back_inserter
STL iterator could help for example).
* Stop using GT_LIST in GenTreeCall * Clarify optAssertionGenJtrue code * Cleanup optCreateAssertion * Cleanup AddHiddenArgument * Cleanup impInlineIsGuaranteedThisDerefBeforeAnySideEffects Commit migrated from dotnet/coreclr@667222e
Similar to #20266 but for
GenTreeCall
nodes.This saves 3.4% used memory and 3.2% instructions:
Mem diff: https://gist.github.com/mikedn/3d34aebb9fc3da218cf23f16ab821d01
PIN data: https://1drv.ms/x/s!Av4baJYSo5pjgs4637g1JfAO3jOlIQ?e=FhGJ15
Contributes to #19876
No x64/x86/arm32(altjit)/arm64(altjit) FX diffs.