Improvements for dead store removal. #38004

Merged
erozenfeld merged 1 commit into dotnet:master from the DeadStoreRemoval branch
Jun 27, 2020

@erozenfeld erozenfeld commented Jun 17, 2020

Improvements for dead store removal.

  1. Don't mark fields of dependently promoted structs as untracked.

  2. Remove some stores whose lhs local has a ref count of 1 when running late liveness. We can rely on ref counts since they are calculated right before the late liveness pass.

  3. Remove dead GT_STORE_BLK in addition to GT_STOREIND in the late liveness pass.

  4. Remove dead stores to untracked locals in the late liveness pass.

  5. Allow optRemoveRedundantZeroInits to remove some redundant initializations of tracked locals. Move the phase to right after liveness so that SSA is correct after removing assignments to tracked locals.
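
For readers unfamiliar with liveness-based dead store removal, the general idea behind the items above can be sketched on a toy straight-line block. This is an illustrative Python model, not the JIT's C++ implementation: the `Store` and `Use` classes, the `has_side_effects` flag, and the `remove_dead_stores` function are all hypothetical names invented for this sketch. It walks the block backwards, tracks which locals are live, and drops any store whose destination is not live and whose value has no side effects.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple, Union


@dataclass(frozen=True)
class Store:
    """dst = f(srcs). Hypothetical stand-in for a store to a local."""
    dst: str
    srcs: Tuple[str, ...] = ()
    has_side_effects: bool = False  # e.g. the stored value contains a call


@dataclass(frozen=True)
class Use:
    """A statement that only reads locals, such as a return."""
    srcs: Tuple[str, ...]


Stmt = Union[Store, Use]


def remove_dead_stores(block: List[Stmt], live_out: Iterable[str]) -> List[Stmt]:
    """Backward liveness walk over one block; returns the statements to keep."""
    live: Set[str] = set(live_out)
    kept: List[Stmt] = []
    for stmt in reversed(block):
        if isinstance(stmt, Store):
            if stmt.dst not in live and not stmt.has_side_effects:
                # Nothing after this point reads dst, so the store is dead.
                # Its sources are deliberately NOT added to the live set.
                continue
            # The store defines dst, so dst is not live before it.
            live.discard(stmt.dst)
        # Everything the statement reads is live before it.
        live.update(stmt.srcs)
        kept.append(stmt)
    kept.reverse()
    return kept


block = [
    Store("a"),            # overwritten before any surviving read
    Store("b", ("a",)),    # b is never read
    Store("a", ("c",)),
    Use(("a",)),
]
result = remove_dead_stores(block, live_out=())
```

In this example only the last store to `a` and the final use survive. Note that removing the dead store to `b` is what makes the first store to `a` dead as well, which is one reason a backward walk suits this kind of cleanup.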

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 17, 2020
@erozenfeld erozenfeld added the NO-REVIEW Experimental/testing PR, do NOT review it label Jun 17, 2020
@erozenfeld erozenfeld force-pushed the DeadStoreRemoval branch 7 times, most recently from 030a585 to ed7a888 Compare June 19, 2020 19:12
@erozenfeld erozenfeld marked this pull request as ready for review June 24, 2020 04:00
@erozenfeld erozenfeld removed the NO-REVIEW Experimental/testing PR, do NOT review it label Jun 24, 2020
@erozenfeld erozenfeld marked this pull request as draft June 24, 2020 04:03
@erozenfeld erozenfeld force-pushed the DeadStoreRemoval branch 3 times, most recently from 1a7a5be to 3d8ed62 Compare June 25, 2020 21:27
@erozenfeld
Member Author

Framework x64 pmi diffs:

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -17783 (-0.04% of base)
    diff is an improvement.
Top file improvements (bytes):
       -9586 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.31% of base)
       -2727 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.05% of base)
        -985 : Microsoft.CodeAnalysis.CSharp.dasm (-0.02% of base)
        -898 : System.Data.Common.dasm (-0.06% of base)
        -502 : Microsoft.CodeAnalysis.dasm (-0.03% of base)
        -457 : System.Private.CoreLib.dasm (-0.01% of base)
        -451 : System.Reflection.Metadata.dasm (-0.11% of base)
        -404 : Newtonsoft.Json.dasm (-0.05% of base)
        -359 : System.Threading.Tasks.Dataflow.dasm (-0.04% of base)
        -212 : System.Drawing.Primitives.dasm (-0.57% of base)
        -149 : System.Reflection.MetadataLoadContext.dasm (-0.08% of base)
        -140 : ILCompiler.Reflection.ReadyToRun.dasm (-0.10% of base)
        -102 : System.Linq.dasm (-0.01% of base)
         -71 : Newtonsoft.Json.Bson.dasm (-0.07% of base)
         -70 : System.Net.Http.dasm (-0.01% of base)
         -69 : System.Diagnostics.EventLog.dasm (-0.08% of base)
         -61 : System.Drawing.Common.dasm (-0.02% of base)
         -56 : System.Private.Xml.Linq.dasm (-0.03% of base)
         -45 : System.Linq.Parallel.dasm (-0.00% of base)
         -42 : System.Net.Security.dasm (-0.03% of base)
45 total files with Code Size differences (45 improved, 0 regressed), 219 unchanged.
Top method improvements (bytes):
       -2623 (-8.07% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ClrPrivateTraceEventParser:EnumerateTemplates(Func`3,Action`1):this
       -2607 (-8.81% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ClrTraceEventParser:EnumerateTemplates(Func`3,Action`1):this
       -2195 (-4.62% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - KernelTraceEventParser:EnumerateTemplates(Func`3,Action`1):this
       -2193 (-17.99% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - BoundTreeVisitor`2:VisitInternal(BoundNode,Vector`1):long:this
       -1761 (-2.67% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - CtfTraceEventSource:InitEventMap():Dictionary`2
        -279 (-0.25% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ApplicationServerTraceEventParser:EnumerateTemplates(Func`3,Action`1):this
        -245 (-19.65% of base) : Microsoft.CodeAnalysis.CSharp.dasm - BinopEasyOut:TypeToIndex(TypeSymbol):Nullable`1
        -245 (-19.65% of base) : Microsoft.CodeAnalysis.CSharp.dasm - UnopEasyOut:TypeToIndex(TypeSymbol):Nullable`1
        -140 (-2.40% of base) : System.Data.Common.dasm - UnboxT`1:NullableField(Object):Nullable`1 (42 methods)
        -121 (-7.02% of base) : Microsoft.CodeAnalysis.dasm - SyntaxDiffer:GetNextAction():DiffAction:this
        -120 (-1.48% of base) : Newtonsoft.Json.dasm - JToken:op_Explicit(JToken):Nullable`1 (17 methods)
        -119 (-23.11% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SyntaxRemover:AddEndOfLine():this
         -96 (-0.94% of base) : System.Threading.Tasks.Dataflow.dasm - BatchBlockTargetCore:RetrievePostponedItemsGreedyBounded(bool):this (7 methods)
         -93 (-0.83% of base) : System.Threading.Tasks.Dataflow.dasm - BatchBlockTargetCore:RetrievePostponedItemsNonGreedy(bool):this (7 methods)
         -85 (-1.19% of base) : System.Threading.Tasks.Dataflow.dasm - BatchBlockTargetCore:ConsumeReservedMessagesNonGreedy():this (7 methods)
         -85 (-1.45% of base) : System.Threading.Tasks.Dataflow.dasm - BatchBlockTargetCore:ConsumeReservedMessagesGreedyBounded():this (7 methods)
         -84 (-0.65% of base) : System.Linq.dasm - Enumerable:Sum(IEnumerable`1,Func`2):Nullable`1 (35 methods)
         -81 (-31.15% of base) : Microsoft.CodeAnalysis.CSharp.dasm - <>c:<HasEndOfLine>b__12_0(SyntaxTrivia):bool:this
         -81 (-31.64% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - _Closure$__:_Lambda$__12-0(SyntaxTrivia):bool:this
         -74 (-15.98% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SyntaxRemover:AddEndOfLine():this
Top method improvements (percentages):
         -81 (-31.64% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - _Closure$__:_Lambda$__12-0(SyntaxTrivia):bool:this
         -81 (-31.15% of base) : Microsoft.CodeAnalysis.CSharp.dasm - <>c:<HasEndOfLine>b__12_0(SyntaxTrivia):bool:this
          -6 (-26.09% of base) : Microsoft.CodeAnalysis.dasm - SubsystemVersion:Create(int,int):SubsystemVersion
          -6 (-25.00% of base) : Microsoft.CodeAnalysis.dasm - Optional`1:op_Implicit(int):Optional`1
          -6 (-25.00% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - DimensionSize:ConstantSize(int):DimensionSize
          -6 (-25.00% of base) : System.Data.Common.dasm - SqlInt32:op_Implicit(int):SqlInt32
          -6 (-25.00% of base) : System.Private.CoreLib.dasm - Nullable`1:op_Implicit(int):Nullable`1
          -6 (-25.00% of base) : System.Reflection.Metadata.dasm - AssemblyDefinitionHandle:op_Implicit(AssemblyDefinitionHandle):Handle
          -6 (-25.00% of base) : System.Reflection.Metadata.dasm - InterfaceImplementationHandle:op_Implicit(InterfaceImplementationHandle):Handle
          -6 (-25.00% of base) : System.Reflection.Metadata.dasm - MethodDefinitionHandle:op_Implicit(MethodDefinitionHandle):Handle
          -6 (-25.00% of base) : System.Reflection.Metadata.dasm - MethodImplementationHandle:op_Implicit(MethodImplementationHandle):Handle
          -6 (-25.00% of base) : System.Reflection.Metadata.dasm - MethodSpecificationHandle:op_Implicit(MethodSpecificationHandle):Handle
          -6 (-25.00% of base) : System.Reflection.Metadata.dasm - TypeDefinitionHandle:op_Implicit(TypeDefinitionHandle):Handle
          -6 (-25.00% of base) : System.Reflection.Metadata.dasm - ExportedTypeHandle:op_Implicit(ExportedTypeHandle):Handle
          -6 (-25.00% of base) : System.Reflection.Metadata.dasm - TypeReferenceHandle:op_Implicit(TypeReferenceHandle):Handle
          -6 (-25.00% of base) : System.Reflection.Metadata.dasm - TypeSpecificationHandle:op_Implicit(TypeSpecificationHandle):Handle
          -6 (-25.00% of base) : System.Reflection.Metadata.dasm - MemberReferenceHandle:op_Implicit(MemberReferenceHandle):Handle
          -6 (-25.00% of base) : System.Reflection.Metadata.dasm - FieldDefinitionHandle:op_Implicit(FieldDefinitionHandle):Handle
          -6 (-25.00% of base) : System.Reflection.Metadata.dasm - EventDefinitionHandle:op_Implicit(EventDefinitionHandle):Handle
          -6 (-25.00% of base) : System.Reflection.Metadata.dasm - PropertyDefinitionHandle:op_Implicit(PropertyDefinitionHandle):Handle
455 total methods with Code Size differences (455 improved, 0 regressed), 242842 unchanged.

Benchmarks x64 pmi diffs:

PMI CodeSize Diffs for benchstones and benchmarks game in f:\runtime1\artifacts\tests\coreclr\Windows_NT.x64.Release for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -52 (-0.01% of base)
    diff is an improvement.
Top file improvements (bytes):
         -40 : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm (-0.14% of base)
         -12 : BenchmarksGame\reverse-complement\reverse-complement-1\reverse-complement-1.dasm (-0.33% of base)
2 total files with Code Size differences (2 improved, 0 regressed), 80 unchanged.
Top method improvements (bytes):
         -14 (-3.23% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - ScalarFloatRenderer:RenderSingleThreadedWithADT(float,float,float,float,float):this
         -14 (-2.30% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass4_0:<RenderMultiThreadedWithADT>b__0(int):this (2 methods)
         -12 (-1.08% of base) : BenchmarksGame\reverse-complement\reverse-complement-1\reverse-complement-1.dasm - ReverseComplement_1:Bench(Stream,Stream)
          -6 (-10.17% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - ComplexFloat:square():ComplexFloat:this
          -6 (-9.38% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - ComplexFloat:op_Addition(ComplexFloat,ComplexFloat):ComplexFloat
Top method improvements (percentages):
          -6 (-10.17% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - ComplexFloat:square():ComplexFloat:this
          -6 (-9.38% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - ComplexFloat:op_Addition(ComplexFloat,ComplexFloat):ComplexFloat
         -14 (-3.23% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - ScalarFloatRenderer:RenderSingleThreadedWithADT(float,float,float,float,float):this
         -14 (-2.30% of base) : SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel.dasm - <>c__DisplayClass4_0:<RenderMultiThreadedWithADT>b__0(int):this (2 methods)
         -12 (-1.08% of base) : BenchmarksGame\reverse-complement\reverse-complement-1\reverse-complement-1.dasm - ReverseComplement_1:Bench(Stream,Stream)
5 total methods with Code Size differences (5 improved, 0 regressed), 1888 unchanged.

@erozenfeld erozenfeld marked this pull request as ready for review June 25, 2020 21:55
@erozenfeld
Member Author

@CarolEidt @sandreenko @AndyAyersMS PTAL cc: @dotnet/jit-contrib

Contributor

@CarolEidt CarolEidt left a comment

LGTM - I suspect that getting some of these conditions right was tricky! Have you measured the throughput or size impact of adding the defsInBlock map?

@CarolEidt
Contributor

FYI - the timeout on the CoreCLR Pri0 Runtime Tests Run Windows_NT x86 checked has been quite frequent lately. I've seen it on my last two PRs - one required 2 retries to get it to complete.
cc @dotnet/dnceng

@erozenfeld
Member Author

@CarolEidt

I suspect that getting some of these conditions right was tricky!

Yes, this took multiple iterations. Tracking down failures caused by removing stores to pinned locals with ref count 1 was painful.

Have you measured the throughput or size impact of adding the defsInBlock map?

I measured crossgen of System.Private.CoreLib (SPC) with pin-icount. Somewhat surprisingly, it shows a throughput improvement of 0.02%, which is close to noise level.
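
The remark above about pinned locals suggests why a ref count of 1 alone is not a sufficient condition for removing a store. The sketch below is a guess at the shape of such a check, not the condition used in the PR: the `Local` class and `can_remove_store` function are hypothetical, and excluding pinned locals outright is an assumption drawn only from the comment that removing those stores caused failures. Presumably a store to a pinned local can matter to the GC even when the local is never read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Local:
    """Hypothetical summary of a local variable's properties."""
    name: str
    ref_count: int        # total references, including the store itself
    pinned: bool = False  # pinned locals are reported to the GC as pinned


def can_remove_store(local: Local, value_has_side_effects: bool) -> bool:
    """Return True if a store to `local` looks safe to drop (illustrative only)."""
    if local.ref_count != 1:
        # Some other reference exists, so this check cannot decide;
        # a real compiler would fall back to liveness information.
        return False
    if local.pinned:
        # Assumption: never drop stores to pinned locals.
        return False
    # The stored value must not do observable work, e.g. a call.
    return not value_has_side_effects
```

For example, `can_remove_store(Local("tmp", 1), False)` is true, while the same call with `pinned=True` is false.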

Contributor

@sandreenko sandreenko left a comment

LGTM, do you want to run outerloop/stress tests before merge?

Contributor

@briansull briansull left a comment

Looks Good

@erozenfeld
Member Author

LGTM, do you want to run outerloop/stress tests before merge?

Yes, will rebase and run outerloop and stress pipelines overnight.

@JulieLeeMSFT JulieLeeMSFT added this to the 5.0.0 milestone Jun 26, 2020
@garath
Member

garath commented Jun 26, 2020

FYI - the timeout on the CoreCLR Pri0 Runtime Tests Run Windows_NT x86 checked has been quite frequent lately. I've seen it on my last two PRs - one required 2 retries to get it to complete.
cc @dotnet/dnceng

Noted, and thanks for highlighting it to us. I'll check some telemetry, but the likely cause is that this queue is very busy. We'll continue to monitor the pressure.

@erozenfeld
Member Author

erozenfeld commented Jun 27, 2020

Loader/binding/assemblies/assemblybugs/37910/Ii/Ii.sh failures in outerloop and jitstress2-jitstressregs pipelines are #38452.

JIT\jit64\mcc\interop\mcc_i7*\mcc_i7*.cmd failures in jitstress2-jitstressregs are #36199.

All failures in gcstress0x3-gcstress0xc pipeline are also seen in master except crossgen2smoke: https://dev.azure.com/dnceng/public/_build/results?buildId=705744&view=ms.vss-test-web.build-test-results-tab&runId=21861154&resultId=109360&paneView=debug
I'll try to repro it locally.

@erozenfeld
Member Author

Failure mode in crossgen2smoke is very similar to #34316.

@erozenfeld
Member Author

I was able to repro crossgen2smoke failure locally both with my changes (on the 6th run) and without my changes (on the 7th run). I opened #38482.

@erozenfeld erozenfeld merged commit 5e153a2 into dotnet:master Jun 27, 2020
@MattGal
Member

MattGal commented Jun 29, 2020

FYI - the timeout on the CoreCLR Pri0 Runtime Tests Run Windows_NT x86 checked has been quite frequent lately. I've seen it on my last two PRs - one required 2 retries to get it to complete.
cc @dotnet/dnceng

Noted, and thanks for highlighting it to us. I'll check some telemetry, but the likely cause is that this queue is very busy. We'll continue to monitor the pressure.

@garath when you see logging like error : (NETCORE_ENGINEERING_TELEMETRY=Test) Work item ce3b1eca-cf51-4249-9c24-9a84e2fce15f/PayloadGroup0 in job ce3b1eca-cf51-4249-9c24-9a84e2fce15f has failed. Failure log: https://helix.dot.net/api/2019-06-17/jobs/ce3b1eca-cf51-4249-9c24-9a84e2fce15f/workitems/PayloadGroup0/console that's a work item timeout and not subject to the amount of work being sent in. Aside from helping to investigate the hangs there's not much we can do here to help the runtime team.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020