JIT: Don't reorder handler blocks #112292

Merged: 2 commits into dotnet:main, Feb 10, 2025


amanasifkhalid
Member

Splitting this out of #112004 to simplify diff triage. The cost of executing a handler region should be dominated by the runtime overhead of exception handling, so I don't think we're losing anything meaningful in not reordering them. In the case of finally regions, if they are sufficiently hot, the JIT should've copied them into the main method region.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 7, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@jakobbotsch
Member

jakobbotsch commented Feb 9, 2025

I'm curious about the motivation -- what is the benefit of skipping the layout for handlers? Is there a good reason to not optimize handlers?

Note that the VM side has switched stance on importance of EH perf recently: #77568. So I think it needs to come with some motivation if we are intentionally skipping opts for handlers.

@amanasifkhalid
Member Author

Not having to handle exceptional flow during block layout means we can simplify our implementation quite a bit. Our current implementation is relatively simple because it doesn't allow reordering blocks in different regions, but one issue I want to address in the consolidated implementation is subpar placement of whole try regions.

The goal of the consolidation is to reduce block layout's workflow to the following:

  1. Compute a loop-aware RPO that skips "unimportant" (i.e. cold) blocks
  2. Pass this span of important blocks to 3-opt
  3. Reorder the block list once, such that the order of important blocks is as close to what 3-opt came up with as possible, and cold blocks have been pushed to the end of the method
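A minimal sketch of the three steps above, for readers who want something concrete. This is not the JIT's code: the `Block` type, the `COLD_THRESHOLD` value, and all function names are hypothetical, the RPO here is a plain (not loop-aware) one, and `three_opt` is an identity stand-in for the real optimizer.

```python
from dataclasses import dataclass, field

COLD_THRESHOLD = 0.01  # hypothetical cutoff; the real threshold is not given here


@dataclass
class Block:
    name: str
    weight: float
    succs: list = field(default_factory=list)


def important_rpo(entry, blocks):
    """Step 1: reverse postorder over the flow graph, keeping only hot blocks."""
    visited, postorder = set(), []

    def dfs(name):
        visited.add(name)
        for succ in blocks[name].succs:
            if succ not in visited:
                dfs(succ)
        postorder.append(name)

    dfs(entry)
    return [n for n in reversed(postorder) if blocks[n].weight >= COLD_THRESHOLD]


def three_opt(order):
    """Step 2: stand-in for 3-opt; a real pass would permute `order`."""
    return list(order)


def reorder_once(original_layout, hot_order):
    """Step 3: hot blocks in the chosen order, cold blocks pushed to the end."""
    hot = set(hot_order)
    return list(hot_order) + [n for n in original_layout if n not in hot]


# Toy diamond: A branches to a cold block B and a hot block C; both reach D.
blocks = {
    "A": Block("A", 1.0, ["B", "C"]),
    "B": Block("B", 0.0, ["D"]),
    "C": Block("C", 1.0, ["D"]),
    "D": Block("D", 1.0),
}
original_layout = ["A", "B", "C", "D"]

hot_order = three_opt(important_rpo("A", blocks))
final_layout = reorder_once(original_layout, hot_order)
print(final_layout)
```

In this toy case the cold block B ends up after the hot path A, C, D. The sketch deliberately ignores EH regions, which is where the complications discussed next come from.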

If we implement step 3 with the invariant that we only reorder within EH regions, child regions frequently "sink" down the method. Take the following example initial layout (sorry it isn't smaller, I just grabbed this from aspnet diffs):

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1        [000..00E)-> BB02(1)                 (always)                     i LIR nullcheck
BB02 [0001]  1  0    BB01                  0.50  50 [00E..00F)-> BB05(1)                 (always) T0      try { }     i LIR IBC keep gcsafe nullcheck
BB05 [0003]  1       BB02                  0.50  50 [032..03B)-> BB07(0.00237),BB06(0.998)   ( cond )                     i LIR IBC nullcheck
BB06 [0014]  1       BB05                  1.00 100 [03A..03B)-> BB24(1)                 (always)                     i LIR IBC
BB07 [0015]  1       BB05                  0.00   0 [03A..03B)-> BB09(1),BB08(0)         ( cond )                     i LIR IBC
BB08 [0018]  1       BB07                  0      0 [03A..03B)-> BB09(1)                 (always)                     i LIR IBC rare hascall gcsafe
BB09 [0019]  2       BB07,BB08             0.00   0 [03A..03B)-> BB13(0),BB10(1)         ( cond )                     i LIR IBC nullcheck
BB10 [0031]  1       BB09                  0.00   0 [03A..03B)-> BB12(1),BB11(0)         ( cond )                     i LIR IBC nullcheck
BB11 [0032]  1       BB10                  0      0 [03A..03B)-> BB13(0.000991),BB12(0.999)    ( cond )                     i LIR IBC rare hascall gcsafe
BB12 [0033]  2       BB10,BB11             0.00   0 [03A..03B)-> BB17(1),BB16(0)         ( cond )                     i LIR IBC nullcheck
BB13 [0034]  2       BB09,BB11             0      0 [03A..03B)                           (throw )                     i LIR IBC rare gcsafe nullcheck
BB16 [0026]  1       BB12                  0      0 [03A..03B)-> BB17(1)                 (always)                     i LIR IBC rare hascall gcsafe
BB17 [0027]  2       BB16,BB12             0.00   0 [03A..03B)-> BB19(0.5),BB18(0.5)     ( cond )                     i LIR IBC nullcheck
BB18 [0041]  1       BB17                  0.00   0 [03A..03B)                           (throw )                     i LIR IBC hascall gcsafe
BB19 [0042]  1       BB17                  0      0 [03A..03B)-> BB24(0.569),BB20(0.431) ( cond )                     i LIR IBC rare hascall
BB20 [0037]  1       BB19                  0      0 [03A..03B)-> BB24(1)                 (always)                     i LIR IBC rare hascall gcsafe
BB24 [0021]  4       BB04,BB06,BB20,BB19   1    100 [03A..046)                           (return)                     i LIR IBC
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ funclets follow
BB04 [0002]  1     0                       0        [01B..032)-> BB24(1)                 ( cret )    H0 F catch { }   i LIR rare keep hascall xentry gcsafe flet nullcheck
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

BB02's preferred predecessor is BB01, but they are in different regions, so BB02 sinks to the end of the hot section as we bubble non-try blocks up. In this method, several blocks barely exceed the threshold for "cold," so without the ability to move try regions, BB02 ends up interleaved with relatively cold code:


---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1        [000..00E)-> BB02(1)                 (always)                     i LIR nullcheck
BB05 [0003]  1       BB02                  0.50  50 [032..03B)-> BB07(0.00237),BB06(0.998)   ( cond )                     i LIR IBC nullcheck
BB06 [0014]  1       BB05                  1.00 100 [03A..03B)-> BB24(1)                 (always)                     i LIR IBC
BB24 [0021]  4       BB04,BB06,BB20,BB19   1    100 [03A..046)                           (return)                     i LIR IBC
BB07 [0015]  1       BB05                  0.00   0 [03A..03B)-> BB09(1),BB08(0)         ( cond )                     i LIR IBC
BB09 [0019]  2       BB07,BB08             0.00   0 [03A..03B)-> BB13(0),BB10(1)         ( cond )                     i LIR IBC nullcheck
BB10 [0031]  1       BB09                  0.00   0 [03A..03B)-> BB12(1),BB11(0)         ( cond )                     i LIR IBC nullcheck
BB12 [0033]  2       BB10,BB11             0.00   0 [03A..03B)-> BB17(1),BB16(0)         ( cond )                     i LIR IBC nullcheck
BB17 [0027]  2       BB16,BB12             0.00   0 [03A..03B)-> BB19(0.5),BB18(0.5)     ( cond )                     i LIR IBC nullcheck
BB18 [0041]  1       BB17                  0.00   0 [03A..03B)                           (throw )                     i LIR IBC hascall gcsafe
BB02 [0001]  1  0    BB01                  0.50  50 [00E..00F)-> BB05(1)                 (always) T0      try { }     i LIR IBC keep gcsafe nullcheck
...
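One rough way to see why the layout above is undesirable is to add up the weight of the hot edges that no longer fall through. The sketch below assumes a simple cost model (edge weight = source block weight × edge likelihood, taken from the dump; cost = total weight of edges whose target is not the next block). That model and the function name are illustrative assumptions, not the JIT's actual cost function.

```python
# Edges between hot blocks only; weights are block weight * edge likelihood,
# read off the dump above. Cold edges are omitted since they contribute ~0.
HOT_EDGES = {
    ("BB01", "BB02"): 1.0 * 1.0,
    ("BB02", "BB05"): 0.5 * 1.0,
    ("BB05", "BB06"): 0.5 * 0.998,
    ("BB06", "BB24"): 1.0 * 1.0,
}


def taken_branch_weight(layout, edges):
    """Sum the weight of edges whose target does not immediately follow the source."""
    position = {name: i for i, name in enumerate(layout)}
    return sum(
        weight
        for (src, dst), weight in edges.items()
        if position[dst] != position[src] + 1
    )


# Layout with BB02 sunk behind the cold blocks (as in the dump above).
sunk_layout = ["BB01", "BB05", "BB06", "BB24", "BB07", "BB09",
               "BB10", "BB12", "BB17", "BB18", "BB02"]
# Layout with BB02 placed directly after its preferred predecessor BB01.
adjacent_layout = ["BB01", "BB02", "BB05", "BB06", "BB24", "BB07",
                   "BB09", "BB10", "BB12", "BB17", "BB18"]

sunk_cost = taken_branch_weight(sunk_layout, HOT_EDGES)
adjacent_cost = taken_branch_weight(adjacent_layout, HOT_EDGES)
print(sunk_cost, adjacent_cost)
```

Under this model the sunk layout pays for both BB01->BB02 and BB02->BB05 as taken jumps, while keeping BB02 next to BB01 lets every hot edge fall through.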

Even if we raised the threshold for cold blocks, we'd still have this awkward jump from BB01 to the end of the hot region, and then a jump back up from BB02. To fix such cases, I implemented an additional reordering pass in my prototype that moves entire try regions up to their preferred predecessor, if doing so doesn't break nesting invariants. If we cannot use the predecessor 3-opt picked, then we move the region up to the last hot block we found in the parent region, so that we at least aren't interleaving hot and cold code. The above case is easily addressed by this pass:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1        [000..00E)-> BB02(1)                 (always)                     i LIR nullcheck
BB02 [0001]  1  0    BB01                  0.50  50 [00E..00F)-> BB05(1)                 (always) T0      try { }     i LIR IBC keep gcsafe nullcheck
BB05 [0003]  1       BB02                  0.50  50 [032..03B)-> BB07(0.00237),BB06(0.998)   ( cond )                     i LIR IBC nullcheck
BB06 [0014]  1       BB05                  1.00 100 [03A..03B)-> BB24(1)                 (always)                     i LIR IBC
BB24 [0021]  4       BB04,BB06,BB20,BB19   1    100 [03A..046)                           (return)                     i LIR IBC
...
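The region-moving pass described above can be sketched as follows. This is a simplified model, not the prototype's code: it assumes a single level of nesting (so the "nesting invariant" check reduces to "the predecessor lives in the try's parent region"), and `move_try_region`, `HOT_THRESHOLD`, and the data layout are all hypothetical.

```python
HOT_THRESHOLD = 0.01  # hypothetical cutoff between hot and cold blocks


def move_try_region(layout, region_of, weights, try_id, parent_id, preferred_pred):
    """Move all blocks of `try_id` to just after `preferred_pred` when that block
    is in the parent region; otherwise after the last hot block of the parent."""
    region = [b for b in layout if region_of[b] == try_id]
    rest = [b for b in layout if region_of[b] != try_id]

    if preferred_pred in rest and region_of[preferred_pred] == parent_id:
        anchor = preferred_pred
    else:
        hot_parent_blocks = [
            b for b in rest
            if region_of[b] == parent_id and weights[b] >= HOT_THRESHOLD
        ]
        if not hot_parent_blocks:
            return list(layout)  # nowhere better to put the region
        anchor = hot_parent_blocks[-1]

    insert_at = rest.index(anchor) + 1
    return rest[:insert_at] + region + rest[insert_at:]


# Layout from the second dump: BB02 (try region T0) has sunk below cold code.
layout = ["BB01", "BB05", "BB06", "BB24", "BB07", "BB09",
          "BB10", "BB12", "BB17", "BB18", "BB02"]
region_of = {b: None for b in layout}  # None = main method body
region_of["BB02"] = 0                  # T0
# Weights from the dump; values shown there as 0.00 are rounded to 0.0 here.
weights = {"BB01": 1.0, "BB05": 0.5, "BB06": 1.0, "BB24": 1.0, "BB02": 0.5,
           "BB07": 0.0, "BB09": 0.0, "BB10": 0.0, "BB12": 0.0,
           "BB17": 0.0, "BB18": 0.0}

moved = move_try_region(layout, region_of, weights, 0, None, "BB01")
# Fallback path: pretend the preferred predecessor is unusable.
fallback = move_try_region(layout, region_of, weights, 0, None, "BB_unusable")
print(moved[:5])
print(fallback[:5])
```

With BB01 as the preferred predecessor, the first five blocks match the dump above. In the fallback case BB02 lands after BB24, the last hot block in the parent region, so hot and cold code still are not interleaved.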

The EH nesting checks needed to do this are cumbersome, so to simplify my initial prototype, I defined "unimportant" in step 1 as "cold, or a handler block". Our 3-opt implementation never considered edges in handler regions to begin with, but if we believe we're losing something by not placing handler regions in RPO, I can easily add that functionality back into my prototype. #112004 is bigger and incurs more churn than I'd like it to as-is, so I'd prefer to add further enhancements to it after it's been checked in.
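As an illustration of that definition, the filter might look like the sketch below. The `Block` type, the `handler_index` field (mirroring the `hnd` column of the dumps), the `COLD_WEIGHT` value, and the `BBxx` hot-handler entry are hypothetical; only the rule "cold, or a handler block" comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional

COLD_WEIGHT = 0.01  # hypothetical cutoff for "cold"


@dataclass
class Block:
    name: str
    weight: float
    handler_index: Optional[int] = None  # set when the block is in a handler region


def is_unimportant(block):
    """A block is skipped by the RPO if it is cold or belongs to a handler."""
    return block.handler_index is not None or block.weight < COLD_WEIGHT


blocks = [
    Block("BB01", 1.0),
    Block("BB02", 0.5),                   # try-region block: still important
    Block("BB08", 0.0),                   # cold
    Block("BB04", 0.0, handler_index=0),  # catch handler from the dump
    Block("BBxx", 1.0, handler_index=0),  # made-up hot handler: skipped anyway
]

important = [b.name for b in blocks if not is_unimportant(b)]
print(important)
```

Note that try-region blocks such as BB02 stay in the important set; only handler blocks are excluded regardless of weight.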

@jakobbotsch
Member

Thanks! Sounds reasonable to me.

@amanasifkhalid
Member Author

Diffs show relatively little churn on most platforms. libraries_tests.run on win-x86 has considerably more size regressions than it does on other platforms. From the jit-analyze summary, it looks like we're incurring diffs in methods that are duplicated in the collection quite a bit, thus inflating the total size regression.

@amanasifkhalid amanasifkhalid merged commit 9b77dd4 into dotnet:main Feb 10, 2025
113 checks passed
@amanasifkhalid amanasifkhalid deleted the 3-opt-skip-handler-blocks branch February 10, 2025 14:44
grendello added a commit to grendello/runtime that referenced this pull request Feb 11, 2025
* main:
  Code clean up in AP for NonNull* (dotnet#112027)
  JIT: Invalidate LSRA's DFS tree if we aren't running new layout phase (dotnet#112364)
  Update dependencies from https://github.com/dotnet/source-build-reference-packages build 20250204.2 (dotnet#112339)
  Add doc on OS onboarding (dotnet#112026)
  Add `TypeName` APIs to simplify metadata lookup. (dotnet#111598)
  Internal monitor impl not using coop mutex causing deadlocks on Android. (dotnet#112358)
  Do not run NAOT arm64 OSX testing on all PRs (dotnet#112342)
  Special-case empty enumerables in AsyncEnumerable (dotnet#112321)
  Have mono handle ConvertToIntegerNative for Double and Single (dotnet#112206)
  Update dependencies from https://github.com/dotnet/arcade build 20250206.4 (dotnet#112338)
  System.Configuration.ConfigurationManager.Tests: use Assembly.Location to determine ThisApplicationPath. (dotnet#112231)
  Force write of local file header when "version needed to extract" changes (dotnet#112032)
  JIT: Don't reorder handler blocks (dotnet#112292)
  [RISC-V] Synthesize some floating constants inline (dotnet#111529)
  Enable `SA1000`: Spacing around keywords (dotnet#112302)
  Fix relocs for linux-riscv64 AOT (dotnet#112331)
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI