Parallel project analysis behind a feature flag #13521
Conversation
I have provided some test results which show the performance impact. As mentioned in the description there are tweaks to be made (e.g. a different way to turn it on/off), but that requires input from the maintainers 🙂
Change is great, and needs a bunch of testing, especially in Visual Studio, since it loads projects differently and has some quirks regarding the threading. We will also probably need a flag passed to the checker, and a VS feature flag (for easier in-VS testing, which will be shown in the settings). Update 1: I'd also be interested in differences in memory consumption. Thoughts @KevinRansom @dsyme ( + @TIHan and @cartermp, sorry for the ping :) )
Makes sense. @auduchinok if this change were to land, how much work would it be for Rider to pick up the FCS version that supports it and an option to enable it?
Yes, that's something worth looking at.
Would be lovely if one day this question could be answered automatically by running performance tests on a variety of predefined [possibly virtual, if results are not skewed too much] machines. For now we can capture those stats in https://github.com/safesparrow/fsharp-benchmark-generator/. In the short term I think what we should do is:
Then we can either:
And then run https://github.com/safesparrow/fsharp-benchmark-generator/ for every combination. Regarding VS:
It would be a bit more difficult this time, since many impactful refactorings have been made upstream. 🙂
If I am not mistaken, it pretty much has its own threading model/primitives which may affect the behaviour here.
I think this is another vote for making this opt-in.
Yes, this is correct
This is correct.
I think it is ok. On initial review I spotted some concerns about concurrent assembly loading/referencing; I'll document those below. However, these are not in practice a problem in the case of IncrementalBuilder: the overall assembly loading and resolution process is held within a GraphNode for the unique builder, and all requests requiring the builder have to await the completion of that GraphNode. The part I was concerned about is concurrent access with regard to TryRegisterAndPrepareToImportReferencedDll:
- tcImports.RegisterDll dllinfo
- PrepareToImportReferencedILAssembly: tcImports.RegisterCcu ccuinfo
- PrepareToImportReferencedFSharpAssembly: ccuRawDataAndInfos |> List.iter (p23 >> tcImports.RegisterCcu)

Currently the sequence is:
With the change it would become:
This amounts to the same thing, and as mentioned all of it is within the control of a sequentializing GraphNode here or here.
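As a rough, hedged illustration of that sequentializing behaviour (this is not FCS's actual `GraphNode` type, just the general pattern): the loading/resolution computation is started at most once, and every concurrent request awaits the same shared result.

```fsharp
// Illustrative only: a once-started shared computation, similar in spirit to a
// sequentializing GraphNode. F#'s 'lazy' is thread-safe by default, so the task
// below is started at most once even when many requests race for the result.
type OnceNode<'T>(computation: Async<'T>) =
    let started = lazy (Async.StartAsTask computation)
    member _.GetOrCompute() = async { return! Async.AwaitTask started.Value }

let loadAndResolveAssemblies =
    OnceNode(
        async {
            // Placeholder for the RegisterDll / RegisterCcu work discussed above.
            do! Async.Sleep 500
            return "assemblies imported"
        }
    )

// Concurrent requests all observe the single shared result.
let results =
    [ for _ in 1 .. 4 -> loadAndResolveAssemblies.GetOrCompute() ]
    |> Async.Parallel
    |> Async.RunSynchronously
```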
Yes, we should apply it during command-line compilation as well, subject to performance testing.
…dd a constructor arg to the FSharpChecker.
I have now wired up the feature setting from top-level in both FSC and FSharpChecker. I also raised #13835 with a feature that produces OpenTelemetry traces, and the description contains example traces that compare the type-checking timeline with the feature on and off - see #13835 (comment). The few performance tests whose results I shared show the expected benefits from parallel work, and as far as I can see, nothing indicates that work is being duplicated. @vzarytovskii @dsyme What would be the next steps before this PR can be accepted as an opt-in, non-public/experimental feature?
@dsyme I'm fine with merging it
@safesparrow thanks a lot for this!
@safesparrow I can't merge to your branch, can you merge main into this please?
@safesparrow Thanks!
@safesparrow Actually it's showing it needs another merge (normally we can update, but for some reason your branches don't allow it).
Not sure why, but a single regression test failed after the latest merge. I submitted a dummy commit to trigger a rerun. EDIT: Actually during one of the merges I mishandled git and ended up deleting one of the regression test samples. This time tests should pass 🤞
Head branch was pushed to by a user without write access
You seem to need to keep merging, my apologies.
* Allow parallel project analysis with an environment variable
* reformat CompilerImports.fs
* Wire in the flag from top-level, add an internal option to the FSC, add a constructor arg to the FSharpChecker.
* Fantomas formatting
* Update surface baseline
* Cleanup
* Dummy commit to trigger PR checks
* Update surface baseline after merge
* Empty commit to trigger PR check rerun
* Empty commit to trigger PR check rerun
* Restore tests/fsharp/regression/13219/test.fsx
FCS code analysis currently runs serially, project by project.
There is a big potential for speedup by analysing all unrelated projects at the same time.
Some discussion around this happened on F# Slack with @vzarytovskii, @cartermp and others.
Given the simplest example A -> B1, B2 -> C (where A references B1 and B2, and both B1 and B2 reference C),
the current order of analysis will be:
start A, start B1, start C, finish C, finish B1, start B2, finish B2, finish A
With this change the order changes to:
start A, start B1&B2, start C, finish C, finish B1&B2, finish A
What this change does:
`tcImports.RegisterAndImportReferencedAssemblies` invokes computations for each reference using `Async.Parallel` instead of `Async.Sequential`. This includes DLL references and project references.
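For illustration, here is a minimal sketch (not the actual FCS code; all names are placeholders) of the shape of this change: a list of per-reference import computations is combined with `Async.Parallel` instead of `Async.Sequential`, so independent references can be processed concurrently.

```fsharp
// Minimal sketch of the Async.Sequential -> Async.Parallel switch.
// 'importReference' stands in for the real per-reference import work.
let importReference (name: string) =
    async {
        do! Async.Sleep 100 // placeholder for the actual import work
        return sprintf "imported %s" name
    }

let references = [ "A.dll"; "B.dll"; "C.dll" ]

// Before: references are imported one after another.
let sequentialResults =
    references
    |> List.map importReference
    |> Async.Sequential
    |> Async.RunSynchronously

// After (with the feature enabled): independent references are imported concurrently.
let parallelResults =
    references
    |> List.map importReference
    |> Async.Parallel
    |> Async.RunSynchronously
```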
How the feature is enabled
The feature is currently wired up for the two main cases: 1. the standalone compiler and 2. `FSharpChecker`.
It is controlled as follows:
- If the `FCS_ParallelReferenceResolution` environment variable is set, its value (`true` or `false`) dictates whether the feature is on or off.
- FSC has a new internal option `--parallelreferenceresolution` which, when set, enables the feature.
- `FSharpChecker.Create` has a new parameter that, when set to true, enables the feature.
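As a quick illustration of the three switches described above (the environment variable and the compiler option are taken from this description; the `FSharpChecker.Create` parameter name below is an assumption, so check the actual surface area for the exact signature):

```fsharp
open System
open FSharp.Compiler.CodeAnalysis

// 1. Environment variable, honoured by both FSC and FCS:
Environment.SetEnvironmentVariable("FCS_ParallelReferenceResolution", "true")

// 2. Internal compiler option, passed on the command line:
//      fsc --parallelreferenceresolution ...

// 3. FSharpChecker constructor argument (parameter name assumed here):
let checker = FSharpChecker.Create(parallelReferenceResolution = true)
```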
Some things I'm not 100% sure about:
- Whether analysis work can be duplicated in a graph like A -> B1, B2 -> C, where B1 and B2 both request analysis of C.
- Whether the combination of `NodeCode` and `Async.Parallel` used here is correct; I believe it is, but I'm not sure.

Limiting parallelisation
When enabled, should parallelisation be configurable to a maximum of X threads? If so, should that be per request, or globally for the process? How should it be configured?
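If a cap were wanted, `Async.Parallel` already accepts an optional `maxDegreeOfParallelism` argument (FSharp.Core 4.7+), so a hypothetical limit setting could be threaded through roughly like this:

```fsharp
// Sketch only: applying an optional, hypothetical parallelism cap at the point
// where the per-reference computations are combined.
let runWithOptionalLimit (limit: int option) (computations: Async<'T> list) : Async<'T[]> =
    match limit with
    | Some n -> Async.Parallel(computations, maxDegreeOfParallelism = n)
    | None -> Async.Parallel computations
```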
I initially planned to add an option to limit it, but separate analysis requests are, as I understand it, completely independent, so there is already no limit on how multi-threaded FCS can be. I would therefore argue that this PR isn't a major change in that regard.
However it definitely increases the number of threads running analysis in a typical scenario.
Three issues with that I can think of:
What speedup does this give?
Testing aside, this change feels very natural to me in principle.
At our company we operate on big solutions with ~100 F# projects. Even without an accurate analysis I'm fairly sure that the level of parallelisation possible in that project graph is >= 2, which means almost a 2x speedup (and I think it's actually >> 2).
GC consideration
Running code analysis on many threads means a much higher allocation rate, leading to more work for the GC.
From the tests I observed that with Workstation GC (which I believe is used by Rider by default) FCS was spending as much as 70% of time in GC.
Enabling Server GC therefore had a huge impact on the timings.
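(For reference, Server GC can typically be enabled for the host process via the `<ServerGarbageCollection>true</ServerGarbageCollection>` MSBuild property, the `System.GC.Server` runtimeconfig setting, or the `DOTNET_gcServer=1` environment variable.)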
Test results
The below script can be used to measure the impact of this feature.
It utilises the benchmark generator from https://github.com/safesparrow/fsharp-benchmark-generator.
Script:
Test 1: Extremely parallel example: Root -> 50 leaves
See https://github.com/safesparrow/fsharp-benchmark-generator/blob/main/inputs/50_leaves.json for the definition.
10-iteration average (the first iteration is considerably slower):
Test 2: Almost sequential project structure: Fantomas.Tests
See https://github.com/safesparrow/fsharp-benchmark-generator/blob/main/inputs/fantomas.json for the definition.
However `Fantomas.Client` is a 3-file project, so its analysis doesn't take long, and my claim is that the ~200ms difference is roughly the time it takes to analyse `Fantomas.Client`.

TODO: