Parallel project analysis behind a feature flag #13521
Conversation
I have provided some test results which show the performance impact. As mentioned in the description there are tweaks to be made (e.g. a different way to turn it on/off), but that requires input from the maintainers 🙂
Change is great, and needs a bunch of testing, especially in Visual Studio, since it loads projects differently and has some quirks regarding the threading. We will also probably need a flag passed to the checker, and a VS feature flag (for easier in-VS testing, which will be shown in the settings). Update 1: I'd also be interested in differences in memory consumption. Thoughts @KevinRansom @dsyme ( + @TIHan and @cartermp, sorry for the ping :) )
Makes sense. @auduchinok if this change were to land, how much work would it be for Rider to pick up the FCS version that supports it and an option to enable it?
Yes, that's something worth looking at.
Would be lovely if one day this question could be answered automatically by running performance tests on a variety of predefined [possibly virtual, if results are not skewed too much] machines. For now we can capture those stats in https://github.com/safesparrow/fsharp-benchmark-generator/. In the short term I think what we should do is:
Then we can either:
And then run https://github.com/safesparrow/fsharp-benchmark-generator/ for every combination. Regarding VS:
It would be a bit more difficult this time, since many impactful refactorings have been made upstream. 🙂
If I am not mistaken, it pretty much has its own threading model/primitives which may affect the behaviour here.
I think this is another vote for making this opt-in.
Yes, this is correct
This is correct.
I think it is ok. On initial review I spotted some concerns about concurrent assembly loading/referencing; I'll document those below. However, these are not in practice a problem in the case of IncrementalBuilder: the overall assembly loading and resolution process is held within a GraphNode for the unique builder, and all requests requiring the builder have to await the completion of that GraphNode. The part I was concerned about is concurrent access with regard to TryRegisterAndPrepareToImportReferencedDll:
- tcImports.RegisterDll dllinfo
- PrepareToImportReferencedILAssembly: tcImports.RegisterCcu ccuinfo
- PrepareToImportReferencedFSharpAssembly: ccuRawDataAndInfos |> List.iter (p23 >> tcImports.RegisterCcu)

Currently the sequence is:
With the change it would become:
This amounts to the same thing, and as mentioned all of it is within the control of a sequentializing GraphNode here or here.
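As a rough, hedged illustration of that sequentializing behaviour (this is not FCS's actual `GraphNode` type, just the general pattern): the loading/resolution computation is started at most once, and every concurrent request awaits the same shared result.

```fsharp
// Illustrative only: a once-started shared computation, similar in spirit to a
// sequentializing GraphNode. F#'s 'lazy' is thread-safe by default, so the task
// below is started at most once even when many requests race for the result.
type OnceNode<'T>(computation: Async<'T>) =
    let started = lazy (Async.StartAsTask computation)
    member _.GetOrCompute() = async { return! Async.AwaitTask started.Value }

let loadAndResolveAssemblies =
    OnceNode(
        async {
            // Placeholder for the RegisterDll / RegisterCcu work discussed above.
            do! Async.Sleep 500
            return "assemblies imported"
        }
    )

// Concurrent requests all observe the single shared result.
let results =
    [ for _ in 1 .. 4 -> loadAndResolveAssemblies.GetOrCompute() ]
    |> Async.Parallel
    |> Async.RunSynchronously
```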
Yes, we should apply it during command-line compilation as well, subject to performance testing.
…dd a constructor arg to the FSharpChecker.
I have now wired up the feature setting from top-level in both FSC and FSharpChecker. I also raised #13835 with a feature that produces OpenTelemetry traces, and the description contains example traces that compare the type-checking timeline with the feature on and off - see #13835 (comment). The few performance tests whose results I shared show the expected benefits from parallel work, and as far as I can see, nothing indicates that work is being duplicated. @vzarytovskii @dsyme What would be the next steps before this PR can be accepted as an opt-in, non-public/experimental feature?
@dsyme I'm fine with merging it
@safesparrow thanks a lot for this!
@safesparrow I can't merge to your branch, can you merge main into this please?
@safesparrow Thanks!
@safesparrow Actually it's showing it needs another merge (normally we can update, but for some reason your branches don't allow it).
Not sure why, but a single regression test failed after the latest merge. I submitted a dummy commit to trigger a rerun. EDIT: Actually during one of the merges I mishandled git and ended up deleting one of the regression test samples. This time tests should pass 🤞
Head branch was pushed to by a user without write access
You seem to need to keep merging, my apologies.
* Allow parallel project analysis with an environment variable
* reformat CompilerImports.fs
* Wire in the flag from top-level, add an internal option to the FSC, add a constructor arg to the FSharpChecker.
* Fantomas formatting
* Update surface baseline
* Cleanup
* Dummy commit to trigger PR checks
* Update surface baseline after merge
* Empty commit to trigger PR check rerun
* Empty commit to trigger PR check rerun
* Restore tests/fsharp/regression/13219/test.fsx
FCS code analysis currently runs serially, project by project.
There is a big potential for speedup by analysing all unrelated projects at the same time.
Some discussion around this happened on F# Slack with @vzarytovskii, @cartermp and others.
Given the simplest example A -> B1, B2 -> C (where A references B1 and B2, and both B1 and B2 reference C),
the current order of analysis will be:
start A, start B1, start C, finish C, finish B1, start B2, finish B2, finish A
With this change the order changes to:
start A, start B1&B2, start C, finish C, finish B1&B2, finish A
What this change does:
`tcImports.RegisterAndImportReferencedAssemblies` invokes computations for each reference using `Async.Parallel` instead of `Async.Sequential`. This includes DLL references and project references.
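For illustration, here is a minimal sketch (not the actual FCS code; all names are placeholders) of the shape of this change: a list of per-reference import computations is combined with `Async.Parallel` instead of `Async.Sequential`, so independent references can be processed concurrently.

```fsharp
// Minimal sketch of the Async.Sequential -> Async.Parallel switch.
// 'importReference' stands in for the real per-reference import work.
let importReference (name: string) =
    async {
        do! Async.Sleep 100 // placeholder for the actual import work
        return sprintf "imported %s" name
    }

let references = [ "A.dll"; "B.dll"; "C.dll" ]

// Before: references are imported one after another.
let sequentialResults =
    references
    |> List.map importReference
    |> Async.Sequential
    |> Async.RunSynchronously

// After (with the feature enabled): independent references are imported concurrently.
let parallelResults =
    references
    |> List.map importReference
    |> Async.Parallel
    |> Async.RunSynchronously
```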
How the feature is enabled
The feature is currently wired up for the two main cases: 1. the standalone compiler and 2. `FSharpChecker`.
It is controlled as follows:
- If the `FCS_ParallelReferenceResolution` environment variable is set, its value (`true` or `false`) dictates whether the feature is on or off.
- FSC has a new internal option `--parallelreferenceresolution` which, when set, enables the feature.
- `FSharpChecker.Create` has a new parameter that, when set to true, enables the feature.
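As a quick illustration of the three switches described above (the environment variable and the compiler option are taken from this description; the `FSharpChecker.Create` parameter name below is an assumption, so check the actual surface area for the exact signature):

```fsharp
open System
open FSharp.Compiler.CodeAnalysis

// 1. Environment variable, honoured by both FSC and FCS:
Environment.SetEnvironmentVariable("FCS_ParallelReferenceResolution", "true")

// 2. Internal compiler option, passed on the command line:
//      fsc --parallelreferenceresolution ...

// 3. FSharpChecker constructor argument (parameter name assumed here):
let checker = FSharpChecker.Create(parallelReferenceResolution = true)
```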
Some things I'm not 100% sure about:
- Whether analysis work can be duplicated in a graph like A -> B1, B2 -> C, where B1 and B2 both request analysis of C.
- Whether the combination of `NodeCode` and `Async.Parallel` used here is correct; I believe it is, but I'm not sure.

Limiting parallelisation
When enabled, should parallelisation be configurable to a maximum of X threads? If so, should that be per request, or globally for the process? How should it be configured?
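If a cap were wanted, `Async.Parallel` already accepts an optional `maxDegreeOfParallelism` argument (FSharp.Core 4.7+), so a hypothetical limit setting could be threaded through roughly like this:

```fsharp
// Sketch only: applying an optional, hypothetical parallelism cap at the point
// where the per-reference computations are combined.
let runWithOptionalLimit (limit: int option) (computations: Async<'T> list) : Async<'T[]> =
    match limit with
    | Some n -> Async.Parallel(computations, maxDegreeOfParallelism = n)
    | None -> Async.Parallel computations
```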
I initially planned to add an option to limit it, but separate analysis requests are, as I understand it, completely independent, so there is already no limit on how multi-threaded FCS can be. I would therefore argue that this PR isn't a major change in that regard.
However it definitely increases the number of threads running analysis in a typical scenario.
Three issues with that I can think of:
What speedup does this give?
Testing aside, this change feels very natural to me in principle.
At our company we operate on big solutions with ~100 F# projects. Even without an accurate analysis I'm fairly sure that the level of parallelisation possible in that project graph is >= 2, which means almost a 2x speedup (and I think it's actually >> 2).
GC consideration
Running code analysis on many threads means a much higher allocation rate, leading to more work for the GC.
From the tests I observed that with Workstation GC (which I believe is used by Rider by default) FCS was spending as much as 70% of time in GC.
Enabling Server GC therefore had a huge impact on the timings.
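(For reference, Server GC can typically be enabled for the host process via the `<ServerGarbageCollection>true</ServerGarbageCollection>` MSBuild property, the `System.GC.Server` runtimeconfig setting, or the `DOTNET_gcServer=1` environment variable.)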
Test results
The below script can be used to measure the impact of this feature.
It utilises the benchmark generator from https://github.com/safesparrow/fsharp-benchmark-generator.
Script:
Test 1: Extremely parallel example: Root -> 50 leaves
See https://github.com/safesparrow/fsharp-benchmark-generator/blob/main/inputs/50_leaves.json for the definition.
10-iteration average (the first iteration is considerably slower):
Test 2: Almost sequential project structure: Fantomas.Tests
See https://github.com/safesparrow/fsharp-benchmark-generator/blob/main/inputs/fantomas.json for the definition.
However `Fantomas.Client` is a 3-file project, so its analysis doesn't take long, and my claim is that the ~200ms difference is roughly the time it takes to analyse `Fantomas.Client`.

TODO: