
Increase ThreadPool.MinThreads limit #3845

Merged
merged 2 commits into microsoft:main on Jul 13, 2022

Conversation

nohwnd
Member

@nohwnd nohwnd commented Jul 7, 2022

Description

Raise the limit of how many threads the ThreadPool can start without waiting. By default the limit is based on the number of logical cores. But vstest.console blocks many threads because it uses non-async locks to wait for actions to complete, so when we start many testhosts in parallel, each testhost wants at least one thread from the threadpool, and the limit is quickly reached. The threadpool then slowly adds new threads (about 1 per second), which makes the testhost callbacks take a long time to complete. This can be observed by comparing the time when the testhost reports it connected to the server with the time when we actually see that the client was accepted.

Or by putting this piece of code into TestRequestSender:

    public bool WaitForRequestHandlerConnection(int connectionTimeout, CancellationToken cancellationToken)
    {
        var sw = Stopwatch.StartNew();
        EqtTrace.Verbose("TestRequestSender.WaitForRequestHandlerConnection: waiting for connection with timeout: {0}.", connectionTimeout);

        // Wait until either connection is successful, handled by connected.WaitHandle
        // or operation is canceled, handled by cancellationToken.WaitHandle
        // or testhost exits unexpectedly, handled by clientExited.WaitHandle
        var waitIndex = WaitHandle.WaitAny(new WaitHandle[] { _connected.WaitHandle, cancellationToken.WaitHandle, _clientExited.WaitHandle }, connectionTimeout);

        Debug.WriteLine($">>>>> waiting for client for {sw.ElapsedMilliseconds} ms");

        // Return true if connection was successful.
        return waitIndex == 0;
    }

Before the change this elapsed time ranges from 250 ms (which is roughly the time the testhost spends setting up and looking up extensions) up to 3-4 seconds when there are many testhosts running in parallel and the threadpool is saturated and does not have enough non-blocked threads to process the incoming callbacks.


Raising the limit allows the ThreadPool to start a new thread without waiting and to process the callback as soon as it arrives, so the measurement above is much closer to 250 ms.

Raising the limit also lets us see that SkipDefaultAdapters has some impact on the execution time, because before this change the impact it had (~200 ms) was greatly offset by the time we spent waiting to connect (~2000 ms). We can also see that the single channel back to the IDE (SocketCommunicationManager.WriteAndFlushToChannel) is a contention point, because many testhosts are trying to push as much data as they can through this single channel. And so we can also see that BatchSize has a significant impact on speed: raising it from 10 to 100 and then 1000 makes the discovery time progressively shorter.

Fix AB#1563539
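
For reference, a minimal sketch (not part of this PR) showing how the default minimum lines up with the logical core count, using the public ThreadPool API:

using System;
using System.Threading;

// By default the pool starts up to roughly this many threads without throttling;
// past the minimum, new threads are injected slowly (about 1 per second).
ThreadPool.GetMinThreads(out int minWorkerThreads, out int minCompletionPortThreads);

Console.WriteLine($"Logical cores: {Environment.ProcessorCount}");
Console.WriteLine($"Min worker threads: {minWorkerThreads}, min IO threads: {minCompletionPortThreads}");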

@nohwnd
Member Author

nohwnd commented Jul 7, 2022

Roslyn-like solution, 74 projects, mix of net472 and net6.0,
250k tests total. 12 cores.

small threadpool, batch size 10 - 47s
small threadpool, batch size 1000 - 44s

big threadpool, batch size 10 - 45s
big threadpool, batch size 1000 - 34s

// The correct fix would be to re-visit all code that offloads work to threadpool and avoid blocking any thread,
// and also use async await when we need to await a completion of an action. But that is a far away goal, so this
// is a "temporary" measure to remove the threadpool contention.
var additionalThreadsCount = Environment.ProcessorCount * 4;
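
A minimal sketch of how such a constant could be applied through ThreadPool.SetMinThreads; this is an assumption about the shape of the change, and the actual call site in the PR may differ:

// Read the current minimums so the limits are only ever raised, never lowered.
ThreadPool.GetMinThreads(out int workerThreads, out int completionPortThreads);

// Let the pool start this many extra threads immediately instead of ramping up
// at roughly one new thread per second.
var additionalThreadsCount = Environment.ProcessorCount * 4;
ThreadPool.SetMinThreads(workerThreads + additionalThreadsCount, completionPortThreads + additionalThreadsCount);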
Member

I understand the concept of pushing the thread pool limit higher, but I'm not sure I get why we decided to raise the limit to exactly 4 times the processor count. Is "4" an empirically determined constant that yielded the best overall throughput, or is there a more formal reason for choosing it?

Member Author

I used 2*, then 5*, then 10*. I know that the path blocks 2-3 threads, and that double the number of threads are blocked when you have a datacollector. So it is part empirical, part guessing.

Member Author

So I ended up with 1* (workerThreads) + 4*, to land on 5*
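
A short sketch of that arithmetic, assuming the default worker minimum equals the logical core count:

ThreadPool.GetMinThreads(out int workerThreads, out _);

// workerThreads defaults to Environment.ProcessorCount (the "1*"), so adding
// 4 * ProcessorCount of headroom lands the new minimum at roughly 5 * ProcessorCount.
var newMinWorkerThreads = workerThreads + Environment.ProcessorCount * 4;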

Member

I would write that down as a comment in the code for later.

Contributor

@MarcoRossignoli MarcoRossignoli Jul 12, 2022

Have you tried changing only the worker one? I don't know whether the I/O one makes a difference here; we're not async at all, so I suppose we're not using it. And let's hope we don't generate too much contention/memory consumption (if I recall well, on .NET 1 MB per stack is committed) on huge test servers 🤞 const heuristics is... meh...

Maybe if it's only for VS responsiveness we could recognize it (design mode) and increase only in that case, but not mandatory.

Member Author

I did try changing it, but if the IO pool is not exhausted then this option has no impact; it simply won't get past the limit. In both cases we are just telling the threadpool to start more threads without delay when it needs to. We will eventually stop blocking them, and the thread pool can re-balance itself. We are only paying a small price, say 1 MB per thread, but we know we will cap out at a number of threads that is x times the processor count. So if there is a server that has 100 cores, it probably also has 1 GB of RAM to spare, if it ever gets to the point where it allocates that many threads.

Member Author

Added the comment.

@nohwnd nohwnd merged commit 221a673 into microsoft:main Jul 13, 2022
@MarcoRossignoli
Contributor

MarcoRossignoli commented Nov 2, 2022

Threadpool slowly adds new threads (about 1 per second),

@nohwnd FYI this is better starting from .NET 6.0: there's a cooperative blocking mechanism, and at every Task.Wait/Task.Result the ThreadPool injects a new thread without waiting.
Netfx suffers from this and will suffer from it forever, I think.

https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.Blocking.cs#L120

Also, completion ports are no longer used like in the past; I/O callbacks are "batched" and run on normal worker threads:
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs#L205
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs#L1091

cc: @Evangelink
