Nima WebServer.stop() is slow / not timely when there are no active requests #5717
Note that the WebServerStopOnlyTest will pass when ...
A sort-of-hack-that-fixes-this is to add a stopIfIdle() method to Http1Connection, holding a reference to the connection's Thread plus a volatile flag that is only true while the prologue is being read:

    Thread thread;
    volatile boolean currentlyReadingPrologue;

    public void stopIfIdle() {
        if (currentlyReadingPrologue && thread.getState() == Thread.State.WAITING) {
            System.out.println("Http1Connection.stop() - interrupt thread [" + thread + "] while it's WAITING readingPrologue " + Instant.now());
            thread.interrupt();
        }
    }

With some extra System.out logging the test passes, and with this I can bump the ...
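To show how this hack would be driven at shutdown (this wiring is not from the issue; the registry field and method names below are hypothetical), the listener's stop sequence would poke each tracked connection before waiting for the executor to terminate:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical wiring of the stopIfIdle() hack; the field and method names are made up.
class ListenerStopSketch {

    interface IdleAwareConnection {
        void stopIfIdle();  // as sketched above for Http1Connection
    }

    private final Set<IdleAwareConnection> activeConnections = ConcurrentHashMap.newKeySet();
    private final ExecutorService readerExecutor;

    ListenerStopSketch(ExecutorService readerExecutor) {
        this.readerExecutor = readerExecutor;
    }

    void stop() throws InterruptedException {
        // interrupt connection threads parked in readPrologue() so their tasks end immediately
        activeConnections.forEach(IdleAwareConnection::stopIfIdle);
        // the usual shutdown then no longer has to wait out the full EXECUTOR_SHUTDOWN_MILLIS
        readerExecutor.shutdown();
        readerExecutor.awaitTermination(500, TimeUnit.MILLISECONDS);
    }
}
```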
I have pushed a solution for this. It would be more elegant if the JDK ThreadPerTaskExecutor exposed some API to get the active threads or Runnables that are in thread state WAITING. Alternatively, instead of the approach above, something like a ConnectionHandler that implements an InterruptableRunnable interface:

    class ConnectionHandler implements InterruptableRunnable {
        ...
        private volatile boolean currentlyReadingPrologue;

        @Override
        public boolean isInterruptableRunnable() {
            return currentlyReadingPrologue; // true before http1prologue.readPrologue() and false after that
        }

        @Override
        public void run() {
            ...
            while (true) {
                // prologue (first line of request)
                currentlyReadingPrologue = true;
                HttpPrologue prologue = http1prologue.readPrologue();
                currentlyReadingPrologue = false;
                ...
            }
        }
    }

... and have the internals of ...
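Roughly, the executor internals would then interrupt such runnables at shutdown. A minimal sketch of that mechanism (names here are hypothetical; this is neither JDK nor Helidon API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical executor-side bookkeeping: track each running task with its thread and,
// at shutdown, interrupt only the tasks that report themselves as safe to interrupt.
class InterruptAwareExecutorSketch {

    interface InterruptableRunnable extends Runnable {
        boolean isInterruptableRunnable();
    }

    private final Map<InterruptableRunnable, Thread> running = new ConcurrentHashMap<>();

    void execute(InterruptableRunnable task) {
        Thread thread = Thread.ofVirtual().unstarted(() -> {
            try {
                task.run();
            } finally {
                running.remove(task);   // drop the entry once the task completes
            }
        });
        running.put(task, thread);
        thread.start();
    }

    // called during shutdown: only tasks blocked waiting for a prologue report true
    void interruptIdleTasks() {
        running.forEach((task, thread) -> {
            if (task.isInterruptableRunnable()) {
                thread.interrupt();
            }
        });
    }
}
```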
Hello @rbygrave, I saw your message about this on loom-dev. I'll post my answer here too. I don't think this has anything to do with Loom and virtual threads. The way to interrupt sockets has always been to close them; then blocking methods will throw an exception. See this, for example: https://stackoverflow.com/questions/4425350/how-to-terminate-a-thread-blocking-on-socket-io-operation-instantly

Nima seems to do that for the server socket, but it FIRST waits for the executor shutdown (helidon/nima/webserver/webserver/src/main/java/io/helidon/nima/webserver/ServerListener.java, lines 133 to 137 in a45b04c).

I think Nima should close the socket FIRST, then shut down the executor. (I have not checked how Nima handles the client sockets. They should probably be treated in the same way.)
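A minimal sketch of that ordering with plain JDK types (the real ServerListener is more involved): close the listening socket first so the accept loop unblocks at once, then shut the executor down.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the suggested shutdown ordering: close the server socket first, then stop the executor.
class CloseFirstSketch {

    private final ServerSocket serverSocket;
    private final ExecutorService readerExecutor;

    CloseFirstSketch(ServerSocket serverSocket, ExecutorService readerExecutor) {
        this.serverSocket = serverSocket;
        this.readerExecutor = readerExecutor;
    }

    void stop() throws IOException, InterruptedException {
        // 1. close the listening socket: a thread blocked in accept() gets a SocketException immediately
        serverSocket.close();
        // 2. only then shut down the executor and wait briefly for in-flight tasks to finish
        readerExecutor.shutdown();
        readerExecutor.awaitTermination(500, TimeUnit.MILLISECONDS);
    }
}
```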
You'll see my answer in loom-dev, but yes, we want graceful shutdown allowing for in-flight requests. We can't just close the connections or interrupt the Http1Connection threads unless we know they are not in the middle of processing in-flight requests.
I agree we can and should do the server socket close first. The ...
Aha, I see. One potential application-specific solution for Nima might be to maintain the state of the client connections in a field. When the time comes to stop the server, all idle client sockets can be closed immediately, and the currently active sockets can be made to complete their current request and then close. But I understand that you are thinking about whether an alternative solution is possible, where the states of the (virtual) threads can be used to determine which connections are active and which are idle.
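Concretely, that idea might look something like the simplified sketch below (illustrative names only, not Helidon code): each connection records whether it is idle or serving a request, and stop closes only the idle sockets while active ones finish their current request.

```java
import java.io.IOException;
import java.net.Socket;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of per-connection state tracking so shutdown can treat
// idle and in-flight connections differently. All names are illustrative.
class ConnectionStateSketch {

    enum State { IDLE, PROCESSING }

    static class TrackedConnection {
        final Socket socket;
        volatile State state = State.IDLE;

        TrackedConnection(Socket socket) {
            this.socket = socket;
        }
    }

    private final Set<TrackedConnection> connections = ConcurrentHashMap.newKeySet();

    // called when stopping the server
    void closeIdleConnections() {
        for (TrackedConnection connection : connections) {
            if (connection.state == State.IDLE) {
                try {
                    connection.socket.close();  // unblocks the read that is waiting for the next request
                } catch (IOException ignored) {
                    // closing during shutdown; nothing useful to do with the failure
                }
            }
            // PROCESSING connections are left alone to complete their current request and then close
        }
    }
}
```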
Yes, this is coded pretty much as you describe in the associated PR - https://github.com/helidon-io/helidon/pull/5718/files
The field is at: https://github.com/helidon-io/helidon/pull/5718/files#diff-c90e0012c4ad88754ff6db529fd9db71a10065dc2f102979e35f2e1983735970R315
Which is done via https://github.com/helidon-io/helidon/pull/5718/files#diff-c90e0012c4ad88754ff6db529fd9db71a10065dc2f102979e35f2e1983735970R142
Yes, exactly. As I see it the application code is duplicating the features of ThreadPerTaskExecutor simply because ThreadPerTaskExecutor doesn't expose the active Runnables (or active Threads) via public API [though it does expose the active Threads for internal JDK use]. If ThreadPerTaskExecutor publicly exposed the active Runnables, that would provide a very elegant solution, and if it made the API that exposes the active Threads public to application code, we could probably make that work OK too. Another sounds-a-bit-crazy option is for Helidon to use its own version of ThreadPerTaskExecutor so that we don't double up on the holding/adding/removing of Http1Connection runnables, which is what the code does at the moment.
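To make the ask concrete, the kind of API being wished for could look like the following. This is purely hypothetical; no such methods exist on the JDK executor today.

```java
import java.util.Set;
import java.util.concurrent.ExecutorService;

// Purely hypothetical shape for what a thread-per-task executor could expose publicly,
// so servers would not need to duplicate the add/remove bookkeeping themselves.
interface ActiveTaskAwareExecutor extends ExecutorService {

    /** Snapshot of threads currently running a task. Hypothetical, not a real JDK method. */
    Set<Thread> activeThreads();

    /** Snapshot of Runnables currently being executed. Hypothetical, not a real JDK method. */
    Set<Runnable> activeTasks();
}
```

With something along those lines, ServerListener.stop() could interrupt only the threads whose task is still waiting for a prologue instead of re-implementing the executor's tracking.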
FYI I have a discussion going on loom-dev and I've described those desired changes as I see them there. I have locally a modified version of ThreadPerTaskExecutor ...
Pushed a commit to https://github.com/helidon-io/helidon/pull/5748/files that changes it to use a modified/hacked ThreadPerTaskExecutor. The main part of this change is at: https://github.com/helidon-io/helidon/pull/5748/files#diff-7e338ad5380cf95ade56f55831012583e5e99387c505ebc56260f530bd3371ccR97-R110
From loom-dev Ron:
My interpretation of Ron's point here is that ... for Helidon HTTP 1.1 keep-alive connections, currently the same thread is being used both for waiting on the prologue and for processing the request. If these were in fact 2 separate Threads in 2 separate ExecutorServices, that would enable a more forceful shutdown of the threads that are waiting at readPrologue().

Maybe something like: split the readerExecutor ExecutorService in ServerListener into 2 ExecutorServices like:

    ExecutorService readerExecutor; // waits on readPrologue() and then hands off work to workerExecutor
    ExecutorService workerExecutor; // given the prologue, processes the request

Change the ...
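A rough sketch of that split (hypothetical, not actual Helidon code): the reader task blocks on the prologue and hands the rest of the request handling to the second executor, so at shutdown the reader executor can be stopped forcefully while the worker executor drains gracefully.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the two-executor idea; ConnectionIo, Prologue, readPrologue and process
// are stand-ins for the real Helidon types and methods.
class TwoExecutorSketch {

    interface ConnectionIo {
        Prologue readPrologue();        // blocks until the first line of the next request arrives
        void process(Prologue prologue);
    }

    record Prologue(String requestLine) {}

    private final ExecutorService readerExecutor = Executors.newVirtualThreadPerTaskExecutor();
    private final ExecutorService workerExecutor = Executors.newVirtualThreadPerTaskExecutor();

    void handleConnection(ConnectionIo io) {
        readerExecutor.execute(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                Prologue prologue = io.readPrologue();               // waiting here is always interruptable
                workerExecutor.execute(() -> io.process(prologue));  // request processing runs elsewhere
            }
        });
    }

    void stop() throws InterruptedException {
        // reader threads only ever wait for a prologue, so interrupting them is safe
        readerExecutor.shutdownNow();
        // worker threads may be mid-request, so give them time to complete
        workerExecutor.shutdown();
        workerExecutor.awaitTermination(500, TimeUnit.MILLISECONDS);
    }
}
```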
Thanks for the detailed description and analysis of the shutdown issue. Only just going over this issue and the proposed PR. A modified executor solution isn't ideal IMO, especially if it duplicates a lot of the JDK library code. Have you explored a solution where the ThreadFactory is used to track the active threads?
I have not. See my thinking below ... TLDR: I don't think it helps.
Agreed, not ideal. Still not crazy IF it helps to prompt some discussion and change in JDK library code to assist our case here ... and maybe not crazy IF it's the difference between beating Netty or not and IF we only need to maintain it for a year or so.
FWIW, as I see it shutdown is a one-off task, and in improving this we are looking at options that incur extra per-request cost. Nima is likely especially sensitive to any extra per-request cost, since we often see millions of requests for each one-off shutdown. Said another way, we want nice elegant solutions, but Nima also likely has very high performance goals in play. Approx costs of the options looked at so far: ...

So option 3 is what seems to be suggested by the likes of Ron, but at first glance it looks like the most expensive option for Nima to adopt, with some unknown extra costs that would ideally be measured but take effort to assess properly. We are doing all this because the JDK ThreadPerTaskExecutor only exposes its active underlying virtual threads internally to other JDK code (as I see it).

I don't see how ThreadFactory helps, in that it is unaware of the removal of threads at task completion. At shutdown we ideally want access to the threads and associated tasks left in ThreadPerTaskExecutor. The ThreadPerTaskExecutor adds and removes Threads, and the ThreadFactory isn't aware of the removals.

Another alternative approach is to maintain AtomicLong counter(s) that hold the "number of active requests" (and on shutdown wait until active requests is zero before shutting down the ExecutorService). At this stage I don't think this is better than the other 3 options I've looked at.
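For completeness, a minimal sketch of that counter alternative (illustrative only): increment and decrement around request processing, and at shutdown wait for the count to reach zero before stopping the ExecutorService.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the "active request counter" alternative mentioned above.
class ActiveRequestCounterSketch {

    private final AtomicLong activeRequests = new AtomicLong();
    private final ExecutorService executor;

    ActiveRequestCounterSketch(ExecutorService executor) {
        this.executor = executor;
    }

    // wrap each request so the counter tracks in-flight work (two atomic ops per request)
    void processRequest(Runnable handler) {
        activeRequests.incrementAndGet();
        try {
            handler.run();
        } finally {
            activeRequests.decrementAndGet();
        }
    }

    // on shutdown: wait (with a deadline) for in-flight requests to drain, then stop the executor
    void stop(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (activeRequests.get() > 0 && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
        executor.shutdownNow();
        executor.awaitTermination(100, TimeUnit.MILLISECONDS);
    }
}
```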
Option 3 does seem to be the cleanest. It would be interesting to measure its cost.
I was thinking about doing additional wrapping of the input runnables/callables and looking for their completion, but this does not work because the ... In any case, as you stated, a solution to the problem available from the JDK would be ideal, such as the interface you proposed.
Well, it's all in the eye of the beholder of course, but I had an attempt at doing Option 3 and I personally didn't like the result; I much preferred the existing code in terms of readability and understanding. Hopefully someone else does a better job of it :)
… (#5876)

* Proposal to implement a more efficient webserver shutdown strategy. For more information see issue #5717. This approach is based on that described in PR #5748. It includes a simple executor SPI for tasks that are immediately interruptable, such as those waiting to read connection preambles. Some new tests verify faster webserver shutdown.
* Updated copyright year.
* Removed print statements and fixed logging.
* Fixed checkstyle.
* Avoid using @ServerTest when stopping the server manually.
* Dropped SPI for tasks, moved interfaces into webserver package.
* Fixed checkstyle.
* Make interface package private.

Signed-off-by: Santiago Pericasgeertsen <[email protected]>
Environment Details
Problem Description
1. WebServer#stop() when there have been NO requests takes ~500ms (the value of ServerListener.EXECUTOR_SHUTDOWN_MILLIS)
2. WebServer#stop() on an idle server that has no active requests takes ~500ms

Background wrt the Reactive Netty server: #2087
Steps to reproduce
PR #5748 has failing tests for 1) and 2). Edit: this PR now also includes 2 ways of addressing the issue: a modified JDK ThreadPerTaskExecutor and an application code approach (which I don't think is very nice).
Edit: Old PR was #5718
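As a rough reproduction sketch (the builder calls below are from memory of the Nima ALPHA API and may not match exactly), the symptom is simply that stop() on a freshly started, idle server takes roughly EXECUTOR_SHUTDOWN_MILLIS:

```java
import io.helidon.nima.webserver.WebServer;

// Rough reproduction sketch; the exact Nima ALPHA builder API may differ from this.
public class StopTimingSketch {
    public static void main(String[] args) {
        WebServer server = WebServer.builder()
                .port(0)        // any free port; no routes or requests are needed to see the delay
                .build()
                .start();

        long start = System.nanoTime();
        server.stop();          // with no active requests this should return almost immediately
        long tookMillis = (System.nanoTime() - start) / 1_000_000;

        System.out.println("WebServer.stop() took " + tookMillis + "ms"); // observed ~500ms before the fix
    }
}
```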
Motivation
Fast graceful shutdown, as I see it, is critical to the performance of Kubernetes rollouts. That is, a K8s rollout is generally only as fast as startup time + shutdown time, so if server shutdown is slow then rollout is slow. The impact is greater when a K8s service has many pods/replicas, which makes fast graceful shutdown very important.