
HTTP2 doesn't retry the first stream on REFUSED_STREAM #44669

Closed
BrennanConroy opened this issue Nov 13, 2020 · 10 comments · Fixed by #48758

@BrennanConroy (Member) commented Nov 13, 2020

We have an HttpClient+Kestrel test that checks that the concurrent stream limit prevents requests from reaching the application delegate once the limit is hit.

The test is flaky. After investigating, it looks like the following happens: if a batch of streams is sent in parallel when the connection starts, and the stream limit is lower than the batch size, there is a race where the request that creates the connection is not the first stream actually sent on it. If that request ends up as, say, the 6th stream against a server limit of 5, it can get the REFUSED_STREAM response and then skip the retry logic, because it has the isNewConnection value set to true:

catch (HttpRequestException e) when (!isNewConnection && e.AllowRetry == RequestRetryType.RetryOnSameOrNextProxy)

If the SETTINGS frame (which carries the server's stream limit) has already been received, everything works fine, but because there is a big batch of requests at the beginning, all the streams can be in flight before the SETTINGS frame arrives.

Test issue dotnet/aspnetcore#27379

cc @Tratcher and @halter73 who helped me understand some of the HTTP2 spec when looking at this.

Our test code looks something like this (server code is just pseudo code here):

server.MaxConcurrentStreams = 5;
var blockRequests = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
server.Run(async (request) =>
{
    // unblock after the 5th request reaches the app
    // ...
    await blockRequests.Task;
});

// Fire off more parallel requests than the server's stream limit allows.
var tasks = new List<Task<HttpResponseMessage>>(10);
for (var i = 0; i < 10; i++)
{
    var requestTask = client.PostAsync(url, new StringContent("Hello World")).DefaultTimeout();
    tasks.Add(requestTask);
}

var responses = await Task.WhenAll(tasks).DefaultTimeout(); // throws sometimes

@Tratcher (Member) commented:

A) Why isn't the request that was used to create the connection the first request to be sent on that connection? I'd expect it to have priority.
B) Why can't the first request be retried? Because it's assumed to have failed for some more fatal reason like DNS?

@geoffkizer (Contributor) commented Nov 14, 2020

> Why isn't the request that was used to create the connection the first request to be sent on that connection? I'd expect it to have priority.

As soon as the connection is established, we make it available for use by any requests. So while the request that caused the connection to be created will immediately try to use it, other requests may also be racing to use it and there's no guarantee that the original request that created the connection will actually be the first one to send a request on it.

> Why can't the first request be retried? Because it's assumed to have failed for some more fatal reason like DNS?

The reasoning here is that we don't want to get into an infinite retry loop. So if the first request on the connection fails, we don't retry it -- otherwise we could end up just trying to create new connections endlessly.
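
For context, here is a minimal sketch of the loop shape being described. This is not the actual SocketsHttpHandler source; GetOrCreateConnectionAsync and connection.SendAsync are hypothetical stand-ins for the pool's connection lookup and send path:

// Sketch: the pool keeps resubmitting the request until it succeeds or the
// exception filter says the failure is not retryable.
while (true)
{
    var (connection, isNewConnection) = await GetOrCreateConnectionAsync(request, cancellationToken);
    try
    {
        return await connection.SendAsync(request, cancellationToken);
    }
    catch (HttpRequestException e) when (!isNewConnection && e.AllowRetry == RequestRetryType.RetryOnSameOrNextProxy)
    {
        // Only requests that did NOT create the connection are retried. That guard is
        // what prevents endless connection creation, but it is also what swallows the
        // retry for the refused request in this issue.
    }
}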

@geoffkizer (Contributor) commented:

That said, it does seem like the combination of all these factors leads to a situation where a REFUSED_STREAM is not retried even though it should be. Not sure how to fix that. Suggestions welcome.

@karelz (Member) commented Dec 2, 2020

Triage: We should verify that the 1st request on a new HTTP connection is assigned to stream 1.
If it is, then this is super low priority, as the failure should be fairly rare and exceptional.

This might explain several issues during our Stress testing.

karelz added the bug label and removed the untriaged label on Dec 2, 2020
karelz added this to the 6.0.0 milestone on Dec 2, 2020
@ManickaP (Member) commented Dec 2, 2020

I took a quick look at the code, and it can happen. We construct the connection, store it in _http2Connections, and only then initiate the request itself. The streamId is assigned in SendHeadersAsync, which happens further down the line, so other requests might have been assigned lower stream ids in the meantime.
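
Sketched as a timeline (helper names and calls here are illustrative, just to show the window):

// 1. The connection is created and published to the pool:
Http2Connection connection = await EstablishHttp2ConnectionAsync(cancellationToken); // hypothetical helper
_http2Connections.Add(connection);

// 2. The connection is now visible to every queued request; any of them can reach
//    SendHeadersAsync first and take stream ids 1, 3, 5, ...

// 3. Only later does the originating request get its own stream id assigned:
HttpResponseMessage response = await connection.SendAsync(originalRequest, cancellationToken); // may be stream 7, 9, ...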

@geoffkizer (Contributor) commented:

I think we should change the retry logic so that we only disallow retrying the request on stream 1 (for HTTP2). This would fix the original issue here while still addressing the "endless loop" problem.

That said, refactoring the logic around isNewConnection may be a little tricky.

@geoffkizer (Contributor) commented:

> That said, refactoring the logic around isNewConnection may be a little tricky.

Actually I think it's relatively straightforward. In HttpConnectionPool, we should always treat an HTTP2 (or HTTP3?) connection as a new connection for retry purposes -- i.e. isNewConnection is always true. But in the HTTP2 OnReset logic, where we determine whether a request can be retried or not, we should just disable this for StreamId == 1.
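
A rough sketch of what that check might look like in the HTTP/2 reset handling (method and enum names here are illustrative placeholders, not the actual internals):

void OnReset(int streamId, Http2ProtocolErrorCode errorCode)
{
    if (errorCode == Http2ProtocolErrorCode.RefusedStream && streamId != 1)
    {
        // Refused streams other than the very first one are safe to retry:
        // surface a retryable failure (AllowRetry == RetryOnSameOrNextProxy).
        FailRequestAsRetryable();
    }
    else
    {
        // Stream 1 keeps the old behavior: fail hard, preserving the protection
        // against an endless create-connection / refuse-stream loop.
        FailRequest();
    }
}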

@geoffkizer (Contributor) commented:

I also wonder if we should additionally just set a hard limit for retry attempts, say 5. We could make this configurable if desired.

As things stand, a badly behaved HTTP2 server could still cause us to loop infinitely if it just keeps resetting new streams that should be valid under the stream limit.

This would allow us to just rip out the isNewConnection stuff entirely, I think, which would be nice. And a hard limit ensures we have a backstop for any server shenanigans, HTTP2 or otherwise.
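
A backstop like that could be as simple as the following sketch (MaxRetryCount and SendOnPooledConnectionAsync are hypothetical, not existing APIs):

const int MaxRetryCount = 5; // hypothetical knob; could be made configurable

for (int attempt = 0; ; attempt++)
{
    try
    {
        return await SendOnPooledConnectionAsync(request, cancellationToken);
    }
    catch (HttpRequestException e) when (e.AllowRetry == RequestRetryType.RetryOnSameOrNextProxy
                                         && attempt < MaxRetryCount)
    {
        // Retry on another (possibly new) connection, but never more than
        // MaxRetryCount times, so a misbehaving server cannot loop us forever.
    }
}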

@geoffkizer (Contributor) commented:

@scalablecory Any thoughts here?
