TTL on connections #3374

Closed

juretta opened this issue May 24, 2017 · 10 comments
Labels
needs info More information needed from reporter

Comments

@juretta

juretta commented May 24, 2017

What kind of issue is this?

  • Feature Request. Start by telling us what problem you’re trying to solve. Often a solution
    already exists! Don’t send pull requests to implement new features without first getting our
    support. Sometimes we leave features out on purpose to keep the project small.

We are using OkHttp as an HTTP client as part of a multi-service application in AWS.
Our services are behind ELBs and we use a connection pool per client (with a client per downstream service). ELBs use DNS round robin with a TTL of 60 seconds. Our idle timeout is set to 50 seconds.

We are seeing persistent connections that stay in use for a long period of time. In particular, we observe errors when a downstream service/stack gets replaced (which replaces the ELB nodes) and those connections become stale.

In addition to errors, we also don't fully benefit when ELB nodes are scaled out: connections in the pool live significantly longer than the DNS TTL, so the new nodes are never used.

This is unlikely to be a problem when OkHttp is used in mobile clients, where many clients connect to comparatively few load balancer nodes, but in our case a much smaller set of client nodes connects to the ELB/LB nodes.

We have a couple of options here (as far as I can tell):

  • Don't use ConnectionPools/persistent connections (setting maxIdleConnections seems to mostly do this)
  • Manage our own implementation that periodically evicts connections regardless of their idle time.

The former might be OK for the parts that are not directly on the user request path, but it would be unfortunate if we couldn't use connection pooling at all.
The latter is the approach we have implemented, but it has some gremlins: care needs to be taken to properly manage the lifetimes of the client, its pool, and the scheduled task that periodically evicts connections.
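
To illustrate the former option, here is a minimal sketch (assuming OkHttp 3.x): with maxIdleConnections set to 0 the pool closes sockets as soon as they become idle, which mostly disables connection reuse for this client.

```java
import java.util.concurrent.TimeUnit;
import okhttp3.ConnectionPool;
import okhttp3.OkHttpClient;

// Keep zero idle connections; the keep-alive duration is effectively moot
// because any connection is evicted as soon as it becomes idle.
OkHttpClient client = new OkHttpClient.Builder()
    .connectionPool(new ConnectionPool(0, 1, TimeUnit.SECONDS))
    .build();
```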

Is there interest in allowing clients to specify a connection time-to-live in addition to the idle timeout? Managing this as part of the client or (more likely) the pool seems much better in terms of properly managing the lifecycle of the pool and its management thread.

Or are we simply misusing OkHttp?

@swankjesse
Collaborator

Just to confirm I understand: because the network topology is dynamic, you’d like to impose a maximum lifetime on a connection?

We can do that, though it’s not a great fit for the problem. Whatever time period we use will be too long when you want to change your topology, and too short when you don’t. My experience is that whenever we add configuration options like this, many users want to use them, and that would be problematic.

A more natural fit would be for the client or server that’s being taken out of the network to terminate manually. For webservers this is adding Connection: close headers on responses once they’re being cycled out. For clients this is manually evicting connections from the pool.
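
As a small illustration of the client-side half of that (assuming an OkHttp 3.x client named `client`), eviction is a one-liner that can be triggered by whatever signal tells you the downstream stack is being cycled out:

```java
// Drop every idle pooled socket; subsequent requests establish fresh
// connections (and therefore pick up new load balancer nodes).
client.connectionPool().evictAll();
```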

@juretta
Author

juretta commented Sep 12, 2017

Just to confirm I understand: because the network topology is dynamic, you’d like to impose a maximum lifetime on a connection?

As one option yes. Another option would be to monitor DNS changes and only recycle connections if DNS records actually change.

We ended up creating our own Call.Factory that uses a scheduled task to evict the pool on a fixed schedule (currently 5 minutes). This effectively imposes an upper bound on the connection TTL (modulo connections that are currently active and won't be evicted).
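
For anyone looking at something similar, here is a rough sketch of that idea (class name and the 5-minute period are just illustrative, assuming OkHttp 3.x):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import okhttp3.Call;
import okhttp3.OkHttpClient;
import okhttp3.Request;

// Wraps an OkHttpClient and evicts the whole connection pool on a fixed
// schedule, which effectively caps connection lifetime, modulo connections
// that are in use at eviction time.
final class EvictingCallFactory implements Call.Factory {
  private final OkHttpClient client;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  EvictingCallFactory(OkHttpClient client) {
    this.client = client;
    scheduler.scheduleAtFixedRate(
        () -> client.connectionPool().evictAll(), 5, 5, TimeUnit.MINUTES);
  }

  @Override public Call newCall(Request request) {
    return client.newCall(request);
  }

  // Must be called when the client is retired, otherwise the scheduler thread
  // (one of the "gremlins" mentioned above) keeps running.
  void shutdown() {
    scheduler.shutdownNow();
    client.connectionPool().evictAll();
  }
}
```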

Instead of doing this from the consumer side, I think it would be beneficial if okhttp3.ConnectionPool#cleanup would not only clean up idle connections but also evict any connection that isn't currently in use and has exceeded a connection TTL configured on the pool.

@swankjesse
Collaborator

Cool. I think that’s a good fix.

@mpazik

mpazik commented Nov 13, 2017

My experience is that whenever we add configuration options like this many users want to use them, and this would be problematic.

@swankjesse Why is it problematic? If a particular configuration option is heavily used, isn't that an indication that it's needed?

@swankjesse
Collaborator

That’s not how HTTP works. Clients aren’t supposed to have special configuration for their endpoints. In particular, browsers don’t do this.

Instead, the solution must be server-side.

@mpazik

mpazik commented Nov 13, 2017

Thanks @swankjesse, that's a good point.

@ibaneeez

ibaneeez commented Jan 20, 2023

I have one more use case, which in my opinion is not solved by server-side connection termination: disaster recovery. This is not AWS related, please bear in mind. If a load balancer fails (stops listening on its port), we fail over by switching the hostname resolution in DNS. Since this is not predictable, we can't expect the failing load balancer to terminate connections.
How should we approach this?

@int0x03

int0x03 commented Feb 8, 2023

This feature already exists for the Apache HttpClient connection pool; see the timeToLive parameter of this class: https://hc.apache.org/httpcomponents-client-4.5.x/current/httpclient/apidocs/org/apache/http/impl/conn/PoolingHttpClientConnectionManager.html
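
For comparison, a minimal sketch of that knob via the 4.5.x builder (the 5-minute value is only an example):

```java
import java.util.concurrent.TimeUnit;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

// Pooled connections are closed once they are 5 minutes old, no matter how
// recently they were used, so the next request reconnects (and re-resolves).
CloseableHttpClient httpClient = HttpClients.custom()
    .setConnectionTimeToLive(5, TimeUnit.MINUTES)
    .build();
```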

What's the issue:
We have three data centers (dc-a, dc-b, dc-c), and svc1 calls svc2. Normally svc1 in dc-a calls the downstream svc2 in dc-a, and likewise for the other two data centers. Sometimes we want to keep svc2 traffic out of dc-c, so we remove the dc-c IP of svc2 from DNS. A fresh DNS lookup no longer returns the dc-c IP for svc2, but svc1 in dc-c still calls svc2 in dc-c because it keeps reusing pooled connections to that IP.

Expected:
Force-close a pooled connection once it has been used N times or has been alive for a few minutes.

@daniloesk

Linking to #190 in case anyone is looking for OkHttp's timeToLive (TTL), max age, or max lifetime, a.k.a. keepAliveDuration.
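
For reference, keepAliveDuration is configured on the ConnectionPool (OkHttp 3.x signature shown; the values are just an example). Note that it only bounds how long an idle connection may stay pooled, not the total lifetime of a connection that keeps getting reused:

```java
import java.util.concurrent.TimeUnit;
import okhttp3.ConnectionPool;
import okhttp3.OkHttpClient;

// Keep up to 5 idle connections, each for at most 30 seconds of idleness.
OkHttpClient client = new OkHttpClient.Builder()
    .connectionPool(new ConnectionPool(5, 30, TimeUnit.SECONDS))
    .build();
```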

@andrebrait

Leaving a comment for those who might attempt the Connection: close approach: it seems OkHttp does close the connection properly, but it does not re-run name resolution, so if you're trying to use this in conjunction with changes in DNS records, it'll likely just retry connecting to the same IPs it got the first time it resolved the address.

Putting this here for documentation, as this issue seems to be one of the top results on search engines.

Relates to #4530
