TTL on connections #3374

Closed

juretta opened this issue May 24, 2017 · 10 comments
Labels
needs info More information needed from reporter

Comments

@juretta

juretta commented May 24, 2017

What kind of issue is this?

  • Feature Request. Start by telling us what problem you’re trying to solve. Often a solution
    already exists! Don’t send pull requests to implement new features without first getting our
    support. Sometimes we leave features out on purpose to keep the project small.

We are using OkHttp as an HTTP client as part of a multi-service application in AWS.
Our services are behind ELBs and we use a connection pool per client (with a client per downstream service). ELBs use DNS round robin with a TTL of 60 seconds. Our idle timeout is set to 50 seconds.

We are seeing persistent connections that stay in use for a long period of time. In particular, we observe errors when a downstream service/stack gets replaced (which replaces the ELB nodes) and those connections become stale.

In addition to errors, we also don't fully benefit when ELB nodes are scaled out: connections in the pool live significantly longer than the DNS TTL, so the new nodes are never used.

This is unlikely to be a problem when OkHttp is used in mobile clients, where many clients connect to comparatively few load balancer nodes, but in our case a much smaller set of client nodes connects to the ELB/LB nodes.

We have a couple of options here (as far as I can tell):

  • Don't use ConnectionPools/persistent connections (setting maxIdleConnections seems to mostly do this)
  • Manage our own implementation that periodically evicts connections regardless of their idle time.

The former might be OK for the parts that are not directly on the user request path, but it would be unfortunate if we couldn't use connection pooling at all.
The latter is the approach we have implemented, but it has some gremlins: care needs to be taken to properly manage the lifetimes of the client, its pool, and the scheduled task that periodically evicts connections.
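
To illustrate the former option, here is a minimal sketch (assuming OkHttp 3.x): with maxIdleConnections set to 0 the pool closes sockets as soon as they become idle, which mostly disables connection reuse for this client.

```java
import java.util.concurrent.TimeUnit;
import okhttp3.ConnectionPool;
import okhttp3.OkHttpClient;

// Keep zero idle connections; the keep-alive duration is effectively moot
// because any connection is evicted as soon as it becomes idle.
OkHttpClient client = new OkHttpClient.Builder()
    .connectionPool(new ConnectionPool(0, 1, TimeUnit.SECONDS))
    .build();
```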

Is there interest in allowing clients to specify a connection time-to-live in addition to the idle timeout? Managing this as part of the client or (more likely) the pool seems much better in terms of properly managing the lifecycle of the pool and its management thread.

Or are we simply misusing OkHttp?

@swankjesse
Collaborator

Just to confirm I understand: because the network topology is dynamic, you’d like to impose a maximum lifetime on a connection?

We can do that, though it’s not a great fit for the problem. Whatever time period we use will be too long when you want to change your topology, and too short when you don’t. My experience is that whenever we add configuration options like this, many users want to use them, and that would be problematic.

A more natural fit would be for the client or server that’s being taken out of the network to terminate manually. For webservers this is adding Connection: close headers on responses once they’re being cycled out. For clients this is manually evicting connections from the pool.
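
As a small illustration of the client-side half of that (assuming an OkHttp 3.x client named `client`), eviction is a one-liner that can be triggered by whatever signal tells you the downstream stack is being cycled out:

```java
// Drop every idle pooled socket; subsequent requests establish fresh
// connections (and therefore pick up new load balancer nodes).
client.connectionPool().evictAll();
```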

@juretta
Author

juretta commented Sep 12, 2017

Just to confirm I understand: because the network topology is dynamic, you’d like to impose a maximum lifetime on a connection?

As one option yes. Another option would be to monitor DNS changes and only recycle connections if DNS records actually change.

We ended up creating our own Call.Factory that uses a scheduled task to evict the pool on a fixed schedule (currently 5 minutes). This effectively imposes an upper bound on the connection TTL (modulo connections that are currently active and won't be evicted).
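
For anyone looking at something similar, here is a rough sketch of that idea (class name and the 5-minute period are just illustrative, assuming OkHttp 3.x):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import okhttp3.Call;
import okhttp3.OkHttpClient;
import okhttp3.Request;

// Wraps an OkHttpClient and evicts the whole connection pool on a fixed
// schedule, which effectively caps connection lifetime, modulo connections
// that are in use at eviction time.
final class EvictingCallFactory implements Call.Factory {
  private final OkHttpClient client;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  EvictingCallFactory(OkHttpClient client) {
    this.client = client;
    scheduler.scheduleAtFixedRate(
        () -> client.connectionPool().evictAll(), 5, 5, TimeUnit.MINUTES);
  }

  @Override public Call newCall(Request request) {
    return client.newCall(request);
  }

  // Must be called when the client is retired, otherwise the scheduler thread
  // (one of the "gremlins" mentioned above) keeps running.
  void shutdown() {
    scheduler.shutdownNow();
    client.connectionPool().evictAll();
  }
}
```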

Instead of doing this from the consumer side, I think it would be beneficial if okhttp3.ConnectionPool#cleanup would not only clean up idle connections but also evict any connection that isn't currently in use and has exceeded a connection TTL configured on the pool.

@swankjesse
Collaborator

Cool. I think that’s a good fix.

@mpazik

mpazik commented Nov 13, 2017

My experience is that whenever we add configuration options like this many users want to use them, and this would be problematic.

@swankjesse Why is it problematic? If a particular configuration option is heavily used, isn't that an indication that it's needed?

@swankjesse
Collaborator

That’s not how HTTP works. Clients aren’t supposed to have special configuration for their endpoints. In particular, browsers don’t do this.

Instead, the solution must be server-side.

@mpazik

mpazik commented Nov 13, 2017

Thanks @swankjesse, that's a good point.

@ibaneeez

ibaneeez commented Jan 20, 2023

I have one more use case, which in my opinion is not solved by server-side connection termination: disaster recovery. This is not AWS related, please bear in mind. If a load balancer fails (stops listening on its port), we fail over by switching the hostname resolution in DNS. Since this is not predictable, we can't expect the failing load balancer to terminate connections.
How should we approach this?

@int0x03

int0x03 commented Feb 8, 2023

This feature already exists for the Apache HttpClient connection pool; see the timeToLive parameter of this class: https://hc.apache.org/httpcomponents-client-4.5.x/current/httpclient/apidocs/org/apache/http/impl/conn/PoolingHttpClientConnectionManager.html
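
For comparison, a minimal sketch of that knob via the 4.5.x builder (the 5-minute value is only an example):

```java
import java.util.concurrent.TimeUnit;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

// Pooled connections are closed once they are 5 minutes old, no matter how
// recently they were used, so the next request reconnects (and re-resolves).
CloseableHttpClient httpClient = HttpClients.custom()
    .setConnectionTimeToLive(5, TimeUnit.MINUTES)
    .build();
```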

What's the issue:
We have three data centers (dc-a, dc-b, dc-c), and svc1 calls svc2. Normally svc1 in dc-a calls the downstream svc2 in dc-a, and likewise for the other two data centers. Sometimes we want to keep svc2 traffic out of dc-c, so we remove the dc-c IP of svc2 from DNS. A fresh DNS lookup no longer returns the dc-c IP for svc2, but svc1 in dc-c still calls svc2 in dc-c because it keeps reusing pooled connections to that IP.

Expected:
Force-close a pooled connection once it has been used N times or has been alive for a few minutes.

@daniloesk

Linking to #190 in case anyone is looking for OkHttp's timeToLive (TTL), max age, or max lifetime, a.k.a. keepAliveDuration.
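
For reference, keepAliveDuration is configured on the ConnectionPool (OkHttp 3.x signature shown; the values are just an example). Note that it only bounds how long an idle connection may stay pooled, not the total lifetime of a connection that keeps getting reused:

```java
import java.util.concurrent.TimeUnit;
import okhttp3.ConnectionPool;
import okhttp3.OkHttpClient;

// Keep up to 5 idle connections, each for at most 30 seconds of idleness.
OkHttpClient client = new OkHttpClient.Builder()
    .connectionPool(new ConnectionPool(5, 30, TimeUnit.SECONDS))
    .build();
```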

@andrebrait

Leaving a comment for those who might attempt the Connection: close approach: it seems OkHttp does close the connection properly, but it does not re-run name resolution, so if you're trying to use this in conjunction with changes in DNS records, it'll likely just retry connecting to the same IPs it got the first time it resolved the address.

Putting this here for documentation, as this issue seems to be one of the top results on search engines.

Relates to #4530
