OkHttp responses not closed properly #4666

scholzj · 2022-12-12T21:06:25Z

Describe the bug

With 6.3.0 release, it seems that some OkHttp client responses are not closed properly and every 5 minutes, we get bunch of warnings about them in our logs in the Strimzi project For example:

From leader election:

2022-12-12 20:10:17 WARN  OkHttpClient:134 - A connection to https://10.96.0.1/ was leaked. Did you forget to close a response body?
java.lang.Throwable: response.body().close()
    at okhttp3.internal.platform.Platform.getStackTraceForCloseable(Platform.java:148) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
    at okhttp3.RealCall.captureCallStackTrace(RealCall.java:116) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
    at okhttp3.RealCall.enqueue(RealCall.java:125) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
    at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl.sendAsync(OkHttpClientImpl.java:266) ~[io.fabric8.kubernetes-httpclient-okhttp-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl.consumeBytesDirect(OkHttpClientImpl.java:322) ~[io.fabric8.kubernetes-httpclient-okhttp-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.http.StandardHttpClient.consumeBytes(StandardHttpClient.java:57) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.http.SendAsyncUtils.bytes(SendAsyncUtils.java:51) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.http.HttpResponse$SupportedResponses.sendAsync(HttpResponse.java:105) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.http.HttpClient.sendAsync(HttpClient.java:178) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.retryWithExponentialBackoff(OperationSupport.java:618) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:595) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:570) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleUpdate(OperationSupport.java:366) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleUpdate(BaseOperation.java:709) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.lambda$replace$0(HasMetadataOperation.java:167) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.replace(HasMetadataOperation.java:172) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.replace(HasMetadataOperation.java:113) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.replace(HasMetadataOperation.java:41) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.replace(BaseOperation.java:1077) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.replace(BaseOperation.java:93) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LeaseLock.update(LeaseLock.java:100) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.tryAcquireOrRenew(LeaderElector.java:214) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$renewWithTimeout$6(LeaderElector.java:175) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$loop$8(LeaderElector.java:258) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.lang.Thread.run(Thread.java:833) ~[?:?]

When creating a custom resource

2022-12-12 20:35:07,81611 WARN  [OkHttp ConnectionPool] OkHttpClient:134 - A connection to https://10.96.0.1/ was leaked. Did you forget to close a response body?
java.lang.Throwable: response.body().close()
    at okhttp3.internal.platform.Platform.getStackTraceForCloseable(Platform.java:148) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
    at okhttp3.RealCall.captureCallStackTrace(RealCall.java:116) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
    at okhttp3.RealCall.enqueue(RealCall.java:125) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
    at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl.sendAsync(OkHttpClientImpl.java:266) ~[io.fabric8.kubernetes-httpclient-okhttp-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl.consumeBytesDirect(OkHttpClientImpl.java:322) ~[io.fabric8.kubernetes-httpclient-okhttp-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.http.StandardHttpClient.consumeBytes(StandardHttpClient.java:57) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.http.SendAsyncUtils.bytes(SendAsyncUtils.java:51) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.http.HttpResponse$SupportedResponses.sendAsync(HttpResponse.java:105) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.http.HttpClient.sendAsync(HttpClient.java:178) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.retryWithExponentialBackoff(OperationSupport.java:618) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:595) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:570) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:554) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:347) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:704) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:93) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:42) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1107) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.strimzi.operator.topic.K8sImpl.lambda$createResource$0(K8sImpl.java:53) ~[io.strimzi.topic-operator-0.33.0-SNAPSHOT.jar:0.33.0-SNAPSHOT]
    at io.vertx.core.impl.ContextBase.lambda$null$0(ContextBase.java:137) ~[io.vertx.vertx-core-4.3.5.jar:4.3.5]
    at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:264) ~[io.vertx.vertx-core-4.3.5.jar:4.3.5]
    at io.vertx.core.impl.ContextBase.lambda$executeBlocking$1(ContextBase.java:135) ~[io.vertx.vertx-core-4.3.5.jar:4.3.5]
    at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76) ~[io.vertx.vertx-core-4.3.5.jar:4.3.5]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty.netty-common-4.1.77.Final.jar:4.1.77.Final]
    at java.lang.Thread.run(Thread.java:833) ~[?:?]

When updating CR status subresource:

2022-12-12 20:35:07,81724 WARN  [OkHttp ConnectionPool] OkHttpClient:134 - A connection to https://10.96.0.1/ was leaked. Did you forget to close a response body?
java.lang.Throwable: response.body().close()
    at okhttp3.internal.platform.Platform.getStackTraceForCloseable(Platform.java:148) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
    at okhttp3.RealCall.captureCallStackTrace(RealCall.java:116) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
    at okhttp3.RealCall.enqueue(RealCall.java:125) ~[com.squareup.okhttp3.okhttp-3.12.12.jar:?]
    at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl.sendAsync(OkHttpClientImpl.java:266) ~[io.fabric8.kubernetes-httpclient-okhttp-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl.consumeBytesDirect(OkHttpClientImpl.java:322) ~[io.fabric8.kubernetes-httpclient-okhttp-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.http.StandardHttpClient.consumeBytes(StandardHttpClient.java:57) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.http.SendAsyncUtils.bytes(SendAsyncUtils.java:51) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.http.HttpResponse$SupportedResponses.sendAsync(HttpResponse.java:105) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.http.HttpClient.sendAsync(HttpClient.java:178) ~[io.fabric8.kubernetes-client-api-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.retryWithExponentialBackoff(OperationSupport.java:618) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:595) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:570) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleUpdate(OperationSupport.java:366) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleUpdate(BaseOperation.java:709) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.lambda$replace$0(HasMetadataOperation.java:167) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.replace(HasMetadataOperation.java:172) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.replaceStatus(HasMetadataOperation.java:118) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.replaceStatus(HasMetadataOperation.java:41) ~[io.fabric8.kubernetes-client-6.3.0.jar:?]
    at io.strimzi.operator.common.operator.resource.CrdOperator.lambda$updateStatusAsync$2(CrdOperator.java:132) ~[io.strimzi.operator-common-0.33.0-SNAPSHOT.jar:0.33.0-SNAPSHOT]
    at io.vertx.core.impl.ContextBase.lambda$null$0(ContextBase.java:137) ~[io.vertx.vertx-core-4.3.5.jar:4.3.5]
    at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:264) ~[io.vertx.vertx-core-4.3.5.jar:4.3.5]
    at io.vertx.core.impl.ContextBase.lambda$executeBlocking$1(ContextBase.java:135) ~[io.vertx.vertx-core-4.3.5.jar:4.3.5]
    at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76) ~[io.vertx.vertx-core-4.3.5.jar:4.3.5]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty.netty-common-4.1.77.Final.jar:4.1.77.Final]
    at java.lang.Thread.run(Thread.java:833) ~[?:?]

There seem to be more around list and get operations as well.

This was also discussed here

Fabric8 Kubernetes Client version

6.3.0

Steps to reproduce

Just using the Fabric8 Kubernetes client. If needed, I can try to create some reproducer code snippets. But it seems to come from all kind of places. The stacktraces are in the description.

Expected behavior

No warnings about unclosed messages will appear in the logs.

Runtime

Kubernetes (vanilla)

Kubernetes API Server version

1.25.3@latest

Environment

Linux

Fabric8 Kubernetes Client Logs

See the description. I can provide more logs if needed.

Additional context

n/a

The text was updated successfully, but these errors were encountered:

manusa · 2022-12-13T05:08:16Z

:( it seems we might have introduced a regression when refactoring our OkHttp HttpClient implementation. We'll need to release a patch with a fix.

manusa · 2022-12-13T06:46:27Z

Continues discussion in #4660 (reply in thread)

I'm not sure that the handler causing the leakage is the one for the AsyncBody (sendAsync) consumption. It is reported there as a result of the OkHttp's ConnectionPool pruning, but that doesn't mean that the connection was leaked there.

The stack shown above was captured for simply reading the bytes of the response. The handler there will always read to completion, which apparently is not enough for okhttp (it is for the jdk client).

Unfortunately it doesn't appear that the integration tests are long running enough with a single kubernetesclient instance that is using okhttp to highlight this issue, so it was missed.

It does seem that reading the BufferedSource to completion deallocates the connection (RealBufferedSource.exhaused() -> StreamAllocation.streamFinished() -> StreamAllocation.deallocate() -> ConnectionPool.connectionBecameIdle()). So this would be evidence that the leakage is happening elsewhere.

I'm trying to see where else we can be leaking the connection by not closing the response.

shawkins · 2022-12-13T10:15:37Z

I'm not sure that the handler causing the leakage is the one for the AsyncBody (sendAsync) consumption

There are only 2 calls now. Either one using asyncbody, or a websocket. For there to be an issue with websockets, sendClose would not have to be called - as far as I'm aware non of that logic changed. The change wrt asyncbody is that previously the sendAsync results were directly materialized by okhttp, now they are not.

It is reported there as a result of the OkHttp's ConnectionPool pruning, but that doesn't mean that the connection was leaked there.

Isn't the reporting based upon the stack that was captured for a connection that was leaked?

It does seem that reading the BufferedSource to completion deallocates the connection (RealBufferedSource.exhaused() -> StreamAllocation.streamFinished() -> StreamAllocation.deallocate() -> ConnectionPool.connectionBecameIdle()). So this would be evidence that the leakage is happening elsewhere.

I've confirmed the fix works - that is you can easily reproduce this scenario doing repeated reads. At least with an integration test / real server (if you run against the mock server http1 will be used, I haven't confirmed if it has the same problem). What I see is that when Http2Codec.read is called under exhausted, endOfInput is only called if an IOException is thrown from the FramingSource. If it's just -1 being returned, then endOfInput is not called - so the auto close is not triggered. Some of the http1 logic may be affected as well because I see the same read method in Http1Codec$AbstractSource.read.

manusa · 2022-12-13T11:18:22Z

Isn't the reporting based upon the stack that was captured for a connection that was leaked?

The reporting just says that a connection was leaking, not which one (at least this is what I get from reading the code in ConnectionPool.

HTTP 2

I haven't checked with HTTP2, but I've validated that the connection is actually deallocated when read to exhaustion with the mock server (HTTP1).

I'll check with HTTP2 and see if I can provide a more realistic test to verify the fix.

manusa · 2022-12-13T12:45:13Z

So it's confirmed, the issue is only present when using the HTTP 2 protocol. The PR (#4665) includes a test that verifies this against a fine-tuned OkHTTP Mock Server.

### What changes were proposed in this pull request? This PR aims to upgrade K8s client to 6.3.1. I tested manually that the regression is fixed. - https://github.com/fabric8io/kubernetes-client/releases/tag/v6.3.1 - https://github.com/fabric8io/kubernetes-client/releases/tag/v6.3.0 ### Why are the changes needed? SPARK-41521 tried to use 6.3.0, but it was reverted due to fabric8io/kubernetes-client#4666. v6.3.1 fixed it. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Cis. Closes #39094 from dongjoon-hyun/SPARK-41552. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>

shawkins added a commit to shawkins/kubernetes-client that referenced this issue Dec 12, 2022

fix fabric8io#4666: addressing okhttp sources needing explicit close

8b5f8ef

shawkins mentioned this issue Dec 12, 2022

fix #4666: addressing okhttp sources needing explicit close #4665

Merged

11 tasks

shawkins self-assigned this Dec 12, 2022

shawkins added a commit to shawkins/kubernetes-client that referenced this issue Dec 12, 2022

fix fabric8io#4666: addressing okhttp sources needing explicit close

562005f

manusa added this to the 6.3.1 milestone Dec 13, 2022

manusa added the bug label Dec 13, 2022

manusa mentioned this issue Dec 13, 2022

Bump kubernetes-client-bom from 6.2.0 to 6.3.1 quarkusio/quarkus#29807

Merged

manusa self-assigned this Dec 13, 2022

manusa moved this to In Progress in Eclipse JKube Dec 13, 2022

manusa added this to Eclipse JKube Dec 13, 2022

manusa closed this as completed in #4665 Dec 13, 2022

manusa pushed a commit that referenced this issue Dec 13, 2022

fix #4666: addressing okhttp sources needing explicit close

5133244

Repository owner moved this from In Progress to Done in Eclipse JKube Dec 13, 2022

dongjoon-hyun mentioned this issue Dec 16, 2022

[SPARK-41552][BUILD][K8S] Upgrade kubernetes-client to 6.3.1 apache/spark#39094

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OkHttp responses not closed properly #4666

OkHttp responses not closed properly #4666

scholzj commented Dec 12, 2022

manusa commented Dec 13, 2022

manusa commented Dec 13, 2022 •

edited

Loading

shawkins commented Dec 13, 2022

manusa commented Dec 13, 2022 •

edited

Loading

manusa commented Dec 13, 2022

OkHttp responses not closed properly #4666

OkHttp responses not closed properly #4666

Comments

scholzj commented Dec 12, 2022

Describe the bug

Fabric8 Kubernetes Client version

Steps to reproduce

Expected behavior

Runtime

Kubernetes API Server version

Environment

Fabric8 Kubernetes Client Logs

Additional context

manusa commented Dec 13, 2022

manusa commented Dec 13, 2022 • edited Loading

shawkins commented Dec 13, 2022

manusa commented Dec 13, 2022 • edited Loading

manusa commented Dec 13, 2022

manusa commented Dec 13, 2022 •

edited

Loading

manusa commented Dec 13, 2022 •

edited

Loading