
Use of "async" calls eventually causes StackoverflowErrors due to unbounded growth of networkInterceptors in OkHttpClient #482

Closed
mattnworb opened this issue Jan 11, 2019 · 18 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@mattnworb

mattnworb commented Jan 11, 2019

I mentioned this in #467 but I thought it was worth opening a new issue for visibility.

Using the generated "async" methods, such as CoreV1Api.listPodForAllNamespacesAsync(..), with an ApiCallback will eventually cause a StackOverflowError, because the generated client code adds a new networkInterceptor to the ApiClient's list of networkInterceptors on every single call and never cleans up the list.

For example, listPodForAllNamespacesAsync creates a ProgressListener when callback != null. When the call stack reaches listPodForAllNamespacesCall, the generated code does:

if (progressListener != null) {
    apiClient.getHttpClient().networkInterceptors().add(new com.squareup.okhttp.Interceptor() {
        ...
    });
}

networkInterceptors() is a plain java.util.List which grows without bound. OkHttpClient's "interceptor chain" recurses through every interceptor in the list, so once the list becomes long enough it will eventually cause a StackOverflowError:

 Uncaught exception in thread Thread[OkHttp Dispatcher,5,main]
 	java.lang.StackOverflowError: null
 	at io.kubernetes.client.apis.CoreV1Api$581.intercept(CoreV1Api.java:18426)
 	at com.squareup.okhttp.internal.http.HttpEngine$NetworkInterceptorChain.proceed(HttpEngine.java:695)
 	at io.kubernetes.client.apis.CoreV1Api$581.intercept(CoreV1Api.java:18426)
 	at com.squareup.okhttp.internal.http.HttpEngine$NetworkInterceptorChain.proceed(HttpEngine.java:695)
 	at io.kubernetes.client.apis.CoreV1Api$581.intercept(CoreV1Api.java:18426)
 	at com.squareup.okhttp.internal.http.HttpEngine$NetworkInterceptorChain.proceed(HttpEngine.java:695)
 	at io.kubernetes.client.apis.CoreV1Api$581.intercept(CoreV1Api.java:18426)
 	at com.squareup.okhttp.internal.http.HttpEngine$NetworkInterceptorChain.proceed(HttpEngine.java:695)
 	at io.kubernetes.client.apis.CoreV1Api$581.intercept(CoreV1Api.java:18426)
 	at com.squareup.okhttp.internal.http.HttpEngine$NetworkInterceptorChain.proceed(HttpEngine.java:695)
 	at io.kubernetes.client.apis.CoreV1Api$581.intercept(CoreV1Api.java:18426)
 	at com.squareup.okhttp.internal.http.HttpEngine$NetworkInterceptorChain.proceed(HttpEngine.java:695)
...

The upstream issue for swagger-codegen is swagger-api/swagger-codegen#7876, and a fix was proposed in swagger-api/swagger-codegen#8053, but it has received no attention for 8+ months.

As-is (along with the thread safety issues in #467), the "async" methods in the generated client code here are virtually unusable in any sort of long-lived application or production server.
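The failure mode can be reproduced without okhttp at all, since it is just unbounded growth of a shared mutable list. The sketch below is a minimal, okhttp-free simulation: the class and method names are hypothetical stand-ins for the generated code, not the real client API.

```java
import java.util.ArrayList;
import java.util.List;

public class InterceptorLeakSketch {
    // Stands in for OkHttpClient#networkInterceptors(): a plain,
    // shared, mutable list that the generated code appends to.
    static final List<Runnable> networkInterceptors = new ArrayList<>();

    // Stands in for a generated *Call method: every invocation adds a
    // new progress interceptor and nothing ever removes it.
    static void listPodForAllNamespacesCall() {
        networkInterceptors.add(() -> { /* report download progress */ });
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            listPodForAllNamespacesCall();
        }
        // After 10,000 "requests" the client carries 10,000 interceptors.
        // okhttp's NetworkInterceptorChain recurses once per interceptor,
        // which is why a long enough list blows the stack.
        System.out.println(networkInterceptors.size()); // prints 10000
    }
}
```

Because okhttp's chain makes one nested proceed() call per interceptor, the list length directly bounds the recursion depth, which is exactly what the stack trace above shows.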

@brendandburns
Contributor

Thanks for adding clarity.

We're looking to switch to the fork of the swagger code generator so that (hopefully) issues like this will get more timely attention.

@mattnworb
Author

Thanks for the update @brendandburns. Is there any place where we can follow along for that discussion and/or perhaps help out?

@ngaya-ll

ngaya-ll commented Jan 14, 2019

As a workaround, you can override the api.mustache template to comment out the code that adds the progress interceptor (here).

@wing328

wing328 commented Jan 16, 2019

For the fork, I believe @brendandburns is referring to https://github.com/OpenAPITools/openapi-generator (please refer to Q&A for the reasons behind the fork)

We've made a couple of enhancements based on @brendandburns' feedback:

We definitely welcome PRs to further improve the Java client generated by OpenAPI Generator.

@rjeberhard
Contributor

I also want to mention that even if you don't hit the StackOverflowError, this is still a memory leak. We were able to work around it by clearing networkInterceptors() before returning the ApiClient to our pool.
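A minimal sketch of that clear-on-recycle workaround, again okhttp-free: ApiClientStub and recycle are hypothetical names for illustration, not the real ApiClient API.

```java
import java.util.ArrayList;
import java.util.List;

public class ClientRecycleSketch {
    // Stand-in for an ApiClient whose underlying http client exposes a
    // mutable interceptor list.
    static class ApiClientStub {
        final List<Runnable> networkInterceptors = new ArrayList<>();
    }

    // Called before returning a client to the pool: dropping the
    // per-request interceptors keeps the list bounded.
    static void recycle(ApiClientStub client) {
        client.networkInterceptors.clear();
    }

    public static void main(String[] args) {
        ApiClientStub client = new ApiClientStub();
        client.networkInterceptors.add(() -> { }); // added by one request
        client.networkInterceptors.add(() -> { }); // added by another
        recycle(client);
        System.out.println(client.networkInterceptors.size()); // prints 0
    }
}
```

One caveat with this approach: clear() also removes any interceptors you installed deliberately (auth, logging), so those would need to be re-added after each recycle.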

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 29, 2019
@mattnworb
Author

/remove-lifecycle stale

this is definitely still an issue.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 29, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 28, 2019
@manlinl

manlinl commented Aug 15, 2019

Any update on this issue? This renders the client library useless in a production environment.

@manlinl

manlinl commented Aug 15, 2019

/remove-lifecycle stale
This is a blocker.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 15, 2019
@yue9944882
Member

we're in the process of migrating the repo to openapi-generator (#595), but OpenAPITools/openapi-generator#3191 is currently a blocker. Politely pinging @wing328

@wing328

wing328 commented Aug 15, 2019

Let me give it a try this weekend to fix the bug in OpenAPI Generator.

@yue9944882
Member

@wing328 thanks, if it works, we will pick it up after 1.16 releases, i.e. end of September.

@mattnworb
Author

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Aug 15, 2019
@rjeberhard
Contributor

If it helps, we implemented a workaround to pool ApiClients and to clear the network interceptor list on recycling: https://github.com/oracle/weblogic-kubernetes-operator/blob/master/operator/src/main/java/oracle/kubernetes/operator/helpers/ClientPool.java

@manlinl

manlinl commented Aug 15, 2019

Thanks for the update! Looking forward to the fix.

@mattnworb
Author

I believe this is still an issue after #709 and #778.

Inspecting the code in io.kubernetes:client-java:5.0.0, a call like CoreV1Api.listNamespaceAsync(...) calls into listNamespaceValidateBeforeCall which calls into listNamespaceCall, which has:

if (progressListener != null) {
    this.apiClient.getHttpClient().networkInterceptors().add(new Interceptor() {
        ...
    });
}

So the generator still produces code which adds a networkInterceptor to the httpClient on every request, and I don't see where those networkInterceptors are cleaned up. I could be missing something, though, as I'm not that familiar with openapi-generator and how it differs from the previously used swagger-codegen.

@mattnworb
Author

Oops, the commit from #778 is in io.kubernetes:client-java:7.0.0, I missed that there were versions newer than 5.0.0. Looks like the problematic behavior of adding new networkInterceptors onto the httpclient on each request no longer appears there 👍
