bug:Problems that may be caused by session cache and keepalive_reques #7237

Closed
hahayyum opened this issue Jun 12, 2022 · 13 comments
@hahayyum

Current Behavior

The route is accessed over HTTPS. When more than 100 requests are sent over a single long-lived (keep-alive) connection, the connection is reset and subsequent requests are handled incorrectly.

The default value of keepalive_requests is 100.

When the number of requests on the connection is within 100, the behavior is correct and the log looks like this:

10.23.34.11 - - [11/Jun/2022:18:29:23 +0000] 192.168.249.2:6443 "GET /test HTTP/1.1" 200 315 0.015 "-" "Java Client" 192.168.244.3:443 200 0.004 "https://192.168.249.2:6443"

When the number of requests exceeds 100, the behavior is wrong and the log looks like this:

10.23.34.11 - - [11/Jun/2022:18:29:23 +0000] 192.168.249.2:6443 "GET /test HTTP/1.1" 200 315 0.015 "-" "Java Client" 192.168.244.5:8080 200 0.004 "http://192.168.249.2:6443"

Analysis summary:
The logs show that the HTTPS request still arrives, but it is matched to a different route and proxied to the upstream over HTTP. On inspection, the "ctx" context is nil.

Guess:
The connection is reset when the 100-request cap is reached, and the failure is then triggered by a bug in either the session cache or the connection reset handling.

Expected Behavior

When the 100-request limit is reached, the connection should be reset cleanly and subsequent requests should be handled normally.

Error Logs

see description above

Steps to Reproduce

see description above

Environment

  • APISIX version 2.7
@tokers
Contributor

tokers commented Jun 12, 2022

@hahayyum Have you tried a newer APISIX version? 2.7 is too old.

@hahayyum
Author

@tokers The problem still occurs with version 2.12, and the changelog has no record of a relevant change, so I think it may still exist in later versions. I would prefer to understand the cause before making changes, because APISIX is already deployed in our production environment and upgrading is not very convenient.

@tokers
Contributor

tokers commented Jun 13, 2022

@hahayyum OK, could you give the steps to reproduce (including the route, upstream, and other core objects), so that we can reproduce it locally?

@hahayyum
Author

@tokers

curl http://apisix-admin.1-default-ns.svc.cluster.local:9180/apisix/admin/ssl/11111 -X PUT -H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -d '
{
  "key": "key",
  "cert": "cert",
  "snis": ["test.com"]
}'

curl http://apisix-admin.1-default-ns.svc.cluster.local:9180/apisix/admin/routes/testapisix -X PUT -H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -d '
{
  "priority": 0,
  "desc": "",
  "methods": [
    "GET",
    "POST",
    "PUT",
    "DELETE",
    "PATCH",
    "HEAD",
    "OPTIONS",
    "CONNECT",
    "TRACE"
  ],
  "hosts": [
    "test.com"
  ],
  "upstream": {
    "pass_host": "pass",
    "type": "roundrobin",
    "nodes": [
      {
        "priority": 0,
        "weight": 100,
        "host": "kubernetes.default",
        "port": 443
      }
    ],
    "scheme": "https",
    "hash_on": "vars",
    "timeout": {
      "send": 6,
      "connect": 6,
      "read": 60
    }
  },
  "name": "test",
  "uris": [
    "/test"
  ],
  "labels": {},
  "status": 1
}'

The certificate, key, and upstream can be arbitrary, as long as the route can be accessed over HTTPS. We use Java OkHttp to establish an HTTP/1.1 connection and access /test. We use the Kubernetes API server as the back-end service, but other back ends can be used as well.
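
For anyone without the Java client at hand, here is a rough Python stand-in for the reproduction, offered as a sketch rather than the reporter's actual code. It sends more than 100 requests on one keep-alive TLS connection, reconnects when the gateway answers with "Connection: close", and offers the previous TLS session on reconnect. The names gateway_ip, port, sni, and total are hypothetical parameters, the Host header is assumed to equal the SNI, and certificate verification is disabled on the assumption that a throwaway test certificate is in use.

```python
import socket
import ssl
from typing import Tuple


def build_request(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.1 GET that asks to keep the connection open."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: keep-alive\r\n"
        "\r\n"
    ).encode("ascii")


def read_response(stream) -> Tuple[int, bool]:
    """Read one response from a binary stream; return (status, server_will_close)."""
    status_line = stream.readline().decode("latin-1")
    if not status_line:
        raise ConnectionError("connection closed before a response arrived")
    status = int(status_line.split()[1])
    length, will_close = 0, False
    while True:
        line = stream.readline().decode("latin-1").strip()
        if not line:  # blank line ends the header section
            break
        name, _, value = line.partition(":")
        name, value = name.strip().lower(), value.strip().lower()
        if name == "content-length":
            length = int(value)
        elif name == "connection" and value == "close":
            will_close = True
    stream.read(length)  # drain the body; chunked encoding is not handled
    return status, will_close


def replay(gateway_ip: str, port: int, sni: str, path: str, total: int = 150) -> None:
    """Send `total` requests, reconnecting and offering the old TLS session."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # assumption: self-signed test certificate
    ctx.verify_mode = ssl.CERT_NONE  # do not use outside a test environment
    session = None
    sent = 0
    while sent < total:
        raw = socket.create_connection((gateway_ip, port), timeout=10)
        with ctx.wrap_socket(raw, server_hostname=sni, session=session) as tls:
            print(f"new connection, session_reused={tls.session_reused}")
            with tls.makefile("rb") as stream:
                while sent < total:
                    tls.sendall(build_request(sni, path))
                    status, will_close = read_response(stream)
                    sent += 1
                    print(sent, status)
                    session = tls.session  # remember it for the next connection
                    if will_close:
                        break


# Example call (hypothetical address): replay("10.0.0.1", 9443, "test.com", "/test")
```

If the guess in this thread is right, the status code or the access log entry should change right after the first line that prints session_reused=True.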

@hahayyum
Author

@tokers
Version 2.14.1 includes the fix "client mTLS was ignored sometimes in TLS session reuse": #6906

This fix in the latest version seems to solve the problem, but I'm not sure. Is there a more specific description of the fix, or of the scenario it addresses?

@tokers
Contributor

tokers commented Jun 14, 2022

@hahayyum I'm not sure whether this problem is related to the issue you mentioned. @tzssangglass Could you take a look? Thanks!

@tzssangglass tzssangglass added and then removed the checking label Jun 14, 2022
@tzssangglass
Member

@hahayyum I'm not sure whether this problem is related to the issue you mentioned. @tzssangglass Could you take a look? Thanks!

Hi @hahayyum, it looks like #6906 is not what fixes your problem.

When the number of requests is within 100, it is correct, the log information is as follows:

10.23.34.11 - - [11/Jun/2022:18:29:23 +0000] 192.168.249.2:6443 "GET /test HTTP/1.1" 200 315 0.015 "-" "Java Client" 192.168.244.3:443 200 0.004 "192.168.249.2:6443"

When the number of requests exceeds 100, it is an error, and the log information is as follows:

10.23.34.11 - - [11/Jun/2022:18:29:23 +0000] 192.168.249.2:6443 "GET /test HTTP/1.1" 200 315 0.015 "-" "Java Client" 192.168.244.5:8080 200 0.004 "http://192.168.249.2:6443"

The difference between these two logs is the $upstream_addr field: 192.168.244.3:443 versus 192.168.244.5:8080. See the access log format:

access_log_format: "$remote_addr - $remote_user [$time_local] $http_host \"$request\" $status $body_bytes_sent $request_time \"$http_referer\" \"$http_user_agent\" $upstream_addr $upstream_status $upstream_response_time \"$upstream_scheme://$upstream_host$upstream_uri\""

It looks like a different upstream node was selected for some reason.
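
To make this comparison repeatable, the access_log_format above can be turned into a regular expression and the two entries diffed field by field. This is only a sketch: the regex is hand-written to mirror that format string, the last quoted field is captured whole under the made-up name upstream_url, and parse and diff_fields are illustrative helper names.

```python
import re

# One named group per variable in access_log_format; the final quoted field
# ($upstream_scheme://$upstream_host$upstream_uri) is captured as upstream_url.
LOG_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'(?P<http_host>\S+) "(?P<request>[^"]*)" (?P<status>\d{3}) '
    r'(?P<body_bytes_sent>\d+) (?P<request_time>\S+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'(?P<upstream_addr>\S+) (?P<upstream_status>\S+) '
    r'(?P<upstream_response_time>\S+) "(?P<upstream_url>[^"]*)"'
)


def parse(line: str) -> dict:
    """Parse one access log line into a dict of field name -> value."""
    m = LOG_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError("line does not match access_log_format")
    return m.groupdict()


def diff_fields(first: str, second: str) -> dict:
    """Return {field: (value_in_first, value_in_second)} for differing fields."""
    a, b = parse(first), parse(second)
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


GOOD = ('10.23.34.11 - - [11/Jun/2022:18:29:23 +0000] 192.168.249.2:6443 '
        '"GET /test HTTP/1.1" 200 315 0.015 "-" "Java Client" '
        '192.168.244.3:443 200 0.004 "https://192.168.249.2:6443"')
BAD = ('10.23.34.11 - - [11/Jun/2022:18:29:23 +0000] 192.168.249.2:6443 '
       '"GET /test HTTP/1.1" 200 315 0.015 "-" "Java Client" '
       '192.168.244.5:8080 200 0.004 "http://192.168.249.2:6443"')

for name, (before, after) in diff_fields(GOOD, BAD).items():
    print(f"{name}: {before} -> {after}")
```

For the two entries as originally reported, only upstream_addr and the final scheme/host field differ, which is consistent with a different upstream being chosen.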

@tokers
Contributor

tokers commented Jun 15, 2022

@tzssangglass Maybe cross upstream connection pool?

@hahayyum
Author

@tokers @tzssangglass
Let me describe the setup again:
routeA: /test, upstream: 192.168.244.3:443, Proto: https, Host: test.com
routeB: /*, upstream: 192.168.244.5:8080, Proto: http, Host: nil
If routeB does not exist, requests return "404 Route Not Found" once more than 100 requests have been sent.
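
The observed fallback can be illustrated with a toy matcher. This is a simplified model, not APISIX's actual router: Route, uri_matches, and match are made-up names, and the assumption that the host is effectively missing at match time (host=None) is only one way to produce the symptom. The thread does not establish why routeA stops matching.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Route:
    name: str
    uri: str       # exact path, or a prefix pattern ending in "*"
    upstream: str
    scheme: str
    hosts: List[str] = field(default_factory=list)  # empty list means any host


def uri_matches(pattern: str, uri: str) -> bool:
    """Exact match, or prefix match when the pattern ends in '*'."""
    if pattern.endswith("*"):
        return uri.startswith(pattern[:-1])
    return uri == pattern


def match(routes: List[Route], host: Optional[str], uri: str) -> Optional[Route]:
    """Return the first matching route, trying host-restricted routes first."""
    ordered = sorted(routes, key=lambda r: not r.hosts)
    for route in ordered:
        if route.hosts and host not in route.hosts:
            continue  # host restriction not satisfied
        if uri_matches(route.uri, uri):
            return route
    return None  # the gateway would answer 404 Route Not Found


ROUTE_A = Route("routeA", "/test", "192.168.244.3:443", "https", ["test.com"])
ROUTE_B = Route("routeB", "/*", "192.168.244.5:8080", "http")

print(match([ROUTE_A, ROUTE_B], "test.com", "/test"))  # host known
print(match([ROUTE_A, ROUTE_B], None, "/test"))        # host lost (assumption)
print(match([ROUTE_A], None, "/test"))                 # no catch-all route
```

Under that assumption the model gives routeA while the host is known, routeB once it is lost, and no match (404) when routeB is absent, which lines up with the three behaviors described above.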

I applied a change based on the fix in #6906 and, after a simple test, the problem appears to be solved, but further testing is needed.

Guess:
After more than 100 requests, APISIX closes the connection (FIN, RST) and the client reconnects, but APISIX reuses the previous TLS session. Once ssl_session_timeout (10m) expires, requests return to normal.

After verification:
With ssl_session_timeout set to 30s, the error occurs when the client reconnects within 30s. After 30s the connection is reset and requests return to normal.
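
Written down as a tiny model, the hypothesis looks like this. It is a sketch of the guess only, not of APISIX or nginx internals: it assumes ssl_session_timeout acts as a fixed window that starts when the TLS session is first established, and predicted_behavior is a made-up helper name.

```python
def predicted_behavior(seconds_since_session_start: float,
                       ssl_session_timeout_s: float) -> str:
    """Predict what a reconnecting client sees under the session-reuse guess."""
    if seconds_since_session_start < ssl_session_timeout_s:
        # The cached session is still valid, so the reconnect resumes it.
        return "session reused: request misrouted"
    # The cached session has expired, so a full handshake takes place.
    return "full handshake: request routed normally"


# The 30s verification run and the 10m default.
for elapsed, timeout in [(10, 30), (45, 30), (300, 600)]:
    print(elapsed, timeout, predicted_behavior(elapsed, timeout))
```

The model matches the 30s test, where errors appear inside the window and stop after it, but it does not explain why a resumed session would lead to the wrong route.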

So:
I would like to know more about the fix in #6906. Is there a specific explanation or description of why it fixes the problem I encountered?

@tzssangglass
Member

I would like to know more about the fix in #6906. Is there a specific explanation or description of why it fixes the problem I encountered?

cc @spacewander PTAL

@spacewander
Member

#6906 fixes a bug in TLS session reuse. Maybe it is related to your problem.

@github-actions

github-actions bot commented Jun 1, 2023

This issue has been marked as stale due to 350 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 1, 2023
@github-actions

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

@github-actions github-actions bot closed this as not planned Jun 16, 2023