bug: Problems that may be caused by session cache and keepalive_requests #7237

hahayyum · 2022-06-12T04:30:51Z

Current Behavior

The route is accessed over HTTPS. When more than 100 requests are sent on one long-lived (keep-alive) connection, requests start to misbehave after the connection is reset.

The default value of keepalive_requests is 100.

When the number of requests is within 100, the behavior is correct. The log is as follows:

10.23.34.11 - - [11/Jun/2022:18:29:23 +0000] 192.168.249.2:6443 "GET /test HTTP/1.1" 200 315 0.015 "-" "Java Client" 192.168.244.3:443 200 0.004 "https://192.168.249.2:6443"

When the number of requests exceeds 100, the behavior is wrong. The log is as follows:

10.23.34.11 - - [11/Jun/2022:18:29:23 +0000] 192.168.249.2:6443 "GET /test HTTP/1.1" 200 315 0.015 "-" "Java Client" 192.168.244.5:8080 200 0.004 "http://192.168.249.2:6443"

Analysis summary:
The logs show that the HTTPS request still arrives, but it matches a different route and is proxied over HTTP. On inspection, the "ctx" context is nil.

Guess:
The connection is reset when the 100-request cap is reached, and the failure is then caused either by the session cache or by a bug in the connection reset handling.

Expected Behavior

When the 100-request limit is reached, the connection should be reset cleanly and subsequent requests should be handled normally.

Error Logs

see description above

Steps to Reproduce

see description above

Environment

APISIX version 2.7

tokers · 2022-06-12T06:43:37Z

@hahayyum Have you tried a newer APISIX version? 2.7 is too old.

hahayyum · 2022-06-12T08:00:17Z

@tokers The problem still occurs with version 2.12, and I found no related change in the changelog, so I think it may still exist in later versions. I would prefer to understand the cause before making changes: APISIX is already deployed in our production environment, so upgrading is not very convenient.

tokers · 2022-06-13T01:06:49Z

@hahayyum OK, could you give the reproduction steps (including the route, upstream, and other core objects), so that we can reproduce it locally?

hahayyum · 2022-06-14T07:36:42Z

@tokers

curl http://apisix-admin.1-default-ns.svc.cluster.local:9180/apisix/admin/ssl/11111 -X PUT -H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -d '
{
  "key": "key",
  "cert": "cert",
  "snis": ["test.com"]
}'

curl http://apisix-admin.1-default-ns.svc.cluster.local:9180/apisix/admin/routes/testapisix -X PUT -H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -d '
{
  "priority": 0,
  "desc": "",
  "methods": [
    "GET",
    "POST",
    "PUT",
    "DELETE",
    "PATCH",
    "HEAD",
    "OPTIONS",
    "CONNECT",
    "TRACE"
  ],
  "hosts": [
    "test.com"
  ],
  "upstream": {
    "pass_host": "pass",
    "type": "roundrobin",
    "nodes": [
      {
        "priority": 0,
        "weight": 100,
        "host": "kubernetes.default",
        "port": 443
      }
    ],
    "scheme": "https",
    "hash_on": "vars",
    "timeout": {
      "send": 6,
      "connect": 6,
      "read": 60
    }
  },
  "name": "test",
  "uris": [
    "/test"
  ],
  "labels": {},
  "status": 1
}'

The certificate, key, and upstream can be arbitrary, as long as the route can be accessed over HTTPS. We use Java okhttp to establish an HTTP/1.1 connection and access /test. We use the k8s apiserver as the back-end service, and we can also use other
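
The client side of this reproduction can be sketched roughly as follows. This is a minimal Python sketch, not the Java okhttp client described above. `probe_keepalive` and `open_insecure_connection` are hypothetical helper names, and skipping certificate verification assumes a throwaway test certificate. The probe sends many GET requests through one `http.client` connection object and records, for each request, the status code and whether the response carried `Connection: close`. Python's `http.client` does not resume TLS sessions on reconnect by default, so this only helps locate the request at which the server closes the connection; it should not be expected to trigger the misrouting by itself.

```python
import http.client
import ssl
from typing import List, Optional, Tuple


def open_insecure_connection(host: str, port: int) -> http.client.HTTPSConnection:
    """Build an HTTPS connection object that skips certificate checks.

    Assumption: the gateway uses a self-signed test certificate.
    No socket is opened until the first request is sent.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return http.client.HTTPSConnection(host, port, context=ctx, timeout=10)


def probe_keepalive(
    conn,
    path: str = "/test",
    total: int = 150,
    host_header: Optional[str] = None,
) -> List[Tuple[int, int, bool]]:
    """Send `total` GET requests through one connection object.

    Returns one (request_index, status, server_closes) tuple per request,
    where server_closes is True when the response has "Connection: close".
    http.client reopens the socket automatically on the next request.
    """
    headers = {"Host": host_header} if host_header else {}
    results = []
    for index in range(1, total + 1):
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection can be reused
        closes = (resp.getheader("Connection") or "").lower() == "close"
        results.append((index, resp.status, closes))
    return results
```

For example, `probe_keepalive(open_insecure_connection("test.com", 6443), total=150)` would run the probe against a gateway reachable as `test.com` on port 6443. Both values are assumptions taken from the SNI and the access log in this thread, so adjust them to your deployment.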

hahayyum · 2022-06-14T09:33:46Z

@tokers
2.14.1: client mTLS was ignored sometimes in TLS session reuse: #6906

This fix in the latest version seems to solve the problem, but I'm not sure. Is there a more specific description of the fix, or of the scenario it addresses?

tokers · 2022-06-14T09:41:17Z

@hahayyum I'm not sure if this problem is related to the issue you mentioned. @tzssangglass Could you take a look? Thanks!

tzssangglass · 2022-06-14T13:38:53Z

Hi @hahayyum, it does not look like #6906 is what fixes your problem.

When the number of requests is within 100, the behavior is correct. The log is as follows:

10.23.34.11 - - [11/Jun/2022:18:29:23 +0000] 192.168.249.2:6443 "GET /test HTTP/1.1" 200 315 0.015 "-" "Java Client" 192.168.244.3:443 200 0.004 "https://192.168.249.2:6443"

When the number of requests exceeds 100, the behavior is wrong. The log is as follows:

10.23.34.11 - - [11/Jun/2022:18:29:23 +0000] 192.168.249.2:6443 "GET /test HTTP/1.1" 200 315 0.015 "-" "Java Client" 192.168.244.5:8080 200 0.004 "http://192.168.249.2:6443"

The difference between these two logs is 192.168.244.3:443 versus 192.168.244.5:8080, which is the $upstream_addr field; see:

apisix/conf/config-default.yaml

Line 223 in 4581627

    
           access_log_format: "$remote_addr - $remote_user [$time_local] $http_host \"$request\" $status $body_bytes_sent $request_time \"$http_referer\" \"$http_user_agent\" $upstream_addr $upstream_status $upstream_response_time \"$upstream_scheme://$upstream_host$upstream_uri\""

It looks like something caused a different node to be selected.
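
To spot this kind of switch in a larger access log, the format above can be parsed field by field. The following is a minimal Python sketch under stated assumptions: `parse_access_log` and `upstream_switches` are hypothetical helper names, the regular expression mirrors the `access_log_format` shown above, and it assumes a single `$upstream_addr` per line (no retries across several nodes). With a multi-node upstream the address changes legitimately, so a reported switch is only a hint, not proof of misrouting.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# One named group per variable in the access_log_format shown above.
ACCESS_LOG_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'(?P<http_host>\S+) "(?P<request>[^"]*)" (?P<status>\d{3}) '
    r'(?P<body_bytes_sent>\d+) (?P<request_time>\S+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'(?P<upstream_addr>\S+) (?P<upstream_status>\S+) '
    r'(?P<upstream_response_time>\S+) '
    r'"(?P<upstream_scheme>[^:"]*)://(?P<upstream_target>[^"]*)"'
)


def parse_access_log(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of one access log line, or None if it does not match."""
    match = ACCESS_LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def upstream_switches(
    lines: Iterable[str],
) -> List[Tuple[Dict[str, str], Dict[str, str]]]:
    """Return (previous, current) record pairs where the upstream address
    or upstream scheme differs from the preceding parsed line."""
    switches = []
    previous = None
    for line in lines:
        record = parse_access_log(line)
        if record is None:
            continue  # skip lines that are not in the expected format
        if previous is not None:
            before = (previous["upstream_addr"], previous["upstream_scheme"])
            after = (record["upstream_addr"], record["upstream_scheme"])
            if before != after:
                switches.append((previous, record))
        previous = record
    return switches
```

Feeding the two log lines quoted above into `upstream_switches` reports one switch, from 192.168.244.3:443 over https to 192.168.244.5:8080 over http.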

tokers · 2022-06-15T01:22:14Z

@tzssangglass Maybe a cross-upstream connection pool issue?

hahayyum · 2022-06-15T02:30:46Z

@tokers @tzssangglass
Let me describe it again:
routeA: /test upstream: 192.168.244.3:443 Proto: https Host: test.com
routeB: /* upstream: 192.168.244.5:8080 Proto: http Host: nil
If routeB does not exist, requests after the first 100 return "404 Route Not Found".

I applied the change from the #6906 fix and, after a simple test, the problem appears to be solved, but further testing is needed.

Guess:
After more than 100 requests, APISIX resets the connection (FIN, RST) and the client reconnects, but APISIX reuses the previous TLS session. Once ssl_session_timeout (10m) expires, requests return to normal.

Verification:
With ssl_session_timeout set to 30s, the problem occurs for the first 30s after the client reconnects. After 30s the connection is reset and requests return to normal.

So:
I would like to know more about the #6906 fix. Is there a specific explanation or description of why it fixes the problem I encountered?

tzssangglass · 2022-06-15T16:18:55Z

I would like to know the fix #6906, are there any specific instructions or descriptions, why it fixes the problem I encountered?

cc @spacewander PTAL

spacewander · 2022-06-16T06:42:22Z

#6906 fixes a bug in TLS session reuse. It may be related to your problem.

github-actions · 2023-06-01T10:04:46Z

This issue has been marked as stale due to 350 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@apisix.apache.org list. Thank you for your contributions.

github-actions · 2023-06-16T10:04:27Z

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

tzssangglass added and then removed the "checking" label (check first if this issue occurred) on Jun 14, 2022

github-actions bot added the stale label on Jun 1, 2023

github-actions bot closed this as not planned on Jun 16, 2023
