-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: fix test-tls-ticket-cluster.js #52431
test: fix test-tls-ticket-cluster.js #52431
Conversation
Ensured connection end is initiated by worker (server) Fixes: nodejs#2510
c.on('end', () => { | ||
c.end(); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c.on('end', () => { | |
c.end(); | |
}); |
Removing the whole 'secureConnect'
listener should have the same effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review. I applied your suggestion and tested it locally. I ran it over 4000 times and got 2 timeouts. The following is the test log:
E:\github\node>python tools\test.py test\parallel\test-tls-ticket-cluster
=== release test-tls-ticket-cluster ===
Path: parallel/test-tls-ticket-cluster
spawn file: E:\github\node\out\Release\node.exe
spawn file: E:\github\node\out\Release\node.exe
spawn file: E:\github\node\out\Release\node.exe
spawn file: E:\github\node\out\Release\node.exe
[primary] got "listening"
[primary] connecting 49193 session? false
[primary] got "listening"
[primary] got "listening"
[primary] got "listening"
[worker] connection reused? false
[primary] got "not-reused"
[primary] connecting 49193 session? true
[worker] connection reused? true
[primary] got "reused"
[primary] connecting 49193 session? true
[worker] connection reused? true
[primary] got "reused"
[primary] connecting 49193 session? true
[worker] connection reused? true
[primary] got "reused"
[primary] connecting 49193 session? true
[worker] connection reused? true
[primary] got "reused"
[primary] connecting 49193 session? true
[worker] connection reused? true
[primary] got "reused"
[primary] connecting 49193 session? true
[worker] connection reused? true
[primary] got "reused"
[primary] connecting 49193 session? true
[worker] connection reused? true
[primary] got "reused"
[primary] connecting 49193 session? true
[worker] connection reused? true
[primary] got "reused"
[primary] connecting 49193 session? true
[worker] connection reused? true
[primary] got "reused"
[primary] connecting 49193 session? true
[worker] connection reused? true
[primary] got "reused"
[primary] connecting 49193 session? true
[worker] connection reused? true
[primary] got "reused"
[primary] connecting 49193 session? true
[worker] connection reused? true
[primary] got "reused"
[primary] connecting 49193 session? true
[worker] connection reused? true
[primary] got "reused"
[primary] connecting 49193 session? true
[worker] connection reused? true
[primary] got "reused"
[primary] connecting 49193 session? true
[worker] connection reused? true
[primary] got "reused"
[worker] got "die"
[worker] server close
[worker] exit
[worker] got "die"
[worker] server close
[worker] exit
[worker] got "die"
[worker] server close
[worker] exit
[primary] worker died
[worker] got "die"
[worker] server close
[worker] exit
[primary] worker died
[primary] worker died
[primary] worker died
Command: E:\github\node\out\Release\node.exe E:\github\node\test\parallel\test-tls-ticket-cluster.js
--- TIMEOUT ---
[02:32|% 100|+ 0|- 1]: Done
Failed tests:
E:\github\node\out\Release\node.exe E:\github\node\test\parallel\test-tls-ticket-cluster.js
Since the initial solution doesn't have that issue, I'd go with it to avoid timeouts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can reproduce the timeouts also with the initial version then as the only difference is that c.end()
is called in the next tick. See
node/lib/internal/streams/readable.js
Line 1699 in 9ef03f1
process.nextTick(endWritableNT, stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine either way, as is or with the 'secureListener'
event listener removed. The result is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left the test running overnight and saw the error, which is related to the handshaking. It appears that removing secureConnect
doesn't have the same effect as adding an end
callback. Do you have any other suggestion I could try here?
P.S. I applied your other suggestion, the code looks better now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left the test running overnight and saw the error, which is related to the handshaking. It appears that removing
secureConnect
doesn't have the same effect as adding anend
callback. Do you have any other suggestion I could try here?
I have no suggestions. What is currently done in the 'end'
callback, is already done automatically even if the callback is not added. The only difference is that c.end()
is called in the next tick. This makes it explicit and I'm fine with it.
Co-authored-by: Luigi Pinca <[email protected]>
|
Landed in cfca9e3 |
Ensured connection end is initiated by worker (server) Fixes: #2510 PR-URL: #52431 Reviewed-By: Luigi Pinca <[email protected]>
Ensured connection end is initiated by worker (server) Fixes: #2510 PR-URL: #52431 Reviewed-By: Luigi Pinca <[email protected]>
Ensured connection end is initiated by worker (server) Fixes: #2510 PR-URL: #52431 Reviewed-By: Luigi Pinca <[email protected]>
Problem
I've investigated the related issue in depth and found out that this test sometimes fails because of the handshake process. When the error happens and I look at the Wireshark logs, I see that the client doesn't send a
Change Cipher Spec
packet, but instead sends aFIN
packet after a while. TheFIN
packet is sent in approximately 10ms. This may be some timeout, but I wasn't able to find proof for that.I've added 2 screenshots from the Wireshark logs (which I can provide too if requested). The first one shows the expected packet flow when the test passes, and the second one shows the packet flow when the test fails.
Pass:
Failure:
As can be seen, the
Change Cipher Spec
packet was missing, and theFIN
packet had already been sent before the handshake was complete.I've seen discussions about the issues in the handshake process of Node.js in the following links: openssl/openssl#8069, openssl/openssl#7199, #18770.
As I don't have in-depth knowledge about OpenSSL, I'm not sure if this issue is caused by the way OpenSSL is used by Node. If there is a need to investigate it, a new issue can be opened.
Fix
In this PR, I’ve made the
Primary
(client) wait for theFIN
packet from theWorker
(server) to close the connection. This approach ensures that thePrimary
waits until it receives aFIN
packet from theWorker
and doesn’t time out during the handshake. I've run the test with the fix roughly 11k times and it hasn't failed once. Without the fix, I would usually get a failure with 1k runs easily.Fixes: #2510