Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

listener: ListenerSocketFactoryImpl does not dup listen fd at in place update #18677

Closed

Conversation

lambdai
Copy link
Contributor

@lambdai lambdai commented Oct 19, 2021

Commit Message:
Fixed the bug that in place update listener holding listen fd during the drain.

The solution is to share the listen sockets without fd duplication.
Previous: The new listener create new listen socket fd which has the same inode as the old listen fd.
This PR: The new listener copy the shared_ptr of the previous listen socket. No fd is created.

Additional Description:
Risk Level:
Testing: Modified unit test.
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
Fix #18616
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

@lambdai
Copy link
Contributor Author

lambdai commented Oct 19, 2021

/assign @mattklein123

@lambdai
Copy link
Contributor Author

lambdai commented Oct 20, 2021

This PR address the minor issue of the below sequence

  1. add listener 0.0.0.0:80. Open fd = {X}
  2. complete an in place update listener, and the old listener is draining filter chain. This draining state last drain_timeout. The new listener create listen fd=Y. Now open fd = {X,Y}.
  3. immediately stop the listener. Open fd = {X}, Closed fd = {Y}.
  4. after the drain_timeout, Closed fd = {X,Y}

The expectation is that
after 3 but before 4, tcp client connect to :80 should see reset()

Actual
tcp client connect to :80 is successful because the old listener fd = X is open. It's a minor issue because envoy doesn't serve the tcp client.

@mattklein123
Copy link
Member

In step (3) above, why not just go and force close the draining listener socket X? Or, perhaps do something like hot restart where we wait some period of time after making Y and then just go and close X since we assume Y will pick up new sockets?

In any case, per offline discussion, we aren't going to merge this PR since:

  1. I don't thing it's a good idea to add both "share" and "clone" functionality. We use a ref counting system as I already mentioned to you if there is a good reason to do this to make it cleaner.
  2. I think the correct fix is the one I mentioned above.

I would recommend closing this PR, opening an issue about the above bug, and then we can look at a different and more targeted fix for it?

@lambdai
Copy link
Contributor Author

lambdai commented Oct 20, 2021

Agree with the direction that the new share() is not appropriate.

I would recommend closing this PR, opening an issue about the above bug, and then we can look at a different and more targeted fix for it?

Make sense. I will create issue and the list the potential fixes

@shashankram
Copy link

This PR address the minor issue of the below sequence

  1. add listener 0.0.0.0:80. Open fd = {X}
  2. complete an in place update listener, and the old listener is draining filter chain. This draining state last drain_timeout. The new listener create listen fd=Y. Now open fd = {X,Y}.
  3. immediately stop the listener. Open fd = {X}, Closed fd = {Y}.
  4. after the drain_timeout, Closed fd = {X,Y}

The expectation is that after 3 but before 4, tcp client connect to :80 should see reset()

Actual tcp client connect to :80 is successful because the old listener fd = X is open. It's a minor issue because envoy doesn't serve the tcp client.

@lambdai @mattklein123 I seem to be hitting a similar issue in #20113, where during the listener drain sequence, requests are not rejected by the listener being removed. Instead, the draining listener allows the client to connect to it, which hangs till the drain interval expires.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Envoy v1.20.0 slower to re-initialize dynamic listeners
3 participants