Manually close AMQP sockets upon stopping event catcher #76

miha-plesko
Contributor

This commit addresses a bug in the qpid_proton gem that causes abruptly closed connections to leak file descriptors. When an AMQP connection is closed abruptly, the TCP socket remains open in the CLOSE_WAIT state, which means its file descriptor is not released:

```
$ ps -ef | grep MIQ
miha     108492     4234  3 12:02 pts/6    00:01:40 MIQ Server
miha     108533   108492  0 12:02 pts/6    00:00:07 MIQ: Nuage::NetworkManager::EventCatcher id: 105, queue: ems_3
miha     108545   108492  0 12:02 pts/6    00:00:02 MIQ: MiqEventHandler id: 106, queue: ems
miha     108554   108492  0 12:02 pts/6    00:00:04 MIQ: MiqGenericWorker id: 107, queue: generic

$ lsof -ap 108533 | grep CLOSE_WAIT
ruby    108533 miha  116u  IPv4   562438  0t0  TCP 172.16.117.189:53626->147.75.102.132:amqp (CLOSE_WAIT)
ruby    108533 miha  197u  IPv4   561644  0t0  TCP 172.16.117.189:53630->147.75.102.132:amqp (CLOSE_WAIT)
ruby    108533 miha  311u  IPv4   560657  0t0  TCP 172.16.117.189:53634->147.75.102.132:amqp (CLOSE_WAIT)
ruby    108533 miha  549u  IPv4   565342  0t0  TCP 172.16.117.189:53642->147.75.102.132:amqp (CLOSE_WAIT)
ruby    108533 miha  576u  IPv4   565122  0t0  TCP 172.16.117.189:53650->147.75.102.132:amqp (CLOSE_WAIT)
... (keeps growing)
```

After some time (9 hours in our case), enough file descriptors are open that the operating system reports:

```
Too many open files - socket(2) for "172.16.117.189" port 5672
```
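For context, the error appears once the process reaches its per-process limit on open file descriptors. The following is a minimal sketch of how that limit and the current descriptor count could be inspected from Ruby. It assumes a Linux host (the `/proc/self/fd` lookup is Linux-specific) and is not part of this PR; the variable names are illustrative.

```ruby
# Soft and hard limits on open file descriptors for the current process.
soft_limit, hard_limit = Process.getrlimit(Process::RLIMIT_NOFILE)

# Assumption: Linux exposes one entry per open descriptor under /proc/self/fd.
# On platforms without that directory the count is left as nil.
fd_dir   = "/proc/self/fd"
open_fds = File.directory?(fd_dir) ? Dir.children(fd_dir).size : nil

puts "soft limit: #{soft_limit}, hard limit: #{hard_limit}"
puts "open descriptors: #{open_fds.inspect}"
```

Each socket left in CLOSE_WAIT keeps counting against the soft limit until the owning process closes it.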

This is a bug in the qpid_proton gem, reported here: https://issues.apache.org/jira/browse/PROTON-1791. Until it is resolved, this commit introduces a workaround: we capture the socket references and close them manually when the AMQP connection is closed.
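The general idea of the workaround can be sketched as follows. This is not the code from this PR and it does not use any qpid_proton API: `SocketRegistry`, `track`, and `close_all` are hypothetical names, and a local `UNIXSocket` pair stands in for the AMQP connection.

```ruby
require "socket"

# Hypothetical helper, not part of qpid_proton or this PR: remembers every
# socket it is handed so that all of them can be closed explicitly later.
class SocketRegistry
  def initialize
    @sockets = []
  end

  # Record a socket reference and return the socket unchanged.
  def track(socket)
    @sockets << socket
    socket
  end

  # Close every tracked socket that is still open.
  # Returns the number of sockets that were closed by this call.
  def close_all
    closed = 0
    @sockets.each do |socket|
      next if socket.closed?
      socket.close
      closed += 1
    end
    @sockets.clear
    closed
  end
end

# A connected local socket pair stands in for the client/broker connection.
client, peer = UNIXSocket.pair
registry = SocketRegistry.new
registry.track(client)

peer.close                          # the remote side goes away abruptly
closed_count = registry.close_all   # explicit close releases our descriptor
```

Closing explicitly matters because the operating system only releases the descriptor once the owning process closes its end of the connection.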

We also add more debug logging to the messaging handler, to help determine why the connection is being closed abruptly in the first place.

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1554771

@miq-bot assign @juliancheal
@miq-bot add_label enhancement,gaprindashvili/yes

Signed-off-by: Miha Pleško <[email protected]>
@miha-plesko
Contributor Author

Closing because the qpid_proton developers found that the file descriptors leak due to raise statements in the callback functions of our MessagingHandler: when a callback raises, the sockets are not closed properly. I will come up with a PR that removes all raise statements from the callbacks, after which the issue should be gone.
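One way a callback could avoid raising is to rescue the error, log it, and return normally. The sketch below is an illustration under stated assumptions, not the follow-up PR: `SafeEventHandler`, `on_event`, and `process` are hypothetical names that do not mirror the qpid_proton handler API.

```ruby
require "logger"
require "stringio"

# Hypothetical handler; class and method names are illustrative only.
class SafeEventHandler
  attr_reader :errors

  def initialize(logger)
    @logger = logger
    @errors = []
  end

  # Stand-in for a library callback. Any StandardError is logged and recorded
  # instead of propagating into the caller's event loop.
  def on_event(payload)
    process(payload)
  rescue StandardError => e
    @logger.error("callback failed: #{e.class}: #{e.message}")
    @errors << e
    nil
  end

  private

  # Example processing step that may fail on bad input.
  def process(payload)
    raise ArgumentError, "empty payload" if payload.nil? || payload.empty?
    payload.upcase
  end
end

log_output = StringIO.new
handler    = SafeEventHandler.new(Logger.new(log_output))
ok_result  = handler.on_event("connected")  # processed normally
bad_result = handler.on_event("")           # error is logged, not raised
```

Recording the error in `errors` lets the caller decide after the callback returns whether to stop, so the library's own cleanup path is not interrupted.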

@miha-plesko miha-plesko deleted the prevent-leaking-file-descriptors branch August 8, 2018 06:18