
Initialize xp_ifindex of the transport. #137

Merged · 1 commit · Aug 7, 2018

Conversation

@malahal commented Jun 9, 2018

xp_ifindex defaults to zero, so all responses go to a single queue, leaving just one thread doing the replies. Added a monotonically increasing value so we can use multiple threads for sending RPC replies.

@was4 commented Jun 10, 2018 via email

@was4 commented Jun 10, 2018 via email

@malahal (Author) commented Jun 11, 2018

More investigation revealed that there is a client that is slow, or that downright doesn't acknowledge packets from our server. In this case writev/write hangs rather than returning an error; a bad client can hang Ganesha! I think a non-blocking socket would help here. Is there any downside to a non-blocking socket? It seems there was an effort to make it non-blocking, but that code is commented out for now.

@dang (Collaborator) commented Jun 12, 2018

So, I don't think this change will help that situation. The ifq doesn't determine which thread is used, just which queue is used. The thread that's used is the one that called svc_sendreply(). A workaround would be to make svc_vc_reply() call the async write (svc_ioq_write_submit()) rather than the sync version (svc_ioq_write_now()). That gets rid of hot thread streaming, but it also means no write can block progress. So it likely would result in lower throughput.

Also, note that, in 2.5, this write should not block everything in Ganesha, just the one worker thread. So it would take as many dead clients as you have worker threads to halt entirely; in addition, the TCP session should eventually time out (after 30 minutes, I think), closing the socket and freeing the thread.

@was4 commented Jun 12, 2018 via email

@was4 commented Jun 13, 2018 via email

@malahal (Author) commented Jun 14, 2018

Bill, based on the documentation, writev/sendto/send/sendmsg will block waiting for space in the socket send buffer; I don't think the entire kernel memory needs to be exhausted. BTW, I did a very small experiment to show that a single slow client could affect performance for other clients:

Run these two commands in two different terminals on a Linux box:

```
#1. cat /dev/zero | strace -T -e sendto -f nc -l 5000
#2. nc localhost 5000 | pv -L 100
```

"pv" makes the second "nc" very slow to read from the socket once the pipe buffer is full. The first terminal shows the sendto system call times: initially it prints a bunch of quick sendto calls, then it blocks in a sendto call. My system has 8GB of RAM and still had 4-5GB free when this happened.

@malahal (Author) commented Jun 14, 2018

Bill, instead of using TCP_USER_TIMEOUT, I used SO_SNDTIMEO; I think SO_SNDTIMEO is what we are concerned with here. TCP_USER_TIMEOUT should also work, and it closes the TCP connection, which is good. SO_SNDTIMEO requires Ganesha to close the connection itself, and I am not sure SVC_DESTROY() is the right candidate. I have not tested the patch yet, though. Here is the patch:

malahal@bb81c2b

Most of these options are inherited from the listening socket, so setting them on the listening socket might be good enough. What do you think? Shall I go with SO_SNDTIMEO or TCP_USER_TIMEOUT? Is there anything that pushes us to use one over the other?

@was4 commented Jun 15, 2018 via email

@was4 commented Jun 15, 2018 via email

@malahal (Author) commented Jun 15, 2018

I was trying to show with the nc command that per-socket buffer limits apply, and that writev()/send() times depend on how fast the other side can read.

SO_SNDTIMEO should work for this issue as well as against a bad client that ACKs everything but still advertises a zero window size. TCP_USER_TIMEOUT will not work in the latter case, and it is only available in some kernels (RHEL 6.x lacks it!).

With either SO_SNDTIMEO or TCP_USER_TIMEOUT, we probably need a config parameter for extreme cases. Having a large timeout such as 5 seconds should be OK without any config parameter, correct?

@was4 commented Jun 15, 2018 via email

@malahal (Author) commented Jun 18, 2018

Kaleb recently posted a patch for non-systemd environments, which would be RHEL/CentOS 6.x; Ganesha, being user level, should be more portable. Talking with DanG on IRC indicated that we should use SO_SNDTIMEO, as TCP_USER_TIMEOUT is not available on RHEL 6.x systems. With that in mind, I adjusted this patch here: malahal@100cc6b

Non-blocking sockets would be better, but they require changes in many places to handle non-blocking behavior. Added a send timeout of 5 seconds, so writev should time out after at most 5 seconds. We destroy the socket upon any writev failure.
@dang merged commit c08a99f into nfs-ganesha:next on Aug 7, 2018