our TCP listener is nonblocking so we *must* set the accepted stream to blocking explicitly #3154
Conversation
Amazing. I would've never thought that would happen and not fail catastrophically somewhere. Awesome find!
I suspect this also explains part of the slow header sync - if you request headers from an outbound peer, then from that node's perspective you are an inbound connection, and they are sending you a reasonably large chunk of bytes via a nonblocking stream.
I have successfully sync'd against a node running this code several times in a row without any issues.
Once we get this merged, I'll ask everyone I know of that runs public nodes to build from master. I'm aware of dozens of people that simply can't use Grin without this.
@antiochp do you have a link to the docs or source code? I set the accepted stream to blocking explicitly in the beginning a while ago and then removed it out of curiosity, and nothing changed. Also, I run a public node and don't see any issues; moreover, I could not reproduce the txhashset issue for 2 weeks. It would also cause high CPU load, yet my node runs at 0.3-1% CPU with at least a dozen inbound peers. Having said this, Gary experienced CPU spikes and fixed them by adding a sleep after `WouldBlock`. The only explanation I have is that my node has a very good connection or is extremely lucky.
I'm fine with the change. After reading an example in the Rust docs I agree with your idea. However, I still don't understand how it works on my machine: I have 29 inbound peers now, and even with the TUI I'm under 5% CPU, whereas with nonblocking IO it was at least 60%.
I actually don't, no. This is all from experimenting with nodes locally (and a lot of debug logs). But I do think the "principle of least surprise" may apply here - 99% of the time, if you have a nonblocking listener you are going to want nonblocking streams accepted from it, and it would actually be surprising if you were then required to set each stream to nonblocking. Our usage is definitely an outlier - we have a nonblocking listener but we want blocking streams accepted. It would be nice if there was a simple way of reading the current blocking/nonblocking state of a stream, but there does not appear to be one.
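For what it's worth, a minimal sketch of that pattern (the address, sleep interval, and logging are placeholders for illustration, not the actual Grin listener code): a nonblocking listener whose accepted streams are explicitly set back to blocking.

```rust
use std::io;
use std::net::TcpListener;
use std::thread;
use std::time::Duration;

fn main() -> io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:3414")?;
    // Nonblocking listener, so the accept loop can also check for shutdown etc.
    listener.set_nonblocking(true)?;

    loop {
        match listener.accept() {
            Ok((stream, addr)) => {
                // The accepted stream may inherit the listener's nonblocking
                // flag (observed here on macOS), so set it back to blocking
                // explicitly rather than relying on the platform default.
                stream.set_nonblocking(false)?;
                println!("accepted blocking stream from {}", addr);
                // ... hand the stream off to a peer thread here ...
            }
            Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => {
                // No pending connection; sleep briefly to avoid a busy loop
                // (the CPU-spike fix mentioned above).
                thread::sleep(Duration::from_millis(100));
            }
            Err(e) => return Err(e),
        }
    }
}
```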
Hmm. @hashmap has been testing this under various conditions and it appears this is only an issue on macOS. macOS accepts nonblocking connections by default from a nonblocking listener.
A nonblocking TCP listener will accept nonblocking TCP streams by default.
This is very surprising to put it mildly.
We were creating nonblocking streams when accepting inbound connections from peers, but blocking streams when connecting to outbound peers. This asymmetry made the problem really hard to track down.
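To make the asymmetry concrete, a rough sketch (the addresses are hypothetical, and this is not the actual Grin peer code):

```rust
use std::io;
use std::net::{TcpListener, TcpStream};

fn sketch() -> io::Result<()> {
    // Outbound: connect() always hands back a blocking stream.
    let _outbound = TcpStream::connect("127.0.0.1:3414")?;

    // Inbound: on macOS, a stream accepted from a nonblocking listener
    // was itself nonblocking, without anything explicitly requesting that.
    let listener = TcpListener::bind("127.0.0.1:3415")?;
    listener.set_nonblocking(true)?;
    let (_inbound, _addr) = listener.accept()?; // nonblocking by inheritance
    Ok(())
}
```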
You can see this if you add debug logging in `try_break!` when matching on `WouldBlock`. On some connections we were seeing these scroll by at a rapid rate, but not on all connections. It turns out it was only on inbound connections.
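As a standalone illustration of that kind of logging (`try_break!` is Grin's own macro; this hypothetical read loop just shows where the `WouldBlock` arm fires):

```rust
use std::io::{self, Read};
use std::net::TcpStream;

// Fill `buf` from the stream, logging every WouldBlock along the way.
fn read_exact_logged(stream: &mut TcpStream, buf: &mut [u8]) -> io::Result<()> {
    let mut read = 0;
    while read < buf.len() {
        match stream.read(&mut buf[read..]) {
            Ok(0) => return Err(io::ErrorKind::UnexpectedEof.into()),
            Ok(n) => read += n,
            Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => {
                // On an accidentally-nonblocking inbound stream this line
                // scrolls by at a rapid rate.
                eprintln!("WouldBlock while reading ({} of {} bytes)", read, buf.len());
            }
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```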
So you can only reproduce the issue on a public node, or locally with a couple of nodes on localhost.
This probably explains all kinds of weird connection related problems.
One example: if we request a txhashset from a public node, and we are an inbound connection from the perspective of that node, then the nonblocking stream is going to prevent that node from reliably sending the full txhashset attachment back uncorrupted.
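A hypothetical write loop shows why (`send_all` and its retry logic are illustrative, not the actual attachment-sending code):

```rust
use std::io::{self, Write};
use std::net::TcpStream;

// On a nonblocking stream, once the socket's send buffer fills, write()
// returns WouldBlock. Any code path that doesn't retry from the exact
// offset silently drops bytes, and the receiver sees a truncated or
// corrupted attachment. On a blocking stream, write_all() simply waits
// until the kernel can take more data.
fn send_all(stream: &mut TcpStream, data: &[u8]) -> io::Result<()> {
    let mut sent = 0;
    while sent < data.len() {
        match stream.write(&data[sent..]) {
            Ok(n) => sent += n,
            Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => {
                // Must retry from `sent`; skipping ahead here is exactly
                // how a large transfer ends up corrupted.
                continue;
            }
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```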