-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ability to use gen_tcp/gen_udp with externally open AF_LOCAL fds #612
Conversation
Thanks for fixing this. This has been a long-term problem. |
Maybe if I get some time in a couple of months I'll add "native" support to gen_tcp/gen_udp to deal with opening unix domain sockets, but for now this patch is a good interm solution. |
Patch has passed first testings and has been assigned to be reviewed I am a script, I am not human |
Please add tests for these changes. |
gen_udp and gen_tcp have their names for a reason, there is nothing in the documentation or their naming that suggest that they should be able to handle anything else than udp/tcp sockets. If anything, they should reject working on fd's that are not udp/tcp, IMO. There are other solutions to work on native sockets outside of stdlib, search for procket, gen_socket and probably others that I'm not aware of. |
@zhird : Regarding tests - AFAIK there is no way to open a Unix Domain Socket from within the current OTP implementation, which is the reason I tested it outside of the OTP using a NIF-based UDS socket creation. Would it be appropriate to add a NIF library to the test cases? |
@RoadRunnr : I believe you are confusing the address family with the transport. In POSIX different address families (e.g. INET, UNIX, etc) can handle various protocol transports (e.g. UDP, TCP). File descriptor belongs to an address family. Unix Domain Sockets are also perfectly valid candidates for UDP/TCP transport abstractions, and the fact that OTP has current restriction of not allowing to bind them with gen_udp and gen_tcp is a clear oversight. |
@saleyn: I believe you are confusing the meaning of the socket domain and type parameters. UDP and TCP are IP protocols. When you open a socket of domain AF_INET, you select IPv4, AF_INET with type SOCK_DGRAM then selects UDP and AF_INET with type SOCK_STREAM selects TCP. AF_UNIX with SOCK_DGRAM does not mean UDP, since UDP is a protocol from the AF_INET domain. UNIX sockets can be run in datagram or stream mode, but this does not mean that it runs UDP or TCP on top of them. See also 'man 7 ip' on Linux:
'man 7 unix':
|
@RoadRunnr: though technically you are correct that SOCK_DGRAM is not the same as UDP and SOCK_STREAM is not the same as TCP, the protocol selection is not explicitly defined by neither 1st nor 2nd parameter in the socket(2) call, but by the 3rd one (which selects the default protocol implementation only when specified as 0). SOCK_DGRAM and SOCK_STREAM define the semantics of communication over the socket, whereas the actual protocol type (e.g. UDP/TCP/SCTP) is chosen either explicitly or implicitly (e.g. IPPROTO_TCP/IPPROTO_UDP, which for UDS sockets is always implicit). The UDS socket family uses transport protocols that support the SOCK_STREAM and SOCK_DGRAM abstractions. Since all higher level functions dealing with sockets are fully compatible with file descriptors of LOCAL/INET address families, it's logical to offer that functionality to the user of gen_udp and gen_tcp, who shouldn't care about the origin of the lower-level protocol implementation. IMHO, it would actually make more sense to call these modules more generically gen_dgram and gen_stream, but since TCP and UDP are much more "user-friendly" (and it's questionable whether they were exclusively meant to be used on the INET sockets by design or this was the result of an oversight) I understand why more specific names were selected. |
I looked into this some and yes, @RoadRunnr has a good point. We'll look into it some more, so no need to spend time on fixing the tests yet. |
@zhird @RoadRunnr It appears you may be confusing the type UDP (i.e., User Datagram Protocol, SOCK_DGRAM) with the domain AF_UNIX or AF_LOCAL (UNIX domain sockets). For many years Erlang has changed internal validation to block UNIX domain socket usage, making it harder and harder to use, crippling the usage internally. Please do not block this due to ignorance or the "well they don't exist on Windows" (anonymous pipes) excuse that was on the mailing list a year ago or more. It would be nice to get past this. There is less latency when using a unix domain socket when compared to an inet socket on localhost, both using TCP, making this a significant oversight. |
To summarize the main questions:
The current patch answers the first question affirmatively, and given that UDS support is not available on Windows, IMHO the answer "yes" to the second question would make more sense. |
@saleyn To give a background on the problem, links to erlang-questions mailing list posts can help: http://erlang.org/pipermail/erlang-questions/2014-March/078347.html . That post shows the problem has been the source of complaints for 6 or more years. The (2) approach was what worked with release R15 and earlier as mentioned in the email thread. All the inet Erlang source code was replicated at https://github.com/tonyrog/afunix because the Erlang/OTP team was unwilling to consider UNIX domain sockets in the past, for unknown reasons. The example https://github.com/mikma/unixdom_drv was broken due to the validation added to the Erlang inet source code in R16 and later. To completely bypass Erlang/OTP socket support, https://github.com/msantos/procket exists, but does not provide all the functionality found in Erlang/OTP, so it ends up being less useful and more error-prone due to a lack of testing and use. I strongly believe we need the (1) approach to get the Erlang/OTP source code back to the usable state it was in at R15 and earlier, with unix domain sockets, since there is no good reason not to. The (2) approach will just continue to fragment effort more than it already has been, so it would not be constructive. |
How does your UNIX socket use TCP? +1 for gen_uds. |
@nox I have code like the simplified sequence below, in a NIF at https://github.com/CloudI/CloudI/blob/develop/src/lib/cloudi_core/cxx_src/cloudi_socket_drv.cpp#L97-L167 which creates the unix domain socket listener for TCP connections:
Due to Erlang not supporting unix domain sockets, but still wanting to use the inet source code path due to its testing, I create an inet socket and use dup2 on the file descriptor... which is an unfortunate solution which angers the io thread (it consumes 100% of the CPU after the socket is used the first time). From what I have seen, it is an error in the Erlang VM which is not handled (CloudI/CloudI#39). So, it would be nice to have this be a real feature to avoid integration problems. Otherwise it is just forcing people to avoid OTP usage to get efficient localhost sockets, which seems wrong. If it was required to be in gen_uds due to too much cruft in inet: ok. However, that is just making it more complicated than it needs to be IMO (since gen_tcp and gen_udp are for the TCP/UDP protocols that are not exclusive to the inet domain/family). (e.g., see inet:address_family() for the current selection with gen_tcp) |
Patch has passed first testings and has been assigned to be reviewed I am a script, I am not human |
Could you please give an update on the status of this patch? |
It is on my table for review, mostly because no-one else wanted it. I am sorry that it takes so long, it is an awkward decision between your patch that gets the job done right now and the hopefully better knowing people I have asked that agrees it is a problem that should be solved but there should be better ways (all suggest different ways), but no ways that work right now... So currently this pull request gets starved by a steady stream of higher priority problems on my table and the high hill of reading up I will have to do to form an opinion of myself I can stand by. That is the sad situation... / Raimo |
Raimo, thanks for following up quickly! Though this is not as involved as the SCTP patch you helped us integrate years back, I am sure you will eventually do as amazing job as you did with the former. 😉 |
I'll try to do my very best... |
I have a strong desire for this functionality to get into RabbitMQ, which I use heavily as the broker for fire-and-forget messages from my web servers (installed on each web server to handle the load). I have been able to get 30% or better latency improvements from memcached by using unix sockets for similar tasks. |
Any update on this pr ? Any help needed ? |
Rebased to current maint |
Patch has passed first testings and has been assigned to be reviewed I am a script, I am not human |
👍 |
Nope. 8 bit length field was also the right decision... There has been many historical mistakes when designing the network APIs in Erlang. So I guess squeezing the UDS name into the IP field as you did is the only viable option, allthough the UDS address really is on the same level as an {IP,Port} tuple. |
@RaimoNiskanen: regarding setting desc->name_addr in fdopen - I can't recall 100% why I did that, as it's been quite a while, but I suspect it was so that when inet:sockname(Socket) is called so that the UNIX file name is returned. Also, if memory serves, I had to add that code or else ability to pass external an external FD to a gen_tcp/udp instance didn't work (likely because). One other design question to consider is if in the active mode it's acceptable to return |
@saleyn wrote:
Shouldn't that be Check out the upcoming RC2I have reworked your suggestion somewhat (a lot(?)) so I hope you like the result or at least can bear with it :-) I have used The local and remote names for the socket is not cached in I have not had time for any test suites or documentation, awaiting feedback on this suggestion coming in 19.0.rc2 now running in our daily build on proposed updates for master i.e to be RC2. I have tested this manually on Ubuntu 14.04, FreeBSD 10.1 and OpenBSD 5.9 (which needs a little fix):
And this, which is a weird one since S3 is not bound:
Linux's "Abstract Addresses" feature works; just use Have fun! Give me feedback! |
@okeuday: I am awaiting further problem description after 19.0.rc2 |
@RaimoNiskanen The problem at CloudI/CloudI#39 is not due to the changes in this pull request. I would like to use the changes (or at least the interface) in this pull request to avoid the problem. The problem CloudI/CloudI#39 is related to my usage of undocumented functions (prim_inet:getfd/1 and prim_inet:ignorefd/2) and using dup2 on a file descriptor used by an Erlang port object. It was the easiest way to get AF_LOCAL usage working through inet despite the validation present that attempts to prevent it, so this particular hack might not be considered valid use (i.e., is using undocumented functions a bug if the functions are not meant to be used?). If that problem needs a bug filed, tell me to file it, but that problem is not caused by these changes. |
@okeuday: Well..., if our own undocumented primitives does not work as we expect them to, then it should be fixed. So the question is if they behave badly or unfortunately as expected in this case. :-) But it is of course better if you can use AF_LOCAL through gen_* from 19.0.rc2. |
@RaimoNiskanen: is there a tag for RC2 that includes your changes? I believe the current latest tag is rc1 (https://github.com/erlang/otp/tree/OTP-19.0-rc1). |
Merged to master.@saleyn: Nope, no tag. The integration team sets the tag today. But a master branch of 30a202a or later contains the changes. I think no.1 on the todo list will be to rename the address family from @okeuday: Might it be simply so that a socket fed via the |
The tag is now out: OTP-19.0-rc2 What do you think; should we name the address family @okeuday: It seems |
Regarding the address family, since AF_UNIX and AF_LOCAL are used interchangeably, this is likely just a matter of taste. The name One other point that was brought up earlier in this thread by @RoadRunnr, is not so much in regards to the use of |
@RaimoNiskanen I agree |
Alright. Fair point! It also seems like some kind of way in the middle since I find Linux forums that talk about renaming AF_LOCAL to AF_FILE, so AF_LOCAL maybe is the common name today... |
@RaimoNiskanen: First of all, thank you for taking the time to rework the patch in order to integrate it into the distribution! I played with this a little bit, and even though the following works fine: Terminal1:
Terminal2:
I find it a little strange that the client can't connect to a local socket with just:
It seems to me that this should be permitted without requiring a client to go through separate calls for file descriptor creation. |
A slight modification works:
On Linux you can use This is because the outgoing connection can not utilize an ephemeral port (0) to bind to any port interfaces since ports have no meaning for AF_LOCAL, so it needs an outbound address. An outbound address is not actually needed, though... For AF_LOCAL you can connect without binding to an address and then you get an anonymous address denoted by the address There is also an OS function socketpair() that can give two connected anonymous sockets, and this function I think we should implement as local:socketpair(). The existence of this function shows that anonymous addresses is a supported feature of AF_LOCAL. Today inet_drv.c checks that the socket has bound an address before allowing to connect. I do not know why, but it is probably an AF_INET safeguard that I suspect should be removed. I would like to get some insight on this... Anyway this is why an outbound address is needed for gen_tcp:connect/3 for now, but I think this restriction should be removed. gen_tcp:connect/3 with I almost had this implemented, but ran into the existing inet_drv.c restriction so I thought it needed some more contemplation first... |
Yes, that AF_INET safeguard seems to be unnecessary. If the need for that code is still not understood, perhaps as a work-around for the local sockets when connect is issued without {ifaddr, Addr} option, to populate it in the |
Since |
I ripped out the internal state BOUND from inet_drv.c, and it seems to cause no problems in our daily tests. This, I felt, was the smallest and safest solution... I am adding test cases and polishing corner cases in the code, for the release. |
FYI, a common optimisation for UDP is skipping e.g. https://github.com/lpgauth/shackle/blob/master/src/shackle_udp.erl#L49 This change breaks this pattern since now UDP requires |
Being a common optimization does not make it correct. ;-) And I wonder how common it really is... Nevertheless the optimization uses a very internal API and I have no second thoughts about changing an internal API for a major release. The use case the optimization targets seems to be the same one that the undocumented gen_udp:connect/3 targets. Unfortunately a send over a connected UDP socket is just as badly optimized as over an unconnected, but the connected send will be easy to optimize, which we probably will have to look at fairly soon. Optimizing connected UDP send would simply do Regarding your optimization you can change |
@RaimoNiskanen cool, I like the idea of optimizing And yes, I just need to add the address family. https://github.com/lpgauth/shackle/blob/master/src/shackle_udp.erl#L26 |
@RaimoNiskanen I have question regarding "Abstract Address", given that I have this address:
How can I connect to it as a client from Erlang? |
That does not look like an "Abstract Address". That looks like a normal AF_LOCAL address. The type An abstract address in erlang might look like this: |
@RaimoNiskanen thank for your help, I had to set the first byte of abstract socket path to <<0>>:
|
Curious: Where did you find this "Abstract Path", how did you find out that it was "abstract", is there a file /tmp/dbus-cNsBK4GIj1 in the filesystem and does not If not try You should not have to use prim_inet. |
Well in Linux distros that are using D-Bus, D-Bus provides access to current session like this
And no there is no file in this path. I just tried it without |
That was to get an Fd with DBUS has got a for me new new way to use abstract addresses. I have just had the OS create an abstract address, and then it gives me one like i showed above... |
When a AF_LOCAL file descriptor is created externally (e.g. Unix
Domain Socket) and passed to
gen_tcp:listen(0, [{fd, FD}])
, theimplementation incorrectly assigned the address family to be equal
to
inet
, which in the inet_drv driver translated to AF_INET insteadof AF_LOCAL (or AF_UNIX), and an
einval
error code was returned.This patch fixes this problem such that the file descriptors of the
local
address family are supported in the inet:fdopen/5,gen_tcp:connect/3, gen_tcp:listen/2, gen_udp:open/2 calls.
See https://github.com/saleyn/euds for examples of using this feature.