-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problem: calling randombytes_close with libsodium can crash Contexts in other threads #4242
Conversation
randombytes_close is not threadsafe and calling it while still in use by a Context can cause a crash. For implementations using /dev/[u]random, this can leave up to one leftover FD per process. The libsodium docs suggest that this function rarely needs to be called explicitly.
Unfortunately this is not a solution, because then we go back to leaking file descriptors, which trigger a cascade of other problems (selinux errors, etc etc) |
OK, then the fix is to revert #3001 |
Eh I wish - that breaks a whole other host of cases due to C++ broken global initialization mechanisms... |
I don't fully understand how the refcounting broke things, but to be correct, libsodium must have a refcounter to protect the call to As it is now, one must ensure that once a context is active dealing with curve messages, no other contexts can be terminated and no new keypairs issued. |
It's explained in the comments - it's not possible to do refcounting via globals in C++. The only solution is to get a kernel with getrandom(), everything else fixes one things and breaks another, which is not acceptable, as we really need to keep things stable. |
That's not something folks usually have control over, so not really a solution.
Yeah, I get that. Even fixing wrong behavior can break things. WDYT about a runtime and/or compile-time opt-out of the random_close? When The 'wrong' behavior of the leak is easy to workaround (close the FD) and of little consequence (leak up to one FD in the lifetime of the process), whereas the current behavior is not possible to workaround except with "get a different kernel". |
A build option would be fine. Note that the leak is not just cosmetic, things like selinux will raise an audit and so on. |
I added an opt-in to cmake and autotools (my m4 is rusty, so I hope it's right). As you might guess, I would prefer the non-crashing behavior be the default, but I assume no change by default is the right choice for consistency. I wonder if a runtime
Do you have a link to more information on that? If I have application code that does: zmq_term() // doesn't call randombytes_close() anymore
randombytes_close() would that avoid the audit issue, or no? I don't have experience with selinux, so don't know exactly what's being audited there. |
da20a13
to
e9b1292
Compare
CMakeLists.txt
Outdated
@@ -258,6 +258,7 @@ endif() | |||
|
|||
option(WITH_LIBSODIUM "Use libsodium instead of built-in tweetnacl" ON) | |||
option(WITH_LIBSODIUM_STATIC "Use static libsodium library" OFF) | |||
option(ENABLE_LIBSODIUM_RANDOMBYTES_CLOSE "Automatically closing libsodium randombytes (not threadsafe)" ON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to list the explanation, please be precise "not threadsafe without getrandom()"
Do you think it's better to restore the refcounting instead of disabling entirely (i.e. exactly reverting #3001 if the new flag is set)? |
Current patch is best - the workaround to call randombytes_close manually is trivial, while safely dealing with the global init shenanigans is extremely convoluted, if at all possible |
Problem: randombytes_close is either a no-op or unsafe when a Context is running. Unfortunately, there does not appear to be a single always correct choice, so let builders pick between two not-great options. Opting out can leak an FD on /dev/urandom which may need to be closed explicitly. However, with the default behavior, using multiple contexts with CURVE can crash with no application-level workaround available.
e9b1292
to
e3da163
Compare
Problem: When using libsodium, calling
zmq::random_close()
closes a shared resources that may still be in use.See #4241 for details and repro, but this is called unconditionally on every
curve_keypair()
and every context destructor, since #3001. Closing any context and/or issuing a new curve keypair can crash another context if it is working with curve-enabled sockets at the time, as it is still a consumer of a now-closed resource. libsodium will try to reinitialize the source, so it doesn't crash every time; instead, certain race conditions must be met to result in a crash.Solution: Do not close this shared resource while it is still in use, as suggested by the libsodium documentation.
For implementations using /dev/[u]random, this can leave up to one leftover FD per process (restores #2632, but should be bounded to one).
If we want to call
randombytes_close()
, we must restore the use of refcounting in #2636 when using libsodium to avoid crashes.closes #4241