Problem: global random init/deinit breaks existing applications #3001

Merged: 1 commit, Mar 19, 2018

bluca
Member

@bluca bluca commented Mar 19, 2018

Solution: restrict it only to the original issue #2632, Tweetnacl on
*NIX when using /dev/urandom, i.e. without the new Linux getrandom()
syscall.

Existing applications might use atexit to register cleanup functions
(like CZMQ does), and the current change as-is imposes an ordering
that did not exist before - the context MUST be created BEFORE
registering the cleanup with atexit. This is a backward-incompatible
change that is reported to cause aborts in some applications.

Although libsodium's documentation says that its initialisation API
is not thread-safe, nobody has ever reported an issue with it, so
avoiding the global init/deinit in the libsodium case is the least
risky option we have.

Tweetnacl users on Windows and on Linux with getrandom (glibc 2.25 and
Linux kernel 3.17 or later) are not affected by the original issue.

Fixes #2991
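
The distinction the description draws between the /dev/urandom path and the getrandom() path can be sketched roughly as follows. This is an illustration only, not the code in libzmq or Tweetnacl: the names `urandom_open`, `urandom_close`, `urandom_fill` and `getrandom_fill` are hypothetical, and the sketch assumes Linux with glibc 2.25 or later so that `<sys/random.h>` provides `getrandom()`. The point it tries to show is that the /dev/urandom variant carries a process-wide file descriptor that something has to open and close, whereas the getrandom() variant holds no state.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical process-wide state for the /dev/urandom path.
int urandom_fd = -1;

// Opens the shared descriptor if needed; returns false on failure.
bool urandom_open ()
{
    if (urandom_fd == -1)
        urandom_fd = open ("/dev/urandom", O_RDONLY);
    return urandom_fd != -1;
}

// Closes the shared descriptor; safe to call when it is not open.
void urandom_close ()
{
    if (urandom_fd != -1) {
        close (urandom_fd);
        urandom_fd = -1;
    }
}

// Fills buf from the shared descriptor; fails if it was never opened
// or has already been closed.
bool urandom_fill (unsigned char *buf, size_t len)
{
    assert (buf != NULL || len == 0);
    if (urandom_fd == -1)
        return false;
    size_t done = 0;
    while (done < len) {
        ssize_t n = read (urandom_fd, buf + done, len - done);
        if (n < 0 && errno == EINTR)
            continue; // interrupted by a signal, retry
        if (n <= 0)
            return false;
        done += static_cast<size_t> (n);
    }
    return true;
}

// Fills buf via the getrandom() syscall; no global state involved.
bool getrandom_fill (unsigned char *buf, size_t len)
{
    assert (buf != NULL || len == 0);
    size_t done = 0;
    while (done < len) {
        ssize_t n = getrandom (buf + done, len - done, 0);
        if (n < 0 && errno == EINTR)
            continue; // interrupted by a signal, retry
        if (n <= 0)
            return false;
        done += static_cast<size_t> (n);
    }
    return true;
}
```

In this sketch, calling `urandom_fill` after `urandom_close` fails, which is the kind of lifetime dependency that the getrandom() path does not have.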

Member Author

bluca commented Mar 19, 2018

@sigiesec this is what I meant: for libsodium, revert to the naive behaviour we had before my first change started breaking things, i.e. without any serialisation.

src/random.cpp Outdated
@@ -86,6 +86,13 @@ uint32_t zmq::generate_random ()
// order fiasco, this is done using function-local statics, if the
// compiler implementation supports thread-safe initialization of those.
// Otherwise, we fall back to global statics.
// HOWEVER, this initialisation code adds a race condition when an
Member

It is not a race condition (this has nothing to do with concurrency). Maybe:

HOWEVER, this initialisation code imposes ordering constraints, which are not obvious to users of libzmq, and may lead to problems if atexit or similar methods are used for cleanup.

Member Author

Thanks, updated
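
The ordering constraint discussed in this thread can be illustrated with a small model. It is a simplified sketch, not libzmq code: `exit_model_t`, `fake_lib_t` and `simulate` are hypothetical names, and an explicit LIFO stack stands in for the real runtime, which runs atexit callbacks and static-object destructors in reverse order of registration or construction. The model assumes the global random state is set up on first context creation and torn down by an exit-time handler queued at that moment.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Simplified model of process exit: handlers run in reverse
// registration order (last registered, first run).
class exit_model_t
{
  public:
    void at_exit (std::function<void ()> fn) { _handlers.push_back (fn); }

    void run ()
    {
        while (!_handlers.empty ()) {
            std::function<void ()> fn = _handlers.back ();
            _handlers.pop_back ();
            fn ();
        }
    }

  private:
    std::vector<std::function<void ()> > _handlers;
};

// Hypothetical stand-in for the library. The first ctx_new () creates
// the global random state and queues its teardown for exit time.
struct fake_lib_t
{
    explicit fake_lib_t (exit_model_t &model) :
        exit_model (model), random_ready (false), teardown_queued (false)
    {
    }

    void ctx_new ()
    {
        if (!teardown_queued) {
            random_ready = true;
            teardown_queued = true;
            exit_model.at_exit ([this] () {
                random_ready = false;
                log.push_back ("random state destroyed");
            });
        }
    }

    // Terminating a context needs the random state to still exist.
    void ctx_term ()
    {
        log.push_back (random_ready
                         ? "ctx_term ok"
                         : "ctx_term after random state destroyed");
    }

    exit_model_t &exit_model;
    bool random_ready;
    bool teardown_queued;
    std::vector<std::string> log;
};

// Returns the event log of one simulated program run.
std::vector<std::string> simulate (bool register_cleanup_first)
{
    exit_model_t exit_model;
    fake_lib_t lib (exit_model);
    if (register_cleanup_first) {
        // Application registers its cleanup, then creates the context.
        exit_model.at_exit ([&lib] () { lib.ctx_term (); });
        lib.ctx_new ();
    } else {
        // Context first, cleanup registration second.
        lib.ctx_new ();
        exit_model.at_exit ([&lib] () { lib.ctx_term (); });
    }
    exit_model.run ();
    assert (lib.log.size () == 2); // one teardown event, one ctx_term event
    return lib.log;
}
```

With `simulate (true)` the teardown runs before the application's cleanup, so the cleanup finds the random state already gone; with `simulate (false)` the cleanup runs first. The first case corresponds to an application that registers its cleanup with atexit before creating a context.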


#elif defined(ZMQ_USE_LIBSODIUM)
if (init) {
int rc = sodium_init ();
Member

Since this does not use the refcount, this means that sodium_init and randombytes_close will be called for each context. Is that what you meant?

Member Author

Yes - that is how it worked before my initial change. They were called inline in the Context class constructor and destructor.

In reality I think randombytes_close is mostly a no-op in libsodium, and nobody ever reported a problem with the multiple sodium_init calls either (with libsodium again).
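
The difference being confirmed here, refcounted global init/deinit versus plain per-context calls, can be sketched as below. This is a minimal illustration under stated assumptions, not the code in src/random.cpp: `backend_init` and `backend_close` are hypothetical stubs standing in for `sodium_init ()` and `randombytes_close ()`, and they only count how often they are called.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical backend stubs; they only count invocations.
int backend_init_calls = 0;
int backend_close_calls = 0;
void backend_init () { ++backend_init_calls; }
void backend_close () { ++backend_close_calls; }

// Variant A: refcounted and serialised. The backend is initialised by
// the first user and closed by the last one.
std::mutex random_mutex;
int random_refcount = 0;

void random_open_refcounted ()
{
    std::lock_guard<std::mutex> lock (random_mutex);
    if (random_refcount++ == 0)
        backend_init ();
}

void random_close_refcounted ()
{
    std::lock_guard<std::mutex> lock (random_mutex);
    assert (random_refcount > 0);
    if (--random_refcount == 0)
        backend_close ();
}

// Variant B: no refcount and no lock. Every context calls the backend
// directly, once on construction and once on destruction.
void random_open_per_context () { backend_init (); }
void random_close_per_context () { backend_close (); }
```

Opening and closing three contexts with variant A calls each stub once, while variant B calls each stub three times, which matches the behaviour described for the libsodium case in this thread.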

Member

Ok, just wanted to be sure.

Member

Adding a clarifying correction for anyone who comes here in the future: this is only a no-op if getrandom is unavailable (already excluded in the above condition). Otherwise, libsodium has all the same thread-safety issues and requirements as tweetnacl here, so this PR causes crashes.
