Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenSSL: Make stream liveness check non-blocking #3223

Closed
wants to merge 3 commits into from

Conversation

valga
Copy link
Contributor

@valga valga commented Apr 23, 2018

The problem: stream_get_contents() blocks on SSL-enabled stream is some cases when the stream is in non-blocking mode. For example, it happens when SSL packets are fragmented heavily, so the stream liveness check blocks until whole packet is received. It also causes high CPU usage because of read() calls in a loop without poll or delay.

strace:

read(131, "\27\3\3\0\235", 5)           = 5
read(131, "Nr\243\242\243V\205'\234bE\306", 157) = 12
read(131, 0x55c877eba524, 145)          = -1 EAGAIN (Resource temporarily unavailable)
read(131, 0x55c877eba524, 145)          = -1 EAGAIN (Resource temporarily unavailable)
... 600+ identical lines skipped ...
read(131, 0x55c877eba524, 145)          = -1 EAGAIN (Resource temporarily unavailable)
read(131, 0x55c877eba524, 145)          = -1 EAGAIN (Resource temporarily unavailable)
read(131, "\205\223\274\277 B\35V\231\325\373\360\37&J\6V\367\264D\357h67\276\235\324\270\305\205v\262"..., 145) = 36
read(131, 0x55c877eba548, 109)          = -1 EAGAIN (Resource temporarily unavailable)
read(131, 0x55c877eba548, 109)          = -1 EAGAIN (Resource temporarily unavailable)
... 800+ identical lines skipped ...
read(131, 0x55c877eba548, 109)          = -1 EAGAIN (Resource temporarily unavailable)
read(131, 0x55c877eba548, 109)          = -1 EAGAIN (Resource temporarily unavailable)
read(131, "\305q\214\205G\252\372\336\271\2\34\253\3625j\375nJ\315\vNL-\202+\351S$\37\240EZ"..., 109) = 72
read(131, 0x55c877eba590, 37)           = -1 EAGAIN (Resource temporarily unavailable)
read(131, 0x55c877eba590, 37)           = -1 EAGAIN (Resource temporarily unavailable)
... 1300+ identical lines skipped ...
read(131, 0x55c877eba590, 37)           = -1 EAGAIN (Resource temporarily unavailable)
read(131, 0x55c877eba590, 37)           = -1 EAGAIN (Resource temporarily unavailable)
read(131, "l\\\321$\223\226\32\243\356\312\223\267XLo\0325(\357d\6\305\337\261'\330|\3560\25\334F"..., 37) = 36
read(131, 0x55c877eba5b4, 1)            = -1 EAGAIN (Resource temporarily unavailable)
read(131, 0x55c877eba5b4, 1)            = -1 EAGAIN (Resource temporarily unavailable)
... 500+ identical lines skipped ...
read(131, 0x55c877eba5b4, 1)            = -1 EAGAIN (Resource temporarily unavailable)
read(131, 0x55c877eba5b4, 1)            = -1 EAGAIN (Resource temporarily unavailable)
read(131, "\253", 1)                    = 1
poll([{fd=131, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout)
read(131, 0x55c877eba513, 5)            = -1 EAGAIN (Resource temporarily unavailable)

backtrace:

0x00007fc8371d5e60 in __errno_location () from /lib64/libc.so.6
#0  0x00007fc8371d5e60 in __errno_location () from /lib64/libc.so.6
#1  0x00007fc837692fb9 in BIO_sock_should_retry () from /lib64/libcrypto.so.10
#2  0x00007fc837693027 in sock_read () from /lib64/libcrypto.so.10
#3  0x00007fc83769103b in BIO_read () from /lib64/libcrypto.so.10
#4  0x00007fc837a023e4 in ssl3_read_n () from /lib64/libssl.so.10
#5  0x00007fc837a03da0 in ssl3_read_bytes () from /lib64/libssl.so.10
#6  0x00007fc837a006f4 in ssl3_read_internal () from /lib64/libssl.so.10
#7  0x000055bb89ff9507 in php_openssl_sockop_set_option ()
#8  0x000055bb8a14079d in _php_stream_set_option ()
#9  0x000055bb8a14085c in _php_stream_eof ()
#10 0x000055bb8a140a78 in _php_stream_copy_to_mem ()
#11 0x000055bb8a1217b1 in zif_stream_get_contents ()
#12 0x000055bb8a22ed92 in ZEND_DO_FCALL_BY_NAME_SPEC_RETVAL_USED_HANDLER ()
... skipped ...

@nikic
Copy link
Member

nikic commented Apr 26, 2018

@DaveRandom @kelunik @bwoebi I have vague memories of you discussion stuff in this area at some point?

@DaveRandom
Copy link
Contributor

In general, I don't think it makes sense to use stream_get_contents() on a non-blocking stream, since it is a blocking call that's essentially sugar over an fread() loop.

I haven't had to time to fully consider the ramifications of this patch, but my gut says that this is going to cause problems in other other scenarios with non-blocking SSL streams.

Regardless, I think there is a better solution here in order to make the use case work reliably, which is to simply switch the stream to blocking mode at the start of the stream_get_contents() routine and then switch it back again afterwards, rather than changing the way that lower level stream operations work.

Unless I'm missing something?

@kelunik
Copy link
Member

kelunik commented Apr 27, 2018

@DaveRandom We use stream_get_contents() on non-blocking streams all the time: https://github.com/amphp/byte-stream/blob/24b2530ec74ea4964a548235806744709cc9a650/lib/ResourceInputStream.php#L74

Changing the stream into blocking mode here breaks at least Amp, probably others.

@valga
Copy link
Contributor Author

valga commented Apr 27, 2018

@DaveRandom stream_get_contents() is a syntax sugar for while (!feof()) {fread();}, but feof() doesn't block even on blocking streams, so why should it block on SSL non-blocking streams?

@kelunik
Copy link
Member

kelunik commented Apr 27, 2018

@valga That's not what it does for non-blocking streams. It returns before the stream is at EOF.

@valga
Copy link
Contributor Author

valga commented Apr 27, 2018

@kelunik Nope, when SSL packet is incomplete feof() goes into endless do SSL_PEEK(); while (1); loop until whole packet is received. And then it returns.

Just look at the bt and strace I provided.

@bukka
Copy link
Member

bukka commented Apr 27, 2018

@valga I see the issue but not sure about the solution. Basically what you are doing is returning PHP_STREAM_OPTION_RETURN_ERR when SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE (I think that php_socket_errno() does not return EAGAIN in such case so alive will be 0) which basically saying that the stream is not alive and it means that _php_stream_eof also marks stream as eof.

I think that it should always return that the stream is alive when the SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE. It means you should keep alive as 1 IMO. It shouldn't block then as php_stream_read should return 0 so it will leave the loop in _php_stream_copy_to_mem.

Also this needs a test before it can be merged - see https://github.com/php/php-src/blob/master/ext/openssl/tests/bug72333.phpt for an example how to make a non blocking test (filling kernel buffer so it doesn't return everything in one go).

@DaveRandom I think it should be ok to merge it to the master if the above is fixed. The liveness doesn't need to block if SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE is returned as it should mean that the stream is still alive IMHO. But I might be missing something so it needs a bit more checking and testing though. We shouldn't definitely back port it as it could break some apps that relies on the fact that stream_get_contents is blocking atm.

@valga
Copy link
Contributor Author

valga commented Apr 27, 2018

@bukka It turned out that errno isn't always EAGAIN for SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE indeed, I fixed that.

I've tested re-negotiation scenario: the patch survives it. As for buffer underflow, it's hard to test it without packet fragmentation. Your suggestion to flood a socket with data doesn't work, because SSL_peek() always returns 1.

@valga
Copy link
Contributor Author

valga commented Apr 29, 2018

@bukka I added a test that simulates packets fragmentation with a proxy that splits application data (0x17) packets in 3 parts.

@bukka
Copy link
Member

bukka commented Apr 29, 2018

@valga Thanks for the fix and the test. I really like the introduction of a proxy to the test which can be useful for other non-blocking tests in the future. I will be quite busy next week but the week after I will try to find some time to test it.

@DaveRandom
Copy link
Contributor

After playing about with this and thinking about it some more I am +1 on this patch, I was looking at it from the wrong end in my initial comments.

👍

@DaveRandom
Copy link
Contributor

The only other thing I would say is that this change means that the code can be simplified, as the loop is now redundant and would never cycle, so the do-while wrapper can be removed completely.

@valga
Copy link
Contributor Author

valga commented May 2, 2018

Hmm, the problem here is bigger than I thought.

I was interested why Amp uses stream_get_contents() instead of fread() so I found this issue https://github.com/amphp/artax/issues/138. The patch makes stream_get_contents() version hang as well as fread() one with 3rd party loop extensions.

The problem here is SSL_peek() does not really peek: it reads a data from a socket and places it into an internal buffer for future use. In other words, it makes the socket empty.

Consider following code:

$chunk = fread($fp, $chunkSize);
if ($chunk === '' && feof($fp)) {
    // handle EOF
}

Now imagine that SSL packet arrives in 2 fragments with a tiny delay between them: first one is being consumed by SSL_read() in fread() and second one goes into SSL_peek() in feof(). The socket is now empty, so it "hangs" with 3rd party loop extensions until more data arrives or the socket is closed by other side.

stream_select() does not "hang" because of SSL_pending():

&& (pending = (size_t)SSL_pending(sslsock->ssl_handle)) > 0) {
it marks stream as readable when internal SSL buffer is not empty.

@valga
Copy link
Contributor Author

valga commented May 2, 2018

@bukka @DaveRandom I think we should remove SSL_peek() from stream liveness check, because using eof flag is enough.

@kelunik @bwoebi You might be interested in that as well, because using stream_get_contents() is just a hack that doesn't resolve the issue completely.

@bukka
Copy link
Member

bukka commented May 2, 2018

@valga Couldn't that break feof as it identifies the end of stream by the liveness check:
feof -> _php_stream_eof -> php_openssl_sockop_set_option (liveness check...).
If you don't do SSL_peek, then how would you identify the end of stream?

I think it might be better solution to expose SSL_pending. Of course in a generic stream way so something like stream_pending maybe..?

@valga
Copy link
Contributor Author

valga commented May 2, 2018

@bukka In most cases you are using feof() with fread() (like in snippet above) and fread() already sets eof flag in case of unrecoverable read errors:

php-src/ext/openssl/xp_ssl.c

Lines 2055 to 2073 in 6400264

/* If we didn't do anything on the last loop (or an error) check to see if we should retry or exit. */
if (nr_bytes <= 0) {
/* Get the error code from SSL, and check to see if it's an error or not. */
int err = SSL_get_error(sslsock->ssl_handle, nr_bytes );
retry = php_openssl_handle_ssl_error(stream, nr_bytes, 0);
/* If we get this (the above doesn't check) then we'll retry as well. */
if (errno == EAGAIN && err == SSL_ERROR_WANT_READ && read) {
retry = 1;
}
if (errno == EAGAIN && err == SSL_ERROR_WANT_WRITE && read == 0) {
retry = 1;
}
/* Also, on reads, we may get this condition on an EOF. We should check properly. */
if (read) {
stream->eof = (retry == 0 && errno != EAGAIN && !SSL_pending(sslsock->ssl_handle));
}

The main difference between them is that feof() ignores SSL_pending(), so it marks the stream as EOF even it has some unprocessed data in SSL buffer.

As for SSL_pending() exposure, it requires all 3rd party loop extensions to update, or am I wrong?

@bukka
Copy link
Member

bukka commented May 2, 2018

Yeah I know it's set by fread if it reads 0 bytes. However I think it would break the case when fread reads the whole buffer and then feof is done. For example if client writes only 256 bytes, then doing something like this in server:

fread($fp, 256);
var_dump(feof($fp));

would result in a different output unless I miss something. If that's the case, then it would be a BC break including the blocking socket so it would be really a no-go. Also it really should be consistent with standard xp_socket - see

ret = recv(sock->socket, &buf, sizeof(buf), MSG_PEEK);
.

If the 3rd party extensions deal directly with the php streams, then they should be probably updated I guess. Alternatively it could be done in the PHP code before doing poll on read.

@valga
Copy link
Contributor Author

valga commented May 2, 2018

@bukka Reading 0 bytes is not enough for EOF. There must be also some error different from EAGAIN (the logic is almost identical with stream liveness check by the way).

Quite unfortunately we can't peek with SSL, but we can use SSL_get_shutdown() to check whether a connection has been shut down properly by other end + peek at the underlying transport level to check whether a socket itself is still alive.

@bukka
Copy link
Member

bukka commented May 2, 2018

I need to check what OpenSSL does exactly but I don't really think we should be doing this kind of hack (basically trying to recreate what OpenSSL does already after SSL_read / SSL_peak) in the openssl ext just to make other 3rd party extension work. If those extensions want to work with streams, they should explicitly check if there are any pending bytes in the stream as stream_select does. That's why stream_pending is much better solution IMO. I think having such function would be useful anyway. In any case the change like this would go to 7.3 earliest so I don't see why 3rd party extensions couldn't integrate it.

Going forward it would be great to extend stream_select to support epoll so there is not much reason why to use 3rd party event loop extensions with streams and we don't have to care about them.

@kelunik
Copy link
Member

kelunik commented May 2, 2018

@valga TLS can be enabled / disabled at any time without closing the connection. I think currently there's no way to determine whether TLS has been properly shutdown, which makes truncation attacks possible. These are usually detected in case of HTTP, because HTTP has it's own request framing, but that might not be possible for other protocols.

Instead if offering separate stream_* methods, it's better to fix it properly for the long run. Maybe provide TLS as a separate functionality on top of the stream functions. I'm not sure how a proper API would look like, but another thing we need to expose is SSL_get_error or whatever that things is called, because a failing read might be because the TLS layer says WANT_WRITE in case of a renegotiation.

@valga
Copy link
Contributor Author

valga commented May 2, 2018

@bukka Considering typical usage pattern fread();...;feof(); I see no reason for double check by peeking at a stream: either fread() set eof flag because of unrecoverable error / EOF reached or it just says that the stream has not enough data for a complete read.

Anyway, the patch does what's advertised, but it breaks apps that rely on feof() and stream_get_contents() block on non-blocking SSL streams sometimes.

@valga
Copy link
Contributor Author

valga commented May 4, 2018

@kelunik As far as I can see, encryption can be disabled only from PHP side. Doing that from the other side results in EOF because IO fails and SSL_get_shutdown is never called to determine encryption state and unset ssl_active flag.

@DaveRandom
Copy link
Contributor

@valga Either party can initiate renegotiation at any time, regardless of the implementation details, that is simply a property of the underlying protocols.

It is this fact that exposes an (as yet unresolved) problem/hole in the PHP streams + SSL/TLS abstraction, in that if the remote party has initiated a renegotiation then there is no way to reliably differentiate between this and the remote party gracefully closing the socket. Specifically, the condition that for a raw TCP connection would indicate a remote shutdown (i.e. socket polls as readable, fread() returns an empty string) will occur both when the client has closed the socket and while a renegotiation is in progress. As it stands, there's no way for userland to check whether the reason is WANT_READ/WANT_WRITE (renegotiation in progress) or ZERO_RETURN (initiated shutdown).

In many ways this is a separate problem, but what it absolutely does mean is that checking the "liveness" of an SSL stream must call a function that will progress any pending renegotiation (such as SSL_peek(), although there may be an alternative that could be used here) and report the state in order to accurately determine the result.

@kelunik
Copy link
Member

kelunik commented May 4, 2018

@valga I think the current implementation for disabling is broken, at least for non-blocking sockets.

@valga
Copy link
Contributor Author

valga commented May 4, 2018

@DaveRandom Do you think it is ok to call SSL_peek() after SSL_read() already returned WANT_READ/WANT_WRITE? As for difference between renegotiation and remote shutdown, feof() returns false for renegotiation and true for shutdown, at least atm.

@kelunik I think it is broken also for blocking sockets, because SSL_shutdown() doesn't wait for "close notify" from the other side.

@DaveRandom
Copy link
Contributor

DaveRandom commented May 5, 2018

@valga in theory that should be true but in practice I don't think it always works like that, and I'm absolutely certain that feof() is not currently guaranteed to be non-blocking. This patch would at least partially resolve that - the reason it's currently potentially blocking is the loop - but I'm still not certain it would completely resolve it, and I'm also not certain it wouldn't cause issues somewhere else. However, I may be overthinking it.

More pertinent, however, is that it is at least in theory necessary to distinguish between WANT_READ and WANT_WRITE, as this indicates whether the program should watch the stream for readability or writability, and while this can be dealt with by watching for both, this approach could result in a busy-wait situation. In practice WANT_WRITE will rarely occur, but (IMHO) any patch which addresses these issues should address all of them rather than being a partial fix.

None of this is to say that I disapprove of this patch in isolation, as long as we recognize that it is isn't addressing the issues outlined above.

@kelunik
Copy link
Member

kelunik commented Jan 2, 2019

#3729 is a duplicate and basically the same patch. Let's get this in, please. Bug report is https://bugs.php.net/bug.php?id=77390.

@php-pulls
Copy link

Comment on behalf of bukka at php.net:

Updated version with integrated proxy test in #3752

@php-pulls php-pulls closed this Jan 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants