Decompression of "deflate" encoded content fails for "zlib" format #132

Open
atifaziz opened this issue Apr 17, 2020 · 0 comments
Labels
info For your information

Comments

@atifaziz
Owner

I ran into the same problem that @tmenier reported for httpbin when (initially) adding the compression sample using https://httpbin.org/deflate.

It seems that the problem lies with .NET Core/Framework, which incorrectly uses DeflateStream (a decoder for raw "deflate" data) even though the standard states (in section 4.2.2 of RFC 7230) that a content encoding value of deflate actually means the "zlib" data format: a header followed by "deflate" data when the compression method (CM) field of the header is 8. Sadly, the same section also states:

Note: Some non-conformant implementations send the "deflate" compressed data without the zlib wrapper.
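To make the "header + deflate data" structure concrete, here is a small sketch in Python, with the standard `zlib` module standing in for .NET (`parse_zlib_header` is a hypothetical helper, not part of any library). It splits the two-byte zlib header defined in RFC 1950 into its fields and shows that stripping that header and the 4-byte Adler-32 trailer leaves plain "deflate" data:

```python
import zlib


def parse_zlib_header(data: bytes) -> dict:
    """Split the 2-byte zlib header (RFC 1950) into its fields."""
    if len(data) < 2:
        raise ValueError("a zlib header needs at least 2 bytes")
    cmf, flg = data[0], data[1]
    return {
        "CM": cmf & 0x0F,                          # compression method: 8 means deflate
        "CINFO": cmf >> 4,                         # log2(window size) - 8, at most 7
        "FCHECK_ok": (cmf * 256 + flg) % 31 == 0,  # header check bits
        "FDICT": bool(flg & 0x20),                 # preset dictionary present?
        "FLEVEL": flg >> 6,                        # compression level hint
    }


payload = b"hello, httpbin " * 4
zlib_data = zlib.compress(payload)  # zlib format: header + deflate data + Adler-32

header = parse_zlib_header(zlib_data)
print(header["CM"], header["FCHECK_ok"])  # 8 True

# Without the 2-byte header and 4-byte trailer, what remains is raw deflate data,
# which zlib decodes when given a negative wbits value.
print(zlib.decompress(zlib_data[2:-4], -zlib.MAX_WBITS) == payload)  # True
```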

However, all hope is not lost, as it is trivial to distinguish between the "deflate" and "zlib" formats just by looking at the low nibble of the first byte, as pointed out in this brilliant observation (an answer on Stack Overflow) by none other than @madler himself:

If the first byte in hex has a low nybble of 8, then it is a zlib stream. Otherwise it is a raw deflate stream. (Assuming that you know a priori that the only possible choices are a valid zlib stream or a valid deflate stream.) A raw deflate stream will never have an 8 in the low first nybble, but a zlib stream always will.

Background:

The zlib header format puts the compression method in the low nybble of the first byte. That compression method is always 8 for deflate.

The bit sequence in a raw deflate stream starts from the least significant bits of the bytes. If the first three bits are 000 (as they are for an 8), that signifies a stored (not compressed block), and it is not the last block. Stored blocks put the bytes of the input on byte boundaries. So the next thing that is done by the compressor after writing the 000 bits is to fill out the rest of the byte with zero bits to get to the next byte boundary. Therefore the next bit will never be a 1, so it is not possible for a valid deflate stream to have the first four bits be 1000, or the first nybble to be 8. (Note that the bits are read from the bottom up.)

The first (i.e. low) nybble of a valid deflate stream can only be 0..5 or a..d. If you see 6..9, e, or f, then it is not a valid deflate stream.

Mark Adler, May 30, 2016 (CC BY-SA 4.0)
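The argument above can be checked mechanically. This Python sketch (the function name is illustrative) walks all 16 possible low nybbles of a raw deflate stream's first byte, treating bit 0 as BFINAL and bits 1-2 as BTYPE per RFC 1951, and, following the quoted reasoning, assumes a stored block pads the rest of its first byte with zero bits:

```python
import zlib


def possible_raw_deflate_nybbles() -> list[int]:
    """Low nybbles a valid raw deflate stream can start with (RFC 1951 block header)."""
    valid = []
    for n in range(16):
        # Bit 0 (BFINAL) may be 0 or 1, so it does not constrain anything.
        btype = (n >> 1) & 0b11  # bits 1-2: 00 stored, 01 fixed, 10 dynamic, 11 reserved
        bit3 = (n >> 3) & 1      # first bit after the 3-bit block header
        if btype == 0b11:
            continue             # reserved block type: never valid
        if btype == 0b00 and bit3 == 1:
            continue             # stored block: the rest of the byte is zero padding
        valid.append(n)
    return valid


print([format(n, "x") for n in possible_raw_deflate_nybbles()])
# ['0', '1', '2', '3', '4', '5', 'a', 'b', 'c', 'd']
```

The result matches the quote: 8 never appears, whereas every zlib stream using the deflate method starts with a byte whose low nybble is 8 (0x78 being the most common).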

The misinterpretation of the Deflate Coding section of the HTTP specification has caused enough problems between implementations (especially for proxies) that "deflate" is generally discouraged; most browsers also put "deflate" last when listing the accepted content encodings. So while a solution could be provided by looking at the first byte to detect the actual format, it does not seem worth the effort (it is non-trivial to do in an unbuffered manner) for a few reasons:

  • Use of "deflate" encoding will diminish considerably over time (and possibly already has).
  • The fix really belongs in .NET, which should, at the very least, support the standard "deflate" encoding by using the "zlib" format and fail for non-conforming replies. It definitely should not have failed for https://httpbin.org/deflate.
  • A workaround can be implemented by users for specific cases of non-conforming replies.
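For anyone who does need such a workaround, here is a minimal sketch of the first-byte check, written in Python with the standard `zlib` module rather than .NET (all function names are hypothetical). It assumes, as the quote above does, that the body is either a valid zlib stream or a valid raw deflate stream. The streaming variant illustrates that only the first byte of the first non-empty chunk has to be inspected before choosing a decoder:

```python
import zlib
from typing import Iterable, Iterator

ZLIB_CM_DEFLATE = 8  # CM value in the low nybble of a zlib header's first byte


def _wbits_for(first_byte: int) -> int:
    """Pick zlib wbits: positive expects the zlib wrapper, negative means raw deflate."""
    if first_byte & 0x0F == ZLIB_CM_DEFLATE:
        return zlib.MAX_WBITS   # RFC 1950 zlib stream (what RFC 7230 specifies)
    return -zlib.MAX_WBITS      # non-conformant raw RFC 1951 deflate data


def decompress_http_deflate(body: bytes) -> bytes:
    """Decode a fully buffered "Content-Encoding: deflate" body of either flavor."""
    if not body:
        return body
    return zlib.decompress(body, _wbits_for(body[0]))


def stream_decompress(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Decode chunk by chunk; only the very first byte is needed to pick the format."""
    decoder = None
    for chunk in chunks:
        if decoder is None:
            if not chunk:
                continue        # wait for the first non-empty chunk
            decoder = zlib.decompressobj(_wbits_for(chunk[0]))
        yield decoder.decompress(chunk)
    if decoder is not None:
        yield decoder.flush()


if __name__ == "__main__":
    payload = b"hello from httpbin " * 20
    zlib_body = zlib.compress(payload)
    raw = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    raw_body = raw.compress(payload) + raw.flush()
    print(decompress_http_deflate(zlib_body) == payload)  # True
    print(decompress_http_deflate(raw_body) == payload)   # True
    print(b"".join(stream_decompress([raw_body[:1], raw_body[1:]])) == payload)  # True
```

Passing a negative `wbits` to Python's `zlib` selects raw deflate, while `zlib.MAX_WBITS` (15) expects the zlib header and verifies the Adler-32 trailer, so a zlib body that is wrongly routed to the raw decoder (or vice versa) fails loudly instead of producing garbage.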

Meanwhile, this issue shares my research thus far, as an FYI for users and a note to self.

@atifaziz atifaziz added the info For your information label Apr 17, 2020
@atifaziz atifaziz changed the title Decompression of "deflate" content fails if the server uses zlib format Decompression of "deflate" encoded content fails for "zlib" format Apr 17, 2020