Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unchecked vector pre-allocation #151

Closed
dbrgn opened this issue Nov 21, 2017 · 22 comments · Fixed by #221
Closed

Unchecked vector pre-allocation #151

dbrgn opened this issue Nov 21, 2017 · 22 comments · Fixed by #221

Comments

@dbrgn
Copy link
Contributor

dbrgn commented Nov 21, 2017

When playing around with afl, I found the following example:

extern crate rmpv;

use std::io::Cursor;

fn main() {
    let data = [219, 175, 142, 142, 201, 219, 128, 0, 50, 175, 142, 196, 100, 212, 185];
    let mut cursor = Cursor::new(data);
    let decoded = rmpv::decode::value::read_value(&mut cursor);
    println!("Done: {:?}", decoded);
}

It takes almost 200ms in release mode and 2min 47s in debug mode to run.

@dbrgn
Copy link
Contributor Author

dbrgn commented Nov 21, 2017

Note that – at least in debug mode – the CPU usage is at 100% on one core and memory consumption is steadily increasing to a few gigabytes.

@dbrgn
Copy link
Contributor Author

dbrgn commented Nov 21, 2017

Sorry for the triple post. If I understand the msgpack spec correctly, this is because the data indicates that a raw str buffer with 2.8 GiB of data is following.

The decoder should probably abort after having reached the end of the input data, right? Or is this an issue with the Cursor?

@3Hren
Copy link
Owner

3Hren commented Nov 21, 2017

Yep, there is vector pre-allocation to fit the specified size (which is 2945355465 bytes) before actual reading. This is definitely a bug.

@dbrgn dbrgn changed the title Slow decoding Unchecked vector pre-allocation Mar 8, 2018
@stusmall
Copy link

stusmall commented Mar 6, 2019

I took a quick look at fixing this. Looks like it should be easy to fix once rust-lang/rust#48043 stabilizes

@dbrgn
Copy link
Contributor Author

dbrgn commented Mar 6, 2019

The question is whether we should even try to allocate that much memory, or whether a heuristic should be used to check whether there will actually be data in the input buffer to fill that memory...

E.g. if the remaining input data (in a non-streaming parser) is smaller than the requested buffer size, then something's probably wrong.

@huitseeker
Copy link

This issue came up in the recent audit of the dalek libraries by Quarkslab.

Is it worth opening a RustSec report?

@dbrgn
Copy link
Contributor Author

dbrgn commented Aug 27, 2019

Let's ask @tarcieri 🙂 It does have potential for denial-of-service.

@tarcieri
Copy link

This particular vulnerability has come up enough times in my life I'd appreciate an advisory for it 😉

@Shnatsel
Copy link

Shnatsel commented Oct 1, 2019

We should probably request a Clippy lint for this kind of bug. It's a fairly common pattern and is very easy to detect, although you cannot be 100% sure it's a bug without taint analysis.

@dbrgn
Copy link
Contributor Author

dbrgn commented Oct 1, 2019

@Shnatsel how would that work? Vec::with_capacity(param) certainly has its uses.

@Shnatsel
Copy link

Shnatsel commented Oct 1, 2019

Where param is not a constant and greater than u16, warn that it may exhaust memory. Sadly this will lead to false positives so we can only make this a lint that's not enabled by default.

https://github.com/facebookexperimental/MIRAI should be able to detect such cases much more reliably.

@dbrgn
Copy link
Contributor Author

dbrgn commented Oct 1, 2019

Ok, but in this case the param is not constant, it's part of the data. The problem isn't even mainly the memory exhaustion, it's the fact that 2 GiB are allocated even when the serialized data is only 20 bytes long.

@3Hren do you think aborting with an error when [requested allocation size > total (or remaining) serialized data size] is a good fix for this?

@3Hren
Copy link
Owner

3Hren commented Oct 2, 2019

I think this would be enough.

@dbrgn
Copy link
Contributor Author

dbrgn commented Oct 2, 2019

Here's another minimal example:

let buf: &[u8] = &[
  // bin32 type
  0xc6,
  // 0xffffffff bytes (4096 GiB) of data will follow
  0xff, 0xff, 0xff, 0xff,
  // only 1 byte of data
  0x00,
];
read_value(&mut &buf[..]).unwrap();

Regarding the fix:

@3Hren do you think aborting with an error when [requested allocation size > total (or remaining) serialized data size] is a good fix for this?

I think this would be enough.

I tried to fix it, but then realized that the rmpv library always deals with R: Read types, which means that we do not know the length of the available data in advance.

I see two possibilities for an "easy" fix:

  • Only preallocate vectors up to a certain arbitrary size like 10 MiB (e.g. Vec::with_capacity(min(len, 10_485_760)))
  • Don't preallocate vectors at all

The first option would still have the advantage that all messages smaller than that limit don't require vector resizing, but I don't like the arbitrary limit that much. Limit could also be higher (e.g. 100 MiB), but it's always arbitrary.

Any opinions?

@kornelski
Copy link
Collaborator

Limiting capacity to some arbitrary size is OK IMHO, as it doesn't affect correctness in any way, and it's only a performance optimization.

Is it possible to amplify this attack by using something like an array to allocate n oversized vecs, or does it work only at the end of the stream?

@skrap
Copy link

skrap commented Oct 2, 2019

Could this be a premature optimization? Is there any measurable performance gain from preallocating? If not, perhaps just let Vec grow, with whatever algorithm it chooses, as new bytes from Read become available. This issue goes away in that case.

@dbrgn
Copy link
Contributor Author

dbrgn commented Oct 2, 2019

Is it possible to amplify this attack by using something like an array to allocate n oversized vecs, or does it work only at the end of the stream?

Yeah, that might be possible. Good point.

Could this be a premature optimization? Is there any measurable performance gain from preallocating?

I wondered the same and wanted to do a benchmark, but it seems that the current benchmarks in the codebase don't work (usage of undefined variables and other things like that) so I didn't bother with that so far.

I agree that we should probably just stop preallocating any vectors if we have no way of knowing the following data. The Rust compiler already deals with properly optimized algorithms to grow vectors.

@dbrgn
Copy link
Contributor Author

dbrgn commented Oct 2, 2019

Note: As an optimization there could also be a parallel API that takes a byte slice instead of a R: Read type, since then we could detect this problem by taking a look at the remaining length of the slice. Then we could pre-allocate safely (of course only if this brings any reasonable performance benefits).

@kornelski
Copy link
Collaborator

Is it possible to amplify this attack by using something like an array to allocate n oversized vecs, or does it work only at the end of the stream?

Yeah, that might be possible.

Do you have an example, or could describe how it can be done?

@dbrgn
Copy link
Contributor Author

dbrgn commented Oct 3, 2019

@kornelski this is the spec for arrays with up to 2**32-1 entries:

array 32 stores an array whose length is upto (2^32)-1 elements:
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~~+
|  0xdd  |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|    N objects    |
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~~+

Note that the following things are all untested.

If we write 0xdd 0x00 0x0f 0x42 0x40 ... (not sure if I got the endianness correct) that would specify an array of 1 million elements.

While the type of the elements is not known in advance (IIRC), it will take at least a byte per element. I guess that's why rmpv preallocates 1 million bytes (1 MiB) of RAM:

fn read_array_data<R: Read>(rd: &mut R, mut len: usize) -> Result<Vec<Value>, Error> {
    let mut vec = Vec::with_capacity(len);
    ...

Now the library will iterate over the values. If the first element in that array is also an array with 1 million entries, then that will result in another Vec of ~1 MiB being allocated (so 2 in total).

This can be nested. I'm not sure if there's a max recursion depth, but of course if you nest 100 arrays that all allocate 100 MiB of space, then that will already require 10 GiB of RAM, without any single vector exceeding that 100 MiB size.

It's not exponential, since it will error as soon as the parser notices that there's not actually sufficient data to fill all those arrays. But it's still possible to do a DoS even if we add a preallocation limit.

I'll create a PR that replaces the with_capacity calls with new. If someone has a better idea, we can still add another fix later on. But I don't see how preallocating without any knowledge of the remaining data size will work safely for untrusted input.

Edit: With this example:

#[test]
fn invalid_buf_size_arr() {
    // This invalid buffer requests a nested array of depth 10.
    // All arrays contain the maximum possible number of elements.
    // If a byte is preallocated for every array content,
    // that would require 40 GiB of RAM.
    let buf: &[u8] = &[
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
        0xdd, 0xff, 0xff, 0xff, 0xff,
    ];
    let result = read_value(&mut &buf[..]);
}

...the code SIGABRTs at runtime with the message memory allocation of 240518168520 bytes failed

dbrgn added a commit to dbrgn/msgpack-rust that referenced this issue Oct 3, 2019
@daboross
Copy link
Collaborator

daboross commented Oct 3, 2019

I agree that we should probably just stop preallocating any vectors if we have no way of knowing the following data. The Rust compiler already deals with properly optimized algorithms to grow vectors.

It has properly optimized algorithms, but the best algorithms still involve log_2(n) allocations.

I'd still think that pre-allocating under some limit is better than not pre-allocating at all. If we pre-allocate under some arbitrary size, then arrays over that limit decode slower than arrays under it. But code with pre-allocation, even with an arbitrary cap, will always be faster than code without it?

serde's Deserialize implementations for Vec, HashSet, etc. cap out pre-allocation at 4096 elements. Perhaps we could do the same?

dbrgn added a commit to dbrgn/msgpack-rust that referenced this issue Oct 8, 2019
dbrgn added a commit to dbrgn/msgpack-rust that referenced this issue Oct 8, 2019
dbrgn added a commit to dbrgn/msgpack-rust that referenced this issue Oct 8, 2019
@kornelski
Copy link
Collaborator

Fix released in rmpv v0.4.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants