-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unchecked vector pre-allocation #151
Comments
Note that – at least in debug mode – the CPU usage is at 100% on one core and memory consumption is steadily increasing to a few gigabytes. |
Sorry for the triple post. If I understand the msgpack spec correctly, this is because the data indicates that a raw str buffer with 2.8 GiB of data is following. The decoder should probably abort after having reached the end of the input data, right? Or is this an issue with the |
Yep, there is vector pre-allocation to fit the specified size (which is 2945355465 bytes) before actual reading. This is definitely a bug. |
I took a quick look at fixing this. Looks like it should be easy to fix once rust-lang/rust#48043 stabilizes |
The question is whether we should even try to allocate that much memory, or whether a heuristic should be used to check whether there will actually be data in the input buffer to fill that memory... E.g. if the remaining input data (in a non-streaming parser) is smaller than the requested buffer size, then something's probably wrong. |
This issue came up in the recent audit of the dalek libraries by Quarkslab. Is it worth opening a RustSec report? |
Let's ask @tarcieri 🙂 It does have potential for denial-of-service. |
This particular vulnerability has come up enough times in my life I'd appreciate an advisory for it 😉 |
We should probably request a Clippy lint for this kind of bug. It's a fairly common pattern and is very easy to detect, although you cannot be 100% sure it's a bug without taint analysis. |
@Shnatsel how would that work? |
Where https://github.com/facebookexperimental/MIRAI should be able to detect such cases much more reliably. |
Ok, but in this case the param is not constant, it's part of the data. The problem isn't even mainly the memory exhaustion, it's the fact that 2 GiB are allocated even when the serialized data is only 20 bytes long. @3Hren do you think aborting with an error when [requested allocation size > total (or remaining) serialized data size] is a good fix for this? |
I think this would be enough. |
Here's another minimal example: let buf: &[u8] = &[
// bin32 type
0xc6,
// 0xffffffff bytes (4096 GiB) of data will follow
0xff, 0xff, 0xff, 0xff,
// only 1 byte of data
0x00,
];
read_value(&mut &buf[..]).unwrap(); Regarding the fix:
I tried to fix it, but then realized that the I see two possibilities for an "easy" fix:
The first option would still have the advantage that all messages smaller than that limit don't require vector resizing, but I don't like the arbitrary limit that much. Limit could also be higher (e.g. 100 MiB), but it's always arbitrary. Any opinions? |
Limiting capacity to some arbitrary size is OK IMHO, as it doesn't affect correctness in any way, and it's only a performance optimization. Is it possible to amplify this attack by using something like an array to allocate n oversized vecs, or does it work only at the end of the stream? |
Could this be a premature optimization? Is there any measurable performance gain from preallocating? If not, perhaps just let |
Yeah, that might be possible. Good point.
I wondered the same and wanted to do a benchmark, but it seems that the current benchmarks in the codebase don't work (usage of undefined variables and other things like that) so I didn't bother with that so far. I agree that we should probably just stop preallocating any vectors if we have no way of knowing the following data. The Rust compiler already deals with properly optimized algorithms to grow vectors. |
Note: As an optimization there could also be a parallel API that takes a byte slice instead of a |
Do you have an example, or could describe how it can be done? |
@kornelski this is the spec for arrays with up to 2**32-1 entries:
Note that the following things are all untested. If we write While the type of the elements is not known in advance (IIRC), it will take at least a byte per element. I guess that's why rmpv preallocates 1 million bytes (1 MiB) of RAM: fn read_array_data<R: Read>(rd: &mut R, mut len: usize) -> Result<Vec<Value>, Error> {
let mut vec = Vec::with_capacity(len);
... Now the library will iterate over the values. If the first element in that array is also an array with 1 million entries, then that will result in another Vec of ~1 MiB being allocated (so 2 in total). This can be nested. I'm not sure if there's a max recursion depth, but of course if you nest 100 arrays that all allocate 100 MiB of space, then that will already require 10 GiB of RAM, without any single vector exceeding that 100 MiB size. It's not exponential, since it will error as soon as the parser notices that there's not actually sufficient data to fill all those arrays. But it's still possible to do a DoS even if we add a preallocation limit. I'll create a PR that replaces the Edit: With this example: #[test]
fn invalid_buf_size_arr() {
// This invalid buffer requests a nested array of depth 10.
// All arrays contain the maximum possible number of elements.
// If a byte is preallocated for every array content,
// that would require 40 GiB of RAM.
let buf: &[u8] = &[
0xdd, 0xff, 0xff, 0xff, 0xff,
0xdd, 0xff, 0xff, 0xff, 0xff,
0xdd, 0xff, 0xff, 0xff, 0xff,
0xdd, 0xff, 0xff, 0xff, 0xff,
0xdd, 0xff, 0xff, 0xff, 0xff,
0xdd, 0xff, 0xff, 0xff, 0xff,
0xdd, 0xff, 0xff, 0xff, 0xff,
0xdd, 0xff, 0xff, 0xff, 0xff,
0xdd, 0xff, 0xff, 0xff, 0xff,
0xdd, 0xff, 0xff, 0xff, 0xff,
];
let result = read_value(&mut &buf[..]);
} ...the code SIGABRTs at runtime with the message |
Fixes 3Hren#151 / RUSTSEC-2017-0006
It has properly optimized algorithms, but the best algorithms still involve log_2(n) allocations. I'd still think that pre-allocating under some limit is better than not pre-allocating at all. If we pre-allocate under some arbitrary size, then arrays over that limit decode slower than arrays under it. But code with pre-allocation, even with an arbitrary cap, will always be faster than code without it?
|
Fixes 3Hren#151 / RUSTSEC-2017-0006
Fixes 3Hren#151 / RUSTSEC-2017-0006
Fix released in rmpv v0.4.2 |
When playing around with afl, I found the following example:
It takes almost 200ms in release mode and 2min 47s in debug mode to run.
The text was updated successfully, but these errors were encountered: