Increase max I/O buffer size to 128 MiB, and DRY the constant in stdio_common tests #4580
Conversation
Although 16 KiB is too small, 128 MiB is probably too large, at least as a default value. I'd rather not have a few open files take up an entire gigabyte of RAM unless the user explicitly asks for a buffer size that large.
@Darksonn I don't think it does. As far as I can tell, this constant only specifies the hard maximum number of bytes that a single read/write operation will transfer; it isn't used as an allocation size. The only uses of the constant I could find treat it as a cap on the caller-supplied buffer. Unless I'm missing something and am completely off base ;) I've been hunting for evidence to the contrary but haven't been able to find it: nothing in the code allocates a buffer larger than what the caller asked for. With some added debugging and a quick test binary:

```diff
diff --git a/tokio/src/io/blocking.rs b/tokio/src/io/blocking.rs
index 289c32ca..9d431b40 100644
--- a/tokio/src/io/blocking.rs
+++ b/tokio/src/io/blocking.rs
@@ -241,6 +241,8 @@ impl Buf {
         unsafe {
             self.buf.set_len(len);
         }
+
+        dbg!(self.buf.capacity());
     }

     pub(crate) fn read_from<T: Read>(&mut self, rd: &mut T) -> io::Result<usize> {
```

```rust
use tokio::fs::File;
use tokio::io::AsyncReadExt;

#[tokio::main]
async fn main() {
    println!("+++ Buffer size 4k");
    let mut buf = [0u8; 4 * 1024];
    let mut f = File::open("test.dat").await.unwrap();
    let count = f.read(&mut buf).await.unwrap();
    println!("--- Read {}\n", count);

    println!("+++ Buffer size 16k");
    let mut buf = [0u8; 16 * 1024];
    let mut f = File::open("test.dat").await.unwrap();
    let count = f.read(&mut buf).await.unwrap();
    println!("--- Read {}\n", count);

    println!("+++ Buffer size 1M");
    let mut buf = vec![0u8; 1 * 1024 * 1024];
    let mut f = File::open("test.dat").await.unwrap();
    let count = f.read(&mut buf).await.unwrap();
    println!("--- Read {}\n", count);

    println!("+++ Buffer size 128M");
    let mut buf = vec![0u8; 128 * 1024 * 1024];
    let mut f = File::open("test.dat").await.unwrap();
    let count = f.read(&mut buf).await.unwrap();
    println!("--- Read {}\n", count);

    println!("+++ Buffer size 512M");
    let mut buf = vec![0u8; 512 * 1024 * 1024];
    let mut f = File::open("test.dat").await.unwrap();
    let count = f.read(&mut buf).await.unwrap();
    println!("--- Read {}\n", count);
}
```

That behavior seems to be confirmed.
Let me know if that seems wrong! If it isn't wrong, maybe it's worth adding a brief doc comment to the constant clarifying that it's a cap on transfer size rather than an allocation size.
Personally, I agree with Alice that this is an easy way to allow for a ton of behind-the-scenes allocations when working with files, and in particular, allocations that a user will not be able to easily surface or control. Separately, though partially related to the above: it's not clear to me why it should be 128 MB. Ostensibly we want to do more I/O per call, that much is clear, but is there not a diminishing return at maximum buffer sizes somewhere between 16 KB and 128 MB? As an example: with the current 16 KB maximum buffer size, I can already read and write at nearly 1 GB/s by doing simple things like using a buffered writer, not constantly fsyncing, and so on. Overall, I don't see why this needs to change, but I'm open to examining any data/evidence that shows how the change would benefit users, whether it's 128 MB or something smaller, like 1 MB.
@tobz my argument for why it needs to change is that the current limit takes control away from the user. It's not like we're setting a default with this constant; we're setting a limit. Increasing the limit to something much higher (or, alternatively, removing it altogether) gives users the option to figure out a size that works best for their use case.
The problem is the work that has to happen behind the scenes to drive synchronous file I/O through an asynchronous interface: tokio copies data through an intermediate buffer that it owns. We're taking away control from the user, for some definition of "control", but again, we're explicitly trying to avoid allocating a large one-off buffer that the user actually has no control over and never will have control over. This is why I'd personally want to see data along the curve of possible maximum buffer sizes: I believe the pragmatic view is that we could increase the maximum buffer size so long as we see a general performance improvement, and avoid increasing it past the point where diminishing returns set in. File I/O done directly with synchronous primitives and a caller-owned buffer avoids that intermediate copy entirely. That's control, and efficiency to boot.
Well, a much simpler option is a way to explicitly set the buffer size that allows it to go above the limit. This would not put existing applications at risk of running out of memory just because they keep a few files open.
I wouldn't disagree, although just visualizing it in my mind, it's not clear where we might expose that, and it also feels like it leaks a bit of the implementation details around how file I/O is driven. All of that said, yeah, it would indeed preserve much of the "no surprises" aspect of the current maximum buffer size while giving others the ability to exceed it.
This is not really worth the mental effort of arguing about for me. I'll just have to keep maintaining a fork to control the I/O size.
Motivation
A very rudimentary solution to #1976 :) The maximum number of bytes we can currently read/write through tokio's I/O subsystem in a single operation is 16 KiB. This increases that maximum to 128 MiB.

Solution
It doesn't give us direct configurability of the maximum buffer size, but since (unless I'm misunderstanding the code in that module) calling code can create and supply smaller buffers, simply increasing the hardcoded number seems like a reasonable starting point.