Increase max I/O buffer size to 128 MiB, and DRY the constant in stdio_common tests #4580
Conversation
Although 16 KiB is too small, 128 MiB is probably too large, at least as a default value. I'd rather not have a few open files take up an entire gigabyte of RAM unless the user explicitly asks for a buffer size that large.
@Darksonn I don't think it does. As far as I can tell, this constant only specifies the hard maximum number of bytes that a single read/write operation will transfer; it isn't used as an allocation size. The only uses of the constant I could find treat it as a cap on the caller-supplied buffer. Unless I'm missing something and am completely off base ;) I've been hunting for evidence to the contrary but haven't been able to find it: nothing in the code allocates a buffer larger than what the caller asked for. With some added debugging and a quick test binary:

```diff
diff --git a/tokio/src/io/blocking.rs b/tokio/src/io/blocking.rs
index 289c32ca..9d431b40 100644
--- a/tokio/src/io/blocking.rs
+++ b/tokio/src/io/blocking.rs
@@ -241,6 +241,8 @@ impl Buf {
         unsafe {
             self.buf.set_len(len);
         }
+
+        dbg!(self.buf.capacity());
     }

     pub(crate) fn read_from<T: Read>(&mut self, rd: &mut T) -> io::Result<usize> {
```

```rust
use tokio::fs::File;
use tokio::io::AsyncReadExt;

#[tokio::main]
async fn main() {
    println!("+++ Buffer size 4k");
    let mut buf = [0u8; 4 * 1024];
    let mut f = File::open("test.dat").await.unwrap();
    let count = f.read(&mut buf).await.unwrap();
    println!("--- Read {}\n", count);

    println!("+++ Buffer size 16k");
    let mut buf = [0u8; 16 * 1024];
    let mut f = File::open("test.dat").await.unwrap();
    let count = f.read(&mut buf).await.unwrap();
    println!("--- Read {}\n", count);

    println!("+++ Buffer size 1M");
    let mut buf = vec![0u8; 1 * 1024 * 1024];
    let mut f = File::open("test.dat").await.unwrap();
    let count = f.read(&mut buf).await.unwrap();
    println!("--- Read {}\n", count);

    println!("+++ Buffer size 128M");
    let mut buf = vec![0u8; 128 * 1024 * 1024];
    let mut f = File::open("test.dat").await.unwrap();
    let count = f.read(&mut buf).await.unwrap();
    println!("--- Read {}\n", count);

    println!("+++ Buffer size 512M");
    let mut buf = vec![0u8; 512 * 1024 * 1024];
    let mut f = File::open("test.dat").await.unwrap();
    let count = f.read(&mut buf).await.unwrap();
    println!("--- Read {}\n", count);
}
```

That behavior seems to be confirmed.
Let me know if that seems wrong! If it isn't wrong, maybe it's worth adding a brief doc comment to the constant clarifying that it's a cap on transfer size rather than an allocation size.
Personally, I agree with Alice that this is an easy way to allow for a ton of behind-the-scenes allocations when working with files, and in particular, allocations that a user will not be able to easily surface or control. Separately, though partially related to the above: it's not clear to me why it should be 128 MB. Ostensibly we want to do more I/O per call, that much is clear, but is there not a diminishing return at maximum buffer sizes somewhere between 16 KB and 128 MB? As an example: with the current 16 KB maximum buffer size, I can already read and write at nearly 1 GB/s by doing simple things like using a buffered writer, not constantly fsyncing, and so on. Overall, I don't see why this needs to change, but I'm open to examining any data/evidence that shows how the change would benefit users, whether it's 128 MB or something smaller, like 1 MB.
@tobz my argument for why it needs to change is that the current limit takes control away from the user. It's not like we're setting a default with this constant; we're setting a limit. Increasing the limit to something much higher (or, alternatively, removing it altogether) gives users the option to figure out a size that works best for their use case.
The problem is the work that has to happen behind the scenes to drive synchronous file I/O through an asynchronous interface: tokio copies data through an intermediate buffer that it owns. We're taking away control from the user, for some definition of "control", but again, we're explicitly trying to avoid allocating a large one-off buffer that the user actually has no control over and never will have control over. This is why I'd personally want to see data along the curve of possible maximum buffer sizes: I believe the pragmatic view is that we could increase the maximum buffer size so long as we see a general performance improvement, and avoid increasing it past the point where diminishing returns set in. File I/O done directly with synchronous primitives and a caller-owned buffer avoids that intermediate copy entirely. That's control, and efficiency to boot.
Well, a much simpler option is a way to explicitly set the buffer size that allows it to go above the limit. This would not put existing applications at risk of running out of memory just because they keep a few files open.
I wouldn't disagree, although just visualizing it in my mind, it's not clear where we might expose that, and it also feels like it leaks a bit of the implementation details around how file I/O is driven. All of that said, yeah, it would indeed preserve much of the "no surprises" aspect of the current maximum buffer size while giving others the ability to exceed it.
This is not really worth the mental effort of arguing about for me. I'll just have to keep maintaining a fork to control the I/O size.
Motivation
A very rudimentary solution to #1976 :) The maximum number of bytes we can currently read/write through tokio's I/O subsystem in a single operation is 16 KiB. This increases that maximum to 128 MiB.

Solution
It doesn't give us direct configurability of the maximum buffer size, but since (unless I'm misunderstanding the code in that module) calling code can create and supply smaller buffers, simply increasing the hardcoded number seems like a reasonable starting point.