Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

experiment: io_uring support #16265

Closed
wants to merge 1 commit into from
Closed

Conversation

littledivy
Copy link
Member

@littledivy littledivy commented Oct 13, 2022

This PR does initial integration for using io_uring in Deno. See #16232

Implements specialized tokio_iouring ops for writeFile and readFile. The tokio
current_thread runtime is managed by tokio_iouring::start. We want to continue
adding more ops specialized for iouring.

Additionally, all of io_uring functionality is behind the enable_iouring Cargo feature
flag on deno_runtime. This is so that we do not distrupt workflow for people working
on this repo.

To compile Deno with io_uring support:

cargo build --release --features "deno_runtime/enable_iouring"

tokio_iouring

It's early but has amazing integration with tokio's current_thread runtime. Blocked on the following tokio_iouring PRs:

See stuff that Deno needs (ugly code alert).

Is it the right choice for Deno? This is what we aim to find out in the coming days with the experimental compilation flag.

Benchmarks

4x improvement in await Deno.writeFile(file, new Uint8Array(1024 * 1024 * 10)):

divy@divy:~/deno$ target/release/deno run -A --unstable cli/bench/write_file.js # io_uring
time 433 ms rate 230
time 548 ms rate 182
time 540 ms rate 185
time 565 ms rate 176
^C

divy@divy:~/deno$ ~/.deno/bin/deno run -A --unstable cli/bench/write_file.js # main
time 2696 ms rate 37
time 2857 ms rate 35
time 2859 ms rate 34
time 2861 ms rate 34
^C
main writeFile flamegraph

main_write_file

io_uring writeFile flamegraph

io_uring_write_file

.check(path, Some("Deno.readFile()"))?;
}

// TODO(@littledivy): Stat file to get the size.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is WIP lol. resize below may explode allocations for large files. We should stat the file for size.

@@ -12,12 +12,18 @@ description = "Rust to V8 serialization and deserialization"
[lib]
path = "lib.rs"

[features]
default = []
# Requires a recent Linux kernel version (atleast )
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Requires a recent Linux kernel version (atleast )
# Requires a recent Linux kernel version (atleast 5.6; tested on >5.11.0)

I think 5.6 is safe to assume? Read op was added in 5.6. I'm on 5.15.0 and tokio_iouring seems to suggest >5.11.0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IORING_OP_READV and IORING_OP_READ_FIXED - what maps to the Readv and ProvideBuffers ops, if I read the the io_uring crate's source code correctly - were added in 5.1.

options: Option<&OpenOptions>,
api_name: &str,
) -> Result<(PathBuf, tokio_uring::fs::OpenOptions), AnyError> {
let path = Path::new(path).to_path_buf();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More of an aside but:

Suggested change
let path = Path::new(path).to_path_buf();
let path = PathBuf::from(path);

And maybe you don't even need to convert the &str to a Path or PathBuf?

permissions.read.check(&path, Some(api_name))?;
}

if options.write || options.append {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this also check options.truncate and/or options.create_new? A mode like O_RDONLY|O_CREAT is valid (creates file, then opens it read-only) but I would consider that a write operation.

.check(path, Some("Deno.readFile()"))?;
}

// TODO(@littledivy): Stat file to get the size.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stat-then-read is not really sound because of TOCTOU. You should be prepared for the file to change size.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stat size is simply a preallocation hint. We should stop reading only on EOF

let mut offset: u64 = 0;
loop {
let (res, mut ret) =
file.read_at(buf.split_off(offset as usize), offset).await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buf.split_off() is wildly inefficient for the offset > 0 case (creates new vec, copies data, leaves old vec's capacity unchanged, i.e., wasted.)

Perhaps a better way to handle this is to maintain a vec of vecs, then join them at the end. That's O(n) but that's better than O(n*m).

@@ -12,12 +12,18 @@ description = "Rust to V8 serialization and deserialization"
[lib]
path = "lib.rs"

[features]
default = []
# Requires a recent Linux kernel version (atleast )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IORING_OP_READV and IORING_OP_READ_FIXED - what maps to the Readv and ProvideBuffers ops, if I read the the io_uring crate's source code correctly - were added in 5.1.

Copy link
Member

@ry ry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch is cleanly implemented, but ties us ever more to Tokio. This work might need to be undone if we went in a Crimson or pollio direction.

@espoal
Copy link

espoal commented Oct 15, 2022

question @littledivy : if I understood correctly here you use the unix syscall to open a file. Why didn't you use IORING_OP_OPENAT? that would save a syscall.

@littledivy
Copy link
Member Author

@espoal that is a syncronous op for Deno.openSync.

@littledivy
Copy link
Member Author

we are trying to optimize async I/O here. See runtime/ops/asyncfs/iouring/mod.rs

@lino-levan
Copy link
Contributor

@littledivy are there any updates? All of the PRs you mentioned blocking this change are merged now. This would be big for linux perf.

@KnorpelSenf
Copy link
Contributor

This patch is cleanly implemented, but ties us ever more to Tokio. This work might need to be undone if we went in a Crimson or pollio direction.

Is this the reason why this was abandoned?

@aapoalas
Copy link
Collaborator

@littledivy @ry Do we expect this PR to be taken up again at some point, or will it be abandoned altogether? ie. Should we close this?

@littledivy
Copy link
Member Author

I still want to do this but I don't have time to prioritize this atm. PRs welcome

@littledivy littledivy closed this Mar 25, 2023
@rektide
Copy link

rektide commented Nov 16, 2023

What are some take that would help this pr?

Each do the three blocking tokio-uring PR are merged now. Presumably updating this branch to use those would help?

@bnoordhuis
Copy link
Contributor

As a data point: after adding io_uring support to libuv we've been receiving a veritable fountain of reports from people hitting kernel bugs, from 100% CPU consumption to file operations not working on CIFS mounts.

Our kernel blacklist is long and specific, and still growing longer and more specific; see e.g. libuv/libuv#4224 from just yesterday. I honestly can't really recommend io_uring right now.

@espoal
Copy link

espoal commented Nov 16, 2023

@bnoordhuis io_uring is not thought to handle network mounted file systems.
I'm surprised of your report, because I built a cloud scale database on top of io_uring and I don't see those problems. Could you provide more sources? I will take time personally to debug and eventually propose a patch for the interface.

The ticket you mention do not refer to a bug, but to the fact that the interface of io_uring in kernel 6.1 improved dramatically, both in term of ergonomics and performance, hence why the community try to aim for that version at least.

@bnoordhuis
Copy link
Contributor

bnoordhuis commented Nov 16, 2023

@espoal search the nodejs/node and libuv/libuv issue trackers, you'll find plenty of issues, some still open.

I built a cloud scale database on top of io_uring and I don't see those problems

That probably means you're either targeting only really new kernels, or your user base is a fraction of node and libuv. Both projects have huge user bases and people hit (and report) the weirdest bugs. The war stories I could tell...

@espoal
Copy link

espoal commented Nov 16, 2023

@bnoordhuis that's a very fair point. I also noticed a huge stability improvement simply upgrading from 6.0 to 6.1

Sorry if I came out as rude, but I felt that you were attacking my single favourite kernel interface 😄 thanks for taking the time to elaborate ❤️

@bnoordhuis
Copy link
Contributor

No worries, didn't come across as rude.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants