Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed buffers to support ReadFixed and WriteFixed ops #54

Merged
merged 35 commits into from
Nov 23, 2022

Conversation

mzabaluev
Copy link
Contributor

@mzabaluev mzabaluev commented Sep 9, 2021

Add infrastructure to manage pre-registered buffers, and operation methods File::read_fixed_at and File::write_fixed_at to make use of them. Exclusive access to buffer data between the application and in-flight ops is controlled at runtime.

This is initial API to enable fixed buffers. Future developments may include:

Needs #52 to get tests to pass.

src/fs/file.rs Outdated Show resolved Hide resolved
@Noah-Kennedy
Copy link
Contributor

@mzabaluev any interest in completing this?

@mzabaluev
Copy link
Contributor Author

@Noah-Kennedy Yes, I should update the PR at least, will get to it soon.

@mzabaluev
Copy link
Contributor Author

I have rebased the branch. The tests (after merging with #52) sporadically fail, but I see the same flakeyness in master.

src/driver/register.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@ollie-etl ollie-etl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the api for Buffers::new is flawed, and should instead take a buffer and fragment it internally.

@FrankReh
Copy link
Collaborator

@mzabaluev Can we try again and get this merged? You put a lot of work into this. I can get up to speed on it shortly if you wanted to rebase with master. Perhaps wait for the other merge we might pick up first so the rebase only has to be done once more.

@mzabaluev
Copy link
Contributor Author

@FrankReh I'll try to rebase the branch this week.

@mzabaluev
Copy link
Contributor Author

@ollie-etl

I believe the api for Buffers::new is flawed, and should instead take a buffer and fragment it internally.

This API allows pre-registering an arbitrary set of pre-allocated buffers. While it's true that each buffer needs to be allocated separately, registering is meant to be a one-time operation for the lifetime of Runtime, and with a good allocator and reasonably sized buffers, the overhead should be negligible. If bytes was a non-optional dependency, it would be possible to take a sequence of BytesMut handles into fragments of a larger buffer.

BufRegistry keeps track of buffers that are pre-registered
with the IoUring instance and can be checked out for use as FixedBuf
handles.
This is used by a new read_fixed_at method on File.
Used by a new write_fixed_at method on File.
This supports the io_uring_register facility for buffer deregistration,
allowing different buffer sets to be registered over the lifetime
of an io-uring descriptor.
As long as the old buffer allocations are kept alive by BufRegistry or
FixedBuf instances referring to them, there is no ambiguity for
the kernel when an old buffer instance is erroneously used after
its registration is revoked, even if the buffer index has since been
reused for a newly registered buffer, because the new buffer has to use
a different memory allocation.
When BufRegistry::unregister is called, but some FixedBuf handles
obtained from that BufRegistry instance are still around, operations
using these handles are expected to fail gracefully even if the buffer
indices have since been re-registered for new buffers.

This is an edge case, but adding a test for it validates the design
with the capability to unregister the previously registered buffers
and register new ones while in the same io-uring runtime context.
Test that a FixedBuf is correctly checked out from a BufRegistry
by index, and cannot be checked out more than once while the buffer is
either referenced by the FixedBuf handle, or used by an operation
in flight.
Examples on File::read_fixed_at and File::write_fixed_at
get this treatment, as they are added in the same PR.
The blocking is no longer observed as of kernel 5.19.16 on x86_64.
Copy link
Collaborator

@FrankReh FrankReh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API provided by this PR needs to be evaluated in light of the pbuf_ring support that is going to this crate in the next week or so.

@mzabaluev
Copy link
Contributor Author

Thanks @FrankReh and @Noah-Kennedy for getting #172 merged. The API here is much nicer now, both FixedBuf and Slice<FixedBuf> can be passed to the operations.

@FrankReh
Copy link
Collaborator

I'll try to review now. Took a quick glance and they did look cleaner. But @Noah-Kennedy will be anxious to get his moved files in so this may have to go through another rebase. From his move diffs though, mostly files were moved with no symbol name changes, but the odd use or mod change. I don't know if git will be making that easy for you to merge or rebase with or not. But before doing too much manual work, probably want to wait for Noah to also bless this type of work. I don't think he's commented on it recently.

src/driver/read_fixed.rs Outdated Show resolved Hide resolved
src/driver/read_fixed.rs Outdated Show resolved Hide resolved
src/driver/read_fixed.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@FrankReh FrankReh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks very good. As with all things so complex, there will be many color choices to paint the shed. I would love to see this go in, even if there are things to change down the road - although right now I'm not thinking of any. I can't tell you how many times I thought about doing something with indexed buffers only to then remember that they aren't in the library yet.

@FrankReh
Copy link
Collaborator

Going to wait for @Noah-Kennedy to decide the timing of this merge, or to provide his own review too of course.

@Noah-Kennedy
Copy link
Contributor

I just merged #180, so this needs a rebase. I'll review once that happens.

Copy link
Collaborator

@FrankReh FrankReh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. One comment.

tests/fixed_buf.rs Show resolved Hide resolved
Copy link
Collaborator

@FrankReh FrankReh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@Noah-Kennedy Noah-Kennedy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about parts of this.

I'm a bit concerned about soundness in the case where buffers are checked out and then unregistered. If my understanding is correct, a user could safely cause undefined behavior by using the deref implementation of an existing buffer after the set of buffers has been unregistered.

I'm also uncomfortable with the idea of this being able to leak buffers this easily.

I don't necessarily have solutions to these issues right now, but they need solved before we can merge.

@mzabaluev
Copy link
Contributor Author

mzabaluev commented Nov 23, 2022

I'm a bit concerned about soundness in the case where buffers are checked out and then unregistered. If my understanding is correct, a user could safely cause undefined behavior by using the deref implementation of an existing buffer after the set of buffers has been unregistered.

Unregistering the buffers does not get them deallocated, as long as there is a FixedBuf or a FixedBufRegistry handle holding a reference to the registry's inner data. And only these objects provide safe access to buffers' data.

My concern is rather that it's not necessarily easy to know the moment when the buffers get dropped. But:

  1. The v1 io-uring API for fixed buffers is mostly suited for one time set up and tear down, rather than juggling with successive generations of buffers. IORING_UNREGISTER_BUFFERS works like a synchronous barrier to stop all use of the previously registered buffers (it even imposed a 1 s delay when I tested it on earlier kernels), so it's not advisable to use it in concurrent tasks on the runtime.
  2. The IORING_REGISTER_BUFFERS2 API with tagged buffers, in fact, promotes the model of eventual dropping of replaced buffers as CQEs reporting their release by the kernel get received. So when we get to implement that, deallocation of buffers will become even more hidden and asynchronous.

I'm also uncomfortable with the idea of this being able to leak buffers this easily.

The only problem I see is that leaking a single FixedBuf handle, in general scenarios where such leaking is possible, would leak the entire buffer collection owned by its FixedBufRegistry.

@FrankReh
Copy link
Collaborator

@Noah-Kennedy I think you raised two concerned about 12 hours ago and @mzabaluev gave some feedback 7 hours ago. Do you still believe holding a reference to a buffer after the kernel has had the buffers unregistered is a problem? I may just be repeating @mzabaluev , but I believe the memory has not been freed in that case so no UB there, and the kernel would reject further attempts at using that buffer (but it would have had to be owned, not just referenced). So maybe not a problem?

On the second concern, where do you see buffers being leaked? Or is that part of the first concern and @mzabaluev and I don't follow your point, or you've changed your mind on it?

@Noah-Kennedy
Copy link
Contributor

Sorry about the responsiveness, I'm traveling at moment.

@mzabaluev thanks for the clarification, I see that I misread how that is tracked.

I'm in favor of merging.

@Noah-Kennedy
Copy link
Contributor

@FrankReh I'll leave the merging to you as I'm getting on a plane in just a moment.

@FrankReh FrankReh merged commit 8750caa into tokio-rs:master Nov 23, 2022
@FrankReh
Copy link
Collaborator

@Noah-Kennedy @mzabaluev I think we should have an extras module parent, maybe you have a better name. A place in the repo for some extra modules that aren't needed by the driver or the ops but that a user may want for common tasks. The fixed indexes begs for one or more implementations of sets of buffers. One where the next is supported, as @mzabaluev has called out before. I'm also wondering about one where next would be a Future so one could await the next free index. Let's say I create an index registry with 1000 buffers and I don't want to drop a service if another buffer isn't available, but rather I want to wait for one to free up. Once it is behind a Future, it could also be behind a timed Future that does something different if a buffer isn't freed in a reasonable amount of time.

@Noah-Kennedy
Copy link
Contributor

I'm fully in agreement here.

@mzabaluev
Copy link
Contributor Author

mzabaluev commented Nov 24, 2022

Thanks for shepherding this!

The fixed indexes begs for one or more implementations of sets of buffers. One where the next is supported, as @mzabaluev has called out before.

I'm pretty sure I can work two Option<u16> linked list indices for the next and the previous free buffer into BufState::Free, so that the overall size of the enum does not increase. Updating the list when a buffer is checked out or checked in takes constant time. So the try_next operation could be added for FixedBufRegistry itself.

I'm also wondering about one where next would be a Future so one could await the next free index. Let's say I create an index registry with 1000 buffers and I don't want to drop a service if another buffer isn't available, but rather I want to wait for one to free up.

I think it's also feasible to implement an async next method for FixedBufRegistry. Tasks pending on next can post their wakers internally at the registry, and dropping a FixedBuf will fire up all accumulated wakers at its registry. This feature has a small and constant runtime cost per each buffer check out/in, unless the tasks exhaust the pool of free buffers.

Once it is behind a Future, it could also be behind a timed Future that does something different if a buffer isn't freed in a reasonable amount of time.

I can't think of anything that can't be achieved with existing composing APIs like tokio::time::timeout, but maybe there is something clever to do with the buffers internally when the timeout is elapsed?

@FrankReh
Copy link
Collaborator

Not to confuse anyone, I totally saw using the tokio timeout future. Didn't mean it would be built into our library or an extras module. Just that waiting for the next would be useful and then it's easy to break out with a timeout too if the situation calls for it. Guess I was stating the obvious and that was confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants