-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prov/net: TX operations now support io_uring #1
Conversation
The patch adds the argument --with-uring[=DIR] to detect liburing support in libfabric. Signed-off-by: Sylvain Didelot <[email protected]>
The patch implements the utility functions ofi_{sendv,recv}_socket() that can be used to send/recv data to/from a socket using an iovec as input argument. The bsock interface was also updated to make use of these new functions. Signed-off-by: Sylvain Didelot <[email protected]>
This patch moves all the socket ops to a common API. In a later patch, the bsock interface will allow the provider to specify a custom API for the socket ops (required for io_uring support). Signed-off-by: Sylvain Didelot <[email protected]>
Signed-off-by: Sylvain Didelot <[email protected]>
We need to differentiate io_uring error codes from other types of error codes, including: - FI_EIOURING_PREP: "io_uring operation prepared, waiting for submission" - FI_EIOURING_FULL: "io_uring full" Signed-off-by: Sylvain Didelot <[email protected]>
The patch adds the file src/iouring.c where are listed all the definitions related to io_uring. In addition, configure.ac was modified to export the autoconf variable HAVE_LIBURING and which is used to determine whether src/iouring.c must be compiled or not. Signed-off-by: Sylvain Didelot <[email protected]>
io_uring operations in the sockapi interface would return -FI_EIOURING_PREP if the operation was prepared in the ring and is waiting for submission. recv() and recvv() require some changes to the bsock API to return how many bytes were read from the byteq to the caller (similar to what was done for zerocopy send). I added a couple of assertion failures to detect unimplemented features related to io_uring. Signed-off-by: Sylvain Didelot <[email protected]>
FI_NET_IO_URING=[1|0] can be used to switch on/off io_uring support from the net provider if compiled with HAVE_LIBURING. Signed-off-by: Sylvain Didelot <[email protected]>
This patch adds the tx/rx io_urings in the progress instance. They will be used in a later patch. Signed-off-by: Sylvain Didelot <[email protected]>
io_uring will need to complete cur_tx on its own where a CQE is received for a given EP. Let's move the completion of ep->cur_tx to its own function, so we can call it from multiple places. Signed-off-by: Sylvain Didelot <[email protected]>
Signed-off-by: Sylvain Didelot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fairly quick look over these changes. See comments. Thanks! I think this is getting close to having io_uring support, which I'm hoping will be significant.
@@ -45,6 +45,57 @@ | |||
|
|||
static ssize_t (*xnet_start_op[ofi_op_write + 1])(struct xnet_ep *ep); | |||
|
|||
#ifdef HAVE_LIBURING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we'll need these ifdef's in the net provider. They already occur in the io_uring code, so all of the calls should be abstracted. Except I guess the calls below are calling io_uring directly? I thought you defined abstractions for these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought you wanted the net
provider to manage the io_uring
directly (no abstraction).
I think it makes sense to initialize the io_uring
in the net
provider as it may request specific flags during initialization. I'm not sure we should defined abstractions for that in the common code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was comparing this to the ofi_epoll calls. We make those directly from the providers, but the abstraction hides the implementation, or if it's available. I thought I saw that you had abstractions -- something like ofi_uring_init() -- that the provider could just call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think I initially misunderstood your statement:
We obviously need an OS abstraction for an io_uring. I would actually introduce a typedef for this, similar to epoll. E.g. struct io_uring ofi_io_uring_t. This abstraction would be mostly a wrapper, with more control given to the provider
By OS abstraction you mean an abstraction in src/iouring.c
. My bad, will fix it.
|
||
/* io_uring rounds up the number of entries to the next power of 2, | ||
* so we could get more entries than initially | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd drop this comment. It's sufficient to note that the return value may be higher then requested. I dno't see where 'entries' is defined or used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do.
#define xnet_init_io_uring(io_uring, entries) -FI_ENOSYS | ||
#define xnet_destroy_io_uring(io_uring) do {} while(0) | ||
#define xnet_io_uring_fd(io_uring) INVALID_SOCKET | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you move the io_uring abstraction that you had here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only abstraction I had was for send/recv syscalls (in iouring.c).
I just realized that you wanted to abstract all the calls of io_uring
in src/iouring.c
. My bad, I will fix that.
*/ | ||
entries = io_uring_sq_space_left(&io_uring->ring); | ||
|
||
io_uring->fid.fclass = XNET_CLASS_IO_URING; | ||
io_uring->credits = io_uring_sq_space_left(&io_uring->ring); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Last patch is modifying a previous patch to fix-up errors. Some of the changes should move into the earlier patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix. Thanks.
static bool xnet_get_io_uring_credit(struct xnet_io_uring *io_uring) | ||
{ | ||
if (!xnet_io_uring) | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks odd. I'd either expect to fail here or assert that we're using io_uring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If xnet_io_uring
is false, io_uring support is disabled and the number of credits is initially set to 0.
The function always returns true to bypass the credit system when io_uring is disabled.
assert(nready <= XNET_MAX_EVENTS); | ||
for (i = 0; i < nready; i++) { | ||
xnet_progress_cqe(io_uring, cqes[i]); | ||
io_uring->credits++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a potential problem here with this ordering? We have a completion for a send, but the send may not be done. That falls through to xnet_progress_tx(), which may need to post another transfer, which in turns need another credit. Could we temporarily block the send because we haven't updated the credits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching that! You are right and the credit must be released before executing xnet_progress_cqe()
. I will update the code.
|
||
if (ret == -FI_EINPROGRESS) { | ||
} | ||
else if (ret == -FI_EINPROGRESS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so you need to distinguish between 2 async 'error' codes based on whether using io_uring or async sockets.
/* We cannot enter blocking if io_uring has entries | ||
* that need submission. */ | ||
assert(xnet_io_uring_needs_submit(&progress->tx_io_uring)); | ||
assert(xnet_io_uring_needs_submit(&progress->rx_io_uring)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assert and comment look like opposites
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aargh, the assert is wrong. Thanks for catching that.
@@ -876,6 +994,8 @@ xnet_handle_events(struct xnet_progress *progress, | |||
} | |||
|
|||
xnet_handle_event_list(progress); | |||
xnet_submit_io_uring(&progress->tx_io_uring); | |||
xnet_submit_io_uring(&progress->rx_io_uring); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these are acting somewhat like a deferred doorbell notification. I would move check for using io_uring here, and wrap both calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do.
@@ -237,12 +345,17 @@ static void xnet_progress_tx(struct xnet_ep *ep) | |||
ssize_t ret; | |||
|
|||
assert(xnet_progress_locked(xnet_ep2_progress(ep))); | |||
while (ep->cur_tx.entry) { | |||
while (ep->cur_tx.entry && !ep->cur_tx.io_uring_busy) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to look at how io_uring_busy is used. I mostly follow what you're doing here. But I want to understand where the flow differs between using io_uring and async sends and why we have those differences.
ffa47f0
to
46e49b3
Compare
9418fd8
to
a287217
Compare
16335dc
to
9922e47
Compare
If a posted receive matches with a saved receive, we may need to increment the rx counter. Set the rx counter increment callback to match that of the posted receive. This fixes an assert in xnet_cntr_inc() accessing a NULL cntr_inc function pointer. Program received signal SIGABRT, Aborted. 0x0000155552d4d37f in raise () from /lib64/libc.so.6 #0 0x0000155552d4d37f in raise () from /lib64/libc.so.6 #1 0x0000155552d37db5 in abort () from /lib64/libc.so.6 ofiwg#2 0x0000155552d37c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6 ofiwg#3 0x0000155552d45a76 in __assert_fail () from /lib64/libc.so.6 ofiwg#4 0x00001555522967f9 in xnet_cntr_inc (ep=0x6e4c70, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:347 ofiwg#5 0x0000155552296836 in xnet_report_cntr_success (ep=0x6e4c70, cq=0x6ca930, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:354 ofiwg#6 0x000015555229970d in xnet_complete_saved (saved_entry=0x6f7a30) at prov/tcp/src/xnet_progress.c:153 ofiwg#7 0x0000155552299961 in xnet_recv_saved (saved_entry=0x6f7a30, rx_entry=0x6f7840) at prov/tcp/src/xnet_progress.c:188 ofiwg#8 0x00001555522946f8 in xnet_srx_tag (srx=0x6dd1c0, recv_entry=0x6f7840) at prov/tcp/src/xnet_srx.c:445 ofiwg#9 0x0000155552294bb1 in xnet_srx_trecv (ep_fid=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_srx.c:558 ofiwg#10 0x000015555228f60e in fi_trecv (ep=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at ./include/rdma/fi_tagged.h:91 ofiwg#11 0x00001555522900a7 in xnet_rdm_trecv (ep_fid=0x6d9fe0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_rdm.c:212 Signed-off-by: Sean Hefty <[email protected]>
No description provided.