Performance regression since 78687e7 #20
Indeed, I have noticed this problem. I thought about it yesterday, but I still don't know where the bottleneck of the current implementation is. For the problem you have mentioned above:
So here is my opinion: #16 has to be done, because if we want to support IOCP, then we cannot use a one-loop-per-thread strategy. Before profiling, I suspect the real problem is one of these:
The speed drop from 75640 to 1526 req/s is definitely a bug.
CPU usage is not 100% (about 10% actually) when running the benchmarks, so something must be blocking the worker, which is the cause of this issue.
@zonyitoo I finally found the line causing this regression and I'm sure that your reaction will be just like mine: "OMG". 😄 If I remove this, the performance goes back to 76k req/s with
OOOOOMG...... Yes, with I/O bound applications,
It might also be worth a try to separate mio's EventLoop from coio. Since e.g. sockets are

Alternatively you might consider reverting to the "one EventLoop per Thread" design. You would need to transform the scheduler into a hybrid work-stealing one: you could mark coroutines which are parked for methods on objects that are not sendable (i.e. all socket methods, since they cannot be re-registered at another EventLoop in another thread) as not being "stealable". Coroutines parked as "non-stealable" would be required to stay in the same Processor for as long as they are not resumed by it. I think this "might" offer better performance and might be easier to achieve than optimizing a shared EventLoop.

BTW: I looked at mio's source code and you're right: it doesn't allocate any memory (which is only possible because it's bounded - unbounded channels require linked lists and thus
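To make the idea concrete, here is a minimal sketch of such "non-stealable" marking. The `Coroutine` and `Processor` types and all method names are purely hypothetical (they are not taken from coio's actual code), and the queue handling is deliberately simplistic:

```rust
use std::collections::VecDeque;

// Hypothetical types for illustration only; coio's real scheduler differs.
struct Coroutine {
    // Cleared while the coroutine is parked on a resource (e.g. a socket)
    // that is registered with this processor's EventLoop and therefore
    // cannot migrate to another thread.
    stealable: bool,
}

struct Processor {
    run_queue: VecDeque<Coroutine>,
}

impl Processor {
    /// Called by another worker that ran out of work: only hand out
    /// coroutines that are not pinned to this processor's EventLoop.
    fn try_steal(&mut self) -> Option<Coroutine> {
        let pos = self.run_queue.iter().position(|c| c.stealable)?;
        self.run_queue.remove(pos)
    }

    /// Requeue a coroutine that was woken up by thread-local I/O; it stays
    /// pinned (non-stealable) until this processor has resumed it again.
    fn requeue_pinned(&mut self, mut co: Coroutine) {
        co.stealable = false;
        self.run_queue.push_back(co);
    }

    /// Pop the next coroutine to run; once it is resumed by its own
    /// processor it may become stealable again.
    fn resume_next(&mut self) -> Option<Coroutine> {
        self.run_queue.pop_front().map(|mut co| {
            co.stealable = true;
            co
        })
    }
}
```

Stolen coroutines would then only ever be ones parked on sendable resources, while socket-bound coroutines stay with the EventLoop they were registered with.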
Then nearly all coroutines could only be run in their own worker thread, which is not a work-stealing implementation, just one-eventloop-per-thread + a coroutine pool. Shared
Well, actually this would only affect Windows, while UNIX platforms could run without any kind of locks, since you can freely share the

Oh, and I got another hint for you... 😊 Try removing
Excellent! But by the way, in the current implementation of Mio, it forbids
Ah, I just saw it... It would be great if one could create mio sockets with a

The second biggest performance hog is, btw, the work-stealing queue, which is far from being well engineered... It causes a lot of implicit syscalls and/or thread barriers. 😟 The reason being: they use sequentially consistent ordering EVERYWHERE: https://github.com/kinghajj/deque/blob/master/src/lib.rs#L253-L277 (In case you don't know what's so bad about

I'd still argue that one loop per thread might scale better than a shared one. The select/kevent/etc. calls are syscalls, which probably scale a lot better if more than one thread is busy performing them. Furthermore, you circumvent the effort of using queues to communicate. This just depends on whether it's possible to determine when a coroutine must be locked to a specific Processor and when it can be traded among processors using work stealing. If that's not possible, then I really wonder why
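For anyone wondering why the ordering makes such a difference, here is a minimal, self-contained comparison. This is not code from the deque crate or from coio, and a real Chase-Lev deque needs more carefully chosen fences than this, so treat it purely as an illustration of the relative cost:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

static BOTTOM: AtomicUsize = AtomicUsize::new(0);

// The pattern used in the linked deque code: sequentially consistent
// ordering on every access, which typically compiles to a full fence
// (or a locked instruction) for the store even on x86.
fn bump_seq_cst() {
    let b = BOTTOM.load(Ordering::SeqCst);
    BOTTOM.store(b + 1, Ordering::SeqCst);
}

// A cheaper variant: the owning thread reads its own index with Relaxed
// and publishes the update with Release, so that a stealer loading with
// Acquire observes it. Plain loads/stores suffice for this pair on x86.
fn bump_acq_rel() {
    let b = BOTTOM.load(Ordering::Relaxed);
    BOTTOM.store(b + 1, Ordering::Release);
}

fn main() {
    bump_seq_cst();
    bump_acq_rel();
    println!("bottom = {}", BOTTOM.load(Ordering::Acquire));
}
```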
The

Also, I am still wondering how to make good use of the

So many problems arise when Mio comes to 5.0. This crate needs a proper refactor! I know the
To draw a conclusion, here are all the possible optimizations:
This list will be extended through further discussion.
I'm going to optimize the
Today I thought a bit more about the first point (the "Hybrid work-stealing algorithm"). In hindsight you might be right that a shared event loop is better, because I only had cases in mind where network operations are equally balanced between connections (which is the case for the project I'm currently using coio for). But if the workload is unbalanced, worker starvation kicks in and we get the usual problems with simple coroutine schedulers. So yeah... maybe you've been right all along. I still think that a loop per thread is faster (meaning: we should keep it on our radar), but it won't be faster if we don't have a solution for making work-stealing in the general use case with mio possible (meaning: it should be lowest priority, huh?). If you'd like me to work on some part of this, or implement some of the solutions we already spoke about, just say so. :)

P.S.: If you've ever wondered why I'm so active on this project: I would like to learn Rust and simply picked this project for it, because I consider coroutines highly interesting. (
Let me explain my priority goals:
So here the

I have been quite busy these days, so I won't be committing much for now. Please feel free to comment, and I will reply as soon as possible. I want to make

BTW, if you are interested, please take a look at the
It's great to hear that you're planning to take coio so far. 😊 And yes, I'd really like to make coio as stable as possible by investigating solutions to the current bugs, but I think there should be some kind of coordination, where you say what you'd like to work on and what I can/should work on, so we don't fix the same problem twice. I'm soon going to get a bit more busy though, but I'll try to spend at least some time every day writing code for coio and its transitive projects. I know it's a lot to ask for, but let's just assume that I'm going to continue supporting this project (which I'm planning to): it would probably really simplify some things if you could grant me push rights to this project (i.e. if you could make me a collaborator) as soon as you trust me with it, because there are a lot of things I have in mind for this and other projects.
Sure, I will add you as a collaborator. :) Happy hacking! And actually I am not working on anything right now, so you can choose your target freely.
I discovered just now that

src/bin/server.rs:71:13: 71:39 error: trait `AsRawFd` is private
src/bin/server.rs:71     use std::sys::ext::io::AsRawFd;
                             ^~~~~~~~~~~~~~~~~~~~~~~~~~
src/bin/server.rs:73:18: 73:34 error: source trait is private
src/bin/server.rs:73     let fd = sock.as_raw_fd();
                                  ^~~~~~~~~~~~~~~~

does not in fact mean that a trait is "private", but instead that the

I also took a look at context-rs. Your reasoning behind how context switches work is really easy to understand. But I do think that using Boost.Context's assembler is going to be much better in the long run, because their code is surely a lot more fleshed out due to all the man-hours spent on it over the years and due to the tests on all those different platforms (e.g. they have asm files in AT&T and Intel syntax and can thus be compiled seamlessly on Windows). I don't understand how their assembler works yet though ("why do they push/pop all the things?"), but I'm sure I'll get there. Currently it's probably a bit more important to improve coio, though. But I'll try to send you a PR with a branch that uses Boost.Context's asm - this might even solve a couple of issues along the way. Oh, and thanks for making me a collaborator. 😊
use std::os::unix::io::AsRawFd;

Porting Boost.Context is a great idea; you can find that another person is working on that in
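For reference, a minimal, self-contained (Unix-only) example of the corrected import in use; the listener setup here is just illustrative:

```rust
use std::net::TcpListener;
use std::os::unix::io::AsRawFd; // the public path; std::sys is an internal, private module

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    // as_raw_fd() resolves once the AsRawFd trait is in scope.
    let fd = listener.as_raw_fd();
    println!("listening socket fd = {}", fd);
    Ok(())
}
```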
Yeah, I found the solution to the

And while my knowledge of Rust and coroutine implementations is definitely lacking, I'm quite sure that you can't use inline assembly for the context swaps etc. safely without naked functions. Those might be coming soon though:
Of course, I am still waiting for the
Thread parking strategies will be discussed in #27.
Up until e63031c (inclusive) the current benchmark for coio-tcp-echo-server showed (subjectively) good results:
Since then performance has dropped significantly:
Between 78687e7..f747ab8 all builds fail the benchmark for major or minor reasons though:
A `socket already registered` or `No such file or directory` error in earlier builds (before the change to the more recent mio version), and later failures because the slab is overfilled, or because using too many semaphores leads to a `Too many open files` panic. All of this made ruling out the reason a bit hard for me, but I suspect that it's related to the change to a newer mio version, the shared event loop, etc. Any ideas?