Investigate "switched-goto" for compilers without computed gotos #537
Comments
Seems interesting! We had a small related discussion here where the timings showed that computed gotos are only giving the eval loop a 1% speedup on Linux these days. (On the other hand, I got a 5-10% speedup on ... when I added computed gotos to the ...)
IIUC the blog post essentially changes the ...
TARGET(OP1) {
    ...
    DISPATCH();
}
TARGET(OP2) {
    ...
    DISPATCH();
}
... would expand to
label_OP1: {
    ...
    switch (*next_instr++) {
    case OP1: goto label_OP1;
    case OP2: goto label_OP2;
    ...
    }
}
label_OP2: {
    ...
    switch (*next_instr++) {
    case OP1: goto label_OP1;
    case OP2: goto label_OP2;
    ...
    }
}
... and that switch would be repeated in each instruction. (Exactly where and how ...)
And the theory is that having N copies of the switch (one for each opcode) helps the CPU's branch predictor, because it will learn the most likely branch taken at the end of each opcode, so we won't need ...
It would not be a very complicated experiment to carry out, except we'd need to wait until we have Windows benchmarking infrastructure in place, since on Linux/Mac we already have the computed goto.
One of my worries would be that the compiler sees that you have the same big piece of code in many places and just unifies it into a single copy that it jumps to from everywhere. Compilers are weird that way.
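The expansion described above can be sketched as a toy stack interpreter. This is only an illustration under assumptions: the opcodes (`OP_PUSH`, `OP_ADD`, `OP_MUL`, `OP_HALT`), the function name `run_switched_goto` and the fixed 16-slot stack are hypothetical and have nothing to do with CPython's real opcode set or its `TARGET()`/`DISPATCH()` macros. The only point is that every handler ends with its own copy of the dispatch switch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes, for illustration only (not CPython's). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* "Switched goto": a full switch that is pasted at the end of every
 * handler, so each handler gets its own dispatch branch. Only standard
 * C is used here, no computed gotos. */
#define DISPATCH()                      \
    switch (*next_instr++) {            \
    case OP_PUSH: goto label_OP_PUSH;   \
    case OP_ADD:  goto label_OP_ADD;    \
    case OP_MUL:  goto label_OP_MUL;    \
    case OP_HALT: goto label_OP_HALT;   \
    default:      goto label_error;     \
    }

/* Runs the bytecode and returns the top of the stack at OP_HALT,
 * or -1 if an unknown opcode is encountered. */
long run_switched_goto(const uint8_t *code)
{
    const uint8_t *next_instr = code;
    long stack[16];
    size_t sp = 0;

    DISPATCH();                      /* initial dispatch */

label_OP_PUSH:
    assert(sp < 16);
    stack[sp++] = *next_instr++;     /* operand is the next byte */
    DISPATCH();

label_OP_ADD:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();

label_OP_MUL:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] *= stack[sp];
    DISPATCH();

label_OP_HALT:
    assert(sp >= 1);
    return stack[sp - 1];

label_error:
    return -1;
}
```

Calling `run_switched_goto` on the byte sequence `OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT` computes (2 + 3) * 4. Whether a given compiler keeps the N copies of the switch separate, or merges them back into one as worried about above, would have to be checked in the generated assembly.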
As for Windows, _PyEval_EvalFrameDefault will hit MSVC's stuck or ...
The current 3.12 eval function can be less optimized getting the warning (...)
Recently I came across this blog post that shows a rather weird way of having something between a standard switch dispatching for an eval loop and an eval loop with computed gotos. Now that we're experimenting with generating a lot of that code, we could maybe see if it makes any sense to adopt this strategy?
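For reference, the two existing ends of that spectrum can be sketched side by side on a toy interpreter. This is a hedged illustration, not CPython's eval loop: the opcodes and the function names `run_switch` and `run_computed_goto` are hypothetical. The computed-goto variant relies on the GCC/Clang "labels as values" extension (`&&label` and `goto *ptr`), so it is guarded by `__GNUC__` and falls back to the plain switch loop on other compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes, for illustration only (not CPython's). */
enum { OP_PUSH, OP_ADD, OP_HALT, OP_COUNT };

/* Standard dispatch: a single switch inside a loop, so every opcode
 * goes through the same dispatch point. */
long run_switch(const uint8_t *code)
{
    const uint8_t *next_instr = code;
    long stack[16];
    size_t sp = 0;

    for (;;) {
        switch (*next_instr++) {
        case OP_PUSH:
            assert(sp < 16);
            stack[sp++] = *next_instr++;
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            return -1;               /* unknown opcode */
        }
    }
}

#if defined(__GNUC__)
/* Computed-goto dispatch: each handler ends with its own indirect
 * jump through a table of label addresses (GCC/Clang extension). */
long run_computed_goto(const uint8_t *code)
{
    static void *const targets[OP_COUNT] = { &&do_push, &&do_add, &&do_halt };
    const uint8_t *next_instr = code;
    long stack[16];
    size_t sp = 0;

#define NEXT()                                      \
    do {                                            \
        if (*next_instr >= OP_COUNT) return -1;     \
        goto *targets[*next_instr++];               \
    } while (0)

    NEXT();
do_push:
    assert(sp < 16);
    stack[sp++] = *next_instr++;
    NEXT();
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT();
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];
#undef NEXT
}
#else
/* Compilers without labels-as-values use the portable loop instead. */
long run_computed_goto(const uint8_t *code)
{
    return run_switch(code);
}
#endif
```

Both functions should return the same result for the same bytecode; the difference is only in how control moves from one handler to the next.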