perf: reduce the CPU usage of HTTP/1.1 header parser #2880
Comments
FYI I have spent considerable time heavily optimizing this code path. There is unlikely to be any low-hanging fruit here -- HTTP proxy servers end up spending an incredible amount of time dealing with headers. With that said, I would love to have someone else take a look to see if they can find any wins.
I'll sign up to study the performance of http_parser.
@brian-pane FYI I didn't look at http_parser at all. Yes it's possible that there might be some vectorizing wins. I only optimized the layer on top (custom header string, custom data structure for storing headers, and reducing as many copies as possible).
An idea for anybody who has time to look into
@brian-pane yeah agreed, would love some analysis on the size of the struct, the size of the header string interning buffer, cache alignment, etc. Have never had time to do any of that.
Another potential win, from our overoptimized !Envoy header parsing: we take the raw header buffer and simply annotate header locations with a string-view equivalent (zero copy for header parsing), and use hash-map lookups for referencing headers (better than list walking for the non-O(1) headers).
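To make the zero-copy idea above concrete, here is a minimal C++17 sketch. It is not the code from either proxy: `HeaderIndex` and `index_headers` are hypothetical names, the input is assumed to be a complete header block (request line already consumed), and for brevity it ignores case-insensitive name matching and keeps only the first value of a repeated header.

```cpp
#include <cstddef>
#include <string_view>
#include <unordered_map>

// Hypothetical type: header name -> header value, both views into the raw buffer.
using HeaderIndex = std::unordered_map<std::string_view, std::string_view>;

// Annotates header locations in `raw` without copying any header bytes.
// The caller must keep the raw buffer alive for as long as the index is used.
inline HeaderIndex index_headers(std::string_view raw) {
  HeaderIndex index;
  while (!raw.empty()) {
    const std::size_t eol = raw.find("\r\n");
    const std::string_view line = raw.substr(0, eol);
    if (line.empty()) {
      break;  // a blank line terminates the header block
    }
    const std::size_t colon = line.find(':');
    if (colon != std::string_view::npos) {
      std::string_view value = line.substr(colon + 1);
      // Skip optional whitespace after the colon.
      while (!value.empty() && (value.front() == ' ' || value.front() == '\t')) {
        value.remove_prefix(1);
      }
      index.emplace(line.substr(0, colon), value);  // first occurrence wins
    }
    if (eol == std::string_view::npos) {
      break;  // last line had no terminator
    }
    raw.remove_prefix(eol + 2);
  }
  return index;
}
```

A lookup such as `index.at("Host")` then returns a view that points straight into the original buffer, so the only allocation is the hash map itself.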
The PR that I just submitted to the http_parser project should help a bit with the CPU usage. I wasn't able to vectorize the parsing code, but I was able to reduce the average number of conditional branches per input character and also replace some memory reads/writes with register operations. The parsing code might still be vectorizable. A common operation is to find the next instance of any of a small set of characters. I can think of a way to do that in x86 using N+1 SSE registers, where N is the cardinality of the set, but I'd rather avoid adding any assembly language to the codebase if possible.
One other option to look at is the picohttpparser which according to the benchmark on its page should be about 3-5x faster than http-parser
Thanks for the pointer; I'll take a look at picohttpparser.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
@brian-pane Is there still more work you'd like to do on this issue, or should we consider it fixed with #3505?
There's still room for improvement in the HTTP/1 parser speed, but I don't think I'll have time to work on it in the near future. So I'm ok with closing this issue.
Description:
From a callgraph profile (`perf record -g`) of an Internet-facing Envoy instance serving mostly HTTP/1.1 traffic, I found that the parsing of request headers and building of the internal HeaderMap structure is somewhat CPU-intensive.
Here is the relevant part of the `perf report --sort parent` output, showing the percentage of total non-idle clock cycles spent in function+children: