
perf: reduce the CPU usage of HTTP/1.1 header parser #2880

Closed
brian-pane opened this issue Mar 22, 2018 · 12 comments

Comments

@brian-pane
Contributor

Description:
From a callgraph profile (perf record -g) of an Internet-facing Envoy instance serving mostly HTTP/1.1 traffic, I found that the parsing of request headers and the building of the internal HeaderMap structure are somewhat CPU-intensive.

Here is the relevant part of the perf report --sort parent output, showing the percentage of total non-idle clock cycles spent in function+children:

-   86.12%    86.12%  [other]
   - 80.94% start_thread
      - _ZZN5Envoy6Thread6ThreadC4ESt8functionIFvvEEENUlPvE_4_FUNES5_
         - 80.89% Envoy::Server::WorkerImpl::threadRoutine
            - 80.73% event_base_loop
               - 75.65% event_process_active_single_queue.isra.29
                  - 62.25% Envoy::Event::FileEventImpl::assignEvents(unsigned int)::{lambda(int, short, void*)#1}::_FUN
                     - 62.04% Envoy::Network::ConnectionImpl::onFileEvent
                        - 41.59% Envoy::Network::ConnectionImpl::onReadReady
                           - 34.33% Envoy::Network::FilterManagerImpl::onContinueReading
                              - 19.49% Envoy::Http::ConnectionManagerImpl::onData
                                 - 18.05% Envoy::Http::Http1::ConnectionImpl::dispatch
                                    - 17.62% Envoy::Http::Http1::ConnectionImpl::dispatchSlice
                                       - 17.35% http_parser_execute
                                          - 8.54% Envoy::Http::Http1::ConnectionImpl::onHeadersCompleteBase
                                             - 8.44% Envoy::Http::Http1::ServerConnectionImpl::onHeadersComplete
                                                - 8.01% Envoy::Http::ConnectionManagerImpl::ActiveStream::decodeHeaders
                                                   - 6.30% Envoy::Http::ConnectionManagerImpl::ActiveStream::decodeHeaders
                                                      - 4.54% Envoy::Router::Filter::decodeHeaders
                                                         - 2.35% Envoy::Http::Http1::ConnPoolImpl::newStream
                                                            - 1.82% Envoy::Http::Http1::ConnPoolImpl::attachRequestToClient
                                                               - 1.17% Envoy::Router::Filter::UpstreamRequest::onPoolReady
                                                                  - 1.12% Envoy::Http::StreamEncoderWrapper::encodeHeaders
                                                                       0.91% Envoy::Http::Http1::RequestStreamEncoderImpl::encodeHeaders
                                                        0.69% Envoy::Http::IpTaggingFilter::decodeHeaders
                                                     0.75% Envoy::Http::ConnectionManagerUtility::mutateRequestHeaders
                                                     0.53% Envoy::Http::ConnectionManagerImpl::ActiveStream::refreshCachedRoute
                                          - 3.52% Envoy::Http::Http1::ConnectionImpl::{lambda(http_parser*)#7}::_FUN
                                             - Envoy::Http::Http1::ServerConnectionImpl::onMessageComplete
                                                - 2.51% Envoy::Http::ConnectionManagerImpl::ActiveStream::decodeHeaders
                                                   - 1.82% Envoy::Http::ConnectionManagerImpl::ActiveStream::decodeHeaders
                                                      - 1.23% Envoy::Router::Filter::decodeHeaders
                                                           0.53% Envoy::Http::Http1::ConnPoolImpl::newStream
                                                - 0.75% Envoy::Http::ConnectionManagerImpl::ActiveStream::decodeData
                                                   - 0.64% Envoy::Router::Filter::decodeData
                                                            0.53% Envoy::Router::Filter::onRequestComplete
                                          - 1.33% Envoy::Http::Http1::ConnectionImpl::{lambda(http_parser*, char const*, unsigned long)#3}::_FUN
                                             - Envoy::Http::Http1::ConnectionImpl::onHeaderField
                                                - 1.17% Envoy::Http::Http1::ConnectionImpl::completeLastHeader
                                                     0.69% Envoy::Http::HeaderMapImpl::insertByKey
                                          - 1.33% Envoy::Http::Http1::ConnectionImpl::{lambda(http_parser*)#1}::_FUN
                                             - 1.28% Envoy::Http::Http1::ServerConnectionImpl::onMessageBegin
                                                - Envoy::Http::ConnectionManagerImpl::newStream
                                                     0.69% Envoy::Server::Configuration::HttpConnectionManagerConfig::createFilterChain
                                          - 1.12% Envoy::Http::Http1::ConnectionImpl::{lambda(http_parser*, char const*, unsigned long)#6}::_FUN
                                             - Envoy::Http::Http1::ServerConnectionImpl::onBody
                                                  0.64% Envoy::Http::ConnectionManagerImpl::ActiveStream::decodeData
@mattklein123
Member

FYI I have spent considerable time heavily optimizing this code path. There is unlikely to be any low-hanging fruit here -- HTTP proxy servers end up spending an enormous amount of time dealing with headers. With that said, I would love to have someone else take a look to see if they can find any wins.

@brian-pane
Contributor Author

brian-pane commented Mar 23, 2018

I'll sign up to study the performance of http_parser.
I'm looking at a source-line-level profile of the http_parser library now. It might be possible to speed up the parser by vectorizing more of it with memchr. Conveniently, that library comes with a benchmark program, which will make it easy to validate potential optimizations.
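As a rough illustration of the memchr idea (the helper below is a hypothetical sketch, not http_parser's actual code): let libc's memchr, which most implementations vectorize internally, jump straight to the CR that terminates a header line instead of testing each byte in a branchy per-character loop.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper, not part of http_parser: scan a header-value span
// with memchr (typically SIMD-accelerated inside libc) rather than a
// byte-at-a-time loop with a branch per character.
static const char* find_value_end(const char* p, const char* end) {
  const void* cr = memchr(p, '\r', static_cast<size_t>(end - p));
  return cr ? static_cast<const char*>(cr) : end;
}
```

The win comes from moving the per-byte branches out of the parser's hot loop and into a routine the C library has already tuned for the target CPU.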

@mattklein123
Member

@brian-pane FYI I didn't look at http_parser at all. Yes it's possible that there might be some vectorizing wins. I only optimized the layer on top (custom header string, custom data structure for storing headers, and reducing as many copies as possible).

@brian-pane
Contributor Author

An idea for anybody who has time to look into HeaderMapImpl: determine whether shrinking that struct, and/or grouping the commonly accessed O(1) header fields so they share a cache line, improves performance through better L1 cache hit rates. (The approach we used in Proxygen might also be of interest if cache footprint turns out to be an issue: https://github.com/facebook/proxygen/blob/master/proxygen/lib/http/HTTPHeaders.h)
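A minimal layout sketch of the cache-line grouping idea (field names are illustrative, not Envoy's actual HeaderMapImpl): keep the inline entries for the hottest O(1) headers adjacent, so a request's common lookups touch one 64-byte cache line instead of several.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout sketch: eight pointers to the hottest headers grouped
// together. On a 64-bit target this is exactly 64 bytes, i.e. one typical
// cache line, so touching any of these fields pulls in all of them.
struct HotHeaderRefs {
  void* host;
  void* path;
  void* method;
  void* content_length;
  void* connection;
  void* transfer_encoding;
  void* content_type;
  void* date;
};
static_assert(sizeof(HotHeaderRefs) <= 64,
              "hot headers should fit in one cache line on 64-bit targets");
```

A static_assert like this makes the layout intent enforceable, so a later refactor that bloats the struct fails at compile time rather than silently regressing L1 behavior.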

@mattklein123
Member

@brian-pane yeah agreed, would love some analysis on the size of the struct, the size of the header string interning buffer, cache alignment, etc. Have never had time to do any of that.

@alyssawilk
Contributor

Other potential wins from our heavily optimized non-Envoy header parsing: we take the raw header buffer and simply annotate header locations with a string-view equivalent (zero-copy header parsing), and we use hash-map lookups for referencing headers (better than list walking for the non-O(1) headers).
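The zero-copy scheme described above might look roughly like this (types and names are hypothetical, not the actual implementation): record string_views into the raw read buffer instead of copying each header out, and index them in a hash map so lookups of non-O(1) headers avoid a linear list walk.

```cpp
#include <cassert>
#include <string_view>
#include <unordered_map>

// Hypothetical sketch of a zero-copy header index. The string_views borrow
// from the connection's read buffer, so the buffer must outlive the index --
// valid while the request is in flight, which is when lookups happen.
struct HeaderIndex {
  std::unordered_map<std::string_view, std::string_view> by_name;

  // Called once per parsed header: no allocation or copy of the bytes,
  // just two (pointer, length) pairs recorded in the map.
  void annotate(std::string_view name, std::string_view value) {
    by_name.emplace(name, value);
  }

  // O(1) average lookup; returns an empty view when the header is absent.
  std::string_view get(std::string_view name) const {
    auto it = by_name.find(name);
    return it == by_name.end() ? std::string_view{} : it->second;
  }
};
```

The trade-off is lifetime coupling: the views dangle once the read buffer is recycled, so this only works when header access is confined to the request's processing window.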

@brian-pane
Contributor Author

The PR that I just submitted to the http_parser project should help a bit with the CPU usage. I wasn't able to vectorize the parsing code, but I was able to reduce the average number of conditional branches per input character and also replace some memory reads/writes with register operations.

The parsing code might still be vectorizable. A common operation is to find the next instance of any of a small set of characters. I can think of a way to do that on x86 using N+1 SSE registers, where N is the cardinality of the set, but I'd rather avoid adding any assembly language to the codebase if possible.
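For what it's worth, the N+1-register scheme can be expressed with compiler intrinsics rather than raw assembly. A sketch for the delimiter set {'\r', '\n', ':'} (so N = 3; the function name and exact delimiter set are illustrative assumptions): one register holds 16 input bytes, each of the other three holds one broadcast delimiter, and the compare masks are ORed together.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics -- baseline on x86-64, no inline asm

// Hypothetical sketch: return the offset of the first '\r', '\n', or ':'
// within the next 16 bytes of p, or 16 if none is present. Caller must
// guarantee at least 16 readable bytes at p.
static int find_delim16(const char* p) {
  const __m128i data  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
  const __m128i cr    = _mm_set1_epi8('\r');   // one register per delimiter
  const __m128i lf    = _mm_set1_epi8('\n');
  const __m128i colon = _mm_set1_epi8(':');
  const __m128i hits  = _mm_or_si128(
      _mm_cmpeq_epi8(data, cr),
      _mm_or_si128(_mm_cmpeq_epi8(data, lf), _mm_cmpeq_epi8(data, colon)));
  const int mask = _mm_movemask_epi8(hits);    // one bit per matching byte
  // __builtin_ctz is GCC/Clang-specific; MSVC would use _BitScanForward.
  return mask ? __builtin_ctz(mask) : 16;
}
```

This replaces up to 16 per-byte branch decisions with three vector compares and a single conditional, at the cost of x86-specific code behind an #ifdef.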

@georgi-d
Contributor

One other option to look at is the picohttpparser library, which, according to the benchmark on its project page, should be about 3-5x faster than http_parser.

@brian-pane
Contributor Author

Thanks for the pointer; I'll take a look at picohttpparser.

@stale

stale bot commented Jul 25, 2018

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

The stale bot added the "stale" label on Jul 25, 2018.
@ggreenway
Contributor

@brian-pane Is there still more work you'd like to do on this issue, or should we consider it fixed with #3505?

The stale bot removed the "stale" label on Jul 25, 2018.
@brian-pane
Contributor Author

There's still room for improvement in the HTTP/1 parser speed, but I don't think I'll have time to work on it in the near future. So I'm ok with closing this issue.
