Benchmarking Envoy RPS #28318

Closed · cyrus-mc opened this issue Jul 10, 2023 · 6 comments
Labels: area/perf, stale

@cyrus-mc
Title: Benchmarking Envoy

Description:

I understand that when it comes to benchmarking Envoy there are many factors that come into play. My need to benchmark arose strictly from what I observed in our own setup of Envoy (which underpins Emissary Ingress) and from looking at Istio's implementation (which uses Envoy for its data plane).

My goal was to isolate Envoy as much as possible and drive as high an RPS as possible. My expectations of what should be achievable were somewhat guided by the previous issue #5536.

Describe the issue.

Before I go into my findings, I will outline the setup I used when performing the tests. As mentioned above, I have attempted to isolate the tests to Envoy itself, removing any other contributing factors that could limit throughput.

I am testing against Envoy version 1.26 with the following configuration:

    admin:
      address:
        socket_address:
          protocol: TCP
          address: 0.0.0.0
          port_value: 9901

    static_resources:
      listeners:
      - name: listener_0
        address:
          socket_address:
            protocol: TCP
            address: 0.0.0.0
            port_value: 10000
        filter_chains:
        - filter_chain_match:
            server_names:
            - envoy.ce.dat.com
            transport_protocol: tls
        - filters:
          - name: envoy.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              stat_prefix: ingress_http
              route_config:
                name: local_route
                virtual_hosts:
                  - name: lcoal_service
                    domains: [ "*" ]
                    routes:
                    - match:
                        prefix: "/"
                      route:
                        cluster: debug
              http_filters:
              - name: envoy.filters.http.cors
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      clusters:
      - name: debug
        connect_timeout: 0.25s
        dns_lookup_family: V4_ONLY
        lb_policy: ROUND_ROBIN
        circuit_breakers:
          thresholds:
            - max_connections: 8192
              max_pending_requests: 8192
              priority: DEFAULT
        load_assignment:
          cluster_name: debug
          endpoints:
            - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 10.36.17.214
                      port_value: 3000

The upstream endpoints are all defined statically. In this case the upstream is a simple NodeJS hello world service, designed to be as dead simple as possible.

Envoy is running within Kubernetes on a dedicated node (aside from the usual DaemonSet pods such as the CNI, kube-proxy, etc.) with 2 vCPUs and 4 GiB of memory (c5.large). Attached storage is provisioned IOPS at 5k. No resource requests or limits are set on the Envoy pod.

The upstream, as mentioned, is a simple NodeJS hello-world service. Since I am generating very high RPS, I wanted to ensure the upstream wasn't limiting throughput, so for the duration of the tests I scaled it up to 130 pods, all on dedicated nodes.
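
For reference, a minimal sketch of that kind of upstream (my assumption of its shape, not the exact service used), listening on port 3000 to match the cluster endpoint above:

    // hello.js - minimal Node.js hello-world upstream (illustrative sketch only)
    const http = require('http');

    const server = http.createServer((req, res) => {
      // Return a tiny static payload so the upstream adds as little latency as possible
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('hello world\n');
    });

    // Port 3000 matches the endpoint configured in the "debug" cluster above
    server.listen(3000);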

My test suite is k6, using a simple ramping-arrival-rate executor:

import http from 'k6/http';
import { sleep, check } from 'k6';
import { Counter } from 'k6/metrics';

// A simple counter for http requests

export const requests = new Counter('http_reqs');

// stages of the test (ramp up/down patterns) are specified through the options object;
// for the ramping-arrival-rate executor, target is the iteration (request) rate per timeUnit, not a VU count

export const options = {
  discardResponseBodies: true,
  noVUConnectionReuse: false,
  scenarios: {
    contacts: {
      executor: 'ramping-arrival-rate',

      startRate: 100,

      timeUnit: '1s',

      preAllocatedVUs: 4000,

      stages: [
        { target: 10000, duration: '8m' },

        { target: 10000, duration: '5m' },
      ],

    },
  },
};

export default function () {
  // our HTTP request, note that we are saving the response to res, which can be accessed later
  //
  const params = {
    headers: {}
  }

  const res = http.get('http://xxxxxx.com/');

  const checkRes = check(res, {
    'status is 200': (r) => r.status === 200,
    'protocol is HTTP/2': (r) => r.proto == 'HTTP/2.0',
  });
}

The above script ramps up to 10k RPS over 8 minutes and then holds that rate for an additional 5 minutes.

When I execute this test, I note the following results:

  1. Underlying node CPU

  [image: node-cpu-no-logging]

During peak RPS (around 17:31) the idle CPU (the green descending line) drops to about 21%.

  2. Underlying node load average (1 minute)

  [image: node-load-no-logging]

Sorry, I cut off the Y-axis, but the load peaks just above 2 and for the most part stays under 2.

Based on these metrics, my conclusion is that the underlying node is not saturated and could handle additional RPS.

  3. Upstream 99th-percentile latency

  [image: latency-no-logging]

This graph is interesting in that it shows the response latency for the upstream service (again, I apologize for cutting off the Y-axis). The latency sits around 5 ms until it hockey-sticks up to almost 300 ms; this uptick corresponds to roughly 7k RPS.

I have done additional testing on nodes of different sizes (CPU and memory), and based on what I see, latency increases around the 3,500 RPS per CPU mark: a 1-CPU node can push about 3,500 RPS before latency drastically increases, a 2-CPU node about 7k, and so on.
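
For context (an assumption on my part about defaults, not something I verified in this setup): Envoy runs one worker thread per hardware thread by default, which would be consistent with throughput scaling roughly per CPU. The worker count can also be pinned explicitly with the --concurrency command-line flag; a hypothetical container spec fragment for the 2-vCPU node might look like:

    # Illustrative fragment only: image tag and names are placeholders, not the exact manifest used in these tests
    containers:
      - name: envoy
        image: envoyproxy/envoy:v1.26-latest
        args:
          - --config-path
          - /etc/envoy/envoy.yaml
          # pin worker threads to the node's 2 vCPUs (Envoy otherwise defaults to the hardware thread count)
          - --concurrency
          - "2"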

The above tests were all run with access logging disabled. If I enable access logging (since that is standard in most setups), the results are even worse.

Using this Envoy configuration:

    admin:
      access_log_path: /dev/fd/1
      address:
        socket_address:
          protocol: TCP
          address: 0.0.0.0
          port_value: 9901

    static_resources:
      listeners:
      - name: listener_0
        address:
          socket_address:
            protocol: TCP
            address: 0.0.0.0
            port_value: 10000
        filter_chains:
        - filter_chain_match:
            server_names:
            - envoy.ce.dat.com
            transport_protocol: tls
        - filters:
          - name: envoy.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              stat_prefix: ingress_http
              access_log:
                - name: envoy.access_loggers.file
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
                    json_format:
                      authority: "%REQ(:AUTHORITY)%"
                      bytes_received: "%BYTES_RECEIVED%"
                      bytes_sent: "%BYTES_SENT%"
                      downstream_direct_remote_address: "%DOWNSTREAM_DIRECT_REMOTE_ADDRESS%"
                      downstream_local_address: "%DOWNSTREAM_LOCAL_ADDRESS%"
                      downstream_remote_address: "%DOWNSTREAM_REMOTE_ADDRESS%"
                      duration: "%DURATION%"
                      istio_policy_status: "%DYNAMIC_METADATA(istio.mixer:status)%"
                      method: "%REQ(:METHOD)%"
                      path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
                      protocol: "%PROTOCOL%"
                      request_id: "%REQ(X-REQUEST-ID)%"
                      requested_host: "%REQ(HOST)%"
                      requested_server_name: "%REQUESTED_SERVER_NAME%"
                      response_code: "%RESPONSE_CODE%"
                      response_flags: "%RESPONSE_FLAGS%"
                      start_time: "%START_TIME%"
                      upstream_cluster: "%UPSTREAM_CLUSTER%"
                      upstream_host: "%UPSTREAM_HOST%"
                      upstream_local_address: "%UPSTREAM_LOCAL_ADDRESS%"
                      upstream_service_time: "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%"
                      upstream_transport_failure_reason: "%UPSTREAM_TRANSPORT_FAILURE_REASON%"
                      user_agent: "%REQ(USER-AGENT)%"
                      x_forwarded_for: "%REQ(X-FORWARDED-FOR)%"
                      x_user_id: "%REQ(X-USER-ID)%"
                    path: /dev/fd/1
              route_config:
                name: local_route
                virtual_hosts:
                  - name: lcoal_service
                    domains: [ "*" ]
                    routes:
                    - match:
                        prefix: "/"
                      route:
                        cluster: debug
              http_filters:
              - name: envoy.filters.http.cors
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      clusters:
      - name: debug
        connect_timeout: 0.25s
        dns_lookup_family: V4_ONLY
        lb_policy: ROUND_ROBIN
        circuit_breakers:
          thresholds:
            - max_connections: 8192
              max_pending_requests: 8192
              priority: DEFAULT
        load_assignment:
          cluster_name: debug
          endpoints:
            - lb_endpoints:

With access logging enabled and the same ramping-arrival-rate k6 executor, I can only obtain a maximum of about 7k RPS: the node CPU hits 100%, load spikes above 3.5, and upstream latency spikes to 1 second.

While I don't know for sure, these numbers seem awfully low to me, especially considering #5536 talks about 40k requests per second on a 4-CPU box.

My Envoy configuration is pretty simple, so I don't think there are changes I could make there to improve throughput. I also feel that I have eliminated any external factors.

cyrus-mc added the triage label on Jul 10, 2023
@cyrus-mc (Author)

Istio, in its benchmarking documentation, states:

The Envoy proxy uses 0.35 vCPU and 40 MB memory per 1000 requests per second going through the proxy

That is a far cry from what I am seeing.

alyssawilk added the area/perf label and removed the triage label on Jul 12, 2023
@alyssawilk (Contributor)

cc @yanavlasov for ideas on who might be into Envoy benchmarking
cc @wbpcode for when he's back

@wbpcode (Member) commented Jul 13, 2023

First, compared to older versions, the newer Envoy versions consume more CPU and show poorer benchmark results; see #19103. Newer Envoy does more verification and has more features, which degrades its performance, but we are trying to improve it.
Second, if extreme performance is necessary, then json_format is not recommended for now because of the poor performance of the JSON serializer in the protobuf library.
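
For illustration, a rough sketch (not verified against this exact setup) of the same file access logger using a plain text format instead of json_format, which avoids the protobuf JSON serializer:

    access_log:
      - name: envoy.access_loggers.file
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
          path: /dev/fd/1
          # log_format with text_format_source bypasses json_format's protobuf JSON serialization
          log_format:
            text_format_source:
              inline_string: "[%START_TIME%] \"%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%\" %RESPONSE_CODE% %RESPONSE_FLAGS% %DURATION% \"%REQ(X-REQUEST-ID)%\" \"%UPSTREAM_HOST%\"\n"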

@soulxu (Member) commented Jul 13, 2023

The performance differs depending on the CPU's capabilities; it isn't certain that #5536 was run on a CPU with the same performance as yours.

Also, wrk is a closed-loop load generator, while k6's ramping-arrival-rate executor is open-loop: https://k6.io/docs/using-k6/scenarios/concepts/open-vs-closed/

So I'm not sure the two cases are comparable.

If we really want to compare them: in #5536, 10 wrk threads were run against a 4-thread Envoy. On my machine, Envoy hits 100% CPU in that case. It got 40k RPS across 4 threads, so 1 thread should be about 10k (assuming those 4 threads sit on physical CPU cores without hyperthreading). Compare that to your case, which is 7k RPS when the CPU hits 100%, and with TLS enabled while #5536 is non-TLS; it doesn't seem too far off.

But yes, I'm still not sure the two cases are comparable.

Just my two cents; I'm not sure I'm right.

@github-actions

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions bot added the stale label on Aug 12, 2023
@github-actions

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Aug 19, 2023