Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tls: add per-upstream session key caching support #38479

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jronak
Copy link
Contributor

@jronak jronak commented Feb 17, 2025

Commit Message:
This pull request introduces a new configuration option, max_session_cache_upstream_hosts, which allows TLS session keys to be cached and used on a per-upstream host basis. It addresses #33512 by ensuring that a session key generated for one upstream host isn’t mistakenly reused by another.

Additional Description:
In a typical service mesh, upstream hosts serving the same destination might have unique TLS configurations (for example, different session ticket keys). As a result, a session key issued by one host shouldn’t be used with another. This pull request enables caching on a per-upstream host basis, ensuring that session keys are used only by the host that generated them, which in turn helps maintain correct and efficient TLS session resumption.

This option sets the maximum number of upstream hosts that can cache and use TLS session keys. It works together with the existing max_session_keys setting, which limits the number of session keys per host. For instance, if you set max_session_keys to 10 and max_session_cache_upstream_hosts to 20, you can cache up to 200 session keys overall.

I'll follow up with another pull request to add support for the following improvements:

  • Automatically deleting session keys for an upstream host when that host is removed (e.g., via an EDS update).
  • Optionally allowing session key storage per worker thread. Currently, the session cache is shared across all workers, which has significantly improved the session reuse rate internally at Uber. However, this shared cache might become a bottleneck for workloads with high connection churn due to locking overhead.

Risk Level: Low
Testing: Unit tests

Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @markdroth
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #38479 was opened by jronak.

see: more, trace.

@adisuissa
Copy link
Contributor

Assigning TLS extension codeowner for a first-pass.
/assign @botengyao

// would be 200.
//
// Defaults to 0, which disables TLS session key caching per upstream host.
google.protobuf.UInt32Value max_session_cache_upstream_hosts = 8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that there will be other configurable parameters for per-upstream-host session caching in the future? If so, consider using a separate message as an alternative:

message TlsPerHostSessionCacheConfig {
  uint32 max_session_cache_upstream_hosts = 1;

  // ...can add more fields here in the future...
}

message UpstreamTlsContext {
  // ...existing fields...

  // If set, per-host session caching is enabled.
  TlsPerHostSessionCacheConfig per_host_session_cache_config = 8;
}

Copy link
Contributor Author

@jronak jronak Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it could potentially carry other features in the future, updated with separate message. Thanks for the review!

@@ -25,7 +25,7 @@ option (udpa.annotations.file_status).package_version_status = ACTIVE;
// [#extension: envoy.transport_sockets.tls]
// The TLS contexts below provide the transport socket configuration for upstream/downstream TLS.

// [#next-free-field: 8]
// [#next-free-field: 9]
message UpstreamTlsContext {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something we might also need in DownstreamTlsContext for the server side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature is only needed on the client/upstream side. Servers/downstreams issue stateless session keys (tickets) for resumption, so they don't require per-host session caching.

@markdroth
Copy link
Contributor

/wait

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants