CA: implement prioritization/rate limiting #334

howardjohn · 2023-01-12T23:27:06Z

Depends on #298

Currently we have 3 sources of CA calls:

On demand: a request is incoming and we need a cert (right now!).
Background refresh: we have a cert and it's near expiration, so we need to refresh it. By default the lifetime is 24hr and refresh starts at 12hr.
Prewarming: a new workload appears (or ztunnel just started) and we want to preload the cert to reduce latency on the first call (avoid "cold starts").

In order of importance, this likely looks like: On Demand >>>> Background refresh when really close to expiration > Prewarming == Background refresh. This could be simplified to just "on demand is top priority".
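As a rough illustration of that ordering, here is a minimal Python sketch (not ztunnel code). The `Priority` names, the `classify` helper, and the one-hour `URGENT_WINDOW` threshold for "really close to expiration" are all assumptions made for the example; the issue does not define them.

```python
from datetime import timedelta
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    # Lower value = more urgent. Names are illustrative only.
    ON_DEMAND = 0
    URGENT_REFRESH = 1
    BACKGROUND = 2  # prewarming and ordinary background refresh share a level


# Assumed threshold for "really close to expiration"; not taken from the issue.
URGENT_WINDOW = timedelta(hours=1)


def classify(source: str, time_to_expiry: Optional[timedelta] = None) -> Priority:
    """Map a request source to a priority, following the ordering described above."""
    if source == "on_demand":
        return Priority.ON_DEMAND
    if (
        source == "refresh"
        and time_to_expiry is not None
        and time_to_expiry <= URGENT_WINDOW
    ):
        return Priority.URGENT_REFRESH
    if source in ("refresh", "prewarm"):
        return Priority.BACKGROUND
    raise ValueError(f"unknown source: {source!r}")
```

Collapsing `URGENT_REFRESH` into `BACKGROUND` gives the simplified "on demand is top priority" variant.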

Additionally, CA requests are expensive. 255 concurrent requests to 1 istiod would almost certainly overwhelm it (in CPU cost); other CAs may have different constraints. We are also not the only client. We need a sensible strategy that balances not killing the CA against getting all the certs when we want them.
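One simple way to avoid flooding the CA is to cap the number of in-flight requests. The Python sketch below does this with `asyncio.Semaphore`. It is only an illustration: `LimitedCaClient`, `fake_ca`, and the cap of 8 are hypothetical, and the right limit depends on the CA in use.

```python
import asyncio

MAX_IN_FLIGHT = 8  # assumed cap; the right value depends on the CA


class LimitedCaClient:
    """Wraps a hypothetical async `fetch(identity)` callable with a concurrency cap."""

    def __init__(self, fetch, max_in_flight: int = MAX_IN_FLIGHT):
        self._fetch = fetch
        self._sem = asyncio.Semaphore(max_in_flight)
        self.in_flight = 0
        self.peak = 0  # highest number of simultaneous requests observed

    async def fetch_cert(self, identity: str):
        # Callers beyond the cap wait here instead of hitting the CA.
        async with self._sem:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await self._fetch(identity)
            finally:
                self.in_flight -= 1


async def fake_ca(identity: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a CSR round trip
    return f"cert-for-{identity}"


async def demo(n: int = 255, cap: int = MAX_IN_FLIGHT):
    client = LimitedCaClient(fake_ca, cap)
    certs = await asyncio.gather(*(client.fetch_cert(f"id-{i}") for i in range(n)))
    return certs, client.peak
```

Running `asyncio.run(demo())` issues 255 requests, but the fake CA never sees more than `cap` of them at once. A plain semaphore does not distinguish request sources, so on its own it does not give on-demand requests any preference.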

At the very least, we should have a way to prioritize on-demand requests. This could be something simple like adding some delay to prewarming, or using a priority queue.
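A minimal sketch of the priority-queue option, in Python with `heapq`. This is not how ztunnel implements it; `CertRequestQueue` and the two priority levels are hypothetical. The "upgrade" behavior (an on-demand request for an identity that is already queued for prewarming jumps ahead) is an assumption added for the example.

```python
import heapq
import itertools

ON_DEMAND, BACKGROUND = 0, 1  # lower number is served first


class CertRequestQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a priority
        self._best = {}  # identity -> best (lowest) priority currently queued

    def push(self, identity, priority):
        # Ignore the request if the identity is already queued at an equal
        # or better priority.
        if self._best.get(identity, priority + 1) <= priority:
            return
        self._best[identity] = priority
        heapq.heappush(self._heap, (priority, next(self._seq), identity))

    def pop(self):
        """Return the next identity to fetch, or None if the queue is empty."""
        while self._heap:
            priority, _, identity = heapq.heappop(self._heap)
            # Skip stale entries that were superseded by a higher-priority push.
            if self._best.get(identity) == priority:
                del self._best[identity]
                return identity
        return None
```

With this queue, pushing `"a"` and `"b"` as `BACKGROUND` and then `"c"` and `"b"` as `ON_DEMAND` yields `"c"`, `"b"`, `"a"` on successive `pop()` calls: on-demand work goes first, and the earlier background entry for `"b"` is dropped as stale.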

bleggett · 2023-02-02T18:58:20Z

Dumb q - do we (or do we intend to) support the SDS protocol so that ztunnel could play nice with a compatible replacement workload CA/SDS, like the rest of Istio does today?

howardjohn · 2023-02-02T19:02:17Z

I started #251 to define these integration interfaces. I hadn't put SDS in there but it's plausible.

I will say in general the idea has been to have a few common integration points. For example, instead of 10 telemetry providers just use OTEL, which itself supports the kitchen sink of providers.

There is less of a standard around CAs compared to telemetry though.

keithmattix · 2023-06-23T18:46:59Z

I've seen some semblance of this in the code; has this been implemented?

howardjohn · 2023-06-23T18:49:59Z

yeah I think this is done. thanks!


adiprerepa self-assigned this Jan 13, 2023

howardjohn mentioned this issue Jan 23, 2023

certificates: stop trying to refresh when identity is no longer on the node #351

Closed

adiprerepa removed their assignment Jan 26, 2023

howardjohn assigned qfel Feb 7, 2023

This was referenced Feb 13, 2023

Implement prioritized certificate fetching #391

Merged

Clean up error handling in Secret Manager #406

Merged

howardjohn closed this as completed Jun 23, 2023
