Define complete set of integration points #251

howardjohn · 2022-12-06T18:22:32Z

A concern for ztunnel is that it expands to support an unbounded number of integrations, leading to similar issues we have with Envoy. A goal of ztunnel is that it is light and minimal.

I would propose we limit integration points to the following:

Metrics: Prometheus and OTLP
Access Logs: stdout and OTLP
Traces: OTLP (if we implement tracing)
CA: Citadel gRPC and Kubernetes CSR (we can actually drop Citadel if Istiod supports CSR, but we might want it for VMs)

type: design

bleggett · 2023-02-02T19:06:54Z

One thing I think it is important for ambient to continue to support is support for alternative SPIFFE-compliant workload CAs that implement SDS

We already support that in sidecar so parity is good/important, but in general I think having as few opinions as possible in ztunnel about how certs are minted and rotated and what the CA is attesting (or even what CA you happen to be using to do those things - our default or another) is good for constraining ztunnel's complexity.

We could build our own SDS-lite in ztunnel, and obviously we don't want to implement envoy specs for everything, but where it gets us broader compatibility with existing stuff and lets us clearly demarcate what ztunnel will not do and will not implement, I think it can be helpful.

howardjohn · 2023-02-02T19:09:55Z

Any option would be a tradeoff between supported integrations and complexity. SDS might make the cut. The Kubernetes CSR is a core K8s API intended for exactly that purpose though, so I would be curious if existing SDS implementations could/would implement the K8s CSR API - and if not, their reasoning may be interesting

howardjohn · 2023-02-02T19:10:26Z

Hmm I guess SDS probably gets us a full cert+priv key, while CSR simply signs them.

bleggett · 2023-02-02T19:17:10Z

@howardjohn does ztunnel currently (or do we expect it to in the future) mint workload certs or sign anything related to workloads itself, or does it largely work like sidecars do and simply expect $some_workload_cert to exist and be validatable against some authority?

I would assume not, and that it should be able to remain pretty stupid about workload certs.

howardjohn · 2023-02-02T19:29:58Z

We have a certificate per workload identity on the node, not just 1 cert

bleggett · 2023-02-02T20:20:12Z

We have a certificate per workload identity on the node, not just 1 cert

Right - but ztunnel shouldn't ever need to mint any of those, it should just expect to obtain them, and validate them against some authority, correct.

howardjohn · 2023-02-02T20:21:47Z

If there was some way to obtain them, yes. Currently there isn't afaik

bleggett · 2023-02-03T15:14:53Z

Linking istio/istio#42339 as potentially related - as currently sidecar proxy SDS support is how we integrate SPIRE with Sidecar and presumably we'd need a semi-compatible model for ambient.

howardjohn · 2023-02-03T16:13:37Z

I will note full compatibility with all Istio features is an anti-goal of ambient; part of the goals are to ditch all the baggage accumulated over the years. The goal of this issue is to find the minimum set of integration points we can have. So while I think SDS is plausible, sidecar support is not an important factor in decisions IMO. For example, we would not add lightstep support or something (instead using Otel)

bleggett · 2023-02-03T16:41:21Z

I will note full compatibility with all Istio features is an anti-goal of ambient; part of the goals are to ditch all the baggage accumulated over the years. The goal of this issue is to find the minimum set of integration points we can have. So while I think SDS is plausible, sidecar support is not an important factor in decisions IMO. For example, we would not add lightstep support or something (instead using Otel)

Agreed, and that's certainly fair - but in general keeping ztunnel as cleanly separated from the workload CA as possible in a way that supports people bringing their own seems like a reasonable thing to consider, as a clarifying function for what ztunnel should not care about.

kyessenov · 2023-02-03T17:46:39Z

I think it's a nice aspiration to limit the integrations, but the reality is that the world evolves, and the standards come and go (not to mention it'll be ages before current ones are fully adopted). I'd suggest instead to focus on clear separations in #353, and support integrations with extensions.

bleggett · 2023-02-03T17:55:14Z

I think it's a nice aspiration to limit the integrations, but the reality is that the world evolves, and the standards come and go (not to mention it'll be ages before current ones are fully adopted). I'd suggest instead to focus on clear separations in #353, and support integrations with extensions.

That's an avenue as well, agreed. At the end of the day, I think it's important to have a requirement that ztunnel should be OK with you bringing your own SPIFFE-compliant workload CA instead of forcing people to use the one Istio ships - at this point I don't have a strong opinion on how that should be implemented.

bleggett · 2023-02-08T22:38:45Z

I think @kyessenov brings up a good point wrt "integration points" here in #353 - I think there might be two categories of things we are discussing here:

Documented, end-user facing integrations - "bring your own X and we will play with it" - these are the kinds of things John is mentioning - the list John has here makes sense, and I agree these need to be limited and formalized, as we will have to document and support them here in this repo -> we (this repo) bear the cost.
Internal interfaces/extension points that make it easier to hack on ztunnel proper, but which we don't make stability, compatibility, or functionality guarantees about and which we don't support as user-facing - if you want to use one of these you either need to fork the repo (and bear that cost yourself) or wrap ztunnel with your own code (and bear the cost of keeping in sync with upstream) -> you (entity hacking on ztunnel) bear the cost.

I think we will have to have both of these - I think John's list seems like a very reasonable upper bound for category no. 1 at this time, and I think things like e.g. #353 make sense for category no. 2 and what @kyessenov is pointing out - internal interfaces and abstraction points that make it easier for people to work on and add self-maintained extensions to the code (possibly for future upstreaming, possibly not), but which don't meet the bar of "this repo will support this as a global integration point for end-user BYOS"

As far as for constraining category no. 2, I think the only practical/effective way to do that in the long run is to keep ztunnel's sphere of functional responsibility and feature set limited.

kyessenov · 2023-02-08T22:59:41Z

My point is that limiting (1) is basically impossible, any software will have to majorly adapt to evolving standards in the cloud infrastructure. Even OpenTelemetry is rapidly changing, so it's not clear which version of OpenTelemetry is referenced here. (2) will help re-disk, because it'll limit the impact of change. SPIRE/SPIFFE will also change, and SDS is not really a standard (even Google doesn't always use it for mTLS).

The best way to preserve compatibility is via protocols, so data (HBONE) and config specifications (xDS) is what really would help software evolution.

bleggett · 2023-02-08T23:19:20Z

My point is that limiting (1) is basically impossible, any software will have to majorly adapt to evolving standards in the cloud infrastructure. Even OpenTelemetry is rapidly changing, so it's not clear which version of OpenTelemetry is referenced here. (2) will help re-disk, because it'll limit the impact of change. SPIRE/SPIFFE will also change, and SDS is not really a standard (even Google doesn't always use it for mTLS).

Agree - limiting either (1) or (2) is not entirely possible in the cosmic sense, but being slow/parsimonious with what we agree to support with (1) is something we do need to do, and to your point having (2) de-risks/allows us to do that.

The best way to preserve compatibility is via protocols, so data (HBONE) and config specifications (xDS) is what really would help software evolution.

I agree with that up to the point where we invent our own protocol and specification instead of using/tracking someone else's open spec - OTel is an example of that, it's an open spec + wire standard (the only one, really) with broad support we should track and follow, and not care much about - there's very little value in supporting anything else at this point.

SDS is just a de-facto standard yeah, but the SDS pattern of keeping secrets and CAs (and what those CAs attest about the workloads they are minting certs for) formally separate from proxy concerns is one we will need to follow, regardless of the mechanism.

kyessenov · 2023-02-08T23:29:49Z

I agree with that up to the point where we invent our own protocol and specification instead of using/tracking someone else's open spec - OTel is an example of that, it's an open spec + wire standard (the only one, really) with broad support we should track and follow, and not care much about - there's very little value in supporting anything else at this point.

Yes, I didn't mean we invent custom protocols. Exactly the opposite - we use the standards like MASQUE, xDS data plane config, OTel protocols. Integrations happen on the other side of these protocols - if you want custom telemetry, use an OTel collector; want some fancy routing, modify xDS, etc. Custom secret distribution may be in the domain of xDS (SDS). We should be very clear that HBONE is not a new protocol, it's an adaptation of MASQUE.

The main question raised earlier was about wire integrations, e.g. Wasm modules. If we follow the above principle, the right architecture is to chain traffic using a standard protocol to a different process. This can be a waypoint host or a different runtime that speaks HBONE. Waypoint is not just for HTTP processing, it is really the integration mechanism for the traffic analysis that is potentially unsafe or costly.

bleggett · 2023-02-09T21:33:11Z

Waypoint is not just for HTTP processing, it is really the integration mechanism for the traffic analysis that is potentially unsafe or costly.

I think there are some cases this does not cover - where a waypoint roundtrip is not acceptable, or we have no guarantee that a waypoint exists - but agree in general, especially for anything above L4/simple TCP packet routing.

SpectralHiss · 2023-04-24T13:12:01Z

Continued support for Citadel gRPC (istio-csr API) in ztunnel would allow us to drop-in the istio-csr cert-manager project for ambient as well. is it too much of a pain to move the citadel istio-agent code to rust? we can try and provide assistance on this front.
If not mistaken it seems from this issue that only Kubernetes CSR is currently definitely marked for inclusion in future releases?

howardjohn · 2023-04-24T15:19:07Z

I don't think there is agreement on dropping Citadel yet. This is also currently the only mode actually implemented, btw.

dhruvmehtaaa · 2024-02-06T13:14:13Z

Hello Everyone,
My name is Dhruv Mehta and I am currently a student at IIT Bombay, India. I would like to work on this issue in the upcoming LFX term.

kfaseela · 2024-02-06T14:12:38Z

@dhruvmehtaaa We are still finalizing on the LFX mentee selection process, and once done, will reach out to the selected person.

kfox1111 · 2024-12-21T01:01:18Z

I would think, choosing a mechanism that outsources the key/cert/ca config to an outside process would be much more compatible with existing and future software, then a limited subset of functionality like passing a csr to another process. The key/cert/ca api would allow processes to be plugged in for kubernetes and spire, while just a csr excludes spire support.

howardjohn added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 6, 2022

howardjohn added this to ztunnel tracking Jan 3, 2023

howardjohn mentioned this issue Feb 2, 2023

CA: implement prioritization/rate limitting #334

Closed

bleggett mentioned this issue Feb 9, 2023

proposal: extensions #353

Closed

howardjohn mentioned this issue Feb 14, 2023

WIP - SPIRE integration with ambient mesh istio/istio#43273

Closed

12 tasks

keithmattix added this to Ambient tracking Jun 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Define complete set of integration points #251

Define complete set of integration points #251

howardjohn commented Dec 6, 2022

bleggett commented Feb 2, 2023 •

edited

Loading

howardjohn commented Feb 2, 2023

howardjohn commented Feb 2, 2023

bleggett commented Feb 2, 2023 •

edited

Loading

howardjohn commented Feb 2, 2023

bleggett commented Feb 2, 2023

howardjohn commented Feb 2, 2023

bleggett commented Feb 3, 2023

howardjohn commented Feb 3, 2023

bleggett commented Feb 3, 2023

kyessenov commented Feb 3, 2023

bleggett commented Feb 3, 2023 •

edited

Loading

bleggett commented Feb 8, 2023 •

edited

Loading

kyessenov commented Feb 8, 2023

bleggett commented Feb 8, 2023 •

edited

Loading

kyessenov commented Feb 8, 2023

bleggett commented Feb 9, 2023 •

edited

Loading

SpectralHiss commented Apr 24, 2023 •

edited

Loading

howardjohn commented Apr 24, 2023

dhruvmehtaaa commented Feb 6, 2024

kfaseela commented Feb 6, 2024

kfox1111 commented Dec 21, 2024

Define complete set of integration points #251

Define complete set of integration points #251

Comments

howardjohn commented Dec 6, 2022

bleggett commented Feb 2, 2023 • edited Loading

howardjohn commented Feb 2, 2023

howardjohn commented Feb 2, 2023

bleggett commented Feb 2, 2023 • edited Loading

howardjohn commented Feb 2, 2023

bleggett commented Feb 2, 2023

howardjohn commented Feb 2, 2023

bleggett commented Feb 3, 2023

howardjohn commented Feb 3, 2023

bleggett commented Feb 3, 2023

kyessenov commented Feb 3, 2023

bleggett commented Feb 3, 2023 • edited Loading

bleggett commented Feb 8, 2023 • edited Loading

kyessenov commented Feb 8, 2023

bleggett commented Feb 8, 2023 • edited Loading

kyessenov commented Feb 8, 2023

bleggett commented Feb 9, 2023 • edited Loading

SpectralHiss commented Apr 24, 2023 • edited Loading

howardjohn commented Apr 24, 2023

dhruvmehtaaa commented Feb 6, 2024

kfaseela commented Feb 6, 2024

kfox1111 commented Dec 21, 2024

bleggett commented Feb 2, 2023 •

edited

Loading

bleggett commented Feb 2, 2023 •

edited

Loading

bleggett commented Feb 3, 2023 •

edited

Loading

bleggett commented Feb 8, 2023 •

edited

Loading

bleggett commented Feb 8, 2023 •

edited

Loading

bleggett commented Feb 9, 2023 •

edited

Loading

SpectralHiss commented Apr 24, 2023 •

edited

Loading