-
Notifications
You must be signed in to change notification settings - Fork 564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add WasmPlugin API #1940
Add WasmPlugin API #1940
Conversation
😊 Welcome @dgn! This is either your first contribution to the Istio api repo, or it's been You can learn more about the Istio working groups, code of conduct, and contributing guidelines Thanks for contributing! Courtesy of your friendly welcome wagon. |
Skipping CI for Draft Pull Request. |
/test all |
/test all |
extensions/v1alpha1/wasm.proto
Outdated
// workloads in the same namespace. If the `WasmExtension` is | ||
// present in the config root namespace, it will be applied to all | ||
// applicable workloads in any namespace. | ||
istio.type.v1beta1.WorkloadSelector workload_selector = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way the proposed API is only designed for local workloads correct? What is the thought process to selectively select the workload for a or two other clusters?
You may be thinking we could leverage the MCS expose + import service API but I don't think that makes sense as I could be deploying Wasm extensions to workloads that are not exposed outside of the cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have such cluster/mesh selectors anywhere else? I would imagine you would label them differently, that would certainly work with this API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@linsun what makes you think its local workloads? Everything else selects across all clusters - since the config only lives in one cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the reason I am asking this question is we have a similar Solo API called WasmDeployment and that API allows users to specify which cluster this WasmDeployment is for. So I'm trying to understand how this could be mapped to Istio API.
IIUC, @dgn, in this API, the workload selector is only for the local cluster in multi primary user case so basically Istio selects the matching workload based on the selector and apply the Wasm extension to the workload. This means if I have reviews in cluster 1 and cluster 2, and if I want this Wasm extension to be applied to reviews in both of these clusters, I would deploy a Wasm resource to both clusters
@howardjohn I think you are referring to primary-remote scenario, which I agree the Istio config is applied to all clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In all Istio APIs currently, which I would assume this follows suite, workload selectors apply to all clusters. If we were to change it I would want to change it to all APIs which is a major undertaking - but this one should be consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@linsun we do not have any other api that is cluster specific today. This api uses an existing istio.type.v1beta1.WorkloadSelector message. If you want to propose cluster awareness (or location, or anything other than labels) you should do it as a separate proposal.
@mandarjog I am not proposing adding cluster to this API yet. I am trying to understand the semantics of this API... wanted to see if we could translate Solo's Wasmdeployment resource to this new Wasm resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just mentioning that the semantics here are not just for this API, but for all of Istio. We need to be consistent in our configuration application across the project. So maybe uplevel the conversation out of this PR and into a discussion in TOC or something if we want to talk about how config gets applied to particular endpoints and how to take cluster etc. into account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, it is across all APIs. The question came up specifically here is because the comments were not clear.
I can try create some doc for this. If anyone else is interested, let me know!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does k8s intend to standardize a synthetic cluster label just like they have for namespace IIRC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does anyone know more about this synthetic cluster label? that is interesting. @howardjohn @smawson @nmittler
How could i apply this to a specific service port? inbound and outbound? |
Very good question. In my current PoC, it will be applied to all inbound HTTP filter chains of the service ports. We might want to limit it to specific ports, but I considered this an extension we can add later. I don't think we should apply to outbound listeners as it makes the system quite complex to reason about (different behaviour depending on which service is sending the request) |
extensions/v1alpha1/wasm.proto
Outdated
// <namespace>/<name>. Names that do not contain forward-slashes are | ||
// interpreted as the names of secrets in the same namespace as the | ||
// `WasmExtension`. | ||
repeated string image_pull_secrets = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this actually work? How does the proxy get the secret to pull the image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the current design, the agent is responsible for fetching and unpacking the OCI images, so the agent would have to fetch the secrets using client-go prior to that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think this is viable. Importing client-go into the agent adds major overhead (ie invert this: https://mobile.twitter.com/karlstoney/status/1367134238613663744). Additionally, this would require giving ~every pod RBAC permissions for secrets. The load from every single pod in the cluster opening k8s watches is also likely to kill and non-trivial cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think this is viable. Importing client-go into the agent adds major overhead (ie invert this: https://mobile.twitter.com/karlstoney/status/1367134238613663744).
that would be an argument for moving the pulling of images to a central component, either istiod itself or something else. I don't think it means the API is unimplementable.
Additionally, this would require giving ~every pod RBAC permissions for secrets. The load from every single pod in the cluster opening k8s watches is also likely to kill and non-trivial cluster.
All pods have access to secrets in their namespace by default. For secrets outside the pod namespace, yes, users would have to make sure they have access. I would also argue that watching isn't necessary, as a) pull secrets rarely change and b) we only need it when pulling the image which, depending on pullPolicy, doesn't necessarily happen often. If we get a 403 back, we can still go and re-fetch the Secret
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it means the API is unimplementable.
I agree, but I would not feel comfortable approving an API that doesn't also have a planned implementation?
All pods have access to secrets in their namespace by default.
For mounting yes, but not for reading through the API, right? Since this is dynamic, they would need to read through the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, just compiling in the library adds to the RAM in the tweet, and binary size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also fairly incompatible with VMs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All pods have access to secrets in their namespace by default.
For mounting yes, but not for reading through the API, right? Since this is dynamic, they would need to read through the API.
That's right, nevermind. I had assumed that this was the case by default. Makes sense that it isn't though. So yes, a user would have to guarantee that all pods that need to use an Extension have access to the pullSecret, should a pullSecret be required. It's not a blocker for me personally, but I understand the concern. Pulling from a central component would alleviate this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For istio-agent based fetching, we can make Wasm API only referencing local mounted secrets. That would remove concern of client-go and incompatibility with VMs. Although user would need a way to indicate secret mount, which I original thought ExtensionProvider would be a great fit for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can separate concerns for accessing the extension provider from the use of the extension? E.g. add a Wasm extension provider with pull secrets that is globally defined in MeshConfig, and reference the Wasm provider from this API. We could support both inlined and referenced Wasm extension provider with the intent to allow using secrets only from the referenced provider.
Another q: If I have multiple Wasm images, do I need to create one Wasm resource for each Wasm image/Wasm filter for a given workload? |
extensions/v1alpha1/wasm.proto
Outdated
// workloads in the same namespace. If the `WasmExtension` is | ||
// present in the config root namespace, it will be applied to all | ||
// applicable workloads in any namespace. | ||
istio.type.v1beta1.WorkloadSelector workload_selector = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just mentioning that the semantics here are not just for this API, but for all of Istio. We need to be consistent in our configuration application across the project. So maybe uplevel the conversation out of this PR and into a discussion in TOC or something if we want to talk about how config gets applied to particular endpoints and how to take cluster etc. into account.
extensions/v1alpha1/wasm.proto
Outdated
istio.type.v1beta1.WorkloadSelector workload_selector = 5; | ||
|
||
// Determines where in the filter chain this `WasmExtension` is to be injected. | ||
istio.networking.v1alpha3.EnvoyFilter.Patch.FilterClass phase = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not take a dependency on EnvoyFilter and pull that enum out into its own thing that we can define here.
I think the values/semantics are actually pretty good (which phase to run this was extension during) but this shouldn't depend on EnvoyFilter proto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also do we need an option to run an extension before the istio authentication filters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enum likely also needs refinement @mandarjog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smawson @louiscryan we will relocate the FilterClass and amend enums in a separate PR after some more discussion here.
@dgn I disagree. Manipulate request at outbound (including gateway) and refresh routing accordingly are probably one of the most common use cases that user would want with Wasm extensions. I think for now, at outbound, we can make it only support injecting filters at listener level. Envoy Wasm extension does not yet support per route config anyway. |
@dgn want to make sure that it is our collective intention to only use Wasm api and no longer need EnvoyFilter api to manipulate all wasm use cases. wasm api will not cover aspects of EnvoyFilter api that are unrelated to wasm. @mdhume we would love if you can come up with a proposal for migration from EF to the new API since you are running things in prod. |
// +cue-gen:WasmPlugin:printerColumn:name=Age,type=date,JSONPath=.metadata.creationTimestamp,description="CreationTimestamp is a timestamp | ||
// representing the server time when this object was created. It is not guaranteed to be set in happens-before order across separate operations. | ||
// Clients may not set this value. It is represented in RFC3339 form and is in UTC. | ||
// Populated by the system. Read-only. Null for lists. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good to me. |
// If `priority` is not set, or two `WasmPlugins` exist with the same | ||
// value, the ordering will be deterministically derived from name and | ||
// namespace of the `WasmPlugins`. | ||
google.protobuf.Int64Value priority = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the default priority value? In your scenario above, what if I specify one Wasm resource without priority and one with a priority of 1000 (or 10) within the same phase, how do I know which is prioritized first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defaults to 0
, I added a sentence to the comment. Thank you!
LGTM overall! would like the following to be addressed:
|
So higher priority means later in the filter chain? That sounds confusing |
@mandarjog thanks, SG! I agree with @dgn - I had thought priority higher means it is first/earlier in the filter chain. Let's clarify default |
@dgn let’s clarify the comment and merge it! |
@linsun please lgtm it since you have reviewed it, John and Brian have already indicated so. |
adding label until we get 3 lgtms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - pending the priority comment clarification of default.
extensions/v1alpha1/wasm.proto
Outdated
// Criteria used to select the specific set of pods/VMs on which | ||
// this plugin configuration should be applied. If omitted, this | ||
// configuration will be applied to all workload instances in the same | ||
// namespace. If omitted, the `WasmPlugin` will be applied to all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
// the `url` field contains the URL of an OCI image that is referenced | ||
// by tag, the reference will be replaced by the value of the `sha256` | ||
// field (using the `@sha256:` notation). | ||
optional string sha256 = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely more confusing than is necessary.
For non-OCI usage, I'm assuming this is treated as a checksum, and used to verify that what you fetch is what you wanted to fetch. For the OCI usage it is used to fetch a specific version right? Not just as a checksum.
My assumptions would be for OCI:
- If @sha256 is set in the URL, we raise an error if sha256 is set in the field to something different
- If sha is set (either in URL or field), we also verify that the downloaded image has the correct sha.
- If sha field is set, but not in the URL, the image is still downloaded by tag and sha verified after download (not sure about this one).
It is confusing to have fields used for both fetch and verify, better to make it more explicit and have the sha field always used just for verification.
extensions/v1alpha1/wasm.proto
Outdated
// Insert plugin before Istio authentication filters. | ||
AUTHN = 1; | ||
|
||
// Insert plugin before Istio authorization filters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this also after Istio authentication filters? Maybe specify that if so to be explicit (same with STATS being after authorization filters).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's also after the istio auth filters. I made it more explicit, also in the comment for STATS below
// mirroring K8s behaviour. | ||
// | ||
// buf:lint:ignore ENUM_VALUE_UPPER_SNAKE_CASE | ||
enum PullPolicy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need an UNSPECIFIED = 0 here.
Otherwise I think even explicitly setting it to IfNotPresent would get overridden if latest is in the tag, since we couldn't distinguish between default and explicit set to IfNotPresent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'll add UNSPECIFIED so we can match k8s behaviour
- Clarify default for priority - Add UNSPECIFIED_POLICY value - Change sha256 behaviour
I'm happy with this if you want to remove the hold Mandar. |
/cherry-pick release-1.10 |
@mandarjog: once the present PR merges, I will cherry-pick it on top of release-1.10 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@mandarjog: new pull request created: #2067 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cherry-pick release-1.11 |
@mandarjog: new pull request created: #2068 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Implementing Design Doc: Wasm Extensions API