-
Notifications
You must be signed in to change notification settings - Fork 507
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTTPRoute: allow more match clauses #3205
HTTPRoute: allow more match clauses #3205
Conversation
Today, a route is limited to 16 routes with 8 matches each. This is problematic in real world environments. While its easy to split a route up (you can easily get the same behavior, and similar cognitive complexity by having 1 route per HTTPRoute), its quite hard to split up matches. For instance, https://github.com/istio/istio/blob/d7a9700d5eaa4e9728274b408623670f48deadb5/samples/bookinfo/gateway-api/bookinfo-gateway.yaml#L23-L38 is an example of a route matching the exposed APIs for a frontend. This is only a small tivial example, and it already uses 5. In our user base, we often see far beyond 8. Splitting these up is both complex for the user, and may actually lead to different behaviors if implementations treat route groups differently (for instance, if we add retry budget -- that could now be split; other core or implementation specific policies may behave similarly, this is just an example). The limit on 8 here seems quite small given the context of real usage, the other limits in the API (16 routes, 64 listeners), and the cost of manually working around it. Based on this, I propose we raise the limit to match Gateway listeners.
/retest |
@howardjohn Historically we've been against this because of the worst case scenarios this could lead to if you max out the length of every list within a list (16 rules * 8 matchers * 16 filters == 2048 possible filters in a Route). Increasing the matchers from 8 to 64 would turn that into 16384, which would obviously be problematic. With that said, we've got CEL now, so we could likely make a case for the following:
That would mean we're allowing significantly more flexibility here without actually increasing the max size of an individual route. If this ends up working reasonably well, we could apply the same principle elsewhere in the API. |
For my understanding, what specifically are we trying to constrain with the overall size limit? Is it...
If its the CEL limits, is the concern that its within the complexity limits now (which already take into account the 16*64), or that it could in the future if matches get more complex? I ask since it may impact the solution here |
All of the above. I'd also mention that it's possible that we'll add more types of matchers inside this list which will only increase the overall size and complexity, I don't want to get so close to a max size that we end up having no additional room to grow. I'm also strongly biased towards more smaller routes than a few very large routes. In general though, when we're talking about API compatibility, this is already a GA API, and significantly loosening the validation would be problematic. I think we can make a reasonably strong case in favor of a change that allows for more flexibility but still fits within the original constraints though. |
+1 for this change, this will reduce the friction users are facing when migrating from exiting APIs to Gateway API, who are used to authoring rules a specific way, and this is currently slowing down Gateway API adoption. @robscott's suggestion of adding a CEL validation to measure and limit total filters for a Route, is a great idea to account for the concerns of the size of the final object persisting in the API server cross linking previous issues raised by users in the past |
I've pushed up a change to limit the aggregate size. LMK what you think |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @howardjohn!
Sadly looks like we're running into kubernetes/kubernetes#120973 - CRD is invalid because rule is too complex |
That's only from the extra restriction fwiw. It works fine when we just allow it to be unbounded, even with the theoretical 16k entries. |
Honestly don't get how that rule can possibly be 100x over the limit.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @howardjohn! Will defer to someone else for LGTM.
/approve
apis/v1/httproute_types.go
Outdated
@@ -119,6 +119,7 @@ type HTTPRouteSpec struct { | |||
// +optional | |||
// +kubebuilder:validation:MaxItems=16 | |||
// +kubebuilder:default={{matches: {{path: {type: "PathPrefix", value: "/"}}}}} | |||
// +kubebuilder:validation:XValidation:message="While 16 rules and 64 matches are allowed, the total matches must be less than 128",rule="(self.size() > 0 ? self[0].matches.size() : 0) + (self.size() > 1 ? self[1].matches.size() : 0) + (self.size() > 2 ? self[2].matches.size() : 0) + (self.size() > 3 ? self[3].matches.size() : 0) + (self.size() > 4 ? self[4].matches.size() : 0) + (self.size() > 5 ? self[5].matches.size() : 0) + (self.size() > 6 ? self[6].matches.size() : 0) + (self.size() > 7 ? self[7].matches.size() : 0) + (self.size() > 8 ? self[8].matches.size() : 0) + (self.size() > 9 ? self[9].matches.size() : 0) + (self.size() > 10 ? self[10].matches.size() : 0) + (self.size() > 11 ? self[11].matches.size() : 0) + (self.size() > 12 ? self[12].matches.size() : 0) + (self.size() > 13 ? self[13].matches.size() : 0) + (self.size() > 14 ? self[14].matches.size() : 0) + (self.size() > 15 ? self[15].matches.size() : 0) <= 128" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely gross, but the least bad we can offer until CEL cost estimation allows us to use map
here (Kubernetes 1.30+). Here's the CEL playground link for future readers.
Actually, would really like @youngnick specifically to sign off on this, will add a hold until he's able to take a look. /hold |
LGTM Deferring review to Nick as per #3205 (comment) |
As long as the overall complexity isn't increased, this makes sense to me. Nice use of CEL to make sure that's the case. /lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we do the same for GRPCRoute as well while we're making this change?
I think updating GRPCRoute to match would be great. Would approve in this PR or a follow up. |
Added gRPC route as well |
Thanks @howardjohn! /lgtm |
I think we've got enough other LGTMs on this one to remove the hold. /hold cancel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: howardjohn, robscott, sunjayBhatia The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* HTTPRoute: allow more match clauses Today, a route is limited to 16 routes with 8 matches each. This is problematic in real world environments. While its easy to split a route up (you can easily get the same behavior, and similar cognitive complexity by having 1 route per HTTPRoute), its quite hard to split up matches. For instance, https://github.com/istio/istio/blob/d7a9700d5eaa4e9728274b408623670f48deadb5/samples/bookinfo/gateway-api/bookinfo-gateway.yaml#L23-L38 is an example of a route matching the exposed APIs for a frontend. This is only a small tivial example, and it already uses 5. In our user base, we often see far beyond 8. Splitting these up is both complex for the user, and may actually lead to different behaviors if implementations treat route groups differently (for instance, if we add retry budget -- that could now be split; other core or implementation specific policies may behave similarly, this is just an example). The limit on 8 here seems quite small given the context of real usage, the other limits in the API (16 routes, 64 listeners), and the cost of manually working around it. Based on this, I propose we raise the limit to match Gateway listeners. * Limit aggregate size * Drop to 128 and add tests * hacky * Unroll the loop * gRPCRoute and update comment * Fix grpcroute
The grpc implementation is incorrect here - the limit was never raised. |
Today, a route is limited to 16 routes with 8 matches each. This is problematic in real world environments.
While its easy to split a route up (you can easily get the same behavior, and similar cognitive complexity by having 1 route per HTTPRoute), its quite hard to split up matches. For instance, https://github.com/istio/istio/blob/d7a9700d5eaa4e9728274b408623670f48deadb5/samples/bookinfo/gateway-api/bookinfo-gateway.yaml#L23-L38 is an example of a route matching the exposed APIs for a frontend. This is only a small tivial example, and it already uses 5. In our user base, we often see far beyond 8.
Splitting these up is both complex for the user, and may actually lead to different behaviors if implementations treat route groups differently (for instance, if we add retry budget -- that could now be split; other core or implementation specific policies may behave similarly, this is just an example).
The limit on 8 here seems quite small given the context of real usage, the other limits in the API (16 routes, 64 listeners), and the cost of manually working around it. Based on this, I propose we raise the limit to match Gateway listeners.
What type of PR is this?
/kind feature
Does this PR introduce a user-facing change?: