Automatic reachableServices #6551
Comments
|
I think just relying on 'MeshTrafficPermissions' might not be sufficient. We could do a combination of both. Also, a 2nd idea is to use communication stats; it's less reliable, but we could do something more like
|
Can you elaborate on this point? |
I think this could just be solely a new feature, likely a filter, added to MeshHTTPRoute. Where you have a final "match all" rule and then configure a direct return of 401/403. |
Let's say I'm a mesh operator and have a Kuma deployment with 100 services and neither MeshTrafficPermission nor reachable services. I start to see perf struggles because of the lack of reachable services. Current option: reachable services. Proposed solution: a graph built from MeshTrafficPermission. I'm not saying the proposed solution is bad; I like it. We just need to figure out how we can make it scalable inside the organization. |
1. Note: You wouldn't have zero MeshTrafficPermissions, as that would allow no one (zero trust is the default). What you'd have is one allow-all permission.
I think it makes sense, though here are a couple of remarks:
Some service owners would go "well, I can't do this because I run a public service and I'm not sure who consumes it". I think there are a few things to keep in mind: a. We don't need a perfect subset; we just need to prune things down to make the conf slimmer.
2. At the moment, if you have a Deny MTP, the Envoy configuration will still have a cluster. So someone could learn about services that exist just by looking at their Envoy dump. I feel like this might not be great security-wise (you shouldn't be able to know about things you can't talk to).
3. Yeah, I agree this is a different feature altogether. I'd quite like here to stay with each
4. This 2nd idea is something we talked about, but I also think it's a whole different feature from this. |
Yes, ok. That does not change my point.
I don't think it is. Correct me if I'm wrong, but Service Map is potentially such a tool, but
Ideally, the GUI would have a tab that can generate
IMHO this is very rarely the case unless strict security is already in place or there is tooling to check it.
👍 |
IMO we can rely solely on Mesh*Route and avoid making decisions based on MeshTrafficPermission at all. Deciding outbound clusters based on MeshTrafficPermission has a few downsides:
|
Can you clarify what you're proposing with "rely solely on Mesh*Route"? That is, how do we "rely solely on Mesh*Route" "to figure who receives the configuration for which service"? |
We configure the outbound clusters only if a MeshTCPRoute (or MeshHTTPRoute) matches your service. If there are no MeshTCPRoutes or MeshHTTPRoutes matching your service, your service has 0 outbound clusters and can't consume other services. Basically Mesh*Route policy is a single source of information on what outbounds your service requires. So if you create:

```yaml
apiVersion: kuma.io/v1alpha1
kind: MeshTCPRoute
metadata:
  name: route-2
  namespace: kuma-system
spec:
  targetRef:
    kind: MeshService
    name: client
  to:
    - targetRef:
        kind: MeshService
        name: backend
    - targetRef:
        kind: MeshService
        name: web
```

then the

Note: we have to adjust the Mesh*Route policy to not require explicitly listing services when the

```yaml
apiVersion: kuma.io/v1alpha1
kind: MeshTCPRoute
metadata:
  name: route-2
  namespace: kuma-system
spec:
  targetRef:
    kind: MeshService
    name: client
  to:
    - targetRef:
        kind: Mesh
```
|
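As a rough illustration of the Mesh*Route-based proposal above, here is a minimal Python sketch, not Kuma code. It assumes a route can be reduced to the name in its top-level `targetRef` plus the names listed under `to`, and that a `to` entry of kind `Mesh` means "every service" (the adjustment the note seems to describe). The dictionary shape and the function name `outbounds_from_routes` are hypothetical.

```python
# Hypothetical, simplified stand-in for a MeshTCPRoute/MeshHTTPRoute: only the
# top-level targetRef name and the names listed under `to` are modelled.
MESH = "Mesh"  # assumption: a `to` entry of kind Mesh means "every service"


def outbounds_from_routes(service, routes, all_services):
    """Collect the destinations a service would get outbound clusters for."""
    outbounds = set()
    for route in routes:
        if route["targetRef"] not in (service, MESH):
            continue  # this route does not select the service
        for destination in route["to"]:
            if destination == MESH:
                outbounds.update(all_services)
            else:
                outbounds.add(destination)
    return outbounds


services = ["client", "backend", "web", "db"]
explicit = [{"targetRef": "client", "to": ["backend", "web"]}]
wildcard = [{"targetRef": "client", "to": [MESH]}]

print(outbounds_from_routes("client", explicit, services))
print(outbounds_from_routes("web", explicit, services))
```

Under these assumptions, a service that no route selects ends up with an empty outbound set, which is the "0 outbound clusters" case described in the comment.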
Aren't these all things that can be solved by just generating clusters differently? The principal question IMO is more what are the semantics of MeshHTTPRoute and MeshTrafficPermission in Kuma?
IMO neither is better. It's about the particular semantics of "deny traffic". Why should service A, which service B shouldn't ever call or even know about, return a 403 error? Why not just cut traffic off completely?
Not sure about the capabilities of RBAC filters, but again, why can't we just create the RBAC filters based on more than one type of resource.
I don't get why this makes no sense. If I want to deny traffic, I create a MeshTrafficPermission to block it. If the only way to guarantee this, because mTLS is off, is to generate Envoy config without clusters, then we just do that.
The trend for meshes AFAICT, and IMO the more intuitive semantics of an "HTTP route", is that routes modify HTTP requests/responses. The idea of needing a "do nothing" route is IMO counterintuitive. Routes do not directly correspond to Envoy resources like Routes or Clusters, and IMO the average user won't expect them to. We can just use both MeshTrafficPermission and Mesh*Routes to generate Envoy resources. Needing a default resource is fraught and clunky IMO, and should be avoided.
If we want to optimize performance and reduce the number of clusters we create, IMO this optimization maps much more cleanly to the idea of "permission to contact a service", not "what happens to my HTTP request and response as it moves through my mesh". But I think I've said all that before, so I'll leave this comment as my opinion on default deny or default allow with route resources. |
It's a false sense of security: clients can still consume the service using ClusterIP (probably even using Kubernetes DNS) unless you disable the passthrough cluster mesh-wide. |
I was thinking we don't want to generate clusters on the client side at all (to reduce the size of the envoy config). Do you have something else in mind? |
100%. This is exactly like returning 404 when you don't have access to a repo. You shouldn't know that something you don't have access to exists.
I don't think that's dramatic. It's the consumer's problem that they are trying to call you.
Yes, I see this as a feature: you don't need mTLS anymore to use MTP, with the caveat that it's less secure than when you are using mTLS.
Yes. Since we want routes to be as close as possible to GatewayAPI, and there's no such thing as a route without a destination in GatewayAPI, I'd stay away from this. |
Anything using plaintext is a false sense of security :) |
I think this is applicable to the "edge" use case when clients are external. In our case, it's internal infrastructure, there is no need for an envoy proxy to hide information from another envoy proxy. Also, think about troubleshooting, if users misconfigure MeshTrafficPermission and there is no traffic between A and B, the only thing they see in traces/metrics is 404 errors.
True, but if you have clients that use k8s hostnames |
What about |
I'm not convinced by this argument. In large enough organisations even "internal infrastructure" has boundaries of knowledge. For the 2nd part I think this is easily fixable by looking at the inspect apis and educating people. Anyway that's probably a longer discussion but it's not uncommon that things show as non-existent when you don't have access to it and for good reasons.
Hmm isn't this already the behaviour if the destination service is not in the mesh? And then wouldn't it be quite easy for the destination service in HTTP to check that the source traffic comes from Envoy to minimize this issue (I think we even mentioned we could make MTP work on non mTLS traffic by leveraging headers)?
AFAIK MTP works for ES; if that's not the case, that's a bug. |
@jakubdyszkiewicz suggest having a |
How is this different to reachable services? |
By using MeshSubset as a targetRef you can more easily add it to a list of services. But I agree it's very similar to the existing reachableServices. |
but we do need that Route on the Envoy side for the traffic to flow. So you're for an implicit
re:
we will know if this is a real 404 or a 404 caused by missing cluster. I think we're fine with that as long as we surface the actual reason to the user. |
I think this is double the work (it's exactly what I did in the previous company). I assume here that you need a default to be able to route everywhere (so for onboarding the mesh you don't need to start with everything configured from the get-go), then you define which services you want to talk to and then you define who can talk to you. Basing this on only MeshTrafficPermission means we implicitly define the outbounds and we cut down on one step. It's usually pretty easy to know which service you call in your service so this might be the right approach. But tying this to MeshHTTPRoute changes the meaning of the policy a bit. |
How do large orgs migrate to Kuma? Can we only enable ingress to gather stats and then based on these stats enable egress with reachable services? |
I'm gonna start working on a MADR for this. |
I'm for: when I install my mesh, no resources are created by default and traffic continues to flow as it did before. I suppose you can call this an implicit allow all but "it can't be deleted by the user" is kind of nonsensical because there's nothing to delete in the first place. It's the same idea as |
next points from meeting with Charly:
|
Description
Reachable services is an ok optimization but it's very manual and annoying.
We rely on TrafficRoutes to populate all the clusters. So if you don't have a TrafficRoute there's nothing you can do (unfortunately this makes it impossible to only use MeshHTTPRoute and MeshTCPRoute).
Could we use the existing policies to only add clusters for the services we want to talk to?
Open questions
To me we should be able to route everywhere.
This makes sense as you would expect to not have any information about a service you don't have access to.
Implementation idea
Use MeshTrafficPermissions to build the reverse graph of which services can reach which services.
Only add envoy clusters for the services you have permissions to.
This has the following benefits:
targetRef: Mesh -- from: targetRef: Mesh -- Allow policy (which would likely be a good default).
Drawbacks:
Note: If we do this you'd want ShadowDeny to be considered the same as Allow when it comes to advertising clusters to consumers of a service. Because you want to get the opportunity of a service
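The implementation idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Kuma's implementation. Each MeshTrafficPermission is reduced to a hypothetical `(destination, source, action)` tuple, `"*"` stands in for `kind: Mesh`, and rules are applied in list order with the last match winning, which is a simplification of real policy merging. In line with the note, `ShadowDeny` is advertised the same way as `Allow`. The function name `reachable_clusters` is made up for the example.

```python
# Hypothetical, simplified stand-in for MeshTrafficPermission rules:
# (destination, source, action). "*" stands for `kind: Mesh` (every service).
ALL = "*"
# Assumption from the note: ShadowDeny still advertises the cluster, like Allow.
ADVERTISING_ACTIONS = {"Allow", "ShadowDeny"}


def reachable_clusters(services, permissions):
    """Return {source: set of destinations that get an Envoy cluster}."""
    decision = {}  # (source, destination) -> action of the last matching rule
    for destination, source, action in permissions:
        destinations = services if destination == ALL else [destination]
        sources = services if source == ALL else [source]
        for src in sources:
            for dst in destinations:
                decision[(src, dst)] = action
    # Pairs with no matching rule default to "no cluster" (zero trust).
    return {
        src: {dst for dst in services
              if decision.get((src, dst)) in ADVERTISING_ACTIONS}
        for src in services
    }


services = ["web", "backend", "db"]
permissions = [
    ("backend", "web", "Allow"),
    ("db", "backend", "ShadowDeny"),
    ("db", "web", "Deny"),
]
print(reachable_clusters(services, permissions))
```

With a single allow-all rule, `(ALL, ALL, "Allow")`, every service gets a cluster for every service, which matches today's behaviour without reachable services. Adding narrower rules is what prunes the configuration.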