doc: add proposal for mcs with native service name
Signed-off-by: jwcesign <[email protected]>
jwcesign committed Nov 21, 2023
1 parent 50b0c51 commit cb6d424
Showing 3 changed files with 201 additions and 74 deletions.
275 changes: 201 additions & 74 deletions docs/proposals/service-discovery/README.md
title: Service discovery with native Kubernetes naming and resolution
authors:
- "@bivas"
- "@XiShanYongYe-Chang"
- "@jwcesign"
reviewers:
- "@RainbowMango"
- "@GitHubxsy"
- "@Rains6"
- "@jwcesign"
- "@chaunceyjiang"
approvers:
- "@RainbowMango"

update-date: 2023-08-19

## Summary

In multi-cluster scenarios, there is a need to access services across clusters. Currently, Karmada supports this by creating a derived service (prefixed with `derived-`) in the other clusters so that the service can be accessed there.

This proposal describes a method for multi-cluster service discovery using native Kubernetes Services, modifying the current implementation of Karmada's MCS. With this approach, no `derived-` prefix is needed when accessing services across clusters.

## Motivation

<!--
This section is for explicitly listing the motivation, goals, and non-goals of
this KEP. Describe why the change is important and the benefits to users.
-->

Having a `derived-` prefix for `Service` resources seems counterintuitive when thinking about service discovery:
- Assume a pod is exported as the service `foo`
- Another pod that wishes to access it on the same cluster will simply call `foo`, and Kubernetes will bind to the correct service
- If that pod is scheduled to another cluster, the original service discovery will fail, as there is no service by the name `foo` there
- To reach the original pod, the calling pod must know it is running in another cluster and use the name `derived-foo` instead

If Karmada supports service discovery with native Kubernetes naming and resolution (without the `derived-` prefix), users can access a service by its original name, with no code changes to accommodate the prefix.

### Goals

- Remove the `derived-` prefix from the service

## Proposal

### User Stories (Optional)

<!--
Detail the things that people will be able to do if this KEP is implemented.
Include as much detail as possible so that people can understand the "how" of
the system. The goal here is to make this feel real for users without getting
bogged down.
-->

#### Story 1

As a user of a Kubernetes cluster, I want to be able to access a service whose corresponding pods are located in another cluster. I hope to communicate with the service using its original name.

**Scenario**:

1. When I try to access the service inside member2, I can access the service using the name `foo.myspace.svc.cluster.local`

#### Story 2

As a user of a Kubernetes cluster, I want to access a service that has pods located in both this cluster and another. I expect to communicate with the service using its original name, and have the requests routed to the appropriate pods across clusters.

**Scenario**:

### Risks and Mitigations

<!--
What are the risks of this proposal, and how do we mitigate?
How will security be reviewed, and by whom?
How will UX be reviewed, and by whom?
Consider including folks who also work outside the SIG or subproject.
-->

Adding a `Service` that resolves to a remote cluster will add network latency to communication between clusters.

### Feature gate

This is an experimental feature, so we add a feature gate to control whether it is enabled globally.
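
Below is a minimal sketch of how such a gate could be registered, following the `k8s.io/component-base/featuregate` convention used by Kubernetes components. The gate name `MultiClusterService` and the Alpha stage are illustrative assumptions, not decisions made by this proposal:

```go
package features

import (
	"k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/component-base/featuregate"
)

const (
	// MultiClusterService is an assumed gate name for this feature; the final
	// name is to be decided during implementation.
	MultiClusterService featuregate.Feature = "MultiClusterService"
)

// FeatureGate holds the feature gates known to the controller manager.
var FeatureGate featuregate.MutableFeatureGate = featuregate.NewFeatureGate()

func init() {
	// The gate defaults to off because the feature is experimental (Alpha).
	runtime.Must(FeatureGate.Add(map[featuregate.Feature]featuregate.FeatureSpec{
		MultiClusterService: {Default: false, PreRelease: featuregate.Alpha},
	}))
}
```

With this convention, the feature could then be switched on via a flag such as `--feature-gates=MultiClusterService=true`.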

## Design Details

<!--
This section should contain enough information that the specifics of your
change are understood. This may include API specs (though not always
required) or even code snippets. If there's any ambiguity about HOW your
proposal will be implemented, this is the place to discuss them.
-->

### API changes

This proposal introduces two new fields, `ServerLocationClusters` and `ClientLocationClusters`, in the `MultiClusterService` API.

```go
type MultiClusterService struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec is the desired state of the MultiClusterService.
	Spec MultiClusterServiceSpec `json:"spec"`

	// Status is the current state of the MultiClusterService.
	// +optional
	Status corev1.ServiceStatus `json:"status,omitempty"`
}

type MultiClusterServiceSpec struct {
	// Types specifies how to expose the service referenced by this
	// MultiClusterService.
	// +required
	Types []ExposureType `json:"types"`

	// Ports is the list of ports that are exposed by this MultiClusterService.
	// Ports that are not specified here will be filtered out during the service
	// exposure and discovery process.
	// All ports in the referencing service will be exposed by default.
	// +optional
	Ports []ExposurePort `json:"ports,omitempty"`

	// ServerLocationClusters specifies the clusters which will provide the service backend.
	// If left empty, we will collect the backend pods' IPs from all clusters and sync them to the ClientLocationClusters.
	// +optional
	ServerLocationClusters []string `json:"serverLocationClusters,omitempty"`

	// ClientLocationClusters specifies the clusters which will request the service.
	// If left empty, the service will be exposed to all clusters.
	// +optional
	ClientLocationClusters []string `json:"clientLocationClusters,omitempty"`
}

// ExposureType describes how to expose the service.
type ExposureType string

const (
	// ExposureTypeCrossCluster means a service will be accessible across clusters.
	ExposureTypeCrossCluster ExposureType = "CrossCluster"

	// ExposureTypeLoadBalancer means a service will be exposed via an external
	// load balancer.
	ExposureTypeLoadBalancer ExposureType = "LoadBalancer"
)
```

With this API, we will:
* Use `ServerLocationClusters` to specify the member clusters to which the Service is propagated from the Karmada control plane.
* Use `ClientLocationClusters` to specify the member clusters to which the `EndpointSlice` resources collected from the `ServerLocationClusters` are synchronized.

For example, if we want to access the `foo` Service, whose pods are located in member2, from member3, we can use the following YAML:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: foo
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: foo
---
apiVersion: networking.karmada.io/v1alpha1
kind: MultiClusterService
metadata:
  name: foo
spec:
  types:
  - CrossCluster
  serverLocationClusters:
  - member2
  clientLocationClusters:
  - member3
```
### Implementation workflow
#### Service propagation
The process of propagating a Service from the Karmada control plane to member clusters is as follows:
![img](./statics/mcs-svc-sync.png)
1. mcs-controller will list & watch `Service` and `MultiClusterService` resources from the Karmada control plane.
1. Once a `MultiClusterService` and a `Service` with the same name exist, mcs-controller will create a Work (wrapping the `Service`) in the execution namespace of every cluster listed in `spec.serverLocationClusters` and `spec.clientLocationClusters` (see the sketch after this list).
1. The Work will be synchronized to the member clusters. After synchronization, the corresponding `EndpointSlice` will be created in the member clusters.
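
The target-cluster computation in step 2 is a simple set union. A minimal sketch under simplified local types (the real implementation would work with the Karmada Work API rather than plain strings):

```go
package main

import "fmt"

// spec is a pared-down stand-in for MultiClusterServiceSpec.
type spec struct {
	ServerLocationClusters []string
	ClientLocationClusters []string
}

// serviceTargetClusters returns the deduplicated union of server- and
// client-location clusters; the Service Work is created in the execution
// namespace of each of them.
func serviceTargetClusters(s spec) []string {
	seen := map[string]bool{}
	var targets []string
	for _, c := range append(append([]string{}, s.ServerLocationClusters...), s.ClientLocationClusters...) {
		if !seen[c] {
			seen[c] = true
			targets = append(targets, c)
		}
	}
	return targets
}

func main() {
	s := spec{
		ServerLocationClusters: []string{"member1", "member2"},
		ClientLocationClusters: []string{"member2"},
	}
	fmt.Println(serviceTargetClusters(s)) // [member1 member2]
}
```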

#### `EndpointSlice` synchronization

The process of synchronizing `EndpointSlice` resources from the server-location clusters to the client-location clusters is as follows:
![img](./statics/mcs-eps-sync.png)

1. mcs-eps-controller will list & watch `MultiClusterService` resources.
1. mcs-eps-controller will list & watch `EndpointSlice` resources from the clusters in the `MultiClusterService`'s `spec.serverLocationClusters`.
1. mcs-eps-controller will create the corresponding Work for each `EndpointSlice` in the execution namespace of each cluster in the `MultiClusterService`'s `spec.clientLocationClusters`.
   When creating the Work, to facilitate troubleshooting, we should add the following annotations to record the original `EndpointSlice` information (see the sketch after this list):
   * `mcs.karmada.io/eps-cluster`: the cluster name of the original `EndpointSlice`.
   * `mcs.karmada.io/eps-generation`: the generation of the original `EndpointSlice`.
1. Karmada will sync the `EndpointSlice` to the member clusters.
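
A sketch of how the Work annotations from step 3 could be populated, assuming the annotation keys proposed above and the upstream `EndpointSlice` type; the helper name is hypothetical:

```go
package main

import (
	"fmt"
	"strconv"

	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// epsWorkAnnotations records where a synced EndpointSlice came from so a
// problem can be traced back to the original object in the source cluster.
func epsWorkAnnotations(srcCluster string, eps *discoveryv1.EndpointSlice) map[string]string {
	return map[string]string{
		"mcs.karmada.io/eps-cluster":    srcCluster,
		"mcs.karmada.io/eps-generation": strconv.FormatInt(eps.Generation, 10),
	}
}

func main() {
	eps := &discoveryv1.EndpointSlice{
		ObjectMeta: metav1.ObjectMeta{Name: "foo-abc12", Generation: 3},
	}
	fmt.Println(epsWorkAnnotations("member2", eps))
}
```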

One point to note: assume we have the following configuration:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: foo
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: foo
---
apiVersion: networking.karmada.io/v1alpha1
kind: MultiClusterService
metadata:
  name: foo
spec:
  types:
  - CrossCluster
  serverLocationClusters:
  - member1
  - member2
  clientLocationClusters:
  - member2
```

When creating the corresponding Work, Karmada should only sync the `EndpointSlice` resources that exist in `member1` to `member2`. Since `member2` is both a server-location and a client-location cluster, its own `EndpointSlice` must not be synced back to itself (a sketch of this filtering follows).

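A minimal sketch of the filtering described above; for an `EndpointSlice` collected from a given server cluster, the sync targets are the client clusters minus the source cluster itself:

```go
package mcs

// syncTargets returns the client clusters an EndpointSlice collected from
// srcCluster should be synced to. The source cluster is skipped so a cluster
// never receives a copy of its own EndpointSlice.
//
// For the configuration above:
//   syncTargets("member1", []string{"member2"}) // [member2]
//   syncTargets("member2", []string{"member2"}) // []  (nothing synced back)
func syncTargets(srcCluster string, clientClusters []string) []string {
	var targets []string
	for _, c := range clientClusters {
		if c != srcCluster {
			targets = append(targets, c)
		}
	}
	return targets
}
```
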
### Components change

#### karmada-controller

* Add mcs-controller to reconcile `MultiClusterService` and Cluster objects (creation/deletion/update), propagating the `Service` to member clusters.
* Add mcs-eps-controller to reconcile `MultiClusterService` and Cluster objects, syncing `EndpointSlice` resources from the server-location clusters to the client-location clusters.

### Status Record

We should have the following Conditions in the `MultiClusterService` status:
```go
const (
	MCSServiceAppliedConditionType = "ServiceApplied"

	MCSEndpointSliceCollectedConditionType = "EndpointSliceCollected"

	MCSEndpointSliceAppliedConditionType = "EndpointSliceApplied"
)
```

`MCSServiceAppliedConditionType` is used to record the status of `Service` propagation, for example:
```yaml
status:
  conditions:
  - lastTransitionTime: "2023-11-20T02:30:49Z"
    message: Service is propagated to target clusters.
    reason: ServiceAppliedSuccess
    status: "True"
    type: ServiceApplied
```

`MCSEndpointSliceCollectedConditionType` is used to record the status of `EndpointSlice` collection, for example:
```yaml
status:
conditions:
- lastTransitionTime: "2023-11-20T02:30:49Z"
message: Failed to list&watch EndpointSlice in member3.
reason: EndpointSliceCollectedFailed
status: "True"
type: EndpointSliceCollected
```

`MCSEndpointSliceAppliedConditionType` is used to record the status of `EndpointSlice` synchronization, for example:
```yaml
status:
  conditions:
  - lastTransitionTime: "2023-11-20T02:30:49Z"
    message: EndpointSlices are propagated to target clusters.
    reason: EndpointSliceAppliedSuccess
    status: "True"
    type: EndpointSliceApplied
```
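
A sketch of how a controller could set these conditions, assuming the standard `meta.SetStatusCondition` helper from apimachinery (the surrounding controller plumbing is omitted):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// In the controller, these would be MultiClusterService.Status.Conditions.
	var conditions []metav1.Condition

	// Record a successful Service propagation; SetStatusCondition fills in
	// lastTransitionTime and replaces any existing condition of the same type.
	meta.SetStatusCondition(&conditions, metav1.Condition{
		Type:    "ServiceApplied",
		Status:  metav1.ConditionTrue,
		Reason:  "ServiceAppliedSuccess",
		Message: "Service is propagated to target clusters.",
	})

	fmt.Println(conditions[0].Type, conditions[0].Status)
}
```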

### Metrics Record

For better monitoring, we should have the following metrics (a sketch of their definition follows the list):

* `mcs_sync_svc_duration_seconds` - The duration of syncing a `Service` from the Karmada control plane to member clusters.
* `mcs_sync_eps_duration_seconds` - The time from detecting an `EndpointSlice` to creating/updating the corresponding Work in a specific cluster execution namespace.
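
A sketch of how these metrics could be defined with `prometheus/client_golang`; the histogram buckets are left at their defaults and the `cluster` label is an illustrative assumption:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// mcs_sync_svc_duration_seconds: Service propagation latency per target cluster.
	syncSvcDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name: "mcs_sync_svc_duration_seconds",
		Help: "Duration in seconds to sync a Service from the Karmada control plane to a member cluster.",
	}, []string{"cluster"})

	// mcs_sync_eps_duration_seconds: time from observing an EndpointSlice to
	// creating/updating its Work in a cluster execution namespace.
	syncEpsDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name: "mcs_sync_eps_duration_seconds",
		Help: "Duration in seconds from detecting an EndpointSlice to creating/updating the corresponding Work.",
	}, []string{"cluster"})
)

func init() {
	prometheus.MustRegister(syncSvcDuration, syncEpsDuration)
}
```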

### Development Plan

* API definition, including API files, CRD files, and generated code. (1d)
* For mcs-controller: list & watch `MultiClusterService` and `Service`, and reconcile the Work in the execution namespaces. (5d)
* For mcs-controller: list & watch cluster creation/deletion, and reconcile the Work in the corresponding cluster execution namespaces. (10d)
* For mcs-eps-controller: list & watch `MultiClusterService`, collect the corresponding `EndpointSlice` resources from the `ServerLocationClusters`, and create/update the corresponding Work. (5d)
* For mcs-eps-controller: list & watch cluster creation/deletion, and reconcile the `EndpointSlice` Work in the corresponding cluster execution namespaces. (10d)
* If a cluster becomes unhealthy, mcs-eps-controller should delete its `EndpointSlice` Work from all cluster execution namespaces. (5d)

### Test Plan
