What
Although Coil implements a highly available egress NAT, the connection
tracking state is lost when one of the egress NAT Pods is gone.
Linux tracks connection state in netfilter's conntrack tables, and we can read and edit
those tables via netlink. There is even a program called conntrackd that exports and
synchronizes conntrack data between two servers.
With this capability, Coil can keep connections through the egress NAT alive across Pod restarts.
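As a rough illustration of the idea, the sketch below models replicating connection state from an active NAT Pod to a standby. It does not talk to netfilter at all: `Flow`, `Entry`, `Table`, and `Sync` are hypothetical names invented for this example, and a real implementation would read and write entries over netlink or delegate to conntrackd.

```go
package main

import "fmt"

// Flow is a hypothetical, simplified stand-in for the original tuple
// of a conntrack entry. Real entries carry more fields.
type Flow struct {
	Proto            string
	SrcIP, DstIP     string
	SrcPort, DstPort uint16
}

// Entry holds the state we would need on the standby to keep the
// connection alive: the translated source port and the protocol state.
type Entry struct {
	NATPort uint16
	State   string
}

// Table is an in-memory model of one Pod's conntrack table.
type Table map[Flow]Entry

// Sync copies every entry that is missing or different in dst from src
// and returns the number of entries written. Expiring entries that
// disappeared from src is left out of this sketch.
func Sync(src, dst Table) int {
	written := 0
	for flow, entry := range src {
		if cur, ok := dst[flow]; !ok || cur != entry {
			dst[flow] = entry
			written++
		}
	}
	return written
}

func main() {
	active := Table{
		{Proto: "tcp", SrcIP: "10.64.0.5", DstIP: "192.0.2.1", SrcPort: 40000, DstPort: 443}: {NATPort: 51000, State: "ESTABLISHED"},
	}
	standby := Table{}
	fmt.Println("entries replicated:", Sync(active, standby))
}
```

With the standby's table kept up to date like this, a takeover of the NAT address would find the translation state already in place.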
How
To switch all connections from one NAT Pod to another, Coil has to do a few things.
- The new Pod should take over the global IP address of the old Pod.
- Coil should stop advertising the global IP address on the node of the old Pod and start advertising it on the node of the new Pod.
This means that Coil should not assign the global IP address to the Pod.
Instead, Coil should assign a normal cluster-internal IP address to NAT Pods
and give them extra global IP addresses for NAT use. Those global IP addresses
float between NAT Pods, so we can call them floating addresses.
Below is a summary of the necessary changes.
A detailed design doc is still needed.
- Define a pool of floating addresses for egress NAT.
- Assign floating addresses to egress NAT Pods and program routing.
- Reprogram routing when the owner of a floating address changes.
  - One idea is to change the Service endpoints.
  - Another idea is to get rid of the Service for egress Pods and program routing in each client Pod.
- Advertise each floating address appropriately for its current owner Pod.
- Implement fast health checking to detect failed Pods.
  - VRRP and BFD are commonly used, but we can use any protocol.
- Synchronize the conntrack state between egress NAT Pods.
I guess this may contradict #274.
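The ownership part of the list above can be sketched as a small Go model. This is only an illustration under assumptions: `Pool`, `Move`, `Assign`, and `Failover` are hypothetical names, the round-robin reassignment is one arbitrary policy, and the returned moves stand in for the points where routing and advertisement would have to be reprogrammed.

```go
package main

import (
	"fmt"
	"sort"
)

// Pool records which egress NAT Pod currently owns each floating address.
type Pool struct {
	owner map[string]string // floating address -> Pod name
}

// Move describes one change of ownership that routing and
// advertisement would have to follow.
type Move struct {
	Addr, From, To string
}

func NewPool() *Pool { return &Pool{owner: make(map[string]string)} }

// Assign makes pod the owner of addr.
func (p *Pool) Assign(addr, pod string) { p.owner[addr] = pod }

// Owner returns the current owner of addr, or "" if it has none.
func (p *Pool) Owner(addr string) string { return p.owner[addr] }

// Failover hands every address owned by the failed Pod to the healthy
// Pods in round-robin order. It returns nil if no healthy Pod exists.
func (p *Pool) Failover(failed string, healthy []string) []Move {
	if len(healthy) == 0 {
		return nil
	}
	var addrs []string
	for addr, pod := range p.owner {
		if pod == failed {
			addrs = append(addrs, addr)
		}
	}
	sort.Strings(addrs) // make the result deterministic
	moves := make([]Move, 0, len(addrs))
	for i, addr := range addrs {
		to := healthy[i%len(healthy)]
		p.owner[addr] = to
		moves = append(moves, Move{Addr: addr, From: failed, To: to})
	}
	return moves
}

func main() {
	pool := NewPool()
	pool.Assign("203.0.113.10", "nat-0")
	pool.Assign("203.0.113.11", "nat-1")
	// A health checker (not shown) reports that nat-0 has failed.
	for _, m := range pool.Failover("nat-0", []string{"nat-1"}) {
		fmt.Printf("move %s from %s to %s\n", m.Addr, m.From, m.To)
	}
}
```

Returning the list of moves rather than applying side effects keeps the model independent of whichever routing mechanism is eventually chosen.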
The routing and advertisement part of this needs to be implemented or backed by the main CNI.
We may decouple the relationship by, for example, exporting the floating address of a Pod
to an unused Linux routing table so that other programs can configure routing and advertisement
for the floating address. But who can implement such a feature besides Coil?