Priority of autoconfigured mirrors #765

Open
thejoeejoee opened this issue Mar 3, 2025 · 1 comment · May be fixed by #766
Labels
bug Something isn't working

Comments

@thejoeejoee
Contributor

thejoeejoee commented Mar 3, 2025

Spegel version

v0.0.30

Kubernetes distribution

custom kubeadm

Kubernetes version

1.31.5

CNI

cilium

Describe the bug

Currently, the configuration init container configures the order of mirrors as follows:

- --mirror-targets
- http://$(NODE_IP):{{ .Values.service.registry.hostPort }}
- http://$(NODE_IP):{{ .Values.service.registry.nodePort }}

which leads to the following order in hosts.toml (the order of hosts matters here):

server = 'internal'

[host.'http://10.245.0.49:30020']
capabilities = ['pull', 'resolve']

[host.'http://10.245.0.49:30021']
capabilities = ['pull', 'resolve']
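
To make the ordering point concrete, here is a minimal sketch of how an ordered list of mirror targets could map onto the host tables above. The helper `render_hosts_toml` is hypothetical and is not Spegel's actual implementation; it only assumes that each target becomes one `[host.'…']` table, written in the same order as the `--mirror-targets` arguments.

```python
def render_hosts_toml(server: str, mirror_targets: list[str]) -> str:
    """Render a hosts.toml-style document with one host table per target.

    Hypothetical helper for illustration only. Tables are emitted in the
    same order as mirror_targets, so the first target is tried first.
    """
    lines = [f"server = '{server}'"]
    for target in mirror_targets:
        lines.append("")
        lines.append(f"[host.'{target}']")
        lines.append("capabilities = ['pull', 'resolve']")
    return "\n".join(lines) + "\n"


node_ip = "10.245.0.49"  # example address taken from the issue text
targets = [
    f"http://{node_ip}:30020",  # first --mirror-targets entry
    f"http://{node_ip}:30021",  # second --mirror-targets entry
]
hosts_toml = render_hosts_toml("internal", targets)
print(hosts_toml)
```

Swapping the two entries in `targets` would swap the two host tables, which is the change in priority being asked about.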

What is the reasoning behind this order?

In my perception, using the NodePort as first mirror would be more meaningful,
since it's the responsibility of local Spegel to do the lookup for an image/blob.

Using the HostPort service as the first mirror introduces an unnecessary network hop
even if the local Spegel is able to resolve the image via p2p.

Also, the infrastructure schema doesn't correspond to this order: it shows 30020 as the local Spegel instance, which is not true; it's the balanced HostPort.

Image

@thejoeejoee thejoeejoee added the bug Something isn't working label Mar 3, 2025
@phillebaba
Member

The node port service was added a while back as a fallback in case the local Spegel instance was not functioning properly. About three years ago there was a KEP, which sadly was never implemented, that would have allowed prefer local to be a setting. This would basically create a load balanced service which would always direct traffic locally and only fall back if the local Pod was unhealthy. My initial design idea was to add the node port service and then switch over to it completely when the KEP was implemented.

I think you may have mixed up the two ports when trying to understand the functionality. Port 30020 is the host port, and will only function if the local instance of Spegel is running. Port 30021 routes to the service, which will route traffic to any other node in the cluster. No matter where traffic is routed, the first request will have to try to look up another node which has the requested layer. There is no guarantee that the node the service routes to actually has the requested layer.

You actually want the first request to be to the local Spegel instance because the networking is local. It is then up to the local instance to forward the request to the correct Spegel instance on another node. If you are using the fallback you could in theory make two hops to two different nodes which would most certainly increase the latency.
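
The hop argument above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the function `cross_node_hops` and the node names are hypothetical, exactly one other node is assumed to hold the layer, and real routing involves more than this.

```python
def cross_node_hops(entry_node: str, local_node: str, node_with_layer: str) -> int:
    """Count node-to-node hops for a pull that enters Spegel at entry_node.

    Toy model: one hop if the request leaves the local node to reach the
    entry instance, plus one hop if that instance does not hold the layer
    and must forward the request to the node that does.
    """
    hops = 0
    if entry_node != local_node:
        hops += 1  # the service routed the request to another node
    if entry_node != node_with_layer:
        hops += 1  # the entry instance forwards to the node with the layer
    return hops


local, holder, other = "node-a", "node-b", "node-c"

# Host port (30020): the request enters at the local instance.
via_host_port = cross_node_hops(local, local, holder)

# Service (30021): the request may land on a node without the layer.
via_service_worst = cross_node_hops(other, local, holder)

# Service (30021): the request happens to land on the node with the layer.
via_service_best = cross_node_hops(holder, local, holder)

print(via_host_port, via_service_worst, via_service_best)
```

In this model the host port path always costs one cross-node hop, while the service path costs one or two depending on where the request lands.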

Does this make sense?

I have actually been considering removing the fallback service, or at least disabling it by default, because I see less value in having it. It increases complexity and may even make latency higher. I am unsure if it actually solves the problem I first thought of when adding it.
