
Single OCI registry: Stage 1 - container images #2882

Open · 42 of 50 tasks


piontec commented Oct 12, 2023

Related epics

Tasks

We discussed the new registry architecture in the RFC.

We have to get out of Docker Hub by the end of 2023 - this is called 'stage 1' in the RFC. This ticket tracks and groups tasks around that.

Step 0 - Preparations

  1. 2 of 2 — componenet/registries, epic/2023registries, team/honeybadger (piontec)

Step 1 - Setting up the new registry

  1. 9 of 11 — componenet/registries, epic/2023registries, team/honeybadger (kubasobon)
  2. 7 of 7 — componenet/registries, epic/2023registries, team/honeybadger (marians, piontec)
  3. componenet/registries, epic/2023registries, team/honeybadger
  4. componenet/registries, epic/2023registries, team/honeybadger
  5. componenet/registries, team/honeybadger (piontec)
  6. 3 of 6 — componenet/registries, team/honeybadger (marians, piontec)

Step 1.5 - Handling private images

  1. componenet/registries, team/honeybadger
  2. component/architect, team/honeybadger, team/null (allanger, piontec)
  3. 12 of 12 — componenet/registries, epic/2023registries, team/honeybadger (piontec)
  4. componenet/registries, epic/2023registries, team/honeybadger
  5. epic/2023registries, team/honeybadger (piontec)
  6. componenet/registries, epic/2023registries, team/honeybadger

Step 2 - Phase out Docker Hub usage

  1. componenet/registries, team/honeybadger (uvegla)
  2. epic/2023registries, honeybadger/appplatform, needs/rfc
  3. 0 of 2 — componenet/registries, epic/2023registries, team/honeybadger
  4. Ready, area/kaas, needs/refinement, team/turtles
  5. Ready, area/kaas, needs/refinement, team/turtles
  6. 0 of 44 — componenet/registries, epic/2023registries, team/honeybadger
  7. epic/2023registries, team/honeybadger (piontec)
  8. 4 of 54 — team/honeybadger (marians)
  9. 0 of 4 — Ready, area/kaas, provider/cluster-api-aws, provider/cluster-api-azure, provider/vcd, provider/vsphere, team/turtles, topic/capi
  10. 2 of 2
  11. epic/2023registries, team/honeybadger (uvegla)
  12. componenet/registries, epic/2023registries, team/honeybadger
  13. team/honeybadger (marians)
  14. epic/2023registries, team/honeybadger
  15. area/kaas, provider/cluster-api-aws, team/phoenix (fiunchinho)
  16. area/kaas, team/phoenix (fiunchinho)
  17. area/kaas, team/phoenix (fiunchinho)
  18. area/kaas, team/phoenix (fiunchinho)
  19. area/kaas, team/phoenix (AndiDog, fiunchinho)
  20. area/kaas, team/phoenix, team/rocket (AndiDog, fiunchinho)
  21. area/kaas, team/rocket, topic/capi (glitchcrab)
  22. area/kaas, team/rocket, topic/capi (glitchcrab)
  23. area/kaas, team/rocket, topic/capi
  24. area/kaas, needs/refinement, team/rocket, topic/capi (glitchcrab)
  25. area/kaas, team/rocket, topic/capi

Step 3 - wrapping up


Security enhancements


marians commented Nov 17, 2023

We should announce to customers in time that this change is going to happen. Here are the details I think we should include:

  • Our docker.io credentials are going to be removed from the hosts (date to be discussed)
  • When using images from docker.io in workloads, please provide your own ImagePullSecret. You can start doing this immediately, no need to wait.
  • Otherwise you might be affected by rate limiting, which is probably IP-based; Docker does not disclose the details. By using your own credentials via ImagePullSecrets, you make sure pulls are accounted for against your account(s) and you keep more control over this.
  • Side note: most customers use their own registries and copy the images they need from Docker Hub over to their own registries. This is the method we would recommend.

Let's verify this messaging and fill in the missing date information.

In the KaaS sync, it was also suggested that we directly approach the customers who, based on our data, use images from docker.io.

In addition, we could provide a dedicated dashboard to help customers track which workloads are using docker.io images.
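Such a dashboard could be backed by a query along these lines (a sketch using the `kube_pod_container_info` metric from kube-state-metrics; the exact metric and label names depend on the monitoring setup):

```promql
count by (namespace, container) (
  kube_pod_container_info{image=~"docker\\.io/.+"}
)
```

Note that images referenced without an explicit registry also default to Docker Hub, so a regex on the `image` label alone may undercount.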


marians commented Nov 20, 2023

📉 Metrics on containers using docker.io images are now available in this Grafana Cloud dashboard


piontec commented Nov 23, 2023

Regarding Step 2, notes from Whites:

MCs:

WCs:


marians commented Nov 27, 2023

I looked for internal documentation regarding registries that needs to be updated or reviewed. So far I found:


marians commented Dec 5, 2023

Since it has gotten tedious to track PRs for more than 150 repos, I created this spreadsheet:

https://docs.google.com/spreadsheets/d/1CwK4a9Q9AKZq_n2rYNcMBfUaXAaxFXCm_gvzhMmjgCM/edit#gid=0

The idea is:

  1. The CircleCI PR should be merged first. Extra issue: Modify repositories' CircleCI config to push to gsoci instead of Docker Hub #2979
  2. Then the default domain can be adjusted via another PR. (Not all of the repos treated in (1) actually have a chart or a default domain setting. In that case it is probably not a deployed workload that generates many pulls, so those aren't important.) Extra issue: Modify app values to replace docker.io as container registry domain with gsoci.azurecr.io #3017
  3. Then a release is made.
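The per-repo check behind step (2) can be automated; the helper below is a hypothetical sketch of that kind of scan (the function name and sample values are illustrative, not from the spreadsheet):

```python
import re

# docker.io references in chart values are what the migration spreadsheet tracks.
DOCKER_HUB_PATTERN = re.compile(r"\bdocker\.io\b")

def still_uses_docker_hub(values_yaml: str) -> bool:
    """Return True if a chart's values file still references docker.io."""
    return bool(DOCKER_HUB_PATTERN.search(values_yaml))

old_values = "image:\n  registry: docker.io\n  name: giantswarm/coredns\n"
new_values = "image:\n  registry: gsoci.azurecr.io\n  name: giantswarm/coredns\n"

print(still_uses_docker_hub(old_values))  # True
print(still_uses_docker_hub(new_values))  # False
```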


alex-dabija commented Jan 10, 2024

I tested the creation of a CAPA cluster with the new registry (gsoci.azurecr.io) with the following configuration:

global:
  components:
    containerd:
      containerRegistries:
        docker.io:
          - endpoint: gsoci.azurecr.io
        gsoci.azurecr.io:
          - endpoint: gsoci.azurecr.io
  connectivity:
    availabilityZoneUsageLimit: 3
    network: {}
    topology: {}
  controlPlane: {}
  metadata:
    name: alex02
    organization: giantswarm
  nodePools:
    nodepool0:
      instanceType: m5.xlarge
      maxSize: 3
      minSize: 3
      rootVolumeSizeGB: 300
  providerSpecific: {}

I couldn't remove docker.io entirely because of how Helm merges dicts.
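For reference, Helm merges maps rather than replacing them, so a key inherited from default values cannot be dropped simply by omitting it. Helm's documented workaround is setting the key to null in the override; a sketch (untested against this chart) would look like:

```yaml
global:
  components:
    containerd:
      containerRegistries:
        # Setting the inherited default to null asks Helm to delete the key
        # instead of merging it (see the Helm docs on deleting a default key).
        docker.io: null
        gsoci.azurecr.io:
          - endpoint: gsoci.azurecr.io
```

Whether this works here depends on how the chart consumes these values, so the merged config should still be verified on a node.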

The cluster had the following containerd config:

version = 2

# recommended defaults from https://github.com/containerd/containerd/blob/main/docs/ops.md#base-configuration
# set containerd as a subreaper on linux when it is not running as PID 1
subreaper = true
# set containerd's OOM score
oom_score = -999
disabled_plugins = []
[plugins."io.containerd.runtime.v1.linux"]
# shim binary name/path
shim = "containerd-shim"
# runtime binary name/path
runtime = "runc"
# do not use a shim when starting containers, saves on memory but
# live restore is not supported
no_shim = false

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
# setting runc.options unsets parent settings
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "quay.io/giantswarm/pause:3.9"

[plugins."io.containerd.grpc.v1.cri".registry]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
      endpoint = ["https://gsoci.azurecr.io",]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."gsoci.azurecr.io"]
      endpoint = ["https://gsoci.azurecr.io",]
[plugins."io.containerd.grpc.v1.cri".registry.configs]

The cluster had the following images running:

kubectl --context gs-grizzly-alex02-clientcert get pods --all-namespaces -o jsonpath="{.items[*].spec['initContainers', 'containers'][*].image}" |\
tr -s '[[:space:]]' '\n' |\
sort |\
uniq -c
      4 docker.io/giantswarm/aws-ebs-csi-driver:v1.21.0
      3 docker.io/giantswarm/aws-ebs-csi-volume-limiter:0.1.0
      7 docker.io/giantswarm/cert-exporter:2.8.4
      1 docker.io/giantswarm/cert-manager-cainjector:v1.12.4
      2 docker.io/giantswarm/cert-manager-webhook:v1.12.4
      2 docker.io/giantswarm/cilium-operator-generic:v1.14.3
     42 docker.io/giantswarm/cilium:v1.14.3
      3 docker.io/giantswarm/configmap-reload:v0.8.0
      3 docker.io/giantswarm/coredns:1.11.1
      1 docker.io/giantswarm/csi-attacher:v4.4.1
      3 docker.io/giantswarm/csi-node-driver-registrar:v2.9.0
      1 docker.io/giantswarm/csi-provisioner:v3.6.1
      1 docker.io/giantswarm/csi-resizer:v1.8.1
      3 docker.io/giantswarm/etcd:3.5.6-0
      1 docker.io/giantswarm/etcd-kubernetes-resources-count-exporter:1.8.0
      1 docker.io/giantswarm/external-dns:v0.11.0
      3 docker.io/giantswarm/grafana-agent:v0.37.2
      1 docker.io/giantswarm/hubble-relay:v1.14.3
      1 docker.io/giantswarm/hubble-ui-backend:v0.12.1
      1 docker.io/giantswarm/hubble-ui:v0.12.1
      3 docker.io/giantswarm/kube-apiserver:v1.24.16
      3 docker.io/giantswarm/kube-controller-manager:v1.24.16
      3 docker.io/giantswarm/kube-scheduler:v1.24.16
      1 docker.io/giantswarm/kube-state-metrics:v2.10.0
      4 docker.io/giantswarm/livenessprobe:v2.11.0
      6 docker.io/giantswarm/node-exporter:v1.3.1
      2 docker.io/giantswarm/prometheus-config-reloader:v0.68.0
      1 docker.io/giantswarm/prometheus-operator:v0.68.0
      1 docker.io/giantswarm/prometheus:v2.47.1
      6 docker.io/giantswarm/promtail:2.8.4
      2 docker.io/giantswarm/vpa-admission-controller:0.14.0
      1 docker.io/giantswarm/vpa-recommender:0.14.0
      1 docker.io/giantswarm/vpa-updater:0.14.0
      1 public.ecr.aws/gravitational/teleport-distroless:14.1.3
      3 quay.io/giantswarm/amazon-eks-pod-identity-webhook-gs:v0.2.0
      3 quay.io/giantswarm/aws-cloud-controller-manager:v1.24.1
      6 quay.io/giantswarm/capi-node-labeler-app:0.3.4
      1 quay.io/giantswarm/cert-manager-controller:v1.12.4
      2 quay.io/giantswarm/chart-operator:2.35.2
      1 quay.io/giantswarm/cluster-autoscaler:v1.27.3
      6 quay.io/giantswarm/docker-kubectl:1.25.4
      2 quay.io/giantswarm/metrics-server:v0.6.4
      6 quay.io/giantswarm/net-exporter:1.18.0

I'm assuming that the docker.io images are pulled from gsoci.azurecr.io because it's the only endpoint configured.
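The same breakdown can be computed per registry from the kubectl output above; this is just a sketch (sample image list abbreviated), using the rule that a reference without an explicit registry defaults to Docker Hub:

```python
from collections import Counter

def registry_of(image: str) -> str:
    """Return the registry part of an image reference.

    The first path component is a registry only if it looks like a host
    (contains a dot or port, or is "localhost"); otherwise the reference
    implicitly points at docker.io.
    """
    first = image.split("/", 1)[0]
    if "." in first or ":" in first or first == "localhost":
        return first
    return "docker.io"

images = [
    "docker.io/giantswarm/cilium:v1.14.3",
    "quay.io/giantswarm/net-exporter:1.18.0",
    "public.ecr.aws/gravitational/teleport-distroless:14.1.3",
    "giantswarm/coredns:1.11.1",  # no registry -> Docker Hub by default
]

counts = Counter(registry_of(i) for i in images)
print(counts)  # Counter({'docker.io': 2, 'quay.io': 1, 'public.ecr.aws': 1})
```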

## TODOs for KaaS
- [ ] Make the `gsoci.azurecr.io` registry the default;
- [ ] Update `imageRepository: docker.io/giantswarm` for the kubeadm control plane configuration in the cluster-aws chart;
- [ ] Update `imageRepository: docker.io/giantswarm` for the kubeadm control plane configuration in the cluster chart;
- [ ] Make `public.ecr.aws/gravitational/teleport-distroless:14.1.3` available from our registry;
- [ ] Replace `quay.io` with the new registry.

@weatherhog

@piontec what is missing here for stage 1? Can this be closed?


piontec commented Oct 2, 2024

Stage 1 is complete, and so is stage 1.5, but this is a big epic that also covers next steps that are not done yet. For me it's a great overview of where we are in this big story, and I would keep it open even if we're not actively working on it right now.

@teemow teemow moved this from Inbox 📥 to Blocked / Waiting ⛔️ in Roadmap Feb 11, 2025
Projects
Status: Blocked / Waiting ⛔️