
Disable '"additionalProperties": false' in values.yaml schema for cluster apps. #3187

Closed
piontec opened this issue Jan 24, 2024 · 8 comments
Labels
area/kaas Mission: Cloud Native Platform - Self-driving Kubernetes as a Service team/phoenix Team Phoenix team/rocket Team Rocket team/turtles Team Turtles

Comments

@piontec

piontec commented Jan 24, 2024

As we learned, some of our apps (so far we've found only some cluster apps) have the property "additionalProperties": false in their values.schema.json file (example).
This means that if someone ever adds a new global value to one of our catalog config maps or to the config repo and that value is pushed to such an app, the app won't install: Helm's schema validation (which can't be disabled while a schema file is present) will fail.

It seems that currently the recommended way is to allow additionalProperties and validate (and show a warning) before passing the config to helm.
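For illustration, here is a minimal, hypothetical values.schema.json fragment of the kind described above (the property names are made up, not copied from a real chart). With "additionalProperties": false at the root, any values key not listed under properties, such as a newly injected global key, is rejected by Helm's schema validation, which runs on install, upgrade, lint and template and cannot be switched off:

```json
{
  "$schema": "http://json-schema.org/schema#",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "baseDomain": { "type": "string" },
    "managementCluster": { "type": "string" }
  }
}
```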

CC @marians

@github-project-automation github-project-automation bot moved this to Inbox 📥 in Roadmap Jan 24, 2024
@piontec piontec changed the title Disable "allowedExtraKeys" in values.yaml schema for cluster apps. Disable '"additionalProperties": false' in values.yaml schema for cluster apps. Jan 24, 2024
@alex-dabija alex-dabija added team/rocket Team Rocket team/phoenix Team Phoenix team/turtles Team Turtles area/kaas Mission: Cloud Native Platform - Self-driving Kubernetes as a Service labels Jan 24, 2024
@erkanerol

In cluster-vsphere and cluster-cloud-director we don't have additionalProperties: false at the root level, but it seems we have it for some sections of the schema, such as internal.

Should we remove it completely from all levels? I am not fully sure.
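For clarity, a hedged sketch of that shape (the nested field names are invented, not taken from the actual charts): the root object stays open to extra keys while the internal section is closed:

```json
{
  "type": "object",
  "properties": {
    "internal": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "migration": { "type": "boolean" }
      }
    }
  }
}
```

With this layout, new top-level or global values injected from outside would still pass validation; only unknown keys under internal would be rejected.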

@AndiDog

AndiDog commented Jan 25, 2024

This means we no longer check fields at all, and customers as well as our engineers can easily make typos and mistakes. The schema becomes partially useless. People don't normally look at warnings, and CIs don't look at warnings at all. We also shouldn't add new, special validations for this case, because then we're no longer using standard Helm commands, which may hinder our migration to pure Helm in the future.

Hence, here are some alternative suggestions:

  • In the cluster-xyz chart schema, reference a "Giant Swarm shared" JSON schema (e.g. by "$ref": "https://raw.githubusercontent.com/giantswarm/our-new-shared-schema/main/cluster-chart.schema.json"); see the sketch below this list. We already had this idea of sharing schemas previously (in #wg-cluster-app-schema, I guess). Make the affected teams CODEOWNERS in that new repo so that breaking changes are avoided by having more reviewers (e.g. Honeybadger + Turtles + Phoenix).
  • Worse than the above idea: add a global.shared object which has additionalProperties=true so anything could go in there. Any new shared fields can be added in there. This still provides no validation at all, so I don't like this idea very much.
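A minimal sketch of the first suggestion, reusing the example URL from that bullet (the repository, the URL and the providerSpecific key are purely illustrative, and whether resolving a remote $ref at validation time is acceptable, since it needs network access, would have to be verified):

```json
{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "global": {
      "$ref": "https://raw.githubusercontent.com/giantswarm/our-new-shared-schema/main/cluster-chart.schema.json"
    },
    "providerSpecific": {
      "type": "object"
    }
  }
}
```

A variant that avoids the network dependency would be to copy the shared definitions into each chart's values.schema.json in CI instead of referencing them remotely.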

@erkanerol

When we keep additionalProperties=true, we will not be able to catch typos or wrong keys if they are not marked as required.

When we set additionalProperties=false, we have to upgrade all cluster-<provider> apps to the latest version before releasing a change in GitOps or an operator.

I believe the latter is worse than the former. For example, when you need to add a new parameter to the config map created by cluster-apps-operator, you don't want to have to upgrade all clusters in advance. That would slow us down a lot.

@nprokopic

nprokopic commented Feb 7, 2024

TL;DR We should make a clear distinction between configuration of "regular" apps (that deploy controllers/operators) and cluster-<provider> (and default-apps-<provider>) apps. They are used for different things, and "global" configs (changed "globally") have quite different impacts. Universal config for "regular" apps (that deploy controllers/operators) should IMO not also be universal for, and shared with, cluster-<provider> (and default-apps-<provider>) apps.

if someone ever adds a new global value to one of our catalog config maps or to the config repo

When it comes to config being added to catalog config maps and to the config repo, we should really make a clear distinction between "regular" apps (that deploy controllers/operators) and cluster-<provider> apps that contain CAPI+CAPx+HelmRelease+App resources (and default-apps-<provider> apps that go hand in hand with cluster-<provider> apps and will soon be merged into cluster-<provider> apps).

"Regular" apps, those that deploy controllers/operators:

  • They come from the control-plane, giantswarm, or default catalog.
    • There is a great benefit to have a single place to push config for all apps from these catalogs, because here we're talking about many dozens of apps at least (or 100+ maybe?).
  • They get configured from the config repo.
  • Updating a config of some app will just trigger an update of that app. One app might (or not) restart and that's it.
    • This is perfectly fine on the MC.
    • It's a bit less fine on the WC (depending on which app we're talking about and if the app got restarted), but at least the impact is limited to a single app (or multiple apps, depending on how many apps use that config).

On the other hand, cluster-<provider> apps (default-apps-<provider> apps will get merged into cluster-<provider>):

  • They come from the cluster catalog.
    • Only cluster-<provider> apps and default-apps-<provider> apps come from this catalog.
    • We have just 4 providers, so just 4 pairs of cluster-<provider> + default-apps-<provider> (soon just 4 cluster-<provider> apps). It's quick and easy to update these when needed, so IMO we don't waste much effort here making the needed changes.
    • The "regular" apps that we deploy from these by applying HelmRelease/App on the MC (e.g. cilium, core-dns, external-dns, cert-manager, etc) still come from "regular" app catalogs mentioned above (control-plane, giantswarm, or default catalog), so catalog config maps for those catalogs still work regularly for the "regular" apps that we deploy here from cluster-<provider> apps and default-apps-<provider> apps.
  • They don't get configured from the config repo.
  • Updating config of a cluster-<provider> app can easily trigger a rolling update of a workload cluster.
    • e.g. registry config for k8s components lives in KubeadmControlPlane (a resource in the cluster-<provider> app), so changing it will trigger a rolling update of a workload cluster (see the sketch right after this list).
    • I don't think we want global config changes (for all apps) to trigger rolling updates of almost all of our customers' workload clusters.
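To make the rolling-update point above concrete, here is a hedged KubeadmControlPlane sketch (names and values are illustrative, not taken from a real cluster-<provider> chart). Any change to this spec, including the image registry, causes Cluster API to replace the control-plane machines one by one:

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: my-cluster-control-plane        # illustrative name
spec:
  replicas: 3
  version: v1.27.8
  kubeadmConfigSpec:
    clusterConfiguration:
      # Registry for Kubernetes component images; changing this value rolls
      # every control-plane machine of the workload cluster.
      imageRepository: registry.example.com/k8s
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AWSMachineTemplate          # provider-specific; AWS is only an example
      name: my-cluster-control-plane
```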

Now when it comes to cluster-<provider> apps schema:

This means we no longer check fields at all, and customers as well as our engineers can easily make typos and mistakes. The schema becomes partially useless. People don't normally look at warnings, and CIs don't look at warnings at all. We also shouldn't add new, special validations for this case, because then we're no longer using standard Helm commands, which may hinder our migration to pure Helm in the future.

💯 what Andreas wrote. JSON schema is natively integrated into Helm and this is very helpful. It gives us strong validation and it's our first line of defence against incorrectly configured workload clusters. With JSON schema we can write simple Helm templates, because we can rely on the schema validation and assume that the Helm values used in the templates are correct. Templates become easier to understand and maintain that way.
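As a rough illustration of that point (the value path .Values.global.connectivity.baseDomain and the manifest fragment are made up for the example, not taken from an actual cluster-<provider> chart): when the schema already guarantees the value is a required, non-empty string, the template can use it directly instead of wrapping it in required/default/type guards:

```yaml
# templates/cluster.yaml (hypothetical fragment)
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: {{ .Release.Name }}
spec:
  controlPlaneEndpoint:
    # No `required`/`default` guards needed: values.schema.json has already
    # rejected any release where this value is missing or of the wrong type.
    host: api.{{ .Values.global.connectivity.baseDomain }}
    port: 6443
```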

AFAIK cluster-<provider> apps (and default-apps-<provider>) get values injected from these places only:

  • cluster catalog config map. The values we have here are:
    • provider: not used anywhere and useless, because cluster-<provider> apps (and default-apps-<provider>) know which provider they are.
    • baseDomain: currently we use the same baseDomain for all workload clusters in a single MC, but in CAPI AFAIK we want to move away from this, and enable customers to have a separate base domain for every WC (if they want). So base domain can be specified on cluster creation, but even if it stays in cluster catalog config map it's an immutable config that cannot change.
    • managementCluster: as far as I could find, just used in Teleport config. Also this is immutable config.
  • <cluster name>-cluster-values config map that is created in the cluster-apps-operator.
    • Once we merge default-apps-<provider> into cluster-<provider>, a lot (most) of the config that is currently set in <cluster name>-cluster-values will be easy to set directly in the cluster-<provider> apps (so an operator will not have to set those values).
    • Most importantly, it's the "regular" apps deployed from cluster-<provider> and default-apps-<provider> apps that need <cluster name>-cluster-values; the cluster-<provider> and default-apps-<provider> apps themselves do not need it.

So the only really useful/used values that get injected into cluster-<provider> and default-apps-<provider> apps are the base domain and the management cluster name, both injected from the cluster catalog config map, and both immutable. So even this is more about defaulting an immutable value (for the sake of convenience, I guess) than about making something configurable.
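For reference, a hedged sketch of what those injected cluster catalog config map values amount to (the key names follow the comments above; the concrete values are invented):

```yaml
# values shipped with the `cluster` catalog config map (illustrative values)
provider: aws                      # not actually needed, the charts already know their provider
baseDomain: example.com            # effectively immutable per MC today
managementCluster: my-mc           # only used for things like Teleport config; also immutable
```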

When we set additionalProperties=false, we have to upgrade all cluster-<provider> apps to the latest version before releasing a change in GitOps or an operator.

@erkanerol In these cases, which IMO should be very rare, AFAIK cluster-apps-operator can always check the cluster-<provider> and default-apps-<provider> versions and set the required values based on them. IMO we should be doing that even now, so that we don't mutate existing workload clusters' config with an MC component change. The fact that this problem exists here at all is IMO an architecture smell.

@T-Kukawka
Contributor

I am having a hard time concluding what we should do here.

@nprokopic

I am having a hard time concluding what we should do here.

When it comes to cluster-<provider> apps, IMO we should not do anything :)

To quote my TL;DR from above:

TL;DR We should make a clear distinction between configuration of "regular" apps (that deploy controllers/operators) and cluster-<provider> (and default-apps-<provider>) apps. They are used for different things, and "global" configs (changed "globally") have quite different impacts. Universal config for "regular" apps (that deploy controllers/operators) should IMO not also be universal for, and shared with, cluster-<provider> (and default-apps-<provider>) apps.

@T-Kukawka
Contributor

T-Kukawka commented Oct 22, 2024

@alex-dabija can we close this? It is for Turtles/Rocket/Phoenix. The current state for us is that having strict validation and protection against the app platform rolling everything is actually a good thing. Also, I don't think any team really touched this.

@alex-dabija

@alex-dabija can we close this? It is for Turtles/Rocket/Phoenix. The current state for us is that having strict validation and protection against the app platform rolling everything is actually a good thing. Also, I don't think any team really touched this.

Yes, it's fine to close this issue.

@github-project-automation github-project-automation bot moved this from Inbox 📥 to Done ✅ in Roadmap Oct 30, 2024