Predictable Release Names with the Helm Operator SDK #1094

Closed
giacomoguiulfo opened this issue Feb 12, 2019 · 16 comments · Fixed by #1818
Assignees
joelanford

Labels
kind/feature: Categorizes issue or PR as related to a new feature.
language/helm: Issue is related to a Helm operator project.

Comments

@giacomoguiulfo

Type of question

General context and help around the operator-sdk.

Question

What did you do?

Deployed a Helm Operator for a chart and then created a CustomResource for it.

What did you expect to see?

The chart resources to be created under the release name ${my-cr-name}.

What did you see instead? Under which circumstances?

The chart resources were created under a release name with the following format: ${my-cr-name}-${uid}.

Environment

  • operator-sdk version:

    v0.5.0

  • Kubernetes version information:

    Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-08T16:31:10Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}

    Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T17:53:03Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:

    MetalK8s

Additional context

Is there a way to specify the release name of the chart to be installed by the Helm operator, or a way to omit the uid as part of the release name?

I was hoping to have predictable names in my deployment for reference by other services.

@joelanford
Member

@giacomoguiulfo Currently it is not possible to define a custom release name, but that seems like a solid use case to me.

We've thought about supporting an annotation on the CR that would allow a user to define their own release name, but since Helm release names cannot be easily changed, we would need to consider how to handle changes to that annotation once a release has been deployed:

  1. Fail the upgrade until the annotation is removed (or changed to match the existing release name).
  2. Uninstall the existing release and reinstall with the new release name.

The safest choice would be option 1. Does that seem reasonable?
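
For illustration only, a CR carrying such an annotation might look like the sketch below. The annotation key helm.operator-sdk/release-name is hypothetical; no actual key had been settled on at this point:

apiVersion: example.com/v1alpha1
kind: Memcached
metadata:
  name: my-memcached
  annotations:
    # Hypothetical key; an actual annotation was never finalized in this thread.
    helm.operator-sdk/release-name: my-memcached
spec:
  # Chart values go here, as usual for a Helm operator CR.
  replicaCount: 3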

@joelanford joelanford added language/helm Issue is related to a Helm operator project kind/feature Categorizes issue or PR as related to a new feature. labels Feb 12, 2019
@joelanford joelanford self-assigned this Feb 12, 2019
@giacomoguiulfo
Author

Thanks for your reply @joelanford!

I agree, option 1 seems very reasonable to me.

If this is something you could work on and have time for, that would be amazing. Otherwise, let me know and I will be happy to contribute.

@joelanford
Member

joelanford commented Feb 12, 2019

@giacomoguiulfo Looking at this a little closer, it's not currently straightforward to implement a check for a release name change since we don't have access to the previous CR object during an update.

The deployed release name is currently available in the status, but relying on status to get the current state goes against Kubernetes design principles, which state:

Object status must be 100% reconstructable by observation. Any history kept must be just an optimization and not required for correct operation.

The correct way to solve this may be with a validating admission webhook that forbids changes to that annotation on updates. We don't currently have any examples of adding webhooks to an operator, but that's something we're looking into.

I think we'll have time to work on this, but it may not be finalized until we have a better idea about how we want to integrate webhooks in the SDK in general.

@joelanford
Member

@theishshah ^^ Thought you might be interested since you're looking into the admission controllers and webhooks.

@giacomoguiulfo
Author

@joelanford Couldn't the release name be the metadata.name of the CR?

@joelanford
Member

@giacomoguiulfo I remembered we had some discussion about the release name in operator-framework/helm-app-operator-kit#49 and operator-framework/helm-app-operator-kit#57 before we integrated the Helm support into the SDK, so check that out for background about how we ended up with the current release name.

Perhaps the scenario of two CRs of different resource kinds with the same name in the same scope is rare enough in practice that it wouldn't be a problem to just use the CR's metadata.name. One gotcha I know of is that changing the default release name in a future release will cause problems for users who upgrade from the current version of the helm-operator with active CRs and releases. I'd have to think about that and experiment to see if there are any other gotchas.

In the meantime, I noticed that the chart best practices doc has a section on resource naming that mentions a workaround for your use case if you control your helm chart templates.

However, there may be cases where it is known that there won’t be naming conflicts from a fixed name. In these cases a fixed name might make it easier for an application to find a resource such as a Service. If the option for fixed names is needed then one way to manage this might be to make the setting of the name explicit by using a service.name value from the values.yaml if provided:

apiVersion: v1
kind: Service
metadata:
  {{- if .Values.service.name }}
  name: {{ .Values.service.name }}
  {{- else }}
  name: {{ template "fullname" . }}
  {{- end }}
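
Since a Helm operator passes the CR's spec through as the chart's values, a template like the one above lets you pin the name directly from the CR. A minimal sketch, with field names mirroring the quoted template's service.name value:

# Equivalent to setting service.name in values.yaml.
apiVersion: example.com/v1alpha1
kind: Memcached
metadata:
  name: my-memcached
spec:
  service:
    name: my-fixed-service-name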

Lastly, there may be another way that we can support this use case and still allow different resource kinds to share the same release name. If each CR had its own storage backend that was guaranteed not to overlap with another CR's storage backend, there would be no chance of release name collisions, since each CR's releases would be stored in a separate backend.

We have an open PR that migrates the helm-operator to use Helm's ConfigMap storage backend (see #917), but it may be worth putting that on hold and considering a custom backend implementation instead (see #1100), since there are other benefits if we go that direction (e.g. storing state objects in the same namespace as CRs, adding owner references to the state objects).
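
For reference, Helm v2's ConfigMap backend stores each release version as a ConfigMap labeled with the release name in Tiller's namespace (typically kube-system), so release state for a given name can be inspected directly; my-release below is a placeholder:

# List the versioned release records stored by the ConfigMap backend.
kubectl get configmaps -n kube-system -l "OWNER=TILLER,NAME=my-release"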

@hasbro17 Thoughts?

@giacomoguiulfo
Author

@joelanford Thanks for your response and all the helpful information. The templating trick you mentioned worked very well for my purposes!

@jayunit100

Definitely need this as well.

@joelanford
Member

@jayunit100 could you explain your use case? Are you able to use the templating workaround I mentioned in my previous comment?

I'd just like to make sure we understand the use cases so we can get the design right when we're in the position to do this.

@snelis

snelis commented Apr 10, 2019

I would also love to be able to control the release name.
The release name with the UID is also used in dependencies, so even though I'm able to control the name with that trick in my own charts, for any dependencies I still get names like my-release-eniduuzz7apmkuy4s9kpn37od-memcached.
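
A partial workaround for subcharts, assuming the dependency chart exposes the conventional fullnameOverride value (many charts do, but check the subchart's values.yaml), is to pin the subchart's name from the parent values:

# Parent chart values.yaml: pin the memcached subchart's generated name,
# assuming the subchart supports the common fullnameOverride value.
memcached:
  fullnameOverride: my-release-memcached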

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 9, 2019
@gnunn1

gnunn1 commented Jul 11, 2019

As another use case, GitLab's Helm chart creates a service account, gitlab-shared-secrets, to run a job that performs the actual installation of the GitLab components into the namespace. The chart expects this job to run as root, and it fails on OpenShift unless you give the service account the anyuid SCC.

When installing the chart with Tiller, this is easy to manage, since I can simply assign the anyuid SCC to the service account in advance, before I deploy the chart:

oc new-project gitlab

oc adm policy add-scc-to-user anyuid -z default
oc adm policy add-scc-to-user anyuid -z gitlab-shared-secrets

oc create secret tls apps-ocplab-com --cert=$TLS_PATH/fullchain.pem --key=$TLS_PATH/privkey.pem

helm upgrade --install gitlab gitlab/gitlab \
  --timeout 600 \
  --set global.hosts.domain=apps.ocplab.com \
  --set global.hosts.externalIP=10.10.10.10 \
  --set [email protected] \
  --set nginx-ingress.enabled=false \
  --set prometheus.install=false \
  --set certmanager.install=false \
  --set global.ingress.configureCertmanager=false \
  --set global.ingress.tls.secretName=apps-ocplab-com

However, with the helm-operator the name of this service account is mangled with the UID, so when I deploy I have to watch for the service account to be created and then grant the mangled service account the anyuid SCC. While some of this certainly rests on GitLab and its continued insistence on using root everywhere, it's not a user-friendly experience, and it would be alleviated with a predictable name.

I will also add that working with deployed artifacts with these mangled names is not pleasant. If there were an option for the user to specify a unique identifier to use instead of the UID, it would be much easier, since you could pick something memorable.
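
Until the name is predictable, the waiting step can at least be scripted. A rough sketch of the workaround described above; the grep pattern is an assumption about how the name is mangled:

# Poll until the mangled service account appears, then grant it the SCC.
until SA=$(oc get serviceaccounts -o name | grep 'shared-secrets'); do
  sleep 5
done
oc adm policy add-scc-to-user anyuid -z "${SA#serviceaccount/}"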

@pgerv12

pgerv12 commented Jul 19, 2019

I understand that Kubernetes allows different resources to have the same name, but if a Helm operator is watching more than one resource, I think it would be good to have an annotation, environment variable, flag, or similar that makes the operator act more like Helm and fail an installation if the name already exists, even if it belongs to a different resource kind. Then users could opt into predictable names, understanding the risk that all resources watched by the operator need to be uniquely named, or keep uniquely generated release names.

@joelanford
Member

The team had some discussion about this today, and the current thinking is that we need to use both mutating and validating admission webhooks to transition to a new Helm release name mechanism, allow arbitrary names set via an annotation, and maintain backwards compatibility with existing CRs.

The idea is to use an annotation to set the release name:

  1. On creation, the mutating admission webhook will default the release name annotation to the CR name (unless it's already set).
  2. On creation, the validating admission webhook will fail the validation if a release already exists in the same namespace based on the CR's release name annotation.
  3. On update, the validating admission webhook will fail the validation if the CR release name annotation has changed.
  4. For existing CRs that don't have the release name annotation set, the mutating admission webhook will set it to the current auto-generated name.
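
To make steps 1 and 4 concrete, here is a hypothetical before/after of the mutating webhook's defaulting; the annotation key is illustrative only:

# Submitted CR (no release name annotation):
apiVersion: example.com/v1alpha1
kind: Memcached
metadata:
  name: my-memcached
---
# Same CR after the mutating webhook defaults the hypothetical
# annotation to the CR name (step 1):
apiVersion: example.com/v1alpha1
kind: Memcached
metadata:
  name: my-memcached
  annotations:
    helm.operator-sdk/release-name: my-memcached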

@joelanford
Member

/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 19, 2019
@joelanford
Member

@dmesser @giacomoguiulfo @jayunit100 @snelis @gnunn1 @pgerv12 I finally had some time to delve into this and I think I've figured out a way to implement a backwards-compatible change from the UID-based release name to one that uses the CR name only.

This also covers the duplicate release name problem by ensuring that an existing release's chart is the same one that is mapped to the GVK being reconciled.

Lastly, changing the default release name now sets us up nicely to move the duplicate checks to create and update time (rather than reconcile time) and to introduce customizations via annotations once we have support for the necessary validating admission webhooks.

Take a look, try it out, and let me know if you see anything I missed: #1818
