Dask-gateway error on deploy #479

Closed
tjcrone opened this issue Nov 27, 2019 · 11 comments

Comments

@tjcrone
Contributor

tjcrone commented Nov 27, 2019

Getting a new error when trying to deploy the OOI deployment:

Downloading pangeo from repo https://pangeo-data.github.io/helm-chart/
Deleting outdated charts
UPGRADE FAILED
Error: render error in "pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/web-proxy-deployment.yaml": template: pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/web-proxy-deployment.yaml:23:28: executing "pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/web-proxy-deployment.yaml" at <include (print .Template.BasePath "/secret.yaml") .>: error calling include: template: pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/secret.yaml:9:19: executing "pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/secret.yaml" at <required "gateway.proxyToken must be a 32 byte random string" .Values.gateway.proxyToken>: error calling required: gateway.proxyToken must be a 32 byte random string
Error: UPGRADE FAILED: render error in "pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/web-proxy-deployment.yaml": template: pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/web-proxy-deployment.yaml:23:28: executing "pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/web-proxy-deployment.yaml" at <include (print .Template.BasePath "/secret.yaml") .>: error calling include: template: pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/secret.yaml:9:19: executing "pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/secret.yaml" at <required "gateway.proxyToken must be a 32 byte random string" .Values.gateway.proxyToken>: error calling required: gateway.proxyToken must be a 32 byte random string
Traceback (most recent call last):
  File "/home/circleci/repo/venv/bin/hubploy", line 11, in <module>
    load_entry_point('hubploy==0.1.0', 'console_scripts', 'hubploy')()
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/__main__.py", line 74, in main
    helm.deploy(args.deployment, args.chart, args.environment, args.namespace, args.set, args.version)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/helm.py", line 116, in deploy
    version
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/helm.py", line 56, in helm_upgrade
    subprocess.check_call(cmd)
  File "/usr/local/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['helm', 'upgrade', '--wait', '--install', '--namespace', 'ooi-staging', 'ooi-staging', 'pangeo-deploy', '-f', 'deployments/ooi/config/common.yaml', '-f', 'deployments/ooi/config/staging.yaml', '-f', 'deployments/ooi/secrets/staging.yaml', '--set', 'pangeo.jupyterhub.singleuser.image.tag=4d92f5f', '--set', 'pangeo.jupyterhub.singleuser.image.name=ooicloud.azurecr.io/ooi-pangeo-io-notebook']' returned non-zero exit status 1.
{"errors":[{"message":"Permission denied, wrong credentials","field":null,"help":null}]}
Exited with code 1

What is the gateway.proxyToken, and why isn't mine 32 bytes?

@TomAugspurger
Member

TomAugspurger commented Nov 27, 2019

That'd be from #477, sorry.

Do you have dask-gateway config values in deployments/ooi/secrets/staging.yaml? That file is encrypted for me locally.

It'd be something like:

pangeo:
  jupyterhub:
    hub:
      services:
        dask-gateway:
          apiToken: "<token1>"


  dask-gateway:
    gateway:
      proxyToken: "<token2>"
      auth:
        type: jupyterhub
        jupyterhub:
          apiToken: "<token1>"

Note that token1 is used in two places, under pangeo.jupyterhub.hub.services.dask-gateway.apiToken and pangeo.dask-gateway.gateway.auth.jupyterhub.apiToken.

@jhamman may want a way for deployments to opt out of dask-gateway. Presumably helm has a way to do that.
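The `required` check in the first error only asks for a 32-byte random string. As a minimal sketch (an assumption, not something the chart provides), both tokens can be generated with Python's standard `secrets` module: `secrets.token_hex(32)` returns 32 random bytes hex-encoded as 64 characters, the same shape `openssl rand -hex 32` produces, assuming the chart accepts a hex-encoded string. The helper `make_gateway_secrets` is hypothetical; it builds the same nested values as the YAML in this comment, with token1 reused in both places, ready to be copied into the encrypted secrets file (e.g. deployments/ooi/secrets/staging.yaml).

```python
import json
import secrets


def make_gateway_secrets():
    """Build the dask-gateway secret values (hypothetical helper).

    token1 is shared between JupyterHub's service entry and dask-gateway's
    JupyterHub auth config; token2 is the gateway's proxy token.
    """
    token1 = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    token2 = secrets.token_hex(32)
    return {
        "pangeo": {
            "jupyterhub": {
                "hub": {"services": {"dask-gateway": {"apiToken": token1}}}
            },
            "dask-gateway": {
                "gateway": {
                    "proxyToken": token2,
                    "auth": {
                        "type": "jupyterhub",
                        "jupyterhub": {"apiToken": token1},
                    },
                }
            },
        }
    }


if __name__ == "__main__":
    # JSON output with plain string keys and values is also valid YAML,
    # so it can be pasted into a values file before encrypting it.
    print(json.dumps(make_gateway_secrets(), indent=2))
```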

@tjcrone
Contributor Author

tjcrone commented Nov 27, 2019

Thanks @TomAugspurger. Definitely moving us forward. Here is a new error:

Saving 1 charts
Downloading pangeo from repo https://pangeo-data.github.io/helm-chart/
Deleting outdated charts
UPGRADE FAILED
Error: kind Secret with the name "ooi-staging-dask-gateway" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart
Error: UPGRADE FAILED: kind Secret with the name "ooi-staging-dask-gateway" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart
Traceback (most recent call last):
  File "/home/circleci/repo/venv/bin/hubploy", line 11, in <module>
    load_entry_point('hubploy==0.1.0', 'console_scripts', 'hubploy')()
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/__main__.py", line 74, in main
    helm.deploy(args.deployment, args.chart, args.environment, args.namespace, args.set, args.version)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/helm.py", line 116, in deploy
    version
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/helm.py", line 56, in helm_upgrade
    subprocess.check_call(cmd)
  File "/usr/local/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['helm', 'upgrade', '--wait', '--install', '--namespace', 'ooi-staging', 'ooi-staging', 'pangeo-deploy', '-f', 'deployments/ooi/config/common.yaml', '-f', 'deployments/ooi/config/staging.yaml', '-f', 'deployments/ooi/secrets/staging.yaml', '--set', 'pangeo.jupyterhub.singleuser.image.tag=4d92f5f', '--set', 'pangeo.jupyterhub.singleuser.image.name=ooicloud.azurecr.io/ooi-pangeo-io-notebook']' returned non-zero exit status 1.
{"errors":[{"message":"Permission denied, wrong credentials","field":null,"help":null}]}
Exited with code 1

I can delete this Secret, but it would be good if the chart could handle this on its own. My sense is that these changes might not have been tested enough before they were deployed. I guess we are testing them now.

@TomAugspurger
Member

> I can delete this Secret

I'm not sure where that original secret would have come from. Perhaps from the failed deployment? I don't think it'd be inheriting from a common config.

I'd try deleting it now, adding a dummy commit, and redeploying.

@tjcrone
Contributor Author

tjcrone commented Nov 27, 2019

Yes, it was a failed deployment. A helm upgrade on Azure can take a long time, I believe because downloading a new image can be slow, so I sometimes need to rerun a workflow after the first run has finished pulling the image onto the cluster machines. I will figure this out. Thanks for your help, and happy Thanksgiving!

@jhamman
Member

jhamman commented Nov 27, 2019

FWIW, I had a similar problem yesterday on the dev deployment (but not the other GCP hubs). I ended up cleaning up the gateway deployments in the following way:

namespace="dev-staging"
kubectl delete serviceaccount ${namespace}-dask-gateway -n ${namespace}
kubectl delete secret ${namespace}-dask-gateway -n ${namespace}
kubectl delete configmap ${namespace}-dask-gateway -n ${namespace}
kubectl delete rolebinding ${namespace}-dask-gateway -n ${namespace}
kubectl delete role ${namespace}-dask-gateway -n ${namespace}
kubectl delete service ${namespace}-dask-gateway -n ${namespace}
kubectl delete service gateway-api-${namespace}-dask-gateway -n ${namespace}
kubectl delete service scheduler-api-${namespace}-dask-gateway -n ${namespace}
kubectl delete service scheduler-public-${namespace}-dask-gateway -n ${namespace}
kubectl delete service web-public-${namespace}-dask-gateway -n ${namespace}
kubectl delete service web-api-${namespace}-dask-gateway -n ${namespace}
kubectl delete deployment gateway-${namespace}-dask-gateway -n ${namespace}
kubectl delete deployment scheduler-proxy-${namespace}-dask-gateway -n ${namespace}
kubectl delete deployment web-proxy-${namespace}-dask-gateway -n ${namespace}

Seeing that you ran into this too, I wonder if dask-gateway could do a better job of cleaning up after a failed deployment.
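The cleanup commands in this comment follow a fixed naming pattern, so they can be generated rather than typed by hand. A minimal Python sketch, assuming (as in hubploy's `helm upgrade` call in the first log, where `ooi-staging` is passed as both the namespace and the release name) that the release name equals the namespace. `cleanup_commands` and `run_cleanup` are hypothetical helpers; unlike the original commands they add kubectl's `--ignore-not-found` flag so reruns do not fail on resources that are already gone, and they default to a dry run that only prints.

```python
import subprocess

# Resources left behind by a failed dask-gateway install, as listed in the
# comment above. Each entry is (kind, name template); {release} is filled in.
LEFTOVER_RESOURCES = [
    ("serviceaccount", "{release}-dask-gateway"),
    ("secret", "{release}-dask-gateway"),
    ("configmap", "{release}-dask-gateway"),
    ("rolebinding", "{release}-dask-gateway"),
    ("role", "{release}-dask-gateway"),
    ("service", "{release}-dask-gateway"),
    ("service", "gateway-api-{release}-dask-gateway"),
    ("service", "scheduler-api-{release}-dask-gateway"),
    ("service", "scheduler-public-{release}-dask-gateway"),
    ("service", "web-public-{release}-dask-gateway"),
    ("service", "web-api-{release}-dask-gateway"),
    ("deployment", "gateway-{release}-dask-gateway"),
    ("deployment", "scheduler-proxy-{release}-dask-gateway"),
    ("deployment", "web-proxy-{release}-dask-gateway"),
]


def cleanup_commands(namespace, release=None):
    """Return one kubectl delete command per leftover resource.

    The release name defaults to the namespace (assumption, see above).
    """
    release = release or namespace
    return [
        ["kubectl", "delete", kind, name.format(release=release),
         "-n", namespace, "--ignore-not-found"]
        for kind, name in LEFTOVER_RESOURCES
    ]


def run_cleanup(namespace, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in cleanup_commands(namespace):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)
```

For example, `run_cleanup("dev-staging")` prints the fourteen commands for the dev deployment, and `run_cleanup("dev-staging", dry_run=False)` runs them.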

@tjcrone
Contributor Author

tjcrone commented Nov 28, 2019

It's worth a lot (IWAL), @jhamman. Thank you! I will try this out after turkey tomorrow. Or, maybe Friday. Thank you very much for your help. Cheers.

@jcrist
Member

jcrist commented Dec 2, 2019

> I'm not sure where that original secret would have come from. Perhaps from the failed deployment? I don't think it'd be inheriting from a common config.

This is a secret created as part of the dask-gateway helm chart. Helm should be able to automatically delete resources from previous deployments provided they have the same name as the current deployment (which they do here). Our helm chart is fairly close to that of JupyterHub's, so I don't think this is a dask-gateway-specific issue. I'm not sure what's going on; perhaps @yuvipanda would have an idea?

> @jhamman may want a way for deployments to opt out of dask-gateway. Presumably helm has a way to do that.

Yeah, we can do that with a conditional dependency. See https://helm.sh/docs/topics/charts/#tags-and-condition-fields-in-dependencies.
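To illustrate how the linked mechanism decides whether a subchart is rendered, here is a small Python model of the `condition` rule as the Helm documentation describes it: the field holds one or more comma-separated value paths, the first path that resolves to a boolean decides, and if none does the condition has no effect and the chart stays enabled. This is not Helm's code; `lookup`, `dependency_enabled`, and the `dask-gateway.enabled` path are assumptions for illustration, and the pangeo chart would have to declare such a condition on its dask-gateway dependency for this to apply.

```python
def lookup(values, path):
    """Follow a dotted path like 'dask-gateway.enabled' through nested dicts.

    Returns None when any key along the path is missing.
    """
    node = values
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def dependency_enabled(values, condition=""):
    """Model of the documented Helm 'condition' rule for one dependency.

    Only the first path that resolves to a boolean is used; if no path
    resolves to a boolean, the dependency stays enabled.
    """
    for path in filter(None, (p.strip() for p in condition.split(","))):
        value = lookup(values, path)
        if isinstance(value, bool):
            return value
    return True


# A deployment opting out via its values (hypothetical key):
values = {"dask-gateway": {"enabled": False}}
print(dependency_enabled(values, "dask-gateway.enabled"))
```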

@jhamman
Member

jhamman commented Dec 3, 2019

> Yeah, we can do that with a conditional dependency. See https://helm.sh/docs/topics/charts/#tags-and-condition-fields-in-dependencies.

Let's move this topic to the Pangeo Helm Chart repo. I personally don't think this is something we want to do. Once the daskkubernetes service account is gone, turning off dask-gateway would give you a vanilla zero-to-jupyterhub deployment.

@jcrist
Member

jcrist commented Dec 5, 2019

> Helm should be able to automatically delete resources from previous deployments provided they have the same name as the current deployment (which they do here).

I just remembered that I fixed a helm chart bug but hadn't done a new release until recently (0.6.1). The issue you're seeing here may be due to dask/dask-gateway#150, which has since been fixed. After upgrading the dask-gateway chart dependency to >= 0.6.0, things like this should hopefully not happen (the issue was a label that the official helm docs mistakenly recommended). Note that until you have a version >= 0.6.0 running, version upgrades of dask-gateway won't run smoothly due to this issue; the upgrade failures are caused by the currently running chart, not the new chart.
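Why a label can break every later upgrade: Kubernetes does not allow the `spec.selector` of an existing `apps/v1` Deployment to change, so if any selector label's value differs between chart releases, the API server rejects the upgrade with "field is immutable". A minimal Python sketch of that check; the stable label set matches the selector printed in the ocean-staging error later in this thread, while the version-bearing `helm.sh/chart` label is a hypothetical example, not necessarily the exact label involved in dask/dask-gateway#150.

```python
def selector_labels(release, component, chart_version=None):
    """Selector labels for one dask-gateway Deployment (illustrative).

    Passing chart_version adds a version-bearing label, the kind of label
    that must not appear in a Deployment selector.
    """
    labels = {
        "app.kubernetes.io/name": "dask-gateway",
        "app.kubernetes.io/instance": release,
        "app.kubernetes.io/component": component,
        "app.kubernetes.io/managed-by": "Tiller",
    }
    if chart_version is not None:
        labels["helm.sh/chart"] = f"dask-gateway-{chart_version}"  # hypothetical
    return labels


def upgrade_would_fail(old_selector, new_selector):
    # spec.selector of an apps/v1 Deployment is immutable, so any difference
    # between the running and the new selector is rejected.
    return old_selector != new_selector


old = selector_labels("ocean-staging", "gateway", chart_version="0.5.0")
new = selector_labels("ocean-staging", "gateway", chart_version="0.6.0")
print(upgrade_would_fail(old, new))
```

Keeping only labels that never change between releases in the selector (and putting version labels on the pod template instead) avoids the problem.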

@jhamman
Member

jhamman commented Jan 15, 2020

I'm not sure we ever moved past the original problem in this issue, but we have a new one:

...Successfully got an update from the "pangeo" chart repository
...Successfully got an update from the "dask-gateway" chart repository
...Successfully got an update from the "jupyterhub" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete.
Saving 1 charts
Downloading pangeo from repo https://pangeo-data.github.io/helm-chart/
Deleting outdated charts
UPGRADE FAILED
Error: Deployment.apps "gateway-ocean-staging-dask-gateway" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"gateway", "app.kubernetes.io/instance":"ocean-staging", "app.kubernetes.io/managed-by":"Tiller", "app.kubernetes.io/name":"dask-gateway"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && Deployment.apps "scheduler-proxy-ocean-staging-dask-gateway" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"scheduler-proxy", "app.kubernetes.io/instance":"ocean-staging", "app.kubernetes.io/managed-by":"Tiller", "app.kubernetes.io/name":"dask-gateway"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && Deployment.apps "web-proxy-ocean-staging-dask-gateway" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"web-proxy", "app.kubernetes.io/instance":"ocean-staging", "app.kubernetes.io/managed-by":"Tiller", "app.kubernetes.io/name":"dask-gateway"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
Error: UPGRADE FAILED: Deployment.apps "gateway-ocean-staging-dask-gateway" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"gateway", "app.kubernetes.io/instance":"ocean-staging", "app.kubernetes.io/managed-by":"Tiller", "app.kubernetes.io/name":"dask-gateway"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && Deployment.apps "scheduler-proxy-ocean-staging-dask-gateway" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"scheduler-proxy", "app.kubernetes.io/instance":"ocean-staging", "app.kubernetes.io/managed-by":"Tiller", "app.kubernetes.io/name":"dask-gateway"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && Deployment.apps "web-proxy-ocean-staging-dask-gateway" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"web-proxy", "app.kubernetes.io/instance":"ocean-staging", "app.kubernetes.io/managed-by":"Tiller", "app.kubernetes.io/name":"dask-gateway"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
Traceback (most recent call last):
  File "/home/circleci/repo/venv/bin/hubploy", line 11, in <module>
    sys.exit(main())
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/__main__.py", line 89, in main
    helm.deploy(args.deployment, args.chart, args.environment, args.namespace, args.set, args.version, args.timeout, args.force)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/helm.py", line 126, in deploy
    force
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/helm.py", line 62, in helm_upgrade
    subprocess.check_call(cmd)
  File "/usr/local/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['helm', 'upgrade', '--wait', '--install', '--namespace', 'ocean-staging', 'ocean-staging', 'pangeo-deploy', '-f', 'deployments/ocean/config/common.yaml', '-f', 'deployments/ocean/config/staging.yaml', '-f', 'deployments/ocean/secrets/staging.yaml', '--set', 'pangeo.jupyterhub.singleuser.image.tag=93467b8', '--set', 'pangeo.jupyterhub.singleuser.image.name=us.gcr.io/*************/ocean-pangeo-io-notebook']' returned non-zero exit status 1.

@jcrist - have you seen this error (Deployment.apps "gateway-ocean-staging-dask-gateway" is invalid)?

@jcrist
Member

jcrist commented Jan 15, 2020

If you scroll to the right, you can see that it's indicating that a label selector is immutable. This should have been fixed by dask/dask-gateway#150 (see the issue for details on why this was). I assume you're upgrading from 0.5.0 to 0.6.* here? If you're already on 0.6.*, then this is something new. If you're upgrading from 0.5.0 to 0.6.*, you'll have to delete the old deployment and reinstall; after that, upgrades between versions should run smoothly.

Edit: see also my comment above yours, which is about the same issue.
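The upgrade rule from this thread can be written down as a small check that a deploy script could run before calling `helm upgrade`. A minimal sketch: `parse_version`, `upgrade_plan`, and the returned strings are hypothetical, versions are assumed to be plain `X.Y.Z` numbers, finding out which version is currently running is left out, and 0.6.0 is taken as the first release with the fixed labels, as stated in the thread.

```python
FIXED_IN = (0, 6, 0)  # first dask-gateway chart release with the label fix


def parse_version(text):
    """Turn '0.6.1' into (0, 6, 1); assumes plain numeric X.Y.Z versions."""
    return tuple(int(part) for part in text.split("."))


def upgrade_plan(running, target):
    """Suggest how to move a dask-gateway release from `running` to `target`.

    A release older than FIXED_IN carries the old selector labels, so moving
    it to any other version needs a delete and reinstall; once FIXED_IN or
    later is running, a normal helm upgrade should work.
    """
    current = parse_version(running)
    if current < FIXED_IN and parse_version(target) != current:
        return "delete and reinstall"
    return "helm upgrade"


print(upgrade_plan("0.5.0", "0.6.1"))
print(upgrade_plan("0.6.0", "0.6.1"))
```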


4 participants