oc cluster up with metrics fails with "No API token found for service account metrics-deployer" #11946

Closed
jmazzitelli opened this issue Nov 17, 2016 · 12 comments

@jmazzitelli

jmazzitelli commented Nov 17, 2016

Cannot run "oc cluster up --metrics" successfully. It always fails.

Version
$ ./oc version
oc v1.4.0-alpha.1+f189ede
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO
Steps To Reproduce
  1. Simply run the command "sudo ./oc cluster up --metrics"
Current Result
$ sudo ./oc cluster up --metrics
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v1.4.0-alpha.1 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... 
   WARNING: Binding DNS on port 8053 instead of 53, which may not be resolvable from all clients.
-- Checking type of volume mount ... 
   Using nsenter mounter for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ... 
   Using 192.168.1.2 as the server IP
-- Starting OpenShift container ... 
   Creating initial OpenShift configuration
   Starting OpenShift using container 'origin'
   Waiting for API server to start listening
   OpenShift server started
-- Adding default OAuthClient redirect URIs ... OK
-- Installing registry ... OK
-- Installing router ... OK
-- Installing metrics ... FAIL
   Error: cannot create metrics deployer pod
   Details:
     Last 10 lines of "origin" container log:
     I1117 00:38:44.353250   12675 trace.go:61] Trace "Update
/api/v1/namespaces/openshift-infra/serviceaccounts/deployment-controller" (started 2016-11-17
00:38:43.74571921 +0000 UTC):
     [22.617µs] [22.617µs] About to convert to expected version
     [93.692µs] [71.075µs] Conversion done
     [99.156µs] [5.464µs] About to store object in database
     [607.415218ms] [607.316062ms] Object stored in database
     [607.425586ms] [10.368µs] Self-link added
     [607.484338ms] [58.752µs] END
     I1117 00:38:44.353911   12675 trace.go:61] Trace "Delete
/api/v1/namespaces/openshift-infra/secrets/namespace-controller-token-fwytu" (started 2016-11-17
00:38:42.796986729 +0000 UTC):
     [30.765µs] [30.765µs] About do delete object from database
     [1.556883322s] [1.556852557s] END

   Caused By:
     Error: No API token found for service account "metrics-deployer", retry after the token is
automatically created and added to the service account
Expected Result

Successful install of OpenShift with metrics.

Additional Information

I originally did not have my user in the docker group, which is why I prefix the command with "sudo". I also tried adding my user to the docker group, but the same problem occurs, so I do not think it is related. Here is what I did:
$ sudo groupadd docker && sudo gpasswd -a ${USER} docker && sudo systemctl restart docker && newgrp docker

Also, I built "oc" from the current master branch, and I get the same problem.

jmazzitelli added a commit to hawkular/hawkular-openshift-agent that referenced this issue Nov 17, 2016
NOTE: this isn't working for me. It might be a OS bug.
See: openshift/origin#11946
@pweil- added the component/composition, component/metrics, priority/P2, and kind/bug labels Nov 17, 2016
@mwringe
Contributor

mwringe commented Nov 17, 2016

@pweil- I don't believe this has anything to do with Origin Metrics directly. Setting up the service account is done as part of the cluster up command.

If @jmazzitelli installs origin metrics directly then it works for him.

And if I follow the steps he outlined, it works properly for me.

@csrwng assigned coreydaley and unassigned mwringe and csrwng Nov 17, 2016
@jmazzitelli
Author

To be clear about the reproduction steps: I get this failure by doing nothing more than downloading the oc binary, untarring it, and running it via sudo:

$ wget https://github.com/openshift/origin/releases/download/v1.4.0-alpha.1/openshift-origin-client-tools-v1.4.0-alpha.1.f189ede-linux-64bit.tar.gz
$ tar xvfz openshift-origin-client-tools-v1.4.0-alpha.1.f189ede-linux-64bit.tar.gz
$ cd openshift-origin-client-tools-v1.4.0-alpha.1+f189ede-linux-64bit/
$ sudo ./oc cluster up --metrics

For the record, I am on Fedora 23, with "uname -a" as follows:

$ uname -a
Linux mazztower 4.6.7-200.fc23.x86_64 #1 SMP Wed Aug 17 14:24:53 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

@liggitt
Contributor

liggitt commented Nov 17, 2016

If we synchronously create a service account and then immediately create a pod that uses it, that code needs to be able to retry creating the pod when the request is forbidden because the service account's token hasn't been auto-generated yet.

@liggitt
Contributor

liggitt commented Nov 17, 2016

@csrwng
Contributor

csrwng commented Nov 17, 2016

@liggitt thanks, would changing it to a Job do the trick?

@liggitt
Contributor

liggitt commented Nov 17, 2016

@soltysh could confirm, but I would expect so

@jmazzitelli
Author

jmazzitelli commented Nov 17, 2016

I'm quickly going to see if this works. Not sure if you'd want a PR with this: it changes the code so that it just keeps retrying when the error it gets is this "retry after the token is ready" error message. I suspect this will fix it, if this truly is a case where retrying helps. I will see in a minute.

-       deployerPod := metricsDeployerPod(hostName, imagePrefix, imageVersion)
-       if _, err = kubeClient.Pods(infraNamespace).Create(deployerPod); err != nil {
-               return errors.NewError("cannot create metrics deployer pod").WithCause(err).WithDetails(h.OriginLog())
+       for keepTrying := true; keepTrying == true; {
+               deployerPod := metricsDeployerPod(hostName, imagePrefix, imageVersion)
+               if _, err = kubeClient.Pods(infraNamespace).Create(deployerPod); err != nil {
+                       if !strings.Contains(err.Error(), "retry after the token") {
+                               return errors.NewError("cannot create metrics deployer pod").WithCause(err).WithDetails(h.OriginLog())
+                       }
+               } else {
+                       keepTrying = false
+               }
        }

@jmazzitelli
Author

That fix works. My "cluster up" command finished successfully and I do see this in the output:

-- Installing metrics ... OK

@csrwng
Contributor

csrwng commented Nov 17, 2016

@jmazzitelli thanks for confirming that's the problem. I'd rather simply instantiate a job so that the job controller can do the retry for us.

@jmazzitelli
Author

@csrwng sounds good to me. thanks for looking into this.

@coreydaley
Member

Submitted pull request #12174 for review

@coreydaley
Member

Closed via #12174

jmazzitelli added a commit to jmazzitelli/hawkular-openshift-agent that referenced this issue Dec 21, 2016
NOTE: this isn't working for me. It might be a OS bug.
See: openshift/origin#11946
6 participants