Skip to content
This repository has been archived by the owner on Mar 5, 2024. It is now read-only.

unable to reach /latest/meta-data/iam/security-credentials/; Status 500 #201

Closed
mbordash2 opened this issue Dec 16, 2018 · 4 comments
Closed

Comments

@mbordash2
Copy link

Hello, I am running into an issue where it appears the kiam_agent is not able to get credentials but I am thinking it is ultimately not able to talk to the kiam_server. I haven't gotten any errors on the server side so I am not sure if it is even getting that far.

This is running on EKS.

I have checked the kubedns and dnsmasq logs and they are clean. I am able to curl the meta-data service from the pod without issue, it is just this endpoint that is not working. Any input would be awesome as I am kind of spinning my wheels on this one for a while. I have debug logging enabled but I am wondering if there is a way to get more details on what is causing the 500 error.

Kiam agent logs:

{"level":"warning","msg":"error finding role for pod: rpc error: code = Unavailable desc = there is no connection available","pod.ip":"10.128.157.131","time":"2018-12-16T21:20:52Z"}
{"addr":"10.128.157.131:59770","level":"error","method":"GET","msg":"error processing request: rpc error: code = Unavailable desc = there is no connection available","path":"/latest/meta-data/iam/security-credentials/","status":500,"time":"2018-12-16T21:20:52Z"}
{"addr":"10.128.157.131:59770","headers":{"Content-Type":["text/plain; charset=utf-8"],"X-Content-Type-Options":["nosniff"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":500,"time":"2018-12-16T21:20:52Z"}
{"addr":"10.128.157.198:60666","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/ping","status":200,"time":"2018-12-16T21:20:53Z"}

Kiam agent intercepting other meta-data commands successfully:

{"addr":"10.128.157.131:40366","headers":{"Accept-Ranges":["none"],"Content-Length":["11"],"Content-Type":["text/plain"],"Date":["Sun, 16 Dec 2018 22:08:05 GMT"],"Last-Modified":["Tue, 11 Dec 2018 17:28:14 GMT"],"Server":["EC2ws"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/network","status":200,"time":"2018-12-16T22:08:05Z"}

Kiam server logs:

time="2018-12-16T21:43:46Z" level=debug msg="updated pod" generation.metadata=0 pod.iam.role=k8s-kiam-pod-role pod.name=ubuntu-756bdb9b99-j9rzl pod.namespace=default pod.status.ip=10.128.157.131 pod.status.phase=Running resource.version=3366192 
@pingles
Copy link
Contributor

pingles commented Dec 17, 2018

Looks like you have an issue with the Agent being unable to contact the Server. I'd check all the usual issues:

  1. That your server processes are running and healthy.
  2. Your configuring the agent to connect with the right endpoint (and that it resolves via DNS)
  3. The TLS configuration is correct and has the right hosts configured etc.

The Agent uses gRPC client load-balancing to forward requests to all running servers, if no servers are available then it can't make requests. To help diagnose things you can add some environment variables to cause the gRPC lib to log more output. For more background please see #94 (comment).

Let me know what you find- I'm 99.9% certain it'll be a configuration/cluster error rather than something in Kiam.

@mbordash2
Copy link
Author

Thanks Paul, I was able to get into the containers and run the health commands with extra logging but I don't seem to be getting any better ideas since there isn't much detail in the error to go on..

Since the server seems to work, I am assuming that means the TLS certs are ok. From the agent output it appears that DNS is working as well.

btw, I am using v2.7.

Are the liveness and readiness probes needed? I have them commented out as I read there was some issues with them.

Agent output:

# GRPC_GO_LOG_SEVERITY_LEVEL=info GRPC_GO_LOG_VERBOSITY_LEVEL=20 /health --cert=/etc/kiam/tls/agent.pem --key=/etc/kiam/tls/agent-key.pem --ca=/etc/kiam/tls/ca.pem --server-address=kiam-server:443
INFO: 2018/12/17 17:27:47 ccBalancerWrapper: updating state and picker called by balancer: IDLE, 0xc4200529c0
INFO: 2018/12/17 17:27:47 dialing to target with scheme: ""
INFO: 2018/12/17 17:27:47 could not get resolver for scheme: ""
INFO: 2018/12/17 17:27:47 balancerWrapper: is pickfirst: false
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no address available 
INFO: 2018/12/17 17:27:47 balancerWrapper: got update addr from Notify: [{10.128.155.229:443 {1 62393235663564.kiam-server.kube-system.svc.cluster.local.}}]
INFO: 2018/12/17 17:27:47 ccBalancerWrapper: new subconn: [{10.128.155.229:443 0  {1 62393235663564.kiam-server.kube-system.svc.cluster.local.}}]
INFO: 2018/12/17 17:27:47 balancerWrapper: handle subconn state change: 0xc4202d6200, CONNECTING
INFO: 2018/12/17 17:27:47 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc4200529c0
INFO: 2018/12/17 17:27:47 balancerWrapper: handle subconn state change: 0xc4202d6200, TRANSIENT_FAILURE
INFO: 2018/12/17 17:27:47 ccBalancerWrapper: updating state and picker called by balancer: TRANSIENT_FAILURE, 0xc4200529c0
INFO: 2018/12/17 17:27:47 balancerWrapper: handle subconn state change: 0xc4202d6200, CONNECTING
INFO: 2018/12/17 17:27:47 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc4200529c0
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
FATA[0001] error retrieving health: rpc error: code = Unavailable desc = there is no connection available 

Server output:

# GRPC_GO_LOG_SEVERITY_LEVEL=info GRPC_GO_LOG_VERBOSITY_LEVEL=8 /health --cert=/etc/kiam/tls/server.pem --key=/etc/kiam/tls/server-key.pem --ca=/etc/kiam/tls/ca.pem --server-ad
dress=127.0.0.1:443
INFO: 2018/12/17 17:29:09 ccBalancerWrapper: updating state and picker called by balancer: IDLE, 0xc420389320
INFO: 2018/12/17 17:29:09 dialing to target with scheme: ""
INFO: 2018/12/17 17:29:09 could not get resolver for scheme: ""
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
INFO: 2018/12/17 17:29:09 balancerWrapper: is pickfirst: false
INFO: 2018/12/17 17:29:09 balancerWrapper: got update addr from Notify: [{127.0.0.1:443 <nil>}]
INFO: 2018/12/17 17:29:09 ccBalancerWrapper: new subconn: [{127.0.0.1:443 0  <nil>}]
INFO: 2018/12/17 17:29:09 balancerWrapper: handle subconn state change: 0xc420432080, CONNECTING
INFO: 2018/12/17 17:29:09 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc420389320
INFO: 2018/12/17 17:29:09 balancerWrapper: handle subconn state change: 0xc420432080, TRANSIENT_FAILURE
INFO: 2018/12/17 17:29:09 ccBalancerWrapper: updating state and picker called by balancer: TRANSIENT_FAILURE, 0xc420389320
INFO: 2018/12/17 17:29:09 balancerWrapper: handle subconn state change: 0xc420432080, CONNECTING
INFO: 2018/12/17 17:29:09 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc420389320
INFO: 2018/12/17 17:29:09 balancerWrapper: handle subconn state change: 0xc420432080, READY
INFO: 2018/12/17 17:29:09 ccBalancerWrapper: updating state and picker called by balancer: READY, 0xc420389320
INFO[0000] healthy: ok                              

@pingles
Copy link
Contributor

pingles commented Dec 18, 2018

Ok, so TLS doesn't appear to be a problem- there does appear to be some kind of problem with the Agent connecting to the Server though- are there any other quirks in how your pods connect/networking is configured? What happens if you try and curl to the server?

@mbordash2
Copy link
Author

I was able to figure out the communication issue was due to a misconfiguration in the security groups which was prevent the agent from talking to the server. I then was able to rerun the health commands and now I see more meaningful error messages.

Thank you for taking the time to work through this issue, much appreciated!

/ # GRPC_GO_LOG_SEVERITY_LEVEL=info GRPC_GO_LOG_VERBOSITY_LEVEL=20 /health --cert=/etc/kiam/tls/agent.pem --key=/etc/kiam/tls/agent-key.pem --ca=/etc/kiam/tls/ca.pem --serve
r-address=kiam-server:443
INFO: 2018/12/19 05:15:52 ccBalancerWrapper: updating state and picker called by balancer: IDLE, 0xc420052a20
INFO: 2018/12/19 05:15:52 dialing to target with scheme: ""
INFO: 2018/12/19 05:15:52 could not get resolver for scheme: ""
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no address available 
INFO: 2018/12/19 05:15:52 balancerWrapper: is pickfirst: false
INFO: 2018/12/19 05:15:52 balancerWrapper: got update addr from Notify: [{10.128.155.229:443 {1 62393235663564.kiam-server.kube-system.svc.cluster.local.}}]
INFO: 2018/12/19 05:15:52 ccBalancerWrapper: new subconn: [{10.128.155.229:443 0  {1 62393235663564.kiam-server.kube-system.svc.cluster.local.}}]
INFO: 2018/12/19 05:15:52 balancerWrapper: handle subconn state change: 0xc420288200, CONNECTING
INFO: 2018/12/19 05:15:52 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc420052a20
INFO: 2018/12/19 05:15:52 balancerWrapper: handle subconn state change: 0xc420288200, TRANSIENT_FAILURE
INFO: 2018/12/19 05:15:52 ccBalancerWrapper: updating state and picker called by balancer: TRANSIENT_FAILURE, 0xc420052a20
INFO: 2018/12/19 05:15:52 balancerWrapper: handle subconn state change: 0xc420288200, CONNECTING
INFO: 2018/12/19 05:15:52 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc420052a20
WARNING: 2018/12/19 05:15:52 Failed to dial 10.128.155.229:443: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for kiam-server, 127.0.0.1:443, 127.0.0.1:9610, not kiam-server:443"; please retry.
INFO: 2018/12/19 05:15:52 balancerWrapper: handle subconn state change: 0xc420288200, SHUTDOWN
INFO: 2018/12/19 05:15:52 ccBalancerWrapper: updating state and picker called by balancer: TRANSIENT_FAILURE, 0xc420052a20
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
FATA[0001] error retrieving health: rpc error: code = Unavailable desc = there is no connection available 

/ # GRPC_GO_LOG_SEVERITY_LEVEL=info GRPC_GO_LOG_VERBOSITY_LEVEL=20 /health --cert=/etc/kiam/tls/agent.pem --key=/etc/kiam/tls/agent-key.pem --ca=/etc/kiam/tls/ca.pem --serve
r-address=kiam-server
INFO: 2018/12/19 05:16:25 ccBalancerWrapper: updating state and picker called by balancer: IDLE, 0xc4200529c0
INFO: 2018/12/19 05:16:25 dialing to target with scheme: ""
INFO: 2018/12/19 05:16:25 could not get resolver for scheme: ""
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no address available 
INFO: 2018/12/19 05:16:25 balancerWrapper: is pickfirst: false
INFO: 2018/12/19 05:16:25 balancerWrapper: got update addr from Notify: [{10.128.155.229:443 {1 62393235663564.kiam-server.kube-system.svc.cluster.local.}}]
INFO: 2018/12/19 05:16:25 ccBalancerWrapper: new subconn: [{10.128.155.229:443 0  {1 62393235663564.kiam-server.kube-system.svc.cluster.local.}}]
INFO: 2018/12/19 05:16:25 balancerWrapper: handle subconn state change: 0xc4203420e0, CONNECTING
INFO: 2018/12/19 05:16:25 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc4200529c0
INFO: 2018/12/19 05:16:25 balancerWrapper: handle subconn state change: 0xc4203420e0, TRANSIENT_FAILURE
INFO: 2018/12/19 05:16:25 ccBalancerWrapper: updating state and picker called by balancer: TRANSIENT_FAILURE, 0xc4200529c0
INFO: 2018/12/19 05:16:25 balancerWrapper: handle subconn state change: 0xc4203420e0, CONNECTING
INFO: 2018/12/19 05:16:25 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc4200529c0
INFO: 2018/12/19 05:16:25 balancerWrapper: handle subconn state change: 0xc4203420e0, READY
INFO: 2018/12/19 05:16:25 ccBalancerWrapper: updating state and picker called by balancer: READY, 0xc4200529c0
INFO[0000] healthy: ok 

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants