This repository has been archived by the owner on Mar 5, 2024. It is now read-only.

unable to assign pod the iam role #46

Closed
tasdikrahman opened this issue Mar 15, 2018 · 16 comments


@tasdikrahman
Contributor

I checked the kiam-agent logs on the node where the pod (which was to be assigned the IAM role) was scheduled; they look like this:

{"addr":"10.2.40.2:34938","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":200,"time":"2018-03-15T07:02:32Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.40.2","time":"2018-03-15T07:02:32Z"}
{"addr":"10.2.40.2:34940","level":"error","method":"GET","msg":"error processing request: error checking assume role permissions: rpc error: code = Unknown desc = pod not found","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:32Z"}
{"addr":"10.2.40.2:34940","headers":{"Content-Type":["text/plain; charset=utf-8"],"X-Content-Type-Options":["nosniff"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:32Z"}
{"addr":"10.2.40.2:34942","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":200,"time":"2018-03-15T07:02:33Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.40.2","time":"2018-03-15T07:02:33Z"}
{"addr":"10.2.40.2:34944","level":"error","method":"GET","msg":"error processing request: error checking assume role permissions: rpc error: code = Unknown desc = pod not found","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:33Z"}
{"addr":"10.2.40.2:34944","headers":{"Content-Type":["text/plain; charset=utf-8"],"X-Content-Type-Options":["nosniff"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:33Z"}
{"addr":"10.2.40.2:34946","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":200,"time":"2018-03-15T07:02:33Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.40.2","time":"2018-03-15T07:02:33Z"}
{"addr":"10.2.40.2:34948","level":"error","method":"GET","msg":"error processing request: error checking assume role permissions: rpc error: code = Unknown desc = pod not found","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:33Z"}
{"addr":"10.2.40.2:34948","headers":{"Content-Type":["text/plain; charset=utf-8"],"X-Content-Type-Options":["nosniff"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:33Z"}

We are running uswitch/kiam:v2.4 for both the agent and the server.

The namespace where the pod is scheduled has the annotation

  annotations:
    iam.amazonaws.com/permitted: .*

as stated in the docs

along with

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

in the trust relationship on the IAM role attached to the node where the pods are being scheduled.
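
The pods themselves are annotated with the role they should assume. For reference, a minimal sketch of the setup (the namespace, pod and image names below are illustrative; the role name is the one from the agent logs above):

apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  annotations:
    iam.amazonaws.com/permitted: ".*"
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: my-namespace
  annotations:
    iam.amazonaws.com/role: my-app-iam-role
spec:
  containers:
  - name: my-app
    image: my-app:latest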

Not sure if it's related, but we recently upgraded k8s from 1.8.4 to 1.8.9; I'd guess that shouldn't be the problem though.

@tasdikrahman
Contributor Author

Strange: killing the kiam agent/server pods and letting the new ones come up fixed the issue. Any ideas what could cause it?
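
For the record, "killing" here just means deleting the kiam pods and letting the DaemonSets recreate them, roughly like this (the namespace and label selectors are assumptions that depend on your manifests):

kubectl --namespace kube-system delete pods -l app=kiam-agent
kubectl --namespace kube-system delete pods -l app=kiam-server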

@pingles
Contributor

pingles commented Mar 15, 2018 via email

@tasdikrahman
Contributor Author

Ah, too bad, I didn't get the logs from the kiam server pods before deleting them. :/

@pingles
Contributor

pingles commented Mar 15, 2018

Yeah, sorry, without the server logs it's difficult to know what the problem is. I'm going to close this for now, but please reopen it if it happens again, with as much log data as you can capture.

Thanks!

@pingles pingles closed this as completed Mar 15, 2018
@tasdikrahman
Contributor Author

Hey, thanks @pingles, I'll post here again if I face the issue again. Thanks for your time.

@pingles
Contributor

pingles commented Mar 15, 2018 via email

@tasdikrahman
Contributor Author

Hey @pingles, whilst upgrading one of our clusters we faced the above issue again.

Logs from one of the kiam agents:

{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:40Z"}
{"addr":"10.2.3.24:54024","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":200,"time":"2018-03-21T14:23:40Z"}
{"addr":"10.2.3.24:54024","headers":{"Content-Type":["application/json"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/my-iam-role","status":200,"time":"2018-03-21T14:23:40Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:40Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:40Z"}
{"addr":"10.1.54.111:37730","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/ping","status":200,"time":"2018-03-21T14:23:40Z"}
{"addr":"10.2.3.39:35978","level":"error","method":"GET","msg":"error processing request: rpc error: code = Unknown desc = pod not found","path":"/latest/meta-data/iam/security-credentials/","status":500,"time":"2018-03-21T14:23:40Z"}
{"addr":"10.2.3.39:35978","headers":{"Content-Type":["text/plain; charset=utf-8"],"X-Content-Type-Options":["nosniff"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":500,"time":"2018-03-21T14:23:40Z"}
{"addr":"10.1.54.111:37770","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/ping","status":200,"time":"2018-03-21T14:23:43Z"}

Logs from one of the kiam servers:


ERROR: logging before flag.Parse: E0321 09:56:13.663622       1 reflector.go:315] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to watch *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=26477282&timeoutSeconds=501&watch=true: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:13.663639       1 reflector.go:315] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to watch *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=26255952&timeoutSeconds=503&watch=true: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:14.664819       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:14.665933       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:15.665916       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:15.666963       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:16.667094       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:16.667965       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
{"level":"info","msg":"found role","pod.iam.role":"my-iam-role","pod.ip":"10.2.2.193","time":"2018-03-21T09:56:17Z"}
{"level":"info","msg":"requesting credentials","pod.iam.role":"my-iam-role","time":"2018-03-21T09:56:17Z"}
{"level":"info","msg":"found role","pod.iam.role":"my-iam-role","pod.ip":"10.2.3.25","time":"2018-03-21T09:56:17Z"}
ERROR: logging before flag.Parse: E0321 09:56:17.668059       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:17.669509       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
...
...
{"credentials.access.key":"ASIAIKN7LBPV3EDBRAHA","credentials.expiration":"2018-03-21T10:06:38Z","credentials.role":"my-second-iam-role","generation.metadata":0,"level":"info","msg":"fetched credentials","pod.iam.role":"prod-harvester","pod.name":"pod1-76b6d9746-l8w2f","pod.namespace":"ns1","pod.status.ip":"10.2.11.145","pod.status.phase":"Running","resource.version":"26310469","time":"2018-03-21T09:56:18Z"}
{"credentials.access.key":"ASIAIPZZL4EO5OJJTDQA","credentials.expiration":"2018-03-21T10:07:37Z","credentials.role":"my-third-iam-role","generation.metadata":0,"level":"info","msg":"fetched credentials","pod.iam.role":"prod-euler","pod.name":"pod2-844d859d8-p4jnt","pod.namespace":"ns2","pod.status.ip":"10.2.10.13","pod.status.phase":"Running","resource.version":"22112994","time":"2018-03-21T09:56:18Z"}
ERROR: logging before flag.Parse: E0321 09:56:18.838270       1 runtime.go:66] Observed a panic: &runtime.TypeAssertionError{interfaceString:"interface {}", concreteString:"cache.DeletedFinalStateUnknown", assertedString:"*v1.Pod", missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Pod)
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:509
/usr/local/go/src/runtime/panic.go:491
/usr/local/go/src/runtime/iface.go:172
/go/src/github.com/uswitch/kiam/pkg/k8s/pod_cache.go:126
/go/src/github.com/uswitch/kiam/pkg/k8s/pod_cache.go:214
/go/src/github.com/uswitch/kiam/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:451
/go/src/github.com/uswitch/kiam/vendor/k8s.io/client-go/tools/cache/controller.go:150
/go/src/github.com/uswitch/kiam/vendor/k8s.io/client-go/tools/cache/controller.go:124
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/go/src/github.com/uswitch/kiam/vendor/k8s.io/client-go/tools/cache/controller.go:124
/usr/local/go/src/runtime/asm_amd64.s:2337

Please let me know if you need anything else from the logs. Thanks!

@pingles pingles reopened this Mar 22, 2018
@pingles
Contributor

pingles commented Mar 22, 2018

Interesting, definitely looks like something's wrong there. I've reopened.

@pingles
Contributor

pingles commented Mar 22, 2018

So kiam definitely assumes that the deltas delivered by the k8s client are only for *v1.Pod.

Had a quick search for the error and this seems the same: kubernetes/kubernetes@1c65d1d

Should be relatively easy to fix. I'll try and do it asap unless someone else beats me to it!
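
For anyone following along, the usual client-go pattern (the one that kubernetes commit applies, and roughly the shape the fix here needs; this is an illustrative sketch rather than kiam's actual code) is to unwrap the tombstone before asserting the pod type:

package k8s

import (
    "log"

    v1 "k8s.io/api/core/v1"
    "k8s.io/client-go/tools/cache"
)

// podFromDelete returns the *v1.Pod carried by an informer delete notification.
// When a watch is dropped, client-go can deliver a cache.DeletedFinalStateUnknown
// tombstone instead of the object itself, so unwrap it rather than letting the
// type assertion panic.
func podFromDelete(obj interface{}) (*v1.Pod, bool) {
    if pod, ok := obj.(*v1.Pod); ok {
        return pod, true
    }
    tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
    if !ok {
        log.Printf("unexpected object in delete handler: %T", obj)
        return nil, false
    }
    pod, ok := tombstone.Obj.(*v1.Pod)
    if !ok {
        log.Printf("tombstone contained unexpected object: %T", tombstone.Obj)
        return nil, false
    }
    return pod, true
}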

@pingles
Contributor

pingles commented Mar 22, 2018

@tasdikrahman
Contributor Author

Thanks for your time, appreciate it :)

pingles added a commit that referenced this issue Apr 23, 2018
This helps to simplify the implementation of the pod and namespace caches, as well as better handling errors from `cache.DeletedFinalStateUnknown` identified in #46 and more.
@pingles
Contributor

pingles commented Apr 23, 2018

I committed a fix for this earlier, but I've also just made another change to remove some of the pod cache internals.

This should also remove the runtime.TypeAssertionError for cache.DeletedFinalStateUnknown.

The latest PR that addresses this (#51) also changes the server boot process so that the pod and namespace caches must Sync before the gRPC listener starts. Hopefully both of those should mean the server process behaves better for you.
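
In outline, the boot sequence now looks like "start the informers, wait for sync, then listen"; a rough sketch with client-go (function and argument names here are illustrative, not the actual kiam code):

package kiam

import (
    "fmt"
    "log"
    "net"

    "google.golang.org/grpc"
    "k8s.io/client-go/tools/cache"
)

// serveWhenSynced starts the pod and namespace informers and only opens the
// gRPC listener once both caches report HasSynced, so the server never answers
// role lookups from an empty cache. Service registration is omitted.
func serveWhenSynced(addr string, pods, namespaces cache.SharedIndexInformer, stopCh <-chan struct{}) error {
    go pods.Run(stopCh)
    go namespaces.Run(stopCh)

    if !cache.WaitForCacheSync(stopCh, pods.HasSynced, namespaces.HasSynced) {
        return fmt.Errorf("timed out waiting for caches to sync")
    }

    lis, err := net.Listen("tcp", addr)
    if err != nil {
        return err
    }
    srv := grpc.NewServer()
    // register the kiam gRPC service(s) on srv here, then serve
    log.Printf("caches synced, gRPC listening on %s", addr)
    return srv.Serve(lis)
}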

@tasdikrahman I'm going to close this issue for now (relating to the type error). If after updating Kiam (sorry, you'll need to use latest or the SHA for now) you see the erroneous behaviour again (pod not found errors) please re-open.

@pingles pingles closed this as completed Apr 23, 2018
@tasdikrahman
Contributor Author

> I'm going to close this issue for now (relating to the type error). If after updating Kiam (sorry, you'll need to use latest or the SHA for now) you see the erroneous behaviour again (pod not found errors) please re-open.

No problem at all. I'll update this issue if I see the error again. Thanks a lot! Just for my sanity, I was curious whether a release is scheduled after 2.6 😀

@pingles
Contributor

pingles commented Apr 25, 2018 via email

@tasdikrahman
Contributor Author

> Yep - we'll probably do a release soon. I'd like to get better Prometheus metrics exported first (which should be quite quick) then do a release, so perhaps within a few days/week?

Sounds good, thanks for the help! :)

@pingles
Contributor

pingles commented Apr 30, 2018 via email
