This repository has been archived by the owner on Mar 5, 2024. It is now read-only.

unable to assign pod the iam role #46

Closed
tasdikrahman opened this issue Mar 15, 2018 · 16 comments


@tasdikrahman
Contributor

I checked the kiam-agent logs on the node where the pod (which was to be assigned the IAM role) was scheduled; they look like this:

{"addr":"10.2.40.2:34938","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":200,"time":"2018-03-15T07:02:32Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.40.2","time":"2018-03-15T07:02:32Z"}
{"addr":"10.2.40.2:34940","level":"error","method":"GET","msg":"error processing request: error checking assume role permissions: rpc error: code = Unknown desc = pod not found","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:32Z"}
{"addr":"10.2.40.2:34940","headers":{"Content-Type":["text/plain; charset=utf-8"],"X-Content-Type-Options":["nosniff"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:32Z"}
{"addr":"10.2.40.2:34942","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":200,"time":"2018-03-15T07:02:33Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.40.2","time":"2018-03-15T07:02:33Z"}
{"addr":"10.2.40.2:34944","level":"error","method":"GET","msg":"error processing request: error checking assume role permissions: rpc error: code = Unknown desc = pod not found","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:33Z"}
{"addr":"10.2.40.2:34944","headers":{"Content-Type":["text/plain; charset=utf-8"],"X-Content-Type-Options":["nosniff"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:33Z"}
{"addr":"10.2.40.2:34946","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":200,"time":"2018-03-15T07:02:33Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.40.2","time":"2018-03-15T07:02:33Z"}
{"addr":"10.2.40.2:34948","level":"error","method":"GET","msg":"error processing request: error checking assume role permissions: rpc error: code = Unknown desc = pod not found","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:33Z"}
{"addr":"10.2.40.2:34948","headers":{"Content-Type":["text/plain; charset=utf-8"],"X-Content-Type-Options":["nosniff"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/my-app-iam-role","status":500,"time":"2018-03-15T07:02:33Z"}

We are running uswitch/kiam:v2.4 for both the agent and the server.

The namespace where the pod is scheduled has the annotation

  annotations:
    iam.amazonaws.com/permitted: .*

as stated in the docs

along with

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

in the trust relationship on the IAM role attached to the node where the pods are being scheduled.
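
The pods themselves are annotated with the role they should assume. For reference, a minimal sketch of the setup (the namespace, pod and image names below are illustrative; the role name is the one from the agent logs above):

apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  annotations:
    iam.amazonaws.com/permitted: ".*"
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: my-namespace
  annotations:
    iam.amazonaws.com/role: my-app-iam-role
spec:
  containers:
  - name: my-app
    image: my-app:latest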

Not sure if it's related, but we recently upgraded k8s from 1.8.4 to 1.8.9; I'd guess that shouldn't be the problem though.

@tasdikrahman
Contributor Author

Strange: killing the kiam agent/server pods and letting the new ones come up fixed the issue. Any ideas what could cause it?
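
For the record, "killing" here just means deleting the kiam pods and letting the DaemonSets recreate them, roughly like this (the namespace and label selectors are assumptions that depend on your manifests):

kubectl --namespace kube-system delete pods -l app=kiam-agent
kubectl --namespace kube-system delete pods -l app=kiam-server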

@pingles
Contributor

pingles commented Mar 15, 2018 via email

@tasdikrahman
Contributor Author

Ah, too bad, I didn't get the logs from the kiam server pods before deleting them. :/

@pingles
Contributor

pingles commented Mar 15, 2018

Yeah, sorry, without the server logs it's difficult to know what the problem is. I'm going to close this for now, but please reopen it if it happens again, with as much log data as you can capture.

Thanks!

@pingles pingles closed this as completed Mar 15, 2018
@tasdikrahman
Contributor Author

Hey, thanks @pingles, I'll post here again if I face the issue again. Thanks for your time.

@pingles
Contributor

pingles commented Mar 15, 2018 via email

@tasdikrahman
Contributor Author

Hey @pingles, whilst upgrading one of our clusters we faced the above issue again.

Logs from one of the kiam agents:

{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:39Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:40Z"}
{"addr":"10.2.3.24:54024","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":200,"time":"2018-03-21T14:23:40Z"}
{"addr":"10.2.3.24:54024","headers":{"Content-Type":["application/json"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/my-iam-role","status":200,"time":"2018-03-21T14:23:40Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:40Z"}
{"level":"warning","msg":"error finding role for pod: rpc error: code = Unknown desc = pod not found","pod.ip":"10.2.3.39","time":"2018-03-21T14:23:40Z"}
{"addr":"10.1.54.111:37730","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/ping","status":200,"time":"2018-03-21T14:23:40Z"}
{"addr":"10.2.3.39:35978","level":"error","method":"GET","msg":"error processing request: rpc error: code = Unknown desc = pod not found","path":"/latest/meta-data/iam/security-credentials/","status":500,"time":"2018-03-21T14:23:40Z"}
{"addr":"10.2.3.39:35978","headers":{"Content-Type":["text/plain; charset=utf-8"],"X-Content-Type-Options":["nosniff"]},"level":"info","method":"GET","msg":"processed request","path":"/latest/meta-data/iam/security-credentials/","status":500,"time":"2018-03-21T14:23:40Z"}
{"addr":"10.1.54.111:37770","headers":{},"level":"info","method":"GET","msg":"processed request","path":"/ping","status":200,"time":"2018-03-21T14:23:43Z"}

Logs from one of the kiam servers:


ERROR: logging before flag.Parse: E0321 09:56:13.663622       1 reflector.go:315] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to watch *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=26477282&timeoutSeconds=501&watch=true: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:13.663639       1 reflector.go:315] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to watch *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=26255952&timeoutSeconds=503&watch=true: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:14.664819       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:14.665933       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:15.665916       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:15.666963       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:16.667094       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:16.667965       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
{"level":"info","msg":"found role","pod.iam.role":"my-iam-role","pod.ip":"10.2.2.193","time":"2018-03-21T09:56:17Z"}
{"level":"info","msg":"requesting credentials","pod.iam.role":"my-iam-role","time":"2018-03-21T09:56:17Z"}
{"level":"info","msg":"found role","pod.iam.role":"my-iam-role","pod.ip":"10.2.3.25","time":"2018-03-21T09:56:17Z"}
ERROR: logging before flag.Parse: E0321 09:56:17.668059       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/pod_cache.go:227: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0321 09:56:17.669509       1 reflector.go:205] github.com/uswitch/kiam/pkg/k8s/namespace_cache.go:83: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: getsockopt: connection refused
...
...
{"credentials.access.key":"ASIAIKN7LBPV3EDBRAHA","credentials.expiration":"2018-03-21T10:06:38Z","credentials.role":"my-second-iam-role","generation.metadata":0,"level":"info","msg":"fetched credentials","pod.iam.role":"prod-harvester","pod.name":"pod1-76b6d9746-l8w2f","pod.namespace":"ns1","pod.status.ip":"10.2.11.145","pod.status.phase":"Running","resource.version":"26310469","time":"2018-03-21T09:56:18Z"}
{"credentials.access.key":"ASIAIPZZL4EO5OJJTDQA","credentials.expiration":"2018-03-21T10:07:37Z","credentials.role":"my-third-iam-role","generation.metadata":0,"level":"info","msg":"fetched credentials","pod.iam.role":"prod-euler","pod.name":"pod2-844d859d8-p4jnt","pod.namespace":"ns2","pod.status.ip":"10.2.10.13","pod.status.phase":"Running","resource.version":"22112994","time":"2018-03-21T09:56:18Z"}
ERROR: logging before flag.Parse: E0321 09:56:18.838270       1 runtime.go:66] Observed a panic: &runtime.TypeAssertionError{interfaceString:"interface {}", concreteString:"cache.DeletedFinalStateUnknown", assertedString:"*v1.Pod", missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Pod)
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:509
/usr/local/go/src/runtime/panic.go:491
/usr/local/go/src/runtime/iface.go:172
/go/src/github.com/uswitch/kiam/pkg/k8s/pod_cache.go:126
/go/src/github.com/uswitch/kiam/pkg/k8s/pod_cache.go:214
/go/src/github.com/uswitch/kiam/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:451
/go/src/github.com/uswitch/kiam/vendor/k8s.io/client-go/tools/cache/controller.go:150
/go/src/github.com/uswitch/kiam/vendor/k8s.io/client-go/tools/cache/controller.go:124
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/uswitch/kiam/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/go/src/github.com/uswitch/kiam/vendor/k8s.io/client-go/tools/cache/controller.go:124
/usr/local/go/src/runtime/asm_amd64.s:2337

Please let me know if you need anything else from the logs. Thanks!

@pingles pingles reopened this Mar 22, 2018
@pingles
Contributor

pingles commented Mar 22, 2018

Interesting, definitely looks like something's wrong there. I've reopened.

@pingles
Contributor

pingles commented Mar 22, 2018

So kiam definitely assumes that the deltas delivered by the k8s client are only for *v1.Pod.

Had a quick search for the error and this seems the same: kubernetes/kubernetes@1c65d1d

Should be relatively easy to fix. I'll try and do it asap unless someone else beats me to it!
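
For anyone following along, the usual client-go pattern (the one that kubernetes commit applies, and roughly the shape the fix here needs; this is an illustrative sketch rather than kiam's actual code) is to unwrap the tombstone before asserting the pod type:

package k8s

import (
    "log"

    v1 "k8s.io/api/core/v1"
    "k8s.io/client-go/tools/cache"
)

// podFromDelete returns the *v1.Pod carried by an informer delete notification.
// When a watch is dropped, client-go can deliver a cache.DeletedFinalStateUnknown
// tombstone instead of the object itself, so unwrap it rather than letting the
// type assertion panic.
func podFromDelete(obj interface{}) (*v1.Pod, bool) {
    if pod, ok := obj.(*v1.Pod); ok {
        return pod, true
    }
    tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
    if !ok {
        log.Printf("unexpected object in delete handler: %T", obj)
        return nil, false
    }
    pod, ok := tombstone.Obj.(*v1.Pod)
    if !ok {
        log.Printf("tombstone contained unexpected object: %T", tombstone.Obj)
        return nil, false
    }
    return pod, true
}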

@pingles
Contributor

pingles commented Mar 22, 2018

@tasdikrahman
Contributor Author

Thanks for your time, appreciate it :)

pingles added a commit that referenced this issue Apr 23, 2018
This helps to simplify the implementation of the pod and namespace caches, as well as better handling errors from `cache.DeletedFinalStateUnknown` identified in #46 and more.
@pingles
Contributor

pingles commented Apr 23, 2018

I committed a fix for this earlier, but I've also just made another change to remove some of the pod cache internals.

This should also remove the runtime.TypeAssertionError for cache.DeletedFinalStateUnknown.

The latest PR that addresses this (#51) also changes the server boot process so that the pod and namespace caches must Sync before the gRPC listener starts. Hopefully both of those should mean the server process behaves better for you.
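
In outline, the boot sequence now looks like "start the informers, wait for sync, then listen"; a rough sketch with client-go (function and argument names here are illustrative, not the actual kiam code):

package kiam

import (
    "fmt"
    "log"
    "net"

    "google.golang.org/grpc"
    "k8s.io/client-go/tools/cache"
)

// serveWhenSynced starts the pod and namespace informers and only opens the
// gRPC listener once both caches report HasSynced, so the server never answers
// role lookups from an empty cache. Service registration is omitted.
func serveWhenSynced(addr string, pods, namespaces cache.SharedIndexInformer, stopCh <-chan struct{}) error {
    go pods.Run(stopCh)
    go namespaces.Run(stopCh)

    if !cache.WaitForCacheSync(stopCh, pods.HasSynced, namespaces.HasSynced) {
        return fmt.Errorf("timed out waiting for caches to sync")
    }

    lis, err := net.Listen("tcp", addr)
    if err != nil {
        return err
    }
    srv := grpc.NewServer()
    // register the kiam gRPC service(s) on srv here, then serve
    log.Printf("caches synced, gRPC listening on %s", addr)
    return srv.Serve(lis)
}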

@tasdikrahman I'm going to close this issue for now (relating to the type error). If after updating Kiam (sorry, you'll need to use latest or the SHA for now) you see the erroneous behaviour again (pod not found errors) please re-open.

@pingles pingles closed this as completed Apr 23, 2018
@tasdikrahman
Contributor Author

> I'm going to close this issue for now (relating to the type error). If after updating Kiam (sorry, you'll need to use latest or the SHA for now) you see the erroneous behaviour again (pod not found errors) please re-open.

No problem at all. I'll update this issue if I see the error again. Thanks a lot! Just for my sanity, I was curious whether a release is scheduled after 2.6 😀

@pingles
Contributor

pingles commented Apr 25, 2018 via email

@tasdikrahman
Contributor Author

> Yep - we'll probably do a release soon. I'd like to get better Prometheus metrics exported first (which should be quite quick) then do a release, so perhaps within a few days/week?

Sounds good, thanks for the help! :)

@pingles
Contributor

pingles commented Apr 30, 2018 via email
