after a pod is scheduled, it should take < 5 seconds for apiserver to show it as running #3952

Closed
alex-mohr opened this issue Jan 29, 2015 · 39 comments
Labels
area/kubelet-api priority/important-soon sig/scalability
Milestone
v1.0

Comments

@alex-mohr (Contributor)

@dchen1107 There seems to be a 0 to 20 second delay before a single pause pod gets to Running. It should be << 10 seconds. For version 0.8.1, part of #3936.

Here's an example. In this case, the scheduler seemed to schedule it reasonably quickly (< 1 sec), the kubelet launched it quickly (< 1 sec), and docker transitioned the container to running quickly (< 1 sec), but the pod status becoming available via the master took another 15 seconds. In particular, the status update seemed to either be blocked on the following kubelet.log entry, or that entry is triggered by something else that was blocked:

I0129 22:08:20.019854 4903 server.go:324] GET /api/v1beta1/podInfo?podID=pause-pod-1fcbe922&podNamespace=default: (4.894854ms) 200

From my workstation:

% gcloud preview container pods --cluster=${PROJECT} create --name=pause-pod-1fcbe922 --image=kubernetes/pause --port=${PORT}

Output of list pods with timestamps in PST:
Thu Jan 29 14:08:03 PST 2015: 1: creating pause-pod-1fcbe922 on port 30069
pause-pod-1fcbe922
Thu Jan 29 14:08:05 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:06 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:08 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:10 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:11 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:13 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:14 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:16 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:18 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:19 PST 2015: 1: fetching pod list
Thu Jan 29 14:08:21 PST 2015: 1: 1 running
Thu Jan 29 14:08:21 PST 2015: 1: deleting all pods

Relevant kubelet log in UTC:
I0129 22:08:05.297181 4903 logs.go:39] etcd DEBUG: recv.success.http://10.240.222.31:4001/v2/keys/registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods?consistent=true&wait=true&waitIndex=12699
I0129 22:08:05.297448 4903 logs.go:39] etcd DEBUG: get /registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods [http://10.240.222.31:4001]
I0129 22:08:05.297535 4903 logs.go:39] etcd DEBUG: Connecting to etcd: attempt1forkeys/registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods?consistent=true&wait=true&waitIndex=12706
I0129 22:08:05.297548 4903 logs.go:39] etcd DEBUG: send.request.to http://10.240.222.31:4001/v2/keys/registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods?consistent=true&wait=true&waitIndex=12706 | method GET
W0129 22:08:05.308951 4903 config.go:307] Pod 1 () from etcd failed validation, ignoring: [name: required value '' namespace: required value 'default']
I0129 22:08:05.312089 4903 logs.go:39] etcd DEBUG: recv.response.fromhttp://10.240.222.31:4001/v2/keys/registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods?consistent=true&wait=true&waitIndex=12706
I0129 22:08:05.314162 4903 kubelet.go:793] Network container doesn't exist for pod "pause-pod-1fcbe922.default.etcd", killing and re-creating the pod
I0129 22:08:05.317955 4903 event.go:92] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"pause-pod-1fcbe922", UID:"4cadf377-a803-11e4-a2ec-42010af0de1f", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"implicitly required container net"}): status: 'waiting', reason: 'pulled' Successfully pulled image "kubernetes/pause:latest"
I0129 22:08:05.419784 4903 event.go:92] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"pause-pod-1fcbe922", UID:"4cadf377-a803-11e4-a2ec-42010af0de1f", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"implicitly required container net"}): status: 'waiting', reason: 'created' Created with docker id 93b26b849bfb454cc86cd6467a32906a79eb72a5705c3ae473ee1c7edc64b7ee
I0129 22:08:05.554694 4903 event.go:92] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"pause-pod-1fcbe922", UID:"4cadf377-a803-11e4-a2ec-42010af0de1f", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"implicitly required container net"}): status: 'running', reason: 'started' Started with docker id 93b26b849bfb454cc86cd6467a32906a79eb72a5705c3ae473ee1c7edc64b7ee
I0129 22:08:05.675279 4903 event.go:92] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"pause-pod-1fcbe922", UID:"4cadf377-a803-11e4-a2ec-42010af0de1f", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"spec.containers{pause-pod-1fcbe922}"}): status: 'waiting', reason: 'created' Created with docker id 80def133b602fa2e4d2cf98a9e7903f9c83887fa281277d38f2850dbfcbaeed8
I0129 22:08:05.777623 4903 event.go:92] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"pause-pod-1fcbe922", UID:"4cadf377-a803-11e4-a2ec-42010af0de1f", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"spec.containers{pause-pod-1fcbe922}"}): status: 'running', reason: 'started' Started with docker id 80def133b602fa2e4d2cf98a9e7903f9c83887fa281277d38f2850dbfcbaeed8
I0129 22:08:07.996475 4903 server.go:324] GET /healthz: (10.235us) 200
I0129 22:08:10.519742 4903 server.go:324] GET /healthz: (11.461us) 200
I0129 22:08:10.960311 4903 file.go:171] Got pod from file "/etc/kubernetes/manifests/cadvisor.manifest": default.cadvisor-agent (cadvisormanifes12uqn2ohido76855gdecd9roadm7l0)
I0129 22:08:18.003785 4903 server.go:324] GET /healthz: (11.971us) 200
I0129 22:08:20.011837 4903 server.go:324] GET /healthz: (10.642us) 200
I0129 22:08:20.012008 4903 server.go:324] GET /healthz: (3.816us) 200
I0129 22:08:20.015171 4903 logs.go:39] http: multiple response.WriteHeader calls
I0129 22:08:20.015208 4903 server.go:324] GET /api/v1beta1/podInfo?podID=&podNamespace=: (80.895us) 400
goroutine 42938 [running]:
github.com/GoogleCloudPlatform/kubernetes/pkg/httplog.(*respLogger).WriteHeader(0xc209810600, 0x190)
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/httplog/log.go:182 +0x9c
net/http.Error(0x7f3876a0f830, 0xc209810600, 0xaf2cf0, 0x1d, 0x190)
/usr/src/go/src/pkg/net/http/server.go:1244 +0x96
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.(*Server).handlePodInfo(0xc20812a320, 0x7f3876a0f830, 0xc209810600, 0xc20a112dd0, 0x95d6901)
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet/server.go:212 +0x2a4
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.(*Server).handlePodInfoVersioned(0xc20812a320, 0x7f3876a0f830, 0xc209810600, 0xc20a112dd0)
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet/server.go:197 +0x4a
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.*Server.(github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.handlePodInfoVersioned)·fm(0x7f3876a0f830, 0xc209810600, 0xc20a112dd0)
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet/server.go:92 +0x44
net/http.HandlerFunc.ServeHTTP(0xc2081264e0, 0x7f3876a0f830, 0xc209810600, 0xc20a112dd0)
/usr/src/go/src/pkg/net/http/server.go:1235 +0x40
net/http.(*ServeMux).ServeHTTP(0xc20811ab10, 0x7f3876a0f830, 0xc209810600, 0xc20a112dd0)
/usr/src/go/src/pkg/net/http/server.go:1511 +0x1a3
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.(*Server).ServeHTTP(0xc20812a300, 0x7f3876a0f7f8, 0xc208131220, 0xc20a112dd0)
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet/server.go:323 +0x11d
net/http.serverHandler.ServeHTTP(0xc208112ba0, 0x7f3876a0f7f8, 0xc208131220, 0xc20a112dd0)
/usr/src/go/src/pkg/net/http/server.go:1673 +0x19f
net/http.(*
I0129 22:08:20.019854 4903 server.go:324] GET /api/v1beta1/podInfo?podID=pause-pod-1fcbe922&podNamespace=default: (4.894854ms) 200
I0129 22:08:20.910257 4903 server.go:324] GET /healthz: (11.933us) 200
I0129 22:08:22.819342 4903 logs.go:39] etcd DEBUG: recv.success.http://10.240.222.31:4001/v2/keys/registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods?consistent=true&wait=true&waitIndex=12706
I0129 22:08:22.819569 4903 logs.go:39] etcd DEBUG: get /registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods [http://10.240.222.31:4001]
I0129 22:08:22.819603 4903 logs.go:39] etcd DEBUG: Connecting to etcd: attempt1forkeys/registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods?consistent=true&wait=true&waitIndex=12714
I0129 22:08:22.819613 4903 logs.go:39] etcd DEBUG: send.request.to http://10.240.222.31:4001/v2/keys/registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods?consistent=true&wait=true&waitIndex=12714 | method GET
W0129 22:08:22.831964 4903 config.go:307] Pod 1 () from etcd failed validation, ignoring: [name: required value '' namespace: required value 'default']
I0129 22:08:22.834110 4903 logs.go:39] etcd DEBUG: recv.response.fromhttp://10.240.222.31:4001/v2/keys/registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods?consistent=true&wait=true&waitIndex=12714
I0129 22:08:22.836408 4903 kubelet.go:1044] Killing unwanted container {podFullName:pause-pod-1fcbe922.default.etcd uuid:4cadf377-a803-11e4-a2ec-42010af0de1f containerName:pause-pod-1fcbe922}

@erictune (Member)

Why is kubelet talking to etcd? A recently built kubelet should only talk to apiserver.

@brendandburns (Contributor)

Did those changes miss the 0.8.x release series? I don't see any etcd from a server @ head...

--brendan

@erictune (Member)

I guess so.

alex-mohr added a commit that referenced this issue Jan 30, 2015
Make pod status update every 5 seconds instead of every 30 to mask the issue #3952.  I suspect (but haven't confirmed) that #3927 doesn't completely address the issue since it will still happily cache a Pending status for up to 30 seconds even though the pod may have transitioned to Running already.
@bgrant0607 added the sig/scalability label and removed the area/usability label on Feb 5, 2015
@lavalamp (Member)

lavalamp commented Mar 3, 2015

We need to reevaluate this now that kubelet should be writing these values without the PodCache.

@davidopp (Member)

Can I close this and just assume we'll notice when working on #3954 if this is still a problem?

@lavalamp (Member)

TL;DR: when a pod starts running on kubelet, it should take < 5 seconds for apiserver to show it as running. Really this should say something about tail latency, but let's say 50th percentile for now. Then we can close this bug when the pod status is written by kubelet and not computed by pod cache, which I do not believe is the case yet. @dchen1107 do you know who is owning that?

@alex-mohr changed the title from "0-20 second delay before running pod transitions to status: Running" to "when a pod starts running on kubelet, it should take < 5 seconds for apiserver to show it as running" on Mar 19, 2015
@alex-mohr changed the title from "when a pod starts running on kubelet, it should take < 5 seconds for apiserver to show it as running" to "after a pod is scheduled, it should take < 5 seconds for apiserver to show it as running" on Mar 19, 2015
@alex-mohr (Contributor, Author)

@davidopp this tracks separate exit criteria: there's a separate delay after a pod is scheduled, so I clarified the title. If you want to re-word #3954 to track e2e latency for X pods to appear as running after being sent to the apiserver, and close this one, that'd also be fine.

@davidopp added this to the v1.0 milestone Mar 19, 2015
@davidopp (Member)

@lavalamp is that #5555?

@fgrzadkowski (Contributor)

I think that #5555 + #5714 should address this (kubelet updates pod status when it changes)

@wojtek-t (Member)

I'm not 100% sure what this issue refers to. Basically, once the pod is scheduled, it needs to be started on the Kubelet. This requires, e.g., pulling the image if it isn't already present on that node. This can easily take more than 5 seconds, so I'm afraid that the "5 seconds" from the title cannot be satisfied in all cases.

So I think we need to clarify what we would like to take less than 5 seconds (maybe stating that the 5 seconds assumes the image is already on the machine would be enough?)

[update]: What I wanted to say is that #5555 + #5714 only fix the problem of "reporting after the pod is already started". But IIUC, the bug is also about starting the pod itself.

@alex-mohr (Contributor, Author)

Yes, pulls can take arbitrarily long, so the intent with this was: the time from the API call to create a pod whose image is already pre-pulled until the API reports the pod as running should be < 5 seconds. Ideally we'd be in the sub-second range.

I think that #5555 + #5714 should address this (kubelet updates pod status when it changes)

That's plausible, but we should verify they do. We care about product performance for v1 -- implementations are a means to that end.
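
For illustration, a minimal probe of that SLO could look like the sketch below. This is not how the 2015 tests were written: it is a hedged sketch against today's client-go API, and the pod name, namespace, and kubeconfig path are assumptions.

```go
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes a kubeconfig at the default location; adjust as needed.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "pause-latency-probe", Namespace: "default"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{Name: "pause", Image: "kubernetes/pause"}},
		},
	}

	start := time.Now()
	if _, err := client.CoreV1().Pods("default").Create(context.TODO(), pod, metav1.CreateOptions{}); err != nil {
		panic(err)
	}

	// Watch the single pod and stop as soon as the apiserver reports it Running.
	// (Simplified: no resourceVersion handling, so a pod that races to Running
	// before the watch is established would be missed.)
	w, err := client.CoreV1().Pods("default").Watch(context.TODO(), metav1.ListOptions{
		FieldSelector: "metadata.name=pause-latency-probe",
	})
	if err != nil {
		panic(err)
	}
	defer w.Stop()
	for ev := range w.ResultChan() {
		if p, ok := ev.Object.(*corev1.Pod); ok && p.Status.Phase == corev1.PodRunning {
			fmt.Printf("create -> Running visible via apiserver: %v\n", time.Since(start))
			return
		}
	}
}
```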

@wojtek-t (Member)

Thanks for the clarification, Alex - I completely agree that if we exclude image pulling, this goal is definitely reasonable.
Also, what I wanted to say is that fast reporting is only part of the problem - the other part is startup time in the Kubelet, and we need to look deeper into it.

@lavalamp (Member)

@davidopp @alex-mohr Yes I think #5555 + #5714 should fix this, and someone should write an e2e test that verifies this.

@satnam6502 (Contributor)

For my soak tests I routinely see pods fail to schedule after waiting 20 minutes.

@wojtek-t (Member)

cc @fgrzadkowski
I also started some experiments yesterday and I don't think the PRs mentioned above fix the problem. Basically what I observed was: scheduling is quite fast, but starting pods is pretty slow - I will try to provide more detailed numbers soon.

@davidopp assigned fgrzadkowski and gmarek and unassigned fgrzadkowski on Jun 1, 2015
@davidopp (Member)

davidopp commented Jun 2, 2015

Reassigning to @gmarek. This is very similar to #3954.

@satnam6502 (Contributor)

@wojtek-t Minor point: note that if for any reason a cluster has no monitoring or logging set up, there may not be any containers running on a node.

If you want to run a beefier container that is easy to health-check via a REST API call, I can suggest Elasticsearch, which we have on gcr.io.

@davidopp added the priority/critical-urgent label and removed the priority/important-soon label on Jun 8, 2015
@davidopp (Member)

I'd like to close this issue. In this comment:
#3952 (comment)
@alex-mohr said "If you want to re-word #3954 to track e2e latency for X pods to appear as running after being sent to apiserver and close this one, that'd also be fine." But that's actually how we've been interpreting #3954, even though technically that's not what it said (it said time from submitting until running on the kubelet, but we've been measuring time from submitting until it shows as running on the master).

@davidopp (Member)

Sorry, never mind, somehow I overlooked that the measurements in #3954 use pod startup time reported by Kubelet, so we actually are measuring exactly what #3954 specified.

@gmarek maybe you could augment #9186 to also record the difference between the pod startup time reported by Kubelet and the time on the master when the test detects the pod as running? (I think that's what this issue was asking for.)

@davidopp added the priority/important-soon label and removed the priority/critical-urgent label on Jun 10, 2015
@gmarek (Contributor)

gmarek commented Jun 10, 2015

I can, but I don't really know how to do this easily, short of hammering the master. I may add a watch for Pods in the test, but this seems like overkill to me, as the only "latency" between the time when the Kubelet reports that it wrote data to etcd and the moment when the master "notices" the change (i.e. has it written in etcd) is the clock skew, as AFAIK etcd writes are synchronous.

@davidopp (Member)

@alex-mohr thinks it is important to measure this so that we can get an end-to-end startup latency number. (And I agree.) You could do a new measurement that just measures the difference between the time observed on the master and the StartTime recorded in PodStatus, or a new measurement that's the full end-to-end startup time (time in #3952 + #3954). I agree there is very little in the path that could make this number large, but I think it makes sense to measure it to avoid surprises. Doing something based on watch sounds reasonable.

@timothysc (Member)

I'm almost certain @rrati and @fgrzadkowski have done similar things in other tests.

@gmarek (Contributor)

gmarek commented Jun 10, 2015

If you think it's important I'll add this.

@lavalamp (Member)

A measurement of this could be added to my #9592, which is already measuring something similar.

@gmarek (Contributor)

gmarek commented Jun 11, 2015

I added watch-based monitoring to #9186, similar to what @lavalamp uses. The problem with this approach is that effectively we measure watch lag, which adds to e2e startup latency only for systems depending on watch. Ordinary gets/lists will return fresh data from etcd, which will include recently added entries.

@rrati

rrati commented Jun 11, 2015

@davidopp I read this requirement differently. To me, a pod is scheduled when it has a node assigned to it and in that case I don't think there's a need to get any information from the kubelet. You would check the apiserver for when a pod is assigned to a node (scheduled) and when the apiserver reports the pod running.

Am I misunderstanding the requirement?

@davidopp (Member)

I honestly don't remember how we originally intended this issue and #3954 to be interpreted. But I think the best interpretation is that they describe non-overlapping parts of the startup process that sum to the end-to-end startup latency. That is, given the following steps:

(1) pod created
(2) scheduler assigns host to pod object
(3) pod starts running on kubelet; PodStatus.StartTime is set to this time
(4) the fact that the pod has started running is reflected in the API server (thus visible to user)

I think a reasonable interpretation is that #3954 talks about time from (1) to (3) and this issue (#3952) talks about time from (3) to (4). This is what we're doing. I agree there are other things we could measure, but really the only thing that matters in the end is the end-to-end user-perceived startup time.
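
For illustration, one hedged sketch of computing such a breakdown from timestamps on the pod object alone follows; the use of the PodScheduled condition and of observedAt (the client's local receipt time of the Running update) are assumptions about how the measurement is wired up, not something specified in this thread, and the types are today's Go API rather than the 2015 structs.

```go
package latency

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// StartupBreakdown splits the end-to-end startup latency along the steps above.
type StartupBreakdown struct {
	CreateToScheduled  time.Duration // step (1) -> (2)
	ScheduledToKubelet time.Duration // step (2) -> (3a), via PodStatus.StartTime
	KubeletToVisible   time.Duration // step (3) -> (4), the part this issue is about
}

// BreakDown derives the three phases from the pod object as observed through
// the API, plus the client's local time when it saw the pod become Running.
func BreakDown(pod *corev1.Pod, observedAt time.Time) StartupBreakdown {
	created := pod.CreationTimestamp.Time

	// Time the scheduler bound the pod, taken from the PodScheduled condition.
	var scheduled time.Time
	for _, c := range pod.Status.Conditions {
		if c.Type == corev1.PodScheduled && c.Status == corev1.ConditionTrue {
			scheduled = c.LastTransitionTime.Time
		}
	}

	// Time the kubelet acknowledged the pod (3a).
	var started time.Time
	if pod.Status.StartTime != nil {
		started = pod.Status.StartTime.Time
	}

	return StartupBreakdown{
		CreateToScheduled:  scheduled.Sub(created),
		ScheduledToKubelet: started.Sub(scheduled),
		KubeletToVisible:   observedAt.Sub(started),
	}
}
```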

@alex-mohr (Contributor, Author)

If it helps, my take: one reasonable end-user-visible SLO is the time from "kubectl create -f blah.yaml" until a "kubectl get pod blah" returns it in a running state. Other end-user SLOs might also be useful to measure: e.g. the time from "kubectl create" until some/all of the pod's processes are created, or until watch returns it, etc. Those are the things users observe, and if we directly measure them under various load conditions, then we have a good understanding of what users will experience.

There's another set of metrics that might be helpful to measure too, even if they're not directly user visible -- that is, if they indirectly contribute to those end-user metrics: duration of a scheduler pass, time elapsed from a desired pod being written to etcd until scheduler has updated etcd with a node assignment, etcd transaction times, various stages of docker operations, pause container or network pre-req creation, client-side delays, etc. Essentially, they're microbenchmarks.

So why did I mention that end-user-visibility as a metric is generally important to land eventually? Because proxies drift over time. E.g., etcd is heavily loaded, someone adds a podStatusCache with 5s eviction, and now the "create until time-written-to-etcd" microbenchmark as a proxy no longer reflects the user-visible behavior that it once did. Or if we go HA and reads use eventual consistency from non-leaders, ditto.

And fwiw, that's not just theorycrafting: Google-internal systems that care about latency have often discovered surprising sources of it when switching from server-side-only to client-and-server-side latency monitoring.

@gmarek (Contributor)

gmarek commented Jun 11, 2015

I partially agree. Yes, there can be some lag between when the information about a running pod is written to etcd and the time when "get pod" will return it, but a goal stated this way is impossible to test without heavily instrumenting the API server. If we want to treat the API server as a black box, it's impossible to know whether a "get" call would return a running pod without actually calling it, and we don't want to do that too often.

Thus we probably need some proxy for end-user visibility. I see two reasonable possibilities: (I) the time when the 'running' status is written to etcd (which is an upper bound on the time when the pod is actually running and 'serving', for some meaning of serving), or (II) the time when a watch reports this change (which is reasonable data on the lag seen by users' automated systems).

The problem I see with (I) is that we fill in the timestamp before actually writing to etcd, so if writes are slow, we may have additional 'user observable' lag (if we care about what the API server says, not the fact that the containers are actually running).

Given that I don't see an easy way of implementing 'end-user visibility' defined as 'get pods' returning a running status, I suggest we choose one of the above. I suggest using (I), as we had issues with the etcd watch implementation previously, so I don't think binding our SLO to theirs is a good idea.

FYI: currently I observe between 0.5 to 2.5 sec of watch lag (assuming no clock skew)

@lavalamp (Member)

Sophisticated users will use watch instead of get for measuring this, so I think it's valid for us to use watch when making this timing measurement.

@bprashanth (Contributor)

FYI: currently I observe between 0.5 to 2.5 sec of watch lag (assuming no clock skew)

Watch can lag up to 30s in a 100-node cluster with 3000 pods and an n1-standard-4 master :)

@davidopp (Member)

I realized my 4-item list earlier was wrong. Step (3) has two parts:
(3a) Kubelet receives pod; this is the time it will use in PodStatus.StartTime
(3b) pod starts running on Kubelet (after images are fetched and process(es) started)

The thing we're measuring in #3954 is (1) through (3a).

Anyway, I agree with @alex-mohr that we want an end-to-end time (with pods that are already on the minion so there's no fetch time). Using watch sounds like the best thing to do on the client side. My suggestion is that we re-title this issue to "measure end-to-end pod startup latency in 100 node cluster with 30 pods/node" and drop the specific performance goal. At this point we don't have time (before 1.0) to do more than just get the monitoring in place.

30s watch lag sounds pretty sub-optimal. We should work on that...

@gmarek (Contributor)

gmarek commented Jun 12, 2015

In my test I measure something slightly different from (3a) and (3b). Namely, there's a
(3b') All containers in the pod are in the Running state.

The thing measured in #9186 is (3b'), which is the maximum over containers of Pod.Container[x].State.Running.StartedAt.
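
For illustration, a minimal sketch of that (3b') computation, written against today's Go API types rather than the exact 2015 structs:

```go
package latency

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// allContainersRunningAt returns the latest State.Running.StartedAt across the
// pod's containers, i.e. the (3b') timestamp, and false if any container is
// not yet running (or no statuses have been reported).
func allContainersRunningAt(pod *corev1.Pod) (time.Time, bool) {
	if len(pod.Status.ContainerStatuses) == 0 {
		return time.Time{}, false
	}
	var latest time.Time
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.State.Running == nil {
			return time.Time{}, false // some container is not running yet
		}
		if t := cs.State.Running.StartedAt.Time; t.After(latest) {
			latest = t
		}
	}
	return latest, true
}
```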

We'll merge #9549 today, so I'll be able to run tests, and update this bug with the results.

@gmarek (Contributor)

gmarek commented Jun 12, 2015

I did the experiment and put the results in #3954. I measure everything separately, so we have 'time to running on kubelet' latency, 'time from trying to submit to seeing it in watch' latency, and the sum of those two.

@davidopp (Member)

This effectively became a dupe of #3954. Closing this issue.
