after a pod is scheduled, it should take < 5 seconds for apiserver to show it as running #3952
Comments
Why is kubelet talking to etcd? A recently built kubelet should only talk to the apiserver.
Did those changes miss the 0.8.x release series? I don't see any etcd… --brendan
I guess so.
We need to reevaluate this now that kubelet should be writing these values without the PodCache.
Can I close this and just assume we'll notice when working on #3954 if this is still a problem?
TL;DR: when a pod starts running on kubelet, it should take < 5 seconds for apiserver to show it as running. Really this should say something about tail latency, but let's say 50th percentile for now. Then we can close this bug when the pod status is written by kubelet and not computed by the pod cache, which I do not believe is the case yet. @dchen1107 do you know who owns that?
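(For concreteness, computing the 50th percentile over a set of measured startup latencies could look like the small sketch below; the package and function names are illustrative, not from this thread.)

```go
package latency

import (
	"math"
	"sort"
	"time"
)

// percentile returns the p-th percentile (0 < p <= 100) of the measured
// startup latencies, using the nearest-rank method.
func percentile(samples []time.Duration, p float64) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	// Nearest rank: the smallest sample such that at least p percent of
	// the samples are <= it.
	rank := int(math.Ceil(float64(len(sorted)) * p / 100))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}
```

With per-pod samples collected by a test, percentile(samples, 50) would give the 50th-percentile number referred to above.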
I'm not 100% sure what this issue refers to. Basically, once the pod is scheduled, it needs to be started on the Kubelet. This requires e.g. pulling the image if it isn't already present on that node, which can easily take more than 5 seconds, so I'm afraid that the "5 seconds" from the title cannot be satisfied in all cases. So I think we need to clarify what we would like to take less than 5 seconds (maybe saying that the 5 seconds assumes the image is already on the machine would be enough?). [update]: What I wanted to say is that #5555 + #5714 only fixes the problem of "reporting after the pod is already started". But IIUC, the bug is also about starting the pod itself.
Yes, pulls can take arbitrarily long, so the intent with this was "API call to create a pod whose image is already pre-pulled until API reports pod is running should be < 5 seconds." Ideally we'd be in the sub-second range.
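As a sketch of how that could be measured end to end, a test might time the Create call until the first watch event reporting the pod as Running. This uses the present-day client-go API purely for illustration and assumes an already-configured clientset and a pre-pulled kubernetes/pause image; the helper name is mine, not from this issue.

```go
package latency

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// timeCreateToRunning creates a pause pod and returns how long it took for
// the API server to report it as Running, as observed through a watch.
func timeCreateToRunning(ctx context.Context, cs kubernetes.Interface, ns, name string) (time.Duration, error) {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{Name: "pause", Image: "kubernetes/pause"}},
		},
	}

	// Start watching before creating so the Running transition cannot be missed.
	w, err := cs.CoreV1().Pods(ns).Watch(ctx, metav1.ListOptions{
		FieldSelector: "metadata.name=" + name,
	})
	if err != nil {
		return 0, err
	}
	defer w.Stop()

	start := time.Now()
	if _, err := cs.CoreV1().Pods(ns).Create(ctx, pod, metav1.CreateOptions{}); err != nil {
		return 0, err
	}

	for ev := range w.ResultChan() {
		p, ok := ev.Object.(*corev1.Pod)
		if !ok {
			continue
		}
		if p.Status.Phase == corev1.PodRunning {
			return time.Since(start), nil
		}
	}
	return 0, fmt.Errorf("watch closed before pod %s/%s was Running", ns, name)
}
```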
That's plausible, but we should verify they do. We care about product performance for v1 -- implementations are a means to that end.
Thanks for the clarification, Alex - I completely agree that if we exclude pulling the image, this goal is definitely reasonable.
@davidopp @alex-mohr Yes I think #5555 + #5714 should fix this, and someone should write an e2e test that verifies this.
For my soak tests I routinely see pods fail to schedule after waiting 20 minutes.
cc @fgrzadkowski
@wojtek-t Minor point: note that if for any reason a cluster has no monitoring or logging set up, there may not be any containers running on a node. If you want to run a beefier container that is easy to health-check by querying through a REST API call, I can suggest Elasticsearch, which we have on gcr.io.
I'd like to close this issue. In this comment:
Sorry, never mind, somehow I overlooked that the measurements in #3954 use pod startup time reported by Kubelet, so we actually are measuring exactly what #3954 specified. @gmarek maybe you could augment #9186 to also record the difference between the pod startup time reported by Kubelet and the time on the master when the test detects the pod as running? (I think that's what this issue was asking for.)
I can, but I don't really know how to do this easily, short of hammering the master. I may add a watch for Pods in the test, but this seems like overkill to me, as the only "latency" between the time when the Kubelet reports that it wrote data to etcd and the moment when the master "notices" the change (i.e. has it written in etcd) is clock skew, since AFAIK etcd writes are synchronous.
@alex-mohr thinks it is important to measure this so that we can get an end-to-end startup latency number. (And I agree.) You could do a new measurement that just measures the difference between the time observed on the master and the StartTime recorded in PodStatus, or a new measurement that's the full end-to-end startup time (the time in #3952 + #3954). I agree there is very little in the path that could make this number large, but I think it makes sense to measure it to avoid surprises. Doing something based on watch sounds reasonable.
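(A minimal sketch of that extra measurement, assuming the test records the wall-clock time at which it first observed the pod Running and compares it against the Kubelet-reported PodStatus.StartTime; clock skew between node and observer is included in the result. The helper name is mine.)

```go
package latency

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// reportingLag estimates how much later the master-side observer saw the pod
// Running than the kubelet-reported PodStatus.StartTime. observedAt is the
// wall-clock time at which the test first saw phase == Running; the second
// return value is false if the Kubelet has not reported a StartTime yet.
func reportingLag(pod *corev1.Pod, observedAt time.Time) (time.Duration, bool) {
	if pod.Status.StartTime == nil {
		return 0, false
	}
	return observedAt.Sub(pod.Status.StartTime.Time), true
}
```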
I'm almost certain @rrati and @fgrzadkowski have done similar things in other tests.
If you think it's important I'll add this.
A measurement of this could be added to my #9592, which is already measuring something similar.
I added watch-based monitoring to #9186, similar to what @lavalamp uses. The problem with this approach is that we effectively measure watch lag, which adds to e2e startup latency only for systems that depend on watch. Ordinary gets/lists will return fresh data from etcd, which will include recently added entries.
@davidopp I read this requirement differently. To me, a pod is scheduled when it has a node assigned to it, and in that case I don't think there's a need to get any information from the kubelet. You would check the apiserver for when a pod is assigned to a node (scheduled) and when the apiserver reports the pod running. Am I misunderstanding the requirement?
I honestly don't remember how we originally intended this issue and #3954 to be interpreted. But I think the best interpretation is that they describe non-overlapping parts of the startup process that sum to the end-to-end startup latency. That is, given the following steps:
(1) pod created
(2) pod scheduled to a node
(3) pod running on the Kubelet
(4) apiserver reports the pod as running
I think a reasonable interpretation is that #3954 talks about the time from (1) to (3) and this issue (#3952) talks about the time from (3) to (4). This is what we're doing. I agree there are other things we could measure, but really the only thing that matters in the end is the end-to-end user-perceived startup time.
If it helps, my take: one reasonable end-user-visible SLO is the time from "kubectl create -f blah.yaml" until a "kubectl get pod blah" returns it in a running state. Other end-user SLOs might also be useful to measure: e.g. the time from "kubectl create" until some/all of the pod's processes are created, or until watch returns it, etc. Those are the things users observe, and if we directly measure them under various load conditions, then we have a good understanding of what users will experience.
There's another set of metrics that might be helpful to measure too, even if they're not directly user visible -- that is, if they indirectly contribute to those end-user metrics: duration of a scheduler pass, time elapsed from a desired pod being written to etcd until the scheduler has updated etcd with a node assignment, etcd transaction times, various stages of docker operations, pause container or network pre-req creation, client-side delays, etc. Essentially, they're microbenchmarks.
So why did I mention that end-user visibility as a metric is generally important to land eventually? Because proxies drift over time. E.g., etcd is heavily loaded, someone adds a podStatusCache with 5s eviction, and now the "create until time-written-to-etcd" microbenchmark no longer reflects the user-visible behavior that it once did. Or if we go HA and reads use eventual consistency from non-leaders, ditto. And fwiw, that's not just theorycrafting: Google-internal systems that care about latency have often discovered surprising sources of it when switching from server-side-only to client-and-server-side latency monitoring.
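(A rough sketch of measuring that first end-user SLO as a black box by shelling out to kubectl; the helper name, manifest handling, and one-second polling interval are placeholder choices, not anything from this thread.)

```go
package latency

import (
	"context"
	"os/exec"
	"strings"
	"time"
)

// kubectlCreateToRunning times "kubectl create -f manifest" until
// "kubectl get pod podName" reports phase Running, polling once per second.
func kubectlCreateToRunning(ctx context.Context, manifest, podName string) (time.Duration, error) {
	start := time.Now()
	if err := exec.CommandContext(ctx, "kubectl", "create", "-f", manifest).Run(); err != nil {
		return 0, err
	}
	for {
		out, err := exec.CommandContext(ctx, "kubectl", "get", "pod", podName,
			"-o", "jsonpath={.status.phase}").Output()
		if err == nil && strings.TrimSpace(string(out)) == "Running" {
			return time.Since(start), nil
		}
		select {
		case <-ctx.Done():
			return 0, ctx.Err()
		case <-time.After(time.Second):
		}
	}
}
```

Note that the polling interval becomes part of the measured number, which is exactly the kind of client-side delay mentioned above; a watch-based client avoids that at the cost of depending on watch behavior.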
I partially agree. Yes, there can be some lag between when the information about a running pod is written to etcd and the time when "get pod" will return it, but a goal stated this way is impossible to test without somehow heavily instrumenting the API server. If we want to treat the API server as a black box, it's impossible to know whether a "get" call would return the running pod without actually calling it, and we don't want to do that too often. Thus we probably need some proxy for end-user visibility. I see two reasonable possibilities:
(I) the time when the 'running' status is written to etcd (which is an upper bound on the time when the pod is actually running and 'serving', for some meaning of serving), or
(II) the time when watch reports this change (which is reasonable data on the lag seen by users' automated systems).
The problem with (I) that I see is that we fill in the timestamp before actually writing to etcd, so if writes are slow, we may have additional 'user observable' lag (if we care about what the API server says, not the fact that the containers are actually running). Given that I don't see an easy way of implementing 'end-user visibility' defined as "'get pods' returns running status", I suggest we choose one of the above. I suggest using (I), as we have had issues with the etcd watch implementation previously, so I don't think binding our SLO to theirs is a good idea.
FYI: currently I observe between 0.5 and 2.5 sec of watch lag (assuming no clock skew).
Sophisticated users will use watch instead of get for measuring this, so I think it's valid for us to use watch when making this timing measurement.
Watch can lag up to 30s in a 100 node cluster with 3000 pods and an n1-standard-4 master :)
I realized my 4-item list earlier was wrong. Step (3) has two parts, (3a) and (3b). The thing we're measuring in #3954 is (1) through (3a). Anyway, I agree with @alex-mohr that we want an end-to-end time (with pods that are already on the minion so there's no fetch time). Using watch sounds like the best thing to do on the client side. My suggestion is that we re-title this issue to "measure end-to-end pod startup latency in 100 node cluster with 30 pods/node" and drop the specific performance goal. At this point we don't have time (before 1.0) to do more than just get the monitoring in place. 30s watch lag sounds pretty sub-optimal. We should work on that...
In my test I measure something slightly different than (3a) and (3b). The thing that is measured in #9186 is (3b'), which is the maximum over containers of Pod.Container[x].State.Running.StartedAt. We'll merge #9549 today, so I'll be able to run tests and update this bug with the results.
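(For reference, (3b') as described here could be computed from a pod's status roughly like this sketch; the function name is mine.)

```go
package latency

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// latestContainerStart returns the maximum State.Running.StartedAt over all
// of the pod's containers, i.e. the (3b') timestamp, and false if no
// container is running yet.
func latestContainerStart(pod *corev1.Pod) (time.Time, bool) {
	var latest time.Time
	for _, cs := range pod.Status.ContainerStatuses {
		if cs.State.Running != nil && cs.State.Running.StartedAt.Time.After(latest) {
			latest = cs.State.Running.StartedAt.Time
		}
	}
	return latest, !latest.IsZero()
}
```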
I did the experiment and put the results in #3954. I measure everything separately, so we have 'time to running on kubelet' latency, 'time from trying to submit to seeing it in watch' latency, and the sum of those two.
This effectively became a dupe of #3954. Closing this issue.
@dchen1107 There seems to be a 0 to 20 second delay before a single pause pod gets to Running. It should be << 10 seconds. For version 0.8.1, part of #3936.
Here's an example. In this case, the scheduler seemed to schedule it reasonably quickly (< 1 sec), the kubelet launched it quickly (< 1 sec), and docker transitioned the container to running quickly (< 1 sec), but the pod status available via the master took another 15 seconds. In particular, the status update seemed to either be blocked on the following kubelet.log entry, or that entry was triggered by something else that was itself blocked:
From my workstation:
Output of list pods with timestamps in PST:
Thu Jan 29 14:08:03 PST 2015: 1: creating pause-pod-1fcbe922 on port 30069
pause-pod-1fcbe922
Thu Jan 29 14:08:05 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:06 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:08 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:10 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:11 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:13 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:14 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:16 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:18 PST 2015: 1: fetching pod list
pause-pod-1fcbe922 pause-pod-1fcbe922 kubernetes/pause k8s-prod5-node-1.c.mohr-instance-manager.internal/ name=pause-pod-1fcbe922 Unknown
Thu Jan 29 14:08:19 PST 2015: 1: fetching pod list
Thu Jan 29 14:08:21 PST 2015: 1: 1 running
Thu Jan 29 14:08:21 PST 2015: 1: deleting all pods
Relevant kubelet log in UTC:
I0129 22:08:05.297181 4903 logs.go:39] etcd DEBUG: recv.success.http://10.240.222.31:4001/v2/keys/registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods?consistent=true&wait=true&waitIndex=12699
I0129 22:08:05.297448 4903 logs.go:39] etcd DEBUG: get /registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods [http://10.240.222.31:4001]
I0129 22:08:05.297535 4903 logs.go:39] etcd DEBUG: Connecting to etcd: attempt1forkeys/registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods?consistent=true&wait=true&waitIndex=12706
I0129 22:08:05.297548 4903 logs.go:39] etcd DEBUG: send.request.to http://10.240.222.31:4001/v2/keys/registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods?consistent=true&wait=true&waitIndex=12706 | method GET
W0129 22:08:05.308951 4903 config.go:307] Pod 1 () from etcd failed validation, ignoring: [name: required value '' namespace: required value 'default']
I0129 22:08:05.312089 4903 logs.go:39] etcd DEBUG: recv.response.fromhttp://10.240.222.31:4001/v2/keys/registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods?consistent=true&wait=true&waitIndex=12706
I0129 22:08:05.314162 4903 kubelet.go:793] Network container doesn't exist for pod "pause-pod-1fcbe922.default.etcd", killing and re-creating the pod
I0129 22:08:05.317955 4903 event.go:92] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"pause-pod-1fcbe922", UID:"4cadf377-a803-11e4-a2ec-42010af0de1f", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"implicitly required container net"}): status: 'waiting', reason: 'pulled' Successfully pulled image "kubernetes/pause:latest"
I0129 22:08:05.419784 4903 event.go:92] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"pause-pod-1fcbe922", UID:"4cadf377-a803-11e4-a2ec-42010af0de1f", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"implicitly required container net"}): status: 'waiting', reason: 'created' Created with docker id 93b26b849bfb454cc86cd6467a32906a79eb72a5705c3ae473ee1c7edc64b7ee
I0129 22:08:05.554694 4903 event.go:92] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"pause-pod-1fcbe922", UID:"4cadf377-a803-11e4-a2ec-42010af0de1f", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"implicitly required container net"}): status: 'running', reason: 'started' Started with docker id 93b26b849bfb454cc86cd6467a32906a79eb72a5705c3ae473ee1c7edc64b7ee
I0129 22:08:05.675279 4903 event.go:92] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"pause-pod-1fcbe922", UID:"4cadf377-a803-11e4-a2ec-42010af0de1f", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"spec.containers{pause-pod-1fcbe922}"}): status: 'waiting', reason: 'created' Created with docker id 80def133b602fa2e4d2cf98a9e7903f9c83887fa281277d38f2850dbfcbaeed8
I0129 22:08:05.777623 4903 event.go:92] Event(api.ObjectReference{Kind:"BoundPod", Namespace:"default", Name:"pause-pod-1fcbe922", UID:"4cadf377-a803-11e4-a2ec-42010af0de1f", APIVersion:"v1beta1", ResourceVersion:"", FieldPath:"spec.containers{pause-pod-1fcbe922}"}): status: 'running', reason: 'started' Started with docker id 80def133b602fa2e4d2cf98a9e7903f9c83887fa281277d38f2850dbfcbaeed8
I0129 22:08:07.996475 4903 server.go:324] GET /healthz: (10.235us) 200
I0129 22:08:10.519742 4903 server.go:324] GET /healthz: (11.461us) 200
I0129 22:08:10.960311 4903 file.go:171] Got pod from file "/etc/kubernetes/manifests/cadvisor.manifest": default.cadvisor-agent (cadvisormanifes12uqn2ohido76855gdecd9roadm7l0)
I0129 22:08:18.003785 4903 server.go:324] GET /healthz: (11.971us) 200
I0129 22:08:20.011837 4903 server.go:324] GET /healthz: (10.642us) 200
I0129 22:08:20.012008 4903 server.go:324] GET /healthz: (3.816us) 200
I0129 22:08:20.015171 4903 logs.go:39] http: multiple response.WriteHeader calls
I0129 22:08:20.015208 4903 server.go:324] GET /api/v1beta1/podInfo?podID=&podNamespace=: (80.895us) 400
goroutine 42938 [running]:
github.com/GoogleCloudPlatform/kubernetes/pkg/httplog.(*respLogger).WriteHeader(0xc209810600, 0x190)
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/httplog/log.go:182 +0x9c
net/http.Error(0x7f3876a0f830, 0xc209810600, 0xaf2cf0, 0x1d, 0x190)
/usr/src/go/src/pkg/net/http/server.go:1244 +0x96
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.(*Server).handlePodInfo(0xc20812a320, 0x7f3876a0f830, 0xc209810600, 0xc20a112dd0, 0x95d6901)
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet/server.go:212 +0x2a4
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.(*Server).handlePodInfoVersioned(0xc20812a320, 0x7f3876a0f830, 0xc209810600, 0xc20a112dd0)
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet/server.go:197 +0x4a
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.*Server.(github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.handlePodInfoVersioned)·fm(0x7f3876a0f830, 0xc209810600, 0xc20a112dd0)
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet/server.go:92 +0x44
net/http.HandlerFunc.ServeHTTP(0xc2081264e0, 0x7f3876a0f830, 0xc209810600, 0xc20a112dd0)
/usr/src/go/src/pkg/net/http/server.go:1235 +0x40
net/http.(*ServeMux).ServeHTTP(0xc20811ab10, 0x7f3876a0f830, 0xc209810600, 0xc20a112dd0)
/usr/src/go/src/pkg/net/http/server.go:1511 +0x1a3
github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet.(*Server).ServeHTTP(0xc20812a300, 0x7f3876a0f7f8, 0xc208131220, 0xc20a112dd0)
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/pkg/kubelet/server.go:323 +0x11d
net/http.serverHandler.ServeHTTP(0xc208112ba0, 0x7f3876a0f7f8, 0xc208131220, 0xc20a112dd0)
/usr/src/go/src/pkg/net/http/server.go:1673 +0x19f
net/http.(*
I0129 22:08:20.019854 4903 server.go:324] GET /api/v1beta1/podInfo?podID=pause-pod-1fcbe922&podNamespace=default: (4.894854ms) 200
I0129 22:08:20.910257 4903 server.go:324] GET /healthz: (11.933us) 200
I0129 22:08:22.819342 4903 logs.go:39] etcd DEBUG: recv.success.http://10.240.222.31:4001/v2/keys/registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods?consistent=true&wait=true&waitIndex=12706
I0129 22:08:22.819569 4903 logs.go:39] etcd DEBUG: get /registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods [http://10.240.222.31:4001]
I0129 22:08:22.819603 4903 logs.go:39] etcd DEBUG: Connecting to etcd: attempt1forkeys/registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods?consistent=true&wait=true&waitIndex=12714
I0129 22:08:22.819613 4903 logs.go:39] etcd DEBUG: send.request.to http://10.240.222.31:4001/v2/keys/registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods?consistent=true&wait=true&waitIndex=12714 | method GET
W0129 22:08:22.831964 4903 config.go:307] Pod 1 () from etcd failed validation, ignoring: [name: required value '' namespace: required value 'default']
I0129 22:08:22.834110 4903 logs.go:39] etcd DEBUG: recv.response.fromhttp://10.240.222.31:4001/v2/keys/registry/nodes/k8s-prod5-node-1.c.mohr-instance-manager.internal/boundpods?consistent=true&wait=true&waitIndex=12714
I0129 22:08:22.836408 4903 kubelet.go:1044] Killing unwanted container {podFullName:pause-pod-1fcbe922.default.etcd uuid:4cadf377-a803-11e4-a2ec-42010af0de1f containerName:pause-pod-1fcbe922}