ui, ts: detect, store, & show number of cores on each node #24205
Comments
We should also consider the possibility that we're not the only thing on the machine. If a cockroach process is running alongside some other application process on a four-core machine and they're each using half the available CPU, we need to show both that cockroach is using 2 cpu s/s and that the machine as a whole is at 100% cpu utilization. We should collect two new timeseries: total CPUs and idle CPU time.
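A rough sketch of how those two timeseries could be combined with the existing per-process counter. The `cpuSample` type, its field names, and the `usage` helper are hypothetical, and the counters are assumed to be cumulative seconds sampled at a fixed interval:

```go
package main

import "fmt"

// cpuSample is a hypothetical snapshot of cumulative CPU counters, in seconds.
type cpuSample struct {
	ProcessSeconds float64 // CPU time consumed by the cockroach process so far
	IdleSeconds    float64 // idle CPU time so far, summed across all cores
}

// usage derives two figures from samples taken `elapsed` seconds apart:
// the process's CPU seconds per second, and the whole machine's
// utilization as a fraction between 0 and 1.
func usage(prev, cur cpuSample, elapsed float64, numCPUs int) (procPerSec, machineUtil float64) {
	if elapsed <= 0 || numCPUs <= 0 {
		return 0, 0
	}
	procPerSec = (cur.ProcessSeconds - prev.ProcessSeconds) / elapsed

	// Total CPU time the machine could have delivered during the interval.
	capacity := elapsed * float64(numCPUs)
	idle := cur.IdleSeconds - prev.IdleSeconds
	machineUtil = 1 - idle/capacity

	// Clamp to [0, 1] to absorb counter jitter.
	if machineUtil < 0 {
		machineUtil = 0
	}
	if machineUtil > 1 {
		machineUtil = 1
	}
	return procPerSec, machineUtil
}

func main() {
	// Four cores, 10s interval: cockroach burned 20 CPU-seconds, and the
	// machine had no idle time because another process used the rest.
	prev := cpuSample{ProcessSeconds: 100, IdleSeconds: 400}
	cur := cpuSample{ProcessSeconds: 120, IdleSeconds: 400}
	p, m := usage(prev, cur, 10, 4)
	fmt.Printf("cockroach: %.1f cpu s/s, machine: %.0f%% utilized\n", p, m*100)
}
```

With the numbers from the scenario above, this reports cockroach at 2 cpu s/s while the machine as a whole is fully utilized.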
Note from in-person convo with @bdarnell: Docker & Kubernetes make it more complicated to know how much CPU is available, since they have concepts of allocating a certain amount of CPU to each process. Not sure how this fits together or how to get a number on how much CPU is available; do you know @a-robinson?
It's a good idea to figure out if we can get this accurately, and if we can we should incorporate it into the UI to add useful context. But I'm a 👎 on reporting a CPU utilization as a percentage where the denominator is the total CPU available, because of how common it is to report CPU as 100% == single core fully utilized, with multi-core figures going over 100% regularly.
Agreed. Maybe we want to show the fraction
Yeah, it doesn't mean that there isn't a way, but I'm not aware of a way to tell what fraction of the machine's cores are considered usable by the container in the common case. There's also the fact that the kernel can be configured to treat a container's cpu limit as a hard cap (i.e. always enforce it) or just a soft cap (only enforce it if the machine is busy). If a container is restricted to use only certain CPUs then that will be reflected inside the container, but that's not the mechanism that Kubernetes uses to restrict container CPU usage.
If possible, we should indicate in the UI if cockroach is restricted to a fraction of a CPU.
See #21416 (comment). We should investigate using some of the
I don't see anything in the node_exporter libraries that gets the number of cores, just reports seconds used by the process, like what we already have from gosigar. I also haven't been able to find a way to know from inside of a docker container how much CPU is available. In the absence of such an API, maybe we should just call Go's
Looks like
I'm not sure if that reflects limitations imposed by docker/k8s, though. Also, looks like gosigar has a similar CPU list API: https://github.com/cloudfoundry/gosigar/blob/master/sigar_interface.go#L69-L71 And a ProcCPU API (new since the version we're using): https://github.com/cloudfoundry/gosigar/blob/master/sigar_interface.go#L136-L141 The meaning of these APIs isn't well documented. Will play around with them when the admin UI team gets to our "improve hardware stats" milestone at the end of June.
In addition to better CPU metrics, I'd really like disk and network stats as well (e.g. disk utilization, network bandwidth).
In most cases, it does not, because those limitations are typically implemented via CFS quota. It's not easy to do anything about that, though. And in our default Kubernetes configs, we don't actually set limits on CPU (or recommend doing so) because of the effect it can have on tail latencies.
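For reference, a minimal sketch of turning a CFS quota into an effective CPU count, assuming the process can read its own cgroup files. On cgroup v1 the values live in `cpu.cfs_quota_us` and `cpu.cfs_period_us`; on cgroup v2 they live in a single `cpu.max` file. The helper names `cpuLimitV1` and `cpuLimitV2` are hypothetical, and the functions take already-read values so that the mount path (which varies by environment) is left to the caller:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuLimitV1 converts cgroup v1 cpu.cfs_quota_us and cpu.cfs_period_us
// values into a CPU count. A quota of -1 means no limit is set.
func cpuLimitV1(quotaUS, periodUS int64) (float64, bool) {
	if quotaUS <= 0 || periodUS <= 0 {
		return 0, false
	}
	return float64(quotaUS) / float64(periodUS), true
}

// cpuLimitV2 parses the contents of a cgroup v2 cpu.max file, which is
// either "<quota> <period>" or "max <period>" when no limit is set.
func cpuLimitV2(contents string) (float64, bool, error) {
	fields := strings.Fields(contents)
	if len(fields) != 2 {
		return 0, false, fmt.Errorf("unexpected cpu.max contents %q", contents)
	}
	if fields[0] == "max" {
		return 0, false, nil
	}
	quota, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, false, err
	}
	period, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return 0, false, err
	}
	if period <= 0 {
		return 0, false, fmt.Errorf("non-positive period in %q", contents)
	}
	return quota / period, true, nil
}

func main() {
	// A container limited to half a CPU: 50ms of CPU time per 100ms period.
	if cpus, limited, err := cpuLimitV2("50000 100000\n"); err == nil && limited {
		fmt.Printf("cgroup v2 limit: %.2f CPUs\n", cpus)
	}
	if _, limited := cpuLimitV1(-1, 100000); !limited {
		fmt.Println("cgroup v1: no CPU limit set")
	}
}
```

This only reports the configured quota; it says nothing about whether the kernel is actually throttling the process at a given moment.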
Closing this because we now report the number of CPUs on the machine in the node list; opened #29366 to track more specific docker throttling issue.
Currently, the UI shows CPU usage in various places (e.g. the cluster visualization) but doesn't show the number of available cores anywhere. Thus, one can't know how much CPU is available without going elsewhere (e.g. the GCP or AWS console).
We should track the number of cores each node has, and show that somehow near CPU usage indicators.
This raises a bit of a design question: how do you notate this? The clusterviz shows CPU usage as a percentage, which is often above 100, since it's really cpu seconds / second * 100. Do we normalize by number of cores to make it a true percentage?
cc @mrtracy @couchand
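To make the two notations concrete, here is a small sketch of both calculations. The function names are hypothetical and are not taken from the admin UI code:

```go
package main

import "fmt"

// perCorePercent is the figure described above: cpu seconds / second * 100.
// It exceeds 100 whenever more than one core's worth of CPU is in use.
func perCorePercent(cpuSecondsPerSecond float64) float64 {
	return cpuSecondsPerSecond * 100
}

// normalizedPercent divides by the core count so that 100 means
// every core on the node is fully busy.
func normalizedPercent(cpuSecondsPerSecond float64, numCores int) float64 {
	if numCores <= 0 {
		return 0
	}
	return cpuSecondsPerSecond / float64(numCores) * 100
}

func main() {
	const used = 2.5 // cpu seconds per second
	const cores = 4
	fmt.Printf("unnormalized: %.1f%%\n", perCorePercent(used))
	fmt.Printf("normalized over %d cores: %.1f%%\n", cores, normalizedPercent(used, cores))
}
```

Showing the unnormalized figure next to the core count (for example "250% of 4 cores") is one way to keep the familiar convention while still conveying headroom.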