
ui, ts: detect, store, & show number of cores on each node #24205

Closed
vilterp opened this issue Mar 26, 2018 · 13 comments
Labels
A-monitoring A-webui-general Issues on the DB Console that span multiple areas or don't have another clear category. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Comments

@vilterp (Contributor) commented Mar 26, 2018

Currently, the UI shows CPU usage in various places (e.g. the cluster visualization) but doesn't show the number of available cores anywhere. Thus, one can't know how much CPU is available without going elsewhere (e.g. the GCP or AWS console).

We should track the number of cores each node has, and show that somehow near CPU usage indicators.

This raises a bit of a design question: how should this be notated? The clusterviz shows CPU usage as a percentage, which often exceeds 100, since it's really CPU-seconds per second × 100. Do we normalize by the number of cores to make it a true percentage?
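For example, a figure of 2 CPU-seconds/second on a 4-core machine could be normalized like this (a minimal Go sketch; the function name is illustrative, not an existing CockroachDB API):

```go
package main

import "fmt"

// normalizedCPUPercent converts a usage figure in CPU-seconds per second
// (which can exceed 1.0 on multi-core machines) into a percentage of the
// machine's total capacity, given its number of cores.
func normalizedCPUPercent(cpuSecsPerSec float64, numCores int) float64 {
	return cpuSecsPerSec / float64(numCores) * 100
}

func main() {
	// 2 CPU-seconds/second on a 4-core machine: 200% in the current
	// notation, but 50% of total capacity.
	fmt.Println(normalizedCPUPercent(2.0, 4)) // prints 50
}
```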

cc @mrtracy @couchand

@bdarnell (Contributor):

We should also consider the possibility that we're not the only thing on the machine. If a cockroach process is running alongside some other application process on a four-core machine and each is using half the available CPU, we need to show both that cockroach is using 2 CPU-seconds/second and that the machine as a whole is at 100% CPU utilization. We should collect two new timeseries: total CPUs and idle CPU time.
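A sketch of how those two timeseries could be combined into whole-machine utilization (illustrative Go; the helper name and sampling scheme are assumptions, not existing code):

```go
package main

import "fmt"

// machineCPUUtilization derives whole-machine utilization from two samples
// of cumulative idle CPU time (in CPU-seconds), the sampling interval, and
// the number of CPUs.
func machineCPUUtilization(idleStart, idleEnd, elapsedSecs float64, numCPUs int) float64 {
	capacity := float64(numCPUs) * elapsedSecs // total CPU-seconds available
	idle := idleEnd - idleStart                // CPU-seconds spent idle
	return (capacity - idle) / capacity
}

func main() {
	// 4 CPUs over 10s = 40 CPU-seconds of capacity; 20 of them idle means
	// 50% machine-wide utilization, regardless of how the busy half splits
	// between cockroach and a neighboring process.
	fmt.Println(machineCPUUtilization(100, 120, 10, 4)) // prints 0.5
}
```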

@vilterp (Contributor, Author) commented Mar 26, 2018

Note from in-person convo with @bdarnell: Docker & Kubernetes make it more complicated to know how much CPU is available, since they have concepts of allocating a certain amount of CPU to each process.

Not sure how this fits together or how to get a number on how much CPU is available; do you know @a-robinson?

@couchand (Contributor):

It's a good idea to figure out if we can get this accurately, and if we can we should incorporate it into the UI to add useful context.

But I'm 👎 on reporting CPU utilization as a percentage where the denominator is the total CPU available, because it's so common to report CPU as 100% == single core fully utilized, with multi-core figures regularly going over 100%.

@vilterp (Contributor, Author) commented Mar 26, 2018

Agreed. Maybe we want to show the fraction (<usage> cpu s/s) / (<total> cpu s/s)?

@a-robinson (Contributor):

> Note from in-person convo with @bdarnell: Docker & Kubernetes make it more complicated to know how much CPU is available, since they have concepts of allocating a certain amount of CPU to each process.
>
> Not sure how this fits together or how to get a number on how much CPU is available; do you know @a-robinson?

Yeah, that doesn't mean there isn't a way, but I'm not aware of one that tells you, in the common case, what fraction of the machine's cores the container considers usable. There's also the fact that the kernel can be configured to treat a container's CPU limit as a hard cap (always enforced) or just a soft cap (enforced only when the machine is busy).

If a container is restricted to use only certain CPUs then that will be reflected inside the container, but that's not the mechanism that Kubernetes uses to restrict container CPU usage.

@petermattis (Collaborator):

If possible, we should indicate in the UI if cockroach is restricted to a fraction of a CPU.

@petermattis added this to the 2.1 milestone Mar 29, 2018
@petermattis (Collaborator):

See #21416 (comment). We should investigate using some of the node_exporter libraries. Our Prometheus/Grafana configs seem to provide a reasonable CPU metric.

@couchand added the C-enhancement, A-webui-general, and A-monitoring labels Apr 24, 2018
@vilterp (Contributor, Author) commented Jun 1, 2018

I don't see anything in the node_exporter libraries that gets the number of cores; they just report seconds used by the process, like what we already have from gosigar.

I also haven't been able to find a way to know from inside of a docker container how much CPU is available. In the absence of such an API, maybe we should just call Go's runtime.NumCPU(), report that in the nodes table, and document in a tooltip (and in the docs) that you might not actually be able to use all those CPUs?
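A minimal sketch of that fallback, with the caveat from the proposed tooltip captured in a comment (illustrative only; the helper name is not an existing CockroachDB API):

```go
package main

import (
	"fmt"
	"runtime"
)

// reportedCores returns the core count to surface in the nodes table.
// runtime.NumCPU reports the number of logical CPUs usable by the current
// process; inside a Docker container it reflects CPU affinity masks but
// not CFS quota limits, so the process may not actually get this much CPU.
func reportedCores() int {
	return runtime.NumCPU()
}

func main() {
	fmt.Println(reportedCores() >= 1) // prints true
}
```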

@petermattis (Collaborator):

> I don't see anything in the node_exporter libraries that gets the number of cores; they just report seconds used by the process, like what we already have from gosigar.

Looks like node_exporter gets its cpu metrics from https://github.com/prometheus/procfs/blob/master/stat.go#L62.
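As a sketch of what counting cores from that file could look like, assuming `/proc/stat` has one `cpuN` line per core plus an aggregate `cpu` line (the parser below is illustrative, not node_exporter's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// countCoresFromStat counts the per-core "cpuN" lines in the contents of
// /proc/stat, the same file the prometheus/procfs library parses.
func countCoresFromStat(stat string) int {
	n := 0
	for _, line := range strings.Split(stat, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		// Skip the aggregate "cpu" line; count "cpu0", "cpu1", ...
		if strings.HasPrefix(fields[0], "cpu") && fields[0] != "cpu" {
			n++
		}
	}
	return n
}

func main() {
	sample := "cpu  10 0 20 1000\ncpu0 5 0 10 500\ncpu1 5 0 10 500\n"
	fmt.Println(countCoresFromStat(sample)) // prints 2
}
```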

@vilterp (Contributor, Author) commented Jun 1, 2018

I'm not sure if that reflects limitations imposed by docker/k8s, though.

Also, looks like gosigar has a similar CPU list API: https://github.com/cloudfoundry/gosigar/blob/master/sigar_interface.go#L69-L71

And a ProcCPU API (new since the version we're using): https://github.com/cloudfoundry/gosigar/blob/master/sigar_interface.go#L136-L141

The meaning of these APIs isn't well documented. Will play around with them when the admin UI team gets to our "improve hardware stats" milestone at the end of June.

@petermattis (Collaborator):

In addition to better CPU metrics, I'd really like disk and network stats as well (e.g. disk utilization, network bandwidth).

@a-robinson (Contributor):

> I'm not sure if that reflects limitations imposed by docker/k8s, though.

In most cases, it does not, because those limitations are typically implemented via CFS quota. It's not easy to do anything about that, though. And in our default Kubernetes configs, we don't actually set limits on CPU (or recommend doing so) because of the effect it can have on tail latencies.
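A sketch of the quota arithmetic involved, assuming cgroup-v1 CFS semantics where `cpu.cfs_quota_us = -1` means unlimited (the helper is hypothetical; reading the cgroup files themselves is omitted):

```go
package main

import "fmt"

// cfsCPULimit derives the effective CPU limit from the cgroup-v1 values
// cpu.cfs_quota_us and cpu.cfs_period_us. A non-positive quota means the
// container is unlimited, in which case callers should fall back to the
// machine's core count.
func cfsCPULimit(quotaUS, periodUS int64) (cpus float64, limited bool) {
	if quotaUS <= 0 || periodUS <= 0 {
		return 0, false // unlimited (or unreadable)
	}
	return float64(quotaUS) / float64(periodUS), true
}

func main() {
	// A quota of 200000us per 100000us period caps the container at 2 CPUs.
	cpus, limited := cfsCPULimit(200000, 100000)
	fmt.Println(cpus, limited) // prints 2 true
}
```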

@vilterp (Contributor, Author) commented Aug 30, 2018

Closing this, since we now report the number of CPUs on the machine in the node list; opened #29366 to track the more specific Docker throttling issue.

@vilterp closed this as completed Aug 30, 2018