Hardware dashboard reference document #3419
Conversation
v2.1/admin-ui-hardware-dashboard.md
Outdated
- In the cluster view, the graph shows the memory in use across all nodes in the cluster.

## Disk Read Bytes
please add a note for
- disk {read, write, iops}
- net {read, write}
that those metrics are across the whole host, not just for the cockroach process, i.e. they could show disk activity from other processes. Memory and CPU, on the other hand, are just for the cockroach process. Kind of confusing, but I wasn't able to find a good way to get the per-process IO info on both Mac and Linux.
@vilterp I made the changes - might have made it worse. Can you check and let me know?
v2.1/admin-ui-hardware-dashboard.md
Outdated
<img src="{{ 'images/v2.1/admin_ui_disk_read_bytes.png' | relative_url }}" alt="CockroachDB Admin UI Disk Read Bytes graph" style="border:1px solid #eee;max-width:100%" />

- In the node view, the graph shows the current moving average, over the last 10 seconds, of the number of bytes read per second for the node.
I think "current moving average, over the last ten seconds, of per second" is pretty confusing, but I haven't been able to think of a better suggestion.
This language is consistent with the description for SQL Queries. I agree this is not very clear - I am trying to come up with a way to present it diagrammatically. I vote for not changing the language in the meanwhile, because it's technically correct.
v2.1/admin-ui-hardware-dashboard.md, line 47 at r2 (raw file):
Previously, Amruta-Ranade (Amruta Ranade) wrote…
This language is consistent with the description for SQL Queries. I agree this is not very clear - I am trying to come up with a way to present it diagrammatically. I vote for not changing the language in the meanwhile, because it's technically correct.
Suggested reword (for whenever you do try to make this clearer): "[...] the graph shows the current moving average of the number of bytes read per second by all processes (including CockroachDB) for the selected node. The average is taken across the last 10 seconds."
v2.1/admin-ui-hardware-dashboard.md, line 15 at r3 (raw file):
<img src="{{ 'images/v2.1/admin_ui_user_cpu.png' | relative_url }}" alt="CockroachDB Admin UI User CPU Percent graph" style="border:1px solid #eee;max-width:100%" /> - In the node view, the User CPU Percent graph shows the percentage of the total CPU seconds per second used by the CockroachDB process for the selected node.
"the User CPU Percent graph..." is inconsistent with the other sections (which only say, "the graph shows...")
v2.1/admin-ui-hardware-dashboard.md, line 20 at r3 (raw file):
{{site.data.alerts.callout_info}} For multi-core systems, the User CPU Percent can be greater than 100%. Full utilization of one core is considered as 100% CPU usage. If you have n cores, then the User CPU Percent can range from 0% (indicating an idle system) to (n*100)% (indicating full utilization).
I think "User CPU Percent" should be "user CPU percent" here since you're referring to the metric, not the graph.
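The range described in the callout above can be sketched as follows. This is a minimal illustration only, assuming that full utilization of one core counts as 100%; the function name `max_cpu_percent` is hypothetical and not part of CockroachDB.

```python
def max_cpu_percent(num_cores: int) -> int:
    """Upper bound of a per-process CPU percent on a host with num_cores cores,
    assuming full use of one core is reported as 100%."""
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    return num_cores * 100


# Example: on a 4-core machine the metric can range from 0% to 400%.
upper_bound = max_cpu_percent(4)
```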
v2.1/admin-ui-hardware-dashboard.md, line 27 at r3 (raw file):
<img src="{{ 'images/v2.1/admin_ui_system_cpu.png' | relative_url }}" alt="CockroachDB Admin UI System CPU Percent graph" style="border:1px solid #eee;max-width:100%" /> - In the node view, the graph shows the percentage of the total CPU seconds per second used by the system calls made by CockroachDB for the selected node.
Might be clearer to say "[...] used by the CockroachDB system calls for the selected node." Not sure if it changes the meaning though.
v2.1/admin-ui-hardware-dashboard.md, line 32 at r3 (raw file):
{{site.data.alerts.callout_info}} For multi-core systems, the System CPU Percent can be greater than 100%. Full utilization of one core is considered as 100% CPU usage. If you have n cores, then the User CPU Percent can range from 0% (indicating an idle system) to (n*100)% (indicating full utilization).
I think "System CPU Percent" should be "system CPU percent" here since you're referring to the metric, not the graph.
v2.1/admin-ui-hardware-dashboard.md, line 77 at r3 (raw file):
- In the cluster view, the graph shows the maximum allocated capacity, available storage capacity, and capacity used by CockroachDB across all nodes in the cluster. On hovering over the graph, the values for the following metrics are displayed:
Reword to be active/prescriptive: "Hover over the graph to display the values for the following metrics:"
v2.1/admin-ui-hardware-dashboard.md, line 83 at r3 (raw file):
Capacity | The maximum storage capacity allocated to CockroachDB. You can configure the maximum allocated storage capacity for CockroachDB using the <code>--store</code> flag. For more information, see [Start a Node](start-a-node.html#store).
Available | The free storage capacity available to CockroachDB.
Used | Disk space used by the data in the CockroachDB store. Note that this value is less than (Capacity - Available) because Capacity and Available metrics consider the entire disk and all applications on the disk including CockroachDB, whereas Used metric tracks only the store's disk usage.
Bold Capacity, Available, and Used since you're referring to the different metrics
v2.1/admin-ui-hardware-dashboard.md, line 83 at r3 (raw file):
Previously, lhirata wrote…
Ummm...not sure why we would do that? 🤔
v2.1/admin-ui-hardware-dashboard.md, line 83 at r3 (raw file):
Previously, Amruta-Ranade (Amruta Ranade) wrote…
Ummm...not sure why we would do that? 🤔
Since we're referring to labels/attributes on the graph, it seems like we should adhere to the following guidance:
Use bolded text to emphasize an important word or phrase, when referring to the name of a UI section or field, or to create visual separation and callouts
v2.1/admin-ui-hardware-dashboard.md, line 47 at r2 (raw file):
Previously, lhirata wrote…
Ooh..just found the term "10-second moving average". Rewording our phrase: "[...] the 10-second moving average of the number of bytes read per second [...]". Does that sound better?
v2.1/admin-ui-hardware-dashboard.md, line 47 at r2 (raw file):
Previously, Amruta-Ranade (Amruta Ranade) wrote…
Ooh..just found the term "10-second moving average". Rewording our phrase: " [... ]the 10-second moving average of the number of bytes read per second [...]". Does that sound better?
👍
v2.1/admin-ui-hardware-dashboard.md, line 47 at r2 (raw file):
Previously, lhirata wrote…
Discussion with Pete: It's not a 10-second moving avg; it is an "average over a 10-second window".
v2.1/admin-ui-hardware-dashboard.md, line 27 at r3 (raw file):
Previously, lhirata wrote…
Discussion with Pete: Needs further reading (it's not the making of calls that takes time, it's the time required for completing the calls)
v2.1/admin-ui-hardware-dashboard.md, line 15 at r3 (raw file):
Previously, lhirata wrote…
"the User CPU Percent graph..." is inconsistent with the other sections (which only say, "the graph shows...")
Done.
v2.1/admin-ui-hardware-dashboard.md, line 20 at r3 (raw file):
Previously, lhirata wrote…
I think "User CPU Percent" should be "user CPU percent" here since you're referring to the metric, not the graph.
Done.
v2.1/admin-ui-hardware-dashboard.md, line 32 at r3 (raw file):
Previously, lhirata wrote…
I think "System CPU Percent" should be "system CPU percent" here since you're referring to the metric, not the graph.
Done.
v2.1/admin-ui-hardware-dashboard.md, line 77 at r3 (raw file):
Previously, lhirata wrote…
Reword to be active/prescriptive: "Hover over the graph to display the values for the following metrics:"
Done.
v2.1/admin-ui-hardware-dashboard.md, line 83 at r3 (raw file):
Previously, lhirata wrote…
Since we're referring to labels/attributes on the graph, it seems like we should adhere to the following guidance:
Use bolded text to emphasize an important word or phrase, when referring to the name of a UI section or field, or to create visual separation and callouts
Done.
v2.1/admin-ui-hardware-dashboard.md
Outdated
- In the node view, the graph shows the number of disk reads and writes in queue for all processes including CockroachDB for the selected node.

- In the cluster view, the graph shows the number of disk reads and writes in queue for all processes including CockroachDB across all nodes in the cluster.
forgot to mention — IOPS in progress will always show up as 0 on Mac, because the library we're using doesn't report it on Mac, only Linux and maybe Windows. (There may be some way to get it; I just haven't found it yet)
Arguably we just shouldn't show this graph on mac, but we don't have any other OS-specific graphs. So, probably just best to note it here.
Filed cockroachdb/cockroach#27927; can use as known limitation
v2.1/admin-ui-hardware-dashboard.md, line 47 at r2 (raw file):
Previously, Amruta-Ranade (Amruta Ranade) wrote…
Confirmed with Matt T.: It is a 10-second average.
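To make the "10-second average of bytes read per second" wording concrete, here is a minimal sketch. It assumes the metric is derived from a cumulative byte counter sampled at known timestamps; the function `windowed_rate` and the sample data are hypothetical and are not taken from CockroachDB's implementation.

```python
def windowed_rate(samples, now, window=10.0):
    """Average rate (units per second) over the last `window` seconds.

    samples: list of (timestamp_seconds, cumulative_bytes), ascending by time.
    Returns 0.0 when fewer than two samples fall inside the window.
    """
    recent = [(t, b) for t, b in samples if now - window <= t <= now]
    if len(recent) < 2:
        return 0.0
    (t0, b0), (t1, b1) = recent[0], recent[-1]
    if t1 == t0:
        return 0.0
    # Bytes accumulated inside the window divided by the elapsed time.
    return (b1 - b0) / (t1 - t0)


# Hypothetical counter samples taken every 5 seconds.
samples = [(0, 0), (5, 500), (10, 1500), (15, 2000)]
rate = windowed_rate(samples, now=15)
```

With these samples, only the readings at 5, 10, and 15 seconds fall in the window, so the result is the 1500 bytes accumulated between the first and last of them divided by 10 seconds.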
…r nav, and worked on review comments (59d6ecb to 09a0637)
Nice work, @Amruta-Ranade. I have some comments, but this is pretty close, with some eventual follow-up work to do for the full 2.1 release.
One thing not noted below: You need to add this to https://www.cockroachlabs.com/docs/v2.1/admin-ui-overview.html somehow.
v2.1/admin-ui-hardware-dashboard.md, line 20 at r3 (raw file):
Previously, Amruta-Ranade (Amruta Ranade) wrote…
Done.
I agree with @lhirata. You still need to make `Percent` lower-case.
v2.1/admin-ui-hardware-dashboard.md, line 32 at r3 (raw file):
Previously, Amruta-Ranade (Amruta Ranade) wrote…
Done.
Again, agree with @lhirata. You still need to make `Percent` lower-case.
v2.1/admin-ui-hardware-dashboard.md, line 83 at r3 (raw file):
Previously, Amruta-Ranade (Amruta Ranade) wrote…
Done.
You still need to bold the referenced metrics in the Used description.
v2.1/admin-ui-hardware-dashboard.md, line 15 at r5 (raw file):
<img src="{{ 'images/v2.1/admin_ui_user_cpu.png' | relative_url }}" alt="CockroachDB Admin UI User CPU Percent graph" style="border:1px solid #eee;max-width:100%" /> - In the node view, the graph shows the percentage of the total CPU seconds per second used by the CockroachDB process for the selected node.
Based on discussions in the #frontend Slack channel, the `CPU seconds per second` terminology is probably not right here and needs to be re-thought. Same is true of instances of `CPU seconds per second` below.
v2.1/admin-ui-hardware-dashboard.md, line 27 at r5 (raw file):
<img src="{{ 'images/v2.1/admin_ui_system_cpu.png' | relative_url }}" alt="CockroachDB Admin UI System CPU Percent graph" style="border:1px solid #eee;max-width:100%" /> - In the node view, the graph shows the percentage of the total CPU seconds per second used for CockroachDB system-level operations for the selected node.
In a follow-up PR, I think it is important to more clearly differentiate these two CPU metrics. How can we make them concrete for users? Perhaps we can provide a quick example for each.
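One way to make the user/system distinction concrete is a small sketch in the general operating-system sense: user CPU time is spent running the process's own code, while system CPU time is spent in the kernel completing calls made on the process's behalf. The helper names below are hypothetical, and the sketch measures the Python process itself; how the Admin UI graphs derive their values is not shown here.

```python
import os


def cpu_times_delta(work):
    """Run work() and return (user_seconds, system_seconds) of CPU time
    consumed by this process while it ran."""
    before = os.times()
    work()
    after = os.times()
    return after.user - before.user, after.system - before.system


def compute_heavy():
    # Arithmetic in user space: expected to add mostly user CPU time.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def syscall_heavy():
    # Repeated file-system queries handled by the kernel:
    # expected to add system CPU time as well.
    for _ in range(20_000):
        os.stat(".")


user_a, system_a = cpu_times_delta(compute_heavy)
user_b, system_b = cpu_times_delta(syscall_heavy)
```

The exact numbers depend on the machine and the clock resolution, so the sketch deliberately does not state expected values.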
v2.1/admin-ui-hardware-dashboard.md, line 47 at r5 (raw file):
<img src="{{ 'images/v2.1/admin_ui_disk_read_bytes.png' | relative_url }}" alt="CockroachDB Admin UI Disk Read Bytes graph" style="border:1px solid #eee;max-width:100%" /> - In the node view, the graph shows the 10-second average of the number of bytes read per second by all processes including CockroachDB for the selected node.
nit: Let's break up the sentence a bit with commas: `all processes, including CockroachDB, for the selected node`.
v2.1/admin-ui-hardware-dashboard.md, line 49 at r5 (raw file):
- In the node view, the graph shows the 10-second average of the number of bytes read per second by all processes including CockroachDB for the selected node. - In the cluster view, the graph shows the 10-second average of the number of bytes read per second by all processes including CockroachDB across all nodes.
Same as above.
v2.1/admin-ui-hardware-dashboard.md, line 55 at r5 (raw file):
<img src="{{ 'images/v2.1/admin_ui_disk_write_bytes.png' | relative_url }}" alt="CockroachDB Admin UI Disk Write Bytes graph" style="border:1px solid #eee;max-width:100%" /> - In the node view, the graph shows the 10-second average of the number of bytes written per second by all processes including CockroachDB for the node.
Same as above.
v2.1/admin-ui-hardware-dashboard.md, line 57 at r5 (raw file):
- In the node view, the graph shows the 10-second average of the number of bytes written per second by all processes including CockroachDB for the node. - In the cluster view, the graph shows the 10-second average of the number of bytes written per second by all processes including CockroachDB across all nodes.
Same as above.
v2.1/admin-ui-hardware-dashboard.md, line 63 at r5 (raw file):
<img src="{{ 'images/v2.1/admin_ui_disk_iops.png' | relative_url }}" alt="CockroachDB Admin UI Disk IOPS in Progress graph" style="border:1px solid #eee;max-width:100%" /> - In the node view, the graph shows the number of disk reads and writes in queue for all processes including CockroachDB for the selected node.
Same as above.
v2.1/admin-ui-hardware-dashboard.md, line 65 at r5 (raw file):
- In the node view, the graph shows the number of disk reads and writes in queue for all processes including CockroachDB for the selected node. - In the cluster view, the graph shows the number of disk reads and writes in queue for all processes including CockroachDB across all nodes in the cluster.
Same as above.
v2.1/admin-ui-hardware-dashboard.md, line 68 at r5 (raw file):
{{site.data.alerts.callout_info}} For Mac OS, this graph is not populated and shows zero Disk IOPS in Progress. This is a [known limitation](https://github.com/cockroachdb/cockroach/issues/27927) that may be lifted in the future.
`Disk IOPS in Progress` > `disk IOPS in progress`.
v2.1/admin-ui-hardware-dashboard.md, line 75 at r5 (raw file):
<img src="{{ 'images/v2.1/admin_ui_disk_capacity.png' | relative_url }}" alt="CockroachDB Admin UI Disk Capacity graph" style="border:1px solid #eee;max-width:100%" /> You can monitor the **Disk Capacity** graph to determine when additional storage is needed.
I like stating up-front what users can get from a specific graph. In a follow-up PR, please consider doing this for the other graphs, where possible.
v2.1/admin-ui-hardware-dashboard.md, line 85 at r5 (raw file):
Let's tweak the second sentence to: "You can configure the maximum storage capacity for a given node using the `--store` flag."
Also, let's use backticks for `--store` instead of `<code>--store</code>`, unless there's a good reason.
v2.1/admin-ui-hardware-dashboard.md, line 87 at r5 (raw file):
**Capacity** | The maximum storage capacity allocated to CockroachDB. You can configure the maximum allocated storage capacity for CockroachDB using the <code>--store</code> flag. For more information, see [Start a Node](start-a-node.html#store).
**Available** | The free storage capacity available to CockroachDB.
**Used** | Disk space used by the data in the CockroachDB store. Note that this value is less than (Capacity - Available) because Capacity and Available metrics consider the entire disk and all applications on the disk including CockroachDB, whereas Used metric tracks only the store's disk usage.
For readability, use another comma: `... on the disk, including CockroachDB, ...`.
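The relationship in the **Used** description can be checked with a little arithmetic. The sketch below assumes, as the description states, that Capacity and Available account for everything on the disk while Used counts only the store; the function name and the figures are hypothetical.

```python
def non_store_usage(capacity, available, used):
    """Disk space taken by everything other than the CockroachDB store.

    (capacity - available) is all space consumed on the disk, so
    subtracting the store's own usage leaves what other applications hold.
    """
    return (capacity - available) - used


# Hypothetical values in GiB: 100 allocated, 60 free, 25 used by the store.
other_gib = non_store_usage(capacity=100, available=60, used=25)
```

Here Capacity minus Available is 40 GiB, which exceeds the 25 GiB reported as Used because the remaining 15 GiB belongs to other applications on the same disk.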
v2.1/admin-ui-overview-dashboard.md, line 16 at r5 (raw file):
<img src="{{ 'images/v2.1/admin_ui_sql_queries.png' | relative_url }}" alt="CockroachDB Admin UI SQL Queries graph" style="border:1px solid #eee;max-width:100%" /> - In the node view, the SQL Queries graph shows the 10-second average of the number of `SELECT`/`INSERT`/`UPDATE`/`DELETE` queries per second issued by SQL clients on the node.
While you're here, you might as well change `the SQL Queries graph` to just `the graph`, for concision.
v2.1/admin-ui-hardware-dashboard.md, line 7 at r5 (raw file):
One more thought, @Amruta-Ranade: Instead of or in addition to listing the types of metrics on this dashboard, the intro should probably make it clear why you would use this, what types of insights you would glean from it. I'd suggest looking at the airtable record, specifically the user stories there. @piyush-singh can probably help, too.
v2.1/admin-ui-hardware-dashboard.md, line 43 at r2 (raw file):
Previously, Amruta-Ranade (Amruta Ranade) wrote…
@vilterp I made the changes - might have made it worse. Can you check and let me know?
Follow-up with Pete: Looks okay.
v2.1/admin-ui-hardware-dashboard.md, line 20 at r3 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
I agree with @lhirata. You still need to make `Percent` lower-case.
Done.
v2.1/admin-ui-hardware-dashboard.md, line 27 at r3 (raw file):
Previously, Amruta-Ranade (Amruta Ranade) wrote…
Discussion with Pete: Needs further reading (it's not the making of calls that takes time, it's the time required for completing the calls)
Done.
v2.1/admin-ui-hardware-dashboard.md, line 32 at r3 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
Again, agree with @lhirata. You still need to make `Percent` lower-case.
Done.
v2.1/admin-ui-hardware-dashboard.md, line 83 at r3 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
You still need to bold the referenced metrics in the Used description.
Done
v2.1/admin-ui-hardware-dashboard.md, line 7 at r5 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
One more thought, @Amruta-Ranade: Instead of or in addition to listing the types of metrics on this dashboard, the intro should probably make it clear why you would use this, what types of insights you would glean from it. I'd suggest looking at the airtable record, specifically the user stories there. @piyush-singh can probably help, too.
I agree. I want to contextualize not just the dashboards but each graph wherever I can. In the next milestone, we are planning to reorganize the time series and dashboards, so would it be okay if I take it up holistically then? I think this dashboard might change, and so will the description.
v2.1/admin-ui-hardware-dashboard.md, line 15 at r5 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
Based on discussions in the #frontend Slack channel, the `CPU seconds per second` terminology is probably not right here and needs to be re-thought. Same is true of instances of `CPU seconds per second` below.
For now, I am taking Ben's suggestion and going with: "In the node view, the graph shows the amount of CPU resources used by the CockroachDB process for the selected node." Does that look okay? Pete's working on changing the implementation, which will hopefully make it less convoluted to describe.
v2.1/admin-ui-hardware-dashboard.md, line 27 at r5 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
In a follow-up PR, I think it is important to more clearly differentiate these two CPU metrics. How can we make them concrete for users? Perhaps we can provide a quick example for each.
Yes.
v2.1/admin-ui-hardware-dashboard.md, line 49 at r5 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
Same as above.
Done.
v2.1/admin-ui-hardware-dashboard.md, line 55 at r5 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
Same as above.
Done.
v2.1/admin-ui-hardware-dashboard.md, line 57 at r5 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
Same as above.
Done.
v2.1/admin-ui-hardware-dashboard.md, line 63 at r5 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
Same as above.
Done.
v2.1/admin-ui-hardware-dashboard.md, line 65 at r5 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
Same as above.
Done.
v2.1/admin-ui-hardware-dashboard.md, line 68 at r5 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
`Disk IOPS in Progress` > `disk IOPS in progress`.
Done.
v2.1/admin-ui-hardware-dashboard.md, line 85 at r5 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
Let's tweak the second sentence to: "You can configure the maximum storage capacity for a given node using the `--store` flag."
Also, let's use backticks for `--store` instead of `<code>--store</code>`, unless there's a good reason.
Done.
v2.1/admin-ui-hardware-dashboard.md, line 87 at r5 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
For readability, use another comma: `... on the disk, including CockroachDB, ...`.
Done.
v2.1/admin-ui-overview-dashboard.md, line 16 at r5 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
While you're here, you might as well change `the SQL Queries graph` to just `the graph`, for concision.
Done.
LGTM, with one remaining nit.
v2.1/admin-ui-hardware-dashboard.md, line 32 at r3 (raw file):
Previously, Amruta-Ranade (Amruta Ranade) wrote…
Done.
Still need to lower-case the `Percent` in the second sentence.
v2.1/admin-ui-hardware-dashboard.md, line 7 at r5 (raw file):
Previously, Amruta-Ranade (Amruta Ranade) wrote…
I agree. I want to contextualize not just the dashboards but each graph wherever I can. In the next milestone, we are planning to reorganize the time series and dashboards, so would it be okay if I take it up holistically then? I think this dashboard might change, and so will the description.
Sounds good. Please just keep a record of the work to revisit, perhaps in the related github issue.
v2.1/admin-ui-hardware-dashboard.md, line 15 at r5 (raw file):
Previously, Amruta-Ranade (Amruta Ranade) wrote…
For now, I am taking Ben's suggestion and going with: "In the node view, the graph shows the amount of CPU resources used by the CockroachDB process for the selected node." Does that look okay? Pete's working on changing the implementation, which will hopefully make it less convoluted to describe.
Hmm, I don't think that text really describes what the graph is showing currently, unfortunately, but if it's what you all agree on for now, I'm fine with it, as long as we revisit.
v2.1/admin-ui-hardware-dashboard.md, line 27 at r5 (raw file):
Previously, Amruta-Ranade (Amruta Ranade) wrote…
Yes.
Again, it'd be great to note this in the related issue so we don't forget.
Summary of changes: