-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sqlstats: include regions in statement_statistics. #92259
Conversation
Part of #89949. Before this change, tenants did not have a way to see which regions their statements were executing in: the SQL Activity UI depends on `cockroach.server.serverpb.NodesUI`, which is not available for tenants, and there doesn't seem to be any way to gather the information via SQL. I am not sure that the approach I've taken here will be the one that makes the most sense architecturally, so I've laid out my thinking and would welcome some advice. An alternative path would be to implement the `NodesUI` endpoint for tenants, to allow the UI to continue working as is. This work has been suggested in #92065, but I am wondering if we perhaps don't want to be talking about "nodes" for serverless tenants at all? In the meantime, the concept of regions _is_ still meaningful for serverless tenants. So, we attach a `regions` array to the `statistics` JSON of `crdb_internal.statement_statistics` bearing the regions of the nodes the statement executed on. Note that, as of this writing, this change only includes the region of the gateway node. We will want to do more, but an immediate solution was not at hand, so I figured it was worth first validating this direction before pushing further. Release note (sql change): A `regions` field was added to the statistics column of `crdb_internal.statement_statistics`, reporting the regions of the nodes on which the statement was executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 13 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd)
-- commits
line 23 at r1:
Seems like a reasonable approach to me and easy for gateway node. You will need to find a good approach to gather the remaining node/regions info, and check if that would cause any performance issues.
pkg/sql/executor_statement_metrics.go
line 188 at r1 (raw file):
// TODO(todd): Is there a way to gather region info from the other nodes // the query traverses? (Compare with getNodesFromPlanner.) While the
can you check how the nodes endpoint does it? maybe something there that you can use as a guide
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd)
pkg/ui/workspaces/cluster-ui/src/util/appStats/appStats.ts
line 199 at r1 (raw file):
: b.last_exec_timestamp, nodes: uniqueLong([...a.nodes, ...b.nodes]), regions: [],
you should be combining the value here, the same way we do for plan gist and idx recommendations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @maryliag and @matthewtodd)
pkg/sql/executor_statement_metrics.go
line 188 at r1 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
can you check how the nodes endpoint does it? maybe something there that you can use as a guide
We already collect the regions information in some cases (when sampling the stmt) for the telemetry, so we could piggy-back on that. This happens in populateQueryLevelStatsAndRegions
and the regions info is stored in instrumentationHelper.regions
. I believe we currently don't have other places where we compute this info.
There is #88498 to improve things further. I just looked into getAllNodeDescriptors
(what we use for determining the region locality) and it appears to not work in multi-tenant environment, so it seems like instrumentationHelper.regions
will be empty for serverless even when we do sample a stmt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd and @yuzefovich)
pkg/ui/workspaces/db-console/src/views/statements/statements.spec.tsx
line 530 at r1 (raw file):
}, nodes: [Long.fromInt(1), Long.fromInt(2), Long.fromInt(3)], regions: [],
better to use actual values, otherwise you won't see anything when testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd and @yuzefovich)
-- commits
line 33 at r1:
isn't this also adding to the system table?
Thank you both for your reviews! Closing this for now in favor of the stop-gap fix in #93268. |
93268: statusccl: temporarily serve /_status/nodes to tenants r=matthewtodd a=matthewtodd Fixes #92065 Part of #89949 Our current UI architecture has many of the SQL Activity pages mapping node or sql instance ids to regions at page load time, using the response from this `/_status/nodes` endpoint to get the information it needs. This approach is problematic for ephemeral serverless tenants but will need a broader reimagining, probably ending up with us storing locality information directly in many of the sqlstats tables, which will probably require something like sending instance localities along in the distsql tracing. An initial exploration in this direction happened in #92259. In the meantime, as a stop-gap measure, we implement a reduced version of this `/_status/nodes` endpoint for tenants, keeping the SQL Activity pages somewhat working for multiregion tenants. Release note: None Co-authored-by: Matthew Todd <[email protected]>
95449: sqlstats: include regions in statement_statistics r=matthewtodd a=matthewtodd Part of #89949. This completes the work abandoned in #92259, properly storing the regions in which a statement was executed, rather than waiting until UI render time to try to map (dangerously ephemeral) node IDs to their respective regions. With this work in place, we will be able to remove both the UI mapping code (usages of this [selector][]) and the stop-gap work of #93268. [selector]: https://github.com/cockroachdb/cockroach/blob/0f6333cbd6c78264a59dc435324c0c33d75ea148/pkg/ui/workspaces/cluster-ui/src/store/nodes/nodes.selectors.ts#L52-L64 Release note (sql change): A regions field was added to the statistics column of crdb_internal.statement_statistics, reporting the regions of the nodes on which the statement was executed. Co-authored-by: Matthew Todd <[email protected]>
Part of #89949.
Before this change, tenants did not have a way to see which regions their statements were executing in: the SQL Activity UI depends on
cockroach.server.serverpb.NodesUI
, which is not available for tenants, and there doesn't seem to be any way to gather the information via SQL.I am not sure that the approach I've taken here will be the one that makes the most sense architecturally, so I've laid out my thinking and would welcome some advice.
An alternative path would be to implement the
NodesUI
endpoint for tenants, to allow the UI to continue working as is. This work has been suggested in #92065, but I am wondering if we perhaps don't want to be talking about "nodes" for serverless tenants at all?In the meantime, the concept of regions is still meaningful for serverless tenants.
So, we attach a
regions
array to thestatistics
JSON ofcrdb_internal.statement_statistics
bearing the regions of the nodes the statement executed on.Note that, as of this writing, this change only includes the region of the gateway node. We will want to do more, but an immediate solution was not at hand, so I figured it was worth first validating this direction before pushing further.
Release note (sql change): A
regions
field was added to the statistics column ofcrdb_internal.statement_statistics
, reporting the regions of the nodes on which the statement was executed.