Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlstats: include regions in statement_statistics. #92259

Closed
wants to merge 1 commit into from
Closed

sqlstats: include regions in statement_statistics. #92259

wants to merge 1 commit into from

Conversation

matthewtodd
Copy link
Contributor

Part of #89949.

Before this change, tenants did not have a way to see which regions their statements were executing in: the SQL Activity UI depends on cockroach.server.serverpb.NodesUI, which is not available for tenants, and there doesn't seem to be any way to gather the information via SQL.

I am not sure that the approach I've taken here will be the one that makes the most sense architecturally, so I've laid out my thinking and would welcome some advice.

An alternative path would be to implement the NodesUI endpoint for tenants, to allow the UI to continue working as is. This work has been suggested in #92065, but I am wondering if we perhaps don't want to be talking about "nodes" for serverless tenants at all?

In the meantime, the concept of regions is still meaningful for serverless tenants.

So, we attach a regions array to the statistics JSON of crdb_internal.statement_statistics bearing the regions of the nodes the statement executed on.

Note that, as of this writing, this change only includes the region of the gateway node. We will want to do more, but an immediate solution was not at hand, so I figured it was worth first validating this direction before pushing further.

Release note (sql change): A regions field was added to the statistics column of crdb_internal.statement_statistics, reporting the regions of the nodes on which the statement was executed.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Part of #89949.

Before this change, tenants did not have a way to see which regions
their statements were executing in: the SQL Activity UI depends on
`cockroach.server.serverpb.NodesUI`, which is not available for tenants,
and there doesn't seem to be any way to gather the information via SQL.

I am not sure that the approach I've taken here will be the one that
makes the most sense architecturally, so I've laid out my thinking and
would welcome some advice.

An alternative path would be to implement the `NodesUI` endpoint for
tenants, to allow the UI to continue working as is. This work has been
suggested in #92065, but I am wondering if we perhaps don't want to be
talking about "nodes" for serverless tenants at all?

In the meantime, the concept of regions _is_ still meaningful for
serverless tenants.

So, we attach a `regions` array to the `statistics` JSON of
`crdb_internal.statement_statistics` bearing the regions of the nodes
the statement executed on.

Note that, as of this writing, this change only includes the region of
the gateway node. We will want to do more, but an immediate solution was
not at hand, so I figured it was worth first validating this direction
before pushing further.

Release note (sql change): A `regions` field was added to the statistics
column of `crdb_internal.statement_statistics`, reporting the regions of
the nodes on which the statement was executed.
@matthewtodd matthewtodd marked this pull request as ready for review November 21, 2022 16:59
@matthewtodd matthewtodd requested a review from a team November 21, 2022 16:59
@matthewtodd matthewtodd requested a review from a team as a code owner November 21, 2022 16:59
Copy link
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 12 of 13 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd)


-- commits line 23 at r1:
Seems like a reasonable approach to me and easy for gateway node. You will need to find a good approach to gather the remaining node/regions info, and check if that would cause any performance issues.


pkg/sql/executor_statement_metrics.go line 188 at r1 (raw file):

	// TODO(todd): Is there a way to gather region info from the other nodes
	//  the query traverses? (Compare with getNodesFromPlanner.) While the

can you check how the nodes endpoint does it? maybe something there that you can use as a guide

Copy link
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd)


pkg/ui/workspaces/cluster-ui/src/util/appStats/appStats.ts line 199 at r1 (raw file):

        : b.last_exec_timestamp,
    nodes: uniqueLong([...a.nodes, ...b.nodes]),
    regions: [],

you should be combining the value here, the same way we do for plan gist and idx recommendations

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @maryliag and @matthewtodd)


pkg/sql/executor_statement_metrics.go line 188 at r1 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

can you check how the nodes endpoint does it? maybe something there that you can use as a guide

We already collect the regions information in some cases (when sampling the stmt) for the telemetry, so we could piggy-back on that. This happens in populateQueryLevelStatsAndRegions and the regions info is stored in instrumentationHelper.regions. I believe we currently don't have other places where we compute this info.

There is #88498 to improve things further. I just looked into getAllNodeDescriptors (what we use for determining the region locality) and it appears to not work in multi-tenant environment, so it seems like instrumentationHelper.regions will be empty for serverless even when we do sample a stmt.

Copy link
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd and @yuzefovich)


pkg/ui/workspaces/db-console/src/views/statements/statements.spec.tsx line 530 at r1 (raw file):

    },
    nodes: [Long.fromInt(1), Long.fromInt(2), Long.fromInt(3)],
    regions: [],

better to use actual values, otherwise you won't see anything when testing

Copy link
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd and @yuzefovich)


-- commits line 33 at r1:
isn't this also adding to the system table?

@matthewtodd
Copy link
Contributor Author

Thank you both for your reviews! Closing this for now in favor of the stop-gap fix in #93268.

@matthewtodd matthewtodd closed this Dec 8, 2022
@matthewtodd matthewtodd deleted the sqlstats-regions branch December 8, 2022 17:24
craig bot pushed a commit that referenced this pull request Dec 9, 2022
93268: statusccl: temporarily serve /_status/nodes to tenants r=matthewtodd a=matthewtodd

Fixes #92065
Part of #89949

Our current UI architecture has many of the SQL Activity pages mapping node or sql instance ids to regions at page load time, using the response from this `/_status/nodes` endpoint to get the information it needs.

This approach is problematic for ephemeral serverless tenants but will need a broader reimagining, probably ending up with us storing locality information directly in many of the sqlstats tables, which will probably require something like sending instance localities along in the distsql tracing. An initial exploration in this direction happened in #92259.

In the meantime, as a stop-gap measure, we implement a reduced version of this `/_status/nodes` endpoint for tenants, keeping the SQL Activity pages somewhat working for multiregion tenants.

Release note: None

Co-authored-by: Matthew Todd <[email protected]>
craig bot pushed a commit that referenced this pull request Mar 6, 2023
95449: sqlstats: include regions in statement_statistics r=matthewtodd a=matthewtodd

Part of #89949.

This completes the work abandoned in #92259, properly storing the regions in which a statement was executed, rather than waiting until UI render time to try to map (dangerously ephemeral) node IDs to their respective regions.

With this work in place, we will be able to remove both the UI mapping code (usages of this [selector][]) and the stop-gap work of #93268.

[selector]: https://github.com/cockroachdb/cockroach/blob/0f6333cbd6c78264a59dc435324c0c33d75ea148/pkg/ui/workspaces/cluster-ui/src/store/nodes/nodes.selectors.ts#L52-L64

Release note (sql change): A regions field was added to the statistics column of crdb_internal.statement_statistics, reporting the regions of the nodes on which the statement was executed.

Co-authored-by: Matthew Todd <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants