Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: include system.jobs in auto stats collection #102213

Closed
dt opened this issue Apr 25, 2023 · 3 comments · Fixed by #102594
Closed

stats: include system.jobs in auto stats collection #102213

dt opened this issue Apr 25, 2023 · 3 comments · Fixed by #102594
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster T-sql-queries SQL Queries Team

Comments

@dt
Copy link
Member

dt commented Apr 25, 2023

In #80887 we excluded system.jobs from automatic stats collection, but this is a system table that has 5 different indexes and is used by a wide variety of different queries coming from different places -- the jobs system itself, inspection tools, virtual tables, join with other jobs-info, etc, with a variety of different predicates. Choosing the best plan and index for each of these is obviously going to depend on the predicate and join and data distribution, but this is exactly the kind of information gathered statistics would provide the optimizer.

The lack of statistics here was recently implicated in an incident where a query on system.jobs that had two predicates -- one on jobs.job_type that it be one of x values and one on jobs.status that it be be one of y values; the plan chosen picked the job_type index instead of the job status index, even though there were 130,000 jobs of the matching type and only ~10 of the matching status.

Jira issue: CRDB-27344

@dt dt added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team labels Apr 25, 2023
@dt dt added the O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster label Apr 25, 2023
@dt
Copy link
Member Author

dt commented Apr 25, 2023

GH is claiming I removed those labels but I don't think I did? maybe some stale browser state or something. Putting them back.

@dt
Copy link
Member Author

dt commented Apr 25, 2023

That said, I'm not sure this is a GA-blocker: all the queries we are running now are apparently running well enough, and we've manually index-hinted the one that wasn't, so I while I do suspect that in the long term we want stats on this table I don't feel strongly that we do in 23.1

@DrewKimball
Copy link
Collaborator

DrewKimball commented Apr 25, 2023

[yahor] We split off the most active/contentious part of the jobs table, so we may be able to reconsider collecting stats for the jobs table to ensure good index selection. One concern is that the auto-stats query that checks for existing auto-stats checks the jobs table.
[becca] We might start using views instead of virtual tables in the future, which might make statistics unnecessary. Worried about backporting a change to 23.1.0.

Removing the GA-blocker label. Plan is to enable stats on the jobs table for master, then consider back-porting to 23.1.1.

rytaft added a commit to rytaft/cockroach that referenced this issue Apr 28, 2023
Fixes cockroachdb#102213

Release note (performance improvement): We now automaticlaly collect table
statistics on the system.jobs table, which will enable the optimizer to
produce better query plans for internal queries that access the system.jobs
table. This may result in better performance of the system.
craig bot pushed a commit that referenced this issue Apr 28, 2023
102594: sql/stats: collect automatic table statistics on the system.jobs table r=rytaft a=rytaft

Fixes #102213

Release note (performance improvement): We now automatically collect table statistics on the `system.jobs` table, which will enable the optimizer to produce better query plans for internal queries that access the `system.jobs` table. This may result in better performance of the system.

Co-authored-by: Rebecca Taft <[email protected]>
@craig craig bot closed this as completed in a6e2818 Apr 28, 2023
blathers-crl bot pushed a commit that referenced this issue Apr 28, 2023
Fixes #102213

Release note (performance improvement): We now automatically collect table
statistics on the system.jobs table, which will enable the optimizer to
produce better query plans for internal queries that access the system.jobs
table. This may result in better performance of the system.
craig bot pushed a commit that referenced this issue Apr 28, 2023
102637: release-23.1: sql/stats: collect automatic table statistics on the system.jobs table r=rytaft a=blathers-crl[bot]

Backport 1/1 commits from #102594 on behalf of `@rytaft.`

/cc `@cockroachdb/release`

----

Fixes #102213

Release note (performance improvement): We now automatically collect table statistics on the `system.jobs` table, which will enable the optimizer to produce better query plans for internal queries that access the `system.jobs` table. This may result in better performance of the system.

----

Release justification: with changes to the jobs infrastructure, it's more important to have accurate stats on the jobs table. 

Co-authored-by: Rebecca Taft <[email protected]>
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants