Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: query performance regression in pg_catalog query that resolves database objects and their descriptions #102613

Closed
dikshant opened this issue Apr 28, 2023 · 5 comments · Fixed by #103106
Assignees
Labels
A-sql-pgcatalog C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@dikshant
Copy link

dikshant commented Apr 28, 2023

This query is really slow and seems to be a regression in 23.1 compared to 22.2.8:

SELECT
  current_database(), n.nspname, relname, d.description
FROM
  pg_description AS d
  INNER JOIN pg_class AS c ON d.objoid = c.oid
  INNER JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE
  d.objsubid = 0
  AND n.nspname
    NOT IN (
        'gp_toolkit':::STRING:::NAME,
        'information_schema':::STRING:::NAME,
        'pgagent':::STRING:::NAME
      )
  AND n.nspname NOT LIKE 'pg_%';

Explain:

root@localhost:26257/defaultdb> EXPLAIN ANALYZE (VERBOSE) SELECT
                             ->   current_database(), n.nspname, relname, d.description
                             -> FROM
                             ->   pg_description AS d
                             ->   INNER JOIN pg_class AS c ON d.objoid = c.oid
                             ->   INNER JOIN pg_namespace AS n ON n.oid = c.relnamespace
                             -> WHERE
                             ->   d.objsubid = 0
                             ->   AND n.nspname
                             ->     NOT IN (
                             ->         'gp_toolkit':::STRING:::NAME,
                             ->         'information_schema':::STRING:::NAME,
                             ->         'pgagent':::STRING:::NAME
                             ->       )
                             ->   AND n.nspname NOT LIKE 'pg_%';
                                                        info
--------------------------------------------------------------------------------------------------------------------
  planning time: 1ms
  execution time: 4.5s
  distribution: local
  vectorized: true
  maximum memory usage: 310 KiB
  network usage: 0 B (0 messages)
  sql cpu time: 4.5s
  regions: local

  • render
  │ columns: (current_database, nspname, relname, description)
  │ render current_database: 'defaultdb'
  │ render description: description
  │ render relname: relname
  │ render nspname: nspname
  │
  └── • hash join (inner)
      │ columns: (oid, nspname, objoid, objsubid, description, oid, relname, relnamespace)
      │ nodes: n1
      │ regions: local
      │ actual row count: 76
      │ vectorized batch count: 2
      │ estimated max memory allocated: 180 KiB
      │ estimated max sql temp disk usage: 0 B
      │ sql cpu time: 177µs
      │ estimated row count: 156 (missing stats)
      │ equality: (oid) = (relnamespace)
      │
      ├── • filter
      │   │ columns: (oid, nspname)
      │   │ nodes: n1
      │   │ regions: local
      │   │ actual row count: 2
      │   │ vectorized batch count: 2
      │   │ sql cpu time: 9µs
      │   │ estimated row count: 111 (missing stats)
      │   │ filter: (nspname NOT IN ('gp_toolkit', 'information_schema', 'pgagent')) AND (nspname NOT LIKE 'pg_%')
      │   │
      │   └── • virtual table
      │         columns: (oid, nspname)
      │         nodes: n1
      │         regions: local
      │         actual row count: 5
      │         vectorized batch count: 3
      │         sql cpu time: 69µs
      │         estimated row count: 1,000 (missing stats)
      │         table: pg_namespace@primary
      │
      └── • virtual table lookup join (inner)
          │ columns: (objoid, objsubid, description, oid, relname, relnamespace)
          │ nodes: n1
          │ regions: local
          │ actual row count: 294
          │ vectorized batch count: 9
          │ sql cpu time: 4.5s
          │ estimated row count: 99 (missing stats)
          │ table: pg_class@pg_class_oid_idx
          │ equality: (objoid) = (oid)
          │
          └── • filter
              │ columns: (objoid, objsubid, description)
              │ nodes: n1
              │ regions: local
              │ actual row count: 2,230
              │ vectorized batch count: 12
              │ sql cpu time: 14µs
              │ estimated row count: 10 (missing stats)
              │ filter: objsubid = 0
              │
              └── • virtual table
                    columns: (objoid, objsubid, description)
                    nodes: n1
                    regions: local
                    actual row count: 2,230
                    vectorized batch count: 12
                    sql cpu time: 2ms
                    estimated row count: 1,000 (missing stats)
                    table: pg_description@primary
(78 rows)

Time: 4.535s total (execution 4.535s / network 0.000s)

The slow part

      └── • virtual table lookup join (inner)
          │ columns: (objoid, objsubid, description, oid, relname, relnamespace)
          │ nodes: n1
          │ regions: local
          │ actual row count: 294
          │ vectorized batch count: 9
          │ sql cpu time: 4.5s
          │ estimated row count: 99 (missing stats)
          │ table: pg_class@pg_class_oid_idx
          │ equality: (objoid) = (oid)

However on 22.2.8 this is much faster

root@localhost:26257/defaultdb> EXPLAIN ANALYZE (VERBOSE) SELECT                                                                            current_database(), n.nspname, relname, d.description                                                                                   FROM                                                                                                                                        pg_description AS d                                                                                                                       INNER JOIN pg_class AS c ON d.objoid = c.oid                                                                                              INNER JOIN pg_namespace AS n ON n.oid = c.relnamespace                                                                                  WHERE                                                                                                                                       d.objsubid = 0                                                                                                                            AND n.nspname                                                                                                                               NOT IN (                                                                                                                                      'gp_toolkit':::STRING:::NAME,                                                                                                             'information_schema':::STRING:::NAME,                                                                                                     'pgagent':::STRING:::NAME                                                                                                               )                                                                                                                                     AND n.nspname NOT LIKE 'pg_%';
                                                        info
--------------------------------------------------------------------------------------------------------------------
  planning time: 834µs
  execution time: 18ms
  distribution: local
  vectorized: true
  maximum memory usage: 300 KiB
  network usage: 0 B (0 messages)
  regions: local

  • render
  │ columns: (current_database, nspname, relname, description)
  │ render current_database: 'defaultdb'
  │ render description: description
  │ render relname: relname
  │ render nspname: nspname
  │
  └── • hash join (inner)
      │ columns: (oid, nspname, objoid, objsubid, description, oid, relname, relnamespace)
      │ nodes: n1
      │ regions: local
      │ actual row count: 64
      │ vectorized batch count: 2
      │ estimated max memory allocated: 190 KiB
      │ estimated max sql temp disk usage: 0 B
      │ estimated row count: 156 (missing stats)
      │ equality: (oid) = (relnamespace)
      │
      ├── • filter
      │   │ columns: (oid, nspname)
      │   │ nodes: n1
      │   │ regions: local
      │   │ actual row count: 2
      │   │ vectorized batch count: 2
      │   │ estimated row count: 111 (missing stats)
      │   │ filter: (nspname NOT IN ('gp_toolkit', 'information_schema', 'pgagent')) AND (nspname NOT LIKE 'pg_%')
      │   │
      │   └── • virtual table
      │         columns: (oid, nspname)
      │         nodes: n1
      │         regions: local
      │         actual row count: 5
      │         vectorized batch count: 3
      │         estimated row count: 1,000 (missing stats)
      │         table: pg_namespace@primary
      │
      └── • virtual table lookup join (inner)
          │ columns: (objoid, objsubid, description, oid, relname, relnamespace)
          │ nodes: n1
          │ regions: local
          │ actual row count: 282
          │ vectorized batch count: 9
          │ estimated row count: 99 (missing stats)
          │ table: pg_class@pg_class_oid_idx
          │ equality: (objoid) = (oid)
          │
          └── • filter
              │ columns: (objoid, objsubid, description)
              │ nodes: n1
              │ regions: local
              │ actual row count: 282
              │ vectorized batch count: 9
              │ estimated row count: 10 (missing stats)
              │ filter: objsubid = 0
              │
              └── • virtual table
                    columns: (objoid, objsubid, description)
                    nodes: n1
                    regions: local
                    actual row count: 282
                    vectorized batch count: 9
                    estimated row count: 1,000 (missing stats)
                    table: pg_description@primary
(71 rows)


Time: 20ms total (execution 19ms / network 0ms)

root@localhost:26257/defaultdb>

Statement Bundle

Commit used to build cockroach: f8523ec

Jira issue: CRDB-27560

@dikshant dikshant added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-pgcatalog T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) T-sql-queries SQL Queries Team labels Apr 28, 2023
@dikshant dikshant changed the title sql: pg_catalog query that resolves database objects and their descriptions is very slow due to a virtual lookup join sql: query performance regression in pg_catalog query that resolves database objects and their descriptions Apr 28, 2023
@mgartner mgartner added the S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors label Apr 28, 2023
@mgartner
Copy link
Collaborator

Adding a S-2 label because this is a regression from v22.2, and I want to prioritize looking into it.

@mgartner
Copy link
Collaborator

Here's a simplified reproduction:

SELECT relname, d.description
FROM pg_description AS d
  INNER JOIN pg_class AS c ON d.objoid = c.oid
WHERE d.objsubid = 0;

On v22.2.6 in the demo it takes ~10ms. On master it takes ~4.5s.

@rafiss
Copy link
Collaborator

rafiss commented Apr 30, 2023

when we fix this, let's be sure to add the repro into pkg/bench/rttanalysis/orm_queries_bench_test.go

@DrewKimball
Copy link
Collaborator

CPU profile one a single node repeatedly running the simplified repro:
Screenshot 2023-05-05 at 9 02 57 AM

@rafiss
Copy link
Collaborator

rafiss commented May 9, 2023

We saw this problem come up in #102851. @fqazi tracked it down to the fact that populating pg_description fetches all descriptors to load comments. We'll work on this in our next milestone.

fqazi added a commit to fqazi/cockroach that referenced this issue May 10, 2023
Previously, we refactored the code to fetch comments, descriptors,
and zone config together in all cases. A side effect of this change
was that the crdb_internal.kv_system_comments table was substantially
slower for larger tables leading to big regressions. To address this,
this patch adds a method of only fetching comments within the
collections.

Informs: cockroachdb#102851
Fixes: cockroachdb#102613

Release note (bug fix): Optimize over-head of
pg_catalog.pg_description and pg_catalog.pg_shdescription, which
can lead to performance regression relative to 22.2
fqazi added a commit to fqazi/cockroach that referenced this issue May 10, 2023
Previously, we refactored the code to fetch comments, descriptors,
and zone config together in all cases. A side effect of this change
was that the crdb_internal.kv_system_comments table was substantially
slower for larger tables leading to big regressions. To address this,
this patch adds a method of only fetching comments within the
collections.

Informs: cockroachdb#102851
Fixes: cockroachdb#102613

Release note (bug fix): Optimize over-head of
pg_catalog.pg_description and pg_catalog.pg_shdescription, which
can lead to performance regression relative to 22.2
fqazi added a commit to fqazi/cockroach that referenced this issue May 11, 2023
Previously, we refactored the code to fetch comments, descriptors,
and zone config together in all cases. A side effect of this change
was that the crdb_internal.kv_system_comments table was substantially
slower for larger tables leading to big regressions. To address this,
this patch adds a method of only fetching comments within the
collections.

Informs: cockroachdb#102851
Fixes: cockroachdb#102613

Release note (bug fix): Optimize over-head of
pg_catalog.pg_description and pg_catalog.pg_shdescription, which
can lead to performance regression relative to 22.2
fqazi added a commit to fqazi/cockroach that referenced this issue May 11, 2023
Previously, we refactored the code to fetch comments, descriptors,
and zone config together in all cases. A side effect of this change
was that the crdb_internal.kv_system_comments table was substantially
slower for larger tables leading to big regressions. To address this,
this patch adds a method of only fetching comments within the
collections.

Informs: cockroachdb#102851
Fixes: cockroachdb#102613

Release note (bug fix): Optimize over-head of
pg_catalog.pg_description and pg_catalog.pg_shdescription, which
can lead to performance regression relative to 22.2
craig bot pushed a commit that referenced this issue May 13, 2023
103106: sql/catalog: avoid fetching descriptors when fetching comments r=fqazi a=fqazi

Previously, we refactored the code to fetch comments, descriptors, and zone config together in all cases. A side effect of this change was that the crdb_internal.kv_system_comments table was substantially slower for larger tables leading to big regressions. To address this, this patch adds a method of only fetching comments within the collections. 

Additionally, we added many builtins into both functions, which meant our existing virtual index look-up could end up falling back too frequently to full scans. Also, a virtual index is added on the kv_catalog_comments table for fast point look-ups when referencing descriptors.


Informs: #102851
Fixes: #102613

Release note (bug fix): Optimize over-head of
pg_catalog.pg_description and pg_catalog.pg_shdescription, which can lead to performance regression relative to 22.2

Co-authored-by: Faizan Qazi <[email protected]>
@craig craig bot closed this as completed in c3b0af2 May 13, 2023
@exalate-issue-sync exalate-issue-sync bot removed the T-sql-queries SQL Queries Team label May 13, 2023
blathers-crl bot pushed a commit that referenced this issue May 15, 2023
Previously, we refactored the code to fetch comments, descriptors,
and zone config together in all cases. A side effect of this change
was that the crdb_internal.kv_system_comments table was substantially
slower for larger tables leading to big regressions. To address this,
this patch adds a method of only fetching comments within the
collections.

Informs: #102851
Fixes: #102613

Release note (bug fix): Optimize over-head of
pg_catalog.pg_description and pg_catalog.pg_shdescription, which
can lead to performance regression relative to 22.2
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-pgcatalog C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
Archived in project
5 participants