sql: query performance regression in pg_catalog query that resolves database objects and their descriptions #102613

dikshant · 2023-04-28T15:41:49Z

This query is really slow and seems to be a regression in 23.1 compared to 22.2.8:

SELECT
  current_database(), n.nspname, relname, d.description
FROM
  pg_description AS d
  INNER JOIN pg_class AS c ON d.objoid = c.oid
  INNER JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE
  d.objsubid = 0
  AND n.nspname
    NOT IN (
        'gp_toolkit':::STRING:::NAME,
        'information_schema':::STRING:::NAME,
        'pgagent':::STRING:::NAME
      )
  AND n.nspname NOT LIKE 'pg_%';

Explain:

root@localhost:26257/defaultdb> EXPLAIN ANALYZE (VERBOSE) SELECT
                             ->   current_database(), n.nspname, relname, d.description
                             -> FROM
                             ->   pg_description AS d
                             ->   INNER JOIN pg_class AS c ON d.objoid = c.oid
                             ->   INNER JOIN pg_namespace AS n ON n.oid = c.relnamespace
                             -> WHERE
                             ->   d.objsubid = 0
                             ->   AND n.nspname
                             ->     NOT IN (
                             ->         'gp_toolkit':::STRING:::NAME,
                             ->         'information_schema':::STRING:::NAME,
                             ->         'pgagent':::STRING:::NAME
                             ->       )
                             ->   AND n.nspname NOT LIKE 'pg_%';
                                                        info
--------------------------------------------------------------------------------------------------------------------
  planning time: 1ms
  execution time: 4.5s
  distribution: local
  vectorized: true
  maximum memory usage: 310 KiB
  network usage: 0 B (0 messages)
  sql cpu time: 4.5s
  regions: local

  • render
  │ columns: (current_database, nspname, relname, description)
  │ render current_database: 'defaultdb'
  │ render description: description
  │ render relname: relname
  │ render nspname: nspname
  │
  └── • hash join (inner)
      │ columns: (oid, nspname, objoid, objsubid, description, oid, relname, relnamespace)
      │ nodes: n1
      │ regions: local
      │ actual row count: 76
      │ vectorized batch count: 2
      │ estimated max memory allocated: 180 KiB
      │ estimated max sql temp disk usage: 0 B
      │ sql cpu time: 177µs
      │ estimated row count: 156 (missing stats)
      │ equality: (oid) = (relnamespace)
      │
      ├── • filter
      │   │ columns: (oid, nspname)
      │   │ nodes: n1
      │   │ regions: local
      │   │ actual row count: 2
      │   │ vectorized batch count: 2
      │   │ sql cpu time: 9µs
      │   │ estimated row count: 111 (missing stats)
      │   │ filter: (nspname NOT IN ('gp_toolkit', 'information_schema', 'pgagent')) AND (nspname NOT LIKE 'pg_%')
      │   │
      │   └── • virtual table
      │         columns: (oid, nspname)
      │         nodes: n1
      │         regions: local
      │         actual row count: 5
      │         vectorized batch count: 3
      │         sql cpu time: 69µs
      │         estimated row count: 1,000 (missing stats)
      │         table: pg_namespace@primary
      │
      └── • virtual table lookup join (inner)
          │ columns: (objoid, objsubid, description, oid, relname, relnamespace)
          │ nodes: n1
          │ regions: local
          │ actual row count: 294
          │ vectorized batch count: 9
          │ sql cpu time: 4.5s
          │ estimated row count: 99 (missing stats)
          │ table: pg_class@pg_class_oid_idx
          │ equality: (objoid) = (oid)
          │
          └── • filter
              │ columns: (objoid, objsubid, description)
              │ nodes: n1
              │ regions: local
              │ actual row count: 2,230
              │ vectorized batch count: 12
              │ sql cpu time: 14µs
              │ estimated row count: 10 (missing stats)
              │ filter: objsubid = 0
              │
              └── • virtual table
                    columns: (objoid, objsubid, description)
                    nodes: n1
                    regions: local
                    actual row count: 2,230
                    vectorized batch count: 12
                    sql cpu time: 2ms
                    estimated row count: 1,000 (missing stats)
                    table: pg_description@primary
(78 rows)

Time: 4.535s total (execution 4.535s / network 0.000s)

The slow part

      └── • virtual table lookup join (inner)
          │ columns: (objoid, objsubid, description, oid, relname, relnamespace)
          │ nodes: n1
          │ regions: local
          │ actual row count: 294
          │ vectorized batch count: 9
          │ sql cpu time: 4.5s
          │ estimated row count: 99 (missing stats)
          │ table: pg_class@pg_class_oid_idx
          │ equality: (objoid) = (oid)

However on 22.2.8 this is much faster

root@localhost:26257/defaultdb> EXPLAIN ANALYZE (VERBOSE) SELECT                                                                            current_database(), n.nspname, relname, d.description                                                                                   FROM                                                                                                                                        pg_description AS d                                                                                                                       INNER JOIN pg_class AS c ON d.objoid = c.oid                                                                                              INNER JOIN pg_namespace AS n ON n.oid = c.relnamespace                                                                                  WHERE                                                                                                                                       d.objsubid = 0                                                                                                                            AND n.nspname                                                                                                                               NOT IN (                                                                                                                                      'gp_toolkit':::STRING:::NAME,                                                                                                             'information_schema':::STRING:::NAME,                                                                                                     'pgagent':::STRING:::NAME                                                                                                               )                                                                                                                                     AND n.nspname NOT LIKE 'pg_%';
                                                        info
--------------------------------------------------------------------------------------------------------------------
  planning time: 834µs
  execution time: 18ms
  distribution: local
  vectorized: true
  maximum memory usage: 300 KiB
  network usage: 0 B (0 messages)
  regions: local

  • render
  │ columns: (current_database, nspname, relname, description)
  │ render current_database: 'defaultdb'
  │ render description: description
  │ render relname: relname
  │ render nspname: nspname
  │
  └── • hash join (inner)
      │ columns: (oid, nspname, objoid, objsubid, description, oid, relname, relnamespace)
      │ nodes: n1
      │ regions: local
      │ actual row count: 64
      │ vectorized batch count: 2
      │ estimated max memory allocated: 190 KiB
      │ estimated max sql temp disk usage: 0 B
      │ estimated row count: 156 (missing stats)
      │ equality: (oid) = (relnamespace)
      │
      ├── • filter
      │   │ columns: (oid, nspname)
      │   │ nodes: n1
      │   │ regions: local
      │   │ actual row count: 2
      │   │ vectorized batch count: 2
      │   │ estimated row count: 111 (missing stats)
      │   │ filter: (nspname NOT IN ('gp_toolkit', 'information_schema', 'pgagent')) AND (nspname NOT LIKE 'pg_%')
      │   │
      │   └── • virtual table
      │         columns: (oid, nspname)
      │         nodes: n1
      │         regions: local
      │         actual row count: 5
      │         vectorized batch count: 3
      │         estimated row count: 1,000 (missing stats)
      │         table: pg_namespace@primary
      │
      └── • virtual table lookup join (inner)
          │ columns: (objoid, objsubid, description, oid, relname, relnamespace)
          │ nodes: n1
          │ regions: local
          │ actual row count: 282
          │ vectorized batch count: 9
          │ estimated row count: 99 (missing stats)
          │ table: pg_class@pg_class_oid_idx
          │ equality: (objoid) = (oid)
          │
          └── • filter
              │ columns: (objoid, objsubid, description)
              │ nodes: n1
              │ regions: local
              │ actual row count: 282
              │ vectorized batch count: 9
              │ estimated row count: 10 (missing stats)
              │ filter: objsubid = 0
              │
              └── • virtual table
                    columns: (objoid, objsubid, description)
                    nodes: n1
                    regions: local
                    actual row count: 282
                    vectorized batch count: 9
                    estimated row count: 1,000 (missing stats)
                    table: pg_description@primary
(71 rows)


Time: 20ms total (execution 19ms / network 0ms)

root@localhost:26257/defaultdb>

Statement Bundle

Commit used to build cockroach: f8523ec

Jira issue: CRDB-27560

The text was updated successfully, but these errors were encountered:

mgartner · 2023-04-28T16:07:39Z

Adding a S-2 label because this is a regression from v22.2, and I want to prioritize looking into it.

mgartner · 2023-04-28T16:14:25Z

Here's a simplified reproduction:

SELECT relname, d.description
FROM pg_description AS d
  INNER JOIN pg_class AS c ON d.objoid = c.oid
WHERE d.objsubid = 0;

On v22.2.6 in the demo it takes ~10ms. On master it takes ~4.5s.

rafiss · 2023-04-30T20:35:45Z

when we fix this, let's be sure to add the repro into pkg/bench/rttanalysis/orm_queries_bench_test.go

DrewKimball · 2023-05-05T15:04:58Z

CPU profile one a single node repeatedly running the simplified repro:

rafiss · 2023-05-09T18:31:50Z

We saw this problem come up in #102851. @fqazi tracked it down to the fact that populating pg_description fetches all descriptors to load comments. We'll work on this in our next milestone.

Previously, we refactored the code to fetch comments, descriptors, and zone config together in all cases. A side effect of this change was that the crdb_internal.kv_system_comments table was substantially slower for larger tables leading to big regressions. To address this, this patch adds a method of only fetching comments within the collections. Informs: cockroachdb#102851 Fixes: cockroachdb#102613 Release note (bug fix): Optimize over-head of pg_catalog.pg_description and pg_catalog.pg_shdescription, which can lead to performance regression relative to 22.2

103106: sql/catalog: avoid fetching descriptors when fetching comments r=fqazi a=fqazi Previously, we refactored the code to fetch comments, descriptors, and zone config together in all cases. A side effect of this change was that the crdb_internal.kv_system_comments table was substantially slower for larger tables leading to big regressions. To address this, this patch adds a method of only fetching comments within the collections. Additionally, we added many builtins into both functions, which meant our existing virtual index look-up could end up falling back too frequently to full scans. Also, a virtual index is added on the kv_catalog_comments table for fast point look-ups when referencing descriptors. Informs: #102851 Fixes: #102613 Release note (bug fix): Optimize over-head of pg_catalog.pg_description and pg_catalog.pg_shdescription, which can lead to performance regression relative to 22.2 Co-authored-by: Faizan Qazi <[email protected]>

Previously, we refactored the code to fetch comments, descriptors, and zone config together in all cases. A side effect of this change was that the crdb_internal.kv_system_comments table was substantially slower for larger tables leading to big regressions. To address this, this patch adds a method of only fetching comments within the collections. Informs: #102851 Fixes: #102613 Release note (bug fix): Optimize over-head of pg_catalog.pg_description and pg_catalog.pg_shdescription, which can lead to performance regression relative to 22.2

dikshant added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-pgcatalog T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) T-sql-queries SQL Queries Team labels Apr 28, 2023

dikshant changed the title ~~sql: pg_catalog query that resolves database objects and their descriptions is very slow due to a virtual lookup join~~ sql: query performance regression in pg_catalog query that resolves database objects and their descriptions Apr 28, 2023

mgartner added the S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors label Apr 28, 2023

dikshant mentioned this issue Apr 28, 2023

roachtest: add roachtest/workload for testing latency of ORM introspection queries #96359

Closed

rafiss assigned fqazi May 9, 2023

fqazi mentioned this issue May 10, 2023

sql/catalog: avoid fetching descriptors when fetching comments #102991

Closed

fqazi mentioned this issue May 11, 2023

sql/catalog: avoid fetching descriptors when fetching comments #103106

Merged

craig bot closed this as completed in c3b0af2 May 13, 2023

exalate-issue-sync bot removed the T-sql-queries SQL Queries Team label May 13, 2023

blathers-crl bot mentioned this issue May 15, 2023

release-23.1: sql/catalog: avoid fetching descriptors when fetching comments #103331

Merged

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to Done in SQL Queries Jul 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: query performance regression in pg_catalog query that resolves database objects and their descriptions #102613

sql: query performance regression in pg_catalog query that resolves database objects and their descriptions #102613

dikshant commented Apr 28, 2023 •

edited

Loading

mgartner commented Apr 28, 2023

mgartner commented Apr 28, 2023

rafiss commented Apr 30, 2023

DrewKimball commented May 5, 2023

rafiss commented May 9, 2023

sql: query performance regression in pg_catalog query that resolves database objects and their descriptions #102613

sql: query performance regression in pg_catalog query that resolves database objects and their descriptions #102613

Comments

dikshant commented Apr 28, 2023 • edited Loading

mgartner commented Apr 28, 2023

mgartner commented Apr 28, 2023

rafiss commented Apr 30, 2023

DrewKimball commented May 5, 2023

rafiss commented May 9, 2023

dikshant commented Apr 28, 2023 •

edited

Loading