Use numrows_pre_compression in approx row count #6365

nikkhils · 2023-11-30T15:26:29Z

The approximate_row_count function was using the reltuples value from compressed chunks and multiplying it by 1000, the default batch size. This led to a huge skew between the actual row count and the approximate one. We now use the numrows_pre_compression value from the timescaledb catalog, which accurately represents the number of rows before compression.
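To make the skew concrete, here is a minimal Python sketch of the two estimates. It is a toy model, not the extension's code: the function names, the `compress` helper, and the idea of per-segment batches that may be only partially filled are all assumptions made for illustration. Only the factor of 1000 and the use of a pre-compression row count come from the description above.

```python
# Toy model only: none of these names are TimescaleDB APIs.
DEFAULT_BATCH_SIZE = 1000  # default batch size mentioned in the description


def compress(segment_sizes):
    """Hypothetical compression: each segment is split into batches of at
    most DEFAULT_BATCH_SIZE rows. Returns (compressed_tuples, original_rows)."""
    # -(-n // d) is integer ceiling division
    batches = sum(-(-n // DEFAULT_BATCH_SIZE) for n in segment_sizes)
    return batches, sum(segment_sizes)


def old_estimate(compressed_reltuples):
    """Old approach: assume every compressed tuple holds a full batch."""
    return compressed_reltuples * DEFAULT_BATCH_SIZE


def new_estimate(numrows_pre_compression):
    """New approach: use the row count recorded before compression."""
    return numrows_pre_compression


if __name__ == "__main__":
    # Three segments with 250, 40 and 1200 rows (made-up numbers)
    batches, rows = compress([250, 40, 1200])
    print("compressed tuples:", batches)          # 4
    print("actual rows:", rows)                   # 1490
    print("old estimate:", old_estimate(batches))  # 4000
    print("new estimate:", new_estimate(rows))     # 1490
```

In this model the old estimate is only right when every batch is completely full; any partially filled batch inflates it, while the recorded pre-compression count is unaffected.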

github-actions · 2023-11-30T15:26:51Z

@erimatnor, @mahipv: please review this pull request.

Powered by pull-review

codecov · 2023-11-30T15:36:54Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (ef030d2) 86.96% compared to head (e25a779) 82.41%.

❗ Current head e25a779 differs from pull request most recent head 6800284. Consider uploading reports for the commit 6800284 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6365      +/-   ##
==========================================
- Coverage   86.96%   82.41%   -4.56%     
==========================================
  Files         249      249              
  Lines       57966    57966              
  Branches    12903    12901       -2     
==========================================
- Hits        50411    47773    -2638     
- Misses       5176     6765    +1589     
- Partials     2379     3428    +1049

fabriziomello

LGTM.

Just one comment: don't reference SDC issues in this public repo, because other users don't have access to them. Either create a corresponding issue on the public repository or give more details in the PR about the problem you're solving. :-)

nikkhils · 2023-12-01T04:31:38Z

> LGTM.
>
> Just one comment: don't reference SDC issues in this public repo, because other users don't have access to them. Either create a corresponding issue on the public repository or give more details in the PR about the problem you're solving. :-)

Yeah, this is mostly for cross-linking with the SDC to allow its auto-closure. It has been removed from the original commit.

mkindahl

The reference to the internal support case does not seem to be useful so you can remove it. Also wondering about the test coverage.

mkindahl · 2023-12-01T10:26:54Z

tsl/test/expected/compression.out

 SELECT approximate_row_count('stattest');
 approximate_row_count 
 -----------------------
-                     0
+                    26


You might want to add a test that compares this with the actual number of rows counted explicitly. For cases where you have serial execution, you should get the same value.

The test also seems to be missing some cases where you have a mix of uncompressed and compressed rows in a chunk, so might be good to verify that.

nikkhils · 2023-12-04

@mkindahl Many additional tests have now been added to this PR.

nikkhils · 2023-12-01T11:13:38Z

> The reference to the internal support case does not seem to be useful so you can remove it. Also wondering about the test coverage.

Removed.

This release contains bug fixes since the 2.13.0 release. We recommend that you upgrade at the next available opportunity.

**Bugfixes**
* #6365 Use numrows_pre_compression in approximate row count
* #6377 Use processed group clauses in PG16
* #6384 Change bgw_log_level to use PGC_SUSET
* #6393 Disable vectorized sum for expressions.
* #6405 Read CAgg watermark from materialized data
* #6408 Fix groupby pathkeys for gapfill in PG16
* #6428 Fix index matching during DML decompression
* #6439 Fix compressed chunk permission handling on PG16
* #6443 Fix lost concurrent CAgg updates
* #6454 Fix unique expression indexes on compressed chunks
* #6465 Fix use of freed path in decompression sort logic

**Thanks**
* @MA-MacDonald for reporting an issue with gapfill in PG16
* @aarondglover for reporting an issue with unique expression indexes on compressed chunks
* @adriangb for reporting an issue with security barrier views on PG16

nikkhils self-assigned this Nov 30, 2023

github-actions bot requested review from erimatnor and mahipv November 30, 2023 15:26

nikkhils force-pushed the comp_rows branch from cd0bbe1 to 01c742e Compare November 30, 2023 15:28

fabriziomello approved these changes Nov 30, 2023

nikkhils force-pushed the comp_rows branch from 01c742e to e25a779 Compare December 1, 2023 04:30

nikkhils requested review from mkindahl and antekresic December 1, 2023 04:31

mkindahl reviewed Dec 1, 2023

nikkhils force-pushed the comp_rows branch from e25a779 to 6800284 Compare December 4, 2023 14:44

mkindahl approved these changes Dec 4, 2023

nikkhils merged commit 293104a into timescale:main Dec 4, 2023
42 checks passed

nikkhils deleted the comp_rows branch December 4, 2023 16:57

jnidzwetzki added the force-auto-backport label (Automatically backport this PR or fix of this issue, even if it's not marked as "bug") Jan 3, 2024

timescale-automation mentioned this pull request Jan 3, 2024

Backport to 2.13.x: #6365: Use numrows_pre_compression in approx row count #6490

Merged

jnidzwetzki mentioned this pull request Jan 3, 2024

Release 2.13.1 #6492

Merged

timescale-automation added the backported-2.13.x label Jan 9, 2024
