Use numrows_pre_compression in approx row count #6365
@erimatnor, @mahipv: please review this pull request.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #6365 +/- ##
==========================================
- Coverage 86.96% 82.41% -4.56%
==========================================
Files 249 249
Lines 57966 57966
Branches 12903 12901 -2
==========================================
- Hits 50411 47773 -2638
- Misses 5176 6765 +1589
- Partials 2379 3428 +1049
☔ View full report in Codecov by Sentry.
LGTM.
Just one comment: don't reference SDC issues in this public repo, because other users don't have access to them. Either create a corresponding issue in the public repository or give more details in the PR about the problem you're solving. :-)
Yeah, this is mostly for cross-linking with the SDC so that it can be closed automatically. It has been removed from the original commit.
The reference to the internal support case does not seem to be useful so you can remove it. Also wondering about the test coverage.
SELECT approximate_row_count('stattest');
 approximate_row_count
-----------------------
 0
 26
You might want to add a test that compares this with the actual number of rows counted explicitly. For cases where you have serial execution, you should get the same value.
The test also seems to be missing some cases where you have a mix of uncompressed and compressed rows in a chunk, so might be good to verify that.
@mkindahl Many additional tests have now been added to this PR.
Removed.
The approximate_row_count function took reltuples from compressed chunks and multiplied it by 1000, the default batch size. This led to a large skew between the actual row count and the approximate one. We now use the numrows_pre_compression value from the timescaledb catalog, which accurately represents the number of rows before compression.
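To make the skew concrete, here is a minimal Python sketch of the two estimates. It is only an arithmetic model, not the actual implementation: the `CompressedChunk` class, its `batch_sizes` field, and the sample data are hypothetical, and it assumes that `reltuples` of a compressed chunk equals the number of compressed batches it holds.

```python
from dataclasses import dataclass
from typing import List

# Default batch size named in the commit message above.
DEFAULT_BATCH_SIZE = 1000


@dataclass
class CompressedChunk:
    """Hypothetical stand-in for the statistics of one compressed chunk."""

    batch_sizes: List[int]  # number of original rows stored in each batch

    @property
    def reltuples(self) -> int:
        # Assumption: the compressed relation holds one tuple per batch.
        return len(self.batch_sizes)

    @property
    def numrows_pre_compression(self) -> int:
        # Row count recorded at compression time.
        return sum(self.batch_sizes)


def old_estimate(chunks: List[CompressedChunk]) -> int:
    """Previous behaviour: assume every batch is completely full."""
    return sum(c.reltuples * DEFAULT_BATCH_SIZE for c in chunks)


def new_estimate(chunks: List[CompressedChunk]) -> int:
    """Behaviour after the fix: use the recorded pre-compression count."""
    return sum(c.numrows_pre_compression for c in chunks)


# Two hypothetical chunks: one with a partially filled last batch,
# one with three very small batches (e.g. sparse segments).
chunks = [
    CompressedChunk(batch_sizes=[1000, 1000, 26]),
    CompressedChunk(batch_sizes=[5, 7, 14]),
]

print(old_estimate(chunks))  # 6000
print(new_estimate(chunks))  # 2052
```

In this model the old estimate is exact only when every batch is full; any partially filled batch inflates it, which is the skew described above.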
This release contains bug fixes since the 2.13.0 release. We recommend that you upgrade at the next available opportunity.

**Bugfixes**
* timescale#6365 Use numrows_pre_compression in approximate row count
* timescale#6377 Use processed group clauses in PG16
* timescale#6384 Change bgw_log_level to use PGC_SUSET
* timescale#6393 Disable vectorized sum for expressions.
* timescale#6408 Fix groupby pathkeys for gapfill in PG16
* timescale#6428 Fix index matching during DML decompression
* timescale#6439 Fix compressed chunk permission handling on PG16
* timescale#6443 Fix lost concurrent CAgg updates
* timescale#6454 Fix unique expression indexes on compressed chunks
* timescale#6465 Fix use of freed path in decompression sort logic

**Thanks**
* @MA-MacDonald for reporting an issue with gapfill in PG16
* @aarondglover for reporting an issue with unique expression indexes on compressed chunks
* @adriangb for reporting an issue with security barrier views on pg16
This release contains bug fixes since the 2.13.0 release. We recommend that you upgrade at the next available opportunity.

**Bugfixes**
* timescale#6365 Use numrows_pre_compression in approximate row count
* timescale#6377 Use processed group clauses in PG16
* timescale#6384 Change bgw_log_level to use PGC_SUSET
* timescale#6393 Disable vectorized sum for expressions.
* timescale#6405 Read CAgg watermark from materialized data
* timescale#6408 Fix groupby pathkeys for gapfill in PG16
* timescale#6428 Fix index matching during DML decompression
* timescale#6439 Fix compressed chunk permission handling on PG16
* timescale#6443 Fix lost concurrent CAgg updates
* timescale#6454 Fix unique expression indexes on compressed chunks
* timescale#6465 Fix use of freed path in decompression sort logic

**Thanks**
* @MA-MacDonald for reporting an issue with gapfill in PG16
* @aarondglover for reporting an issue with unique expression indexes on compressed chunks
* @adriangb for reporting an issue with security barrier views on pg16