Read watermark from materialized data #6405

@fabriziomello commented Dec 11, 2023

In 38fcd1b we improved the cagg_watermark performance by storing the watermark in a metadata table and updating it during the refresh.

But we made a minor mistake there: the watermark was read from the partial view instead of from the already materialized data. Querying the partial view re-aggregates the raw data, while the materialized data is already aggregated, so reading from it should be much faster.

This PR fixes the mistake by reading the watermark from the underlying materialization hypertable (the already aggregated data).
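A minimal Python model of the change, not TimescaleDB code: the hypothetical `partial_view` function re-aggregates the raw rows on every call, as querying the partial view does, while `materialized` stands in for the materialization hypertable filled during the refresh. The maximum bucket (the logs below show it computed with `max(time_bucket)`) comes out the same from either source, but the materialized side holds only one row per (bucket, device). All names and data here are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def time_bucket(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 1-hour bucket (simplified time_bucket)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def partial_view(raw_rows):
    """Average temperature per (bucket, device_id).

    Every call re-reads all raw rows, like querying the partial view.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for ts, device_id, temp in raw_rows:
        acc = sums[(time_bucket(ts), device_id)]
        acc[0] += temp
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical raw data: one reading per minute for 3 hours, alternating devices 1 and 2.
start = datetime(2023, 12, 18, 12, 0)
raw_rows = [(start + timedelta(minutes=m), m % 2 + 1, float(m)) for m in range(180)]

# Refresh: aggregate once and keep the result (the "materialization hypertable").
materialized = partial_view(raw_rows)

# Old approach: recompute the max bucket from the partial view (scans all raw rows again).
old_max = max(bucket for bucket, _ in partial_view(raw_rows))
# New approach: read the max bucket from the already aggregated rows.
new_max = max(bucket for bucket, _ in materialized)

print(old_max == new_max, len(raw_rows), len(materialized))
```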

Disable-check: force-changelog-file

@fabriziomello self-assigned this Dec 11, 2023
@fabriziomello force-pushed the cagg_read_watermark_from_materialized_data branch 2 times, most recently from 7230cf2 to ba8bfd5, December 15, 2023 22:58

codecov bot commented Dec 15, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (4f2f658) 87.33% compared to head (0e12772) 87.32%.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #6405      +/-   ##
==========================================
- Coverage   87.33%   87.32%   -0.02%
==========================================
  Files         187      187
  Lines       41820    41768      -52
  Branches     9313     9291      -22
==========================================
- Hits        36525    36472      -53
+ Misses       3623     3619       -4
- Partials     1672     1677       +5
```


@fabriziomello force-pushed the cagg_read_watermark_from_materialized_data branch from ba8bfd5 to 0cdad42, December 18, 2023 14:21
@fabriziomello marked this pull request as ready for review December 18, 2023 14:53
@github-actions bot requested review from akuzm and nikkhils December 18, 2023 14:54

@akuzm, @nikkhils: please review this pull request.

Powered by pull-review

@fabriziomello force-pushed the cagg_read_watermark_from_materialized_data branch from 0cdad42 to 02606b9, December 18, 2023 15:21
@fabriziomello force-pushed the cagg_read_watermark_from_materialized_data branch from 02606b9 to 97a1176, December 18, 2023 16:43
@fabriziomello
Contributor Author

Simple Benchmark

Schema

```sql
CREATE TABLE conditions(
    time timestamp with time zone NOT NULL,
    device_id INTEGER,
    temperature FLOAT,
    humidity FLOAT
);

-- One chunk per hour of raw data
SELECT * FROM create_hypertable('conditions', 'time', chunk_time_interval => '1 hour'::interval);

-- Two years of readings, one row per minute, device_id rounded to 1..4
INSERT INTO conditions
SELECT time, (random()*3 + 1)::int, random()*80 - 40, random()*100
FROM generate_series(now() - INTERVAL '2 years', now(), '1 minute') AS time;

-- Hourly averages per device, created empty so the refresh below does all the work
CREATE MATERIALIZED VIEW cagg WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', "time"), device_id, AVG(temperature) AS avg_temp, AVG(humidity) AS avg_hum
FROM conditions
GROUP BY 1, 2
WITH NO DATA;
```
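As a sanity check on the benchmark data (our own arithmetic, not part of the original comment): `(random()*3 + 1)::int` rounds a value in [1, 4) to one of four device IDs, and two years of one-minute readings touch 17,521 one-hour buckets. That caps the continuous aggregate at 17,521 × 4 = 70,084 rows, the same count as the `inserted 70084 row(s)` log lines below. The concrete timestamps in the sketch are assumptions chosen to match the log dates.

```python
from datetime import datetime, timedelta

# Assumed generation window; the benchmark used now() - INTERVAL '2 years' .. now().
end = datetime(2023, 12, 18, 14, 30)
start = end.replace(year=end.year - 2)  # calendar "2 years", like a PostgreSQL interval


def floor_hour(ts: datetime) -> datetime:
    return ts.replace(minute=0, second=0, microsecond=0)


# generate_series(start, end, '1 minute') includes both endpoints.
raw_rows = int((end - start) / timedelta(minutes=1)) + 1
# Distinct 1-hour buckets touched by [start, end].
buckets = int((floor_hour(end) - floor_hour(start)) / timedelta(hours=1)) + 1

# random() is in [0, 1), so random()*3 + 1 is in [1, 4); rounding to the nearest
# integer yields IDs 1..4 (sampled here on a fine grid instead of at random).
device_ids = {round(1 + 3 * i / 1000) for i in range(1000)}

max_cagg_rows = buckets * len(device_ids)
print(raw_rows, buckets, sorted(device_ids), max_cagg_rows)
```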

Refresh on main branch

```
372898 (leader) fabrizio=# CALL refresh_continuous_aggregate ('cagg', NULL, NULL);
CALL
Time: 522531,804 ms (08:42,532)
```

Some logs:

```
2023-12-18 14:45:30.426 -03 [372898] LOG:  deleted 0 row(s) from materialization table "_timescaledb_internal._materialized_hypertable_3"
2023-12-18 14:45:30.426 -03 [372898] STATEMENT:  CALL refresh_continuous_aggregate ('cagg', NULL, NULL);
2023-12-18 14:48:50.027 -03 [372898] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp372898.0", size 34390016
2023-12-18 14:48:50.027 -03 [372898] CONTEXT:  SQL statement "INSERT INTO _timescaledb_internal._materialized_hypertable_3 SELECT * FROM _timescaledb_internal._partial_view_3 AS I WHERE I.time_bucket >= '4714-11-23 20:53:32-03:06:28 BC' AND I.time_bucket < '2023-12-18 15:00:00-03' ;"
2023-12-18 14:48:50.027 -03 [372898] STATEMENT:  CALL refresh_continuous_aggregate ('cagg', NULL, NULL);
2023-12-18 14:48:50.217 -03 [372898] LOG:  inserted 70084 row(s) into materialization table "_timescaledb_internal._materialized_hypertable_3"
2023-12-18 14:48:50.217 -03 [372898] STATEMENT:  CALL refresh_continuous_aggregate ('cagg', NULL, NULL);
2023-12-18 14:52:06.832 -03 [372898] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp372898.1", size 1548288
2023-12-18 14:52:06.832 -03 [372898] CONTEXT:  SQL statement "SELECT pg_catalog.max(time_bucket) FROM _timescaledb_internal._partial_view_3 AS I WHERE I.time_bucket >= '4714-11-23 20:53:32-03:06:28 BC' AND I.time_bucket < '2023-12-18 15:00:00-03' ;"
2023-12-18 14:52:06.832 -03 [372898] STATEMENT:  CALL refresh_continuous_aggregate ('cagg', NULL, NULL);
2023-12-18 14:52:07.131 -03 [372898] LOG:  duration: 522515.833 ms  statement: CALL refresh_continuous_aggregate ('cagg', NULL, NULL);
```
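Subtracting the log timestamps gives a rough phase breakdown. This is our interpretation: each line is logged when a statement finishes, so the boundaries are approximate. The INSERT from `_partial_view_3` takes about 200 s, and the `max(time_bucket)` query over the same partial view takes roughly another 197 s.

```python
from datetime import datetime

# Timestamps copied from the main-branch log lines above (the -03 offset is dropped).
FMT = "%Y-%m-%d %H:%M:%S.%f"
deleted = datetime.strptime("2023-12-18 14:45:30.426", FMT)    # DELETE logged
inserted = datetime.strptime("2023-12-18 14:48:50.217", FMT)   # INSERT from partial view logged
watermark = datetime.strptime("2023-12-18 14:52:06.832", FMT)  # max(time_bucket) on partial view logged

insert_s = (inserted - deleted).total_seconds()
watermark_s = (watermark - inserted).total_seconds()
print(f"insert ~{insert_s:.1f} s, watermark query ~{watermark_s:.1f} s")
```

Applying the same subtraction to the PR logs (14:24:08.131 to 14:27:42.046) gives a single INSERT phase of about 214 s. The log excerpt shown for the PR run contains no `max(time_bucket)` query over `_partial_view_3`.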

Refresh using the current PR

```
367649 (leader) fabrizio=# CALL refresh_continuous_aggregate ('cagg', NULL, NULL);
CALL
Time: 328990,482 ms (05:28,990)
```

Some logs:

```
2023-12-18 14:24:08.131 -03 [367649] LOG:  deleted 0 row(s) from materialization table "_timescaledb_internal._materialized_hypertable_3"
2023-12-18 14:24:08.131 -03 [367649] STATEMENT:  CALL refresh_continuous_aggregate ('cagg', NULL, NULL);
2023-12-18 14:27:41.860 -03 [367649] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp367649.3", size 34390016
2023-12-18 14:27:41.860 -03 [367649] CONTEXT:  SQL statement "INSERT INTO _timescaledb_internal._materialized_hypertable_3 SELECT * FROM _timescaledb_internal._partial_view_3 AS I WHERE I.time_bucket >= '4714-11-23 20:53:32-03:06:28 BC' AND I.time_bucket < '2023-12-18 15:00:00-03' ;"
2023-12-18 14:27:41.860 -03 [367649] STATEMENT:  CALL refresh_continuous_aggregate ('cagg', NULL, NULL);
2023-12-18 14:27:42.046 -03 [367649] LOG:  inserted 70084 row(s) into materialization table "_timescaledb_internal._materialized_hypertable_3"
2023-12-18 14:27:42.046 -03 [367649] STATEMENT:  CALL refresh_continuous_aggregate ('cagg', NULL, NULL);
2023-12-18 14:27:42.407 -03 [367649] LOG:  duration: 328976.816 ms  statement: CALL refresh_continuous_aggregate ('cagg', NULL, NULL);
```

Results

* main branch: 522531.804 ms (08:42.532)
* current PR: 328990.482 ms (05:28.990)
* Improvement: ~37% less time for the full refresh
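The ~37% figure can be checked from the two psql `Time:` values, which psql printed with a comma as the decimal separator. The absolute saving is about 193.5 s. That is close to the ~196.6 s in the main-branch log between the INSERT finishing and the `max(time_bucket)` query over `_partial_view_3` finishing, which is consistent with the saving coming from no longer scanning the partial view a second time.

```python
# Wall-clock times reported by psql, in milliseconds (decimal comma converted to a point).
main_ms = 522531.804
pr_ms = 328990.482

saved_s = (main_ms - pr_ms) / 1000
improvement = (main_ms - pr_ms) / main_ms * 100  # percent of the main-branch time saved
speedup = main_ms / pr_ms

print(f"saved ~{saved_s:.1f} s, {improvement:.1f}% less time, {speedup:.2f}x faster")
```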

@fabriziomello force-pushed the cagg_read_watermark_from_materialized_data branch 3 times, most recently from 8336b9f to 65419d0, December 20, 2023 11:29
@fabriziomello force-pushed the cagg_read_watermark_from_materialized_data branch from 65419d0 to 0e12772, December 20, 2023 13:57
@fabriziomello added this to the TimescaleDB 2.13.1 milestone Dec 21, 2023
@fabriziomello merged commit 3d93bfb into timescale:main Dec 21, 2023
42 checks passed
@fabriziomello mentioned this pull request Jan 3, 2024
jnidzwetzki added a commit to jnidzwetzki/timescaledb that referenced this pull request Jan 4, 2024
This release contains bug fixes since the 2.13.0 release.
We recommend that you upgrade at the next available opportunity.

**Bugfixes**
* timescale#6365 Use numrows_pre_compression in approximate row count
* timescale#6377 Use processed group clauses in PG16
* timescale#6384 Change bgw_log_level to use PGC_SUSET
* timescale#6393 Disable vectorized sum for expressions
* timescale#6405 Read CAgg watermark from materialized data
* timescale#6408 Fix groupby pathkeys for gapfill in PG16
* timescale#6428 Fix index matching during DML decompression
* timescale#6439 Fix compressed chunk permission handling on PG16
* timescale#6443 Fix lost concurrent CAgg updates
* timescale#6454 Fix unique expression indexes on compressed chunks
* timescale#6465 Fix use of freed path in decompression sort logic

**Thanks**
* @MA-MacDonald for reporting an issue with gapfill in PG16
* @aarondglover for reporting an issue with unique expression indexes on compressed chunks
* @adriangb for reporting an issue with security barrier views on PG16
jnidzwetzki added a commit that referenced this pull request Jan 9, 2024
4 participants