
Support late materialization #5829

Status: Open · 2 of 6 tasks complete
Lloyd-Pottiger opened this issue on Sep 8, 2022 · 1 comment · Fixed by #6803
Assignees
Labels
component/storage type/feature-request Categorizes issue or PR as related to a new feature. type/performance

Comments

Lloyd-Pottiger (Contributor) commented on Sep 8, 2022

Feature Request

Is your feature request related to a problem? Please describe:

For a query like `select * from t where a + b <> 0;`, we currently scan all columns and then apply the filter. This is heavy, and we can do better.

Describe the feature you'd like:

Support late materialization. We can scan the filter column(s) first to skip data segments where no rows match, and then scan only the remaining columns for the rows that survive the filter.
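The two-phase scan described above can be sketched as follows. This is a minimal Python illustration, not TiFlash code: the segment layout, function name, and predicate form are all made up for the sketch.

```python
# Hypothetical sketch of late materialization: scan the filter column(s)
# first, then fetch the remaining (possibly wide) columns only for rows
# that pass the predicate. Segments that match nothing are skipped
# entirely, so their other columns are never read.

def late_materialized_scan(segments, filter_cols, rest_cols, predicate):
    """Phase 1 reads only `filter_cols` per segment and evaluates
    `predicate`; phase 2 reads `rest_cols` only for surviving rows."""
    result = []
    for seg in segments:
        # Phase 1: read just the cheap filter columns.
        light = {c: seg[c] for c in filter_cols}
        n = len(next(iter(light.values())))
        passing = [i for i in range(n)
                   if predicate({c: light[c][i] for c in filter_cols})]
        if not passing:
            continue  # whole segment pruned: rest columns never touched
        # Phase 2: read the remaining columns only for surviving rows.
        for i in passing:
            row = {c: light[c][i] for c in filter_cols}
            row.update({c: seg[c][i] for c in rest_cols})
            result.append(row)
    return result

# Using the predicate from the issue, a + b <> 0:
segments = [
    {"a": [1, -1], "b": [-1, 1], "title": ["x", "y"]},  # no row matches
    {"a": [2, 0], "b": [3, 0], "title": ["keep", "drop"]},
]
rows = late_materialized_scan(segments, ["a", "b"], ["title"],
                              lambda r: r["a"] + r["b"] != 0)
# rows == [{"a": 2, "b": 3, "title": "keep"}]
```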

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

Subtask:

  • support late materialization in Selection
  • support late materialization in fast scan
  • support late materialization in disaggregated TiFlash
  • better scan context after some filter conditions pushed down
  • (optional) support push down TopN
  • (optional) support late materialization in Join
Lloyd-Pottiger (Contributor, Author) commented:

Given this query:

SELECT id, view_count, like_count, dislike_count, title FROM youtube ORDER BY view_count DESC LIMIT 100

The column title is very large, and the query takes a long time:

100 rows in set. Elapsed: 26.280 sec. Processed 4.56 billion rows, 500.52 GB (173.38 million rows/s., 19.05 GB/s.)

It would be much faster to read the view_count column first, keeping track of which data part and row number each record comes from; apply ORDER BY and LIMIT to keep only the required number of records; and then, using the same table snapshot, read the remaining columns for just those records.

For example, this query:

SELECT view_count FROM youtube ORDER BY view_count DESC LIMIT 100

Only takes:

100 rows in set. Elapsed: 6.920 sec. Processed 4.56 billion rows, 41.01 GB (658.44 million rows/s., 5.93 GB/s.)
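The ORDER BY ... LIMIT strategy above can be sketched in Python. This is an illustrative two-pass algorithm under assumed in-memory data structures, not the engine's actual reader: pass 1 scans only the sort column while tracking (part, row) locators for the top-N values, and pass 2 fetches the remaining columns for just those N rows.

```python
import heapq

def top_n_late(parts, sort_col, other_cols, n):
    """Top-n by `sort_col` DESC with late materialization of `other_cols`."""
    # Pass 1: min-heap of the n largest (value, part_idx, row_idx) tuples,
    # reading only the sort column.
    heap = []
    for p, part in enumerate(parts):
        for r, v in enumerate(part[sort_col]):
            item = (v, p, r)
            if len(heap) < n:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)
    top = sorted(heap, reverse=True)  # descending by value
    # Pass 2: materialize the other columns only for the n selected rows.
    out = []
    for v, p, r in top:
        row = {sort_col: v}
        row.update({c: parts[p][c][r] for c in other_cols})
        out.append(row)
    return out

parts = [
    {"view_count": [10, 500], "title": ["a", "b"]},
    {"view_count": [999, 3], "title": ["c", "d"]},
]
top2 = top_n_late(parts, "view_count", ["title"], 2)
# top2 == [{"view_count": 999, "title": "c"}, {"view_count": 500, "title": "b"}]
```

The key point is that pass 1 touches every row but only the small view_count column, while the large title column is read for exactly n rows.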

Implementation proposal

Introduce a new column type, ColumnLazy, that stores a reference to the table snapshot to use and the list of data parts, and whose main content is (part num, row num) pairs. Multiple ColumnLazy instances can share the same content while representing different delayed columns.
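A minimal sketch of the proposed structure, in Python rather than the engine's C++: the class name ColumnLazy comes from the proposal, but the fields, `materialize` method, and snapshot shape here are illustrative assumptions, not an actual API.

```python
from dataclasses import dataclass

@dataclass
class ColumnLazy:
    """Illustrative stand-in for the proposed lazy column: holds a
    snapshot reference plus shared (part_num, row_num) locator pairs,
    and resolves real values only when materialized."""
    snapshot: dict      # stands in for the table snapshot
    locators: list      # shared list of (part_num, row_num) pairs
    column_name: str    # which delayed column this instance represents

    def materialize(self):
        # Resolve each locator against the snapshot to fetch real values.
        parts = self.snapshot["parts"]
        return [parts[p][self.column_name][r] for p, r in self.locators]

# Two lazy columns sharing one locator list but representing
# different delayed columns, as the proposal describes.
snapshot = {"parts": [{"title": ["t0", "t1"], "likes": [5, 7]}]}
locs = [(0, 1), (0, 0)]
titles = ColumnLazy(snapshot, locs, "title")
likes = ColumnLazy(snapshot, locs, "likes")
# titles.materialize() == ["t1", "t0"]; likes.materialize() == [7, 5]
```

Sharing the locator list means the (part, row) bookkeeping is paid once per query, no matter how many columns are delayed.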

ywqzzy pushed a commit to ywqzzy/tiflash_1 that referenced this issue Feb 13, 2023
ti-chi-bot pushed a commit that referenced this issue Mar 16, 2023
ti-chi-bot pushed a commit that referenced this issue Apr 12, 2023