Support bloom filter when reading/writing parquet files #1830

v0y4g3r · 2023-06-26T03:55:59Z

Performance

ParquetWriter already supports bloom filter encoding, but we still need to apply query predicates to the bloom filters during table scans.
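To illustrate what applying a query predicate to bloom filters at scan time means, here is a minimal Rust sketch. It assumes a toy bloom filter built on std's `DefaultHasher`, not Parquet's split-block bloom filter (which hashes with XXH64). It also assumes a hypothetical `RowGroup` struct and `prune_row_groups` helper, and it only handles an equality predicate. A negative filter answer lets the scan skip a row group. A positive answer only means "maybe", so that row group must still be read.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy bloom filter: `num_hashes` probes into a bit vector.
/// Assumption: NOT Parquet's split-block bloom filter; semantics only.
struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        Self { bits: vec![false; num_bits], num_hashes }
    }

    /// Derive one bit index by hashing a per-probe seed together with the value.
    fn index<T: Hash>(&self, value: &T, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        value.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    fn insert<T: Hash>(&mut self, value: &T) {
        for seed in 0..self.num_hashes {
            let i = self.index(value, seed);
            self.bits[i] = true;
        }
    }

    /// `false` = definitely absent; `true` = possibly present.
    fn might_contain<T: Hash>(&self, value: &T) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.index(value, seed)])
    }
}

/// Hypothetical row group carrying a bloom filter for one column (`host`).
struct RowGroup {
    id: usize,
    host_filter: BloomFilter,
}

/// Apply the predicate `host = target` to each row group's bloom filter and
/// return the ids of row groups that might contain matching rows.
fn prune_row_groups(groups: &[RowGroup], target: &str) -> Vec<usize> {
    groups
        .iter()
        .filter(|g| g.host_filter.might_contain(&target))
        .map(|g| g.id)
        .collect()
}

fn main() {
    let mut groups = Vec::new();
    for (id, hosts) in [["host-a", "host-b"], ["host-c", "host-d"]].iter().enumerate() {
        let mut filter = BloomFilter::new(1024, 3);
        for h in hosts {
            filter.insert(h);
        }
        groups.push(RowGroup { id, host_filter: filter });
    }
    // Row group 1 is always kept; row group 0 is skipped unless a false positive occurs.
    println!("{:?}", prune_row_groups(&groups, "host-c"));
}
```

Bloom filters never produce false negatives, so pruning on them is safe. The trade-off is the false-positive rate, which depends on the bit-vector size and the number of hash probes.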

Once we can build external index files, we may also switch to xor filters and their Rust implementation for better performance.


killme2008 · 2023-09-01T02:17:31Z

@v0y4g3r Any progress?

killme2008 · 2024-01-02T03:46:57Z

@v0y4g3r What's the plan for this issue? I am not sure if we still need it.

evenyag · 2024-01-02T04:32:17Z

IMO, since Parquet already supports bloom filters, we should run some benchmarks later to compare them with the inverted index.

killme2008 · 2025-02-08T02:56:40Z

We have already implemented a data-skipping index.

v0y4g3r added the C-enhancement Category Enhancements label Jun 26, 2023

v0y4g3r self-assigned this Jun 26, 2023

killme2008 added the C-performance Category Performance label Jul 26, 2023

killme2008 added this to the v0.4 milestone Jul 26, 2023

killme2008 modified the milestones: v0.4, v0.5 Oct 11, 2023

evenyag added this to mito2 Jan 11, 2024

evenyag moved this to Todo in mito2 Jan 11, 2024

fengjiachun modified the milestones: v0.5, v0.8 Feb 28, 2024

github-actions bot unassigned v0y4g3r Mar 19, 2024

killme2008 closed this as completed Feb 8, 2025

github-project-automation bot moved this from Todo to Done in mito2 Feb 8, 2025
