Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix: remove cnf_rewrite in push_down_filter #4825

Merged
merged 2 commits into from
Jan 7, 2023
Merged

Conversation

jackwener
Copy link
Member

@jackwener jackwener commented Jan 5, 2023

Which issue does this PR close?

Closes #4822.

Rationale for this change

I found this error when I finish #4465

What changes are included in this PR?

  • remove cnf_rewrite (its implementation is wrong, and we already have rule RewriteDisjunctivePredicate)
  • dedup by using hashset when merge filter
  • move rule RewriteDisjunctivePredicate into front

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate optimizer Optimizer rules labels Jan 5, 2023
@@ -1561,7 +1561,7 @@ async fn reduce_left_join_2() -> Result<()> {
let expected = vec![
"Explain [plan_type:Utf8, plan:Utf8]",
" Projection: t1.t1_id, t1.t1_name, t1.t1_int, t2.t2_id, t2.t2_name, t2.t2_int [t1_id:UInt32;N, t1_name:Utf8;N, t1_int:UInt32;N, t2_id:UInt32;N, t2_name:Utf8;N, t2_int:UInt32;N]",
" Filter: t2.t2_int < UInt32(10) OR t1.t1_int > UInt32(2) [t1_id:UInt32;N, t1_name:Utf8;N, t1_int:UInt32;N, t2_id:UInt32;N, t2_name:Utf8;N, t2_int:UInt32;N]",
" Filter: t2.t2_int < UInt32(10) OR t1.t1_int > UInt32(2) AND t2.t2_name != Utf8(\"w\") [t1_id:UInt32;N, t1_name:Utf8;N, t1_int:UInt32;N, t2_id:UInt32;N, t2_name:Utf8;N, t2_int:UInt32;N]",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can find AND t2.t2_name != Utf8(\"w\") is lost due to original cnf_rewrite()

@jackwener
Copy link
Member Author

PTAL @alamb @Ted-Jiang

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jackwener -- I think this rewrite was added by @Ted-Jiang in #3903 so perhaps he can weigh in on this proposal

Given it fixes a bug I am inclined to merge it in but will wait for @Ted-Jiang for a few days before doing so

I found https://github.com/apache/arrow-datafusion/pull/4825/files?w=1 shows the diffs much easier

push_down_all_join(predicates, &filter.input, left, right, vec![])?
}
LogicalPlan::TableScan(scan) => {
let mut new_scan_filters = scan.filters.clone();
let mut new_predicate = vec![];

let filter_predicates = utils::split_conjunction_owned(
utils::cnf_rewrite(filter.predicate.clone()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is the only place that calls cnf_rewrite

https://github.com/search?q=repo%3Aapache%2Farrow-datafusion%20cnf_rewrite&type=code

so if we are going to stop calling it, I think we should also remove cnf_rewrite from the code

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll plan to merge this over the weekend unless we hear something from @Ted-Jiang

@alamb alamb merged commit 2db3d2e into apache:master Jan 7, 2023
@alamb
Copy link
Contributor

alamb commented Jan 7, 2023

Thanks again @jackwener

@ursabot
Copy link

ursabot commented Jan 7, 2023

Benchmark runs are scheduled for baseline = a6d067c and contender = 2db3d2e. 2db3d2e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@jackwener jackwener deleted the filter branch January 18, 2023 11:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PushdownFilter rule exist bug will cause filter change wrong
3 participants