Fallback to CPU when aggregate push down used for parquet #4623

Conversation

@HaoYang670 (Collaborator) commented on Jan 25, 2022:

Signed-off-by: remzi [email protected]
Closes #3951
Spark 3.3.0 added support for aggregate push down for Parquet.

In this PR, we:

  1. Fall back to the CPU when an aggregation is pushed down, because such a scan is a metadata-only operation with nothing to compute, so there is nothing to accelerate by running it on the GPU (see the sketch below for the kind of query that triggers this).
  2. Add tests
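For context, a rough sketch (not from this PR) of the kind of query that triggers the new fallback, assuming Spark 3.3.0's spark.sql.parquet.aggregatePushdown setting, a SparkSession named spark, and a Parquet file at /tmp/data.parquet with a non-nested column a:

import org.apache.spark.sql.functions.{count, max}

// Enable Spark 3.3.0's Parquet aggregate push down (off by default) and read
// Parquet through DataSource V2 so the plan uses BatchScanExec.
spark.conf.set("spark.sql.parquet.aggregatePushdown", "true")
spark.conf.set("spark.sql.sources.useV1SourceList", "")

// MIN/MAX/COUNT over non-nested columns, with no data filters and no grouping,
// can be answered from Parquet footer metadata, so Spark pushes the aggregation
// into the scan. With this PR the RAPIDS plugin tags such a scan to fall back
// to the CPU, since there is nothing left for the GPU to accelerate.
spark.read.parquet("/tmp/data.parquet")
  .agg(max("a"), count("*"))
  .show()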

@HaoYang670 changed the title from "fallback to CPU when there is aggregate pushed down for parquet" to "Aggregate pushed down for parquet" on Jan 25, 2022
@HaoYang670 changed the title from "Aggregate pushed down for parquet" to "Enable aggregate push down for parquet" on Jan 25, 2022
@HaoYang670 (Collaborator, Author) commented: build

@sameerz added the audit_3.3.0 (Audit related tasks for 3.3.0) label on Jan 25, 2022
@sameerz added this to the Jan 10 - Jan 28 milestone on Jan 25, 2022
@sameerz linked an issue on Jan 25, 2022 that may be closed by this pull request
Signed-off-by: remzi <[email protected]>
@HaoYang670 (Collaborator, Author) commented: build

if (a.isInstanceOf[SupportsRuntimeFiltering]) {
  willNotWorkOnGpu("Parquet does not support Runtime filtering (DPP)" +
    " on datasource V2 yet.")
} else if (a.pushedAggregate.nonEmpty) {
A Collaborator commented:

The basic principle of tagging is to record as much information as possible about why something will not run on the GPU. So let's move the pushedAggregate check out of the else if.

@HaoYang670 (Collaborator, Author) replied:

Thank you, I will change it.
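A hedged sketch of what that restructuring might look like (the fallback message below is assumed, not copied from the merged code): each unsupported condition is tagged independently, so a scan that hits both problems records both reasons instead of only the first.

if (a.isInstanceOf[SupportsRuntimeFiltering]) {
  willNotWorkOnGpu("Parquet does not support Runtime filtering (DPP)" +
    " on datasource V2 yet.")
}
// No longer chained with `else if`: this reason is recorded even when the
// runtime-filtering check above has already tagged the scan.
if (a.pushedAggregate.nonEmpty) {
  willNotWorkOnGpu("Aggregates pushed down into the Parquet scan are a " +
    "metadata-only operation, so there is nothing for the GPU to accelerate.")
}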

@HaoYang670 (Collaborator, Author) commented: build

}

@pytest.mark.skipif(is_before_spark_330(), reason='Aggregate push down on Parquet is a new feature of Spark 330')
@allow_non_gpu(any = True)
A Contributor commented:

If this test is not supposed to fall back, why are we allowing non-GPU nodes in the plan?

Comment on lines 708 to 712
assert_cpu_and_gpu_are_equal_collect_with_capture(
    do_parquet_scan,
    exist_classes="GpuBatchScanExec",
    non_exist_classes="BatchScanExec",
    conf=conf_for_parquet_aggregate_pushdown)
A Contributor commented:

This should just be assert_cpu_and_gpu_are_equal_collect. We're honestly not that interested in exactly which GPU nodes are involved here, rather that the plan is all on the GPU. That's what assert_cpu_and_gpu_are_equal_collect already checks, once we remove the @allow_non_gpu decorator.

@HaoYang670 (Collaborator, Author) replied:

Thank you! I have updated it.

@HaoYang670 (Collaborator, Author) commented: build

@HaoYang670 merged commit f7db39f into NVIDIA:branch-22.04 on Jan 29, 2022
@HaoYang670 deleted the issue3951_aggregate_push_down_for_parquet branch on January 29, 2022 09:22
@tgravescs changed the title from "Enable aggregate push down for parquet" to "Fallback to CPU when aggregate push down used for parquet" on Mar 8, 2022
Labels
audit_3.3.0 Audit related tasks for 3.3.0
Development

Successfully merging this pull request may close these issues.

[FEA] Aggregate (Min/Max/Count) push down for Parquet
5 participants