A small improvement in GPU If and CaseWhen for the all-true and all-false cases #4329
Changes from all commits
@@ -1325,6 +1325,13 @@ object RapidsConf {
    .booleanConf
    .createWithDefault(value = false)

+  val MAX_CONDITION_BRANCH_NUMBER = conf("spark.rapids.sql.opt.condition.maxBranchNumber")
+    .doc("Maximum number of branches for GPU case-when to enable the lazy evaluation of true " +
+      "and else expressions if the predicates on a batch are all-true or all-false. Big number " +
+      "may get GPU OOM easily since the predicates are cached during the computation.")
+    .integerConf
+    .createWithDefault(2)
I'm not a fan of this config. Users will not know what to set it to for their queries, nor will they realize this config is what needs to be tuned if the GPU runs out of memory because of it. For small input batches we could probably cache far more than 2, but for very large batches we may not even want to do 2. What we really want is a budget, expressed as a memory size rather than a branch count or row count, that the CaseWhen expression is allowed to use. Unfortunately we don't currently have a clear budget for this (and other expressions could similarly leverage such a budget for their processing/optimizations). cc: @revans2 since he may have further thoughts on this.

Agree. Besides, different GPUs have different memory sizes; a value that works on an A100 may not work on a T4. For most cases, though, I think users will not need to touch this config, since 2 should be small enough to run without GPU OOM. Personally I don't want to maximize this config value to enable this special optimization as much as possible; after all, it is not a common optimization. But making it configurable gives users a chance to tune it if they really have such cases. Maybe the doc is not good enough and needs more explanation.
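For context on the discussion above, here is a minimal sketch (not part of this PR) of how a user would set the new key, assuming the usual SparkSession entry point; the value 4 is purely illustrative.

// Minimal sketch, not from the PR: setting the new key when building a session.
// The key name comes from the diff above; the value 4 is only an illustrative choice.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("case-when-branch-cap-example")
  .config("spark.rapids.sql.opt.condition.maxBranchNumber", "4")
  .getOrCreate()

// Like most spark.rapids.sql.* settings it should also be tunable at runtime,
// assuming the plugin re-reads it per query:
spark.conf.set("spark.rapids.sql.opt.condition.maxBranchNumber", "2")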
  private def printSectionHeader(category: String): Unit =
    println(s"\n### $category")
@@ -1741,6 +1748,8 @@ class RapidsConf(conf: Map[String, String]) extends Logging {

  lazy val isFastSampleEnabled: Boolean = get(ENABLE_FAST_SAMPLE)

+  lazy val maxConditionBranchNumber: Int = get(MAX_CONDITION_BRANCH_NUMBER)
+
  private val optimizerDefaults = Map(
    // this is not accurate because CPU projections do have a cost due to appending values
    // to each row that is produced, but this needs to be a really small number because
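To make the intent of the optimization easier to follow, here is a plain-Scala sketch of the idea the config gates, written against simple Seq[Boolean] predicates rather than the plugin's columnar types. The names and branch encoding below (evalCaseWhen, the tuple of predicate results and a lazy value expression) are hypothetical stand-ins, not code from GpuCaseWhen.

// Plain-Scala illustration of the all-true / all-false short-circuit idea.
// This is NOT the GPU code path: the real implementation works on columnar
// batches, and the cached predicate columns are what drive the OOM concern above.
object CaseWhenSketch {
  // Each branch pairs a per-row predicate result with a lazily evaluated value expression.
  def evalCaseWhen[T](
      branches: Seq[(Seq[Boolean], () => Seq[T])],
      elseValue: () => Seq[T]): Seq[T] = branches match {
    case (pred, value) +: rest =>
      if (pred.forall(identity)) {
        // All rows take this branch: only its value expression needs to run.
        value()
      } else if (!pred.exists(identity)) {
        // No row takes this branch: skip its value expression entirely.
        evalCaseWhen(rest, elseValue)
      } else {
        // Mixed predicate: evaluate both sides and select per row (simplified here).
        val taken = value()
        val remaining = evalCaseWhen(rest, elseValue)
        pred.zipWithIndex.map { case (p, i) => if (p) taken(i) else remaining(i) }
      }
    case _ =>
      elseValue()
  }
}

Per the config's doc text, the new maxConditionBranchNumber caps how many branches the GPU version will attempt this lazy path for, since the cached predicates are what consume extra memory.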
Not sure why this change is part of this PR if the GPU OOM during premerge tests is not related to the case-when changes?
This is to pass the premerge build.
The OOM we are talking about is not on the GPU, but in host memory on the CPU.
What I just found is that if we add enough new tests for case_when with the default parallel value (4 according to the build log), the ci_2 process runs into OOM (host memory goes beyond the upper limit of a pod). When it is reduced to 2, the build can pass.
I suspect we either missed releasing some memory between tests, or there is an issue with memory management in our premerge build, since the host memory in the ci_2 process kept increasing while the tests were running.