-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Multi-stage] Only track max joined rows within each block #13981
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13981 +/- ##
============================================
- Coverage 61.75% 57.89% -3.86%
- Complexity 207 219 +12
============================================
Files 2436 2612 +176
Lines 133233 143202 +9969
Branches 20636 21985 +1349
============================================
+ Hits 82274 82905 +631
- Misses 44911 53819 +8908
- Partials 6048 6478 +430
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
41f290b
to
845e9f6
Compare
if (incrementJoinedRowsAndCheckLimit()) { | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we need the rows limit check for semi and anti joins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it because we want this protection mainly for cross joins and other similar join conditions where the number of joined rows can be much more than the sum of individual rows from the left and right blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are going to apply the limit per block, right? semi and anti join (and I guess inner) cannot produce more rows that the ones received (and I guess we assume each received block will have an acceptable size)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct. We should never run a setting where it allows rows less than a block
if (incrementJoinedRowsAndCheckLimit()) { | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are going to apply the limit per block, right? semi and anti join (and I guess inner) cannot produce more rows that the ones received (and I guess we assume each received block will have an acceptable size)
In #13922 we added support to apply max rows limit to joined rows.
The intention is to protect operator from OOM on large
CROSS JOIN
, so we want to limit the rows in memory (similar to the protection over in-memory hash table).This PR changes the logic to track joined rows per block instead of globally, so that memory is protected, but large join can still work.