[multistage] refactor traversals of stage nodes into visitor pattern #9560
This PR improves the traversal code around StageNode. It sets up a common pattern for visiting nodes, collecting information, rewriting, and making other changes. It lays the groundwork for a follow-up PR that will help us implement a global sort stage for LIMIT/OFFSET queries.
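To illustrate the pattern described above, here is a minimal sketch of a visitor over stage nodes. Only the name StageNodeVisitor and its two type parameters (result type and context type) come from this PR; the node classes, method names, and the explain format are assumptions made up for the example, not the actual Pinot code.

```java
// Hypothetical sketch: apart from the StageNodeVisitor name, everything here is illustrative.
final class VisitorSketch {

  /** T is the result of visiting a node, C is a context passed down the traversal. */
  interface StageNodeVisitor<T, C> {
    T visitTableScan(TableScanNode node, C context);

    T visitFilter(FilterNode node, C context);
  }

  /** Every node dispatches to the visitor method matching its own type. */
  interface StageNode {
    <T, C> T visit(StageNodeVisitor<T, C> visitor, C context);
  }

  static final class TableScanNode implements StageNode {
    final String tableName;

    TableScanNode(String tableName) {
      this.tableName = tableName;
    }

    @Override
    public <T, C> T visit(StageNodeVisitor<T, C> visitor, C context) {
      return visitor.visitTableScan(this, context);
    }
  }

  static final class FilterNode implements StageNode {
    final String condition;
    final StageNode input;

    FilterNode(String condition, StageNode input) {
      this.condition = condition;
      this.input = input;
    }

    @Override
    public <T, C> T visit(StageNodeVisitor<T, C> visitor, C context) {
      return visitor.visitFilter(this, context);
    }
  }

  /** Renders an indented plan; the context is the current indentation depth. */
  static final class ExplainVisitor implements StageNodeVisitor<String, Integer> {
    @Override
    public String visitTableScan(TableScanNode node, Integer depth) {
      return "  ".repeat(depth) + "TABLE_SCAN(" + node.tableName + ")\n";
    }

    @Override
    public String visitFilter(FilterNode node, Integer depth) {
      // Print this node, then recurse into the input one level deeper.
      return "  ".repeat(depth) + "FILTER(" + node.condition + ")\n"
          + node.input.visit(this, depth + 1);
    }
  }

  static String explain(StageNode root) {
    return root.visit(new ExplainVisitor(), 0);
  }
}
```

With this shape, a new traversal (collecting metadata, rewriting, building operators) becomes another visitor implementation rather than another instanceof chain spread across the planner.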
Codecov Report
@@              Coverage Diff              @@
##             master    #9560       +/-   ##
=============================================
+ Coverage     28.28%   68.54%   +40.25%
- Complexity       53     4920     +4867
=============================================
  Files          1917     1938       +21
  Lines        102594   103383      +789
  Branches      15586    15683       +97
=============================================
+ Hits          29022    70867    +41845
+ Misses        70735    27497    -43238
- Partials       2837     5019     +2182
partial review on the planner side. will do another pass on the runtime a bit later
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/stage/MailboxReceiveNode.java
Outdated
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/ExplainPlanStageVisitor.java
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/stage/StageNodeVisitor.java
@@ -87,7 +87,7 @@ public void init(PinotConfiguration config, InstanceDataManager instanceDataMana
     _serverExecutor = new ServerQueryExecutorV1Impl();
     _serverExecutor.init(config, instanceDataManager, serverMetrics);
     _workerExecutor = new WorkerQueryExecutor();
-    _workerExecutor.init(config, serverMetrics, _mailboxService, _hostname, _port);
+    _workerExecutor.init(_mailboxService, _hostname, _port);
why reducing the init method signature?
these weren't used - so just some cleanup. I can remove it if I'm missing something
please do not remove them for now. config is useful for overrides and metrics we will be using in the future for sure
this isn't fixed
oops! sorry I thought I had done that but looks like I only fixed the other code path 😬 shame on me
could you revert this change?
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/logical/ShuffleRewriter.java
Outdated
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/logical/StagePlanner.java
Outdated
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/logical/GenerateQueryPlan.java
Outdated
looks good to me overall. please see the comments and otherwise good to go
...-query-planner/src/main/java/org/apache/pinot/query/planner/logical/AttachStageMetadata.java
Outdated
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/logical/ShuffleRewriter.java
Outdated
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/QueryPlan.java
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/stage/MailboxReceiveNode.java
Outdated
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/logical/StagePlanner.java
Outdated
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/logical/StagePlanner.java
Outdated
...query-runtime/src/main/java/org/apache/pinot/query/runtime/executor/PhysicalPlanBuilder.java
Outdated
...query-runtime/src/main/java/org/apache/pinot/query/runtime/executor/WorkerQueryExecutor.java
Outdated
// FIXME: there's a bug where singletonInstance may be null in the case of a JOIN where
// one side is BROADCAST and the other is SINGLETON (this is the case with nested loop
// joins for inequality conditions). This causes NPEs in the logs, but actually works
// because the side that hits the NPE doesn't expect to get any data anyway (that's the
// side that gets the broadcast from one side but nothing from the SINGLETON)
can we create an issue and link it here?
#9592 - added to comment as well
lgtm.
 * inspecting whether all data required by a specific subtree already resides on
 * a single host. It gathers the information recursively by checking which partitioned
- * inspecting whether all data required by a specific subtree already resides on
- * a single host. It gathers the information recursively by checking which partitioned
+ * inspecting whether all data required by a specific subtree are already colocated.
+ * It gathers the information recursively by checking which partitioned
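For readers unfamiliar with the idea in this Javadoc, the recursive colocation check could look roughly like the sketch below. This is not the ShuffleRewriter code from the PR: the node types, field names, and the simplified rule (a shuffle can be skipped when the input is already hash-partitioned on a non-empty subset of the required keys) are all assumptions chosen for illustration.

```java
import java.util.Set;

// Hypothetical sketch: every class and method name here is illustrative, not Pinot's API.
final class ColocationSketch {

  interface Node {
    /** Returns the columns the node's output is partitioned on. */
    Set<String> accept(PartitionKeyVisitor visitor);
  }

  /** Leaf whose rows are assumed to be hash-partitioned on the given columns already. */
  static final class Scan implements Node {
    final Set<String> partitionedBy;

    Scan(Set<String> partitionedBy) {
      this.partitionedBy = partitionedBy;
    }

    @Override
    public Set<String> accept(PartitionKeyVisitor visitor) {
      return visitor.visitScan(this);
    }
  }

  /** Exchange that needs rows with equal requiredKeys to end up on the same host. */
  static final class Exchange implements Node {
    final Set<String> requiredKeys;
    final Node input;
    boolean shuffleNeeded = true;

    Exchange(Set<String> requiredKeys, Node input) {
      this.requiredKeys = requiredKeys;
      this.input = input;
    }

    @Override
    public Set<String> accept(PartitionKeyVisitor visitor) {
      return visitor.visitExchange(this);
    }
  }

  /** Gathers partition keys bottom-up and marks exchanges whose shuffle is redundant. */
  static final class PartitionKeyVisitor {
    Set<String> visitScan(Scan scan) {
      return scan.partitionedBy;
    }

    Set<String> visitExchange(Exchange exchange) {
      Set<String> inputKeys = exchange.input.accept(this);
      // Simplified rule (an assumption): rows that agree on requiredKeys also agree on any
      // subset of them, so an input partitioned on such a subset is already colocated.
      boolean colocated = !inputKeys.isEmpty() && exchange.requiredKeys.containsAll(inputKeys);
      exchange.shuffleNeeded = !colocated;
      return colocated ? inputKeys : exchange.requiredKeys;
    }
  }
}
```

The sketch mutates the exchange node for brevity; a real rewriter would more likely return a rewritten node or record the decision in the visitor context.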
import org.apache.pinot.query.runtime.operator.TransformOperator;


public class PhysicalPlanVisitor implements StageNodeVisitor<Operator<TransferableBlock>, Void> {
javadoc
This PR improves the traversal code around StageNode. It sets up a common pattern for visiting nodes, collecting information, rewriting, and making other changes. It lays the groundwork for a follow-up PR that will help us implement a global sort stage for LIMIT/OFFSET queries and support sort push down.
There are five main parts to look at:
1. The StageNodeVisitor interface, with visit implemented in all of the Stage Node implementations
2. The partitionKey optimization (which removes a shuffle when it is not necessary), moved into a visitor (ShuffleRewriter)
3. The QueryPlan metadata, moved into a visitor (QueryPlanGenerator); this is in preparation for the next PR
4. Operator construction, moved into a visitor (PhysicalPlanBuilder)
5. QueryPlan#explain, moved into a visitor with improved functionality (see the new plan explain below)
Lastly, I added some quality-of-life improvements for debuggability and identified a "bug" in nested loop joins, which I'll fix in a future PR (see the FIXME comment).
Example of the improved explain (it now properly recognizes which nodes are executing what code):