-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement readable explain plans for physical plans #337
Conversation
@@ -58,8 +52,7 @@ impl<'a, 'b> PlanVisitor for IndentVisitor<'a, 'b> { | |||
if self.indent > 0 { | |||
writeln!(self.f)?; | |||
} | |||
self.write_indent()?; | |||
|
|||
write!(self.f, "{:indent$}", "", indent = self.indent * 2)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a better way to make indents that I found while googling around
#[derive(Debug, Clone, Copy)] | ||
pub enum DisplayFormatType { | ||
/// Default, compact format. Example: `FilterExec: c12 < 10.0` | ||
Default, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I envision adding more types (e.g. Graphviz) as needs evolve
} | ||
} | ||
|
||
/// Return a [wrapper](DisplayableExecutionPlan) around an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main proposed interface: displayable
which returns a struct
which then has several ways to display it. It would be ideal if I could add this to ExecutionPlan
directly itself, but since it is a trait this was the best I could come up with (along with a bunch of documentation)
#[tokio::test] | ||
async fn test_physical_plan_display_indent() { | ||
let mut ctx = ExecutionContext::new(); | ||
register_aggregate_csv(&mut ctx).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought one end to end test would be reasonable to make sure the output was ok and that it didn't regress, but also wouldn't take too much effort to maintain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not look into details but in general the changes are mostly visit+previsit+posvisit and formats which make sense. The test is very clear, too.
@@ -356,13 +356,15 @@ pub enum Partitioning { | |||
/// after all children have been visited. | |||
//// | |||
/// To use, define a struct that implements this trait and then invoke | |||
/// "LogicalPlan::accept". | |||
/// [`LogicalPlan::accept`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: What this change does? better looking in the doc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes it an hyperlink in API docs :)
/// visitor.post_visit(CsvExec) | ||
/// visitor.post_visit(FilterExec) | ||
/// visitor.post_visit(ProjectionExec) | ||
/// ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
datafusion/tests/sql.rs
Outdated
" ProjectionExec: expr=[c1, MAX(c12), MIN(c12) as the_min]", | ||
" HashAggregateExec: mode=Final, gby=[c1], aggr=[MAX(c12), MIN(c12)]", | ||
" MergeExec", | ||
" HashAggregateExec: mode=Partial, gby=[c1], aggr=[MAX(c12), MIN(c12)]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see the partial aggregate displayed here
datafusion/tests/sql.rs
Outdated
" HashAggregateExec: mode=Partial, gby=[c1], aggr=[MAX(c12), MIN(c12)]", | ||
" CoalesceBatchesExec: target_batch_size=4096", | ||
" FilterExec: c12 < CAST(10 AS Float64)", | ||
" RepartitionExec: partitioning=RoundRobinBatch(16)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the RepartitionExec does? to split is to smaller batches to send to multi-threads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Round robin repartition will move the batches as is one by one to different partitions, in this case based on "round robin" so partion 1,2,3 etc. which are executed in different threads.
There is also hash repartition which sends the values based on hashed keys to different threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense. Thanks @Dandandan
datafusion/tests/sql.rs
Outdated
" HashAggregateExec: mode=Partial, gby=[c1], aggr=[MAX(c12), MIN(c12)]", | ||
" CoalesceBatchesExec: target_batch_size=4096", | ||
" FilterExec: c12 < CAST(10 AS Float64)", | ||
" RepartitionExec: partitioning=RoundRobinBatch(16)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should statically set the concurrency level in the execution config if we want to check the plan like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call -- I will do so.
Looking much better @alamb !!! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good with the test fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
b9ca684
to
03b776f
Compare
Test fixed in 03b776f (I hope 🤞 ) |
Codecov Report
@@ Coverage Diff @@
## master #337 +/- ##
==========================================
- Coverage 76.07% 75.59% -0.48%
==========================================
Files 142 143 +1
Lines 23788 23695 -93
==========================================
- Hits 18097 17913 -184
- Misses 5691 5782 +91
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks amazing! Thanks a lot, @alamb !
03b776f
to
3610906
Compare
Which issue does this PR close?
#333
Rationale for this change
EXPLAIN output for physical plans is currently close to useless (in my opinion).
What changes are included in this PR?
ExecutionPlans
displayable
function for displayingExecutionPlans
reasonablyNote I will hope to use the same basic infrastructure to implement graphviz plans #219
Example new format
Are there any user-facing changes?
Yes: output format for
EXPLAIN VERBOSE
has changedNew Output:
API changes
None: All changes are backwards compatible
Example old format