-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Throw an exception when MV columns are present in the order-by expression list in selection order-by only queries #9078
Throw an exception when MV columns are present in the order-by expression list in selection order-by only queries #9078
Conversation
Codecov Report
@@ Coverage Diff @@
## master #9078 +/- ##
=============================================
- Coverage 69.90% 26.09% -43.82%
+ Complexity 4685 44 -4641
=============================================
Files 1862 1854 -8
Lines 99335 99143 -192
Branches 15115 15100 -15
=============================================
- Hits 69444 25869 -43575
- Misses 24967 70690 +45723
+ Partials 4924 2584 -2340
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Good catch!
|
+1 on both suggestions. from this claim:
I think we should definitely not allow MV values present in the order-by list with explicit error, and we can do that in the parser level already right? |
Thanks for the suggestions @Jackie-Jiang For (2), we were thinking about throwing there, but there are a few problems.
Let me know your thoughts on the above |
I did consider that, but we may need to add more checks on the broker level if we wanted to do it there. The server side figured out which queries are aggregation types already and creates a different PlanNode for that. We'd have to add some of that logic on broker side if we wanted to do this at the parser level. Also, this is not a problem if GROUP BY is included in the query as GROUP BY flattens out the column. Let me know your thoughts.
|
@@ -250,13 +250,33 @@ public PlanNode makeSegmentPlanNode(IndexSegment indexSegment, QueryContext quer | |||
return new AggregationPlanNode(indexSegment, queryContext); | |||
} | |||
} else if (QueryContextUtils.isSelectionQuery(queryContext)) { | |||
validateSelectionOrderByExpressions(indexSegment, queryContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the earliest stage that this check can be done? We should try to do such semantic checks as soon as possible (during or right after parsing) so that the query can be failed as quickly as possible without consuming any extra resources. There are a bunch of checks in BrokerRequestHandler, so I am wondering if this check can be done there instead of in server side plan generation?
@somandal The idea here is to throw exception when user tries to order-by any expression that is MV. The earliest time when we know the expression is SV or MV is in the constructor of |
So there are broadly 2 categories of MV specific order by expressions for selection order by queries
I agree that changing min-max operator is too late and the problem can be solved / detected sooner. I also agree that purist way of solving this is to detect the presence of any MV expression in the ORDER BY expressions like in However, currently the engine lets 2nd category of queries go through by ignoring the MV expression in the ORDER BY list. The 1st category fails later during execution with not a meaningful error in the min-max operator when trying to execute first expression. Making changes to constructor of To continue running 2nd category queries (like today), we decided not to make the operator change and instead fail during server planning by special casing for category 1 queries. Any thoughts on the risk to do the clean/pure solution ? If there is such query in production, we may have to rollback or force the client app to change queries. cc @somandal @Jackie-Jiang @walterddr |
@siddharthteotia I understand the concern of breaking some prod queries, but IMO it is better to return exception rather than return unexpected result. The failed queries won't serve the purpose anyway, and returning an exception with proper error message is the best way to catch bad queries. Also, I don't think people have such queries, or they will complain when we added the |
I think we are on the same page here w.r.t what is the end state here -- block any MV expressions in ORDER BY via throwing exception in I am not comfortable with making a change that may fail existing queries. Shall we do this ?
I am also ok with not doing this change at all and just do the metric change since when the final solution is implemented, the current change will probably be deleted anyway. |
It is a good idea to first emit metrics, then after monitoring for some time, change it to throw exceptions. I'd suggest only changing the |
Hey @Jackie-Jiang, I've created a new PR: #9117 to address the metrics change. Once we get that in and monitor the usage on our end for such queries, I'll get back to this fix and make the clean change in the |
…in selection order-by queries
… null for REALTIME segments
0bb6bc8
to
b00ac55
Compare
Hey @siddharthteotia and @Jackie-Jiang I've cleaned up this code, removed the metric and added the exception as part of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with minor comments
Today MV columns as the first expression in a Selection Order By query fails during query execution. The error seen today is:
This PR throws an
UnsupportedOperationException
during the server side planning phase when it identifies a MV column as the first expression in the selection order-by query to provide users with a better message about why the query fails.Note that this PR only errors out the scenario when the MV column is the first expression in the order-by expression list rather than erroring out whenever any MV column is present in the order-by expression list. The reason for this is to maintain backward compatibility. Today if a mix of SV and MV columns are present in a selection order-by query, and if the first expression is a SV column then the query execution completes successfully. The position of the MV column has no affect on the actual ordering of results as the
SelectionOrderByOperator
only includes the SV columns in the comparator used to set the values in thePriorityQueue
maintained by it. The query execution failure only occurs when the MV column is the first expression.Selection order-by on MV columns don't make sense semantically which is why it is better to error out in a meaningful way as the error thrown on the query execution failure requires debugging to understand why it failed.
cc @siddharthteotia