Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#7714 ignore query options in commented out queries #7894

Conversation

kriti-sc
Copy link
Contributor

Fixes #7714

Description

Since query option is processed as a regex, options are being picked up from commented-out queries. E.g.

SELECT * FROM table
-- SELECT * FROM table OPTION(...)

This change ensures that options in commented-out query are ignored.

@codecov-commenter
Copy link

codecov-commenter commented Dec 12, 2021

Codecov Report

Merging #7894 (cba7c8d) into master (12edbdb) will increase coverage by 0.09%.
The diff coverage is 90.90%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #7894      +/-   ##
============================================
+ Coverage     71.28%   71.38%   +0.09%     
- Complexity     4091     4193     +102     
============================================
  Files          1587     1594       +7     
  Lines         82059    82562     +503     
  Branches      12264    12321      +57     
============================================
+ Hits          58498    58935     +437     
- Misses        19596    19641      +45     
- Partials       3965     3986      +21     
Flag Coverage Δ
integration1 29.03% <27.27%> (-0.15%) ⬇️
integration2 27.68% <27.27%> (-0.05%) ⬇️
unittests1 68.25% <90.90%> (-0.16%) ⬇️
unittests2 14.30% <0.00%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...org/apache/pinot/sql/parsers/CalciteSqlParser.java 87.77% <90.90%> (-4.32%) ⬇️
...ry/optimizer/statement/JsonStatementOptimizer.java 0.00% <0.00%> (-77.64%) ⬇️
.../pinot/spi/exception/BadQueryRequestException.java 66.66% <0.00%> (-33.34%) ⬇️
...in/stream/kafka20/KafkaPartitionLevelConsumer.java 66.66% <0.00%> (-20.00%) ⬇️
...nt/local/startree/v2/store/StarTreeDataSource.java 40.00% <0.00%> (-13.34%) ⬇️
...query/postaggregation/PostAggregationFunction.java 74.28% <0.00%> (-12.82%) ⬇️
...r/filter/predicate/PredicateEvaluatorProvider.java 69.23% <0.00%> (-11.54%) ⬇️
...ache/pinot/core/operator/docidsets/OrDocIdSet.java 86.36% <0.00%> (-11.37%) ⬇️
...che/pinot/core/transport/grpc/GrpcQueryServer.java 48.97% <0.00%> (-8.17%) ⬇️
...not/common/restlet/resources/SegmentErrorInfo.java 30.76% <0.00%> (-7.70%) ⬇️
... and 94 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 12edbdb...cba7c8d. Read the comment docs.

@kriti-sc
Copy link
Contributor Author

kriti-sc commented Dec 13, 2021

@Jackie-Jiang carrying review forward from PR: #7869

The way I have resolved the issue you have mentioned: I am using -{2,} pattern (SQL query comment pattern) as signposts to remove query options found in the commented-out portion of the query.

This will not work for queries which have query options specified and are using the -{2,} pattern in string literals & identifiers. For ex.

SELECT * FROM tablex WHERE cola LIKE '%---%' OPTION (a=b)

will be parsed as

SELECT * FROM tablex WHERE cola LIKE '%---%'

This can be mitigated by differentiating regex pattern between query comment and string literal/identifier.

I did check if we could use the default MySQL dialect to pass query options (using SET) – this does not seem to be supported by Calcite. For example, the following query threw a Calcite parsing error:

SET session.timeoutMs="X" 

SELECT * FROM baseballStats

Please let me know if I am missing something here ^^

@Jackie-Jiang
Copy link
Contributor

@kriti-sc
I feel it can be hard to identify identifiers and literals from the query as that basically requires parsing the query
Calcite does support query hints which is similar to the options we have: https://calcite.apache.org/docs/reference.html#sql-hints. We can have a separate thread discussing whether it makes sense to replace the current query options with query hints, or provide query options as a separate parameter so that we don't have to extract that from the query.

@@ -95,7 +95,7 @@ private CalciteSqlParser() {
// `OPTION (<k1> = <v1>) OPTION (<k2> = <v2>) OPTION (<k3> = <v3>)`
private static final Pattern OPTIONS_REGEX_PATTEN =
Pattern.compile("option\\s*\\(([^\\)]+)\\)", Pattern.CASE_INSENSITIVE);

private static final Pattern COMMENTED_QUERY_PATTERN = Pattern.compile("-{2,}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use simply search for "--" instead of using regex match? Substring search is much cheaper comparing to regex match

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done – our original problem with using -- pattern as signpost remains.

@@ -122,6 +122,7 @@ public static PinotQuery compileToPinotQuery(String sql)

// Extract OPTION statements from sql as Calcite Parser doesn't parse it.
List<String> options = extractOptionsFromSql(sql);
sql = removeCommentedOptionsFromSql(sql);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to remove the commented options again here?

I feel a simpler and more readable way to support this is to first remove the commented part from the sql, then extract the options

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not wise to remove the commented part from the sql because it may distort the query. Ex:
SELECT * FROM tableA where colA LIKE '%--%' becomes SELECT * FROM tableA where colA LIKE '%%'

But this is the change I have made – The first thing that will happen is the removal of commented options. After this, query parsing will proceed like before.

@Jackie-Jiang Jackie-Jiang merged commit d25a488 into apache:master Dec 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ignore query option from the commented out queries
3 participants