Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[multistage][test] add table expression tests #9817

Merged
merged 4 commits into from
Nov 28, 2022

Conversation

walterddr
Copy link
Contributor

from PostgresQL doc section 7.2.2

Copy link
Contributor

@agavra agavra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, suggest adding two tests

},
"queries": [
{
"sql": "SELECT strCol FROM {tbl1} GROUP BY strCol"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • can we group by multiple columns?
  • can we group by a function call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • there's already a multi column group by but i added more
  • also added group by without agg with single/multi columns (distinct basically)
  • added function call on group by too. (alias will not be covered as they will be in the alias test cases)

@codecov-commenter
Copy link

codecov-commenter commented Nov 16, 2022

Codecov Report

Merging #9817 (36cab68) into master (6ef4dfc) will decrease coverage by 28.56%.
The diff coverage is n/a.

@@              Coverage Diff              @@
##             master    #9817       +/-   ##
=============================================
- Coverage     63.44%   34.88%   -28.57%     
+ Complexity     5326      199     -5127     
=============================================
  Files          1960     1972       +12     
  Lines        105350   105703      +353     
  Branches      15960    15998       +38     
=============================================
- Hits          66840    36872    -29968     
- Misses        33589    65707    +32118     
+ Partials       4921     3124     -1797     
Flag Coverage Δ
integration1 ?
integration2 24.62% <ø> (+0.05%) ⬆️
unittests1 ?
unittests2 15.75% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ain/java/org/apache/pinot/spi/utils/LoopUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...in/java/org/apache/pinot/spi/utils/BytesUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/spi/trace/BaseRecording.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/spi/trace/NoOpRecording.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/config/table/FSTType.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/config/user/RoleType.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/data/MetricFieldSpec.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/stream/StreamMessage.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/common/tier/TierFactory.java 0.00% <0.00%> (-100.00%) ⬇️
...a/org/apache/pinot/spi/config/table/TableType.java 0.00% <0.00%> (-100.00%) ⬇️
... and 1353 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

]
}
},
"queries": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add the following ?

Non-correlated sub-query with HAVING

Without the subquery, we will first have to execute a query to find overall average and hardcode that value in HAVING clause of the second query.

SELECT group1, group2, AVG(col) AS groupAverage 
FROM FOO 
HAVING groupAverage > (
       SELECT AVG(col) FROM FOO)

Similarly correlated sub-query with HAVING

SELECT group1, group2, AVG(col) AS groupAverage 
FROM FOO as OUTER
HAVING groupAverage > (
                 SELECT AVG(col) 
                 FROM FOO 
                 WHERE group2 < OUTER.group2 )

I don't think correlated are supported yet though ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct correlate is not supported.

@@ -0,0 +1,93 @@
{
Copy link
Contributor

@siddharthteotia siddharthteotia Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level question on table expression tests?

By table expressions do you mean CTEs ? CTEs create named queries using WITH afaik.

May be we can call these Subqueries.json to distinguish from CTEs ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

table expression as defined in postgresql 7.2 section. (e.g. all clauses that alters the behavior of a SQL after choosing the tables)
some of the sections we don't support is not included such as:

{ "sql": "SELECT strCol, intCol FROM {tbl1} GROUP BY strCol, intCol" },
{ "sql": "SELECT strCol, SUM(intCol) FROM {tbl1} GROUP BY strCol" },
{ "sql": "SELECT strCol, b.strCol2, (sum(a.intCol) * b.intCol) AS colAlias FROM {tbl1} a INNER JOIN {tbl2} b ON a.strCol = b.strCol1 GROUP BY strCol, b.strCol2, b.intCol" },
{ "sql": "SELECT MOD(b.intCol, 2), sum(a.intCol) AS colAlias FROM {tbl1} a INNER JOIN {tbl2} b ON a.strCol = b.strCol1 GROUP BY MOD(b.intCol, 2)" },
Copy link
Contributor

@siddharthteotia siddharthteotia Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we also ensure that logically equivalent SQL queries return same results ?

For example INNER JOIN query and a corresponding subquery with IN clause should return same results.

SELECT p.product_id, p.product_name
FROM 
products p
INNER JOIN 
categories c 
ON p.category_id = c.category_id
WHERE c.category_id > 25

should return the same results as

SELECT p.product_id, p.product_name
FROM products p
WHERE p.category_id IN (
                    SELECT c.category_id
                    FROM categories c
                    WHERE c.category_id > 25
                 )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both are tested against a reference database. so yeah indirectly this is implied.

{
"ignored": true,
"comments": "Relation Decorrelator not supported",
"sql": "SELECT * FROM {tbl} AS a WHERE a.strCol IN (SELECT b.strCol FROM {tbl} AS b WHERE b.intCol = a.intCol + 1)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For subquery in WHERE with IN, can we also add NOT IN ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For IN clause subquery, let's also add test that the exact same query that uses ANY (supported by postgres) instead of IN returns identical results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same goes for SOME I guess

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any and some will be tested in the function test instead of the table expression test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added. none are supported but added with ignore flag

{
"ignored": true,
"comments": "EXISTS not supported",
"sql": "SELECT * FROM {tbl} AS a WHERE EXISTS (SELECT strCol FROM {tbl} AS b WHERE b.intCol = a.intCol + 1)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the same for NOT EXISTS ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

{ "sql": "SELECT MOD(b.intCol, 2), sum(a.intCol) AS colAlias FROM {tbl1} a INNER JOIN {tbl2} b ON a.strCol = b.strCol1 GROUP BY MOD(b.intCol, 2)" },
{ "sql": "SELECT strCol, SUM(intCol) FROM {tbl1} GROUP BY strCol HAVING SUM(intCol) > 3" },
{ "sql": "SELECT strCol, SUM(intCol) FROM {tbl1} GROUP BY strCol HAVING AVG(intCol) > 1 AND MIN(intCol) < 10" },
{ "sql": "SELECT strCol, b.strCol2, (sum(a.intCol) * b.intCol) AS colAlias FROM {tbl1} a INNER JOIN {tbl2} b ON a.strCol = b.strCol1 GROUP BY strCol, b.strCol2, b.intCol HAVING avg(a.intCol) < 100" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also add tests for subquery in the SELECT clause as a scalar operand ?

Something like ....

SELECT 
empID, lastName, 
firstName, (
       CASE WHEN deptID = 
                 (
                     SELECT deptID 
                     FROM departments 
                     WHERE ......
                 ) 
         THEN 'blah1' 
         ELSE 'blah2'
         END)
FROM emp

@walterddr walterddr added the multi-stage Related to the multi-stage query engine label Nov 17, 2022
}
]
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add select foo from (select 1) as foo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure this is a valid query in most other SQL systems. did you meant

select foo.* from (select 1) as foo

foo is a table not a column name in this context, and postgres returns a table type which is not supported in Pinot

@siddharthteotia
Copy link
Contributor

Thanks for addressing comment. LGTM

@walterddr walterddr merged commit 375b7d0 into apache:master Nov 28, 2022
@walterddr walterddr deleted the pr_add_table_tests branch December 6, 2023 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multi-stage Related to the multi-stage query engine
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants