[multistage][test] add table expression tests #9817

Merged 4 commits on Nov 28, 2022
Changes from 1 commit
@@ -63,7 +63,8 @@ public class ResourceBasedQueriesTest extends QueryRunnerTestBase {
"SpecialSyntax.json",
"LexicalStructure.json",
"ValueExpressions.json",
-"NumericTypes.json"
+"NumericTypes.json",
+"TableExpressions.json"
);

@BeforeClass
111 changes: 111 additions & 0 deletions pinot-query-runtime/src/test/resources/queries/TableExpressions.json
@@ -0,0 +1,111 @@
{
Contributor

@siddharthteotia Nov 17, 2022

High-level question about the table expression tests:

By table expressions, do you mean CTEs? AFAIK, CTEs create named queries using WITH.

Maybe we can call this Subqueries.json to distinguish it from CTEs?
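
To illustrate the distinction raised here, a minimal sketch that runs the same derived table once as an inline subquery and once as a CTE. It assumes SQLite through Python's sqlite3 as a stand-in engine (not Pinot's multistage engine and not this PR's test harness), and the name `big` is hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the engine under test.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (strCol TEXT, intCol INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("foo", 1), ("bar", 2), ("alice", 42), ("bob", 196883)],
)

# Inline subquery in FROM: the derived table is written where it is used.
subquery_rows = conn.execute(
    "SELECT strCol FROM (SELECT * FROM tbl WHERE intCol > 5) AS big "
    "ORDER BY intCol"
).fetchall()

# CTE: the same derived table is declared up front with WITH, then referenced by name.
cte_rows = conn.execute(
    "WITH big AS (SELECT * FROM tbl WHERE intCol > 5) "
    "SELECT strCol FROM big ORDER BY intCol"
).fetchall()

print(subquery_rows)
print(cte_rows)
conn.close()
```

Both forms return the same rows here; the difference is only in how the derived table is named and declared.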

Contributor Author

Table expressions as defined in section 7.2 of the PostgreSQL documentation (i.e., all clauses that alter the behavior of a SQL query after the tables are chosen).
Sections we don't support are not included, such as:

"where_clause_tests": {
"psql": "7.2.2",
"tables": {
"tbl": {
"schema": [
{"name": "strCol", "type": "STRING"},
{"name": "intCol", "type": "INT"}
],
"inputs": [
["foo", 1],
["bar", 2],
["alice", 42],
["bob", 196883]
]
}
},
"queries": [
{
"sql": "SELECT * FROM {tbl} WHERE intCol > 5"
},
{
"sql": "SELECT * FROM {tbl} WHERE strCol IN ('foo', 'bar')"
},
{
"sql": "SELECT * FROM {tbl} WHERE intCol IN (196883, 42)"
},
{
"sql": "SELECT * FROM {tbl} WHERE strCol IN (SELECT strCol FROM {tbl} WHERE intCol > 100)"
},
{
"sql": "SELECT * FROM {tbl} WHERE intCol < (SELECT SUM(intCol) FROM {tbl} AS b WHERE strCol BETWEEN 'bar' AND 'foo')"
},
{
"sql": "SELECT * FROM {tbl} WHERE intCol BETWEEN 0 AND 100 AND strCol BETWEEN 'bar' AND 'foo'"
},
{
"ignored": true,
"comments": "Relation Decorrelator not supported",
"sql": "SELECT * FROM {tbl} AS a WHERE a.strCol IN (SELECT b.strCol FROM {tbl} AS b WHERE b.intCol = a.intCol + 1)"
Contributor

For the subquery in WHERE with IN, can we also add NOT IN?

Contributor

For the IN-clause subquery, let's also add a test that the exact same query using ANY (supported by Postgres) instead of IN returns identical results.

Contributor

Same goes for SOME, I guess.

Contributor Author

ANY and SOME will be tested in the function tests instead of the table expression tests.

Contributor Author

Added. None are supported, but they are included with the ignored flag.
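
For reference, a minimal sketch of what the IN / NOT IN pair from this thread returns on the `tbl` fixture. It assumes SQLite through Python's sqlite3 as a stand-in engine (not Pinot), uses the uncorrelated IN query from the test file with the literal name `tbl` in place of `{tbl}`, and adds ORDER BY only to make the output deterministic. SQLite does not implement ANY/SOME, so those are left out.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the engine under test.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (strCol TEXT, intCol INTEGER)")
all_rows = [("foo", 1), ("bar", 2), ("alice", 42), ("bob", 196883)]
conn.executemany("INSERT INTO tbl VALUES (?, ?)", all_rows)

# Same subquery in both statements; only the IN / NOT IN operator changes.
SUBQUERY = "SELECT strCol FROM tbl WHERE intCol > 100"

in_rows = conn.execute(
    f"SELECT * FROM tbl WHERE strCol IN ({SUBQUERY}) ORDER BY intCol"
).fetchall()
not_in_rows = conn.execute(
    f"SELECT * FROM tbl WHERE strCol NOT IN ({SUBQUERY}) ORDER BY intCol"
).fetchall()

print(in_rows)      # rows whose strCol appears in the subquery result
print(not_in_rows)  # the remaining rows (the fixture has no NULLs)
conn.close()
```

Because the fixture contains no NULL values, the two result sets partition the table; with NULLs in the subquery output, NOT IN would behave differently.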

},
{
"ignored": true,
"comments": "BETWEEN with a non-deterministic single-value result from sub-query is not supported",
"sql": "SELECT * FROM {tbl} AS a WHERE a.intCol BETWEEN (SELECT b.intCol FROM {tbl} AS b WHERE b.intCol = a.intCol + 1) AND 100"
},
{
"ignored": true,
"comments": "Relation Decorrelator not supported",
"sql": "SELECT * FROM {tbl} AS a WHERE a.intCol BETWEEN (SELECT MIN(b.intCol) FROM {tbl} AS b WHERE b.intCol = a.intCol + 1) AND 100"
},
{
"ignored": true,
"comments": "EXISTS not supported",
"sql": "SELECT * FROM {tbl} AS a WHERE EXISTS (SELECT strCol FROM {tbl} AS b WHERE b.intCol = a.intCol + 1)"
Contributor

Do the same for NOT EXISTS?

Contributor Author

Added.
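
A minimal sketch of the EXISTS / NOT EXISTS pair discussed here, evaluated on the `tbl` fixture. It assumes SQLite through Python's sqlite3 as a stand-in engine that does accept correlated subqueries (Pinot's multistage engine does not, per the ignored flag), and adds ORDER BY only for deterministic output.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for an engine with EXISTS support.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (strCol TEXT, intCol INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("foo", 1), ("bar", 2), ("alice", 42), ("bob", 196883)],
)

# Correlated predicate: is there a row b whose intCol is exactly a.intCol + 1?
CORRELATED = "SELECT strCol FROM tbl AS b WHERE b.intCol = a.intCol + 1"

exists_rows = conn.execute(
    f"SELECT * FROM tbl AS a WHERE EXISTS ({CORRELATED}) ORDER BY a.intCol"
).fetchall()
not_exists_rows = conn.execute(
    f"SELECT * FROM tbl AS a WHERE NOT EXISTS ({CORRELATED}) ORDER BY a.intCol"
).fetchall()

print(exists_rows)      # only intCol = 1 has a successor (intCol = 2)
print(not_exists_rows)  # every other row
conn.close()
```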

},
{
"ignored": true,
"comments": "Relation Decorrelator not supported",
"sql": "SELECT * FROM {tbl} AS a WHERE (SELECT count(*) FROM {tbl} AS b WHERE b.intCol = a.intCol + 1) > 0"
}
]
},
"group_by_and_having_tests": {
"psql": "7.2.3",
"tables": {
"tbl1": {
"schema": [
{"name": "strCol", "type": "STRING"},
{"name": "intCol", "type": "INT"}
],
"inputs": [
["a", 3],
["b", 2],
["c", 5],
["a", 1]
]
},
"tbl2": {
"schema": [
{"name": "strCol1", "type": "STRING"},
{"name": "strCol2", "type": "STRING"},
{"name": "intCol", "type": "INT"}
],
"inputs": [
["a", "foo", 1],
["a", "bar", 2],
["b", "alice", 42],
["b", "bob", 196883]
]
}
},
"queries": [
Contributor

Should we also add the following?

Non-correlated subquery with HAVING

Without the subquery, we would first have to execute a query to find the overall average and then hardcode that value in the HAVING clause of the second query.

SELECT group1, group2, AVG(col) AS groupAverage 
FROM FOO 
GROUP BY group1, group2
HAVING groupAverage > (
       SELECT AVG(col) FROM FOO)

Similarly, a correlated subquery with HAVING

SELECT group1, group2, AVG(col) AS groupAverage 
FROM FOO AS OUTER
GROUP BY group1, group2
HAVING groupAverage > (
                 SELECT AVG(col) 
                 FROM FOO 
                 WHERE group2 < OUTER.group2 )

I don't think correlated subqueries are supported yet, though?

Contributor Author

Correct, correlated subqueries are not supported.
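
A minimal sketch of the non-correlated HAVING case from this thread, run on the `tbl1` fixture. It assumes SQLite through Python's sqlite3 as a stand-in engine (not Pinot) and compares the single-query form with the two-step "compute the average, then plug it in" form the reviewer describes.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the engine under test.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl1 (strCol TEXT, intCol INTEGER)")
conn.executemany(
    "INSERT INTO tbl1 VALUES (?, ?)",
    [("a", 3), ("b", 2), ("c", 5), ("a", 1)],
)

# One step: the scalar subquery supplies the overall average inside HAVING.
one_step = conn.execute(
    "SELECT strCol, AVG(intCol) FROM tbl1 GROUP BY strCol "
    "HAVING AVG(intCol) > (SELECT AVG(intCol) FROM tbl1) ORDER BY strCol"
).fetchall()

# Two steps: fetch the overall average first, then pass it in as a parameter.
overall_avg = conn.execute("SELECT AVG(intCol) FROM tbl1").fetchone()[0]
two_step = conn.execute(
    "SELECT strCol, AVG(intCol) FROM tbl1 GROUP BY strCol "
    "HAVING AVG(intCol) > ? ORDER BY strCol",
    (overall_avg,),
).fetchall()

print(overall_avg)
print(one_step)
print(two_step)
conn.close()
```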

{
"sql": "SELECT strCol FROM {tbl1} GROUP BY strCol"
Contributor

  • can we group by multiple columns?
  • can we group by a function call?

Contributor Author

  • There's already a multi-column GROUP BY, but I added more.
  • Also added GROUP BY without aggregation on single and multiple columns (basically DISTINCT).
  • Added a function call in GROUP BY too. (Aliases are not covered here, as they will be in the alias test cases.)
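
The three GROUP BY shapes listed above can be sketched as follows. This assumes SQLite through Python's sqlite3 as a stand-in engine (not Pinot); the queries are illustrative and are not the entries actually added to the test file, and `UPPER` is just a hypothetical choice of function.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the engine under test.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl1 (strCol TEXT, intCol INTEGER)")
conn.execute("CREATE TABLE tbl2 (strCol1 TEXT, strCol2 TEXT, intCol INTEGER)")
conn.executemany(
    "INSERT INTO tbl1 VALUES (?, ?)", [("a", 3), ("b", 2), ("c", 5), ("a", 1)]
)
conn.executemany(
    "INSERT INTO tbl2 VALUES (?, ?, ?)",
    [("a", "foo", 1), ("a", "bar", 2), ("b", "alice", 42), ("b", "bob", 196883)],
)

# 1. GROUP BY on multiple columns.
multi_col = conn.execute(
    "SELECT strCol1, strCol2, SUM(intCol) FROM tbl2 "
    "GROUP BY strCol1, strCol2 ORDER BY strCol1, strCol2"
).fetchall()

# 2. GROUP BY without an aggregate behaves like DISTINCT.
no_agg = conn.execute(
    "SELECT strCol FROM tbl1 GROUP BY strCol ORDER BY strCol"
).fetchall()
distinct = conn.execute(
    "SELECT DISTINCT strCol FROM tbl1 ORDER BY strCol"
).fetchall()

# 3. GROUP BY on a function call.
by_function = conn.execute(
    "SELECT UPPER(strCol), SUM(intCol) FROM tbl1 "
    "GROUP BY UPPER(strCol) ORDER BY 1"
).fetchall()

print(multi_col)
print(no_agg, distinct)
print(by_function)
conn.close()
```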

},
{
"sql": "SELECT strCol, SUM(intCol) FROM {tbl1} GROUP BY strCol"
},
{
"sql": "SELECT strCol, b.strCol2, (sum(a.intCol) * b.intCol) AS colAlias FROM {tbl1} a INNER JOIN {tbl2} b ON a.strCol = b.strCol1 GROUP BY strCol, b.strCol2, b.intCol"
},
{
"sql": "SELECT strCol, SUM(intCol) FROM {tbl1} GROUP BY strCol HAVING SUM(intCol) > 3"
},
{
"sql": "SELECT strCol, SUM(intCol) FROM {tbl1} GROUP BY strCol HAVING AVG(intCol) > 1 AND MIN(intCol) < 10"
}
]
}
}
Contributor

Can we add select foo from (select 1) as foo?

Contributor Author

I am not sure this is a valid query in most other SQL systems. Did you mean

select foo.* from (select 1) as foo

foo is a table, not a column name, in this context, and Postgres returns a table type, which is not supported in Pinot.
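
A minimal sketch of the two forms side by side, assuming SQLite through Python's sqlite3 as a stand-in engine (neither Pinot nor Postgres). How the bare-alias form is handled is engine-specific, so the sketch only reports what SQLite does with it rather than asserting a particular outcome.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for "another SQL system".
conn = sqlite3.connect(":memory:")

# foo.* expands the derived table's columns: a single row holding the literal 1.
star_rows = conn.execute("SELECT foo.* FROM (SELECT 1) AS foo").fetchall()

# The bare alias asks the engine to treat a table as a value; record what happens.
try:
    bare_result = conn.execute("SELECT foo FROM (SELECT 1) AS foo").fetchall()
except sqlite3.Error as exc:
    bare_result = f"rejected: {exc}"

print(star_rows)
print(bare_result)
conn.close()
```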