-
-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
groupby and having getfield index fixes #2281
Conversation
sql/planbuilder/parse_test.go
Outdated
}, | ||
{ | ||
// TODO: this should fail | ||
Query: "select x + 1 as xx from xy join uv on x = u group by xx having x = 123;", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
poking around this, if there is a HAVING but not GROUP BY does it still fail? Same issue with a WINDOW clause? Same issue with cartesian join that is just select ... from xy,uv ...
. This just seems really weird, incremental improvements are good but more info would be nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HAVING
with no GROUP BY
should fail as well, added test for this
I think we're running into issues detailed here: dolthub/dolt#6676
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's an alias that is x
it's definition is still the projection one? Subquery aliases also. For select x + 1 as xx, (select 1 from uv having x = 123) from xy, uv group by xx)
, I'm assuming the x is still visible inside the subquery? Lots other patterns that might be weird, HAVING subquery is another one select x + 1 as xx from xy join uv on x = u group by xx having (select 1 from u where x = 123);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a new havingScope because it has special visibility rules, but the way this is structured still makes it difficult for me to understand what the rules are. Are some table columns excluded? It seems like everything in fromScope is a fallback. So then we only block certain projection aliases? So then the question is what projection scope columns take precedence, and the list seems like 1) a column from fromScope is resolved before a projection alias of the same name, 2) there is a blacklist of aliases depending on whether we are in a grouping. The alias->getField check seems maybe unnecessary? I can't think why it would be a problem for HAVING to resolve an alias rather than its underlying column.
Generating invalid ids also seems like it is still a problem? Why can't we reference the projection scope alias expression id in a HAVING clause?
It also seemed like a prior version was concerned with orphaned HAVING column identifiers post-grouping (ex: select sum(x) from xy group by y HAVING x > 0
), which I'm not sure is reasonable to validate here, but is not this PRs concern now?
sql/planbuilder/parse_test.go
Outdated
@@ -160,11 +160,6 @@ Project | |||
├─ colSet: (4-6) | |||
└─ tableId: 2 | |||
`, | |||
}, | |||
{ | |||
Query: "analyze table xy update histogram on (x, y) using data '{\"row_count\": 40, \"distinct_count\": 40, \"null_count\": 1, \"columns\": [\"x\", \"y\"], \"histogram\": [{\"row_count\": 20, \"upper_bound\": [50.0]}, {\"row_count\": 20, \"upper_bound\": [80.0]}]}'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, more to do in the future but the incremental fixes w/ thorough testing and doc comments are all wins
} | ||
|
||
// Aliased expression is allowed if there is no group by (it is implicitly a single group) | ||
if len(fromScope.groupBy.inCols) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is inCols == 0 equivalent to hasAggs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think hasAggs()
returns whether or not there are aggregations used in the groupBy
, while inCols
are just the columns in the groupBy (can be outside of aggregation function)
Implementing more groupby and having strangeness from MySQL, and fixing some getfield indexing errors.
Aliases in the projection scope were always added as new columns in the having scope; this led to getfields with indexes that were invalid in aggregation functions. Now, we don't add the alias to the scope if the alias overlaps with a column that is used in an aggregation.
There is also a small fix to rewriting planbuilder tests to have proper tabbing, write skipped queries, and removed some random new lines.
There are still some issues, left as skipped tests:
Correctness: dolthub/dolt#7382