Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark all input columns in LIMIT BY as required output #5407

Merged

Conversation

kvap
Copy link
Contributor

@kvap kvap commented May 24, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Prevent removal of any columns in LimitBy by marking all input columns as required output.

Category

Bug Fix

Short description

The query analyzer only marks the actual arguments of LIMIT BY as required
output for the LimitBy step in the pipeline. This is fine, unless the query is
distributed, in which case the first stage might remove a column that is used
at the second stage (e.g. for ORDER BY) but is not part of the final select.

Detailed description

If t is a distributed table with columns a and b, then a query like this will not work:

:) SELECT a FROM t ORDER BY b LIMIT 1 BY a;
-- will raise DB::Exception: Not found column b in block. There are only columns: a.

That happens because on the remote side the pipeline looks like this:

LimitBy
 Expression
  MergeSorting
   PartialSorting
    Expression
     Log

Here the LimitBy is the last step in the pipeline, so finalize() feels it can remove all columns that are not part of the final projection and not required in LimitBy.

But then, on the initial node the pipeline looks like this:

Expression
 LimitBy
  Expression
   MergingSorted <-- the ORDER BY column is unavailable
    Asynchronous × 3
     Remote

Fix that by marking all inputs of LimitBy as required outputs.

The query analyzer only marks the actual arguments of LIMIT BY as required
output for the LimitBy step in the pipeline. This is fine, unless the query is
distributed, in which case the first stage might remove a column that is used
at the second stage (e.g. for ORDER BY) but is not part of the final select.

Prevent removal of any columns in LimitBy by marking all input columns as
required output.
@kvap
Copy link
Contributor Author

kvap commented May 24, 2019

@victor-perov
Copy link

Looks good! Thanks, @kvap!

@alexey-milovidov alexey-milovidov added can be tested pr-bugfix Pull request with bugfix, not backported by default labels May 25, 2019
@alexey-milovidov alexey-milovidov merged commit bffe621 into ClickHouse:master May 25, 2019
akuzm pushed a commit that referenced this pull request May 29, 2019
Mark all input columns in LIMIT BY as required output

(cherry picked from commit bffe621)
akuzm added a commit that referenced this pull request May 31, 2019
…5468)

Mark all input columns in LIMIT BY as required output

(cherry picked from commit bffe621)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-bugfix Pull request with bugfix, not backported by default
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants