Mark all input columns in LIMIT BY as required output #5407

kvap · 2019-05-24T14:17:51Z

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Prevent removal of any columns in LimitBy by marking all input columns as required output.

Short description

The query analyzer only marks the actual arguments of LIMIT BY as required
output for the LimitBy step in the pipeline. This is fine, unless the query is
distributed, in which case the first stage might remove a column that is used
at the second stage (e.g. for ORDER BY) but is not part of the final select.

Detailed description

If t is a distributed table with columns a and b, then a query like this fails:

:) SELECT a FROM t ORDER BY b LIMIT 1 BY a;
-- will raise DB::Exception: Not found column b in block. There are only columns: a.

That happens because on the remote side the pipeline looks like this:

LimitBy
 Expression
  MergeSorting
   PartialSorting
    Expression
     Log

Here LimitBy is the last step in the pipeline, so finalize() assumes it can remove every column that is neither part of the final projection nor required by LimitBy. Column b is therefore dropped from the block returned by the remote side.

On the initiating node, however, the pipeline looks like this:

Expression
 LimitBy
  Expression
   MergingSorted <-- the ORDER BY column is unavailable
    Asynchronous × 3
     Remote

Fix that by marking all inputs of LimitBy as required outputs.
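To illustrate the idea, here is a minimal, self-contained sketch of the pruning behaviour described above. It is a toy model and not the actual ExpressionAnalyzer code: the names `pruneToRequired`, `limitByRequiredOutput`, `initiatorCanMerge`, and `demo` are hypothetical and exist only for this example. Columns are modelled as plain sets of names.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// Toy model: a block is just the set of column names it carries.
using Columns = std::set<std::string>;

// Hypothetical stand-in for the pruning done at the end of the remote stage:
// keep only those available columns that are marked as required output.
Columns pruneToRequired(const Columns & available, const Columns & required_output)
{
    Columns kept;
    for (const auto & name : available)
        if (required_output.count(name))
            kept.insert(name);
    return kept;
}

// Required outputs of the LimitBy step in this toy model.
// mark_all_inputs == false: only the LIMIT BY arguments and the final projection.
// mark_all_inputs == true:  every input column of the step (the approach of this PR).
Columns limitByRequiredOutput(
    const Columns & inputs,
    const Columns & limit_by_args,
    const Columns & projection,
    bool mark_all_inputs)
{
    if (mark_all_inputs)
        return inputs;

    Columns required = limit_by_args;
    required.insert(projection.begin(), projection.end());
    return required;
}

// The merge-sort on the initiating node needs every ORDER BY column
// to be present in the block received from the remote side.
bool initiatorCanMerge(const Columns & remote_block, const Columns & order_by)
{
    return std::includes(remote_block.begin(), remote_block.end(),
                         order_by.begin(), order_by.end());
}

// Models: SELECT a FROM t ORDER BY b LIMIT 1 BY a
void demo()
{
    const Columns inputs{"a", "b"};
    const Columns limit_by_args{"a"};
    const Columns projection{"a"};
    const Columns order_by{"b"};

    // Without the change, b is pruned on the remote side and the merge fails.
    const Columns before = pruneToRequired(
        inputs, limitByRequiredOutput(inputs, limit_by_args, projection, false));
    assert(!initiatorCanMerge(before, order_by));

    // With all inputs marked as required, b survives and the merge can proceed.
    const Columns after = pruneToRequired(
        inputs, limitByRequiredOutput(inputs, limit_by_args, projection, true));
    assert(initiatorCanMerge(after, order_by));
}
```

Calling `demo()` from a `main` function exercises both cases. The trade-off in this sketch mirrors the one in the PR: marking every input as required may carry some unused columns through LimitBy, but it guarantees that later stages still see the columns they depend on.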

kvap · 2019-05-24T14:19:14Z

cc @bocharov @twalwyn @Dorokhov @victor-perov

victor-perov · 2019-05-24T15:47:38Z

Looks good! Thanks, @kvap!

alexey-milovidov added the "can be tested" and "pr-bugfix" (Pull request with bugfix, not backported by default) labels May 25, 2019

Update ExpressionAnalyzer.cpp

1d98441

alexey-milovidov merged commit bffe621 into ClickHouse:master May 25, 2019

akuzm pushed a commit that referenced this pull request May 29, 2019

Merge pull request #5407 from kvap/all-columns-required-in-limit-by

a3a7fc6

Mark all input columns in LIMIT BY as required output (cherry picked from commit bffe621)

akuzm mentioned this pull request May 29, 2019

Backport to 19.7: Merge pull request #5407 from kvap/all-columns-required-in-limit-by #5468

Merged

akuzm added a commit that referenced this pull request May 31, 2019

Merge pull request #5407 from kvap/all-columns-required-in-limit-by (#…

58ee7a6

…5468) Mark all input columns in LIMIT BY as required output (cherry picked from commit bffe621)
