Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fix left semi and left anti virtual lookup joins #92713

Merged
merged 1 commit into from
Dec 2, 2022

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Nov 30, 2022

Epic: CRDB-23454

This commit fixes the execution of the left semi and left anti virtual
lookup joins. The bug was that we forgot to project away the looked up
columns (coming from the "right" side) which could then lead to wrong
columns being used higher up the tree. The bug was introduced during
22.1 release cycle where we added the optimizer support for generating
plans that could contain left semi and left anti virtual lookup joins.
This commit fixes that issue as well as the output columns of such joins
(I'm not sure whether there is a user facing impact of having incorrect
"output columns").

Additionally, this commit fixes the execution of these virtual lookup
joins to correctly return the input row only once. Previously, for left
anti joins we'd be producing an output row if there was a match (which
is wrong), and for both left semi and left anti we would emit an output
row every time there was a match (but this should be done only once).
(Although I'm not sure whether it is possible for virtual indexes to
result in multiple looked up rows.)

Also, as a minor simplification this commit makes it so that the output
rows are not added into the row container for left semi and left anti
and the container is not instantiated at all.

Fixes: #91012.
Fixes: #88096.

Release note (bug fix): CockroachDB previously could incorrectly
evaluate queries that performed left semi and left anti "virtual lookup"
joins on tables in pg_catalog or information_schema. These join types
can be planned when a subquery is used inside of a filter condition. The
bug was introduced in 20.2.0 and is now fixed.

@yuzefovich
Copy link
Member Author

For context, #76697 is where we added the support for semi, anti, and left outer joins in the optimizer.

Feedback on the release note would be especially helpful to make it more useful for the end users to understand if they are affected.

Also I took the liberty and left a TODO for Marcus, I hope you don't mind 😃

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rafiss
Copy link
Collaborator

rafiss commented Dec 1, 2022

idea for a release note:

CockroachDB previously could incorrectly evaluate queries that performed left semi and left anti "virtual lookup" joins on tables in pg_catalog or information_schema. These join types can be planned when a subquery is used inside of a filter condition. The bug was introduced in 22.1.0 and is now fixed.

I don't think we need too many details, since the release note in the docs will link to the PR for anyone who is interested in more.

@yuzefovich yuzefovich force-pushed the fix-virtual-lookup-join branch from ceefca6 to 8d83fa9 Compare December 1, 2022 20:44
This commit fixes the execution of the left semi and left anti virtual
lookup joins. The bug was that we forgot to project away the looked up
columns (coming from the "right" side) which could then lead to wrong
columns being used higher up the tree. The bug was introduced during
22.1 release cycle where we added the optimizer support for generating
plans that could contain left semi and left anti virtual lookup joins.
This commit fixes that issue as well as the output columns of such joins
(I'm not sure whether there is a user facing impact of having incorrect
"output columns").

Additionally, this commit fixes the execution of these virtual lookup
joins to correctly return the input row only once. Previously, for left
anti joins we'd be producing an output row if there was a match (which
is wrong), and for both left semi and left anti we would emit an output
row every time there was a match (but this should be done only once).
(Although I'm not sure whether it is possible for virtual indexes to
result in multiple looked up rows.)

Also, as a minor simplification this commit makes it so that the output
rows are not added into the row container for left semi and left anti
and the container is not instantiated at all.

Release note (bug fix): CockroachDB previously could incorrectly
evaluate queries that performed left semi and left anti "virtual lookup"
joins on tables in `pg_catalog` or `information_schema`. These join types
can be planned when a subquery is used inside of a filter condition. The
bug was introduced in 22.1.0 and is now fixed.
@yuzefovich yuzefovich force-pushed the fix-virtual-lookup-join branch from 8d83fa9 to f58e248 Compare December 1, 2022 20:44
@yuzefovich
Copy link
Member Author

Included the fix to #88096 as well as the release note as Rafi suggested (thanks!).

RFAL.

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're here, I just wrote a head-scratching query while trying to demo something to @dikshant. This query should fail with the below error, but instead it succeeds. Wanna add it to this logictest with the expected error?

ERROR: column "oid" does not exist
SQLSTATE: 42703
SELECT *
  FROM pg_class
 WHERE oid
       IN (
            SELECT oid
              FROM pg_index
             WHERE indrelid
                   IN (SELECT oid FROM pg_class WHERE relname = 'foo')
        );

edit: the above query is weird and I'm wrong.

@rafiss
Copy link
Collaborator

rafiss commented Dec 1, 2022

I'm missing something - why is an error expected? Seems like that query works fine on postgres.

@ajwerner
Copy link
Contributor

ajwerner commented Dec 1, 2022

Oh gosh correlated subqueries are confusing. oid here is from the outer query. Ignore me. I'm sorry.

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Nice fix!

Reviewed 3 of 4 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner)

@yuzefovich
Copy link
Member Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Dec 2, 2022

Build succeeded:

@craig craig bot merged commit e7b15eb into cockroachdb:master Dec 2, 2022
@blathers-crl
Copy link

blathers-crl bot commented Dec 2, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from f58e248 to blathers/backport-release-22.1-92713: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


error creating merge commit from f58e248 to blathers/backport-release-22.2-92713: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@@ -292,6 +292,7 @@ func (c *CustomFuncs) GenerateLookupJoins(
//
// It should be possible to support semi- and anti- joins. Left joins may be
// possible with additional complexity.
// TODO(mgartner): update this comment.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Though this has nothing to do with lookup joins on virtual tables - this rule is for lookup joins involving virtual computed columns.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed here: #93042

@mgartner
Copy link
Collaborator

mgartner commented Dec 5, 2022

For context, #76697 is where we added the support for semi, anti, and left outer joins in the optimizer.

I don't think that PR is related - it has to do with virtual computed columns, not virtual indexes. So this bug might be present in versions older than 22.1.

@yuzefovich
Copy link
Member Author

Thanks Marcus for the comments. Indeed, I think you're right - I tried reproductions from both issues on 21.2.17, and both resulted in incorrect output. Now I believe that the bugs date back to 20.2 (#48226 which introduced the support for virtual lookup joins) since the corresponding optimizer changes were merged in 2019 (#38625). Is it worth opening a docs issue to update the release note?

@mgartner
Copy link
Collaborator

mgartner commented Dec 8, 2022

@yuzefovich Thanks for trying the older version. Yes, let's open a docs issue to get the correct version that the bug was introduced in the release note.

@yuzefovich
Copy link
Member Author

Filed cockroachdb/docs#15753 and cockroachdb/docs#15755.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants