ESQL: Skip multivalues in LOOKUP JOIN matches #120519

Merged
ivancea merged 7 commits into elastic:main on Jan 22, 2025

Conversation

ivancea
Contributor

@ivancea ivancea commented Jan 21, 2025

Fixes #118780

To follow the same logic as the == operator (which doesn't match, or work at all, on multivalues or nulls), we're removing the Lucene matching that LOOKUP JOIN currently uses indirectly.
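
For readers skimming the description, the intended semantics can be illustrated with a minimal, self-contained sketch. This is not the Elasticsearch implementation: `EqualsLikeMatch`, `matches`, and the list-based value model are hypothetical stand-ins, assuming a field value is represented as a list where an empty list means null and more than one element means a multivalue.

```java
import java.util.List;

// Hypothetical illustration only; not Elasticsearch code.
public class EqualsLikeMatch {
    /**
     * Returns true only when both sides hold exactly one value and those values are equal.
     * Nulls (empty lists) and multivalues (size > 1) never match, mirroring the
     * behavior the PR description attributes to the == operator.
     */
    public static boolean matches(List<?> left, List<?> right) {
        if (left.size() != 1 || right.size() != 1) {
            return false;
        }
        return left.get(0).equals(right.get(0));
    }

    public static void main(String[] args) {
        System.out.println(matches(List.of("a"), List.of("a")));      // single values on both sides
        System.out.println(matches(List.of("a", "b"), List.of("a"))); // multivalue key
        System.out.println(matches(List.of(), List.of("a")));         // null key
    }
}
```

Only the first call reports a match; the multivalue and null keys are skipped rather than matched on any one of their values.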

@ivancea ivancea added >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v9.0.0 v8.18.0 labels Jan 21, 2025
@ivancea ivancea requested review from nik9000 and alex-spies January 21, 2025 11:13
@ivancea ivancea marked this pull request as draft January 21, 2025 11:14
@ivancea ivancea marked this pull request as ready for review January 21, 2025 12:47
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@@ -146,7 +146,7 @@ private static void forEachFromRelation(PhysicalPlan plan, Consumer<EsRelation>
}

/**
- * Similar to {@link Node#forEachUp(Consumer)}, but with a custom callback to get the node children.
+ * Similar to {@link Node#forEachUp(Class, Consumer)}, but with a custom callback to get the node children.
Contributor Author

This is unrelated, just a minor docs fix

Query query = queryList.getQuery(queryPosition);
if (query != null) {
return query;
if (skipMultiValuesMatching == false || queryList.block.getValueCount(queryPosition) == 1) {
Contributor Author

This queryList.block.getValueCount(queryPosition) call is a bit odd for two reasons:

  1. block is a protected field, accessed here for convenience
  2. getValueCount is an abstract method (I'll run benchmarks to ensure this doesn't affect performance)

Member

I wonder if we should do this by changing TermQueryList - either by making a new one or pushing a boolean into it.
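
One way to picture the "push a boolean into it" option is the sketch below. It is only an illustration under stated assumptions: `SketchTermQueryList`, its constructor, and the string stand-in for a Lucene query are hypothetical and do not reproduce the real QueryList or TermQueryList classes. The guard condition is copied from the diff above; everything else is invented for the example.

```java
import java.util.List;

// Hypothetical sketch; it does not reproduce the real QueryList or TermQueryList classes.
public class SketchTermQueryList {
    // Stand-in for the block: one list of key values per position.
    private final List<List<String>> positions;
    // The boolean "pushed into" the query list, as floated in the review comment.
    private final boolean skipMultiValuesMatching;

    public SketchTermQueryList(List<List<String>> positions, boolean skipMultiValuesMatching) {
        this.positions = positions;
        this.skipMultiValuesMatching = skipMultiValuesMatching;
    }

    private int getValueCount(int position) {
        return positions.get(position).size();
    }

    /** Returns a textual stand-in for a query, or null when the position must not match anything. */
    public String getQuery(int position) {
        if (skipMultiValuesMatching == false || getValueCount(position) == 1) {
            List<String> values = positions.get(position);
            return values.isEmpty() ? null : "terms(" + String.join(",", values) + ")";
        }
        return null;
    }

    public static void main(String[] args) {
        List<List<String>> keys = List.of(List.of("a"), List.of("a", "b"), List.<String>of());
        SketchTermQueryList skipping = new SketchTermQueryList(keys, true);
        System.out.println(skipping.getQuery(0)); // single value: a query is built
        System.out.println(skipping.getQuery(1)); // multivalue: skipped
        System.out.println(skipping.getQuery(2)); // null key: skipped
    }
}
```

With the flag inside the list, the caller no longer needs to reach into the protected block field; whether that is worth a new class or just a constructor argument is the open question in this thread.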

/**
* LOOKUP JOIN without MV matching (https://github.com/elastic/elasticsearch/issues/118780)
*/
JOIN_LOOKUP_SKIP_MV(JOIN_LOOKUP_V11.isEnabled()),
Member

Make it _V12?

Contributor Author

With the renaming approach, post-merge conflicts make it very easy to break main. So I'll either rename it later, right before merging, or just remove this capability and update the other one in a separate PR.

Member

👍

Member

That hit me a while back too. We keep bumping the same number and then everything breaks.

Query query = queryList.getQuery(queryPosition);
if (query != null) {
return query;
if (skipMultiValuesMatching == false || queryList.block.getValueCount(queryPosition) == 1) {
Member

I wonder if we should do this by changing TermQueryList - either by making a new one or pushing a boolean into it.

Contributor Author

@ivancea ivancea Jan 22, 2025

Tip: Hide whitespace when reviewing this file

@ivancea ivancea enabled auto-merge (squash) January 22, 2025 15:01
@ivancea ivancea merged commit 9c19268 into elastic:main Jan 22, 2025
15 of 16 checks passed
@ivancea ivancea deleted the esql-join-mv branch January 22, 2025 16:01
@elasticsearchmachine
Collaborator

💔 Backport failed

Branch: 8.x
Result: Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 120519

ivancea added a commit that referenced this pull request Jan 22, 2025
ivancea added a commit that referenced this pull request Jan 28, 2025
Fixes #118780

Second part of #120519

In the first PR, we avoid matching multivalue keys in lookup when they come from the query.
Now, we avoid matching multivalues when the lookup index has multivalues in the key column.
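
The two parts can be pictured together with a small sketch. It is an assumption-laden illustration, not the code from either PR: `LookupSideSketch`, `Row`, and `lookup` are hypothetical names, and the lookup index is modeled as an in-memory list of rows whose key column may hold several values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of both parts of the fix; not Elasticsearch code.
public class LookupSideSketch {
    /** Hypothetical lookup-index row: a key column that may be multivalued, plus a payload. */
    public static final class Row {
        final List<String> key;
        final String payload;

        public Row(List<String> key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    /** Returns the payloads of rows that join with the given query-side key. */
    public static List<String> lookup(List<Row> index, List<String> queryKey) {
        List<String> out = new ArrayList<>();
        // First PR: a null or multivalued key coming from the query matches nothing.
        if (queryKey.size() != 1) {
            return out;
        }
        for (Row row : index) {
            // Second PR: rows whose key column is multivalued in the lookup index are skipped too.
            if (row.key.size() == 1 && row.key.get(0).equals(queryKey.get(0))) {
                out.add(row.payload);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> index = List.of(new Row(List.of("a"), "one"), new Row(List.of("a", "b"), "two"));
        System.out.println(lookup(index, List.of("a")));      // only the single-valued row joins
        System.out.println(lookup(index, List.of("a", "b"))); // multivalued query key joins nothing
    }
}
```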
elasticsearchmachine pushed a commit that referenced this pull request Jan 28, 2025
#121037)
Development

Successfully merging this pull request may close these issues.

ESQL: LOOKUP JOIN should warn when join keys are multi-valued
3 participants