-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate reference to _type in lookup queries #37016
Deprecate reference to _type in lookup queries #37016
Conversation
Pinging @elastic/es-search |
0d159a3
to
ab26db4
Compare
* Add integration test coverage for typeless lookup queries. * Fix a bug around typeless terms lookup queries. * Make sure to provide non-deprecated methods in QueryBuilders. * Make the deprecation messages more accurate. * Update the query DSL documentation. * Avoid creating duplicate query builder tests. Don't use 'ids' when generating random queries to avoid types warnings.
3b25c64
to
d7b0cd8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtibshirani I left a few comments, let me know what you think. Since I haven't done so many type removal related reviews I think it would be good to also get a second reviewer on this.
server/src/main/java/org/elasticsearch/indices/TermsLookup.java
Outdated
Show resolved
Hide resolved
if (out.getVersion().onOrAfter(Version.V_7_0_0)) { | ||
out.writeOptionalString(type); | ||
} else { | ||
out.writeString(type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might break if type is "null". We should write some dummy type to the stream then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that before this change, type
could never be null
in this class (it is marked as final, and the constructor validates that it is not null). I will add a comment here to clarify.
This also reminds me of something I was hoping to get your thoughts on. With this implementation, it is not possible to use a typeless terms lookup query in a mixed 6.x and 7.0 cluster. The error will occur when a 6.x nodes tries to decode the optional string that the 7.0 node is sending here. Does this seem reasonable in terms of behavior + error messaging (not sure if there is any precedent for such a change)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error will occur when a 6.x nodes tries to decode the optional string that the 7.0 node is sending here.
Exactly what I thought, with the exception that the problem happens then a 7.0 node sends an object with a null-type I think we should make sure this cannot happen though. If we want to keep the current decision in this PR to allow null-references for types in the POJOs we need to send some dummy type when serializing to a 6.x node. We should also add unit tests for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I understand your concern now. I had assumed that passing a dummy type would actually cause more confusion than simply erroring out. For example, say a user had an index with document type my_type
, and a mixed-node cluster. Then a typeless terms lookup query will always succeed if the lookup document is on a 7.0 node, but show no error and not find the document if it lives on a 6.x node. This is because the type name _doc
will be used for the 6.x lookup, and we will fail to find the lookup document without any indication as to why. Using _doc
in get requests to mean 'any type' is functionality we only support on 7.0.
I'm now thinking it would be clearest if I explicitly checked for a null type here, and errored with the message "Typeless terms lookup queries are not supported if any node is running a version before 7.0."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because the type name
_doc
will be used for the 6.x lookup, and we will fail to find the lookup document without any indication as to why.
Is there a chance to use the dummy "_doc" type in order to not break serialization but then infer the "right" type name when on the shard for this case? As far as i understand the types removal process all 6.x indices should only have one type left, even if it is named differently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would be possible if we backported the part of #35790 that applied to 'get' requests. I would prefer not to do that if possible because it is not so straightfoward to backport, especially since 6.x can contain some indices with multiple types, as it supports indices created in 5.x. I also don't think that using typeless terms lookups in a mixed-node cluster will be a common enough occurrence to warrant the extra work + complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining, that is unfortunate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the 36549 PR for the IndexRequest class I opted to serialize using the result of a call to type()
which will return _doc
if null. This made 7.x+6.x nodes happy to talk to each other.
It was important that IndexRequest kept null types in the calling client because type==null
was the condition I used to know if a blank request should inherit a choice of type from a BulkRequest's choice of default.
server/src/test/java/org/elasticsearch/index/query/RandomQueryBuilder.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/index/query/TermsQueryBuilderTests.java
Show resolved
Hide resolved
Thanks @cbuescher for the helpful comments, I've pushed some commits to address them. I understand your thoughts around a second reviewer, just wanted to note for context that (1) these changes are quite different from the types deprecation work we've done so far, and (2) both Mayya and I have taken a close look at them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a minor nit, but I think the mixed cluster serialization issue thats left open needs adressing and also probably a unit test that checks we can send/receive query builders from a 6.x node.
server/src/test/java/org/elasticsearch/index/query/RandomQueryBuilder.java
Outdated
Show resolved
Hide resolved
…d with pre-7.0 nodes.
I think this is ready for another look. |
} else { | ||
if (type == null) { | ||
throw new IllegalArgumentException("Typeless [terms] lookup queries are not supported if any " + | ||
"node is running a version before 7.0."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like throwing an IAE on the serialization layer, but I see that we don't have the information about the output streams version elsewhere. I will try to look into how this error propagates to e.g. the user making a rest call and I'd like to discuss the alternative of using a "dummy" type (with the downside that this then won't match lookup items with a non-default 6.x type name) again. In any case we should add something to the docs or some "known issues" document to capture this edge case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I can also take a look at this by trying it out in REST tests. I will ping you offline to further discuss the idea of using a 'dummy' type name.
} | ||
if (indexedShapeId != null && indexedShapeType == null) { | ||
throw new IllegalArgumentException("indexedShapeType is required if indexedShapeId is specified"); | ||
throw new IllegalArgumentException("either shapeBytes or indexedShapeId is required"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"shapeBytes" should be "shape"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -912,9 +964,18 @@ public static MoreLikeThisQueryBuilder fromXContent(XContentParser parser) throw | |||
if (stopWords != null) { | |||
moreLikeThisQueryBuilder.stopWords(stopWords); | |||
} | |||
|
|||
if (!moreLikeThisQueryBuilder.isTypeless()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer == false to !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
if (out.getVersion().onOrAfter(Version.V_7_0_0)) { | ||
out.writeOptionalString(type); | ||
} else { | ||
out.writeString(type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the 36549 PR for the IndexRequest class I opted to serialize using the result of a call to type()
which will return _doc
if null. This made 7.x+6.x nodes happy to talk to each other.
It was important that IndexRequest kept null types in the calling client because type==null
was the condition I used to know if a blank request should inherit a choice of type from a BulkRequest's choice of default.
assertThat(query, instanceOf(MoreLikeThisQueryBuilder.class)); | ||
|
||
MoreLikeThisQueryBuilder mltQuery = (MoreLikeThisQueryBuilder) query; | ||
if (!mltQuery.isTypeless()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
==false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
assertThat(query, CoreMatchers.instanceOf(TermsQueryBuilder.class)); | ||
|
||
TermsQueryBuilder termsQuery = (TermsQueryBuilder) query; | ||
if (termsQuery.termsLookup() != null && termsQuery.termsLookup().type() != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add an isTypeless() abstraction like you did for MLT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Thanks to both of you for the review. I pushed changes to address @markharwood's suggestions. I looked into changing the strategy or handling mixed-cluster serialization, and my intuition is to stick with the current approach:
|
Then I'm fine with that, as far as I understand user will still have the option to use types if anything on their side doesn't work without types in a mixed-cluster environment, they'd only have to live with the deprecation warnings. Is that correct? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Yes, that's correct. I can't find a 'known issues' document, but I will create another section under |
@elasticmachine run gradle build tests 1 |
Relates to #35190