Search hit refactoring #41656
Conversation
Pinging @elastic/es-search
Sorry for the delay in reviewing this. I have a few comments. The most important one is to ensure we don't break the json response structure.
} else if (size == 1) {
    DocumentField hitField = DocumentField.readDocumentField(in);
    fields = singletonMap(hitField.getName(), hitField);
if (in.getVersion().onOrAfter(Version.V_8_0_0)) {
This should be 7.3.0, since that is the current 7.x version, which is where this would be backported to.
done
In the last rebase I adapted this to 8.0.0 again; @mayya-sharipova, should I maybe use an earlier version instead, e.g. 7.7.0?
}
}
private void splitFieldsByMetadata(Map<String, DocumentField> fields, Map<String, DocumentField> outOther, |
outOther -> outDocument, to parallel the documentFields that it writes out?
done
}

public void fields(Map<String, DocumentField> fields) {
    this.fields = fields;
    this.metaFields.clear();
Why clear? We should just set these to new maps.
That was a way to make sure that the new content coming from the fields map does not get mixed with the old content of the maps. If I understood you correctly, this will be replaced with the creation of 2 new empty hash maps here.
    this.fields = fields;
    this.metaFields.clear();
    this.documentFields.clear();
    if (fields == null) {
While null is allowed right now, I don't think we ever actually pass it. I see 3 non-test uses of this method, all of which pass a non-null map. I think you can add a requireNonNull here.
added Objects.requireNonNull
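For illustration only, not the exact code in this PR: the setter could end up looking roughly like this after both suggestions above (fresh maps instead of clear(), plus the null check). The member names and the splitFieldsByMetadata helper come from the diff context in this thread; everything else is assumed.

// Sketch, inside SearchHit (assumed); requires java.util.Map, java.util.HashMap, java.util.Objects.
public void fields(Map<String, DocumentField> fields) {
    Objects.requireNonNull(fields, "fields must not be null");
    this.fields = fields;
    // Assign fresh maps instead of clearing the existing ones in place,
    // so old and new content can never get mixed.
    Map<String, DocumentField> newMetaFields = new HashMap<>();
    Map<String, DocumentField> newDocumentFields = new HashMap<>();
    splitFieldsByMetadata(fields, newDocumentFields, newMetaFields);
    this.metaFields = newMetaFields;
    this.documentFields = newDocumentFields;
}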
@@ -624,9 +675,16 @@ public XContentBuilder toInnerXContent(XContentBuilder builder, Params params) t
    if (source != null) {
        XContentHelper.writeRawField(SourceFieldMapper.NAME, source, builder, params);
    }
    if (!otherFields.isEmpty()) {
        builder.startObject(Fields.FIELDS);
We can't break the json response like this. The union of the document and metadata fields should still exist under "fields".
Corrected. It looks like the metadata fields were just on the root object, while only document fields were under fields. Corrected to what it was before.
It is slightly concerning that we have a dependency on the static MapperService.getAllMetaFields() in the json deserialization part, which contradicts our strategic plan of separating the metadata and document fields into different maps at the SearchHit level. This is not the objective of this PR, but I am not sure how we can deal with this in the future.
Force-pushed from 269f3b8 to 8745b55
I just noticed you had updated this PR. In the future, it is best to ping when there are more updates to review, as it is difficult to notice status updates on a PR vs watching for pings.
Are you still interested in getting this in? I left a few comments.
@@ -543,6 +605,8 @@ public void setInnerHits(Map<String, SearchHits> innerHits) {
    static final String _PRIMARY_TERM = "_primary_term";
    static final String _SCORE = "_score";
    static final String FIELDS = "fields";
    static final String DOCUMENT_FIELDS = "document_fields";
As I mentioned in the original issue, modifying the json output structure is not something we want to do, since it would be a breaking change and affect basically every user of elasticsearch. We should continue writing to xcontent in the old structure.
Hi @rjernst, I think the issue I struggled with last time and tried to describe in #41656 (comment) is that it is difficult to discern metafields from regular fields during deserialization if we store both of them under the same key "fields". Then the only source of information would be MapperService.getAllMetaFields(), which we are trying to refactor away. Do you see any other way to separate those groups of fields during deserialization if the json structure remains the same?
I think there are 2 cases here: rest (json) responses, and internal node to node communication. The latter is handled by the stream input/output version checks.
For rest/json response, we currently recommend users upgrade Elasticsearch before upgrading the HLRC, which means the rest layer here must stay the same for compatibility with the older client during this upgrade period.
Only document fields are output within the "fields" object. Meta fields are output at the root level (the same level as "fields"). We should be able to catch the rest of the special fields we did not parse specially (ie if a field wasn't parsed explicitly like _id, then it is a meta field). Currently we register each meta field in the object parser. Instead, we should now be using an UnknownFieldConsumer.
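For illustration, a hypothetical hit in the response structure being preserved here (the index and field names below are made up): meta fields such as _index, _id and _score sit at the root of the hit, and only document fields appear under "fields".

{
  "_index": "my-index",
  "_id": "1",
  "_score": 1.0,
  "fields": {
    "my_field": ["some value"]
  }
}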
Hi @rjernst, I was reading through the code again - it looks like that concern of mine is no big issue after all.
I think there are 2 cases here: rest (json) responses, and internal node to node communication. The latter is handled by the stream input/output version
Agreed. The node-to-node communication was already addressed earlier in this PR; we are only discussing json responses here.
For rest/json response, we currently recommend users upgrade Elasticsearch before upgrading the HLRC, which means the rest layer here must stay the same for compatibility with the older client during this upgrade period.
I previously thought it would be difficult to keep it compatible, but after having another look I don't see problems with that. Currently this PR should not introduce any breaking changes to the rest/json api.
Only document fields are output within the "fields" object.
Is preserved.
Meta fields are output at the root level (the same level as "fields").
Is preserved as well.
We should be able to catch the rest of the special fields we did not parse specially (ie if a field wasn't parsed explicitly like _id, then it is a meta field).
Exactly this is happening in the declareMetaDataFields method, which already existed before this PR. This method mostly remains intact in the PR; I only change where in the object parser map it stores the metafields.
Currently we register each meta field in the object parser. Instead, we should now be using an UnknownFieldConsumer.
This is the only part I have questions about. It looks like declareMetaDataFields is doing the job already, so why should we move from the object parser to UnknownFieldConsumer?
I also addressed part of your comments in the last commit.
@mayya-sharipova I was referring to the last question here.
} else if (size == 1) {
    DocumentField hitField = DocumentField.readDocumentField(in);
    fields = singletonMap(hitField.getName(), hitField);
if (in.getVersion().onOrAfter(Version.V_7_3_0)) {
This and the corresponding read will need to be updated to 7.4 now.
updated to Version.V_8_0_0
@mayya-sharipova could you please confirm that Version.V_8_0_0 should be used for this PR (and not e.g. some 7.x version)?
@@ -264,6 +271,44 @@ public void writeTo(StreamOutput out) throws IOException {
    }
}

private Map<String, DocumentField> readFields(StreamInput in) throws IOException {
Reading and writing the document fields directly shouldn't be necessary. Instead we can use in.readMap(StreamInput::readString, DocumentField::new) for reading, and out.writeMap(map, StreamOutput::writeString, DocumentField::writeTo) for writing.
Added readMap/writeMap for communication with server versions 8.0.0+; for communication with older versions, the old serialization using explicit DocumentField parsing remains as before, to avoid potential backward-compatibility bugs.
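As an illustration of what the version-gated serialization could look like (a sketch only, assumed to live inside SearchHit, not the exact code of this PR; the legacy branch is simplified compared to the pre-existing singletonMap/empty-map special cases shown in the diff above):

private Map<String, DocumentField> readFields(StreamInput in) throws IOException {
    if (in.getVersion().onOrAfter(Version.V_8_0_0)) {
        // Newer nodes: use the generic map serialization suggested above.
        return in.readMap(StreamInput::readString, DocumentField::new);
    }
    // Older nodes: keep explicit per-field deserialization, as before.
    int size = in.readVInt();
    Map<String, DocumentField> fields = new HashMap<>(size);
    for (int i = 0; i < size; i++) {
        DocumentField field = DocumentField.readDocumentField(in);
        fields.put(field.getName(), field);
    }
    return fields;
}

private void writeFields(StreamOutput out, Map<String, DocumentField> fields) throws IOException {
    if (out.getVersion().onOrAfter(Version.V_8_0_0)) {
        out.writeMap(fields, StreamOutput::writeString, (stream, field) -> field.writeTo(stream));
    } else {
        out.writeVInt(fields.size());
        for (DocumentField field : fields.values()) {
            field.writeTo(out);
        }
    }
}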
return fields.values().iterator();
// need to join the fields and metadata fields
Map<String, DocumentField> allFields = this.getFields();
if (allFields == null) {
This can never be null, right? We always create a new map in getFields() currently.
done
}

Map<String, DocumentField> searchFields = getSearchFields(context, fieldsVisitor, subDocId,
    storedToRequestedFields, subReaderContext);

Map<String, DocumentField> metaFields = new HashMap<>(),
    documentFields = new HashMap<>();
nit: Please use a separate line for declaration of documentFields, no need for a continuation of the type.
@sandmannn Sorry that it has been a long time since we were reviewing this PR. Would you like to continue working on it and drive it to the end in earnest?
@mayya-sharipova nice to hear from you. I can work on it during the next 2 weeks. I plan to have limited availability after that, so it would be reasonable to hand the work over then. For now, I will concentrate on addressing the merge conflicts and getting the tests back to green. In addition, there is one open question from the previous review cycle regarding the usage of UnknownFieldConsumer.
Force-pushed from 8745b55 to 77c4daf
@mayya-sharipova to my current understanding, the PR is rebased on master and does not introduce any serialization inconsistency, neither in JSON nor in binary. Yet the question of migrating from explicitly enumerated metafields to using UnknownFieldConsumer is still open - it looks like it may be a quite complicated step that may be better addressed in a separate PR, as this one is getting large. We need to decide on the scope of this PR, and for that I need input from you.
@sandmannn Thanks for rebasing your PR. The PR looks overall very good to me. I left some comments.
About using UnknownFieldConsumer instead of declareMetaDataFields: it is up to you whether you want to tackle this in this PR or leave it to a new PR.
I don't expect many changes for using UnknownFieldConsumer. The code should be something like:
private static ObjectParser.UnknownFieldConsumer<Map<String, Object>> unknownMetaFieldConsumer = (map, fieldName, fieldValue) -> {
    Map<String, DocumentField> fieldMap = (Map<String, DocumentField>) map.computeIfAbsent(
        METADATA_FIELDS, v -> new HashMap<String, DocumentField>());
    fieldMap.put(fieldName, new DocumentField(fieldName, Collections.singletonList(fieldValue)));
};

private static final ObjectParser<Map<String, Object>, Void> MAP_PARSER = new ObjectParser<>("innerHitParser",
    unknownMetaFieldConsumer, HashMap::new);
But we can also leave this to a new PR. In this case, can you please put a TODO comment to replace declareMetaDataFields with UnknownFieldConsumer.
SearchHit searchHit = new SearchHit(docId, fieldsVisitor.id(), searchFields);
Map<String, DocumentField> metaFields = new HashMap<>();
Map<String, DocumentField> documentFields = new HashMap<>();
splitSearchHitFields(searchFields, documentFields, metaFields);
Can we use SearchHit::splitFieldsByMetadata instead of introducing a new function?
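For illustration, the suggested reuse could look roughly like this (assuming the SearchHit helper is accessible from this call site):

Map<String, DocumentField> metaFields = new HashMap<>();
Map<String, DocumentField> documentFields = new HashMap<>();
// Reuse the existing SearchHit helper instead of a new splitSearchHitFields function.
SearchHit.splitFieldsByMetadata(searchFields, documentFields, metaFields);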
done
TODO added.
@elasticmachine test this please
@sandmannn Thanks for your changes, can you please address the failing tests?
I am on it, will need some time.
Adjusted the way new DocumentFields are added to an existing SearchHit instance, which was previously causing the integration test to fail; @mayya-sharipova please retrigger the CI job.
@elasticmachine test this please
@sandmannn Congratulations, your PR has been merged to master! And it looks like we have not broken anything!
Hi @mayya-sharipova, as it should mainly consist of backporting the changes to another branch, I think I can do it, but please expect some delay since, as we discussed above, I will have less opportunity to contribute starting from the end of March/beginning of April due to some extra load related to my employment. I guess I should open a new PR against https://github.com/elastic/elasticsearch/tree/7.7?
@sandmannn Thank you. Please let me know if you don't have time to do this; I can do the backport myself.
@sandmannn No worries about the backport, I will do it myself tomorrow. We want backports to happen fast.
@mayya-sharipova thanks for taking it over; I doubt I would have found time for it today.
Refactor SearchHit to have separate document and meta fields. This is part of a bigger refactoring for issue #24422 to remove the dependency on MapperService to check whether a field is a metafield. Relates to PR #38373, relates to issue #24422. Co-authored-by: sandmannn <[email protected]>
Previously, to determine whether a field is a meta-field, the static MapperService method isMetadataField was used. This method relied on an outdated static list of meta-fields. This PR instead changes it to an instance method that is also aware of meta-fields in all registered plugins. This PR also refactors DocumentField to add a new constructor parameter, isMetadataField, which describes whether the field is a metadata field. This refactoring is necessary because there is no longer a static method to check whether a field is a meta-field; DocumentField objects are always created from contexts where this information is available, and it is passed to the DocumentField constructor for further usage. Related elastic#38373, elastic#41656. Closes elastic#24422.
This commit restores the filtering of empty fields during the xcontent serialization of SearchHit. The filtering was removed unintentionally in #41656.
This is the second PR dealing with the issue discussed in #24422. The first PR, #38373, adjusted GetResult; here we concentrate on SearchHit.