-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fetch meta fields in FetchFieldsPhase using ValueFetcher #106325
Fetch meta fields in FetchFieldsPhase using ValueFetcher #106325
Conversation
Pinging @elastic/es-search (Team:Search) |
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening this separate PR, I left some comments.
@@ -77,7 +68,7 @@ public FetchSubPhaseProcessor getProcessor(FetchContext fetchContext) { | |||
Collection<String> fieldNames = sec.getMatchingFieldNames(field); | |||
for (String fieldName : fieldNames) { | |||
MappedFieldType ft = sec.getFieldType(fieldName); | |||
if (ft.isStored() == false) { | |||
if (ft.isStored() == false || sec.isMetadataField(fieldName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that StoredField is a private record, and we create it within this class, I think we should simply skip creating an instance for metadata fields, instead of creating it and later ignoring it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the field is a metadata field we hit the if and execute continue
which means we do not create the stored field and we do not add it to storedFields
. So I think we are not creating an instance od StoredField
for metadata fields. So it looks to me that we already skip the creation of the stored field. Am I wrong?
I guess one could argue that later the check on storedField.isMetadataField == false
is not necessary because of this since all stored fields we create are not metadata fields, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, I think we should now be able to remove the isMetadataField member from the private StoredField record. There is also a conditional to exclude the _source
from retrieval, that can and should be now removed.
@Override | ||
public FetchSubPhaseProcessor getProcessor(FetchContext fetchContext) { | ||
StoredFieldsContext storedFieldsContext = fetchContext.storedFieldsContext(); | ||
if (storedFieldsContext == null || storedFieldsContext.fetchFields() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that MetadataFetcher below uses valueFetchers to retrieve values, and that is great. That means that this phase should no longer be exposed to how a certain field is stored, it just retrieves metadata fields via value fetchers.
That should be straightforward for metadata fields which get returned by default, because they are not explicitly requested and instead of being retrieve from stored fields they go via value fetchers.
Metadata fields that are requested explicitly are requested in different ways: via fields
, which goes through FetchFieldsPhase
, and via stored_fields
which currently goes through StoredFieldsPhase
. It seems to me that you intend to move the latter and not the former to this new phase. In fact FetchFieldsPhase
already goes through value fetchers, so requesting fields:_ignored
would become equivalent to retrieving it via stored_fields:_ignored
in how the field is retrieved, but it would be returned in a different section in the search response?
What is odd here is that for this phase to return metadata fields via value fetchers, they need to be requested via stored_fields
. That tells me that it may make sense to unify this new phase with the existing FetchFieldsPhase, and add the notion of extracting meta fields requested via stored fields context. That would also require less code duplication, or am I missing something?
}; | ||
} | ||
|
||
private static List<FieldAndFormat> getAdditionalFields(final StoredFieldsContext storedFieldsContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the purpose of this method, shouldn't we extract only metadata fields that are explicitly requested via stored_fields
. What do we do instead? Why the explicit mention of _size
?
import java.util.Map; | ||
import java.util.stream.Stream; | ||
|
||
public class MetadataFetcher { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that this is the only way to retrieve metadata fields in the new phase, I don't see why we need a separate component for it. A big portion of it is what FetchFieldsPhase already does, so maybe we should unify the two and try to reuse what is already available? What is the need for a new component / phase ? The only difference is where fields are returned in the search response?
Source.fromMap(filteredSource, source.sourceContentType()), | ||
doc | ||
); | ||
|
||
Map<String, Object> nestedEntry = new HashMap<>(); | ||
for (DocumentField field : fetchResult.values()) { | ||
for (DocumentField field : fetchResult.documentFields().values()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I am using just the document fields because there is no one of the metadata fields used in the response at the top level that need to be used in a nested context.
server/src/main/java/org/elasticsearch/search/fetch/subphase/FieldFetcher.java
Outdated
Show resolved
Hide resolved
…a fields" This reverts commit 99f129d.
If fields are not allowed, later on we are not able to retrieve the corresponding MappedFieldType which results in a null pointer exception. Filtering such fields happens by means of the QueryRewriteContext#allowedFields which prevents fetching mappings if the field predicate evaluates to false.
While looking at #106325, which moves fetching of metadata fields out of the StoredFieldsPhase, I noticed some small adjustments that we can make to FieldsVisitor, CustomFieldsVisitor and StoredFieldsPhase. These are not functional changes, the only goal is to make things simpler and clearer, hopefully. - add test coverage for situation where _routing is provided with docs, hence returned by default make a stronger connection between CustomFieldsVisitor and FieldsVisitor around fields that are treated differently (_ignored, _routing, _id and _source) - explicitly exclude _id from StoredFieldsPhase like we already do for _source as it's retrieved separately - move the _source exclusion in StoredFieldsPhase to after calling getMatchingFieldNames, so that patterns that match _source still exclude it
rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search/520_fetch_fields.yml
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found two small things that need addressing, but it's real close. Thanks!
rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/get/120_stored_fields_ignored.yml
Outdated
Show resolved
Hide resolved
rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search/520_fetch_fields.yml
Show resolved
Hide resolved
rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/search/520_fetch_fields.yml
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/search/fetch/subphase/FetchFieldsPhase.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/search/fetch/subphase/FetchFieldsPhase.java
Show resolved
Hide resolved
final Set<String> matchingFieldNames = searchExecutionContext.getMatchingFieldNames(storedField); | ||
for (final String matchingFieldName : matchingFieldNames) { | ||
final MappedFieldType fieldType = searchExecutionContext.getFieldType(matchingFieldName); | ||
if (SourceFieldMapper.NAME.equals(matchingFieldName) == false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this conditional should be before getting the field type: there is no need to retrieve the field type for source and id, and for any other field that is not a metadata field.
if (SourceFieldMapper.NAME.equals(matchingFieldName) == false | ||
&& IdFieldMapper.NAME.equals(matchingFieldName) == false | ||
&& searchExecutionContext.isMetadataField(matchingFieldName) | ||
&& fieldType.isStored()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment around using isStored
here to reproduce the previous behaviour we had in StoredFieldsPhase
. This is purely for bw comp. I think one way to improve this is merging #107086 later.
…card in stored fields phase
…via wildcard in stored fields phase" This reverts commit 4ace7a0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
We have moved fetching of metadata fields to FetchFieldsPhase with elastic#106325 . For backwards compatibility, metadata fields requested via `stored_fields` are returned even when they match a wildcard expression, which differs from how patterns are resolved via `fields`. Instead of calling getMatchingFieldNames and getFieldType twice per field, this distinction is now included in FieldFetcher#create via a predicate
We have moved fetching of metadata fields to FetchFieldsPhase with #106325 . For backwards compatibility, metadata fields requested via `stored_fields` are returned even when they match a wildcard expression, which differs from how patterns are resolved via `fields`. Instead of calling getMatchingFieldNames and getFieldType twice per field, this distinction is now included in FieldFetcher#create via a predicate
Here we extract the logic to populate metadata fields such as
_ignored
,_routing
,_size
and the deprecated_type
intoFetchFieldsPhase
so that we can use theValueFetcher
interface to retrieve field values. This allows us to fetch values no matter if the Mapper uses stored or doc values.This is a refactoring required for #101373