Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetch meta fields in FetchFieldsPhase using ValueFetcher #106325

Conversation

salvatore-campagna
Copy link
Contributor

@salvatore-campagna salvatore-campagna commented Mar 13, 2024

Here we extract the logic to populate metadata fields such as _ignored, _routing, _size and the deprecated _type into FetchFieldsPhase so that we can use the ValueFetcher interface to retrieve field values. This allows us to fetch values no matter if the Mapper uses stored or doc values.

This is a refactoring required for #101373

@elasticsearchmachine elasticsearchmachine added v8.14.0 needs:triage Requires assignment of a team area label labels Mar 13, 2024
@salvatore-campagna salvatore-campagna added :Search/Search Search-related issues that do not fall into other categories >refactoring :StorageEngine/Logs You know, for Logs and removed needs:triage Requires assignment of a team area label labels Mar 13, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this separate PR, I left some comments.

@@ -77,7 +68,7 @@ public FetchSubPhaseProcessor getProcessor(FetchContext fetchContext) {
Collection<String> fieldNames = sec.getMatchingFieldNames(field);
for (String fieldName : fieldNames) {
MappedFieldType ft = sec.getFieldType(fieldName);
if (ft.isStored() == false) {
if (ft.isStored() == false || sec.isMetadataField(fieldName)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that StoredField is a private record, and we create it within this class, I think we should simply skip creating an instance for metadata fields, instead of creating it and later ignoring it?

Copy link
Contributor Author

@salvatore-campagna salvatore-campagna Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the field is a metadata field we hit the if and execute continue which means we do not create the stored field and we do not add it to storedFields. So I think we are not creating an instance od StoredField for metadata fields. So it looks to me that we already skip the creation of the stored field. Am I wrong?

I guess one could argue that later the check on storedField.isMetadataField == false is not necessary because of this since all stored fields we create are not metadata fields, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, I think we should now be able to remove the isMetadataField member from the private StoredField record. There is also a conditional to exclude the _source from retrieval, that can and should be now removed.

@Override
public FetchSubPhaseProcessor getProcessor(FetchContext fetchContext) {
StoredFieldsContext storedFieldsContext = fetchContext.storedFieldsContext();
if (storedFieldsContext == null || storedFieldsContext.fetchFields() == false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that MetadataFetcher below uses valueFetchers to retrieve values, and that is great. That means that this phase should no longer be exposed to how a certain field is stored, it just retrieves metadata fields via value fetchers.

That should be straightforward for metadata fields which get returned by default, because they are not explicitly requested and instead of being retrieve from stored fields they go via value fetchers.

Metadata fields that are requested explicitly are requested in different ways: via fields, which goes through FetchFieldsPhase, and via stored_fields which currently goes through StoredFieldsPhase. It seems to me that you intend to move the latter and not the former to this new phase. In fact FetchFieldsPhase already goes through value fetchers, so requesting fields:_ignored would become equivalent to retrieving it via stored_fields:_ignored in how the field is retrieved, but it would be returned in a different section in the search response?

What is odd here is that for this phase to return metadata fields via value fetchers, they need to be requested via stored_fields. That tells me that it may make sense to unify this new phase with the existing FetchFieldsPhase, and add the notion of extracting meta fields requested via stored fields context. That would also require less code duplication, or am I missing something?

};
}

private static List<FieldAndFormat> getAdditionalFields(final StoredFieldsContext storedFieldsContext) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the purpose of this method, shouldn't we extract only metadata fields that are explicitly requested via stored_fields. What do we do instead? Why the explicit mention of _size?

import java.util.Map;
import java.util.stream.Stream;

public class MetadataFetcher {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that this is the only way to retrieve metadata fields in the new phase, I don't see why we need a separate component for it. A big portion of it is what FetchFieldsPhase already does, so maybe we should unify the two and try to reuse what is already available? What is the need for a new component / phase ? The only difference is where fields are returned in the search response?

@salvatore-campagna salvatore-campagna changed the title Extract meta fields fetching in separate fetch sub-phase using ValueFecther Fetch meta fields in FetchFieldsPhase using ValueFetcher Mar 14, 2024
Source.fromMap(filteredSource, source.sourceContentType()),
doc
);

Map<String, Object> nestedEntry = new HashMap<>();
for (DocumentField field : fetchResult.values()) {
for (DocumentField field : fetchResult.documentFields().values()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I am using just the document fields because there is no one of the metadata fields used in the response at the top level that need to be used in a nested context.

@salvatore-campagna salvatore-campagna self-assigned this Mar 18, 2024
If fields are not allowed, later on we are not able to retrieve the
corresponding MappedFieldType which results in a null pointer exception.
Filtering such fields happens by means of the QueryRewriteContext#allowedFields
which prevents fetching mappings if the field predicate evaluates
to false.
javanna added a commit that referenced this pull request Apr 5, 2024
While looking at #106325, which moves fetching of metadata fields out of the StoredFieldsPhase, I noticed some small adjustments that we can make to FieldsVisitor, CustomFieldsVisitor and StoredFieldsPhase.

These are not functional changes, the only goal is to make things simpler and clearer, hopefully.

- add test coverage for situation where _routing is provided with docs, hence returned by default
make a stronger connection between CustomFieldsVisitor and FieldsVisitor around fields that are treated differently (_ignored, _routing, _id and _source)
- explicitly exclude _id from StoredFieldsPhase like we already do for _source as it's retrieved separately
- move the _source exclusion in StoredFieldsPhase to after calling getMatchingFieldNames, so that patterns that match _source still exclude it
Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found two small things that need addressing, but it's real close. Thanks!

final Set<String> matchingFieldNames = searchExecutionContext.getMatchingFieldNames(storedField);
for (final String matchingFieldName : matchingFieldNames) {
final MappedFieldType fieldType = searchExecutionContext.getFieldType(matchingFieldName);
if (SourceFieldMapper.NAME.equals(matchingFieldName) == false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this conditional should be before getting the field type: there is no need to retrieve the field type for source and id, and for any other field that is not a metadata field.

if (SourceFieldMapper.NAME.equals(matchingFieldName) == false
&& IdFieldMapper.NAME.equals(matchingFieldName) == false
&& searchExecutionContext.isMetadataField(matchingFieldName)
&& fieldType.isStored()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment around using isStored here to reproduce the previous behaviour we had in StoredFieldsPhase. This is purely for bw comp. I think one way to improve this is merging #107086 later.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@salvatore-campagna salvatore-campagna merged commit 4dfcb08 into elastic:main Apr 15, 2024
14 checks passed
salvatore-campagna added a commit to eyalkoren/elasticsearch that referenced this pull request Apr 15, 2024
javanna added a commit to javanna/elasticsearch that referenced this pull request Apr 15, 2024
We have moved fetching of metadata fields to FetchFieldsPhase with elastic#106325 .
For backwards compatibility, metadata fields requested via `stored_fields` are returned
even when they match a wildcard expression, which differs from how patterns are resolved
via `fields`. Instead of calling getMatchingFieldNames and getFieldType twice per field,
this distinction is now included in FieldFetcher#create via a predicate
javanna added a commit that referenced this pull request May 2, 2024
We have moved fetching of metadata fields to FetchFieldsPhase with #106325 .
For backwards compatibility, metadata fields requested via `stored_fields` are returned
even when they match a wildcard expression, which differs from how patterns are resolved
via `fields`. Instead of calling getMatchingFieldNames and getFieldType twice per field,
this distinction is now included in FieldFetcher#create via a predicate
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>refactoring :Search/Search Search-related issues that do not fall into other categories :StorageEngine/Logs You know, for Logs Team:Search Meta label for search team Team:StorageEngine v8.14.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants