Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make field limit more predictable #102885

Conversation

felixbarny
Copy link
Member

@felixbarny felixbarny commented Dec 1, 2023

Today, we're counting all mappers, including mappers for subfields that aren't explicitly added to the mapping towards the field limit.

This means that some field types, such as search_as_you_type or percolator count as more than one field even though that's not apparent to users as they're just defining them as a single field in the mapping.

This change makes it so that each field mapper only counts as one. We're still counting multi-fields.

This makes it easier to understand for users why the field limit is hit.

In addition to that, it also simplifies #96235 as it makes the implementation of Mapper.Builder#getTotalFieldsCount much easier and easier to align with Mapper#getTotalFieldsCount. This reduces the risk of over- or under-estimating the field count of a Mapper.Builder in DocumentParserContext#addDynamicMapper,
which in turn reduces the risk of data loss due to the issue described here: #96235 (comment).

Edit: due to #103865, we don't need an implementation of getTotalFieldsCount or mapperSize in Mapper.Builder. Still, this PR more closely aligns Mapper#getTotalFieldsCount with MappingLookup#getTotalFieldsCount, which DocumentParserContext#addDynamicMapper uses to determine whether the field limit is hit

A potential risk of this is that we're now effectively allowing more fields in the mapping. It may be surprising to users that more fields can be added to a mapping. Although, I'd not expect negative consequences from that. Generally, I'd expect users to be happy about any change that reduces the risk of data loss.

We could also think about whether to apply the new counting logic only to new indices (depending on the IndexVersion). However, that would add more complexity and I'm not convinced about the value. We'd then need to maintain two different ways of counting fields and also require passing in the IndexVersion to MappingLookup which previously didn't require the IndexVersion.

This PR is meant as a conversation starter. It would also simplify #96235 but I don't think this blocks that PR in any way.

I'm curious about the opinion of @javanna and @jpountz on this.

Today, we're counting all mappers, including mappers for subfields that aren't explicitly added to the mapping towards the field limit.

This means that some field types, such as search_as_you_type or percolator count as more than one field even though that's not apparent to users as they're just defining them as a single field in the mapping.

This change makes it so that each field mapper only counts as one. We're still counting multi-fields.

This makes it easier to understand for users why the field limit is hit.
@felixbarny felixbarny added >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types labels Dec 1, 2023
@elasticsearchmachine elasticsearchmachine added v8.12.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Dec 1, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @felixbarny, I've created a changelog YAML for you.

@felixbarny felixbarny self-assigned this Dec 1, 2023
@jpountz
Copy link
Contributor

jpountz commented Dec 13, 2023

+1 to this change. The thing that is not entirely clear to me is what should we do about meta fields (_id , _source, etc.). It might be better to not count them either?

There is a strong benefit for us, which is that recording additional fields under the hood is no longer a change that affects users, if we don't count these fields.

@felixbarny
Copy link
Member Author

Meta fields (Mapping#metadataMappers) haven't been counted previously and they also aren't counted with this PR (only Mapping#root).

@felixbarny felixbarny marked this pull request as ready for review December 13, 2023 13:55
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Dec 13, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@felixbarny
Copy link
Member Author

There is a strong benefit for us, which is that recording additional fields under the hood is no longer a change that affects users, if we don't count these fields.

I think this is a good point. Currently, if we'd decide to add additional mappers for a field, that might be seen as a breaking change as users can then add fewer of these fields to the mapping. So this change would avoid these kinds of breaking changes in the future.

@siposea siposea added the :StorageEngine/Logs You know, for Logs label Jan 25, 2024
elasticsearchmachine pushed a commit that referenced this pull request Feb 2, 2024
)

Adds a new `index.mapping.total_fields.ignore_dynamic_beyond_limit`
index setting.

When set to `true`, new fields are added to the mapping as long as the
field limit (`index.mapping.total_fields.limit`) is not exceeded. Fields
that would exceed the limit are not added to the mapping, similar to
`dynamic: false`.  Ignored fields are added to the `_ignored` metadata
field.

Relates to #89911

To make this easier to review, this is split into the following PRs: -
[x] #102915 - [x]
#102936 - [x]
#104769

Related but not a prerequisite: - [ ]
#102885
@@ -113,6 +113,11 @@ public void validate(MappingLookup mappers) {
}
}

@Override
public int getTotalFieldsCount() {
return 1;
Copy link
Contributor

@salvatore-campagna salvatore-campagna Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see here that we are counting aliases as one...wouldn't it make sense to skip aliases not counting them?
Considering also that we might use (extensively) passthrough fields #103648
probably it might make sense not to count them.

@salvatore-campagna
Copy link
Contributor

LGTM

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should not complicate things. If all this change does is possibly allow for more fields in the mappings with the same total fields limit, it should not be a concern. It is more permissive and won't break existing users. It does mean that docs/mappings that were rejected before the upgrade are possibly going to be accepted after the upgrade, for the same indices, which is slightly weird. I am not convinced though that we should tie this change to the index created version.

I must admit that I still find it difficult to summarize what counting this change affects, in that it moves things around by delegating the counting to each mapper, and I am not sure what part of that is just mechanical changes and what part affects the actual counting semantics. Is the only difference those special cases like percolator and search as you type, which have more fields than one, and will now count as 1, while before we were not able to make that distinction within MappingLookup?

Should we update the docs on this or perhaps it is too much in the weeds that users don't need to know? I guess we don't even document the current behaviour that clearly?

* Returns the total number of mappers defined in the mappings, including field mappers and their sub-fields
* (which are not explicitly defined in the mappings), multi-fields, object mappers, runtime fields and metadata field mappers.
*/
public long getTotalMapperCount() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can no longer tell the difference between getTotalFieldsCount and getTotalMapperCount . It's very subtle, or is there any difference at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixbarny
Copy link
Member Author

Is the only difference those special cases like percolator and search as you type, which have more fields than one, and will now count as 1, while before we were not able to make that distinction within MappingLookup?

Yes, this pretty much summarizes the change. Another way to put it would be to say that now we're counting only the fields that are visible in the mapping. Before the change, we counted mappers, not fields and a field may have multiple mappers.

This is also why MappingLookup now has two different methods: getTotalMapperCount returns the number of mapper instances in the mapping, which is used to estimate the storage overhead of the mapping in NodeMappingStats. The getTotalFieldsCount method is used when it comes to validating the field limit or checking whether we have budget left to add a dynamic field.

Should we update the docs on this or perhaps it is too much in the weeds that users don't need to know? I guess we don't even document the current behaviour that clearly?

I don't feel like docs changes are needed. It should be clearer and more easy to reason about why the field limit has been hit than before. Now, users can just count the fields in the mapping and when the limit is hit, we'd expect that the number of fields visible in the mapping is equal to the field limit.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@felixbarny felixbarny added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Feb 6, 2024
@elasticsearchmachine elasticsearchmachine merged commit ff0f83f into elastic:main Feb 6, 2024
14 checks passed
@felixbarny felixbarny deleted the field-limit-count-fields-not-mappers branch February 6, 2024 12:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/Logs You know, for Logs Team:Search Meta label for search team v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants