-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make field limit more predictable #102885
Make field limit more predictable #102885
Conversation
Today, we're counting all mappers, including mappers for subfields that aren't explicitly added to the mapping towards the field limit. This means that some field types, such as search_as_you_type or percolator count as more than one field even though that's not apparent to users as they're just defining them as a single field in the mapping. This change makes it so that each field mapper only counts as one. We're still counting multi-fields. This makes it easier to understand for users why the field limit is hit.
Hi @felixbarny, I've created a changelog YAML for you. |
+1 to this change. The thing that is not entirely clear to me is what should we do about meta fields ( There is a strong benefit for us, which is that recording additional fields under the hood is no longer a change that affects users, if we don't count these fields. |
Meta fields ( |
Pinging @elastic/es-search (Team:Search) |
I think this is a good point. Currently, if we'd decide to add additional mappers for a field, that might be seen as a breaking change as users can then add fewer of these fields to the mapping. So this change would avoid these kinds of breaking changes in the future. |
) Adds a new `index.mapping.total_fields.ignore_dynamic_beyond_limit` index setting. When set to `true`, new fields are added to the mapping as long as the field limit (`index.mapping.total_fields.limit`) is not exceeded. Fields that would exceed the limit are not added to the mapping, similar to `dynamic: false`. Ignored fields are added to the `_ignored` metadata field. Relates to #89911 To make this easier to review, this is split into the following PRs: - [x] #102915 - [x] #102936 - [x] #104769 Related but not a prerequisite: - [ ] #102885
@@ -113,6 +113,11 @@ public void validate(MappingLookup mappers) { | |||
} | |||
} | |||
|
|||
@Override | |||
public int getTotalFieldsCount() { | |||
return 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see here that we are counting aliases as one...wouldn't it make sense to skip aliases not counting them?
Considering also that we might use (extensively) passthrough fields #103648
probably it might make sense not to count them.
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should not complicate things. If all this change does is possibly allow for more fields in the mappings with the same total fields limit, it should not be a concern. It is more permissive and won't break existing users. It does mean that docs/mappings that were rejected before the upgrade are possibly going to be accepted after the upgrade, for the same indices, which is slightly weird. I am not convinced though that we should tie this change to the index created version.
I must admit that I still find it difficult to summarize what counting this change affects, in that it moves things around by delegating the counting to each mapper, and I am not sure what part of that is just mechanical changes and what part affects the actual counting semantics. Is the only difference those special cases like percolator and search as you type, which have more fields than one, and will now count as 1, while before we were not able to make that distinction within MappingLookup?
Should we update the docs on this or perhaps it is too much in the weeds that users don't need to know? I guess we don't even document the current behaviour that clearly?
* Returns the total number of mappers defined in the mappings, including field mappers and their sub-fields | ||
* (which are not explicitly defined in the mappings), multi-fields, object mappers, runtime fields and metadata field mappers. | ||
*/ | ||
public long getTotalMapperCount() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I can no longer tell the difference between getTotalFieldsCount and getTotalMapperCount . It's very subtle, or is there any difference at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this pretty much summarizes the change. Another way to put it would be to say that now we're counting only the fields that are visible in the mapping. Before the change, we counted mappers, not fields and a field may have multiple mappers. This is also why
I don't feel like docs changes are needed. It should be clearer and more easy to reason about why the field limit has been hit than before. Now, users can just count the fields in the mapping and when the limit is hit, we'd expect that the number of fields visible in the mapping is equal to the field limit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Today, we're counting all mappers, including mappers for subfields that aren't explicitly added to the mapping towards the field limit.
This means that some field types, such as
search_as_you_type
orpercolator
count as more than one field even though that's not apparent to users as they're just defining them as a single field in the mapping.This change makes it so that each field mapper only counts as one. We're still counting multi-fields.
This makes it easier to understand for users why the field limit is hit.
In addition to that, it also simplifies #96235 as it makes the implementation ofMapper.Builder#getTotalFieldsCount
much easier and easier to align withMapper#getTotalFieldsCount
. This reduces the risk of over- or under-estimating the field count of aMapper.Builder
inDocumentParserContext#addDynamicMapper
,which in turn reduces the risk of data loss due to the issue described here: #96235 (comment).
Edit: due to #103865, we don't need an implementation of
getTotalFieldsCount
ormapperSize
inMapper.Builder
. Still, this PR more closely alignsMapper#getTotalFieldsCount
withMappingLookup#getTotalFieldsCount
, whichDocumentParserContext#addDynamicMapper
uses to determine whether the field limit is hitA potential risk of this is that we're now effectively allowing more fields in the mapping. It may be surprising to users that more fields can be added to a mapping. Although, I'd not expect negative consequences from that. Generally, I'd expect users to be happy about any change that reduces the risk of data loss.
We could also think about whether to apply the new counting logic only to new indices (depending on the
IndexVersion
). However, that would add more complexity and I'm not convinced about the value. We'd then need to maintain two different ways of counting fields and also require passing in theIndexVersion
toMappingLookup
which previously didn't require theIndexVersion
.This PR is meant as a conversation starter. It would also simplify #96235 but I don't think this blocks that PR in any way.
I'm curious about the opinion of @javanna and @jpountz on this.