Make total fields limit less of a nuisance to users #89911
Comments
Pinging @elastic/es-search (Team:Search)
I think this would be a reasonable approach, especially if you could configure the limit on sub-documents as well (some parts of a document might come from users and you want to limit dynamic mappings there).
We discussed this with the team, and we said the following:
++ to the above, as discussed on another channel. Applying the limit to dynamic mapping updates only seems to be the way to go to me too. We have a lot of built-in mappings with far more than 1k fields in our own products already. For dynamic mapping updates I think a higher limit may make sense if this is something that's actually causing trouble for users. The only reservation I had in this regard was that unlike a static mapping of thousands of fields, a dynamic mapping of thousands of fields will have data in every field. This causes higher memory use from Lucene data structures than just having unused fields in a static mapping. As we discussed on another channel, that kind of issue shouldn't be addressed by the default value of this setting, though, but rather by other means of reducing per-field overhead.
Huge +1 on everything that was said here. Just one addition that wasn't discussed yet in this thread. I think that the biggest pain with the field limit is that it causes data loss. Therefore, there should be a mode where hitting the field limit doesn't lead to rejecting the document but to not adding additional dynamic fields once the limit has been reached. In other words, it should be possible to index the first 1000 dynamic fields and after that, the additional fields would just be stored but not added to the mapping (similar to `dynamic: false`).

This would resolve a huge pain point that we have for logging and tracing data, where bogus documents or misuse of an API can lead to data loss. As in APM multiple services share the same data stream, a single misbehaving service can cause data loss for all services.
Adds a new `index.mapping.total_fields.ignore_dynamic_beyond_limit` index setting. When set to `true`, new fields are added to the mapping as long as the field limit (`index.mapping.total_fields.limit`) is not exceeded. Fields that would exceed the limit are not added to the mapping, similar to `dynamic: false`. Ignored fields are added to the `_ignored` metadata field.

Relates to #89911

To make this easier to review, this is split into the following PRs:
- [x] #102915
- [x] #102936
- [x] #104769

Related but not a prerequisite:
- [ ] #102885
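For illustration, here is a minimal sketch of how the setting described above could be applied when creating an index (the index name and limit value are placeholders, not taken from this thread):

```
PUT my-index-000001
{
  "settings": {
    "index.mapping.total_fields.limit": 1000,
    "index.mapping.total_fields.ignore_dynamic_beyond_limit": true
  }
}
```

With this in place, documents that would introduce dynamic fields beyond the limit are still indexed; the extra fields are simply not added to the mapping and are recorded in the `_ignored` metadata field.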
Update: We've merged the PR adding `index.mapping.total_fields.ignore_dynamic_beyond_limit`. This addresses the document loss when adding dynamic fields beyond the limit. It doesn't cover the part about applying the field limit only to dynamic mapping updates.
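Assuming the setting above is enabled, one way to spot documents that had fields dropped from the mapping is an `exists` query on the `_ignored` metadata field (the index name is a placeholder):

```
GET my-index-000001/_search
{
  "query": {
    "exists": {
      "field": "_ignored"
    }
  }
}
```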
Is there a way to change this setting on all future indices? I don't understand how to prevent document loss if there is no IaC way of setting this rule before the index is created and starts skipping fields. How would I apply this rule to sharded indices, where streams are piped to an index name with a timestamp? It seems that this setting cannot be defined in any configuration file like most others, or in the stack management advanced settings. This has been a massive barrier in Elasticsearch usability for our use case, with very little documentation. Our ingest data hasn't even exceeded 4 GB at this stage.
It is easy to specify this in your index configuration:

```
{
  "settings": {
    "index.mapping.total_fields.limit": 20000
  },
  "mappings": {
  }
}
```
Hey @jackgray, you can use index templates to define the settings and mappings for an index pattern before these indices exist. Elasticsearch also ships with default index templates of its own.
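As a sketch of that approach, a composable index template along these lines (the template name, pattern, and values are hypothetical) would apply the settings to every index created later under the pattern, including timestamped ones:

```
PUT _index_template/my-logs-template
{
  "index_patterns": ["my-logs-*"],
  "priority": 200,
  "template": {
    "settings": {
      "index.mapping.total_fields.limit": 2000,
      "index.mapping.total_fields.ignore_dynamic_beyond_limit": true
    }
  }
}
```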
Pinging @elastic/es-search-foundations (Team:Search Foundations)
Since version 5.0, every index created within Elasticsearch has a maximum total number of fields defined in its mappings, which defaults to 1000. Fields can be manually added to the index mappings through the put mappings API or via dynamic mappings by indexing documents through the index API. A request that causes the total fields count to go above the limit is rejected, whether that be a put mappings, a create index or an index call. The total fields limit can be manually increased using the update index settings API.
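For example (the index name and new value are placeholders), raising the limit on an existing index is a single call to the update index settings API:

```
PUT my-index-000001/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```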
The main reason why the total fields limit was introduced (see #11443) is to prevent a mappings explosion caused by ingesting a bogus document with many JSON keys (see #73460 for an example). A mappings explosion increases the size of the cluster state and the memory footprint of data nodes, and hinders stability.
While the total fields limit is a safety measure against mappings explosion, it is not an effective solution to prevent data nodes from going out of memory due to too many fields being defined: it's an index-based limit, meaning that you can have 10k indices with 990 fields each without hitting the limit, yet possibly run into problems depending on the available resources, while a single index with more than 1000 fields is not allowed. Data nodes load mappings only for the indices that have at least one shard allocated to them, which makes it quite difficult to set a reasonable limit that effectively prevents data nodes from going out of memory.
It is quite common for users to reach the total fields limit, which causes ingestion failures and a consequent need to increase the limit. Our Solutions (e.g. APM) increase the total fields limit too. The fact that many users end up reaching the limit even though they are not ingesting bogus documents sounds like a bug: ideally the limit would only be reached with a very high number of fields that is very likely caused by a bogus document, and no user would have to know about or increase the limit otherwise.
The total fields limit has been around for quite some time, so it may very well be that the `1000` default was reasonably high when it was introduced, but it turned out to be too low over time. Possibly all the recent improvements made in the cluster state handling area on dealing with many shards and many indices have also helped support more fields in the mappings. An area of improvement is the memory footprint of mappings within data nodes (see #86440), and once we improve that we will be able to support even more fields, yet this is a tangential issue given that the current limit does not prevent data nodes from going out of memory.

I'd propose that we consider making the following changes, with the high-level goal of making the total fields limit less visible to users while keeping it effective for its original goal:
- apply the total fields limit only to dynamic mapping updates: if the limit was introduced to protect against dynamic mapping updates caused by bogus documents, why do we apply it to every mapping update, including the ones triggered by put mappings and create index calls? Shall we stop doing that and apply the limit only to dynamic mapping updates?
- given that the limit turns out to be too low, and users that rely on dynamic mappings end up having to increase it, would it be reasonable to increase the limit so that fewer users stumble upon it, while the limit would still ensure that bogus documents are rejected? Would 10000 fit these requirements? Is it still too low for situations where a single index is created?
- should we consider introducing a different mechanism to limit field creation based on resources, also taking into account the number of indices etc., to prevent data nodes from going out of memory due to too many fields in the mappings?
Is there any preparation work needed to feel confident that making the total fields limit more permissive does not cause problems? Could we end up allowing for situations that would have previously legitimately hit the limit?