Make field limit more predictable #102885

felixbarny · 2023-12-01T17:09:45Z

Today, we count all mappers towards the field limit, including mappers for subfields that aren't explicitly added to the mapping.

This means that some field types, such as search_as_you_type or percolator, count as more than one field, even though that's not apparent to users, who define them as a single field in the mapping.

With this change, each field mapper counts as one field. Multi-fields are still counted.

This makes it easier for users to understand why the field limit is hit.
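To make the counting rule concrete, here is a minimal Java sketch of the before/after difference. It uses a hypothetical, simplified mapping tree: `MappingNode`, `FieldNode`, `ObjectNode` and `FieldCountDemo` are illustrative names, not the Elasticsearch `Mapper` classes. `hiddenSubfields` stands in for internal mappers, such as the subfields a search_as_you_type field creates. The old rule counts them; the new rule counts only what the user declared, which still includes multi-fields, even nested ones. Counting an object mapping as one field is an assumption of the sketch.

```java
import java.util.List;

// Hypothetical, simplified mapping tree; not the Elasticsearch Mapper API.
interface MappingNode {
    /** Old rule: every mapper instance, including internal subfield mappers. */
    int mapperCount();

    /** New rule: only fields visible in the mapping (the field plus its multi-fields). */
    int totalFieldsCount();
}

/** A declared field; hiddenSubfields models internal mappers the user never writes down. */
record FieldNode(String name, int hiddenSubfields, List<FieldNode> multiFields) implements MappingNode {
    @Override
    public int mapperCount() {
        int count = 1 + hiddenSubfields;
        for (FieldNode multiField : multiFields) {
            count += multiField.mapperCount();
        }
        return count;
    }

    @Override
    public int totalFieldsCount() {
        int count = 1; // hidden subfields no longer count
        for (FieldNode multiField : multiFields) {
            count += multiField.totalFieldsCount(); // multi-fields still count, even nested ones
        }
        return count;
    }
}

/** An object mapping counts once, plus whatever its children count. */
record ObjectNode(String name, List<MappingNode> children) implements MappingNode {
    @Override
    public int mapperCount() {
        int count = 1;
        for (MappingNode child : children) {
            count += child.mapperCount();
        }
        return count;
    }

    @Override
    public int totalFieldsCount() {
        int count = 1;
        for (MappingNode child : children) {
            count += child.totalFieldsCount();
        }
        return count;
    }
}

class FieldCountDemo {
    public static void main(String[] args) {
        // "suggest" stands in for a search_as_you_type field with three internal subfields.
        FieldNode suggest = new FieldNode("suggest", 3, List.of());
        // "title" has a multi-field "keyword", which itself has a multi-field "raw".
        FieldNode raw = new FieldNode("raw", 0, List.of());
        FieldNode title = new FieldNode("title", 0, List.of(new FieldNode("keyword", 0, List.of(raw))));
        ObjectNode doc = new ObjectNode("doc", List.of(suggest, title));
        System.out.println("mappers: " + doc.mapperCount() + ", fields: " + doc.totalFieldsCount());
    }
}
```

Running `FieldCountDemo` prints `mappers: 8, fields: 5`: the search_as_you_type stand-in drops from four counted units to one, while the nested multi-fields keep counting.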

In addition, it simplifies #96235, as it makes the implementation of Mapper.Builder#getTotalFieldsCount much simpler and easier to align with Mapper#getTotalFieldsCount. This reduces the risk of over- or under-estimating the field count of a Mapper.Builder in DocumentParserContext#addDynamicMapper, which in turn reduces the risk of data loss due to the issue described here: #96235 (comment).

Edit: due to #103865, we don't need an implementation of getTotalFieldsCount or mapperSize in Mapper.Builder. Still, this PR more closely aligns Mapper#getTotalFieldsCount with MappingLookup#getTotalFieldsCount, which DocumentParserContext#addDynamicMapper uses to determine whether the field limit is hit.

A potential risk is that we're now effectively allowing more fields in the mapping. It may surprise users that more fields can be added to a mapping, although I wouldn't expect negative consequences from that. Generally, I'd expect users to be happy about any change that reduces the risk of data loss.

We could also consider applying the new counting logic only to new indices (depending on the IndexVersion). However, that would add complexity, and I'm not convinced of the value. We'd need to maintain two different ways of counting fields and pass the IndexVersion to MappingLookup, which previously didn't require it.

This PR is meant as a conversation starter. It would also simplify #96235 but I don't think this blocks that PR in any way.

I'm curious about the opinion of @javanna and @jpountz on this.


elasticsearchmachine · 2023-12-01T17:10:31Z

Hi @felixbarny, I've created a changelog YAML for you.

jpountz · 2023-12-13T13:20:30Z

+1 to this change. What is not entirely clear to me is what we should do about meta fields (_id, _source, etc.). Might it be better not to count them either?

There is a strong benefit for us, which is that recording additional fields under the hood is no longer a change that affects users, if we don't count these fields.

felixbarny · 2023-12-13T13:23:33Z

Meta fields (Mapping#metadataMappers) weren't counted before, and they aren't counted with this PR either; only the mappers under Mapping#root are counted.

elasticsearchmachine · 2023-12-13T13:55:35Z

Pinging @elastic/es-search (Team:Search)

felixbarny · 2023-12-13T14:11:00Z

> There is a strong benefit for us, which is that recording additional fields under the hood is no longer a change that affects users, if we don't count these fields.

I think this is a good point. Currently, if we decided to add additional mappers for a field, that might be seen as a breaking change, as users could then add fewer of these fields to the mapping. This change avoids these kinds of breaking changes in the future.


Adds a new `index.mapping.total_fields.ignore_dynamic_beyond_limit` index setting. When set to `true`, new fields are added to the mapping as long as the field limit (`index.mapping.total_fields.limit`) is not exceeded. Fields that would exceed the limit are not added to the mapping, similar to `dynamic: false`. Ignored fields are added to the `_ignored` metadata field.

Relates to #89911

To make this easier to review, this is split into the following PRs:
- [x] #102915
- [x] #102936
- [x] #104769

Related but not a prerequisite:
- [ ] #102885
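The setting's behaviour can be pictured as a field budget that dynamic fields draw from. This is a hypothetical Java sketch of the described semantics only: `DynamicFieldBudget`, `tryAddDynamicField`, `BudgetDemo` and the exception message are illustrative, not Elasticsearch code, and the `ignored` set merely stands in for what ends up in the `_ignored` metadata field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the index.mapping.total_fields.ignore_dynamic_beyond_limit semantics.
class DynamicFieldBudget {
    private final long limit;                       // plays the role of index.mapping.total_fields.limit
    private final boolean ignoreDynamicBeyondLimit; // plays the role of ..._ignore_dynamic_beyond_limit
    private long totalFieldsCount;                  // fields already mapped plus those added so far
    private final List<String> added = new ArrayList<>();
    private final Set<String> ignored = new LinkedHashSet<>(); // stand-in for the _ignored metadata field

    DynamicFieldBudget(long limit, boolean ignoreDynamicBeyondLimit, long existingFieldsCount) {
        this.limit = limit;
        this.ignoreDynamicBeyondLimit = ignoreDynamicBeyondLimit;
        this.totalFieldsCount = existingFieldsCount;
    }

    /**
     * Tries to add a dynamic field that counts as {@code fieldsCount} fields
     * (1 for the field itself plus any multi-fields). Returns true if it was added.
     */
    boolean tryAddDynamicField(String name, int fieldsCount) {
        if (totalFieldsCount + fieldsCount <= limit) {
            totalFieldsCount += fieldsCount;
            added.add(name);
            return true;
        }
        if (ignoreDynamicBeyondLimit) {
            // Similar to dynamic: false, the field is not added to the mapping.
            ignored.add(name);
            return false;
        }
        throw new IllegalStateException("field limit [" + limit + "] exceeded by [" + name + "]");
    }

    List<String> added() { return Collections.unmodifiableList(added); }

    Set<String> ignored() { return Collections.unmodifiableSet(ignored); }
}

class BudgetDemo {
    public static void main(String[] args) {
        // Limit of 3 with one field already mapped leaves room for two more.
        DynamicFieldBudget budget = new DynamicFieldBudget(3, true, 1);
        budget.tryAddDynamicField("host.name", 1);
        budget.tryAddDynamicField("service.name", 1);
        budget.tryAddDynamicField("message", 1);
        System.out.println("added=" + budget.added() + " ignored=" + budget.ignored());
    }
}
```

`BudgetDemo` prints `added=[host.name, service.name] ignored=[message]`. With the setting off, the third call throws instead, which is the "document rejected" path the setting is meant to avoid.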

salvatore-campagna · 2024-02-01T10:44:12Z

server/src/main/java/org/elasticsearch/index/mapper/FieldAliasMapper.java

@@ -113,6 +113,11 @@ public void validate(MappingLookup mappers) {
        }
    }

+    @Override
+    public int getTotalFieldsCount() {
+        return 1;
+    }


I see that we are counting aliases as one field. Wouldn't it make sense to skip aliases and not count them at all?
Considering that we might use passthrough fields (#103648) extensively, it probably makes sense not to count them.

salvatore-campagna · 2024-02-02T13:58:47Z

LGTM

javanna

I agree that we should not complicate things. If all this change does is possibly allow more fields in the mappings under the same total fields limit, it should not be a concern: it is more permissive and won't break existing users. It does mean that docs/mappings that were rejected before the upgrade may be accepted after the upgrade, for the same indices, which is slightly weird. I am not convinced, though, that we should tie this change to the index created version.

I must admit that I still find it difficult to summarize how this change affects the counting: it moves things around by delegating the counting to each mapper, and I am not sure which part of that is just mechanical and which part affects the actual counting semantics. Is the only difference those special cases like percolator and search as you type, which have more fields than one, and will now count as 1, while before we were not able to make that distinction within MappingLookup?

Should we update the docs on this or perhaps it is too much in the weeds that users don't need to know? I guess we don't even document the current behaviour that clearly?

javanna · 2024-02-02T14:29:10Z

server/src/main/java/org/elasticsearch/index/mapper/MappingLookup.java

+     * Returns the total number of mappers defined in the mappings, including field mappers and their sub-fields
+     * (which are not explicitly defined in the mappings), multi-fields, object mappers, runtime fields and metadata field mappers.
+     */
+    public long getTotalMapperCount() {


I think I can no longer tell the difference between getTotalFieldsCount and getTotalMapperCount. It's very subtle, or is there any difference at all?

See #102885 (comment)

felixbarny · 2024-02-06T10:25:14Z

> Is the only difference those special cases like percolator and search as you type, which have more fields than one, and will now count as 1, while before we were not able to make that distinction within MappingLookup?

Yes, this pretty much summarizes the change. Another way to put it is that we now count only the fields that are visible in the mapping. Before this change, we counted mappers, not fields, and a single field may have multiple mappers.

This is also why MappingLookup now has two different methods: getTotalMapperCount returns the number of mapper instances in the mapping and is used to estimate the storage overhead of the mapping in NodeMappingStats. getTotalFieldsCount is used to validate the field limit and to check whether there is budget left to add a dynamic field.
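A minimal sketch of how the two totals diverge, assuming a lookup that knows each root-level field's mapper count and visible field count. `LookupSketch`, `RootField`, `wouldExceedLimit` and `LookupDemo` are hypothetical and only mirror the method names mentioned above; following the javadoc excerpt earlier in this thread, metadata mappers are part of the mapper count, while, as discussed, they never count towards the field limit. The number of metadata mappers used here is purely illustrative.

```java
import java.util.List;

// Hypothetical sketch; not the real MappingLookup.
record RootField(String name, int mapperCount, int visibleFieldsCount) {}

class LookupSketch {
    private final List<RootField> rootFields;
    private final int metadataMapperCount; // e.g. _id, _source

    LookupSketch(List<RootField> rootFields, int metadataMapperCount) {
        this.rootFields = List.copyOf(rootFields);
        this.metadataMapperCount = metadataMapperCount;
    }

    /** Every mapper instance, metadata mappers included: a proxy for the mapping's storage overhead. */
    long getTotalMapperCount() {
        long count = metadataMapperCount;
        for (RootField field : rootFields) {
            count += field.mapperCount();
        }
        return count;
    }

    /** Only fields visible in the mapping under the root: what the field limit is checked against. */
    long getTotalFieldsCount() {
        long count = 0;
        for (RootField field : rootFields) {
            count += field.visibleFieldsCount();
        }
        return count;
    }

    /** Would adding {@code additionalFields} more fields exceed {@code limit}? */
    boolean wouldExceedLimit(long limit, long additionalFields) {
        return getTotalFieldsCount() + additionalFields > limit;
    }
}

class LookupDemo {
    public static void main(String[] args) {
        LookupSketch lookup = new LookupSketch(List.of(
                new RootField("suggest", 4, 1), // search_as_you_type stand-in: 4 mappers, 1 visible field
                new RootField("title", 2, 2)),  // text field with one keyword multi-field
                2);                             // two metadata mappers (illustrative)
        System.out.println(lookup.getTotalMapperCount() + " mappers, " + lookup.getTotalFieldsCount() + " fields");
    }
}
```

`LookupDemo` prints `8 mappers, 3 fields`. If `suggest` later gained a fifth internal mapper, only `getTotalMapperCount` would change, which is what keeps such additions from being a user-facing breaking change.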

> Should we update the docs on this or perhaps it is too much in the weeds that users don't need to know? I guess we don't even document the current behaviour that clearly?

I don't think docs changes are needed. It should now be clearer and easier to reason about why the field limit has been hit. Users can simply count the fields in the mapping: when the limit is hit, we'd expect the number of fields visible in the mapping to equal the field limit.


javanna

LGTM

felixbarny added 2 commits December 1, 2023 17:53

Add test case for multi-field inside a multi-field

7b227bb

felixbarny added >enhancement :Search Foundations/Mapping labels Dec 1, 2023

elasticsearchmachine added v8.12.0 external-contributor labels Dec 1, 2023

Update docs/changelog/102885.yaml

e96cb64

felixbarny self-assigned this Dec 1, 2023

Add assertWarnings to LegacyGeoShapeFieldMapperTests

bc3542f

This was referenced Dec 4, 2023

Add ability to limit fields added during Mapper#merge #102936

Merged

Add setting to ignore dynamic fields when field limit is reached #96235

Merged

brianseeders added v8.13.0 and removed v8.12.0 labels Dec 6, 2023

felixbarny mentioned this pull request Dec 8, 2023

Avoid contacting master on noop mapping update #102915

Merged

felixbarny marked this pull request as ready for review December 13, 2023 13:55

elasticsearchmachine added the Team:Search label Dec 13, 2023

dakrone requested a review from salvatore-campagna January 3, 2024 14:35

felixbarny added 4 commits January 24, 2024 12:28

Merge remote-tracking branch 'origin/main' into field-limit-count-fields-not-mappers

37eadb6

Fix exceedsLimit

eb9ae17

Remove unnecessary getRootObjectMapperBuilder method

99eb652

Apply spotless suggestions

1f70c52

siposea added the :StorageEngine/Logs label Jan 25, 2024

salvatore-campagna reviewed Feb 2, 2024


salvatore-campagna approved these changes Feb 2, 2024


javanna reviewed Feb 2, 2024


felixbarny added 2 commits February 6, 2024 11:29

Merge remote-tracking branch 'origin/main' into field-limit-count-fields-not-mappers

c89daa3

Update DocumentParserContext to use new method name

fd41e67

javanna approved these changes Feb 6, 2024


felixbarny added the auto-merge-without-approval label Feb 6, 2024

elasticsearchmachine merged commit ff0f83f into elastic:main Feb 6, 2024
14 checks passed

felixbarny deleted the field-limit-count-fields-not-mappers branch February 6, 2024 12:05
