Avoid contacting master on noop mapping update #102915
Conversation
Also reset noop mapping update count when doing a retry for another reason
Pinging @elastic/es-distributed (Team:Distributed)
I think the main part of the PR looks good, but the retry loop detection could be flawed for updates, and it seems like it should behave the same whether the master detects the noop or the data node itself detects it.
@@ -1072,6 +1091,70 @@ public void testPerformOnPrimaryReportsBulkStats() throws Exception {
        latch.await();
    }

    public void testNoopMappingUpdateInfiniteLoopPrevention() throws Exception {
I wonder if this test tests a non-existing scenario?
The noop prevention would only concern a case where the mapping used during parsing is different from the mapping used during preflight, i.e. it already contains the added mapping changes. And if so, we'd expect any retry to always succeed?
> I wonder if this test tests a non-existing scenario?

I hope it does :)
This is meant as a safety net in case there's an unexpected bug that leads to a situation where we accept a field in `DocumentParserContext` even though it turns out that we don't have enough capacity for it when merging the mappings.
    */
    public void resetForNoopMappingUpdateRetry(long mappingVersion) {
        assert assertInvariants(ItemProcessingState.TRANSLATED);
        if (noopMappingUpdateRetryForMappingVersion == mappingVersion) {
I am not sure I follow why this infinite loop detection is necessary. I'd expect that since all dynamic fields from the previous parse are now in the mappings, the retry would always succeed in one go?
The only case where I could see this not being true is updates, where a new version of the doc could appear and thus a new dynamic mapping update could potentially be required. That would be at odds with the detection here, I think.
To some extent, having some detection in assertions would be good, but it would have to not trigger for updates, and it should then be the same for other mapping update retries. I could accept not adding that now, though.
> I'd expect that since all dynamic fields from the previous parse are now in the mappings, the retry would always succeed in one go?

If all goes right, this is true. However, there's no strict guarantee of that, and the failure mode I have in mind is not related to concurrency. The risk comes from the fact that there are two distinct places that reason about a mapping's size: one is `DocumentParserContext`, and the other is `Mapping::merge`/`MappingLookup`. In `DocumentParserContext`, we're just estimating the size of the mapping when adding dynamic mappers. If there's a bug in that estimation (#102885 tries to make that less likely), there's a chance that we add more dynamic mappers than the mapping can hold. We normally find out that we have accepted too many dynamic mappers in `DocumentParserContext` when the preflight check throws an exception. However, with #96235, for indices that have enabled the setting `ignore_dynamic_beyond_limit`, the preflight check won't fail but will silently ignore fields that are beyond the limit. After realizing that a noop mapping update has been requested, the indexing request gets retried. However, the mapping hasn't changed in the meantime. Therefore, when the document gets parsed, we add the same dynamic mapper in `DocumentParserContext` again, which leads to the same dynamic mapping update that the preflight check will ignore again, and we've entered an infinite loop.
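The cycle described above can be simulated with a toy model (hypothetical and simplified; the names and numbers are illustrative, not Elasticsearch code):

```java
// Simplified simulation of the failure mode: a buggy size estimate lets the
// parser propose a field the mapping cannot hold, ignore_dynamic_beyond_limit
// turns the preflight merge into a silent noop, and since the mapping never
// changes, every retry re-parses the document into the exact same state.
public class NoopRetryLoopDemo {
    static int retriesUntilProgress(int mappedFields, int fieldLimit, int maxRetries) {
        for (int retry = 0; retry < maxRetries; retry++) {
            // Parsing proposes one new dynamic field (the buggy estimate accepts it).
            boolean fits = mappedFields < fieldLimit;
            if (fits) {
                return retry; // a real mapping update: the retry makes progress
            }
            // Preflight merge ignores the field beyond the limit: noop update,
            // the mapping version stays the same, so the next retry is identical.
        }
        return maxRetries; // never made progress: this is the infinite loop
    }

    public static void main(String[] args) {
        // Mapping already at the limit: no retry ever makes progress.
        System.out.println(retriesUntilProgress(2, 2, 10)); // prints 10
        // Below the limit: the first attempt succeeds immediately.
        System.out.println(retriesUntilProgress(1, 2, 10)); // prints 0
    }
}
```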
Maybe another solution to this could be that the preflight check merge reason, in contrast to the auto-update merge reason, doesn't ignore fields that are beyond the limit but throws an exception instead. However, due to a race condition, we might have added a dynamic mapper under the assumption that enough space is still available. After accepting the mapper, but before the preflight check, the mapping might get updated with a version that already puts us at the limit. If we then throw a validation error during the preflight check, it will lead to data loss instead of ignoring the fields that are above the limit.
> The only case where I could see this not being true is updates, where a new version of the doc could appear and thus a new dynamic mapping update could potentially be required.

You're referring to the situation when the `retry_on_conflict` parameter is set, right? I think this case is covered by the fact that `resetForExecutionRetry()` resets `noopMappingUpdateRetryForMappingVersion`. This means that the infinite loop prevention only kicks in if the `BulkPrimaryExecutionContext` has been reset via `resetForNoopMappingUpdateRetry()` twice in a row, and if the mapping also hasn't changed in the meantime.
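The bookkeeping described above can be sketched roughly like this (a simplified, hypothetical model that mirrors the method names in the snippet, not the actual implementation):

```java
// Hypothetical sketch of the loop guard: it only trips when a noop mapping
// update retry sees the same mapping version twice in a row; a retry for any
// other reason (e.g. a version conflict with retry_on_conflict) clears the
// state via resetForExecutionRetry.
public class RetryBookkeeping {
    private static final long UNSET = -1;
    private long noopMappingUpdateRetryForMappingVersion = UNSET;

    /** Returns true when a noop retry was already attempted for this mapping version. */
    public boolean noopRetryLoopDetected(long mappingVersion) {
        return noopMappingUpdateRetryForMappingVersion == mappingVersion;
    }

    public void resetForNoopMappingUpdateRetry(long mappingVersion) {
        noopMappingUpdateRetryForMappingVersion = mappingVersion;
    }

    public void resetForExecutionRetry() {
        noopMappingUpdateRetryForMappingVersion = UNSET; // other retries clear the guard
    }

    public static void main(String[] args) {
        RetryBookkeeping ctx = new RetryBookkeeping();
        ctx.resetForNoopMappingUpdateRetry(7);
        System.out.println(ctx.noopRetryLoopDetected(7)); // true: same version twice
        System.out.println(ctx.noopRetryLoopDetected(8)); // false: mapping changed
        ctx.resetForExecutionRetry();
        System.out.println(ctx.noopRetryLoopDetected(7)); // false: guard was cleared
    }
}
```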
A way to reproduce the infinite loop is to purposefully introduce a bug in how `DocumentParserContext` counts the size of a `Mapper.Builder` in #96235:

elasticsearch/server/src/main/java/org/elasticsearch/index/mapper/DocumentParserContext.java, line 325 in 94f25f5:

    int builderMapperSize = builder.mapperSize();

If you change that to `int builderMapperSize = 1;` and execute `org.elasticsearch.index.mapper.DynamicMappingIT#testIgnoreDynamicBeyondLimitSingleMultiField`, it will trigger this.
> You're referring to the situation when the retry_on_conflict parameter is set, right?

No, I am referring to a case where an update results in a mapping update and thus a retry afterwards. That could potentially be a noop mapping update (due to concurrency). The retried update could then result in new dynamic fields.
I think it is safe towards such updates in the current form, since:
- Any retry will at least use a mapping corresponding to the last mapping version, thus only adding fields missing there.
- A noop result should thus not be possible.
I think #96235 does not change this, but the argument that we'll not run into it becomes more subtle.
A worry is that an update will fail with this error if it first adds a field f (noop detected) and then another field g (that is also a noop, towards same version). That seems impossible now, but hard to verify.
About the potential discrepancy between what the mapper results in and what the document parser holds, I suggest adding tests to verify this, as well as some assertions to ensure they are aligned at relevant times. That belongs to the other PR though (perhaps we can have the size tracking as a separate PR?).
This is what #102885 is about. It significantly simplifies the calculation of how a mapper counts against the field limit and aligns the implementations as much as possible (`Mapper.Builder#getTotalFieldsCount`, `Mapper#getTotalFieldsCount`, `MappingLookup#getTotalFieldsCount`, and `MappingLookup#checkFieldLimit`).
I've also added an assertion here: befd7f6
LGTM.
Left a few items I'd like to see addressed.
Adds a new `index.mapping.total_fields.ignore_dynamic_beyond_limit` index setting. When set to `true`, new fields are added to the mapping as long as the field limit (`index.mapping.total_fields.limit`) is not exceeded. Fields that would exceed the limit are not added to the mapping, similar to `dynamic: false`. Ignored fields are added to the `_ignored` metadata field. Relates to #89911

To make this easier to review, this is split into the following PRs:
- [x] #102915
- [x] #102936
- [x] #104769

Related but not a prerequisite:
- [ ] #102885
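As a rough illustration of the setting's semantics (a hypothetical toy model, not the actual Elasticsearch implementation), dynamic fields beyond the limit are left out of the mapping and recorded in `_ignored`:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of ignore_dynamic_beyond_limit: fields that would exceed the
// field limit are dropped from the mapping (similar to dynamic: false) and
// their names are recorded in the _ignored metadata field.
public class IgnoreBeyondLimitDemo {
    static Map<String, Object> index(List<String> existingMapping, List<String> docFields, int limit) {
        List<String> mapping = new ArrayList<>(existingMapping);
        List<String> ignored = new ArrayList<>();
        for (String field : docFields) {
            if (mapping.contains(field)) continue;   // already mapped: nothing to do
            if (mapping.size() < limit) mapping.add(field);
            else ignored.add(field);                 // beyond the limit -> _ignored
        }
        Map<String, Object> result = new LinkedHashMap<>();
        result.put("mapping", mapping);
        result.put("_ignored", ignored);
        return result;
    }

    public static void main(String[] args) {
        // One field already mapped, limit of 2: only "b" still fits.
        var result = index(List.of("a"), List.of("b", "c", "d"), 2);
        System.out.println(result); // {mapping=[a, b], _ignored=[c, d]}
    }
}
```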
In an effort to make #96235 smaller, I've extracted the changes to the bulk execution logic.
This PR does two things: