ShardFollowNodeTask fetch operations twice #32453
Pinging @elastic/es-distributed
Today ShardFollowNodeTask might fetch some operations more than once. This happens because, for the left-over request, we ask the leader for up to max_batch_count operations instead of the left-over size. The leader is then free to respond with up to max_batch_count operations. At the same time, if one of the previous requests has completed, we might issue another read request whose range overlaps with the response to the left-over request. Closes elastic#32453
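To make the overlap concrete, here is a minimal sketch of the arithmetic described above. It is not the actual ShardFollowNodeTask code: the class and method names are hypothetical, the sequence numbers are made up, and it assumes the leader has enough operations to fill whatever request size it is given. Only `max_batch_count` comes from the description.

```java
public final class LeftOverReadSketch {

    // Size of the left-over read request (hypothetical helper).
    // capToLeftOver = false models the buggy behavior: always ask for maxBatchCount.
    // capToLeftOver = true models the fix: ask only for what is still missing.
    static int leftOverRequestSize(long fromSeqNo, long toSeqNo, int maxBatchCount, boolean capToLeftOver) {
        long leftOver = toSeqNo - fromSeqNo + 1;
        return capToLeftOver ? (int) Math.min(maxBatchCount, leftOver) : maxBatchCount;
    }

    // Highest seq# the leader may return, assuming it has enough operations available.
    static long highestSeqNoLeaderMayReturn(long fromSeqNo, int requestSize) {
        return fromSeqNo + requestSize - 1;
    }

    // Number of operations beyond toSeqNo that the left-over response may contain.
    // A later read starting at toSeqNo + 1 would fetch these operations again.
    static long overlappingOps(long fromSeqNo, long toSeqNo, int maxBatchCount, boolean capToLeftOver) {
        int size = leftOverRequestSize(fromSeqNo, toSeqNo, maxBatchCount, capToLeftOver);
        long highest = highestSeqNoLeaderMayReturn(fromSeqNo, size);
        return Math.max(0L, highest - toSeqNo);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 100 operations left over, max_batch_count = 1024.
        long from = 2000L;
        long to = 2099L;
        int maxBatchCount = 1024;
        System.out.println("uncapped overlap: " + overlappingOps(from, to, maxBatchCount, false));
        System.out.println("capped overlap:   " + overlappingOps(from, to, maxBatchCount, true));
    }
}
```

With the request size capped to the left-over size, the response cannot extend past the range the follower intended to read, so a subsequent read starting right after that range no longer overlaps with it.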
Good catch @dnhatn. This feels like a bug to me? We should not get ops we didn't ask for?
Also, I'm not sure I follow how this explains the deletes, can you clarify? If the primary processes the same ops twice, but ignores the associated seq# and issues its own (and replicates it), how does that explain that the primary has no deletes but the replica does?
@bleskes The primary uses version numbers to resolve the indexing plan and rejects the duplicate operations, whereas the replica uses seq# to resolve the indexing plan and indexes the duplicate operations as stale documents. Does that sound reasonable to you?
Thx @dnhatn I forgot that we don't use the bulk shard request and that exceptions on the primary don't map to lack of replications. It does. Thanks!
Fixed in #32455
@dnhatn Great catch! 🎉
Since #31581, ShardFollowNodeTask may fetch some range twice. The following log indicates that we fetched the range [1680 to 2024] twice.
This bug, combined with the follower not using the FollowingEngine (PR #32448), can explain why we have many deletes on the replicas of the follower (but not on the primaries of the follower).