fix(bitswap/client/msgq): prevent duplicate requests #691
Conversation
```go
if mq.bcstWants.sent.Len() == 0 && mq.peerWants.sent.Len() == 0 {
	return false
}

mq.rebroadcastIntervalLk.RLock()
rebroadcastInterval := mq.rebroadcastInterval
```
Alternatively, this could be a separate new parameter/constant.
I tested this on a k8s cluster with a local node connected to it. It works as expected, but I believe this would benefit a lot from a proper test. Unfortunately, I can't allocate time to writing one; it's not that straightforward.
For context, I detect duplicates with a custom multihash that logs whenever the same data is hashed again. This essentially uncovered #690, as well as this issue.
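Such a duplicate-detecting hasher can be approximated by wrapping a standard `hash.Hash`. The sketch below uses only the Go standard library and is an assumption about the approach, not the author's actual instrumentation:

```go
package sketch

import (
	"crypto/sha256"
	"hash"
	"log"
	"sync"
)

var (
	seenMu sync.Mutex
	seen   = map[[32]byte]int{}
)

// dupDetectingHasher wraps a hash.Hash and logs when the exact same
// input bytes are hashed more than once (hypothetical debugging aid).
type dupDetectingHasher struct {
	hash.Hash
	buf []byte
}

func (d *dupDetectingHasher) Write(p []byte) (int, error) {
	d.buf = append(d.buf, p...)
	return d.Hash.Write(p)
}

func (d *dupDetectingHasher) Sum(b []byte) []byte {
	// Key the "seen" map by a digest of the accumulated input.
	key := sha256.Sum256(d.buf)
	seenMu.Lock()
	seen[key]++
	n := seen[key]
	seenMu.Unlock()
	if n > 1 {
		log.Printf("duplicate hash of %d bytes (seen %d times)", len(d.buf), n)
	}
	return d.Hash.Sum(b)
}

func (d *dupDetectingHasher) Reset() {
	d.buf = d.buf[:0]
	d.Hash.Reset()
}

// NewDupDetectingSHA256 returns a SHA-256 hasher that logs duplicate inputs.
func NewDupDetectingSHA256() hash.Hash {
	return &dupDetectingHasher{Hash: sha256.New()}
}
```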
Force-pushed from d193c2f to 9020b71 (compare)
Codecov Report
Attention: Patch coverage is

```diff
@@            Coverage Diff             @@
##             main     #691      +/-   ##
==========================================
+ Coverage   60.36%   60.39%   +0.03%
==========================================
  Files         244      244
  Lines       31034    31044      +10
==========================================
+ Hits        18734    18750      +16
+ Misses     10630    10626       -4
+ Partials     1670     1668       -2
```
```go
if mq.bcstWants.sent.Len() == 0 && mq.peerWants.sent.Len() == 0 {
	return false
}
```
This is probably good to leave, since it avoids the Lock/Unlock of `mq.rebroadcastIntervalLk` and the call to `time.Now()`:

```go
if mq.bcstWants.sent.Len() == 0 && mq.peerWants.sent.Len() == 0 {
	return 0
}
```
The lock exists only for testing: the interval is never changed outside of the unit test. Thus, I don't see any contention that the zero-length check could prevent.
I think the comment is not about contention but about saving unnecessary lock/unlock calls, but if this only happens every 30 seconds, then it's probably not very important.
Triage note: this is a good candidate for testing in rainbow staging to observe performance differences.
Force-pushed from 9020b71 to 5dc309b (compare)
Previously, in-progress requests could be re-requested during a periodic rebroadcast: the queue sends requests, and while it awaits responses, the rebroadcast event fires. The rebroadcast event changes previously sent WANTs back to pending and sends them again in a new message. The solution here is to ensure a WANT has been in the sent state for long enough before bringing it back to pending. This utilizes the existing `sendAt` map, which tracks when every CID was sent.
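For illustration only, the pre-fix behavior described above amounts to something like the following sketch (`wantSet` and its fields are hypothetical names, not the actual message-queue types):

```go
package sketch

import "github.com/ipfs/go-cid"

// wantSet is an illustrative stand-in for the queue's pending/sent split.
type wantSet struct {
	pending map[cid.Cid]struct{}
	sent    map[cid.Cid]struct{}
}

// rebroadcast models the pre-fix behavior: every sent want is moved
// back to pending and re-sent on each tick, even if it was sent only
// moments ago and a response is still in flight -- the duplication
// this PR removes.
func (ws *wantSet) rebroadcast() {
	for c := range ws.sent {
		ws.pending[c] = struct{}{}
		delete(ws.sent, c)
	}
}
```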
Force-pushed from 5dc309b to 993c48c (compare)
The main thing to consider here is that:
- before, a "want" would be re-broadcasted at most 30 seconds after it was sent (could be 0.1s)
- after, a "want" would be re-broadcasted only after at least 30 seconds after it was sent (could be 59.9s).
In that respect the code looks good.
I am not sure how much of an improvement this is in practice (perhaps clients were lucky to hit a short rebroadcast period sometimes), but it makes clients more respectful at least and perf should not be based on "luck".
I think we can test on staging and discuss in the next triage if we accept the change.
Need to test on staging before merge.
This PR does make sure that the client does not resend wants to any peer before the rebroadcast interval has elapsed. In doing this, it also makes some peers that were just short of the interval wait for another rebroadcast interval. In summary, it changes from "wait no more than X to resend wants" to "wait at least X, but no more than 2X, to resend wants". If we want the PR, then we should consider changing line 410 to:

```go
const checksPerInterval = 2

mq.rebroadcastTimer = mq.clock.Timer(mq.rebroadcastInterval / checksPerInterval)
```

That will change the logic to "wait at least X, but no more than X+(X/checksPerInterval), to resend wants".
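For example, with X = 30s and checksPerInterval = 2, the timer fires every 15 seconds, so a want that just misses one check is re-sent after at most 45 seconds rather than the full 60.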
Previously, in-progress requests could be re-requested during a periodic rebroadcast: the queue sends requests, and while it awaits responses, the rebroadcast event fires. The rebroadcast event changes previously sent WANTs back to pending and sends them again in a new message, duplicating some WANT requests.
The solution here is to ensure a WANT has been in the sent state for long enough before bringing it back to pending. This utilizes the existing `sendAt` map, which tracks when every CID was sent. Then, on every rebroadcast event, it checks whether each message has been outstanding for longer than `rebroadcastInterval`.
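As a hedged sketch of the described approach (all names below are hypothetical, not the actual boxo msgq code), the rebroadcast check could look like this:

```go
package sketch

import (
	"time"

	"github.com/ipfs/go-cid"
)

// rebroadcastExpired sketches the fix described above: a CID is moved
// from sent back to pending only if its sendAt entry shows it has been
// outstanding for at least rebroadcastInterval.
func rebroadcastExpired(
	pending, sent map[cid.Cid]struct{},
	sendAt map[cid.Cid]time.Time,
	now time.Time,
	rebroadcastInterval time.Duration,
) {
	for c := range sent {
		if sentAt, ok := sendAt[c]; ok && now.Sub(sentAt) < rebroadcastInterval {
			continue // sent too recently; a response may still be in flight
		}
		pending[c] = struct{}{}
		delete(sent, c)
	}
}
```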