
fix(bitswap/client/msgq): prevent duplicate requests #691

Merged
gammazero merged 6 commits into ipfs:main from message-queue-duplicates on Nov 25, 2024

Conversation

@Wondertan Wondertan commented Oct 17, 2024

Previously, in-progress requests could be re-requested during the periodic rebroadcast: the queue sends requests and, while a response is still pending, the rebroadcast event fires. The rebroadcast changes previously sent WANTs back to pending and sends them again in a new message, duplicating some WANT requests.

The solution here is to ensure a WANT has been in sent status for long enough before bringing it back to pending. It uses the existing sendAt map, which tracks when every CID was sent: on each rebroadcast event, a WANT is only moved back to pending if it has been outstanding for longer than rebroadcastInterval.
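
A minimal sketch of that check (the standalone helper wantsToRebroadcast and its signature are illustrative assumptions, not the literal diff in messagequeue.go):

	// Hypothetical illustration of the age check described above.
	// sendAt is the existing map of CID -> time the WANT was last sent.
	package msgqsketch

	import (
		"time"

		"github.com/ipfs/go-cid"
	)

	// wantsToRebroadcast returns only the CIDs that have been in "sent" status
	// for at least one full rebroadcast interval; only these are moved back to
	// "pending" and re-sent.
	func wantsToRebroadcast(sendAt map[cid.Cid]time.Time, now time.Time, rebroadcastInterval time.Duration) []cid.Cid {
		var stale []cid.Cid
		for c, sentAt := range sendAt {
			if now.Sub(sentAt) >= rebroadcastInterval {
				stale = append(stale, c)
			}
		}
		return stale
	}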

@Wondertan Wondertan requested a review from a team as a code owner October 17, 2024 18:42
	if mq.bcstWants.sent.Len() == 0 && mq.peerWants.sent.Len() == 0 {
		return false
	}

	mq.rebroadcastIntervalLk.RLock()
	rebroadcastInterval := mq.rebroadcastInterval
@Wondertan (Member, Author) Oct 17, 2024
Alternatively, this could be a different new parameter/constant.

@Wondertan (Member, Author)

I tested this on a k8s cluster and with a local node connected to it. It works as expected, but I believe this would benefit a lot from a proper test. Unfortunately, I can't allocate time to writing one. It's not that straightforward.

@Wondertan (Member, Author) commented Oct 17, 2024

For context, I detect duplicates with a custom multihash that logs when the same data is hashed again. This essentially uncovered #690, as well as this issue.
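
A rough sketch of that debugging trick (illustration only: a hash.Hash wrapper that logs when identical data is hashed again; wiring it into the go-multihash registry as a custom multihash is omitted, and none of these names come from the PR):

	package main

	import (
		"crypto/sha256"
		"encoding/hex"
		"hash"
		"log"
		"sync"
	)

	// dupDetectingHasher wraps a hash.Hash and logs whenever the same digest is
	// produced more than once, i.e. whenever identical data is hashed again.
	type dupDetectingHasher struct {
		hash.Hash
	}

	var (
		seenMu sync.Mutex
		seen   = map[string]int{}
	)

	func (d *dupDetectingHasher) Sum(b []byte) []byte {
		digest := d.Hash.Sum(nil)
		key := hex.EncodeToString(digest)

		seenMu.Lock()
		seen[key]++
		n := seen[key]
		seenMu.Unlock()

		if n > 1 {
			log.Printf("same data hashed %d times: %s...", n, key[:12])
		}
		return append(b, digest...)
	}

	func main() {
		// Hash the same block twice with fresh hasher instances;
		// the second round logs a duplicate.
		for i := 0; i < 2; i++ {
			h := &dupDetectingHasher{Hash: sha256.New()}
			h.Write([]byte("same block"))
			h.Sum(nil)
		}
	}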

@Wondertan Wondertan force-pushed the message-queue-duplicates branch 3 times, most recently from d193c2f to 9020b71 Compare October 19, 2024 23:20

codecov bot commented Oct 19, 2024

Codecov Report

Attention: Patch coverage is 96.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 60.39%. Comparing base (37756ce) to head (d67691d).
Report is 1 commit behind head on main.

Files with missing lines                                Patch %   Lines
...tswap/client/internal/messagequeue/messagequeue.go   96.00%    1 Missing ⚠️


@@            Coverage Diff             @@
##             main     #691      +/-   ##
==========================================
+ Coverage   60.36%   60.39%   +0.03%     
==========================================
  Files         244      244              
  Lines       31034    31044      +10     
==========================================
+ Hits        18734    18750      +16     
+ Misses      10630    10626       -4     
+ Partials     1670     1668       -2     
Files with missing lines                                Coverage Δ
bitswap/client/wantlist/wantlist.go                     90.90% <ø> (-0.88%) ⬇️
...tswap/client/internal/messagequeue/messagequeue.go   84.06% <96.00%> (+0.53%) ⬆️

... and 13 files with indirect coverage changes

@lidel lidel added the need/triage Needs initial labeling and prioritization label Oct 22, 2024
@gammazero gammazero added need/analysis Needs further analysis before proceeding need/maintainers-input Needs input from the current maintainer(s) and removed need/triage Needs initial labeling and prioritization labels Oct 22, 2024
Comment on lines 491 to 492
	if mq.bcstWants.sent.Len() == 0 && mq.peerWants.sent.Len() == 0 {
		return false
Contributor:

This is probably good to leave since it avoids Lock/Unlock of mq.rebroadcastIntervalLk and time.Now().

	if mq.bcstWants.sent.Len() == 0 && mq.peerWants.sent.Len() == 0 {
		return 0
	}

@Wondertan (Member, Author):

The lock exists only for testing. The interval is never changed outside of the unit test. Thus, I don't see any contention that the zero-length check could prevent.

Contributor:

I think the comment is not about contention but about saving unnecessary lock/unlock calls, but if this only happens every 30 seconds, then it's probably not very important.

@gammazero (Contributor)

triage note: This is a good candidate for testing in rainbow staging to observe performance differences.

@gammazero gammazero added status/blocked Unable to be worked further until needs are met need/author-input Needs input from the original author and removed need/maintainers-input Needs input from the current maintainer(s) labels Oct 29, 2024
@Wondertan Wondertan force-pushed the message-queue-duplicates branch from 9020b71 to 5dc309b Compare October 29, 2024 19:17
Previously, in-progress requests could be re-requested during periodic rebroadcast.
The queue sends requests and, while awaiting a response, the rebroadcast event happens.
The rebroadcast event changes previously sent WANTs back to pending and sends them again in a new message.

The solution here is to ensure a WANT was in sent status for long enough before bringing it back to pending.
This utilizes the existing `sendAt` map, which tracks when every CID was sent.
@Wondertan Wondertan force-pushed the message-queue-duplicates branch from 5dc309b to 993c48c Compare October 29, 2024 19:29
@lidel lidel requested a review from hsanjuan November 12, 2024 17:35
@hsanjuan (Contributor) left a comment

The main thing to consider here is that:

  • before, a "want" would be rebroadcast at most 30 seconds after it was sent (could be as little as 0.1s)
  • after, a "want" is rebroadcast only once at least 30 seconds have passed since it was sent (could be as much as 59.9s).

In that respect the code looks good.

I am not sure how much of an improvement this is in practice (perhaps clients were sometimes lucky to hit a short rebroadcast period), but it at least makes clients more respectful, and performance should not be based on "luck".

I think we can test on staging and discuss in the next triage if we accept the change.
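
To make that bound concrete, a toy simulation (not code from the PR; assumes a rebroadcast tick every interval plus the new "at least one full interval" age check):

	package main

	import (
		"fmt"
		"time"
	)

	// resendDelay simulates a rebroadcast timer that fires every interval and
	// only resends a want once it has been outstanding for at least one full
	// interval. sentOffset is how long after the previous tick the want was sent.
	func resendDelay(sentOffset, interval time.Duration) time.Duration {
		for tick := interval; ; tick += interval {
			if tick-sentOffset >= interval { // the new age check
				return tick - sentOffset
			}
		}
	}

	func main() {
		interval := 30 * time.Second
		offsets := []time.Duration{
			100 * time.Millisecond,                // resent ~59.9s later (worst case)
			15 * time.Second,                      // resent 45s later
			29*time.Second + 900*time.Millisecond, // resent ~30.1s later (best case)
		}
		for _, off := range offsets {
			fmt.Printf("sent %v after the last tick -> resent %v later\n", off, resendDelay(off, interval))
		}
	}

The resend delay always lands in [interval, 2*interval), matching the "at least 30 seconds ... could be 59.9s" bound above.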


@gammazero gammazero added need/maintainers-input Needs input from the current maintainer(s) and removed need/analysis Needs further analysis before proceeding need/author-input Needs input from the original author status/blocked Unable to be worked further until needs are met labels Nov 19, 2024
@gammazero (Contributor)

Need to test on staging before merge.

@gammazero (Contributor) commented Nov 25, 2024

This PR does make sure that the client does not resend wants to any peer before the rebroadcast interval has elapsed. In doing this, it also makes some peers that were just short of the interval wait for another rebroadcast interval.

In summary, it changes from "wait no more than X to resend wants" to "wait at least X, but no more than 2X, to resend wants".

If we want the PR, then we should consider calling rebroadcastWantlist at half (or less) of the rebroadcast interval. @hsanjuan WDYT?

Consider changing line 410 to

const checksPerInterval = 2
mq.rebroadcastTimer = mq.clock.Timer(mq.rebroadcastInterval / checksPerInterval)

That will change the logic to "wait at least X, but no more than X+(X/checksPerInterval), to resend wants".

@gammazero gammazero self-assigned this Nov 25, 2024
@gammazero gammazero merged commit e2d2f36 into ipfs:main Nov 25, 2024
13 checks passed