peer_store: Warn only on reputation crossing the ban threshold #4000

lexnv · 2024-04-05T13:31:44Z

This PR changes the warning behavior of the peer_store component.

We have encountered a large number of warnings coming from peer reputation reports.
Previously, warnings were generated on all reports that were below the threshold.

This behavior also generates warnings for correct behavior (positive) reputation changes, if after adding the reputation change, the reputation is still below the threshold.
In this PR, the warning is printed only when crossing the threshold from being not banned to being banned.

cc @paritytech/networking

Signed-off-by: Alexandru Vasile <[email protected]>

dmitry-markin · 2024-04-05T13:44:10Z

That's strange that we receive to many warnings for banned peers. They should have been disconnected in the first place, aren't they?

lexnv · 2024-04-05T14:01:55Z

Indeed, it seems that the peer is still kept around after being "banned" and disconnected here:

polkadot-sdk/substrate/client/network/src/protocol_controller.rs

Lines 209 to 213 in bf424fd

    
           	/// Disconnect peer. You should remove the `PeerId` from the `PeerStore` first 
        
           	/// to not connect to the peer again during the next slot allocation. 
        
           	pub fn disconnect_peer(&self, peer_id: PeerId) { 
        
           		let _ = self.actions_tx.unbounded_send(Action::DisconnectPeer(peer_id)); 
        
           	}

From this comment, it seems like we should remove the peer from the peerstore first.

Although, the ongoing peer candidates are filtered by their reputation here:

polkadot-sdk/substrate/client/network/src/peer_store.rs

Lines 298 to 300 in bf424fd

    
           .filter_map(|(peer_id, info)| { 
        
           	(!info.is_banned() && !ignored.contains(peer_id)).then_some((*peer_id, *info)) 
        
           })

Thinking out loud, it may be possible that Action::DisconnectPeer does not disconnect the peers from the protocols entirely

sandreim · 2024-04-05T14:05:02Z

That's strange that we receive to many warnings for banned peers. They should have been disconnected in the first place, aren't they?

One reason is that there might be multiple messages in the queues of different consumers, which leads to calling report_peer called even after we disconnected the offender.

dmitry-markin · 2024-04-05T14:14:54Z

Probably makes sense double-checking that we disconnect peers as stated before merging this PR that might hide the issue.

dmitry-markin · 2024-04-05T14:25:37Z

@lexnv How bad is the situation without this fix? Is it like a hundred of messages is printed once with the same peer id, or they are being printed constantly? (This might clue on the root cause.)

lexnv · 2024-04-08T16:39:49Z

Coming back to this, I've placed more data in the linked PR:

the misbehaving client that was causing these logs is misbehaving in intervals of 10 seconds or so (which is in-line with the findings from that PR)
sometimes we see multiple logs in the same time interval (I think Andrei pointed it out that we might batch multiple calls)

I think we can safely merge this and rely on #4031 to increase the time a peer remains banned (from 10s to around 69secs)

dmitry-markin · 2024-04-09T08:03:48Z

Coming back to this, I've placed more data in the linked PR:
* the misbehaving client that was causing these logs is misbehaving in intervals of 10 seconds or so (which is in-line with the findings from that PR)

* sometimes we see multiple logs in the same time interval (I think Andrei pointed it out that we might batch multiple calls)
I think we can safely merge this and rely on #4031 to increase the time a peer remains banned (from 10s to around 69secs)

May be merging #4031 is enough then? I worry that the change from this PR can mask some reputation/misbehaving/banning issues if merged.

lexnv · 2024-04-09T11:26:36Z

Yep that makes sense! I've hit merge on: #4031, I also think it will be sufficient here.

We could still get a few more warnings from batching the reputation propagation, but generally, I think increasing the ban time should reduce the noise.

In the meanwhile, I ll close this PR and re-open if necessary, thanks!

) This is a tiny PR to increase the time a peer remains banned. A peer is banned when the reputation drops below a threshold. With every second, the peer reputation is exponentially decayed towards zero. For the previous setup: - decaying to zero from (i32::MAX or i32::MIN) would take 948 seconds (15mins 48seconds) - from i32::MIN to escaping the banned threshold would take 10 seconds This means we are decaying reputation a bit too aggressive and misbehaving peers can misbehave again in 10 seconds. Another side effect of this is that we have encountered multiple warnings caused by a few misbehaving peers. In the new setup: - decaying to zero from (i32::MAX or i32::MIN) would take 3544 seconds (59 minutes) - from i32::MIN to escaping the banned threshold would take ~69 seconds This is a followup of: - #4000. ### Testing Done - Created a misbehaving client with [subp2p-explorer](https://github.com/lexnv/subp2p-explorer), the client is banned for approx 69seconds until it is allowed to connect again. cc @paritytech/networking --------- Signed-off-by: Alexandru Vasile <[email protected]>

lexnv added 3 commits April 4, 2024 19:27

peer_store: Ban peers only once for reputation changes

404e48c

Signed-off-by: Alexandru Vasile <[email protected]>

peer_store: Simplify to include only logs in the ban threshold

b4ff3ea

Signed-off-by: Alexandru Vasile <[email protected]>

peer_store: Print trage log if not printed warning

bf424fd

Signed-off-by: Alexandru Vasile <[email protected]>

lexnv requested review from dmitry-markin and alexggh April 5, 2024 13:31

lexnv self-assigned this Apr 5, 2024

alexggh approved these changes Apr 5, 2024

View reviewed changes

dmitry-markin approved these changes Apr 5, 2024

View reviewed changes

lexnv mentioned this pull request Apr 8, 2024

peer_store: Increase peer ban time until escapes banned threshold #4031

Merged

lexnv closed this Apr 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

peer_store: Warn only on reputation crossing the ban threshold #4000

peer_store: Warn only on reputation crossing the ban threshold #4000

lexnv commented Apr 5, 2024

dmitry-markin commented Apr 5, 2024

lexnv commented Apr 5, 2024

sandreim commented Apr 5, 2024 •

edited

Loading

dmitry-markin commented Apr 5, 2024

dmitry-markin commented Apr 5, 2024 •

edited

Loading

lexnv commented Apr 8, 2024

dmitry-markin commented Apr 9, 2024

lexnv commented Apr 9, 2024

peer_store: Warn only on reputation crossing the ban threshold #4000

peer_store: Warn only on reputation crossing the ban threshold #4000

Conversation

lexnv commented Apr 5, 2024

dmitry-markin commented Apr 5, 2024

lexnv commented Apr 5, 2024

sandreim commented Apr 5, 2024 • edited Loading

dmitry-markin commented Apr 5, 2024

dmitry-markin commented Apr 5, 2024 • edited Loading

lexnv commented Apr 8, 2024

dmitry-markin commented Apr 9, 2024

lexnv commented Apr 9, 2024

sandreim commented Apr 5, 2024 •

edited

Loading

dmitry-markin commented Apr 5, 2024 •

edited

Loading