Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

peer_store: Warn only on reputation crossing the ban threshold #4000

Closed
wants to merge 3 commits into from

Conversation

lexnv
Copy link
Contributor

@lexnv lexnv commented Apr 5, 2024

This PR changes the warning behavior of the peer_store component.

We have encountered a large number of warnings coming from peer reputation reports.
Previously, warnings were generated on all reports that were below the threshold.

This behavior also generates warnings for correct behavior (positive) reputation changes, if after adding the reputation change, the reputation is still below the threshold.
In this PR, the warning is printed only when crossing the threshold from being not banned to being banned.

cc @paritytech/networking

@lexnv lexnv added A1-insubstantial Pull request requires no code review (e.g., a sub-repository hash update). R0-silent Changes should not be mentioned in any release notes I5-enhancement An additional feature request. D0-easy Can be fixed primarily by duplicating and adapting code by an intermediate coder. labels Apr 5, 2024
@lexnv lexnv requested review from dmitry-markin and alexggh April 5, 2024 13:31
@lexnv lexnv self-assigned this Apr 5, 2024
@dmitry-markin
Copy link
Contributor

That's strange that we receive to many warnings for banned peers. They should have been disconnected in the first place, aren't they?

@lexnv
Copy link
Contributor Author

lexnv commented Apr 5, 2024

Indeed, it seems that the peer is still kept around after being "banned" and disconnected here:

/// Disconnect peer. You should remove the `PeerId` from the `PeerStore` first
/// to not connect to the peer again during the next slot allocation.
pub fn disconnect_peer(&self, peer_id: PeerId) {
let _ = self.actions_tx.unbounded_send(Action::DisconnectPeer(peer_id));
}

From this comment, it seems like we should remove the peer from the peerstore first.

Although, the ongoing peer candidates are filtered by their reputation here:

.filter_map(|(peer_id, info)| {
(!info.is_banned() && !ignored.contains(peer_id)).then_some((*peer_id, *info))
})

Thinking out loud, it may be possible that Action::DisconnectPeer does not disconnect the peers from the protocols entirely

@sandreim
Copy link
Contributor

sandreim commented Apr 5, 2024

That's strange that we receive to many warnings for banned peers. They should have been disconnected in the first place, aren't they?

One reason is that there might be multiple messages in the queues of different consumers, which leads to calling report_peer called even after we disconnected the offender.

@dmitry-markin
Copy link
Contributor

Probably makes sense double-checking that we disconnect peers as stated before merging this PR that might hide the issue.

@dmitry-markin
Copy link
Contributor

dmitry-markin commented Apr 5, 2024

@lexnv How bad is the situation without this fix? Is it like a hundred of messages is printed once with the same peer id, or they are being printed constantly? (This might clue on the root cause.)

@lexnv
Copy link
Contributor Author

lexnv commented Apr 8, 2024

Coming back to this, I've placed more data in the linked PR:

  • the misbehaving client that was causing these logs is misbehaving in intervals of 10 seconds or so (which is in-line with the findings from that PR)
  • sometimes we see multiple logs in the same time interval (I think Andrei pointed it out that we might batch multiple calls)

I think we can safely merge this and rely on #4031 to increase the time a peer remains banned (from 10s to around 69secs)

@dmitry-markin
Copy link
Contributor

Coming back to this, I've placed more data in the linked PR:

* the misbehaving client that was causing these logs is misbehaving in intervals of 10 seconds or so (which is in-line with the findings from that PR)

* sometimes we see multiple logs in the same time interval (I think Andrei pointed it out that we might batch multiple calls)

I think we can safely merge this and rely on #4031 to increase the time a peer remains banned (from 10s to around 69secs)

May be merging #4031 is enough then? I worry that the change from this PR can mask some reputation/misbehaving/banning issues if merged.

@lexnv
Copy link
Contributor Author

lexnv commented Apr 9, 2024

Yep that makes sense! I've hit merge on: #4031, I also think it will be sufficient here.

We could still get a few more warnings from batching the reputation propagation, but generally, I think increasing the ban time should reduce the noise.

In the meanwhile, I ll close this PR and re-open if necessary, thanks!

@lexnv lexnv closed this Apr 9, 2024
github-merge-queue bot pushed a commit that referenced this pull request Apr 9, 2024
)

This is a tiny PR to increase the time a peer remains banned.

A peer is banned when the reputation drops below a threshold.
With every second, the peer reputation is exponentially decayed towards
zero.

For the previous setup:
- decaying to zero from (i32::MAX or i32::MIN) would take 948 seconds
(15mins 48seconds)
- from i32::MIN to escaping the banned threshold would take 10 seconds
This means we are decaying reputation a bit too aggressive and
misbehaving peers can misbehave again in 10 seconds.
Another side effect of this is that we have encountered multiple
warnings caused by a few misbehaving peers.

In the new setup:
- decaying to zero from (i32::MAX or i32::MIN) would take 3544 seconds
(59 minutes)
- from i32::MIN to escaping the banned threshold would take ~69 seconds

This is a followup of:
- #4000.

### Testing Done
- Created a misbehaving client with
[subp2p-explorer](https://github.com/lexnv/subp2p-explorer), the client
is banned for approx 69seconds until it is allowed to connect again.

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A1-insubstantial Pull request requires no code review (e.g., a sub-repository hash update). D0-easy Can be fixed primarily by duplicating and adapting code by an intermediate coder. I5-enhancement An additional feature request. R0-silent Changes should not be mentioned in any release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants