Hard Limit the number of hop stream goroutines #74
An alternative to hard resetting is to add a new error code for overloaded relays.
Resetting the stream won't work; the error will not be propagated further.
Implemented the
I reverted to resetting the stream when the hop limit is exceeded -- being nice has deleterious effects on the number of lingering goroutines.
Sounds like something we need although I'm a bit worried we should be using per-peer limits instead. At the moment, ~500 peers could fully mesh-connect through the relay to kill it.
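For a rough sense of the "~500 peers" figure, here is a back-of-the-envelope sketch. It assumes every ordered pair of peers opens its own hop stream through the relay and uses the `1 << 18` limit from the diff in this PR; `meshStreams` and `peersToExhaust` are hypothetical helper names, not part of the relay code.

```go
package main

import "fmt"

// meshStreams returns the number of directed hop streams needed for n peers
// to fully mesh-connect through one relay. Assumption of this sketch: each
// ordered pair of peers opens its own stream.
func meshStreams(n int) int {
	return n * (n - 1)
}

// peersToExhaust returns the smallest peer count whose full mesh reaches
// the given global stream limit.
func peersToExhaust(limit int) int {
	n := 1
	for meshStreams(n) < limit {
		n++
	}
	return n
}

func main() {
	const hopStreamLimit = 1 << 18 // value proposed in this PR's diff

	fmt.Println(meshStreams(500))               // 249500
	fmt.Println(peersToExhaust(hopStreamLimit)) // 513
}
```

Under those assumptions a few hundred cooperating peers are enough to consume a global budget, which is the concern behind per-peer limits.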
relay.go
Outdated
lhCount uint64
lhLk sync.Mutex
// atomic counters
sCount int32
Nit: could we give these full names? (streamCount?)
Sure, will do.
done.
relay.go
Outdated
@@ -29,6 +30,9 @@ var (
RelayAcceptTimeout = 10 * time.Second
HopConnectTimeout = 30 * time.Second
StopHandshakeTimeout = 1 * time.Minute

HopStreamBuffer = 4096
ultra nit: HopStreamBufferSize
ok!
done
pb/relay.proto
Outdated
@@ -21,6 +21,7 @@ message CircuitRelay {
STOP_DST_MULTIADDR_INVALID = 351;
STOP_RELAY_REFUSED = 390;
MALFORMED_MESSAGE = 400;
RELAY_OVERLOADED = 500;
Are we going to use this or should we just leave it at a reset?
I will drop it in the rebase/squash.
relay.go
Outdated
@@ -29,6 +30,9 @@ var (
RelayAcceptTimeout = 10 * time.Second
HopConnectTimeout = 30 * time.Second
StopHandshakeTimeout = 1 * time.Minute

HopStreamBuffer = 4096
HopStreamLimit = 1 << 18 // 256K hops for 512K goroutines
Have we checked this against our current numbers? This seems kind of low, actually. For 20k peers, this'll give us less than 10 streams per peer.
We can easily double it -- in fact the mplex relay where I am testing this is running with double the count (it overrides in the daemon init).
I'll try with quadruple the count (hop limit at 1M) and evaluate memory usage with it.
Seems like we are tight on memory with 2M goroutines active.
I doubled the default, and actual daemons can set it higher if they have the resources.
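The numbers in this thread line up if each hop costs two goroutines, as the `256K hops for 512K goroutines` comment in the diff suggests. The sketch below only tabulates that arithmetic; `goroutinesFor` is a hypothetical helper, and the local `HopStreamLimit` variable stands in for the package-level one from the diff.

```go
package main

import "fmt"

// HopStreamLimit mirrors the original value from the diff (1 << 18).
// Because it is a package-level variable, a daemon could assign a larger
// value before starting the relay.
var HopStreamLimit = 1 << 18

// goroutinesFor assumes two goroutines per hop, per the diff comment.
func goroutinesFor(hops int) int {
	return 2 * hops
}

func main() {
	for _, mult := range []int{1, 2, 4} {
		hops := mult * HopStreamLimit
		fmt.Printf("%dx: %d hops, %d goroutines\n", mult, hops, goroutinesFor(hops))
	}
	// 1x: 262144 hops, 524288 goroutines
	// 2x: 524288 hops, 1048576 goroutines
	// 4x: 1048576 hops, 2097152 goroutines
}
```

The 4x row corresponds to the 1M hop limit and the roughly 2M active goroutines reported as tight on memory above.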
Per-peer limits are a little complex to implement, and need a lock (which I would like to avoid).
I'm primarily worried about someone attacking a relay this way but this certainly isn't the only way.
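To illustrate why per-peer limits pull in a lock, here is a rough sketch of what such a limiter might look like. It is not proposed in this PR; `perPeerLimiter` and its methods are hypothetical, and peers are identified by plain strings for simplicity.

```go
package main

import (
	"fmt"
	"sync"
)

// perPeerLimiter is a hypothetical limiter keyed by peer ID. The map of
// counts is shared state, so every access has to hold the mutex.
type perPeerLimiter struct {
	mu     sync.Mutex
	counts map[string]int
	limit  int
}

func newPerPeerLimiter(limit int) *perPeerLimiter {
	return &perPeerLimiter{counts: make(map[string]int), limit: limit}
}

// acquire reserves a stream slot for peer, or reports false at the limit.
func (p *perPeerLimiter) acquire(peer string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.counts[peer] >= p.limit {
		return false
	}
	p.counts[peer]++
	return true
}

// release frees one slot and drops the entry when it reaches zero, so the
// map does not grow without bound.
func (p *perPeerLimiter) release(peer string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.counts[peer] <= 1 {
		delete(p.counts, peer)
		return
	}
	p.counts[peer]--
}

func main() {
	l := newPerPeerLimiter(1)
	fmt.Println(l.acquire("A"), l.acquire("A"), l.acquire("B")) // true false true
	l.release("A")
	fmt.Println(l.acquire("A")) // true
}
```

Compared with a single atomic counter, this adds a contended lock and per-peer bookkeeping on every hop, which is the complexity being weighed here.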
rebased/squashed to just 2 commits and dropped the
This adds a hard limit to the number of hop goroutines, so that relays don't get overloaded.
Note that the live hop tracking has been removed for two reasons:
Note 2: DO NOT MERGE AS IS; I will rebase/squash before merging to clean up history.