-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prevent wantmanager from leaking goroutines (and memory) #1356
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -137,36 +137,49 @@ func (mq *msgQueue) runQueue(ctx context.Context) { | |
for { | ||
select { | ||
case <-mq.work: // there is work to be done | ||
|
||
err := mq.network.ConnectTo(ctx, mq.p) | ||
if err != nil { | ||
log.Noticef("cant connect to peer %s: %s", mq.p, err) | ||
// TODO: cant connect, what now? | ||
continue | ||
} | ||
|
||
// grab outgoing message | ||
mq.outlk.Lock() | ||
wlm := mq.out | ||
if wlm == nil || wlm.Empty() { | ||
mq.outlk.Unlock() | ||
continue | ||
} | ||
mq.out = nil | ||
mq.outlk.Unlock() | ||
|
||
// send wantlist updates | ||
err = mq.network.SendMessage(ctx, mq.p, wlm) | ||
if err != nil { | ||
log.Noticef("bitswap send error: %s", err) | ||
// TODO: what do we do if this fails? | ||
} | ||
mq.doWork(ctx) | ||
case <-mq.done: | ||
return | ||
} | ||
} | ||
} | ||
|
||
func (mq *msgQueue) doWork(ctx context.Context) { | ||
// allow ten minutes for connections | ||
// this includes looking them up in the dht | ||
// dialing them, and handshaking | ||
conctx, cancel := context.WithTimeout(ctx, time.Minute*10) | ||
defer cancel() | ||
|
||
err := mq.network.ConnectTo(conctx, mq.p) | ||
if err != nil { | ||
log.Noticef("cant connect to peer %s: %s", mq.p, err) | ||
// TODO: cant connect, what now? | ||
return | ||
} | ||
|
||
// grab outgoing message | ||
mq.outlk.Lock() | ||
wlm := mq.out | ||
if wlm == nil || wlm.Empty() { | ||
mq.outlk.Unlock() | ||
return | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this part is different than the original source-- the mq.out = nil now gets executed every time. this means that if:
is true, then There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hrm, fair. i can put in:
instead of blindly setting it nil. Setting it nil here doesnt change the behaviour, not nil'ing it out was just an allocation optimization. |
||
mq.out = nil | ||
mq.outlk.Unlock() | ||
|
||
sendctx, cancel := context.WithTimeout(ctx, time.Minute*5) | ||
defer cancel() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe 5min? im not sure there should be a timeout on this one-- wont the underlying connection just break and unroll all the way up here? |
||
|
||
// send wantlist updates | ||
err = mq.network.SendMessage(sendctx, mq.p, wlm) | ||
if err != nil { | ||
log.Noticef("bitswap send error: %s", err) | ||
// TODO: what do we do if this fails? | ||
return | ||
} | ||
} | ||
|
||
func (pm *WantManager) Connected(p peer.ID) { | ||
pm.connect <- p | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
understood that it's important to have a timeout here, but 1min is way too short. 15-30min is more ok. these timeouts affect people in the worst of situations, where a pageload could take >5min. It's awful, yes, but that's the rest of the world. We must work there too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alright, ten minutes then? tcp times out at 5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Thu, Jun 11, 2015 at 01:26:40PM -0700, Jeromy Johnson wrote:
Do we need explicit sub-part timeouts at all? Can't the caller
specify their own timeout when they submit a task? Looking into the
context handling here, runQueue gets launched with one context, and it
passes that context down to doWork. But maybe the doWork context
should be a union of the runQueue context and a second context that
gets passed in via the channel? With a check in runQueue to see if
any context-related doWork errors were due to a msgQueue-context
timeout/cancel (in which msgQueue should close down) or the
channel-passed context (in which case pass that error back to the
caller?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could if all the callers/roots had timeouts enforced somehow. we dont do this perfectly yet.
could use https://github.com/jbenet/go-context/blob/master/ext/dagctx.go for this.