Backoff deadlock with low number of threads #27

Closed
mratsim opened this issue Dec 4, 2019 · 1 comment
Labels
bug 🪲 Something isn't working

Comments

@mratsim
Owner

mratsim commented Dec 4, 2019

There seems to be a race condition here:

weave/weave/thieves.nim

Lines 187 to 204 in 42b0f80

proc lastStealAttemptFailure*(req: sink StealRequest) =
  ## If it's the last theft attempt per emitted steal requests
  ## - if we are the lead thread, we know that every other threads are idle/waiting for work
  ##   but there is none --> termination
  ## - if we are a worker thread, we message our parent and
  ##   passively wait for it to send us work or tell us to shutdown.
  if myID() == LeaderID:
    detectTermination()
    forget(req)
  else:
    req.state = Waiting
    debugTermination:
      log("Worker %2d: sends state passively WAITING to its parent worker %d\n", myID(), myWorker().parent)
    sendShare(req)
    ascertain: not myWorker().isWaiting
    myWorker().isWaiting = true
    myParking().wait() # Thread is blocked here until woken up.

The child sends the steal request to its parent and only then goes to sleep in myParking().wait().

But what if the parent processes the steal request and sends the child a wakeup signal before the child has actually gone to sleep, for example to signal termination:

weave/weave/signals.nim

Lines 42 to 56 in 42b0f80

proc signalTerminate*(_: pointer) =
  preCondition: not localCtx.signaledTerminate
  # 1. Terminating means everyone ran out of tasks
  #    so their cache for task channels should be full
  #    if there were sufficiently more tasks than workers
  # 2. Since they have an unique parent, no one else sent them a signal (checked in asyncSignal)
  if myWorker().left != Not_a_worker:
    # Send the terminate signal
    asyncSignal(signalTerminate, globalCtx.com.tasks[myWorker().left].access(0))
    # Wake the worker up so that it can process the terminate signal
    wakeup(myWorker().left)
  if myWorker().right != Not_a_worker:
    asyncSignal(signalTerminate, globalCtx.com.tasks[myWorker().right].access(0))
    wakeup(myWorker().right)

In that case the wakeup is lost: the parent exits and deadlocks at the exit barrier, while the child worker goes to sleep after the wakeup was already sent and sleeps forever.
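
For illustration, here is a minimal sketch of one common way to make a wakeup survive when it arrives before the sleeper actually sleeps: record the signal in a flag protected by the same lock as the condition variable. This is not Weave's parking implementation and not necessarily the approach taken in #28. The names Parking, park, unpark and childWoke are hypothetical, and the sketch assumes Nim's std/locks with threads enabled (--threads:on on Nim versions before 2.0).

```nim
# Hypothetical sketch, NOT Weave's actual parking primitive.
import std/locks

type Parking = object
  lock: Lock
  cond: Cond
  signaled: bool  # true when a wakeup arrived and has not been consumed yet

proc init(p: var Parking) =
  initLock(p.lock)
  initCond(p.cond)
  p.signaled = false

proc park(p: var Parking) =
  ## Block until a wakeup is available, then consume it.
  acquire(p.lock)
  while not p.signaled:     # the loop also guards against spurious wakeups
    wait(p.cond, p.lock)
  p.signaled = false
  release(p.lock)

proc unpark(p: var Parking) =
  ## Record the wakeup before signaling, so it is not lost
  ## if the target thread is not sleeping yet.
  acquire(p.lock)
  p.signaled = true
  signal(p.cond)
  release(p.lock)

var parking: Parking
var childWoke: bool

proc child() {.thread.} =
  park(parking)             # the wakeup was recorded earlier, so no blocking
  childWoke = true

init(parking)
unpark(parking)             # the "parent" signals before the child goes to sleep
var t: Thread[void]
createThread(t, child)
joinThread(t)               # returns instead of hanging on a lost wakeup
echo "child woke up: ", childWoke
```

With a bare condition-variable signal and no flag, the same ordering (signal first, wait second) would leave the child blocked forever, which is the interleaving described above.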

@mratsim mratsim added the bug 🪲 Something isn't working label Dec 4, 2019
@mratsim
Owner Author

mratsim commented Dec 12, 2019

closed by #28

@mratsim mratsim closed this as completed Dec 12, 2019