Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent a deadlock if stepping the network crashes #262

Merged
merged 3 commits into from
Jul 7, 2022

Conversation

ocharles
Copy link
Collaborator

@ocharles ocharles commented Jul 5, 2022

Currently, the following problem causes a deadlock:

{-# language TypeApplications #-}
import Control.Exception
import Data.Functor
import Reactive.Banana
import Reactive.Banana.Frameworks

main :: IO ()
main = do
  (addHandler, fire) <- newAddHandler
  network <- compile $ do
    e <- fromAddHandler addHandler
    e' <- execute $ e $> do
      error "BANG"
      return ()

    reactimate $ print <$> e
    reactimate $ print <$> e'

  actuate network
  _ <- try @SomeException (fire ())

  fire ()

The problem occurs when first call fire. The high-level step function
first takes the network MVar, and then tries to step the network. This
crashes, but the overall program is safe because it was wrapped in
try. However, we're now in a state where the network MVar is empty,
so the second time we try and call fire we deadlock.

The fix in this commit is to bracket the whole network evaluation with
bracketOnError, and to restore the starting state if an error occurs.

Currently, the following problem causes a deadlock:

```haskell
{-# language TypeApplications #-}
import Control.Exception
import Data.Functor
import Reactive.Banana
import Reactive.Banana.Frameworks

main :: IO ()
main = do
  (addHandler, fire) <- newAddHandler
  network <- compile $ do
    e <- fromAddHandler addHandler
    e' <- execute $ e $> do
      error "BANG"
      return ()

    reactimate $ print <$> e
    reactimate $ print <$> e'

  actuate network
  _ <- try @SomeException (fire ())

  fire ()
```

The problem occurs when first call `fire`. The high-level step function
first takes the network `MVar`, and then tries to step the network. This
crashes, but the overall program is safe because it was wrapped in
`try`. However, we're now in a state where the network `MVar` is empty,
so the second time we try and call `fire` we deadlock.

The fix in this commit is to bracket the whole network evaluation with
`bracketOnError`, and to restore the starting state if an error occurs.
Copy link
Collaborator

@mitchellwrosen mitchellwrosen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; it doesn't look totally async exception safe, but I don't think it has to be :)

@mitchellwrosen
Copy link
Collaborator

Oopsies, I misread the PR a bit - won't this still deadlock if a reactimate crashes? We'll be putting to a full MVar.

@ocharles
Copy link
Collaborator Author

ocharles commented Jul 5, 2022

@mitchellwrosen I'd love to know more! What are you thinking of?

@ocharles
Copy link
Collaborator Author

ocharles commented Jul 5, 2022

@mitchellwrosen Re reactimate - no, reactimate happens outside the MVar acquisition. I think we're fine

@ocharles
Copy link
Collaborator Author

ocharles commented Jul 5, 2022

@mitchellwrosen Ah sorry, I see what you mean now! output could throw, which we don't care about.

@ocharles
Copy link
Collaborator Author

ocharles commented Jul 5, 2022

Alright, pushed a slightly different version which I think does what we want wrt reactimate and liftIOLater.

@mitchellwrosen
Copy link
Collaborator

mitchellwrosen commented Jul 5, 2022

Okay, looks great. The async exception unsafety I was referring to is still present, I think, but probably not very important.

bracketOnError
  (takeMVar s)
  (putMVar s)
  \s1 -> do
    (output, s2) <- f s1
    putMVar s s2
    {- async exception here -}
    return output

If an async exception strikes there, I think we'll deadlock trying to put to a full MVar. There might be a higher-level way than just using mask directly, but this is roughly what I was thinking:

mask \restore -> do
  s1 <- takeMVar s
  (output, s2) <- restore (f s1) `onException` putMVar s s1
  putMVar s s2
  return output

@ocharles
Copy link
Collaborator Author

ocharles commented Jul 5, 2022

Ah, I see! That's a great catch. I really want to get this right, so I think using mask is our best option here. Round three, here we go!

@mitchellwrosen
Copy link
Collaborator

(sorry, forgot the onException in my snippet, edited)

@ocharles ocharles force-pushed the mvar-crash-deadlock branch from 93b0853 to 4e25441 Compare July 5, 2022 18:00
@ocharles
Copy link
Collaborator Author

ocharles commented Jul 5, 2022

Ok, using mask now. The only thing that bothers me is it might be nice to let users timeout the takeMVar, but I'm not quite sure how to do that. If I restore that take then I think things can still deadlock. Maybe let's go with this PR for now?

@mitchellwrosen
Copy link
Collaborator

So long as the network itself isn't created with async exceptions uninterruptibly masked, a user could still timeout the takeMVar here, because takeMVar on an empty MVar is considered a "blocking" operation, and therefore can still be sniped by an async exception, if one's enqueued.

@ocharles
Copy link
Collaborator Author

ocharles commented Jul 5, 2022

Ah! So even though it's masked, it can still be interrupted? I need to do some reading on all of this!

@mitchellwrosen
Copy link
Collaborator

mitchellwrosen commented Jul 5, 2022

Yeah, that's right, it can be interrupted if it blocks (because we are using mask instead of maskUninterruptible). Under an interruptible mask, a takeMVar from a full MVar couldn't die due to an async exception, for example, nor putting to an empty one.

I think the Control.Exception docs explain this pretty well, albeit kinda all over the place :)

@mitchellwrosen mitchellwrosen merged commit c9627bd into master Jul 7, 2022
@mitchellwrosen mitchellwrosen deleted the mvar-crash-deadlock branch July 7, 2022 18:53
@HeinrichApfelmus
Copy link
Owner

🙏 Makes sense, thanks guys!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants