Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hang with STM & two scopes #33

Closed
ocharles opened this issue Jul 15, 2024 · 6 comments
Closed

Hang with STM & two scopes #33

ocharles opened this issue Jul 15, 2024 · 6 comments

Comments

@ocharles
Copy link

The following program hangs indefinitely:

{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE DeriveAnyClass #-}

module Main where

import Control.Monad
import Control.Exception
import Control.Concurrent
import Control.Concurrent.STM
import Ki

data Bang = Bang
  deriving (Exception, Show)

main :: IO ()
main = 
  scoped \scope1 -> do
    explodingThread <- fork scope1 do
      threadDelay 1_000_000
      throwIO Bang

    scoped \ki -> do
      c <- newTChanIO

      threadId <- fork ki do
        forever do
          atomically do
            readTChan c

      putStrLn "Waiting"
      threadDelay 10_000_000

I would expect this to be fine, and die with the Bang exception. If we use a single scope, that's exactly what happens, so it seems to be some kind of interation between two scopes.

@ocharles
Copy link
Author

We also tried just using threadDelay and this code does work, so it seems to be something specific to the use of STM:

{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE DeriveAnyClass #-}

module Main where

import Control.Monad
import Control.Exception
import Control.Concurrent
import Control.Concurrent.STM
import Ki

data Bang = Bang
  deriving (Exception, Show)

main :: IO ()
main = 
  scoped \scope1 -> do
    explodingThread <- fork scope1 do
      threadDelay 1_000_000
      throwIO Bang

    scoped \ki -> do
      c <- newTChanIO

      threadId <- fork ki do
        threadDelay 5_000_000

      putStrLn "Waiting"
      threadDelay 10_000_000

@ocharles
Copy link
Author

One more finding:

If I do

      threadId <- fork ki do
        print =<< try @SomeException do
          forever do
            atomically do
              readTChan c
          pure ()

I get

Left thhuh.hs: thread blocked indefinitely in an STM transaction
read blocked indefinitely in an ⏎

So I think it does get the exception and something else is borked

@mitchellwrosen
Copy link
Member

Oh dear, thanks for the bug report.

@mitchellwrosen
Copy link
Member

mitchellwrosen commented Jul 15, 2024

I think the bug is here:

-- If one of our children propagated an exception to us, then we know it's about to terminate, so we don't bother
-- throwing an exception to it.
pure case result of
Left (PropagatingFrom childId) -> IntMap.Lazy.delete childId runningChildren
_ -> runningChildren

The intention is to avoid bothering to throw ScopeClosing exception right back to the child thread that's just propagated something to us (the parent).

But the child thread identifier here isn't globally unique: for each scope, it starts at 0 and counts up. So, we can easily get our wires crossed and avoid propagating an exception down to a child that should receive it.

I can see a few options:

  1. Just delete this line of code; it's fine to throwTo to a dead ThreadId. I'm not sure when GHC recycles ThreadId but surely it is not really possible to kill the wrong thread here. I'd assume ThreadId will wrap around after 2^64 ids have been assigned.
  2. Add a unique "scope id" that we can include in various places to know what scope we're talking about. In this case, the exception that a child throws to its parent can include the scope id that it was created in. This will let us accurately elect not to throw an exception back to the child thread that threw to us.

mitchellwrosen added a commit that referenced this issue Jul 15, 2024
mitchellwrosen added a commit that referenced this issue Jul 15, 2024
@mitchellwrosen
Copy link
Member

This is fixed in ki-1.0.1.2. We went with option (1); I think the motivation was rather poor for this "feature".

Kind of an embarrassing bug! 😓 But thank you again for reporting!

@ocharles
Copy link
Author

Amazing, thank you so much! And don't feel bad - most bugs are like this! We'll give this a bash soon. Fortunately we only hit this bug when a bunch of other stuff has crashed, so it's quite - if you'll forgive the pun - exceptional 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants