Cats Effect 3.0 Interruption Model Proposal #681

SystemFw · 2019-10-23T01:46:35Z

I'm writing this proposal to describe the interruption model I'd like to see for cats-effect 3.0.
I'm only going to be focusing on semantics, things like the place in the hierarchy should be discussed elsewhere.
The main goals of this proposal are to lower the skill level needed to write interruption- and finalisation- safe code, and to preserve as much compositionality as possible in a problem that at some level defies it.
I'm going to try setting enough context to justify the decisions in this proposal, if you'd like to skip to the core of it, go to the Combinators and Semantics section.

The problem

The notion of safety in concurrent code is not univocal, and often revolves around two separate concepts: finalisation and interruption.
Finalisation is needed to ensure that certain sections of the code will run in any scenario: for example to close resources, or restore some data to a valid state.
Interruption is needed to avoid using resources once the calling code has decided they are no longer needed, to prevent deadlock by offering timeouts, and to compositionally express several patterns (e.g. do this for 10 seconds, then start doing that).
Unfortunately, these two concepts are at odds at a fundamental level :

Interruption means "No matter what you are doing, stop"
Finalisation means "No matter what you say, I must keep going"

From a theoretical point of view, there is no perfect compromise: by prioritising resource safety at all costs, you lose all interruption (this is current uncancelable), and by prioritising interruption at
all costs, you lose resource safety (this is your standard flatMap).
However, I will posit that for our users, loss of resource safety is more dangerous than loss of interruption, and therefore we shall keep things as interruptible as possible, but prioritise resource safety when breaking a tie. You will hopefully see exactly what I mean by this, further down in the proposal.

Current situation

I think it's safe to say that currently writing resource safe code that preserves interruption is a matter for experts, for two reasons:

The first is that users need to be familiar with guaranteeCase, bracketCase, continual, uncancelable, and the interaction of those with join and cancel. Each operation has subtly different semantics when it comes to interruption and they are all absolutely necessary in some scenarios.
The second is more fundamental: writing interruption-aware code means basically writing code that restores some properties when interruption happens. How and what needs to be restored generally depends on where interruption happened, and there is no way of knowing that.
In practice this often means having to do complicated reasoning over a piece of state and figure out how to write code which can always restore it to a "steady" state, which can get very complex.

Principles

As a default, interruption can happen "anywhere" (any yield points, between any two flatMaps), but in critical sections the writer of the code needs to limit and control where interruption can happen. The ability to do this is what makes the model easier to use.
The writer of the code, not the caller, is ultimately in control of the resource safety of the code they write. It shouldn't be possible for someone to call code that was designed to be resource safe and somehow make it resource unsafe.
If we accept principle 2, it turns out that interruption behaves in exactly the opposite way, the caller of the code is ultimately in control of whether the code they are calling is interruptible or
not. The writer can allow interruption in limited places, but cannot have a guarantee that the final code will be interruptible, it's up to the caller.
In other words, it's possible for someone to call code that was designed to be interruptible and make it uninterruptible. This is unavoidable really (if we want principle 2), but I'll argue it's the
correct default. Also, preserving interruptibility is fairly easy, unlike today.

Combinators and Semantics

Safe regions

The model can be described by two functions (names can be changed):

def onCancel[A](fa: F[A])(finalizer: F[Unit]): F[A]
def mask[A](region: Poll => F[A]): F[A]

class Poll {
  def apply[B](fa: F[B]): F[B]
}


// or, in dotty to show the higher rank
def mask[A](region: ([B] => F[B] => F[B]) => F[A]): F[A]

onCancel attaches a finalizer to an F[A]. Upon interruption, all registered finalizers are run and the rest of the computation is short-circuited. The code triggering interruption backpressures on
said finalizers.
mask allows to control interruption in critical sections by defining a region in which interruption cannot happen unless one explicitly encloses the code that can be interrupted in a call to Poll.apply.
Because the writer of the code now knows exactly at which points interruption can happen, it's easy to add the correct logic via onCancel. We will soon see that the semantics of Poll.apply need to be refined a bit, but this is a good first intuition.
Example:

val p1 = mask { poll =>
  for {
    r <- openResource
    c <- poll(waitOnCondition).onCancel(closeResource)
    _ <- modifySomeStateDependingOn(c, r).attempt
    _ <- closeResource
  } yield ()
}

Which means:

val p1 = mask { poll =>
  for {
    r <- openResource // cannot be interrupted
    // flatMap cannot be interrupted
    c <- poll(waitOnCondition).onCancel(closeResource) // waiting can be interrupted
    // flatMap cannot be interrupted
    r <- modifySomeStateDependingOn(c, r).attempt // cannot be interrupted
   // flatMap cannot be interrupted
    _ <- closeResource // cannot be interrupted
  } yield ()
}

Note that p1 is almost the same as

openResource.bracket { r =>
  waitOnCondition.flatMap { c =>
    modifySomeStateDependingOn(c, r).attempt
  }
}{closeResource}

but there is a crucial difference, in the bracket version .flatMap and modifySomeState... can be interrupted, which means the code now has to worry about restoring SomeState, without knowing ifinterruption happened on waitOnCondition, or in between, or in any of the flatMaps that might make up modifySomeState..., requiring complex reasoning to figure out what to restore SomeState to.
I hope this illustrates how much nicer this model is to work with, as you essentially get to build a safe region where you need one.

Note 1:
In the snippet above I simply attempted any synchronous errors for brevity, but all sorts of error handling are possible of course.

Note 2:
In cats-effect 2.0, you have to bracket and continual to writep1 correctly.

Nested regions and restore semantics

So, in the section above we have defined Poll.apply, which is called in an uninterruptible region defined by mask, as making its argument interruptible again.
We will now see how these semantics are slightly wrong and require some refinement in order to comply with the principles above.
Imagine p1 is being called by p2, which is also safety-aware:

val p2 = mask { poll =>
  _ <- openResource2 
  _ <- p1
  _ <- closeResource2
}

Now, p2 thinks it's safe, but it's unaware that inside p1 there is an interruptible region defined by a call to poll, and so it can now be interrupted and not execute closeResource2.
In other words, writing resource safe code requires inspecting all downstream code to know whether a flatMap is safe, which in my opinion is not an acceptable solution.

We can solve this problem by changing the semantics of Poll.apply: a call to Poll.apply makes its argument inherit the interruption status of the outer scope.
For example, if we have def main = p1, the outer scope interruption status is interruptible (by default things are interruptible), and inside p1 waitOnCondition inherits this status, i.e. it's
interruptible, and we have recovered the previous semantics.
However, when nested inside p2, the outer scope is uninterruptible since we are inside mask, and so the call to poll inside p1 inherits that, and things are safe again.
If p2 wants to allow interruption of p1, it can call poll again:

val p2 = mask { poll =>
  _ <- openResource2 
  _ <- poll(p1).onCancel(closeResource2)
  _ <- closeResource2
}

def main = p2

p1.poll inherits the status of p2.poll, which inherits main, meaning that waitOnCondition is now interruptible again, but none of the other parts are, which are the desired semantics. This extends to arbitrary nesting.

Let's highlight where the tradeoff lies: the caller of code written inmask can take code that is safely interruptible, and (perhaps accidentally) make it uninterruptible by not calling poll on it.
Or, to look at it from the other side, the writer of code in mask cannot guarantee that their code is interruptible, it can only allow for it by calling poll, but the caller will have the final say.

I think this is the correct tradeoff: as the caller, you know whether you care about interrupting one of the things you call or not, so the failure mode of forgetting to call poll is way less dangerous than the initial semantics, where the code you write can become interruptible because something way down in the callstack has used an interruptible region.

I think this as compositional as we can get, and hopefully explains principles 2 and 3, and clarifies what I mean by saying that on balance, resource safety is favoured.

Relationship with `bracket` etc.

We don't have to rewrite bracket and co., and actually we could also forego onCancel and use guaranteeCase, but I'll point out for the sake of argument how our native combinators can be easily expressed in this model.
This should also serve as a small example section:

bracketCase = mask { poll =>
  acquire.flatMap { r =>
     poll(use(r))
       .onCancel(close(ExitCase.Canceled))
       .handleErrorWith(e => close(ExitCase.Error(e)) >> F.raiseError(e))
       .flatMap(result => close(ExitCase.Completed).as(result)) 
  }
  
interruptibleBracketCase = mask { poll =>
  poll(acquire).flatMap { r =>
     poll(use(r))
       .onCancel(close(ExitCase.Canceled))
       .handleErrorWith(e => close(ExitCase.Error(e)) >> F.raiseError(e))
       .flatMap(result => close(ExitCase.Completed).as(result)) 
  }
 
uncancelable = mask(_ => fa)

continual = mask { poll =>
  poll(fa).flatMap(f)
}

Differences with Haskell

This model is obviously inspired by Haskell's async exceptions, but there is a huge difference: in haskell some blessed operations are interruptible even inside mask, for example MVar.take.
As argued above, I think this is not a great default in general, and it's even worse in cats-effect because things like MVar aren't built in and users can create their own blocking primitives.
Another obvious difference is that onCancel/guaranteeCase aren't the same as handleErrorWith, so you cannot catch interruption, whereas in haskell you can and you need to remember to rethrow.

Open questions on `start`

There is an open question about start: the Haskell's equivalent (forkIO) has the spawned fiber inherit the interruption status if forkIO is called inside mask, with another primitive
forkIOWithUnmask to achieve independence.
I still haven't made up my mind as to which semantics are necessary or desirable here. The other questions are around what happens if users pass a Poll instance around, but I think those shouldn't constitute a blocker to the main drive of the proposal.

Thanks

Sorry for the wall of text! I did my best to try and condensate this immensely tricky problem to its essence, but it still turned out super long.

The text was updated successfully, but these errors were encountered:

JD557 · 2019-10-25T10:56:41Z

I have some questions regarding the poll/onCancel semantics, which might not be obvious since poll does not change the return type:

What happens if you call foo.onCancel(bar) instead of poll(foo).onCancel(bar)? Is that onCancel a no-op?
Can you chain onCancel calls like poll(foo).onCancel(bar).onCancel(baz)?

One possible way to avoid ambiguity would be to remove onCancel and move the finalizer definition to Poll:

class Poll {
  def apply[B](fa: F[B], finalizer: F[Unit]): F[B]
}

SystemFw · 2019-10-25T11:10:09Z

It depends where you are. onCancel can be used outside of a mask section just fine, i.e. a.flatMap(f).flatMap(g).onCancel(doSomething). If you are inside a mask section and you don't say poll(foo), then foo cannot be canceled, and onCancel is a no-op in that sense
They all get run, in order

One possible way to avoid ambiguity would be to remove onCancel and move the finalizer definition to Poll:

You can't do that because onCancel can be used outside of mask as well. Another way of looking at it is that onCancel is just guaranteeCase, and in fact we could just keep guaranteeCase, I chose onCancel because it's the simplest operation that can describe the model, which I was trying to distill to its essence. Good question though, I considered attaching the finaliser to poll as well, at some point

neko-kai · 2019-10-25T12:18:24Z

@SystemFw
How would this code behave under the proposed model?

mask { poll =>
  poll(F.never).guarantee {
    F.delay(println("Waiting 1s")) >>
    F.sleep(1.second) >>
    poll(F.unit) >>
    F.delay(println("Waited 1s"))
  }
}.start.flatMap {
  F.sleep(1.second) *> _.cancel
}

Will it ever print Waited 1s or will interrupt during the poll before? i.e. resulting in 2 interruptions per 1 cancel.

SystemFw · 2019-10-25T12:25:07Z

So, that code is using the two things whose semantics are still in the works, i.e:

start within mask
sharing poll outside of a mask scope

I'm still trying to define what's best there, in haskell the Fiber would inherit the masking status unless you use forkIOWithUnmask. I'm thinking we don't do that and just have start spawn a Fiber in a new status (so, interruptible), so you wouldn't need poll, and the whole code acts like if mask didn't exist. There are some implications to this choice, but I'm still fully pondering them.

What are your ideas here?

SystemFw · 2019-10-25T12:27:39Z

I think we can extract at least one clear question from there, which is:

what if you call poll inside onCancel?

My first thought is that this is a no-op, things inside onCancel should be flat-out uncancelable, but obviously open to hearing counter points. These are all great points though :)

neko-kai · 2019-10-25T12:37:46Z

@SystemFw

So, that code is using the two things whose semantics are still in the works, i.e:
start within mask
sharing poll outside of a mask scope

If you look closely, the start is outside of mask here – the entire mask {} expression is started.
Also, the poll is only used within mask expression if you accept that in mask(a guarantee b) both a and b are "within" mask.

So actually, my snippet isn't about inheritance of masking status, it's about "stickinness" of interruption – can a single cancel be observed repeatedly from multiple polls during execution of finalizers?

SystemFw · 2019-10-25T13:35:33Z

Yeah, sorry about that :)
Then I guess as I said the question reduces to:

what if you call poll inside onCancel?

And I'd say that that's a no-op and things inside onCancel are uninterruptible. What do you think?

neko-kai · 2019-10-25T15:38:37Z

@SystemFw
I'm not sure of the implications, but sounds reasonable, esp. if we dictate that a Fiber can only be canceled once – i.e. multiple cancels have no effect and are non-sensical, unlike Haskell.

Re: usage of poll out of scope, while Scala doesn't make it possible to prevent poll from leaking, it's possible to determine its scope and invalidate all out of scope uses by 1. invalidating poll in finalizer of mask to prevent sequential leaking, 2. add FiberId and invalidate poll for any fiber but the one that spawned it, to prevent leaking to different fibers.

djspiewak · 2019-10-25T23:12:45Z

Re: usage of poll out of scope, while Scala doesn't make it possible to prevent poll from leaking, it's possible to determine its scope and invalidate all out of scope uses by 1. invalidating poll in finalizer of mask to prevent sequential leaking, 2. add FiberId and invalidate poll for any fiber but the one that spawned it, to prevent leaking to different fibers.

We could actually make it possible by encoding a functional dependency within Concurrent and using the same trick that the ST monad does, but on a transformer type around F in poll. I have an example of that in Gitter, I think. But… it's horrifying. I think it's safer to say "it's not possible" and leave it at that. :-)

Anyway, I think your proposal is more realistic: just detect leaked polls and turn them into no-ops.

djspiewak · 2019-12-06T18:03:19Z

So after pondering this quite considerably…

For all of the reasons you outlined, I really don't believe it is possible in general to write code which is generic across both continual and auto-cancelable evaluation models. It simply isn't. If you write the code in a fashion which is amenable to auto-cancelability, then you will have deadlocks when evaluated in a continual model and vice versa with resource safety. Basically, all effectful code makes some sort of implicit assumption about the cancelability of its logic, and when that assumption is violated, bad things can happen.

Technically, poll provides a hammer by which such assumptions can be violated, but it's less dramatic than telling people that they need to worry about continual and auto-cancelable bind chains at the same time.

For that reason, I think we scrap the goal of unifying the two models and simply focus on auto-cancelable evaluation. To that end, I'm officially in favor of your proposal as broadly outlined in the OP, though I don't like the name mask as much as the name uncancelable. :-) I would also say that we should probably just use FunctionK rather than a bespoke natural transformation encoding such as Poll.

djspiewak · 2019-12-22T18:43:35Z

I now have this sketched out locally. I'm going to close this since I think we're pretty definitively settled on this model. I'll try to have things up on the branch this week.

Thank you SOOOO much @SystemFw for all your work on this!

…monad, vim dotty Rock Paper Scissor Game through Actor Model in Scala https://medium.com/@karthikv1392/rock-paper-scissor-game-through-actor-model-in-scala-a8a1430f10ff Sizeof in Scala http://yannmoisan.com/sizeof-in-scala.html) using `import org.apache.spark.util.SizeEstimator._` Scala Classes – Syntax, Declaration, Use case, Examples https://leobenkel.com/2019/10/scala-classes How to keep your sanity working with Scala Implicit! https://medium.com/@lprakashv/how-to-keep-your-sanity-working-with-scala-implicit-fda9ffc33162 Giter8 - Part 3 - Clean Architecture Use Case Scaffold https://www.youtube.com/watch?v=RQU74SHNB08 Cats Effect 3.0 Interruption Model Proposal typelevel/cats-effect#681 Hacking with scala circe-json https://medium.com/rahasak/hacking-with-circe-json-scala-ca626705733e Open-sourcing Polynote: an IDE-inspired polyglot notebook https://medium.com/netflix-techblog/open-sourcing-polynote-an-ide-inspired-polyglot-notebook-7f929d3f447 Websockets in Scala using sttp https://blog.softwaremill.com/websockets-in-scala-using-sttp-baefd207c5fb Journey into the IO Monad (Part 1) https://medium.com/@RaymondTayBL/journey-into-the-io-monad-part-1-bdf591b4fa09 Journey into the IO Monad (Part 2) https://medium.com/@RaymondTayBL/journey-into-the-io-monad-part-2-2e21da2826e3 Journey to the IO Monad (Part 3.1) https://medium.com/@RaymondTayBL/journey-to-the-io-monad-part-3-1-35813c7f48ac Vim with Dotty (using coc vim) https://www.dev-log.me/Coc_Vim_with_Dotty

djspiewak added the CE 3 label Oct 23, 2019

djspiewak closed this as completed Dec 22, 2019

djspiewak mentioned this issue Jan 2, 2020

Rewrite PureIO on top of coop #738

Merged

SystemFw mentioned this issue Feb 1, 2020

Add MVar.modify? #771

Closed

RaasAhsan mentioned this issue Apr 26, 2020

WIP: Fiber local state PoC #836

Closed

5 tasks

RaasAhsan mentioned this issue May 16, 2020

Revise Bracket CE2 #1 #871

Closed

SystemFw mentioned this issue Jul 21, 2020

Ensure that ce3 MVar is in sync with ce2 #967

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cats Effect 3.0 Interruption Model Proposal #681

Cats Effect 3.0 Interruption Model Proposal #681

SystemFw commented Oct 23, 2019 •

edited

Loading

JD557 commented Oct 25, 2019

SystemFw commented Oct 25, 2019

neko-kai commented Oct 25, 2019

SystemFw commented Oct 25, 2019 •

edited

Loading

SystemFw commented Oct 25, 2019

neko-kai commented Oct 25, 2019

SystemFw commented Oct 25, 2019

neko-kai commented Oct 25, 2019 •

edited

Loading

djspiewak commented Oct 25, 2019

djspiewak commented Dec 6, 2019

djspiewak commented Dec 22, 2019 •

edited

Loading

Cats Effect 3.0 Interruption Model Proposal #681

Cats Effect 3.0 Interruption Model Proposal #681

Comments

SystemFw commented Oct 23, 2019 • edited Loading

The problem

Current situation

Principles

Combinators and Semantics

Safe regions

Nested regions and restore semantics

Relationship with bracket etc.

Differences with Haskell

Open questions on start

Thanks

JD557 commented Oct 25, 2019

SystemFw commented Oct 25, 2019

neko-kai commented Oct 25, 2019

SystemFw commented Oct 25, 2019 • edited Loading

SystemFw commented Oct 25, 2019

neko-kai commented Oct 25, 2019

SystemFw commented Oct 25, 2019

neko-kai commented Oct 25, 2019 • edited Loading

djspiewak commented Oct 25, 2019

djspiewak commented Dec 6, 2019

djspiewak commented Dec 22, 2019 • edited Loading

SystemFw commented Oct 23, 2019 •

edited

Loading

Relationship with `bracket` etc.

Open questions on `start`

SystemFw commented Oct 25, 2019 •

edited

Loading

neko-kai commented Oct 25, 2019 •

edited

Loading

djspiewak commented Dec 22, 2019 •

edited

Loading