-
Notifications
You must be signed in to change notification settings - Fork 336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add explicit shutdown to ActorCache when the Worker::Actor is shutdown #134
Conversation
src/workerd/io/actor-cache.h
Outdated
@@ -179,6 +179,10 @@ class ActorCache final: public ActorCacheInterface { | |||
class SharedLru; | |||
// Shared LRU for a whole isolate. | |||
|
|||
static constexpr auto SHUTDOWN_ERROR_MESSAGE = | |||
"broken.ignored; jsg.Error: " | |||
"Durable Object is shutting down, storage is no longer accessible."_kj; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'll be easier for me see on the ew-test
side of things, but a user wouldn't actually receive this error if they broke their Actor via OOM/CPU-usage/throw-from-blockConcurrencyWhile, or if something like a storage error happened, right? I assume they'd receive the real underlying error tunneled to their eyeball worker? Otherwise I can see this being very confusing. ("Why is my Actor shutting down? What does shutting down even mean?")
Actually, maybe what I need to better understand is when they would see this error. I imagined an active Actor would only be shutdown because it was broken, implying some other exception was thrown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, maybe what I need to better understand is when they would see this error. I imagined an active Actor would only be shutdown because it was broken, implying some other exception was thrown.
Yeah, in practice this is only a consequence of marking an Actor as broken. That is the current state of the world internally without any code changes.
a user wouldn't actually receive this error if they broke their Actor via ...
Most notably, code update comes to mind as a situation where the actor is broken but JS could theoretically keep running. We want to have this case covered since we don't want to allow any future operations to work. If we had the error from the brokenness in hand, we could use that instead. That's doable, we'd just need to modify Worker::Actor::shutdown()
to take an additional argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most notably, code update comes to mind
Ah yeah. I get that we need a fallback error, I just wanted to be sure that they'd get the OOM/CPU/etc error if that were the root cause, and it wasn't obvious to me that that was how it worked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most notably, code update comes to mind as a situation where the actor is broken but JS could theoretically keep running
Thanks for calling that out. This might be slightly disruptive then in that cached reads will no longer succeed during a code upgrade, which I wouldn't be surprised if some users noticed. Not that that's a bad thing, per se -- it's nice to rule out the possibility of stale reads.
0f6d18e
to
581697d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me, thanks for weaving the exception through!
It'd be nice to see the simple ew-test
equivalent before merging just to be sure this catches things like the write-after-CPU-overload issue, but I only say that because I'm not extremely confident in my ability to just "see" that in ActorCache code. Up to you. :)
581697d
to
a461d4d
Compare
Turns out that we needed to also shutdown the actor cache inline to the abort promise because otherwise actor cache shutdown was sequenced after the flush promise enqueuing. Fun times! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but let's make sure @bretthoerner and @a-robinson are both good on it before landing.
a461d4d
to
bdbcbb4
Compare
src/workerd/io/actor-cache.h
Outdated
@@ -179,6 +179,10 @@ class ActorCache final: public ActorCacheInterface { | |||
class SharedLru; | |||
// Shared LRU for a whole isolate. | |||
|
|||
static constexpr auto SHUTDOWN_ERROR_MESSAGE = | |||
"broken.ignored; jsg.Error: " | |||
"Durable Object is shutting down, storage is no longer accessible."_kj; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most notably, code update comes to mind as a situation where the actor is broken but JS could theoretically keep running
Thanks for calling that out. This might be slightly disruptive then in that cached reads will no longer succeed during a code upgrade, which I wouldn't be surprised if some users noticed. Not that that's a bad thing, per se -- it's nice to rule out the possibility of stale reads.
5babb3c
to
7057bce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still looks good 👍
Nice test, by the way!
@@ -2754,9 +2754,19 @@ void Worker::Actor::shutdown(uint16_t reasonCode) { | |||
// written. | |||
} | |||
|
|||
shutdownActorCache(error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering -- in the cases where error
is null, would it be at all useful to pass down reasonCode so that ActorCache::shutdown
at least has some sort of info to include in the exception it generates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better or worse, the set of reasonCodes that lack an associated error are pretty uninformative at the moment: Either "unknown reason" or "evicted".
Previously, we allowed scheduled flushes to continue after the Worker::Actor experienced shutdown. This meant that we would effectively commit to storage whatever write operations were in the write buffer when an actor became broken. Instead, we now cancel any scheduled flushes. There may still be in-flight flushes containing coalesced batches of writes from before the actor shut down.
7057bce
to
8d0058d
Compare
Previously, we allowed scheduled flushes to continue after the Worker::Actor experienced shutdown. This meant that we would effectively commit to storage whatever write operations were in the write buffer when an actor became broken. Instead, we now cancel any scheduled flushes. There may still be in-flight flushes containing coalesced batches of writes from before the actor shut down.