Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add explicit shutdown to ActorCache when the Worker::Actor is shutdown #134

Merged
merged 1 commit into from
Nov 2, 2022

Conversation

bcaimano
Copy link
Contributor

Previously, we allowed scheduled flushes to continue after the Worker::Actor experienced shutdown. This meant that we would effectively commit to storage whatever write operations were in the write buffer when an actor became broken. Instead, we now cancel any scheduled flushes. There may still be in-flight flushes containing coalesced batches of writes from before the actor shut down.

@@ -179,6 +179,10 @@ class ActorCache final: public ActorCacheInterface {
class SharedLru;
// Shared LRU for a whole isolate.

static constexpr auto SHUTDOWN_ERROR_MESSAGE =
"broken.ignored; jsg.Error: "
"Durable Object is shutting down, storage is no longer accessible."_kj;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'll be easier for me see on the ew-test side of things, but a user wouldn't actually receive this error if they broke their Actor via OOM/CPU-usage/throw-from-blockConcurrencyWhile, or if something like a storage error happened, right? I assume they'd receive the real underlying error tunneled to their eyeball worker? Otherwise I can see this being very confusing. ("Why is my Actor shutting down? What does shutting down even mean?")

Actually, maybe what I need to better understand is when they would see this error. I imagined an active Actor would only be shutdown because it was broken, implying some other exception was thrown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, maybe what I need to better understand is when they would see this error. I imagined an active Actor would only be shutdown because it was broken, implying some other exception was thrown.

Yeah, in practice this is only a consequence of marking an Actor as broken. That is the current state of the world internally without any code changes.

a user wouldn't actually receive this error if they broke their Actor via ...

Most notably, code update comes to mind as a situation where the actor is broken but JS could theoretically keep running. We want to have this case covered since we don't want to allow any future operations to work. If we had the error from the brokenness in hand, we could use that instead. That's doable, we'd just need to modify Worker::Actor::shutdown() to take an additional argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most notably, code update comes to mind

Ah yeah. I get that we need a fallback error, I just wanted to be sure that they'd get the OOM/CPU/etc error if that were the root cause, and it wasn't obvious to me that that was how it worked.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most notably, code update comes to mind as a situation where the actor is broken but JS could theoretically keep running

Thanks for calling that out. This might be slightly disruptive then in that cached reads will no longer succeed during a code upgrade, which I wouldn't be surprised if some users noticed. Not that that's a bad thing, per se -- it's nice to rule out the possibility of stale reads.

@bcaimano bcaimano force-pushed the bcaimano/bye-storage-bye branch from 0f6d18e to 581697d Compare October 28, 2022 17:16
@bcaimano bcaimano requested review from bretthoerner and removed request for mikea, jclee, byule and harrishancock October 28, 2022 17:49
Copy link
Contributor

@bretthoerner bretthoerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, thanks for weaving the exception through!

It'd be nice to see the simple ew-test equivalent before merging just to be sure this catches things like the write-after-CPU-overload issue, but I only say that because I'm not extremely confident in my ability to just "see" that in ActorCache code. Up to you. :)

@bcaimano bcaimano force-pushed the bcaimano/bye-storage-bye branch from 581697d to a461d4d Compare October 28, 2022 20:44
@bcaimano
Copy link
Contributor Author

Turns out that we needed to also shutdown the actor cache inline to the abort promise because otherwise actor cache shutdown was sequenced after the flush promise enqueuing. Fun times!

Copy link
Member

@jasnell jasnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but let's make sure @bretthoerner and @a-robinson are both good on it before landing.

@bcaimano bcaimano force-pushed the bcaimano/bye-storage-bye branch from a461d4d to bdbcbb4 Compare October 31, 2022 17:25
@@ -179,6 +179,10 @@ class ActorCache final: public ActorCacheInterface {
class SharedLru;
// Shared LRU for a whole isolate.

static constexpr auto SHUTDOWN_ERROR_MESSAGE =
"broken.ignored; jsg.Error: "
"Durable Object is shutting down, storage is no longer accessible."_kj;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most notably, code update comes to mind as a situation where the actor is broken but JS could theoretically keep running

Thanks for calling that out. This might be slightly disruptive then in that cached reads will no longer succeed during a code upgrade, which I wouldn't be surprised if some users noticed. Not that that's a bad thing, per se -- it's nice to rule out the possibility of stale reads.

@bcaimano bcaimano force-pushed the bcaimano/bye-storage-bye branch 3 times, most recently from 5babb3c to 7057bce Compare November 1, 2022 22:59
Copy link
Member

@a-robinson a-robinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looks good 👍

Nice test, by the way!

src/workerd/io/actor-cache.c++ Outdated Show resolved Hide resolved
@@ -2754,9 +2754,19 @@ void Worker::Actor::shutdown(uint16_t reasonCode) {
// written.
}

shutdownActorCache(error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering -- in the cases where error is null, would it be at all useful to pass down reasonCode so that ActorCache::shutdown at least has some sort of info to include in the exception it generates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better or worse, the set of reasonCodes that lack an associated error are pretty uninformative at the moment: Either "unknown reason" or "evicted".

Previously, we allowed scheduled flushes to continue after the Worker::Actor experienced shutdown. This meant that we would effectively commit to storage whatever write operations were in the write buffer when an actor became broken. Instead, we now cancel any scheduled flushes. There may still be in-flight flushes containing coalesced batches of writes from before the actor shut down.
@bcaimano bcaimano force-pushed the bcaimano/bye-storage-bye branch from 7057bce to 8d0058d Compare November 2, 2022 19:11
@bcaimano bcaimano merged commit 8657507 into cloudflare:main Nov 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants