Fix broken shutdown if scheduling from event loop context #550

Merged
merged 2 commits into sociomantic-tsunami:v3.8.x from fix-shutdown on Jul 17, 2018

Conversation

mihails-strasuns

With this fix, any such tasks will be ignored, allowing for a clean exit.
There is a concern of losing app data, but that will have to be addressed
in a separate changeset, targeting the minor or maybe even the major branch.

@@ -334,6 +334,7 @@ final class Scheduler : IScheduler
         auto caller_task = Task.getThis();
         if (caller_task !is null)
             caller_task.kill();
+        return;


Unrelated to this PR, but it might be worth adding a comment as to why the calling task is killed if schedule is called during shutdown.

Author

Will add
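
For readers skimming the diff, here is a minimal sketch of how the guarded path in schedule() plausibly looks with the added return;. Only Task.getThis() and kill() come from the hunk above; the shutdown flag, the enqueue call and the import are assumptions for illustration, not the actual ocean source.

    import ocean.task.Task;

    final class Scheduler
    {
        private bool shutting_down; // assumed name for the shutdown flag

        public void schedule ( Task task )
        {
            if (this.shutting_down)
            {
                // During shutdown the new task is dropped; if schedule()
                // was called from inside another task, that caller is
                // killed so it cannot keep the event loop alive.
                auto caller_task = Task.getThis();
                if (caller_task !is null)
                    caller_task.kill();
                // The fix: without this return, execution fell through to
                // the enqueue below and the shutdown sequence never finished.
                return;
            }

            this.enqueue(task); // placeholder for the real scheduling logic
        }

        private void enqueue ( Task task ) { /* ... */ }
    }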

@@ -615,6 +615,8 @@ final class Scheduler : IScheduler
    public void processEvents ( )
    {
        auto task = Task.getThis();
        assert(task !is null);


Why not verify?

Author

It will crash anyway even if the check is missing.


Mm I don't get what you mean. Aren't these the possible cases in D2 builds?

  • assert, task is null: crash.
  • assert, task !is null: keeps running.
  • verify, task is null: throws.
  • verify, task !is null: keeps running.
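
To put the four cases above in code, a small sketch; the verify helper below is a hypothetical stand-in, assumed only to throw a catchable exception instead of raising an AssertError:

    /// Hypothetical stand-in for the real verify(): assumed to throw a
    /// plain, catchable Exception when the condition does not hold.
    void verify ( bool condition, lazy string msg = "verify failed" )
    {
        if (!condition)
            throw new Exception(msg);
    }

    void withAssert ( Object task )
    {
        // task is null  -> AssertError, process dies (and with -release the
        //                  check is compiled out, so a later null dereference
        //                  segfaults instead)
        // task !is null -> keeps running
        assert(task !is null);
    }

    void withVerify ( Object task )
    {
        // task is null  -> throws; a caller higher up can catch, log and carry on
        // task !is null -> keeps running
        verify(task !is null, "called outside a task context");
    }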

Author

A few more options depending on -release, but essentially yes. But don't forget that it crashes right now if the condition is false, because there is no check at all. So this exactly matches the purpose of assert - turning a crash / corruption into a more easily debuggable crash.


But didn't we go through and convert all such cases into verifys?


If that were the case, we have wasted a lot of effort on something fundamentally flawed. You can't avoid crashes no matter how hard you try and must embrace them :) From my PoV, the goal of verify is to protect against memory corruption and similar damage.


To word it differently - the fact that right now dhtnode can hit an assert (now verify) condition and survive after is a bug in dhtnode that I totally expect to be fixed some day, not a feature to support.


the fact that right now dhtnode can hit an assert (now verify) condition and survive after is a bug in dhtnode that I totally expect to be fixed some day

This is the key. Using asserts to notify us of such bugs causes crashes. Using verifys to notify us of such bugs (potentially) avoids crashes and allows us to log the bug. From the POV of a dhtnode, the latter is way more desirable than the former.
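
As an illustration of that argument (the names and the std.exception.enforce stand-in are mine, not from the PR): a sketch of a daemon loop that turns a failed sanity check into a log line instead of a dead process.

    import std.exception : enforce;
    import std.stdio : writeln; // stand-in for the application's logger

    void handleRequest ( int id )
    {
        // A throwing check: like verify, it raises a catchable Exception.
        enforce(id != 2, "request handler hit an impossible state");
    }

    void main ( )
    {
        foreach (id; 0 .. 5)
        {
            try
            {
                handleRequest(id);
            }
            catch (Exception e)
            {
                // The daemon records the bug and keeps serving the remaining
                // requests; an assert in handleRequest would have killed the
                // whole process at this point.
                writeln("bug detected, please report: ", e.msg);
            }
        }
    }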


I disagree. Consider this PR - before my change it would always just segfault on wrong usage, in both D1 and D2. There are no asserts, just a plain segfault. The very fact that it was working like that in deployed live apps for a very long time is a direct proof that this is perfectly fine.

This is a reality that can't be worked around - no matter what policy you take on assert, your app will crash eventually and bugs will have to be fixed. Effort to make it as harmless as possible is meaningful, effort to prevent it altogether is not.


The very fact that it was working like that in deployed live apps for a very long time is a direct proof that this is perfectly fine.

It's only fine because it never came up. If it had come up, an assert here would be as bad as a plain segfault, in terms of direct impact to the application.

Effort to make it as harmless as possible is meaningful, effort to prevent it altogether is not.

This is exactly what I'm suggesting. Throwing is generally less harmful than crashing. Both have equal value in terms of notifying maintainers of bugs.

@mihails-strasuns
Author

Updated with a comment

@daniel-zullo left a comment


LGTM

@daniel-zullo

Ping. Can we please label this PR as high priority? I need this patch to make a daemon application work properly.

@mihails-strasuns-sociomantic
Contributor

I will make a release as soon as @gavin-norman-sociomantic gives a green light :)

Mihails Strasuns added 2 commits July 17, 2018 13:38
Because of a missing return statement, trying to schedule a new task
after shutdown would succeed but result in the application getting stuck
indefinitely during the shutdown sequence.

With this fix, any such tasks will be ignored, allowing for a clean exit.
There is a concern of losing app data, but that will have to be addressed
in a separate changeset, targeting the minor or maybe even the major branch.
@gavin-norman-sociomantic gavin-norman-sociomantic merged commit 5662f8a into sociomantic-tsunami:v3.8.x Jul 17, 2018
@gavin-norman-sociomantic

Green light.

@mihails-strasuns mihails-strasuns deleted the fix-shutdown branch July 31, 2018 10:49