Fix broken shutdown if scheduling from event loop context #550

Merged
merged 2 commits into sociomantic-tsunami:v3.8.x from fix-shutdown on Jul 17, 2018

Conversation

mihails-strasuns

With this fix, any such tasks will be ignored, allowing for a clean exit.
There is a concern of losing app data, but that will have to be addressed
in a separate changeset, targeting the minor or maybe even the major branch.

@@ -334,6 +334,7 @@ final class Scheduler : IScheduler
         auto caller_task = Task.getThis();
         if (caller_task !is null)
             caller_task.kill();
+        return;


Unrelated to this PR, but it might be worth adding a comment as to why the calling task is killed if schedule is called during shutdown.

Author

Will add
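
For readers skimming the diff, here is a minimal sketch of how the guarded path in schedule() plausibly looks with the added return;. Only Task.getThis() and kill() come from the hunk above; the shutdown flag, the enqueue call and the import are assumptions for illustration, not the actual ocean source.

    import ocean.task.Task;

    final class Scheduler
    {
        private bool shutting_down; // assumed name for the shutdown flag

        public void schedule ( Task task )
        {
            if (this.shutting_down)
            {
                // During shutdown the new task is dropped; if schedule()
                // was called from inside another task, that caller is
                // killed so it cannot keep the event loop alive.
                auto caller_task = Task.getThis();
                if (caller_task !is null)
                    caller_task.kill();
                // The fix: without this return, execution fell through to
                // the enqueue below and the shutdown sequence never finished.
                return;
            }

            this.enqueue(task); // placeholder for the real scheduling logic
        }

        private void enqueue ( Task task ) { /* ... */ }
    }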

@@ -615,6 +615,8 @@ final class Scheduler : IScheduler
    public void processEvents ( )
    {
        auto task = Task.getThis();
        assert(task !is null);


Why not verify?

Author

It will crash anyway even if the check is missing.


Mm I don't get what you mean. Aren't these the possible cases in D2 builds?

  • assert, task is null: crash.
  • assert, task !is null: keeps running.
  • verify, task is null: throws.
  • verify, task !is null: keeps running.
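
To put the four cases above in code, a small sketch; the verify helper below is a hypothetical stand-in, assumed only to throw a catchable exception instead of raising an AssertError:

    /// Hypothetical stand-in for the real verify(): assumed to throw a
    /// plain, catchable Exception when the condition does not hold.
    void verify ( bool condition, lazy string msg = "verify failed" )
    {
        if (!condition)
            throw new Exception(msg);
    }

    void withAssert ( Object task )
    {
        // task is null  -> AssertError, process dies (and with -release the
        //                  check is compiled out, so a later null dereference
        //                  segfaults instead)
        // task !is null -> keeps running
        assert(task !is null);
    }

    void withVerify ( Object task )
    {
        // task is null  -> throws; a caller higher up can catch, log and carry on
        // task !is null -> keeps running
        verify(task !is null, "called outside a task context");
    }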

Author

A few more options depending on -release, but essentially yes. But don't forget that it crashes right now if the condition is false, because there is no check at all. So this exactly matches the purpose of assert - turning a crash / corruption into a more easily debuggable crash.


But didn't we go through and convert all such cases into verifys?


If that were the case, we have wasted a lot of effort on something fundamentally flawed. You can't avoid crashes no matter how hard you try and must embrace them :) From my PoV, the goal of verify is to protect against memory corruption and similar damage.


To word it differently - the fact that right now dhtnode can hit an assert (now verify) condition and survive after is a bug in dhtnode that I totally expect to be fixed some day, not a feature to support.


the fact that right now dhtnode can hit an assert (now verify) condition and survive after is a bug in dhtnode that I totally expect to be fixed some day

This is the key. Using asserts to notify us of such bugs causes crashes. Using verifys to notify us of such bugs (potentially) avoids crashes and allows us to log the bug. From the POV of a dhtnode, the latter is way more desirable than the former.
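
As an illustration of that argument (the names and the std.exception.enforce stand-in are mine, not from the PR): a sketch of a daemon loop that turns a failed sanity check into a log line instead of a dead process.

    import std.exception : enforce;
    import std.stdio : writeln; // stand-in for the application's logger

    void handleRequest ( int id )
    {
        // A throwing check: like verify, it raises a catchable Exception.
        enforce(id != 2, "request handler hit an impossible state");
    }

    void main ( )
    {
        foreach (id; 0 .. 5)
        {
            try
            {
                handleRequest(id);
            }
            catch (Exception e)
            {
                // The daemon records the bug and keeps serving the remaining
                // requests; an assert in handleRequest would have killed the
                // whole process at this point.
                writeln("bug detected, please report: ", e.msg);
            }
        }
    }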


I disagree. Consider this PR - before my change it would always just segfault on wrong usage, in both D1 and D2. There are no asserts, just a plain segfault. The very fact that it was working like that in deployed live apps for a very long time is a direct proof that this is perfectly fine.

This is a reality that can't be worked around - no matter what policy you take on assert, your app will crash eventually and bugs will have to be fixed. Effort to make it as harmless as possible is meaningful, effort to prevent it altogether is not.


The very fact that it was working like that in deployed live apps for a very long time is a direct proof that this is perfectly fine.

It's only fine because it never came up. If it had come up, an assert here would be as bad as a plain segfault, in terms of direct impact to the application.

Effort to make it as harmless as possible is meaningful, effort to prevent it altogether is not.

This is exactly what I'm suggesting. Throwing is generally less harmful than crashing. Both have equal value in terms of notifying maintainers of bugs.

@mihails-strasuns
Author

Updated with a comment

@daniel-zullo left a comment


LGTM

@daniel-zullo

Ping. Can we please label this PR as high priority? I need this patch to make a daemon application work properly.

@mihails-strasuns-sociomantic
Contributor

I will make a release as soon as @gavin-norman-sociomantic gives a green light :)

Mihails Strasuns added 2 commits July 17, 2018 13:38
Because of a missing return statement, trying to schedule a new task
after shutdown would succeed but result in the application getting stuck
indefinitely during the shutdown sequence.

With this fix, any such tasks will be ignored, allowing for a clean exit.
There is a concern of losing app data, but that will have to be addressed
in a separate changeset, targeting the minor or maybe even the major branch.
@gavin-norman-sociomantic gavin-norman-sociomantic merged commit 5662f8a into sociomantic-tsunami:v3.8.x Jul 17, 2018
@gavin-norman-sociomantic

Green light.

@mihails-strasuns mihails-strasuns deleted the fix-shutdown branch July 31, 2018 10:49