Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve locked queue deadlocks #601

Merged
merged 15 commits into from
Dec 17, 2018
Merged

Improve locked queue deadlocks #601

merged 15 commits into from
Dec 17, 2018

Conversation

julianoes
Copy link
Collaborator

@julianoes julianoes commented Nov 21, 2018

This is a further attempt following #590 to prevent potential deadlocks.

  • mavlink_commands: remove state variable and mutex.
  • mavlink_parameters: remove state variable and mutex.
  • locked_queue: avoid re-locking on pop (always require borrow first).

@JonasVautherin
Copy link
Collaborator

@hdiethelm: I would love to get your review on that, if you have time! :-)

@julianoes julianoes force-pushed the fix-commands-deadlock branch 2 times, most recently from e9d02aa to 31c2315 Compare November 21, 2018 14:32
@JonasVautherin
Copy link
Collaborator

@julianoes Should this be a [WIP] PR?

@hdiethelm
Copy link

Oh, quite a lot of changes! I don't have the time to review all, sorry. I just looked a bit trough and so far it looks good!

The only risk is that somewhere you do a borrow_front and forget the corresponding pop_front/return_front resulting also in a deadlock.

I was thinking we could also drop locked_queue completely and instead use a normal queue as it is. Then add a mutex for each queue directly in MAVLinkParameters and MAVLinkCommands and lock it using lock_guard or unique_lock so unlocking is guaranteed. The disadvantage is if you forget to lock the mutex, you could mess up the queue.

I have also an other idea, I'll just test if it works.

@hdiethelm
Copy link

Check this one:
hdiethelm@f33b63d
It should fix the potential problem of double borrow and missed pop/return. But I just tested it using the unit tests.

@hdiethelm
Copy link

I added a test and it works:
hdiethelm@e986d17
By the way: Is there a way to catch intentionally generated asserts? I tested also the double_borrow from the same process and pop / return without borrow case and the test fails with assert as it should... ;-) However, the test runner is killed which is not so nice.

@julianoes
Copy link
Collaborator Author

So, yes we can test for an assert using EXPECT_DEATH(call(), "assert bla failed...").
But one problem is that the test will fail for Release builds where asserts are not built.

@hdiethelm
Copy link

What do you think about my suggestion, passing a unique_lock with no associated mutex to borrow_front() to avoid an accidental lock without unlock?
hdiethelm@f33b63d and hdiethelm@e986d17

@julianoes
Copy link
Collaborator Author

What do you think about my suggestion, passing a unique_lock with no associated mutex to borrow_front() to avoid an accidental lock without unlock?

Thanks yes! I think that's definitely better because it can avoid mistakes. I was hoping that we can have the locked queue pass some kind of smart pointer thing that keeps the lock internal and returns if it goes out of scope. I didn't have time to try that though.

@hdiethelm
Copy link

hdiethelm commented Nov 22, 2018

What do you think about my suggestion, passing a unique_lock with no associated mutex to borrow_front() to avoid an accidental lock without unlock?

Thanks yes! I think that's definitely better because it can avoid mistakes. I was hoping that we can have the locked queue pass some kind of smart pointer thing that keeps the lock internal and returns if it goes out of scope. I didn't have time to try that though.

unique_lock is one way of that kind of smart pointer.
However, you could also write some kind of queue_guard class which calls return_front in it's destructor. Same effect, somewhat nicer look.
So it would be something like:

{
    queue_guard guard(locked_queue);
    locked_queue.borrow_front(gard);
    //No return_front needed, ~queue_guard handles this
}

My way is:

{
    std::unique_lock<std::mutex> queue_lock;
    locked_queue.borrow_front(queue_lock);
    //No return_front needed, ~unique_lock unlocks the mutex
}

@julianoes julianoes force-pushed the fix-commands-deadlock branch from 9f7b0a4 to 4d97764 Compare November 23, 2018 09:44
@julianoes
Copy link
Collaborator Author

I gave this a try to get to something close to the guard semantics:

auto guard = _work_queue.guard();
auto work = guard.get_front();

if (bla) {
    guard.pop_front();
}

See the last 3 commits. What do you think?

@hdiethelm
Copy link

Looks nice! I think that should do it!

@julianoes julianoes force-pushed the fix-commands-deadlock branch from 138bcd1 to 1748811 Compare November 26, 2018 11:40
@hdiethelm
Copy link

hdiethelm commented Nov 26, 2018

locked_queue: fix MSVC Debug unit test
That was something i was also thinking about.

Guard guard()
{
        Guard new_guard{_mutex, _queue};
        return new_guard;
}

is not so nice.

QueueGuard guard(_work_queue);
auto work = guard.get_front();
if (bla) {
    guard.pop_front();
}

could be better. And then overwrite the copy constructor of QueueGuard = delete, so it can not be copied as it is done in unique_lock.

Edit: added ``` around code.

@julianoes
Copy link
Collaborator Author

QueueGuard guard(_work_queue);

Ah, I didn't think of that, yes that would be nicer!

@hdiethelm
Copy link

Nice!

Copy link
Collaborator Author

@julianoes julianoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just reviewed this myself. I think it's all correct.

@julianoes
Copy link
Collaborator Author

I would appreciate a review @JonasVautherin or @hdiethelm. @hdiethelm I made you a collaborator if you want to be involved 😄

JonasVautherin
JonasVautherin previously approved these changes Dec 17, 2018
Copy link
Collaborator

@JonasVautherin JonasVautherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spent some time reviewing that, and as far as I can tell, it looks good to me.

@julianoes
Copy link
Collaborator Author

Awesome, thanks @JonasVautherin. I'll rebase and merge.

julianoes and others added 5 commits December 17, 2018 10:12
In my opinion we dont really need the _state variable because we can
derive what is going on from set timeouts and what is in the work queue.
All that we actually need is to know if we already sent an item which
has been added as a flag to each work item.
We can no longer do pops after the queue is empty and we always need to
borrow before doing a pop.
This removes the state variable to prevent potential deadlocks. Instead
a work item now contains a flag whether it has already been sent.
To make everything simpler it makes more sense to have one work queue
instead of one for get and one for set. This way we just work through
the items as they come in instead of an (unexpected) priorization of get
over set.

Also, this means that we can get rid of some code duplication, and other
uglyness.
This is a try to use something like a lock guard instead of the borrow
and return semantic.
This means all consumers need to switch to the new Guard semantics.
This fixes the unit test asserting for the MSVC debug build.
The assert is prevented by using a workaround for this case where the
return value optimization does not trigger and the move constructor is
used.
This is closer to std::lock_guard and doesn't require a return copy/move
of the Guard.
@julianoes julianoes merged commit 4332994 into develop Dec 17, 2018
@julianoes julianoes deleted the fix-commands-deadlock branch December 17, 2018 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants