Try to prevent command lockup #555

julianoes · 2018-09-22T10:10:32Z

This is an attempt to fix a lockup issue that we see in the Swift
example app when the camera capture status subscription is used.
Presumably, the lockup has nothing to do with the camera capture status
itself; the subscription just produces the right timing to trigger
something else.

What happens is that `receive_timeout()` is called but the command
queue turns out to be empty (it returns `nullptr`). After that, no more
commands can be processed and we keep getting the warning:
"Command ack not matching our current command: 257".

This makes sense because we stop doing work, that is, we do not send
any commands as long as `_state` is `State::WAITING`, and there is no
other timeout that would reset it.

Therefore, this change should resolve that case by resetting `_state`,
so that we can keep going.

However, this change neither fixes nor explains why the queue is empty
in the first place and out of sync with `_state`.
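
To illustrate the workaround described above, here is a minimal sketch of a timeout handler that resets the state when it finds the queue empty. It is not the code from `core/mavlink_commands.cpp`: the class `CommandSender`, the `Work` struct, the retry count, and the helpers `send_command()`, `force_out_of_sync()` and `is_waiting()` are all hypothetical names chosen for this example. Only `receive_timeout()`, `_state` and `State::WAITING` are taken from the description.

```cpp
#include <deque>
#include <memory>
#include <mutex>

// Hypothetical, simplified stand-in for the command sender.
// It is NOT the actual code in core/mavlink_commands.cpp.
class CommandSender {
public:
    enum class State { NONE, WAITING };

    struct Work {
        int command_id;
        int retries_left;
    };

    // Queue a command and pretend it was sent: we now wait for its ack.
    void send_command(int command_id)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        _queue.push_back(std::make_shared<Work>(Work{command_id, 3}));
        _state = State::WAITING;
    }

    // Assumed helper that reproduces the unexplained situation:
    // the queue is empty although _state still says WAITING.
    void force_out_of_sync()
    {
        std::lock_guard<std::mutex> lock(_mutex);
        _queue.clear();
        _state = State::WAITING;
    }

    void receive_timeout()
    {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_state != State::WAITING) {
            return;
        }

        std::shared_ptr<Work> work; // stays nullptr if the queue is empty
        if (!_queue.empty()) {
            work = _queue.front();
        }

        if (!work) {
            // The workaround: without this reset we would stay in WAITING
            // forever, because nothing else ever leaves that state.
            _state = State::NONE;
            return;
        }

        if (work->retries_left > 0) {
            --work->retries_left; // a real sender would resend the command here
            return;
        }

        // Out of retries: give up on this command and accept new work.
        _queue.pop_front();
        _state = State::NONE;
    }

    bool is_waiting() const
    {
        std::lock_guard<std::mutex> lock(_mutex);
        return _state == State::WAITING;
    }

private:
    mutable std::mutex _mutex;
    std::deque<std::shared_ptr<Work>> _queue;
    State _state{State::NONE};
};
```

In this sketch, calling `force_out_of_sync()` followed by `receive_timeout()` leaves the sender ready for new commands again. Like the change itself, it only recovers from the inconsistent state and does not explain how that state arises.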

This is an attempt to fix the issue described in the previous commit message. We should remove any timeout handlers before popping work from the queue, so that `receive_timeout()` cannot be called at a moment when the queue is already empty and it doesn't know what to do about that. Basically, we always try to follow this order to keep things from getting out of sync:

1. Set the `_state`.
2. Unregister the timeout handler.
3. Pop from the queue.
4. Call user callbacks.

It is still unclear to me exactly how the lockup case happens. I can't see how `_state` can ever be out of sync with the queue, given the protecting mutexes.
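
The four-step order above could look roughly like the following sketch. It is an illustration under assumptions, not the code in `core/mavlink_commands.cpp`: the class `CommandQueue`, its methods, and the boolean `_timeout_registered` (standing in for a real timeout handler registration) are hypothetical. Invoking the user callback after releasing the lock is a choice made for this sketch, not something the commit message states.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical illustration of the ordering; all names are assumptions.
class CommandQueue {
public:
    enum class State { NONE, WAITING };
    using ResultCallback = std::function<void(bool success)>;

    // Queue a command, mark it as in flight and arm its timeout.
    void send(int command_id, ResultCallback callback)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        _queue.push_back({command_id, std::move(callback)});
        _state = State::WAITING;
        _timeout_registered = true;
    }

    void receive_ack(int command_id)
    {
        ResultCallback callback;
        {
            std::lock_guard<std::mutex> lock(_mutex);
            if (_state != State::WAITING || _queue.empty() ||
                _queue.front().command_id != command_id) {
                return; // ack does not match the current command
            }
            _state = State::NONE;        // 1. set the state
            _timeout_registered = false; // 2. unregister the timeout handler
            callback = std::move(_queue.front().callback);
            _queue.pop_front();          // 3. pop from the queue
        }
        if (callback) {
            callback(true);              // 4. call the user callback
        }
    }

    // Returns true if the timeout was still armed and has been handled.
    bool timeout_elapsed()
    {
        ResultCallback callback;
        {
            std::lock_guard<std::mutex> lock(_mutex);
            if (!_timeout_registered || _queue.empty()) {
                return false; // already unregistered: nothing to do
            }
            _state = State::NONE;        // same order as in receive_ack()
            _timeout_registered = false;
            callback = std::move(_queue.front().callback);
            _queue.pop_front();
        }
        if (callback) {
            callback(false);
        }
        return true;
    }

    std::size_t pending() const
    {
        std::lock_guard<std::mutex> lock(_mutex);
        return _queue.size();
    }

private:
    struct Work {
        int command_id;
        ResultCallback callback;
    };

    mutable std::mutex _mutex;
    std::deque<Work> _queue;
    State _state{State::NONE};
    bool _timeout_registered{false};
};
```

Because the timeout is unregistered before the work item is popped, a late call to `timeout_elapsed()` after an ack finds nothing armed and returns without touching the empty queue.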

julianoes · 2018-09-22T11:51:56Z

I tested this and it doesn't break the integration test, so I suggest it's worth a try in the iOS example.

JonasVautherin

Thanks Julian! Let's see if that fixes the problem! I'll try to review the code again soon, since I need to get a better understanding of all that anyway :-).

core/mavlink_commands.cpp

julianoes requested a review from JonasVautherin September 22, 2018 10:10

julianoes added the in progress label Sep 22, 2018

julianoes added 4 commits September 22, 2018 13:35

core: remove comment

9e55dc1

This has been addressed in the meantime.

core: improve printf, add an unlikely error info

9cce318

version: bump to 0.2.8

397acae

This is supposed to be just a bugfix.

julianoes changed the title ~~core: try to prevent command lockup~~ Try to prevent command lockup Sep 22, 2018

JonasVautherin approved these changes Sep 22, 2018

JonasVautherin merged commit c022f1c into develop Sep 22, 2018

JonasVautherin deleted the fix-command-lockup branch September 22, 2018 21:14

JonasVautherin removed the in progress label Sep 22, 2018
