Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to prevent command lockup #555

Merged
merged 5 commits into from
Sep 22, 2018
Merged

Try to prevent command lockup #555

merged 5 commits into from
Sep 22, 2018

Conversation

julianoes
Copy link
Collaborator

This is an attempt to fix a lockup issue that we see in the Swift
example app when the camera capture status subscription is used.
Presumably, the lockup has nothing to do with the actual camera capture
status but it just leads to the right timing to trigger something else.

What happens is that receive_timeout() is called but then the command
queue turns out to be empty (returns nullptr). After that, no more
commands can be processed and we keep getting the warning:
"Command ack not matching our current command: 257".

This makes sense because we stop doing work, so send any commands as
long as _state is State::WAITING and there is no other timeout that
would reset this.

Therefore, this change should resolve that case by reseting _state, so
that we can keep going.

However, this change does not fix or explain why the queue is empty in
the first place and out of sync with _state.

This is an attempt to fix a lockup issue that we see in the Swift
example app when the camera capture status subscription is used.
Presumably, the lockup has nothing to do with the actual camera capture
status but it just leads to the right timing to trigger something else.

What happens is that `receive_timeout()` is called but then the command
queue turns out to be empty (returns `nullptr`). After that, no more
commands can be processed and we keep getting the warning:
"Command ack not matching our current command: 257".

This makes sense because we stop doing work, so send any commands as
long as `_state` is `State::WAITING` and there is no other timeout that
would reset this.

Therefore, this change should resolve that case by reseting `_state`, so
that we can keep going.

However, this change does not fix or explain why the queue is empty in
the first place and out of sync with `_state`.
This is an attempt to fix the issue described in the previous commit
message. We should remove any timeout handlers before poping work from
the queue to prevent that `receive_timeout()` is called in the moment
when the queue is already empty and doesn't know what to do about it.

Basically, we always try to follow this order to prevent things out of
sync:

1. Set the `_state`
2. Unregister timeout handler
3. Pop from queue.
4. Call user callbacks.

It is still unclear to me how the lockup case exactly happens. I can't
see how `_state` can ever be out of sync with the queue given the
protecting mutexes.
This has been addressed in the meantime.
This is supposed to be just a bugfix.
@julianoes
Copy link
Collaborator Author

I tested this and it doesn't break the integration test, so I suggest it's worth a try in the iOS example.

@julianoes julianoes changed the title core: try to prevent command lockup Try to prevent command lockup Sep 22, 2018
Copy link
Collaborator

@JonasVautherin JonasVautherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Julian! Let's see if that fixes the problem! I'll try to review the code again soon, as anyway I need to get a better understanding of all that :-).

@JonasVautherin JonasVautherin merged commit c022f1c into develop Sep 22, 2018
@JonasVautherin JonasVautherin deleted the fix-command-lockup branch September 22, 2018 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants