-
Notifications
You must be signed in to change notification settings - Fork 435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MultiThreadedExecutor(MutuallyExclusive) sometimes misses to execute Waitable #2012
Comments
We attach our analysis here. Detailed analysis as belowvoid
MultiThreadedExecutor::run(size_t this_thread_number)
{
(void)this_thread_number;
while (rclcpp::ok(this->context_) && spinning.load()) {
rclcpp::AnyExecutable any_exec;
{
std::lock_guard wait_lock{wait_mutex_};
if (!rclcpp::ok(this->context_) || !spinning.load()) {
return;
}
if (!get_next_executable(any_exec, next_exec_timeout_)) {
continue;
}
} // <== There is a lock to make sure only one thread can execute these codes
if (yield_before_execute_) {
std::this_thread::yield();
}
execute_any_executable(any_exec);
// Clear the callback_group to prevent the AnyExecutable destructor from
// resetting the callback group `can_be_taken_from`
any_exec.callback_group.reset();
}
} multi-threads only can execute execute_any_executable() in parallel. In goal_callback(), publish twice. bool
Executor::get_next_ready_executable_from_map(...)
{
...
if (success) {
// If it is valid, check to see if the group is mutually exclusive or
// not, then mark it accordingly ..Check if the callback_group belongs to this executor
if (any_executable.callback_group && any_executable.callback_group->type() == \
CallbackGroupType::MutuallyExclusive)
{
// It should not have been taken otherwise
assert(any_executable.callback_group->can_be_taken_from().load());
// Set to false to indicate something is being run from this group
// This is reset to true either when the any_executable is executed or when the
// any_executable is destructued
any_executable.callback_group->can_be_taken_from().store(false);
}
}
} Note that feedback_subscription and local_subscription use the default callback group (That is, use the same one). void
get_next_waitable(...)
{
...
if (!group->can_be_taken_from().load()) {
// Group is mutually exclusive and is being used, so skip it for now
// Leave it to be checked next time, but continue searching
++it;
continue;
}
...
} Another waitable handle cannot be dealt with since callback group is 'locked' by thread A. bool
Executor::get_next_executable(AnyExecutable & any_executable, std::chrono::nanoseconds timeout)
{
bool success = false;
// Check to see if there are any subscriptions or timers needing service
// TODO(wjwwood): improve run to run efficiency of this function
success = get_next_ready_executable(any_executable);
// If there are none
if (!success) {
// Wait for subscriptions or timers to work on
wait_for_work(timeout);
if (!spinning.load()) {
return false;
}
// Try again
success = get_next_ready_executable(any_executable);
}
return success;
} At the beginning of wait_for_work, it will clear all handles (including waitable_handles_). void
Executor::wait_for_work(std::chrono::nanoseconds timeout)
{
...
memory_strategy_->clear_handles();
...
} There is a senior which publishing message |
We have one solution, but no good. |
CC: @alsora @mauropasse |
This does not seem specific to intra-process communication: the same problem could happen with any other waitable object (although intra-process communication is likely the heaviest user of this class). It looks like there are multiple, all related, issues:
The first thing I want to point out is that all the involved code is "removed" if using the events executor. This problem only happens with a mutually exclusive callback group, so what is the expected behavior here?
These two options are very different and essentially translate to either a lock-free or locked implementation. I have to admit that I never used the ROS multi-threaded executor in a real application (i used it only for some benchmarking and I found it less efficient than having multiple single-threaded executors), so I don't know much about its design. Having said that, I can see many different ways to "fix" this problem:
|
Yes.
This leads each DDS needs to change. |
@alsora appreciate for quick response and information.
agree, i will change the subject accordingly. |
this blocks everything else, until this
i think this sounds reasonable.
(correction, it is rmw implementation but DDS.) (just a idea) what about rclcpp lets rmw implementation can skip the
as a work-around, we can use |
Hello, I unfortunately noticed some callback do not fire under heavy load using multi threaded container and full intra-process solution with a bunch of components. Are there any planned actions to solve this problem? Any updates? |
@fujitatomoya, just did a quick test and it seems to work as expected👍👍 |
addressed by #2109, closing. |
Bug report
Required Info:
Steps to reproduce issue
reproducible project
procedure
Expected behavior
all subscriptions should be able to receive message for each publication.
Actual behavior
sometimes subscription does not call user callback on message.
Additional information
we can avoid this problem if,
The text was updated successfully, but these errors were encountered: