Executor quick destruction after spin hangs #1454
Comments
The semantics of
I'm not sure what is the intended semantics of
Maybe instead of that, the first @wjwwood any ideas? |
What about exposing whether the executor is spinning to the user application, so that the user can do the following?
|
That doesn't seem to solve the problem, the code seems to have the same race as before. |
My opinion is that instead of setting the
IMO, we should have both: apply the first fix to I'm marking this as |
Changing it to "cancel the current spin() loop or the following call to spin()" just moves the problem around, I think. What if you want to call spin(), then either let it finish on its own or cancel it asynchronously, then spin again? In that case it's a race whether cancel stops the first spin or fires just after the first spin finishes normally, so sometimes the second spin would do something and other times it would return immediately. Instead maybe we could leave cancel as-is and provide a
TEST(Foo, bar) {
auto node = std::make_shared<rclcpp::Node>("test");
auto executor = std::make_shared<rclcpp::executors::SingleThreadedExecutor>();
executor->add_node(node);
auto executor_spin_future = std::async(
std::launch::async, [&executor]() -> void {
executor->spin();
});
ASSERT_TRUE(executor->wait_for_spin(100ms));
executor->cancel();
} I personally don't see any need for a shutdown since you can just destroy the executor. In python this is more complicated because reliably invoking the destructor is more difficult and so maybe a shutdown makes sense there. But I'm not opposed to it either. |
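To make the idea of waiting for spin before cancelling concrete, here is a minimal sketch using only the standard library. `ToyExecutor` is a hypothetical stand-in, not rclcpp code, and it assumes (as the hang reported in this issue suggests) that a cancel() issued before spin() starts is lost. Its `wait_for_spin` is a guess at the proposed semantics, not an existing API.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <future>
#include <mutex>

// Hypothetical stand-in for an executor; this is not rclcpp code.
class ToyExecutor {
public:
  // Blocks until cancel() is called while this spin() is active.
  void spin() {
    std::unique_lock<std::mutex> lock(mutex_);
    cancelled_ = false;  // assumption: a cancel() issued before spin() is lost
    spinning_ = true;
    cv_.notify_all();    // wake up anyone blocked in wait_for_spin()
    cv_.wait(lock, [this] { return cancelled_; });
    spinning_ = false;
  }

  void cancel() {
    std::lock_guard<std::mutex> lock(mutex_);
    cancelled_ = true;
    cv_.notify_all();
  }

  // Returns true if a spin() became active within the timeout.
  bool wait_for_spin(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout, [this] { return spinning_; });
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool spinning_ = false;
  bool cancelled_ = false;
};

// Starts spin() on another thread, waits until it is active, then cancels it.
// Returns true if spin() was observed as active before the cancel.
bool spin_then_cancel() {
  ToyExecutor executor;
  auto spin_future = std::async(std::launch::async, [&executor] { executor.spin(); });
  const bool started = executor.wait_for_spin(std::chrono::seconds(5));
  executor.cancel();
  // Defensive: if the wait timed out, keep cancelling so that the future's
  // destructor cannot block forever on a spin() that started late.
  while (spin_future.wait_for(std::chrono::milliseconds(10)) != std::future_status::ready) {
    executor.cancel();
  }
  return started;
}
```

Calling `assert(spin_then_cancel());` from a test or from main exercises the sequence. The point of the sketch is only that the cancel is ordered after the start of spin; it says nothing about what happens with several consecutive spins.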
Don't you still have the same problem with
The usefulness of a while(executor.is_ok()) {
executor.spin()
} and in another thread asynchronously call |
Right, or
I don't think so? Because the reason for the first But maybe I'm looking at it wrong?
Why not just |
mmm, I will provide more detailed examples to see if I understand: TEST(Foo, bar) {
auto node = std::make_shared<rclcpp::Node>("test");
auto executor = std::make_shared<rclcpp::executors::SingleThreadedExecutor>();
executor->add_node(node);
auto executor_spin_future = std::async(
std::launch::async, [&executor]() -> void {
executor->spin();
executor->spin();
});
ASSERT_TRUE(executor->wait_for_spin(100ms));
executor->cancel();
} will always cancel the first TEST(Foo, bar) {
auto node = std::make_shared<rclcpp::Node>("test");
auto executor = std::make_shared<rclcpp::executors::SingleThreadedExecutor>();
executor->add_node(node);
auto executor_spin_future = std::async(
std::launch::async, [&executor]() -> void {
executor->spin_some();
executor->spin();
});
ASSERT_TRUE(executor->wait_for_spin(100ms));
executor->cancel();
} might cancel either the first with the proposed "cancel() cancels the current/next spin()" behavior: Case 1. also works fine behavior (and doesn't require a timeout). Moreover, in loops like the following: TEST(Foo, bar) {
auto node = std::make_shared<rclcpp::Node>("test");
auto executor = std::make_shared<rclcpp::executors::SingleThreadedExecutor>();
executor->add_node(node);
auto executor_spin_future = std::async(
std::launch::async, [&executor]() -> void {
while (!executor_cancelled_condition) {
executor->spin_some();
// do more work here
}
});
ASSERT_TRUE(executor->wait_for_spin(100ms));
executor->cancel();
} The There's another alternative: a
The issue is how do you get the "executor.is_ok()" (or
is fixed. Maybe the question is: what is the use case of |
For case 2, I think that's just not going to work as intended; as you said, your proposed change to how cancel works would still not work for case 2. Furthermore, I don't see the point of case 2. I think you'd want to test something in between, e.g.:
TEST(Foo, bar) {
auto node = std::make_shared<rclcpp::Node>("test");
rclcpp::executors::SingleThreadedExecutor executor;
executor.add_node(node);
auto executor_spin_future = std::async(
std::launch::async, [&executor]() -> void {
executor.spin_some();
});
ASSERT_TRUE(executor.wait_for_spin(100ms));
executor.cancel();
// ASSERT that spin_some did something you expected
executor_spin_future.get();
executor_spin_future = std::async(
std::launch::async, [&executor]() -> void {
executor.spin();
});
ASSERT_TRUE(executor.wait_for_spin(100ms));
// ASSERT spin did something you expected, perhaps in a loop or wait on a condition variable from a callback
executor.cancel();
} If you're testing some interaction with TEST(Foo, bar) {
auto node = std::make_shared<rclcpp::Node>("test");
rclcpp::executors::SingleThreadedExecutor executor;
executor.add_node(node);
std::mutex m;
std::condition_variable cv;
bool finished = false;
auto executor_spin_future = std::async(
std::launch::async, [&]() -> void {
executor.spin_some();
executor.spin();
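{
// Set the flag under the mutex so the waiting thread cannot miss the notification.
std::lock_guard<std::mutex> guard(m);
finished = true;
}
cv.notify_one();
});
do {
std::unique_lock<std::mutex> lk(m);
// wait_for takes the lock as its first argument.
cv.wait_for(lk, 100ms, [&]() {return finished;});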
EXPECT_TRUE(finished);
if (!finished) {
executor.cancel();
}
} while (!finished);
// ASSERT some state after the spin_some/spin sequence
} For case 3, you can cover most cases with this: TEST(Foo, bar) {
auto node = std::make_shared<rclcpp::Node>("test");
rclcpp::executors::SingleThreadedExecutor executor;
executor.add_node(node);
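// Atomic (needs <atomic>): written by the test thread, read by the spinning thread.
std::atomic<bool> stop{false};
auto executor_spin_future = std::async(
std::launch::async, [&executor, &stop]() -> void {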
while (!stop && !executor_cancelled_condition) {
// potential delay
executor.spin_some();
// do more work here
}
});
ASSERT_TRUE(executor.wait_for_spin(100ms));
stop = true;
executor.cancel();
} There's still a race if the cancel comes during the "potential delay" area, but it would only have to wait for one more
I don't see why the executor needs to provide that state. I mean the executors != the init/shutdown/ok system we have globally. That's more like the context. The user could provide this state themselves I think.
This was intentional, I think it was called
I mean, we're using it all over the place to spin indefinitely and then stop that spin once some scenario has been met, e.g.:
I think this is a perfectly normal use case. When you include tests that race I suppose I'm fine with changing how cancel works, but maybe we should keep a method that works like the current one too, i.e. it just interrupts a spin but only if there is one at that moment. |
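The two flavours of cancellation mentioned here can be compared in a small model. This is a sketch under stated assumptions: `TwoCancelExecutor`, `cancel_current_or_next` and `interrupt_if_spinning` are hypothetical names invented for illustration, and the "work" is just a counted callback rather than anything from rclcpp.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical toy executor (not rclcpp) offering both cancellation flavours.
class TwoCancelExecutor {
public:
  // Sticky: stops the current spin(), or the next one if none is active.
  void cancel_current_or_next() {
    std::lock_guard<std::mutex> lock(mutex_);
    stop_requested_ = true;
  }

  // Non-sticky: only stops a spin() that is active right now.
  // Returns false and does nothing if no spin() is active.
  bool interrupt_if_spinning() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!spinning_) {
      return false;
    }
    stop_requested_ = true;
    return true;
  }

  // Runs up to max_items pretend callbacks and returns how many ran.
  int spin(int max_items, const std::function<void(int)> & callback) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      spinning_ = true;
    }
    int executed = 0;
    while (executed < max_items) {
      {
        std::lock_guard<std::mutex> lock(mutex_);
        if (stop_requested_) {
          break;
        }
      }
      callback(executed);  // called without the lock so it may call back in
      ++executed;
    }
    std::lock_guard<std::mutex> lock(mutex_);
    spinning_ = false;
    stop_requested_ = false;  // a request is consumed by exactly one spin()
    return executed;
  }

private:
  std::mutex mutex_;
  bool spinning_ = false;
  bool stop_requested_ = false;
};
```

In this model a sticky cancel that arrives while nothing is spinning makes the next `spin(3, ...)` run zero callbacks, which is the surprise described earlier in the thread, while `interrupt_if_spinning()` in the same situation returns false and leaves the next spin untouched.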
@wjwwood I would like to start contributing to this, can you help me get started, I have built |
You should use this file for development: https://github.com/ros2/ros2/blob/master/ros2.repos You can use git from there. |
@ivanpauno is this issue not beginner friendly? |
I'm not really sure what we want to do here, so resolving the issue involves first discussing what's desired. I will try to reread the comments again in a few days and post something. I also feel the discussion somewhat overlaps with how |
It seems much of the problem is the need to perform an action before (or after) some other action, which requires additional machinery to coordinate in a multithreaded environment. And there's not a lot of guidance about how best to go about that.
auto execution_token = executor.create_token();
std::thread t([&](){
// will spin until token is cancelled
// spinning already cancelled token just returns
executor.spin(execution_token);
});
execution_token.cancel();
t.join(); where behaviour is the same independent of execution order. Perhaps |
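A rough standard-library model of the token idea, to show why the ordering stops mattering. `ExecutionToken` and the free function `spin` are hypothetical and exist only for this sketch; the issue proposes the concept, not this code, and the 1 ms sleep is a stand-in for waiting on work.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical token; once cancelled it stays cancelled.
class ExecutionToken {
public:
  void cancel() { cancelled_.store(true); }
  bool is_cancelled() const { return cancelled_.load(); }

private:
  std::atomic<bool> cancelled_{false};
};

// Toy spin loop: returns how many iterations ran before the token was cancelled.
long spin(const ExecutionToken & token) {
  long iterations = 0;
  while (!token.is_cancelled()) {
    ++iterations;
    std::this_thread::sleep_for(std::chrono::milliseconds(1));  // stand-in for waiting on work
  }
  return iterations;
}

// Spinning an already cancelled token returns immediately with zero iterations.
long spin_already_cancelled_token() {
  ExecutionToken token;
  token.cancel();
  return spin(token);
}

// Cancel races with the start of spin(); join() returns in either order.
bool cancel_from_other_thread() {
  ExecutionToken token;
  std::thread t([&token] { spin(token); });
  token.cancel();
  t.join();
  return token.is_cancelled();
}
```

Because the cancelled state lives in the token and is never reset by spin, it does not matter whether `cancel()` runs before or after the loop starts.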
@rpaaron that's a great example, you can already do that with auto promise = std::promise<void>{};
auto future = promise.get_future();
std::thread t([&](){
// will spin until the future is complete
// spinning an already completed future just returns
executor.spin_until_future_complete(future);
});
promise.set_value();
t.join(); |
Thanks @ivanpauno. Is this approach the current advice, in order to avoid a race condition here?
void Someclass::start() {
executor_thread_ = std::thread([this]() {
executor_.spin();
});
}
Someclass::~Someclass() {
if (executor_thread_.joinable()) {
executor_.cancel();
executor_thread_.join();
}
} which seems like it would be vulnerable to a race, i.e. if the object is destructed before |
Yeah, that code looks racy.
I think that something like the example above would work. |
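As a sketch of how the promise/future pattern could be wrapped in a class with a race-free destructor, here is a standard-library-only version. `SpinWorker` is a hypothetical name, the loop body stands in for executor work, and polling in 1 ms slices is an assumption made here so the loop can never block indefinitely; whether a real executor wakes up promptly when the promise is set is not something this sketch answers.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical worker that owns its thread and stops it in the destructor.
class SpinWorker {
public:
  SpinWorker()
  : stop_future_(stop_promise_.get_future()),
    thread_([this] { run(); }) {}

  ~SpinWorker() {
    // The stop request is sticky: it is seen even if run() has not started yet.
    stop_promise_.set_value();
    thread_.join();
  }

  long iterations() const { return iterations_.load(); }

private:
  void run() {
    // Wait in bounded slices so a stop request is noticed promptly.
    while (stop_future_.wait_for(std::chrono::milliseconds(1)) !=
      std::future_status::ready)
    {
      ++iterations_;  // stand-in for executing pending work
    }
  }

  std::promise<void> stop_promise_;
  std::future<void> stop_future_;
  std::atomic<long> iterations_{0};
  std::thread thread_;  // declared last so it starts after the other members exist
};

// Creates and immediately destroys workers to provoke the start/stop race.
bool create_and_destroy_immediately(int repetitions) {
  for (int i = 0; i < repetitions; ++i) {
    SpinWorker worker;  // destructor runs at the end of each iteration
  }
  return true;
}
```

Unlike a flag that spin resets on entry, the completed future cannot be "un-set", so destroying the object before the thread has reached its loop still terminates.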
@ivanpauno I'm trying to use this pattern for having multiple nodes in their own executors and threads in a test. One node works fine, but when I have multiple nodes template <typename T>
class SingleThreadedNode {
public:
explicit SingleThreadedNode(const std::string name) {
node = std::make_shared<T>(name);
exec_.add_node(node);
thread_ = std::make_unique<std::thread>([&]() {
RCLCPP_INFO(node->get_logger(), "Starting up %s", node->get_name());
exec_.spin_until_future_complete(future_);
});
}
~SingleThreadedNode() {
RCLCPP_INFO(node->get_logger(), "Trying to shutdown %s", node->get_name());
promise_.set_value();
// I don't know why I need this, but without it multiple nodes don't close
exec_.cancel();
thread_->join();
RCLCPP_INFO(node->get_logger(), "Shutdown %s", node->get_name());
}
rclcpp::Node::SharedPtr node;
private:
std::unique_ptr<std::thread> thread_;
rclcpp::executors::SingleThreadedExecutor exec_;
std::promise<void> promise_{};
std::future<void> future_ = promise_.get_future();
};
TEST(UtilsTest, SingleThreadedNodeRaces) {
// Test creating and rapidly destroying multiple nodes. This tests for nasty races.
auto quickNode1 = std::make_shared<SingleThreadedNode<rclcpp::Node>>("test_node1");
auto quickNode2 = std::make_shared<SingleThreadedNode<rclcpp::Node>>("test_node2");
auto quickNode3 = std::make_shared<SingleThreadedNode<rclcpp::Node>>("test_node3");
auto quickNode4 = std::make_shared<SingleThreadedNode<rclcpp::Node>>("test_node4");
} |
Bug report
Required Info:
Steps to reproduce issue
Expected behavior
That the test ends.
Actual behavior
Very frequently, the test hangs on the future destruction, waiting for the thread join(), because the thread is stuck in executor->spin() (in wait_for_work, to be exact).
Additional information
Uncommenting the sleep_for fixes the issue on my machine, so there seems to be a race condition.
In our use case, we have tests that end faster than that time and trigger this issue. We can add the sleep as a workaround, but it's ugly: https://github.com/ros-controls/ros2_control/pull/234/files/833b1661209cc12a9d6f9ef45c87f4072e641aa7#diff-28c9aa0318c48cffb66c294bbf17394b6f9295bb8b9fb83d2cac539699f7e354R130