-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the race condition while calling rcl_shutdown #1353
Fix the race condition while calling rcl_shutdown #1353
Conversation
Regarding the fix for issue #1352, I'd like to hear your thoughts. About issue #1352, there is an issue One thread is calling rclpy::shutdown() rclpy/rclpy/src/rclpy/context.cpp Lines 44 to 53 in 8a2e2d3
Another thread is calling Context::shutdown() rclpy/rclpy/src/rclpy/context.cpp Lines 151 to 165 in 8a2e2d3
rcl_shutdown() will be called twice. And rcl_shutdown() may be called at the same time (Calling rcl_shutdown() isn't protected by rclpy/rclpy/test/test_init_shutdown.py Lines 77 to 83 in 8a2e2d3
My question is whether we still need to ensure that multiple calls to Context::shutdown() throw an exception ? Maybe It is more appropriate to display a warning In this scenario. What do you think ? |
Friendly ping @sloretz ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is correct fix.
rclpy/test/test_init_shutdown.py
Outdated
rclpy.init(context=context) | ||
assert context.ok() | ||
rclpy.shutdown(context=context) | ||
with pytest.raises(RuntimeError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to remove this test?
I think we can keep this test, but expects RCLError
exception here to catch from 2nd rcl_shutdown
(internally RCL_RET_ALREADY_SHUTDOWN
returns in rcl) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comments.
I think we can keep this test, but expects RCLError exception here to catch from 2nd rcl_shutdown (internally RCL_RET_ALREADY_SHUTDOWN returns in rcl) ?
With the current fix, if the context is not in g_contexts, rcl_shutdown will not be called.
According to the issue described in #1352, it is indeed possible for two threads to call rcl_shutdown on the same context. However, one call is from rclpy::shutdown() and the other from Context::shutdown(). In this situation, we do not want to see an exception being thrown.
The test calls Context::shutdown() twice, and it is reasonable to expect an exception on the second call. Therefore, I am considering setting a flag variable in the Context class. If Context::shutdown() has been successfully called, this variable will be set. If Context::shutdown() is called again, it will check if the flag has been set and, if so, will throw an exception CONTEXT_ALREADY_SHUTDOWN (not from RCL). What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Therefore, I am considering setting a flag variable in the Context class. If Context::shutdown() has been successfully called, this variable will be set. If Context::shutdown() is called again, it will check if the flag has been set and, if so, will throw an exception CONTEXT_ALREADY_SHUTDOWN (not from RCL). What do you think?
Please review 0eb7ee5
rcl_ret_t ret = rcl_shutdown(rcl_context_.get()); | ||
if (RCL_RET_OK != ret) { | ||
throw RCLError("failed to shutdown"); | ||
auto iter = std::find(g_contexts.begin(), g_contexts.end(), rcl_context_.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the g_contexts
is the collection of valid context, and this means if it cannot find the context in the g_contexts
during this shutdown call, that is the invalid operation? so we can generate the exception without introducing already_shutdown_
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the g_contexts is the collection of valid context, and this means if it cannot find the context in the g_contexts during this shutdown call, that is the invalid operation?
Yes.
rc_context_
is added to g_contexts in constructor of Context. Before calling rcl_shutdown(), rc_context_
is removed from g_contexts.
so we can generate the exception without introducing already_shutdown_?
already_shutdown_
is only used for multiple calls to Context::shutdown() to throw the exception.
About issue reported by #1352, it describes a possible scenario. There are two threads to call rcl_shutdown for the same context. One call is from rclpy::shutdown() and the other from Context::shutdown(). In this situation, we do not want to get an exception in Context::shutdown().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is exactly why we have std::lock_guard<std::mutex> guard{g_contexts_mutex};
lock escalated to the this function scope? so that when rclpy::shutdown_contexts is under process, we will not find the iterator with this context, so generate the exception. i really do not understand to have already_shutdown_
flag internally...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i really do not understand to have already_shutdown_ flag internally...
Let me first summarize the two scenarios we currently need to address:
- The scenario mentioned in
rcl_shutdown
is not thread-safe when used in bothsignal_handler
andContext.__exit__
. #1352. There are two threads to call rcl_shutdown for the same context. One call is from rclpy::shutdown() and the other from Context::shutdown(). For this scenario, we don't want to get exception. - For one context instance, Context::shutdown() is being called multiple times. At this scenario, we want to throw an exception. (There is an existed test to check if an exception is thrown.) The fix for 1 would prevent 2 from throwing an exception. That's why already_shutdown_ ``` was added to indicate that Context::shutdown() has already been called once on the context instance. Do you have any better suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fujitatomoya Friendly ping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the explanation, i understand what you mean. if we take already_shutdown_
, then we need to reinitialize already_shutdown_
during context initialization? unless, already_shutdown_
will never be False?
IMO, Context::shutdown()
should generate the exception only when rcl_shutdown
fails with valid context. this actually aligns with rclcpp
behavior. of course, if we change like this, we need to adjust the unit tests accordingly.
maybe we can keep this open for more feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the explanation, i understand what you mean. if we take already_shutdown_, then we need to reinitialize already_shutdown_ during context initialization? unless, already_shutdown_ will never be False?
Yes.
IMO, Context::shutdown() should generate the exception only when rcl_shutdown fails with valid context. this actually aligns with rclcpp behavior. of course, if we change like this, we need to adjust the unit tests accordingly.
Yes. The addition of already_shutdown_ is just to pass the existing tests (Ensure consistency with previous behavior.)
maybe we can keep this open for more feedback.
Yes.
@fujitatomoya do you mind doing another round of review here whenever you get a chance? |
@Barry-Xu-2018 is this still draft? are we waiting for someone or something to make if official? |
Oh, I forgot to change the status. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm generally fine with this logic (and it seems to fix some of the errors we've been having in Zenoh). I've left one naming thing to fix, and this needs to be rebased. Once those things are done, I'm happy to approve and run CI on this.
rclpy/src/rclpy/exceptions.hpp
Outdated
@@ -80,6 +80,11 @@ class InvalidHandle : public std::runtime_error | |||
using std::runtime_error::runtime_error; | |||
}; | |||
|
|||
class CONTEXT_ALREADY_SHUTDOWN : public std::runtime_error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class CONTEXT_ALREADY_SHUTDOWN : public std::runtime_error | |
class ContextAlreadyShutdown : public std::runtime_error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I will update codes.
Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Barry Xu <[email protected]>
cfcf8e3
to
b439d70
Compare
i think this still needs to be addressed. with that, i am okay with this PR! |
One context has one |
As far as I can tell, this is correct. There is no method on the So with that, I'm happy with this. I'm going to approve. I'll wait a few hours before merging just to give @fujitatomoya a chance to respond. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Barry-Xu-2018 thanks for the explanation, as you say there is no reinitialize way to do for user application.
@Barry-Xu-2018 @clalancette btw, we can backport this to Jazzy? i think we can, in the |
I think yes. What do you think ? @clalancette |
Address #1352