-
Notifications
You must be signed in to change notification settings - Fork 913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rospy publisher hanging when subscriber drops connection (e.g. because of wifi) #169
Comments
I ran into the same problem. It seems that rospy's use of blocking writes to sockets is indeed the cause. |
@tfoote I'm going to bring this back up -- as hydro tf/tf2 will use pure python publishers and this is going to cause a major change in how tf publishers operate over wifi/unreliable links compared to Groovy and earlier. As can be seen in https://github.com/ros/ros_comm/blob/groovy-devel/clients/rospy/src/rospy/impl/tcpros_base.py#L620, Ken never finished implementing async or retry logic, which means that if you have say, a remote RVIZ connected to a python node publishing tf, you will now have the python node hang if the wifi link drops out. You might recall the problems we had with a certain robot when python nodes started randomly hanging (in that case, it was an /odom topic). |
This is indeed a big problem... is anyone working on this? |
The actually issue has been in ROS since forever. rospy exposes a synchronous blocking publisher API. The problem is that addressing this issue requires asynchronous IO operations. In order to not block the publisher when one subscriber is not able to receive the information (drop-out of wireless, reading the data too slow) each message must be queued per connections and being delivered to the socket from a separate thread (as it is done in roscpp) This can not be done easily since there is no good way to keep the current API for publishing. E.g. signaling error conditions can not be done via exceptions since the error will happen asynchronously. May be the change @mikeferguson mentioned which he states to be new in "hydro tf/tf2" can be addressed within tf itself. @tfoote any ideas on that? |
@mikeferguson All python processes in tf were already using python based publishers. tf has always used native client communications. tf vs tf2 Was there more that you thought would exacerbate this issue? |
From the migration guide (http://wiki.ros.org/hydro/Migration):
I thought that meant tf had originally used C++ publishers in the wrapper. If not, then disregard the tf/tf2 aspects. Regardless though, this bug pretty much makes rospy useless over a wifi link, are we really saying it just can't be fixed? I guess I haven't looked at the upper level APIs, but I've looked at the roscpp queues, and thought we could implement something similar in here. |
I don't think the API creates any constraints - all the documented exceptions could happen immediately when publish() is called, not later when a write fails or something (since a write failing doesn't cause publish() to raise an exception in the caller). |
@mikeferguson The critical difference between tf2 and tf is that tf2's core library no longer uses time and all the timeout are implemented in python. That was where pytf had issues when there were timeouts insdie of the wrapped c++ but roscpp was not updating time because it was not initialized. @dirk-thomas and I have spent quite a while on the whiteboard making sure that we can do this without significantly changing existing behavior. We're closing in on a solution that I think will work. |
@tfoote thanks for the explanation, and thanks for taking a look at this! |
Please take a look at #308 and provide feedback. Would someone try to update tf2 to use this new option and verify that it fixes the describe problem adequately? |
Before closing this after releasing #308 we should collect a list of major publishers to be patched. So far I have
|
implement optional queueing for rospy publications (#169)
all four have been checked. |
Changes since 1.9.30: ^^^^^^^^^^^^^^^^^^^^^^^^ Changelog for package tf ^^^^^^^^^^^^^^^^^^^^^^^^ 1.11.3 (2014-05-07) ------------------- * convert to boost signals2 following `ros/ros_comm#267 <https://github.com/ros/ros_comm/issues/267>`_ Fixes `#23 <https://github.com/ros/geometry/issues/23>`_. Requires `ros/geometry2#61 <https://github.com/ros/geometry_experimental/issues/61>`_ as well. * add rospy publisher queue_size argument `ros/ros_comm#169 <https://github.com/ros/ros_comm/issues/169>`_ * add queue_size to tf publisher `ros/ros_comm#169 <https://github.com/ros/ros_comm/issues/169>`_ * make rostest in CMakeLists optional (`ros/rosdistro#3010 <https://github.com/ros/rosdistro/issues/3010>`_) * Contributors: Lukas Bulwahn, Tully Foote 1.11.2 (2014-02-25) ------------------- * fixing test linking * Contributors: Tully Foote 1.11.1 (2014-02-23) ------------------- 1.11.0 (2014-02-14) ------------------- * TF uses ros::MessageEvent to get connection information * Contributors: Kevin Watts, Tully Foote 1.10.8 (2014-02-01) ------------------- * Port groovy-devel patch to hydro-devel * Added rosconsole as catkin dependency for catkin_package * Add rosconsole as runtime dependency * Contributors: Michael Ferguson, Mirza Shah 1.10.7 (2013-12-27) ------------------- * fix bug in tf::Matrix3x3::getEulerYPR() Fixes a bug in tf::Matrix3x3::getEulerYPR() implementation's handling of gimbal lock cases (when the new x axis aligns with the old +/-z axis). * add test that demonstrated bug in tf::Matrix3x3 tf::Matrix3x3::getEulerYPR() has a bug which returns an incorrect rpy for certain corner case inputs. This test demonstrates that bug. * Fix const correctness of tf::Vector3 rotate() method The method does not modify the class thus should be const. This has already been fixed in Bullet itself. * add automatic tf buffer cleaning on bag loop for python This logic was already implemented for c++ but not for the python module. * Contributors: Acorn Pooley, Timo Rohling, Tully Foote, v4hn 1.10.6 (2013-08-28) ------------------- * switching to wrapper scripts which will provide a deprecation warning for `#3 <https://github.com/ros/geometry/issues/3>`_ * add missing roswtf dependency to really export the plugin (fix `#27 <https://github.com/ros/geometry/issues/27>`_) * Update listener.py Fix the tf listener service exception in rospy. See: http://answers.ros.org/question/10777/service-exception-using-tf-listener-in-rospy/ * Fix MessageFilter race condition If MessageFilter does not explicitly stop its callback timer when it's being destroyed, there is a race condition when that timer is processed in a callback queue run by a different thread. Specifically, maxRateTimerCallback() may be called after messages_mutex_ has been destroyed, causing a unrecoverable error. 1.10.5 (2013-07-19) ------------------- * tf: export dependency on tf2_ros Fixes `#21 <https://github.com/ros/geometry/issues/21>`_ * added run dependency on graphviz closes `#19 <https://github.com/ros/geometry/issues/19>`_ 1.10.4 (2013-07-11) ------------------- * fixing erase syntax * resolving ros/geometry#18 using implementation added in tf2::BufferCore, adding dependency on next version of tf2 for this 1.10.3 (2013-07-09) ------------------- * fixing unittest for new resolve syntax 1.10.2 (2013-07-09) ------------------- * strip leading slashes in resolve, and also any time a method is passed from tf to tf2 assert the leading slash is stripped as well. tf::resolve with two arguments will end up with foo/bar instead of /foo/bar. Fixes ros/geometry2#12 * added two whitespaces to make message_filter compile with c++11 more on this here: http://stackoverflow.com/questions/10329942/error-unable-to-find-string-literal-operator-slashes * using CATKIN_ENABLE_TESTING to optionally configure tests in tf 1.10.1 (2013-07-05) ------------------- * updating dependency requirement to tf2_ros 0.4.3 * removing unused functions removing unused private methods removing ``max_extrapolation_distance_`` removing unused data storage _frameIDs frameIDS_reverse ``frame_authority_`` removing cache_time from tf, passing through method to tf2 buffer_core removing unused variables ``frames_`` and ``frame_mutex_`` and ``interpolating_`` removing unused mutex and transformchanged signaling commenting on deprecation of MAX_EXTRAPOLATION_DISTANCE 1.10.0 (2013-07-05) ------------------- * adding versioned dependency on recent geometry_experimental changes * fixing test dependencies * fixing callbacks for message filters * remove extra invalid comment * dedicated thread logic all implemented * removing commented out code * mostly completed conversion of tf::TransformListener to use tf2 under the hood * lookuptwist working * tf::Transformer converted to use tf2::Buffer under the hood. passing tf_unittest.cpp * making tf exceptions typedefs of tf2 exceptions for compatability * first stage of converting Transformer to Buffer * switching to use tf2's TransformBroadcaster * adding dependency on tf2_ros to start moving over contents * fixing unit tests
implement optional queueing for rospy publications (ros#169)
I'm submitting an issue for this question on ROS answers:
http://answers.ros.org/question/29387/rospy-publisher-hanging-when-subscriber-drops-connection-because-of-wifi-is-this-normal/
Summary:
If a rospy publisher is connected to a subscriber on a different computer and you disconnect the ethernet cable, the publisher hangs after about a minute.
Issue can be replicated with the rospy talker node running on a different computer than the listener node.
The cause might be here:
https://github.com/ros/ros_comm/blob/groovy-devel/clients/rospy/src/rospy/impl/tcpros_base.py#L621
The text was updated successfully, but these errors were encountered: