The problem comes down to a race condition between (A) when it is detected that the server is running and (B) when the topics between the server and client have actually been matched. In the Crystal release the client->wait_for_service() call returns after (A). Hence your first service call races with the event (B). If the call happens before (B) it won't arrive at the server and therefore the getter node won't be able to retrieve the value (ever).
Since the Crystal release, a patch has been merged that makes the wait_for_service() call return true only after (B) has happened. That should avoid the race you are seeing.
Please consider trying to build the latest state from the master branch to check if it resolves your problem.
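To make the race between (A) and (B) concrete, here is a toy model in plain C++. It is only a sketch under stated assumptions: ToyServiceLink and all of its members are hypothetical names invented for illustration, and none of this is rclcpp or Fast-RTPS code. It only shows why a readiness check tied to (A) lets the first request get lost, while a check tied to (B) does not.

```cpp
#include <vector>

// Toy model (not rclcpp or DDS code): the link between one client and one server.
struct ToyServiceLink {
  bool server_discovered = false;  // event (A): the server has been detected
  bool endpoints_matched = false;  // event (B): request/response topics are matched
  std::vector<int> delivered;      // requests that actually reached the server

  // Crystal-like check: true as soon as (A) has happened.
  bool service_visible() const { return server_discovered; }

  // Check as described for the patched behavior: true only once (B) has happened too.
  bool service_ready() const { return server_discovered && endpoints_matched; }

  // In this model, a request sent before (B) is dropped and never retransmitted.
  bool send(int request) {
    if (!endpoints_matched) {
      return false;
    }
    delivered.push_back(request);
    return true;
  }
};
```

In this model, a client that sends as soon as service_visible() is true can have its request dropped, which corresponds to a future that never completes. A client that waits for service_ready() cannot hit that case.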
Bug report
Required Info:
ros2.repos:
-Fast-RTPS
-rclcpp
Steps to reproduce issue
I initially posted this on ROS answers: https://answers.ros.org/question/313378/ros2-service-send-request-does-not-send/
But after digging into it some more I have become fairly convinced this is a bug.
This example uses ros2 services. The server manages a vector of ints. Clients can "register" new values, and the server will append these values to the vector. Clients can also query a position in the vector, and if the position exists, the server will return the value in that position.
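For readers who do not want to open the MWE, the server-side logic described above can be sketched in plain C++ without any ROS dependencies. The class and method names here (ValueStore, register_value, get) are hypothetical and are not taken from the MWE repository. In the real example this state would sit behind two service callbacks.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the server's state; names are illustrative only.
class ValueStore {
public:
  // "Register": append a value and return the position it was stored at.
  std::size_t register_value(int value) {
    values_.push_back(value);
    return values_.size() - 1;
  }

  // "Query": write the value at `pos` to `out` if that position exists.
  // Returns false (and leaves `out` untouched) when the position does not exist yet.
  bool get(std::size_t pos, int & out) const {
    if (pos >= values_.size()) {
      return false;
    }
    out = values_[pos];
    return true;
  }

private:
  std::vector<int> values_;
};
```

A getter client that asks for a position that has not been registered yet receives a negative answer in this sketch and would have to ask again later.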
The issue I have is that when starting with my launch file (ros2 launch ros_service_mwe mwe_service.launch.py), sometimes the client that is waiting for a value at a particular position gets stuck in spin_until_future_complete(). I did some debugging with Wireshark and it looks like the service request never makes it to the wire. This is tricky to troubleshoot as it fails ~1 in 10 times on my particular machine, but the failure rate changes depending on the machine.
The easiest way to detect this condition is to look at the terminal output and wait for an instance where the [getNode] does not print: [INFO] [getNode]: Value of Pos: 7
I can solve this issue by adding a timeout / retry strategy to the spin_until_future_complete() call, but I really don't understand the root cause of the issue and am afraid I will mask an issue that will manifest in the future.
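The timeout / retry workaround mentioned above could look roughly like the following. This is a generic sketch built on std::future, not the code from the MWE: call_with_retry and its parameters are hypothetical names, and send_request stands for whatever issues the request and returns a future. In an rclcpp node the wait would more likely be a spin_until_future_complete() call with its timeout argument, because the future only completes while the node is being spun. As noted above, this works around a dropped request without explaining why it was dropped.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical helper, not part of rclcpp: re-issue a request until a reply
// arrives within `timeout`, or until `max_attempts` requests have been sent.
// `send_request` is any callable returning std::future<Response>.
template <typename Response, typename SendRequest>
bool call_with_retry(
  SendRequest send_request,
  std::chrono::milliseconds timeout,
  int max_attempts,
  Response & out)
{
  assert(max_attempts > 0);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    std::future<Response> reply = send_request();
    if (reply.wait_for(timeout) == std::future_status::ready) {
      out = reply.get();
      return true;
    }
    // Timed out: assume the request was lost and send a fresh one.
  }
  return false;  // No reply within the allowed number of attempts.
}
```

Note that each retry sends a new request, so this pattern is only safe when the request is idempotent. Querying a position is, but registering a value would append it once per request that arrives.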
I started this minimal example with the add_two_ints example, so the print_usage() function still matches that example.
Here is a link to the MWE: https://github.com/borgmanJeremy/ros_service_mwe
Expected behavior
The expected behavior is that spin_until_future_complete() will return once the server is available to handle the callback.
Actual behavior
The actual behavior is that spin_until_future_complete() never returns.
Additional Info:
This might be related to issue #455.