Failed to get trigger guard condition in jump callback #876

Closed
ThomasRFischer opened this issue Nov 18, 2020 · 4 comments · Fixed by #877

@ThomasRFischer

Bug report

Required Info:

  • Operating System: Ubuntu 18.04.5 LTS

Steps to reproduce issue

The issue is intermittent

Expected behavior

It should run without error

Actual behavior

[planner_server-21] [ERROR 1605305996.010625274] [rcl]: Failed to get trigger guard condition in jump callback (/ros2_foxy_build/core/src/ros2/rcl/rcl/src/rcl/timer.c:114)
[planner_server-21]
[planner_server-21] >>> [rcutils|error_handling.c:108] rcutils_set_error_state()
[planner_server-21] This error state is being overwritten:
[planner_server-21]
[planner_server-21] 'guard_condition_handle not from this implementation, at /ros2_foxy_build/core/src/ros2/rmw_cyclonedds/rmw_cyclonedds_cpp/src/rmw_node.cpp:2713, at /ros2_foxy_build/core/src/ros2/rcl/rcl/src/rcl/guard_condition.c:160'
[planner_server-21]
[planner_server-21] with this new error message:
[planner_server-21]
[planner_server-21] 'guard condition implementation is invalid, at /ros2_foxy_build/core/src/ros2/rcl/rcl/src/rcl/guard_condition.c:174'
[planner_server-21]
[planner_server-21] rcutils_reset_error() should be called after error handling to avoid this.
[planner_server-21] <<<
[planner_server-21] [ERROR 1605306298.359010300] [rcl]: Failed to get trigger guard condition in jump callback (/ros2_foxy_build/core/src/ros2/rcl/rcl/src/rcl/timer.c:114)
[ERROR] [planner_server-21]: process has died [pid 12676, exit code -11, cmd '/opt/ros/foxy/nav2_planner/lib/nav2_planner/planner_server --ros-args -r __node:=planner_server --params-file /tmp/tmpp28qz1d4 -r /tf:=tf -r /tf_static:=tf_static'].

Additional information

ROS2 Costmap Planner application

@eboasson
Contributor

I have to ask: are you sure this is actually a bug in rmw_cyclonedds_cpp? I'm asking because:

guard_condition_handle not from this implementation

is the input validation in the RMW layer (we're not even talking about the underlying guard condition implementation of Cyclone itself yet). I don't see how there could be an actual RMW implementation mix-up in the test, so I suspect there's something wrong with the rcl code itself.

E.g.,

rcl_ret_t
rcl_timer_fini(rcl_timer_t * timer)
{
  if (!timer || !timer->impl) {
    return RCL_RET_OK;
  }
  // Will return either RCL_RET_OK or RCL_RET_ERROR since the timer is valid.
  rcl_ret_t result = rcl_timer_cancel(timer);
  rcl_allocator_t allocator = timer->impl->allocator;
  rcl_ret_t fail_ret = rcl_guard_condition_fini(&(timer->impl->guard_condition));
  if (RCL_RET_OK != fail_ret) {
    RCL_SET_ERROR_MSG("Failure to fini guard condition");
  }
  if (RCL_ROS_TIME == timer->impl->clock->type) {
    fail_ret = rcl_clock_remove_jump_callback(timer->impl->clock, _rcl_timer_time_jump, timer);
    if (RCL_RET_OK != fail_ret) {
      RCUTILS_LOG_ERROR_NAMED(ROS_PACKAGE_NAME, "Failed to remove timer jump callback");
    }
  }
  allocator.deallocate(timer->impl, allocator.state);
  timer->impl = NULL;
  return result;
}

apparently destroys the guard condition while the callback function for handling time jumps is still installed, so prima facie, a time jump occurring during rcl_timer_fini could lead to this error.

I am not familiar with the rcl code base nor with anything built on top of it and I may be wildly off base, but that's where I'd start digging.
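The ordering concern described above can be illustrated with a small standalone sketch. It does not use rcl: every type and function name in it (`mock_clock_t`, `mock_timer_t`, `fini_guard_first`, `fini_callback_first`, and so on) is a hypothetical stand-in, and the "time jump" is injected by hand between the two teardown steps to mimic a jump arriving during `rcl_timer_fini`. Whether #877 reorders the teardown in exactly this way is not stated in this thread; this is only one plausible shape of the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the rcl types; none of these names exist in rcl. */
typedef struct mock_timer mock_timer_t;
typedef void (*jump_cb_t)(mock_timer_t *);

typedef struct {
  jump_cb_t cb;       /* installed jump callback, or NULL if none */
  mock_timer_t * arg; /* argument passed to the callback */
} mock_clock_t;

struct mock_timer {
  mock_clock_t * clock;
  bool guard_valid; /* models whether the guard condition is still usable */
  int errors;       /* counts callbacks that ran against a dead guard condition */
};

/* Models the jump callback: it needs a live guard condition to trigger. */
static void on_time_jump(mock_timer_t * timer)
{
  if (!timer->guard_valid) {
    timer->errors++; /* analogous to "Failed to get trigger guard condition" */
  }
}

/* Models the clock reporting a time jump to whoever is registered. */
static void mock_clock_jump(mock_clock_t * clock)
{
  if (clock->cb != NULL) {
    clock->cb(clock->arg);
  }
}

static void mock_timer_init(mock_timer_t * timer, mock_clock_t * clock)
{
  assert(timer != NULL && clock != NULL);
  timer->clock = clock;
  timer->guard_valid = true;
  timer->errors = 0;
  clock->cb = on_time_jump;
  clock->arg = timer;
}

/* Teardown in the order of the quoted code: guard condition first. */
static int fini_guard_first(mock_timer_t * timer)
{
  timer->guard_valid = false;    /* guard condition finalized */
  mock_clock_jump(timer->clock); /* injected jump: callback is still installed */
  timer->clock->cb = NULL;       /* callback removed too late */
  timer->clock->arg = NULL;
  return timer->errors;
}

/* Alternative order: remove the callback before touching the guard condition. */
static int fini_callback_first(mock_timer_t * timer)
{
  timer->clock->cb = NULL;       /* callback removed first */
  timer->clock->arg = NULL;
  mock_clock_jump(timer->clock); /* injected jump: nothing left to call */
  timer->guard_valid = false;    /* guard condition finalized afterwards */
  return timer->errors;
}
```

After `mock_timer_init`, calling `fini_guard_first` lets the injected jump reach a callback whose guard condition is already gone, so the error count becomes nonzero, while `fini_callback_first` leaves it at zero. In real code the jump would come from another thread, so removing the callback first would only help if the clock's callback list is itself protected against concurrent access; that part is not modeled here.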

@clalancette
Contributor

@eboasson I haven't completely verified that that is the issue, but I agree with your overall assessment. I think this ticket belongs in rcl. I'll go ahead and move the ticket there for now.

@clalancette clalancette transferred this issue from ros2/rmw_cyclonedds Dec 16, 2020
@clalancette
Contributor

@ThomasRFischer Is there any chance you could give the code in #877 a whirl? It is obviously a bug, so we'll go forward with it regardless, but additional confirmation that it fixes your problem would be welcome. Thanks.

@ThomasRFischer
Author

ThomasRFischer commented Dec 18, 2020

Thanks for looking into this. I will try to get back to testing next week, Monday or Tuesday, and will report the test results in #877.

@clalancette clalancette self-assigned this Jan 7, 2021