
The following section details the ROS nodes and services that make up KOLT.

KOLT includes the following nodes:

  • yolo_detect: A service that runs the YOLOv2 object detector.
  • yolo_predict: Sends images to the yolo_detect service to be run through the detector.
  • vision_pose: Translates detected bounding boxes to poses of objects.

yolo_detect

The YOLO server is a ROS service that takes in an image, runs YOLOv2 prediction on it, and returns a detection message. For an example of how this service is used, see the yolo_predict documentation below.

Request

  • image ([sensor_msgs/Image])

    The input image to run YOLOv2 over.

Response

  • detection ([vision_msgs/Detection2DArray])

    A detection message with a Detection for each detected bounding box.
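As a rough illustration, a client could call the service like this. This is a minimal sketch: the service type YoloDetect and its import path are assumptions for illustration, not the actual KOLT .srv definition; only the image request field, the detection response field, and the topic name come from this page.

```python
import rospy
from sensor_msgs.msg import Image
# Hypothetical service type for illustration; the actual .srv lives in KOLT.
from kolt.srv import YoloDetect

rospy.init_node("yolo_detect_client")
rospy.wait_for_service("yolo_detect")
detect = rospy.ServiceProxy("yolo_detect", YoloDetect)

# Grab a single frame and run it through the detector.
img = rospy.wait_for_message("/camera/rgb/image_raw", Image)
response = detect(image=img)
for det in response.detection.detections:
    bbox = det.bbox
    rospy.loginfo("box at (%.1f, %.1f), size %.0fx%.0f",
                  bbox.center.x, bbox.center.y, bbox.size_x, bbox.size_y)
```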

Parameters

  • n_gpu (int, default: 1)

    Sets the number of GPUs to use to make the prediction. These must be available to the system.

  • backend (string, default: "full_yolo")

    The backend network structure to use. Available types include:

    • tiny_yolo
    • squeeze_net
    • mobile_net
    • full_yolo
    • inception3

    For a full description of each backend, see backends.

  • input_size (int, default: 416)

    The side length of the square image fed to the neural network, i.e. (input_size, input_size). This value must match the value used when training the network.

  • labels (rosparam, default: ["trafficcone"])

    An array of objects to detect. There can be fewer objects in this list than the network was trained on. For example, if the network was trained on the COCO dataset with 80 object categories we might only want to detect ["person"].

  • max_number_detections (int, default: 10)

    The maximum number of detections the network will report per image. Just as with the labels, this can be lower than what the network was trained on; if it is, the detections with the top n scores will be published.

  • anchors (rosparam, default: [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828])

    An array of anchor box values. Clustering studies on ground-truth labels show that most bounding boxes fall into a small number of width-to-height ratios, so instead of predicting a bounding box directly, YOLOv2 predicts offsets from a predetermined set of boxes with those ratios. These predetermined boxes are the anchor boxes (a sketch of how they are used follows this parameter list). This value needs to match the value used when training the network, and it is highly recommended that this parameter not be changed from the default.

  • weights_path (string, default: $(find kolt)/weights)

    The folder containing the weights files. This includes backend weights and pretrained weights.

  • weight_file (string, default: N/A)

    The name of the pretrained weight file (without a leading /) within the specified weights_path.
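To make the anchor mechanism concrete, here is a minimal sketch of YOLOv2-style box decoding. The sigmoid/exponential decode follows the YOLOv2 paper; the function and variable names are illustrative, not taken from the KOLT source.

```python
import numpy as np

# Default KOLT anchors, stored as (width, height) pairs in grid-cell units.
anchors = np.array([0.57273, 0.677385, 1.87446, 2.06253, 3.33843,
                    5.47434, 7.88282, 3.52778, 9.77052, 9.16828]).reshape(-1, 2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, anchor):
    """Turn a raw network offset (tx, ty, tw, th) predicted in grid cell
    (cx, cy) into a box center and size, still in grid-cell units."""
    bx = cx + sigmoid(tx)        # center x, bounded to stay inside the cell
    by = cy + sigmoid(ty)
    bw = anchor[0] * np.exp(tw)  # width and height scale the anchor box
    bh = anchor[1] * np.exp(th)
    return bx, by, bw, bh
```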

yolo_predict

The yolo_predict node subscribes to an image topic and sends each image off to the yolo_detect service to have YOLOv2 run on it. It is set up this way to allow multiple cameras to share one YOLO detector, so there is a copy of this node for each camera. In outline, the node is a subscribe, call, republish loop, as the sketch below shows.
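This sketch assumes the same hypothetical YoloDetect service type as in the yolo_detect example above; the topic names and the image_topic parameter come from this page.

```python
import rospy
from sensor_msgs.msg import Image
from vision_msgs.msg import Detection2DArray
# Same hypothetical service type as in the yolo_detect example above.
from kolt.srv import YoloDetect

rospy.init_node("yolo_predict")
rospy.wait_for_service("yolo_detect")
detect = rospy.ServiceProxy("yolo_detect", YoloDetect)
pub = rospy.Publisher("/yolo_predict/detected", Detection2DArray, queue_size=1)

def on_image(img_msg):
    # Forward the frame to the shared detector and republish the result
    # on this camera's own detection topic.
    pub.publish(detect(image=img_msg).detection)

rospy.Subscriber(rospy.get_param("~image_topic", "/camera/rgb/image_raw"),
                 Image, on_image)
rospy.spin()
```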


Subscribed Topics

  • /camera/depth_registered/image_raw ([sensor_msgs/Image])

    The depth image registered to the RGB image. Only used when image_type is set to rgbd.

  • /camera/rgb/image_raw ([sensor_msgs/Image])

    The RGB image stream that the detector is run over.

Published Topics

  • /yolo_predict/detected ([vision_msgs/Detection2DArray])

    The detection message straight from the detection server.

  • /yolo_predict/bounding_box_image ([sensor_msgs/Image])

    An image with a bounding box overlaid. Used for debugging the object detector.

Parameters

  • camera_id (string, default: "0")

    The ID of the camera assigned to this prediction node. This exists to support the use of multiple cameras.

  • image_topic (string, default: "/camera/rgb/image_raw")

    The incoming image topic for the prediction node to subscribe to.

  • depth_image_topic (string, default: "/camera/depth_registered/image_raw", optional)

    The incoming depth topic for the prediction node to subscribe to. This is only applicable when image_type is set to rgbd.

  • image_type (string, default: "rgb")

    Either rgb, or rgbd if a depth image is being used.

vision_pose

Once a detection has been made, it is fed to the tracking and filtering step. In this step, detections are correlated with known tracked objects and their positions are estimated using a Kalman filter. The tracking and filtering step is handled by the vision_pose node.


This node waits for new detections to be published and processes them by (a sketch of steps 1-3 follows this list):

  1. Finding the centroid of each detected bounding box.
  2. Using the centroid and the associated RGBD image to get its (x, y, z) position in the camera's frame.
  3. Using a tracker to correlate each newly calculated position with already tracked objects via the Hungarian algorithm.
    • If a new position cannot be correlated with an already tracked object, a new tracked object is created with a unique integer ID.
    • Each tracked object has its own Kalman filter that filters incoming raw poses and produces a predicted position.
  4. Publishing the filtered poses as a pose array, with each pose having an orientation of (0, 0, 0, 0).
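Here is a minimal sketch of steps 1-3. The pinhole-style projection from the horiz_fov/vert_fov parameters and the use of SciPy's Hungarian solver are assumptions for illustration; the node's actual math and tracker internals may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

HORIZ_FOV = np.radians(85.0)  # horiz_fov parameter, degrees -> radians
VERT_FOV = np.radians(54.0)   # vert_fov parameter

def centroid_to_xyz(u, v, depth_image):
    """Project a bounding-box centroid (u, v), in pixels, to an (x, y, z)
    point in the camera frame using the depth at the centroid and the
    camera's fields of view (steps 1 and 2)."""
    h, w = depth_image.shape
    z = float(depth_image[int(v), int(u)])     # depth along the optical axis
    x = z * np.tan(HORIZ_FOV * (u / w - 0.5))  # offset right of image center
    y = z * np.tan(VERT_FOV * (v / h - 0.5))   # offset below image center
    return np.array([x, y, z])

def correlate(new_positions, tracked_positions):
    """Match new positions to tracked objects by minimizing the total
    Euclidean distance between pairs (step 3). Unmatched new positions
    would become new tracked objects with fresh IDs."""
    cost = np.linalg.norm(
        new_positions[:, None, :] - tracked_positions[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))  # (new index, tracked index) pairs
```

Minimizing the total distance across all pairs, rather than greedily matching the nearest track, keeps track identities stable between frames when objects move smoothly.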

Subscribed Topics

  • /yolo_predict/detected ([vision_msgs/Detection2DArray])

    The detections published by the yolo_predict node.

  • /odom ([nav_msgs/Odometry])

    The odometry of the camera.

Published Topics

  • /vision_poses ([geometry_msgs/PoseArray])

    An array of poses of tracked objects.

  • /paths ([visualization_msgs/MarkerArray])

    A marker array for displaying the path of tracked objects.

Parameters

  • camera_type (string, default: "zed")

    Sets the camera type. This is used for image type conversion. Options include:

    • zed: For the ZED camera.
    • rs: For the Intel RealSense D435.

  • camera_frame (string, default: "zed_left_camera_frame")

    The base frame of the RGBD image.

  • detection_topic (string, default: "/yolo_predict/detected")

    The topic on which detections are published.

  • odom_topic (string, default: "/odom")

    The topic on which odometry of the camera frame is published.

  • horiz_fov (double, default: 85.0)

    The horizontal field of view of the camera, in degrees. This can be obtained from your camera's datasheet.

  • vert_fov (double, default: 54.0)

    The vertical field of view of the camera, in degrees. This can be obtained from your camera's datasheet.