Nodes
This section details the ROS nodes and services that make up KOLT.
KOLT includes the following nodes:
- yolo_detect: A service that runs the YOLOv2 object detector.
- yolo_predict: Sends images to the yolo_detect service to be run through the detector.
- vision_pose: Translates detected bounding boxes to poses of objects.
The YOLO server is a ROS service that takes in an image and produces a detection message. It runs the YOLOv2 prediction. For a look at how this works see the yolo_predict documentation.
The service request and response:
- image ([sensor_msgs/Image]): The input image to run YOLOv2 over.
- detection ([vision_msgs/Detection2DArray]): A detection message with a Detection2D for each detected bounding box.
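For illustration, a minimal client might look like the sketch below. The srv type (kolt/YoloDetect here) and the service name are assumptions; only the image request field and detection response field come from the description above.

```python
# A minimal client sketch; the srv type and service name are assumed.
import rospy
from kolt.srv import YoloDetect  # hypothetical srv type for illustration

rospy.init_node("yolo_detect_client")
rospy.wait_for_service("yolo_detect")  # assumed service name
detect = rospy.ServiceProxy("yolo_detect", YoloDetect)

def run_detector(image_msg):
    """Send a sensor_msgs/Image; return the vision_msgs/Detection2DArray."""
    response = detect(image_msg)
    return response.detection
```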
Parameters:
- n_gpu (int, default: 1): Sets the number of GPUs to use to make the prediction. These must be available to the system.
- backend (string, default: "full_yolo"): The backend network structure to use. Available types include:
  - tiny_yolo
  - squeeze_net
  - mobile_net
  - full_yolo
  - inception3
  For a full description of each backend see backends.
- input_size (int, default: 416): The input size of the image to the neural network (input_size, input_size). This value needs to match the value used when training the network.
- labels (rosparam, default: ["trafficcone"]): An array of objects to detect. There can be fewer objects in this list than the network was trained on. For example, if the network was trained on the COCO dataset with 80 object categories, we might only want to detect ["person"].
- max_number_detections (int, default: 10): The maximum number of detections the network will find per image. Just as with labels, this can be smaller than what the network was trained on. If it is smaller, the detections with the top n scores will be published.
- anchors (rosparam, default: [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828]): An array of anchor box values. Clustering studies on ground-truth labels show that most bounding boxes have certain height-width ratios, so instead of directly predicting a bounding box, YOLOv2 predicts offsets from a predetermined set of boxes with particular height and width ratios: the anchor boxes. This needs to match the value used when training the network. It is highly recommended that this parameter not be changed from the default value.
- weights_path (string, default: $(find kolt)/weights): The folder containing the weights files. This includes backend weights and pretrained weights.
- weight_file (string, default: N/A): The pretrained weight file (without a /) within the specified weights_path.
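As a hedged sketch, a node could read these parameters with rospy as below. Whether KOLT uses private (~) or global parameter names is an assumption here; the names and defaults match the list above.

```python
# Reading the yolo_detect parameters; private (~) names are an assumption.
import rospy

rospy.init_node("yolo_detect")

n_gpu = rospy.get_param("~n_gpu", 1)
backend = rospy.get_param("~backend", "full_yolo")
input_size = rospy.get_param("~input_size", 416)
labels = rospy.get_param("~labels", ["trafficcone"])
max_number_detections = rospy.get_param("~max_number_detections", 10)
anchors = rospy.get_param("~anchors",
                          [0.57273, 0.677385, 1.87446, 2.06253, 3.33843,
                           5.47434, 7.88282, 3.52778, 9.77052, 9.16828])
weights_path = rospy.get_param("~weights_path", "")
weight_file = rospy.get_param("~weight_file", "")
```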
The yolo_predict node subscribes to an image topic and sends each image off to the yolo_detect service to have YOLOv2 run on it. It is set up this way to allow multiple cameras to share a single YOLO detector; there is a copy of this node for each camera, as sketched below.
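A sketch of this subscribe-and-forward pattern, reusing the hypothetical kolt/YoloDetect srv type from the earlier example; the class and topic wiring here are illustrative, not KOLT's actual code.

```python
# Subscribe-and-forward pattern: one instance per camera, one shared detector.
import rospy
from sensor_msgs.msg import Image
from vision_msgs.msg import Detection2DArray
from kolt.srv import YoloDetect  # hypothetical srv type, as above

class YoloPredict(object):
    def __init__(self):
        image_topic = rospy.get_param("~image_topic", "/camera/rgb/image_raw")
        rospy.wait_for_service("yolo_detect")
        self.detect = rospy.ServiceProxy("yolo_detect", YoloDetect)
        self.pub = rospy.Publisher("/yolo_predict/detected",
                                   Detection2DArray, queue_size=1)
        rospy.Subscriber(image_topic, Image, self.callback, queue_size=1)

    def callback(self, image_msg):
        # Every frame goes to the shared yolo_detect service; the resulting
        # detections are republished for downstream nodes (e.g. vision_pose).
        response = self.detect(image_msg)
        self.pub.publish(response.detection)

if __name__ == "__main__":
    rospy.init_node("yolo_predict")
    YoloPredict()
    rospy.spin()
```

Running one copy of this node per camera, each with its own camera_id and image_topic, lets a single yolo_detect service serve every camera.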
Subscribed topics:
- /camera/depth_registered/image_raw ([sensor_msgs/Image]): The incoming registered depth image. Only used when image_type is set to rgbd.
- /camera/rgb/image_raw ([sensor_msgs/Image]): The incoming RGB image to run the detector on.
Published topics:
- /yolo_predict/detected ([vision_msgs/Detection2DArray]): The detection message straight from the detection server.
- /yolo_predict/bounding_box_image ([sensor_msgs/Image]): An image with bounding boxes overlaid. Used for debugging the object detector.
Parameters:
- camera_id (string, default: "0"): The ID of the camera assigned to this prediction node. This exists to support multiple cameras.
- image_topic (string, default: "/camera/rgb/image_raw"): The incoming image topic for the prediction node to subscribe to.
- depth_image_topic (string, default: "/camera/depth_registered/image_raw", optional): The incoming depth topic for the prediction node to subscribe to. This is only applicable when image_type is set to rgbd.
- image_type (string, default: "rgb"): Either rgb, or rgbd if using depth.
Once a detection has been made, it is fed to the tracking and filtering step, where detections are correlated to known tracked objects and their positions are estimated using a Kalman filter. The tracking and filtering step is handled by the vision pose node.
This node waits for new detections to be published and processes them by:
- Finding the centroid of each detected bounding box.
- Using the centroid and the associated RGBD image to get its (x, y, z) position in the camera's frame.
- Using a tracker to correlate each new calculated position to already tracked objects using the Hungarian algorithm (see the first sketch after this list).
- If a new position cannot be correlated to an already tracked object, a new tracked object is created with a unique integer ID.
- Each tracked object has its own Kalman filter that filters incoming raw poses and produces a predicted position (see the second sketch after this list).
- The filtered poses are published as a pose array, with each pose having an orientation of (0, 0, 0, 0).
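The projection and association steps might look like the following sketch. The linear pixel-to-angle projection, the 1 m distance gate, and all function names are assumptions for illustration; only the use of the bounding-box centroid, the depth value, the horiz_fov/vert_fov parameters, and the Hungarian algorithm come from this page.

```python
# Sketch of the projection and data-association steps (not KOLT's exact code).
import numpy as np
from scipy.optimize import linear_sum_assignment

def centroid_to_camera_frame(u, v, depth, width, height,
                             horiz_fov=85.0, vert_fov=54.0):
    """Project a bounding-box centroid (u, v) plus its depth reading into
    an (x, y, z) point in the camera frame (x forward, y left, z up)."""
    # Simplified angular model: a real implementation would use the
    # camera intrinsics rather than a linear pixel-to-angle mapping.
    ang_x = np.radians(horiz_fov) * (u / float(width) - 0.5)
    ang_y = np.radians(vert_fov) * (v / float(height) - 0.5)
    return np.array([depth,
                     -depth * np.tan(ang_x),
                     -depth * np.tan(ang_y)])

def associate(tracked_positions, new_positions, max_dist=1.0):
    """Match new detections to tracked objects with the Hungarian
    algorithm, using Euclidean distance as the assignment cost."""
    if len(tracked_positions) == 0 or len(new_positions) == 0:
        return [], list(range(len(new_positions)))
    cost = np.linalg.norm(np.asarray(tracked_positions)[:, None, :]
                          - np.asarray(new_positions)[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    # Reject matches beyond max_dist; those detections become new tracks.
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
    matched = {c for _, c in matches}
    unmatched = [c for c in range(len(new_positions)) if c not in matched]
    return matches, unmatched
```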
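A minimal per-object constant-velocity Kalman filter in plain numpy makes the one-filter-per-tracked-object step concrete. The state layout, time step, and noise values here are assumptions, not KOLT's tuned filter.

```python
# Per-object Kalman filter sketch; dt and noise magnitudes are assumed.
import numpy as np

class TrackedObject(object):
    def __init__(self, object_id, initial_xyz, dt=0.1):
        self.id = object_id
        # State: [x, y, z, vx, vy, vz]; only position is measured.
        self.x = np.hstack([initial_xyz, np.zeros(3)])
        self.P = np.eye(6)
        self.F = np.eye(6)
        self.F[:3, 3:] = dt * np.eye(3)          # constant-velocity motion
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])
        self.Q = 0.01 * np.eye(6)                # process noise (assumed)
        self.R = 0.1 * np.eye(3)                 # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]                        # predicted position

    def update(self, measured_xyz):
        # Standard Kalman update with a raw (x, y, z) measurement.
        y = measured_xyz - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]                        # filtered position
```

On each detection cycle, predict() would run for every track, associate() would match new detections to the predicted positions, and update() would feed each matched raw pose to its track's filter.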
Subscribed topics:
- /yolo_predict/detected ([vision_msgs/Detection2DArray]): The detections published by the yolo_predict node.
- /odom ([nav_msgs/Odometry]): The odometry of the camera.
Published topics:
- /vision_poses ([geometry_msgs/PoseArray]): An array of poses of tracked objects.
- /paths ([visualization_msgs/MarkerArray]): A marker array for displaying the paths of tracked objects.
Parameters:
- camera_type (string, default: "zed"): Sets the camera type. This is used for image type conversion. Options include:
  - zed: For the ZED camera.
  - rs: For the Intel RealSense D435.
- camera_frame (string, default: "zed_left_camera_frame"): The base frame of the RGBD image.
- detection_topic (string, default: "/yolo_predict/detected"): The topic on which detections are published.
- odom_topic (string, default: "/odom"): The topic on which odometry of the camera frame is published.
- horiz_fov (double, default: 85.0): The horizontal field of view of the camera, in degrees. This can be obtained from your camera's datasheet.
- vert_fov (double, default: 54.0): The vertical field of view of the camera, in degrees. This can be obtained from your camera's datasheet.