Vehicle and pedestrian detection and tracking play a vital role in autonomous driving. In previous project, I implemented a vehicle detection and tracking pipeline based on traditional computer vision techniques. This project is to explore application of RetinaNet on the vehicle detection taask.
The training and evaluation of this project is based on the Udacity annotated driving dataset. It includes driving in Mountain View California and neighboring cities during daylight conditions. I combined the two datasets and only retained bounding box annotations for car, truck, and pedestrian. The combined dataset
Here's an overview of the dataset
In this project, I'm interested in the detection accuracy of the models as well as their inference speed. The goal is to find a model that can detect vehicles with good accuracy in real time.
The accuracy of models is primarily evaluated by mean Average Precision (mAP) and mean Average Recall (mAR) at IOU of 0.5.
The models being benchmarked are
- sliding window method based on HOG feature and linear classifier
- RetinaNet with ResNet50 backbone, pre-trained on COCO
- RetinaNet with ResNet18 backbone, trained on driving dataset
- RetinaNet with MobileNet backbone, trained on driving dataset
Model | AP50 (car) | AP50 (truck) | AP50 (pedestrian) | # of parameters | CPU inference (s/frame) |
GPU inference (s/frame) |
---|---|---|---|---|---|---|
HOG | 24.6 | - | - | - | 6.9 | |
RetinaNet-ResNet50 pre-trained on COCO |
71.8 | 53.4 | 32.4 | 37.4 | 2.0 | 0.14 |
RetinaNet-ResNet18-64 | 66.7 | 54.1 | 27.2 | 12.0 | 1.4 | 0.1 |
RetinaNet-ResNet18-48 | 66.1 | 51.0 | 18.8 | 7.0 | 1.2 | 0.09 |
RetinaNet-ResNet18-32 | 71.9 | 55.2 | 34.7 | 3.4 | 0.97 | 0.09 |
RetinaNet-MobileNet-1 | 73.3 | 54.6 | 42.4 | 4.4 | 1.1 | 0.1 |
RetinaNet-MobileNet-0.75 | 67.6 | 57.2 | 29.6 | 2.8 | 1.0 | 0.07 |
RetinaNet-MobileNet-0.5 | 65.3 | 55.2 | 36.3 | 1.6 | 0.77 | 0.055 |
RetinaNet-MobileNet-0.25 | 67.6 | 54.1 | 38.2 | 0.84 | 0.54 | 0.05 |
Here's the result of running RetinaNet-ResNet50-COCO on a dash camera video
Here's the result of running RetinaNet-MobileNet-0.25 on a dash camera video
The following graph shows the structure of feature pyramid net (FPN) built on top of ResNet backbone.
The following graph showes the structure of regress and classification subnet.