

This repository covers detection and classification tasks in computer vision. Semantic segmentation will be added later.

  • For classification, I reproduced LeNet5, VGG, AlexNet, ResNet (ResNeXt), GoogLeNet, MobileNet and ShuffleNet. EfficientNet and others will follow.

  • For object detection, I reproduced RetinaNet and SSD. I split the code into modules (backbone, neck, head, loss, etc.), which makes it easier to modify and extend. Other object detection algorithms (such as CenterNet, FCOS, the YOLO series and Faster RCNN) will be added later.

  • For semantic segmentation, I'm going to reproduce FCN, Mask RCNN, DeepLab, UNet later.

  • Detailed explanations have been published on CSDN and Zhihu (a Chinese Q&A site similar to Quora).

In this project, you need to create the checkpoints (saved models), log, results and tensorboard (loss visualization) folders.
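
The folders can also be created from a short script. This is a minimal sketch, not part of the repository: the function name `create_project_dirs` is hypothetical, and the folder names follow the Folder Organization tree (`checkpoints`, `log`, `tensorboard`) plus `results`.

```python
from pathlib import Path

# Folder names assumed from the Folder Organization tree, plus "results".
REQUIRED_DIRS = ("checkpoints", "log", "results", "tensorboard")


def create_project_dirs(root):
    """Create the working folders under `root` and return their paths."""
    root = Path(root)
    created = []
    for name in REQUIRED_DIRS:
        path = root / name
        # exist_ok=True makes the call safe to repeat on an existing checkout.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_project_dirs("/data/PycharmProject/Simple-CV-Pytorch-master")` would match the project path used in this README. Adjust the root to your own checkout.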

Still to be added:

1.object detection
(CenterNet, FCOS, YOLO series, Faster RCNN)

2.semantic segmentation
(FCN, Mask RCNN, DeepLab, UNet)

Compiling environment

Install the requirements with pip (if needed, you can put requirements.txt in venv/Scripts/).

pip install -r requirements.txt
python == 3.8.5

torch == 1.11.0+cu113

torchaudio == 0.11.0+cu113

torchvision == 0.12.0+cu113

pycocotools == 2.0.4




opencv-python  (maybe you want to use skimage or PIL etc...)
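
To confirm that an environment matches the pins above, the installed versions can be compared against them. This is a hedged sketch using only the standard library (`importlib.metadata`, available from Python 3.8). The helper names `check_pins` and `mismatches` are hypothetical and are not part of the repository.

```python
from importlib import metadata

# Pins copied from the list above; local build tags such as "+cu113" are kept.
PINNED = {
    "torch": "1.11.0+cu113",
    "torchaudio": "0.11.0+cu113",
    "torchvision": "0.12.0+cu113",
    "pycocotools": "2.0.4",
}


def check_pins(pins):
    """Return {package: (pinned, installed)}; installed is None if missing."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


def mismatches(report):
    """Keep only the packages whose installed version differs from the pin."""
    return {name: pair for name, pair in report.items() if pair[0] != pair[1]}
```

`mismatches(check_pins(PINNED))` returns an empty dict when every pinned package is installed at exactly the listed version.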




Dataset Path

Please see FolderOrganization.txt for more details.

Folder Organization

I use Ubuntu 20.04.

project path: /data/PycharmProject

Simple-CV-master path: /data/PycharmProject/Simple-CV-Pytorch-master
|----checkpoints(resnet50-19c8e357.pth or retinanet_resnet50_coco.pth)
|                                 |----retinanet_voc.yaml
|                                 |----ssd300_coco.yaml
|                                 |----ssd300_voc.yaml
|----data----|----classification----|----CIAR_labels.txt (empty, because I just use torchvision.datasets.CIFAR10)
|            |                      |----ImageNet_labels.txt (empty, because I just use torchvision.datasets.ImageFolder)
|            |----detection----|----RetinaNet----|
|                              |                 |
|                              |----SSD----|
|                                          |
|                                     |----automobile.png
|              |----classification----|----crash_helmet.png
|              |                      |----photocopier.png
|              |                      |----sunflower.png
|              |----detection----|----000001.jpg
|              |                 |----000001.xml
|              |                 |----000002.jpg
|              |                 |----000002.xml
|              |                 |----000003.jpg
|              |                 |----000003.xml
|                           |----darknet19.png
|                           |----darknet53.png
|                           |----darknettiny.png                   
|                           |----googlenet.png
|                           |----lenet5.png
|                           |----mobilenet_v2.png
|                           |----mobilenet_v3_large.png
|                           |----mobilenet_v3_small.png
|                           |----resnet.png
|                           |----resnext.png
|                           |----retinanet.png
|                           |----shufflenet_v2.png
|                           |----ssd.png
|                           |----vgg.png
|----log(XXX[ detection or classification ]_XXX[  train or test or eval ].info.log)
|              |----classification----|----utils----|
|              |                      |             |
|              |                      |
|              |                      |
|              |                      |
|              |                      | resenext)
|              |                      |
|              |                      |
|              |                      |
|              |                      |
|              |----detection----|----RetinaNet----|----anchor----|
|              |                 |                 |              |
|              |                 |                 |----backbones----|
|              |                 |                 |                 |
|              |                 |                 |                 |
|              |                 |                 |----head----|
|              |                 |                 |            |
|              |                 |                 |
|              |                 |                 |----loss----|
|              |                 |                 |            |
|              |                 |                 |
|              |                 |                 |----neck----|
|----models----|                 |                 |            |
|              |                 |                 |            |----FPN.txt
|              |                 |                 |----utils----|
|              |                 |                 |             |
|              |                 |                 |             |
|              |                 |                 |             |
|              |                 |                 |             |
|              |                 |                 |
|              |                 |
|              |                 |----SSD----|----anchor----|
|              |                             |----backbone----|
|              |                             |----box_head----|
|              |                             |                |
|              |                             |                |
|              |                             |----utils----|
|              |                             |             |
|              |                             |             |
|              |                             |             |
|              |                             |
|                                 |                 |
|                                 |                 |
|                                 |----SSD----|
|                                             |
|                                             |
|               |           |----VOC----|----annot_cache----|----XXX_pr.pkl
|               |           |           |----det----|----det_test_xxx.txt(eg: car AP)
|               |           |           |----annots.pkl
|               |           |           |----detections.pkl
|               |           |           |----visualize.txt
|               |           |----XX(name: 000478)_XX(coco or voc).jpg
|               |----RetinaNet----|----COCO----|----coco_bbox_results.json
|               |                 |----VOC----|----annot_cache----|----XXX_pr.pkl
|                                 |           |----det----|----det_test_xxx.txt(eg: car)
|                                 |           |----annots.pkl
|                                 |           |----detections.pkl
|                                 |----XX(name:000478)_XX(coco or voc).jpg            
|----tensorboard(Loss Visualization)
|             |                      |
|             |                      |
|             |----detection----|----RetinaNet----|
|                               |                 |
|                               |                 |
|                               |                 |
|                               |----SSD----|
|                                           |
|                                           |
|                                           |
|             |
|             |
|             |
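
The log entry in the tree names its files `XXX_XXX.info.log`, where the first part is the task and the second is the mode. A small sketch of how such a path could be built is shown here. The helper `log_file_path` is hypothetical, and the repository may assemble the name differently.

```python
from pathlib import Path

TASKS = ("classification", "detection")
MODES = ("train", "test", "eval")


def log_file_path(log_dir, task, mode):
    """Build <log_dir>/<task>_<mode>.info.log after validating both parts."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}, got {task!r}")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    return Path(log_dir) / f"{task}_{mode}.info.log"
```

For example, `log_file_path("log", "detection", "train")` gives `log/detection_train.info.log`.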

Run the program

Because I had little time for tuning, every model here is less accurate than the result reported in its paper. If you want to use the models in this project, you will need to re-tune the hyperparameters to recover the accuracy.


  • Reproduce network architectures





 I added nn.BatchNorm2d() because the accuracy was poor without it.
 basenet: lenet5 (image size: 32 * 32 * 3)
 dataset: cifar
 batch_size: 32
 optim: SGD
 lr: 0.01
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: MultiStepLR
 milestones: [15, 20, 30]
 gamma: 0.1
 epoch: 30

| epochs | time | avg top1 acc (%) | avg top5 acc (%) |
|---|---|---|---|
| 30 | 0h11m44s | 62.21 | 95.97 |
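
To see what the MultiStepLR settings above do to the learning rate, here is a plain-Python sketch of the closed-form rule (the base rate multiplied by gamma once for every milestone already reached). The function name `multistep_lr` is hypothetical. In training, `torch.optim.lr_scheduler.MultiStepLR` does this for you.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate at `epoch`: base_lr * gamma ** (milestones reached so far)."""
    # bisect_right counts the milestones that are <= epoch.
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


# Values from the configuration above, with epochs counted from 0.
for epoch in (0, 14, 15, 20, 29):
    print(epoch, multistep_lr(0.01, [15, 20, 30], 0.1, epoch))
```

With 30 epochs counted from 0, the last milestone (30) is never reached, so only two decays take effect: one at epoch 15 and one at epoch 20.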



 I added nn.BatchNorm2d() because the accuracy was poor without it.
 basenet: AlexNet (image size: 224 * 224 * 3)
 dataset: cifar
 batch_size: 32
 optim: SGD
 gamma: 0.1
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: MultiStepLR
 milestones: [15, 20, 30]
 lr: 0.01
 epoch: 30

| epochs | time | avg top1 acc (%) | avg top5 acc (%) |
|---|---|---|---|
| 30 | 0h22m44s | 86.27 | 99.00 |
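
The top1 and top5 columns report how often the true label is among the 1 or 5 highest-scoring classes. The following is a minimal sketch of that metric on plain Python lists. The function name `topk_accuracy` is hypothetical, and the repository's own evaluation code presumably works on tensors instead.

```python
def topk_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k best-scoring classes."""
    if not scores:
        raise ValueError("scores must not be empty")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by score, highest first, truncated to k entries.
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top
    return 100.0 * hits / len(scores)


# Three samples, three classes (toy numbers, not results from this project).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(topk_accuracy(scores, labels, 1), topk_accuracy(scores, labels, 2))
```

In the toy data, only the first sample is correct at k=1, while the first two are correct at k=2.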



 I added nn.BatchNorm2d() and transfer learning because the accuracy was poor without them.
 basenet: vgg16 (image size: 224 * 224 * 3)
 dataset: cifar
 batch_size: 32
 optim: SGD
 lr: 0.01
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: MultiStepLR
 milestones: [15, 20, 30]
 gamma: 0.1
 epoch: 30

| epochs | time | avg top1 acc (%) | avg top5 acc (%) |
|---|---|---|---|
| 30 | 1h23m43s | 76.56 | 96.44 |



 basenet: resnet18 
 dataset: ImageNet
 batch_size: 32
 optim: SGD
 lr: 0.001
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: MultiStepLR
 milestones: [15, 20, 30]
 gamma: 0.1
 epoch: 30

| epoch no. | time/epoch | top1 acc (%) | top5 acc (%) |
|---|---|---|---|
| 5 | 3h49min35s | 50.21 | 75.59 |

5).ResNeXt(models/classification/ include: resnext50_32x4d,resnext101_32x8d)[5]


 basenet: resnext50_32x4d
 dataset: ImageNet
 batch_size: 32
 optim: SGD
 lr: 0.001
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: ReduceLROnPlateau
 patience: 2
 epoch: 30
 pretrained: True

| epoch no. | time/epoch | top1 acc (%) | top5 acc (%) |
|---|---|---|---|
| 7 | 4h5min16s | 72.28 | 91.56 |
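
The ReduceLROnPlateau setting with patience 2 lowers the learning rate once the monitored value has failed to improve for more than two consecutive epochs. The class below is a simplified sketch of that rule, not PyTorch's implementation. It assumes a validation loss is monitored (lower is better), uses factor 0.1 (PyTorch's default, which the configuration above does not state), and ignores the improvement threshold and cooldown options.

```python
class PlateauScheduler:
    """Simplified reduce-on-plateau rule for a value that should decrease."""

    def __init__(self, lr, patience=2, factor=0.1):
        self.lr = lr
        self.patience = patience
        self.factor = factor  # assumed; not given in the configuration above
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss and return the (possibly reduced) lr."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # Reduce only after MORE than `patience` epochs without improvement.
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr
```

Feeding the losses 1.0, 0.9, 0.95, 0.93, 0.92 into a scheduler created with `lr=0.001` leaves the rate unchanged for four epochs and cuts it to 0.0001 on the fifth, because 0.9 was never beaten three times in a row.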



 basenet: GoogLeNet 
 dataset: ImageNet
 batch_size: 32
 optim: SGD
 lr: 0.01
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: ReduceLROnPlateau
 patience: 2
 epoch: 30
 pretrained: True

| epoch no. | time/epoch | top1 acc (%) | top5 acc (%) |
|---|---|---|---|
| 5 | 3h54min31s | 42.70 | 69.34 |

7).MobileNet(models/classification/ or



 basenet: MobileNet_v2
 dataset: ImageNet
 batch_size: 32
 optim: SGD
 lr: 0.001
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: ReduceLROnPlateau
 patience: 2
 epoch: 30
 pretrained: True

| epoch no. | time/epoch | top1 acc (%) | top5 acc (%) |
|---|---|---|---|
| 5 | 3h58min3s | 66.90 | 88.19 |




 basenet: MobileNet_v3_large
 dataset: ImageNet
 batch_size: 32
 optim: SGD
 lr: 0.001
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: ReduceLROnPlateau
 patience: 2
 epoch: 30
 pretrained: True

| epoch no. | time/epoch | top1 acc (%) | top5 acc (%) |
|---|---|---|---|
| 5 | 3h58min13s | 71.15 | 90.32 |


 basenet: MobileNet_v3_small
 dataset: ImageNet
 batch_size: 32
 optim: SGD
 lr: 0.001
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: ReduceLROnPlateau
 patience: 2
 epoch: 30
 pretrained: True

| epoch no. | time/epoch | top1 acc (%) | top5 acc (%) |
|---|---|---|---|
| 5 | 3h54min38s | 68.89 | 88.92 |

8).ShuffleNet v2(models/classification/[9]


 basenet: ShuffleNet_v2_x0_5
 dataset: ImageNet
 batch_size: 32
 optim: SGD
 lr: 0.001
 momentum: 0.9
 weight_decay: 1e-4
 scheduler: ReduceLROnPlateau
 patience: 2
 epoch: 30
 pretrained: True

| epoch no. | time/epoch | top1 acc (%) | top5 acc (%) |
|---|---|---|---|
| 5 | 3h52min0s | 55.61 | 78.84 |


conda activate base
python /data/PycharmProject/Simple-CV-Pytorch-master/tools/classification/ or or

2.object detection

Although all models use the COCO and VOC datasets, each model processes them differently, so each has its own data (dataloader), train, test and eval code.

  • Reproduce network architectures

    • CenterNet
    • FCOS
    • YOLO series
    • Faster RCNN




 Network: ssd
 backbone: vgg+add_extras
 loss: cls(cross_entropy_loss)+reg(smooth_l1_loss)
 dataset: voc
 batch_size: 16
 optim: SGD
 lr: 0.001
 scheduler: adjust_learning_rate
 epoch: 115

| epochs | batch norm | mean AP (%) | download (Baidu Yun) | key |
|---|---|---|---|---|
| 115 | False | 75.4 | Link | xwaw |
| 115 | True | 76.2 | Link | 2xzk |
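
The SSD loss listed above combines a classification term (cross entropy) and a box regression term (smooth L1). The sketch below shows both terms on plain Python numbers. It is an illustration only: the function names are hypothetical, `beta=1.0` is an assumed setting, and the matching of anchors, hard negative mining and normalization used in a full SSD loss are left out.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss: quadratic for |diff| < beta, linear otherwise."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta
        else:
            total += diff - 0.5 * beta
    return total / len(pred)


def cross_entropy(logits, label):
    """Cross entropy of one sample from raw class scores (stable log-sum-exp)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]


# Toy values: one box offset pair and one two-class prediction.
reg = smooth_l1([0.5, 3.0], [0.0, 1.0])
cls = cross_entropy([0.0, 0.0], 0)
print(cls + reg)
```

The small error (0.5) is penalized quadratically and the large one (2.0) linearly, which keeps outlier boxes from dominating the regression term.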


 Network: ssd
 backbone: vgg+add_extras
 loss: cls(cross_entropy_loss)+reg(smooth_l1_loss)
 dataset: coco
 batch_size: 16
 optim: SGD
 lr: 0.001
 scheduler: adjust_learning_rate
 epoch: 55

| epochs | batch norm | AP (%) at IoU=0.5 | download (Baidu Yun) | key |
|---|---|---|---|---|
| 55 | False | 38.0 | Link | j6wn |
| 55 | True | 37.7 | Link | 7i64 |
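
The IoU=0.5 in the table header means a detection counts as correct when its box overlaps a ground-truth box of the same class with an intersection over union of at least 0.5. Here is a minimal sketch of that overlap test, assuming boxes in `(x_min, y_min, x_max, y_max)` form. The function names are hypothetical, and the full AP computation (score ranking, precision/recall curve) is not shown.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap, clamped at 0 for disjoint boxes.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(detection, ground_truth, threshold=0.5):
    """True when the detection overlaps the ground truth enough to count."""
    return box_iou(detection, ground_truth) >= threshold
```

For example, `(0, 0, 2, 2)` against `(1, 1, 3, 3)` overlaps in a 1 x 1 square and has IoU 1/7, so it would not count as a match at the 0.5 threshold.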





Do not show







 Network: RetinaNet
 backbone: ResNet50
 neck: FPN
 loss: Focal Loss
 dataset: voc
 batch_size: 4
 optim: Adam
 lr: 0.0001
 scheduler: WarmupCosineSchedule
 epoch: 80

| epochs | AP (%) | download (Baidu Yun) | key |
|---|---|---|---|
| 80 | 70.1 | Link | dww8 |
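
The WarmupCosineSchedule named above is not described further here. A common form of such a schedule raises the learning rate linearly during warmup and then lets it fall along half a cosine wave. The sketch below follows that common form as an assumption: the function name, the warmup length and the decay to zero are all illustrative and may differ from the project's scheduler.

```python
import math


def warmup_cosine_lr(base_lr, step, warmup_steps, total_steps):
    """Linear warmup to base_lr, then cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # min() keeps the rate at 0 if training runs past total_steps.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# lr from the configuration above; 10 warmup steps out of 110 are assumed.
for step in (0, 5, 10, 60, 110):
    print(step, warmup_cosine_lr(0.0001, step, 10, 110))
```

The rate reaches its peak exactly when warmup ends and is back to half of the peak midway through the cosine phase.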


 Network: RetinaNet
 backbone: ResNet50
 neck: FPN
 loss: Focal Loss
 dataset: coco
 batch_size: 4
 optim: Adam
 lr: 0.0001
 scheduler: ReduceLROnPlateau
 patience: 3
 epoch: 30
 pretrained: True

| epochs | AP (%) | download (Baidu Yun) | key |
|---|---|---|---|
| 30 | 29.3 | Link | 5vak |
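
Focal Loss [11], used by RetinaNet above, scales the cross entropy of each prediction by `(1 - p_t) ** gamma`, so easy examples contribute very little. The following is a minimal sketch for a single binary prediction. The function name is hypothetical, and `alpha=0.25` and `gamma=2.0` are the defaults from the paper, not values confirmed for this project.

```python
import math


def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Binary focal loss: -alpha_t * (1 - p_t) ** gamma * log(p_t).

    `p` is the predicted probability of the positive class, strictly
    between 0 and 1; `target` is 1 for positive and 0 for negative.
    """
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (p = 0.9) versus a hard positive (p = 0.1).
print(focal_loss(0.9, 1), focal_loss(0.1, 1))
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross entropy, which is a quick way to sanity-check an implementation.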



conda activate base
python /data/PycharmProject/Simple-CV-Pytorch-master/tools/detection/XXX(eg:SSD or RetinaNet)/ or or or

3.semantic segmentation

  • Reproduce network architectures
    • FCN
    • DeepLab
    • U-Net


[1] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.

[2] Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[J]. Advances in neural information processing systems, 2012, 25.

[3] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.

[4] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.

[5] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1492-1500.

[6] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1-9.

[7] Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 4510-4520.

[8] Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 1314-1324.

[9] Ma N, Zhang X, Zheng H T, et al. Shufflenet v2: Practical guidelines for efficient cnn architecture design[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 116-131.

[10] Liu W, Anguelov D, Erhan D, et al. Ssd: Single shot multibox detector[C]//European conference on computer vision. Springer, Cham, 2016: 21-37.

[11] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//Proceedings of the IEEE international conference on computer vision. 2017: 2980-2988.