😋 Object detection

Check the CHANGELOG file for an overview of the latest changes! 😋

Important Note: The EAST training procedure is not yet implemented for the current post-processing pipeline, which is inspired by this repo. The model can still be used with the available pretrained weights! 😄

Project structure

├── architectures            : utilities for model architectures
│   ├── layers               : custom layer implementations
│   ├── transformers         : transformer architecture implementations
│   ├── common_blocks.py     : defines common blocks (e.g., Conv + BN + ReLU)
│   ├── east_arch.py         : EAST architecture
│   ├── generation_utils.py  : utilities for text and sequence generation
│   ├── hparams.py           : hyperparameter management
│   ├── simple_models.py     : defines classical models such as CNN / RNN / MLP and siamese
│   └── yolo_arch.py         : YOLOv2 architecture
├── custom_train_objects     : custom objects used in training / testing
│   ├── callbacks            : callbacks loading and implementations
│   ├── generators           : custom data generators
│   ├── losses               : loss functions
│   │   └── yolo_loss.py     : YOLO specific loss
│   ├── metrics              : metrics loading and implementations
│   ├── optimizers           : optimizer loading
│   ├── checkpoint_manager.py: handles model checkpoint management (inspired by `tf.train.CheckpointManager`)
│   └── history.py           : main History class to compute training statistics / track config
├── loggers                  : logging utilities for tracking experiment progress
├── models                   : main directory for model classes
│   ├── detection            : detector implementations
│   │   ├── base_detector.py : abstract base class for all detectors
│   │   ├── east.py          : EAST implementation for text detection
│   │   └── yolo.py          : YOLOv2 implementation for general object detection
│   ├── interfaces           : directories for interface classes
│   └── weights_converter.py : utilities to convert weights between different models
├── tests                    : unit and integration tests for model validation
├── utils                    : utility functions for data processing and visualization
├── detection.ipynb          : notebook demonstrating detection features
├── example_yolo.ipynb       : specific example notebook for YOLO model
├── LICENCE                  : project license file
├── README.md                : this file
└── requirements.txt         : required packages

Check the main project for more information about the unextended modules / structure / main classes.

Available features

Detection (module models.detection):

| Feature | Function / class | Description |
| --- | --- | --- |
| detection | `detect` | Detect objects on images/videos with multiple saving options (save cropped boxes, detected images, video frames, etc.) |
| stream | `stream` | Perform real-time detection using your camera (also allows saving frames) |

The detection notebook provides a concrete demonstration of these functions 😄

Available models

Model architectures

Available architectures:

detection:
- YOLOv2 : You Only Look Once (version 2)
- EAST : Efficient and Accurate Scene Text detector

Model weights

| Classes | Dataset | Architecture | Trainer | Weights |
| --- | --- | --- | --- | --- |
| 80 classes | COCO | `YOLOv2` | YOLOv2's author | link |

The pretrained backend for YOLO can be downloaded at this link.

The pretrained version of EAST can be downloaded from this project. It should be stored in `pretrained_models/pretrained_weights/east_vgg16.pth` (`torch` is required to transfer the weights: `pip install torch`).
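As a quick sanity check before using EAST, the two requirements above (weights file in place, `torch` importable) can be verified with a few lines of standard-library Python. This is only a sketch: `check_east_prerequisites` and `EAST_WEIGHTS` are hypothetical names introduced here, not part of this project.

```python
import importlib.util
from pathlib import Path

# Path taken from the paragraph above, relative to the repository root.
EAST_WEIGHTS = Path("pretrained_models/pretrained_weights/east_vgg16.pth")


def check_east_prerequisites(root="."):
    """Return a list of human-readable problems (empty if everything is in place).

    Hypothetical helper, not part of this project.
    """
    problems = []
    weights_path = Path(root) / EAST_WEIGHTS
    if not weights_path.is_file():
        problems.append(f"missing weights file: {weights_path}")
    # find_spec returns None when the package cannot be located.
    if importlib.util.find_spec("torch") is None:
        problems.append("torch is not installed (pip install torch)")
    return problems


if __name__ == "__main__":
    for problem in check_east_prerequisites():
        print(problem)
```

Run it from the root of the repository; no output means both requirements are met.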

Installation and usage

See the installation guide for a step-by-step installation 😄

Here is a summary of the installation procedure, if you have a working Python environment:

1. Clone this repository: `git clone https://github.com/yui-mhcp/detection.git`
2. Go to the root of this repository: `cd detection`
3. Install requirements: `pip install -r requirements.txt`
4. Open the detection notebook and follow the instructions!

TO-DO list:

Difference between `detection` and `segmentation`

The two main methodologies in object detection are detection with bounding boxes and pixel-wise segmentation. These approaches both aim to detect the position of objects in an image but with different levels of precision. This difference impacts the model architecture as the required output shape is not the same.

Here is a simple, non-exhaustive comparison of both approaches based on several criteria:

| Criterion | Detection | Segmentation |
| --- | --- | --- |
| Precision | Surrounding bounding boxes | Pixel by pixel |
| Type of output | `[x, y, w, h]` (position of bounding boxes) | Mask (probability score in `[0, 1]` for each pixel) |
| Output shape | `[grid_h, grid_w, nb_box, 4 + 1 + nb_class]`\* | `[image_h, image_w, 1]` |
| Applications | General detection + classification | Medical image detection / object extraction |
| Model architecture | Full CNN 2D downsampling to `(grid_h, grid_w)` | Full CNN with downsampling and upsampling |
| Post processing | Decode output to get position of boxes | Thresholding pixel confidence |
| Model mechanism | Split image into grid and detect boxes in each grid cell | Downsample the image and upsample it to give probability of object for each pixel |
| Support multi-label classification | Yes, by design | Yes, but not its main application |

\* This is the classical output shape of YOLO models. The last dimension is `[x, y, w, h, confidence, *class_scores]`.
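To make the "decode output to get position of boxes" step more concrete, here is a minimal, dependency-free sketch of how a raw output of shape `[grid_h, grid_w, nb_box, 4 + 1 + nb_class]` could be turned into boxes. It assumes the usual YOLOv2 parameterisation (sigmoid on the centre offsets and on the confidence, exponential scaling of anchor sizes, softmax over the class scores). The function name `decode_yolo_output`, the `anchors` format and the `threshold` default are illustrative assumptions, not taken from this repository, whose actual decoding may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the maximum for numerical stability (assumes nb_class >= 1).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_yolo_output(output, anchors, threshold=0.5):
    """Decode a nested list shaped [grid_h][grid_w][nb_box][4 + 1 + nb_class].

    `anchors` is a list of (width, height) pairs in grid-cell units, one per
    box slot (assumed format). Returns (x, y, w, h, score, class_index)
    tuples, where (x, y) is the box centre and all coordinates are
    normalised to the image size.
    """
    grid_h, grid_w = len(output), len(output[0])
    boxes = []
    for row in range(grid_h):
        for col in range(grid_w):
            for b, (tx, ty, tw, th, tc, *class_logits) in enumerate(output[row][col]):
                anchor_w, anchor_h = anchors[b]
                # Centre: offset inside the cell, then normalised by grid size.
                x = (col + sigmoid(tx)) / grid_w
                y = (row + sigmoid(ty)) / grid_h
                # Size: anchor prior rescaled by the predicted factor.
                w = anchor_w * math.exp(tw) / grid_w
                h = anchor_h * math.exp(th) / grid_h
                probs = softmax(class_logits)
                best = max(range(len(probs)), key=probs.__getitem__)
                score = sigmoid(tc) * probs[best]
                if score >= threshold:
                    boxes.append((x, y, w, h, score, best))
    return boxes


# Toy example: 1x1 grid, one box slot, two classes.
raw = [[[[0.0, 0.0, 0.0, 0.0, 4.0, 5.0, -5.0]]]]
print(decode_yolo_output(raw, anchors=[(0.5, 0.5)]))
```

In practice this loop would be vectorised, and overlapping boxes would still need to be filtered afterwards; the sketch only illustrates how the last dimension is interpreted.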

More advanced strategies that differ from the standard methodologies described above also exist. This section only aims to be a simple introduction to object detection and segmentation.
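On the segmentation side, the "thresholding pixel confidence" post-processing from the comparison above can be sketched in the same spirit. The example assumes the `[image_h, image_w, 1]` output has already been squeezed to a 2D nested list of probabilities; `threshold_mask` and `mask_to_box` are hypothetical helpers written for this illustration, not functions from this project.

```python
def threshold_mask(probabilities, threshold=0.5):
    """Binarise an [image_h][image_w] map of per-pixel probabilities."""
    return [[p >= threshold for p in row] for row in probabilities]


def mask_to_box(mask):
    """Return the tight (x_min, y_min, x_max, y_max) pixel box around the
    positive pixels of a binary mask, or None if the mask is empty."""
    coords = [(x, y) for y, row in enumerate(mask) for x, on in enumerate(row) if on]
    if not coords:
        return None
    xs, ys = zip(*coords)
    return min(xs), min(ys), max(xs), max(ys)


# Toy 3x3 probability map with one object in the lower-right area.
probs = [
    [0.1, 0.2, 0.1],
    [0.1, 0.9, 0.8],
    [0.0, 0.7, 0.2],
]
mask = threshold_mask(probs)
print(mask_to_box(mask))
```

The second helper shows one simple way to go from a mask back to a bounding box, which is what makes segmentation usable for object extraction.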

Notes and references

GitHub projects

The code for the YOLO part of this project is heavily inspired by this repo:

experiencor's repository: TensorFlow 1.x implementation of YOLOv2 (main inspiration for this repository)

The code for the EAST part of this project is heavily inspired by this repo:

SakuraRiven's PyTorch implementation: PyTorch implementation of the EAST paper.