Check the CHANGELOG file to have a global overview of the latest modifications! 😋
Important Note: The EAST training procedure is not yet implemented for the current post-processing pipeline, which is inspired by this repo. The model can still be used with the available pretrained weights! 😄
├── architectures : utilities for model architectures
│ ├── layers : custom layer implementations
│ ├── transformers : transformer architecture implementations
│ ├── common_blocks.py : defines common blocks (e.g., Conv + BN + ReLU)
│ ├── east_arch.py : EAST architecture
│ ├── generation_utils.py : utilities for text and sequence generation
│ ├── hparams.py : hyperparameter management
│ ├── simple_models.py : defines classical models such as CNN / RNN / MLP and siamese
│ └── yolo_arch.py : YOLOv2 architecture
├── custom_train_objects : custom objects used in training / testing
│ ├── callbacks : callbacks loading and implementations
│ ├── generators : custom data generators
│ ├── losses : loss functions
│ │ └── yolo_loss.py : YOLO specific loss
│ ├── metrics : metrics loading and implementations
│ ├── optimizers : optimizer loading
│ ├── checkpoint_manager.py: handle model checkpoint management (inspired from `tf.train.CheckpointManager`)
│ └── history.py : main History class to compute training statistics / track config
├── loggers : logging utilities for tracking experiment progress
├── models : main directory for model classes
│ ├── detection : detector implementations
│ │ ├── base_detector.py : abstract base class for all detectors
│ │ ├── east.py : EAST implementation for text detection
│ │ └── yolo.py : YOLOv2 implementation for general object detection
│ ├── interfaces : directories for interface classes
│ └── weights_converter.py : utilities to convert weights between different models
├── tests : unit and integration tests for model validation
├── utils : utility functions for data processing and visualization
├── detection.ipynb : notebook demonstrating detection features
├── example_yolo.ipynb : specific example notebook for YOLO model
├── LICENCE : project license file
├── README.md : this file
└── requirements.txt : required packages
Check the main project for more information about the unextended modules / structure / main classes.
- Detection (module `models.detection`):

| Feature | Function / class | Description |
|---|---|---|
| detection | `detect` | Detect objects on images/videos with multiple saving options (save cropped boxes, detected images, video frames, etc.) |
| stream | `stream` | Perform real-time detection using your camera (also allows saving frames) |
The `detection` notebook provides a concrete demonstration of these functions 😄
Available architectures:

`detection`:

| Classes | Dataset | Architecture | Trainer | Weights |
|---|---|---|---|---|
| 80 classes | COCO | YOLOv2 | YOLOv2's author | link |

The pretrained backend for YOLO can be downloaded at this link.
The pretrained version of EAST can be downloaded from this project. It should be stored in `pretrained_models/pretrained_weights/east_vgg16.pth` (`torch` is required to transfer the weights: `pip install torch`).
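Before loading EAST, it can help to confirm that both prerequisites are in place. The snippet below is a minimal sketch and not part of this project: `check_east_prerequisites` is a hypothetical helper name, and it assumes the path above is relative to the repository root.

```python
from importlib.util import find_spec
from pathlib import Path

# Path taken from the paragraph above, assumed relative to the repository root.
EAST_WEIGHTS = Path('pretrained_models') / 'pretrained_weights' / 'east_vgg16.pth'

def check_east_prerequisites(root = '.'):
    """Hypothetical helper: return a list of problems (empty if everything looks ready)."""
    problems = []
    weights = Path(root) / EAST_WEIGHTS
    if not weights.is_file():
        problems.append(f'missing weights file: {weights}')
    # `torch` is only needed to transfer the PyTorch weights.
    if find_spec('torch') is None:
        problems.append('torch is not installed (pip install torch)')
    return problems

if __name__ == '__main__':
    for problem in check_east_prerequisites():
        print(problem)
```

An empty list means the weights file was found at the expected location and `torch` can be imported.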
See the installation guide for a step-by-step installation 😄
Here is a summary of the installation procedure, if you have a working Python environment:
- Clone this repository: `git clone https://github.com/yui-mhcp/detection.git`
- Go to the root of this repository: `cd detection`
- Install requirements: `pip install -r requirements.txt`
- Open the `detection` notebook and follow the instructions!
- Make the TO-DO list
- Support pretrained COCO model
- Add weights for face detection
- Add label-based model loading (without manual association)
- Add `producer-consumer` based streaming
- Automatically download the official YOLOv2 pretrained weights (if not loaded)
- Add the Locality-Aware Non-Maximum Suppression (LANMS) method as described in the `EAST` paper
- Keras 3 support
- Convert the pretrained models to be compatible with Keras 3
- Add a comprehensive example comparing NMS and LANMS
The two main methodologies in object detection are detection with bounding boxes and pixel-wise segmentation. Both approaches aim to locate objects in an image, but with different levels of precision. This difference impacts the model architecture, as the required output shape is not the same.
Here is a simple, non-exhaustive comparison of both approaches based on several criteria:
| Criterion | Detection | Segmentation |
|---|---|---|
| Precision | Surrounding bounding boxes | Pixel by pixel |
| Type of output | `[x, y, w, h]` (position of bounding boxes) | Mask (`[0, 1]` probability score for each pixel) |
| Output shape | `[grid_h, grid_w, nb_box, 4 + 1 + nb_class]` \* | `[image_h, image_w, 1]` |
| Applications | General detection + classification | Medical image detection / object extraction |
| Model architecture | Full CNN 2D downsampling to `(grid_h, grid_w)` | Full CNN with downsampling and upsampling |
| Post processing | Decode output to get position of boxes | Thresholding pixel confidence |
| Model mechanism | Split image into grid and detect boxes in each grid cell | Downsample the image and upsample it to give probability of object for each pixel |
| Support multi-label classification | Yes, by design | Yes, but not its main application |
\* This is the classical output shape of `YOLO` models. The last dimension is `[x, y, w, h, confidence, * class_score]`
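To make the "decode output to get position of boxes" step concrete, here is a minimal NumPy sketch that follows the standard formulation of the YOLOv2 paper. It is an illustration under stated assumptions, not this project's decoder: `decode_yolo_output` is a hypothetical name, the anchors are assumed to be given in grid-cell units, and class scores are assumed to go through a softmax.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis = -1):
    e = np.exp(x - np.max(x, axis = axis, keepdims = True))
    return e / np.sum(e, axis = axis, keepdims = True)

def decode_yolo_output(output, anchors, threshold = 0.5):
    """
    Hypothetical decoder following the YOLOv2 paper formulation.
    output  : array of shape [grid_h, grid_w, nb_box, 4 + 1 + nb_class]
    anchors : array of shape [nb_box, 2], (width, height) in grid-cell units (assumption)
    Returns a list of (x, y, w, h, score, class_id), where (x, y) is the box
    center and all coordinates are normalized to [0, 1].
    """
    grid_h, grid_w, nb_box, _ = output.shape
    # Column / row index of each cell, broadcastable to [grid_h, grid_w, nb_box]
    cell_x = np.arange(grid_w).reshape(1, grid_w, 1)
    cell_y = np.arange(grid_h).reshape(grid_h, 1, 1)

    # Box center is an offset inside its cell; box size rescales the anchor
    x = (sigmoid(output[..., 0]) + cell_x) / grid_w
    y = (sigmoid(output[..., 1]) + cell_y) / grid_h
    w = anchors[:, 0] * np.exp(output[..., 2]) / grid_w
    h = anchors[:, 1] * np.exp(output[..., 3]) / grid_h

    confidence = sigmoid(output[..., 4])
    class_probs = softmax(output[..., 5:])
    class_ids = np.argmax(class_probs, axis = -1)
    scores = confidence * np.max(class_probs, axis = -1)

    keep = scores > threshold
    return [
        (float(bx), float(by), float(bw), float(bh), float(s), int(c))
        for bx, by, bw, bh, s, c in zip(
            x[keep], y[keep], w[keep], h[keep], scores[keep], class_ids[keep]
        )
    ]
```

In practice, the boxes kept by the threshold are then filtered with Non-Maximum Suppression to remove overlapping duplicates; that step is omitted here.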
More advanced strategies also exist, differing from the standard methodologies described above. This aims to be a simple introduction to object detection and segmentation.
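On the segmentation side, the "thresholding pixel confidence" step from the comparison table can be sketched as follows. This is a simplified illustration, not this project's post-processing: `mask_to_box` is a hypothetical name, and it assumes a single object per image (real pipelines usually separate connected components first).

```python
import numpy as np

def mask_to_box(probabilities, threshold = 0.5):
    """
    Hypothetical helper for a single-object mask.
    probabilities : array of shape [image_h, image_w, 1] with values in [0, 1]
    Returns (binary_mask, box), where box = (x, y, w, h) in pixels is the
    bounding box around all kept pixels, or None if no pixel passes the threshold.
    """
    binary_mask = probabilities[..., 0] > threshold
    if not binary_mask.any():
        return binary_mask, None
    # Rows / columns containing at least one kept pixel
    rows = np.where(binary_mask.any(axis = 1))[0]
    cols = np.where(binary_mask.any(axis = 0))[0]
    x, y = int(cols[0]), int(rows[0])
    w, h = int(cols[-1]) - x + 1, int(rows[-1]) - y + 1
    return binary_mask, (x, y, w, h)
```

The binary mask keeps the pixel-level precision of segmentation, while the derived box shows how a mask can be reduced to the coarser `[x, y, w, h]` representation used by detection.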
The code for the YOLO part of this project is heavily inspired by this repository:
- experiencor's repository: TensorFlow 1.x implementation of `YOLOv2` (main inspiration for this repository)
The code for the EAST part of this project is heavily inspired by this repository:
- SakuraRiven's PyTorch implementation: PyTorch implementation of the EAST paper.
- TensorFlow Object Detection API Tutorial : Step-by-step guide
- Image segmentation tutorials : U-Net implementation in TensorFlow + image segmentation tutorial
- PyTorch Vision Tutorial : Object detection with PyTorch
- YOLO Explained : Detailed explanation of YOLO architecture
- Gentle guide on how YOLO object detection works : Good tutorial explaining the image detection mechanism
- YOLO9000: Better, Stronger, Faster : The original YOLOv2 paper
- U-Net: Convolutional Networks for Biomedical Image Segmentation : U-Net original paper
- EAST: An Efficient and Accurate Scene Text Detector : Text detection (with possibly rotated bounding boxes) with a segmentation model (U-Net).
- Kaggle's Computer Vision Tutorials : Practical computer vision examples
- Two Minute Papers : Quick explanations of recent deep learning papers
- COCO dataset : 80 labels dataset for object detection in real context
- COCO Text dataset : An extension of COCO for text detection
- Wider Face dataset : Face detection dataset
- Kangaroo dataset: A fun, tiny dataset to quickly train a working model (nice for getting fast results)
Contacts:
- Mail:
[email protected]
- Discord: yui0732
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). See the LICENSE file for details.
This license allows you to use, modify, and distribute the code, as long as you include the original copyright and license notice in any copy of the software/source. Additionally, if you modify the code and distribute it, or run it on a server as a service, you must make your modified version available under the same license.
For more information about the AGPL-3.0 license, please visit the official website.
If you find this project useful in your work, please add this citation to give it more visibility! 😋
@misc{yui-mhcp,
author = {yui},
title = {A Deep Learning projects centralization},
year = {2021},
publisher = {GitHub},
howpublished = {\url{https://github.com/yui-mhcp}}
}