kimhy365/AttentionCNN

Attention-based CNN for Small Object Classification

📚 Overview

We propose an Attention-based CNN (A-CNN) model that addresses the challenges of small object classification in real-world manufacturing environments. Unlike traditional CNNs that struggle with object-to-image area ratio (OAR) constraints, our model leverages an attention mechanism to dynamically focus on small objects, achieving superior classification accuracy and efficiency.

Key innovations of our model include:

  • Integration of an Attention module to adaptively extract Regions of Interest (ROI), increasing OAR without manual preprocessing.
  • A multi-task learning framework that enables end-to-end training with minimal data labeling (only 5% of the dataset labeled), significantly reducing human effort and time.
  • Real-time inference on an edge device, the NVIDIA Jetson Nano (up to 67.1 fps), with high accuracy (up to 99.92%).

These contributions ensure that our A-CNN is not only effective but also practical for deployment in resource-constrained environments, such as automated optical inspection (AOI) systems.
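The multi-task training with partial labels described above can be pictured as a combined loss in which the classification term uses every sample, while the localization term uses only the samples that carry a location label. The following is a minimal sketch under that assumption, not the formulation from the paper: the function name `multitask_loss`, the squared-error localization term, and the `loc_weight` parameter are all hypothetical.

```python
import math


def multitask_loss(class_probs, class_targets, pred_centers, true_centers, loc_weight=1.0):
    """Hypothetical combined loss for partially labeled data.

    class_probs[i]   : predicted class probabilities for sample i
    class_targets[i] : index of the true class for sample i
    pred_centers[i]  : predicted (x, y) ROI center for sample i
    true_centers[i]  : labeled (x, y) center, or None when sample i has no location label
    """
    n = len(class_targets)
    # Classification term: mean cross-entropy over all samples.
    cls_loss = -sum(math.log(p[t]) for p, t in zip(class_probs, class_targets)) / n

    # Localization term: mean squared error over the labeled subset only.
    labeled = [(p, t) for p, t in zip(pred_centers, true_centers) if t is not None]
    if labeled:
        loc_loss = sum(
            (px - tx) ** 2 + (py - ty) ** 2 for (px, py), (tx, ty) in labeled
        ) / len(labeled)
    else:
        loc_loss = 0.0

    return cls_loss + loc_weight * loc_loss


# Two samples, only the first one has a location label.
loss = multitask_loss(
    class_probs=[[0.5, 0.5], [0.25, 0.75]],
    class_targets=[0, 1],
    pred_centers=[(0.5, 0.5), (0.2, 0.2)],
    true_centers=[(0.4, 0.5), None],
)
print(f"combined loss: {loss:.4f}")
```

Masking the localization term this way lets unlabeled images still contribute to the classification objective, which is one way a small labeled fraction could be sufficient.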

📂 Architecture of the Attention-based CNN model

This model utilizes a spatial transformer (Attention) module to sample the ROIs from the input images. The localization network predicts the center coordinates of the ROIs, and the classification network assigns class scores based on the ROIs. In the Attention module, the sizes of both the ROI and the resized ROI are hyperparameters.

[Figure: A-CNN architecture]
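To make the sampling step concrete, here is a dependency-free sketch of cropping a fixed-size ROI around a predicted center and resizing it with bilinear interpolation. It only illustrates the idea: the function names, the normalized `[0, 1]` center convention, and the border clamping are assumptions, and a real model would use a differentiable grid sampler from a deep learning framework rather than Python loops.

```python
import math


def bilinear(image, x, y):
    """Bilinearly interpolate a 2-D list `image` at pixel position (x, y)."""
    height, width = len(image), len(image[0])
    # Assumption: positions outside the image are clamped to the border.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def sample_roi(image, center, roi_size, out_size):
    """Sample an ROI of `roi_size` (w, h) pixels around `center` and resize it to `out_size` (w, h).

    `center` is (x, y) normalized to [0, 1]; both sizes play the role of hyperparameters.
    """
    height, width = len(image), len(image[0])
    cx, cy = center[0] * (width - 1), center[1] * (height - 1)
    roi_w, roi_h = roi_size
    out_w, out_h = out_size
    output = []
    for row in range(out_h):
        # Relative position of this output row inside the ROI window (0..1).
        ty = row / (out_h - 1) if out_h > 1 else 0.5
        y = cy + (ty - 0.5) * (roi_h - 1)
        out_row = []
        for col in range(out_w):
            tx = col / (out_w - 1) if out_w > 1 else 0.5
            x = cx + (tx - 0.5) * (roi_w - 1)
            out_row.append(bilinear(image, x, y))
        output.append(out_row)
    return output


# Toy 5x5 image whose pixel value encodes its position: value = 10 * y + x.
toy = [[10.0 * y + x for x in range(5)] for y in range(5)]
roi = sample_roi(toy, center=(0.5, 0.5), roi_size=(3, 3), out_size=(5, 5))
for line in roi:
    print(line)
```

Because the output grid is a smooth function of the center coordinates, gradients can in principle flow back to the localization network, which is what makes end-to-end training of this kind of module possible.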

Dataset

This dataset was created as part of our research. It is publicly available to facilitate reproducibility and further advancements in the field.
➡️ Download dataset

  • Images:
    • train data: from device 0
    • test data: from device 1
  • Labels:
    • YOLO format labels corresponding to each image.
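As a reading aid for the labels, the sketch below parses one line in the common YOLO text format (`class x_center y_center width height`, with coordinates normalized to the image size) and derives the object-to-image area ratio (OAR) and the pixel center. The helper names are hypothetical, and the sample line is made up for illustration rather than taken from the dataset.

```python
from typing import NamedTuple, Tuple


class YoloLabel(NamedTuple):
    class_id: int
    x_center: float  # normalized to image width
    y_center: float  # normalized to image height
    width: float     # normalized to image width
    height: float    # normalized to image height


def parse_yolo_line(line: str) -> YoloLabel:
    """Parse one 'class x_center y_center width height' label line."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    return YoloLabel(int(fields[0]), *(float(value) for value in fields[1:]))


def object_area_ratio(label: YoloLabel) -> float:
    """OAR: box area divided by image area (both sides are already normalized)."""
    return label.width * label.height


def center_in_pixels(label: YoloLabel, image_width: int, image_height: int) -> Tuple[float, float]:
    """Convert the normalized box center to pixel coordinates."""
    return label.x_center * image_width, label.y_center * image_height


# Made-up example line, not taken from the dataset.
label = parse_yolo_line("2 0.5 0.25 0.1 0.05")
print(label.class_id, object_area_ratio(label), center_in_pixels(label, 640, 480))
```

The box center is the quantity a localization network would be trained to predict, and the OAR shows how small the object is relative to the full frame.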

[Figure: Dataset example]

Performance (updated after the paper publication)

The A-CNN model can be trained end-to-end with far less data labeling than object detection methods require. Experimental results show that the proposed A-CNN model achieves a classification accuracy of 99.92% and an inference speed of 62.9 fps on the NVIDIA Jetson Nano platform, outperforming the smallest models of YOLOv5, YOLOv7, YOLOv8, YOLOv9, and YOLOv10, which are state-of-the-art object detection algorithms, in both accuracy and latency. Notably, our model is 3.8× faster than the fastest YOLO model, underscoring its efficiency in real-time applications. These findings highlight the potential of the A-CNN model as an accurate and practical solution for small object classification.

Comparison of the A-CNN with YOLO Object Detection Models

| Model | Params (M) | FLOPs^f (G) | Input (resized) | Accuracy (%) | Latency^a (ms) |
|---|---|---|---|---|---|
| YOLOv5-Nano | 1.76 | 1.55 | 640×480 | 99.67 | 61 |
| | | 0.67 | 416×312 | 97.92 | 61 |
| | | 0.22 | 224×168 | 82.83 | 55 |
| YOLOv7-Tiny | 6.02 | 4.95 | 640×480 | 99.83 | 135 |
| | | 2.15 | 416×312 | 98.42 | 135 |
| | | 0.69 | 224×168 | 95.33 | 130 |
| YOLOv8-Nano | 3.01 | 3.01 | 640×480 | 98.95 | 72 |
| | | 1.33 | 416×312 | 95.58 | 44 |
| | | 0.43 | 224×168 | 64.00 | 44 |
| YOLOv9-Tiny | 2.01 | 2.94 | 640×480 | 99.50 | 112 |
| | | 1.28 | 416×312 | 99.08 | 102 |
| | | 0.41 | 224×168 | 77.08 | 95 |
| YOLOv10-Nano | 2.71 | 3.15 | 640×480 | 99.75 | 84 |
| | | 1.36 | 416×312 | 99.33 | 59 |
| | | 0.44 | 224×168 | 75.08 | 57 |
| A-CNN (base) | 0.71 | 2.22 | 640×480 | 99.75 | 14.9 (6.2) |
| A-CNN (best) | 0.70 | 1.00 | 640×480 | 99.82 | 15.5 (6.3) |
| A-CNN (opt) | 0.68 | 0.38 | 640×480 | 99.92 | 15.9 (6.6) |

Notes:

  • ^f FLOPs of the model's forward pass. For the YOLO models, pre- and post-processing are excluded.
  • ^a End-to-end inference time measured on the NVIDIA Jetson Nano, including pre- and post-processing.
    Values in parentheses indicate inference time using TensorRT with FP32 precision.
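The latency column converts directly to throughput, since frames per second is 1000 divided by the latency in milliseconds. The short sketch below applies that conversion to the A-CNN rows of the table; the helper names are illustrative, and the latency values are copied from the table above.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# End-to-end latencies (ms) on the Jetson Nano, copied from the table above.
ACNN_LATENCY_MS = {"A-CNN (base)": 14.9, "A-CNN (best)": 15.5, "A-CNN (opt)": 15.9}

for name, latency in ACNN_LATENCY_MS.items():
    print(f"{name}: {fps_from_latency_ms(latency):.1f} fps")

# Example comparison against one YOLO row (YOLOv5-Nano at 640x480, 61 ms).
print(f"speedup vs. 61 ms: {speedup(61, ACNN_LATENCY_MS['A-CNN (opt)']):.1f}x")
```

With these numbers, 15.9 ms corresponds to about 62.9 fps and 14.9 ms to about 67.1 fps. Dividing the 61 ms latency of YOLOv5-Nano at 640×480 by 15.9 ms gives roughly 3.8.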

📄 Citation

If you use this dataset, please cite the following paper:

Hyun-Yong Kim, Taek-Joon Yi, and Jong-Yun Lee
An Attention-based Convolutional Neural Network with Spatial Transformer Module for Automated Optical Inspection of Small Objects
IEEE Transactions on Instrumentation and Measurement, 2025.
DOI: 10.1109/TIM.2025.3548240
