This is an unofficial PaddlePaddle implementation of PiT (ICCV 2021): Rethinking Spatial Dimensions of Vision Transformers.
Official PyTorch repo: PiT.
From the successful design principles of CNN, we investigate the role of spatial dimension conversion and its effectiveness on transformer-based architecture. We particularly attend to the dimension reduction principle of CNNs; as the depth increases, a conventional CNN increases channel dimension and decreases spatial dimensions. We empirically show that such a spatial dimension reduction is beneficial to a transformer architecture as well, and propose a novel Pooling-based Vision Transformer (PiT) upon the original ViT model.
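To make the dimension-reduction idea concrete, below is a minimal sketch of a PiT-style pooling stage (my own illustration with assumed names, not the repo's actual layer; see pit.py for that): the token sequence is reshaped back into a 2D grid, downsampled by a strided depthwise convolution that trades spatial size for channel width, and flattened again, while the class token, having no spatial extent, is resized with a plain linear projection.

```python
import paddle
import paddle.nn as nn

class TokenPoolingSketch(nn.Layer):
    """Sketch of PiT-style pooling; the repo's real layer is in pit.py.
    Assumes out_dim is a multiple of in_dim (PiT doubles the channels)."""
    def __init__(self, in_dim, out_dim, stride=2):
        super().__init__()
        # Strided depthwise-style conv: halves H and W while raising channels.
        self.conv = nn.Conv2D(in_dim, out_dim, kernel_size=stride + 1,
                              stride=stride, padding=stride // 2, groups=in_dim)
        # The class token has no spatial extent, so it is resized linearly.
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, tokens, cls_token):
        B, N, C = tokens.shape
        H = W = int(N ** 0.5)  # assumes a square token grid
        x = tokens.transpose([0, 2, 1]).reshape([B, C, H, W])
        x = self.conv(x)
        x = x.flatten(start_axis=2).transpose([0, 2, 1])  # back to (B, N', C')
        return x, self.fc(cls_token)
```

For example, `TokenPoolingSketch(64, 128)` turns a 14×14 grid of 196 tokens of width 64 into a 7×7 grid of 49 tokens of width 128.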
The reproduced results are summarized in the table below:
Model | Original Acc@1 | Reproduced Acc@1 | Image Size | Batch Size | Crop_pct | Epochs |
---|---|---|---|---|---|---|
pit_ti | 73.0 | 72.97 | 224 | 256 × 4 GPUs | 0.9 | 300 (+10 cooldown) |
Note that the result in the table above was obtained on the validation set of ILSVRC2012. The Acc@1 on the validation set of Light_ILSVRC2012 is 73.17.
The model parameters and training logs have been placed in the output folder.
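If you prefer to load the released weights directly rather than going through the scripts, something along these lines should work. This is a sketch: `build_pit` is a placeholder for however pit.py actually constructs the model, and the `.pdparams` suffix is assumed from PaddlePaddle's checkpoint convention.

```python
import paddle
from pit import build_pit  # hypothetical builder name; check pit.py for the real one

model = build_pit()                              # pit_ti configuration assumed
state = paddle.load('output/Best_PiT.pdparams')  # .pdparams suffix assumed
model.set_state_dict(state)
model.eval()
```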
The dataset used is ImageNet-1k 2012, i.e. ILSVRC2012.
ILSVRC2012 is a large classification dataset of about 144 GB. It contains 1,281,167 training images and 50,000 validation images spanning 1,000 object categories.
To save time, this repo uses a lightweight version of ILSVRC2012 named Light_ILSVRC2012, which is 65 GB in size. Download links: Light_ILSVRC2012_part_0.tar and Light_ILSVRC2012_part_1.tar.
You should arrange the dataset following this structure:
```
imagenet/
├── train/
│   ├── n01440764
│   │   ├── n01440764_10026.JPEG
│   │   ├── n01440764_10027.JPEG
│   │   ├── ......
│   ├── ......
├── val/
│   ├── n01440764
│   │   ├── ILSVRC2012_val_00000293.JPEG
│   │   ├── ILSVRC2012_val_00002138.JPEG
│   │   ├── ......
│   ├── ......
```
You may also find this helpful.
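After downloading and extracting both parts, a quick sanity check (my own snippet, not part of the repo; adjust `root` to your actual path) can confirm that the layout matches the tree above:

```python
import os

root = 'datasets/ImageNet1K'  # adjust to wherever you extracted the data
for split in ('train', 'val'):
    split_dir = os.path.join(root, split)
    classes = [d for d in os.listdir(split_dir) if d.startswith('n')]
    n_images = sum(len(files) for _, _, files in os.walk(split_dir))
    print(f'{split}: {len(classes)} classes, {n_images} images')
# Both splits should report 1000 class folders. The image counts above
# (1,281,167 train / 50,000 val) are for the full ILSVRC2012;
# Light_ILSVRC2012 may differ.
```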
My Environment:
- Python: 3.7.11
- PaddlePaddle: 2.2.2
- yacs==0.1.8
- scipy
- pyyaml
- Hardware: Tesla V100 × 4 (many thanks to the Baidu PaddlePaddle platform for the compute)
```shell
git clone https://github.com/hatimwen/paddle_pit.git
cd paddle_pit
```
Before running anything, adjust the scripts in scripts/ to your actual needs.
Evaluation:

- On a multi-GPU machine: `sh scripts/run_eval_multi.sh`
- On a single-GPU machine: `sh scripts/run_eval.sh`

Training:

- On a multi-GPU machine: `sh scripts/run_train_multi.sh`
- On a single-GPU machine: `sh scripts/run_train.sh`
```shell
python predict.py \
    -pretrained='output/Best_PiT' \
    -img_path='images/ILSVRC2012_val_00004506.JPEG'
```
Output:

```
class_id: 244, prob: 0.8468140959739685
```

Clearly, the output is in line with expectations.
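For reference, here is roughly what such a prediction entails, tying the Image Size and Crop_pct columns of the results table to code. This is a hedged sketch, not the actual predict.py: `build_pit` is a placeholder as before, and the normalization constants are the standard ImageNet statistics, assumed to match the repo's config.

```python
import paddle
import paddle.vision.transforms as T
from PIL import Image

from pit import build_pit  # hypothetical builder, as in the loading sketch above

# Crop_pct 0.9 with a 224 crop => resize the short side to int(224 / 0.9) = 248.
transform = T.Compose([
    T.Resize(248, interpolation='bicubic'),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],   # standard ImageNet statistics,
                std=[0.229, 0.224, 0.225]),   # assumed to match the repo's config
])

model = build_pit()
model.set_state_dict(paddle.load('output/Best_PiT.pdparams'))
model.eval()

img = transform(Image.open('images/ILSVRC2012_val_00004506.JPEG').convert('RGB'))
probs = paddle.nn.functional.softmax(model(img.unsqueeze(0)), axis=-1)
print('class_id:', int(paddle.argmax(probs)), 'prob:', float(paddle.max(probs)))
```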
```
|-- paddle_pit
    |-- output
    |-- configs
        |-- pit_ti.yaml
    |-- datasets
        |-- ImageNet1K
    |-- scripts
        |-- run_train.sh
        |-- run_train_multi.sh
        |-- run_eval.sh
        |-- run_eval_multi.sh
    |-- augment.py
    |-- config.py
    |-- datasets.py
    |-- droppath.py
    |-- losses.py
    |-- main_multi_gpu.py
    |-- main_single_gpu.py
    |-- mixup.py
    |-- model_ema.py
    |-- pit.py
    |-- random_erasing.py
    |-- regnet.py
    |-- transforms.py
    |-- utils.py
    |-- README.md
    |-- requirements.txt
```
Info | Description |
---|---|
Author | Hatimwen |
E-mail | [email protected] |
Date | 2022.01 |
Version | PaddlePaddle 2.2.2 |
Field | Classification |
Supported Devices | GPU |
AI Studio | AI Studio |
```bibtex
@inproceedings{heo2021pit,
    title={Rethinking Spatial Dimensions of Vision Transformers},
    author={Byeongho Heo and Sangdoo Yun and Dongyoon Han and Sanghyuk Chun and Junsuk Choe and Seong Joon Oh},
    booktitle={International Conference on Computer Vision (ICCV)},
    year={2021},
}
```
Last but not least, many thanks to PaddlePaddle for hosting the 5th PaddlePaddle Paper Reproduction Challenge (飞桨论文复现挑战赛第五期), from which I learned a lot. I also sincerely thank Dr. Zhu's team for their PaddleViT, since most of my code is adapted from it, apart from the full alignment of the training process.