# Sparsity-Aware Orthogonal Initialization

Built on a PyTorch Lightning template.

PyTorch implementation of SAO from the paper *Sparsity-Aware Orthogonal Initialization of Deep Neural Networks* by Esguerra et al.

## Overview

Pruning is a common technique to reduce the number of parameters of a deep neural network. However, this technique can have adverse effects such as network disconnection and loss of dynamical isometry.

To avoid these issues, our solution leverages expander graphs (sparse yet highly connected graphs) to define the sparse structure, and then orthogonalizes this structure through appropriate weight assignments.
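As a rough illustration of the idea (not the paper's exact construction), a `d`-regular sparse mask can be built from circulant shifts, so every output unit keeps exactly `d` connections and the layer stays connected. The helper name `regular_mask` is hypothetical:

```python
import numpy as np

def regular_mask(n, d):
    """Illustrative d-regular circulant mask for an n-by-n layer.

    Row i connects to columns i, i+1, ..., i+d-1 (mod n), so every
    output has exactly d inputs and every input feeds exactly d outputs.
    """
    mask = np.zeros((n, n), dtype=int)
    for i in range(n):
        for k in range(d):
            mask[i, (i + k) % n] = 1
    return mask

mask = regular_mask(16, 4)
sparsity = 1 - mask.sum() / mask.size
print(sparsity)  # 0.75: degree 4 on 16 channels gives 75% sparsity
```

SAO additionally assigns weights on this sparse support so the resulting matrix is orthogonal; the mask above only shows the connectivity side of the construction.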


## Installation

```shell
git clone https://github.com/kiaraesguerra/SAO
cd SAO
git clone https://github.com/kiaraesguerra/AutoAugment
conda create -n myenv python=3.9
conda activate myenv
pip install -r requirements.txt
```

## Features

## Training

### 1. Delta on Vanilla CNN

```shell
python main.py --model cnn --num-layers 32 --hidden-width 128 --activation 'relu' --weight-init 'delta' --max-lr 1e-2 --min-lr 0 --scheduler 'cosine' --autoaugment
```

### 2. SAO-Delta on Vanilla CNN

When using SAO, you can specify either the sparsity:

```shell
python main.py --model cnn --num-layers 32 --hidden-width 128 --activation 'relu' --pruning-method SAO --sparsity 0.5 --max-lr 1e-2 --min-lr 0 --scheduler 'cosine' --autoaugment
```

or the degree:

```shell
python main.py --model cnn --num-layers 32 --hidden-width 128 --activation 'relu' --pruning-method SAO --degree 4 --max-lr 1e-2 --min-lr 0 --scheduler 'cosine' --autoaugment
```

**Note:** When using Tanh, the minimum degree is 2. Keep this in mind when specifying the sparsity as well: the sparsity must not imply a degree below 2, so for Conv2d(16, 16) the maximum sparsity is 87.50%. For ReLU, the minimum degree is 4, so for Conv2d(16, 16) the maximum sparsity is 75.00%.
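The sparsity and degree figures above follow from `sparsity = 1 - degree / channels`. A quick sanity check (the helper name `max_sparsity` is illustrative, not part of the repo):

```python
def max_sparsity(channels, min_degree):
    """Highest sparsity that still leaves each unit min_degree connections."""
    return 1 - min_degree / channels

# Conv2d(16, 16) with Tanh (minimum degree 2):
print(max_sparsity(16, 2))  # 0.875 -> 87.50%

# Conv2d(16, 16) with ReLU (minimum degree 4):
print(max_sparsity(16, 4))  # 0.75 -> 75.00%
```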

### 3. ECO on Vanilla CNN

```shell
python main.py --model cnn_eco --num-layers 32 --hidden-width 128 --activation 'relu' --weight-init 'delta-eco' --max-lr 1e-2 --min-lr 0 --scheduler 'cosine' --autoaugment
```

### 4. SAO-ECO on Vanilla CNN

Using sparsity:

```shell
python main.py --model cnn_eco --num-layers 32 --hidden-width 128 --activation 'relu' --pruning-method SAO --sparsity 0.5 --max-lr 1e-2 --min-lr 0 --scheduler 'cosine' --autoaugment
```

Using degree:

```shell
python main.py --model cnn_eco --num-layers 32 --hidden-width 128 --activation 'relu' --pruning-method SAO --degree 4 --max-lr 1e-2 --min-lr 0 --scheduler 'cosine' --autoaugment
```

## Sample Notebook

I've provided a sample Google Colab notebook showing how to run the code: https://colab.research.google.com/drive/1O47ZD5RT3sYg3uuixs6f-2MO0-9J7AMe?usp=sharing