Extracts features from a pre-trained CNN and trains an LSTM on them.
Updates
- Specific test batch size. (new)
- Label offsets. Max batch-size inference.
- gen_outputs.py can now read multi-label files and ignore missing images.
- Masked input for many-to-one prediction.
- Early stopping. GPU selection.
- Added a script to save images into an HDF5 file.
- AUC score, better frame sorting in gen_outputs.py, saving the best outputs.
- Correct and incorrect data examples in this README.
- Docker image.
- Added classification tests.
- Added classification task.
If you use our code, please cite:
@article{rodriguez2017deep,
  title={Deep Pain: Exploiting Long Short-Term Memory Networks for Facial Expression Classification},
  author={Rodriguez, Pau and Cucurull, Guillem and Gonz{\`a}lez, Jordi and Gonfaus, Josep M and Nasrollahi, Kamal and Moeslund, Thomas B and Roca, F Xavier},
  journal={IEEE Transactions on Cybernetics},
  year={2017},
  publisher={IEEE}
}
The code has been used in the following publications:
@article{rodriguez2017deep,
  title={Deep Pain: Exploiting Long Short-Term Memory Networks for Facial Expression Classification},
  author={Rodriguez, Pau and Cucurull, Guillem and Gonz{\`a}lez, Jordi and Gonfaus, Josep M and Nasrollahi, Kamal and Moeslund, Thomas B and Roca, F Xavier},
  journal={IEEE Transactions on Cybernetics},
  year={2017},
  publisher={IEEE}
}
@inproceedings{bellantonio2016spatio,
  title={Spatio-Temporal Pain Recognition in CNN-based Super-Resolved Facial Images},
  author={Bellantonio, Marco and Haque, Mohammad A and Rodriguez, Pau and Nasrollahi, Kamal and Telve, Taisi and Escalera, Sergio and Gonz{\`a}lez, Jordi and Moeslund, Thomas B and Rasti, Pejman and Anbarjafari, Gholamreza},
  booktitle={International Conference on Pattern Recognition (ICPR)},
  year={2016},
  organization={Springer}
}
@article{negin2018praxis,
  title={PRAXIS: Towards automatic cognitive assessment using gesture recognition},
  author={Negin, Farhood and Rodriguez, Pau and Koperski, Michal and Kerboua, Adlen and Gonz{\`a}lez, Jordi and Bourgeois, Jeremy and Chapoulie, Emmanuelle and Robert, Philippe and Bremond, Francois},
  journal={Expert Systems with Applications},
  volume={106},
  pages={21--35},
  year={2018},
  publisher={Elsevier}
}
- Install CUDA and cuDNN if possible.
- Install and compile Caffe: https://github.com/BVLC/caffe
- Install the Python dependencies: pip install h5py numpy
- Install OpenCV and python-opencv.
- Install Torch: http://torch.ch/
- Install torch-hdf5 (https://github.com/deepmind/torch-hdf5/blob/master/doc/usage.md) and the Lua dependencies: luarocks install nn; luarocks install cunn; luarocks install rnn; luarocks install optim; luarocks install gnuplot; luarocks install dp; luarocks install class
- git clone this repository.
- Set your Caffe installation path in config.py
- Alternatively, pull the Docker image: docker pull prlz77/lstm-on-cnn
To automatically generate the features and train the LSTM, run:
./scripts/train.sh
- Images in an OpenCV-compatible format (jpg, png, etc.).
- A file with the ordered list of train frames: ./path/to/frame_x.jpg label sequence_number\n...
- A file with the ordered list of validation frames: ./path/to/frame_x.jpg label sequence_number\n...
All the frames from the same video sequence must have the same sequence number, and frames of the same sequence must be listed contiguously (a quick check for this is sketched after the examples below). The order is important, e.g.:
Wrong sequence numbering:
path label 1
path label 2
path label 3
...
path label 1
path label 2
path label 3
Correct way:
path label 1 (frame 1 of seq 1)
path label 1 (frame 2 of seq 1)
path label 1 (...)
....
path label 2 (frame 1 of seq 2)
path label 2 (frame 2 of seq 2)
path label 2 (...)
...
path label 3 (frame 1 of seq 3)
path label 3 (frame 2 of seq 3)
path label 3 (...)
...
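A quick way to catch the wrong ordering above is to verify that all lines sharing a sequence number are contiguous. A minimal sketch in Python, assuming the path label seq_num line format described above (the file name is a placeholder):
# check_listfile.py -- verify that frames of each sequence are listed contiguously.
# Assumes the "path label seq_num" line format described above.
def check_contiguous(listfile):
    seen = set()
    last = None
    with open(listfile) as f:
        for i, line in enumerate(f, 1):
            if not line.strip():
                continue
            seq = line.split()[2]
            if seq != last:
                if seq in seen:
                    raise ValueError("line %d: sequence %s is not contiguous" % (i, seq))
                seen.add(seq)
                last = seq

check_contiguous('train/list.txt')  # placeholder path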
Since the LSTM is fed with CNN feature maps, a pre-trained model is needed. For generic baselines, I would recommend any of the well-known caffemodels in https://github.com/BVLC/caffe/wiki/Model-Zoo. For better results, fine-tune one of them on the specific dataset before using it with this code.
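For intuition, this is roughly what the feature-extraction step looks like in pycaffe; a minimal sketch, where the paths, layer name, and input size are assumptions taken from the config below (gen_outputs.py is the actual implementation):
import caffe
import cv2
import numpy as np

# Placeholders: use the same DEPLOY/CAFFEMODEL/EXTRACT_FROM as in scripts/train.sh.
net = caffe.Net('smth.prototxt', 'smth.caffemodel', caffe.TEST)

img = cv2.imread('root/to/images/frame_x.jpg').astype(np.float32)
img -= np.array([103.939, 116.779, 123.68])  # BGR mean (OpenCV loads BGR)
img = cv2.resize(img, (224, 224))            # assumed input size of the deploy prototxt
blob = img.transpose(2, 0, 1)[np.newaxis]    # HxWxC -> 1xCxHxW

net.blobs['data'].reshape(*blob.shape)
net.blobs['data'].data[...] = blob
net.forward()
features = net.blobs['fc7'].data.copy()      # the EXTRACT_FROM layer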
The script can be configured by editing the variables inside:
# DATASET PATH
DEPLOY='smth.prototxt'
CAFFEMODEL='smth.caffemodel'
DATAROOT='root/to/images/'
TRAIN_LIST='train/list.txt' # ex: ./class/0345435.jpg label seq_num\n... or ./0445342.jpg label seq_num\n, etc.
VAL_LIST='val/list.txt' # ex: ./class/0345435.jpg label seq_num\n... or ./0445342.jpg label seq_num\n, etc.
# CONVNET PARAMS
EXTRACT_FROM="fc7" # example from vgg16
# DATASET MEAN
# (ImageNet mean, change as convenient)
R=123.68
G=116.779
B=103.939
# LSTM hyperparameters
RECURRENT_DEPTH=1
HIDDEN_LAYER_SIZE=256
RHO=5 # max sequence length. n when doing n-to-1.
BATCHSIZE=32 # should be as big as possible
EPOCHS=100000
DROPOUT_PROB=0
TASK='regress' # task should be in {regress, classify}
# Other
SNAPSHOT_EVERY=10 # save the current model every this many epochs. Set 0 for never.
PLOT=1 # plots the regression outputs and targets in real time during training. Set n to plot every n epochs.
LOG='' # for a specific log file. Default is ./logs/current_datetime.log. Usage: LOG='--logPath <path>'
# FLAGS
# Any of {--sort, --cpuonly, --standarize, --verbose}, separated by spaces
FLAGS='--standarize' # use '--sort' in case the image lists do not have ordered frames
Remember that Lua (and Torch) indexes vectors starting from 1 instead of 0. Thus, labels should be integers from 1 to #labels. By default the number of labels is max(labels), but it can be set manually with --nlabels.
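For example, if your labels come from a 0-based annotation tool, shift them by one before writing the list files (a trivial sketch with hypothetical labels):
# Torch expects labels in 1..#labels, so shift 0-based labels up by one.
labels = [0, 1, 2, 2, 0]                # hypothetical 0-based labels
torch_labels = [l + 1 for l in labels]  # -> [1, 2, 3, 3, 1]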
gen_outputs.py receives a caffemodel, the images, and the list files, and creates an HDF5 file with the outputs. LSTM.lua trains an LSTM given training and validation HDF5 datasets with the following fields:
- outputs: NxCxHxW tensor of N images with C channels and spatial dims HxW.
- labels: NxC tensor of N labels of C dimensionality.
- seq_numbers: tensor of N numbers corresponding to the video sequence from which each frame was extracted.
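As a reference, a file with this layout can be produced with h5py; the shapes and dtypes below are illustrative assumptions (e.g. fc7 features stored as Nx4096x1x1):
import h5py
import numpy as np

N, C, H, W = 100, 4096, 1, 1  # illustrative: 100 fc7 feature vectors
with h5py.File('train.h5', 'w') as f:
    f.create_dataset('outputs', data=np.zeros((N, C, H, W), dtype=np.float32))
    f.create_dataset('labels', data=np.ones((N, 1), dtype=np.float32))
    # ten sequences of ten frames each: 1,1,...,2,2,...,10
    f.create_dataset('seq_numbers', data=np.repeat(np.arange(1, 11), 10))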
Use the --testonly and --saveOutputs options of LSTM.lua to extract an HDF5 file with the output of the network, and the --load option to reload a saved model.
Given a train.txt and a test.txt file with lines of the form im_path label seq_num, if we want to save the images into a database after subtracting the mean, resizing to 100x100 px, and standardizing the labels, we can call:
# Save train.
python images2h5.py images/root/folder train.txt --output train.h5 \
    --size 100 --standarize --subtract_mean
# Save test. We pass train.h5 so that the train mean, std, etc. are reused and we do not overfit.
python images2h5.py images/root/folder test.txt --output test.h5 \
    --size 100 --standarize --subtract_mean --with_train train.h5
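To verify the result, you can list what ended up in the file with h5py (the exact dataset names depend on images2h5.py internals, so we just enumerate them):
import h5py

with h5py.File('train.h5', 'r') as f:
    for name, dset in f.items():
        print(name, getattr(dset, 'shape', None), getattr(dset, 'dtype', None))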
I added some tests to verify that the code works correctly. You can run them to check that everything is fine.
Please contact me if you run into any problem: pau.rodriguez at uab.cat