Fine-tune Facebook's pre-trained TimeSformer model for any video-classification task, such as emotion recognition or action recognition. We fine-tuned the model for two-class face-reaction classification, but you can easily adapt it to any video-classification scenario.
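As a rough sketch of what swapping in a two-class head looks like (this assumes the Hugging Face `transformers` library and the `facebook/timesformer-base-finetuned-k400` checkpoint; the label names are placeholders, not the ones used in this repo):

```python
# Sketch: load a pre-trained TimeSformer and replace its classification
# head for a 2-class task. Checkpoint and label names are assumptions.
from transformers import TimesformerForVideoClassification

NUM_CLASSES = 2
# Placeholder names for the two face-reaction classes.
ID2LABEL = {0: "negative", 1: "positive"}
LABEL2ID = {v: k for k, v in ID2LABEL.items()}

def build_model(checkpoint: str = "facebook/timesformer-base-finetuned-k400"):
    # ignore_mismatched_sizes lets us drop the 400-class Kinetics head
    # and initialize a fresh 2-class head in its place.
    return TimesformerForVideoClassification.from_pretrained(
        checkpoint,
        num_labels=NUM_CLASSES,
        id2label=ID2LABEL,
        label2id=LABEL2ID,
        ignore_mismatched_sizes=True,
    )
```

Changing `NUM_CLASSES` and the label maps is essentially all that is needed to repurpose the same setup for another classification task.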
- We used the PyTorch implementation of the MTCNN face detector to create new video files that contain only the detected faces. To process all of your videos, use `extract_face.py`; just change the `input_folder` and `output_folder` parameters in that script.
- Use `make_manifest.py`
to create the required train and test manifests in .csv format. See the included CSV files for the expected format; in the manifest, the folder name serves as the label. Alternatively, you can use `manifest_5Fold.py` to generate manifests for a 5-fold evaluation.
- After preprocessing the videos and preparing the manifests, run `video_classification_finetune.py`
to fine-tune the TimeSformer model.
- Use `video_classification_test.py` to test the saved model with the test.csv manifest.
- To run both scripts, use the
`run.sh` script.
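The face-extraction step could look roughly like the following (a sketch assuming the `facenet-pytorch` package for MTCNN and OpenCV for video I/O; the function name, crop size, and margin are illustrative, not the actual contents of `extract_face.py`):

```python
# Sketch: detect a face in each frame with MTCNN, crop it, and write
# the face-only frames to a new video file.
import cv2
from facenet_pytorch import MTCNN

FACE_SIZE = 224  # assumed output crop size, in pixels

def extract_faces(input_path: str, output_path: str) -> None:
    mtcnn = MTCNN(image_size=FACE_SIZE, margin=20)
    cap = cv2.VideoCapture(input_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    writer = cv2.VideoWriter(
        output_path,
        cv2.VideoWriter_fourcc(*"mp4v"),
        fps,
        (FACE_SIZE, FACE_SIZE),
    )
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # MTCNN expects RGB; OpenCV reads BGR.
        boxes, _ = mtcnn.detect(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if boxes is None:
            continue  # no face detected in this frame; skip it
        x1, y1, x2, y2 = (int(v) for v in boxes[0])
        crop = frame[max(y1, 0):y2, max(x1, 0):x2]
        if crop.size == 0:
            continue
        writer.write(cv2.resize(crop, (FACE_SIZE, FACE_SIZE)))
    cap.release()
    writer.release()
```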
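The manifest layout described above (subfolder name used as the label) could be produced along these lines — a hedged sketch, not the actual `make_manifest.py`; the `path,label` column names are assumptions:

```python
# Sketch: walk a root folder whose subfolder names are class labels
# and write a CSV manifest with one row per video file.
import csv
from pathlib import Path

def write_manifest(root: str, out_csv: str, exts=(".mp4", ".avi")) -> int:
    rows = []
    for video in sorted(Path(root).rglob("*")):
        if video.suffix.lower() in exts:
            # The immediate parent folder's name is the class label.
            rows.append((str(video), video.parent.name))
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "label"])  # assumed column names
        writer.writerows(rows)
    return len(rows)
```

Splitting the collected rows into separate train and test files (or five folds) before writing is presumably what distinguishes `make_manifest.py` from `manifest_5Fold.py`.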
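For the 5-fold variant, the split itself could be sketched like this (an illustrative helper, not the actual `manifest_5Fold.py`):

```python
# Sketch: deterministically assign (path, label) rows to 5 folds;
# fold i serves as the test set while the other folds form the train set.
import random

def five_fold_split(rows, n_folds=5, seed=42):
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed for reproducibility
    folds = [rows[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test
```

Each (train, test) pair would then be written out as a manifest file in the same `path,label` format as the single-split case.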