Summary

ElenaRyumina · Mar 29, 2024 · 1fb9c82
1 parent 6dcfd37
commit 1fb9c82

Showing 9 changed files with 6 additions and 9 deletions.
diff --git a/README.md b/README.md
@@ -1,10 +1,10 @@
-# Audio-Visual Compound Expression Recognition Method based on Late Modality Fusion and Rule-based Decision
+# Zero-Shot Audio-Visual Compound Expression Recognition Method based on Emotion Probability Fusion
 
-The official repository for "Audio-Visual Compound Expression Recognition Method based on Late Modality Fusion and Rule-based Decision", as a part of [6th ABAW 2024](https://affective-behavior-analysis-in-the-wild.github.io/6th/) (submitted)
+The official repository for "Zero-Shot Audio-Visual Compound Expression Recognition Method based on Emotion Probability Fusion", as a part of [CVPRW 2024](https://affective-behavior-analysis-in-the-wild.github.io/6th/) (submitted)
 
 ## Abstract
 
-This paper presents the results of the SUN team for the CE Recognition Challenge of the 6th ABAW Competition. We propose a novel audio-visual method for compound expression recognition. Our method relies on emotion recognition models that fuse modalities at the emotion probability level, while decisions regarding the prediction of compound expressions are based on predefined rules. Notably, our method does not use any training data specific to the target task. The method is evaluated in multi-corpus training and cross-corpus validation setups. Our findings from the challenge demonstrate that the proposed method can potentially form a basis for development of intelligent tools for annotating audio-visual data in the context of human's basic and compound emotions.
+Compound Expression Recognition (CER), as a part of affective computing, is a novel task in intelligent human-computer interaction and multimodal user interfaces. We propose a novel audio-visual method for CER. Our method relies on emotion recognition models that fuse modalities at the emotion probability level, while decisions regarding the prediction of compound expressions are based on the pair-wise sum of weighted emotion probability distributions. Notably, our method does not use any training data specific to the target task. Thus, the problem is a zero-shot classification task. The method is evaluated in multi-corpus training and cross-corpus validation setups. We achieved F1-scores of 32.15% and 25.56% on the AffWild2 and C-EXPR-DB test subsets, respectively, without training on the target corpus or the target task. Therefore, our method is on par with methods trained on the target corpus or the target task. Our findings from the challenge demonstrate that the proposed method can potentially form a basis for developing intelligent tools for annotating audio-visual data in the context of basic and compound human emotions.
 
 ## Acknowledgments
 

diff --git a/index.html b/index.html
@@ -173,7 +173,7 @@ <h2 class="title is-3">Abstract</h2>
                     <div class="columns is-centered has-text-centered">
                         <div class="column is-four-fifths">
                             <h2 class="title is-3">Pipeline of the proposed audio-visual CER method</h2>
-                            <img class="img-method" src="./static/img/Pipeline.png" alt="pipeline">
+                            <img class="img-method" src="./static/img/Pipeline.jpg" alt="pipeline">
                         </div>
                     </div>
                 </div>

diff --git a/src/README.md b/src/README.md
@@ -1,6 +1,4 @@
-# Audio-Visual Compound Expression Recognition Method based on Late Modality Fusion and Rule-based Decision
-
-This paper presents the results of the SUN team for the Compound Expression (CE) Recognition Challenge of the 6th ABAW Competition. We propose a novel audio-visual method for compound expression recognition. Our method relies on emotion recognition models that fuse modalities at the emotion probability level, while decisions regarding the prediction of compound expressions are based on predefined rules. Notably, our method does not use any training data specific to the target task. Thus, the problem is thus, a zero-shot classification task. The method is evaluated in multi-corpus training and cross-corpus validation setups. Our findings from the challenge demonstrate that the proposed method can potentially form a basis for developing intelligent tools for annotating audio-visual data in the context of human's basic and compound emotions.
+This repository introduces a new zero-shot audio-visual method for compound expression recognition.
 
 Model weights are available at [models](https://drive.google.com/drive/folders/1KMkMNKkymTVV3eJaXHU6ydvEj5UfUA0E?usp=sharing). You should download them and place them in ``src/weights``. You will also need weights for the RetinaFace detection model. Please refer to the original [repository](https://github.com/hhj1897/face_detection).
 
@@ -13,6 +11,5 @@ python run.py --path_video <your path to a video file> --path_save <your path to
 Example of predictions obtained by static visual (VS), dynamic visual (VD), audio (A), and audio-visual (AV) models:
 
 <div style="display:flex; flex-direction: column;">
-    <img src="test_videos/results/faces.jpg" alt="Faces" style="width: 100%;">
-    <img src="test_videos/results/pedicted_CEs_Rule 1.jpg" alt="CE predictions" style="width: 100%;">
+    <img src="https://github.com/C-EXPR-DB/AVCER/blob/main/static/img/Predictions.png" alt="predictions" style="width: 100%;">
 </div>
diff --git a/src/test_videos/results/faces.jpg b/src/test_videos/results/faces.jpg
diff --git a/src/test_videos/results/pedicted_CEs_Rule 1.jpg b/src/test_videos/results/pedicted_CEs_Rule 1.jpg
diff --git a/src/test_videos/results/pedicted_CEs_Rule 2.jpg b/src/test_videos/results/pedicted_CEs_Rule 2.jpg
diff --git a/static/img/fig1_2_01.jpg → static/img/Pipeline.jpg b/static/img/fig1_2_01.jpg → static/img/Pipeline.jpg
diff --git a/static/img/Pipeline.png b/static/img/Pipeline.png
diff --git a/static/img/Predictions.png b/static/img/Predictions.png
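
Note on the updated abstract: it describes fusing modalities at the emotion probability level and deciding compound expressions from the pair-wise sum of weighted emotion probability distributions. The sketch below is one possible reading of that description, not the repository's implementation. The basic emotion order, the compound-to-basic pairings, the scalar per-model weights, and the function names `fuse` and `predict_compound` are all assumptions made for illustration.

```python
# Illustrative sketch only. The class order, the pairings, and the weights
# are assumptions, not values taken from this repository.
BASIC = ["Neutral", "Happiness", "Sadness", "Surprise", "Fear", "Disgust", "Anger"]

# Hypothetical mapping from each compound expression to two basic emotions.
COMPOUND = {
    "Fearfully Surprised": ("Fear", "Surprise"),
    "Happily Surprised": ("Happiness", "Surprise"),
    "Sadly Surprised": ("Sadness", "Surprise"),
    "Disgustedly Surprised": ("Disgust", "Surprise"),
    "Angrily Surprised": ("Anger", "Surprise"),
    "Sadly Fearful": ("Sadness", "Fear"),
    "Sadly Angry": ("Sadness", "Anger"),
}


def fuse(model_probs, model_weights):
    """Weighted sum of per-model probability distributions over BASIC."""
    fused = [0.0] * len(BASIC)
    for name, probs in model_probs.items():
        weight = model_weights[name]
        for i, p in enumerate(probs):
            fused[i] += weight * p
    return fused


def predict_compound(fused):
    """Score each compound class as the sum of its two fused basic probabilities."""
    index = {emotion: i for i, emotion in enumerate(BASIC)}
    scores = {
        ce: fused[index[a]] + fused[index[b]] for ce, (a, b) in COMPOUND.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Made-up probabilities for a static visual (VS) and an audio (A) model.
probs = {
    "VS": [0.05, 0.10, 0.05, 0.40, 0.30, 0.05, 0.05],
    "A": [0.10, 0.05, 0.10, 0.30, 0.35, 0.05, 0.05],
}
weights = {"VS": 0.6, "A": 0.4}  # hypothetical fusion weights

label, scores = predict_compound(fuse(probs, weights))
print(label)  # Fearfully Surprised
```

Because no compound-expression labels are used anywhere in this procedure, it matches the zero-shot framing in the abstract: only basic emotion models and fixed pairings are required.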