Sound-IT

Sound-IT is a research project that uses artificial intelligence to recognize the emotions in a video and generate music that fits them. We built four models to determine the emotional tone of a video: facial expression detection, background hue recognition, body emotion recognition, and lip reading. The models run in parallel, so inference may be slow on a CPU-only machine. Music matching the detected emotion is then generated with Jukebox.

The proposed method offers a potential solution to the problem of high costs associated with composing and recording original music for films.

How it works

Our model is capable of recognizing a scene's dominant emotion by analyzing factors such as body language, facial expressions, lip reading, and background color. Using this emotion, the model generates a new piece of music ideally suited to the scene.
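As a rough illustration of how four detectors could run in parallel and be reduced to one dominant emotion, here is a minimal sketch. The four `predict_*` functions are hypothetical stand-ins that return fixed labels, and the majority vote is an assumption; this README does not state how Sound-IT actually combines the four outputs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the four detectors; each returns one emotion label.
def predict_face(video_path):
    return "happy"


def predict_hue(video_path):
    return "happy"


def predict_body(video_path):
    return "sad"


def predict_lips(video_path):
    return "neutral"


DETECTORS = [predict_face, predict_hue, predict_body, predict_lips]


def dominant_emotion(video_path):
    """Run all detectors concurrently and return the most frequent label."""
    with ThreadPoolExecutor(max_workers=len(DETECTORS)) as pool:
        # pool.map keeps the results in the same order as DETECTORS.
        labels = list(pool.map(lambda detect: detect(video_path), DETECTORS))
    # Assumption: simple majority vote; on a tie the label seen first wins.
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    print(dominant_emotion("clip.mp4"))
```

A weighted vote (for example, trusting the facial model more than the hue model) would be a natural variation, but the weights would have to be chosen empirically.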


Sound-IT Pipeline

[Image: Sound-IT pipeline diagram]



Sound-IT NN Architectures & Datasets

[Images: two screenshots of the network architectures and datasets]


First, we detect the emotion the video is trying to portray using four models:

facial expression detection - We first built and trained a facial emotion recognition model based on the VGG-16 architecture. We then use MediaPipe, an open-source framework for building cross-platform machine learning pipelines for perception tasks such as object detection, tracking, and facial recognition (Lugaresi et al., 2019), to locate the face in a given video, and apply the trained model to the detected face to classify its emotion.

background hue recognition - We extract the color values of every pixel in the video, average them to obtain the video's mean color, and then assign that color to a specific emotion according to 4.
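The background hue step can be sketched in a few lines. The example below averages RGB pixels and maps the resulting hue to an emotion. The `HUE_EMOTIONS` table, the saturation threshold, and the emotion names are all hypothetical placeholders, not the mapping the project actually uses.

```python
import colorsys

# Hypothetical hue-to-emotion table (upper hue bound in degrees, label).
# The project's real color-to-emotion mapping may differ.
HUE_EMOTIONS = [
    (60.0, "angry"),    # reds and oranges
    (180.0, "happy"),   # yellows and greens
    (300.0, "sad"),     # cyans and blues
    (360.0, "angry"),   # magentas wrapping back toward red
]

# Assumption: nearly gray colors carry no usable hue.
MIN_SATURATION = 0.1


def average_color(pixels):
    """Average a list of (r, g, b) tuples with channels in 0..255."""
    count = len(pixels)
    if count == 0:
        raise ValueError("no pixels to average")
    return tuple(sum(pixel[i] for pixel in pixels) / count for i in range(3))


def emotion_from_color(rgb):
    """Map an (r, g, b) color to an emotion label through its hue."""
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    if saturation < MIN_SATURATION:
        return "neutral"
    hue_degrees = hue * 360.0
    for upper_bound, emotion in HUE_EMOTIONS:
        if hue_degrees < upper_bound:
            return emotion
    return HUE_EMOTIONS[-1][1]


if __name__ == "__main__":
    mean_rgb = average_color([(0, 0, 255), (0, 0, 200)])
    print(mean_rgb, emotion_from_color(mean_rgb))
```

In practice the pixels would come from decoded video frames; averaging per frame first and then across frames gives the same mean when all frames have the same size.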


Approach for Facial Emotion Recognition

We use FER2013 as the dataset and implement the VGG16 neural network architecture.



Approach for Body Language

We collect keypoints with the MediaPipe Holistic model, then train the model on sequences of 30 frames per action.
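To make the "30 frames per action" idea concrete, the sketch below cuts a stream of per-frame keypoint vectors into fixed-length sequences. `split_into_sequences` is a hypothetical helper, and dropping the incomplete tail is an assumption; the repository may segment its recordings differently.

```python
FRAMES_PER_ACTION = 30  # sequence length stated above


def split_into_sequences(frames, length=FRAMES_PER_ACTION):
    """Cut per-frame keypoint vectors into non-overlapping sequences.

    Each returned sequence holds exactly `length` frames. A trailing
    remainder shorter than `length` is dropped (an assumption).
    """
    usable = len(frames) - len(frames) % length
    return [frames[start:start + length] for start in range(0, usable, length)]


if __name__ == "__main__":
    # 65 dummy frames, each a one-element "keypoint vector".
    dummy_frames = [[float(i)] for i in range(65)]
    sequences = split_into_sequences(dummy_frames)
    print(len(sequences), [len(seq) for seq in sequences])
```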




Approach for Lip Reading

We collect frames of the lower part of the face, then train the model on sequences of 75 frames per action.



Here are some examples of the emotions detected by our model and the corresponding music generated:

Example 1
Sound-IT Demo

Click on the thumbnail to watch the Sound-IT demo.

Getting Started

To get started with Sound-IT, clone our repository and follow the instructions in this README.

To run the code:
-First, run pip install -r requirements.txt to install all the packages.
-Then change the model paths in the following files in the UISOUND folder:
  -allinference.py
  -inference.py
  -inferenceCam.py

For example:

weights_1 = '/Users/kevynkrancenblum/Desktop/Data Science/Final Project/Facial_emotion_recognition/saved_models/vggnet.h5'
weights_2 = '/Users/kevynkrancenblum/Desktop/Data Science/Final Project/Facial_emotion_recognition/saved_models/vggnet_up.h5'

model_V1 = BodySentimentModel(body_input_shape, actions.shape[0])
model_V1.load_weights('/Users/kevynkrancenblum/Desktop/Data Science/Final Project/Body_Language_recognition/modelsSaved/BodyModelCamv1.h5')

model_V2 = BodySentimentModel(body_input_shape, actions.shape[0])
model_V2.load_weights('/Users/kevynkrancenblum/Desktop/Data Science/Final Project/Body_Language_recognition/modelsSaved/BodyModelCamv2.h5')

Change each path to the location of the model on your machine.
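One way to avoid editing absolute paths in three files is to derive every weight path from a single root directory. The sketch below is only a suggestion: the `SOUNDIT_MODELS_ROOT` environment variable, the default folder, and the `weight_path` helper are hypothetical and not part of the repository. The sub-folder and file names are the ones shown in the paths above.

```python
import os
from pathlib import Path

# Hypothetical environment variable and default folder; the repository
# does not define either of them.
MODELS_ROOT = Path(os.environ.get("SOUNDIT_MODELS_ROOT", Path.home() / "soundit_models"))


def weight_path(*parts):
    """Build a weight-file path below MODELS_ROOT."""
    return MODELS_ROOT.joinpath(*parts)


# Folder and file names taken from the example paths above.
weights_1 = weight_path("Facial_emotion_recognition", "saved_models", "vggnet.h5")
weights_2 = weight_path("Facial_emotion_recognition", "saved_models", "vggnet_up.h5")
body_v1 = weight_path("Body_Language_recognition", "modelsSaved", "BodyModelCamv1.h5")
body_v2 = weight_path("Body_Language_recognition", "modelsSaved", "BodyModelCamv2.h5")

if __name__ == "__main__":
    for path in (weights_1, weights_2, body_v1, body_v2):
        status = "found" if path.exists() else "missing"
        print(f"{status}: {path}")
```

When passing these to a loader such as `model_V1.load_weights(...)`, converting with `str(body_v1)` keeps the call compatible with APIs that expect a plain string.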

To train on your own emotions or actions based on body language: go to Body_language_recognition/streamlitRecording.py and change the code where you need to add or remove emotions or actions.

To train on your own emotions or facial micro-emotions: first add your own data, then run the model. Important: because this is micro-emotion recognition, you will need a significant amount of data.

For lip reading, consider running the code in LipReading/lipnet.ipynb to download the model weights, or download them using these lines:

import gdown

url = 'https://drive.google.com/uc?id=1YlvpDLix3S-U8fd-gqRwPcWXAXm8JwjL'
output = 'data.zip'
gdown.download(url, output, quiet=False)
gdown.extractall('data.zip')

You can also train in your own language and on your own sentences by creating your own dataset of videos with text alignments. Important: the model is a CNN+RNN architecture, which means that the recurrent neural network MUST have a predefined sentence length. Here, for example, the sentence length is 75 frames for every video and alignment; otherwise the model won't work.
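If your own clips do not all have exactly 75 frames, they need to be brought to that length before training. The sketch below shows one possible approach: truncate long clips and pad short ones. The `fit_to_length` helper and the choice to pad by repeating the last frame are assumptions; this README does not say how the project handles clips of other lengths.

```python
SENTENCE_LENGTH = 75  # fixed frame count stated above


def fit_to_length(frames, length=SENTENCE_LENGTH, pad_frame=None):
    """Return exactly `length` frames.

    Longer clips are truncated. Shorter clips are padded with `pad_frame`,
    or with a repeat of the last frame when `pad_frame` is None
    (an assumption, not necessarily what the project does).
    """
    if not frames:
        raise ValueError("cannot fit an empty clip")
    clipped = list(frames[:length])
    filler = clipped[-1] if pad_frame is None else pad_frame
    return clipped + [filler] * (length - len(clipped))


if __name__ == "__main__":
    print(len(fit_to_length(list(range(100)))))  # long clip, truncated
    print(len(fit_to_length([1, 2, 3])))         # short clip, padded
```

The text alignments need the same treatment, so that every video and its alignment describe the same fixed-length window.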

Contributing

We welcome contributions from the community. If you have any suggestions or would like to contribute, please open an issue or pull request on our GitHub repository.