This is a speech recognition project built to explore feature engineering on audio samples and Python best practices.
The challenge is to break a fictitious audio CAPTCHA formed by a sequence of four characters. The CAPTCHAs were built from audio samples recorded by volunteer students of the Universidade Federal do ABC. The samples were recorded with a variety of microphones, so expect a range of background noises. The character sequences were assembled at random, so you will find non-matching voices in the same CAPTCHA.
The proposed solution uses the Random Forest algorithm from the Scikit-learn package.
The original audio samples are not publicly available in order to preserve the privacy of the volunteers.
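As an illustration of the Random Forest approach mentioned above, here is a minimal sketch rather than the project's actual training code. Because the original recordings are not public, it uses synthetic feature vectors; the label set, feature count, and hyperparameters are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for per-character feature vectors (assumption:
# the real project extracts its features from ".wav" samples instead).
labels = np.array(list("abcd"))      # hypothetical character set
n_per_class, n_features = 50, 13     # hypothetical sizes
centers = rng.normal(scale=5.0, size=(len(labels), n_features))
X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y = np.repeat(labels, n_per_class)

# Hold out a validation split, keeping the class balance.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the forest and measure character-level accuracy on the held-out split.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
val_accuracy = clf.score(X_val, y_val)
print(f"validation accuracy: {val_accuracy:.2f}")
```

With real recordings, each row of `X` would hold the features of one spoken character and `y` would hold the character it represents.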
You must have Python 3.7 or greater and pip installed.
Install the dependencies using the requirements.txt file.
pip install -r requirements.txt
If you have a folder with ".wav" samples and would like to use it, place the samples in a "data" folder structured as follows and run the data prep script:
./data/training
./data/validation
./data/test
python data_prep.py
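Before running the data prep script, it can help to confirm that the folder layout above is in place. The helper below is a hypothetical sketch and not part of the repository; it assumes the ".wav" files sit directly inside each split folder.

```python
from pathlib import Path

# The three split folders listed above.
SPLITS = ("training", "validation", "test")


def ensure_data_layout(root="data"):
    """Create any missing split folder and count the .wav files in each.

    Hypothetical helper: it only checks the layout and does not
    replace data_prep.py.
    """
    counts = {}
    for split in SPLITS:
        split_dir = Path(root) / split
        split_dir.mkdir(parents=True, exist_ok=True)
        # Assumption: samples are stored flat, not in nested subfolders.
        counts[split] = sum(1 for _ in split_dir.glob("*.wav"))
    return counts
```

Calling `ensure_data_layout()` from the project root returns a dictionary with the number of samples found per split, so an empty split is easy to spot before training.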
To train the model, run the following command:
python train_model.py
To make predictions on the test dataset, run the following command:
python run_model.py
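Since each CAPTCHA is a sequence of four characters, predictions can be scored per character or per whole CAPTCHA. The functions below are a hypothetical sketch of both metrics; the output format of run_model.py is not documented here, so predictions are assumed to be plain strings.

```python
def captcha_accuracy(predicted, expected):
    """Fraction of CAPTCHAs whose characters are all predicted correctly."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


def character_accuracy(predicted, expected):
    """Fraction of individual characters predicted correctly."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    total = sum(len(e) for e in expected)
    if total == 0:
        return 0.0
    hits = sum(
        pc == ec
        for p, e in zip(predicted, expected)
        for pc, ec in zip(p, e)
    )
    return hits / total
```

A single wrong character fails the whole CAPTCHA, so the CAPTCHA-level score is never higher than the character-level score.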
A mel-spectrogram can be generated by running:
python generate_graphs.py
- Python - The programming language.
- Scikit-learn - Used to train the model and make predictions.
- Pandas - Used to generate DataFrames.
- Librosa - Used to manipulate the audio files and extract some features.
This project is licensed under the GNU GPLv3 License - see the LICENSE.md file for details.
- Many thanks to João Victor Fontinelle Consonni, who helped with a full report of the project.