Voice emotion conversion model for a DS/ML master's thesis. F0 contour mapping with a sequence-to-sequence RNN-LSTM architecture in TensorFlow.
- audio/ - original and transformed voice samples in WAV format
- data/ - Matlab MAT files produced by AudioSculpt, containing F0 contours and syllable/phoneme alignments
- documentation/ - PDF reports on the state of the art, aims, method, experimental setup, results and discussion
- model/ - Python code to build your own seq2seq model
- postprocessing/ - code to process the output F0 contours and apply them to neutral WAV files
- preprocessing/ - code to process the MAT files to create input data for the model
- utilities/ - code to rename WAV files for use in the online survey, and to process survey results
- visualisation/ - Jupyter notebooks to visualise the F0 contours
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Libraries:
- Python 3.6, SciPy, Matplotlib (Anaconda recommended)
- TensorFlow 1.4 or later (1.8 recommended)
- as_pysrc - collection of modules for signal processing, maintained by the AS team on forge.ircam.fr (SSH access needed)
Software:
- AudioSculpt + IrcamAlign to generate .MAT files with F0 values and syllable/phoneme alignments
- tf_seq2seq_data_processing_phoneme_to_phoneme.ipynb
- check that the phrases/emotions/intensities are set correctly
- check that the train/dev/test ratio is correct (usually 0.8/0.2/0.0)
- run all blocks to generate the syllable csv files, the combo source/target files and the vocab files
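The 0.8/0.2/0.0 split mentioned above can be sketched as follows. This is a minimal illustration, not the notebook's code: the function name `split_pairs`, the shuffling, and the fixed seed are assumptions made for the example.

```python
import random


def split_pairs(pairs, ratios=(0.8, 0.2, 0.0), seed=0):
    """Split (source, target) pairs into train/dev/test lists.

    Hypothetical helper: the notebook may split differently
    (for example without shuffling).
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(pairs)
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    # Never take more dev items than remain after the train slice.
    n_dev = min(round(len(shuffled) * ratios[1]), len(shuffled) - n_train)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # whatever is left over
    return train, dev, test
```

With the usual 0.8/0.2/0.0 ratio the test list is empty, which fits the workflow here: the test input is generated separately from the neutral source files.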
- tf_seq2seq_data_processing_test_input.ipynb
- run first section to generate the test input (all 80 neutral source files)
- put test_source.txt in the ..out/test directory (delete existing contents)
- upload files to server
- scp -r /Users/robinson/Dropbox/anasynth/_data/emoVC/Olivia2006/f0_raw_phoneme/out/* robinson@gusli:/data2/anasynth_nonbp/robinson/tf_seq2seq_data3
- tf_seq2seq_config.yml
- adjust params to fit length of source/target files (attention, max_seq_length, num_units)
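To pick a value for max_seq_length it can help to measure the longest sequence in the source/target files first. The sketch below assumes one sequence per line with whitespace-separated tokens; that file layout and the helper name `longest_sequence` are assumptions, not something the config or notebooks confirm.

```python
def longest_sequence(lines):
    """Return the largest token count found on any line.

    Assumes one sequence per line, tokens separated by whitespace.
    Returns 0 for empty input.
    """
    return max((len(line.split()) for line in lines), default=0)


def longest_in_file(path):
    """Convenience wrapper that reads a text file line by line."""
    with open(path, encoding="utf-8") as handle:
        return longest_sequence(handle)
```

Running `longest_in_file` on both the source and the target file and taking the larger result gives a lower bound for max_seq_length.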
- kick off training and inference
- login to gusli
- cd ~/code/tf_seq2seq
- activate gpu environment
- nohup bash ./tf_seq2seq_run.sh > nohup_$(date +"%Y%m%d_%H%M%S").out &
- download model directory (all checkpoints and predictions)
- scp -r robinson@gusli:/data2/anasynth_nonbp/robinson/tf_seq2seq_data/model/20180902_090436 ./20180902_090436
- change the datestamp (20180902_090436) in the command above to match the latest model directory
- download log
- scp robinson@gusli:/u/anasynth/robinson/code/tf_seq2seq/nohup_20180827_171528.out .
- move the download
- make directory /Users/robinson/Downloads/data/pred/20180712_134318/ (change date)
- make model subdirectory, and move the downloaded model folder into it, so it looks like /Users/robinson/Downloads/data/pred/20180713_153850/model/20180713_153850
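The directory layout described above (pred/DATESTAMP/model/DATESTAMP) can be created with a few lines of Python. This is only a sketch: `stage_download` is a hypothetical helper, and it assumes the downloaded folder is named after the model's datestamp.

```python
import shutil
from pathlib import Path


def stage_download(downloaded_model_dir, pred_root):
    """Move a downloaded model folder to pred_root/<stamp>/model/<stamp>.

    Assumes the folder name is the datestamp, e.g. 20180713_153850.
    """
    downloaded = Path(downloaded_model_dir)
    stamp = downloaded.name
    model_parent = Path(pred_root) / stamp / "model"
    destination = model_parent / stamp
    if destination.exists():
        # Avoid shutil.move nesting the folder inside an existing one.
        raise FileExistsError(str(destination))
    model_parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(downloaded), str(destination))
    return destination
```

Using the folder name as the datestamp means there is no date to edit by hand for each new model.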
- get predictions.txt and process it
- get from /pred/
- remove the first line of the txt file (the compute device line)
- remove the prefixes if using the conditioned model (search for a/b/c and replace with nothing)
- put predictions.txt in directory where model is saved to e.g. /Users/robinson/Downloads/data/pred/20180712_134318/
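The manual clean-up of predictions.txt described above could be scripted roughly as below. It mirrors the instructions literally: drop the first line, then delete every a, b, or c character. That is only safe if those letters never occur elsewhere in the file (for example, if the remaining tokens are purely numeric), which is an assumption here; `clean_predictions` is a hypothetical helper.

```python
def clean_predictions(text, prefixes="abc", drop_first_line=True):
    """Return the predictions text without the device line and prefixes.

    Assumes the letters in `prefixes` appear only as conditioning
    prefixes, so deleting them everywhere is harmless.
    """
    lines = text.splitlines()
    if drop_first_line:
        lines = lines[1:]
    # Translation table that deletes every character in `prefixes`.
    table = str.maketrans("", "", prefixes)
    return "".join(line.translate(table) + "\n" for line in lines)
```

For an unconditioned model, pass `prefixes=""` so only the first line is removed.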
- tf_seq2seq_data_processing_make_new_contours.ipynb
- edit datetimestamp in directory path to match where you put predictions.txt
- run to generate new phoneme files from predictions.txt for conversion operations to follow
- tf_seq2seq_f0_transform_phoneme_nosyll.ipynb
- copy the subdirectories of '/Users/robinson/Downloads/data/Olivia2006/Olivia2006_AUDIO' to the previously saved model directory, e.g. /Users/robinson/Downloads/data/pred/20180712_134318/
- set params at the top
- edit the datestamp folder name in the directory path at the top
- change emotion to JOY, COL or other emotion tag
- set interp = True
- set harmonicity threshold, harm_thresh = 0.7
- run all blocks
- listen to the audio in the compare directory, and select the best ones
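The notebook's handling of interp and harm_thresh is not described here. One plausible reading, sketched below purely as an assumption, is that frames whose harmonicity falls below harm_thresh are treated as unreliable and their F0 values are replaced by linear interpolation between the nearest reliable frames. The function name and the per-frame list inputs are hypothetical.

```python
from bisect import bisect_left


def interpolate_unvoiced(f0, harmonicity, harm_thresh=0.7):
    """Linearly interpolate F0 over frames with low harmonicity.

    Assumption: a frame is reliable when harmonicity >= harm_thresh.
    Frames before the first or after the last reliable frame take the
    nearest reliable value.
    """
    if len(f0) != len(harmonicity):
        raise ValueError("f0 and harmonicity must have the same length")
    reliable = [i for i, h in enumerate(harmonicity) if h >= harm_thresh]
    if not reliable:
        return [float(v) for v in f0]  # nothing to anchor on
    out = []
    for i, value in enumerate(f0):
        pos = bisect_left(reliable, i)
        if pos < len(reliable) and reliable[pos] == i:
            out.append(float(value))  # reliable frame: keep as is
        elif pos == 0:
            out.append(float(f0[reliable[0]]))
        elif pos == len(reliable):
            out.append(float(f0[reliable[-1]]))
        else:
            left, right = reliable[pos - 1], reliable[pos]
            step = (i - left) * (f0[right] - f0[left]) / (right - left)
            out.append(f0[left] + step)
    return out
```

Under this reading, a higher harm_thresh leaves fewer frames untouched and smooths more of the contour, so the value is worth comparing by ear against the audio in the compare directory.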
- tf_seq2seq_data_comparison_phrase_phoneme_nosyll.ipynb
- run this to generate F0 contour figures in f0compare directory, for visual inspection of results
- edit the datestamp folder name in syll_input_directory_path, e.g. '/Users/robinson/Downloads/data/pred/20180712_134318/syllables'
- Carl Robinson - Master's Student - Voice Tech Podcast
- Nicolas Obin - Supervisor - IRCAM - CNRS - Sorbonne Université
This project is licensed under the MIT License. See the LICENSE.md file for details.
This project would not have been possible without the generous help and support of many talented people. I would like to thank the following people for giving me their time, for sharing their knowledge and ideas, and for their continued encouragement:
- Nicolas Obin
- Axel Roebel, Guillaume Doras and Rafael Ferro
- Sylvie Thiria, Cecile Mallet, Sonia Garcia, Nesma Houmani, and Jerome Boudy
- Eric Bolo and the Batvoice team
- Veronique Sieng