MCD DTW tutorial #96
base: main
tts-evaluation-MCD-DTW.ipynb
Outdated
"## Conclusion\n",
"<img src=\"imgs/riva-tts-MCD_DTW_final_comparision.jpeg\">\n",
"\n",
"From the graph above the value of MCD is greater for radtts audios than radtts audios, this is also reflected in the average MCD value for both models. Therefore we can conclude that fastpitch has better convergence than radtts. However we cannot evaluate the quality of audios generated by these models using MCD. MCD is a great tool for testing model convergence, but generated audios may have pronunciation and quality artefacts. Therefore MCD evaluation should be followed by a MOS(Mean opinion score) and CMOS(Comparative mean opinion scores) evaluation."
Can you clarify this sentence?
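For context on the quantity being compared in that paragraph, here is a minimal sketch of a DTW-aligned MCD between a reference and a synthesized utterance. It is not the notebook's implementation: the function names `dtw_path` and `mcd_dtw` are hypothetical, the inputs are assumed to be MFCC matrices of shape `(n_mfcc, frames)`, coefficient 0 is assumed to be excluded (a common convention), and the alignment is a plain NumPy DTW rather than a library call.

```python
import numpy as np

# Standard MCD scaling factor, 10 / ln(10); the sqrt(2) term is applied per frame below.
MCD_CONST = 10.0 / np.log(10.0)


def dtw_path(cost):
    """Return the minimum-cost monotonic path through a (T1, T2) cost matrix."""
    t1, t2 = cost.shape
    acc = np.full((t1 + 1, t2 + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev

    # Backtrack from the last cell to the first, always taking the cheapest predecessor.
    i, j = t1, t2
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    return path[::-1]


def mcd_dtw(ref_mfcc, syn_mfcc):
    """Mean MCD over the DTW path; inputs are (n_mfcc, frames) arrays."""
    # Assumption: drop coefficient 0, which mostly carries overall energy.
    ref = ref_mfcc[1:].T
    syn = syn_mfcc[1:].T
    # Pairwise per-frame distortion between every reference and synthesized frame.
    diff = ref[:, None, :] - syn[None, :, :]
    frame_cost = MCD_CONST * np.sqrt(2.0 * np.sum(diff ** 2, axis=-1))
    path = dtw_path(frame_cost)
    return float(np.mean([frame_cost[i, j] for i, j in path]))
```

Under these assumptions a lower value means the synthesized cepstra sit closer to the reference after alignment, so the sentence in the notebook should name which model has the lower average. The pairwise cost matrix is quadratic in the number of frames, which is fine for short utterances but may need a banded DTW for long ones.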
tts-evaluation-MCD-DTW.ipynb
Outdated
"sr = 22050\n",
"\n",
"## Mfcc params\n",
"n_mfcc=n_mels"
Can you set n_mfcc to 34?
tts-evaluation-MCD-DTW.ipynb
Outdated
"source": [
"def mel2mfcc(mels):\n",
"    mfcc = librosa.feature.mfcc(S=mels, n_mfcc=n_mfcc)\n",
"    mfcc = librosa.power_to_db(mfcc, ref=np.max)\n",
Drop the power_to_db
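A possible shape for the revised helper, as a sketch rather than a drop-in replacement. As far as I understand, `librosa.feature.mfcc(S=...)` treats `S` as a log-power mel spectrogram and applies an orthonormal DCT-II along the mel axis, so the result is already cepstral and a further dB conversion is not meaningful. The version below does that DCT directly in NumPy so it runs without librosa; `dct2_ortho_matrix` is a hypothetical helper, `N_MFCC = 34` is the value requested in the review above, and the input is assumed to be log-scaled mels of shape `(n_mels, frames)`.

```python
import numpy as np

N_MFCC = 34  # value requested in review; must not exceed the number of mel bands


def dct2_ortho_matrix(n_out, n_in):
    """First n_out rows of the orthonormal DCT-II basis for length-n_in inputs."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * (2 * n + 1) * k / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)  # orthonormal scaling of the DC row
    return basis


def mel2mfcc(log_mels, n_mfcc=N_MFCC):
    """Convert a (n_mels, frames) log-mel spectrogram to (n_mfcc, frames) MFCCs."""
    n_mels = log_mels.shape[0]
    if n_mfcc > n_mels:
        raise ValueError("n_mfcc cannot exceed the number of mel bands")
    # No power_to_db here: the DCT output is the final cepstral representation.
    return dct2_ortho_matrix(n_mfcc, n_mels) @ log_mels
```

If the model outputs linear-power mels instead of log mels, the log should be taken before the DCT, not after it.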
@@ -0,0 +1,495 @@
{
How did you get the mel spectrograms from the models?
For radTTS I used: https://github.com/NVIDIA/radtts
For fastpitch, I used the method described here: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_en_fastpitch
Do you have a code snippet that you can add to the notebook?
I can add one for fastpitch. For radTTS I had to make some changes to their inference script, so that won't be possible in the notebook.
Mcd tutorial