Any tips on fixing alignment problems #654
-
Hello! I'm training a model using Tacotron. This is pretty much my first crack at this (with Coqui), but I'm wondering if there are tips on avoiding alignment problems. The voice sounds vaguely human after a few hundred epochs, but the alignment looks like this, and I think that's reflected in the sound. Has anyone made a checklist of common things to check when alignment is bad?

My audio data has been manually broken up (by humans) into utterances. (I have about 750 utterances total at the moment, but I'm expecting more.) One thing I could do, though it would take a bit of work, is to run the whole dataset through another model I already have, which would let me clip the utterances more precisely around the spoken words. Would that be likely to help? (It's a non-trivial extra step.)

Alternatively, I'm wondering if I should turn off `do_trim_silence`. These recordings are good quality and made in a studio, so perhaps silence trimming is not necessary? Current settings as follows:
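For anyone unsure what `do_trim_silence` actually does to the clips: below is a minimal, illustrative sketch of threshold-based trimming. The real Coqui implementation works on framed dB energy (via `librosa.effects.trim` with a configurable `trim_db`); the function name and the per-sample amplitude threshold here are assumptions for demonstration only.

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading/trailing low-amplitude samples.

    Hypothetical stand-in for what `do_trim_silence` does conceptually:
    the real trimmer operates on frame energy in dB, not raw amplitude.
    """
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]

# Near-silent padding at both ends is removed:
clip = [0.0, 0.002, 0.5, -0.4, 0.3, 0.001, 0.0]
print(trim_silence(clip))  # [0.5, -0.4, 0.3]
```

If your studio recordings are already tightly clipped around the speech, trimming mostly becomes a no-op, but an aggressive threshold could eat soft onsets, which is one reason people experiment with disabling it.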
Replies: 2 comments, 2 replies
-
Any help is much appreciated in advance.
https://tts.readthedocs.io/en/latest/faq.html