
Integration Tacotron #2

Open
twidddj opened this issue Apr 10, 2018 · 4 comments
Labels
help wanted Extra attention is needed

Comments

@twidddj (Owner) commented Apr 10, 2018

So far, I haven't been able to get a model whose attention works with reduction factor = 1. If we use a factor > 1, the prediction looks like the image below, which is bad news for WaveNet performance.
[Image: teacher_forced_mel_prediction]

Here, the original mel spectrogram is:
[Image: true_mel]

You can also find some discussion of this issue in @Rayhane-mamah's and @keithito's repos.
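For anyone not familiar with the reduction factor: in Tacotron each decoder step emits r mel frames at once, which is why r > 1 makes alignment easier to learn but coarsens the predicted mels. A toy numpy sketch of the shape bookkeeping (not this repo's code, just an illustration):

```python
import numpy as np

# Toy illustration of the reduction factor r: each decoder step emits
# r mel frames at once, so T decoder steps produce T * r frames.
r = 5                    # reduction factor under discussion (r = 1 vs r > 1)
num_mels = 80            # mel channels
decoder_steps = 200      # decoder time steps

# Decoder output: one vector of r * num_mels values per step ...
decoder_out = np.random.randn(decoder_steps, r * num_mels)
# ... reshaped into a mel spectrogram of decoder_steps * r frames.
mel = decoder_out.reshape(decoder_steps * r, num_mels)
print(mel.shape)   # (1000, 80)
```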

@twidddj changed the title from "Integration" to "Integration Tacotron and vocoder" on Apr 10, 2018
@twidddj changed the title from "Integration Tacotron and vocoder" to "Integration Tacotron and" on Apr 10, 2018
@twidddj changed the title from "Integration Tacotron and" to "Integration Tacotron and Wavenet vocoder" on Apr 10, 2018
@twidddj changed the title from "Integration Tacotron and Wavenet vocoder" to "Integration Tacotron" on Apr 10, 2018
@Rayhane-mamah

Hi @twidddj, thanks for sharing your work!

I am assuming you trained the WaveNet vocoder on ground-truth mels? Did you try training it on ground-truth-aligned (GTA) samples generated with the Tacotron (r > 1) model?
In the best case, the WaveNet will learn to map mels correctly despite the noise in them. If that doesn't work, we'll try to figure out why attention isn't working with r=1. (I haven't tested it yet; I get my GPU this week, so I'll tell you how it goes.)
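For context, here is a minimal sketch of what GTA data generation means (hypothetical function names, not the actual Tacotron-2 or this repo's API): run the trained Tacotron over the training set in teacher-forcing mode so its predicted mels stay time-aligned with the audio, then train WaveNet on those predictions instead of the ground-truth mels.

```python
# Hypothetical sketch of GTA (ground-truth-aligned) data generation.
# Teacher forcing keeps the predicted mels aligned frame-for-frame with the
# ground-truth audio, so WaveNet can be trained on (audio, predicted_mel) pairs.
def generate_gta_mels(tacotron_model, dataset):
    gta_pairs = []
    for text, audio, true_mel in dataset:
        # Decoder is conditioned on the ground-truth frames, so the output
        # has exactly the same length/alignment as true_mel.
        pred_mel = tacotron_model.teacher_forced_predict(text, true_mel)
        gta_pairs.append((audio, pred_mel))
    return gta_pairs
```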

@twidddj (Owner, Author) commented Apr 10, 2018

Hi @Rayhane-mamah, welcome!

Yes, you are right. It was trained on ground-truth mels, not GTA. I haven't tested GTA yet, but I plan to do so next week using @keithito's pretrained model (r=5). It would be very helpful if you let me know how it goes on your side; we might both get something useful out of it.

@Rayhane-mamah commented Apr 10, 2018 via email

@twidddj added the "help wanted" label on Apr 11, 2018
@twidddj (Owner, Author) commented Apr 23, 2018

We have tried a couple of things for this issue.

  • Tested Rayhane-mamah's Tacotron-2 with r=1. Its attention works, and the intelligibility of the TTS is remarkably improved compared to the previous version. However, there is another issue he has reported. We believe that problem will be solved soon. Thanks!
  • Tested our vocoder on mel spectrograms computed the same way as in the Tacotron 2 paper (fft_size 2048, hop_size 300, window_size 1300 at a 24 kHz sample rate); a rough sketch of that extraction is shown below. Although it seems to require more training steps (over 1000K) than r9y9's setting, it works too. Thanks to @Ondal90!
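For reference, a minimal sketch of extracting mels with those STFT settings using librosa (the log compression and parameter defaults here are my assumptions, not necessarily the exact code that was used):

```python
import librosa
import numpy as np

# Sketch of mel extraction with the settings mentioned above:
# fft_size=2048, hop_size=300, window_size=1300, sample rate 24 kHz.
def extract_mel(wav_path, sr=24000, n_fft=2048, hop_length=300,
                win_length=1300, n_mels=80):
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length,
        win_length=win_length, n_mels=n_mels)
    # Log-compress; the exact normalization (dB range, clipping) varies by repo.
    return np.log(np.maximum(mel, 1e-5)).T   # shape: (frames, n_mels)
```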
