
Integration Tacotron #2

Open
twidddj opened this issue Apr 10, 2018 · 4 comments
Labels
help wanted Extra attention is needed

Comments

@twidddj (Owner) commented Apr 10, 2018

So far, I haven't been able to get a model whose attention works with reduction factor = 1. If we use a factor > 1, the prediction looks like the image below, which is bad news for WaveNet performance.
[Image: teacher_forced_mel_prediction]

Here, the original mel spectrogram is:
[Image: true_mel]

You can also find some discussion of this issue in @Rayhane-mamah's and @keithito's repos.
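For anyone not familiar with the reduction factor: in Tacotron each decoder step emits r mel frames at once, which is why r > 1 makes alignment easier to learn but coarsens the predicted mels. A toy numpy sketch of the shape bookkeeping (not this repo's code, just an illustration):

```python
import numpy as np

# Toy illustration of the reduction factor r: each decoder step emits
# r mel frames at once, so T decoder steps produce T * r frames.
r = 5                    # reduction factor under discussion (r = 1 vs r > 1)
num_mels = 80            # mel channels
decoder_steps = 200      # decoder time steps

# Decoder output: one vector of r * num_mels values per step ...
decoder_out = np.random.randn(decoder_steps, r * num_mels)
# ... reshaped into a mel spectrogram of decoder_steps * r frames.
mel = decoder_out.reshape(decoder_steps * r, num_mels)
print(mel.shape)   # (1000, 80)
```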

@twidddj changed the title from "Integration" to "Integration Tacotron and vocoder" on Apr 10, 2018
@twidddj changed the title from "Integration Tacotron and vocoder" to "Integration Tacotron and" on Apr 10, 2018
@twidddj changed the title from "Integration Tacotron and" to "Integration Tacotron and Wavenet vocoder" on Apr 10, 2018
@twidddj changed the title from "Integration Tacotron and Wavenet vocoder" to "Integration Tacotron" on Apr 10, 2018
@Rayhane-mamah

Hi @twidddj, thanks for sharing your work!

I am assuming you trained the WaveNet vocoder on ground-truth mels? Did you try training it on ground-truth-aligned (GTA) samples generated with the Tacotron (r > 1) model?
In the best case, the WaveNet will learn to map mels correctly despite the noise in them. If that doesn't work, we'll try to figure out why attention isn't working with r=1. (I haven't tested it yet; I get my GPU this week, so I'll tell you how it goes.)
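For context, here is a minimal sketch of what GTA data generation means (hypothetical function names, not the actual Tacotron-2 or this repo's API): run the trained Tacotron over the training set in teacher-forcing mode so its predicted mels stay time-aligned with the audio, then train WaveNet on those predictions instead of the ground-truth mels.

```python
# Hypothetical sketch of GTA (ground-truth-aligned) data generation.
# Teacher forcing keeps the predicted mels aligned frame-for-frame with the
# ground-truth audio, so WaveNet can be trained on (audio, predicted_mel) pairs.
def generate_gta_mels(tacotron_model, dataset):
    gta_pairs = []
    for text, audio, true_mel in dataset:
        # Decoder is conditioned on the ground-truth frames, so the output
        # has exactly the same length/alignment as true_mel.
        pred_mel = tacotron_model.teacher_forced_predict(text, true_mel)
        gta_pairs.append((audio, pred_mel))
    return gta_pairs
```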

@twidddj (Owner, Author) commented Apr 10, 2018

Hi @Rayhane-mamah, welcome!

Yes, you are right. It was trained on ground-truth mels, not GTA. I haven't tested GTA yet, but I plan to do so next week using @keithito's pretrained model (r=5). It would be very helpful if you let me know how it goes on your side; we might both get something useful out of it.

@Rayhane-mamah commented Apr 10, 2018 via email

@twidddj added the "help wanted" label on Apr 11, 2018
@twidddj (Owner, Author) commented Apr 23, 2018

We have tried a couple of things for this issue.

  • Tested Rayhane-mamah's Tacotron-2 with r=1. Its attention works, and the intelligibility of the TTS is remarkably improved compared to the previous version. However, there is another issue he has reported. We believe that problem will be solved soon. Thanks!
  • Tested our vocoder on mel spectrograms computed the same way as in the Tacotron 2 paper (fft_size 2048, hop_size 300, window_size 1300 at a 24 kHz sample rate); a rough sketch of that extraction is shown below. Although it seems to require more training steps (over 1000K) than r9y9's setting, it works too. Thanks to @Ondal90!
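For reference, a minimal sketch of extracting mels with those STFT settings using librosa (the log compression and parameter defaults here are my assumptions, not necessarily the exact code that was used):

```python
import librosa
import numpy as np

# Sketch of mel extraction with the settings mentioned above:
# fft_size=2048, hop_size=300, window_size=1300, sample rate 24 kHz.
def extract_mel(wav_path, sr=24000, n_fft=2048, hop_length=300,
                win_length=1300, n_mels=80):
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length,
        win_length=win_length, n_mels=n_mels)
    # Log-compress; the exact normalization (dB range, clipping) varies by repo.
    return np.log(np.maximum(mel, 1e-5)).T   # shape: (frames, n_mels)
```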
