
Generated wave was empty #122

Open
frozen-finger opened this issue Apr 26, 2019 · 8 comments

@frozen-finger

I have trained this for over 23k steps, but when I run synthesis.py the result sounds empty, even though the generated mag looks normal. Can anyone tell me how to solve this problem?
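
A quick sanity check to confirm the output really is silence rather than a playback problem (a sketch only; the "samples/" output directory is an assumption, adjust to your setup):

```python
# Sketch: is the generated wav actually silent?
import glob
import numpy as np
import librosa

for path in glob.glob("samples/*.wav"):
    y, _ = librosa.load(path, sr=None)                      # load at native rate
    print(path, "peak amplitude:", float(np.abs(y).max()))  # ~0.0 means true silence
```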

@frozen-finger
Author

Sorry, this question may seem stupid, but when I change is_training to True the output is no longer just silence, although I still cannot understand what it says. So, is this related to batch normalization? @Kyubyong
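
For context, here is a minimal sketch of how a training flag changes batch-norm behaviour in TensorFlow 1.x; the names below are illustrative and not necessarily how this repo wires up its is_training hyperparameter:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 80])          # dummy feature batch
training = tf.placeholder_with_default(False, [])   # inference by default

# training=True  -> normalize with the current mini-batch statistics
# training=False -> normalize with the moving averages accumulated during training
y = tf.layers.batch_normalization(x, training=training)

# The moving averages only get updated if these ops run alongside the optimizer;
# if they never run, inference-mode output can come out as near-silent garbage.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
```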

@nevercast

You're going to need to train for at least 150,000 steps, I'd imagine. See the pretrained models.

@frozen-finger
Author

> You're going to need to train for at least 150,000 steps, I'd imagine. See the pretrained models.

Thank you for your advice. Can I ask how many steps you trained for, and how the result sounded?

@xiawenxing

I ran into this problem as well. But even after setting is_training to True, the audio produced in synthesize mode is still far worse than in train mode.

@giridhar-pamisetty

@frozen-finger How did you solve this problem? Can you please explain?

@TheNarrator

TheNarrator commented Jun 10, 2020

The difference in quality between audio generated during training and during inference is because your model hasn't learned "attention" yet. Make sure to look at the attention plots like the one here. If your model is learning attention, you should start to see a more or less diagonal line. This is also why @nevercast suggested you train for many more steps. Most of my training runs start producing decent attention plots around 60k steps.
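
For reference, a small sketch of dumping the attention alignment with matplotlib; `alignments` is assumed to be a (decoder_steps, encoder_steps) numpy array fetched from the model, and the exact tensor name depends on your version of the code:

```python
import matplotlib.pyplot as plt

def plot_alignment(alignments, path="alignment.png"):
    fig, ax = plt.subplots()
    im = ax.imshow(alignments.T, aspect="auto", origin="lower")
    ax.set_xlabel("Decoder timestep")
    ax.set_ylabel("Encoder timestep (text)")
    fig.colorbar(im, ax=ax)
    fig.savefig(path)
    plt.close(fig)
# A healthy model shows a roughly diagonal band: each output frame attends
# to a slowly advancing span of input characters.
```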

If your dataset has stretches of silence at the start or end of the audio files, trimming them would greatly help with this problem.
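
A minimal sketch of trimming that silence before feature extraction, assuming librosa/soundfile are used for audio I/O (top_db=30 is just a starting guess to tune):

```python
import librosa
import soundfile as sf

y, sr = librosa.load("sample.wav", sr=22050)
y_trimmed, _ = librosa.effects.trim(y, top_db=30)   # drop quiet edges
sf.write("sample_trimmed.wav", y_trimmed, sr)
```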

@giridhar-pamisetty

@TheNarrator Thanks for the response.

@nevercast @frozen-finger @candlewill @Kyubyong
The attention plots look diagonal after 50k steps, so it seems the model has learned attention, though maybe more steps are needed.

There seems to be a problem with the predicted mel (mel_hat) in synthesis.py: as a check, I fed the original mel extracted from the wav file into mel_hat instead of predicting it from the model, and that gives a perfect, clean-sounding result.

So I think the mel_hat prediction is going wrong. Will it improve after more steps?
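
This is roughly the check I mean (a sketch only; `load_spectrograms`, `g.Y`, and `g.Z` are placeholder names, not necessarily the repo's exact identifiers):

```python
# Bypass the Text2Mel prediction and feed the ground-truth mel into the
# mel-to-mag (SSRN) stage.
import numpy as np

def mag_from_gt_mel(sess, g, wav_path, load_spectrograms):
    _, mel_gt, _ = load_spectrograms(wav_path)   # ground-truth mel from the wav
    mel_gt = np.expand_dims(mel_gt, 0)           # add a batch dimension
    return sess.run(g.Z, {g.Y: mel_gt})          # run only the mel->mag network
# If this path sounds clean while the full pipeline does not, the problem is in
# the Text2Mel prediction (attention/training), not in the vocoder stage.
```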

@xiawenxing

> @TheNarrator Thanks for the response.
>
> @nevercast @frozen-finger @candlewill @Kyubyong
> The attention plots look diagonal after 50k steps, so it seems the model has learned attention, though maybe more steps are needed.
>
> There seems to be a problem with the predicted mel (mel_hat) in synthesis.py: as a check, I fed the original mel extracted from the wav file into mel_hat instead of predicting it from the model, and that gives a perfect, clean-sounding result.
>
> So I think the mel_hat prediction is going wrong. Will it improve after more steps?

I'm hitting the same problem as you: mel_gt and mag_gt are correct, but the mel_hat and mag_hat predictions go wrong and the synthesized audio is empty. Have you fixed it?
