
Loss exploded??? #2

Open
WhiteFu opened this issue Apr 20, 2019 · 9 comments

WhiteFu commented Apr 20, 2019

I get the error "Loss exploded" during the training stage.
I haven't modified the original hyperparameters, and I want to know how to solve this problem.

@adimukewar

I am facing the same issue. Please let me know if you have resolved it.

rishikksh20 (Owner) commented May 7, 2019

Actually, it is a common occurrence when dealing with a variational autoencoder. There are two ways to resolve it:

  1. Restart training from a checkpoint saved 3 or 4 saves back (not the most recent one). But be prepared: the loss may explode again after running for a while; if it does, repeat the same process.
  2. In the file train.py, on line 133, change the upper limit used in the loss check (see the sketch after this comment).

@adimukewar @WhiteFu
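
For reference, here is a minimal sketch of the kind of loss-explosion guard that Tacotron-style training scripts typically place around line 133 of train.py; the threshold constant and names below are assumptions for illustration, not this repository's exact code:

```python
import math

# Assumed threshold: raising this value makes the guard more tolerant of loss spikes.
LOSS_EXPLOSION_THRESHOLD = 100.0

def check_loss(loss_value, step):
    """Abort training when the loss becomes NaN or exceeds the threshold."""
    if math.isnan(loss_value) or loss_value > LOSS_EXPLOSION_THRESHOLD:
        raise RuntimeError(
            'Loss exploded to {:.5f} at step {}'.format(loss_value, step))

# Example: check_loss(2.31, step=12000) passes silently,
# while check_loss(float('nan'), step=12001) raises RuntimeError.
```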

rishikksh20 self-assigned this May 7, 2019
WhiteFu (Author) commented May 7, 2019

Thanks for your reply, I will try it immediately!

WhiteFu (Author) commented May 7, 2019

> I am facing the same issue. Please let me know if you have resolved it.

Sorry, I didn't reply to you in time. I have been working on some other things recently, so I haven't solved this problem yet.

@rishikksh20 (Owner)

@WhiteFu if you are using this code, then use a large (more than 50 hours), expressive dataset like Blizzard to get a decent result.

@MisakaMikoto96

> I am facing the same issue. Please let me know if you have resolved it.
>
> Sorry, I didn't reply to you in time. I have been working on some other things recently, so I haven't solved this problem yet.

Hi, I have the same problem. I tried modifying some hparams, but it still doesn't work. Please let me know if you have solved this. Thanks 😄

WhiteFu (Author) commented Jun 19, 2019

The loss is not stable, so you can modify the upper limit used in the loss check in the file train.py on line 133.


MisakaMikoto96 commented Jun 20, 2019

> The loss is not stable, so you can modify the upper limit used in the loss check in the file train.py on line 133.

Hi, but it seems my loss = NaN (every time at the same step during training), and I tried modifying the batch size and the learning rate, but it still doesn't work.

@rishikksh20 (Owner)

@MisakaMikoto96 Regarding the NaN loss: it means your variational autoencoder (VAE) is unable to learn the latent representation. This is a common problem when dealing with a variational autoencoder, but the sad part is that there is no simple solution for it.
One solution you can try is to go to that line and manipulate w1 and w2 (see the sketch below).
But before that, make sure you have an adequate quantity of expressive voice data. Also, after getting the error, restarting training from a checkpoint saved 2 saves back has sometimes worked fine for me; if you get the error again at the same checkpoint, restart from 3 saves back, and so on. If you keep getting NaN at the same step count, then try the above solution.
You can also read the variational autoencoder paper for more understanding; otherwise, feel free to ask here.
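
To illustrate the w1/w2 suggestion, here is a minimal, framework-agnostic sketch of a weighted VAE objective with KL annealing; the names w1, w2, kl_anneal_steps, and the schedule itself are assumptions for illustration, not the exact code in this repository's train.py:

```python
def combined_vae_loss(recon_loss, kl_loss, step,
                      w1=1.0, w2=1.0, kl_anneal_steps=10000):
    """Weight the reconstruction and KL terms of a VAE objective.

    Ramping the KL weight up gradually (KL annealing) is a common way to keep
    the KL term from destabilising training early on; lowering w2 reduces the
    pressure on the latent space, which can help avoid NaN losses.
    """
    kl_weight = w2 * min(1.0, step / float(kl_anneal_steps))
    return w1 * recon_loss + kl_weight * kl_loss

# Example: at step 500 the KL term contributes only 5% of its full weight:
# combined_vae_loss(recon_loss=0.8, kl_loss=4.2, step=500) == 0.8 + 0.05 * 4.2
```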
