How long does it take to train? #4
Comments
Hi, I'm also trying to train this repo.
Similar results after 145k steps on CIFAR. I wonder if it is harder to train than a GAN, or if it is just not stable enough yet...
@ariel415el The loss plateaued in the figure I showed, if my memory serves me well. I forget some details of the experiment; it ran for roughly 36-48 hours on one 2080 Ti. The batch size was 32 with fp16, and the U-Net dim was 64.
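For reference, a minimal sketch of what that kind of configuration looks like with lucidrains' denoising-diffusion-pytorch API. The folder path and step count are placeholders, and some argument names (e.g. `fp16`, `loss_type`) come from older releases of the library, so check them against your installed version:

```python
# Sketch of the setup described above (dim-64 U-Net, batch 32, mixed precision).
# Argument names follow older releases of denoising-diffusion-pytorch; newer
# versions rename fp16 -> amp and may drop loss_type.
from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

model = Unet(
    dim=64,                      # base channel width mentioned above
    dim_mults=(1, 2, 4, 8),
)

diffusion = GaussianDiffusion(
    model,
    image_size=32,               # CIFAR-10 resolution
    timesteps=1000,
    loss_type='l1',              # noise-prediction loss discussed later in the thread
)

trainer = Trainer(
    diffusion,
    'path/to/cifar10/images',    # placeholder: folder of training PNGs
    train_batch_size=32,
    train_lr=2e-5,
    train_num_steps=700000,
    gradient_accumulate_every=2,
    ema_decay=0.995,
    fp16=True,                   # mixed precision, as in the 2080 Ti run above
)

trainer.train()
```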
@IceClear Do you mind sharing some sample images, if you can?
I've been training using this repo and am getting (very) good results on 256x256 images after around 800,000 global steps (batch size of 16). Score-based models are known to take more compute to train than a comparable GAN, so perhaps more training time is required in your cases?
Thanks @Smith42,
@ariel415el Unfortunately I can't share the results just yet, but I should have a preprint out soon that I can share.
The loss didn't seem to plateau for me until very late in the training cycle, but this is with training on a dataset with on the order of 10^6 examples.
On a single V100 it took around two weeks of training.
@IceClear @ariel415el This is the FID curve on CIFAR-10 for 1k sampled images.
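As an aside, a minimal sketch of how such an FID-versus-training-steps point can be computed with torchmetrics, assuming `diffusion` is the trained model and that `sample()` returns images in [0, 1] as in lucidrains' implementation (adjust the rescaling if your fork differs):

```python
# Sketch: FID against 1k generated samples using torchmetrics (placeholder batch
# sizes; `diffusion` is the trained GaussianDiffusion object from the training script).
import torch
from torchvision import datasets
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)        # Inception pool3 features

# Real reference images: CIFAR-10 train set as uint8 (N, 3, 32, 32)
cifar = datasets.CIFAR10('data/', train=True, download=True)
real_images = torch.from_numpy(cifar.data[:1000]).permute(0, 3, 1, 2)
fid.update(real_images, real=True)

with torch.no_grad():
    for _ in range(1000 // 50):                     # 1k samples in batches of 50
        fake = diffusion.sample(batch_size=50)      # assumed float images in [0, 1]
        fake_uint8 = (fake * 255).clamp(0, 255).to(torch.uint8).cpu()
        fid.update(fake_uint8, real=False)

print('FID @ 1k samples:', fid.compute().item())
```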
@Sumching, @qshzh, @IceClear, @ariel415el, @Smith42 How low are your training losses? In my case, the noise-prediction losses are in the several hundreds to thousands. Is this right?
That's way too high; I'm getting sub-0.1 once fully trained. Have you checked your normalisations?
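Losses in the hundreds usually point to un-normalised inputs. In recent versions of lucidrains' code the model expects float images in [0, 1] and rescales them to [-1, 1] internally (older versions may differ), so feeding raw uint8 pixels in [0, 255] will blow the noise-prediction loss up. A quick sanity check, with placeholder dataset/transform choices:

```python
# Sketch: verify the data pipeline hands the model float images in [0, 1].
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),        # uint8 [0, 255] -> float [0, 1]
])

dataset = datasets.CIFAR10('data/', train=True, download=True, transform=transform)
x, _ = dataset[0]
print(x.dtype, x.min().item(), x.max().item())  # expect torch.float32, ~0.0, ~1.0
assert x.max() <= 1.0 + 1e-6, "images are not normalised to [0, 1]"
```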
@Smith42
Hi @Smith42 & @jiangxiluning, when you say you get a loss below 0.1, are you using an L1 or L2 loss?
@cajoek For me, it is L1.
L1 for me too.
Thanks @jiangxiluning @Smith42! My loss unfortunately plateaus at about 0.10-0.15, so I decided to plot the mean L1 loss over one epoch versus the timestep t, and I noticed that the loss stays quite high for low values of t, as can be seen in this figure. Do you know if that is expected?
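For anyone wanting to reproduce that diagnostic, here is a rough sketch of plotting mean loss versus timestep. It assumes the diffusion object exposes `p_losses(x_start, t)` and `num_timesteps` as in lucidrains' implementation (names may differ between versions); `diffusion` and `loader` are placeholders for your trained model and a DataLoader of already-normalised images:

```python
# Sketch: mean noise-prediction loss as a function of timestep t.
import torch
import matplotlib.pyplot as plt

timesteps = diffusion.num_timesteps
t_grid = list(range(0, timesteps, max(timesteps // 50, 1)))   # ~50 probe timesteps
sums = {t: 0.0 for t in t_grid}
counts = {t: 0 for t in t_grid}

with torch.no_grad():
    for x, _ in loader:                        # one pass over the epoch (subsample if slow)
        x = x.cuda()
        for t_val in t_grid:
            t = torch.full((x.shape[0],), t_val, device=x.device, dtype=torch.long)
            loss = diffusion.p_losses(x, t)    # scalar mean loss at this fixed t
            sums[t_val] += loss.item()
            counts[t_val] += 1

mean_loss = [sums[t] / counts[t] for t in t_grid]
plt.plot(t_grid, mean_loss)
plt.xlabel('timestep t')
plt.ylabel('mean L1 loss')
plt.savefig('loss_vs_t.png')
```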
@Smith42 Would you be able to show some samples/results from training your CelebA model? It seems that a lot of other people are struggling to reproduce the results shown in the paper.
@malekinho8 I ran a fork of lucidrains' model on a large galaxy image dataset here, not on CelebA. However, the galaxy imagery is well replicated with this codebase, so I expect it will work okay on CelebA too.
@jiangxiluning Can you please share your code? I am also training on CIFAR-10 and the loss does not go below 0.7. Below is my trainer model:
Hi, I got the same problem on CIFAR-10. The model still generated failed images even after 150k steps. Did you succeed?
Hi, CIFAR-10 contains tiny pictures (32x32).
Thanks for sharing your clean implementation.
I tried it on the CelebA dataset. After 150k steps, the generated images are not as good as claimed in the paper or as the flowers you show in the README.
Is it something to do with the dataset, or do I need more time to train?