
How long does it take to train? #4

Open
qsh-zh opened this issue Feb 24, 2021 · 24 comments

Comments
@qsh-zh

qsh-zh commented Feb 24, 2021

Thanks for sharing this clean implementation.

I tried it on the CelebA dataset. After 150k steps, the generated images are not as good as those claimed in the paper or the flowers shown in the README.

Is it something to do with the dataset, or do I need more training time?

[image: generated samples after 150k steps]

@ariel415el

Hi, I'm also trying to train this repo.
What image resolution are you using?
In the paper (Appendix B) they say they trained 256x256 CelebA-HQ for 500k steps with a batch size of 64.
Did your loss plateau or is it still decreasing?
And by the way, how long did it take to train those 150k steps, and with what batch size?

@IceClear

Similar results after 145k steps on CIFAR. I wonder if it is harder to train than a GAN, or if it is just not stable enough yet...

@qsh-zh
Author

qsh-zh commented May 29, 2021

@ariel415el The loss plateaued for the figure I showed, if my memory serves me well. I forget some details of the experiment; it ran for roughly 36-48 hours on one 2080Ti. Batch size was 32 with fp16, U-Net dim 64.
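
For anyone trying to reproduce that run, here is a minimal sketch of the setup described above, assuming the denoising-diffusion-pytorch API (the paths are placeholders, the learning rate is not stated above, and argument names such as `amp` vs `fp16` differ between versions of the library):

```python
from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

model = Unet(
    dim = 64,                 # "unet dim 64"
    dim_mults = (1, 2, 4, 8)
)

diffusion = GaussianDiffusion(
    model,
    image_size = 128,         # assumed CelebA crop size; not stated above
    timesteps = 1000
)

trainer = Trainer(
    diffusion,
    'path/to/celeba',         # placeholder image folder
    train_batch_size = 32,    # batch size 32
    train_lr = 1e-4,          # assumed; value taken from a later comment in this thread
    train_num_steps = 150000,
    gradient_accumulate_every = 2,
    ema_decay = 0.995,
    amp = True                # mixed precision ("fp16")
)

trainer.train()
```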

@qsh-zh
Author

qsh-zh commented May 29, 2021

@IceClear Do you mind sharing some sample images?

@IceClear

> @IceClear Do you mind sharing some sample images?

Sure, here it is (sample 186) after 186k steps.
[image: sample-186]

@Smith42

Smith42 commented Jun 12, 2021

I've been training using this repo and am getting (very) good results on 256x256 images after around 800,000 global steps (batch size 16). Score-based models are known to take more compute to train than a comparable GAN, so perhaps more training time is required in your cases?

@ariel415el

Thanks @Smith42,
The thing is, for both me and @qsh-zh the training loss plateaus, so I'm not sure how more steps would help. Did your loss continue decreasing throughout training?
Can you share some of your result images here so that we know what to expect?
BTW, how long did you train the model? I guess it was more than 2 days.

@Smith42

Smith42 commented Jun 14, 2021

> Can you share some of your result images here so that we know what to expect?

@ariel415el Unfortunately I can't share the results just yet, but I should have a preprint out soon that I can share.

> The thing is, for both me and @qsh-zh the training loss plateaus, so I'm not sure how more steps would help. Did your loss continue decreasing throughout training?

The loss didn't seem to plateau for me until very late in the training cycle, but this is with training on a dataset with on the order of 10^6 examples.

> BTW, how long did you train the model? I guess it was more than 2 days.

On a single V100 it took around 2 weeks of training.

@qsh-zh
Author

qsh-zh commented Jun 14, 2021

@IceClear @ariel415el This is the FID curve on CIFAR-10 for 1k sampled images.
[image: FID curve on CIFAR-10]
Step 26 in the figure corresponds to 108,000 global steps. With 50k samples, the FID is 15.13.
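
For reference, here is a rough sketch of how an FID like this can be computed over 1k samples with torchmetrics; this is not necessarily the script used for the curve above, and the random tensors are stand-ins for real CIFAR-10 images and model samples:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Stand-ins: replace with real CIFAR-10 images and 1k samples from the EMA model,
# both as uint8 tensors of shape (N, 3, H, W).
real_images = torch.randint(0, 256, (1000, 3, 32, 32), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (1000, 3, 32, 32), dtype=torch.uint8)

fid = FrechetInceptionDistance(feature=2048)
fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print(fid.compute().item())
```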

@Sumching

The image size is 256, the batch size is 32, and after 480k steps the results still do not look good.
[image: generated samples after 480k steps]

@gwang-kim

gwang-kim commented Sep 8, 2021

@Sumching @qsh-zh @IceClear @ariel415el @Smith42 How low are your training losses? In my case, the noise-prediction losses are in the hundreds to thousands. Is this right?

@Smith42

Smith42 commented Sep 23, 2021

> @Sumching @qsh-zh @IceClear @ariel415el @Smith42 How low are your training losses? In my case, the noise-prediction losses are in the hundreds to thousands. Is this right?

That's way too high; I'm getting below 0.1 once fully trained. Have you checked your normalisations?
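
A quick way to check: with torchvision's `ToTensor` the training images should come out as floats in [0, 1]; a loss in the hundreds usually means the inputs are still on a 0-255 scale. A minimal sketch, with a placeholder image folder:

```python
from torchvision import datasets, transforms

ds = datasets.ImageFolder(
    'path/to/images',                # placeholder; expects class subfolders of images
    transform=transforms.ToTensor()  # scales uint8 pixels to floats in [0, 1]
)

x, _ = ds[0]
print(x.dtype, x.min().item(), x.max().item())  # expect float32, roughly 0.0 .. 1.0
assert x.max() <= 1.0 + 1e-6, "images are not normalised to [0, 1]"
```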

@jiangxiluning

@Smith42
Hi, I trained it on CIFAR-10. The batch size is 16 and the image size is 128. The loss is about 0.05, but the generated images seem blurred.

@Smith42

Smith42 commented Mar 8, 2022

> @Smith42 Hi, I trained it on CIFAR-10. The batch size is 16 and the image size is 128. The loss is about 0.05, but the generated images seem blurred.

I use a fork of Phil's code in my paper and am not getting blurring problems. Maybe there is something up with your hyperparameters?

@cajoek

cajoek commented Mar 18, 2022

Hi @Smith42 & @jiangxiluning, when you say you get a loss below 0.1, are you using an L1 or an L2 loss?

@jiangxiluning

@cajoek for me, it is L1.

@Smith42

Smith42 commented Mar 22, 2022

L1 for me too
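
Side note: L1 and L2 values are not directly comparable in scale, since one averages absolute errors and the other squared errors. A tiny sketch of the noise-prediction loss under both norms, with stand-in tensors:

```python
import torch
import torch.nn.functional as F

noise = torch.randn(8, 3, 32, 32)              # the epsilon added in the forward process
pred  = noise + 0.1 * torch.randn_like(noise)  # stand-in for the network's prediction

print('L1:', F.l1_loss(pred, noise).item())    # L1 objective
print('L2:', F.mse_loss(pred, noise).item())   # L2 objective
```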

@cajoek

cajoek commented Mar 22, 2022

Thanks @jiangxiluning @Smith42!

My loss unfortunately plateaus at about 0.10-0.15, so I plotted the mean L1 loss over one epoch versus the timestep t and noticed that the loss stays quite high for low values of t, as can be seen in this figure. Do you know if that is expected?
[image: L1 loss vs timestep t]
(L1 loss vs timestep t after many epochs on a small dataset. Convergence has not quite been reached yet.)
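
One way to produce a plot like the one above is to evaluate the L1 noise-prediction loss at fixed timesteps instead of sampling t uniformly. A sketch, assuming access to a DDPM-style forward-noising function `q_sample(x0, t, noise)` and a noise network `eps_model(x_t, t)`; both names are illustrative rather than this repo's exact API:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def l1_loss_vs_timestep(eps_model, q_sample, x0, num_timesteps=1000, step=20):
    """Return (timesteps, mean L1 loss) for a batch x0 at evenly spaced t."""
    ts, losses = [], []
    for t_val in range(0, num_timesteps, step):
        t = torch.full((x0.shape[0],), t_val, dtype=torch.long, device=x0.device)
        noise = torch.randn_like(x0)
        x_t = q_sample(x0, t, noise)      # forward-noise x0 to step t
        pred = eps_model(x_t, t)          # predicted noise at step t
        ts.append(t_val)
        losses.append(F.l1_loss(pred, noise).item())
    return ts, losses
```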

@malekinho8

@Smith42 Would you be able to show some samples/results from training your CelebA model? It seems that a lot of other people are struggling to reproduce the results shown in the paper.

@Smith42

Smith42 commented Jun 5, 2022

> @Smith42 Would you be able to show some samples/results from training your CelebA model? It seems that a lot of other people are struggling to reproduce the results shown in the paper.

@malekinho8 I ran a fork of lucidrains' model on a large galaxy image data set here, not on CelebA. However, the galaxy imagery is well replicated with this codebase, so I expect it will work okay on CelebA too.

@DushyantSahoo

@jiangxiluning Can you please share your code? I am also training on CIFAR-10 and the loss does not go below 0.7. Below are my model and trainer:

```python
model = Unet(
    dim = 16,
    dim_mults = (1, 2, 4)
)

trainer = Trainer(
    diffusion,                     # diffusion model wrapping `model`, defined elsewhere
    new_train,                     # training data, defined elsewhere
    train_batch_size = 32,
    train_lr = 1e-4,
    train_num_steps = 500000,      # total training steps
    gradient_accumulate_every = 2, # gradient accumulation steps
    ema_decay = 0.995,             # exponential moving average decay
    amp = True                     # turn on mixed precision
)
```

@greens007

> @jiangxiluning Can you please share your code? I am also training on CIFAR-10 and the loss does not go below 0.7.

Hi, I got the same problem on CIFAR-10. The model still generates failed images even after 150k steps. Did you succeed?

@yiyixuxu

Hi, CIFAR-10 contains tiny 32x32 pictures - they are naturally going to look blurry if you resize them to 128x128.
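
If you want to try CIFAR-10 at its native resolution instead, the main change to the snippets above is the image size; a sketch under the same assumptions about the library API, with a placeholder data path:

```python
from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

model = Unet(dim = 64, dim_mults = (1, 2, 4))

diffusion = GaussianDiffusion(
    model,
    image_size = 32,           # keep CIFAR-10 at 32x32 rather than upsampling
    timesteps = 1000
)

trainer = Trainer(
    diffusion,
    'path/to/cifar10_images',  # placeholder: CIFAR-10 exported as image files
    train_batch_size = 32,
    train_lr = 1e-4,
    train_num_steps = 500000,
    gradient_accumulate_every = 2,
    ema_decay = 0.995,
    amp = True
)
```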

@177488ZL

> Thanks for sharing this clean implementation.
>
> I tried it on the CelebA dataset. After 150k steps, the generated images are not as good as those claimed in the paper or the flowers shown in the README.
>
> Is it something to do with the dataset, or do I need more training time?

Excuse me, did you modify the code or parameters during training, or load a pre-trained weight file? The loss drops to NaN during my training.
