
training stops after 4 ticks #2

Open
b4nn3d opened this issue Dec 24, 2019 · 7 comments

@b4nn3d

b4nn3d commented Dec 24, 2019

Hello there, I got your fork running on Colab, semi fine. Like I said in the title, the training stops after 4 ticks:

tick 0 kimg 0.1 lod 0.00 minibatch 32 time 58s sec/tick 57.7 sec/kimg 450.76 maintenance 0.0 gpumem 5.1
tick 1 kimg 6.1 lod 0.00 minibatch 32 time 12m 16s sec/tick 648.1 sec/kimg 107.73 maintenance 30.5 gpumem 5.1
tick 2 kimg 12.2 lod 0.00 minibatch 32 time 23m 19s sec/tick 644.3 sec/kimg 107.10 maintenance 18.0 gpumem 5.1
tick 3 kimg 18.2 lod 0.00 minibatch 32 time 34m 16s sec/tick 652.4 sec/kimg 108.45 maintenance 5.1 gpumem 5.1
^C

The ^C looks like a keyboard interrupt, but I didn't issue any such command.

@skyflynil
Owner

Did you set 'metric' to none? There can be issues if you are running the FID metric evaluation; I don't need metrics, so I did no testing or code changes for that path. BTW, I am able to train through Google Colab for > 10 ticks.

@b4nn3d
Author

b4nn3d commented Dec 24, 2019

I launched the training with this command:

!python run_training.py --result-dir=results --data-dir=datasets --dataset=blow --config=config-f --total-kimg=12000 --mirror-augment=true --metric=none --min-h=3 --min-w=3 --res-log2=7
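
As an aside, if the fork derives the training resolution as min dimension × 2^res-log2 (which is what the --min-h/--min-w/--res-log2 flag names suggest), this command targets 3 × 2^7 = 384 pixels per side, i.e. a 384x384 run. A minimal sketch of that arithmetic in plain Python; the formula itself is an assumption about the fork, not something stated in this thread:

# Assumed relation: resolution = min_dim * 2**res_log2
min_h, min_w, res_log2 = 3, 3, 7
height = min_h * 2 ** res_log2   # 384
width = min_w * 2 ** res_log2    # 384
print(f"target resolution: {width}x{height}")  # -> target resolution: 384x384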

@skyflynil
Owner

It could be a memory issue. You may try this to boost your instance memory:
googlecolab/colabtools#253
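
The linked colabtools thread is about getting a higher-RAM Colab runtime. Before and after switching instances it can help to confirm how much host RAM the session actually has; a minimal sketch using psutil (assumed to be available, as it usually is on Colab):

import psutil

total_gb = psutil.virtual_memory().total / 1e9
# Compare against the ~25 GB high-memory instance mentioned later in this thread;
# the standard instance is assumed to be noticeably smaller.
print(f"host RAM: {total_gb:.1f} GB")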

@b4nn3d
Author

b4nn3d commented Dec 24, 2019

I got OOM when I was trying a 512x512 dataset; this one was 384x384.
In your example you train on a 640x384 dataset, so I don't see how this could be a problem ;)

BTW, I'm trying with 18,764 images. How big is your dataset?

@skyflynil
Owner

I actually did use that high-memory instance (25 GB RAM) to train. I have tried 512x512 and 640x384, and both ran fine (around 25k files).

@b4nn3d
Author

b4nn3d commented Dec 25, 2019

OK, it was a memory issue.
I trained for 220 ticks with your method.

@jwb95

jwb95 commented Feb 23, 2020

Hi there,
@b4nn3d did this work out for you?

  • I'm on a high memory instance.
  • 2k images
  • 256^2 dimensions

Launching with:
!python run_training.py --num-gpus=1 --data-dir=./dataset --config=config-f --dataset=myset --mirror-augment=true --metric=none --total-kimg=2000 --min-h=4 --min-w=4 --res-log2=6

So far I've never seen more than tick 0:
tick 0 kimg 0.1 lod 0.00 minibatch 32 time 41s sec/tick 41.0 sec/kimg 320.50 maintenance 0.0 gpumem 6.1

Suggestions appreciated, cheers.
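
One way to confirm whether the silent ^C is the host running out of memory is to log free RAM in the background while training runs; if available memory trends toward zero right before the interrupt, it is the same issue hit above. A minimal sketch, again assuming psutil is available in the notebook:

import threading, time
import psutil

def log_ram(interval_s=60):
    # Periodically print available host RAM so an OOM-driven crash shows up in the log.
    while True:
        avail_gb = psutil.virtual_memory().available / 1e9
        print(f"[ram-monitor] available: {avail_gb:.1f} GB", flush=True)
        time.sleep(interval_s)

threading.Thread(target=log_ram, daemon=True).start()
# ...then start run_training.py from the same notebook session.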
