
training stops after 4 ticks #2

Open
b4nn3d opened this issue Dec 24, 2019 · 7 comments

@b4nn3d

b4nn3d commented Dec 24, 2019

Hello there, I got your fork running on Colab, semi fine. Like I said in the title, the training stops after 4 ticks:

tick 0 kimg 0.1 lod 0.00 minibatch 32 time 58s sec/tick 57.7 sec/kimg 450.76 maintenance 0.0 gpumem 5.1
tick 1 kimg 6.1 lod 0.00 minibatch 32 time 12m 16s sec/tick 648.1 sec/kimg 107.73 maintenance 30.5 gpumem 5.1
tick 2 kimg 12.2 lod 0.00 minibatch 32 time 23m 19s sec/tick 644.3 sec/kimg 107.10 maintenance 18.0 gpumem 5.1
tick 3 kimg 18.2 lod 0.00 minibatch 32 time 34m 16s sec/tick 652.4 sec/kimg 108.45 maintenance 5.1 gpumem 5.1
^C

The ^C looks like a keyboard interrupt, but I didn't issue any such command.

@skyflynil
Owner

Did you set 'metric' to none? There can be issues if you are running the FID metric evaluation; I don't need metrics, so I did no testing or code changes for that path. BTW, I am able to train through Google Colab for > 10 ticks.

@b4nn3d
Author

b4nn3d commented Dec 24, 2019

I launched the training with this command:

!python run_training.py --result-dir=results --data-dir=datasets --dataset=blow --config=config-f --total-kimg=12000 --mirror-augment=true --metric=none --min-h=3 --min-w=3 --res-log2=7
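
As an aside, if the fork derives the training resolution as min dimension × 2^res-log2 (which is what the --min-h/--min-w/--res-log2 flag names suggest), this command targets 3 × 2^7 = 384 pixels per side, i.e. a 384x384 run. A minimal sketch of that arithmetic in plain Python; the formula itself is an assumption about the fork, not something stated in this thread:

# Assumed relation: resolution = min_dim * 2**res_log2
min_h, min_w, res_log2 = 3, 3, 7
height = min_h * 2 ** res_log2   # 384
width = min_w * 2 ** res_log2    # 384
print(f"target resolution: {width}x{height}")  # -> target resolution: 384x384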

@skyflynil
Owner

It could be a memory issue. You may try this to boost your instance memory:
googlecolab/colabtools#253
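
The linked colabtools thread is about getting a higher-RAM Colab runtime. Before and after switching instances it can help to confirm how much host RAM the session actually has; a minimal sketch using psutil (assumed to be available, as it usually is on Colab):

import psutil

total_gb = psutil.virtual_memory().total / 1e9
# Compare against the ~25 GB high-memory instance mentioned later in this thread;
# the standard instance is assumed to be noticeably smaller.
print(f"host RAM: {total_gb:.1f} GB")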

@b4nn3d
Author

b4nn3d commented Dec 24, 2019

I got OOM when I was trying a 512x512 dataset; this one was 384x384.
In your example you train on a 640x384 dataset, so I don't see how this could be a problem ;)

BTW, I'm trying with 18,764 images. How big is your dataset?

@skyflynil
Owner

I actually did use that high-memory instance (25 GB RAM) to train. I have tried 512x512 and 640x384, and both ran fine (around 25k files).

@b4nn3d
Author

b4nn3d commented Dec 25, 2019

OK, it was a memory issue.
I trained for 220 ticks with your method.

@jwb95

jwb95 commented Feb 23, 2020

Hi there,
@b4nn3d did this work out for you?

  • I'm on a high memory instance.
  • 2k images
  • 256^2 dimensions

Launching with:
!python run_training.py --num-gpus=1 --data-dir=./dataset --config=config-f --dataset=myset --mirror-augment=true --metric=none --total-kimg=2000 --min-h=4 --min-w=4 --res-log2=6

So far I've never seen more than tick 0:
tick 0 kimg 0.1 lod 0.00 minibatch 32 time 41s sec/tick 41.0 sec/kimg 320.50 maintenance 0.0 gpumem 6.1

Suggestions appreciated, cheers.
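
One way to confirm whether the silent ^C is the host running out of memory is to log free RAM in the background while training runs; if available memory trends toward zero right before the interrupt, it is the same issue hit above. A minimal sketch, again assuming psutil is available in the notebook:

import threading, time
import psutil

def log_ram(interval_s=60):
    # Periodically print available host RAM so an OOM-driven crash shows up in the log.
    while True:
        avail_gb = psutil.virtual_memory().available / 1e9
        print(f"[ram-monitor] available: {avail_gb:.1f} GB", flush=True)
        time.sleep(interval_s)

threading.Thread(target=log_ram, daemon=True).start()
# ...then start run_training.py from the same notebook session.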
