Training stops after 4 ticks #2
Did you set 'metric' to none? There could be issues if you are running FID metric evaluation. I don't need metrics, so I didn't do any testing or code changes for it. By the way, I am able to train on Google Colab for more than 10 ticks.
I launched the training with this:
!python run_training.py --result-dir=results --data-dir=datasets --dataset=blow --config=config-f --total-kimg=12000 --mirror-augment=true --metric=none --min-h=3 --min-w=3 --res-log2=7
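For reference, here is how the resolution flags in that command appear to combine. This is a hypothetical helper that assumes the fork builds each side as `min * 2**res_log2`; that rule is inferred from the sizes mentioned in this thread, not confirmed from the fork's code.

```python
from typing import Tuple


def output_resolution(min_h: int, min_w: int, res_log2: int) -> Tuple[int, int]:
    """Hypothetical helper: (height, width), assuming each side = min * 2**res_log2."""
    scale = 2 ** res_log2  # with --res-log2=7 this is 128
    return min_h * scale, min_w * scale


# Flags from the command above: --min-h=3 --min-w=3 --res-log2=7
print(output_resolution(3, 3, 7))  # (384, 384)
```

Under the same assumption, a 640x384 (width x height) run like the one mentioned later in the thread would presumably use `--min-h=3 --min-w=5 --res-log2=7`.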
It could be a memory issue. You may try this to boost your instance memory.
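To confirm how much RAM the Colab VM actually has (before switching instances, or while training runs), a minimal standard-library sketch is below. It relies on Linux-specific sources (`os.sysconf` page counts and `/proc/meminfo`), which is what Colab VMs run; the helper names are illustrative.

```python
import os


def total_ram_gib() -> float:
    """Total physical RAM in GiB, from POSIX sysconf page counts."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    return page_size * phys_pages / 2**30


def available_ram_gib() -> float:
    """MemAvailable from /proc/meminfo (reported in KiB), converted to GiB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                kib = int(line.split()[1])
                return kib / 2**20
    raise RuntimeError("MemAvailable not found in /proc/meminfo")


print(f"total RAM: {total_ram_gib():.1f} GiB, available: {available_ram_gib():.1f} GiB")
```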
I got an OOM error when I was trying a 512x512 dataset. This one was 384x384. By the way, I'm training with 18764 images. How big is your dataset?
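A rough way to compare the two dataset sizes is the raw pixel footprint: images x height x width x channels, at one byte per uint8 channel value. This sketch assumes RGB images and only measures uncompressed pixels; how much of that the fork's data pipeline actually keeps in RAM at once is not described in this thread.

```python
def raw_dataset_bytes(num_images: int, height: int, width: int, channels: int = 3) -> int:
    """Size of the images as uncompressed uint8 pixels (1 byte per channel value)."""
    return num_images * height * width * channels


for side in (384, 512):
    size = raw_dataset_bytes(18764, side, side)
    print(f"{side}x{side}: {size / 1e9:.1f} GB")
# 384x384: 8.3 GB
# 512x512: 14.8 GB
```

Going from 384x384 to 512x512 multiplies the pixel count by (512/384)^2 = 16/9, about 1.78x.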
I actually did use that high-memory instance (25 GB of memory) to train. I have tried 512x512 and 640x384, and both ran fine (around 25k files).
OK, it was a memory issue.
Hi there,
Launching with:
So far I've never seen more than tick 0:
Suggestions appreciated, cheers.
Hello there, I got your fork running on Colab, semi-fine.
As I said in the title, the training stops after 4 ticks:
tick 0 kimg 0.1 lod 0.00 minibatch 32 time 58s sec/tick 57.7 sec/kimg 450.76 maintenance 0.0 gpumem 5.1
tick 1 kimg 6.1 lod 0.00 minibatch 32 time 12m 16s sec/tick 648.1 sec/kimg 107.73 maintenance 30.5 gpumem 5.1
tick 2 kimg 12.2 lod 0.00 minibatch 32 time 23m 19s sec/tick 644.3 sec/kimg 107.10 maintenance 18.0 gpumem 5.1
tick 3 kimg 18.2 lod 0.00 minibatch 32 time 34m 16s sec/tick 652.4 sec/kimg 108.45 maintenance 5.1 gpumem 5.1
^C
The ^C looks like a keyboard interrupt, but I didn't issue one.
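To track progress or estimate run length, the tick lines above can be parsed. This is a hypothetical helper: the regex is written against the log format shown in this issue, and the estimate naively extrapolates from the most recent sec/kimg value.

```python
import re

# Matches "tick N kimg X ... sec/kimg Y" on a single log line.
TICK_RE = re.compile(r"tick\s+(\d+)\s+kimg\s+([\d.]+).*?sec/kimg\s+([\d.]+)")

LOG = """\
tick 0 kimg 0.1 lod 0.00 minibatch 32 time 58s sec/tick 57.7 sec/kimg 450.76 maintenance 0.0 gpumem 5.1
tick 1 kimg 6.1 lod 0.00 minibatch 32 time 12m 16s sec/tick 648.1 sec/kimg 107.73 maintenance 30.5 gpumem 5.1
tick 2 kimg 12.2 lod 0.00 minibatch 32 time 23m 19s sec/tick 644.3 sec/kimg 107.10 maintenance 18.0 gpumem 5.1
tick 3 kimg 18.2 lod 0.00 minibatch 32 time 34m 16s sec/tick 652.4 sec/kimg 108.45 maintenance 5.1 gpumem 5.1
"""


def parse_ticks(log: str):
    """Return (tick, kimg, sec_per_kimg) for each progress line in a training log."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in TICK_RE.finditer(log)
    ]


def eta_days(ticks, total_kimg: float) -> float:
    """Remaining time in days, extrapolating from the last tick's sec/kimg."""
    _, kimg, sec_per_kimg = ticks[-1]
    return (total_kimg - kimg) * sec_per_kimg / 86400


ticks = parse_ticks(LOG)
print(f"{eta_days(ticks, 12000):.1f} days")  # about 15 days
```

At roughly 108 sec/kimg, the `--total-kimg=12000` in the command earlier in the thread works out to about 15 days of training.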