Training loop temporarily hangs after every 4 steps #1322

Closed
VitorGuizilini-TRI opened this issue Mar 31, 2020 · 6 comments · Fixed by #1378
Labels
bug Something isn't working help wanted Open to be worked on

Comments

@VitorGuizilini-TRI
Contributor

I am porting some of my code to PyTorch Lightning, and everything seems to work fine. However, after every 4 training steps I see a temporary hang (~1 second), which is severely slowing down my overall training time. Am I missing some obvious configuration? This is my Trainer configuration:

    import pytorch_lightning as pl

    # Trainer arguments as named in PyTorch Lightning 0.7.x
    trainer = pl.Trainer(
        gpus=8,
        num_nodes=1,
        distributed_backend='ddp',
        checkpoint_callback=False,
        max_epochs=50,
        max_steps=None,
        progress_bar_refresh_rate=1,
        check_val_every_n_epoch=1,
        val_check_interval=1.0,
        gradient_clip_val=0.0,
        log_save_interval=0,
        num_sanity_val_steps=0,
        amp_level='O0',
    )
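To check whether a periodic pause like this comes from data loading rather than from the model, one option is to time how long each step blocks on the data iterator separately from the step itself. The sketch below is a generic, framework-agnostic helper; the names `profile_data_wait` and `slow_every_fourth` are hypothetical and not part of PyTorch Lightning. With a real setup you would pass it the `DataLoader` (iterated outside the Trainer) and a function that runs one training step.

```python
import time
from typing import Callable, Iterable, List, Tuple


def profile_data_wait(
    batches: Iterable,
    train_step: Callable[[object], None],
) -> List[Tuple[float, float]]:
    """Return (data_wait_seconds, step_seconds) for every batch.

    data_wait is the time spent blocked in next() waiting for the data
    pipeline; step is the time spent inside train_step.
    """
    timings = []
    iterator = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(iterator)
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)
        t2 = time.perf_counter()
        timings.append((t1 - t0, t2 - t1))
    return timings


def slow_every_fourth(n: int):
    """Hypothetical stand-in for a loader that stalls on every 4th batch."""
    for i in range(n):
        if i % 4 == 0:
            time.sleep(0.05)  # simulated wait for a slow worker
        yield i


timings = profile_data_wait(slow_every_fourth(8), lambda batch: None)
for step, (wait, compute) in enumerate(timings):
    print(f"step {step}: waited {wait * 1000:.1f} ms for data, "
          f"{compute * 1000:.1f} ms in the step")
```

Large waits on a regular subset of steps, combined with short step times, point at the input pipeline rather than the model or the DDP setup.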
@VitorGuizilini-TRI added the bug (Something isn't working) and help wanted (Open to be worked on) labels on Mar 31, 2020
@github-actions
Contributor

Hi! Thanks for your contribution! Great first issue!

@williamFalcon
Contributor

@PyTorchLightning/core-contributors

@ethanwharris
Member

Thanks for the issue! Would it be possible to post the code that reproduces this error? I've only seen this sort of behaviour before when the number of data loading workers is low - are you working with large data here (e.g. big images)?
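A simplified model of the loader shows why too few workers produce a pause every few steps rather than a uniform slowdown. The sketch assumes (a simplification, not PyTorch's exact scheduling) that batch `i` is produced by worker `i % num_workers`, that each worker loads its batches back to back from t=0, and that prefetch limits never bind, which roughly holds when loading is the bottleneck. Under those assumptions, when `num_workers * step_time < load_time` the loop consumes one batch from each worker quickly and then stalls, so the pause repeats every `num_workers` steps. The issue does not state the worker count, so reading "every 4 steps" as four workers is only a hypothesis. `simulate_stalls` is a hypothetical name.

```python
from typing import List


def simulate_stalls(num_batches: int, num_workers: int,
                    load_time: float, step_time: float) -> List[float]:
    """Return how long the training loop waits for data before each step.

    Simplified model (an assumption): batch i is loaded by worker
    i % num_workers, each worker loads its batches back to back starting
    at t=0, and prefetching never holds the workers back.
    """
    waits = []
    clock = 0.0  # time at which the loop is ready for the next batch
    for i in range(num_batches):
        ready = (i // num_workers + 1) * load_time  # when batch i is loaded
        waits.append(max(0.0, ready - clock))
        clock = max(clock, ready) + step_time  # wait if needed, then run the step
    return waits


# 4 workers, 1.0 s to load a batch, 0.1 s per training step
print([round(w, 2) for w in simulate_stalls(8, 4, 1.0, 0.1)])
# → [1.0, 0.0, 0.0, 0.0, 0.6, 0.0, 0.0, 0.0]
```

With 16 workers and the same timings, only the very first step waits, because the workers together deliver batches faster than the loop consumes them.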

@VitorGuizilini
Contributor

I increased the number of workers and it works perfectly now, thank you very much! You can close this issue.
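Under the same simplified round-robin model, stalls disappear once the workers together keep up with the loop, i.e. once `num_workers * step_time >= load_time`. The hypothetical helper below turns that condition into a starting value for `num_workers`, the real `torch.utils.data.DataLoader` argument. It also divides the CPU budget by the number of processes per node, since with `distributed_backend='ddp'` each of the 8 GPU processes builds its own DataLoader with its own workers. Treat the result as a starting point to tune by measurement, not a rule; the timings in the example are made up.

```python
import math
import os
from typing import Optional


def suggest_num_workers(load_time: float, step_time: float,
                        procs_per_node: int = 1,
                        cpu_count: Optional[int] = None) -> int:
    """Smallest worker count W with W * step_time >= load_time, capped by CPUs.

    load_time: seconds one worker needs to produce one batch (measured).
    step_time: seconds the training loop spends on one batch (measured).
    procs_per_node: training processes sharing the machine (e.g. 8 for DDP on 8 GPUs).
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    budget = max(1, cpu_count // procs_per_node)  # cores available to each process
    needed = math.ceil(load_time / step_time)     # workers needed to hide loading
    return max(1, min(needed, budget))


# Hypothetical timings: 0.9 s per batch load, 0.25 s per step, 8 DDP processes, 64 cores.
print(suggest_num_workers(0.9, 0.25, procs_per_node=8, cpu_count=64))  # → 4
```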

@williamFalcon
Contributor

Should we throw a warning when users configure only a few workers?
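A check like the one proposed here could be as simple as reading the loader's `num_workers` attribute before training starts and emitting a `UserWarning` through Python's `warnings` module. The sketch below is hypothetical: the function name `warn_if_few_workers`, the threshold of 2 and the message are illustrative, and it is not necessarily what the fix in #1378 does.

```python
import os
import warnings
from types import SimpleNamespace


def warn_if_few_workers(dataloader, name: str = "train_dataloader",
                        min_workers: int = 2) -> bool:
    """Warn when a DataLoader-like object uses fewer than min_workers workers.

    Only the `num_workers` attribute is read, so this works with
    torch.utils.data.DataLoader as well as simple stand-ins.
    Returns True if a warning was emitted.
    """
    num_workers = getattr(dataloader, "num_workers", 0)
    cpus = os.cpu_count() or 1
    # Only warn if the machine actually has more cores than are in use.
    if num_workers < min_workers and cpus > num_workers:
        warnings.warn(
            f"The dataloader, {name}, has num_workers={num_workers}, which may be "
            f"a bottleneck. Consider increasing num_workers (this machine reports "
            f"{cpus} CPUs).",
            UserWarning,
        )
        return True
    return False


# A stand-in object exposing the same attribute a torch DataLoader has.
warn_if_few_workers(SimpleNamespace(num_workers=0))  # emits a UserWarning
```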

@VitorGuizilini
Contributor

If possible, sure! Seems like an obvious solution now, but it could save a couple of hours for other people. :)

@Borda added this to the 0.7.2 milestone on Apr 4, 2020
@Borda modified the milestones: v0.7., v0.7.x on Apr 18, 2021
5 participants