Better graceful shutdown for KeyboardInterrupt #19976
⚡ Required checks status: All passing 🟢
Groups summary
🟢 pytorch_lightning: Tests workflow
🟢 pytorch_lightning: Azure GPU
🟢 pytorch_lightning: Benchmarks
🟢 fabric: Docs
🟢 pytorch_lightning: Docs
🟢 lightning_fabric: CPU workflow
🟢 lightning_fabric: Azure GPU
🟢 mypy
🟢 install
Thank you for your contribution! 💜
37ddbe3 to 8e99d14
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #19976 +/- ##
=========================================
- Coverage 84% 59% -25%
=========================================
Files 426 421 -5
Lines 35284 35195 -89
=========================================
- Hits 29620 20785 -8835
- Misses 5664 14410 +8746
…ning into feature/graceful-exit
for more information, see https://pre-commit.ci
Epic
Hello @awaelchli, I recently upgraded Lightning and discovered this change. Before, we were able to stop the training while not stopping the rest of the program. I thought it was nice to be able to stop the training in a notebook and then continue experiments. Of course, not useful for training a "real" model, but it's pretty nice when exploring and testing new things. Is there another reason for such a change (apart from better stopping the processes)? Or maybe I'm missing something? Thanks for your help!
What does this PR do?
When the user sends KeyboardInterrupt (Ctrl+C), Lightning runs shutdown logic. However, it does not guarantee that all processes get stopped. Users sometimes get hanging zombie processes. Furthermore, our logic does not play well when the user spams Ctrl+C repeatedly. This PR addresses both concerns.
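To illustrate the second concern (repeated Ctrl+C interrupting the shutdown logic itself), here is a minimal, generic sketch. It is not the code from this PR: the function `run_with_graceful_shutdown` and its `work` and `cleanup` parameters are hypothetical names chosen for illustration. The idea shown is to ignore further `SIGINT` signals while cleanup runs, so that the cleanup cannot be aborted halfway.

```python
import signal
import threading


def run_with_graceful_shutdown(work, cleanup):
    """Run work(); on Ctrl+C, run cleanup() once, shielded from further Ctrl+C.

    Hypothetical helper for illustration only, not Lightning's implementation.
    """
    try:
        work()
        return "finished"
    except KeyboardInterrupt:
        # signal.signal() may only be called from the main thread,
        # so skip the shielding when running elsewhere.
        in_main = threading.current_thread() is threading.main_thread()
        previous = None
        if in_main:
            # Ignore additional SIGINTs so repeated Ctrl+C cannot abort cleanup.
            previous = signal.signal(signal.SIGINT, signal.SIG_IGN)
        try:
            cleanup()
        finally:
            if in_main:
                # Restore the handler that was active before the cleanup.
                signal.signal(signal.SIGINT, previous)
        return "interrupted"
```

This sketch returns control to the caller after cleanup. A real multi-process launcher would also need to terminate its worker processes, which is outside the scope of this example.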
Video Demo:
https://www.loom.com/share/a7e105baab5a493b89412434abb7c7fc?sid=1ab5b6b4-f9a1-46a1-888d-726336c5f360
📚 Documentation preview 📚: https://pytorch-lightning--19976.org.readthedocs.build/en/19976/