-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dragon server enhancement #582
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One quick question, but otherwise it looks good. Thanks for tracking this down
@@ -130,7 +130,10 @@ def redir_worker(io_conn: dragon_connection.Connection, file_path: str) -> None: | |||
except Exception as e: | |||
print(e) | |||
finally: | |||
io_conn.close() | |||
try: | |||
io_conn.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why this is a universal catch and print? Is it that io_conn
can fail out in a number of ways? Would re-reraising the error result in too much crashing down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm a bit scared of raising an exception and taking down all the dragon infra. I'll put up a ticket to investigate better ways of handling this.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #582 +/- ##
============================================
- Coverage 79.12% 60.63% -18.50%
============================================
Files 78 78
Lines 5988 6000 +12
============================================
- Hits 4738 3638 -1100
- Misses 1250 2362 +1112
|
The Dragon server could fail, dumping a core file, if it was shut down before all spawned Process Groups completed. This PR fixes such behavior: the immediate flag on the
DragonShutdownRequest
now requests every non-terminated job to be stopped.