Remove cuda event deadlocking issues in device mr tests #1097
This PR fixes two deadlocks in the device MR tests. The first was caused by an assumption that std::mutex provides fair scheduling. The second was found when CUDA events are created in very short-lived threads (< 10 ms), and is worked around here.
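For readers unfamiliar with the first issue: the C++ standard does not guarantee that std::mutex grants the lock to waiting threads in any particular order, so a test that expects two threads to alternate on a bare mutex can stall. Below is a minimal sketch of the general alternative, an explicit hand-off through a condition variable with a predicate. It is not the code from this PR; `handoff` and `run_in_order` are hypothetical names used only for illustration.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not from this PR): a one-shot signal between threads.
// Ordering is enforced by the `ready` flag, not by which thread happens to
// win the mutex.
struct handoff {
  std::mutex mtx;
  std::condition_variable cv;
  bool ready = false;

  void signal()
  {
    {
      std::lock_guard<std::mutex> lock(mtx);
      ready = true;
    }
    cv.notify_one();
  }

  void wait()
  {
    std::unique_lock<std::mutex> lock(mtx);
    // The predicate guards against spurious wakeups and against the case
    // where signal() runs before wait() is entered.
    cv.wait(lock, [this] { return ready; });
  }
};

// Runs two threads and records the order in which their steps execute.
std::vector<int> run_in_order()
{
  std::vector<int> order;
  handoff first_done;

  std::thread second([&] {
    first_done.wait();   // blocks until the first step has finished
    order.push_back(2);
  });
  std::thread first([&] {
    order.push_back(1);
    first_done.signal();
  });

  first.join();
  second.join();
  return order;
}
```

With this pattern the second step cannot run before the first, whichever thread the scheduler starts or favors.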
Thanks for doing this! It just needs a more descriptive name for the condition.
@ajschmidt8 please test on ARM before we merge.
Thanks!
I never tested the problematic code outside of CI, so I have no way of verifying whether this fix works as intended. I'll defer to the devs for the approvals here. If this fix looks good to everyone else, let's get it merged and Ops will add these changes to our GitHub Actions POC PR to see if we still experience any issues.
@gpucibot merge