[Core][Distributed] use cpu/gloo to initialize pynccl #4248
Conversation
LGTM! Left some small comments.
```python
current_device = torch.cuda.current_device()
try:
    torch.cuda.set_device(device)
    NCCL_CHECK(
        _c_ncclCommInitRank(ctypes.byref(self.comm), self.world_size,
                            self.unique_id, self.rank))
    self.stream = torch.cuda.Stream()
finally:
    torch.cuda.set_device(current_device)
```
Why do we need a `try...finally` block here? Can the program continue to run when there is an exception in the `try` block?
It can, but I think it would be better for the function to be pure, i.e. not implicitly modify global state (here, the caller's current CUDA device).
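A minimal sketch of the same save-and-restore idea, written as a reusable context manager; the name `cuda_device_guard` is hypothetical and not part of this PR:

```python
import contextlib

import torch


@contextlib.contextmanager
def cuda_device_guard(device: int):
    """Temporarily switch the current CUDA device, restoring the previous
    device on exit even if the body raises."""
    prev = torch.cuda.current_device()
    try:
        torch.cuda.set_device(device)
        yield
    finally:
        torch.cuda.set_device(prev)


# Usage (illustrative): the communicator is created on `device`, but the
# caller's current device is left untouched afterwards.
# with cuda_device_guard(device):
#     ...initialize the NCCL communicator...
```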
NCCL initialization requires broadcasting a unique id, which lives in CPU memory. Previously we only had one NCCL backend process group, so we had to move the unique id to the GPU, broadcast it, and then move it back to the CPU. After #3904, we always have a cpu/gloo backend, so we no longer need to move the unique id around; we can broadcast it directly in CPU memory.
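A rough sketch of that idea, assuming a gloo-backed process group is already available as `cpu_group`; the helper name and the 128-byte size constant are illustrative, not the PR's actual code:

```python
import torch
import torch.distributed as dist

# NCCL's unique id is a small fixed-size opaque byte buffer that lives in
# host memory (128 bytes in current NCCL versions).
NCCL_UNIQUE_ID_BYTES = 128


def broadcast_unique_id(unique_id: bytearray, rank: int, cpu_group) -> bytes:
    # Put the bytes into a CPU tensor; gloo broadcasts CPU tensors
    # directly, so no staging copy to the GPU is needed.
    buf = torch.empty(NCCL_UNIQUE_ID_BYTES, dtype=torch.uint8)
    if rank == 0:
        buf.copy_(torch.frombuffer(unique_id, dtype=torch.uint8))
    dist.broadcast(buf, src=0, group=cpu_group)
    return bytes(buf.tolist())
```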