setting PGU device #1128
Conversation
Hello @shubhamagarwal92! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-03-25 17:04:29 UTC

from six import string_types
from torch.utils.tensorboard.summary import hparams
This should be at the top of the file.
I agree, but I was mostly following the practices already used in the repo. Do you also want to move this line to the top, then?
from torch.utils.tensorboard.summary import hparams
Yeah, there are a few more things to clean up, lol.
@PyTorchLightning/core-contributors ^^
@shubhamagarwal92 @Borda
I'm not sure, but I think this PR overlaps with #1130, at least the hparams part. Correct me if I'm wrong.
This pull request is now in conflict... :(
It's frustrating that TensorBoard only accepts those types. I have use cases where filtering out the lists and None values would be very confusing when looking at the parameters. How would you feel about converting all unsupported types to strings instead? I.e.:
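For illustration only, a minimal sketch of that idea; the helper name `_to_tensorboard_types` is hypothetical and not from this PR:

```python
import torch

def _to_tensorboard_types(params: dict) -> dict:
    # Convert hparam values TensorBoard cannot handle (lists, None, dicts, ...)
    # to strings instead of dropping them, so they still show up in the UI.
    supported = (int, float, str, bool, torch.Tensor)
    return {k: (v if isinstance(v, supported) else str(v)) for k, v in params.items()}
```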
Yes, that sounds like a reasonable solution.
Wasn't this already done in another PR? It was called something like
exp, ssi, sei = hparams(params, {})
tensorboard_params = {}
for k, v in params.items():
    if isinstance(v, (int, float, string_types, bool, torch.Tensor)):
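For context, a self-contained sketch of what the excerpt above appears to do; the helper name `_filter_tensorboard_params` and the sample `params` dict are hypothetical:

```python
import torch
from six import string_types
from torch.utils.tensorboard.summary import hparams

def _filter_tensorboard_params(params: dict) -> dict:
    # Keep only the value types that TensorBoard's hparams() accepts;
    # everything else is silently dropped.
    return {
        k: v for k, v in params.items()
        if isinstance(v, (int, float, string_types, bool, torch.Tensor))
    }

# Usage, mirroring the lines above:
params = {"lr": 0.01, "layers": [64, 64], "optimizer": "adam"}
exp, ssi, sei = hparams(_filter_tensorboard_params(params), {})
```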
I think this is no longer necessary, because non-primitive hparams are already converted to strings (see the master branch).
agree
@tullie @awaelchli ready now...
Now this PR is the same as #1094 by the same author :) One of them should be closed.
# set cuda device to root gpu
root_device = (torch.device("cuda", root_gpu) if root_gpu >= 0 else torch.device("cpu"))
torch.cuda.set_device(root_device)

return root_gpu
To me it looks like this should not go here, because the method is called "determine_root_gpu_device", and the added code also checks if the device is cpu. This code should probably go to the place where we call determine_root_gpu_device.
Where do you suggest placing this?
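For illustration, one possible reading of the suggestion above; the simplified function body and the call site are assumptions, not code from this PR:

```python
import torch

def determine_root_gpu_device(gpus):
    # Keep this function focused on picking the root GPU index
    # (or None when running on CPU), without any side effects.
    return gpus[0] if gpus else None

# Hypothetical call site: set the CUDA device only after the index is known.
root_gpu = determine_root_gpu_device([0, 1])
if root_gpu is not None:
    torch.cuda.set_device(root_gpu)
```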
torch.cuda.set_device(root_device)
else:
    raise RuntimeError(
        'Expected `data_parallel_device_ids` as a list, cannot determine root gpu.'
Above we have `if self.single_gpu`, so why should the device ids be a list?
There was already an if clause: `if isinstance(self.data_parallel_device_ids, list)`. I raised the runtime error as a safety net, because `root_gpu = 0` was used as the default earlier.
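A minimal sketch of the guard being described, assuming a free-standing helper (the function name `set_root_device` is hypothetical, not from this PR):

```python
import torch

def set_root_device(data_parallel_device_ids, root_gpu):
    # Only touch the CUDA device when the device ids are actually a list;
    # otherwise fail loudly instead of silently falling back to the old
    # default of root_gpu = 0.
    if isinstance(data_parallel_device_ids, list):
        if root_gpu >= 0:
            torch.cuda.set_device(torch.device("cuda", root_gpu))
    else:
        raise RuntimeError(
            'Expected `data_parallel_device_ids` as a list, cannot determine root gpu.'
        )
```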
@awaelchli you are right; a follow-up but conflicting PR, #1130, has already been merged into master. Could you please let me know what should be done about #1094? PS: @Borda already closed this one.
Before submitting
What does this PR do?
Related to #609. Filters params for TensorBoard logging. Discussed here.
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃