setting PGU device #1128
Conversation
Hello @shubhamagarwal92! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-03-25 17:04:29 UTC

from six import string_types
from torch.utils.tensorboard.summary import hparams
This should be at the top of the file.
I agree, but I was mostly following the practices already used in the repo. Do you also want to move this line to the top, then?
from torch.utils.tensorboard.summary import hparams
Yeah, there are a few more things to clean up, lol.
@PyTorchLightning/core-contributors ^^
@shubhamagarwal92 @Borda
I'm not sure, but I think this PR overlaps with #1130, at least the hparams part. Correct me if I'm wrong.
This pull request is now in conflict... :(
It's frustrating that TensorBoard only accepts those types. I have use cases where filtering out the lists and None values would be very confusing when looking at the parameters. How would you feel about converting all unsupported types to strings instead? I.e.:
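For illustration only, a minimal sketch of that idea; the helper name `_to_tensorboard_types` is hypothetical and not from this PR:

```python
import torch

def _to_tensorboard_types(params: dict) -> dict:
    # Convert hparam values TensorBoard cannot handle (lists, None, dicts, ...)
    # to strings instead of dropping them, so they still show up in the UI.
    supported = (int, float, str, bool, torch.Tensor)
    return {k: (v if isinstance(v, supported) else str(v)) for k, v in params.items()}
```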
Yes, that sounds like a reasonable solution.
Wasn't this already done in another PR? It was called something like
exp, ssi, sei = hparams(params, {})
tensorboard_params = {}
for k, v in params.items():
    if isinstance(v, (int, float, string_types, bool, torch.Tensor)):
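For context, a self-contained sketch of what the excerpt above appears to do; the helper name `_filter_tensorboard_params` and the sample `params` dict are hypothetical:

```python
import torch
from six import string_types
from torch.utils.tensorboard.summary import hparams

def _filter_tensorboard_params(params: dict) -> dict:
    # Keep only the value types that TensorBoard's hparams() accepts;
    # everything else is silently dropped.
    return {
        k: v for k, v in params.items()
        if isinstance(v, (int, float, string_types, bool, torch.Tensor))
    }

# Usage, mirroring the lines above:
params = {"lr": 0.01, "layers": [64, 64], "optimizer": "adam"}
exp, ssi, sei = hparams(_filter_tensorboard_params(params), {})
```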
I think this is no longer necessary, because non-primitive hparams are already converted to strings (see the master branch).
agree
@tullie @awaelchli ready now...
Now this PR is the same as #1094 by the same author :) One of them should be closed.
# set cuda device to root gpu
root_device = (torch.device("cuda", root_gpu) if root_gpu >= 0 else torch.device("cpu"))
torch.cuda.set_device(root_device)

return root_gpu
To me it looks like this should not go here, because the method is called "determine_root_gpu_device", and the added code also checks if the device is cpu. This code should probably go to the place where we call determine_root_gpu_device.
Where do you suggest placing this?
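For illustration, one possible reading of the suggestion above; the simplified function body and the call site are assumptions, not code from this PR:

```python
import torch

def determine_root_gpu_device(gpus):
    # Keep this function focused on picking the root GPU index
    # (or None when running on CPU), without any side effects.
    return gpus[0] if gpus else None

# Hypothetical call site: set the CUDA device only after the index is known.
root_gpu = determine_root_gpu_device([0, 1])
if root_gpu is not None:
    torch.cuda.set_device(root_gpu)
```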
torch.cuda.set_device(root_device)
else:
    raise RuntimeError(
        'Expected `data_parallel_device_ids` as a list, cannot determine root gpu.'
Above we have `if self.single_gpu`, so why should the device ids be a list?
There was already an if clause: `if isinstance(self.data_parallel_device_ids, list)`. I raised the runtime error as a safety net, because `root_gpu = 0` was used as the default earlier.
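A minimal sketch of the guard being described, assuming a free-standing helper (the function name `set_root_device` is hypothetical, not from this PR):

```python
import torch

def set_root_device(data_parallel_device_ids, root_gpu):
    # Only touch the CUDA device when the device ids are actually a list;
    # otherwise fail loudly instead of silently falling back to the old
    # default of root_gpu = 0.
    if isinstance(data_parallel_device_ids, list):
        if root_gpu >= 0:
            torch.cuda.set_device(torch.device("cuda", root_gpu))
    else:
        raise RuntimeError(
            'Expected `data_parallel_device_ids` as a list, cannot determine root gpu.'
        )
```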
@awaelchli you are right; a follow-up but conflicting PR, #1130, has already been merged into master. Could you please let me know what should be done about #1094? PS: @Borda already closed this one.
Before submitting
What does this PR do?
Related to #609. Filters params for TensorBoard logging. Discussed here.
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃