
Clean up #2467

Merged
merged 49 commits into master from ess on Jul 3, 2020

Conversation

williamFalcon (Contributor) commented Jul 2, 2020

Fixes #1838

We have TPU tests now!

@williamFalcon changed the title from "Clean up ES" to "Clean up" on Jul 2, 2020
@mergify bot requested a review from a team on July 2, 2020 14:25
@Borda added the feature (Is an improvement or enhancement) and ci (Continuous Integration) labels on Jul 2, 2020
@Borda added this to the 0.8.x milestone on Jul 2, 2020
codecov bot commented Jul 2, 2020

Codecov Report

Merging #2467 into master will decrease coverage by 0%.
The diff coverage is 61%.

@@          Coverage Diff           @@
##           master   #2467   +/-   ##
======================================
- Coverage      89%     89%   -0%     
======================================
  Files          69      69           
  Lines        5518    5533   +15     
======================================
+ Hits         4889    4898    +9     
- Misses        629     635    +6     

williamFalcon (Contributor, Author)

@zcain117 any ideas here? (https://github.com/PyTorchLightning/pytorch-lightning/pull/2467/checks?check_run_id=832554097#step:10:236)

I'm just getting familiar with the XLA details, but is the problem that we are comparing an XLA tensor with a non-XLA tensor?
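
For context on the question above, here is a minimal sketch (assuming a TPU runtime with torch_xla available) of the CPU-tensor vs XLA-tensor distinction; the variable names are illustrative and not taken from the PR.

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                    # the TPU core's XLA device, e.g. xla:1
cpu_stop = torch.tensor(1)                  # plain CPU tensor (torch.LongTensor)
xla_stop = torch.tensor(1, device=device)   # lives on the XLA device

# XLA-side reductions expect tensors like `xla_stop`; passing a plain CPU tensor
# is the kind of mismatch the "Input tensor is not an XLA tensor" error in #1838
# points at. Creating the flag with device=pl_module.device, as in the diff,
# yields an XLA tensor when running on TPU.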

@williamFalcon merged commit 020c332 into master on Jul 3, 2020
@Borda deleted the ess branch on July 3, 2020 07:55
if trainer.use_ddp or trainer.use_ddp2:
    stop = torch.tensor(int(trainer.should_stop), device=pl_module.device)
    dist.all_reduce(stop, op=dist.reduce_op.SUM)
    dist.barrier()

@williamFalcon Is a barrier needed after an all reduce?
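
A hedged sketch of the point behind this question: with the default async_op=False, torch.distributed.all_reduce already blocks each rank until the collective completes, so an extra dist.barrier() afterwards is likely redundant. The helper name _sync_should_stop below is hypothetical, not part of the PR.

import torch
import torch.distributed as dist

def _sync_should_stop(should_stop: bool, device) -> bool:
    """Return True on every rank if any rank wants to stop (hypothetical helper)."""
    stop = torch.tensor(int(should_stop), device=device)
    # all_reduce with the default async_op=False blocks until every rank has
    # contributed, so no explicit dist.barrier() should be required afterwards.
    dist.all_reduce(stop, op=dist.ReduceOp.SUM)
    return bool(stop.item())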

self.stopped_epoch = trainer.current_epoch
trainer.should_stop = True

# stop every ddp process if any world process decides to stop
self._stop_distributed_training(trainer, pl_module)

The function name is misleading: it does not stop training, it just updates the trainer.should_stop state.
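
A possible rename following this comment; the name below is hypothetical and only meant to show a label that matches what the helper actually does (synchronize the stop flag), not a proposal from the PR.

self.stopped_epoch = trainer.current_epoch
trainer.should_stop = True

# make every ddp process agree on trainer.should_stop if any rank wants to stop
self._sync_should_stop_across_processes(trainer, pl_module)  # hypothetical name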

Labels
ci (Continuous Integration), feature (Is an improvement or enhancement)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issue with EarlyStopping Callback on TPU runtime: Input tensor is not an XLA tensor: torch.FloatTensor
3 participants