Add Trainer.validate(…) method to run one validation epoch #4707

Closed

Conversation

EliaCereda (Contributor) commented Nov 17, 2020

What does this PR do?

Adds a Trainer.validate(...) method to perform one evaluation epoch over the validation set, with the same semantics as Trainer.test(...).

Resolves #4634

I'd say that the PR is now in a good enough shape to remove the draft tag and request a proper review.

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the review guidelines. In short:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified. Bug fixes should be included in bug-fix release milestones (m.f.X) and features in (m.X.b) releases.

Did you have fun?

Make sure you had fun coding 🙃

`Trainer.validate` follows the same semantics as `Trainer.test` and shares part of the implementation
pep8speaks commented Nov 17, 2020

Hello @EliaCereda! Thanks for updating this PR.

Line 223:13: W503 line break before binary operator

Comment last updated at 2020-12-02 10:53:05 UTC

…ProgressBar

It seems that tqdm doesn’t support `__bool__` on its instances, so it was raising an exception.
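The pattern behind this fix can be shown with a small pure-Python sketch (the `Bar` class below is a hypothetical stand-in for a progress-bar instance, not tqdm itself): `if bar:` implicitly calls `__bool__`, which can raise on such objects, whereas an explicit `is not None` identity check never does.

```python
class Bar:
    """Hypothetical stand-in for an object whose truthiness is unsupported."""
    def __bool__(self):
        # Mimics an instance that raises when evaluated in a boolean context.
        raise TypeError("bool() not supported on this instance")

def has_bar_truthy(bar):
    # Fragile: `if bar:` invokes bar.__bool__() and may raise.
    try:
        return bool(bar)
    except TypeError:
        return None  # signal that truthiness is unsupported

def has_bar_safe(bar):
    # Robust: an identity check never invokes __bool__.
    return bar is not None

assert has_bar_truthy(Bar()) is None  # truthiness raised internally
assert has_bar_safe(Bar()) is True
assert has_bar_safe(None) is False
```

Replacing the truthiness test with the `is not None` check is what makes the progress-bar code safe regardless of how the instance implements `__bool__`.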
codecov bot commented Nov 17, 2020

Codecov Report

Merging #4707 (d4cb1b0) into master (add387c) will increase coverage by 0%.
The diff coverage is 99%.

@@           Coverage Diff           @@
##           master   #4707    +/-   ##
=======================================
  Coverage      93%     93%            
=======================================
  Files         124     124            
  Lines        9203    9349   +146     
=======================================
+ Hits         8524    8668   +144     
- Misses        679     681     +2     

rohitgr7 (Contributor) left a comment

LGTM so far. Can you add some tests?

Review comment on pytorch_lightning/trainer/evaluation_loop.py (outdated; resolved)
EliaCereda (Contributor Author)

Yes, I'll also prepare some tests; I was thinking of using the test cases for Trainer.test as a reference.

@EliaCereda EliaCereda force-pushed the feature/trainer-validate branch from 3b5ae9b to 99a6161 Compare November 18, 2020 09:13
EliaCereda (Contributor Author)

It's not clear to me why these tests are failing. I think CircleCI just had a transient error, which might go away if we re-run.

This is the other failing test, the 'specific' variant. It fails at line 212 because the checkpoint file doesn't exist anymore, but it did exist at line 201, when validate was called the first time.

EliaCereda (Contributor Author)

@carmocca, regarding your other three comments, I based my tests directly on those for the Trainer.test(...) method. At the moment, I didn't make changes that weren't strictly necessary to adapt them.

Co-authored-by: Carlos Mocholí <[email protected]>
EliaCereda (Contributor Author) commented Nov 18, 2020

> This is the other failing test, the 'specific' variant. It fails at line 212 because the checkpoint file doesn't exist anymore, but it did exist at line 201 when validate is called the first time.

I have an idea as to why this might be: by default, the ModelCheckpoint callback saves a new checkpoint at the end of every validation epoch. I think this should be disabled in Trainer.validate(...), or it risks deleting the very checkpoint being evaluated.

Still wondering why this is a problem only in some CI runs and not in others. Let's see if 9e59e6d fixes this.
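The failure mode can be reproduced with a minimal sketch (all names below are hypothetical; this is not the actual ModelCheckpoint implementation): a keep-best-1 checkpoint manager deletes the previous file when a new one is saved, so letting it run during a validate pass can remove the very checkpoint being evaluated unless saving is skipped while evaluating.

```python
import os
import tempfile

class TopKCheckpoint:
    """Hypothetical sketch of a keep-best-1 checkpoint callback."""
    def __init__(self, dirpath, evaluating=False):
        self.dirpath = dirpath
        self.evaluating = evaluating  # would be set by a hypothetical Trainer.validate()
        self.best_path = None

    def on_validation_end(self, step):
        if self.evaluating:
            return  # the fix: never save (or delete) while only evaluating
        path = os.path.join(self.dirpath, f"ckpt-step{step}.pt")
        open(path, "w").close()  # stand-in for writing model weights
        if self.best_path and self.best_path != path:
            os.remove(self.best_path)  # deletes the previous "best" file
        self.best_path = path

d = tempfile.mkdtemp()
cb = TopKCheckpoint(d)
cb.on_validation_end(step=1)
ckpt_to_eval = cb.best_path

# Without the guard, evaluating that checkpoint would trigger another save
# and remove ckpt_to_eval; with evaluating=True, the file survives.
cb.evaluating = True
cb.on_validation_end(step=2)
assert os.path.exists(ckpt_to_eval)
```

Whether the real fix gates on the callback or on the trainer state is an implementation detail; the key point is that an evaluation-only pass must not mutate the checkpoint directory.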

…evaluating

Without this, ModelCheckpoint might delete the very checkpoint being evaluated. Furthermore, the model will not change during evaluation anyway.
rohitgr7 (Contributor) left a comment

PR is big. Just a high-level review.

Review comments (outdated; resolved) on:
  • docs/source/trainer.rst
  • pytorch_lightning/trainer/trainer.py
  • tests/base/datamodules.py (×2)
  • pytorch_lightning/callbacks/base.py (×2)
  • pytorch_lightning/core/datamodule.py
  • pytorch_lightning/core/hooks.py (×4)
EliaCereda (Contributor Author)

I'll have some more time to dedicate to this over the coming week, and I'd like to ask if there is anything I can do to bring it closer to a mergeable state.

The biggest concern is that it's quite a big PR right now. To address this, I went through the changes included and I'd say they can be divided into 7 groups:

  1. A refactor of Trainer to replace the testing attribute with a more generic evaluating, which initially can only be None or 'test'.
  2. Updates to various components to check the new evaluating attribute instead of testing.
  3. Addition of the Trainer.validate(...) method, which basically differs from test(...) only in setting evaluating = 'validation'.
  4. Updates to various components to handle the new value of evaluating.
  5. Tests for Trainer.validate(...).
  6. Changes to the docs of various components to mention the existence of the new stage.
  7. Renames of some internal methods to better reflect their new purpose (e.g. train_or_test(), which becomes train_or_evaluate()).

Do you think I missed anything? Of these, I think groups 1 and 2 could definitely make sense on their own and might be pushed forward in a separate PR. On the other hand, I'm having a hard time finding a way to further separate the rest.

Do you have any suggestions on this? Thanks!
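The core of groups 1–4 can be sketched in miniature (hypothetical names throughout; this is not Lightning's actual code): replace a boolean `testing` flag with an `evaluating` attribute that is `None`, `'test'`, or `'validation'`, and let `validate()` and `test()` share one evaluation loop.

```python
class MiniTrainer:
    """Hypothetical sketch of the proposed refactor, not Lightning's implementation."""
    def __init__(self):
        self.evaluating = None  # None | 'test' | 'validation'

    def _run_evaluation(self, dataloader):
        # Shared loop: components check `evaluating` instead of a boolean `testing`.
        assert self.evaluating in ("test", "validation")
        return [batch * 2 for batch in dataloader]  # stand-in "metrics"

    def test(self, dataloader):
        self.evaluating = "test"
        try:
            return self._run_evaluation(dataloader)
        finally:
            self.evaluating = None

    def validate(self, dataloader):
        # Same semantics as test(), differing only in the stage value.
        self.evaluating = "validation"
        try:
            return self._run_evaluation(dataloader)
        finally:
            self.evaluating = None

t = MiniTrainer()
assert t.validate([1, 2]) == [2, 4]
assert t.test([3]) == [6]
assert t.evaluating is None  # stage is always reset afterwards
```

The `try`/`finally` reset mirrors why groups 1–2 can stand alone: once everything dispatches on `evaluating`, adding the `'validation'` value (groups 3–4) touches only the dispatch sites.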

@Borda, @rohitgr7, @carmocca

carmocca (Contributor)

I personally would keep it to this PR. It's definitely not the largest we've merged recently and the changes have cohesion. What do others prefer?

rohitgr7 (Contributor) commented Nov 28, 2020

I'd suggest separate PRs, just to minimize future bugs and allow better review, split the way suggested here: #4707 (comment)

PR-1: [1, 2] + callbacks
PR-2: [3, 4, 7]
PR-3: [5, 6]  # test and docs are almost independent so you can do it either way (separate or together)

```python
# We do this so __attach_datamodule in trainer.py doesn't mistakenly call setup('test') on trainer.test()
stage = args[1] if len(args) > 1 else kwargs.get("stage", None)

if stage == "fit" or stage is None:
    obj._has_setup_fit = True

if stage == "validation" or stage is None:
```
Contributor:

I would make all references in the datamodule "validate" (instead of "validation") to keep it consistent with fit and test.

Contributor:

fit and test can be nouns or verbs; however, we are talking about stages, which means they should be nouns. So if I am not mistaken (English is not my first language), using validation is more consistent: the validation stage vs. the validate stage.

Contributor:

I don't think grammar should be a consideration here, since we're only talking about variables in code, and variable-name consistency is more important. Thoughts on this? @justusschock @rohitgr7

Member:

I think you both have a point here, and one could certainly use both. However, I feel that validation stage is more intuitive, and personally I would go with it since it sounds 'more correct' to me, but this is just a personal opinion. Also, I think this should definitely not be a blocker here.

Contributor Author:

Yes, it is a good point. I was also ambivalent about it while I was writing the code.

There is another occurrence of this issue: the Trainer.evaluating attribute, which can be either test or validation. Here validation is the right choice in my opinion, reading it as "currently evaluating over the test/validation set".

It was not so clear cut in the data module: I'd say 'validation' sounds better to me too, but I would not be opposed to using 'validate' either.

Contributor:

Gotcha, it does have a better ring to it :]
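For context, the stage-tracking snippet quoted earlier in this thread can be made self-contained (the helper and stub below are hypothetical illustrations; the real hook-tracking wrapper lives in pytorch_lightning/core/datamodule.py): a stage may arrive positionally or as a keyword, and a stage of None marks every stage as set up.

```python
class DataModuleStub:
    """Hypothetical object whose setup-stage calls we want to record."""
    _has_setup_fit = False
    _has_setup_validation = False
    _has_setup_test = False

def track_setup(obj, *args, **kwargs):
    # Mirrors the quoted logic: args[0] is the bound `self` of the wrapped
    # hook, so the stage is args[1] when passed positionally.
    stage = args[1] if len(args) > 1 else kwargs.get("stage", None)

    if stage == "fit" or stage is None:
        obj._has_setup_fit = True
    if stage == "validation" or stage is None:
        obj._has_setup_validation = True
    if stage == "test" or stage is None:
        obj._has_setup_test = True

dm = DataModuleStub()
track_setup(dm, "self", "validation")  # "self" stands in for the bound instance
assert dm._has_setup_validation and not dm._has_setup_test

dm2 = DataModuleStub()
track_setup(dm2, stage=None)  # no stage: everything is considered set up
assert dm2._has_setup_fit and dm2._has_setup_validation and dm2._has_setup_test
```

This is what prevents trainer.test() from mistakenly being treated as having already run setup for the other stages.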

```python
def train_or_test(self):
    if self.trainer.testing:
        results = self.trainer.run_test()
```
```python
def train_or_evaluate(self):
```
Member:

The name here is a bit misleading, as this also runs test.

Contributor Author:

I used evaluate here to refer to either test or validate. I think it was inspired by the pre-existing Trainer.run_evaluation method, which is used to run either the test or validation loop depending on the value of the test_mode parameter.

Let me know if you have a better idea for the name!
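Under that reading of "evaluate", the rename amounts to a three-way dispatch on the `evaluating` attribute. A hypothetical miniature (not the actual method bodies):

```python
class DispatchTrainer:
    """Hypothetical sketch of the renamed dispatch method."""
    def __init__(self, evaluating=None):
        self.evaluating = evaluating  # None | 'test' | 'validation'

    def run_test(self):
        return "test results"

    def run_validation(self):
        return "validation results"

    def run_train(self):
        return "train results"

    def train_or_evaluate(self):
        # Replaces train_or_test(): "evaluate" covers both test and validation.
        if self.evaluating == "test":
            return self.run_test()
        if self.evaluating == "validation":
            return self.run_validation()
        return self.run_train()

assert DispatchTrainer("test").train_or_evaluate() == "test results"
assert DispatchTrainer("validation").train_or_evaluate() == "validation results"
assert DispatchTrainer().train_or_evaluate() == "train results"
```

Seen this way, the name is accurate: test is one of the two evaluation stages the method can run.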

@Borda Borda modified the milestones: 1.1, 1.2 Nov 30, 2020
@Borda Borda added the discussion In a discussion stage label Nov 30, 2020
…alidate

# Conflicts:
#	CHANGELOG.md
#	pytorch_lightning/trainer/evaluation_loop.py
#	pytorch_lightning/trainer/trainer.py
#	tests/callbacks/test_callbacks.py
EliaCereda (Contributor Author)

Just published the two new PRs #4945 and #4948, split as proposed. Closing.

@EliaCereda EliaCereda closed this Dec 2, 2020
@Borda Borda modified the milestones: 1.2, 1.1 Dec 4, 2020
Labels
design (Includes a design discussion), discussion (In a discussion stage), feature (Is an improvement or enhancement)
Development

Successfully merging this pull request may close these issues.

Evaluation over the validation set
7 participants