Add Trainer.validate(…) method to run one validation epoch #4707

Closed

Conversation

EliaCereda (Contributor) commented Nov 17, 2020

What does this PR do?

Adds a Trainer.validate(...) method to perform one evaluation epoch over the validation set, with the same semantics as Trainer.test(...).

Resolves #4634

I'd say that the PR is now in a good enough shape to remove the draft tag and request a proper review.

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the review guidelines. In short:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified. Bug fixes should be included in bug-fix release milestones (m.f.X) and features in (m.X.b) releases.

Did you have fun?

Make sure you had fun coding 🙃

`Trainer.validate` follows the same semantics as `Trainer.test` and shares part of the implementation
pep8speaks commented Nov 17, 2020

Hello @EliaCereda! Thanks for updating this PR.

Line 223:13: W503 line break before binary operator

Comment last updated at 2020-12-02 10:53:05 UTC

…ProgressBar

It seems that tqdm doesn’t support `__bool__` on its instances, so it was raising an exception.
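The pattern behind this fix can be shown with a small pure-Python sketch (the `Bar` class below is a hypothetical stand-in for a progress-bar instance, not tqdm itself): `if bar:` implicitly calls `__bool__`, which can raise on such objects, whereas an explicit `is not None` identity check never does.

```python
class Bar:
    """Hypothetical stand-in for an object whose truthiness is unsupported."""
    def __bool__(self):
        # Mimics an instance that raises when evaluated in a boolean context.
        raise TypeError("bool() not supported on this instance")

def has_bar_truthy(bar):
    # Fragile: `if bar:` invokes bar.__bool__() and may raise.
    try:
        return bool(bar)
    except TypeError:
        return None  # signal that truthiness is unsupported

def has_bar_safe(bar):
    # Robust: an identity check never invokes __bool__.
    return bar is not None

assert has_bar_truthy(Bar()) is None  # truthiness raised internally
assert has_bar_safe(Bar()) is True
assert has_bar_safe(None) is False
```

Replacing the truthiness test with the `is not None` check is what makes the progress-bar code safe regardless of how the instance implements `__bool__`.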
codecov bot commented Nov 17, 2020

Codecov Report

Merging #4707 (d4cb1b0) into master (add387c) will increase coverage by 0%.
The diff coverage is 99%.

@@           Coverage Diff           @@
##           master   #4707    +/-   ##
=======================================
  Coverage      93%     93%            
=======================================
  Files         124     124            
  Lines        9203    9349   +146     
=======================================
+ Hits         8524    8668   +144     
- Misses        679     681     +2     

rohitgr7 (Contributor) left a comment

LGTM so far. Can you add some tests?

Review comment on pytorch_lightning/trainer/evaluation_loop.py (outdated; resolved)
EliaCereda (Contributor Author)

Yes, I'll also prepare some tests; I was thinking of using the test cases for Trainer.test as a reference.

@EliaCereda EliaCereda force-pushed the feature/trainer-validate branch from 3b5ae9b to 99a6161 Compare November 18, 2020 09:13
EliaCereda (Contributor Author)

It's not clear to me why these tests are failing. I think CircleCI just had a transient error, which might go away if we re-run.

This is the other failing test, the 'specific' variant. It fails at line 212 because the checkpoint file doesn't exist anymore, but it did exist at line 201, when validate was called the first time.

EliaCereda (Contributor Author)

@carmocca, regarding your other three comments, I based my tests directly on those for the Trainer.test(...) method. At the moment, I didn't make changes that weren't strictly necessary to adapt them.

Co-authored-by: Carlos Mocholí <[email protected]>
EliaCereda (Contributor Author) commented Nov 18, 2020

> This is the other failing test, the 'specific' variant. It fails at line 212 because the checkpoint file doesn't exist anymore, but it did exist at line 201 when validate is called the first time.

I have an idea as to why this might be: by default, the ModelCheckpoint callback saves a new checkpoint at the end of every validation epoch. I think this should be disabled in Trainer.validate(...), or it risks deleting the very checkpoint being evaluated.

Still wondering why this is a problem only in some CI runs and not in others. Let's see if 9e59e6d fixes this.
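The failure mode can be reproduced with a minimal sketch (all names below are hypothetical; this is not the actual ModelCheckpoint implementation): a keep-best-1 checkpoint manager deletes the previous file when a new one is saved, so letting it run during a validate pass can remove the very checkpoint being evaluated unless saving is skipped while evaluating.

```python
import os
import tempfile

class TopKCheckpoint:
    """Hypothetical sketch of a keep-best-1 checkpoint callback."""
    def __init__(self, dirpath, evaluating=False):
        self.dirpath = dirpath
        self.evaluating = evaluating  # would be set by a hypothetical Trainer.validate()
        self.best_path = None

    def on_validation_end(self, step):
        if self.evaluating:
            return  # the fix: never save (or delete) while only evaluating
        path = os.path.join(self.dirpath, f"ckpt-step{step}.pt")
        open(path, "w").close()  # stand-in for writing model weights
        if self.best_path and self.best_path != path:
            os.remove(self.best_path)  # deletes the previous "best" file
        self.best_path = path

d = tempfile.mkdtemp()
cb = TopKCheckpoint(d)
cb.on_validation_end(step=1)
ckpt_to_eval = cb.best_path

# Without the guard, evaluating that checkpoint would trigger another save
# and remove ckpt_to_eval; with evaluating=True, the file survives.
cb.evaluating = True
cb.on_validation_end(step=2)
assert os.path.exists(ckpt_to_eval)
```

Whether the real fix gates on the callback or on the trainer state is an implementation detail; the key point is that an evaluation-only pass must not mutate the checkpoint directory.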

…evaluating

Without this, ModelCheckpoint might delete the very checkpoint being evaluated. Furthermore, the model will not change during evaluation anyway.
rohitgr7 (Contributor) left a comment

PR is big. Just a high-level review.

Review comments (outdated; resolved) on:
  • docs/source/trainer.rst
  • pytorch_lightning/trainer/trainer.py
  • tests/base/datamodules.py (×2)
  • pytorch_lightning/callbacks/base.py (×2)
  • pytorch_lightning/core/datamodule.py
  • pytorch_lightning/core/hooks.py (×4)
EliaCereda (Contributor Author)

I'll have some more time to dedicate to this over the coming week, and I'd like to ask if there is anything I can do to bring it closer to a mergeable state.

The biggest concern is that it's quite a big PR right now. To address this, I went through the changes included and I'd say they can be divided into 7 groups:

  1. A refactor of Trainer to replace the testing attribute with a more generic evaluating, which initially can only be None or 'test'.
  2. Updates to various components to check the new evaluating attribute instead of testing.
  3. Addition of the Trainer.validate(...) method, which basically differs from test(...) only in setting evaluating = 'validation'.
  4. Updates to various components to handle the new value of evaluating.
  5. Tests for Trainer.validate(...).
  6. Changes to the docs of various components to mention the existence of the new stage.
  7. Renames of some internal methods to better reflect their new purpose (e.g. train_or_test(), which becomes train_or_evaluate()).

Do you think I missed anything? Of these, I think groups 1 and 2 could definitely make sense on their own and might be pushed forward in a separate PR. On the other hand, I'm having a hard time finding a way to further separate the rest.

Do you have any suggestions on this? Thanks!
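The core of groups 1–4 can be sketched in miniature (hypothetical names throughout; this is not Lightning's actual code): replace a boolean `testing` flag with an `evaluating` attribute that is `None`, `'test'`, or `'validation'`, and let `validate()` and `test()` share one evaluation loop.

```python
class MiniTrainer:
    """Hypothetical sketch of the proposed refactor, not Lightning's implementation."""
    def __init__(self):
        self.evaluating = None  # None | 'test' | 'validation'

    def _run_evaluation(self, dataloader):
        # Shared loop: components check `evaluating` instead of a boolean `testing`.
        assert self.evaluating in ("test", "validation")
        return [batch * 2 for batch in dataloader]  # stand-in "metrics"

    def test(self, dataloader):
        self.evaluating = "test"
        try:
            return self._run_evaluation(dataloader)
        finally:
            self.evaluating = None

    def validate(self, dataloader):
        # Same semantics as test(), differing only in the stage value.
        self.evaluating = "validation"
        try:
            return self._run_evaluation(dataloader)
        finally:
            self.evaluating = None

t = MiniTrainer()
assert t.validate([1, 2]) == [2, 4]
assert t.test([3]) == [6]
assert t.evaluating is None  # stage is always reset afterwards
```

The `try`/`finally` reset mirrors why groups 1–2 can stand alone: once everything dispatches on `evaluating`, adding the `'validation'` value (groups 3–4) touches only the dispatch sites.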

@Borda, @rohitgr7, @carmocca

carmocca (Contributor)

I personally would keep it to this PR. It's definitely not the largest we've merged recently and the changes have cohesion. What do others prefer?

rohitgr7 (Contributor) commented Nov 28, 2020

I'd suggest separate PRs, just to minimize future bugs and allow better review, split the way suggested here: #4707 (comment)

PR-1: [1, 2] + callbacks
PR-2: [3, 4, 7]
PR-3: [5, 6]  # test and docs are almost independent so you can do it either way (separate or together)

```python
# We do this so __attach_datamodule in trainer.py doesn't mistakenly call setup('test') on trainer.test()
stage = args[1] if len(args) > 1 else kwargs.get("stage", None)

if stage == "fit" or stage is None:
    obj._has_setup_fit = True

if stage == "validation" or stage is None:
```
Contributor:

I would make all references in the datamodule "validate" (instead of "validation") to keep it consistent with fit and test.

Contributor:

fit and test can be nouns or verbs; however, we are talking about stages, which means they should be nouns. So if I am not mistaken (English is not my first language), using validation is more consistent: the validation stage vs. the validate stage.

Contributor:

I don't think grammar should be a consideration here, since we're only talking about variables in code, and variable-name consistency is more important. Thoughts on this? @justusschock @rohitgr7

Member:

I think you both have a point here, and one could certainly use both. However, I feel that validation stage is more intuitive, and personally I would go with it since it sounds 'more correct' to me, but this is just a personal opinion. Also, I think this should definitely not be a blocker here.

Contributor Author:

Yes, it is a good point. I was also ambivalent about it while I was writing the code.

There is another occurrence of this issue: the Trainer.evaluating attribute, which can be either test or validation. Here validation is the right choice in my opinion, reading it as "currently evaluating over the test/validation set".

It was not so clear cut in the data module: I'd say 'validation' sounds better to me too, but I would not be opposed to using 'validate' either.

Contributor:

Gotcha, it does have a better ring to it :]
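For context, the stage-tracking snippet quoted earlier in this thread can be made self-contained (the helper and stub below are hypothetical illustrations; the real hook-tracking wrapper lives in pytorch_lightning/core/datamodule.py): a stage may arrive positionally or as a keyword, and a stage of None marks every stage as set up.

```python
class DataModuleStub:
    """Hypothetical object whose setup-stage calls we want to record."""
    _has_setup_fit = False
    _has_setup_validation = False
    _has_setup_test = False

def track_setup(obj, *args, **kwargs):
    # Mirrors the quoted logic: args[0] is the bound `self` of the wrapped
    # hook, so the stage is args[1] when passed positionally.
    stage = args[1] if len(args) > 1 else kwargs.get("stage", None)

    if stage == "fit" or stage is None:
        obj._has_setup_fit = True
    if stage == "validation" or stage is None:
        obj._has_setup_validation = True
    if stage == "test" or stage is None:
        obj._has_setup_test = True

dm = DataModuleStub()
track_setup(dm, "self", "validation")  # "self" stands in for the bound instance
assert dm._has_setup_validation and not dm._has_setup_test

dm2 = DataModuleStub()
track_setup(dm2, stage=None)  # no stage: everything is considered set up
assert dm2._has_setup_fit and dm2._has_setup_validation and dm2._has_setup_test
```

This is what prevents trainer.test() from mistakenly being treated as having already run setup for the other stages.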

```python
def train_or_test(self):
    if self.trainer.testing:
        results = self.trainer.run_test()
```
```python
def train_or_evaluate(self):
```
Member:

The name here is a bit misleading, as this also runs test.

Contributor Author:

I used evaluate here to refer to either test or validate. I think it was inspired by the pre-existing Trainer.run_evaluation method, which is used to run either the test or validation loop depending on the value of the test_mode parameter.

Let me know if you have a better idea for the name!
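Under that reading of "evaluate", the rename amounts to a three-way dispatch on the `evaluating` attribute. A hypothetical miniature (not the actual method bodies):

```python
class DispatchTrainer:
    """Hypothetical sketch of the renamed dispatch method."""
    def __init__(self, evaluating=None):
        self.evaluating = evaluating  # None | 'test' | 'validation'

    def run_test(self):
        return "test results"

    def run_validation(self):
        return "validation results"

    def run_train(self):
        return "train results"

    def train_or_evaluate(self):
        # Replaces train_or_test(): "evaluate" covers both test and validation.
        if self.evaluating == "test":
            return self.run_test()
        if self.evaluating == "validation":
            return self.run_validation()
        return self.run_train()

assert DispatchTrainer("test").train_or_evaluate() == "test results"
assert DispatchTrainer("validation").train_or_evaluate() == "validation results"
assert DispatchTrainer().train_or_evaluate() == "train results"
```

Seen this way, the name is accurate: test is one of the two evaluation stages the method can run.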

@Borda Borda modified the milestones: 1.1, 1.2 Nov 30, 2020
@Borda Borda added the discussion In a discussion stage label Nov 30, 2020
…alidate

# Conflicts:
#	CHANGELOG.md
#	pytorch_lightning/trainer/evaluation_loop.py
#	pytorch_lightning/trainer/trainer.py
#	tests/callbacks/test_callbacks.py
EliaCereda (Contributor Author)

Just published the two new PRs #4945 and #4948, split as proposed. Closing.

@EliaCereda EliaCereda closed this Dec 2, 2020
@Borda Borda modified the milestones: 1.2, 1.1 Dec 4, 2020
Labels
design (Includes a design discussion), discussion (In a discussion stage), feature (Is an improvement or enhancement)
Development

Successfully merging this pull request may close these issues.

Evaluation over the validation set
7 participants