Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TrainerState refactor [5/5] #7173

Merged
merged 19 commits into from
May 4, 2021
Merged

TrainerState refactor [5/5] #7173

merged 19 commits into from
May 4, 2021

Conversation

carmocca
Copy link
Contributor

@carmocca carmocca commented Apr 23, 2021

What does this PR do?

TrainerState is now a dataclass which contains the TrainerStatus, TrainerFn (previous TrainerState), and RunningStage.
See the changes to pytorch-lightning/pytorch_lightning/trainer/states.py for an overview. All other changes are just fixing the code to work with that.

cc: @ananthsub. The TrainerState dataclass could be the one used for progress tracking in the future.

I'm open to different class names if the proposed ones are confusing.

Fixes

(Reported on Slack)

When KeyboardInterrupt is handled in the trainer, we set self.state = TrainerState.INTERRUPTED, thus losing the reference to the original TrainerState ({fit,validate,test,predict}). This means when teardown gets called, we don't know which stage should be set.

Fixes #7286

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • [n/a] Did you make sure to update the documentation with your changes? (if necessary)
  • [n/a] Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

@carmocca carmocca added bug Something isn't working feature Is an improvement or enhancement refactor labels Apr 23, 2021
@carmocca carmocca self-assigned this Apr 23, 2021
@pep8speaks
Copy link

pep8speaks commented Apr 23, 2021

Hello @carmocca! Thanks for updating this PR.

Line 292:13: W503 line break before binary operator

Comment last updated at 2021-05-04 10:05:34 UTC

@codecov
Copy link

codecov bot commented Apr 26, 2021

Codecov Report

Merging #7173 (c3f6354) into master (a6aa1a0) will increase coverage by 0%.
The diff coverage is 92%.

@@          Coverage Diff           @@
##           master   #7173   +/-   ##
======================================
  Coverage      91%     91%           
======================================
  Files         200     200           
  Lines       12916   12922    +6     
======================================
+ Hits        11764   11792   +28     
+ Misses       1152    1130   -22     

@carmocca carmocca mentioned this pull request Apr 27, 2021
9 tasks
@carmocca carmocca changed the title [WIP] TrainerState refactor [WIP] TrainerState refactor [3/n] Apr 27, 2021
@carmocca carmocca changed the title [WIP] TrainerState refactor [3/n] [WIP] TrainerState refactor [4/n] Apr 28, 2021
@carmocca carmocca changed the title [WIP] TrainerState refactor [4/n] [WIP] TrainerState refactor [5/n] Apr 28, 2021
@carmocca carmocca changed the title [WIP] TrainerState refactor [5/n] [WIP] TrainerState refactor [5/5] Apr 30, 2021
@carmocca carmocca changed the title [WIP] TrainerState refactor [5/5] TrainerState refactor [5/5] May 3, 2021
@carmocca carmocca added this to the v1.3 milestone May 3, 2021
@carmocca carmocca marked this pull request as ready for review May 3, 2021 13:45
@carmocca carmocca requested a review from williamFalcon as a code owner May 3, 2021 13:45
Copy link
Contributor

@ananthsub ananthsub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great refactor! I think this will make #7315 easier by providing a central dataclass we can build on top of.

How'd you find all the references to trainer.state or trainer.running_stage? grep? Deleting the properties and seeing what broke?

@carmocca
Copy link
Contributor Author

carmocca commented May 3, 2021

How'd you find all the references to trainer.state or trainer.running_stage? grep? Deleting the properties and seeing what broke?

Bit of both :)

@mergify mergify bot added the has conflicts label May 3, 2021
@carmocca carmocca added the ready PRs ready to be merged label May 3, 2021
@mergify mergify bot removed the has conflicts label May 3, 2021
@Borda Borda enabled auto-merge (squash) May 4, 2021 07:01
Copy link
Member

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just need to clarify the back compatibility, how about usest who were asking trainer.state?

Copy link
Contributor

@tchaton tchaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really neat !

@edenlightning edenlightning added priority: 0 High priority task Important labels May 4, 2021
@mergify mergify bot added the has conflicts label May 4, 2021
@mergify mergify bot removed the has conflicts label May 4, 2021
@carmocca
Copy link
Contributor Author

carmocca commented May 4, 2021

just need to clarify the back compatibility, how about usest who were asking trainer.state?

If users were relying on it, it will break but:

  • I don't see how that could be avoided
  • TrainerState is not in the docs
  • I don't think we have ever advertised this as something people can use
  • The previous TrainerState is not in any stable release yet - this is why It's important to get this into 1.3

@Borda Borda merged commit 8c0ea92 into master May 4, 2021
@Borda Borda deleted the trainer-state-refactor branch May 4, 2021 10:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working feature Is an improvement or enhancement priority: 0 High priority task ready PRs ready to be merged refactor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AttributeError: 'DataModule' object has no attribute 'has_teardown_None'
8 participants