Monitor metric not found for Learning Schedulers when using Result() #3243

Closed
AxlAlm opened this issue Aug 28, 2020 · 4 comments · Fixed by #3598
Labels: bug (Something isn't working), duplicate (This issue or pull request already exists), help wanted (Open to be worked on), priority: 0 (High priority task)
Comments


AxlAlm commented Aug 28, 2020

🐛 Bug

If you are using Result() (TrainResult() and EvalResult()), you cannot use a learning rate scheduler that monitors a metric, because the scheduler will not find the metrics logged/stored by the Result() class. The available metrics listed in the error below are not the ones that exist in my TrainResult() and EvalResult().

Either the update_learning_rates() function in training_loop.py is not looking in the right place for the metrics, or the Result() metric aggregation/updating is not writing them to the right place.

Am I right, or am I doing something wrong?
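
A minimal sketch of the kind of setup described above (assumed, not taken verbatim from my code: the module, the stand-in F1 computation, and the optimizer are placeholders; the metric name and the scheduler `monitor` key follow the error below):

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl
from torch.optim.lr_scheduler import ReduceLROnPlateau


class SegModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(16, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        return pl.TrainResult(minimize=loss)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.cross_entropy(logits, y)
        # Stand-in for the real segmentation F1 metric.
        f1 = (logits.argmax(dim=-1) == y).float().mean()
        result = pl.EvalResult(checkpoint_on=loss)
        # The metric is logged here, but update_learning_rates() only finds
        # val_checkpoint_on / checkpoint_on in trainer.callback_metrics,
        # so the scheduler never sees 'val-seg-f1'.
        result.log('val-seg-f1', f1)
        return result

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        scheduler = ReduceLROnPlateau(optimizer, mode='max')
        # Scheduler dict with the `monitor` key, as suggested by the error message.
        return [optimizer], [{'scheduler': scheduler, 'monitor': 'val-seg-f1'}]
```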

ERROR:

....
~/opt/anaconda3/envs/axlnlp/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_pretrain_routine(self, model)
   1237 
   1238         # CORE TRAINING LOOP
-> 1239         self.train()
   1240 
   1241     def _run_sanity_check(self, ref_model, model):

~/opt/anaconda3/envs/axlnlp/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py in train(self)
    399 
    400                 # update LR schedulers
--> 401                 self.update_learning_rates(interval='epoch')
    402 
    403                 # early stopping

~/opt/anaconda3/envs/axlnlp/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py in update_learning_rates(self, interval, monitor_metrics)
   1279                         avail_metrics = ','.join(list(self.callback_metrics.keys()))
   1280                         raise MisconfigurationException(
-> 1281                             f'ReduceLROnPlateau conditioned on metric {monitor_key}'
   1282                             f' which is not available. Available metrics are: {avail_metrics}.'
   1283                             ' Condition can be set using `monitor` key in lr scheduler dict'

MisconfigurationException: ReduceLROnPlateau conditioned on metric val-seg-f1 which is not available. Available metrics are: val_early_stop_on,val_checkpoint_on,checkpoint_on. Condition can be set using `monitor` key in lr scheduler dict
AxlAlm added the bug (Something isn't working) and help wanted (Open to be worked on) labels on Aug 28, 2020
@github-actions
Contributor

Hi! Thanks for your contribution, great first issue!


AxlAlm commented Aug 28, 2020

This is using pl version 0.9.0

@chris-clem

I have the same issue

@carmocca
Contributor

Duplicate of #2976
