Proposal: Pluggable `ModelTrainer` train function #3084

plonerma · 2023-02-06T16:13:41Z

This PR aims to make the ModelTrainer easier to adapt to a wider range of training settings. I propose the following changes to move in this direction. The proposed architecture as well as the concrete implementation are very much up for discussion.

Plugin System

A plugin system has been introduced to replace the long, monolithic train function and disentangle its components.
Plugins can hook into the events emitted by the training loop to alter the training procedure, compute performance scores, track results, etc.
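The event/hook idea can be illustrated with a minimal sketch. It assumes a trainer that emits named events and plugins that subscribe methods to them. All names here (`Trainer`, `BasePlugin`, `register_hook`, `dispatch`, `EpochCounterPlugin`, and the event names such as `after_training_epoch`) are hypothetical and are not the API proposed in this PR.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List


class Trainer:
    """Hypothetical trainer whose loop emits named events to registered hooks."""

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Callable[..., Any]]] = defaultdict(list)

    def register_hook(self, event: str, callback: Callable[..., Any]) -> None:
        self._hooks[event].append(callback)

    def dispatch(self, event: str, **kwargs: Any) -> None:
        # Call every hook subscribed to this event, in registration order.
        for callback in self._hooks[event]:
            callback(**kwargs)

    def train(self, max_epochs: int) -> None:
        # The loop itself only emits events; extra behavior lives in plugins.
        self.dispatch("before_training_loop", max_epochs=max_epochs)
        for epoch in range(1, max_epochs + 1):
            self.dispatch("before_training_epoch", epoch=epoch)
            # ... forward/backward passes over the batches would go here ...
            self.dispatch("after_training_epoch", epoch=epoch)
        self.dispatch("after_training_loop")


class BasePlugin:
    """Hypothetical base class: subclasses map event names to method names."""

    hooks: Dict[str, str] = {}

    def attach_to(self, trainer: Trainer) -> None:
        for event, method_name in self.hooks.items():
            trainer.register_hook(event, getattr(self, method_name))


class EpochCounterPlugin(BasePlugin):
    """Example plugin that records which epochs have finished."""

    hooks = {"after_training_epoch": "on_epoch_end"}

    def __init__(self) -> None:
        self.finished_epochs: List[int] = []

    def on_epoch_end(self, epoch: int) -> None:
        self.finished_epochs.append(epoch)


trainer = Trainer()
counter = EpochCounterPlugin()
counter.attach_to(trainer)
trainer.train(max_epochs=3)
print(counter.finished_epochs)  # [1, 2, 3]
```

Because the loop only dispatches events, adding a behavior (early stopping, checkpointing, AMP) means attaching another plugin rather than adding another branch to the train function.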

Logging

Logging covers two kinds of output: regular evaluations and irregular events. Currently, there is no dedicated mechanism for logging regular evaluations.

With this PR, results of model evaluations are published via the metric_recorded event in the ModelTrainer. Regular performance reports are logged to the flair-logger by a dedicated plugin.
Loss files and other artifacts produced by the training loop will also be handled by dedicated plugins.
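A minimal sketch of this split, assuming the "flair-logger" is the standard-library logger returned by `logging.getLogger("flair")`: evaluation code only publishes a `metric_recorded` event, and two independent plugins consume it, one writing a report line to the logger and one appending rows to a loss file. The `Trainer`, `record_metric`, `LogFormatterPlugin` and `LossFilePlugin` names, the hook signature, and the column layout are hypothetical illustrations, not the PR's implementation.

```python
import io
import logging
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, TextIO

# Assumption: the "flair-logger" is the standard-library logger named "flair".
log = logging.getLogger("flair")


class Trainer:
    """Hypothetical trainer reduced to event dispatch plus metric publishing."""

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Callable[..., Any]]] = defaultdict(list)

    def register_hook(self, event: str, callback: Callable[..., Any]) -> None:
        self._hooks[event].append(callback)

    def dispatch(self, event: str, **kwargs: Any) -> None:
        for callback in self._hooks[event]:
            callback(**kwargs)

    def record_metric(self, name: str, value: float, step: int) -> None:
        # Evaluation code only publishes; it does not know who is listening.
        self.dispatch("metric_recorded", name=name, value=value, step=step)


class LogFormatterPlugin:
    """Reports every recorded metric to the flair logger."""

    def attach_to(self, trainer: Trainer) -> None:
        trainer.register_hook("metric_recorded", self.on_metric)

    def on_metric(self, name: str, value: float, step: int) -> None:
        log.info("step %d | %s = %.4f", step, name, value)


class LossFilePlugin:
    """Appends one tab-separated row per recorded metric to a text stream."""

    def __init__(self, stream: TextIO) -> None:
        self.stream = stream
        self.stream.write("STEP\tNAME\tVALUE\n")

    def attach_to(self, trainer: Trainer) -> None:
        trainer.register_hook("metric_recorded", self.on_metric)

    def on_metric(self, name: str, value: float, step: int) -> None:
        self.stream.write(f"{step}\t{name}\t{value}\n")


logging.basicConfig(level=logging.INFO)
trainer = Trainer()
buffer = io.StringIO()  # stands in for an open loss file
LogFormatterPlugin().attach_to(trainer)
LossFilePlugin(buffer).attach_to(trainer)
trainer.record_metric("dev/micro_f1", 0.8125, step=1)
print(buffer.getvalue())
```

Either plugin can be dropped or replaced without touching the code that computes the metric.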

As is currently the case, irregular events are logged via the logging module to the flair-logger wherever they occur (directly in the trainer or in separate plugins). A plugin takes care of opening and closing a file handler.
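The file-handler plugin could look roughly like the following sketch, which uses only the standard `logging` module. It assumes the trainer emits hypothetical `before_training_loop` and `after_training_loop` events; `Trainer`, `LogFilePlugin`, and the event names are illustrative and not taken from the PR.

```python
import logging
import os
import tempfile
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Optional

log = logging.getLogger("flair")
log.setLevel(logging.INFO)  # otherwise INFO records would be filtered out


class Trainer:
    """Hypothetical trainer that logs irregular events where they occur."""

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Callable[..., Any]]] = defaultdict(list)

    def register_hook(self, event: str, callback: Callable[..., Any]) -> None:
        self._hooks[event].append(callback)

    def dispatch(self, event: str, **kwargs: Any) -> None:
        for callback in self._hooks[event]:
            callback(**kwargs)

    def train(self) -> None:
        self.dispatch("before_training_loop")
        try:
            # Irregular events go straight to the logger, wherever they happen.
            log.info("training started")
            log.info("learning rate reduced to %.4f", 0.05)
        finally:
            # Runs even if training raises, so the file handler is always closed.
            self.dispatch("after_training_loop")


class LogFilePlugin:
    """Attaches a FileHandler to the flair logger for the duration of training."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.handler: Optional[logging.FileHandler] = None

    def attach_to(self, trainer: Trainer) -> None:
        trainer.register_hook("before_training_loop", self.open_handler)
        trainer.register_hook("after_training_loop", self.close_handler)

    def open_handler(self) -> None:
        self.handler = logging.FileHandler(self.path, mode="w", encoding="utf-8")
        self.handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        log.addHandler(self.handler)

    def close_handler(self) -> None:
        if self.handler is not None:
            log.removeHandler(self.handler)
            self.handler.close()
            self.handler = None


path = os.path.join(tempfile.mkdtemp(), "training.log")
trainer = Trainer()
plugin = LogFilePlugin(path)
plugin.attach_to(trainer)
trainer.train()
with open(path, encoding="utf-8") as f:
    contents = f.read()
```

Keeping the handler's lifetime in one plugin means the trainer never has to remember to detach it, and the `try`/`finally` ensures cleanup also happens when training fails.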

Metrics

All metrics are published as metric_recorded events in the trainer. By attaching hooks to this event, these metrics can be recorded to files, TensorBoard, and/or the shell. This also makes it possible to replace TensorBoard with a different logging framework.

Each performance value is published together with its metric name, the value itself, the type of the value, the wall-clock time of the evaluation, and the current step.
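As an illustration, the published payload could be modeled as a small record type, with each backend being just a subscriber. This is a sketch under assumptions: `MetricRecord`, `RecordType`, `MetricBus`, `TsvRecorder`, and `ShellRecorder` are hypothetical names, and the set of value types is invented for the example.

```python
import enum
import io
import time
from dataclasses import dataclass, field
from typing import Callable, List, TextIO


class RecordType(enum.Enum):
    """Hypothetical value kinds; a backend may handle only some of them."""

    SCALAR = "scalar"
    TEXT = "text"


@dataclass(frozen=True)
class MetricRecord:
    name: str
    value: object
    record_type: RecordType
    step: int
    walltime: float = field(default_factory=time.time)  # time of evaluation


class MetricBus:
    """Minimal stand-in for the trainer's metric_recorded event."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[MetricRecord], None]] = []

    def subscribe(self, callback: Callable[[MetricRecord], None]) -> None:
        self._subscribers.append(callback)

    def metric_recorded(self, record: MetricRecord) -> None:
        for callback in self._subscribers:
            callback(record)


class TsvRecorder:
    """Writes scalar records as TSV rows and ignores all other record types."""

    def __init__(self, stream: TextIO) -> None:
        self.stream = stream

    def __call__(self, record: MetricRecord) -> None:
        if record.record_type is RecordType.SCALAR:
            self.stream.write(
                f"{record.step}\t{record.walltime:.3f}\t{record.name}\t{record.value}\n"
            )


class ShellRecorder:
    """Prints every record; any other backend could be subscribed instead."""

    def __call__(self, record: MetricRecord) -> None:
        print(f"[step {record.step}] {record.name} ({record.record_type.value}): {record.value}")


bus = MetricBus()
tsv = io.StringIO()  # stands in for a metrics file
bus.subscribe(TsvRecorder(tsv))
bus.subscribe(ShellRecorder())
bus.metric_recorded(MetricRecord("train/loss", 0.25, RecordType.SCALAR, step=10))
bus.metric_recorded(MetricRecord("note", "lr reduced", RecordType.TEXT, step=10))
```

These fields line up with the arguments of PyTorch's `SummaryWriter.add_scalar(tag, scalar_value, global_step, walltime)`, so a TensorBoard backend would be a thin adapter subscribed in the same way.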

plonerma marked this pull request as draft February 6, 2023 16:14

plonerma changed the title ~~Pluggable ModelTrainer train function~~ Proposal: Pluggable ModelTrainer train function Feb 6, 2023

plonerma added 15 commits February 21, 2023 16:27

Minor improvements in training behavior plugin

b12c111

Improved error message on wrong hook callback return-type.

d119973

Fixed mistake in error message

566ba1d

Improved error message

c52224b

Pass epoch to collecting_train_return_values hook (may be relevant for metric-recording)

d9d59e0

Count total batches in plugin instead of calculating it (when training a second time, this is more flexible)

667b9cd

Implemented wandb alerts

b81dc1a

Merge branch 'pluggable_trainer' into pluggable_trainer_dev

e103985

Removed conditional usage variables from amp, swa, and tensorboard logger plugins

0bd24f7

Removed references to use-conditionals

3e961ae

Fixed amp plugin, updated default plugins

8b6def8

Changed collection of return values

281d34b

Moved evaluation & basic logging to train function and deprecated dependency functionality in plugin system

4bc92e8

Fixed minor issues

22ac9fc

Removed empty newline

640c694

alanakbik changed the base branch from master to pluggable_trainer March 24, 2023 11:43

alanakbik marked this pull request as ready for review March 24, 2023 11:44

alanakbik merged commit a92619b into flairNLP:pluggable_trainer Mar 24, 2023

alanakbik mentioned this pull request Apr 3, 2023

Major refactoring of ModelTrainer #3182

Merged
