Proposal: Pluggable `ModelTrainer` train function #3084
Merged
This PR aims to make the `ModelTrainer` easier to adapt to a wider range of training settings. I propose the following changes to move in this direction. The proposed architecture as well as the concrete implementation are very much up for discussion.

Plugin System
A plugin system has been introduced to replace the long `train` function and disentangle its components.

Plugins can hook into the events produced by the training loops to alter the behavior of the training procedure, produce performance scores, track results, etc.
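To make the idea concrete, here is a minimal sketch of how hooks and events could fit together. It is not the implementation in this PR: the names `Pluggable`, `register_hook`, `dispatch`, `ToyTrainer`, `EpochCounterPlugin`, and the event names other than `metric_recorded` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Pluggable:
    """Hypothetical base class that dispatches named events to registered hooks."""

    # Illustrative event names; only "metric_recorded" appears in the PR text.
    valid_events = {"before_training", "after_training_epoch", "metric_recorded"}

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Callable]] = defaultdict(list)

    def register_hook(self, event: str, func: Callable) -> None:
        # Reject unknown events early so typos do not fail silently.
        if event not in self.valid_events:
            raise ValueError(f"Unknown event: {event}")
        self._hooks[event].append(func)

    def dispatch(self, event: str, *args, **kwargs) -> None:
        # Call every hook registered for this event, in registration order.
        for func in self._hooks[event]:
            func(*args, **kwargs)


class ToyTrainer(Pluggable):
    """Stand-in for a trainer: the loop only emits events."""

    def train(self, max_epochs: int) -> None:
        self.dispatch("before_training", max_epochs=max_epochs)
        for epoch in range(1, max_epochs + 1):
            # A real trainer would run forward/backward passes here.
            self.dispatch("after_training_epoch", epoch=epoch)


class EpochCounterPlugin:
    """Hypothetical plugin that tracks which epochs have finished."""

    def __init__(self) -> None:
        self.epochs_seen: List[int] = []

    def attach_to(self, trainer: Pluggable) -> None:
        trainer.register_hook("after_training_epoch", self.on_epoch_end)

    def on_epoch_end(self, epoch: int) -> None:
        self.epochs_seen.append(epoch)
```

In this sketch the training loop knows nothing about what plugins do with an event, which is the disentangling the proposal is after: new behavior is added by attaching another plugin rather than by editing `train`.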
Logging
Logging may refer to regular evaluations and irregular events. Currently there is no dedicated mechanism for logging regular evaluations.
With this PR, results of model evaluations are published via the `metric_recorded` event in the ModelTrainer. Regular performance reports are logged to the flair-logger in a dedicated plugin.

Loss-files and other artifacts produced by the training-loop will also be handled in dedicated plugins.
As is currently the case, irregular events are logged via the logging module to the flair-logger wherever they occur (directly in the trainer or in separate plugins). A plugin takes care of opening and closing a file handler.
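A plugin that opens and closes a file handler could look roughly like the following. This is a sketch built on the standard `logging` module only. The class name `LogFilePlugin`, its method names, the file name `training.log`, and the logger name `"flair"` are assumptions made for illustration, not the code in this PR.

```python
import logging
from pathlib import Path

# Assumption: the flair-logger is the standard-library logger named "flair".
log = logging.getLogger("flair")


class LogFilePlugin:
    """Hypothetical plugin that mirrors log messages into a file during training."""

    def __init__(self, base_path) -> None:
        self.log_path = Path(base_path) / "training.log"
        self.handler = None

    def before_training(self) -> None:
        # Open the file handler and attach it to the logger.
        self.log_path.parent.mkdir(parents=True, exist_ok=True)
        self.handler = logging.FileHandler(self.log_path, mode="w", encoding="utf-8")
        self.handler.setLevel(logging.INFO)
        self.handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        log.addHandler(self.handler)

    def after_training(self) -> None:
        # Detach and close the handler so the file is flushed and released.
        if self.handler is not None:
            log.removeHandler(self.handler)
            self.handler.close()
            self.handler = None
```

The two methods would be registered as hooks on whatever events mark the start and end of training. Code that logs irregular events keeps calling `log.info(...)` as before and does not need to know whether a file handler is attached.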
Metrics
All metrics are published as `metric_recorded` events in the trainer. By applying hooks to this event, these metrics can be recorded to files, a tensorboard, and/or the shell. This also makes it possible to replace the tensorboard with a different logging framework.

Performance values are published with a metric name, the value, the type of value, the (wall)time of evaluation as well as the current step.
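As an illustration of the fields listed above, a published metric could be modeled as a small record that is handed to every hook on `metric_recorded`. The sketch below is an assumption about the shape of such a record, not the PR's actual classes: `MetricRecord`, `RecordType`, `MetricPublisher`, and the field names are hypothetical.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, List


class RecordType(Enum):
    """Hypothetical set of value types a metric can carry."""
    SCALAR = "scalar"
    STRING = "string"


@dataclass
class MetricRecord:
    """One published metric: name, value, value type, wall time, and step."""
    name: str
    value: Any
    global_step: int
    typ: RecordType = RecordType.SCALAR
    walltime: float = field(default_factory=time.time)


class MetricPublisher:
    """Stand-in for the trainer side: forwards each record to all hooks."""

    def __init__(self) -> None:
        self._hooks: List[Callable[[MetricRecord], None]] = []

    def register_hook(self, func: Callable[[MetricRecord], None]) -> None:
        self._hooks.append(func)

    def metric_recorded(self, record: MetricRecord) -> None:
        for hook in self._hooks:
            hook(record)


def make_tsv_hook(rows: List[str]) -> Callable[[MetricRecord], None]:
    """Return a hook that appends scalar metrics as tab-separated rows."""
    def hook(record: MetricRecord) -> None:
        if record.typ is RecordType.SCALAR:
            rows.append(f"{record.global_step}\t{record.name}\t{record.value}")
    return hook


def print_hook(record: MetricRecord) -> None:
    """Hook that writes every metric to the shell."""
    print(f"[step {record.global_step}] {record.name} = {record.value}")
```

A possible usage: register `make_tsv_hook(rows)` and `print_hook` on the same publisher, then call `publisher.metric_recorded(MetricRecord("dev/loss", 0.42, global_step=10))`. Swapping the tensorboard for another framework would then amount to registering a different hook that consumes the same record.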