- The repository has been rewritten in a cleaner way. A large part of the code has been moved into `utils`, split between `data_loading` and `logging_config`.
- Linear layer (ConvNet <> GRU communication) size is now calculated at model `__init__.py` and not hardcoded as it was; a minimal sketch of the idea is shown below.
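A minimal sketch of how the size can be derived at construction time, assuming a dummy forward pass through the conv stack (the layers, shapes, and attribute names here are illustrative, not the repository's actual model):

```python
import torch
import torch.nn as nn


class TinyCRNN(nn.Module):
    """Toy example: derive the ConvNet -> GRU linear size from the input resolution."""

    def __init__(self, img_height=64, img_width=256, gru_hidden=128):
        super().__init__()
        self.convnet = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.MaxPool2d(2),
        )
        # Run a dummy tensor through the convnet to measure the per-timestep
        # feature size instead of hardcoding it; this is also what makes
        # arbitrary input resolutions workable.
        with torch.no_grad():
            _, channels, height, _ = self.convnet(
                torch.zeros(1, 1, img_height, img_width)
            ).shape
        self.linear = nn.Linear(channels * height, gru_hidden)  # sized here, not hardcoded
        self.gru = nn.GRU(gru_hidden, gru_hidden, batch_first=True)

    def forward(self, x):
        feats = self.convnet(x)            # (B, C, H, W)
        feats = feats.permute(0, 3, 1, 2)  # (B, W, C, H): width becomes the time axis
        feats = feats.flatten(2)           # (B, W, C*H)
        out, _ = self.gru(self.linear(feats))
        return out
```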
- A pre-commit hook was added, including `black` and `flake8` for the moment.
- Tests were heavily implemented for model init, forward, backward, and CTC/cross-entropy loss.
- Test for `decode_predictions` (CTC version).
- Any resolution is now supported by the model.
- Training a model with attention and cross entropy loss is now finally possible.
Trained with Cross Entropy Loss and Attention:
- Attention layer is now properly applied to the hidden states of the GRU unit. Before, attention was projected directly to the linear layer; now the Gated Recurrent Unit hidden states are multiplied by the attention weights and then projected to the linear layer, as sketched below.
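A minimal, hedged sketch of the idea (the attention module and names are illustrative, not the exact code in `models/attention.py`):

```python
import torch
import torch.nn as nn


class AdditiveAttention(nn.Module):
    """Illustrative attention that weights GRU hidden states before the classifier."""

    def __init__(self, hidden_size):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states):  # hidden_states: (B, T, H)
        weights = torch.softmax(self.score(hidden_states), dim=1)  # (B, T, 1)
        return hidden_states * weights  # weighted hidden states, (B, T, H)


# Sketch of how it slots into the model's forward pass:
# gru_out, _ = self.gru(features)      # (B, T, H) GRU hidden states
# attended = self.attention(gru_out)   # hidden states * attention weights
# logits = self.classifier(attended)   # then projected to the linear layer
```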
- `CrossEntropyLoss` is now working properly. Before, the loss was computed probabilistically between two 3D tensors; now the tensors are reshaped to work as in image classification: the target is reshaped to a 1D long tensor (not float) that holds the batch of targets in sequence and contains only the correct class indexes, not one-hot encodings as before (the target has shape `batch_size * sequence_length`). A hedged sketch follows.
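A hedged sketch of the reshaping, assuming logits of shape `(batch, seq_len, num_classes)` and integer targets of shape `(batch, seq_len)`:

```python
import torch
import torch.nn as nn

batch_size, seq_len, num_classes = 8, 45, 36                    # illustrative sizes
logits = torch.randn(batch_size, seq_len, num_classes)          # model output
targets = torch.randint(0, num_classes, (batch_size, seq_len))  # class indexes, long, not one-hot

criterion = nn.CrossEntropyLoss()
loss = criterion(
    logits.reshape(-1, num_classes),  # (batch * seq_len, num_classes), like image classification
    targets.reshape(-1),              # (batch * seq_len,) 1-D long tensor of correct indexes
)
```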
- Cross Entropy Loss was tested and achieved 80% accuracy on the same dataset, with the same model, as when using CTC Loss. The cross-entropy loss has to be multiplied by some scalar to compensate for the fact that padding is learned very quickly by the model. In the case I tested, targets had 6 characters and 39 pad tokens, so the loss is very low at the very start of training, because those 39 classifications are always correct and are learned very fast, since that is a strong bias.
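Continuing the sketch above, the compensation is just a constant factor applied to the loss (the factor of 10 is an arbitrary example, not the value used in the repository):

```python
# Early in training most positions are pad tokens (39 of 45 in the example above)
# and are trivially correct, so the raw cross-entropy is small; scale it up by a
# constant so the gradient signal on the real characters stays useful.
ce_scale = 10.0  # illustrative value, tune per dataset
loss = criterion(logits.reshape(-1, num_classes), targets.reshape(-1)) * ce_scale
```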
- Added gradient clipping in the training procedure (`torch.nn.utils.clip_grad_norm_`) with a default max norm of 5. It is not parameterizable, as I didn't see the necessity for it; it lives in `engine.py.train_fn`, right after `loss.backward()`, as sketched below.
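A hedged sketch of where the clipping sits in a training step (variable names are illustrative, not the actual `engine.py` code):

```python
import torch


def train_step(model, images, targets, criterion, optimizer, max_norm=5.0):
    """Illustrative training step: clip gradients right after loss.backward()."""
    optimizer.zero_grad()
    logits = model(images)
    loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)  # default max norm of 5
    optimizer.step()
    return loss.item()
```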
- Attention network is now at models/attention.py.
- Refactored ugly variable names in train.py.
- Cross Entropy loss is now available to use with attention, and attention is also optional. To use them, simply create the model with:
```python
from models.crnn import CRNN

model = CRNN(dims=256, num_chars=35, use_attention=True, use_ctc=True)
```
- Pad + One Hot Encoding method (used if you want to train with cross entropy loss).
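A hedged sketch of the pad + one-hot preparation (the pad index, max length, and alphabet size are assumptions; note that `nn.CrossEntropyLoss` consumes the index form directly):

```python
import torch
import torch.nn.functional as F


def pad_and_encode(indexes, max_len=45, pad_idx=0, num_classes=36):
    """Pad a label (list of character class indexes) to a fixed length, then one-hot encode."""
    padded = indexes[:max_len] + [pad_idx] * (max_len - len(indexes))
    target = torch.tensor(padded, dtype=torch.long)
    one_hot = F.one_hot(target, num_classes=num_classes)  # (max_len, num_classes)
    return target, one_hot


# e.g. a 6-character label padded out with 39 pad tokens, as in the example above
target, one_hot = pad_and_encode([12, 3, 7, 30, 1, 22])
```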
- A decoder (`utils/model_decoders.py.decode_padded_predictions`) for when cross entropy is used (a simple output decoder that replaces the pad token with an empty string).
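A hedged sketch of such a decoder (the argument names and pad convention are assumptions, not the exact signature in `utils/model_decoders.py`):

```python
def decode_padded_predictions_sketch(preds, classes, pad_idx=0):
    """Argmax each timestep and replace the pad token with an empty string."""
    # preds: (batch, seq_len, num_classes) raw model output
    # classes: e.g. ["", "a", "b", ...] with index 0 reserved for the pad token
    indexes = preds.argmax(dim=-1)  # (batch, seq_len)
    decoded = []
    for sequence in indexes.tolist():
        decoded.append("".join(classes[i] if i != pad_idx else "" for i in sequence))
    return decoded
```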
- General documentation for methods and model creation.
- Removed the old models as attention and loss functions are now parameterizable.
- Added Attention mechanism at prediction stage of CRNN (available at models/attention_crnn.py)
- Refactored the whole codebase.
- Added a new CRNN with ResNet backbone.
- Added hydra configs (before, there was a python `config.py` file).
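A hedged sketch of the Python side of such a setup (the config path, name, and keys are illustrative, not the repository's actual config tree):

```python
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="configs", config_name="config")
def train(cfg: DictConfig) -> None:
    # Values previously hardcoded in config.py would now come from the hydra config tree.
    print(cfg.training.batch_size, cfg.training.lr)


if __name__ == "__main__":
    train()
```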
- Added hydra logging, which now writes the training log to a file. Rich tables are displayed while training, but the training log (losses and accuracies with timestamps) is available in `train.log` (written inside `/output/date/time/train.log`).
- Added a CTC Decoder method.
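A hedged sketch of a greedy CTC decode (argmax per timestep, collapse repeats, drop blanks); the blank index and `classes` list are assumptions:

```python
def ctc_greedy_decode(log_probs, classes, blank_idx=0):
    """Greedy CTC decoding: argmax per timestep, collapse repeats, drop blanks."""
    # log_probs: (seq_len, batch, num_classes), the layout nn.CTCLoss expects
    best_path = log_probs.argmax(dim=-1).permute(1, 0)  # (batch, seq_len)
    decoded = []
    for sequence in best_path.tolist():
        chars, prev = [], None
        for idx in sequence:
            if idx != blank_idx and idx != prev:
                chars.append(classes[idx])
            prev = idx
        decoded.append("".join(chars))
    return decoded
```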
- config.py was replaced by hydra.
- Old methods to remove duplicates (the CTC decoder is now used instead).