This exercise will allow you to explore language modelling. We focus on the key concept of multi-head attention.
Navigate to the file src/attention_model.py and implement multi-head attention [1].
To make attention useful in a language modelling scenario we must not use future information. A model without access to upcoming inputs or words is known as causal. Since our attention matrix is multiplied from the left, we must mask out the upper triangle, excluding the main diagonal, to enforce causality; a sketch of this is given below.
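The following is a minimal sketch of causal multi-head attention, assuming PyTorch as the framework; the class and argument names (CausalMultiHeadAttention, d_model, num_heads) are illustrative and need not match what src/attention_model.py expects.

```python
# Sketch only: a causal multi-head attention module, assuming PyTorch.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # One projection each for queries, keys and values, plus the output projection.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        # Project and split into heads: (batch, heads, seq_len, d_head).
        q = self.q_proj(x).view(b, t, self.num_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product attention scores: (batch, heads, seq_len, seq_len).
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        # Causal mask: entries above the main diagonal would attend to future
        # positions, so they are set to -inf before the softmax.
        causal_mask = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1
        )
        scores = scores.masked_fill(causal_mask, float("-inf"))
        weights = F.softmax(scores, dim=-1)
        # Weighted sum of values, then merge heads back to (batch, seq_len, d_model).
        out = (weights @ v).transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out)
```

Setting the masked scores to -inf rather than zero ensures that those positions receive exactly zero weight after the softmax.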
Keep in mind that
Furthermore, write a function that converts the network's output vector encodings back into a string by completing the convert function in src/util.py.
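As a rough sketch of this conversion step, assuming the network emits one logit vector per position and that an id-to-character mapping is available (the name id_to_char is hypothetical; adapt the names and signature to what src/util.py actually provides):

```python
# Sketch only: greedy decoding of model output back into text.
import torch

def convert(logits: torch.Tensor, id_to_char: dict[int, str]) -> str:
    """Turn a (seq_len, vocab_size) tensor of logits into a string."""
    # Greedy decoding: pick the most likely token at each position.
    token_ids = logits.argmax(dim=-1).tolist()
    # Map each token id back to its character (or word) and join them.
    return "".join(id_to_char[i] for i in token_ids)
```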
[1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin: Attention is All you Need. NIPS 2017: 5998-6008
Once you have implemented and tested your version of attention, run sbatch scripts/train.slurm to train your model on Bender. Once training has converged, you can generate poetry via sbatch scripts/generate.slurm.
Run src/model_chat.py to talk to your model.