This repository documents a personal learning journey through important deep learning papers, starting with foundational architectures and gradually expanding to more complex models. Each implementation is meant to be a clean, educational reference with a focus on understanding the core concepts.
| Paper | Implementation | Key Concepts |
|---|---|---|
| Attention Is All You Need | `transformer-implementation/` | Multi-Head Attention, Positional Encoding, Layer Normalization, Label Smoothing, Warmup Learning Rate |
The current implementation includes a complete Transformer architecture with the following components (minimal sketches of a few of them follow the list):
- Multi-head self-attention
- Position-wise feed-forward networks
- Positional encodings
- Layer normalization
- Encoder and decoder stacks
- Label smoothing
- Learning rate scheduling with warmup
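
As a rough illustration of the attention component, here is a minimal scaled dot-product attention sketch. It assumes the code is written in PyTorch; the function and argument names are illustrative, not the repository's actual API:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, as in the paper."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are excluded from attention.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights
```

Multi-head attention splits the model dimension into several heads, applies this function to each head in parallel, and concatenates the results before a final linear projection.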
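The positional encodings follow the sinusoidal scheme from the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A minimal sketch of that table of encodings, again assuming PyTorch and an even d_model (the function name is illustrative):

```python
import math
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Return a (max_len, d_model) matrix of sinusoidal position encodings.

    Assumes d_model is even, as in the standard Transformer configuration.
    """
    position = torch.arange(max_len).unsqueeze(1).float()            # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float()
                         * (-math.log(10000.0) / d_model))           # (d_model/2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
    return pe
```

The resulting matrix is added to the token embeddings so the model can use absolute and relative position information without any learned position parameters.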
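The warmup schedule from the paper is lrate = d_model^(-0.5) * min(step^(-0.5), step * warmup_steps^(-1.5)): the learning rate increases linearly for the first warmup_steps training steps and then decays proportionally to the inverse square root of the step number. A sketch of that formula (the function name and defaults are illustrative, with the base-model values d_model = 512 and warmup_steps = 4000):

```python
def transformer_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Learning rate schedule from 'Attention Is All You Need'."""
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# Illustrative usage: wrap it in a LambdaLR scheduler with the optimizer's
# base learning rate set to 1.0 so the lambda value is used directly.
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda s: transformer_lr(s))
```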
These implementations are meant for educational purposes and self-reference. While they aim to be correct, they may not be optimized for production use. They serve as a starting point for understanding the underlying concepts and architectures described in the papers.