This repository documents a personal learning journey through important deep learning papers, starting with foundational architectures and gradually expanding to more complex models. Each implementation is meant to be a clean, educational reference with a focus on understanding the core concepts.
| Paper | Implementation | Key Concepts |
|---|---|---|
| Attention Is All You Need | `transformer-implementation/` | Multi-Head Attention, Positional Encoding, Layer Normalization, Label Smoothing, Warmup Learning Rate |
The current implementation includes a complete Transformer architecture with the following components (minimal sketches of a few of them follow the list):
- Multi-head self-attention
- Position-wise feed-forward networks
- Positional encodings
- Layer normalization
- Encoder and decoder stacks
- Label smoothing
- Learning rate scheduling with warmup
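
As a rough illustration of the attention component, here is a minimal scaled dot-product attention sketch. It assumes the code is written in PyTorch; the function and argument names are illustrative, not the repository's actual API:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, as in the paper."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are excluded from attention.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights
```

Multi-head attention splits the model dimension into several heads, applies this function to each head in parallel, and concatenates the results before a final linear projection.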
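The positional encodings follow the sinusoidal scheme from the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A minimal sketch of that table of encodings, again assuming PyTorch and an even d_model (the function name is illustrative):

```python
import math
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Return a (max_len, d_model) matrix of sinusoidal position encodings.

    Assumes d_model is even, as in the standard Transformer configuration.
    """
    position = torch.arange(max_len).unsqueeze(1).float()            # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float()
                         * (-math.log(10000.0) / d_model))           # (d_model/2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
    return pe
```

The resulting matrix is added to the token embeddings so the model can use absolute and relative position information without any learned position parameters.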
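The warmup schedule from the paper is lrate = d_model^(-0.5) * min(step^(-0.5), step * warmup_steps^(-1.5)): the learning rate increases linearly for the first warmup_steps training steps and then decays proportionally to the inverse square root of the step number. A sketch of that formula (the function name and defaults are illustrative, with the base-model values d_model = 512 and warmup_steps = 4000):

```python
def transformer_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Learning rate schedule from 'Attention Is All You Need'."""
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# Illustrative usage: wrap it in a LambdaLR scheduler with the optimizer's
# base learning rate set to 1.0 so the lambda value is used directly.
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda s: transformer_lr(s))
```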
These implementations are meant for educational purposes and self-reference. While they aim to be correct, they may not be optimized for production use. They serve as a starting point for understanding the underlying concepts and architectures described in the papers.