This repository contains a Large Language Model (LLM) built from scratch using PyTorch and Python.
All hyperparameters are kept small so that the model can be trained and run for inference on a laptop with 8GB RAM and a CPU.
The final instruction fine-tuned model (127M parameters) scores 46.40 when evaluated against llama3-8b.
Additionally, this repo covers PyTorch fundamentals: building a neural net, Datasets, DataLoaders, and a training loop for the model.
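The pattern those notebooks teach is the standard PyTorch one; here is a minimal, self-contained sketch on toy data (not taken from the notebooks):

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """A tiny synthetic dataset: learn y = 2x + 1 from noisy samples."""
    def __init__(self, n=256):
        self.x = torch.randn(n, 1)
        self.y = 2 * self.x + 1 + 0.1 * torch.randn(n, 1)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

model = nn.Linear(1, 1)                       # minimal neural net
loader = DataLoader(ToyDataset(), batch_size=32, shuffle=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for epoch in range(5):                        # training loop
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```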
Each major part of the LLM is implemented in a separate notebook:
```
.
├── pytorch                   # PyTorch basics
├── tokenizer                 # text tokenization and preprocessing
├── attention                 # attention mechanisms (self & multi-head); see the sketch below
├── GPT                       # transformer implementation
├── Pre-Train                 # pre-training the model + loading GPT-2 weights into it
├── finetune-classification   # fine-tuning to detect spam
├── finetune-instruction      # fine-tuning to follow instructions
├── components.py             # all LLM components in one module
└── README.md
```
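As a flavor of what the attention notebook builds up to, here is a minimal single-head causal self-attention sketch; the class and parameter names are illustrative, not necessarily those used in components.py:

```python
import torch
from torch import nn

class CausalSelfAttention(nn.Module):
    """Single-head causal self-attention (simplified sketch)."""
    def __init__(self, d_in, d_out, context_length):
        super().__init__()
        self.W_q = nn.Linear(d_in, d_out, bias=False)
        self.W_k = nn.Linear(d_in, d_out, bias=False)
        self.W_v = nn.Linear(d_in, d_out, bias=False)
        # mask out future tokens so each position attends only to the past
        mask = torch.triu(torch.ones(context_length, context_length), diagonal=1).bool()
        self.register_buffer("mask", mask)

    def forward(self, x):                     # x: (batch, tokens, d_in)
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        scores = q @ k.transpose(1, 2) / k.shape[-1] ** 0.5
        t = x.shape[1]
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))
        return torch.softmax(scores, dim=-1) @ v

attn = CausalSelfAttention(d_in=16, d_out=16, context_length=32)
out = attn(torch.randn(2, 8, 16))             # (2, 8, 16)
```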
Clone the repository and install dependencies:
```bash
git clone https://github.com/toheedakhtar/llm-scratch.git
cd llm-scratch
pip install -r requirements.txt
```
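Assuming PyTorch is listed in requirements.txt, you can verify the environment with:

```bash
python -c "import torch; print(torch.__version__)"
```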
This repo provides a look under the hood of an LLM and is best used for learning by:
- Going through each notebook step by step to understand what is happening
- Reading and executing the code
- Modifying it to fit your needs (swap the dataset, tune hyperparameters, change the tokenizer); see the config sketch after this list
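For a sense of what hyperparameter tuning looks like, a small GPT-style config dict follows the book's convention; the field names and values below are illustrative, and the actual ones live in the notebooks and components.py:

```python
GPT_CONFIG_SMALL = {
    "vocab_size": 50257,      # GPT-2 BPE vocabulary
    "context_length": 256,    # shorter context keeps memory low
    "emb_dim": 768,           # embedding dimension
    "n_heads": 12,            # attention heads
    "n_layers": 12,           # transformer blocks
    "drop_rate": 0.1,         # dropout
    "qkv_bias": False,        # bias in QKV projections
}
```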
This model is optimized for CPU training, making it accessible for most laptops.
Minimum Requirement: 8GB RAM + CPU
Tested on: Ryzen 5625U, 8GB DDR4 RAM
For faster training: a CUDA-enabled NVIDIA GPU is recommended.
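If a GPU is present, the standard PyTorch device-selection pattern picks it up automatically:

```python
import torch
from torch import nn

# fall back to CPU when no CUDA device is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(8, 8).to(device)  # any nn.Module moves the same way
print(f"training on {device}")
```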
This project was built while reading the book "Build a Large Language Model (From Scratch)" by Sebastian Raschka and follows its structure and code.