
LLM from Scratch

This repository contains a Large Language Model (LLM) built from scratch using PyTorch and Python.

All hyperparameters are kept small so that the model can be trained and inferred on a laptop with 8GB RAM and a CPU.

The final instruction fine-tuned model (127M parameters) scores 46.40 when evaluated against llama3-8b.

Additionally, this repo covers PyTorch fundamentals, including how to build a neural net, Datasets, DataLoaders, and a training loop for the model.
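
For orientation, the core pattern those fundamentals build toward is a Dataset, a DataLoader, and a training loop. Below is a minimal, self-contained sketch with a toy dataset (illustrative only, not the repo's actual code):

import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

# Toy dataset: random features, label = sign of their sum (illustrative only)
class ToyDataset(Dataset):
    def __init__(self, n=256, dim=4):
        self.x = torch.randn(n, dim)
        self.y = (self.x.sum(dim=1) > 0).long()
    def __len__(self):
        return len(self.x)
    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
loader = DataLoader(ToyDataset(), batch_size=32, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Standard PyTorch training loop: forward pass, loss, backward pass, update
for epoch in range(3):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()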

Structure

Each major part of the LLM is implemented in a separate notebook:

.
├── pytorch                               # PyTorch basics
├── tokenizer                             # Text tokenization and preprocessing
├── attention                             # Attention mechanisms (self- & multi-head)
├── GPT                                   # Transformer (GPT) implementation
├── Pre-Train                             # Pre-training + loading GPT-2 weights into the model
├── finetune-classification              # Fine-tuning to detect spam
├── finetune-instruction                 # Fine-tuning to follow instructions
├── components.py                         # All LLM components in one module
└── README.md
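
For example, the central building block developed in the attention notebook is causal self-attention. Here is a minimal single-head sketch (illustrative; the notebook derives this step by step and extends it to multi-head attention):

import torch
from torch import nn

class CausalSelfAttention(nn.Module):
    def __init__(self, d_in, d_out, context_length):
        super().__init__()
        self.W_q = nn.Linear(d_in, d_out, bias=False)
        self.W_k = nn.Linear(d_in, d_out, bias=False)
        self.W_v = nn.Linear(d_in, d_out, bias=False)
        # Upper-triangular mask hides future tokens from each position
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1)
        )
    def forward(self, x):  # x: (batch, seq_len, d_in)
        b, t, _ = x.shape
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        scores = q @ k.transpose(1, 2) / k.shape[-1] ** 0.5
        scores = scores.masked_fill(self.mask[:t, :t].bool(), float("-inf"))
        return torch.softmax(scores, dim=-1) @ v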

Installation

Clone the repository and install dependencies:

git clone https://github.com/toheedakhtar/llm-scratch.git  
cd llm-scratch  
pip install -r requirements.txt  

Usage

This repo provides a look under the hood of an LLM and is best used for learning:

  • Go through each notebook step by step to understand what is happening
  • Read and execute the code
  • Modify it to your needs (dataset, hyperparameters, tokenizer); see the config sketch after this list
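
As a starting point for such modifications, hyperparameters in the book's style are typically gathered in a single config dict; the names and values below are an illustrative sketch, not necessarily the repo's exact ones:

GPT_CONFIG = {
    "vocab_size": 50257,     # GPT-2 BPE vocabulary size
    "context_length": 256,   # kept small so training fits in 8GB RAM
    "emb_dim": 768,          # embedding dimension
    "n_heads": 12,           # attention heads per block
    "n_layers": 12,          # transformer blocks
    "drop_rate": 0.1,        # dropout rate
    "qkv_bias": False,       # bias in query/key/value projections
}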

Hardware Requirements

This model is optimized for CPU training, making it accessible for most laptops.

Minimum Requirement: 8GB RAM + CPU
Tested on: Ryzen 5625U, 8GB DDR4 RAM
For faster training: an NVIDIA CUDA-enabled GPU is recommended; see the device snippet below.
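
Switching between CPU and GPU uses the standard PyTorch device idiom (a sketch; model here stands for any nn.Module defined in the notebooks):

import torch

# Use a CUDA GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)  # `model` is assumed to be defined elsewhere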

Acknowledgements

This project was built while reading the book "Build a Large Language Model (From Scratch)" by Sebastian Raschka and follows its structure and code.
