This is a language model that generates dialogue in the style of the TV show "The Office". The dataset used for training can be found at https://www.kaggle.com/datasets/nasirkhalid24/the-office-us-complete-dialoguetranscript/data. Only the first four seasons were used for training. The idea for this project came from https://cs224d.stanford.edu/reports/oguz.pdf and Andrej Karpathy's intro to language models.
The model can be trained by running:
python3 scripts/train.py
After this, inference can be run with:
python3 scripts/inference.py START_TEXT MAX_TOKENS
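Here START_TEXT is the prompt the generation is conditioned on and MAX_TOKENS caps how many tokens are generated. For example (a hypothetical invocation; the exact argument handling is in scripts/inference.py):

python3 scripts/inference.py "Michael: " 500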
The output is pure gibberish, but that is about what you get with 10,805,078 parameters and lazy hyperparameter tuning. A sample:
Michael:
Good morning Jim!
Jim:
I am gonna call you down in the back, you can get your party.
Michael:
Yes, it is everyone with your stripper and it co-rabbed, happy. And for smaking sprays in deposition. Thank you.
Jan:
Michael?
Michael:
They have no idea.
Jan:
No, I didn't give a cats, I think it is in my own caskward.
Michael:
What did I die to do?
Jan:
I like the party, so...
Michael:
It was just fine for me.
The model is a small character-level transformer. Its main components:

- Multi-Head Attention: Enhances the model's ability to process different parts of the input sequence in parallel (see the transformer-block sketch after this list).
- Layer Normalization: Applied both before the multi-head attention mechanism and before the feed-forward network in each transformer block, i.e. the pre-norm arrangement (also in the sketch below).
- Residual Connections: Used in each transformer block to facilitate the flow of information and gradients through the network (also in the sketch below).
- Embedding Layers: The model utilizes separate embedding layers for tokens and positional encodings (see the embedding sketch below).
- Character-Level Tokenization: Each character is treated as a distinct entity, allowing the model to learn and generate text at the granular level of individual characters (see the tokenizer sketch below).
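To make the first three bullets concrete, here is a minimal sketch of one transformer block in PyTorch. This is illustrative rather than the repository's actual code: the names Block, n_embd, and n_head are assumptions, and the causal mask is built by hand.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-norm transformer block: x + attn(ln(x)), then x + ffwd(ln(x))."""
    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)  # layer norm BEFORE attention (pre-norm)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)  # layer norm BEFORE the feed-forward network
        self.ffwd = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: True above the diagonal blocks attention to future positions.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out                 # residual connection around attention
        x = x + self.ffwd(self.ln2(x))   # residual connection around the feed-forward
        return x
```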
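The embedding bullet corresponds to something like the following sketch, where vocab_size and block_size (the context length) are assumed hyperparameters:

```python
import torch
import torch.nn as nn

class Embeddings(nn.Module):
    """Separate learned embeddings for token identity and position, summed."""
    def __init__(self, vocab_size: int, block_size: int, n_embd: int):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)  # one vector per character id
        self.pos_emb = nn.Embedding(block_size, n_embd)  # one vector per position

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)      # positions 0..T-1
        return self.tok_emb(idx) + self.pos_emb(pos)  # (B, T, n_embd), broadcast over batch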
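And character-level tokenization, in its simplest form, is just a lookup table built from the distinct characters in the training transcript. A sketch, with a hypothetical data path; the repository may build its vocabulary differently:

```python
# Build the vocabulary from every distinct character in the corpus.
text = open("data/the_office_seasons_1_to_4.txt").read()  # hypothetical path
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> char

def encode(s: str) -> list[int]:
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

# Round trip: encoding then decoding recovers the original string.
assert decode(encode("Michael: That's what she said.")) == "Michael: That's what she said."
```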