# The Pythia Trials

## Intro

A public repository for some research on Pythia and OLMo and how they behave across model sizes and training checkpoints. The hope is to reverse engineer some algorithms!

## 2-digit Addition

I have started by looking at 2-digit addition because it is a very simple task that is easy to understand, reason about, and implement. By 2-digit addition, I mean the task of completing the prompt

```python
question = "19+21"
prompt = f"Question: What is {question}? Answer: {question}="
```

where the expected completion is the correct sum. I only iterate over questions whose result is still two digits, i.e., I stop iterating before 50+50=100.
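For concreteness, an evaluation loop along these lines can be written with the Hugging Face `transformers` API. This is a minimal sketch, assuming greedy decoding and prefix matching on the completion; the operand ranges and decoding settings are illustrative assumptions, not necessarily the exact setup used for the numbers below.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; any of the models in the table below works.
model_name = "EleutherAI/pythia-410m-deduped"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

correct = total = 0
for a in range(10, 100):       # assumed operand range: all 2-digit numbers
    for b in range(10, 100):
        if a + b >= 100:       # keep only sums that are still two digits
            continue
        question = f"{a}+{b}"
        prompt = f"Question: What is {question}? Answer: {question}="
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            output = model.generate(
                **inputs,
                max_new_tokens=3,
                do_sample=False,   # greedy decoding (an assumption)
                pad_token_id=tokenizer.eos_token_id,
            )
        # Decode only the newly generated tokens after the prompt.
        completion = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:])
        if completion.strip().startswith(str(a + b)):
            correct += 1
        total += 1

print(f"2-sum accuracy: {correct / total:.2%}")
```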

The following table shows the accuracy of OLMo and Pythia across various model sizes:

| Model | Size | 2-sum accuracy |
| --- | --- | --- |
| EleutherAI/pythia-14m | 14m | 0.00% |
| EleutherAI/pythia-70m-deduped | 70m | 0.00% |
| EleutherAI/pythia-160m-deduped | 160m | 0.16% |
| EleutherAI/pythia-410m-deduped | 410m | 4.56% |
| EleutherAI/pythia-1B-deduped | 1B | 5.08% |
| EleutherAI/pythia-1.4B-deduped | 1.4B | 20.69% |
| EleutherAI/pythia-2.8B-deduped | 2.8B | 86.32% |
| EleutherAI/pythia-6.9B-deduped | 6.9B | 91.88% |
| EleutherAI/pythia-12B-deduped | 12B | 88.96% |
| allenai/OLMo-1B-0724-hf | 1B | 42.86% |
| hamishivi/OLMo-1B-0724-SFT-hf | 1B | 53.50% |
| hamishivi/OLMo-1B-0724-Instruct-hf | 1B | 52.02% |
| allenai/OLMo-7B-0724-hf | 7B | 99.88% |
| allenai/OLMo-7B-0724-SFT-hf | 7B | |
| allenai/OLMo-7B-0724-Instruct-hf | 7B | |