
# Tutorial for Reproducing 'Large Language Monkeys' Results

The paper *Large Language Monkeys: Scaling Inference Compute with Repeated Sampling* explores repeated sampling as a method to enhance reasoning performance in large language models (LLMs) by increasing inference compute.
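For intuition, the key quantity in this line of work is coverage: the chance that at least one of k sampled answers solves a problem. Below is a minimal sketch of the standard unbiased pass@k estimator (Chen et al., 2021) commonly used for this kind of measurement. It is background math only, not part of the Curie starter files:

```python
# Sketch only, not part of Curie: the standard unbiased pass@k estimator
# (Chen et al., 2021). Given n generated samples of which c are correct,
# it estimates the probability that at least one of k drawn samples is correct.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:  # every size-k draw must contain at least one correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: with 4 correct answers out of 100 samples, coverage rises with k.
for k in (1, 10, 100):
    print(k, round(pass_at_k(n=100, c=4, k=k), 3))  # 0.04, 0.348, 1.0
```

This is the kind of relationship between sample count and success rate that the experiment below investigates empirically.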

1. Download the related starter files under `workspace`.

```bash
cd Curie
git submodule update --init --recursive
```
2. Be curious.

As an LLM researcher, you are curious about how the number of repeatedly generated samples per question impacts the overall success rate. (The concrete question can be found in our benchmark at `benchmark/experimentation_bench/llm_reasoning/q1_simple_relation.txt`, which specifies the location of the corresponding starter files.)

```bash
cd Curie
python3 -m curie.main --iterations 1 --question_file benchmark/experimentation_bench/llm_reasoning/q1_simple_relation.txt --task_config curie/configs/llm_reasoning_config.json
```

(We pre-specify the starter file directory name inside `llm_reasoning_config.json`.)

- You can check the logs under `logs/q1_simple_relation_<ID>.log`.

- You can check the reproducible experimentation process under `workspace/large_language_monkeys_<ID>`.
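Since the `<ID>` suffix differs per run, here is a small hypothetical convenience snippet (not part of the repo; assumes you run it from the Curie root) for inspecting the newest log:

```python
# Hypothetical helper, not part of Curie: print the tail of the most
# recently modified q1_simple_relation log. Run from the Curie root.
from pathlib import Path

logs = sorted(Path("logs").glob("q1_simple_relation_*.log"),
              key=lambda p: p.stat().st_mtime)
if logs:
    print(logs[-1].read_text()[-2000:])  # last ~2 KB of the newest log
else:
    print("No matching logs yet.")
```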