Curie is the first AI-agent framework designed for automated and rigorous scientific experimentation. Curie helps answer your curiosity by automating experimentation end to end, so that every step, from hypothesis formulation to result interpretation, is conducted with precision, reliability, and reproducibility.
Key Features
- 🚀 Automated Experimentation – End-to-end workflow management: hypothesis formulation, experiment setup, experiment execution, result analysis, and reflection on findings.
- 📊 Rigor Enhancement – Built-in verification modules enforce methodical procedures, reliability, and interpretability.
- 🔬 Broad Applicability – Supports ML research, system analysis, and scientific discovery.
- 📖 Experimentation Benchmark – Provides 46 questions from 4 computer science domains, based on influential papers and open-source projects (benchmark/experimentation_bench).
Installation
- Install Docker: https://docs.docker.com/engine/install/ubuntu/. Grant permission to Docker via sudo chmod 666 /var/run/docker.sock, then run docker ps to check that you can access the Docker daemon.
- Clone the repository:
git clone https://github.com/Just-Curieous/Curie.git
cd Curie
- Put your LLM API credentials in curie/setup/env.sh. Example:
export MODEL="gpt-4o"
export OPENAI_API_KEY="sk-xxx"
- Install the package and build the container image. This will take a few minutes. Note: you may need to set up a virtual environment before running pip install.
pip install -e .
cd curie && docker build --no-cache --progress=plain -t exp-agent-image -f ExpDockerfile_default .. && cd -
- Input your research question or problem statement (expected processing time: 5-10 minutes).
python3 -m curie.main -q "How does the choice of sorting algorithm impact runtime performance across different input distributions (random, nearly sorted, reverse sorted)?" --task_config curie/configs/base_config.json
OR
python3 -m curie.main -q "At what array size does parallel sorting outperform single-threaded sorting?" --task_config curie/configs/base_config.json
- While the logs are continuously streamed, you can also check them at logs/research_question_<ID>_verbose.log.
- You can check the reproducible experimentation process under workspace/research_<ID>/.
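The Docker permission check from the first installation step can also be scripted, for example as part of a setup script. The Python sketch below is a hypothetical helper, not part of Curie. It assumes the default socket path /var/run/docker.sock and a docker CLI on your PATH, and it only reports whether the socket is readable and writable by the current user and whether docker ps exits successfully.

```python
import os
import shutil
import subprocess

# Default Docker socket path on Linux (assumption; adjust for rootless Docker).
DOCKER_SOCKET = "/var/run/docker.sock"


def socket_accessible(path: str = DOCKER_SOCKET) -> bool:
    """Return True if the socket exists and the current user can read and write it."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


def docker_ps_succeeds(timeout: float = 30.0) -> bool:
    """Return True if `docker ps` runs and exits with status 0."""
    if shutil.which("docker") is None:
        return False  # docker CLI not installed or not on PATH
    try:
        result = subprocess.run(
            ["docker", "ps"], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Calling socket_accessible() and docker_ps_succeeds() before python3 -m curie.main lets a script stop early if Docker is not reachable.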
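A missing API key is easy to overlook in curie/setup/env.sh. The sketch below is a hypothetical check, not part of Curie. It assumes the file contains simple export KEY="value" lines like the example in the credentials step, and it only looks for MODEL and OPENAI_API_KEY, the two variables that example sets; other model providers may need different variables.

```python
import re
import shlex

# Matches lines such as: export MODEL="gpt-4o"
EXPORT_LINE = re.compile(r"^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$")

# Variables set in the README example (assumption: other providers may differ).
REQUIRED = ("MODEL", "OPENAI_API_KEY")


def parse_env_sh(text: str) -> dict[str, str]:
    """Parse simple `export KEY=value` lines; all other shell syntax is ignored."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        match = EXPORT_LINE.match(line)
        if not match:
            continue
        key, raw = match.groups()
        try:
            # shlex strips quotes and trailing comments, e.g. "sk-xxx" -> sk-xxx
            parts = shlex.split(raw, comments=True)
        except ValueError:  # e.g. unbalanced quotes; keep the raw text
            parts = [raw.strip()]
        env[key] = parts[0] if parts else ""
    return env


def missing_credentials(text: str, required=REQUIRED) -> list[str]:
    """Return the required variables that are absent or set to an empty value."""
    env = parse_env_sh(text)
    return [name for name in required if not env.get(name)]
```

For instance, missing_credentials(Path("curie/setup/env.sh").read_text()) (with Path from pathlib) returns an empty list when both variables have non-empty values. The parser deliberately ignores any other shell syntax, so it is a quick sanity check rather than a shell interpreter.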
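If you want to launch several research questions from a script, the command in the research-question step can be wrapped with subprocess. This is a sketch, not a Curie API: it reuses only the -q and --task_config flags shown in that step, and it assumes logs are written under logs/ relative to the repository root, picking the newest file matching the research_question_<ID>_verbose.log pattern by modification time.

```python
import sys
from pathlib import Path
from typing import Optional

DEFAULT_CONFIG = "curie/configs/base_config.json"


def build_curie_command(question: str, config: str = DEFAULT_CONFIG) -> list[str]:
    """Build argv for `python3 -m curie.main -q <question> --task_config <config>`."""
    # sys.executable keeps the run inside the current (virtual) environment.
    return [sys.executable, "-m", "curie.main", "-q", question, "--task_config", config]


def latest_verbose_log(log_dir: str = "logs") -> Optional[Path]:
    """Return the most recently modified research_question_*_verbose.log, or None."""
    candidates = Path(log_dir).glob("research_question_*_verbose.log")
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

For example, subprocess.run(build_curie_command("At what array size does parallel sorting outperform single-threaded sorting?"), check=True), run from the repository root after import subprocess, starts one experiment; afterwards latest_verbose_log() most likely points at that run's log, as long as no other run wrote a log in the meantime.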
Curie is designed for scientific discovery across multiple domains:
- 🔬 Machine Learning & AI Research – Hyperparameter tuning, analyzing algorithm behavior.
- 💻 System Performance Analysis – Benchmarking systems, optimizing configurations, investigating system trade-offs.
- 🧪 Algorithmic & Scientific Discovery – Validating hypotheses, automating computational simulations.
Configure curie/configs/base_config.json to adapt Curie to your own tasks:
- Add your domain-specific instructions by customizing supervisor_system_prompt_filename for the supervisor, control_worker_system_prompt_filename for the experimentation worker, and so on.
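One way to keep your changes separate from the shipped defaults is to derive a new config file from base_config.json. The sketch below assumes only that the file is a JSON object and that supervisor_system_prompt_filename and control_worker_system_prompt_filename are keys you may set, as described above; derive_config, the prompt paths, and the output name my_config.json are hypothetical placeholders.

```python
import json
from pathlib import Path


def derive_config(base_path: str, out_path: str, overrides: dict) -> dict:
    """Load a JSON config, apply key overrides, write the result, and return it.

    The base file is never modified, so the defaults stay intact.
    """
    config = json.loads(Path(base_path).read_text())
    config.update(overrides)
    Path(out_path).write_text(json.dumps(config, indent=4))
    return config
```

A call such as derive_config("curie/configs/base_config.json", "curie/configs/my_config.json", {"supervisor_system_prompt_filename": "prompts/my_supervisor.txt", "control_worker_system_prompt_filename": "prompts/my_worker.txt"}) writes the new file, which you would then pass with --task_config curie/configs/my_config.json instead of the base config.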
For any issues or feature requests, please open an issue on our GitHub Issues page.
Curie is released under the Apache 2.0 License. See LICENSE for more details.