This is the codespace for 'Application of LLMs for Bone Marrow Report Generation and Structured Data Extraction', developed by Spencer Krichevsky (2025) on Ubuntu 22.04.3.
conda create -n your_env python=3.10
conda activate your_env
git clone https://github.com/spencerkrichevsky/hemepath-llm
cd hemepath-llm/
pip install -e .
pip install -r requirements.txt
Source of educational templates: https://hemepathreview.com/ReportTemplates/HemeInterps-BoneMarrow.htm
cd compare_raw_reports/
python3.10 clean_reports.py
python3.10 compare_raw_reports.py
Sample reports are already provided in ./compare_raw_reports/data/raw_reports and ./compare_raw_reports/data/cleaned_reports.
cd ..
cp -r compare_raw_reports/data/cleaned_reports/*.txt run_oneshot_synthesis/data/cleaned_reports/
cd run_oneshot_synthesis/
Obtain an API key to run the Google Gemini model: https://ai.google.dev/gemini-api/docs/api-key
python3.10 run_oneshot_gemini.py
Obtain an API key to run the OpenAI GPT model: https://platform.openai.com/api-keys
python3.10 run_oneshot_gpt.py
Obtain an API key to run the Anthropic Claude model: https://console.anthropic.com/settings/keys
python3.10 run_oneshot_claude.py
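Before running the three one-shot scripts above, it can help to confirm that a key is available for each provider. The following is a minimal sketch, not part of this repository. It assumes the keys are exported as environment variables under the names conventionally read by each provider's Python SDK; how the run_oneshot_*.py scripts actually load their keys is not stated here, so check each script first. The helper name missing_keys is hypothetical.

```python
import os

# Conventional environment variable names for each provider's Python SDK.
# ASSUMPTION: this repository's scripts may load keys differently.
EXPECTED_KEYS = {
    "Gemini": ("GEMINI_API_KEY", "GOOGLE_API_KEY"),
    "GPT": ("OPENAI_API_KEY",),
    "Claude": ("ANTHROPIC_API_KEY",),
}


def missing_keys(env=None):
    """Return the providers for which none of the expected variables is set."""
    env = os.environ if env is None else env
    return [
        provider
        for provider, variables in EXPECTED_KEYS.items()
        if not any(env.get(name) for name in variables)
    ]


if __name__ == "__main__":
    for provider in missing_keys():
        print(f"No API key found in the environment for {provider}")
```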
python3.10 clean_reports.py
python3.10 compare_synthesized_reports.py
Inspect ./run_oneshot_synthesis/data/stats/compare_synthensized_reports_and_real_report.csv for results
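One way to inspect the results file is to print its column names and first few rows. This is a minimal sketch using only the standard library. It assumes the working directory is still run_oneshot_synthesis/ and makes no assumption about which columns the file contains. The helper name preview_csv is hypothetical.

```python
import csv
from pathlib import Path

# Path relative to run_oneshot_synthesis/ (ASSUMPTION: current working directory).
CSV_PATH = Path("data/stats/compare_synthensized_reports_and_real_report.csv")


def preview_csv(path, n_rows=5):
    """Return the column names and the first n_rows rows as dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # zip stops after n_rows rows or at end of file, whichever comes first.
        rows = [row for _, row in zip(range(n_rows), reader)]
        return reader.fieldnames, rows


if __name__ == "__main__":
    if CSV_PATH.exists():
        columns, rows = preview_csv(CSV_PATH)
        print(columns)
        for row in rows:
            print(row)
    else:
        print(f"{CSV_PATH} not found; run the comparison step first")
```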
cd ..
cd run_zeroshot_summarization/
Note: manually derived annotations are available in ./data/annotations/annotations.csv. These annotations will not match the synthesized reports produced by running the previous steps.
python3.10 run_zeroshot_gemini.py
python3.10 run_zeroshot_gpt.py
python3.10 run_zeroshot_claude.py
python3.10 run_nlp.py
python3.10 compare_summarizations.py
cd ..
cd finetune_summarization/
Note: manually derived annotations are available in ./data/annotations/annotations.csv. These annotations will not match the synthesized reports produced by running the previous steps.
The script below randomly samples N=10 reports from each source: real reports, Gemini-synthesized, GPT-synthesized, and Claude-synthesized.
python3.10 sample_reports.py
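For illustration, sampling N=10 reports per source could look like the sketch below. This is not the code in sample_reports.py. The source labels, the placeholder report IDs, and the fixed seed are all assumptions made for the example, and the function name sample_per_source is hypothetical.

```python
import random

# Hypothetical labels for the four report sources.
SOURCES = ("real", "gemini", "gpt", "claude")


def sample_per_source(reports_by_source, n=10, seed=0):
    """Draw n reports without replacement from each source.

    Raises ValueError if any source has fewer than n reports.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return {
        source: rng.sample(sorted(reports), n)
        for source, reports in reports_by_source.items()
    }


if __name__ == "__main__":
    # Placeholder IDs stand in for real report file names.
    pool = {s: [f"{s}_{i:03d}" for i in range(50)] for s in SOURCES}
    for source, chosen in sample_per_source(pool).items():
        print(source, chosen)
```

Sampling without replacement ensures that no report appears twice within a source, and seeding the generator makes the selection repeatable across runs.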
python3.10 finetune_gemini.py
Note: a model ID will be pushed to the OpenAI UI. It will need to be copied and pasted into the file below.
python3.10 finetune_gpt.py
python3.10 compare_summarizations.py