Official repository for FactMM-RAG: Fact-Aware Multimodal Retrieval Augmentation for Accurate Medical Radiology Report Generation [NAACL 2025]


[NAACL 2025] FactMM-RAG: Fact-Aware Multimodal Retrieval Augmentation for Accurate Medical Radiology Report Generation

In this work, we present FactMM-RAG, a fact-aware multimodal retrieval-augmented pipeline for generating accurate radiology reports. [Paper Link]

[Figure: FactMM-RAG pipeline]

📅 Schedule

  • Release the data preprocessing code
  • Release the factual report pair mining code
  • Release the retriever training code
  • Release the generator training code

📦 Requirements

  1. Clone this repository and navigate to the FactMM-RAG folder
git clone https://github.com/cxcscmu/FactMM-RAG.git
cd FactMM-RAG
  2. Install packages: create a conda environment and install the dependencies
conda create -n FactMM-RAG python=3.10 -y
conda activate FactMM-RAG
pip install -r requirements.txt
  3. Download the required dataset and checkpoint

📖 Data Preprocessing

  1. Place the downloaded datasets in ./data/mimic and ./data/chexpert. We follow the official splits and parse them into train, valid, and test files. To process a radiology dataset and generate the output JSON file, run the following command (shown here for the train split):
python ./data/parse.py --image_paths_file ./data/mimic/train.image.tok \
                 --findings_file ./data/mimic/train.findings.tok \
                 --impressions_file ./data/mimic/train.impression.tok \
                 --output_json_file ./data/mimic/train.json
  2. Annotate reports with radiological entities, clinical relations, and diagnostic labels using RadGraph and CheXbert:
python ./data/label.py --input_path ./data/mimic/train.json \
                --output_path ./data/mimic/train_labeled.json \
                --device cuda   
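
As a rough illustration of the parsing step above, the sketch below zips three line-aligned files (image paths, findings, impressions) into one JSON record per study. It is not the code in ./data/parse.py: the field names `id`, `image_paths`, `findings`, and `impression`, the comma-separated image list, and the helper names are all assumptions made for this example.

```python
import json
from pathlib import Path


def read_lines(path):
    """Return the lines of a text file with trailing newlines removed."""
    return Path(path).read_text(encoding="utf-8").splitlines()


def build_records(image_lines, findings_lines, impression_lines):
    """Zip three line-aligned lists into one record per study.

    The field names and the comma-separated image list are assumptions,
    not the schema that ./data/parse.py actually emits.
    """
    if not (len(image_lines) == len(findings_lines) == len(impression_lines)):
        raise ValueError("input files must have the same number of lines")
    records = []
    for idx, (images, findings, impression) in enumerate(
        zip(image_lines, findings_lines, impression_lines)
    ):
        records.append(
            {
                "id": idx,
                "image_paths": [p.strip() for p in images.split(",") if p.strip()],
                "findings": findings.strip(),
                "impression": impression.strip(),
            }
        )
    return records


def write_json(records, output_path):
    """Write the records as a single JSON array."""
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
```

Checking that the three inputs have the same number of lines before zipping guards against silently pairing a report with the wrong image when one file is truncated.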

📖 Factual Report Pairs Mining

  1. Generate factual similarity scores using annotations from RadGraph and CheXbert. Before running the scripts, update the data paths inside them. Since the training corpus is large, we use parallel processing with SLURM array jobs for efficiency. Run the following commands (each block starts from the repository root):
# Query: training reports | Corpus: training reports
cd ./data/factual_mining/build_pos_train/
sbatch gen_similarity.sh
# Query: validation reports | Corpus: training reports
cd ./data/factual_mining/build_pos_valid/
sbatch gen_similarity.sh
  2. Construct query and Top-K reference report pairs based on factual similarity thresholds. Run the following commands (each block starts from the repository root):
cd ./data/factual_mining/build_pos_train/
sbatch gen_topk_pos.sh
sh merge_topk_pos.sh

cd ./data/factual_mining/build_pos_valid/
sh gen_topk_pos.sh
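
To make the mining idea concrete, here is a minimal sketch of threshold-based Top-K selection. It is not the repository's implementation: it stands in for the RadGraph and CheXbert scores with a plain set-overlap F1 over hypothetical `entities` and `labels` fields, and it assumes a candidate must clear both thresholds. The function names, field names, and default thresholds are illustrative only.

```python
def set_f1(pred, gold):
    """F1 overlap between two collections treated as sets (0.0 if no overlap)."""
    pred, gold = set(pred), set(gold)
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def mine_topk(query, corpus, k=2, entity_threshold=0.5, label_threshold=0.5):
    """Return up to k corpus ids whose similarity to the query clears both thresholds.

    Requiring both scores to pass is an assumption of this sketch.
    """
    scored = []
    for doc in corpus:
        if doc["id"] == query["id"]:
            continue  # a report is never its own reference
        entity_sim = set_f1(query["entities"], doc["entities"])
        label_sim = set_f1(query["labels"], doc["labels"])
        if entity_sim >= entity_threshold and label_sim >= label_threshold:
            scored.append((entity_sim, label_sim, doc["id"]))
    # Highest entity similarity first, then label similarity, then id for stability.
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [doc_id for _, _, doc_id in scored[:k]]


if __name__ == "__main__":
    query = {"id": 0, "entities": {"pleural", "effusion", "left"}, "labels": {"Pleural Effusion"}}
    corpus = [
        query,
        {"id": 1, "entities": {"pleural", "effusion", "right"}, "labels": {"Pleural Effusion"}},
        {"id": 2, "entities": {"pneumothorax"}, "labels": {"Pneumothorax"}},
    ]
    print(mine_topk(query, corpus))
```

Because each query is scored against the corpus independently, the query set can be split into shards and processed in parallel, which is consistent with the array-job setup described above.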

🚀 Training

  1. Place the downloaded MARVEL checkpoint in ./src/checkpoint/. Train the multimodal retriever on the constructed query-image and reference-report pairs with in-batch negative sampling. An optional second training stage with hard negatives can further improve performance. Run the following commands:
# DPR training (start from the repository root)
cd ./src/retriever/DPR
sh train.sh
sh gen_embeddings.sh

# Optional ANCE training (the ANCE path below is relative to the repository root)
sh gen_hard_negatives.sh
cd ./src/retriever/ANCE
sh train.sh
sh gen_embeddings.sh
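
For readers unfamiliar with the two training stages, the sketch below shows the general idea in plain Python. It is not the code in ./src/retriever: `in_batch_negative_loss` is a generic contrastive cross-entropy in which report i is the positive for query i and the other reports in the batch act as negatives, and `select_hard_negatives` shows one common way to pick hard negatives (the highest-scoring non-positive reports). The function names, the dot-product scoring, and the `temperature` parameter are assumptions of this example.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_negative_loss(query_embs, report_embs, temperature=1.0):
    """Mean cross-entropy where report i is the positive for query i
    and every other report in the batch is a negative."""
    if len(query_embs) != len(report_embs):
        raise ValueError("each query needs exactly one positive report")
    total = 0.0
    for i, q in enumerate(query_embs):
        logits = [dot(q, r) / temperature for r in report_embs]
        # Numerically stable log-sum-exp over the whole batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(query_embs)


def select_hard_negatives(query_emb, report_embs, positive_ids, n=1):
    """Return indices of the n highest-scoring reports that are not positives."""
    ranked = sorted(
        range(len(report_embs)),
        key=lambda j: -dot(query_emb, report_embs[j]),
    )
    return [j for j in ranked if j not in positive_ids][:n]
```

The loss is small when each query scores its own report above the rest of the batch, and hard negatives are the reports the current retriever ranks highly even though they are not factual matches.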

📚 Citation

@misc{sun2025factawaremultimodalretrievalaugmentation,
      title={Fact-Aware Multimodal Retrieval Augmentation for Accurate Medical Radiology Report Generation}, 
      author={Liwen Sun and James Zhao and Megan Han and Chenyan Xiong},
      year={2025},
      eprint={2407.15268},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.15268}, 
}

🙏 Acknowledgement

We use code from LLaVA and MARVEL. We thank the authors for releasing their code.
