RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance
Authors: Chantal Pellegrini, Ege Özsoy, Benjamin Busam, Nassir Navab, Matthias Keicher
✨ News ✨
- 12 July 2024: We published a new version of our Instruct Datset including additional tasks on PhysioNet
- 29 May 2024: The LLaVA version of RaDialog is now publically available on Huggingface and Github. This new version is much better in conversational assistance, easier to use and allows a simple inference setup with huggingface!
- 26 March 2024: RaDialog Instruct Dataset now available on PhysioNet!
Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a large language model (LLM) while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To keep the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruct dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems.
- clone this repository and move to the radialog directory with
cd RaDialog
- Install the RaDialog environment with
conda create --name radialog python=3.7
- Activate the environment with
conda activate radialog
- Install the requirements with
pip install -r requirements.txt
- Install hl-ml-multimodal with
pip install hi-ml-multimodal==0.2.0
- Reinstall correct versions of torch and transformers with
pip install torch==1.13.0 transformers==4.28.1
- Install java and set JAVA_HOME and PATH in local_config.py (we used jre1.8.0)
- Install the CheXbert environment with
conda create --name chexbert python=3.7
- Activate the environment with
conda activate chexbert
- Move to the chexbert directory with
cd chexbert
- Install the requirements with
pip install -r requirements.txt
- Set the absolute path to the chexbert env and folder in
RaDialog/local_config.py
- Download the pretrained models from here
- place chexbert.pth in RaDialog/chexbert/src/checkpoint/
- unzip vicuna-7b-img-instruct.zip and vicuna-7b-img-report.zip and place folders into RaDialog/checkpoints/
- unzip chexpert_train and place folder into RaDialog/findings_classifier/checkpoints/
- unzip embs and place folder into RaDialog/pretraining/
- unzip checkpoint_4.pth and place it into outputs/stage1_pt_instruct_blip_origlr_img448/
- Download the MIMIC-CXR-JPG dataset from here
- The dataset should be saved in .../physionet.org/files/mimic-cxr-jpg
- Go to physionet.org/files/mimic-cxr-jpg/files/ and unzip mimic-cxr-2.0.0-split.csv.gz
- from here, dowload mimic-cxr-reports.zip
- unzip it and place the folder in the same directory as the MIMIC-CXR-JPG dataset (e.g. physionet.org/files/)
- in local_config.py set the path to the MIMIC-CXR dataset (e.g. .../physionet.org/files/)
- in model/lavis/defaults_report.yaml set the path to the MIMIC-CXR-JPG dataset (e.g. .../physionet.org/files/mimic-cxr-jpg/2.0.0 )
- go to the mimic-cxr folder in the code with
cd mimic-cxr
- run
python create_section_files.py
to prepare the report data - go back to the RaDialog directory with
cd ..
- As MIMIC-CXR needs a certified PhysioNet account to be accessed, we can not publish our instruct dataset directly.
- We are working on publishing the instruct dataset on PhysioNet. In the meantime, you can create an instruct dataset yourself by following the steps below or just use our pre-trained model.
- The MIMIC-NLE data has to be generated first, as it also contains protected data. Follow the instructions here to generate the MIMIC-NLE data and set the path to the MIMIC-NLE data in
local_config.py
. - For the correction task, you can write us, then we can share the used incorrect predictions with you.
- To generate data without Correction or Reasoning (MIMIC-NLE), please comment our line 335 or 336 in "create_data.py" accordingly.
Data for RaDialog-RG:
- run
python -m data.create_data --mode "RG"
to generate the report generation dataset in the required format (no instruct data)
Data for RaDialog-INS:
- run
python -m data.create_data --mode "INS"
to generate the instruct dataset
- run
python demo.py --cfg-path pretraining/configs/blip2_pretrain_stage1_emb.yaml
to start the demo - connect to the demo with a browser at
http://127.0.0.1:7860
and start chatting with RaDialog
- RaDialog-RG: run
python test.py --prompt img_matching_examples_ig2_noexamples_IMG_findings --use_embs --num_workers 0 --lora_model checkpoints/vicuna-7b-img-report/checkpoint-11200
- RaDialog-INS: run
python test.py --prompt img_matching_examples_ig2_noexamples_IMG_findings --use_embs --num_workers 0 --lora_model checkpoints/vicuna-7b-img-instruct/checkpoint-4800
- RaDialog-INS (correction): run
python test.py --prompt img_matching_examples_ig2_noexamples_IMG_findings --use_embs --num_workers 0 --lora_model checkpoints/vicuna-7b-img-instruct/checkpoint-4800 --do_corr
- RaDialog-INS (findings QA): run
python test.py --prompt img_matching_examples_ig2_noexamples_IMG_findings --use_embs --num_workers 0 --lora_model checkpoints/vicuna-7b-img-instruct/checkpoint-4800 --do_cp_all_qa
(or --do_cp_bin_qa)
- run
python -m findings_classifier.chexpert_train --train --run_name "train_chexbert"
- in chexpert_train.py set ckpt_path (line 152) to the path of the trained model you just trained
- then run
python -m findings_classifier.chexpert_train --run_name "save_preds"
to save the predictions of the trained model
- run
python -m pretraining.train --cfg-path pretraining/configs/blip2_pretrain_stage1.yaml
, we used the 4th epoch checkpoint - run
python -m pretraining.train --cfg-path pretraining/configs/blip2_pretrain_stage1_emb.yaml
, to save the embeddings of the trained model
Train RaDialog-RG:
- run
python finetune.py --use_embs True --base_model 'vicuna_v7' --output_dir 'checkpoints/lora-vicuna-7b-report' --wandb_run_name lora-vicuna-7b-report --prompt_template_name vicuna_v11 --data_path "data/data_files/mimic_cxr_reports_stratified.json" --cutoff_len 600 --num_epochs 10
- we used checkpoint-11200
Train RaDialog-INS:
- run
python finetune.py --use_embs True --base_model 'vicuna_v7' --output_dir 'checkpoints/lora-vicuna-7b-instruct' --wandb_run_name lora-vicuna-7b-instruct --prompt_template_name vicuna_v11 --data_path "data/data_files/mimic_cxr_instruct_stratified.json" --cutoff_len 800 --num_epochs 10
- we used checkpoint-4800
To use a model from a checkpoint, you'll need to perform the following steps:
- make a copy of "pytorch_model.bin" and rename it to "adapter_model.bin"
- copy adapter_config.json to the checkpoint folder (it will be generated after the last epoch or you can copy it from the checkpoints we provide)
When using our model or dataset, please cite:
@article{pellegrini2023radialog,
title={RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance},
author={Pellegrini, Chantal and {\"O}zsoy, Ege and Busam, Benjamin and Navab, Nassir and Keicher, Matthias},
journal={arXiv preprint arXiv:2311.18681},
year={2023}
}