The automated generation of imaging reports proves invaluable in alleviating the workload of radiologists. A clinically applicable report generation algorithm should demonstrate its effectiveness in producing reports that accurately describe radiology findings and attend to patient-specific indications. In this paper, we introduce a novel method, Structural Entities extraction and patient indications Incorporation (SEI), for chest X-ray report generation. Specifically, we employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports and improve the quality of factual entity sequences. This reduces the noise in the subsequent cross-modal alignment module by aligning X-ray images with factual entity sequences in reports, thereby enhancing the precision of cross-modal alignment and further aiding the model in gradient-free retrieval of similar historical cases. Subsequently, we propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications. This process allows the text decoder to attend to discriminative features of X-ray images, assimilate historical diagnostic information from similar cases, and understand the examination intention of patients. This, in turn, assists in triggering the text decoder to produce high-quality reports. Experiments conducted on MIMIC-CXR validate the superiority of SEI over state-of-the-art approaches on both natural language generation and clinical efficacy metrics.
- 2024-09-09, uploaded the poster.
- 2024-09-19, updated the repository to make it easier to use.
- 2024-09-19, updated the generated reports for the MIMIC-CXR test set.
torch==2.1.2+cu118
transformers==4.23.1
torchvision==0.16.2+cu118
radgraph==0.09
- Due to the specific environment of RadGraph, please refer to `knowledge_encoder/factual_serialization.py` for the environment of the structural entities approach.
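
As a quick sanity check before training, the pinned versions above can be compared with what is installed. The sketch below is not part of the repository; the helper names `installed_version` and `check_pins` are hypothetical. It uses exact string comparison, so a version that the packaging metadata reports in a normalized form would show up as a mismatch.

```python
from importlib import metadata

# Pins copied from the requirements list above.
PINS = {
    "torch": "2.1.2+cu118",
    "transformers": "4.23.1",
    "torchvision": "0.16.2+cu118",
    "radgraph": "0.09",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Map each package name to a tuple (expected, found, matches)."""
    report = {}
    for package, expected in pins.items():
        found = lookup(package)
        report[package] = (expected, found, found == expected)
    return report


if __name__ == "__main__":
    for package, (expected, found, ok) in check_pins(PINS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{package}: expected {expected}, found {found} [{status}]")
```

Because RadGraph needs its own environment, running the check once in each environment is likely to be more informative than a single run.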
You can download checkpoints of SEI as follows:
- For MIMIC-CXR, you can download checkpoints from Baidu Netdisk (its code is `MK13`) and Hugging Face 🤗.
- For MIMIC-CXR, you can download medical images from PhysioNet.
- You can download medical reports from Google Drive. Note that you can request access with your PhysioNet license. A toy case is in `knowledge_encoder/case.json`.
- Configure the RadGraph environment based on `knowledge_encoder/factual_serialization.py`:
===================environmental setting=================
Basic Setup (One-time activity)
a. Clone the DYGIE++ repository from here. This repository is managed by Wadden et al., authors of the paper Entity, Relation, and Event Extraction with Contextualized Span Representations.
git clone https://github.com/dwadden/dygiepp.git
b. Navigate to the root of repo in your system and use the following commands to set the conda environment:
conda create --name dygiepp python=3.7
conda activate dygiepp
cd dygiepp
pip install -r requirements.txt
conda develop . # Adds DyGIE to your PYTHONPATH
c. Activate the conda environment:
conda activate dygiepp
Notably, for our RadGraph environment, you can refer to `knowledge_encoder/radgraph_requirements.yml`.
- Configure `radgraph_path` and `ann_path` in `knowledge_encoder/see.py`. `annotation.json` can be obtained from here. Note that you can request access with your PhysioNet license.
- Run `knowledge_encoder/see.py` to extract the factual entity sequence for each report.
- Finally, `annotation.json` becomes `mimic_cxr_annotation_sen.json`, which matches the `new_ann_file_name` variable in `see.py`.
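
As a toy illustration of the idea behind factual entity sequences (keep the entity tokens, drop presentation-style wording), the sketch below filters a tokenized sentence by entity spans. It is not the logic of `knowledge_encoder/see.py`. The function name `factual_sequence`, the span layout (inclusive token indices), and the example sentence are all assumptions made for illustration.

```python
def factual_sequence(tokens, entity_spans, sep=" "):
    """Join the tokens covered by at least one entity span, in reading order.

    tokens:       list of str, one tokenized sentence
    entity_spans: iterable of (start_ix, end_ix) pairs, inclusive token indices
                  (hypothetical layout, chosen for this sketch)
    """
    keep = set()
    for start_ix, end_ix in entity_spans:
        keep.update(range(start_ix, end_ix + 1))
    # Ignore indices that fall outside the sentence instead of raising.
    return sep.join(tokens[i] for i in sorted(keep) if 0 <= i < len(tokens))


if __name__ == "__main__":
    # Invented sentence; the spans mark "small" and "left pleural effusion".
    tokens = "There is a small left pleural effusion .".split()
    spans = [(3, 3), (4, 6)]
    print(factual_sequence(tokens, spans))
```

Here the filler words "There is a" and the final period are dropped, which is the kind of noise reduction the SEE step aims for before cross-modal alignment.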
- Run `bash pretrain_mimic_cxr.sh` to pretrain a model on the MIMIC-CXR data (note that `mimic_cxr_ann_path` is `mimic_cxr_annotation_sen.json`).
- Configure the `--load` argument in `pretrain_inference_mimic_cxr.sh`. Note that the argument is the pre-trained model from the first stage.
- Run `bash pretrain_inference_mimic_cxr.sh` to retrieve similar historical cases for each sample, forming `mimic_cxr_annotation_sen_best_reports_keywords_20.json` (i.e., `mimic_cxr_annotation_sen.json` becomes this `*.json` file).
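
To make the retrieval step more concrete, here is a minimal sketch of gradient-free nearest-neighbour lookup over precomputed embeddings. It is not the repository's implementation: ranking by cosine similarity, the helper names `cosine` and `retrieve_top_k`, and the default `k=20` (a guess prompted by the output file name) are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_top_k(query, bank, k=20):
    """Return the ids of the k bank entries most similar to the query.

    query: list of float, embedding of the current sample
    bank:  dict mapping case id -> embedding of a historical case
    """
    ranked = sorted(bank.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [case_id for case_id, _ in ranked[:k]]


if __name__ == "__main__":
    # Tiny made-up embeddings, just to show the call shape.
    bank = {"case_a": [1.0, 0.0], "case_b": [0.0, 1.0], "case_c": [1.0, 1.0]}
    print(retrieve_top_k([1.0, 0.1], bank, k=2))
```

No parameters are updated during this lookup, which is what makes it gradient-free. A real setup would probably also exclude the query's own study from the bank.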
- Extract and preprocess the `indication section` in the radiology report.
  a. Configure `ann_path` and `report_dir` in `knowledge_encoder/preprocessing_indication_section.py`; the value of `ann_path` is `mimic_cxr_annotation_sen_best_reports_keywords_20.json`. Note that `report_dir` can be downloaded from PhysioNet.
  b. Run `knowledge_encoder/preprocessing_indication_section.py`, forming `mimic_cxr_annotation_sen_best_reports_keywords_20_all_components_with_fs_v0227.json`.
- Configure the `--load` argument in `finetune_mimic_cxr.sh`. Note that the argument is the pre-trained model from the first stage. Furthermore, `mimic_cxr_ann_path` is `mimic_cxr_annotation_sen_best_reports_keywords_20_all_components_with_fs_v0227.json`.
- Download these checkpoints. Notably, `chexbert.pth` and `radgraph` are used to calculate CE metrics, and `bert-base-uncased` and `scibert_scivocab_uncased` are pre-trained models for the cross-modal fusion network and text encoder. Then put these checkpoints in the same local directory (e.g., "/home/data/checkpoints"), and configure the `--ckpt_zoo_dir /home/data/checkpoints` argument in `finetune_mimic_cxr.sh`.
Checkpoint | Variable_name | Download |
---|---|---|
chexbert.pth | chexbert_path | here |
bert-base-uncased | bert_path | huggingface |
radgraph | radgraph_path | PhysioNet |
scibert_scivocab_uncased | scibert_path | huggingface |
- Run `bash finetune_mimic_cxr.sh` to generate reports based on similar historical cases.
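
The indication preprocessing in this stage works on the free-text report. The sketch below shows one plausible way to pull a named section out of a report, assuming sections start with an upper-case label followed by a colon (for example `INDICATION:`). It is not the code in `knowledge_encoder/preprocessing_indication_section.py`; the regular expression, the function name `extract_section`, and the sample text are invented for illustration.

```python
import re

# Assumption: a section header is an upper-case label at the start of a line,
# followed by a colon, e.g. "INDICATION:" or "FINDINGS:".
SECTION_HEADER = re.compile(r"^[ \t]*([A-Z][A-Z /]+):", re.MULTILINE)


def extract_section(report_text, name="INDICATION"):
    """Return the whitespace-normalized body of one section, or None if absent."""
    headers = list(SECTION_HEADER.finditer(report_text))
    for i, match in enumerate(headers):
        if match.group(1).strip() == name:
            # The body runs until the next header, or to the end of the report.
            end = headers[i + 1].start() if i + 1 < len(headers) else len(report_text)
            return " ".join(report_text[match.end():end].split())
    return None


# Invented sample text in a report-like layout; not real patient data.
SAMPLE_REPORT = """\
 EXAMINATION:  CHEST (PA AND LAT)

 INDICATION:  ___F with new onset ascites  // eval for infection

 COMPARISON:  None.

 FINDINGS:
 There is no focal consolidation.
"""

if __name__ == "__main__":
    print(extract_section(SAMPLE_REPORT))
```

Reports without an indication section return `None`, so downstream code needs a fallback for that case, such as an empty string.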
- You must download the medical images, their corresponding reports (i.e., `mimic_cxr_annotation_sen_best_reports_keywords_20_all_components_with_fs_v0227.json`), and checkpoints (i.e., `SEI-1-finetune-model-best.pth`) in Section Datasets and Section Checkpoints, respectively.
- Configure the `--load` and `--mimic_cxr_ann_path` arguments in `test_mimic_cxr.sh`.
- Run `bash test_mimic_cxr.sh` to generate reports based on similar historical cases.
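
Since the annotation file is renamed at each stage and several checkpoints must sit in one directory, a small pre-flight check can catch missing files before a long run. The sketch below is not part of the repository. It assumes that all annotation files live in a single directory and that the checkpoint entries are named exactly as in the table; the helper `missing_entries` and the `/home/data/annotations` path are hypothetical.

```python
from pathlib import Path

# Annotation file names, in the order the stages above produce them.
ANNOTATION_STAGES = [
    ("input to see.py", "annotation.json"),
    ("after see.py", "mimic_cxr_annotation_sen.json"),
    ("after pretrain_inference_mimic_cxr.sh",
     "mimic_cxr_annotation_sen_best_reports_keywords_20.json"),
    ("after preprocessing_indication_section.py",
     "mimic_cxr_annotation_sen_best_reports_keywords_20_all_components_with_fs_v0227.json"),
]

# Entries expected under --ckpt_zoo_dir (names taken from the checkpoint table).
CHECKPOINT_ZOO = [
    "chexbert.pth",
    "bert-base-uncased",
    "radgraph",
    "scibert_scivocab_uncased",
]


def missing_entries(directory, names):
    """Return the names (files or sub-directories) not found under directory."""
    root = Path(directory)
    return [name for name in names if not (root / name).exists()]


if __name__ == "__main__":
    ann_dir = "/home/data/annotations"       # hypothetical location
    ckpt_zoo_dir = "/home/data/checkpoints"  # example path from the text above
    for stage, file_name in ANNOTATION_STAGES:
        state = "missing" if missing_entries(ann_dir, [file_name]) else "found"
        print(f"{stage}: {file_name} [{state}]")
    print("missing checkpoints:", missing_entries(ckpt_zoo_dir, CHECKPOINT_ZOO))
```

For inference alone, only the last annotation file and the checkpoint entries matter; the earlier stages are listed to show where each file comes from.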
Results on MIMIC-CXR are presented as follows:
- Next, the code for this project will be streamlined.
If you use or extend our work, please cite our paper at MICCAI 2024.
@InProceedings{liu-sei-miccai-2024,
author={Liu, Kang and Ma, Zhuoqi and Kang, Xiaolu and Zhong, Zhusi and Jiao, Zhicheng and Baird, Grayson and Bai, Harrison and Miao, Qiguang},
title={Structural Entities Extraction and Patient Indications Incorporation for Chest X-Ray Report Generation},
booktitle={Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
year={2024},
publisher={Springer Nature Switzerland},
address={Cham},
pages={433--443},
isbn={978-3-031-72384-1},
doi={10.1007/978-3-031-72384-1_41}
}
- R2Gen [1]: Some code is adapted from R2Gen.
- R2GenCMN [2]: Some code is adapted from R2GenCMN.
- MGCA [3]: Some code is adapted from MGCA.
[1] Chen, Z., Song, Y., Chang, T.H., Wan, X., 2020. Generating radiology reports via memory-driven transformer, in: EMNLP, pp. 1439–1449.
[2] Chen, Z., Shen, Y., Song, Y., Wan, X., 2021. Cross-modal memory networks for radiology report generation, in: ACL, pp. 5904–5914.
[3] Wang, F., Zhou, Y., Wang, S., Vardhanabhuti, V., Yu, L., 2022. Multi-granularity cross-modal alignment for generalized medical visual representation learning, in: NeurIPS, pp. 33536–33549.