- Captioning : Medical Image Captioning
- Dataset : CheXpert, etc. (shared)
- SSL : Self-Supervised Learning (혁종)
- SL : Supervised Pre-training (진수)
- FeatureEval : Linear Evaluation / K-NN Evaluation(?) (유진)
- Visualization : NLP-related visualizations / other visualization samples (승용)
  - Samples will be added as time allows
- ETC : miscellaneous folder for everything else
  - Official Paper : https://arxiv.org/pdf/1901.07031.pdf
  - Official Site : https://stanfordmlgroup.github.io/competitions/chexpert/
  - Other GitHub : https://github.com/gaetandi/cheXpert
    - Dataset, pre-training, and evaluation are all included!
Much image-captioning research trains only the language model; that is, the model's input is image features from a pre-trained CNN.
In contrast, my model takes the image itself as input, so the CNN (visual encoder) can also learn from the image-captioning task.
Trained this way, the visual encoder may work better than a practically frozen CNN pre-trained on X-ray datasets (e.g. CheXpert).
In addition, DeXTr takes several normal images as extra input (visual encoder part), extracts the mutual information between the input and the normal images (feature difference part), and passes this feature to the X-Transformer (language model part).
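As a rough picture of how these three parts fit together, here is a minimal sketch; the module names, tensor shapes, and constructor arguments are illustrative assumptions, not the actual DeXTr code (the real files are listed further below).

```python
import torch.nn as nn

class DeXTrSketch(nn.Module):
    # Hypothetical sketch of the overall flow; the real model lives in DeXTr/models/Detr.py.
    def __init__(self, visual_encoder, feature_difference, language_model):
        super().__init__()
        self.visual_encoder = visual_encoder          # trainable CNN, not frozen
        self.feature_difference = feature_difference  # contrastive-attention module
        self.language_model = language_model          # X-Transformer decoder

    def forward(self, image, normal_images, report_tokens):
        # image:         (B, 3, H, W)    input X-ray
        # normal_images: (B, N, 3, H, W) several normal X-rays
        b, n = normal_images.shape[:2]
        feat = self.visual_encoder(image)                          # (B, D)
        normal_feats = self.visual_encoder(
            normal_images.flatten(0, 1)).reshape(b, n, -1)         # (B, N, D)
        # Contrast the input against the normal pool and feed the result
        # to the language model, which generates the report tokens.
        diff_feat = self.feature_difference(feat, normal_feats)    # (B, D)
        return self.language_model(diff_feat, report_tokens)
```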
See below for details.
For data loading, see the function __getitem__ in Dextr/coco_dataset.py.
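A minimal sketch of what such a __getitem__ could look like, assuming a PyTorch Dataset that returns the target X-ray, a few sampled normal images, and the tokenized report; the class, field, and parameter names here are illustrative, not the actual coco_dataset.py code.

```python
import random
import torch
from torch.utils.data import Dataset
from PIL import Image

class ReportDataset(Dataset):
    # Hypothetical sketch; the real logic lives in Dextr/coco_dataset.py.
    def __init__(self, samples, normal_image_paths, transform, num_normals=5):
        self.samples = samples                      # list of (image_path, token_ids)
        self.normal_image_paths = normal_image_paths
        self.transform = transform
        self.num_normals = num_normals

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image_path, token_ids = self.samples[idx]
        image = self.transform(Image.open(image_path).convert('RGB'))

        # Sample several normal (healthy) X-rays to contrast against.
        normal_paths = random.sample(self.normal_image_paths, self.num_normals)
        normals = torch.stack(
            [self.transform(Image.open(p).convert('RGB')) for p in normal_paths]
        )

        return image, normals, torch.as_tensor(token_ids, dtype=torch.long)
```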
The contrastive attention code was written following the method of Liu et al. (2022).
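A rough sketch of the idea behind contrastive attention as used here: attend the input-image feature over a pool of normal-image features, then keep the part of the input that is not explained by that normal context. Module and tensor names are illustrative assumptions; the actual implementation is in DeXTr/models/contra_att.py.

```python
import torch
import torch.nn as nn

class ContrastiveAttentionSketch(nn.Module):
    # Illustrative only; not the actual DeXTr/models/contra_att.py code.
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, x, normals):
        # x:       (B, D)    pooled feature of the input X-ray
        # normals: (B, N, D) pooled features of N normal X-rays
        q = self.query(x).unsqueeze(1)                       # (B, 1, D)
        k = self.key(normals)                                # (B, N, D)
        attn = torch.softmax(
            (q * k).sum(-1) / k.size(-1) ** 0.5, dim=-1)     # (B, N)
        common = (attn.unsqueeze(-1) * normals).sum(1)       # (B, D) "normal" context
        contrast = x - common                                # what differs from normal
        return self.fuse(torch.cat([contrast, common], dim=-1))
```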
- DeXTr (full architecture) : DeXTr/models/Detr.py
- Visual Encoder : DeXTr/models/visual_extractor.py
- Feature Difference : CA in DeXTr/models/contra_att.py & others in DeXTr/models/Detr.py
- Language Model + Report Generation : code by Pan (author of X-LAN)
- Training : DeXTr/main_mimic.py
$ CUDA_VISIBLE_DEVICES=1 python3 main_mimic.py --folder ./experiments/name