DiffuSum: Generation Enhanced Extractive Summarization with Diffusion

This repo contains the source code for the ACL 2023 Findings paper DiffuSum: Generation Enhanced Extractive Summarization with Diffusion.

https://arxiv.org/abs/2305.01735

Abstract

Extractive summarization aims to form a summary by directly extracting sentences from the source document. Existing works mostly formulate it as a sequence labeling problem by making individual sentence label predictions. This paper proposes DiffuSum, a novel paradigm for extractive summarization, by directly generating the desired summary sentence representations with diffusion models and extracting sentences based on sentence representation matching. In addition, DiffuSum jointly optimizes a contrastive sentence encoder with a matching loss for sentence representation alignment and a multi-class contrastive loss for representation diversity. Experimental results show that DiffuSum achieves the new state-of-the-art extractive results on CNN/DailyMail with ROUGE scores of $44.83/22.56/40.56$. Experiments on the other two datasets with different summary lengths also demonstrate the effectiveness of DiffuSum. The strong performance of our framework shows the great potential of adapting generative models for extractive summarization.
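Concretely, the extraction step can be pictured as a nearest-neighbor lookup: each summary representation produced by the diffusion model is matched against the document's sentence embeddings, and the closest sentences are extracted. The sketch below is a minimal illustration under that reading, assuming cosine similarity and greedy one-to-one matching; the names are hypothetical, not the repo's actual API.

```python
# Illustrative sketch (not the repo's API): extract sentences by matching
# generated summary representations to document sentence embeddings.
import numpy as np
from typing import List

def extract_by_matching(doc_embs, gen_embs) -> List[int]:
    """doc_embs: (n, d) document sentence embeddings.
    gen_embs:  (k, d) summary representations from the diffusion model.
    Returns indices of the k extracted document sentences."""
    # Normalize rows so dot products are cosine similarities.
    doc = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    gen = gen_embs / np.linalg.norm(gen_embs, axis=1, keepdims=True)
    sims = gen @ doc.T  # (k, n) similarity matrix
    selected = []
    for row in sims:
        # Greedy one-to-one matching: skip already-extracted sentences.
        for idx in np.argsort(-row):
            if int(idx) not in selected:
                selected.append(int(idx))
                break
    return selected
```

In this picture the diffusion model only has to produce the summary representations; extraction itself reduces to a similarity lookup over the source document.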

Code

We reuse some code from:

Li, Xiang Lisa, et al. "Diffusion-LM Improves Controllable Text Generation."

and

Khosla, Prannay, et al. "Supervised Contrastive Learning."
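For reference, the supervised contrastive loss of Khosla et al. takes roughly the following form; the PyTorch sketch below is a simplified illustration of that loss, not the implementation used in this repo.

```python
# Illustrative PyTorch sketch of the supervised contrastive loss
# (Khosla et al.); simplified, not this repo's implementation.
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.07):
    """features: (batch, dim) embeddings; labels: (batch,) class labels.
    Pairs sharing a label are treated as positives."""
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature  # (batch, batch)
    # Exclude self-similarity on the diagonal.
    self_mask = torch.eye(len(labels), dtype=torch.bool, device=features.device)
    sim = sim.masked_fill(self_mask, float('-inf'))
    # Positives share a label (excluding self).
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average log-likelihood over each anchor's positives, then over anchors.
    per_anchor = log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -per_anchor.mean()
```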

Model architecture

(Figure: overview of the DiffuSum model architecture; see the paper for the diagram.)

Environment Setup

conda create --name [envname] python=3.8
conda activate [envname]
pip install -r requirements.txt

Data preprocessing

We reuse the preprocessed data from the MatchSum repo (https://github.com/maszhongming/MatchSum). Please download the data and put it in the corresponding directories.

Train & Test

Train

# train on CNN/DM
python run_train_sum.py --diff_steps 500 --model_arch transformer_sum --lr 1e-5 --seed 101 --noise_schedule sqrt --in_channel 128 --modality roc --submit no --padding_mode pad --app "--predict_xstart True --training_mode e2e --roc_train cnn_ext " --notes cnn --bsz 64 --epochs 10

# train on PubMed
python run_train_sum.py --diff_steps 500 --model_arch transformer_sum --lr 1e-5 --seed 101 --noise_schedule sqrt --in_channel 128 --modality roc --submit no --padding_mode pad --app "--predict_xstart True --training_mode e2e --roc_train pubmed " --notes pubmed --bsz 64 --epochs 10

# train on XSum
python run_train_sum.py --diff_steps 500 --model_arch transformer_sum --lr 1e-5 --seed 101 --noise_schedule sqrt --in_channel 128 --modality roc --submit no --padding_mode pad --app "--predict_xstart True --training_mode e2e --roc_train xsum " --notes xsum --bsz 64 --epochs 10
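The `--diff_steps` and `--noise_schedule` flags configure the diffusion process. Assuming `--noise_schedule sqrt` selects the sqrt schedule from Diffusion-LM (Li et al.), which sets the cumulative signal level to $\bar{\alpha}_t = 1 - \sqrt{t/T + s}$ for a small offset $s$, the per-step noise rates could be sketched as follows. This is illustrative only, not the repo's code.

```python
# Illustrative sketch of the sqrt noise schedule from Diffusion-LM
# (Li et al.), presumably what --noise_schedule sqrt selects.
import numpy as np

def sqrt_schedule(num_steps=500, s=1e-4, max_beta=0.999):
    """Per-step noise rates derived from alpha_bar(t) = 1 - sqrt(t/T + s)."""
    t = np.arange(num_steps + 1) / num_steps
    alpha_bar = 1.0 - np.sqrt(t + s)
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    # Clip the final steps, where alpha_bar approaches zero.
    return np.clip(betas, 0.0, max_beta)

betas = sqrt_schedule(500)  # matches --diff_steps 500
```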

Test

python batch_decode_sum.py [checkpoint_path] -1.0 ema

References

@article{Zhang2023DiffuSumGE,
  title={DiffuSum: Generation Enhanced Extractive Summarization with Diffusion},
  author={Zhang, Haopeng and Liu, Xiao and Zhang, Jiawei},
  journal={arXiv preprint arXiv:2305.01735},
  year={2023}
}
