Requirements:
Python 3.8
PyTorch 1.7 + CUDA 11.0
geomloss
pykeops
numpy
matplotlib
prefetch_generator
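A quick way to confirm the environment before training is to check that each required package can be located. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the import names are assumptions (for example, PyTorch is imported as `torch`). Extend the tuple with any remaining requirement once its import name is confirmed.

```python
import importlib.util
import sys

# Assumed import names for the packages listed above.
REQUIRED_MODULES = ("torch", "geomloss", "pykeops", "numpy", "matplotlib")


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```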
Directory description:
'data/20news': default dataset path.
'embedding_dir': word embedding file path.
'sav': checkpoint file path.
'embedding_dir/word2vec_glove.6B.100d.txt.bin': 100-dimensional GloVe word embedding dictionary file.
'corpus_obj.bin': preprocessed 20 Newsgroups corpus in binary form, for fast loading and training.
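The layout above can be verified before a run with a short script. This is a sketch under assumptions: the helper `missing_paths` is hypothetical, and the paths are taken relative to the repository root. `corpus_obj.bin` is left out because its parent directory is not stated here.

```python
from pathlib import Path

# Paths taken from the directory description, relative to the repository root.
EXPECTED_PATHS = (
    "data/20news",
    "embedding_dir",
    "sav",
    "embedding_dir/word2vec_glove.6B.100d.txt.bin",
)


def missing_paths(root, expected):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(Path.cwd(), EXPECTED_PATHS)
    for rel in missing:
        print("missing:", rel)
    if not missing:
        print("All expected paths are present.")
```

If the embedding file is reported as missing, it still needs to be downloaded into 'embedding_dir'.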
Download the word embedding file from the link given in "embedding_dir/download_link.md". Then run the model with:
python run.py --topics 50 --batch-size=64 --lr=1e-4 --coel=0.1 --coea=5.0 --topk=10
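To compare several topic counts, the command above can be generated programmatically. The following is a minimal sketch rather than a script shipped with the code: `command_for_topics` and `sweep` are hypothetical helpers, the topic counts 20 and 100 are illustrative values, and only the flags shown in the command above are used.

```python
import shlex
import subprocess

# The command from this README, used as the template for every run.
BASE_COMMAND = (
    "python run.py --topics 50 --batch-size=64 --lr=1e-4 "
    "--coel=0.1 --coea=5.0 --topk=10"
)


def command_for_topics(num_topics):
    """Return the base command as an argv list with --topics replaced."""
    argv = shlex.split(BASE_COMMAND)
    idx = argv.index("--topics")
    argv[idx + 1] = str(num_topics)
    return argv


def sweep(topic_counts, dry_run=True):
    """Print (and, unless dry_run is set, execute) one command per topic count."""
    commands = [command_for_topics(k) for k in topic_counts]
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Illustrative topic counts; pass dry_run=False to actually train.
    sweep([20, 50, 100])
```

With `dry_run=True` the script only prints the commands, which makes it easy to inspect them before starting any training.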
If you find this code useful, please cite our paper:
@article{LIU2022102864,
title = {Improving neural topic modeling via Sinkhorn divergence},
journal = {Information Processing \& Management},
volume = {59},
number = {3},
pages = {102864},
year = {2022},
issn = {0306-4573},
doi = {https://doi.org/10.1016/j.ipm.2021.102864},
url = {https://www.sciencedirect.com/science/article/pii/S0306457321003356},
author = {Luyang Liu and Heyan Huang and Yang Gao and Yongfeng Zhang},
keywords = {Deep learning, Topic model, Sinkhorn divergence, Auto-encoder}
}