LDA Collapsed Gibbs Sampling Python implementation

Requirements

Run pip install -r requirements.txt

You can take a look at the example notebook (example.ipynb).

Initialization:

from lda2py import LDA
# The class name must match the import above.
lda = LDA(num_topic=20, max_iter=50)

Preprocessing text file:

file_path = "./VietNameseFacebookNovember.txt"
lda.preprocess(file_path)

Run the LDA model:

lda.fit()
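
For readers unfamiliar with the method, here is a minimal sketch of what a collapsed Gibbs sampler for LDA typically does. It is not the code in lda2py.py: the function name `collapsed_gibbs_lda`, the hyperparameters `alpha` and `beta`, and the integer word-id input format are all assumptions made for illustration.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, max_iter=50, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA (hypothetical helper).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns the (doc_topic, topic_word) count matrices as nested lists.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # n_dk
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n_kw
    topic_total = [0] * num_topics                              # n_k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(max_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and add the token back.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word
```

Each sweep removes one token from the counts, resamples its topic from the conditional distribution, and adds it back, so the total counts always equal the number of tokens in the corpus.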

You can see the most popular words for each topic:

lda.get_topic_word(no_topic=8,num_word=15)
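
As a rough illustration of how top words can be read off a fitted model, the sketch below ranks words by their count in one topic. It assumes a topic-word count matrix stored as nested lists and an id-to-word list; the helper `top_words` is hypothetical and may not reflect how get_topic_word works internally.

```python
def top_words(topic_word, id_to_word, topic, num_words):
    """Return the num_words most frequent words in one topic.

    topic_word: count matrix, topic_word[k][w] = count of word w in topic k.
    id_to_word: list mapping word ids to word strings.
    """
    counts = topic_word[topic]
    # Sort word ids by descending count; ties keep their original order.
    ranked = sorted(range(len(counts)), key=lambda w: counts[w], reverse=True)
    return [id_to_word[w] for w in ranked[:num_words]]
```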

This implementation targets Vietnamese datasets. You can customize the preprocessing stage for your language by changing the details in utils.py.
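
A language-specific preprocessing step might look like the sketch below. The function `tokenize` and its behavior (Unicode normalization, lowercasing, stopword and number removal) are assumptions for illustration, not the contents of utils.py; a real Vietnamese pipeline may also need word segmentation, which this sketch does not do.

```python
import re
import unicodedata


def tokenize(text, stopwords):
    """Split text into lowercase word tokens and drop stopwords and numbers.

    stopwords: a set of lowercase words to remove.
    """
    # Normalize to NFC so accented letters are single code points.
    text = unicodedata.normalize("NFC", text).lower()
    tokens = re.findall(r"\w+", text)
    return [t for t in tokens if t not in stopwords and not t.isdigit()]
```

To adapt this to another language, swap in that language's stopword list and adjust the token pattern if words are not separated by whitespace or punctuation.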

You can contact me at [email protected] with any questions about this project.
