This repository contains state-of-the-art language models and a classifier for the Bengali language, which is spoken primarily by Bengalis in South Asia.
The models trained here are used in the Natural Language Toolkit for Indic Languages (iNLTK).

Architecture/Dataset | Bengali Wikipedia Articles |
---|---|
ULMFiT | 41.2 |
TransformerXL | 39.3 |

Dataset | Accuracy | MCC | Notebook to Reproduce results |
---|---|---|---|
Bengali News Articles (Soham Articles) | 90.71 | 87.92 | Link |

Architecture | Visualization |
---|---|
ULMFiT | Embeddings projection |
TransformerXL | Embeddings projection |

Dataset | Dataset size (train, valid, test) | Accuracy | MCC | Notebook to Reproduce results |
---|---|---|---|---|
Bengali News Articles (Soham Articles) | (11284, 1411, 1411) | 90.71 | 87.92 | Link |

Dataset | Dataset size (train, valid, test) | Accuracy | MCC | Notebook to Reproduce results |
---|---|---|---|---|
Bengali News Articles (Soham Articles) | (112, 1411, 1411) | 69.88 | 61.56 | Link |

Dataset | Dataset size (train, valid, test) | Accuracy | MCC | Notebook to Reproduce results |
---|---|---|---|---|
Bengali News Articles (Soham Articles) | (112, 1411, 1411) | 74.06 | 65.08 | Link |

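
For readers who want to check the Accuracy and MCC columns against their own predictions, here is a minimal sketch of both metrics in plain Python. It uses the standard multiclass form of the Matthews correlation coefficient and is not taken from the linked notebooks. The function names are illustrative, and it assumes the values reported in the tables are the raw metrics multiplied by 100.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def matthews_corrcoef(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multiclass Matthews correlation coefficient, in the range [-1, 1]."""
    s = len(y_true)                                        # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))        # correctly predicted samples
    t_counts = Counter(y_true)                             # true occurrences per class
    p_counts = Counter(y_pred)                             # predicted occurrences per class
    labels = set(t_counts) | set(p_counts)

    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    denom_true = s * s - sum(v * v for v in t_counts.values())
    denom_pred = s * s - sum(v * v for v in p_counts.values())
    denominator = sqrt(denom_true * denom_pred)

    # By convention, return 0 when the coefficient is undefined
    # (for example, when every prediction is the same class).
    return numerator / denominator if denominator else 0.0


# Toy labels for illustration only; these are not from the news dataset.
y_true = ["sports", "sports", "state", "state", "world", "world"]
y_pred = ["sports", "sports", "state", "world", "world", "world"]

# Scaled by 100, under the assumption that the tables use that convention.
print(f"Accuracy: {100 * accuracy(y_true, y_pred):.2f}")
print(f"MCC: {100 * matthews_corrcoef(y_true, y_pred):.2f}")
```

MCC takes the full confusion matrix into account, so it is usually lower than accuracy when errors are spread unevenly across classes, which matches the pattern in the tables above.
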
Download the pretrained language models from here.
The tokenizer was trained using Google's SentencePiece.
Download the trained tokenizer model and vocabulary from here.
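
Once the tokenizer model is downloaded, it can be loaded with the `sentencepiece` Python package. The sketch below shows one possible way to do this. It assumes the package is installed (`pip install sentencepiece`), and the file name `tokenizer.model` is a placeholder, not the actual name of the file in this repository.

```python
from typing import List


def tokenize_with_sentencepiece(model_path: str, text: str) -> List[str]:
    """Split `text` into subword pieces using a trained SentencePiece model."""
    # Imported here so the module can be read without sentencepiece installed.
    import sentencepiece as spm

    processor = spm.SentencePieceProcessor()
    processor.Load(model_path)              # load the downloaded .model file
    return processor.EncodeAsPieces(text)   # subword pieces as strings


# Example usage (placeholder path; point it at the downloaded model file):
# pieces = tokenize_with_sentencepiece("tokenizer.model", "আমি বাংলায় গান গাই")
# print(pieces)
```

The exact pieces returned depend on the vocabulary learned during training, so no expected output is shown here.
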