The Kurdish Language Processing Toolkit
Updated Sep 18, 2024 - Python
A corpus for the Zazaki and Gorani languages
Pre-tokenized models for Bodo. This repository includes all the tokenized models to be used in neural machine translation. The models were trained using ByteLevelBPETokenizer, BPETokenizer, SentencePieceBPETokenizer, and BertWordPieceTokenizer.
Towards Machine Translation for the Kurdish Language
Language identification of Kurdish and Zaza-Gorani languages (& variants)
Script Normalization for Unconventional Writing of Perso-Arabic scripts (ACL2023)
Simple syntactic transfer based on treebank translation
Language identification models for 17 European official languages and Corsican. To be used with ldig-python3 (https://github.com/lkevers/ldig-python3).