Stars
Repository of the RANLP 2023 paper "Exploring the Landscape of Natural Language Processing Research".
A curated list of resources dedicated to open source GitHub repositories related to ChatGPT and OpenAI API
🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto.
A curated list of resources dedicated to Python libraries, LLMs, dictionaries, and corpora of NLP for Japanese
🎡 Build Python wheels for all the platforms with minimal configuration.
Sentence boundary disambiguation tool for Japanese texts (日本語文境界判定器)
🌿 An easy-to-use Japanese text processing tool that lets you switch tokenizers with only small code changes.
A Japanese tokenizer based on recurrent neural networks
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
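Code for the PyCon JP 2019 talk "Python による日本語自然言語処理 〜系列ラベリングによる実世界テキスト分析〜" (Japanese Natural Language Processing with Python: Real-World Text Analysis with Sequence Labeling)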
Text Classification Algorithms: A Survey
Python package for understanding the difficulty of text classification datasets (CoNLL 2018).
A simple website demonstrating TextRank's extractive summarization capability.
Sample code for morphological analysis in Python
Aims to make JapaneseTokenizer as easy to use as possible
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Example code for "Real-World Natural Language Processing"
Chinese NER using Lattice LSTM. Code for ACL 2018 paper.
Bidirectional Long Short-Term Memory (bi-LSTM) tagger in DyNet, hierarchical (with word and character embeddings)
An open source framework for seq2seq models in PyTorch.
LSTM and QRNN Language Model Toolkit for PyTorch
An open-source NLP research library, built on PyTorch.
Unsupervised Word Segmentation with Neural Language Model
Code to train and use models from "Charagram: Embedding Words and Sentences via Character n-grams".