
Do better segmentation #217

Open
danielt998 opened this issue Sep 10, 2024 · 1 comment

Comments

@danielt998
Owner

We should take inspiration from https://github.com/fxsjy/jieba, either by finding a Java library that does this or by producing our own implementation. My understanding is that it works by building a DAG of all the possible ways of segmenting a sentence/clause and then using word frequencies to calculate the probability of each segmentation, keeping the most probable one.
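A minimal Java sketch of the approach described above, assuming a toy dictionary of hypothetical word frequencies (not jieba's real dictionary). `DagSegmenter` is a hypothetical name and is not part of jieba or jieba-analysis. It builds the DAG of dictionary words starting at each character, then runs dynamic programming from right to left to choose the path with the highest summed log-probability, `log(freq / total)`. As far as I understand, this mirrors jieba's dictionary-based step; it omits the HMM that jieba uses for out-of-vocabulary words and simply emits unknown characters one at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: DAG + max log-probability path over a word-frequency dictionary.
final class DagSegmenter {
    private final Map<String, Integer> freq;
    private final double logTotal;
    private final int maxWordLen;

    DagSegmenter(Map<String, Integer> freq) {
        this.freq = freq;
        long total = 0;
        int maxLen = 1;
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            total += e.getValue();
            maxLen = Math.max(maxLen, e.getKey().length());
        }
        this.logTotal = Math.log(Math.max(1, total));
        this.maxWordLen = maxLen;
    }

    // dag.get(i) lists every end index j such that text[i..j] is a dictionary word.
    List<List<Integer>> buildDag(String text) {
        List<List<Integer>> dag = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            List<Integer> ends = new ArrayList<>();
            for (int j = i; j < Math.min(text.length(), i + maxWordLen); j++) {
                if (freq.containsKey(text.substring(i, j + 1))) {
                    ends.add(j);
                }
            }
            if (ends.isEmpty()) {
                ends.add(i); // unknown character: fall back to a single-character word
            }
            dag.add(ends);
        }
        return dag;
    }

    List<String> segment(String text) {
        int n = text.length();
        List<List<Integer>> dag = buildDag(text);
        // best[i]: best log-probability of segmenting text[i..n); bestEnd[i]: end of the first word on that path
        double[] best = new double[n + 1];
        int[] bestEnd = new int[n];
        for (int i = n - 1; i >= 0; i--) {
            best[i] = Double.NEGATIVE_INFINITY;
            for (int j : dag.get(i)) {
                // unseen (or zero-frequency) words are treated as frequency 1
                int f = Math.max(1, freq.getOrDefault(text.substring(i, j + 1), 1));
                double score = Math.log(f) - logTotal + best[j + 1];
                if (score > best[i]) {
                    best[i] = score;
                    bestEnd[i] = j;
                }
            }
        }
        // Walk the chosen path from the start to recover the words.
        List<String> words = new ArrayList<>();
        for (int i = 0; i < n; i = bestEnd[i] + 1) {
            words.add(text.substring(i, bestEnd[i] + 1));
        }
        return words;
    }

    public static void main(String[] args) {
        // Toy, made-up frequencies for illustration only.
        DagSegmenter seg = new DagSegmenter(Map.of(
                "研究", 500, "研究生", 100, "生命", 400, "的", 1000, "起源", 300,
                "研", 10, "究", 10, "生", 50, "命", 50));
        System.out.println(seg.segment("研究生命的起源")); // prints [研究, 生命, 的, 起源]
    }
}
```

With these toy counts, "研究 / 生命" outscores "研究生 / 命" because 500 × 400 is far larger than 100 × 50, which is exactly the kind of ambiguity a greedy longest-match segmenter would get wrong.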

@danielt998
Owner Author

A Java equivalent does exist on Maven:
https://mvnrepository.com/artifact/com.huaban/jieba-analysis
https://github.com/huaban/jieba-analysis

It does look unmaintained, though; I don't know whether it will need upgrading for use with newer Java versions, etc.
