LangDict

LangDict reads the word lists that Linux provides under '/usr/share/dict/' (or any other file that contains one word per line) and builds dictionaries from them. Each dictionary contains a collection of words and their near neighbours, as defined by the Levenshtein distance. The maximum distance can be set with the additional parameter --levenshtein.

With them, we can create complex replacement dictionaries for generating our error detection framework.
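To make the neighbour criterion concrete, here is a minimal sketch of the Levenshtein distance and a neighbour check built on it. It is not taken from lang_dict.cpp. The names `levenshtein` and `is_neighbor` are hypothetical, and the sketch assumes byte-wise comparison, so a multi-byte UTF-8 character counts as more than one edit.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Classic dynamic-programming Levenshtein distance using two rolling rows.
// Assumption: words are compared byte by byte.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    // Turning the empty prefix of a into b[0..j) takes j insertions.
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // turning a[0..i) into "" takes i deletions
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            std::size_t deletion = prev[j] + 1;
            std::size_t insertion = curr[j - 1] + 1;
            curr[j] = std::min({substitution, deletion, insertion});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Hypothetical helper: two distinct words are neighbours when their
// distance does not exceed max_distance (the role of --levenshtein).
bool is_neighbor(const std::string& a, const std::string& b, std::size_t max_distance) {
    return a != b && levenshtein(a, b) <= max_distance;
}
```

Under this sketch, "cat" and "bat" are neighbours at distance 1, while "kitten" and "sitting" (distance 3) would need --levenshtein=3 or higher.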

Build instructions

make lang_dict

Usage

./lang_dict \
  --input=<PATH/TO/INPUT_WORDS> \
  --output=<PATH/TO/OUTPUT_FILE> \
  --archaic=<PATH/TO/ARCHAIC_WORDS> \
  --levenshtein=1

For a single language:

./lang_dict --output=<PATH/TO/FILE.json> [--archaic=FILE] --input=<LIST OF FILES>

Example:

./lang_dict --output=./langs/en/en_US.json --input=/usr/share/dict/american-english-insane

Structure

The generated JSON has the following structure:

{
 "real": {
  "<WORD_0>": {"id": xxx, "type": "REAL_WORD", "neighbor": [], "archaic": []},
  ...
  "<WORD_N>": {"id": xxx, "type": "REAL_WORD", "neighbor": [], "archaic": []}
 },
 "archaic": {
  "<WORD_0>": {"id": xxx, "type": "ARCHAIC", "neighbor": [], "archaic": []},
  ...
  "<WORD_M>": {"id": xxx, "type": "ARCHAIC", "neighbor": [], "archaic": []}
 }
}

The neighbor list contains the IDs of the real words that are neighbors of the current word according to the Levenshtein edit distance. The archaic list contains the IDs of the archaic words that are neighbors of the current word under the same distance.
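The relationship between IDs, neighbor lists, and archaic lists can be sketched as follows. This is a simplified illustration, not the code in lang_dict.cpp. The `Entry` struct, `edit_distance`, and `build_entries` are hypothetical names, and the sketch assumes that IDs are assigned sequentially with real words first and that all pairs are compared naively.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Byte-wise Levenshtein distance (two-row dynamic programming).
static std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j - 1] + cost, prev[j] + 1, curr[j - 1] + 1});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Hypothetical in-memory mirror of one JSON entry.
struct Entry {
    std::size_t id;
    std::string word;
    std::string type;                   // "REAL_WORD" or "ARCHAIC"
    std::vector<std::size_t> neighbor;  // IDs of nearby real words
    std::vector<std::size_t> archaic;   // IDs of nearby archaic words
};

// Assumption: IDs are sequential, real words first, then archaic words.
std::vector<Entry> build_entries(const std::vector<std::string>& real_words,
                                 const std::vector<std::string>& archaic_words,
                                 std::size_t max_distance) {
    std::vector<Entry> entries;
    auto add = [&entries](const std::string& word, const char* type) {
        Entry e;
        e.id = entries.size();
        e.word = word;
        e.type = type;
        entries.push_back(e);
    };
    for (const auto& w : real_words) add(w, "REAL_WORD");
    for (const auto& w : archaic_words) add(w, "ARCHAIC");

    // Naive O(n^2) comparison of every pair of distinct words.
    for (auto& e : entries) {
        for (const auto& other : entries) {
            if (e.id == other.id) continue;
            if (edit_distance(e.word, other.word) > max_distance) continue;
            (other.type == "REAL_WORD" ? e.neighbor : e.archaic).push_back(other.id);
        }
    }
    return entries;
}
```

With real words "cat", "bat", "dog", the archaic word "cot", and a maximum distance of 1, the entry for "cat" lists the ID of "bat" under neighbor and the ID of "cot" under archaic. The pairwise loop is quadratic in the number of words, so a large list such as american-english-insane would likely need a smarter candidate search.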
