High memory usage question #236
@ilkerhk @mdakin is right. Normalization may consume more memory because it loads a bigram language model and some large lookup tables into memory. Even though the language model uses succinct data structures, it still takes at least 100 MB of memory. Another culprit is the spelling graph: normalization uses a spelling checker that takes a lot of memory because it is not really optimized for memory. I tested the latest gRPC server with thousands of sentences, applying morphological analysis and normalization. Here is the result:

[Memory usage graph: server with normalization enabled]

I would say this amount of memory is more or less expected. If I do not use normalization at all (running the server without a data root), the graph becomes:

[Memory usage graph: server without normalization]

Without normalization, the system clearly uses much less memory, perhaps around 250 MB at most.
@ahmetaa The language model's memory usage is probably larger than 100 MB (the model file is 80 MB, and I assume it does not map 1:1 into memory). From your graphs, normalization alone adds about 1 GB to Zemberek's usage; it would be interesting to see a breakdown.
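One way to get such a breakdown is to measure the heap before and after loading each component in isolation. The following is a minimal sketch using only the standard `Runtime` API. The arrays are hypothetical stand-ins for the language model and the spelling graph; in a real measurement each `measure` call would wrap the code that loads the corresponding resource. Because `System.gc()` is only a hint, the figures are approximate, and a heap dump or a profiler gives a more reliable picture.

```java
import java.util.function.Supplier;

public class HeapBreakdown {

    /** Approximate number of bytes currently in use on the JVM heap. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Loads one component and prints roughly how much heap it retained.
     * System.gc() is only a hint, so the printed numbers are estimates.
     */
    static <T> T measure(String label, Supplier<T> loader) {
        System.gc();
        long before = usedHeapBytes();
        T component = loader.get();
        System.gc();
        long after = usedHeapBytes();
        System.out.printf("%-16s ~%d MB%n", label, (after - before) / (1024 * 1024));
        return component;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for real components; the sizes are arbitrary.
        long[] languageModel = measure("language model", () -> new long[4_000_000]); // ~32 MB
        int[] spellGraph = measure("spelling graph", () -> new int[2_000_000]);       // ~8 MB
        // Keep both objects reachable until after the measurements.
        System.out.println(languageModel.length + spellGraph.length);
    }
}
```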
I've been using your tool for more than a year now. Here is how I use it on my PC: I select text (typed with the English alphabet) and press a keyboard shortcut that calls the Zemberek normalizer, which deasciifies and corrects the text. However, I have a few suggestions:

1- If a word starts with an uppercase letter, correct it but do not change the case. Even better, provide an option for that.

I wrote a bash script and a Java wrapper that provide me the above 3 things. I can share them if you want. But those are just workarounds, as I don't know the internals of Zemberek. I believe the developers can integrate a better solution that provides the above features.
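The first suggestion can be approximated outside Zemberek by copying the capitalization of each input word onto its corrected form. This is a minimal sketch, assuming corrections are applied word by word; `preserveInitialCase` is a hypothetical helper, not a Zemberek API. It upper-cases with the Turkish locale so that `i` becomes `İ` rather than `I`, and it only handles the initial letter, not all-caps words.

```java
import java.util.Locale;

public class CasePreserver {
    private static final Locale TR = Locale.forLanguageTag("tr");

    /**
     * Hypothetical helper: if the original word starts with an upper-case letter,
     * upper-case the first letter of the corrected word using Turkish casing rules.
     * Otherwise the corrected word is returned unchanged.
     */
    static String preserveInitialCase(String original, String corrected) {
        if (original.isEmpty() || corrected.isEmpty()) {
            return corrected;
        }
        if (!Character.isUpperCase(original.codePointAt(0))) {
            return corrected;
        }
        // Work on the first code point so surrogate pairs are not split.
        int firstLen = Character.charCount(corrected.codePointAt(0));
        String head = corrected.substring(0, firstLen).toUpperCase(TR);
        return head + corrected.substring(firstLen);
    }

    public static void main(String[] args) {
        System.out.println(preserveInitialCase("Istanbul", "istanbul")); // İstanbul
        System.out.println(preserveInitialCase("guzel", "güzel"));       // güzel
    }
}
```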
Hello,
I mapped a global shortcut to a small script which gets the X-selection, corrects it, and pastes the corrected text back. The script uses your library which works great. Thanks for this nice API, I am very happy with it.
However, I realized that it uses about 1 GB of memory, and I keep the program running in the background all the time for a fast response. I am only calling the "normalizer" (I added details below; maybe I am doing something wrong).
1 GB seems high to me. Is that normal? Is there a way to reduce it? My Java is rusty, but maybe there is a way to load only the part that is needed (the normalizer).
Thanks.
Running this in the background:

```
java -classpath turkcelestir/zemberek-full.jar:./turkcelestir trCorrIlk
```
and here trCorrIlk is a small Java program which calls the Zemberek API as below:

```java
String strP = normalizer.normalize(str);
```
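Most of the cost is the one-time load of the normalization resources, so keeping a single loaded normalizer alive and feeding it text is what gives the fast response. The sketch below shows that shape with standard Java only. `Normalizer` is a hypothetical stand-in interface, not a Zemberek type; in the real program it would wrap the `normalizer.normalize(str)` call above. The one-sentence-per-line protocol over stdin/stdout is also an assumption. Separately, the standard `-Xmx` JVM option caps how large the heap may grow, but setting it below what the loaded resources need ends in an `OutOfMemoryError`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class NormalizeWorker {

    /** Hypothetical stand-in for the Zemberek normalizer; not a Zemberek type. */
    interface Normalizer {
        String normalize(String sentence);
    }

    /**
     * Reads one sentence per line and writes one normalized sentence per line.
     * The normalizer is created once by the caller and reused for every line,
     * so the expensive resource loading happens only at startup.
     */
    static void serve(Normalizer normalizer, Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        PrintWriter writer = new PrintWriter(out, true); // auto-flush on println
        String line;
        while ((line = reader.readLine()) != null) {
            writer.println(normalizer.normalize(line));
        }
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder normalizer for illustration only: returns the line trimmed.
        Normalizer placeholder = String::trim;
        serve(placeholder,
              new InputStreamReader(System.in, StandardCharsets.UTF_8),
              new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
    }
}
```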