combine_lang_model does not print correct usage help #1375

Closed
Shreeshrii opened this issue Mar 12, 2018 · 3 comments
Comments

@Shreeshrii
Collaborator

Shreeshrii commented Mar 12, 2018

Usage instructions are given in https://github.com/tesseract-ocr/tesseract/blob/master/training/combine_lang_model.cpp#L43-58

// Check validity of input flags.
if (FLAGS_input_unicharset.empty() || FLAGS_script_dir.empty() ||
    FLAGS_output_dir.empty() || FLAGS_lang.empty()) {
  tprintf("Usage: %s --input_unicharset filename --script_dir dirname\n",
          argv[0]);
  tprintf("  --output_dir rootdir --lang lang [--lang_is_rtl]\n");
  tprintf("  [--words file --puncs file --numbers file]\n");
  tprintf("Sets properties on the input unicharset file, and writes:\n");
  tprintf("rootdir/lang/lang.charset_size=ddd.txt\n");
  tprintf("rootdir/lang/lang.traineddata\n");
  tprintf("rootdir/lang/lang.unicharset\n");
  tprintf("If the 3 word lists are provided, the dawgs are also added to");
  tprintf(" the traineddata file.\n");
  tprintf("The output unicharset and charset_size files are just for human");
  tprintf(" readability.\n");

However, the output actually displayed is:

USAGE: combine_lang_model
  --lang_is_rtl  True if lang being processed is written right-to-left  (type:bool default:false)
  --pass_through_recoder  If true, the recoder is a simple pass-through of the unicharset. Otherwise, potentially a compression of it  (type:bool default:false)
  --input_unicharset  Unicharset to complete and use in encoding  (type:string default:)
  --script_dir  Directory name for input script unicharsets  (type:string default:)
  --words  File listing words to use for the system dictionary  (type:string default:)
  --puncs  File listing punctuation patterns  (type:string default:)
  --numbers  File listing number patterns  (type:string default:)
  --output_dir  Root directory for output files  (type:string default:)
  --version_str  Version string to add to traineddata file  (type:string default:)
  --lang  Name of language being processed  (type:string default:)

So it looks like the program calls a common training argument parser, which prints its generic flag list and exits before the custom usage check is ever reached.

https://github.com/tesseract-ocr/tesseract/blob/master/training/combine_lang_model.cpp#L40

int main(int argc, char** argv) {
  tesseract::ParseCommandLineFlags(argv[0], &argc, &argv, true);

Related: #1297

Shreeshrii changed the title from "combine_lang_model does not print the custom usage info" to "combine_lang_model does not print correct usage help" on Mar 12, 2018
@zdenop
Contributor

zdenop commented Oct 1, 2018

@Shreeshrii: if you read it carefully, you will see that it prints "almost" the same information, just in a different order. The only additional information (not relevant to running the command) is:

Sets properties on the input unicharset file, and writes:
rootdir/lang/lang.charset_size=ddd.txt
rootdir/lang/lang.traineddata
rootdir/lang/lang.unicharset
If the 3 word lists are provided, the dawgs are also added to the traineddata file.
The output unicharset and charset_size files are just for human readability.

@zdenop
Contributor

zdenop commented Oct 1, 2018

I removed the duplicate help. Please check whether everything works as expected.

@Shreeshrii
Collaborator Author

Thanks!
