Fixed build break

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@775 d0cd1f9f-072b-0410-8dd7-cf729c803f20
protez · Oct 9, 2012 · 225b325 · 225b325
commit 225b325
Show file tree

Hide file tree

Showing 954 changed files with 2,851,959 additions and 0 deletions.
diff --git a/AUTHORS b/AUTHORS
@@ -0,0 +1,8 @@
+Ray Smith (lead developer) <[email protected]>
+Phil Cheatle
+Simon Crouch
+Dan Johnson
+Mark Seaman
+Sheelagh Huddleston
+Chris Newton
+... and several others.
diff --git a/COPYING b/COPYING
@@ -0,0 +1,22 @@
+This package contains the Tesseract Open Source OCR Engine.
+Orignally developed at Hewlett Packard Laboratories Bristol and
+at Hewlett Packard Co, Greeley Colorado, all the code
+in this distribution is now licensed under the Apache License:
+
+** Licensed under the Apache License, Version 2.0 (the "License");
+** you may not use this file except in compliance with the License.
+** You may obtain a copy of the License at
+** http://www.apache.org/licenses/LICENSE-2.0
+** Unless required by applicable law or agreed to in writing, software
+** distributed under the License is distributed on an "AS IS" BASIS,
+** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+** See the License for the specific language governing permissions and
+** limitations under the License.
+
+
+Other Dependencies and Licenses:
+================================
+
+Tesseract use Leptonica library (http://leptonica.com/) with a very weakly
+restricted copyright license (http://leptonica.com/about-the-license.html)
+
diff --git a/ChangeLog b/ChangeLog
@@ -0,0 +1,174 @@
+2012-02-01 - v3.02
+  * Moved ResultIterator/PageIterator to ccmain.
+  * Added Right-to-left/Bidi capability in the output iterators for Hebrew/Arabic.
+  * Added paragraph detection in layout analysis/post OCR.
+  * Fixed inconsistent xheight during training and over-chopping.
+  * Added simultaneous multi-language capability.
+  * Refactored top-level word recognition module.
+  * Added experimental equation detector.
+  * Improved handling of resolution from input images.
+  * Blamer module added for error analysis.
+  * Cleaned up externally used namespace by removing includes from baseapi.h.
+  * Removed dead memory mangagement code.
+  * Tidied up constraints on control parameters.
+  * Added support for ShapeTable in classifier and training.
+  * Refactored class pruner.
+  * Fixed training leaks and randomness.
+  * Major improvements to layout analysis for better image detection, diacritic detection, better textline finding, better tabstop finding.
+  * Improved line detection and removal.
+  * Added fixed pitch chopper for CJK.
+  * Added UNICHARSET to WERD_CHOICE to make mult-language handling easier.
+  * Fixed problems with internally scaled images.
+  * Added page and bbox to string in tr files to identify source of training data better.
+  * Fixes to Hindi Shiroreka splitter.
+  * Added word bigram correction.
+  * Reduced stack memory consumption and eliminated some ugly typedefs.
+  * Added new uniform classifier API.
+  * Added new training error counter.
+  * Fixed endian bug in dawg reader.
+  * Many other fixes, including the way in which the chopper finds chops and messes with the outline while it does so.
+
+2010-11-29 - V3.01
+  * Removed old/dead serialise/deserialze methods on *LISTIZED classes.
+  * Total rewrite of DENORM to better encapsulate operation and make
+    for potential to extract features from images.
+  * Thread-safety! Moved all critical globals and statics to members of the appropriate class. Tesseract is now thread-safe (multiple instances can be used in parallel in multiple threads.) with the minor exception that some control parameters are still global and affect all threads.
+  * Added Cube, a new recognizer for Arabic. Cube can also be used in combination with normal Tesseract for other languages with an improvement in accuracy at the cost of (much) lower speed. *There is no training module for Cube yet.*
+  * `OcrEngineMode` in `Init` replaces `AccuracyVSpeed` to control cube.
+  * Greatly improved segmentation search with consequent accuracy and speed improvements, especially for Chinese.
+  * Added `PageIterator` and `ResultIterator` as cleaner ways to get the full results out of Tesseract, that are not currently provided by any of the `TessBaseAPI::Get*` methods. All other methods, such as the `ETEXT_STRUCT` in particular are deprecated and will be deleted in the future.
+  * ApplyBoxes totally rewritten to make training easier. It can now cope with touching/overlapping training characters, and a new boxfile format allows word boxes instead of character boxes, BUT to use that you have to have already boostrapped the language with character boxes. "Cyclic dependency" on traineddata.
+  * Auto orientation and script detection added to page layout analysis.
+  * Deleted *lots* of dead code.
+  * Fixxht module replaced with scalable data-driven module.
+  * Output font characteristics accuracy improved.
+  * Removed the double conversion at each classification.
+  * Upgraded oldest structs to be classes and deprecated PBLOB.
+  * Removed non-deterministic baseline fit.
+  * Added fixed length dawgs for Chinese.
+  * Handling of vertical text improved.
+  * Handling of leader dots improved.
+  * Table detection greatly improved.
+  * Fixed a couple of memory leaks.
+  * Fixed font labels on output text. (Not perfect, but a lot better than before.)
+  * Cleanup and more bug fixes
+  * Special treatments for Hindi.
+  * Support for build in VS2010 with Microsoft Windows SDK for Windows 7 (thanks to Michael Lutz)
+
+2010-09-21 - V3.00
+  * Preparations for thread safety:
+     * Changed TessBaseAPI methods to be non-static
+     * Created a class hierarchy for the directories to hold instance data,
+       and began moving code into the classes.
+     * Moved thresholding code to a separate class.
+  * Added major new page layout analysis module.
+  * Added HOCR output (issues 221, 263: thanks to amkryukov).
+  * Added Leptonica as main image I/O and handling. Currently optional,
+    but in future releases linking with Leptonica will be mandatory.
+  * Ambiguity table rewritten to allow definite replacements in place
+    of fix_quotes.
+  * Added TessdataManager to combine data files into a single file.
+  * Some dead code deleted.
+  * VC++6 no longer supported. It can't cope with the use of templates.
+  * Many more languages added. 
+  * Doxygenation of most of the function header comments.
+  * Added man pages.
+  * Added bash completion script (issue 247: thanks to neskiem)
+  * Fix integer overview in thresholding (issue 366: thanks to Cyanide.Drake)
+  * Add Danish Fraktur support (issues 300, 360: thanks to 
+    [email protected])
+  * Fix file pointer leak (issue 359, thanks to yukihiro.nakadaira)
+  * Fix an error using user-words (Issue 345: thanks to max.markin)
+  * Fix a memory leak in tablefind.cpp (Issue 342, thanks to zdravco)
+  * Fix a segfault due to double fclose (Issue 320, thanks to souther)
+  * Fix an automake error (Issue 318, thanks to ichanjz)
+  * Fix a Win32 crash on fileFormatIsTiff() (Issues 304, 316, 317, 330, 347,
+    349, 352: thanks to nguyenq87, max.markin, zdenop)
+  * Fixed a number of errors in newer (stricter) versions of VC++ (Issues 
+    301, among others)
+
+2010-07-19  gettextize  <[email protected]>
+
+	* m4/gettext.m4: New file, from gettext-0.17.
+	* m4/iconv.m4: New file, from gettext-0.17.
+	* m4/lib-ld.m4: New file, from gettext-0.17.
+	* m4/lib-link.m4: New file, from gettext-0.17.
+	* m4/lib-prefix.m4: New file, from gettext-0.17.
+	* m4/nls.m4: New file, from gettext-0.17.
+	* m4/po.m4: New file, from gettext-0.17.
+	* m4/progtest.m4: New file, from gettext-0.17.
+	* Makefile.am (SUBDIRS): Add po.
+	(EXTRA_DIST): Add config/config.rpath.
+	* configure.ac (AC_CONFIG_FILES): Add po/Makefile.in.
+
+June  2006 - V1.0 of open source Tesseract checked-in.
+Sep 7 2006 - V1.01.
+          Added mfcpch.cpp and getopt.cpp for VC++.
+          Fixed problem with greyscale images and no libtiff.
+          Stopped debug window from being used for the usage output.
+          Fixed load of inttemp for big-endian architectures.
+          Fixed some Mac compilation issues.
+Oct 4 2006 - V1.02
+          Removed dependency on Aspirin.
+          Fixed a few missing Apache license headers.
+          Removed $log.
+Feb 2 2007 - V1.03
+          Added mftraining and cntraining.
+          Added baseapi with adaptive thresholding for grey and color.
+          Fixed many memory leaks.
+          Fixed several bugs including lack of use of adaptive classifier.
+          Added ifdefs to eliminate graphics code and add embedded platform support.
+          Incorporated several patches, including 64-bit builds, Mac builds.
+          Minor accuracy improvements.
+May 15 2007 - V1.04
+          Added dll exports for Windows.
+          Fixed name collisions with stl etc.
+          Made some preliminary changes ready for unicodeization.
+          Several bug fixes discovered during unicodeization.
+July 02 2007 - V2.00
+          Converted internal character handling to UTF8.
+          Trained with 6 languages.
+          Added unicharset_extractor, wordlist2dawg.
+          Added boxfile creation mode.
+          Added UNLV regression test capability.
+          Fixed problems with copyright and registered symbols.
+          Fixed extern "C" declarations problem.
+August 27 2007 - V2.01
+          Fixed UTF8 input problems with box file reader.
+          Fixed various infinite loops and crashes in dawg code.
+          Removed include of config_auto.h from host.h.
+          Added automatic wctype encoding to unicharset_extractor.
+          Fixed dawg table too full error.
+          Removed svn files from tarball.
+          Added new functions to tessdll.
+          Increased maximum utf8 string in a classification result to 8.
+
+January 23 2008 - V2.02
+          Improvements to clustering, training and classifier.
+          Major internationalization improvements for large-character-set
+          languages, eg Kannada.
+          Removed some compiler warnings.
+          Added multipage tiff support for training and running.
+          Updated graphics output to talk to new java-based viewer.
+          Added ability to save n-best lists.
+          Added leptonica support for more file types.
+          Improved Init/End to make them safe.
+          Reduced memory use of dictionaries.
+          Added some new APIs to TessBaseAPI.
+April 21 2008 - V2.02 (again)
+          Fixed namespace collisions with jpeg library (INT32).
+          Portability fixes for Windows for new code.
+          Updates to autoconf system for new code.
+April 22 2008 - V2.03
+          Fixed crash introduced in 2.02.
+	  Fixed lack of tessembedded.cpp in distribution.
+	  Added test for leptonica header files and conditional test for lib.
+June 30 2009 - V2.04
+	  Integrated bug fixes and patches and misc changes for portability.
+	  Integrated a patch to remove some of the "access" macros.
+	  Removed dependence on lua from the viewer, speeding it up
+	  dramatically.
+	  Fixed the viewer so it compiles and runs properly!
+	  Specifically fixing issues: 1, 63, 67, 71, 76, 81, 82, 106, 111,
+	  112, 128, 129, 130, 133, 135, 142, 143, 145, 147, 153, 154, 160,
+	  165, 170, 175, 177, 187, 192, 195, 199, 201, 205, 209, 108, 169