uzn format

uzn is a simple plain-text file format for describing zones (rectangular sections) of a scanned image. The migneuzn tool writes its segmentation results in this format.

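Each line of the file describes one zone, as whitespace-separated fields:

left top width height freetext

Here left and top give the position of the zone's top-left corner in pixels, width and height give its size, and freetext is a label describing the zone's content.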

So an example .uzn file is this:

  395   368  1633    78 Text/Latin
 2030   368  1634    78 Text/Greek
  388   478  1633  2275 Text/Greek
 2031   478  1634  2275 Text/Latin
  396  2852  1633  1002 Text/Greek
 2018  2852  1634  1002 Text/Latin
  471  3960  1565    75 Text/Latin
 1639  4141   685    62 AppCrit
  394  4293  3249  1482 AppCrit
 4078   462     5   606 AppCrit
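
As an illustration of how such a file might be read, here is a minimal Python sketch. The names `Zone` and `parse_uzn` are hypothetical and not part of migneuzn or Tesseract. It assumes fields are separated by runs of whitespace and that everything after the fourth number is the label.

```python
from typing import List, NamedTuple


class Zone(NamedTuple):
    """One line of a uzn file (hypothetical helper type)."""
    left: int
    top: int
    width: int
    height: int
    label: str


def parse_uzn(text: str) -> List[Zone]:
    """Parse the contents of a uzn file into a list of zones."""
    zones = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        # Split off the four numeric fields; the rest is the free-text label.
        parts = line.split(None, 4)
        if len(parts) < 4:
            raise ValueError(f"line {number}: expected at least 4 fields")
        left, top, width, height = (int(p) for p in parts[:4])
        label = parts[4].strip() if len(parts) > 4 else ""
        zones.append(Zone(left, top, width, height, label))
    return zones


sample = """\
  395   368  1633    78 Text/Latin
 1639  4141   685    62 AppCrit
"""
for zone in parse_uzn(sample):
    print(zone.label, zone.left, zone.top, zone.width, zone.height)
```

Treating a missing label as an empty string is a choice made for this sketch; a stricter reader could reject such lines instead.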

Tesseract can read in uzn files, and use them instead of doing its own segmentation, on two conditions:

- The segmentation mode PSM_SINGLE_COLUMN must be used (which is the default for the migneocr tool).
- The uzn file must be named <imagebase>.uzn, where <imagebase> is the path of the image, without the file extension. So for scan001.png the uzn file must be named scan001.uzn.
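
The naming rule can be sketched in Python as follows. The helper names `uzn_path_for`, `format_zone` and `write_uzn` are hypothetical. The column widths in `format_zone` are an assumption chosen to mimic the right-aligned layout of the example above, not a stated requirement of the format.

```python
from pathlib import Path
from typing import Iterable, Tuple

# (left, top, width, height, label)
ZoneTuple = Tuple[int, int, int, int, str]


def uzn_path_for(image_path: str) -> Path:
    """Return <imagebase>.uzn for an image path, e.g. scan001.png -> scan001.uzn."""
    return Path(image_path).with_suffix(".uzn")


def format_zone(left: int, top: int, width: int, height: int, label: str) -> str:
    """Format one zone line, right-aligning numbers like the example file."""
    return f"{left:5d} {top:5d} {width:5d} {height:5d} {label}"


def write_uzn(image_path: str, zones: Iterable[ZoneTuple]) -> Path:
    """Write the zones next to the image, using the required file name."""
    target = uzn_path_for(image_path)
    lines = [format_zone(*zone) for zone in zones]
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target


print(uzn_path_for("scan001.png"))
print(format_zone(395, 368, 1633, 78, "Text/Latin"))
```

When calling the stock tesseract command line directly rather than through migneocr, PSM_SINGLE_COLUMN corresponds to page segmentation mode 4 (`--psm 4`).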

The format is sometimes called a "zone file," and was created for the UNLV OCR tests in the 1990s. The name probably comes from "UNLV Zone."
