A collection of OCR'd and machine-corrected Greek texts. This base repository contains Git submodules for the individual works and an inventory with the full name and metadata of each submodule.

OpenGreekAndLatin/greek-dev

greek-dev

A collection of OCR'd and machine-corrected Greek texts. This base repository contains Git submodules for the individual works and an inventory with the full name and metadata of each submodule.

Current state: the texts still need to be corrected manually. The TOC is ordered by priority.

===============

TOC:
  1. Libanius (1903-63). Opera. Recensuit Richardus Foerster 1, pt. 1

Currently without inventory:

  1. Libanius (1903-63). Opera. Recensuit Richardus Foerster 2

================

Legend for NLP attributes:

1.00 WORD: spell-checked word (e.g. ψυχή)

0.99 CORRWORD: word provided by alignment with another edition

0.98 UCWORD: word that passes the spell-check only when converted to upper case (i.e. accents may be wrong, e.g. αληθεία --> ΑΛΗΘΕΙΑ)

0.97 SYLLABICSEQ: well-formed syllabic sequence, not found by the previous spellcheckers (e.g. καταλάβοκος)

0.96 CHARSEQ: sequence of Greek characters, which is not a well-formed syllabic sequence (e.g. γδτσαλλ)

0.95 BADONE: single bad character (e.g. ^)

0.94 BADMANY: random sequence of Greek and Latin characters (e.g. αbγm)

0.90, 0.89, 0.88, 0.87, 0.86 etc.: suggestions from the spellchecker, in the order provided by the spellchecker

0.70 same meaning as 0.99 (CORRWORD), after automatic correction

0.10 Latin character sequence, usually a correct Latin (or English) word, but not spell-checked
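
As an illustration of the legend above, here is a minimal Python sketch that maps a score to its label. It is not part of the repository's tooling. The function name `classify` and the labels `AUTOCORR`, `LATIN`, `SUGGESTION#n` and `UNKNOWN` are hypothetical, since the legend gives no names for those cases. The sketch also assumes that every score below 0.90 and above 0.70 is a spellchecker suggestion, which the legend only implies with "etc.". How the scores are stored in the text files is not described here, so the sketch takes a plain number.

```python
# Labels for the fixed scores, copied from the legend.
# AUTOCORR and LATIN are hypothetical names: the legend gives no label for 0.70 and 0.10.
LEGEND = {
    100: "WORD",
    99: "CORRWORD",
    98: "UCWORD",
    97: "SYLLABICSEQ",
    96: "CHARSEQ",
    95: "BADONE",
    94: "BADMANY",
    70: "AUTOCORR",
    10: "LATIN",
}


def classify(score: float) -> str:
    """Return the legend label for an NLP attribute score."""
    # Compare in whole hundredths to avoid floating-point equality problems.
    hundredths = round(score * 100)
    if hundredths in LEGEND:
        return LEGEND[hundredths]
    # Assumption: scores from 0.90 down to just above 0.70 are spellchecker
    # suggestions, ranked in the order the spellchecker returned them.
    if 70 < hundredths <= 90:
        rank = 90 - hundredths + 1
        return f"SUGGESTION#{rank}"
    return "UNKNOWN"


if __name__ == "__main__":
    for value in (1.00, 0.98, 0.90, 0.88, 0.70, 0.10):
        print(f"{value:.2f} {classify(value)}")
```

A reviewer could use such a mapping to sort tokens by how much manual attention they need, for example by checking `BADONE`, `BADMANY` and `CHARSEQ` tokens first.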
