
WeSearch_LexicalFiltering

JonathonRead edited this page Nov 8, 2011 · 27 revisions

Background

Working with a lattice of lexical hypotheses and an (über)tagger, we seek to develop a filtering function that discards unlikely hypotheses. The formalisation of the lexical filtering process may be found here.

TNT output for filtering of LE types

One such filter function maps the PTB tags output by the TNT tagger onto LE types. Mappings may be derived intuitively by inspecting a confusion matrix that details the choices of TNT with respect to LE types.
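The confusion-matrix approach can be sketched roughly as follows. This is only an illustration, not the project's actual code: the function names, the `min_share` cut-off and the toy (PTB tag, LE type) pairs are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def confusion_matrix(pairs):
    """Count how often each PTB tag co-occurs with each gold LE type."""
    matrix = defaultdict(Counter)
    for ptb_tag, le_type in pairs:
        matrix[ptb_tag][le_type] += 1
    return matrix


def derive_mapping(matrix, min_share=0.3):
    """Map each PTB tag to the LE types that account for at least
    `min_share` of its occurrences (hypothetical cut-off)."""
    mapping = {}
    for ptb_tag, counts in matrix.items():
        total = sum(counts.values())
        mapping[ptb_tag] = {
            le_type for le_type, n in counts.items() if n / total >= min_share
        }
    return mapping


# Toy data, invented for illustration only.
pairs = [
    ("NN", "n"), ("NN", "n"), ("NN", "n"), ("NN", "aj"),
    ("VB", "v"), ("VB", "v"),
    ("IN", "p"), ("IN", "c"),
]
mapping = derive_mapping(confusion_matrix(pairs))
print(mapping)
```

In practice the cut-off stands in for the "intuitive inspection" step: a lower value keeps more LE types per tag (safer, less filtering), a higher value keeps fewer.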

An alternative approach is to find mappings based on the preferred outcomes of lexical filtering (i.e. gains in parser efficiency versus losses in parser accuracy and coverage). These outcomes may be approximated by examining the relations between TNT precision, TNT recall and the ambiguity of LE types.
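One way to explore this trade-off is to sweep a threshold over the tagger's probability and record, for each setting, how often the tagger is right and how many lexical hypotheses would be discarded. The sketch below assumes that reading of the task; the `Token` record, its field names and the toy values are hypothetical and not taken from the project.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    predicted: str         # LE type implied by the tagger's output (assumed)
    prob: float            # tagger probability for that output (assumed)
    gold: str              # LE type of the correct lexical entry
    candidates: List[str]  # LE types of all hypotheses in the lattice cell


def evaluate_threshold(
    tokens: List[Token], threshold: float
) -> Tuple[Optional[float], int]:
    """Return (precision, number of hypotheses filtered) at a threshold.

    The filter is applied only to tokens whose tagger probability reaches
    the threshold; for those, every candidate of another type is dropped.
    """
    applied = [t for t in tokens if t.prob >= threshold]
    if not applied:
        return None, 0
    correct = sum(1 for t in applied if t.predicted == t.gold)
    filtered = sum(
        1 for t in applied for c in t.candidates if c != t.predicted
    )
    return correct / len(applied), filtered


# Toy data, invented for illustration only.
tokens = [
    Token("n", 0.9, "n", ["n", "v"]),
    Token("v", 0.5, "n", ["n", "v", "aj"]),
    Token("n", 1.0, "n", ["n", "aj"]),
]
for threshold in (0.4, 0.8):
    print(threshold, evaluate_threshold(tokens, threshold))
```

On the toy data, raising the threshold increases precision while reducing the number of hypotheses filtered, which is the same kind of trade-off a precision-versus-filtering plot visualises.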

Frequency of LE types in JH0:

type  frequency
n     1,134,661
p       498,443
v       454,513
d       335,667
aj      332,243
c       182,759
av      145,290
cm       33,618
pp       34,496
pt        3,864

Plots of TNT performance on the most frequent LE types:

http://dl.dropbox.com/u/680530/WeSearch/Lexical%20Filtering/roc.png

A plot of the precision vs. lexical items filtered for each handled LE type:

http://dl.dropbox.com/u/680530/WeSearch/Lexical%20Filtering/filtering.png

Effective threshold ranges for LE types:

type  min   min-precision  min-filtering  max   max-precision  max-filtering
n     0.35  0.93639               493184  1.00  0.94253               210876
v     0.47  0.94183               217656  1.00  0.96008               130873
p     0.35  0.89550              1196773  1.00  0.94866               708213
d     0.59  0.92464               408522  1.00  0.93566               318760
aj    0.36  0.73078               629913  1.00  0.90287               276468
av    0.30  0.63747               430465  1.00  0.74478               115081
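As a rough illustration of how such thresholds might be applied, the sketch below uses the `min` values from the table as a per-type configuration. The function name, the (entry name, LE type) representation of hypotheses, the example entry names and the fall-back behaviour are assumptions made for this example rather than a description of the actual filter.

```python
# Minimum effective thresholds per LE type, copied from the table above.
MIN_THRESHOLDS = {
    "n": 0.35, "v": 0.47, "p": 0.35, "d": 0.59, "aj": 0.36, "av": 0.30,
}


def filter_hypotheses(hypotheses, predicted_type, prob,
                      thresholds=MIN_THRESHOLDS):
    """Keep only hypotheses of the predicted LE type when the tagger is
    confident enough; otherwise leave the hypotheses untouched.

    `hypotheses` is a list of (entry_name, le_type) pairs (hypothetical
    representation).
    """
    threshold = thresholds.get(predicted_type)
    # Unhandled type or low confidence: do not filter.
    if threshold is None or prob < threshold:
        return list(hypotheses)
    kept = [h for h in hypotheses if h[1] == predicted_type]
    # Assumed safeguard: never discard every hypothesis for a token.
    return kept or list(hypotheses)


# Hypothetical entry names, for illustration only.
hyps = [("bank_n1", "n"), ("bank_v1", "v")]
print(filter_hypotheses(hyps, "n", 0.9))
print(filter_hypotheses(hyps, "n", 0.2))
```

Substituting the `max` values (1.00 throughout) would give the most conservative setting in the table, filtering fewer hypotheses at higher precision.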

Related Work
