Skip to content

TgcsTop

GlennSlayden edited this page Sep 5, 2010 · 24 revisions

TGCS: TDL Grammars in C-Sharp

A project to develop a system for processing DELPH-IN style TDL grammars within the .NET and Mono runtime environments. So far, the system is able to do the following:

  • Robustly tokenize TDL and read in grammars, providing precise indication of syntax errors;
  • Build the type system and efficiently calculate the greatest lower bound closure;
  • Unify types to compute maximal inference for the authored type hierarchy;
  • Load feature structures into an internal representation that is hoped to support efficient DAG unification;
  • Display feature structures in a embryonic style inspired by LUI.

One research goal is to investigate the efficiency--under the intensive demands that are characteristic of unification grammars--of a "feature-centric TFS representation".

While research attention since Pereira (1985) has focused on reducing the "number" of DAG copy operations, this work seeks to reduce the cost of the copy operation itself. Capitalizing on the observations that, in DELPH-IN grammars:

  • the type hierarchy is fixed, and
  • a feature can only be introduced at one place in the type hierarchy,

this approach stores every edge in a hash list associated with its originating feature, so a particular TFS instance actually has no centralized node manifestation. There is exactly one list for each such feature, and this list freely mixes edges from any type for which it is appropriate. Since these lists only expand their capacity periodically, and typically do not contract when edges are abandoned for later reuse, it is hoped that operating system memory penalties will be greatly reduced.

Initial work with this model--belabored by extreme care in the considered application of C# value types--has shown great promise for adequate performance of linguistic applications.

[http://www.computational-semantics.com/webshare/20100905-demo.png]

Abstract

A new TDL parser and unification engine for the CLI

The Common Language Infrastructure (CLI, ECMA-335) is a modern standard for architecting extensible, platform-independent software. Well-known implementations include Mono and Microsoft's .NET Framework. These relatively new environments enjoy the robust support of actively developed software, and incorporate decades of research and best practice experience in systems architecture and developer productivity. In this status report, I present my work on a new unification engine for DELPH-IN style TDL grammars, written in C#. At this early stage, the system is able to load the ERG and compute its greatest-lower-bound (GLB) closure. In this step, a novel method improves on the Ait-Kaci technique and reveals a suboptimal closure result produced by the LKB. Informed by Pereira (1985), Wroblewski (1987), Tomabechi (1991), and recent work by van Lohuizen, I also present an approach to the internal representation of DAGs which is expected to minimize GC activity during parsing. Finally, single-core performance having reached physical limits, the focus in high-performance computing now concerns multi-core processors. Accordingly, another goal of this project is to examine opportunities for concurrent programming in the processing of analytical grammars.

References

Ulrich Callmeier. 2001. Efficient Parsing with Large-Scale Unification Grammars. MA Thesis, Universität des Saarlandes - Fachrichtung Informatik.

Ulrich Callmeier. 2000. PET: a platform for experimentation with efficient HPSG processing techniques. Natural Language Engineering 6(1): 99-107.

Bernd Kiefer, Hans-Ulrich Krieger, John Carroll, and Rob Malouf. 1999. A Bag of Useful Techniques for Efficient and robust Parsing. In Proceedings of the 37th annual meeting of the Association for Computational Linguistics. 473-480

Robert Malouf, John Carroll, and Ann Copestake. 2000. Robert Malouf, John Carroll, and Ann Copestake. 2000. Effcient feature structure operations witout compilation. Natural Language Engineering, 1(1):1-18.

Fernando C. N. Pereira. 1985. A structure-sharing representation for unification-based grammar formalisms. In Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics. Chicago, IL, 8-12 July 1985, pages 137-144.

Hideto Tomabechi. 1991. Quasi-destructive graph unification. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, Berkeley, CA.

Hideto Tomabechi. 1992. Quasi-destructive graph unifications with structure-sharing. In Proceedings of the 15th International Conference on Computational Linguistics (COLING-92), Nantes, France.

Hideto Tomabechi. 1995. Design of efficient unification for natural language. Journal of Natural Language Processing, 2(2):23-58.

Marcel P. van Lohuizen. 1999. Parallel processing of natural language parsers. In PARCO '99.

Marcel P. van Lohuizen. 2000. Exploiting parallelism in unification-based parsing. In Proceedings of the Sixth International Workshop on Parsing Technologies (IWPT 2000), Trento, Italy.

Marcel P. van Lohuizen. 2000. Memory-efficient and Thread-safe Quasi-Destructive Graph Unification. In Proceedings of the 38th Meeting of the Association for Computational Linguistics, Hong Kong, China, 2000.

Marcel P. van Lohuizen. 2001. A generic approach to parallel chart parsing with an application to LinGO. In Proceedings of the 39th Meeting of the Association for Computational Linguistics, Toulouse, France.

David A. Wroblewski. 1987. Nondestructive graph unification. In Proceedings of the 6th National Conference on Artificial Intelligence (AAAI-87), 582-589. Morgan Kaufmann.

Clone this wiki locally