@MISC{Sima'an96anoptimized, author = {Khalil Sima'an}, title = {An optimized algorithm for Data Oriented Parsing}, year = {1996} }
Abstract
This paper presents an optimization of a syntactic disambiguation algorithm for Data Oriented Parsing (DOP) (Bod 93) in particular, and for Stochastic Tree-Substitution Grammars (STSGs) in general. The main advantage of this algorithm over existing alternatives ((Bod 93), (Schabes & Waters 93), (Sima'an et al. 94)) is that its time complexity is linear, rather than quadratic, in grammar size (and cubic in sentence length). It is particularly suitable for natural language STSGs, which have many deep elementary trees and a small underlying Context-Free Grammar (CFG). A first implementation of this algorithm is operational and exhibits a substantial speed-up in comparison to the unoptimized version. In addition to presenting the optimized algorithm, the paper reports experiments measuring the disambiguation accuracy, the expected sizes and the execution times of various DOP models, which are projected from the ATIS domain.
Keywords: Corpus-based statistical NLP, syntactic disambiguation...