Reranking an N-Gram Supertagger
In Proceedings of the TAG+ Workshop, 2002
Cited by 4 (0 self)
this paper, we investigate an approach to such a choice based on reranking a set of candidate supertags and their confidence scores. RankBoost (Freund et al., 1998) is the boosting algorithm that we use to learn to rerank outputs. It has also been used with good effect in reranking the outputs of a statistical parser (Collins, 2000) and in ranking sentence plans (Walker, Rambow and Rogati, 2001). RankBoost may learn to correct biases that are inherent in n-gram modeling and that lead to systematic errors in supertagging (cf. van Halteren, 1996). RankBoost can also use a variety of local and long-distance features more easily than n-gram-based approaches (cf. Chen, Bangalore and Vijay-Shanker, 1999) because it makes sparse data less of an issue.
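The reranking step described in this abstract can be illustrated with a minimal sketch. It assumes a linear ranking function that adds learned feature weights to each candidate's n-gram confidence score; the supertag names, the feature templates, and the weights are hypothetical, and the sketch shows only how a learned ranker might be applied, not how RankBoost trains it or which features the paper uses.

```python
from typing import Dict, List, Tuple

# A candidate is a (supertag, n-gram confidence score) pair.
Candidate = Tuple[str, float]


def rerank(
    candidates: List[Candidate],
    context: Dict[str, str],
    weights: Dict[str, float],
) -> List[Candidate]:
    """Order candidates by confidence plus the weights of their active features.

    Feature names are hypothetical: one feature for the tag itself and one
    conjunction of the tag with each context attribute.
    """

    def score(candidate: Candidate) -> float:
        tag, confidence = candidate
        active = [f"tag={tag}"]
        active += [f"tag={tag}&{key}={value}" for key, value in context.items()]
        # Features without a learned weight contribute nothing.
        return confidence + sum(weights.get(name, 0.0) for name in active)

    return sorted(candidates, key=score, reverse=True)


# Hypothetical data: the n-gram model prefers the first supertag, but a
# learned weight on a context feature promotes the second one.
candidates = [("alpha_nx0Vnx1", 0.6), ("beta_vxPnx", 0.5)]
context = {"prev_word": "of"}
weights = {"tag=beta_vxPnx&prev_word=of": 0.3}
reranked = rerank(candidates, context, weights)
```

In this toy case the second candidate moves to the top because its weighted context feature outweighs the difference in n-gram confidence.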
Combining Labeled and Unlabeled Data in Statistical Natural Language Parsing
2002
Cited by 2 (1 self)
Prof. Aravind Joshi, my dissertation advisor, has been my guide and mentor for the entire time that I spent at Penn. I thank him for all his academic help and personal kindness. The external member on my dissertation committee was Steven Abney, whose suggestions and advice have made the ideas presented here stronger. My dissertation committee members from Penn, Mitch Marcus, Mark Liberman and Martha Palmer, provided questions whose answers shaped my dissertation proposal into the finished form in front of you. Many thanks to my academic collaborators: the work on prefix probabilities was done with Mark-Jan Nederhof and Giorgio Satta when they visited IRCS in 1998, and the work on subcategorization frame learning was done in collaboration with Daniel Zeman when he visited IRCS in 2000. Thanks to B. Srinivas, whose previous work provided the path to the experimental work in this dissertation. Thanks also to Paola Merlo and Suzanne Stevenson for discussions on their work on verb alternation classes. I also acknowledge the help of Woottiporn Tripasai in the extension of their work presented in this dissertation. Thanks to
Alternative Phrases - Theoretical Analysis and Practical Application
2001
Cited by 2 (1 self)
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, the fresh-water system and public health, what have the Romans ever done for us? (Monty Python, The Life of Brian) Alternative phrases identify selected elements from a set and subject them to particular scrutiny with respect to the sentence's predicate. For instance, in the above example, sanitation, medicine, etc. are all identified as elements in the set of things "the Romans have done for us" that should not be included in the response to the question. They are alternative responses to the desired ones. Alternative phrases come in a variety of constructions and perform a variety of tasks: excluding elements (apart from), expressing preference for particular elements (especially), and simply identifying representative examples (such as).
Comparing and Integrating Tree Adjoining Grammars
In Proc. of the 5th International Workshop on Tree Adjoining Grammars and Related Formalisms, 2000
Cited by 1 (0 self)
Grammars are core elements of many NLP applications. Grammars can be developed in two ways: built by hand or extracted from corpora. In this paper, we compare a handcrafted grammar with a Treebank grammar. We contend that recognizing substructures of the grammars' basic units is necessary not only because it allows grammars to be compared at a higher level, but also because it provides the building blocks for consistent and efficient integration of the grammars.
Parsing Comparison across Grammar Formalisms using Strongly Equivalent Grammars
2003
Cited by 1 (0 self)
This article presents a novel approach to empirical comparison between parsers for different grammar formalisms such as LTAG and HPSG. The key idea of our approach is to use strongly equivalent grammars obtained by grammar conversion, which generate equivalent parse results for the same input. We validate our approach by giving a formal proof of strong equivalence for an existing grammar conversion from LTAG to HPSG-style grammar. Experimental results using two pairs of LTAG and HPSG parsers with dynamic programming and CFG filtering reveal that the different ways of using the parsing techniques cause significant performance differences, and also suggest a definite way of improving these parsing techniques.
Statistical Parsing Algorithms for Lexicalized Tree Adjoining Grammars
Cited by 1 (0 self)
The goal of this dissertation is two-fold: to develop the theory of probabilistic Tree Adjoining Grammars (TAGs) and to present some practical results in the form of efficient parsing and estimation algorithms for probabilistic TAGs. The overall goal of developing the theory of probabilistic TAGs is to provide a simple, mathematically and linguistically well-formed probabilistic framework for statistical parsing. The practical results in parsing and estimation of probabilistic TAGs are developed with a view towards an increasingly unsupervised approach to the training of statistical parsers and language models. In particular, this proposal contains the following results: An algorithm for determining deficiency in a generative model for probabilistic TAGs. A novel chart-based head-corner parsing algorithm for probabilistic TAGs. A probability model for statistical parsing and a co-training method for training this parser which combines labeled and unlabeled data. An algorithm for computing prefix probabilities which can be used to predict the word most likely to occur after an initial substring of the input. The proposed work can be summarized in the following points: A separate evaluation of the co-training algorithm on a larger set of labeled and unlabeled data, in addition to the evaluation presented in this proposal. An evaluation of the prefix probability algorithm by comparing it with a trigram language model. An extension of techniques in learning subcategorization information and verb classes to produce TAG lexicons which can be directly used to improve performance of the co-training algorithm.
Coping With Problems in Grammars Automatically Extracted From Treebanks
2002
Cited by 1 (1 self)
We report in this paper on an experiment in automatic extraction of a Tree Adjoining Grammar from the WSJ corpus of the Penn Treebank. We use an automatic tool developed by Xia (2001), adapted to our particular needs. Rather than addressing general aspects of automatic extraction, we focus on the problems we encountered in extracting a linguistically (and computationally) sound grammar and on approaches to handling them.
Combining Supertagging and Lexicalized Tree-Adjoining Grammar Parsing
In this paper we study various reasons and mechanisms for combining Supertagging with Lexicalized Tree-Adjoining Grammar (LTAG) parsing. Because of the highly lexicalized nature of the LTAG formalism, we experimentally show that notions other than sentence length play a factor in observed parse times. In particular, syntactic lexical ambiguity and sentence complexity (both are terms we define in this paper) are the dominant factors that affect parsing efficiency. We show how a Supertagger can be used to drastically reduce the syntactic lexical ambiguity for a given input and can be used in combination with an LTAG parser to radically improve parsing efficiency. We then turn our attention from parsing efficiency to parsing accuracy and provide a method by which we can effectively combine the output of a Supertagger and a statistical LTAG parser using a co-training algorithm for bootstrapping new labeled data. This combination method can be used to incorporate new labeled data from raw text to improve parsing accuracy.
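How a supertagger can cut down the candidates a parser must consider can be shown with a small sketch. It assumes the supertagger returns scored supertags for each word and that the parser only sees the top k per word; the function names, the toy scores, and the use of a plain candidate count as a stand-in for the paper's own definition of syntactic lexical ambiguity are all assumptions made for illustration.

```python
from typing import List, Tuple

# For each word, a list of (supertag, score) options from a supertagger.
Lattice = List[List[Tuple[str, float]]]


def prune_supertags(lattice: Lattice, k: int) -> Lattice:
    """Keep only the k highest-scoring supertags for each word."""
    return [
        sorted(options, key=lambda option: option[1], reverse=True)[:k]
        for options in lattice
    ]


def total_candidates(lattice: Lattice) -> int:
    """Count the supertags the parser would have to consider.

    This is only a rough proxy for ambiguity, not the paper's measure.
    """
    return sum(len(options) for options in lattice)


# Hypothetical two-word input with made-up supertag names and scores.
lattice = [
    [("t1", 0.7), ("t2", 0.2), ("t3", 0.1)],
    [("t5", 0.1), ("t4", 0.9)],
]
pruned = prune_supertags(lattice, k=1)
before, after = total_candidates(lattice), total_candidates(pruned)
```

A small k makes parsing cheaper but risks discarding the correct supertag, so in practice the cutoff would need to be tuned against accuracy.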
Hanzi, Concept and Computation: A Preliminary Survey of Chinese Characters as a Knowledge Resource in NLP
There are many people to whom I owe a debt of thanks for their support in the completion of my thesis, and who supported me in science as well as in private life during this time. First, I would like to sincerely thank my advisor, Prof. Dr. Erhard Hinrichs, under whose influence the work here was initiated during my fruitful stay in Germany. Without his continuous and invaluable support, this work could not have been completed. I would also like to thank Prof. Dr. Eschbach-Szabo for reading this thesis and offering constructive comments. Besides my advisors, I am deeply grateful to the rest of my thesis committee, Frank Richter and Fritz Hamm, for their kind support and interesting questions. Special thanks go to Lothar Lemnitzer, who proofread the thesis carefully and gave insightful comments. I would like to thank my parents for their life-long love and support. Last but not least, I also owe a lot of thanks to my lovely wife Hsiao-Wen, my

