Automated Extraction of TAGs from the Penn Treebank
, 2000
Cited by 54 (3 self)
The accuracy of statistical parsing models can be improved with the use of lexical information. Statistical parsing using Lexicalized Tree Adjoining Grammar (LTAG), a kind of lexicalized grammar, has remained relatively unexplored. We believe that this is in large part due to the absence of large corpora accurately bracketed in terms of a perspicuous yet broad-coverage LTAG. Our work attempts to alleviate this difficulty. We extract different LTAGs from the Penn Treebank. We show that certain strategies yield an improved extracted LTAG in terms of compactness, broad coverage, and supertagging accuracy. Furthermore, we perform a preliminary investigation into smoothing these grammars by means of an external linguistic resource, namely the tree families of an XTAG grammar, a hand-built grammar of English.
Statistical Parsing With an Automatically Extracted Tree Adjoining Grammar
, 2003
Cited by 23 (1 self)
Introduction: Why use tree adjoining grammars (TAG) for statistical parsing? It might be thought that its added formal power makes parameter estimation unnecessarily difficult, or that whatever benefits it provides (the ability to model unbounded cross-serial dependencies, for example) are inconsequential for statistical parsing, which is concerned with the probable rather than the possible. But just as TAG is not by itself a complete linguistic theory, but a formalism for specifying linguistic theories, it should not be viewed as a statistical model but a formalism for specifying statistical models. The advantage that TAG has over CFG is that it assigns richer structural descriptions to sentences; specifically, in addition to parse trees, it assigns derivation trees (defined below) on which features of a parsing model can be defined. In this chapter we explore the use of TAG for statistical parsing. We start by examining PCFG-based parsers which use head-lexicalization to capture
Handling structural divergences and recovering dropped arguments in a Korean/English machine translation system
- In Proceedings of the AMTA
, 2000
Cited by 19 (5 self)
This paper describes an approach for handling structural divergences and recovering dropped arguments in an implemented Korean-to-English machine translation system. The approach relies on canonical predicate-argument structures (or dependency structures), which provide a suitable pivot representation for the handling of structural divergences and the recovery of dropped arguments. This representation can also be converted to and from the interface representations of many off-the-shelf parsers and generators.
Extending the coverage of a CCG system
- Journal of Language and Computation
, 2004
Cited by 12 (5 self)
We demonstrate ways to enhance the coverage of a symbolic NLP system through data-intensive and machine learning techniques, while preserving the advantages of using a principled symbolic grammar formalism. We automatically acquire a large syntactic CCG lexicon from the Penn Treebank and combine it with semantic and morphological information from another hand-built lexicon using decision tree and maximum entropy classifiers. We also integrate statistical preprocessing methods in our system.
Towards automatic generation of natural language generation systems
- In COLING02, Proceedings of the Nineteenth International Conference on Computational Linguistics
, 2002
Cited by 12 (4 self)
Systems that interact with the user via natural language are in their infancy. As these systems mature and become more complex, it would be desirable for a system developer if there were an automatic method for creating natural language generation components that can produce quality output efficiently. We conduct experiments that show that this goal appears to be realizable. In particular, we discuss a natural language generation system that is composed of SPoT, a trainable sentence planner, and FERGUS, a stochastic surface realizer. We show how these stochastic NLG components can be made to work together, that they can be ported to new domains with apparent ease, and that such NLG components can be integrated in a real-time dialog system.
Automatically Extracting and Comparing Lexicalized Grammars for Different Languages
- In Proc. of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-2001)
, 2001
Cited by 10 (1 self)
In this paper, we present a quantitative comparison between the syntactic structures of three languages: English, Chinese and Korean. This is made possible by first extracting Lexicalized Tree Adjoining Grammars from annotated corpora for each language and then performing the comparison on the extracted grammars. We found that the majority of the core grammar structures for these three languages are easily inter-mappable.
Providing Robustness for a CCG System
- In Proceedings of the Workshop on Linguistic Theory and Grammar Implementation, ESSLLI
, 2000
Cited by 7 (2 self)
We demonstrate ways to preserve the advantages of using a symbolic grammar formalism as the basis of an NLP system while enhancing its robustness. We automatically acquire a CCG lexicon, combine it with semantic and morphological information from another hand-built, underspecified lexicon, and integrate it with statistical preprocessing methods.
Evaluating Grammar Formalisms For Applications To Natural Language Processing And Biological Sequence Analysis
, 2004
Cited by 7 (0 self)
David Chiang. Supervisor: Aravind K. Joshi. Grammars are gaining importance in statistical natural language processing and computational biology as a means of encoding theories and structuring algorithms. But one serious obstacle to applications of grammars is that formal language theory traditionally classifies grammars according to their weak generative capacity (WGC), that is, what sets of strings they generate, and tends to ignore strong generative capacity (SGC), that is, what sets of structural descriptions they generate, even though the latter is more relevant to applications.
Automatising the Learning of Lexical Patterns: an Application to the Enrichment of WordNet by Extracting Semantic Relationships from Wikipedia
- Journal of Data and Knowledge Engineering
, 2007
Cited by 6 (1 self)
This paper describes an automatic approach to identify lexical patterns that represent semantic relationships between concepts in an on-line encyclopedia. These patterns can then be applied to extend existing ontologies or semantic networks with new relations. The experiments have been performed with the Simple English Wikipedia and WordNet 1.7. A new algorithm has been devised for automatically generalising the lexical patterns found in the encyclopedia entries. We have found general patterns for the hyperonymy, hyponymy, holonymy and meronymy relations and, using them, we have extracted more than 2600 new relationships that did not appear in WordNet originally. The precision of these relationships depends on the degree of generality chosen for the patterns and the type of relation, and is around 60-70% for the best combinations proposed.
Experimental Evaluation of LTAG-based Features for Semantic Role Labeling
Cited by 4 (3 self)
This paper proposes the use of the Lexicalized Tree-Adjoining Grammar (LTAG) formalism as an important additional source of features for the Semantic Role Labeling (SRL) task. Using a set of one-vs-all Support Vector Machines (SVMs), we evaluate these LTAG-based features. Our experiments show that LTAG-based features can improve SRL accuracy significantly. When compared with the best-known set of features used in state-of-the-art SRL systems, we obtain an improvement in F-score from 82.34% to 85.25%.

