Chinese CCGbank: Extracting CCG Derivations from the Penn Chinese Treebank
- In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), 2010
Cited by 11 (2 self)
Automated conversion has allowed the development of wide-coverage corpora for a variety of grammar formalisms without the expense of manual annotation. Analysing new languages also tests formalisms, exposing their strengths and weaknesses. We present Chinese CCGbank, a 760,000 word corpus annotated with Combinatory Categorial Grammar (CCG) derivations, induced automatically from the Penn Chinese Treebank (PCTB). We design parsimonious CCG analyses for a range of Chinese syntactic constructions, and transform the PCTB trees to produce them. Our process yields a corpus of 27,759 derivations, covering 98.1% of the PCTB.
Creating a CCGbank and a wide-coverage CCG lexicon for German
- In Proc. of the 44th Annual Meeting of the ACL and 21st International Conference on Computational Linguistics (COLING/ACL-2006), 2006
Cited by 7 (0 self)
We present an algorithm which creates a German CCGbank by translating the syntax graphs in the German Tiger corpus into CCG derivation trees. The resulting corpus contains 46,628 derivations, covering 95% of all complete sentences in Tiger. Lexicons extracted from this corpus contain correct lexical entries for 94% of all
Priming Effects in Combinatory Categorial Grammar
Cited by 6 (3 self)
This paper presents a corpus-based account of structural priming in human sentence processing, focusing on the role that syntactic representations play in such an account. We estimate the strength of structural priming effects from a corpus of spontaneous spoken dialogue, annotated syntactically with Combinatory Categorial Grammar (CCG) derivations. This methodology allows us to test a range of predictions that CCG makes about priming. In particular, we present evidence for priming between lexical and syntactic categories encoding partially satisfied subcategorization frames, and we show that priming effects exist both for incremental and normal-form CCG derivations.
Prune Diseased Branches to Get Healthy Trees! How to Find Erroneous Local Trees in a Treebank and Why It Matters
- In Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005), 2005
Cited by 5 (0 self)
Annotated corpora are essential for training and testing algorithms in natural language processing (NLP), but even so-called gold-standard corpora contain a significant number of annotation errors (cf. Dickinson 2005, and references therein). For part-of-speech annotation, these errors have been shown to be problematic for
Parser-Based Retraining for Domain Adaptation of Probabilistic Generators
Cited by 2 (1 self)
While the effect of domain variation on Penn-Treebank-trained probabilistic parsers has been investigated in previous work, we study its effect on a Penn-Treebank-trained probabilistic generator. We show that applying the generator to data from the British National Corpus results in a performance drop (from a BLEU score of 0.66 on the standard WSJ test set to a BLEU score of 0.54 on our BNC test set). We develop a generator retraining method where the domain-specific training data is automatically produced using state-of-the-art parser output. The retraining method recovers a substantial portion of the performance drop, resulting in a generator which achieves a BLEU score of 0.61 on our BNC test data.
Domain Adaptable Semantic Clustering in Statistical NLG
Cited by 1 (0 self)
We present a hybrid natural language generation system that utilizes Discourse Representation Structures (DRSs) for statistically learning syntactic templates from a given domain of discourse in sentence “micro” planning. In particular, given a training corpus of target texts, we extract semantic predicates and domain general tags from each sentence and then organize the sentences using supervised clustering to represent the “conceptual meaning” of the corpus. The sentences, additionally tagged with domain specific information (determined separately), are reduced to templates. We use an SVM ranking model trained on a subset of the corpus to determine the optimal template during generation. The combination of the conceptual unit, a set of ranked syntactic templates, and a given set of information, constrains output selection and yields acceptable texts. Our system is evaluated with automatic, non-expert crowdsourced and expert evaluation metrics and, for generated weather, financial and biography texts, falls within acceptable ranges. Consequently, we argue that our DRS-driven statistical and template-based method is robust and domain adaptable as, while content will be dictated by a target domain of discourse, significant investments in sentence planning can be minimized without sacrificing performance.
Engineering a Deep HPSG for Mandarin Chinese
Cited by 1 (0 self)
In this paper, we present our on-going grammar development effort towards a linguistically precise and broad coverage grammar for Mandarin Chinese in the framework of HPSG. The use of LinGO Grammar Matrix facilitates the quick start of the development. We propose a series of linguistic treatments for a list of interesting phenomena. The analyses are largely compatible with the HPSG framework. In addition, the grammar also composes semantic representations in Minimal Recursion Semantics. Preliminary tests of the grammar on a phenomenon-oriented test suite show encouraging precision and coverage.
Joint Grammar and Treebank Development for Mandarin Chinese with HPSG
- Proceedings of LREC’12, 2012
Cited by 1 (0 self)
We present the ongoing development of MCG, a linguistically deep and precise grammar for Mandarin Chinese together with its accompanying treebank, both based on the linguistic framework of HPSG, and using MRS as the semantic representation. We highlight some key features of our grammar design, and review a number of challenging phenomena, with comparisons to alternative linguistic treatments and implementations. One of the distinguishing characteristics of our approach is the tight integration of grammar and treebank development. The two-step treebank annotation procedure benefits from the efficiency of the discriminant-based annotation approach, while giving the annotators full freedom of producing extra-grammatical structures. This not only allows the creation of a precise and full-coverage treebank with an imperfect grammar, but also provides prompt feedback for grammarians to identify the errors in the grammar design and implementation. Preliminary evaluation and error analysis show that the grammar already covers most of the core phenomena for Mandarin Chinese, and the treebank annotation procedure reaches a stable speed of 35 sentences per hour with satisfying quality. Keywords: Grammar Engineering, Treebank Annotation, Syntax
Implementing Categorial Grammar in Semantic Analysis: from a Frame Semantics’ View
In this paper, we propose a new idea that semantic frames are taken as the functions, and semantic categories (usually labeled with semantic roles) are taken as arguments. Thus, a semantic frame can apply to semantic categories if semantic categories are consistent with the semantic frame. Beta-reduction is used to represent the idea of the application of semantic frame to semantic categories. Semantic consistency is tested through β-unification. It is concluded that semantic consistency problems are decidable if verbs are typable in the system of frames.