Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging (1995)

by E. Brill
Venue: Computational Linguistics

Results 1 - 10 of 924

Conditional random fields: Probabilistic models for segmenting and labeling sequence data

by John Lafferty, Andrew McCallum, Fernando Pereira - In Proceedings of ICML, 2001
"... We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions ..."
Abstract - Cited by 3485 (85 self) - Add to MetaCart
We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.

Citation Context

...models (Berger et al., 1996; Ratnaparkhi, 1996; McCallum et al., 2000) that may suffer from label bias. Non-probabilistic local decision models have also been widely used in segmentation and tagging (Brill, 1995; Roth, 1998; Abney et al., 1999). Because of the computational complexity of global training, these models are only trained to minimize the error of individual label decisions assuming that neighbori...
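
To make the sequence-level scoring concrete, here is a minimal decoding sketch for the linear-chain case the abstract describes. The emission and transition score matrices below are invented by hand; in a real CRF they would come from trained feature weights.

```python
# Minimal linear-chain decoding sketch (illustrative; a trained CRF would
# supply these scores from feature weights, the toy matrices are made up).
import numpy as np

def viterbi(emit, trans):
    """emit: (T, K) per-position tag scores; trans: (K, K) tag-pair scores.
    Maximizes sum_t emit[t, y_t] + sum_t trans[y_{t-1}, y_t]; the global
    normalizer that makes a CRF probabilistic cancels out at decode time."""
    T, K = emit.shape
    score = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    score[0] = emit[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + trans + emit[t][None, :]
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0)
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

emit = np.array([[2.0, 0.1], [0.3, 1.5], [1.0, 1.2]])   # 3 positions, 2 tags
trans = np.array([[0.5, -0.2], [-0.4, 0.8]])
print(viterbi(emit, trans))  # [0, 1, 1]
```

Because the sequence is scored as a whole, a low-scoring tag at one position can still win if it enables better transitions elsewhere; that global view is the property that avoids the per-state normalization behind label bias.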

Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network

by Kristina Toutanova, Dan Klein, Christopher D. Manning, Yoram Singer - In Proceedings of HLT-NAACL, 2003
"... We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective ..."
Abstract - Cited by 693 (23 self) - Add to MetaCart
We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional loglinear models, and (iv) fine-grained modeling of unknown word features. Using these ideas together, the resulting tagger gives a 97.24% accuracy on the Penn Treebank WSJ, an error reduction of 4.4% on the best previous single automatically learned tagging result.
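
A minimal sketch of idea (i): score a candidate tag against both its left and right tag neighbors plus a lexical feature, log-linear style. The feature templates and the weights dictionary are hypothetical stand-ins for trained parameters.

```python
# Hypothetical local scorer conditioning on both neighboring tags.
def local_score(weights, word, prev_tag, next_tag, tag):
    feats = [
        ("word", word, tag),                      # lexical feature
        ("prev", prev_tag, tag),                  # left tag context
        ("next", next_tag, tag),                  # right tag context
        ("prev+next", prev_tag, next_tag, tag),   # joint two-sided context
    ]
    return sum(weights.get(f, 0.0) for f in feats)

weights = {("word", "flies", "VBZ"): 1.2,   # invented weights
           ("prev", "NN", "VBZ"): 0.8,
           ("next", "IN", "VBZ"): 0.5}
print(local_score(weights, "flies", "NN", "IN", "VBZ"))  # 2.5
```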

Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms

by Michael Collins - In Proceedings of EMNLP, 2002
"... We describe new algorithms for training tagging models, as an alternative to maximum-entropy models or conditional random fields (CRFs). The algorithms rely on Viterbi decoding of training examples, combined with simple additive updates. We describe theory justifying the algorithms through a modific ..."
Abstract - Cited by 660 (13 self) - Add to MetaCart
We describe new algorithms for training tagging models, as an alternative to maximum-entropy models or conditional random fields (CRFs). The algorithms rely on Viterbi decoding of training examples, combined with simple additive updates. We describe theory justifying the algorithms through a modification of the proof of convergence of the perceptron algorithm for classification problems. We give experimental results on part-of-speech tagging and base noun phrase chunking, in both cases showing improvements over results for a maximum-entropy tagger.
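
The training loop behind "Viterbi decoding of training examples, combined with simple additive updates" fits in a few lines. This sketch assumes plain emission/transition indicator features and omits the weight averaging the paper also studies.

```python
# Structured perceptron sketch: decode each sentence under current weights,
# then add the gold sequence's features and subtract the prediction's.
from collections import defaultdict

def features(words, tags):
    f = defaultdict(int)              # emission + transition indicators
    prev = "<s>"
    for w, t in zip(words, tags):
        f[("emit", w, t)] += 1
        f[("trans", prev, t)] += 1
        prev = t
    return f

def decode(words, w, tagset):
    # Viterbi over the same emission/transition features.
    best = {t: (w[("trans", "<s>", t)] + w[("emit", words[0], t)], [t])
            for t in tagset}
    for word in words[1:]:
        nxt = {}
        for t in tagset:
            score, seq = max(((sc + w[("trans", p, t)], path)
                              for p, (sc, path) in best.items()),
                             key=lambda x: x[0])
            nxt[t] = (score + w[("emit", word, t)], seq + [t])
        best = nxt
    return max(best.values(), key=lambda x: x[0])[1]

def perceptron_train(data, tagset, epochs=5):
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = decode(words, w, tagset)
            if pred != gold:
                for k, v in features(words, gold).items():
                    w[k] += v
                for k, v in features(words, pred).items():
                    w[k] -= v
    return w

data = [("the dog barks".split(), ["DT", "NN", "VBZ"])]
tags = {"DT", "NN", "VBZ"}
w = perceptron_train(data, tags)
print(decode("the dog barks".split(), w, tags))  # ['DT', 'NN', 'VBZ']
```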

Shallow Parsing with Conditional Random Fields

by Fei Sha, Fernando Pereira - In Proceedings of HLT-NAACL, 2003
"... Conditional random fields for sequence labeling offer advantages over both generative models like HMMs and classifiers applied at each sequence position. Among sequence labeling tasks in language processing, shallow parsing has received much attention, with the development of standard evaluati ..."
Abstract - Cited by 581 (8 self) - Add to MetaCart
Conditional random fields for sequence labeling offer advantages over both generative models like HMMs and classifiers applied at each sequence position. Among sequence labeling tasks in language processing, shallow parsing has received much attention, with the development of standard evaluation datasets and extensive comparison among methods. We show here how to train a conditional random field to achieve performance as good as any reported base noun-phrase chunking method on the CoNLL task, and better than any reported single model. Improved training methods based on modern optimization algorithms were critical in achieving these results. We present extensive comparisons between models and training methods that confirm and strengthen previous results on shallow parsing and training methods for maximum-entropy models.
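
Shallow parsing becomes sequence labeling once chunks are encoded as per-token tags. A small sketch of the usual B/I/O encoding for base noun phrases; the (start, end) span convention is ours, not something the paper specifies.

```python
# NP chunks as per-token tags: B-NP opens a chunk, I-NP continues it, O is
# outside. Spans are (start, end) token offsets, end exclusive (our choice).
def spans_to_bio(n_tokens, np_spans):
    tags = ["O"] * n_tokens
    for start, end in np_spans:
        tags[start] = "B-NP"
        for i in range(start + 1, end):
            tags[i] = "I-NP"
    return tags

# "He reckons the current account deficit" -> two noun phrases.
print(spans_to_bio(6, [(0, 1), (2, 6)]))
# ['B-NP', 'O', 'B-NP', 'I-NP', 'I-NP', 'I-NP']
```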

An Algorithm that Learns What's in a Name

by Daniel M. Bikel, Richard Schwartz, Ralph M. Weischedel - Machine Learning, 1999
"... In this paper, we present IdentiFinder^TM, a hidden Markov model that learns to recognize and classify names, dates, times, and numerical quantities. We have evaluated the model in English (based on data from the Sixth and Seventh Message Understanding Conferences [MUC-6, MUC-7] and broadcast news) ..."
Abstract - Cited by 372 (7 self) - Add to MetaCart
In this paper, we present IdentiFinder™, a hidden Markov model that learns to recognize and classify names, dates, times, and numerical quantities. We have evaluated the model in English (based on data from the Sixth and Seventh Message Understanding Conferences [MUC-6, MUC-7] and broadcast news) and in Spanish (based on data distributed through the First Multilingual Entity Task [MET-1]), and on speech input (based on broadcast news). We report results here on standard materials only to quantify performance on data available to the community, namely, MUC-6 and MET-1. Results have been consistently better than reported by any other learning algorithm. IdentiFinder's performance is competitive with approaches based on handcrafted rules on mixed case text and superior on text where case information is not available. We also present a controlled experiment showing the effect of training set size on performance, demonstrating that as little as 100,000 words of training data is adequate to get performance around 90% on newswire. Although we present our understanding of why this algorithm performs so well on this class of problems, we believe that significant improvement in performance may still be possible.
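
As a rough sketch of the modeling idea only: an HMM whose hidden states are name classes, estimated by counting over labeled text. IdentiFinder's actual model conditions on richer context and word features and uses back-off smoothing, none of which this toy attempts.

```python
# Toy name-class HMM: states are name classes (PERSON, ORG, ..., O);
# transition and emission probabilities come from raw counts (no smoothing).
from collections import Counter, defaultdict

def train_name_hmm(sentences):
    """sentences: list of [(word, name_class), ...] sequences."""
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        prev = "<s>"
        for word, cls in sent:
            trans[prev][cls] += 1
            emit[cls][word.lower()] += 1
            prev = cls

    def normalize(table):
        return {ctx: {sym: n / sum(cnt.values()) for sym, n in cnt.items()}
                for ctx, cnt in table.items()}

    return normalize(trans), normalize(emit)

sents = [[("John", "PERSON"), ("joined", "O"), ("IBM", "ORG")]]
trans, emit = train_name_hmm(sents)
print(trans["<s>"])  # {'PERSON': 1.0}
```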

A Syntax-based Statistical Translation Model

by Kenji Yamada, Kevin Knight - In Proceedings of ACL, 2001
"... We present a syntax-based statistical translation model. Our model transforms a source-language parse tree into a target-language string by applying stochastic operations at each node. These operations capture linguistic differences such as word order and case marking. Model parameters are es ..."
Abstract - Cited by 343 (16 self) - Add to MetaCart
We present a syntax-based statistical translation model. Our model transforms a source-language parse tree into a target-language string by applying stochastic operations at each node. These operations capture linguistic differences such as word order and case marking. Model parameters are estimated in polynomial time using an EM algorithm. The model produces word alignments that are better than those produced by IBM Model 5.

Citation Context

...ulary size was 3463 tokens for English, and 3983 tokens for Japanese, with 2029 tokens for English and 2507 tokens for Japanese occurring only once in the corpus. Brill's part-of-speech (POS) tagger (Brill, 1995) and Collins' parser (Collins, 1999) were used to obtain parse trees for the English side of the corpus. The output of Collins' parser was ... Note that the algorithm performs full EM counting, whereas...
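
A hedged sketch of the three node-level channel operations the abstract names (reorder children, insert a word, translate leaf words). The probability tables are hand-specified here; in the model they are the parameters that EM estimates.

```python
# Toy channel model: walk the source parse tree, reordering children,
# optionally inserting a word, and translating leaves, each by sampling
# from a probability table (hand-specified here, EM-estimated in the model).
import random

def weighted_choice(dist):
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r <= acc:
            return item
    return item  # guard against floating-point rounding

def transform(node, reorder_p, insert_p, translate_p):
    """node: (label, children_list) internally, (label, word_str) at leaves."""
    label, payload = node
    if isinstance(payload, str):                          # leaf: translate
        return [weighted_choice(translate_p.get(payload, {payload: 1.0}))]
    default_order = {tuple(range(len(payload))): 1.0}
    order = weighted_choice(reorder_p.get(label, default_order))
    out = []
    for i in order:                                       # reorder + recurse
        out.extend(transform(payload[i], reorder_p, insert_p, translate_p))
    extra = weighted_choice(insert_p.get(label, {None: 1.0}))
    if extra:                                             # optional insertion
        out.append(extra)
    return out

tree = ("VP", [("VB", "eat"), ("NP", [("NN", "sushi")])])
print(transform(tree,
                reorder_p={"VP": {(1, 0): 0.9, (0, 1): 0.1}},   # verb-final
                insert_p={"VP": {"wa": 0.2, None: 0.8}},
                translate_p={"eat": {"taberu": 1.0},
                             "sushi": {"sushi": 1.0}}))
# e.g. ['sushi', 'taberu'] or ['sushi', 'taberu', 'wa']
```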

Chunking with Support Vector Machines

by Taku Kudo, Yuji Matsumoto - In Proceedings of NAACL, 2001
"... We apply Support Vector Machines (SVMs) to identify English base phrases (chunks). SVMs are known to achieve high generalization performance even with input data of high dimensional feature spaces. Furthermore, by the Kernel principle, SVMs can carry out training with smaller computational overhead ..."
Abstract - Cited by 219 (11 self) - Add to MetaCart
We apply Support Vector Machines (SVMs) to identify English base phrases (chunks). SVMs are known to achieve high generalization performance even with input data of high dimensional feature spaces. Furthermore, by the Kernel principle, SVMs can carry out training with smaller computational overhead independent of their dimensionality. We apply weighted voting of 8 SVM-based systems trained with distinct chunk representations. Experimental results show that our approach achieves higher accuracy than previous approaches.
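
Two ingredients from the abstract, context-window features for a per-token classifier and weighted voting across systems, sketched with illustrative values; the paper's exact feature set, chunk representations, and weighting scheme differ.

```python
# Window features for one token (illustrative templates), and weighted
# voting over the tag sequences produced by several trained systems.
def token_features(words, pos, i):
    ctx = lambda seq, j: seq[j] if 0 <= j < len(seq) else "<pad>"
    return {f"w[{d}]={ctx(words, i + d)}" for d in (-2, -1, 0, 1, 2)} | \
           {f"p[{d}]={ctx(pos, i + d)}" for d in (-2, -1, 0, 1, 2)}

def weighted_vote(predictions, system_weights):
    """predictions: one tag sequence per system, all the same length."""
    out = []
    for i in range(len(predictions[0])):
        scores = {}
        for tags, wt in zip(predictions, system_weights):
            scores[tags[i]] = scores.get(tags[i], 0.0) + wt
        out.append(max(scores, key=scores.get))
    return out

preds = [["B-NP", "I-NP", "O"], ["B-NP", "B-NP", "O"], ["B-NP", "I-NP", "O"]]
print(weighted_vote(preds, [0.94, 0.93, 0.94]))  # ['B-NP', 'I-NP', 'O']
```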

Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences

by Hong Yu, Vasileios Hatzivassiloglou - In Proceedings of EMNLP-03, 2003
"... Opinion question answering is a challenging task for natural language processing. In this paper, we discuss a necessary component for an opinion question answering system: separating opinions from fact, at both the document and sentence level. We present a Bayesian classifier for discriminating betw ..."
Abstract - Cited by 215 (4 self) - Add to MetaCart
Opinion question answering is a challenging task for natural language processing. In this paper, we discuss a necessary component for an opinion question answering system: separating opinions from fact, at both the document and sentence level. We present a Bayesian classifier for discriminating between documents with a preponderance of opinions such as editorials from regular news stories, and describe three unsupervised, statistical techniques for the significantly harder task of detecting opinions at the sentence level. We also present a first model for classifying opinion sentences as positive or negative in terms of the main perspective being expressed in the opinion. Results from a large collection of news stories and a human evaluation of 400 sentences are reported, indicating that we achieve very high performance in document classification (upwards of 97% precision and recall), and respectable performance in detecting opinions and classifying them at the sentence level as positive, negative, or neutral (up to 91% accuracy).

Citation Context

...Freq(all, POS, ADJ) represents the collocation frequency of all words of part of speech POS with ADJ, and ε is a smoothing constant (… in our case). We used Brill's tagger (Brill, 1995) to obtain part-of-speech information. 5.2 Sentence Polarity Tagging: As our measure of semantic orientation across an entire sentence we used the average per-word log-likelihood scores defined in the ...
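
A small sketch of the measure quoted above: average per-word log-likelihood scores, where each word's score compares its smoothed co-occurrence frequency with positive versus negative seed adjectives. All counts and the smoothing value are invented for illustration (the constant in the source snippet is garbled).

```python
# Invented co-occurrence counts with positive/negative seed adjectives;
# eps is a smoothing constant (value assumed, not taken from the paper).
import math

def word_orientation(word, pos_cooc, neg_cooc, pos_total, neg_total, eps=0.5):
    return math.log(((pos_cooc.get(word, 0) + eps) / (pos_total + eps)) /
                    ((neg_cooc.get(word, 0) + eps) / (neg_total + eps)))

def sentence_polarity(words, *tables):
    avg = sum(word_orientation(w, *tables) for w in words) / len(words)
    return "positive" if avg > 0 else "negative"

pos_cooc, neg_cooc = {"great": 30, "film": 10}, {"great": 2, "film": 9}
print(sentence_polarity(["great", "film"], pos_cooc, neg_cooc, 100, 100))
# positive
```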

Learning to resolve natural language ambiguities: A unified approach

by Dan Roth - In Proceedings of the National Conference on Artificial Intelligence, 806-813, 1998
"... distinct semanticonceptsuch as interest rate and has interest in Math are conflated in ordinary text. We analyze a few of the commonly used statistics based The surrounding context- word associations and syn-and machine learning algorithms for natural language tactic patterns in this case- are suffl ..."
Abstract - Cited by 172 (78 self) - Add to MetaCart
We analyze a few of the commonly used statistics based and machine learning algorithms for natural language disambiguation tasks and observe that they can be recast as learning linear separators in the feature space. Each of the methods makes a priori assumptions, which it employs, given the data, when searching for its hypothesis. Nevertheless, as we show, it searches a space that is as rich as the space of all linear separators. We use this to build an argument for a data-driven approach which merely searches for a good linear separator in the feature space, without further assumptions ...

... distinct semantic concepts such as interest rate and has interest in Math are conflated in ordinary text. The surrounding context - word associations and syntactic patterns in this case - are sufficient to identify the correct form. Many of these are important stand-alone problems, but even more important is their role in many applications including speech recognition, machine translation, information extraction and intelligent human-machine interaction. Most of the ambiguity resolution problems are at the lower level of the natural language inferences chain; a wide range and a large number of ambigui...
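
The paper's unifying object is a linear separator over a sparse feature space. Here is a sketch using the multiplicative Winnow update, one classic learner of this kind; the threshold and promotion/demotion rates are textbook defaults, not values from the paper.

```python
# Winnow over sparse binary features: promote active feature weights on a
# missed positive, demote them on a false positive.
def winnow_train(examples, n_features, alpha=2.0, epochs=10):
    theta = n_features / 2          # textbook threshold choice
    w = [1.0] * n_features
    for _ in range(epochs):
        for active, label in examples:   # active: indices of features == 1
            pred = 1 if sum(w[i] for i in active) >= theta else 0
            if pred != label:
                factor = alpha if label == 1 else 1.0 / alpha
                for i in active:
                    w[i] *= factor       # promote or demote active features
    return w, theta

# Toy disambiguation: feature 0 decides the label.
ex = [([0, 1], 1), ([1, 2], 0), ([0, 3], 1), ([2, 3], 0)]
w, theta = winnow_train(ex, n_features=4)
print(sum(w[i] for i in [0, 1]) >= theta)  # True
```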

Using Predicate-Argument Structures for Information Extraction

by Mihai Surdeanu, Sanda Harabagiu, John Williams, Paul Aarseth - In Proceedings of ACL, 2003
"... In this paper we present a novel, customizable IE paradigm that takes advantage of predicate-argument structures. We also introduce a new way of automatically identifying predicate argument structures, which is central to our IE paradigm. It is based on: (1) an extended set of features; and ( ..."
Abstract - Cited by 158 (4 self) - Add to MetaCart
In this paper we present a novel, customizable IE paradigm that takes advantage of predicate-argument structures. We also introduce a new way of automatically identifying predicate argument structures, which is central to our IE paradigm. It is based on: (1) an extended set of features; and (2) inductive decision tree learning.

Citation Context

...Documents are processed in parallel to: (1) parse them syntactically, and (2) recognize the NEs. The full parser first performs part-of-speech (POS) tagging using transformation based learning (TBL) (Brill, 1995). Then non-recursive, or basic, noun phrases (NPB) are identified using the TBL method reported in (Ngai and Florian, 2001). At last, the dependency parser presented in (Collins, 1997) is used to gen...
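
A sketch of ingredient (2), decision-tree learning over candidate-argument features. The feature names follow the common predicate-argument feature set (phrase type, parse-tree path, position, voice), the training rows are invented, and scikit-learn stands in for whatever induction tool the authors used, so this only illustrates the shape of the approach.

```python
# Classify candidate constituents into argument labels with a decision tree.
# All feature values and labels below are made up for illustration.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

train = [
    ({"phrase": "NP", "path": "NP^S.VP.VBD", "before_pred": True,  "voice": "active"}, "ARG0"),
    ({"phrase": "NP", "path": "NP^VP.VBD",   "before_pred": False, "voice": "active"}, "ARG1"),
    ({"phrase": "PP", "path": "PP^VP.VBD",   "before_pred": False, "voice": "active"}, "NONE"),
]
X, y = zip(*train)
vec = DictVectorizer()                       # one-hot encodes string features
clf = DecisionTreeClassifier().fit(vec.fit_transform(X), y)

test = {"phrase": "NP", "path": "NP^S.VP.VBD", "before_pred": True, "voice": "active"}
print(clf.predict(vec.transform([test])))    # ['ARG0']
```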
