Results 1  10
of
285
Shallow Parsing with Conditional Random Fields
, 2003
"... Conditional random fields for sequence labeling offer advantages over both generative models like HMMs and classifiers applied at each sequence position. Among sequence labeling tasks in language processing, shallow parsing has received much attention, with the development of standard evaluati ..."
Abstract

Cited by 575 (8 self)
 Add to MetaCart
Conditional random fields for sequence labeling offer advantages over both generative models like HMMs and classifiers applied at each sequence position. Among sequence labeling tasks in language processing, shallow parsing has received much attention, with the development of standard evaluation datasets and extensive comparison among methods. We show here how to train a conditional random field to achieve performance as good as any reported base nounphrase chunking method on the CoNLL task, and better than any reported single model. Improved training methods based on modern optimization algorithms were critical in achieving these results. We present extensive comparisons between models and training methods that confirm and strengthen previous results on shallow parsing and training methods for maximumentropy models.
Discriminative Reranking for Natural Language Parsing
, 2005
"... This article considers approaches which rerank the output of an existing probabilistic parser. The base parser produces a set of candidate parses for each input sentence, with associated probabilities that define an initial ranking of these parses. A second model then attempts to improve upon this i ..."
Abstract

Cited by 327 (9 self)
 Add to MetaCart
(Show Context)
This article considers approaches which rerank the output of an existing probabilistic parser. The base parser produces a set of candidate parses for each input sentence, with associated probabilities that define an initial ranking of these parses. A second model then attempts to improve upon this initial ranking, using additional features of the tree as evidence. The strength of our approach is that it allows a tree to be represented as an arbitrary set of features, without concerns about how these features interact or overlap and without the need to define a derivation or a generative model which takes these features into account. We introduce a new method for the reranking task, based on the boosting approach to ranking problems described in Freund et al. (1998). We apply the boosting method to parsing the Wall Street Journal treebank. The method combined the loglikelihood under a baseline model (that of Collins [1999]) with evidence from an additional 500,000 features over parse trees that were not included in the original model. The new model achieved 89.75 % Fmeasure, a 13 % relative decrease in Fmeasure error over the baseline model’s score of 88.2%. The article also introduces a new algorithm for the boosting approach which takes advantage of the sparsity of the feature space in the parsing data. Experiments show significant efficiency gains for the new algorithm over the obvious implementation of the boosting approach. We argue that the method is an appealing alternative—in terms of both simplicity and efficiency—to work on feature selection methods within loglinear (maximumentropy) models. Although the experiments in this article are on natural language parsing (NLP), the approach should be applicable to many other NLP problems which are naturally framed as ranking tasks, for example, speech recognition, machine translation, or natural language generation.
Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and WebEnhanced Lexicons
, 2003
"... This paper presents a feature induction method for CRFs. Founded on the principle of constructing only those feature conjunctions that significantly increase loglikelihood, the approach builds on that of Della Pietra et al (1997), but is altered to work with conditional rather than joint probabiliti ..."
Abstract

Cited by 258 (12 self)
 Add to MetaCart
This paper presents a feature induction method for CRFs. Founded on the principle of constructing only those feature conjunctions that significantly increase loglikelihood, the approach builds on that of Della Pietra et al (1997), but is altered to work with conditional rather than joint probabilities, and with a meanfield approximation and other additional modifications that improve efficiency specifically for a sequence model. In comparison with traditional approaches, automated feature induction offers both improved accuracy and significant reduction in feature count; it enables the use of richer, higherorder Markov models, and offers more freedom to liberally guess about which atomic features may be relevant to a task
SemiMarkov conditional random fields for information extraction
 In Advances in Neural Information Processing Systems 17
, 2004
"... We describe semiMarkov conditional random fields (semiCRFs), a conditionally trained version of semiMarkov chains. Intuitively, a semiCRF on an input sequence x outputs a “segmentation ” of x, in which labels are assigned to segments (i.e., subsequences) of x rather than to individual elements x ..."
Abstract

Cited by 248 (10 self)
 Add to MetaCart
(Show Context)
We describe semiMarkov conditional random fields (semiCRFs), a conditionally trained version of semiMarkov chains. Intuitively, a semiCRF on an input sequence x outputs a “segmentation ” of x, in which labels are assigned to segments (i.e., subsequences) of x rather than to individual elements xi of x. Importantly, features for semiCRFs can measure properties of segments, and transitions within a segment can be nonMarkovian. In spite of this additional power, exact learning and inference algorithms for semiCRFs are polynomialtime—often only a small constant factor slower than conventional CRFs. In experiments on five named entity recognition problems, semiCRFs generally outperform conventional CRFs. 1
Efficiently Inducing Features of Conditional Random Fields
, 2003
"... Conditional Random Fields (CRFs) are undirected graphical models, a special case of which correspond to conditionallytrained finite state machines. A key advantage of CRFs is their great flexibility to include a wide variety of arbitrary, nonindependent features of the input. Faced with ..."
Abstract

Cited by 227 (12 self)
 Add to MetaCart
Conditional Random Fields (CRFs) are undirected graphical models, a special case of which correspond to conditionallytrained finite state machines. A key advantage of CRFs is their great flexibility to include a wide variety of arbitrary, nonindependent features of the input. Faced with
Domain adaptation for statistical classifiers
 J. Artif. Int. Res
, 2006
"... ar ..."
(Show Context)
Scalable training of L1regularized loglinear models
 In ICML ’07
, 2007
"... The lbfgs limitedmemory quasiNewton method is the algorithm of choice for optimizing the parameters of largescale loglinear models with L2 regularization, but it cannot be used for an L1regularized loss due to its nondifferentiability whenever some parameter is zero. Efficient algorithms have ..."
Abstract

Cited by 169 (4 self)
 Add to MetaCart
The lbfgs limitedmemory quasiNewton method is the algorithm of choice for optimizing the parameters of largescale loglinear models with L2 regularization, but it cannot be used for an L1regularized loss due to its nondifferentiability whenever some parameter is zero. Efficient algorithms have been proposed for this task, but they are impractical when the number of parameters is very large. We present an algorithm OrthantWise Limitedmemory QuasiNewton (owlqn), based on lbfgs, that can efficiently optimize the L1regularized loglikelihood of loglinear models with millions of parameters. In our experiments on a parse reranking task, our algorithm was several orders of magnitude faster than an alternative algorithm, and substantially faster than lbfgs on the analogous L2regularized problem. We also present a proof that owlqn is guaranteed to converge to a globally optimal parameter vector. 1.
Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data
 IN ICML
, 2004
"... In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when longrange dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linearchain cond ..."
Abstract

Cited by 167 (13 self)
 Add to MetaCart
(Show Context)
In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when longrange dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linearchain conditional random fields (CRFs) in which each time slice contains a set of state variables and edgesa distributed state representation as in dynamic Bayesian networks (DBNs)and parameters are tied across slices. Since exact
Table Extraction Using Conditional Random Fields
, 2003
"... The ability to find tables and extract information from them is a necessary component of data mining, question answering, and other information retrieval tasks. Documents often contain tables in order to communicate densely packed, multidimensional information. Tables do this by employing layout pa ..."
Abstract

Cited by 147 (10 self)
 Add to MetaCart
(Show Context)
The ability to find tables and extract information from them is a necessary component of data mining, question answering, and other information retrieval tasks. Documents often contain tables in order to communicate densely packed, multidimensional information. Tables do this by employing layout patterns to efficiently indicate fields and records in twodimensional form.
Improving machine translation performance by exploiting nonparallel corpora
 Computational Linguistics
, 2005
"... We present a novel method for discovering parallel sentences in comparable, nonparallel corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether or not they are translations of each other. Using this approach, we extract parallel data from large ..."
Abstract

Cited by 139 (3 self)
 Add to MetaCart
(Show Context)
We present a novel method for discovering parallel sentences in comparable, nonparallel corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether or not they are translations of each other. Using this approach, we extract parallel data from large Chinese, Arabic, and English nonparallel newspaper corpora. We evaluate the quality of the extracted data by showing that it improves the performance of a stateoftheart statistical machine translation system. We also show that a goodquality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large nonparallel corpus. Thus, our method can be applied with great benefit to language pairs for which only scarce resources are available. 1.