Results 1 - 10 of 75
Posterior regularization for structured latent variable models
- Journal of Machine Learning Research, 2010
"... We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model co ..."
Abstract
-
Cited by 138 (8 self)
- Add to MetaCart
(Show Context)
We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model complexity from the complexity of structural constraints it is desired to satisfy. By directly imposing decomposable regularization on the posterior moments of latent variables during learning, we retain the computational efficiency of the unconstrained model while ensuring desired constraints hold in expectation. We present an efficient algorithm for learning with posterior regularization and illustrate its versatility on a diverse set of structural constraints such as bijectivity, symmetry and group sparsity in several large-scale experiments, including multi-view learning, cross-lingual dependency grammar induction, unsupervised part-of-speech induction, …
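To make the projection step concrete: in posterior regularization the constraints are imposed by KL-projecting the model posterior onto the set of distributions whose feature expectations satisfy them, which is solved in the dual. The sketch below is a minimal illustration only; the toy posterior, feature values, and bound are assumptions, not taken from the paper.

    import numpy as np

    # Toy model posterior p(z) over four latent assignments, a constraint feature
    # f(z), and a bound b such that we require E_q[f] <= b (illustrative values).
    p = np.array([0.4, 0.3, 0.2, 0.1])
    f = np.array([1.0, 1.0, 0.0, 0.0])
    b = 0.5

    def kl_project(p, f, b, iters=100):
        """KL-project p onto {q : E_q[f] <= b}; the dual solution has the form
        q(z) proportional to p(z) * exp(-lam * f(z)) for a scalar lam >= 0."""
        lo, hi = 0.0, 50.0
        for _ in range(iters):
            lam = 0.5 * (lo + hi)
            q = p * np.exp(-lam * f)
            q /= q.sum()
            if q @ f > b:      # constraint still violated: push lam up
                lo = lam
            else:              # constraint satisfied: try a smaller lam
                hi = lam
        return q

    q = kl_project(p, f, b)
    print(q, q @ f)            # E_q[f] is approximately 0.5 here, versus 0.7 under p

In the full framework such projected posteriors replace the ordinary posteriors in an EM-style update, so the constraints shape learning without changing the model family.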
Generalized expectation criteria for semi-supervised learning of conditional random fields
- In Proc. ACL, pages 870–878, 2008
"... This paper presents a semi-supervised training method for linear-chain conditional random fields that makes use of labeled features rather than labeled instances. This is accomplished by using generalized expectation criteria to express a preference for parameter settings in which the model’s distri ..."
Abstract
-
Cited by 108 (11 self)
- Add to MetaCart
This paper presents a semi-supervised training method for linear-chain conditional random fields that makes use of labeled features rather than labeled instances. This is accomplished by using generalized expectation criteria to express a preference for parameter settings in which the model’s distribution on unlabeled data matches a target distribution. We induce target conditional probability distributions of labels given features from both annotated feature occurrences in context and ad-hoc feature majority label assignment. The use of generalized expectation criteria allows for a dramatic reduction in annotation time by shifting from traditional instance-labeling to feature-labeling, and the methods presented outperform traditional CRF training and other semi-supervised methods when limited human effort is available.
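As a rough illustration of what one generalized expectation term looks like (the setup and numbers below are assumptions for exposition, not code from the paper): take a labeled feature, a target label distribution elicited from the annotator, and the model's average predicted label distribution over unlabeled tokens containing that feature, and score their divergence.

    import numpy as np

    def ge_term(model_label_probs, target_dist):
        """One generalized-expectation term for a labeled feature: the negative KL
        divergence from the annotator's target label distribution to the model's
        average predicted label distribution over tokens exhibiting the feature."""
        predicted = model_label_probs.mean(axis=0)
        return -np.sum(target_dist * np.log(target_dist / predicted))

    # Hypothetical labeled feature "word ends in -ing" with labels (NOUN, VERB, ADJ);
    # the annotator believes such tokens are usually verbs.
    model_label_probs = np.array([[0.2, 0.6, 0.2],
                                  [0.3, 0.5, 0.2],
                                  [0.1, 0.8, 0.1]])   # per-token marginals from the model
    target = np.array([0.1, 0.8, 0.1])
    print(ge_term(model_label_probs, target))          # 0 would mean exact agreement

During training this score (summed over labeled features) is added to the CRF objective, so the model is pulled toward matching the target distributions on unlabeled data.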
Using Universal Linguistic Knowledge to Guide Grammar Induction
"... We present an approach to grammar induction that utilizes syntactic universals to improve dependency parsing across a range of languages. Our method uses a single set of manually-specified language-independent rules that identify syntactic dependencies between pairs of syntactic categories that comm ..."
Abstract
-
Cited by 57 (7 self)
- Add to MetaCart
(Show Context)
We present an approach to grammar induction that utilizes syntactic universals to improve dependency parsing across a range of languages. Our method uses a single set of manually-specified language-independent rules that identify syntactic dependencies between pairs of syntactic categories that commonly occur across languages. During inference of the probabilistic model, we use posterior expectation constraints to require that a minimum proportion of the dependencies we infer be instances of these rules. We also automatically refine the syntactic categories given in our coarsely tagged input. Across six languages our approach outperforms state-of-the-art unsupervised methods by a significant margin.
Learning From Measurements in Exponential Families
"... Given a model family and a set of unlabeled examples, one could either label specific examples or state general constraints—both provide information about the desired model. In general, what is the most cost-effective way to learn? To address this question, we introduce measurements, a general class ..."
Abstract
-
Cited by 54 (1 self)
- Add to MetaCart
Given a model family and a set of unlabeled examples, one could either label specific examples or state general constraints—both provide information about the desired model. In general, what is the most cost-effective way to learn? To address this question, we introduce measurements, a general class of mechanisms for providing information about a target model. We present a Bayesian decision-theoretic framework, which allows us to both integrate diverse measurements and choose new measurements to make. We use a variational inference algorithm, which exploits exponential family duality. The merits of our approach are demonstrated on two sequence labeling tasks.
Active learning by labeling features
- In Proc. of EMNLP, 2009
"... Methods that learn from prior information about input features such as generalized expectation (GE) have been used to train accurate models with very little effort. In this paper, we propose an active learning approach in which the machine solicits “labels ” on features rather than instances. In bot ..."
Abstract
-
Cited by 43 (11 self)
- Add to MetaCart
Methods that learn from prior information about input features such as generalized expectation (GE) have been used to train accurate models with very little effort. In this paper, we propose an active learning approach in which the machine solicits “labels” on features rather than instances. In both simulated and real user experiments on two sequence labeling tasks we show that our active learning method outperforms passive learning with features as well as traditional active learning with instances. Preliminary experiments suggest that novel interfaces which intelligently solicit labels on multiple features facilitate more efficient annotation.
Dependency grammar induction via bitext projection constraints
- In ACL-IJCNLP, 2009
"... Broad-coverage annotated treebanks necessary to train parsers do not exist for many resource-poor languages. The wide availability of parallel text and accurate parsers in English has opened up the possibility of grammar induction through partial transfer across bitext. We consider generative and di ..."
Abstract
-
Cited by 35 (5 self)
- Add to MetaCart
(Show Context)
Broad-coverage annotated treebanks necessary to train parsers do not exist for many resource-poor languages. The wide availability of parallel text and accurate parsers in English has opened up the possibility of grammar induction through partial transfer across bitext. We consider generative and discriminative models for dependency grammar induction that use word-level alignments and a source language parser (English) to constrain the space of possible target trees. Unlike previous approaches, our framework does not require full projected parses, allowing partial, approximate transfer through linear expectation constraints on the space of distributions over trees. We consider several types of constraints that range from generic dependency conservation to language-specific annotation rules for auxiliary verb analysis. We evaluate our approach on Bulgarian and Spanish CoNLL shared task data and show that we consistently outperform unsupervised methods and can outperform supervised learning for limited training data.
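A toy illustration of a dependency-conservation expectation constraint of the kind described above (the structure of this check and all numbers are assumptions for exposition; the actual constraints act on full distributions over trees): given edge posteriors for a target sentence and the candidate edges projected from the English parse through word alignments, the constraint asks that the expected number of conserved edges be at least some fraction of the projected ones.

    import numpy as np

    # Posterior probability that word j heads word i in the target sentence
    # (rows = dependent i, columns = head j); purely illustrative numbers.
    edge_posterior = np.array([[0.0, 0.7, 0.2, 0.1],
                               [0.3, 0.0, 0.6, 0.1],
                               [0.1, 0.8, 0.0, 0.1],
                               [0.2, 0.3, 0.5, 0.0]])

    # Edges (dependent, head) projected from the source (English) parse via alignments.
    projected_edges = [(0, 1), (2, 1), (3, 2)]
    eta = 0.9   # require at least 90% of projected edges to be conserved in expectation

    expected_conserved = sum(edge_posterior[i, j] for i, j in projected_edges)
    satisfied = expected_conserved >= eta * len(projected_edges)
    print(expected_conserved, satisfied)   # 2.0 expected conserved edges: constraint violated

Because the constraint is linear in the edge posteriors, it can be enforced with the same expectation-constraint machinery used elsewhere in this line of work, without requiring a full projected parse.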
Multi-view learning over structured and non-identical outputs
- In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, 2008
"... Multi-View Learning over Structured and Non-Identical Outputs In many machine learning problems, labeled training data is limited but unlabeled data is ample. Some of these problems have instances that can be factored into multiple views, each of which is nearly sufficient in determining the correct ..."
Abstract
-
Cited by 31 (5 self)
- Add to MetaCart
(Show Context)
In many machine learning problems, labeled training data is limited but unlabeled data is ample. Some of these problems have instances that can be factored into multiple views, each of which is nearly sufficient in determining the correct labels. In this paper we present a new algorithm for probabilistic multi-view learning which uses the idea of stochastic agreement between views as regularization. Our algorithm works on structured and unstructured problems and easily generalizes to partial agreement scenarios. For the full agreement case, our algorithm minimizes the Bhattacharyya distance between the models of each view, and performs better than CoBoosting and two-view Perceptron on several flat and structured classification problems.
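The agreement regularizer mentioned above has a simple form for discrete outputs; the sketch below (toy distributions, not taken from the paper) shows the Bhattacharyya distance between the two views' predicted label distributions for one instance.

    import numpy as np

    def bhattacharyya_distance(p1, p2):
        """Bhattacharyya distance -log(sum_y sqrt(p1(y) * p2(y))) between two
        label distributions; it is zero exactly when the two views agree."""
        return -np.log(np.sum(np.sqrt(p1 * p2)))

    # Hypothetical per-instance label distributions produced by the two views.
    view1 = np.array([0.7, 0.2, 0.1])
    view2 = np.array([0.5, 0.3, 0.2])
    print(bhattacharyya_distance(view1, view2))   # small positive value; 0 means full agreement

Averaged over unlabeled instances, this quantity penalizes disagreement between the views, which is how unlabeled data contributes to the objective.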
Alternating projections for learning with expectation constraints
- In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2009
"... We present an objective function for learn-ing with unlabeled data that utilizes auxil-iary expectation constraints. We optimize this objective function using a procedure that alternates between information and moment projections. Our method provides an alter-nate interpretation of the posterior reg ..."
Abstract
-
Cited by 30 (5 self)
- Add to MetaCart
(Show Context)
We present an objective function for learning with unlabeled data that utilizes auxiliary expectation constraints. We optimize this objective function using a procedure that alternates between information and moment projections. Our method provides an alternate interpretation of the posterior regularization framework (Graça et al., 2008), maintains uncertainty during optimization unlike constraint-driven learning (Chang et al., 2007), and is more efficient than generalized expectation criteria (Mann & McCallum, 2008). Applications of this framework include minimally supervised learning, semi-supervised learning, and learning with constraints that are more expressive than the underlying model. In experiments, we demonstrate comparable accuracy to generalized expectation criteria for minimally supervised learning, and use expressive structural constraints to guide semi-supervised learning, providing a 3%-6% improvement over state-of-the-art constraint-driven learning.
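A schematic of one alternating-projections sweep on a toy problem (the fully factorized model family, the single constraint, and all numbers below are assumptions made for illustration): the information projection enforces an expectation constraint on the auxiliary distribution, and the moment projection maps the result back onto the model family.

    import numpy as np

    # Toy joint q(z1, z2) over two binary variables; rows index z1, columns z2.
    q = np.array([[0.1, 0.2],
                  [0.3, 0.4]])
    f = np.array([[0.0, 0.0],
                  [1.0, 1.0]])   # constraint feature f(z) = z1
    b = 0.5                      # expectation constraint: E_q[z1] <= b

    def i_projection(q, f, b, iters=100):
        """Information projection of q onto {r : E_r[f] <= b} via a scalar dual variable."""
        lo, hi = 0.0, 50.0
        for _ in range(iters):
            lam = 0.5 * (lo + hi)
            r = q * np.exp(-lam * f)
            r /= r.sum()
            lo, hi = (lam, hi) if (r * f).sum() > b else (lo, lam)
        return r

    def m_projection(q):
        """Moment projection onto fully factorized distributions: product of marginals."""
        return np.outer(q.sum(axis=1), q.sum(axis=0))

    for _ in range(5):                       # alternate the two projections
        q = m_projection(i_projection(q, f, b))
    print(q, q[1].sum())                     # E_q[z1] now respects the bound

In the paper's setting the moment projection is a fit to an exponential family model rather than this trivial factorization, but the alternation has the same shape.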
Better alignments = better translations
- In Proc. of the ACL, 2008
"... Automatic word alignment is a key step in training statistical machine translation systems. Despite much recent work on word alignment methods, alignment accuracy increases often produce little or no improvements in machine translation quality. In this work we analyze a recently proposed agreement-c ..."
Abstract
-
Cited by 29 (4 self)
- Add to MetaCart
(Show Context)
Automatic word alignment is a key step in training statistical machine translation systems. Despite much recent work on word alignment methods, alignment accuracy increases often produce little or no improvements in machine translation quality. In this work we analyze a recently proposed agreement-constrained EM algorithm for unsupervised alignment models. We attempt to tease apart the effects that this simple but effective modification has on alignment precision and recall trade-offs, and how rare and common words are affected across several language pairs. We propose and extensively evaluate a simple method for using alignment models to produce alignments better-suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six language pairs used in recent MT competitions.
Posterior vs. Parameter Sparsity in Latent Variable Models Supplementary Material
"... 1.1 Derivation of the ℓ1/ℓ ∞ dual program We want to optimize the objective: The Lagrangian becomes: min q,cwt KL(q||p) + σ ∑ s. t. Eq[fwti] ≤ cwt 0 ≤ cwt L(q, c, α, λ) = KL(q||p) + σ ∑ cwt + ∑ λwti(Eq[fwti] − cwt) − α · c (2) wt where we are maximizing with respect to λ ≥ 0 and α ≥ 0. Taking th ..."
Abstract
-
Cited by 29 (0 self)
- Add to MetaCart
(Show Context)
1.1 Derivation of the ℓ1/ℓ∞ dual program. We want to optimize the objective

    \min_{q,\,c_{wt}}\; KL(q\|p) + \sigma \sum_{wt} c_{wt}
    \quad\text{s.t.}\quad E_q[f_{wti}] \le c_{wt},\;\; 0 \le c_{wt}.

The Lagrangian becomes

    L(q, c, \alpha, \lambda) = KL(q\|p) + \sigma \sum_{wt} c_{wt}
    + \sum_{wti} \lambda_{wti}\big(E_q[f_{wti}] - c_{wt}\big) - \alpha \cdot c, \tag{2}

where we are maximizing with respect to \lambda \ge 0 and \alpha \ge 0. Taking the derivative with respect to q(z) we have

    \frac{\partial L(q, c, \alpha, \lambda)}{\partial q(z)} = \log q(z) + 1 - \log p(z) + f(z)\cdot\lambda. \tag{3}
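The excerpt stops at equation (3). As a hedged completion (the standard Lagrangian steps under the conventions of (2)-(3), not text from the supplementary): setting the derivatives to zero, together with the normalization of q and the nonnegativity of the multipliers, gives

    \frac{\partial L}{\partial q(z)} = 0
      \;\Longrightarrow\; q(z) \;\propto\; p(z)\,\exp\!\big(-f(z)\cdot\lambda\big),
    \qquad
    \frac{\partial L}{\partial c_{wt}} = 0
      \;\Longrightarrow\; \sigma - \sum_i \lambda_{wti} - \alpha_{wt} = 0
      \;\Longrightarrow\; \sum_i \lambda_{wti} \le \sigma \;\; (\text{since } \alpha_{wt} \ge 0).

Substituting the projected posterior back into the Lagrangian then leaves a dual maximization over \lambda \ge 0 subject to the per-(w,t) box constraint \sum_i \lambda_{wti} \le \sigma, which is where the ℓ1/ℓ∞ structure of the penalty appears in the dual.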