Results 1-10 of 15
Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. arXiv preprint arXiv:1406.3824, 2014
Cited by 8 (1 self)
Abstract
The Dawid-Skene estimator has been widely used for inferring the true labels from the noisy labels provided by non-expert crowdsourcing workers. However, since the estimator maximizes a non-convex log-likelihood function, it is hard to theoretically justify its performance. In this paper, we propose a two-stage efficient algorithm for multi-class crowd labeling problems. The first stage uses the spectral method to obtain an initial estimate of parameters. Then the second stage refines the estimation by optimizing the objective function of the Dawid-Skene estimator via the EM algorithm. We show that our algorithm achieves the optimal convergence rate up to a logarithmic factor. We conduct extensive experiments on synthetic and real datasets. Experimental results demonstrate that the proposed algorithm is comparable to the most accurate empirical approach, while outperforming several other recently proposed methods.
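As a concrete illustration of the two-stage idea, the sketch below implements only the EM refinement for the binary (two-class) special case, with a majority-vote initialization standing in for the spectral first stage. The function name, the binary restriction, and all parameter choices are our own assumptions, not the paper's implementation.

```python
import numpy as np

def em_dawid_skene(labels, n_iter=50, q_init=None):
    """EM for the binary Dawid-Skene model.

    labels: (n_items, n_workers) array with entries in {0, 1}.
    q_init: optional initial posteriors P(y_i = 1); the paper's first
    stage would supply a spectral estimate here, we fall back to a
    majority-vote initialization instead.
    """
    q = labels.mean(axis=1) if q_init is None else q_init.copy()
    for _ in range(n_iter):
        # M-step: class prior and per-worker accuracies from soft labels
        pi = q.mean()
        a = (q[:, None] * labels).sum(0) / (q.sum() + 1e-12)                    # P(label=1 | y=1)
        b = ((1 - q)[:, None] * (1 - labels)).sum(0) / ((1 - q).sum() + 1e-12)  # P(label=0 | y=0)
        a, b = np.clip(a, 1e-6, 1 - 1e-6), np.clip(b, 1e-6, 1 - 1e-6)
        # E-step: posterior of each item's true label given all workers
        log1 = np.log(pi) + (labels * np.log(a) + (1 - labels) * np.log(1 - a)).sum(1)
        log0 = np.log(1 - pi) + ((1 - labels) * np.log(b) + labels * np.log(1 - b)).sum(1)
        q = 1.0 / (1.0 + np.exp(log0 - log1))
    return q

# Synthetic demo: 5 workers with accuracies 0.7-0.9 label 200 items
rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=200)
acc = np.array([0.9, 0.8, 0.85, 0.7, 0.9])
flip = rng.random((200, 5)) >= acc
labels = np.where(flip, 1 - truth[:, None], truth[:, None])
pred = (em_dawid_skene(labels) > 0.5).astype(int)
accuracy = (pred == truth).mean()
```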
Statistical guarantees for the EM algorithm: From population to sample-based analysis, 2014
Cited by 5 (0 self)
Abstract
We develop a general framework for proving rigorous guarantees on the performance of the EM algorithm and a variant known as gradient EM. Our analysis is divided into two parts: a treatment of these algorithms at the population level (in the limit of infinite data), followed by results that apply to updates based on a finite set of samples. First, we characterize the domain of attraction of any global maximizer of the population likelihood. This characterization is based on a novel view of the EM updates as a perturbed form of likelihood ascent, or in parallel, of the gradient EM updates as a perturbed form of standard gradient ascent. Leveraging this characterization, we then provide non-asymptotic guarantees on the EM and gradient EM algorithms when applied to a finite set of samples. We develop consequences of our general theory for three canonical examples of incomplete-data problems: mixture of Gaussians, mixture of regressions, and linear regression with covariates missing completely at random. In each case, our theory guarantees that with a suitable initialization, a relatively small number of EM (or gradient EM) steps will yield (with high probability) an estimate that is within statistical error of the MLE. We provide simulations to confirm this theoretically predicted behavior.
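The mixture of Gaussians is one of the paper's canonical examples. The sketch below runs sample-based EM for the symmetric two-component mixture 0.5*N(theta, sigma^2) + 0.5*N(-theta, sigma^2), where the E-step weight of the +theta component at x is sigmoid(2*theta*x/sigma^2) and the M-step collapses to a one-line closed form. It also shows that theta = 0 is a fixed point of EM, which is why a domain-of-attraction analysis around the true maximizer is needed. The closed-form update is standard for this model; the code and variable names are our illustrative sketch.

```python
import numpy as np

def em_symmetric_gmm(x, theta0, sigma=1.0, n_iter=200):
    """Sample-based EM for 0.5*N(theta, sigma^2) + 0.5*N(-theta, sigma^2).

    Posterior weight of the +theta component at x is sigmoid(2*theta*x/sigma^2),
    so the M-step reduces to theta <- mean(tanh(theta*x/sigma^2) * x).
    """
    theta = theta0
    for _ in range(n_iter):
        theta = np.mean(np.tanh(theta * x / sigma**2) * x)
    return theta

rng = np.random.default_rng(1)
signs = rng.choice([-1.0, 1.0], size=5000)
x = 2.0 * signs + rng.normal(size=5000)        # true theta = 2, sigma = 1

theta_good = em_symmetric_gmm(x, theta0=0.5)   # inside the basin of attraction
theta_bad = em_symmetric_gmm(x, theta0=0.0)    # theta = 0 is a fixed point of EM
```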
Methods of Moments for Learning Stochastic Languages: Unified Presentation and Empirical Comparison
Cited by 4 (1 self)
Abstract
Probabilistic latent-variable models are a powerful tool for modelling structured data. However, traditional expectation-maximization methods of learning such models are both computationally expensive and prone to local minima. In contrast to these traditional methods, recently developed learning algorithms based on the method of moments are computationally efficient and provide strong statistical guarantees. In this work we provide a unified presentation and empirical comparison of three general moment-based methods in the context of modelling stochastic languages. By recasting these methods on a common theoretical footing, introducing novel theoretical results where necessary, we provide a clear comparison that makes explicit the statistical assumptions on which each method relies. With this theoretical grounding, we then provide an in-depth empirical analysis of the methods on both real and synthetic data, with the goal of elucidating performance trends and highlighting important implementation details.
Spectral Methods for Supervised Topic Models
Cited by 4 (1 self)
Abstract
Supervised topic models simultaneously model the latent topic structure of large collections of documents and a response variable associated with each document. Existing inference methods are based on either variational approximation or Monte Carlo sampling. This paper presents a novel spectral decomposition algorithm to recover the parameters of supervised latent Dirichlet allocation (sLDA) models. The Spectral-sLDA algorithm is provably correct and computationally efficient. We prove a sample complexity bound and subsequently derive a sufficient condition for the identifiability of sLDA. Thorough experiments on a diverse range of synthetic and real-world datasets verify the theory and demonstrate the practical effectiveness of the algorithm.
Contrastive Learning Using Spectral Methods
Cited by 4 (1 self)
Abstract
In many natural settings, the analysis goal is not to characterize a single data set in isolation, but rather to understand the difference between one set of observations and another. For example, given a background corpus of news articles together with writings of a particular author, one may want a topic model that explains word patterns and themes specific to the author. Another example comes from genomics, in which biological signals may be collected from different regions of a genome, and one wants a model that captures the differential statistics observed in these regions. This paper formalizes this notion of contrastive learning for mixture models, and develops spectral algorithms for inferring mixture components specific to a foreground data set when contrasted with a background data set. The method builds on recent moment-based estimators and tensor decompositions for latent variable models, and has the intuitive feature of using background data statistics to appropriately modify moments estimated from foreground data. A key advantage of the method is that the background data need only be coarsely modeled, which is important when the background is too complex, noisy, or not of interest. The method is demonstrated on applications in contrastive topic modeling and genomic sequence analysis.
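A minimal sketch of the background-correction idea: if a fraction alpha of the foreground data follows the background distribution, subtracting alpha times the background second moment isolates the moment of the foreground-specific component, whose leading eigenvector recovers the planted direction. The mixture construction, the value of alpha, and the planted direction v are illustrative assumptions, not the paper's setup (which works with higher-order tensor moments).

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 20, 50000
v = np.zeros(d); v[0] = 1.0        # foreground-specific direction (kept for checking)
alpha = 0.7                        # fraction of foreground drawn from the background

bg = rng.normal(size=(n, d))       # background samples: standard Gaussian
from_bg = rng.random(n) < alpha    # foreground = alpha*background + (1-alpha)*specific
fg = np.where(from_bg[:, None],
              rng.normal(size=(n, d)),
              3.0 * v + 0.3 * rng.normal(size=(n, d)))

M_fg = fg.T @ fg / n               # foreground second moment
M_bg = bg.T @ bg / n               # coarse background statistics
M_contrast = M_fg - alpha * M_bg   # background-corrected moment
_, U = np.linalg.eigh(M_contrast)  # eigh returns eigenvalues in ascending order
top = U[:, -1]                     # leading eigenvector of the corrected moment
alignment = abs(top @ v)
```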
High dimensional expectation-maximization algorithm: Statistical optimization and asymptotic normality. arXiv preprint arXiv:1412.8729, 2014
Cited by 2 (0 self)
Abstract
We provide a general theory of the expectation-maximization (EM) algorithm for inferring high dimensional latent variable models. In particular, we make two contributions: (i) For parameter estimation, we propose a novel high dimensional EM algorithm which naturally incorporates sparsity structure into parameter estimation. With an appropriate initialization, this algorithm converges at a geometric rate and attains an estimator with the (near-)optimal statistical rate of convergence. (ii) Based on the obtained estimator, we propose new inferential procedures for testing hypotheses and constructing confidence intervals for low dimensional components of high dimensional parameters. For a broad family of statistical models, our framework establishes the first computationally feasible approach for optimal estimation and asymptotic inference in high dimensions. Our theory is supported by thorough numerical results.
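A hedged sketch of the sparsity mechanism: EM for a symmetric Gaussian mixture with an s-sparse mean, where a hard-thresholding (truncation) step after each M-step keeps the iterate s-sparse. The model, the warm-start initialization (a stand-in for the "appropriate initialization" the theory requires), and all names are our own illustrative choices, not the paper's algorithm.

```python
import numpy as np

def truncate(v, s):
    """Keep the s largest-magnitude entries of v; zero the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-s:]
    out[keep] = v[keep]
    return out

def sparse_em(X, theta0, s, n_iter=50):
    """EM for 0.5*N(theta, I) + 0.5*N(-theta, I) with s-sparse theta.

    A truncation step after each M-step keeps the iterate s-sparse,
    which is the kind of mechanism the abstract describes.
    """
    theta = truncate(theta0, s)
    for _ in range(n_iter):
        w = np.tanh(X @ theta)                         # E-step weights (sigma = 1)
        theta = truncate((w[:, None] * X).mean(0), s)  # M-step + truncation
    return theta

rng = np.random.default_rng(3)
d, s, n = 100, 5, 2000
theta_true = np.zeros(d); theta_true[:s] = 2.0
X = rng.choice([-1.0, 1.0], size=n)[:, None] * theta_true + rng.normal(size=(n, d))
# Warm start standing in for the initialization the theory requires:
theta0 = 0.3 * theta_true + 0.1 * rng.normal(size=d)
theta_hat = sparse_em(X, theta0, s)
err = np.linalg.norm(theta_hat - theta_true)
support = set(np.flatnonzero(theta_hat))
```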
Estimating Latent-Variable Graphical Models using Moments and Likelihoods
Cited by 1 (0 self)
Abstract
Recent work on the method of moments enables consistent parameter estimation, but only for certain types of latent-variable models. On the other hand, pure likelihood objectives, though more universally applicable, are difficult to optimize. In this work, we show that using the method of moments in conjunction with composite likelihood yields consistent parameter estimates for a much broader class of discrete directed and undirected graphical models, including loopy graphs with high treewidth. Specifically, we use tensor factorization to reveal information about the hidden variables. This allows us to construct convex likelihoods which can be globally optimized to recover the parameters.
A convex formulation for mixed regression: Near optimal rates in the face of noise, 2013
Estimating Mixture Models via Mixtures of Polynomials
Abstract
Mixture modeling is a general technique for making any simple model more expressive through weighted combination. This generality and simplicity in part explains the success of the expectation-maximization (EM) algorithm, in which updates are easy to derive for a wide class of mixture models. However, the likelihood of a mixture model is non-convex, so EM has no known global convergence guarantees. Recently, method-of-moments approaches have offered global guarantees for some mixture models, but they do not extend easily to the range of mixture models that exist. In this work, we present Polymom, a unifying framework based on the method of moments in which estimation procedures are easily derivable, just as in EM. Polymom is applicable when the moments of a single mixture component are polynomials of the parameters. Our key observation is that the moments of the mixture model are a mixture of these polynomials, which allows us to cast estimation as a Generalized Moment Problem. We solve its relaxations using semidefinite optimization, and then extract parameters using ideas from computer algebra. This framework allows us to draw insights and apply tools from convex optimization, computer algebra, and the theory of moments to study problems in statistical estimation. Simulations show good empirical performance on several models.
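The key observation, that mixture moments are mixtures of component-moment polynomials, is easiest to see for a mixture of two point masses, where the moment equations m_k = pi*a^k + (1-pi)*b^k can be solved exactly by the classical Prony method (a linear solve plus one polynomial root-finding step). This toy computation is our illustration; Polymom itself handles general polynomial moments via semidefinite relaxations, which we do not reproduce here.

```python
import numpy as np

# True parameters of a mixture of two point masses at a and b
pi_, a, b = 0.3, 2.0, -1.0
# The mixture's moments m_k = pi*a^k + (1-pi)*b^k are pi-weighted
# mixtures of the component-moment polynomials a^k and b^k.
m = np.array([pi_ * a**k + (1 - pi_) * b**k for k in range(4)])

# Prony step: a and b are the roots of x^2 + p1*x + p0, where the
# moments satisfy the recurrence m_{k+2} + p1*m_{k+1} + p0*m_k = 0.
A = np.array([[m[1], m[0]],
              [m[2], m[1]]])
p1, p0 = np.linalg.solve(A, -m[2:4])
roots = np.sort(np.roots([1.0, p1, p0]))    # recovered support points
a_hat, b_hat = roots[1], roots[0]
pi_hat = (m[1] - b_hat) / (a_hat - b_hat)   # from m_1 = pi*a + (1-pi)*b
```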
Coordinate-descent for learning orthogonal matrices through Givens rotations
Abstract
Optimizing over the set of orthogonal matrices is a central component in problems like sparse PCA or tensor decomposition. Unfortunately, such optimization is hard since simple operations on orthogonal matrices easily break orthogonality, and correcting orthogonality usually costs a large amount of computation. Here we propose a framework for optimizing orthogonal matrices that is the parallel of coordinate descent in Euclidean spaces. It is based on Givens rotations, a fast-to-compute operation that affects a small number of entries in the learned matrix and preserves orthogonality. We show two applications of this approach: an algorithm for tensor decompositions used in learning mixture models, and an algorithm for sparse PCA. We study the parameter regime where a Givens rotation approach converges faster and achieves a superior model on a genome-wide, brain-wide mRNA expression dataset.
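The classical Jacobi eigenvalue method is a textbook instance of this kind of Givens-rotation coordinate descent: each step rotates one (p, q) plane, zeroes one off-diagonal entry, and keeps the accumulated matrix exactly orthogonal by construction. The sketch below is our minimal illustration of that mechanism, not the paper's algorithm.

```python
import numpy as np

def jacobi_diagonalize(S, sweeps=10):
    """Diagonalize a symmetric matrix by coordinate descent with Givens
    rotations (the classical cyclic Jacobi eigenvalue method).

    Each rotation touches only the (p, q) plane, annihilates S[p, q],
    and preserves orthogonality of the accumulated matrix Q exactly.
    """
    S = S.astype(float).copy()
    n = S.shape[0]
    Q = np.eye(n)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(S[p, q]) < 1e-13:
                    continue
                # small-angle rotation that zeroes S[p, q]
                denom = S[p, p] - S[q, q]
                if denom == 0.0:
                    theta = np.pi / 4 * np.sign(S[p, q])
                else:
                    theta = 0.5 * np.arctan(2.0 * S[p, q] / denom)
                c, s = np.cos(theta), np.sin(theta)
                G = np.eye(n)
                G[p, p] = c; G[q, q] = c
                G[p, q] = -s; G[q, p] = s
                S = G.T @ S @ G      # full-matrix update, kept simple for clarity
                Q = Q @ G
    return Q, np.diag(S)

rng = np.random.default_rng(4)
A = rng.normal(size=(6, 6))
Sym = (A + A.T) / 2
Q, d = jacobi_diagonalize(Sym)
ortho_err = np.abs(Q.T @ Q - np.eye(6)).max()
recon_err = np.abs(Q @ np.diag(d) @ Q.T - Sym).max()
```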