Results 1  10
of
93
Proximal Methods for Hierarchical Sparse Coding
, 2010
"... Sparse coding consists in representing signals as sparse linear combinations of atoms selected from a dictionary. We consider an extension of this framework where the atoms are further assumed to be embedded in a tree. This is achieved using a recently introduced treestructured sparse regularizatio ..."
Abstract

Cited by 83 (18 self)
 Add to MetaCart
Sparse coding consists in representing signals as sparse linear combinations of atoms selected from a dictionary. We consider an extension of this framework where the atoms are further assumed to be embedded in a tree. This is achieved using a recently introduced treestructured sparse regularization norm, which has proven useful in several applications. This norm leads to regularized problems that are difficult to optimize, and we propose in this paper efficient algorithms for solving them. More precisely, we show that the proximal operator associated with this norm is computable exactly via a dual approach that can be viewed as the composition of elementary proximal operators. Our procedure has a complexity linear, or close to linear, in the number of atoms, and allows the use of accelerated gradient techniques to solve the treestructured sparse approximation problem at the same computational cost as traditional ones using the ℓ1norm. Our method is efficient and scales gracefully to millions of variables, which we illustrate in two types of applications: first, we consider fixed hierarchical dictionaries of wavelets to denoise natural images. Then, we apply our optimization tools in the context of dictionary learning, where learned dictionary elements naturally organize in a prespecified arborescent structure, leading to a better performance in reconstruction of natural image patches. When applied to text documents, our method learns hierarchies of topics, thus providing a competitive alternative to probabilistic topic models.
Partially labeled topic models for interpretable text mining
 In Proceedings of KDD
, 2011
"... Much of the world’s electronic text is annotated with humaninterpretable labels, such as tags on web pages and subject codes on academic publications. Effective text mining in this setting requires models that can flexibly account for the textual patterns that underlie the observed labels while stil ..."
Abstract

Cited by 29 (2 self)
 Add to MetaCart
(Show Context)
Much of the world’s electronic text is annotated with humaninterpretable labels, such as tags on web pages and subject codes on academic publications. Effective text mining in this setting requires models that can flexibly account for the textual patterns that underlie the observed labels while still discovering unlabeled topics. Neither supervised classification, with its focus on label prediction, nor purely unsupervised learning, which does not model the labels explicitly, is appropriate. In this paper, we present two new partially supervised generative models of labeled text, Partially Labeled Dirichlet Allocation (PLDA) and the Partially Labeled Dirichlet Process (PLDP). These models make use of the unsupervised learning machinery of topic models to discover the hidden topics within each label, as well as unlabeled, corpuswide latent topics. We explore applications with qualitative case studies of tagged web pages from del.icio.us and PhD dissertation abstracts, demonstrating improved model interpretability over traditional topic models. We use the many tags present in our del.icio.us dataset to quantitatively demonstrate the new models ’ higher correlation with human relatedness scores over several strong baselines.
Predictive subspace learning for multiview data: a large margin approach
 In NIPS
, 2010
"... Learning from multiview data is important in many applications, such as image classification and annotation. In this paper, we present a largemargin learning framework to discover a predictive latent subspace representation shared by multiple views. Our approach is based on an undirected latent s ..."
Abstract

Cited by 27 (8 self)
 Add to MetaCart
(Show Context)
Learning from multiview data is important in many applications, such as image classification and annotation. In this paper, we present a largemargin learning framework to discover a predictive latent subspace representation shared by multiple views. Our approach is based on an undirected latent space Markov network that fulfills a weak conditional independence assumption that multiview observations and response variables are independent given a set of latent variables. We provide efficient inference and parameter estimation methods for the latent subspace model. Finally, we demonstrate the advantages of largemargin learning on real video and web image data for discovering predictive latent representations and improving the performance on image classification, annotation and retrieval. 1
MaxMargin Nonparametric Latent Feature Models for Link Prediction
"... We present a maxmargin nonparametric latent feature relational model, which unites the ideas of maxmargin learning and Bayesian nonparametrics to discover discriminative latent features for link prediction and automatically infer the unknown latent social dimension. By minimizing a hingeloss usi ..."
Abstract

Cited by 21 (9 self)
 Add to MetaCart
(Show Context)
We present a maxmargin nonparametric latent feature relational model, which unites the ideas of maxmargin learning and Bayesian nonparametrics to discover discriminative latent features for link prediction and automatically infer the unknown latent social dimension. By minimizing a hingeloss using the linear expectation operator, we can perform posterior inference efficiently without dealing with a highly nonlinear link likelihood function; by using a fullyBayesian formulation, we can avoid tuning regularization constants. Experimental results on real datasets appear to demonstrate the benefits inherited from maxmargin learning and fullyBayesian nonparametric inference. 1.
Infinite Latent SVM for Classification and Multitask Learning
"... Unlike existing nonparametric Bayesian models, which rely solely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations, we study nonparametric Bayesian inference with regularization on the desired posterior distributions. While priors can indir ..."
Abstract

Cited by 21 (12 self)
 Add to MetaCart
(Show Context)
Unlike existing nonparametric Bayesian models, which rely solely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations, we study nonparametric Bayesian inference with regularization on the desired posterior distributions. While priors can indirectly affect posterior distributions through Bayes ’ theorem, imposing posterior regularization is arguably more direct and in some cases can be much easier. We particularly focus on developing infinite latent support vector machines (iLSVM) and multitask infinite latent support vector machines (MTiLSVM), which explore the largemargin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multitask learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets. Our results appear to demonstrate the merits inherited from both largemargin learning and Bayesian nonparametrics. 1
Sparse topical coding
 in 27th Conference on Uncertainty in Artificial Intelligence (UAI
, 2011
"... We present sparse topical coding (STC), a nonprobabilistic formulation of topic models for discovering latent representations of large collections of data. Unlike probabilistic topic models, STC relaxes the normalization constraint of admixture proportions and the constraint of defining a normal ..."
Abstract

Cited by 18 (5 self)
 Add to MetaCart
(Show Context)
We present sparse topical coding (STC), a nonprobabilistic formulation of topic models for discovering latent representations of large collections of data. Unlike probabilistic topic models, STC relaxes the normalization constraint of admixture proportions and the constraint of defining a normalized likelihood function. Such relaxations make STC amenable to: 1) directly control the sparsity of inferred representations by using sparsityinducing regularizers; 2) be seamlessly integrated with a convex error function (e.g., SVM hinge loss) for supervised learning; and 3) be efficiently learned with a simply structured coordinate descent algorithm. Our results demonstrate the advantages of STC and supervised MedSTC on identifying topical meanings of words and improving classification accuracy and time efficiency. 1
Probabilistic Mining of SocioGeographic Routines from Mobile Phone Data
"... There is relatively little work on the investigation of largescale human data in terms of multimodality for human activity discovery. In this paper we suggest that human interaction data, or human proximity, obtained by mobile phone Bluetooth sensor data, can be integrated with human location data, ..."
Abstract

Cited by 17 (7 self)
 Add to MetaCart
(Show Context)
There is relatively little work on the investigation of largescale human data in terms of multimodality for human activity discovery. In this paper we suggest that human interaction data, or human proximity, obtained by mobile phone Bluetooth sensor data, can be integrated with human location data, obtained by mobile cell tower connections, to mine meaningful details about human activities from large and noisy datasets. We propose a model, called bag of multimodal behavior, that integrates the modeling of variations of location over multiple timescales, and the modeling of interaction types from proximity. Our representation is simple yet robust to characterize reallife human behavior sensed from mobile phones, which are devices capable of capturing largescale data known to be noisy and incomplete. We use an unsupervised approach, based on probabilistic topic models, to discover latent human activities in terms of the joint interaction and location behaviors of 97 individuals over the course of approximately a 10 month period using data from MIT’s Reality Mining project. Some of the human activities discovered with our multimodal data representation include “going out from 7pmmidnight alone ” and “working from 11am5pm with 35 other people”, further finding that this activity dominantly occurs on specific days of the week. Our methodology also finds dominant work patterns occurring on other days of the week. We further demonstrate the feasibility of the topic modeling framework to discover human routines to predict missing multimodal phone data on specific times of the day. Copyright (c) 2008 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubspermissions@ieee.org.
Constrained LDA for Grouping Product Features in Opinion Mining
"... Abstract. In opinion mining of product reviews, one often wants to produce a summary of opinions based on product features/attributes. However, for the same feature, people can express it with different words and phrases. To produce an effective summary, these words and phrases, which are domain syn ..."
Abstract

Cited by 16 (4 self)
 Add to MetaCart
(Show Context)
Abstract. In opinion mining of product reviews, one often wants to produce a summary of opinions based on product features/attributes. However, for the same feature, people can express it with different words and phrases. To produce an effective summary, these words and phrases, which are domain synonyms, need to be grouped under the same feature. Topic modeling is a suitable method for the task. However, instead of simply letting topic modeling find groupings freely, we believe it is possible to do better by giving it some preexisting knowledge in the form of automatically extracted constraints. In this paper, we first extend a popular topic modeling method, called LDA, with the ability to process large scale constraints. Then, two novel methods are proposed to extract two types of constraints automatically. Finally, the resulting constrainedLDA and the extracted constraints are applied to group product features. Experiments show that constrainedLDA outperforms the original LDA and the latest mLSA by a large margin.
Maximum Entropy Discrimination Markov Networks
, 2008
"... Standard maxmargin structured prediction methods concentrate directly on the inputoutput mapping, and the lack of an elegant probabilistic interpretation causes limitations. In this paper, we present a novel framework called Maximum Entropy Discrimination Markov Networks (MaxEntNet) to do Bayesian ..."
Abstract

Cited by 15 (8 self)
 Add to MetaCart
Standard maxmargin structured prediction methods concentrate directly on the inputoutput mapping, and the lack of an elegant probabilistic interpretation causes limitations. In this paper, we present a novel framework called Maximum Entropy Discrimination Markov Networks (MaxEntNet) to do Bayesian maxmargin structured learning by using expected margin constraints to define a feasible distribution subspace and applying the maximum entropy principle to choose the best distribution from this subspace. We show that MaxEntNet subsumes the standard maxmargin Markov networks (M 3 N) as a spacial case where the predictive model is assumed to be linear and the parameter prior is a standard normal. Based on this understanding, we propose the Laplace maxmargin Markov networks (LapM 3 N) which use the Laplace prior instead of the standard normal. We show that the adoption of a Laplace prior of the parameter makes LapM 3 N enjoy properties expected from a sparsified M 3 N. Unlike the L1regularized maximum likelihood estimation which sets small weights to zeros to achieve sparsity, LapM 3 N posteriorly weights the parameters and features with smaller weights are shrunk more. This posterior weighting effect makes LapM 3 N more stable with respect to the magnitudes of the regularization coefficients and more generalizable. To
Bayesian inference with posterior regularization and applications to infinite latent svms
 In arXiv:1210.1766v2
, 2013
"... Existing Bayesian models, especially nonparametric Bayesian methods, rely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations. While priors affect posterior distributions through Bayes ’ rule, imposing posterior regularization is arguably mo ..."
Abstract

Cited by 14 (9 self)
 Add to MetaCart
(Show Context)
Existing Bayesian models, especially nonparametric Bayesian methods, rely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations. While priors affect posterior distributions through Bayes ’ rule, imposing posterior regularization is arguably more direct and in some cases more natural and general. In this paper, we present regularized Bayesian inference (RegBayes), a novel computational framework that performs posterior inference with a regularization term on the desired postdata posterior distribution under an information theoretical formulation. RegBayes is more flexible than the procedure that elicits expert knowledge via priors, and it covers both directed Bayesian networks and undirected Markov networks. When the regularization is induced from a linear operator on the posterior distributions, such as the expectation operator, we present a general convexanalysis theorem to characterize the solution of RegBayes. Furthermore, we present two concrete examples of RegBayes, infinite latent support vector machines (iLSVM) and multitask infinite latent support vector machines (MTiLSVM), which explore the largemargin idea in combination with a nonparametric Bayesian model for dis