Results 1–10 of 83
A Unified View of Matrix Factorization Models
Cited by 58 (0 self)
Abstract. We present a unified view of matrix factorization that frames the differences among popular methods, such as NMF, Weighted SVD, E-PCA, MMMF, pLSI, pLSI-pHITS, Bregman co-clustering, and many others, in terms of a small number of modeling choices. Many of these approaches can be viewed as minimizing a generalized Bregman divergence, and we show that (i) a straightforward alternating-projection algorithm can be applied to almost any model in our unified view; (ii) the Hessian for each projection has special structure that makes a Newton projection feasible, even when there are equality constraints on the factors, which allows for matrix co-clustering; and (iii) alternating projections can be generalized to simultaneously factor a set of matrices that share dimensions. These observations immediately yield new optimization algorithms for the above factorization methods, and suggest novel generalizations of these methods, such as incorporating row and column biases and adding or relaxing clustering constraints.
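A minimal sketch of the alternating-projection idea this abstract describes, under the simplest Bregman divergence (squared loss), where each "projection" reduces to a least-squares solve for one factor with the other held fixed. This is an illustrative toy, not the paper's code; all names and sizes are made up.

```python
import numpy as np

# Illustrative alternating minimization for X ~ U @ V under squared loss,
# the simplest Bregman divergence in the unified view.
rng = np.random.default_rng(0)
m, n, k = 20, 15, 3
X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank-k data

U = rng.standard_normal((m, k))
V = rng.standard_normal((k, n))
for _ in range(100):
    # Projection onto U with V fixed: solve min_U ||X - U V||_F^2.
    U = np.linalg.lstsq(V.T, X.T, rcond=None)[0].T
    # Projection onto V with U fixed: solve min_V ||X - U V||_F^2.
    V = np.linalg.lstsq(U, X, rcond=None)[0]

residual = np.linalg.norm(X - U @ V)
```

Other losses in the unified view (e.g. KL divergence for NMF or pLSI) change what each projection minimizes, but the alternating structure stays the same.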
A comparison of optimization methods and software for large-scale L1-regularized linear classification
 The Journal of Machine Learning Research
Cited by 54 (7 self)
Large-scale linear classification is widely used in many areas. The L1-regularized form can be applied for feature selection; however, its non-differentiability causes more difficulties in training. Although various optimization methods have been proposed in recent years, they have not yet been suitably compared. In this paper, we first broadly review existing methods. Then, we discuss state-of-the-art software packages in detail and propose two efficient implementations. Extensive comparisons indicate that carefully implemented coordinate descent methods are very suitable for training large document data.
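To illustrate why coordinate descent suits L1-regularized problems, here is a hypothetical sketch of cyclic coordinate descent for the Lasso (squared loss rather than the logistic loss the survey benchmarks): each one-dimensional subproblem has a closed-form soft-thresholding solution, so the non-differentiability is handled exactly. Function names are illustrative; real packages add shrinking heuristics and caching.

```python
import numpy as np

def soft_threshold(z, t):
    """Closed-form solution of the 1-D quadratic-plus-L1 subproblem."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cd_lasso(A, b, lam, n_iter=200):
    """Cyclic coordinate descent for min_x 0.5*||A x - b||^2 + lam*||x||_1."""
    n = A.shape[1]
    x = np.zeros(n)
    col_sq = (A ** 2).sum(axis=0)      # per-coordinate curvature ||A_j||^2
    r = b - A @ x                      # residual, kept up to date
    for _ in range(n_iter):
        for j in range(n):
            r += A[:, j] * x[j]        # remove coordinate j's contribution
            x[j] = soft_threshold(A[:, j] @ r, lam) / col_sq[j]
            r -= A[:, j] * x[j]        # add it back with the new value
    return x
```

Maintaining the residual `r` makes each coordinate update O(m), which is the property that makes these methods cheap on large sparse document data.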
Giannakis et al., “Distributed sparse linear regression”
 IEEE Trans. Signal Process., 2010
Cited by 46 (8 self)
Abstract—The Lasso is a popular technique for joint estimation and continuous variable selection, especially well-suited for sparse and possibly underdetermined linear regression problems. This paper develops algorithms to estimate the regression coefficients via the Lasso when the training data are distributed across different agents and their communication to a central processing unit is prohibited for, e.g., communication-cost or privacy reasons. A motivating application is explored in the context of wireless communications, whereby sensing cognitive radios collaborate to estimate the radio-frequency power spectral density. Attaining different tradeoffs between complexity and convergence speed, three novel algorithms are obtained after reformulating the Lasso into a separable form, which is iteratively minimized using the alternating-direction method of multipliers so as to gain the desired degree of parallelization. Interestingly, the per-agent estimate updates are given by simple soft-thresholding operations, and inter-agent communication overhead remains at an affordable level. Without exchanging elements from the different training sets, the local estimates reach consensus on the global Lasso solution, i.e., the fit that would be obtained if the entire data set were centrally available. Numerical experiments with both simulated and real data demonstrate the merits of the proposed distributed schemes, corroborating their convergence and global optimality. The ideas in this paper can easily be extended to fit related models in a distributed fashion, including the adaptive Lasso, elastic net, fused Lasso, and nonnegative garrote. Index Terms—Distributed linear regression, Lasso, parallel optimization, sparse estimation.
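The soft-thresholding update the abstract highlights comes from the ADMM splitting itself. A single-agent sketch of that splitting (not the paper's distributed algorithm; names and parameters are illustrative) shows where it appears:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, lam, rho=1.0, n_iter=300):
    """Centralized ADMM for min 0.5||Ax-b||^2 + lam||z||_1  s.t. x = z.
    The z-update is the soft-thresholding step; in the distributed setting
    each agent performs an analogous local update."""
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    L = np.linalg.cholesky(AtA + rho * np.eye(n))   # cache the factorization
    for _ in range(n_iter):
        # x-update: ridge-like solve (AtA + rho I) x = Atb + rho (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        z = soft_threshold(x + u, lam / rho)        # proximal step for L1
        u = u + x - z                               # dual ascent on x = z
    return z
```

Because the quadratic system matrix never changes, its Cholesky factor is computed once, so each iteration costs only two triangular solves plus elementwise operations.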
Graph regularized sparse coding for image representation
 IEEE Transactions on Image Processing, 2011
Cited by 30 (1 self)
Abstract—Sparse coding has received an increasing amount of interest in recent years. It is an unsupervised learning algorithm which finds a basis set capturing high-level semantics in the data and learns sparse coordinates in terms of that basis set. Originally applied to modeling the human visual cortex, sparse coding has been shown useful for many applications. However, most existing approaches to sparse coding fail to consider the geometrical structure of the data space. In many real applications, the data are more likely to reside on a low-dimensional submanifold embedded in the high-dimensional ambient space. It has been shown that the geometrical information of the data is important for discrimination. In this paper, we propose a graph-based algorithm, called graph regularized sparse coding, to learn sparse representations that explicitly take into account the local manifold structure of the data. By using the graph Laplacian as a smoothing operator, the obtained sparse representations vary smoothly along the geodesics of the data manifold. Extensive experimental results on image classification and clustering demonstrate the effectiveness of the proposed algorithm. Index Terms—Image classification, image clustering, manifold learning, sparse coding.
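A hypothetical sketch of the kind of objective the abstract describes (symbol names and weights are illustrative, not the paper's notation): reconstruction error plus an L1 sparsity penalty plus a graph-Laplacian smoothness term, which equals a weighted sum of squared differences between the codes of connected samples.

```python
import numpy as np

def graphsc_objective(X, D, S, W, alpha, beta):
    """Evaluate ||X - D S||_F^2 + alpha*tr(S L S^T) + beta*||S||_1,
    where L = Deg - W is the Laplacian of a sample-affinity graph W.
    tr(S L S^T) = 0.5 * sum_ij W_ij ||s_i - s_j||^2, so it is small
    when connected samples receive similar codes."""
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian
    recon = np.linalg.norm(X - D @ S) ** 2  # reconstruction error
    smooth = np.trace(S @ L @ S.T)          # manifold smoothness
    sparse = np.abs(S).sum()                # sparsity penalty
    return recon + alpha * smooth + beta * sparse
```

The Laplacian identity in the docstring is why minimizing this objective makes the learned codes vary smoothly over the graph approximating the data manifold.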
Heterogeneous Multitask Learning with Joint Sparsity Constraints
Cited by 27 (0 self)
Multitask learning addresses the problem of learning related tasks that presumably share some commonalities in their input-output mapping functions. Previous approaches to multitask learning usually deal with homogeneous tasks, such as purely regression tasks or entirely classification tasks. In this paper, we consider the problem of learning multiple related tasks of predicting both continuous and discrete outputs from a common set of input variables that lie in a high-dimensional feature space. All of the tasks are related in the sense that they share the same set of relevant input variables, but the amount of influence of each input on different outputs may vary. We formulate this problem as a combination of linear regressions and logistic regressions, and model the joint sparsity as the L1/L∞ or L1/L2 norm of the model parameters. Among several possible applications, our approach addresses an important open problem in genetic association mapping, where the goal is to discover genetic markers that influence multiple correlated traits jointly. In our experiments, we demonstrate our method in this setting, using simulated and clinical asthma datasets, and we show that our method can effectively recover the relevant inputs with respect to all of the tasks.
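The joint-sparsity penalties mentioned above are easy to state concretely. For a coefficient matrix with one row per input variable and one column per task (an illustrative layout, not necessarily the paper's convention), an L1/Lq norm sums a per-row Lq norm, so a whole row, i.e. one input across all tasks, is either kept or driven to zero together:

```python
import numpy as np

def l1_linf(B):
    """L1/Linf mixed norm: sum over rows of the max-magnitude entry."""
    return np.abs(B).max(axis=1).sum()

def l1_l2(B):
    """L1/L2 mixed norm (group lasso): sum over rows of the Euclidean norm."""
    return np.sqrt((B ** 2).sum(axis=1)).sum()
```

Both penalties charge an input once no matter how many tasks use it, which is what encourages the tasks to agree on a common set of relevant inputs.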
An improved GLMNET for L1-regularized logistic regression and support vector machines
 Technical report, 2011
Cited by 25 (2 self)
GLMNET, proposed by Friedman et al. [1], is an algorithm for generalized linear models with elastic-net regularization. It has been widely applied to solve L1-regularized logistic regression. However, recent experiments indicated that the existing GLMNET implementation may not be stable for large-scale problems. In this paper, we propose an improved GLMNET to address some theoretical and implementation issues. In particular, as a Newton-type method, GLMNET achieves fast local convergence but may fail to quickly obtain a useful solution. By a careful design to adjust the effort spent on each iteration, our method is efficient whether the optimization problem is solved loosely or strictly. Experiments demonstrate that the improved GLMNET is more efficient than a state-of-the-art coordinate descent method.
Trajectory Prediction: Learning to Map Situations to Robot Trajectories
Cited by 19 (1 self)
Trajectory planning and optimization is a fundamental problem in articulated robotics. Algorithms typically used for this problem compute optimal trajectories from scratch in each new situation. In effect, extensive data is accumulated containing situations together with the respective optimized trajectories, but in practice this data is hardly exploited. The aim of this paper is to learn from this data. Given a new situation, we want to predict a suitable trajectory which needs only minor refinement by a conventional optimizer. Our approach has two essential ingredients. First, to generalize from previous situations to new ones we need an appropriate situation descriptor; we propose a sparse feature selection approach to find such well-generalizing features of situations. Second, the transfer of previously optimized trajectories to a new situation should not be made in joint-angle space; we propose a more efficient task-space transfer of old trajectories to new situations. Experiments on a simulated humanoid reaching problem show that we can predict reasonable motion prototypes in new situations, for which the refinement is much faster than an optimization from scratch.
Bayesian and L1 approaches to sparse unsupervised learning
 arXiv preprint arXiv:1106.1157, 2011
Cited by 18 (2 self)
The use of L1 regularisation for sparse learning has generated immense research interest, with many successful applications in diverse areas such as signal acquisition, image coding, genomics, and collaborative filtering. While existing work highlights the many advantages of L1 methods, in this paper we find that L1 regularisation often dramatically underperforms in terms of predictive performance when compared to other methods for inferring sparsity. We focus on unsupervised latent variable models, and develop L1-minimising factor models, Bayesian variants of “L1”, and Bayesian models with a stronger L0-like sparsity induced through spike-and-slab distributions. These spike-and-slab Bayesian factor models encourage sparsity while accounting for uncertainty in a principled manner and avoid unnecessary shrinkage of nonzero values. We demonstrate on a number of data sets that in practice spike-and-slab Bayesian methods outperform L1 minimisation, even on a computational budget. We thus highlight the need to reassess the wide use of L1 methods in sparsity-reliant applications, particularly when we care about generalising to previously unseen data, and provide an alternative that, over many varying conditions, provides improved generalisation performance.
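To make the spike-and-slab contrast with L1 concrete, here is an illustrative draw from such a prior (parameter names are made up, and this is a sampling toy, not the paper's inference scheme): each weight is exactly zero with probability 1 − π (the "spike") and Gaussian with probability π (the "slab"), giving L0-style sparsity without shrinking the nonzero values the way an L1 penalty does.

```python
import numpy as np

# Illustrative spike-and-slab draw: z_i ~ Bernoulli(pi) selects the slab,
# and w_i = z_i * Normal(0, sigma^2), so excluded weights are exactly zero.
rng = np.random.default_rng(42)
pi, sigma, n = 0.1, 2.0, 10_000
z = rng.random(n) < pi                  # inclusion indicators
w = z * rng.normal(0.0, sigma, n)       # slab values where included

sparsity = 1.0 - z.mean()               # fraction of exact zeros
```

Note that the included weights keep the full slab scale sigma, whereas soft-thresholding under L1 would bias every surviving coefficient toward zero.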