Results 1–10 of 11
Guaranteed matrix completion via nonconvex factorization
, 2014
Abstract

Cited by 10 (0 self)
Matrix factorization is a popular approach to large-scale matrix completion and a basic component of many solutions to the Netflix Prize competition. In this approach, the unknown low-rank matrix is expressed as the product of two much smaller matrices, so the low-rank property is automatically fulfilled. The resulting optimization problem, even at huge scale, can be solved (to stationary points) very efficiently by standard optimization algorithms such as alternating minimization and stochastic gradient descent (SGD). However, because of the nonconvexity introduced by the factorization model, there is limited theoretical understanding of whether these algorithms will generate a good solution. In this paper, we establish a theoretical guarantee that the factorization-based formulation correctly recovers the underlying low-rank matrix. In particular, we show that under conditions similar to those in previous works, many standard optimization algorithms converge to the global optimum of the factorization-based formulation, thus recovering the true low-rank matrix. To the best of our knowledge, our result is the first to provide a recovery guarantee for many standard algorithms such as gradient descent, SGD, and block coordinate gradient descent. Our result also applies to alternating minimization; a notable difference from previous studies of alternating minimization is that we do not need a resampling scheme (i.e., using independent samples in each iteration).
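The factorization model in this abstract — writing the unknown matrix as a product of two thin factors and fitting the observed entries by SGD — can be sketched in a few lines. This is a generic textbook sketch, not the paper's implementation; the function name and hyperparameters (`lr`, `reg`, `epochs`) are illustrative:

```python
import numpy as np

def sgd_mf(ratings, n_rows, n_cols, rank=8, lr=0.01, reg=0.05, epochs=50, seed=0):
    """Fit M ~ U @ V.T from observed (i, j, value) triples by plain SGD."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_rows, rank))
    V = 0.1 * rng.standard_normal((n_cols, rank))
    for _ in range(epochs):
        rng.shuffle(ratings)                      # visit observations in random order
        for i, j, m_ij in ratings:
            err = m_ij - U[i] @ V[j]              # residual on this observed entry
            # simultaneous regularized updates of the two touched factor rows
            U[i], V[j] = (U[i] + lr * (err * V[j] - reg * U[i]),
                          V[j] + lr * (err * U[i] - reg * V[j]))
    return U, V
```

Only the factor rows touched by an observation are updated, which is what makes the method cheap per step and amenable to the parallel schemes discussed in the other abstracts on this page.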
Scalable variational bayesian matrix factorization with side information
 In Proc. of AISTATS
, 2014
Abstract

Cited by 1 (0 self)
Bayesian matrix factorization (BMF) is a popular method for collaborative prediction because of its robustness to overfitting and its freedom from cross-validation for fine-tuning regularization parameters. In practice, however, due to its cubic time complexity with respect to the rank of the factor matrices, existing variational inference algorithms for BMF are not well suited to web-scale datasets, where billions of ratings provided by millions of users are available. The time complexity increases further when side information, such as user binary implicit feedback or item content information, is incorporated into variational Bayesian matrix factorization (VBMF). For instance, a state-of-the-art approach to VBMF with side information is to place Gaussian priors on the user and item factor matrices, where the mean of each prior is regressed on the corresponding side information. Since this approach introduces additional cubic time complexity with respect to the size of the feature vectors, the use of rich side information in the form of high-dimensional feature vectors is prohibited. In this paper, we present a scalable inference algorithm for VBMF with side information whose complexity is linear in the rank K of the factor matrices. Moreover, the algorithm is easily parallelized on multi-core systems. Experiments on large-scale datasets demonstrate the useful behavior of our algorithm, including scalability, fast learning, and prediction accuracy.
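The prior structure described here — Gaussian priors on the factor matrices whose means are regressed on side information — can be written out as follows. The symbols (regression matrices $A$, $B$, feature vectors $\mathbf{x}_i$, $\mathbf{y}_j$, precisions $\lambda_u$, $\lambda_v$) are assumed notation for illustration, not necessarily the paper's:

```latex
R_{ij} \approx \mathbf{u}_i^{\top}\mathbf{v}_j, \qquad
\mathbf{u}_i \sim \mathcal{N}\!\big(A\,\mathbf{x}_i,\ \lambda_u^{-1} I_K\big), \qquad
\mathbf{v}_j \sim \mathcal{N}\!\big(B\,\mathbf{y}_j,\ \lambda_v^{-1} I_K\big)
```

Each prior mean is a linear map of the corresponding side-information features, which is where the additional cost in the feature dimensionality enters.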
A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization
Abstract
Stochastic gradient methods are effective for solving matrix factorization problems. However, it is well known that the performance of stochastic gradient methods depends heavily on the learning-rate schedule used; a good schedule can significantly speed up the training process. In this paper, motivated by past work on convex optimization that assigns a learning rate to each variable, we propose a new schedule for matrix factorization. Experiments demonstrate that the proposed schedule leads to faster convergence than existing ones. Our schedule uses the same parameter on all data sets in our experiments; that is, the time spent on learning-rate selection can be significantly reduced. By applying this schedule to a state-of-the-art matrix factorization package, the resulting implementation outperforms available parallel matrix factorization packages.
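A common per-variable schedule of the kind referenced here — each coordinate's step size shrinks with its own accumulated squared gradients, AdaGrad-style — might look like this for a single matrix factorization update. This is an illustrative sketch under that assumption, not the schedule the paper actually proposes:

```python
import numpy as np

def adagrad_mf_step(U, V, G_u, G_v, i, j, m_ij, eta0=0.1, reg=0.05, eps=1e-8):
    """One SGD step with a per-coordinate learning rate: each coordinate's
    step size is eta0 / sqrt(sum of its past squared gradients).
    Updates U, V and the gradient accumulators G_u, G_v in place."""
    err = m_ij - U[i] @ V[j]
    g_u = -(err * V[j] - reg * U[i])          # gradient w.r.t. row U[i]
    g_v = -(err * U[i] - reg * V[j])          # gradient w.r.t. row V[j]
    G_u[i] += g_u ** 2                        # accumulate squared gradients
    G_v[j] += g_v ** 2
    U[i] -= eta0 / np.sqrt(G_u[i] + eps) * g_u
    V[j] -= eta0 / np.sqrt(G_v[j] + eps) * g_v
```

Coordinates that receive large or frequent gradients automatically get smaller steps, which is the property such schedules exploit to avoid hand-tuning a global decay.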
Elastic Distributed Bayesian Collaborative Filtering
Abstract
In this paper, we consider learning a Bayesian collaborative filtering model on a shared cluster of commodity machines. Two main challenges arise: (1) How can we parallelize and distribute Bayesian collaborative filtering? (2) How can our distributed inference system handle elasticity events common in a shared, resource-managed cluster, including resource ramp-up, preemption, and stragglers? To parallelize Bayesian inference, we adapt ideas both from the matrix factorization partitioning schemes used with stochastic gradient descent and from the stale synchronous programming used with parameter servers. To handle elasticity events, we offer a generalization of previous partitioning schemes that gives increased flexibility during system disruptions. We additionally describe two new scheduling algorithms to dynamically route work at runtime. In our experiments, we compare the effectiveness of both scheduling algorithms and demonstrate their robustness to system failure.
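The grid-partitioning idea borrowed here from SGD-based matrix factorization — split the rating matrix into a p×p grid of blocks and, in each round, schedule p blocks that share no row-block or column-block, so workers never touch the same factor rows — can be sketched as below. This is a minimal illustration of that scheme, not the paper's generalized elastic version:

```python
def strata(p):
    """Yield p rounds of p mutually independent blocks of a p-by-p grid.
    Within a round, no two blocks share a row-block or a column-block,
    so p workers can update disjoint slices of U and V in parallel."""
    for shift in range(p):
        # round `shift`: block (r, c) with c diagonally offset from r
        yield [(r, (r + shift) % p) for r in range(p)]
```

Cycling through all p rounds covers every block of the grid exactly once, so one full cycle is equivalent to one sequential pass over the data.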
Scalable Parallel EM Algorithms for Latent Dirichlet Allocation in Multi-Core Systems
Abstract
Latent Dirichlet allocation (LDA) is a widely used probabilistic topic modeling tool for content analysis such as web mining. To handle web-scale content analysis on just a single PC, we propose multi-core parallel expectation-maximization (PEM) algorithms to infer and estimate LDA parameters in shared-memory systems. By avoiding memory-access conflicts, reducing the locking time among multiple threads, and using residual-based dynamic scheduling, we show that PEM algorithms are more scalable and accurate than the current state-of-the-art parallel LDA algorithms on a commodity PC. This parallel LDA toolbox is made publicly available as open-source software at mloss.org.
A Parallel and Efficient Algorithm for Learning to Match
Abstract
Many tasks in data mining and related fields can be formalized as matching between objects in two heterogeneous domains, including collaborative filtering, link prediction, image tagging, and web search. Machine learning techniques, referred to as learning-to-match in this paper, have been successfully applied to these problems. Among them, a class of state-of-the-art methods, named feature-based matrix factorization, formalizes the task as an extension of matrix factorization that incorporates auxiliary features into the model. Unfortunately, making those algorithms scale to real-world problems is challenging, and simple parallelization strategies fail due to the complex cross-talk patterns between subtasks. In this paper, we tackle this challenge with a novel parallel and efficient algorithm for feature-based matrix factorization. Our algorithm, based on coordinate descent, can easily handle hundreds of millions of instances and features on a single machine. The key recipe of this algorithm is an iterative relaxation of the objective to facilitate parallel updates of parameters, with guaranteed convergence in minimizing the original objective function. Experimental results demonstrate that the proposed method is effective on a wide range of matching problems, with efficiency significantly improved over the baselines while accuracy remains unchanged.
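Coordinate descent, the building block named in this abstract, updates one parameter at a time in closed form while holding the rest fixed. Below is a minimal sketch for a ridge-regularized least-squares objective; it is illustrative only, since the paper's contribution is the relaxation that makes such updates run in parallel over feature-based matrix factorization:

```python
import numpy as np

def cd_update(X, y, w, k, lam=1.0):
    """Coordinate-descent update of one weight in ridge regression:
    minimize ||y - X w||^2 + lam * ||w||^2 over w[k] alone (closed form)."""
    r = y - X @ w + X[:, k] * w[k]                 # residual with w[k]'s contribution removed
    w[k] = (X[:, k] @ r) / (X[:, k] @ X[:, k] + lam)
    return w
```

Cycling this update over all coordinates converges to the same solution as the direct solve of the normal equations, which is what makes coordinate descent attractive when the feature matrix is too large to factor directly.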
A Fast Distributed Stochastic Gradient Descent Algorithm for Matrix Factorization
Abstract
The accuracy and effectiveness of the matrix factorization technique were well demonstrated in the Netflix movie recommendation contest. Among the numerous approaches to matrix factorization, Stochastic Gradient Descent (SGD) is one of the most widely used algorithms. However, as a sequential approach, the SGD algorithm cannot be used directly in a Distributed Cluster Environment (DCE). In this paper, we propose a fast distributed SGD algorithm for matrix factorization, named FDSGD, that runs efficiently in a DCE. The algorithm solves the data-sharing problem with an independent storage system, avoiding the data synchronization that can significantly degrade performance, and solves the synchronization problem in a DCE with a distributed synchronization tool, so that cooperating distributed threads can run in harmony.