Results 1-8 of 8
Top Rank Optimization in Linear Time
Abstract

Cited by 3 (0 self)
Bipartite ranking aims to learn a real-valued ranking function that orders positive instances before negative instances. Recent efforts in bipartite ranking have focused on optimizing ranking accuracy at the top of the ranked list. Most existing approaches either optimize task-specific metrics or extend the rank loss by placing more emphasis on the error associated with the top-ranked instances, leading to a high computational cost that is super-linear in the number of training instances. We propose a highly efficient approach, titled TopPush, for optimizing accuracy at the top that has computational complexity linear in the number of training instances. We present a novel analysis that bounds the generalization error for the top-ranked instances for the proposed approach. An empirical study shows that the proposed approach is highly competitive with the state-of-the-art approaches and is 10-100 times faster.
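The linear-time idea in this abstract can be illustrated with a toy surrogate: if each positive instance is compared only against the highest-scoring negative, a single pass over the negatives yields the whole loss. The squared-hinge sketch below is a hypothetical illustration in that spirit, not the paper's actual TopPush objective or its dual solver.

```python
import numpy as np

def top_push_loss(w, X_pos, X_neg):
    """Toy squared-hinge surrogate for accuracy at the top: each positive
    instance is penalized only against the HIGHEST-scoring negative, so one
    O(n) pass over the negatives gives the loss in linear time overall.
    (Illustrative sketch only; not the paper's exact objective.)"""
    s_pos = X_pos @ w                # scores of positive instances
    s_neg_max = np.max(X_neg @ w)    # single pass over all negatives
    margins = 1.0 - (s_pos - s_neg_max)
    return float(np.mean(np.maximum(margins, 0.0) ** 2))
```

Pairwise rank losses compare every positive with every negative (quadratic cost); collapsing the comparison to the top negative is what makes the per-evaluation cost linear.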
Collaborative Ranking with a Push at the Top
Abstract

Cited by 1 (0 self)
The goal of collaborative filtering is to produce accurate recommendations at the top of the list for a set of users. From this perspective, collaborative ranking formulations with suitable ranking loss functions are natural. While recent literature has explored this idea with objective functions such as NDCG or Average Precision, such objectives are difficult to optimize directly. In this paper, building on recent advances from the learning-to-rank literature, we introduce a novel family of collaborative ranking algorithms which focus on accuracy at the top of the list for each user while learning the ranking functions collaboratively. We consider three specific formulations, based on the collaborative p-norm push, infinite push, and reverse-height push, and propose efficient optimization methods for learning these models. Experimental results illustrate the value of collaborative ranking, and show that the proposed methods are competitive with, and usually better than, existing popular approaches to personalized recommendation.
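One of the three formulations named above, the p-norm push, can be sketched at the level of a single scoring function: each negative's aggregated hinge penalty against the positives is raised to the p-th power, so large p concentrates the objective on the highest-ranked negatives, and p going to infinity recovers the infinite push. This is a toy scoring-level sketch, not the collaborative matrix formulation the paper develops.

```python
import numpy as np

def p_norm_push_loss(s_pos, s_neg, p=4):
    """Toy p-norm push objective: for each negative, sum hinge losses against
    all positives, then raise that per-negative total to the p-th power.
    Larger p focuses the penalty on negatives ranked near the top.
    (Illustrative sketch only.)"""
    s_pos = np.asarray(s_pos, dtype=float)
    s_neg = np.asarray(s_neg, dtype=float)
    # hinge loss on every (positive, negative) score pair
    H = np.maximum(0.0, 1.0 - (s_pos[:, None] - s_neg[None, :]))
    per_negative = H.sum(axis=0)         # aggregated penalty per negative
    return float(np.sum(per_negative ** p))
```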
Generalized Low Rank Models, 2014
Abstract

Cited by 1 (1 self)
Principal components analysis (PCA) is a well-known technique for approximating a data set, represented by a matrix, by a low rank matrix. Here, we extend the idea of PCA to handle arbitrary data sets consisting of numerical, Boolean, categorical, ordinal, and other data types. This framework encompasses many well-known techniques in data analysis, such as nonnegative matrix factorization, matrix completion, sparse and robust PCA, k-means, k-SVD, and maximum-margin matrix factorization. The method handles heterogeneous data sets, and leads to coherent schemes for compressing, denoising, and imputing missing entries across all data types simultaneously. It also admits a number of interesting interpretations of the low rank factors, which allow clustering of examples or of features. We propose several parallel algorithms for fitting generalized low rank models, and describe implementations and numerical results. This manuscript is a draft. Comments sent to udell@stanford.edu are welcome.
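The fitting strategy behind generalized low rank models is alternating minimization over the two factors. A minimal sketch, assuming the simplest instance (quadratic loss, no regularization, which reduces to truncated PCA), shows the alternating structure; the paper's generality comes from swapping in different per-entry losses and regularizers, which this sketch does not implement.

```python
import numpy as np

def glrm_quadratic(A, k, iters=50, seed=0):
    """Alternating least squares for the quadratic-loss generalized low rank
    model: minimize ||A - X @ Y||_F^2 over X (m x k) and Y (k x n).
    Each half-step is an exact least-squares solve with the other factor
    fixed. (Simplest GLRM instance only; other losses change the solves.)"""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, k))
    Y = rng.standard_normal((k, n))
    for _ in range(iters):
        # fix Y, solve for X: each row x_i minimizes ||a_i - x_i Y||^2
        X = np.linalg.lstsq(Y.T, A.T, rcond=None)[0].T
        # fix X, solve for Y: each column of Y is a least-squares solve
        Y = np.linalg.lstsq(X, A, rcond=None)[0]
    return X, Y
```

Because each half-step solves its subproblem exactly, the objective is monotonically non-increasing, which is the property the paper's parallel algorithms preserve across more general losses.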
Large-scale Distributed Optimization for Improving Accuracy at the Top
Abstract

Cited by 1 (0 self)
In this paper, we present a large-scale distributed implementation of the accuracy-at-the-top algorithm, which is based on a new notion of classification accuracy defined via the top τ-quantile values of a scoring function. Our implementation is based on the Alternating Direction Method of Multipliers (ADMM) consensus framework, is written in Pregel (a unified framework for performing large-scale graph computations [6]), and is meant for solving large-scale convex optimization problems in a distributed fashion.
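The ADMM consensus split mentioned above alternates a purely local update on each worker with a global averaging step, which is exactly the pattern a vertex-centric system like Pregel distributes. A toy single-process sketch (consensus averaging of scalars, not the paper's actual objective) shows the three-step structure:

```python
import numpy as np

def admm_consensus_mean(local_b, rho=1.0, iters=100):
    """Toy ADMM consensus: N workers each hold b_i and jointly minimize
    sum_i (x_i - b_i)^2 subject to x_1 = ... = x_N = z. The x-update is
    local to each worker; the z-update is a global average; u is the
    scaled dual variable. Converges to mean(b). (Illustrative only.)"""
    b = np.asarray(local_b, dtype=float)
    x = np.zeros_like(b)
    u = np.zeros_like(b)
    z = 0.0
    for _ in range(iters):
        # local x-update: argmin (x_i - b_i)^2 + (rho/2)(x_i - z + u_i)^2
        x = (2.0 * b + rho * (z - u)) / (2.0 + rho)
        # global z-update: consensus average (one reduce across workers)
        z = float(np.mean(x + u))
        # dual update, one entry per worker
        u = u + x - z
    return z
```

In a Pregel realization the x- and u-updates run at each vertex and the z-update is the aggregation phase between supersteps.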
Learning Sample Specific Weights for Late Fusion
Abstract
Late fusion is one of the most effective approaches to enhancing recognition accuracy by combining the prediction scores of multiple classifiers, each of which is trained on a specific feature or model. Existing methods generally use a fixed fusion weight for one classifier over all samples, ignoring the fact that each classifier may perform better or worse on different subsets of samples. To address this issue, we propose a novel sample-specific late fusion (SSLF) method. Specifically, we cast late fusion as an information propagation process that diffuses the fusion weights of labeled samples to the individual unlabeled samples, and we enforce that positive samples have higher fusion scores than negative samples. Through this process, the optimal fusion weight for each sample is identified, while positive samples are pushed toward the top of the fusion-score ranked list to achieve better accuracy. In this paper, two SSLF methods are presented. The first, ranking SSLF (RSSLF), is based on a graph Laplacian with RankSVM-style constraints; we formulate and solve the problem with a fast gradient projection algorithm. The second, infinite push SSLF (ISSLF), combines a graph Laplacian with infinite push constraints; ISSLF is an ℓ∞-norm constrained optimization problem and can be solved by an efficient alternating direction method of multipliers. Extensive experiments on both large-scale image and video data sets demonstrate the effectiveness of our methods. In addition, to make our method scalable to large data sets, the AnchorGraph model is employed to propagate information on a subset of samples (anchor points) and then reconstruct the entire graph to obtain the weights of all samples. To the best of our knowledge, this is the first method that supports learning of sample-specific fusion weights for late fusion.
Index Terms: Image recognition, video recognition, late fusion, infinite push, ℓ∞ norm.
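The diffusion step SSLF builds on can be sketched as a standard graph-Laplacian smoothing problem: weights known on a few labeled samples are spread over a similarity graph by minimizing a smoothness term plus a fidelity term on the labeled nodes. The sketch below shows only this propagation; the paper's actual objectives add the RankSVM-style and infinite-push constraints on top of it.

```python
import numpy as np

def propagate_weights(W, w_labeled, labeled_idx, alpha=1.0):
    """Graph-Laplacian diffusion sketch: minimize
        alpha * w^T L w + sum_{i labeled} (w_i - w_labeled_i)^2
    where L = D - W is the unnormalized Laplacian of similarity matrix W.
    Solved in closed form as one linear system. (Propagation step only;
    not the full SSLF formulation.)"""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian
    M = alpha * L
    b = np.zeros(n)
    for i, v in zip(labeled_idx, w_labeled):
        M[i, i] += 1.0                       # fidelity on labeled samples
        b[i] += v
    # tiny ridge keeps the system nonsingular if a component has no labels
    return np.linalg.solve(M + 1e-9 * np.eye(n), b)
```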
Collaborative Filtering with Information-Rich and Information-Sparse Entities
Abstract
In this paper, we consider a popular model for collaborative filtering in recommender systems where some users of a website rate some items, such as movies, and the goal is to recover the ratings of some or all of the unrated items of each user. In particular, we consider both the clustering model, where only users (or items) are clustered, and the co-clustering model, where both users and items are clustered; further, we assume that some users rate many items (information-rich users) and some users rate only a few items (information-sparse users). When users (or items) are clustered, our algorithm can recover the rating matrix with ω(MK log M) noisy entries, while MK entries are necessary, where K is the number of clusters and M is the number of items. In the case of co-clustering, we prove that K² entries are necessary for recovering the rating matrix, and our algorithm achieves this lower bound within a logarithmic factor when K is sufficiently large. We compare our algorithms with a well-known algorithm called alternating minimization (AM) and a similarity-score-based algorithm known as the popularity-among-friends (PAF) algorithm by applying all three to the MovieLens and Netflix data sets. Our co-clustering algorithm and AM have similar overall error rates when recovering the rating matrix, both of which are lower than the error rate under PAF. More importantly, the error rate of our co-clustering algorithm is significantly lower than those of AM and PAF in the scenarios of interest in recommender systems: when recommending a few items to each user or when recommending items to users who only rated a few items (these users are the majority of the total user population). The performance difference increases even more when noise is added to the data sets.
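The intuition behind the clustering model is that users in the same cluster share preferences, so a missing rating can be recovered from the observed ratings of that item within the user's cluster. A minimal sketch, assuming the cluster assignment is already known and using a within-cluster average in place of the paper's actual recovery algorithm and its guarantees:

```python
import numpy as np

def fill_by_cluster(R, clusters):
    """Toy recovery under the clustering model: R is a user-by-item rating
    matrix with np.nan marking unobserved entries, and `clusters` assigns
    each user (row) to a cluster. Each missing entry is filled with the
    average observed rating of that item inside the user's cluster.
    (Illustrative only; not the paper's algorithm.)"""
    R = np.array(R, dtype=float)
    out = R.copy()
    for c in set(clusters):
        rows = [i for i, ci in enumerate(clusters) if ci == c]
        # per-item average over the observed entries in this cluster
        col_mean = np.nanmean(R[rows], axis=0)
        for i in rows:
            missing = np.isnan(out[i])
            out[i, missing] = col_mean[missing]
    return out
```

This also shows why information-rich users help: their many observed entries improve every per-cluster column average that information-sparse users in the same cluster rely on.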