Results 1–10 of 76
Dynamic topic models
In ICML, 2006
Cited by 681 (29 self)
Abstract: Scientists need new tools to explore and browse large collections of scholarly literature. Thanks to organizations such as JSTOR, which scan and index the original bound archives of many journals, modern scientists can search digital libraries spanning hundreds of years. A scientist, suddenly […]
Maximum-Margin Matrix Factorization
Advances in Neural Information Processing Systems 17, 2005
Cited by 264 (21 self)
Abstract: We present a novel approach to collaborative prediction, using low-norm instead of low-rank factorizations. The approach is inspired by, and has strong connections to, large-margin linear discrimination. We show how to learn low-norm factorizations by solving a semidefinite program, and discuss generalization error bounds for them.
Fast maximum margin matrix factorization for collaborative prediction
In Proceedings of the 22nd International Conference on Machine Learning (ICML), 2005
Cited by 248 (6 self)
Abstract: Maximum Margin Matrix Factorization (MMMF) was recently suggested (Srebro et al., 2005) as a convex, infinite-dimensional alternative to low-rank approximations and standard factor models. MMMF can be formulated as a semidefinite program (SDP) and learned using standard SDP solvers. However, current SDP solvers can only handle MMMF problems on matrices of dimensionality up to a few hundred. Here, we investigate a direct gradient-based optimization method for MMMF and demonstrate it on large collaborative prediction problems. We compare against results obtained by Marlin (2004) and find that MMMF substantially outperforms all nine methods he tested.
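As a rough illustration of the gradient-based approach this abstract describes, the sketch below minimizes a smoothed hinge loss over the observed entries of a sign matrix plus a Frobenius-norm penalty on the factors. It is a minimal stand-in, not the authors' implementation; the particular smoothed hinge, hyperparameter values, and toy data are assumptions chosen for the example.

```python
import numpy as np

def smooth_hinge_grad(z):
    # Derivative of a smoothed hinge h(z):
    # h(z) = 0 for z >= 1, (1 - z)^2 / 2 for 0 < z < 1, 1/2 - z for z <= 0.
    return np.where(z >= 1, 0.0, np.where(z <= 0, -1.0, z - 1.0))

def mmmf_gd(Y, rank=2, lam=0.01, lr=0.05, iters=2000, seed=0):
    """Gradient descent on sum_obs h(Y_ij X_ij) + lam/2 (||U||_F^2 + ||V||_F^2)
    with X = U V^T.  Y holds +/-1 for observed entries and 0 for missing ones."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((Y.shape[0], rank))
    V = 0.1 * rng.standard_normal((Y.shape[1], rank))
    for _ in range(iters):
        Z = Y * (U @ V.T)                        # margins on observed entries
        G = Y * smooth_hinge_grad(Z) * (Y != 0)  # dLoss/dX, zero where unobserved
        U, V = U - lr * (G @ V + lam * U), V - lr * (G.T @ U + lam * V)
    return U, V

# Toy run: recover the sign pattern of a small preference matrix.
Y = np.outer([1, -1, 1], [1, 1, -1]).astype(float)
U, V = mmmf_gd(Y)
```

On this toy problem the signs of `U @ V.T` match `Y` exactly; the SDP formulation mentioned in the abstract would reach the same kind of solution but, as the authors note, only for matrices up to a few hundred rows.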
A Survey of Collaborative Filtering Techniques
2009
Cited by 216 (0 self)
Abstract: As one of the most successful approaches to building recommender systems, collaborative filtering (CF) uses the known preferences of a group of users to make recommendations or predictions of the unknown preferences for other users. In this paper, we first introduce CF tasks and their main challenges, such as data sparsity, scalability, synonymy, gray sheep, shilling attacks, and privacy protection, together with their possible solutions. We then present three main categories of CF techniques: memory-based, model-based, and hybrid CF algorithms (which combine CF with other recommendation techniques), with examples of representative algorithms in each category and an analysis of their predictive performance and ability to address the challenges. From basic techniques to the state of the art, we attempt to present a comprehensive survey of CF techniques that can serve as a roadmap for research and practice in this area.
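To make the memory-based category concrete, here is a minimal user-based neighborhood predictor: cosine similarity over co-rated items, then a similarity-weighted average of the k most similar users who rated the target item. The rating matrix and parameters are invented for illustration, not taken from the survey.

```python
import numpy as np

def user_based_predict(R, user, item, k=2):
    """Memory-based CF: predict R[user, item] from the k most similar users.
    0 marks a missing rating."""
    sims = []
    for u in range(R.shape[0]):
        if u == user or R[u, item] == 0:
            continue
        both = (R[user] != 0) & (R[u] != 0)   # items both users rated
        if not both.any():
            continue
        a, b = R[user, both], R[u, both]
        sims.append((a @ b / (np.linalg.norm(a) * np.linalg.norm(b)), u))
    sims.sort(reverse=True)
    top = sims[:k]
    if not top:
        return 0.0
    num = sum(s * R[u, item] for s, u in top)
    den = sum(abs(s) for s, u in top)
    return num / den

# Toy ratings: rows are users, columns are items.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
print(user_based_predict(R, user=0, item=2))  # ≈ 5, the rating of the one similar user who rated item 2
```

The data-sparsity challenge the survey lists is visible even here: most user pairs share only one or two co-rated items, so the similarities rest on very little evidence.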
A Correlated Topic Model of Science
2007
Cited by 156 (10 self)
Abstract: Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is its inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than about X-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions exhibit correlation via the logistic normal distribution [J. Roy. Statist. Soc. Ser. B 44 (1982) 139–177]. We derive a fast variational inference algorithm for approximate posterior inference in this model, which is complicated by the fact that the logistic normal is not conjugate to the multinomial. We apply the CTM to the articles from Science published from 1990 to 1999, a data set that comprises 57M words. The CTM gives a better fit of the data than LDA, and we demonstrate its use as an exploratory tool for large document collections.
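The logistic-normal construction at the heart of the CTM is easy to sketch: draw topic activations from a multivariate Gaussian and push them through a softmax, so covariance in the Gaussian becomes correlation between topic proportions. The covariance matrix below is an invented toy value, not one fitted to the Science corpus.

```python
import numpy as np

def logistic_normal(mu, Sigma, n, seed=0):
    """theta = softmax(eta), eta ~ N(mu, Sigma): topic proportions whose
    correlations come from the off-diagonal entries of Sigma."""
    rng = np.random.default_rng(seed)
    eta = rng.multivariate_normal(mu, Sigma, size=n)
    e = np.exp(eta - eta.max(axis=1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)

# Toy covariance: topics 0 and 1 (say, genetics and disease) co-occur,
# while topic 2 (say, X-ray astronomy) is independent of both.
Sigma = np.array([[1.0, 0.9, 0.0],
                  [0.9, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
theta = logistic_normal(np.zeros(3), Sigma, n=5000)
```

The sampled correlation between `theta[:, 0]` and `theta[:, 1]` comes out positive here, which a Dirichlet prior, whose components are always negatively correlated, cannot produce; this is exactly the limitation of LDA the abstract points to.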
Nonlinear Matrix Factorization with Gaussian Processes
Cited by 74 (1 self)
Abstract: A popular approach to collaborative filtering is matrix factorization. In this paper we develop a nonlinear probabilistic matrix factorization using Gaussian process latent variable models. We use stochastic gradient descent (SGD) to optimize the model. SGD allows us to apply Gaussian processes to data sets with millions of observations without approximate methods. We apply our approach to benchmark movie recommender data sets. The results show better than previous state-of-the-art performance.
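The scaling argument in this abstract rests on per-observation updates: each SGD step touches only the latent rows of the one user and one item involved, so cost grows with the number of ratings rather than the matrix size. The linear sketch below shows that update pattern; the paper's actual model is the nonlinear, GP-based generalization, and the squared loss and hyperparameters here are assumptions for the toy run.

```python
import numpy as np

def sgd_mf(triples, n_users, n_items, rank=2, lr=0.05, lam=0.01,
           epochs=300, seed=0):
    """SGD over (user, item, rating) triples: each step updates one row of U
    and one row of V only."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, rank))
    V = 0.1 * rng.standard_normal((n_items, rank))
    for _ in range(epochs):
        rng.shuffle(triples)            # visit observations in random order
        for u, i, r in triples:
            err = U[u] @ V[i] - r       # squared-loss residual for one rating
            U[u], V[i] = (U[u] - lr * (err * V[i] + lam * U[u]),
                          V[i] - lr * (err * U[u] + lam * V[i]))
    return U, V
```

Replacing the inner product `U[u] @ V[i]` with a kernel-based predictor is, loosely, where the Gaussian process latent variable model of the paper departs from this linear sketch.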
Learning with Matrix Factorization
2004
Cited by 71 (6 self)
Abstract: Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or high-dimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning for over a century, with many new formulations and models suggested in recent […]
Learning a meta-level prior for feature relevance from multiple related tasks
In Proceedings of the International Conference on Machine Learning (ICML), 2007
Cited by 45 (2 self)
Abstract: In many prediction tasks, selecting relevant features is essential for achieving good generalization performance. Most feature selection algorithms consider all features to be a priori equally likely to be relevant. In this paper, we use transfer learning (learning on an ensemble of related tasks) to construct an informative prior on feature relevance. We assume that features themselves have meta-features that are predictive of their relevance to the prediction task, and we model their relevance as a function of the meta-features using hyperparameters (called meta-priors). We present a convex optimization algorithm for simultaneously learning the meta-priors and feature weights from an ensemble of related prediction tasks that share a similar relevance structure. Our approach transfers the meta-priors among different tasks, allowing it to handle settings where tasks have non-overlapping features or where feature relevance varies over the tasks. We show that transfer learning of feature relevance improves performance on two real data sets that illustrate such settings: (1) predicting ratings in a collaborative filtering task, and (2) distinguishing the arguments of a verb in a sentence.
Collaborative filtering and the missing at random assumption
In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), 2007
Cited by 43 (4 self)
Abstract: Rating prediction is an important application, and a popular research topic, in collaborative filtering. However, both the validity of learning algorithms and the validity of standard testing procedures rest on the assumption that missing ratings are missing at random (MAR). In this paper we present the results of a user study in which we collect a random sample of ratings from current users of an online radio service. An analysis of the rating data collected in the study shows that the sample of random ratings has markedly different properties from ratings of user-selected songs. When asked to report on their own rating behaviour, a large number of users indicate they believe their opinion of a song does affect whether they choose to rate that song, a violation of the MAR condition. Finally, we present experimental results showing that incorporating an explicit model of the missing-data mechanism can lead to significant improvements in prediction performance on the random sample of ratings.
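A short simulation makes the MAR violation concrete: if the chance that a rating is observed depends on the rating itself, the observed ratings are a biased sample of the user's true opinions. The observation probabilities below are invented for illustration, not taken from the paper's user study.

```python
import numpy as np

# True opinions: ratings 1-5, uniform over the population.
rng = np.random.default_rng(0)
true_ratings = rng.integers(1, 6, size=100_000)

# Not missing at random: users are more likely to rate songs they like.
p_observe = 0.05 + 0.15 * true_ratings
observed = true_ratings[rng.random(true_ratings.size) < p_observe]

print(true_ratings.mean())  # ~3.0: the population mean
print(observed.mean())      # ~3.6: observed ratings are biased upward
```

Any model trained and tested only on `observed` would never see this gap, which is why the paper argues for evaluating on a random sample of ratings and for modeling the missing-data mechanism explicitly.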
Collaborative prediction using ensembles of maximum margin matrix factorizations
In ICML, 2006
Cited by 35 (0 self)
Abstract: Fast gradient-based methods for Maximum Margin Matrix Factorization (MMMF) were recently shown to have great promise (Rennie & Srebro, 2005), including significantly outperforming the previous state-of-the-art methods on some standard collaborative prediction benchmarks (including MovieLens). In this paper, we investigate ways to further improve the performance of MMMF by casting it within an ensemble approach. We explore and evaluate a variety of alternative ways to define such ensembles. We show that the resulting ensembles can perform significantly better than a single MMMF model along multiple evaluation metrics. In fact, we find that ensembles of partially trained MMMF models can sometimes give even better predictions, in total training time comparable to that of a single MMMF model.
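The ensemble idea itself is simple to sketch: train several factorization models that differ only in their random initialization and average their predictions. The member model below is a plain squared-loss factorization trained by full gradient descent, a simplified stand-in for an MMMF learner; all hyperparameters and the toy data are invented for the example.

```python
import numpy as np

def factorize(Y, rank=2, lr=0.02, lam=0.01, iters=2000, seed=0):
    # One ensemble member: squared-loss matrix factorization by gradient
    # descent (a simplified stand-in for an MMMF learner).
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((Y.shape[0], rank))
    V = 0.1 * rng.standard_normal((Y.shape[1], rank))
    for _ in range(iters):
        E = U @ V.T - Y
        U, V = U - lr * (E @ V + lam * U), V - lr * (E.T @ U + lam * V)
    return U @ V.T

def ensemble_predict(Y, k=5, **kw):
    # Members differ only in their random seed; predictions are averaged.
    return np.mean([factorize(Y, seed=s, **kw) for s in range(k)], axis=0)
```

Averaging members that differ only in initialization is one of the simplest ensemble definitions the paper explores; the "partially trained" variant mentioned in the abstract corresponds to cutting `iters` for each member so the whole ensemble costs about as much as one fully trained model.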