
## Collaborative Filtering: A Machine Learning Perspective (2004)

Citations: 76 (3 self)

### Citations

11970 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...ues are never observed. Learning these models requires the use of an expectation maximization procedure. In this chapter we review the Expectation Maximization algorithm of Dempster, Laird, and Rubin [16]. We also introduce the more recent free energy interpretation of standard EM due to Neal and Hinton [44]. We follow the free energy approach of Neal and Hinton for the development of all models in ch... |
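The EM procedure referenced here alternates an expectation step and a maximization step. A minimal illustrative sketch, using a hypothetical two-component, unit-variance 1-D Gaussian mixture rather than any of the thesis's rating models:

```python
import math

def em_gmm_1d(xs, iters=50):
    """EM for a two-component 1-D Gaussian mixture with unit variance.
    The E-step computes posterior responsibilities; the M-step
    re-estimates the means and mixing weight. Each full iteration
    does not decrease the data log-likelihood (equivalently, the
    Neal-Hinton free energy)."""
    mu = [min(xs), max(xs)]   # crude initialization
    pi = 0.5                  # mixing weight of component 0
    for _ in range(iters):
        # E-step: responsibility of component 0 for each point
        resp = []
        for x in xs:
            p0 = pi * math.exp(-0.5 * (x - mu[0]) ** 2)
            p1 = (1.0 - pi) * math.exp(-0.5 * (x - mu[1]) ** 2)
            resp.append(p0 / (p0 + p1))
        # M-step: responsibility-weighted means and mixing proportion
        n0 = sum(resp)
        mu[0] = sum(r * x for r, x in zip(resp, xs)) / n0
        mu[1] = sum((1.0 - r) * x for r, x in zip(resp, xs)) / (len(xs) - n0)
        pi = n0 / len(xs)
    return mu, pi
```

The same alternation underlies the latent variable models fit later in the thesis, with mixture responsibilities replaced by the appropriate posterior over latent variables.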

8904 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference - Pearl - 1988 |

4365 | Latent dirichlet allocation - Blei, Ng, et al. |

2769 | Statistical Analysis with Missing Data
- Little, Rubin
- 1987
Citation Context ...unobserved given the values of all variables is equal to the probability that a variable is unobserved given the values of just the observed variables, then the data is said to be missing at random [36]. If the data is missing completely at random or simply missing at random then the missing data mechanism can be ignored. If the data is not missing at random then ignoring the missing data mechanism ... |

1569 | Wrappers for feature subset selection.
- Kohavi, John
- 1997
Citation Context ...en features to use for classification. In a filter approach to feature selection, a set of features is selected as a preprocessing step, ignoring the effect of the selected features on classifier accuracy [33]. In a wrapper approach to feature selection, classification accuracy is used to guide a search through the space of feature subsets [33]. One feature selection filter often used with the naive Bayes cla... |
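The filter/wrapper distinction of [33] can be sketched as follows. Here `score` and `cv_accuracy` are hypothetical stand-ins for a per-feature relevance measure and a cross-validated classifier accuracy estimate; neither name comes from the thesis:

```python
def filter_select(features, score, k):
    """Filter approach: rank features by a fixed per-feature score
    (e.g. mutual information with the class label), ignoring the
    classifier entirely, and keep the top k."""
    return sorted(features, key=score, reverse=True)[:k]

def wrapper_select(features, cv_accuracy):
    """Wrapper approach: greedy forward search through feature
    subsets, guided by estimated classifier accuracy on each
    candidate subset."""
    selected, best = [], cv_accuracy([])
    improved = True
    while improved:
        improved = False
        for f in features:
            if f in selected:
                continue
            acc = cv_accuracy(selected + [f])
            if acc > best:
                best, pick = acc, f
                improved = True
        if improved:
            selected.append(pick)
    return selected
```

The wrapper is far more expensive (it retrains the classifier once per candidate subset) but accounts for feature interactions that a filter score cannot see.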

1546 | Grouplens: An open architecture for collaborative filtering of netnews
- Resnick, Iacovou, et al.
- 1994
Citation Context ...al of research has been performed within the pure, non-sequential, rating-based formulation of collaborative filtering. While early work focused on the neighborhood methods introduced by Resnick et al. [49], new and inventive techniques have been introduced from a wide variety of disciplines including artificial intelligence, human factors, knowledge discovery, information filtering and retrieval, machine ... |

1496 | Empirical analysis of predictive algorithms for collaborative filtering
- Breese, Heckerman, et al.
- 1998
Citation Context ...and describe some of their important properties. 3.3.1 Experimental Protocols Most rating prediction experiments found in the literature follow a protocol popularized by Breese, Heckerman, and Kadie [10]. In these experiments the available ratings for each user are split into an observed set and a held-out set. The observed ratings are used for training, and the held-out ratings are used for testing... |
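A sketch of this observed/held-out protocol, assuming ratings are stored per user as item-to-rating dicts (the data layout and function name are illustrative, not from the thesis):

```python
import random

def split_ratings(user_ratings, n_held_out, seed=0):
    """Split each user's available ratings into an observed
    (training) set and a held-out (test) set, in the style of the
    protocol popularized by Breese, Heckerman, and Kadie [10]."""
    rng = random.Random(seed)
    observed, held_out = {}, {}
    for user, ratings in user_ratings.items():
        items = list(ratings)
        rng.shuffle(items)
        held = items[:n_held_out]
        held_out[user] = {i: ratings[i] for i in held}
        observed[user] = {i: ratings[i] for i in items[n_held_out:]}
    return observed, held_out
```

Variants of the protocol (e.g. "given N" or "all but one") differ only in how many ratings per user land in each side of the split.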

1352 | An introduction to variable and feature selection. - Guyon, Elisseeff - 2003 |

1246 | Algorithms for non-negative matrix factorization - Lee, Seung - 2000 |

806 | Constrained Optimization and Lagrange Multiplier Methods. Athena Scientific - Bertsekas - 1996 |

776 | An algorithmic framework for performing collaborative filtering
- Herlocker, Konstan, et al.
- 1999
Citation Context ...bscures the true relationship between standard classification techniques from machine learning, and the set of methods popularized by Resnick et al. [49], Shardanand and Maes [52], and Herlocker et al. [27]. These algorithms have been called memory-based [10], similarity-based, and neighborhood-based [27] in the literature. As we show in the following section, neighborhood-based collaborative filtering me... |

771 | Probabilistic latent semantic analysis.
- Hofmann
- 1999
Citation Context ...rediction [29]. The rating prediction version of the aspect model is closely related to the aspect model for probabilistic latent semantic analysis of text documents, also referred to as pLSA or pLSI [28]. To avoid confusion we will refer to the rating prediction version as the triadic aspect model, and the text analysis version as the dyadic aspect model. It is important to understand the relationshi... |

709 | Probabilistic principal component analysis.
- Tipping, Bishop
- 1999
Citation Context ...$\Lambda$ maps the K dimensional latent space vectors into the N dimensional data space, $\mu$ specifies a mean in the data space common to all data vectors, and $\epsilon$ is randomly sampled Gaussian noise unique to each data vector [54]. $\Lambda$ is often referred to as the factor loading matrix. The corresponding graphical model is shown in figure 6.1. $x = \Lambda z + \mu + \epsilon$ (6.8) ... |
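The PPCA generative equation, conventionally written $x = \Lambda z + \mu + \epsilon$ with $z \sim N(0, I_K)$ and $\epsilon \sim N(0, \sigma^2 I_N)$, can be sampled directly. A plain-Python sketch, with `Lambda` as a list-of-rows factor loading matrix (an illustration, not the thesis's fitting code):

```python
import random

def sample_ppca(Lambda, mu, sigma, rng):
    """Draw one data vector from the PPCA generative model:
    sample z ~ N(0, I_K), map it into data space with the factor
    loading matrix, add the common mean mu, then add isotropic
    Gaussian noise with standard deviation sigma."""
    N, K = len(Lambda), len(Lambda[0])
    z = [rng.gauss(0.0, 1.0) for _ in range(K)]
    x = []
    for n in range(N):
        mean = sum(Lambda[n][k] * z[k] for k in range(K)) + mu[n]
        x.append(mean + rng.gauss(0.0, sigma))
    return x
```

With sigma = 0 the model collapses onto the K-dimensional affine subspace spanned by the loading matrix, which is the sense in which PPCA generalizes classical PCA.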

408 | A Survey of Clustering Data Mining Techniques,"
- Berkhin
- 2006
Citation Context ...s to group similar input vectors together. A number of clustering algorithms are well known in machine learning, and they fall into two broad classes: hierarchical clustering and standard clustering [3]. In hierarchical clustering a tree of clusters is constructed, and methods differ depending on whether the tree is constructed bottom-up or top-down. Standard clustering includes K-means, K-medians, a... |
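As an illustration of the "standard" (non-hierarchical) class, here is a minimal 1-D K-means sketch (hypothetical, and distinct from the K-medians method the thesis itself develops):

```python
def kmeans(points, centers, iters=20):
    """Standard flat clustering: alternate between assigning each
    1-D point to its nearest center and recomputing each center as
    the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            clusters[j].append(p)
        # recompute centers; keep the old center if a cluster empties
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters
```

K-medians follows the same alternation but recomputes each center as the median, which is what makes it robust for ordinal rating data.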

330 | Methods and metrics for cold-start recommendations,”
- Schein, Popescul, et al.
- 2002
Citation Context ...al of several new algorithms that incorporate additional features. See for example the work of Basu, Hirsh, and Cohen [2], Melville, Mooney, and Nagarajan [39], as well as Schein, Popescul, and Ungar [51]. The hybrid approach purportedly reduces the effect of two well known problems with collaborative filtering systems: the cold start problem and the new user problem. The cold start problem occurs when ... |

323 | Application of dimensionality reduction in recommender system-a case study. - Sarwar, Karypis, et al. - 2000 |

231 | Implicit interest indicators. In:
- Claypool, Le, et al.
- 2001
Citation Context ...icitly collected while the user performs a primary task such as browsing an Internet site. Claypool et al. present an interesting comparison between implicit preference indicators and explicit ratings [15]. Requiring a user to supply explicit ratings results in a cognitive burden not present when implicit preference indicators are collected. Claypool et al. argue that the perc... |

231 | Pattern Classification and Scene Analysis - Duda, Hart - 1973 |

211 | Latent class models for collaborative filtering. - Hofmann, Puzicha - 1999 |

207 | Estimating a Dirichlet distribution
- Minka
- 2003
Citation Context ...t iteration, which yields very similar results compared to the alternative Newton iteration. Details for both procedures, including the derivation of the inversion of the psi function, may be found in [40].

$$\frac{\partial F[\theta, \phi, \gamma]}{\partial \alpha_i} = N\left(\Psi\Big(\sum_{j=1}^{K} \alpha_j\Big) - \Psi(\alpha_i)\right) + \sum_{u=1}^{N}\left(\Psi(\gamma_i^u) - \Psi\Big(\sum_{j=1}^{K} \gamma_j^u\Big)\right)$$

$$\Psi(\alpha_i) = \Psi\Big(\sum_{j=1}^{K} \alpha_j\Big) + \frac{1}{N}\sum_{u=1}^{N}\left(\Psi(\gamma_i^u) - \Psi\Big(\sum_{j=1}^{K} \gamma_j^u\Big)\right)$$

$$\alpha_i = \Psi^{-1}\left(\Psi\Big(\sum_{j=1}^{K} \alpha_j\Big) + \frac{1}{N}\sum_{u=1}^{N}\Big(\Psi(\gamma_i^u) - \Psi\Big(\sum_{j=1}^{K} \gamma_j^u\Big)\Big)\right) ... |$$
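The fixed-point update and the psi (digamma) inversion it requires can be sketched in plain Python. The digamma/trigamma series below are standard asymptotic expansions, and `update_alpha` is an illustrative rendering of the update above (with `gamma` as a list of per-user variational parameter vectors), not Minka's exact code:

```python
import math

def digamma(x):
    """Psi function via recurrence plus an asymptotic series."""
    r = 0.0
    while x < 6.0:            # push x into the asymptotic regime
        r -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - inv2 * (
        1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def trigamma(x):
    """Derivative of digamma, used only for the Newton steps."""
    r = 0.0
    while x < 6.0:
        r += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return r + inv * (1.0 + inv * (0.5 + inv * (
        1.0 / 6 - inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 / 30)))))

def inv_digamma(y, iters=5):
    """Invert psi: Minka-style initialization, then Newton steps."""
    x = math.exp(y) + 0.5 if y >= -2.22 else -1.0 / (y - digamma(1.0))
    for _ in range(iters):
        x -= (digamma(x) - y) / trigamma(x)
    return x

def update_alpha(alpha, gamma):
    """One fixed-point update:
    alpha_i = psi^{-1}( psi(sum_j alpha_j)
                        + mean_u( psi(gamma_i^u) - psi(sum_j gamma_j^u) ) )."""
    N = len(gamma)
    s = digamma(sum(alpha))
    new = []
    for i in range(len(alpha)):
        g = sum(digamma(gu[i]) - digamma(sum(gu)) for gu in gamma) / N
        new.append(inv_digamma(s + g))
    return new
```

Iterating `update_alpha` to convergence recovers the maximum likelihood Dirichlet parameters given the variational statistics.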

198 | Weighted lowrank approximations.
- Srebro, Jaakkola
- 2003
Citation Context ...rating prediction method based on K-medians clustering. In chapter 6 we present rating prediction methods based on dimensionality reduction techniques including weighted singular value decomposition [53], principal components analysis [24], and probabilistic principal components analysis [13]. We introduce a new rating prediction algorithm that extends the existing work on weighted singular value dec... |

179 | Incremental Singular Value Decomposition of Uncertain Data with Missing Values. - Brand - 2002 |

157 | Expectation-propagation for the generative aspect model,”
- Minka, Lafferty
- 2002
Citation Context ...curate rating prediction results. Similarly, we might expect a model fitting procedure based on expectation propagation to result in more accurate rating prediction results than our variational methods [41]. 7.4.5 URP Rating Prediction Computing the distribution over rating values for a particular unrated item given a user profile r_u requires applying variational inference. For rating prediction we gene... |

152 | Top-down induction of clustering trees. - Blockeel, Raedt, et al. - 1998 |

122 | Unsupervised learning from dyadic data. - Hofmann, Puzicha - 1999 |

95 | Variational extensions to EM and multinomial PCA. In:
- Buntine
- 2002
Citation Context ...he LDA model [7]. On the other hand, a single step of variational inference can be used. This is the approach adopted by Buntine to fit the Multinomial PCA (mPCA) model, a slight generalization of LDA [11]. A further refinement is to allow the number of steps of variational inference to vary for each user by defining a heuristic function that may depend on the user, H(u). Empirically, we have found that ... |

79 | A new view of the EM algorithm that justifies incremental and other variants
- Neal, Hinton
- 1998
Citation Context ...In this chapter we review the Expectation Maximization algorithm of Dempster, Laird, and Rubin [16]. We also introduce the more recent free energy interpretation of standard EM due to Neal and Hinton [44]. We follow the free energy approach of Neal and Hinton for the development of all models in chapter 7. As we will see in the following chapters, different learning and prediction methods differ greatly... |

70 | Electronic Junk
- Denning
- 1982
Citation Context ...Chapter 1 Introduction The problem of information overload was identified as early as 1982 in an ACM President's Letter by Peter J. Denning aptly titled Electronic Junk [17]. Denning argued that the deployment of office information systems technology coupled with a quickly increasing use of electronic mail was sure to overwhelm computer users. Since that time many new sourc... |

51 | Web search for a planet: The Google cluster architecture
- Barroso, Dean, Hölzle
Citation Context ...n detail below. E-Step: $X = W \odot D + (1 - W) \odot \hat{D}$ (6.5). M-Step: $[U, \Sigma, V] = \mathrm{SVD}(X)$ (6.6), $\hat{D} = U_K \Sigma_K V_K^T$ (6.7). In fact, this EM algorithm holds for any weight matrix W so long as $W_{ij}$ lies in the interval [0, 1] for all i and j [53]. Srebro and Jaakkola note that both the number of iterations needed to achieve convergence and the quality of the solution of the EM procedure depend strongly on the amount of mi... |
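Equations (6.5)-(6.7) translate almost line for line into NumPy. A sketch under the assumption that W is a 0/1 observation mask (the general algorithm allows any weights in [0, 1]):

```python
import numpy as np

def weighted_svd_em(D, W, K, iters=50):
    """EM for rank-K weighted low-rank approximation in the style
    of Srebro and Jaakkola [53]: the E-step fills unobserved
    entries from the current reconstruction, the M-step refits a
    rank-K truncated SVD."""
    D_hat = np.zeros_like(D)
    for _ in range(iters):
        X = W * D + (1.0 - W) * D_hat                     # E-step (6.5)
        U, s, Vt = np.linalg.svd(X, full_matrices=False)  # M-step (6.6)
        D_hat = U[:, :K] * s[:K] @ Vt[:K]                 # (6.7)
    return D_hat
```

Because the E-step only overwrites the unobserved entries, each iteration is a weighted least-squares improvement, matching the cited observation that convergence speed depends heavily on how much data is missing.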

48 | On an equivalence between pLSI and LDA
- Girolami, Kaban
- 2003
Citation Context ...procedure are given in algorithm 7.9. 7.4.4 An Equivalence Between The Aspect Model and URP Recently Girolami and Kaban have shown an interesting equivalence between the dyadic aspect model and LDA [20]. They show that fitting an LDA model with a uniform ... |

39 | Cluster merging and splitting in hierarchical clustering algorithms
- Ding, He
- 2002
Citation Context ...tric between all pairs of clusters, and choosing the pair of clusters that is closest with respect to the metric. Common linkage metrics include single linkage, complete linkage, and average linkage [18]. The linkage metric depends on a distance function between input vectors $d(x_a, x_b)$. Single Linkage: $l_s(C_k, C_l) = \min_{x_r \in C_k,\, x_t \in C_l} d(x_r, x_t)$. Complete Linkage: $l_c(C_k, C_l) = \max_{x_r \in C_k,\, x_t \in C_l} d(x_r, x_t)$ ... |
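The three linkage metrics follow directly from their definitions; `d` here is any pairwise distance function supplied by the caller:

```python
from itertools import product

def single_linkage(Ck, Cl, d):
    """Distance between the closest pair of points across clusters."""
    return min(d(a, b) for a, b in product(Ck, Cl))

def complete_linkage(Ck, Cl, d):
    """Distance between the farthest pair of points across clusters."""
    return max(d(a, b) for a, b in product(Ck, Cl))

def average_linkage(Ck, Cl, d):
    """Mean distance over all cross-cluster pairs."""
    return sum(d(a, b) for a, b in product(Ck, Cl)) / (len(Ck) * len(Cl))
```

Bottom-up (agglomerative) clustering repeatedly merges the pair of clusters minimizing the chosen linkage; the choice of linkage controls whether the tree favors chained, compact, or balanced clusters.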

38 | A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, High-Dimensional Domains
- Pavlov, Pennock
- 2002
Citation Context ...opt a maximum entropy approach to prediction within the sequential framework. Their method performs favorably on a document recommendation task when compared to content-based methods currently in use [46]. Girolami and Kaban introduce a method for learning dynamic user profiles based on simplicial mixtures of first order Markov chains. They apply their method to a variety of data sets including a web br... |

22 | Learning What People (Don't) Want
- Hofmann
- 2001
Citation Context ...value decomposition. In chapter 7 we describe a number of methods based on density estimation in probabilistic models including a multinomial model, a mixture of multinomials model, the aspect model [29], and the user rating profile model [38]. We introduce a new family of models called the Attitude model family. We implement a total of nine rating prediction methods and perform large scale prediction... |

21 | Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles.
- Girolami, Kaban
- 2003
Citation Context ...ntroduce a method for learning dynamic user profiles based on simplicial mixtures of first order Markov chains. They apply their method to a variety of data sets including a web browsing prediction task [21]. 2.2 Pure, Non-Sequential, Rating-Based Formulation Throughout this work we assume a pure, non-sequential, rating-based formulation of collaborative filtering. In this formulation users and items are d... |

16 | Is multinomial PCA multi-faceted clustering or dimensionality reduction? - Buntine, Perttu - 2003 |

16 | Using collaborative filtering to weave an Information Tapestry
- Goldberg, Nichols, et al.
- 1992
Citation Context ...rm of SQL-like expressions based on the document's content, the content of the annotations, the number of annotations, and the identity of the authors of the annotations associated with each document [23]. The field of collaborative filtering research consists of a large number of information filtering problems, and this collection of formulations is highly structured. In this chapter we introduce a space o... |

14 | Social information filtering: Algorithms for automating 'word of mouth'
- Shardanand, Maes
- 1995
Citation Context ...Billsus and Pazzani also obscures the true relationship between standard classification techniques from machine learning, and the set of methods popularized by Resnick et al. [49], Shardanand and Maes [52], and Herlocker et al. [27]. These algorithms have been called memory-based [10], similarity-based, and neighborhood-based [27] in the literature. As we show in the following section, neighborhood-base... |

7 | Learning collaborative information filters
- Billsus, Pazzani
- 1998
Citation Context ...ion in a precisely analogous fashion. Billsus and Pazzani propose an alternate framework for performing rating prediction as classification or regression [5]. They begin by re-encoding ordinal rating values on a scale of 1 to V using a binary 1-of-V encoding scheme. This is necessary when using certain classifiers that can not be applied in the presence of m... |
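The binary 1-of-V re-encoding mentioned here, as a tiny sketch:

```python
def one_of_v(rating, V):
    """Encode an ordinal rating in {1, ..., V} as a binary 1-of-V
    vector: a single 1 in the position of the rating value."""
    v = [0] * V
    v[rating - 1] = 1
    return v
```

The encoding discards the ordering of the rating scale, which is precisely the trade-off the surrounding discussion is concerned with.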

7 | Robust feature selection using distributions of mutual information
- Zaffalon, Hutter
- 2002
Citation Context ...re item by the number of samples used to compute it. Zaffalon and Hutter present a principled, Bayesian approach to dealing with this problem based on estimating the distribution of mutual information [56]. 4.2.2 Complexity The computational cost of separately learning one Naive Bayes classifier for each item is O(NM^2 V^2). Storing the probabilities for a single classifier takes MV^2 + V space and thus ... |

6 | Recommendation as classification: Using social and content-based information in recommendation
- Basu, Hirsh, et al.
- 1998
Citation Context ...or research than hybrid formulations. However, recent research has seen the proposal of several new algorithms that incorporate additional features. See for example the work of Basu, Hirsh, and Cohen [2], Melville, Mooney, and Nagarajan [39], as well as Schein, Popescul, and Ungar [51]. The hybrid approach purportedly reduces the effect of two well known problems with collaborative filtering systems: th... |

5 | Clustering methods for collaborative filtering - Ungar, Foster - 1998 |

3 | Eigentaste: A constant time collaborative filtering algorithm
- Goldberg, Roeder, et al.
- 2001
Citation Context ...-medians clustering. In chapter 6 we present rating prediction methods based on dimensionality reduction techniques including weighted singular value decomposition [53], principal components analysis [24], and probabilistic principal components analysis [13]. We introduce a new rating prediction algorithm that extends the existing work on weighted singular value decomposition. In chapter 7 we describe... |

3 | Clustering for collaborative filtering applications - Kohrs, Merialdo - 1999 |

3 | Content-boosted collaborative filtering
- Melville, Mooney, et al.
- 2001
Citation Context ...However, recent research has seen the proposal of several new algorithms that incorporate additional features. See for example the work of Basu, Hirsh, and Cohen [2], Melville, Mooney, and Nagarajan [39], as well as Schein, Popescul, and Ungar [51]. The hybrid approach purportedly reduces the effect of two well known problems with collaborative filtering systems: the cold start problem and the new user... |

2 | Collaborative filtering with privacy via factor analysis
- Canny
- 2002
Citation Context ...ediction methods based on dimensionality reduction techniques including weighted singular value decomposition [53], principal components analysis [24], and probabilistic principal components analysis [13]. We introduce a new rating prediction algorithm that extends the existing work on weighted singular value decomposition. In chapter 7 we describe a number of methods based on density estimation in pr... |

2 | Modeling user rating profiles for collaborative filtering
- Marlin
- 2004
Citation Context ...escribe a number of methods based on density estimation in probabilistic models including a multinomial model, a mixture of multinomials model, the aspect model [29], and the user rating profile model [38]. We introduce a new family of models called the Attitude model family. We implement a total of nine rating prediction methods and perform large scale prediction accuracy experiments. In chapter 8 we ... |

1 | RecTree: An efficient collaborative filtering method
- Chee, Han, et al.
- 2001
Citation Context ...ser to. 5.2.1 Rating Prediction Chee, Han, and Wang present a user clustering algorithm based on divisive hierarchical clustering called the Recommendation Tree algorithm (RecTree) [14]. In the RecTree algorithm a cluster node is expanded if it is at a depth less than a specified maximum, and its size is greater than a specified maximum. The exact sequence in which nodes are expanded ... |

1 | Thresholds for more accurate collaborative filtering
- Gokhale, Claypool
- 1999
Citation Context ...for y = 1 to M do: $\hat{r}_{ay} \leftarrow \bar{r}_a + \sum_{k=1}^{K} w_{a u_k} (r_{u_k y} - \bar{r}_{u_k}) \,/\, \sum_{k=1}^{K} |w_{a u_k}|$; end for (Algorithm 4.1: PKNN-Predict). ...active user and each user from the data set. The latter were termed history thresholds [22]. Herlocker et al. perform experiments using similar thresholds, as well as a BestK neighbors method that is most similar to standard KNN classification. The general result of this work is that using a... |
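The prediction rule in Algorithm 4.1 is a mean-offset, weight-normalized combination of neighbor ratings. A sketch of one prediction; the data layout (neighbor tuples of weight, user mean, and ratings dict) is an illustrative assumption, not the thesis's representation:

```python
def pknn_predict(active_mean, neighbors, item):
    """Predict the active user's rating for one item:
    r_hat = rbar_a + sum_k w_k (r_{u_k, item} - rbar_{u_k}) / sum_k |w_k|.
    neighbors is a list of (weight, user_mean, ratings_dict); users
    who have not rated the item are skipped."""
    num = den = 0.0
    for w, u_mean, ratings in neighbors:
        if item in ratings:
            num += w * (ratings[item] - u_mean)
            den += abs(w)
    # fall back to the active user's mean when no neighbor rated the item
    return active_mean if den == 0.0 else active_mean + num / den
```

Subtracting each neighbor's mean before combining corrects for users who rate systematically high or low, and normalizing by the absolute weights keeps the prediction on the rating scale even with negative correlations.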

1 | Collaborative filtering via Gaussian probabilistic latent semantic analysis
- Hofmann
- 2003
Citation Context ...dels to Gaussian distributed continuous random variables and then re-derive all the model fitting and prediction equations. Hofmann has recently carried out this exercise with the triadic aspect model [30]. A close second to our chosen formulation in terms of research activity is the co-occurrence, pure, non-sequential formulation. We briefly discussed this formulation in conjunction with LDA and the dya... |

1 | Information filtering - Loeb, Terry - 1992 |

1 | Collaborative filtering with the simple Bayesian classifier - Miyahara, Pazzani - 2000 |

1 | Clustering items for collaborative filtering
- O'Connor, Herlocker
- 1999
Citation Context ...ing step, which requires the subsequent application of a rating prediction method. O'Connor and Herlocker have studied item clustering as a preprocessing step for neighborhood based rating prediction [45]. They apply several clustering methods, but their empirical results show prediction accuracy actually decreases compared to the unclustered base case regardless of the clustering method used. A reduc... |