
## Predictive discrete latent factor models for large scale dyadic data (2007)

Venue: KDD ’07

Citations: 36 (2 self)

### Citations

3173 | Generalized Linear Models
- McCullagh, Nelder
- 1989

Citation Context: ... EM-based algorithms for “soft” and “hard” assignments, that are linear in the number of non-zeros in the dyadic matrix. The algorithms generalize several existing algorithms including GLM regression [16], co-clustering using Bregman divergences [2], cross-association learning [4], NPMLE [1], etc. • We present an extensive empirical evaluation of our procedure through simulation experiments, analysis ...
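The context above lists GLM regression [16] as a special case of the paper's EM algorithms. The core prediction step of a GLM is the mean response μ = g⁻¹(xᵀβ); a minimal sketch with the Poisson log link follows (the function name and coefficients are illustrative, not from the paper):

```python
import math

def glm_predict(x, beta, inv_link=math.exp):
    """Predict the mean response of a GLM: mu = g^{-1}(x . beta).

    `inv_link` is the inverse link function; math.exp is the inverse of
    the canonical log link used in Poisson regression.
    """
    eta = sum(xi * bi for xi, bi in zip(x, beta))  # linear predictor
    return inv_link(eta)

# A dyad's covariate vector and hypothetical fitted coefficients.
mu = glm_predict([1.0, 0.5], [0.2, -0.4])  # exp(0.2 - 0.2) = exp(0.0) = 1.0
```

Swapping `inv_link` (identity for Gaussian, logistic for binomial) covers the other common exponential-family members without changing the prediction code.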

2798 | Matrix Computations
- Golub, Van Loan
- 1996

Citation Context: ...ens data. We choose k = l = 5 for both the PDLF and cross-associations learning in the first case study. Since most of the existing techniques for addressing this task such as singular value decomposition (SVD) [8], nonnegative matrix factorization (NNMF) [13] and correlation-based methods [23] implicitly assume a Gaussian generative model, we transformed the response, i.e., the rating values, using y_new = √(6 − y) ...

1545 | GroupLens: An Open Architecture for Collaborative Filtering of Netnews
- Resnick, Iakovou, et al.
- 1994

Citation Context: ...rst case study. Since most of the existing techniques for addressing this task such as singular value decomposition (SVD) [8], nonnegative matrix factorization (NNMF) [13] and correlation-based methods [23] implicitly assume a Gaussian generative model, we transformed the response, i.e., the rating values, using y_new = √(6 − y) to eliminate the skew and make the distribution more symmetric and close to G...
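The rating transform y_new = √(6 − y) described above (reconstructed from the extraction-garbled "p (6− y)") is simple enough to sketch directly; the function name is illustrative:

```python
import math

def transform_rating(y):
    """Symmetrizing transform for 1-5 star ratings: y_new = sqrt(6 - y).

    Star ratings are left-skewed (most users give 4s and 5s); reflecting
    around 6 and taking the square root compresses the long tail so the
    distribution is closer to Gaussian before fitting SVD/NNMF-style models.
    """
    return math.sqrt(6 - y)

transformed = [transform_rating(y) for y in [1, 2, 3, 4, 5]]
```

Note the transform reverses order: a 5-star rating maps to 1.0 and a 1-star rating to √5, so predictions must be mapped back with y = 6 − y_new² before reporting.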

1223 | Probabilistic latent semantic indexing
- Hofmann
- 1999

Citation Context: ... learning methods in the context of dyadic data. Most methods of similar flavor such as singular value decomposition [8], non-negative matrix factorization [13], probabilistic latent semantic analysis [12], cross-association learning [4], Bregman co-clustering [2] are matrix approximation techniques, which impose different constraints on the latent structure depending on the choice of loss function. Am...

1222 | Algorithms for Non-negative Matrix Factorization
- Lee, Seung
- 2001

Citation Context: ...and cross-associations learning in the first case study. Since most of the existing techniques for addressing this task such as singular value decomposition (SVD) [8], nonnegative matrix factorization (NNMF) [13] and correlation-based methods [23] implicitly assume a Gaussian generative model, we transformed the response, i.e., the rating values, using y_new = √(6 − y) to eliminate the skew and make the distrib...

993 | A view of the EM algorithm that justifies incremental, sparse, and other variants
- Neal, Hinton
- 1998

Citation Context: ...e maximization of log-likelihood defined in eqn. (3.4), we consider a complete data likelihood obtained by augmenting {yij}ij with the latent variables {ρ(i)}i and {γ(j)}j. Following the analysis in [19], we consider the free-energy function, which is defined as the sum of the expected complete log-likelihood and the entropy of the latent variables with respect to an arbitrary distribution p̃({ρ(i)}i...
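The free-energy function from Neal and Hinton [19] that this context invokes can be written out explicitly. In the paper's notation (a reconstruction from the verbal definition above, not a quote), with p̃ an arbitrary distribution over the latent row/column assignments:

```latex
F(\tilde{p}, \theta)
  = \mathbb{E}_{\tilde{p}}\!\bigl[\log p(\{y_{ij}\}, \{\rho(i)\}, \{\gamma(j)\} \mid \theta)\bigr]
  + H(\tilde{p})
```

The generalized EM algorithm then alternates between maximizing F over p̃ (the E-step, which recovers the posterior over assignments) and over θ (the M-step); any pair of updates that merely increases F still yields a monotone ascent on the log-likelihood.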

481 | Biclustering algorithms for biological data analysis: A survey
- Madeira, Oliveira
- 2004

Citation Context: ...t space through a mixture model. 2.3 Matrix Co-clustering. Co-clustering, or simultaneous clustering of both rows and columns, has become a method of choice for analyzing large and sparse data matrices [15, 2] due to its scalability, and has been shown to be effective for predicting missing values in dyadic data by exploiting the interactions that are often present in the observed response values. In particula...
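The missing-value prediction the context describes can be sketched in its simplest, covariate-free form: assign rows and columns to clusters and fill each unobserved dyad with its block mean. All names below are illustrative, and this is the plain block-mean special case, not the full PDLF model:

```python
def cocluster_predict(Y, row_assign, col_assign, k, l):
    """Predict dyadic responses from co-cluster (block) means.

    Y is a dict {(i, j): value} of observed dyads. Each (row-cluster,
    column-cluster) block gets the mean of its observed entries; a
    missing dyad is predicted by the mean of the block it falls into.
    """
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for (i, j), y in Y.items():
        I, J = row_assign[i], col_assign[j]
        sums[I][J] += y
        counts[I][J] += 1
    means = [[sums[I][J] / counts[I][J] if counts[I][J] else 0.0
              for J in range(l)] for I in range(k)]
    return lambda i, j: means[row_assign[i]][col_assign[j]]

# Two rows in one cluster; columns split into two clusters.
Y = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 1.0}
predict = cocluster_predict(Y, row_assign=[0, 0], col_assign=[0, 1], k=1, l=2)
missing = predict(1, 1)  # dyad (1, 1) falls in block (0, 1), whose mean is 3.0
```

Because every pass touches only the observed entries, the cost is linear in the number of non-zeros, which is the scalability property the context emphasizes.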

440 | Clustering with Bregman divergences
- Banerjee, Merugu, et al.
- 2005

Citation Context: ...d co-cluster interactions) or θi,j,I,J = θI,J (accommodates only co-cluster interactions). Using the bijection result between (regular) exponential families and a special class of Bregman divergences [3] and the projection theorem characterizing the optimality of the minimum Bregman information matrix with respect to generalized additive models in the natural parameter space [17], it can be shown that ma...
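For reference, the Bregman divergence generated by a strictly convex, differentiable function φ is

```latex
d_\phi(x, y) = \phi(x) - \phi(y) - \langle x - y, \nabla\phi(y) \rangle
```

Choosing φ(x) = x² gives squared loss (the Gaussian case) and φ(x) = x log x gives the I-divergence (the Poisson case); matching each regular exponential family with the divergence generated by the conjugate of its log-partition function is the bijection that [3] makes precise.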

345 | Information-theoretic co-clustering
- Dhillon, Mallela, et al.
- 2003

Citation Context: ...LF algorithm, measured in this case by the I-divergence between observed and predicted values (shown in Table 5.13), is better than a straightforward Poisson regression or the information-theoretic co-clustering [6] approach. The clusters from the PDLF algorithm were rigorously analyzed. Figure 5.2 shows the co-clusters obtained before and after adjusting for the covariates and the row/column effects and the cor...
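The I-divergence used as the evaluation metric here is, summed over positive entries, x log(x/y) − x + y; it is zero exactly when the two arguments agree and is the natural loss for Poisson-distributed counts. A small self-contained check (names are illustrative):

```python
import math

def i_divergence(x, y):
    """Generalized I-divergence between positive vectors:
    sum_i x_i * log(x_i / y_i) - x_i + y_i; zero iff x == y elementwise.
    """
    return sum(xi * math.log(xi / yi) - xi + yi for xi, yi in zip(x, y))

d_same = i_divergence([1.0, 2.0], [1.0, 2.0])  # 0.0
d_diff = i_divergence([2.0], [1.0])            # positive
```

Unlike squared error, the penalty is asymmetric in x and y, which matches the Poisson likelihood the PDLF model uses for count responses.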

318 | Latent space approaches to social network analysis
- Hoff, Raftery, et al.
- 2002

Citation Context: ...nate strategy would be to work with a continuous latent factor model where the interactions are modeled through a distance function. Such strategies have been pursued recently for social network data [11]; generalization to dyadic data with elements obtained from two different sets is challenging. Although the current work focuses on predictive discrete latent factors based on generalized linear model...

318 | A Framework for Collaborative, Content-Based, and Demographic Filtering
- Pazzani
- 1999

Citation Context: ... method employs a one-dimensional discrete cluster model where the cluster assignment variables are modeled using a Potts allocation model. Other Work. In the context of recommender systems, [21] considered combining information in the local structure of preference ratings as well as demographic and content-based covariates using an ensemble-based approach. This ensemble method, however, does...

231 | Estimation and prediction of stochastic block structures
- Nowicki, Snijders
- 2001

Citation Context: ...er, such models are mainly used for explanatory analysis and are not well suited for prediction tasks. Models similar to ours have been studied for small problems in one dimension [1]. More recently, [20] proposed a block model for binary dyadic data which models incidence matrices in social networks where both row and column elements are the same. However, their method does not incorporate covariates...

135 | A generalized maximum entropy approach to Bregman co-clustering and matrix approximation
- Banerjee, Dhillon, et al.
- 2004

Citation Context: ...ignments, that are linear in the number of non-zeros in the dyadic matrix. The algorithms generalize several existing algorithms including GLM regression [16], co-clustering using Bregman divergences [2], cross-association learning [4], NPMLE [1], etc. • We present an extensive empirical evaluation of our procedure through simulation experiments, analysis of a publicly available movie rating dataset,...

97 | Fully automatic cross-associations
- Chakrabarti, Papadimitriou, et al.
- 2004

Citation Context: ... number of non-zeros in the dyadic matrix. The algorithms generalize several existing algorithms including GLM regression [16], co-clustering using Bregman divergences [2], cross-association learning [4], NPMLE [1], etc. • We present an extensive empirical evaluation of our procedure through simulation experiments, analysis of a publicly available movie rating dataset, and illustrations on a real dat...

46 | Unsupervised learning on k-partite graphs
- Long, Zhang, et al.
- 2006

Citation Context: ...on co-clustering methods, we refer the reader to [15]. We note that none of these methods make use of additional covariates for modeling the response as we do in our PDLF model. Recently, Long et al. [14] proposed a relational summary network (RSN) model for clustering over k-partite graphs describing relations between k classes of entities. The RSN model considers not only pairwise interactions, but ...

44 | Efficient analysis of mixed hierarchical and cross-classified random structures using a multilevel model
- Rasbash, Goldstein
- 1994

Citation Context: ...odel and provides better performance. An alternate strategy that has been widely used in the statistics literature provides a more continuous approximation through a hierarchical random-effects model [22]. However, such models are mainly used for explanatory analysis and are not well suited for prediction tasks. Models similar to ours have been studied for small problems in one dimension [1]. More rec...

42 | A general maximum likelihood analysis of overdispersion in generalized linear models
- Aitkin
- 1996

Citation Context: ...non-zeros in the dyadic matrix. The algorithms generalize several existing algorithms including GLM regression [16], co-clustering using Bregman divergences [2], cross-association learning [4], NPMLE [1], etc. • We present an extensive empirical evaluation of our procedure through simulation experiments, analysis of a publicly available movie rating dataset, and illustrations on a real dataset from a...

40 | Modelling spatially correlated data via mixtures: a Bayesian approach
- Fernández, Green
- 2002

Citation Context: ...rks where both row and column elements are the same. However, their method does not incorporate covariates and was illustrated only on a small dataset. Another model of similar nature was proposed by [7] for spatial data. This method employs a one-dimensional discrete cluster model where the cluster assignment variables are modeled using a Potts allocation model. Other Work. I...

31 | Convergence theorems for generalized alternating minimization procedures
- Gunawardana, Byrne
- 2005

Citation Context: ...ut can be readily computed using convex optimization methods such as the Newton-Raphson method. In fact, since the generalized EM algorithm does not require an exact optimization over each argument [10], it is sufficient ... Algorithm 1 Generalized EM Algorithm for PDLF Model. Input: Response matrix Y = [yij] ∈ Rm×n with measure W = [wij] ∈ [0, 1]m×n, covariates X = [xij] ∈ Rm×n×...
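The inner optimization the context describes can be loosened exactly as [10] allows: a generalized EM M-step only needs to increase the objective, so one or two Newton-Raphson updates often suffice instead of running to convergence. A toy one-dimensional sketch (the objective and all names are illustrative, not the paper's):

```python
def newton_step(theta, grad, hess, n_steps=1):
    """Apply a few Newton-Raphson updates: theta <- theta - grad/hess.

    In a generalized EM M-step it is enough for these updates to
    increase the objective; exact maximization is not required.
    """
    for _ in range(n_steps):
        theta = theta - grad(theta) / hess(theta)
    return theta

# Maximizing f(t) = -(t - 2)^2: grad(t) = -2(t - 2), hess(t) = -2.
# The objective is quadratic, so a single step lands on the maximizer.
t_star = newton_step(0.0, lambda t: -2 * (t - 2), lambda t: -2.0)  # 2.0
```

For non-quadratic GLM likelihoods the step is the same shape but must be repeated (or damped); monotone ascent is what the convergence theorems of [10] require.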

4 | Distributed Learning using Generative Models
- Merugu
- 2006

Citation Context: ...ted sum of element-wise Bregman divergences between the matrices Y and Ŷ. This co-clustering formulation also permits an alternate interpretation in terms of a structured mixture model as presented in [17]. We briefly describe this connection. For dyad (i, j), let ρ(i) and γ(j) denote the row and column memberships of the ith row and jth column respectively. We assume the cluster ids for rows and column...

1 | Targeted internet advertising using predictive clustering and linear programming. http://research.microsoft.com/ meek/papers/goal-oriented.ps
- Chickering, Heckerman, et al.

Citation Context: ...h. This ensemble method, however, does not leverage the full potential of the underlying local structure and is not as interpretable as the PDLF model. Our current work is also related to recent work [5] on goal-oriented or predictive clustering, which uses a bottleneck-like method, where the rows are clustered to retain maximal information about the dyadic response. Unlike our method, this approach ...