Results 1–10 of 29
Generalized probabilistic matrix factorizations for collaborative filtering
2010
Cited by 18 (3 self)
Probabilistic matrix factorization (PMF) methods have shown great promise in collaborative filtering. In this paper, we consider several variants and generalizations of the PMF framework inspired by three broad questions: Are the prior distributions used in existing PMF models suitable, or can one get better predictive performance with different priors? Are there suitable extensions to leverage side information? Are there benefits to taking into account row and column biases? We develop new families of PMF models to address these questions, along with efficient approximate inference algorithms for learning and prediction. Through extensive experiments on movie recommendation datasets, we illustrate that simpler models directly capturing correlations among latent factors can outperform existing PMF models, that side information can benefit prediction accuracy, and that accounting for row/column biases leads to improvements in predictive performance. Keywords: probabilistic matrix factorization, topic models, variational inference.
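To make the PMF setup concrete, here is a minimal MAP-style sketch: alternating ridge solves for the two factor matrices, which correspond to point estimates under simple Gaussian priors. The ratings matrix, latent dimension, and prior strength below are invented for illustration; this is a generic baseline, not the authors' generalized models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ratings matrix; NaN marks a missing entry (all data here is hypothetical).
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 2.0, 5.0]])
mask = ~np.isnan(R)
k, lam = 2, 0.05                     # latent dimension, Gaussian-prior (ridge) strength

def als_step(X, M, B, lam):
    """Solve for one factor matrix given the other: one ridge regression per row,
    which is the MAP update for that factor under a Gaussian prior."""
    k = B.shape[1]
    A = np.zeros((X.shape[0], k))
    for i in range(X.shape[0]):
        o = M[i]                     # columns observed in row i
        Bo = B[o]
        A[i] = np.linalg.solve(Bo.T @ Bo + lam * np.eye(k), Bo.T @ X[i, o])
    return A

U = 0.1 * rng.standard_normal((R.shape[0], k))
V = 0.1 * rng.standard_normal((R.shape[1], k))
for _ in range(50):                  # alternate until the factors settle
    U = als_step(R, mask, V, lam)
    V = als_step(R.T, mask.T, U, lam)

R_hat = U @ V.T                      # predictions for every cell, missing ones included
```

The abstract's questions map directly onto this sketch: a different prior would change the `lam` term and the per-row solve, side information would enter those regressions as extra covariates, and row/column biases would be added on top of `U @ V.T`.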
An Association Analysis Approach to Biclustering
Cited by 16 (5 self)
The discovery of biclusters, which denote groups of items that show coherent values across a subset of all the transactions in a data set, is an important type of analysis performed on real-valued data sets in various domains, such as biology. Several algorithms have been proposed to find different types of biclusters in such data sets. However, these algorithms are unable to search the space of all possible biclusters exhaustively. Pattern mining algorithms in association analysis also essentially produce biclusters as their result, since the patterns consist of items that are supported by a subset of all the transactions. However, a major limitation of the numerous techniques developed in association analysis is that they are only able to analyze data sets with binary and/or categorical variables, and their application to real-valued data sets often involves some lossy transformation such as discretization or binarization of the attributes.
PAC-Bayesian Analysis of Co-clustering and Beyond
Cited by 15 (7 self)
We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering. We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish between two tasks in matrix data analysis: discriminative prediction of the missing entries in data matrices, and estimation of the joint probability distribution of row and column variables in co-occurrence matrices. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that were absent in previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a trade-off between its empirical performance and the mutual information preserved by the cluster variables on row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this trade-off for discriminative prediction tasks. This algorithm achieved state-of-the-art performance on the MovieLens collaborative filtering task. Our co-clustering model can also be seen as matrix tri-factorization, and the results provide generalization bounds, regularization
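For the discriminative prediction task described above, the simplest co-clustering baseline partitions rows and columns into a grid and predicts each cell by its block mean. The sketch below uses alternating hard assignments with random restarts on planted synthetic data; the data, cluster counts, and restart scheme are illustrative assumptions, not the paper's PAC-Bayesian procedure (whose regularization terms are exactly what this naive version lacks).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic matrix with a planted 2x2 block structure (entirely made-up data).
row_c = np.repeat([0, 1], 10)                # planted row clusters
col_c = np.repeat([0, 1], 8)                 # planted column clusters
means = np.array([[1.0, 4.0], [5.0, 2.0]])   # per-block means
X = means[np.ix_(row_c, col_c)] + 0.1 * rng.standard_normal((20, 16))

def cocluster(X, kr, kc, iters=20):
    """One run of alternating hard co-clustering from a random start."""
    r = rng.integers(kr, size=X.shape[0])
    c = rng.integers(kc, size=X.shape[1])
    for _ in range(iters):
        # Re-estimate block means under the current assignment.
        M = np.array([[X[np.ix_(r == a, c == b)].mean()
                       if (r == a).any() and (c == b).any() else 0.0
                       for b in range(kc)] for a in range(kr)])
        # Reassign each row, then each column, to its best cluster under M.
        r = np.argmin([[((X[i] - M[a, c]) ** 2).sum() for a in range(kr)]
                       for i in range(X.shape[0])], axis=1)
        c = np.argmin([[((X[:, j] - M[r, b]) ** 2).sum() for b in range(kc)]
                       for j in range(X.shape[1])], axis=1)
    return r, c, M

def fit(X, kr, kc, restarts=10):
    """Keep the restart with the lowest mean squared reconstruction error."""
    best = None
    for _ in range(restarts):
        r, c, M = cocluster(X, kr, kc)
        err = ((X - M[r][:, c]) ** 2).mean()
        if best is None or err < best[0]:
            best = (err, r, c, M)
    return best

err, r, c, M = fit(X, 2, 2)
```

A missing entry (i, j) would be predicted by the block mean `M[r[i], c[j]]`, which is the grid-partition prediction the paper analyzes.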
Nonparametric Bayesian Co-clustering Ensembles
Cited by 12 (1 self)
A nonparametric Bayesian approach to co-clustering ensembles is presented. Similar to clustering ensembles, co-clustering ensembles combine various base co-clustering results to obtain a more robust consensus co-clustering. To avoid pre-specifying the number of co-clusters, we specify independent Dirichlet process priors for the row and column clusters. Thus, the numbers of row and column clusters are unbounded a priori; the actual numbers of clusters can be learned a posteriori from observations. Next, to model non-independence of row and column clusters, we employ a Mondrian process as a prior distribution over partitions of the data matrix. As a result, the co-clusters are not restricted to a regular grid partition, but form nested partitions with varying resolutions. The empirical evaluation
Constrained Co-Clustering for Textual Documents
Proc. Conf. Artificial Intelligence (AAAI), 2010
Cited by 11 (5 self)
In this paper, we present a constrained co-clustering approach for clustering textual documents. Our approach combines the benefits of information-theoretic co-clustering and constrained clustering. We use a two-sided hidden Markov random field (HMRF) to model both the document and word constraints. We also develop an alternating expectation maximization (EM) algorithm to optimize the constrained co-clustering model. We have conducted two sets of experiments on a benchmark data set: (1) using human-provided category labels to derive document and word constraints for semi-supervised document clustering, and (2) using automatically extracted named entities to derive document constraints for unsupervised document clustering. Compared to several representative constrained clustering and co-clustering approaches, our approach is shown to be more effective for high-dimensional, sparse text data.
Latent Dirichlet Bayesian Co-clustering
In Proceedings of the European Conference on Machine Learning, 2009
Cited by 8 (4 self)
Co-clustering has emerged as an important technique for mining contingency data matrices. However, almost all existing co-clustering algorithms are hard partitioning, assigning each row and column of the data matrix to exactly one cluster. Recently, a Bayesian co-clustering approach has been proposed which allows a probability distribution over membership in row and column clusters. That approach uses variational inference for parameter estimation. In this work, we modify the Bayesian co-clustering model and use collapsed Gibbs sampling and collapsed variational inference for parameter estimation. Our empirical evaluation on real data sets shows that both collapsed Gibbs sampling and collapsed variational inference are able to find more accurate likelihood estimates than the standard variational Bayesian co-clustering approach.
Variational Inference for Nonparametric Multiple Clustering
Cited by 5 (2 self)
Most clustering algorithms produce a single clustering solution. Similarly, feature selection for clustering tries to find one feature subset where one interesting clustering solution resides. However, a single data set may be multi-faceted and can be grouped and interpreted in many different ways, especially for high-dimensional data, where feature selection is typically needed. Moreover, different clustering solutions are interesting for different purposes. Instead of committing to one clustering solution, in this paper we introduce a probabilistic nonparametric Bayesian model that can simultaneously discover several possible clustering solutions and the feature subset views that generated each cluster partitioning. We provide a variational inference approach to learn the features and clustering partitions in each view. Our model not only learns the multiple clusterings and views but also automatically learns the number of views and the number of clusters in each view. Keywords: multiple clustering, non-redundant/disparate clustering, feature selection, nonparametric Bayes, variational inference.
Mixed-Membership Naive Bayes Models
Cited by 5 (2 self)
In recent years, mixture models have found widespread usage in discovering latent cluster structure from data. A popular special case of finite mixture models is the naive Bayes model, where the probability of a feature vector factorizes over the features for any given component of the mixture. Despite their popularity, naive Bayes models suffer from two important restrictions: first, they do not have a natural mechanism for handling sparsity, where each data point may have only a few observed features; and second, they do not allow objects to be generated from different latent clusters with varying degrees (i.e., mixed memberships) in the generative process. In this paper, we first introduce marginal naive Bayes (MNB) models, which generalize naive Bayes models to handle sparsity by marginalizing over all missing features. More importantly, we propose mixed-membership naive Bayes (MMNB) models, which generalize (marginal) naive Bayes models to allow for mixed memberships in the generative process. MMNB models can be viewed as a natural generalization of latent Dirichlet allocation (LDA) with the ability to handle heterogeneous and possibly sparse feature vectors. We propose two variational inference algorithms to learn MMNB models from data. While the first exactly follows the corresponding ideas for LDA, the second uses far fewer variational parameters, leading to a much faster algorithm with smaller time and space requirements. An application of the same idea in the context of topic modeling leads to a new Fast LDA algorithm. The efficacy of the proposed mixed-membership models and the fast variational inference algorithms is demonstrated by extensive experiments on a wide variety of datasets.
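The marginalization trick behind MNB is easy to see in code: under the naive Bayes factorization, an unobserved feature's likelihood factor sums to one and simply drops out, so classification uses only the observed features. The Bernoulli parameters below are invented for illustration; this is a sketch of the marginal naive Bayes idea only, not the paper's MMNB model.

```python
import numpy as np

# Hypothetical Bernoulli naive Bayes parameters: 2 classes, 4 binary features.
prior = np.array([0.5, 0.5])
theta = np.array([[0.9, 0.8, 0.1, 0.2],   # P(x_j = 1 | class 0)
                  [0.1, 0.2, 0.9, 0.8]])  # P(x_j = 1 | class 1)

def posterior(x):
    """Class posterior for a sparse observation x, with NaN marking a missing feature.

    Marginalizing a missing feature under naive Bayes just drops its factor,
    since sum_v P(x_j = v | class) = 1 for every class.
    """
    obs = ~np.isnan(x)                                    # features actually observed
    like = np.where(x[obs] == 1.0,
                    theta[:, obs], 1.0 - theta[:, obs]).prod(axis=1)
    post = prior * like
    return post / post.sum()

p = posterior(np.array([1.0, np.nan, 0.0, np.nan]))       # only features 0 and 2 seen
```

With every feature missing, the likelihood product is empty and the posterior falls back to the prior, which is exactly the behavior marginalization should give.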
Residual Bayesian Co-clustering for Matrix Approximation
Cited by 5 (1 self)
In recent years, matrix approximation for missing value prediction has emerged as an important problem in a variety of domains such as recommendation systems, e-commerce, and online advertisement. While matrix factorization based algorithms typically have good approximation accuracy, such algorithms can be slow, especially for large matrices. Further, such algorithms cannot naturally make predictions for new rows or columns. In this paper, we propose residual Bayesian co-clustering (RBC), which learns a generative model for the matrix from the non-missing entries and uses the model to predict the missing entries. RBC extends Bayesian co-clustering by taking row and column biases into consideration. The model allows mixed memberships of rows and columns to multiple clusters, and can naturally handle prediction for new rows and columns that were not used in training, given only a few non-missing entries in them. We propose two variational inference based algorithms for learning the model and predicting missing entries. One of the proposed algorithms leads to a parallel RBC which can achieve significant speedups. The efficacy of RBC is demonstrated by extensive experimental comparisons with state-of-the-art algorithms on real-world datasets.
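The row/column bias component that RBC adds to Bayesian co-clustering can be felt with a bias-only additive baseline: predict global mean + row bias + column bias, fit on the observed entries, and note that a brand-new row needs only a few observed entries to get a bias estimate. All numbers below are invented, and the real RBC model adds mixed cluster memberships on top of these biases.

```python
import numpy as np

# Toy ratings; NaN marks a missing entry (hypothetical data).
R = np.array([[5.0, 4.0, np.nan],
              [3.0, np.nan, 1.0],
              [np.nan, 3.0, 2.0]])
mask = ~np.isnan(R)

mu = np.nanmean(R)                       # global mean over observed entries
b = np.zeros(R.shape[0])                 # row biases
c = np.zeros(R.shape[1])                 # column biases

# Alternating mean updates: each step is the least-squares optimum for one
# bias vector given the other, restricted to the observed entries.
for _ in range(100):
    b = np.array([np.nanmean(R[i] - mu - c) for i in range(R.shape[0])])
    c = np.array([np.nanmean(R[:, j] - mu - b) for j in range(R.shape[1])])

R_hat = mu + b[:, None] + c[None, :]     # predictions for every cell

# A new row unseen in training: its bias comes from its few observed entries.
new_row = np.array([np.nan, 4.0, np.nan])
b_new = np.nanmean(new_row - mu - c)
pred_new = mu + b_new + c
```

This mirrors the abstract's point that only a few non-missing entries are needed to make predictions for a row that never appeared in training.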
A Nonparametric Bayesian Model for Multiple Clustering with Overlapping Feature Views
Journal of Machine Learning Research, 2012
Cited by 2 (2 self)
Most clustering algorithms produce a single clustering solution. This is inadequate for many data sets that are multi-faceted and can be grouped and interpreted in many different ways. Moreover, for high-dimensional data, different features may be relevant or irrelevant to each clustering solution, suggesting the need for feature selection in clustering. Features relevant to one clustering interpretation may be different from the ones relevant for an alternative interpretation or view of the data. In this paper, we introduce a probabilistic nonparametric Bayesian model that can discover multiple clustering solutions from data, along with the feature subsets that are relevant for the clusters in each view. In our model, the features in different views may be shared, and therefore the sets of relevant features are allowed to overlap. We model feature relevance to each view using an Indian Buffet Process and the cluster membership in each view using a Chinese Restaurant Process. We provide an inference approach to learn the latent parameters corresponding to this multiple partitioning problem. Our model not only learns the features and clusters in each view but also automatically learns the number of clusters, number of views, and number of features in each view.