Results 1–10 of 93
Information-Theoretic Co-Clustering
 In KDD
, 2003
Abstract

Cited by 346 (12 self)
Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory: the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters.
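As a rough illustration of the objective described in this abstract (not the authors' search algorithm), the mutual information retained by a given co-clustering can be computed by aggregating the joint distribution into cluster blocks. The sketch below assumes NumPy; the co-occurrence table and cluster assignments are invented for illustration:

```python
import numpy as np

def mutual_information(p):
    """Mutual information I(X;Y) in nats for a joint distribution p[x, y]."""
    px = p.sum(axis=1, keepdims=True)   # marginal p(x)
    py = p.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (px @ py)[mask])))

def clustered_mi(p, row_labels, col_labels, k, l):
    """I(X_hat; Y_hat): aggregate the joint into k x l cluster blocks, then score."""
    q = np.zeros((k, l))
    for i, r in enumerate(row_labels):
        for j, c in enumerate(col_labels):
            q[r, c] += p[i, j]
    return mutual_information(q)

# Invented 4x4 co-occurrence table, normalized to a joint distribution.
counts = np.array([[5.0, 5, 0, 0],
                   [4, 6, 0, 0],
                   [0, 0, 6, 4],
                   [0, 0, 5, 5]])
p = counts / counts.sum()

# A block-diagonal co-clustering retains more of I(X;Y)
# than one that mixes the two blocks.
good = clustered_mi(p, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)
bad = clustered_mi(p, [0, 1, 0, 1], [0, 1, 0, 1], 2, 2)
assert good > bad
```

The optimal co-clustering in the paper's formulation is the assignment that maximizes this score over all row/column groupings of the given sizes.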
Distributional Word Clusters vs. Words for Text Categorization
 Journal of Machine Learning Research
, 2003
Abstract

Cited by 89 (7 self)
We study an approach to text categorization that combines distributional clustering of words and a Support Vector Machine (SVM) classifier. This word-cluster representation is computed using the recently introduced Information Bottleneck method, which generates a compact and efficient representation of documents. When combined with the classification power of the SVM, this method yields high performance in text categorization. This novel combination of SVM with word-cluster representation is compared with SVM-based categorization using the simpler bag-of-words (BOW) representation. The comparison is performed over three known datasets. On one of these datasets (the 20 Newsgroups) the method based on word clusters significantly outperforms the word-based representation in terms of categorization accuracy or representation efficiency. On the two other sets (Reuters-21578 and WebKB) the word-based representation slightly outperforms the word-cluster representation. We investigate the potential reasons for this behavior and relate it to structural differences between the datasets.
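A minimal sketch of the word-cluster representation itself: each word is mapped to its cluster, and the document becomes a vector of cluster counts rather than word counts. The clusters below are hand-picked for illustration; in the paper they are produced by the Information Bottleneck method:

```python
def cluster_bow(tokens, word_to_cluster, n_clusters):
    """Represent a document as counts over word clusters instead of words."""
    vec = [0] * n_clusters
    for t in tokens:
        if t in word_to_cluster:          # words outside the vocabulary are dropped
            vec[word_to_cluster[t]] += 1
    return vec

# Hypothetical clusters: 0 = sports-like words, 1 = politics-like words.
w2c = {"goal": 0, "match": 0, "ballot": 1, "senate": 1}
doc = "the match ended with a late goal".split()
cluster_bow(doc, w2c, 2)  # -> [2, 0]
```

The resulting low-dimensional vectors are what the SVM is trained on in place of the full bag-of-words.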
Non-Redundant Data Clustering
, 2004
Abstract

Cited by 87 (3 self)
Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. In practice this discovery process should avoid redundancies with existing knowledge about class structures or groupings, and reveal novel, previously unknown aspects of the data. In order to deal with this problem, we present an extension of the information bottleneck framework, called coordinated conditional information bottleneck, which takes negative relevance information into account by maximizing a conditional mutual information score subject to constraints. Algorithmically, one can apply an alternating optimization scheme that can be used in conjunction with different types of numeric and non-numeric attributes. We present experimental results for applications in text mining and computer vision.
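The conditional mutual information score at the heart of this framework, I(C; Y | Z) for cluster variable C, relevance variable Y, and known structure Z, can be computed from a joint distribution as follows. This is an illustrative helper assuming NumPy, not the authors' alternating optimization procedure:

```python
import numpy as np

def conditional_mi(p):
    """I(C;Y|Z) in nats from a joint distribution p[c, y, z]."""
    pz = p.sum(axis=(0, 1))                      # marginal p(z)
    cmi = 0.0
    for z in range(p.shape[2]):
        if pz[z] == 0:
            continue
        pcy = p[:, :, z] / pz[z]                 # conditional p(c, y | z)
        pc = pcy.sum(axis=1, keepdims=True)
        py = pcy.sum(axis=0, keepdims=True)
        mask = pcy > 0
        cmi += pz[z] * np.sum(pcy[mask] * np.log(pcy[mask] / (pc @ py)[mask]))
    return float(cmi)

# Toy check: C duplicates Y, and Z is independent and uniform,
# so I(C;Y|Z) = I(C;Y) = log 2.
p = np.zeros((2, 2, 2))
p[0, 0, :] = p[1, 1, :] = 0.25
assert abs(conditional_mi(p) - np.log(2)) < 1e-12
```

Maximizing this score drives the new clustering C to be informative about the data Y in ways not already explained by the known grouping Z.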
Expectation maximization and posterior constraints
 In NIPS 20
, 2008
Abstract

Cited by 75 (12 self)
The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the variables in the model are not observed. Very often, however, our aim is primarily to find a model that assigns values to the latent variables that have intended meaning for our data, and maximizing expected likelihood only sometimes accomplishes this. Unfortunately, it is typically difficult to add even simple a priori information about latent variables in graphical models without making the models overly complex or intractable. In this paper, we present an efficient, principled way to inject rich constraints on the posteriors of latent variables into the EM algorithm. Our method can be used to learn tractable graphical models that satisfy additional, otherwise intractable constraints. Focusing on clustering and the alignment problem for statistical machine translation, we show that simple, intuitive posterior constraints can greatly improve the performance over standard baselines and be competitive with more complex, intractable models.
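For the special case of hard constraints on which latent values a data point may take, the projection in the modified E-step reduces to masking and renormalizing the model posterior (the KL-closest distribution that respects the constraint). The toy sketch below assumes NumPy; the paper's method handles richer expectation constraints than this:

```python
import numpy as np

def constrained_e_step(log_post, allowed):
    """Project the model posterior onto a hard constraint set.

    For constraints of the form 'point i may only take latent values
    where allowed[i] is 1', the KL-closest constrained posterior is the
    model posterior restricted to the allowed set and renormalized.
    """
    q = np.exp(log_post) * allowed           # zero out disallowed latent values
    return q / q.sum(axis=1, keepdims=True)  # renormalize per data point

# Model posterior over 3 latent values for 2 points (rows sum to 1).
post = np.array([[0.5, 0.3, 0.2],
                 [0.1, 0.6, 0.3]])
# Constraint: point 0 may not use latent value 2; point 1 is unconstrained.
allowed = np.array([[1, 1, 0],
                    [1, 1, 1]], dtype=float)
q = constrained_e_step(np.log(post), allowed)
# q[0] = [0.625, 0.375, 0.0]; q[1] is unchanged.
```

The M-step then proceeds as in ordinary EM, but with expectations taken under the projected posterior q instead of the model posterior.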
Multi-way distributional clustering via pairwise interactions
 In ICML
, 2005
Abstract

Cited by 60 (10 self)
We present a novel unsupervised learning scheme that simultaneously clusters variables of several types (e.g., documents, words and authors) based on pairwise interactions between the types, as observed in co-occurrence data. In this scheme, multiple clustering systems are generated with the aim of maximizing an objective function that measures multiple pairwise mutual information between cluster variables. To implement this idea, we propose an algorithm that interleaves top-down clustering of some variables and bottom-up clustering of the other variables, with a local optimization correction routine. Focusing on document clustering, we present an extensive empirical study of two-way, three-way and four-way applications of our scheme using six real-world datasets, including the 20 Newsgroups (20NG) and the Enron email collection. Our multi-way distributional clustering (MDC) algorithms consistently and significantly outperform previous state-of-the-art information-theoretic clustering algorithms.
Generative model-based document clustering: a comparative study
 Knowledge and Information Systems
, 2005
Abstract

Cited by 50 (0 self)
Semi-supervised learning has become an attractive methodology for improving classification models and is often viewed as using unlabeled data to aid supervised learning. However, it can also be viewed as using labeled data to help clustering, namely, semi-supervised clustering. Viewing semi-supervised learning from a clustering angle is useful in practical situations where the set of labels available in the labeled data is not complete, i.e., the unlabeled data contain new classes that are not present in the labeled data. This paper analyzes several multinomial model-based semi-supervised document clustering methods under a principled model-based clustering framework. The framework naturally leads to a deterministic annealing extension of existing semi-supervised clustering approaches. We compare three (slightly) different semi-supervised approaches for clustering documents: Seeded damnl, Constrained damnl, and Feedback-based damnl, where damnl stands for the multinomial model-based deterministic annealing algorithm. The first two are extensions of the seeded k-means and constrained k-means algorithms studied by Basu et al. (2002); the last one is motivated by Cohn et al. (2003). Through empirical experiments on text datasets, we show that: (a) deterministic annealing can often significantly improve the performance of semi-supervised clustering; (b) the constrained approach is the best when the available labels are complete, whereas the feedback-based approach excels when the available labels are incomplete.
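The seeded k-means baseline that two of these methods extend can be sketched in a few lines: centroids are initialized from the labeled seed points instead of at random. This sketch assumes NumPy and Euclidean distance; the paper's damnl variants replace the hard assignments with multinomial models and deterministic annealing:

```python
import numpy as np

def seeded_kmeans(X, seed_idx, seed_labels, k, iters=20):
    """k-means with centroids initialized from labeled seed points
    (the Seeded-KMeans idea of Basu et al.; illustrative only)."""
    centers = np.array([X[seed_idx[seed_labels == c]].mean(axis=0)
                        for c in range(k)])
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        # Recompute centroids from the current assignment.
        for c in range(k):
            if (assign == c).any():
                centers[c] = X[assign == c].mean(axis=0)
    return assign, centers

# Two 1-D clusters near 0 and near 5; one labeled seed per class.
X = np.array([[0.0], [0.2], [5.0], [5.2], [0.1], [5.1]])
seed_idx = np.array([0, 2])
seed_labels = np.array([0, 1])
assign, _ = seeded_kmeans(X, seed_idx, seed_labels, k=2)
# assign -> [0, 0, 1, 1, 0, 1]
```

Seeding anchors each cluster to a labeled class, so the recovered cluster indices are directly interpretable as class labels.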
Nonnegative matrix factorization for rapid recovery of constituent spectra in magnetic resonance chemical shift imaging of the brain
 IEEE Trans on Med Imaging
, 2004
Abstract

Cited by 38 (1 self)
Abstract—We present an algorithm for blindly recovering constituent source spectra from magnetic resonance (MR) chemical shift imaging (CSI) of the human brain. The algorithm, which we call constrained nonnegative matrix factorization (cNMF), does not enforce independence or sparsity, instead only requiring the source and mixing matrices to be nonnegative. It is based on the nonnegative matrix factorization (NMF) algorithm, extending it to include a constraint on the positivity of the amplitudes of the recovered spectra. This constraint enables recovery of physically meaningful spectra even in the presence of noise that causes a significant number of the observation amplitudes to be negative. We demonstrate and characterize the algorithm’s performance using ³¹P volumetric brain data, comparing the results with two different blind source separation methods: Bayesian spectral decomposition (BSD) and nonnegative sparse coding (NNSC). We then incorporate the cNMF algorithm into a hierarchical decomposition framework, showing that it can be used to recover tissue-specific spectra given a processing hierarchy that proceeds coarse-to-fine. We demonstrate the hierarchical procedure on ¹H brain data and conclude that the computational efficiency of the algorithm makes it well-suited for use in diagnostic workup. Index Terms—Blind source separation (BSS), chemical shift imaging (CSI), hierarchical decomposition, magnetic resonance (MR), magnetic resonance spectroscopy (MRS), nonnegative matrix factorization (NMF).
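The NMF core that cNMF builds on can be sketched with the standard Lee–Seung multiplicative updates for the Frobenius objective. This is the generic algorithm, not the authors' cNMF, which additionally handles negative observation amplitudes caused by noise; the data below are invented:

```python
import numpy as np

def nmf(V, k, iters=500, eps=1e-9, seed=0):
    """Multiplicative-update NMF minimizing ||V - W H||_F^2,
    with W and H kept elementwise nonnegative throughout."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update coefficients
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis ("spectra")
    return W, H

# A mixture of two nonnegative "spectra" observed at three voxels.
S = np.array([[1.0, 0.0, 2.0, 0.0],
              [0.0, 3.0, 0.0, 1.0]])
A = np.array([[1.0, 0.2],
              [0.3, 1.0],
              [0.8, 0.8]])
V = A @ S
W, H = nmf(V, k=2)  # W, H stay nonnegative and W @ H approximates V
```

Because the updates are multiplicative, entries initialized positive can never become negative, which is what makes the recovered factors interpretable as amplitudes and spectra.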