Document clustering via adaptive subspace iteration
In SIGIR, 2004
Abstract

Cited by 36 (7 self)
Document clustering has long been an important problem in information retrieval. In this paper, we present a new clustering algorithm, ASI (Adaptive Subspace Iteration), which explicitly models the subspace structure associated with each cluster. ASI simultaneously performs data reduction and subspace identification via an iterative alternating optimization procedure. Motivated by the optimization procedure, we then provide a novel method to determine the number of clusters. We also discuss the connections of ASI with various existing clustering approaches. Finally, extensive experimental results on real data sets show the effectiveness of the ASI algorithm.
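The alternating optimization described above can be illustrated with a minimal sketch. It assumes a factorization of the form W ≈ A X B^T, where W is the document-term matrix, A is a hard cluster indicator, B has orthonormal columns spanning the identified subspace, and X holds the cluster representatives in that subspace. The paper's exact objective and update rules may differ. The function name `asi_sketch` and the farthest-first initialization are hypothetical choices made for illustration.

```python
import numpy as np

def asi_sketch(W, K, k, n_iter=50):
    """Hypothetical sketch: alternately minimize ||W - A X B^T||_F^2.

    W: (n, m) data matrix, e.g. documents x terms.
    A: (n, K) hard cluster indicator (stored here as integer labels).
    B: (m, k) basis with orthonormal columns (the subspace), k <= m.
    X: (K, k) cluster representatives expressed in that subspace.
    """
    W = np.asarray(W, dtype=float)
    n, m = W.shape
    # Deterministic farthest-first seeding (an assumption, not from the paper).
    seeds = [0]
    d = ((W - W[0]) ** 2).sum(axis=1)
    for _ in range(1, K):
        seeds.append(int(d.argmax()))
        d = np.minimum(d, ((W - W[seeds[-1]]) ** 2).sum(axis=1))
    labels = ((W[:, None, :] - W[seeds][None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(n_iter):
        counts = np.bincount(labels, minlength=K).astype(float)
        # Cluster means (A^T A)^{-1} A^T W; an empty cluster keeps a zero row.
        M = np.zeros((K, m))
        np.add.at(M, labels, W)
        M /= np.maximum(counts, 1.0)[:, None]
        # B-step: with A fixed and X at its optimum, the objective equals
        # ||W||^2 - tr(B^T W^T P_A W B), where P_A projects onto span(A),
        # so B holds the top-k eigenvectors of W^T P_A W = M^T diag(counts) M.
        S = M.T @ (counts[:, None] * M)
        _, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
        B = eigvecs[:, ::-1][:, :k]
        # X-step: X = (A^T A)^{-1} A^T W B, i.e. cluster means projected onto B.
        X = M @ B
        # A-step: move each row to the cluster whose reconstruction x_c B^T is nearest.
        recon = X @ B.T
        new_labels = ((W[:, None, :] - recon[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, X, B
```

Each step can only lower (or keep) the squared reconstruction error, which is why the alternation is a natural fit for the "simultaneous data reduction and subspace identification" the abstract describes.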
A unified view on clustering binary data
 Machine Learning
Abstract

Cited by 12 (3 self)
Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. This paper studies the problem of clustering binary data. Binary data occupy a special place in the domain of data analysis. A unified view of binary data clustering is presented by examining the connections among various clustering criteria. Experimental studies are conducted to empirically verify these relationships.
A General Model for Clustering Binary Data
Research Track Paper
Abstract
Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. This paper studies the problem of clustering binary data. Such data arise, for example, in market basket datasets, where transactions contain items, and in document datasets, where documents contain a “bag of words”. The contribution of the paper is threefold. First, a general binary data clustering model is presented. The model treats data and features equally, based on their symmetric association relations, and explicitly describes both the data assignments and the feature assignments. We characterize several variations of the general model with different optimization procedures. Second, we establish the connections between our clustering model and other existing clustering methods. Third, we discuss the problem of determining the number of clusters for binary clustering. Experimental results show the effectiveness of the proposed clustering model.
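A model that "explicitly describes the data assignments as well as feature assignments" can be pictured as simultaneous clustering of rows and columns. The sketch below is one illustrative instance, not the paper's procedure. It assumes a block approximation W ≈ A X B^T, with binary row indicators A, binary column indicators B, and X holding the mean of each (row cluster, column cluster) block, and it alternates row, block-mean, and column updates. The names `binary_cocluster`, `block_means`, and `farthest_first` are hypothetical.

```python
import numpy as np

def block_means(W, rows, cols, K, C):
    """X[k, c] = mean of W over rows in cluster k and columns in cluster c."""
    X = np.zeros((K, C))
    for k in range(K):
        for c in range(C):
            block = W[np.ix_(rows == k, cols == c)]
            X[k, c] = block.mean() if block.size else 0.0
    return X

def farthest_first(V, K):
    """Deterministic seeding (an assumption): start at row 0, add farthest rows."""
    seeds = [0]
    d = ((V - V[0]) ** 2).sum(axis=1)
    for _ in range(1, K):
        seeds.append(int(d.argmax()))
        d = np.minimum(d, ((V - V[seeds[-1]]) ** 2).sum(axis=1))
    return ((V[:, None, :] - V[seeds][None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def binary_cocluster(W, K, C, n_iter=50):
    """Hypothetical sketch: minimize ||W - A X B^T||_F^2 with binary A and B."""
    W = np.asarray(W, dtype=float)
    rows = farthest_first(W, K)    # data (row) assignments
    cols = farthest_first(W.T, C)  # feature (column) assignments
    for _ in range(n_iter):
        X = block_means(W, rows, cols, K, C)
        # Row step: row cluster k predicts X[k, cols[j]] for every column j.
        P = X[:, cols]  # (K, m)
        new_rows = ((W[:, None, :] - P[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        X = block_means(W, new_rows, cols, K, C)
        # Column step: column cluster c predicts X[rows[i], c] for every row i.
        Q = X[new_rows, :]  # (n, C)
        new_cols = ((W[:, :, None] - Q[:, None, :]) ** 2).sum(axis=0).argmin(axis=1)
        converged = np.array_equal(new_rows, rows) and np.array_equal(new_cols, cols)
        rows, cols = new_rows, new_cols
        if converged:
            break
    return rows, cols, block_means(W, rows, cols, K, C)
```

Rows and columns are updated by the same rule with the roles swapped. This mirrors the symmetric treatment of data and features that the abstract emphasizes.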
Research Overview
2008
Abstract
My research explores two related topics on learning from data: how to efficiently discover useful patterns and how to effectively retrieve information. My interests lie broadly in data mining, machine learning, information retrieval, and bioinformatics, spanning both algorithmic and application issues. I focus strongly on research challenges grounded in real-world problems, and work to validate my research in this context. I have received an NSF CAREER Award, two IBM Faculty Research Awards, an IBM Shared University Research (SUR) award, and a Xerox University Affairs Committee (UAC) award for my work on data mining and its applications. All these awards are highly competitive and recognize the quality and importance of my work. My research output so far is: 22 papers in peer-reviewed journals, 2 book chapters, 72 papers