Results 1–10 of 17
Simultaneous feature selection and clustering using mixture models
IEEE Trans. Pattern Anal. Mach. Intell., 2004
"... Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched u ..."
Abstract

Cited by 122 (1 self)
 Add to MetaCart
Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, which attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.
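As a hedged sketch of the kind of model this abstract describes (the notation below is mine, not taken verbatim from the paper): a feature-saliency mixture assigns each feature d a saliency weight and mixes a cluster-specific density with a common density for irrelevant features,

```latex
p(\mathbf{x}) \;=\; \sum_{j=1}^{K} \alpha_j \prod_{d=1}^{D}
  \Big[ \rho_d \, p(x_d \mid \theta_{jd}) \;+\; (1-\rho_d) \, q(x_d \mid \lambda_d) \Big]
```

where the \(\alpha_j\) are mixing weights, \(p(x_d \mid \theta_{jd})\) is the cluster-specific density, \(q(x_d \mid \lambda_d)\) is a common density shared by all clusters, and \(\rho_d \in [0,1]\) is the saliency of feature d. Driving \(\rho_d \to 0\) (as the MML criterion does for irrelevant features) effectively removes feature d from the clustering.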
Mercer Kernel-Based Clustering in Feature Space
IEEE Transactions on Neural Networks, 2002
"... Abstract—This letter presents a method for both the unsupervised partitioning of a sample of data and the estimation of the possible number of inherent clusters which generate the data. This work exploits the notion that performing a nonlinear data transformation into some high dimensional feature s ..."
Abstract

Cited by 96 (0 self)
 Add to MetaCart
(Show Context)
This letter presents a method for both the unsupervised partitioning of a sample of data and the estimation of the possible number of inherent clusters which generate the data. This work exploits the notion that performing a nonlinear data transformation into some high-dimensional feature space increases the probability of the linear separability of the patterns within the transformed space and therefore simplifies the associated data structure. It is shown that the eigenvectors of a kernel matrix which defines the implicit mapping provide a means to estimate the number of clusters inherent within the data, and a computationally simple iterative procedure is presented for the subsequent feature-space partitioning of the data. Index Terms—Data clustering, data partitioning, unsupervised learning.
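The core idea, that the dominant eigenvalues of the kernel matrix indicate the number of inherent clusters, can be sketched as follows. This is a minimal illustration of the eigenvalue-counting intuition, not the paper's actual procedure; the RBF kernel width, the threshold, and the toy data are my assumptions.

```python
import numpy as np

def estimate_num_clusters(X, sigma=1.0, thresh=1.0):
    """Count dominant eigenvalues of an RBF kernel matrix as a rough
    estimate of the number of clusters in X."""
    # Pairwise squared Euclidean distances
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2 * sigma ** 2))   # RBF (Gaussian) kernel matrix
    eigvals = np.linalg.eigvalsh(K)      # symmetric matrix -> real eigenvalues
    # For well-separated clusters, K is near block-diagonal and each block
    # contributes one large eigenvalue; count eigenvalues above a threshold.
    return int(np.sum(eigvals > thresh))

rng = np.random.default_rng(0)
A = rng.normal([0, 0], 0.05, size=(20, 2))    # tight blob near the origin
B = rng.normal([10, 10], 0.05, size=(20, 2))  # tight blob far away
X = np.vstack([A, B])
print(estimate_num_clusters(X))  # 2 for these well-separated blobs
```

The threshold-based count is crude; in practice one would look at the eigenvalue spectrum (or eigengap) rather than a fixed cutoff.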
Semi-supervised conditional random fields for improved sequence segmentation and labeling
In International Committee on Computational Linguistics and the Association for Computational Linguistics, 2006
"... We present a new semisupervised training procedure for conditional random fields (CRFs) that can be used to train sequence segmentors and labelers from a combination of labeled and unlabeled training data. Our approach is based on extending the minimum entropy regularization framework to the struct ..."
Abstract

Cited by 78 (7 self)
 Add to MetaCart
(Show Context)
We present a new semi-supervised training procedure for conditional random fields (CRFs) that can be used to train sequence segmentors and labelers from a combination of labeled and unlabeled training data. Our approach is based on extending the minimum entropy regularization framework to the structured prediction case, yielding a training objective that combines unlabeled conditional entropy with labeled conditional likelihood. Although the training objective is no longer concave, it can still be used to improve an initial model (e.g., obtained from supervised training) by iterative ascent. We apply our new training algorithm to the problem of identifying gene and protein mentions in biological texts, and show that incorporating unlabeled data improves the performance of the supervised CRF in this case.
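The training objective described (labeled conditional likelihood combined with unlabeled conditional entropy) takes, in the usual minimum-entropy-regularization form, roughly this shape; the notation and the trade-off weight \(\gamma\) are mine, not the paper's:

```latex
\max_{\theta} \;\; \sum_{i \in \mathcal{L}} \log p_\theta(y_i \mid x_i)
\;-\; \gamma \sum_{j \in \mathcal{U}} H\!\big(p_\theta(\,\cdot \mid x_j)\big)
```

where \(\mathcal{L}\) indexes labeled and \(\mathcal{U}\) unlabeled examples, and \(H\) is the conditional entropy of the model's label distribution. Minimizing the entropy term pushes the model toward confident predictions on unlabeled data, which is what makes the objective non-concave.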
Entropy-Based Criterion in Categorical Clustering
Proc. of Intl. Conf. on Machine Learning (ICML), 2004
"... Entropytype measures for the heterogeneity of clusters have been used for a long time. This paper studies the entropybased criterion in clustering categorical data. It first shows that the entropybased criterion can be derived in the formal framework of probabilistic clustering models and e ..."
Abstract

Cited by 35 (4 self)
 Add to MetaCart
(Show Context)
Entropy-type measures for the heterogeneity of clusters have been used for a long time. This paper studies the entropy-based criterion in clustering categorical data. It first shows that the entropy-based criterion can be derived in the formal framework of probabilistic clustering models, and establishes the connection between the criterion and the approach based on dissimilarity coefficients.
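As a hedged illustration of what an entropy-based criterion for categorical clustering computes (the exact weighting used in the paper may differ; this size-weighted form is an assumption), one can score a partition by its within-cluster attribute entropies:

```python
import math
from collections import Counter

def expected_entropy(clusters):
    """Size-weighted sum, over clusters and categorical attributes, of the
    within-cluster attribute entropy. Lower values indicate more homogeneous
    clusters; a perfectly homogeneous partition scores zero."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        for d in range(len(c[0])):                 # each categorical attribute
            counts = Counter(row[d] for row in c)  # value frequencies in cluster
            h = -sum((v / len(c)) * math.log(v / len(c))
                     for v in counts.values())
            total += (len(c) / n) * h
    return total

# Each cluster is internally uniform, so the criterion is zero:
clusters = [[("a", "x"), ("a", "x")], [("b", "y"), ("b", "y")]]
print(expected_entropy(clusters))  # 0.0
```

Mixing categories within a cluster raises the score, so a clustering algorithm would search for the partition minimizing this quantity.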
Learning to model spatial dependency: Semi-supervised discriminative random fields
In: NIPS, 2007
"... We present a novel, semisupervised approach to training discriminative random elds (DRFs) that efciently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. We formulate DRF training as a form of MAP estimation that combines conditional ..."
Abstract

Cited by 26 (7 self)
 Add to MetaCart
(Show Context)
We present a novel, semi-supervised approach to training discriminative random fields (DRFs) that efficiently exploits labeled and unlabeled training data to achieve improved accuracy in a variety of image processing tasks. We formulate DRF training as a form of MAP estimation that combines conditional log-likelihood on labeled data, given a data-dependent prior, with a conditional entropy regularizer defined on unlabeled data. Although the training objective is no longer concave, we develop an efficient local optimization procedure that produces classifiers that are more accurate than ones based on standard supervised DRF training. We then apply our semi-supervised approach to train DRFs to segment both synthetic and real data sets, and demonstrate significant improvements over supervised DRFs in each case.
A unified view on clustering binary data
 Machine Learning
"... Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. This paper studies the problem of clustering binary data. Binary data have been occupying a special place in the domain of dat ..."
Abstract

Cited by 12 (3 self)
 Add to MetaCart
(Show Context)
Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. This paper studies the problem of clustering binary data. Binary data occupy a special place in the domain of data analysis. A unified view of binary data clustering is presented by examining the connections among various clustering criteria. Experimental studies are conducted to empirically verify the relationships.
Clustering, Dimensionality Reduction and Side Information, 2006
"... Recent advances in sensing and storage technology have created many highvolume, highdimensional data sets in pattern recognition, machine learning, and data mining. Unsupervised learning can provide generic tools for analyzing and summarizing these data sets when there is no welldefined notion of ..."
Abstract

Cited by 5 (0 self)
 Add to MetaCart
(Show Context)
Recent advances in sensing and storage technology have created many high-volume, high-dimensional data sets in pattern recognition, machine learning, and data mining. Unsupervised learning can provide generic tools for analyzing and summarizing these data sets when there is no well-defined notion of classes. The purpose of this thesis is to study some of the open problems in two main areas of unsupervised learning, namely clustering and (unsupervised) dimensionality reduction. Instance-level constraints on objects, an example of side-information, are also considered to improve the clustering results. Our first contribution is a modification to the isometric feature mapping (ISOMAP) algorithm when the input data, instead of being all available simultaneously, arrive sequentially from a data stream. ISOMAP is representative of a class of nonlinear dimensionality reduction algorithms that are based on the notion of a manifold. Both the standard ISOMAP and the landmark version of ISOMAP are considered. Experimental results on synthetic data as well as real-world images demonstrate that the modified algorithm can maintain an accurate low-dimensional representation of the data in an efficient manner. We study the problem of feature selection in model-based clustering when the number of clusters
Entropy-Inspired Competitive Clustering Algorithms
"... Abstract In this paper, the wellknown competitive clustering algorithm (CA) is revisited and reformulated from a point of view of entropy minimization. That is, the second term of the objective function in CA can be seen as quadratic or secondorder entropy. Along this novel explanation, two genera ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
In this paper, the well-known competitive clustering algorithm (CA) is revisited and reformulated from the point of view of entropy minimization. That is, the second term of the objective function in CA can be seen as quadratic or second-order entropy. Following this novel explanation, two generalized competitive clustering algorithms inspired by Renyi entropy and Shannon entropy, i.e. RECA and SECA, are respectively proposed in this paper. Simulation results show that CA requires a large number of initial clusters to obtain the right number of clusters, while RECA and SECA require small and moderate numbers of initial clusters, respectively. Also, the iteration steps in RECA and SECA are fewer than those of CA. Further, CA and RECA are generalized to CA-p and RECA-p by using the p-order entropy and Renyi's p-order entropy in CA and RECA, respectively. Simulation results show that the value of p has a great impact on the performance of CA-p, whereas it has little influence on that of RECA-p. Key words: competitive clustering; fuzzy c-means; optimal number of clusters; cluster validity; entropy minimization
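For reference, the entropies the abstract invokes are standard: Renyi's \(\alpha\)-order entropy, its quadratic (second-order) special case, and Shannon entropy as the \(\alpha \to 1\) limit:

```latex
H_{\alpha}(p) \;=\; \frac{1}{1-\alpha} \log \sum_{i} p_i^{\alpha},
\qquad
H_{2}(p) \;=\; -\log \sum_{i} p_i^{2},
\qquad
\lim_{\alpha \to 1} H_{\alpha}(p) \;=\; -\sum_i p_i \log p_i .
```

The claim that CA's second term "can be seen as" quadratic entropy refers to the \(\sum_i p_i^2\) quantity inside \(H_2\); how exactly the CA objective maps onto it is detailed in the paper itself.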
Conjugate and natural gradient rules for BYY harmony learning on Gaussian mixture with automated model selection
Int. J. Pattern Recognition Artif. Intell.
"... Under the Bayesian Ying–Yang (BYY) harmony learning theory, a harmony function has been developed on a BIdirectional architecture of the BYY system for Gaussian mixture with an important feature that, via its maximization through a general gradient rule, a model selection can be made automatically ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
(Show Context)
Under the Bayesian Ying–Yang (BYY) harmony learning theory, a harmony function has been developed on a bi-directional architecture of the BYY system for Gaussian mixture, with the important feature that, via its maximization through a general gradient rule, model selection can be made automatically during parameter learning on a set of sample data from a Gaussian mixture. This paper further proposes conjugate and natural gradient rules to efficiently implement the maximization of the harmony function, i.e. the BYY harmony learning, on Gaussian mixture. It is demonstrated by simulation experiments that these two new gradient rules not only work well, but also converge more quickly than the general gradient ones. Keywords: Bayesian Ying–Yang learning; Gaussian mixture; automated model selection; conjugate gradient; natural gradient.