Results 21-30 of 294
Deterministic Initialization of the K-Means Algorithm Using Hierarchical Clustering
International Journal of Pattern Recognition and Artificial Intelligence, 2012
One-dimensional center-based l1-clustering method, Optimization Letters 7(1): 5–22, 2012
Cited by 3 (1 self)
In this paper, we consider the l1-clustering problem for a finite data-point set which should be partitioned into k disjoint nonempty subsets. In that case, the objective function does not have to be either convex or differentiable, and generally it may have many local or global minima. Therefore, it becomes a complex global optimization problem. A method of searching for a locally optimal solution is proposed in the paper, the convergence of the corresponding iterative process is proved and the corresponding algorithm is given. The method is illustrated by and compared with some other clustering methods, especially with the l2-clustering method, which is also known in the literature as a smooth k-means method, on a few typical situations, such as the presence of outliers among the data and the clustering of incomplete data. Numerical experiments show in this case that the proposed l1-clustering algorithm is faster and gives significantly better results than the l2-clustering algorithm.
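The outlier behaviour mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it only shows, for a single one-dimensional cluster, that the median minimizes the l1 cost while the mean minimizes the l2 cost. The data values and the helper names `l1_cost` and `l2_cost` are assumptions made for illustration.

```python
from statistics import mean, median

def l1_cost(points, center):
    """Sum of absolute deviations from the center (l1 criterion)."""
    return sum(abs(x - center) for x in points)

def l2_cost(points, center):
    """Sum of squared deviations from the center (l2 criterion)."""
    return sum((x - center) ** 2 for x in points)

# Hypothetical one-dimensional cluster with a single outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

l1_center = median(data)  # minimizer of the l1 cost
l2_center = mean(data)    # minimizer of the l2 cost

print("l1 center:", l1_center, "l1 cost:", l1_cost(data, l1_center))
print("l2 center:", l2_center, "l2 cost:", l2_cost(data, l2_center))
```

In this toy data the l1 center stays among the bulk of the points, while the l2 center is pulled toward the outlier.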
Bipartite Graphs for Monitoring Clusters Transitions, 2010
Cited by 3 (0 self)
The study of evolution has become an important research issue, especially in the last decade, due to a greater awareness of our world’s volatility. As a consequence, a new paradigm has emerged to respond more effectively to a class of new problems in Data Mining. In this paper we address the problem of monitoring the evolution of clusters and propose the MClusT framework, which was developed along the lines of this new Change Mining paradigm. MClusT includes a taxonomy of transitions, a tracking method based in Graph Theory, and a transition detection algorithm. To demonstrate its feasibility and applicability we present real world case studies, using datasets extracted from Banco de Portugal and the Portuguese Institute of Statistics. We also test our approach in a benchmark dataset from TSDL. The results are encouraging and demonstrate the ability of MClusT framework to provide an efficient diagnosis of clusters transitions.
Integration of Global and Local Information in Videos for Key Frame Extraction
Cited by 2 (2 self)
Key frame extraction methods aim to obtain a set of frames that can efficiently represent and summarize video contents and be reused in many video retrieval-related applications. An effective set of key frames, viewed as a high-quality summary of the video, should include the major objects and events of the video, and contain little redundancy and overlapped content. In this paper, a new key frame extraction method is presented, which not only is based on the traditional idea of clustering in the feature extraction phase but also effectively reduces redundant frames using the integration of local and global information in videos. Experimental results on the TRECVid 2007 test video dataset have demonstrated the effectiveness of our proposed key frame extraction method in terms of the compression rate and retrieval precision.
New Filter method for categorical variables' selection
International Journal of Computer Science Issues, 2012
Cited by 2 (0 self)
It is worth noting that the variable-selection process has become an increasingly exciting challenge, given the dramatic increase in the size of databases and the number of variables to be explored and modelized. Therefore, several strategies and methods have been developed with the aim of selecting the minimum number of variables while preserving as much information as possible for the interest variable of the system to be modelized (variable to predict). In this work, we will present a novel Filter method useful for selecting variables, distinct for its joint application of both simple as well as multivariate analyses to select variables. In the first place, we will deal with the major prevailing strategies and methods already underway. Secondly, we will expose our new method and establish a comparison of its achieved results with those of the existing methods. The experiments have been implemented on two different databases, namely, a cardiac diagnosis disease labeled "Spect Heart", and a car diagnosis, called "Car Diagnosis 2". As for the ultimate section, it will bear the conclusion as well as some highlights for future research perspectives and potential horizons.
Nonparametric mixture models for clustering
In Proceedings of the 2010 Joint IAPR International Conference on Structural, Syntactic, and Statistical Pattern Recognition
Cited by 2 (1 self)
Mixture models have been widely used for data clustering. However, commonly used mixture models are generally of a parametric form (e.g., mixture of Gaussian distributions or GMM), which significantly limits their capacity in fitting diverse multidimensional data distributions encountered in practice. We propose a nonparametric mixture model (NMM) for data clustering in order to detect clusters generated from arbitrary unknown distributions, using nonparametric kernel density estimates. The proposed model is nonparametric since the generative distribution of each data point depends only on the rest of the data points and the chosen kernel. A leave-one-out likelihood maximization is performed to estimate the parameters of the model. The NMM approach, when applied to cluster high-dimensional text datasets, significantly outperforms the state-of-the-art and classical approaches such as K-means, Gaussian Mixture Models, spectral clustering and linkage methods.
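The leave-one-out idea in this abstract can be sketched as follows. This is only an illustration of a leave-one-out Gaussian kernel density estimate in one dimension, not the NMM model itself (which also involves cluster assignments). The Gaussian kernel, the bandwidth values, the sample data, and the function names `loo_kde` and `loo_log_likelihood` are all assumptions.

```python
import math

def loo_kde(points, i, bandwidth):
    """Gaussian kernel density at points[i], estimated from all other points."""
    xi = points[i]
    others = [x for j, x in enumerate(points) if j != i]
    norm = 1.0 / (len(others) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((xi - x) / bandwidth) ** 2) for x in others)

def loo_log_likelihood(points, bandwidth):
    """Sum of log leave-one-out densities over all points."""
    return sum(math.log(loo_kde(points, i, bandwidth)) for i in range(len(points)))

# Hypothetical data: two well-separated groups on the real line.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]

ll_narrow = loo_log_likelihood(data, bandwidth=0.5)
ll_wide = loo_log_likelihood(data, bandwidth=50.0)
print("bandwidth 0.5:", ll_narrow)
print("bandwidth 50: ", ll_wide)
```

Because each point is excluded from its own density estimate, the likelihood cannot be inflated by shrinking the bandwidth to zero, which is what makes it usable as a criterion to maximize. With very small bandwidths the kernel terms can underflow to zero, so a practical implementation would work in log space.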
Pinch Ratio Clustering from a Topologically Intrinsic Lexicographic Ordering
Cited by 2 (2 self)
This paper introduces an algorithm for determining data clusters called TILO/PRC (Topologically Intrinsic Lexicographic Ordering/Pinch Ratio Clustering). The theoretical foundation for this algorithm, developed in [14], uses ideas from topology (particularly knot theory) suggesting that it should be very flexible and robust with respect to noise. The TILO portion of the algorithm progressively improves a linear ordering of the points in a data set until the ordering satisfies a topological condition called strongly irreducible. The PRC algorithm then divides the data set based on this ordering and a heuristic metric called the pinch ratio. We demonstrate the effectiveness of TILO/PRC for finding clusters in a wide variety of real and synthetic data sets and compare the results to existing clustering methods. Moreover, because the output of TILO depends on the initial ordering, we consider the effects of different random orderings on the final clusters defined by PRC, and show that choosing an initial ordering based on a different clustering algorithm can improve the final clusters. These results verify that both the theoretical foundations of TILO and the heuristic notion of pinch ratio are reasonable.
Clustering Students based on Their Annotations of a Digital Text
In the Proceedings of 4th IEEE International Conference on Technology for Education, Andhra Pradesh, India
Cited by 2 (2 self)
Students often annotate texts they are reading using highlighting, underlining, and written comments and marks in the margins of the text. These may serve various functions and will reflect each student’s goals and understanding of the text. This research proposes two simple biology-inspired approaches to represent the patterns of student annotations and to cluster students based on the similarity between their annotations; the annotations produced were simple highlighting. To verify the effectiveness of the proposed approaches, the research compared the processing speed of these approaches with generic hierarchical clustering algorithm implemented in Matlab and compared the accuracy of the clusters with the clusters created by human raters. The results show that both of the proposed approaches are more efficient and accurate than the generic hierarchical clustering algorithm. The proposed methodology can be implemented as an add-on to existing learning management systems and e-book readers, to automatically offer the students important notes and annotations conducted by others (either peers or students in the past) who have similar annotation behaviour pattern and style to the students.
Keywords: Annotation; Biology-inspired; Chromosome; Patterns; Clustering
Warped K-Means: An algorithm to cluster sequentially-distributed data. Pattern Recognition Letters (2012). To be published. Available at http://personales.upv.es/luileito/wkm/.
Cited by 2 (2 self)
Many devices generate large amounts of data that follow some sort of sequentiality, e.g., motion sensors, e-pens, eye trackers, etc. and often these data need to be compressed for classification, storage, and/or retrieval tasks. Traditional clustering algorithms can be used for this purpose, but unfortunately they do not cope with the sequential information implicitly embedded in such data. Thus, we revisit the well-known K-means algorithm and provide a general method to properly cluster sequentially-distributed data. We present Warped K-Means (WKM), a multipurpose partitional clustering procedure that minimizes the Sum of Squared Error criterion, while imposing a hard sequentiality constraint in the classification step. We illustrate the properties of WKM in three applications, one being the segmentation and classification of human activity. WKM outperformed five state-of-the-art clustering techniques to simplify data trajectories, achieving a recognition accuracy of near 97%, which is an improvement of around 66% over their peers. Moreover, such an improvement came with a reduction in the computational cost of more than one order of magnitude.
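The combination of a Sum of Squared Error criterion with a hard sequentiality constraint can be illustrated with a small sketch. It is not the WKM procedure: it only evaluates the SSE of a partition whose clusters are forced to be contiguous segments of a sequence. The function name `segment_sse`, the boundary representation, and the sample sequence are assumptions.

```python
def segment_sse(seq, boundaries):
    """SSE of splitting seq into contiguous segments.

    boundaries holds the start index of every segment after the first,
    in increasing order, so each cluster is an unbroken run of samples.
    """
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [len(seq)]
    total = 0.0
    for start, end in zip(starts, ends):
        segment = seq[start:end]
        center = sum(segment) / len(segment)
        total += sum((x - center) ** 2 for x in segment)
    return total

# Hypothetical one-dimensional trajectory: low, high, then low again.
seq = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9, 1.0, 1.1]

good = segment_sse(seq, [3, 6])  # boundaries aligned with the level changes
bad = segment_sse(seq, [2, 5])   # boundaries shifted by one sample
print("aligned boundaries SSE:", good)
print("shifted boundaries SSE:", bad)
```

Under the sequentiality constraint the first and last runs stay in separate clusters even though their values are similar, which an unconstrained K-means would typically merge.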
A Dirichlet multinomial mixture model-based approach for short text clustering
in KDD
Cited by 2 (0 self)
Short text clustering has become an increasingly important task with the popularity of social media like Twitter, Google+, and Facebook. It is a challenging problem due to its sparse, high-dimensional, and large-volume characteristics. In this paper, we proposed a collapsed Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture model for short text clustering (abbr. to GSDMM). We found that GSDMM can infer the number of clusters automatically with a good balance between the completeness and homogeneity of the clustering results, and is fast to converge. GSDMM can also cope with the sparse and high-dimensional problem of short texts, and can obtain the representative words of each cluster. Our extensive experimental study shows that GSDMM can achieve significantly better performance than three other clustering models.