
A Clustering Comparison Measure Using Density Profiles (2010)

by E Bae, J Bailey, G Z Dong

Results 1 - 10 of 11

Overlapping correlation clustering

by Francesco Bonchi, Aristides Gionis, Antti Ukkonen - In ICDM, 2011
Cited by 7 (1 self)
We introduce a new approach to the problem of overlapping clustering. The main idea is to formulate overlapping clustering as an optimization problem in which each data point is mapped to a small set of labels, representing membership in different clusters. The objective is to find a mapping so that the distances between data points agree as much as possible with distances taken over their label sets. To define distances between label sets, we consider two measures: a set-intersection indicator function and the Jaccard coefficient. To solve the main optimization problem we propose a local-search algorithm. The iterative step of our algorithm requires solving non-trivial optimization subproblems, which, for the measures of set-intersection and Jaccard, we solve using a greedy method and non-negative least squares, respectively. Since our framework uses pairwise similarities of objects as the input, it lends itself naturally to the task of clustering structured objects for which feature vectors can be difficult to obtain. As a proof of concept we show how easily our framework can be applied in two different complex application domains. Firstly, we develop overlapping clustering of animal trajectories, obtaining zoologically meaningful results. Secondly, we apply our framework for overlapping clustering of proteins based on pairwise similarities of amino-acid sequences, outperforming the state-of-the-art method in matching a ground truth taxonomy.
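As a rough illustration of the two label-set measures named in this abstract, here is a minimal Python sketch. The function names, the toy data, and the absolute-difference cost are assumptions made for illustration only. The abstract does not state the exact objective, and this is not the authors' local-search algorithm.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |a & b| / |a | b|.

    Returning 0.0 when both sets are empty is an assumption of this sketch.
    """
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def intersects(a: set, b: set) -> float:
    """Set-intersection indicator: 1.0 if the label sets share a label, else 0.0."""
    return 1.0 if a & b else 0.0


def labeling_cost(similarity: dict, labels: dict, measure) -> float:
    """Hypothetical cost: total absolute disagreement between the input
    pairwise similarities and the measure applied to the label sets."""
    return sum(
        abs(measure(labels[u], labels[v]) - similarity[(u, v)])
        for u, v in combinations(sorted(labels), 2)
    )


# Toy data (made up): three points, two cluster labels, point "b" in both.
labels = {"a": {1}, "b": {1, 2}, "c": {2}}
similarity = {("a", "b"): 0.5, ("a", "c"): 0.0, ("b", "c"): 0.5}

print(labeling_cost(similarity, labels, jaccard))
print(labeling_cost(similarity, labels, intersects))
```

On this toy input the Jaccard measure reproduces the given similarities exactly, while the indicator can only express 0 or 1 and so disagrees on the two pairs with similarity 0.5.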

Citation Context

...res vectors are not available, as in our application on trajectories and proteins. Multiple clustering solutions. A large body of work studies the problem of discovering multiple clustering solutions [26], [27], [28], [29], [30]. The objective in these papers is to discover multiple clusterings for a given dataset. Each of the clusterings needs to be of high quality and the clusterings are required to...

Spatially-Aware Comparison and Consensus for Clusterings

by Parasaran Raman, Jeff M. Phillips, Suresh Venkatasubramanian
Cited by 7 (4 self)
This paper proposes a new distance metric between clusterings that incorporates information about the spatial distribution of points and clusters. Our approach builds on the idea of a Hilbert space-based representation of clusters as a combination of the representations of their constituent points. We use this representation and the underlying metric to design a spatially-aware consensus clustering procedure. This consensus procedure is implemented via a novel reduction to Euclidean clustering, and is both simple and efficient. All of our results apply to both soft and hard clusterings. We accompany these algorithms with a detailed experimental evaluation that demonstrates the efficiency and quality of our techniques.

Citation Context

...ese methods ignore the actual spatial description of the data, merely treating the data as atoms in a set and using set information to compare the partitions. As has been observed by many researchers [38, 3, 7], ignoring the spatial relationships in the data can be problematic. Consider the three partitions in Figure 1. The first partition (FP) is obtained by a projection onto the y-axis, and the second (SP...

Breaking the Deadlock: Simultaneously Discovering Attribute Matching and Cluster Matching with Multi-Objective Simulated

by Haishan Liu, Dejing Dou
Cited by 1 (1 self)
In this paper, we present a data mining approach to challenges in the matching and integration of heterogeneous datasets. In particular, we propose solutions to two problems that arise in combining information from different results of scientific research. The first problem, attribute matching, involves discovery of correspondences among distinct numeric-typed summary features (“attributes”) that are used to characterize datasets that have been collected and analyzed in different research labs. The second problem, cluster matching, involves discovery of matchings between patterns across datasets. We treat both of these problems together as a multi-objective optimization problem. A multi-objective simulated annealing algorithm is described to find the optimal solution. The utility of this approach is demonstrated in a series of experiments using synthetic and realistic datasets that are designed to simulate heterogeneous data from different sources.

Citation Context

... that the datasets may be described by attributes from heterogeneous ontologies or schemas. Even those methods that are able to measure clustering similarity across different datasets (e.g., the ADCO [1] method) have to assume the homogeneous meta-data. Given this situation, in order to carry out cluster comparison for meta-analysis, researchers often need to perform ontology or schema matching first...

Generating a Diverse Set of High-Quality Clusterings

by Jeff M. Phillips, Parasaran Raman, Suresh Venkatasubramanian
Cited by 1 (0 self)
We provide a new framework for generating multiple good quality partitions (clusterings) of a single data set. Our approach decomposes this problem into two components: generating many high-quality partitions, and then grouping these partitions to obtain k representatives. The decomposition makes the approach extremely modular and allows us to optimize various criteria that control the choice of representative partitions.

Citation Context

... one) of existing partitions. k-consensus clustering assumes an input set of many partitions, and then seeks to return k representative partitions. Most algorithms for generating alternate partitions [38,16,6,5,13,21,12] operate as follows. Generate a single partition using a clustering algorithm of choice. Next, find another partition that is both far from the first partition and of high quality. Most methods stop h...

Case Study: Data Mining of Associate Degree Accepted Candidates by Modular Method, Communications and Network, Vol

by Behrouz Minaei Bidgoli, Maryam Nazaridoust
Cited by 1 (1 self)
For about 10 years, the University of Applied Science and Technology (UAST) in Iran has admitted students to discontinuous associate degree programs by the modular method, with almost 100,000 students accepted every year. Although the first aim of holding such courses was to improve the scientific and skill level of employees, over time a considerable group of unemployed people have become interested in participating in these courses. Accordingly, in this paper we mine and analyze sample data on accepted candidates in the modular 2008 and 2009 courses using unsupervised and supervised learning paradigms. In the first step, using the unsupervised paradigm, we grouped (clustered) the set of modular accepted candidates based on their student status and labeled the data sets with three classes, so that each class reflects the educational and student status of modular accepted candidates. In the second step, using supervised and unsupervised algorithms, we generated predictive models on the 2008 data sets. Then, by comparing the performance of the generated models, we selected the association-rule predictive model, from which some rules were extracted. Finally, this model is executed on the test set, which includes accepted candidates of the next course; by evaluating the results, the percentage of correctness and confidentiality of the obtained results can be viewed.

Citation Context

... can argue that so far varied clustering algorithms have been generated which offer relatively different results. Figure 1 shows the result of applying three different clusterings to a single data set [6]. (Footnote 1: In this study, preprocessing was done using two application programs, Clementine and the Data Mining tab in Microsoft Office Excel 2007.)...

Neurocomputing 92 (2012) 156–169

by Haishan Liu, Gwen Frishkoff, Robert Frank, Dejing Dou
Sharing and integration of cognitive neuroscience data: Metric and pattern

Citation Context

...n about the distribution of different ERP pattern attributes, i.e., spatial and temporal features of the data. To this end, we have represented clusters as density profiles, as proposed by Bae et al. [16], and have selected a cluster similarity index known as ADCO (Attribute Distribution Clustering Orthogonality). ADCO determines the similarity between two clusters based on their density profiles, whi...

Date Approved

by Aniruddha N. Udipi, Alan L. Davis, Erik L. Brunv, 2012
The computing landscape is undergoing a major change, primarily enabled by ubiquitous wireless networks and the rapid increase in the use of mobile devices which access a web-based information infrastructure. It is expected that most intensive computing may either happen in servers housed in large datacenters (warehouse-scale computers), e.g., cloud computing and other web services, or in many-core high-performance computing (HPC) platforms in scientific labs. It is clear that the primary challenge to scaling such computing systems into the exascale realm is the efficient supply of large amounts of data to hundreds or thousands of compute cores, i.e., building an efficient memory system. Main memory systems are at an inflection point, due to the convergence of several major application and technology trends. Examples include the increasing importance of energy consumption, reduced access stream locality, increasing failure rates, limited pin counts, increasing heterogeneity and complexity, and the diminished importance of cost-per-bit. In light of these trends, the memory system requires a major overhaul. The key to architecting the next generation of memory systems is a combination of the prudent incorporation

s i t o n o m i c s a n d B u s i n e s c t h e n s U n i v y o

by unknown authors
Abstract not found

Citation Context

...g a concise representation of each cluster and then use the earthmover’s distance (EMD) [20] to compare these sets of representatives in a spatially-aware manner. These include CDistance [11], d_ADCO [6], CC distance [48], and LiftEMD [39]. As discussed in [39], LiftEMD has the benefit of being both efficient as well as a well-founded metric, and is the method used here. Density-based distances. The ...

AciForager: Incrementally Discovering Regions of Correlated Change in Evolving Graphs

by Jeffrey Chan, James Bailey, Christopher Leckie
components, fault detection

Structure-Aware Distance Measures for Comparing Clusterings in Graphs

by Jeffrey Chan, Nguyen Xuan Vinh, Wei Liu, James Bailey, Christopher A. Leckie, Kotagiri Ramamohanarao, Jian Pei
Clustering in graphs aims to group vertices with similar patterns of connections. Applications include discovering communities and latent structures in graphs. Many algorithms have been proposed to find graph clusterings, but an open problem is the need for suitable comparison measures to quantitatively validate these algorithms, perform consensus clustering, and track evolving (graph) clusters across time. To date, most comparison measures have focused on comparing the vertex groupings and completely ignore the difference in the structural approximations in the clusterings, which can lead to counter-intuitive comparisons. In this paper, we propose new measures that account for differences in the approximations. We focus on comparison measures for two important graph clustering approaches, community detection and blockmodelling, and propose comparison measures that work for weighted (and unweighted) graphs.

Citation Context

...und that clusterings could be similar in memberships, but their points are distributed very differently. This could lead to counter-intuitive situations where such clusterings were considered similar [5][6]. We show the same scenario occurs when comparing graph clusterings. Therefore, in this paper, we address the open problems of analysing and showing why membership comparison measures can be inadeq...
