Overlapping correlation clustering
In ICDM, 2011
Cited by 7 (1 self)
Abstract—We introduce a new approach to the problem of overlapping clustering. The main idea is to formulate overlapping clustering as an optimization problem in which each data point is mapped to a small set of labels, representing membership in different clusters. The objective is to find a mapping such that the distances between data points agree as closely as possible with the distances computed over their label sets. To define distances between label sets, we consider two measures: a set-intersection indicator function and the Jaccard coefficient. To solve the main optimization problem, we propose a local-search algorithm. Each iteration of the algorithm requires solving non-trivial optimization subproblems, which we solve using a greedy method for the set-intersection measure and non-negative least squares for the Jaccard measure. Since our framework uses pairwise similarities of objects as input, it lends itself naturally to clustering structured objects for which feature vectors can be difficult to obtain. As a proof of concept, we show how easily our framework can be applied in two different complex application domains. First, we develop overlapping clustering of animal trajectories, obtaining zoologically meaningful results. Second, we apply our framework to overlapping clustering of proteins based on pairwise similarities of amino-acid sequences, outperforming the state-of-the-art method in matching a ground-truth taxonomy.
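The objective can be made concrete with a small sketch. It assumes, for illustration only, that the input is a pairwise similarity s(u, v) in [0, 1] and that the cost sums |s(u, v) - H(L(u), L(v))| over all pairs, where L maps each point to its label set and H is either the Jaccard coefficient or the set-intersection indicator. The function names `occ_cost`, `jaccard`, and `intersects` are hypothetical; this is not the authors' code, and the local-search updates are omitted.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |a & b| / |a | b| between two label sets."""
    if not a and not b:
        return 0.0  # assumed convention for two empty label sets
    return len(a & b) / len(a | b)


def intersects(a: set, b: set) -> float:
    """Set-intersection indicator: 1.0 if the label sets share a label, else 0.0."""
    return 1.0 if a & b else 0.0


def occ_cost(sim, labels, h=jaccard):
    """Sum over unordered pairs (u, v) of |sim[u][v] - h(labels[u], labels[v])|."""
    total = 0.0
    for u, v in combinations(range(len(labels)), 2):
        total += abs(sim[u][v] - h(labels[u], labels[v]))
    return total
```

For example, with similarities [[1, 1, 0], [1, 1, 0.5], [0, 0.5, 1]] and labels [{0}, {0, 1}, {1}], both measures give a cost of 0.5, but from different pairs: Jaccard under-scores the first pair, while the indicator over-scores the last one.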
Spatially-Aware Comparison and Consensus for Clusterings
Cited by 7 (4 self)
This paper proposes a new distance metric between clusterings that incorporates information about the spatial distribution of points and clusters. Our approach builds on the idea of a Hilbert space-based representation of clusters as a combination of the representations of their constituent points. We use this representation and the underlying metric to design a spatially-aware consensus clustering procedure. This consensus procedure is implemented via a novel reduction to Euclidean clustering, and is both simple and efficient. All of our results apply to both soft and hard clusterings. We accompany these algorithms with a detailed experimental evaluation that demonstrates the efficiency and quality of our techniques.
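The Hilbert-space idea can be illustrated with kernel mean embeddings. This sketch assumes a Gaussian kernel with bandwidth `sigma` and represents each cluster by the average of its points' feature maps; the squared distance between two such representations then expands into averages of kernel evaluations, so it can be computed without forming the feature maps explicitly. The kernel choice, the averaging, and the function names are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma=1.0):
    """Pairwise Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def cluster_kernel_distance(A, B, sigma=1.0):
    """Squared distance between the mean embeddings of point sets A and B:
    ||mu_A - mu_B||^2 = mean k(A, A) + mean k(B, B) - 2 mean k(A, B)."""
    kaa = gaussian_kernel(A, A, sigma).mean()
    kbb = gaussian_kernel(B, B, sigma).mean()
    kab = gaussian_kernel(A, B, sigma).mean()
    return float(kaa + kbb - 2.0 * kab)
```

A cluster is at distance zero from itself, and the distance between two single-point clusters grows as the points move apart, approaching 2 for the Gaussian kernel.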
Breaking the Deadlock: Simultaneously Discovering Attribute Matching and Cluster Matching with Multi-Objective Simulated
Cited by 1 (1 self)
Abstract. In this paper, we present a data mining approach to challenges in matching and integrating heterogeneous datasets. In particular, we propose solutions to two problems that arise in combining results from different scientific studies. The first problem, attribute matching, involves discovering correspondences among distinct numeric summary features (“attributes”) that characterize datasets collected and analyzed in different research labs. The second problem, cluster matching, involves discovering matchings between patterns across datasets. We treat both problems together as a single multi-objective optimization problem and describe a multi-objective simulated annealing algorithm to find the optimal solution. The utility of this approach is demonstrated in a series of experiments using synthetic and realistic datasets designed to simulate heterogeneous data from different sources.
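A multi-objective simulated annealing loop for a matching problem might look like the following sketch. It assumes the matching is a permutation of attribute indices, that each objective is a caller-supplied cost to minimize, and that acceptance uses a Metropolis test on the equally weighted sum of objectives while an archive keeps the non-dominated solutions seen. That scalarization is one common variant chosen here for brevity; the abstract does not give the paper's acceptance rule or objectives, and `mosa` and its parameters are hypothetical.

```python
import math
import random


def dominates(f, g):
    """True if objective vector f Pareto-dominates g (all objectives minimized)."""
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))


def mosa(objectives, n, iters=2000, t0=1.0, alpha=0.995, seed=0):
    """Hypothetical multi-objective simulated annealing over permutations of range(n).

    Returns an archive of mutually non-dominated (permutation, scores) pairs.
    """
    rng = random.Random(seed)

    def evaluate(p):
        return tuple(f(p) for f in objectives)

    cur = list(range(n))
    rng.shuffle(cur)
    cur_f = evaluate(cur)
    archive = [(cur[:], cur_f)]
    t = t0
    for _ in range(iters):
        # Neighbour move: swap the targets of two attributes in the matching.
        cand = cur[:]
        i, j = rng.sample(range(n), 2)
        cand[i], cand[j] = cand[j], cand[i]
        cand_f = evaluate(cand)
        # Metropolis acceptance on the equally weighted sum of objectives
        # (an assumed scalarization; dominating moves always have delta < 0).
        delta = sum(cand_f) - sum(cur_f)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur, cur_f = cand, cand_f
            # Archive the new state unless an archived solution dominates or equals it.
            if not any(dominates(g, cur_f) or g == cur_f for _, g in archive):
                archive = [(p, g) for p, g in archive if not dominates(cur_f, g)]
                archive.append((cur[:], cur_f))
        t *= alpha
    return archive
```

With two toy objectives that both measure disagreement with a known matching, the returned archive holds only mutually non-dominated permutations; with real, conflicting objectives it approximates a Pareto front rather than a single optimum.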
Generating a Diverse Set of High-Quality Clusterings
Cited by 1 (0 self)
Abstract. We provide a new framework for generating multiple good-quality partitions (clusterings) of a single data set. Our approach decomposes this problem into two components: generating many high-quality partitions, and then grouping these partitions to obtain k representatives. The decomposition makes the approach extremely modular and allows us to optimize various criteria that control the choice of representative partitions.
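The second component, grouping many partitions into k representatives, can be sketched as a k-center selection over a distance between partitions. Here the distance is one minus the Rand index and the selection is Gonzalez's farthest-first traversal; both are stand-ins chosen for illustration, since the abstract does not say which distance or grouping criterion the authors use. Partitions are given as lists of cluster labels over the same points, and the function names are hypothetical.

```python
from itertools import combinations


def rand_distance(a, b):
    """1 - Rand index: fraction of point pairs on which partitions a and b
    disagree about being in the same cluster."""
    pairs = list(combinations(range(len(a)), 2))
    disagree = sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs)
    return disagree / len(pairs)


def pick_representatives(partitions, k):
    """Farthest-first (Gonzalez) k-center selection; returns indices of chosen partitions."""
    chosen = [0]  # arbitrary starting partition
    dist = [rand_distance(partitions[0], p) for p in partitions]
    while len(chosen) < min(k, len(partitions)):
        nxt = max(range(len(partitions)), key=lambda i: dist[i])
        if dist[nxt] == 0:
            break  # every remaining partition duplicates a chosen one
        chosen.append(nxt)
        dist = [min(d, rand_distance(partitions[nxt], p)) for d, p in zip(dist, partitions)]
    return chosen
```

Because the Rand distance ignores label names, [0, 0, 1, 1] and [1, 1, 0, 0] are at distance 0, so the traversal never picks both, and it stops early once every remaining partition duplicates a chosen one.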
Case Study: Data Mining of Associate Degree Accepted Candidates by Modular Method
Communications and Network, Vol
Cited by 1 (1 self)
For about 10 years, the University of Applied Science and Technology (UAST) in Iran has admitted students to discontinuous associate degree programs using a modular method, and almost 100,000 students are accepted every year. Although the original aim of these courses was to improve the scientific and skill level of employees, over time a considerable number of unemployed people have become interested in participating. Motivated by this, in this paper we mine and analyze sample data of candidates accepted into the 2008 and 2009 modular courses using unsupervised and supervised learning paradigms. In the first step, using the unsupervised paradigm, we grouped (clustered) the accepted modular candidates based on their student status and labeled the data sets with three classes, each of which reflects the educational and student status of the accepted candidates. In the second step, using supervised and unsupervised algorithms, we generated predictive models on the 2008 data sets. Then, by comparing the performance of the generated models, we selected the association-rule model, from which a set of rules was extracted. Finally, this model is applied to a test set consisting of candidates accepted in the following course, and evaluating the results gives the percentage of correct predictions and the confidence of the obtained results.
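Association rules are conventionally scored by support (how often a set of items occurs across records) and confidence (how often the consequent holds when the antecedent does). The sketch below computes both for records encoded as sets of attribute=value items; the attribute names and class labels are hypothetical and are not taken from the UAST data.

```python
def support(records, itemset):
    """Fraction of records that contain every item in itemset."""
    return sum(itemset <= r for r in records) / len(records)


def confidence(records, antecedent, consequent):
    """conf(A -> C) = support(A | C) / support(A); 0.0 if A never occurs."""
    s_a = support(records, antecedent)
    return support(records, antecedent | consequent) / s_a if s_a else 0.0


# Hypothetical records: each accepted candidate as a set of attribute=value items.
records = [
    {"employed=yes", "class=A"},
    {"employed=yes", "class=A"},
    {"employed=yes", "class=B"},
    {"employed=no", "class=C"},
]
# Two of the three "employed=yes" records are also "class=A".
rule_conf = confidence(records, {"employed=yes"}, {"class=A"})
```

A rule is usually kept only if both its support and its confidence exceed chosen thresholds; the thresholds are a modelling decision not stated in the abstract.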
Sharing and integration of cognitive neuroscience data: Metric and pattern
Neurocomputing 92 (2012) 156–169
Date Approved, 2012
The computing landscape is undergoing a major change, enabled primarily by ubiquitous wireless networks and the rapid increase in the use of mobile devices that access a web-based information infrastructure. Most intensive computing is expected to happen either in servers housed in large datacenters (warehouse-scale computers), e.g., cloud computing and other web services, or on many-core high-performance computing (HPC) platforms in scientific labs. The primary challenge in scaling such computing systems into the exascale realm is the efficient supply of large amounts of data to hundreds or thousands of compute cores, i.e., building an efficient memory system. Main memory systems are at an inflection point due to the convergence of several major application and technology trends. Examples include the increasing importance of energy consumption, reduced access-stream locality, increasing failure rates, limited pin counts, increasing heterogeneity and complexity, and the diminished importance of cost-per-bit. In light of these trends, the memory system requires a major overhaul. The key to architecting the next generation of memory systems is a combination of the prudent incorporation
AciForager: Incrementally Discovering Regions of Correlated Change in Evolving Graphs
"... components, fault detection ..."
Structure-Aware Distance Measures for Comparing Clusterings in Graphs
Abstract. Clustering in graphs aims to group vertices with similar patterns of connections. Applications include discovering communities and latent structures in graphs. Many algorithms have been proposed to find graph clusterings, but an open problem is the need for suitable comparison measures to quantitatively validate these algorithms, to perform consensus clustering, and to track evolving (graph) clusters over time. To date, most comparison measures have focused on comparing the vertex groupings and ignore differences in the structural approximations of the clusterings, which can lead to counter-intuitive comparisons. In this paper, we propose new measures that account for differences in these approximations. We focus on comparison measures for two important graph clustering approaches, community detection and blockmodelling, and propose measures that work for both weighted and unweighted graphs.
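One simple way to see why structure matters is to compare clusterings through the blockmodel approximations they induce: replace each adjacency entry by the mean weight of its block, then measure how far apart the two approximations are. This sketch uses the Frobenius norm and includes diagonal entries in each block mean for brevity; it illustrates the idea of a structure-aware comparison and is not the specific measure proposed in the paper. Function names are hypothetical.

```python
import numpy as np


def block_approximation(adj, labels):
    """Replace each entry (i, j) of the adjacency matrix by the mean weight of
    the block (labels[i], labels[j]), i.e. a blockmodel-style approximation."""
    labels = np.asarray(labels)
    approx = np.zeros(adj.shape, dtype=float)
    groups = np.unique(labels)
    for a in groups:
        for b in groups:
            block = np.ix_(labels == a, labels == b)
            approx[block] = adj[block].mean()
    return approx


def structural_distance(adj, labels1, labels2):
    """Frobenius distance between the block approximations of two clusterings."""
    diff = block_approximation(adj, labels1) - block_approximation(adj, labels2)
    return float(np.linalg.norm(diff))
```

On a graph made of two disjoint triangles, the clusterings [0, 0, 0, 1, 1, 1] and [1, 1, 1, 0, 0, 0] induce identical approximations and so are at distance 0, while an interleaved clustering such as [0, 1, 0, 1, 0, 1] is not.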