Results 1 -
4 of
4
Approximating Largest Convex Hulls for Imprecise Points
, 2007
"... Assume that a set of imprecise points is given, where each point is specified by a region in which the point will lie. Such a region can be modelled as a circle, square, line segment, etc. We study the problem of maximising the area of the convex hull of such a set. We prove NP-hardness when the imp ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
Assume that a set of imprecise points is given, where each point is specified by a region in which the point will lie. Such a region can be modelled as a circle, square, line segment, etc. We study the problem of maximising the area of the convex hull of such a set. We prove NP-hardness when the imprecise points are modelled as line segments, and give linear time approximation schemes for a variety of models, based on the core-set paradigm.
Clustering for Data Reduction: A Divide and Conquer Approach
, 2007
"... We consider the problem of reducing a potentially very large dataset to a subset of representative prototypes. Rather than searching over the entire space of prototypes, we first roughly divide the data into balanced clusters using bisecting k-means and spectral cuts, and then find the prototypes fo ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
We consider the problem of reducing a potentially very large dataset to a subset of representative prototypes. Rather than searching over the entire space of prototypes, we first roughly divide the data into balanced clusters using bisecting k-means and spectral cuts, and then find the prototypes for each cluster by affinity propagation. We apply our algorithm to text data, where we perform an order of magnitude faster than simply looking for prototypes on the entire dataset. Furthermore, our “divide and conquer ” approach actually performs more accurately on datasets which are well bisected, as the greedy decisions of affinity propagation are confined to classes of already similar items. 1 Introduction. We consider the problem of data reduction, whereby we want to reduce our original data to a smaller but representative subset. This is related to feature selection in dimensionality reduction tasks, where we are looking for an m < n, where m << n. A reduced dataset might have a direct interpretation. For instance, the objects in question might be sentences, and resulting prototypes would span only the most essential
RESEARCH Speeding up the Consensus Clustering methodology for microarray data analysis
"... Background: The inference of the number of clusters in a dataset, a fundamental problem in Statistics, Data Analysis and Classification, is usually addressed via internal validation measures. The stated problem is quite difficult, in particular for microarrays, since the inferred prediction must be ..."
Abstract
- Add to MetaCart
Background: The inference of the number of clusters in a dataset, a fundamental problem in Statistics, Data Analysis and Classification, is usually addressed via internal validation measures. The stated problem is quite difficult, in particular for microarrays, since the inferred prediction must be sensible enough to capture the inherent biological structure in a dataset, e.g., functionally related genes. Despite the rich literature present in that area, the identification of an internal validation measure that is both fast and precise has proved to be elusive. In order to partially fill this gap, we propose a speed-up of Consensus (Consensus Clustering), a methodology whose purpose is the provision of a prediction of the number of clusters in a dataset, together with a dissimilarity matrix (the consensus matrix) that can be used by clustering algorithms. As detailed in the remainder of the paper, Consensus is a natural candidate for a speed-up. Results: Since the time-precision performance of Consensus depends on two parameters, our first task is to show that a simple adjustment of the parameters is not enough to obtain a good precision-time trade-off. Our second task is to provide a fast approximation algorithm for Consensus. That is, the closely related algorithm FC (Fast Consensus) that would have the same precision as Consensus with a substantially better time performance. The performance of FC has been assessed via extensive experiments on twelve benchmark datasets that

