Results 1–10 of 16
Clusters and water flows: a novel approach to modal clustering through Morse theory
2014
Abstract

Cited by 5 (2 self)
The problem of finding groups in data (cluster analysis) has been extensively studied by researchers from the fields of Statistics and Computer Science, among others. However, despite its popularity, it is widely recognized that the investigation of some theoretical aspects of clustering has been relatively sparse. One of the main reasons for this lack of theoretical results is surely the fact that, unlike the situation with other statistical problems such as regression or classification, for some of the cluster methodologies it is quite difficult to specify a population goal to which the data-based clustering algorithms should try to get close. This paper aims to provide some insight into the theoretical foundations of the usual nonparametric approach to clustering, which understands clusters as regions of high density, by presenting an explicit formulation for the ideal population clustering.
A Conformal Prediction Approach to Explore Functional Data
2013
Abstract

Cited by 2 (1 self)
This paper applies conformal prediction techniques to compute simultaneous prediction bands and clustering trees for functional data. These tools can be used to detect outliers and clusters. Both our prediction bands and clustering trees provide prediction sets for the underlying stochastic process with a guaranteed finite sample behavior, under no distributional assumptions. The prediction sets are also informative in that they correspond to the high density region of the underlying process. While ordinary conformal prediction has high computational cost for functional data, we use the inductive conformal predictor, together with several novel choices of conformity scores, to simplify the computation. Our methods are illustrated on some real data examples.
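The split-sample idea behind the inductive conformal predictor described in this abstract can be sketched in a few lines. The sup-norm conformity score and the plain mean-curve fit below are simplifying assumptions for illustration, not the paper's (more refined) choices:

```python
import numpy as np

def inductive_conformal_band(curves, alpha=0.1, train_frac=0.5, rng=None):
    """Simultaneous prediction band for functional data via inductive
    conformal prediction (a minimal sketch; the conformity score here is
    sup-norm deviation from a mean curve, one of the simplest choices).

    curves: (n, T) array, each row one curve observed on a common grid.
    Returns (lower, upper) arrays of length T: a band containing a new
    curve entirely with probability >= 1 - alpha, finite-sample valid.
    """
    rng = np.random.default_rng(rng)
    n = len(curves)
    idx = rng.permutation(n)
    n_train = int(n * train_frac)
    train, calib = curves[idx[:n_train]], curves[idx[n_train:]]

    center = train.mean(axis=0)               # fitted on the training half only
    # Conformity score of each calibration curve: sup-norm deviation.
    scores = np.abs(calib - center).max(axis=1)
    # Finite-sample-valid quantile: the ceil((m+1)(1-alpha))-th order statistic.
    m = len(scores)
    k = min(m, int(np.ceil((m + 1) * (1 - alpha))))
    q = np.sort(scores)[k - 1]
    return center - q, center + q
```

Because the calibration scores are exchangeable with the score of a new curve, the coverage guarantee holds for any distribution of the curves, which is the "no distributional assumptions" property the abstract refers to.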
Confidence Regions for Level Sets
2012
Abstract

Cited by 1 (0 self)
This paper discusses a universal approach to the construction of confidence regions for level sets {h(x) ≥ 0} ⊂ ℝ^d of a function h of interest. The proposed construction is based on a plug-in estimate of the level sets using an appropriate estimate h_n of h. The approach provides finite sample upper and lower confidence limits. This leads to generic conditions under which the constructed confidence regions achieve a prescribed coverage level asymptotically. The construction requires an estimate of quantiles of the distribution of sup_{x ∈ Δ_n} |h_n(x) − h(x)| for appropriate sets Δ_n ⊂ ℝ^d. In contrast to related work from the literature, the existence of a weak limit for an appropriately normalized process {h_n(x), x ∈ D} is not required. This adds significantly to the challenge of deriving asymptotic results for the corresponding coverage level. Our approach is exemplified in the case of a density level set utilizing a kernel density estimator and a bootstrap procedure.
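The plug-in recipe with a kernel density estimator and a bootstrap, as in the example the abstract closes with, looks roughly as follows. This is a generic sketch: the bandwidth choice, bootstrap scheme, and the sets Δ_n in the paper are more careful than the naive resampling used here:

```python
import numpy as np
from scipy.stats import gaussian_kde

def level_set_confidence_regions(sample, lam, grid,
                                 alpha=0.05, n_boot=200, rng=None):
    """Inner and outer confidence regions for the density level set
    {x : f(x) >= lam}, evaluated on a 1-d grid (illustrative sketch).

    Returns boolean masks (inner, outer) over `grid` with, approximately,
    inner ⊆ {f >= lam} ⊆ outer at confidence level 1 - alpha.
    """
    rng = np.random.default_rng(rng)
    kde = gaussian_kde(sample)
    f_hat = kde(grid)                     # plug-in density estimate
    # Bootstrap the sup-norm fluctuation of the density estimate on the grid.
    sups = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(sample, size=sample.size, replace=True)
        sups[b] = np.abs(gaussian_kde(resample)(grid) - f_hat).max()
    c = np.quantile(sups, 1 - alpha)      # estimated sup-deviation quantile
    inner = f_hat >= lam + c              # lower confidence limit for the set
    outer = f_hat >= lam - c              # upper confidence limit for the set
    return inner, outer
```

Shifting the threshold up by the quantile c gives a region contained in the true level set with high confidence; shifting it down gives a region containing it, which is the "upper and lower confidence limits" structure the abstract describes.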
DeBaCl: A Python Package for Interactive DEnsity-BAsed CLustering
Abstract

Cited by 1 (0 self)
The level set tree approach of Hartigan (1975) provides a probabilistically based and highly interpretable encoding of the clustering behavior of a dataset. By representing the hierarchy of data modes as a dendrogram of the level sets of a density estimator, this approach offers many advantages for exploratory analysis and clustering, especially for complex and high-dimensional data. Several R packages exist for level set tree estimation, but their practical usefulness is limited by computational inefficiency, absence of interactive graphical capabilities and, from a theoretical perspective, reliance on asymptotic approximations. To make it easier for practitioners to capture the advantages of level set trees, we have written the Python package DeBaCl for DEnsity-BAsed CLustering. In this article we illustrate how DeBaCl's level set tree estimates can be used for difficult clustering tasks and interactive graphical data analysis. The package is intended to promote the practical use of level set trees through improvements in computational efficiency and a high degree of user customization. In addition, the flexible algorithms implemented in DeBaCl enjoy finite sample accuracy, as demonstrated in recent literature on density clustering. Finally, we show that the level set tree framework can be easily extended to deal with functional data.
Mapping Topographic Structure in White Matter Pathways with Level Set Trees
Abstract

Cited by 1 (0 self)
Mapping topographic structure in white matter pathways with level set trees.
Consistent Procedures for Cluster Tree Estimation and Pruning
Submitted to the Annals of Statistics
Abstract
For a density f on ℝ^d, a high-density cluster is any connected component of {x : f(x) ≥ λ}, for some λ > 0. The set of all high-density clusters forms a hierarchy called the cluster tree of f. We present two procedures for estimating the cluster tree given samples from f. The first is a robust variant of the single linkage algorithm for hierarchical clustering. The second is based on the k-nearest neighbor graph of the samples. We give finite-sample convergence rates for these algorithms which also imply consistency, and we derive lower bounds on the sample complexity of cluster tree estimation. Finally, we study a tree pruning procedure that guarantees, under milder conditions than usual, to remove clusters that are spurious while recovering those that are salient.
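One level of the k-NN-graph procedure sketched in this abstract can be illustrated directly: estimate the density at each sample point from its k-th nearest-neighbor distance, restrict the k-NN graph to points above the threshold λ, and report connected components. This is a simplified sketch for a single level (the paper's estimator, including its scaling constants and robustness tweaks, differs in detail):

```python
import numpy as np

def knn_level_clusters(points, k, level):
    """Empirical high-density clusters at one level lambda, via the k-NN
    graph of the sample (illustrative sketch of the k-NN-graph procedure).

    points: (n, d) array.  level: density threshold lambda.
    Returns a list of sorted index lists, one per connected component of
    the k-NN graph restricted to {i : f_hat(x_i) >= level}.
    """
    n, d = points.shape
    # Pairwise distances; distance to the k-th nearest neighbor of each point.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    r_k = np.sort(dists, axis=1)[:, k]          # column 0 is the point itself
    f_hat = k / (n * r_k ** d)                  # k-NN density, volume constant omitted

    keep = np.flatnonzero(f_hat >= level)       # points in the empirical level set
    keep_set = set(keep.tolist())
    nbrs = np.argsort(dists, axis=1)[:, 1:k + 1]

    # Symmetrized k-NN adjacency among the kept points.
    adj = {int(i): set() for i in keep}
    for i in keep:
        for j in nbrs[i]:
            if j in keep_set:
                adj[int(i)].add(int(j))
                adj[int(j)].add(int(i))

    # Connected components by depth-first search.
    seen, clusters = set(), []
    for i in keep:
        if int(i) in seen:
            continue
        stack, comp = [int(i)], []
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            comp.append(v)
            stack.extend(adj[v] - seen)
        clusters.append(sorted(comp))
    return clusters
```

Running this for a decreasing sequence of levels and tracking how components split recovers the nested hierarchy that defines the cluster tree.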