Results 11-20 of 42
Set Estimation and Nonparametric Detection
2000
Cited by 11 (0 self)
The authors consider the estimation of a set $S \subset \mathbb{R}^d$ from a random sample of $n$ points. They examine the properties of a detection method, proposed by Devroye & Wise (1980), which relies on a "naive" estimator of $S$, defined as a union of balls centered at the sample points with common radius $\varepsilon_n$. They obtain the convergence rate for the probability of false alarm and show that the smoothing parameter $\varepsilon_n$ can be used to incorporate prior information on the shape of $S$. They suggest two general methods for selecting $\varepsilon_n$ and illustrate them with a simulation study and a real data example.
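The union-of-balls estimator described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the toy sample, and the radius value `eps_n = 0.5` are hypothetical and not taken from the paper, and the alarm rule shown (flag a point that falls outside the estimated set) is a plain reading of the abstract rather than the authors' exact procedure.

```python
import math

def in_union_of_balls(x, sample, radius):
    """True if x lies in the union of closed balls of the given radius
    centered at the sample points (the "naive" set estimator)."""
    return any(math.dist(x, p) <= radius for p in sample)

def raises_alarm(x, sample, radius):
    """Hypothetical detection rule: flag x when it falls outside the
    estimated set."""
    return not in_union_of_balls(x, sample, radius)

# Toy sample in the plane; eps_n is an assumed smoothing radius.
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
eps_n = 0.5

print(in_union_of_balls((0.3, 0.0), sample, eps_n))  # point close to the sample
print(raises_alarm((2.0, 2.0), sample, eps_n))       # point far from the sample
```

A larger radius gives a smoother, more connected estimate at the cost of covering points outside the true set, which is why the choice of the radius matters for the false alarm probability.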
Nonparametric approach to the estimation of lengths and surface areas
Annals of Statistics, 2007
Cited by 11 (1 self)
The Minkowski content $L_0(G)$ of a body $G \subset \mathbb{R}^d$ represents the boundary length (for $d = 2$) or the surface area (for $d = 3$) of $G$. A method for estimating $L_0(G)$ is proposed. It relies on a nonparametric estimator based on the information provided by a random sample (taken on a rectangle containing $G$) in which we are able to identify whether every point is inside or outside $G$. Some theoretical properties concerning strong consistency, $L_1$ error and convergence rates are obtained. A practical application to a problem of image analysis in cardiology is discussed in some detail. A brief simulation study is provided.
Nonparametric Density Estimation and Clustering with Application to Cosmology
Unpublished, 2003
Cited by 10 (2 self)
We present a nonparametric method for galaxy clustering in astronomical sky surveys. We show that the cosmological definition of clusters of galaxies is equivalent to density contour clusters (Hartigan, 1975) $S_c = \{f > c\}$, where $f$ is a probability density function. The plug-in estimator $\hat{S}_c = \{\hat{f} > c\}$ is used to estimate $S_c$, where $\hat{f}$ is the multivariate kernel density estimator. To choose the optimal smoothing parameter, we use cross-validation and the plug-in method and show that cross-validation outperforms the plug-in method in our case. A new cluster catalogue, a database of the locations of clusters, based on the plug-in estimator is compared to existing cluster catalogues, the Abell and Edinburgh/Durham Cluster Catalogue I (EDCCI). Our result is more consistent with the EDCCI than with the Abell, which is the most widely used catalogue. We use the smoothed bootstrap to assess the validity of clustering results.
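As a rough illustration of the plug-in idea in this abstract, the sketch below evaluates a kernel density estimate and tests whether a point belongs to the estimated level set. The Gaussian product kernel, the single fixed bandwidth `h`, the threshold `c`, and the toy sample are all assumptions made for the example; the abstract does not specify the kernel, and it selects the bandwidth by cross-validation, which is not reproduced here.

```python
import math

def gaussian_kde(x, sample, h):
    """Kernel density estimate at x, using a Gaussian product kernel
    with one common bandwidth h (an assumed, fixed value)."""
    d = len(x)
    norm = len(sample) * (h * math.sqrt(2.0 * math.pi)) ** d
    total = sum(math.exp(-0.5 * (math.dist(x, p) / h) ** 2) for p in sample)
    return total / norm

def in_plugin_level_set(x, sample, h, c):
    """Membership test for the plug-in set: density estimate above c."""
    return gaussian_kde(x, sample, h) > c

# Toy data: three nearby points and one isolated point.
sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
h, c = 0.5, 0.2

print(in_plugin_level_set((0.0, 0.0), sample, h, c))  # dense region
print(in_plugin_level_set((5.0, 5.0), sample, h, c))  # isolated point
```

Density contour clusters would then be the connected components of the region where this test succeeds; finding those components is a separate step not shown here.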
Plug-in Estimation of Level Sets in a Non-compact Setting with Applications in Multivariate Risk Theory
2011
Cited by 8 (4 self)
This paper deals with the problem of estimating the level sets $L(c) = \{F(x) \ge c\}$, with $c \in (0,1)$, of an unknown distribution function $F$ on $\mathbb{R}^2_+$. A plug-in approach is followed. That is, given a consistent estimator $F_n$ of $F$, we estimate $L(c)$ by $L_n(c) = \{F_n(x) \ge c\}$. In our setting, a non-compactness property is a priori required for the level sets to be estimated. We state consistency results with respect to the Hausdorff distance and the volume of the symmetric difference. Our results are motivated by applications in multivariate risk theory. In particular, we propose a new bivariate version of the Conditional Tail Expectation by conditioning the two-dimensional random vector to be in the level set $L(c)$. We also present simulated and real examples which illustrate our theoretical results.
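The plug-in construction in this abstract can be illustrated with a small sketch. It assumes the empirical distribution function as the consistent estimator of the distribution function; the abstract only requires some consistent estimator, so this choice, along with the function names and the toy sample, is an assumption made for the example.

```python
def empirical_cdf(x, sample):
    """Empirical distribution function at x: the fraction of sample
    points that are componentwise less than or equal to x."""
    hits = sum(all(pi <= xi for pi, xi in zip(p, x)) for p in sample)
    return hits / len(sample)

def in_level_set(x, sample, c):
    """Membership test for the plug-in level set at level c."""
    return empirical_cdf(x, sample) >= c

# Toy bivariate sample on the positive quadrant.
sample = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0), (4.0, 4.0)]

print(empirical_cdf((3.0, 3.0), sample))      # 3 of 4 points are dominated
print(in_level_set((2.0, 2.0), sample, 0.5))  # only 1 of 4 points is dominated
```

Because a distribution function is nondecreasing in each coordinate, any point that dominates a member of the level set is also a member, which is why these sets are unbounded.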
Complexity Penalized Support Estimation
2004
Cited by 8 (0 self)
We consider the estimation of the support of a probability density function with i.i.d. observations. The estimator to be considered is a minimizer of a complexity-penalized excess mass criterion. We present a fast algorithm for the construction of the estimator. The estimator is able to estimate supports that consist of disconnected regions. We prove that the estimator achieves minimax rates of convergence up to a logarithmic factor simultaneously over a scale of Hölder smoothness classes for the boundary of the support. The proof assumes a sharp boundary for the support.
Cluster Analysis of Massive Datasets in Astronomy
2006
Cited by 6 (0 self)
Clusters of galaxies are a useful proxy to trace the mass distribution of the universe. By measuring the mass of clusters of galaxies at different scales, one can follow the evolution of the mass distribution (Martínez and Saar, 2002). It can be shown that finding galaxy clusters is equivalent to finding density contour clusters (Hartigan, 1975): connected components of the level set $S_c \equiv \{f > c\}$, where $f$ is a probability density function. Cuevas et al. (2000, 2001) proposed a nonparametric method for density contour clusters. They attempt to find density contour clusters using the minimal spanning tree. While their algorithm is conceptually simple, it requires intensive computations for large datasets. We propose a more efficient clustering method based on their algorithm with the Fast Fourier Transform (FFT). The method is applied to a study of galaxy clustering on large astronomical sky survey data.
Optimal rates of convergence for persistence diagrams in topological data analysis
Cited by 5 (1 self)
Computational topology has recently seen important developments toward data analysis, giving birth to the field of topological data analysis. Topological persistence, or persistent homology, appears as a fundamental tool in this field. In this paper, we study topological persistence in general metric spaces, with a statistical approach. We show that the use of persistent homology can be naturally considered in general statistical frameworks and that persistence diagrams can be used as statistics with interesting convergence properties. Some numerical experiments are performed in various contexts to illustrate our results.
Clusters and water flows: a novel approach to modal clustering through Morse theory
2014
Cited by 5 (2 self)
The problem of finding groups in data (cluster analysis) has been extensively studied by researchers from the fields of Statistics and Computer Science, among others. However, despite its popularity, it is widely recognized that the investigation of some theoretical aspects of clustering has been relatively sparse. One of the main reasons for this lack of theoretical results is surely the fact that, unlike the situation with other statistical problems such as regression or classification, for some of the cluster methodologies it is quite difficult to specify a population goal to which the data-based clustering algorithms should try to get close. This paper aims to provide some insight into the theoretical foundations of the usual nonparametric approach to clustering, which understands clusters as regions of high density, by presenting an explicit formulation for the ideal population clustering.
On the path density of a gradient field
 Annals of Statistics
Cited by 4 (0 self)
We consider the problem of reliably finding filaments in point clouds. Realistic data sets often have numerous filaments of various sizes and shapes. Statistical techniques exist for finding one (or a few) filaments, but these methods do not handle noisy data sets with many filaments. Other methods can be found in the astronomy literature, but they do not have rigorous statistical guarantees. We propose the following method. Starting at each data point, we construct the steepest ascent path along a kernel density estimator. We locate filaments by finding regions where these paths are highly concentrated. Formally, we define the density of these paths and we construct a consistent estimator of this path density.
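The path construction mentioned in this abstract can be approximated with a short sketch. Note the substitution: this uses the mean-shift iteration with a Gaussian kernel, a standard discrete scheme that moves uphill on a kernel density estimate, rather than the continuous steepest ascent path the abstract describes. The bandwidth `h`, the stopping rule, and the toy sample are assumptions for illustration only, and the path density estimator itself is not shown.

```python
import math

def mean_shift_step(x, sample, h):
    """One mean-shift update: the Gaussian-weighted average of the sample,
    with weights centered at x. Assumes x is close enough to the data
    that the weights do not all underflow to zero."""
    weights = [math.exp(-0.5 * (math.dist(x, p) / h) ** 2) for p in sample]
    total = sum(weights)
    return tuple(
        sum(w * p[k] for w, p in zip(weights, sample)) / total
        for k in range(len(x))
    )

def ascent_path(start, sample, h, steps=50, tol=1e-8):
    """Sequence of points visited when iterating mean shift from start."""
    path = [tuple(start)]
    for _ in range(steps):
        nxt = mean_shift_step(path[-1], sample, h)
        if math.dist(nxt, path[-1]) < tol:
            break
        path.append(nxt)
    return path

# Toy data on a line; the path from the left end climbs toward the middle.
sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
path = ascent_path(sample[0], sample, h=1.0)
print(len(path), path[-1])
```

Running one such path from every data point and recording where the paths pass would give the raw material for a concentration measure of the kind the abstract refers to.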
Adaptation to lowest density regions with application to support recovery
2014
"... Adaptation to lowest density regions with ..."