Results 11-20 of 116
Asymptotic normality of plug-in level set estimates
Annals of Applied Probability, 2009
Cited by 22 (2 self)
We establish the asymptotic normality of the G-measure of the symmetric difference between the level set and a plug-in-type estimator of it formed by replacing the density in the definition of the level set by a kernel density estimator. Our proof will highlight the efficacy of Poissonization methods in the treatment of large sample theory problems of this kind.
Kernel estimation of density level sets
J. Multivariate Anal., 2006
Cited by 21 (3 self)
Let f be a multivariate density and f_n be a kernel estimate of f drawn from the n-sample X_1, ..., X_n of i.i.d. random variables with density f. We compute the asymptotic rate of convergence towards 0 of the volume of the symmetric difference between the t-level set {f ≥ t} and its plug-in estimator {f_n ≥ t}. As a corollary, we obtain the exact rate of convergence of a plug-in type estimate of the density level set corresponding to a fixed probability for the law induced by f.
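The quantity studied in this abstract, the volume of the symmetric difference between {f ≥ t} and {f_n ≥ t}, can be approximated numerically. The following is a minimal one-dimensional sketch, not the authors' procedure: the Gaussian kernel, the bandwidth h = n^(-1/5), the standard normal target density, the level t = 0.1, and all function names are assumptions chosen only for illustration.

```python
import math
import random


def gaussian_kde(sample, h):
    """Return the kernel density estimate f_n for `sample` with bandwidth h."""
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def f_n(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return f_n


def symmetric_difference_volume(f, f_n, t, lo, hi, m=1000):
    """Midpoint Riemann sum for the length of {f >= t} symmetric-difference {f_n >= t} on [lo, hi]."""
    step = (hi - lo) / m
    count = 0
    for i in range(m):
        x = lo + (i + 0.5) * step
        # A grid cell is counted when exactly one of the two sets contains x.
        if (f(x) >= t) != (f_n(x) >= t):
            count += 1
    return count * step


def std_normal_pdf(x):
    """True density f used in this illustration (an assumption)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


random.seed(0)
t = 0.1  # illustrative level
volumes = {}
for n in (100, 500):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    h = n ** (-1.0 / 5.0)  # illustrative bandwidth, not a recommendation
    f_n = gaussian_kde(sample, h)
    volumes[n] = symmetric_difference_volume(std_normal_pdf, f_n, t, -5.0, 5.0)
```

The dictionary `volumes` holds one approximate symmetric-difference length per sample size; a single simulated sample per n only gives a rough impression of how the error behaves as n grows.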
Adaptive Hausdorff Estimation of Density Level Sets
2007
Cited by 20 (0 self)
Consider the problem of estimating the γ-level set G*_γ = {x : f(x) ≥ γ} of an unknown d-dimensional density function f based on n independent observations X_1, ..., X_n from the density. This problem has been addressed under global error criteria related to the symmetric set difference. However, in certain applications such as anomaly detection and clustering, a more uniform mode of convergence is desirable to ensure that the estimated set is close to the target set everywhere. The Hausdorff error criterion provides this degree of uniformity and hence is more appropriate in such situations. It is known that the minimax optimal rate of convergence for the Hausdorff error is (n/log n)^(-1/(d+2α)) for level sets with Lipschitz boundaries, where the parameter α characterizes the regularity of the density around the level of interest. However, the estimators proposed in previous work achieve this rate for very restricted classes of sets (e.g. the boundary fragment and star-shaped sets) that effectively reduce the set estimation problem to a function estimation problem. This characterization precludes the existence of multiple connected components, which is fundamental to many applications such as clustering. Also, all previous work assumes knowledge of the density regularity as characterized by the parameter α. In this paper, we present a procedure that is adaptive to unknown regularity conditions and achieves near minimax optimal rates of Hausdorff error convergence for a class of level sets with very general shapes and multiple connected components at arbitrary orientations.
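To make the Hausdorff error criterion mentioned above concrete, here is a minimal sketch that computes the Hausdorff distance between two finite point sets, for example discretizations of a target set and an estimate. It is not the adaptive estimator of the paper; the function names and the toy point sets are assumptions.

```python
import math


def directed_hausdorff(A, B):
    """Largest distance from a point of A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Symmetric Hausdorff distance between finite point sets A and B."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


# Toy example: the estimate contains one spurious far-away point.
target = [(0.0, 0.0), (1.0, 0.0)]
estimate = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]
error = hausdorff(target, estimate)
```

A single stray point adds almost nothing to a symmetric-difference volume, yet it determines the Hausdorff error entirely, which is the uniformity property the abstract refers to.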
Multiscale Inference about a Density
2007
Cited by 20 (5 self)
We introduce a multiscale test statistic based on local order statistics and spacings that provides simultaneous confidence statements for the existence and location of local increases and decreases of a density or a failure rate. The procedure provides guaranteed finite-sample significance levels, is easy to implement and possesses certain asymptotic optimality and adaptivity properties.
Low-Noise Density Clustering
Cited by 18 (7 self)
We study density-based clustering under low-noise conditions. Our framework allows for sharply defined clusters such as clusters on lower dimensional manifolds. We show that accurate clustering is possible even in high dimensions. We propose two data-based methods for choosing the bandwidth and we study the stability properties of density clusters. We show that a simple graph-based algorithm known as the “friends-of-friends” algorithm successfully approximates the high density clusters.
Estimating the Number of Clusters
2000
Cited by 17 (0 self)
Hartigan (1975) defines the number q of clusters in a d-variate statistical population as the number of connected components of the set {f>c}, where f denotes the underlying density function on R^d and c is a given constant. Some usual cluster algorithms treat q as an input which must be given in advance. The authors propose a method for estimating this parameter which is based on the computation of the number of connected components of an estimate of {f>c}. This set estimator is constructed as a union of balls with centres at an appropriate subsample which is selected via a nonparametric density estimator of f. The asymptotic behaviour of the proposed method is analyzed. A simulation study and an example with real data are also included.
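Counting the connected components of a union of balls reduces to a graph problem: two balls of equal radius overlap exactly when their centres are at most twice the radius apart. The sketch below illustrates only that counting step with a union-find structure. It assumes closed balls of a common radius r and takes the centres as given, so the density-based subsample selection described in the abstract is not reproduced; the function name and example data are hypothetical.

```python
import math


def count_components(centres, r):
    """Number of connected components of the union of closed balls B(c, r)."""
    parent = list(range(len(centres)))

    def find(i):
        # Follow parent links to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centres)):
        for j in range(i + 1, len(centres)):
            # Two closed balls of radius r intersect iff their centres are within 2r.
            if math.dist(centres[i], centres[j]) <= 2.0 * r:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(centres))})


# Hypothetical centres forming two well-separated groups.
centres = [(0.0, 0.0), (0.8, 0.0), (5.0, 5.0), (5.5, 5.0)]
q_hat = count_components(centres, r=0.5)
```

The pairwise loop is quadratic in the number of centres, which is acceptable for a sketch; a spatial index would be the natural replacement for larger subsamples.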
Exact Rates in Density Support Estimation
Cited by 17 (1 self)
Let f be an unknown multivariate probability density with compact support S_f. Given n independent observations X_1, ..., X_n drawn from f, this paper is devoted to the study of the estimator Ŝ_n of S_f defined as unions of balls centered at the X_i and of common radius r_n. To measure the proximity between Ŝ_n and S_f, we employ a general criterion d_g, based on some function g, which encompasses many statistical situations of interest. Under mild assumptions on the sequence (r_n) and some analytic conditions on f and g, the exact rates of convergence of d_g(Ŝ_n, S_f) are obtained using tools from Riemannian geometry. The conditions on the radius sequence are found to be sharp and consequences of the results are discussed from a statistical perspective.
Calibrating the excess mass and dip tests of modality
J R Stat Soc Ser B (Statistical Methodol, 1998
Cited by 16 (2 self)
Nonparametric tests of modality are a distribution-free way of assessing evidence about inhomogeneity in a population, provided that the potential subpopulations are sufficiently well separated. They include the excess mass and dip tests, which are equivalent in univariate settings and are alternatives to the bandwidth test. Only very conservative forms of the excess mass and dip tests are available at present, however, and for that reason they are generally not competitive with the bandwidth test. In the present paper we develop a practical approach to calibrating the excess mass and dip tests to improve their level accuracy and power substantially. Our method exploits the fact that the limiting distribution of the excess mass statistic under the null hypothesis depends on unknowns only through a constant, which may be estimated. Our calibrated test exploits this fact and is shown to have greater power and level accuracy than the bandwidth test has. The latter tends to be quite conservative, even in an asymptotic sense. Moreover, the calibrated test avoids difficulties that the bandwidth test has with spurious modes in the tails, which often must be discounted through subjective intervention of the experimenter.
How to Divide a Territory? A New Simple Differential Formalism for Optimization of Set Functions
International Journal of Intelligent Systems
Optimal rates for plug-in estimators of density level sets
Cited by 14 (0 self)
In the context of density level set estimation, we study the convergence of general plug-in methods under two main assumptions on the density for a given level λ. More precisely, it is assumed that the density (i) is smooth in a neighborhood of λ and (ii) has γ-exponent at level λ. Condition (i) ensures that the density can be estimated at a standard nonparametric rate and condition (ii) is similar to Tsybakov’s margin assumption which is stated for the classification framework. Under these assumptions, we derive optimal rates of convergence for plug-in estimators. Explicit convergence rates are given for plug-in estimators based on kernel density estimators when the underlying measure is the Lebesgue measure. Lower bounds proving optimality of the rates in a minimax sense when the density is Hölder smooth are also provided.