Results 1–10 of 42
Estimating the Support of a High-Dimensional Distribution
, 1999
Cited by 766 (29 self)
Abstract:
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a preliminary theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled d...
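The procedure the abstract describes can be sketched with scikit-learn's OneClassSVM as a stand-in implementation; the kernel, nu, gamma, and the synthetic data below are illustrative choices, not the paper's:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))      # samples from the unknown distribution P

# nu upper-bounds the fraction of training points allowed outside S
clf = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5)
clf.fit(X)

# the learned f is positive on the estimated region S, negative outside
inside = clf.predict(X)            # +1 inside S, -1 outside
outlier_frac = (inside == -1).mean()
print(outlier_frac)                # roughly bounded by nu = 0.1
```

The decision function is exactly the kernel expansion over a subset of the training points (the support vectors), which is why only `clf.support_vectors_` are needed at prediction time.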
Learning minimum volume sets
 J. Machine Learning Res
, 2006
Cited by 41 (9 self)
Abstract:
Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P, and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P. Other than these samples, no other information is available regarding P, but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules.
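A minimal sketch of the histogram rule mentioned at the end of the abstract, assuming P lives on [0, 1] and µ is Lebesgue measure: greedily keep the highest-mass bins until their empirical P-mass reaches α; the kept bins form the minimum-volume estimate. All tuning values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.beta(5, 2, size=5000)           # samples from P on [0, 1]
alpha, n_bins = 0.9, 50

counts, edges = np.histogram(x, bins=n_bins, range=(0.0, 1.0))
mass = counts / counts.sum()             # empirical P-mass per bin
order = np.argsort(mass)[::-1]           # densest bins first
cum = np.cumsum(mass[order])
k = int(np.searchsorted(cum, alpha)) + 1 # bins needed to cover mass alpha
volume = k / n_bins                      # µ-measure of the estimated set
print(volume)                            # well below 1: mass concentrates
```

The greedy ordering is what makes the set "minimum volume" within this histogram class: no other union of k bins has higher empirical P-mass.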
Information, Divergence and Risk for Binary Experiments
 Journal of Machine Learning Research
, 2009
Cited by 37 (8 self)
Abstract:
We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC curves and statistical information. We do this by systematically studying integral and variational representations of these various objects, and in so doing identify their primitives, all of which are related to cost-sensitive binary classification. As well as developing relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate regret bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants.
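As a concrete instance of the Pinsker-type inequalities the paper generalises, the classical bound KL(p, q) ≥ 2·V(p, q)² relates the KL divergence (an f-divergence) to the variational (total variation) divergence; a quick numeric check on a pair of discrete distributions:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.4, 0.4])

kl = float(np.sum(p * np.log(p / q)))      # KL divergence in nats
tv = 0.5 * float(np.sum(np.abs(p - q)))    # total variation distance
print(kl, 2 * tv ** 2)                     # Pinsker: first >= second
```

The generalised inequalities in the paper play the same role with KL replaced by an arbitrary f-divergence.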
Kernel estimation of density level sets
 J. Multivariate Anal
, 2006
Cited by 24 (2 self)
Abstract:
Let f be a multivariate density and fn be a kernel estimate of f drawn from the n-sample X1, ..., Xn of i.i.d. random variables with density f. We compute the asymptotic rate of convergence towards 0 of the volume of the symmetric difference between the t-level set {f ≥ t} and its plug-in estimator {fn ≥ t}. As a corollary, we obtain the exact rate of convergence of a plug-in type estimate of the density level set corresponding to a fixed probability for the law induced by f.
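A one-dimensional sketch of the plug-in rule studied here: build a kernel estimate fn from the sample and take {fn ≥ t} as the estimate of {f ≥ t}. The grid, bandwidth, and level below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=2000)   # i.i.d. sample with density f = N(0, 1)
h, t = 0.3, 0.1                       # bandwidth and level

grid = np.linspace(-4.0, 4.0, 401)
# Gaussian kernel density estimate fn evaluated on the grid
diffs = (grid[:, None] - x[None, :]) / h
f_n = np.exp(-0.5 * diffs ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

level_set = grid[f_n >= t]            # plug-in estimate of {f >= t}
print(level_set.min(), level_set.max())   # roughly ±1.7 for this sample
```

For N(0, 1) the true t-level set at t = 0.1 is an interval of about ±1.66, so the symmetric difference with the plug-in estimate is small, in line with the rates the paper computes.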
Generalization error bounds in semi-supervised classification under the cluster assumption
, 2007
Low-Noise Density Clustering
Cited by 19 (8 self)
Abstract:
We study density-based clustering under low-noise conditions. Our framework allows for sharply defined clusters such as clusters on lower-dimensional manifolds. We show that accurate clustering is possible even in high dimensions. We propose two data-based methods for choosing the bandwidth and we study the stability properties of density clusters. We show that a simple graph-based algorithm known as the “friends-of-friends” algorithm successfully approximates the high-density clusters.
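A minimal sketch of the friends-of-friends idea (not the authors' exact procedure): keep points whose kernel density estimate exceeds a threshold, link any two kept points closer than a linking length eps, and report connected components as the high-density clusters. All tuning values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.normal([-4.0, 0.0], 0.2, size=(100, 2))
b = rng.normal([+4.0, 0.0], 0.2, size=(100, 2))
X = np.vstack([a, b])

h, lam, eps = 0.5, 0.01, 1.0
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
dens = np.exp(-0.5 * d2 / h ** 2).mean(1)     # crude kernel density score
keep = np.where(dens > lam)[0]                # high-density points only

# union-find over the "friends" graph on the kept points
parent = {i: i for i in keep}
def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]          # path halving
        i = parent[i]
    return i
for i in keep:
    for j in keep:
        if i < j and d2[i, j] < eps ** 2:      # i and j are "friends"
            parent[find(j)] = find(i)
clusters = {find(i) for i in keep}
print(len(clusters))                           # 2 for these well-separated draws
```

The pairwise-distance matrix makes this O(n²); the point is only to show why components of the friends graph track the connected components of the high-density region.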
Exact Rates in Density Support Estimation
Cited by 17 (1 self)
Abstract:
Let f be an unknown multivariate probability density with compact support Sf. Given n independent observations X1,...,Xn drawn from f, this paper is devoted to the study of the estimator Ŝn of Sf defined as a union of balls centered at the Xi and of common radius rn. To measure the proximity between Ŝn and Sf, we employ a general criterion dg, based on some function g, which encompasses many statistical situations of interest. Under mild assumptions on the sequence (rn) and some analytic conditions on f and g, the exact rates of convergence of dg(Ŝn, Sf) are obtained using tools from Riemannian geometry. The conditions on the radius sequence are found to be sharp and consequences of the results are discussed from a statistical perspective.
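The union-of-balls estimator described above admits a one-line membership test: a query point belongs to Ŝn iff it lies within rn of some sample point. A sketch with an illustrative radius and a known support:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(-1.0, 1.0, size=(500, 2))   # true support Sf = [-1, 1]^2
r_n = 0.25                                  # illustrative common radius

def in_support_estimate(q, X=X, r=r_n):
    """q is in the union of balls iff some sample point is within r of q."""
    return bool((((X - q) ** 2).sum(axis=1) <= r * r).any())

print(in_support_estimate(np.array([0.0, 0.0])))   # point inside the support
print(in_support_estimate(np.array([3.0, 3.0])))   # point far outside
```

The paper's results concern how fast dg(Ŝn, Sf) shrinks as rn → 0 with n rn^d → ∞; the sketch only illustrates the estimator itself.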
Estimating the Number of Clusters
, 2000
Cited by 16 (0 self)
Abstract:
Hartigan (1975) defines the number q of clusters in a d-variate statistical population as the number of connected components of the set {f > c}, where f denotes the underlying density function on R^d and c is a given constant. Some usual cluster algorithms treat q as an input which must be given in advance. The authors propose a method for estimating this parameter which is based on the computation of the number of connected components of an estimate of {f > c}. This set estimator is constructed as a union of balls with centres at an appropriate subsample which is selected via a nonparametric density estimator of f. The asymptotic behaviour of the proposed method is analyzed. A simulation study and an example with real data are also included.
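A one-dimensional sketch of the idea: select the sample points where a kernel density estimate exceeds c, cover them with intervals of radius r (the 1-D balls), and count connected components of the union as the estimate of q. The thresholds below are illustrative, not the authors' data-driven choices:

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.sort(np.concatenate([rng.normal(-2.0, 0.3, 300),
                            rng.normal(+2.0, 0.3, 300)]))
h, c, r = 0.2, 0.05, 0.3

# kernel density estimate at each sample point
d = (x[:, None] - x[None, :]) / h
f_hat = np.exp(-0.5 * d ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))
keep = x[f_hat > c]                  # subsample where the estimate exceeds c

# components of the union of intervals [xi - r, xi + r]: a new component
# starts wherever consecutive kept points are more than 2r apart
gaps = np.diff(keep) > 2 * r
q_hat = 1 + int(gaps.sum())
print(q_hat)                         # 2 for this two-mode sample
```

In higher dimensions the gap count is replaced by connected components of the intersection graph of the balls, as in the previous entry's friends-of-friends sketch.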
Discussion of paper by
, 1990
Cited by 13 (3 self)
Abstract:
Thanks to David Mason for helpful comments and providing us with copies of his work. GAUSS programs that carry out the computations in this paper are available from the web site
Nonparametric Tests of Conditional Treatment Effects
, 2009
Cited by 11 (2 self)
Abstract:
We develop a general class of nonparametric tests for treatment effects conditional on covariates. We consider a wide spectrum of null and alternative hypotheses regarding conditional treatment effects, including (i) the null hypothesis of the conditional stochastic dominance between treatment and control groups; (ii) the null hypothesis that the conditional average treatment effect is positive for each value of covariates; and (iii) the null hypothesis of no distributional (or average) treatment effect conditional on covariates against a one-sided (or two-sided) alternative hypothesis. The test statistics are based on L1-type functionals of uniformly consistent nonparametric kernel estimators of conditional expectations that characterize the null hypotheses. Using the Poissonization technique of Giné et al. (2003), we show that suitably studentized versions of our test statistics are asymptotically standard normal under the null hypotheses and also show that the proposed nonparametric tests are consistent against general fixed alternatives. Furthermore, it turns out that our tests have non-negligible powers against some local alternatives that are n^(-1/2) different from the null hypotheses, where n is the sample size. We provide a more powerful test for the case when the null hypothesis may be binding only on a strict subset of the support
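An illustrative L1-type functional in the spirit of hypothesis (ii), not the authors' studentized statistic: Nadaraya-Watson estimates of the conditional mean outcome in each group, with the integrated positive part of their difference over a covariate grid. All names and tuning values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000
x0, x1 = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
y0 = np.sin(3 * x0) + rng.normal(0, 0.2, n)          # control outcomes
y1 = np.sin(3 * x1) + 0.5 + rng.normal(0, 0.2, n)    # treated: effect +0.5

def nw(xg, xs, ys, h=0.1):
    """Nadaraya-Watson kernel estimate of E[y | x] on grid xg."""
    w = np.exp(-0.5 * ((xg[:, None] - xs[None, :]) / h) ** 2)
    return (w * ys).sum(axis=1) / w.sum(axis=1)

grid = np.linspace(0.05, 0.95, 50)
# L1-type functional of the estimated conditional treatment effect
stat = np.clip(nw(grid, x1, y1) - nw(grid, x0, y0), 0.0, None).mean()
print(stat)   # close to the true constant effect 0.5 in this design
```

The actual tests studentize such functionals and use the Poissonization argument to obtain asymptotic normality; the sketch shows only the raw statistic that the null hypotheses constrain.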