Results 1–10 of 10
Equivalence of distance-based and RKHS-based statistics in hypothesis testing
Ann. Statist., 2013
"... We provide a unifying framework linking two classes of statistics used in twosample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of dist ..."
Abstract

Cited by 19 (6 self)
We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions in reproducing kernel Hilbert spaces (RKHS), as established in machine learning. In the case where the energy distance is computed with a semimetric of negative type, a positive definite kernel, termed the distance kernel, may be defined such that the MMD corresponds exactly to the energy distance. Conversely, for any positive definite kernel, we can interpret the MMD as an energy distance with respect to some negative-type semimetric. This equivalence readily extends to distance covariance using kernels on the product space. We determine the class of probability distributions for which the test statistics are consistent against all alternatives. Finally, we investigate the performance of the family of distance kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.
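The stated equivalence is easy to check numerically. The sketch below (not from the paper; the names, the semimetric ρ(x, y) = |x − y|, and the base point z0 = 0 are illustrative assumptions) verifies that biased V-statistic estimates satisfy energy distance = 2 · MMD² when the MMD uses the induced distance kernel k(x, y) = ½(ρ(x, z0) + ρ(y, z0) − ρ(x, y)):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=30)   # sample from P
y = rng.normal(0.5, 1.0, size=40)   # sample from Q

def rho(a, b):
    # negative-type semimetric: absolute difference on the real line
    return np.abs(a[:, None] - b[None, :])

def dist_kernel(a, b, z0=0.0):
    # distance kernel induced by rho, centered at the base point z0
    return 0.5 * (np.abs(a - z0)[:, None] + np.abs(b - z0)[None, :] - rho(a, b))

# biased (V-statistic) energy distance: 2 E rho(X,Y) - E rho(X,X') - E rho(Y,Y')
energy = 2 * rho(x, y).mean() - rho(x, x).mean() - rho(y, y).mean()

# biased (V-statistic) MMD^2 computed with the distance kernel
mmd2 = dist_kernel(x, x).mean() + dist_kernel(y, y).mean() - 2 * dist_kernel(x, y).mean()

print(np.isclose(energy, 2 * mmd2))  # True: the identity holds exactly for empirical measures
```

The identity holds for arbitrary probability measures, so it holds exactly (up to floating-point error) for the empirical measures used here, with no asymptotics needed.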
B-test: A nonparametric, low variance kernel two-sample test
In Advances in Neural Information Processing Systems, 2013
"... We propose a family of maximum mean discrepancy (MMD) kernel twosample tests that have low sample complexity and are consistent. The test has a hyperparameter that allows one to control the tradeoff between sample complexity and computational time. Our family of tests, which we denote as Btests, ..."
Abstract

Cited by 2 (2 self)
We propose a family of maximum mean discrepancy (MMD) kernel two-sample tests that have low sample complexity and are consistent. The test has a hyperparameter that allows one to control the trade-off between sample complexity and computational time. Our family of tests, which we denote B-tests, is both computationally and statistically efficient, combining favorable properties of previously proposed MMD two-sample tests. It does so by better leveraging samples to produce low-variance estimates in the finite-sample case, while avoiding a quadratic number of kernel evaluations and the complex null-hypothesis approximation that would be required by tests relying on one-sample U-statistics. The B-test uses a smaller-than-quadratic number of kernel evaluations and completely avoids the computational burden of complex null-hypothesis approximation, while maintaining consistency and probabilistically conservative thresholds on Type I error. Finally, recent results on combining multiple kernels transfer seamlessly to our hypothesis test, allowing a further increase in discriminative power and a decrease in sample complexity.
Ultrahigh Dimensional Feature Screening via RKHS Embeddings
"... Feature screening is a key step in handling ultrahigh dimensional data sets that are ubiquitous in modern statistical problems. Over the last decade, convex relaxation based approaches (e.g., Lasso/sparse additive model) have been extensively developed and analyzed for feature selection in high dime ..."
Abstract

Cited by 2 (0 self)
Feature screening is a key step in handling ultrahigh dimensional data sets, which are ubiquitous in modern statistical problems. Over the last decade, convex relaxation based approaches (e.g., the Lasso and sparse additive models) have been extensively developed and analyzed for feature selection in the high dimensional regime. In the ultrahigh dimensional regime, however, these approaches suffer from several problems, both computational and statistical. To overcome these issues, in this paper we propose a novel Hilbert space embedding based approach to independence screening for ultrahigh dimensional data sets. The proposed approach is model-free (i.e., no model assumption is made between response and predictors) and can handle non-standard (e.g., graph-valued) and multivariate outputs directly. We establish the sure screening property of the proposed approach in the ultrahigh dimensional regime, and experimentally demonstrate its advantages and superiority over other approaches on several synthetic and real data sets.
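As a concrete illustration of embedding-based independence screening, the sketch below ranks each predictor by a biased HSIC (Hilbert-Schmidt independence criterion) estimate of its dependence with the response, using Gaussian kernels. All function names, the fixed bandwidth, and the toy data are illustrative assumptions, not the paper's actual procedure:

```python
import numpy as np

def gaussian_gram(v, sigma=1.0):
    # Gaussian kernel Gram matrix for a 1-D sample v
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma**2))

def hsic(k, l):
    # biased V-statistic estimate of HSIC from Gram matrices k and l
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(k @ h @ l @ h) / n**2

def screen(X, y, keep):
    # rank each feature by its HSIC dependence with the response; keep the top `keep`
    ky = gaussian_gram(y)
    scores = np.array([hsic(gaussian_gram(X[:, j]), ky) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:keep]

rng = np.random.default_rng(1)
n, p = 100, 50
X = rng.normal(size=(n, p))
y = np.tanh(2 * X[:, 3]) + 0.1 * rng.normal(size=n)   # only feature 3 carries the signal

print(screen(X, y, keep=5))  # index 3 should appear among the top-ranked features
```

Because the score is a kernel dependence measure rather than a regression fit, the nonlinear relation is picked up without specifying any model between response and predictors, which is the model-free property the abstract emphasizes.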
B-tests: Low Variance Kernel Two-Sample Tests
In Neural Information Processing Systems, Lake Tahoe, United States, 2013
"... A family of maximum mean discrepancy (MMD) kernel twosample tests is introduced. Members of the test family are called Blocktests or Btests, since the test statistic is an average over MMDs computed on subsets of the samples. The choice of block size allows control over the tradeoff between test ..."
Abstract

Cited by 1 (0 self)
A family of maximum mean discrepancy (MMD) kernel two-sample tests is introduced. Members of the test family are called Block-tests or B-tests, since the test statistic is an average over MMDs computed on subsets of the samples. The choice of block size allows control over the trade-off between test power and computation time. In this respect, the B-test family combines favorable properties of previously proposed MMD two-sample tests: B-tests are more powerful than a linear-time test where blocks are just pairs of samples, yet they are more computationally efficient than a quadratic-time test where a single large block incorporating all the samples is used to compute a U-statistic. A further important advantage of the B-tests is their asymptotically normal null distribution: this is in contrast with the U-statistic, which is degenerate under the null hypothesis, and for which estimates of the null distribution are computationally demanding. Recent results on kernel selection for hypothesis testing transfer seamlessly to the B-tests, yielding a means to optimize test power via kernel choice.
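The block idea can be sketched in a few lines (illustrative only: the function names, the Gaussian kernel, its bandwidth, and the block size are assumptions, not the authors' exact construction). Each block yields an unbiased MMD² estimate; averaging over blocks gives a statistic that is asymptotically normal by the CLT, so a simple z-threshold applies:

```python
import numpy as np

def gaussian_gram(a, b, sigma=1.0):
    # Gaussian kernel Gram matrix between 1-D samples a and b
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma**2))

def mmd2_unbiased(x, y, sigma=1.0):
    # unbiased MMD^2 estimate on one block (U-statistic form, diagonals excluded)
    m = len(x)
    kxx = gaussian_gram(x, x, sigma); np.fill_diagonal(kxx, 0.0)
    kyy = gaussian_gram(y, y, sigma); np.fill_diagonal(kyy, 0.0)
    kxy = gaussian_gram(x, y, sigma)
    return kxx.sum() / (m * (m - 1)) + kyy.sum() / (m * (m - 1)) - 2 * kxy.mean()

def b_test_stat(x, y, block=20, sigma=1.0):
    # average of per-block unbiased MMD^2 estimates, with its standard error
    n = min(len(x), len(y)) // block * block
    vals = [mmd2_unbiased(x[i:i + block], y[i:i + block], sigma)
            for i in range(0, n, block)]
    return np.mean(vals), np.std(vals, ddof=1) / np.sqrt(len(vals))

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=2000)   # P
y = rng.normal(0.5, 1.0, size=2000)   # Q: mean-shifted alternative
stat, se = b_test_stat(x, y)
print(stat / se > 1.645)  # True: reject H0 at the 5% level (one-sided z-test)
```

The block size interpolates between the extremes the abstract names: block = 2 recovers the linear-time pairwise test, while a single block of all samples recovers the quadratic-time U-statistic with its degenerate null.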
Signal Processing
"... aleksandar velasevic [ A unified view] Kernelbased methods provide a rich and elegant framework for developing nonparametric detection procedures for signal processing. Several recently proposed procedures can be simply described using basic concepts of reproducing kernel Hilbert space (RKHS) embed ..."
Abstract
[A unified view] Kernel-based methods provide a rich and elegant framework for developing nonparametric detection procedures for signal processing. Several recently proposed procedures can be simply described using basic concepts of reproducing kernel Hilbert space (RKHS) embeddings of probability distributions, mainly mean elements and covariance operators. We propose a unified view of these tools and draw relationships with information divergences between distributions. Introduction and context: Testing hypotheses on signals is one of the key topics in statistical signal processing [1]. Popular examples include testing for equality of signals/homogeneity, as in speaker verification [2]–[4] or change detection [5], [6].
Recognizing activities with cluster-trees of tracklets
Gaidon et al., 2012
Asymptotic efficiency criterion; Experiments
2012
"... kernel selection for largescale tests and equivalence to energy distance ..."
Abstract
Kernel selection for large-scale tests and equivalence to energy distance.