Results 1–10 of 17
High-dimensional classification using features annealed independence rules
Ann. Statist., 2008
Cited by 80 (18 self)
Classification using high-dimensional features arises frequently in many contemporary statistical studies, such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is still poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra, and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as random guessing, due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as badly as random guessing. Thus, it is of paramount importance to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently the threshold value of the test statistics, is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and convincingly demonstrate the advantage of our new classification procedure.
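The feature-screening step the abstract describes, ranking features by two-sample t-statistics and keeping only the strongest, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `fair_screen` and its interface are ours, and the number of retained features m is taken as given rather than derived from the paper's classification-error bound.

```python
import numpy as np

def fair_screen(X1, X2, m):
    """Keep the m features with the largest absolute two-sample
    t-statistics (a sketch of the screening idea behind FAIR)."""
    n1, n2 = X1.shape[0], X2.shape[0]
    mean_diff = X1.mean(axis=0) - X2.mean(axis=0)
    # Unequal-variance (Welch) standard error, computed per feature.
    se = np.sqrt(X1.var(axis=0, ddof=1) / n1 + X2.var(axis=0, ddof=1) / n2)
    t = np.abs(mean_diff) / se
    return np.argsort(t)[::-1][:m]  # indices of the m largest |t|
```

The independence rule would then be fit on the selected columns only, avoiding the noise accumulation in the centroid estimates that the abstract warns about.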
A Two-Sample Test for High-Dimensional Data with Application to Gene-Set Testing
Annals of Statistics, 2010
Step-up procedures controlling generalized FWER and generalized
, 2007
Cited by 18 (4 self)
In many applications of multiple hypothesis testing where more than one false rejection can be tolerated, procedures controlling error rates that measure at least k false rejections, instead of at least one, for some fixed k ≥ 1 can potentially increase the ability of a procedure to detect false null hypotheses. The k-FWER, a generalized version of the usual familywise error rate (FWER), is such an error rate that has recently been introduced in the literature, and procedures controlling it have been proposed. A further generalization of a result on the k-FWER is provided in this article. In addition, an alternative and less conservative notion of error rate, the k-FDR, is introduced in the same spirit as the k-FWER by generalizing the usual false discovery rate (FDR). A k-FWER procedure is constructed, given any set of increasing constants, by utilizing the kth order joint null distributions of the p-values without assuming any specific form of dependence among all the p-values. Procedures controlling the k-FDR are also developed by using the kth order joint null distributions of the p-values, first assuming that the sets of null and non-null p-values are mutually independent or jointly positively dependent in the sense of being multivariate totally positive of order two (MTP2), and then discarding that assumption about the overall dependence among the p-values.
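To fix ideas, the simplest procedure controlling the k-FWER under arbitrary dependence is the generalized Bonferroni rule of Lehmann and Romano: reject H_i whenever p_i ≤ kα/m. The step-up constructions in the abstract are more powerful; this single-step sketch is only an illustration, and the function name is ours.

```python
import numpy as np

def k_fwer_bonferroni(pvals, k, alpha):
    """Generalized Bonferroni: reject H_i when p_i <= k*alpha/m.
    Controls the probability of k or more false rejections (k-FWER)
    under arbitrary dependence among the p-values. Sketch only."""
    pvals = np.asarray(pvals)
    m = pvals.size
    return np.flatnonzero(pvals <= k * alpha / m)
```

With k = 1 this reduces to the ordinary Bonferroni correction, which is what the k-FWER generalizes.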
Central limit theorems and multiplier bootstrap when p is much larger than n
, 2012
Cited by 4 (3 self)
We derive a central limit theorem for the maximum of a sum of high-dimensional random vectors. More precisely, we establish conditions under which the distribution of the maximum is approximated by the maximum of a sum of Gaussian random vectors with the same covariance matrices as the original vectors. The key innovation of our result is that it applies even if the dimension of the random vectors (p) is much larger than the sample size (n). In fact, the growth of p can be exponential in some fractional power of n. We also show that the distribution of the maximum of a sum of Gaussian random vectors with unknown covariance matrices can be estimated by the distribution of the maximum of the (conditional) Gaussian process obtained by multiplying the original vectors with i.i.d. Gaussian multipliers. We call this procedure the "multiplier bootstrap". Here too, the growth of p can be exponential in some fractional power of n. We prove that our distributional approximations, either Gaussian or conditional Gaussian, yield a high-quality approximation for the distribution of the original maximum, often with at most a polynomial approximation error. These results are of interest in numerous econometric and statistical applications. In particular, we demonstrate how our central limit theorem and the multiplier bootstrap can be used for high-dimensional estimation, multiple hypothesis testing, and adaptive specification testing. All of our results contain non-asymptotic bounds on approximation errors.
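The multiplier bootstrap named in the abstract admits a short sketch: draw i.i.d. standard Gaussian multipliers, form the multiplier sum, and take an empirical quantile of its maximum as the critical value. Centring the data and the function's interface are our choices; the paper's setting and guarantees are considerably more general.

```python
import numpy as np

def multiplier_bootstrap_quantile(X, alpha, B=1000, rng=None):
    """(1-alpha)-quantile of max_j |n^{-1/2} sum_i e_i * x_centered_ij|
    over B draws of i.i.d. Gaussian multipliers e_i. Sketch only."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # centre so the multiplier sum mimics the noise
    stats = np.empty(B)
    for b in range(B):
        e = rng.standard_normal(n)          # i.i.d. Gaussian multipliers
        stats[b] = np.abs(Xc.T @ e).max() / np.sqrt(n)
    return np.quantile(stats, 1.0 - alpha)
```

Conditional on the data, each multiplier sum is exactly Gaussian with the empirical covariance, which is what makes the approximation work even when p is much larger than n.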
Phase Transition and Regularized Bootstrap in Large-Scale t-Tests with False Discovery Rate Control
, 2013
RAPTT: An Exact Two-Sample Test in High Dimensions Using Random Projections. arXiv e-prints
, 1405
Certified by ...
, 2013
Monotonicity is a key qualitative prediction of a wide array of economic models derived via robust comparative statics. It is therefore important to design effective and practical econometric methods for testing this prediction in empirical analysis. Chapter 1 develops a general nonparametric framework for testing monotonicity of a regression function. Using this framework, a broad class of new tests is introduced, which gives an empirical researcher considerable flexibility to incorporate ex ante information she might have. Chapter 1 also develops new methods for simulating critical values, based on the combination of a bootstrap procedure and new selection algorithms. These methods yield tests that have correct asymptotic size and are asymptotically non-conservative. It is also shown how to obtain an adaptive rate-optimal test that has the best attainable rate of uniform consistency against models whose regression function has Lipschitz-continuous first-order derivatives, and that automatically adapts to the unknown smoothness of the regression function. Simulations show that the power of the new tests in many cases significantly exceeds that ...
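As a toy stand-in for the chapter's test statistics (which are kernel-weighted and far more flexible), a Kendall-type sign statistic already captures the core idea that monotonicity constrains the signs of paired differences: under an increasing regression function, larger x should tend to come with larger y. The statistic below is a simplification for illustration only, not the chapter's construction, and its critical values would still need to be simulated.

```python
import numpy as np

def monotonicity_stat(x, y):
    """Kendall-type sign statistic: sum over pairs i < j (after sorting
    by x) of sign(y_j - y_i). Large positive values favour an increasing
    regression function; large negative values favour a decreasing one."""
    order = np.argsort(x)
    ys = np.asarray(y)[order]
    diffs = np.sign(ys[None, :] - ys[:, None])  # diffs[i, j] = sign(y_j - y_i)
    return np.triu(diffs, 1).sum()              # keep only pairs with i < j
```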
high dimensions, regularization, and the L1 asymptotics
"... Density estimation in high and ultra ..."
Cramér Type Moderate Deviation for the Maximum of the Periodogram with Application to Simultaneous Tests in Gene Expression Time Series
, 908
In this paper, Cramér-type moderate deviations for the maximum of the periodogram and its studentized version are derived. The results are then applied to a simultaneous testing problem in gene expression time series. It is shown that the level of the simultaneous tests is accurate provided that the number of genes G and the sample size n satisfy G = exp(o(n^{1/3})).
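The statistic in question is the maximum of the periodogram over the Fourier frequencies, which can be computed directly from the FFT. Normalisation conventions for the periodogram vary; the sketch below uses one common choice, I(w_k) = |Σ_t x_t e^{-i w_k t}|² / n, and is ours, not the paper's code.

```python
import numpy as np

def max_periodogram(x):
    """Maximum of the periodogram I(w_k) = |FFT(x)_k|^2 / n over the
    Fourier frequencies w_k = 2*pi*k/n, k = 1, ..., floor((n-1)/2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    fft = np.fft.fft(x - x.mean())          # demean before transforming
    I = np.abs(fft[1:(n - 1) // 2 + 1]) ** 2 / n
    return I.max()
```

For a pure cosine at an exact Fourier frequency, the periodogram concentrates all its mass at that frequency, so the maximum equals n/4 under this normalisation.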