Results 1-10 of 19
Central Limit Theorems for Classical Likelihood Ratio Tests for High-Dimensional Normal Distributions
Abstract

Cited by 10 (3 self)
For random samples of size n obtained from p-variate normal distributions, we consider the classical likelihood ratio tests (LRT) for their means and covariance matrices in the high-dimensional setting. These test statistics have been extensively studied in multivariate analysis, and their limiting distributions under the null hypothesis were proved to be chi-square distributions as n goes to infinity with p fixed. In this paper, we consider the high-dimensional case where both p and n go to infinity with p/n → y ∈ (0, 1]. We prove that the likelihood ratio test statistics under this assumption converge in distribution to normal distributions with explicit means and variances. A simulation study shows that likelihood ratio tests based on our central limit theorems outperform those using the traditional chi-square approximations for analyzing high-dimensional data.
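As a concrete illustration of why a normal rather than a chi-square approximation becomes natural, consider the simplest LRT: testing H0: μ = 0 with known identity covariance, where −2 log Λ = n∥x̄∥² is exactly chi-square with p degrees of freedom. The pure-Python sketch below (an illustrative toy case, not the covariance LRTs studied in the paper; all sample sizes are arbitrary choices) standardizes this statistic by the chi-square mean p and variance 2p; for large p the standardized values behave approximately like a standard normal:

```python
import math
import random

random.seed(0)

def lrt_mean_stat(n, p):
    """-2 log(LRT) for H0: mu = 0 with known identity covariance.
    Equals n * ||xbar||^2, which is exactly chi-square with p df under H0."""
    xbar = [0.0] * p
    for _ in range(n):
        for j in range(p):
            xbar[j] += random.gauss(0.0, 1.0) / n
    return n * sum(m * m for m in xbar)

n, p, reps = 10, 50, 1000
# Chi-square_p has mean p and variance 2p; standardize and check normality:
zs = [(lrt_mean_stat(n, p) - p) / math.sqrt(2 * p) for _ in range(reps)]
mean_z = sum(zs) / reps
var_z = sum(z * z for z in zs) / reps - mean_z ** 2
```

With p moderately large, `mean_z` should be near 0 and `var_z` near 1, matching a standard normal limit; for small fixed p the standardized statistic would remain visibly skewed.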
Challenges of big data analysis
National Science Review, 2014
Abstract

Cited by 7 (0 self)
Big Data bring new opportunities to modern society and challenges to data scientists. On the one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require a new computational and statistical paradigm. This article gives an overview of the salient features of Big Data and how these features drive paradigm changes in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity; they can lead to wrong statistical inferences and consequently wrong scientific conclusions.
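The spurious-correlation phenomenon mentioned in this abstract is easy to reproduce: with a small sample, the best of many pure-noise predictors can look strongly correlated with the response. A minimal sketch, where the response y and all p candidate predictors are independent Gaussians (sizes chosen arbitrarily for illustration):

```python
import math
import random

random.seed(1)

def corr(u, v):
    """Pearson correlation of two equal-length samples."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((x - mv) ** 2 for x in v))
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

n = 50                                # small sample size
y = [random.gauss(0, 1) for _ in range(n)]

def max_abs_corr(p):
    """Largest |correlation| between y and p predictors that are
    all independent of y -- pure noise."""
    return max(abs(corr(y, [random.gauss(0, 1) for _ in range(n)]))
               for _ in range(p))

few = max_abs_corr(10)     # few candidate predictors
many = max_abs_corr(1000)  # many: the best spurious fit looks much stronger
```

With n = 50 the maximum over 1000 irrelevant predictors typically exceeds 0.4, a correlation that would look convincing if only that one predictor were reported.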
Supplement to “Universal asymptotics for high-dimensional sign tests”
, 2013
Abstract

Cited by 5 (3 self)
In a small-n, large-p hypothesis testing framework, most procedures in the literature require quite stringent distributional assumptions and are restricted to a specific scheme of (n, p)-asymptotics. More precisely, multinormality is almost always assumed, and it is typically imposed that p/n → c for some c in a given convex set C ⊂ (0, ∞). Such restrictions clearly jeopardize the practical relevance of these procedures. In this paper, we consider several classical testing problems in multivariate analysis, directional statistics, and multivariate time series: testing uniformity on the unit sphere, the spherical location problem, testing that a process is white noise against serial dependence, testing for multivariate independence, and testing for sphericity. In each case, we show that the natural sign tests enjoy nonparametric validity and are distribution-free in a “universal” (n, p)-asymptotic framework, where p may go to infinity in an arbitrary way as n does. Simulations confirm our asymptotic results.
Distributions of Angles in Random Packing on Spheres
Abstract

Cited by 4 (0 self)
This paper studies the asymptotic behavior of the pairwise angles among n randomly and uniformly distributed unit vectors in R^p as the number of points n → ∞, while the dimension p is either fixed or growing with n. For both settings, we derive the limiting empirical distribution of the random angles and the limiting distributions of the extreme angles. The results reveal interesting differences between the two settings and provide a precise characterization of the folklore that “all high-dimensional random vectors are almost always nearly orthogonal to each other”. Applications to statistics and machine learning and connections with some open problems in physics and mathematics are also discussed.
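The folklore quoted in this abstract can be checked directly: the cosine of the angle between two independent uniform unit vectors in R^p has standard deviation of order 1/√p, so pairwise angles concentrate near 90° as p grows. A small pure-Python sketch (vector counts and dimensions are arbitrary illustrative choices):

```python
import math
import random

random.seed(2)

def random_unit_vector(p):
    """Uniform point on the unit sphere in R^p via normalized Gaussians."""
    v = [random.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_abs_cos(n, p):
    """Average |cos(angle)| over all pairs of n random unit vectors in R^p."""
    vs = [random_unit_vector(p) for _ in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(abs(sum(a * b for a, b in zip(vs[i], vs[j])))
               for i, j in pairs) / len(pairs)

low_dim = mean_abs_cos(30, 3)      # p fixed and small: angles spread widely
high_dim = mean_abs_cos(30, 3000)  # p large: angles concentrate near 90 degrees
```

In R^3 the mean |cos| is about 0.5, while in R^3000 it is on the order of 0.01 — the vectors are "almost always nearly orthogonal".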
Approximation of Rectangular Beta-Laguerre Ensembles and Large Deviations
Abstract

Cited by 3 (1 self)
Let λ1, …, λn be random eigenvalues from the beta-Laguerre ensemble with parameter p, which generalizes the real, complex, and quaternion Wishart matrices of parameter (n, p). In the case that the sample size n is much smaller than the dimension p of the population distribution, a common situation in modern data, we approximate the beta-Laguerre ensemble by a beta-Hermite ensemble, which generalizes the real, complex, and quaternion Wigner matrices. As corollaries, when n is much smaller than p, we show that the largest and smallest eigenvalues of the complex Wishart matrix are asymptotically independent; we obtain the limiting distribution of the condition numbers as a sum of two i.i.d. random variables with a Tracy-Widom distribution, which differs markedly from the exact square case n = p studied by Edelman (1988); and we propose a test procedure for a spherical hypothesis. By the same approximation tool, we obtain the asymptotic distribution of the smallest eigenvalue of the beta-Laguerre ensemble. In the second part of the paper, under the assumption that n is much smaller than p in a certain scale, we prove large deviation principles for three basic statistics: the largest eigenvalue, the smallest eigenvalue, and the empirical distribution of λ1, …, λn, where the last large deviation is derived by a nonstandard method.
Features of Big Data and sparsest solution in high-confidence set
Abstract

Cited by 2 (2 self)
This chapter summarizes some of the unique features of Big Data analysis. These features are shared neither by low-dimensional data nor by small samples. Big Data pose new computational challenges and hold great promise for understanding population heterogeneity, as in personalized medicine or services. High dimensionality introduces spurious correlations, incidental endogeneity, noise accumulation, and measurement error. These features are distinctive, and statistical procedures should be designed with them in mind. To illustrate, a method called the sparsest solution in a high-confidence set is introduced, which is generally applicable to high-dimensional statistical inference. This method, whose properties are briefly examined, is natural: the information about parameters contained in the data is summarized by high-confidence sets, and the sparsest solution is a way to deal with the noise accumulation issue.
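Noise accumulation, one of the features listed in this abstract, can be illustrated with a toy nearest-centroid classifier: when only a few features carry signal, folding in many irrelevant features drives accuracy toward chance. The setup below is a hypothetical sketch — class means, feature counts, and sample sizes are all illustrative choices, not taken from the chapter:

```python
import random

random.seed(3)

SIGNAL = 5     # number of informative features
SHIFT = 1.0    # mean shift per informative feature between the two classes

def draw(label, p):
    """Point in R^p: only the first SIGNAL coordinates carry signal."""
    x = [random.gauss(0.0, 1.0) for _ in range(p)]
    if label == 1:
        for j in range(SIGNAL):
            x[j] += SHIFT
    return x

def accuracy(p, n_train=20, n_test=200):
    """Nearest-centroid test accuracy using the first p features."""
    train = [(draw(l, p), l) for l in (0, 1) for _ in range(n_train)]
    cents = []
    for l in (0, 1):
        pts = [x for x, lab in train if lab == l]
        cents.append([sum(col) / len(pts) for col in zip(*pts)])
    correct = 0
    for l in (0, 1):
        for _ in range(n_test):
            x = draw(l, p)
            d = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in cents]
            correct += (d[l] <= d[1 - l])
    return correct / (2 * n_test)

acc_signal_only = accuracy(p=SIGNAL)  # use just the 5 informative features
acc_with_noise = accuracy(p=3000)     # 2995 pure-noise features added
```

The estimated centroids absorb noise from every extra coordinate, so the high-dimensional classifier performs markedly worse than the one restricted to the informative features — exactly the issue the sparsest-solution approach is designed to mitigate.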
Distributions of Eigenvalues of Large Euclidean Matrices Generated from Three Manifolds
Abstract

Cited by 2 (0 self)
Let x1, …, xn be points randomly chosen from a set G ⊂ R^p and let f(x) be a function. A special Euclidean random matrix is given by Mn = (f(∥xi − xj∥²))n×n. When p is fixed and n → ∞, we prove that µ̂(Mn), the empirical distribution of the eigenvalues of Mn, converges to δ0 for a large class of functions f. Assuming both p and n go to infinity with n/p → y ∈ (0, ∞), we obtain the explicit limit of µ̂(Mn) when G is the unit sphere S^{p−1} or the unit ball B_p(0, 1), and the explicit limit of µ̂((Mn − a_p I_n)/b_p) for G = [0, 1]^p, where a_p and b_p are constants. As corollaries, we obtain the limit of µ̂(An) with An = (d(xi, xj))n×n and d the geodesic distance on S^{p−1}. We also obtain the limit of µ̂(An) for the Euclidean distance matrix An = (∥xi − xj∥)n×n when G is S^{p−1} or B_p(0, 1). The limits are the law of a + bV, where a and b are explicit constants and V follows the Marčenko-Pastur law. The same results are obtained for other examples, including (exp(−λ²∥xi − xj∥^γ))n×n and (exp(−λ² d(xi, xj)^γ))n×n.
High-Dimensional Tests for Spherical Location and Spiked Covariance
Abstract

Cited by 1 (1 self)
High-dimensional tests for spherical location and spiked covariance
AGAINST CONTIGUOUS ROTATIONALLY SYMMETRIC ALTERNATIVES
, 2015
"... ECARES and Département de Mathématique, Université libre de Bruxelles ..."
and Financial
, 2013
Abstract
Big Data bring new opportunities to modern society and challenges to data scientists. On the one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require a new computational and statistical paradigm. This paper gives an overview of the salient features of Big Data and how these features drive paradigm changes in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity; they can lead to wrong statistical inferences and consequently wrong scientific conclusions.