Detection of an Anomalous Cluster in a Network, 2010
Cited by 32 (4 self)
Abstract:
We consider the problem of detecting whether or not in a given sensor network, there is a cluster of sensors which exhibit an "unusual behavior." Formally, suppose we are given a set of nodes and attach a random variable to each node. We observe a realization of this process and want to decide between the following two hypotheses: under the null, the variables are i.i.d. standard normal; under the alternative, there is a cluster of variables that are i.i.d. normal with positive mean and unit variance, while the rest are i.i.d. standard normal. We also address surveillance settings where each sensor in the network collects information over time. The resulting model is similar, now with a time series attached to each node. We again observe the process over time and want to decide between the null, where all the variables are i.i.d. standard normal; and the alternative, where there is an emerging cluster of i.i.d. normal variables with positive mean and unit variance. The growth models used to represent the emerging cluster are quite general, and in particular include cellular automata used in modelling epidemics. In both settings, we consider classes of clusters that are quite general, for which we obtain a lower bound on their respective minimax detection rate, and show that some form of scan statistic, by far the most popular method in practice, achieves that same rate within a logarithmic factor. Our results are not limited to the normal location model, but generalize to any one-parameter exponential family when the anomalous clusters are large enough.
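As a minimal sketch of the scan-statistic idea described in this abstract (not the paper's implementation), one can maximize the standardized sum of observations over a class of candidate clusters; the interval-shaped candidate class, cluster sizes, and planted-signal values below are illustrative assumptions:

```python
import numpy as np

def scan_statistic(x, candidate_clusters):
    """Scan statistic: maximum standardized sum over candidate clusters.

    Under the null (i.i.d. standard normal), each standardized sum is N(0, 1),
    so a large maximum suggests a cluster with positive mean.
    """
    return max(x[list(S)].sum() / np.sqrt(len(S)) for S in candidate_clusters)

rng = np.random.default_rng(0)
n = 100
x = rng.standard_normal(n)
x[10:20] += 1.5  # hypothetical planted cluster with positive mean

# Illustrative candidate class: all discrete intervals of length 5..20
clusters = [range(i, i + w) for w in range(5, 21) for i in range(n - w + 1)]
stat = scan_statistic(x, clusters)
```

In practice the candidate class (intervals, blobs, connected subgraphs) and the rejection threshold are chosen to match the cluster structure under study; the lower bounds in the paper concern that minimax calibration, which this sketch omits.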
Feature selection by higher criticism thresholding: Optimal phase diagram. Manuscript, available at arXiv:0812.2263, 2008
Cited by 26 (7 self)
Abstract:
We consider two-class linear classification in a high-dimensional, low-sample-size setting. Only a small fraction of the features are useful, the useful features are unknown to us, and each useful feature contributes weakly to the classification decision – this setting was called the rare/weak model (RW Model) in [11]. We select features by thresholding feature z-scores. The threshold is set by higher criticism (HC) [11]. Let πi denote the P-value associated to the ith z-score and π(i) denote the ith order statistic of the collection of P-values. The HC threshold (HCT) is the order statistic of the z-score corresponding to index i maximizing (i/n − π(i)) / √(π(i)(1 − π(i))). The ideal threshold optimizes the classification error. In [11] we showed that HCT was numerically close to the ideal threshold. We formalize an asymptotic framework for studying the RW model, considering a sequence of problems with increasingly many features and relatively fewer observations. We show that along this sequence, the limiting performance of ideal HCT is essentially just as good as the limiting performance of ideal thresholding. Our results describe two-dimensional …
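The HC objective stated in the abstract can be sketched directly; this is an illustrative computation under the assumption that all P-values lie strictly inside (0, 1), not the authors' code:

```python
import numpy as np

def hc_threshold_index(pvalues):
    """Return (i*, sorted P-values), where i* maximizes the HC objective
    (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))) over the order statistics p_(i).

    Assumes P-values strictly inside (0, 1) so the denominator is nonzero.
    """
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = len(p)
    i = np.arange(1, n + 1)
    hc = (i / n - p) / np.sqrt(p * (1 - p))
    return int(np.argmax(hc)), p

# The HC threshold for z-scores is then the z-score whose P-value equals
# p[i*]; features with |z| above that threshold are retained.
```

Practical variants often restrict the maximization to a sub-range of indices (e.g., i ≤ n/2) for stability; that refinement is omitted here.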
Global Testing under Sparse Alternatives: ANOVA, Multiple Comparisons and the Higher Criticism
Cited by 25 (2 self)
Abstract:
Testing for the significance of a subset of regression coefficients in a linear model, a staple of statistical analysis, goes back at least to the work of Fisher, who introduced the analysis of variance (ANOVA). We study this problem under the assumption that the coefficient vector is sparse, a common situation in modern high-dimensional settings. Suppose we have p covariates and that under the alternative, the response only depends on the order of p^{1−α} of those, 0 ≤ α ≤ 1. Under moderate sparsity levels, i.e. 0 ≤ α ≤ 1/2, we show that ANOVA is essentially optimal under some conditions on the design. This is no longer the case under strong sparsity constraints, i.e. α > 1/2. In such settings, a multiple comparison procedure is often preferred and we establish its optimality when α ≥ 3/4. However, these two very popular methods are suboptimal, and sometimes powerless, under moderately strong sparsity where 1/2 < α < 3/4. We suggest a method based on the Higher Criticism that is powerful in the whole range α > 1/2. This optimality property is true for a variety of designs, including the classical (balanced) multiway designs and more modern ‘p > n’ designs arising in genetics and signal processing. In addition to the standard fixed effects model, we establish similar results for a random effects model where the nonzero coefficients of the regression vector are normally distributed.
Supplement to “UPS delivers optimal phase diagram in high dimensional variable selection.” DOI:10.1214/11-AOS947SUPP, 2011
Detecting activations over graphs using spanning tree wavelet bases
In Artificial Intelligence and Statistics (AISTATS), 2013
Detection of Correlations
Submitted to The Annals of Statistics
Cited by 8 (2 self)
Abstract:
We consider the hypothesis testing problem of deciding whether an observed high-dimensional vector has independent normal components or, alternatively, if it has a small subset of correlated components. The correlated components may have a certain combinatorial structure known to the statistician. We establish upper and lower bounds for the worst-case (minimax) risk in terms of the size of the correlated subset, the level of correlation, and the structure of the class of possibly correlated sets. We show that some simple tests have near-optimal performance in many cases, while the generalized likelihood ratio test is suboptimal in some important cases.
Detecting Positive Correlations in a Multivariate Sample, 2012
Cited by 7 (1 self)
Abstract:
We consider the problem of testing whether a correlation matrix of a multivariate normal population is the identity matrix. We focus on sparse classes of alternatives where only a few entries are nonzero and, in fact, positive. We derive a general lower bound applicable to various classes and study the performance of some near-optimal tests. We pay special attention to computational feasibility and construct near-optimal tests that can be computed efficiently. Finally, we apply our results to prove new lower bounds for the clique number of high-dimensional random geometric graphs.
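One computationally cheap test in this spirit, sketched here as an assumption rather than as the paper's exact procedure, aggregates the off-diagonal sample correlations: under independence these concentrate near zero, while sparse positive correlations push the sum upward:

```python
import numpy as np

def sum_test_statistic(X):
    """Standardized sum of off-diagonal sample correlations of the columns of X.

    Under the null (identity correlation matrix) the off-diagonal entries are
    centered near zero; a large positive value points toward the sparse
    positive-correlation alternative.
    """
    R = np.corrcoef(X, rowvar=False)
    p = R.shape[0]
    off = R.sum() - np.trace(R)       # twice the sum over unordered pairs
    return off / np.sqrt(p * (p - 1))  # normalize by the number of ordered pairs
```

A sum test like this runs in time linear in the number of matrix entries, which is the kind of computational feasibility the abstract emphasizes; it trades power against very sparse alternatives for that speed.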
Localized nonlinear functional equations and two sampling problems in signal processing
 Adv. Comput. Math
Cited by 6 (4 self)
Abstract:
Let 1 ≤ p ≤ ∞. In this paper, we consider solving a nonlinear functional equation f(x) = y, where x, y belong to ℓ^p and f has continuous bounded gradient in an inverse-closed subalgebra of B(ℓ^2), the Banach algebra of all bounded linear operators on the Hilbert space ℓ^2. We introduce a strict monotonicity property for functions f on the Banach spaces ℓ^p so that the above nonlinear functional equation is solvable and the solution x depends continuously on the given data y in ℓ^p. We show that the Van Cittert iteration converges in ℓ^p at an exponential rate and hence can be used to locate the true solution of the above nonlinear functional equation. We apply the above theory to handle two problems in signal processing: nonlinear sampling with instantaneous companding followed by average sampling; and local identification of innovation positions and qualification of amplitudes of signals with finite rate of innovation.
Optimal Classification in Sparse Gaussian Graphic Model
Cited by 5 (2 self)
Abstract:
Consider a two-class classification problem where the number of features is much larger than the sample size. The features are masked by Gaussian noise with mean zero and covariance matrix Σ, where the precision matrix Ω = Σ⁻¹ is unknown but is presumably sparse. The useful features, also unknown, are sparse and each contributes weakly (i.e., rare and weak) to the classification decision. By obtaining a reasonably good estimate of Ω, we formulate the setting as a linear regression model. We propose a two-stage classification method where we first select features by the method of Innovated Thresholding (IT), and then use the retained features and Fisher’s LDA for classification. In this approach, a crucial problem is how to set the threshold of IT. We approach this problem by adapting the recent innovation of Higher Criticism Thresholding (HCT). We find that when useful features are rare and weak, the limiting behavior of HCT is essentially just as good as the limiting behavior of the ideal threshold, the threshold one would choose if the underlying distribution of the signals were known (if only). Somewhat surprisingly, when Ω is sufficiently sparse, its off-diagonal coordinates usually do not have a major influence over the classification decision. Compared to recent work in the case where Σ is the identity matrix [Proc. …
Tests alternative to higher criticism for high-dimensional means under sparsity and column-wise dependence
In Ann. Statist., 2013