## Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model

### Citations

350 | Introduction to the Non-asymptotic Analysis of Random Matrices
- Vershynin
- 2012
Citation Context ...3b) relax the Gaussian assumption in conducting a scale-invariant version of the sparse PCA (i.e., estimating the leading eigenvector of the correlation instead of the covariance matrix). However, their method cannot be easily applied to estimate u1, and the rate of convergence they proved is not the parametric rate. This paper improves upon the aforementioned results in two directions. First, with regard to the classical principal component regression, under a double asymptotic framework in which d is allowed to increase with n, by borrowing recent developments in principal component analysis (Vershynin, 2010; Lounici, 2012; Bunea and Xiao, 2012), we for the first time explicitly show the advantage of principal component regression over classical linear regression. We explicitly confirm the following two advantages of principal component regression: (i) Principal component regression is insensitive to collinearity, while linear regression is very sensitive to it; (ii) Principal component regression can utilize the low-rank structure of the covariance matrix Σ, while linear regression cannot. Secondly, in high dimensions where d can increase much faster, even exponentially faster, than n, we propo... |

315 | Symmetric Multivariate and Related Distributions.
- Fang, Kotz, et al.
- 1989
Citation Context ...eigenvectors u1, u2 of Σ are specified to be sparse with sj := ‖uj‖0 and ujk = 1/√sj for k ∈ [1 + ∑_{i=1}^{j−1} si, ∑_{i=1}^{j} si] and zero otherwise. Σ is generated as Σ = ∑_{j=1}^{2} (ωj − ωd) uj uj^T + ωd Id. Across all settings, we let s1 = s2 = 10, ω1 = 5.5, ω2 = 2.5, and ωj = 0.5 for all j = 3, ..., d. With Σ, we then consider the following four elliptical distributions: (Normal) X ∼ ECd(0, Σ, ζ1) with ζ1 =d χd, where χd is the chi distribution with d degrees of freedom (for Y1, ..., Yd i.i.d. N(0, 1), √(Y1² + ... + Yd²) =d χd). In this setting, X follows the Gaussian distribution (Fang et al., 1990). (Multivariate-t) X ∼ ECd(0, Σ, ζ2) with ζ2 =d √(κ ξ1*/ξ2*), where ξ1* =d χd and ξ2* =d χκ with κ ∈ Z+. In this setting, X follows a multivariate-t distribution with κ degrees of freedom (Fang et al., 1990); here we take κ = 3. (EC1) X ∼ ECd(0, Σ, ζ3) with ζ3 ∼ F(d, 1), an F distribution. (EC2) X ∼ ECd(0, Σ, ζ4) with ζ4 ∼ Exp(1), an exponential distribution. We then simulate x1, ..., xn from X, forming a data matrix X. Secondly, we let Y = Xu1 + ε, where ε ∼ Nn(0, In). This produces the data (Y, X). We repeatedly generate n data points according to the four distributions discussed above for ... |
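The simulation design above can be sketched in code. This is a hypothetical reproduction (variable names and the use of NumPy are my own); it builds Σ from the sparse eigenvectors and draws elliptical samples via the standard representation X = ξAU, where U is uniform on the sphere and AA^T = Σ:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, s = 200, 100, 10

# Sparse leading eigenvectors: u1 on the first s coordinates, u2 on the next s.
u1 = np.zeros(d); u1[:s] = 1 / np.sqrt(s)
u2 = np.zeros(d); u2[s:2 * s] = 1 / np.sqrt(s)

w1, w2, wd = 5.5, 2.5, 0.5
Sigma = (w1 - wd) * np.outer(u1, u1) + (w2 - wd) * np.outer(u2, u2) + wd * np.eye(d)

# Elliptical sample: x = xi * A @ u with u uniform on S^{d-1} and A A^T = Sigma.
A = np.linalg.cholesky(Sigma)

def sample_elliptical(xi, n):
    u = rng.standard_normal((n, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)  # directions uniform on the sphere
    return xi[:, None] * (u @ A.T)

# (Normal): xi distributed as chi_d recovers N(0, Sigma).
xi_normal = np.sqrt(rng.chisquare(d, n))
# (Multivariate-t): xi = sqrt(kappa * chi_d^2 / chi_kappa^2), kappa = 3.
kappa = 3
xi_t = np.sqrt(kappa * rng.chisquare(d, n) / rng.chisquare(kappa, n))

X = sample_elliptical(xi_normal, n)
Y = X @ u1 + rng.standard_normal(n)  # Y = X u1 + eps, eps ~ N_n(0, I_n)
```

Since u1 and u2 are unit vectors with disjoint supports, Σu1 = ω1 u1 and Σu2 = ω2 u2, so the construction indeed plants eigenvalues 5.5 and 2.5 on sparse directions.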

275 | Sparse principal component analysis
- Zou, Hastie, et al.
Citation Context ...less analysis on principal component regression in high dimensions where the dimension d can be even exponentially larger than the sample size n. This is partially due to the fact that estimating the leading eigenvectors of Σ itself has been difficult enough. For example, Johnstone and Lu (2009) show that, even under the Gaussian model, when d/n → γ for some γ > 0, there exist multiple settings under which u1 can be an inconsistent estimator of u1. To attack this “curse of dimensionality”, one solution is adding a sparsity assumption on u1, leading to various versions of the sparse PCA. See, Zou et al. (2006); d’Aspremont et al. (2007); Moghaddam et al. (2006), among others. Under the (sub)Gaussian settings, minimax optimal rates are being established in estimating u1, . . . ,um (Vu and Lei, 2012; Ma, 2013; Cai et al., 2013). Very recently, Han and Liu (2013b) relax the Gaussian assumption in conducting a scale invariant version of the sparse PCA (i.e., estimating the leading eigenvector of the correlation instead of the covariance matrix). However, it can not be easily applied to estimate u1 and the rate of convergence they proved is not the parametric rate. This paper improves upon the aforement... |

273 | A direct formulation for sparse pca using semidefinite programming. - d’Aspremont, Ghaoui, et al. - 2007 |

111 | A course in multivariate analysis.
- Kendall
- 1968
Citation Context ...y, and logistic. It allows the random vector to be heavy tailed and to have tail dependence. These extra flexibilities make it very suitable for modeling finance and biomedical imaging data. Under the elliptical model, we prove that our method can estimate the regression coefficients at the optimal parametric rate and therefore is a good alternative to the Gaussian-based methods. Experiments on synthetic and real-world data are conducted to illustrate the empirical usefulness of the proposed method. 1 Introduction Principal component regression (PCR) has been widely used in statistics for years (Kendall, 1968). Take the classical linear regression with random design for example. Let x1, . . . , xn ∈ Rd be n independent realizations of a random vector X ∈ Rd with mean 0 and covariance matrix Σ. The classical linear regression model and the simple principal component regression model can be stated as follows: (Classical linear regression model) Y = Xβ + ε; (Principal component regression model) Y = αXu1 + ε, (1.1) where X = (x1, . . . , xn)^T ∈ R^{n×d}, Y ∈ Rn, ui is the i-th leading eigenvector of Σ, ε ∼ Nn(0, σ²Id) is independent of X, β ∈ Rd, and α ∈ R. Here Id ∈ R^{d×d} is the identity matrix. The pri... |

107 | On consistency and sparsity for principal components analysis in high dimensions.
- Johnstone, Lu
- 2009
Citation Context ...nd Xuj when i < j. This indicates, although not rigorous, there is possibility that principal component regression can borrow strength from the low rank structure of Σ, which motivates our work. Even though the statistical performance of principal component regression in low dimensions is not fully understood, there is even less analysis on principal component regression in high dimensions where the dimension d can be even exponentially larger than the sample size n. This is partially due to the fact that estimating the leading eigenvectors of Σ itself has been difficult enough. For example, Johnstone and Lu (2009) show that, even under the Gaussian model, when d/n → γ for some γ > 0, there exist multiple settings under which u1 can be an inconsistent estimator of u1. To attack this “curse of dimensionality”, one solution is adding a sparsity assumption on u1, leading to various versions of the sparse PCA. See, Zou et al. (2006); d’Aspremont et al. (2007); Moghaddam et al. (2006), among others. Under the (sub)Gaussian settings, minimax optimal rates are being established in estimating u1, . . . ,um (Vu and Lei, 2012; Ma, 2013; Cai et al., 2013). Very recently, Han and Liu (2013b) relax the Gaussian ass... |

78 | Spectral bounds for sparse PCA: exact and greedy algorithms
- Moghaddam, Weiss, et al.
Citation Context ...n in high dimensions where the dimension d can be even exponentially larger than the sample size n. This is partially due to the fact that estimating the leading eigenvectors of Σ itself has been difficult enough. For example, Johnstone and Lu (2009) show that, even under the Gaussian model, when d/n → γ for some γ > 0, there exist multiple settings under which u1 can be an inconsistent estimator of u1. To attack this “curse of dimensionality”, one solution is adding a sparsity assumption on u1, leading to various versions of the sparse PCA. See, Zou et al. (2006); d’Aspremont et al. (2007); Moghaddam et al. (2006), among others. Under the (sub)Gaussian settings, minimax optimal rates are being established in estimating u1, . . . ,um (Vu and Lei, 2012; Ma, 2013; Cai et al., 2013). Very recently, Han and Liu (2013b) relax the Gaussian assumption in conducting a scale invariant version of the sparse PCA (i.e., estimating the leading eigenvector of the correlation instead of the covariance matrix). However, it can not be easily applied to estimate u1 and the rate of convergence they proved is not the parametric rate. This paper improves upon the aforementioned results in two directions. First, with regard ... |

65 | Principal components regression in exploratory statistical research
- Massy
- 1965
Citation Context ...x. The principal component regression then can be conducted in two steps: first we obtain an estimator û1 of u1; secondly we project the data in the direction of û1 and solve a simple linear regression to estimate α. By checking Equation (1.1), it is easy to observe that the principal component regression model is a subset of the general linear regression (LR) model with the constraint that the regression coefficient β is proportional to u1. There has been a lot of discussion on the advantages of principal component regression over classical linear regression. In low dimensional settings, Massy (1965) pointed out that principal component regression can be much more efficient in handling collinearity among predictors compared to linear regression. More recently, Cook (2007) and Artemiou and Li (2009) argued that principal component regression has the potential to play a more important role. In particular, letting ûj be the j-th leading eigenvector of the sample covariance matrix Σ̂ of x1, . . . , xn, Artemiou and Li (2009) show that under mild conditions, with high probability, the correlation between the response Y and Xûi is higher than or equal to the correlation between Y and Xûj when... |
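The two-step procedure described here (estimate u1, then regress Y on the one-dimensional projection Xu1) can be sketched as follows; this is an illustrative implementation of the model (1.1) with names of my own choosing, using the plain sample covariance for step one:

```python
import numpy as np

def pcr_fit(X, Y):
    """Two-step principal component regression sketch:
    (1) estimate the leading eigenvector u1 of Cov(X);
    (2) regress Y on the projected predictor z = X @ u1_hat to get alpha."""
    Sigma_hat = X.T @ X / len(X)            # sample covariance (mean-zero design)
    _, eigvecs = np.linalg.eigh(Sigma_hat)
    u1_hat = eigvecs[:, -1]                 # leading eigenvector
    z = X @ u1_hat                          # one-dimensional projection
    alpha_hat = (z @ Y) / (z @ z)           # simple least squares on z
    return alpha_hat * u1_hat               # beta_hat = alpha_hat * u1_hat

# Toy model: leading eigenvector is e1, Y = 2 * X u1 + noise.
rng = np.random.default_rng(1)
n, d = 500, 5
X = rng.standard_normal((n, d)) @ np.diag([3.0, 1.0, 1.0, 1.0, 1.0])
u1 = np.eye(d)[0]
Y = 2.0 * X @ u1 + 0.1 * rng.standard_normal(n)
beta_hat = pcr_fit(X, Y)
```

Note that the eigenvector's sign ambiguity is harmless: if û1 flips sign, α̂ flips with it, so β̂ = α̂û1 is unchanged.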

57 | Fisher lecture: Dimension reduction in regression
- Cook
- 2007
Citation Context ...imple linear regression in estimating α. By checking Equation (1.1), it is easy to observe that the principal component regression model is a subset of the general linear regression (LR) model with the constraint that the regression coefficient β is proportional to u1. There has been a lot of discussions on the advantage of principal component regression over classical linear regression. In low dimensional settings, Massy (1965) pointed out that principal component regression can be much more efficient in handling collinearity among predictors compared to the linear regression. More recently, Cook (2007) and Artemiou and Li (2009) argued that principal component regression has potential to play a more important role. In particular, letting uj be the j-th leading eigenvector of the sample covariance matrix Σ of x1, . . . ,xn, 1 Artemiou and Li (2009) show that under mild conditions with high probability the correlation between the response Y and Xui is higher than or equal to the correlation between Y and Xuj when i < j. This indicates, although not rigorous, there is possibility that principal component regression can borrow strength from the low rank structure of Σ, which motivates our w... |

50 | Sparse principal component analysis and iterative thresholding. To appear, Annals of Statistics.
- Ma
- 2013
Citation Context ...genvectors of Σ itself has been difficult enough. For example, Johnstone and Lu (2009) show that, even under the Gaussian model, when d/n → γ for some γ > 0, there exist multiple settings under which u1 can be an inconsistent estimator of u1. To attack this “curse of dimensionality”, one solution is adding a sparsity assumption on u1, leading to various versions of the sparse PCA. See, Zou et al. (2006); d’Aspremont et al. (2007); Moghaddam et al. (2006), among others. Under the (sub)Gaussian settings, minimax optimal rates are being established in estimating u1, . . . ,um (Vu and Lei, 2012; Ma, 2013; Cai et al., 2013). Very recently, Han and Liu (2013b) relax the Gaussian assumption in conducting a scale invariant version of the sparse PCA (i.e., estimating the leading eigenvector of the correlation instead of the covariance matrix). However, it can not be easily applied to estimate u1 and the rate of convergence they proved is not the parametric rate. This paper improves upon the aforementioned results in two directions. First, with regard to the classical principal component regression, under a double asymptotic framework in which d is allowed to increase with n, by borrowing the very ... |

43 | Model selection in Gaussian graphical models: high-dimensional consistency of l1-regularized MLE.
- Ravikumar, Raskutti, et al.
- 2008
Citation Context ...s are considered. Here κL, κU are two constants larger than 1. Condition 1 ("Easiest"): λ1(Σ) ≍_{1,κU} d·λj(Σ) for any j ∈ {2, ..., d} and λ2(Σ) ≍_{1,κU} λj(Σ) for any j ∈ {3, ..., d}; Condition 2 ("Hardest"): λ1(Σ) ≍_{κL,κU} λj(Σ) for any j ∈ {2, ..., d}. In the sequel, we say that the model Md(Y, ε; Σ, ξ, s) holds if the data (Y, X) are generated using the model Md(Y, ε; Σ, ξ, s). Under Conditions 1 and 2, we then have the following theorem, which shows that, under certain conditions, ‖β̂ − β‖2 = OP(√(s log d/n)), which is the optimal parametric rate for estimating the regression coefficient (Ravikumar et al., 2008). Theorem 3.2. Let the model Md(Y, ε; Σ, ξ, s) hold, let |α| in Equation (3.5) be upper bounded by a constant, and let ‖Σ‖2 be lower bounded by a constant. Then, under Condition 1 or Condition 2, and for any random vector X such that max_{v ∈ S^{d−1}, ‖v‖0 ≤ 2s} |v^T(Σ̂ − Σ)v| = oP(1), the robust principal component regression estimator β̂ satisfies ‖β̂ − β‖2 = OP(√(s log d/n)). [Figure: averaged error versus number of selected features (0 to 80) for PCR and RPCR under the Normal, multivariate-t, EC1, and EC2 settings.] ... |

32 | Truncated power method for sparse eigenvalue problems.
- Yuan, Zhang
- 2013
Citation Context ...wer method. Here n = 100, d = 200, and we are interested in estimating the regression coefficient β. The horizontal axis represents the cardinalities of the estimates' support sets and the vertical axis represents the empirical mean square error. Here, from left to right, the minimum mean square errors for lasso are 0.53, 0.55, 1, and 1. 4 Experiments In this section we conduct studies on both synthetic and real-world data to investigate the empirical performance of the robust sparse principal component regression proposed in this paper. We use the truncated power algorithm proposed in Yuan and Zhang (2013) to approximate the global optimum û1 of (3.6). Here the cardinalities of the support sets of the leading eigenvectors are treated as tuning parameters. The following three methods are considered: lasso: the classical L1-penalized regression; PCR: the sparse principal component regression using the sample covariance matrix as the sufficient statistic and exploiting the truncated power algorithm to estimate u1; RPCR: the robust sparse principal component regression proposed in this paper, using the multivariate Kendall's tau as the sufficient statistic and exploiting the truncated power alg... |
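A bare-bones version of the truncated power iteration can look like the following. This is only a sketch of the Yuan and Zhang (2013) scheme, not their exact implementation: initialization (a few plain power steps) and the fixed iteration count are simplifications of my own.

```python
import numpy as np

def truncated_power(M, s, iters=200, seed=0):
    """Sparse leading eigenvector of symmetric M: power steps with hard
    truncation to the s largest-magnitude coordinates, then renormalization."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(10):                       # warm start: plain power steps
        v = M @ v
        v /= np.linalg.norm(v)
    for _ in range(iters):
        w = M @ v
        keep = np.argsort(np.abs(w))[-s:]     # indices of the s largest |w_k|
        v = np.zeros_like(w)
        v[keep] = w[keep]                     # truncate to a size-s support
        v /= np.linalg.norm(v)
    return v

# Planted sparse spike: M = 5 u u^T + I_d with a 3-sparse unit vector u.
d = 20
u = np.zeros(d); u[:3] = 1 / np.sqrt(3)
M = 5 * np.outer(u, u) + np.eye(d)
v = truncated_power(M, s=3)
```

In the paper's setting, M would be the sample covariance matrix (PCR) or the multivariate Kendall's tau matrix (RPCR), and the support size s is tuned over a grid.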

29 | Minimax rates of estimation for sparse pca in high dimensions.
- Vu, Lei
- 2012
Citation Context ...ing the leading eigenvectors of Σ itself has been difficult enough. For example, Johnstone and Lu (2009) show that, even under the Gaussian model, when d/n → γ for some γ > 0, there exist multiple settings under which u1 can be an inconsistent estimator of u1. To attack this “curse of dimensionality”, one solution is adding a sparsity assumption on u1, leading to various versions of the sparse PCA. See, Zou et al. (2006); d’Aspremont et al. (2007); Moghaddam et al. (2006), among others. Under the (sub)Gaussian settings, minimax optimal rates are being established in estimating u1, . . . ,um (Vu and Lei, 2012; Ma, 2013; Cai et al., 2013). Very recently, Han and Liu (2013b) relax the Gaussian assumption in conducting a scale invariant version of the sparse PCA (i.e., estimating the leading eigenvector of the correlation instead of the covariance matrix). However, it can not be easily applied to estimate u1 and the rate of convergence they proved is not the parametric rate. This paper improves upon the aforementioned results in two directions. First, with regard to the classical principal component regression, under a double asymptotic framework in which d is allowed to increase with n, by borrowing... |

21 | Multivariate Nonparametric Methods with R: An approach based on spatial signs and ranks, volume 199.
- Oja
- 2010
Citation Context ...riate Kendall's tau matrix, denoted by K ∈ R^{d×d}, is defined as K := E((X − X̃)(X − X̃)^T / ‖X − X̃‖2²), where X̃ is an independent copy of X. (3.2) Let x1, ..., xn be n independent observations of X. The sample version of the multivariate Kendall's tau is accordingly defined as K̂ = (1/(n(n−1))) ∑_{i≠j} (xi − xj)(xi − xj)^T / ‖xi − xj‖2², (3.3) and we have E(K̂) = K. K̂ is a matrix-valued U-statistic, and it is easy to see that max_{jk} |Kjk| ≤ 1 and max_{jk} |K̂jk| ≤ 1. Therefore, K̂ is a bounded matrix and hence can be a nicer statistic than the sample covariance matrix. Moreover, we have the following important proposition, coming from Oja (2010), showing that K has the same eigenspace as Σ and Cov(X). Proposition 3.1 (Oja (2010)). Let X ∼ ECd(µ, Σ, ξ) be continuously distributed and K be the population multivariate Kendall's tau statistic. Then if λj(Σ) ≠ λk(Σ) for any k ≠ j, we have Θj(Σ) = Θj(K) and λj(K) = E(λj(Σ)Uj² / (λ1(Σ)U1² + ... + λd(Σ)Ud²)), (3.4) where U := (U1, ..., Ud)^T follows the uniform distribution on S^{d−1}. In particular, when Eξ² exists, Θj(Cov(X)) = Θj(K). 3.3 Model and Method In this section we discuss the model we build and the method we accordingly propose for conducting high dimensional (sparse) principal ... |
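The sample estimator in Equation (3.3) is straightforward to compute; below is a minimal sketch (a plain O(n²d²) double loop, with function and variable names of my own choosing):

```python
import numpy as np

def kendall_tau_matrix(X):
    """Sample multivariate Kendall's tau, Eq. (3.3): the average over all
    pairs i != j of (xi - xj)(xi - xj)^T / ||xi - xj||_2^2."""
    n, d = X.shape
    K = np.zeros((d, d))
    for i in range(n):
        for j in range(i + 1, n):
            diff = X[i] - X[j]
            K += np.outer(diff, diff) / (diff @ diff)
    return 2.0 * K / (n * (n - 1))  # each unordered pair counted once

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
K_hat = kendall_tau_matrix(X)
```

Each rank-one term has unit trace and entries bounded by one (Cauchy-Schwarz), so trace(K̂) = 1 and |K̂jk| ≤ 1 regardless of how heavy-tailed X is; this boundedness is exactly why K̂ is better behaved than the sample covariance matrix, and its leading eigenvector is what gets fed to the truncated power algorithm in RPCR.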

10 | On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fPCA. arXiv preprint arXiv:1212.5321.
- Bunea, Xiao
- 2012
Citation Context ...n in conducting a scale invariant version of the sparse PCA (i.e., estimating the leading eigenvector of the correlation instead of the covariance matrix). However, it can not be easily applied to estimate u1 and the rate of convergence they proved is not the parametric rate. This paper improves upon the aforementioned results in two directions. First, with regard to the classical principal component regression, under a double asymptotic framework in which d is allowed to increase with n, by borrowing the very recent development in principal component analysis (Vershynin, 2010; Lounici, 2012; Bunea and Xiao, 2012), we for the first time explicitly show the advantage of principal component regression over the classical linear regression. We explicitly confirm the following two advantages of principal component regression: (i) Principal component regression is insensitive to collinearity, while linear regression is very sensitive to; (ii) Principal component regression can utilize the low rank structure of the covariance matrix Σ, while linear regression cannot. Secondly, in high dimensions where d can increase much faster, even exponentially faster, than n, we propose a robust method in conducting (spar... |

10 | Sparse PCA: Optimal rates and adaptive estimation. The Annals of Statistics
- Cai, Ma, et al.
- 2013
Citation Context ... of Σ itself has been difficult enough. For example, Johnstone and Lu (2009) show that, even under the Gaussian model, when d/n → γ for some γ > 0, there exist multiple settings under which u1 can be an inconsistent estimator of u1. To attack this “curse of dimensionality”, one solution is adding a sparsity assumption on u1, leading to various versions of the sparse PCA. See, Zou et al. (2006); d’Aspremont et al. (2007); Moghaddam et al. (2006), among others. Under the (sub)Gaussian settings, minimax optimal rates are being established in estimating u1, . . . ,um (Vu and Lei, 2012; Ma, 2013; Cai et al., 2013). Very recently, Han and Liu (2013b) relax the Gaussian assumption in conducting a scale invariant version of the sparse PCA (i.e., estimating the leading eigenvector of the correlation instead of the covariance matrix). However, it can not be easily applied to estimate u1 and the rate of convergence they proved is not the parametric rate. This paper improves upon the aforementioned results in two directions. First, with regard to the classical principal component regression, under a double asymptotic framework in which d is allowed to increase with n, by borrowing the very recent development ... |

9 | Sign and rank covariance matrices: statistical properties and application to principal components analysis. In Statistical data analysis based on the L1-norm and related methods,
- Croux, Ollila, et al.
- 2002
Citation Context ... respect to the Lebesgue reference measure, the volume of Gaussian family is zero (like a line in a 3-dimensional space), while the volume of the elliptical family is positive (like a ball in a 3-dimensional space). 3.2 Multivariate Kendall’s tau As a important step in conducting the principal component regression, we need to estimate u1 = Θ1(Cov(X)) = Θ1(Σ) as accurately as possible. Since the random variable ξ in Equation (3.1) can be very heavy tailed, the according elliptical distributed random vector can be heavy tailed. Therefore, as has been pointed out by various authors (Tyler, 1987; Croux et al., 2002; Han and Liu, 2013b), the leading eigenvector of the sample covariance matrix Σ can be a bad estimator in estimating u1 = Θ1(Σ) under the elliptical distribution. This motivates developing robust estimator. In particular, in this paper we consider using the multivariate Kendall’s tau (Choi and Marden, 1998) and recently deeply studied by Han and Liu (2013a). In the following we give a brief description of this estimator. Let X ∼ ECd(µ,Σ, ξ) and X be an independent copy of X . The population multivariate Kendall’s tau matrix, denoted by K ∈ Rd×d, is defined as: K := E ( (X − X)(X − X)T ‖X ... |

9 | Sparse principal component analysis with missing observations
- Lounici
- 2013
Citation Context ...ssian assumption in conducting a scale invariant version of the sparse PCA (i.e., estimating the leading eigenvector of the correlation instead of the covariance matrix). However, it can not be easily applied to estimate u1 and the rate of convergence they proved is not the parametric rate. This paper improves upon the aforementioned results in two directions. First, with regard to the classical principal component regression, under a double asymptotic framework in which d is allowed to increase with n, by borrowing the very recent development in principal component analysis (Vershynin, 2010; Lounici, 2012; Bunea and Xiao, 2012), we for the first time explicitly show the advantage of principal component regression over the classical linear regression. We explicitly confirm the following two advantages of principal component regression: (i) Principal component regression is insensitive to collinearity, while linear regression is very sensitive to; (ii) Principal component regression can utilize the low rank structure of the covariance matrix Σ, while linear regression cannot. Secondly, in high dimensions where d can increase much faster, even exponentially faster, than n, we propose a robust met... |

6 | Estimating the tail dependence function of an elliptical distribution. - Kluppelberg, Kuhn, et al. - 2007 |

6 | A distribution-free M-estimator of multivariate scatter.
- Tyler
- 1987
Citation Context ...n family with respect to the Lebesgue reference measure, the volume of Gaussian family is zero (like a line in a 3-dimensional space), while the volume of the elliptical family is positive (like a ball in a 3-dimensional space). 3.2 Multivariate Kendall’s tau As a important step in conducting the principal component regression, we need to estimate u1 = Θ1(Cov(X)) = Θ1(Σ) as accurately as possible. Since the random variable ξ in Equation (3.1) can be very heavy tailed, the according elliptical distributed random vector can be heavy tailed. Therefore, as has been pointed out by various authors (Tyler, 1987; Croux et al., 2002; Han and Liu, 2013b), the leading eigenvector of the sample covariance matrix Σ can be a bad estimator in estimating u1 = Θ1(Σ) under the elliptical distribution. This motivates developing robust estimator. In particular, in this paper we consider using the multivariate Kendall’s tau (Choi and Marden, 1998) and recently deeply studied by Han and Liu (2013a). In the following we give a brief description of this estimator. Let X ∼ ECd(µ,Σ, ξ) and X be an independent copy of X . The population multivariate Kendall’s tau matrix, denoted by K ∈ Rd×d, is defined as: K := E ( (... |

3 | On principal components and regression: a statistical explanation of a natural phenomenon. Statistica Sinica
- Artemiou, Li
- 2009
Citation Context ...ression in estimating α. By checking Equation (1.1), it is easy to observe that the principal component regression model is a subset of the general linear regression (LR) model with the constraint that the regression coefficient β is proportional to u1. There has been a lot of discussions on the advantage of principal component regression over classical linear regression. In low dimensional settings, Massy (1965) pointed out that principal component regression can be much more efficient in handling collinearity among predictors compared to the linear regression. More recently, Cook (2007) and Artemiou and Li (2009) argued that principal component regression has potential to play a more important role. In particular, letting uj be the j-th leading eigenvector of the sample covariance matrix Σ of x1, . . . ,xn, 1 Artemiou and Li (2009) show that under mild conditions with high probability the correlation between the response Y and Xui is higher than or equal to the correlation between Y and Xuj when i < j. This indicates, although not rigorous, there is possibility that principal component regression can borrow strength from the low rank structure of Σ, which motivates our work. Even though the statis... |

3 | Scale-invariant sparse PCA on high dimensional meta-elliptical data.
- Han, Liu
- 2013
Citation Context ...ough. For example, Johnstone and Lu (2009) show that, even under the Gaussian model, when d/n → γ for some γ > 0, there exist multiple settings under which u1 can be an inconsistent estimator of u1. To attack this “curse of dimensionality”, one solution is adding a sparsity assumption on u1, leading to various versions of the sparse PCA. See, Zou et al. (2006); d’Aspremont et al. (2007); Moghaddam et al. (2006), among others. Under the (sub)Gaussian settings, minimax optimal rates are being established in estimating u1, . . . ,um (Vu and Lei, 2012; Ma, 2013; Cai et al., 2013). Very recently, Han and Liu (2013b) relax the Gaussian assumption in conducting a scale invariant version of the sparse PCA (i.e., estimating the leading eigenvector of the correlation instead of the covariance matrix). However, it can not be easily applied to estimate u1 and the rate of convergence they proved is not the parametric rate. This paper improves upon the aforementioned results in two directions. First, with regard to the classical principal component regression, under a double asymptotic framework in which d is allowed to increase with n, by borrowing the very recent development in principal component analysis (V... |

2 | A multivariate version of Kendall's τ.
- Choi, Marden
- 1998
Citation Context ...egression, we need to estimate u1 = Θ1(Cov(X)) = Θ1(Σ) as accurately as possible. Since the random variable ξ in Equation (3.1) can be very heavy tailed, the according elliptical distributed random vector can be heavy tailed. Therefore, as has been pointed out by various authors (Tyler, 1987; Croux et al., 2002; Han and Liu, 2013b), the leading eigenvector of the sample covariance matrix Σ can be a bad estimator in estimating u1 = Θ1(Σ) under the elliptical distribution. This motivates developing robust estimator. In particular, in this paper we consider using the multivariate Kendall’s tau (Choi and Marden, 1998) and recently deeply studied by Han and Liu (2013a). In the following we give a brief description of this estimator. Let X ∼ ECd(µ,Σ, ξ) and X be an independent copy of X . The population multivariate Kendall’s tau matrix, denoted by K ∈ Rd×d, is defined as: K := E ( (X − X)(X − X)T ‖X − X‖22 ) . (3.2) Let x1, . . . ,xn be n independent observations of X . The sample version of multivariate Kendall’s tau is accordingly defined as K = 1 n(n− 1) ∑ i 6=j (xi − xj)(xi − xj)T ‖xi − xj‖22 , (3.3) and we have that E(K) = K. K is a matrix version U statistic and it is easy to see that maxjk |Kj... |

1 | Optimal sparse principal component analysis in high dimensional elliptical model.
- Han, Liu
- 2013
Citation Context ...ough. For example, Johnstone and Lu (2009) show that, even under the Gaussian model, when d/n → γ for some γ > 0, there exist multiple settings under which u1 can be an inconsistent estimator of u1. To attack this “curse of dimensionality”, one solution is adding a sparsity assumption on u1, leading to various versions of the sparse PCA. See, Zou et al. (2006); d’Aspremont et al. (2007); Moghaddam et al. (2006), among others. Under the (sub)Gaussian settings, minimax optimal rates are being established in estimating u1, . . . ,um (Vu and Lei, 2012; Ma, 2013; Cai et al., 2013). Very recently, Han and Liu (2013b) relax the Gaussian assumption in conducting a scale invariant version of the sparse PCA (i.e., estimating the leading eigenvector of the correlation instead of the covariance matrix). However, it can not be easily applied to estimate u1 and the rate of convergence they proved is not the parametric rate. This paper improves upon the aforementioned results in two directions. First, with regard to the classical principal component regression, under a double asymptotic framework in which d is allowed to increase with n, by borrowing the very recent development in principal component analysis (V... |