1. Introduction
For any two integers $p,n\geq2$, let $\mathbf{X}_{p\times n}$ be a $p\times n$ random matrix with independent and identically distributed (i.i.d.) real entries. The matrix $\mathbf{W}$ defined by $\mathbf{W} = \mathbf{X}\mathbf{X}^\top/n$ (with $^\top$ standing for matrix transpose) is usually called a sample covariance matrix (see [Reference Bai and Yin1] and [Reference Yin, Bai and Krishnaiah11]), where $p$ and $n$ can be understood as the dimension and the sample size respectively. When the entries are i.i.d. centered normal random variables, $n\mathbf{W}$ is called a Wishart matrix. Sample covariance matrices appear naturally in many situations of multivariate statistical inference; in particular, many test statistics involve the extremal eigenvalues of $\mathbf{W}$. For instance, the union-intersection principle proposed in [Reference Roy8] suggests that one can use the largest eigenvalue of the sample covariance matrix to test whether or not the population covariance is the identity. In the literature, the weak convergence and laws of large numbers of the extremal eigenvalues of $\mathbf{W}$ have been well studied; see [Reference Bai and Yin1], [Reference Johansson5], [Reference Johnstone6], [Reference Yin, Bai and Krishnaiah11], and the references therein. In this note we study large deviations of the extremal eigenvalues of $\mathbf{W}$ as both $p$ and $n$ tend to infinity.
As the non-zero eigenvalues of $\mathbf{X}\mathbf{X}^\top$ are the same as those of $\mathbf{X}^\top\mathbf{X}$, we may assume without loss of generality that $p\leq n$. Let $\lambda_{\min}$ and $\lambda_{\max}$ denote the smallest and largest eigenvalues of $\mathbf{W}$ respectively. It is assumed throughout the note that the i.i.d. entries $\{X_{ij}\}_{1\leq i\leq p,1\leq j\leq n}$ of $\mathbf{X}$ have zero mean $\mathbb{E}(X_{ij})=0$ and unit variance $\mathbb{V}(X_{ij})=1$. Under the finite fourth moment assumption $\mathbb{E}X_{ij}^4<\infty$, Bai and Yin [Reference Bai and Yin1] proved that $\lambda_{\min}\rightarrow (1-\kappa^{1/2})^2$ and $\lambda_{\max}\rightarrow (1+\kappa^{1/2})^2$ almost surely as $n\rightarrow\infty$ and $p=p(n)\rightarrow\infty$ with $p(n)/n\rightarrow \kappa$. When $\kappa=0$, these results indicate that for large $p$ and $n$ the distribution of $\lambda_{\min}$ concentrates just below 1, and that of $\lambda_{\max}$ just above 1. Motivated by this, Fey et al. [Reference Fey, van der Hofstad and Klok3, Theorem 3.1] studied asymptotics of the large deviation probabilities $\mathbb{P}(\lambda_{\min}\leq c)$ with $0\leq c\leq 1$ and $\mathbb{P}(\lambda_{\max}\geq c)$ with $c\geq 1$ for large $p$ and $n$ satisfying $p={\mathrm{o}}(n/\ln\ln n)$. They also noted [Reference Fey, van der Hofstad and Klok3, p. 1061] that the technical assumption $p={\mathrm{o}}(n/\ln\ln n)$ might be relaxed further by refining the arguments; however, this does not seem sufficient to remove the logarithmic term.
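As a quick numerical illustration (not part of the paper) of the Bai–Yin limits quoted above, the following sketch, assuming `numpy` is available, draws a single matrix with Rademacher entries (zero mean, unit variance, finite fourth moment) and compares its extremal eigenvalues with $(1\mp\kappa^{1/2})^2$:

```python
import numpy as np

# Illustrative Monte Carlo check of the Bai-Yin almost-sure limits:
# for p/n -> kappa, lambda_min -> (1 - sqrt(kappa))^2 and
# lambda_max -> (1 + sqrt(kappa))^2.
rng = np.random.default_rng(0)
p, n = 200, 2000                      # kappa = p/n = 0.1
X = rng.choice([-1.0, 1.0], size=(p, n))   # Rademacher entries
W = X @ X.T / n                       # sample covariance matrix
eig = np.linalg.eigvalsh(W)           # sorted eigenvalues of symmetric W
kappa = p / n
print(eig[0], (1 - np.sqrt(kappa)) ** 2)    # lambda_min vs. its limit
print(eig[-1], (1 + np.sqrt(kappa)) ** 2)   # lambda_max vs. its limit
```

Even at these moderate sizes, both extremal eigenvalues typically land within a few percent of their limits.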
The main result of this note (see Theorem 1 below) extends [Reference Fey, van der Hofstad and Klok3, Theorem 3.1] in two respects: (a) the technical assumption is relaxed to $p={\mathrm{o}}(n)$, and (b) more general i.i.d. entries are allowed. To state the main result, let us recall the definition of a sub-Gaussian distribution. A random variable $X$ is said to be sub-Gaussian if it satisfies one of the following three equivalent properties, with the parameters $K_i$, $1\leq i\leq 3$, differing from each other by at most an absolute constant factor (see [Reference Vershynin10, Lemma 5.5]).
(i) Tails: $\mathbb{P}(|X|>t)\leq \exp\big\{1-t^2/K_1^2\big\}$ for all $t\geq0$ .
(ii) Moments: $(\mathbb{E}|X|^p)^{1/p}\leq K_2\sqrt{p}$ for all $p\geq1$ .
(iii) Super-exponential moment: $\mathbb{E}\exp\big\{X^2/K_3^2\big\}\leq e$ .
If moreover $\mathbb{E}(X)=0$ , then (i)–(iii) are also equivalent to the following.
(iv) Moment generating function: there exists a constant $K_4$ such that $\mathbb{E}\exp\{tX\}\leq \exp\big\{t^2K_4^2\big\}$ for all $t\in \mathbb{R}$.
Furthermore, the sub-Gaussian norm of X is defined as $\sup_{p\geq1}p^{-1/2}(\mathbb{E}|X|^p)^{1/p}$ , namely the smallest $K_2$ in (ii).
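To make the moment characterization (ii) and the sub-Gaussian norm concrete, the following sketch (illustrative only, and restricted to integer $p$ rather than the full supremum over real $p\geq1$) evaluates $\sup_{p\geq1}p^{-1/2}(\mathbb{E}|X|^p)^{1/p}$ numerically for two distributions used later in the note, using only the standard library:

```python
import math

# Sub-Gaussian norm sup_{p>=1} p^{-1/2} (E|X|^p)^{1/p}, evaluated on an
# integer grid of p, for the Rademacher (+-1 with equal probabilities)
# and standard normal distributions.
# For N(0,1): E|Z|^p = 2^{p/2} Gamma((p+1)/2) / sqrt(pi).

def normal_abs_moment(p):
    return 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)

ps = range(1, 61)
# Rademacher: E|X|^p = 1 for every p, so the ratio p^{-1/2} is largest at p = 1.
rademacher_norm = max(p ** -0.5 * 1.0 for p in ps)
normal_norm = max(p ** -0.5 * normal_abs_moment(p) ** (1 / p) for p in ps)
print(rademacher_norm)   # 1.0
print(normal_norm)       # attained at p = 1: E|Z| = sqrt(2/pi) ~ 0.798
```

The finite values confirm that both distributions are sub-Gaussian in the sense of (ii); the grid of $p$ values and the cutoff at 60 are ad hoc choices for illustration.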
Theorem 1. Suppose that the entries $\{X_{ij}\}_{1\leq i\leq p,1\leq j\leq n}$ of $\mathbf{X}$ are i.i.d. sub-Gaussian with zero mean and unit variance. Then, for $p = p(n)\rightarrow\infty$ with $p(n)={\mathrm{o}}(n)$ as $n\rightarrow\infty$ , we have the following.
(i) For any $c\geq 1$ ,
(1) \begin{align}\liminf_{n\rightarrow\infty}n^{-1}\ln \mathbb{P}(\lambda_{\max}\geq c) \geq -I(c),\end{align}
(2) \begin{align}\limsup_{n\rightarrow\infty}n^{-1}\ln \mathbb{P}(\lambda_{\max}\geq c) \leq -\lim_{\epsilon\rightarrow 0}I(c - \epsilon).\end{align}
(ii) For any $0\leq c\leq 1$ ,
(3) \begin{align}\liminf_{n\rightarrow\infty}n^{-1}\ln \mathbb{P}(\lambda_{\min}\leq c) \geq -I(c),\end{align}
(4) \begin{align}\limsup_{n\rightarrow\infty}n^{-1}\ln \mathbb{P}(\lambda_{\min}\leq c) \leq -\lim_{\epsilon\rightarrow 0}I(c + \epsilon).\end{align}
Here $I(c)\,:\!=\, \lim_{p\rightarrow\infty}I_{p}(c)$ with
$\|x\|$ being the Euclidean norm, and
For standard normal entries, the results of Theorem 1 were proved in [Reference Fey, van der Hofstad and Klok3, Theorem 3.1] (assuming $p={\mathrm{o}}(n/{\ln}\ln n)$ ), and in [Reference Jiang and Li4, Theorems 2 and 3] (under the assumption $p(n)={\mathrm{o}}(n)$ ) where general $\beta$ -Laguerre ensembles were considered (with $\beta=1$ corresponding to entries being standard normal). From this point of view, Theorem 1 can also be regarded as an extension of [Reference Jiang and Li4, Theorems 2 and 3] from $\beta$ -Laguerre ensembles to sub-Gaussian entries. The continuity of I(c) is still largely unknown, as pointed out in [Reference Fey, van der Hofstad and Klok3]. However, with the arguments in [Reference Fey, van der Hofstad and Klok3, Theorem 3.2], I(c) can be shown to be continuous on $[1,\infty)$ for some special sub-Gaussian entries; see Section 2.3 for more details. The proof of Theorem 1 makes use of a concentration inequality of the largest eigenvalue $\lambda_{\max}$ (see Section 2.1), which helps us to avoid refining the arguments in [Reference Fey, van der Hofstad and Klok3]. The same idea was employed in [Reference Singull, Uwamariya and Yang9] for the study of condition numbers of sample covariance matrices.
2. Proof of Theorem 1
2.1. Concentration inequality for the largest eigenvalue
Vershynin [Reference Vershynin10, Theorem 5.39] considered a random matrix $A_{p\times n}$ whose columns $A_j$, $1\leq j\leq n$, are independent sub-Gaussian isotropic random vectors in $\mathbb{R}^p$. (We have switched ‘rows’, as originally written in [Reference Vershynin10, Theorem 5.39], to ‘columns’, since therein the largest singular value $s_{\max}(A)$ of $A$ is defined as the largest eigenvalue of $(A^\top A)^{1/2}$, while in the current note we always consider the form $\mathbf{X}\mathbf{X}^\top$ because of the assumption $p\leq n$.) If we now take $A=\mathbf{X}$, then the elements in each column are i.i.d. sub-Gaussian random variables, implying (based on [Reference Vershynin10, Lemma 5.24]) that the sub-Gaussian norm $\|A_j\|_{\psi_2}$ of each column $A_j$ is finite and independent of $p$ and $n$. As the columns have the same distribution, it holds that $K\,:\!=\, \|A_1\|_{\psi_2}=\cdots=\|A_n\|_{\psi_2}$. The concentration inequality in [Reference Vershynin10, Theorem 5.39] says that there are two constants $\kappa_1,\kappa_2>0$, depending only on $K$, such that for any $t\geq0$,
\begin{align*}\mathbb{P}\big(s_{\max}(A)\geq \sqrt{n}+\kappa_1\sqrt{p}+t\big)\leq 2\exp\big\{{-}\kappa_2 t^2\big\}.\end{align*}
Note that $s_{\max}^2(A)=n\lambda_{\max}$ in the case $A=\mathbf{X}$, so the above non-asymptotic inequality reads
\begin{align*}\mathbb{P}\big(n\lambda_{\max}\geq (\sqrt{n}+\kappa_1\sqrt{p}+t)^2\big)\leq 2\exp\big\{{-}\kappa_2 t^2\big\}.\end{align*}
With $\gamma\,:\!=\, t/\sqrt{n}$ and the fact $p\leq n$, for any $\gamma\geq0$ it becomes
(5) \begin{align}\mathbb{P}\big(\lambda_{\max}\geq (1+\kappa_1+\gamma)^2\big)\leq 2\exp\big\{{-}\kappa_2\gamma^2 n\big\}.\end{align}
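As a numerical illustration (not from the paper) of the exponential tail in the concentration inequality (5), the sketch below, assuming `numpy`, counts how often $\lambda_{\max}$ exceeds a threshold of the form $(1+\sqrt{p/n}+\gamma)^2$; here $\sqrt{p/n}$ is an illustrative stand-in for the unknown constant $\kappa_1$, matching the Bai–Yin edge, so exceedances should be extremely rare:

```python
import numpy as np

# Monte Carlo: for gamma = 0.5, lambda_max should exceed
# (1 + sqrt(p/n) + gamma)^2 only with probability of order exp(-const * n),
# so with n = 500 we expect essentially no exceedances at all.
rng = np.random.default_rng(1)
p, n, gamma, trials = 50, 500, 0.5, 200
threshold = (1 + np.sqrt(p / n) + gamma) ** 2
exceed = 0
for _ in range(trials):
    X = rng.standard_normal((p, n))
    lam_max = np.linalg.eigvalsh(X @ X.T / n)[-1]
    exceed += lam_max >= threshold
print(exceed, "exceedances out of", trials)
```

The typical value of $\lambda_{\max}$ here is near $(1+\sqrt{0.1})^2\approx1.73$, far below the threshold $\approx3.30$, consistent with the $\exp\{-\kappa_2\gamma^2 n\}$ decay.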
2.2. Proof of the upper bounds
As suggested in [Reference Fey, van der Hofstad and Klok3], the fundamental first step of the proof is as follows:
Then the lower bounds (1) and (3) (for any $p\leq n$) follow directly from Cramér’s theorem applied to the i.i.d. random variables $S_{x,i}$, $1\leq i\leq n$. More specifically, we first fix an integer $p$ and choose an $x$ such that only the first $p$ components are non-zero, then apply Cramér’s theorem, and finally send $p$ to infinity; see also the detailed arguments in [Reference Fey, van der Hofstad and Klok3, Section 3.2] leading to (3.8) therein. To prove the upper bounds (2) and (4), as explained in [Reference Fey, van der Hofstad and Klok3] and [Reference Singull, Uwamariya and Yang9], we use a finite number $N_d$ of spherical caps of chord $2\,\tilde{\!d}\,:\!=\, 2d\sqrt{1-d^2/4}$ with centers $x^{(j)}$ to cover the unit sphere $S\,:\!=\,\{x\colon \|x\|=1\}$, so that for any $x\in S$ there is some $x^{(j)}\in S$ close to $x$ with $\|x-x^{(j)}\|\leq d$. In this case,
(see [Reference Fey, van der Hofstad and Klok3, p. 1054]). For $p=p(n)\rightarrow\infty$ as $n\rightarrow\infty$ , we need an explicit expression of $N_d$ , which can be borrowed from [Reference Singull, Uwamariya and Yang9] (see also [Reference Fey, van der Hofstad and Klok3] and [Reference Rogers7]) as
for all $d<1/2$ and large $\tilde{p}(n)\,:\!=\, p(n)-1$. It is then clear that, for any fixed $d$, $\lim_{n\rightarrow\infty}n^{-1}\ln N_d=0$ when $p(n)={\mathrm{o}}(n)$.
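The covering step can be seen concretely in the smallest non-trivial case $p=2$: a $d$-net of the unit circle already nearly attains $\lambda_{\max}=\sup_{\|x\|=1}x^\top\mathbf{W}x$, with an error controlled by $d$ (since $|x^\top\mathbf{W}x-y^\top\mathbf{W}y|\leq 2\lambda_{\max}\|x-y\|$ for unit vectors). The following sketch, assuming `numpy`, is illustrative only and uses an ad hoc angular net rather than the cap construction of [Reference Rogers7]:

```python
import numpy as np

# Illustration (p = 2) of the covering step: the supremum of x^T W x over
# the unit sphere (= lambda_max) is nearly attained on a finite d-net.
rng = np.random.default_rng(3)
p, n, d = 2, 100, 0.05
X = rng.standard_normal((p, n))
W = X @ X.T / n
lam_max = np.linalg.eigvalsh(W)[-1]
# A d-net of the unit circle: angular spacing <= d gives chord <= d.
m = int(np.ceil(2 * np.pi / d))
theta = 2 * np.pi * np.arange(m) / m
net = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # m points on S^1
net_max = max(x @ W @ x for x in net)
print(lam_max, net_max)   # net_max <= lam_max <= net_max / (1 - 2d)
```

The two-sided comparison printed at the end mirrors the inequalities used in the proof of (2) below.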
Thanks to the concentration inequality (5), the following upper estimates are used:
To prove (2), applying (6) to (8) gives
where the first inequality comes from the facts
and $\lambda_{\max}<(1+\kappa_1+\gamma)^2$ . With $\epsilon\,:\!=\, 2d(1+\kappa_1+\gamma)^2$ , the Chernoff upper bound (see [Reference Dembo and Zeitouni2, remark (c) of Theorem 2.2.3]) implies
With $p(n)={\mathrm{o}}(n)$ and the fact $\lim_{n\rightarrow\infty}n^{-1}\ln N_d = 0$ , it follows that
Taking into account the concentration inequality (5), we obtain
Thus (2) is proved by first taking $d\rightarrow0^+$ (implying that $\epsilon \rightarrow0^+$ ) and then sending $\gamma\rightarrow\infty$ .
In a very similar way (4) can be proved by applying (7) to (9) as follows:
Here we remark that the original proof in [Reference Fey, van der Hofstad and Klok3] is based on splitting the range of $\lambda_{\max}$ into two (or more) parts whose lengths depend on $n$, which leads to the restrictive assumption $p={\mathrm{o}}(n/\ln\ln n)$. Because the concentration inequality (5) holds with constants $\kappa_1,\kappa_2$ independent of $n$, the assumption can be relaxed to $p={\mathrm{o}}(n)$.
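As a sanity check (not part of the paper) of the Chernoff step used above, consider the simplest Gaussian case: with standard normal entries and $x=e_1$, one has $S_{x,i}=X_{1i}\sim N(0,1)$, so $n^{-1}\sum_{i=1}^n S_{x,i}^2$ is a $\chi^2_n/n$ variable, whose Cramér rate is $(c-1-\ln c)/2$ for $c\geq1$ (the Legendre transform of $\ln\mathbb{E}\exp\{tZ^2\}=-\tfrac12\ln(1-2t)$, $t<1/2$). The sketch below, assuming `numpy`, compares the empirical tail with the resulting Chernoff bound:

```python
import numpy as np

# Chernoff bound in the simplest case:
#     P(chi2_n / n >= c) <= exp(-n (c - 1 - ln c)/2)   for c >= 1.
rng = np.random.default_rng(2)
n, c, trials = 50, 2.0, 10**6
rate = (c - 1 - np.log(c)) / 2
samples = rng.chisquare(df=n, size=trials) / n
p_hat = np.mean(samples >= c)
print(p_hat, np.exp(-n * rate))   # empirical tail vs. Chernoff bound
```

The empirical tail sits a polynomial factor below the exponential bound, as expected from the sharp form of Cramér's theorem.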
2.3. Continuity of I(c)
It was remarked in [Reference Fey, van der Hofstad and Klok3] that the continuity of I(c) is still largely unknown. Here we derive bounds for I(c) using the ideas of [Reference Fey, van der Hofstad and Klok3, Theorem 3.2], and show that I(c) is continuous on $[1,\infty)$ for sub-Gaussian entries satisfying the conditions in Theorem 1 and $K_4^2=1/2$ (recall that $K_4^2$ is given in the definition of sub-Gaussian distributions in Section 1).
Recall that $I(c)=\lim_{p\rightarrow\infty}I_{p}(c)$ , where
For $c\geq1$ , we have
since
It was shown in [Reference Singull, Uwamariya and Yang9] that
Therefore, for $c\geq1$ ,
The restriction $1/2\leq K_4^2$ is from the assumptions that the entries $X_{ij}$ have zero mean, unit variance, and
The other restriction $K_4^2\leq c/2$ is from searching for the supremum. Therefore
On the other hand, Fey et al. [Reference Fey, van der Hofstad and Klok3, Theorem 3.2] proved that $I(c)\leq (c-1-\ln c)/2$ for $c\geq1$. In summary, if the entries $X_{ij}$ are sub-Gaussian random variables satisfying the conditions in Theorem 1 with $K_4^2=1/2$, then $I(c)=(c-1-\ln c)/2$ for $c\geq1$. As mentioned in [Reference Fey, van der Hofstad and Klok3], this is a kind of universality result, as $(c-1-\ln c)/2$ is the corresponding rate function for i.i.d. standard normal entries. Furthermore, the condition $K_4^2=1/2$ is satisfied by at least three distributions: the standard normal, the symmetric Bernoulli taking values $\pm1$ with equal probabilities, and the uniform distribution on $[{-}\sqrt{3},\sqrt{3}]$.
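The claim that these three distributions satisfy (iv) with $K_4^2=1/2$ can be checked numerically; the sketch below (a finite-grid check assuming `numpy`, not a proof) uses the closed-form moment generating functions $\mathbb{E}\exp\{tX\}=\exp\{t^2/2\}$, $\cosh t$, and $\sinh(\sqrt{3}t)/(\sqrt{3}t)$ respectively:

```python
import numpy as np

# Grid check that E exp(tX) <= exp(t^2 / 2), i.e. property (iv) with
# K_4^2 = 1/2, for the three distributions listed above.
t = np.linspace(-10.0, 10.0, 4001)
t = t[t != 0.0]                        # avoid 0/0 in the uniform formula
bound = np.exp(t ** 2 / 2)
mgf_normal = np.exp(t ** 2 / 2)        # N(0,1): exact equality
mgf_rademacher = np.cosh(t)            # +-1 with equal probabilities
a = np.sqrt(3.0) * t
mgf_uniform = np.sinh(a) / a           # Uniform[-sqrt(3), sqrt(3)]
print(np.all(mgf_normal <= bound),
      np.all(mgf_rademacher <= bound),
      np.all(mgf_uniform <= bound))
```

For the Bernoulli case the comparison $\cosh t\leq\exp\{t^2/2\}$ also follows term by term from the Taylor expansions, since $1/(2k)!\leq 1/(2^k k!)$.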
Acknowledgements
The authors are grateful to the referee and Editor for several constructive comments which have led to an improved version of the paper.
Funding information
There are no funding bodies to thank relating to the creation of this article.
Competing interests
There were no competing interests to declare that arose during the preparation or publication of this article.