1 Introduction
Wigner random matrices are $N\times N$ random Hermitian matrices $W=W^*$ with centred, independent, identically distributed (i.i.d.) entries up to the symmetry constraint $w_{ab} = \overline {w_{ba}}$ . Originally introduced by E. Wigner [Reference Wigner53] to study spectral gaps of large atomic nuclei, Wigner matrices have become the most studied random matrix ensemble since they represent the simplest example of a fully chaotic quantum Hamiltonian beyond the explicitly computable Gaussian case.
A key conceptual feature of Wigner matrices, as well as a fundamental technical tool to study them, is the fact that their resolvent $G(z):= (W-z)^{-1}$ , with a spectral parameter z away from the real axis, becomes asymptotically deterministic in the large N limit. The limit is the scalar matrix $m(z)\cdot I$ , where $m(z) = \frac {1}{2}(-z+\sqrt {z^2-4})$ is the Stieltjes transform of the Wigner semicircular density, $\rho _{\mathrm {sc}}(x) = \frac {1}{2\pi }\sqrt {4-x^2}$ , which is the $N\to \infty $ limit of the empirical density of the eigenvalues of W under the standard normalisation $\operatorname {\mathbf {E}} |w_{ab}|^2= 1/N$ . The local law on optimal scale asserts that this limit holds even when z is very close to the real axis, as long as $|\Im z|\gg 1/N$ . Since the imaginary part of the Stieltjes transform resolves the spectral measure on a scale comparable with $|\Im z|$ , this condition is necessary for a deterministic limit to hold: on scales of order $1/N$ , comparable with the typical eigenvalue spacing, the resolvent is genuinely fluctuating.
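The convergence of the resolvent can be checked numerically on a small sample. The following is a minimal sketch rather than anything used in the paper: the helper names (`semicircle_m`, `sample_wigner`, `normalised_resolvent_trace`), the Gaussian entry distribution, the matrix size and the spectral parameter are all illustrative assumptions, and for simplicity the diagonal entries are given the same variance $1/N$ as the off-diagonal ones.

```python
import cmath
import random


def semicircle_m(z):
    """Stieltjes transform of the semicircle law, on the branch with Im m * Im z > 0."""
    root = cmath.sqrt(z * z - 4)
    m = (-z + root) / 2
    if m.imag * z.imag < 0:  # the principal square root picked the wrong sign
        m = (-z - root) / 2
    return m


def sample_wigner(n, rng):
    """Real symmetric matrix with centred Gaussian entries of variance 1/n (assumed model)."""
    w = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a, n):
            w[a][b] = w[b][a] = rng.gauss(0.0, 1.0) / n ** 0.5
    return w


def normalised_resolvent_trace(w, z):
    """Return (1/n) Tr (W - z)^{-1} using Gauss-Jordan elimination with partial pivoting."""
    n = len(w)
    # Augmented matrix [W - z | I]; after elimination the right block is the resolvent.
    aug = [
        [complex(w[i][j]) - (z if i == j else 0) for j in range(n)]
        + [1.0 if i == j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            factor = aug[r][col]
            if r != col and factor != 0:
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return sum(aug[i][n + i] for i in range(n)) / n


rng = random.Random(0)
n, z = 60, complex(0.3, 0.5)
trace = normalised_resolvent_trace(sample_wigner(n, rng), z)
error = abs(trace - semicircle_m(z))
print(trace, semicircle_m(z), error)
```

Here $N\,|\Im z| = 30$, so the sample is only moderately inside the regime $|\Im z|\gg 1/N$; decreasing the imaginary part towards $1/N$ should make the difference visibly larger, in line with the discussion above.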
The limit $G(z)\to m(z)\cdot I$ holds in a natural topology, namely when tested against deterministic $N\times N$ matrices A: that is, in the form $\langle G(z)A\rangle \to m(z)\langle A\rangle $ , where $\langle \cdot \rangle := \frac {1}{N}\operatorname {Tr}(\cdot )$ denotes the normalised trace. It is essential that the test matrix A is deterministic; no analogous limit can hold if A is random and strongly correlated with W: for example, if A is a spectral projection of W.
The first optimal local law for Wigner matrices was proven for $A=I$ in [Reference Erdős, Schlein and Yau27]; see also [Reference Cacciapuoti, Maltsev and Schlein13, Reference Götze, Naumov and Tikhomirov32, Reference Tao and Vu50, Reference Tao and Vu51]. It was later extended to more general matrices A in the form that [Footnote 1]
$$ \big |\langle (G(z)-m(z))A\rangle \big | \le \frac {N^{\xi }\,\|A\|}{N\eta }, \qquad \eta :=|\Im z|, \tag{1.1} $$
holds with very high probability for any fixed $\xi>0$ if N is sufficiently large. By optimality in this paper, we always mean up to a tolerance factor $N^{\xi }$ . This is a natural byproduct of our method yielding very high probability estimates under the customary moment condition; see equation (2.2) later [Footnote 2]. The estimate given by equation (1.1) is called the average local law, and it controls the error in terms of the standard Euclidean matrix norm $\| A\|$ of A. It holds for arbitrary deterministic matrices A, and it is also optimal in this generality with respect to the dependence on A: for example, for $A=I$ , the trace
is approximately complex Gaussian with standard deviation [Reference He and Knowles33]
but equation (1.1) is far from being optimal when applied to matrices with small rank. Rank-one matrices, $A= \boldsymbol {y} \boldsymbol {x}^*$ , are especially important since they give the asymptotic behaviour of resolvent matrix elements $G_{\boldsymbol {x} \boldsymbol {y}}:= \langle \boldsymbol {x}, G \boldsymbol {y}\rangle $ . For such special test matrices, a separate isotropic local law of the optimal form
$$ \big |\langle \boldsymbol {x}, (G(z)-m(z)) \boldsymbol {y}\rangle \big | \le \frac {N^{\xi }\,\|\boldsymbol {x}\|\,\|\boldsymbol {y}\|}{\sqrt {N\eta }} \tag{1.2} $$
has been proven; see [Reference Erdős, Yau and Yin28] for special coordinate vectors and later [Reference Knowles and Yin38] for general vectors $\boldsymbol {x}, \boldsymbol {y}$ , as well as [Reference Erdős, Krüger and Schröder26, Reference He, Knowles and Rosenthal34, Reference Knowles and Yin36, Reference Lee and Schnelli40] for more general ensembles. Note that a direct application of equation (1.1) to $A= \boldsymbol {y} \boldsymbol {x}^*$ would give a bound of order $1/\eta $ instead of the optimal $1/\sqrt {N\eta }$ in equation (1.2), which is an unacceptable overestimate in the most interesting small $\eta $ -regime. More generally, the average local law given by equation (1.1) performs badly when A has effectively small rank: that is, if only a few eigenvalues of A are comparable with the norm $\|A\|$ and most other eigenvalues are much smaller or even zero.
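The overestimate can be made explicit with a short worked computation. It is only a sketch under two assumptions: that the average local law (1.1) bounds $|\langle (G-m)A\rangle |$ by $N^{\xi }\|A\|/(N\eta )$ with $\eta =|\Im z|$ , and that $\langle \cdot \rangle $ denotes the normalised trace.

```latex
\begin{align*}
A=\boldsymbol{y}\boldsymbol{x}^*
\quad\Longrightarrow\quad
\langle (G-m)A\rangle
 &= \frac{1}{N}\operatorname{Tr}\big[(G-m)\,\boldsymbol{y}\boldsymbol{x}^*\big]
  = \frac{1}{N}\,\langle \boldsymbol{x},(G-m)\boldsymbol{y}\rangle,
 \qquad \|A\| = \|\boldsymbol{x}\|\,\|\boldsymbol{y}\|, \\
\big|\langle \boldsymbol{x},(G-m)\boldsymbol{y}\rangle\big|
 &\le N\cdot\frac{N^{\xi}\,\|\boldsymbol{x}\|\,\|\boldsymbol{y}\|}{N\eta}
  = \frac{N^{\xi}\,\|\boldsymbol{x}\|\,\|\boldsymbol{y}\|}{\eta}
  = \sqrt{\frac{N}{\eta}}\;\cdot\;\frac{N^{\xi}\,\|\boldsymbol{x}\|\,\|\boldsymbol{y}\|}{\sqrt{N\eta}} .
\end{align*}
```

The bound obtained this way therefore exceeds the isotropic rate $1/\sqrt {N\eta }$ by a factor $\sqrt {N/\eta }$ , which is at least $\sqrt {N}$ for $\eta \le 1$ and close to N when $\eta $ approaches $1/N$ .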
Quite recently, we found that the average local law given by equation (1.1) is also suboptimal for another class of test matrices A, namely traceless matrices. In [Reference Cipolloni, Erdős and Schröder15], we proved that
$$ \big |\langle (G(z)-m(z))A\rangle \big | = \big |\langle G(z)A\rangle \big | \le \frac {N^{\xi }\,\|A\|}{N\sqrt {\eta }} \tag{1.3} $$
for any deterministic matrix A with $\langle A\rangle := \frac {1}{N}\operatorname {Tr} A=0$ : that is, traceless observables yield an additional $\sqrt {\eta }$ improvement in the error. The optimality of this bound for general traceless A was demonstrated by identifying the nontrivial Gaussian fluctuation of $\langle GA\rangle $ in [Reference Cipolloni, Erdős and Schröder16].
While the mechanism behind the suboptimality of equation (1.1) for small rank and traceless A is very different, their common core is that estimating the size of A simply by the Euclidean norm is too crude for several important classes of A. In this paper, we present a local law that unifies all three local laws in equations (1.1), (1.2) and (1.3) by identifying the appropriate way to measure the size of A. Our main result (Theorem 2.2, $k=1$ case) shows that
holds with very high probability, where $\mathring {A}:= A-\langle A\rangle $ is the traceless part of A. It is straightforward to check that equation (1.4) implies equations (1.1), (1.2) and (1.3); moreover, it optimally interpolates between full-rank and rank-one matrices A; hence we call equation (1.4) the rank-uniform local law for Wigner matrices. Note that an optimal local law for matrices of intermediate rank was previously unknown; indeed, the local laws given by equations (1.1) and (1.2) are optimal only for essentially full-rank and essentially finite-rank observables, respectively. The proof of the optimality of equation (1.4) follows from identifying the scale of the Gaussian fluctuation of its left-hand side. Its standard deviation for traceless A is
this relation was established for matrices with bounded norm $\| A\|\lesssim 1$ in [Reference Cipolloni, Erdős and Schröder16, Reference Lytova42].
The key observation that traceless A substantially improves the error term in equation (1.3) compared with equation (1.1) was the conceptually new input behind our recent proof of the Eigenstate Thermalisation Hypothesis in [Reference Cipolloni, Erdős and Schröder15], followed by the proof of the normal fluctuation in the quantum unique ergodicity for Wigner matrices in [Reference Cipolloni, Erdős and Schröder17]. Both results concern the behaviour of the eigenvector overlaps: that is, quantities of the form $\langle {\boldsymbol {u}}_i, A {\boldsymbol {u}}_j \rangle $ , where $\{{\boldsymbol {u}}_i\}_{i=1}^N$ are the normalised eigenvectors of W. The former result stated that
holds with very high probability for any $i,j$ and for any fixed $\xi>0$ . The latter result established the optimality of equation (1.6) for $i=j$ by showing that $\sqrt {N} \langle {\boldsymbol {u}}_i, \mathring {A} {\boldsymbol {u}}_i \rangle $ is asymptotically Gaussian when the corresponding eigenvalue lies in the bulk of the spectrum. The variance of $\sqrt {N} \langle {\boldsymbol {u}}_i, \mathring {A} {\boldsymbol {u}}_i \rangle $ was shown to be in [Reference Cipolloni, Erdős and Schröder17], but we needed to assume that with some fixed positive constant c: that is, that the rank of $\mathring {A}$ was essentially macroscopic.
As the second main result of the current paper, we now remove this unnatural condition and show the standard Gaussianity of the normalised overlaps for bulk indices under the optimal and natural condition that , which essentially ensures that $\mathring {A}$ is not of finite rank. This improvement is possible thanks to improving the dependence of the error terms in the local laws from $\|\mathring {A}\|$ to similarly to the improvement in equation (1.4) over equation (1.3). We will also need a multi-resolvent version of this improvement since off-diagonal overlaps $\langle {\boldsymbol {u}}_i, A {\boldsymbol {u}}_j \rangle $ are not accessible via single-resolvent local laws; in fact, $|\langle {\boldsymbol {u}}_i, A {\boldsymbol {u}}_j \rangle |^2$ is intimately related to with two different spectral parameters $z, z'$ , analysed in Theorem 2.2. As a corollary, we will show the following improvement of equation (1.6) (see Theorem 2.6)
for the bulk indices. The analysis at the edge is deferred to later work.
Gaussian fluctuation of diagonal overlaps with a special low rank observable has been proven earlier. Right after [Reference Cipolloni, Erdős and Schröder17] was posted on the arXiv, Benigni and Lopatto in an independent work [Reference Benigni and Lopatto7] proved the standard Gaussian fluctuation of $[N/|S|]^{1/2}\big [\sum _{a\in S} |u_i(a)|^2 - |S|/N]$ whenever $1\ll |S|\ll N$ : that is, they considered $\langle {\boldsymbol {u}}_i, \mathring {A} {\boldsymbol {u}}_i \rangle $ for the special case when the matrix A is the projection on coordinates from the set S. Their result also holds at the edge. The condition $|S|\ll N$ requires A to have small rank; hence it is complementary to our old condition from [Reference Cipolloni, Erdős and Schröder17] for projection operators. The natural condition $|S|\gg 1$ is the special case of our new improved condition . In particular, our new result covers [Reference Benigni and Lopatto7] as a special case in the bulk, and it gives a uniform treatment of all observables in full generality.
The methods of [Reference Benigni and Lopatto7] and [Reference Cipolloni, Erdős and Schröder17] are very different albeit they both rely on Dyson Brownian motion (DBM), complemented by fairly standard Green function comparison (GFT) techniques. Benigni and Lopatto focused on the joint Gaussianity of the individual eigenvector entries $u_i(a)$ (or, more generally, linear functionals $\langle q_{\alpha }, {\boldsymbol {u}}_i\rangle $ with deterministic unit vectors $q_{\alpha }$ ) in the spirit of the previous quantum ergodicity results by Bourgade and Yau [Reference Bourgade and Yau10] operating with the so-called eigenvector moment flow from [Reference Bourgade and Yau10] complemented by its ‘fermionic’ version by Benigni [Reference Benigni9]. This approach becomes less effective when more entries need to be controlled simultaneously, and it seems to have a natural limitation at $|S|\ll N$ .
Our method viewed the eigenvector overlap $\langle {\boldsymbol {u}}_i, \mathring {A} {\boldsymbol {u}}_i \rangle $ and its off-diagonal version $\langle {\boldsymbol {u}}_i, \mathring {A} {\boldsymbol {u}}_j \rangle $ as one unit without translating it into a sum of rank-one projections $\langle {\boldsymbol {u}}_i, q_{\alpha }\rangle \langle q_{\alpha }, {\boldsymbol {u}}_j\rangle $ via the spectral decomposition of $\mathring {A}$ . The corresponding flow for overlaps with arbitrary A, called the stochastic eigenstate equation, was introduced by Bourgade, Yau and Yin in [Reference Bourgade, Yau and Yin12] (even though they applied it to the special case when A is a projection, their formalism is general). The analysis of this new flow is more involved than the eigenvector moment flow since it operates on a geometrically more complicated higher-dimensional space. However, the substantial part of this analysis has been done by Marcinek and Yau [Reference Marcinek43], and we heavily relied on their work in our proof [Reference Cipolloni, Erdős and Schröder17].
We close this introduction by commenting on our methods. The main novelty of the current paper is the proof of the rank-uniform local laws involving the Hilbert-Schmidt norm instead of the Euclidean matrix norm $\|\mathring {A}\|$ . This is done in Section 3, and it will directly imply the improved overlap estimate in equation (1.7). Once this estimate is available, both the DBM and the GFT parts of the proof in the current paper are essentially the same as in [Reference Cipolloni, Erdős and Schröder17]; hence we will not give all details but only point out the differences. While this can be done very concisely for the GFT in Appendix B, for the DBM part, we need to recall a large part of the necessary setup in Section 4 for the convenience of the reader.
As to our main result, the general scheme to prove single resolvent local laws has been well established, and traditionally it consisted of two parts: (i) the derivation of an approximate self-consistent equation that $G-m$ satisfies and (ii) estimating the key fluctuation term in this equation. The proofs of the multi-resolvent local laws follow the same scheme, but the self-consistent equation is considerably more complicated, and its stability is more delicate; see, for example, [Reference Cipolloni, Erdős and Schröder15, Reference Cipolloni, Erdős and Schröder19], where general multi-resolvent local laws were proven. The main complication lies in part (ii), where a high moment estimate is needed for the fluctuation term. The corresponding cumulant expansion results in many terms that have typically been organised and estimated by a graphical Feynman diagrammatic scheme. A reasonably manageable power counting handles all diagrams for the purpose of proving equations (1.1) and (1.2). However, in the multi-resolvent setup, or if we aim at some improvement, the diagrammatic approach becomes very involved since the right number of additional improvement factors needs to be gained from every single graph. This was the case many times before: (i) when a small factor (so-called ‘sigma-cell’) was extracted at the cusp [Reference Erdős, Krüger and Schröder25], (ii) when we proved that the correlation between the resolvents of the Hermitization of an i.i.d. random matrix shifted by two different spectral parameters $z_1, z_2$ decays in $1/|z_1-z_2|$ [Reference Cipolloni, Erdős and Schröder14] and (iii) more recently when the gain of order $\sqrt {\eta }$ due to the traceless A in equation (1.3) was obtained in [Reference Cipolloni, Erdős and Schröder15].
Extracting instead of $\|A\|$ , especially in the multi-resolvent case, seems even more involved in this way since estimating A simply by its norm appears everywhere in any diagrammatic expansion. However, very recently in [Reference Cipolloni, Erdős and Schröder18] we introduced a new method of a system of master inequalities that circumvents the full diagrammatic expansion. The power of this method was demonstrated by fully extracting the maximal $\sqrt {\eta }$ -gain from traceless A even in the multi-resolvent setup; the same result seemed out of reach with the diagrammatic method used for the single-resolvent setup in [Reference Cipolloni, Erdős and Schröder15]. In the current paper, we extend this technique to obtain the optimal control in terms of instead of $\|\mathring {A}\|$ for single resolvent local laws. However, the master inequalities in this paper are different from the ones in [Reference Cipolloni, Erdős and Schröder18]; in fact, they are much tighter since the effect we extract now is much more delicate. We also obtain a similar optimal control for the multi-resolvent local laws needed to prove the Gaussianity of the bulk eigenvector overlaps under the optimal condition on A.
Notations and conventions
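We denote vectors by bold-faced lowercase Roman letters ${\boldsymbol x}, {\boldsymbol y}\in \mathbf {C} ^N$ , for some $N\in \mathbf {N}$ . Vector and matrix norms, $\|{\boldsymbol x}\|$ and $\|A\|$ , indicate the usual Euclidean norm and the corresponding induced matrix norm. For any $N\times N$ matrix A, we use the notation $\langle A\rangle := N^{-1}\operatorname {Tr} A$ to denote the normalised trace of A. Moreover, for vectors ${\boldsymbol x}, {\boldsymbol y}\in \mathbf {C}^N$ and matrices $A\in \mathbf {C}^{N\times N}$ , we define
$$ \langle {\boldsymbol x},{\boldsymbol y}\rangle := \sum _{i=1}^N \overline {x_i}\, y_i, \qquad A_{{\boldsymbol x}{\boldsymbol y}}:= \langle {\boldsymbol x}, A{\boldsymbol y}\rangle . $$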
We will use the concept of ‘with very high probability’, meaning that for any fixed $D>0$ , the probability of an N-dependent event is bigger than $1-N^{-D}$ if $N\ge N_0(D)$ . We introduce the notion of stochastic domination (see, for example, [Reference Erdős, Knowles, Yau and Yin24]): given two families of non-negative random variables
$$ X=\big (X^{(N)}(u) : N\in \mathbf {N},\, u\in U^{(N)}\big ) \quad \text {and}\quad Y=\big (Y^{(N)}(u) : N\in \mathbf {N},\, u\in U^{(N)}\big ) $$
indexed by N (and possibly some parameter u in some parameter space $U^{(N)}$ ), we say that X is stochastically dominated by Y, if for all $\xi , D>0$ , we have
$$ \sup _{u\in U^{(N)}} \operatorname {\mathbf {P}}\big [X^{(N)}(u)>N^{\xi }\, Y^{(N)}(u)\big ]\le N^{-D} $$
for large enough $N\geq N_0(\xi ,D)$ . In this case, we use the notation $X\prec Y$ or $X=\mathcal {O}_{\prec }(Y)$ . We also use the convention that $\xi>0$ denotes an arbitrary small constant that is independent of N.
Finally, for positive quantities $f,g$ we write $f\lesssim g$ and $f\sim g$ if $f \le C g$ or $c g\le f\le Cg$ , respectively, for some constants $c,C>0$ that depend only on the constants appearing in the moment condition; see equation (2.2) later.
2 Main results
Assumption 1. We say that $W=W^{\ast }\in \mathbf {C}^{N\times N}$ is a real symmetric/complex Hermitian Wigner matrix if the entries $(w_{ab})_{a\le b}$ in the upper triangular part are independent and satisfy
$$ w_{ab}\stackrel {\mathrm {d}}{=} N^{-1/2}\times \begin{cases} \chi _{\mathrm {od}}, & a< b,\\ \chi _{\mathrm {d}}, & a=b, \end{cases} \tag{2.1} $$
for some real random variable $\chi _{\mathrm {d}}$ and some real/complex random variable $\chi _{\mathrm {od}}$ of mean $\operatorname {\mathbf {E}} \chi _{\mathrm {d}}=\operatorname {\mathbf {E}} \chi _{\mathrm {od}}=0$ and variances $\operatorname {\mathbf {E}}|\chi _{\mathrm {od}}|^2=1$ , $\operatorname {\mathbf {E}}\chi _{\mathrm {od}}^2=0$ , $\operatorname {\mathbf {E}} \chi _{\mathrm {d}}^2=1$ in the complex, and $\operatorname {\mathbf {E}}\chi _{\mathrm {od}}^2=1$ , $\operatorname {\mathbf {E}} \chi _{\mathrm {d}}^2=2$ in the real case [Footnote 3]. We furthermore assume that for every $ n\ge 3$ ,
$$ \operatorname {\mathbf {E}}|\chi _{\mathrm {d}}|^n+\operatorname {\mathbf {E}}|\chi _{\mathrm {od}}|^n\le C_n \tag{2.2} $$
for some constant $C_n$ ; in particular, all higher-order cumulants $\kappa _n^{\mathrm {d}},\kappa _n^{\mathrm {od}}$ of $\chi _{\mathrm {d}}, \chi _{\mathrm {od}} $ are finite for any n.
Our results hold for both symmetry classes, but for definiteness, we prove the main results in the real case, the changes for the complex case being minimal.
For a spectral parameter $z\in \mathbf {C}$ with $\eta :=|\Im z|\gg N^{-1}$ , the resolvent $G=G(z)=(W-z)^{-1}$ of an $N\times N$ Wigner matrix W is well approximated by a constant multiple $m\cdot I$ of the identity matrix, where $m=m(z)$ is the Stieltjes transform of the semicircular distribution $\sqrt {4-x^2}/(2\pi )$ and satisfies the equation
$$ -\frac {1}{m(z)}=m(z)+z, \qquad \Im m(z)\, \Im z>0. \tag{2.3} $$
We set $\rho (z):= \frac {1}{\pi }|\Im m(z)|$ , which approximates the density of eigenvalues near $\Re z$ in a window of size $\eta $ .
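As a quick sanity check of these definitions, the sketch below evaluates $m(z)$ , verifies the quadratic relation $m^2+zm+1=0$ (equivalent to $-1/m=m+z$ ) and compares $\frac {1}{\pi }|\Im m(x+\mathrm {i}\eta )|$ with the semicircle density for a small $\eta $ . The function names, the test point and the normalisation by $1/\pi $ (the usual Stieltjes inversion convention) are assumptions made for illustration.

```python
import cmath
import math


def semicircle_m(z):
    """Root of m^2 + z m + 1 = 0 selected by the side condition Im m * Im z > 0."""
    root = cmath.sqrt(z * z - 4)
    m = (-z + root) / 2
    if m.imag * z.imag < 0:  # switch to the other root of the quadratic
        m = (-z - root) / 2
    return m


def rho(z):
    """Harmonic extension of the density: |Im m(z)| / pi (assumed normalisation)."""
    return abs(semicircle_m(z).imag) / math.pi


def semicircle_density(x):
    """Wigner semicircle density sqrt(4 - x^2) / (2 pi) on [-2, 2], zero outside."""
    return math.sqrt(max(4 - x * x, 0.0)) / (2 * math.pi)


z = complex(-0.7, 1e-6)
m = semicircle_m(z)
residual = abs(m * m + z * m + 1)          # should vanish up to rounding
density_gap = abs(rho(z) - semicircle_density(z.real))
print(residual, density_gap)
```

The explicit branch selection matters: the principal square root returned by `cmath.sqrt` gives the wrong sign of $\Im m$ for spectral parameters with negative real part, which is why the sketch tests the side condition instead of relying on the closed formula alone.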
We first recall the classical local law for Wigner matrices in both its tracial and isotropic forms [Reference Erdős, Schlein and Yau27, Reference Erdős, Yau and Yin29, Reference He, Knowles and Rosenthal34, Reference Knowles and Yin38]:
Theorem 2.1. Fix any $\epsilon>0$ ; then it holds that
uniformly in any deterministic vectors $\boldsymbol {x}, \boldsymbol {y}$ and spectral parameter z with and $\Re z\in \mathbf {R}$ , where .
Our main result is the following optimal multi-resolvent local law with Hilbert-Schmidt norm error terms. Compared to Theorem 2.1, we formulate the bound only in an averaged sense since, due to the Hilbert-Schmidt norm in the error term, the isotropic bound is a special case with one of the traceless matrices being a centred rank-one matrix; see Corollary 2.4.
Theorem 2.2 (Averaged multi-resolvent local law).
Fix $\epsilon>0$ , let $k\ge 1$ , and consider $z_1,\ldots ,z_{k}\in \mathbf {C}$ with $N\eta \rho \ge N^{\epsilon }$ , for , and let $A_1,\ldots ,A_k$ be deterministic traceless matrices, $\langle A_i\rangle =0$ . Set $G_i:= G(z_i)$ and $m_i:= m(z_i)$ for all $i\le k$ . Then we have the local law on optimal scale [Footnote 4]
Remark 2.3. We also obtain generalisations of Theorem 2.2, where each G may be replaced by a product of Gs and ; see Lemma 3.1 later.
Due to the Hilbert-Schmidt sense of the error term, we obtain an isotropic variant of Theorem 2.2 as an immediate corollary by choosing in equation (2.5).
Corollary 2.4 (Isotropic local law).
Under the setup and conditions of Theorem 2.2, for any vectors $\boldsymbol {x},\boldsymbol {y}$ , it holds that
We now compare Theorem 2.2 to the previous result [Reference Cipolloni, Erdős and Schröder18, Theorem 2.5], where an error term was proven for equation (2.5). For clarity, we focus on the really interesting $d<10$ regime.
Remark 2.5. For $k=1$ , our new estimate for traceless A,
is strictly better than the one in [Reference Cipolloni, Erdős and Schröder18, Theorem 2.5], since always holds, but can be much smaller than for small rank A. In addition, equation (2.7) features an additional factor $\sqrt {\rho }\lesssim 1$ that is considerably smaller than 1 near the spectral edges.
For larger $k\ge 2$ , the relationship depends on the relative size of the Hilbert-Schmidt and operator norm of the $A_i$ s as well as on the size of $\eta $ . We recall [Reference Rudelson and Vershynin46] that the numerical rank of A is defined as $r(A):= \|A\|_{\mathrm {hs}}^2/\|A\|^2$ , where $\|A\|_{\mathrm {hs}}$ denotes the (unnormalised) Hilbert-Schmidt norm, and say that A is $\alpha $ -mesoscopic for some $\alpha \in [0,1]$ if $r(A)=N^{\alpha }$ . If for some $k\ge 2$ all $A_i$ are $\alpha $ -mesoscopic, then Theorem 2.2 improves upon [Reference Cipolloni, Erdős and Schröder18, Theorem 2.5] whenever $\eta \ll N^{( 1 -\alpha k) /(k-1)}$ .
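To get a feeling for these quantities, the following sketch computes the numerical rank from a list of singular values and evaluates the exponent $(1-\alpha k)/(k-1)$ . It assumes the Rudelson-Vershynin convention $r(A)=\|A\|_{\mathrm {hs}}^2/\|A\|^2$ ; the function names and the example (a coordinate projection of rank 100 in dimension $10^4$ , not made traceless) are purely illustrative.

```python
from fractions import Fraction


def numerical_rank(singular_values):
    """Numerical rank: squared Hilbert-Schmidt norm over squared operator norm."""
    hs_squared = sum(s * s for s in singular_values)
    op_squared = max(singular_values) ** 2
    return hs_squared / op_squared


def eta_threshold_exponent(alpha, k):
    """Exponent e such that the comparison in the text reads eta << N^e, for k >= 2."""
    if k < 2:
        raise ValueError("the comparison is only stated for k >= 2")
    return (1 - alpha * k) / (k - 1)


# Projection onto 100 coordinates in dimension N = 10**4, so r(A) = 100 = N**(1/2).
singular_values = [1.0] * 100 + [0.0] * 9900
rank = numerical_rank(singular_values)
alpha = Fraction(1, 2)
exponents = {k: eta_threshold_exponent(alpha, k) for k in (2, 3, 4)}
print(rank, exponents)
```

For $\alpha =1/2$ the exponents are $0$ , $-1/4$ and $-1/3$ for $k=2,3,4$ , so in this example the window of $\eta $ in which the comparison favours the Hilbert-Schmidt bound shrinks as k grows.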
Local laws on optimal scales can give certain information on eigenvectors as well. Let $\lambda _1\le \lambda _2 \le \ldots \le \lambda _N$ denote the eigenvalues and $\{ {\boldsymbol u}_i\}_{i=1}^N$ the corresponding orthonormal eigenvectors of W. Already the single-resolvent isotropic local law given by equation (2.4) implies the eigenvector delocalisation: that is, that $\| {\boldsymbol u}_i \|_{\infty } \prec N^{-1/2}$ . More generally, $|\langle {\boldsymbol x}, {\boldsymbol u}_i\rangle |\prec N^{-1/2}\|{\boldsymbol x}\|$ [Footnote 5]: that is, eigenvectors behave as completely random unit vectors in the sense of considering their rank- $1$ projections onto any deterministic vector ${\boldsymbol x}$ . This concept can be greatly extended to arbitrary deterministic observable matrix A, leading to the following results motivated both by thermalisation ideas from physics [Reference D’Alessio, Kafri, Polkovnikov and Rigol21, Reference Deutsch22, Reference Eckhardt, Fishman, Keating, Agam, Main and Müller23, Reference Feingold and Peres30] and by quantum (unique) ergodicity (QUE) in mathematics [Reference Anantharaman and Le Masson2, Reference Anantharaman and Sabri3, Reference Bauerschmidt, Huang and Yau4, Reference Bauerschmidt, Knowles and Yau5, Reference Colin de Verdière20, Reference Luo and Sarnak41, Reference Marklof and Rudnick44, Reference Rudnick and Sarnak47, Reference Snirelman48, Reference Soundararajan49, Reference Zelditch54, Reference Zelditch55].
Theorem 2.6 (Eigenstate thermalisation hypothesis).
Let W be a Wigner matrix satisfying Assumption 1, and let $\delta>0$ . Then for any deterministic matrix A and any bulk indices $i,j\in [\delta N,(1-\delta )N]$ , it holds that
where $\mathring {A}:= A-\langle A\rangle $ is the traceless part of A.
Remark 2.7.
1. The result given by equation (2.8) was established in [Reference Cipolloni, Erdős and Schröder15] with the Hilbert-Schmidt norm of $\mathring {A}$ replaced by the operator norm $\|\mathring {A}\|$ , uniformly in the spectrum (i.e., also for edge indices).
2. For rank- $1$ matrices $A=\boldsymbol {x}\boldsymbol {x}^{\ast }$ , the bound given by equation (2.8) immediately implies the complete delocalisation of eigenvectors in the form $|\langle \boldsymbol {x}, {\boldsymbol u}_i\rangle |\prec N^{-1/2}\|\boldsymbol {x}\|$ .
Theorem 2.6 directly follows from the bound
that is obtained by the spectral decomposition of both resolvents and the well-known eigenvalue rigidity, with some explicit $\delta $ -dependent constants $C_{\delta }$ and $\epsilon =\epsilon (\delta )>0$ (see [Reference Cipolloni, Erdős and Schröder15, Lemma 1] for more details). The right-hand side can be directly estimated using equation (2.5); and finally, choosing $\eta = N^{-1+\xi }$ for any small $\xi>0$ gives equation (2.8) and thus proves Theorem 2.6.
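The spectral-decomposition step can be sketched as follows. This is a reconstruction under assumptions rather than the precise statement used above: we take $z=E+\mathrm {i}\eta $ , $z'=E'+\mathrm {i}\eta $ with $\eta >0$ , write $\langle |A|^2\rangle =\frac {1}{N}\operatorname {Tr} AA^*$ , and assume that the two-resolvent quantity on the right is of order $\langle |A|^2\rangle $ .

```latex
\begin{align*}
\Im G(z) &= \sum_{i=1}^N \frac{\eta}{(\lambda_i-E)^2+\eta^2}\,\boldsymbol{u}_i\boldsymbol{u}_i^*, \\
\langle \Im G(z)\,A\,\Im G(z')\,A^*\rangle
 &= \frac{1}{N}\sum_{i,j=1}^N
    \frac{\eta}{(\lambda_i-E)^2+\eta^2}\;
    \frac{\eta}{(\lambda_j-E')^2+\eta^2}\;
    \big|\langle \boldsymbol{u}_i, A\boldsymbol{u}_j\rangle\big|^2 \\
 &\ge \frac{1}{4N\eta^2}\,\big|\langle \boldsymbol{u}_i, A\boldsymbol{u}_j\rangle\big|^2
 \qquad\text{if } |\lambda_i-E|\le\eta,\ |\lambda_j-E'|\le\eta, \\
N\,\big|\langle \boldsymbol{u}_i, A\boldsymbol{u}_j\rangle\big|^2
 &\le 4\,(N\eta)^2\,\langle \Im G(z)\,A\,\Im G(z')\,A^*\rangle
 \;\lesssim\; N^{2\xi}\,\langle |A|^2\rangle
 \qquad\text{for } \eta=N^{-1+\xi}.
\end{align*}
```

All summands in the second line are non-negative, which is what allows a single pair $(i,j)$ to be isolated. Since the eigenvalues are random, E and $E'$ cannot simply be set equal to $\lambda _i,\lambda _j$ ; eigenvalue rigidity is what permits deterministic choices at the cost of the constants $C_{\delta }$ and an extra small power of N.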
The next question is to establish a central limit theorem for the diagonal overlap in equation (2.8).
Theorem 2.8 (Central limit theorem in the QUE).
Let W be a real symmetric ( $\beta =1$ ) or complex Hermitian ( $\beta =2$ ) Wigner matrix satisfying Assumption 1. Fix small $\delta ,\delta '>0$ , and let $A=A^*$ be a deterministic $N\times N$ matrix with . In the real symmetric case, we also assume that $A\in \mathbf {R}^{N\times N}$ is real. Then for any bulk index $i\in [\delta N, (1-\delta ) N]$ , we have a central limit theorem
with $\mathcal {N}$ being a standard real Gaussian random variable. Moreover, for any moment, the speed of convergence is explicit (see equation (B.5)).
We require that in order to ensure that the spectral distribution of $\mathring {A}$ is not concentrated on a finite number of eigenvalues: that is, that $\mathring {A}$ has effective rank $\gg 1$ . Indeed, the statement in equation (2.9) does not hold for finite-rank As: for example, if $A=\mathring {A}=|{\boldsymbol e}_x\rangle \langle {\boldsymbol e}_x|-|{\boldsymbol e}_y\rangle \langle {\boldsymbol e}_y|$ , for some $x\ne y\in [N]$ , then , which is the difference of two asymptotically independent $\chi ^2$ -distributed random variables (e.g., see [Reference Bourgade and Yau10, Theorem 1.2]). More generally, the joint distribution of finitely many eigenvector overlaps has been identified in [Reference Aggarwal, Lopatto and Marcinek1, Reference Bourgade and Yau10, Reference Bourgade, Huang and Yau11, Reference Marcinek43] for various related ensembles.
3 Proof of Theorem 2.2
In this section, we prove Theorem 2.2 in the critical $d<10$ regime. The $d\ge 10$ regime is handled similarly, but the estimates are much simpler; the necessary modifications are outlined in Appendix A.
In the subsequent proof, we will often assume that a priori bounds, with some control parameters $\psi _K^{\mathrm {av}/\mathrm {iso}}\ge 1$ , of the form
for certain indices $K\ge 0$ have been established uniformly [Footnote 6] in deterministic traceless matrices $\boldsymbol A=(A_1,\ldots ,A_K)$ , deterministic vectors $\boldsymbol {x}, \boldsymbol {y}$ and spectral parameters $\boldsymbol z=(z_1,\ldots ,z_K)$ with $N\eta \rho \ge N^{\epsilon }$ . We stress that we do not assume the estimates to be uniform in K. Note that $\psi _0^{\mathrm {av}}$ is defined somewhat differently from $\psi _K^{\mathrm {av}}$ , $K\ge 1$ , but the definition of $\psi ^{\mathrm {iso}}_K$ is the same for all $K\ge 0$ . For intuition, the reader should think of the control parameters as essentially order-one quantities; in fact, our main goal will be to prove this fact. Note that by Theorem 2.1, we may set $\psi _0^{\mathrm {av}/\mathrm {iso}}=1$ .
As a first step, we observe that equations (3.1), (3.2) and (3.3) immediately imply estimates on more general averaged resolvent chains and isotropic variants.
Lemma 3.1. (i) Assuming equations (3.1) and (3.3) for $K=0$ hold uniformly in $z_1$ , then for any $z_1,\ldots ,z_l$ with $N\eta \rho \ge N^{\epsilon }$ , it holds that
where $m[z_1,\ldots ,z_l]$ stands for the lth divided difference of the function $m(z)$ from equation (2.3), explicitly
(ii) Assuming for some $k\ge 1$ the estimates given by equations (3.2) and (3.3) for $K=k$ have been established uniformly, then for $\mathcal {G}_j:=G_{j,1}\cdots G_{j,l_j}$ with , traceless matrices $A_i$ and , $\rho := \max _{j,i} \rho (z_{j,i})$ , it holds that
where
and $g_{j,i}(x)=(x-z_{j,i})^{-1}$ or , depending on whether $G_{j,i}=G(z_{j,i})$ or .
Proof. Analogous to [Reference Cipolloni, Erdős and Schröder18, Lemma 3.2].
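Divided differences of $m(z)$ are easy to evaluate numerically. The sketch below compares the recursive definition with the standard explicit formula $\sum _j m(z_j)\prod _{i\ne j}(z_j-z_i)^{-1}$ for distinct points; whether this is exactly the explicit form intended in the lemma is an assumption, and the function names and sample points are illustrative.

```python
import cmath


def semicircle_m(z):
    """Stieltjes transform of the semicircle law with Im m * Im z > 0."""
    root = cmath.sqrt(z * z - 4)
    m = (-z + root) / 2
    if m.imag * z.imag < 0:
        m = (-z - root) / 2
    return m


def divided_difference(f, points):
    """Recursive definition: f[z1] = f(z1) and
    f[z1..zl] = (f[z2..zl] - f[z1..z(l-1)]) / (zl - z1)."""
    if len(points) == 1:
        return f(points[0])
    upper = divided_difference(f, points[1:])
    lower = divided_difference(f, points[:-1])
    return (upper - lower) / (points[-1] - points[0])


def divided_difference_explicit(f, points):
    """Explicit sum over nodes, valid when all points are distinct."""
    total = 0
    for j, zj in enumerate(points):
        denominator = 1
        for i, zi in enumerate(points):
            if i != j:
                denominator *= zj - zi
        total += f(zj) / denominator
    return total


points = [complex(0.3, 0.5), complex(-1.0, 0.2), complex(1.2, -0.4)]
recursive = divided_difference(semicircle_m, points)
explicit = divided_difference_explicit(semicircle_m, points)
print(recursive, explicit)
```

The two evaluations agree up to rounding for distinct points. For a single point the divided difference reduces to $m(z_1)$ itself, and for two points it is the difference quotient $(m(z_1)-m(z_2))/(z_1-z_2)$ .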
The main result of this section is the following hierarchy of master inequalities.
Proposition 3.2 (Hierarchy of master inequalities).
Fix $k\ge 1$ , and assume that equations (3.2) and (3.3) have been established uniformly in $\boldsymbol A$ and $\boldsymbol z$ with $N\eta \rho \ge N^{\epsilon }$ for all $ K\le 2k$ . Then it holds that
with the definition
where the sum is taken over an arbitrary number of non-negative integers $k_i$ , with $k_i\ge 1$ for $i\ge 3$ , under the condition that their sum does not exceed k (in the case of only one nonzero $k_1$ , the second factor and product in equation (3.10) are understood to be one and $\Phi _0=1$ ).
This hierarchy has the structure that each $\Psi ^{\mathrm {av}/\mathrm {iso}}_k$ is estimated partly by $\psi $ s with an index higher than k, which potentially is uncontrollable even if the coefficient of the higher-order terms is small (recall that $1/(N\eta )$ and $1/(N\eta \rho )$ are small quantities). Thus the hierarchy must be complemented by another set of inequalities that estimate higher-indexed $\Psi $ s with smaller-indexed ones even at the expense of a large constant. The success of this scheme eventually depends on the relative size of these small and large constants, so it is very delicate. We prove the following reduction inequalities to estimate the $\psi _l^{\mathrm {av}/\mathrm {iso}}$ terms with $k+1\le l\le 2k$ in equations (3.8) and (3.9) by $\psi $ s with indices smaller than or equal to k.
Lemma 3.3 (Reduction lemma).
Fix $1\le j\le k$ , and assume that equations (3.2) and (3.3) have been established uniformly for $K\le 2k$ . Then it holds that
and for even k also that
The rest of the present section is structured as follows: in Section 3.1, we prove equation (3.8), and in Section 3.2, we prove equation (3.9). Then, in Section 3.3, we prove Lemma 3.3 and conclude the proof of Theorem 2.2. Before starting the main proof, we collect some trivial estimates between Hilbert-Schmidt and operator norms using matrix Hölder inequalities.
Lemma 3.4. For $N\times N$ matrices $B_1,\ldots ,B_k$ and $k\ge 2$ , it holds that
and
In the sequel, we often drop the indices from $G,A$ ; hence we write $(GA)^k$ for $G_1A_1\ldots G_kA_k$ and assume without loss of generality that $A_i=A_i^{\ast }$ and . We also introduce the convention in this paper that matrices denoted by capital A letters are always traceless.
3.1 Proof of averaged estimate given by equation (3.8) in Proposition 3.2
We now identify the leading contribution of . For any matrix-valued function $f(W)$ , we define the second moment renormalisation, denoted by underlining, as
$$ \underline {Wf(W)}:= Wf(W)-\widetilde {\operatorname {\mathbf {E}}}\, \widetilde W (\partial _{\widetilde W} f)(W) \tag{3.15} $$
in terms of the directional derivative $\partial _{\widetilde W}$ in the direction of an independent GUE-matrix $\widetilde W$ . The motivation for the second moment renormalisation is that by Gaussian integration by parts, it holds that $\operatorname {\mathbf {E}} W f(W)=\widetilde {\operatorname {\mathbf {E}}}\widetilde W (\partial _{\widetilde W} f)(W)$ whenever W is a Gaussian random matrix of zero mean and $\widetilde W$ is an independent copy of W. In particular, it holds that $\operatorname {\mathbf {E}}\underline {Wf(W)}=0$ whenever W is a GUE-matrix, while $\operatorname {\mathbf {E}}\underline {Wf(W)}$ is small but nonzero for GOE or non-Gaussian matrices. By concentration and universality, we expect that to leading order $Wf(W)$ may be approximated by $\widetilde {\operatorname {\mathbf {E}}} \widetilde W(\partial _{\widetilde W}f)(W)$ . Here the directional derivative $\partial _{\widetilde W}f$ should be understood as
$$ (\partial _{\widetilde W} f)(W):= \lim _{\epsilon \to 0}\frac {f(W+\epsilon \widetilde W)-f(W)}{\epsilon }. $$
In our application, the function $f(W)$ is always a (product of) matrix resolvents $G(z)=(W-z)^{-1}$ and possibly deterministic matrices $A_i$ . This time, we view the resolvent as a function of W, $G(W)= (W-z)^{-1}$ for any fixed z. By the resolvent identity, it follows that
$$ \partial _{\widetilde W} G=-G\widetilde W G, $$
while the expectation of a product of GUE-matrices acts as an averaged trace in the sense
where I denotes the identity matrix and $(\Delta ^{ab})_{cd}:=\delta _{ac}\delta _{bd}$ . Therefore, for instance, we have the identities
Finally, we want to comment on the choice of renormalising with respect to an independent GUE rather than a GOE matrix. This is purely a matter of convenience, and we could equally have chosen the GOE-renormalisation. Indeed, we have
and therefore, for instance,
which is a negligible difference. Our formulas below will be slightly simpler with our choice in equation (3.15), even though now $\operatorname {\mathbf {E}}\underline {W f(W)}$ is not exactly zero even for $W \sim \mathrm {GOE}$ .
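For the simplest case $f(W)=G$ the renormalisation can be worked out by hand. The computation below is a sketch assuming the definition $\underline {Wf(W)}=Wf(W)-\widetilde {\operatorname {\mathbf {E}}}\widetilde W(\partial _{\widetilde W}f)(W)$ , the GUE rule $\widetilde {\operatorname {\mathbf {E}}}\,\widetilde W B\widetilde W=\langle B\rangle I$ under the normalisation $\operatorname {\mathbf {E}}|\widetilde w_{ab}|^2=1/N$ , and the relation $-1/m=m+z$ .

```latex
\begin{align*}
\partial_{\widetilde W} G &= -\,G\,\widetilde W\,G
 &&\text{(from } G(W+\epsilon\widetilde W)-G(W) = -\epsilon\, G(W+\epsilon\widetilde W)\,\widetilde W\, G(W)\text{)},\\
\widetilde{\mathbf{E}}\,\widetilde W(\partial_{\widetilde W}G)
 &= -\,\widetilde{\mathbf{E}}\,\widetilde W G\,\widetilde W\,G = -\langle G\rangle\, G, \\
\underline{WG} &= WG + \langle G\rangle\, G
 = I + zG + \langle G\rangle\, G
 &&\text{(using } (W-z)G = I\text{)},\\
 &= I - \frac{1}{m}\,G + \langle G-m\rangle\, G
 &&\text{(using } z+m = -1/m\text{)},\\
G &= m - m\,\underline{WG} + m\,\langle G-m\rangle\, G .
\end{align*}
```

The last line is an approximate self-consistent equation for $G-m$ : once the fluctuation term $\underline {WG}$ is shown to be small, the remaining term carries the small factor $\langle G-m\rangle $ , so G is close to $m\cdot I$ .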
Lemma 3.5. We have
where $\mathcal {E}_1^{\mathrm {av}}=0$ and
for $k\ge 3$ .
Proof. We start with the expansion
due to
where for $k=1$ the first two terms in the right-hand side of equation (3.19) are not present. In the second step, we extended the underline renormalisation to the entire product $\underline {WG_1A_1G_2\cdots G_kA_k}$ at the expense of generating additional terms collected in the summation; this identity can be obtained directly from the definition given by equation (3.15). Note that in the first line of equation (3.19), we moved the term coming from of equation (3.20) to the left-hand side, causing the error $\mathcal {O}_{\prec }(\psi _0^{\mathrm {av}}/(N\eta ))$ . For $k\ge 2$ , using Lemmas 3.1 and 3.4, we estimate the second term in the second line of equation (3.19) by
For the first term in the second line of equation (3.19), we distinguish the cases $k=2$ and $k\ge 3$ . In the former, we write
where we used Lemma 3.4 to estimate
In case $k\ge 3$ , we estimate
Note that the leading deterministic term of was simply estimated as
From equation (3.24), we write , where the second term can simply be estimated as , due to Lemma 3.4, and included in the error term. Collecting all other error terms from equations (3.21) and (3.24) and recalling $\psi _j^{\mathrm {av}/\mathrm {iso}}\ge 1\gtrsim \sqrt {\rho /(N\eta )}$ for all j, we obtain equation (3.17) with the definition of $\mathcal {E}_k$ from equation (3.18).
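For orientation, the single-resolvent identity that starts this proof can be recovered in a few lines. The derivation below is a sketch, assuming only $WG=I+zG$ , the semicircle equation $-1/m=z+m$ and the relation $\underline {WG}=WG+\langle G\rangle G$ for the GUE-renormalisation of one resolvent; indices on $G$ and $m$ are suppressed, and the form of equation (3.20) in the paper may differ in notation.

```latex
\begin{align*}
  \underline{WG}
    &= WG + \langle G\rangle G
     = I + zG + \langle G\rangle G \\
    &= I + (z+m)\,G + \langle G-m\rangle G
     = I - \frac{G}{m} + \langle G-m\rangle G .
\end{align*}
% Multiply by m and solve for G:
\[
  G = m - m\,\underline{WG} + m\,\langle G-m\rangle\, G .
\]
```

The last term contains the factor $\langle G-m\rangle $ , which is of size $\psi _0^{\mathrm {av}}/(N\eta )$ in the sense of stochastic domination; this is consistent with the error $\mathcal {O}_{\prec }(\psi _0^{\mathrm {av}}/(N\eta ))$ quoted in the proof.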
Lemma 3.5 reduces understanding the local law to the underlined term in equation (3.19) since $\mathcal {E}_k^{\mathrm {av}}$ will be treated as an error term. For the underlined term, we use a cumulant expansion when calculating the high moment for any fixed integer p. Here we will again make a notational simplification, ignoring different indices in G, A and m; in particular, we may write
by choosing $G=G(\overline {z_i})$ for half of the factors.
We set $\partial _{ab}:= \partial /\partial w_{ab}$ as the derivative with respect to the $(a,b)$ -entry of W: that is, we consider $w_{ab}$ and $w_{ba}$ as independent variables in the following cumulant expansion (such an expansion was first used in the random matrix context in [Reference Khorunzhy, Khoruzhenko and Pastur35] and later revived in [Reference He and Knowles33, Reference Lee and Schnelli39]):
Technically, we use a truncated version of the expansion above; see, for example, [Reference Erdős, Krüger and Schröder26, Reference He and Knowles33]. We thus compute (see Footnote 7)
recalling Assumption 1 for the diagonal and off-diagonal cumulants. The summation runs over all indices $a,b\in [N]$ . The second cumulant calculation in equation (3.27) used the fact that by definition of the underline renormalisation the $\partial _{ba}$ -derivative in the first line may not act on its own $(GA)^k$ .
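For reference, the scalar identity underlying this computation can be stated as follows. This is a sketch of the standard one-variable cumulant expansion, with $h$ , $f$ , $\kappa _j$ and the truncation order $\ell $ as illustrative names; the matrix version used in the text applies it entrywise, with $w_{ab}$ and $w_{ba}$ treated as independent variables, and the remainder $R_{\ell +1}$ is left unspecified here.

```latex
% h: real random variable with cumulants \kappa_1, \kappa_2, ...; f: smooth test function
\[
  \operatorname{\mathbf{E}}\, h f(h)
    = \sum_{j=0}^{\ell} \frac{\kappa_{j+1}(h)}{j!}\,
      \operatorname{\mathbf{E}} f^{(j)}(h) \;+\; R_{\ell+1} .
\]
% Exact check with f(h) = h and \ell = 1 (no remainder):
\[
  \operatorname{\mathbf{E}}\, h^2
    = \kappa_1 \operatorname{\mathbf{E}} h + \kappa_2
    = \kappa_1^2 + \kappa_2 .
\]
% Centred Gaussian case: only \kappa_2 is nonzero (Gaussian integration by parts)
\[
  \operatorname{\mathbf{E}}\, h f(h) = \kappa_2\, \operatorname{\mathbf{E}} f'(h) .
\]
```

The $j=1$ term is the second-cumulant (Gaussian) contribution, while the terms with $j\ge 2$ involve higher cumulants and higher derivatives; these are the terms that require the product-rule bookkeeping.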
For the first term of equation (3.27), we use $\partial _{ab}G=-G\Delta ^{ab}G$ due to equation (3.16) with $\widetilde W=\Delta ^{ab}$ so that using $G^t=G$ , we can perform the summation and obtain
from Lemma 3.1, estimating the deterministic leading term of by as in equation (3.25). The first prefactor in the right-hand side of equation (3.28) is already written as the square of the target size $N^{k/2-1}\sqrt {\rho /(N\eta )}$ for ; see equation (2.5).
For the second term of equation (3.27), we estimate
recalling that $G=G^t$ since W is real symmetric (see Footnote 8).
For the second line of equation (3.27), we define the set of multi-indices $\boldsymbol l = (l_1, l_2, \ldots , l_n)$ with arbitrary length n, denoted by and total size $k=\sum _i l_i$ as
Note that the set $\mathcal {I}_k^{\mathrm {d}}$ is a finite set with cardinality depending only on $k,p$ . We distribute the derivatives according to the product rule to estimate
where for the multiset J, we define and set
Here, for the multiset $J\subset \mathcal {I}_k^{\mathrm {d}}$ , we defined its cardinality by $|J|$ and set . Along the product rule, the multi-index $\boldsymbol l$ encodes how the first factor $[(GA)^k]_{aa}$ in equation (3.30) is differentiated, while each element $\boldsymbol j\in J$ is a multi-index that encodes how another factor is differentiated. Note that $|J|$ is the number of such factors affected by derivatives; the remaining $p-1-|J|$ factors are untouched.
For the third line of equation (3.27), we similarly define the appropriate index set that is needed to encode the product rule (see Footnote 9)
Note that in addition to the multi-index $\boldsymbol l$ encoding the distribution of the derivatives after the Leibniz rule similarly to the previous diagonal case, the second element $\boldsymbol \alpha $ of the new type of indices also keeps track of whether, after the differentiations, the corresponding factor is evaluated at $ab, ba, aa$ or $bb$ . While a single $\partial _{ab}$ or $\partial _{ba}$ acting on results in an off-diagonal term of the form $[(GA)^kG]_{ab}$ or $[(GA)^kG]_{ba}$ , a second derivative also produces diagonal terms. The derivative action on the first factor $[(GA)^k]_{ba} $ in the third line of equation (3.27) produces diagonal factors already after one derivative. The restriction in equation (3.32) that the number of $aa$ - and $bb$ -type diagonal elements must coincide comes from a simple counting of diagonal indices along derivatives: when an additional $\partial _{ab}$ hits an off-diagonal term, then either one $aa$ and one $bb$ diagonal are created or none. Similarly, when an additional $\partial _{ab}$ hits a diagonal $aa$ term, then one diagonal $aa$ remains, along with a new off-diagonal $ab$ . In any case, the difference between the $aa$ and $bb$ diagonals is unchanged.
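The counting argument can be made concrete for a single resolvent factor. The block is a sketch, assuming the convention of the text that $w_{ab}$ and $w_{ba}$ are independent variables, so that $\partial _{ab}G=-G\Delta ^{ab}G$ with $(\Delta ^{ab})_{cd}=\delta _{ac}\delta _{bd}$ ; longer factors such as $(GA)^lG$ follow the same pattern, with one term for each resolvent in the chain.

```latex
% Entrywise form of the derivative
\[
  \partial_{ab} G_{cd} = -(G\Delta^{ab}G)_{cd} = -G_{ca}\,G_{bd} .
\]
% The four possible positions (c,d) in {a,b}^2
\begin{align*}
  \partial_{ab} G_{ab} &= -G_{aa}\,G_{bb}, &
  \partial_{ab} G_{ba} &= -G_{ba}\,G_{ba}, \\
  \partial_{ab} G_{aa} &= -G_{aa}\,G_{ba}, &
  \partial_{ab} G_{bb} &= -G_{ba}\,G_{bb} .
\end{align*}
```

In each line, the number of $aa$ factors minus the number of $bb$ factors is the same before and after differentiation: an off-diagonal entry produces either one $aa$ and one $bb$ factor or two off-diagonal factors, and a diagonal entry reproduces itself together with one off-diagonal factor (recall $G=G^t$ in the real symmetric case, so $ab$ and $ba$ entries coincide).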
Armed with this notation, similarly to equation (3.30), we estimate
where for the multiset $J\subset \mathcal {I}_k^{\mathrm {od}}$ , we define and set
Note that equation (3.33) is an overestimate: not all indices $(\boldsymbol j,\boldsymbol \beta )$ indicated in equation (3.34) can actually occur after the Leibniz rule.
Lemma 3.6. For any $k\ge 1$ , it holds that
By combining Lemma 3.5 and equations (3.27), (3.28), (3.30) and (3.33) with Lemma 3.6 and using a simple Hölder inequality, we obtain, for any fixed $\xi>0$ , that
where we used the $\Xi _k^{\mathrm {d}}$ term to add back the $a=b$ part of the summation in equation (3.33) compared to equation (3.27). By taking p large enough and $\xi $ arbitrarily small and using the definition of $\prec $ and the fact that the bound given by equation (3.36) holds uniformly in the spectral parameters and the deterministic matrices, we conclude the proof of equation (3.8).
Proof of Lemma 3.6.
The proof repeatedly uses equation (3.3) in the form
with $\boldsymbol e_b$ being the bth coordinate vector, where we estimated the deterministic leading term $m^k(A^k)_{ab}$ by using equation (3.14). Recalling the normalisation , the best available bound on is ; however, this can be substantially improved under a summation over the index b:
Using equations (3.37) and (3.38) for each entry of equations (3.31) and (3.34), we obtain the following naive (or a priori) estimates on $\Xi _k^{\mathrm {d/od}}$
where we defined
Note that $\Omega _k\le \Phi _k$ just by choosing $k_1=k_2=0$ in the definition of $\Phi _k$ , equation (3.10), and thus $\Omega _k/\sqrt {N}\lesssim \Phi _k \sqrt {\rho /(N\eta )}$ since $1\lesssim \rho /\eta $ . Hence equation (3.35) follows trivially from equation (3.40) for $\Xi _k^{\mathrm {d}}$ and $\Xi _k^{\mathrm {od}}$ whenever or , respectively: that is, when the exponent of N in equation (3.40) is nonpositive.
In the rest of the proof, we consider the remaining diagonal D1 and off-diagonal cases O1–O3 that we will define below. The cases are organised according to the quantity that captures by how many factors of $N^{1/2}$ the naive estimate given by equation (3.40) exceeds the target in equation (3.35) when all $\Phi $ s and $\psi $ s are set to be order one. Within case O1, we further differentiate whether an off-diagonal index pair $ab$ or $ba$ appears at least once in the tuple $\boldsymbol \alpha $ or in one of the tuples $\boldsymbol \beta $ . Within case O2, we distinguish according to the length of and as follows:
- D1
- O1
  - O1a $ab\vee ba \in \boldsymbol \alpha \cup \bigcup _{({\boldsymbol {j}},\boldsymbol \beta )\in J} \boldsymbol \beta $
  - O1b and : that is, and
- O2
  - O2a ,
  - O2b , ,
  - O2c , , $l_1\ge 1$ ,
  - O2d , , $l_1= 0$ .
- O3
The list of four cases above is exhaustive since by definition, and the subcases of O2 are obviously exhaustive. Within case O1, either some off-diagonal element appears in $\boldsymbol \alpha $ or some $\boldsymbol \beta $ (hence we are in case O1a), or the number of elements in $\boldsymbol \alpha $ and all $\boldsymbol \beta $ is even; compare to the constraint on the number of diagonal elements in equation (3.32). The latter case is only possible if , , which is case O1b (note that implies , and is impossible as it would imply , the number of elements in $\boldsymbol \alpha $ , is odd).
Now we give the estimates for each case separately. For case D1, using the restriction in the summation in equation (3.33) to get , we estimate
where we used the first inequalities of equations (3.37) and (3.38) for the $(GA)^k$ and one of the $(GA)^kG$ factors and the second inequality of equation (3.37) for the remaining factors, and in the last step, we used equation (3.39) and $\psi _k^{\mathrm {iso}}\sqrt {\rho /\eta }\gtrsim 1$ . Finally, we use Young’s inequality . This confirms equation (3.35) in case D1.
For the off-diagonal cases, we will use the following so-called Ward-improvements:
- I1 Averaging over a or b in gains a factor of $\sqrt {\rho /(N\eta )}$ compared to equation (3.37).
- I2 Averaging over a in gains a factor of $\sqrt {\rho /(N\eta )}$ compared to equation (3.38),
at the expense of replacing a factor of $(1+\psi _k^{\mathrm {iso}}\sqrt {\rho /(N\eta )})$ in the definition of $\Omega _k$ by a factor of $(1+\psi ^{\mathrm {iso}}_{2k}/\sqrt {N\eta \rho })^{1/2}$ . These latter replacements necessitate changing $\Omega _k$ to the larger $\Phi _k$ as a main control parameter in the estimates after Ward improvements. Indeed, I1 and I2 follow directly from equation (3.6) of Lemma 3.1 and , more precisely
where the first step in each case followed from a Schwarz inequality and summing up the indices explicitly. This improvement is essentially equivalent to using the Ward-identity $GG^*= \Im G/\eta $ in equation (3.43).
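The Ward identity itself follows from the resolvent identity in one line. The sketch below assumes a Hermitian $W$ and a spectral parameter with $\Im z=\eta >0$ , and uses $\Im G:=(G-G^*)/(2\mathrm {i})$ ; for $\Im z<0$ the same computation applies with $\eta =|\Im z|$ and the roles of $G$ and $G^*$ exchanged.

```latex
% G = (W-z)^{-1}, G^* = (W-\bar z)^{-1}
\[
  G - G^{*}
    = G\,\bigl[(W-\bar z)-(W-z)\bigr]\,G^{*}
    = (z-\bar z)\,G G^{*}
    = 2\mathrm{i}\,\eta\, G G^{*},
  \qquad\text{hence}\qquad
  G G^{*} = \frac{\Im G}{\eta} .
\]
% Entrywise consequence: summing |G_{ab}|^2 over one index
\[
  \frac{1}{N}\sum_{b} |G_{ab}|^{2}
    = \frac{(GG^{*})_{aa}}{N}
    = \frac{\Im G_{aa}}{N\eta} .
\]
```

Since $\Im G_{aa}$ is comparable to $\Im m\sim \rho $ by the single resolvent local law, the averaged square of an off-diagonal entry is of order $\rho /(N\eta )$ rather than of order one, which is the origin of the factor $\sqrt {\rho /(N\eta )}$ gained in I1 and I2.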
Now we collect these gains over the naive bound given in equation (3.40) for each case. Note that whenever a factor $\sqrt {\rho /(N\eta )}$ is gained, the additional $1/\sqrt {N}$ is freed up along the second inequality in equation (3.40), and it can be used to compensate for the positive N-powers.
For case O3, we have and estimate all but the first two $(\boldsymbol j,\boldsymbol \beta )$ factors in equation (3.34) trivially, using the last inequality in equation (3.37) to obtain
For the last two factors, we use the first inequality in equation (3.37) and then estimate as
where in the second step, we performed a Schwarz inequality for the double $a, b$ summation and used the last bound in equations (3.43), (3.39) and $1\lesssim \psi _k^{\mathrm {iso}}\sqrt { \rho /\eta }$ . Thus, we conclude
In case O2a, there exists some ${\boldsymbol {j}}$ with (recall that ). By estimating the remaining J-terms trivially by equation (3.37), we obtain
for some $j_1+j_2=k$ and double indices $\beta _1,\beta _2 \in \{ aa, bb, ab, ba\}$ . Here, in the second step, we assumed without loss of generality $j_1\ge 1$ (the case $j_2\ge 1$ being completely analogous) and used the first inequality in equation (3.37) for and the second inequality in equation (3.37) for . Finally, in the last step, we performed an $a,b$ -Schwarz inequality, using the last bound in equations (3.43) and (3.39).
In case O2b, we have for all ${\boldsymbol {j}}$ since implies , and we estimate all but two J-factors trivially by the last inequality in equation (3.37), the other two J-factors (which are necessarily off-diagonal) by the first inequality in equation (3.37), the $l_1$ -factor by the last inequality in equation (3.37) and the $l_2$ factor by the first inequality in equation (3.38) (note that $l_2\ge 1$ ) to obtain
where the last step used equation (3.39) and $\psi _k^{\mathrm {iso}} \sqrt {\rho /\eta }\gtrsim 1$ .
In case O2c, we use the first inequalities of equations (3.37) and (3.38) for the $l_1,l_2$ -terms (since $l_1,l_2\ge 1$ ) and the first inequality of equation (3.37) for the $(GA)^kG$ factor to obtain
by equation (3.39).
In case O2d, we write the single-$G$ diagonal as
and use isotropic resummation for the leading m term into the $\boldsymbol 1=(1,1,\ldots )$ vector of norm $\lVert \boldsymbol 1\rVert =\sqrt {N}$ , that is,
and estimate
using the first inequalities of equations (3.37) and (3.38).
In case O1a, we use either I1 or I2, depending on whether the off-diagonal matrix is of the form $(GA)^lG$ or $(GA)^l$ , to gain one factor of $\sqrt {\rho /(N\eta )}$ in either case and conclude equation (3.35).
Finally, we consider case O1b, where there is no off-diagonal element to perform Ward-improvement, but for which, using equation (3.39), we estimate
for any exponents with $k_1+k_2=k_3+k_4=k$ . Here, in case $k_4>0$ , we used the second inequalities of equations (3.37) and (3.38) for the $k_2,k_4$ factors and the first inequality of equation (3.37) for the $k_1,k_3$ factors. The case $k_4=0$ is handled similarly, with the same result, by estimating $[(GA)^{k_3}G]_{aa}$ instead of $[(GA)^{k_4}G]_{bb}$ using the first inequality of equation (3.37).
3.2 Proof of the isotropic estimate given by equation (3.9) in Proposition 3.2
First we state the isotropic version of Lemma 3.5:
Lemma 3.7. For any deterministic unit vectors $\boldsymbol {x}, \boldsymbol {y}$ and $k\ge 0$ , we have
where $\mathcal {E}_0^{\mathrm {iso}}=0$ and for $k\ge 1$
Proof. From equation (3.20) applied to the first factor $G=G_1$ , similarly to equation (3.19), we obtain
where we used the definition in equation (3.3) for the first term and the definition in equation (3.15). An estimate analogous to equation (3.21) handles the sum and is incorporated in equation (3.53). This concludes the proof together with Lemma 3.1 and .
Exactly as in equation (3.27), we perform a cumulant expansion
recalling Assumption 1 for the diagonal and off-diagonal cumulants. In fact, the formula in equation (3.55) is identical to equation (3.27) for $k+1$ instead of k if the last $A=A_{k+1}$ in the product $(GA)^{k+1}= G_1 A_1G_2 A_2\ldots G_{k+1} A_{k+1}$ is chosen specifically $A_{k+1}= \boldsymbol {y}\boldsymbol {x}^*$ .
For the first line of equation (3.55), after performing the derivative, we can also perform the summations and estimate the resulting isotropic resolvent chains by using the last inequality of equation (3.37) as well as Lemma 3.1 to obtain
For the second line of equation (3.55), we estimate
For the third and fourth lines of equation (3.55), we distribute the derivatives according to the product rule to estimate (with the absolute value inside the summation to address both diagonal and off-diagonal terms)
where
and the summation in equation (3.57) is performed over all $\boldsymbol j=(j_0,\ldots ,j_n) \in \mathbf {N}_0^{n+1}$ with $j_0\ge 0$ , $j_1,\ldots ,j_n\ge 1$ and . Recall that $\sum {\boldsymbol j}=j_0+ j_1+ j_2+\ldots +j_n$ .
Lemma 3.8. For any admissible $\boldsymbol j$ in the summation of equation (3.57), it holds that
By combining Lemmas 3.7 and 3.8 and equations (3.56), (3.57) and (3.58), we obtain
concluding the proof of equation (3.9).
Proof of Lemma 3.8.
We recall the notations $\Omega _k,\Phi _k$ from equations (3.10) and (3.41). For a naive bound, we estimate all but the first factor trivially in equation (3.58) with
Note that the estimate is independent of the number of derivatives. For the first factor in equation (3.58), we estimate, after performing the derivatives, all but the last $[(GA)^{k_i}G]$ -factor (involving $\boldsymbol y$ ) trivially by equation (3.37) as
By combining equations (3.61) and (3.62) and the Schwarz-inequality
we conclude
which implies equation (3.59) in the case when $\sum \boldsymbol j\ge n+2$ using that $\Omega _k\le \Phi _k$ and $\rho /\eta \gtrsim 1$ . It thus only remains to consider the cases $\sum \boldsymbol j=n$ and $\sum \boldsymbol j=n+1$ .
If $\sum \boldsymbol j=n$ , then $n\ge 2$ and $j_0=0$ , $j_1=j_2=\cdots =1$ . By estimating the $j_2,j_3,\ldots $ factors in equation (3.58) using equation (3.61), we then bound
using and $\Omega _k\le \Phi _k$ , $1\lesssim \rho /\eta $ in the last step.
Finally, if $\sum \boldsymbol j=n+1$ , then $n\ge 1$ by admissibility and either $j_0=0$ or $j_1=1$ . In the first case, we estimate the $j_2,j_3,\ldots $ factors in equation (3.58) using equation (3.61) and all but the first $[(GA)^jG]_{\boldsymbol {x}\cdot }$ in the $j_1$ -factor after differentiation trivially to obtain
again using a Schwarz inequality. Finally, in the $j_1=1$ case, we estimate two $j_0$ -factor using equation (3.62), the $j_2,j_3,\ldots $ factors trivially and to bound
where we used the trivial bound for the in order to estimate the remaining terms by a Schwarz inequality. This completes the proof of the lemma.
3.3 Reduction inequalities and bootstrap
In this section, we prove the reduction inequalities in Lemma 3.3 and conclude the proof of our main result Theorem 2.2 showing that $\psi _k^{\mathrm {av}/\mathrm {iso}}\lesssim 1$ for any $k\ge 0$ .
Proof of Lemma 3.3.
The proof of this lemma is very similar to [Reference Cipolloni, Erdős and Schröder18, Lemma 3.6]; we thus present only the proof in the averaged case. Additionally, we only prove the case when k is even; if k is odd, the proof is completely analogous.
Define $T=T_k:=A(GA)^{k/2-1}$ , write $(GA)^{2k} = GTGTGTGT$ , and use the spectral theorem for these four intermediate resolvents. Then, using that $|m_i|\lesssim 1$ and that
, after a Schwarz inequality in the third line, we conclude that
We remark that to bound
in terms of $\psi _k^{\mathrm {av}}$ , we used (ii) of Lemma 3.1 together with $G^*(z) = G(\bar z)$ .
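The algebra behind this reduction can be sketched as follows, for even $k$ and with the indices on the resolvents suppressed as in the text, so that all four intermediate resolvents are written $G=G(z)$ (a simplification). The notation $T_{ij}:=\langle {\boldsymbol u}_i, T{\boldsymbol u}_j\rangle $ for the matrix elements of $T$ in an orthonormal eigenbasis $\{{\boldsymbol u}_i\}$ of $W$ , with eigenvalues $\lambda _i$ , is introduced here for illustration.

```latex
% Regrouping the chain of length 2k into four blocks
\[
  GT = GA\,(GA)^{k/2-1} = (GA)^{k/2}
  \quad\Longrightarrow\quad
  GTGTGTGT = (GA)^{2k} .
\]
% Spectral theorem for the four intermediate resolvents
\[
  G = \sum_{i} \frac{\boldsymbol u_i \boldsymbol u_i^{*}}{\lambda_i - z}
  \quad\Longrightarrow\quad
  \langle (GA)^{2k}\rangle
    = \frac{1}{N}\sum_{i_1,i_2,i_3,i_4}
      \frac{T_{i_1 i_2}\,T_{i_2 i_3}\,T_{i_3 i_4}\,T_{i_4 i_1}}
           {(\lambda_{i_1}-z)(\lambda_{i_2}-z)(\lambda_{i_3}-z)(\lambda_{i_4}-z)} .
\]
```

After taking absolute values, a Schwarz inequality in the eigenvalue sums separates this four-fold sum into factors that involve only shorter chains; this is, roughly, the mechanism by which a chain of length $2k$ is controlled in terms of $\psi _k^{\mathrm {av}}$ .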
We are now ready to conclude the proof of our main result.
Proof of Theorem 2.2.
The proof repeatedly uses a simple argument called iteration. By this, we mean the following observation: whenever we know that $X\prec x$ implies
for some constants $B\ge N^{\delta }$ , $A,C>0$ and exponent $0<\alpha <1$ , and we know that $X\prec N^D$ initially (here $\delta , \alpha $ and D are N-independent positive constants; other quantities may depend on N), then we also know that $X\prec x$ implies
The proof is simply to iterate equation (3.68) finitely many times (depending only on $\delta , \alpha $ and D). The fact that $\Psi _k^{\mathrm {av}/\mathrm {iso}}\prec N^D$ follows by a simple norm bound on the resolvents and A, so the condition $X\prec N^D$ is always satisfied in our applications.
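A sketch of why the iteration closes is given below. It assumes that the implication has the self-improving form $X\prec A+x/B+x^{1-\alpha }C^{\alpha }$ and that the conclusion is $X\prec A+C$ ; this form is an assumption suggested by the constants $A,B,C,\alpha $ named in the text. The auxiliary exponent $\epsilon $ , the sequence $x_j$ , and the mild lower bound $A+C\ge N^{-D'}$ for some fixed $D'$ are likewise introduced only for illustration.

```latex
% Young's inequality u^{1-\alpha} v^{\alpha} \le u + v with
% u = N^{-\epsilon} x and v = N^{\epsilon(1-\alpha)/\alpha} C
\[
  x^{1-\alpha} C^{\alpha}
    = \bigl(N^{-\epsilon}x\bigr)^{1-\alpha}
      \bigl(N^{\epsilon(1-\alpha)/\alpha}C\bigr)^{\alpha}
    \le N^{-\epsilon}x + N^{\epsilon(1-\alpha)/\alpha}\,C .
\]
% With B \ge N^{\delta}, one step of the implication therefore gives
\[
  X \prec x
  \quad\Longrightarrow\quad
  X \prec A + N^{\epsilon(1-\alpha)/\alpha}\,C
          + \bigl(N^{-\delta}+N^{-\epsilon}\bigr)\,x .
\]
% Iterating from x_0 = N^{D} with r := 2 N^{-\min\{\delta,\epsilon\}} \le 1/2
\[
  x_{j+1} := A + N^{\epsilon(1-\alpha)/\alpha}C + r\,x_j
  \quad\Longrightarrow\quad
  x_J \le 2\bigl(A + N^{\epsilon(1-\alpha)/\alpha}C\bigr) + r^{J} N^{D} .
\]
```

After $J$ steps, with $J$ depending only on $\delta ,\epsilon ,D,D'$ , the last term is below $N^{-D'}\le A+C$ ; since $\epsilon $ is arbitrary and $\prec $ absorbs factors $N^{\epsilon (1-\alpha )/\alpha }$ , this gives $X\prec A+C$ under the stated assumptions.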
By the standard single resolvent local laws in equation (2.4), we know that $\psi _0^{\mathrm {av}}=\psi _0^{\mathrm {iso}}=1$ . Using the master inequalities in Proposition 3.2 and the reduction bounds from Lemma 3.3, in the first step, we will show that $\Psi _k^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-k/4}$ for any $k\ge 1$ as an a priori bound. Then, in the second step, we feed this bound into the tandem of the master inequalities and the reduction bounds to improve the estimate to $\Psi _k^{\mathrm {av}/\mathrm {iso}}\prec 1$ . The first step is the critical stage of the proof; here we need to show that our bounds are sufficiently strong to close the hierarchy of our estimates to yield a better bound on $\Psi _k^{\mathrm {av}/\mathrm {iso}}$ than the trivial $\Psi _k^{\mathrm {av}/\mathrm {iso}}\le N^{k/2}\eta ^{-k-1}$ estimate obtained by using the norm bounds $\lVert G\rVert \le 1/\eta $ and $\lVert A\rVert \le \sqrt {N}\langle |A|^2\rangle ^{1/2}$ . Once some improvement is achieved, it can be relatively easily iterated.
The proof of $\Psi _k^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-k/4} $ proceeds by a step-two induction: we first prove that $\Psi _k^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-k/4} $ for $k=1,2$ and then show that if $\Psi _n^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-n/4} $ holds for all $n\le k-2$ , for some $k\ge 4$ , then it also holds for $\Psi _{k-1}^{\mathrm {av}/\mathrm {iso}}$ and $\Psi _{k}^{\mathrm {av}/\mathrm {iso}}$ .
Using equations (3.8)–(3.9), we have
for $k=1$ , using
Similarly, for $k=2$ , estimating explicitly
by Schwarz inequalities and plugging it into equations (3.8)–(3.9), we have
In these estimates, we frequently used that $\psi _k^{\mathrm {av}/\mathrm {iso}}\ge 1$ , $\rho \lesssim 1$ , $\rho /(N\eta )\le 1$ and $N\eta \rho \ge 1$ to simplify the formulas.
By equations (3.70)–(3.71), using iteration for the sum $\Psi _1^{\mathrm {av}}+\Psi _1^{\mathrm {iso}}$ , we readily conclude
Note that since equation (3.72) holds uniformly in the hidden parameters $A, z, \boldsymbol {x}, \boldsymbol {y}$ in $\Psi _1^{\mathrm {av}/\mathrm {iso}}$ , this bound serves as an upper bound on $\psi _1^{\mathrm {av}}+\psi _1^{\mathrm {iso}}$ (in the sequel, we will frequently use an already proven upper bound on $\Psi _k$ as an effective upper bound on $\psi _k$ in the next steps without explicitly mentioning it). Next, using this upper bound together with an iteration for $\Psi _2^{\mathrm {av}}+\Psi _2^{\mathrm {iso}}$ , we have from equation (3.71)
again after several simplifications by Young’s inequality and the basic inequalities $\psi _k^{\mathrm {av}/\mathrm {iso}}\ge 1$ , $\rho \lesssim 1$ and $N\eta \rho \ge 1$ .
We now apply the reduction inequalities from Lemma 3.3 in the form
where the first inequality was already inserted into the right-hand side of equation (3.12) to get the second inequality in equation (3.74).
Then, inserting equations (3.74) and (3.72) into equation (3.73) and using iteration, we conclude
which together with equation (3.72) implies
We now proceed with a step-two induction on k. The initial step of the induction is equation (3.76). Fix an even $k\ge 4$ , and assume that $\Psi _n^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-n/4}$ holds for any $n\le k-2$ . In this case, by substituting this bound for $\psi _n^{\mathrm {av}/\mathrm {iso}}$ whenever possible, for any even $l\le k$ , we have the following upper bounds on $\Phi _l$ and $\Phi _{l-1}$ :
We now plug equation (3.77) into equations (3.8) and (3.9) and, again using the bound $\Psi _n^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-n/4}$ , $n\le k-2$ , whenever possible, get
and
By iteration for $ \Psi _{k-1}^{\mathrm {av}}+ \Psi _{k-1}^{\mathrm {iso}}$ from equation (3.78), we thus get
where we used that $\mathfrak {E}_{k-2}\le \mathfrak {E}_{k-1}$ . Then using iteration for $ \Psi _{k}^{\mathrm {av}}+ \Psi _{k}^{\mathrm {iso}}$ from equation (3.79), we have
where we used that $\mathfrak {E}_{k-1}\le \mathfrak {E}_k$ .
We will now use the reduction inequalities from Lemma 3.3 in the following form:
and
for any $j\le l/2$ , where $l\le k-2$ is even. In the last step, we also used the last line of equation (3.82) to estimate $\psi _{2(l-2j)}^{\mathrm {av}}$ . Then by plugging equation (3.83) into equation (3.77), we readily conclude that
for any $r\le k$ .
Plugging equation (3.84) into equations (3.80) and (3.81) and using iteration, we conclude
We will now additionally use that by equation (3.12) for any $r\in \{k-1,k\}$ , we have
and that
for any $2\le j\le k-1$ .
Plugging these bounds, together with equation (3.82) for $j=k-1$ and $j=k$ , into equation (3.85), and using iteration first for $ \Psi _{k-1}^{\mathrm {av}}+\Psi _{k-1}^{\mathrm {iso}}$ and then for $ \Psi _{k}^{\mathrm {av}}+\Psi _{k}^{\mathrm {iso}}$ , we conclude that $\Psi _{k-1}^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-(k-1)/4}$ and that $\Psi _k^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-k/4}$ . This completes the step-two induction and hence the first and pivotal step of the proof.
In the second step, we improve the bounds $ \Psi _k^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-k/4}$ to $ \Psi _k^{\mathrm {av}/\mathrm {iso}}\prec 1$ for all k. Recall that the definition of stochastic domination given by equation (1.8) involved an arbitrary small but fixed exponent $\xi $ . Now we fix this $\xi $ , a large exponent D, and fix an upper threshold K for the indices. Our goal is to prove that
where the supremum over all indicated parameters is meant in the sense described below equation (3.3).
Now we distinguish two cases in the supremum over the collection of spectral parameters $\boldsymbol z$ in equation (3.87). In the regime where $\rho =\rho (\boldsymbol z) \ge N^{-\xi /K}$ , the bounds $\Psi _k^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-k/4}$ , already proven for all k, imply equation (3.87). Hence we can work in the opposite regime where $\rho < N^{-\xi /K}$ , and from now on, we restrict the supremum in equation (3.87) to this parameter regime. By plugging this bound into the master inequalities in Proposition 3.2 and noticing that $\Phi _k \le 1 + \rho ^{-k/4} (N\eta \rho )^{-1/4}$ , we directly conclude that
for any $k\ge 0$ . Now we can use this improved inequality by again plugging it into the master inequalities to achieve
and so on. Recalling the assumption that $N\eta \rho \ge N^{\epsilon }$ and recalling that $\rho \gtrsim \eta ^{1/2}\ge N^{-1/3}$ , we need to iterate this process finitely many times (depending on k, $\xi , K, \epsilon $ ) to also achieve $\Psi _k^{\mathrm {av}/\mathrm {iso}}\prec 1$ in the second regime. This concludes the proof of the theorem.
4 Stochastic eigenstate equation and proof of Theorem 2.8
Armed with the new local law (Theorem 2.2) and its direct corollary on the eigenvector overlaps (Theorem 2.6), the rest of the proof of Theorem 2.8 is very similar to the proof of [Reference Cipolloni, Erdős and Schröder17, Theorem 2.2], which is presented in [Reference Cipolloni, Erdős and Schröder17, Sections 3 and 4]. For this reason, we only explain the differences and refer to [Reference Cipolloni, Erdős and Schröder17] for a fully detailed proof. We mention that the proof in [Reference Cipolloni, Erdős and Schröder17] relies heavily on the theory of the stochastic eigenstate equation initiated in [Reference Bourgade and Yau10] and then further developed in [Reference Bourgade, Yau and Yin12, Reference Marcinek43].
Similarly to [Reference Cipolloni, Erdős and Schröder17, Sections 3-4], we present the proof only in the real case (the complex case is completely analogous and so omitted). We will prove Theorem 2.8 dynamically: that is, we consider the Dyson Brownian motion (DBM) with initial condition W and show that the overlaps of the eigenvectors have Gaussian fluctuations after a time t slightly bigger than $N^{-1}$ . With a separate argument in Appendix B, we show that the (small) Gaussian component added along the DBM flow can be removed at the price of a negligible error.
More precisely, we consider the matrix flow
where $\widetilde {B}_t$ is a standard real symmetric matrix Brownian motion (see, for example, [Reference Bourgade and Yau10, Definition 2.1]). We denote the resolvent of $W_t$ by $G=G_t(z):=(W_t-z)^{-1}$ , for $z\in \mathbf {C}\setminus \mathbf {R}$ . It is well known that in the limit $N\to \infty $ , the resolvent $G_t(z)$ becomes approximately deterministic and that its deterministic approximation is given by the scalar matrix $m_t\cdot I$ . The function $m_t=m_t(z)$ is the unique solution of the complex Burgers equation
with initial condition $m_0(z)=m_{\mathrm {sc}}(z)$ being the Stieltjes transform of the semicircular law. Denote $\rho _t=\rho _t(z):=\pi ^{-1}\Im m_t(z)$ ; then it is easy to see that $\rho _t(x+\mathrm {i} 0)$ is a rescaling of $\rho _0=\rho _{\mathrm {sc}}$ by a factor $1+t$ . In fact, $W_t$ is a Wigner matrix itself, with normalisation $\operatorname {\mathbf {E}} | (W_t)_{ab}|^2 = N^{-1}(1+t)$ and a Gaussian component.
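As a consistency check, a Burgers-type equation can be verified directly for the semicircle family. The sketch assumes that $m_t$ is the Stieltjes transform, in the convention $m(z)=\int \rho (x)(x-z)^{-1}\,\mathrm {d}x$ , of the semicircle law with variance $1+t$ (matching $\operatorname {\mathbf {E}} | (W_t)_{ab}|^2 = N^{-1}(1+t)$ ). Signs and time normalisation depend on these conventions, so the form obtained here need not coincide literally with the equation used in the paper.

```latex
% Self-consistent equation for variance 1+t; at t = 0 it reduces to m^2 + z m + 1 = 0
\[
  (1+t)\,m_t(z)^2 + z\,m_t(z) + 1 = 0 .
\]
% Differentiate in t and in z
\begin{align*}
  m_t^2 + \bigl[2(1+t)m_t + z\bigr]\,\partial_t m_t &= 0, \\
  m_t   + \bigl[2(1+t)m_t + z\bigr]\,\partial_z m_t &= 0 .
\end{align*}
% Eliminate the common bracket
\[
  \partial_t m_t(z) = m_t(z)\,\partial_z m_t(z) .
\]
```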
Denote by $\lambda _1(t)\le \lambda _2(t)\le \dots \le \lambda _N(t)$ the eigenvalues of $W_t$ , and let $\{{\boldsymbol u}_i(t)\}_{i\in [N]}$ be the corresponding eigenvectors. Then it is known [Reference Bourgade and Yau10, Theorem 2.3] that $\lambda _i=\lambda _i(t)$ , ${\boldsymbol u}_i={\boldsymbol u}_i(t)$ are the unique strong solutions of the following system of stochastic differential equations:
where $B_t=(B_{ij})_{i,j\in [N]}$ is a standard real symmetric matrix Brownian motion (see, for example, [Reference Bourgade and Yau10, Definition 2.1]).
Note that the flow for the diagonal overlaps , by equation (4.4), naturally also depends on the off-diagonal overlap . Hence, even if we are only interested in diagonal overlaps, our analysis must also handle off-diagonal overlaps. In particular, this implies that there is no closed differential equation for only diagonal or only off-diagonal overlaps. However, in [Reference Bourgade, Yau and Yin12], Bourgade, Yau and Yin proved that the perfect matching observable $f_{{\boldsymbol \lambda },t}$ , which is presented in equation (4.6) below, satisfies a parabolic PDE (see equation (4.10) below). We now describe how the observable $f_{{\boldsymbol \lambda },t}$ is constructed.
4.1 Perfect matching observables
Without loss of generality for the rest of the paper, we assume that A is traceless, $\langle A\rangle =0$ : that is, $A=\mathring {A}$ . We introduce the shorthand notation for the eigenvector overlaps
To compute the moments, we will consider monomials of eigenvector overlaps of the form $\prod _k p_{i_k j_k}$ , where each index occurs an even number of times. We start by introducing a particle picture and a certain graph that encode such monomials: each particle on the set of integers $[N]$ corresponds to two occurrences of an index i in the monomial product. This particle picture was introduced in [Reference Bourgade and Yau10] and heavily used in [Reference Bourgade, Yau and Yin12, Reference Marcinek43]. Each particle configuration is encoded by a function ${\boldsymbol \eta }:[N] \to \mathbf {N}_0$ , where $\eta _j:={\boldsymbol \eta }(j)$ denotes the number of particles at the site j and $n({\boldsymbol \eta }):=\sum _j \eta _j= n$ is the total number of particles. We denote the space of n-particle configurations by $\Omega ^n$ . Moreover, for any index pair $i\ne j\in [N]$ , we define ${\boldsymbol \eta }^{ij}$ to be the configuration obtained by moving a particle from site i to site j; if there is no particle in i, then ${\boldsymbol \eta }^{ij}:={\boldsymbol \eta }$ .
We now define the perfect matching observable (introduced in [Reference Bourgade, Yau and Yin12]) for any given configuration ${\boldsymbol \eta }$ :
with n being the number of particles in the configuration ${\boldsymbol \eta }$ . Here $\mathcal {G}_{\boldsymbol \eta }$ denotes the set of perfect matchings on the complete graph with vertex set
and
where $e=\{(i_1,a_1),(i_2,a_2)\}\in \mathcal {V}_{\boldsymbol \eta }^2$ , and $\mathcal {E}(G)$ denotes the edges of G. Note that in equation (4.6), we took the conditioning on the entire flow of eigenvalues, ${\boldsymbol \lambda } =\{\boldsymbol \lambda (t)\}_{t\in [0,T]}$ for some fixed $T>0$ . From now on, we will always assume that $T\ll 1$ (even if not stated explicitly).
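A small example may help fix the combinatorics. It is a sketch under assumptions about the displays of this subsection: that the vertex set consists of $2\eta _i$ copies of each site, $\mathcal {V}_{\boldsymbol \eta }=\{(i,a): i\in [N],\ 1\le a\le 2\eta _i\}$ , that a matching $G$ is weighted by $P(G)=\prod _{e\in \mathcal {E}(G)}p(e)$ with $p(e)=p_{i_1i_2}$ for $e=\{(i_1,a_1),(i_2,a_2)\}$ , and that $p_{ij}=p_{ji}$ (as in the real case with a real symmetric $A$ ). Any normalising prefactor of the observable is omitted.

```latex
% Two particles at distinct sites i \ne j: \eta_i = \eta_j = 1, n = 2
\[
  \mathcal V_{\boldsymbol\eta} = \{(i,1),(i,2),(j,1),(j,2)\},
  \qquad
  \sum_{G\in\mathcal G_{\boldsymbol\eta}} P(G) = p_{ii}\,p_{jj} + 2\,p_{ij}^{2} .
\]
% The three perfect matchings:
%   {(i,1),(i,2)}, {(j,1),(j,2)}  ->  p_{ii} p_{jj}
%   {(i,1),(j,1)}, {(i,2),(j,2)}  ->  p_{ij}^2
%   {(i,1),(j,2)}, {(i,2),(j,1)}  ->  p_{ij}^2
% n particles at a single site i: 2n vertices, (2n-1)!! perfect matchings
\[
  \eta_i = n
  \quad\Longrightarrow\quad
  \sum_{G\in\mathcal G_{\boldsymbol\eta}} P(G) = (2n-1)!!\; p_{ii}^{\,n} .
\]
```

The factor $(2n-1)!!$ is the $2n$ -th moment of a standard Gaussian, which is consistent with the role of the single-site configuration in detecting Gaussian fluctuations of the diagonal overlaps.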
We always assume that the entire eigenvalue trajectory $\{\boldsymbol \lambda (t)\}_{t\in [0,T]}$ satisfies the usual rigidity estimate asserting that the eigenvalues are very close to the deterministic quantiles of the semicircle law with very high probability. To formalise it, we define
for any $\xi>0$ , where $\widehat {i}:=i\wedge (N+1-i)$ . Here $\gamma _i(t)$ denote the quantiles of $\rho _t$ , defined by
where $\rho _t(x)= \frac {1}{2(1+t)\pi }\sqrt {(4(1+t)-x^2)_+}$ is the semicircle law of variance $1+t$ corresponding to $W_t$ . Note that $|\gamma _i(t)-\gamma _i(s)|\lesssim |t-s|$ for any bulk index i and any $t,s\ge 0$ .
The well-known rigidity estimate (see, for example, [Reference Erdős, Knowles, Yau and Yin24, Theorem 7.6] or [Reference Erdős, Yau and Yin29]) asserts that
for any (small) $\xi>0$ and (large) $D>0$ . This was proven for any fixed t (see, for example, [Reference Erdős, Knowles, Yau and Yin24, Theorem 7.6] or [Reference Erdős, Yau and Yin29]); the extension to all t follows by a grid argument together with the fact that ${\boldsymbol \lambda }(t)$ is stochastically $1/2$ -Hölder in t, which follows by Weyl’s inequality
with $s\le t$ and $U_1,U_2$ being independent GUE/GOE matrices that are also independent of W.
By [Reference Bourgade, Yau and Yin12, Theorem 2.6], we know that the perfect matching observable $f_{{\boldsymbol \lambda },t}$ is a solution of the following parabolic discrete PDE
where
Note that the number of particles $n=n({\boldsymbol \eta })$ is preserved under the flow of equation (4.10). The eigenvalue trajectories are fixed in this proof; hence we will often omit ${\boldsymbol \lambda }$ from the notation: for example, we will use $f_t=f_{{\boldsymbol \lambda }, t}$ , and so on.
The main technical input in the proof of Theorem 2.8 is the following result (compare to [Reference Cipolloni, Erdős and Schröder17, Proposition 3.2]):
Proposition 4.1. For any $n\in \mathbf {N}$ , there exists $c(n)>0$ such that for any $\epsilon>0$ , and for any $T\ge N^{-1+\epsilon }$ , it holds
with very high probability, where the supremum is taken over configurations ${\boldsymbol \eta } \in \Omega ^n $ supported in the bulk: that is, such that $\eta _i=0$ for $i\notin [\delta N, (1-\delta ) N]$ , with $\delta>0$ from Theorem 2.8. The implicit constant in equation (4.13) depends on n, $\epsilon $ , $\delta $ .
We are now ready to prove Theorem 2.8.
Proof of Theorem 2.8.
Fix $i\in [\delta N,(1-\delta ) N]$ . Then the convergence in equation (2.9) follows immediately from equation (4.13), choosing ${\boldsymbol \eta }$ to be the configuration with $\eta _i=n$ and all other $\eta _j=0$ , together with a standard application of the Green function comparison theorem (GFT), relating the eigenvectors/eigenvalues of $W_T$ to those of W; see Appendix B, where we recall the GFT argument for completeness. We refer the interested reader to [Reference Cipolloni, Erdős and Schröder17, Proof of Theorem 2.2] for a more detailed proof.
4.2 DBM analysis
Since the current DBM analysis of equation (4.10) heavily relies on [Reference Cipolloni, Erdős and Schröder17, Section 4], before starting it, we introduce an equivalent representation of equation (4.6) used in [Reference Cipolloni, Erdős and Schröder17] (which itself is based on the particles representation from [Reference Marcinek43]).
Fix $n\in \mathbf {N}$ , and consider configurations ${\boldsymbol \eta }\in \Omega ^n$ : that is, such that $\sum _j\eta _j=n$ . We now give an equivalent representation of equations (4.10) and (4.11) that is defined on the $2n$ -dimensional lattice $[N]^{2n}$ instead of configurations of n particles (see [Reference Cipolloni, Erdős and Schröder17, Section 4.1] for a more detailed description). Let ${\boldsymbol x}\in [N]^{2n}$ , and define the configuration space
$$\Lambda ^n:=\{{\boldsymbol x}\in [N]^{2n}\,:\,n_i({\boldsymbol x})\text { is even for every } i\in [N]\},$$
where
$$n_i({\boldsymbol x}):=|\{a\in [2n]\,:\,x_a=i\}|$$
for all $i\in \mathbf {N}$ .
The correspondence between these two representations is given by
$${\boldsymbol \eta }\leftrightarrow {\boldsymbol x},\qquad \eta _i=\frac {n_i({\boldsymbol x})}{2}. \tag{4.16}$$
Note that ${\boldsymbol x}$ uniquely determines ${\boldsymbol \eta }$ , but ${\boldsymbol \eta }$ determines only the coordinates of ${\boldsymbol x}$ as a multiset and not its ordering. Let $\phi \colon \Lambda ^n\to \Omega ^n$ , $\phi ({\boldsymbol x})={\boldsymbol \eta }$ be the projection from the ${\boldsymbol x}$ -configuration space to the ${\boldsymbol \eta }$ -configuration space using equation (4.16). We will then always consider functions g on $[N]^{2n}$ that are push-forwards of some function f on $\Omega ^n$ , $g= f\circ \phi $ : that is, they correspond to functions on the configurations
$$f({\boldsymbol \eta })=f(\phi ({\boldsymbol x}))=g({\boldsymbol x}).$$
In particular, g is supported on $\Lambda ^n$ , and it is equivariant under permutation of the arguments: that is, it depends on ${\boldsymbol x}$ only as a multiset. We thus consider the observable
$$g_t({\boldsymbol x})=g_{{\boldsymbol \lambda },t}({\boldsymbol x}):=f_{{\boldsymbol \lambda },t}(\phi ({\boldsymbol x})),$$
where $ f_{{\boldsymbol \lambda },t}$ was defined in equation (4.6).
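The passage between the two representations can be made concrete with a short sketch. It assumes that the projection $\phi $ is given by $\eta _i=n_i({\boldsymbol x})/2$ , where $n_i({\boldsymbol x})$ counts the coordinates of ${\boldsymbol x}$ equal to i, and that ${\boldsymbol x}\in \Lambda ^n$ means that every $n_i({\boldsymbol x})$ is even; the names `phi` and `lift_to_x` are hypothetical.

```python
from collections import Counter
from itertools import permutations

def phi(x, N):
    """Map x in [N]^{2n} to the configuration eta with eta_i = n_i(x) / 2.

    Assumption: n_i(x) is the number of coordinates of x equal to i, and
    x lies in Lambda^n exactly when every n_i(x) is even. Sites are 1..N.
    """
    counts = Counter(x)
    if any(c % 2 for c in counts.values()):
        raise ValueError("x is not in Lambda^n: some site has odd multiplicity")
    return tuple(counts.get(i, 0) // 2 for i in range(1, N + 1))

def lift_to_x(f, N):
    """Return g = f o phi, a function of x that ignores the ordering of x."""
    return lambda x: f(phi(x, N))

if __name__ == "__main__":
    # A toy observable on configurations: the sum of the particle positions.
    g = lift_to_x(lambda eta: sum(k * e for k, e in enumerate(eta, 1)), 4)
    # All reorderings of x give the same value of g.
    print({g(p) for p in permutations((1, 1, 4, 4))})
```

In particular, the configuration with $\eta _i=n$ and all other $\eta _j=0$ corresponds to the single point ${\boldsymbol x}=(i,\dots ,i)\in [N]^{2n}$ .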
Using the ${\boldsymbol x}$ -representation space, we can now write the flow of equations (4.10) and (4.11) as follows:
where
with ${\boldsymbol e}_a(c)=\delta _{ac}$ , $a,c\in [2n]$ . This flow is a map of functions defined on $\Lambda ^n\subset [N]^{2n}$ , and it preserves equivariance.
We now define the scalar product and the natural measure on $\Lambda ^n$ :
as well as the norm on $L^p(\Lambda ^n)$ :
By [Reference Marcinek43, Appendix A.2], it follows that the operator $\mathcal {L}=\mathcal {L}(t)$ is symmetric with respect to the measure $\pi $ , and it is a negative operator on $L^2(\Lambda ^n)$ with Dirichlet form
Let $\mathcal {U}(s,t)$ be the semigroup associated to $\mathcal {L}$ : that is, for any $0\le s\le t$ , it holds
$$\partial _t\,\mathcal {U}(s,t)=\mathcal {L}(t)\,\mathcal {U}(s,t),\qquad \mathcal {U}(s,s)=I.$$
4.2.1 Short-range approximation
Most of our DBM analysis will be completely local; hence we will introduce a short-range approximation $h_t$ (see its definition in equation (4.26) below) of $g_t$ that is exponentially small when evaluated at points ${\boldsymbol x}$ that are not fully supported in the bulk.
Recall the definition of the quantiles $\gamma _i(0)$ from equation (4.9). Then we define the sets
which correspond to indices and spectral range in the bulk, respectively. From now on, we fix a point ${\boldsymbol y}\in \mathcal {J}$ and an N-dependent parameter K such that $1\ll K\le \sqrt {N}$ . Next, we define the averaging operator as a simple multiplication operator by a ‘smooth’ cut-off function:
with . Additionally, fix an integer $\ell $ with $1\ll \ell \ll K$ , and define the short-range coefficients
where $c_{ij}(t)$ is defined in equation (4.12). The parameter $\ell $ is the length of the short-range interaction.
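To illustrate the truncation, here is a minimal sketch of a short-range kernel. Two ingredients are assumptions that are not spelled out in the text above: the long-range coefficients are taken to be $c_{ij}=1/(N(\lambda _i-\lambda _j)^2)$ , and the short-range coefficients are taken to equal $c_{ij}$ when both indices lie in the bulk set and $|i-j|\le \ell $ , and zero otherwise. The function name and the toy spectrum are hypothetical.

```python
def short_range_coefficients(eigs, bulk, ell):
    """Return {(i, j): c_ij} for bulk pairs with 0 < |i - j| <= ell.

    Assumptions (for illustration only): c_ij = 1 / (N (lambda_i - lambda_j)^2),
    and the short-range kernel vanishes for every pair not returned here.
    """
    N = len(eigs)
    bulk = set(bulk)
    coeffs = {}
    for i in bulk:
        for j in bulk:
            if i != j and abs(i - j) <= ell:
                coeffs[(i, j)] = 1.0 / (N * (eigs[i] - eigs[j]) ** 2)
    return coeffs

if __name__ == "__main__":
    eigs = [k / 10 for k in range(10)]  # toy, equally spaced spectrum
    c = short_range_coefficients(eigs, bulk=range(2, 8), ell=2)
    print(len(c), c[(2, 3)])
```

Only pairs at index distance at most $\ell $ interact, so the kernel has on the order of $\ell $ nonzero entries per row, which is the locality that finite speed of propagation arguments exploit.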
The short-range approximation $h_t=h_t({\boldsymbol x})$ of $g_t$ is defined as the unique solution of the parabolic equation
where
Since K, ${\boldsymbol y}$ and $\ell $ are fixed for the rest of this section, we will often omit them from the notation. We conclude this section defining the transition semigroup $\mathcal {U}_{\mathcal {S}}(s,t)=\mathcal {U}_{\mathcal {S}}(s,t;\ell )$ associated to the short-range generator $\mathcal {S}(t)$ .
4.2.2 $L^2$ -bound
By standard finite speed of propagation estimates (see [Reference Cipolloni, Erdős and Schröder17, Proposition 4.2, Lemmata 4.3–4.4]), we obtain the following lemma.
Lemma 4.2. Let $0\le s_1\le s_2\le s_1+\ell N^{-1}$ and f be a function on $\Lambda ^n$ . Then for any ${\boldsymbol x}\in \Lambda ^n$ supported on $\mathcal {J}$ , it holds
for any small $\xi>0$ . The implicit constant in equation (4.28) depends on n, $\epsilon $ , $\delta $ .
In particular, this lemma shows that the observable $g_t$ and its short-range approximation $h_t$ are close to each other up to times $t\ll \ell /N$ ; hence to prove Proposition 4.1 it will be enough to estimate $h_t$ . First, in Proposition 4.4 below, we will prove a bound in the $L^2$ -sense, which will then be enhanced to an $L^{\infty }$ bound by standard parabolic regularity arguments.
Define the event $\widehat {\Omega }$ on which the local laws for certain products of resolvents and traceless matrices A hold: that is, for a small $\omega>2\xi >0$ , we define
where
, $\rho _{i,t}:=|\Im m_t(z_i)|$ and $\rho _t^*:=\max _i\rho _{i,t}$ . Theorem 2.2 shows that $\widehat {\Omega }$ is a very high-probability event by using a standard grid argument for the spectral parameters and stochastic continuity in the time parameter. Note that by the rigidity given by equation (4.8) and the spectral theorem, we have (recall the definition of $\gamma _i(0)$ from equation (4.9))
with $\eta _k=\eta _k(t)$ defined by $N\eta _k\rho (\gamma _{i_k}(t)+\mathrm {i} N^{-2/3})=N^{\omega }$ . In particular, since $ |\Im m_t(z_1) \Im m_t(z_2)|\lesssim \rho (z_1)\rho (z_2)$ , by the first line of equation (4.29) for $k=2$ , we have
on $\widehat {\Omega }_{\omega ,\xi }$ , which by equation (4.30), choosing $z_k=\gamma _{i_k}(t)+\mathrm {i}\eta $ , implies
simultaneously for all $i,j\in [N]$ and $0\le t\le T$ . We recall that the quantiles $\gamma _i(t)$ are defined in equation (4.9).
Remark 4.3. The set $\widehat {\Omega }$ defined in equation (4.29) is slightly different from its analogue in [Reference Cipolloni, Erdős and Schröder17, Eq. (4.20)]. First, all the error terms now explicitly depend on , whilst in [Reference Cipolloni, Erdős and Schröder17, Eq. (4.20)], we just bounded the error terms using the operator norm of A (which was smaller than $1$ in [Reference Cipolloni, Erdős and Schröder17, Eq. (4.20)]). Second, we have a slightly weaker bound (compared to [Reference Cipolloni, Erdős and Schröder17, Eq. (4.20)]) for , since we now do not carry the dependence on the $\rho _{i,t}$ optimally. As a consequence of this slightly worse bound close to the edges, we get the overlap bound in equation (4.31) instead of the optimal bound [Reference Cipolloni, Erdős and Schröder17, Equation (4.21)]; however, this difference will not cause any change in the result. We remark that the bound in equation (4.31) is optimal for bulk indices.
Proposition 4.4. For any parameters satisfying $N^{-1}\ll \eta \ll T_1\ll \ell N^{-1}\ll K N^{-1}$ and any small $\epsilon , \xi>0$ , it holds
with
uniformly in the particle configuration ${\boldsymbol y}\in \Lambda ^n$ supported on $\mathcal {J}$ and in the eigenvalue trajectory ${\boldsymbol \lambda }$ in the high-probability event $\widetilde {\Omega }_{\xi } \cap \widehat {\Omega }_{\omega ,\xi }$ .
Proof. This proof is very similar to that of [Reference Cipolloni, Erdős and Schröder17, Proposition 4.5]; hence we will only explain the main differences. The reader should consult [Reference Cipolloni, Erdős and Schröder17] for a fully detailed proof. The key idea is to replace the operator $\mathcal {S}(t)$ in equations (4.26) and (4.27) by the following operator
where
and $a_{ij}^{\mathcal {S}}$ are their short-range version defined as in equation (4.25) and
We remark that ${\boldsymbol x}^{ij}_{ab}$ from equation (4.20) changes two entries of ${\boldsymbol x}$ at a time, whereas ${\boldsymbol x}_{{\boldsymbol a}{\boldsymbol b}}^{{\boldsymbol i}{\boldsymbol j}}$ changes all the coordinates of ${\boldsymbol x}$ simultaneously. More precisely, let ${\boldsymbol i}:=(i_1,\dots , i_n), {\boldsymbol j}:=(j_1,\dots , j_n)\in [N]^n$ , with $\{i_1,\dots ,i_n\}\cap \{j_1,\dots , j_n\}=\emptyset $ ; then ${\boldsymbol x}_{{\boldsymbol a}{\boldsymbol b}}^{{\boldsymbol i}{\boldsymbol j}}\ne {\boldsymbol x}$ iff for all $r\in [n]$ , it holds that $x_{a_r}=x_{b_r}=i_r$ . This means that $\mathcal {S}(t)$ makes a jump only in one direction at a time, while $\mathcal {A}(t)$ jumps in all directions simultaneously. Technically, the replacement of $\mathcal {S}(t)$ by $\mathcal {A}(t)$ is done on the level of Dirichlet forms:
Lemma 4.5 (Lemma 4.6 of [Reference Cipolloni, Erdős and Schröder17]).
Let $\mathcal {S}(t)$ , $\mathcal {A}(t)$ be the generators defined in equations (4.27) and (4.34), respectively, and let $\mu $ denote the uniform measure on $\Lambda ^n$ for which $\mathcal {A}(t)$ is reversible. Then there exists a constant $C(n)>0$ such that
for any $h\in L^2(\Lambda ^n)$ on the very-high-probability set $\widetilde {\Omega }_{\xi } \cap \widehat {\Omega }_{\omega ,\xi }$ .
Next, combining
which follows from equation (4.26), with equation (4.37), and using that ${\boldsymbol x}_{{\boldsymbol a}{\boldsymbol b}}^{{\boldsymbol i}{\boldsymbol j}}={\boldsymbol x}$ unless $x_{a_r}=x_{b_r}=i_r$ for all $r\in [n]$ , we conclude that
The star over $\sum $ means summation over two n-tuples of fully distinct indices. Then proceeding as in the proof of [Reference Cipolloni, Erdős and Schröder17, Proposition 4.5], we conclude that
which implies , by a simple Gronwall inequality, using that $T_1\gg \eta $ .
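The role of the condition $T_1\gg \eta $ in the Gronwall step can be illustrated on a generic linear differential inequality $y'\le -r\,y+F$ with constant damping rate r and forcing F. This is only a sketch: the precise constants and error terms of equation (4.40) are not reproduced here, and the assumption that the damping rate is of order $1/\eta $ is ours, suggested by the condition $T_1\gg \eta $ .

```python
import math

def gronwall_bound(y0, rate, forcing, T):
    """Upper bound for y(T) when y' <= -rate * y + forcing and y(0) = y0.

    Assumes constant rate > 0 and constant forcing >= 0; the bound is the
    solution of the corresponding linear ODE with equality.
    """
    decay = math.exp(-rate * T)
    return decay * y0 + (forcing / rate) * (1.0 - decay)

if __name__ == "__main__":
    eta = 1e-4            # hypothetical spectral scale
    rate = 1.0 / eta      # assumed damping rate of order 1 / eta
    for T in (eta, 10 * eta, 100 * eta):
        # For T >> eta the initial value is forgotten and only
        # forcing / rate survives.
        print(T, gronwall_bound(1.0, rate, 5.0, T))
```

Once T is a large multiple of $1/r$ , the contribution of the initial condition is exponentially small and the bound is governed by $F/r$ alone.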
We point out that to go from equation (4.39) to equation (4.40), we proceed exactly as in the proof of [Reference Cipolloni, Erdős and Schröder17, Proposition 4.5] (with the additional , factors in [Reference Cipolloni, Erdős and Schröder17, Equation (4.47)] and [Reference Cipolloni, Erdős and Schröder17, Equation (4.48)], respectively) except for the estimate in [Reference Cipolloni, Erdős and Schröder17, Equation (4.43)]. The error terms in this estimate used that $|P(G)|\le N^{n\xi -n/2}$ uniformly in the spectrum, a fact that we cannot establish near the edges as a consequence of the weaker bound in equation (4.31). We now explain how we can still prove [Reference Cipolloni, Erdős and Schröder17, Equation (4.43)] in the current case. The main mechanism is that the strong bound holds for bulk indices, and when an edge index j is involved together with a bulk index i, then the kernel $a_{ij}\lesssim \eta /N$ is very small, which balances the weaker estimate on the overlap. Note that equation (4.31) still provides a nontrivial bound of order $N^{-1/3}$ for $|\langle {\boldsymbol u}_i, A {\boldsymbol u}_j\rangle |$ since $\rho (\gamma _i(t)+\mathrm {i} N^{-2/3})\gtrsim N^{-1/3}$ uniformly in $0\le t\le T$ .
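The lower bound $\rho (\gamma _i(t)+\mathrm {i} N^{-2/3})\gtrsim N^{-1/3}$ is saturated at the spectral edge, and it can be checked numerically at $t=0$ from the explicit formula for $m(z)$ . The sketch below evaluates $\Im m(2+\mathrm {i} N^{-2/3})$ and compares it with $N^{-1/3}$ ; the helper names are hypothetical, and the normalisation of $\rho (z)$ by a constant factor is ignored.

```python
import cmath

def m_sc(z):
    """Stieltjes transform of the semicircle law for z outside [-2, 2].

    It is the root of m^2 + z m + 1 = 0 of smaller modulus; the two roots
    have product 1, so exactly one of them satisfies |m| < 1.
    """
    s = cmath.sqrt(z * z - 4.0)
    m1, m2 = 0.5 * (-z + s), 0.5 * (-z - s)
    return m1 if abs(m1) < abs(m2) else m2

def edge_ratio(N):
    """Im m(2 + i N^{-2/3}) divided by N^{-1/3}."""
    eta = N ** (-2.0 / 3.0)
    return m_sc(2.0 + 1j * eta).imag * N ** (1.0 / 3.0)

if __name__ == "__main__":
    for N in (10**3, 10**6, 10**9):
        print(N, edge_ratio(N))
```

The ratio stays bounded away from zero as N grows, reflecting the square-root vanishing of the semicircle density at the edge.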
We start with removing the short-range cutoff from the kernel $a_{ij}^{\mathcal {S}}(t)$ in the left-hand side of [Reference Cipolloni, Erdős and Schröder17, Equation (4.43)]:
Here $\sum _{{\boldsymbol j}}^{**}$ denotes the sum over distinct $j_1,\dots ,j_n$ such that at least one $|i_r-j_r|$ is bigger than $\ell $ .
Here the indices $i_1,\dots ,i_n$ are fixed and such that $i_l\in [\delta N,(1-\delta )N]$ for any $l\in [n]$ . We will now show that the second line in equation (4.41) is estimated by $N^{1+n\xi }\eta \ell ^{-1}$ . This is clear for the terms containing $\boldsymbol 1(n\,\,\mathrm {even})$ ; hence we now show that this bound is also valid for the terms containing $P(G)$ . We present this bound only for the case when $|j_1-i_1|>\ell $ and $|j_r-i_r|\le \ell $ for any $r\in \{2,\dots ,n\}$ . The proof in the other cases is completely analogous and so omitted. Additionally, to make our presentation easier, we assume that $n=2$ :
Here $c\le \delta /2$ is a small fixed constant so that $j_1$ is still a bulk index if $|i_1-j_1|\le cN$ . The fact that the first summation in the second line of equation (4.42) is bounded by $N^{1+n\xi }\eta \ell ^{-1}$ follows from equation (4.31): that is, that
, with very high probability, for any bulk indices $i,j$ – in particular, the bound
holds for this term. For the second summation, we have that
where we used that $a_{i_1 j_1}(t)\lesssim \eta N^{-1}$ , $\ell \ll K\ll \sqrt {N}$ and that
by equation (4.31). We point out that to go from the first to the second line of equation (4.43), we also used that $\sum _{j_2}a_{i_2 j_2}(t)\lesssim 1$ on $\widehat {\Omega }$ . This concludes the proof that the last line of equation (4.41) is bounded by $N^{1+n\xi }\eta \ell ^{-1}$ . We thus conclude that
Proceeding in a similar way – that is, splitting bulk and edge regimes and using the corresponding bounds for the overlaps – we then add back the missing indices in the summation in the second line of equation (4.44):
Finally, by equations (4.44) and (4.45), we conclude
which is exactly the same as [Reference Cipolloni, Erdős and Schröder17, Equation (4.43)]. Given equation (4.46), the remaining part of the proof of this proposition is completely analogous to the proof of [Reference Cipolloni, Erdős and Schröder17, Proposition 4.5]; the only difference is that now in [Reference Cipolloni, Erdős and Schröder17, Eq. (4.48)], using that $|m_t(z_i)|\lesssim 1$ uniformly in $0\le t\le T$ , we will have an additional error term
coming from the deterministic term in equation (4.29) (the mixed terms when we use the error term in equation (4.29) for some terms and the leading term for the remaining terms are estimated in the same way). We remark that in the first inequality, we used that
by our assumption
from Theorem 2.8. Here $\sum _{k_1+\dots +k_r=n}^*$ denotes the summation over all $k_1,\dots , k_r\ge 2$ such that there exists at least one $r_0$ such that $k_{r_0}\ge 3$ .
4.2.3 Proof of Proposition 4.1
Given the finite speed of propagation estimates in Lemma 4.2 and the $L^2$ -bound on $h_t$ from Proposition 4.4 as an input, enhancing this bound to an $L^{\infty }$ -bound and hence proving Proposition 4.1 is completely analogous to the proof of [Reference Cipolloni, Erdős and Schröder17, Proposition 3.2] presented in [Reference Cipolloni, Erdős and Schröder17, Section 4.4] and so omitted.
A Proof of Theorem 2.2 in the large d regime
The $d\ge 10$ regime is much simpler mainly because the trivial norm bound $\| G(z)\|\le 1/d$ on every resolvent is affordable. In particular, no system of master inequalities and their meticulously bootstrapped analysis are necessary; a simple induction on k is sufficient. We remark that the argument using these drastic simplifications is completely analogous to [Reference Cipolloni, Erdős and Schröder18, Appendix B]; hence we will be very brief.
We now assume that equation (2.5) has been proven up to some $k-1$ in the $d\ge 10$ regime. Using equation (3.19) and estimating all resolvent chains in the right-hand side of equation (3.19) by the induction hypotheses (after splitting ), using the analogue of Lemma 3.1 to estimate in terms of the induction hypothesis, we easily obtain
in place of Lemma 3.5. In estimating the leading terms in equation (3.19), we used that $|m[z_1, z_k] - m(z_1)m(z_k) |\lesssim d^{-4}$ . Note that $N^{k/2-1}/d^k$ is the natural size of the leading deterministic term under the normalisation , and the small factor $1/Nd^2$ represents the smallness of the negligible error term. We now follow the argument in Section 3 starting from equation (3.26). For the Gaussian term in equation (3.28), we simply bound
indicating a gain of order $1/(\sqrt {N}d)$ over the natural size of the leading term in equation (A.1); this gives the main error term in equation (2.5). The modifications to the non-Gaussian terms in equation (3.27) – that is, the estimates of equations (3.30) and (3.33) – are similarly straightforward and left to the reader. This completes the proof in the remaining $d\ge 10$ regime.
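The bound $|m[z_1, z_k] - m(z_1)m(z_k) |\lesssim d^{-4}$ used above can be checked numerically. The sketch assumes that $m[z_1,z_2]$ denotes the divided difference $(m(z_1)-m(z_2))/(z_1-z_2)$ and that d is the distance of the spectral parameters from $[-2,2]$ ; both conventions, and the helper names, are assumptions made for this illustration.

```python
import cmath

def m_sc(z):
    """Root of m^2 + z m + 1 = 0 with |m| < 1 (z outside [-2, 2])."""
    s = cmath.sqrt(z * z - 4.0)
    m1, m2 = 0.5 * (-z + s), 0.5 * (-z - s)
    return m1 if abs(m1) < abs(m2) else m2

def dist_to_support(z):
    """Distance from z to the interval [-2, 2]."""
    x = min(max(z.real, -2.0), 2.0)
    return abs(z - x)

def divided_difference(z1, z2):
    """Assumed meaning of m[z1, z2]: the first divided difference of m."""
    return (m_sc(z1) - m_sc(z2)) / (z1 - z2)

if __name__ == "__main__":
    z1, z2 = 12.0 + 1.0j, -3.0 + 11.0j
    d = min(dist_to_support(z1), dist_to_support(z2))
    gap = abs(divided_difference(z1, z2) - m_sc(z1) * m_sc(z2))
    print(d, gap, gap * d**4)
```

The check rests on the identity $m[z_1,z_2]=m_1m_2/(1-m_1m_2)$ with $m_i=m(z_i)$ , which follows from $m^2+zm+1=0$ ; hence the difference equals $(m_1m_2)^2/(1-m_1m_2)$ , and $|m(z)|\le 1/d$ gives the $d^{-4}$ decay.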
B Green function comparison
The Green function comparison argument is very similar to the one presented in [Reference Cipolloni, Erdős and Schröder17, Appendix A]; hence we only explain the minor differences.
Consider the Ornstein-Uhlenbeck flow
with $\widehat {B}_t$ a real symmetric Brownian motion. Along the OU-flow in equation (B.1), the moments of the entries of $\widehat {W}_t$ remain constant. Additionally, this flow adds a small Gaussian component to W so that for any fixed T, we have
with $c=c(T)>0$ a constant very close to one as long as $T\ll 1$ , where $U$ and $\widetilde W$ are independent GOE and Wigner matrices, respectively. Now consider the solution $W_t$ of the flow in equation (4.1) with initial condition $W_0=\sqrt {1-cT}\widetilde W$ , so that
Lemma B.1. Let $\widehat {W}_t$ be the solution of equation (B.1), and let $\widehat {{\boldsymbol u}}_i(t)$ be its eigenvectors. Then for any smooth test function $\theta $ of at most polynomial growth and any fixed $\epsilon \in (0,1/2)$ , there exists an $\omega =\omega (\theta ,\epsilon )>0$ such that for any bulk index $i\in [\delta N, (1-\delta )N]$ (with $\delta>0$ from Theorem 2.8) and $t=N^{-1+\epsilon }$ , it holds that
We now show how to conclude Theorem 2.8 using the GFT result from Lemma B.1. Choose $T=N^{-1+\epsilon }$ and $\theta (x)=x^n$ for some integer $n\in \mathbf {N}$ . Then we have
for some small $c=c(n,\epsilon )>0$ , with ${\boldsymbol u}_i,\widehat {\boldsymbol u}_i(t),\boldsymbol u_i(t)$ being the eigenvectors of $W,\widehat W_t,W_t$ , respectively. This concludes the proof of Theorem 2.8. Note that in equation (B.5), we used Lemma B.1 in the first step, equation (B.3) in the second step and equation (4.13) for ${\boldsymbol \eta }$ such that $\eta _i=n$ and $\eta _j=0$ for any $j\ne i$ in the third step, using that in distribution the eigenvectors of $W_{cT}$ are equal to those of $\widetilde W_{cT/(1-cT)}$ , with $\widetilde W_t$ being the solution to the DBM flow with initial condition $\widetilde W_0=\widetilde W$ .
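The time change $cT\mapsto cT/(1-cT)$ used in the last step can be made explicit under one additional assumption: that the Ornstein-Uhlenbeck flow has drift $-\widehat {W}_t/2$ , so that $cT=1-e^{-T}$ . This choice of normalisation is an assumption of the sketch, as are the function names.

```python
import math

def ou_constant(T):
    """c(T) defined by c * T = 1 - exp(-T), assuming OU drift -W/2."""
    return -math.expm1(-T) / T

def dbm_time(T):
    """Time s = cT / (1 - cT) such that, in distribution,
    sqrt(1 - cT) * Wtilde + sqrt(cT) * U = sqrt(1 - cT) * (Wtilde + sqrt(s) * U).
    """
    cT = -math.expm1(-T)
    return cT / (1.0 - cT)

if __name__ == "__main__":
    for T in (1e-1, 1e-2, 1e-3):
        print(T, ou_constant(T), dbm_time(T))
```

Since multiplying a matrix by the scalar $\sqrt {1-cT}$ does not change its eigenvectors, this rescaling is what allows one to pass from $W_{cT}$ to $\widetilde W_{cT/(1-cT)}$ ; under the assumed normalisation the new time is simply $e^{T}-1$ .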
Proof of Lemma B.1.
The proof of this lemma is very similar to the proof of [Reference Cipolloni, Erdős and Schröder17, Appendix A]. The differences come from the somewhat different local law. First, we now systematically carry the factor instead of $\|A\|^2=1$ as in [Reference Cipolloni, Erdős and Schröder17, Appendix A], but this is automatic. Second, since the current overlap bound in equation (4.31) is somewhat weaker near the edge, we need to check that for resolvents with spectral parameters in the bulk, this will make no essential difference. This is the main purpose of repeating the standard proof from [Reference Cipolloni, Erdős and Schröder17, Appendix A] in some detail.
As a consequence of the repulsion of the eigenvalues (level repulsion), as in [Reference Knowles and Yin37, Lemma 5.2], to understand the overlap , it is enough to understand functions of with $\Im z$ slightly below $N^{-1}$ : that is, the local eigenvalue spacing. In particular, to prove equation (B.4), it is enough to show that
for $t=N^{-1+\epsilon }$ , $z=E+\mathrm {i} \eta $ for some $\zeta>0,\omega >0$ and all $\eta \ge N^{-1-\zeta }$ ; compare to [Reference Benigni6, Section 4] and [Reference Bourgade and Yau10, Appendix A].
To prove this, we define
and then use Itô’s formula:
where $\alpha ,\beta \in [N]^2$ are double indices, $w_{\alpha }(t)$ are the entries of $W_t$ , and $\partial _{\alpha }:=\partial _{w_{\alpha }}$ . Here,
denotes the joint cumulant of $w_{\alpha _1}(t), \dots , w_{\alpha _l}(t)$ , with $l\in \mathbf {N}$ . Note that by equation (2.2), it follows that $|\kappa _t(\alpha _1,\dots ,\alpha _l)|\lesssim N^{-l/2}$ uniformly in $t \ge 0$ .
By cumulant expansion, we get
where $\Omega (R)$ is an error term, easily seen to be negligible as every additional derivative gains a further factor of $N^{-1/2}$ . Then to estimate equation (B.10), we realise that $\partial _{ab}$ -derivatives of result in factors of the form $(GAG)_{ab}$ , $(GAG)_{aa}$ . For such factors, we use that
where we used that , , for any $\xi>0$ , uniformly in the spectrum by [Reference Erdős, Yau and Yin29] and Theorem 2.6, respectively. We remark that in [Reference Cipolloni, Erdős and Schröder17, Equation (A.11)], we could bound $(G_t(z_1)AG_t(z_2))_{ab}$ by $N^{1/2+\xi +2\zeta }$ as a consequence of the better bound on for indices close to the edge (however, in [Reference Cipolloni, Erdős and Schröder17, Equation (A.11)], we did not have ). While our estimate on $(GAG)_{ab}$ is now weaker by a factor $N^{1/6}$ , this is still sufficient to complete the Green function comparison argument.
Indeed, using equation (B.11) and that $|(G_t)_{ab}|\le N^{\zeta }$ , for any $\zeta>0$ , we conclude that
and so, together with
Author contributions
All the authors contributed equally.
Conflict of Interest
The authors have no conflicts of interest to declare.
Ethical standards
The research meets all ethical guidelines, including adherence to the legal requirements of the study country.
Funding statement
L.E. acknowledges support by ERC Advanced Grant ‘RMTBeyond’ No. 101020331. D.S. acknowledges the support of Dr. Max Rössler, the Walter Haefner Foundation and the ETH Zürich Foundation.