
DIMENSION-FREE UNIFORMITY WITH APPLICATIONS, I

Published online by Cambridge University Press:  29 November 2017

József Beck*
Affiliation:
Department of Mathematics, Rutgers University, Busch Campus, Hill Center, New Brunswick, NJ 08903, U.S.A. email jbeck@math.rutgers.edu

Abstract

We prove a dimension-free strong uniformity theorem, and apply it in the configuration space of a large system of non-interacting particles, to describe the fast approach to equilibrium starting from off-equilibrium, and its long-term stability.


Type
Research Article
Copyright
Copyright © University College London 2017 

1 Introduction

Why does the typical time evolution of a large (mechanical) system (i.e. a system with many degrees of freedom, like gas in a container), starting from off-equilibrium, approach equilibrium in a short time, and remain in equilibrium for a very, very long time? Basically the same question was raised in physics in the second half of the nineteenth century when Maxwell, Boltzmann and Gibbs developed the foundations of statistical mechanics. In this paper, we study the same general global question about large (i.e. many-particle) systems, but our approach is completely different from the well-known probabilistic machinery of statistical mechanics. We also use probability theory, but it is not our primary tool. What we do is at the crossroads of uniform distribution (in the high-dimensional configuration space) and (large) dynamical systems. It is pure mathematics with rigorous proofs. Nevertheless, we borrow some motivations and intuitions from physics.

Consider the following concrete (idealized) mechanical model that we may call the off-equilibrium ideal gas (or the off-equilibrium Bernoulli model of gases). Assume that there are $N$ particles moving around in a cubic container, bouncing back and forth on the walls like billiard balls. Let $N$ be large (e.g. in the range of the Avogadro number, roughly $10^{24}$), so that the system imitates the motion of gas molecules in a box. Assume that the time evolution of the system starts from an explicit far-from-equilibrium initial point configuration, say Big Bang, where all particles start from the same point, or something similar to Big Bang. The particles move on straight lines like point billiards until they hit a wall, where they undergo an elastic collision. Two typical point particles in 3-space do not collide, so we assume that there is no particle–particle interaction. To determine the time evolution of the system, we have to say something about the initial velocities of the particles. We consider the most important velocity distribution in physics: we assume that the particles have three-dimensional Gaussian (normal) initial velocity distribution. The initial point configuration is explicitly given (like Big Bang), and the initial velocities of different particles are chosen independently. This defines a measure (in fact, a product measure, due to independence) that makes it possible to talk about the typical time evolution of this large billiards-in-a-box system.

We focus on the following global questions. In what precise sense does the typical time evolution of this large system (i.e. off-equilibrium ideal gas) approach equilibrium? How fast is the approach to equilibrium? Does the system really remain in equilibrium for a very long time?

Here equilibrium means spatial equilibrium, since the Gaussian initial velocity distribution is already the equilibrium velocity distribution, as discovered by Maxwell.

Statistical mechanics has a complete theory for the probabilistic model of the equilibrium ideal gas, based on the partition function. Unfortunately, it is not clear at all, to say the least, how one can extend that theory to the non-equilibrium case, especially since our model is mainly deterministic, due to the billiard orbits, and only partly random, due to the independent choice of velocities. This is the motivation for this paper.

Here is a brief summary of Boltzmann’s answer to the basic questions. According to Boltzmann, the first step is to switch from the three-dimensional cubic container, which we like to call the particle space, to the $6N$-dimensional phase space (each particle has 3 position coordinates and 3 momentum coordinates), where a single point represents the microstate of the whole $N$-particle system at a given time instant $t$. Boltzmann introduced the concept of macrostates, which are the observable states; a macrostate is basically a large set of microstates that look the same. Boltzmann’s key insight was that the equilibrium macrostate must contain vastly more microstates than any off-equilibrium macrostate. Thus it is reasonable to expect that a system starting from off-equilibrium, which represents an atypical microstate, evolves through macrostates occupying progressively larger volumes in phase space, and eventually reaches the equilibrium macrostate. Boltzmann’s explanation of why the system remains in the equilibrium macrostate for a very long time was to combine the so-called probability postulate with the fact that the equilibrium macrostate represents the overwhelming majority of the phase space. Boltzmann’s probability postulate states that the larger the macrostate, the greater the probability of finding a microstate in it. It is complemented by Boltzmann’s classical definition, carved on his gravestone, that the entropy of the system is the logarithm of the probability of its macrostate.

Well, this is a great insight/intuition. Many physicists find Boltzmann’s argument perfectly convincing and believe that it settles the issue. Mathematicians, on the other hand, point out that Boltzmann’s argument is nowhere near a mathematical proof, and call it a framework, a first step toward the solution.

The first logical difficulty in Boltzmann’s argument is that in physics macrostates are well defined only in equilibrium. When the system is far from equilibrium, it is not clear at all how one defines macrostates. Here we do not use the vague concept of macrostate at all. In this paper, a state of a system always means a microstate, which is simply all positions and all velocities at a time instant.

The second difficulty is that it basically ignores the dynamical aspect. To put it in a nutshell, if a system is in an atypical microstate, then it does not automatically evolve into an equilibrium macrostate just because the latter is typical!

To solidify Boltzmann’s argument, we have to identify properties of the dynamics of the system that guarantee that atypical (i.e. unlikely) microstates evolve into typical (i.e. very likely) microstates. We have to answer the question of why a probability argument works for the short-time dynamics of the system. Thus we need to justify the probability postulate on a realistic time scale, i.e. to justify the approximation “phase-space average” ${\approx}$ “short-time average” in a quantitative form. We may call it the short-time ergodic problem.

We may summarize this paper in one sentence: to justify the probability postulate, we solve the short-time ergodic problem by proving, and repeatedly applying, a short-time ergodic theorem.

The reader is probably wondering why we need a short-time ergodic theorem, and why traditional ergodic theory, in particular Birkhoff’s theorem, does not solve the short-time ergodic problem. Indeed, the message of Birkhoff’s well-known individual ergodic theorem is precisely the equality “phase-space average” $=$ “asymptotic time average”.

Well, the first problem with traditional ergodic theory is that the asymptotic time average entails taking the infinite time limit (i.e. $t\rightarrow \infty$), and since Birkhoff’s theorem does not give any estimate of the error term, it does not say anything about realistic time. The second problem is that (traditional measure-theoretic) ergodic theory ignores sets of measure zero, and a fixed initial point configuration (like Big Bang) represents a set of measure zero in the phase space.

To solve the short-time ergodic problem, we do not use traditional ergodic theory. Our tool is dimension-free strong uniformity in the configuration space. In our model there is no particle–particle interaction, explaining why it suffices to study the $3N$ -dimensional configuration space instead of the $6N$ -dimensional phase space.

We start our rigorous discussion with strong uniformity, elaborating on its three different aspects: start-free strong uniformity, complexity-free strong uniformity, and dimension-free strong uniformity. The three different aspects are all crucial to our goal of describing the fast approach to equilibrium in large off-equilibrium systems.

The traditional theory of uniform distribution, which is built around Weyl’s criterion and nice test sets such as axis-parallel rectangles and boxes, does not go beyond the Riemann integral; see Drmota and Tichy [2]. Strong uniformity (in a broad sense) refers to the extension from the Riemann integral to the Lebesgue measure/integral. It seems a minor change, but it has surprisingly far-reaching consequences, as we explain below.

The subject of strong uniformity started with an old conjecture of Khinchin [4] from 1923: prove that, given a Lebesgue measurable set $S\subset [0,1)$, the sequence $\alpha,2\alpha,3\alpha,\ldots$ is uniformly distributed modulo 1 with respect to $S$ for almost every $\alpha$, i.e.

$$\lim_{n\rightarrow\infty}\frac{1}{n}\sum_{\substack{1\leqslant k\leqslant n\\ \{k\alpha\}\in S}}1=\operatorname{length}(S)\quad\text{for almost every}~\alpha. \tag{1.1}$$

Here, as usual, $0\leqslant \{x\}<1$ denotes the fractional part of a real number  $x$ , and for simplicity $\operatorname{length}$ denotes the one-dimensional Lebesgue measure.
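For a nice test set such as an interval $S=[a,b)$, Weyl’s classical theorem gives (1.1) for every irrational $\alpha$, so the left-hand side can be computed and watched converging; the whole difficulty of Khinchin’s conjecture lies in arbitrary measurable $S$, which no finite computation can probe. The following minimal Python sketch (the function name `visit_frequency` is our own, hypothetical) computes the empirical frequency in (1.1) for an interval.

```python
import math


def visit_frequency(alpha: float, n: int, a: float, b: float) -> float:
    """Return (1/n) * #{1 <= k <= n : {k*alpha} in [a, b)}, the average in (1.1) for S = [a, b)."""
    hits = 0
    for k in range(1, n + 1):
        frac = (k * alpha) % 1.0  # fractional part {k*alpha}
        if a <= frac < b:
            hits += 1
    return hits / n


# alpha = sqrt(2) is irrational, so the frequency should approach length(S) = 0.3.
print(visit_frequency(math.sqrt(2), 100_000, 0.0, 0.3))
```

For a rational $\alpha$ the sequence $\{k\alpha\}$ is periodic, so the frequency depends on how $S$ meets finitely many points; this is why (1.1) can only be expected for almost every $\alpha$.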

Khinchin’s conjecture remained for several decades among the most famous open problems in uniform distribution, and everybody expected a positive solution. It was thus a big surprise when Marstrand [5] disproved it in 1970 by constructing an open set $S\subset [0,1)$ with $\operatorname{length}(S)<1$ such that

$$\limsup_{n\rightarrow\infty}\frac{1}{n}\sum_{\substack{1\leqslant k\leqslant n\\ \{k\alpha\}\in S}}1=1\quad\text{for every}~\alpha.$$

The fact that open sets are the simplest in the Borel hierarchy makes Marstrand’s negative result even more surprising.

Marstrand’s result is the bad news that demonstrates that Khinchin was too optimistic. The good news is that recently we succeeded in saving Khinchin’s conjecture in the continuous case by replacing the unit interval $[0,1)$ modulo one with the two-dimensional unit torus $[0,1)^{2}=I^{2}$, and replacing the arithmetic progression $\alpha,2\alpha,3\alpha,\ldots$, starting from 0, with the straight line $(t\cos\theta,t\sin\theta)$, $t\geqslant 0$, starting from the origin $(0,0)$ with angle $\theta$.

Uniformity of the torus line $(t\cos\theta,t\sin\theta)$ modulo one relative to a set $S$ means that the time $T_{S}(\theta)=\operatorname{length}\{0\leqslant t\leqslant T:(t\cos\theta,t\sin\theta)\in S\bmod 1\}$ that the line spends in $S$ during $0\leqslant t\leqslant T$ satisfies

$$\lim_{T\rightarrow\infty}\frac{T_{S}(\theta)-\operatorname{area}(S)T}{T}=0.$$

We could in fact prove much more.

Theorem A (Beck [1]).

Let $S\subset [0,1)^{2}$ be an arbitrary Lebesgue measurable set in the unit square with $0<\operatorname{area}(S)<1$. Then for every $\varepsilon>0$,

$$\lim_{T\rightarrow\infty}\frac{T_{S}(\theta)-\operatorname{area}(S)T}{(\log T)^{3+\varepsilon}}=0 \tag{1.2}$$

for almost every angle $\unicode[STIX]{x1D703}$ , where $\operatorname{area}$ denotes the two-dimensional Lebesgue measure.

The polylogarithmic error term in (1.2) is shockingly small compared to the linear main term $\operatorname{area}(S)T$ . Thus we may call Theorem A a superuniformity result.

In [1] we also studied the case of higher dimensions. Let $S\subset [0,1)^{d}=I^{d}$ be an arbitrary Lebesgue measurable set in the unit cube of dimension $d\geqslant 3$, and assume that $0<\operatorname{vol}_{d}(S)<1$, where for simplicity $\operatorname{vol}_{d}$ denotes the $d$-dimensional Lebesgue measure, and $\operatorname{vol}$ denotes the three-dimensional Lebesgue measure. Let $\mathbf{e}\in \mathbf{S}^{d-1}$ be an arbitrary unit vector in the $d$-dimensional Euclidean space $\mathbf{R}^{d}$, where $\mathbf{S}^{d-1}$ denotes the unit sphere in $\mathbf{R}^{d}$. Consider the straight line $t\mathbf{e}$, $t\geqslant 0$, starting from the origin $\mathbf{0}\in \mathbf{R}^{d}$. Let $T_{S}(\mathbf{e})$ denote the time the torus line $t\mathbf{e}$ (modulo one) spends in the given set $S$ as $0\leqslant t\leqslant T$.

Uniformity of the torus line $t\mathbf{e}$ (modulo one) relative to $S$ means that

$$\lim_{T\rightarrow\infty}\frac{T_{S}(\mathbf{e})-\operatorname{vol}_{d}(S)T}{T}=0.$$

We could prove much more.

Theorem B (Beck [1]).

  (i) Let $S\subset [0,1)^{3}$ be an arbitrary Lebesgue measurable set in the unit cube with $0<\operatorname{vol}_{3}(S)<1$. Then for every $\varepsilon>0$,

    $$\lim_{T\rightarrow\infty}\frac{T_{S}(\mathbf{e})-\operatorname{vol}(S)T}{T^{1/4}(\log T)^{3+\varepsilon}}=0 \tag{1.3}$$
    for almost every direction $\mathbf{e}\in \mathbf{S}^{2}$ in the $3$-space $\mathbf{R}^{3}$.
  (ii) In the $d$-dimensional case $S\subset [0,1)^{d}$ with $d\geqslant 4$, we have the perfect analogue of (1.3), where the factor $T^{1/4}$ in (1.3) is replaced by $T^{1/2-1/(2(d-1))}$, for almost every direction $\mathbf{e}\in \mathbf{S}^{d-1}$ in the $d$-space $\mathbf{R}^{d}$.

In Theorems A and B, the upper bounds on the error do not depend on the complexity (or ugliness) of the test set  $S$ . Also, the starting point can be arbitrary, since the torus is translation invariant. We may call Theorems A and B complexity-free and start-free strong uniformity results.

Our basic tool to describe the time evolution of a large system is strong uniformity in the configuration space. For realistic gas models, the number of particles $N$ is in the range of the Avogadro number (around $10^{24}$), so the corresponding configuration space has very high dimension $3N$. Theorem B is about arbitrary dimension $d$, but it does not help, because there is an unspecified constant factor $c_{0}(d)$ in the upper bound for the discrepancy. Unfortunately, our proof of Theorem B in [1] gives a very weak exponential upper bound on $c_{0}(d)$, which makes it totally useless in high-dimensional applications.

The optimal way to eliminate the dimension problem would be to prove an upper bound on the discrepancy that does not depend on the dimension. Surprisingly, we can actually do that. We formulate a result which is basically such a dimension-free upper bound.

Note that the diameter of the $d$ -dimensional unit cube $[0,1)^{d}$ is $\sqrt{d}$ . Moreover, it is an easy exercise in probability theory to prove that the distance between two randomly chosen points in $[0,1)^{d}$ is $\sqrt{d/6}+o(\sqrt{d})$ with probability close to one if $d$ is large. These two facts explain why it is natural to modify the time-discrepancy

$$\int_{0}^{T}f(t\mathbf{e})\,dt-T\int_{I^{d}}f\,d\text{V} \tag{1.4}$$

of a test function $f$ by replacing $t$ with $t\sqrt{d}$ , and to study

$$\int_{0}^{T}f(t\sqrt{d}\,\mathbf{e})\,dt-T\int_{I^{d}}f\,d\text{V} \tag{1.5}$$

instead, where $\mathbf{e}\in \mathbf{S}^{d-1}$ is a $d$ -dimensional unit vector. The effect of the switch from (1.4) to (1.5) is modest in small dimensions, but becomes substantial in very large dimensions.
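The probabilistic fact behind the normalization $t\mapsto t\sqrt{d}$ is easy to test numerically. The sketch below (Python with NumPy; `pair_distances` is a hypothetical helper of our own) samples independent uniform pairs in $[0,1)^{d}$ and compares their distances with $\sqrt{d/6}$. The constant $1/6$ is $\mathbf{E}(X-Y)^{2}$ for independent $X,Y$ uniform on $[0,1)$, so $|\mathbf{x}-\mathbf{y}|^{2}$ is a sum of $d$ independent terms with mean $1/6$, and the law of large numbers does the rest.

```python
import numpy as np


def pair_distances(d: int, n_pairs: int = 200, seed: int = 0) -> np.ndarray:
    """Euclidean distances |x - y| for n_pairs independent uniform pairs x, y in [0,1)^d."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_pairs, d))
    y = rng.random((n_pairs, d))
    return np.linalg.norm(x - y, axis=1)


for d in (10, 1_000, 10_000):
    ratios = pair_distances(d) / np.sqrt(d / 6)
    # The ratios concentrate around 1 as d grows; the diameter sqrt(d) is about 2.45 * sqrt(d/6).
    print(d, ratios.min().round(3), ratios.max().round(3))
```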

In fact, we need the following slightly more general notation: for $0\leqslant T_{1}<T_{2}$ and $\mathbf{v}\in \mathbf{R}^{d}\setminus\{\mathbf{0}\}$, consider the time discrepancy

$$D_{f}(\mathbf{v};T_{1},T_{2})=\int_{T_{1}}^{T_{2}}f(t\mathbf{v})\,dt-(T_{2}-T_{1})\int_{I^{d}}f\,d\text{V}. \tag{1.6}$$

In Theorem 1.1 below we just consider the special case $f=\chi_{S}$, the characteristic function of a set $S\subset [0,1)^{d}$, and write $D_{f}=D_{S}$.
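The time discrepancy (1.6) can be approximated by a Riemann sum along the torus line. This is only a numerical sketch under our own assumptions: the test function `f` is vectorized (it maps an array of points of shape `(n, d)` to `n` values), its mean $\int_{I^{d}}f\,d\text{V}$ is supplied by the caller, and a midpoint rule with a fixed number of steps is accurate enough. The names `time_discrepancy` and `box_indicator`, and the test set $S=[0,1/2)^{d}$, are hypothetical choices, not taken from the paper.

```python
import numpy as np


def time_discrepancy(f, mean_f: float, v, t1: float, t2: float, n_steps: int = 200_000) -> float:
    """Midpoint-rule approximation of D_f(v; t1, t2) in (1.6)."""
    v = np.asarray(v, dtype=float)
    dt = (t2 - t1) / n_steps
    t = t1 + dt * (np.arange(n_steps) + 0.5)   # midpoints of [t1, t2]
    points = np.mod(np.outer(t, v), 1.0)       # the torus line t*v modulo one, shape (n_steps, d)
    return float(f(points).sum() * dt - (t2 - t1) * mean_f)


def box_indicator(points: np.ndarray) -> np.ndarray:
    """chi_S for the (hypothetical) test set S = [0, 1/2)^d."""
    return np.all(points < 0.5, axis=1).astype(float)


d = 3
e = np.array([1.0, np.sqrt(2.0), np.sqrt(3.0)])
e /= np.linalg.norm(e)                         # a unit vector in S^{d-1}
print(time_discrepancy(box_indicator, 0.5 ** d, np.sqrt(d) * e, 0.0, 50.0))
```

Passing `np.sqrt(d) * e` as `v` gives the normalized discrepancy $D_{S}(\sqrt{d}\,\mathbf{e};T_{1},T_{2})$ used in Theorem 1.1.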

Since $1/2-1/(2(d-1))<1/2$, Theorem B immediately implies the following soft qualitative result: let $S\subset [0,1)^{d}$ be an arbitrary Lebesgue measurable set with $0<\operatorname{vol}_{d}(S)<1$. Then for almost every direction $\mathbf{e}\in \mathbf{S}^{d-1}$ in the $d$-space $\mathbf{R}^{d}$,

$$D_{S}(\sqrt{d}\,\mathbf{e};0,T)=O(T^{1/2}). \tag{1.7}$$

We shall establish the following dimension-free hard quantitative version of the soft qualitative estimate (1.7).

Theorem 1.1. Let $S\subset [0,1)^{d}$ be an arbitrary measurable test set in the unit torus with $d\geqslant 10^{3}$. Let $p=\operatorname{vol}_{d}(S)$ be the $d$-dimensional Lebesgue measure of $S$. Let $T=T_{0}=T_{0}(d)>0$ be the larger of the two positive solutions of the equation

$$100\,dT\,\text{e}^{-\pi^{2}T^{2}/2}=1.$$

Note that

$$T_{0}=T_{0}(d)=\frac{\sqrt{2}}{\pi}\sqrt{\log d}+o_{d}(1)\quad\text{as}~d\rightarrow\infty.$$

Given any $0<\varepsilon<1$, there exists a measurable subset $\mathcal{A}=\mathcal{A}(d;\varepsilon)\subset \mathbf{S}^{d-1}$ of the (hyper)sphere such that the normalized surface area of $\mathcal{A}$ is greater than $1-\varepsilon$, and the inequality

$$|D_{S}(\sqrt{d}\,\mathbf{e};T_{0},T_{1})|\leqslant \sqrt{p(1-p)}\biggl(\frac{50}{\sqrt{\varepsilon}}\sqrt{T_{1}}\,(2+\log_{2}T_{1})^{3}+5\biggr) \tag{1.8}$$

holds for every $\mathbf{e}\in {\mathcal{A}}$ and every $T_{1}>\max \{3T_{0},10\}$ .

Remark.

It is easy to extend Theorem 1.1 to square-integrable test functions $f\in L_{2}$ , and we leave it to the reader. The requirement $d\geqslant 10^{3}$ is purely technical, and the result should be true for all dimensions less than  $10^{3}$ .

The crucial fact here is that the upper bound on the error in (1.8) does not depend on the dimension  $d$ . This is why we call Theorem 1.1 a dimension-free result, despite the fact that the threshold $T_{0}=T_{0}(d)$ does depend on the dimension—we return to this issue below.

Also the upper bound on the error in (1.8) does not depend on the complexity (or ugliness) of the test set  $S$ . The common starting point of the torus lines $t\sqrt{d}\mathbf{e}$ , $t\geqslant 0$ , is the origin, but of course we could choose any other common starting point, since the torus is translation invariant. The order of the error term

$$\sqrt{T_{1}}\,(2+\log_{2}T_{1})^{3}=T_{1}^{1/2+o(1)}$$

is nearly square-root size, which is basically best possible. Indeed, in [1] we proved that the error term $T^{1/2-1/(2(d-1))}$ in Theorem B is best possible, apart from a polylogarithmic factor, and the exponent converges to $1/2$ as $d\rightarrow \infty$. A square-root size upper bound on the error term is very good, since uniformity requires much less: any sublinear upper bound suffices. These facts justify the claim that Theorem 1.1 is a dimension-free, start-free and complexity-free strong uniformity result.

In fact, the only dependence on the dimension $d$ in Theorem 1.1 is the threshold $T_{0}=T_{0}(d)$ , which exhibits an extremely weak dependence. Indeed, $T_{0}(d)$ is shockingly small, only a square-root logarithmic function of  $d$ . For example, if $d=10^{1000}$ then $T_{0}\leqslant 25$ .
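The numerical claim $T_{0}(10^{1000})\leqslant 25$ can be checked by solving the defining equation of $T_{0}$ in logarithmic form, $\log 100+\log d+\log T-\pi^{2}T^{2}/2=0$, which avoids overflow for astronomically large $d$. The left-hand side is strictly decreasing for $T>1/\pi$, so bisection on that range finds the larger root, the one matching the asymptotic formula. This is our own minimal sketch; `threshold_t0` is a hypothetical name, and it takes the natural logarithm $\log d$ rather than $d$ itself.

```python
import math


def threshold_t0(log_d: float, tol: float = 1e-12) -> float:
    """Larger root T_0 of 100*d*T*exp(-pi^2*T^2/2) = 1, given log(d) (natural logarithm)."""

    def g(t: float) -> float:
        # Logarithm of the left-hand side; T_0 is where it equals log(1) = 0.
        return math.log(100.0) + log_d + math.log(t) - math.pi ** 2 * t ** 2 / 2

    lo, hi = 1.0 / math.pi, 2.0   # g decreases on (1/pi, infinity), and g(1/pi) > 0 for d >= 1
    while g(hi) > 0:              # enlarge the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:          # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for exponent in (3, 24, 1000):    # d = 10**3, 10**24, 10**1000
    log_d = exponent * math.log(10)
    print(exponent, round(threshold_t0(log_d), 3), round(math.sqrt(2 * log_d) / math.pi, 3))
```

The second printed column is the leading term $\sqrt{2\log d}/\pi$; the gap between the two columns is the $o_{d}(1)$ correction.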

Perhaps the reader is wondering whether or not we need the strange threshold $T_{0}=T_{0}(d)$ in Theorem 1.1. The answer is yes, and we shall prove it in part II in the sequel.

We may call $T_{0}=T_{0}(d)$ in Theorem 1.1 the threshold for configuration space equilibrium: it is the time when the typical time evolution of a system with $N=d/3$ particles and Gaussian initial velocity distribution reaches equilibrium in the configuration space.

We derive Theorem 1.1 from the rather complicated Theorem 1.2 below, and carry out the routine deduction in part II in the sequel. Theorem 1.2 concerns the Gaussian square-integral of (1.6), given by

$$\begin{aligned}\Delta_{f}^{2}(\text{Gauss};T_{1},T_{2}) &= (2\pi)^{-d/2}\int_{\mathbf{R}^{d}}|D_{f}(\mathbf{v};T_{1},T_{2})|^{2}\,\text{e}^{-|\mathbf{v}|^{2}/2}\,d\mathbf{v}\\ &= \int_{\mathbf{S}^{d-1}}\int_{0}^{\infty}|D_{f}(\rho\mathbf{e};T_{1},T_{2})|^{2}\,\frac{\rho^{d-1}\text{e}^{-\rho^{2}/2}}{C_{d}}\,d\rho\,d\nu_{d-1}^{\ast}(\mathbf{e}),\end{aligned} \tag{1.9}$$

where $d\nu_{d-1}^{\ast}(\mathbf{e})$ denotes integration with respect to the normalized surface area on the sphere $\mathbf{S}^{d-1}$, so that $\nu_{d-1}^{\ast}(\mathbf{S}^{d-1})=1$, and where

$$C_{d}=\begin{cases}(d-2)(d-4)\cdots 2 & \text{if}~d~\text{is even},\\ \sqrt{\pi/2}\,(d-2)(d-4)\cdots 1 & \text{if}~d~\text{is odd}.\end{cases} \tag{1.10}$$

Note that $\rho^{d-1}/C_{d}$ in (1.9) equals $(2\pi)^{-d/2}$ times the surface area of the sphere of radius $\rho$ in the $d$-space, so that $\rho^{d-1}\text{e}^{-\rho^{2}/2}/C_{d}$ is the probability density of $|\mathbf{v}|$ when the vector $\mathbf{v}=\rho\mathbf{e}$ has the $d$-dimensional standard Gaussian (normal) distribution. This explains the reference to Gauss in $\Delta_{f}^{2}(\text{Gauss};T_{1},T_{2})$.
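Both descriptions of $C_{d}$ can be cross-checked numerically: (1.10) should agree with the closed form $C_{d}=2^{d/2-1}\Gamma(d/2)=\int_{0}^{\infty}\rho^{d-1}\text{e}^{-\rho^{2}/2}\,d\rho$, and the radial density in (1.9) should have total mass one. The sketch below is our own, with hypothetical function names; it uses plain floating point, so it is only meant for moderate $d$ (the double factorial overflows a float around $d\approx 300$).

```python
import math


def c_d(d: int) -> float:
    """C_d from (1.10): (d-2)(d-4)...2 for even d, sqrt(pi/2)(d-2)(d-4)...1 for odd d (empty product = 1)."""
    prod = 1.0
    for k in range(d - 2, 0, -2):
        prod *= k
    return prod if d % 2 == 0 else math.sqrt(math.pi / 2) * prod


def radial_mass(d: int, upper: float = 40.0, n: int = 100_000) -> float:
    """Midpoint-rule integral of rho^(d-1) e^(-rho^2/2) / C_d over [0, upper]."""
    c = c_d(d)
    h = upper / n
    total = 0.0
    for k in range(n):
        rho = (k + 0.5) * h
        total += rho ** (d - 1) * math.exp(-rho * rho / 2)
    return total * h / c


for d in (2, 3, 10, 11):
    closed_form = 2 ** (d / 2 - 1) * math.gamma(d / 2)
    print(d, c_d(d), closed_form, round(radial_mass(d), 8))
```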

Theorem 1.2. Let $1\leqslant U<W$ be real numbers and $d\geqslant 2$ be an integer such that

$$\text{e}^{\pi^{2}U^{2}/2}\geqslant 3\,dU. \tag{1.11}$$

Then for every test function $f\in L_{2}([0,1)^{d})$ ,

$$\begin{aligned}\Delta_{f}^{2}(\text{Gauss};U,W) &= (2\pi)^{-d/2}\int_{\mathbf{R}^{d}}\biggl(\int_{U}^{W}f(t\mathbf{v})\,dt-(W-U)\int_{I^{d}}f\,d\text{V}\biggr)^{2}\text{e}^{-|\mathbf{v}|^{2}/2}\,d\mathbf{v}\\ &\leqslant 10\,\sigma_{0}^{2}(f)\biggl\lceil\log_{2}\frac{W}{U}\biggr\rceil(W-U+1),\end{aligned} \tag{1.12}$$

where

$$\sigma_{0}^{2}(f)=\int_{I^{d}}|f|^{2}\,d\text{V}-\biggl|\int_{I^{d}}f\,d\text{V}\biggr|^{2}=\int_{I^{d}}\biggl|f(\mathbf{y})-\int_{I^{d}}f\,d\text{V}\biggr|^{2}\,d\mathbf{y}.$$

Here $\lceil z\rceil$ denotes the upper integer part of a real number  $z$ .

Note that in the special case of a characteristic function $f=\chi_{S}$, $S\subset I^{d}$, the term $\sigma_{0}^{2}(f)$ reduces to $\sigma_{0}^{2}(\chi_{S})=\operatorname{vol}_{d}(S)(1-\operatorname{vol}_{d}(S))$. Theorem 1.2 is also dimension-free, complexity-free and start-free.
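To get a feeling for (1.12) in a small case, one can estimate $\Delta_{f}^{2}(\text{Gauss};U,W)$ by Monte Carlo: draw $\mathbf{v}$ from the standard Gaussian distribution in $\mathbf{R}^{d}$, approximate $D_{f}(\mathbf{v};U,W)$ by a Riemann sum, and average the squares. The sketch below does this for $d=3$, $U=1$, $W=21$ and the hypothetical test set $S=[0,1/2)^{3}$, after checking condition (1.11). It is an illustration under our own discretization choices (all names are hypothetical), not a proof, and $d=3$ is far from the high-dimensional regime the theorem is aimed at.

```python
import math

import numpy as np


def condition_1_11(u: float, d: int) -> bool:
    """Hypothesis (1.11) of Theorem 1.2: exp(pi^2 u^2 / 2) >= 3 d u."""
    return math.exp(math.pi ** 2 * u ** 2 / 2) >= 3 * d * u


def gauss_square_discrepancy(f, mean_f, d, u, w, n_vel=500, n_steps=8000, seed=0):
    """Monte Carlo estimate of Delta_f^2(Gauss; u, w) from (1.12)."""
    rng = np.random.default_rng(seed)
    dt = (w - u) / n_steps
    t = u + dt * (np.arange(n_steps) + 0.5)            # midpoints of [u, w]
    total = 0.0
    for _ in range(n_vel):
        v = rng.standard_normal(d)                       # v ~ standard Gaussian in R^d
        points = np.mod(np.outer(t, v), 1.0)             # torus line t*v modulo one
        disc = f(points).sum() * dt - (w - u) * mean_f   # Riemann-sum version of D_f(v; u, w)
        total += disc ** 2
    return total / n_vel


def chi_s(points):
    """chi_S for S = [0, 1/2)^d."""
    return np.all(points < 0.5, axis=1).astype(float)


d, u, w = 3, 1.0, 21.0
p = 0.5 ** d                                              # vol(S)
estimate = gauss_square_discrepancy(chi_s, p, d, u, w)
bound = 10 * p * (1 - p) * math.ceil(math.log2(w / u)) * (w - u + 1)
print(condition_1_11(u, d), estimate, bound)
```

Here the right-hand side of (1.12) is about $115$, whereas the trivial bound $\bigl((W-U)\max\{p,1-p\}\bigr)^{2}$ is about $306$, so the comparison is not vacuous.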

The next section is an application of Theorem 1.2 in the very-high-dimensional configuration space.

The value of the constant 10 is of course accidental, and it is basically irrelevant in the applications. Note that $\Delta_{f}^{2}(\text{Gauss};U,W)$ is the average square-error and, intuitively speaking, we may refer to

$$(\sigma_{0}^{2}(f)(W-U))^{1/2}=\sigma_{0}(f)\sqrt{W-U}$$

as the inevitable random error.

Condition (1.11) is equivalent to

$$U\geqslant\frac{\sqrt{2\log d}}{\pi}+o(1). \tag{1.13}$$

The square-root-logarithmic bound (1.13) is the (shockingly small) threshold for configuration space equilibrium.
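To see where (1.13) comes from, take logarithms in (1.11); the following short derivation is our own reconstruction of this step. The function $\varphi(U)=\pi^{2}U^{2}/2-\log(3dU)$ has derivative $\varphi'(U)=\pi^{2}U-1/U>0$ for $U\geqslant 1$, so the set of $U\geqslant 1$ satisfying (1.11) is a half-line $[U^{\ast}(d),\infty)$, and

$$\text{(1.11)}\iff\frac{\pi^{2}U^{2}}{2}\geqslant\log d+\log(3U)\iff U\geqslant\frac{1}{\pi}\sqrt{2\log d+2\log(3U)}.$$

At the threshold, $U$ is of order $\sqrt{\log d}$, so $2\log(3U)=O(\log\log d)$, and

$$\frac{1}{\pi}\sqrt{2\log d+O(\log\log d)}=\frac{\sqrt{2\log d}}{\pi}+O\biggl(\frac{\log\log d}{\sqrt{\log d}}\biggr)=\frac{\sqrt{2\log d}}{\pi}+o(1).$$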

We apply Theorem 1.2 as a short-time ergodic theorem. It justifies the approximation “configuration space average” ${\approx}$ “short-time average” in a quantitative form.

2 Application in the high-dimensional configuration space: square-root equilibrium: fast approach and long-term stability

In the off-equilibrium ideal gas model, $N$ point particles are moving around in a cubic container, say the unit cube $[0,1)^{3}$, bouncing back and forth on the walls like billiard balls. To study the time evolution of such a large billiard system, we use the well-known geometric trick of unfolding, which converts a (zigzag) billiard orbit into a torus line. For illustration, the figure below shows the square billiard. Unfolding simply means that we keep reflecting the square itself across the side that the path hits, and thereby unfold the piecewise linear billiard path into a straight line.

Unfolding defines four reflected copies $S_{1},S_{2},S_{3},S_{4}$ of a test set $S$ , where each one of the four unit squares contains a reflected copy of the given test set $S$ . In the last step we shrink the underlying $2\times 2$ square to the unit square $I^{2}=[0,1)^{2}$ . Of course, the test set $S$ can be upgraded to any periodic test function  $f$ .

Note that the unfolding of the billiard path in the $h$ -dimensional unit cube $[0,1)^{h}$ with $h\neq 2$ can be defined in an analogous way. Formally, unfolding means the map

$$2\biggl\|\frac{x}{2}\biggr\|\rightarrow\biggl\{\frac{x}{2}\biggr\}$$

applied to each coordinate, where $\Vert z\Vert$ denotes the distance of a real number $z$ from a nearest integer and $0\leqslant \{z\}<1$ denotes the fractional part of  $z$ .

It follows from unfolding that a billiard path in the $h$ -dimensional unit cube $[0,1)^{h}$ intersects a given test set $S\subset [0,1)^{h}$ at time $t$ precisely when the corresponding torus line in the $h$ -dimensional $2\times \cdots \times 2$ (hyper)cube intersects the union of the $2^{h}$ reflected copies of  $S$ . Note that each one of the $2^{h}$ unit (hyper)cubes contains a reflected copy of the given test set  $S$ . In the last step, we shrink the $h$ -dimensional $2\times \cdots \times 2$ (hyper)cube to the unit (hyper)cube $I^{h}=[0,1)^{h}$ .

A constant speed piecewise linear point billiard motion in $I^{h}$ is defined by the equation

$$\mathbf{x}(t)=(x_{1}(t),\ldots,x_{h}(t)),\qquad x_{i}(t)=2\biggl\|\frac{y_{i}+t\alpha_{i}}{2}\biggr\|,\quad 1\leqslant i\leqslant h,\;0<t<\infty. \tag{2.1}$$

Here $\mathbf{y}=(y_{1},\ldots,y_{h})$ is the starting point and the non-zero vector $\boldsymbol{\alpha}=(\alpha_{1},\ldots,\alpha_{h})$ is the initial direction.
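Formula (2.1) is straightforward to evaluate. The following minimal sketch (NumPy; the helper names are our own) implements $\|z\|$ and the coordinatewise folding $x_{i}(t)=2\|(y_{i}+t\alpha_{i})/2\|$, and checks on a one-dimensional example that the particle reflects off the walls $0$ and $1$ as expected.

```python
import numpy as np


def dist_to_nearest_integer(z):
    """||z||: the distance from z to a nearest integer (elementwise on arrays)."""
    z = np.asarray(z, dtype=float)
    return np.abs(z - np.round(z))


def billiard_position(y, alpha, t: float) -> np.ndarray:
    """Point x(t) of (2.1) in the closed cube [0,1]^h, for start y and velocity alpha."""
    y = np.asarray(y, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return 2.0 * dist_to_nearest_integer((y + t * alpha) / 2.0)


# One dimension, start 0.2, unit speed: the particle reaches the wall 1 at t = 0.8,
# comes back to 0.8 at t = 1.0, and hits the wall 0 at t = 1.8.
for t in (0.0, 0.5, 0.8, 1.0, 1.8):
    print(t, billiard_position([0.2], [1.0], t))
```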

Motivated by the central limit theorem, it is a natural intuition to visualize snapshot equilibrium in the particle space (or gas container) as a state where the system exhibits square-root size fluctuations in particle counting. More precisely, let $S\subset [0,1)^{3}$ be an arbitrary but fixed Lebesgue measurable subset of the unit cube with volume $0<\operatorname{vol}(S)<1$ , and consider the particle-counting function

$$N_{S}(t)=\sum_{\substack{1\leqslant k\leqslant N\\ \mathbf{x}_{k}(t)\in S}}1, \tag{2.2}$$

where $\mathbf{x}_{k}(t)$ is the orbit of the $k$ th billiard; see (2.1). The billiard system exhibits square-root fluctuation if the particle-counting function (2.2) differs from the expected value $N\operatorname{vol}(S)$ by $O(\sqrt{N})$ . In other words, it is a good intuition to visualize snapshot equilibrium as a square-root fluctuation equilibrium in the particle space, or simply square-root equilibrium.
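Square-root equilibrium is easy to observe in a simulation. The sketch below is our own illustration, with hypothetical choices throughout: $N=10^{5}$ particles all start at the point $(0.1,0.1,0.1)$ (a Big Bang configuration), their initial velocities are independent standard Gaussian vectors in $\mathbf{R}^{3}$, the test set is the box $S=[0,1/2)^{3}$, and positions are computed with (2.1). It prints $N_{S}(t)-N\operatorname{vol}(S)$ at a few times next to $\sqrt{N}$; it shows single snapshots and says nothing about the long-time behaviour.

```python
import numpy as np


def billiard_positions(start, velocities, t: float) -> np.ndarray:
    """Positions (2.1) of all particles at time t; row k is the k-th particle."""
    z = (start + t * velocities) / 2.0
    return 2.0 * np.abs(z - np.round(z))


def count_in_box(points, lower, upper) -> int:
    """Particle-counting function (2.2) for the box S = [lower, upper)."""
    inside = np.all((points >= lower) & (points < upper), axis=1)
    return int(inside.sum())


rng = np.random.default_rng(1)
N = 100_000
start = np.full(3, 0.1)                    # every particle starts at (0.1, 0.1, 0.1)
velocities = rng.standard_normal((N, 3))   # independent Gaussian initial velocities
lower, upper = np.zeros(3), np.full(3, 0.5)
vol_s = 0.5 ** 3

for t in (0.0, 0.2, 1.0, 5.0):
    n_s = count_in_box(billiard_positions(start, velocities, t), lower, upper)
    print(f"t = {t}: N_S(t) - N vol(S) = {n_s - N * vol_s:+.0f}   (sqrt(N) = {N ** 0.5:.0f})")
```

At $t=0$ every particle lies in $S$, so the deviation is $N(1-\operatorname{vol}(S))$, as far from square-root size as possible.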

Using square-root equilibrium as the definition of snapshot equilibrium, the statement “once the system reaches (snapshot) equilibrium (in the particle space), it stays in (snapshot) equilibrium forever” is certainly untrue for the unlimited time evolution of a typical trajectory of the system, i.e. as $t\rightarrow \infty$. Indeed, given an arbitrary initial configuration of starting points, if the initial velocity coordinates are linearly independent over the rationals (representing typical directions), then by Kronecker’s well-known density theorem the time evolution of this individual trajectory of the system eventually violates square-root equilibrium in the worst possible way infinitely many times. We will clarify this in part II in the sequel.

We know that the $N$ -point billiard model in the cube can be reduced to the torus model via unfolding. We assume that the particles independently have Gaussian initial velocity distribution in the 3-space. In other words, the set of the $N$ particles in $I^{3}=[0,1)^{3}$ at time $t$ is

$$\begin{aligned}\mathcal{Y}(\text{Gauss};\omega;t) &= \mathcal{Y}(\text{Gauss};\rho_{1},\mathbf{e}_{1},\ldots,\rho_{N},\mathbf{e}_{N};t)\\ &= \{\mathbf{y}_{1}+\rho_{1}t\mathbf{e}_{1},\ldots,\mathbf{y}_{N}+\rho_{N}t\mathbf{e}_{N}\}\bmod 1\\ &= \{\{\mathbf{y}_{1}+\rho_{1}t\mathbf{e}_{1}\},\ldots,\{\mathbf{y}_{N}+\rho_{N}t\mathbf{e}_{N}\}\},\end{aligned} \tag{2.3}$$

where ${\mathcal{Y}}=\{\mathbf{y}_{1},\ldots ,\mathbf{y}_{N}\}\subset [0,1)^{3}$ is the $N$ -element set of initial point configuration, and the initial velocities of the particles are independent random variables having the same speed distribution with density

$$g(u)=\sqrt{\frac{2}{\pi}}\,u^{2}\text{e}^{-u^{2}/2},\quad 0\leqslant u<\infty, \tag{2.4}$$

which is the density of the speed of the three-dimensional Gaussian velocity distribution. So the trajectory of the $k$th particle is $\mathbf{y}_{k}+\rho_{k}t\mathbf{e}_{k}\in \mathbf{R}^{3}\bmod 1$, $1\leqslant k\leqslant N$, where

$$\Pr[\rho_{k}\leqslant u]=\sqrt{\frac{2}{\pi}}\int_{0}^{u}z^{2}\text{e}^{-z^{2}/2}\,dz.$$
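The distribution function in the last display has the closed form $\Pr[\rho_{k}\leqslant u]=\operatorname{erf}(u/\sqrt{2})-\sqrt{2/\pi}\,u\,\text{e}^{-u^{2}/2}$ (differentiating recovers (2.4)). The sketch below, ours and only illustrative, compares this closed form with a midpoint-rule integral of $g$ and with the empirical distribution of $|\mathbf{v}|$ for sampled standard Gaussian vectors $\mathbf{v}\in \mathbf{R}^{3}$, which is the statement that (2.4) is the speed density of the three-dimensional Gaussian velocity distribution.

```python
import math

import numpy as np


def maxwell_density(u: float) -> float:
    """g(u) from (2.4)."""
    return math.sqrt(2 / math.pi) * u ** 2 * math.exp(-u ** 2 / 2)


def maxwell_cdf(u: float) -> float:
    """Closed form of Pr[rho_k <= u]."""
    return math.erf(u / math.sqrt(2)) - math.sqrt(2 / math.pi) * u * math.exp(-u ** 2 / 2)


def midpoint_integral(func, a: float, b: float, n: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))


rng = np.random.default_rng(2)
speeds = np.linalg.norm(rng.standard_normal((200_000, 3)), axis=1)   # |v| for Gaussian v in R^3

for u in (0.5, 1.0, 1.5, 2.5):
    print(u, maxwell_cdf(u), midpoint_integral(maxwell_density, 0.0, u), np.mean(speeds <= u))
```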

On the other hand, the curve in the configuration space $I^{d}=[0,1)^{d}$ with $d=3N$ , representing the time evolution of the system (2.3), is the straight line modulo one in $\mathbf{R}^{d}$ given by

$$\vec{\mathcal{Y}}(\text{Gauss};\omega;t)=\vec{\mathcal{Y}}+t\mathbf{v}(\omega)\bmod 1, \tag{2.5}$$

where

$$\omega=(\rho_{1},\mathbf{e}_{1},\ldots,\rho_{N},\mathbf{e}_{N})\in \Omega_{\text{Gauss}}=([0,\infty)\times \mathbf{S}^{2})^{N}, \tag{2.6}$$

and

$$\mathbf{v}(\omega)=(\rho_{1}\mathbf{e}_{1},\ldots,\rho_{N}\mathbf{e}_{N}). \tag{2.7}$$

The product space $\Omega_{\text{Gauss}}$ is equipped with the product measure $\mu_{\text{Gauss}}$, where the half-line $[0,\infty)$ has the probability density function (2.4), and the sphere $\mathbf{S}^{2}$ has the normalized surface area. Here $\vec{\mathcal{Y}}$ denotes the $3N$-dimensional vector

$$\vec{\mathcal{Y}}=(\mathbf{y}_{1},\ldots,\mathbf{y}_{N}) \tag{2.8}$$

formed from the $N$ -element set of starting points of the particles ${\mathcal{Y}}\subset [0,1)^{3}$ .

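We need the well-known fact from probability theory that, under $\unicode[STIX]{x1D707}_{\text{Gauss}}$ , the velocity vector $\mathbf{v}(\unicode[STIX]{x1D714})$ in (2.7) has the $d$ -dimensional standard Gaussian distribution with $d=3N$ . This follows from the rotation invariance of the three-dimensional Gaussian distribution, which makes the speed and the direction of a three-dimensional standard Gaussian vector independent, with the direction uniformly distributed on $\mathbf{S}^{2}$ .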
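A minimal Monte Carlo sketch of this fact in Python: it draws $\unicode[STIX]{x1D70C}$ with density (2.4) as the length of one standard Gaussian vector in $\mathbf{R}^{3}$, draws $\mathbf{e}$ uniformly on $\mathbf{S}^{2}$ as the normalized direction of a second, independent one, and checks that the coordinates of $\unicode[STIX]{x1D70C}\mathbf{e}$ have the mean, variance and kurtosis of a standard Gaussian (0, 1 and 3). Matching a few moments is only consistent with Gaussianity, not a proof of it; the function names, seed and sample size are arbitrary choices.

```python
import math
import random


def sample_speed_and_direction(rng: random.Random):
    """Draw (rho, e) independently: rho with density (2.4), e uniform on S^2.

    rho is the length of one 3-D standard Gaussian vector; e is the normalized
    direction of a second, independent 3-D standard Gaussian vector.
    """
    g1 = [rng.gauss(0.0, 1.0) for _ in range(3)]
    g2 = [rng.gauss(0.0, 1.0) for _ in range(3)]
    rho = math.sqrt(sum(x * x for x in g1))
    norm2 = math.sqrt(sum(x * x for x in g2))
    e = [x / norm2 for x in g2]
    return rho, e


def velocity_coordinates(n_samples: int, seed: int = 1):
    """Collect all coordinates of rho * e over many independent draws."""
    rng = random.Random(seed)
    coords = []
    for _ in range(n_samples):
        rho, e = sample_speed_and_direction(rng)
        coords.extend(rho * c for c in e)
    return coords


xs = velocity_coordinates(100_000)
m = sum(xs) / len(xs)
var = sum((x - m) ** 2 for x in xs) / len(xs)
kurt = sum((x - m) ** 4 for x in xs) / len(xs) / var ** 2
print(m, var, kurt)  # standard Gaussian: mean 0, variance 1, kurtosis 3
```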

Let $B\subset I^{3}=[0,1)^{3}$ be an arbitrary but fixed measurable test set in the gas container, and let $\operatorname{vol}(B)$ denote its three-dimensional Lebesgue measure. Assume that $N$ is large. Is it then true that, once a typical time evolution of the (Gaussian torus) system reaches square-root equilibrium in the particle space, it stays in that state in the quantitative sense (with factor 30, say)

$$\begin{eqnarray}||{\mathcal{Y}}(\text{Gauss};\unicode[STIX]{x1D714};t)\cap B|-\operatorname{vol}(B)N|\leqslant 30\sqrt{N},\end{eqnarray}$$

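for an extremely long time (with the possible exception of a totally negligible set of values of $t$ )? The factor 30 is arbitrary, and square-root equilibrium is the best that we can hope for, since in equilibrium the number of particles in $B$ is binomially distributed with standard deviation $\sqrt{\operatorname{vol}(B)(1-\operatorname{vol}(B))N}$ .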

By using Theorem 1.2, in particular (1.12), we give a positive answer to this question. We use Theorem 1.2 as a short-time ergodic theorem in the configuration space: the short-term time average nearly equals the configuration space average. The good news is that the configuration space average can be computed easily by a direct application of probability theory, since the configuration space is a product space with product measure; see the application of Bernstein’s large deviation inequality in (2.11) below. Note that Birkhoff’s ergodic theorem has no explicit error term and applies only to typical initial conditions; but a typical initial condition already represents equilibrium, which is the trivial case here, since we are studying off-equilibrium dynamics. In contrast, Theorem 1.2 works for an arbitrary off-equilibrium initial configuration, and it has an explicit error term, so we can use it to describe the time evolution in realistic time. The details go as follows.

The family of time evolutions ${\mathcal{Y}}(\text{Gauss};\unicode[STIX]{x1D714};t)$ , $\unicode[STIX]{x1D714}\in \unicode[STIX]{x1D6FA}_{\text{Gauss}}$ , of the three-dimensional Gaussian torus model (in the particle space $I^{3}$ ) is represented by the family of torus lines (2.5) in the configuration space  $I^{d}$ , all starting from the same point $\vec{{\mathcal{Y}}}\in I^{d}$ ; see also (2.6)–(2.8).
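To make the setting concrete, here is a small Python simulation of the Gaussian torus model started from a "Big Bang" configuration. It is an illustration only: $N=10^{4}$, the common starting point $(0.9,0.9,0.9)$ and the test box $B=[0,1/2)^{3}$ are hypothetical choices, and the velocities are drawn directly as standard Gaussian vectors in $\mathbf{R}^{3}$, which by the fact recalled above is equivalent to drawing $(\unicode[STIX]{x1D70C}_{k},\mathbf{e}_{k})$ . At $t=0$ no particle lies in $B$ , so the normalized deviation equals $-\sqrt{N}/8=-12.5$ ; at $t=4$ it should be of the size of a binomial fluctuation, far inside the window $\pm 30$ .

```python
import math
import random


def gaussian_torus_positions(start, velocities, t):
    """Positions y + t v_k (mod 1) at time t, all particles starting at the same point y."""
    return [[(start[i] + t * v[i]) % 1.0 for i in range(3)] for v in velocities]


def count_in_box(points, box):
    """Number of points in the axis-parallel box [lo_1, hi_1) x [lo_2, hi_2) x [lo_3, hi_3)."""
    return sum(all(lo <= p[i] < hi for i, (lo, hi) in enumerate(box)) for p in points)


N = 10_000
rng = random.Random(2017)
# Standard Gaussian velocities in R^3, equivalent to (rho_k, e_k) with density (2.4).
velocities = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(N)]
start = [0.9, 0.9, 0.9]   # hypothetical "Big Bang" point, outside B
box = [(0.0, 0.5)] * 3    # hypothetical test set B with vol(B) = 1/8
vol_B = 0.125

deviations = {}
for t in (0.0, 0.05, 4.0):
    pts = gaussian_torus_positions(start, velocities, t)
    deviations[t] = count_in_box(pts, box) - vol_B * N
    print(f"t = {t}: (count - vol(B) N) / sqrt(N) = {deviations[t] / math.sqrt(N):+.2f}")
```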

For an arbitrary $\unicode[STIX]{x1D6FE}>0$ , define the (extremely complicated) set

(2.9) $$\begin{eqnarray}S(B;\unicode[STIX]{x1D6FE})=\{\vec{{\mathcal{Z}}}\in I^{d}:||{\mathcal{Z}}\cap B|-\operatorname{vol}(B)N|>\unicode[STIX]{x1D6FE}\sqrt{N}\}\end{eqnarray}$$

in the configuration space, where

$$\begin{eqnarray}\vec{{\mathcal{Z}}}=(z_{1},\ldots ,z_{3N})\quad \text{and}\quad {\mathcal{Z}}=\{\mathbf{z}_{1},\ldots ,\mathbf{z}_{N}\!\}\end{eqnarray}$$

with $\mathbf{z}_{k}=(z_{3k-2},z_{3k-1},z_{3k})$ , $1\leqslant k\leqslant N$ .

Recall the following large deviation inequality in probability theory; see Feller [3].

Bernstein’s inequality. Let $X_{1},\ldots ,X_{n}$ be independent random variables with the Bernoulli distribution $\Pr [X_{i}=1]=p$ and $\Pr [X_{i}=0]=q=1-p$ . Then for every positive $\unicode[STIX]{x1D6FE}$ ,

(2.10) $$\begin{eqnarray}\displaystyle \Pr \biggl[\biggl|\mathop{\sum }_{i=1}^{n}(X_{i}-p)\biggr|\geqslant \unicode[STIX]{x1D6FE}\sqrt{npq}\biggr] & = & \displaystyle \mathop{\sum }_{\substack{ 0\leqslant k\leqslant n \\ |k-pn|\geqslant \unicode[STIX]{x1D6FE}\sqrt{pqn}}}\binom{n}{k}p^{k}q^{n-k}\nonumber\\ \displaystyle & {\leqslant} & \displaystyle 2\exp \biggl(-\frac{\unicode[STIX]{x1D6FE}^{2}/2}{1+\unicode[STIX]{x1D6FE}/(3\sqrt{npq})}\biggr),\qquad \quad\end{eqnarray}$$

where $\sqrt{npq}$ is the standard deviation of the binomial distribution.
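A quick numerical check of (2.10) in Python: the sketch computes the exact binomial tail by summing binomial probabilities and compares it with the Bernstein bound. The parameters $n=400$ , $p=1/8$ and the values of $\unicode[STIX]{x1D6FE}$ are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math


def binomial_tail(n: int, p: float, gamma: float) -> float:
    """Exact Pr[|S_n - p n| >= gamma sqrt(n p q)] for S_n ~ Binomial(n, p)."""
    q = 1.0 - p
    thresh = gamma * math.sqrt(n * p * q)
    return sum(math.comb(n, k) * p**k * q**(n - k)
               for k in range(n + 1) if abs(k - p * n) >= thresh)


def bernstein_bound(n: int, p: float, gamma: float) -> float:
    """Right-hand side of (2.10): 2 exp(-(gamma^2 / 2) / (1 + gamma / (3 sqrt(n p q))))."""
    sd = math.sqrt(n * p * (1.0 - p))
    return 2.0 * math.exp(-(gamma**2 / 2.0) / (1.0 + gamma / (3.0 * sd)))


for gamma in (1.0, 2.0, 3.0, 5.0):
    print(gamma, binomial_tail(400, 0.125, gamma), bernstein_bound(400, 0.125, gamma))
```

The bound holds for every $\unicode[STIX]{x1D6FE}>0$ because $|X_{i}-p|\leqslant 1$ , which is the boundedness condition in Bernstein's inequality.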

Using (2.10) with $n=N$ , $p=\operatorname{vol}(B)$ and $\unicode[STIX]{x1D6FE}/\sqrt{p(1-p)}$ in place of $\unicode[STIX]{x1D6FE}$ , we have

(2.11) $$\begin{eqnarray}\displaystyle \operatorname{vol}_{d}(S(B;\unicode[STIX]{x1D6FE})) & {\leqslant} & \displaystyle 2\exp \biggl(-\frac{\unicode[STIX]{x1D6FE}^{2}(2p(1-p))^{-1}}{1+\unicode[STIX]{x1D6FE}/(3p(1-p)\sqrt{N})}\biggr)\nonumber\\ \displaystyle & {\leqslant} & \displaystyle 2\exp \biggl(-\frac{2\unicode[STIX]{x1D6FE}^{2}}{1+2\unicode[STIX]{x1D6FE}/\sqrt{N}}\biggr),\end{eqnarray}$$

where the last inequality follows from the simple fact that $p(1-p)\leqslant 1/4$ . We can apply Bernstein’s inequality because $\operatorname{vol}_{d}$ is a product measure: the $d=3N$ -dimensional volume $\operatorname{vol}_{d}(S(B;\unicode[STIX]{x1D6FE}))$ is a large deviation probability for the $N$ independent indicator variables of the events $\mathbf{z}_{k}\in B$ , $1\leqslant k\leqslant N$ , each equal to $1$ with probability $p=\operatorname{vol}(B)$ .

For example,

(2.12) $$\begin{eqnarray}\text{if}~\unicode[STIX]{x1D6FE}=30~\text{and}~N\geqslant 10^{6},\text{then (2.11) gives}~2\exp \biggl(-\frac{2\unicode[STIX]{x1D6FE}^{2}}{1+2\unicode[STIX]{x1D6FE}/\sqrt{N}}\biggr)<10^{-700},\end{eqnarray}$$

which is extremely small. The long-term stability of, say, 30-square-root equilibrium (in the particle space) is based on this striking numerical fact.
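The numerical claim (2.12) can be checked in Python by working with base-10 logarithms. Evaluating the bound directly with `math.exp` would underflow to `0.0` in double precision, since $\text{e}^{-1698}$ is far below the smallest positive double. The helper name is hypothetical.

```python
import math


def log10_bound(gamma: float, N: float) -> float:
    """log10 of 2 exp(-2 gamma^2 / (1 + 2 gamma / sqrt(N))), the last bound in (2.11)."""
    exponent = 2.0 * gamma**2 / (1.0 + 2.0 * gamma / math.sqrt(N))
    return math.log10(2.0) - exponent / math.log(10.0)


for N in (1e6, 1e9, 1e27):
    print(f"N = {N:.0e}: log10(bound) = {log10_bound(30.0, N):.1f}")
```

For $N=10^{6}$ the logarithm is about $-737.2$ , and it only decreases as $N$ grows, which is why (2.12) is stated for all $N\geqslant 10^{6}$ .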

Since Lebesgue measure on the torus $I^{d}$ is translation invariant, we can apply Theorem 1.2 with $f=\unicode[STIX]{x1D712}_{S}$ , where $S=S(B;\unicode[STIX]{x1D6FE})-\vec{{\mathcal{Y}}}$ is a translated copy of $S(B;\unicode[STIX]{x1D6FE})$ in the torus $I^{d}$ . Clearly $\operatorname{vol}_{d}(S)=\operatorname{vol}_{d}(S(B;\unicode[STIX]{x1D6FE}))$ . Using (2.11) in Theorem 1.2 with $W=2^{k}U$ , we obtain

(2.13) $$\begin{eqnarray}\displaystyle & & \displaystyle (2\unicode[STIX]{x1D70B})^{-d/2}\int _{\mathbf{R}^{d}}\biggl(\int _{U}^{2^{k}U}\unicode[STIX]{x1D712}_{S}(t\mathbf{v})\,dt-\operatorname{vol}_{d}(S)(2^{k}-1)U\biggr)^{2}\text{e}^{-|\mathbf{v}|^{2}/2}\,d\mathbf{v}\nonumber\\ \displaystyle & & \displaystyle \quad =(2\unicode[STIX]{x1D70B})^{-d/2}\int _{\mathbf{R}^{d}}(D_{S}(\mathbf{v};U,2^{k}U))^{2}\text{e}^{-|\mathbf{v}|^{2}/2}\,d\mathbf{v}\nonumber\\ \displaystyle & & \displaystyle \quad \leqslant 2\exp \biggl(-\frac{2\unicode[STIX]{x1D6FE}^{2}}{1+2\unicode[STIX]{x1D6FE}/\sqrt{N}}\biggr)10k((2^{k}-1)U+1),\end{eqnarray}$$

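assuming, of course, that $U\geqslant 1$ and $\text{e}^{\unicode[STIX]{x1D70B}^{2}U^{2}/2}\geqslant 3\,dU$ . By (2.5)–(2.9), the fact that $\mathbf{v}(\unicode[STIX]{x1D714})$ has the $d$ -dimensional standard Gaussian distribution under $\unicode[STIX]{x1D707}_{\text{Gauss}}$ , and $S=S(B;\unicode[STIX]{x1D6FE})-\vec{{\mathcal{Y}}}$ , we have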

(2.14) $$\begin{eqnarray}\displaystyle & & \displaystyle (2\unicode[STIX]{x1D70B})^{-d/2}\int _{\mathbf{R}^{d}}\biggl(\int _{U}^{2^{k}U}\unicode[STIX]{x1D712}_{S}(t\mathbf{v})\,dt-\operatorname{vol}_{d}(S)(2^{k}-1)U\biggr)^{2}\text{e}^{-|\mathbf{v}|^{2}/2}\,d\mathbf{v}\nonumber\\ \displaystyle & & \displaystyle \quad =\int _{\unicode[STIX]{x1D6FA}_{\text{Gauss}}}\biggl(\int _{U}^{2^{k}U}\unicode[STIX]{x1D712}_{S}(t\mathbf{v}(\unicode[STIX]{x1D714}))\,dt-\operatorname{vol}_{d}(S)(2^{k}-1)U\biggr)^{2}\,d\unicode[STIX]{x1D707}_{\text{Gauss}}(\unicode[STIX]{x1D714})\nonumber\\ \displaystyle & & \displaystyle \quad =\int _{\unicode[STIX]{x1D6FA}_{\text{Gauss}}}\!\!(\operatorname{length}\{U\,\leqslant \,t\,\leqslant \,2^{k}U:||{\mathcal{Y}}(\text{Gauss};\unicode[STIX]{x1D714};t)\,\cap \,B|\,-\,\text{vol}(B)N|\,>\,\unicode[STIX]{x1D6FE}\sqrt{N}\}\nonumber\\ \displaystyle & & \displaystyle \qquad -\,\text{vol}_{d}(S(B;\unicode[STIX]{x1D6FE}))(2^{k}-1)U)^{2}\,d\unicode[STIX]{x1D707}_{\text{Gauss}}(\unicode[STIX]{x1D714}).\qquad \quad\end{eqnarray}$$

Combining (2.13) and (2.14), we obtain the following result.

Theorem 2.1. Let ${\mathcal{Y}}(\text{Gauss};\unicode[STIX]{x1D714};t)$ , $\unicode[STIX]{x1D714}\in \unicode[STIX]{x1D6FA}_{\text{Gauss}}$ , be the three-dimensional Gaussian torus model, and let $B\subset [0,1)^{3}$ be a measurable test set with three-dimensional Lebesgue measure $\operatorname{vol}(B)$ . Assume that

$$\begin{eqnarray}U\geqslant 1\quad \text{and}\quad \text{e}^{\unicode[STIX]{x1D70B}^{2}U^{2}/2}\geqslant 3\,dU.\end{eqnarray}$$

Then for every $\unicode[STIX]{x1D6FE}>0$ and every integer $k\geqslant 1$ ,

(2.15) $$\begin{eqnarray}\displaystyle & & \displaystyle \int _{\unicode[STIX]{x1D6FA}_{\text{Gauss}}}(\operatorname{length}\{U\leqslant t\leqslant 2^{k}U:||{\mathcal{Y}}(\text{Gauss};\unicode[STIX]{x1D714};t)\cap B|-\operatorname{vol}(B)N|>\unicode[STIX]{x1D6FE}\sqrt{N}\}\nonumber\\ \displaystyle & & \displaystyle \qquad -\,\operatorname{vol}_{d}(S(B;\unicode[STIX]{x1D6FE}))(2^{k}-1)U)^{2}\,d\unicode[STIX]{x1D707}_{\text{Gauss}}(\unicode[STIX]{x1D714})\nonumber\\ \displaystyle & & \displaystyle \quad \leqslant 2\exp \biggl(-\frac{2\unicode[STIX]{x1D6FE}^{2}}{1+2\unicode[STIX]{x1D6FE}/\sqrt{N}}\biggr)10k((2^{k}-1)U+1),\end{eqnarray}$$

where

$$\begin{eqnarray}\operatorname{vol}_{d}(S(B;\unicode[STIX]{x1D6FE}))<2\exp \biggl(-\frac{2\unicode[STIX]{x1D6FE}^{2}}{1+2\unicode[STIX]{x1D6FE}/\sqrt{N}}\biggr).\end{eqnarray}$$

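The reader may not find Theorem 2.1 very pretty, but it is an extremely powerful result. To illustrate its power, let $\unicode[STIX]{x1D6FE}=30$ , $U=4$ , $k=100$ and $N=10^{27}$ , so that $d=3N=3\cdot 10^{27}$ . The hypothesis of Theorem 2.1 holds, since $\text{e}^{\unicode[STIX]{x1D70B}^{2}U^{2}/2}=\text{e}^{8\unicode[STIX]{x1D70B}^{2}}>10^{34}>3dU=3.6\cdot 10^{28}$ . Then by (2.12) and (2.15),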

(2.16) $$\begin{eqnarray}\displaystyle & & \displaystyle \int _{\unicode[STIX]{x1D6FA}_{\text{Gauss}}}(\operatorname{length}\{4\leqslant t\leqslant 4\cdot 2^{100}:||{\mathcal{Y}}(\text{Gauss};\unicode[STIX]{x1D714};t)\cap B|-\operatorname{vol}(B)N|>30\sqrt{N}\}\nonumber\\ \displaystyle & & \displaystyle \qquad -\,\operatorname{vol}_{d}(S(B;30))(2^{100}-1)4)^{2}\,d\unicode[STIX]{x1D707}_{\text{Gauss}}(\unicode[STIX]{x1D714})\nonumber\\ \displaystyle & & \displaystyle \quad \leqslant 10^{-700}\cdot 400\cdot 2^{100}\cdot 20<10^{-661}.\end{eqnarray}$$
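The same logarithmic bookkeeping confirms, in Python, that the parameters $\unicode[STIX]{x1D6FE}=30$ , $U=4$ , $k=100$ , $N=10^{27}$ satisfy the hypothesis of Theorem 2.1, and that the right-hand side of (2.15) lies below $10^{-661}$ even when the factor $10k((2^{k}-1)U+1)$ is used exactly rather than the cruder $400\cdot 2^{100}\cdot 20$ . The variable names are illustrative.

```python
import math

gamma, U, k, N = 30.0, 4, 100, 10**27
d = 3 * N

# Hypothesis of Theorem 2.1, compared in log10 form: e^{pi^2 U^2 / 2} >= 3 d U.
lhs_log10 = (math.pi**2 * U**2 / 2.0) / math.log(10.0)
rhs_log10 = math.log10(3 * d * U)  # exact integer, converted inside log10
hypothesis_holds = U >= 1 and lhs_log10 >= rhs_log10

# log10 of the right-hand side of (2.15): 2 exp(...) * 10 k ((2^k - 1) U + 1).
exponent = 2.0 * gamma**2 / (1.0 + 2.0 * gamma / math.sqrt(N))
log10_rhs = (math.log10(2.0) - exponent / math.log(10.0)
             + math.log10(10 * k * ((2**k - 1) * U + 1)))

print(hypothesis_holds, round(lhs_log10, 2), round(rhs_log10, 2), round(log10_rhs, 1))
```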

Let $\unicode[STIX]{x1D6FA}_{\text{Gauss}}^{(\text{bad})}$ be the set of those $\unicode[STIX]{x1D714}\in \unicode[STIX]{x1D6FA}_{\text{Gauss}}$ for which

(2.17) $$\begin{eqnarray}\operatorname{length}\{4\leqslant t\leqslant 4\cdot 2^{100}:||{\mathcal{Y}}(\text{Gauss};\unicode[STIX]{x1D714};t)\cap B|-\operatorname{vol}(B)N|>30\sqrt{N}\}\geqslant 10^{-220}.\end{eqnarray}$$

We claim that (2.16) implies

(2.18) $$\begin{eqnarray}\unicode[STIX]{x1D707}_{\text{Gauss}}(\unicode[STIX]{x1D6FA}_{\text{Gauss}}^{(\text{bad})})\leqslant 10^{-220}.\end{eqnarray}$$

Indeed, otherwise

$$\begin{eqnarray}\displaystyle & & \displaystyle \int _{\unicode[STIX]{x1D6FA}_{\text{Gauss}}}(\operatorname{length}\{4\leqslant t\leqslant 4\cdot 2^{100}:||{\mathcal{Y}}(\text{Gauss};\unicode[STIX]{x1D714};t)\cap B|-\operatorname{vol}(B)N|>30\sqrt{N}\}\nonumber\\ \displaystyle & & \displaystyle \qquad -\,\operatorname{vol}_{d}(S(B;30))(2^{100}-1)4)^{2}\,d\unicode[STIX]{x1D707}_{\text{Gauss}}(\unicode[STIX]{x1D714})\nonumber\\ \displaystyle & & \displaystyle \quad \geqslant \int _{\unicode[STIX]{x1D6FA}_{\text{Gauss}}^{(\text{bad})}} (\operatorname{length}\!\{\!4\leqslant t\leqslant 4\cdot 2^{100}:||{\mathcal{Y}}(\text{Gauss};\unicode[STIX]{x1D714};t)\cap B|\nonumber\\ \displaystyle & & \displaystyle \qquad -\operatorname{vol}(B)N|>30\sqrt{N}\!\}-{\operatorname{vol}_{d}(S(B;30))(2^{100}-1)4)}^{2}\,d\unicode[STIX]{x1D707}_{\text{Gauss}}(\unicode[STIX]{x1D714})\nonumber\\ \displaystyle & & \displaystyle \quad \geqslant 10^{-220}(10^{-220}-10^{-600})^{2}>10^{-661},\nonumber\end{eqnarray}$$

which contradicts (2.16). In the last step we have used the fact that

$$\begin{eqnarray}\operatorname{vol}_{d}(S(B;30))(2^{100}-1)4\leqslant 10^{-700}\cdot 10^{31}<10^{-600}.\end{eqnarray}$$
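The contradiction argument above is an instance of Chebyshev's inequality, and it can also be run directly. As a sketch, write $X(\unicode[STIX]{x1D714})$ for the length on the left-hand side of (2.17), $a=10^{-220}$ and $c=\operatorname{vol}_{d}(S(B;30))(2^{100}-1)4$ , so that $c<10^{-600}<a/2$ (the letters $X$ , $a$ , $c$ are introduced here only for this illustration). Since $X\geqslant a$ implies $(X-c)^{2}\geqslant (a-c)^{2}>(a/2)^{2}$ , (2.16) gives

$$\begin{eqnarray}\unicode[STIX]{x1D707}_{\text{Gauss}}(\unicode[STIX]{x1D6FA}_{\text{Gauss}}^{(\text{bad})})\leqslant \frac{1}{(a-c)^{2}}\int _{\unicode[STIX]{x1D6FA}_{\text{Gauss}}}(X-c)^{2}\,d\unicode[STIX]{x1D707}_{\text{Gauss}}<\frac{10^{-661}}{(10^{-220}/2)^{2}}=4\cdot 10^{-221}<10^{-220},\end{eqnarray}$$

in agreement with (2.18).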

Note that the choice $N=10^{27}$ is realistic in order of magnitude: a cubic box of volume $1~\text{m}^{3}$ holds roughly $10^{27}$ gas molecules at room temperature and a pressure of about 40 atmospheres (at standard temperature and pressure the number is about $2.7\cdot 10^{25}$ ). In the classical Bernoulli gas model, the gas molecules are represented by point billiards, and by unfolding we can reduce the billiard model to the torus model. The threshold $U=4$ here represents, roughly speaking, the relaxation distance, i.e. the number of jumps per particle in the torus model (equal to half of the number of bounces in the billiard model) needed to reach square-root equilibrium (in the particle space) for the typical time evolution of the Gaussian system. If the gas molecules have average speed $10^{3}~\text{m s}^{-1}$ , then this Gaussian system takes only a few milliseconds to reach square-root equilibrium. Now (2.17) and (2.18) have the following interpretation. For an arbitrary (measurable) test set $B\subset [0,1)^{3}$ in the gas container (or particle space) and an arbitrary $N$ -element initial point configuration ${\mathcal{Y}}$ , for the overwhelming majority of the initial velocities (Gaussian distribution), the number of particles in $B$ remains very close to the expected value $\operatorname{vol}(B)N$ for an extremely long time, with the possible exception of a totally negligible set of times $t$ .

More precisely, fix a (measurable) test set $B\subset [0,1)^{3}$ and an $N=10^{27}$ -element initial point configuration ${\mathcal{Y}}\subset [0,1)^{3}$ , and consider the subset $\unicode[STIX]{x1D6FA}_{\text{Gauss}}^{(\text{good})}$ , where

$$\begin{eqnarray}\unicode[STIX]{x1D6FA}_{\text{Gauss}}^{(\text{good})}=\unicode[STIX]{x1D6FA}_{\text{Gauss}}\setminus \unicode[STIX]{x1D6FA}_{\text{Gauss}}^{(\text{bad})}\end{eqnarray}$$

with

$$\begin{eqnarray}\unicode[STIX]{x1D707}_{\text{Gauss}}(\unicode[STIX]{x1D6FA}_{\text{Gauss}}^{(\text{good})})\geqslant 1-10^{-220},\end{eqnarray}$$

which holds by (2.18). This set represents an overwhelming majority of the initial velocities, and for every $\unicode[STIX]{x1D714}\in \unicode[STIX]{x1D6FA}_{\text{Gauss}}^{(\text{good})}$ the inequality

(2.19) $$\begin{eqnarray}||{\mathcal{Y}}(\text{Gauss};\unicode[STIX]{x1D714};t)\cap B|-\operatorname{vol}(B)10^{27}|\leqslant 30\cdot 10^{13.5}\end{eqnarray}$$

holds for every $4\leqslant t\leqslant 4\cdot 2^{100}$ with the possible exception of a set of times $t$ of total length less than $10^{-220}$ ; see (2.17). The latter actually represents less than $10^{-223}$ seconds, which is a ridiculously short time.

Note that $4\leqslant t\leqslant 4\cdot 2^{100}$ represents a time span of the order of $10^{27}$ seconds, which is an incredibly long time: more than a billion times the age of the universe.

Finally, by (2.19),

$$\begin{eqnarray}\biggl|\frac{1}{N}|{\mathcal{Y}}(\text{Gauss};\unicode[STIX]{x1D714};t)\cap B|-\operatorname{vol}(B)\biggr|\leqslant 3\cdot 10^{-12.5}<10^{-12},\end{eqnarray}$$

which can be interpreted as almost constant density for an incredibly long time.
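A short Python sketch of the unit conversions behind these figures. It assumes, as a rough convention not stated in the text, that one unit of $t$ corresponds to about $10^{-3}$ seconds (a box of side about 1 m crossed at about $10^{3}~\text{m s}^{-1}$ ), and it takes the age of the universe to be about 13.8 billion years; all constant names are hypothetical.

```python
import math

SECONDS_PER_UNIT = 1e-3  # assumption: box side ~1 m, speeds ~10^3 m/s
AGE_OF_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600  # about 4.35e17 seconds
N = 10**27

horizon_s = 4 * 2**100 * SECONDS_PER_UNIT  # upper end of 4 <= t <= 4 * 2^100
relaxation_s = 4 * SECONDS_PER_UNIT        # t = U = 4
exceptional_s = 1e-220 * SECONDS_PER_UNIT  # total length of exceptional times
relative_dev = 30 * math.sqrt(N) / N       # density deviation from (2.19)

print(f"horizon     ~ {horizon_s:.2e} s = {horizon_s / AGE_OF_UNIVERSE_S:.1e} universe ages")
print(f"relaxation  ~ {relaxation_s * 1e3:.0f} ms")
print(f"exceptional ~ {exceptional_s:.0e} s")
print(f"relative deviation of the density: {relative_dev:.2e}")
```

Under this convention the horizon comes out at about $5\cdot 10^{27}$ seconds, consistent in order of magnitude with the figure quoted above, and the relative deviation $30/\sqrt{N}\approx 9.5\cdot 10^{-13}$ stays below $10^{-12}$ .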

What happens if we want to prove long-term stability of square-root equilibrium with respect to a whole family of nice sets instead of a fixed measurable test set? Of course, we cannot expect a system to stay in square-root equilibrium with respect to all measurable test sets simultaneously. What we may expect, however, is that, starting from an arbitrary but fixed initial configuration ${\mathcal{Y}}$ , and after reaching configuration space equilibrium, the typical time evolution of the system stays in square-root equilibrium in the particle space with respect to all nice test sets simultaneously for a very, very long time, without any violator time instant $t$ . Again using Theorem 1.2, we prove such a result in Part II, the sequel to this paper. In fact, this paper is the first in a series of papers devoted to the applications of Theorem 1.2, and of its extensions beyond the Gaussian case, to describe the time evolution of large off-equilibrium systems.

3 Proof of Theorem 1.2

For technical reasons, it is convenient to prove first a special case with an upper bound on the ratio  $W/U$ .

Theorem 3.1. Let $f\in L_{2}([0,1)^{d})$ be a test function. If $1\leqslant U<V\leqslant 2U$ and $\text{e}^{\unicode[STIX]{x1D70B}^{2}U^{2}/2}\geqslant 3\,dU$ , then

$$\begin{eqnarray}\unicode[STIX]{x1D6E5}_{f}^{2}(\text{Gauss};U,V)\leqslant \unicode[STIX]{x1D70E}_{0}^{2}(f)(9(V-U)+1).\end{eqnarray}$$

The technical restriction $U<V\leqslant 2U$ in Theorem 3.1 can be easily eliminated by a routine application of the Cauchy–Schwarz inequality; see the end of §3.
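One way such a routine argument can go is sketched below. This is an illustration under the assumption that $\unicode[STIX]{x1D6E5}_{f}^{2}(\text{Gauss};U,W)$ is the Gaussian average of $|D_{f}(\mathbf{v};U,W)|^{2}$ , as on the left-hand side of (2.13); it need not coincide with the version at the end of §3. For $W=2^{k}U$ , split $[U,W]$ into the dyadic pieces $[2^{j-1}U,2^{j}U]$ , $1\leqslant j\leqslant k$ . Since $D_{f}$ is additive over adjacent time intervals, the Cauchy–Schwarz inequality gives $|D_{f}(\mathbf{v};U,2^{k}U)|^{2}\leqslant k\sum _{j=1}^{k}|D_{f}(\mathbf{v};2^{j-1}U,2^{j}U)|^{2}$ . Each piece satisfies the hypotheses of Theorem 3.1, because $2^{j-1}U\geqslant U\geqslant 1$ and $x\mapsto \text{e}^{\unicode[STIX]{x1D70B}^{2}x^{2}/2}/(3\,dx)$ is increasing for $x\geqslant 1$ . Averaging over $\mathbf{v}$ and applying Theorem 3.1 to each piece,

$$\begin{eqnarray}\unicode[STIX]{x1D6E5}_{f}^{2}(\text{Gauss};U,2^{k}U)\leqslant k\mathop{\sum }_{j=1}^{k}\unicode[STIX]{x1D70E}_{0}^{2}(f)(9\cdot 2^{j-1}U+1)=k\unicode[STIX]{x1D70E}_{0}^{2}(f)(9(2^{k}-1)U+k)\leqslant 10k\unicode[STIX]{x1D70E}_{0}^{2}(f)((2^{k}-1)U+1),\end{eqnarray}$$

where the last step uses $k\leqslant 2^{k}-1\leqslant (2^{k}-1)U$ . This has the same shape as the factor $10k((2^{k}-1)U+1)$ appearing in (2.13).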

To prove Theorem 3.1, we shall use Fourier analysis in the configuration space $I^{d}=[0,1)^{d}$ . For high dimension $d$ , this leads to technical difficulties that are combinatorial in nature.

Let $f\in L_{2}(I^{d})$ be a Lebesgue square-integrable function on the $d$ -dimensional unit torus. We extend $f$ periodically to the whole $d$ -space $\mathbf{R}^{d}$ and consider the Fourier expansion

(3.1) $$\begin{eqnarray}f(\mathbf{u})=\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}}a_{\mathbf{n}}\text{e}^{2\unicode[STIX]{x1D70B}\text{i}\mathbf{n}\cdot \mathbf{u}},\end{eqnarray}$$

where

(3.2) $$\begin{eqnarray}a_{\mathbf{n}}=\int _{I^{d}}f(\mathbf{y})\text{e}^{-2\unicode[STIX]{x1D70B}\text{i}\mathbf{n}\cdot \mathbf{y}}\,d\mathbf{y},\quad \mathbf{n}\in \mathbf{Z}^{d},\end{eqnarray}$$

are the Fourier coefficients. Here $\mathbf{v}\cdot \mathbf{w}$ denotes the dot product of $\mathbf{v}$ and  $\mathbf{w}$ .

Clearly

(3.3) $$\begin{eqnarray}a_{\mathbf{0}}=\int _{I^{d}}f(\mathbf{y})\,d\mathbf{y}\quad \text{and so}\quad \mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2}=\unicode[STIX]{x1D70E}_{0}^{2}(f),\end{eqnarray}$$

where we have used Parseval’s formula.
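Formula (3.3) has an exact discrete analogue that is easy to test. For samples of a function on an $M\times M$ grid of the two-dimensional torus, the discrete Fourier coefficients satisfy Parseval's identity, so the sum of $|a_{\mathbf{n}}|^{2}$ over $\mathbf{n}\neq \mathbf{0}$ equals the variance of the samples, which plays the role of $\unicode[STIX]{x1D70E}_{0}^{2}(f)$ . The following Python sketch checks this; the grid size and the sampled function are arbitrary, and the naive quadruple loop is used only for transparency.

```python
import cmath
import math


def dft2(values, M):
    """Discrete coefficients a_n = M^{-2} sum_y f(y) e^{-2 pi i n.y} on the grid {j/M}^2."""
    coeffs = {}
    for n1 in range(M):
        for n2 in range(M):
            s = 0j
            for j1 in range(M):
                for j2 in range(M):
                    s += values[j1][j2] * cmath.exp(-2j * math.pi * (n1 * j1 + n2 * j2) / M)
            coeffs[(n1, n2)] = s / M**2
    return coeffs


M = 8
# An arbitrary test function sampled at the grid points (j1/M, j2/M).
f = [[math.sin(2 * math.pi * j1 / M) + (j2 % 3) * 0.5 + (j1 * j2) % 5 for j2 in range(M)]
     for j1 in range(M)]
a = dft2(f, M)
mean = sum(map(sum, f)) / M**2
variance = sum((x - mean) ** 2 for row in f for x in row) / M**2
nonzero_energy = sum(abs(c) ** 2 for n, c in a.items() if n != (0, 0))
print(abs(a[(0, 0)] - mean), variance, nonzero_energy)
```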

By (3.1), (3.2) and noting that $\mathbf{e}\in \mathbf{S}^{d-1}$ is a unit vector in the $d$ -space, we have

(3.4) $$\begin{eqnarray}f(t\sqrt{d}\mathbf{e})-\int _{I^{d}}f\,d\text{V}=\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}a_{\mathbf{n}}\text{e}^{2\unicode[STIX]{x1D70B}\text{i}t\sqrt{d}\mathbf{n}\cdot \mathbf{e}}.\end{eqnarray}$$

Remark.

Notice that (3.4) is an informal equality. The infinite sum on the right-hand side may be divergent for some unit vector $\mathbf{e}\in \mathbf{S}^{d-1}$ in the $d$ -space. To avoid this kind of technical nuisance, we use the well-known fact that the trigonometric polynomials are dense in the $L_{2}$ -space. We proceed in two steps. The first step is to prove the theorem in the special case where $f$ is a trigonometric polynomial (in $d$ variables). Then it is trivial to carry out the usual manipulations, e.g. changing the order of finite summation and integration. The second step is the routine limiting process. The class of trigonometric polynomials forms a dense subset of the Hilbert space $L_{2}(I^{d})$ , and we can complete the proof in the general case with an application of Lebesgue’s dominated convergence theorem.

Recall (1.9) and (1.10). Using (1.6) and (3.4), we have

(3.5) $$\begin{eqnarray}\displaystyle D_{f}(\unicode[STIX]{x1D70C}\mathbf{e};T_{1},T_{2}) & = & \displaystyle \int _{T_{1}}^{T_{2}}f(t\unicode[STIX]{x1D70C}\mathbf{e})\,dt-(T_{2}-T_{1})\int _{I^{d}}f\,d\text{V}\nonumber\\ \displaystyle & = & \displaystyle \int _{T_{1}}^{T_{2}}\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}a_{\mathbf{n}}\text{e}^{2\unicode[STIX]{x1D70B}\text{i}t\mathbf{n}\cdot \unicode[STIX]{x1D70C}\mathbf{e}}\,dt.\end{eqnarray}$$

We need a simple estimate.

Lemma 3.1. For every $d$ -dimensional vector $\mathbf{w}=(w_{1},\ldots ,w_{d})$ , we have

$$\begin{eqnarray}\int _{\mathbf{S}^{d-1}}\int _{0}^{\infty }\text{e}^{\text{i}\mathbf{w}\cdot \unicode[STIX]{x1D70C}\mathbf{e}}\frac{\unicode[STIX]{x1D70C}^{d-1}\text{e}^{-\unicode[STIX]{x1D70C}^{2}/2}}{C_{d}}\,d\unicode[STIX]{x1D70C}\,d\unicode[STIX]{x1D708}_{d-1}^{\ast }(\mathbf{e})=\text{e}^{-|\mathbf{w}|^{2}/2}.\end{eqnarray}$$

Proof. For simplicity, denote the integral under investigation by  ${\mathcal{I}}$ . Note that the vector $\unicode[STIX]{x1D70C}\mathbf{e}=\mathbf{v}=(v_{1},\ldots ,v_{d})$ has $d$ -dimensional standard Gaussian (normal) distribution. Thus

$$\begin{eqnarray}\displaystyle {\mathcal{I}} & = & \displaystyle (2\unicode[STIX]{x1D70B})^{-d/2}\int _{\mathbf{R}^{d}}\text{e}^{\text{i}\mathbf{w}\cdot \mathbf{v}}\text{e}^{-|\mathbf{v}|^{2}/2}\,d\mathbf{v}=\mathop{\prod }_{j=1}^{d}\biggl(\frac{1}{\sqrt{2\unicode[STIX]{x1D70B}}}\int _{-\infty }^{\infty }\text{e}^{\text{i}w_{j}v_{j}}\text{e}^{-v_{j}^{2}/2}\,dv_{j}\biggr)\nonumber\\ \displaystyle & = & \displaystyle \mathop{\prod }_{j=1}^{d}\text{e}^{-w_{j}^{2}/2}=\text{e}^{-|\mathbf{w}|^{2}/2},\nonumber\end{eqnarray}$$

where in the argument, we have used the well-known facts that the coordinates $v_{1},\ldots ,v_{d}$ of $\mathbf{v}$ are independent random variables, each having standard normal distribution, and that the Fourier transform of $\text{e}^{-x^{2}/2}$ is itself.◻
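Lemma 3.1 states that the characteristic function of the $d$ -dimensional standard Gaussian distribution is $\text{e}^{-|\mathbf{w}|^{2}/2}$ . A Monte Carlo sketch in Python compares the empirical average of $\text{e}^{\text{i}\mathbf{w}\cdot \mathbf{v}}$ with this value; following the proof, $\mathbf{v}$ is drawn as $d$ independent standard normal coordinates. The vector $\mathbf{w}$ , the sample size and the seed are arbitrary.

```python
import cmath
import math
import random


def empirical_char_function(w, n_samples=100_000, seed=7):
    """Monte Carlo estimate of E[exp(i w.v)] for v ~ standard Gaussian in R^d."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(n_samples):
        dot = sum(wj * rng.gauss(0.0, 1.0) for wj in w)
        total += cmath.exp(1j * dot)
    return total / n_samples


w = [0.3, -0.7, 1.1, 0.2]
estimate = empirical_char_function(w)
exact = math.exp(-sum(x * x for x in w) / 2.0)
print(estimate, exact)
```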

Let us return to (1.9) and (3.5). Applying Lemma 3.1, it is easy to establish the following result.

Lemma 3.2. For every $-\infty \leqslant W^{\prime }<W^{\prime \prime }\leqslant \infty$ , we have

$$\begin{eqnarray}\unicode[STIX]{x1D6E5}_{f}^{2}(\text{Gauss};W^{\prime },W^{\prime \prime })=\mathop{\sum }_{\mathbf{n}_{1},\mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0}}a_{\mathbf{n}_{1}}\overline{a_{\mathbf{n}_{2}}}\int _{W^{\prime }}^{W^{\prime \prime }}\int _{W^{\prime }}^{W^{\prime \prime }}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2}}\,dt_{1}\,dt_{2}.\end{eqnarray}$$

Proof. It is easy to see that

$$\begin{eqnarray}|D_{f}(\unicode[STIX]{x1D70C}\mathbf{e};W^{\prime },W^{\prime \prime })|^{2}=\int _{W^{\prime }}^{W^{\prime \prime }}\int _{W^{\prime }}^{W^{\prime \prime }}\mathop{\sum }_{\mathbf{n}_{1},\mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0}}a_{\mathbf{n}_{1}}\overline{a_{\mathbf{n}_{2}}}\text{e}^{2\unicode[STIX]{x1D70B}\text{i}(t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2})\cdot \unicode[STIX]{x1D70C}\mathbf{e}}\,dt_{1}\,dt_{2}.\end{eqnarray}$$

Applying this in (3.5), we obtain

(3.6) $$\begin{eqnarray}\displaystyle & & \displaystyle \unicode[STIX]{x1D6E5}_{f}^{2}(\text{Gauss};W^{\prime },W^{\prime \prime })\nonumber\\ \displaystyle & & \displaystyle \quad =\int _{\mathbf{S}^{d-1}}\int _{0}^{\infty }\biggl(\int _{W^{\prime }}^{W^{\prime \prime }}\int _{W^{\prime }}^{W^{\prime \prime }}\mathop{\sum }_{\mathbf{n}_{1},\mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0}}a_{\mathbf{n}_{1}}\overline{a_{\mathbf{n}_{2}}}\text{e}^{2\unicode[STIX]{x1D70B}\text{i}(t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2})\cdot \unicode[STIX]{x1D70C}\mathbf{e}}\,dt_{1}\,dt_{2}\biggr)\nonumber\\ \displaystyle & & \displaystyle \qquad \cdot \,\frac{\unicode[STIX]{x1D70C}^{d-1}\text{e}^{-\unicode[STIX]{x1D70C}^{2}/2}}{C_{d}}\,d\unicode[STIX]{x1D70C}\,d\unicode[STIX]{x1D708}_{d-1}^{\ast }(\mathbf{e})\nonumber\\ \displaystyle & & \displaystyle \quad =\mathop{\sum }_{\mathbf{n}_{1},\mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0}}a_{\mathbf{n}_{1}}\overline{a_{\mathbf{n}_{2}}}\nonumber\\ \displaystyle & & \displaystyle \qquad \cdot \int _{W^{\prime }}^{W^{\prime \prime }}\!\int _{W^{\prime }}^{W^{\prime \prime }}\!\biggl(\int _{\mathbf{S}^{d-1}}\int _{0}^{\infty }\text{e}^{2\unicode[STIX]{x1D70B}\text{i}(t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2})\cdot \unicode[STIX]{x1D70C}\mathbf{e}}\frac{\unicode[STIX]{x1D70C}^{d-1}\text{e}^{-\unicode[STIX]{x1D70C}^{2}/2}}{C_{d}}\,d\unicode[STIX]{x1D70C}\,d\unicode[STIX]{x1D708}_{d-1}^{\ast }(\mathbf{e})\biggr)\,dt_{1}\,dt_{2}.\nonumber\\ \displaystyle & & \displaystyle\end{eqnarray}$$

Lemma 3.1 applied to the inner integral on the right-hand side of (3.6) now gives

$$\begin{eqnarray}\int _{\mathbf{S}^{d-1}}\int _{0}^{\infty }\text{e}^{2\unicode[STIX]{x1D70B}\text{i}(t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2})\cdot \unicode[STIX]{x1D70C}\mathbf{e}}\frac{\unicode[STIX]{x1D70C}^{d-1}\text{e}^{-\unicode[STIX]{x1D70C}^{2}/2}}{C_{d}}\,d\unicode[STIX]{x1D70C}\,d\unicode[STIX]{x1D708}_{d-1}^{\ast }(\mathbf{e})=\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2}}.\end{eqnarray}$$

Substituting this into (3.6) completes the proof. ◻
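Lemma 3.2 can be tested numerically in the simplest case $d=1$ , $f(u)=\cos (2\unicode[STIX]{x1D70B}u)$ , where $a_{1}=a_{-1}=1/2$ and all other coefficients vanish. Following the first line of (3.6), the left-hand side is the Gaussian average of $|D_{f}|^{2}$ , and here $D_{f}(v;W^{\prime },W^{\prime \prime })=\int _{W^{\prime }}^{W^{\prime \prime }}\cos (2\unicode[STIX]{x1D70B}tv)\,dt$ has a closed form. The Python sketch below evaluates both sides by quadrature; the window $[W^{\prime },W^{\prime \prime }]=[0.1,0.6]$ and the step counts are hypothetical choices.

```python
import math

W1, W2 = 0.1, 0.6  # hypothetical time window [W', W'']


def discrepancy(v: float) -> float:
    """D_f(v; W', W'') for f(u) = cos(2 pi u), whose mean over [0, 1) is 0."""
    if abs(v) < 1e-12:
        return W2 - W1
    return (math.sin(2 * math.pi * W2 * v) - math.sin(2 * math.pi * W1 * v)) / (2 * math.pi * v)


def gaussian_average_of_square(lim=10.0, steps=20_000) -> float:
    """Trapezoidal approximation of (2 pi)^{-1/2} * integral of D(v)^2 e^{-v^2/2} dv."""
    h = 2 * lim / steps
    total = 0.0
    for j in range(steps + 1):
        v = -lim + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * discrepancy(v) ** 2 * math.exp(-v * v / 2)
    return total * h / math.sqrt(2 * math.pi)


def lemma_3_2_sum(steps=400) -> float:
    """Right-hand side of Lemma 3.2 with a_1 = a_{-1} = 1/2 (midpoint rule in t1, t2)."""
    h = (W2 - W1) / steps
    total = 0.0
    for n1 in (1, -1):
        for n2 in (1, -1):
            s = 0.0
            for i in range(steps):
                t1 = W1 + (i + 0.5) * h
                for j in range(steps):
                    t2 = W1 + (j + 0.5) * h
                    s += math.exp(-2 * math.pi**2 * (t1 * n1 - t2 * n2) ** 2)
            total += 0.25 * s * h * h  # a_{n1} * conj(a_{n2}) = 1/4
    return total


lhs = gaussian_average_of_square()
rhs = lemma_3_2_sum()
print(lhs, rhs)
```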

We are now ready to prove Theorem 3.1. The proof is an elementary brute force combinatorial argument. For every $\mathbf{n}=(n_{1},\ldots ,n_{d})\in \mathbf{Z}^{d}\setminus \mathbf{0}$ , write

$$\begin{eqnarray}L(\mathbf{n})=\{1\leqslant i\leqslant d:n_{i}\neq 0\}.\end{eqnarray}$$

Applying the elementary inequality $|a_{\mathbf{n}_{1}}\overline{a_{\mathbf{n}_{2}}}|\leqslant (|a_{\mathbf{n}_{1}}|^{2}+|a_{\mathbf{n}_{2}}|^{2})/2$ in Lemma 3.2, we have

(3.7) $$\begin{eqnarray}\displaystyle & & \displaystyle |\unicode[STIX]{x1D6E5}_{f}^{2}(\text{Gauss};U,V)|\nonumber\\ \displaystyle & & \displaystyle \quad \leqslant \mathop{\sum }_{\mathbf{n}_{1},\mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0}}\frac{1}{2}(|a_{\mathbf{n}_{1}}|^{2}+|a_{\mathbf{n}_{2}}|^{2})\int _{U}^{V}\int _{U}^{V}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2}}\,dt_{1}\,dt_{2}\nonumber\\ \displaystyle & & \displaystyle \quad =\mathop{\sum }_{\mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}_{1}}|^{2}\mathop{\sum }_{\mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0}}\int _{U}^{V}\int _{U}^{V}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2}}\,dt_{1}\,dt_{2}\nonumber\\ \displaystyle & & \displaystyle \quad =\int _{U}^{V}\mathop{\sum }_{\mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}_{1}}|^{2}\biggl(\int _{U}^{V}\mathop{\sum }_{\mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0}}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2}}\,dt_{2}\biggr)\,dt_{1}\nonumber\\ \displaystyle & & \displaystyle \quad =\mathop{\sum }_{\unicode[STIX]{x1D706}_{1}=1}^{d}\int _{U}^{V}\mathop{\sum }_{\substack{ \mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{1})|=\unicode[STIX]{x1D706}_{1}}}|a_{\mathbf{n}_{1}}|^{2}\nonumber\\ \displaystyle & & \displaystyle \qquad \cdot \,\biggl(\mathop{\sum }_{L_{1,2}\subseteq L(\mathbf{n}_{1})}\mathop{\sum }_{\unicode[STIX]{x1D706}_{2}=\max \{|L_{1,2}|,1\}}^{d}\int _{U}^{V}\mathop{\sum }_{\substack{ \mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{2})|=\unicode[STIX]{x1D706}_{2} \\ L(\mathbf{n}_{2})\cap L(\mathbf{n}_{1})=L_{1,2}}}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2}}\,dt_{2}\biggr)\,dt_{1}.\nonumber\\ \displaystyle & & \displaystyle\end{eqnarray}$$

We fix $t_{1}\in [U,V]$ , $\mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0}$ , $L_{1,2}\subseteq L(\mathbf{n}_{1})$ and $\unicode[STIX]{x1D706}_{2}$ , and focus on the inner integral on the right-hand side of (3.7).

Write

$$\begin{eqnarray}\unicode[STIX]{x1D706}_{1,2}=|L_{1,2}|=|\{1\leqslant i\leqslant d:n_{1,i}\neq 0~\text{and}~n_{2,i}\neq 0\}|.\end{eqnarray}$$

Let $k_{i}(\mathbf{n}_{2})$ , $i=1,2,3,\ldots$ , denote the number of indices $j$ such that $n_{2,j}=\pm i$ and $n_{1,j}=0$ . Note that

(3.8) $$\begin{eqnarray}k_{1}(\mathbf{n}_{2})+k_{2}(\mathbf{n}_{2})+k_{3}(\mathbf{n}_{2})+\cdots =\unicode[STIX]{x1D706}_{2}-\unicode[STIX]{x1D706}_{1,2}.\end{eqnarray}$$

Let $h_{0}(t_{2};\mathbf{n}_{2})$ denote the number of coordinates $j\in L_{1,2}$ such that

$$\begin{eqnarray}|t_{1}n_{1,j}-t_{2}n_{2,j}|<\frac{U}{2}.\end{eqnarray}$$

Let $h_{i}(t_{2};\mathbf{n}_{2})$ , $i=1,2,3,\ldots$ , denote the number of coordinates $j\in L_{1,2}$ such that

$$\begin{eqnarray}\frac{(2i-1)U}{2}\leqslant |t_{1}n_{1,j}-t_{2}n_{2,j}|<\frac{(2i+1)U}{2}.\end{eqnarray}$$

Note that

(3.9) $$\begin{eqnarray}h_{0}(t_{2};\mathbf{n}_{2})+h_{1}(t_{2};\mathbf{n}_{2})+h_{2}(t_{2};\mathbf{n}_{2})+h_{3}(t_{2};\mathbf{n}_{2})+\cdots =\unicode[STIX]{x1D706}_{1,2}.\end{eqnarray}$$
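The bookkeeping behind (3.8), (3.9) and the coordinate split of $|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2}$ used next can be traced on a small example. The Python sketch below computes $L(\mathbf{n})$ , $\unicode[STIX]{x1D706}_{1,2}$ and the counts $k_{i}(\mathbf{n}_{2})$ and $h_{i}(t_{2};\mathbf{n}_{2})$ for hypothetical vectors $\mathbf{n}_{1},\mathbf{n}_{2}\in \mathbf{Z}^{6}$ , times $t_{1},t_{2}$ and $U=1$ , and checks both identities and the split. Coordinates are numbered from 1, as in the text.

```python
def support(n):
    """L(n): 1-based indices of the non-zero coordinates of n."""
    return {j + 1 for j, x in enumerate(n) if x != 0}


def bookkeeping(n1, n2, t1, t2, U):
    """Return lambda_{1,2}, the counts k_i and h_i, |t1 n1 - t2 n2|^2 and its three-part split."""
    L1, L2 = support(n1), support(n2)
    L12 = L1 & L2
    k = {}  # k[i] = number of j with n_{2,j} = +-i and n_{1,j} = 0
    for j in L2 - L1:
        i = abs(n2[j - 1])
        k[i] = k.get(i, 0) + 1
    h = {}  # h[i] = number of j in L12 with |t1 n_{1,j} - t2 n_{2,j}| in the i-th window
    for j in L12:
        x = abs(t1 * n1[j - 1] - t2 * n2[j - 1])
        i = 0 if x < U / 2 else int((x - U / 2) // U) + 1  # (2i-1)U/2 <= x < (2i+1)U/2
        h[i] = h.get(i, 0) + 1
    full = sum((t1 * a - t2 * b) ** 2 for a, b in zip(n1, n2))
    split = (sum((t1 * n1[j - 1] - t2 * n2[j - 1]) ** 2 for j in L12)
             + sum((t1 * n1[j - 1]) ** 2 for j in L1 - L12)
             + sum((t2 * n2[j - 1]) ** 2 for j in L2 - L12))
    return len(L12), k, h, full, split


n1 = (2, 0, -1, 0, 3, 0)
n2 = (1, 0, 0, -2, 3, 1)
lam12, k, h, full, split = bookkeeping(n1, n2, t1=1.3, t2=1.7, U=1.0)
print(lam12, k, h, full, split)
```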

By definition,

$$\begin{eqnarray}\displaystyle |t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2} & = & \displaystyle \mathop{\sum }_{j\in L_{1,2}}(t_{1}n_{1,j}-t_{2}n_{2,j})^{2}\nonumber\\ \displaystyle & & \displaystyle +\,\mathop{\sum }_{j\in L(\mathbf{n}_{1})\setminus L_{1,2}}(t_{1}n_{1,j})^{2}+\mathop{\sum }_{j\in L(\mathbf{n}_{2})\setminus L_{1,2}}(t_{2}n_{2,j})^{2}.\nonumber\end{eqnarray}$$

Using this and the definitions of $k_{i}(\mathbf{n}_{2})$ and $h_{i}(t_{2};\mathbf{n}_{2})$ , we have

(3.10) $$\begin{eqnarray}\displaystyle & & \displaystyle \int _{U}^{V}\mathop{\sum }_{\substack{ \mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{2})|=\unicode[STIX]{x1D706}_{2} \\ L(\mathbf{n}_{2})\cap L(\mathbf{n}_{1})=L_{1,2}}}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2}}\,dt_{2}\nonumber\\ \displaystyle & & \displaystyle \quad \leqslant \exp \biggl(-2\unicode[STIX]{x1D70B}^{2}\mathop{\sum }_{j\in L(\mathbf{n}_{1})\setminus L_{1,2}}|n_{1,j}|^{2}U^{2}\biggr)\nonumber\\ \displaystyle & & \displaystyle \qquad \cdot \int _{U}^{V}\mathop{\sum }_{\substack{ \mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{2})|=\unicode[STIX]{x1D706}_{2} \\ L(\mathbf{n}_{2})\cap L(\mathbf{n}_{1})=L_{1,2}}}{\mathcal{K}}(t_{2};\mathbf{n}_{2}){\mathcal{H}}(t_{2};\mathbf{n}_{2};U)\,dt_{2},\end{eqnarray}$$

where

$$\begin{eqnarray}{\mathcal{K}}(t_{2};\mathbf{n}_{2})=\mathop{\prod }_{i\geqslant 1}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}k_{i}(\mathbf{n}_{2})i^{2}t_{2}^{2}}\end{eqnarray}$$

and

$$\begin{eqnarray}{\mathcal{H}}(t_{2};\mathbf{n}_{2};U)=\mathop{\prod }_{i\geqslant 1}\text{e}^{-\unicode[STIX]{x1D70B}^{2}h_{i}(t_{2};\mathbf{n}_{2})((2i-1)U)^{2}/2}.\end{eqnarray}$$

We estimate the last sum on the right-hand side of (3.10). Again, using the definitions of $k_{i}(\mathbf{n}_{2})$ and $h_{i}(t_{2};\mathbf{n}_{2})$ , and noting (3.8) and (3.9), we obtain the upper bound

(3.11) $$\begin{eqnarray}\mathop{\sum }_{\substack{ \mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{2})|=\unicode[STIX]{x1D706}_{2} \\ L(\mathbf{n}_{2})\cap L(\mathbf{n}_{1})=L_{1,2}}}{\mathcal{K}}(t_{2};\mathbf{n}_{2}){\mathcal{H}}(t_{2};\mathbf{n}_{2};U)\leqslant {\mathcal{K}}(d;U){\mathcal{H}}(U),\end{eqnarray}$$

where

$$\begin{eqnarray}{\mathcal{K}}(d;U)=\mathop{\sum }_{\substack{ r\geqslant 1 \\ k_{1},\ldots ,k_{r-1}\geqslant 0 \\ k_{r}\geqslant 1 \\ k_{1}+\cdots +k_{r}=\unicode[STIX]{x1D706}_{2}-\unicode[STIX]{x1D706}_{1,2}}}\mathop{\prod }_{i=1}^{r}\binom{d-\unicode[STIX]{x1D706}_{1}-k_{1}-\cdots -k_{i-1}}{k_{i}}2^{k_{i}}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}k_{i}i^{2}U^{2}}\end{eqnarray}$$

and

$$\begin{eqnarray}\displaystyle {\mathcal{H}}(U) & = & \displaystyle \mathop{\sum }_{\substack{ r\geqslant 0 \\ h_{0},\ldots ,h_{r-1}\geqslant 0 \\ h_{r}\geqslant 1 \\ h_{0}+\cdots +h_{r}=\unicode[STIX]{x1D706}_{1,2}}}\binom{\unicode[STIX]{x1D706}_{1,2}}{h_{0}}\mathop{\prod }_{i=1}^{r}\binom{\unicode[STIX]{x1D706}_{1,2}-h_{0}-\cdots -h_{i-1}}{h_{i}}\nonumber\\ \displaystyle & & \displaystyle \times \,2^{h_{i}}\text{e}^{-\unicode[STIX]{x1D70B}^{2}h_{i}((2i-1)U)^{2}/2}.\nonumber\end{eqnarray}$$

Note that (3.11) includes the pathological case $\unicode[STIX]{x1D706}_{2}-\unicode[STIX]{x1D706}_{1,2}=0$ , with the convention that the summation means the single term $(k_{1},\ldots ,k_{r})=(0)$ , and similarly, if $\unicode[STIX]{x1D706}_{1,2}=0$ , then $(h_{0},h_{1},\ldots ,h_{r})$ is just the single term  $(0)$ .

We have the trivial bound

(3.12) $$\begin{eqnarray}{\mathcal{K}}(d;U)\leqslant \mathop{\sum }_{\substack{ r\geqslant 1 \\ k_{1},\ldots ,k_{r-1}\geqslant 0 \\ k_{r}\geqslant 1 \\ k_{1}+\cdots +k_{r}=\unicode[STIX]{x1D706}_{2}-\unicode[STIX]{x1D706}_{1,2}}}\mathop{\prod }_{i=1}^{r}(2d)^{k_{i}}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}k_{i}i^{2}U^{2}}\leqslant \biggl(\mathop{\sum }_{i\geqslant 1}2d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}i^{2}U^{2}}\biggr)^{\unicode[STIX]{x1D706}_{2}-\unicode[STIX]{x1D706}_{1,2}}.\end{eqnarray}$$

On the other hand, using the multinomial theorem, we obtain

(3.13) $$\begin{eqnarray}{\mathcal{H}}(U)=\biggl(1+\mathop{\sum }_{i\geqslant 1}2\text{e}^{-\unicode[STIX]{x1D70B}^{2}((2i-1)U)^{2}/2}\biggr)^{\unicode[STIX]{x1D706}_{1,2}}.\end{eqnarray}$$

Combining (3.11)–(3.13), we deduce that

(3.14) $$\begin{eqnarray}\mathop{\sum }_{\substack{ \mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{2})|=\unicode[STIX]{x1D706}_{2} \\ L(\mathbf{n}_{2})\cap L(\mathbf{n}_{1})=L_{1,2}}}\!\!{\mathcal{K}}(t_{2};\mathbf{n}_{2}){\mathcal{H}}(t_{2};\mathbf{n}_{2};U)\leqslant (3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{\unicode[STIX]{x1D706}_{2}-\unicode[STIX]{x1D706}_{1,2}}(1+3\text{e}^{-\unicode[STIX]{x1D70B}^{2}U^{2}/2})^{\unicode[STIX]{x1D706}_{1,2}}.\end{eqnarray}$$

Combining this with (3.10), we conclude that

(3.15) $$\begin{eqnarray}\displaystyle & & \displaystyle \int _{U}^{V}\mathop{\sum }_{\substack{ \mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{2})|=\unicode[STIX]{x1D706}_{2} \\ L(\mathbf{n}_{2})\cap L(\mathbf{n}_{1})=L_{1,2}}}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2}}\,dt_{2}\nonumber\\ \displaystyle & & \displaystyle \quad \leqslant \text{e}^{-2\unicode[STIX]{x1D70B}^{2}(\unicode[STIX]{x1D706}_{1}-\unicode[STIX]{x1D706}_{1,2})U^{2}}(V-U)(3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{\unicode[STIX]{x1D706}_{2}-\unicode[STIX]{x1D706}_{1,2}}(1+3\text{e}^{-\unicode[STIX]{x1D70B}^{2}U^{2}/2})^{\unicode[STIX]{x1D706}_{1,2}}.\qquad \;\end{eqnarray}$$
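On our reading, the step from (3.12)–(3.13) to (3.14) uses two tail estimates, assuming $U\geqslant 1$ as elsewhere in the section:

- $\sum _{i\geqslant 1}2\text{e}^{-2\pi^{2}i^{2}U^{2}}\leqslant 3\text{e}^{-2\pi^{2}U^{2}}$;
- $\sum _{i\geqslant 1}2\text{e}^{-\pi^{2}((2i-1)U)^{2}/2}\leqslant 3\text{e}^{-\pi^{2}U^{2}/2}$.

The Python sketch below evaluates both sums divided by their leading exponential, which avoids floating-point underflow for large $U$. The truncation at 50 terms and the function names are illustrative choices.

```python
import math

def k_tail_ratio(U, terms=50):
    """sum_{i>=1} 2 exp(-2 pi^2 i^2 U^2), divided by exp(-2 pi^2 U^2)."""
    return sum(2 * math.exp(-2 * math.pi ** 2 * (i * i - 1) * U * U)
               for i in range(1, terms + 1))

def h_tail_ratio(U, terms=50):
    """sum_{i>=1} 2 exp(-pi^2 ((2i-1)U)^2 / 2), divided by exp(-pi^2 U^2 / 2)."""
    return sum(2 * math.exp(-math.pi ** 2 * ((2 * i - 1) ** 2 - 1) * U * U / 2)
               for i in range(1, terms + 1))

for U in (1.0, 1.5, 2.0, 10.0):
    # both ratios are essentially 2 (the i = 1 term), comfortably below 3
    print(U, k_tail_ratio(U), h_tail_ratio(U))
```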

Let us now return to (3.7). We have the decomposition

(3.16) $$\begin{eqnarray}|\unicode[STIX]{x1D6E5}_{f}^{2}(\text{Gauss};U,V)|\leqslant \unicode[STIX]{x1D6EF}_{1}+\unicode[STIX]{x1D6EF}_{2}+\unicode[STIX]{x1D6EF}_{3}+\unicode[STIX]{x1D6EF}_{4},\end{eqnarray}$$

where

(3.17) $$\begin{eqnarray}\displaystyle \unicode[STIX]{x1D6EF}_{1} & = & \displaystyle \mathop{\sum }_{\unicode[STIX]{x1D706}_{1}=1}^{d-1}\int _{U}^{V}\mathop{\sum }_{\substack{ \mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{1})|=\unicode[STIX]{x1D706}_{1}}}|a_{\mathbf{n}_{1}}|^{2}\nonumber\\ \displaystyle & & \displaystyle \cdot \,\biggl(\mathop{\sum }_{L_{1,2}\subseteq L(\mathbf{n}_{1})}\mathop{\sum }_{\unicode[STIX]{x1D706}_{2}=\unicode[STIX]{x1D706}_{1}+1}^{d}\int _{U}^{V}\mathop{\sum }_{\substack{ \mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{2})|=\unicode[STIX]{x1D706}_{2} \\ L(\mathbf{n}_{2})\cap L(\mathbf{n}_{1})=L_{1,2}}}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2}}\,dt_{2}\biggr)\,dt_{1}\nonumber\\ \displaystyle & & \displaystyle\end{eqnarray}$$

is characterized by the property $\unicode[STIX]{x1D706}_{1}<\unicode[STIX]{x1D706}_{2}$ , and

(3.18) $$\begin{eqnarray}\displaystyle \unicode[STIX]{x1D6EF}_{2} & = & \displaystyle \mathop{\sum }_{\unicode[STIX]{x1D706}_{1}=1}^{d}\int _{U}^{V}\mathop{\sum }_{\substack{ \mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{1})|=\unicode[STIX]{x1D706}_{1}}}|a_{\mathbf{n}_{1}}|^{2}\nonumber\\ \displaystyle & & \displaystyle \cdot \,\biggl(\mathop{\sum }_{L_{1,2}\subseteq L(\mathbf{n}_{1})}\mathop{\sum }_{\unicode[STIX]{x1D706}_{2}=\max \{|L_{1,2}|,1\}}^{\unicode[STIX]{x1D706}_{1}-1}\int _{U}^{V}\mathop{\sum }_{\substack{ \mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{2})|=\unicode[STIX]{x1D706}_{2} \\ L(\mathbf{n}_{2})\cap L(\mathbf{n}_{1})=L_{1,2}}}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2}}\,dt_{2}\biggr)\,dt_{1}\nonumber\\ \displaystyle & & \displaystyle\end{eqnarray}$$

is characterized by the property $\unicode[STIX]{x1D706}_{1}>\unicode[STIX]{x1D706}_{2}$. Furthermore, we split the case $\unicode[STIX]{x1D706}_{1}=\unicode[STIX]{x1D706}_{2}$ into two subcases according to whether $L(\mathbf{n}_{1})\neq L(\mathbf{n}_{2})$ or $L(\mathbf{n}_{1})=L(\mathbf{n}_{2})$; thus

(3.19) $$\begin{eqnarray}\displaystyle \unicode[STIX]{x1D6EF}_{3} & = & \displaystyle \mathop{\sum }_{\unicode[STIX]{x1D706}_{1}=1}^{d}\int _{U}^{V}\mathop{\sum }_{\substack{ \mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{1})|=\unicode[STIX]{x1D706}_{1}}}|a_{\mathbf{n}_{1}}|^{2}\nonumber\\ \displaystyle & & \displaystyle \cdot \,\biggl(\mathop{\sum }_{\substack{ L_{1,2}\subset L(\mathbf{n}_{1}) \\ L_{1,2}\neq L(\mathbf{n}_{1})}}\int _{U}^{V}\mathop{\sum }_{\substack{ \mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{2})|=\unicode[STIX]{x1D706}_{1} \\ L(\mathbf{n}_{2})\cap L(\mathbf{n}_{1})=L_{1,2}}}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2}}\,dt_{2}\biggr)\,dt_{1}\end{eqnarray}$$

and

(3.20) $$\begin{eqnarray}\unicode[STIX]{x1D6EF}_{4}=\int _{U}^{V}\mathop{\sum }_{\mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}_{1}}|^{2}\biggl(\int _{U}^{V}\mathop{\sum }_{\substack{ \mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ L(\mathbf{n}_{2})=L(\mathbf{n}_{1})}}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2}}\,dt_{2}\biggr)\,dt_{1}.\end{eqnarray}$$

To estimate the last term (3.20), we need a simple but important lemma. To state this, we first need a definition. Given real numbers $C$ and  $C^{\prime }$ , consider the set

$$\begin{eqnarray}{\mathcal{B}}_{U}(C;C^{\prime })=\{t\in [U,2U]:\text{there exists}~n\in \mathbf{Z}\setminus \{0\}~\text{such that}~|C-tn|\leqslant C^{\prime }\}.\end{eqnarray}$$

We give an upper bound on the one-dimensional Lebesgue measure, i.e. length, of the set ${\mathcal{B}}_{U}(C;C^{\prime })$ .

Lemma 3.3. For arbitrary real numbers $C,C^{\prime }$ with $|C|\geqslant U\geqslant 1$ and $0<C^{\prime }<U/2$ , we have

$$\begin{eqnarray}\operatorname{length}({\mathcal{B}}_{U}(C;C^{\prime }))<6C^{\prime }.\end{eqnarray}$$

Proof. We can assume without loss of generality that $C>0$ . Clearly

$$\begin{eqnarray}|C-tn|\leqslant C^{\prime }\quad \text{if and only if}~\frac{C-C^{\prime }}{n}\leqslant t\leqslant \frac{C+C^{\prime }}{n},\end{eqnarray}$$

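so, since for each $n$ the admissible values of $t$ form an interval of length $2C^{\prime }/n$,

$$\begin{eqnarray}\operatorname{length}({\mathcal{B}}_{U}(C;C^{\prime }))\leqslant \mathop{\sum }_{n}^{\ast }\frac{2C^{\prime }}{n},\end{eqnarray}$$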

where the summation $\sum _{n}^{\ast }$ is extended over all nonzero integers $n$ for which there exists $t\in [U,2U]$ with

$$\begin{eqnarray}\frac{C}{t}-\frac{C^{\prime }}{t}\leqslant n\leqslant \frac{C}{t}+\frac{C^{\prime }}{t}.\end{eqnarray}$$

Since $C^{\prime }<U/2\leqslant C/2$ and $U\leqslant t\leqslant 2U$, we have

$$\begin{eqnarray}\frac{C+C^{\prime }}{t}\leqslant \frac{3C}{2t}\leqslant \frac{3C}{2U}\quad \text{and}\quad \frac{C-C^{\prime }}{t}\geqslant \frac{C}{2t}\geqslant \frac{C}{4U}.\end{eqnarray}$$

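Thus every admissible $n$ satisfies $C/4U\leqslant n\leqslant 3C/2U$ (in particular, $n\geqslant 1$), and we have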

$$\begin{eqnarray}\displaystyle \operatorname{length}({\mathcal{B}}_{U}(C;C^{\prime })) & {\leqslant} & \displaystyle \mathop{\sum }_{C/4U\leqslant n\leqslant 3C/2U}\frac{2C^{\prime }}{n}\nonumber\\ \displaystyle & {\leqslant} & \displaystyle 2C^{\prime }\biggl(1+\log \frac{3C/2U}{C/4U}\biggr)=2C^{\prime }(1+\log 6),\nonumber\end{eqnarray}$$

where we have used the well-known fact that $\sum _{A\leqslant n\leqslant B}1/n\leqslant 1+\log (B/A)$ for all $0<A<B$. Since $\log 6<2$, we have $2C^{\prime }(1+\log 6)<6C^{\prime }$, which completes the proof.◻
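Lemma 3.3 can be illustrated numerically. The Python sketch below computes the length of ${\mathcal{B}}_{U}(C;C^{\prime })$ exactly, as the union of the intervals $[(C-C^{\prime })/n,(C+C^{\prime })/n]\cap [U,2U]$ with overlaps merged. It then records the largest ratio $\operatorname{length}/C^{\prime }$ over random admissible parameters with $C\geqslant U$. By the proof, this ratio never exceeds $2(1+\log 6)\approx 5.58$. The sampling ranges and the name `length_B` are arbitrary choices for illustration.

```python
import math
import random

def length_B(U, C, Cp):
    """Lebesgue measure of B_U(C; Cp) = {t in [U, 2U] : |C - t n| <= Cp for some
    nonzero integer n}, assuming C >= U and 0 < Cp < U/2 (so only n >= 1 matter)."""
    pieces = []
    n_lo = max(1, math.floor((C - Cp) / (2 * U)))
    n_hi = math.ceil((C + Cp) / U)
    for n in range(n_lo, n_hi + 1):
        # |C - t n| <= Cp  <=>  (C - Cp)/n <= t <= (C + Cp)/n; clip to [U, 2U]
        a, b = max(U, (C - Cp) / n), min(2 * U, (C + Cp) / n)
        if a < b:
            pieces.append((a, b))
    # merge overlapping intervals so that no length is counted twice
    pieces.sort()
    total, cur = 0.0, None
    for a, b in pieces:
        if cur is None or a > cur[1]:
            if cur is not None:
                total += cur[1] - cur[0]
            cur = [a, b]
        else:
            cur[1] = max(cur[1], b)
    if cur is not None:
        total += cur[1] - cur[0]
    return total

random.seed(1)
worst = 0.0
for _ in range(2000):
    U = random.uniform(1, 20)
    C = U * random.uniform(1, 200)
    Cp = (U / 2) * random.uniform(0.001, 0.999)
    worst = max(worst, length_B(U, C, Cp) / Cp)
print(worst)  # the proof bounds this ratio by 2(1 + log 6) < 6
```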

Applying (3.15) in (3.17), we obtain

(3.21) $$\begin{eqnarray}\displaystyle \unicode[STIX]{x1D6EF}_{1} & {\leqslant} & \displaystyle (V-U)^{2}\max _{\unicode[STIX]{x1D706}_{1,2}\leqslant d}(1+3\text{e}^{-\unicode[STIX]{x1D70B}^{2}U^{2}/2})^{\unicode[STIX]{x1D706}_{1,2}}\mathop{\sum }_{\unicode[STIX]{x1D706}_{2}=1}^{d}(3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{\unicode[STIX]{x1D706}_{2}}\nonumber\\ \displaystyle & & \displaystyle \cdot \mathop{\sum }_{\unicode[STIX]{x1D706}_{1}=1}^{\unicode[STIX]{x1D706}_{2}-1}\mathop{\sum }_{\substack{ \mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{1})|=\unicode[STIX]{x1D706}_{1}}}|a_{\mathbf{n}_{1}}|^{2}\mathop{\sum }_{\unicode[STIX]{x1D706}_{1,2}=0}^{\unicode[STIX]{x1D706}_{1}}\binom{\unicode[STIX]{x1D706}_{1}}{\unicode[STIX]{x1D706}_{1,2}}(3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{-\unicode[STIX]{x1D706}_{1,2}}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}(\unicode[STIX]{x1D706}_{1}-\unicode[STIX]{x1D706}_{1,2})U^{2}}\nonumber\\ \displaystyle & = & \displaystyle (V-U)^{2}(1+3\text{e}^{-\unicode[STIX]{x1D70B}^{2}U^{2}/2})^{d}\mathop{\sum }_{\unicode[STIX]{x1D706}_{2}=1}^{d}(3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{\unicode[STIX]{x1D706}_{2}}\nonumber\\ \displaystyle & & \displaystyle \cdot \mathop{\sum }_{\unicode[STIX]{x1D706}_{1}=1}^{\unicode[STIX]{x1D706}_{2}-1}\mathop{\sum }_{\substack{ \mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{1})|=\unicode[STIX]{x1D706}_{1}}}|a_{\mathbf{n}_{1}}|^{2}((3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{-1}+\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{\unicode[STIX]{x1D706}_{1}}.\end{eqnarray}$$

By hypothesis,

(3.22) $$\begin{eqnarray}dU\text{e}^{-\unicode[STIX]{x1D70B}^{2}U^{2}/2}\leqslant {\textstyle \frac{1}{3}}.\end{eqnarray}$$

It follows that

(3.23) $$\begin{eqnarray}\displaystyle & & \displaystyle ((3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{-1}+\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{\unicode[STIX]{x1D706}_{1}}\nonumber\\ \displaystyle & & \displaystyle \quad =(3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{-\unicode[STIX]{x1D706}_{1}}(1+3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{\unicode[STIX]{x1D706}_{1}}\nonumber\\ \displaystyle & & \displaystyle \quad \leqslant (3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{-\unicode[STIX]{x1D706}_{1}}\biggl(1+\frac{1}{2d}\biggr)^{d}<2(3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{-\unicode[STIX]{x1D706}_{1}}.\end{eqnarray}$$
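Factoring $(3d\text{e}^{-2\pi^{2}U^{2}})^{-1}$ out of the bracket in (3.23) leaves $1+3d\text{e}^{-4\pi^{2}U^{2}}$. The last line of (3.23) therefore needs two facts:

- $3d\text{e}^{-4\pi^{2}U^{2}}\leqslant 1/(2d)$;
- $(1+1/(2d))^{d}<2$.

The Python sketch below checks the first condition at the largest $d$ permitted by (3.22), for a few values of $U\geqslant 1$. It works with logarithms, since that $d$ grows like $\text{e}^{\pi^{2}U^{2}/2}$. It checks the second condition for a range of $d$. This is a numerical illustration under our reading of the step, not a proof.

```python
import math

def log_d_max(U):
    """Logarithm of the largest d allowed by (3.22): d <= exp(pi^2 U^2 / 2) / (3U)."""
    return math.pi ** 2 * U * U / 2 - math.log(3 * U)

def bracket_condition_holds(U):
    """Check 3 d exp(-4 pi^2 U^2) <= 1/(2d), i.e. 6 d^2 exp(-4 pi^2 U^2) <= 1,
    at the largest admissible d (the left-hand side increases with d)."""
    return math.log(6) + 2 * log_d_max(U) - 4 * math.pi ** 2 * U * U <= 0

for U in (1.0, 1.5, 3.0, 10.0):
    print(U, bracket_condition_holds(U))

# (1 + 1/(2d))^d increases towards sqrt(e) = 1.6487..., so it stays below 2
print(max((1 + 1 / (2 * d)) ** d for d in range(1, 10 ** 5)))
```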

Using (3.22) and (3.23) in (3.21), we have

$$\begin{eqnarray}\displaystyle \unicode[STIX]{x1D6EF}_{1} & {\leqslant} & \displaystyle 2(V-U)^{2}\biggl(1+\frac{1}{2d}\biggr)^{d}\mathop{\sum }_{\unicode[STIX]{x1D706}_{2}=1}^{d}\mathop{\sum }_{\unicode[STIX]{x1D706}_{1}=1}^{\unicode[STIX]{x1D706}_{2}-1}(3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{\unicode[STIX]{x1D706}_{2}-\unicode[STIX]{x1D706}_{1}}\mathop{\sum }_{\substack{ \mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{1})|=\unicode[STIX]{x1D706}_{1}}}|a_{\mathbf{n}_{1}}|^{2}\nonumber\\ \displaystyle & {\leqslant} & \displaystyle 4(V-U)^{2}\,d\mathop{\sum }_{j=1}^{\infty }(3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{j}\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2}\nonumber\\ \displaystyle & = & \displaystyle 12(V-U)^{2}\frac{d^{2}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}}}{1-3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}}}\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2},\nonumber\end{eqnarray}$$

where we have used the substitution $j=\unicode[STIX]{x1D706}_{2}-\unicode[STIX]{x1D706}_{1}$ . Thus

(3.24) $$\begin{eqnarray}\unicode[STIX]{x1D6EF}_{1}\leqslant 12(V-U)^{2}\frac{3^{-4}U^{-2}}{1-3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}}}\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2}\leqslant \frac{1}{6}\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2},\end{eqnarray}$$

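since $V-U\leqslant U$ and, by (3.22) and $U\geqslant 1$, $d^{2}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}}\leqslant \text{e}^{-\unicode[STIX]{x1D70B}^{2}U^{2}}/(9U^{2})\leqslant 3^{-4}U^{-2}$.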
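The bound $d^{2}\text{e}^{-2\pi^{2}U^{2}}\leqslant 3^{-4}U^{-2}$ behind (3.24) follows from squaring (3.22). The same hypothesis also makes $\delta =3d\text{e}^{-2\pi^{2}U^{2}}$ tiny.

The Python sketch below checks both facts at the largest $d$ allowed by (3.22), in logarithmic form. It then evaluates $12\cdot 3^{-4}/(1-\delta )$, the constant of (3.24) with $V-U=U$, against $1/6$. It assumes $U\geqslant 1$, and the helper names are ours.

```python
import math

def log_d_max(U):
    """Logarithm of the largest d allowed by (3.22): d <= exp(pi^2 U^2 / 2) / (3U)."""
    return math.pi ** 2 * U * U / 2 - math.log(3 * U)

def xi1_check(U):
    """Return (d^2 e^{-2 pi^2 U^2} <= 3^{-4} U^{-2}, 12 * 3^{-4} / (1 - delta)) at the
    largest admissible d, where delta = 3 d e^{-2 pi^2 U^2}; V - U = U is assumed."""
    log_d = log_d_max(U)
    d2_gauss = math.exp(2 * log_d - 2 * math.pi ** 2 * U * U)  # d^2 e^{-2 pi^2 U^2}
    delta = 3 * math.exp(log_d - 2 * math.pi ** 2 * U * U)     # 3 d e^{-2 pi^2 U^2}
    return d2_gauss <= 3 ** -4 * U ** -2, 12 * 3 ** -4 / (1 - delta)

for U in (1.0, 1.5, 3.0, 10.0):
    ok, const = xi1_check(U)
    print(U, ok, const, const <= 1 / 6)
```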

Next, applying (3.15) in (3.18), we obtain

(3.25) $$\begin{eqnarray}\displaystyle \unicode[STIX]{x1D6EF}_{2} & {\leqslant} & \displaystyle (V-U)^{2}\max _{\unicode[STIX]{x1D706}_{1,2}\leqslant d}(1+3\text{e}^{-\unicode[STIX]{x1D70B}^{2}U^{2}/2})^{\unicode[STIX]{x1D706}_{1,2}}\mathop{\sum }_{\unicode[STIX]{x1D706}_{1}=1}^{d}\mathop{\sum }_{\substack{ \mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{1})|=\unicode[STIX]{x1D706}_{1}}}|a_{\mathbf{n}_{1}}|^{2}\nonumber\\ \displaystyle & & \displaystyle \cdot \mathop{\sum }_{\unicode[STIX]{x1D706}_{2}=1}^{\unicode[STIX]{x1D706}_{1}-1}\mathop{\sum }_{\unicode[STIX]{x1D706}_{1,2}=0}^{\unicode[STIX]{x1D706}_{2}}\binom{\unicode[STIX]{x1D706}_{1}}{\unicode[STIX]{x1D706}_{1,2}}(3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{\unicode[STIX]{x1D706}_{2}-\unicode[STIX]{x1D706}_{1,2}}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}(\unicode[STIX]{x1D706}_{1}-\unicode[STIX]{x1D706}_{1,2})U^{2}}\nonumber\\ \displaystyle & {\leqslant} & \displaystyle (V-U)^{2}\biggl(1+\frac{1}{2d}\biggr)^{d}\nonumber\\ \displaystyle & & \displaystyle \cdot \mathop{\sum }_{\unicode[STIX]{x1D706}_{1}=1}^{d}\mathop{\sum }_{\unicode[STIX]{x1D706}_{2}=1}^{\unicode[STIX]{x1D706}_{1}-1}\mathop{\sum }_{\unicode[STIX]{x1D706}_{1,2}=0}^{\unicode[STIX]{x1D706}_{2}}(3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{\unicode[STIX]{x1D706}_{2}-\unicode[STIX]{x1D706}_{1,2}}(d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{\unicode[STIX]{x1D706}_{1}-\unicode[STIX]{x1D706}_{1,2}}\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2}\nonumber\\ \displaystyle & {\leqslant} & \displaystyle 2(V-U)^{2}\mathop{\sum }_{\unicode[STIX]{x1D706}_{1}=1}^{d}\mathop{\sum }_{\unicode[STIX]{x1D706}_{2}=1}^{\unicode[STIX]{x1D706}_{1}-1}\mathop{\sum }_{\unicode[STIX]{x1D706}_{1,2}=0}^{\unicode[STIX]{x1D706}_{2}}(3d\text{e}^{-2\unicode[STIX]{x1D70B}^{2}U^{2}})^{\unicode[STIX]{x1D706}_{1}+\unicode[STIX]{x1D706}_{2}-2\unicode[STIX]{x1D706}_{1,2}}\mathop{\sum }_{\mathbf{n}\in 
\mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2},\qquad\end{eqnarray}$$

where in the last steps we have used (3.22) and the trivial upper bound

(3.26) $$\begin{eqnarray}\binom{\unicode[STIX]{x1D706}_{1}}{\unicode[STIX]{x1D706}_{1,2}}=\binom{\unicode[STIX]{x1D706}_{1}}{\unicode[STIX]{x1D706}_{1}-\unicode[STIX]{x1D706}_{1,2}}\leqslant \unicode[STIX]{x1D706}_{1}^{\unicode[STIX]{x1D706}_{1}-\unicode[STIX]{x1D706}_{1,2}}\leqslant d^{\unicode[STIX]{x1D706}_{1}-\unicode[STIX]{x1D706}_{1,2}}.\end{eqnarray}$$

Using (3.22) in (3.25), we have

(3.27) $$\begin{eqnarray}\displaystyle \unicode[STIX]{x1D6EF}_{2} & {\leqslant} & \displaystyle 2(V-U)^{2}\mathop{\sum }_{\unicode[STIX]{x1D706}_{1}=1}^{d}\mathop{\sum }_{\unicode[STIX]{x1D706}_{2}=1}^{\unicode[STIX]{x1D706}_{1}-1}\mathop{\sum }_{\unicode[STIX]{x1D706}_{1,2}=0}^{\unicode[STIX]{x1D706}_{2}}(3^{-3}d^{-1}U^{-2})^{\unicode[STIX]{x1D706}_{1}+\unicode[STIX]{x1D706}_{2}-2\unicode[STIX]{x1D706}_{1,2}}\mathop{\sum }_{\boldsymbol{ n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2}\nonumber\\ \displaystyle & {\leqslant} & \displaystyle 2(V-U)^{2}\mathop{\sum }_{j=1}^{\infty }d(j+1)j(3^{-3}d^{-1}U^{-2})^{j}\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2}\nonumber\\ \displaystyle & = & \displaystyle 4(V-U)^{2}\frac{3^{-3}U^{-2}}{(1-3^{-3}d^{-1}U^{-2})^{3}}\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2}\nonumber\\ \displaystyle & = & \displaystyle \frac{4}{27}(V-U)^{2}U^{-2}(1-3^{-3}d^{-1}U^{-2})^{-3}\mathop{\sum }_{\boldsymbol{ n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2}\nonumber\\ \displaystyle & {\leqslant} & \displaystyle \frac{1}{4}\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2},\end{eqnarray}$$

where we have used the substitution $j=\unicode[STIX]{x1D706}_{1}+\unicode[STIX]{x1D706}_{2}-2\unicode[STIX]{x1D706}_{1,2}$ , the assumption $V-U\leqslant U$ , and the simple fact that

$$\begin{eqnarray}\mathop{\sum }_{j=1}^{\infty }j(j+1)x^{j}=\frac{2x}{(1-x)^{3}}\quad \text{for all}~|x|<1.\end{eqnarray}$$
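Both the power-series identity above and the final constant in (3.27) can be checked directly. The Python sketch below compares partial sums with $2x/(1-x)^{3}$ for a few values of $x$. It then evaluates, in exact arithmetic, the bound $\frac{4}{27}(1-\frac{1}{27})^{-3}$ obtained from (3.27) when $V-U\leqslant U$ and $d,U\geqslant 1$. The truncation length is an arbitrary choice.

```python
from fractions import Fraction

def series_partial(x, J):
    """Partial sum sum_{j=1}^{J} j (j + 1) x^j."""
    return sum(j * (j + 1) * x ** j for j in range(1, J + 1))

for x in (0.5, -0.5, 0.9, 1 / 27):
    print(x, series_partial(x, 2000), 2 * x / (1 - x) ** 3)

# Final constant in (3.27): (V - U)^2 U^{-2} <= 1 and, for d, U >= 1,
# 1 - 3^{-3} d^{-1} U^{-2} >= 26/27, so the bound is at most (4/27)(27/26)^3.
c = Fraction(4, 27) * Fraction(27, 26) ** 3
print(c, float(c), c <= Fraction(1, 4))
```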

Next, applying (3.15) in (3.19), we obtain

$$\begin{eqnarray}\displaystyle \unicode[STIX]{x1D6EF}_{3} & {\leqslant} & \displaystyle (V-U)^{2}\max _{\unicode[STIX]{x1D706}_{1,2}\leqslant d}(1+3\text{e}^{-\unicode[STIX]{x1D70B}^{2}U^{2}/2})^{\unicode[STIX]{x1D706}_{1,2}}\nonumber\\ \displaystyle & & \displaystyle \cdot \mathop{\sum }_{\unicode[STIX]{x1D706}_{1}=1}^{d}\mathop{\sum }_{\substack{ \mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{1})|=\unicode[STIX]{x1D706}_{1}}}|a_{\mathbf{n}_{1}}|^{2}\mathop{\sum }_{\unicode[STIX]{x1D706}_{1,2}=0}^{\unicode[STIX]{x1D706}_{1}-1}\binom{\unicode[STIX]{x1D706}_{1}}{\unicode[STIX]{x1D706}_{1,2}}(3d\text{e}^{-(2+2)\unicode[STIX]{x1D70B}^{2}U^{2}})^{\unicode[STIX]{x1D706}_{1}-\unicode[STIX]{x1D706}_{1,2}}.\nonumber\end{eqnarray}$$

Using (3.26) in this, we have

$$\begin{eqnarray}\displaystyle \unicode[STIX]{x1D6EF}_{3} & {\leqslant} & \displaystyle (V-U)^{2}\max _{\unicode[STIX]{x1D706}_{1,2}\leqslant d}(1+3\text{e}^{-\unicode[STIX]{x1D70B}^{2}U^{2}/2})^{\unicode[STIX]{x1D706}_{1,2}}\nonumber\\ \displaystyle & & \displaystyle \cdot \mathop{\sum }_{\unicode[STIX]{x1D706}_{1}=1}^{d}\mathop{\sum }_{\substack{ \mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ |L(\mathbf{n}_{1})|=\unicode[STIX]{x1D706}_{1}}}|a_{\mathbf{n}_{1}}|^{2}\mathop{\sum }_{\unicode[STIX]{x1D706}_{1,2}=0}^{\unicode[STIX]{x1D706}_{1}-1}(3d^{2}\text{e}^{-4\unicode[STIX]{x1D70B}^{2}U^{2}})^{\unicode[STIX]{x1D706}_{1}-\unicode[STIX]{x1D706}_{1,2}}.\nonumber\end{eqnarray}$$

Then using (3.22) in this, we have

(3.28) $$\begin{eqnarray}\displaystyle \unicode[STIX]{x1D6EF}_{3} & {\leqslant} & \displaystyle (V-U)^{2}(1+3(dU)^{-1})^{d}\mathop{\sum }_{\unicode[STIX]{x1D706}_{1}=1}^{d}\mathop{\sum }_{\unicode[STIX]{x1D706}_{1,2}=0}^{\unicode[STIX]{x1D706}_{1}-1}(3(3^{4}\,dU)^{-2})^{\unicode[STIX]{x1D706}_{1}-\unicode[STIX]{x1D706}_{1,2}}\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2}\nonumber\\ \displaystyle & {\leqslant} & \displaystyle (V-U)^{2}\biggl(1+\frac{1}{2d}\biggr)^{d}d\mathop{\sum }_{j=1}^{\infty }(3^{7}d^{2}U^{2})^{-j}\mathop{\sum }_{\boldsymbol{ n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2}\nonumber\\ \displaystyle & {\leqslant} & \displaystyle 2d(V-U)^{2}\frac{(3^{7}d^{2}U^{2})^{-1}}{1-(3^{7}d^{2}U^{2})^{-1}}\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2}\nonumber\\ \displaystyle & = & \displaystyle 2(V-U)^{2}U^{-2}\frac{(3^{7}d)^{-1}}{1-(3^{7}d^{2}U^{2})^{-1}}\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2}\nonumber\\ \displaystyle & {\leqslant} & \displaystyle \frac{1}{100d}\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2},\end{eqnarray}$$

where we have used the substitution $j=\unicode[STIX]{x1D706}_{1}-\unicode[STIX]{x1D706}_{1,2}$ and the hypothesis $V-U\leqslant U$ .
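The last step of (3.28) can be made explicit:

1. Since $d,U\geqslant 1$, we have $1-(3^{7}d^{2}U^{2})^{-1}\geqslant 1-3^{-7}$.
2. With $(V-U)^{2}U^{-2}\leqslant 1$, the coefficient is therefore at most $2/(3^{7}d(1-3^{-7}))=2/((3^{7}-1)d)=1/(1093d)$.

A short exact check with Python fractions, treating $d$ as a parameter, follows. The function name is ours.

```python
from fractions import Fraction

def xi3_coefficient_bound(d):
    """2 (3^7 d)^{-1} / (1 - 3^{-7}): the (3.28) coefficient with V - U = U
    and the denominator bounded below using d, U >= 1."""
    return Fraction(2, 3 ** 7 * d) / (1 - Fraction(1, 3 ** 7))

for d in (1, 2, 10, 1000):
    print(d, xi3_coefficient_bound(d), xi3_coefficient_bound(d) <= Fraction(1, 100 * d))
```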

Finally, we estimate $\unicode[STIX]{x1D6EF}_{4}$, given by (3.20), by splitting it into two parts:

(3.29) $$\begin{eqnarray}\unicode[STIX]{x1D6EF}_{4}=\unicode[STIX]{x1D6EF}_{4}^{(1)}+\unicode[STIX]{x1D6EF}_{4}^{(2)},\end{eqnarray}$$

where

(3.30) $$\begin{eqnarray}\unicode[STIX]{x1D6EF}_{4}^{(1)}=\int _{U}^{V}\mathop{\sum }_{\mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}_{1}}|^{2}\biggl(\int _{U}^{V}\mathop{\sum }_{\substack{ \mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ L(\mathbf{n}_{2})=L(\mathbf{n}_{1}) \\ h_{0}(t_{2};\mathbf{n}_{2})<|L(\mathbf{n}_{1})|}}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2}}\,dt_{2}\biggr)\,dt_{1},\end{eqnarray}$$

and where

(3.31) $$\begin{eqnarray}\unicode[STIX]{x1D6EF}_{4}^{(2)}=\int _{U}^{V}\mathop{\sum }_{\mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}_{1}}|^{2}\biggl(\int _{U}^{V}\mathop{\sum }_{\substack{ \mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ L(\mathbf{n}_{2})=L(\mathbf{n}_{1}) \\ h_{0}(t_{2};\mathbf{n}_{2})=|L(\mathbf{n}_{1})|}}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2}}\,dt_{2}\biggr)\,dt_{1}.\end{eqnarray}$$

We start with the term $\unicode[STIX]{x1D6EF}_{4}^{(1)}$ given by (3.30), and use the notation $\unicode[STIX]{x1D706}_{1}=|L(\mathbf{n}_{1})|$ . Note that here $\unicode[STIX]{x1D706}_{1}=\unicode[STIX]{x1D706}_{2}=\unicode[STIX]{x1D706}_{1,2}$ . Write

$$\begin{eqnarray}{\mathcal{H}}^{(1)}(U)=\mathop{\sum }_{\substack{ r\geqslant 0 \\ h_{0},\ldots ,h_{r-1}\geqslant 0 \\ h_{r}\geqslant 1 \\ h_{0}+\cdots +h_{r}=\unicode[STIX]{x1D706}_{1} \\ h_{0}<\unicode[STIX]{x1D706}_{1}}}\binom{\unicode[STIX]{x1D706}_{1}}{h_{0}}\mathop{\prod }_{i=1}^{r}\binom{\unicode[STIX]{x1D706}_{1}-h_{0}-\cdots -h_{i-1}}{h_{i}}2^{h_{i}}\text{e}^{-\unicode[STIX]{x1D70B}^{2}h_{i}((2i-1)U)^{2}/2}.\end{eqnarray}$$

In this special case the argument of (3.10)–(3.15) simplifies to the upper bound

$$\begin{eqnarray}\displaystyle \unicode[STIX]{x1D6EF}_{4}^{(1)} & {\leqslant} & \displaystyle \int _{U}^{V}\mathop{\sum }_{\mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}_{1}}|^{2}(V-U){\mathcal{H}}^{(1)}(U)\,dt_{1}\nonumber\\ \displaystyle & = & \displaystyle \int _{U}^{V}\mathop{\sum }_{\mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}_{1}}|^{2}(V-U)\nonumber\\ \displaystyle & & \displaystyle \cdot \,((1+2\text{e}^{-\unicode[STIX]{x1D70B}^{2}U^{2}/2}+2\text{e}^{-\unicode[STIX]{x1D70B}^{2}3^{2}U^{2}/2}+2\text{e}^{-\unicode[STIX]{x1D70B}^{2}5^{2}U^{2}/2}+\cdots )^{\unicode[STIX]{x1D706}_{1}}-1)\,dt_{1},\nonumber\end{eqnarray}$$

where the term $-1$ at the end accounts for the restriction $h_{0}<\unicode[STIX]{x1D706}_{1}$. By the multinomial theorem, the unrestricted sum would be exactly the $\unicode[STIX]{x1D706}_{1}$th power, and the only missing term is the one with $h_{0}=\unicode[STIX]{x1D706}_{1}$, which equals $1$. It now follows that

(3.32) $$\begin{eqnarray}\displaystyle \unicode[STIX]{x1D6EF}_{4}^{(1)} & {\leqslant} & \displaystyle \mathop{\sum }_{\mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}_{1}}|^{2}(V-U)^{2}((1+3\text{e}^{-\unicode[STIX]{x1D70B}^{2}U^{2}/2})^{d}-1)\nonumber\\ \displaystyle & {\leqslant} & \displaystyle \unicode[STIX]{x1D70E}_{0}^{2}(f)(V-U)^{2}(\exp (3d\text{e}^{-\unicode[STIX]{x1D70B}^{2}U^{2}/2})-1)\nonumber\\ \displaystyle & {\leqslant} & \displaystyle \unicode[STIX]{x1D70E}_{0}^{2}(f)(V-U)^{2}6\,d\text{e}^{-\unicode[STIX]{x1D70B}^{2}U^{2}/2}\end{eqnarray}$$

since

(3.33) $$\begin{eqnarray}d\text{e}^{-\unicode[STIX]{x1D70B}^{2}U^{2}/2}\leqslant {\textstyle \frac{1}{3}}.\end{eqnarray}$$

Note that in (3.32) we have used Parseval’s formula and the simple inequalities $1+x\leqslant \text{e}^{x}\leqslant 1+2x$ for $0\leqslant x\leqslant 1$; moreover, (3.33) follows from (3.22) since $U\geqslant 1$. If we now apply (3.22) in (3.32), then we conclude that

(3.34) $$\begin{eqnarray}\unicode[STIX]{x1D6EF}_{4}^{(1)}\leqslant \unicode[STIX]{x1D70E}_{0}^{2}(f)(V-U)^{2}\frac{6}{3U}\leqslant 2\unicode[STIX]{x1D70E}_{0}^{2}(f)(V-U),\end{eqnarray}$$

since $V-U\leqslant U$ .
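The inequalities behind (3.32) can be spot-checked as follows. The upper bound $\text{e}^{x}\leqslant 1+2x$ on $[0,1]$ holds because $\text{e}^{x}$ is convex and $\text{e}-1<2$.

The Python sketch below checks both inequalities on a grid. It then checks the chain $(1+y)^{d}-1\leqslant \text{e}^{dy}-1\leqslant 2dy$ with $y=3\text{e}^{-\pi^{2}U^{2}/2}$, but only for $dy\leqslant 1$, the regime given by (3.33). It uses `math.log1p` and `math.expm1` to avoid cancellation when $y$ is tiny. The sample values of $d$ and $U$ are arbitrary.

```python
import math

# 1 + x <= e^x <= 1 + 2x on a grid of [0, 1]
grid = [k / 1000 for k in range(1001)]
print(all(1 + x <= math.exp(x) <= 1 + 2 * x for x in grid))

def chain_holds(d, U):
    """(1 + y)^d - 1 <= exp(d y) - 1 <= 2 d y with y = 3 exp(-pi^2 U^2 / 2),
    checked only when d y <= 1, the regime guaranteed by (3.33)."""
    y = 3 * math.exp(-math.pi ** 2 * U * U / 2)
    if d * y > 1:
        return None  # outside the hypothesis; nothing to check
    lhs = math.expm1(d * math.log1p(y))  # (1 + y)^d - 1 without cancellation
    mid = math.expm1(d * y)              # exp(d y) - 1
    return lhs <= mid <= 2 * d * y

print([chain_holds(d, U) for d in (1, 10, 40) for U in (1.0, 1.2, 2.0)])
```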

To estimate $\unicode[STIX]{x1D6EF}_{4}^{(2)}$ given by (3.31), we make use of the fact that for every given triple $(t_{1},\mathbf{n}_{1},t_{2})$ with $t_{1},t_{2}\in [U,V]$ and $\mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0}$ , there is at most one term in the sum

$$\begin{eqnarray}\mathop{\sum }_{\substack{ \mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0} \\ L(\mathbf{n}_{2})=L(\mathbf{n}_{1}) \\ h_{0}(t_{2};\mathbf{n}_{2})=|L(\mathbf{n}_{1})|}}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}|^{2}}.\end{eqnarray}$$

Indeed, for every given triple $(t_{1},\mathbf{n}_{1},t_{2})$ and every coordinate $j$, the inequality $|t_{1}n_{1,j}-t_{2}n_{2,j}|<U/2$ has at most one integer solution $n_{2,j}$. Two distinct solutions $n_{2,j}\neq n_{2,j}^{\prime }$ would give $t_{2}|n_{2,j}-n_{2,j}^{\prime }|<U$, contradicting $t_{2}\geqslant U$. By the definition of $h_{0}(t_{2};\mathbf{n}_{2})$, together with $h_{0}(t_{2};\mathbf{n}_{2})=|L(\mathbf{n}_{1})|$ and $L(\mathbf{n}_{2})=L(\mathbf{n}_{1})$, there is therefore at most one $\mathbf{n}_{2}\in \mathbf{Z}^{d}\setminus \mathbf{0}$ satisfying the requirements.
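The uniqueness claim rests on a spacing argument. If $t\geqslant U$, two distinct integers $m,m^{\prime }$ with $|c-tm|<U/2$ and $|c-tm^{\prime }|<U/2$ would give $t|m-m^{\prime }|<U$, which is impossible.

The brute-force Python sketch below counts the solutions $m$ of $|c-tm|<U/2$ for random real $c$ and $t\in [U,2U]$. Here $c$ plays the role of $t_{1}n_{1,j}$ and $t$ that of $t_{2}$. The function name and sampling ranges are illustrative.

```python
import math
import random

def count_solutions(c, t, U):
    """Number of integers m with |c - t m| < U/2, for t > 0."""
    lo = math.floor((c - U / 2) / t) - 1
    hi = math.ceil((c + U / 2) / t) + 1
    return sum(1 for m in range(lo, hi + 1) if abs(c - t * m) < U / 2)

random.seed(0)
U = 3.0
counts = [count_solutions(random.uniform(-100.0, 100.0), random.uniform(U, 2 * U), U)
          for _ in range(10000)]
print(max(counts))  # never more than one solution when t >= U
```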

Let $\mathbf{n}_{2}^{\ast }\in \mathbf{Z}^{d}\setminus \mathbf{0}$ , $\mathbf{n}_{2}^{\ast }=\mathbf{n}_{2}^{\ast }(t_{1},\mathbf{n}_{1},t_{2})$ , denote this single integer lattice point, if it exists. Then (3.31) can be rewritten in the form

$$\begin{eqnarray}\unicode[STIX]{x1D6EF}_{4}^{(2)}=\int _{U}^{V}\mathop{\sum }_{\mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}_{1}}|^{2}\biggl(\int _{U}^{V}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}\mathbf{n}_{1}-t_{2}\mathbf{n}_{2}^{\ast }|^{2}}\,dt_{2}\biggr)\,dt_{1}\end{eqnarray}$$

where the integrand is interpreted as $0$ for those $t_{2}$ for which $\mathbf{n}_{2}^{\ast }=\mathbf{n}_{2}^{\ast }(t_{1},\mathbf{n}_{1},t_{2})$ does not exist. Let $j_{0}=j_{0}(\mathbf{n}_{1})$ be an arbitrary but fixed element of the set $L(\mathbf{n}_{1})$. Then we have trivially

(3.35) $$\begin{eqnarray}\unicode[STIX]{x1D6EF}_{4}^{(2)}\leqslant \int _{U}^{V}\mathop{\sum }_{\mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}_{1}}|^{2}\biggl(\int _{U}^{V}\text{e}^{-2\unicode[STIX]{x1D70B}^{2}|t_{1}n_{1,j_{0}}-t_{2}n_{2,j_{0}}^{\ast }|^{2}}\,dt_{2}\biggr)\,dt_{1}.\end{eqnarray}$$

Given $(t_{1},\mathbf{n}_{1})$ , we decompose the interval $U\leqslant t_{2}\leqslant V$ , where $V\leqslant 2U$ , into sets

(3.36) $$\begin{eqnarray}I_{\ell }(t_{1},\mathbf{n}_{1})=\{t_{2}\in [U,V]:\ell -1\leqslant |t_{1}n_{1,j_{0}}-t_{2}n_{2,j_{0}}^{\ast }|<\ell \},\quad \ell =1,2,3,\ldots ,\end{eqnarray}$$

where $j_{0}=j_{0}(\mathbf{n}_{1})$ and $\mathbf{n}_{2}^{\ast }=\mathbf{n}_{2}^{\ast }(t_{1},\mathbf{n}_{1},t_{2})$ . Using the decomposition (3.36) in (3.35), we obtain the upper bound

(3.37) $$\begin{eqnarray}\unicode[STIX]{x1D6EF}_{4}^{(2)}\leqslant \int _{U}^{V}\mathop{\sum }_{\mathbf{n}_{1}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}_{1}}|^{2}\biggl(\mathop{\sum }_{\ell =1}^{\infty }\operatorname{length}(I_{\ell })\text{e}^{-2\unicode[STIX]{x1D70B}^{2}(\ell -1)^{2}}\biggr)\,dt_{1},\end{eqnarray}$$

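where $I_{\ell }=I_{\ell }(t_{1},\mathbf{n}_{1})$. For $\ell <U/2$, applying Lemma 3.3 with $C=t_{1}n_{1,j_{0}}$ and $C^{\prime }=\ell$, we obtain the upper bound $\operatorname{length}(I_{\ell })<6\ell$. For $\ell \geqslant U/2$, the same bound holds trivially, since $\operatorname{length}(I_{\ell })\leqslant V-U\leqslant U\leqslant 2\ell$. Using this in (3.37), we conclude that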

(3.38) $$\begin{eqnarray}\unicode[STIX]{x1D6EF}_{4}^{(2)}\leqslant \int _{U}^{V}\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2}\biggl(\mathop{\sum }_{\ell =1}^{\infty }6\ell \text{e}^{-2\unicode[STIX]{x1D70B}^{2}(\ell -1)^{2}}\biggr)\,dt_{1}\leqslant 7(V-U)\mathop{\sum }_{\mathbf{n}\in \mathbf{Z}^{d}\setminus \mathbf{0}}|a_{\mathbf{n}}|^{2}.\end{eqnarray}$$
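With $\operatorname{length}(I_{\ell })<6\ell$ for every $\ell \geqslant 1$, the series in (3.38) is dominated by its first term, $6$. The next term is $12\text{e}^{-2\pi^{2}}\approx 3\times 10^{-8}$, and the rest are far smaller. A short Python evaluation makes the constant $7$ concrete; the truncation at 60 terms is an arbitrary choice.

```python
import math

# sum_{ell >= 1} 6 ell exp(-2 pi^2 (ell - 1)^2), truncated after 60 terms
series = sum(6 * ell * math.exp(-2 * math.pi ** 2 * (ell - 1) ** 2)
             for ell in range(1, 61))
print(series, series <= 7)
```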

Finally, combining (3.3), (3.24), (3.27)–(3.29), (3.34) and (3.38) with (3.16), we conclude that

$$\begin{eqnarray}\displaystyle |\unicode[STIX]{x1D6E5}_{f}^{2}(\text{Gauss};U,V)| & {\leqslant} & \displaystyle \biggl(\frac{1}{6}+\frac{1}{4}+\frac{1}{100d}+2(V-U)+7(V-U)\biggr)\unicode[STIX]{x1D70E}_{0}^{2}(f)\nonumber\\ \displaystyle & {\leqslant} & \displaystyle (9(V-U)+1)\unicode[STIX]{x1D70E}_{0}^{2}(f).\nonumber\end{eqnarray}$$

This completes the proof of Theorem 3.1.

To prove Theorem 1.2, we have to eliminate the technical restriction $U<V\leqslant 2U$ in Theorem 3.1.

Given $1\leqslant U<W$, let $k\geqslant 1$ denote the integer such that $2^{k-1}U<W\leqslant 2^{k}U$. It is easy to find a sequence $1\leqslant W_{0}=U<W_{1}<\cdots <W_{k}=W$ such that $W_{i}<W_{i+1}\leqslant 2W_{i}$ for all $0\leqslant i<k$; for example, take $W_{i}=2^{i}U$ for $0\leqslant i<k$ and $W_{k}=W$. By using the Cauchy–Schwarz inequality and Theorem 3.1, we have

$$\begin{eqnarray}\displaystyle & & \displaystyle \unicode[STIX]{x1D6E5}_{f}^{2}(\text{Gauss};U,W)\nonumber\\ \displaystyle & & \displaystyle \quad =(2\unicode[STIX]{x1D70B})^{-d/2}\int _{\mathbf{R}^{d}}(D_{f}(\mathbf{v};U,W))^{2}\text{e}^{-|\mathbf{v}|^{2}/2}\,d\mathbf{v}\nonumber\\ \displaystyle & & \displaystyle \quad =(2\unicode[STIX]{x1D70B})^{-d/2}\int _{\mathbf{R}^{d}}\biggl(\mathop{\sum }_{j=0}^{k-1}D_{f}(\mathbf{v};W_{j},W_{j+1})\biggr)^{2}\text{e}^{-|\mathbf{v}|^{2}/2}\,d\mathbf{v}\nonumber\\ \displaystyle & & \displaystyle \quad \leqslant (2\unicode[STIX]{x1D70B})^{-d/2}\int _{\mathbf{R}^{d}}\biggl(\mathop{\sum }_{j=0}^{k-1}(D_{f}(\mathbf{v};W_{j},W_{j+1}))^{2}\biggr)\biggl(\mathop{\sum }_{j=0}^{k-1}1\biggr)\text{e}^{-|\mathbf{v}|^{2}/2}\,d\mathbf{v}\nonumber\\ \displaystyle & & \displaystyle \quad =k\mathop{\sum }_{j=0}^{k-1}(2\unicode[STIX]{x1D70B})^{-d/2}\int _{\mathbf{R}^{d}}(D_{f}(\mathbf{v};W_{j},W_{j+1}))^{2}\text{e}^{-|\mathbf{v}|^{2}/2}\,d\mathbf{v}\nonumber\\ \displaystyle & & \displaystyle \quad \leqslant k\unicode[STIX]{x1D70E}_{0}^{2}(f)\mathop{\sum }_{j=0}^{k-1}(9(W_{j+1}-W_{j})+1)\leqslant k\unicode[STIX]{x1D70E}_{0}^{2}(f)(10(W-U)+10).\nonumber\end{eqnarray}$$

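In the last step we used $\sum _{j=0}^{k-1}(W_{j+1}-W_{j})=W-U$ and $k\leqslant W-U+2$. The latter holds because $k-1<\log _{2}(W/U)\leqslant \log _{2}(1+(W-U))\leqslant \max \{W-U,1\}$, where $W/U\leqslant 1+(W-U)$ since $U\geqslant 1$. Since $k=\lceil \log _{2}(W/U)\rceil$, the proof of Theorem 1.2 is complete.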
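The chain $W_{0},\ldots ,W_{k}$ and the value of $k$ are easy to make concrete. The Python sketch below builds one admissible chain, $W_{i}=2^{i}U$ for $i<k$ and $W_{k}=W$. This is our choice; any chain with $W_{i}<W_{i+1}\leqslant 2W_{i}$ works. The sketch then compares the chain's length with $\lceil \log _{2}(W/U)\rceil$. The function name `dyadic_chain` is hypothetical.

```python
import math

def dyadic_chain(U, W):
    """One admissible chain U = W_0 < W_1 < ... < W_k = W with W_{i+1} <= 2 W_i:
    W_i = 2^i U for i < k and W_k = W, where 2^{k-1} U < W <= 2^k U."""
    k = 1
    while 2 ** k * U < W:
        k += 1
    return [2 ** i * U for i in range(k)] + [W]

for U, W in ((1.0, 1.5), (1.0, 8.0), (3.0, 100.0)):
    chain = dyadic_chain(U, W)
    k = len(chain) - 1
    print(U, W, chain, k, math.ceil(math.log2(W / U)))
```

For instance, with $U=3$ and $W=100$ the chain is $3,6,12,24,48,96,100$, so $k=6$.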

References

Beck, J., From Khinchin’s conjecture on strong uniformity to superuniform motions. Mathematika 61 2015, 591707.CrossRefGoogle Scholar
Drmota, M. and Tichy, R. F., Sequences, Discrepancies and Applications (Lecture Notes in Mathematics  1651 ), Springer (1997).CrossRefGoogle Scholar
Feller, W., An Introduction to Probability Theory and its Applications, Vol. 1, 2nd edn, Wiley (1971).Google Scholar
Khinchin, A., Ein Satz über Kettenbrüche mit arithmetischen Anwendungen. Math. Z. 18 1923, 289306.CrossRefGoogle Scholar
Marstrand, J. M., On Khinchin’s conjecture about strong uniform distribution. Proc. Lond. Math. Soc. (3) 21 1970, 540556.CrossRefGoogle Scholar