1 Introduction
Let $N\geqslant 100$ be a natural number (so that $\log \log N$ is positive). If $k\geqslant 3$ is a natural number we define $r_{k}(N)$ to be the largest cardinality of a set $A\subset [N]:=\{1,\ldots ,N\}$ that does not contain an arithmetic progression of $k$ distinct elements.
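For very small $N$ the quantity $r_{k}(N)$ can be computed by exhaustive search, which may help to fix the definition. The following Python sketch is purely illustrative and plays no role in the arguments of this paper; the function names `has_k_ap` and `r` are hypothetical, and the search is exponential in $N$.

```python
from itertools import combinations

def has_k_ap(s, k):
    """Return True if s contains k distinct elements in arithmetic progression."""
    elems = set(s)
    if len(elems) < k:
        return False
    top = max(elems)
    for a in elems:
        d = 1
        # Try every positive common difference that keeps the progression below top.
        while a + (k - 1) * d <= top:
            if all(a + j * d in elems for j in range(1, k)):
                return True
            d += 1
    return False

def r(k, n):
    """Brute-force r_k(n): largest size of a subset of {1..n} with no k-term progression."""
    for size in range(n, 0, -1):
        for subset in combinations(range(1, n + 1), size):
            if not has_k_ap(subset, k):
                return size
    return 0

print([r(3, n) for n in range(1, 10)])
print([r(4, n) for n in range(1, 10)])
```

The search is only feasible for very small $N$ and gives no information about the asymptotic regime studied here.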
Klaus Roth proved in 1953 [Reference Roth24] that $r_{3}(N)\ll N(\log \log N)^{-1}$ , and so in particular $r_{3}(N)=o(N)$ as $N\rightarrow \infty$ . Since Szemerédi’s 1969 proof [Reference Szemerédi29] that $r_{4}(N)=o(N)$ , and his later proof [Reference Szemerédi30] that $r_{k}(N)=o_{k}(N)$ for $k\geqslant 5$ (answering a question from [Reference Erdős and Turán10]), it has been natural to ask for similarly effective bounds for these quantities. It is worth noting that the famous conjecture of Erdős [Reference Erdős9], asserting that every set of natural numbers whose sum of reciprocals is divergent contains arbitrarily long arithmetic progressions, is equivalent to the claim that $\sum _{n=1}^{\infty }r_{k}(2^{n})/2^{n}<\infty$ for all $k\geqslant 3$ (see [Reference Tao and Vu33, Exercise 10.0.6]).
A first attempt towards quantitative bounds for higher $k$ was made by Roth in [Reference Roth25], who provided a new proof that $r_{4}(N)=o(N)$ . A major breakthrough was made in 1998 by Gowers [Reference Gowers11, Reference Gowers12], who obtained the bound $r_{k}(N)\ll _{k}N(\log \log N)^{-\unicode[STIX]{x1D716}_{k}}$ for each $k\geqslant 4$ , where $\unicode[STIX]{x1D716}_{k}:=1/2^{2^{k+9}}$ . In the other direction, a classical result of Behrend [Reference Behrend2] shows that $r_{3}(N)\gg N\exp (-c\sqrt{\log N})$ for some absolute constant $c>0$ (see [Reference Elkin8, Reference Green and Wolf20] for a slight refinement of this bound), and in [Reference Rankin23] (see also [Reference Łaba and Lacey22]) the argument was generalized to give the bound $r_{1+2^{k}}(N)\gg _{k}N\exp (-c\log ^{1/(k+1)}N)$ for any $k\geqslant 1$ .
In the meantime, there has been progress on $r_{3}(N)$ . Szemerédi (unpublished) obtained the bound $r_{3}(N)\ll N\text{e}^{-c\sqrt{\log \log N}}$ , and shortly thereafter Heath-Brown [Reference Heath-Brown21] and Szemerédi [Reference Szemerédi32] independently obtained the bound $r_{3}(N)\ll N(\log N)^{-c}$ for some absolute constant $c>0$ . The best known value of $c$ has been improved in a series of papers [Reference Bloom4, Reference Bourgain6, Reference Bourgain7, Reference Sanders27, Reference Sanders28]. Sanders [Reference Sanders28] was the first to show that any $c<1$ is admissible, and Bloom [Reference Bloom4] improved the factor of $\log \log N$ in Sanders’s bound.
The only other direct progress on upper bounds for $r_{k}(N)$ is our previous paper [Reference Green, Tao, Chen, Gowers, Halberstam, Schmidt and Vaughan19], obtaining the bound $r_{4}(N)\ll N\text{e}^{-c\sqrt{\log \log N}}$ . The main objective of this paper is to obtain a bound for $r_{4}(N)$ of the same quality as the Heath-Brown and Szemerédi bound for $r_{3}(N)$ .
Theorem 1.1. We have $r_{4}(N)\ll N(\log N)^{-c}$ for some absolute constant $c>0$ .
An analogous result in finite fields was claimed (and published [Reference Green and Tao15]) by us around 12 years ago, although an error in this paper came to light some years later. This was corrected around 5 years ago in [Reference Green and Tao16]. These papers (like almost all of the previously cited quantitative results on $r_{k}(N)$ ) are based on the density increment argument of Roth [Reference Roth24]. However we will use a slightly different “energy decrement” and “regularity” approach here, inspired by the Khinchin-type recurrence theorems for length-four progressions established by Bergelson et al [Reference Bergelson, Host and Kra3] in the ergodic setting, and by the authors [Reference Green13] in the combinatorial setting.
2 Notation
We use the asymptotic notation $X\ll Y$ or $X=O(Y)$ to denote $|X|\leqslant CY$ for some constant $C$ . Given an asymptotic parameter $N$ going to infinity, we use $X=o(Y)$ to denote the bound $|X|\leqslant c(N)Y$ for some function $c(N)$ of $N$ that goes to zero as $N$ goes to infinity. We also write $X\asymp Y$ for $X\ll Y\ll X$ . If we need the implied constant $C$ or decay function $c(\,)$ to depend on an additional parameter, we indicate this by subscripts, e.g. $X=o_{k}(Y)$ denotes the bound $|X|\leqslant c_{k}(N)Y$ for a function $c_{k}(N)$ that goes to zero as $N\rightarrow \infty$ for any fixed choice of $k$ .
We will frequently use probabilistic notation, and adopt the convention that boldface variables such as $\mathbf{a}$ or $\mathbf{r}$ represent random variables, whereas non-boldface variables such as $a$ and $r$ represent deterministic variables (or constants). We write $\mathbb{P}(E)$ for the probability of a random event $E$ , and $\mathbb{E}\mathbf{X}$ and $\operatorname{Var}\mathbf{X}$ for the expectation and variance of a real or complex random variable $\mathbf{X}$ ; we also use $\mathbb{E}(\mathbf{X}|E)=\mathbb{E}\mathbf{X}1_{E}/\mathbb{P}(E)$ for the conditional expectation of $\mathbf{X}$ relative to an event $E$ of non-zero probability, where of course $1_{E}$ denotes the indicator variable of $E$ . In this paper, the random variables $\mathbf{X}$ whose expectations we compute will be discrete, in the sense that they take only finitely many values, so there will be no issues of measurability. The essential range of a discrete random variable $\mathbf{X}$ is the set of all values $X$ for which $\mathbb{P}(\mathbf{X}=X)$ is non-zero.
By a slight abuse of notation, we also retain the traditional (in additive combinatorics) use for $\mathbb{E}$ as an average, thus $\mathbb{E}_{a\in A}f(a):=(1/|A|)\sum _{a\in A}f(a)$ for any finite non-empty set $A$ and function $f:A\rightarrow \mathbb{C}$ , where we use $|A|$ to denote the cardinality of $A$ . Thus for instance $\mathbb{E}_{a\in A}f(a)=\mathbb{E}f(\mathbf{a})$ if $\mathbf{a}$ is drawn uniformly at random from $A$ .
A function $f:A\rightarrow \mathbb{C}$ is said to be $1$ -bounded if one has $|f(a)|\leqslant 1$ for all $a\in A$ . We will frequently rely on the following probabilistic form of the Cauchy–Schwarz inequality, the proof of which is an exercise.
Lemma 2.1 (Cauchy–Schwarz).
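Let $A,B$ be sets, let $f:A\rightarrow \mathbb{C}$ be a $1$ -bounded function, and let $g:A\times B\rightarrow \mathbb{C}$ be another function. Let $\mathbf{a},\mathbf{b},\mathbf{b}^{\prime }$ be discrete random variables in $A,B,B$ respectively, such that $\mathbf{b}^{\prime }$ is a conditionally independent copy of $\mathbf{b}$ relative to $\mathbf{a}$ , that is to say that
$$\mathbb{P}(\mathbf{b}=b,\mathbf{b}^{\prime }=b^{\prime }|\mathbf{a}=a)=\mathbb{P}(\mathbf{b}=b|\mathbf{a}=a)\,\mathbb{P}(\mathbf{b}=b^{\prime }|\mathbf{a}=a)$$
for all $a$ in the essential range of $\mathbf{a}$ and all $b,b^{\prime }\in B$ . Then we have
$$|\mathbb{E}f(\mathbf{a})g(\mathbf{a},\mathbf{b})|^{2}\leqslant \mathbb{E}g(\mathbf{a},\mathbf{b})\overline{g(\mathbf{a},\mathbf{b}^{\prime })}.$$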
We will think of this lemma as allowing one to eliminate a factor $f(\mathbf{a})$ from a lower bound of the form $|\mathbb{E}f(\mathbf{a})g(\mathbf{a},\mathbf{b})|\geqslant \unicode[STIX]{x1D702}$ , at the cost of duplicating the factor $g$ , and worsening the lower bound from $\unicode[STIX]{x1D702}$ to $\unicode[STIX]{x1D702}^{2}$ .
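The inequality of Lemma 2.1, in the form $|\mathbb{E}f(\mathbf{a})g(\mathbf{a},\mathbf{b})|^{2}\leqslant \mathbb{E}g(\mathbf{a},\mathbf{b})\overline{g(\mathbf{a},\mathbf{b}^{\prime })}$ , can be tested numerically on small finite examples. The sketch below is an illustration only: the function name `cauchy_schwarz_sides`, the random choice of joint law and the sizes of $A$ and $B$ are all assumptions made for the example.

```python
import cmath
import random

def cauchy_schwarz_sides(num_a=5, num_b=4, seed=0):
    """Return (|E f(a) g(a,b)|^2, E g(a,b) conj(g(a,b'))) for a random finite example."""
    rng = random.Random(seed)
    # Joint law of (a, b): positive weights normalised to total mass 1.
    w = [[rng.random() + 0.01 for _ in range(num_b)] for _ in range(num_a)]
    total = sum(map(sum, w))
    joint = [[x / total for x in row] for row in w]
    # f is 1-bounded: a modulus in [0, 1) times a random unit phase.
    f = [rng.random() * cmath.exp(2j * cmath.pi * rng.random()) for _ in range(num_a)]
    g = [[complex(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(num_b)]
         for _ in range(num_a)]
    lhs = abs(sum(joint[a][b] * f[a] * g[a][b]
                  for a in range(num_a) for b in range(num_b))) ** 2
    # With b' a conditionally independent copy of b, P(a, b, b') = P(a, b) P(a, b') / P(a),
    # so the right-hand side collapses to sum_a |sum_b P(a, b) g(a, b)|^2 / P(a).
    rhs = 0.0
    for a in range(num_a):
        p_a = sum(joint[a])
        inner = sum(joint[a][b] * g[a][b] for b in range(num_b))
        rhs += abs(inner) ** 2 / p_a
    return lhs, rhs

print(cauchy_schwarz_sides(seed=0))
```

Note that the right-hand side no longer involves $f$ , which is the sense in which the factor $f(\mathbf{a})$ has been eliminated.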
We also have the following variant of Lemma 2.1.
Lemma 2.2 (Popularity principle).
Let $\mathbf{a}$ be a random variable taking values in a set $A$ , and let $f:A\rightarrow [-C,C]$ be a function for some $C>0$ . If we have $\mathbb{E}f(\mathbf{a})\geqslant \unicode[STIX]{x1D702}$ for some $\unicode[STIX]{x1D702}>0$ then, with probability at least $\unicode[STIX]{x1D702}/2C$ , the random variable $\mathbf{a}$ attains a value $a\in A$ for which $f(a)\geqslant \unicode[STIX]{x1D702}/2$ .
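Proof. If we set $\Omega :=\{a\in A:f(a)\geqslant \eta /2\}$ , then
$$f(\mathbf{a})\leqslant \frac{\eta }{2}+C1_{\Omega }(\mathbf{a})$$
and hence on taking expectations
$$\eta \leqslant \mathbb{E}f(\mathbf{a})\leqslant \frac{\eta }{2}+C\,\mathbb{P}(\mathbf{a}\in \Omega ).$$
This implies that
$$\mathbb{P}(\mathbf{a}\in \Omega )\geqslant \frac{\eta }{2C},$$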
giving the claim. ◻
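As a quick numerical illustration of Lemma 2.2, one can take $\eta$ to be exactly $\mathbb{E}f(\mathbf{a})$ and compare the probability of the popular set with $\eta /2C$ . The sketch below does this for a randomly generated finite example; the function name `popularity_check` and the specific random data are assumptions made for illustration.

```python
import random

def popularity_check(values, probs, bound):
    """values[i] = f(a_i) in [-bound, bound], probs[i] = P(a = a_i).

    Returns (eta, P(f(a) >= eta/2), eta/(2*bound)) with eta = E f(a).
    """
    eta = sum(p * v for p, v in zip(probs, values))
    popular = sum(p for p, v in zip(probs, values) if v >= eta / 2)
    return eta, popular, eta / (2 * bound)

rng = random.Random(0)
bound = 3.0
values = [rng.uniform(-bound, bound) for _ in range(50)]
raw = [rng.random() + 0.01 for _ in range(50)]
probs = [w / sum(raw) for w in raw]
eta, popular, threshold = popularity_check(values, probs, bound)
print(eta, popular, threshold)
```

When $\eta \leqslant 0$ the comparison is trivially true, so the lemma only has content when the mean is positive.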
If $\unicode[STIX]{x1D703}\in \mathbb{R}$ , we write $\Vert \unicode[STIX]{x1D703}\Vert _{\mathbb{R}/\mathbb{Z}}$ for the distance from $\unicode[STIX]{x1D703}$ to the nearest integer, and $e(\unicode[STIX]{x1D703})=\text{e}^{2\unicode[STIX]{x1D70B}\text{i}\unicode[STIX]{x1D703}}$ . Observe from elementary trigonometry that
and hence also
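We will also use the triangle inequalities
$$\Vert \theta _{1}+\theta _{2}\Vert _{\mathbb{R}/\mathbb{Z}}\leqslant \Vert \theta _{1}\Vert _{\mathbb{R}/\mathbb{Z}}+\Vert \theta _{2}\Vert _{\mathbb{R}/\mathbb{Z}};\quad \Vert k\theta _{1}\Vert _{\mathbb{R}/\mathbb{Z}}\leqslant |k|\Vert \theta _{1}\Vert _{\mathbb{R}/\mathbb{Z}}$$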
for $\unicode[STIX]{x1D703}_{1},\unicode[STIX]{x1D703}_{2}\in \mathbb{R}/\mathbb{Z}$ and $k\in \mathbb{Z}$ frequently in the sequel, often without further comment.
For any prime $p$ , we (by slight abuse of notation) let $a\mapsto a/p$ be the obvious homomorphism from $\mathbb{Z}/p\mathbb{Z}$ to $\mathbb{R}/\mathbb{Z}$ that maps $a~(\operatorname{mod}~p)$ to $a/p~(\operatorname{mod}~1)$ for any integer $a$ . We then define $e_{p}:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ to be the character
$$e_{p}(a):=e(a/p)$$
of $\mathbb{Z}/p\mathbb{Z}$ .
3 High-level overview of argument
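We will establish Theorem 1.1 by establishing the following result, related to the Khinchin-type recurrence theorems mentioned earlier. It will be convenient to introduce the notation
$$\Lambda _{\mathbf{a},\mathbf{r}}(\mathbf{f}):=\mathbb{E}\mathbf{f}(\mathbf{a})\mathbf{f}(\mathbf{a}+\mathbf{r})\mathbf{f}(\mathbf{a}+2\mathbf{r})\mathbf{f}(\mathbf{a}+3\mathbf{r})$$
whenever $\mathbf{a},\mathbf{r}$ are random variables on $\mathbb{Z}/p\mathbb{Z}$ and $\mathbf{f}:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ is a random function; of course, the notation can also be applied to deterministic functions $f:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ . Later on we will also need the conditional variant
$$\Lambda _{\mathbf{a},\mathbf{r}}(\mathbf{f}|E):=\mathbb{E}(\mathbf{f}(\mathbf{a})\mathbf{f}(\mathbf{a}+\mathbf{r})\mathbf{f}(\mathbf{a}+2\mathbf{r})\mathbf{f}(\mathbf{a}+3\mathbf{r})|E)\tag{3.1}$$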
for some events $E$ of non-zero probability. Informally, this quantity counts the density of arithmetic progressions $\mathbf{a},\mathbf{a}\,+\,\mathbf{r},\mathbf{a}\,+\,2\mathbf{r},\mathbf{a}\,+\,3\mathbf{r}$ on the event $E$ weighted by $\mathbf{f}$ , where $\mathbf{a},\mathbf{r}$ need not be drawn uniformly or independently (and $\mathbf{f}$ may also be coupled to $\mathbf{a},\mathbf{r}$ ).
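Theorem 3.1. Let $p$ be a prime, let $\eta$ be a real number with $0<\eta \leqslant {\textstyle \frac{1}{10}}$ , and let $f:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ be a function. Then there exist random variables $\mathbf{a},\mathbf{r}\in \mathbb{Z}/p\mathbb{Z}$ , not necessarily independent, obeying the near-uniform distribution bound
$$\mathbb{E}f(\mathbf{a})=\mathbb{E}_{a\in \mathbb{Z}/p\mathbb{Z}}f(a)+O(\eta ),\tag{3.2}$$
the recurrence property
$$\Lambda _{\mathbf{a},\mathbf{r}}(f)\geqslant (\mathbb{E}f(\mathbf{a}))^{4}-O(\eta ),\tag{3.3}$$
and the “thickness” bound
$$\mathbb{P}(\mathbf{r}=0)\ll \exp (\eta ^{-O(1)})/p.\tag{3.4}$$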
We note that a variant of Theorem 3.1 was established by us in [Reference Green13] (answering a question in [Reference Bergelson, Host and Kra3]), in which the random variable $\mathbf{a}$ was uniformly distributed in $\mathbb{Z}/p\mathbb{Z}$ , the random variable $\mathbf{r}$ was uniformly distributed in a subset of $\mathbb{Z}/p\mathbb{Z}$ of size $\gg _{\eta }p$ and was independent of $\mathbf{a}$ , and the condition (3.4) (which is crucial to the quantitative bound in Theorem 1.1) was not present. Compared to that result, Theorem 3.1 obtains the much more quantitative bound (3.4), but at the expense of no longer enforcing independence between $\mathbf{a}$ and $\mathbf{r}$ . The use of non-independent random variables $\mathbf{a},\mathbf{r}$ is an innovation of this current paper; it is similar to the technique in previous papers of using “factors” (finite partitions) to break up the domain $\mathbb{Z}/p\mathbb{Z}$ into smaller “atoms” such as Bohr sets and analyzing each atom separately. However there will be technical advantages from the more general framework of pairs of non-independent random variables $\mathbf{a},\mathbf{r}$ . In particular we will be able to avoid some of the boundary issues arising from irregularity of Bohr sets, by using the smoother device of “regular probability distributions” associated to such sets. Although $f$ is allowed to attain negative values in Theorem 3.1, in our applications we shall only be concerned with the case when $f$ is non-negative.
Let us now see how Theorem 1.1 follows from Theorem 3.1. Clearly we may assume that $N\geqslant 100$ . Suppose that $A$ is a subset of $\{1,\ldots ,N\}$ without any non-trivial four-term arithmetic progressions. By Bertrand’s postulate, we may find a prime $p$ between (for example) $2N$ and $4N$ . If we define $f:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ to be the indicator function $1_{A}$ of $A$ (viewed as a subset of $\mathbb{Z}/p\mathbb{Z}$ ), then we have
$$\mathbb{E}_{a\in \mathbb{Z}/p\mathbb{Z}}f(a)=|A|/p\tag{3.5}$$
and also
$$f(a)f(a+r)f(a+2r)f(a+3r)=0\tag{3.6}$$
whenever $a,r\in \mathbb{Z}/p\mathbb{Z}$ with $r$ non-zero. Now let $\mathbf{a},\mathbf{r}$ be as in Theorem 3.1, with $\eta$ to be chosen later. From (3.2), (3.3), (3.5) we have
$$\Lambda _{\mathbf{a},\mathbf{r}}(f)\geqslant (|A|/p)^{4}-O(\eta ).$$
But by (3.6), (3.4), the left-hand side is $O(\exp (\eta ^{-O(1)})/p)$ . Setting $\eta :=c\log ^{-c}p$ for a sufficiently small absolute constant $c>0$ , we conclude that
$$(|A|/p)^{4}\ll \log ^{-c}p,$$
and hence $|A|\ll N\log ^{-c/4}N$ , giving Theorem 1.1.
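The role of the prime $p>2N$ in this deduction is to rule out "wrap-around" progressions: if $A\subset \{1,\ldots ,N\}$ has no non-trivial four-term progressions in the integers, then it has none in $\mathbb{Z}/p\mathbb{Z}$ either. The following Python sketch checks this by brute force for a small example. It is an illustration under stated assumptions: the set $A$ is produced by a simple greedy construction (not a construction used in this paper), and all function names are hypothetical.

```python
def is_prime(n):
    """Trial-division primality test, adequate for the small moduli used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_between(lo, hi):
    """Smallest prime p with lo < p < hi; Bertrand's postulate covers hi = 2 * lo."""
    for p in range(lo + 1, hi):
        if is_prime(p):
            return p
    raise ValueError("no prime in range")

def greedy_4ap_free(n):
    """Greedily build a subset of {1..n} with no four-term arithmetic progression."""
    chosen = set()
    for x in range(1, n + 1):
        # x is the largest element so far, so only progressions ending at x can appear.
        if not any({x - d, x - 2 * d, x - 3 * d} <= chosen
                   for d in range(1, (x - 1) // 3 + 1)):
            chosen.add(x)
    return chosen

def progression_pairs(A, p):
    """All (a, r) in (Z/pZ)^2 with a, a+r, a+2r, a+3r all in A after reduction mod p."""
    return [(a, r) for a in range(p) for r in range(p)
            if all((a + j * r) % p in A for j in range(4))]

N = 30
A = greedy_4ap_free(N)
p = prime_between(2 * N, 4 * N)
pairs = progression_pairs(A, p)
print(p, len(A), len(pairs))
```

Only the degenerate pairs with $r=0$ survive, which is why the four-fold average of $1_{A}$ is controlled by $\mathbb{P}(\mathbf{r}=0)$ .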
Remark.
As mentioned previously, the arguments in [Reference Green13] established a bound of the form (3.3) with $\mathbf{a}$ and $\mathbf{r}$ independent, and also one could ensure that $\mathbf{a}$ was uniformly distributed over $\mathbb{Z}/p\mathbb{Z}$ . As a consequence, one could establish a variant of Theorem 1.1, namely that for any $N\geqslant 1$ , $\unicode[STIX]{x1D702}>0$ , and $A\subset [N]$ , one had
for $\gg _{\unicode[STIX]{x1D702}}N$ choices of $0\leqslant r\leqslant N$ . Unfortunately our methods do not seem to provide a good bound of this form due to our coupling together of $\mathbf{a}$ and $\mathbf{r}$ .
It remains to establish Theorem 3.1. As in [Reference Bergelson, Host and Kra3, Reference Green13], the lower bound (3.3) will ultimately come from the following consequence of the Cauchy–Schwarz inequality that counts solutions to the equation $x-3y+3z-w=0$ for $x,y,z,w$ in some subset of a compact abelian group; this inequality is a specific feature of the theory of length-four progressions that is not available for longer progressions.
Lemma 3.2 (Application of Cauchy–Schwarz).
Let $G=(G,+)$ be a compact abelian group, let $\mu$ be the probability Haar measure on $G$ , and let $F:G\rightarrow \mathbb{R}$ be a bounded measurable function. Then
$$\int _{G}\int _{G}\int _{G}F(x)F(y)F(z)F(x-3y+3z)\,d\mu (x)\,d\mu (y)\,d\mu (z)\geqslant \left(\int _{G}F(x)\,d\mu (x)\right)^{4}.$$
Proof. Making the change of variables $w=x-3y$ and using Fubini’s theorem, the left-hand side may be rewritten as
$$\int _{G}\left(\int _{G}F(w+3y)F(y)\,d\mu (y)\right)^{2}\,d\mu (w),$$
which by the Cauchy–Schwarz inequality is at least
$$\left(\int _{G}\int _{G}F(w+3y)F(y)\,d\mu (y)\,d\mu (w)\right)^{2}.$$
But by a further application of Fubini’s theorem, the expression inside the square is $(\int _{G}F(x)\,d\mu (x))^{2}$ . The claim follows. ◻
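Lemma 3.2 applies in particular to a finite cyclic group $G=\mathbb{Z}/n\mathbb{Z}$ with normalized counting measure, where it reads $\mathbb{E}_{x,y,z\in G}F(x)F(y)F(z)F(x-3y+3z)\geqslant (\mathbb{E}_{x\in G}F(x))^{4}$ . The sketch below evaluates both sides for random real-valued $F$ ; the function name `lemma_3_2_sides` and the choice of moduli are assumptions made for illustration. The modulus need not be coprime to $3$ , in line with the proof above.

```python
import random

def lemma_3_2_sides(F):
    """Return (E_{x,y,z} F(x)F(y)F(z)F(x-3y+3z), (E_x F(x))^4) on Z/nZ, n = len(F)."""
    n = len(F)
    lhs = sum(F[x] * F[y] * F[z] * F[(x - 3 * y + 3 * z) % n]
              for x in range(n) for y in range(n) for z in range(n)) / n ** 3
    rhs = (sum(F) / n) ** 4
    return lhs, rhs

rng = random.Random(0)
for n in (7, 9, 12):
    F = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    print(n, lemma_3_2_sides(F))
```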
To see the relevance of this lemma to Theorem 3.1, and to motivate the strategy of proof of that theorem, let us first test that theorem on some key examples. To simplify the exposition, our discussion will be somewhat non-rigorous in nature; for instance, we will make liberal use of the non-rigorous symbol ${\approx}$ without quantifying the nature of the approximation.
Example 1 (A well-distributed pure quadratic factor).
Let $G$ be the $d$ -torus $G=(\mathbb{R}/\mathbb{Z})^{d}$ for some bounded $d=O(1)$ , and let $F:G\rightarrow [-1,1]$ be a smooth function (independent of $p$ ); for instance, $F$ could be a finite linear combination of characters $\chi :G\rightarrow S^{1}$ of $G$ . Let $\alpha _{1},\ldots ,\alpha _{d}\in \mathbb{Z}/p\mathbb{Z}$ be “generic” frequencies, in the sense that there are no non-trivial linear relations of the form
$$k_{1}\alpha _{1}+\cdots +k_{d}\alpha _{d}=0\tag{3.7}$$
with $k_{1},\ldots ,k_{d}=O(1)$ not all equal to zero. We also introduce some additional frequencies $\beta _{1},\ldots ,\beta _{d}\in \mathbb{Z}/p\mathbb{Z}$ , for which we impose no genericity restrictions. Let $f:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ be the function
$$f(a):=F(Q(a)),$$
where $Q:\mathbb{Z}/p\mathbb{Z}\rightarrow G$ is the quadratic polynomial
$$Q(a):=\left(\frac{\alpha _{1}a^{2}+\beta _{1}a}{p},\ldots ,\frac{\alpha _{d}a^{2}+\beta _{d}a}{p}\right),$$
and where we use the obvious division by $p$ map $a\mapsto a/p$ from $\mathbb{Z}/p\mathbb{Z}$ to $\mathbb{R}/\mathbb{Z}$ . For any tuples $k=(k_{1},\ldots ,k_{d})\in \mathbb{Z}^{d}\equiv {\hat{G}}$ and $\xi =(\xi _{1},\ldots ,\xi _{d})\in G$ , we define the dot product
$$k\cdot \xi :=k_{1}\xi _{1}+\cdots +k_{d}\xi _{d}.$$
Because of our genericity hypothesis on the $\alpha _{i}$ , we see from Gauss sum estimates that
$$\mathbb{E}_{a\in \mathbb{Z}/p\mathbb{Z}}e(k\cdot Q(a))\approx 0$$
for any non-zero bounded tuple $k\in \mathbb{Z}^{d}$ when $p$ is large. By the Weyl equidistribution criterion, we thus see that when $p$ is large, the quantity $(\alpha a^{2}+\beta a)/p$ becomes equidistributed in $G$ as $a$ ranges over $\mathbb{Z}/p\mathbb{Z}$ . In particular, as $F$ was assumed to be smooth, we expect to have
$$\mathbb{E}f(\mathbf{a})\approx \int _{G}F\,d\mu$$
if $\mathbf{a}$ is drawn uniformly in $\mathbb{Z}/p\mathbb{Z}$ . Now suppose that $\mathbf{r}$ is also drawn uniformly in $\mathbb{Z}/p\mathbb{Z}$ , independently of $\mathbf{a}$ . The tuple
$$(Q(\mathbf{a}),Q(\mathbf{a}+\mathbf{r}),Q(\mathbf{a}+2\mathbf{r}),Q(\mathbf{a}+3\mathbf{r}))\tag{3.8}$$
will not become equidistributed in $G^{4}$ , because of the elementary algebraic identity
$$Q(\mathbf{a})-3Q(\mathbf{a}+\mathbf{r})+3Q(\mathbf{a}+2\mathbf{r})-Q(\mathbf{a}+3\mathbf{r})=0,$$
which is a discrete version of the fact that the third derivative of any quadratic polynomial vanishes. However, this turns out to be the only constraint on this tuple in the limit $p\rightarrow \infty$ . Indeed, from the genericity hypothesis on the $\alpha _{i}$ , one can verify that the quadratic form
$$(a,r)\mapsto k_{0}\cdot Q_{2}(a)+k_{1}\cdot Q_{2}(a+r)+k_{2}\cdot Q_{2}(a+2r)+k_{3}\cdot Q_{2}(a+3r)$$
on $(\mathbb{Z}/p\mathbb{Z})^{2}$ for bounded tuples $k_{0},k_{1},k_{2},k_{3}\in \mathbb{Z}^{d}$ vanishes if and only if $(k_{0},k_{1},k_{2},k_{3})$ is of the form $(k,-3k,3k,-k)$ for some tuple $k$ , where
$$Q_{2}(a):=\left(\frac{\alpha _{1}a^{2}}{p},\ldots ,\frac{\alpha _{d}a^{2}}{p}\right)$$
denotes the purely quadratic component of $Q(a)$ . Using this and a variant of the Weyl equidistribution criterion, one can eventually compute that
$$\Lambda _{\mathbf{a},\mathbf{r}}(f)\approx \int _{G}\int _{G}\int _{G}F(x)F(y)F(z)F(x-3y+3z)\,d\mu (x)\,d\mu (y)\,d\mu (z).$$
Applying Lemma 3.2, we conclude (a heuristic version of) Theorem 3.1 in this case, taking $\mathbf{a},\mathbf{r}$ to be independent uniformly distributed variables on $\mathbb{Z}/p\mathbb{Z}$ .
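Two ingredients of this example are easy to test numerically in the one-dimensional case $d=1$ : the third-difference identity $Q(a)-3Q(a+r)+3Q(a+2r)-Q(a+3r)=0$ for a quadratic $Q$ , and the Gauss sum cancellation, which for an odd prime $p$ and $\alpha \not\equiv 0$ gives $|\mathbb{E}_{a\in \mathbb{Z}/p\mathbb{Z}}e((\alpha a^{2}+\beta a)/p)|=p^{-1/2}$ . The following sketch checks both; the function names and the particular values of $p,\alpha ,\beta$ are assumptions chosen for illustration.

```python
import cmath

def e(theta):
    """The standard character e(theta) = exp(2 pi i theta)."""
    return cmath.exp(2j * cmath.pi * theta)

def third_difference(alpha, beta, a, r, p):
    """Q(a) - 3Q(a+r) + 3Q(a+2r) - Q(a+3r) mod p for Q(x) = alpha x^2 + beta x."""
    def Q(x):
        return (alpha * x * x + beta * x) % p
    return (Q(a) - 3 * Q(a + r) + 3 * Q(a + 2 * r) - Q(a + 3 * r)) % p

def quadratic_average(alpha, beta, p):
    """E_{a in Z/pZ} e((alpha a^2 + beta a)/p)."""
    return sum(e(((alpha * a * a + beta * a) % p) / p) for a in range(p)) / p

p, alpha, beta = 101, 3, 7
identity_holds = all(third_difference(alpha, beta, a, r, p) == 0
                     for a in range(p) for r in range(p))
print(identity_holds, abs(quadratic_average(alpha, beta, p)), p ** -0.5)
```

The square-root cancellation is what makes the single phase equidistribute, while the vanishing third difference is the one obstruction to joint equidistribution of the four-term tuple.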
Example 2 (A well-distributed impure quadratic factor).
Now we give a “local” version of the first example, in which the function $f$ exhibits “locally quadratic” behaviour rather than “globally quadratic” behaviour. Let $\eta >0$ be a small parameter, and suppose that $p$ is very large compared to $\eta$ . We suppose that the cyclic group $\mathbb{Z}/p\mathbb{Z}$ is somehow partitioned into a number $P_{1},\ldots ,P_{m}$ of arithmetic progressions; the number $m$ of such progressions should be thought of as being moderately large (e.g. $m\sim \exp (1/\eta ^{O(1)})$ ). Consider one such progression, for example $P_{c}=\{b_{c}+ns_{c}:1\leqslant n\leqslant N_{c}\}$ for some $b_{c},s_{c}\in \mathbb{Z}/p\mathbb{Z}$ and some $N_{c}>0$ ; one should think of $N_{c}$ as being reasonably large, e.g. $N_{c}\gg \exp (-1/\eta ^{O(1)})p$ . To each such progression $P_{c}$ , we associate a torus $G_{c}=(\mathbb{R}/\mathbb{Z})^{d_{c}}$ for some bounded $d_{c}$ with probability Haar measure $\mu _{c}$ , a smooth function $F_{c}:G_{c}\rightarrow [-1,1]$ , and a collection $\xi _{c,1},\ldots ,\xi _{c,d_{c}}\in \mathbb{R}/\mathbb{Z}$ of frequencies that are generic in the sense that there do not exist any non-trivial relations of the form
$$k_{1}\xi _{c,1}+\cdots +k_{d_{c}}\xi _{c,d_{c}}=0\tag{3.10}$$
for bounded $k_{1},\ldots ,k_{d_{c}}\in \mathbb{Z}$ . We then define the function $f:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ by setting
$$f(b_{c}+ns_{c}):=F_{c}(\xi _{c,1}n^{2},\ldots ,\xi _{c,d_{c}}n^{2})$$
for $1\leqslant c\leqslant m$ and $1\leqslant n\leqslant N_{c}$ . One could also add a lower order linear term to the phases $\xi _{c,i}n^{2}$ , as in the preceding example, if desired, but we will not do so here to simplify the exposition slightly.
Within each progression $P_{c}$ , a Weyl equidistribution analysis (using the genericity hypothesis) reveals that the tuple $(\xi _{c,1}n^{2},\ldots ,\xi _{c,d_{c}}n^{2})$ becomes equidistributed in $G_{c}$ as $p$ becomes large, so that
$$\mathbb{E}_{a\in P_{c}}f(a)\approx \int _{G_{c}}F_{c}\,d\mu _{c}.\tag{3.11}$$
Now we define the random variables $\mathbf{a},\mathbf{r}\in \mathbb{Z}/p\mathbb{Z}$ as follows. We first select a random element $\mathbf{c}$ from $\{1,\ldots ,m\}$ with $\mathbb{P}(\mathbf{c}=c)=|P_{c}|/p$ for $c=1,\ldots ,m$ . Conditioning on the event that $\mathbf{c}$ is equal to $c$ , we then select $\mathbf{a}$ uniformly at random from $P_{c}$ , and also select $\mathbf{r}$ uniformly at random from an arithmetic progression of the form
with $\mathbf{a}$ and $\mathbf{r}$ independent after conditioning on $\mathbf{c}=c$ . Note that $\mathbf{a}$ and $\mathbf{r}$ are only conditionally independent, relative to the auxiliary variable $\mathbf{c}$ ; if one does not perform this conditioning, then $\mathbf{a}$ and $\mathbf{r}$ become coupled to each other through their mutual dependence on $\mathbf{c}$ .
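Without conditioning on $\mathbf{c}$ , the random variable $\mathbf{a}$ becomes uniformly distributed on $\mathbb{Z}/p\mathbb{Z}$ , thus
$$\mathbb{E}f(\mathbf{a})=\mathbb{E}_{a\in \mathbb{Z}/p\mathbb{Z}}f(a).$$
Also, from (3.11) we have the conditional expectation
$$\mathbb{E}(f(\mathbf{a})|\mathbf{c}=c)\approx \int _{G_{c}}F_{c}\,d\mu _{c}.$$
A modification of the equidistribution analysis from the first example also gives
$$\Lambda _{\mathbf{a},\mathbf{r}}(f|\mathbf{c}=c)\approx \int _{G_{c}}\int _{G_{c}}\int _{G_{c}}F_{c}(x)F_{c}(y)F_{c}(z)F_{c}(x-3y+3z)\,d\mu _{c}(x)\,d\mu _{c}(y)\,d\mu _{c}(z),$$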
where the conditional quartic form $\unicode[STIX]{x1D6EC}_{\mathbf{a},\mathbf{r}}(f|\mathbf{c}=c)$ was defined in (3.1), and hence by Lemma 3.2 we have
Averaging in $c$ (weighted by $\mathbb{P}(\mathbf{c}=c)$ ) to remove the conditional expectation on the left-hand side, and then applying Hölder’s inequality, we obtain a heuristic version of Theorem 3.1 in this case.
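Two elementary facts used in this example can be checked with exact rational arithmetic: the variable $\mathbf{a}$ is exactly uniform on $\mathbb{Z}/p\mathbb{Z}$ once $\mathbf{c}$ is chosen with probability proportional to the cell size, and the averaging step loses nothing, since $\sum _{c}w_{c}x_{c}^{4}\geqslant (\sum _{c}w_{c}x_{c})^{4}$ for probability weights $w_{c}$ . The sketch below uses a toy partition into intervals; the partition, the conditional means and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def marginal_of_a(cells, p):
    """Exact law of a when P(c = c) = |P_c|/p and a is uniform on P_c given c."""
    law = {x: Fraction(0) for x in range(p)}
    for cell in cells:
        weight = Fraction(len(cell), p)  # P(c = c)
        for x in cell:
            law[x] += weight * Fraction(1, len(cell))
    return law

def holder_gap(weights, means):
    """sum_c w_c m_c^4 - (sum_c w_c m_c)^4, non-negative by Hoelder (or Jensen)."""
    fourth = sum(w * m ** 4 for w, m in zip(weights, means))
    mean = sum(w * m for w, m in zip(weights, means))
    return fourth - mean ** 4

p = 23
cells = [list(range(0, 5)), list(range(5, 12)), list(range(12, 23))]
law = marginal_of_a(cells, p)
weights = [Fraction(len(cell), p) for cell in cells]
means = [Fraction(1, 2), Fraction(-1, 3), Fraction(3, 4)]  # toy conditional means
gap = holder_gap(weights, means)
print(set(law.values()), gap)
```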
Example 3 (A poorly distributed pure quadratic factor).
We now return to the situation of the first example, except that we no longer impose the genericity hypothesis, that is to say we allow for a non-trivial relation of the form (3.7). Without loss of generality we can take the coefficient $k_{d}$ of this relation to be non-zero. Because of this relation, the quantity $Q(\mathbf{a})$ studied in the first example and the tuple (3.8) may not necessarily be as equidistributed as before. However, we can use this irregularity of distribution to modify the representation of $f$ (up to a small error) in such a manner as to reduce the number $d$ of quadratic phases involved. Namely, we can write
where
and where we take advantage of the field structure of $\mathbb{Z}/p\mathbb{Z}$ to locate an inverse $k_{d}^{-1}$ of $k_{d}$ in this field. For our quantitative analysis we will run into a technical difficulty with this representation, in that the Lipschitz constant of $\tilde{F}$ will increase by an undesirable amount compared to that of $F$ when one performs this change of variable, at least if one uses the standard metric on the torus. To fix this, we will eventually have to work with more general tori $\prod _{i=1}^{d}\mathbb{R}/\unicode[STIX]{x1D706}_{i}\mathbb{Z}$ than the standard torus $(\mathbb{R}/\mathbb{Z})^{d}$ , but we ignore this issue for now to continue with the heuristic discussion.
To remove the dependence on the linear phase $\gamma a/p$ , we partition $\mathbb{Z}/p\mathbb{Z}$ into “(shifted) Bohr sets” $B_{1},\ldots ,B_{m}$ for some moderately large $m$ (e.g. $m\sim \exp (1/\eta ^{C})$ for some constant $C>0$ ), defined by
for $c=1,\ldots ,m$ . On each Bohr set $B_{c}$ , we have the approximation
where $\tilde{F}_{c}(x):=\tilde{F}(x,c/m)$ . Using the heuristic that Bohr sets behave like arithmetic progressions, the situation is now similar to that in the second example, with the number of quadratic phases involved reduced from $d$ to $d-1$ , except that there may still be some non-trivial relations among the surviving quadratic phases (and one also now has some lower order linear terms in the quadratic phases). To deal with this difficulty, we turn now to the consideration of yet another example.
Example 4 (A poorly distributed impure quadratic factor).
We now consider an example that is in some sense a combination of the second and third examples. Namely, we suppose we are in the same situation as in the second example, except that we allow some of the indices $c$ to have “poor quadratic distribution” in the sense that they admit non-trivial relations of the form (3.10). Again we may assume without loss of generality that $k_{d_{c}}$ is non-zero in such relations. Because of such relations, we no longer expect to have the equidistribution properties that were used in the second example. However, by modifying the calculations in the third example, we can obtain a new representation of $f$ (again allowing for a small error) on each of the progressions $P_{c}$ with poor quadratic distribution to reduce the number $d_{c}$ of quadratic polynomials used in that progression by one. Iterating this process a finite number of times, one eventually returns to the situation in the second example in which no non-trivial relations occur, at which point one can (heuristically, at least) verify Theorem 3.1 in this case.
The situation becomes slightly more complicated if one adds a lower order linear term $\zeta _{c,i}n$ to the purely quadratic phases $\xi _{c,i}n^{2}$ appearing in the second example; this basically is the type of situation one encounters for instance at the conclusion of the third example. In this case, every time one converts a non-trivial relation of the form (3.10) on one of the cells $P_{c}$ of the partition into a new representation of $f$ on that cell, one must subdivide that cell $P_{c}$ into smaller pieces, by intersecting $P_{c}$ with various Bohr sets. However, the resulting sets still behave somewhat like arithmetic progressions, and it turns out that we can still iterate the construction a bounded number of times until no further non-trivial relations between surviving quadratic phases remain on any of the cells of the partition, at which point one can (heuristically, at least) verify Theorem 3.1 in this case (as well as in the case considered in the third example).
Example 5 (A pseudorandom perturbation of a pure quadratic factor).
In all the preceding examples, the function $f:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ under consideration was “locally quadratically structured”, in the sense that on local regions such as $P_{c}$ , the function $f$ could be accurately represented in terms of quadratic phase functions $a\mapsto Q(a)$ . This is however not the typical behaviour expected for a general function $f:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ . A more representative example would be a function of the form
$$f=f_{1}+f_{2},$$
where $f_{1}:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ is a function of the type considered in the first example, thus
$$f_{1}(a)=F(Q(a))$$
for some quadratic function $Q:\mathbb{Z}/p\mathbb{Z}\rightarrow G$ into a torus $G=(\mathbb{R}/\mathbb{Z})^{d}$ and some smooth $F:G\rightarrow [-1,1]$ , and $f_{2}:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ is a function that is globally Gowers uniform in the sense that
where $\mathbf{a},\mathbf{h}_{1},\mathbf{h}_{2},\mathbf{h}_{3}$ are drawn independently and uniformly at random from $\mathbb{Z}/p\mathbb{Z}$ . A typical example to keep in mind is when $F$ (and hence $f_{1}$ ) takes values in $[0,1]$ , and $f=\mathbf{f}$ is a random function with $f(a)$ equal to $1$ with probability $f_{1}(a)$ and $0$ with probability $1-f_{1}(a)$ , independently as $a\in \mathbb{Z}/p\mathbb{Z}$ varies; then the $f_{2}(a)$ for $a\in \mathbb{Z}/p\mathbb{Z}$ become independent random variables of mean zero, and the global Gowers uniformity can be established with high probability using tools such as the Chernoff inequality.
From the standard theory of the Gowers norms (see e.g. [Reference Tao and Vu33, Ch. 11]), one can use the global Gowers uniformity of $f_{2}$ , combined with a number of applications of the Cauchy–Schwarz inequality, to establish a “generalized von Neumann theorem” that, in our current context, implies that $f$ and $f_{1}$ globally count approximately the same number of length-four progressions in the sense that
$$\Lambda _{\mathbf{a},\mathbf{r}}(f)\approx \Lambda _{\mathbf{a},\mathbf{r}}(f_{1});\tag{3.14}$$
similarly one also has
$$\mathbb{E}f(\mathbf{a})\approx \mathbb{E}f_{1}(\mathbf{a}).\tag{3.15}$$
As a consequence, Theorem 3.1 for such functions follows (heuristically, at least) from the analysis of the first example, at least if one assumes the genericity of the frequencies $\alpha _{1},\ldots ,\alpha _{d}$ .
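The contrast between structured and pseudorandom functions in this example can be seen numerically through the eight-fold Gowers average $\mathbb{E}\prod _{\omega \in \{0,1\}^{3}}f(\mathbf{a}+\omega _{1}\mathbf{h}_{1}+\omega _{2}\mathbf{h}_{2}+\omega _{3}\mathbf{h}_{3})$ with $\mathbf{a},\mathbf{h}_{1},\mathbf{h}_{2},\mathbf{h}_{3}$ uniform and independent. The sketch below computes this average exactly for a small prime. It is an illustration under assumptions: a complex quadratic phase is used as the structured example (so complex conjugates are inserted on vertices of odd weight, which has no effect on real-valued functions), a random sign pattern stands in for the mean-zero perturbation $f_{2}$ , and the function name `u3_average` is hypothetical.

```python
import cmath
import random
from itertools import product

def u3_average(f, p):
    """E over a, h1, h2, h3 in Z/pZ of the product of f over the cube a + w.h,
    with a complex conjugate on vertices w of odd weight."""
    total = 0
    for a, h1, h2, h3 in product(range(p), repeat=4):
        term = 1
        for w in product((0, 1), repeat=3):
            value = f[(a + w[0] * h1 + w[1] * h2 + w[2] * h3) % p]
            term *= value.conjugate() if sum(w) % 2 else value
        total += term
    return total / p ** 4

p = 11
quadratic_phase = [cmath.exp(2j * cmath.pi * (3 * a * a % p) / p) for a in range(p)]
rng = random.Random(1)
random_signs = [rng.choice((-1, 1)) for _ in range(p)]

structured = u3_average(quadratic_phase, p)
pseudorandom = u3_average(random_signs, p)
print(structured, pseudorandom)
```

The quadratic phase gives the maximal value $1$ because its third multiplicative derivative is identically $1$ . For such a small modulus the degenerate configurations (for instance $\mathbf{h}_{1}=0$ ) still contribute noticeably to the random example, so the decay only becomes pronounced as $p$ grows.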
Example 6 (A pseudorandom perturbation of an impure quadratic factor).
We now consider a situation that is to the second example as the fifth example was to the first. Namely, we consider a function of the form
$$f=f_{1}+f_{2},$$
where $f_{1}:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ is a function of the type considered in the second example, thus
$$f_{1}(b_{c}+ns_{c})=F_{c}(\xi _{c,1}n^{2},\ldots ,\xi _{c,d_{c}}n^{2})$$
for $c=1,\ldots ,m$ and $n=1,\ldots ,N_{c}$ . As for the function $f_{2}:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ , global Gowers uniformity of $f_{2}$ will be too weak of a hypothesis for our purposes, because the random variable $\mathbf{r}$ appearing in the second example is now localized to a significantly smaller region than $\mathbb{Z}/p\mathbb{Z}$ . Instead, we will require the local Gowers uniformity hypothesis
where $\mathbf{a}$ is now the random variable from the second example (in particular, $\mathbf{a}$ depends on the auxiliary random variable $\mathbf{c}$ ), and once one conditions on an event $\mathbf{c}=c$ for $c=1,\ldots ,m$ , one draws $\mathbf{h}_{1},\mathbf{h}_{2},\mathbf{h}_{3}$ independently of each other and of $\mathbf{a}$ , with each $\mathbf{h}_{i}$ drawn uniformly from an arithmetic progression of the form
for some constant $C_{i}>0$ (for technical reasons, it is convenient to allow these constants $C_{1},C_{2},C_{3}$ to be different from each other, and also to be larger than the constant $C$ appearing in (3.12), so that $\mathbf{h}_{1},\mathbf{h}_{2},\mathbf{h}_{3}$ range over a narrower scale than $\mathbf{r}$ ). As with $\mathbf{a}$ and $\mathbf{r}$ , the random variables $\mathbf{a},\mathbf{h}_{1},\mathbf{h}_{2},\mathbf{h}_{3}$ are now only conditionally independent relative to the auxiliary variable $\mathbf{c}$ , but are not independent of each other without this conditioning, as they are coupled to each other through $\mathbf{c}$ .
As it turns out, once one assumes this local Gowers uniformity of $f_{2}$ , one can modify the Cauchy–Schwarz arguments used to establish the global generalized von Neumann theorem to obtain the approximations (3.14), (3.15) for the random variables $\mathbf{a},\mathbf{r}$ considered in the second example, at which point Theorem 3.1 for this choice of $f$ follows (heuristically, at least) from the analysis of that example, at least if one assumes that there are no non-trivial relations of the form (3.10).
Example 7 (Non-pseudorandom perturbation of a pure quadratic factor).
We now modify the fifth example by replacing the hypothesis (3.13) by its negation
(it is not difficult to show that the left-hand side is non-negative). In this case, the generalized von Neumann theorem used in that example does not give a good estimate. However, in this situation one can apply the inverse theorem for the Gowers norm established by us in [Reference Green and Tao14]. To obtain good quantitative bounds, we will use the version of that theorem that involves local correlation with quadratic objects (as opposed to a somewhat weak global correlation with a single “locally quadratic” object). Namely, if (3.18) holds, then one can partition $\mathbb{Z}/p\mathbb{Z}$ into a moderately large (e.g. $O(\exp (1/\eta ^{O(1)}))$ ) number of pieces $P_{1},\ldots ,P_{m}$ , such that on each piece $P_{c}$ , the function $f_{2}$ correlates with a “quadratically structured” object. The precise statement is somewhat technical, but one simple special case of this conclusion is that the pieces $P_{1},\ldots ,P_{m}$ are arithmetic progressions as in the second example, and for a “significant number” of the progressions
there exists a frequency $\unicode[STIX]{x1D709}_{c}\in \mathbb{R}/\mathbb{Z}$ such that
(In general, one would take $P_{c}$ to be Bohr sets of moderately high rank, rather than arithmetic progressions, and the phase $a\mapsto \unicode[STIX]{x1D709}_{c}a^{2}/p$ would have to be replaced by a more general locally quadratic phase function on such a Bohr set, but we ignore these technicalities for the current informal discussion.) From this and the cosine rule, it is possible to find a function $g:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ that is equal to (the real part of) a scalar multiple of the quadratic phases $b_{c}+ns_{c}\mapsto e(\unicode[STIX]{x1D709}_{c}n^{2})$ on each progression $P_{c}$ , such that $f_{2}+g$ has an energy decrement compared to $f_{2}$ in the sense that
for some constant $C>0$ . In this situation, we can modify the decomposition $f=f_{1}+f_{2}$ by adding $g$ to $f_{2}$ and subtracting it from $f_{1}$ . (Strictly speaking, this may make $f_{1}$ and $f_{2}$ range slightly outside of $[-1,1]$ , but because $f$ itself ranges in $[-1,1]$ , it turns out to be relatively easy to modify $f_{1},f_{2}$ further to rectify this problem.) The new function $f_{1}$ has a similar “quadratic structure” to the previous function $f_{1}$ , except that the quadratic structure is now localized to the cells $P_{1},\ldots ,P_{m}$ of the partition of $\mathbb{Z}/p\mathbb{Z}$ , and the number of quadratic functions has been increased by one. If the new function $f_{2}$ is now locally Gowers uniform in the sense of (3.16), then we are now essentially in the situation of the sixth example (at least if there are no non-trivial relations of the form (3.10)), and we can (heuristically at least) conclude Theorem 3.1 in this case by the previous analysis. If $f_{2}$ is locally Gowers uniform but there are additionally some relations of the form (3.10), then one can hope to adapt the analysis of the fourth example to reduce the quadratic complexity of $f_{1}$ on all the poorly distributed cells, at which point one restarts the analysis. If however $f_{2}$ remains non-uniform, then we need to argue using the analysis of the next and final example.
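The mechanism behind the energy decrement is orthogonal projection: if $f_{2}$ correlates with a real quadratic phase $\psi$ on a cell, then adding $g=-\lambda \psi$ with $\lambda =\mathbb{E}f_{2}\psi /\mathbb{E}\psi ^{2}$ lowers the energy by exactly $(\mathbb{E}f_{2}\psi )^{2}/\mathbb{E}\psi ^{2}$ . The sketch below verifies this identity on a single progression. It is a simplified illustration: the function name `energy_decrement`, the frequency and the test function are assumptions, and no attempt is made to keep the modified functions inside $[-1,1]$ or to reproduce the quantitative form of the decrement.

```python
import math

def energy_decrement(f2, xi):
    """Project f2 (indexed by n = 1..len(f2)) off the phase psi(n) = cos(2 pi xi n^2).

    Returns (energy before, energy after adding g = -lambda * psi, predicted drop).
    """
    n = len(f2)
    psi = [math.cos(2 * math.pi * xi * k * k) for k in range(1, n + 1)]
    corr = sum(x * y for x, y in zip(f2, psi)) / n   # E f2 psi
    norm = sum(y * y for y in psi) / n               # E psi^2
    lam = corr / norm
    g = [-lam * y for y in psi]
    before = sum(x * x for x in f2) / n
    after = sum((x + y) ** 2 for x, y in zip(f2, g)) / n
    return before, after, corr * corr / norm

xi = 0.137
# Toy f2: a multiple of the quadratic phase plus an alternating term.
f2 = [0.3 * math.cos(2 * math.pi * xi * k * k) + 0.1 * (-1) ** k for k in range(1, 41)]
before, after, drop = energy_decrement(f2, xi)
print(before, after, drop)
```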
Example 8 (Non-pseudorandom perturbation of an impure quadratic factor).
Our final and most difficult example bears the same relation to the sixth example as the seventh example did to the fifth. Namely, we modify the sixth example by assuming that the negation of (3.16) holds. Equivalently, one has the lower bound
on the local Gowers norm for a “significant fraction” of the $c=1,\ldots ,m$ .
At the qualitative level, the inverse theorem in [Reference Green and Tao14] for the global Gowers norm allows one to also deduce a similar conclusion starting from the hypothesis (3.20). However, the quantitative bounds obtained by this approach turn out to be too poor for the purposes of establishing Theorems 3.1 or 1.1. Instead, one must obtain a quantitative local inverse theorem for the Gowers norm that has reasonably good bounds (of polynomial type) on the amount of correlation that is (locally) attained. Establishing such a theorem is by far the most complicated and lengthy component of this paper, although broadly speaking it follows the same strategy as previous theorems of this type in [Reference Gowers11, Reference Green and Tao14]. If one takes this local inverse theorem for granted, then roughly speaking what we can then conclude from the hypothesis (3.20) is that for a significant number of $c=1,\ldots ,m$ , one can partition the cell $P_{c}$ into subcells $P_{c,1},\ldots ,P_{c,m_{c}}$ , and locate a “locally quadratic phase function” $\unicode[STIX]{x1D719}_{c,i}:P_{c,i}\rightarrow \mathbb{R}/\mathbb{Z}$ on each such subcell (generalizing the functions $b_{c}+ns_{c}\mapsto e(\unicode[STIX]{x1D709}_{c}n^{2})$ from the previous example), such that
for a significant fraction of the $c,i$ . Using this, one can again obtain an energy decrement of the form (3.19), where now $g$ is (the real part of) a scalar multiple of the functions $a\mapsto e(\unicode[STIX]{x1D719}_{c,i}(a))$ on each $P_{c,i}$ . By arguing as in the seventh example, one can then modify $f_{1}$ and $f_{2}$ in such a way that the “energy” $\mathbb{E}f_{2}(\mathbf{a})^{2}$ decreases significantly, while $f_{1}$ is now locally quadratically structured on a somewhat finer partition of $\mathbb{Z}/p\mathbb{Z}$ than the original partition $P_{1},\ldots ,P_{m}$ , with the number of quadratic phases needed to describe $f_{1}$ on each cell having increased by one. If the function $f_{2}$ is now locally Gowers uniform (with respect to a new set of random variables $\mathbf{a},\mathbf{r}$ adapted to this finer partition), we can now (heuristically) conclude Theorem 3.1 from the analysis of the sixth example, assuming the addition of the new quadratic phase has not introduced relations of the form (3.10). If such relations occur, though, one can hope to adapt the analysis of the fourth example to reduce the quadratic complexity of the poorly distributed cells, perhaps at the cost of further subdivision of the cells. Finally, if the new version of $f_{2}$ remains non-uniform with respect to the finer partition, then one iterates the analysis of this example to reduce the energy of $f_{2}$ further. This process cannot continue indefinitely due to the non-negativity of the energy (and also because none of the other steps in the iteration will cause a significant increase in energy). Because of this, one can hope to cover all cases of Theorem 3.1 by some complicated iteration of the eight arguments described above.
Having informally discussed the eight key examples for Theorem 3.1, we return now to the task of proving this theorem rigorously.
It will be convenient to work throughout the rest of the paper with a fixed choice
of absolute constants, with each $C_{i}$ assumed to be sufficiently large depending on the previous $C_{1},\ldots ,C_{i-1}$ . For instance, for sake of concreteness one could choose $C_{i}:=2^{2^{100i}}$ ; of course, other choices are possible. The implied constants in the $O(\,)$ notation will not depend on the $C_{i}$ unless otherwise specified. These constants will serve as exponents for various scales $\unicode[STIX]{x1D702}^{-C_{i}}$ that will appear in our analysis, with the point being that any scale of the form $\unicode[STIX]{x1D702}^{-C_{i}}$ for $i=2,\ldots ,5$ is extremely large with respect to any polynomial combination of the previous scales $\unicode[STIX]{x1D702}^{-C_{1}},\ldots ,\unicode[STIX]{x1D702}^{-C_{i-1}}$ .
In all of the eight examples considered above, the function $f$ was approximated by some “quadratically structured” function, usually denoted $f_{1}$ , with the approximation being accurate in various senses with respect to some pair $(\mathbf{a},\mathbf{r})$ of random variables. The rigorous argument will similarly approximate $f$ by a quadratically structured object; it will be convenient to make this object a random function $\mathbf{f}$ rather than a deterministic one (though as it turns out, this function will become deterministic again once an auxiliary random variable $\mathbf{c}$ is fixed). The precise definition of “quadratically structured” will be rather technical, and will eventually be given in Definition 6.1. For now, we shall abstract the properties of “quadratic structure” that we will need, in the following proposition involving an abstract directed graph $G=(V,E)$ (encoding the “structured local approximants”), which we will construct more explicitly later. We will shortly iterate this proposition to establish Theorem 3.1 and hence Theorem 1.1.
Proposition 3.3 (Main proposition, abstract form).
Let $\unicode[STIX]{x1D702}$ be a real number with $0<\unicode[STIX]{x1D702}\leqslant {\textstyle \frac{1}{10}}$ , and let $p$ be a prime with
Let $f:\mathbb{Z}/p\mathbb{Z}\rightarrow [0,1]$ be a function. Then there exist the following:
-
(a) a (possibly infinite) directed graph $G=(V,E)$ , with elements $v\in V$ referred to as structured local approximants, and the notation $v\rightarrow v^{\prime }$ used to denote the existence of a directed edge from one structured local approximant $v$ to another $v^{\prime }$ ;
-
(b) a triple $(\mathbf{a}_{v},\mathbf{r}_{v},\mathbf{f}_{v})$ associated to $f$ and to each structured local approximant $v\in V$ , where $\mathbf{a}_{v},\mathbf{r}_{v}$ are random variables in $\mathbb{Z}/p\mathbb{Z}$ , and $\mathbf{f}_{v}:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ is a random function (with $\mathbf{a}_{v},\mathbf{r}_{v},\mathbf{f}_{v}$ not assumed to be independent);
-
(c) a quadratic dimension $d_{2}(v)\in \mathbb{N}$ assigned to each vertex $v\in V$ ;
-
(d) a poorly distributed quadratic dimension $d_{2}^{\text{poor}}(v)\in \mathbb{N}$ assigned to each vertex $v\in V$ , with $0\leqslant d_{2}^{\text{poor}}(v)\leqslant d_{2}(v)$ ; and
-
(e) an initial approximant $v_{0}\in V$ , with $d_{2}(v_{0})=0$ (and hence $d_{2}^{\text{poor}}(v_{0})=0$ ).
Furthermore, whenever a structured local approximant $v_{k}\in V$ can be reached from $v_{0}$ by a path $v_{0}\rightarrow v_{1}\rightarrow \cdots \rightarrow v_{k}$ with $0\leqslant k\leqslant 8\unicode[STIX]{x1D702}^{-2C_{2}}$ , then the following properties are obeyed:
-
(i) one has the “thickness” condition
(3.22) $$\begin{eqnarray}\mathbb{P}(\mathbf{r}_{v_{k}}=0)\ll \exp (3\unicode[STIX]{x1D702}^{-C_{5}})/p;\end{eqnarray}$$ -
(ii) we have the almost uniformity condition
(3.23) $$\begin{eqnarray}|\mathbb{E}f(\mathbf{a}_{v_{k}})-\mathbb{E}_{a\in \mathbb{Z}/p\mathbb{Z}}f(a)|\leqslant \unicode[STIX]{x1D702};\end{eqnarray}$$ -
(iii) bad approximation implies energy decrement: if
(3.24) $$\begin{eqnarray}|\mathbb{E}\mathbf{f}_{v_{k}}(\mathbf{a}_{v_{k}})-f(\mathbf{a}_{v_{k}})|>\unicode[STIX]{x1D702}\end{eqnarray}$$ or (3.25) $$\begin{eqnarray}|\unicode[STIX]{x1D6EC}_{\mathbf{a}_{v_{k}},\mathbf{r}_{v_{k}}}(\mathbf{f}_{v_{k}})-\unicode[STIX]{x1D6EC}_{\mathbf{a}_{v_{k}},\mathbf{r}_{v_{k}}}(f)|>\unicode[STIX]{x1D702}\end{eqnarray}$$ then there exists a structured local approximant $v_{k+1}\in V$ with $v_{k}\rightarrow v_{k+1}$ such that $$\begin{eqnarray}\mathbb{E}|f(\mathbf{a}_{v_{k+1}})-\mathbf{f}_{v_{k+1}}(\mathbf{a}_{v_{k+1}})|^{2}\leqslant \mathbb{E}|f(\mathbf{a}_{v_{k}})-\mathbf{f}_{v_{k}}(\mathbf{a}_{v_{k}})|^{2}-\unicode[STIX]{x1D702}^{C_{2}}\end{eqnarray}$$ and $$\begin{eqnarray}d_{2}(v_{k+1})\leqslant d_{2}(v_{k})+1.\end{eqnarray}$$ -
(iv) failure of “Khinchin-type recurrence” implies dimension decrement: if
(3.26) $$\begin{eqnarray}\unicode[STIX]{x1D6EC}_{\mathbf{a}_{v_{k}},\mathbf{r}_{v_{k}}}(\mathbf{f}_{v_{k}})\leqslant (\mathbb{E}\mathbf{f}_{v_{k}}(\mathbf{a}_{v_{k}}))^{4}-\unicode[STIX]{x1D702},\end{eqnarray}$$then there exists a structured local approximant $v_{k+1}\in V$ with $v_{k}\rightarrow v_{k+1}$ obeying the bounds$$\begin{eqnarray}\displaystyle \mathbb{E}|f(\mathbf{a}_{v_{k+1}})-\mathbf{f}_{v_{k+1}}(\mathbf{a}_{v_{k+1}})|^{2} & {\leqslant} & \displaystyle \mathbb{E}|f(\mathbf{a}_{v_{k}})-\mathbf{f}_{v_{k}}(\mathbf{a}_{v_{k}})|^{2}+\unicode[STIX]{x1D702}^{3C_{2}},\nonumber\\ \displaystyle d_{2}(v_{k+1}) & {\leqslant} & \displaystyle d_{2}(v_{k}),\nonumber\\ \displaystyle d_{2}^{\text{poor}}(v_{k+1}) & {\leqslant} & \displaystyle d_{2}^{\text{poor}}(v_{k})-1.\nonumber\end{eqnarray}$$
The proof of this proposition will occupy the remainder of the paper. For now, let us see how this proposition implies Theorem 3.1. Let $p,\unicode[STIX]{x1D702},f$ be as in that theorem, and let $C_{1},\ldots ,C_{5}$ be as above. If the largeness criterion (3.21) fails, then we may set $\mathbf{r}:=0$ , $\mathbf{f}:=f$ , and draw $\mathbf{a}$ uniformly at random from $\mathbb{Z}/p\mathbb{Z}$ , and it is easy to see that the conclusions of Theorem 3.1 are obeyed (with (3.3) following from Hölder’s inequality). Thus we may assume without loss of generality that (3.21) holds.
Let $G=(V,E)$ , $v_{0}$ , $d_{2}(\,)$ , $d_{2}^{\text{poor}}(\,)$ , and $(\mathbf{a}_{v},\mathbf{r}_{v},\mathbf{f}_{v})$ be as in Proposition 3.3. Suppose first that there exists a structured local approximant $v_{k}\in V$ that can be reached from $v_{0}$ by a path of length at most $8\unicode[STIX]{x1D702}^{-2C_{2}}$ , and for which none of the inequalities (3.24)–(3.26) hold, that is to say one has the bounds
From (3.29), (3.28), (3.27) and the triangle inequality (and the boundedness of $\mathbf{f}_{v_{k}},f$ ) we conclude that
combining this with (3.22) and (3.23) we see that the random variables $\mathbf{a}_{v_{k}},\mathbf{r}_{v_{k}}$ obey the properties required of Theorem 3.1. Thus we may assume for sake of contradiction that this situation never occurs, which by Proposition 3.3 implies that whenever $v_{k}\in V$ is a structured local approximant that can be reached from $v_{0}$ by a path of length at most $8\unicode[STIX]{x1D702}^{-2C_{2}}$ , then the conclusions of at least one of (iii) and (iv) hold. Iterating this we may therefore construct a path
with
such that for every $0\leqslant k\leqslant k_{0}$ , one either has the energy decrement bounds
or the dimension decrement bounds
Since $v_{0}$ already has the minimum possible poorly distributed quadratic dimension $d_{2}^{\text{poor}}(v_{0})=0$ , the dimension decrement in (iv) is impossible there, so we must experience an energy decrement at the $k=0$ stage. Also, if $k$ is the $j$ th index to experience an energy decrement, we see that $d_{2}^{\text{poor}}(v_{k+1})\leqslant d_{2}(v_{k+1})\leqslant j$ , and so one can have at most $j$ consecutive dimension decrements after the $k$ th stage; in other words, we must experience another energy decrement within $j+1$ steps. By definition of $k_{0}$ , we have $\sum _{0\leqslant j\leqslant 2\unicode[STIX]{x1D702}^{-C_{2}}}(j+1)<k_{0}$ if $C_{2}$ is large enough. We conclude that at least $2\unicode[STIX]{x1D702}^{-C_{2}}$ energy decrements occur within the path $v_{0}\rightarrow \cdots \rightarrow v_{k_{0}+1}$ . This implies that
But if $C_{2}$ is sufficiently large, this implies from (3.30) that
(for example), which is absurd because the left-hand side is clearly non-negative while the right-hand side is negative. This contradiction establishes Theorem 3.1 and hence Theorem 1.1.
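The bookkeeping in the above iteration can be sanity-checked on a toy model. The following Python sketch is purely illustrative and is not part of the argument: the parameter `delta` stands in for $\unicode[STIX]{x1D702}^{C_{2}}$ , the initial energy is assumed (hypothetically) to be at most $1$ , and an adversary schedules as many dimension decrements as the constraint $d_{2}^{\text{poor}}\leqslant d_{2}$ permits. Under these assumptions the energy is driven below zero well within the path length $8\unicode[STIX]{x1D702}^{-2C_{2}}$ .

```python
from fractions import Fraction


def worst_case_run(delta, initial_energy=Fraction(1)):
    """Simulate an adversarial schedule of energy and dimension decrements.

    delta          -- toy stand-in for eta^{C_2} (an assumption of this sketch)
    initial_energy -- assumed upper bound on the starting energy (hypothetical)

    Each energy decrement lowers the energy by delta and raises d_2 by one;
    the adversary then resets d_2^poor to d_2, the largest value allowed.
    Each dimension decrement lowers d_2^poor by one and may raise the
    energy by delta**3.
    """
    limit = 8 / delta ** 2          # path length allowed, 8 * eta^{-2 C_2}
    energy = initial_energy
    d2 = 0
    d_poor = 0
    steps = 0
    energy_drops = 0
    while energy >= 0 and steps < limit:
        if d_poor > 0:
            # Dimension decrement: allowed only while d_2^poor is positive.
            d_poor -= 1
            energy += delta ** 3
        else:
            # No dimension decrement is possible, so the energy must drop.
            d2 += 1
            d_poor = d2
            energy -= delta
            energy_drops += 1
        steps += 1
    return steps, energy_drops, energy


if __name__ == "__main__":
    steps, drops, energy = worst_case_run(Fraction(1, 10))
    print(steps, drops, energy)
```

Exact rational arithmetic is used so that the sign of the final energy is not affected by rounding.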
It remains to establish Proposition 3.3. This will occupy the remaining portions of the paper.
4 Bohr sets
To define and manipulate the “structured local approximants” that appear in Proposition 3.3, we will need to develop the theory of two mathematical objects. The first is that of a Bohr set, which will be covered in this section; the second is that of a dilated torus, which we will discuss in the next section.
Definition 4.1 (Bohr set).
A subset $S$ of $\mathbb{Z}/p\mathbb{Z}$ is said to be non-degenerate if it contains at least one non-zero element. In this case we define the dual $S$ -norm
for any $a\in \mathbb{Z}/p\mathbb{Z}$ , and then define the Bohr set $B(S,\unicode[STIX]{x1D70C})\subset \mathbb{Z}/p\mathbb{Z}$ for any $\unicode[STIX]{x1D70C}>0$ by the formula
where $\Vert \unicode[STIX]{x1D703}\Vert _{\mathbb{R}/\mathbb{Z}}$ denotes the distance from $\unicode[STIX]{x1D703}$ to the nearest integer. We refer to $S$ as the set of frequencies of the Bohr set, $\unicode[STIX]{x1D70C}$ as the radius, and $|S|$ as the rank of the Bohr set. We also define the shifted Bohr sets
for any $n\in \mathbb{Z}/p\mathbb{Z}$ .
From (2.4) we have the triangle inequalities
for $a,b\in \mathbb{Z}/p\mathbb{Z}$ and $k\in \mathbb{Z}$ ; also we trivially have
if $S\subset S^{\prime }$ and $a\in \mathbb{Z}/p\mathbb{Z}$ , or equivalently that $B(S^{\prime },\unicode[STIX]{x1D70C})\subset B(S,\unicode[STIX]{x1D70C})$ for $\unicode[STIX]{x1D70C}>0$ . We will frequently use these inequalities in the sequel, usually without further comment. In Lemma 4.6 below, we will show that $\Vert \Vert _{S^{\bot }}$ is “dual” to a certain word norm $\Vert \Vert _{S}$ on $\mathbb{Z}/p\mathbb{Z}$ . One could also define Bohr sets in the case when $S$ is degenerate, but this creates some minor complications in our arguments, so we remove this case from our definition of a Bohr set.
We have the following standard size bounds for Bohr sets, whose proof may be found in [Reference Tao and Vu33, Lemma 4.20].
Lemma 4.2. If $B(S,\unicode[STIX]{x1D70C})$ is a Bohr set, then $|B(S,\unicode[STIX]{x1D70C})|\geqslant \unicode[STIX]{x1D70C}^{|S|}p$ and $|B(S,2\unicode[STIX]{x1D70C})|\leqslant 4^{|S|}|B(S,\unicode[STIX]{x1D70C})|$ .
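The size bounds of Lemma 4.2 are easy to test numerically for small $p$ . The following Python sketch assumes the standard convention that $B(S,\unicode[STIX]{x1D70C})$ consists of those $a$ with $\Vert a\unicode[STIX]{x1D709}/p\Vert _{\mathbb{R}/\mathbb{Z}}<\unicode[STIX]{x1D70C}$ for every $\unicode[STIX]{x1D709}\in S$ (the strict inequality is an assumption of the sketch); the function names are hypothetical.

```python
from fractions import Fraction


def circle_numerator(x, p):
    """Return p * ||x/p||_{R/Z}, i.e. the distance from x to the nearest multiple of p."""
    r = x % p
    return min(r, p - r)


def bohr_set(p, S, rho):
    """List the elements of B(S, rho) in Z/pZ under the assumed strict convention."""
    if all(s % p == 0 for s in S):
        raise ValueError("S must contain a non-zero element (non-degenerate)")
    return [a for a in range(p)
            if all(circle_numerator(a * s, p) < rho * p for s in S)]


def check_size_bounds(p, S, rho):
    """Check both inequalities of Lemma 4.2 for one choice of (p, S, rho)."""
    small = bohr_set(p, S, rho)
    large = bohr_set(p, S, 2 * rho)
    lower_ok = len(small) >= rho ** len(S) * p
    doubling_ok = len(large) <= 4 ** len(S) * len(small)
    return lower_ok, doubling_ok


if __name__ == "__main__":
    print(bohr_set(101, (1, 10), Fraction(1, 10)))
    print(check_size_bounds(101, (1, 10), Fraction(1, 10)))
```

Passing the radius as a `Fraction` keeps the comparison with $\unicode[STIX]{x1D70C}p$ exact.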
In previous work on Roth-type theorems, one sometimes restricts attention to regular Bohr sets, as first introduced in [Reference Bourgain6]; see [Reference Tao and Vu33, §4.4] for some discussion of this concept. Due to our use of the probabilistic method, we will be able to work with a technically simpler and “smoothed out” version of a regular Bohr set, which we call the regular probability distribution on a Bohr set.
Definition 4.3. Let $B(S,\unicode[STIX]{x1D70C})$ be a Bohr set. The regular probability distribution $\mathfrak{p}_{B(S,\unicode[STIX]{x1D70C})}:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{R}$ associated to $B(S,\unicode[STIX]{x1D70C})$ is the function defined by the formula
it is easy to see (from Fubini’s theorem) that this is indeed a probability distribution on $\mathbb{Z}/p\mathbb{Z}$ . A random variable $\mathbf{a}\in \mathbb{Z}/p\mathbb{Z}$ is said to be drawn regularly from $B(S,\unicode[STIX]{x1D70C})$ if it has probability density function $\mathfrak{p}_{B(S,\unicode[STIX]{x1D70C})}$ , thus $\mathbb{P}(\mathbf{a}=a)=\mathfrak{p}_{B(S,\unicode[STIX]{x1D70C})}(a)$ for all $a\in \mathbb{Z}/p\mathbb{Z}$ .
More generally, for any shifted Bohr set $n+B(S,\unicode[STIX]{x1D70C})$ , we define the regular probability distribution $\mathfrak{p}_{n+B(S,\unicode[STIX]{x1D70C})}:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{R}$ by the formula
and say that $\mathbf{a}$ is drawn regularly from $n+B(S,\unicode[STIX]{x1D70C})$ if it has probability distribution $\mathfrak{p}_{n+B(S,\unicode[STIX]{x1D70C})}$ .
Informally, to draw a random variable $\mathbf{a}$ regularly from $n+B(S,\unicode[STIX]{x1D70C})$ , one should draw it uniformly from $n\,+\,B(S,\mathbf{t}\unicode[STIX]{x1D70C})$ , where $\mathbf{t}$ is itself selected uniformly at random from the interval $[1/2,1]$ . Note that if $\mathbf{a}$ is drawn regularly from $n+B(S,\unicode[STIX]{x1D70C})$ , then $m+\mathbf{a}$ will be drawn regularly from $m+n+B(S,\unicode[STIX]{x1D70C})$ for any $m\in \mathbb{Z}/p\mathbb{Z}$ , and similarly $k\mathbf{a}$ will be drawn from $kn+B(k^{-1}\cdot S,\unicode[STIX]{x1D70C})$ for any non-zero $k\in \mathbb{Z}/p\mathbb{Z}$ , where $k^{-1}\cdot S:=\{k^{-1}\unicode[STIX]{x1D709}:\unicode[STIX]{x1D709}\in S\}$ is the dilate of the frequency set $S$ by $k^{-1}$ .
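The informal description above translates directly into an exact computation of the regular probability distribution for small $p$ . The Python sketch below averages the uniform distribution on $n+B(S,\mathbf{t}\unicode[STIX]{x1D70C})$ over $\mathbf{t}$ uniform in $[1/2,1]$ , using the fact that $B(S,t\unicode[STIX]{x1D70C})$ only changes at finitely many values of $t$ . It assumes the strict convention $\Vert a\Vert _{S^{\bot }}<t\unicode[STIX]{x1D70C}$ for membership in the Bohr set, with $\Vert a\Vert _{S^{\bot }}$ taken to be the largest of the $\Vert a\unicode[STIX]{x1D709}/p\Vert _{\mathbb{R}/\mathbb{Z}}$ over $\unicode[STIX]{x1D709}\in S$ ; the function names are hypothetical.

```python
from fractions import Fraction


def dual_norm_numerators(p, S):
    """Return c with c[a] = p * max over s in S of ||a s / p||_{R/Z}."""
    return [max(min(a * s % p, p - a * s % p) for s in S) for a in range(p)]


def regular_density(p, S, rho, n=0):
    """Exact density of a variable drawn regularly from n + B(S, rho).

    Implements: pick t uniformly in [1/2, 1], then pick a point uniformly
    from n + B(S, t * rho), where B(S, r) = {a : c[a] < r * p} (assumed).
    """
    c = dual_norm_numerators(p, S)
    R = Fraction(rho) * p
    levels = sorted(set(c))
    density = [Fraction(0)] * p
    for idx, level in enumerate(levels):
        # For t in (level/R, next_level/R] the set B(S, t*rho) is {a : c[a] <= level}.
        lo = Fraction(level) / R
        hi = Fraction(levels[idx + 1]) / R if idx + 1 < len(levels) else None
        left = max(lo, Fraction(1, 2))
        right = Fraction(1) if hi is None else min(hi, Fraction(1))
        if right <= left:
            continue
        members = [a for a in range(p) if c[a] <= level]
        # Factor 2 normalises the uniform measure on [1/2, 1].
        weight = 2 * (right - left) / len(members)
        for a in members:
            density[(n + a) % p] += weight
    return density


if __name__ == "__main__":
    dens = regular_density(101, (1, 10), Fraction(1, 5))
    print(sum(dens), dens[0], dens[1])
```

Because the averaging is carried out with rational arithmetic, the total mass comes out as exactly $1$ , and translating by $n$ simply permutes the values of the density.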
From Lemma 4.2 we see that if $\mathbf{a}$ is drawn regularly from a shifted Bohr set $n+B(S,\unicode[STIX]{x1D70C})$ , then
for all $a\in \mathbb{Z}/p\mathbb{Z}$ . In practice, this will mean that the influence of any given value of $\mathbf{a}$ will be negligible.
The presence of the averaging parameter $t$ in (4.2) allows for the following very convenient approximate translation-invariance property. Given two random variables $\mathbf{a},\mathbf{a}^{\prime }$ taking values in a finite set $A$ , we define the total variation distance between the two to be the quantity
or equivalently
where $f:A\rightarrow \mathbb{C}$ ranges over $1$ -bounded functions.
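The equivalence of the two descriptions of total variation distance can be seen concretely as follows. In the Python sketch below we assume the normalization without a factor of $1/2$ , which is the one for which the sum of $|\mathbb{P}(\mathbf{a}=x)-\mathbb{P}(\mathbf{a}^{\prime }=x)|$ agrees with the supremum over $1$ -bounded test functions; the supremum is attained by the sign of the difference of the two distributions. Distributions are represented as dictionaries, and all names are hypothetical.

```python
from fractions import Fraction


def total_variation(P, Q):
    """Sum over x of |P(x) - Q(x)| (normalization without the factor 1/2, assumed)."""
    support = set(P) | set(Q)
    return sum(abs(P.get(x, 0) - Q.get(x, 0)) for x in support)


def extremal_test_function(P, Q):
    """A 1-bounded function f attaining sup |E f(a) - E f(a')|: the sign of P - Q."""
    support = set(P) | set(Q)
    return {x: (1 if P.get(x, 0) >= Q.get(x, 0) else -1) for x in support}


def expectation_gap(P, Q, f):
    """Return |E f(a) - E f(a')| where a has law P and a' has law Q."""
    mean_P = sum(f[x] * P.get(x, 0) for x in f)
    mean_Q = sum(f[x] * Q.get(x, 0) for x in f)
    return abs(mean_P - mean_Q)


if __name__ == "__main__":
    P = {0: Fraction(1, 2), 1: Fraction(1, 2)}
    Q = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
    f = extremal_test_function(P, Q)
    print(total_variation(P, Q), expectation_gap(P, Q, f))
```

Any other $1$ -bounded test function gives a gap that is no larger, by the triangle inequality.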
The next lemma gives some approximate translation-invariance properties of Bohr sets. Its proof is a thinly disguised version of the arguments of Bourgain [Reference Bourgain6].
Lemma 4.4. Let $n+B(S,\unicode[STIX]{x1D70C})$ be a shifted Bohr set, and let $\mathbf{a}$ be drawn regularly from $n+B(S,\unicode[STIX]{x1D70C})$ . Let $B(S^{\prime },\unicode[STIX]{x1D70C}^{\prime })$ be another Bohr set with $S^{\prime }\supset S$ .
-
(i) If $h\in B(S^{\prime },\unicode[STIX]{x1D70C}^{\prime })$ , then $\mathbf{a}$ and $\mathbf{a}+h$ differ in total variation by at most $O(|S|\unicode[STIX]{x1D70C}^{\prime }/\unicode[STIX]{x1D70C})$ .
-
(ii) More generally, if $\mathbf{h}$ is a random variable independent of $\mathbf{a}$ that takes values in $B(S^{\prime },\unicode[STIX]{x1D70C}^{\prime })$ , then $\mathbf{a}$ and $\mathbf{a}+\mathbf{h}$ differ in total variation by at most $O(|S|\unicode[STIX]{x1D70C}^{\prime }/\unicode[STIX]{x1D70C})$ .
Proof. To prove (i), it suffices to show that
for any $1$ -bounded function $f:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ ; the claim (ii) then also follows by conditioning $\mathbf{h}$ to a fixed value $h\in B(S^{\prime },\unicode[STIX]{x1D70C}^{\prime })$ , then multiplying by $\mathbb{P}(\mathbf{h}=h)$ and summing over $h$ .
By translating $f$ by $n$ , we may assume that $n=0$ . We may assume that $\unicode[STIX]{x1D70C}^{\prime }\leqslant \unicode[STIX]{x1D70C}/10|S|$ , as the claim is trivial otherwise.
From (4.2) we have
and
so by the triangle inequality it suffices to show that
By the triangle inequality, the integrand here is bounded above by $2$ . Also, from (4.1), we see that any $a$ for which $1_{B(S,t\unicode[STIX]{x1D70C})-h}(a)\neq 1_{B(S,t\unicode[STIX]{x1D70C})}(a)$ lies in the “annulus” $B(S,t\unicode[STIX]{x1D70C}+\unicode[STIX]{x1D70C}^{\prime })\backslash B(S,t\unicode[STIX]{x1D70C}-\unicode[STIX]{x1D70C}^{\prime })$ . We conclude that the left-hand side of (4.4) is bounded by
which, using the elementary bound $\min (x-1,1)\ll \log x$ for $x\geqslant 1$ , can be bounded in turn by
The integral telescopes to
which can be bounded in turn by
The claim now follows from Lemma 4.2. ◻
We will be interested in the Fourier coefficients $\mathbb{E}e_{p}(\unicode[STIX]{x1D706}\mathbf{n})=\mathbb{E}e(\unicode[STIX]{x1D706}\mathbf{n}/p)$ of random variables $\mathbf{n}$ drawn regularly from Bohr sets $B(S,\unicode[STIX]{x1D70C})$ . As was noted by Bourgain [Reference Bourgain6], these coefficients are controlled by a “word norm” $\Vert \Vert _{S}$ , defined as follows.
Definition 4.5 (Word norm).
If $S\subset \mathbb{Z}/p\mathbb{Z}$ is non-degenerate, and $a$ is an element of $\mathbb{Z}/p\mathbb{Z}$ , we define the word norm $\Vert a\Vert _{S}$ of $a$ to be the minimum value of $\sum _{s\in S}|n_{s}|$ , where $(n_{s})_{s\in S}\in \mathbb{Z}^{S}$ ranges over tuples of integers such that one has a representation $a=\sum _{s\in S}n_{s}s$ ; note that such a representation always exists because $S$ is non-degenerate.
Similarly to (4.1), we observe the triangle inequalities
for $a,b\in \mathbb{Z}/p\mathbb{Z}$ and $k\in \mathbb{Z}$ , which we will use frequently in the sequel, often without further comment.
We now give a duality relationship between the word norm $\Vert \Vert _{S}$ and the dual $S$ -norm $\Vert \Vert _{S^{\bot }}$ .
Lemma 4.6 (Duality).
Let $S$ be a non-degenerate subset of $\mathbb{Z}/p\mathbb{Z}$ , and let $\unicode[STIX]{x1D706}\in \mathbb{Z}/p\mathbb{Z}$ :
-
(i) for every $n\in \mathbb{Z}/p\mathbb{Z}$ , one has $\Vert n\unicode[STIX]{x1D706}/p\Vert _{\mathbb{R}/\mathbb{Z}}\leqslant \Vert n\Vert _{S^{\bot }}\Vert \unicode[STIX]{x1D706}\Vert _{S}$ ;
-
(ii) conversely, if one has the estimate $\Vert n\unicode[STIX]{x1D706}/p\Vert _{\mathbb{R}/\mathbb{Z}}\leqslant A\Vert n\Vert _{S^{\bot }}$ for some $A\geqslant 1$ and all $n\in \mathbb{Z}/p\mathbb{Z}$ , then $\Vert \unicode[STIX]{x1D706}\Vert _{S}\ll |S|^{3/2}A$ .
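Before turning to the proof, we note that both the word norm of Definition 4.5 and part (i) of the lemma can be explored by brute force for small $p$ . The Python sketch below computes $\Vert a\Vert _{S}$ for all $a$ by a breadth-first search on $\mathbb{Z}/p\mathbb{Z}$ with steps $\pm s$ , $s\in S$ (the length of a shortest walk from $0$ to $a$ is exactly the minimal value of $\sum _{s\in S}|n_{s}|$ ), and then checks inequality (i) in the integer form obtained by multiplying through by $p$ . The dual norm is assumed here to be the largest of the $\Vert n\unicode[STIX]{x1D709}/p\Vert _{\mathbb{R}/\mathbb{Z}}$ over $\unicode[STIX]{x1D709}\in S$ ; the function names are hypothetical.

```python
from collections import deque


def circle_numerator(x, p):
    """Return p * ||x/p||_{R/Z}."""
    r = x % p
    return min(r, p - r)


def word_norms(p, S):
    """Return a list w with w[a] = ||a||_S, via breadth-first search from 0."""
    gens = {s % p for s in S} - {0}
    if not gens:
        raise ValueError("S must contain a non-zero element (non-degenerate)")
    dist = [None] * p
    dist[0] = 0
    queue = deque([0])
    while queue:
        a = queue.popleft()
        for s in gens:
            for b in ((a + s) % p, (a - s) % p):
                if dist[b] is None:
                    dist[b] = dist[a] + 1
                    queue.append(b)
    return dist


def check_duality(p, S):
    """Check ||n lam / p||_{R/Z} <= ||n||_{S^perp} ||lam||_S for all n, lam."""
    w = word_norms(p, S)
    for n in range(p):
        dual = max(circle_numerator(n * s, p) for s in S)  # p * ||n||_{S^perp}
        for lam in range(p):
            if circle_numerator(n * lam, p) > dual * w[lam]:
                return False
    return True


if __name__ == "__main__":
    print(word_norms(101, (1, 10))[23], check_duality(101, (1, 10)))
```

The search reaches every residue because $p$ is prime and $S$ contains a non-zero element.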
Proof. To prove (i), we simply observe (using (2.4)) that for any $n\in \mathbb{Z}/p\mathbb{Z}$ , one has
as desired, where $\unicode[STIX]{x1D706}=\sum _{\unicode[STIX]{x1D709}\in S}a_{\unicode[STIX]{x1D709}}\unicode[STIX]{x1D709}$ is a representation of $\unicode[STIX]{x1D706}$ that minimizes $\sum _{\unicode[STIX]{x1D709}\in S}|a_{\unicode[STIX]{x1D709}}|$ .
Estimates such as (ii) go back to the work of Bourgain [Reference Bourgain6]. We will prove this claim by a Fourier-analytic argument. We may assume that $\Vert \unicode[STIX]{x1D706}\Vert _{S}\geqslant |S|^{3/2}$ , as the claim is trivial otherwise. Let $\unicode[STIX]{x1D713}:\mathbb{R}\rightarrow \mathbb{R}$ be a non-negative smooth even function (not depending on $p$ or $\unicode[STIX]{x1D706}$ ) supported on $[-1,1]$ and non-zero on $[-1/2,1/2]$ , whose Fourier transform $\hat{\unicode[STIX]{x1D713}}(\unicode[STIX]{x1D709}):=\int _{\mathbb{R}}\unicode[STIX]{x1D713}(x)e(-\unicode[STIX]{x1D709}x)\,dx$ is also non-negative. Set $N:=|S|^{-1}\Vert \unicode[STIX]{x1D706}\Vert _{S}$ , so in particular $N\geqslant 1$ . We consider the kernel $K_{N}:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ defined by
by the Poisson summation formula we have
for any integer $n$ , so in particular $K_{N}$ is non-negative.
By definition of $N$ , the frequency $\unicode[STIX]{x1D706}$ has no representations of the form $\unicode[STIX]{x1D706}=\sum _{\unicode[STIX]{x1D709}\in S}a_{\unicode[STIX]{x1D709}}\unicode[STIX]{x1D709}$ with $\sup _{\unicode[STIX]{x1D709}\in S}|a_{\unicode[STIX]{x1D709}}|<N$ . Hence the Riesz-type product $\prod _{\unicode[STIX]{x1D709}\in S}K_{N}(\unicode[STIX]{x1D709}n)$ , when expanded, contains no terms of the form $e_{p}(\unicode[STIX]{x1D706}n)$ or $e_{p}(-\unicode[STIX]{x1D706}n)$ , and is therefore orthogonal to $\cos (2\unicode[STIX]{x1D70B}\unicode[STIX]{x1D706}n/p)$ . In particular we have the identity
On the other hand, from two applications of (2.3) we have
As $K_{N}$ is non-negative, we conclude that
We can expand $K_{N}(\unicode[STIX]{x1D709}_{0}n)(1-\cos (2\unicode[STIX]{x1D70B}\unicode[STIX]{x1D709}_{0}n/p))$ as a Fourier series
The expression inside parentheses is only non-vanishing for $|k|\leqslant N+1$ , and has magnitude $O(1/N^{2})$ . As $\unicode[STIX]{x1D713}$ is non-negative everywhere and non-zero on $[-1/2,1/2]$ , we thus have a pointwise estimate of the form
(for example). By using the non-negativity of the Fourier coefficients of $K_{N}$ , this gives the estimate
Comparing this with (4.6), we conclude that $1\ll A^{2}|S|/N^{2}$ , and the claim follows from the definition of $N$ .◻
Next, we estimate the Fourier coefficients of a regular distribution on a Bohr set in terms of the word norm.
Lemma 4.7. Let $S$ be a non-degenerate subset of $\mathbb{Z}/p\mathbb{Z}$ . Suppose that $\mathbf{n}$ is drawn regularly from $B(S,\unicode[STIX]{x1D70C})$ . Then we have
for all $\unicode[STIX]{x1D706}\in \mathbb{Z}/p\mathbb{Z}$ , where we adopt the convention that the above estimate is vacuously true if $\Vert \unicode[STIX]{x1D706}\Vert _{S}=0$ .
Proof. For any $h\in \mathbb{Z}/p\mathbb{Z}$ , one has from Lemma 4.4 that
which we may rearrange as
Since $|1-e_{p}(\unicode[STIX]{x1D706}h)|\gg \Vert \unicode[STIX]{x1D706}h/p\Vert _{\mathbb{R}/\mathbb{Z}}$ , we conclude that
Taking $h$ so as to minimize the ratio $\Vert h\Vert _{S^{\bot }}/\Vert \unicode[STIX]{x1D706}h/p\Vert _{\mathbb{R}/\mathbb{Z}}$ , the claim follows from Lemma 4.6.◻
We will take advantage of the fact that Bohr sets can be approximately described as generalized arithmetic progressions. A key lemma in this regard is the following.
Lemma 4.8. Let $\unicode[STIX]{x1D6E4}$ be a lattice in $\mathbb{R}^{d}$ . Then there exist linearly independent generators $v_{1},\ldots ,v_{d}$ of $\unicode[STIX]{x1D6E4}$ and real numbers $N_{1},\ldots ,N_{d}>0$ such that
for all $t>0$ , where $B_{\mathbb{R}^{d}}(0,r)$ is the open Euclidean ball of radius $r$ in $\mathbb{R}^{d}$ , and the $n_{i}$ are understood to be integers. Furthermore, the determinant/covolume $\det (\unicode[STIX]{x1D6E4})$ obeys the bounds
Proof. Applying [Reference Tao and Vu34, Theorem 1.6], we can find elements $v_{1},\ldots ,v_{r}$ of $\unicode[STIX]{x1D6E4}$ for some $r\leqslant d$ , linearly independent over the rationals, and real numbers $N_{1},\ldots ,N_{r}>0$ such that
for all $t>0$ , and such that
(Strictly speaking, the statement of [Reference Tao and Vu34, Theorem 1.6] only claims the latter bound for $t=1$ , but the same argument gives the bound for all $t>0$ .) Sending $t$ to infinity, we conclude that the $v_{1},\ldots ,v_{r}$ generate $\unicode[STIX]{x1D6E4}$ ; since, by virtue of being a lattice, $\unicode[STIX]{x1D6E4}$ is cocompact, this forces $d=r$ . Also, volume packing arguments show that as $t\rightarrow \infty$ , the cardinality $|B_{\mathbb{R}^{d}}(0,t)\,\cap \,\unicode[STIX]{x1D6E4}|$ is asymptotic to the measure of $B_{\mathbb{R}^{d}}(0,t)$ divided by $\det (\unicode[STIX]{x1D6E4})$ , while the cardinality of $|\{n_{1}v_{1}+\cdots +n_{d}v_{d}:|n_{i}|\leqslant tN_{i}\}|$ is asymptotic to $\prod _{i=1}^{d}(2tN_{i})$ . We conclude (4.8) as desired.◻
The following corollary describes how we may pick a “basis” for a Bohr set.
Corollary 4.9. Let $S$ be a non-degenerate subset of $\mathbb{Z}/p\mathbb{Z}$ , and set $d:=|S|$ . Then there exist elements $a_{1},\ldots ,a_{d}$ of $\mathbb{Z}/p\mathbb{Z}$ and real numbers $N_{1},\ldots ,N_{d}>0$ such that
and
for all $i=1,\ldots ,d$ . Furthermore, for any $a\in \mathbb{Z}/p\mathbb{Z}$ , there exists a representation
with $n_{1},\ldots ,n_{d}$ integers of size
for $i=1,\ldots ,d$ . Finally, if one imposes the additional condition $|n_{i}|<N_{i}/2$ for all $i=1,\ldots ,d$ , then there is at most one such representation of this form (4.12) for a given $a$ .
Proof. For each $s\in S$ , the fraction $s/p$ can be viewed as an element of $\mathbb{R}/\mathbb{Z}$ of order at most $p$ ; as $S$ is non-degenerate, we see that the tuple $(s/p)_{s\in S}$ is an element of the torus $(\mathbb{R}/\mathbb{Z})^{S}$ of order $p$ . Let $\unicode[STIX]{x1D6E4}$ be the preimage in $\mathbb{R}^{S}$ of the group generated by this element, thus $\unicode[STIX]{x1D6E4}$ is a lattice of $\mathbb{R}^{S}$ that contains $\mathbb{Z}^{S}$ as a sublattice of index $p$ ; in particular, $\unicode[STIX]{x1D6E4}$ has determinant $p$ . Applying Lemma 4.8, one can find generators $v_{1},\ldots ,v_{d}$ of $\unicode[STIX]{x1D6E4}$ and real numbers $N_{1},\ldots ,N_{d}$ obeying (4.10) such that
for all $t>0$ .
By construction of $\unicode[STIX]{x1D6E4}$ , we can find elements $a_{1},\ldots ,a_{d}$ of $\mathbb{Z}/p\mathbb{Z}$ such that
for $i=1,\ldots ,d$ . Applying (4.14) with $t$ slightly larger than $N_{i}^{-1}$ for some $i=1,\ldots ,d$ , we see that $v_{i}\in B_{\mathbb{R}^{d}}(N_{i}^{-1})$ , and hence by (4.15) we have (4.11).
Next, if $a\in \mathbb{Z}/p\mathbb{Z}$ , then by definition of $\unicode[STIX]{x1D6E4}$ we can find an element $x$ of $\unicode[STIX]{x1D6E4}$ in the preimage of $(as/p)_{s\in S}$ such that each component of $x$ has magnitude less than $\Vert a\Vert _{S^{\bot }}$ ; in particular, $x\in B_{\mathbb{R}^{S}}(0,\sqrt{d}\Vert a\Vert _{S^{\bot }})$ . Applying (4.14), we conclude that $x=\sum _{i=1}^{d}n_{i}v_{i}$ for some integers $n_{1},\ldots ,n_{d}$ obeying (4.13), giving the desired representation (4.12).
Finally, we show uniqueness. If there were two representations of the form (4.12) with $|n_{i}|<N_{i}/2$ for all $i=1,\ldots ,d$ , then by subtracting them there would exist a tuple $(n_{1}^{\prime },\ldots ,n_{d}^{\prime })\in \mathbb{Z}^{d}$ , not identically zero, with $|n_{i}^{\prime }|<N_{i}$ for all $i=1,\ldots ,d$ and $\sum _{i=1}^{d}n_{i}^{\prime }a_{i}=0$ , which implies that the vector $\sum _{i=1}^{d}n_{i}^{\prime }v_{i}$ lies in $\mathbb{Z}^{S}$ . As the $v_{1},\ldots ,v_{d}$ are linearly independent, this vector is non-zero and so must have magnitude at least $1$ ; but this contradicts (4.7) (with $t=1$ ).◻
Linear and quadratic functions on Bohr sets
We will frequently need to deal with locally linear or quadratic functions on Bohr sets. We review the definitions of these now.
Definition 4.10. Let $B$ be a subset of $\mathbb{Z}/p\mathbb{Z}$ , and let $G=(G,+)$ be an abelian group. A function $\unicode[STIX]{x1D719}:B\rightarrow G$ is said to be locally linear on $B$ if one has
whenever $n,h_{1},h_{2}\in \mathbb{Z}/p\mathbb{Z}$ are such that $n,n+h_{1},n+h_{2},n+h_{1}+h_{2}\in B$ . Similarly, $\unicode[STIX]{x1D719}$ is said to be locally quadratic on $B$ if one has
whenever $n,h_{1},h_{2},h_{3}\in \mathbb{Z}/p\mathbb{Z}$ are such that $n+\unicode[STIX]{x1D714}_{1}h_{1}+\unicode[STIX]{x1D714}_{2}h_{2}+\unicode[STIX]{x1D714}_{3}h_{3}\in B$ for all $(\unicode[STIX]{x1D714}_{1},\unicode[STIX]{x1D714}_{2},\unicode[STIX]{x1D714}_{3})\in \{0,1\}^{3}$ .
A function $\unicode[STIX]{x1D713}:B\times B\rightarrow G$ is said to be locally bilinear on $B$ if one has
whenever $h_{1},h_{1}^{\prime },h_{2}\in B$ are such that $h_{1}+h_{1}^{\prime }\in B$ , and similarly one has
whenever $h_{1},h_{2},h_{2}^{\prime }\in B$ are such that $h_{2}+h_{2}^{\prime }\in B$ .
Specializing (4.16) to the case $h_{1}=h_{2}=h_{3}=h$ , we conclude that
whenever $\unicode[STIX]{x1D719}:B\rightarrow G$ is locally quadratic on $B$ and $n,n+h,n+2h,n+3h\in B$ .
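As a concrete illustration of these definitions, the Python sketch below tests local quadraticity by brute force in the model case $G=\mathbb{Z}/p\mathbb{Z}$ . It assumes that the defining identity is the usual alternating sum of $\unicode[STIX]{x1D719}$ over the eight corners $n+\unicode[STIX]{x1D714}_{1}h_{1}+\unicode[STIX]{x1D714}_{2}h_{2}+\unicode[STIX]{x1D714}_{3}h_{3}$ of a cube, with sign $(-1)^{\unicode[STIX]{x1D714}_{1}+\unicode[STIX]{x1D714}_{2}+\unicode[STIX]{x1D714}_{3}}$ ; specializing to $h_{1}=h_{2}=h_{3}=h$ then gives $\unicode[STIX]{x1D719}(n)-3\unicode[STIX]{x1D719}(n+h)+3\unicode[STIX]{x1D719}(n+2h)-\unicode[STIX]{x1D719}(n+3h)=0$ . The function names are hypothetical.

```python
from itertools import product


def alternating_cube_sum(phi, n, shifts, p):
    """Alternating sum of phi over the corners n + omega . shifts, reduced mod p."""
    total = 0
    for omega in product((0, 1), repeat=len(shifts)):
        point = (n + sum(w * h for w, h in zip(omega, shifts))) % p
        sign = -1 if sum(omega) % 2 else 1
        total += sign * phi(point)
    return total % p


def is_locally_quadratic(phi, B, p):
    """Brute-force check of the (assumed) cube identity on every cube contained in B."""
    B = set(B)
    for n, h1, h2, h3 in product(range(p), repeat=4):
        corners = [(n + w1 * h1 + w2 * h2 + w3 * h3) % p
                   for w1, w2, w3 in product((0, 1), repeat=3)]
        if all(c in B for c in corners):
            if alternating_cube_sum(phi, n, (h1, h2, h3), p) != 0:
                return False
    return True


def progression_identity(phi, n, h, p):
    """Value of phi(n) - 3 phi(n+h) + 3 phi(n+2h) - phi(n+3h) mod p."""
    vals = [phi((n + j * h) % p) for j in range(4)]
    return (vals[0] - 3 * vals[1] + 3 * vals[2] - vals[3]) % p


if __name__ == "__main__":
    p = 7
    quadratic = lambda n: (3 * n * n + 2 * n + 5) % p
    cubic = lambda n: n ** 3 % p
    print(is_locally_quadratic(quadratic, range(p), p))
    print(is_locally_quadratic(cubic, range(p), p))
```

A genuine quadratic polynomial passes the test on all of $\mathbb{Z}/p\mathbb{Z}$ , while a cubic fails it, since its third difference $6h_{1}h_{2}h_{3}$ does not vanish modulo $7$ .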
It is well known (from the Weyl exponential sum estimates) that quadratic exponential sums such as $\mathbb{E}_{1\leqslant n\leqslant N}e(\unicode[STIX]{x1D6FC}n^{2}+\unicode[STIX]{x1D6FD}n)$ can only be large when the quadratic phase $\unicode[STIX]{x1D6FC}n^{2}$ is of “major arc” type in the sense that $k\unicode[STIX]{x1D6FC}n^{2}$ is close to constant on the range $\{1,\ldots ,N\}$ of the summation variable $n$ , for some bounded positive integer $k$ . The following proposition is an analogue of this phenomenon on Bohr sets.
Proposition 4.11 (Large local quadratic exponential sums).
Let $B(S,\unicode[STIX]{x1D70C})$ be a Bohr set, let $0<\unicode[STIX]{x1D6FF}\leqslant 1/2$ , let $\unicode[STIX]{x1D706},\unicode[STIX]{x1D707}:B(S,10\unicode[STIX]{x1D70C})\rightarrow \mathbb{R}/\mathbb{Z}$ be locally linear maps, and let $\unicode[STIX]{x1D719}:B(S,10\unicode[STIX]{x1D70C})\times B(S,10\unicode[STIX]{x1D70C})\rightarrow \mathbb{R}/\mathbb{Z}$ be a locally bilinear phase such that
if $\mathbf{n},\mathbf{m}$ are drawn independently and regularly from $B(S,\unicode[STIX]{x1D70C})$ . Then there exists a natural number
such that
whenever $n,m\in B(S,\unicode[STIX]{x1D6FF}^{C_{1}}\unicode[STIX]{x1D70C}/(C_{1}|S|)^{3|S|})$ .
Proof. Let $d:=|S|$ , thus $d\geqslant 1$ . By Corollary 4.9, we can find elements $a_{1},\ldots ,a_{d}$ of $\mathbb{Z}/p\mathbb{Z}$ and real numbers $N_{1},\ldots ,N_{d}$ obeying the conclusions of that corollary.
Suppose that $1\leqslant i,j\leqslant d$ are such that $N_{i},N_{j}\geqslant d/\unicode[STIX]{x1D6FF}^{C_{1}/2}\unicode[STIX]{x1D70C}$ (we allow $i$ and $j$ to be equal). Then by (4.11) we have
We can control the coefficient $\unicode[STIX]{x1D719}(a_{i},a_{j})$ by the following argument. If we draw $\mathbf{b}_{i}$ and $\mathbf{b}_{j}$ uniformly from $\{b_{i}\in \mathbb{Z}:1\leqslant b_{i}\leqslant \unicode[STIX]{x1D6FF}^{C_{1}/4}N_{i}\unicode[STIX]{x1D70C}/d\}$ and $\{b_{j}\in \mathbb{Z}:1\leqslant b_{j}\leqslant \unicode[STIX]{x1D6FF}^{C_{1}/4}N_{j}\unicode[STIX]{x1D70C}/d\}$ respectively and independently of each other and of $\mathbf{n},\mathbf{m}$ , then from two applications of Lemma 4.4 (comparing $\mathbf{n}$ with $\mathbf{n}+\mathbf{b}_{i}a_{i}$ , and $\mathbf{m}$ with $\mathbf{m}+\mathbf{b}_{j}a_{j}$ ) we have
and hence from (4.18) (assuming $C_{1}$ large enough) we have
By the pigeonhole principle, we can therefore find $n,m\in B(S,\unicode[STIX]{x1D70C})$ such that
Using the local bilinearity of $\unicode[STIX]{x1D719}$ , the left-hand side may be written as
for some $\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD},\unicode[STIX]{x1D6FE}\in \mathbb{R}/\mathbb{Z}$ depending on $i,j,n,m$ whose exact values are not of importance to us. Evaluating the expectations and using the triangle inequality, we conclude that
and hence (by Lemma 2.2)
for $\gg \unicode[STIX]{x1D6FF}^{C_{1}/4+1}N_{i}\unicode[STIX]{x1D70C}/d$ values of $b_{i}$ in the range $1\leqslant b_{i}\leqslant \unicode[STIX]{x1D6FF}^{C_{1}/4}N_{i}\unicode[STIX]{x1D70C}/d$ . This average is a geometric series that can be explicitly computed, leading to the bound
for $\gg \unicode[STIX]{x1D6FF}^{C_{1}/4+1}N_{i}\unicode[STIX]{x1D70C}/d$ values of $b_{i}$ in the range $1\leqslant b_{i}\leqslant \unicode[STIX]{x1D6FF}^{C_{1}/4}N_{i}\unicode[STIX]{x1D70C}/d$ . Applying [Reference Green and Tao17, Lemma A.4] (which is really an observation of Vinogradov, used often in the theory of Weyl sums), we conclude that
for some natural number $k_{i,j}$ with $1\leqslant k_{i,j}\ll \unicode[STIX]{x1D6FF}^{-O(C_{1})}$ . If we then “clear denominators” by defining
then $1\leqslant k\ll \unicode[STIX]{x1D6FF}^{-O(C_{1}d^{2})}$ and
for all $1\leqslant i,j\leqslant d$ with $N_{i},N_{j}\geqslant d/\unicode[STIX]{x1D6FF}^{C_{1}/2}\unicode[STIX]{x1D70C}$ .
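The "clearing denominators" step rests on the elementary inequality $\Vert m\unicode[STIX]{x1D703}\Vert _{\mathbb{R}/\mathbb{Z}}\leqslant |m|\Vert \unicode[STIX]{x1D703}\Vert _{\mathbb{R}/\mathbb{Z}}$ for integers $m$. The following sketch illustrates it with exact rational arithmetic, under the assumption (made only for illustration) that $k$ is taken to be the product of the individual multipliers $k_{i,j}$; the sample phases and the helper names are hypothetical.

```python
from fractions import Fraction
from math import prod


def dist_to_nearest_int(x):
    """Distance from a rational x to the nearest integer, i.e. the norm on R/Z."""
    frac = x - (x.numerator // x.denominator)  # fractional part in [0, 1)
    return min(frac, 1 - frac)


def clear_denominators(multipliers):
    """Hypothetical choice of k: the product of all the individual multipliers."""
    return prod(multipliers)


if __name__ == "__main__":
    # Two sample phases, each close to a rational with small denominator.
    thetas = [Fraction(1, 3) + Fraction(1, 1000), Fraction(2, 5) - Fraction(1, 2000)]
    ks = [3, 5]
    k = clear_denominators(ks)
    for theta, k_ij in zip(thetas, ks):
        # Since k_ij divides k, ||k*theta|| <= (k / k_ij) * ||k_ij * theta||.
        assert dist_to_nearest_int(k * theta) <= (k // k_ij) * dist_to_nearest_int(k_ij * theta)
    print(k)
```

The price of passing to a single multiplier is the loss of a factor $k/k_{i,j}$ in each estimate, which is why the bound on $k$ carries the exponent $d^{2}$.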
For any $n\in \mathbb{Z}/p\mathbb{Z}$ , we see from Corollary 4.9 that we can find integers $n_{1},\ldots ,n_{d}$ with
such that
In particular, if $n\in B(S,\unicode[STIX]{x1D6FF}^{C_{1}}\unicode[STIX]{x1D70C}/(C_{1}d)^{3d})$ , then $n_{i}$ is only non-zero when $N_{i}\geqslant d/\unicode[STIX]{x1D6FF}^{C_{1}/2}\unicode[STIX]{x1D70C}$ . From these bounds, (4.20), and the local bilinearity of $\unicode[STIX]{x1D719}$ , we conclude (4.19) as desired.◻
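The classical phenomenon behind Proposition 4.11 can be observed numerically on an interval. The sketch below computes the normalized quadratic exponential sum for a "major arc" phase ($\unicode[STIX]{x1D6FC}=1/4$, where $4\unicode[STIX]{x1D6FC}n^{2}$ is constant modulo $1$) and for a badly approximable phase ($\unicode[STIX]{x1D6FC}=\sqrt{2}$). The choices of $N$ and $\unicode[STIX]{x1D6FC}$ are illustrative and play no role in the argument above.

```python
import cmath
import math


def quadratic_exp_sum(alpha, beta, N):
    """Return |E_{1<=n<=N} e(alpha*n^2 + beta*n)| with e(x) = exp(2*pi*i*x)."""
    total = 0j
    for n in range(1, N + 1):
        phase = (alpha * n * n + beta * n) % 1.0  # reduce mod 1 before exponentiating
        total += cmath.exp(2j * math.pi * phase)
    return abs(total) / N


if __name__ == "__main__":
    N = 2000
    # Major arc: e(n^2/4) equals 1 for even n and i for odd n, so the mean is (1+i)/2.
    print(quadratic_exp_sum(0.25, 0.0, N))
    # Minor arc: substantial cancellation is expected.
    print(quadratic_exp_sum(math.sqrt(2), 0.0, N))
```

The first sum has magnitude $1/\sqrt{2}$, while the second is far smaller, in line with the Weyl estimates.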
Local $U^{2}$ -inverse theorem
The global inverse $U^{2}$ theorem, which is a simple and well-known exercise in discrete Fourier analysis, asserts that if a $1$ -bounded function $f:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ obeys the bound
where $\mathbf{h}_{0},\mathbf{h}_{1},\mathbf{h}_{0}^{\prime },\mathbf{h}_{1}^{\prime }$ are drawn uniformly at random from $\mathbb{Z}/p\mathbb{Z}$ , then there exists $\unicode[STIX]{x1D709}\in \mathbb{Z}/p\mathbb{Z}$ such that
where $\mathbf{h}$ is also drawn uniformly at random from $\mathbb{Z}/p\mathbb{Z}$ .
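The global statement can be checked by brute force for a small prime. The sketch below assumes that the hypothesis is the usual Gowers-type average of $f(h_{0}+h_{1})\overline{f}(h_{0}+h_{1}^{\prime })\overline{f}(h_{0}^{\prime }+h_{1})f(h_{0}^{\prime }+h_{1}^{\prime })$ and that Fourier coefficients are normalized as averages; under these assumptions the average equals the sum of the fourth powers of the Fourier coefficients, and $1$-boundedness then gives a coefficient of size at least the square root of the average. The test function and the helper names are hypothetical.

```python
import cmath
import math
from itertools import product


def e(x):
    """The standard character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def fourier(f, p):
    """Normalized Fourier coefficients: hat f(xi) = E_x f(x) e(-x*xi/p)."""
    return [sum(f[x] * e(-x * xi / p) for x in range(p)) / p for xi in range(p)]


def u2_average(f, p):
    """E f(h0+h1) conj f(h0+h1') conj f(h0'+h1) f(h0'+h1') over uniform shifts in Z/pZ."""
    total = 0j
    for h0, h0p, h1, h1p in product(range(p), repeat=4):
        total += (f[(h0 + h1) % p] * f[(h0 + h1p) % p].conjugate()
                  * f[(h0p + h1) % p].conjugate() * f[(h0p + h1p) % p])
    return (total / p ** 4).real


if __name__ == "__main__":
    p = 7
    # A 1-bounded function correlating with the frequency 3.
    f = [e(3 * x / p) if x % 3 else 0.5 for x in range(p)]
    coeffs = fourier(f, p)
    print(u2_average(f, p), sum(abs(c) ** 4 for c in coeffs), max(abs(c) for c in coeffs))
```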
In this section we give a local version of the above claim, in which the random variables $\mathbf{h},\mathbf{h}_{0},\mathbf{h}_{1},\mathbf{h}_{0}^{\prime },\mathbf{h}_{1}^{\prime }$ are localized to a small Bohr set. If the rank of the Bohr set is bounded, one can modify the above arguments to obtain a reasonable inverse theorem of this nature, but in our application the rank of the Bohr set will be rather large, and it will be important that this rank does not affect the lower bound in correlations of the form (4.22). Fortunately, such a result is available, and will be crucial in the proofs of the two remaining claims (Corollary 4.13 and Theorem 8.1) needed to prove Theorem 1.1.
Here is a precise version of the claim.
Theorem 4.12. Let $S\subset \mathbb{Z}/p\mathbb{Z}$ be non-degenerate for some prime $p$ , and let $0<\unicode[STIX]{x1D702}<1/2$ . Let $\unicode[STIX]{x1D70C}_{0},\unicode[STIX]{x1D70C}_{1}$ be real parameters with $0<\unicode[STIX]{x1D70C}_{1}<\unicode[STIX]{x1D70C}_{0}<1/2$ and such that
for a sufficiently large absolute constant $C$ . Let $f:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ be a $1$ -bounded function such that
where $\mathbf{h}_{0},\mathbf{h}_{0}^{\prime },\mathbf{h}_{1},\mathbf{h}_{1}^{\prime }$ are drawn independently and regularly from $B(S,\unicode[STIX]{x1D70C}_{0})$ , $B(S,\unicode[STIX]{x1D70C}_{0})$ , $B(S,\unicode[STIX]{x1D70C}_{1})$ , $B(S,\unicode[STIX]{x1D70C}_{1})$ respectively. Then there exists $\unicode[STIX]{x1D709}\in \mathbb{Z}/p\mathbb{Z}$ such that
where $\mathbf{n}_{0},\mathbf{n}_{1}$ are drawn independently and regularly from $B(S,\unicode[STIX]{x1D70C}_{0}),B(S,\unicode[STIX]{x1D70C}_{1})$ respectively.
Proof. We thank Fernando Shao for supplying a proof of this result, which was considerably simpler than our original argument.
For this proof, which is Fourier-analytic in nature, it will be convenient to work explicitly with probability densities rather than probabilistic notation. (However, in the lengthier proof of the local inverse $U^{3}$ theorem given in the next section, the probabilistic notation will be significantly cleaner to use.) In this argument, all sums will be over $\mathbb{Z}/p\mathbb{Z}$ . We abbreviate
for $i=0,1$ and $h\in \mathbb{Z}/p\mathbb{Z}$ ; clearly we have $\mathfrak{p}_{i}(h)\geqslant 0$ and
The hypothesis (4.24) may be written as
and our goal is to locate $\unicode[STIX]{x1D709}\in \mathbb{Z}/p\mathbb{Z}$ such that
The first step is to replace the factor $\mathfrak{p}_{0}(h_{0})$ by the slightly different factor $\mathfrak{p}_{0}^{1/2}(h_{0}+h_{1})\mathfrak{p}_{0}^{1/2}(h_{0}+h_{1}^{\prime })$ . If we use the elementary inequality $|x^{1/2}-y^{1/2}|\leqslant |x-y|^{1/2}$ for $x,y\geqslant 0$ and then apply Cauchy–Schwarz, Lemma 4.4, and (4.23), we see that
for any $h_{1}$ in the support of $\mathfrak{p}_{1}$ , where the $1$ -bounded function $b_{h_{1}}$ is given by $b_{h_{1}}(h_{0}):=\operatorname{sgn}(\mathfrak{p}_{0}(h_{0}+h_{1})-\mathfrak{p}_{0}(h_{0}))$ . Similarly we have
whenever $h_{1}^{\prime }$ is also in the support of $\mathfrak{p}_{1}$ ; by the triangle inequality, we conclude that
for all $h_{1},h_{1}^{\prime }$ in the support of $\mathfrak{p}_{1}$ . From the $1$ -boundedness of $f$ and (4.25), we conclude that
If $C$ is large enough, the left-hand side is thus bounded by $0.1\unicode[STIX]{x1D702}$ (for example), so by (4.26) and the triangle inequality we conclude that
If we write
we may rewrite the above estimate as
A similar argument then lets us replace $\mathfrak{p}_{0}(h_{0}^{\prime })$ with $\mathfrak{p}_{0}^{1/2}(h_{0}^{\prime }+h_{1})\mathfrak{p}_{0}^{1/2}(h_{0}^{\prime }+h_{1}^{\prime })$ , leaving us with
which we can simplify using (4.27) to
Making the change of variables $n:=h_{1}-h_{1}^{\prime }$ , we may rewrite the left-hand side as
where $\tilde{f}_{0}(n):=\overline{f_{0}}(-n)$ , and similarly for $\mathfrak{p}_{1}$ , and $f\ast g$ denotes the discrete convolution
Using the Fourier transform, we may then rewrite the previous bound as
where
From (4.25), the $1$ -boundedness of $f$ , and the Plancherel identity we have
By this, (4.28), and the pigeonhole principle, we may therefore find $\unicode[STIX]{x1D709}\in \mathbb{Z}/p\mathbb{Z}$ such that
By the Plancherel identity again, the left-hand side may be rewritten as
and hence (by replacing $n_{1}$ with $-n_{1}$ and using (4.27))
By arguments similar to those at the beginning of the proof, we may replace $\mathfrak{p}_{0}^{1/2}(n_{0}+n_{1})$ by $\mathfrak{p}_{0}^{1/2}(n_{0})$ and conclude that
and the claim follows. ◻
As a corollary of this inverse theorem, we can establish that locally almost linear phases on Bohr sets can be approximated by globally linear phases; this will be needed in §7 to deal with poorly distributed quadratic factors.
Here is a precise statement.
Corollary 4.13. Let $\unicode[STIX]{x1D719}:n_{0}+B(S,\unicode[STIX]{x1D70C})\rightarrow \mathbb{R}/\mathbb{Z}$ be a function on a shifted Bohr set $n_{0}+B(S,\unicode[STIX]{x1D70C})$ that is “locally almost linear” in the sense that one has the bound
for all $h,k\in B(S,\unicode[STIX]{x1D70C}/2)$ and some $A\geqslant 1$ . Then there exists $\unicode[STIX]{x1D709}\in \mathbb{Z}/p\mathbb{Z}$ such that
for all $h\in B(S,\unicode[STIX]{x1D70C})$ .
Proof. By translating in space, we may normalize so that $n_{0}=0$ ; by shifting $\unicode[STIX]{x1D719}$ by a phase, we may also suppose that $\unicode[STIX]{x1D719}(0)=0$ . By replacing $\unicode[STIX]{x1D70C}$ with the smaller quantity $\unicode[STIX]{x1D70C}/A^{1/2}$ if necessary, we may normalize $A$ to be $1$ (note that (4.30) is trivial for $\Vert h\Vert _{S^{\bot }}\,\geqslant \,\unicode[STIX]{x1D70C}/A^{1/2}$ ). Thus, we now have a function $\unicode[STIX]{x1D719}\,:\,B(S,\unicode[STIX]{x1D70C})\rightarrow \mathbb{R}/\mathbb{Z}$ with $\unicode[STIX]{x1D719}(0)=0$ such that the quantity
obeys the bound
for all $h,k\in B(S,\unicode[STIX]{x1D70C}/2)$ , and our task is to locate $\unicode[STIX]{x1D709}\in \mathbb{Z}/p\mathbb{Z}$ such that
for all $h\in B(S,\unicode[STIX]{x1D70C})$ .
Let $\unicode[STIX]{x1D70C}_{0}:=\unicode[STIX]{x1D70C}/100$ , and set $\unicode[STIX]{x1D70C}_{1}:=\unicode[STIX]{x1D70C}/C|S|^{3}$ for some sufficiently large absolute constant $C$ . If we let $f:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ be the $1$ -bounded function
and draw $\mathbf{h}_{0},\mathbf{h}_{0}^{\prime },\mathbf{h}_{1},\mathbf{h}_{1}^{\prime }$ independently and regularly from $B(S,\unicode[STIX]{x1D70C}_{0})$ , $B(S,\unicode[STIX]{x1D70C}_{0})$ , $B(S,\unicode[STIX]{x1D70C}_{1})$ , $B(S,\unicode[STIX]{x1D70C}_{1})$ respectively, then from (4.31) we have
Applying (4.32) and taking expectations, we conclude that
(for example). Applying Theorem 4.12 (which is applicable for $C$ large enough), we may thus find $\unicode[STIX]{x1D709}\in \mathbb{Z}/p\mathbb{Z}$ such that
if $\mathbf{n}_{0},\mathbf{n}_{1}$ are drawn independently and regularly from $B(S,\unicode[STIX]{x1D70C}_{0}),B(S,\unicode[STIX]{x1D70C}_{1})$ respectively. In particular, there exists $n\in B(S,\unicode[STIX]{x1D70C}_{0})$ such that
so by (4.32) we conclude that
For any $h\in B(S,\unicode[STIX]{x1D70C}_{1})$ , we have from Lemma 4.4 that
on the other hand, from (4.31) we have the identity
Combining this with (4.32), (4.35), and (2.2), we conclude that
for all $h\in B(S,\unicode[STIX]{x1D70C}_{1})$ . As the claim (4.33) is trivial for $h\in B(S,\unicode[STIX]{x1D70C})\backslash B(S,\unicode[STIX]{x1D70C}_{1})$ , the claim follows.◻
5 Dilated tori
As mentioned in Example 3 of §3, to maintain good quantitative control (and specifically, Lipschitz norm control) on the functions $F:G\rightarrow [-1,1]$ used to build quadratic approximants, one needs to generalize the underlying domain $G$ to more general tori than the standard tori $(\mathbb{R}/\mathbb{Z})^{d}$ with the usual norm structure. It turns out that it will suffice to work with dilated tori of the form
$$\begin{eqnarray}G=\prod _{i=1}^{d}(\mathbb{R}/\unicode[STIX]{x1D706}_{i}\mathbb{Z}),\end{eqnarray}$$
where $\unicode[STIX]{x1D706}_{1},\ldots ,\unicode[STIX]{x1D706}_{d}\geqslant 1$ are real numbers. One can view this dilated torus as the quotient of $\mathbb{R}^{d}$ by a dilated lattice $\unicode[STIX]{x1D6E4}:=\prod _{i=1}^{d}\unicode[STIX]{x1D706}_{i}\mathbb{Z}$ . We can place a “norm” on $G$ by declaring $\Vert x\Vert _{G}$ for $x\in G$ to be the Euclidean distance in $\mathbb{R}^{d}$ from $x$ to $\unicode[STIX]{x1D6E4}$ ; this generalizes the norm $\Vert \Vert _{\mathbb{R}/\mathbb{Z}}$ from §2. This in turn defines a metric $d_{G}$ on $G$ by the formula
$$\begin{eqnarray}d_{G}(x,y):=\Vert x-y\Vert _{G}.\end{eqnarray}$$
The volume $\operatorname{vol}(G)$ of a dilated torus is defined to be the product
$$\begin{eqnarray}\operatorname{vol}(G):=\prod _{i=1}^{d}\unicode[STIX]{x1D706}_{i}.\end{eqnarray}$$
It will be important to keep this quantity under control during the iteration process. In particular, when transforming from one dilated torus to another, the volume of the new torus should behave like a linear function of the volume of the existing torus; anything worse than this (e.g. quadratic behaviour) will lead to undesirable bounds upon iteration.
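Since the lattice is a product, the norm on a dilated torus can be computed coordinate by coordinate. The following sketch does this, assuming the metric is given by the norm of the difference and the volume by the product of the dilation parameters; the function names are hypothetical.

```python
import math


def torus_norm(x, lambdas):
    """Euclidean distance in R^d from x to the lattice prod_i lambda_i * Z."""
    total = 0.0
    for xi, lam in zip(x, lambdas):
        r = abs(math.fmod(xi, lam))      # remainder of xi modulo lam, in [0, lam)
        dist = min(r, lam - r)           # distance from xi to the nearest multiple of lam
        total += dist * dist
    return math.sqrt(total)


def torus_metric(x, y, lambdas):
    """Distance between two points of the dilated torus."""
    return torus_norm([a - b for a, b in zip(x, y)], lambdas)


def torus_volume(lambdas):
    """Product of the dilation parameters (equal to 1 for the 0-dimensional torus)."""
    return math.prod(lambdas)


if __name__ == "__main__":
    lambdas = [2.0, 3.0]
    print(torus_norm([1.9, 3.2], lambdas))
    print(torus_metric([0.1, 0.0], [1.9, 0.0], lambdas))
    print(torus_volume(lambdas))
```

For the standard torus all the dilation parameters equal $1$ and this reduces to the norm on $(\mathbb{R}/\mathbb{Z})^{d}$.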
We define the Pontryagin dual ${\hat{G}}$ of a dilated torus $G$ to be the lattice
$$\begin{eqnarray}{\hat{G}}:=\prod _{i=1}^{d}\left(\frac{1}{\unicode[STIX]{x1D706}_{i}}\mathbb{Z}\right).\end{eqnarray}$$
Elements $k$ of this dual will be called dual frequencies of the torus. If $k=(k_{1},\ldots ,k_{d})$ is a dual frequency and $x=(x_{1},\ldots ,x_{d})$ is an element of $G$ , we define the dot product $k\cdot x\in \mathbb{R}/\mathbb{Z}$ in the usual fashion as
$$\begin{eqnarray}k\cdot x:=k_{1}x_{1}+\cdots +k_{d}x_{d},\end{eqnarray}$$
noting that this gives a well-defined element of $\mathbb{R}/\mathbb{Z}$ .
A dual frequency $k$ is said to be irreducible if it is non-zero, and not of the form $k=nk^{\prime }$ for some other dual frequency $k^{\prime }$ and some natural number $n>1$ . If a dual frequency $k$ is irreducible, then its orthogonal complement
$$\begin{eqnarray}k^{\bot }:=\{x\in G:k\cdot x=0\}\end{eqnarray}$$
is a $(d-1)$ -dimensional subtorus of $G$ ; it inherits a metric $d_{k^{\bot }}$ from the torus $G$ it lies in. We will need to pass to such a complement when dealing with poorly distributed quadratic factors (as in the third or fourth examples in §3). Here we encounter the technical issue that these complements $k^{\bot }$ are not quite of the form of a dilated torus; however, we will be able to transform $k^{\bot }$ into a dilated torus using a bilipschitz transformation, as the following result shows.
Theorem 5.1. Let $G=\prod _{i=1}^{d}(\mathbb{R}/\unicode[STIX]{x1D706}_{i}\mathbb{Z})$ be a dilated torus, and let $k\in {\hat{G}}$ be an irreducible dual frequency of $G$ . Then there exists a dilated torus $G^{\prime }=\prod _{i=1}^{d-1}(\mathbb{R}/\unicode[STIX]{x1D706}_{i}^{\prime }\mathbb{Z})$ and a Lie group isomorphism $\unicode[STIX]{x1D713}:k^{\bot }\rightarrow G^{\prime }$ obeying the bilipschitz bounds
and such that one has the volume bound
where $|k|$ denotes the Euclidean magnitude of $k$ in $\mathbb{R}^{d}$ .
Proof. The case $d=0$ is vacuous and the case $d=1$ is trivial, so we may assume $d>1$ . One can identify $k^{\bot }$ with the quotient $V/\unicode[STIX]{x1D6E4}$ , where $V:=\{x\in \mathbb{R}^{d}:k\cdot x=0\}$ is the hyperplane in $\mathbb{R}^{d}$ orthogonal to $k$ (now viewed as an element of $\mathbb{R}^{d}$ ), and $\unicode[STIX]{x1D6E4}:=V\cap \prod _{i=1}^{d}(\unicode[STIX]{x1D706}_{i}\mathbb{Z})$ is the restriction of the lattice $\prod _{i=1}^{d}(\unicode[STIX]{x1D706}_{i}\mathbb{Z})$ to $V$ .
As $k$ is irreducible, there exists a vector $e$ in the lattice $\prod _{i=1}^{d}(\unicode[STIX]{x1D706}_{i}\mathbb{Z})$ with $k\cdot e=1$ ; thus $e$ has distance $1/|k|$ to $V$ . One can form a fundamental domain of $\mathbb{R}^{d}/\prod _{i=1}^{d}(\unicode[STIX]{x1D706}_{i}\mathbb{Z})$ by taking any fundamental domain for $V/\unicode[STIX]{x1D6E4}$ and performing the Minkowski sum of that domain with the interval $\{te:0\leqslant t<1\}$ . By Fubini’s theorem, the $d$ -dimensional Lebesgue measure of such a sum will equal the product of the $(d-1)$ -dimensional Lebesgue measure of the fundamental domain of $V/\unicode[STIX]{x1D6E4}$ and $1/|k|$ ; thus the covolume of $\prod _{i=1}^{d}(\unicode[STIX]{x1D706}_{i}\mathbb{Z})$ in $\mathbb{R}^{d}$ equals $1/|k|$ times the covolume of $\unicode[STIX]{x1D6E4}$ in $V$ . As the former covolume (determinant) is $\prod _{i=1}^{d}\unicode[STIX]{x1D706}_{i}=\operatorname{vol}(G)$ , we conclude that $\unicode[STIX]{x1D6E4}$ has covolume $|k|\operatorname{vol}(G)$ in $V$ .
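The two facts used in this paragraph can be verified by hand when $d=2$. The sketch below assumes that dual frequencies have coordinates $k_{i}=m_{i}/\unicode[STIX]{x1D706}_{i}$ with integers $m_{i}$, so that irreducibility amounts to $\gcd (m_{1},m_{2})=1$; it then produces a lattice vector $e$ with $k\cdot e=1$ from the extended Euclidean algorithm and compares the covolume of $\unicode[STIX]{x1D6E4}$ (here the length of its generator) with $|k|\operatorname{vol}(G)$. All helper names are hypothetical, and the integers $m_{i}$ are taken positive for simplicity.

```python
import math


def is_irreducible(m):
    """k with k_i = m_i / lambda_i is irreducible iff the integers m_i have gcd 1."""
    return math.gcd(*m) == 1


def bezout(a, b):
    """Return (g, s, t) with s*a + t*b = g = gcd(a, b) for positive integers a, b."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def covolume_check(m, lambdas):
    """Return (covolume of Gamma in V, |k| * vol(G), distance from e to V) for d = 2."""
    (m1, m2), (l1, l2) = m, lambdas
    k = (m1 / l1, m2 / l2)
    k_len = math.hypot(k[0], k[1])
    # Gamma = V intersected with l1*Z x l2*Z is generated by v = (l1*m2, -l2*m1).
    v = (l1 * m2, -l2 * m1)
    _, s, t = bezout(m1, m2)
    e = (l1 * s, l2 * t)                 # lattice vector with k . e = s*m1 + t*m2 = 1
    dist_e = abs(k[0] * e[0] + k[1] * e[1]) / k_len
    return math.hypot(v[0], v[1]), k_len * l1 * l2, dist_e


if __name__ == "__main__":
    print(is_irreducible((3, 5)), covolume_check((3, 5), (1.0, 2.0)))
```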
Applying Lemma 4.8, we can find linearly independent elements $v_{1},\ldots ,v_{d-1}$ generating $\unicode[STIX]{x1D6E4}$ such that
for all $t>0$ , where $B_{V}(0,r)$ is the Euclidean ball of radius $r$ in $V$ , and the $n_{i}$ are understood to be integers, with the bound
From (5.3) we conclude in particular that
for all $1\leqslant i\leqslant d-1$ .
We now define the $(d-1)$ -dimensional dilated torus
and the isomorphism $\unicode[STIX]{x1D713}:V/\unicode[STIX]{x1D6E4}\rightarrow G^{\prime }$ by the formula
for real numbers $t_{1},\ldots ,t_{d-1}$ . It is easy to see that this is a Lie group isomorphism, and the bound (5.2) follows from (5.4). It remains to establish the bilipschitz bounds (5.1). It suffices to show that the linear isomorphism
from $V$ to $\mathbb{R}^{d-1}$ , together with its inverse, have an operator norm of $O(d^{O(d)})$ . For the inverse map, this is clear from (5.5). For the forward map, it suffices from Cramer’s rule to show that
for all $i=1,\ldots ,d-1$ and all unit vectors $x$ in $V$ . But from (5.5) the numerator is at most $\prod _{1\leqslant i^{\prime }\leqslant d-1:i^{\prime }\neq i}N_{i^{\prime }}^{-1}$ , while the denominator is the volume of a fundamental domain in $V$ and is thus equal to $d^{O(d)}N_{1}^{-1}\ldots N_{d-1}^{-1}$ thanks to (5.4). The claim follows.◻
6 Constructing the approximants
In this section we construct the abstract directed graph $G=(V,E)$ that appears in Proposition 3.3. For the rest of the paper, the prime $p$ , the function $f:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ , and the parameter $\unicode[STIX]{x1D702}$ with $0<\unicode[STIX]{x1D702}\leqslant {\textstyle \frac{1}{10}}$ are fixed, and we assume that (3.21) holds.
We begin with a description of the structured approximants $v\in V$ .
Definition 6.1 (Structured local approximant).
A structured local approximant is a tuple
consisting of the following objects:
∙ a finite non-empty set $C$ ;
∙ a random variable $\mathbf{c}$ , which we call the label variable, taking values in $C$ ;
∙ a shifted Bohr set $n_{c}+B(S_{c},\unicode[STIX]{x1D70C}_{c})$ associated to each label $c\in C$ ;
∙ a dilated torus $G_{c}$ associated to each label $c\in C$ ;
∙ a $1$ -Lipschitz function $F_{c}:G_{c}\rightarrow [-1,1]$ associated to each label $c\in C$ ; and
∙ a locally quadratic function $\unicode[STIX]{x1D6EF}_{c}:n_{c}+B(S_{c},\unicode[STIX]{x1D70C}_{c})\rightarrow G_{c}$ associated to each label $c\in C$ .
We denote the collection of all structured local approximants (up to isomorphism; see footnote 3) as $V$ . Given any structured local approximant $v\in V$ , we define the random variables $(\mathbf{a}_{v},\mathbf{r}_{v},\mathbf{f}_{v})$ associated to $v$ by the following construction.
(1) First, let $\mathbf{c}$ be the random label variable appearing above.
(2) For each $c\in C$ in the essential range of $\mathbf{c}$ , if we condition on the event $\mathbf{c}=c$ , we draw $\mathbf{a}_{v},\mathbf{r}_{v}$ independently and regularly from $n_{c}+B(S_{c},\unicode[STIX]{x1D70C}_{c}/2)$ and $B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-C_{4}})\unicode[STIX]{x1D70C}_{c})$ respectively, and then we let $\mathbf{f}_{v}$ be the function
$$\begin{eqnarray}\mathbf{f}_{v}(a):=F_{c}(\unicode[STIX]{x1D6EF}_{c}(a)).\end{eqnarray}$$
Thus $\mathbf{f}_{v}$ is deterministic when $\mathbf{c}$ is conditioned to be fixed, but random when $\mathbf{c}$ is allowed to vary.
We also define the following additional statistics of the structured local approximant $v$ :
∙ the waste $\operatorname{waste}(v)$ is the quantity $|\mathbb{E}f(\mathbf{a})-\mathbb{E}_{a\in \mathbb{Z}/p\mathbb{Z}}f(a)|$ ;
∙ the $1$ -error $\operatorname{Err}_{1}(v)$ is $|\mathbb{E}\mathbf{f}(\mathbf{a})-\mathbb{E}f(\mathbf{a})|$ ;
∙ the $4$ -error $\operatorname{Err}_{4}(v)$ is $|\unicode[STIX]{x1D6EC}_{\mathbf{a},\mathbf{r}}(\mathbf{f})-\unicode[STIX]{x1D6EC}_{\mathbf{a},\mathbf{r}}(f)|$ ;
∙ the energy $\operatorname{Energy}(v)$ is $\mathbb{E}|f(\mathbf{a})-\mathbf{f}(\mathbf{a})|^{2}$ ;
∙ the linear rank $d_{1}(v)$ is $\max _{c\in C}|S_{c}|$ ;
∙ the quadratic dimension $d_{2}(v)$ is $\max _{c\in C}\dim (G_{c})$ ;
∙ the linear scale $\unicode[STIX]{x1D70C}(v)$ is $\min _{c\in C}\unicode[STIX]{x1D70C}_{c}$ ;
∙ the quadratic volume $\operatorname{vol}(v)$ is the quantity $\max _{c\in C}\operatorname{vol}(G_{c})$ ;
∙ the poorly distributed quadratic dimension $d_{2}^{\text{poor}}(v)$ is the maximum value of $\dim (G_{c})$ over all poorly distributed $c$ in the essential range of $\mathbf{c}$ , or zero if no such $c$ exists. Here, an element $c$ in the essential range of $\mathbf{c}$ is said to be poorly distributed if one has
(6.1) $$\begin{eqnarray}\unicode[STIX]{x1D6EC}_{\mathbf{a},\mathbf{r}}(f|\mathbf{c}=c)<\mathbb{E}(\mathbf{f}(\mathbf{a})|\mathbf{c}=c)^{4}-\frac{\unicode[STIX]{x1D702}}{2}.\end{eqnarray}$$
This gives the set $V$ of structured local approximants for Proposition 3.3; we clearly have $0\leqslant d_{2}^{\text{poor}}(v)\leqslant d_{2}(v)$ for all $v\in V$ .
We now also define the initial approximant.
Definition 6.2. The initial approximant $v_{0}\in V$ is defined to be the tuple
defined as follows:
∙ $C:=\mathbb{Z}/p\mathbb{Z}$ , and $\mathbf{c}$ is drawn uniformly from $C$ ;
∙ for each $c\in C$ , we have $n_{c}:=0$ , $S_{c}:=\{1\}$ , and $\unicode[STIX]{x1D70C}_{c}:=1$ ;
∙ for each $c\in C$ , the group $G_{c}$ is the standard $0$ -torus $(\mathbb{R}/\mathbb{Z})^{0}$ (that is to say, a point);
∙ for each $c\in C$ , the function $F_{c}:G_{c}\rightarrow [-1,1]$ is the zero function $F_{c}(x):=0$ ;
∙ for each $c\in C$ , the function $\unicode[STIX]{x1D6EF}_{c}:\mathbb{Z}/p\mathbb{Z}\rightarrow G_{c}$ is the unique (constant) map from $\mathbb{Z}/p\mathbb{Z}$ to the point $G_{c}$ .
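The data in Definitions 6.1 and 6.2 can be mirrored in a small data structure, which makes it easy to read off the deterministic statistics of an approximant. The following is only a sketch: the class and field names are hypothetical, a dilated torus is represented by its tuple of dilation parameters, and the Lipschitz and local quadraticity conditions are recorded in comments rather than enforced.

```python
from dataclasses import dataclass
from math import prod
from typing import Callable, Dict, FrozenSet, Tuple


@dataclass
class LabelData:
    """Data attached to one label c (field names are illustrative)."""
    shift: int                                  # n_c
    frequencies: FrozenSet[int]                 # S_c
    radius: float                               # rho_c
    torus: Tuple[float, ...]                    # dilation parameters of G_c
    F: Callable[[Tuple[float, ...]], float]     # 1-Lipschitz F_c : G_c -> [-1, 1]
    Xi: Callable[[int], Tuple[float, ...]]      # locally quadratic Xi_c


@dataclass
class Approximant:
    labels: Dict[int, LabelData]    # the finite set C together with its data
    weights: Dict[int, float]       # law of the label variable

    def f(self, c, a):
        """The function f_v conditioned on the label c: a -> F_c(Xi_c(a))."""
        d = self.labels[c]
        return d.F(d.Xi(a))

    def linear_rank(self):
        return max(len(d.frequencies) for d in self.labels.values())

    def quadratic_dimension(self):
        return max(len(d.torus) for d in self.labels.values())

    def linear_scale(self):
        return min(d.radius for d in self.labels.values())

    def quadratic_volume(self):
        return max(prod(d.torus) for d in self.labels.values())


def initial_approximant(p):
    """A sketch of the initial approximant for a prime p, following Definition 6.2."""
    data = LabelData(0, frozenset({1}), 1.0, (), lambda x: 0.0, lambda a: ())
    return Approximant({c: data for c in range(p)}, {c: 1.0 / p for c in range(p)})


if __name__ == "__main__":
    v0 = initial_approximant(11)
    print(v0.linear_rank(), v0.quadratic_dimension(), v0.linear_scale(), v0.quadratic_volume())
```

The statistics that involve $f$ (waste, errors and energy) are omitted, since they depend on the fixed function $f$ and on regular probability distributions that are not modelled here.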
By chasing the definitions, we see that $\mathbf{a}_{v_{0}}$ is uniformly distributed in $\mathbb{Z}/p\mathbb{Z}$ , and we can compute several of the statistics of the initial approximant $v_{0}$ :
Now we define the edges of the graph $G=(V,E)$ .
Definition 6.3. We let $E$ be the set of all directed edges $v\rightarrow v^{\prime }$ , where $v,v^{\prime }\in V$ are structured local approximants such that
From this definition and (6.2) we have the following bounds on the various statistics of vertices of $V$ that are not too far from the initial vertex $v_{0}$ , assuming that each constant $C_{i}$ is chosen sufficiently large depending on the preceding constants $C_{1},\ldots ,C_{i-1}$ .
Lemma 6.4. Suppose a vertex $v=v_{k}\in V$ can be reached from $v_{0}$ by a path $v_{0}\rightarrow v_{1}\rightarrow \cdots \rightarrow v_{k}$ with $0\leqslant k\leqslant 8\unicode[STIX]{x1D702}^{-2C_{2}}$ . Then we have
From (6.7) we see in particular that the almost uniformity axiom in Proposition 3.3(ii) is obeyed. The thickness axiom in Proposition 3.3(i) is also easy, as the following corollary shows.
Corollary 6.5. Suppose a structured local approximant $v=v_{k}\in V$ can be reached from $v_{0}$ by a path $v_{0}\rightarrow v_{1}\rightarrow \cdots \rightarrow v_{k}$ of length $k$ at most $8\unicode[STIX]{x1D702}^{-2C_{2}}$ . Then we have $\mathbb{P}(\mathbf{r}_{v}=0)\ll \exp (\unicode[STIX]{x1D702}^{-C_{5}^{2}})/p$ .
Proof. Write
It suffices to show that
for each $c$ in the essential range of $\mathbf{c}$ . But once $\mathbf{c}$ is fixed to equal $c$ , the variable $\mathbf{r}_{v}$ is drawn regularly from $B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-C_{4}})\unicode[STIX]{x1D70C}_{c})$ . By Lemma 6.4, $S_{c}$ has cardinality at most $8\unicode[STIX]{x1D702}^{-3C_{2}}$ and $\unicode[STIX]{x1D70C}_{c}$ is at least $\exp (-\unicode[STIX]{x1D702}^{-2C_{5}})$ . The claim now follows from Lemma 4.2.◻
It remains to verify the last two axioms (iii), (iv) of Proposition 3.3. We isolate these statements formally, using Lemma 6.4 and Definition 6.3.
The first of these results, Theorem 6.6, states that “a bad approximation implies an energy decrement”. The second, Theorem 6.7, states that “a bad lower bound implies a dimension decrement”.
Theorem 6.6. Let the notation and hypotheses be as above. Suppose that $v\in V$ is a structured local approximant obeying (6.3)–(6.6). If we have
or
then there exists a structured local approximant $v^{\prime }$ obeying the bounds
Theorem 6.7. Let the notation and hypotheses be as above. Suppose that $v\in V$ is a structured local approximant obeying (6.3)–(6.6), and let $\mathbf{a}_{v},\mathbf{r}_{v},\mathbf{f}_{v}$ be the random variables associated to $v$ . If we have
then there exists a structured local approximant $v^{\prime }\in V$ with
It remains to prove Theorems 6.6 and 6.7. Theorem 6.6 will be proven in §8 using a difficult local inverse Gowers theorem, Theorem 8.1, that will be proven in later sections. Theorem 6.7, on the other hand, will not rely on the local inverse Gowers theorem; it is proven in §7.
7 Bad lower bound implies dimension decrement
In this section we prove Theorem 6.7. Let the notation and hypotheses be as in Theorem 6.7. We abbreviate $\mathbf{a}_{v},\mathbf{r}_{v},\mathbf{f}_{v}$ as $\mathbf{a},\mathbf{r},\mathbf{f}$ respectively. We can write the left-hand side of (6.16) as $\mathbb{E}A(\mathbf{c})$ , where for any $c\in C$ , the quantity $A(c)$ is defined as the conditional expectation
Similarly, we can write $\mathbb{E}\mathbf{f}(\mathbf{a})=\mathbb{E}B(\mathbf{c})$ , where $B(c):=\mathbb{E}(\mathbf{f}(\mathbf{a})|\mathbf{c}=c)$ . By (6.16) and Hölder’s inequality, we thus have
Applying Lemma 2.2, we must therefore have
By (6.1), we conclude that $\mathbf{c}$ is poorly distributed with probability $\gg \unicode[STIX]{x1D702}$ . In particular, there is at least one poorly distributed value of $c$ .
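The passage from an averaged inequality to a statement holding with probability $\gg \unicode[STIX]{x1D702}$ is an instance of an elementary popularity principle: if $X\leqslant 1$ and $\mathbb{E}X\geqslant \unicode[STIX]{x1D6FF}$ , then $X\geqslant \unicode[STIX]{x1D6FF}/2$ with probability at least $\unicode[STIX]{x1D6FF}/2$ . We assume here that this is the content of Lemma 2.2, which is not reproduced in this section. The sketch below checks the principle exactly on a small sample; the sample values and the function names are hypothetical.

```python
from fractions import Fraction


def mean(values):
    """Exact mean of a list of rationals."""
    return sum(values, Fraction(0)) / len(values)


def popular_mass(values, delta):
    """P(X >= delta/2) when X is uniform on the list `values`."""
    return Fraction(sum(1 for x in values if x >= delta / 2), len(values))


if __name__ == "__main__":
    xs = [Fraction(1), Fraction(0), Fraction(1, 10), Fraction(3, 10)]
    delta = mean(xs)
    # Popularity principle: the mass of the event {X >= delta/2} is at least delta/2.
    assert popular_mass(xs, delta) >= delta / 2
    print(delta, popular_mass(xs, delta))
```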
Most of this section will be devoted to the proof of the following proposition, which roughly speaking asserts that when $\mathbf{c}$ is poorly distributed, there is a linear constraint between the quadratic frequencies that will ultimately allow us to decrease the poorly distributed quadratic dimension $d_{2}^{\text{poor}}$ .
Proposition 7.1. Let $c$ be a poorly distributed element of the essential range of $\mathbf{c}$ . Then there exists a natural number $m_{c}$ , a frequency $\unicode[STIX]{x1D709}_{c}\in \mathbb{Z}/p\mathbb{Z}$ and an irreducible dual frequency $k_{c}^{\prime }\in {\hat{G}}_{c}$ with
and
such that
for all $a\in B(S_{c},\unicode[STIX]{x1D70C}_{c}/2)$ and $h\in B(S_{c}\cup \{\unicode[STIX]{x1D709}_{c}\},\exp (-\unicode[STIX]{x1D702}^{-5C_{4}})\unicode[STIX]{x1D70C})$ .
A key technical point here is that the upper bound on $|k_{c}^{\prime }|$ involves only $C_{2}$ and not $C_{3}$ or $C_{4}$ ; this is necessary to keep the bounds under control during the iteration process. However, we will be able to tolerate the presence of the $C_{3}$ and $C_{4}$ constants in the other components of Proposition 7.1.
Proof. We condition on the event $\mathbf{c}=c$ . By Definition 6.1, the random variables $\mathbf{a},\mathbf{r}$ are now independent and regularly drawn from $n_{c}+B(S_{c},\unicode[STIX]{x1D70C}_{c}/2)$ and $B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-C_{4}})\unicode[STIX]{x1D70C}_{c})$ respectively, while $\mathbf{f}(a)=F_{c}(\unicode[STIX]{x1D6EF}_{c}(a))$ . We conclude that
Since $\unicode[STIX]{x1D6EF}_{c}:\mathbb{Z}/p\mathbb{Z}\rightarrow G_{c}$ is locally quadratic on $n_{c}+B(S_{c},\unicode[STIX]{x1D70C}_{c})$ , which contains the progression $\mathbf{a},\mathbf{a}+\mathbf{r},\mathbf{a}+2\mathbf{r},\mathbf{a}+3\mathbf{r}$ , we see from (4.17) that
and so the left-hand side can be written as
where $F_{c}^{(3)}:G_{c}^{3}\rightarrow [-1,1]$ is the function
Applying Lemma 3.2, we have
where $\unicode[STIX]{x1D707}_{c}$ is the probability Haar measure on $G_{c}$ . By the triangle inequality, we conclude that at least one of the assertions
or
holds. Defining $\tilde{F}:G_{c}^{3}\rightarrow [-1,1]$ by
in the former case and
in the latter case, we see that $\tilde{F}$ is $1$ -Lipschitz and of mean zero, and
where $\mathbf{x}_{c}\in G_{c}^{3}$ is the random variable
The Weyl equidistribution criterion, applied in the contrapositive, then suggests that there should be a non-zero dual frequency $k=(k_{1},k_{2},k_{3})\in {\hat{G}}_{c}^{3}$ to $G_{c}^{3}$ such that $\mathbb{E}(e(k\cdot \mathbf{x}_{c})|\mathbf{c}=c)$ is large. The next lemma makes this intuition precise.
Lemma 7.2 (Weyl equidistribution).
With the notation and hypotheses as above, there exists a non-zero dual frequency $k=(k_{1},k_{2},k_{3})\in {\hat{G}}_{c}^{3}$ to $G_{c}^{3}$ with $|k|\ll \exp (O(\unicode[STIX]{x1D702}^{-3C_{2}}))$ such that
A key point here is that the bound on $|k|$ does not depend on the volume of the dilated torus $G_{c}$ , which will typically be much larger than $\unicode[STIX]{x1D702}^{-2C_{2}-10}$ .
Proof. Write $G_{c}=\prod _{i=1}^{d}(\mathbb{R}/\unicode[STIX]{x1D706}_{i}\mathbb{Z})$ , thus $\unicode[STIX]{x1D706}_{1},\ldots ,\unicode[STIX]{x1D706}_{d}\geqslant 1$ , and by (6.4) one has
The bound (7.4) is not possible when $d=0$ , so we may assume $d\geqslant 1$ . We can write $G_{c}^{3}=\prod _{i=1}^{3d}(\mathbb{R}/\unicode[STIX]{x1D706}_{i}\mathbb{Z})$ , where we extend $\unicode[STIX]{x1D706}_{i}$ periodically with period $d$ .
Let $\unicode[STIX]{x1D711}:\mathbb{R}\rightarrow \mathbb{R}$ be a fixed smooth even function supported on $[-1,1]$ that equals $1$ at the origin and whose Fourier transform $\hat{\unicode[STIX]{x1D711}}(\unicode[STIX]{x1D709}):=\int _{\mathbb{R}}\unicode[STIX]{x1D711}(x)e(-x\unicode[STIX]{x1D709})\,dx$ is non-negative; such a function may be easily constructed by convolving an $L^{2}$ -normalized smooth function supported on $[0,1]$ with its reflection. Let $A\geqslant 1$ be a parameter to be chosen later, and introduce the kernel $K:G_{c}^{3}\rightarrow \mathbb{R}^{+}$ by the formula
for $t_{i}\in \mathbb{R}/\unicode[STIX]{x1D706}_{i}\mathbb{Z}$ , where
By Poisson summation, the $K_{i}$ and hence $K$ are non-negative. A Fourier-analytic calculation using the smoothness of $\unicode[STIX]{x1D711}$ gives
and
(where the implied constant is allowed to depend on $\unicode[STIX]{x1D711}$ ) and hence by (2.2) and Cauchy–Schwarz we have
which on taking tensor products gives
and
where $\unicode[STIX]{x1D707}_{c}^{3}$ is the Haar probability measure on $G_{c}^{3}$ . If we then take the convolution
then by the $1$ -Lipschitz nature of $\tilde{F}$ we see that
Thus, if we choose
for a sufficiently large absolute constant $C$ , we conclude from (7.4) that
However, by Fourier expansion and the fact that $\tilde{F}$ has mean zero,
where $k=(k_{1},\ldots ,k_{3d})$ with $k_{i}\in (1/\unicode[STIX]{x1D706}_{i})\mathbb{Z}$ for $i=1,\ldots ,3d$ , and
Using the triangle inequality and crudely bounding $|\widehat{\tilde{F}}(k)|$ by $1$ , we conclude that
The summand is only non-vanishing when $\sup _{i}|k_{i}|\leqslant A$ , so that
(thanks to (7.5) and the choice of $A$ ), and the number of such $k$ is
Since $\unicode[STIX]{x1D711}$ is bounded, the claim now follows from the pigeonhole principle.◻
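The counting step at the end of this proof is elementary: a coordinate $k_{i}\in (1/\unicode[STIX]{x1D706}_{i})\mathbb{Z}$ with $|k_{i}|\leqslant A$ corresponds to an integer $m_{i}=k_{i}\unicode[STIX]{x1D706}_{i}$ with $|m_{i}|\leqslant A\unicode[STIX]{x1D706}_{i}$ , so the number of admissible $k$ is a product of factors $2\lfloor A\unicode[STIX]{x1D706}_{i}\rfloor +1$ . The sketch below compares this exact count with a brute-force enumeration; it is not the bound displayed in the text, and the sample parameters and function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import floor


def count_by_formula(lambdas, A):
    """Number of k with k_i in (1/lambda_i) Z and |k_i| <= A for every i."""
    total = 1
    for lam in lambdas:
        total *= 2 * floor(A * lam) + 1
    return total


def count_by_enumeration(lambdas, A):
    """Brute-force count of the same set, enumerating the integers m_i = k_i * lambda_i."""
    ranges = []
    for lam in lambdas:
        bound = floor(A * lam) + 1          # generous search window for m_i
        ranges.append([m for m in range(-bound, bound + 1) if abs(Fraction(m) / lam) <= A])
    return sum(1 for _ in product(*ranges))


if __name__ == "__main__":
    lambdas = [Fraction(1), Fraction(5, 2)]
    print(count_by_formula(lambdas, 2), count_by_enumeration(lambdas, 2))
```

The count grows with the volume of the torus, which is why the bound on the number of frequencies, unlike the bound on $|k|$ , must involve the volume.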
We return to the proof of Proposition 7.1. Applying Lemma 7.2 and (6.5), we see that there exists a non-zero triplet $(k_{c}^{0},k_{c}^{1},k_{c}^{2})\in {\hat{G}}_{c}^{3}$ with
and
Among other things, the non-zero nature of this triplet forces $G_{c}$ to be non-trivial, and thus
We also emphasize that the bound (7.6) involves $C_{2}$ rather than $C_{3}$ ; this will be crucial when establishing the upper bound (7.2) later in this proof.
We can use the exponential sum bound (7.7) to control the “second derivative” of $\unicode[STIX]{x1D6EF}_{c}$ . Indeed, for any $h_{1},h_{2}\in B(S_{c},\unicode[STIX]{x1D70C}_{c}/10)$ , define the quantity $\unicode[STIX]{x2202}^{2}\unicode[STIX]{x1D6EF}_{c}(h_{1},h_{2})\in \mathbb{R}/\mathbb{Z}$ by
for any $a\in n_{c}+B(S_{c},\unicode[STIX]{x1D70C}/2)$ . Since $\unicode[STIX]{x1D6EF}_{c}$ is locally quadratic on $n_{c}+B(S_{c},\unicode[STIX]{x1D70C})$ , this quantity is well-defined, symmetric in $h_{1},h_{2}$ , and locally bilinear in $h_{1}$ and $h_{2}$ .
Lemma 7.3. Let the notation and hypotheses be as above. Then for any $i=0,1,2$ , we have
where, conditioning on the event $\mathbf{c}=c$ , the random variables $\mathbf{r},\mathbf{r}^{\prime },\mathbf{h},\mathbf{h}^{\prime }$ are drawn independently and regularly from the Bohr sets $B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-C_{4}})\unicode[STIX]{x1D70C})$ , $B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-C_{4}})\unicode[STIX]{x1D70C})$ , $B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-2C_{4}})\unicode[STIX]{x1D70C})$ , $B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-2C_{4}})\unicode[STIX]{x1D70C})$ respectively, independently of $\mathbf{a}$ .
Proof. To simplify the notation we only consider the $i=2$ case, as the $i=0,1$ cases are similar. This will be a “Weyl differencing” argument that relies primarily on the Cauchy–Schwarz inequality.
Recall that after conditioning to the event $\mathbf{c}=c$ , the random variable $\mathbf{a}$ is drawn regularly from $B(S_{c},\unicode[STIX]{x1D70C}/2)$ . Using Lemma 4.4, we see that $\mathbf{a}$ and $\mathbf{a}-\mathbf{h}$ differ in total variation by $O(\exp (-\unicode[STIX]{x1D702}^{-C_{4}/2}))$ , hence from (7.7) we have
Similarly we may use Lemma 4.4 to compare $\mathbf{r}$ and $\mathbf{r}+\mathbf{h}$ , and conclude that
By the pigeonhole principle (and independence of $\mathbf{a},\mathbf{h},\mathbf{r}$ relative to the event $\mathbf{c}=c$ ), we may thus find $a_{c}\in n_{c}+B(S_{c},\unicode[STIX]{x1D70C}/2)$ such that
Using the identity
we can rewrite the left-hand side as
where $b_{1},b_{2}:B(S_{c},\unicode[STIX]{x1D70C})\rightarrow \mathbb{C}$ are the $1$ -bounded functions
and
Applying Lemma 2.1 to eliminate the $b_{1}(\mathbf{r})$ factor, we conclude that
Applying Lemma 2.1 again to eliminate the $b_{2}(\mathbf{h})\overline{b_{2}(\mathbf{h}^{\prime })}$ factor, we obtain the claim.◻
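The two applications of Lemma 2.1 in this proof are Cauchy–Schwarz steps. Lemma 2.1 is not restated in this section, so the following sketch only assumes that it has the standard form $|\mathbb{E}\,b(x)g(x,y)|^{2}\leqslant \mathbb{E}\,g(x,y)\overline{g(x,y^{\prime })}$ for a $1$ -bounded function $b$ , and checks that inequality numerically on a small finite model. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import cmath
import random


def random_phase(rng):
    """Return a random complex number of modulus 1."""
    return cmath.exp(2j * cmath.pi * rng.random())


def cauchy_schwarz_sides(b, g):
    """Return (lhs, rhs) for the assumed form of the Cauchy-Schwarz step.

    b: list of 1-bounded complex values b(x); g: matrix g[x][y].
    lhs = |E_{x,y} b(x) g(x,y)|^2
    rhs = E_{x,y,y'} g(x,y) * conj(g(x,y')) = E_x |E_y g(x,y)|^2
    """
    nx, ny = len(g), len(g[0])
    row_means = [sum(row) / ny for row in g]
    lhs = abs(sum(b[x] * row_means[x] for x in range(nx)) / nx) ** 2
    rhs = sum(abs(mu) ** 2 for mu in row_means) / nx
    return lhs, rhs


rng = random.Random(0)
b = [random_phase(rng) for _ in range(20)]
g = [[random_phase(rng) for _ in range(15)] for _ in range(20)]
lhs, rhs = cauchy_schwarz_sides(b, g)
```

The factor $b$ no longer appears on the right-hand side; the price is that the variable $y$ has been duplicated into the pair $y,y^{\prime }$ , which is the mechanism behind the differenced averages in the lemma.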
We return to the proof of Proposition 7.1. Let $i=i_{c}\in \{0,1,2\}$ be such that $k_{c}^{i}$ is non-zero. Let $\mathbf{r},\mathbf{r}^{\prime },\mathbf{h},\mathbf{h}^{\prime }$ be as in the above lemma, and let $\mathbf{h}^{\prime \prime }$ be a further independent copy of $\mathbf{h}$ or $\mathbf{h}^{\prime }$ , thus $\mathbf{h}^{\prime \prime }$ is also drawn regularly from $B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-2C_{4}})\unicode[STIX]{x1D70C})$ and independently of $\mathbf{r},\mathbf{r}^{\prime },\mathbf{h},\mathbf{h}^{\prime }$ (after conditioning on $\mathbf{c}=c$ ). Applying Lemma 4.4 to compare $\mathbf{r}$ with $\mathbf{r}+\mathbf{h}^{\prime \prime }$ , we have
so by the pigeonhole principle we can find $r,r^{\prime },h^{\prime }\in B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-C_{4}})\unicode[STIX]{x1D70C}_{c})$ (depending on $c$ , of course) such that
By the local bilinearity of $\unicode[STIX]{x2202}^{2}\unicode[STIX]{x1D6EF}_{c}$ , we thus have
for some locally linear functions $\unicode[STIX]{x1D713},\unicode[STIX]{x1D713}^{\prime \prime }:B(S_{c},\unicode[STIX]{x1D70C}/100)\rightarrow \mathbb{R}/\mathbb{Z}$ (which can depend on $c$ ).
Applying Proposition 4.11 (recalling from (6.3) that $|S_{c}|\leqslant 8\exp (-3C_{2})$ ), we conclude that there exists a non-zero multiple $k_{c}\in {\hat{G}}_{c}$ of $k_{c}^{i}$ with
such that
for $n,m\in B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-3C_{4}})\unicode[STIX]{x1D70C}_{c})$ .
Applying Corollary 4.13, we may thus find $\unicode[STIX]{x1D709}_{c}\in \mathbb{Z}/p\mathbb{Z}$ such that
for all $n\in \mathbb{Z}/p\mathbb{Z}$ (of course, the bound is only non-trivial when $h$ lies in the Bohr set $B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-4C_{4}})\unicode[STIX]{x1D70C})$ ).
The dual frequency $k_{c}\in \widehat{G_{c}}$ is non-zero, but not necessarily irreducible. However, we may write $k_{c}=m_{c}k_{c}^{\prime }$ where $m_{c}$ is a positive natural number and $k_{c}^{\prime }\in \widehat{G_{c}}$ is irreducible, thus by (7.8) we have the bound (7.1). The same argument gives the bound $k_{c}^{\prime }\ll \exp (\unicode[STIX]{x1D702}^{-4C_{3}})$ , but this is not sufficient to establish the upper bound in (7.2). However, observe that $k_{c}^{i}$ must also be a multiple of the irreducible vector $k_{c}^{\prime }$ , and now the upper bound in (7.2) follows from (7.6).
We can also obtain a lower bound on $k_{c}^{\prime }$ by observing that the slab
has measure at most $|k_{c}^{\prime }|\operatorname{vol}(G_{c})$ , and contains the Euclidean ball of radius $1/2$ centred at the origin. This gives the lower bound
which by (6.4), (6.6) gives the lower bound in (7.2).
Now let $a\in B(S_{c},\unicode[STIX]{x1D70C}_{c}/2)$ and $h\in B(S_{c}\cup \{\unicode[STIX]{x1D709}_{c}\},\exp (-\unicode[STIX]{x1D702}^{-5C_{4}})\unicode[STIX]{x1D70C}_{c})$ . Then we have
and
for all $j$ , $0\leqslant j\leqslant 2m_{c}$ . From (7.10) and (7.1), we conclude that
(for example). On the other hand, from (7.9) we have
and hence by the triangle inequality we have
for all $j$ , $0\leqslant j\leqslant 2m_{c}$ .
This is close to (7.3), but we will need to replace the dual frequency $k_{c}$ here with the irreducible dual frequency $k_{c}^{\prime }$ . To do this, we first observe that as $\unicode[STIX]{x1D6EF}_{c}$ is locally quadratic on $n_{c}+B(S_{c},\unicode[STIX]{x1D70C}_{c})$ , we may write
for all $j$ , $0\leqslant j\leqslant 2m_{c}$ , and some $\unicode[STIX]{x1D6FC},\unicode[STIX]{x1D6FD},\unicode[STIX]{x1D6FE}\in G_{c}$ depending on $c,a,h$ . Inserting this formula into the preceding estimate, we conclude that
for $j$ , $0\leqslant j\leqslant 2m_{c}$ . Applying this for $j=1,2$ and using the triangle inequality, we have
Since $2m_{c}k_{c}^{\prime }=2k_{c}$ and $(2m_{c})^{2}k_{c}^{\prime }=(2m_{c})2k_{c}$ , we conclude in particular (using (7.1)) that
and thus by (7.12) we obtain (7.3) as desired. This finally completes the proof of Proposition 7.1. ◻
We now return to the proof of Theorem 6.7. We are given a structured local approximant
and need to construct a modification
that somehow incorporates the linear constraint identified in Proposition 7.1 to decrement the poorly distributed quadratic dimension of $v^{\prime }$ , in the spirit of the third and fourth examples in §3. To avoid confusion, we shall restore the subscripts $(\mathbf{a}_{v},\mathbf{r}_{v},\mathbf{f}_{v})$ on the random variables associated to $v$ as per Definition 6.1, to distinguish them from the corresponding random variables $(\mathbf{a}_{v^{\prime }},\mathbf{r}_{v^{\prime }},\mathbf{f}_{v^{\prime }})$ that will be associated to $v^{\prime }$ .
We shall set $C^{\prime }:=(\mathbb{Z}/p\mathbb{Z})\times C$ , and let $\mathbf{c}^{\prime }$ be the random variable
Clearly $\mathbf{c}^{\prime }$ takes values in the non-empty finite set $C^{\prime }$ . Now we need to define $n_{c^{\prime }}^{\prime },S_{c^{\prime }}^{\prime },\unicode[STIX]{x1D70C}_{c^{\prime }}^{\prime },G_{c^{\prime }}^{\prime },F_{c^{\prime }}^{\prime },\unicode[STIX]{x1D6EF}_{c^{\prime }}^{\prime }$ for any given $c^{\prime }=(a,c)$ in $C^{\prime }$ . In the case where $c$ is not poorly distributed, we simply carry over the corresponding data from $v$ without further modification. That is to say, we define
whenever $c^{\prime }=(a,c)$ with $c$ not poorly distributed. If instead $c^{\prime }=(a,c)$ with $c$ poorly distributed, then we introduce the natural number $m_{c}$ , the dual frequency $k_{c}^{\prime }\in {\hat{G}}_{c}$ , and the frequency $\unicode[STIX]{x1D709}_{c}\in \mathbb{Z}/p\mathbb{Z}$ from Proposition 7.1; of course we can arrange matters so that $m_{c},k_{c}^{\prime },\unicode[STIX]{x1D709}_{c}$ depend only on $c$ and not on $a$ . Because of (7.1) and the hypothesis (3.21), the quantity $2m_{c}$ is invertible in the field $\mathbb{Z}/p\mathbb{Z}$ , and so we may define the dilate $(2m_{c})^{-1}\cdot S_{c}$ of $S_{c}$ inside $\mathbb{Z}/p\mathbb{Z}$ , and can similarly define the dilate $(2m_{c})^{-1}\unicode[STIX]{x1D709}_{c}$ of $\unicode[STIX]{x1D709}_{c}$ . We will need to do this division here to cancel some denominators appearing later in the argument.
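The division by $2m_{c}$ can be illustrated on a toy example. The sketch below assumes the convention $B(S,\unicode[STIX]{x1D70C})=\{n:\Vert \unicode[STIX]{x1D709}n/p\Vert _{\mathbb{R}/\mathbb{Z}}\leqslant \unicode[STIX]{x1D70C}\text{ for all }\unicode[STIX]{x1D709}\in S\}$ for Bohr sets (the paper's exact convention is not restated here), and checks that dilating the frequency set by $(2m)^{-1}$ has the same effect as dilating the Bohr set by $2m$ . The names `dilate` and `bohr_set` and the numerical values are hypothetical.

```python
def dilate(S, m, p):
    """Return the dilate (2m)^{-1} . S inside Z/pZ (needs gcd(2m, p) = 1)."""
    inv = pow(2 * m, -1, p)  # modular inverse (Python 3.8+); ValueError if none exists
    return {(inv * s) % p for s in S}


def bohr_set(S, rho, p):
    """Assumed convention: all n in Z/pZ with ||xi * n / p|| <= rho for every xi in S."""
    def dist_to_integers(x):
        x %= p
        return min(x, p - x) / p
    return {n for n in range(p) if all(dist_to_integers(xi * n) <= rho for xi in S)}


p, m = 101, 4
S = {3, 7}
S_dilated = dilate(S, m, p)
# Multiply every element of B(S, rho) by 2m and compare with B((2m)^{-1} . S, rho).
scaled_bohr = {(2 * m * h) % p for h in bohr_set(S, 0.2, p)}
```

In this toy model `scaled_bohr` coincides with `bohr_set(S_dilated, 0.2, p)`, which is why elements of a Bohr set with dilated frequencies can be written as $2m_{c}h$ with $h$ in the undilated Bohr set.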
In this poorly distributed case, we define the “linear” data $n_{c^{\prime }}^{\prime },S_{c^{\prime }}^{\prime },\unicode[STIX]{x1D70C}_{c^{\prime }}^{\prime }$ by
thus the shifted Bohr set $n_{c^{\prime }}^{\prime }+B(S_{c^{\prime }}^{\prime },\unicode[STIX]{x1D70C}_{c^{\prime }}^{\prime })$ will be a small subset of $n_{c}+B(S_{c},\unicode[STIX]{x1D70C}_{c})$ in which the radius $\unicode[STIX]{x1D70C}_{c}$ has been reduced and an additional frequency $\unicode[STIX]{x1D709}_{c}/2m_{c}$ has been added. As we shall see, this particular choice of this linear data will allow us to utilize the approximate constraint (7.3).
The constraint (7.3) has the effect of approximately restricting $\unicode[STIX]{x1D6EF}_{c}$ (on a suitable Bohr set) to a coset of the orthogonal complement $(k_{c}^{\prime })^{\bot }=\{x\in G_{c}:k_{c}^{\prime }\cdot x=0\}$ of $k_{c}^{\prime }$ in $G_{c}$ . Applying Theorem 5.1, (6.4), and the crucial bound (7.2), we may find a dilated torus $\tilde{G}_{c}=\prod _{i=1}^{\dim (G_{c})-1}(\mathbb{R}/\tilde{\unicode[STIX]{x1D706}}_{c,i}\mathbb{Z})$ with volume
as well as a Lie group isomorphism $\unicode[STIX]{x1D713}_{c}:(k_{c}^{\prime })^{\bot }\rightarrow \tilde{G}_{c}$ obeying the bilipschitz bounds
In particular, if we define the even more dilated torus
and let $\unicode[STIX]{x1D6FF}_{c}:G_{c}^{\prime }\rightarrow \tilde{G}_{c}$ be the rescaling map
then we see that $\unicode[STIX]{x1D713}_{c}^{-1}\circ \unicode[STIX]{x1D6FF}_{c}:G_{c}^{\prime }\rightarrow (k_{c}^{\prime })^{\bot }$ is a $1$ -Lipschitz Lie group isomorphism.
An element of $n_{c^{\prime }}^{\prime }+B(S_{c^{\prime }}^{\prime },\unicode[STIX]{x1D70C}_{c^{\prime }}^{\prime })$ can be uniquely represented in the form $n_{c^{\prime }}^{\prime }+2m_{c}h$ for $h\in B(S_{c}\cup \{\unicode[STIX]{x1D709}_{c}\},\exp (-\unicode[STIX]{x1D702}^{-6C_{4}})\unicode[STIX]{x1D70C}_{c})$ . From (7.3), we know that the point $\unicode[STIX]{x1D6EF}_{c}(n_{c^{\prime }}^{\prime }+2m_{c}h)-\unicode[STIX]{x1D6EF}_{c}(n_{c^{\prime }}^{\prime })$ lies within an $O(\exp (-\unicode[STIX]{x1D702}^{-3C_{4}}))$ -neighbourhood of the subtorus $(k_{c}^{\prime })^{\bot }$ . Using the lower bound in (7.2), we can find a locally linear projection $\unicode[STIX]{x1D70B}_{c}$ from this neighbourhood to the subtorus itself (e.g. by viewing the subtorus locally as a graph in $\dim (G_{c})-1$ of the $\dim (G_{c})$ coordinates and then projecting in the direction of the remaining coordinate), which moves each point in the neighbourhood by at most $O(\exp (-\unicode[STIX]{x1D702}^{-2C_{4}}))$ . From the $1$ -Lipschitz nature of $F_{c}$ , we thus have
We can rewrite this as
where $F_{c^{\prime }}^{\prime }:G_{c}^{\prime }\rightarrow [-1,1]$ is the $1$ -Lipschitz function
and $\unicode[STIX]{x1D6EF}_{c^{\prime }}^{\prime }:n_{c^{\prime }}^{\prime }+B(S_{c^{\prime }}^{\prime },\unicode[STIX]{x1D70C}_{c^{\prime }}^{\prime })\rightarrow G_{c}^{\prime }$ takes the form
The map $\unicode[STIX]{x1D6EF}_{c^{\prime }}^{\prime }$ is the composition of a locally quadratic map with three locally linear maps, and is hence also locally quadratic. This concludes the construction of all the required quadratic data $G_{c^{\prime }}^{\prime },F_{c^{\prime }}^{\prime },\unicode[STIX]{x1D6EF}_{c^{\prime }}^{\prime }$ when $c^{\prime }$ arises from a poorly distributed $c$ .
It remains to verify the claims (6.17)–(6.23) of Theorem 6.7. The claim (6.17) is clear; in fact, the frequency sets $S_{c^{\prime }}^{\prime }$ are either equal to their original counterparts $S_{c}$ or have the addition of just one further frequency $\unicode[STIX]{x1D709}_{c}$ , so we even obtain the improved bound $d(v^{\prime })\leqslant d(v)+1$ in our construction here. Since the dilated torus $G_{c^{\prime }}^{\prime }$ is either equal to $G_{c}$ when $c$ is not poorly distributed, or has one lower dimension than $G_{c}$ if $c$ is poorly distributed, we obtain the bounds (6.18), (6.19). Since $\unicode[STIX]{x1D70C}_{c^{\prime }}^{\prime }$ is either equal to $\unicode[STIX]{x1D70C}_{c}$ when $c$ is not poorly distributed, or $\exp (-\unicode[STIX]{x1D702}^{-6C_{4}})\unicode[STIX]{x1D70C}_{c}$ when $c$ is poorly distributed, we obtain (6.20) (with a little room to spare). As for the volume bound, $G_{c^{\prime }}^{\prime }$ clearly has the same volume as $G_{c}$ when $c$ is not poorly distributed, and when $c$ is poorly distributed we have
which by (7.13), (6.3) is bounded in turn by $\exp (-\unicode[STIX]{x1D702}^{-5C_{2}})\operatorname{vol}(G_{c})$ , which yields (6.21), again with a little bit of room to spare (because the bounds here only increased the volume by factors that involved $C_{2}$ rather than $C_{3}$ ).
Now we establish (6.22). From the triangle inequality we have
so it will suffice to show that
for each $c$ in the essential range of $\mathbf{c}$ .
The claim is trivial when $c$ is not poorly distributed, since in this case $\mathbf{a}_{v}$ and $\mathbf{a}_{v^{\prime }}$ have identical distribution after conditioning to $\mathbf{c}=c$ . If $c$ is poorly distributed, then (after conditioning to $\mathbf{c}=c$ ) $\mathbf{a}_{v}$ is drawn regularly from $n_{c}+B(S_{c},\unicode[STIX]{x1D70C}_{c}/2)$ , while $\mathbf{a}_{v^{\prime }}$ has the distribution of $\mathbf{a}_{v}+2m_{c}\mathbf{h}_{c}$ where $\mathbf{h}_{c}$ is drawn regularly from $B(S_{c}\cup \{\unicode[STIX]{x1D709}_{c}\},\exp (-\unicode[STIX]{x1D702}^{-6C_{4}})\unicode[STIX]{x1D70C}_{c})$ independently of $\mathbf{a}_{v}$ (after conditioning to $\mathbf{c}=c$ ). The required bound (6.22) now follows from Lemma 4.4 (and (6.3)).
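Lemma 4.4 is not restated in this section, but the phenomenon it quantifies can be seen in a rank-one toy model: the uniform distribution on a long interval barely changes, in total variation, when it is translated by a much shorter amount. The sketch below is only an illustration under that simplification; an interval stands in for a regular Bohr set, and the function name and parameters are hypothetical.

```python
def total_variation_after_shift(p, N, h):
    """TV distance between uniform on {-N..N} mod p and its translate by h."""
    A = {x % p for x in range(-N, N + 1)}
    A_shift = {(x + h) % p for x in A}
    size = len(A)
    # Each point lying in exactly one of the two sets contributes 1/size
    # to the L^1 distance; total variation is half of the L^1 distance.
    return 0.5 * len(A ^ A_shift) / size


tv = total_variation_after_shift(p=1009, N=100, h=5)
```

Here the distance equals $h/(2N+1)=5/201$ , so it is small precisely when the shift is narrow compared with the interval; this is the role played by the factor $\exp (-\unicode[STIX]{x1D702}^{-6C_{4}})$ separating the two radii above.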
Finally, we prove (6.23). Our task is to show that
By the triangle inequality as before, it suffices to show that
for all $c$ in the essential range of $\mathbf{c}$ . This is trivial for $c$ not poorly distributed, so assume $c$ is poorly distributed. From (7.14) we then have
and also
for $a\in B(S_{c},\unicode[STIX]{x1D70C}_{c})$ , so by the triangle inequality it suffices to show that
(for example). But this follows by repeating the proof of (7.15), with the function $f$ replaced by $|f-F_{c}\circ \unicode[STIX]{x1D6EF}_{c}|^{2}$ . This completes the proof of Theorem 6.7.
8 Bad approximation implies energy decrement
The remaining task in the paper is to prove Theorem 6.6. In this section we will establish this result contingent on a local inverse Gowers norm theorem (Theorem 8.1) that will be proven in later sections. We begin by stating the (rather technical) precise form of that theorem that we will need.
Theorem 8.1 (Local inverse $U^{3}$ theorem).
Let $p$ be a prime, and let $S$ be a subset of $\mathbb{Z}/p\mathbb{Z}$ containing at least one non-zero element. Let $\unicode[STIX]{x1D702}$ be a real parameter with $0<\unicode[STIX]{x1D702}<{\textstyle \frac{1}{2}}$ . Let $K$ be the quantity
and let $\unicode[STIX]{x1D70C}_{0},\unicode[STIX]{x1D70C}_{1},\unicode[STIX]{x1D70C}_{2},\ldots ,\unicode[STIX]{x1D70C}_{10}$ be real numbers satisfying
as well as the separation condition
for all $i=0,\ldots ,9$ . Assume that the prime $p$ is huge relative to the reciprocal of these parameters, in the sense that
Let $f:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ be a $1$ -bounded function such that
whenever $\mathbf{h}_{0},\mathbf{h}_{0}^{\prime },\mathbf{h}_{1},\mathbf{h}_{1}^{\prime },\mathbf{h}_{2},\mathbf{h}_{2}^{\prime }$ are drawn independently and regularly from $B(S,\unicode[STIX]{x1D70C}_{0})$ , $B(S,\unicode[STIX]{x1D70C}_{0})$ , $B(S,\unicode[STIX]{x1D70C}_{1})$ , $B(S,\unicode[STIX]{x1D70C}_{1})$ , $B(S,\unicode[STIX]{x1D70C}_{2})$ , and $B(S,\unicode[STIX]{x1D70C}_{2})$ respectively. Then there exists a positive integer $k<\exp (K^{O(C_{1})})$ , a set $S^{\prime }\subset \mathbb{Z}/p\mathbb{Z}$ , $S^{\prime }\supset S$ , with
a locally quadratic phase $\unicode[STIX]{x1D719}:B(S^{\prime },\unicode[STIX]{x1D70C}_{9})\rightarrow \mathbb{R}/\mathbb{Z}$ , and a function $\unicode[STIX]{x1D6FD}:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{Z}/p\mathbb{Z}$ such that
if $\mathbf{n},\mathbf{m}$ are drawn independently and regularly from $B_{S}(0,\unicode[STIX]{x1D70C}_{0})$ and $B_{S^{\prime }}(0,\unicode[STIX]{x1D70C}_{10})$ respectively.
Remarks.
The parameters $\unicode[STIX]{x1D70C}_{3},\ldots ,\unicode[STIX]{x1D70C}_{8}$ do not have any role in the statement of this result, but they appear in the proof. We have retained them to avoid a potentially confusing relabelling.
Informally, this theorem asserts that if $f$ has a large $U^{3}$ norm on $B(S,\unicode[STIX]{x1D70C}_{0})$ , then $f$ will correlate with a locally quadratic phase $n+km\mapsto \unicode[STIX]{x1D719}(m)+\unicode[STIX]{x1D6FD}(n)m/p$ on translates $n+k\cdot B_{S^{\prime }}(0,\unicode[STIX]{x1D70C}_{10})$ of $k\cdot B_{S^{\prime }}(0,\unicode[STIX]{x1D70C}_{10})$ , with polynomial bounds on the correlation. Although we will not make crucial use of this fact in our arguments, it may be noted that the homogeneous component $\unicode[STIX]{x1D719}$ of this locally quadratic phase does not depend on the translation parameter $n$ . In the bounded rank case $|S|=O(1)$ , a theorem very roughly of this form was established in [Reference Green and Tao14]; the key point in Theorem 8.1 is that the inverse theory of [Reference Green and Tao14] can be localized to a Bohr set without having the lower bound $\unicode[STIX]{x1D702}^{O(C_{1})}$ on the correlation appearing in (8.6) depend on the rank $|S|$ or radius $\unicode[STIX]{x1D70C}_{0}$ of the Bohr set (although these parameters certainly influence the range of the variables $\mathbf{n},\mathbf{m}$ appearing in (8.6)).
The proof of Theorem 8.1 will occupy most of the remainder of this paper. To a large extent, it may be understood separately from our main arguments, requiring little of the notation of §3, for example. In this section, we will assume Theorem 8.1 and use it to establish Theorem 6.6.
For the remainder of this section, the notation and hypotheses will be as in Theorem 6.6. Namely, we fix a prime $p$ , a function $f:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ , and a parameter $0<\unicode[STIX]{x1D702}\leqslant 1/10$ , and assume (3.21). We also suppose that
is a structured local approximant obeying (6.3)–(6.6), and one of (6.8) or (6.9) holds. Our objective is to construct a structured local approximant
obeying the bounds (6.10)–(6.15). The situation here is a formalization of Example 8 from §3.
Let $\mathbf{a}=\mathbf{a}_{v},\mathbf{r}=\mathbf{r}_{v},\mathbf{f}=\mathbf{f}_{v}$ be the random variables associated to $v$ in Definition 6.1. We can unify the hypotheses (6.8), (6.9) by introducing the quadrilinear form
defined for arbitrary random (or deterministic) bounded functions $\mathbf{f}_{0},\mathbf{f}_{1},\mathbf{f}_{2},\mathbf{f}_{3}:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{R}$ . From the definitions of $\operatorname{Err}_{1}$ and $\operatorname{Err}_{4}$ (just prior to (6.1)), the hypothesis (6.8) may be written as
while (6.9) can be similarly written as
Applying the triangle inequality and the quadrilinearity of $\unicode[STIX]{x1D6EC}_{\mathbf{a},\mathbf{r}}$ , we conclude that
for some random functions $\mathbf{f}_{0},\mathbf{f}_{1},\mathbf{f}_{2},\mathbf{f}_{3}$ , each of which is either equal to $1$ , $f$ , or $f-\mathbf{f}$ , and with at least one of the functions $\mathbf{f}_{0},\mathbf{f}_{1},\mathbf{f}_{2},\mathbf{f}_{3}$ equal to $f-\mathbf{f}$ . For the sake of concreteness we will assume that it is $\mathbf{f}_{3}$ that is equal to $f-\mathbf{f}$ , thus
the other cases are treated similarly (with some changes to the numerical constants below) and are left to the interested reader.
We can write the left-hand side of (8.7) as
Applying Lemma 2.2, we conclude that with probability $\gg \unicode[STIX]{x1D702}$ , the variable $\mathbf{c}$ attains a value $c$ for which we have the lower bound
We now use a local version of the standard “generalized von Neumann theorem” argument (based on several applications of the Cauchy–Schwarz inequality) to obtain some local correlation of $f-f_{c}$ with a quadratic phase.
Proposition 8.2. Let the notation and hypotheses be as above. For each $(a,c)$ in the essential range of $(\mathbf{a},\mathbf{c})$ , there exists a natural number $k_{a,c}$ with
a set $\tilde{S}_{a,c}\subset \mathbb{Z}/p\mathbb{Z}$ with $\tilde{S}_{a,c}\supset S_{c}$ and
and a locally quadratic function $\unicode[STIX]{x1D6FE}_{n,a,c}:B(\tilde{S}_{a,c},\exp (-\unicode[STIX]{x1D702}^{-11C_{4}})\unicode[STIX]{x1D70C}_{c})\rightarrow \mathbb{R}/\mathbb{Z}$ for each $n\in \mathbb{Z}/p\mathbb{Z}$ , such that
where, after conditioning to the event $\mathbf{a}=a,\mathbf{c}=c$ , the random variables $\mathbf{n}$ and $\mathbf{m}$ are drawn regularly and independently from the Bohr sets $B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-2C_{4}})\unicode[STIX]{x1D70C}_{c})$ and $B(\tilde{S}_{a,c},\exp (-\unicode[STIX]{x1D702}^{-12C_{4}})\unicode[STIX]{x1D70C}_{c})$ respectively.
Proof. Suppose for now that $c$ obeys (8.8). From Definition 6.1, once we condition to the event $\mathbf{c}=c$ , the random variables $\mathbf{a},\mathbf{r}$ are independent and regularly drawn from $B(S_{c},\unicode[STIX]{x1D70C}_{c}/2)$ and $B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-C_{4}})\unicode[STIX]{x1D70C}_{c})$ respectively; from (6.4) we have the bounds
Also, the function $\mathbf{f}$ is now the deterministic function
on the Bohr set $B(S_{c},\unicode[STIX]{x1D70C}_{c})$ , and $\mathbf{f}_{0},\mathbf{f}_{1},\mathbf{f}_{2}$ become deterministic functions $f_{0,c}$ , $f_{1,c}$ and $f_{2,c}$ taking values in $[-2,2]$ . Thus we have
where $f_{3,c}:=f-f_{c}$ .
We now do a linear change of variable with conveniently chosen numerical coefficients that will facilitate a certain use of the Cauchy–Schwarz inequality to eliminate the bounded functions $f_{0,c},f_{1,c},f_{2,c}$ , leaving only the function $f_{3,c}$ . Continuing to condition on the event that $\mathbf{c}=c$ , let $\mathbf{n}_{1},\mathbf{n}_{2}$ and $\mathbf{n}_{3}$ be drawn regularly and independently from the Bohr sets $B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-2C_{4}})\unicode[STIX]{x1D70C}_{c})$ , $B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-3C_{4}})\unicode[STIX]{x1D70C}_{c})$ , and $B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-4C_{4}})\unicode[STIX]{x1D70C}_{c})$ respectively, independently of the previous random variables. We can use Lemma 4.4 (and (8.12)) to compare $\mathbf{a}$ with $\mathbf{a}-3\mathbf{n}_{2}-12\mathbf{n}_{3}$ , and conclude that
By another application of Lemma 4.4, we may compare $\mathbf{r}$ with $\mathbf{r}+2\mathbf{n}_{1}+3\mathbf{n}_{2}+6\mathbf{n}_{3}$ , and conclude that
Finally, we use Lemma 4.4 to replace $\mathbf{a}$ by $\mathbf{a}-3\mathbf{r}$ , so that
The purpose of this odd-seeming change of variables is that each of the functions $f_{0,c},f_{1,c},f_{2,c}$ now has an argument that involves only two of the three random variables $\mathbf{n}_{1},\mathbf{n}_{2},\mathbf{n}_{3}$ , while the argument of the key function $f_{3,c}$ depends on $\mathbf{n}_{1},\mathbf{n}_{2},\mathbf{n}_{3}$ only through their sum $\mathbf{n}_{1}+\mathbf{n}_{2}+\mathbf{n}_{3}$ .
One can achieve a similar effect for the other three choices $\mathbf{f}_{0},\mathbf{f}_{1},\mathbf{f}_{2}$ for the key function by a suitable adjustment of the constants above; we leave the details to the interested reader.
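The bookkeeping behind this change of variables can be checked mechanically. The sketch below assumes that the quadrilinear form averages $f_{0}(\mathbf{a})f_{1}(\mathbf{a}+\mathbf{r})f_{2}(\mathbf{a}+2\mathbf{r})f_{3}(\mathbf{a}+3\mathbf{r})$ (its displayed definition is not reproduced here), applies the three substitutions described above, and records the coefficient of each variable in the argument of each $f_{j,c}$ . The function name and the coefficient ordering are hypothetical.

```python
def progression_arguments():
    """Coefficient vectors, in the order (a, r, n1, n2, n3), of the four arguments.

    After the three substitutions, the base point and step become
        A = a - 3r - 3*n2 - 12*n3,   R = r + 2*n1 + 3*n2 + 6*n3,
    and the argument of f_j is A + j*R for j = 0, 1, 2, 3.
    """
    A = (1, -3, 0, -3, -12)
    R = (0, 1, 2, 3, 6)
    return [tuple(x + j * y for x, y in zip(A, R)) for j in range(4)]


args = progression_arguments()
```

Under this assumption the arguments are $\mathbf{a}-3\mathbf{r}-3\mathbf{n}_{2}-12\mathbf{n}_{3}$ , $\mathbf{a}-2\mathbf{r}+2\mathbf{n}_{1}-6\mathbf{n}_{3}$ , $\mathbf{a}-\mathbf{r}+4\mathbf{n}_{1}+3\mathbf{n}_{2}$ and $\mathbf{a}+6(\mathbf{n}_{1}+\mathbf{n}_{2}+\mathbf{n}_{3})$ , so each of the first three omits exactly one of the $\mathbf{n}_{i}$ , while the last depends on them only through their sum.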
By Lemma 2.2, we see that with probability $\gg \unicode[STIX]{x1D702}$ (conditioning on $\mathbf{c}=c$ ), the random variable $\mathbf{a}$ attains a value $a$ such that
Let $a$ be such that (8.13) holds. We can then find an $r\in \mathbb{Z}/p\mathbb{Z}$ (depending on $a,c$ ) such that
We now suppress the additive structure on the first three arguments by rewriting the above bound as
where $f_{0,c,a},f_{1,c,a},f_{2,c,a}:\mathbb{Z}/p\mathbb{Z}\times \mathbb{Z}/p\mathbb{Z}\rightarrow [-2,2]$ are bounded functions whose exact form
will not be relevant in the arguments that follow.
We can eliminate the factor $f_{0,c,a}$ using Lemma 2.1 to conclude that
where $\mathbf{n}_{1}^{\prime }$ is an independent copy of $\mathbf{n}_{1}$ (and also independent of $\mathbf{n}_{2},\mathbf{n}_{3}$ ) on the event $\mathbf{a}=a,\mathbf{c}=c$ . We can similarly apply Lemma 2.1 to eliminate the $f_{1,c,a}(\mathbf{n}_{1},\mathbf{n}_{3})f_{1,c,a}(\mathbf{n}_{1}^{\prime },\mathbf{n}_{3})$ variables to conclude that
and finally apply Lemma 2.1 to eliminate the $f_{2,c,a}$ terms and arrive at
where $\mathbf{n}_{2}^{\prime },\mathbf{n}_{3}^{\prime }$ are independent copies of $\mathbf{n}_{2},\mathbf{n}_{3}$ respectively on $\mathbf{a}=a,\mathbf{c}=c$ , with $\mathbf{n}_{1},\mathbf{n}_{2},\mathbf{n}_{3},\mathbf{n}_{1}^{\prime },\mathbf{n}_{2}^{\prime },\mathbf{n}_{3}^{\prime }$ all independent relative to $\mathbf{a}=a,\mathbf{c}=c$ .
We now apply Theorem 8.1, replacing $\unicode[STIX]{x1D702}$ by a small multiple of $\unicode[STIX]{x1D702}^{8}$ , and choosing $\unicode[STIX]{x1D70C}_{i}:=\exp (-\unicode[STIX]{x1D702}^{-(i+2)C_{4}})\unicode[STIX]{x1D70C}_{c}$ for $i=0,\ldots ,10$ , and using the bounds (8.12), (3.21) to justify the hypothesis (8.3). We conclude that for $c$ obeying (8.8) and $a$ obeying (8.13), we can find a natural number $k_{a,c}$ obeying (8.9), a set $\tilde{S}_{a,c}$ with $S_{c}\subset \tilde{S}_{a,c}\subset \mathbb{Z}/p\mathbb{Z}$ obeying (8.10), a locally quadratic function $\unicode[STIX]{x1D719}_{a,c}:B(\tilde{S}_{a,c},\exp (-\unicode[STIX]{x1D702}^{-11C_{4}})\unicode[STIX]{x1D70C}_{c})\rightarrow \mathbb{R}/\mathbb{Z}$ , and a function $\unicode[STIX]{x1D6FD}_{a,c}:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{Z}/p\mathbb{Z}$ such that
if $\mathbf{n},\mathbf{m}$ are drawn independently and regularly from $B(S_{c},\exp (-\unicode[STIX]{x1D702}^{-2C_{4}})\unicode[STIX]{x1D70C}_{c})$ and $B(\tilde{S}_{a,c},\exp (-\unicode[STIX]{x1D702}^{-12C_{4}})\unicode[STIX]{x1D70C}_{c})$ respectively on the event $\mathbf{a}=a,\mathbf{c}=c$ . Taking expectations in $\mathbf{a}$ (and choosing $\tilde{S}_{a,c}=S_{c}$ , $\unicode[STIX]{x1D719}_{a,c}=0$ and $\unicode[STIX]{x1D6FD}_{a,c}=0$ if (8.8) or (8.13) is not satisfied), we conclude that
In particular, if we set $\unicode[STIX]{x1D6FE}_{n,a,c}(m):=\unicode[STIX]{x1D719}_{a,c}(m)+\unicode[STIX]{x1D6FD}_{a,c}(n)m+\unicode[STIX]{x1D703}_{n,a,c}$ for a suitable phase $\unicode[STIX]{x1D703}_{n,a,c}\in \mathbb{R}/\mathbb{Z}$ , then $\unicode[STIX]{x1D6FE}_{n,a,c}$ is locally quadratic on $B(\tilde{S}_{a,c},\exp (-\unicode[STIX]{x1D702}^{-11C_{4}})\unicode[STIX]{x1D70C}_{c})$ and
giving the claim. ◻
Let $\mathbf{n},\mathbf{m},k_{a,c},\tilde{S}_{a,c},\unicode[STIX]{x1D6FE}_{n,a,c}$ be as in the above proposition. The conclusion (8.11) of Proposition 8.2 may be rewritten more compactly as
We now introduce the modified random function $\mathbf{f}^{\prime }:\mathbb{Z}/p\mathbb{Z}\rightarrow [-2,2]$ by the formula
where we extend $\unicode[STIX]{x1D6FE}_{n,a,c}$ arbitrarily outside of $B(\tilde{S}_{a,c},\exp (-\unicode[STIX]{x1D702}^{-11C_{4}})\unicode[STIX]{x1D70C}_{c})$ . Note from (8.9) and (3.21) that we can divide by $6k_{\mathbf{a},\mathbf{c}}$ in $\mathbb{Z}/p\mathbb{Z}$ without difficulty.
We claim that the function $\mathbf{f}^{\prime }$ is a little closer to $f$ than $\mathbf{f}$ is.
Lemma 8.3. We have
Proof. From (8.15) we have
and so
On the other hand, for any $(a,c)$ in the essential range of $(\mathbf{a},\mathbf{c})$ , we may use Lemma 4.4 to compare $\mathbf{n}$ with $\mathbf{n}+k_{\mathbf{a},\mathbf{c}}\mathbf{m}$ , and conclude that
(for example), and hence on taking expectations in $\mathbf{a}$
Applying Lemma 4.4 again to compare $\mathbf{a}$ with $\mathbf{a}+6\mathbf{n}$ , we conclude that
and hence on taking averages in $\mathbf{c}$
Taking expectations in (8.16) and using (8.15), (8.17), we obtain the claim. ◻
There is a very minor technical issue that $\mathbf{f}^{\prime }$ does not quite take values in $[-1,1]$ , which is what is needed in the definition of an approximant. However, this is easily fixed by truncation, or more precisely by introducing the random function $\mathbf{f}^{\prime \prime }:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ defined by
Since $f(l)$ already lies in $[-1,1]$ , we see that $\mathbf{f}^{\prime \prime }(l)$ is at least as close to $f(l)$ as $\mathbf{f}^{\prime }(l)$ is, thus we have the pointwise bound
for any $l\in \mathbb{Z}/p\mathbb{Z}$ . From the above lemma, we thus have
We can now construct the new structured approximant
as follows. We write the dilated torus $G_{c}$ as $G_{c}=\prod _{i=1}^{\dim (G_{c})}\mathbb{R}/\unicode[STIX]{x1D706}_{i,c}\mathbb{Z}$ .
(i) We set $C^{\prime }:=(\mathbb{Z}/p\mathbb{Z})\times (\mathbb{Z}/p\mathbb{Z})\times C$ and $\mathbf{c}^{\prime }:=(\mathbf{n},\mathbf{a},\mathbf{c})$ .
(ii) If $c^{\prime }=(n,a,c)$ is in $C^{\prime }$ , we set
$$\begin{eqnarray}\displaystyle n_{c^{\prime }}^{\prime } & := & \displaystyle a+6n,\nonumber\\ \displaystyle S_{c^{\prime }}^{\prime } & := & \displaystyle (6k_{a,c})^{-1}\cdot \tilde{S}_{a,c},\nonumber\\ \displaystyle \unicode[STIX]{x1D70C}_{c^{\prime }}^{\prime } & := & \displaystyle \exp (-\unicode[STIX]{x1D702}^{-12C_{4}})\unicode[STIX]{x1D70C}_{c},\nonumber\\ \displaystyle G_{c^{\prime }}^{\prime } & := & \displaystyle \mathop{\prod }_{i=1}^{\dim (G_{c})}(\mathbb{R}/100\unicode[STIX]{x1D706}_{i,c}\mathbb{Z})\times (\mathbb{R}/\mathbb{Z}).\nonumber\end{eqnarray}$$
(iii) If $c^{\prime }=(n,a,c)$ is in $C^{\prime }$ , we define $F_{c^{\prime }}^{\prime }:G_{c^{\prime }}^{\prime }\rightarrow [-1,1]$ to be the function
$$\begin{eqnarray}F_{c^{\prime }}^{\prime }(x,y):=\min (\max (F_{c}({\textstyle \frac{1}{100}}\cdot x)+\unicode[STIX]{x1D702}^{C_{2}/2}\cos (2\unicode[STIX]{x1D70B}y),-1),1)\end{eqnarray}$$ for $x\in \prod _{i=1}^{\dim (G_{c})}(\mathbb{R}/100\unicode[STIX]{x1D706}_{i,c}\mathbb{Z})$ and $y\in \mathbb{R}/\mathbb{Z}$ , where $x\mapsto {\textstyle \frac{1}{100}}\cdot x$ is the obvious contraction map from $\prod _{i=1}^{\dim (G_{c})}(\mathbb{R}/100\unicode[STIX]{x1D706}_{i,c}\mathbb{Z})$ to $\prod _{i=1}^{\dim (G_{c})}(\mathbb{R}/\unicode[STIX]{x1D706}_{i,c}\mathbb{Z})$ .
(iv) If $c^{\prime }=(n,a,c)$ is in $C^{\prime }$ , we define $\unicode[STIX]{x1D6EF}_{c^{\prime }}^{\prime }:n_{c^{\prime }}^{\prime }+B(S_{c^{\prime }}^{\prime },\unicode[STIX]{x1D70C}_{c^{\prime }}^{\prime })\rightarrow G_{c^{\prime }}^{\prime }$ by the formula
$$\begin{eqnarray}\unicode[STIX]{x1D6EF}_{c^{\prime }}^{\prime }(l):=\biggl(100\cdot \unicode[STIX]{x1D6EF}_{c}(l),\unicode[STIX]{x1D6FE}_{n,a,c}\biggl(\frac{l-a-6n}{6k_{a,c}}\biggr)\biggr)\end{eqnarray}$$ for $l\in n_{c^{\prime }}^{\prime }+B(S_{c^{\prime }}^{\prime },\unicode[STIX]{x1D70C}_{c^{\prime }}^{\prime })$ (which implies in particular that $(l-a-6n)/6k_{a,c}\in B(\tilde{S}_{a,c},\exp (-\unicode[STIX]{x1D702}^{-12C_{4}})\unicode[STIX]{x1D70C}_{c})$ ), where $x\mapsto 100\cdot x$ is the obvious dilation map from $\prod _{i=1}^{\dim (G_{c})}(\mathbb{R}/\unicode[STIX]{x1D706}_{i,c}\mathbb{Z})$ to $\prod _{i=1}^{\dim (G_{c})}(\mathbb{R}/100\unicode[STIX]{x1D706}_{i,c}\mathbb{Z})$ (the inverse of the map $x\mapsto {\textstyle \frac{1}{100}}\cdot x$ from part (iii)).
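The role of the factor $100$ in items (ii) and (iii) can be sanity-checked numerically in a one-dimensional toy case. The sketch below takes $\dim (G_{c})=1$ , a made-up $1$ -Lipschitz function $F_{c}$ , the Euclidean product metric on the torus, and illustrative values of $\unicode[STIX]{x1D702}$ , $C_{2}$ and $\unicode[STIX]{x1D706}$ ; all of these are assumptions for the example and are not the paper's constants. It samples random pairs of points and records the largest observed difference quotient of $F_{c^{\prime }}^{\prime }$ .

```python
import math
import random

ETA, C2, LAM = 0.1, 4, 3.0     # hypothetical toy values, not the paper's constants
EPS = ETA ** (C2 / 2)          # the amplitude eta^{C_2/2}


def torus_dist(u, v, period):
    """Distance between u and v in R / (period * Z)."""
    d = abs(u - v) % period
    return min(d, period - d)


def F_c(x):
    """A toy 1-Lipschitz function on R/(LAM*Z) with values in [0, 1]."""
    return min(torus_dist(x, 0.0, LAM), 1.0)


def F_prime(x, y):
    """Truncation of F_c(x/100) + EPS*cos(2*pi*y) to [-1, 1], as in item (iii)."""
    return min(max(F_c(x / 100) + EPS * math.cos(2 * math.pi * y), -1.0), 1.0)


def max_lipschitz_ratio(samples, seed=0):
    """Largest observed |F'(p) - F'(q)| / dist(p, q) over random pairs p, q."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        x1, x2 = rng.uniform(0, 100 * LAM), rng.uniform(0, 100 * LAM)
        y1, y2 = rng.random(), rng.random()
        dist = math.hypot(torus_dist(x1, x2, 100 * LAM), torus_dist(y1, y2, 1.0))
        if dist > 0:
            worst = max(worst, abs(F_prime(x1, y1) - F_prime(x2, y2)) / dist)
    return worst


worst_ratio = max_lipschitz_ratio(2000)
```

In this toy model the contraction makes the first summand $\frac{1}{100}$ -Lipschitz and the cosine term is $2\unicode[STIX]{x1D70B}\unicode[STIX]{x1D702}^{C_{2}/2}$ -Lipschitz, so the observed ratio stays well below $1$ ; the truncation to $[-1,1]$ cannot increase it.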
Since $F_{c}$ is $1$ -Lipschitz, it is easy to see (thanks to the contraction by ${\textstyle \frac{1}{100}}$ ) that $F_{c^{\prime }}^{\prime }$ is also $1$ -Lipschitz; similarly, as $\unicode[STIX]{x1D6EF}_{c}$ and $\unicode[STIX]{x1D6FE}_{n,a,c}$ are locally quadratic on $n_{c}+B(S_{c},\unicode[STIX]{x1D70C}_{c})$ and $B(\tilde{S}_{a,c},\exp (-\unicode[STIX]{x1D702}^{-11C_{4}})\unicode[STIX]{x1D70C}_{c})$ respectively, we see that $\unicode[STIX]{x1D6EF}_{c^{\prime }}^{\prime }$ is also locally quadratic on $n_{c^{\prime }}^{\prime }+B(S_{c^{\prime }}^{\prime },\unicode[STIX]{x1D70C}_{c^{\prime }}^{\prime })$ . From (8.15), (8.18), Definition 6.1, and the above constructions we see that
and hence by (8.19)
From Definition 6.1 and the above constructions, we also see that $\mathbf{a}_{v^{\prime }}$ has the same distribution as $\mathbf{a}+6\mathbf{n}+6k_{\mathbf{a},\mathbf{c}}\mathbf{m}$ (after conditioning to any positive probability event of the form $(\mathbf{n},\mathbf{a},\mathbf{c})=(n,a,c)$ ), which gives the required energy decrement (6.15).
The bound (6.10) follows from (8.10), while from construction we clearly have $\dim (G_{c^{\prime }}^{\prime })=\dim (G_{c})+1$ , which gives (6.11). Since we have $\unicode[STIX]{x1D70C}_{c^{\prime }}^{\prime }:=\exp (-\unicode[STIX]{x1D702}^{-12C_{4}})\unicode[STIX]{x1D70C}_{c}$ , the bound (6.12) is clear; also, from (6.4) we have
which gives (6.13). It remains to establish (6.14). By the definition of $\operatorname{Err}_{1}$ (just before (6.1)) and the triangle inequality, it suffices to show that
But as mentioned previously, $\mathbf{a}_{v^{\prime }}$ has the same distribution as $\mathbf{a}+6\mathbf{n}+6k_{\mathbf{a},\mathbf{c}}\mathbf{m}$ , and by using Lemma 4.4 as in the proof of Lemma 8.3 we have
giving the claim. This completes the proof of Theorem 6.6, assuming the local inverse Gowers norm theorem (Theorem 8.1).
9 Local inverse $U^{3}$ theorem
We now turn to the proof of Theorem 8.1, which is the last component needed in the proof of Theorem 1.1. Let us begin by recalling the setup of this theorem. We let $S$ be a subset of $\mathbb{Z}/p\mathbb{Z}$ , take a parameter $\unicode[STIX]{x1D702}$ satisfying $0<\unicode[STIX]{x1D702}<{\textstyle \frac{1}{2}}$ , and define the quantity $K$ by (8.1), thus
We suppose that
are scales obeying the separation condition (8.2) and the largeness condition (8.3), and suppose that $f:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ is a $1$ -bounded function obeying (8.4). Our task is to locate a natural number $k$ with $k<\exp (K^{O(C_{1})})$ , a set $S^{\prime }$ with $S\subset S^{\prime }\subset \mathbb{Z}/p\mathbb{Z}$ obeying (8.5), a locally quadratic phase $\unicode[STIX]{x1D719}:B(S^{\prime },\unicode[STIX]{x1D70C}_{9})\rightarrow \mathbb{R}/\mathbb{Z}$ , and a function $\unicode[STIX]{x1D6FD}:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{Z}/p\mathbb{Z}$ obeying (8.6). We will initially work at the scale $\unicode[STIX]{x1D70C}_{0}$ , but retreat to smaller scales as the argument progresses (mainly to ensure that the error terms in Lemma 4.4 are negligible), until we are working at the final scales $\unicode[STIX]{x1D70C}_{9}$ and $\unicode[STIX]{x1D70C}_{10}$ . Let us comment once more that the intermediate scales $\unicode[STIX]{x1D70C}_{3},\ldots ,\unicode[STIX]{x1D70C}_{8}$ play no role in the actual statement of Theorem 8.1.
In this section, all sums will be over $\mathbb{Z}/p\mathbb{Z}$ unless otherwise stated.
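For readers who wish to experiment with the objects used throughout this section, the following minimal Python sketch computes a Bohr set. It assumes that $B(S,\unicode[STIX]{x1D70C})$ denotes the set of $n\in \mathbb{Z}/p\mathbb{Z}$ with $\Vert \unicode[STIX]{x1D709}n/p\Vert _{\mathbb{R}/\mathbb{Z}}\leqslant \unicode[STIX]{x1D70C}$ for all $\unicode[STIX]{x1D709}\in S$ ; the precise convention (for instance strict versus non-strict inequality) is fixed earlier in the paper and may differ. The function name `bohr_set` and the sample parameters are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def bohr_set(S, rho, p):
    """Elements n of Z/pZ with ||xi * n / p|| <= rho for every xi in S.

    Assumed convention (not taken from the paper): non-strict inequality,
    with ||.|| the distance to the nearest integer.
    """
    rho = Fraction(rho)
    members = set()
    for n in range(p):
        # ||xi*n/p|| equals min(r, p - r)/p with r = xi*n mod p (exact arithmetic).
        if all(Fraction(min((xi * n) % p, p - (xi * n) % p), p) <= rho for xi in S):
            members.add(n)
    return members

p = 101
S = {1, 10}
coarse = bohr_set(S, Fraction(1, 5), p)          # larger radius
fine = bohr_set(S, Fraction(1, 10), p)           # smaller radius: a subset of coarse
refined = bohr_set(S | {37}, Fraction(1, 5), p)  # extra frequency: a subset of coarse
```

Under this convention, shrinking the radius or enlarging the frequency set can only shrink the Bohr set, which is the monotonicity used when the argument retreats to smaller scales or adjoins new frequencies to $S$ .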
9.1 First step: associate a frequency $\unicode[STIX]{x1D709}(n_{2})$ to each derivative of $f$
We now begin the (lengthy) proof of this theorem, which broadly follows the inverse $U^{3}$ strategy of the previous literature [Reference Gowers11, Reference Green and Tao14], but localized to a Bohr set; the key aim is to reduce the dependence of the constants on the rank or radius of this Bohr set as much as possible.
The first step is to use the local inverse $U^{2}$ theorem (Theorem 4.12) to associate a frequency $\unicode[STIX]{x1D709}(n_{2})\in \mathbb{Z}/p\mathbb{Z}$ to many “derivatives” $x\mapsto f(x+n_{2})\overline{f(x)}$ of $f$ .
Theorem 9.2. Let the notation and hypotheses be as in Theorem 8.1. Then there exists a set $\unicode[STIX]{x1D6FA}\subset B(S,2\unicode[STIX]{x1D70C}_{2})$ obeying the largeness condition
when $\mathbf{h}_{2},\mathbf{h}_{2}^{\prime }$ are drawn independently and regularly from $B(S,\unicode[STIX]{x1D70C}_{2})$ , and a function $\unicode[STIX]{x1D709}:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{Z}/p\mathbb{Z}$ such that
for all $n_{2}\in \mathbb{Z}/p\mathbb{Z}$ , where $\mathbf{n}_{0},\mathbf{n}_{1}$ are drawn independently and regularly from $B(S,\unicode[STIX]{x1D70C}_{0}),B(S,\unicode[STIX]{x1D70C}_{1})$ respectively.
Proof. For each $n_{2}\in \mathbb{Z}/p\mathbb{Z}$ , let $f_{n_{2}}:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ denote the $1$ -bounded function
Then we may rewrite the left-hand side of (8.4) as
By Lemma 4.4 and (8.2), the random variables $\mathbf{h}_{0},\mathbf{h}_{0}^{\prime }$ differ in total variation from $\mathbf{h}_{0}+\mathbf{h}_{2},\mathbf{h}_{0}^{\prime }+\mathbf{h}_{2}$ respectively by at most $\unicode[STIX]{x1D702}/4$ (for example). We conclude that
By the triangle inequality, the left-hand side is at most
The inner expectation is bounded by $1$ . Applying Lemma 2.2 (with $\mathbf{a}=\mathbf{h}_{2}-\mathbf{h}_{2}^{\prime }$ ), we conclude that there is a set $\unicode[STIX]{x1D6FA}\subset \mathbb{Z}/p\mathbb{Z}$ obeying (9.2) such that
for all $n_{2}\in \unicode[STIX]{x1D6FA}$ . Applying Theorem 4.12, we see that for each $n_{2}\in \unicode[STIX]{x1D6FA}$ , there exists $\unicode[STIX]{x1D709}(n_{2})\in \mathbb{Z}/p\mathbb{Z}$ such that
For $n_{2}\not \in \unicode[STIX]{x1D6FA}$ , we set $\unicode[STIX]{x1D709}(n_{2})$ arbitrarily (e.g. to zero). The claim follows.◻
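To illustrate why one expects the frequency function $\unicode[STIX]{x1D709}$ to carry linear structure, the following Python sketch treats the model case of a global quadratic phase $f(x)=e_{p}(ax^{2})$ on all of $\mathbb{Z}/p\mathbb{Z}$ . This is only a sketch of the global situation, not of the local (Bohr set) statement of Theorem 9.2; the helper names `e_p`, `dominant_frequency`, `derivative` and the parameters $p=11$ , $a=3$ are hypothetical. Since $f(x+n)\overline{f(x)}=e_{p}(2anx+an^{2})$ , the dominant frequency of the $n$ th derivative should be $2an$ .

```python
import cmath

def e_p(x, p):
    """The additive character e_p(x) = exp(2*pi*i*x/p) on Z/pZ."""
    return cmath.exp(2j * cmath.pi * (x % p) / p)

def dominant_frequency(g, p):
    """Return xi in Z/pZ maximising |E_x g(x) e_p(-xi*x)|."""
    def magnitude(xi):
        return abs(sum(g(x) * e_p(-xi * x, p) for x in range(p))) / p
    return max(range(p), key=magnitude)

def derivative(f, n, p):
    """The multiplicative derivative x -> f(x + n) * conj(f(x))."""
    return lambda x: f((x + n) % p) * f(x).conjugate()

p, a = 11, 3
f = lambda x: e_p(a * x * x, p)  # model quadratic phase (illustrative choice)
# Dominant frequency of each derivative; in this model case it equals 2*a*n mod p.
xi = [dominant_frequency(derivative(f, n, p), p) for n in range(p)]
```

In this model case the map $n\mapsto \unicode[STIX]{x1D709}(n)$ is exactly linear; the steps that follow recover an approximate, localized version of this linearity for a general $f$ .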
9.3 Second step: $\unicode[STIX]{x1D709}$ is approximately linear 1% of the time
The next step, following Gowers [Reference Gowers11], is to obtain some approximate linearity control on the function $\unicode[STIX]{x1D709}:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{Z}/p\mathbb{Z}$ . Define an additive quadruple to be a quadruple $\vec{a}=(a_{(1)},a_{(2)},a_{(3)},a_{(4)})\in (\mathbb{Z}/p\mathbb{Z})^{4}$ such that
$$\begin{eqnarray}a_{(1)}+a_{(2)}=a_{(3)}+a_{(4)},\end{eqnarray}$$
and let $\operatorname{Q}\subset (\mathbb{Z}/p\mathbb{Z})^{4}$ denote the space of all additive quadruples. We call an additive quadruple $(a_{(1)},a_{(2)},a_{(3)},a_{(4)})\in \operatorname{Q}$ bad if
where the word norm $\Vert \Vert _{S}$ was defined in Definition 4.5. Let $\operatorname{BQ}\subset \operatorname{Q}$ denote the space of all bad additive quadruples.
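The following Python sketch enumerates additive quadruples for a small prime and computes the defect $\unicode[STIX]{x1D709}(a_{(1)})+\unicode[STIX]{x1D709}(a_{(2)})-\unicode[STIX]{x1D709}(a_{(3)})-\unicode[STIX]{x1D709}(a_{(4)})$ whose word norm decides whether a quadruple is bad. It assumes the defining relation $a_{(1)}+a_{(2)}=a_{(3)}+a_{(4)}$ ; the threshold on the word norm is not reproduced here, so the code only exhibits the defect itself. All function names and the sample maps are hypothetical.

```python
from itertools import product

def is_additive_quadruple(a, p):
    """Check the assumed defining relation a1 + a2 = a3 + a4 in Z/pZ."""
    a1, a2, a3, a4 = a
    return (a1 + a2 - a3 - a4) % p == 0

def additive_quadruples(p):
    """List all additive quadruples; a1 is determined by (a2, a3, a4)."""
    return [((a3 + a4 - a2) % p, a2, a3, a4)
            for a2, a3, a4 in product(range(p), repeat=3)]

def defect(xi, a, p):
    """xi(a1) + xi(a2) - xi(a3) - xi(a4) mod p; zero means xi respects a."""
    a1, a2, a3, a4 = a
    return (xi(a1) + xi(a2) - xi(a3) - xi(a4)) % p

p = 7
Q = additive_quadruples(p)        # there are exactly p**3 of these
linear = lambda n: (3 * n) % p    # a linear map: defect vanishes on all of Q
square = lambda n: (n * n) % p    # a non-linear map: defect is sometimes non-zero
```

A linear map has vanishing defect on every additive quadruple, whereas a map such as $n\mapsto n^{2}$ does not; the theorems below quantify intermediate situations.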
Theorem 9.4. Let the notation and hypotheses be as in Theorem 8.1, and let $\unicode[STIX]{x1D6FA}$ and $\unicode[STIX]{x1D709}:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{Z}/p\mathbb{Z}$ be as in Theorem 9.2. If $\mathbf{h}_{2},\mathbf{h}_{2}^{\prime },\mathbf{k}_{2},\mathbf{k}_{2}^{\prime }$ are drawn independently and regularly from $B(S,\unicode[STIX]{x1D70C}_{2})$ , then with probability $\gg \unicode[STIX]{x1D702}^{O(1)}$ , one has
Proof. Let $\mathbf{n}_{0},\mathbf{n}_{1}$ be drawn independently and regularly from the Bohr sets $B(S,\unicode[STIX]{x1D70C}_{0})$ , $B(S,\unicode[STIX]{x1D70C}_{1})$ respectively. From (9.3) we have
for any $n_{2}\in \unicode[STIX]{x1D6FA}$ . Using (9.2), we conclude that
where $\mathbf{h}_{2},\mathbf{h}_{2}^{\prime }$ are drawn independently and regularly from $B(S,\unicode[STIX]{x1D70C}_{2})$ , and are independent of $\mathbf{n}_{0},\mathbf{n}_{1}$ . By the pigeonhole principle, one can thus find $n_{0}\in \mathbb{Z}/p\mathbb{Z}$ such that
We can rewrite the left-hand side as
for some $1$ -bounded function $F_{n_{0}}:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ depending on $n_{0}$ . Using Lemma 4.4 to compare $\mathbf{n}_{1}$ with $\mathbf{n}_{1}+\mathbf{h}_{2}^{\prime }$ , we conclude that
We rearrange the left-hand side as
where $G_{n_{0},n_{1}}:\mathbb{Z}/p\mathbb{Z}\times \mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ is the $1$ -bounded function
By Hölder’s inequality, we conclude that
From this point onward we cease to keep careful track of powers of $\unicode[STIX]{x1D702}$ . On the other hand, by using two applications of Lemma 2.1 to eliminate the $1$ -bounded functions $f$ , we have
where $(\mathbf{k}_{2},\mathbf{k}_{2}^{\prime })$ is an independent copy of $(\mathbf{h}_{2},\mathbf{h}_{2}^{\prime })$ . We thus have
which by the triangle inequality and (9.7) gives
By Lemma 2.2, we conclude that with probability $\gg \unicode[STIX]{x1D702}^{O(1)}$ , the tuple $(\mathbf{h}_{2},\mathbf{k}_{2},\mathbf{h}_{2}^{\prime },\mathbf{k}_{2}^{\prime })$ attains a value $(h_{2},k_{2},h_{2}^{\prime },k_{2}^{\prime })$ for which
and
thanks to (9.1). Since $(h_{2}-h_{2}^{\prime },k_{2}-k_{2}^{\prime },h_{2}-k_{2}^{\prime },k_{2}-h_{2}^{\prime })$ is an additive quadruple, the claim now follows from Lemma 4.7, (8.2), and (9.1).◻
We localize this claim slightly, though for notational reasons we will not move from $\unicode[STIX]{x1D70C}_{2}$ immediately to $\unicode[STIX]{x1D70C}_{3}$ and beyond, but instead first work in some intermediate scales between $\unicode[STIX]{x1D70C}_{2}$ and $\unicode[STIX]{x1D70C}_{3}$ . For any natural number $j$ , define
thus
if (for example) $j\leqslant K^{C_{1}^{2}}$ .
It will be necessary to break the symmetry between the four components of an additive quadruple, by restricting the second component to a tiny Bohr set, the third component to a larger Bohr set, and the first and fourth components to an even larger Bohr set. More precisely, given an additive quadruple $\vec{a}_{0}=(a_{(1),0},a_{(2),0},a_{(3),0},a_{(4),0})\in \operatorname{Q}$ , a subset $S^{\prime }\subset \mathbb{Z}/p\mathbb{Z}$ , and radii $0<r_{2}\leqslant r_{3}\leqslant r_{4}\leqslant 1/2$ , we say that a random additive quadruple $\vec{\mathbf{a}}=(\mathbf{a}_{(1)},\mathbf{a}_{(2)},\mathbf{a}_{(3)},\mathbf{a}_{(4)})\in \operatorname{Q}$ is centred at $\vec{a}_{0}$ with frequencies $S^{\prime }$ and scales $r_{2},r_{3},r_{4}$ if $\mathbf{a}_{(2)},\mathbf{a}_{(3)},\mathbf{a}_{(4)}$ are drawn independently and regularly from $a_{(2),0}+B(S^{\prime },r_{2})$ , $a_{(3),0}+B(S^{\prime },r_{3})$ , and $a_{(4),0}+B(S^{\prime },r_{4})$ respectively. Note that this property also describes the distribution of $\mathbf{a}_{(1)}$ , since we have the constraint
$$\begin{eqnarray}\mathbf{a}_{(1)}=\mathbf{a}_{(3)}+\mathbf{a}_{(4)}-\mathbf{a}_{(2)}.\end{eqnarray}$$
In practice, $r_{4}$ will be much larger than $r_{2},r_{3}$ , so (by Lemma 4.4) $\mathbf{a}_{(1)}$ will be approximately regularly drawn from $a_{(1),0}+B(S^{\prime },r_{4})$ , but will be highly coupled to the other three components of the quadruple (in particular, it will stay close to $\mathbf{a}_{(4)}$ ). We thus see that for $i=1,2,3,4$ , each $\mathbf{a}_{(i)}$ is either exactly or approximately drawn regularly from $a_{(i),0}+B(S^{\prime },r_{4-l_{i}})$ , where $l_{i}\in \{0,1,2\}$ is the quantity defined by the formulae
(9.9) $$\begin{eqnarray}l_{1}:=0;\quad l_{2}:=2;\quad l_{3}:=1;\quad l_{4}:=0.\end{eqnarray}$$
Corollary 9.5. Let the notation and hypotheses be as in Theorem 8.1, and let $\unicode[STIX]{x1D6FA}$ and $\unicode[STIX]{x1D709}$ be as in Theorem 9.2. Then there exists a random additive quadruple $\vec{\mathbf{a}}\in \operatorname{Q}$ centred at some quadruple $\vec{a}_{0}\in \operatorname{Q}$ with frequencies $S$ and scales $\unicode[STIX]{x1D70C}_{2,2},\unicode[STIX]{x1D70C}_{2,1},\unicode[STIX]{x1D70C}_{2,0}$ , such that $\vec{\mathbf{a}}\in \unicode[STIX]{x1D6FA}^{4}\cap (\operatorname{Q}\backslash \operatorname{BQ})$ with probability $\gg \unicode[STIX]{x1D702}^{O(1)}$ .
Proof. Let $\mathbf{h}_{2},\mathbf{k}_{2},\mathbf{h}_{2}^{\prime },\mathbf{k}_{2}^{\prime },\mathbf{n}_{2,1},\mathbf{n}_{2,2}$ be drawn independently and regularly from $B(S,\unicode[STIX]{x1D70C}_{2,0})$ , $B(S,\unicode[STIX]{x1D70C}_{2,0})$ , $B(S,\unicode[STIX]{x1D70C}_{2,0})$ , $B(S,\unicode[STIX]{x1D70C}_{2,0})$ , $B(S,\unicode[STIX]{x1D70C}_{2,1})$ and $B(S,\unicode[STIX]{x1D70C}_{2,2})$ respectively. From Theorem 9.4, we have
with probability $\gg \unicode[STIX]{x1D702}^{O(1)}$ . Using Lemma 4.4, we may replace $\mathbf{k}_{2}^{\prime }$ by $\mathbf{k}_{2}^{\prime }-\mathbf{n}_{2,2}$ , and similarly replace $\mathbf{h}_{2}$ by $\mathbf{h}_{2}+\mathbf{n}_{2,1}-\mathbf{n}_{2,2}$ , to conclude that
with probability $\gg \unicode[STIX]{x1D702}^{O(1)}$ . By the pigeonhole principle, we may thus find $k_{2},k_{2}^{\prime },h_{2}\in \mathbb{Z}/p\mathbb{Z}$ such that
with probability $\gg \unicode[STIX]{x1D702}^{O(1)}$ . The left-hand side is an additive quadruple centred at $(h_{2},k_{2}-k_{2}^{\prime },h_{2}-k_{2}^{\prime },k_{2})$ with frequencies $S$ and scales $\unicode[STIX]{x1D70C}_{2,2},\unicode[STIX]{x1D70C}_{2,1},\unicode[STIX]{x1D70C}_{2,0}$ , and the claim follows.◻
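Remark. The two additive quadruples appearing in the proofs of Theorem 9.4 and Corollary 9.5 can be checked by hand. The following is a sketch under the assumption that an additive quadruple is defined by the relation $a_{(1)}+a_{(2)}=a_{(3)}+a_{(4)}$ , as indicated by the identity $\mathbf{a}_{(1)}=\mathbf{a}_{(3)}+\mathbf{a}_{(4)}-\mathbf{a}_{(2)}$ used in the proof of Theorem 9.12. For the quadruple $(h_{2}-h_{2}^{\prime },k_{2}-k_{2}^{\prime },h_{2}-k_{2}^{\prime },k_{2}-h_{2}^{\prime })$ one has
$$\begin{eqnarray}(h_{2}-h_{2}^{\prime })+(k_{2}-k_{2}^{\prime })=h_{2}+k_{2}-h_{2}^{\prime }-k_{2}^{\prime }=(h_{2}-k_{2}^{\prime })+(k_{2}-h_{2}^{\prime }),\end{eqnarray}$$
while for the centre $(h_{2},k_{2}-k_{2}^{\prime },h_{2}-k_{2}^{\prime },k_{2})$ one has
$$\begin{eqnarray}h_{2}+(k_{2}-k_{2}^{\prime })=(h_{2}-k_{2}^{\prime })+k_{2}.\end{eqnarray}$$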
9.6 Third step: $\unicode[STIX]{x1D709}$ is approximately linear 99% of the time on a rough set
The next general step in the standard inverse $U^{3}$ argument is to upgrade this weak additive structure, which is of a “1 percent” nature, to a more robust “99 percent” additive structure. There are two basic ways to proceed here. The first way is to invoke the Balog–Szemerédi–Gowers theorem [Reference Balog and Szemerédi1, Reference Gowers11], followed by standard sum set estimates including Freiman’s theorem (see e.g. [Reference Tao and Vu33, Ch. 2]). It is likely that this approach will eventually work here, but these results need to be localized efficiently to Bohr sets, and also to allow for the fact that $\unicode[STIX]{x1D709}(a_{(1)})+\unicode[STIX]{x1D709}(a_{(2)})-\unicode[STIX]{x1D709}(a_{(3)})-\unicode[STIX]{x1D709}(a_{(4)})$ no longer vanishes, but instead has controlled word norm. This would require reworking of large portions of the standard additive combinatorics literature. We have thus elected instead to follow the second approach, also due to Gowers [Reference Gowers12], in which a certain probabilistic argument is used to “purify” a 1 percent additive map to a 99 percent additive map, albeit on a set that has no particular structure itself. To deal with this set we will use a more recent innovation, namely a variantFootnote 4 of the arithmetic regularity lemma [Reference Green13], [Reference Green and Tao18] to make the subsets of $\mathbb{Z}/p\mathbb{Z}$ on which one has good control of $\unicode[STIX]{x1D709}$ suitably “pseudorandom” in the sense of Gowers.
We turn to the details. We first locate a reasonably large quadruple of sets $A_{(1)},A_{(2)},A_{(3)},A_{(4)}$ on which $\unicode[STIX]{x1D709}$ is “almost a Freiman homomorphism” in the sense that most quadruples falling inside $A_{(1)}\times A_{(2)}\times A_{(3)}\times A_{(4)}$ are somewhat good. We call an additive quadruple $(a_{(1)},a_{(2)},a_{(3)},a_{(4)})\in \operatorname{Q}$ very bad if
and let $\operatorname{VBQ}\subset \operatorname{BQ}$ denote the space of all very bad additive quadruples.
Theorem 9.7. Let the notation and hypotheses be as in Theorem 8.1, and let $\unicode[STIX]{x1D6FA}$ and $\unicode[STIX]{x1D709}$ be as in Theorem 9.2. Let $\vec{a}$ be the random additive quadruple from Corollary 9.5. Then there exist sets $A_{(1)},A_{(2)},A_{(3)},A_{(4)}\subset \unicode[STIX]{x1D6FA}$ such that
where $W:\text{Q}\rightarrow \mathbb{R}$ is the weight function
The idea here is that $W$ is a weight function that strongly penalizes very bad quadruples, and so Theorem 9.7 is asserting that “most” of the quadruples in $A_{(1)}\times A_{(2)}\times A_{(3)}\times A_{(4)}$ are not very bad.
Proof. We will construct the sets $A_{(i)}$ by the probabilistic method, adapting an argument from [Reference Gowers12] in which the $A_{(i)}$ are created by applying a number of random linear “filters” to the graph of $\unicode[STIX]{x1D709}$ to eliminate most of the additive quadruples that are not (almost) preserved by $\unicode[STIX]{x1D709}$ .
We turn to the details. Let $m$ be the integer
We then select jointly independent random variables $\mathbf{h}_{j}\in \mathbb{Z}/p\mathbb{Z}$ and $\boldsymbol{\unicode[STIX]{x1D706}}_{j}\in \mathbb{Z}/p\mathbb{Z}$ for each $j=1,\ldots ,m$ , by selecting each $\mathbf{h}_{j}$ regularly from $B(S,\unicode[STIX]{x1D70C}_{2})$ , and selecting $\boldsymbol{\unicode[STIX]{x1D706}}_{j}$ uniformly at random from $\mathbb{Z}/p\mathbb{Z}$ ; we also choose these random variables to be independent of $\vec{\mathbf{a}}$ . For $j=1,\ldots ,m$ , we then let $\boldsymbol{\unicode[STIX]{x1D6EF}}_{\!j}:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{R}/\mathbb{Z}$ be the random map
and then define the random sets
for $i=1,2,3,4$ , where
and
We will show that
and
which will give the claim thanks to (9.13) and (9.12), if $C_{1}$ is large enough.
We first show (9.15). By Corollary 9.5 and linearity of expectation, it suffices to show that
whenever $(a_{(1)},a_{(2)},a_{(3)},a_{(4)})$ lies in $\unicode[STIX]{x1D6FA}^{4}\cap (\operatorname{Q}\backslash \operatorname{BQ})$ . Actually, we will only show the weaker assertion that (9.17) holds for all but at most $O(m^{O(1)}p^{2})$ of the available additive quadruples $(a_{(1)},a_{(2)},a_{(3)},a_{(4)})$ ; this still suffices, since by (4.3), (9.1) each exceptional additive quadruple is attained with probability $O(1/\unicode[STIX]{x1D70C}_{3}^{O(K)}p^{3})$ , and the additional factor of $p$ will dominate all the losses in $m,K,\unicode[STIX]{x1D70C}_{3}$ thanks to (8.3), (9.13).
Fix an additive quadruple $\vec{a}=(a_{(1)},a_{(2)},a_{(3)},a_{(4)})$ in $\unicode[STIX]{x1D6FA}^{4}\cap (\operatorname{Q}\backslash \operatorname{BQ})$ . The left-hand side of (9.17) factors as
so it will suffice to show that for each $j=1,\ldots ,m$ , one has
for all but $O(m^{O(1)}p^{2})$ quadruples $(a_{(1)},a_{(2)},a_{(3)},a_{(4)})\in \operatorname{Q}\backslash \operatorname{BQ}$ . Note however that from (9.14) we have
and hence by the hypothesis $(a_{(1)},a_{(2)},a_{(3)},a_{(4)})\in \operatorname{Q}\backslash \operatorname{BQ}$ and the range of $\mathbf{h}_{j}$ we have
(for example). In particular, we see from the triangle inequality that the claim $a_{(4)}\in \mathbf{A}_{(4),j}$ is implied by the claims $a_{(i)}\in \mathbf{A}_{(i),j}$ for $i=1,2,3$ . Thus it suffices to show that
for all but $O(m^{O(1)}p^{2})$ triples $(a_{(1)},a_{(2)},a_{(3)})\in (\mathbb{Z}/p\mathbb{Z})^{3}$ , noting that $a_{(4)}$ is determined by $a_{(1)},a_{(2)},a_{(3)}$ . We can write the left-hand side as
where we view the interval $[-1/200,1/200]$ as a subset of $\mathbb{R}/\mathbb{Z}$ . Thus it will suffice to show the equidistribution property
Let $\unicode[STIX]{x1D713}:(\mathbb{R}/\mathbb{Z})^{3}\rightarrow [0,1]$ be a Lipschitz cutoff supported on $[-1/20,1/20]^{3}$ that equals one on $[-1/200+1/m,1/200-1/m]^{3}$ and has Lipschitz constant $O(m)$ . Then we may lower bound the left-hand side by
By standard Fourier expansion (see e.g. [Reference Green and Tao17, Lemma A.9]), we may write
for all $y\in (\mathbb{R}/\mathbb{Z})^{3}$ and some bounded Fourier coefficients $c_{k}=O(1)$ ; integrating in $y$ , we see in particular that $c_{0}=10^{-3}+O(1/m)$ . We may thus write (9.19) as
which gives the desired claim as long as there are no relations of the form
for some non-zero $k\in \mathbb{Z}^{3}$ with $k=O(m^{O(1)})$ . But it is easy to see that the number of $(a_{(1)},a_{(2)},a_{(3)})$ with such a relation is $O(m^{O(1)}p^{2})$ , thus concluding the proof of (9.15).
Now we show (9.16). By linearity of expectation as before, it suffices to show that
for all but $O(m^{O(1)}p^{2})$ of the quadruples $(a_{(1)},a_{(2)},a_{(3)},a_{(4)})$ in $\operatorname{VBQ}$ . Using the factorization (9.18), it suffices to show that for each $j=1,\ldots ,m$ , one has
for all but $O(m^{O(1)}p^{2})$ of the quadruples $(a_{(1)},a_{(2)},a_{(3)},a_{(4)})$ in $\operatorname{VBQ}$ .
The left-hand side may be written as
which we bound above by
where $\unicode[STIX]{x1D70E}:=\unicode[STIX]{x1D709}(a_{(1)})+\unicode[STIX]{x1D709}(a_{(2)})-\unicode[STIX]{x1D709}(a_{(3)})-\unicode[STIX]{x1D709}(a_{(4)})$ . By arguing as in the proof of (9.15), we see that after deleting $O(m^{O(1)}p^{2})$ exceptional tuples, one has
so by Fubini’s theorem and the independence of $\mathbf{h}_{j}$ and $\boldsymbol{\unicode[STIX]{x1D706}}_{j}$ it will suffice to show that
However, by Lemma 4.6 and the hypothesis $(a_{(1)},a_{(2)},a_{(3)},a_{(4)})\in \operatorname{VBQ}$ we may find $h\in \mathbb{Z}/p\mathbb{Z}$ such that
In particular, $h$ is non-zero. By repeatedly doubling $h$ until $\Vert \unicode[STIX]{x1D70E}h/p\Vert _{\mathbb{R}/\mathbb{Z}}$ exceeds ${\textstyle \frac{1}{4}}$ , we may also assume that
and thus
From Lemma 4.4 we conclude that
But from the triangle inequality we see that the events $\Vert \unicode[STIX]{x1D70E}(\mathbf{h}_{j}+h)/p\Vert _{\mathbb{R}/\mathbb{Z}}\leqslant {\textstyle \frac{1}{8}}$ , $\Vert \unicode[STIX]{x1D70E}\mathbf{h}_{j}/p\Vert _{\mathbb{R}/\mathbb{Z}}\leqslant {\textstyle \frac{1}{8}}$ are disjoint. The claim follows.◻
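Remark. The doubling step and the disjointness claim at the end of the above proof appear to rest on two elementary facts about the norm $\Vert \cdot \Vert _{\mathbb{R}/\mathbb{Z}}$ (distance to the nearest integer), which we record here as a sketch; the symbols $x,y$ denote generic elements of $\mathbb{R}/\mathbb{Z}$ and are introduced for this illustration only. First, doubling an element of small norm doubles its norm:
$$\begin{eqnarray}\Vert 2x\Vert _{\mathbb{R}/\mathbb{Z}}=2\Vert x\Vert _{\mathbb{R}/\mathbb{Z}}\quad \text{whenever }\Vert x\Vert _{\mathbb{R}/\mathbb{Z}}\leqslant {\textstyle \frac{1}{4}}.\end{eqnarray}$$
Consequently, if $\Vert x\Vert _{\mathbb{R}/\mathbb{Z}}>0$ , then the first iterate $2^{t}x$ whose norm exceeds ${\textstyle \frac{1}{4}}$ exists and has norm in $({\textstyle \frac{1}{4}},{\textstyle \frac{1}{2}}]$ . Second, if both $\Vert x+y\Vert _{\mathbb{R}/\mathbb{Z}}\leqslant {\textstyle \frac{1}{8}}$ and $\Vert y\Vert _{\mathbb{R}/\mathbb{Z}}\leqslant {\textstyle \frac{1}{8}}$ held, the triangle inequality would give
$$\begin{eqnarray}\Vert x\Vert _{\mathbb{R}/\mathbb{Z}}\leqslant \Vert x+y\Vert _{\mathbb{R}/\mathbb{Z}}+\Vert y\Vert _{\mathbb{R}/\mathbb{Z}}\leqslant {\textstyle \frac{1}{8}}+{\textstyle \frac{1}{8}}={\textstyle \frac{1}{4}},\end{eqnarray}$$
so the two events cannot occur simultaneously once $\Vert x\Vert _{\mathbb{R}/\mathbb{Z}}>{\textstyle \frac{1}{4}}$ .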
9.8 Fourth step: the rough set is pseudorandom in a Bohr set
The sets $A_{(i)}$ provided by Theorem 9.7 are currently rather arbitrary. In particular we have no control on the pseudorandomness of these sets (as measured by local Gowers $U^{2}$ norms) in the Bohr sets we are working with. However, it is possible to use an “energy decrement argument” to pass to smallerFootnote 5 Bohr sets in which the sets $A_{(i)}$ do enjoy good pseudorandomness properties, basically by converting any large Fourier coefficient of any of the $A_{(i)}$ in a Bohr set into a refinement of the Bohr sets (which add the frequency of the large Fourier coefficient to the frequency set $S$ ) on which the indicator function $1_{A_{(i)}}$ has smaller variance. Furthermore, it is possible to shrink the Bohr sets in this fashion without destroying the conclusion (9.11) of Theorem 9.7.
Here is a precise statement.
Theorem 9.9. Let the notation and hypotheses be as in Theorem 8.1, and let $\unicode[STIX]{x1D6FA}$ and $\unicode[STIX]{x1D709}$ be as in Theorem 9.2. Let $A_{(1)},A_{(2)},A_{(3)},A_{(4)},W$ be as in Theorem 9.7. Then there exists a natural number $j$ , $j\leqslant \unicode[STIX]{x1D702}^{-10^{3}C_{1}}$ , an additive quadruple $\vec{a}_{1}=(a_{(1),1},a_{(2),1},a_{(3),1},a_{(4),1})\in \operatorname{Q}$ , and a set $S_{1}$ , $S\subset S_{1}\subset \mathbb{Z}/p\mathbb{Z}$ with $|S_{1}|\leqslant |S|+j$ , with the following properties.
(i) (Few very bad quadruples) We have
(9.20) $$\begin{eqnarray}\mathbb{E}W(\vec{\mathbf{a}})\gg \unicode[STIX]{x1D702}^{C_{1}+O(1)},\end{eqnarray}$$
where $\vec{\mathbf{a}}$ is a random additive quadruple centred at $\vec{a}_{1}$ with frequencies $S_{1}$ and scales $\unicode[STIX]{x1D70C}_{2,j+2}$ , $\unicode[STIX]{x1D70C}_{2,j+1}$ , and $\unicode[STIX]{x1D70C}_{2,j}$ .
(ii) (Local Fourier pseudorandomness) For each $i=1,2,3,4$ , we have
$$\begin{eqnarray}\displaystyle & & \displaystyle |\mathbb{E}f_{i}(\mathbf{a}_{(i)}+\mathbf{h}_{0}+\mathbf{h}_{1})f_{i}(\mathbf{a}_{(i)}+\mathbf{h}_{0}+\mathbf{h}_{1}^{\prime })f_{i}(\mathbf{a}_{(i)}+\mathbf{h}_{0}^{\prime }+\mathbf{h}_{1})\nonumber\\ \displaystyle & & \displaystyle \quad \times \,f_{i}(\mathbf{a}_{(i)}+\mathbf{h}_{0}^{\prime }+\mathbf{h}_{1}^{\prime })|\leqslant \unicode[STIX]{x1D702}^{100C_{1}},\nonumber\end{eqnarray}$$
where $f_{i}:\mathbb{Z}/p\mathbb{Z}\rightarrow [-1,1]$ denotes the balanced function
(9.21) $$\begin{eqnarray}f_{i}(a_{(i)}):=1_{A_{(i)}}(a_{(i)})-\unicode[STIX]{x1D6FC}_{i},\end{eqnarray}$$
$\unicode[STIX]{x1D6FC}_{i}$ denotes the mean
(9.22) $$\begin{eqnarray}\unicode[STIX]{x1D6FC}_{i}:=\mathbb{E}1_{A_{(i)}}(\mathbf{a}_{(i)}),\end{eqnarray}$$
and where $\mathbf{a}_{(i)}$ and $\mathbf{h}_{0},\mathbf{h}_{0}^{\prime },\mathbf{h}_{1},\mathbf{h}_{1}^{\prime }$ are drawn independently and regularly from the Bohr sets $a_{(i),1}+B(S_{1},\unicode[STIX]{x1D70C}_{2,j+l_{i}})$ and $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+10})$ , $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+10})$ , $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+11})$ , $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+11})$ respectively, with the quantity $l_{i}$ given by (9.9).
Proof. We will formulate the “energy decrement” argument here as a “score maximization” argument. Define a $4$ -neighbourhood to be a tuple
where $\vec{a}_{1}\in \text{Q}$ is an additive quadruple, $j$ is a natural number between $0$ and $\unicode[STIX]{x1D702}^{-10^{3}C_{1}}$ , and $S_{1}$ is a subset of $\mathbb{Z}/p\mathbb{Z}$ containing $S$ with $|S_{1}|\leqslant |S|+j$ ; we refer to $j$ as the depth of the $4$ -neighbourhood $N$ . Given such a neighbourhood, we define the score $\operatorname{Score}(N)$ of the $4$ -neighbourhood to be the quantity
where $\vec{\mathbf{a}}=(\mathbf{a}_{(1)},\mathbf{a}_{(2)},\mathbf{a}_{(3)},\mathbf{a}_{(4)})$ is a random additive quadruple centred at $\vec{a}_{1}$ with frequencies $S_{1}$ and scales $\unicode[STIX]{x1D70C}_{2,j+2},\unicode[STIX]{x1D70C}_{2,j+1},\unicode[STIX]{x1D70C}_{2,j}$ , and $\operatorname{E}_{i}$ is the energy-type quantity
If we define $N_{0}$ to be the $4$ -neighbourhood
then Theorem 9.7 tells us that
We choose
to be a $4$ -neighbourhood that comes within $\unicode[STIX]{x1D702}^{10^{3}C_{1}}$ (for example) of maximizing the adjusted score. Then we must have
which from (9.23) implies the bound (9.20), as well as the bound
(for example). It will then suffice to show that property (ii) of the theorem holds.
To establish (ii), fix $i\in \{1,2,3,4\}$ , and write
Suppose for contradiction that
where $f_{i}$ is given by (9.21), and $\mathbf{a}_{(i)},\mathbf{h}_{0},\mathbf{h}_{0}^{\prime },\mathbf{h}_{1},\mathbf{h}_{1}^{\prime }$ are drawn independently and regularly from the Bohr sets $a_{(i),1}+B(S_{1},\unicode[STIX]{x1D70C}_{2,j+l_{i}})$ , $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+10})$ , $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+10})$ , $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+11})$ , $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+11})$ , with $l_{i}$ given by (9.9).
We will use (9.26) to construct a random $4$ -neighbourhood $\mathbf{N}$ of depth $j+20$ obeying the estimates
and
for $i^{\prime }=1,2,3,4$ . If we have the estimates (9.27), (9.28), we conclude from (9.23) and linearity of expectation that
contradicting the near-maximality of $\operatorname{Score}(N)$ .
It remains to construct $\mathbf{N}$ obeying (9.27), (9.28). We begin by noting that for each $a_{(i)}\in \mathbb{Z}/p\mathbb{Z}$ , the Gowers uniformity-type quantity
can be factored as
and thus takes values between $0$ and $1$ . By (9.26) and Lemma 2.2, we may thus find a set $E\subset \mathbb{Z}/p\mathbb{Z}$ with
such that
for all $a_{(i)}\in E$ . Applying Theorem 4.12, we may thus find, for each $a_{(i)}\in E$ , a frequency $\unicode[STIX]{x1D709}(a_{(i)})\in \mathbb{Z}/p\mathbb{Z}$ such that
where $\mathbf{n}_{0},\mathbf{n}_{1}$ are drawn independently and regularly from $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+10})$ and $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+11})$ respectively, independently of $\mathbf{a}_{(i)}$ .
If we define $\unicode[STIX]{x1D709}(a_{(i)})$ arbitrarily for $a_{(i)}\not \in E$ (e.g. setting $\unicode[STIX]{x1D709}(a_{(i)})=0$ ), we thus have
In particular, there exists a $1$ -bounded function $g:\mathbb{Z}/p\mathbb{Z}\times \mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ such that
We now construct the random $4$ -neighbourhood $\mathbf{N}$ as follows. We first construct a random additive quadruple $\vec{\mathbf{k}}=(\mathbf{k}_{1},\mathbf{k}_{2},\mathbf{k}_{3},\mathbf{k}_{4})$ centred at the origin $(0,0,0,0)$ with frequency set $S_{1}$ and scales $\unicode[STIX]{x1D70C}_{2,j+10+l_{2}-l_{i}}$ , $\unicode[STIX]{x1D70C}_{2,j+10+l_{3}-l_{i}}$ , $\unicode[STIX]{x1D70C}_{2,j+10+l_{4}-l_{i}}$ , and independent of all previous random variables. We then set
It is easy to verify that $\mathbf{N}$ is a (random) $4$ -neighbourhood.
We now verify (9.27). The left-hand side of (9.27) can be expanded as
where, once $\vec{\mathbf{a}}$ and $\vec{\mathbf{k}}$ are chosen, the random additive quadruple $\vec{\mathbf{h}}=(\mathbf{h}_{1},\mathbf{h}_{2},\mathbf{h}_{3},\mathbf{h}_{4})$ is selected to be centred at $(0,0,0,0)$ with frequencies $S_{1}\cup \{\unicode[STIX]{x1D709}(\mathbf{a}_{(i)})\}$ and scales $\unicode[STIX]{x1D70C}_{2,j+22},\unicode[STIX]{x1D70C}_{2,j+21},\unicode[STIX]{x1D70C}_{2,j+20}$ .
From two applications of Lemma 4.4 (and the fact that $W=O(\unicode[STIX]{x1D702}^{-C_{1}/100})$ ), we have
(for example). The claim (9.27) now follows from (9.23).
Now we verify (9.28). By (9.24), we have
where $\vec{a}=(a_{(1)},\ldots ,a_{(4)})$ , $\vec{k}=(k_{1},\ldots ,k_{4})$ , and $\unicode[STIX]{x1D6FC}_{i^{\prime },\vec{a},\vec{k}}$ is the quantity
By Pythagoras’ theorem, we thus have
where $\unicode[STIX]{x1D6FC}_{i^{\prime }}$ is defined in (9.22). We shall shortly establish the bound
Assuming this bound, we conclude that
By applying Lemma 4.4 twice as in the proof of (9.27) to replace $\mathbf{a}_{(i^{\prime })}+\mathbf{k}_{i^{\prime }}+\mathbf{h}_{i^{\prime }}$ by $\mathbf{a}_{(i^{\prime })}$ for $i^{\prime }=2,3,4$ (and by using Lemma 4.4 six times for $i^{\prime }=1$ , after writing $\mathbf{a}_{(1)}$ in terms of $\mathbf{a}_{(2)},\mathbf{a}_{(3)},\mathbf{a}_{(4)}$ , and similarly for $\mathbf{k}_{1}$ and $\mathbf{h}_{1}$ ) we thus have
This will give (9.28) as soon as we establish (9.31). This is trivial for $i^{\prime }\neq i$ , so suppose that $i^{\prime }=i$ . By (9.30) and (9.21), it suffices to show that
To prove this, we introduce random variables $\mathbf{n}_{0},\mathbf{n}_{1}$ drawn independently and regularly from $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+10})$ and $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+11})$ independently of all previous variables. From (9.29) we have
for some $1$ -bounded function $g$ . After using Lemma 4.4 to compare $\mathbf{n}_{1}$ and $\mathbf{n}_{1}+\mathbf{h}_{i}$ for each fixed choice of $\mathbf{n}_{0}$ and $\mathbf{a}_{(i)}$ , we conclude that
But we have
and hence by (2.2)
We conclude that
For fixed choices of $\mathbf{a}_{(i)},\mathbf{h}_{i},\mathbf{n}_{1}$ , we see from Lemma 4.4 that $\mathbf{k}_{i}$ and $\mathbf{n}_{0}+\mathbf{n}_{1}$ differ in total variation by $O(\unicode[STIX]{x1D702}^{10^{3}C_{1}})$ . Thus we have
and the claim now follows after using Lemma 2.1 to eliminate the $g(\mathbf{k}_{i}-\mathbf{n}_{1},\mathbf{a}_{(i)})e_{p}(-\unicode[STIX]{x1D709}(\mathbf{a}_{(i)})\mathbf{n}_{1})$ factor.◻
A useful consequence of the bounds in Theorem 9.9(ii) is the following weak mixing bound, which roughly speaking asserts that the convolution of $1_{A_{(i)}}$ with a bounded function is essentially constant.
Lemma 9.10. Let the notation and hypotheses be as above, and let $\unicode[STIX]{x1D6FA}$ and $\unicode[STIX]{x1D709}$ be as in Theorem 9.2. Let $A_{(1)},\ldots ,A_{(4)}$ be as in Theorem 9.7, and let $j,a_{(1),1},\ldots ,a_{(4),1},S_{1},f_{1},\ldots ,f_{4}$ be as in Theorem 9.9. Then for any $i=1,2,3,4$ , any $l_{i}<m\leqslant 10$ , and any $1$ -bounded function $g:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ , one has
where $\mathbf{n},\mathbf{k}$ are drawn independently and regularly from $a_{(i),1}+B(S_{1},\unicode[STIX]{x1D70C}_{2,j})$ and $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+m})$ respectively. Dually, for any $1$ -bounded function $G:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ , one has
Proof. In preparation for invoking Theorem 9.9(ii), we introduce random variables $\mathbf{h}_{0},\mathbf{h}_{1},\mathbf{h}_{1}^{\prime }$ drawn independently and regularly from $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+10})$ , $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+11})$ , and $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+11})$ respectively, independently of $\mathbf{n}$ and $\mathbf{k}$ . Using Lemma 4.4 to compare $\mathbf{n},\mathbf{k}$ with $\mathbf{n}+\mathbf{h}_{0}$ , $\mathbf{k}-\mathbf{h}_{1}$ respectively, we may transform (9.33) to the estimate
By the triangle inequality in $L^{2}$ , it thus suffices to show that
for all $k\in B(S_{1},\unicode[STIX]{x1D70C}_{2,j+m})$ .
Fix $k$ . We may expand out the left-hand side of (9.35) as
Using Lemma 4.4 to compare $\mathbf{n}$ with $\mathbf{n}+\mathbf{h}_{0}-\mathbf{h}_{1}-\mathbf{h}_{1}^{\prime }-k$ , we can thus rewrite (9.35) as
which by the triangle inequality and the $1$ -boundedness of $g$ would follow from
which by Cauchy–Schwarz will follow in turn from
But this follows from Theorem 9.9(ii) (relabelling $\mathbf{n}$ as $\mathbf{a}_{(i)}$ ).
Finally, we show (9.34). By subtracting $\mathbb{E}G(\mathbf{n})$ from $G$ (and dividing by $2$ to recover $1$ -boundedness), we may assume that $\mathbb{E}G(\mathbf{n})=0$ . It then suffices to show that
for any $1$ -bounded function $g$ . But the left-hand side may be rearranged as
and the claim follows from (9.33) and the Cauchy–Schwarz inequality. ◻
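The weak mixing phenomenon of Lemma 9.10 has a simple global analogue that can be tested numerically. The Python sketch below works on all of $\mathbb{Z}/p\mathbb{Z}$ rather than on Bohr sets, so it illustrates the mechanism only and is not the localized statement proved above. It relies on the identity $\mathbb{E}_{n}|\mathbb{E}_{k}f(n+k)g(k)|^{2}=\sum _{\unicode[STIX]{x1D709}}|\hat{f}(\unicode[STIX]{x1D709})|^{2}|\hat{g}(-\unicode[STIX]{x1D709})|^{2}$ , which is at most $\max _{\unicode[STIX]{x1D709}}|\hat{f}(\unicode[STIX]{x1D709})|^{2}$ when $g$ is $1$ -bounded. The function names, the random set, and the prime $p=31$ are hypothetical choices.

```python
import cmath
import random

def fourier_coefficients(f, p):
    """Normalised Fourier transform on Z/pZ: hat f(xi) = E_x f(x) e(-xi*x/p)."""
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * xi * x / p) for x in range(p)) / p
        for xi in range(p)
    ]

def mixing_energy(f, g, p):
    """E_n |E_k f(n + k) g(k)|^2, with n and k uniform on Z/pZ."""
    total = 0.0
    for n in range(p):
        inner = sum(f[(n + k) % p] * g[k] for k in range(p)) / p
        total += abs(inner) ** 2
    return total / p

random.seed(0)
p = 31
indicator = [1 if random.random() < 0.5 else 0 for _ in range(p)]  # a random set A
density = sum(indicator) / p
balanced = [value - density for value in indicator]                # 1_A minus its mean
g = [cmath.exp(2j * cmath.pi * random.random()) for _ in range(p)]  # |g| = 1 pointwise

largest = max(abs(c) for c in fourier_coefficients(balanced, p))
energy = mixing_energy(balanced, g, p)  # bounded by largest**2 via Parseval
```

Thus small Fourier coefficients of the balanced function force the averages $\mathbb{E}_{k}f(n+k)g(k)$ to be small for most $n$ , which is the global counterpart of (9.33).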
9.11 Fifth step: a frequency function $\unicode[STIX]{x1D709}^{\prime }$ that is approximately linear 99% of the time on a Bohr neighbourhood
The next step is to obtain additive structure on almost all of a Bohr neighbourhood, rather than just the subsets $A_{(i)}$ .
Theorem 9.12. Let the notation and hypotheses be as in Theorem 8.1, and let $\unicode[STIX]{x1D709}$ be as in Theorem 9.2. Let $A_{(1)},\ldots ,A_{(4)}$ be as in Theorem 9.7, and let $j,a_{(1),1},a_{(2),1},a_{(3),1},a_{(4),1},S_{1},\unicode[STIX]{x1D6FC}_{1},\ldots ,\unicode[STIX]{x1D6FC}_{4}$ be as in Theorem 9.9. Let $a_{1}\in \mathbb{Z}/p\mathbb{Z}$ be the quantity
and let $\mathbf{a}$ and $\mathbf{a}_{(2)}$ be drawn regularly and independently from $a_{1}+B(S_{1},\unicode[STIX]{x1D70C}_{2,j})$ and $a_{(2),1}+B(S_{1},\unicode[STIX]{x1D70C}_{2,j+2})$ respectively. Then there is a function $\unicode[STIX]{x1D709}^{\prime }:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{Z}/p\mathbb{Z}$ , such that with probability at least $1-O(\unicode[STIX]{x1D702}^{C_{1}/200})$ , the random variable $\mathbf{a}$ attains a value $a$ for which we have the estimates
and
Proof. Let $\mathbf{a}$ be drawn regularly from $a_{1}+B(S_{1},\unicode[STIX]{x1D70C}_{2,j})$ , and let $(\mathbf{a}_{(1)},\mathbf{a}_{(2)},\mathbf{a}_{(3)},\mathbf{a}_{(4)})$ be a random additive quadruple centred at $(a_{(1),1},a_{(2),1},a_{(3),1},a_{(4),1})$ with frequencies $S_{1}$ and scales $\unicode[STIX]{x1D70C}_{2,j+2},\unicode[STIX]{x1D70C}_{2,j+1},\unicode[STIX]{x1D70C}_{2,j}$ , independently of $\mathbf{a}$ . From the definition of an additive quadruple, we have $\mathbf{a}_{(1)}=\mathbf{a}_{(3)}+\mathbf{a}_{(4)}-\mathbf{a}_{(2)}$ . From Theorem 9.9(i) we thus have
From Lemma 4.4 we see that once we condition $\mathbf{a}_{(2)}$ and $\mathbf{a}_{(3)}$ to be fixed, $\mathbf{a}_{(4)}$ and $\mathbf{a}-\mathbf{a}_{(3)}$ differ in total variation by $O(\unicode[STIX]{x1D702}^{100C_{1}})$ . Thus we may replace $\mathbf{a}_{(4)}$ by $\mathbf{a}-\mathbf{a}_{(3)}$ in (9.38) to conclude that
If we then define
then from (9.12) we see that
and
We can express $\unicode[STIX]{x1D70E}$ in the form
where $g_{12},g_{34}:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{R}$ are the functions
and
From Lemma 9.10, we have
if $\mathbf{n},\mathbf{k}$ are drawn independently and regularly from $a_{(i),1}+B(S_{1},\unicode[STIX]{x1D70C}_{2,j})$ and $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+m})$ respectively. Note that the pair $(\mathbf{n},\mathbf{k})$ has the same distribution as $(\mathbf{a}-a_{(2),1},\mathbf{a}_{(2)}-a_{(2),1})$ , thus
From (9.21), (9.22), (9.42) we have
and thus
Similarly we have
From Cauchy–Schwarz and the triangle inequality we conclude that
and hence by (9.41) and the triangle inequality
In particular, from (9.39) one has
From (9.45), (9.46) and (9.40) we have
where
By Markov’s inequality, we conclude that we have
with probability $1-O(\unicode[STIX]{x1D702}^{C_{1}/200})$ . Similarly, from (9.43), (9.44) and Chebyshev’s inequality we also have
and
with probability $1-O(\unicode[STIX]{x1D702}^{C_{1}/200})$ .
Now let $a$ be a value of $\mathbf{a}$ such that (9.48)–(9.50) hold. From (9.50) we have in particular that
comparing this with (9.48) and (9.47), we see that we may find $a_{(3)}(a)\in A_{(3)}$ (depending only on $a$ ) with $a-a_{(3)}(a)\in A_{(4)}$ such that
If we then set $\unicode[STIX]{x1D709}^{\prime }(a):=\unicode[STIX]{x1D709}(a_{(3)}(a))+\unicode[STIX]{x1D709}(a-a_{(3)}(a))$ (and define $\unicode[STIX]{x1D709}^{\prime }(\mathbf{a})$ arbitrarily when (9.48), (9.49), or (9.50) fail), then the claims (9.36), (9.37) follow from (9.49) and the definition (9.10) of $\operatorname{VBQ}$ .◻
The function $\unicode[STIX]{x1D709}^{\prime }$ has better additive structure than $\unicode[STIX]{x1D709}$ , in that it respects almost all additive quadruples in a Bohr set, rather than almost all additive quadruples in a rough set. More precisely, we have the following.
Proposition 9.13. Let the notation and hypotheses be as in Theorem 9.12. Suppose that $\mathbf{a},\mathbf{a}^{\prime },\mathbf{h}$ are selected independently and regularly from $a_{1}+B(S_{1},\unicode[STIX]{x1D70C}_{2,j})$ , $a_{1}+B(S_{1},\unicode[STIX]{x1D70C}_{2,j})$ , and $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+3})$ respectively. Then with probability $1-O(\unicode[STIX]{x1D702}^{C_{1}/200})$ we have
Proof. Let $\mathbf{a}_{(2)}$ be drawn regularly from $a_{(2),1}+B(S_{1},\unicode[STIX]{x1D70C}_{2,j+2})$ , independently of $\mathbf{a},\mathbf{a}^{\prime },\mathbf{h}$ . For each $a,a^{\prime },h\in \mathbb{Z}/p\mathbb{Z}$ , let $\mathbf{I}_{a,a^{\prime },h}$ denote the random indicator variable
Suppose that we can show that with probability $1-O(\unicode[STIX]{x1D702}^{C_{1}/200})$ , the triple $(\mathbf{a},\mathbf{a}^{\prime },\mathbf{h})$ attains a value $(a,a^{\prime },h)$ for which one has the estimates
Assuming these estimates, we conclude from the union bound that with probability $1-O(\unicode[STIX]{x1D702}^{C_{1}/200})$ , the random variable $(\mathbf{a},\mathbf{a}^{\prime },\mathbf{h})$ attains a value $(a,a^{\prime },h)$ for which there exists at least one element $a_{(2)}$ of $\mathbb{Z}/p\mathbb{Z}$ obeying the constraints
and (9.51) then follows from the triangle inequality.
It remains to establish (9.52)–(9.56). We first prove (9.53). By Markov’s inequality, it suffices to show that
We rewrite the left-hand side as
where
and
But from (9.37) we have
from Lemma 4.4 one has
and from (9.33) one has
with probability $1-O(\unicode[STIX]{x1D702}^{10C_{1}})$ (for example), with the trivial bound $g(\mathbf{a}_{(2)})=O(1)$ otherwise, and the claim (9.53) then follows from (9.46).
The proofs of (9.54)–(9.56) are similar to (9.53) and are omitted. It thus remains to prove (9.52). From (9.34) and Markov’s inequality, we see that with probability $1-O(\unicode[STIX]{x1D702}^{C_{1}/200})$ , the random variable $\mathbf{h}$ attains a value $h$ for which
For any $h$ obeying this inequality, define $E(h)\subset \mathbb{Z}/p\mathbb{Z}$ to be the set
so that
By (9.33) and the Chebyshev inequality, we conclude that with probability $1-O(\unicode[STIX]{x1D702}^{C_{1}/200})$ , the random variable $(\mathbf{a},\mathbf{h})$ attains a value $(a,h)$ for which one has
For any $(a,h)$ of the above form, define $E^{\prime }(a,h)\subset \mathbb{Z}/p\mathbb{Z}$ to be the set
then
By one last application of (9.33) and the Chebyshev inequality, we see that with probability $1-O(\unicode[STIX]{x1D702}^{C_{1}/200})$ , the random variable $(\mathbf{a}^{\prime },\mathbf{a},\mathbf{h})$ attains a value $(a^{\prime },a,h)$ for which one has
which gives (9.52) as required. ◻
9.14 Sixth step: a frequency function $\unicode[STIX]{x1D709}^{\prime \prime }$ that is approximately linear 100% of the time on a Bohr set
We now use a standard “majority vote” argument to upgrade the “99% linear” structure of $\unicode[STIX]{x1D709}^{\prime }$ to a “100% linear” structure of a closely related function $\unicode[STIX]{x1D709}^{\prime \prime }$ (cf. [Reference Blum, Luby and Rubinfeld5]). More precisely, one has the following.
Theorem 9.15. Let the notation and hypotheses be as in Theorem 8.1. Let $j,S_{1}$ be as in Theorem 9.9, and let $a_{1}$ , $\unicode[STIX]{x1D709}^{\prime }$ be as in Theorem 9.12. Then there is a function $\unicode[STIX]{x1D709}^{\prime \prime }:B(S_{1},\unicode[STIX]{x1D70C}_{3})\rightarrow \mathbb{Z}/p\mathbb{Z}$ such that
for all $n,m\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/2)$ , and such that for any $n\in B(S_{1},\unicode[STIX]{x1D70C}_{3})$ , if $\mathbf{a}$ is drawn regularly from $a_{1}+B(S_{1},\unicode[STIX]{x1D70C}_{2,j})$ , one has
with probability $1-O(\unicode[STIX]{x1D702}^{C_{1}/200})$ .
Proof. Let $\mathbf{a},\mathbf{h}$ be drawn independently and regularly from $a_{1}+B(S_{1},\unicode[STIX]{x1D70C}_{2,j})$ and $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+3})$ respectively. From Proposition 9.13 and the pigeonhole principle, we may find $a_{0}^{\prime }\in \mathbb{Z}/p\mathbb{Z}$ such that
Fix this $a_{0}^{\prime }$ . Now let $n$ be an arbitrary element of $B(S_{1},\unicode[STIX]{x1D70C}_{3})$ . Then using Lemma 4.4 to compare $\mathbf{a}$ with $\mathbf{a}-n$ and $\mathbf{h}$ with $\mathbf{h}+n$ , we obtain
Combining this with (9.59) and the triangle inequality, we see that
Thus, by the pigeonhole principle, we may find $h_{n}\in \mathbb{Z}/p\mathbb{Z}$ such that
If we thus define
then we have obtained (9.58).
Now suppose that $n,m\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/2)$ . From (9.58), we see that with probability at least $1-O(\unicode[STIX]{x1D702}^{C_{1}/200})$ we have
and
Using Lemma 4.4 to compare $\mathbf{a}$ with $\mathbf{a}-n$ in the second inequality, we also conclude
with probability $1-O(\unicode[STIX]{x1D702}^{C_{1}/200})$ . Thus there is a positive probability that the first, third, and fourth estimates hold simultaneously, and the claim (9.57) follows from the triangle inequality.◻
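The majority-vote mechanism can be illustrated in a toy setting. The following Python sketch is not the construction used in the proof above: it works on all of $\mathbb{Z}/p\mathbb{Z}$ rather than on a Bohr set, it asks for exact rather than approximate linearity, and the helper names `corrupt` and `majority_vote` are hypothetical. It starts from a function that agrees with a linear map at most points and recovers an exactly linear function by voting over shifts.

```python
import random
from collections import Counter

def corrupt(p, c, num_bad, seed=0):
    """Return a table for n -> c*n mod p, altered at num_bad random points."""
    rng = random.Random(seed)
    table = [(c * n) % p for n in range(p)]
    for n in rng.sample(range(p), num_bad):
        table[n] = (table[n] + rng.randrange(1, p)) % p  # force a wrong value
    return table

def majority_vote(table, p):
    """For each n, take the most common value of table[a + n] - table[a] over all a."""
    out = []
    for n in range(p):
        votes = Counter((table[(a + n) % p] - table[a]) % p for a in range(p))
        out.append(votes.most_common(1)[0][0])
    return out

p, c = 101, 7
xi1 = corrupt(p, c, num_bad=10)   # toy analogue of a "mostly linear" function
xi2 = majority_vote(xi1, p)       # toy analogue of the corrected function
```

In this toy model each vote is correct whenever both $a$ and $a+n$ avoid the corrupted points, which happens for at least $101-20$ of the $101$ shifts, so the winning value is always $cn$.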
The function $\unicode[STIX]{x1D709}^{\prime \prime }$ is still closely related to $\unicode[STIX]{x1D709}$ , and in particular a variant of the correlation estimate (9.3) is obeyed by $\unicode[STIX]{x1D709}^{\prime \prime }$ .
Proposition 9.16. Let the notation and hypotheses be as in the preceding theorem. Then there exist $a_{0}\in B(S,3\unicode[STIX]{x1D70C}_{2})$ and $\unicode[STIX]{x1D709}_{0}\in \mathbb{Z}/p\mathbb{Z}$ such that
where $\mathbf{n},\mathbf{n}_{0},\mathbf{h}$ are drawn independently and regularly from the Bohr sets $B(S_{1},\unicode[STIX]{x1D70C}_{3}/4)$ , $B(S,\unicode[STIX]{x1D70C}_{0})$ , $B(S_{1},\unicode[STIX]{x1D70C}_{4})$ respectively.
With this proposition and the previous theorem, we may now safely forget about the original function $\unicode[STIX]{x1D709}$ and work instead with $\unicode[STIX]{x1D709}^{\prime \prime }$ ; the parameters $a_{1},j$ will also no longer be relevant.
Proof. Let $\mathbf{n}$ , $\mathbf{a}$ , $\mathbf{a}_{(2)}$ be drawn independently and regularly from $B(S_{1},\unicode[STIX]{x1D70C}_{3}/4)$ , $a_{1}+B(S_{1},\unicode[STIX]{x1D70C}_{2,j})$ , and $B(S_{1},\unicode[STIX]{x1D70C}_{2,j+2})$ respectively. From (9.58) we have
with probability $1-O(\unicode[STIX]{x1D702}^{C_{1}/200})$ . Similarly, from (9.36), (9.37), (9.46) we see that with probability $1-O(\unicode[STIX]{x1D702}^{C_{1}/200})$ , the random variable $\mathbf{a}$ attains a value $a$ for which
Using Lemma 4.4 to compare $\mathbf{a}$ and $\mathbf{a}-\mathbf{n}$ , we also see that with probability $1-O(\unicode[STIX]{x1D702}^{C_{1}/200})$ , the random variable $(\mathbf{a},\mathbf{n})$ attains a value $(a,n)$ for which
From the union bound and Fubini’s theorem, we conclude that with probability $\gg \unicode[STIX]{x1D6FC}_{1}\unicode[STIX]{x1D6FC}_{2}$ , we simultaneously have the statements
and hence by the triangle inequality
By the pigeonhole principle, we may thus find $a,a_{(2)}\in \mathbb{Z}/p\mathbb{Z}$ such that the statements
simultaneously hold with probability $\gg \unicode[STIX]{x1D6FC}_{1}\unicode[STIX]{x1D6FC}_{2}$ , and thus with probability $\gg \unicode[STIX]{x1D702}^{C_{1}+O(1)}$ thanks to (9.46). Writing $a_{0}:=a-a_{(2)}$ and $\unicode[STIX]{x1D709}_{0}:=\unicode[STIX]{x1D709}(a_{(2)})-\unicode[STIX]{x1D709}^{\prime }(a)$ , and recalling from Theorem 9.7 that $A_{(1)}\in S$ , we thus have
In particular, since $\mathbf{n}\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/4)$ and $S\subset B(S,2\unicode[STIX]{x1D70C}_{2})$ , we have $a_{0}\in B(S,3\unicode[STIX]{x1D70C}_{2})$ .
Let $\mathbf{n}_{0},\mathbf{n}_{1}$ be drawn independently and regularly from $B(S,\unicode[STIX]{x1D70C}_{0}),B(S,\unicode[STIX]{x1D70C}_{1})$ respectively, independently of all previous random variables. From the above estimate and (9.3), we see that with probability $\gg \unicode[STIX]{x1D702}^{C_{1}+O(1)}$ , the random variable $\mathbf{n}$ attains a value $n$ for which the statements
simultaneously hold.
Let $n$ obey the above estimates (9.60)–(9.62). If we now draw $\mathbf{h}$ regularly from $B(S_{1},\unicode[STIX]{x1D70C}_{4})$ , then by using Lemma 4.4 to compare $\mathbf{n}_{1}$ with $\mathbf{n}_{1}+\mathbf{h}$ in (9.62), we obtain
and thus by the triangle inequality in $L^{2}$
We may delete the deterministic phase $e_{p}(-\unicode[STIX]{x1D709}(a_{0}-n)n_{1})$ to obtain
Since $\mathbf{h}$ takes values in $B(S_{1},\unicode[STIX]{x1D70C}_{4})$ , we see from (9.61) that
(for example), and so
Using Lemma 4.4 to compare $\mathbf{n}_{0}$ with $\mathbf{n}_{0}+\mathbf{n}_{1}$ , we conclude that
Multiplying by $\mathbb{P}(\mathbf{n}=n)$ and summing in $n$ , we obtain the claim. ◻
9.17 Seventh step: derivatives of $f$ correlate with a locally bilinear form
We now pass to the “cohomological” phase of the argument, in which we remove the error $\unicode[STIX]{x1D709}^{\prime \prime }(n+m)-\unicode[STIX]{x1D709}^{\prime \prime }(n)-\unicode[STIX]{x1D709}^{\prime \prime }(m)$ in the linearity of $\unicode[STIX]{x1D709}^{\prime \prime }$ that appears in (9.57). This improved linearity of the form $(n,h)\mapsto \unicode[STIX]{x1D709}(n)h$ in the $n$ aspect will come at the expense of the $h$ aspect, which will now merely be locally linear instead of globally linear. However, this is a worthwhile tradeoff for our purposes (and in any event local linearity is more natural in this context than global linearity).
More precisely, the purpose of this subsection is to establish the following result towards the proof of Theorem 8.1.
Theorem 9.18. Let the notation and hypotheses be as in Theorem 8.1. Then there exists a set $S_{1}$ with $S\subset S_{1}\subset \mathbb{Z}/p\mathbb{Z}$ and $|S_{1}|\leqslant |S|+O(\unicode[STIX]{x1D702}^{-O(C_{1})})$ , a locally bilinear map
a shift $a_{1}\in B(S,4\unicode[STIX]{x1D70C}_{2})$ , and a frequency $\unicode[STIX]{x1D709}_{1}\in \mathbb{Z}/p\mathbb{Z}$ such that
if $\mathbf{n}_{0},\mathbf{m}_{1},\mathbf{n}_{1}$ are drawn independently and regularly from $B(S,\unicode[STIX]{x1D70C}_{0})$ , $B(S_{1},\unicode[STIX]{x1D70C}_{5})$ , and $B(S_{1},\unicode[STIX]{x1D70C}_{6})$ respectively.
Once the proof of this theorem is completed, the auxiliary data $\unicode[STIX]{x1D709},\unicode[STIX]{x1D709}^{\prime },\unicode[STIX]{x1D709}^{\prime \prime },j$ , $\unicode[STIX]{x1D6FA},\operatorname{VBQ}$ used in the previous parts of the section are no longer needed and may be discarded.
We now prove Theorem 9.18. Let $j_{\ast },S_{1}$ be as in Theorem 9.9, let $a_{\ast }$ , $\unicode[STIX]{x1D709}^{\prime }$ be as in Theorem 9.12, let $\unicode[STIX]{x1D709}^{\prime \prime }:B(S_{1},\unicode[STIX]{x1D70C}_{3})\rightarrow \mathbb{Z}/p\mathbb{Z}$ be as in Theorem 9.15, and let $a_{0},\unicode[STIX]{x1D709}_{0}$ be as in Proposition 9.16. We will use a “cohomological” argument to construct the required bilinear map $\unicode[STIX]{x1D6EF}$ . Namely, we define the cocycle $\unicode[STIX]{x1D707}:B(S_{1},\unicode[STIX]{x1D70C}_{3}/2)\times B(S_{1},\unicode[STIX]{x1D70C}_{3}/2)\rightarrow \mathbb{Z}/p\mathbb{Z}$ to be the quantity
Clearly $\unicode[STIX]{x1D707}$ is symmetric, and we have the cocycle equation
as well as the auxiliary equations
whenever $n_{1},n_{2},n_{3}\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/4)$ . From (9.57) we also have the estimate
for all $n,m\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/4)$ .
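The algebra behind the symmetry and the cocycle equation can be checked mechanically. The sketch below assumes that the cocycle is the linearity defect $f(n+m)-f(n)-f(m)$ described at the start of this subsection, and that the cocycle equation has the four-term shape appearing in the proof of Lemma 9.19. It ignores the Bohr set restriction and works on all of $\mathbb{Z}/p\mathbb{Z}$ with an arbitrary function `f`; the name `defect` is hypothetical.

```python
import itertools
import random

def defect(f, p):
    """mu(n, m) = f(n + m) - f(n) - f(m) in Z/pZ, for f given as a list of length p."""
    return lambda n, m: (f[(n + m) % p] - f[n % p] - f[m % p]) % p

p = 13
rng = random.Random(1)
f = [rng.randrange(p) for _ in range(p)]  # arbitrary function, not assumed linear
mu = defect(f, p)

# Symmetry: mu(n, m) = mu(m, n).
symmetric = all(mu(n, m) == mu(m, n) for n in range(p) for m in range(p))

# Cocycle equation: mu(n1, n2+n3) + mu(n2, n3) = mu(n1, n2) + mu(n1+n2, n3).
cocycle = all(
    (mu(n1, n2 + n3) + mu(n2, n3)) % p == (mu(n1, n2) + mu(n1 + n2, n3)) % p
    for n1, n2, n3 in itertools.product(range(p), repeat=3)
)
```

Both sides of the cocycle equation expand to $f(n_{1}+n_{2}+n_{3})-f(n_{1})-f(n_{2})-f(n_{3})$, which is why the identity holds for any function of this form.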
To construct the bilinear map $\unicode[STIX]{x1D6EF}$ , we will show that a certain projection of $\unicode[STIX]{x1D707}$ is a “coboundary” in a certain sense. Let $\unicode[STIX]{x1D719}:\mathbb{Z}^{S}\rightarrow \mathbb{Z}/p\mathbb{Z}$ be the homomorphism
From (9.66), we see that for each $n,m\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/4)$ we have a representation of the form
for some lift $\tilde{\unicode[STIX]{x1D707}}(n,m)\in \mathbb{Z}^{S}$ of size
This lift $\tilde{\unicode[STIX]{x1D707}}(n,m)$ is only defined up to an element of the kernel $\text{ker}(\unicode[STIX]{x1D719}):=\{p\in \mathbb{Z}^{S}:\unicode[STIX]{x1D719}(p)=0\}$ of $\unicode[STIX]{x1D719}$ ; to eliminate this ambiguity we will apply a projection. Since $S$ contains a non-zero element, $\unicode[STIX]{x1D719}:\mathbb{Z}^{S}\rightarrow \mathbb{Z}/p\mathbb{Z}$ is a surjective homomorphism, and in particular, $\text{ker}(\unicode[STIX]{x1D719})$ is a sublattice of $\mathbb{Z}^{S}$ of index $p$ . Applying Lemma 4.8, we may find generators $v_{1},\ldots ,v_{|S|}$ of $\text{ker}(\unicode[STIX]{x1D719})$ and real numbers $N_{1},\ldots ,N_{|S|}>0$ with
such that
for all $t>0$ .
By relabelling, we may take the $N_{i}$ to be non-increasing. Let $d$ , $0\leqslant d\leqslant |S|$ be such that
From (9.69), (8.3) we see that $d$ cannot equal $|S|$ . Let $V$ be the $d$ -dimensional subspace of $\mathbb{R}^{S}$ spanned by $v_{1},\ldots ,v_{d}$ , let $V^{\bot }$ be the orthogonal complement of $V$ in $\mathbb{R}^{S}$ , and let $\unicode[STIX]{x1D70B}:\mathbb{R}^{S}\rightarrow V^{\bot }$ be the orthogonal projection.
We claim that $\unicode[STIX]{x1D70B}(\tilde{\unicode[STIX]{x1D707}}(n,m))$ is now uniquely determined by $\unicode[STIX]{x1D707}(n,m)$ for $n,m\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/4)$ . Indeed, if $\tilde{\unicode[STIX]{x1D707}}(n,m)$ and $\tilde{\unicode[STIX]{x1D707}}^{\prime }(n,m)$ both obeyed (9.67), (9.68), then their difference (call it $w$ ) would be of magnitude $O(1/\unicode[STIX]{x1D70C}_{3})$ and would lie in the kernel of $\unicode[STIX]{x1D719}$ . By (9.70) with $t=\exp (-K^{C_{1}})\unicode[STIX]{x1D70C}_{3}$ , we conclude that $w$ lies in $V$ , and hence $\unicode[STIX]{x1D70B}(\tilde{\unicode[STIX]{x1D707}}(n,m))$ and $\unicode[STIX]{x1D70B}(\tilde{\unicode[STIX]{x1D707}}^{\prime }(n,m))$ agree.
A variant of the above argument shows that $\unicode[STIX]{x1D70B}\circ \tilde{\unicode[STIX]{x1D707}}$ also continues to obey the cocycle equation.
Lemma 9.19 (Projected lift is a cocycle).
One has
and additionally
for all $n_{1},n_{2},n_{3}\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/4)$ .
Proof. By (9.68), the quantity $w:=\tilde{\unicode[STIX]{x1D707}}(n_{1},n_{2}+n_{3})+\tilde{\unicode[STIX]{x1D707}}(n_{2},n_{3})-\tilde{\unicode[STIX]{x1D707}}(n_{1},n_{2})-\tilde{\unicode[STIX]{x1D707}}(n_{1}+n_{2},n_{3})$ has magnitude $O(1/\unicode[STIX]{x1D70C}_{3})$ ; by (9.67), (9.65), $w$ lies in the kernel of $\unicode[STIX]{x1D719}$ . Repeating the previous arguments, we conclude that $w\in V$ . Applying the homomorphism $\unicode[STIX]{x1D70B}$ , we obtain the first claim. The second claim is proven similarly.◻
We can in fact make $\unicode[STIX]{x1D70B}\circ \tilde{\unicode[STIX]{x1D707}}$ a coboundary, after shrinking the domain somewhat.
Proposition 9.20 (Projected lift is a coboundary).
There exists a map $F:B(S_{1},2\exp (-K^{C_{1}^{2}})\unicode[STIX]{x1D70C}_{3})\rightarrow V^{\bot }$ with
for all $n\in B(S_{1},2\exp (-K^{C_{1}^{2}})\unicode[STIX]{x1D70C}_{3})$ , such that
for all $n_{1},n_{2}\in B(S_{1},\exp (-K^{C_{1}^{2}})\unicode[STIX]{x1D70C}_{3})$ .
Proof. As a first attempt at constructing $F$ , we introduce the average
for $n\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/4)$ , where $\mathbf{n}_{3}$ is drawn regularly from $B(S_{1},\unicode[STIX]{x1D70C}_{3}/4)$ . From (9.68) we have
for all $n\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/4)$ . Also, since $|S_{1}|\ll K^{O(C_{1})}$ , if we replace $n_{3}$ by $\mathbf{n}_{3}$ in Lemma 9.19 and take expectations using Lemma 4.4, we conclude that
for all $n_{1},n_{2}\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/8)$ .
If we now introduce the modified cocycle
for $n_{1},n_{2}\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/8)$ , then we have the cocycle equation
the auxiliary equations
and the bound
for $n_{1},n_{2}\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/16)$ .
We now make $\unicode[STIX]{x1D70E}_{1}$ a coboundary by using a basis for $B(S_{1},\unicode[STIX]{x1D70C}_{3}/16)$ . Set $d:=|S_{1}|\leqslant K^{O(C_{1})}$ . By Corollary 4.9, we can find elements $a_{1},\ldots ,a_{d}$ of $\mathbb{Z}/p\mathbb{Z}$ and real numbers $N_{1},\ldots ,N_{d}>0$ such that
for all $i=1,\ldots ,d$ , and such that for any $a\in \mathbb{Z}/p\mathbb{Z}$ , there exists a representation
with $m_{1},\ldots ,m_{d}$ integers of size
for $i=1,\ldots ,d$ , with at most one such representation obeying the bounds $|m_{i}|<N_{i}/2$ for $i=1,\ldots ,d$ .
By relabelling we may assume that $N_{i}\geqslant 32d^{\prime }/\unicode[STIX]{x1D70C}_{3}$ for $i=1,\ldots ,d^{\prime }$ and $N_{i}<32d^{\prime }/\unicode[STIX]{x1D70C}_{3}$ for $i=d^{\prime }+1,\ldots ,d$ for some $0\leqslant d^{\prime }\leqslant d$ . By (9.75) we have $a_{i}\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/32d^{\prime })$ for all $i=1,\ldots ,d^{\prime }$ . In particular, from (9.73) we see that for any $n\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/32)$ and $1\leqslant i,j\leqslant d^{\prime }$ , we have
and hence by swapping $i$ and $j$ and subtracting
Let $P\subset \mathbb{Z}^{d^{\prime }}$ denote the collection of tuples $(m_{1},\ldots ,m_{d^{\prime }})\in \mathbb{Z}^{d^{\prime }}$ with $|m_{i}|\leqslant \unicode[STIX]{x1D70C}_{3}/2N_{i}$ for $i=1,\ldots ,d^{\prime }$ , and for each $m\in P$ and $i=1,\ldots ,d^{\prime }$ , define the quantity
where $\unicode[STIX]{x1D719}:\mathbb{Z}^{d^{\prime }}\rightarrow \mathbb{Z}/p\mathbb{Z}$ is the homomorphism
Then from (9.75) we have $\unicode[STIX]{x1D719}(P)\subset B(S_{1},\unicode[STIX]{x1D70C}_{3}/32)$ . The above identity then says that the “ $1$ -form” $(f_{1},\ldots ,f_{d^{\prime }})$ is “closed” or “curl-free” in the sense that
whenever $i,j=1,\ldots ,d^{\prime }$ and $m,m+e_{i},m+e_{j}\in P$ , where $e_{1},\ldots ,e_{d^{\prime }}$ is the standard basis for $\mathbb{Z}^{d^{\prime }}$ . This implies that there exists a function $H:P\rightarrow V^{\bot }$ such that $H(0)=0$ and $f_{i}(m)=H(m+e_{i})-H(m)$ whenever $i=1,\ldots ,d^{\prime }$ and $m,m+e_{i}\in P$ . Indeed, one can define $H$ to be an “antiderivative” of the $(f_{1},\ldots ,f_{d^{\prime }})$ by setting
whenever $0=m_{0},\ldots ,m_{L}=m$ is a path in $P$ with $m_{l+1}=m_{l}+e_{i_{l}}$ for $l=0,\ldots ,L-1$ ; a “homotopy” argument using (9.78) shows that the right-hand side does not depend on the choice of path. From (9.74), (9.75) we have
for $m\in P$ and $i=1,\ldots ,d^{\prime }$ , which on “integrating” (and recalling that $d^{\prime }\leqslant d\ll K^{O(C_{1})}$ ) implies that
for all $m\in P$ .
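The "antiderivative" construction can be illustrated on a small two-dimensional box. The sketch below makes several simplifying assumptions that are not in the text: the values are integers rather than vectors in $V^{\bot }$, the box is $\{0,\ldots ,3\}^{2}$, only paths with positive steps are used, and the $1$-form is produced as the difference of a hypothetical potential `H0` (so that it is closed by construction). It only demonstrates that summing the $f_{i}$ along a lattice path gives a value independent of the path.

```python
import itertools
import random

def integrate(f, steps):
    """Sum f_i along the path that starts at 0 and adds e_i for each i in steps."""
    m, total = (0, 0), 0
    for i in steps:
        total += f[i][m]
        m = (m[0] + (i == 0), m[1] + (i == 1))
    return total

# Hypothetical closed 1-form on the box {0..3}^2: differences of a potential H0.
rng = random.Random(2)
side = 4
H0 = {m: rng.randint(-9, 9) for m in itertools.product(range(side), repeat=2)}
f = [
    {m: H0[(m[0] + 1, m[1])] - H0[m] for m in H0 if m[0] + 1 < side},  # f_1
    {m: H0[(m[0], m[1] + 1)] - H0[m] for m in H0 if m[1] + 1 < side},  # f_2
]

# Every ordering of three e_1 steps and two e_2 steps is a path from 0 to (3, 2).
target = (3, 2)
paths = set(itertools.permutations([0] * target[0] + [1] * target[1]))
values = {integrate(f, path) for path in paths}
```

All ten paths produce the single value `H0[target] - H0[(0, 0)]`, which is the telescoping identity behind the definition of $H$.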
Since $\unicode[STIX]{x1D70E}_{1}(0,e_{i})=0$ , we have $f_{i}(0)=0$ and hence $H(e_{i})=0$ for all $i=1,\ldots ,d^{\prime }$ . Thus we have
whenever $m,m+e_{i}\in P$ . An induction (on the magnitude of a vector $m^{\prime }$ ) using (9.73) then shows that
whenever $m,m^{\prime },m+m^{\prime }\in P$ . Now, if $n\in B(S_{1},2\exp (-K^{C_{1}^{2}})\unicode[STIX]{x1D70C}_{3})$ , then by (9.76), (9.77) we see that $n=\unicode[STIX]{x1D719}(m)$ for some $m\in P$ . If we then define $F_{2}:B(S_{1},2\exp (-K^{C_{1}^{2}})\unicode[STIX]{x1D70C}_{3})\rightarrow V^{\bot }$ by setting $F_{2}(n):=H(m)$ , we conclude that
and
for all $n,n^{\prime }\in B(S_{1},\exp (-K^{C_{1}^{2}})\unicode[STIX]{x1D70C}_{3})$ . Setting $F:=F_{2}-F_{1}$ , we obtain the claim.◻
Let $F$ be as in Proposition 9.20. We use $F$ to construct the locally bilinear form $\unicode[STIX]{x1D6EF}:B(S_{1},\unicode[STIX]{x1D70C}_{4})\times B(S_{1},\unicode[STIX]{x1D70C}_{4})\rightarrow \mathbb{R}/\mathbb{Z}$ as follows. We first define the locally linear map $\unicode[STIX]{x1D704}:B(S_{1},\unicode[STIX]{x1D70C}_{4})\rightarrow \mathbb{R}^{S}$ by the formula
where $x\mapsto \{x\}$ is the signed fractional map from $\mathbb{R}/\mathbb{Z}$ to $(-1/2,1/2]$ ; note that $\unicode[STIX]{x1D704}$ takes values in the box $[-\unicode[STIX]{x1D70C}_{4},\unicode[STIX]{x1D70C}_{4}]^{S}$ . We then define
for $n,m\in B(S_{1},\unicode[STIX]{x1D70C}_{4})$ , where $\cdot$ denotes the dot product on $\mathbb{R}^{S}$ . It is clear that $\unicode[STIX]{x1D6EF}$ is locally linear in $m$ ; we also claim that it is locally linear in $n$ , thus
whenever $n_{1},n_{2},n_{1}+n_{2}\in B(S_{1},\unicode[STIX]{x1D70C}_{4})$ . By (9.64) and Proposition 9.20, the left-hand side of (9.80) may be written as
From (9.67) we have
so to prove (9.80), it suffices to show that $\unicode[STIX]{x1D704}(m)$ lies in $V^{\bot }$ . This is equivalent to showing that $\unicode[STIX]{x1D704}(m)\cdot v_{i}=0$ for $i=1,\ldots ,d$ . Since $v_{i}\in \text{ker}(\unicode[STIX]{x1D719})$ , we have
On the other hand, we have $\unicode[STIX]{x1D704}(m)=O(K^{1/2}\unicode[STIX]{x1D70C}_{4})$ , and from (9.70) with $t=N_{i}^{-1}$ followed by (9.71), we have
and hence $|\unicode[STIX]{x1D704}(m)\cdot v_{i}|<1$ . The claim follows.
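The local linearity used above rests on an elementary property of the signed fractional part map $x\mapsto \{x\}$. The sketch below does not reproduce the formula for $\unicode[STIX]{x1D704}$; it only illustrates, under the assumption that arguments are fractions $n/p$, that $\{x+y\}=\{x\}+\{y\}$ whenever $\{x\}$ and $\{y\}$ are small enough that no wrap-around occurs. The threshold $1/5$ and the name `signed_frac` are choices made for this illustration.

```python
import math
from fractions import Fraction

def signed_frac(x):
    """Representative of x mod 1 in the interval (-1/2, 1/2]."""
    y = x - math.floor(x)  # lies in [0, 1)
    return y - 1 if y > Fraction(1, 2) else y

p = 101
# Residues n whose signed fractional part n/p is at most 1/5 in absolute value.
small = [n for n in range(p) if abs(signed_frac(Fraction(n, p))) <= Fraction(1, 5)]

# On such residues the map n -> {n/p} is additive, since the sum stays below 1/2.
locally_additive = all(
    signed_frac(Fraction(n + m, p))
    == signed_frac(Fraction(n, p)) + signed_frac(Fraction(m, p))
    for n in small for m in small
)
```

Additivity fails without the smallness restriction: for example $\{1/2+1/2\}=0$, while $\{1/2\}+\{1/2\}=1$.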
Now we verify (9.63). Let $a_{0},\unicode[STIX]{x1D709}_{0}$ be as in Proposition 9.16. Let $\mathbf{n},\mathbf{n}_{0},\mathbf{h},\mathbf{n}_{1}$ , $\mathbf{m}_{1}$ be drawn independently and regularly from the Bohr sets $B(S_{1},\unicode[STIX]{x1D70C}_{3}/4)$ , $B(S,\unicode[STIX]{x1D70C}_{0})$ , $B(S_{1},\unicode[STIX]{x1D70C}_{4})$ , $B(S_{1},\unicode[STIX]{x1D70C}_{6})$ , $B(S_{1},\unicode[STIX]{x1D70C}_{5})$ respectively. From Proposition 9.16 we have
Using Lemma 4.4 to replace $\mathbf{n}$ by $\mathbf{n}+\mathbf{n}_{1}$ , and to replace $\mathbf{h}$ by $\mathbf{h}+\mathbf{m}_{1}$ , we have
and thus by the triangle inequality we have
The phase $e((\unicode[STIX]{x1D709}^{\prime \prime }(n+n_{1})-\unicode[STIX]{x1D709}_{0})h)$ is deterministic and may thus be omitted:
As the expectation only depends on the sum $n_{0}+h$ rather than the individual variables $n_{0},h$ , we thus have
By Lemma 4.4 we may replace $\mathbf{n}_{0}+\mathbf{h}$ here by $\mathbf{n}_{0}$ . From (9.57) we have
and so
By the pigeonhole principle, there thus exists $n\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/4)$ such that
which, if we write $a_{1}:=a_{0}-n$ and $\unicode[STIX]{x1D709}_{1}:=\unicode[STIX]{x1D709}_{0}-\unicode[STIX]{x1D709}^{\prime \prime }(n)$ , simplifies to
Since $a_{0}\in B(S,3\unicode[STIX]{x1D70C}_{2})$ and $n\in B(S_{1},\unicode[STIX]{x1D70C}_{3}/4)$ , we have $a_{1}\in B(S,4\unicode[STIX]{x1D70C}_{2})$ .
Now, from (9.79) one has
but since $\mathbf{m}_{1}\in B(S_{1},\unicode[STIX]{x1D70C}_{5})$ , we have $\unicode[STIX]{x1D704}(\mathbf{m}_{1})=O(K\unicode[STIX]{x1D70C}_{5})$ , and hence by (9.72) we have
and so
which gives (9.63). The proof of Theorem 9.18 is now complete.
9.21 Eighth step: making the frequency function symmetric
The next step is the “symmetry step” from [Reference Green and Tao14, Reference Samorodnitsky26], which uses the Cauchy–Schwarz inequality to ensure that $\unicode[STIX]{x1D6EF}$ is essentially symmetric.
Theorem 9.22. Let the notation and hypotheses be as in Theorem 9.18. For $n,m\in B(S_{1},\unicode[STIX]{x1D70C}_{4})$ , define
Then there exists a natural number $k$ with $1\leqslant k\ll \exp (K^{O(C_{1})})$ such that
for all $n,m\in B(S_{1},\unicode[STIX]{x1D70C}_{9})$ .
Proof. Let $\mathbf{n}_{0},\mathbf{m}_{1},\mathbf{n}_{1}$ be as in Theorem 9.18. From (9.63) and the pigeonhole principle, we may find $n_{0}\in \mathbb{Z}/p\mathbb{Z}$ such that
which by the boundedness of the expectation implies
and thus we may find a $1$ -bounded function $b_{1}:\mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ such that
Writing $b_{2}(n):=f(n_{0}+a_{1}+n)$ and $b_{3}(m):=\overline{f}(n_{0}+m)e(-\unicode[STIX]{x1D709}_{1}m)$ , we may simplify this as
Using the Cauchy–Schwarz inequality (Lemma 2.1) to eliminate the $b_{3}(\mathbf{m}_{1})$ factor, we conclude that
where $\mathbf{n}_{1}^{\prime }$ is an independent copy of $\mathbf{n}_{1}$ . Writing $\mathbf{k}:=\mathbf{n}_{1}+\mathbf{n}_{1}^{\prime }-\mathbf{m}_{1}$ , and noting from the local bilinearity of $\unicode[STIX]{x1D6EF}$ that
we conclude that
where $b_{3},b_{4}:\mathbb{Z}/p\mathbb{Z}\times \mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ are the $1$ -bounded functions
and
For fixed $\mathbf{n}_{1},\mathbf{n}_{1}^{\prime }$ , we see from Lemma 4.4 that $\mathbf{k}$ differs from $\mathbf{m}_{1}$ in total variation by $O(\unicode[STIX]{x1D702}^{100C_{1}})$ , and hence
By the pigeonhole principle, we may thus find $m_{1}\in \mathbb{Z}/p\mathbb{Z}$ such that
Using Cauchy–Schwarz (Lemma 2.1) to eliminate $b_{4}(\mathbf{n}_{1}^{\prime },m_{1})$ , and using the local bilinearity of $\{\,,\}$ , we conclude that
where $\mathbf{l}_{1}$ is an independent copy of $\mathbf{n}_{1}$ ; using a further application of Cauchy–Schwarz (Lemma 2.1) to eliminate $b_{3}(\mathbf{n}_{1},m_{1})\overline{b_{3}}(\mathbf{l}_{1},m_{1})$ , we conclude that
where $\mathbf{l}_{1}^{\prime }$ is an independent copy of $\mathbf{n}_{1}^{\prime }$ (thus $\mathbf{n}_{1},\mathbf{n}_{1}^{\prime },\mathbf{l}_{1},\mathbf{l}_{1}^{\prime }$ are jointly independent and drawn regularly from $B(S_{1},\unicode[STIX]{x1D70C}_{6})$ ). In particular, by the pigeonhole principle one can find $l_{1},l_{1}^{\prime }\in B(S_{1},\unicode[STIX]{x1D70C}_{6})$ such that
By local bilinearity, one can rewrite $\{\mathbf{n}_{1}-l_{1},\mathbf{n}_{1}^{\prime }-l_{1}^{\prime }\}$ as $\{\mathbf{n}_{1},\mathbf{n}_{1}^{\prime }\}$ plus locally linear functions of $\mathbf{n}_{1}$ and $\mathbf{n}_{1}^{\prime }$ . The claim now follows from Proposition 4.11.◻
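Each Cauchy–Schwarz step in the proof above removes a $1$-bounded factor depending on one variable, at the cost of duplicating the other variable. The sketch below is a numerical check of the simplest inequality of this shape, $|\mathbb{E}\,b(\mathbf{m})F(\mathbf{n},\mathbf{m})|^{2}\leqslant \mathbb{E}\,F(\mathbf{n},\mathbf{m})\overline{F(\mathbf{n}^{\prime },\mathbf{m})}$, where $\mathbf{n}^{\prime }$ is an independent copy of $\mathbf{n}$. It assumes uniform distributions on small finite sets and random data `F`, `b`; it is not the precise statement of Lemma 2.1, which is not reproduced in this section.

```python
import cmath
import random

rng = random.Random(3)
N, M = 6, 5
# Arbitrary complex data F(n, m) and a 1-bounded weight b(m), both hypothetical.
F = [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(M)]
     for _ in range(N)]
b = [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(M)]

# Left side: |E_{n,m} b(m) F(n,m)|^2.
lhs = abs(sum(b[m] * F[n][m] for n in range(N) for m in range(M)) / (N * M)) ** 2

# Right side: E_{n,n',m} F(n,m) conj(F(n',m)), with n' an independent copy of n.
rhs = sum(
    F[n][m] * F[n2][m].conjugate()
    for n in range(N) for n2 in range(N) for m in range(M)
) / (N * N * M)
```

The right-hand side equals $\mathbb{E}_{m}|\mathbb{E}_{n}F(n,m)|^{2}$, so it is real and non-negative up to rounding, and the weight `b` has disappeared from it.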
9.23 Ninth step: integrating the frequency function
We may now finally prove Theorem 8.1. Let the notation and hypotheses be as in that theorem, let $S_{1}$ and $\unicode[STIX]{x1D6EF}$ be as in Theorem 9.18, and let $k$ be as in Theorem 9.22. Thus if we let $\mathbf{n}_{0},\mathbf{n}_{1},\mathbf{m}_{1}$ be drawn independently and regularly from $B(S,\unicode[STIX]{x1D70C}_{0})$ , $B(S_{1},\unicode[STIX]{x1D70C}_{6})$ , $B(S_{1},\unicode[STIX]{x1D70C}_{5})$ respectively, we have
Now let $\mathbf{n}_{2},\mathbf{m}_{2}$ be drawn independently and regularly from the Bohr sets $B(S_{1},\unicode[STIX]{x1D70C}_{9}),B(S_{1},\unicode[STIX]{x1D70C}_{10})$ respectively, independently of all previous random variables. By Lemma 4.4, we may replace $\mathbf{n}_{1},\mathbf{m}_{1}$ by $\mathbf{n}_{1}+2k\mathbf{n}_{2}$ and $\mathbf{m}_{1}+2k\mathbf{m}_{2}$ in (9.81), leading to
Thus we may find $n_{1}\in B(S_{1},\unicode[STIX]{x1D70C}_{6})$ , $m_{1}\in B(S_{1},\unicode[STIX]{x1D70C}_{5})$ such that
which we can simplify slightly as
where $a_{2}:=a_{1}+m_{1}-n_{1}$ ; since $a_{1}\in B(S,4\unicode[STIX]{x1D70C}_{2})$ , $m_{1}\in B(S_{1},\unicode[STIX]{x1D70C}_{5})$ , $n_{1}\in B(S_{1},\unicode[STIX]{x1D70C}_{6})$ , we have $a_{2}\in B(S,5\unicode[STIX]{x1D70C}_{2})$ . By the local bilinearity of $\unicode[STIX]{x1D6EF}$ , we have
and so we have
where
and
By Theorem 9.22, one has $\Vert k\{\mathbf{n}_{2},\mathbf{m}_{2}\}\Vert _{\mathbb{R}/\mathbb{Z}}\ll \unicode[STIX]{x1D702}^{100C_{1}}$ , and thus
By boundedness of the expectation, this implies that
and thus
for some $1$ -bounded function $H:\mathbb{Z}/p\mathbb{Z}\times \mathbb{Z}/p\mathbb{Z}\rightarrow \mathbb{C}$ . By Cauchy–Schwarz (Lemma 2.1), we thus have
where $\mathbf{m}_{2}^{\prime }$ is an independent copy of $\mathbf{m}_{2}$ ; by a second application of Cauchy–Schwarz (Lemma 2.1), we then have
where $\mathbf{n}_{2}^{\prime }$ is an independent copy of $\mathbf{n}_{2}$ . Since the distributions of $\mathbf{m}_{2},\mathbf{m}_{2}^{\prime }$ are symmetric, we thus have
In particular, with probability $\gg \unicode[STIX]{x1D702}^{4C_{1}+O(1)}$ , the random variable $\mathbf{n}_{0}$ attains a value $n_{0}$ for which
If $n_{0}$ is such that (9.83) holds, then we may apply Theorem 4.12 and conclude that there exists a frequency $\unicode[STIX]{x1D6FD}(n_{0})\in \mathbb{Z}/p\mathbb{Z}$ such that
and thus (defining $\unicode[STIX]{x1D6FD}(n_{0})$ arbitrarily if (9.83) does not hold),
and hence there exists $n_{2}\in B(S_{1},\unicode[STIX]{x1D70C}_{9})$ with
Applying (9.82), we conclude that
where $a_{3}:=a_{2}-2kn_{2}$ ; since $a_{2}\in B(S,5\unicode[STIX]{x1D70C}_{2})$ , $n_{2}\in B(S_{1},\unicode[STIX]{x1D70C}_{9})$ , and $k=O(\exp (K^{O(C_{1})}))$ , we have $a_{3}\in B(S,6\unicode[STIX]{x1D70C}_{2})$ . In particular, by Lemma 4.4, $\mathbf{n}_{0}$ and $\mathbf{n}_{0}+a_{3}$ differ in total variation by $O(\unicode[STIX]{x1D702}^{100C_{1}+O(1)})$ , and thus
Theorem 8.1 then follows after a change of variables, noting that the map $\mathbf{m}_{2}\mapsto \unicode[STIX]{x1D6EF}(\mathbf{m}_{2},\mathbf{m}_{2})$ is locally quadratic on $B(S_{1},\unicode[STIX]{x1D70C}_{9})$ .
Acknowledgements
The first author is supported by a Simons Investigator grant. The second author is supported by a Simons Investigator grant, the James and Carol Collins Chair, the Mathematical Analysis & Application Research Fund Endowment, and by NSF grant DMS-1266164. Part of this paper was written while the authors were in residence at MSRI in spring 2017, which is supported by NSF grant DMS-1440140.
We are indebted to the anonymous referee for helpful corrections and suggestions. Finally, we thank any readers interested in the result of this paper for their patience. Most of the argument was worked out by us in 2005, and the result was claimed in [Reference Green, Tao, Chen, Gowers, Halberstam, Schmidt and Vaughan19], dedicated to Roth’s 80th birthday. While a complete, though not very readable, version has been available on request since around 2012, it has taken us until now to create a potentially publishable manuscript.