1 Introduction
Throughout this paper,
$P \in \mathbb {Z}[\mathrm {n}]$
denotes a polynomial with integer coefficients of some degree
$d \geq 2$
in one indeterminate
$\mathrm {n}$
; a typical case to keep in mind is the quadratic polynomial
$P(\mathrm {n}) = \mathrm {n}^2$
.
Define a measure-preserving system to be a triple
$X = (X,\nu ,T)$
, where
$X = (X,\nu )$
is a
$\sigma $
-finite measure space and
$T \colon X \to X$
is an invertible bimeasurable map which is measure-preserving in the sense that
$\nu (T^{-1}(E)) = \nu (E)$
for all measurable E. It is common in the literature to restrict to finite measure systems and to normalize
${\nu (X)=1}$
; but our results will not require any hypothesis of finite measure. Given functions
$f,g \colon X \to \mathbb {C}$
, a scale
$N \geq 1$
and a weight function
$w \colon \mathbb {N} \to \mathbb {C}$
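, we can then define the non-conventional averaging operator
$$ \begin{align*} \mathrm{A}_{N,w;X}(f,g)(x) := \mathbb{E}_{n \in [N]} w(n) f(T^n x) g(T^{P(n)} x) \end{align*} $$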
for any
$x \in X$
(see §2 for our averaging notation).
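To make the shape of this operator concrete, the following is a minimal numerical sketch for the integer shift system, under the assumption (taken from §3, not from this paragraph) that the shift is $x \mapsto x-1$ and that $[N] = \{1,\ldots,N\}$, so that the average reads $\frac{1}{N}\sum_{n=1}^{N} w(n) f(x-n) g(x-P(n))$. The function name `bilinear_average` is hypothetical and is introduced only for illustration.

```python
from typing import Callable


def bilinear_average(
    f: Callable[[int], complex],
    g: Callable[[int], complex],
    P: Callable[[int], int],
    w: Callable[[int], complex],
    N: int,
    x: int,
) -> complex:
    """Return (1/N) * sum_{n=1}^{N} w(n) f(x - n) g(x - P(n)).

    Assumption: integer shift system with T x = x - 1, so T^n x = x - n.
    """
    total = sum(w(n) * f(x - n) * g(x - P(n)) for n in range(1, N + 1))
    return total / N


# Example: P(n) = n^2, unweighted (w = 1), f the indicator of {0}, g = 1.
# Only n = x contributes, so the average at x = 2 with N = 4 is 1/4.
value = bilinear_average(
    lambda y: 1 if y == 0 else 0,
    lambda y: 1,
    lambda n: n * n,
    lambda n: 1,
    4,
    2,
)
```

Passing a different weight, for instance a von Mangoldt or Möbius function, in place of `w` gives the weighted variants discussed in the rest of the introduction.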
1.1 Unweighted ergodic averages
In the unweighted case
$w=1$
, the following ergodic theorem was recently proven by two of the authors with Mirek.
Theorem 1.1. (Unweighted ergodic theorem [Reference Krause, Mirek and Tao13, Theorem 1.17])
Let
$(X,\nu ,T)$
be a measure-preserving system and let
$f \in L^{p_1}(X)$
,
$g \in L^{p_2}(X)$
for some
$1 < p_1,p_2 < \infty $
with
$({1}/{p_1}) + ({1}/{p_2}) = ({1}/{p}) \leq 1$
.
-
(i) (Mean ergodic theorem) The averages
$\mathrm {A}_{N,1;X}(f,g)$ converge in
$L^p(X)$ norm.
-
(ii) (Pointwise ergodic theorem) The averages
$\mathrm {A}_{N,1;X}(f,g)$ converge pointwise almost everywhere.
-
(iii) (Maximal ergodic theorem) One has
$$ \begin{align*} \| (\mathrm{A}_{N,1;X}(f,g))_{N \in \mathbb{Z}^+} \|_{L^p(X; \ell^\infty)} \lesssim_{p_1,p_2,P} \|f\|_{L^{p_1}(X)} \|g\|_{L^{p_2}(X)}\end{align*} $$
-
(iv) (Variational ergodic theorem) If
$r>2$ and
$\lambda>1$ , one has
$$ \begin{align*} \| (\mathrm{A}_{N,1;X}(f,g))_{N \in \mathbb{D}} \|_{L^p(X; \mathbf{V}^r)} \lesssim_{p_1,p_2,r,P,\lambda} \|f\|_{L^{p_1}(X)} \|g\|_{L^{p_2}(X)}\end{align*} $$
whenever
$\mathbb {D} \subset [1,+\infty )$ is finite and
$\lambda $ -lacunary (see §2.6 for the definition of
$\lambda $ -lacunarity and the variational norm
$\mathbf {V}^r\!$ ).
We very briefly review the main ingredients of the proof of Theorem 1.1. Case (iv) is the main estimate, which easily implies the other three claims. By some standard sparsification and transference arguments, as well as dyadic decompositions, it sufficed to prove the variant estimate

where

is the ‘upper half’ of
$\mathrm {A}_{N,w;X}$
when X is the integers
$\mathbb {Z}$
with the usual shift
$T \colon n \mapsto n+1$
and counting measure
$\nu $
.
A crucial observation was that the averages
$\tilde {\mathrm {A}}_{N,1}$
are ‘complexity zero’ in the sense that they are small when the Fourier transform of f or g vanishes on ‘major arcs’. Indeed, in [Reference Krause, Mirek and Tao13, Theorem 5.12], the single-scale minor arc estimate

was proven for
$N \geq 1$
,
$l \in \mathbb {N}$
and
$f,g \in \ell ^2(\mathbb {Z})$
with either the Fourier transform
$\mathcal {F}_{\mathbb {Z}} f$
of f vanishing on the major arc set
$\mathcal {M}_{\leq l, \leq -\operatorname {Log} N + l}$
or the Fourier transform
$\mathcal {F}_{\mathbb {Z}} g$
of g vanishing on the major arc set
$\mathcal {M}_{\leq l, \leq -d\operatorname {Log} N + dl}$
; we refer the reader to §2 for the definition of the various terms and symbols introduced here. This minor arc estimate was proven by combining Peluse–Prendiville estimates [Reference Peluse and Prendiville24] with a discrete
$\ell ^p$
improving inequality from [Reference Han, Kovač, Lacey, Madrid and Yang8], together with a Hahn–Banach argument.
Using equation (1.2), one could now focus attention on major arcs. After some routine manipulations involving Ionescu–Wainger multiplier theory [Reference Ionescu and Wainger10], the task reduced to controlling the
$\ell ^p(\mathbb {Z}; \mathbf {V}^r)$
norm of tuples of the form

where
$\mathbb {I}$
is a certain
$\lambda $
-lacunary set (bounded from below by certain bounds, but not from above) and
$F_N, G_N$
are various frequency localizations of
$f,g$
, respectively, to major arcs (see [Reference Krause, Mirek and Tao13, Theorem 5.30] for a precise statement). By estimation of the bilinear symbol of the averaging operator
$\tilde {\mathrm {A}}_{N,1}$
, one could approximate this tuple by another tuple

where
$F,G$
are again some Fourier localizations of
$f,g$
to major arcs and
$\mathrm {B}^{l_1, l_2, m_{\hat {\mathbb {Z}}}}_{(\varphi _N \otimes \tilde \varphi _N) \tilde m_{N,\mathbb {R}}}$
is a certain bilinear Fourier multiplier adapted to major arcs; see [Reference Krause, Mirek and Tao13, Proposition 7.13] for a precise statement. At this stage, it became necessary to split the set
$\mathbb {I}$
of spatial averaging scales into the small scales
$\mathbb {I}_{\leq }$
and large scales
$\mathbb {I}_{>}$
. For the small scales, one could reduce matters to controlling another tuple

for another bilinear Fourier multiplier
$B^{l_1,l_2, m_{\hat {\mathbb {Z}}}}_{m_*}$
and Fourier multipliers
$T^{l_1}_{\varphi _N,t,j_1}$
,
$T^{l_2}_{\tilde {\varphi }_N,t,j_2}$
, while for the large scales, one instead considered tuples of the form

where
$F_{\mathbb {A}}, G_{\mathbb {A}}$
were now defined on the ring
$\mathbb {A}_{\mathbb {Z}} = \mathbb {R} \times \hat {\mathbb {Z}}$
of adelic integers rather than on the integers
$\mathbb {Z}$
. See [Reference Krause, Mirek and Tao13, Theorem 7.28] for a precise statement of the estimates required on these tuples.
In the small-scale case, it was possible to apply a general two-parameter Rademacher–Menshov inequality [Reference Krause, Mirek and Tao13, Corollary 8.2] followed by some shifted Calderón–Zygmund theory [Reference Krause, Mirek and Tao13, Theorem B.1] to reduce matters to obtaining a good
$\ell ^{p_1}(\mathbb {Z}) \times \ell ^{p_2}(\mathbb {Z}) \to \ell ^p(\mathbb {Z})$
estimate for the bilinear multiplier
$\mathrm {B}^{l_1,l_2, m_{\hat {\mathbb {Z}}}}_{m_*}$
(see [Reference Krause, Mirek and Tao13, Lemma 8.6]), which was ultimately proven with the assistance of the minor arc estimate in equation (1.2) and the approximation result in [Reference Krause, Mirek and Tao13, Proposition 7.13].
In the large-scale case, some interpolation and factorization arguments, together with a version of equation (1.2) on the profinite integers
$\hat {\mathbb {Z}}$
, reduced matters to establishing
$L^2(\mathbb {Z}_p) \times L^2(\mathbb {Z}_p) \to L^q(\mathbb {Z}_p)$
bounds on the p-adic averaging operator

for all primes p and some
$q>2$
, with the operator norm required to be bounded by
$1$
for p large enough; see [Reference Krause, Mirek and Tao13, equations (10.3), (10.4)] for a precise statement. The boundedness ultimately came from some distributional analysis of the level sets of P on the p-adics (see [Reference Krause, Mirek and Tao13, Corollary C.2]); getting the bound of
$1$
for large p required some additional refined analysis in which one again uses (a p-adic version of) the minor arc estimate in equation (1.2).
1.2 Möbius-weighted ergodic averages
More recently, another one of us [Reference Teräväinen26] considered the non-conventional averaging operators
$\mathrm {A}_{N,\mu ;X}$
weighted by the Möbius function
$\mu $
instead of
$1$
. Perhaps counter-intuitively, the convergence of ergodic averages weighted by
$\mu $
is actually better than that of the unweighted case, especially in light of the recent progress on quantitative Gowers uniformity of the Möbius function [Reference Green and Tao7, Reference Krause, Mirek and Tao14–Reference Leng, Sah and Sawhney16, Reference Tao and Teräväinen25]. For instance, as a special case of [Reference Teräväinen26, Theorem 1.2], the following result was shown.
Theorem 1.2. (Möbius-weighted ergodic theorem)
Let X have finite measure,
${f \in L^{p_1}(X)}$
,
$g \in L^{p_2}(X)$
with
$({1}/{p_1}) + ({1}/{p_2}) < 1$
, and let
$A>0$
. Then,

pointwise almost everywhere.
The ingredients used to prove Theorem 1.2 are somewhat different from those used to prove Theorem 1.1; a key input was [Reference Teräväinen26, Theorem 4.1], which, in our context, establishes the bound

for all
$1$
-bounded
$f,g,h, \theta $
and some
$1 \leq K \lesssim _d 1$
, where the ‘little’ Gowers uniformity norm
$\|\theta \|_{u^{d+1}[N]}$
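is defined as
$$ \begin{align*} \|\theta\|_{u^{d+1}[N]} := \sup_{Q} | \mathbb{E}_{n \in [N]} \theta(n) e(-Q(n)) |, \tag{1.8} \end{align*} $$
where Q ranges over all polynomials of degree at most d with real coefficients and
$e(t) := e^{2\pi i t}$
. The results of [Reference Green and Tao7] show that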
$\|\mu \|_{u^{d+1}[N]}$
decays faster than any power of
$\log N$
, and the claim then follows by standard sparsification and transference arguments.
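For intuition only, the following sketch approximates the simplest instance of such a norm numerically. It assumes the standard definition of the little Gowers norm as a supremum of correlations with polynomial phases, and it only handles linear phases $Q(n) = \alpha n$ (the case $d = 1$, which is outside the range $d \geq 2$ of this paper), with the supremum replaced by a maximum over a finite grid of $\alpha$. The helper names `mobius_table` and `linear_phase_norm` are hypothetical.

```python
import cmath


def mobius_table(N: int) -> list[int]:
    """Return [mu(0), mu(1), ..., mu(N)] with mu(0) set to 0 (unused)."""
    mu = [1] * (N + 1)
    mu[0] = 0
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            # Flip the sign once for every prime divisor p of m.
            for m in range(p, N + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            # Numbers divisible by p^2 are not squarefree.
            for m in range(p * p, N + 1, p * p):
                mu[m] = 0
    return mu


def linear_phase_norm(theta: list[complex], N: int, grid: int) -> float:
    """Max over alpha = a/grid of |(1/N) sum_{n=1}^{N} theta[n] e(-alpha n)|.

    This is a grid lower bound for the degree-1 little Gowers norm of theta.
    """
    best = 0.0
    for a in range(grid):
        alpha = a / grid
        s = sum(
            theta[n] * cmath.exp(-2j * cmath.pi * alpha * n)
            for n in range(1, N + 1)
        )
        best = max(best, abs(s) / N)
    return best


# Degree-1 proxy for the little Gowers norm of the Mobius function on [200].
mu_norm = linear_phase_norm(mobius_table(200), 200, 400)
```

The constant weight $1$ has norm exactly $1$ in this sense, while the Möbius function is trivially bounded by the density of squarefree numbers; the results cited above assert far stronger decay, and for all polynomial phases of degree at most d.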
1.3 Prime-weighted ergodic averages
In this paper, we combine the methods of [Reference Krause, Mirek and Tao13, Reference Teräväinen26], together with some additional arguments, to obtain a non-conventional ergodic theorem in which the weight is selected to be the von Mangoldt function
$\Lambda $
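defined by
$$ \begin{align*} \Lambda(n) := \begin{cases} \log p & \text{if } n = p^k \text{ for some prime } p \text{ and integer } k \geq 1, \\ 0 & \text{otherwise.} \end{cases} \end{align*} $$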
More specifically, we show the following.
Theorem 1.3. (Main theorem)
Let
$(X,\nu ,T)$
be a measure-preserving system and let
$f \in L^{p_1}(X)$
,
$g \in L^{p_2}(X)$
for some
$1 < p_1,p_2 < \infty $
with
$({1}/{p_1}) + ({1}/{p_2}) \leq 1$
. Then, the averages
$\mathrm {A}_{N,\Lambda ;X}(f,g)$
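converge pointwise almost everywhere. In fact, one has the variational estimate
$$ \begin{align*} \| (\mathrm{A}_{N,\Lambda;X}(f,g))_{N \in \mathbb{D}} \|_{L^p(X; \mathbf{V}^r)} \lesssim_{p_1,p_2,r,P,\lambda} \|f\|_{L^{p_1}(X)} \|g\|_{L^{p_2}(X)} \tag{1.9} \end{align*} $$
whenever
$\lambda> 1$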
,
$p \geq 1$
and
$r>2$
with
$({1}/{p_1}) + ({1}/{p_2}) = ({1}/{p})$
, and
$\mathbb {D}\subset [1,+\infty )$
is finite and
$\lambda $
-lacunary.
The range of r here is optimal, as will be mentioned in §6.4. It is possible to extend the range of
$(p_1,p_2)$
slightly beyond duality; see the discussion in §6.3.
Using the fact that
$\log n=\log N+O(\log M)$
for
$n\in [N/M,N]$
and the prime number theorem, we have the following immediate corollary to Theorem 1.3.
Corollary 1.4. Let the assumptions be as in Theorem 1.3. Then, the prime-weighted averages
$$ \begin{align*} \frac{1}{\pi(N)} \sum_{p \leq N} f(T^p x) g(T^{P(p)} x) \end{align*} $$
converge pointwise almost everywhere.
Previously, the pointwise convergence of ergodic averages over the primes was known only in the case of a single polynomial iterate. This case was established by Bourgain [Reference Bourgain2] and Wierdl [Reference Wierdl27] for linear polynomials (with the latter work allowing
$L^q$
functions for any
$q>1$
), and the case of an arbitrary single polynomial iterate was handled by Nair [Reference Nair21, Reference Nair22]. We also mention that the problem of pointwise convergence of ergodic averages with more than one iterate was discussed by Frantzikinakis in [Reference Frantzikinakis3, Problem 12]; the specific problem there about two linear iterates however remains open.
Let us also mention that the norm convergence of non-conventional ergodic averages is now known for any number of polynomial iterates, thanks to the works of Frantzikinakis, Host and Kra [Reference Frantzikinakis, Host and Kra4], and Wooley and Ziegler [Reference Wooley and Ziegler28].
1.4 Methods of proof
From a high-level perspective, Theorem 1.3 is proven by combining the methods used in [Reference Krause, Mirek and Tao13] to prove Theorem 1.1 with the methods used in [Reference Teräväinen26] to prove Theorem 1.2. However, several technical difficulties make the analysis delicate in places, as we shall now discuss.
The first issue arises when trying to approximate various frequency-localized averages (analogous to equation (1.3), but with the weight
$1$
replaced by
$\Lambda $
) by certain bilinear model operators (analogous to equation (1.4), but with the symbol
$m_{\hat {\mathbb {Z}}}$
replaced by a variant
$m_{\hat {\mathbb {Z}}^\times }$
). It is important for the arguments in [Reference Krause, Mirek and Tao13] that the error in this approximation gains a polynomial factor
$N^{-c}$
in N, or at least a quasipolynomial factor
$\exp (-\log ^c N)$
. Using the von Mangoldt function as a weight, this is possible in the absence of Siegel zeroes (and, in particular, assuming the generalized Riemann hypothesis); however, the presence of a Siegel zero near a given scale N requires one to add a scale-dependent correction term to the bilinear symbol
$m_{\hat {\mathbb {Z}}}$
to obtain a satisfactory approximation at small scales. While this correction term is ultimately manageable because of the Landau–Page theorem, it significantly complicates the analysis, in that one cannot simply repeat arguments from [Reference Krause, Mirek and Tao13] verbatim. See §6 for further discussion.
To avoid this issue, we adapt some ideas from [Reference Tao and Teräväinen25] and swap the von Mangoldt weight
$\Lambda $
early in the argument with an approximant
$\Lambda _N$
that is not sensitive to Siegel zeroes. The arguments used in [Reference Teräväinen26] to establish Theorem 1.2 allow one to do so provided that one has good control of the little Gowers uniformity norm in the sense that

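for some large A. One available choice of approximant is the Cramér(–Granville) approximant
$$ \begin{align*} \Lambda_{\mathrm{Cram}\acute{\rm e}\mathrm{r}, w}(n) := \frac{W}{\varphi(W)} 1_{(n,W)=1} \end{align*} $$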
for a suitable parameter w and
$W=\prod _{p\leq w}p$
(we end up selecting
for some large constant
$C_0$
); the required bounds follow, for instance, from the results in [Reference Matomäki, Shao, Tao and Teräväinen18] (which even extend to shorter intervals). A useful fact, first observed in [Reference Tao and Teräväinen25] and refined further here, is that these approximants are stable in Gowers uniformity norms with respect to the w parameter; see Lemma 4.5 for a precise statement.
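As an illustration, the following sketch evaluates a Cramér-type approximant exactly, assuming the standard form $\frac{W}{\varphi(W)} 1_{(n,W)=1}$ with $W=\prod_{p\leq w}p$. The function names are hypothetical, and exact rational arithmetic is used so that the normalization (mean value $1$ over a full period $W$) can be checked without rounding error.

```python
from fractions import Fraction
from math import gcd


def primorial_and_totient(w: int) -> tuple[int, int]:
    """Return (W, phi(W)) where W is the product of all primes p <= w."""
    W, phi = 1, 1
    for p in range(2, w + 1):
        # Trial division primality test; adequate for small w.
        if all(p % q for q in range(2, int(p ** 0.5) + 1)):
            W *= p
            phi *= p - 1
    return W, phi


def cramer_approximant(n: int, w: int) -> Fraction:
    """Assumed form: (W / phi(W)) if gcd(n, W) = 1, and 0 otherwise."""
    W, phi = primorial_and_totient(w)
    return Fraction(W, phi) if gcd(n, W) == 1 else Fraction(0)


# With w = 5 we have W = 30 and phi(W) = 8, so the weight is 15/4 on the
# eight residue classes coprime to 30 and vanishes elsewhere.
period_mean = sum(cramer_approximant(n, 5) for n in range(1, 31)) / 30
```

The weight depends only on $n$ modulo $W$, which is the periodicity that makes it insensitive to Siegel zeroes.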
After using the arguments from [Reference Teräväinen26] to replace
$\Lambda $
by
$\Lambda _N$
, most of the arguments of [Reference Krause, Mirek and Tao13] proceed with only minor changes; in particular, the analogue of the approximation of equation (1.3) by equation (1.4) is fairly routine, thanks in large part to the fundamental lemma of sieve theory; see the proof of Proposition 3.4 in §5. We remark that Siegel zeroes play no role whatsoever in establishing this proposition, in contrast to what would have occurred if we retained the original weight
$\Lambda $
instead of
$\Lambda _N$
. However, three components of the argument of Theorem 1.3 still require some additional care. The first is a polynomial improving estimate

for
$p\in (2-c_P,2]$
, with
$c_P>0$
small (see Lemma 5.1). This is eventually reduced to the analogous unweighted improving estimate using some properties of the Cramér approximant, in particular, Corollary 4.4.
The second component is the p-adic estimates, in which the averaging operator in equation (1.5) ends up being replaced by the variant

It is necessary to bound the
$L^2(\mathbb {Z}_p) \times L^2(\mathbb {Z}_p) \to L^q(\mathbb {Z}_p)$
norm of this operator by exactly the constant
$1$
when
$q>2$
is close to
$2$
and p is large; losing a multiplicative factor such as
$1+O(1/p)$
would not be acceptable as one needs to multiply these constants over all primes p. Fortunately, the effect of restricting to the invertible elements
$\mathbb {Z}_p^\times $
of
$\mathbb {Z}_p$
is not too severe and the arguments from [Reference Krause, Mirek and Tao13] can be adapted with only a modest amount of effort to avoid any losses of
$O(1/p)$
in the constants.
The most delicate step is to adapt the single-scale estimate in equation (1.2) to the weighted setting. As the Peluse–Prendiville theory is somewhat complicated, our approach is to use the approximation theory from [Reference Teräväinen26] to try to replace the approximant
$\Lambda _N$
with an approximant closer to the constant weight
$1$
. With the theory of the Cramér approximant from [Reference Tao and Teräväinen25], it is not too difficult to replace
$\Lambda _N$
by a Cramér approximant
$\Lambda _{\mathrm{Cram}\acute{\rm e}\mathrm{r}, w}$
for a smaller parameter w, with error terms polynomial in w. However, a technical problem then arises: this approximant is not a pure ‘Type I’ sum of the form
$\sum _{d\mid n} \lambda _d$
for certain well-behaved weights
$\lambda _d$
, preventing one from removing the weight entirely. To resolve this, we appeal to the theory from [Reference Teräväinen26] once more to replace the Cramér approximant
$\Lambda _{\mathrm{Cram}\acute{\rm e}\mathrm{r}, w}$
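with a more Fourier-analytic approximant, which we call the Heath-Brown approximant (as it was introduced by him in [Reference Heath-Brown9]). This approximant is defined by
$$ \begin{align*} \Lambda_{\operatorname{HB},Q}(n) := \sum_{q < Q} \frac{\mu(q)}{\varphi(q)} c_q(n), \end{align*} $$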
where Q is a parameter of similar size to w and
$c_q$
is a Ramanujan sum; roughly speaking, this approximant is the main term in the Fourier restriction of the von Mangoldt function to major arcs. By using the analysis of the little Gowers uniformity norms of Type I sums from [Reference Matomäki and Shao17], we are able to show that
$\Lambda _{\mathrm{Cram}\acute{\rm e}\mathrm{r}, w}$
is close in these norms to
$\Lambda _{\operatorname {HB},w}$
and then, by the theory from [Reference Teräväinen26] (and a dyadic decomposition), one can replace the former by the latter, at least for the purposes of proving an ‘
$\ell ^\infty $
’ Peluse–Prendiville inverse theorem for weighted averages. As in [Reference Krause, Mirek and Tao13], it is also necessary to obtain a more delicate ‘
$\ell ^2$
’ inverse theorem, which requires a weighted version of the
$\ell ^p$
improving inequality from [Reference Han, Kovač, Lacey, Madrid and Yang8], but this can be achieved by a variant of the arguments just presented.
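The Fourier-analytic approximant described above can be explored numerically. The sketch below assumes the commonly used form $\sum_{q<Q} \frac{\mu(q)}{\varphi(q)} c_q(n)$ for the Heath-Brown approximant (the exact summation range is an assumption here) and uses the classical identity $c_q(n) = \sum_{d \mid (q,n)} \mu(q/d)\, d$ for the Ramanujan sum. All function names are hypothetical.

```python
from fractions import Fraction
from math import gcd


def mobius(n: int) -> int:
    """Mobius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # divisible by p^2
            result = -result
        p += 1
    return -result if n > 1 else result


def totient(q: int) -> int:
    """Euler totient: number of residues mod q coprime to q."""
    return sum(1 for a in range(1, q + 1) if gcd(a, q) == 1)


def ramanujan_sum(q: int, n: int) -> int:
    """c_q(n) = sum over common divisors d of q and n of mu(q/d) * d."""
    return sum(
        mobius(q // d) * d
        for d in range(1, q + 1)
        if q % d == 0 and n % d == 0
    )


def heath_brown_approximant(n: int, Q: int) -> Fraction:
    """Assumed form: sum_{q < Q} mu(q)/phi(q) * c_q(n)."""
    return sum(
        (Fraction(mobius(q), totient(q)) * ramanujan_sum(q, n) for q in range(1, Q)),
        Fraction(0),
    )


# With Q = 3 only q = 1, 2 contribute, and the result is 2 on odd n and 0 on
# even n, which coincides with the Cramer-type weight for W = 2.
small_values = [heath_brown_approximant(n, 3) for n in range(1, 7)]
```

This toy case shows the two approximants agreeing exactly; for larger parameters they differ, and the text above only asserts closeness in the little Gowers uniformity norms.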
Remark 1.5. The proof of Theorem 1.3 quickly yields a version of Peluse’s inverse theorem [Reference Peluse23, Theorem 3.3] with prime weights. This was not needed for proving Theorem 1.3 (what we did need was in essence a version with the weight function
$\Lambda _N$
; see Proposition 5.3), but we believe such a result may be of independent interest, so we record it as Theorem 6.1. Some combinatorial applications of this result will be investigated in a future work.
Remark 1.6. We expect the methods of this paper to be applicable also to pointwise convergence of bilinear polynomial ergodic averages weighted by some other weights of arithmetic interest. The exact requirements for the weight are not so easy to axiomatize, but we need the weight to satisfy analogues of equations (3.1)–(3.4), as well as a suitable ‘local-to-global’ factorization over the primes to be able to pass to the adeles. In particular, we expect the methods to be applicable to ergodic averages weighted by the divisor function
$\tau $
, but we will not pursue this problem here.
2 Notation
2.1 General notation
Our notation largely follows [Reference Krause, Mirek and Tao13], though somewhat abridged, as some of the notation in [Reference Krause, Mirek and Tao13] is only used to establish results or arguments that we are treating here as ‘black boxes’.
We use
$\mathbb {Z}_+ := \{1,2,\ldots \}$
to denote the positive integers and
$\mathbb {N} := \{0,1,2,\ldots \}$
to denote the natural numbers.
We use
$1_E$
to denote the indicator function of a set E. Similarly, if S is a statement, we use
$1_S$
to denote its indicator, equal to
$1$
if S is true and
$0$
if S is false. Thus, for instance,
$1_E(x) = 1_{x \in E}$
. We use
$|E|$
to denote the cardinality of a set E and adopt for
$f\colon E\to \mathbb {C}$
the averaging notation
$$ \begin{align*} \mathbb{E}_{x \in E} f(x) := \frac{1}{|E|} \sum_{x \in E} f(x) \end{align*} $$
if E is finite and non-empty. We similarly define
$L^p$
norms
$$ \begin{align*} \|f\|_{L^p(E)} := ( \mathbb{E}_{x \in E} |f(x)|^p )^{1/p} \end{align*} $$
for
$0 < p < \infty $
, with the usual convention that
$\|f\|_{L^\infty (E)}$
is the (essential) supremum of f on E. One can extend these averaging conventions to other measurable spaces E of positive finite measure (such as a p-adic group
$\mathbb {Z}_p$
equipped with Haar probability measure), if f (or
$|f|^p$
) is absolutely integrable, in the obvious fashion. When X is equipped with counting measure, we will write
$\ell ^p(X)$
or just
$\ell ^{p}$
in place of
$L^p(X)$
.
Throughout,
$p'$
denotes the dual exponent of
$p\in [1,\infty ]$
, so
$1/p+1/p'=1$
.
If
$f \colon X \to \mathbb {C}$
,
$g \colon Y \to \mathbb {C}$
are functions, we use
$f \otimes g \colon X \times Y \to \mathbb {C}$
to denote the tensor product
$$ \begin{align*} (f \otimes g)(x,y) := f(x) g(y). \end{align*} $$
2.2 Magnitudes and asymptotic notation
We use the Japanese bracket notation
$$ \begin{align*} \langle x \rangle := (1 + |x|^2)^{1/2} \end{align*} $$
for any real or complex x. We use
$\lfloor x \rfloor $
to denote the greatest integer less than or equal to x. For any
$N \geq 1$
, we define the logarithmic scale
$\operatorname {Log} N$
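of N by the formula
$$ \begin{align*} \operatorname{Log} N := \bigg\lfloor \frac{\log N}{\log 2} \bigg\rfloor; \end{align*} $$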
thus
$\operatorname {Log} N$
is the unique natural number such that
$2^{\operatorname {Log} N} \leq N < 2^{\operatorname {Log} N+1}$
.
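The defining property $2^{\operatorname {Log} N} \leq N < 2^{\operatorname {Log} N+1}$ can be checked directly. The following is a small sketch (the name `log_scale` is hypothetical) that computes the logarithmic scale for a real $N \geq 1$ and guards against floating-point rounding near powers of two.

```python
import math


def log_scale(N: float) -> int:
    """Return the unique natural number k with 2**k <= N < 2**(k + 1)."""
    if N < 1:
        raise ValueError("the logarithmic scale is only defined for N >= 1")
    k = math.floor(math.log2(N))
    # Correct any off-by-one caused by rounding in log2.
    while 2.0 ** k > N:
        k -= 1
    while 2.0 ** (k + 1) <= N:
        k += 1
    return k


scales = [log_scale(N) for N in (1, 2, 3.9, 1023, 1024)]
```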
For any two quantities
$A, B$
, we will write
$A \lesssim B$
,
$B \gtrsim A$
or
$A = O(B)$
to denote the bound
$|A| \leq CB$
for some absolute constant C. If we need the implied constant C to depend on additional parameters, we will denote this by subscripts; thus, for instance,
$A \lesssim _\rho B$
denotes the bound
$|A| \leq C_\rho B$
for some
$C_\rho $
depending on
$\rho $
. We write
$A \sim B$
for
$A \lesssim B \lesssim A$
. To abbreviate the notation, we will sometimes explicitly permit the implied constant to depend on certain fixed parameters (such as the polynomial P) when the issue of uniformity with respect to such parameters is not of relevance. Due to our reliance in some places on tools based on Siegel’s theorem (specifically, Siegel’s theorem is used in [Reference Matomäki, Shao, Tao and Teräväinen18], and we will use results from that paper to establish equation (3.1)), several of the implied constants in our arguments will be ineffective, but we will not track the effectivity of constants explicitly in this paper.
2.3 Algebraic notation
If R is a commutative ring, we use
$R^\times $
to denote the multiplicatively invertible elements of R.
2.4 Number theoretic notation
For any
$N> 0$
,
$[N]$
denotes the discrete interval
$[N] := \{ n \in \mathbb {Z}_+ \colon n \leq N \}$
. If
$q_1,q_2 \in \mathbb {Z}_+$
, we write
$q_1\mid q_2$
if
$q_1$
divides
$q_2$
. If
$a,q \in \mathbb {Z}_+$
, we let
$(a,q)$
denote the greatest common divisor of a and q, and
$[a,q]$
the least common multiple.
All sums and products over the symbol p will be understood to be over primes; other sums will be understood to be over positive integers unless otherwise specified.
In addition to the von Mangoldt function
$\Lambda (n)$
and Möbius function
$\mu (n)$
already introduced, we will also use the divisor function
$\tau (n) := \sum _{d \mid n} 1$
and the Euler totient function
$\varphi (n) := |(\mathbb {Z}/n\mathbb {Z})^\times |$
.
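For readers who wish to experiment, the following sketch implements these arithmetic functions by brute force, using their standard definitions: $\Lambda(n) = \log p$ when $n$ is a power of a prime $p$ and $0$ otherwise, $\tau(n)$ the number of positive divisors, and $\varphi(n)$ the number of residues modulo $n$ coprime to $n$. The function names are hypothetical, and the implementations are meant for small inputs only.

```python
import math


def von_mangoldt(n: int) -> float:
    """log p if n = p^k for a prime p and k >= 1, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            # p is the smallest prime factor; strip it out completely.
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)  # no factor up to sqrt(n), so n itself is prime


def divisor_count(n: int) -> int:
    """tau(n): number of positive divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def euler_totient(n: int) -> int:
    """phi(n): number of a in [1, n] with gcd(a, n) = 1."""
    return sum(1 for a in range(1, n + 1) if math.gcd(a, n) == 1)


# Sanity check of the classical identity sum_{d | n} Lambda(d) = log n.
n = 360
lhs = sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)
```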
2.5 Fourier analytic notation
We write
$e(\theta ) := e^{2\pi i \theta }$
for any real
$\theta $
, and also
$\|\theta \|_{\mathbb {R}/\mathbb {Z}}$
for the distance from
$\theta $
to the nearest integer.
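A minimal sketch of these two quantities, assuming the standard convention $e(\theta) = e^{2\pi i \theta}$; the function names are hypothetical. The last line illustrates the elementary inequality $|e(\theta) - 1| = 2|\sin(\pi\theta)| \geq 4\|\theta\|_{\mathbb{R}/\mathbb{Z}}$, which is a standard fact and not a claim taken from this paper.

```python
import cmath
import math


def e(theta: float) -> complex:
    """The additive character e(theta) = exp(2 pi i theta)."""
    return cmath.exp(2j * math.pi * theta)


def dist_to_nearest_integer(theta: float) -> float:
    """Distance from theta to the nearest integer, a value in [0, 1/2]."""
    return abs(theta - round(theta))


# |e(theta) - 1| controls the distance to the nearest integer from above.
holds = all(
    abs(e(t / 100) - 1) >= 4 * dist_to_nearest_integer(t / 100) - 1e-12
    for t in range(-200, 201)
)
```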
For a prime p, we let
$\mathbb {Z}_p$
be the ring of p-adic integers, defined as the inverse limit of the cyclic groups
$\mathbb {Z}/p^j\mathbb {Z}$
for
$j \in \mathbb {N}$
; this is a compact abelian group equipped with a Haar probability measure. Similarly, let
$\hat {\mathbb {Z}}$
be the ring of profinite integers, defined as the inverse limit of the cyclic groups
$\mathbb {Z}/Q\mathbb {Z}$
for all positive integers Q; this is again a compact abelian group with a Haar probability measure, being the direct product of the
$\mathbb {Z}_p$
. We use
$\mathbb {E}_{\mathbb {Z}_p}$
or
$\mathbb {E}_{\hat {\mathbb {Z}}}$
to denote averaging with respect to these compact abelian groups. Finally, we let
$\mathbb {A}_{\mathbb {Z}} := \mathbb {R} \times \hat {\mathbb {Z}}$
denote the ring of adelic integers, which is a locally compact abelian group.
We define some Fourier transforms on various locally compact abelian groups.
-
(i) Given a summable function
$f \colon \mathbb {Z} \to \mathbb {C}$ , the Fourier transform
$\mathcal {F}_{\mathbb {Z}} f \colon \mathbb {R}/\mathbb {Z} \to \mathbb {C}$ is defined by the formula
-
(ii) Given a Schwartz function
$f \colon \mathbb {R} \to \mathbb {C}$ , the Fourier transform
$\mathcal {F}_{\mathbb {R}} f \colon \mathbb {R} \to \mathbb {C}$ is defined by the formula
-
(iii) Given a function
$f \colon \hat {\mathbb {Z}} \to \mathbb {C}$ which is Schwartz–Bruhat in the sense that it factors through a function
$f_Q \colon \mathbb {Z}/Q\mathbb {Z} \to \mathbb {C}$ on a cyclic group, we define the Fourier transform
$\mathcal {F}_{\hat {\mathbb {Z}}} f \colon \mathbb {Q}/\mathbb {Z} \to \mathbb {C}$ by the formula
-
(iv) Given a function
$f \colon \mathbb {A}_{\mathbb {Z}} \to \mathbb {C}$ which is Schwartz–Bruhat in the sense that it factors through a function
$f_Q \colon \mathbb {R} \times \mathbb {Z}/Q\mathbb {Z} \to \mathbb {C}$ which is Schwartz in the first variable, we define the Fourier transform
$\mathcal {F}_{\mathbb {A}} f \colon \mathbb {R} \times \mathbb {Q}/\mathbb {Z} \to \mathbb {C}$ by the formula
$\xi \in \mathbb {R}$ , and
$\mathcal {F}_{\hat {\mathbb {A}}}$ vanishing otherwise.
We refer the reader to [Reference Krause, Mirek and Tao13, §4] for a further discussion of the Fourier transform on such locally compact abelian groups as
$\mathbb {Z}$
,
$\mathbb {R}$
,
$\mathbb {Z}_p$
,
$\hat {\mathbb {Z}}$
,
$\mathbb {Z}/Q\mathbb {Z}$
or
$\mathbb {A}_{\mathbb {Z}}$
, and the various intertwining relationships among these transforms.
Given a Schwartz symbol
$m \colon \mathbb {R}/\mathbb {Z} \to \mathbb {C}$
, we define the Fourier multiplier
$\mathrm {T}_m$
on
$\ell ^2(\mathbb {Z})$
by the formula

and, similarly, given a bilinear Schwartz symbol
$m \colon \mathbb {R}/\mathbb {Z} \times \mathbb {R}/\mathbb {Z} \to \mathbb {C}$
, define the bilinear Fourier multiplier
$\mathrm {B}_m$
by the formula

Linear and bilinear multipliers are defined similarly for the other locally compact abelian groups defined here, and obey a certain operator calculus; again, we refer the reader to [Reference Krause, Mirek and Tao13, §4] for details, as we shall largely use facts and arguments about these operators from [Reference Krause, Mirek and Tao13] as ‘black boxes’.
We will need the Ionescu–Wainger Fourier multipliers on major arcs. Again, we shall mostly be using these tools as ‘black boxes’, so their definition and properties are not of critical importance in this paper; however, for the sake of completeness, we recall the main definitions from [Reference Krause, Mirek and Tao13]. Given a small parameter
$\rho $
, it is possible to assign an Ionescu–Wainger height
$\mathrm {h}(\alpha )=\mathrm {h}_{\rho }(\alpha ) \in 2^{\mathbb {N}}$
for each
$\alpha \in \mathbb {Q}/\mathbb {Z}$
; see [Reference Krause, Mirek and Tao13, Appendix A]. Using this height, we define the Ionescu–Wainger arithmetic frequency sets

and the Ionescu–Wainger major arcs

thus,
${\mathcal M}_{\leq l, \leq k}$
is the union of arcs
$[\alpha -2^k, \alpha +2^k]$
for
$\alpha \in (\mathbb {Q}/\mathbb {Z})_{\leq l}$
; we will be focused on the regime where k is sufficiently small that these arcs are disjoint, which happens whenever
$k \leq -C_\rho 2^{\rho l}$
. We also use the variants

and

with the convention that
$(\mathbb {Q}/\mathbb {Z})_{\leq -1}$
and
${\mathcal M}_{\leq -1,k}$
are empty.
The Ionescu–Wainger Fourier projection operator
$\Pi _{\leq l, \leq k}$
for any
$(l,k) \in \mathbb {N} \times \mathbb {Z}$
is defined by the formula

where
$\eta $
is a smooth even function supported on
$[-1,1]$
that equals
$1$
on
$[-1/2,1/2]$
. We then define

We refer the reader to [Reference Krause, Mirek and Tao13, §5, Appendix A] for the key properties of these projections, which can be viewed as analogues of Littlewood–Paley projection operators for major arcs.
2.6 Variational norms
A sequence
$1 \leq N_1 < \cdots < N_k$
of positive reals is said to be
$\lambda $
-lacunary for some
$\lambda \geq 1$
if

for all
$1 \leq j < k$
.
For any finite-dimensional normed vector space
$(B,\|\cdot \|_B)$
and any sequence
$(\mathfrak a_t)_{t\in \mathbb {I}}$
of elements of B indexed by a totally ordered set
$\mathbb {I}$
, and any exponent
$1 \leq r < \infty $
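, the r-variation seminorm is defined by the formula
$$ \begin{align*} \| (\mathfrak a_t)_{t \in \mathbb{I}} \|_{V^r(\mathbb{I};B)} := \sup_{J \in \mathbb{Z}_+} \sup_{\substack{t_0 < \cdots < t_J \\ t_j \in \mathbb{I}}} \bigg( \sum_{j=0}^{J-1} \| \mathfrak a_{t_{j+1}} - \mathfrak a_{t_j} \|_B^r \bigg)^{1/r}, \end{align*} $$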
where the supremum is taken over all finite increasing sequences in
$\mathbb {I}$
and is set by convention to equal zero if
$\mathbb {I}$
is empty.
The r-variation norm for
$1 \leq r < \infty $
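is defined by
$$ \begin{align*} \| (\mathfrak a_t)_{t \in \mathbb{I}} \|_{\mathbf{V}^r(\mathbb{I};B)} := \sup_{t \in \mathbb{I}} \| \mathfrak a_t \|_B + \| (\mathfrak a_t)_{t \in \mathbb{I}} \|_{V^r(\mathbb{I};B)}. \end{align*} $$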
This clearly defines a norm on the space of functions from
$\mathbb {I}$
to B. If
$B=\mathbb {C}$
, then we will abbreviate
$V^r(\mathbb {I};\mathbb {C})$
to
$V^r(\mathbb {I})$
or
$V^r$
, and
$\mathbf {V}^r(\mathbb {I};\mathbb {C})$
to
$ \mathbf {V}^r(\mathbb {I})$
or
$\mathbf {V}^r$
.
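For a finite scalar sequence, the supremum over increasing subsequences in the variation seminorm can be computed exactly by dynamic programming. The sketch below assumes the standard definitions (the r-th root of the largest sum of r-th powers of jumps along an increasing subsequence, and the variation norm as the supremum norm plus this seminorm); the function names are hypothetical.

```python
def r_variation(seq: list[complex], r: float) -> float:
    """V^r seminorm of a finite scalar sequence.

    best[j] is the largest value of sum |a_{t_{i+1}} - a_{t_i}|^r over
    increasing index subsequences that end at index j.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    best = [0.0] * n
    for j in range(1, n):
        best[j] = max(best[i] + abs(seq[j] - seq[i]) ** r for i in range(j))
    return max(best) ** (1.0 / r)


def r_variation_norm(seq: list[complex], r: float) -> float:
    """Variation norm: sup-norm plus the V^r seminorm (assumed convention)."""
    if not seq:
        return 0.0
    return max(abs(a) for a in seq) + r_variation(seq, r)


# An oscillating sequence has large variation; a monotone one with the same
# range is cheapest to traverse in a single jump when r > 1.
oscillating = r_variation([0, 1, 0, 1], 2)
monotone = r_variation([0, 1, 3], 2)
```

The quadratic running time is acceptable for small experiments; the point is only to show how the exponent $r$ penalizes many small oscillations less than the total variation ($r=1$) does.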
2.7 Gowers norms
In addition to the little Gowers uniformity norm
$u^{d+1}[N]$
defined in equation (1.8), we will also need the full Gowers norm
$U^{d+1}[N]$
defined for functions
$f \colon \mathbb {Z} \to \mathbb {C}$
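as
$$ \begin{align*} \|f\|_{U^{d+1}[N]} := \frac{\| f 1_{[N]} \|_{U^{d+1}(\mathbb{Z})}}{\| 1_{[N]} \|_{U^{d+1}(\mathbb{Z})}}, \end{align*} $$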
where the
$U^{d+1}(\mathbb {Z})$
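norm is defined for finitely supported functions by the formula
$$ \begin{align*} \|f\|_{U^{d+1}(\mathbb{Z})}^{2^{d+1}} := \sum_{x, h_1, \ldots, h_{d+1} \in \mathbb{Z}} \prod_{\omega \in \{0,1\}^{d+1}} {\mathcal C}^{\omega_1 + \cdots + \omega_{d+1}} f(x + \omega_1 h_1 + \cdots + \omega_{d+1} h_{d+1}), \end{align*} $$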
where
$\omega = (\omega _1,\ldots ,\omega _{d+1})$
and
${\mathcal C}$
denotes the complex conjugation operator. It is well known that
$$ \begin{align*} \|f\|_{u^{d+1}[N]} \lesssim_d \|f\|_{U^{d+1}[N]}; \end{align*} $$
see, e.g. [Reference Green and Tao5, equation (2.2)].
Similar uniformity norms
$u^{d+1}(I)$
,
$U^{d+1}(I)$
can then be defined for other intervals
$I \subset \mathbb {R}$
than
$[N]$
in the obvious fashion.
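As a small illustration of the full Gowers norm, the following sketch computes the lowest-order case $U^2$ by brute force, assuming the standard definition $\|f\|_{U^2(\mathbb{Z})}^4 = \sum_{x,h_1,h_2} f(x) \overline{f(x+h_1)}\, \overline{f(x+h_2)} f(x+h_1+h_2)$ and the normalization by the indicator of $[N]$. The paper itself works with $U^{d+1}$ for $d \geq 2$, so this is only a toy case, and all names are hypothetical.

```python
import cmath
from itertools import product
from typing import Callable


def gowers_u2_fourth_power(f: dict[int, complex]) -> complex:
    """Sum over x, h1, h2 of f(x) conj f(x+h1) conj f(x+h2) f(x+h1+h2).

    f is given as a dict on its (finite) support; y = x + h1, z = x + h2.
    """
    total = 0j
    for x, y, z in product(list(f), repeat=3):
        u = y + z - x  # the fourth vertex x + h1 + h2
        if u in f:
            total += f[x] * complex(f[y]).conjugate() * complex(f[z]).conjugate() * f[u]
    return total


def u2_norm_on_interval(f: Callable[[int], complex], N: int) -> float:
    """Normalized U^2[N] norm: ||f 1_[N]||_{U^2} / ||1_[N]||_{U^2}."""
    values = {n: f(n) for n in range(1, N + 1)}
    ones = {n: 1 for n in range(1, N + 1)}
    ratio = abs(gowers_u2_fourth_power(values)) / abs(gowers_u2_fourth_power(ones))
    return ratio ** 0.25


def linear_phase(alpha: float) -> Callable[[int], complex]:
    """The function n -> e(alpha n)."""
    return lambda n: cmath.exp(2j * cmath.pi * alpha * n)


# Linear phases are invisible to the U^2 norm: the normalized norm is 1.
phase_norm = u2_norm_on_interval(linear_phase(0.3), 5)
```

For the indicator of $[N]$ the fourth power equals the number of additive quadruples in $[N]$, namely $(2N^3+N)/3$, which gives a quick correctness check.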
3 High-level proof of theorem
We now describe the high-level proof of Theorem 1.3, reducing it to two key statements (Theorem 3.2 and Proposition 3.4) that we will prove in §5. The arguments here will closely follow those of [Reference Krause, Mirek and Tao13], and some familiarity with the arguments in that paper is highly recommended in order to follow the text in this section.
In the next section, we shall introduce an approximant
$\Lambda _N \colon \mathbb {N} \to \mathbb {R}$
to
$\Lambda $
(depending on a parameter
$C_0$
) which enjoys the bound

for any
$A>0$
, as well as the pointwise bound

the
$L^1$
bound

and finally the polynomial improving bound

for all
$u_P< p \leq 2$
and
$g \in \ell ^p(\mathbb {Z})$
, with
$u_P < 2$
an exponent depending only on P, and
$C>0$
a constant also depending only on P.
We shall also require further properties of
$\Lambda _N$
in the following as needed. (Our choice of approximant
$\Lambda _N$
will in fact be non-negative and, although this is not crucial, it makes it easier to establish the
$L^1$
bound in equation (3.3) and the improving bound in equation (3.4).)
Arguing as in the proof of [Reference Krause, Mirek and Tao13, Proposition 3.2(i)] (inserting the non-negative weight
$\Lambda $
as necessary), we see that the pointwise convergence claim of Theorem 1.3 follows from the ‘Hölder variational estimate’ in equation (1.9), so we focus now on this estimate. Henceforth, we fix
$p_1,p_2,p,d,P,r,\lambda $
, as well as the finite
$\lambda $
-lacunary set
$\mathbb {D}$
. We allow all constants to depend on
$p_1, p_2, p, d, P, r, \lambda $
(but not on
$\mathbb {D}$
). As in [Reference Krause, Mirek and Tao13, §5], we now select sufficiently large parameters

By a routine application of Calderón’s transference principle [Reference Krause, Mirek and Tao13, Theorem 3.2(ii)], adapted to this weighted setting, it suffices to prove equation (1.9) for the integer shift system
$(\mathbb {Z}, |\cdot |, x \mapsto x-1)$
, endowed with counting measure
$|\cdot |$
. Thus, our task is now to show that

for all
$f \in \ell ^{p_1}(\mathbb {Z})$
and
$g \in \ell ^{p_2}(\mathbb {Z})$
. Arguing as in the proof of [Reference Krause, Mirek and Tao13, Proposition 3.2(iii)] (inserting the weight
$\Lambda $
as needed), it suffices to prove the ‘upper half’

of this estimate, where the averaging operators
$\tilde {\mathrm {A}}_{N,w}$
were defined in equation (1.1).
The next step is to replace the von Mangoldt weight
$\Lambda $
by the approximant
$\Lambda _N$
.
Lemma 3.1. (From
$\Lambda $
to
$\Lambda _N$
)
To prove equation (3.5) (and hence, equation (1.9)), it suffices to show that

Proof. Assuming equation (3.6), from the triangle inequality and the lacunarity of
$\mathbb {D}$
, we see that equation (3.5) reduces to the single-scale estimate

for each
$N \in \mathbb {D}$
.
Using the triangle and Hölder inequalities, the prime number theorem and the hypothesis in equation (3.3), we may bound

so by interpolation (modifying the exponents
$p_1,p_2,p$
as needed), it suffices to prove the
$\ell ^2 \times \ell ^2 \to \ell ^1$
bound

for any
$A>0$
.
We claim that it suffices to prove equation (3.7) when
$f,g$
are supported on intervals of length
$N^d$
. Write

Let
$C=C_P$
be such that
$\{P(n)\colon n\in [N]\}$
is contained in an interval of length
$CN^d$
. Supposing that equation (3.7) holds whenever
$f,g$
are supported on intervals of length
$N^d$
, by the triangle inequality and Cauchy–Schwarz, we have

Assume henceforth that
$f,g$
are supported on intervals of length
$N^d$
in equation (3.7). By translation, we can further assume that g is supported on
$[N^d]$
.
By duality, for some function
$h\in \ell ^{\infty }(\mathbb {Z})$
with
$|h|\leq 1$
, we have

where

is one of the adjoint averaging operators. By Cauchy–Schwarz, the desired estimate in equation (3.7) follows from equation (3.8) if we show that

By equation (3.4) and the triangle inequality, for all
$u_P<q\leq 2$
, we have

However, from [Reference Teräväinen26, Theorem 4.1] (i.e. equation (1.7)), the assumption on the support of g and the hypotheses in equations (3.1) and (3.2), we have

for any
$A>0$
. Interpolating equations (3.9) and (3.10), the claim in equation (3.7) follows.
With this lemma, we can now pass to the approximant
$\Lambda _N$
.
We are left with showing equation (3.6). Note from equation (3.3) and the triangle and Hölder inequalities that
$\tilde {\mathrm {A}}_{N,\Lambda _N}$
is bounded from
$\ell ^{p_1}(\mathbb {Z}) \times \ell ^{p_2}(\mathbb {Z})$
to
$\ell ^{p}(\mathbb {Z})$
whenever
${1}/{p_1} + {1}/{p_2} = {1}/{p}$
; the challenge is to estimate all the scales N in
$\mathbb {D}$
simultaneously in
$\mathbf {V}^r$
norm. We can restrict attention to scales
$N \geq C_3$
, since the contribution of the case
$N < C_3$
can be handled just from the Hölder and triangle inequalities. The fact that the weight function
$\Lambda _N$
now depends on N will not significantly impact the arguments that follow.
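As a small illustration of what controlling all scales simultaneously in $\mathbf {V}^r$ means, the following sketch computes an r-variation seminorm of a finite sequence by dynamic programming. It assumes the standard definition (the supremum over increasing chains of indices of the $\ell ^r$ sum of the jumps, without an added supremum term); the precise normalization used in this paper is the one fixed in §2, and the function name is purely illustrative.

```python
def r_variation(seq, r):
    """r-variation seminorm of a finite sequence (assumed standard definition):
    the supremum over increasing index chains k_0 < k_1 < ... of
    (sum_j |seq[k_{j+1}] - seq[k_j]|^r)^(1/r)."""
    n = len(seq)
    if n < 2:
        return 0.0
    # best[j] = largest sum of r-th powers of jumps over chains ending at index j
    best = [0.0] * n
    for j in range(1, n):
        best[j] = max(best[i] + abs(seq[j] - seq[i]) ** r for i in range(j))
    return max(best) ** (1.0 / r)


if __name__ == "__main__":
    # An oscillating sequence has larger variation than a monotone one
    # with the same endpoints.
    print(r_variation([0, 1, 0, 1], 2))
    print(r_variation([0, 1], 2))
```

For a monotone sequence and $r>1$, the supremum is attained by the single jump between the endpoints, whereas an oscillating sequence is penalized for every oscillation; this is why a $\mathbf {V}^r$ bound is stronger than a bound at each individual scale.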
As in [Reference Krause, Mirek and Tao13, §5], we introduce the Ionescu–Wainger parameter

We use c to denote various small positive constants that can depend on the fixed quantities
$p_1, p_2, d, P, r$
, but do not depend on
$C_0,C_1,C_2,C_3$
(or
$\rho $
). As reviewed in §2.5, this allows us to create major arc sets
$\mathcal {M}_{\leq l, \leq k}$
,
$\mathcal {M}_{l,\leq k}$
for
$l \in \mathbb {N}$
,
$k \in \mathbb {Z}$
, as well as associated Ionescu–Wainger multipliers
$\Pi _{\leq l, \leq k}$
,
$\Pi _{l,\leq k}$
. As in [Reference Krause, Mirek and Tao13, equation (5.8)], we say that the pair
$(l,k)$
has good major arcs if

for some sufficiently large
$C_\rho $
depending only on
$\rho $
. This condition will always be satisfied in practice and will ensure that the intervals
$[\alpha -2^k, \alpha +2^k]$
that comprise
$\mathcal {M}_{\leq l, \leq k}$
in equation (2.2) are disjoint, thus avoiding any difficulties arising from ‘aliasing’.
In §5, we shall establish the following crucial variant of [Reference Krause, Mirek and Tao13, Theorem 5.12].
Theorem 3.2. (Single scale minor arc estimate)
Let
$N \geq 1$
,
$l \in \mathbb {N}$
, and suppose that
$f,g \in \ell ^2(\mathbb {Z})$
obey one of the following two properties:
-
(i)
$\mathcal {F}_{\mathbb {Z}} f$ vanishes on
$\mathcal {M}_{\leq l, \leq -\operatorname {Log} N+l}$ ;
-
(ii)
$\mathcal {F}_{\mathbb {Z}} g$ vanishes on
$\mathcal {M}_{\leq l, \leq -d\operatorname {Log} N + dl}$ .
Then, one has

As in [Reference Krause, Mirek and Tao13, equation (5.22)], we introduce the scales

and repeat the arguments in [Reference Krause, Mirek and Tao13, §5] all the way to [Reference Krause, Mirek and Tao13, equation (5.25)], inserting the weight
$\Lambda _N$
as needed, to reduce to establishing the bound

for all
$l_1,l_2 \in \mathbb {N}$
, where
.
Now, we fix
$l_1,l_2$
, and (as in [Reference Krause, Mirek and Tao13, equation (5.26)]) introduce the quantity

As in [Reference Krause, Mirek and Tao13, equations (5.27), (5.28)], we introduce the frequency-localized functions

and

for any integers
$-u \leq s_1, s_2 \leq l_{(N)}$
. Arguing as in the text up to [Reference Krause, Mirek and Tao13, Theorem 5.30], inserting the weight
$\Lambda _N$
as necessary, it now suffices to establish the following.
Theorem 3.3. (Variational paraproduct estimates)
Let
$l_1, l_2 \in \mathbb {N}$
,
, let
$f,g \colon \mathbb {Z} \to \mathbb {C}$
be finitely supported and define u by equation (3.11). Let
$s_1,s_2 \geq -u$
, and then let
,
,
be defined respectively by equations (3.12) and (3.13) and

Then,

Repeating the proof of [Reference Krause, Mirek and Tao13, Proposition 5.33], inserting the weight
$\Lambda _N$
as needed, we see that Theorem 3.3 already holds in the ‘high-high’ case where
$s_1,s_2> -u$
and
${p_1=p_2=2}$
. Thus, we may assume that at least one of the statements
$s_1=-u$
,
$s_2=-u$
or
$(p_1,p_2) \neq (2,2)$
holds.
We now begin the arguments in [Reference Krause, Mirek and Tao13, §7]. We introduce the functions

and note that

where

and

Repeating the arguments up to [Reference Krause, Mirek and Tao13, equation (7.7)], we thus see that it suffices to show that the tuple

is ‘acceptable’ in the sense that it has an
$\ell ^{p_0}(\mathbb {Z};\mathbf {V}^r)$
norm of

We introduce the arithmetic symbol
$m_{\hat {\mathbb {Z}}^\times } \colon (\mathbb {Q}/\mathbb {Z})^2 \to \mathbb {C}$
by the formula

for any
$q \in \mathbb {Z}_+$
and
$a_1,a_2 \in \mathbb {Z}$
; this differs from the corresponding symbol
$m_{\hat {\mathbb {Z}}}$
in [Reference Krause, Mirek and Tao13] by restricting n to the primitive residue classes of
$\mathbb {Z}/q\mathbb {Z}$
rather than all residue classes, which is a key effect of weighting by
$\Lambda $
. It is easy to see from the Chinese remainder theorem that
$m_{\hat {\mathbb {Z}}^\times }$
is well defined, in the sense that replacing
$a_1, a_2, q$
by
$k a_1, ka_2, kq$
for any positive integer k does not affect the right-hand side of equation (3.17). Given any Schwartz function
$m \colon \mathbb {R}^2 \to \mathbb {C}$
, we then define the twisted bilinear multiplier operator
$\mathrm {B}^{l_1,l_2,m_{\hat {\mathbb {Z}}^\times }}_{m}(f,g)$
for rapidly decreasing
$f,g \colon \mathbb {Z} \to \mathbb {C}$
by the formula

As in [Reference Krause, Mirek and Tao13, equation (7.9)], we also introduce the continuous symbol
$\tilde m_{N,\mathbb {R}} \colon \mathbb {R}^2 \to \mathbb {C}$
by the formula

and also the cutoff functions

for any integer k and frequency
$\xi \in \mathbb {R}$
, where
$\eta \colon \mathbb {R} \to [0,1]$
is a fixed smooth even function supported on
$[-1,1]$
that equals one on
$[-1/2,1/2]$
.
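The well-definedness of the arithmetic symbol under the rescaling $(a_1,a_2,q) \mapsto (ka_1,ka_2,kq)$ can be checked numerically. The sketch below assumes that the symbol is the average of $e((a_1 n + a_2 P(n))/q)$ over the primitive residue classes n modulo q, as described in the text, and uses the model polynomial $P(\mathrm {n}) = \mathrm {n}^2$; the function names are hypothetical.

```python
import cmath
from math import gcd


def e(x):
    """The standard character e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * cmath.pi * x)


def arithmetic_symbol(a1, a2, q, P=lambda n: n * n):
    """Average of e((a1*n + a2*P(n))/q) over primitive residues n mod q.

    Assumed form of the symbol; P defaults to the model case P(n) = n^2."""
    units = [n for n in range(q) if gcd(n, q) == 1]
    return sum(e((a1 * n + a2 * P(n)) / q) for n in units) / len(units)


if __name__ == "__main__":
    # Rescaling (a1, a2, q) by k = 4 should not change the value.
    print(arithmetic_symbol(1, 5, 6))
    print(arithmetic_symbol(4, 20, 24))
```

The invariance reflects the fact that the reduction map from $(\mathbb {Z}/kq\mathbb {Z})^\times $ to $(\mathbb {Z}/q\mathbb {Z})^\times $ is surjective with fibres of equal size, so averaging over the units modulo kq gives the same result as averaging over the units modulo q.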
In §5, we will prove the following analogue of [Reference Krause, Mirek and Tao13, Proposition 7.13].
Proposition 3.4. (Major arc approximation of
$\tilde A_{N,\Lambda _N}$
)
For any
$N \geq 1$
and
$s \in \mathbb {N}$
with
$-\operatorname {Log} N+s \leq -u$
, we have

for all
$\tilde F \in \ell ^{p_1}(\mathbb {Z}), \tilde G \in \ell ^{p_2}(\mathbb {Z})$
.
This is a slightly weaker type of bound than the corresponding result in [Reference Krause, Mirek and Tao13], as the polynomial gain of
$N^{-1}$
has been reduced to the quasipolynomial gain of
$\exp (-\operatorname {Log}^c N)$
. However, this is still good enough to dominate the
$2^{O(\max (2^{\rho l},s))}$
terms, since from [Reference Krause, Mirek and Tao13, equation (7.1)], one has

for all
$N \in \mathbb {I}$
. Because of this, we can repeat the Fourier-analytic arguments in [Reference Krause, Mirek and Tao13, §7] down to [Reference Krause, Mirek and Tao13, Theorem 7.23] with the obvious changes, and reduce to showing the acceptability of the small-scale model tuple

and the large-scale model tuple

where:
-
(i)
and
;
-
(ii)
;
-
(iii)
,
;
-
(iv) the adelic model functions
$F_{\mathbb {A}} \in L^{p_1}(\mathbb {A}_{\mathbb {Z}})$ ,
$G_{\mathbb {A}} \in L^{p_2}(\mathbb {A}_{\mathbb {Z}})$ are defined by the formulae
(3.22) and (3.23) for $x \in \mathbb {R}$, $y \in \hat {\mathbb {Z}}$.
We can then repeat the integration by parts arguments in the remainder of [Reference Krause, Mirek and Tao13, §7] (replacing
$m_{\hat {\mathbb {Z}}}$
by
$m_{\hat {\mathbb {Z}}^\times }$
) and reduce to establishing the small-scale model estimate

and the large-scale model estimate

whenever
$1/2 \leq t \leq 1$
and
$j_1,j_2 \in \{-1,0,+1\}$
are such that

where

and

To prove the small-scale estimate in equation (3.24), we use the two-dimensional Rademacher–Menshov inequality [Reference Krause, Mirek and Tao13, Corollary 8.2] by repeating the arguments of [Reference Krause, Mirek and Tao13, §8] (replacing
$m_{\hat {\mathbb {Z}}}$
by
$m_{\hat {\mathbb {Z}}^\times }$
), reducing matters to establishing the following single-scale estimate.
Lemma 3.5. (Single-scale estimate)
If
$\tilde F \in \ell ^{p_1}(\mathbb {Z}), \tilde G \in \ell ^{p_2}(\mathbb {Z})$
have Fourier support on
${\mathcal M}_{l_1, \leq -3u}$
and
${\mathcal M}_{l_2, \leq -3du}$
, respectively, then

However, this can be proven by repeating the proof of [Reference Krause, Mirek and Tao13, Lemma 8.6], using Proposition 3.4 in place of [Reference Krause, Mirek and Tao13, Proposition 7.13]; the replacement of
$m_{\hat {\mathbb {Z}}}$
with
$m_{\hat {\mathbb {Z}}^\times }$
makes no difference here, and the slight reduction in strength of Proposition 3.4 from a polynomial gain in N to a quasipolynomial gain in N is similarly manageable.
It remains to establish the large-scale estimate in equation (3.25). We repeat the arguments in [Reference Krause, Mirek and Tao13, §9], replacing
$m_{\hat {\mathbb {Z}}}$
by
$m_{\hat {\mathbb {Z}}^\times }$
, and noting that
$\mathrm {B}_{1 \otimes m_{\hat {\mathbb {Z}}^\times }}$
is the tensor product of the identity and the bilinear operator
$\mathrm {A}_{\hat {\mathbb {Z}}^\times }$
on the profinite integers defined for
$f \colon \mathbb {Z}/Q\mathbb {Z} \to \mathbb {C}$
,
$g \colon \mathbb {Z}/Q\mathbb {Z} \to \mathbb {C}$
for any Q (which one can also view as functions on
$\hat {\mathbb {Z}}$
in the obvious fashion) by the formula

These arguments reduce matters to establishing the following analogue of [Reference Krause, Mirek and Tao13, Theorem 9.9].
Theorem 3.6. (Arithmetic bilinear estimate)
Let
$l \in \mathbb {N}$
and let
$f, g \in L^2(\hat {\mathbb {Z}})$
obey one of the following hypotheses:
-
(i)
$\mathcal {F}_{\hat {\mathbb {Z}}} f$ vanishes on
$(\mathbb {Q}/\mathbb {Z})_{\leq l}$ ;
-
(ii)
$\mathcal {F}_{\hat {\mathbb {Z}}} g$ vanishes on
$(\mathbb {Q}/\mathbb {Z})_{\leq l}$ .
Then, for any
$1 \leq r < ({2d}/({d-1}))$
, one has

Repeating the arguments in [Reference Krause, Mirek and Tao13, §10] up to [Reference Krause, Mirek and Tao13, equations (10.3), (10.4)], using
$\mathrm {A}_{\hat {\mathbb {Z}}^\times }$
in place of
$\mathrm {A}_{\hat {\mathbb {Z}}}$
and Theorem 3.2 in place of [Reference Krause, Mirek and Tao13, Theorem 5.12], we see that it suffices to establish the p-adic bound

for all primes p, together with the improvement

whenever
$1 \leq q < ({2d}/({d-1}))$
and p is sufficiently large depending on q, where the averaging operator
$\mathrm {A}_{\mathbb {Z}_p^\times }$
is defined as

Because
$\mathbb {Z}_p^\times $
has density
$({p-1})/{p}$
in
$\mathbb {Z}_p$
, we have the pointwise bound

from the triangle inequality, where

Hence, equation (3.29) is immediate from [Reference Krause, Mirek and Tao13, equation (10.3)]. It remains to establish equation (3.30). As in [Reference Krause, Mirek and Tao13, §10], we may assume
$2 < q < ({2d}/({d-1}))$
and
$\|f\|_{L^2(\mathbb {Z}_p)} = \|g\|_{L^2(\mathbb {Z}_p)} = 1$
with
$f,g$
non-negative, in which case, our task is to show that

Applying equation (3.31) and the bound
$\|\mathrm {A}_{\mathbb {Z}_p}(|f|,|g|)\|_{L^q(\mathbb {Z}_p)}\leq 1$
from [Reference Krause, Mirek and Tao13, §10] would cost a factor of
$({p}/({p-1}))^q$
, which is not acceptable here (the product
$\prod _p ({p}/({p-1}))$
diverges). Instead, we follow the arguments in [Reference Krause, Mirek and Tao13, §10], decomposing
$f = a + f_0$
,
$g = b + g_0$
, where
$0 \leq a,b \leq 1$
,
$f_0, g_0$
have mean zero, and the ‘energies’

obey
$0 \leq E_f, E_g \leq 1$
and

In the case of
$\mathrm {A}_{\mathbb {Z}_p}$
, we clearly have

(as was observed in [Reference Krause, Mirek and Tao13, §10]) so that by linearity, we have

For the averaging operator
$\mathrm {A}_{\mathbb {Z}_p^\times }$
, the situation is slightly more complicated; we have

where
$h \colon \mathbb {Z}_p \to \mathbb {R}$
is the function

Since
$f_0$
has mean zero, h has mean zero as well. Furthermore, from Young’s convolution inequality, one has the bounds

where
$1/q + 1 = 1/2 + 1/r$
.
We now have the decomposition

and hence by the Taylor expansion
$(x+y)^q=x^q+qx^{q-1}y+O(q^2x^{q-2}y^2)$
(as in [Reference Krause, Mirek and Tao13, §10]), we have

Since
$a,b\in [0,1]$
, we can bound
$|ab|^q\leq |ab|^2 = (1-E_f)(1-E_g)$
. Furthermore,
${p}/({p-1}) b h$
has mean zero and
$\mathrm {A}_{\mathbb {Z}_p^\times }(f,g_0)$
has a mean of at most
$\| \mathrm {A}_{\mathbb {Z}_p^\times }(f_0,g_0)\|_{L^1(\mathbb {Z}_p)}$
since
$\mathrm {A}_{\mathbb {Z}_p^\times }(a,g_0)$
has mean zero. We conclude that

By arguing as in [Reference Krause, Mirek and Tao13, §10] (using Theorem 3.2 in place of [Reference Krause, Mirek and Tao13, Theorem 5.12]), we see that if l is any large integer and p is sufficiently large depending on q, we have the estimates

for some
$c_q>0$
depending only on q, and hence, by the arithmetic mean-geometric mean inequality and the hypothesis
$q> 2$
, we have

and the right-hand side is bounded by
$1$
for l and p large enough, as required.
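The reason the crude pointwise bound is unaffordable in the argument above is that the partial products of $\prod _p ({p}/({p-1}))$ grow without bound. The following sketch computes these partial products and compares them with the growth rate $e^{\gamma } \log x$ predicted by Mertens's theorem; the helper names are illustrative only, and the numerical constant for $\gamma $ is the usual Euler–Mascheroni constant.

```python
from math import exp, log

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def primes_up_to(x):
    """Sieve of Eratosthenes returning all primes p <= x."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            # Zero out the multiples of p starting from p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [n for n in range(2, x + 1) if sieve[n]]


def euler_product(x):
    """Partial product of p/(p-1) over primes p <= x."""
    result = 1.0
    for p in primes_up_to(x):
        result *= p / (p - 1)
    return result


if __name__ == "__main__":
    for x in (10, 100, 10_000):
        predicted = exp(EULER_GAMMA) * log(x)
        print(x, euler_product(x), predicted)
```

Since the partial products grow like a constant multiple of $\log x$, any argument that loses a factor of ${p}/({p-1})$ (or a power of it) at every prime cannot be closed, which is why a more careful treatment is needed for large p.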
To summarize, to complete the proof of Theorem 1.3, we need to select an approximant
$\Lambda _N$
to the weight
$\Lambda $
at each scale N that obeys the estimates in equations (3.1), (3.2), (3.3) and (3.4), as well as the single scale minor arc estimate in Theorem 3.2 and the major arc approximation in Proposition 3.4. This will be the focus of the next sections.
4 Approximants to the von Mangoldt function
As seen in the previous section, the arguments rely on using an approximant
$\Lambda _N$
to the von Mangoldt function
$\Lambda $
at scale N. There are several plausible candidates for such approximants, including the following.
-
(i)
$\Lambda $ itself.
-
(ii) A Cramér (or Cramér–Granville) approximant
where $w \geq 1$ is a parameter.
-
(iii) A Heath-Brown approximant
(4.1) where $Q \geq 1$ is a parameter and
$c_q(n)$ are the Ramanujan sums
(4.2)
Other possibilities for approximants exist, including Goldston–Pintz–Yıldırım type approximants
$(\log R) \sum _{\ell \mid n} \mu (\ell ) \eta (\log \ell /\log R)$
and
$(\log R) (\sum _{\ell \mid n} \mu (\ell ) \eta (\log \ell /\log R))^2$
for suitable level parameters R and smooth cutoffs
$\eta $
, Selberg sieve approximants
$(\sum _{\ell \mid n} \unicode{x3bb} _{\ell })^2$
, or adjustments to several of the previous approximants by a correction term arising from a Siegel zero, but we will not discuss these other options further here.
The choice of option (i) (that is, setting $\Lambda _N = \Lambda $) is tempting, particularly in view of recent advances in quantitative understanding of functions such as
$\Lambda $
in [Reference Leng15, Reference Tao and Teräväinen25]. However, it turns out that the presence of a Siegel zero would distort the asymptotics of
$\Lambda $
to such an extent that the desired approximation in Proposition 3.4 no longer holds with quasipolynomial error terms in N, which turns out to significantly complicate the analysis (particularly in the small-scale regime, in which one has to modify the Rademacher–Menshov type arguments significantly). See §6 for further discussion.
The choice of option (ii) has the advantage of being non-negative, reasonably well controlled in
$\ell ^\infty $
and also relatively easy to control in Gowers uniformity norms, and so we shall take such a choice for our approximant
$\Lambda _N$
; specifically, we will set

However, there is one aspect in which this approximant
$\Lambda _N(n)$
is not ideal: it is not exactly equal to a ‘Type I sum’
$\sum _{\ell \mid n} \unicode{x3bb} _{\ell }$
, where
$\unicode{x3bb} _{\ell }$
are weights supported on relatively small values of $\ell $. The Heath-Brown approximants
$\Lambda _{\operatorname {HB},Q}$
introduced in option (iii) are precisely Type I sums, and so we will switch to those approximants at a certain point in the proof.
To achieve these goals, we will need to collect some basic facts about the Cramér approximants
$\Lambda _{\mathrm{Cram}\acute{\rm e}\mathrm{r}, w}$
and the Heath-Brown approximants
$\Lambda _{\operatorname {HB},Q}$
, which may be of independent interest.
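For orientation, the Cramér approximant can be computed directly for small parameters. The sketch below assumes the usual normalization from the literature, namely $\Lambda _{\mathrm{Cram}\acute{\rm e}\mathrm{r}, w}(n) = ({W}/{\varphi (W)}) 1_{(n,W)=1}$ with $W = \prod _{p<w} p$; this convention, and the function names, are assumptions made for the purpose of illustration. Exact rational arithmetic is used so that the mean value over a full period can be verified exactly.

```python
from fractions import Fraction
from math import gcd, prod


def primes_below(w):
    """All primes p < w, by trial division (adequate for small w)."""
    return [p for p in range(2, w)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]


def cramer_approximant(n, w):
    """Assumed normalization: (W/phi(W)) * 1_{gcd(n, W) = 1}, W = prod_{p<w} p."""
    ps = primes_below(w)
    W = prod(ps)
    # W/phi(W) equals the product of p/(p-1) over the primes p < w.
    scale = prod((Fraction(p, p - 1) for p in ps), start=Fraction(1))
    return scale if gcd(n, W) == 1 else Fraction(0)


if __name__ == "__main__":
    w = 8  # primes 2, 3, 5, 7, so W = 210
    values = [cramer_approximant(n, w) for n in range(1, 211)]
    print("mean over one period:", sum(values) / 210)
    print("maximum value:", max(values))
```

Under this normalization, the approximant is non-negative, has mean exactly one over any full period W, and its maximum value is ${W}/{\varphi (W)}$, which is the quantity controlled by Mertens's theorem.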
4.1 Bounds on the Cramér approximant
We begin with the Cramér approximant. First, we record an easy uniform bound.
Lemma 4.1. (Uniform bound on Cramér model)
If
$w \geq 1$
, then

for all
$n \in \mathbb {Z}$
.
Proof. This is immediate from the Mertens theorem bound

The Cramér approximant is not easily expressible as an exact Type I sum once w is reasonably large (in particular, larger than
$\operatorname {Log} N$
), but thanks to the fundamental lemma of sieve theory, it can be approximated by such a sum.
Lemma 4.2. (Fundamental lemma of sieve theory)
If
$2 \leq w \leq y \leq N^{1/10}$
, then there exist weights
$\unicode{x3bb} ^\pm _{\ell } \in [-1,1]$
, supported on
$1\leq \ell \leq y$
, such that

for all n, and also

for any interval I of length N. In particular,

Proof. This follows easily from [Reference Iwaniec and Kowalski11, Lemma 6.3].
The fundamental lemma can then be used to give many good estimates for the Cramér model.
Proposition 4.3. (Linear equations in the Cramér model)
Let
$t,m \geq 1$
be integers and let
$N \geq 100$
. Let
$\Omega \subset [-N,N]^m$
be convex, and let
$\psi _1,\ldots ,\psi _t \colon \mathbb {Z}^m \to \mathbb {Z}$
be linear forms

for some
$\dot \psi _i \in \mathbb {Z}^m$
and
$\psi _i(0) \in \mathbb {Z}$
. Assume that the linear coefficients
$\dot \psi _1,\ldots ,\dot \psi _t \in \mathbb {Z}^m$
are all pairwise linearly independent and have magnitude at most
$\exp (\log ^{3/5} N)$
. Suppose that
$1 \leq z_i \leq \exp (\operatorname {Log}^{1/10} N)$
for all
$i=1,\ldots ,t$
. Then, one has

for some
$c>0$
depending only on
$t,m$
, where for each p,
$\beta _p$
is the local factor

where
$\psi _i$
is also viewed as a map from
$(\mathbb {Z}/p\mathbb {Z})^m$
to
$\mathbb {Z}/p\mathbb {Z}$
in the obvious fashion. Furthermore,
$\beta _p$
obeys the bounds

for all primes p (and
$\beta _p=1$
if
$p> \max (z_1,\ldots ,z_t)$
).
Proof. This is essentially [Reference Tao and Teräväinen25, Proposition 5.2] (which relies to a large extent on the fundamental lemma of sieve theory). Strictly speaking, this proposition only covered the case where the
$z_i$
were equal to a single parameter z which was also assumed to be at least
$2$
, but an inspection of the argument shows that it applies without significant difficulty to variable
$z_i$
as well, even if some of the
$z_i$
are as small as
$1$
. The bound in equation (4.4) follows from [Reference Tao and Teräväinen25, equations (5.2), (5.5)] (a slightly weaker bound, which also suffices for our application, can be found in [Reference Green and Tao6, Lemma 1.3]).
Specializing to the
$t=m=1$
case (and noting that the constant coefficients of
$\psi _i$
can be large in Proposition 4.3), we immediately obtain the following corollary.
Corollary 4.4. (Mean value of Cramér)
Let
$N \geq 100$
and
$1 \leq z \leq \exp (\operatorname {Log}^{1/10} N)$
, then

for any interval I of length N. In particular, since
$\Lambda _{\mathrm{Cram}\acute{\rm e}\mathrm{r},z}(n)$
is non-negative, we also have

More generally, if
$1 \leq q \leq z$
and
$a\ (q)$
is a residue class, then

As a more sophisticated application of Proposition 4.3, we record the following improvement of [Reference Tao and Teräväinen25, Proposition 1.2].
Lemma 4.5. (Improved stability of the Cramér model)
If
$1 \leq z,w \leq \exp (\operatorname {Log}^{1/10} N)$
and
$d \ge 1$
, then one has

for any interval I of length N. In particular, by equation (2.5),

In fact, one can take
$c = 1/2^{d+1}$
in these estimates.
The result in [Reference Tao and Teräväinen25, Proposition 1.2] had an additional term of
$\operatorname {Log}^{-c} N$
on the right-hand side. The removal of this term was already conjectured in [Reference Tao and Teräväinen25, Remark 5.4].
Proof. Without loss of generality, we may assume that
$z \leq w$
. Expanding out the expression
$\| \Lambda _{\mathrm{Cram}\acute{\rm e}\mathrm{r}, w} - \Lambda _{\mathrm{Cram}\acute{\rm e}\mathrm{r}, z} \|_{U^{d+1}(I)}^{2^{d+1}}$
into an alternating sum of
$2^{d+1}$
terms, it suffices to show that

for all choices of parameters
$w_\epsilon \in \{w,z\}$
, where
$\epsilon = (\epsilon _1,\ldots ,\epsilon _{d+1})$
and X is a quantity that is independent of the choice of parameters
$w_\epsilon $
. Applying Proposition 4.3, the left-hand side is

where
$\Omega $
is a certain explicit convex polytope of volume
$\beta _\infty N^{d+2}$
for some constant
$\beta _\infty $
depending only on d, and the local factors
$\beta _p$
are defined by the formula

The local factors
$\beta _p$
are independent of the
$w_\epsilon $
if
$p \leq z$
or
$p> w$
. Thus, by equation (4.4), the product
$\prod _p \beta _p$
can be written as
$Y(1+O(1/z))$
for some Y that is independent of the
$w_\epsilon $
parameters, and the claim follows.
4.2 Bounds on the Heath-Brown approximant
We now turn to the Heath-Brown approximants
$\Lambda _{\operatorname {HB},Q}$
. The nice bounds in
$\ell ^\infty $
or
$\ell ^1$
one has in Lemma 4.1 or Corollary 4.4 are unfortunately not available for this approximant. However, we have reasonable control in other norms such as
$\ell ^2$
, in large part due to a good Type I representation.
Lemma 4.6. (Moment bounds for Heath-Brown approximant)
For any
$Q \geq 1$
, one has the Type I representation

for some weights
$\unicode{x3bb} _{\ell }$
with

In particular, we have the pointwise bound

where
$\tau (n,Q)$
is the truncated divisor function

Furthermore, we have the moment bounds

for any positive integer k and
$N \geq 1$
.
Proof. Applying the standard identity
$c_q(n)=\sum _{\ell \mid (q,n)}\ell \mu (q/\ell )$
and then writing
$q=\ell r$
, we have

We then take

From Rankin’s trick and Mertens’s theorem, for any
$1\leq d\leq Q$
, one has

where we used the Euler product formula and the standard bound
$\zeta (\sigma )\sim {1}/{(\sigma -1)}$
for
$\sigma>1$
to estimate the product over the primes. This gives equation (4.6). The bound in equation (4.7) then follows from the triangle inequality.
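The identity $c_q(n)=\sum _{\ell \mid (q,n)}\ell \mu (q/\ell )$ and the resulting Type I representation can be verified numerically. The sketch below assumes that the Heath-Brown approximant has the form $\sum _{q<Q} ({\mu (q)}/{\varphi (q)}) c_q(n)$ and that, after writing $q=\ell r$, the weights are $\unicode{x3bb} _{\ell } = \ell \sum _{r < Q/\ell } \mu (r)\mu (\ell r)/\varphi (\ell r)$; both formulae are assumptions consistent with the derivation described here, and all function names are hypothetical.

```python
import cmath
from fractions import Fraction
from math import gcd


def mobius(n):
    """Moebius function via trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n is divisible by p^2
            result = -result
        p += 1
    return -result if n > 1 else result


def phi(n):
    """Euler totient by direct count (adequate for small n)."""
    return sum(1 for r in range(1, n + 1) if gcd(r, n) == 1)


def ramanujan_by_definition(q, n):
    """c_q(n) as a sum of e(rn/q) over primitive residues r mod q."""
    total = sum(cmath.exp(2j * cmath.pi * r * n / q)
                for r in range(1, q + 1) if gcd(r, q) == 1)
    return total.real


def ramanujan_by_divisors(q, n):
    """c_q(n) via the identity sum_{l | gcd(q, n)} l * mu(q / l)."""
    g = gcd(q, n)
    return sum(l * mobius(q // l) for l in range(1, g + 1) if g % l == 0)


def heath_brown(n, Q):
    """Assumed form: sum_{q < Q} mu(q)/phi(q) * c_q(n)."""
    return sum(Fraction(mobius(q), phi(q)) * ramanujan_by_divisors(q, n)
               for q in range(1, Q))


def type_one_weight(l, Q):
    """Weight obtained by writing q = l*r in the assumed definition."""
    return l * sum(Fraction(mobius(r) * mobius(l * r), phi(l * r))
                   for r in range(1, Q) if l * r < Q)


def heath_brown_type_one(n, Q):
    """Type I sum over divisors l of n with l < Q."""
    return sum(type_one_weight(l, Q) for l in range(1, Q) if n % l == 0)


if __name__ == "__main__":
    print([heath_brown(n, 12) for n in range(1, 7)])
    print([heath_brown_type_one(n, 12) for n in range(1, 7)])
```

The two lists printed in the usage example agree term by term, reflecting that the Type I representation is an exact identity rather than an approximation.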
Now, we turn to equation (4.8). We may assume that
$Q \geq 100$
, as the claim is trivial otherwise. We allow all implied constants to depend on k. In view of equation (4.7), it suffices to establish the bound

We expand

where
$[a_1,\ldots , a_k]$
is the least common multiple of
$a_1,\ldots , a_k$
.
Now, we apply Rankin’s trick. For
$\ell _i<Q$
, we have
$\ell _i^{1/\langle \operatorname {Log} Q\rangle } = O(1)$
, and thus,

Factorizing into an Euler product, we conclude that

where . Hence, on taking logarithms, it will suffice to show that

From partial summation and the prime number theorem, we have

Moreover, we can use Mertens’s theorem to estimate

Combining these bounds gives the result.
4.3 Comparing the Cramér and Heath-Brown approximants
We have a useful comparison theorem between the Cramér and Heath-Brown approximants.
Proposition 4.7. (Comparison between Cramér and Heath-Brown)
Let
$N \geq 1$
and
$1 \leq w, Q \leq \exp (\operatorname {Log}^{1/20} N)$
, and let
$d \geq 1$
be an integer. Then,

for any interval I of length N. As a consequence, from Lemma 4.5 and the triangle inequality, we also have

whenever
$1 \leq Q_1,Q_2 \leq \exp (\operatorname {Log}^{1/20} N)$
.
Proof. We allow all implied constants to depend on d. In view of Lemma 4.5 and the triangle inequality, it suffices to establish the bound

for any interval I of length N, that is to say, it suffices to show that

for any polynomial
$R(n) = \sum _{j=0}^d \alpha _j (n-n_I)^j$
of degree at most d with some real coefficients
$\alpha _j$
, where
$n_I$
denotes the midpoint of I. By subdividing I into smaller intervals and using the triangle inequality (adjusting the coefficients
$\alpha _j$
as necessary), we may assume without loss of generality that

We can then also assume that Q (and hence N) is large, as the claim is trivial otherwise. In particular,
$\operatorname {Log} N = \operatorname {Log}^{O(1)} Q$
, which in practice will permit us to absorb all logarithmic factors of N in the analysis below.
Fix the polynomial R. We may of course assume without loss of generality that

Applying Lemma 4.2 (with
$w=Q$
and
$y = \exp (\operatorname {Log}^{1/10} N)$
) as well as Lemma 4.6, we thus have

for some weights
$\unicode{x3bb} _{\ell }$
of size
$O(\operatorname {Log}^{O(1)} N) = O(\operatorname {Log}^{O(1)} Q)$
. Applying [Reference Matomäki and Shao17, Proposition 2.1] (after shifting the summation variable by
$n_I$
), we conclude that the polynomial R is major arc in the sense that there exists an integer
$1 \leq q \lesssim Q^{O(1)}$
such that

for all
$1 \leq j \leq d$
. We may assume that
$q\geq Q$
by multiplying q by an integer of size Q if necessary. Thus, one can write
$R(n) = R_0(n) + E(n)$
, where
$R_0$
is a polynomial of degree at most d that is periodic with period q and the error E satisfies
$\sup _{n\in I}|E(n+1)-E(n)|=O( Q^{O(1)}/N)$
.
Set

and thus,
$Q \leq w \lesssim Q^{O(1)}$
. By Lemma 4.5 and the triangle inequality, it will suffice to show that

Breaking up I into intervals J of length
$\sqrt {N}$
and using the slowly varying nature of
$E(n)$
, it suffices to show that

for any interval J of length
$\sqrt {N}$
.
From Corollary 4.4 and the q-periodicity of
$R_0$
, we have

(in fact, the error term is significantly better than this). Using the multiplicativity of the Ramanujan sums
$c_q(\cdot )$
and the fact that
, we have

We thus have

Note that for any natural numbers
$\ell ,a,q$
with
$\ell \nmid q$
, by the geometric sum formula, we have

Therefore, from equation (4.1) and the q-periodicity of
$e(R_0(n))$
, we have

(again, a better error term is available here). Thus, by the triangle inequality, it suffices to show that

By the divisor bound, q has at most
$Q^{o(1)}$
factors, so it will suffice to establish the bound

for each square-free
$\ell \mid q$
with
$\ell \geq Q$
. By the triangle inequality, it suffices to show that

However, from the Plancherel identity (or Bessel inequality) and the fact that
$\ell \leq q$
, one has

and the claim follows from Cauchy–Schwarz (noting from the hypothesis
$\ell \geq Q$
that
$\varphi (\ell ) \gtrsim Q^{1/2}$
, say, so that
$\varphi (\ell )^{1/2} \lesssim \varphi (\ell ) Q^{-1/4}$
).
5 Verifying the properties of the approximant
Recall the definition of
$\Lambda _N$
from equation (4.3). In this section, we verify the properties in equations (3.1), (3.2), (3.3) and (3.4) for
$\Lambda _N$
, and prove Proposition 3.4 and Theorem 3.2 concerning it.
Verifying equations (3.1), (3.2) and (3.3). The bound in equation (3.3) follows from Corollary 4.4, while the bound in equation (3.2) follows from Lemma 4.1. The bound in equation (3.1) follows, for instance, from [Reference Matomäki, Shao, Tao and Teräväinen18, Theorem 1.1(ii)] (and could also be extracted from the earlier arguments in [Reference Matomäki and Shao17]). (Strictly speaking, the results in [Reference Matomäki, Shao, Tao and Teräväinen18] were stated only for
$C_0=10$
, but an inspection of the arguments reveals that they also apply for larger choices of
$C_0$
.)
Verifying equation (3.4). We need the following weighted analogue of [Reference Krause, Mirek and Tao13, Proposition 6.21].
Lemma 5.1. (
$L^p$
improving)
Let
$Q \in \mathbb {Z}[\mathrm {n}]$
be of degree
$d\geq 1$
. If
$2-c_d < p \leq 2$
for some sufficiently small
$c_d>0$
, then

and also for the dual exponent
$p'=p/(p-1)$
, we have

The value of
$c_d$
here could be explicitly computed, but we do not attempt to optimize it here. After Lemma 5.1 has been proven, equation (5.1) together with the non-negativity of
$\Lambda _N$
immediately implies the required estimate in equation (3.4).
Proof. By interpolation (adjusting
$c_d$
as necessary), it suffices to show the second estimate in equation (5.1).
For any polynomial
$Q(\mathrm {n}) \in \mathbb {Z}[\mathrm {n}]$
, we define the averaging operators
$\mathrm {A}^{Q,0}_N, \ \mathrm {A}^Q_N \colon \ell ^p(\mathbb {Z}) \to \ell ^p(\mathbb {Z})$
by the formulae

First, the operators
$\mathrm {A}^Q_N, \mathrm {A}^{Q,0}_N$
are bounded on every
$\ell ^p(\mathbb {Z})$
thanks to equation (3.3) and the triangle inequality. With this notation, it suffices to show that

We can write
$\mathrm {A}^Q_N = \mathrm {A}^Q_{N, \exp (\operatorname {Log}^{1/C_0} N)}$
, where

On the one hand, from Lemma 4.1 and the results in [Reference Han, Kovač, Lacey, Madrid and Yang8] (see also [Reference Krause, Mirek and Tao13, Proposition 6.21]), we have

for any
$2-c < p \leq 2$
(where
$c>0$
depends on d and can vary from line to line). On the other hand, from Lemma 4.5, we have

for any
$1 \leq z \leq w \leq \exp (\operatorname {Log}^{1/C_0} N)$
.
By the Plancherel theorem, this implies that

Interpolating (and reducing c as necessary), we see that if
$2-c \leq p \leq 2$
, then

if
$1 \leq z \leq w \leq \exp (\operatorname {Log}^{1/C_0} N)$
is such that
$w^{1/2} \leq z$
. Summing this bound telescopically for suitable values of
$z, w$
, we conclude from the triangle inequality that

Combining this with the
$w=1$
case of equation (5.3), we obtain the first estimate in equation (5.2).
The second estimate in equation (5.2) follows similarly, except that in the proof, we replace equation (5.3) with

and replace equation (5.4) with

and use the first estimate in equation (5.2).
Proof of Proposition 3.4
Arguing as in the proof of [Reference Krause, Mirek and Tao13, Proposition 7.13], Proposition 3.4 reduces to establishing the symbol estimates

for
$0 \leq j_1,j_2 \leq 2$
,
$\alpha _1 \in (\mathbb {Q}/\mathbb {Z})_{l_1}$
,
$\alpha _2 \in (\mathbb {Q}/\mathbb {Z})_{l_2}$
and
$\xi _1 = O( 2^{s}/N)$
,
$\xi _2 = O( 2^{ds}/N^d)$
, where the symbol
$M_0$
is defined by the formula

As in the proof of [Reference Krause, Mirek and Tao13, Proposition 7.13], the function
$n \mapsto e(\alpha _1 n + \alpha _2 P(n))$
is periodic of some period

In particular, from equation (3.19), one has

and hence q divides W. So the function
$\Lambda _N(n)$
vanishes outside of the primitive residue classes modulo q. Meanwhile, we have

By the triangle inequality, it thus suffices to show for each
$a \in (\mathbb {Z}/q\mathbb {Z})^\times $
that

Evaluating the derivatives, it suffices to show that

where

The function w is smooth with a total variation of
$O( 2^{O(\max (2^{\rho l},s))} N^{j_1+2j_2})$
. Summing (or integrating) by parts as in [Reference Matomäki, Shao, Tao and Teräväinen18, Lemma 2.2(iii)], it suffices to show that

for all intervals I in
$[N,2N]$
. However, this follows from Corollary 4.4.
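The summation by parts step used above rests on a general identity: a sum weighted by a function of bounded total variation is controlled by the largest partial sum of the unweighted terms. The following sketch illustrates that generic mechanism (it is not the specific lemma cited from the reference); the sequences and function names are chosen only for illustration.

```python
def partial_sums(a):
    """Running sums S_k = a_0 + ... + a_k."""
    out, total = [], 0
    for x in a:
        total += x
        out.append(total)
    return out


def sum_by_parts(a, w):
    """Evaluate sum_n a_n w_n via Abel summation:
    S_{N-1} w_{N-1} - sum_{n < N-1} S_n (w_{n+1} - w_n)."""
    S = partial_sums(a)
    N = len(a)
    return S[N - 1] * w[N - 1] - sum(S[n] * (w[n + 1] - w[n])
                                     for n in range(N - 1))


def total_variation(w):
    """Discrete total variation sum_n |w_{n+1} - w_n|."""
    return sum(abs(w[n + 1] - w[n]) for n in range(len(w) - 1))


if __name__ == "__main__":
    a = [(-1) ** n * (n % 5 - 2) for n in range(50)]   # oscillating terms
    w = [1.0 / (1 + n) for n in range(50)]             # slowly varying weight
    direct = sum(x * y for x, y in zip(a, w))
    bound = max(abs(s) for s in partial_sums(a)) * (abs(w[-1]) + total_variation(w))
    print(direct, sum_by_parts(a, w), bound)
```

In the setting of the proof, the role of the terms $a_n$ is played by the weight restricted to a residue class with its main term subtracted, so that a bound on sums over all subintervals I, combined with the total variation bound on the smooth factor, yields the required estimate.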
Proof of Theorem 3.2
The last remaining task is to establish the single-scale estimate in Theorem 3.2. We first recall an application of the Peluse–Prendiville theory.
Proposition 5.2. (Unweighted inverse theorem)
Let
$N \geq 1$
and
$0 < \delta \leq 1$
, and let
$N_0$
be a quantity with
$N_0 \sim N^d$
. Let
$f,g,h \colon \mathbb {Z} \to \mathbb {C}$
be supported on
$[-N_0,N_0]$
with

obeying the lower bound

Then, there exists a function
$F \in \ell ^2(\mathbb {Z})$
with

and with
$\mathcal {F}_{\mathbb {Z}}F$
supported in the
$O(\delta ^{-O(1)}/N)$
-neighbourhood of some rational
$a/b \mod 1 \in \mathbb {Q}/\mathbb {Z}$
with
$b = O(\delta ^{-O(1)})$
such that

Here, we use the inner product .
Proof. See [Reference Krause, Mirek and Tao13, Proposition 6.6].
We now transfer this to the weighted setting, under an additional (mild) largeness hypothesis on
$\delta $
.
Proposition 5.3. (Weighted inverse theorem)
Let
$N \geq 1$
and
$\exp (-\operatorname {Log}^{1/C_0} N) \leq \delta \leq 1$
, and let
$N_0$
be a quantity with
$N_0 \sim N^d$
. Let
$f,g,h \colon \mathbb {Z} \to \mathbb {C}$
be supported on
$[-N_0,N_0]$
, obeying equation (5.6) and the lower bound

Then, the conclusions of Proposition 5.2 hold.
Proof. We may assume that N is sufficiently large depending on the fixed polynomial P, as the claim is easy to establish otherwise.
For any
$1 \leq z \leq w \leq \exp (\operatorname {Log}^{1/C_0} N)$
, we have from Lemmas 4.5, 4.1 and [Reference Teräväinen26, Theorem 4.1] (that is, equation (1.7)) that

In particular, we have

for
$z \in [w/2,w]$
; summing dyadically using the triangle inequality, we conclude that

for any
$1 \leq w \leq \exp (\operatorname {Log}^{1/C_0} N)$
.
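The dyadic summation step is an instance of the following elementary observation, which we state with a placeholder quantity $a(\cdot )$ and illustrative constants $C, c> 0$ and $W \geq 1$; it is a sketch of the mechanism and does not reproduce the displayed estimates. Suppose that $|a(z) - a(w)| \leq C w^{-c}$ whenever $w/2 \leq z \leq w \leq W$. Given $1 \leq w \leq W$, let $J \geq 0$ be the largest integer with $2^J w \leq W$. Then, by the triangle inequality,
$$ \begin{align*} |a(w) - a(W)| \leq \sum_{j=1}^{J} |a(2^{j-1} w) - a(2^{j} w)| + |a(2^{J} w) - a(W)| \leq C w^{-c} \sum_{j=0}^{J} 2^{-cj} \leq \frac{C}{1-2^{-c}}\, w^{-c}. \end{align*} $$
Here, the last difference is controlled by $C W^{-c} \leq C (2^J w)^{-c}$, which is why it can be absorbed into the geometric series.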
The weight
$\Lambda _{\mathrm{Cram}\acute{\rm e}\mathrm{r},w}$
is not quite of Type I form, so we now aim to swap it with the Heath-Brown weight
$\Lambda _{\operatorname {HB},w}$
. From Lemma 4.7, we have

We would like to apply [Reference Teräväinen26, Theorem 4.1] again, but we have the technical issue that
$\Lambda _{\operatorname {HB},w}$
does not quite have a good uniform bound, but is instead only controlled in the
$\ell ^k$
norm for arbitrarily large but finite k. However, from Lemma 4.6 (applied with sufficiently large k) and Chebyshev’s inequality, for any small
$\kappa> 0$
and
$\varepsilon>0$
, we can find an approximation
$\Lambda ^{\prime }_{\operatorname {HB},w}$
to
$\Lambda _{\operatorname {HB},w}$
with

We can use the
$\ell ^1$
norm to control the
$u^{d+1}$
norm; hence, by equation (5.12) and the triangle inequality,

Now, we can apply [Reference Teräväinen26, Theorem 4.1] (and Lemma 4.1) to conclude that

Finally, from the triangle inequality and Cauchy–Schwarz, we can crudely bound

Putting this all together, choosing
$\varepsilon $
to be sufficiently small and
$\kappa $
to be a small multiple of
$w^{-c}$
for a suitable c, we conclude that

for any
$1 \leq w \leq \exp (\operatorname {Log}^{1/C_0} N)$
. In particular, from equation (5.10), we now have

for some
$1 \leq w \lesssim \delta ^{-O(1)}$
. Expanding equation (4.1) and using the triangle inequality and crude bounds, we conclude that

for some
$1 \leq r \leq q \lesssim \delta ^{-O(1)}$
. However, observe the identity

We can thus apply Proposition 5.2 to conclude that

for some function F obeying the conclusions of that proposition. Transferring the plane wave
$e(-r\cdot / q)$
from f to F, we obtain the claim (noting that the denominator b will remain acceptably under control since
$q \lesssim \delta ^{-O(1)}$
).
If we now repeat the arguments of [Reference Krause, Mirek and Tao13, §6.1], using Proposition 5.3 and Lemma 5.1 in place of [Reference Krause, Mirek and Tao13, Proposition 6.6] and [Reference Krause, Mirek and Tao13, Proposition 6.21], respectively, inserting the weights
$\Lambda _N$
in the averaging operators in the obvious fashion, we obtain case (i) of Theorem 3.2. To handle case (ii), we need the following variant of Proposition 5.3.
Proposition 5.4. (Weighted inverse theorem for g)
Under the hypotheses of Proposition 5.3, there exists a function
$G \in \ell ^2(\mathbb {Z})$
with

and with
$\mathcal {F}_{\mathbb {Z}}G$
supported in the
$O(\delta ^{-O(1)}/N^d)$
-neighbourhood of some rational
$a/b \mod 1 \in \mathbb {Q}/\mathbb {Z}$
with
$b=O(\delta ^{-O(1)})$
such that

This proposition can be derived from [Reference Krause, Mirek and Tao13, Proposition 6.26] in precisely the same way that Proposition 5.3 was derived from [Reference Krause, Mirek and Tao13, Proposition 6.6]. By repeating the remaining arguments of [Reference Krause, Mirek and Tao13, §6.2], one obtains case (ii) of Theorem 3.2.
6 Remarks
6.1 Peluse’s inverse theorem for the primes
As is clear from the previous sections, Peluse’s inverse theorem [Reference Peluse23] was an important ingredient in the proof of the unweighted bilinear ergodic theorem in [Reference Krause, Mirek and Tao13]. In the course of proving Theorem 1.3, we essentially needed a version of this inverse theorem where one of the variables was weighted by the approximant
$\Lambda _N$
; see Proposition 5.3. It is natural to ask if one can also obtain a version of Peluse’s inverse theorem with the von Mangoldt weight
$\Lambda $
. We record here how such a result quickly follows from the arguments used to prove Proposition 5.3.
Theorem 6.1. (Peluse’s inverse theorem with prime weight)
Let
$k,d\in \mathbb {N}$
and
$A>0$
. Let
$N\geq 2$
,
$(\log N)^{-A}\leq \delta \leq 1$
and
$N_0\sim N^d$
. Let
$P_1,\ldots , P_k$
be polynomials with integer coefficients of distinct degrees, with maximal degree d. Let
$h,f_1,\ldots , f_k\colon \mathbb {Z}\to \mathbb {C}$
be functions bounded in modulus by
$1$
and supported on
$[-N_0,N_0]$
. Suppose that

Then, either
$N_0\lesssim _{P_1,\ldots , P_k} \delta ^{-O_d(1)}$
or there exists a positive integer
$q\lesssim _{P_1,\ldots , P_k} \delta ^{-O_d(1)}$
and
$\delta ^{O_d(1)}N\lesssim _{P_1,\ldots , P_k} N'\leq N$
such that

Proof. Fix
$P_1,\ldots , P_k$
; we allow all implied constants to depend on them. Define the polynomial averaging operator

Let
$w_0=\delta ^{-C_d}$
for a large enough constant
$C_d$
. We claim that

and

and

After we have these three estimates, we conclude from equation (6.1) and linearity that

By equations (4.1) and (4.2), the function
$\Lambda _{\operatorname {HB},w_0}$
is a linear combination, with
$1$
-bounded coefficients, of
$O(w_0^3)$
indicators of arithmetic progressions of common difference at most
$w_0$
. Hence, crudely using the triangle inequality, we obtain

for some
$1\leq a\leq q'\lesssim \delta ^{-O_d(1)}$
. The claim of the theorem now follows from [Reference Peluse23, Theorem 3.3] after a change of variables.
We are left with showing equations (6.2), (6.3) and (6.4). The estimate in equation (6.2) follows immediately from [Reference Teräväinen26, Theorem 4.1] and equation (3.1). The estimate in equation (6.3) follows by using Lemmas 4.5, 4.1 and [Reference Teräväinen26, Theorem 4.1] to obtain

for some
$c_d>0$
and any
$z\in [w/2,w]$
,
$1\leq w\leq \exp ((\log N)^{1/10})$
, and then summing this dyadically. For proving equation (6.4), note that from equation (5.14) and [Reference Teräväinen26, Theorem 4.1], we have for any
$\kappa>0, \varepsilon >0$
, the bound

with
$\Lambda ^{\prime }_{\operatorname {HB},w_0}$
obeying equation (5.13). However, from equation (5.13) and the triangle inequality, we now obtain equation (6.4) by taking
$\varepsilon>0$
small enough and
$\kappa =w_0^{-c}$
for a small enough constant c (depending on d). This was enough to complete the proof.
6.2 Siegel zeroes
In this subsection, we mention an alternative approach to Theorem 1.3 based on working with Siegel zeroes. This approach is somewhat more complicated than that implemented above and we shall only sketch it very briefly, leaving the details to the interested reader.
The place in the proof of Theorem 1.3 where passing from the von Mangoldt function
$\Lambda $
to the approximant
$\Lambda _N$
avoided dealing with Siegel zeroes is Proposition 3.4, so we begin by sketching how a variant of Proposition 3.4 can be proven for the weight
$\Lambda $
.
We say that a modulus
$q\geq 2$
is exceptional if there exists a non-principal real Dirichlet character
$\chi _q\pmod q$
such that
$L(s,\chi _q)$
has a real zero
$\beta _q>1-c_0/(\log q)$
, where
$c_0$
is some small absolute constant. We call the corresponding character
$\chi _q$
an exceptional character and we call
$\beta _q$
a Siegel zero. For any given exceptional q, the character
$\chi _q$
and Siegel zero
$\beta _q$
are uniquely determined.
For exceptional characters
$\chi _q$
, we define the arithmetic symbol

and the (weighted) continuous multiplier

where
$\beta _q\in (0,1)$
is the Siegel zero. Then, if we replace in equation (3.18),

the conclusion of Proposition 3.4 holds with the von Mangoldt weight
$\Lambda $
in place of
$\Lambda _N$
. This follows from essentially the same proof as in §5, but using the Landau–Page theorem [Reference Montgomery and Vaughan20, Corollary 11.10] in place of Corollary 4.4.
In the large-scale regime, the error bounds arising from the Siegel–Walfisz theorem remove the need for the above approximation; in the small-scale regime,

further analysis is required to reduce matters to the two-parameter Rademacher–Menshov inequality.
The first observation is the classical fact that there is at most one exceptional character at each dyadic scale:

We let
$q_j$
denote the unique exceptional modulus in
$(2^j,2^{j+1}]$
and abbreviate
$\beta _j = \beta _{q_j}$
.
We then introduce a dyadic decomposition

where

The key novelty then derives from proving the following modified Rademacher–Menshov-type inequality, similar to [Reference Krause, Mirek and Tao13, Lemma 8.2].
Lemma 6.2. Let
$V,W$
be normed vector spaces,
$K,J$
be two positive integers and let
$0<q<\infty $
. Let
$B_j\colon V\times W\rightarrow L^q(X)$
be a family of bilinear operators for
$j\in [J]$
. Let
$\{f_{k}^j\}, \{g_k^j\}$
be sets of functions with
$f_k^j\in V$
and
$g_k^j\in W$
for
$k\in [K]$
and
$j\in [J]$
. Then,

This result may be of independent interest, so we provide a brief proof.
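The proof that follows invokes Khintchine's inequality. In its standard form, this states that for independent uniform random signs $\varepsilon _k$ and any $0 < q < \infty $, the $L^q$ norm of $\sum _k \varepsilon _k a_k$ is comparable to the $\ell ^2$ norm of $(a_k)$, with constants depending only on q. The Python sketch below is purely illustrative and not part of the argument: the function names are hypothetical, and the check is done by exhaustive enumeration of sign patterns in the case $q=4$ with real coefficients, where one has the exact identity $\mathbb {E}|\sum _k \varepsilon _k a_k|^4 = 3(\sum _k a_k^2)^2 - 2\sum _k a_k^4$.

```python
from itertools import product


def rademacher_moment(coeffs, q):
    """Exact q-th moment E|sum_k eps_k a_k|^q, averaging over all 2^K sign patterns."""
    total = 0.0
    for signs in product((-1, 1), repeat=len(coeffs)):
        total += abs(sum(s * a for s, a in zip(signs, coeffs))) ** q
    return total / 2 ** len(coeffs)


def l2_norm(coeffs):
    """Euclidean norm of the coefficient vector."""
    return sum(abs(a) ** 2 for a in coeffs) ** 0.5


def khintchine_ratio(coeffs, q):
    """Ratio of the L^q norm of the random sum to the l^2 norm of the coefficients."""
    return rademacher_moment(coeffs, q) ** (1.0 / q) / l2_norm(coeffs)


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]  # illustrative coefficients only
    print("fourth moment:", rademacher_moment(a, 4))
    print("L^4 / l^2 ratio:", khintchine_ratio(a, 4))
```

For $q=4$, the identity above shows that the ratio always lies between $1$ and $3^{1/4}$, which is the kind of q-dependent comparability that the proof uses.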
Proof. Set
$a_{k_1,k_2}= \sum _{j \in [J]} B_j(f_{k_1}^j,g_{k_2}^j)$
. By [Reference Krause, Mirek and Tao13, Lemma 8.1], we have

where

Taking

we need to bound

Applying Khintchine’s inequality

we arrive at the following chain of inequalities:

By bilinearity, we may consolidate

putting everything together,

and so we obtain the result upon telescoping, e.g.

6.3 Breaking duality
We briefly remark that one may establish Theorem 1.3 with r-variation restricted to the range
$r> 2 + \epsilon $
for exponents
$p_1,p_2>1$
that satisfy

where
$\epsilon '> 0$
is sufficiently small in terms of
$\epsilon $
; this goes beyond the duality range.
The single-scale estimate

anchors the argument; equation (6.7) follows from Hölder’s inequality and the improving estimate Lemma 5.1, as per [Reference Krause, Mirek and Tao13, Lemma 11.1]. With equation (6.7) in hand, the proof of [Reference Krause, Mirek and Tao13, Proposition 11.4] can be formally reproduced, with only notational changes arising. We leave the details to the interested reader.
6.4 Sharpness of the variational result
The unboundedness of the quadratic variation along polynomial orbits, namely [Reference Krause, Mirek and Tao13, Proposition 12.1], extends to our context.
Proposition 6.3. Let
$P \in \mathbb {Z}[\mathrm {n}]$
be a non-constant polynomial and let
$0 < p \leq \infty $
. Let
$I \subset \mathbb {N}$
be an infinite set. Then, for every
$C> 0$
, there exists a measure-preserving system
$(X,\mu ,T)$
of total measure 1 and a
$1$
-bounded
$f \in L^{\infty }(X)$
so that

We shall leave the details of the proof of this proposition to the interested reader as it is similar to the proof of [Reference Krause, Mirek and Tao13, Proposition 12.1]. The key additional observation is the equidistribution of

over the primes whenever
$\alpha _1,\ldots ,\alpha _K$
are
$\mathbb {Q}$
-linearly independent and
$P \in \mathbb {Z}[\mathrm {n}]$
is a non-constant polynomial (which follows from Weyl’s criterion and a standard exponential sum estimate for polynomials of primes; see e.g. [Reference Matomäki and Shao17, Theorem 1.3]).
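As a numerical illustration of the Weyl criterion underlying this observation, the following Python sketch computes normalized exponential sums over primes. It is a one-dimensional toy check under stated assumptions, not the estimate from the cited reference: the choices $P(\mathrm {n}) = \mathrm {n}^2$, $\alpha = \sqrt {2}$, the cutoff $20000$ and the function names are all hypothetical and chosen only for illustration.

```python
import cmath
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n (assumes n >= 2)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            # Zero out the multiples of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def weyl_mean(alpha, poly, primes, k=1):
    """Average of e(k * alpha * poly(p)) over the given primes, where e(t) = exp(2 pi i t)."""
    total = sum(
        cmath.exp(2j * math.pi * ((k * alpha * poly(p)) % 1.0)) for p in primes
    )
    return total / len(primes)


if __name__ == "__main__":
    ps = primes_up_to(20000)
    square = lambda n: n * n
    # Irrational frequency: the mean is expected to be small (equidistribution).
    print(abs(weyl_mean(math.sqrt(2), square, ps)))
    # Rational frequency 1/2: no cancellation, since p^2 is odd for every odd prime p.
    print(abs(weyl_mean(0.5, square, ps)))
```

The rational frequency serves as a control: there the mean stays close to $1$ in modulus, whereas equidistribution modulo $1$ requires the means to tend to zero for every non-zero integer multiple k of the frequency.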
To see why this implies the sharpness of the range of the variational estimate in Theorem 1.3, one may employ the convexity arguments of [Reference Mirek, Trojan and Zorin-Kranich19, §5], taking into account [Reference Mirek, Trojan and Zorin-Kranich19, Proposition 4.1], to obtain the lower bound

6.5 Continuous extensions
From the perspective of density, the primes are ‘full-dimensional’, with a very ‘Fourier-uniform’ measure,
$\Lambda $
. A natural question concerns establishing a continuous analogue of Theorem 1.3, namely the existence of a measure
$\nu $
supported on
$[0,1]$
, with (say) full Fourier dimension,

so that

exists almost everywhere whenever
$f \in L^{p_1}(\mathbb {R})$
and
$g \in L^{p_2}(\mathbb {R})$
with
$p_1,p_2> 1$
and
${1}/{p_1} + {1}/{p_2} \leq 1$
. The key point is establishing a suitable Sobolev inequality, namely

for some
$c> 0$
, whenever
$|f|, |g| \leq 1$
, and
$\hat {f}$
vanishes on
$\{ |\xi | \lesssim 2^{l}/N \}$
and/or
$\hat {g}$
vanishes on
$\{ |\xi | \lesssim 2^l/N^d\}$
.
Estimates of this form in the unweighted setting go back to [Reference Bourgain1], with the strongest estimates recently established by one of us as part of a much more general phenomenon; see [Reference Krause, Mirek, Peluse and Wright12]. This approach relies on PET induction, which suggests that certain Gowers-uniformity conditions might need to be imposed on
$\nu $
; it is unclear how this might interact with dimension, so we leave the problem to the interested reader.
Acknowledgments
We thank the referee for a careful reading of the paper. B.K. is supported by an EPSRC New Investigators grant and an ERC Starting grant. T.T. is supported by NSF grant DMS-2347850. J.T. is supported by the European Union’s Horizon Europe research and innovation programme under Marie Skłodowska-Curie grant agreement No. 101058904 and ERC grant agreement No. 101162746, and by Academy of Finland grant No. 362303.