1. Introduction
The distribution of prime numbers in sparse sets is a central topic in modern analytic number theory. A key motivating question is Landau’s fourth problem, which asks if there are infinitely many prime numbers of the form
$n^2+1$
. This is far beyond the current methods as the set is very sparse: the number of integers up to X of this form is of order
$X^{1/2}$
.
As an approximation to Landau’s question much attention has been given to Gaussian prime numbers
$p=a^2+b^2$
with b restricted to some specific sparse set B. A major breakthrough was achieved by Friedlander and Iwaniec [FI98b], who proved that there are infinitely many primes of the form
$a^2+b^4$
, that is, with B being the set of squares. Following this, there have been many variants where b is drawn from a sparse set B, for instance, the papers of Heath-Brown and Li, Pratt, and the current author [HBL17, Mer22, Pra20]. We also point out the results of Heath-Brown [HB01], Li [Li21], and Maynard [May20] for primes in other polynomial sequences, where Li’s result has the record for the sparsest polynomial sequence with primes, with size of the set being
$X^{43/67 + {\varepsilon}}$
.
Notably, all of the above-mentioned results exploit heavily the structure of the specific sparse set B, leaving open the question of what can be said about an arbitrary sparse set B. In this direction Fouvry and Iwaniec [FI97] proved that one can take B with density
$(\log X)^{-C}$
for any
$C >0$
and establish an asymptotic formula for the number of primes
$a^2+b^2 \leq X$
with
$b \in B$
. Using the argument of Fouvry and Iwaniec one would be required to improve upon the famous Siegel–Walfisz theorem to reach sparser sets B. In comparison, our main result obtains unconditionally a power saving in the density of B for the first time.
Theorem 1.1.
There is some (computable)
$\delta > 0$
such that the following holds. If B is a set of integers with
$|B\cap[0,Y]| \gg Y^{1-\delta}$
for all
$Y\gg 1$
, then there are infinitely many primes of the form
$a^2+b^2$
with
$b \in B$
.
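To make the statement concrete, the following Python sketch counts representations of primes as $a^2+b^2$ with b drawn from a sample sparse set. It is a numerical illustration only and plays no role in the proofs. The helper names `count_prime_representations` and `missing_digit_set`, and the choice of integers missing the digit 7 as the example set B, are assumptions made for this example.

```python
from math import isqrt


def is_prime(n):
    """Trial-division primality test, adequate for small experiments."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def count_prime_representations(X, B):
    """Count pairs (a, b) with a >= 1, b in B and a^2 + b^2 <= X prime."""
    count = 0
    for b in B:
        if b * b >= X:
            continue
        for a in range(1, isqrt(X - b * b) + 1):
            if is_prime(a * a + b * b):
                count += 1
    return count


def missing_digit_set(limit, digit=7):
    """Hypothetical example of a sparse set: integers in [1, limit] whose
    decimal expansion avoids the given digit."""
    return [b for b in range(1, limit + 1) if str(digit) not in str(b)]


if __name__ == "__main__":
    X = 10**6
    B = missing_digit_set(isqrt(X))
    print(len(B), count_prime_representations(X, B))
```

The set of integers missing one digit has about $Y^{\log 9/\log 10}$ elements up to Y, so it is an example of a set with a power-saving density.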
This result is a corollary of the following lower bound for the number of such primes, which is weaker than expected by a factor of
$\gg_{\varepsilon} X^{-{\varepsilon}}$
.
Theorem 1.2.
There is some (computable)
$\delta > 0$
such that the following holds for any small
$\eta > 0$
. For all sufficiently large X and for all
$B \subseteq [\eta X^{1/2},(1-\eta)X^{1/2}] \cap {\mathbb{Z}}$
with
$|B| \geq X^{1/2-\delta}$
we have, for any
${\varepsilon} > 0$
,

Deduction of Theorem 1.1 from Theorem 1.2. Let
$\delta > 0$
be small and let B be a set of integers with
$|B\cap[0,Y]| \gg Y^{1-\delta}$
for all
$Y\gg 1$
. Then, by the pigeonhole principle, for any
${\varepsilon} > 0$
and any sufficiently large Y, there is some
$Y_1 \in [Y^{1-\delta-{\varepsilon}},Y]$
such that
$B_1:=B\cap[Y_1/4,Y_1/2]$
satisfies
$ |B_1| \geq Y_1^{1-\delta-{\varepsilon}}$
. By Theorem 1.2 with
$X=Y_1^2$
and a trivial upper bound for
$a^2+b^2 \leq X^{1-4{\varepsilon} }$
we have

Therefore, for all large Y there exists a prime number
$p=a^2+b^2 \in (Y^{2- 2\delta - 8{\varepsilon}},Y^2]$
with
$b \in B$
, so that, in particular, there are infinitely many primes of the form
$a^2+b^2$
with
$b \in B$
.
Remark 1.3. A back-of-the-envelope estimate shows that it should be possible to establish Theorem 1.2 for some
$\delta \in (1/20,1/10)$
but we have not checked this as it depends on the numerical constants
$c_1,c_2,c_3$
in Lemmas 2.13 and 2.14 as well as optimization of Theorem 3.1: this would require a separate lengthy optimization similar to the arguments in [HB92]. It is possible to generalize our results to general binary quadratic forms instead of
$a^2+b^2$
by adapting ideas from [LSX20].
Remark 1.4. The lower bound in Theorem 1.2 is the best that can be hoped for in general with the current technology. In fact, improving the lower bound in Theorem 1.2 to the correct order of magnitude would imply the non-existence of Siegel zeros by a suitable application of Theorem 3.3. The implied constant in the lower bound
$\gg_{\varepsilon}$
in Theorem 1.2 is ineffective but the result can be made effective with
${\varepsilon}=\delta$
for some very small
$\delta >0$
by using the class number formula, so that Theorem 1.1 is effective in all aspects.
Remark 1.5. The argument we give works for
$\eta = X^{-{\varepsilon}}$
for some small
$ {\varepsilon} >0$
. It is possible to extend the proof of Theorem 1.2 to handle sets
$B \subseteq [0,X^{1/2}]$
. The possibility that B or the variable a is restricted to a narrow interval
$[0,X^{1/2-\delta'}]$
for some
$\delta' \leq \delta$
adds only a technical problem, namely, all of the Gaussian primes
$b+ia$
counted lie in a very narrow sector. See Remark 7.7 for further details on how to modify our argument.
In the case that B is unbiased, we are able to obtain an asymptotic formula similar to [FI97] for
$\delta < 1/10$
. Let

and define

Then the following result is a corollary of our quasi-explicit formula (cf. Theorem 3.3).
Theorem 1.6.
Let
$\eta > 0$
be small. Let
$C''>0$
be large compared with
$C'>0$
which is large compared with
$C >0$
. Let
$\lambda_b$
be complex coefficients with
$|\lambda_b| \leq X^{o(1)}$
, supported on
$ [\eta X^{1/2}, (1-\eta)X^{1/2}] \cap {\mathbb{Z}}$
, and satisfying

Suppose that for all
$(\log X)^{C''} < q \leq X^{\delta+\eta}$
we have

Then

Remark 1.7. With more work the assumptions that
$|\lambda_b| \leq X^{o(1)}$
and
$\sum_b |\lambda_b| \geq X^{2/5+{\varepsilon}}$
may be replaced (here and later) by

where
$\|\lambda_b\|_p := \big( \sum_{b}|\lambda_b|^p \big)^{1/p}$
. By adapting Harman’s sieve [Har07] one can get a lower bound of the correct order for
$1/2-\delta$
for some
$\delta \in (1/10,1/8)$
.
Remark 1.8. We note that
$\rho(p)=1+\chi_4(p)$
with
$\chi_4$
being the unique non-trivial character to modulus 4 and

Also, for all b,

so that the main term is of the same form as in [FI97, Theorem 1].
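The identity $\rho(p)=1+\chi_4(p)$ can be checked numerically. The sketch below assumes that $\rho(d)$ denotes the number of solutions of $\nu^2+1 \equiv 0 \pmod d$, as in the work of Fouvry and Iwaniec. The function names are chosen for this example only.

```python
def rho(d):
    """Number of residues v modulo d with v^2 + 1 = 0 (mod d).
    Assumed definition of rho for this illustration."""
    return sum(1 for v in range(d) if (v * v + 1) % d == 0)


def chi4(n):
    """The non-trivial Dirichlet character modulo 4."""
    return {1: 1, 3: -1}.get(n % 4, 0)


if __name__ == "__main__":
    for p in [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]:
        print(p, rho(p), 1 + chi4(p))
```

For an odd prime p the two columns agree because $-1$ is a quadratic residue modulo p exactly when $p \equiv 1 \pmod 4$, and for $p=2$ there is the single solution $\nu=1$.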
The assumption (1.1) in Theorem 1.6 is very mild since $C''$ can be taken to be large compared with $C'$ and, thus, (1.1) applies to most instances that come up in nature. For example, we immediately get as a corollary [Mer22, Theorem 3] for large k and a weak version of [Pra20, Theorem 1.1] where the base is taken to be large with B being the set of numbers missing one digit.
In particular, we get an asymptotic formula if B is a sparse subset of primes.
Corollary 1.9. Let

with
$|B| \geq X^{2/5+{\varepsilon}}$
. Then, for any
$C>0$
,

The assumption (1.1) in Theorem 1.6 may be replaced by a wide zero-free region for Hecke L-functions (cf. §2.7 for the relevant notation). For the statement, we let
${\mathcal{M}}(u)$
denote the smallest positive integer m with
$u|m$
.
Theorem 1.10.
For all
$C > 0$
, there is some
$C'>0$
such that the following holds. Let
$\lambda_b$
be complex coefficients with
$|\lambda_b| \leq X^{o(1)}$
, supported on
$ [\eta X^{1/2},(1-\eta)X^{1/2}] \cap {\mathbb{Z}}$
, and satisfying

Assume that Hecke L-functions
$L(s,\xi_k \chi)$
on
${\mathbb{Q}}(i)$
with
$|k| \leq X^\eta$
and modulus
${\mathcal{M}}(u) \leq X^{1/10+\eta}$
have no zeros in the region

Then

1.1 Overview of the proof
Our goal is to estimate

where
$|B| = X^{1/2-\delta}$
. It is immediately apparent that potential Siegel zeros can cause a problem, since if
$\chi_1 \in \widehat{({\mathbb{Z}}/q_1{\mathbb{Z}})^\times}$
is an exceptional character (necessarily quadratic) to modulus
$q_1 \leq X^{\delta}$
and if
$B \subseteq q_1 {\mathbb{Z}}$
, then for
$b \in B$

implying that the sequence
$a^2+b^2$
is biased towards numbers with an even number of prime factors. In this case we would expect the main term for (1.4) to be multiplied essentially by
$L(1,\chi_1)$
. Note that the function

defines a Dirichlet character on
$({\mathbb{Z}}[i]/q_1{\mathbb{Z}}[i])^\times$
.
As in the proof of Linnik’s theorem [Lin44], our argument splits into two cases depending on whether or not there is a Siegel zero. We choose a small parameter
${\varepsilon}_1>0$
and we will take
$\delta$
to be small in terms of
${\varepsilon}_1$
. If there is a zero

then we can use an argument similar to that in [FI10, Chapter 24.2] to give a lower bound for primes of the form
$a^2+b^2$
, using just Type I information. This works for a certain fixed
${\varepsilon}_1$
once
$\delta>0$
is sufficiently small.
For simplicity, let us then assume in this sketch that we are in the situation of Theorem 1.10, that is, for Dirichlet characters
$\chi$
to moduli
$u \in {\mathbb{Z}}[i]$
with
${\mathcal{M}}(u) \leq X^{ \delta+\eta}$
Hecke L-functions on
${\mathbb{Q}}(i)$
have no zeros in the wider region (1.3). By Vaughan’s identity, evaluating (1.4) is reduced to estimating

and for
$MN=X$

where
$\alpha,\beta$
denote bounded coefficients.
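The shape of this reduction can be illustrated with the classical Vaughan identity for the von Mangoldt function over the rational integers. The Python sketch below is an illustration only: the cut-off parameters `U` and `V`, the function names, and the restriction to rational integers (rather than the Gaussian setting of this paper) are assumptions made for the example, and it is not the precise decomposition used in the proofs.

```python
import math


def mobius_and_mangoldt(limit):
    """Sieve the Moebius function mu and the von Mangoldt function up to limit."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    lam = [0.0] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(2 * p, limit + 1, p):
                is_prime[m] = False
            for m in range(p, limit + 1, p):
                mu[m] = -mu[m]
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
            q = p
            while q <= limit:
                lam[q] = math.log(p)
                q *= p
    return mu, lam


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def vaughan_rhs(n, U, V, mu, lam):
    """Right-hand side of Vaughan's identity with cut-offs U and V."""
    small = lam[n] if n <= V else 0.0
    # Type I pieces: one smooth (or free) variable, short coefficient range.
    type1_log = sum(mu[d] * math.log(n // d) for d in divisors(n) if d <= U)
    type1_conv = sum(lam[c] * mu[d]
                     for c in divisors(n) if c <= V
                     for d in divisors(n // c) if d <= U)
    # Type II piece: both variables c > V and m = n / c > 1 carry coefficients.
    type2 = 0.0
    for c in divisors(n):
        m = n // c
        if c > V and m > 1:
            type2 -= lam[c] * sum(mu[d] for d in divisors(m) if d <= U)
    return small + type1_log - type1_conv + type2


if __name__ == "__main__":
    mu, lam = mobius_and_mangoldt(100)
    print(max(abs(vaughan_rhs(n, 5, 7, mu, lam) - lam[n]) for n in range(1, 101)))
```

The inner sum in the Type II piece vanishes whenever $m \leq U$, so that piece is supported on products of two variables that are both not too small, which is the bilinear structure exploited in Type II estimates.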
For the Type I sums (1.5) the argument goes back to [FI97] and we get an asymptotic formula for
$D=X^{1-\delta-o(1)}$
by applying Poisson summation for the free variable a and the quadratic large sieve (cf. Lemma 2.12). For small
$\delta$
the exponent of distribution approaches
$1-o(1)$
, so that we only need very little parity breaking Type II information.
For the Type II sums (1.6) it suffices to consider the case when
$\beta=\mu$
, the Möbius function. Similarly to [FI98b], by unique factorization in
${\mathbb{Q}}(i)$
we essentially have, for
$w,z \in {\mathbb{Z}}[i]$
,

and get

After using Cauchy–Schwarz similarly to [FI98b] we essentially need to show cancellation in

The goal is to evaluate the sum
$T_B(z_1,z_2)$
with a main term
$M_B(z_1,z_2)$
and then bound

by catching the oscillations from
$\mu(|z|^2)$
. All of the previous works [FI98b, HBL17, Mer22, Pra20] rely on specific analytic and arithmetic properties of the set B to evaluate the sum
$T_B(z_1,z_2)$
.
For a general sparse set B we first factor out
$b_0=(b_1,b_2)$
to get, assuming for simplicity that
$b_0| {\text{Im}} (z_2 \overline{z_1})$
,

It is crucial to carefully track the dependency on
$b_0$
since it is possible that a large subset of
$B\times B$
has a large common factor
$b_0$
, for instance, if B is in
$q{\mathbb{Z}}$
for some fixed
$q \leq X^{\delta}$
.
To evaluate
$T_B(z_1,z_2)$
we use Dirichlet characters to expand the congruence
$b'_1z_2\equiv b'_2z_1 \, ({\text{Im}}(z_2\overline{z_1})/b_0)$
. The contribution from Dirichlet characters with a large conductor
$d \geq X^{\delta+\eta}/b_0$
may be bounded by using the large sieve for multiplicative characters (Lemma 2.11). The Dirichlet characters with a small conductor
$d < X^{\delta +\eta}/b_0$
give the main term
$M_B(z_1,z_2)$
and we can bound (1.7) provided that for any
$f=db_0 \leq X^{\delta+\eta}$
and
$a \in ({\mathbb{Z}}/f {\mathbb{Z}})^{\times}$
, we have

This follows by standard arguments (e.g. using Heath-Brown’s identity for
$\mu$
) from the assumption (1.3), provided that N is sufficiently large compared with f, say,
$N > X^\eta f^3$
. We give a more detailed sketch of the argument for the Type II sums in § 6.1.
Without assuming the zero-free region (1.3), for the Type II sums we need to extract the contribution from the potential exceptional characters before applying Cauchy–Schwarz to
$S(\alpha,\mu)$
. Denoting
$\mu_z:= \mu(|z|^2)$
, we write

where
$\mu^{\#}_z$
is an approximation for
$\mu_z$
and
$\mu^{\flat}_z$
is a function which is balanced along arithmetic progressions (cf. (1.8)). Importantly,
$\mu^{\#}_z$
must be simple enough so that
$S(\alpha,\mu^{\#})$
can be evaluated using only Type I information.
Let
$\chi_j$
for
$j \leq J=(\log X)^{O(1)}$
denote the worst of the possible exceptional characters
$\chi_j$
of conductors
$u_j \in {\mathbb{Z}}[i]$
with
${\mathcal{M}}(u_j) \leq X^{\delta+\eta}$
and denote the normalized correlation of two coefficients
$\alpha,\beta$
by

Then, for our approximation, we essentially choose

The idea is similar in spirit to the dispersion method of Drappeau [Dra15] which also takes into account contributions from multiple characters. This construction is also motivated by the prime number theorem of Gallagher [Gal70, Theorem 7] and its use by Montgomery and Vaughan to get a power saving for the exceptional set in the binary Goldbach problem [MV75]. For the function
$\mu^\flat := \mu_z-\mu^{\#}_z$
, we can then unconditionally show that for any
$f \leq X^{\delta+\eta}$
,
$N > X^\eta f^3$
, and
$a \in ({\mathbb{Z}}/f {\mathbb{Z}})^{\times}$

Taking into account the bias
$S(\alpha,\mu^{\#})$
from the exceptional characters for the Type II sums, we obtain the quasi-explicit formula Theorem 3.3.
The paper is structured as follows. In § 2 we present some basic lemmas. In § 3 we split the proof of Theorem 1.2 into two cases depending on the existence of a Siegel zero (Theorems 3.1 and 3.2) and state the quasi-explicit formula Theorem 3.3. In § 4 we evaluate Type I sums and in § 5 we give a proof of Theorem 1.2 under the assumption that a Siegel zero does exist. In §§ 6, 7, and 8 we estimate Type II sums. In §§ 9, 11, and 12 we deduce our main theorems.
1.2 Notation and conventions
Our notation and conventions are as follows.
${\mathfrak{a}}, {\mathfrak{b}} ,{\mathfrak{c}} ,{\mathfrak{d}},{\mathfrak{f}},{\mathfrak{m}}, {\mathfrak{n}},{\mathfrak{p}}$
: Ideals of
${\mathbb{Z}}[i]$
, reserving
${\mathfrak{p}}$
for prime ideals.
u, v, w, z: Gaussian integers.
(z, w): Greatest common primary divisor.
$\chi,\psi,u$ : Dirichlet characters
$\chi,\psi \in \widehat{({\mathbb{Z}}[i]/u{\mathbb{Z}}[i])^\times},$
$u \in {\mathbb{Z}}[i]$ .
$\xi_k(z)$ : Character
$\xi_k(z) = (z/|z|)^k=e^{ik\arg z}$ .
${\mathcal{N}}{\mathfrak{n}}$ : Norm of an ideal,
${\mathcal{N}} (z) = |z|^2$ .
${\mathcal{M}}(z)$ : Smallest integer m with
$z | m$ ,
${\mathcal{N}}(z)^{1/2} \leq {\mathcal{M}}(z) \leq {\mathcal{N}}(z)$ .
$k,\ell,m,n,p,q,r,s$ : Integers, reserving p for a prime number.
$\nu_j$ : Small power of X,
$\nu_j= X^{-\eta_j}$ .
$\eta$ : A generic small constant.
C: A generic large constant.
F: A smooth compactly supported function.
$H_N({\mathfrak{n}})$ : Indicator of
${\mathcal{N}}{\mathfrak{n}} \in [N,N(1+\nu_2)]$ for
$\nu_2=X^{-\eta_2}$ .
W: Denotes
$X^{1/(\log\log X)^2}$ .
P(W): Denotes
$\prod_{p < W} p$ .
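The quantity ${\mathcal{M}}(z)$ and the bounds ${\mathcal{N}}(z)^{1/2} \leq {\mathcal{M}}(z) \leq {\mathcal{N}}(z)$ from the list above can be checked by direct computation. The following Python sketch compares a brute-force search with the closed form ${\mathcal{M}}(a+bi) = (a^2+b^2)/\gcd(a,b)$. This closed form is not stated in the text. It follows from the fact that $a+bi$ divides an integer m exactly when $a^2+b^2$ divides both $ma$ and $mb$. The function names are chosen for this example.

```python
from math import gcd


def norm(a, b):
    """Norm N(a + bi) = a^2 + b^2."""
    return a * a + b * b


def m_brute_force(a, b):
    """Smallest positive integer m divisible by a + bi in Z[i].
    Uses m / (a + bi) = m (a - bi) / N(a + bi)."""
    n = norm(a, b)
    m = 1
    while (m * a) % n != 0 or (m * b) % n != 0:
        m += 1
    return m


def m_closed_form(a, b):
    """Closed form N(z) / gcd(a, b), derived for this illustration."""
    return norm(a, b) // gcd(a, b)


if __name__ == "__main__":
    for a, b in [(2, 0), (1, 1), (1, 2), (3, 3), (4, 2)]:
        print((a, b), m_brute_force(a, b), m_closed_form(a, b))
```

The two extremes of the inequality occur for rational integers, where ${\mathcal{M}}(z)=|z|$, and for $z=a+bi$ with $\gcd(a,b)=1$, where ${\mathcal{M}}(z)={\mathcal{N}}(z)$.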
2. Lemmas
2.1 Introducing finer-than-dyadic smooth weights
Here we describe a device that allows us to partition a sum smoothly into finer-than-dyadic intervals. Let
$\nu \in (0,1/10)$
be small (we will use
$\nu=X^{-{\varepsilon}}$
or
$\nu= \log^{-C} X$
), and fix a non-negative
$C^\infty$
-smooth function F supported on
$[1-\nu,1+\nu]$
and satisfying

Suppose that we want to introduce a smooth partition to bound a sum of the form
$\sum_{n \leq N} f_n$
. We can write (using a change of variables
$t \mapsto t/n$
)

so that

Hence, at the cost of a factor
$\nu^{-1} \log N$
, it suffices to consider sums of the form

for
$t \leq 2N$
. Naturally, if the original sum is
$\sum_{n \sim N} f_n$
(dyadic n), then it suffices to consider
$\sum_{n \sim N} f_n F(n/t)$
for
$t \asymp N$
at a cost
$\nu^{-1}$
. The effect is the same as with the usual smooth partition of unity except that we did not need to construct explicitly the partition functions F. We use the notation
$F_t (n):=F(n/t)$
.
We also have the following variant on
${\mathbb{R}}/2 \pi {\mathbb{Z}}$
. Fix
$\nu_1=X^{-\eta_1}$
for some small
$\eta_1 >0$
and let
$G: {\mathbb{R}}/2 \pi {\mathbb{Z}} \to {\mathbb{C}}$
be a non-negative
$C^\infty$
-smooth function supported on
$[-\nu_1,\nu_1]$
, satisfying

Then, for a sum over Gaussian integers, we may write

to obtain a smooth finer-than-dyadic partition in terms of
$\arg z$
.
2.2 Elementary estimates
Lemma 2.1. We have

Proof. Denoting
$c=(a,d)$
and
$a=ca'$
, we have

In handling the weights
$\mathbf{1}_{(n,P(W))=1}$
we can use the following standard bound for exceptionally smooth numbers, which effectively gives a version of the fundamental lemma of the sieve (see, for instance, [Ten15, Chapter III.5, Theorem 1]).
Lemma 2.2. For any
$2 \leq Z \leq Y$
, we have

where
$u:= \log Y/\log Z.$
We also require the following elementary bound (see, for instance, [FI98a, Lemma 1]).
Lemma 2.3. For every square-free integer n and every
$k \geq 2$
, there exists some
$d|n$
such that
$d \leq n^{1/k}$
and

From this we get the more general version.
Lemma 2.4. For every integer n and every
$k \geq 2$
, there exists some
$d|n$
such that
$d \leq n^{1/k}$
and

Proof. Write
$n=b_1 b_2^2 \cdots b_{k-1}^{k-1} b_k^k$
with
$b_1, \dots, b_{k-1}$
square-free, by letting
$b_k$
be the largest integer such that
$b_k^k | n$
, so that
$n/b_k^k$
is k-free and splits uniquely into
$b_1 b_2^2 \cdots b_{k-1}^{k-1}$
with
$b_j$
square-free. We have

By Lemma 2.3 for all
$j \leq k-1$
there is some
$d_j|b_j$
with
$d_j \leq b_j^{1/k}$
and
$\tau(b_j) \leq 2^k\tau(d_j)^k$
. Hence, for
$d=d_1\cdots d_{k-1} b_k$
, we have

and

2.3 Sieve bounds
The following version of the fundamental lemma of the sieve follows from applying [FI10, Theorem 6.9] to the real and the imaginary parts of
$a_n$
.
Lemma 2.5 (Fundamental lemma of the sieve). Let
${\mathcal{A}}=(a_n)$
be a sequence of complex coefficients and let
$\kappa \geq 0$
,
$Z \geq 2,$
and
$D \geq Z^{9\kappa +2}$
. Suppose that for some
${\mathbb{X}}$
and for some real-valued multiplicative function g(d), we have, for all square-free d,

and suppose that, for all p, we have
$0 \leq g(p) < 1$
. Suppose that for some
$K > 1$
we have for all
$W < Z$

Use the notation

Then, for some bounded coefficients
$\lambda_d$
depending only on
$\kappa$
, we have

Remark 2.6. The fact that
$\lambda_d$
depend only on
$\kappa$
and not the sequence
$a_n$
will be important for us since we apply the same sieve to several sequences
$a_n^{(b)}$
indexed by
$b \in B$
and then use the summation over
$b \in B$
while bounding the remainder
$\sum_{b \in B}\big|\sum_{d < D} \lambda_d r^{(b)}_d\big|$
(cf. Proposition 4.1). In our set-up, the functions
$g^{(b)}(d)$
and
$V^{(b)}(Z)$
also depend on b with
$g^{(b)}(d)= \mathbf{1}_{(d,2b)=1} \rho(d)/d$
. Thus, we have

where the product over
$p|b,2 < p< Z$
may be completed to all prime factors
$p|b, p\neq 2$
with a negligible error term if Z is not too small.
2.4 Poisson summation
The following lemma gives a truncated version of the Poisson summation formula.
Lemma 2.7 (Poisson summation). Let F be as in § 2.1 for some
$\nu \in (0,1/10)$
and denote
$F_N(n):=F(n/N)$
. Let
$x \gg 1$
and let
$q \sim Q$
be an integer. Let
${\varepsilon} >0$
and denote

Then, for any
$A >0$
,

where
$\hat{f}(h):= \int f(u)e(hu) \,du$
is the Fourier transform.
Proof. By the usual Poisson summation formula we have

For
$|h| > H$
, we have by integration by parts
$j\geq 2$
times

which gives the result.
We will also need the two-dimensional Poisson summation formula.
Lemma 2.8. Let
$F: {\mathbb{R}}^2 \to {\mathbb{C}}$
be a
$C^\infty$
-smooth compactly supported function such that

and let
$ F_{\boldsymbol{N}}({\mathfrak{n}}) := F(n_1/N_1,n_2/N_2)$
. Then

and

2.5 Fourier expansions
We have the following lemma on Mellin transforms, where the construction of the non-negative majorant
$\tilde{F}$
is from the proof of [IK04, Lemma 7.1] (cf. the function g(y), with
$x_m=\log m$
).
Lemma 2.9. Fix a non-negative
$C^\infty$
-smooth function F supported on
$[1-\nu,1+\nu]$
and satisfying

Then, for any
$c \in {\mathbb{R}}$
,

where the Mellin transform is

Furthermore,
$|\dot{F}(it)|$
has a smooth majorant
$\tilde{F}(t)$
such that, for
$m,n\sim M$
,

Similarly, we have the following lemma on Fourier series, which will be applied with the characters

to expand smooth weights
$G(\arg z)$
.
Lemma 2.10. Let
$\theta\in {\mathbb{R}}/2\pi {\mathbb{Z}}$
and let G be a bounded smooth function supported on
$[\theta-\nu,\theta+\nu]$
and satisfying

Then

with

Furthermore, there is a majorant
$\widetilde{G}(k) $
of
$|\check{G}(k)|$
such that

2.6 Large sieve bounds
For Type II sums we need the multiplicative large sieve inequality of Bombieri and Davenport (cf., for instance, [FI10, (9.52)]).
Lemma 2.11. For any complex numbers
$\gamma_n$
, we have

For Type I sums we need the large sieve inequality for roots of quadratic congruences (cf. [FI97] and, in particular, [FI05, Lemma 14.4] for this variant with the twist by
$\overline{q}$
).
Lemma 2.12. Let
$q \geq 1$
. For any complex numbers
$\gamma_n$
we have

2.7 Zeros of Hecke L-functions
We say that a Gaussian integer z is primary if
$z \equiv 1 \ (2(1+i))$
, so that every odd ideal of
${\mathbb{Z}}[i]$
has a unique primary generator. Note that this definition is multiplicative. We extend any function
$\psi:{\mathbb{Z}}[i] \to {\mathbb{C}}$
to a function on odd ideals by defining
$\psi({\mathfrak{a}}):= \psi(z)$
if z is the primary generator of
${\mathfrak{a}}$
. For
$k \in {\mathbb{Z}}$
, we let
$\xi_k$
denote the character

which controls the angular distribution of z. For a Dirichlet character
$\chi \in \widehat{({\mathbb{Z}}[i]/u{\mathbb{Z}}[i])^\times}$
with a modulus
$u \in {\mathbb{Z}}[i]\setminus\{0\}$
we define the Hecke L-function by

For a modulus u we define

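The notion of a primary Gaussian integer introduced at the start of this subsection can be made explicit. Writing $z=x+iy$, the congruence $z \equiv 1 \ (2(1+i))$ is equivalent to y being even and $x+y \equiv 1 \ (4)$. The following Python sketch uses this criterion to find the primary generator of an odd ideal. The function names and the representation of Gaussian integers as pairs `(x, y)` are assumptions made for this example.

```python
def is_primary(x, y):
    """Check x + iy = 1 (mod 2(1+i)): y even and x + y = 1 (mod 4)."""
    return y % 2 == 0 and (x + y) % 4 == 1


def primary_associate(x, y):
    """Return the unique unit multiple of the odd Gaussian integer x + iy
    that is primary."""
    if (x * x + y * y) % 2 == 0:
        raise ValueError("x + iy must be odd, i.e. coprime to 1 + i")
    z = (x, y)
    for _ in range(4):
        if is_primary(*z):
            return z
        z = (-z[1], z[0])  # multiply by the unit i
    raise AssertionError("unreachable for odd Gaussian integers")


def gauss_mul(z, w):
    """Product of two Gaussian integers given as pairs."""
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])


if __name__ == "__main__":
    print(primary_associate(2, 3))
    print(is_primary(*gauss_mul((3, 2), (1, -4))))
```

The product of two primary Gaussian integers is again primary, which is the multiplicativity used when functions on Gaussian integers are extended to odd ideals.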
We have the following lemmas, where all the constants are effectively computable. We will not need the Deuring–Heilbronn zero repulsion as we deal with the case of a Siegel zero via a different method. The first lemma is classical (cf., for instance, [IK04, Chapter 5]).
Lemma 2.13 (Zero-free region, Landau–Page). There is a constant
$c_1>0$
such that the function
$L_u(s,\xi_k)$
has at most one zero in the region

If such a zero exists, then it is real and simple,
$k=0$
, and it is a zero of some
$L(s,\chi_1)$
with a quadratic character
$\chi_1$
.
We let
$N^\ast(\alpha,T,K,Q)$
denote the number of zeros of
$L(s,\xi_k\chi)$
with primitive characters
$\chi$
,
$|k| \leq K$
, and modulus
$|u|^2 \leq Q$
with
$\sigma \geq \alpha,|t| \leq T$
. The following lemma is a generalization of [Gal70, Theorem 6] to Gaussian integers.
Lemma 2.14 (Log-free zero-density estimate). There are constants
$c_2,c_3>0$
such that

As a corollary to the zero-free region (Lemma 2.13) we get the following lemma, by taking
$\delta>0$
small enough in terms of
$c_1$
and for two different moduli
$u_1,u_2$
applying Lemma 2.13 with
$u=u_1u_2$
.
Lemma 2.15. Let
$\delta > 0$
be sufficiently small in terms of
$c_1$
. Then there is at most one modulus
$|u_1| \leq X^{2\delta}$
with a primitive character
$\chi_1$
such that
$L(s,\xi_k \chi_1)$
has a zero
$\beta_1 \geq 1-\frac{1}{\sqrt{\delta} \log X}$
. If such a zero exists, then it is real and simple,
$k=0$
, and it is a zero of some
$L(s,\chi_1)$
with a real character
$\chi_1$
.
The following lemma is proved by the same argument as in [MV07, (11.7)].
Lemma 2.16. Let
$\chi \in \widehat{({\mathbb{Z}}[i]/u{\mathbb{Z}}[i])^\times}$
for some
$|u|^2 \leq Q$
and let
$|k| \leq Q$
. If
$L(s,\xi_k\chi)$
has no zeros counted by
$N^\ast(\alpha,T,K,Q) $
, then for all
$\sigma > (1+\alpha)/2, |t| \leq T, |k| \leq K$
we have

2.8 Character sums
We need the following lemma for computing sums over primitive characters.
Lemma 2.17. For any a we have

where the sum extends over primitive Dirichlet characters of
$({\mathbb{Z}}/d{\mathbb{Z}})^\times$
.
Proof. By the Chinese remainder theorem we have

Thus, it suffices to show that

This follows from

We will need the smoothed Pólya–Vinogradov bound on Gaussian integers, which is a consequence of Poisson summation (Lemma 2.8) and the Gauss sum bound on
${\mathbb{Z}}[i]$
.
Lemma 2.18. Let
$F:{\mathbb{R}}^2 \to {\mathbb{C}}$
be as in Lemma 2.8 and for
$z=x+iy \in {\mathbb{C}}$
define
$F(z) := F(x,y)$
. Let
$\chi$
be a character of modulus
$u \in {\mathbb{Z}}[i]$
. Then

The following lemma is required for the proof of Theorem 1.6.
Lemma 2.19. Let
$k\geq 1$
and let p be a prime number. Let
$\chi \in \widehat{({\mathbb{Z}}[i]/p^k{\mathbb{Z}}[i])^\times}$
with conductor
$p^k$
or
$\pi^{k_1}\bar{\pi}^{k_2}$
with
$k=\max\{k_1,k_2\}$
. Then

Proof. Let
$q:=p^k$
. We have by
$r \mapsto r/s$

The function
$ s \mapsto \overline{\chi}(s)$
defines a multiplicative character over the integers and we let
$q_1|q$
denote its conductor. By expansion of
$\overline{\chi}(s)$
into additive characters we get

We have (writing
$s \mapsto s /p^{\ell}+tp^{k-\ell}, p^\ell|s$
)

Here

Hence, denoting
$q_2=(q_1,p^{k-\ell})$
and making the change of variables
$a\mapsto a q_1/q_2$
, we get

Since
$\chi(t)$
is of conductor
$q_1$
we have the Gauss sum bound [MV07, Theorem 9.12]

For the second sum we expand the condition
$s \equiv 0 \, (p^\ell)$
with additive characters to get

The function

is an additive character on
${\mathbb{Z}}[i]/q{\mathbb{Z}}[i]$
since
$q_2|p^{k-\ell}$
. If
$\chi$
is of conductor
$\pi^{k_1}\bar{\pi}^{k_2}$
, then we get by the Gauss sum bound (a direct generalization of [MV07, Theorem 9.12] to
${\mathbb{Z}}[i]$
, denoting
$r:=q/(c,q)$
)

since
$\pi^{k_1}\bar{\pi}^{k_2} | r$
and
$k=\max\{k_1,k_2\}$
implies that
$r=p^k=q$
. We also get the same bound if the conductor of
$\chi$
is
$p^k$
, as then
$\chi$
is primitive. Putting the bounds (2.1) and (2.2) together and using
$q_2=(q_1,p^{k-\ell})$
we get (noting that
$q/\varphi(q) \leq 2$
for
$q=p^k$
)

The previous lemma implies the following.
Lemma 2.20. Let
$b \in {\mathbb{Z}}$
and
$u\in {\mathbb{Z}}[i]$
and let
$Y > |u|^{4}$
. Let
$\chi$
be a primitive character modulo u and let
$v=(u,b)$
. Then, for any integer
$a_0$
,

Proof. We have

Let
$u=m w$
, where m consists of all prime factors
$p\equiv 3 \, (4)$
. Let n denote the smallest integer such that
$w| n$
. The contribution from
$c > Y/mn$
is trivially bounded by

For
$c \leq Y/mn$
, we have

once we show that

To prove this, write

By the Chinese remainder theorem, we get (denoting
$k=\max\{k_1,k_2\}$
)

and the claim follows by Lemma 2.19.
3. Set-up and statement of the quasi-explicit formula
Let
$\lambda_b$
be divisor-bounded coefficients and define

Define the sequences over Gaussian integers
${\mathcal{A}}=(a_z)$
with

Note that
$(z,\overline{z})=1$
implies that
$(z,(1+i))=1$
and for
$z=b+ia$
that
$(a,b)=1$
.
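The coprimality condition $(z,\overline{z})=1$ can be tested with the Euclidean algorithm in ${\mathbb{Z}}[i]$. The following Python sketch is an illustration under the assumption that Gaussian integers are stored as pairs (real part, imaginary part). The function names are chosen for this example. It confirms on a small range that z is coprime to its conjugate exactly when its real and imaginary parts are coprime and its norm is odd.

```python
from math import gcd


def gauss_mul(z, w):
    """Product of two Gaussian integers given as pairs."""
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])


def gauss_mod(z, w):
    """Remainder of z modulo w, using the nearest Gaussian integer quotient."""
    n = w[0] ** 2 + w[1] ** 2
    x, y = gauss_mul(z, (w[0], -w[1]))  # z * conj(w)
    q = ((2 * x + n) // (2 * n), (2 * y + n) // (2 * n))  # round to nearest
    qw = gauss_mul(q, w)
    return (z[0] - qw[0], z[1] - qw[1])


def gauss_gcd(z, w):
    """Greatest common divisor in Z[i], determined up to a unit."""
    while w != (0, 0):
        z, w = w, gauss_mod(z, w)
    return z


def coprime_to_conjugate(x, y):
    """True if x + iy and x - iy have no common non-unit divisor."""
    g = gauss_gcd((x, y), (x, -y))
    return g[0] ** 2 + g[1] ** 2 == 1


if __name__ == "__main__":
    for x, y in [(2, 1), (1, 1), (3, 1), (3, 6)]:
        print((x, y), coprime_to_conjugate(x, y))
```

The rounding step guarantees that the norm of the remainder is at most half the norm of the divisor, so the loop terminates.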
For an ideal
${\mathfrak{n}}= (z)$
we define

so that for any function f on the ideals we have

In the case that there is a Siegel zero for the character
$\chi_1$
, for any finite set of integers B let
$B_1 \subseteq B$
be the largest subset such that for all
$b \in B_1$
we have

and let us denote

We split the proof of Theorem 1.2 into two cases depending on whether there is a Siegel zero
$\beta_1 > 1-{\varepsilon}_1/\log X$
or not and according to the size of
$\Omega(B_1)$
. Theorem 1.2 is then an immediate corollary of the following two theorems.
Theorem 3.1
(Exceptional case) Let
${\varepsilon}_1 \in (0,1/10)$
. Let
$B \subseteq [\eta X^{1/2},(1-\eta)(2X)^{1/2}] \cap {\mathbb{Z}}$
with
$|B| = X^{1/2-\delta}$
and let
$\lambda_b=\mathbf{1}_{B}(b)$
. Suppose that for some
$\chi_1$
of modulus
${\mathcal{M}}(u_1) \leq X^{\delta+\eta}$
the L-function
$L(s,\chi_1)$
has a zero
$\beta_1 > 1-{\varepsilon}_1/\log X$
and that
$\Omega(B_1) \geq \Omega(B)/2$
. Suppose that
$\delta$
is sufficiently small in terms of
${\varepsilon}_1$
. Then we have, for all
${\varepsilon} > 0$
,

Theorem 3.2
(Regular case) Let
${\varepsilon}_1 \in (0,1/10)$
. Let
$B \subseteq [\eta X^{1/2},(1-\eta)(2X)^{1/2}] \cap {\mathbb{Z}}$
with
$|B| = X^{1/2-\delta}$
and let
$\lambda_b=\mathbf{1}_{B}(b)$
. Suppose that the L-functions
$L(s,\chi)$
have no zeros
$\beta > 1-{\varepsilon}_1/\log X$
or that
$\Omega(B_1) \leq \Omega(B)/2$
and suppose that
$\delta$
is sufficiently small in terms of
${\varepsilon}_1$
. Then

In Theorem 3.2 the possible exceptional zero is not a Siegel zero: the potential zeros are only somewhat close to the line
${\text{Re}}(s)=1$
and we show that these zeros can have only a small influence on the main term. This means that in all error terms it suffices to save only a power of
$\log X$
instead of a power of X. Theorem 3.1 is proved in § 5. Theorem 3.2 is proved in § 10 and is a quick consequence of the following result. Recall that
${\mathcal{M}}(u)$
denotes the smallest integer m with
$u|m$
.
Theorem 3.3 (Quasi-explicit formula). Let
$\eta > 0$
be small. Let
$\delta \in (0,1/10)$
and
$X \gg 1$
. For every
$C_1> 0$
, there is some
$C_2>0$
such that for some

there is a set of primitive Hecke characters
$\{\xi_{k_j} \chi_j\}_{j \leq J}$
with Dirichlet characters
$\chi_j$
to moduli
$u_j \in{\mathbb{Z}}[i]$
with
${\mathcal{M}}(u_j) \leq X^{\delta+\eta}$
and
$|k_j| \leq X^\eta$
such that the following holds. Let
$\lambda_b$
be coefficients with
$|\lambda_b| \leq X^{o(1)}$
, supported on
$ [\eta X^{1/2},(1-\eta)(2X)^{1/2}] \cap {\mathbb{Z}}$
, and satisfying

Then

The restriction
${\mathcal{M}}(u_j) \leq X^{\delta+\eta}$
instead of a condition involving
$|u_j|$
may appear unusual, but, in fact, it occurs very naturally in the proof of the Type II estimate (Proposition 6.2), where we consider the distribution of Gaussian integers in arithmetic progressions to moduli d which are regular integers, so that
$u_j|d$
. We prove Theorem 3.3 in § 11. It is possible that the range
$\delta < 1/10$
may be improved a bit with further work but we seem to hit a hard barrier at
$\delta=1/6$
, cf. Remark 8.1 for more details. It is also plausible that one could obtain a power saving in the error term by taking into account
$J \leq X^\eta$
bad characters. The factors of
$X^\eta$
in the ranges for
$u_j,k_j,{\text{Im}}(\rho_j)$
may be replaced by
$(\log X)^{C_3}$
for some large
$C_3 > 0$
but this is inconsequential for our applications. We also note that the right-hand side may be expressed as a sum over Gaussian integers by writing

where the last equation follows from the change of variables
$z \mapsto z/u$
and summing over u.
We conclude this section by making elementary reductions for the proof of Theorem 3.3, to reduce to the case when
$\lambda_b=\mathbf{1}_B(b)$
with B in a short interval.
3.1 Reduction to bounded
$\lambda_b$
We can reduce from coefficients satisfying
$|\lambda_b| \leq X^{o(1)}$
to
$|\lambda_b| \leq 1$
as follows. Let
${\varepsilon}> 0$
be small enough so that
$\delta+2{\varepsilon} < 1/10$
and denote

The contribution from
$b \in B_j$
with
$|B_j| \leq X^{1/2-\delta-{\varepsilon}}$
is negligible, since for such j by
$|\lambda_b| \leq X^{o(1)}$

For each j such that
$|B_j|\, > X^{1/2-\delta-{\varepsilon}}$
, we can renormalize
$\lambda_b$
by
$2^{-j}$
to get bounded coefficients which satisfy

Then the general case follows by applying the bounded case for each such j separately with the weights
$\lambda_{b}^{(j)} := 2^{-j}\lambda_b \mathbf{1}_{b \in B_j} $
. Indeed, if we denote the claim in Theorem 3.3 by
$S(\lambda)= \mathrm{M}(\lambda) + O(\mathrm{E}(\lambda))$
, then assuming that the theorem holds for the bounded
$\lambda^{(j)}_b$
, we get by linearity

Thus, it suffices to prove Theorem 3.3 for
$|\lambda_b| \leq 1$
.
3.2 Reduction from
$\lambda_b$
to
$\mathbf{1}_B$
We can reduce the proof from general bounded weights
$\lambda_b$
to the weights of the type
$\mathbf{1}_B(b)$
by a finer-than-dyadic decomposition in terms of the values of
$\lambda_b$
. That is, for
$\nu=(\log X)^{-C}$
we write


Since
$|\lambda_b| \leq 1$
, we obtain a partition

where the contribution from the error term is negligible by crude bounds. We consider
$\mathbf{1}_B$
for

and note that for
$j_1,j_2$
with
$|B(j_1,j_2)| < X^{1/2-\delta-\eta}$
we can bound the contribution trivially. Hence, it suffices to show Theorem 3.3 for
$\lambda_b= \mathbf{1}_B(b)$
.
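As a rough illustration of the finer-than-dyadic decomposition in the values of $\lambda_b$, the following Python sketch rounds each value down to a grid of mesh $\nu$ in the real and imaginary directions. The exact form of the sets $B(j_1,j_2)$ is an assumption here (cells of a square grid), and the function names are hypothetical. The point is only that $\lambda_b$ equals a combination of indicator functions $\mathbf{1}_{B(j_1,j_2)}$ up to a pointwise error of size $O(\nu)$.

```python
import math
from collections import defaultdict


def quantize(lam: dict, nu: float) -> dict:
    """Group b by the grid cell of lam[b]; returns the level sets B(j1, j2)."""
    level_sets = defaultdict(set)
    for b, value in lam.items():
        z = complex(value)
        j1 = math.floor(z.real / nu)
        j2 = math.floor(z.imag / nu)
        level_sets[(j1, j2)].add(b)
    return dict(level_sets)


def approximation(level_sets: dict, nu: float) -> dict:
    """Evaluate the sum over (j1, j2) of nu*(j1 + i*j2) * 1_{B(j1,j2)}(b)."""
    approx = {}
    for (j1, j2), members in level_sets.items():
        for b in members:
            approx[b] = nu * complex(j1, j2)
    return approx


# Toy bounded weights (hypothetical data) with |lam_b| <= 1.
nu = 0.01
lam = {b: complex(math.cos(b), math.sin(b)) / 2 for b in range(1, 200)}
level_sets = quantize(lam, nu)
approx = approximation(level_sets, nu)
max_error = max(abs(lam[b] - approx[b]) for b in lam)
```

In this sketch the pointwise error is at most $\sqrt{2}\,\nu$, and the number of non-empty cells is $O(\nu^{-2})$ since $|\lambda_b| \leq 1$.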
3.3 Reduction to B in a short interval
Let
$B \subseteq [\eta X^{1/2},(2-\eta)X^{1/2}]$
and split

The contribution from
$B_j$
with
$|B_j| \leq X^{-2\eta} |B|$
is negligible by a crude bound. Thus, we only need to deal with
$|B_j| \geq |B|X^{-2\eta} = X^{1/2-\delta-2\eta}$
. Therefore, it suffices to prove Theorem 3.3 for
$\lambda_b= \mathbf{1}_B(b)$
with

From now on we always assume that
$\lambda_b= \mathbf{1}_B(b)$
for a set B satisfying (3.2).
4. Type I information
For
$b \in B$
and for any function f on the ideals of
${\mathbb{Z}}[i]$
denote

Define

We have Type I information provided by the following proposition.
Proposition 4.1 (Type I information). Let
$\alpha_w$
be divisor-bounded coefficients supported on
$(w,2\overline{w})=1$
. Let
$\chi$
be a Dirichlet character to modulus u and let
$\xi=\xi_k$
with
$|k| \ll X^{\eta/1000}$
. Let
$q := {\mathcal{M}}(u) \leq X^{1/4}$
. Then

The same is true if
$|z|^2 \sim X$
is replaced by
$|z|^2 \in [X',X'(1+\nu)]$
or by a smooth weight
$F(|z|^2/X')$
as in § 2.1 with
$X'\sim X$
and
$\nu \geq X^{-\eta/1000}$
, with the same change applied to the condition
$a \sim (X^2-b^2)^{1/2}$
.
Proof. Let us split d dyadically into
$d \sim D \leq X^{1-\delta-\eta}/q^2$
and denote

We apply § 2.1 with
$\nu=X^{-\eta/80}$
to split a into finer than dyadic ranges to get (bounding the part
$A \leq X^{1/2-\eta}$
trivially using the divisor bound and dropping the condition
$a^2+b^2 \sim X$
)

with

Since
$b+ia$
is odd, we have

for some unit u depending on z modulo
$2(1+i)$
. Since a is restricted to a short interval, for a fixed b the function
$\xi_k(b+ia)$
with
$k \ll X^{\eta/1000}$
is equal to a constant up to a negligible error term and may therefore be dropped. By splitting B into residue classes modulo 4 we may assume that u does not depend on b. Thus, we can split a into congruence classes modulo 4q and
$d=|w|^2$
with
$a \equiv a_0d \, (4q)$
to get, for some unit
$u_{a_0d}$
,

where the sum over
$\nu$
contains just the one element
$\nu=-r/s$
for
$w=r+is$
. Note that
$(a,b)=1$
implies that
$(w,b)=1$
and that
$(w,\overline{w})=1$
, so that
$(r,s)=1$
and we may restrict to
$\nu \in {\mathbb{Z}}/d{\mathbb{Z}}$
in the above.
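The role of the residue $\nu=-r/s$ can be checked by brute force. The Python sketch below assumes the standard encoding that, for $w=r+is$ with $(r,s)=1$ and $d=|w|^2$, divisibility $w \mid b+ia$ is equivalent to $b + \nu a \equiv 0 \, (d)$; the orientation of this congruence is our assumption, and the function names are hypothetical.

```python
from math import gcd


def root_for(r: int, s: int) -> int:
    """Return nu = -r/s modulo d = r^2 + s^2, for coprime r, s with d > 1."""
    d = r * r + s * s
    return (-r * pow(s % d, -1, d)) % d


def divides(r: int, s: int, b: int, a: int) -> bool:
    """Check directly whether w = r + i s divides z = b + i a in Z[i]."""
    d = r * r + s * s
    # z * conj(w) = (b r + a s) + i (a r - b s); w | z iff d divides both parts.
    return (b * r + a * s) % d == 0 and (a * r - b * s) % d == 0


def check(limit: int = 6, box: int = 12) -> bool:
    """Compare direct divisibility with the congruence b + nu*a = 0 mod d."""
    for r in range(-limit, limit + 1):
        for s in range(-limit, limit + 1):
            d = r * r + s * s
            if d <= 1 or gcd(r, s) != 1:
                continue
            nu = root_for(r, s)
            if (nu * nu + 1) % d != 0:  # nu is a root of x^2 + 1 modulo d
                return False
            for b in range(-box, box + 1):
                for a in range(-box, box + 1):
                    if divides(r, s, b, a) != ((b + nu * a) % d == 0):
                        return False
    return True
```

The check also confirms that $\nu^2+1 \equiv 0 \, (d)$, which is why the sum over roots of $\nu^2+1$ collapses to a single element once w is fixed.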
It suffices to show that

Let us first deal with the cross-condition
$\chi(b+i a_0d)$
between b and d. If
$(b,q)=1$
, we could just make the change of variables
$a_0 \mapsto a_0 b$
and
$\chi(b)$
would factor out. In general, we have for some characters
$\chi_{p^k}$
modulo
$p^k$
:

Denoting

and making the change of variables
$a_0 \mapsto a_0 b/p^\ell$
modulo
$p^k$
we get

Note that the first factor depends only on b and the second factor no longer depends on b but on
$q_1$
. The residue classes
$b/p^\ell$
and
$p^\ell$
combine to unique residue classes
$ \theta_{b,q}$
and
$\gamma_{q_1,q}$
modulo 8q by the Chinese remainder theorem. Thus, we get

Dropping the condition
$(b,q)=q_1$
, using the divisor bound for
$\sum_{q_1 | q} 1 =\tau(q)$
, taking the sum over
$a_0$
to the outside, and absorbing the factor
$\chi(u_{a_0d})\chi(\gamma_{q_1,q}+i a_0d) $
into the coefficient
$\alpha_w$
, it suffices to show that for any
$a_0$
,

We expand the condition
$(a,b)=1$
using the Möbius function and use the triangle inequality to get (note that
$(b+ia_0,u)=1$
implies that
$(c,q)=1$
)

where we have partitioned the sum depending on whether
$c \leq X^{\eta/20}$
or
$c > X^{\eta/20}$
. The second sum can be bounded trivially by

By the divisor bound for
$\tau(a^2+b^2)$
and
$\tau(b)$
, we obtain

Hence, it remains to show that for any
$a_0$

By writing

it suffices to show that for every
$c \leq X^{\eta/20}$

Applying Poisson summation (Lemma 2.7, note that d, q, c are all pairwise coprime) we get

where for

we have (denoting
$\alpha'_w := \alpha_w({D}/{|w|^2}) $
)

Note that the phase
$e_{4q}(a_0 \,d \theta_{b,q} h\overline{cd}) = e_{4q}(a_0 \theta_{b,q} h\overline{c})$
does not depend on d or
$\nu$
and thus it may be replaced by 1.
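The cancellation of d in the phase is a one-line computation with modular inverses, and can be sanity-checked as follows. The sketch assumes that $\overline{cd}$ denotes the inverse of cd modulo 4q and that c and d are coprime to 4q; the function names and the sample values are hypothetical.

```python
from math import gcd


def phase_argument(a0: int, d: int, theta: int, h: int, c: int, q: int) -> int:
    """Return a0*d*theta*h*inverse(c*d) modulo 4q, the argument of e_{4q}."""
    modulus = 4 * q
    inverse_cd = pow((c * d) % modulus, -1, modulus)
    return (a0 * d * theta * h * inverse_cd) % modulus


def reduced_argument(a0: int, theta: int, h: int, c: int, q: int) -> int:
    """Return a0*theta*h*inverse(c) modulo 4q, with no dependence on d."""
    modulus = 4 * q
    return (a0 * theta * h * pow(c % modulus, -1, modulus)) % modulus


# Sample parameters (hypothetical), with c coprime to 4q.
q, c, a0, theta, h = 15, 7, 11, 13, 5
values = {
    phase_argument(a0, d, theta, h, c, q)
    for d in range(1, 200)
    if gcd(d, 4 * q) == 1
}
```

For these sample values the set `values` contains a single residue, equal to `reduced_argument(a0, theta, h, c, q)`.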
By definition,
$g^{(b)}(w)$
matches with
$1/d$
when
$(b,w)=1$
, so that, in fact,
$T(\alpha,A) = 0$
. Thus, it suffices to show that

4.0.1 Bounding
$U(\alpha,A)$
.
The smooth cross-condition
$\widehat{F} (hA/c\,dq)$
may be removed by using the Mellin inversion formula (Lemma 2.9) at a cost of a factor
$ \ll X^{\eta/40}$
. We expand the condition
$(b,d)=1$
using the Möbius function to get

For each c and e, let

for some coefficients

Then, by rearranging the sums, we have

and for (4.2) it suffices to show that

By a divisor bound

By Cauchy–Schwarz and Lemma 2.12, using (4.1), we have

by using

5. Proof of Theorem 3.1
In this section, we prove Theorem 3.1 by following a strategy similar to that of [FI10, Chapter 24.2]. We first note that by restricting to
$b \in B_1$
(recall (3.1)) we may assume that for all b we have

We define the Dirichlet convolution on ideals of the Gaussian integers as

We define the auxiliary function

which, assuming the existence of a Siegel zero for
$L(1,\chi_1)$
, is sparsely supported; more precisely, for square-free
${\mathfrak{a}}$
it is supported on
${\mathfrak{a}}$
such that for all
${\mathfrak{p}}|{\mathfrak{a}}$
we have
$\chi_1({\mathfrak{p}})=+1$
.
We set
${\mathcal{A}}'=(a'_{{\mathfrak{n}}})$
with
$a'_{{\mathfrak{n}}}= a_{{\mathfrak{n}}} \lambda_1({\mathfrak{n}}) \mathbf{1}_{({\mathfrak{n}},u_1)=1}$
and define the multiplicative functions

denoting
$p= {\mathfrak{p}} \overline{{\mathfrak{p}}}$
, which is well defined since
$\rho(p) \neq 0$
precisely if p splits in
${\mathbb{Z}}[i]$
and
$\chi_1({\mathfrak{p}}) = \chi_1(\overline{{\mathfrak{p}}})$
since
$\chi_1$
is real. We also set

Note that by (5.1) the contribution from the terms with
$\chi_1((b+ia))$
is positive and, therefore, we may drop it to conclude

We define

We note that for large primes p the function g'(p) is essentially
$p^{-1}\rho(p)(1+\chi_1({\mathfrak{p}}))$
.
Theorem 3.1 is then a direct corollary of the following and the lower bound
$L(1,\chi_1) \gg_{\varepsilon} |u_1|^{-{\varepsilon}}$
. Note that from the assumption
$\Omega(B_1) \geq \Omega(B)/2$
it follows that
$|B_1| \, \gg_{\varepsilon} |B| X^{-{\varepsilon}}$
since
$\omega(b) = X^{\pm o(1)}$
.
Proposition 5.1. Suppose that the assumptions of Theorem 3.1 hold. Then for
$Z = |u_1|^4$
we have

where

Remark 5.2. Note that by introducing the weight
$\lambda_1({\mathfrak{n}})$
we have already removed all prime factors of
${\mathfrak{n}}$
with
$\chi_1({\mathfrak{p}}) =-1$
, which by the Siegel zero assumption is most of the primes if
${\varepsilon}_1$
is small. Thus, we are in the situation of a low-dimensional sieve, as we only need to sift out prime divisors with
$\chi_1({\mathfrak{p}})=+1$
.
We first gather Type I information. Note that
$(a,b)=1$
implies that
${\mathfrak{d}}$
is not divisible by a rational prime.
Lemma 5.3. Denote
$q={\mathcal{M}}(u_1)$
. Let
$\alpha_{\mathfrak{d}}$
be divisor-bounded coefficients supported on square-free
${\mathfrak{d}}$
with
$({\mathfrak{d}},\overline{{\mathfrak{d}}})=1$
. Let
$g_1^{(b)}({\mathfrak{d}}),{\mathbb{Y}}^{(b)}$
be defined similarly as in § 4. Then

Proof. The function
$\lambda_1({\mathfrak{n}})$
is multiplicative and, similarly to [FI10, (24.5)], we have (denoting
${\mathfrak{n}}={\mathfrak{d}} {\mathfrak{m}}$
)

We have

The contribution from the error term where
${\mathcal{N}} {\mathfrak{m}}/{\mathfrak{c}}=\square$
may be bounded by crude bounds. We have

The contribution from
${\mathcal{N}} {\mathfrak{c}} > X^{\eta}$
can be bounded by crude bounds. We obtain

We now relax the cross-conditions
${\mathcal{N}} {\mathfrak{n}} > {\mathcal{N}} {\mathfrak{c}}{\mathfrak{d}} {\mathfrak{a}}^2$
and
${\mathcal{N}} {\mathfrak{n}} \sim X$
by introducing a finer-than-dyadic decomposition for
${\mathcal{N}} {\mathfrak{n}}$
(using § 2.1 with
$\nu=X^{-\eta/1000}$
) to get for
$X' \sim X$
sums of the type

where

with

Note that
${\mathcal{N}} {\mathfrak{d}} \leq X^{1-2\delta-4\eta}/q^{4}$
,
${\mathcal{N}} {\mathfrak{c}} \leq X^\eta $
, and
${\mathcal{N}} {\mathfrak{c}} {\mathfrak{d}} {\mathfrak{a}}^2 \leq X'$
imply that
${\mathcal{N}} {\mathfrak{c}}{\mathfrak{d}}{\mathfrak{a}} \leq X^{1-\delta-\eta}/q^2 $
. If
${\mathfrak{n}}=(z)$
, then in
$S_2$
the character value

depends only on the residue class of z modulo
$4(1+i) u_1$
(note that
$(a,b)=1$
implies
$2 \nmid z$
). Thus, by Proposition 4.1, we get

with

Note that we have picked up the condition
$({\mathfrak{f}},\overline{{\mathfrak{f}}})=1$
from
$(a,b)=1$
implicit in
$a_{\mathfrak{n}}$
. We have

and

To evaluate the sum over
${\mathfrak{a}}$
, we have (by applying a Möbius expansion to
$({\mathfrak{a}},\overline{{\mathfrak{a}}})=1$
)

since
$\chi_1(\overline{{\mathfrak{p}}}) = \chi_1({\mathfrak{p}})$
and
$({\mathfrak{d}},\overline{{\mathfrak{d}}})=1$
. Let us denote (noting that the contribution from
$ {\mathcal{N}} {\mathfrak{c}} > X^\eta $
may now be added back in at a negligible cost)

The functions
$g_j$
are multiplicative with

where the last equality holds since
$\chi_1({\mathfrak{p}})^2=1$
. Hence, we have

with

5.1 Proof of Proposition 5.1
We apply a sieve argument to the sequence
${\mathcal{A}}''= (a''_n)$
over integers defined by

Note that then

The proof is essentially the same as in [FI10, Proof of Proposition 24.1], but we give it for completeness as it is short. Let
$D:=X^{1-2\delta-4\eta}/q^4$
and denote
$s= \log D / \log Z$
. By Buchstab’s identity we have

By the fundamental lemma of the sieve (Lemma 2.5, see also Remark 2.6) and Lemma 5.3 we have

where

By an upper bound sieve (e.g. using Lemma 2.5) with level
$D/p> X^{1/4}$
and Lemma 5.3 we have

Combining the two estimates we get Proposition 5.1, noting that the bound (5.2) can be proved by an argument similar to that of [FI10, (24.20)].
Remark 5.4. Instead of taking a small Z and using the fundamental lemma of the sieve, we can take a larger Z and use the linear sieve lower and upper bounds [FI10, Theorem 11.12]. Taking
$Z=X^{1/4}$
we get that the lower bound for
$S({\mathcal{A}}'',2X^{1/2})$
is proportional to (denoting the linear sieve functions by f, F)

To make the right-hand side positive we can certainly take any
${\varepsilon}_1 < 1$
and
$\delta = (1- {\varepsilon}_1)/100$
, for instance.
6. Type II information: preliminaries
We now fix small parameters
$\eta,\eta_1,\eta_2,\eta_3 > 0$
such that
$\eta_1$
is small compared with
$\eta$
,
$\eta_2$
is small compared with
$\eta_1,$
and
$\eta_3$
is small compared with
$\eta_2$
, that is,

For instance, for our purposes it would suffice to take any small
$\eta > 0$
and let

We denote

We often refer to these parameters in the course of the following sections.
Let
$G: {\mathbb{R}}/2 \pi {\mathbb{Z}} \to {\mathbb{C}}$
be a non-negative
$C^\infty$
-smooth function supported on
$[-\nu_1,\nu_1]$
as in Lemma 2.10, satisfying

and recall that

For a set of Hecke characters
$\Psi=\{ \xi \chi \}$
and
$u \in {\mathbb{Z}}[i]\setminus \{0\}$
we let
$\Psi_u$
denote the set of characters induced to modulus u. We say that
$\xi \chi$
is primitive if
$\chi$
is a primitive Dirichlet character. The following definition depends on the choice of
$\eta,\eta_j$
but since these are fixed throughout the argument we omit this dependency in the notation.
(Q-regularity) Let
$C_1,C_2>0$
and
$Q,N,X \geq 1$
. Let
$\beta_{\mathfrak{n}}$
be complex coefficients. We say that
$\beta_{\mathfrak{n}}$
is
$(Q,N,X,C_1,C_2)$
-regular if there exists a set of primitive characters
$\Psi=\{\xi\chi\}$
with
$|\Psi| \leq (\log X)^{C_2}$
such that the following holds. For any
$\xi \chi \in \Psi$
, we have

and for any
$u \in {\mathbb{Z}}[i]$
with
${\mathcal{M}}(u) \leq Q$
and
$N' \sim N$
we have

Given a function
$Q=Q(N,X)$
, we say that
$\beta_{{\mathfrak{n}}}$
is Q-regular if for any
$C_1>0$
there is some
$C_2>0$
and some
$X_0 >0$
such that for all
$X \geq X_0$
and for all
$N \geq X^\eta$
the coefficient
$\beta_{{\mathfrak{n}}}$
is
$(Q,N,X,C_1,C_2)$
-regular.
Informally speaking, a coefficient
$\beta_{\mathfrak{n}}$
is Q-regular if it is equidistributed in residue classes and polar boxes apart from a set of Hecke characters of size
$ \leq (\log X)^{O(1)}$
, uniformly over moduli u with
${\mathcal{M}}(u) \leq Q $
. We need the fact that the Möbius function restricted to rough numbers is Q-regular for Q close to
$N^{1/3}$
, which we prove in § 6.4 using the zero density estimate (Lemma 2.14).
Lemma 6.1. Let
$W:= X^{1/(\log\log X)^2}$
. The coefficient

is Q-regular for any
$Q=Q(N,X)$
with
$N \geq X^\eta Q^{3}$
. Furthermore, for fixed
$Q,X,C_1>0$
, a fixed set of characters
$\Psi$
with
$C_2 \ll C_1$
works for all ranges of
$N > X^\eta Q^{3}$
in the definition of
$(Q,N,X,C_1,C_2)$
-regularity.
The exponent 3 in
$N > X^\eta Q^{3}$
is not the best that can be obtained but it suffices for our purposes. This could be improved by using results on large values of Dirichlet polynomials (analogous to [Hux74]). We have Type II information given by the following proposition. We will apply it with
$\beta_{\mathfrak{n}}$
as above but it applies equally well to, e.g., products of k primes.
Proposition 6.2 (Type II information). Let
$W:= X^{1/(\log\log X)^2}$
and let
$\nu= (\log X)^{-C}$
for some
$C>0$
. For every
$C_1>0$
, there is some
$C_2 \ll_{C_1} 1$
such that the following holds. Let
$MN= X' \sim X$
with

Let
$\alpha_{\mathfrak{m}},\beta_{\mathfrak{n}}$
be bounded coefficients, supported on
$({\mathfrak{m}}{\mathfrak{n}},P(W))=1$
and

Let F be a smooth function as in § 2.1 with the parameter
$\nu$
. Suppose that for
$Q=X^{\delta + \eta}$
the coefficient
$\beta_{\mathfrak{n}}$
is Q-regular and let
$\xi_{k_j} \chi_j$
denote the corresponding Hecke characters with
$j \leq J\leq (\log X)^{C_2}$
. Then

Remark 6.3. Note that on the right-hand side of Proposition 6.2 we have the sequence
$a_{\mathfrak{a}}^\omega$
which is twisted by
$\omega$
, which arises from the assumption
$({\mathfrak{m}}{\mathfrak{n}},P(W))=1$
. In particular, Proposition 6.2 as stated would be false without restricting to
$({\mathfrak{m}}{\mathfrak{n}},P(W))=1$
.
We set

so that in what follows we may swap freely between Gaussian integers z and ideals
${\mathfrak{n}}$
and that

We have included the complex conjugate in
$\overline{w}$
to make our notation match that of [FI98b].
6.1 Sketch of the argument
As the proof of Proposition 6.2 is quite technical, we include here a simplified non-rigorous sketch. In § 6.2 we construct an approximation
$\beta^\#_{\mathfrak{n}}$
for
$\beta$
such that the difference

is balanced along arithmetic progressions. Write

The approximation will be simple enough that
$S(\alpha,\beta^\#)$
can be evaluated by Type I information, so let us consider

where the aim is to capture the oscillations from
$\beta^\flat_z$
. By applying Cauchy–Schwarz we get

with

for some smooth majorant
$F_M$
of the interval [M, 2M]. The goal is to evaluate the sum over w with a main term
$M(z_1,z_2)$
and then show that

is small due to cancellations from the coefficients
$\beta^\flat_z$
.
As in [FI98b], we note that by denoting
$b_j = {\text{Re}}(\overline{w}z_j)$
and

we have

Note that typically
$|\Delta| \approx N$
. The parts where
$\Delta=0$
or
$|\Delta| < N/ (\log X)^C$
correspond to diagonal contributions and may be bounded by crude estimates. Thus, we assume for simplicity that
$|\Delta| \asymp N$
.
Let
$b_0:=(b_1,b_2)$
and write
$b_j=b_0b_j'$
. Note that in the situation that
$B \subseteq q_1 {\mathbb{Z}}$
we have
$q_1 | b_0$
, so that
$b_0$
can be quite large for a large subset of
$B\times B$
. As usual, in most places dealing with greatest common divisors does not cause serious problems but the dependency on
$b_0$
will be crucial for our argument.
We have
$b_0 | \Delta w$
by (6.1) and, for simplicity, let us assume that
$b_0| \Delta$
. Then
$w = (z_2 b_1- z_1b_2)/(i\Delta)$
is fixed once we fix
$z_j,b_j$
, so that (ignoring the smooth weight
$F_M$
) we have to bound

Note that

is congruent to an integer, so that the congruence
$b_2' \equiv a b_1' \, (\Delta/b_0)$
lives in
${\mathbb{Z}}/(\Delta/b_0) {\mathbb{Z}}$
.
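The identity $z_2 b_1 - z_1 b_2 = i\Delta w$ behind the formula for w can be verified with exact integer arithmetic. The Python sketch below assumes $b_j = {\text{Re}}(\overline{w}z_j)$ and $\Delta = {\text{Im}}(\overline{z_1}z_2)$, and represents a Gaussian integer $x+iy$ as the pair `(x, y)`; the helper names are hypothetical.

```python
import random


def mul(x, y):
    """Multiply two Gaussian integers given as (real, imag) pairs."""
    return (x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0])


def conj(x):
    """Complex conjugate of a Gaussian integer."""
    return (x[0], -x[1])


def check_identity(w, z1, z2) -> bool:
    """Check z2*b1 - z1*b2 == i*Delta*w with b_j = Re(conj(w) z_j), Delta = Im(conj(z1) z2)."""
    b1 = mul(conj(w), z1)[0]
    b2 = mul(conj(w), z2)[0]
    delta = mul(conj(z1), z2)[1]
    lhs = (b1 * z2[0] - b2 * z1[0], b1 * z2[1] - b2 * z1[1])
    # i * w = (-Im w) + i (Re w), so i * Delta * w is the pair below.
    rhs = (-delta * w[1], delta * w[0])
    return lhs == rhs


random.seed(0)
samples = [
    tuple((random.randint(-50, 50), random.randint(-50, 50)) for _ in range(3))
    for _ in range(1000)
]
all_ok = all(check_identity(w, z1, z2) for w, z1, z2 in samples)
```

Since $b_0$ divides both $b_1$ and $b_2$, the same identity shows at once that $b_0 \mid \Delta w$.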
By expansion with Dirichlet characters and sorting into primitive characters we get (ignoring issues with greatest common divisors)

where, morally,

We split this into two parts,
$d b_0 > X^{\delta+\eta}$
and
$db_0 \leq X^{ \delta+\eta}$
. The contribution from the small d is our main term
$M(z_1,z_2)$
referred to in the above.
For the large
$db_0$
, we get

and applying the large sieve for multiplicative characters (Lemma 2.11) we get

by using
$N \ll X^{-\eta} |B|$
. Note that we are applying the large sieve to a very sparse set
$B/b_0 \cap {\mathbb{Z}}$
, which causes a loss in the diagonal terms and we are forced to take
$d b_0$
at least a bit bigger than
$X^\delta$
.
For the small
$db_0$
, we can rewrite the conditions
$b_0 d| \Delta$
,
$a \equiv z_2/z_1 \, (\Delta)$
as
$z_2 \equiv a z_1 \, (b_0 d)$
to get

We now see what precisely is required of the balanced function
$\beta_z^\flat$
: we need

where the modulus may be as large as
$X^{\delta+\eta}$
. This would follow if there were no zeros of
$L(s,\chi)$
with a real part
$>1- C'\log \log X / \log X$
. Since this is not known, we need to construct the approximation
$\beta^\#_{z}$
in a way that takes into account these possible bad characters. By the zero-density estimate (Lemma 2.14) this means that the approximation needs to see
$\ll (\log X)^{O(1)}$
of the characters. For technical reasons (due to the smooth weight
$F_M$
) the approximation also needs to see the distribution of
$\beta_z$
with respect to
$\arg z$
and
$|z|^2$
. Note that in the case that
$B \subseteq q_1 {\mathbb{Z}}$
we always have
$q_1|b_0$
, where
$q_1$
can be as large as
$X^\delta$
.
6.2 An approximation for
$\beta_{\mathfrak{n}}$
For the approximation, it is convenient to use a rough finer-than-dyadic partition of unity instead of § 2.1, so that the different parts do not overlap. Let
$\nu_2= X^{-\eta_2}$
, and let

so that

We can, of course, choose
$\nu_2$
so that
$2=(1+\nu_2)^k$
for some
$k \asymp \nu_2^{-1}$
.
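The adjustment of $\nu_2$ can be made concrete as follows. This Python sketch takes a target value of $\nu_2$, picks the smallest k for which the adjusted step does not exceed the target, and sets $\nu_2 = 2^{1/k}-1$, so that the cells $(N(1+\nu_2)^j, N(1+\nu_2)^{j+1}]$ with $0 \leq j < k$ tile $(N,2N]$ exactly. The function names and the sample values are hypothetical.

```python
import math


def adjust_nu(nu_target: float):
    """Return (nu, k) with (1 + nu)**k == 2 up to rounding and nu <= nu_target."""
    k = math.ceil(math.log(2.0) / math.log1p(nu_target))
    nu = 2.0 ** (1.0 / k) - 1.0
    return nu, k


def cell_edges(N: float, nu: float, k: int):
    """Edges N(1+nu)^j, 0 <= j <= k, of the cells partitioning (N, 2N]."""
    return [N * (1.0 + nu) ** j for j in range(k + 1)]


# Sample values (hypothetical): target step 0.01 on the range (1000, 2000].
nu2, k = adjust_nu(0.01)
edges = cell_edges(1000.0, nu2, k)
```

With this choice no cell straddles the dyadic endpoint 2N, so the different parts of the partition do not overlap.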
Let
$\beta_{\mathfrak{n}}$
be Q-regular and let
$\Psi= \{\xi_{k_j}\chi_j\}$
denote the set of
$J \leq (\log X)^{C_2}$
characters, and denote the moduli of the characters by
$u_1,\dots,u_J$
and the primitive characters by
$\chi_1,\dots,\chi_J$
. For any two coefficients
$\alpha,\beta$
, we define their normalized W-rough correlation as

if the denominator is non-zero. We then define the approximation
$\beta_{\mathfrak{n}}^\#$
for
$\beta_{\mathfrak{n}}$
,

and the balanced function,

so that we have a decomposition

Morally, the approximation
$\beta^\#$
can be viewed as a kind of ‘expansion’ with respect to a ‘basis’, which is justified since the functions
$\xi \chi H_{N'}$
are approximately orthogonal over W-rough numbers, as the following lemma shows. For the lemma, recall that all functions of odd Gaussian integers z are extended to
${\mathfrak{n}}$
by considering the primary generator, for instance, we write
$G(\arg {\mathfrak{n}}) = G(\arg z)$
if z is the primary generator of
${\mathfrak{n}}$
.
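For concreteness, the primary generator of an odd ideal can be computed as in the following Python sketch. It assumes the standard normalization $z \equiv 1 \, (2(1+i))$, which for $z = a+ib$ means that b is even and $a+b \equiv 1 \, (4)$; this normalization and the function names are assumptions made for illustration.

```python
def is_primary(a: int, b: int) -> bool:
    """True when a + i b = 1 mod 2(1+i), i.e. b is even and a + b = 1 mod 4."""
    return b % 2 == 0 and (a + b) % 4 == 1


def associates(a: int, b: int):
    """The four unit multiples z, iz, -z, -iz of z = a + i b."""
    return [(a, b), (-b, a), (-a, -b), (b, -a)]


def primary_generator(a: int, b: int):
    """Return the unique primary associate of an odd Gaussian integer a + i b."""
    if (a * a + b * b) % 2 == 0:
        raise ValueError("z must be odd (odd norm)")
    for z in associates(a, b):
        if is_primary(*z):
            return z
    raise AssertionError("unreachable: every odd z has a primary associate")
```

Exactly one of the four associates of an odd z is primary, which is what makes a definition such as $G(\arg {\mathfrak{n}}) = G(\arg z)$ unambiguous.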
Lemma 6.4. Let
$\psi,\chi$
be characters to coprime moduli
$u,u_1$
and let
$\xi=\xi_k$
with
$|k| \ll (\nu_1)^{-2}$
. Let
$N > X^\eta |u|^2|u_1|$
. Then, for any
$C > 0$
, we have

and, for any
$({\mathfrak{m}},u)=1$
,

Proof. We prove the second claim, the first is similar but easier. By applying § 2.1 with a smooth function F with the parameter
$\nu_1$
we split
${\mathcal{N}}{\mathfrak{n}}$
smoothly into finer-than-dyadic intervals. The contribution from the edges of the support of
$H_{N'}$
gives a negligible contribution by trivial bounds. It then suffices to show that for any
$N_1 \sim N$

We let z denote a primary generator of
${\mathfrak{n}}$
. The condition
$(z,\overline{z})=1$
may be dropped with a negligible error term since z is supported on
$(z,P(W))=1$
. We write

and write

For the large v we note that by
$v|P(W)$
there is some factor
$v_0|v$
such that
$|v_0|^2 \in (X^{\eta_1}, X^{\eta_1} W]$
. Thus, by
$\check{G}(-k) \ll \nu_1$

Recall that
$\eta$
is large compared with
$\eta_1$
. Hence, by counting the sum over
$z \equiv 0\, (v_0)$
(using Lemma 2.4 to handle
$\tau(z)^{O(1)}$
) and applying Lemma 2.2, we get

For small v we split into two cases depending on
$\chi\neq 1$
and
$\chi=1$
. For
$\chi \neq 1$
, by writing
$z=uvz'+\alpha$
we have

since
$(uv,u_1)=1$
. Treating the weight

as a smooth weight, we get by the Pólya–Vinogradov bound (Lemma 2.18)

by
$N^{1-\eta} > |u|^2|u_1|$
since
$\eta_1$
is small compared with
$\eta$
.
For
$\chi=1$
, we have by Lemma 2.10 and
$\check{(G\xi_k)}(\ell) =\check{G}(\ell-k) $

Estimating the contribution from
$|\ell| > (\nu_1)^{-3}$
trivially (by Lemma 2.10) and for
$|\ell| \leq (\nu_1)^{-3}$
applying the Poisson summation formula on
${\mathbb{Z}}[i]$
we get

Our main lemma about the approximation is the following, which says that
$\beta^\flat_{\mathfrak{n}}$
is balanced over arithmetic progressions restricting to small polar boxes.
Lemma 6.5. Let
$N> X^\eta Q^3$
. For any
$C_1> 0$
, there is some
$C_2 \ll_{C_1} 1$
such that the following holds. Let
$\xi=\xi_k$
with
$k=(\log N)^{O(1)}$
. Let
$\beta_{\mathfrak{n}}$
be Q-regular, and let
$\beta_{\mathfrak{n}}^\flat$
be as above with
$|\Psi|=J \leq (\log X)^{C_2}$
. Suppose that
$\beta_{\mathfrak{n}}$
is supported on
$({\mathfrak{n}},P(W))=1$
. Let
$N'= N(1+\nu_2)^{j} \in [N,2N]$
and
$\theta \in {\mathbb{R}}/2\pi {\mathbb{Z}}$
. Then, for any
$u \in {\mathbb{Z}}[i]$
with
${\mathcal{M}}(u) \leq Q$
, we have

Proof. Let us first show that in the approximation
$\beta_{\mathfrak{n}}^\#$
we can replace the characters with conductor dividing u by characters with modulus u. Suppose that
$\psi$
is induced by a primitive character
$\psi'$
of modulus
$u' < u$
. Then

Since we have defined the correlation by sums over
$({\mathfrak{n}},P(W))=1$
, the characters agree unless
${\mathcal{N}} ({\mathfrak{n}},(u/u')) > W$
, so that we have

Thus, if the approximation includes a character whose conductor is a proper divisor of u, we may replace it by the character with
$({\mathfrak{n}},u)=1$
at a negligible cost (by using orthogonality of characters). Let us assume that this has been done, so that the moduli of the characters
$\chi_j$
satisfy either
$u_j=u$
or
$(u_j,u)=1$
. Let
$\Psi_u$
denote the characters
$\xi\psi$
modulo u that are equal to
$\xi_{k_j}\chi_{j}$
for some j. By expanding G with Lemma 2.10 we get

Denote

Then by Cauchy–Schwarz (
$(A+B)^2 \leq 2(A^2+B^2)$
) we have

6.2.1 Bounding
$S(\Psi_u)$
.
By the definition of
$\beta^\flat_{\mathfrak{n}}$
, we see that for
$\xi_k \psi = \xi_{k_{j_0}} \chi_{j_0}$
we have

Thus, by Cauchy–Schwarz on j and k (recall that for
$\psi \xi_k \in \Psi$
we have
$|k| \leq (\nu_1)^{-2}$
by Q-regularity) and using
$|\check{G}(k)| \ll \nu_1$
we have

By Lemma 6.4 we get

6.2.2 Bounding
$S(\Psi_u^{\complement})$
.
By Cauchy–Schwarz (
$(A+B)^2 \leq 2(A^2+B^2)$
) we have

with

By the assumption that
$\beta_{\mathfrak{n}}$
is Q-regular we have

For
$S_2(\Psi_u^{\complement})$
we have by Cauchy–Schwarz on j

we write

The contribution from the third sum may be extracted from
$ S_2(\Psi_u^{\complement})$
by Cauchy–Schwarz and bounded by the same argument as with
$ S(\Psi_u)$
. Thus, we are left with bounding

The claim now follows by orthogonality of characters and Lemma 6.4 since
$N^{1-\eta} > Q^3$
and
$|u|,|u_j| \leq Q$
.
Remark 6.6. When constructing the approximation
$\beta^\#_{\mathfrak{n}}$
we have a choice of using either the physical space or the Fourier space. To approximate
$\beta_{{\mathfrak{n}}}$
with respect to arithmetic progressions and sectors of
${\mathbb{Z}}[i]$
we use the Fourier space (i.e. characters
$\chi,\xi$
), whereas to approximate
$\beta_{{\mathfrak{n}}}$
with respect to the size of
${\mathcal{N}} {\mathfrak{n}}$
we use the physical space (i.e. smooth partition
$H_{N'}$
). These are the most convenient choices for using existing zero-density estimates and information about exceptional characters.
6.3 Partitioning the Type II sum
With the approximation for
$\beta_{\mathfrak{n}}$
defined as in § 6.2, we can extract the main term from the Type II sum by writing

By
$(a,b)=1$
, we may restrict to
$(w,\overline{w})=1$
. Since we are working with rough numbers the condition
$(a,b)=1$
may be dropped with a negligible error term. By definition,
$a_{\mathfrak{n}} = \sum_{u \in \{\pm 1,\pm i\}} a_{uz}$
so that we have

The two contributions are bounded by the following two propositions, which together imply Proposition 6.2.
Proposition 6.7. Suppose that the assumptions of Proposition 6.2 hold and let
$\beta_{\mathfrak{n}}^\flat$
be as in § 6.2. Then, for every
$C_1>0$
, there is some
$C_2>0$

Proposition 6.8. Suppose that the assumptions of Proposition 6.2 hold and let
$\beta^\#_{\mathfrak{n}}$
be as in § 6.2. Then, for any
$C>0$
,

Remark 6.9. We want to carry the condition
$(w,\overline{w})=1$
through the application of Cauchy–Schwarz. To see why, consider a situation where
$B \subseteq d_0 {\mathbb{Z}}$
with
$d_0 > X^\eta$
being very smooth so that
$\tau(d_0)$
is larger than any fixed power of
$\log X$
. Let us split the Type II sum according to the g.c.d. of w and
$d_0$
, which gives us

Now in the inner sum
$e| w$
means that
$e|b$
is automatic, so that the density on the inside is bumped up, that is, we morally have

Prior to Cauchy–Schwarz this is not an issue since we still get a convergent sum
$\sum_{e|d_0} e^{-1}$
. However, after applying Cauchy–Schwarz to w, we get

which means that we have picked up a large divisor function
$\tau(d_0)$
. This would be problematic since we can only save a fixed power of
$\log X$
from Lemma 6.5. We resolve this issue by keeping the condition
$(w,\overline{w})=1$
so that w has no non-trivial integer divisors, but there are also other ways to deal with this.
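The gap between the two divisor sums in this remark is easy to see numerically. The Python sketch below takes $d_0$ to be the product of the first k primes (a hypothetical, very smooth choice) and compares $\tau(d_0)=\sum_{e|d_0} 1$ with $\sum_{e|d_0} e^{-1}$; the function names are hypothetical.

```python
from fractions import Fraction


def first_primes(k: int):
    """Return the first k primes by trial division (fine for small k)."""
    primes = []
    n = 2
    while len(primes) < k:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes


def divisor_statistics(primes):
    """For squarefree d0 = product of primes, return (tau(d0), sum over e | d0 of 1/e)."""
    tau = 1
    reciprocal_sum = Fraction(1)
    for p in primes:
        tau *= 2  # each prime doubles the number of divisors
        reciprocal_sum *= 1 + Fraction(1, p)  # Euler product for sum of 1/e
    return tau, reciprocal_sum


primes = first_primes(10)
tau, reciprocal_sum = divisor_statistics(primes)
```

For the first ten primes the divisor function is $2^{10}$ while the reciprocal sum stays below 4; for primorial $d_0$ the latter grows only like $\log\log d_0$, whereas $\tau(d_0)$ exceeds any fixed power of $\log d_0$.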
6.4 Proof of Lemma 6.1
Note that
${\mathcal{M}}(u) \leq Q$
implies that
$|u| \leq Q$
, so that by
$N > X^\eta Q^{3}$
we have
$N > X^{\eta} |u|^{3}$
. Applying Lemma 2.14 with
$\sigma_Q := 1- \frac{C_2' \log \log Q}{ \log Q}$
for some large
$C_2'> 0$
, we get

and let
$\Psi$
be the set of primitive characters
$\xi\psi$
such that
$L(s,\xi\psi)$
has a zero counted in the above with
${\mathcal{M}}(u) \leq Q$
. Recall that we now specify

Then, for Lemma 6.1, we need to show that if
$\Psi_u$
denotes the set of characters modulo u which are induced by
$\Psi$
, then for any
$N' \in [N,2N]$
and any
$C_1>0$
there is some
$C_2>0$
such that

The contribution from the trivial character
$\psi=\psi_0$
is bounded by an argument similar to, but easier than, the one below, using Heath-Brown’s identity and the Vinogradov-strength zero-free region of Coleman [Col90]. We then restrict to
$\psi \neq \psi_0$
. We apply § 2.1 to
${\mathcal{N}}{\mathfrak{n}}$
with
$\nu_1=X^{-\eta_1}$
, using the assumption that
$\eta_2$
is small compared with
$\eta_1$
to replace
${\mathcal{N}} {\mathfrak{n}} \in ( N',N'(1+\nu_2)]$
with a smooth weight. It then suffices to show that for any
$N_1 \sim N$
, we have

The proof strategy is classical so we will be brief. We use the Heath-Brown identity [IK04, (13.58)] with
$K=3$

Let
$F_1$
be as in § 2.1 with
$\nu=1/2$
. Using the Heath-Brown identity and a dyadic decomposition (smooth for the free variable
${\mathfrak{a}}$
), we get sums of Type I

with
$M \ll 2N^{2/3}$
and
$AM \sim N$
and sums of Type II

with
$\alpha_1({\mathfrak{m}}) = 1$
or
$\alpha_1({\mathfrak{m}}) = \mu({\mathcal{N}} {\mathfrak{m}})$
,
$M_1M_2M_3 \sim M$
and

To see this note that we are in the Type I case unless all of the variables
$n_j$
are
$\ll N^{1/3}$
, and in that case we can take the
$M_1$
for the Type II sum to be the range of the largest variable
$m_j,n_j$
, which must be
$\gg N^{1/6}$
.
To show (6.3) it then suffices to show that

For
$S_I$
, we let
$D= X^{\eta_1}$
and write

The contribution from
${\mathcal{N}}{\mathfrak{d}} > D$
is bounded by using Lemma 2.2, after using orthogonality of characters and Lemma 2.10. For
$S_I$
, we then need to bound

with
$D \ll X^{\eta_1}$
and
$M \ll N^{2/3}$
.
Denote

Then, for
$\psi \neq \psi_0$
, we have the standard point-wise bounds

and for
$\alpha_1=1$
or
$\alpha_1=\mu$
with
$M_1 \gg N^{1/6}$
once
$C_2$
is large compared with
$C_1$
, with
$\xi\psi \not \in \Psi_u$
,

The bound (6.4) follows by the Pólya–Vinogradov bound (i.e. the convexity bound for
$L(s,\xi\psi)$
in the u aspect, Lemma 2.18). The bound (6.5) follows by the truncated Perron’s formula and shifting the contour to
$(1+\sigma_Q)/2$
(justified by
$\xi \psi \not \in \Psi_u$
), using the bound Lemma 2.16 for
$1/L(s,\xi \psi)$
, and taking
$C_2'>0$
in the definition of
$\sigma_Q$
sufficiently large.
By Mellin inversion (Lemma 2.9), we get

with

Hence, we have

with

To bound
$J_I$
we apply Cauchy–Schwarz on t, k, (6.4), and orthogonality of characters to get, for some coefficients
$\gamma_{\mathfrak{n}}$
and for some
$t,\xi$
,

By
$N > X^{\eta} |u|^{3} $
,
$M \ll N^{2/3}$
, and
$D =X^{\eta_1}$
we get

since
$\eta_1$
is small compared with
$\eta$
, which is sufficient for bounding
$J_I$
.
For
$J_{II}$
we apply the bound (6.5) for
$M_1$
and Cauchy–Schwarz in the t, k variables to get

By orthogonality of characters and Lemmas 2.10 and 2.9 we get

Let us denote
${\mathfrak{m}}_{2j} = (w_j)$
and
${\mathfrak{m}}_{3j}=(z_{j})$
so that

Writing
$w_2=w_1+u$
,
$z_1=z_2+v$
we get (using Lemma 2.4 to handle
$\tau(w) \tau(z) $
)

using
$M_1 \ll N^{1/3}$
,
$M_1M_2 M_3 \asymp N$
to get
$M_2M_3 \gg N^{2/3} \gg X^{\eta} |u|^2$
.
7. Type II information: proof of Proposition 6.7
7.1 Cauchy–Schwarz
Let
$F_M(m)=F(m/M)$
with a fixed smooth majorant F for the interval [1,2], supported on
$[1/2,3]$
. By applying Cauchy–Schwarz we get

where

It then suffices to show that

Define

Note that typically
$|\Delta| \asymp N$
. We partition the sum into a main term and diagonal terms by writing

where

For
$V(\beta)$
we apply § 2.1 with
$G:{\mathbb{R}}/2\pi{\mathbb{Z}} \to {\mathbb{R}}$
being a non-negative smooth function with the parameter
$\nu_1=X^{-\eta_1}$
to the variables
$\arg z_1$
and
$\arg z_2$
to get

with

and

We now further partition
$V(\beta)$
according to the size of
$|\Delta|$
. Let
$\nu_3=X^{-\eta_3}$
and write

where
$V_{> \nu_3}(\beta)$
is the part where
$|\sin (\theta_1-\theta_2) | > \nu_3$
, which implies (since
$\eta_3$
is small compared with
$\eta_1$
)

We then have the following lemmas, which together imply Proposition 6.7.
Lemma 7.1 (Off-diagonal contribution). For
$|\sin (\theta_1-\theta_2) | > \nu_3$
,

Lemma 7.2 (Diagonal contribution). We have

Lemma 7.3 (Pseudo-diagonal contribution I). We have

Lemma 7.4 (Pseudo-diagonal contribution II). We have

7.2 Proof of Lemma 7.1
It suffices to show that for
$|\theta_1 - \theta_2\pmod{\pi}| > \nu_3$
we have

We note that
$(z_1,z_2)=(z_1,\overline{z_1}) = (z_2,\overline{z_2}) = 1$
implies
$(\Delta,|z_1|^2|z_2|^2)=1$
. By symmetry, this can be seen from

Denoting
${\text{Re}} (\overline{w} z_j ) = b_j$
, we have

Let
$b_0:= (b_1,b_2)$
. Since
$(w,\overline{w})=1$
, we know that
$b_0|\Delta$
. Thus,

Denoting
$b_j= b_0b_j'$
, we have

We now wish to remove the smooth weight
$F_M$
. Recall that already
$b_0b'_j \in [Y,Y + X^{1/2-\eta}]$
by (3.2). We introduce a rough finer-than-dyadic partition for
$|z_1|^2,|z_2|^2$
by using
$H_{N'}$
as in § 6.2 with
$\nu_2=X^{-\eta_2}$
. Let
$N_{1},N_{2}\sim N$
, and denote

To prove (7.2) it then suffices to prove that for
$N_{1},N_{2}\sim N$
and for
$|\sin (\theta_1-\theta_2) | \geq \nu_3$
we have

Using
$|\sin (\theta_1-\theta_2) | > \nu_3$
we have
$|\Delta| = |z_1z_2| \, |\sin (\arg z_2-\arg z_1)| \gg \nu_3 N$
, so that for some constant
$F_M(\boldsymbol{N},\boldsymbol{\theta})$

Therefore, we have

where

Then (7.4) follows from the following two lemmas once
$\eta_3$
is small compared with
$\eta_2$
so that
$\nu_2 \nu_3^{-3}=X^{-\eta_2+3\eta_3} < X^{-\eta_2/2}$
, say.
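For concreteness, the exponent bookkeeping behind this choice can be spelled out as follows. This is a routine check; the explicit threshold
$\eta_2/6$
is derived here and is not quoted from the text.
$$\nu_2\nu_3^{-3} = X^{-\eta_2}\cdot X^{3\eta_3} = X^{-\eta_2+3\eta_3} < X^{-\eta_2/2} \iff 3\eta_3 < \eta_2/2 \iff \eta_3 < \eta_2/6.$$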
Lemma 7.5. For any
$C_1 >0$
, there is some
$C_2>0$
for
$\beta^\flat_{\mathfrak{n}}$
as in § 6.2 such that for
$|\sin (\theta_1-\theta_2)|> \nu_3$
we have

Lemma 7.6. For
$|\sin (\theta_1-\theta_2)| > \nu_3$
, we have

7.2.1 Proof of Lemma 7.5.
The condition
$(w,\overline{w})=1$
has served its purpose so we remove it by expanding

where
$\ell$
runs over integers. Writing
$w=\ell w'$
, we get from (7.3)

We have

To see this, by using
$(b_1',b_2')=1$
and (7.6), we get that for
$j \in \{1,2\}$
,

and (7.7) follows since the left-hand side is an integer and by
$(z_j,\overline{z_j})=1$
the only integer dividing
$z_j$
is 1. Similarly, we also get
$(z_1z_2,\ell) =1$
by (7.6). Thus,

is congruent to a rational integer, which is equivalent to saying that
$\Delta \equiv 0 \, (\ell)$
. Thus, we get

By expanding the congruence
$b'_2 \equiv a b'_1 \, (\ell\Delta/b_0) $
into Dirichlet characters and splitting into primitive characters we get

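The expansion rests on the standard orthogonality relation for Dirichlet characters, recalled here in generic notation (the modulus
$q$
and the residues
$m,n$
are placeholders, not the variables of the proof). For integers with
$(mn,q)=1$
we have
$$\mathbf{1}_{n \equiv m \, (q)} = \frac{1}{\varphi(q)} \sum_{\chi \, (q)} \chi(n)\overline{\chi}(m).$$
Each character modulo
$q$
is induced by a unique primitive character whose conductor divides
$q$
, which is what allows the sum to be regrouped according to primitive characters.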
We separate
$db_0 > X^{\delta+\eta/2}$
and
$ db_0 \leq X^{ \delta +\eta/2}$
, that is, write

For the large
$db_0$
, we use the expansion

to get

We split the sum according to
$a_0=(b_0,\ell)$
which gives us

Recall that by
$|\sin (\theta_1-\theta_2)| \gg \nu_3$
we have
$|\Delta| \gg \nu_3 N$
. For any fixed D with
$\nu_3 N \ll D \ll N$
we have (combining the variables
$z_2\overline{z_1}=z$
, using a divisor bound, and recalling that
$\Delta= {\text{Im}} (z_2\overline{z_1})$
)

which gives us (since
$\eta_3$
is small compared with
$\eta$
)

By a dyadic partition, we get (denoting
$D'=\ell D/(db_0)$
)

By the large sieve for multiplicative characters (Lemma 2.11) we obtain

since
$N \leq X^{-\eta} |B|$
. This is sufficient for Lemma 7.5 since
$\eta_1,\eta_2$
are small compared with
$\eta$
.
To bound the contribution from the small
$d b_0$
recall that

We expand the conditions
$(b_1'b_2',\ell \Delta/(db_0))=1$
by using the Möbius function to get

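The Möbius expansion used in this step is the standard identity, stated here in generic notation (the integers
$n,m$
are placeholders):
$$\mathbf{1}_{(n,m)=1} = \sum_{d \mid (n,m)} \mu(d),$$
which follows from
$\sum_{d \mid k} \mu(d) = \mathbf{1}_{k=1}$
applied to
$k=(n,m)$
.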
We have

Since
$z_1,z_2$
are restricted to polar boxes, we have

Plugging this in, we get

with

Let us first consider
$V'_{\leq}(\beta,\boldsymbol{N},\boldsymbol{\theta})$
. We note that the sums over
$d_1,e_1,e_2,\ell$
converge quickly, so that we expect to be able to bound the contribution from large ranges of
$d_1,e_1,e_2$
by crude estimates. We have

Denoting

we get
$f | \Delta$
, which is equivalent to saying that

for some
$a \, (f)$
. Note that
$|f| \ll N$
. Thus,

We write

where
$E_{\leq}(\beta,\boldsymbol{N}, \boldsymbol{\theta})$
is the part where
$f > X^{\delta +\eta}$
.
For
$f \leq X^{\delta + \eta}$
, we note by
$(z_1,z_2)=1$
we have
$(z_1z_2,f)=1$
. By dropping the condition
$(z_1,z_2)=1$
, we get

with

and

By expanding
$z_2 \equiv a z_1 \, (f)$
with Dirichlet characters we have

By applying Cauchy–Schwarz on
$\psi$
and using Lemma 6.5 we get

By Lemma 2.17 (with the argument
$ab'_1/b'_2$
) and Lemma 2.1 this is bounded by

Using

by Lemma 2.1 and (7.8), we get

which is sufficient.
The argument for bounding
$ Z_{ \leq}(\beta,\boldsymbol{N}, \boldsymbol{\theta}) $
from (7.11) is the same except that, instead of expanding into Hecke characters and applying Lemma 6.5, we use the trivial bound

where the last bound holds since
$N > X^{3\delta+3\eta}$
and
$f^2 \leq X^{2\delta+2\eta}$
. This gives us

which suffices for Lemma 7.5.
For
$E_{\leq}(\beta,\boldsymbol{N},\boldsymbol{\theta})$
from (7.10) with large f we need a slightly different argument, since the modulus f can be as large as N, which would make
$f^2$
much bigger than N. We write

By using
$f > X^{\delta + \eta}$
and
$db_0 \leq X^{\delta + \eta/2}$
we get
$f^{-1} \leq X^{-\eta/2} (b_0 d)^{-1}$
, which gives us

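The inequality for
$f^{-1}$
used here is a one-line consequence of the two size conditions; written out as a routine check:
$$f^{-1} < X^{-\delta-\eta} = X^{-\eta/2}\cdot X^{-\delta-\eta/2} \leq X^{-\eta/2}\,(db_0)^{-1},$$
where the first step uses
$f > X^{\delta+\eta}$
and the last step uses
$db_0 \leq X^{\delta+\eta/2}$
.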
By using the expansion

Cauchy–Schwarz, orthogonality of characters, and a divisor bound, we have

where the last bound follows by symmetry. Hence, we have

The contribution from
$b_1'=b_1$
is bounded by

The contribution from
$b_1'\neq b_1$
is bounded by

by applying the divisor bound
$\tau(n) \ll_{\varepsilon} n^{\varepsilon}$
. Therefore, by (7.8) we get (recall that
$\nu_3$
is small compared with
$\nu$
)

Finally, the error term
$W_{\leq }(\beta,\boldsymbol{N},\boldsymbol{\theta})$
from (7.9) is bounded by exactly the same argument as above except that in the part
$f \leq X^{\delta+\eta}$
we use the trivial estimate

instead of expanding into Hecke characters and applying Lemma 6.5.
7.2.2 Proof of Lemma 7.6.
The argument is exactly the same as in § 7.2.1, except that in the part
$f \leq X^{\delta+\eta}$
we use the trivial estimate

instead of expanding into Hecke characters and applying Lemma 6.5.
7.3 Proof of Lemma 7.2
Since
$\Delta=0$
, we have

Multiplying both sides by
$\overline{w}$
and taking the imaginary parts we find

Hence, we get

by using the divisor bound
$\tau(\ell) \ll_{\varepsilon} \ell^{\varepsilon}$
.
7.4 Proof of Lemma 7.3
Since
$(z_1 z_2, P(W)) =1$
, having
$(z_1,z_2) > 1$
implies
$|(z_1,z_2)|^2 \geq W$
. Let
$z_0 = (z_1,z_2)$
and
$z_j=z_0z_j' $
. Denoting
$w_0=\overline{z_0} w$
and
$\Delta'={\text{Im}}(\overline{z_1'} z_2')$
, we have

which implies that

so that
$w_0$
is fixed once we fix
$z_j',b_j$
. Furthermore, we have

Note also that
$(w_0,\overline{w_0})=1$
implies that
$(b_1,b_2)=b_0 | \Delta'$
. Then denoting
$b_j=b_0b_j'$
and
$a'=z_2'/z_1'$
we have

Thus, denoting

it suffices to show that

We apply an argument similar to that in § 7.2.1, except that certain parts will be easier by positivity. By expanding into Dirichlet characters, we get

We split into the parts
$db_0 \leq X^{\delta+\eta/2}$
and
$db_0 > X^{\delta+\eta/2}$

For large
$db_0$
we expand the condition
$(b_1',b_2')=1$
to get

By using the estimate

we get (denoting
$D=D' \,db_0$
)

By applying the multiplicative large sieve (Lemma 2.11) as in § 7.2.1, we get

which suffices for (7.13).
For small
$db_0$
, using Lemma 2.17 we write

We note that
$z'_2 \equiv a z'_1 \, (db_0)$
implies that for
$z=z_2\overline{z_1}$
, we have
$z \equiv \overline{z} \, (db_0)$
, that is, denoting
$z=r+is$
, we get
$s\equiv 0 \, (db_0)$
. Thus, using the divisor bound

we get

where the last step follows from
$N > X^{3\delta+3\eta}$
and
$d b_0\leq X^{\delta+\eta}$
. Therefore, we get by Lemma 2.1,

which completes the proof of (7.13).
7.5 Proof of Lemma 7.4
By recombining the finer-than-dyadic partitions for
$\theta_1,\theta_2$
, we get

with

As in § 7.2, we write

Using
$\Delta \ll \nu_2 N$
and
$MN \sim X$
we see that the smooth weight
$F_M$
restricts
$z_2b_1-z_1b_2$
to a small disc, that is,

Now recall that by (3.2) we have
$B \subseteq [Y,Y+X^{1/2-\eta}]$
for some
$Y \asymp X^{1/2}$
. Hence, we obtain

This can now be bounded by the same argument as in § 7.4, replacing the bounds (7.14) and (7.15) by

respectively, and

We get

which gives us Lemma 7.4.
Remark 7.7. Without the assumption
$B \subset [\eta_1 X^{1/2},(2-\eta)X^{1/2}]$
we would need to take
$\nu_3=X^{-\delta-{\varepsilon}}$
to get savings in this argument, since it is possible that
$Y \asymp X^{1/2-\delta}$
. This means that in the approximation for
$\beta_{\mathfrak{n}}$
we need to track the distribution of
$\beta_{\mathfrak{n}}$
in sectors with an angle
$X^{-\delta-\eta}$
. It is possible to do so by a more careful argument but we do not pursue this issue here. For this, we also note that the smooth weight

could be handled more efficiently in terms of the dependency on
$\arg z_j$
since it is a function of the difference
$\arg z_2 -\arg z_1$
, so that we need an expansion to
$\xi_k$
only once instead of twice. Note that B cannot simultaneously consist of multiples of a large fixed q and be restricted to a narrow interval. Thus, for this extension the condition
${\mathcal{M}}(u) \leq X^{\delta+\eta}$
ought to be replaced by
$|k| {\mathcal{M}}(u) \leq X^{\delta+\eta}$
.
8. Type II information: proof of Proposition 6.8
Recall that we are trying to evaluate

The condition
$({\mathfrak{n}}, \overline{{\mathfrak{n}}})=1$
may be dropped since we have
$(z,\overline{z})=1$
in the definition of
$a_z$
. We have

Here, for any
$C>0$
,

Then Proposition 6.8 follows by applying the fundamental lemma of the sieve (Lemma 2.5, see also Remark 2.6) and Proposition 4.1 to handle
$\mathbf{1}_{({\mathfrak{n}}, P(W))=1}$
in
$S_j$
. Note that by
$N > X^{3\delta+3\eta}$
and
${\mathcal{M}}(u_j) \leq Q \leq X^{\delta+\eta}$
, we get that
${\mathcal{N}}{\mathfrak{m}} \ll X^{1-\delta-\eta}/{\mathcal{M}}(u_j) ^2$
, which is required for Proposition 4.1. This gives us a main term of the form

The weight
$\frac{H_{N'} ({\mathfrak{a}}/{\mathfrak{m}})}{\nu_2 }$
may be replaced (with a negligible error term) by
$\frac{F({\mathcal{N}}{\mathfrak{m}}{\mathfrak{n}}/{\mathcal{N}} {\mathfrak{a}})}{\widehat{F}(0) } $
by a further application of Poisson summation (Lemma 2.7) on the free variable a in
$z=b+ia$
, completing the proof.
Remark 8.1. There are two potential bottlenecks for improving the range of
$\delta < 1/10$
in Theorem 3.3, namely, the exponent 3 in Lemma 6.1 and the exponent 2 in
$X^{1-\delta-\eta}/q^2$
in Proposition 4.1. It is plausible that with more work these exponents may be improved to 2 and 1, respectively, which would suffice to prove Theorem 3.3 for
$\delta < 1/8$
. Both of these improvements run into quite delicate issues and we have decided not to pursue this here. It is not clear whether the boundary
$1/8$
can be improved, but we certainly hit a hard barrier at
$\delta =1/6$
as this is when even the most optimistic Type II range
$[N^{2\delta}, N^{1/2-\delta}]$
becomes empty.
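The location of this barrier follows from comparing the two endpoint exponents of the range quoted above; as a short check,
$$2\delta \leq \tfrac{1}{2}-\delta \iff 3\delta \leq \tfrac{1}{2} \iff \delta \leq \tfrac{1}{6},$$
so the range degenerates to a single point at
$\delta=1/6$
and is empty for any larger
$\delta$
.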
9. Proof of Theorem 3.3
We apply a sieve argument to the sequence
${\mathcal{A}}=(a_n)$
over integers

where for convenience we have split n into finer-than-dyadic intervals (as in § 2.1) with
$\nu=(\log X)^{-C}$
and
$X'\sim X$
. We also define an auxiliary sequence
${\mathcal{B}}_j$
by

denoting
$\xi_{k_0}\chi_0 = 1$
for
$j=0$
. Then Theorem 3.3 follows by using the explicit formula to evaluate the sums

once we prove that for any
$C_1>0$
, there is some
$C_2 >0$
such that

Let
$Y=X^{3 \delta + 4\eta}$
and
$Z=X^{1/2 + \delta+\eta}$
. Then
$YZ \leq X^{1-\delta-\eta}$
for some
$\eta >0$
by
$\delta <1/10$
. By Vaughan’s identity [Reference VaughanVau75] for
$n > Y $
we have

Applying this with both sides multiplied by
$\mathbf{1}_{(n,P(W))=1}$
we get

and, similarly, for
$j \leq J$
write

The sums
$S_1,S_2$
correspond to Type I sums and
$S_3$
is a Type II sum, with all variables coprime to P(W).
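The constraint on
$Y$
and
$Z$
stated at the start of this argument can be verified by adding exponents. The explicit bound on
$\eta$
below is derived here as an illustration and is not quoted from the text:
$$YZ = X^{3\delta+4\eta}\cdot X^{1/2+\delta+\eta} = X^{1/2+4\delta+5\eta} \leq X^{1-\delta-\eta} \iff 5\delta+6\eta \leq \tfrac{1}{2} \iff \eta \leq \frac{1-10\delta}{12},$$
and an admissible
$\eta>0$
exists precisely when
$\delta<1/10$
.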
By the fundamental lemma of the sieve (Lemma 2.5) and Type I information (Proposition 4.1), for
$k=1,2$
we get

since for
$j \geq 1$
,
$k=1,2$
for any
$C > 0$
,

By Type II information (Proposition 6.2, note that
$Y < b \ll X/Z$
) we get

since for any
$C > 0$
,

By recombining Vaughan’s identity for
${\mathcal{B}}_j$
we get (9.1).
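As a sanity check on the kind of decomposition used in this section, the following Python sketch numerically verifies the textbook form of Vaughan's identity for small n. The cut-offs `U`, `V` and all function names are illustrative assumptions; the sketch does not reproduce the paper's parameters Y, Z, the coprimality with P(W), or the exact arrangement into the sums
$S_1,S_2,S_3$
.

```python
from math import log


def mobius_table(limit):
    """Return mu[0..limit] for the Moebius function (mu[0] is set to 0)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            # Flip the sign once for every prime factor p of m.
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            # Integers divisible by p^2 are not squarefree.
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
    return mu


def von_mangoldt_table(limit):
    """Return lam[0..limit] with lam[p^k] = log p and 0 elsewhere."""
    lam = [0.0] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(2 * p, limit + 1, p):
                is_prime[m] = False
            q = p
            while q <= limit:
                lam[q] = log(p)
                q *= p
    return lam


def vaughan_rhs(n, U, V, mu, lam):
    """Right-hand side of the textbook Vaughan identity, valid for n > U:

    Lambda(n) =   sum_{b | n, b <= V} mu(b) log(n/b)
                - sum_{bc | n, b <= V, c <= U} mu(b) Lambda(c)
                + sum_{bc | n, b > V, c > U} mu(b) Lambda(c).
    """
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    # First Type I sum: smooth weight log(n/b) with a short variable b.
    type1_log = sum(mu[b] * log(n // b) for b in divisors if b <= V)
    # Second Type I sum: both b and c are short, the cofactor is free.
    type1_short = sum(
        mu[b] * lam[c]
        for b in divisors if b <= V
        for c in divisors if c <= U and (n // b) % c == 0
    )
    # Type II sum: both b and c are long.
    type2 = sum(
        mu[b] * lam[c]
        for b in divisors if b > V
        for c in divisors if c > U and (n // b) % c == 0
    )
    return type1_log - type1_short + type2


if __name__ == "__main__":
    LIMIT, U, V = 300, 5, 7  # hypothetical small cut-offs for the check
    mu = mobius_table(LIMIT)
    lam = von_mangoldt_table(LIMIT)
    worst = max(
        abs(vaughan_rhs(n, U, V, mu, lam) - lam[n])
        for n in range(U + 1, LIMIT + 1)
    )
    print("largest discrepancy:", worst)
```

The check is exact up to floating-point rounding in the logarithms, so the reported discrepancy should be at the level of machine precision.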
10. Proof of Theorem 3.2
We have two cases: either there are no zeros
$\beta > 1-{\varepsilon}_1/\log X$
, or there is a zero
$\beta > 1-{\varepsilon}_1/\log X$
and
$\Omega(B_1) \leq \Omega(B)/2$
.
10.1 No zeros
$\beta > 1-{\varepsilon}_1/\log X$
Let us denote by
$a_{\mathfrak{n}}, a^{\omega}_{\mathfrak{n}}$
the sequences corresponding to
$\lambda=\mathbf{1}_{B}$
. By Theorem 3.3 we have

If there is a zero
$\beta_1 \geq 1-\frac{1}{\sqrt{\delta} \log X}$
as in Lemma 2.15 corresponding to
$\chi_1 $
real and
$\xi_{k_1}=1$
, then
$\beta_1 \leq 1-{\varepsilon}_1/\log X$
and the contribution from that zero is

since
${\varepsilon}_1 < 1/10$
. Therefore, the contribution from the first two terms in (10.1) is

Denoting
$\sigma_0=1-\frac{1}{\sqrt{\delta} \log X}$
, by Lemma 2.15 the remaining zeros satisfy
$\beta_j\leq \sigma_0$
and they contribute at most

The last term is negligible once
$\delta,\eta$
are sufficiently small. By Lemma 2.14 the integral is bounded by

once
$\delta,\eta$
are sufficiently small compared with the constant
$c_2$
in Lemma 2.14. Combining all of the above estimates we have

once
$\delta$
is small enough compared with
${\varepsilon}_1$
.
10.2 Zero
$\beta > 1-{\varepsilon}_1/\log X$
and
$\Omega(B_1) \leq \Omega(B)/2$
Let us call
$B_2= B \setminus B_1 $
so that
$\Omega(B_2) \geq \Omega(B)/2$
and for all
$b \in B_2$
, we have

Let us denote by
$a_{\mathfrak{n}}, a^{\omega}_{\mathfrak{n}}$
the sequences corresponding to
$\lambda=\mathbf{1}_{B}$
and
$a^{(2)}_{\mathfrak{n}},a^{(2) \omega}_{\mathfrak{n}}$
the sequences corresponding to
$\lambda=\mathbf{1}_{B_2}$
. Then, by Theorem 3.3, we have

The first term contributes

The contribution from
$2 \leq j \leq J$
is handled as in § 10.1, as is the contribution for
$j=1$
from the zeros
$\beta \leq 1-{\varepsilon}_1/\log X$
. The contribution from the Siegel zero
$\beta > 1-{\varepsilon}_1/\log X$
for
$j=1$
is essentially positive, since
$\xi_{k_1}=1$
,
$\chi_1$
is real, and by (10.2)

since
${\varepsilon}_1 < 1/10$
. Therefore, we conclude that also in the second case

once
$\delta$
is sufficiently small.
11. Proof of Theorem 1.6
By reductions similar to those in § 3 it suffices to consider
$\lambda_b= \mathbf{1}_B(b)$
. The goal is to show that if
$u_j$
is a modulus of one of the characters
$\xi_{k_j} \chi_j$
and b does not have a large common factor with
$u_j$
, then the sum over the free variable a of
$\xi_{k_j} \chi_j(b+ia)$
exhibits cancellation. By Theorem 3.3 and the Siegel–Walfisz bound [Reference Friedlander and IwaniecFI98b, Lemma 16.1] for small moduli
$u_j$
, it suffices to show that for any
$j\leq J_1$
and
$Y \in [X^{1/2-\eta},2X^{1/2}]$
and any
$|u_j|^2 \gg (\log X)^{C''}$
with $C''$ large compared with $C'$ we have

Note that the weight
$\xi_j((b+ia))$
has been removed by splitting a into finer-than-dyadic intervals and using (12) to note that then
$b+ia$
lives in a small box.
We write

For
$|v| > |u_j|/(\log X)^{C'/2}$
we use the assumption (1.1) to get

once $C'$ is large compared with
$C_1$
and C. For
$|v| \leq |u_j|/(\log X)^{C'/2}$
we use Lemma 2.20 to get

by taking $C'$ large compared with
$C_1$
and C. To evaluate the main term we note that

12. Proof of Theorem 1.10
This follows immediately from Theorem 3.3 with the zero-density estimate Lemma 2.14 once $C'$ is sufficiently large, via arguments similar to those in § 10.1.
Acknowledgements
I am grateful to Akshat Mudgal for numerous discussions and to Lasse Grimmelt for suggestions on constructing an approximation for primes, as well as to James Maynard for encouragement and helpful comments. I also wish to thank the anonymous referee for comments.
Conflicts of interest
None.
Financial support
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement number 851318).
Journal information
Compositio Mathematica is owned by the Foundation Compositio Mathematica and published by the London Mathematical Society in partnership with Cambridge University Press. All surplus income from the publication of Compositio Mathematica is returned to mathematics and higher education through the charitable activities of the Foundation, the London Mathematical Society and Cambridge University Press.