1. Introduction
What are the typical gaps between the prime divisors of a randomly selected integer? For $m\in \mathbb{N}$ , we let $\omega(m)$ be the number of distinct prime divisors of m and $p_i(m)$ be the i-th smallest prime divisor of m, so that
is a finite sequence that depends on m. It is not difficult to show that for almost all m and almost all $1\leqslant i\leqslant \omega(m) $ , one has $\log \log p_i (m) \sim i $ ; hence, $\log \log p_{i+1}(m) -\log \log p_i (m) $ is typically bounded. A natural question is to count the number of gaps exceeding a fixed constant $z\geqslant 0 $ , i.e. estimate
Erdős [Reference Erdős6, p. 534] was the first to study this question. He showed that for almost all m, the function $\omega_z(m)$ is well-approximated by $\textrm{e}^{-z}\omega(m) $ by proving an upper bound for the second moment:
However, it turns out that this is not of the right order of magnitude. Here, we prove asymptotics not just for the second moment, but for all moments:
Theorem 1.1. Fix any $ z \geqslant 0 $ and $r \geqslant 0 $ . Then
where $\mu_r$ is the r-th moment of the standard normal distribution.
As a consequence, for all $\alpha<\beta \in \mathbb{R}$ one has
Setting $z=0$, we recover the much celebrated Erdős–Kac theorem [Reference Erdős and Kac7]. Our method is different from that of Erdős [Reference Erdős6] in that it relies on Stein’s method for normal approximation [Reference Stein18]. This allows us to deal with certain sums of dependent random variables that arise when modelling $\omega_{z}(m)$. Stein’s method has rarely been used in number theory; one example is the work of Harper [Reference Harper11].
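Since the displayed formulas of Theorem 1.1 are not reproduced above, the following is a hedged sketch of the shape such a statement takes. It is assembled from the centring $\textrm{e}^{-z}\log\log$ mentioned in Remark 1.2 and the variance constant $(1-2z\textrm{e}^{-z})\textrm{e}^{-z}$ appearing in Lemma 2.10; the averaging variable $n$ and the precise form of the limit are assumptions rather than a quotation.

```latex
% Sketch only: normalising constants taken from Remark 1.2 and Lemma 2.10.
\[
\frac{1}{n}\sum_{m\leqslant n}
\left(
\frac{\omega_z(m)-\mathrm{e}^{-z}\log\log n}
     {\sqrt{(1-2z\mathrm{e}^{-z})\,\mathrm{e}^{-z}\log\log n}}
\right)^{r}
=\mu_r+o(1)
\qquad (n\to\infty),
\]
% and, as a consequence, for all real \alpha<\beta:
\[
\frac{1}{n}\,\sharp\left\{m\leqslant n:\
\alpha<\frac{\omega_z(m)-\mathrm{e}^{-z}\log\log n}
     {\sqrt{(1-2z\mathrm{e}^{-z})\,\mathrm{e}^{-z}\log\log n}}\leqslant\beta\right\}
\longrightarrow\frac{1}{\sqrt{2\pi}}\int_\alpha^\beta \mathrm{e}^{-t^2/2}\,\mathrm{d}t .
\]
```

Two sanity checks on this shape: at $z=0$ both constants equal 1, which is the familiar Erdős–Kac normalisation, and $2z\textrm{e}^{-z}\leqslant 2/\textrm{e}<1$ for all $z\geqslant 0$, so the quantity under the square root is positive.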
There are many generalisations of the Erdős–Kac theorem to functions of the form $\sum_{p\mid m } g(p)$ but they do not cover $\omega_z(m)$, as g(p) would have to be a function of m as well. Galambos [Reference Galambos9, Theorem 2] studied the values of a function that is somewhat related to our $\omega_z$, namely the number of $i < \omega(m)$ for which $\log \log p_{i+1}(m) -\log \log p_i (m)> z +\log \log \log m $. His results and method are rather different as they are suited to values of large gaps, while our result relates to small gaps. A function similar to Galambos’ occurs in the recent work of Chan–Koymans–Milovic–Pagano [Reference Chan, Koymans, Milovic and Pagano2, Section 4] on the negative Pell equation.
Remark 1.2. At the cost of a non-self-contained argument, the number theoretic part of the proof of Theorem 1.1 (namely, Lemma 2.9) can be alternatively verified via the Kubilius model [Reference Elliott5, Section 12]. The approximation of $\omega_z(m) $ by $ \textrm{e}^{-z}\log \log m $ means that the gaps in the sequence $\{\log \log p_i(m)\}_{i\geqslant 1 }$ are Poissonian. It is worth mentioning that the occurrence of the Poisson distribution in other areas of Probabilistic Number Theory is not uncommon; see the work of de Koninck–Galambos [Reference de Koninck and Galambos3], Harper [Reference Harper11], Granville [Reference Granville10] and Kowalski–Nikeghbali [Reference Kowalski and Nikeghbali12], for example.
Remark 1.3 (Further developments). The interested reader may wonder whether one can use tools from analysis to make explicit the term o(1) in Theorem 1.1. In the case of the Erdős–Kac theorem, this was done by Rényi and Turán [Reference Rényi and Turán15] using complex analysis. After seeing the first version of this paper on arXiv, R. de la Bretèche and G. Tenenbaum proved an explicit error term using methods quite different from ours (namely, Fourier analysis); see their preprint [Reference de la Bretèche and Tenenbaum4] for details.
1.1. Generalisations in Diophantine geometry
In Section 3, we provide a generalisation of Theorem 1.1, given by Theorem 3.2. In brief terms, it states that the gaps between primes p for which a typical variety over $\mathbb{Q}$ has no $\mathbb{Q}_p$ -points obey the Poisson distribution. A statement analogous to the Erdős–Kac theorem was proved by Loughran–Sofos [Reference Loughran and Sofos14] by using geometric input from the work of Loughran–Smeets [Reference Loughran and Smeets13].
2. The proof of Theorem 1.1
2.1. Defining the model
The letter z will denote a fixed non-negative real number throughout Section 2. As usual, we denote $\exp\!(z)\;:\!=\;\textrm{e}^z$ . For a prime p and a positive integer m, we define
In particular, $\omega_z(m) = \sum_{p} \delta_{p,z}(m)$ , where the sum is over all primes. Our plan, initially, is to follow the Kubilius model idea (see Billingsley [Reference Billingsley1, equations (1.8),(1.9)]) to define Bernoulli random variables $B_p$ that model the behaviour of $\delta_{p,z}$ . For this, we use the random variables $X_p$ as follows: for every prime p the random variable $X_p$ is defined so that
and such that $X_p$ are independent. In particular, the mean $E[X_p]$ equals $1/p$ , thus, $X_p=1$ models the event that a random integer m is divisible by a fixed prime p. Let $[\,\cdot\,]$ denote the integer part. The independence of $X_p$ is related to the Chinese Remainder Theorem.
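The link between independence and the Chinese Remainder Theorem can be checked by hand on a full period. The following is a minimal sketch, not part of the paper: the helper name `density` is hypothetical, and the choice $n=30$ is only an illustration.

```python
from fractions import Fraction

def density(n, divisible_by=(), coprime_to=()):
    """Proportion of m in [1, n] divisible by every prime in `divisible_by`
    and by no prime in `coprime_to` (hypothetical helper for illustration)."""
    hits = sum(
        1
        for m in range(1, n + 1)
        if all(m % p == 0 for p in divisible_by)
        and all(m % q != 0 for q in coprime_to)
    )
    return Fraction(hits, n)

n = 2 * 3 * 5  # a full period modulo the primes involved
print(density(n, divisible_by=(2,)))                   # 1/2
print(density(n, divisible_by=(2, 3)))                 # 1/6 = (1/2)(1/3)
print(density(n, divisible_by=(2,), coprime_to=(3,)))  # 1/3 = (1/2)(1 - 1/3)
```

When $n$ is a multiple of the primes involved, the densities factor exactly as the independent variables $X_p$ predict; for general $n$ the factorisation holds only up to an error, which is what the sieve argument of Section 2.3 controls.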
To model $\delta_{p,z}$ , we must also take into account the fact that each prime q in the range $(p, p^{\exp\!(z)}]$ must not divide m. Thus, we are naturally led to define
We will later prove that $\sum_pB_p$ is a good model for $\omega_z=\sum_p\delta_{p,z}$ in the sense that their moments agree asymptotically.
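To make the function being modelled concrete, here is a small sketch that evaluates $\omega_z(m)$ for a single integer. The displayed definition of $\delta_{p,z}$ is not reproduced above, so the code assumes the description in the text: $\delta_{p,z}(m)=1$ exactly when $p\mid m$ and no prime $q\in(p,p^{\exp(z)}]$ divides m. Under this assumption the largest prime divisor always contributes. The function names are hypothetical.

```python
import math

def prime_divisors(m):
    """Distinct prime divisors of m in increasing order (trial division)."""
    primes, d = [], 2
    while d * d <= m:
        if m % d == 0:
            primes.append(d)
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        primes.append(m)
    return primes

def loglog_gaps(m):
    """Gaps log log p_{i+1}(m) - log log p_i(m) between consecutive prime divisors."""
    ps = prime_divisors(m)
    return [math.log(math.log(q)) - math.log(math.log(p)) for p, q in zip(ps, ps[1:])]

def omega_z(m, z):
    """Sum over p | m of the assumed indicator delta_{p,z}(m)."""
    if not prime_divisors(m):
        return 0
    # For consecutive prime divisors p < q of m, the condition q > p^{exp(z)}
    # is equivalent to log log q - log log p > z; the largest prime divisor
    # has no larger prime divisor at all, hence the leading 1.
    return 1 + sum(1 for gap in loglog_gaps(m) if gap > z)

print(loglog_gaps(210))    # gaps for 210 = 2 * 3 * 5 * 7
print(omega_z(210, 0.0))   # every prime divisor counts when z = 0
print(omega_z(210, 0.4))
```

With $z=0$ the function reduces to $\omega(m)$, in line with the remark that Theorem 1.1 contains the Erdős–Kac theorem.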
Remark 2.1 (Independence break-down). Definition (2.1) leads to a major difference between this paper and the proofs of the Erdős–Kac theorem, namely, the variables $B_p$ are dependent. Indeed, for all primes $p<q$ with $q\leqslant p^{\exp\!(z)}$ , the quantity $E[B_p B_{q}] $ vanishes while none of $E[B_p] , E[ B_{q}] $ does.
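The dependence described in Remark 2.1 can be verified exactly on a toy instance. The sketch below assumes that Definition (2.1), whose display is not reproduced above, reads $B_p=X_p\prod_{p<q\leqslant p^{\exp(z)}}(1-X_q)$, as the preceding paragraph suggests. It takes $z=0.5$ and the primes 2, 3, 5, which suffice to realise $B_2$ and $B_3$; all names are hypothetical.

```python
from fractions import Fraction
from itertools import product
import math

PRIMES = [2, 3, 5]  # enough to realise B_2 and B_3 when z = 0.5 (B_5 is not used)
Z = 0.5

def window(p, z=Z):
    """Primes q in PRIMES with p < q <= p^{exp(z)}."""
    bound = p ** math.exp(z)
    return [q for q in PRIMES if p < q <= bound]

def B(p, x):
    """Assumed model variable: X_p times the product of (1 - X_q) over the window of p."""
    value = x[p]
    for q in window(p):
        value *= 1 - x[q]
    return value

def expectation(f):
    """Exact expectation over independent X_q with P(X_q = 1) = 1/q."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(PRIMES)):
        x = dict(zip(PRIMES, bits))
        weight = Fraction(1)
        for q, bit in x.items():
            weight *= Fraction(1, q) if bit else Fraction(q - 1, q)
        total += weight * f(x)
    return total

e2 = expectation(lambda x: B(2, x))             # 1/3
e3 = expectation(lambda x: B(3, x))             # 4/15
e23 = expectation(lambda x: B(2, x) * B(3, x))  # 0, since X_3 (1 - X_3) = 0
print(e2, e3, e23, e23 - e2 * e3)
```

Here $3\leqslant 2^{\exp(0.5)}$, so the product $B_2B_3$ contains the factor $X_3(1-X_3)=0$ and the covariance equals $-E[B_2]E[B_3]$, which is nonzero.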
2.2. Distribution and moments of the model via Stein’s method
For any positive N, we define
and denote its expectation and variance, respectively, by
Our goal in this section is to prove that $(S_N-c_N)/s_N$ converges in law to the standard normal distribution as $N\to \infty $ and that its moments are asymptotically Gaussian. This will be done, respectively, in Propositions 2.5 and 2.7. We first need a few preparatory estimates.
Lemma 2.2. We have
and
Proof. Recall that Mertens’ theorem states that $\sum_{p\leqslant T} 1/p= \log \log T+c+O(1/\log T)$ for some constant c. The independence of $X_p$ yields
which, by the approximation $1-\varepsilon=\exp\!(\!-\!\varepsilon+O(\varepsilon^2)) $ for $| \varepsilon| \leqslant 1/2 $ and Mertens’ theorem is
Since $\exp\!\left(O(1/\log p )\right)=1+O(1/\log p)$ , this is sufficient for (2.2). The estimate (2.3) is directly deduced from it and the fact that $\sum_p \!(p \log p)^{-1} $ converges. Next, denoting $h_p=E[B_p]$ we have
First note that $ E\!\left[(B_p-h_p)^2\right]= E\!\left[B_p \right]-h_p^2=h_p(1-h_p) $ . Further, if $q>p^{\exp\!(z)}$ then $B_p$ and $B_q$ are independent, hence, $E\!\left[(B_p-h_p)(B_q-h_q) \right]=0$ . If $p<q\leqslant p^{\exp\!(z)}$ then $E[B_p B_q]$ vanishes, hence
We obtain
By (2.2) we have $h_p\ll 1/p$ , hence, $\sum_{p } h_p^2 =O(1)$ and
Hence, (2.3) gives
Using (2.2) we see that
which, by Mertens’ theorem and $\sum_{q>t}(q\log q)^{-1} \ll (\!\log t )^{-1}$ , equals
Injecting this into (2.5) concludes the proof.
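The quantities in Lemma 2.2 can be computed numerically for small N. The sketch below assumes the model $B_p=X_p\prod_{p<q\leqslant p^{\exp(z)}}(1-X_q)$ with $P(X_p=1)=1/p$, and uses the covariance structure from the proof: $E[(B_p-h_p)(B_q-h_q)]$ equals $-h_ph_q$ when $p<q\leqslant p^{\exp(z)}$ and 0 when $q>p^{\exp(z)}$. The function names are hypothetical, and the comparison values $\textrm{e}^{-z}\log\log N$ and $(1-2z\textrm{e}^{-z})\textrm{e}^{-z}\log\log N$ are the leading terms suggested by Lemma 2.10, each correct only up to a bounded error.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

def h(p, z, primes):
    """E[B_p] = (1/p) * prod over primes p < q <= p^{exp(z)} of (1 - 1/q)."""
    bound = p ** math.exp(z)
    value = 1.0 / p
    for q in primes:
        if p < q <= bound:
            value *= 1.0 - 1.0 / q
    return value

def mean_and_variance(N, z):
    """Mean and variance of S_N = sum of B_p over primes p <= N."""
    primes = primes_up_to(int(N ** math.exp(z)))  # windows reach up to N^{exp(z)}
    small = [p for p in primes if p <= N]
    hs = {p: h(p, z, primes) for p in small}
    mean = sum(hs.values())
    var = sum(v * (1 - v) for v in hs.values())
    for p in small:
        for q in small:
            if p < q <= p ** math.exp(z):
                var -= 2 * hs[p] * hs[q]  # covariance of B_p and B_q is -h_p h_q
    return mean, var

z = 0.5
for N in (100, 1000):
    mean, var = mean_and_variance(N, z)
    lead = math.exp(-z) * math.log(math.log(N))
    print(N, mean, lead, var, (1 - 2 * z * math.exp(-z)) * lead)
```

Because $\log\log N$ grows so slowly, the bounded error terms remain visible at any N one can realistically tabulate; the sketch is meant to illustrate the structure of the computation rather than the asymptotics.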
Lemma 2.3. For all $u\in \mathbb{N}, \textbf{r}\in \mathbb{N}^u$ and primes $p_1,\ldots,p_u$ , we have
where rad denotes the radical.
Proof. We write the factorisation into prime powers of $\prod_{i=1}^ u p_i^{r_i} $ as $\prod_{j=1}^{v} q_j^{s_j}$ , where $q_j$ are v distinct primes. This implies that
Using $|B_{q_j} - E[B_{q_j}]|\leqslant B_{q_j} +E[B_{q_j}]\leqslant X_{q_j} +E[X_{q_j}]=X_{q_j} +1/{q_j}$ and the binomial theorem yields
hence,
By the independence of the $X_q$ , we infer that
The proof now concludes by noting that $\prod_{ j=1 }^v q_j$ is the radical of $\prod_{i=1}^ u p_i^{r_i} $ .
The following lemma is the main tool in the proof of Theorem 1.1. It is due to Stein [Reference Stein18, Corollary 2, p. 110].
Lemma 2.4 (Stein). Let T be a finite set, and for each $t \in T$ , let $Z_t$ be a real random variable and $T_t$ a subset of T such that $E[Z_t]=0$ , $E[Z_t^4]<\infty$ and $E[\sum_{t\in T }Z_t \sum_{s\in T_t} Z_s]=1$ . Then for all real b,
where the terms $\Psi_i$ are defined through
and
Proposition 2.5. Fix $z\geqslant 0 $ and $b \in \mathbb{R}$ . For any $N\in \mathbb{N}$ , we have
where the implied constant depends at most on z. In particular, $ (S_N- c_N)/s_N $ converges in law to the standard normal distribution as $N\to \infty$ .
Proof. We will apply Lemma 2.4 with
- T being the set of primes in [2, N],
- $T_p$ being the set of primes in $[p^{\exp\!(-z) },p^{\exp\!(z) }] \cap [2,\,N]$ ,
- $Z_p=(B_p - E[B_p] )/s_N$ for $p\in T$ .
Let $Y_p\;:\!=\;B_p -E[B_p]$ . Note that if $q \notin T_p$ then $Z_q$ and $Z_p$ are independent, hence, $E[Y_pY_q]=0$ . Therefore,
which verifies $E\!\left[\!\sum_{p\in T }Z_p \sum_{q\in T_p} Z_q\right]=1$ . We next observe that, for every $q \notin T_p$ , the random variables $Z_q$ and $Z_p$ are independent; hence, one obtains $E\!\left[Z_p | Z_q, q\notin T_p\right]=E\!\left[Z_p \right]=0,$ therefore
Next, we use Lemma 2.3 to obtain
The sum $\sum_{q\in T_p }1/q $ is bounded only in terms of z by Mertens’ theorem. It shows that
owing to (2.4).
To bound $\Psi_3$ , we write $\mathcal{C}_{p}\;:\!=\;\sum_{q\in T_{p}}\!\left(Y_{p}Y_{q}-E[ Y_pY_q ]\right)$ to obtain
Furthermore, $E\!\left[\mathcal{C}_p^2 \right]$ can be written as
which can be seen to be
by Lemma 2.3. The bound $\sum_{q\in T_p }1/q \ll 1 $ then shows that
Let us now observe that if $p_2>p_1^{\exp\!(2z)}$ then $T_{p_1} \cap T_{p_2} =\emptyset $ , therefore $\mathcal{C}_{p_1}$ and $\mathcal{C}_{p_2}$ are independent. Since for every p we have $E[\mathcal{C}_p]=0$ by definition, we get $E\!\left[\mathcal{C}_{p_1}\mathcal{C}_{p_2}\right]=\prod_{i=1}^2E\!\left[\mathcal{C}_{p_i}\right] =0$ . Thus,
By Lemma 2.3, this is
For any positive integer c and prime q, we have $\textrm{rad}(c q)=\textrm{rad}(c)\frac{q}{\gcd\!(q,c)}$ . Hence, the sum over $q_2 $ is
by Mertens’ theorem. Hence, (2.11) is
The two sums over $q_1 $ in the right-hand side are both bounded only in terms of z. This can be proved similarly as before with the sum over $q_2$ . We obtain the bound
This shows that the quantity in (2.11) is $\ll \log \log N$ , which, when combined with (2.10), can be fed into (2.9) to yield $\Psi_3^2 s_N^{4} \ll \log \log N$ . Invoking (2.4) provides us with $\Psi_3 \ll1/\sqrt{\log \log N}$ . Together with (2.7)–(2.8), it implies that
owing to Stein’s bound (2.6). Finally, letting $N\to \infty$ shows that $(S_N-c_N)/s_N$ converges in law to the standard normal distribution.
Remark 2.6. We next prove asymptotics for the moments of $(S_N-c_N)/s_N$ . This is possibly the central proof in the present paper. The argument is a modification of the one by Billingsley [Reference Billingsley1, Lemma 3.2], which relies on a version of the dominated convergence theorem. However, the underlying random variables are now dependent; thus, we need to introduce the notion of linked indices.
Proposition 2.7. Fix $z\geqslant 0 $ and a positive integer r. Then we have
where $\mu_r$ is the r-th moment of the standard normal distribution.
Proof. Take k to be the least positive integer with $r<2k $ , so that Proposition 2.5 together with [Reference van der Vaart19, Example 2.21] implies that it suffices to prove that
is bounded only in terms of k and z. Equivalently, by (2.4) it suffices to show
The left side equals
Using Lemma 2.3, we see that the contribution of the terms with $ u \leqslant k $ is
Therefore,
with an implied constant that is independent of N.
For given $u \in \mathbb{N} $ , $z\geqslant 0 $ and primes $p_1<\ldots < p_u$ , we say that two consecutive integers $ i, i+1$ in [1, u] are linked if and only if $p_{i+1}\leqslant p_{i}^{\exp\!(z)}$ . In particular, $p_{i+1}$ lies in a relatively small interval; hence, its contribution will be small. Denote the number of linked pairs $(i,i+1)$ by $\ell(\textbf{p} ) $ . By Lemma 2.3, we obtain
where we used the estimate $\sum_{p_i<p_{i+1} \leqslant p_i^{\exp\!(z)}}1/p_{i+1} \ll_z 1 $ whenever i and $i+1$ are linked. Hence, the contribution of all prime vectors $(p_1,\ldots, p_u)$ with $ \ell(\textbf{p} )\geqslant u-k $ linked pairs is at most
which is acceptable. By (2.12), we obtain
We will now show that every sum over $p_i$ in (2.13) vanishes. Denoting the cardinality of $1\leqslant i\leqslant u $ with $r_i=1$ by a, we see that the number of i with $r_i \geqslant 2$ is $u-a$ . Since $2k =\sum_{i=1}^u r_i$ , we get $2k \geqslant a+ 2(u-a)$ . Equivalently, $ 2 (u-k)\leqslant a $ , hence, by $ \ell(\textbf{p} ) < u-k$ one gets
We now partition the integers in [1, u] into disjoint subsets ${\mathcal{A}}_1,\ldots,{\mathcal{A}}_r$ using the following rules:
- if i and $i+1$ are in ${\mathcal{A}}_j$ then they are linked,
- if $i \in {\mathcal{A}}_a$ and $i+1 \in {\mathcal{A}}_b$ for some $a\neq b$ then i and $i+1$ are not linked.
The inequality $s\leqslant 2(\!-\!1+s)$ for $s\geqslant 2$ gives
This equals $2\ell (\textbf{p} ) $ since each ${\mathcal{A}}_j$ has $-1+\sharp{\mathcal{A}}_j$ linked pairs and the total number of links is $\ell(\textbf{p} ) $ . By (2.14), we infer that there exists an index j for which $r_j=1$ and that is not linked to any other index. This implies that the following random variables are independent:
Since $ E\!\left[ B_{p_j} - E[B_{p_j}] \right]=0$ , we infer that every expectation in the right-hand side of (2.13) vanishes. This concludes the proof.
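The combinatorics of linked indices in this proof can be illustrated on a concrete prime vector. The sketch below is hypothetical code, not taken from the paper: it splits the indices $1,\ldots,u$ into maximal runs of linked neighbours and checks the counting inequality, namely that the total size of the blocks with at least two elements is at most $2\ell(\textbf{p})$.

```python
import math

def linked_blocks(primes, z):
    """Split the indices 1..u of an increasing prime vector into maximal runs
    in which consecutive indices are linked, i.e. p_{i+1} <= p_i^{exp(z)}."""
    blocks, current = [], [1]
    for i in range(1, len(primes)):
        if primes[i] <= primes[i - 1] ** math.exp(z):
            current.append(i + 1)   # 1-based indices i and i+1 are linked
        else:
            blocks.append(current)
            current = [i + 1]
    blocks.append(current)
    return blocks

def number_of_links(blocks):
    """A block of size s contains s - 1 linked pairs."""
    return sum(len(block) - 1 for block in blocks)

blocks = linked_blocks([2, 3, 5, 101], z=0.5)
links = number_of_links(blocks)
big = sum(len(block) for block in blocks if len(block) >= 2)
print(blocks, links, big)   # [[1, 2, 3], [4]] 2 3
assert big <= 2 * links     # s <= 2(s - 1) for every block of size s >= 2
```

In this example the index 4 forms a singleton block, so it is not linked to any other index; the proof uses such an index with exponent 1 to split off an independent factor of mean zero.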
2.3. Justifying the model
Let n be a positive integer and denote by $\Omega_n$ the uniform probability space $\mathbb{N}\cap [1,n]$ . Our goal now becomes to show that, as $n\to\infty $ , the moments of $\omega_z(m)$ for m in $\Omega_n$ are asymptotically the same as the moments of $S_N$ for some parameter $N=N(n)\to\infty$ . Recall (2.1). For technical reasons, we will first work with a truncated version of $\omega_z$ , namely,
where $N=N(n)$ . The function $\delta_{p,z}$ imposes simultaneous coprimality conditions of m with several primes in large intervals, and to deal with this, we shall need the Fundamental Lemma of Sieve Theory [Reference Friedlander and Iwaniec8, Corollary 6.10].
Lemma 2.8 (Fundamental Lemma of Sieve Theory). Let ${\mathcal{P}}$ be a set of primes. Given any sequence $a_m\geqslant 0 $ for $m \in \mathbb{N}$ and any square-free $d\leqslant x $ that is only divisible by primes in ${\mathcal{P}}$ , we assume that
for some real numbers $X, r_d$ and a multiplicative function g. Assume that $0\leqslant g (p)<1 $ and that there exist constants $K>1, \kappa>0$ such that
holds for all $2\leqslant w < y $ . Then for all $D\geqslant y \geqslant 2 $ , we have
where $s=\log D/\log y$ and the implied constants depend at most on $\kappa$ and K.
Lemma 2.9. Assume that there exists a function $N\;:\;[1,\infty)\to [1,\infty)$ satisfying
Fix $z\geqslant 0 $ and $k\in \mathbb{N}$ . Then we have
where $\mu_k $ is the k-th moment of the standard normal distribution.
Proof. By Proposition 2.7 and (2.17), it is sufficient to prove
Let $r\in \mathbb{N}$ . By (2.15), the fact that $\delta_{p,z}\in \{0,1\}$ and the binomial theorem, we obtain
Let ${\mathcal{P}}$ be the set of all primes in $\bigcup_{i=1}^u \!(p_i, p_i^{\exp\!(z)}]$ and let $a_m $ be the indicator function of integers divisible by $p_1\cdots p_u$ . In particular,
We assume that $p_{i+1}>p_i^{\exp\!(z)}$ for all $i=1,2,\ldots, u-1$ since otherwise the sum clearly vanishes. We will now use Lemma 2.8 with $ X=n/(p_1\cdots p_u), g(d)=1/d, D=\sqrt n, y= N^{2\exp\!(z)}.$ If d is divisible only by primes in ${\mathcal{P}}$ , then it is coprime to $p_1\cdots p_u$ , hence,
thus, $|r_d|\leqslant 1 $ because $ r_d $ is the fractional part of $X/d$ . Furthermore, we can take K to be any large fixed positive constant and $\kappa =1$ , owing to
The bound $|r_d|\leqslant 1 $ means that $\sum_{d\leqslant D} \mu^2(d) |r_d| \leqslant D=\sqrt n$ . Since $p_u\leqslant N$ , every prime in ${\mathcal{P}}$ is strictly smaller than y, hence, (2.16) gives
where the implied constant depends at most on r and z.
By the binomial theorem, we get
and we note that we can restrict the sum over $p_i$ to the terms with $p_{i+1}>p_i^{\exp\!(z)}$ for all i, since otherwise $E \left[ B_{p_1} \cdots B_{p_u} \right]=0$ . Under this restriction, the random variables $B_{p_i}$ are independent, hence,
We infer from (2.20) and (2.21) that
By (2.3), this is $\ll(\!\log \log N)^r \exp\!(\!-\!\frac{\log n}{4\exp\!(z)\log N})+n^{-1/2}N^r$ . Thus, the difference in (2.19) is
We need to show that this vanishes asymptotically, and by (2.4) and (2.17), it suffices to show
Both of these estimates can be directly inferred from (2.17) and (2.18).
Lemma 2.10. Assume that there exists a function $N\;:\;[1,\infty)\to [1,\infty)$ satisfying
Fix $z\geqslant 0 $ . Then ${s_N} ( (1-2z\textrm{e}^{-z}) {\textrm{e}}^{-z}\log \log n )^{-1/2} \to 1$ as $n\to \infty $ and
Proof. Combining (2.4) and (2.22) one immediately gets
For any $m \in [1,n]$ , we have
Since $\delta_{p,z}$ takes only values in $\{0,1\}$ and $\delta_{p,z}(m)=1$ implies that p divides m, we see that
Furthermore, (2.3) gives
The proof now concludes by using (2.23).
2.4. Proof of Theorem 1.1
The function
fulfills (2.17)–(2.18)–(2.22)–(2.23). Hence, we can apply Lemmas 2.9–2.10.
For any $r\in \mathbb{N}$ , $c\in \mathbb C$ , any probability space $\Omega_n$ and any two sequences of random variables $X_n,Y_n$ satisfying $\lim_{n\to\infty } \sup_{m \in \Omega_n}|X_n(m)-Y_n(m) |=0$ and $\lim_{n\to \infty} E_{m\in \Omega_n} [X_n(m)^r]=c$ it is easy to see by the binomial theorem that $\lim_{n\to \infty} E_{m\in \Omega_n}[Y_n(m)^r]=c$ . Using this with $ \Omega_n=\mathbb{N}\cap[1,n] $ ,
in combination with Lemmas 2.9–2.10, shows that for every $k \in \mathbb{N} $ one has
Given any sequence $a_n \in \mathbb{R}$ having limit 1 and any sequence of random variables $X_n$ with $E[X_n] $ having limit c, it is clear that $a_n E[X_n]$ has limit c. Using this with
and invoking Lemma 2.10 shows that for every $k\in \mathbb{N}$ one has
This proves Theorem 1.1 whenever r is a positive integer and this is sufficient. To see that, take any $r\in [0,\infty)$ and note that (2.24) implies that
converges in law to the standard normal distribution. Taking p to be the least even integer strictly exceeding r in [Reference van der Vaart19, Example 2.21] shows that the r-th moment of $T_n$ converges to the r-th moment of the standard normal distribution.
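The final step rests on a standard uniform integrability principle. The following display is a hedged paraphrase of that principle, not a quotation of the cited example; the symbol $Z$ for a standard normal random variable is introduced here only for illustration.

```latex
% Paraphrase of the moment-convergence principle used in the last step.
\[
T_n \xrightarrow{\ d\ } Z
\quad\text{and}\quad
\limsup_{n\to\infty} E\big[|T_n|^{p}\big]<\infty \ \text{ for some } p>r
\quad\Longrightarrow\quad
E\big[T_n^{\,r}\big]\longrightarrow E\big[Z^{r}\big]=\mu_r .
\]
```

In the present setting the boundedness hypothesis comes for free: for an even integer p, the p-th moment of $T_n$ converges to $\mu_p$ by the integer case already established, so it is in particular bounded.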
3. Poissonian gaps for local solubility in families of varieties
Serre’s problem [Reference Serre16] on the probability that a random variety over $\mathbb{Q}$ has a $\mathbb{Q}$ -rational point has recently received a lot of attention due to its extension by Loughran–Smeets [Reference Loughran and Smeets13] to a very general setting, namely, for any dominant morphism $f\;:\; V \to \mathbb{P}^n$ , where V is a smooth projective variety over $\mathbb{Q}$ and f has a geometrically integral generic fibre. The fibres of f form an infinite family of varieties and typically one is interested in how often they have a $\mathbb{Q}$ -rational point. Imposing the harmless condition that the generic fibre of f is geometrically integral, it is easy to see that for every x outside of some proper Zariski closed set the function
is bounded due to the Lang–Weil estimates and Hensel’s lemma. This function helps us in understanding the density of fibres with a $\mathbb{Q}$ -rational point. Ordering $\mathbb{P}^n(\mathbb{Q})$ by the standard Weil height H on $\mathbb{P}^n(\mathbb{Q})$ and assuming that a certain invariant $\Delta(f)$ is non-vanishing, Loughran and Sofos [Reference Loughran and Sofos14] recently proved the analogue of Erdős–Kac’s theorem for $\omega_f(x) $ , namely that
converges in law to the standard normal distribution. This was the first instance of an Erdős–Kac law in Diophantine geometry.
Our goal in this section is to go further and study the gaps between the primes p counted by $\omega_f(x)$ . For $x\in \mathbb{P}^n(\mathbb{Q})$ with $f^{-1}(x)$ smooth, we let $p_i(x)$ be the i-th smallest prime number for which $f^{-1}(x)$ has no $\mathbb{Q}_p$ -point. We then define for all $z\geqslant 0 $ ,
Before stating our theorem, we must recall the definition of the invariant $\Delta(f)$ that is due to Loughran and Smeets [Reference Loughran and Smeets13].
Definition 3.1. Let $f\;:\;V \to X$ be a dominant proper morphism of smooth irreducible varieties over a field k of characteristic 0. For each point $x \in X$ with residue field $\kappa(x)$ , the absolute Galois group $\textrm{Gal}(\overline{\kappa(x)}/ \kappa(x))$ of the residue field acts on the irreducible components of
of multiplicity 1. Choose some finite group $\Gamma_x$ through which this action factors and define
and
where $X^{(1)}$ denotes the set of codimension 1 points of X.
Theorem 3.2. Let V be a smooth projective variety over $\mathbb{Q}$ equipped with a dominant morphism $f\;:\; V \to \mathbb{P}^n$ with geometrically integral generic fibre and $\Delta(f)\neq 0$ . Let H be the usual Weil height on $\mathbb{P}^n$ . Fix any $ z \geqslant 0 $ and $r \geqslant 0 $ . Then
as $B\to \infty $ , where $\mu_r$ is the r-th moment of the standard normal distribution.
The case $z=0$ recovers Theorems 1.2–1.3 of Loughran–Sofos [Reference Loughran and Sofos14].
Taking $r=2$ in Theorem 3.2 and [Reference Loughran and Sofos14, Theorem 1.2] shows the following after a use of Chebychev’s inequality:
Corollary 3.3. Let $f\;:\;V\to \mathbb{P}^n$ be a morphism as in Theorem 3.2. Fix any $z\geqslant 0 $ . Ordering $\mathbb{P}^n(\mathbb{Q})$ by the usual Weil height, 100% of fibres $f^{-1}(x)$ satisfy
Remark 3.4. As the right-hand side vanishes asymptotically, the corollary means that for almost all fibres $f^{-1}(x)$ , the proportion of gaps in the sequence $\{\log \log p_i(x)\}_{i\geqslant 1 }$ exceeding z is roughly constant, independently of the fibre!
In our proof, we use the arguments from Section 2, where the uniform probability space $\mathbb{N} \cap [1,n]$ is replaced by $\{x\in \mathbb{P}^n(\mathbb{Q})\;:\;H(x) \leqslant B\}$ . The main number-theoretic input we use is Proposition 3.6. In sieve theory language, this is a level of distribution result for the fibres of f. The level of distribution it provides is less than $B^\varepsilon$ for any constant $\varepsilon>0$ , which is well-known to be a problematic regime for any sieve theory problem; we overcome this by extirpating small primes $p\leqslant t_0(B)$ from $\widetilde \omega_{f,z}$ , see (3.5).
3.1. Proof of Theorem 3.2
For a prime p, we define
where we use the term “non-split” in the sense of Skorobogatov [Reference Skorobogatov17, Def. 0.1]. We then introduce the random variable $\widetilde{X}_p$ so that
and such that $\widetilde X_p$ are independent. We then define
Furthermore, for any positive N, we define
Using [Reference Loughran and Sofos14, Proposition 3.6] instead of Mertens’ theorem and the estimate $\sigma_p \ll 1/p$ from [Reference Loughran and Sofos14, Lemma 3.3], the arguments in Lemma 2.2 can be modified to yield
and
Next, the proof of Lemma 2.3 goes through easily upon replacing $B_p $ by $\widetilde B_p$ owing to the inequality $E[\widetilde B_p]\leqslant E[\widetilde X_p] =\sigma_p \ll 1/p$ . Replacing $S_N$ by $\widetilde S_N$ in the statement of Proposition 2.5, we see that the proof goes through by replacing $Z_p$ by $\widetilde Z_p\;:\!=\;(\widetilde B_p-E[\widetilde B_p])/\widetilde s_N$ . Finally, using all the analogues of results in Section 2 that we mentioned so far allows one to modify the arguments of the proof of Proposition 2.7 to obtain the following result:
Proposition 3.5. Fix $z\geqslant 0 $ and a positive integer r. Then we have
where $\mu_r$ is the r-th moment of the standard normal distribution.
This concludes the probabilistic part of the proof of Theorem 3.2. The number-theoretic part requires the Fundamental Lemma of Sieve Theory and the following:
Proposition 3.6. Keep the setting of Theorem 3.2. Then there exist constants $\delta >1, A>0$ that depend on V and f with the following property. Let $Q \in \mathbb{N}$ with $p \nmid Q$ for all $p \leqslant A$ . Then for all $\varepsilon>0$ and $Q\leqslant B^{1/6}$ , we have
where the implied constant is independent of B and Q.
Proof. By [Reference Loughran and Sofos14, Proposition 3.4], there exist $\alpha >0 , d\in \mathbb{N} $ such that the left-hand side is at most
while it exceeds a similar quantity with $\alpha $ replaced by $-\alpha$ . As shown in [Reference Loughran and Sofos14, Lemma 3.7], we have
This is satisfactory by defining $\delta=2+\max\{4d, 2\alpha d\}$ . Finally,
owing to $Q\leqslant B^{1/6}$ .
Our next task is to show that the moments of a truncated version of $\omega_{f,z}$ are asymptotically Gaussian. For this we shall follow the arguments in Section 2.3, where $\Omega_n =\mathbb{N}\cap [1,n]$ is replaced by the uniform discrete probability space
for $B>0$ . The condition that $f^{-1}(x) \textrm{ smooth}$ is included in the definition of $\widetilde \Omega_B$ to make $\omega_{f,z}(x)$ well-defined for each $x\in \widetilde \Omega_B$ . Choosing a polynomial which vanishes on the singular locus of f, we see that
Then the standard result
where $c_n=2^n/\zeta(n+1)$ , shows that
We furthermore let for $x\in \mathbb{P}^n(\mathbb{Q})$ ,
We shall choose any two functions $t_0,t_1\;:\;(0,\infty )\to (0,\infty )$ satisfying
They will be chosen optimally later. The analogue of (2.15) in our setting is defined as
We obtain for $r\in \mathbb{N}$ ,
where we added the assumption $p_{i+1} > p_i^{\exp\!(z) } \ \forall i$ since otherwise the expectation in the right-hand side vanishes.
Let us now define the function $m_B\;:\;\mathbb{P}^n(\mathbb{Q})\to \mathbb{N}$ given by
Letting $\widetilde x$ be the product of all primes $p\leqslant t_1(B)$ we note that $m_B(x) \leqslant \widetilde x$ . Now let ${\mathcal{P}}$ be the set of all primes in $\bigcup_{i=1}^u \!(p_i, p_i^{\exp\!(z)}]$ and for $m\in \mathbb{N}$ let
This gives
We shall use Lemma 2.8 with $a_m \;:\!=\;\widetilde a_m , \kappa=\Delta(f) $ ,
The assumption $0\leqslant g(p)<1$ is satisfied here due to $\sigma_p\ll 1/p$ and $p>t_0(B) \to \infty $ . Note that for square-free d that is only divisible by primes in ${\mathcal{P}}$ , we have
Assuming that
we see that when $d\leqslant D$ , one has
for all large B. This allows us to employ Proposition 3.6 with $Q=dp_1\cdots p_u$ to obtain
where we used $\sharp\widetilde\Omega_B=c_n B^{n+1}+O(B^n(\!\log B)^{[1/n]})$ and $X g(d) \ll 1 $ . The inequality (3.8) shows that
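The point count $c_nB^{n+1}$ used here can be checked by brute force in the simplest case $n=1$, where $c_1=2/\zeta(2)=12/\pi^2$. The sketch below is an illustration only: it counts all points of $\mathbb{P}^1(\mathbb{Q})$ of height at most B and ignores the smoothness condition in the definition of $\widetilde\Omega_B$, which is an assumption made for simplicity. The function name is hypothetical.

```python
import math

def count_points_P1(B):
    """Number of points [a:b] in P^1(Q) with H([a:b]) = max(|a|, |b|) <= B,
    where (a, b) is a primitive integer representative."""
    count = 0
    for a in range(-B, B + 1):
        for b in range(-B, B + 1):
            if math.gcd(a, b) == 1:   # primitive vectors only; excludes (0, 0)
                count += 1
    return count // 2                 # (a, b) and (-a, -b) define the same point

c1 = 2 / (math.pi ** 2 / 6)           # c_n = 2^n / zeta(n + 1) with n = 1
for B in (10, 100):
    print(B, count_points_P1(B), c1 * B ** 2)
```

The ratio of the two printed columns approaches 1 as B grows, consistent with an error term of smaller order than $B^{2}$.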
hence,
This shows that the error term occurring in (2.16) is
The product over $p\in {\mathcal{P}}$ equals
Furthermore, the estimates $p_1 >t_0(B)$ and $X\ll 1/(p_1\cdots p_u) $ show that
The main term occurring in Lemma 2.8 is
hence, the expectation $E_{ x\in \widetilde \Omega_B}\left[\widetilde \delta_{p_1,z}(x) \cdots \widetilde\delta_{p_u,z}(x) \right]$ in the right-hand side of (3.6) equals
Injecting this into (3.6) produces the error term
Following arguments similar to the ones in the proof of Lemma 2.9, the main term is
where
We have shown that for all $r\in \mathbb{N}$ one has
Noting that $T_B= \widetilde S_{t_1(B)}-\widetilde S_{t_0(B)}$ gives
and the Cauchy–Schwarz inequality shows that
Since $0\leqslant \widetilde B_p\leqslant \widetilde X_p $ , we infer that $0\leqslant \widetilde S_N \leqslant \sum_{p\leqslant N } \widetilde X_p$ , hence,
But $E[\widetilde X_p^{r_i} ] = E[\widetilde X_p ] =\sigma_p\ll 1/p$ , hence, $E\!\left[ \widetilde S_{N}^r\right] \ll (\!\log \log N)^r $ . Hence,
which is $\ll (\!\log \log t_0(B)) (\!\log \log t_1(B) )^{r-1}.$ Hence,
Therefore,
is
This vanishes asymptotically as long as we assume that
This is due to (3.7) which implies that
We have therefore shown that, subject to (3.4)–(3.7)–(3.9), one has
The concluding arguments follow those in Lemma 2.10, the only difference being dealing with primes $p\leqslant t_0(B)$ . Recall from [Reference Loughran and Sofos14, Lemma 3.2, part (2)] that there exists a constant $A>0$ and a homogeneous $F\in \mathbb{Z}[x_0,\ldots, x_n]$ (both of which depend only on f) with the property that for all primes p and $x\in \mathbb{P}^n(\mathbb{Q})$ with $f^{-1}(x) $ smooth and $f^{-1}(x) (\mathbb{Q}_p)=\emptyset$ , one has $p\mid F(x)$ . Then
For $z>1$ and $m \in \mathbb N$ , we have $\sharp\{p\mid m \;:\; p>z\} \leqslant (\!\log m )/(\!\log z)$ . For $x\in \widetilde \Omega_B$ , we have $H(x)\leqslant B$ , thus, $\log |F(x)|\ll \log B$ . In particular,
where the implied constant is independent of B, z and x. Combined with arguments similar to the ones in Lemma 2.10, we obtain
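The elementary bound on the number of large prime divisors used above follows from the observation that the product of the distinct primes $p\mid m$ exceeding a threshold divides m, while each factor exceeds the threshold. A minimal numerical check, with hypothetical names and with the threshold called `y` to keep it apart from the gap parameter:

```python
import math

def large_prime_divisors(m, y):
    """Distinct prime divisors p of m with p > y (trial division)."""
    found, d = [], 2
    while d * d <= m:
        if m % d == 0:
            if d > y:
                found.append(d)
            while m % d == 0:
                m //= d
        d += 1
    if m > 1 and m > y:
        found.append(m)
    return found

m, y = 2 * 3 * 5 * 7 * 11 * 13, 4
big = large_prime_divisors(m, y)
print(big, len(big), math.log(m) / math.log(y))
# y^{len(big)} < product of the primes in `big` <= m, hence the bound below.
assert len(big) <= math.log(m) / math.log(y)
```

In the proof this is applied with $m=|F(x)|$, for which $\log m\ll\log B$.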
as long as
The proof of Theorem 3.2 concludes by adapting the arguments in Section 2.4 to the current setting. This can be achieved as long as we assume that
and it now remains to find functions $t_0(B)$ and $ t_1(B)$ that satisfy all assumptions (3.4)–(3.7)–(3.9)–(3.10)–(3.11). This can be done by choosing $t_0(B)$ and $t_1(B)$ so that
Acknowledgements
I wish to thank Daniel El-Baz for suggesting the use of Stein’s method. While working on this paper, I was supported by EPSRC New Horizons grant EP/V048236/1. I would like to thank Maxim Gerspach for various helpful remarks and for finding typos in the preprint version. Furthermore, I am grateful to the referee for careful reading of the paper and helpful comments.