1. Introduction
The main focus of this paper is to examine the geometric and topological features of U-statistics when the geometric configuration of a point cloud does not occur frequently. Let ${\mathcal{X}}_n=\{X_1,\ldots,X_n\} \subset {\mathbb R}^d$ , $d\ge 2$ , be a random sample, and let $(r_n)$ be a sequence of positive numbers such that $r_n\to 0$ as $n\to\infty$ . A geometric graph $G({\mathcal{X}}_n,r_n)$ is an undirected graph with a vertex set ${\mathcal{X}}_n$ and edges $[X_i,X_j]$ for all pairs $X_i, X_j\in {\mathcal{X}}_n$ such that $\|X_i-X_j\|\le r_n$ , where $\|\!\cdot\!\|$ denotes the Euclidean norm. The monograph [Reference Penrose30] by Penrose covers a range of related topics, including subgraph counts, the vertex degree, the clique number, and the formation of a giant component. As seen in the monograph, many of the geometric statistics can be represented as U-statistics. Namely, for every $n, k\ge 2$ ,
where $|{\mathcal{Y}}|$ is the cardinality of a point set ${\mathcal{Y}}$ in ${\mathbb R}^d$ and $H_n\colon ({\mathbb R}^d)^k\to {\mathbb R}$ is defined as
for some symmetric and translation-invariant map $H\colon ({\mathbb R}^d)^k \to {\mathbb R}$ . Additionally, we can also consider a certain variant of (1.1), defined by
where ${\mathbb 1} \{ \cdot \}$ denotes an indicator function. This is of particular importance when we are examining k-tuples ${\mathcal{Y}}\subset {\mathcal{X}}_n$ , which not only satisfy geometric conditions implicit in $H_n$ but are also separated from the other points in ${\mathcal{X}}_n$ . In Section 2 we provide a more general definition of $T_{k,n}^{(i)}$ for $i=1,2$ . If one takes
where $\Gamma$ is a connected graph with k vertices and $\cong$ means graph isomorphism, then $T_{k,n}^{(1)}$ represents the number of subgraphs isomorphic to $\Gamma$ (with radius $r_n$ ) and $T_{k,n}^{(2)}$ counts the number of connected components isomorphic to $\Gamma$ . In addition to the random geometric graph setup, many of the functionals in stochastic geometry, such as intrinsic volumes of intersection processes and the volumes of simplices, can be treated under the framework of U-statistics [Reference Blaszczyszyn, Yogeshwaran and Yukich2, Reference Decreusefond, Schulte and Thäle6, Reference Lachièze-Rey and Reitzner22, Reference Last, Peccati and Schulte24, Reference Reitzner and Schulte31]. Moreover, $T_{k,n}^{(i)}$ also arises when examining random geometric complexes. For example, with an appropriate choice of H, $T_{k,n}^{(i)}$ can be used to dictate the behavior of topological invariants of a geometric complex [Reference Bobrowski and Adler3, Reference Bobrowski and Mukherjee4, Reference Kahle and Meckes17, Reference Owada and Thomas29].
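To make the distinction between subgraph counts and component counts concrete, the following is a minimal sketch for the simplest case in which $\Gamma$ is a single edge ( $k=2$ ). The function name `edge_statistics` and the sample points are illustrative assumptions, not part of the paper; the sketch returns the number of edges and the number of isolated edges of the geometric graph.

```python
import itertools
import math


def edge_statistics(points, r):
    """Return (T1, T2) for the single-edge graph Gamma (k = 2)."""
    n = len(points)
    # Adjacency of the geometric graph: i ~ j iff ||x_i - x_j|| <= r.
    adjacent = [set() for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        if math.dist(points[i], points[j]) <= r:
            adjacent[i].add(j)
            adjacent[j].add(i)
    # T1: every edge is a subgraph isomorphic to Gamma.
    t1 = sum(len(nbrs) for nbrs in adjacent) // 2
    # T2: an edge is a connected component iff both endpoints have degree 1.
    t2 = sum(
        1
        for i in range(n)
        for j in adjacent[i]
        if i < j and len(adjacent[i]) == 1 and len(adjacent[j]) == 1
    )
    return t1, t2


# Hypothetical point cloud: one isolated pair and one path on three points.
points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.4, 0.0), (3.8, 0.0)]
print(edge_statistics(points, 0.6))
```

In this toy configuration the path on three points contributes two edges to the subgraph count but none to the component count, which is exactly the extra separation constraint encoded by the indicator in the second statistic.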
The limiting behavior of $T_{k,n}^{(i)}$ depends crucially on the decay rate of $r_n$ as $n\to\infty$ . If $r_n$ is chosen such that $n^kr_n^{d(k-1)}\to \infty$ as $n\to\infty$ , it then follows that $\mathbb{E}\bigl[T_{k,n}^{(i)}\bigr] \to \infty$ . This implies that the geometric configuration of k-tuples relating to $H_n$ asymptotically occurs infinitely many times. Then $T_{k,n}^{(i)}$ obeys a central limit theorem:
converges weakly to a standard normal random variable. Last et al. [Reference Last, Peccati and Schulte24] and Reitzner and Schulte [Reference Reitzner and Schulte31] established the rate of convergence in normal approximation in terms of the Wasserstein distance and the Kolmogorov distance, via the Malliavin–Stein method together with Palm calculus for a Poisson point process. The monograph [Reference Last and Penrose23] by Last and Penrose provides details of this line of research. Furthermore, Blaszczyszyn et al. [Reference Blaszczyszyn, Yogeshwaran and Yukich2] derived asymptotic normality of geometric statistics (not necessarily U-statistics) when the input process exhibits fast decay of correlations. In the context of random topology, proving the asymptotic normality of the simplex counts, which themselves are U-statistics, will be a crucial step in deriving the central limit theorem for topological invariants, such as the Euler characteristic and Betti numbers [Reference Kahle and Meckes17, Reference Krebs, Roycraft and Polonik21, Reference Owada and Thomas29, Reference Thomas and Owada35]. If $r_n$ decays faster, such that $n^k r_n^{d(k-1)}\to c$ , $n\to\infty$ , for some $c\in (0,\infty)$ , the k-tuples that satisfy the geometric conditions in $H_n$ will occur less frequently. Then $\mathbb{E}\bigl[T_{k,n}^{(i)}\bigr]$ tends to a finite positive constant as $n\to\infty$ . In particular, if one takes $H_n$ as in (1.2), $T_{k,n}^{(i)}$ converges weakly to a Poisson random variable as $n\to\infty$ , that is, for all integers $\ell \ge 0$ and $i=1,2$ ,
where ‘ $\operatorname{Poi}\!(\nu_k)$ ’ stands for a Poisson random variable with mean $\nu_k\in (0,\infty)$ . In research relating to random topology, Kahle and Meckes [Reference Kahle and Meckes17] and Owada and Thomas [Reference Owada and Thomas29] proved that the Betti number of a geometric complex converges weakly to the difference of time-changed homogeneous Poisson processes on the real half-line. Furthermore, Decreusefond et al. [Reference Decreusefond, Schulte and Thäle6] provided the rate of convergence of a point process induced by U-statistics in terms of the Kantorovich–Rubinstein distance.
The main aim of this paper is to explore the limiting behavior of $T_{k,n}^{(i)}$ when the k-tuples satisfying the geometric conditions in $H_n$ are even less likely to occur. Specifically, we assume that $r_n$ decays to 0 at a faster rate: $n^k r_n^{d(k-1)}\to 0$ as $n\to\infty$ . It then follows that
for all $\epsilon>0$ . In this setting, we aim to detect a sequence $(v_n)$ that grows to infinity, so that
converges to a non-degenerate limiting measure. Since (1.4) is not a sequence of probability measures, weak convergence as in (1.3) can no longer be used. Alternatively, by exploiting the notion of vague convergence (see [Reference Kallenberg19] and [Reference Resnick32]), we show that in the space of Radon measures on $[\!-\!\infty, \infty]\setminus \{ 0 \}$ ,
where $\stackrel{v}{\to}$ denotes vague convergence and $\mu_k$ is a non-null limit measure with $\mu_k ( \{ \pm\infty \})=0$ . From (1.5), one can deduce, from the perspective of vague topology, the exact rate (up to the scale) of the probability that $T_{k,n}^{(i)}$ becomes non-trivial (i.e. non-zero). Furthermore, the limit $\mu_k$ is expected to dictate the geometric and topological structure of $T_{k,n}^{(i)}$ which still remains in the limit. In the literature of random topology (not necessarily related to random geometric complexes), one of the key focuses is how rapidly each homology group appears and disappears [Reference Bobrowski and Weinberger5, Reference Fowler9, Reference Kahle16, Reference Kahle and Pittel18, Reference Skraba, Thoppe and Yogeshwaran34]. In the same spirit, we replace $T_{k,n}^{(i)}$ in (1.5) with the (persistent) Betti numbers and explore the rate of $v_n$ , as well as the structure of $\mu_k$ . See Section 3.1 for more details.
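The three regimes for $r_n$ described in this introduction can be told apart mechanically when the radii follow a power law. The sketch below assumes $r_n = n^{-\alpha}$ (an illustrative choice, not one made in the paper), in which case $n^k r_n^{d(k-1)} = n^{k-\alpha d(k-1)}$ and the sign of the exponent decides the regime. The function name `regime` and its return labels are hypothetical.

```python
from fractions import Fraction


def regime(k, d, alpha):
    """Classify the regime of r_n = n**(-alpha) for k-tuples in R^d."""
    # With r_n = n^{-alpha}: n^k * r_n^{d(k-1)} = n^{exponent}.
    # Fractions keep the boundary case exponent == 0 exact.
    exponent = k - Fraction(alpha) * d * (k - 1)
    if exponent > 0:
        return "central limit regime"  # n^k r_n^{d(k-1)} -> infinity
    if exponent == 0:
        return "Poisson regime"  # n^k r_n^{d(k-1)} -> c in (0, infinity)
    return "rare-event regime"  # n^k r_n^{d(k-1)} -> 0


print(regime(2, 2, Fraction(1, 2)))
print(regime(2, 2, 1))
print(regime(2, 2, 2))
```

The last case, where the exponent is negative, is the setting studied in this paper.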
From a technical viewpoint, the articles most relevant to this study are those of Fasen and Roy [Reference Fasen and Roy8] and Hult and Samorodnitsky [Reference Hult and Samorodnitsky14]. In these papers the authors established large deviations for point processes based on a stationary sequence with heavy-tailed marginals and non-trivial dependency. Using the same approach as applied in these papers, the required vague convergence in (1.5) can be derived from the limit theorem for the sequence
where $\delta_x$ denotes the Dirac measure at $x\in {\mathbb R}$ . The point process in (1.6) is a random element of the space of Radon point measures, but this space is not locally compact. Accordingly, the convergence of (1.6) can no longer be treated in the vague topology. Instead, we aim to demonstrate the limit theory for (1.6) in the so-called ${\mathcal{M}}_0$ -topology. This notion was first developed by Hult and Lindskog [Reference Hult and Lindskog13] and has been used extensively, especially in extreme value theory, for the study of regular variation of stochastic processes [Reference Fasen and Roy8, Reference Hult and Samorodnitsky14, Reference Lindskog, Resnick and Roy25, Reference Segers, Zhao and Meinguet33]. Proposition 4.1 gives a more precise statement of this result. After completing the limit theorem for (1.6), this paper proceeds to show (1.5) by means of a continuous mapping theorem for ${\mathcal{M}}_0$ -convergence, as well as by using various approximation arguments.
The remainder of this paper is structured as follows. Section 2 presents the limit theorems for $T_{k,n}^{(i)}$ under a more general setup. Section 3 applies our general result to deduce the limit theory for geometric and topological statistics, including persistent Betti numbers of Čech complexes, the volume of simplices, a functional of the Morse critical points, and values of the min-type distance function. All the proofs are deferred to Section 4.
Before commencing the main body of the paper, let us add a few more comments on our setup. First, we assume that the density f of ${\mathcal{X}}_n$ is a.e. continuous and bounded. We can obtain the same result under a weaker assumption that
However, we have decided to impose stronger assumptions in order to avoid technical arguments relating to moment convergence, which necessarily involves the density f. Second, we observe that the same result can be obtained even if a random sample ${\mathcal{X}}_n$ is replaced by a Poisson point process $\mathcal P_n\;:\!=\; \{ X_1,\ldots,X_{N_n} \}$ , where $N_n$ is Poisson-distributed with mean n, independent of $(X_i)$ . In this case, one needs to use Palm calculus (see e.g. [Reference Penrose30, Section 1.7]) when computing the moments of $T_{k,n}^{(i)}$ . Finally, we remark that establishing a more general limit theory for a (discrete time) process ${\mathcal{X}}_n$ with non-trivial dependency remains a topic of further research. Indeed, Fasen and Roy [Reference Fasen and Roy8] and Hult and Samorodnitsky [Reference Hult and Samorodnitsky14] examined a moving average process and derived a series of large deviation results in the form of (1.5) and (1.6). In such cases, the structure of the limit $\mu_k$ becomes more complicated, reflecting a significant number of clusters induced by a moving average process. In the case of Poisson limit theorems, a similar line of research can be found in [Reference Owada27], which studied the asymptotic behavior of Betti numbers generated by a moving average process.
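The Poissonized sample $\mathcal P_n$ mentioned above can be simulated in two steps: draw $N_n$ from a Poisson distribution with mean n, then draw $N_n$ i.i.d. points. The sketch below is only an illustration under the assumption that f is the uniform density on $[0,1)^d$ ; the function name `sample_poisson_process` is hypothetical. The Poisson count is obtained by counting arrivals of a unit-rate Poisson process in $[0,n]$ .

```python
import random


def sample_poisson_process(n, d, rng):
    """Sample N_n ~ Poisson(n) i.i.d. points, uniform on [0, 1)^d (assumed density)."""
    # N_n ~ Poisson(n): count arrivals of a unit-rate Poisson process in [0, n].
    count, clock = 0, rng.expovariate(1.0)
    while clock <= n:
        count += 1
        clock += rng.expovariate(1.0)
    # Given N_n, the points are i.i.d. and independent of the count.
    return [tuple(rng.random() for _ in range(d)) for _ in range(count)]


rng = random.Random(0)
cloud = sample_poisson_process(50, 2, rng)
print(len(cloud))
```

A non-uniform density f would require replacing the uniform draw by a sampler for f; nothing else in the construction changes.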
2. Main limit theorem
We take a random sample ${\mathcal{X}}_n = \{ X_1,\ldots,X_n \} \subset {\mathbb R}^d$ , $d\ge 2$ , with density f, and a sequence of (non-random) radii $r_n \to 0$ , $n\to\infty$ , such that $n^kr_n^{d(k-1)} \to 0$ for some $k\ge 2$ . Assume that f is a.e. continuous and bounded, that is, $\|f\|_\infty \;:\!=\; \operatorname{ess\,sup}_{x\in{\mathbb R}^d} f(x)<\infty$ . Fix $m\ge 1$ and let $H\colon ({\mathbb R}^d)^k\to {\mathbb R}^m$ be a measurable function satisfying the following conditions.
(H1) H is symmetric about permutations, i.e. $H(x_1,\ldots,x_k) = H(x_{\sigma(1)}, \ldots, x_{\sigma(k)})$ for all $x_i\in {\mathbb R}^d$ and every permutation $\sigma$ of $\{ 1,\ldots,k \}$ .
(H2) H is translation-invariant, i.e. $H(x_1,\ldots,x_k) = H(x_1 + y, \ldots, x_k + y)$ for all $x_i, y \in {\mathbb R}^d$ .
(H3) H is locally determined, i.e. there exists $L>0$ such that $H(x_1,\ldots,x_k) = 0$ whenever $\operatorname{diam}\!(x_1,\ldots,x_k) \ge L$ , where $\operatorname{diam}\!(x_1,\ldots,x_k) = \max_{1 \le i,j \le k}\| x_i - x_j\|$ .
(H4) H is integrable in the sense of
\[\int_{({\mathbb R}^d)^{k-1}} \| H(0,y_1,\ldots,y_{k-1}) \| \,{\textrm{d}} \textbf{y} < \infty.\]
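As an illustration of conditions (H1)–(H3), the following sketch spot-checks them numerically for one candidate function, the indicator that all pairwise distances are at most 1 (with $m=1$ ). This choice of H, the constant $L=2$ , and all function names are assumptions made for the example; a finite check of this kind is of course not a proof. Condition (H4) holds for this H because it is bounded and vanishes outside a bounded set.

```python
import itertools
import math


def diameter(points):
    """Largest pairwise Euclidean distance among the given points."""
    return max(
        (math.dist(p, q) for p, q in itertools.combinations(points, 2)),
        default=0.0,
    )


def clique_indicator(*points):
    """Hypothetical example H: 1 if all pairwise distances are <= 1, else 0."""
    return 1.0 if diameter(points) <= 1.0 else 0.0


def translate(points, shift):
    """Shift every point by the same vector."""
    return [tuple(c + s for c, s in zip(p, shift)) for p in points]


triple = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.75)]
# (H1): the value does not depend on the order of the arguments.
h1 = all(
    clique_indicator(*perm) == clique_indicator(*triple)
    for perm in itertools.permutations(triple)
)
# (H2): shifting every point by the same vector leaves the value unchanged.
h2 = clique_indicator(*translate(triple, (2.0, -4.0))) == clique_indicator(*triple)
# (H3): with L = 2, a triple of diameter >= L is mapped to 0.
h3 = clique_indicator((0.0, 0.0), (2.0, 0.0), (1.0, 0.0)) == 0.0
print(h1, h2, h3)
```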
We also define a scaled version of H by
Given a subset ${\mathcal{Y}}$ of k points in ${\mathbb R}^d$ , a finite point set ${\mathcal{Z}}\supset {\mathcal{Y}}$ in ${\mathbb R}^d$ , and $\;\textbf{t} = (t_1,\ldots, t_m)\in [0,\infty)^m$ , we define
In particular, the ith component of (2.2) requires that each point in ${\mathcal{Y}}$ must be distance at least $t_i$ from all the remaining points in ${\mathcal{Z}}\setminus {\mathcal{Y}}$ . Moreover,
where $\circ$ means the Hadamard product: for two matrices $A=(a_{ij})$ and $B=(b_{ij})$ of the same dimension $\ell_1 \times \ell_2$ , $A\circ B$ represents an $\ell_1 \times \ell_2$ matrix with (i, j) element given by $a_{ij}b_{ij}$ . For ${\mathcal{Y}}=(y_1,\ldots,y_k) \in ({\mathbb R}^d)^k$ and $a\in {\mathbb R}$ , we write $a{\mathcal{Y}}=(ay_1,\ldots,ay_k)$ . We then define
and
The primary objective of this paper is to examine the behavior of
For the rigorous description of the asymptotic theory of (2.6), one needs the following notations and concepts. Our main references are [Reference Kallenberg19] and [Reference Resnick32]. First, let $E\;:\!=\; (\overline {\mathbb R})^m \setminus \{ \textbf{0} \} = [\!-\!\infty, \infty]^m \setminus \{ \textbf{0} \}$ with $\textbf{0}=(0,\ldots,0)\in {\mathbb R}^m$ , let $M_+(E)$ be the space of Radon measures on E, and let $M_p(E)$ denote the space of Radon point measures on E. Note that $M_p(E)$ is a closed subset of $M_+(E)$ in the vague topology; see Proposition 3.14 in [Reference Resnick32]. Define $C_K^+(E)$ to be the collection of non-negative and continuous functions on E with compact support. For $\eta_n, \eta \in M_+(E)$ , we say that $\eta_n$ converges vaguely to $\eta$ , denoted by $\eta_n \stackrel{v}{\to} \eta$ in $M_+(E)$ , if it holds that
\[\int_E g(x)\,\eta_n({\textrm{d}} x) \to \int_E g(x)\,\eta({\textrm{d}} x) \quad \text{as } n\to\infty \text{ for every } g\in C_K^+(E).\]
Now we can state our main theorem. The proof is deferred to Section 4.1.
Theorem 2.1. Under the assumptions above, for each $i=1,2$ , we have
where
and $\lambda$ is the Lebesgue measure on $({\mathbb R}^d)^{k-1}$ .
The U-statistic $T_{k,n}^{(1)}$ is associated with k-tuples satisfying the geometric conditions implicit in $H_n$ , while $T_{k,n}^{(2)}$ adds an extra constraint that the points in ${\mathcal{Y}}$ must be at distance at least a constant multiple of $r_n$ from the remaining points in ${\mathcal{X}}_n\setminus {\mathcal{Y}}$ . Despite this difference, Theorem 2.1 indicates that the behaviors of $T_{k,n}^{(i)}$ , $i=1,2$ , are asymptotically the same. In other words, the extra restriction imposed on $T_{k,n}^{(2)}$ is asymptotically negligible whenever $r_n$ decays so fast that $n^k r_n^{d(k-1)}\to0$ as $n\to\infty$ .
3. Geometric and topological applications
In this section we use Theorem 2.1 to deduce the limit theory for geometric and topological statistics satisfying conditions (H1)–(H4). Throughout this section we assume that ${\mathcal{X}}_n=\{ X_1,\ldots,X_n \}$ is a random sample in ${\mathbb R}^d$ , $d\ge 2$ , with density f, and let $(r_n)$ be a sequence of connectivity radii with $r_n\to 0$ as $n\to\infty$ . Furthermore, f is assumed to be a.e. continuous and bounded. Let $\lambda$ denote the Lebesgue measure in a given dimension. All of the proofs are provided in Sections 4.2–4.4. All the examples here are more or less concerned with a Čech complex defined on ${\mathcal{X}}_n$ with connectivity radius $r_n$ .
Definition 3.1. Given a set ${\mathcal{X}}=\{ x_1,\ldots,x_n \}$ of points in ${\mathbb R}^d$ and a positive number $r>0$ , we define a Čech complex $\check{C}({\mathcal{X}},r)$ as follows.
• The 0-simplices are the points in ${\mathcal{X}}$ .
• The p-simplex $[x_{i_0}, \ldots, x_{i_p}]$ , $1 \le i_0 < \dots < i_p \le n$ , belongs to $\check{C}({\mathcal{X}},r)$ if
\[ \bigcap_{\ell=0}^p B(x_{i_\ell}, r/2) \neq \emptyset, \]
where B(x, r) is a d-dimensional closed ball of radius r centered at $x\in {\mathbb R}^d$ .
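The membership condition in Definition 3.1 can be tested through the smallest enclosing ball: closed balls of radius $r/2$ around finitely many points share a common point if and only if those points fit inside some closed ball of radius $r/2$ . The following is a minimal sketch for edges and triangles in the plane ( $d=2$ , $p\le 2$ ), an illustrative special case; the function names are hypothetical.

```python
import math


def min_enclosing_radius(a, b, c):
    """Radius of the smallest disc containing three points in the plane."""
    s1, s2, longest = sorted([math.dist(a, b), math.dist(b, c), math.dist(a, c)])
    # Right, obtuse, or degenerate triangle: the longest side is a diameter.
    if longest**2 >= s1**2 + s2**2:
        return longest / 2.0
    # Acute triangle: the smallest enclosing disc is the circumscribed disc,
    # whose radius is (product of side lengths) / (4 * area).
    area = abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])) / 2.0
    return s1 * s2 * longest / (4.0 * area)


def is_cech_edge(a, b, r):
    """True if B(a, r/2) and B(b, r/2) intersect, i.e. ||a - b|| <= r."""
    return math.dist(a, b) <= r


def is_cech_triangle(a, b, c, r):
    """True if B(a, r/2), B(b, r/2), B(c, r/2) share a common point."""
    return min_enclosing_radius(a, b, c) <= r / 2.0


# Equilateral triangle with unit sides (illustrative data).
a, b, c = (0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)
print(is_cech_edge(a, b, 1.1), is_cech_triangle(a, b, c, 1.1))
print(is_cech_triangle(a, b, c, 1.2))
```

For the unit equilateral triangle the three edges are present as soon as $r\ge 1$ , whereas the 2-simplex appears only once $r/2$ reaches the circumradius $1/\sqrt{3}$ ; for radii in between, the complex contains a 1-dimensional cycle.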
3.1. Persistent Betti number
Our first application is concerned with the persistent Betti number. Because of the recent development of topological data analysis, the (persistent) Betti number has been intensively studied as a basic topological invariant representing, roughly, the creation and destruction of topological cycles of various dimensions [Reference Bobrowski and Mukherjee4, Reference Hiraoka, Shirai and Trinh12, Reference Kahle15, Reference Kahle and Meckes17, Reference Krebs and Polonik20, Reference Yogeshwaran and Adler36, Reference Yogeshwaran, Subag and Adler37]. First we define a family
of Čech complexes over a scaled random sample $r_n^{-1}{\mathcal{X}}_n$ . Note that (3.1) constitutes a nested sequence of Čech complexes satisfying the monotonicity property $\check{C}({\mathcal{X}}_n, r_n s) \subset \check{C}({\mathcal{X}}_n, r_nt)$ for all $0 < s \le t <\infty$ .
Now we fix a non-negative integer k and let $Z_k( \check{C} ({\mathcal{X}}_n, r_nt))$ be the kth cycle group of $\check{C} ({\mathcal{X}}_n, r_nt)$ , and let $B_k( \check{C} ({\mathcal{X}}_n, r_nt) )$ be the kth boundary group of the same complex. Then $H_k ( \check{C} ({\mathcal{X}}_n, r_nt) )\;:\!=\; Z_k( \check{C} ({\mathcal{X}}_n, r_nt) )/ B_k ( \check{C} ({\mathcal{X}}_n, r_nt) )$ is the kth homology group, representing the elements of (non-trivial) k-dimensional cycles, which can be interpreted as the boundary of a $(k+1)$ -dimensional body. The kth Betti number
denotes the rank of $H_k( \check{C} ({\mathcal{X}}_n, r_nt) )$ . Loosely speaking, (3.2) counts the number of k-dimensional cycles in $\check{C} ({\mathcal{X}}_n, r_nt)$ . Moreover, (3.2) can be extended to the kth persistent Betti number, defined by
More intuitively, (3.3) represents the number of k-dimensional cycles that appear in (3.1) before time s and remain alive at time t. Clearly $\beta_{k,n}(t,t)$ reduces to the ordinary Betti number in (3.2). Readers wishing to have a more rigorous coverage of these algebraic topological notions may refer to [Reference Edelsbrunner, Letscher and Zomorodian7], [Reference Hatcher11], and [Reference Munkres26].
To provide a precise setup for the theorem below, we restrict the range of k to $\{ 1,\ldots,d-1 \}$ , while taking $m\ge 1$ and $0\le s_i \le t_i < \infty$ for $i=1,\ldots,m$ . For $(x_1,\ldots,x_{k+2})\in ({\mathbb R}^d)^{k+2}$ and $r>0$ , we define
Here (3.4) requires that a point set $\{ x_1,\ldots,x_{k+2} \}$ in ${\mathbb R}^d$ forms a single k-dimensional cycle with connectivity radius r. Furthermore, we let
where $\textbf{s}=(s_1,\ldots,s_m)$ and $\;\textbf{t}=(t_1,\ldots,t_m)$ . It is then easy to check that H satisfies conditions (H1)–(H4). The theorem below derives the exact rate (up to the scale) of the probability that the kth persistent Betti number becomes non-zero, when $n^{k+2}r_n^{d(k+1)} \to 0$ as $n\to\infty$ .
Theorem 3.1. Assume that $n^{k+2}r_n^{d(k+1)}\to 0$ as $n\to\infty$ . Then, as $n\to\infty$ , we have
where $\textbf{y} = (y_1,\ldots,y_{k+1})\in ({\mathbb R}^d)^{k+1}$ , $\textbf{0}=(0,\ldots,0)\in {\mathbb R}^m$ , and $C_{k+2}$ is given in Theorem 2.1.
Additionally, for $u_i\ge0$ , $u_i \neq 1$ , $i=1,\ldots,m$ , with $\max_{1\le i \le m} u_i >0$ , we have, as $n\to\infty$ ,
Note that the limit in (3.7) is equal to 0 whenever $\max_{1\le i \le m}u_i >1$ . As a direct consequence of (3.7), we obtain that for every $a_i \in \{0,1 \}$ , $i=1,\ldots,m$ with $\sum_{i=1}^m a_i \ge 1$ ,
We observe that $h_{s_i}(0,\textbf{y})h_{t_i}(0,\textbf{y})=1$ if and only if the point set $\{0,\textbf{y}\}=\{0,y_1,\ldots,y_{k+1}\}\in ({\mathbb R}^d)^{k+2}$ forms a single k-cycle before time $s_i$ , such that this cycle is still alive at time $t_i$ .
3.2. Volume of simplices
We next consider an application to the volume functional of simplices. Fix $d\ge 2$ and $1\le k\le d$ . For $(x_1,\ldots,x_{k+1}) \in ({\mathbb R}^d)^{k+1}$ , let
be the k-simplex spanned by $x_1,\ldots,x_{k+1}$ . Furthermore, $V_k ( [x_1,\ldots,x_{k+1}] )$ denotes its k-dimensional volume. By slightly abusing notation, we write $V_k({\mathcal{Y}}) = V_k ( [y_1,\ldots,y_{k+1}] )$ for ${\mathcal{Y}}=(y_1,\ldots,y_{k+1})\in ({\mathbb R}^d)^{k+1}$ . The objective of this section is to explore the asymptotic behavior of
where $b_i \ge 0$ and $T_i>0$ for $i=1,\ldots,m$ . If one takes $b_i=0$ , the ith component of $F_{k,n}$ represents the k-simplex counts of a Čech complex $\check{C}({\mathcal{X}}_n, r_nT_i)$ . In the case of $b_i=1$ , the ith component of $F_{k,n}$ represents the total volume of these k-simplices. Furthermore, if $k=1$ and $b_i=1$ , the ith component of $F_{1,n}$ is the total edge length in a random geometric graph with radius $r_nT_i$ .
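The k-dimensional volume $V_k$ entering $F_{k,n}$ can be computed from the Gram matrix of the edge vectors $x_2-x_1,\ldots,x_{k+1}-x_1$ , whose determinant equals $(k!\,V_k)^2$ . The sketch below is an illustrative implementation of this standard identity; the function name `simplex_volume` and the degeneracy threshold `1e-15` are assumptions of the example, and the scaling and indicator terms of $F_{k,n}$ are not reproduced here.

```python
import math


def simplex_volume(vertices):
    """k-dimensional volume of the simplex spanned by k + 1 points in R^d."""
    base, others = vertices[0], vertices[1:]
    k = len(others)
    edges = [[x - b for x, b in zip(v, base)] for v in others]
    # Gram matrix of the edge vectors; its determinant is (k! * V_k)^2.
    gram = [[sum(p * q for p, q in zip(u, w)) for w in edges] for u in edges]
    det = 1.0
    for col in range(k):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, k), key=lambda row: abs(gram[row][col]))
        if abs(gram[pivot][col]) < 1e-15:
            return 0.0  # degenerate simplex (assumed absolute threshold)
        if pivot != col:
            gram[col], gram[pivot] = gram[pivot], gram[col]
            det = -det
        det *= gram[col][col]
        for row in range(col + 1, k):
            factor = gram[row][col] / gram[col][col]
            for j in range(col, k):
                gram[row][j] -= factor * gram[col][j]
    return math.sqrt(max(det, 0.0)) / math.factorial(k)


print(simplex_volume([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]))
print(simplex_volume([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]))
```

For $k=1$ the function returns the edge length, which is the quantity summed in the total edge length of the random geometric graph mentioned above.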
The corollary below investigates the probability that each component of the scaled $F_{k,n}$ exceeds a positive constant when $n^{k+1}r_n^{dk} \to 0$ as $n\to\infty$ .
Corollary 3.1. Assume that $n^{k+1}r_n^{dk}\to 0$ as $n\to\infty$ . Then, as $n\to\infty$ ,
where $\textbf{y}=(y_1,\ldots,y_k)\in ({\mathbb R}^d)^k$ , $\textbf{0}=(0,\ldots,0)\in {\mathbb R}^m$ , and
Furthermore, for all $u_i > 0$ , $i=1,\ldots,m$ , we have as $n\to\infty$ ,
3.3. Morse critical points and values of min-type distance function
To understand the topology of random Čech complexes, Bobrowski and Adler [Reference Bobrowski and Adler3] proposed an approach based on an extension of Morse theory to ‘min-type’ distance functions. For a finite set ${\mathcal{Z}}$ of points in ${\mathbb R}^d$ , we define a distance function $d_{{\mathcal{Z}}}\colon {\mathbb R}^d\to [0,\infty)$ by
\[d_{{\mathcal{Z}}}(x) \;:\!=\; \min_{z\in {\mathcal{Z}}} \|x-z\|, \quad x\in {\mathbb R}^d. \tag{3.11}\]
Since $d_{\mathcal{Z}}$ is not differentiable, the classical definition of critical points does not apply to $d_{\mathcal{Z}}$ . Nevertheless, one can still extend a notion of critical points, as well as their Morse critical index, to the min-type distance function as in (3.11) by means of an approach in [Reference Gershkovich and Rubinstein10]. More precisely, following the notations and definitions in [Reference Bobrowski and Adler3], we say that $c\in {\mathbb R}^d$ is a critical point of $d_{\mathcal{Z}}$ with index $1\le k \le d$ , if there exists a set ${\mathcal{Y}}\subset {\mathcal{Z}}$ of $k+1$ points, such that:
(i) the points in ${\mathcal{Y}}$ are in general position,
(ii) $d_{\mathcal{Z}}(c)=\|c-y\|$ for all $y\in {\mathcal{Y}}$ , while $d_{\mathcal{Z}}(c) < \|c-z\|$ for all $z\in {\mathcal{Z}}\setminus {\mathcal{Y}}$ ,
(iii) $c\in \operatorname{conv}^{\circ} ({\mathcal{Y}})$ , where $\operatorname{conv}^\circ ({\mathcal{Y}})$ represents the interior of the convex hull spanned by the points in ${\mathcal{Y}}$ .
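Conditions (i)–(iii) become very explicit for index $k=1$ , where ${\mathcal{Y}}$ consists of two points. The only candidate for c is then the midpoint of the segment, and the critical value is half the distance between the two points. The sketch below covers this special case only; the function name `index_one_critical_point` is hypothetical, and the points of the cloud are assumed to be distinct.

```python
import math


def index_one_critical_point(y0, y1, cloud):
    """Return (c, R) if {y0, y1} generates an index-1 critical point, else None."""
    if y0 == y1:
        return None  # (i) two points are in general position iff they are distinct
    # (iii) the only point of the open segment equidistant from y0 and y1
    center = tuple((a + b) / 2.0 for a, b in zip(y0, y1))
    radius = math.dist(y0, y1) / 2.0
    # (ii) every other point of the cloud must be strictly farther than R from c
    for z in cloud:
        if z != y0 and z != y1 and math.dist(center, z) <= radius:
            return None
    return center, radius


cloud = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.5)]
print(index_one_critical_point(cloud[0], cloud[1], cloud))
print(index_one_critical_point(cloud[0], cloud[2], cloud))
```

In this toy cloud the first pair fails condition (ii), because the third point lies closer to the midpoint than the pair itself, while the second pair does generate a critical point.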
By virtue of the nerve lemma (see e.g. [Reference Björner1, Theorem 10.7]), for each $r>0$ , the sublevel set $d_{{\mathcal{X}}_n}^{-1}(\!-\!\infty, r]$ is homotopy equivalent to a Čech complex $\check{C} ({\mathcal{X}}_n, 2r)$ . By the standard application of Morse theory as well as the nerve lemma, Bobrowski and Adler [Reference Bobrowski and Adler3] justified that given a sequence $r_n\to 0$ , $n\to\infty$ , the number of critical points of $d_{{\mathcal{X}}_n}$ with index k, such that their critical values are less than $r_n$ , behaves very similarly to $\beta_{k-1} ( \check{C}({\mathcal{X}}_n,2r_n) )$ . A similar analysis was conducted in [Reference Yogeshwaran and Adler36], in the case when a set of points is sampled from a stationary point process. Additionally, Bobrowski and Mukherjee [Reference Bobrowski and Mukherjee4] studied a more general case for which random points are supported on an $\ell$ -dimensional manifold $\mathcal M \subset {\mathbb R}^d$ ( $\ell < d$ ).
In this setting, we aim to study the asymptotic theory of
where $b_i \ge 0$ and $T_i >0$ , $i=1,\ldots, m$ . Moreover, $\gamma({\mathcal{Y}})$ denotes a critical point of $d_{{\mathcal{X}}_n}$ with index k, generated by the points in ${\mathcal{Y}}$ , $R({\mathcal{Y}})$ is its critical value, and ${\mathcal{U}}({\mathcal{Y}})$ is an open ball in ${\mathbb R}^d$ with radius $R({\mathcal{Y}})$ centered at $\gamma({\mathcal{Y}})$ . If $b_i=0$ , the ith component of $S_{k,n}$ represents the number of critical points of index k with critical values less than $r_nT_i$ . In the case of $b_i=1$ , the ith component of $S_{k,n}$ represents the sum of those critical values. The corollary below gives the rate of the probability that the appropriately scaled $S_{k,n}$ is asymptotically non-trivial when $n^{k+1}r_n^{dk} \to 0$ as $n\to\infty$ .
Corollary 3.2. Assume that $n^{k+1}r_n^{dk}\to 0$ as $n\to\infty$ . Then, as $n\to\infty$ ,
where $\textbf{y}=(y_1,\ldots,y_k) \in ({\mathbb R}^d)^k$ and
Moreover, for all $0< u_i \le T_i^{b_i}$ , $i=1,\ldots,m$ ,
4. Proofs
4.1. Proof of Theorem 2.1
The main machinery for our proof is a certain asymptotic result of point processes induced by the statistics in (2.6). More precisely, we consider the point processes
where $\delta_z$ denotes the Dirac measure at $z\in {\mathbb R}^m$ .
For the rigorous description of the asymptotic behavior of (4.1), we need the following concepts. The main references here are [Reference Hult and Lindskog13], [Reference Hult and Samorodnitsky14], and [Reference Lindskog, Resnick and Roy25]. Recall first that the vague topology on $M_p(E)$ is metrizable as a complete, separable metric space. The metric that induces the vague topology is called the vague metric, and its explicit form is given in the proof of Proposition 3.17 of [Reference Resnick32]. Let $\emptyset \in M_p(E)$ be the null measure that assigns zeros to all Borel-measurable sets in E, and let $B_{\emptyset, r}$ denote an open ball of radius $r>0$ centered at $\emptyset$ in the vague metric. Let ${\mathcal{M}}_0 = {\mathcal{M}}_0( M_p(E) )$ denote the space of Borel measures on $M_p(E)$ , the restriction of which to $M_p(E)\setminus B_{\emptyset, r}$ is finite for all $r>0$ . Moreover, define $\mathcal C_0 = \mathcal C_0(M_p(E) )$ to be the space of continuous and bounded real-valued functions on $M_p(E)$ that vanish in a neighborhood of $\emptyset$ . Given $\eta_n, \eta \in {\mathcal{M}}_0$ , we say that $\eta_n$ converges to $\eta$ in the ${\mathcal{M}}_0$ -topology, denoted by $\eta_n\to \eta$ in ${\mathcal{M}}_0$ , if it holds that
\[\int_{M_p(E)} g(\xi)\,\eta_n({\textrm{d}} \xi) \to \int_{M_p(E)} g(\xi)\,\eta({\textrm{d}} \xi) \quad \text{as } n\to\infty \text{ for every } g\in \mathcal C_0.\]
The proposition below reveals the required asymptotics of (4.1). The result may be of independent interest. It parallels Theorem 4.1 of [Reference Hult and Samorodnitsky14] and Theorems 3.1 and 4.1 of [Reference Fasen and Roy8], the authors of which studied large deviations for point processes based on a stationary sequence with heavy-tailed marginals and non-trivial dependency. As in the case of Theorem 2.1, the limits of $N_{k,n}^{(i)}$ , $i=1,2$ , coincide with one another, due to the fact that the indicator $c_n$ at (2.4) tends to 1 as $n\to\infty$ .
Proposition 4.1. Under the assumptions in Theorem 2.1, for each $i=1,2$ , we have
Before commencing the proof, we observe that $M_p(E)$ is not locally compact; thus, unlike Theorem 2.1, the convergence in Proposition 4.1 cannot be treated in terms of vague topology. In contrast, the theory of ${\mathcal{M}}_0$ -topology requires only that the underlying space be complete and separable. Since $M_p(E)$ is complete and separable (see Proposition 3.17 in [Reference Resnick32]), one can exploit ${\mathcal{M}}_0$ -topology as an appropriate topology for the convergence in Proposition 4.1.
Proof of Proposition 4.1. Since the proofs of the two statements are very similar in nature, we prove the case $i=2$ only. Given $U_1, U_2 \in C_K^+(E)$ and $\epsilon_1, \epsilon_2> 0$ , define $F_{U_1, U_2, \epsilon_1, \epsilon_2}\colon M_p(E)\to [0,1]$ by
where $(a)_+ = a$ if $a\ge 0$ and 0 otherwise, and
It is elementary to check that $F_{U_1, U_2, \epsilon_1, \epsilon_2} \in \mathcal C_0$ .
For ease of description, we introduce several shorthand notations: for $\ell\ge 1$ and $n\ge 1$ , let
be the collection of ordered $\ell$ -tuples of positive integers. Given a random sample ${\mathcal{X}}_n=\{ X_1,\ldots, X_n \}\subset {\mathbb R}^d$ , we write
Using these notations, we denote
According to Theorem A.2 in [Reference Hult and Samorodnitsky14], the required statement follows if one can show that
for every $U_1, U_2 \in C_K^+(E)$ and $\epsilon_1, \epsilon_2>0$ . For each $\ell=1,2$ , $U_\ell$ has compact support in E, so there exists $\zeta>0$ such that
( $\operatorname{supp}\!(U_\ell)$ denotes the support of $U_\ell$ ). Define
then we have
We first show that $B_n$ tends to 0 as $n\to\infty$ . Since $0 \le \Theta_n \le 1$ and $\|G_n({\mathcal{X}}_\textbf{i}, {\mathcal{X}}_n; \;\textbf{t})\| \le \|H_n({\mathcal{X}}_\textbf{i})\|$ for all $\textbf{i} \in {\mathcal{I}}_{k,n}$ , we have
Performing the change of variables by $x_i = x+r_ny_{i-1}$ , $i=1,\ldots,k$ (with $y_0 \equiv 0$ ) together with the translation invariance of H as well as (2.1),
By property (H3) of H, the integral in the last term is finite. Since $n^k r_n^{d(k-1)} \to 0$ as $n\to \infty$ , we obtain $C_n \to 0$ , $n\to\infty$ . Next, turning to $D_n$ , we change the variables by $x_i = x + r_n y_{i-1}$ , $i=1,\ldots,2k-\ell$ (with $y_0 \equiv 0$ ), to obtain that
By property (H3) of H, the integral in the last term is again finite. As $nr_n^d\to 0$ , $n\to\infty$ , we can obtain $D_n \to 0$ , $n\to\infty$ , which concludes that $B_n \to 0$ , $n\to\infty$ , as desired.
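The convergence $nr_n^d\to 0$ used in the last step is not an additional assumption. A short derivation from the standing condition $n^k r_n^{d(k-1)}\to 0$ , written out here for completeness, is as follows.

```latex
\[
  (n r_n^d)^{k-1} \;=\; n^{k-1} r_n^{d(k-1)} \;\le\; n^{k} r_n^{d(k-1)} \;\to\; 0,
  \qquad n\to\infty .
\]
```

Since $k\ge 2$ , taking the $(k-1)$ th root gives $nr_n^d\to 0$ as $n\to\infty$ .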
Returning to $A_n$ in (4.8), we observe that
are disjoint. Hence we can see from (4.7) that
Repeating the same argument as that for proving $B_n\to0$ , $n\to\infty$ , it is not hard to see that $F_n\to 0$ as $n\to\infty$ . Assuming without loss of generality that $0\le t_1 \le \dots \le t_m <\infty$ , we divide $E_n$ into two terms:
where $c_{n,m}({\mathcal{X}}_\textbf{i}, {\mathcal{X}}_n; \;\textbf{t})$ denotes the mth element of $c_n({\mathcal{X}}_\textbf{i}, {\mathcal{X}}_n; \;\textbf{t})$ (see (2.4)). Of the last two terms, we show that $J_n$ is negligible as $n\to\infty$ . Indeed, by (4.7) and $\|G_n({\mathcal{X}}_\textbf{i}, {\mathcal{X}}_n; \;\textbf{t})\| \le \|H_n({\mathcal{X}}_\textbf{i})\|$ , we see that
Then the right-hand side of (4.10) is equal to
In the above, ${\mathcal{B}}({\mathcal{X}}_\textbf{i};\; r_nt_m)$ represents the union of balls of radius $r_nt_m$ around the points in ${\mathcal{X}}_\textbf{i}=\{ X_{i_1},\ldots,X_{i_k} \}$ , that is,
\[{\mathcal{B}}({\mathcal{X}}_\textbf{i};\; r_nt_m) = \bigcup_{j=1}^k B(X_{i_j}, r_nt_m).\]
By the change of variables $x_i=x+r_ny_{i-1}$ , $i=1,\ldots,k$ (with $y_0\equiv0$ ) and the translation invariance of H, the last expression in (4.11) becomes
where $\textbf{y}=(y_1,\ldots,y_{k-1})\in ({\mathbb R}^d)^{k-1}$ . For every $x\in {\mathbb R}^d$ and $\textbf{y}=(y_1,\ldots,y_{k-1}) \in ({\mathbb R}^d)^{k-1}$ ,
so that
Hence we have obtained
Now the dominated convergence theorem, as well as property (H3) of H, ensures that the last expression in (4.12) goes to 0 as $n\to\infty$ . Thus $J_n\to0$ as $n\to\infty$ .
From all of the convergence results derived thus far, we have $\eta_n(F_{U_1,U_2,\epsilon_1,\epsilon_2}) = I_n +{\textrm{o}}(1)$ as $n\to\infty$ . We note that if $c_{n,m}({\mathcal{X}}_\textbf{i},{\mathcal{X}}_n; \;\textbf{t})=1$ , then all the other elements in $c_n({\mathcal{X}}_\textbf{i},{\mathcal{X}}_n; \;\textbf{t})$ are equal to 1, and therefore $ G_n({\mathcal{X}}_\textbf{i}, {\mathcal{X}}_n; \;\textbf{t}) = H_n({\mathcal{X}}_\textbf{i})$ . By the conditioning on ${\mathcal{X}}_\textbf{i}$ as in (4.11), as well as the same change of variables as in (4.12), we can see that
By the continuity of f, it holds that
Furthermore, the integrand in (4.14) is bounded above by $f(x) \| f \|_\infty^{k-1} {\mathbb 1} \{ \| H(0,\textbf{y}) \| > \zeta \}$ , which is clearly integrable in $(x,\textbf{y})\in({\mathbb R}^d)^k$ . Therefore the dominated convergence theorem gives that as $n\to\infty$ ,
Proof of Theorem 2.1. As in the case of Proposition 4.1, the proofs of the two statements are similar in nature, so we again prove the case $i=2$ only. Let $0 < \epsilon < 1$ , and define $V_\epsilon\colon E \to E$ by
Next we define a map $T_{V_\epsilon}\colon M_p(E) \to E$ by
Below we only consider $\epsilon \in (0,1)$ , so that
Note that (4.15) holds except for at most countably many $\epsilon\in (0,1)$ . Now we claim that
where $\eta_n$ and $\eta$ are defined at (4.5) and (4.6) respectively. Equivalently, we aim to show that
for every $F\in C_K^+(E)$ . To show this, by [Reference Resnick32, Proposition 3.12] it suffices to verify that
for all relatively compact sets $A\subset E$ with $\eta\circ T_{V_\epsilon}^{-1} (\partial A)=0$ , where $\partial A$ denotes the boundary of A. According to [Reference Hult and Lindskog13, Theorem 2.4] along with (4.2), we must show
where $\emptyset\in M_p(E)$ is the null measure and $\overline{B}$ denotes the closure of B. For the proof of the first requirement in (4.17), it is elementary to check that
where $\mathcal D_{T_{V_\epsilon}}$ is the collection of $\xi\in M_p(E)$ such that $T_{V_\epsilon}$ is discontinuous at $\xi$ . It then follows from (4.15) and (4.18) that
Next, suppose for contradiction that $\emptyset \in \overline{T_{V_\epsilon}^{-1}(A)}$ . Then there exists a sequence $(\xi_n)\subset T_{V_\epsilon}^{-1}(A)$ such that $\xi_n\stackrel{v}{\to} \emptyset$ in $M_p(E)$ . Since $T_{V_\epsilon}$ is continuous at $\emptyset$ with $T_{V_\epsilon}(\emptyset)=\textbf{0} = (0,\ldots,0)\in {\mathbb R}^m$ , we have $T_{V_\epsilon}(\xi_n)\to \textbf{0}$ as $n\to\infty$ . This implies $\textbf{0}\in \overline{A}$ , which however contradicts the relative compactness of A in E.
For ease of description, using the notation in (4.3) and (4.4) we denote
Then the entire proof will be completed if we can verify that
for every $F\in C_K^+(E)$ . To begin, we bound $| {\widetilde \eta}_n(F) - {\widetilde \eta}(F) |$ as follows:
Because of (4.16), we have
It thus remains to demonstrate that
For the proof of (4.19), note that there exists $\delta_0 > 0$ so that $\operatorname{supp}\!(F) \cap {\mathbb R}^m \subset \{ x\in{\mathbb R}^m\colon \|x\| > \delta_0 \}$ . We also fix a constant $\delta' \in (0,\delta_0/2)$ . Then, for every $0 < \delta < \delta_0/2$ , we have
Since F is bounded, it follows from the dominated convergence theorem and property (H3) of H that
Next, turning our attention to $B_n$ , we can see that
To show this, suppose that $\| H(0,\textbf{y}) \| \le \delta' < \delta_0/2$ . Then $F( H(0,\textbf{y}) ) = 0$ , and further,
which implies that $F( V_\epsilon(H(0,\textbf{y})) ) = 0$ . From (4.21) we have
where
is the modulus of continuity of F.
Combining all of these results, we conclude that
for all $0<\delta<\delta_0/2$ . Finally, letting $\delta\downarrow 0$ , we find that $\lim_{\epsilon\to0} | \eta\circ T_{V_\epsilon}^{-1}(F)-{\widetilde \eta}(F) |=0$ as F is uniformly continuous on ${\mathbb R}^m$ .
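The last step uses only the fact that the modulus of continuity of a uniformly continuous function vanishes as $\delta\downarrow 0$ . As a rough numerical illustration, the sketch below assumes the standard definition $\omega_F(\delta)=\sup_{\|x-y\|\le\delta}|F(x)-F(y)|$ and a hypothetical one-dimensional test function `F` whose support is compact and bounded away from the origin. The grid approximation and all names are illustrative and not part of the paper.

```python
def F(x):
    """Hypothetical test function: a tent on [1, 3] with peak 1 at x = 2.

    It is continuous, compactly supported, and vanishes near the origin.
    """
    return max(0.0, 1.0 - abs(x - 2.0))


def modulus_of_continuity(func, delta, lo=0.0, hi=4.0, step=1.0 / 64):
    """Grid approximation of sup |func(x) - func(y)| over |x - y| <= delta."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    width = int(delta / step)  # number of grid steps that fit within delta
    best = 0.0
    for i, x in enumerate(grid):
        for j in range(i + 1, min(i + width, n) + 1):
            best = max(best, abs(func(x) - func(grid[j])))
    return best


deltas = (0.5, 0.25, 0.125, 0.0625)
omegas = [modulus_of_continuity(F, d) for d in deltas]
```

Since this particular `F` is 1-Lipschitz, the computed values shrink in proportion to $\delta$ , mirroring the limit taken above.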
Next, let us proceed to the proof of (4.20). We fix $\delta_0$ and $\delta'$ in the same way as above. Then, for $0<\delta<\delta_0/2$ ,
Noting that F is bounded, while assuming without loss of generality that $0 \le t_1 \le \dots \le t_m <\infty$ , we can bound $C_n$ as follows:
where $c_{n,m}({\mathcal{X}}_\textbf{i},{\mathcal{X}}_n; \;\textbf{t})$ was defined in (4.9). For the last two terms, we have as $n\to\infty$ ,
where the last convergence is obtained by repeating the same argument as that for proving that (4.10) converges to 0 as $n\to\infty$ . For the asymptotics of $E_n$ , recall that $G_n({\mathcal{X}}_\textbf{i},{\mathcal{X}}_n; \;\textbf{t})=H_n({\mathcal{X}}_\textbf{i})$ whenever $c_{n,m}({\mathcal{X}}_\textbf{i},{\mathcal{X}}_n; \;\textbf{t})=1$ . Therefore
Making the change of variables by $x_i=x+r_ny_{i-1}$ , $i=1,\ldots,k$ (with $y_0\equiv 0$ ) and using the translation invariance of H,
where the last convergence is obtained as a consequence of the dominated convergence theorem and condition (H4) of H. Combining (4.22) and (4.23), we conclude that $\lim_{\epsilon\to 0}\limsup_{n\to\infty} C_n=0$ . Next, the same reasoning as in (4.21) yields that, for every $0 < \delta < \delta_0/2$ ,
Therefore, for every $0 < \delta < \delta_0/2$ ,
The rightmost term above tends to 0 as $\delta \downarrow 0$ because F is uniformly continuous. This completes the proof.
4.2. Proof of Theorem 3.1
Before proving Theorem 3.1, we state a lemma that gives upper and lower bounds for the kth persistent Betti number in (3.3). The lemma uses the notation of (4.3) and (4.4).
Lemma 4.1. (Lemma 4.1 in [Reference Owada28].) Under the assumptions of Theorem 3.1, we have, for all $0\le s \le t \le \infty$ ,
where
Moreover, for all $0<t<\infty$ ,
Proof of Theorem 3.1. We first define a scaled version of H in (3.5) by
For a subset ${\mathcal{Y}}$ of $k+2$ points in ${\mathbb R}^d$ and a finite point set ${\mathcal{Z}}\supset {\mathcal{Y}}$ in ${\mathbb R}^d$ , define $c({\mathcal{Y}}, {\mathcal{Z}}; \;\textbf{t})$ and $c_n({\mathcal{Y}},{\mathcal{Z}}; \;\textbf{t})$ as in (2.2) and (2.4) respectively. Analogously to (2.3) and (2.5), we also define
Since Theorem 2.1 yields
(3.6) will follow, provided that for every $F\in C_K^+( [0,\infty]^m \setminus \{ \textbf{0} \})$ ,
where
As in the proof of Theorem 2.1, we fix $\delta_0, \delta' >0$ so that
and $\delta'\in (0,\delta_0/2)$ . Then, for every $0<\delta<\delta_0/2$ , the absolute value of (4.26) is bounded by
By Markov’s inequality and $\|F\|_\infty<\infty$ ,
For ease of description, we assume that $0 \le t_1 \le \dots \le t_m <\infty$ . Then Lemma 4.1 gives that
The last inequality is due to the fact that $L_r$ is non-decreasing in r. By virtue of this bound and (4.25) in Lemma 4.1,
Repeating the same argument as in (4.21) and (4.24), one can obtain that
Therefore, for every $0 < \delta <\delta_0/2$ ,
Finally, the right-hand side converges to 0 as $\delta\downarrow 0$ . Hence $\limsup_{n\to\infty} (A_n+B_n)=0$ , and we have established (3.6).
Applying the Portmanteau theorem for vague convergence (see [Reference Resnick32, Proposition 3.12]) to (3.6), we can see that as $n\to\infty$ ,
for all $u_i\ge 0$ , $u_i \neq 1$ , $i=1,\ldots,m$ , with $\max_{1\le i \le m}u_i >0$ . By the customary change of variables, it is elementary to show that as $n\to\infty$ ,
4.3. Proof of Corollary 3.1
Proof. We need to rewrite $F_{k,n}$ in the notation of Theorem 2.1. First it is easy to show that H in (3.9) fulfills conditions (H1)–(H4). Letting $H_n$ be defined as in (2.1), one can write
Then (3.8) is an easy consequence of Theorem 2.1. Note also that
Finally, (3.10) can be obtained from (3.8) and (4.29) as well as the Portmanteau theorem for vague convergence.
4.4. Proof of Corollary 3.2
Proof. As in the proof of Corollary 3.1, one needs to reformulate $S_{k,n}$ in the notation of Theorem 2.1. First we notice that H in (3.13) satisfies conditions (H1)–(H4). Define $H_n$ by (2.1), and for a subset ${\mathcal{Y}}\subset {\mathbb R}^d$ with $|{\mathcal{Y}}|=k+1$ and a finite ${\mathcal{Z}}\supset {\mathcal{Y}}$ in ${\mathbb R}^d$ ,
Note that we have defined (4.30) in a way different from the original definition in (2.2). Then, unlike the definition in (2.4),
does not depend on $n\ge 1$ . Defining $G_n({\mathcal{Y}},{\mathcal{Z}}) \;:\!=\; H_n({\mathcal{Y}})\circ c_n({\mathcal{Y}},{\mathcal{Z}}) = H_n({\mathcal{Y}})\circ c({\mathcal{Y}},{\mathcal{Z}})$ as in (2.5), we have
For our purposes we need to apply Theorem 2.1 to (4.31). Before doing so, however, one must slightly modify the proof of Theorem 2.1, because the definition of c has been changed to (4.30). Below, we show that for every $x\in {\mathbb R}^d$ and $\textbf{y}=(y_1,\ldots,y_k) \in ({\mathbb R}^d)^k$ ,
In fact this replaces the argument in (4.13). Once (4.32) is established, the remainder of the proof of Theorem 2.1 can be adapted in the obvious way. To show (4.32), we note that
so that
where $\theta_d$ is the volume of the unit ball in ${\mathbb R}^d$ . By the Lebesgue differentiation theorem,
Since $nr_n^d \to 0$ as $n\to\infty$ , we have obtained (4.32).
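The two ingredients of this step can be illustrated numerically. The sketch below is not taken from the paper: it uses a standard bivariate Gaussian density as an illustrative choice of f and the centre $x=0$ , where the probability of a ball of radius r has the closed form $1-e^{-r^2/2}$ . It checks that the ball mass divided by $\theta_2 r^2$ approaches $f(0)$ , and that n times the ball mass vanishes for a hypothetical sequence `r_n` with $nr_n^2\to 0$ .

```python
import math

THETA_2 = math.pi                     # volume (area) of the unit ball in R^2
F_AT_ORIGIN = 1.0 / (2.0 * math.pi)   # standard Gaussian density at 0 in R^2


def ball_mass(r):
    """P(||X|| <= r) for X standard Gaussian in R^2: 1 - exp(-r^2 / 2)."""
    return -math.expm1(-0.5 * r * r)


def averaged_density(r):
    """Ball mass divided by the ball volume theta_2 * r^2."""
    return ball_mass(r) / (THETA_2 * r * r)


def r_n(n):
    """Hypothetical radius sequence r_n = 1/n, so n * r_n^2 = 1/n -> 0."""
    return 1.0 / n


# Lebesgue differentiation: the averaged density approaches f(0) as r -> 0.
ratios = [averaged_density(r) / F_AT_ORIGIN for r in (1.0, 0.1, 0.01)]

# Expected number of sample points in the shrinking ball: n * P(ball) -> 0.
expected_neighbors = [n * ball_mass(r_n(n)) for n in (10, 100, 1000)]
```

In this toy setting the entries of `ratios` increase towards 1 while those of `expected_neighbors` decrease towards 0, which is the mechanism behind (4.32).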
Now we can apply Theorem 2.1 to get (3.12). Furthermore, by the Portmanteau theorem for vague convergence, we obtain that
for all $0 < u_i \le T_i^{b_i}$ , $i=1,\ldots,m$ . By the same calculation as in (4.29), we can show that as $n\to\infty$ ,
Acknowledgement
The author is very grateful for useful comments received from an anonymous referee and an anonymous Associate Editor. These comments helped the author to introduce a number of improvements to the paper.
Funding information
This research was partially supported by the NSF grant DMS-1811428 and the AFOSR grant FA9550-22-0238.
Competing interests
There were no competing interests to declare which arose during the preparation or publication process of this article.