
On a variant of the product replacement algorithm

Published online by Cambridge University Press:  09 January 2024

C.R. Leedham-Green*
Affiliation:
School of Mathematical Sciences, Queen Mary University of London, London, E1 4NS, UK

Abstract

We discuss a variant, named ‘Rattle’, of the product replacement algorithm. Rattle is a Markov chain that returns a random element of a black box group. The limiting distribution of the element returned is the uniform distribution. We prove that, if the generating sequence is long enough, the probability distribution of the element returned converges unexpectedly quickly to the uniform distribution.

Type
Research Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Glasgow Mathematical Journal Trust

1. Introduction

The product replacement algorithm (PRA), see [6], is an algorithm for constructing random elements of a black box group; it arose from a conversation between the author and Leonard Soicher.

The input, as is inevitable for a black box algorithm, consists of a generating sequence $\mathcal X$ for the group $G$, of length $k$, say. For theoretical reasons, it will be assumed that $\mathcal X$ is symmetric; that is to say, $\mathcal X$ is closed under inversion. If the given generating set is not symmetric, it may be enlarged to a symmetric generating set at the cost of at worst doubling its length. The requirement that $\mathcal X$ should be symmetric is not enforced in practice. An integer $n\gt k$ is chosen, and the team $T$ is defined to be an array of elements of $G$, of length $n$, initialised to consist of $\mathcal X$, padded out with copies of the identity. This notation will be used throughout the paper. The operations are certain Nielsen transformations, namely the transformations $R_{ij}^\epsilon$ defined by $T[i]\mapsto T[i]T[j]^\epsilon$, where $1\le i\ne j\le n$ and $\epsilon =\pm 1$, the other elements of $T$ being fixed, and $L_{ij}^\epsilon$, defined similarly with left multiplication. These operations take generating sequences of $G$ to generating sequences. So these operations, chosen independently with the uniform distribution, define a Markov chain whose states are the generating sequences for $G$ of length $n$. If, as will always be assumed, $n$ is sufficiently large for the process to be transitive, then, since the chain is clearly symmetric and aperiodic, by standard Markov chain theory the probability distribution on the set of states tends exponentially fast to the uniform distribution as the length of the chain tends to $\infty$; and after a number of iterations, a random element of $T$ is returned as a random element of $G$.

This raises two issues. The lesser issue arises from the fact that some group elements appear in more generating sequences than others, see [3, 14] and [7]. This problem can be satisfactorily contained by ensuring that $n$ is not too small. The greater issue is to decide how many iterations of the chain are needed to get reasonably close to the uniform distribution on the set of states. That is to say, we want an upper bound to the mixing time. The mixing time depends on the initial state. This initial state will be omitted from the statement of results and from the notation; it is understood that the results hold for any initial state constructed as above.
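
By way of illustration, here is a minimal sketch of PRA in Python. Permutations stored as tuples stand in for black box elements; the helper names `compose` and `invert`, and the example parameters, belong to this sketch, not to the algorithm.

```python
import random

def compose(p, q):
    """Permutation product: (p*q)[i] = p[q[i]], i.e. apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))

def invert(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def pra(generators, n, iterations):
    """Team of length n, random Nielsen moves R/L_{ij}^eps, then a random entry."""
    identity = tuple(range(len(generators[0])))
    T = list(generators) + [identity] * (n - len(generators))  # padded team
    for _ in range(iterations):
        i, j = random.sample(range(n), 2)            # i != j
        g = random.choice([T[j], invert(T[j])])      # T[j]^eps, eps = +/-1
        # R_{ij}^eps or L_{ij}^eps, chosen with equal probability
        T[i] = compose(T[i], g) if random.random() < 0.5 else compose(g, T[i])
    return random.choice(T)

# Example: pseudo-random elements of S_4 from a 4-cycle and a transposition.
gens = [(1, 2, 3, 0), (1, 0, 2, 3)]
print(pra(gens, n=10, iterations=100))
```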

The variant of PRA that we wish to discuss here is called ‘Rattle’, see [11]. It too is a Markov chain, where a state consists of a team $T$, as in PRA, and an additional group element $A$, the accumulator. The operations consist of the same Nielsen transformations on $T$ as before, followed or preceded by the multiplication of $A$ on the left or the right by a random element of $T$, or by its inverse.

One advantage of Rattle over PRA lies in the fact that the limiting distribution of $A$, as the number of iterations tends to $\infty$, is precisely the uniform distribution on $G$. But until now there has been no explicit bound known for the mixing time of Rattle, whereas deep theorems, which we quote in Section 2, give powerful bounds on the mixing time of PRA, provided that the length $n$ of $T$ is not too small. Our main result, Theorem 3.12, states that if $n$ satisfies rather more stringent conditions, the probability distribution of $A$ ultimately converges to the uniform distribution more rapidly than Markov chain theory would allow.

2. The product replacement algorithm

The product replacement algorithm, or PRA, is a Markov chain, as defined in the introduction, whose states are the generating sequences of length $n$ of a finite group $G$ . It is usual to regard $G$ as a black box group; the only significance of this assumption is that we have an upper bound to the order of $G$ .

The transition matrix of a Markov chain is a matrix $P$ whose rows and columns are indexed by the states of the chain, and $P(x,y)$ is the probability that, if the current state is $x$, then the next state will be $y$. In the case of PRA, the transition matrix is symmetric (change the sign of $\epsilon$), and in particular $P$ is a doubly stochastic matrix; that is to say, the entries are non-negative, and the row and column sums are all equal to 1. Since $P$ is symmetric, its eigenvalues are real, and lie in the interval $[-1,1]$, with 1 as an eigenvalue whose multiplicity is 1 if the chain is transitive, and then the uniform distribution gives a corresponding eigenvector. Since PRA is not bipartite, as the initial transition may leave the state unchanged, all eigenvalues of $P$ are strictly greater than $-1$. Given a symmetric Markov chain, with transition matrix $P$, a lazy version of the chain is defined to be the Markov chain with transition matrix $Q$, where $Q(x,x)=(P(x,x)+1)/2$ for every state $x$, and $Q(x,y)=P(x,y)/2$ for all $x\ne y$. In other words, there is an even chance that the process does nothing, and otherwise it behaves as with the original chain. Clearly, $Q$ is doubly stochastic, and its eigenvalues are obtained from the eigenvalues of $P$ by adding 1 and dividing by 2. The point of introducing the lazy version of a Markov chain is that the eigenvalues of $Q$ are all non-negative.
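
The effect of passing to the lazy chain can be seen on a toy example; the matrix below is illustrative and has nothing to do with PRA.

```python
import numpy as np

# A small symmetric doubly stochastic transition matrix P on 3 states.
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.25, 0.25],
              [0.5, 0.25, 0.25]])

Q = (P + np.eye(3)) / 2                  # lazy version: fair coin to stay put
print(np.sort(np.linalg.eigvalsh(P)))   # real eigenvalues in [-1, 1], top one 1
print(np.sort(np.linalg.eigvalsh(Q)))   # (eig(P) + 1)/2: all non-negative
```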

The operations $R_{ij}^\epsilon$ and $L_{ij}^\epsilon$ turn the set of generating sequences of $G$ of length $n$ into a regular directed graph of valency $4n(n-1)$ . The graph is symmetric (change the sign of $\epsilon$ ) so we may regard this as an undirected graph $\Gamma _n$ of valency $2n(n-1)$ , with loops and multiple edges.

We assume throughout that $n$ has been chosen to be sufficiently large for $\Gamma _n$ to be connected. Clearly, $\Gamma _n$ is connected if $n\ge d(G)+r$ , where $r$ is the maximal length of an irredundant generating set of $G$ , as is the case if $n\ge 2\log _2\,|G|$ .

In 2000, Igor Pak proved in [18] the very difficult theorem that the mixing time of the lazy version of PRA, for sufficiently large $n$, is polynomial in $\log \, |G|$, and hence is polynomial in the size of the input. Much stronger results were obtained later; see Theorem 2.2 below.

We need the following standard notation. If $V$ and $W$ are probability distributions on a finite set $S$ , then their total variation distance is defined by

\begin{equation*}||V,W||_{\textrm {tv}}=\max _{B\subset S}|V(B)-W(B)|\end{equation*}

where $V(B)=\sum _{b\in B}V(b)$ , and similarly for $W(B)$ . We will also need the singleton distance defined by

\begin{equation*}||V,W||_{\textrm {s}}=\max _{x\in S}|V(x)-W(x)|.\end{equation*}

Clearly, $||V,W||_{\textrm{s}}\le ||V,W||_{\textrm{tv}}\le |S|\times ||V,W||_{\textrm{s}}$ .

If $g\in G$ is chosen at random using the distribution $V$, and if $U$ is the uniform distribution on $G$, and if $B\subseteq G$, then the probability that $g\in B$ is $|B|/|G| + \epsilon$, where $|\epsilon |\le ||V,U||_{\textrm{tv}}$. This is why the total variation distance is important in applications.
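
For distributions stored as probability vectors, these distances may be computed as in the following illustrative sketch, using the standard identity that the total variation distance equals half the $L^1$ distance.

```python
import numpy as np

def tv_distance(V, W):
    return 0.5 * np.abs(V - W).sum()     # max_B |V(B) - W(B)|

def singleton_distance(V, W):
    return np.abs(V - W).max()           # max_x |V(x) - W(x)|

V = np.array([0.5, 0.3, 0.2])
U = np.ones(3) / 3                       # uniform distribution on 3 points
print(singleton_distance(V, U), tv_distance(V, U))
# singleton <= tv <= |S| * singleton, as noted above
```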

The first polynomial time algorithm for constructing random elements of a black box group is due to Babai, see [2]. His algorithm constructs random elements of $G$ according to a probability distribution $V$ with $||V,U||_{\textrm{tv}}\lt \epsilon$, for any $\epsilon$ that is exponentially small in $\log \, |G|$, using $O(\log \, |G|)$ multiplications after a Monte Carlo pre-processing operation that requires $O\big(\log ^5\,|G|\big)$ group operations.

If $R_x^{(t)}$ is the probability distribution on the set of all states defined by $t$ iterations of a Markov chain $R$ that is transitive, symmetric, and aperiodic, starting at the state $x$ , and if $U$ is the uniform distribution on the set of states, then there exists $r\gt 0$ , independent of $x$ and $t$ , such that $\big|\big|R_x^{(t)},U\big|\big|_{\textrm{tv}} \lt e^{-t/r}$ for all $t$ . We then say that the mixing time of $R$ is less than $r$ . Other similar definitions of the term ‘mixing time’ are used.

A number of applications will be made of the following trivial lemma.

Lemma 2.1. Let $V$ and $W$ be probability distributions on the set of all states $T$ for PRA, and let $V_i$ and $W_i$ be the probability distributions on $G$ defined by restriction to $T[i]$, where $1\le i\le n$. Then, $||V_i,W_i||_{\textrm{tv}} \le ||V,W||_{\textrm{tv}}$. Similarly, if $1 \le i \ne j \le n$, and $\alpha,\beta \in \{1,-1\}$, and if $V_{ij}^{\alpha \beta }$ is the probability distribution on $G$ defined by $T[i]^\alpha T[j]^\beta$ when $T$ is distributed according to $V$, and similarly with $W_{ij}^{\alpha \beta }$, then $\big|\big|V_{ij}^{\alpha \beta }, W_{ij}^{\alpha \beta }\big|\big|_{\textrm{tv}} \le ||V,W||_{\textrm{tv}}$.

Proof. If $B\subseteq G$ let $\overline{B}$ be the set of states $T$ for which $T[i] \in B$ . Then, $|V_i(B)-W_i(B)|=|V(\overline{B})-W(\overline{B})| \le ||V,W||_{\textrm{tv}}$ . The proof of the second assertion is similar.

We rely on the following theorems. In the first, from [17], the graph $\Gamma$ corresponds to our graph $\Gamma _n$, and $\Gamma ^{\prime}$ is the connected component of $\Gamma$ that contains the initial state; so our running assumption above is that $\Gamma ^{\prime}=\Gamma$. Note also that in this theorem, the distinction between $k$ and $n$ has disappeared, because the authors do not pad out $T$ with copies of $1_G$, but rather, in our notation, take $n=k$. The free group of rank $n$ is denoted by ${\mathbb F}_n$.

Theorem 2.2. If $\mathrm{Aut}({\mathbb F}_n)$ has Kazhdan’s property (T) then, for every finite group $G$ generated by $n$ elements, the mixing time $\textrm{mix}$ of PRA, starting at any symmetric state on a connected component $\Gamma ^{\prime}\subseteq \Gamma$ , satisfies $\textrm{mix} \lt C(n)\log \,|G|$ , where $C(n)=O(n^2)$ depends only on $n$ .

Proof. This is the main theorem of [17], the bound $C(n)=O(n^2)$ being given in [8]. The published version of this theorem requires the lazy version of PRA, but does not require the initial state to be symmetric. But Lubotzky has shown me in [15] how the lower bound on the least eigenvalue of the transition matrix given by Biswas in [4], or the earlier less explicit bound due to Breuillard, Green, Guralnick, and Tao in [5], may be used to remove the need for the lazy version of PRA, at the cost of requiring the initial state to be symmetric.

Kazhdan’s property (T) is a much-studied property of topological groups, somewhat weaker than compactness, with various equivalent definitions, that we can ignore here in view of the following result.

Theorem 2.3. If $n\ge 5$ , then $\mathrm{Aut}({\mathbb F}_n)$ satisfies Kazhdan’s property (T).

Proof. For $n\gt 5$, this result was proved in 2018 by Kaluba et al. [8], while Kaluba et al. [9] dealt with the case $n=5$.

We now consider the probability distribution on $G$ obtained by restricting to $T[i]$ for some $i$ .

To deal with this problem, we prove, in Corollary 2.6, a slight generalisation of the main theorem of [16].

Let $m_r(G)$ be the number of maximal subgroups of $G$ of index $r$ , and let ${\mathcal M}(G)=\max _{r\ge 2}(\log \, m_r(G)/\log \, r)$ . Let $Q_n(G)$ be the probability that a sequence of $n$ randomly chosen elements of $G$ fails to generate $G$ .

Lemma 2.4. If $0\lt \epsilon \lt 1$ , and $n\ge{\mathcal M}(G)+2+\log _2\,\epsilon ^{-1}$ , then $Q_n(G)\lt \epsilon$ .

Proof. In the proof of Proposition 1.2 of [16], it is, in effect, observed that if $n\ge{\mathcal M}(G)+s$ for some real $s$ then $Q_n(G)\le \zeta (s)-1$. But $\zeta (s)-1\lt 2^{2-s}$ if $s\ge 2$, and $2^{2-s}\le \epsilon$ if $s\ge 2 + \log _2\,\epsilon ^{-1}$.

Theorem 2.5. Let $G$ be a finite group. Then, ${\mathcal M}(G)\le d(G)+2\log _2\log _2\,|G|+2$ .

Proof. This is Theorem 2.1 of [16].

Corollary 2.6. If $n\ge d(G)+2\log _2\log _2\,|G|+4+\log _2\,\epsilon ^{-1}$ , then $Q_n(G)\lt \epsilon .$
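
As a numerical illustration of Corollary 2.6, the following sketch computes the required team length; the function name and the example parameters are illustrative, not part of the theory.

```python
import math

def team_length_bound(d, order, eps):
    """Smallest integer n with n >= d(G) + 2*log2 log2 |G| + 4 + log2(1/eps),
    so that Q_n(G) < eps by Corollary 2.6 (d = d(G), order = |G|)."""
    return math.ceil(d + 2 * math.log2(math.log2(order)) + 4 + math.log2(1 / eps))

# A 2-generated group of order 2^100, with eps = 2^-20, needs a modest n.
print(team_length_bound(2, 2**100, 2**-20))   # 40
```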

The issue of the non-uniform distribution of elements of $G$ in the states of PRA may now be resolved as follows. If $g\in G$ and $1\le i\le n$, let $p(g)=p_i(g)$ be the limiting probability, as $t$ tends to infinity, that $T[i]=g$ after $t$ iterations. Since PRA is transitive, $p_i(g)$ is independent of the initial state and of $i$. For $B\subseteq G$ let $p(B) = \sum _{g\in B}p(g)$.

Theorem 2.7. Let $V$ be the probability distribution defined on $G$ by $p$ , and let $U$ be the uniform distribution on $G$ . If $n\ge d(G) + 2\log _2\log _2\,|G|+5 + \log _2\,\epsilon ^{-1}$ then $||V,U||_{\textrm{tv}}\lt \epsilon$ .

Proof. Let $(g_1,\ldots,g_n)$ be a sequence $s$ of elements of $G$, chosen at random with the uniform distribution, where $n$ satisfies the condition of the theorem. Fix $i$ with $1\le i\le n$. By Corollary 2.6, applied to the remaining $n-1$ entries, with probability greater than $1-\epsilon$ the subsequence of $s$ obtained by deleting $g_i$ will generate $G$, and in this case, the sequence obtained from $s$ by replacing $g_i$ by any element of $G$ will define a vertex of $\Gamma _n$. So if $B\subseteq G$ then $\epsilon + (1-\epsilon )|B|\,|G|^{-1}\gt p(B)\ge (1-\epsilon )|B|\,|G|^{-1}$. The result follows.

Theorem 2.8. Let $V$ be the probability distribution defined on $G$ by $p$ , and let $W(i,t)$ be the probability distribution on $G$ defined by the value of $T[i]$ after $t$ steps from some fixed initial value, and assume that $n\ge 5$ . Let $t\gt C(n)\log \,|G|\log \,\epsilon ^{-1}$ , with $C(n)$ as in Theorem 2.2. Then, $||W(i,t),V||_{\textrm{tv}}\lt \epsilon$ .

Proof. Let $W_s$ be the probability distribution on the set of states after $s$ steps from the initial configuration, and let $W=\lim _{s\to \infty }W_s$ . Let $t$ be as in the theorem. Then by Theorems 2.2 and 2.3, $||W_t,W||_{\textrm{tv}}\lt \epsilon$ , and restriction to $T[i]$ gives the result, by Lemma 2.1.

3. Rattle

The variant Rattle of PRA that we discuss here differs slightly from the original version in [11], in that the palette of operations has been enlarged to make the process symmetric.

The primary input for an instance of Rattle consists of a generating sequence $\mathcal X$ for a finite group $G$ and an integer $n\gt |{\mathcal X}|$ .

The set of states is the set of pairs $(T,A)$ , where the array $T$ (the team) stores any generating sequence of $G$ of length $n$ , and $A$ (the accumulator) is any element of $G$ .

The operations are as follows.

For $1\le i,j,k\le n$ and $i\ne j$ , and $\alpha,\beta \in \{-1,1\}$ , the operation $R_{1ijk}^{\alpha \beta }$ is defined by the sequence

  1. $T[i]\mapsto T[i]T[j]^\alpha$;

  2. $A\mapsto AT[k]^\beta$.

Similarly, the operation $L_{1ijk}^{\alpha \beta }$ is defined by the sequence

  1. $T[i]\mapsto T[j]^\alpha T[i]$;

  2. $A\mapsto T[k]^\beta A$.

The operation $R_{2ijk}^{\alpha \beta }$ is defined as in $R_{1ijk}^{\alpha \beta }$ , but with the operations carried out in the reverse order, and similarly for $L_{2ijk}^{\alpha \beta }$ .

So $R_{2ijk}^{-\alpha,-\beta }$ is the inverse of $R_{1ijk}^{\alpha \beta }$ , and similarly with $L_{2ijk}^{-\alpha,-\beta }$ .

Initially, $A$ is set to $1_G$ , and $T$ stores the sequence $\mathcal X$ padded out with copies of $1_G$ . This initial state will be denoted by $v$ .

After a number of iterations of the chain, the accumulator is returned as the random element of $G$ .
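
A minimal sketch of Rattle, in the same style as the PRA sketch in Section 1 (permutation tuples again stand in for black box elements, and the helper names and parameters are illustrative):

```python
import random

def compose(p, q):                        # permutation product: apply q, then p
    return tuple(p[q[i]] for i in range(len(p)))

def invert(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def rattle(generators, n, iterations):
    identity = tuple(range(len(generators[0])))
    T = list(generators) + [identity] * (n - len(generators))   # the team
    A = identity                                                # the accumulator
    for _ in range(iterations):
        i, j = random.sample(range(n), 2)                       # i != j
        k = random.randrange(n)
        alpha, beta = random.choice([1, -1]), random.choice([1, -1])
        left = random.random() < 0.5          # L-moves multiply on the left
        team_first = random.random() < 0.5    # R1/L1 versus R2/L2
        for move in (('team', 'acc') if team_first else ('acc', 'team')):
            if move == 'team':                # T[i] -> T[i]T[j]^alpha (or left)
                g = T[j] if alpha == 1 else invert(T[j])
                T[i] = compose(g, T[i]) if left else compose(T[i], g)
            else:                             # A -> A T[k]^beta (or left)
                h = T[k] if beta == 1 else invert(T[k])
                A = compose(h, A) if left else compose(A, h)
    return A                                  # returned as the random element

gens = [(1, 2, 3, 0), (1, 0, 2, 3)]           # a 4-cycle and a transposition
print(rattle(gens, n=10, iterations=100))
```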

As with PRA, we shall assume that $\mathcal X$ is symmetric, a restriction not observed in practice.

We shall assume throughout this section that $n$ has been chosen so that PRA is transitive on the set of possible values for $T$ ; equivalently, the restriction of Rattle to $T$ is transitive.

Lemma 3.1. The Markov chain Rattle is transitive.

Proof. It suffices to prove that, for any state $u=(T,A)$ there is a walk from $u$ to the initial state $v$ .

Since the restriction of Rattle to $T$ is transitive, there is a walk from $u$ to a state $w$ for which $T$ takes the same value as does $v$ , so $T[n]=1_G$ . Now moves of the form $R_{11nk}^{1\beta }$ define a walk from $w$ to $v$ .

Recall that $p(g)$ was defined immediately before Theorem 2.7, and let $\delta _g=p(g)-1/|G|$ .

Lemma 3.2. $\delta _g=\delta _{g^{-1}}=\delta _{g^h}$ for all $g,h\in G$ .

Proof. The anti-automorphism $g\mapsto g^{-1}$ maps generating sequences of $G$ to generating sequences, as does conjugation by $h$ .

Lemma 3.3. $\sum _{g\in G} \delta _g=0$ .

Proof. $\sum _{g\in G} p(g)=1$ .

Let $|G|^{-1}+n_{gh}$ be the limiting probability, as $t$ tends to $\infty$ , that the value of the accumulator, given that this is $g$ at time $t$ , will be $h$ at time $t+1$ .

Lemma 3.4. Let $V$ be the probability distribution on $G$ defined by $p$ , and let $U$ be the uniform distribution on $G$ . If $B\subseteq G$ , then $\big|\sum _{h\in B}n_{gh}\big| \le ||V,U||_{\textrm{tv}}$ .

Proof. Since Rattle is transitive, in the limiting distribution, as $t$ goes to $\infty$, there is no correlation between the values of $A$ and $T$. The probability that the value of the accumulator will change from $g$ to an element of $B$ in one step at time $t$, once an operation $R_{aijk}^{\alpha \beta }$ or $L_{aijk}^{\alpha \beta }$ has been chosen, is the probability that $T[k]$ lies in a specified subset of $G$ of cardinality $|B|$, at time $t+1$ if $a=1$ (the team move having been carried out first) or at time $t$ if $a=2$; and in the limit this probability differs from $|B|/|G|$ by at most $||V,U||_{\textrm{tv}}$. The result follows.

Lemma 3.5. $\sum _{h\in G}n_{gh}=0$ for all $g\in G$ .

Proof. $\sum _{h\in G}\left(|G|^{-1}+n_{gh}\right)=1$ .

If $g$ and $h$ are elements of $G$ , let $p_{gh}^{(t)}$ be the probability that if at time $t$ the value of the accumulator is $g$ then at time $t+1$ it will be $h$ . These probabilities depend on the initial state, but we omit this from the notation.

Lemma 3.6. $p_{gh}^{(t)}=p_{hg}^{(t)}$ for all $g,h\in G$ and all $t\gt 0$ .

Proof. Rattle is symmetric.

Let $\epsilon _{gh}^{(t)}=p_{gh}^{(t)} - |G|^{-1} - n_{gh}$ .

Lemma 3.7. Let $U$ be the uniform distribution on the set of states for PRA, and let $V_t$ be the probability distribution on this set after $t$ steps. If $B\subseteq G$ , then $\big|\sum _{h\in B}\epsilon _{gh}^{(t)}\big|\le ||V_t,U||_{\textrm{tv}}$ .

Proof. Suppose that, after $t$ steps, $A=g$, and an operation $R_{2ijk}^{\alpha \beta }$ is chosen, so that the accumulator is multiplied by $T[k]^\beta$ before the team move. Let $V_{tk}$ and $U_k$ be the probability distributions on $G$ obtained from $V_t$ and $U$, respectively, by restriction to $T[k]$. Then, the probability $p$ that the accumulator at step $t+1$ will take a value in $B$ is $V_{tk}(C)$ for a certain subset $C$ of $G$, with $|C|=|B|$, and the limiting probability $q$, as $t$ tends to $\infty$, that this operation will take the accumulator from its assumed value of $g$ into $B$ is $U_k(C)$ for the same subset $C$ of $G$. But by Lemma 2.1, $||V_{tk},U_k||_{\textrm{tv}} \le ||V_t,U||_{\textrm{tv}}$, so $|p-q| \le ||V_t,U||_{\textrm{tv}}$. The same conclusion is reached by the same argument if the chosen operation is $R_{1ijk}^{\alpha \beta }$ with $i\ne k$, since then the team move does not affect $T[k]$. So now suppose that the chosen operation is $R_{1ijk}^{\alpha \beta }$ and that $i=k$. In this case, the probability $p$ or $q$ that this operation will take the accumulator from $g$ into $B$ at step $t+1$, respectively in the limit, is the probability that $T[i]T[j]^\alpha$ lies in some subset $C$ of $G$, the same subset in both cases; and the same argument, using the second part of Lemma 2.1, shows again that $|p-q| \le ||V_t,U||_{\textrm{tv}}$. A similar argument applies to the left operations, and the lemma follows.

Lemma 3.8. $n_{gh}=n_{hg}$ for all $g,h\in G$ .

Proof. By Lemma 3.6, $p_{gh}^{(t)}=p_{hg}^{(t)}$ . But $p_{gh}^{(t)} = |G|^{-1} + n_{gh}+\epsilon _{gh}^{(t)}$ , and $\epsilon _{gh}^{(t)}$ tends to $0$ as $t$ tends to $\infty$ . So $n_{gh}=n_{hg}$ , as required.

Lemma 3.9. $\sum _{h\in G}\epsilon _{gh}^{(t)}=0$ for all $g\in G$ and for all $t$ .

Proof. $\sum _{h\in G}p_{gh}^{(t)}=1$ .

Lemma 3.10. $\epsilon ^{(t)}_{gh}=\epsilon ^{(t)}_{hg}$ for all $g$ and $h$ in $G$ , and for all $t$ .

Proof. This follows from Lemmas 3.8 and 3.6.

Let the probability that the accumulator takes the value $g$ at time $t$ be $|G|^{-1}+\gamma _g^{(t)}$ , where again we omit the initial state from the notation. Then, we arrive at a central result.

Theorem 3.11. $\gamma _g^{(t+1)}=\sum _{h\in G}\gamma _h^{(t)}\big(n_{gh}+\epsilon _{gh}^{(t)}\big)$ .

Proof. $|G|^{-1}+\gamma _g^{(t+1)} = \sum _{h\in G}\left(|G|^{-1}+\gamma _h^{(t)}\right)p_{hg}^{(t)} = \sum _{h\in G}\left(|G|^{-1}+\gamma _h^{(t)}\right)\left(|G|^{-1}+n_{gh}+\epsilon _{gh}^{(t)}\right)$, using Lemmas 3.8 and 3.10 to replace $p_{hg}^{(t)}$ by $|G|^{-1}+n_{gh}+\epsilon _{gh}^{(t)}$. Expanding the product, the terms $|G|^{-1}\sum _{h\in G}n_{gh}$ and $|G|^{-1}\sum _{h\in G}\epsilon _{gh}^{(t)}$ vanish by Lemmas 3.5 and 3.9, and $|G|^{-1}\sum _{h\in G}\gamma _h^{(t)}=0$ since the probabilities $|G|^{-1}+\gamma _h^{(t)}$ sum to 1. The theorem follows.

If $B\subseteq G$ let $\gamma _B^{(t)}=\sum _{g\in B}\gamma _g^{(t)}$ . Let $W_t$ be the probability distribution of the accumulator at time $t$ , again omitting the initial state from the notation, and $U$ be the uniform distribution on $G$ . Then, by definition, $||W_t,U||_s=\max _{g\in G}|\gamma _g^{(t)}|$ , and $||W_t,U||_{\textrm{tv}}=\max _{B\subseteq G}|\gamma _B^{(t)}|$ .

Theorem 3.12. If $n\ge d(G)+2\log _2\log _2\,|G|+5+\log _2\,\epsilon _1^{-1}$ , and $t\ge C(n)\log _2\,|G|\log _2\,\epsilon _2^{-1}$ , where $C(n)$ is defined in Theorem 2.2, then $||W_{t+1},U||_{\textrm{tv}}\lt (\epsilon _1+\epsilon _2)|G|\,||W_t,U||_{\textrm{s}}\le (\epsilon _1+\epsilon _2)|G|\,||W_t,U||_{\textrm{tv}}$ .

Proof. Let $B\subseteq G$. Then, by Theorem 3.11, $\gamma _B^{(t+1)}= \sum _{g\in B}\gamma _g^{(t+1)}=\sum _{g\in B}\sum _{h\in G}\gamma _h^{(t)}\big(n_{gh}+\epsilon _{gh}^{(t)}\big) = \sum _{h\in G}\gamma _h^{(t)}\sum _{g\in B}\big(n_{gh}+\epsilon _{gh}^{(t)}\big)$. But $\big|\sum _{g\in B}n_{gh}\big|\lt \epsilon _1$ by Theorem 2.7 and Lemmas 3.4 and 3.8, and $\big|\sum _{g\in B}\epsilon _{gh}^{(t)}\big|\lt \epsilon _2$ by Theorems 2.2 and 2.3 and Lemmas 3.7 and 3.10. So $\big|\gamma _B^{(t+1)}\big|\le \sum _{h\in G}|\gamma _h^{(t)}|(\epsilon _1+\epsilon _2)\le (\epsilon _1+\epsilon _2)|G|\max _{h\in G}|\gamma _h^{(t)}|=(\epsilon _1+\epsilon _2)|G|\,||W_t,U||_{\textrm{s}}$, and $||W_t,U||_{\textrm{s}}\le ||W_t,U||_{\textrm{tv}}$.

Clearly, this theorem has no useful content if $\epsilon _1+\epsilon _2\ge 1/|G|$ . So suppose that $n\gt d(G)+2\log _2\log _2\,|G| + 7 +\log _2\,|G|$ . Then, $\epsilon _1\lt |G|^{-1}/4$ . Moreover, there exists $s=O\big(\log ^3\,|G|\big)$ such that if $t\gt s$ then $\epsilon _2\lt |G|^{-1}/4$ . If both these conditions are satisfied, then $||W_t,U||_{\textrm{tv}}$ is reduced by a factor greater than 2 for every single iteration. Similarly, if $n\gt d(G)+2\log _2\log _2\,|G| + 6 + 2\log _2\,|G|$ then $\epsilon _1\lt |G|^{-2}/2$ , and there exists $s=O(\log ^4\,|G|)$ such that if $t\gt s$ then $\epsilon _2\lt |G|^{-2}/2$ . If both these conditions are satisfied, then $||W_t,U||_{\textrm{tv}}$ is reduced by a factor greater than $|G|$ for every iteration.
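
Numerically, the first of these regimes reads as follows; the sketch below is illustrative, the function name is ours, and it says nothing about $C(n)$, for which no explicit value is available.

```python
import math

def theorem_3_12(d, log2_order, eps1, eps2):
    """n required by Theorem 3.12 for eps1, and the per-iteration contraction
    factor (eps1 + eps2)*|G| applied to ||W_t, U||_tv once t is large enough."""
    n_needed = d + 2 * math.log2(log2_order) + 5 + math.log2(1 / eps1)
    factor = (eps1 + eps2) * 2 ** log2_order
    return math.ceil(n_needed), factor

# |G| = 2^100, d(G) = 2, eps1 = eps2 = |G|^{-1}/4: the total variation
# distance is then halved at every iteration, at the price of n of about 123.
print(theorem_3_12(2, 100, 2**-102, 2**-102))   # (123, 0.5)
```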

In conclusion, the mixing time of Rattle is $O\left(\max \left(k^2\log _2\,|G|, \log _2^3\,|G|\right)\right)$ , where $k=|{\mathcal X}|$ . If we ignore $T$ , and just concentrate on $A$ , the process is no longer Markov, and the term ‘mixing time’, as we have defined it, is no longer applicable. For any $c\gt 1$ , polynomially bounded values of $n$ and $t$ exist such that, after $t$ steps, $||W_t,U||_{\textrm{tv}}$ is reduced by a factor greater than $c$ for every further iteration of the process. Unfortunately, in the absence of an explicit bound for $C(n)$ in Theorem 2.2, we have no means of telling when $t$ is big enough for this rate of convergence to hold.

So for any given generating set for any finite group $G$, Rattle can outperform PRA if the chosen value of $\epsilon$ is small enough. And since PRA is a Markov process, this superior performance of Rattle is absolute, in that no further progress in bounding the mixing time of PRA will alter this fact.

4. Practice

To start with the conclusion, there is a disconnect between the theory and the practice of these algorithms, and it seems that the theory of Markov processes, so essential for proving that random elements of the group may be constructed in polynomial time, may have little or nothing to do with explaining why the algorithms work so well. In practice, when used in the simplest way, the chain is repeated a number of times in a pre-processing stage and then repeated just once for each random element of $G$. To reduce the risk of returning to the previous state, the palette of moves may be reduced to avoid symmetry. The length of the pre-processing stage is often taken to be too small for the majority of elements of $G$ (a group that may have very large order) to be within reach. Yet the algorithms pass all $\chi ^2$ tests with flying colours for these modest pre-processing stages (a chain of length at most 100). Why is this? Random search is most frequently used for looking for elements of a subset $S$ of $G$ that is expected to constitute a proportion of the elements of $G$ that is not too small. Typically $S$ will be a union of conjugacy classes in $G$. For example, we may look in a matrix group for an element whose characteristic polynomial has an irreducible factor of degree within some given range. The unfailing success of PRA or Rattle implies that membership of $S$, however defined, cuts across the operations that are used.
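
By way of illustration, a $\chi ^2$ test of this kind can be run on a toy example; the following sketch uses the cyclic group of order 24, written additively (so that left and right moves coincide), with arbitrary parameters.

```python
import random
from collections import Counter

def pra_cyclic(m, gens, n, iterations):
    """PRA on Z_m written additively: T[i] += T[j]^eps means T[i] +/- T[j]."""
    T = list(gens) + [0] * (n - len(gens))
    for _ in range(iterations):
        i, j = random.sample(range(n), 2)
        T[i] = (T[i] + random.choice([1, -1]) * T[j]) % m
    return random.choice(T)

m, trials = 24, 24000
counts = Counter(pra_cyclic(m, [1, 5], n=8, iterations=50) for _ in range(trials))
expected = trials / m
chi2 = sum((counts.get(g, 0) - expected) ** 2 / expected for g in range(m))
print(chi2)    # compare with chi-squared on m - 1 = 23 degrees of freedom
```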

Sometimes all that is required of the random elements that are constructed is that they should be of good enough quality. For example, the very complex algorithm that takes as input a generating set $\mathcal X$ for a subgroup $G$ of $\mathrm{GL}(d,q)$ and returns a composition series and constructive membership test for $G$ may be run in Monte Carlo style. A composition series and constructive membership test are constructed for a subgroup $K$ of a subgroup $H$ of $G$, with the expectation that $K=H$. A set of, say, 100 random elements of $H$ (the mandarins) is used to check this supposition. The crucial point is that we do not have a generating set for $H$, the mandarins having been constructed from a set of random elements of $G$ constructed using PRA or Rattle. If a mandarin fails to lie in $K$, then of course $K$ is not the whole of $H$. Conversely, if $K$ is not the whole of $H$, then the probability that the mandarins fail to detect this is $r^{-100}$, where $r=|H\,:\,K|$, assuming that the random elements are constructed perfectly. The conclusion may be checked deterministically by constructing presentations, though this is rather expensive. But the mandarins have never been shown to have lied. Of course, it would take a very serious failure of the random generation algorithm for them to have any chance of lying.

More frequently, random elements of $G$ are required together with straight line programmes that define them as words in $\mathcal X$ , and these words should be as short as can be conveniently arranged; so there is a tension between having a short chain on the one hand and good quality random elements on the other.

For a general discussion of how matrix groups over a finite field defined by a generating set are analysed, and the use of random elements in this analysis, see [10] and [13].

As an alternative to using Rattle (or PRA) in the above simple way, a serious attempt to keep the above balance between short chains and good quality random elements may be made by using the ‘prospector’, as described in [1]. That paper also analyses experimentally the use of another variant of PRA that uses an ‘accelerator’, which is particularly valuable if the given generating set is large.

Acknowledgement

Alex Lubotzky has guided me through the latest research related to PRA and has saved me from some embarrassing errors.

References

[1] Bäärnhielm, H. and Leedham-Green, C. R., The product replacement prospector, J. Symb. Comput. 47(1) (2012), 64–75.
[2] Babai, L., Local expansion of vertex-transitive graphs and random generation in finite groups, in Proceedings of the 23rd ACM STOC (1991), 164–174.
[3] Babai, L. and Pak, I., Strong bias of group generators: an obstacle to the “product replacement algorithm”, J. Algorithms 50(2) (2004), 215–231.
[4] Biswas, A., On a Cheeger type inequality in Cayley graphs of finite groups, Eur. J. Comb. 81 (2019), 298–308.
[5] Breuillard, E., Green, B., Guralnick, R. and Tao, T., Expansion in finite simple groups of Lie type, arXiv:1309.1975 (2014).
[6] Celler, F., Leedham-Green, C. R., Murray, S. H., Niemeyer, A. C. and O’Brien, E. A., Generating random elements of a finite group, Commun. Algebra 23(13) (1995), 4931–4948.
[7] Detomi, E., Lucchini, A. and Morigi, M., The limiting distribution of the product replacement algorithm for finitely generated prosoluble groups, J. Algebra 468 (2016), 49–71.
[8] Kaluba, M., Kielak, D. and Nowak, P., On property $(T)$ for $\mathrm{Aut}({\mathbb F}_n)$ and $\mathrm{SL}_n({\mathbb Z})$, arXiv:1812.03456.
[9] Kaluba, M., Nowak, P. W. and Ozawa, N., $\mathrm{Aut}({\mathbb F}_5)$ has property $(T)$, arXiv:1712.07167.
[10] Leedham-Green, C., The computational matrix group project, in Groups and computation III, Ohio State University Mathematical Research Institute Publications (de Gruyter, Berlin, 1999), 229–248.
[11] Leedham-Green, C. and Murray, S., Variants of product replacement, in Computational and statistical group theory, Contemporary Mathematics, vol. 298 (American Mathematical Society, Providence, RI, 2001), 97–104.
[12] Levin, D. A., Peres, Y. and Wilmer, E. L., Markov chains and mixing times, 2nd edition (American Mathematical Society, Providence, RI, 2017).
[13] O’Brien, E., Towards effective algorithms for linear groups, in Finite geometries, groups and computation (Colorado, September 2004) (2006), 163–190.
[14] Pak, I., What do we know about the product replacement algorithm?, in Groups and computation III, Ohio State University Mathematical Research Institute Publications (de Gruyter, Berlin, 1999), 301–347.
[15] Lubotzky, A., Private communication.
[16] Lubotzky, A., The expected number of random elements to generate a finite group, J. Algebra 257 (2002), 452–459.
[17] Lubotzky, A. and Pak, I., The product replacement algorithm and Kazhdan’s property (T), J. Amer. Math. Soc. 14(2) (2001), 347–363.
[18] Pak, I., The product replacement algorithm is polynomial, in Proceedings of the 41st Annual Symposium on Foundations of Computer Science, Redondo Beach, CA (2000), 476–485.
[19] Peres, Y., Tanaka, R. and Zhai, A., Cutoff for product replacement on finite groups, Probab. Theory Relat. Fields (2020).