
John’s walk

Published online by Cambridge University Press:  10 November 2022

Adam Gustafson*
Affiliation:
Microsoft Corporation
Hariharan Narayanan*
Affiliation:
TIFR Mumbai

Abstract

We present an affine-invariant random walk for drawing uniform random samples from a convex body $\mathcal{K} \subset \mathbb{R}^n$ that uses maximum-volume inscribed ellipsoids, known as John’s ellipsoids, for the proposal distribution. Our algorithm makes steps using uniform sampling from the John’s ellipsoid of the symmetrization of $\mathcal{K}$ at the current point. We show that from a warm start, the random walk mixes in ${\widetilde{O}}\!\left(n^7\right)$ steps, where the log factors hidden in the ${\widetilde{O}}$ depend only on constants associated with the warm start and desired total variation distance to uniformity. We also prove polynomial mixing bounds starting from any fixed point x such that for any chord pq of $\mathcal{K}$ containing x, $\left|\log \frac{|p-x|}{|q-x|}\right|$ is bounded above by a polynomial in n.

Type
Original Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press on behalf of Applied Probability Trust

1. Introduction

Drawing random samples from a convex body $\mathcal{K} \subset \mathbb{R}^n$ is an important problem for volume computation and optimization which has generated a large body of research. Usually $\mathcal{K}$ is specified by a membership oracle which certifies whether or not a test point $x \in \mathbb{R}^n$ is contained in $\mathcal{K}$. Given such an oracle, geometric random walks are then used to explore $\mathcal{K}$ in such a way that, after a sufficient number of steps, the walk has ‘mixed’, in the sense that the current point is suitably close to a point uniformly drawn from $\mathcal{K}$ in terms of statistical distance. To use such walks, an assumption that $\mathcal{B}(r) \subset \mathcal{K} \subset \mathcal{B}(R)$ is often made, where $\mathcal{B}(r)$ represents the Euclidean ball of radius $r > 0$. One common example of such geometric walks is the ball walk, which generates the next point by uniformly randomly sampling from a ball of radius $\delta \leqslant r/\sqrt{n}$ centered at the current point, and mixes in ${\widetilde{O}}\big(n\big(R^2/\delta^2\big)\big)$ steps from an M-warm start (i.e., the starting distribution has a density bounded above by a constant M [5]). Here and elsewhere, the ${\widetilde{O}}({\cdot})$ notation suppresses polylogarithmic factors as well as constants depending only on the error parameters. Another example is the hit-and-run walk, where the next point is chosen uniformly at random from a random chord in $\mathcal{K}$ which intersects the current point. Hit-and-run mixes in $O\big(n^3(R/r)^2 \log\!(R/(d\epsilon))\big)$ steps, where the starting point is at distance d from the boundary and $\epsilon$ is the desired distance to stationarity [11]. The mixing times of both walks depend on the ratio R/r, so in practice the body must first be put in a well-rounded position by an affine transformation. Affine-invariant walks (i.e., geometric walks whose mixing time is invariant under affine transformations) are another class of random walks, which avoid the problem of rounding altogether. One such random walk is known as the Dikin walk [6], which uses uniform sampling from Dikin ellipsoids to make steps. Given a polytope with m inequality constraints, the Dikin walk mixes in ${\widetilde{O}}(mn)$ steps from a warm start. This random walk was extended to general convex bodies equipped with a $\nu$-self-concordant barrier in [13], and mixes in ${\widetilde{O}}\big(n^3\nu^2\big)$ steps from a warm start. For the case of a polytope, this implies that the Dikin walk equipped with the Lee–Sidford (LS) barrier [8] mixes in ${\widetilde{O}}\big(n^5\big)$ steps from a warm start, though at each step one must additionally compute the LS barrier, which requires $O\big(nnz(A) + n^2\big)$ arithmetic operations, where nnz(A) is the number of non-zero entries in the matrix A which defines the polytope. A significantly improved analysis of this walk was given in [2], whose algorithm reaches a total variation distance of $\epsilon$ from the uniform measure in $O\!\left(n^{2.5}\log^4\!\left(\frac{2m}{n}\right)\log\!\left(\frac{M}{\epsilon}\right)\right)$ steps from an M-warm start. Very recently, it was shown in [7] that for certain ‘strongly self-concordant’ barriers there is a Dikin walk that mixes in $\tilde{O}(n \tilde{\nu})$ steps, where $\tilde{\nu}$ is related to the self-concordance parameter of the barrier.

In this paper we introduce another affine-invariant random walk akin to the Dikin walk which uses uniform sampling from John’s ellipsoids of a certain small radius of appropriately symmetrized convex sets to make steps. We show that this walk mixes to achieve a total variation distance $\epsilon$ in $O\big(n^7\log \epsilon^{-1}\big)$ steps from a warm start. The type of convex body $\mathcal{K}$ is not specified (i.e., need not be a polytope) in our analysis of the mixing time, but one must have access to the John’s ellipsoid of the current symmetrization of the convex body. While this dependence on the dimension is admittedly steep, a significant feature of this walk is that its mixing time from a warm start, or alternatively a ‘central point’ such as the center of mass, can be bounded above by a quantity that has absolutely no dependence on any parameter associated with the body apart from its dimension.

Notation: We will denote a large universal constant by C and a small universal constant by c. Let $\pi_0$ be a measure that is absolutely continuous with respect to the uniform measure $\pi$ on $\mathcal{K}$ . Let $\pi_t$ denote the measure after t steps of the Markov chain.

Our main theorems, proved at the end of this paper, are the following.

Theorem 1: Let $r = c n^{-\frac{5}{2}}$ be the radius of the John’s ellipsoids in the walk. Let $\epsilon > 0$ and $M = \sup_{A} \frac{\pi_0(A)}{\pi(A)}$. After $t(\epsilon) = C n^7 \log\!({M}/\epsilon) $ steps of John’s walk, we have $d_{TV}\big(\pi_{t(\epsilon)}, \pi\big) \leqslant \epsilon$.

Theorem 2: Let $r = c n^{-\frac{5}{2}}$ be the radius of the John’s ellipsoids in the walk. For all chords pq of $\mathcal{K}$ containing x, assume $\frac{|p-x|}{|q-x|} \in \big(\eta, \eta^{-1}\big)$ for some parameter $0 < \eta < 1$ that measures the centrality of x in $\mathcal{K}$ . Then there is a random geometrically distributed time $\tau$ with mean bounded above by C such that for $\epsilon > 0$ , after $t(\epsilon) + \tau = C n^7 \left(n \log\!\left(\sqrt{n}/(r\eta)\right) + \log\!({1}/\epsilon)\right) + \tau$ steps of John’s walk starting at x, we have $d_{TV}\big(\pi_{t(\epsilon)+ \tau}, \pi\big) \leqslant \epsilon$ .

It is known that for the center of mass, $\eta \geqslant \frac{c}{n}.$

2. John’s walk

In this section, we describe John’s maximum-volume ellipsoid for a convex body $\mathcal{K} \subset \mathbb{R}^n$ , and describe a geometric random walk using such ellipsoids. We begin by reviewing John’s theorem and some implications of the theorem.

2.1. John’s theorem

Fritz John showed that any convex body contains a unique ellipsoid of maximal volume, and characterized the ellipsoid [4, 1]. Without loss of generality, we may assume that the ellipsoid of maximal volume is the unit Euclidean ball $\mathcal{B} \subset \mathbb{R}^n$, since this is the case after an affine transformation. John’s theorem as stated for the unit ball case is as follows.

Theorem 1. (John’s theorem.) Each convex body $\mathcal{K} \subset \mathbb{R}^n$ contains a unique ellipsoid of maximal volume. The ellipsoid is $\mathcal{B}$ if and only if the following conditions are satisfied: $\mathcal{B} \subset \mathcal{K}$ , and for some $m \geqslant n$ there are Euclidean unit vectors $\left\{{u_i}\right\}_{i=1}^m$ on the boundary of $\mathcal{K}$ and positive constants $\left\{{c_i}\right\}_{i=1}^m$ satisfying

(2.1) \begin{equation} \sum_{i=1}^m c_i u_i = 0, \end{equation}
(2.2) \begin{equation} \sum_{i=1}^m c_i u_i u_i^T = I_n, \end{equation}

where $I_n$ denotes the identity matrix in $\mathbb{R}^{n\times n}$ .

Note that the condition (2.2) is sometimes written equivalently as

\begin{equation*}\left \langle {x}, \, {y} \right \rangle = \sum_{i=1}^m c_i \!\left \langle {u_i}, \, {x} \right \rangle \left \langle {u_i}, \, {y} \right \rangle\end{equation*}

for all $x, y \in \mathbb{R}^n$ . Using the cyclic invariance of the trace and that the $\left\{{u_i}\right\}$ are unit vectors, (2.2) implies that

(2.3) \begin{equation}\sum_{i=1}^m c_i = n,\end{equation}

a property we employ in the subsequent analysis.

We now enumerate some properties from [1] which provide additional insight into the geometric properties of John’s ellipsoids and are useful for the analysis in subsequent sections. Note that the condition (2.1) implies that the contact points do not all lie in one half-space of the unit ball; this condition is redundant in the symmetric case, since for every contact point $u_i$, its reflection about the origin $-u_i$ is also a contact point. The condition (2.2) guarantees that such contact points do not lie close to a proper subspace. Furthermore, there are at most $n(n+3)/2$ contact points for general $\mathcal{K}$, and $n(n+1)/2$ non-redundant contact points if $\mathcal{K}$ is origin-symmetric [3]. At each $u_i$, the supporting hyperplane to $\mathcal{K}$ is unique and orthogonal to $u_i$, since this is the case for the unit ball. Thus, considering the polytope resulting from such supporting hyperplanes, $\mathcal{P} = \left\{{x \in \mathbb{R}^n {\;\big|\;} \langle x, u_i \rangle \leqslant 1, \, i = 1, \ldots, m}\right\}$, the convex set $\mathcal{K}$ obeys the sandwiching $\mathcal{B} \subset \mathcal{K} \subset \mathcal{P}$. By Cauchy–Schwarz, for any $x \in \mathcal{P}$, we have

\begin{equation*}-|x| \leqslant \left \langle {u_i}, \, {x} \right \rangle \leqslant 1.\end{equation*}

Since the weights $\left\{{c_i}\right\}$ are positive, it follows from applying the conditions (2.1), (2.2), and (2.3) that

\begin{equation*}\begin{aligned}0 &\leqslant \sum_{i = 1}^m c_i (1 - \left \langle {u_i}, \, {x} \right \rangle\!)(|x| + \left \langle {u_i}, \, {x} \right \rangle\!) \\&= |x|\sum_{i = 1}^m c_i + (1 - |x|) \left \langle {\sum_{i=1}^m c_i u_i}, \, {x} \right \rangle - \sum_{i=1}^m c_i \!\left \langle {u_i}, \, {x} \right \rangle^2 \\&= n |x| - |x|^2,\end{aligned}\end{equation*}

from which it follows that $|x| \leqslant n$ . If the convex body is origin-symmetric, then by substituting $-u_i$ for $u_i$ , for any $x \in \mathcal{P}$ , we have

\begin{equation*}|\!\left \langle {u_i}, \, {x} \right \rangle\!| \leqslant 1.\end{equation*}

It follows that

\begin{equation*}|x|^2 = \sum_{i=1}^m c_i \!\left \langle {u_i}, \, {x} \right \rangle^2 \leqslant \sum_{i=1}^m c_i = n,\end{equation*}

so $|x| \leqslant \sqrt{n}$ . Therefore, if $\mathcal{K}$ is origin-symmetric and the unit ball is the John’s ellipsoid, then the containment is

(2.4) \begin{equation}\mathcal{B} \subset \mathcal{K} \subset \sqrt{n} \mathcal{B}.\end{equation}
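As a concrete illustration (ours, not from the original text), consider the cube $[-1,1]^n$: its John’s ellipsoid is the unit ball, the contact points are the face centers $\pm e_i$, and the weights $c_i = 1/2$ satisfy the conditions above. A minimal numpy check of (2.1), (2.2), (2.3), and the norm bound behind (2.4):

```python
import numpy as np

n = 4
# For the cube [-1, 1]^n the John's ellipsoid is the unit ball; the 2n
# contact points are the face centers +/- e_i, each with weight 1/2.
U = np.vstack([np.eye(n), -np.eye(n)])   # contact points u_i as rows
c = np.full(2 * n, 0.5)                  # weights c_i

assert np.allclose(c @ U, 0)                               # (2.1)
assert np.allclose(U.T @ np.diag(c) @ U, np.eye(n))        # (2.2)
assert np.isclose(c.sum(), n)                              # (2.3)

# Sandwiching (2.4) for this origin-symmetric body: the farthest points of
# the cube from the origin are its vertices, at Euclidean norm sqrt(n).
assert np.isclose(np.linalg.norm(np.ones(n)), np.sqrt(n))
```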

2.2. The John’s walk algorithm

We state the algorithm for a general convex body $\mathcal{K}$ . At a given point $x \in \mathcal{K}$ , let the symmetrization of $\mathcal{K}$ at x be

\begin{equation*}\mathcal{K}_x^s \equiv \mathcal{K} \cap \left\{{2x - y {\;\big|\;} y \in \mathcal{K}}\right\},\end{equation*}

and for some symmetric positive definite matrix $E_x$ , let

(2.5) \begin{equation}\mathcal{E}_x = \left\{{E_x u + x {\;\big|\;} |u| \leqslant 1}\right\}\end{equation}

denote the John’s ellipsoid of $\mathcal{K}_x^s$. Similarly, let the rescaled John’s ellipsoid be $\mathcal{E}_x(r) = \left\{{r (E_x u) + x {\;\big|\;} |u| \leqslant 1}\right\}$, where the radius $r > 0$ will be specified in Section 3. Assume that $x_0 = 0 \in \text{int}(\mathcal{K})$ and that we have computed $\mathcal{E}_{x_0}$. To generate a sample $x_i$ given $x_{i-1}$, we use Algorithm 2.1, where $\lambda({\cdot})$ denotes the Lebesgue measure on $\mathbb{R}^n$.

Algorithm 2.1: John’s walk step

Algorithm 2.1 is a Metropolis–Hastings geometric random walk which uses the uniform measure $Q_x({\cdot})$ on the dilated John’s ellipsoid $\mathcal{E}_x(r)$ as the proposal distribution. Tossing a fair coin ensures that the transition probability kernel defined by the algorithm is positive semidefinite, which is known as making the walk lazy. Lazy random walks have the same stationary distribution as the original walk, at the cost of a constant increase in mixing time (we will analyze the non-lazy walk, noting that the mixing time is not affected in terms of complexity as a function of n). The rejection of any sample y such that $x \notin \mathcal{E}_y(r)$ is necessary to ensure that the random walk is reversible.
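Algorithm 2.1 treats the John’s ellipsoid of $\mathcal{K}_x^s$ as an oracle. As one possible instantiation (a sketch under our own assumptions, not a prescription of the paper), when $\mathcal{K} = \{z \mid Az \leqslant b\}$ is a polytope, the maximum-volume inscribed ellipsoid of the symmetrization can be computed by the standard log-det convex program, here via cvxpy; the helper name john_ellipsoid_of_symmetrization is ours:

```python
import cvxpy as cp
import numpy as np

def john_ellipsoid_of_symmetrization(A, b, x):
    """Shape matrix E_x of the John's ellipsoid {E_x u + x : |u| <= 1} of
    K_x^s, where K = {z : A z <= b} is a polytope and x is interior to K.

    K_x^s = K intersect (2x - K) is again a polytope:  A z <= b  and
    -A z <= b - 2 A x.  Since K_x^s is symmetric about x, its maximal-volume
    inscribed ellipsoid is centered at x, so only the shape matrix is free.
    """
    A2 = np.vstack([A, -A])
    b2 = np.concatenate([b, b - 2.0 * (A @ x)])
    n = A.shape[1]
    E = cp.Variable((n, n), PSD=True)
    # {E u + x : |u| <= 1} lies in {z : a^T z <= beta} iff |E a| + a^T x <= beta.
    constraints = [cp.norm(E @ A2[i]) + A2[i] @ x <= b2[i]
                   for i in range(A2.shape[0])]
    cp.Problem(cp.Maximize(cp.log_det(E)), constraints).solve()
    return E.value
```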

The uniform measure on the John’s ellipsoid $\mathcal{E}_x(r)$ is absolutely continuous with respect to the Lebesgue measure $\lambda$ , and thus the Radon–Nikodym derivative (i.e., density) for the proposal distribution is

(2.6) \begin{equation}\begin{aligned}q_x(y) &\equiv \frac{dQ_x}{d\lambda}(y) \\&= \left(\frac{1}{\lambda(\mathcal{E}_x(r))}\right) \cdot 1_{\left\{{y \in \mathcal{E}_x(r)}\right\}}.\end{aligned}\end{equation}

The acceptance probability corresponding to the uniform stationary measure in the Metropolis filter is

\begin{equation*}\alpha_x(y) = \min\!\left[1, \; \frac{\lambda(\mathcal{E}_x(r))}{\lambda(\mathcal{E}_y(r))}\right].\end{equation*}
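For concreteness, here is a minimal sketch of one (lazy) step of the walk, assuming the john_ellipsoid_of_symmetrization helper sketched above; the rejection test $x \in \mathcal{E}_y(r)$ and the ratio $\lambda(\mathcal{E}_x(r))/\lambda(\mathcal{E}_y(r)) = \det E_x/\det E_y$ are exactly the quantities appearing in the filter:

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_in_ball(n):
    """Uniform random point in the unit Euclidean ball of R^n."""
    g = rng.standard_normal(n)
    return (rng.random() ** (1.0 / n)) * g / np.linalg.norm(g)

def johns_walk_step(A, b, x, r):
    """One lazy John's walk step from x for the polytope {z : A z <= b}."""
    if rng.random() < 0.5:                      # lazy coin: stay put
        return x
    Ex = john_ellipsoid_of_symmetrization(A, b, x)
    y = x + r * (Ex @ uniform_in_ball(len(x)))  # proposal uniform on E_x(r)
    Ey = john_ellipsoid_of_symmetrization(A, b, y)
    if np.linalg.norm(np.linalg.solve(Ey, x - y)) > r:
        return x                                # reject: x not in E_y(r)
    accept_prob = min(1.0, np.linalg.det(Ex) / np.linalg.det(Ey))
    return y if rng.random() < accept_prob else x
```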

By the Lebesgue decomposition, the transition probability measure $P_x({\cdot})$ of the non-lazy version of Algorithm 2.1 is absolutely continuous with respect to the measure

(2.7) \begin{equation} \mu \equiv \lambda + \delta_x,\end{equation}

where $\delta_x({\cdot})$ is the Dirac measure at x corresponding to a rejected move. The transition density is thus

(2.8) \begin{equation} \begin{aligned}p_x(y) &\equiv \frac{dP_x}{d\mu}(y) \\&= \alpha_x(y) q_x(y) 1_{\left\{{y \neq x, \, x \in \mathcal{E}_y(r)}\right\}} + \rho(x) 1_{\left\{{y = x}\right\}} \\&= \min \!\left[\frac{1}{\lambda(\mathcal{E}_x(r))}, \, \frac{1}{\lambda(\mathcal{E}_y(r))} \right]1_{\left\{{y \neq x, \, x \in \mathcal{E}_y(r), \, y \in \mathcal{E}_x(r)}\right\}} + \rho(x) 1_{\left\{{y = x}\right\}},\end{aligned}\end{equation}

where $1_{\left\{{\cdot}\right\}}$ is the indicator function and the rejection probability is denoted by $\rho(x)$ . Following the observation below, we analyze the mixing time of the walk.

2.2.1. An observation

As a consequence of John’s theorem, $\mathcal{E}_x = \mathcal{B}$ for $x = 0 \in \mathcal{K}$ if and only if the following conditions are satisfied: $\mathcal{B} \subset \mathcal{K}$ , and for some $m \geqslant n$ there are Euclidean unit vectors $\left\{{u_i}\right\}_{i=1}^m$ on the boundary of $\mathcal{K}$ and positive constants $\left\{{c_i}\right\}_{i=1}^m$ satisfying

(2.9) \begin{align}&\sum_{i=1}^m c_i u_i u_i^T = I_n,\end{align}

where $I_n$ denotes the identity matrix in $\mathbb{R}^{n\times n}$. Note that we have done away with the first condition $\sum_i c_i u_i = 0$, because not all contact points with $\mathcal{K}_x^s$ are contact points with $\mathcal{K}$. We are forced to place a zero weight on such vectors, and this is possible because, in such a case, $-u_i$ is a contact point with both $\mathcal{K}$ and $\mathcal{K}_x^s$.

3. Analysis of mixing time

In what follows we let a discrete-time, homogeneous Markov chain be the triple $\left\{{\mathcal{K}, \mathcal{A}, P_x({\cdot})}\right\}$ along with a distribution $P_0$ for the starting point, where the sample space is the convex body $\mathcal{K} \subset \mathbb{R}^n$ , the measurable sets on $\mathcal{K}$ are denoted by $\mathcal{A}$ , and $P_x({\cdot})$ denotes the transition measure for any $x \in \mathcal{K}$ . We say that $\pi$ is a stationary distribution of $\left\{{\mathcal{K}, \mathcal{A}, P_x({\cdot})}\right\}$ if for all $A \in \mathcal{A}$ ,

\begin{equation*}\int_{\mathcal{K}} \int_A P_x(dy) \, \pi(dx) = \pi(A).\end{equation*}

We say that $\left\{{\mathcal{K}, \mathcal{A}, P_x({\cdot})}\right\}$ is reversible with respect to the stationary distribution $\pi$ if for all $A, A^{\prime} \in \mathcal{A}$ ,

\begin{equation*}\int_{A^{\prime}} \int_A P_x(dy) \, \pi(dx) = \int_{A} \int_{A^{\prime}} P_y(dx) \, \pi(dy).\end{equation*}

3.1. Conductance and mixing times

We use the approach from [10] of lower-bounding the conductance of the chain to prove mixing times. The conductance is defined as follows.

Definition 1. (Conductance.) Let P be a discrete-time homogeneous Markov chain with kernel $P_x({\cdot})$ that is reversible with respect to the stationary measure $\pi({\cdot})$. Given $A \in \mathcal{A}$ with $0 < \pi(A) < 1$, the conductance of A is defined as

\begin{equation*}\phi(A) \equiv \frac{\Phi(A)}{\min\!\left\{{\pi(A), \pi(\mathcal{K}{\setminus}A)}\right\}},\end{equation*}

where $\Phi(A) \equiv \int_A P_u(\mathcal{K} {\setminus} A) \, d\pi(u)$ . The conductance of the chain is defined as

\begin{equation*}\phi \equiv \inf \!\left\{{\phi(A) {\;\big|\;} A \in \mathcal{A}, \, 0 < \pi(A) \leqslant 1/2}\right\}.\end{equation*}

Recall that the total variation distance between two measures $P_1, P_2$ on a measurable space $(\mathcal{K}, \mathcal{A})$ is

\begin{equation*}d_{TV}(P_1, P_2) = \sup_{A \in \mathcal{A}} |P_1(A) - P_2(A)|.\end{equation*}

Note that $|P_1(A) - P_2(A)| = |P_1(\mathcal{K}{\setminus} A) - P_2(\mathcal{K}{\setminus} A)|$ , so if the supremum is attained on any $A \in \mathcal{A}$ , then it is attained on $\mathcal{K} {\setminus} A \in \mathcal{A}$ as well. If $P_1$ and $P_2$ are both absolutely continuous with respect to a dominating measure $\mu$ and thus have densities $p_1 \equiv \frac{dP_1}{d\mu}$ and $p_2 \equiv \frac{dP_2}{d\mu}$ , respectively, the total variation distance may also be written as

(3.1) \begin{equation}\begin{aligned}d_{TV}(P_1, P_2) &= \frac{1}{2}\int \big|p_1 - p_2\big| d\mu \\&= 1 - \int \min\!(p_1, p_2)\, d\mu \\&= 1 - \int_{S_1} \!\left[\min \!\left(1, \frac{p_2}{p_1}\right)\right] p_1 \, d\mu \\&= 1 - \mathbb{E}_{P_1} \!\left[\min\!\left(1, \, \frac{p_2}{p_1}\right)\right],\end{aligned}\end{equation}

where $S_1 = \left\{{x {\;\big|\;} p_1(x) > 0}\right\}$. Recall that (3.1) does not depend on the choice of dominating measure $\mu$, provided that the densities are specified with respect to that measure. Additionally, note that the supremum in the definition of the total variation distance is attained on $\left\{{x {\;\big|\;} p_1(x) \geqslant p_2(x)}\right\}$, up to sets of $\mu$-measure zero (or alternatively on its complement). The following relationship between conductance and the total variation distance to the stationary measure was proven in [10].
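As a quick discrete sanity check of the equivalent forms in (3.1) (our illustration), take two densities on a finite space with $\mu$ the counting measure:

```python
import numpy as np

rng = np.random.default_rng(1)
p1 = rng.random(10); p1 /= p1.sum()   # two densities w.r.t. counting measure
p2 = rng.random(10); p2 /= p2.sum()

lhs = 0.5 * np.abs(p1 - p2).sum()     # (1/2) * integral of |p1 - p2|
rhs = 1.0 - np.minimum(p1, p2).sum()  # 1 - integral of min(p1, p2)
assert np.isclose(lhs, rhs)
```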

Theorem 2. (Lovász and Simonovits.) Let $\pi_0$ be the initial distribution for a lazy, reversible Markov chain with conductance $\phi$ and stationary measure $\pi$ , and let $\pi_t$ denote the distribution after t steps. Let $\pi_0$ be an M-warm start for $\pi$ ; i.e., we have $M = \sup_{A \in \mathcal{A}} \frac{\pi_0(A)}{\pi(A)}$ . Then

(3.2) \begin{equation}d_{TV}(\pi_t, \pi) \leqslant \sqrt{M}\left(1 - \frac{\phi^2}{2}\right)^t.\end{equation}

As a consequence, we have the following bound on the mixing time.

Corollary 1. Given $\epsilon > 0$ and $M \equiv \sup_{A \in \mathcal{A}} \frac{\pi_0(A)}{\pi(A)}$, after $t(\epsilon) \equiv \big\lceil \frac{2}{\phi^2} \log\!\big(\sqrt{M}/\epsilon\big) \big\rceil$ steps of the chain, we have $d_{TV}(\pi_{t(\epsilon)}, \pi) \leqslant \epsilon$. Thus the Markov chain mixes in ${\widetilde{O}}\big(\phi^{-2}\big)$ steps from a warm start.
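As a numeric illustration (ours), Corollary 1 converts a conductance lower bound directly into a step count; plugging in the bound $\phi = \Omega\big(n^{-7/2}\big)$ proved in Theorem 6 below exhibits the $n^7$ scaling:

```python
import numpy as np

def mixing_time(phi, M, eps):
    """Step count from Corollary 1: ceil((2 / phi^2) log(sqrt(M) / eps))."""
    return int(np.ceil((2.0 / phi ** 2) * np.log(np.sqrt(M) / eps)))

# With phi = n**(-3.5) (dropping the constant c for illustration), the
# bound grows like n^7 log(sqrt(M) / eps):
for n in (5, 10, 20):
    print(n, mixing_time(n ** (-3.5), M=10.0, eps=0.01))
```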

To find mixing times, it then suffices to lower-bound the conductance $\phi$ .

3.2. Isoperimetry

The typical means by which one finds lower bounds on the conductance is via isoperimetric inequalities. We first recall the cross-ratio used in the isoperimetric inequality we will employ.

Definition 2. (Cross-ratio.) Let $x, y \in \mathcal{K}$ , and let p, q be the end points of a chord in $\mathcal{K}$ passing through x, y, such that x lies in between p and y. The cross-ratio is defined to be

\begin{equation*}\sigma(x, y) = \frac{|x-y||p-q|}{|p - x||y - q|},\end{equation*}

where $|\cdot|$ denotes the Euclidean norm.

Additionally, for any $S_1, S_2 \subset \mathcal{K}$ , let

\begin{equation*}\sigma(S_1, S_2) = \inf_{x \in S_1, y \in S_2} \sigma(x, y).\end{equation*}
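A direct transcription of Definition 2 (our illustration), checked on a one-dimensional chord:

```python
import numpy as np

def cross_ratio(p, x, y, q):
    """sigma(x, y) for successive points p, x, y, q on a chord of K."""
    d = np.linalg.norm
    return d(x - y) * d(p - q) / (d(p - x) * d(y - q))

# Chord [0, 1] with x = 0.4 and y = 0.6:
p, x, y, q = (np.array([v]) for v in (0.0, 0.4, 0.6, 1.0))
print(cross_ratio(p, x, y, q))   # 0.2 * 1.0 / (0.4 * 0.4) = 1.25
```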

In [9], Lovász proved an isoperimetric inequality involving the cross-ratio, from which the conductance $\phi$ may be lower-bounded for the special case of the uniform distribution on a convex body $\mathcal{K} \subset \mathbb{R}^n$. This was extended to log-concave measures by Lovász and Vempala in [12], of which the uniform measure on a convex body is a special case. We state the latter result as follows.

Theorem 3. (Lovász and Vempala.) For any log-concave probability measure $\pi({\cdot})$ supported on $\mathcal{K}$ and a partition of $\mathcal{K}$ into measurable subsets $S_1$ , $S_2$ , and $S_3 = \mathcal{K}{\setminus}(S_1 \cup S_2)$ , we have

(3.3) \begin{equation}\pi(S_3) \geqslant \sigma(S_1, S_2) \pi(S_1)\pi(S_2).\end{equation}

3.3. Mixing of John’s walk

The key step in proving conductance lower bounds is to show that if two points x and y are close in geometric distance, then the total variation distance between the corresponding marginals $P_x$ and $P_y$ is bounded away from 1 by a controlled quantity. Note that given John’s ellipsoid $\mathcal{E}_x = \left\{{E_x u + x {\;\big|\;} |u| \leqslant 1}\right\}$ , where $E_x$ is symmetric and positive definite, a norm is induced via

\begin{equation*}\|y - x\|_x^2 = (y-x)^T E_x^{-2} (y - x).\end{equation*}

We first relate this norm to the cross-ratio as follows.

Theorem 4. Let $\|{\cdot}\|_x$ denote the norm induced by the John’s ellipsoid of $\mathcal{K}_x^s$ . Then

\begin{equation*}\sigma(x, y) \geqslant \frac{1}{\sqrt{n}}\|y- x\|_x.\end{equation*}

Proof. Noting that the cross-ratio is invariant to affine transformations, without loss of generality we may assume by a suitable affine transformation that the John’s ellipsoid of $\mathcal{K}_x^s$ is the unit ball, and thus $\|y - x\|_x = |y- x|$. Let p, x, y, q denote successive points on a chord of $\mathcal{K}$ through x and y. Then

\begin{equation*}\begin{aligned}\sigma(x,y) &= \frac{|x-y||p-q|}{|p-x||y-q|} \\&\geqslant \frac{|x-y|\left(|p-x| + |y - q|\right)}{|p-x||y-q|} \\&\geqslant \max \!\left(\frac{|x-y|}{|y-q|}\, ,\, \frac{|x-y|}{|p-x|} \right) \\&= \frac{|x-y|}{\min\!(|y-q|, |p-x|)}\\&\geqslant \frac{|x-y|}{\sqrt{n}},\end{aligned}\end{equation*}

where the last inequality follows from the containment

\begin{equation*}\mathcal{B} \subset \mathcal{K}_x^s \subset \sqrt{n} \mathcal{B}\end{equation*}

(see (2.4), applied to $\mathcal{K}_x^s$, whose John’s ellipsoid is the unit ball centered at x) and the observation that $\min\!(|y-q|, |p-x|) \leqslant \sqrt{n}$: if $|p-x| \leqslant |q-x|$, then the reflection of p about x lies on the segment xq, so $p \in \mathcal{K}_x^s$ and $|p-x| \leqslant \sqrt{n}$; otherwise $q \in \mathcal{K}_x^s$ and $|y-q| \leqslant |x-q| \leqslant \sqrt{n}.$

Before bounding the statistical distance between $P_x$ and $P_y$ given a bound on the geometric distance between x and y, we first state some useful lemmas regarding the ellipsoids $\mathcal{E}_x$ and $\mathcal{E}_y$ . The next lemma is a generalization of the Cauchy–Schwarz inequality to semidefinite matrices.

Lemma 1. (Semidefinite Cauchy–Schwarz.) Let $\alpha_1, \ldots, \alpha_m \in \mathbb{R}$ and let $A_1, \ldots, A_m \in \mathbb{R}^{r \times n}$ . Then

(3.4) \begin{equation}\left(\sum_{i=1}^m \alpha_i A_i\right) \left(\sum_{i=1}^m \alpha_i A_i\right)^T \preceq \left(\sum_{i=1}^m \alpha_i^2\right)\left(\sum_{i=1}^m A_i A_i^T\right),\end{equation}

where $A \preceq B$ signifies that $B - A$ is positive semidefinite.

Proof. The proof is as in Lemma 3.11 of [6]. For all i and j,

\begin{equation*}\big(\alpha_j A_i - \alpha_i A_j\big)\big(\alpha_j A_i - \alpha_i A_j\big)^T \succeq 0.\end{equation*}

Thus

\begin{equation*}\begin{aligned}0 &\preceq \frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m \big(\alpha_j A_i - \alpha_i A_j\big)\big(\alpha_j A_i - \alpha_i A_j\big)^T \\&= \frac{1}{2} \sum_{i=1}^m \left[\left(\sum_{j=1}^m \alpha_j^2\right)A_i A_i^T - \alpha_i A_i \sum_{j=1}^m \left(\alpha_j A_j^T\right) - \left(\sum_{j=1}^m \left(\alpha_j A_j\right)\right)\left(\alpha_i A_i^T\right) + \alpha_i^2 \sum_{j=1}^m A_j A_j^T\right]\\&= \left(\sum_{i=1}^m \alpha_i^2\right)\left(\sum_{i=1}^m A_i A_i^T\right) - \left(\sum_{i=1}^m \alpha_i A_i\right) \left(\sum_{i=1}^m \alpha_i A_i\right)^T.\end{aligned}\end{equation*}
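A random numeric spot-check of (3.4) (our illustration): the difference between the two sides should always be positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(2)
m, r_dim, n = 5, 3, 4
alpha = rng.standard_normal(m)            # the alpha_i
As = rng.standard_normal((m, r_dim, n))   # the A_i in R^{r x n}

S = sum(a * A for a, A in zip(alpha, As))            # sum_i alpha_i A_i
lhs = S @ S.T
rhs = (alpha ** 2).sum() * sum(A @ A.T for A in As)
assert np.linalg.eigvalsh(rhs - lhs).min() >= -1e-9  # rhs - lhs is PSD
```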

Now we study how the volume and aspect ratio of the John’s ellipsoid changes in a move from x to y. If the John’s ellipsoid centered at $x = 0$ is the unit ball (which we can assume after an affine transformation) and we make a move to y, the matrix $E_y$ such that

\begin{equation*}\mathcal{E}_y = \Big\{z| (z - y)^T E_y^{-2} (z - y) \leqslant 1\Big\}\end{equation*}

is the unique (by John’s theorem) maximizer of $\log \det E$ over symmetric positive definite matrices E, where $\mathcal{E} = \left\{{E u + y {\;\big|\;} |u| \leqslant 1}\right\}$, under the constraint that

(3.5) \begin{equation} \mathcal{E} \subseteq \mathcal{K}_y^s. \end{equation}

Noting that the John’s ellipsoid of $\mathcal{K}_x^s$ is a unit ball at the origin, we see that the matrix $E_x$ can be taken to be the identity matrix. All the contact points of $\mathcal{E}_x$ with $\mathcal{K}$ must lie on the boundary of $\mathcal{E}_y$ or outside $\mathcal{E}_y$. Note that if $u_i$ is a contact point of $\mathcal{E}_x$ with $\mathcal{K}$, then by definition $u_i^T u_i = 1.$ This implies that for any unit vector u,

\begin{equation*}E_y u+ y \in \mathcal{E}_y \subseteq \mathcal{K} \subseteq \Big\{z {\;\big|\;} u_i^T z \leqslant 1\Big\}.\end{equation*}

The last containment above uses the fact that the John’s ellipsoid $\mathcal{E}_x$ of $\mathcal{K}^s_x$ at $x=0$ is the unit ball and that $u_i$ is a contact point of $\mathcal{E}_x$ with $\mathcal{K}$. This in turn implies that

\begin{equation*}u_i^T\big(E_y u+ y\big) \leqslant 1.\end{equation*}

Since u may be an arbitrary unit vector, and $E_y$ is symmetric, this condition can be rewritten as

(3.6) \begin{equation} |E_y u_i| + \left \langle {u_i}, \, {y} \right \rangle \leqslant 1,\quad i = 1, \ldots, m.\end{equation}

Note that we do not claim that $E_y$ is the matrix with largest determinant that satisfies the above constraints. There may be other constraints corresponding to contact points for $\mathcal{E}_y$ that are not contact points of $\mathcal{E}_x$ .

Using Theorem 1 and Lemma 1, we deduce an upper bound on $\det E_y$ as follows.

Recall that we will denote a universal positive constant that is large by C and a universal positive constant that is small by c.

Lemma 2. Let $r = cn^{-5/2}$ , and assume y is chosen from a ball of radius r such that $\|y - x\|_x = |y - x| \leqslant r$ . Then

\begin{equation*}\det E_y \leqslant 1 + 2 n^{-2}.\end{equation*}

Proof. Note that by (3.6), $E_y$ satisfies the constraints $u_i^TE_y u_i \leqslant 1 - u_i^T y.$ Since the weights corresponding to the John’s ellipsoid of $\mathcal{K}_x^s$ are positive, the constraint implies that

\begin{equation*}\sum_{i=1}^m c_i u_i^T E_y u_i \leqslant \sum_{i=1}^m c_i\big(1- u_i^Ty\big).\end{equation*}

In the above equation, $c_i$ is chosen so that it is non-zero for a contact point of the John’s ellipsoid with $\mathcal{K}_x^s$ only if it is also a contact point for $\mathcal{K}$ itself. Such a choice of $c_i$ can always be made, because, as explained in Subsection 2.2.1, if $u_i$ is a contact point with $\mathcal{K}_x^s$ that is not a contact point with $\mathcal{K}$ , then $-u_i$ must be a contact point with both $\mathcal{K}_x^s$ and $\mathcal{K}$ , and so no weight need fall on $u_i$ ; we can compensate with a weight on $-u_i$ . By (2.2) and (2.3), and using the linearity and cyclic invariance of the trace, we have

\begin{equation*}\begin{aligned}\text{tr}(E_y) &\leqslant \sum_{i=1}^m c_i - \sum_{i=1}^m c_i u_i^T y \\&= n - \left(\sum_{i=1}^m c_i u_i\right)^T y.\end{aligned}\end{equation*}

To bound the second term, we consider $\big|\sum_i c_i u_i\big|$ and employ Lemma 1. Letting $\alpha_i = \sqrt{c_i}$ and $A_i = \sqrt{c_i} u_i$, we have

\begin{equation*}\left(\sum_{i=1}^m c_i u_i\right)\left(\sum_{i=1}^m c_i u_i\right)^T \preceq \left(\sum_{i=1}^m c_i\right)\left(\sum_{i=1}^m c_i u_i u_i^T\right).\end{equation*}

Noting that the right side is equal to $n I_n$ , and comparing the largest eigenvalue of the matrices on the left and the right side, it follows that

\begin{equation*}\Bigg|\sum_{i=1}^m c_i u_i\Bigg| \leqslant \sqrt{n}.\end{equation*}

Therefore, if y is chosen from a ball of radius $cn^{-5/2}$ , by Cauchy–Schwarz we conclude that

(3.7) \begin{equation}\text{tr}\big(E_y\big) \leqslant n + cn^{-2}.\end{equation}

Now, letting the eigenvalues of $E_y$ be denoted by $d_i > 0 $ , we have by the arithmetic–geometric mean inequality

\begin{equation*}\begin{aligned}\big(\!\det E_y\big)^{1/n} &= \left(\prod_{i=1}^n d_i \right)^{1/n} \\&\leqslant \frac{1}{n}\sum_{i=1}^n d_i \\&\leqslant \frac{1}{n}\left(n + cn^{-2}\right).\end{aligned}\end{equation*}

Thus,

\begin{align*}\det E_y &\leqslant \big(1 + cn^{-3}\big)^{n} \\& \leqslant 1 + n^{-2}.\end{align*}

The last inequality assumes that c is a sufficiently small positive constant.

We deduce a lower bound on $\det E_y$ by considering a positive definite matrix of the form $E = \beta\big(I_n - \alpha y y^T\big)$ such that the corresponding ellipsoid $y + \mathcal{E}$ is contained in the unit ball $\mathcal{E}_x$ . Note that such a matrix has eigenvalue $\beta\big(1 - \alpha |y|^2\big)$ of multiplicity 1 corresponding to the unit eigenvector $y/|y|$ , and eigenvalues $\beta$ of multiplicity $n - 1$ corresponding to any unit vector z which is orthogonal to y. We choose $\beta = 1 -\frac{|y|}{\sqrt{n}}$ , and $\alpha = 2\sqrt{n} |y|^{-1}.$

Lemma 3. We have $y + \mathcal{E} \subseteq \mathcal{E}_x $ , and hence $\text{vol}(\mathcal{E}_y) \geqslant \text{vol}(\mathcal{E}).$

Proof. Assume after rescaling that $\mathcal{E}_x$ is the unit ball $\mathcal{B}$ centered at the origin. We divide the points u on the boundary of $\mathcal{B}$ into two sets: $A = \left\{{u{\;\big|\;} \left \langle {u}, \, {y} \right \rangle \leqslant \frac{|y|}{\sqrt{n}}}\right\}$ and $B = \left\{{u {\;\big|\;} \left \langle {u}, \, {y} \right \rangle > \frac{|y|}{\sqrt{n}}}\right\}$ .

If $u\in A$ , we have

\begin{equation*}1 - \left \langle {u}, \, {y} \right \rangle \geqslant 1 - \frac{|y|}{\sqrt{n}},\end{equation*}

and noting that $E \succ 0$ and $0 \prec \big(I_n- \alpha y y^T\big) \preceq I_n$,

\begin{equation*}|E u| \leqslant \beta \leqslant 1 - \left \langle {u}, \, {y} \right \rangle.\end{equation*}

If $u \in B$ , we have

\begin{equation*}\begin{aligned}|E u|^2 &= \beta^2 u^T\big(I - \alpha y y^T\big)^2 u\\& < \beta^2\big(1 - \alpha \!\left \langle {u}, \, {y} \right \rangle^2\big)\\&= \beta^2\big(1 - 2\sqrt{n} |y|^{-1} \left \langle {u}, \, {y} \right \rangle^2\big)\\& < (1 - \left \langle {u}, \, {y} \right \rangle\!)^2.\end{aligned}\end{equation*}

Therefore, for any $u \in A \cup B$ , we see that $|Eu| + \left \langle {u}, \, {y} \right \rangle \leqslant 1$ . From this, it follows that for any $u, v \in \partial\mathcal{B},$

(3.8) \begin{equation}u^T(Ev + y) \leqslant 1.\end{equation}

Therefore, for all $v \in \partial\mathcal{B}$ , $|Ev + y| \leqslant 1$ .

Thus, $\mathcal{E} + y \subseteq \mathcal{E}_x \subseteq \mathcal{K}$; since $\mathcal{E}$ is symmetric about the origin, $\mathcal{E} + y$ is symmetric about y, and so $\mathcal{E} +y \subseteq \mathcal{K}_y^s.$ Hence the volume of the John’s ellipsoid of $\mathcal{K}_y^s$, namely $\mathcal{E}_y$, is at least the volume of $\mathcal{E}$.

Lemma 4. Let $r = cn^{-5/2}$ , and assume y is chosen from a ball of radius r such that $\|y - x\|_x = |y - x| \leqslant r$ . Then

\begin{equation*}\det E_y \geqslant 1 - 3 n^{-2}.\end{equation*}

Proof. Considering the matrix E as provided by Lemma 3, $E_y$ satisfies

\begin{equation*}\begin{aligned}\det E_y &\geqslant \det E \\&= \beta^n\big(1-\alpha|y|^2\big) \\&= \left(1-\frac{|y|}{\sqrt{n}}\right)^n\big(1 - 2\sqrt{n}|y|\big).\end{aligned}\end{equation*}

Thus

\begin{equation*}\begin{aligned}\det E_y &\geqslant \big(1 - cn^{-2}\big)\big(1 - 2cn^{-2}\big) \\&\geqslant \big(1 - 3 n^{-2}\big).\end{aligned}\end{equation*}

Lemmas 2 and 4 establish that for some universal constant $c > 0$ and $|y - x| \leqslant cn^{-5/2}$ , the volume ratio of $\mathcal{E}_y$ and $\mathcal{B}_x$ satisfies

(3.9) \begin{equation} 1 - 3n^{-2} \leqslant \frac{\text{vol}(\mathcal{E}_y)}{\text{vol}(\mathcal{B}_x)} \leqslant 1 + 2n^{-2}.\end{equation}

This does not necessarily indicate that the shapes of $\mathcal{E}_x$ and $\mathcal{E}_y$ are close, a property we require so that rejection does not occur too frequently. The following lemma guarantees that this is indeed the case.

Lemma 5. Let $d_1 \geqslant d_2 \geqslant \ldots \geqslant d_n > 0$ denote the eigenvalues of $E_y$ . Then the minimum eigenvalue $d_n$ satisfies

\begin{equation*}d_n \geqslant 1 - 4n^{-1}\end{equation*}

for large enough n. Similarly, the maximum eigenvalue $d_1$ satisfies

\begin{equation*}d_1 \leqslant 1 + Cn^{-1}\end{equation*}

for large enough n.

Proof. Assume the eigenvalues are ordered so that $d_1 \geqslant \ldots \geqslant d_n$ . If $d_n \geqslant 1$ , then of course $d_n \geqslant 1 - 4 n^{-1}$ , so we may assume that $\zeta \,:\!=\, 1 - d_n > 0$ . By Lemma 4 we see that

\begin{equation*}\det E_y = \prod_{i=1}^n d_i \geqslant 1 - 3n^{-2}.\end{equation*}

By the fact that $\text{tr}(E_y) \leqslant n + c/n^2$ (see (3.7)), making r smaller if necessary, we have that $\text{tr}(E_y) \leqslant n + n^{-2}.$ By the arithmetic–geometric mean inequality, we thus have

\begin{equation*}\begin{aligned}\big(1 - 3n^{-2}\big)/d_n &\leqslant \prod_{i < n} d_i \\&\leqslant \left(\frac{\sum_{i< n}d_i}{n-1}\right)^{n-1} \\&\leqslant \left(\frac{n + n^{-2} - d_n}{n-1}\right)^{n-1} \\&\leqslant \left(1 + \frac{1+n^{-2}-d_n}{n-1}\right)^{n-1}\\& \leqslant \exp\!\big(1 + n^{-2} - d_n\big).\end{aligned}\end{equation*}

The last inequality holds because $(1 + z/m)^m \leqslant \exp\!(z)$ for $z \geqslant 0$ and integer $m \geqslant 1$ . Thus, to summarize the above chain of inequalities,

(3.10) \begin{equation}\big(1 - 3/n^2\big) \leqslant \exp\!\big(1 + n^{-2} - d_n\big) d_n.\end{equation}

Then, (3.10) may be rephrased as

\begin{equation*}\exp\!\big(\zeta + n^{-2}\big)(1 - \zeta) \geqslant \Big(1 - \frac{3}{n^2}\Big).\end{equation*}

This implies that

(3.11) \begin{equation}\exp\!(\zeta)(1 - \zeta) \geqslant \Big(1 - \frac{6}{n^2}\Big).\end{equation}

We note that $\exp\!(\zeta)(1 - \zeta)$ is a monotonically decreasing function of $\zeta$ on (0, 1), because its derivative is

\begin{equation*}-\exp\!(\zeta) + \exp\!(\zeta)(1 - \zeta) = - \zeta \exp\!(\zeta).\end{equation*}

Recall that we say that a function $\tau$ of n satisfies $\tau = \Omega(1)$ if $\liminf\limits_{n \rightarrow \infty} \tau(n) > 0.$ If $\zeta = \Omega(1)$, then $\exp\!(\zeta)(1- \zeta) \leqslant 1 - \Omega(1),$ which would contradict (3.11). Therefore, we assume that $\zeta = o(1)$. But then

\begin{equation*}1 - \frac{\zeta^2(1 - \zeta)}{2} \geqslant \exp\!(\zeta)(1 - \zeta) \geqslant 1 - \frac{6}{n^2}\end{equation*}

gives us $\zeta \leqslant \frac{4}{n}$ , which implies the lower bound on $d_n$ .

To obtain the desired upper bound on $d_1,$ we first note that since there is a uniform lower bound of $1- \frac{4}{n}$ on all the $d_i$ , and as the product lies within $\Big(1 - \frac{3}{n^2}, 1 + \frac{2}{n^2}\Big)$ , we obtain a uniform upper bound of $2 e^4$ on the $d_i$ and hence an upper bound of $2 e^4$ on $d_1$ . Therefore, if $\|y - x\|_x \leqslant r$ then $\|x - y\|_y \leqslant 2 e^4 r$ . We may now reverse the roles of x and y, and obtain the following upper bound on $d_1$ :

\begin{equation*}d_1 \leqslant 1 + \frac{C}{n}.\end{equation*}

Now, to derive a lower bound on the conductance for John’s walk, we first must bound the statistical distance between two points given a bound on their geometric distance with respect to the local norm. Again without loss of generality, in what follows we may assume that $x = 0$ and that the John’s ellipsoid centered at x is the unit ball $\mathcal{B}_x$ (otherwise, perform an affine transformation so that this is the case). Let $x, y \in \mathcal{K}$ represent any two points in the body such that $\|y - x\|_x = |y - x| \leqslant r$ , where $r \in (0, 1)$ is a constant to be specified in terms of the dimension n. Let $P_x$ and $P_y$ denote the one-step transition probability measures defined at x and y, respectively. Let the uniform probability measures defined by the rescaled John’s ellipsoids $\mathcal{E}_x(r)$ and $\mathcal{E}_y(r)$ be denoted by $Q_x$ and $Q_y$ , respectively. We seek to bound

(3.12) \begin{equation}d_{TV}(P_x, P_y) \leqslant d_{TV}(P_x, Q_x) + d_{TV}(Q_x, Q_y) + d_{TV}(Q_y, P_y)\end{equation}

by choosing r such that the right side of (3.12) is $1-\Omega(1)$ .

To bound $d_{TV}(Q_x, Q_y)$ in (3.12), letting $\widetilde{Q}_y$ denote the probability measure corresponding to the uniform distribution on a ball of radius r centered at y, we may alternatively bound

(3.13) \begin{equation}d_{TV}(Q_x, Q_y) \leqslant d_{TV}\big(Q_x, \widetilde{Q}_y\big) + d_{TV}\big(\widetilde{Q}_y, Q_y\big).\end{equation}

We bound each term in (3.13) separately. To bound $d_{TV}\big(Q_x, \widetilde{Q}_y\big)$ , note that by our assumption that $\mathcal{E}_x = \mathcal{B}_x$ (the unit ball at x), the corresponding densities with respect to the dominating Lebesgue measure $\lambda$ are

\begin{equation*}q_x(z) = \left(\frac{1}{\lambda(\mathcal{B}_x(r))}\right) \cdot 1_{\left\{{z \in \mathcal{B}_x(r)}\right\}}\end{equation*}

and

\begin{equation*}\widetilde{q}_y(z) = \left(\frac{1}{\lambda\big(\mathcal{B}_y(r)\big)}\right) \cdot 1_{\left\{{z \in \mathcal{B}_y(r)}\right\}}.\end{equation*}

Thus, using (3.1) and noting that $\lambda(\mathcal{B}_x(r)) = \lambda\big(\mathcal{B}_y(r)\big)$ , we have

(3.14) \begin{equation}\begin{aligned}d_{TV}\big(Q_x, \widetilde{Q}_y\big) &= 1 - \int_{\mathcal{B}_x(r) \cap \mathcal{B}_y(r)} q_x(z)\, d\lambda(z) \\&= 1 - \frac{\lambda\!\left(\mathcal{B}_x(r) \cap \mathcal{B}_y(r)\right)}{\lambda(\mathcal{B}_x(r))}.\end{aligned}\end{equation}

The Lebesgue measure of $\mathcal{B}_x(r) \cap \mathcal{B}_y(r)$ is equal to twice the volume of a spherical cap. The following lemma regarding the volume of a spherical cap (Lemma 0.1 of [10]) is useful.

Lemma 6. Let $\mathcal{B}_x \subset \mathbb{R}^n$ be the Euclidean ball of unit radius centered at x. Let $\mathcal{H} \subset \mathbb{R}^n$ define a half-space at a distance of at most t from x (so x is not contained in the half-space). Then for $t \leqslant \frac{1}{\sqrt{n}}$, we have

\begin{equation*}\lambda(\mathcal{H}\cap \mathcal{B}_x) \geqslant \frac{1}{2}\left(1 - t\sqrt{n}\right) \lambda(\mathcal{B}_x).\end{equation*}
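A Monte Carlo spot-check of Lemma 6 (our illustration), using the standard recipe of sampling uniformly from the ball via a normalized Gaussian direction and a radius drawn as $U^{1/n}$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 9, 200_000
t = 1.0 / (2.0 * np.sqrt(n))                 # a distance with t <= 1/sqrt(n)

g = rng.standard_normal((N, n))
pts = g / np.linalg.norm(g, axis=1, keepdims=True)  # uniform on the sphere
pts *= rng.random((N, 1)) ** (1.0 / n)              # uniform in the ball

cap_fraction = (pts[:, 0] >= t).mean()       # half-space {z_1 >= t}
print(cap_fraction, 0.5 * (1.0 - t * np.sqrt(n)))   # empirical vs bound 0.25
```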

The following lemma results trivially from Lemma 6 and (3.14).

Lemma 7. Let $t \leqslant 1$ . If $\|y - x\|_x = |y-x| \leqslant \frac{rt}{\sqrt{n}}$ , then

\begin{equation*}d_{TV}\big(Q_x, \widetilde{Q}_y\big) \leqslant t.\end{equation*}

To bound $d_{TV}\big(\widetilde{Q}_y, Q_y\big)$ , note that we are bounding the total variation distance between a density supported on a ball and a density supported on an ellipsoid with the same center. The following lemma provides the bound.

Lemma 8. If $\|y-x\|_x \leqslant r = cn^{-5/2}$ , the total variation distance between $\widetilde{Q}_y$ and $Q_y$ satisfies

\begin{equation*}d_{TV}\big(\widetilde{Q}_y, Q_y\big) \leqslant 1/4.\end{equation*}

Proof. Note that by (3.1), we have

\begin{equation*}\begin{aligned}d_{TV}\big(\widetilde{Q}_y, Q_y\big) &= 1 - \mathbb{E}_{\widetilde{Q}_y}\!\left[\min \!\left(1, \, \frac{\lambda\big(\mathcal{B}_y(r)\big)}{\lambda(\mathcal{E}_y(r))} \right)1_{\left\{{Z \in \mathcal{E}_y(r)}\right\}}\right]\\&= 1- \min\!\left[1,\, \frac{\lambda(\mathcal{B}_y)}{\lambda(\mathcal{E}_y)}\right] \mathbb{P}_{\widetilde{Q}_y}\big(Z \in \mathcal{E}_y(r)\big) \\&\leqslant 1- \min\!\left[1,\, \frac{\lambda(\mathcal{B}_y)}{\lambda(\mathcal{E}_y)}\right] \mathbb{P}_{\widetilde{Q}_y}\big(Z \in \mathcal{E}_y(r)|B\big)\mathbb{P}_{\widetilde{Q}_y}(Z \in B),\end{aligned}\end{equation*}

where B denotes the event in which $Z \in \big(1 - \frac{C}{n}\big)\cdot\mathcal{B}_y(r)$ . By Lemma 5, it follows that $\mathbb{P}_{\widetilde{Q}_y}(Z \in \mathcal{E}_y(r)|B) = 1$ , since the smallest eigenvalue of $E_y$ is at least $1 - Cn^{-1}$ . Additionally, by (3.9),

\begin{equation*}\begin{aligned}\min\!\left[1, \, \frac{\lambda(\mathcal{B}_y)}{\lambda(\mathcal{E}_y)}\right] &\geqslant \frac{1}{1+Cn^{-2}} \\&\geqslant \exp\!\big({-}Cn^{-2}\big).\end{aligned}\end{equation*}

Now, noting that $\big(1-\frac{x}{2}\big) \geqslant e^{-x}$ for $x \in [0, 1]$, we have $\mathbb{P}_{\widetilde{Q}_y}(Z \in B) = \left(1-\frac{C}{n}\right)^n \geqslant e^{-2C}$, and

\begin{equation*}d_{TV}\big(\widetilde{Q}_y, Q_y\big) \leqslant 1 - \exp\!\left({-}C\big(2 + n^{-2}\big)\right).\end{equation*}

To bound $d_{TV}(P_x, Q_x)$ , we provide the following lemma.

Lemma 9. If $\|y-x\|_x \leqslant r = cn^{-5/2}$ , the total variation distance between $P_x$ and $Q_x$ satisfies

\begin{equation*}d_{TV}(P_x, Q_x) \leqslant 1/4.\end{equation*}

Proof. With some abuse of notation with regard to (2.6), temporarily let the density of $Q_x$ with respect to the dominating measure $\mu$ as defined by (2.7) be

\begin{equation*}q_x(y) \equiv \frac{dQ_x}{d\mu}(y) = \left(\frac{1}{\lambda(\mathcal{E}_x(r))}\right) \cdot 1_{\left\{{y \in \mathcal{E}_x(r), \, y \neq x}\right\}}.\end{equation*}

Then, since $q_x(x) = 0$ and $p_x(y) \leqslant q_x(y)$ for $y \neq x$ , by (3.1) we have

\begin{equation*}\begin{aligned}d_{TV}(P_x, Q_x) &= 1 - \int_{\left\{{y {\;\big|\;} q_x(y) \geqslant p_x(y)}\right\}} \min\!\left[q_x(y), \, p_x(y) \right] d\mu(y) \\&= 1 - \int_{\mathcal{K}{\setminus}\left\{{x}\right\}} \min\!\left[q_x(y), \, p_x(y) \right] d\lambda(y) \\&= 1 - \int_{\mathcal{E}_x(r) {\setminus}\left\{{x}\right\}} \min \!\left[\frac{1}{\lambda(\mathcal{E}_x(r))}, \, \frac{1}{\lambda(\mathcal{E}_y(r))} \right]\cdot 1_{\left\{{x \in \mathcal{E}_y(r)}\right\}}\, d\lambda(y) \\&= 1 - \int_{\mathcal{E}_x(r)}\left[\min\!\left(1, \, \frac{\lambda(\mathcal{E}_x(r))}{\lambda(\mathcal{E}_y(r))} \right)\right]\cdot 1_{\left\{{x \in \mathcal{E}_y(r)}\right\}}\cdot \left(\frac{1}{\lambda(\mathcal{E}_x(r))}\right)\, d\lambda(y) \\&= 1 - \mathbb{E}_{Q_x}\!\left[\min\!\left(1, \, \frac{\lambda(\mathcal{E}_x)}{\lambda(\mathcal{E}_Y)} \right) \cdot 1_{\left\{{Y \in A}\right\}}\right],\end{aligned}\end{equation*}

where we let A denote the ‘accept’ event in which $x \in \mathcal{E}_Y(r)$ . Since $\mathcal{E}_x = \mathcal{B}_x$ , as in the proof of Lemma 8, we have for all $Y \in A$

\begin{equation*}\min\!\left[1, \, \frac{\lambda(\mathcal{E}_x)}{\lambda(\mathcal{E}_Y)}\right] \geqslant \frac{1}{1+Cn^{-2}}.\end{equation*}

Therefore,

\begin{equation*}\begin{aligned}d_{TV}(P_x, Q_x) &\leqslant 1 - \left(\frac{1}{1+Cn^{-2}}\right)\mathbb{P}_{Q_x}(Y \in A) \\&\leqslant 1 - \left(\frac{1}{1+Cn^{-2}}\right)\mathbb{P}_{Q_x}(Y \in A|Y \in B)\mathbb{P}_{Q_x}(Y \in B),\end{aligned}\end{equation*}

where B is the event in which $Y \in \big(1-\frac{C}{n}\big)\cdot\mathcal{E}_x(r) = r\big(1-\frac{C}{n}\big)\cdot\mathcal{B}_x$ . Again by Lemma 5, $\mathbb{P}_{Q_x}(Y \in A| Y \in B) = 1$ . The remainder of the proof is as in Lemma 8.

Note that by a similar argument, $d_{TV}(Q_y, P_y) \leqslant 1/4$ for some universal $c > 0$ as well. Combining this with Lemmas 7, 8, and 9, we obtain the following theorem.

Theorem 5. If $\|y-x\|_x \leqslant \frac{rt}{\sqrt{n}} = ctn^{-3}$ for some universal constant $c > 0$ and some $t \leqslant 1$ , the total variation distance between $P_x$ and $P_y$ satisfies

\begin{equation*}d_{TV}(P_x, P_y) \leqslant 3/4 + t = 1 - \epsilon.\end{equation*}

In particular, we may choose $t = 1/8$ , so that $\epsilon = 1/8$ .

We finally arrive at a lower bound on the conductance for John’s walk using Theorems 3, 4, and 5. The proof of the next result is similar to that of Corollary 10 and Theorem 11 in [9].

Theorem 6. (Conductance lower bound.) Consider the partition $\mathcal{K} = S_1 \cup S_2$ where $S_1, S_2 \in \mathcal{A}$ , and let $\pi$ be the uniform measure on $\mathcal{K}$ , i.e.,

\begin{equation*}\pi(A) = \frac{\lambda(A)}{\lambda(\mathcal{K})} \quad \text{for all } A \in \mathcal{A}.\end{equation*}

Then for large enough n and $t = 1/8$ , we have

\begin{equation*}\int_{S_1} P_x(S_2) d\pi(x) \geqslant \left(\frac{c}{512n^{7/2}}\right) \min\!(\pi(S_1), \pi(S_2)),\end{equation*}

so $\phi = \Omega\big(n^{-7/2}\big)$ .

Proof. Note that the Radon–Nikodym derivative of $P_x$ with respect to the Lebesgue measure $\lambda$ is well-defined for all $y \in \mathcal{K}{\setminus}\left\{{x}\right\}$ , and is given as

\begin{equation*}\frac{dP_x}{d\lambda}(y) = \min \!\left[\frac{1}{\lambda(\mathcal{E}_x(r))}, \, \frac{1}{\lambda(\mathcal{E}_y(r))} \right] 1_{\left\{{x \in \mathcal{E}_y(r),\, y \in \mathcal{E}_x(r)}\right\}}.\end{equation*}

Let

\begin{equation*}\rho(x) \equiv \frac{d\pi}{d\lambda}(x) = \frac{1}{\lambda(\mathcal{K})} \cdot 1_{\left\{{x \in \mathcal{K}}\right\}}\end{equation*}

be the density for $\pi$ . Then for any $x, y \in \mathcal{K}$ such that $y \neq x$ , we have

\begin{equation*}\rho(x)\frac{dP_x}{d\lambda}(y) = \rho(y) \frac{dP_y}{d\lambda}(x),\end{equation*}

from which it follows that $\pi$ is the stationary measure for the chain.

Now consider points far inside $S_1$ that are unlikely to cross over to $S_2$ . Letting $t = 1/8$ so $\epsilon = 1/8$ as in Theorem 5, we define

\begin{equation*}S^{\prime}_1 \equiv S_1 \cap \left\{{x {\;\big|\;} \rho(x) P_x(S_2) < \frac{\epsilon}{2\lambda(\mathcal{K})}}\right\}.\end{equation*}

Similarly, let

\begin{equation*}S^{\prime}_2 \equiv S_2 \cap \left\{{y {\;\big|\;} \rho(y) P_y(S_1) < \frac{\epsilon}{2\lambda(\mathcal{K})}}\right\}.\end{equation*}

Since $\rho(x)P_x(S_2) \geqslant \epsilon/(2\lambda(\mathcal{K}))$ for $x \in S_1{\setminus} S^{\prime}_1$ , we have

\begin{equation*}\begin{aligned}\int_{S_1} P_x(S_2) d\pi(x) &\geqslant \int_{S_1 {\setminus} S^{\prime}_1} \rho(x) P_x(S_2) d\lambda(x) \\&\geqslant \frac{\epsilon \lambda\big(S_1{\setminus} S^{\prime}_1\big)}{2\lambda(\mathcal{K})} \\&= (\epsilon/2)\pi\big(S_1{\setminus} S^{\prime}_1\big).\end{aligned}\end{equation*}

Similarly, for $y \in S_2{\setminus} S^{\prime}_2$ we have

\begin{equation*}\int_{S_2} P_y(S_1) d\pi(y) \geqslant (\epsilon/2)\pi\big(S_2{\setminus} S^{\prime}_2\big).\end{equation*}

By the reversibility of the chain, we have

(3.15) \begin{equation}\int_{S_1} P_x(S_2) d\pi(x) = \int_{S_2} P_y(S_1)d\pi(y),\end{equation}

so it follows that

\begin{equation*}\begin{aligned}\int_{S_1} P_x(S_2)d\pi(x) &= \frac{1}{2}\int_{S_1} P_x(S_2)d\pi(x) + \frac{1}{2}\int_{S_2} P_y(S_1)d\pi(y) \\&\geqslant \frac{\epsilon}{4}\left(\pi\big(S_1{\setminus} S^{\prime}_1\big) + \pi\big(S_2{\setminus} S^{\prime}_2\big)\right) \\&= (\epsilon/4)\pi\big(\mathcal{K}{\setminus}\big(S^{\prime}_1 \cup S^{\prime}_2\big)\big).\end{aligned}\end{equation*}

Now let $\delta = ctn^{-3}$ . Assuming that $\pi\big(S^{\prime}_1\big) \leqslant (1-\delta)\pi(S_1)$ , we have $\pi\big(S_1{\setminus}S^{\prime}_1\big) = \pi(S_1) - \pi\big(S^{\prime}_1\big) \geqslant \delta\pi(S_1)$ , and thus

\begin{equation*}\begin{aligned}\int_{S_1} P_x(S_2) d\pi(x) &\geqslant (\epsilon\delta/2)\pi(S_1) \geqslant (\epsilon\delta/2) \min\!(\pi(S_1), \pi(S_2)) \\&= \left(\frac{c}{128n^{3}}\right) \min\!(\pi(S_1), \pi(S_2)),\end{aligned}\end{equation*}

which proves the claim. Similarly, if $\pi\big(S^{\prime}_2\big) \leqslant (1-\delta)\pi(S_2)$ , the claim is proved again using (3.15). Thus, assume that $\pi\big(S^{\prime}_1\big) > (1-\delta)\pi(S_1)$ and $\pi\big(S^{\prime}_2\big) > (1-\delta)\pi(S_2)$ . By Theorem 3, we have

\begin{equation*}\pi\big(\mathcal{K} {\setminus}\big(S^{\prime}_1 \cup S^{\prime}_2\big)\big) \geqslant \sigma\big(S^{\prime}_1, S^{\prime}_2\big)\pi\big(S^{\prime}_1\big)\pi\big(S^{\prime}_2\big).\end{equation*}

Now, given $x \in S^{\prime}_1$ and $y \in S^{\prime}_2$ , the total variation between $P_x$ and $P_y$ satisfies

\begin{equation*}\begin{aligned}d_{TV}(P_x, P_y) &\geqslant P_x(S_1) - P_y(S_1) \\&= 1 - P_x(S_2) - P_y(S_1) \\&> 1 - \epsilon.\end{aligned}\end{equation*}

By Theorem 5, it follows that $\|y-x\|_x > \delta$ . Then by Theorem 4 it follows that

\begin{equation*}\sigma\big(S^{\prime}_1, S^{\prime}_2\big) \geqslant n^{-1/2}\|y-x\|_x > \delta n^{-1/2}.\end{equation*}

Finally, we deduce that

\begin{equation*}\begin{aligned}\int_{S_1} P_x(S_2)d\pi(x) &\geqslant (\epsilon\delta n^{-1/2}/4) \pi\big(S^{\prime}_1\big)\pi\big(S^{\prime}_2\big) \\&\geqslant \left(\frac{\epsilon \delta(1-\delta)^2}{4\sqrt{n}}\right)\pi(S_1)\pi(S_2) \\&\geqslant \left(\frac{\epsilon \delta(1-\delta)^2}{8\sqrt{n}}\right)\min\!(\pi(S_1), \pi(S_2))\\&= \left(\frac{c(1-\delta)^2}{512n^{7/2}}\right)\min\!(\pi(S_1), \pi(S_2)).\end{aligned}\end{equation*}

The claim follows by absorbing terms into the constant for large enough n.

Now applying Corollary 1, we obtain our main theorem.

Theorem 7. For $\epsilon > 0$ and $M \geqslant \sup_{A \in \mathcal{A}} \frac{\pi_0(A)}{\pi(A)}$, after $t(\epsilon) = C n^7 \log\!\big(\sqrt{M}/\epsilon\big) $ steps of John’s walk, we have $d_{TV}(\pi_{t(\epsilon)}, \pi) \leqslant \epsilon$.

In fact, this theorem allows us to find mixing time bounds starting from any point that is not on the boundary of $\mathcal{K}$. Suppose we know that x belongs to the interior of $\mathcal{K}$ and satisfies the following chord condition: for all chords pq of $\mathcal{K}$ containing x, $\frac{|p-x|}{|q-x|} \in \big(\eta, \eta^{-1}\big)$ for some parameter $0 < \eta < 1$ that measures the centrality of x in $\mathcal{K}$. Then we see that

\begin{equation*}\lambda(\mathcal{B}_x(r)) \geqslant \left(\frac{r\eta}{\sqrt{n}}\right)^n \lambda(\mathcal{K}).\end{equation*}

After a random geometrically distributed time $\tau$ with mean bounded above by an absolute constant, the first nontrivial move occurs. The distribution of $x_\tau$ then has a density with respect to $\pi$ bounded above by

\begin{equation*}M = \left(\frac{r\eta}{\sqrt{n}}\right)^{-n}.\end{equation*}

We thus have the following theorem.

Theorem 8. Let $r = c n^{-\frac{5}{2}}$ be the radius of the John’s ellipsoids in the walk. For all chords pq of $\mathcal{K}$ containing x, assume $\frac{|p-x|}{|q-x|} \in \big(\eta, \eta^{-1}\big)$ for some parameter $0 < \eta < 1$ that measures the centrality of x in $\mathcal{K}$ . Then there is a random geometrically distributed time $\tau$ with mean bounded above by C such that for $\epsilon > 0$ , after $t(\epsilon) + \tau = C n^7 \left(n \log\!\big(\sqrt{n}/(r\eta)\big) + \log\!({1}/\epsilon)\right) + \tau$ steps of John’s walk, we have $d_{TV}(\pi_{t(\epsilon) + \tau}, \pi) \leqslant \epsilon$ .

4. Concluding remarks

We introduced an affine-invariant random walk akin to the Dikin walk which uses uniform sampling from John’s ellipsoids of a certain small radius of appropriately symmetrized convex sets to make steps, and showed that this walk mixes to within a total variation distance $\epsilon$ in $O\big(n^7\log \epsilon^{-1}\big)$ steps from a warm start. The type of convex body $\mathcal{K}$ is not specified (i.e., it need not be a polytope) in our analysis of the mixing time, but one must have access to the John’s ellipsoid of the current symmetrization of the convex body. A significant feature of this walk is that its mixing time from a warm start, or alternatively a ‘central point’ such as the center of mass, can be bounded above by a quantity that has absolutely no dependence on any parameter associated with the body apart from its dimension.

One potential avenue for obtaining an improved mixing time is to increase r and control the relative change of shape of the John’s ellipsoids $\mathcal{E}_x$ and $\mathcal{E}_y$ when $\|x - y\|_x < r/\sqrt{n}$ in a more direct manner. We suspect that the method of controlling the change in shape using the arithmetic–geometric mean inequality and the ratio of the volumes of $\mathcal{E}_x$ and $\mathcal{E}_y$ that appears in Lemma 5 might be suboptimal.

Acknowledgements

We are deeply grateful to the anonymous reviewers for their role in improving this paper by pointing out numerous places where the arguments needed greater clarity.

Funding information

H. N. was partially supported by a Ramanujan Fellowship and a Swarna Jayanti Fellowship instituted by the Government of India. H. N. also acknowledges the support of DAE project no. 12-R&D-TFR-5.01-0500.

Competing interests

There were no competing interests to declare that arose during the preparation or publication of this article.

References

[1] Ball, K. (1992). Ellipsoids of maximal volume in convex bodies. Geom. Dedicata 41, 241–250.
[2] Chen, Y., Dwivedi, R., Wainwright, M. J. and Yu, B. (2017). Fast MCMC sampling algorithms on polytopes. Preprint. Available at https://arxiv.org/abs/1710.08165.
[3] Gruber, P. M. (1988). Minimal ellipsoids and their duals. Rend. Circ. Mat. Palermo 37, 35–64.
[4] John, F. (1948). Extremum problems with inequalities as subsidiary conditions. In Studies and Essays: Courant Anniversary Volume, eds K. O. Friedrichs, O. E. Neugebauer and J. J. Stoker, Wiley-Interscience, New York, pp. 187–204.
[5] Kannan, R., Lovász, L. and Simonovits, M. (1997). Random walks and an ${O}^*\big(n^5\big)$ volume algorithm for convex bodies. Random Structures Algorithms 11, 1–50.
[6] Kannan, R. and Narayanan, H. (2012). Random walks on polytopes and an affine interior point method for linear programming. Math. Operat. Res. 37, 1–20.
[7] Laddha, A., Lee, Y. T. and Vempala, S. S. (2020). Strong self-concordance and sampling. In Proc. 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC 2020), eds K. Makarychev, Y. Makarychev, M. Tulsiani, G. Kamath and J. Chuzhoy, Association for Computing Machinery, New York, pp. 1212–1222.
[8] Lee, Y. T. and Sidford, A. (2013). Path finding I: solving linear programs with $\widetilde{O}(\sqrt{rank})$ linear system solves. Preprint. Available at https://arxiv.org/abs/1312.6677.
[9] Lovász, L. (1999). Hit-and-run mixes fast. Math. Program. 86, 443–461.
[10] Lovász, L. and Simonovits, M. (1993). Random walks in a convex body and an improved volume algorithm. Random Structures Algorithms 4, 359–412.
[11] Lovász, L. and Vempala, S. (2006). Hit-and-run from a corner. SIAM J. Computing 35, 985–1005.
[12] Lovász, L. and Vempala, S. (2007). The geometry of logconcave functions and sampling algorithms. Random Structures Algorithms 30, 307–358.
[13] Narayanan, H. (2016). Randomized interior point methods for sampling and optimization. Ann. Appl. Prob. 26, 597–641.