1. Transportation on the sphere
Optimal transportation involves moving unit mass from one probability distribution to another, at minimal cost, where the cost is measured by Wasserstein's distance.
Definition Let $(M,\,d)$ be a compact metric space and let $\mu$ and $\nu$ be probability measures on $M$. Then for $1\leq p<\infty$, Wasserstein's distance from $\mu$ to $\nu$ is $W_p(\nu,\, \mu )$, where
$$W_p(\nu,\, \mu )^p=\inf _\pi \Bigl\{ \iint _{M\times M} d(x,\,y)^p\, \pi ({\rm d}x\,{\rm d}y)\Bigr\},$$
where the infimum is taken over the probability measures $\pi$ on $M\times M$ that have marginals $\nu$ and $\mu$ (see [Reference Dudley8, Reference Villani14]).
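As a rough illustration of the definition, the following sketch computes $W_p$ between two uniform measures supported on equally many points of ${\bf S}^2$, with $d$ the great-circle distance. It assumes the standard fact that, for uniform measures on $n$ points each, an optimal coupling can be taken to be a permutation, so a brute-force search over permutations is exact for small $n$. The function names `geodesic_distance` and `wasserstein_p` are hypothetical and are not taken from the text.

```python
import itertools
import math


def geodesic_distance(x, y):
    """Great-circle distance between unit vectors x and y in R^3."""
    dot = sum(a * b for a, b in zip(x, y))
    # Clamp to [-1, 1] to protect acos against rounding error.
    return math.acos(max(-1.0, min(1.0, dot)))


def wasserstein_p(xs, ys, p=2):
    """W_p between the uniform measures on two equal-sized point lists.

    Brute force over all permutations; only sensible for small n.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("point lists must have equal length")
    best = min(
        sum(geodesic_distance(xs[i], ys[s[i]]) ** p for i in range(n))
        for s in itertools.permutations(range(n))
    )
    return (best / n) ** (1.0 / p)


north = (0.0, 0.0, 1.0)
south = (0.0, 0.0, -1.0)
east = (1.0, 0.0, 0.0)

# The same two points listed in a different order: zero transport cost.
w_same = wasserstein_p([north, east], [east, north])
# A point mass at the North Pole moved to the South Pole: distance pi.
w_poles = wasserstein_p([north], [south], p=1)
```

The brute-force search has factorial cost, so this is only a check of the definition on toy data and not a practical solver.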
Transportation inequalities are results that bound the transportation cost $W_p(\nu,\, \mu )^p$ in terms of $\mu$, $\nu$ and geometrical quantities of $(M,\,d)$. Typically, one chooses $\mu$ to satisfy special conditions, and then one imposes minimal hypotheses on $\nu$. In this section, we consider the case where $(M,\,d)$ is the unit sphere ${\bf S}^2$ in ${\bf R}^3$, and obtain transportation inequalities by vector calculus. In section two, we extend these methods to a connected, compact and $C^\infty$ smooth Riemannian manifold $(M,\,d)$.
On ${\bf S}^2$, let $\theta \in [0,\, 2\pi )$ be the longitude and $\phi \in [0,\, \pi ]$ the colatitude, so the area measure is ${\rm d}x=\sin \phi \, {\rm d}\phi \, {\rm d}\theta$. Let $ABC$ be a spherical triangle where $A$ is the North Pole; then by [Reference Kimura and Okamoto10] the Green's function $G(B,\,C)=-(4\pi )^{-1}\log (1-\cos d(B,\,C))$ may be expressed in terms of the longitude and colatitude of $B$ and $C$ via the spherical cosine formula. A related cost function is listed in [Reference Villani14], p 972. Given probability measures $\mu$ and $\nu$ on ${\bf S}^2$, we can form

with gradient in the $x$ variable

Proposition 1.1 Let $\mu$ and $\nu$
be nonatomic probability measures on ${\bf S}^2$
. Then

Proof. The Green's function is chosen so that $\nabla \cdot \nabla G(B,\,C)=\delta _B(C)-1/(4\pi )$ in the sense of distributions. Given nonatomic probability measures $\mu$
and $\nu$
on ${\bf S}^2$
, their difference $\mu -\nu$
is orthogonal to the constants on ${\bf S}^2,$
so for a $1$
-Lipschitz function $\varphi : {\bf S}^2\rightarrow {\bf R}$
, we have

so by Kantorovich's duality theorem [Reference Dudley8], the Wasserstein transportation distance is bounded by

Definition Suppose that $\mu$ is a probability measure and that $\nu$ is a probability measure that is absolutely continuous with respect to $\mu$, so $d\nu =vd\mu$ for some probability density function $v\in L^1(\mu )$. Then the relative entropy of $\nu$ with respect to $\mu$ is
$${\hbox {Ent}}(\nu \mid \mu )=\int v\log v\, {\rm d}\mu ,$$
where $0\leq {\hbox {Ent}}(\nu \mid \mu ) \leq \infty$ by Jensen's inequality.
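For measures on a finite set, the relative entropy reduces to a finite sum, which gives a quick way to see the bounds $0\leq {\hbox {Ent}}(\nu \mid \mu )\leq \infty$. The sketch below is an illustration only; the function name `relative_entropy` and the sample weights are hypothetical, and the conventions $0\log 0=0$ and ${\hbox {Ent}}=\infty$ when $\nu$ is not absolutely continuous with respect to $\mu$ are the usual ones, assumed here.

```python
import math


def relative_entropy(nu, mu):
    """Ent(nu | mu) = sum_i v_i log(v_i) mu_i, where v_i = nu_i / mu_i."""
    total = 0.0
    for n_i, m_i in zip(nu, mu):
        if n_i == 0.0:
            continue  # convention: 0 log 0 = 0
        if m_i == 0.0:
            return math.inf  # nu is not absolutely continuous w.r.t. mu
        v = n_i / m_i
        total += v * math.log(v) * m_i
    return total


mu = [0.25, 0.25, 0.25, 0.25]
nu = [0.5, 0.5, 0.0, 0.0]

ent_same = relative_entropy(mu, mu)  # density v = 1, so the entropy is 0
ent_nu = relative_entropy(nu, mu)    # density v = 2 on half the mass: log 2
ent_inf = relative_entropy(mu, nu)   # mu charges points that nu does not
```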
At $x\in {\bf S}^2$, we have tangent space $T_x{\bf S}^2=\{ y\in {\bf R}^3: x\cdot y=0\}$. For $y\in T_x{\bf S}^2$ with $\Vert y\Vert =1$, we consider $\exp _x(ty)=x\cos t+y\sin t$ so that $\exp _x(0)=x$, $\Vert \exp _x(ty)\Vert =1$ and $(d/{\rm d}t)_{t=0}\exp _x(ty)=y$; hence $\exp _x:T_x{\bf S}^2\rightarrow {\bf S}^2$ gives the exponential map. We let $J_{\exp _x}$ be the Jacobian determinant of this map.
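The formula $\exp _x(ty)=x\cos t+y\sin t$ is easy to test numerically. The sketch below, in which the function name `exp_map` and the sample vectors are hypothetical, evaluates $\exp _x(v)$ for a tangent vector $v$ of arbitrary length by writing $v=ty$ with $t=\Vert v\Vert$, and checks that the image lies on the sphere at great-circle distance $\Vert v\Vert$ from $x$ (for $\Vert v\Vert <\pi$).

```python
import math


def exp_map(x, v):
    """exp_x(v) on the unit sphere, for unit x and tangent v with x.v = 0."""
    t = math.sqrt(sum(c * c for c in v))
    if t == 0.0:
        return tuple(x)
    # Write v = t*y with |y| = 1 and apply exp_x(t y) = x cos t + y sin t.
    return tuple(xc * math.cos(t) + (vc / t) * math.sin(t)
                 for xc, vc in zip(x, v))


x = (0.0, 0.0, 1.0)
v = (0.3, -0.4, 0.0)  # tangent at x, with |v| = 0.5

y = exp_map(x, v)
norm_y = math.sqrt(sum(c * c for c in y))
dist_xy = math.acos(sum(a * b for a, b in zip(x, y)))
```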
Suppose that $\mu ({\rm d}x)=e^{-U (x)}{\rm d}x$ is a probability measure and $\nu$ is a probability measure that is absolutely continuous with respect to $\mu$, so $d\nu =vd\mu$. We say that a Borel function $\Psi :{\bf S}^2\rightarrow {\bf S}^2$ induces $\nu$ from $\mu$ if $\int f(y)\nu ({\rm d}y)=\int f(\Psi (x))\mu ({\rm d}x )$ for all $f\in C({\bf S}^2; {\bf R})$. McCann [Reference McCann12] showed that there exists $\Psi$ that gives the optimal transport strategy for the $W_2$ metric; further, there exists a Lipschitz function $\psi : {\bf S}^2\rightarrow {\bf R}$ such that $\Psi (x)=\exp _x(\nabla \psi (x))$; so that
$$W_2(\nu,\, \mu )^2=\int _{{\bf S}^2} d(\Psi (x),\,x)^2\mu ({\rm d}x)=\int _{{\bf S}^2}\Vert \nabla \psi (x)\Vert ^2\mu ({\rm d}x).$$
Talagrand developed $T_p$ inequalities in which $W_p(\nu,\, \mu )^p$
is bounded in terms of ${\hbox {Ent}}(\nu \mid \mu )$
, as in [Reference Villani14], p 569. In [Reference Cordero-Erausquin5] and [Reference Cordero-Erausquin, McCann and Schmuckenschläger6], the authors obtain some functional inequalities that are related to $T_p$
inequalities. Here we offer an approach that is more direct, and uses only basic differential geometry to augment McCann's fundamental result. The key point is an explicit formula for the relative entropy in terms of the optimal transport maps.
Lemma 1.2 Suppose that $\nu$ has finite relative entropy with respect to $\mu,$
and let

let $\Psi _t(x)=\exp _x(t\nabla \psi (x))$ for $t\in [0,\,1]$
. Then the relative entropy satisfies

where $A$ is positive definite, $H$
is symmetric and $A+H$
is also positive definite, and

If $\psi \in C^2,$ then equality holds in (1.8).
Proof. To express the relative entropy in terms of the transportation map, we adapt an argument from [Reference Blower1]. We have ${\hbox {Ent}}(\nu \mid \mu )=\int _{{\bf S}^2} \log v(\Psi (x))\mu ({\rm d}x)$, where the integrand is

where the final term arises from the Jacobian of the change of variable $y=\Psi (x)$, where $\Psi =\Psi _1$
and $\Psi _t(x)=\exp _x(t\nabla \psi (x))$
. We compute this Jacobian by the chain rule for derivatives with respect to $x$
. Specifically, by [Reference Cordero-Erausquin, McCann and Schmuckenschläger6] p 622, we have ${\hbox {Hess}}(\psi (x)+d(x,\,y)^2/2)\geq 0$
and

where $J_{\exp _x}$ is the Jacobian of $\exp _x:T_x{\bf S}^2\rightarrow {\bf S}^2$
and ${\hbox {Hess}}=D_x^2$
is the Hessian, where the expression is evaluated at $y=\exp _x(\nabla \psi (x))$
. For $x\in {\bf S}^2$
and $\tau \in {\bf R}^3$
such that $x\cdot \tau =0$
, we have $\tau \in T_x{\bf S}^2$
and

see [Reference Cordero-Erausquin5]. By a vector calculus computation, which we replicate from [Reference Cordero-Erausquin5], one finds

With $\psi :{\bf S}^2\rightarrow {\bf R}$ we have $\nabla \psi (x)\perp x$
, so $0=x\cdot \nabla \psi (x),$
hence $0=\nabla \psi (x)+{\hbox {Hess}}(\psi (x)) x$
. We write $\theta =\Vert \nabla \psi (x)\Vert$
for the angle between $x$
and $\Psi (x)$
so

let $v=x\times \theta ^{-1}\nabla \psi (x)$ where $\times$
denotes the usual vector product; then $\{ x,\, \theta ^{-1}\nabla \psi (x),\, v\}$
gives an orthonormal basis of ${\bf R}^3$
. Hence

and we obtain (1.13) from the final factor. Then by spherical trigonometry, we have
$$\cos d(x,\,y)=\langle x,\, y\rangle ,$$
so we have $\langle \nabla _x \cos d(x,\,y),\, \tau \rangle =\langle y,\, \tau \rangle$ and $\langle {\hbox {Hess}}_x\cos d(x,\,y)\tau,\, \tau \rangle =-(\cos d(x,\,y)) \Vert \tau \Vert ^2$
; so

hence $A$ is positive definite and is a rank-one perturbation of a multiple of the identity matrix. Note that the formulas degenerate on the cut locus $d(x,\,y)=\pi ;$
consider the international date line opposite the Greenwich meridian.
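The directional derivative formulas $\langle \nabla _x \cos d(x,\,y),\, \tau \rangle =\langle y,\, \tau \rangle$ and $\langle {\hbox {Hess}}_x\cos d(x,\,y)\tau,\, \tau \rangle =-(\cos d(x,\,y)) \Vert \tau \Vert ^2$ can be checked by finite differences along the geodesic $t\mapsto \exp _x(t\tau )$. The sketch below assumes the identification $\cos d(x,\,y)=\langle x,\,y\rangle$ for unit vectors and takes $\Vert \tau \Vert =1$; the helper names, the step size `h` and the sample points are hypothetical choices made for illustration.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cos_dist_along_geodesic(x, tau, y, t):
    """cos d(exp_x(t tau), y), for unit x, y and a unit tangent tau at x."""
    point = [xc * math.cos(t) + tc * math.sin(t) for xc, tc in zip(x, tau)]
    return dot(point, y)


x = (0.0, 0.0, 1.0)
tau = (1.0, 0.0, 0.0)              # unit tangent vector at x
y = (1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0)  # unit vector, not antipodal to x

h = 1e-4
g_plus = cos_dist_along_geodesic(x, tau, y, h)
g_zero = cos_dist_along_geodesic(x, tau, y, 0.0)
g_minus = cos_dist_along_geodesic(x, tau, y, -h)

# Central differences for the first and second derivatives at t = 0.
first_derivative = (g_plus - g_minus) / (2.0 * h)
second_derivative = (g_plus - 2.0 * g_zero + g_minus) / (h * h)

expected_first = dot(y, tau)    # <y, tau>
expected_second = -dot(x, y)    # -cos d(x, y) * |tau|^2 with |tau| = 1
```

The check is carried out away from the cut locus; at $d(x,\,y)=\pi$ the function $\cos d$ remains smooth, but the formulas for $d(x,\,y)^2/2$ built from it degenerate, as noted above.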
We have

in which

and we can combine the first two terms in (1.16) by the divergence theorem so

Hence from (1.11) we have

in which the Alexandrov Hessian [Reference Cordero-Erausquin, McCann and Schmuckenschläger6], [Reference Villani14] p 363 satisfies

where $\Delta _D\psi$ is the distributional derivative of the Lipschitz function $\psi$
; so we recognize (1.8).
We have an orthonormal basis

for ${\bf R}^3$ in which the final two vectors give an orthonormal basis for $T_x{\bf S}^2$
. Then

and

hence $A$ and $H$
have the form

with respect to the stated basis of $T_x{\bf S}^2$.
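The moving frame used in this proof, namely $x$, the unit vector $\theta ^{-1}\nabla \psi (x)$ and their vector product, can be checked to be orthonormal for a concrete choice of data. In the sketch below, the value `grad_psi` is a hypothetical sample tangent vector standing in for $\nabla \psi (x)$, and the helper names are not from the text.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    """Usual vector product in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


x = (0.0, 0.0, 1.0)
grad_psi = (0.3, 0.4, 0.0)          # sample tangent vector: x . grad_psi = 0
theta = math.sqrt(dot(grad_psi, grad_psi))  # theta = |grad psi(x)| = 0.5

e1 = tuple(c / theta for c in grad_psi)  # unit vector theta^{-1} grad psi(x)
e2 = cross(x, e1)                        # x times theta^{-1} grad psi(x)

frame = [x, e1, e2]
# Gram matrix of the frame; it equals the identity for an orthonormal basis.
gram = [[dot(a, b) for b in frame] for a in frame]
```

The last two vectors of `frame` span $T_x{\bf S}^2$, which is the basis with respect to which the matrices $A$ and $H$ are written.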
The function $f(x)=x-1-\log x$ for $x>0$ is convex and attains its minimum value $f(1)=0$ at $x=1$. Let $T$
be a self-adjoint matrix with eigenvalues $\lambda _1\geq \dots \geq \lambda _n$
where $\lambda _n>-1$
; then the Carleman determinant of $I+T$
is $\det _2(I+T)=\prod _{j=1}^n (1+\lambda _j)e^{-\lambda _j}$
. Since $A+H$
is positive definite, as in [Reference Blower1] corollary 4.3, we can apply the spectral theorem to compute the Carleman determinant and show that

so

Proposition 1.3 Suppose that the Hessian matrix of $U$ satisfies

for some $\kappa _U>0$. Then $\mu$
satisfies the transportation inequality

This applies in particular when $\mu$ is normalized surface area measure.
Proof. Let $K:[0,\, \pi )\rightarrow {\bf R}$ be the function

Then from (1.13) and (1.26) we have

Considering the final integral in (1.8), we have

which has constant speed $\Vert {\frac {\partial \Psi _t(x) }{\partial t}}\Vert =\Vert \nabla \psi (x)\Vert$ and $\langle {\frac {\partial \Psi _t(x) }{\partial t}},\, \Psi _t(x)\rangle =0;$
also

where the final term is zero since $\nabla U\circ \Psi _t(x)$ is in the tangent space at $\Psi _t(x)$
, hence is perpendicular to $\Psi _t(x)$
. We therefore have the crucial inequality

To simplify the function $K$, we recall from [Reference Gradshteyn and Ryzhik9] 8.342 the Maclaurin series

where we have introduced Euler's $\Gamma$ function and Riemann's $\zeta$
function, so

Now we consider (1.32) with the hypothesis (1.27) in force. The Carleman determinant contributes a nonnegative term as in (1.25), while the final integral in (1.32) combines with the integral of $K(\Vert \nabla \psi (x)\Vert )$ to give

When $\mu$ is normalized surface area, $U$
is a constant and the hypothesis (1.27) holds with $\kappa _U=1$
.
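The nonnegativity of the Carleman determinant term invoked in the proof above comes from the identity $-\log \det _2(I+T)=\sum _j f(1+\lambda _j)$ with $f(x)=x-1-\log x\geq 0$, which follows directly from the definition $\det _2(I+T)=\prod _j (1+\lambda _j)e^{-\lambda _j}$. The sketch below checks this on sample eigenvalues; the function names and the eigenvalue list are hypothetical, and the eigenvalues are supplied directly instead of being computed from a matrix.

```python
import math


def carleman_det(eigenvalues):
    """det_2(I + T) = prod_j (1 + lam_j) exp(-lam_j), for lam_j > -1."""
    product = 1.0
    for lam in eigenvalues:
        if lam <= -1.0:
            raise ValueError("eigenvalues must exceed -1")
        product *= (1.0 + lam) * math.exp(-lam)
    return product


def f(x):
    """f(x) = x - 1 - log x, convex on (0, inf) with minimum f(1) = 0."""
    return x - 1.0 - math.log(x)


eigenvalues = [0.5, -0.25, 2.0]  # sample spectrum of a self-adjoint T

det2 = carleman_det(eigenvalues)
minus_log_det2 = -math.log(det2)
sum_f = sum(f(1.0 + lam) for lam in eigenvalues)
```

Since each summand $f(1+\lambda _j)$ is nonnegative, $\det _2(I+T)\leq 1$, with equality only when $T=0$.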
2. Transportation on compact Riemannian manifolds
Let $M$ be a connected, compact and $C^\infty$ smooth Riemannian manifold of dimension $n$ without boundary, and let $g$ be the Riemannian metric tensor, giving metric $d$. Let $\mu ({\rm d}x)=e^{-U(x)}{\rm d}x$ be a probability measure on $M$ where ${\rm d}x$ is Riemannian measure and $U\in C^2(M; {\bf R})$. Suppose that $\nu$ is a probability measure on $M$ that is of finite relative entropy with respect to $\mu$. Then by McCann's theory [Reference McCann12], there exists a Lipschitz function $\psi :M\rightarrow {\bf R}$ such that $\Psi (x)=\exp _x(\nabla \psi (x))$ induces $\nu$ from $\mu$. We let $\Psi _t(x)=\exp _x(t\nabla \psi (x))$. We proceed to compute quantities which we need for our extension of lemma 1.2.
Given distinct points $x,\,y\in M$, we suppose that $x=\exp _y(\xi )$
, and for $w\in T_yM$
introduce

so that $t\mapsto \gamma (s,\,t)$ is a geodesic, and in particular $\gamma (0,\,t)$
is the geodesic from $y=\gamma (0,\,0)$
to $x=\gamma (0,\,1)$
. When $y=\exp _x(\nabla \psi (x))$
for a Lipschitz function $\psi :M\rightarrow {\bf R}$
, we can determine $\xi$
as follows. Let $\phi (z)=-\psi (z)$
and introduce its infimal convolution

which is attained at $x$ since $y=\exp _x(\nabla \psi (x))=\exp _x(-\nabla \phi (x))$
. Now $\phi ^{cc}(x)=\phi (x)$
, so

where the infimum is attained at $y$ since $\phi (x)+\phi ^c(y)=d(x,\,y)^2/2$
. By lemma 2 of [Reference McCann12], $\phi ^c$
is Lipschitz and

The speed of $\gamma (0,\,t)$ is given by

Let $R$ be the curvature of the Levi–Civita derivation $\nabla$
so

Then by [Reference Pedersen13] p 36, for all $Y\in T_xM$, the curvature operator $R_Y: X\mapsto R(X,\,Y)Y$
is self-adjoint with respect to the scalar product on $T_xM$
. Also

satisfies the initial conditions

and Jacobi's differential equation [Reference Chavel4] (2.43)

By calculating the first variation of the length formula [Reference Pedersen13] p 161, one shows that

Assume that there are no conjugate points on $\gamma (s,\,t)$. Then by varying $w$
, we can make $Y(0,\,1)$
cover a neighbourhood of $0$
in $T_xM$
. Let

and

Let $J_{\exp _x}(v)$ be the Jacobian of the map $T_xM\rightarrow M$
given by $v\mapsto \exp _x(v)$
, as in (3.4) of [Reference Cabre3].
Lemma 2.1 Suppose that $\Psi _t (x)=\exp _x(t \nabla \psi (x))$, where $\Psi _1$
induces the probability measure $\nu$
from $\mu$
and gives the optimal transport map for the $W_2$
metric. Then the relative entropy satisfies

where $H$ is symmetric and $A+H$
is also positive definite. If $\psi \in C^2(M; {\bf R})$
, then equality holds in (2.12).
Proof. This is similar to lemma 1.2. As in (1.5), we have

and by standard calculations [Reference Pedersen13] p 32 we have

since $\Psi _t(x)$ is a geodesic.
The curvature operator is the symmetric operator $R_Z:Y\mapsto R(Z,\,Y)Z$. If $M$
has nonnegative Ricci curvature so that $R_Z\geq 0$
as a matrix for all $Z$
, then we have

by (3.4) of [Reference Cabre3].
The following result recovers the Lichnerowicz integral, as in (4.16) of [Reference Blower1] and (1.1) of [Reference Deuschel and Stroock7]. This integral also appears implicitly in the Hessian calculations in appendix D of [Reference Lott and Villani11]. Let $\Vert H\Vert _{HS}$ be the Hilbert–Schmidt norm of $H$
.
Proposition 2.2 Suppose that $\psi \in C^2(M; {\bf R})$ and $\Psi _\tau (x)=\exp _x(\tau \nabla \psi (x))$
induces a probability measure $\nu _\tau$
from $\mu$
such that $\Psi _\tau$
is the optimal transport map for the $W_2$
metric. Then

Proof. For small $\tau >0$, we rescale $\psi$
to $\tau \psi$
and consider $y=\exp _x(\tau \nabla \psi (x))$
; then we return to $x$
along a geodesic $\gamma _\tau (t)=\exp _y(-t\nabla (-\tau \psi )^c(y))$
for $0\leq t\leq 1$
with constant speed $\tau \Vert \nabla \psi (x)\Vert$
. Observe that $\tau \psi (x)=(-\tau \psi )^c(y)-\tau ^2\Vert \nabla \psi (x)\Vert ^2/2$
, and $\nabla _xd(x,\,y)^2/2=-\exp _x^{-1}(y)=-\tau \nabla \psi (x)$
and $\nabla _yd(x,\,y)^2/2=-\exp _y^{-1}(x)=\nabla (-\tau \psi )^c(y)$
by Gauss's Lemma. Recalling that the curvature operator is self-adjoint by page 36 of [Reference Pedersen13], we choose the basis of $T_yM$
so that the first basis vector points along the direction of the geodesic $\gamma _\tau (0)$
. Hence Jacobi's equation (2.8) can be expressed as a second-order differential equation in block matrix form, with a symmetric matrix $S_{-\nabla (-\tau \psi )^c(y)}$
given by components of the curvature tensor such that

as in (2.4) of [Reference Cordero-Erausquin, McCann and Schmuckenschläger6]. Then the Jacobi equation reduces to a first-order block matrix equation with blocks of shape $(1+(n-1))\times (1+(n-1))$ in a $(2n)\times (2n)$
matrix

To find the limit as $\tau \rightarrow 0$, we can assume that $S_{-\nabla (-\tau \psi )^c (y)}$
is constant on the geodesic, and may be expressed as $\tau ^2 S$
where $\tau ^2 S=S_{\tau \nabla \psi (x)}$
has shape ${(n-1)\times (n-1)}$
. The functions $\cos \alpha$
and $\sin \alpha /\alpha$
are entire and even, so $\cos \sqrt {s}$
and $\sin \sqrt {s}/\sqrt {s}$
are entire functions, hence they operate on complex matrices. Note that the matrix

in the bottom left corner is symmetric, has rank less than or equal to $n-1$, and does not depend upon $t$
. Hence we consider the matrix

which has derivative

so we can use this formula to solve (2.18). So the approximate differential equation has solution

Hence by (2.9) we have

which gives rise to the approximation

and likewise we obtain

From (2.19), we have

so the result follows by lemma 2.1.
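The proof above relies on $\cos \sqrt {s}$ and $\sin \sqrt {s}/\sqrt {s}$ being entire functions of $s$, so that they are given by power series in $s$ and make sense for any sign of $s$. The sketch below evaluates the two series for a scalar argument, which corresponds to applying them to a single eigenvalue of a symmetric matrix via the spectral theorem; that reduction, the function names and the truncation at 30 terms are assumptions made for illustration.

```python
import math


def cos_sqrt(s, terms=30):
    """Partial sum of sum_k (-1)^k s^k / (2k)!, the series of cos(sqrt(s))."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -s / ((2 * k + 1) * (2 * k + 2))
    return total


def sinc_sqrt(s, terms=30):
    """Partial sum of sum_k (-1)^k s^k / (2k+1)!, for sin(sqrt(s))/sqrt(s)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -s / ((2 * k + 2) * (2 * k + 3))
    return total


# For s = alpha^2 > 0 the series give cos(alpha) and sin(alpha)/alpha.
alpha = 2.0
c_pos = cos_sqrt(alpha ** 2)
s_pos = sinc_sqrt(alpha ** 2)

# For s = -alpha^2 < 0 the same series give cosh(alpha) and sinh(alpha)/alpha.
c_neg = cos_sqrt(-alpha ** 2)
s_neg = sinc_sqrt(-alpha ** 2)
```

No square root is ever taken, which is why the same expressions cover positive and negative curvature terms alike.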
We conclude with a transportation inequality which generalizes proposition 1.3 to the unit spheres ${\bf S}^n$. See [Reference Blower and Bolley2] for a discussion of measures on product spaces.
Theorem 2.3 Let $M={\bf S}^n$ for some $n\geq 2,$
and suppose that

for some $\kappa _U>0$. Then

Proof. In this case, the curvature operator is constant, so we have $S_{\nabla \psi (x)} Y=\Vert \nabla \psi (x)\Vert ^2Y$, so

Thus the result follows with a similar proof to proposition 1.3 using data from the proof of proposition 2.2.
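In the constant curvature case used in this proof, where $S_{\nabla \psi (x)} Y=\Vert \nabla \psi (x)\Vert ^2Y$, each component of a Jacobi field orthogonal to the geodesic satisfies a scalar equation of the form $Y''+\theta ^2Y=0$ with $\theta =\Vert \nabla \psi (x)\Vert$, whose solutions are combinations of $\cos \theta t$ and $\sin \theta t$. The sketch below integrates this scalar equation by the classical Runge–Kutta method and compares with the closed form $\sin (\theta t)/\theta$. The initial conditions $Y(0)=0$, $Y'(0)=1$, the value of $\theta$ and the function name `solve_jacobi` are chosen here for illustration and are not taken from the text.

```python
import math


def solve_jacobi(theta, steps=1000):
    """Integrate Y'' + theta^2 Y = 0 on [0, 1] with Y(0)=0, Y'(0)=1 by RK4.

    Returns the value Y(1).
    """
    h = 1.0 / steps
    y, dy = 0.0, 1.0

    def rhs(y_val, dy_val):
        # First-order system: (Y, Y')' = (Y', -theta^2 Y).
        return dy_val, -theta * theta * y_val

    for _ in range(steps):
        k1y, k1d = rhs(y, dy)
        k2y, k2d = rhs(y + 0.5 * h * k1y, dy + 0.5 * h * k1d)
        k3y, k3d = rhs(y + 0.5 * h * k2y, dy + 0.5 * h * k2d)
        k4y, k4d = rhs(y + h * k3y, dy + h * k3d)
        y += h * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0
        dy += h * (k1d + 2.0 * k2d + 2.0 * k3d + k4d) / 6.0
    return y


theta = 1.2
numerical = solve_jacobi(theta)
closed_form = math.sin(theta) / theta
```

The factor $\sin \theta /\theta$ is positive for $0<\theta <\pi$ and vanishes at $\theta =\pi$, which matches the appearance of conjugate points at the antipode on the sphere.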
Acknowledgments
I thank Graham Jameson for helpful remarks concerning inequalities which led to (1.34). I am also grateful to the referee, whose helpful comments improved the exposition.