Sufficiency of $c$-cyclical monotonicity in a class of multi-marginal optimal transport problems

Luigi De Pascale; Anna Kausamo

doi:10.1017/S0956792524000202

Published online by Cambridge University Press: 14 May 2024

Luigi De Pascale*: Affiliation:
Dipartimento di Matematica e Informatica, Università di Firenze, Firenze, Italy
Anna Kausamo: Affiliation:
Dipartimento di Matematica e Informatica, Università di Firenze, Firenze, Italy
*: Corresponding author: Luigi De Pascale; Email: luigi.depascale@unifi.it

Abstract

$c$-cyclical monotonicity is the most important optimality condition for an optimal transport plan. While the proof of necessity is relatively easy, the proof of sufficiency is often more difficult or even elusive. We present here a new approach, and we show how known results are derived in this new framework and how this approach allows one to prove sufficiency in situations that could not previously be treated.

Keywords

Monge–Kantorovich problem; Optimality condition; c-cyclical monotonicity; ∞-cyclical monotonicity; Kantorovich potentials

MSC classification

Primary: 49J45 (Methods involving semicontinuity and convergence; relaxation); 49N15 (Duality theory)

Secondary: 49K21 (Problems involving relations other than differential equations); 49K27 (Problems in abstract spaces)

European Journal of Applied Mathematics, Volume 36, Issue 1, February 2025, pp. 68-81

DOI: https://doi.org/10.1017/S0956792524000202
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright: © The Author(s), 2024. Published by Cambridge University Press

1. Introduction

1.1 Optimal transport problems

Consider $N\ge 2$ Polish probability spaces $(X^1,\mu ^1),\ldots, (X^N,\mu ^N)$ and let $X=\prod _{i=1}^NX^i$ . Let $c\;:\;X\to [0,\infty ]$ and consider the multi-marginal optimal transport problems

(P)

\begin{equation} \min _{\gamma \in \Pi (\mu ^1,\ldots,\mu ^N)}C[\gamma ]\;:\!=\; \min _{\gamma \in \Pi (\mu ^1,\ldots,\mu ^N)}\int _X c d\gamma,\end{equation}

and

(P_∞)

\begin{equation} \min _{\gamma \in \Pi (\mu ^1,\ldots,\mu ^N)}C_\infty [\gamma ]\;:\!=\; \min _{\gamma \in \Pi (\mu ^1,\ldots,\mu ^N)} \gamma -\mathop{\mathrm{ess\,sup\,}}_{(x^1,\ldots,x^N)\in X} c,\qquad \end{equation}

in the set

\begin{equation*}\Pi (\mu ^1,\ldots,\mu ^N)\;:\!=\;\{\gamma \in \mathcal {P}(X)\,|\,\pi ^i(\gamma )=\mu ^i\text { for all }i=1,\ldots,N\},\end{equation*}

that is, in the set of couplings or transport plans between the $N$ marginals $\mu ^1,\ldots,\mu ^N$ . We refer to the second problem as the $\sup$ case. The first of these problems is widely encountered in the literature of the last thirty years. The second, although also old, has gained popularity only more recently, thanks to the applications of optimal transportation in machine learning (see, for example, [Reference Peyré and Cuturi17]).
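For finitely supported plans, both functionals reduce to finite computations, which may help to fix ideas. The following Python sketch is illustrative only: the representation of a plan as a dictionary from points to masses, and the helper names `marginal`, `cost_sum` and `cost_sup`, are assumptions made here and are not notation from the paper.

```python
from collections import defaultdict


def marginal(plan, i):
    """i-th marginal (0-based) of a discrete plan given as {point tuple: mass}."""
    mu = defaultdict(float)
    for point, mass in plan.items():
        mu[point[i]] += mass
    return dict(mu)


def cost_sum(plan, c):
    """C[gamma]: the integral of c against the discrete plan."""
    return sum(mass * c(*point) for point, mass in plan.items())


def cost_sup(plan, c):
    """C_infty[gamma]: for a discrete plan, the maximum of c over atoms of positive mass."""
    return max(c(*point) for point, mass in plan.items() if mass > 0)


# Two-marginal toy example on {0, 1} with cost |x - y| (hypothetical data).
c = lambda x, y: abs(x - y)
diagonal = {(0, 0): 0.5, (1, 1): 0.5}
crossed = {(0, 1): 0.5, (1, 0): 0.5}
```

Both toy plans have the same marginals, so they compete in (P) and in (P_∞); the diagonal plan has the smaller value for both functionals.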

Some general results on the multi-marginal optimal transport problem are available in refs. [Reference Carlier10, Reference Kellerer28, Reference Pass30–Reference Rachev and Rüschendorf32], and results for special costs are available, for example, in ref. [Reference Gangbo and Swiech23] for the quadratic cost, with some generalisations in ref. [Reference Heinich27], and in ref. [Reference Carlier and Nazaret12] for the determinant cost. More applications appeared in ref. [Reference Ghoussoub and Moameni24]. Applications of multi-marginal optimal transportation to economics include, for example, the team-matching problem, which is a generalisation of the classical marriage problem [Reference Carlier and Ekeland11, Reference Chiapporri, McCann and Nesheim13]. Applications to physics are related to quantum chemistry and the strongly interacting regime for particles, which are described in refs. [Reference Seidl33–Reference Seidl, Perdew and Levy35]. By now, there are several papers on the transport theory for the Coulomb cost and some more general repulsive costs; a selection is given by [Reference Buttazzo, De Pascale and Gori-Giorgi8, Reference Colombo, De Pascale and Di Marino14–Reference Cotar, Friesecke and Klüppelberg16, Reference Di Marino, Gerolin, Nenna, Bergounioux, Oudet, Rumpf, Carlier, Champion and Santambrogio20, Reference Friesecke, Mendl, Pass, Cotar and Klüppelberg21].

This paper is concerned with an optimality condition for the problems above, introduced in the next subsection. In particular, we study the sufficiency of this condition. We give a new and, in our opinion, easier-to-understand proof of some known results, and we show that this approach allows one to extend sufficiency results to a wider setting.

1.2 $\boldsymbol{c}$ -cyclical monotonicity, $\boldsymbol\infty$ - $\boldsymbol{c}$ -cyclical monotonicity and the main theorem

In this context, the $c$ -cyclical monotonicity takes the following form.

Definition 1.1. We say that a set $\Gamma \subset \prod _{i=1}^NX^i$ is $c$ -cyclically monotone (CM), if for every $k\in \mathbb{N}$ , every $k$ -tuple of points $(x^{1,j},\ldots, x^{N,j})_{j=1}^k$ of $\Gamma$ and every $(N-1)$ -tuple of permutations $(\sigma ^2,\ldots,\sigma ^N)$ of the set $\{1,\ldots, k\}$ we have

\begin{equation*}\sum _{j=1}^kc(x^{1,j},x^{2,j},\ldots,x^{N,j})\le \sum _{j=1}^kc(x^{1,j},x^{2,\sigma ^2(j)},\ldots,x^{N, \sigma ^N(j)})\,.\end{equation*}

We also say that $\gamma \in \Pi (\mu ^1,\ldots,\mu ^N)$ is $c$ -cyclically monotone if it is concentrated on a $c$ -cyclically monotone set.

Definition 1.2. We say that a set $\Gamma \subset \prod _{i=1}^NX^i$ is infinitely $c$ -cyclically monotone (ICM), if for every $k\in \mathbb{N}$ , every $k$ -tuple of points $(x^{1,j},\ldots, x^{N,j})_{j=1}^k$ of $\Gamma$ and every $(N-1)$ -tuple of permutations $(\sigma ^2,\ldots,\sigma ^N)$ of the set $\{1,\ldots, k\}$ we have

\begin{equation*}\max \{c(x^{1,j},x^{2,j},\ldots,x^{N,j})\,|\,j\in \{1,\ldots,k\}\}\le \max \{c(x^{1,j},x^{2,\sigma ^2(j)},\ldots,x^{N, \sigma ^N(j)})\,|\,j\in \{1,\ldots,k\}\}\,.\end{equation*}

We also say that a coupling $\gamma \in \Pi (\mu ^1,\ldots,\mu ^N)$ is infinitely cyclically monotone if it is concentrated on an ICM set.
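For a given finite family of points, the inequalities in Definitions 1.1 and 1.2 can be tested by brute force. The Python sketch below is a hedged illustration: the function name `tuple_is_monotone` and the tolerance are assumptions made here. It checks the inequality for one given $k$ -tuple only; verifying that a whole set $\Gamma$ is CM or ICM requires ranging over all $k$ -tuples of points of $\Gamma$ , and the cost of the enumeration grows like $(k!)^{N-1}$ .

```python
from itertools import permutations, product


def tuple_is_monotone(points, c, mode="sum", tol=1e-12):
    """Check the CM (mode="sum") or ICM (mode="max") inequality for ONE k-tuple.

    points: list of k points, each an N-tuple; c: cost taking N arguments.
    All (N-1)-tuples of permutations of {0, ..., k-1} are enumerated.
    """
    k, n_marginals = len(points), len(points[0])
    aggregate = sum if mode == "sum" else max
    base = aggregate(c(*p) for p in points)
    all_perms = list(permutations(range(k)))
    for sigmas in product(all_perms, repeat=n_marginals - 1):
        rearranged = aggregate(
            c(points[j][0],
              *(points[sigmas[i - 1][j]][i] for i in range(1, n_marginals)))
            for j in range(k)
        )
        if base > rearranged + tol:
            return False
    return True


# Hypothetical toy costs for N = 2 and N = 3.
c2 = lambda x, y: abs(x - y)
c3 = lambda x, y, z: abs(x - y) + abs(y - z) + abs(x - z)
```

With these toy costs, the pair of points $(0,0),(1,1)$ passes both tests, while the pair $(0,1),(1,0)$ fails them, since swapping the second coordinates lowers both the sum and the maximum of the costs.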

We will use the expression $c$ -cyclically monotone for both conditions above. The main theorem of this paper is the following.

Theorem 1.3. Let $\mu ^i \in \mathcal P(X^i)$ with compact support for $i=1,\ldots,N$ , let $c\;:\;X \to{\mathbb{R}}\cup \{+\infty \}$ be continuous. If $\gamma \in \Pi (\mu ^1,\ldots,\mu ^N)$ is an ICM plan for $c$ , then $\gamma$ is optimal for the problem $(P_\infty )$ .

Along the way, we will give a new proof of the following known result due, in a more general setting, to Griessler [Reference Griessler25].

Theorem 1.4. Let $\mu ^i \in \mathcal P(X^i)$ with compact support for $i=1,\ldots,N$ , let $c\;:\;X \to{\mathbb{R}}$ be continuous. If $\gamma \in \Pi (\mu ^1,\ldots,\mu ^N)$ is a CM plan for $c$ , then $\gamma$ is optimal for the problem $(P)$ .

We now discuss an important characterisation of $c$ -cyclically monotone transport plans. With this aim we define, still according to Griessler [Reference Griessler25],

Definition 1.5. Let $\gamma$ be a positive and finite Borel measure on $X$ . We say that $\gamma$ is finitely optimal if all its finitely supported submeasures are optimal with respect to their marginals. Here by submeasure we mean any probability measure $\alpha$ satisfying $\mathop{\textrm{supp}}\nolimits (\alpha )\subset \mathop{\textrm{supp}}\nolimits (\gamma )$ .

Proposition 1.6. If $\gamma \in \Pi (\mu ^1,\ldots, \mu ^N)$ is CM or ICM, then it is finitely optimal for the problem $(P)$ or $(P_\infty )$ , respectively.

Lemma 1.7. Let $\alpha =\sum _{i=1}^lm^i\delta _{( x^{1,i},\ldots, x^{N,i})}$ and $\overline \alpha =\sum _{i=1}^{\overline l}\overline{m}^i\delta _{(\overline{x}^{1,i},\ldots,\overline{x}^{N,i})}$ be two discrete measures with positive, integer coefficients and the same marginals. Let $A$ be the following table, and let us denote by $\tilde l=m^1 + \ldots +m^l$ its number of rows:

\begin{equation*} A=\begin {array}{l@{\quad}l@{\quad}l@{\quad}l} x^{1,1} & \ldots & x^{N,1} & \\[5pt] \vdots & & \vdots & \Big \}\, m^1 \mbox {- times} \\[5pt] x^{1,1} & \ldots & x^{N,1} & \\[5pt] \vdots & & \vdots & \\[5pt] x^{1,l} & \ldots & x^{N,l} & \\[5pt] \vdots & & \vdots & \Big \}\, m^l \mbox {- times} \\[5pt] x^{1,l} & \ldots & x^{N,l} & \end {array} \end{equation*}

where the first $m^1$ rows are equal among themselves, the following $m^2$ rows are equal among themselves and so on. Let $\overline A$ be the analogous table associated with $\overline \alpha$ . Then, $\overline A$ has $\tilde l$ rows, and there exist $N-1$ permutations $\sigma ^2,\ldots,\sigma ^N$ of the set $\{1,\ldots,\tilde l\}$ such that, up to reordering its rows, $\overline{A}$ is equal to

\begin{equation*} \begin {array}{l@{\quad}l@{\quad}l} x^{1,1} & \ldots & x^{N,\sigma ^N(1)} \\[5pt] \vdots & & \vdots \\[5pt] x^{1,1} & \ldots & x^{N,\sigma ^N (m^1)}\\[5pt] x^{1,2} & \ldots & x^{N, \sigma ^N(m^1 +1)}\\[5pt] \vdots & & \vdots \\[5pt] x^{1,l} & \ldots & x^{N,\sigma ^N(m^1+\ldots +m^{l-1}+1)}\\[5pt] \vdots & & \vdots \\[5pt] x^{1,l} & \ldots & x^{N, \sigma ^N (\tilde l)} \end {array} \end{equation*}

Proof. For each $k\in \{1,\ldots, N\}$ , the $k$ -th marginal of $\alpha$ is given by the sum of the Dirac masses centred on the points of the $k$ -th column of the table $A$ , counted with multiplicity. Analogously, the $k$ -th marginal of $\overline \alpha$ is given by the sum of the Dirac masses centred on the points of the $k$ -th column of the table $\overline A$ , counted with multiplicity. Since the marginals of $\alpha$ and $\overline \alpha$ are the same, each point $x^{k,i}$ appearing in the $k$ -th marginal must appear in the $k$ -th columns of both tables the same number of times, proving the existence of the bijections $\sigma ^2,\ldots,\sigma ^N$ as required. This also implies that $\overline A$ has $\tilde l$ rows.
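The construction in Lemma 1.7 is effective and can be sketched in a few lines of Python. This is only an illustration under assumptions made here: a discrete measure is given as a list of (point, integer mass) pairs, and the names `table`, `match_column` and `lemma_permutations` are hypothetical helpers, not notation from the paper. Indices are 0-based, so `sigmas[k - 1]` plays the role of $\sigma ^{k+1}$ .

```python
from collections import defaultdict


def table(measure):
    """Rows of the table: each support point repeated according to its integer mass."""
    rows = []
    for point, mass in measure:
        rows.extend([tuple(point)] * mass)
    return rows


def match_column(target, source):
    """Return a bijection sigma (as a list) with source[sigma[j]] == target[j] for all j."""
    available = defaultdict(list)
    for idx, value in enumerate(source):
        available[value].append(idx)
    sigma = []
    for value in target:
        if not available[value]:
            raise ValueError("the two columns are not equal as multisets")
        sigma.append(available[value].pop())
    return sigma


def lemma_permutations(alpha, alpha_bar):
    """Return A, the reordered A_bar, and sigmas with A_bar[j][k] == A[sigmas[k-1][j]][k]."""
    A, A_bar = table(alpha), table(alpha_bar)
    if len(A) != len(A_bar):
        raise ValueError("total masses differ")
    n_marginals = len(A[0])
    # Reorder the rows of A_bar so that its first column coincides with that of A.
    order = match_column([row[0] for row in A], [row[0] for row in A_bar])
    A_bar = [A_bar[i] for i in order]
    sigmas = [
        match_column([row[k] for row in A_bar], [row[k] for row in A])
        for k in range(1, n_marginals)
    ]
    return A, A_bar, sigmas


# Hypothetical example: two measures with the same marginals.
alpha = [(("a", "u"), 2), (("b", "v"), 1)]
alpha_bar = [(("a", "u"), 1), (("a", "v"), 1), (("b", "u"), 1)]
A, A_bar, sigmas = lemma_permutations(alpha, alpha_bar)
```

The greedy matching works because equality of the marginals means that each column of the two tables is the same multiset, which is exactly the counting argument of the proof.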

Proof of Proposition 1.6. We fix a finitely supported submeasure $\alpha =\sum _{i=1}^l a_i\delta _{X^i}$ of $\gamma$ , where, with a slight abuse of notation, $X^i=(x^{1,i},\ldots,x^{N,i})$ denotes a point of $\mathop{\textrm{supp}}\nolimits \gamma$ rather than a factor space. We need to show that $\alpha$ is an optimal coupling of its marginals. To do this, we fix another coupling, $\overline \alpha = \sum _{i=1}^{\overline l} \overline a _i\delta _{\overline X^i}$ , with the same marginals as $\alpha$ . We have to show that

(1)

\begin{equation} \tilde C[\alpha ]\le \tilde C[\overline \alpha ], \end{equation}

where $\tilde C$ is either of the two costs under consideration. Let us first assume that the discrete measures $\alpha$ and $\overline \alpha$ have rational coefficients. We consider the measures $M\alpha$ and $M \overline \alpha$ , where $M$ is the product of the denominators of the coefficients of $\alpha$ and $\overline \alpha$ . They are discrete measures having positive, integer coefficients and the same marginals, so we can apply Lemma 1.7 to find permutations $\sigma ^2,\ldots,\sigma ^N$ such that $M \alpha$ and $M \overline \alpha$ have representations $A$ and $\overline A$ , respectively. If $\tilde C=C$ , using the $c$ -cyclical monotonicity of $\alpha$ we have

\begin{equation*}MC[\alpha ]=\sum _{i=1}^{\tilde l} c(x^{1,i},\ldots, x^{N,i})\le \sum _{i=1}^{\tilde l} c(x^{1,i},x^{2,\sigma ^2(i)},\ldots, x^{N, \sigma ^N(i)})=MC[\overline \alpha ],\end{equation*}

proving the optimality of $\alpha$ . If $\tilde C=C_\infty$ , the conclusion is immediate:

\begin{equation*}C_\infty [\alpha ]=\max _{1\le i\le \tilde l}c(x^{1,i},\ldots, x^{N,i})\le \max _{1\le i\le \tilde l}c(x^{1,i},x^{2,\sigma ^2(i)},\ldots, x^{N,\sigma ^N(i)})= C_\infty [\overline \alpha ].\end{equation*}

Now, assume that $\alpha$ and $\overline \alpha$ have real (not necessarily rational) coefficients,

\begin{equation*}\alpha \;:\!=\; \sum _{i=1}^l a_i \delta _{X^i}, \ \overline \alpha = \sum _{i=1}^{\overline l} \overline a_i \delta _{\overline X ^i}. \end{equation*}

We show that for all $\varepsilon \gt 0$ there exist two discrete measures

\begin{equation*}\beta \;:\!=\; \sum _{i=1}^l q_i \delta _{X^i} \ \mbox {and} \ \ \overline \beta = \sum _{i=1}^{\overline l} \overline q_i \delta _{\overline X ^i}, \end{equation*}

with the same marginals, $q_i, \overline q_i \in \mathbb{Q}$ and

\begin{equation*}|a_i-q_i| \lt \varepsilon, \ |\overline a_i-\overline q_i| \lt \varepsilon . \end{equation*}

The fact that $\alpha$ and $\overline \alpha$ , concentrated on $X^1,\ldots,X^l$ and $\overline X^1, \ldots, \overline X^{\overline l}$ respectively, have the same marginals is equivalent to the fact that the vector $\underline{\textbf{a}}\;:\!=\;(a_1, \ldots, a_l, \overline a_1, \ldots, \overline a_{\overline l})$ is a solution of

\begin{equation*}\mathcal A \underline {\textbf {a}}=0,\end{equation*}

where $\mathcal A$ is a matrix with coefficients $1, 0, -1$ . Indeed, if we write, for example, the equality between the first marginals of $\alpha$ and $\overline \alpha$ , we obtain

\begin{equation*} \sum _{i=1}^l a_i \delta _{x^{1,i}}= \sum _{i=1}^{\overline l} \overline a_i \delta _{\overline x ^{1,i}}, \end{equation*}

so some of the points $\overline x^{1,i}$ must coincide with, for example, $x^{1,1}$ , and this gives, for two suitable sets of indices $I$ and $J$ ,

\begin{equation*}\sum _{i\in I} a_i = \sum _{j\in J} \overline a_j.\end{equation*}

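Since the matrix $\mathcal A$ has integer coefficients,

\begin{equation*}\overline {\mathrm{Ker}_{\mathbb {Q}}\, \mathcal A} = \mathrm{Ker}_{\mathbb {R}}\, \mathcal A,\end{equation*}

and this allows us to choose $\beta$ and $\overline \beta$ . For $\varepsilon$ small enough, the coefficients $q_i$ and $\overline q_i$ are positive, so $\beta$ and $\overline \beta$ have the same supports as $\alpha$ and $\overline \alpha$ , respectively. Since $C[\alpha ]\approx C[\beta ]$ , $C[\overline \alpha ]\approx C[\overline \beta ]$ , $C_\infty [\alpha ]=C_\infty [\beta ]$ and $C_\infty [ \overline \alpha ]=C_\infty [\overline \beta ]$ , applying the rational case to $\beta$ and $\overline \beta$ and letting $\varepsilon \to 0$ we conclude.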
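The reduction from rational to integer coefficients used in the proof above is elementary, and can be made concrete as follows. The Python sketch is a minimal illustration under assumptions made here: the weights are stored as `fractions.Fraction` objects and the function name `clear_denominators` is hypothetical. As in the proof, $M$ is taken to be the product of all denominators (a least common multiple would also work).

```python
from fractions import Fraction
from math import prod


def clear_denominators(weights, weights_bar):
    """Return M and the integer coefficients of M*alpha and M*alpha_bar."""
    M = prod(w.denominator for w in list(weights) + list(weights_bar))
    scaled = [int(w * M) for w in weights]
    scaled_bar = [int(w * M) for w in weights_bar]
    return M, scaled, scaled_bar


# Hypothetical rational weights of two discrete measures with the same total mass.
M, m, m_bar = clear_denominators(
    [Fraction(1, 2), Fraction(1, 2)],
    [Fraction(1, 3), Fraction(2, 3)],
)
```

The integer coefficients obtained in this way are the multiplicities $m^i$ and $\overline m^i$ to which Lemma 1.7 is applied.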

2. Essential background and preliminary results

The $c$ -cyclical monotonicity is the most important optimality condition for a transport plan. Giving a satisfactory historical background would require a paper of its own, and we refer the reader to the survey [Reference De Pascale, Kausamo and Wyczesany19]. Originally born in convex analysis as a characterisation of the sub-differentials of convex functions, for $N=2$ it first appeared as an optimality condition in ref. [Reference Knott and Smith29] in an equivalent formulation of Kantorovich’s problem. In that context, which was partly motivated by some models appearing in financial mathematics, the authors started by characterising optimal random variables using $c$ -cyclical monotonicity.

For the quadratic cost $c(x,y)=|x-y|^2$ in ${\mathbb{R}} ^d$ , the necessity of the condition is a basic result. See, for example, Prop. 2.24 of [Reference Villani36]. The classical structure of cyclical monotonicity of optimal plans was mentioned as a possible alternative tool in ref. [Reference Brenier7] and explicitly exploited in ref. [Reference Caffarelli9]. After that, in ref. [Reference Gangbo and McCann22], the authors extended the result to lower semi-continuous cost functions bounded from below. They showed that every finite optimal plan with respect to such a cost lies on a $c$ -cyclically monotone set.

For more general settings there are, essentially, two arguments to prove that the support of the optimal plan must be $c$ -cyclically monotone. The first one uses duality and appears in ref. [Reference Knott and Smith29], while the second one relies on modifying a transport plan that is not $c$ -cyclically monotone and showing that its cost can be improved. The latter technique was introduced in ref. [Reference Abdellaoui and Henich1] and used, for example, in Proposition 2.24 of [Reference Villani36]. Both approaches can be extended to the multi-marginal case with few technical modifications.

To the best of our knowledge, for $N=2$ the most general result was proved in ref. [Reference Beiglböck, Goldstern, Maresch and Schachermayer4], whose authors removed the regularity assumptions on the cost, proving that if $X, \,Y$ are Polish spaces equipped with Borel probability measures $\mu, \nu$ and $c \;:\; X \times Y \to [0, \infty ]$ is a Borel measurable cost function, then every optimal transport plan with finite total cost is $c$ -cyclically monotone.

Concerning the sufficiency of the condition, we reported above Theorem 1.4, which seems to be the most general result available in the case $N\gt 2$ . Much more is known for $N=2$ , and we will comment on this at the end of the paper.

2.1 Lower semi-continuity, compactness and existence of minimisers

Existence of minimisers for the optimal transport problems above is usually obtained by the direct method of the Calculus of Variations. Here, we briefly report the tools which we did not find elsewhere or that will be used substantially in our proofs. A useful notion of convergence on the set of transport plans is tight convergence.

Definition 2.1. Let $X$ be a metric space and let $\gamma _n, \gamma \in \mathcal P (X)$ . We say that $\gamma _n$ converges tightly to $\gamma$ if for all $\phi \in C_b (X)$

\begin{equation*} \int \phi d\gamma _n \to \int \phi d \gamma .\end{equation*}

The tight convergence will be denoted by $\stackrel{*}{\rightharpoonup }$ .

Definition 2.2. Let $\Pi$ be a set of Borel probability measures on a metric space $X$ . We say that $\Pi$ is tight (or uniformly tight) if for all $\varepsilon \gt 0$ there exists $K_\varepsilon \subset X$ compact such that

\begin{equation*}\gamma (K_\varepsilon )\gt 1-\varepsilon \ \mbox {or, equivalently,} \ \gamma (X\setminus K_\varepsilon )\lt \varepsilon \end{equation*}

for all $\gamma \in \Pi$ .

Theorem 2.3 (Prokhorov). Let $X$ be a complete and separable metric space (Polish space). Then, $\Pi \subset \mathcal P (X)$ is tight if and only if it is pre-compact with respect to the tight convergence.

Remark 2.4.

1. With respect to the tight convergence, the map $\gamma \mapsto \gamma (A)$ is lower semi-continuous if $A$ is open and upper semi-continuous if $A$ is closed;
2. If $X$ is complete and separable, then every singleton $\Pi =\{\gamma \}$ is tight.

The following compactness theorem will be used in this paper.

Theorem 2.5. For $i=1, \ldots, N$ , let $X^i$ be a Polish space. Let $X= X^1 \times \ldots \times X^N$ . Let $\mathcal M^i \subset \mathcal P (X^i)$ be tight for all $i$ . Then, the set

\begin{equation*}\Pi =\{\gamma \in \mathcal P (X)\ | \ \pi ^i_\sharp \gamma \in \mathcal M^i \}\end{equation*}

is tight.

Proof. Let $\varepsilon \gt 0$ . By the tightness of $\mathcal M^i$ , we can fix a compact set $K^i \subset X^i$ such that for all $\mu ^i \in \mathcal M^i$

\begin{equation*}\mu ^i (X^i \setminus K^i)\lt \frac \varepsilon N .\end{equation*}

Let $K= K^1 \times \ldots \times K^N$ , which is compact, and let $\gamma \in \Pi$ . Since all the marginals $\pi ^i_\sharp \gamma$ belong to $\mathcal M ^i$ , and since

\begin{equation*} X\setminus K \subset \biggl ((X^1\setminus K^1 )\times \prod _{k=2}^NX^k\biggr )\cup \biggl (X^1\times (X^2\setminus K^2 )\times \prod _{k=3}^NX^k\biggr )\cup \cdots \cup \biggl (\prod _{k=1}^{N-1}X^k\times (X^N\setminus K^N)\biggr ), \end{equation*}

one gets

\begin{equation*} \gamma (X\setminus K)\leq \sum _{i=1}^N \pi ^i_\sharp \gamma (X^i\setminus K^i)\lt \varepsilon . \end{equation*}

Corollary 2.6. By Prokhorov’s theorem, a set $\Pi \subset \mathcal P(X)$ as in the theorem above is pre-compact for the tight convergence. This is, in particular, true if $\mathcal M^i=\{\mu ^i\}$ .

If $c$ is lower semi-continuous, then also the functionals $C$ and $C_\infty$ (defined in problems (P) and (P_∞) of Section 1.1) are lower semi-continuous with respect to the tight convergence of measures. The lower semi-continuity of $C$ is a standard result of optimal transport theory (see, e.g. [Reference Villani37] or [Reference Kellerer28] for the multi-marginal case). The next lemma proves the lower semi-continuity of $C_\infty$ .

Lemma 2.7. If the function $c\;:\;X\to{\mathbb{R}}\cup \{+\infty \}$ is lower semi-continuous, then also the functional $C_\infty$ is lower semi-continuous.

Proof. First, we note that, thanks to the lower semi-continuity of $c$ , its $\gamma$ -essential supremum can be written as

\begin{equation*}\gamma -\mathop {\mathrm {ess\,sup\,}} c=\sup \{c(x^1,\ldots,x^N)\,| \,(x^1,\ldots,x^N)\in \mathop {\textrm {supp}}\nolimits \gamma \}.\end{equation*}

Fix $\gamma \in \Pi (\mu ^1,\ldots,\mu ^N)$ and let $(\gamma ^n)_n$ be a sequence converging tightly to $\gamma$ . Let $v\in \mathop{\textrm{supp}}\nolimits \gamma$ . Every open ball centred at $v$ has positive $\gamma$ -measure and hence, by the lower semi-continuity on open sets recalled in Remark 2.4, positive $\gamma ^n$ -measure for $n$ large enough. Therefore, there exists a sequence $v^n=(x^{1,n},\ldots,x^{N,n})\in \prod _{i=1}^NX^i$ such that $v^n\in \mathop{\textrm{supp}}\nolimits \gamma ^n$ for all $n$ and $v^n\to v$ . Moreover,

\begin{equation*}\liminf _{n\to \infty }C_\infty [\gamma ^n]\ge \liminf _{n\to \infty }c(v^n)\ge c(v)\,.\end{equation*}

Since the above inequality holds for all $v\in \mathop{\textrm{supp}}\nolimits \gamma$ , taking the supremum over $v$ gives $\liminf _{n\to \infty }C_\infty [\gamma ^n]\ge C_\infty [\gamma ]$ , and the claim follows.

The use of compactness and semi-continuity theorems above gives the existence of optimal transport plans for both problems considered here.

2.2 $\boldsymbol\Gamma$ -convergence

A crucial tool that we will use in this paper is $\Gamma$ -convergence. All the details can be found, for instance, in Braides’s book [Reference Braides6] or in the classical book by Dal Maso [Reference Dal Maso18]. In what follows, $(X,d)$ is a metric space or a topological space equipped with a convergence.

Definition 2.8. Let $(F_n)_n$ be a sequence of functions $X \to \bar{\mathbb{R}}$ . We say that $(F_n)_n$ $\Gamma$ -converges to $F$ if for any $x \in X$ we have

• for any sequence $(x^n)_n$ of $X$ converging to $x$
\begin{equation*} \liminf \limits _n F_n(x^n) \geq F(x) \qquad \text {($\Gamma $-liminf inequality);}\end{equation*}
• there exists a sequence $(x^n)_n$ converging to $x$ and such that
\begin{equation*} \limsup \limits _n F_n(x^n) \leq F(x) \qquad \text {($\Gamma $-limsup inequality).} \end{equation*}

This definition is actually equivalent to the following equalities for any $x \in X$ :

\begin{equation*} F(x) = \inf \left \{ \liminf \limits _n F_n(x^n) \;:\; x^n \to x \right \} = \inf \left \{ \limsup \limits _n F_n(x^n) \;:\; x^n \to x \right \} \end{equation*}

The function $x \mapsto \inf \left \{ \liminf \limits _n F_n(x^n) \;:\; x^n \to x \right \}$ is called $\Gamma$ -liminf of the sequence $(F_n)_n$ and the other one its $\Gamma$ -limsup. A useful result is the following (which for instance implies that a constant sequence of functions does not $\Gamma$ -converge to itself in general).

Proposition 2.9. The $\Gamma$ -liminf and the $\Gamma$ -limsup of a sequence of functions $(F_n)_n$ are both lower semi-continuous on $X$ .

The main interest of $\Gamma$ -convergence resides in its consequences in terms of convergence of minima.

Theorem 2.10. Let $(F_n)_n$ be a sequence of functions $X \to \bar{\mathbb{R}}$ and assume that $F_n$ $\Gamma$ -converges to $F$ . Assume moreover that there exists a compact and non-empty subset $K$ of $X$ such that

\begin{equation*} \forall n\in \mathbb {N}, \; \inf _X F_n = \inf _K F_n \end{equation*}

(we say that $(F_n)_n$ is equi-mildly coercive on $X$ ). Then, $F$ admits a minimum on $X$ and the sequence $(\inf _X F_n)_n$ converges to $\min F$ . Moreover, if $(x_n)_n$ is a sequence of $X$ such that

\begin{equation*} \lim _n F_n(x_n) = \lim _n (\inf _X F_n) \end{equation*}

and if $(x_{\phi (n)})_n$ is a subsequence of $(x_n)_n$ having a limit $x$ , then $ F(x) = \inf _X F$ .
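A classical one-dimensional example may help to see how the $\Gamma$ -limit differs from the pointwise limit and how Theorem 2.10 applies. The example is not taken from the paper: we assume $F_n(x)=\sqrt{2e}\,n\,x\,e^{-n^2x^2}$ on $K=[-1,1]$ . These functions converge pointwise to $0$ , while their $\Gamma$ -limit equals $-1$ at $x=0$ and $0$ elsewhere; the minimisers $x_n=-1/(n\sqrt 2)$ converge to $0$ and $\min F_n=-1=\min F$ . The Python sketch below, with hypothetical names `F` and `minimiser`, only checks these values numerically and is not a proof.

```python
import math


def F(n, x):
    """F_n(x) = sqrt(2e) * n * x * exp(-(n x)^2), an assumed illustrative sequence."""
    return math.sqrt(2.0 * math.e) * n * x * math.exp(-((n * x) ** 2))


def minimiser(n):
    """Exact minimiser of F_n on the real line, obtained by setting F_n' = 0."""
    return -1.0 / (n * math.sqrt(2.0))


# Minimal values along the sequence: they stay equal to -1 (up to rounding),
# while F_n(0) = 0 for every n, so the pointwise limit misses the minimum.
minimal_values = {n: F(n, minimiser(n)) for n in (1, 10, 100, 1000)}
values_at_zero = {n: F(n, 0.0) for n in (1, 10, 100, 1000)}
```

The sequence $(x_n)_n$ is a recovery sequence for the point $x=0$ in the sense of the $\Gamma$ -limsup inequality.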

3. Discretisation of transport plans (Dyadic-type decomposition in Polish spaces)

Let $\gamma$ be a Borel probability measure on $X=(X^1,d_1)\times \cdots \times (X^N,d_N)$ with marginals $\mu ^1,\ldots,\mu ^N$ . The space $X$ will be equipped with the $\sup$ metric

\begin{equation*} d(w,z)=\max _{1\le i\le N}d_i(w^i,z^i). \end{equation*}

Let $\varepsilon _n=\frac 1n$ . Since $\{\mu ^i\}_{i=1}^N$ are Borel probability measures on Polish spaces, they are inner regular with respect to compact sets. Hence for all $n$ , there exist compact sets $K^{1,n}\subset \mathop{\textrm{supp}}\nolimits \mu ^1, K^{2,n}\subset \mathop{\textrm{supp}}\nolimits \mu ^2,\ldots, K^{N,n}\subset \mathop{\textrm{supp}}\nolimits \mu ^N$ such that

(2)

\begin{equation} \mu ^k(X^k\setminus K^{k,n})\lt \frac{\varepsilon _n}{N}, \end{equation}

for all $k=1,\ldots,N.$ We may assume that, for all $k$ and $n$ , $K^{k,n}\subset K^{k,{n+1}}$ .

We denote $K^{n}\;:\!=\;\prod _{k=1}^N K^{k,n}$ . Since

\begin{equation*} X\setminus K^n \subset \biggl ((X^1\setminus K^{1,n})\times \prod _{k=2}^NX^k\biggr )\cup \biggl (X^1\times (X^2\setminus K^{2,n} )\times \prod _{k=3}^NX^k\biggr )\cup \cdots \cup \biggl (\prod _{k=1}^{N-1}X^k\times (X^N\setminus K^{N,n})\biggr ), \end{equation*}

one gets

\begin{equation*} \gamma (X\setminus K^n)\leq \sum _{k=1}^N\mu ^k(X^k\setminus K^{k,n})\lt \varepsilon _n. \end{equation*}

The cost $c$ is uniformly continuous on each $K^{n}$ , and for all $n$ , we can fix $\delta _n\in (0,\varepsilon _n)$ such that the sequence ( $\delta _n$ ) is decreasing in $n$ and

\begin{equation*} |c(u)-c(z)|\lt \varepsilon _n \,\,\,\text { for all }u,z\in K^n\text { for which } d(u,z)\lt \delta _n. \end{equation*}

Next we fix, for all $n$ , finite Borel partitions for the sets $K^{1,n},\ldots,K^{N,n}$ . We denote these by $\{\tilde B_i^{k,n}\}_{i=1}^{\tilde m^{k,n}}$ , $k=1,\ldots,N$ , and we choose them in such a way that for all $n\in \mathbb{N}$ and $k\in \{1,\ldots,N\}$

\begin{equation*} diam(\tilde B_i^{k,n})\lt \frac 12\delta _n, \end{equation*}

for all $i\in \{1,\ldots,\tilde m^{k,n}\}$ .

We form a new, possibly finer, partition $\{B_i^{k,n}\}_{i=1}^{m^{k,n}}$ for each $K^{k,n}$ by intersecting each element $\tilde B_i^{k,n}$ successively (whenever the intersection is non-empty) first with the set $K^{k,1}$ and its complement, then with $K^{k,2}$ and its complement, and so on, up to the set $K^{k,n-1}$ . In this way, for every $j \in \{1, \ldots, n\}$ , either $B_i^{k,n} \cap K^{k,j}$ is empty or it is the entire $B_i^{k,n}$ . The products

\begin{equation*} \mathcal {Q}^n=\{ B_{i_1}^{1,n}\times B_{i_2}^{2,n}\times \cdots \times B_{i_N}^{N,n},\,i_k\in \{1,\ldots, m^{k,n} \}\text { for all }k=1,\ldots,N\} \end{equation*}

form a partition of the set $K^n$ with

\begin{equation*}diam(B_i^{k,n})\lt \frac 12\delta _n, \end{equation*}

for all $i\in \{1,\ldots,m^{k,n}\}.$

We denote

\begin{equation*}I^n=\{(i_1,\ldots,i_N)\,| \,\gamma (B_{i_1}^{1,n}\times B_{i_2}^{2,n}\times \cdots \times B_{i_N}^{N,n})\gt 0\},\end{equation*}

and for all ${\textbf{i}}\;:\!=\;(i_1,\ldots,i_N)\in I^n$ , we use the notation $Q_{\textbf{i}}^n\;:\!=\; B_{i_1}^{1,n}\times \cdots \times B_{i_N}^{N,n}$ . We fix points $z^n _{\textbf{i}}=z_{i_1,\ldots,i_N}^n\in \prod _{k=1}^NB_{i_k}^{k,n}\cap \mathop{\textrm{supp}}\nolimits \gamma$ (i.e. $z^n _{\textbf{i}}\in Q^n_{\textbf{i}}\cap \mathop{\textrm{supp}}\nolimits \gamma$ ; this intersection is non-empty because $\gamma (Q^n_{\textbf{i}})\gt 0$ ). We define

(3)

\begin{equation} \tilde \alpha ^n=\sum _{(i_1,\ldots,i_N)\in I^n}\gamma (B_{i_1}^{1,n}\times \cdots \times B_{i_N}^{N,n})\delta _{z_{i_1,\ldots,i_N}^n}\,\,\,\text{and}\,\,\,\alpha ^n=\frac{1}{\gamma (K^n)}\tilde \alpha ^n; \end{equation}

since $\tilde \alpha ^n(X)=\gamma (K^n)$ , the measures $\alpha ^n$ are probability measures.

To each multi-index $\textbf{i}=(i_1,\ldots,i_N)$ and thus to each point $z_{\textbf{i}}^n$ correspond $N$ points

\begin{equation*}x_{\textbf {i}}^{1,n}\in B_{i_1}^{1,n},\ldots,\,x_{\textbf {i}}^{N,n}\in B_{i_N}^{N,n},\end{equation*}

which are the ‘coordinates’ of $z_{\textbf{i}}^n$ in the spaces $X^k$ . The marginals of $\alpha ^n$ are sums of Dirac masses centred at these points. We denote these marginals by $\mu ^{1,n},\ldots, \mu ^{N,n}$ . More precisely, they can be described as

(4)

\begin{equation} \mu ^{k,n}=\frac{1}{\gamma (K^n)}\sum _{i=1}^{m^{k,n}} \sum _{\stackrel{{\textbf{i}} \in I^n}{i_k=i}} \gamma (Q^n_{\textbf{i}}) \delta _{x_{\textbf{i}}^{k,n}}. \end{equation}
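The construction of $\alpha ^n$ can be mimicked numerically in a simplified setting. The Python sketch below is a hedged illustration and not the construction of the paper in full generality: we assume that $\gamma$ is finitely supported in $[0,1)^N$ , so that one may take $K^n=\mathop{\textrm{supp}}\nolimits \gamma$ and $\gamma (K^n)=1$ , and we replace the partitions $\{B_i^{k,n}\}$ by a uniform grid of side `h`. The function name `discretise` and the test data are hypothetical.

```python
import math
from collections import defaultdict


def discretise(atoms, h):
    """Discretise a finitely supported plan on a uniform grid of side h.

    atoms: list of (point, mass) pairs, each point a tuple in [0, 1)^N.
    Returns {representative point in supp(gamma): gamma(cube)}.
    """
    mass = defaultdict(float)
    representative = {}
    for point, m in atoms:
        index = tuple(math.floor(x / h) for x in point)  # multi-index of the cube
        representative.setdefault(index, point)  # first atom found in the cube
        mass[index] += m
    return {representative[i]: mass[i] for i in mass}


# Hypothetical plan with 64 atoms of equal mass in [0, 1)^2.
atoms = [((i / 64, ((7 * i) % 64) / 64), 1 / 64) for i in range(64)]
alpha = discretise(atoms, 1 / 8)

# Integrals of the test function phi(x, y) = x + y against gamma and alpha.
phi = lambda x, y: x + y
integral_gamma = sum(m * phi(*p) for p, m in atoms)
integral_alpha = sum(m * phi(*z) for z, m in alpha.items())
```

Since `phi` is 2-Lipschitz for the $\sup$ metric and each cube has diameter at most `h`, the two integrals differ by at most `2 * h`, in the spirit of the estimate on each cube in the convergence proof.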

Proposition 3.1. $\alpha ^n\stackrel{*}{\rightharpoonup }\gamma$ .

Proof. Let $\varepsilon \gt 0$ and $\varphi \in C_b(X)$ . We have to find $n_0\in \mathbb{N}$ such that

(5)

\begin{equation} \bigg |\int _X \varphi d \gamma - \int _X\varphi d \alpha ^n \bigg | \lt \varepsilon, \,\,\,\text{ for all }n\ge n_0. \end{equation}

Let $M\gt 0$ be such that

\begin{equation*} |\varphi (z)| \le M\,\,\,\text {for all }z\in X.\end{equation*}

We fix $\bar n\in \mathbb{N}$ such that

\begin{equation*} \gamma (X\setminus K^n)\lt \min \left \{\frac {1}{2},\frac {\varepsilon }{5M}\right \},\,\,\,\text {for all }n\ge \bar n \end{equation*}

Since $\varphi$ is continuous, it is uniformly continuous on the compact set $K^{\bar n}$ , so there exists $\delta \gt 0$ such that

\begin{equation*} |\varphi (z)-\varphi (v)|\lt \frac {\varepsilon }{5}\,\,\,\text { for all }z,v\in K^{\bar n}\text { such that }d(z,v)\lt \delta . \end{equation*}

Moreover, the decomposition $\mathcal{Q}^n$ has been constructed so that there exists $n_0 \geq \bar{n}$ such that for all $k \in \{1, \ldots, N\}$ and $n \geq n_0$

\begin{equation*} diam(B_i^{k,n})\lt \delta \,\,\,\text {for all }i\in \{1,\ldots,m^{k,n}\}. \end{equation*}

We start from

(6)

\begin{equation} \bigg |\int _X\varphi{\,\mathrm{d}}\gamma -\int _X\varphi{\,\mathrm{d}}\alpha ^n \bigg | \le \bigg | \int _{K^{\bar n}}\varphi{\,\mathrm{d}}\gamma -\int _{K^{\bar n}}\varphi{\,\mathrm{d}}\alpha ^n \bigg |+ \bigg |\int _{X\setminus K^{\bar n}}\varphi{\,\mathrm{d}}\gamma -\int _{X\setminus K^{\bar n}}\varphi{\,\mathrm{d}}\alpha ^n\bigg | \end{equation}

and we evaluate separately the two terms on the right-hand side. For all $n\ge n_0$ , the first term can be estimated as follows (we recall that, by construction, $K^{\bar n}\subset K^n$ ):

(7)

\begin{align} &\bigg | \int _{K^{\bar n}}\varphi{\,\mathrm{d}}\gamma -\int _{K^{\bar n}}\varphi{\,\mathrm{d}}\alpha ^n \bigg | =\bigg |\int _{K^{\bar n}}\varphi{\,\mathrm{d}}\gamma -\frac{1}{\gamma (K^n)}\int _{K^{\bar n}}\varphi{\,\mathrm{d}}\tilde \alpha ^n\bigg |\nonumber \\[5pt] &\stackrel{a)}{\le }\bigg |\int _{ K^{\bar n}}\varphi{\,\mathrm{d}}\gamma -\int _{K^{\bar n}}\varphi{\,\mathrm{d}}\tilde \alpha ^n \bigg | +\frac{\gamma (X\setminus K^n)}{1-\gamma (X\setminus K^n)}\int _{K^{\bar n}}|\varphi |{\,\mathrm{d}}\tilde \alpha ^n\nonumber \\[5pt] &\lt \bigg |\int _{K^{\bar n}}\varphi{\,\mathrm{d}}\gamma -\int _{K^{\bar n}}\varphi{\,\mathrm{d}}\tilde \alpha ^n\bigg | +M\cdot 2\cdot \frac{\varepsilon }{5M}\nonumber \\[5pt] &\lt \bigg |\int _{ K^{\bar n}}\varphi{\,\mathrm{d}}\gamma -\int _{K^{\bar n}}\varphi{\,\mathrm{d}}\tilde \alpha ^n\bigg | +\frac{2\varepsilon }{5}. \end{align}

Above in $a)$ , we have written

\begin{equation*} \frac {1}{\gamma (K^n)}=\frac {1}{1-\gamma (X\setminus K^n)}=1+\frac {\gamma (X\setminus K^n)}{1-\gamma (X\setminus K^n)} \end{equation*}

and then estimated the numerator from above by $\frac{\varepsilon }{5M}$ and the denominator $1-\gamma (X\setminus K^n)$ from below by $\frac 12$ . By construction, since $n\ge \bar n$ , there exists a subset $\bar I^n\subset I^n$ such that

\begin{equation*} K^{\bar n}=\bigcup _{(i_1,\ldots, i_N)\in \bar I^n}B_{i_1}^{1,n}\times \cdots \times B_{i_N}^{N,n}. \end{equation*}

So we write

\begin{equation*} \int _{K^{\bar n}}\varphi {\,\mathrm {d}}\gamma -\int _{K^{\bar n}}\varphi {\,\mathrm {d}}\tilde \alpha ^n=\sum _{\textbf {i}\in \bar I^n}\left (\int _{(B_{i_1}^{1,n}\times \cdots \times B_{i_N}^{N,n})}\varphi {\,\mathrm {d}}\gamma -\int _{(B_{i_1}^{1,n}\times \cdots \times B_{i_N}^{N,n})}\varphi {\,\mathrm {d}}\tilde \alpha ^n\right ). \end{equation*}

We simplify the notation for the next few lines: for all $\textbf{i}\in \bar I^n$, we set $Q\;:\!=\;B_{i_1}^{1,n}\times \cdots \times B_{i_N}^{N,n}$ and denote by $u_0=z_{i_1,\ldots,i_N}\in Q$ the point of $Q$ in which $\tilde \alpha ^n$ is concentrated. Then for each ‘cube’ $Q$

\begin{align*} &\bigg |\int _{Q}\varphi (u)\,{\,\mathrm{d}}\gamma -\int _{Q}\varphi (u)\,{\,\mathrm{d}}\tilde \alpha ^n\bigg |=\bigg |\int _{Q}\varphi (u)\,{\,\mathrm{d}}\gamma -\varphi (u_0)\gamma (Q)\bigg |\\[5pt] &\le \int _{Q}|\varphi (u)-\varphi (u_0)|\,{\,\mathrm{d}}\gamma \leq \gamma (Q)\cdot \frac{\varepsilon }{5}, \end{align*}

where in the last step we have used the uniform continuity of $\varphi$ on $K^{\bar n}$. Summing the estimate above over all cubes $Q=B_{i_1}^{1,n}\times \cdots \times B_{i_N}^{N,n}$, $\textbf{i}\in \bar I^n$, gives

\begin{equation*} \bigg |\int _{K^{\bar n}} \varphi {\,\mathrm {d}}\gamma - \int _{K^{\bar n}} \varphi {\,\mathrm {d}}\tilde \alpha ^n \bigg |\lt \gamma (K^{\bar n})\cdot \frac {\varepsilon }{5} \leq \frac {\varepsilon }{5}. \end{equation*}

Combining this estimate with (7) gives us the estimate

(8)

\begin{equation} \bigg |\int _{K^{\bar n}}\varphi{\,\mathrm{d}}\gamma -\int _{K^{\bar n}}\varphi{\,\mathrm{d}}\alpha ^n\bigg |\lt \frac 15 \varepsilon +\frac{2\varepsilon }{5}=\frac 35 \varepsilon . \end{equation}

Finally, we estimate the ‘tail’ term in (6). Using the set $\bar I^n$ defined above, one gets

\begin{align*} \alpha ^n(X\setminus K^{\bar n})&=1-\frac{1}{\gamma (K^n)}\sum _{(i_1,\ldots,i_N)\in \bar I^{n}}\gamma (B_{i_1}^{1,n}\times \cdots \times B_{i_N}^{N, n})\\[5pt] &=1-\frac{\gamma (K^{\bar n})}{\gamma (K^n)}\le 1-\gamma (K^{\bar n})\lt \frac{\varepsilon }{5M}. \end{align*}

Using this we get

(9)

\begin{align} &\bigg |\int _{X\setminus K^{\bar n}}\varphi{\,\mathrm{d}}\gamma -\int _{X\setminus K^{\bar n}}\varphi{\,\mathrm{d}}\alpha ^n\bigg |\le \int _{X\setminus K^{\bar n}}|\varphi |{\,\mathrm{d}}\gamma +\int _{X\setminus K^{\bar n}}|\varphi |{\,\mathrm{d}}\alpha ^n\nonumber \\[5pt] &\lt M\frac{\varepsilon }{5M}+M\frac{\varepsilon }{5M}=\frac 25\varepsilon . \end{align}

Together estimates (8) and (9) prove the claim (5).
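The cube-by-cube estimate above can be checked numerically on a toy plan. The following Python sketch is only an illustration under stated assumptions, not part of the proof: it takes $N=2$ and $X=[0,1]^2$, a hypothetical plan `gamma` made of finitely many atoms on the graph of $y=x^2$, dyadic squares as the ‘cubes’ $Q$, and the first atom met in each square as the point $u_0$. Since the supports are compact, no normalisation by $\gamma (K^n)$ is needed. The test function `phi` satisfies $|\varphi (u)-\varphi (u_0)|\le 5\cdot 2^{-n}$ on each square of side $2^{-n}$, so the same argument gives $|\int \varphi \,\mathrm{d}\gamma -\int \varphi \,\mathrm{d}\alpha ^n|\le 5\cdot 2^{-n}$.

```python
import math


def dyadic_index(t, n):
    """Index i of the dyadic interval [i/2^n, (i+1)/2^n) containing t in [0, 1]."""
    return min(int(t * 2**n), 2**n - 1)


def discretise(atoms, n):
    """Collapse each level-n dyadic square Q onto one atom of the plan lying in Q.

    atoms: list of ((x, y), mass). The result has the same shape; every
    non-empty square carries its total mass at a single representative point.
    """
    squares = {}
    for point, mass in atoms:
        key = (dyadic_index(point[0], n), dyadic_index(point[1], n))
        rep, total = squares.get(key, (point, 0.0))
        squares[key] = (rep, total + mass)
    return list(squares.values())


def integrate(phi, atoms):
    """Integral of phi against the discrete measure given by atoms."""
    return sum(mass * phi(x, y) for (x, y), mass in atoms)


def phi(x, y):
    # Hypothetical test function: |phi(u) - phi(u0)| <= 4|dx| + |dy| on [0,1]^2.
    return math.cos(3 * x) + x * y


# Hypothetical plan gamma: m equal atoms on the graph of y = x^2.
m = 2000
gamma = [(((j + 0.5) / m, ((j + 0.5) / m) ** 2), 1.0 / m) for j in range(m)]

# Error |int phi d(gamma) - int phi d(alpha^n)| for a few dyadic levels n.
errors = {
    n: abs(integrate(phi, gamma) - integrate(phi, discretise(gamma, n)))
    for n in range(1, 7)
}
```

Printing `errors` shows the discrepancy for each level; by the estimate above it cannot exceed $5\cdot 2^{-n}$, whatever representative is chosen in each square.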

Remark 3.2. If $\mathop{\textrm{supp}}\nolimits \mu ^k$ is compact for $k=1, \ldots, N$ , then the dependence on $n$ of $K^n$ is not needed anymore since one can take $K^n\equiv K\;:\!=\; \mathop{\textrm{supp}}\nolimits \mu ^1 \times \ldots \times \mathop{\textrm{supp}}\nolimits \mu ^N$ . This also simplifies the analytic expressions of $\alpha ^n$ and their marginal measures.

In line with the previous Remark, we prove the following:

Proposition 3.3. If $\mathop{\textrm{supp}}\nolimits \mu ^k$ is compact for $k=1, \ldots, N$ then for all $k,n$ and all $i$

\begin{equation*} \mu ^{k,n} (B_i^{k,n})= \mu ^k (B_i^{k,n}), \end{equation*}

where, we recall, $\mu ^{k,n}$ is defined in (4) above and is the $k$ -th marginal of the discretisation $\alpha ^n$ of $\gamma$ defined in (3).

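Proof. Again we prove the formula for the first marginal. Since $\mu ^{1,n}$ and $\mu ^1$ are the first marginals of $\alpha ^n$ and $\gamma$, respectively, it suffices to observe that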

\begin{align*} \alpha ^n \bigg ( B_i ^{1,n}\times \prod _{k=2}^N X^k \bigg ) &= \sum _{\stackrel{{\underline{\textbf{i}}} \in I^n}{i=i_1}} \gamma (Q_{\underline{\textbf{i}}} ^n) \delta _{z_{{\underline{\textbf{i}}}}^n} \bigg (B_i ^{1,n}\times \prod _{k=2}^N X^k\bigg )\\[5pt] &=\sum _{\stackrel{{\underline{\textbf{i}}} \in I^n}{i=i_1}} \gamma (Q_{\underline{\textbf{i}}} ^n)=\gamma \bigg ( B_i ^{1,n}\times \prod _{k=2}^N X^k\bigg ). \end{align*}
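The identity can be verified in exact arithmetic on a small example. The sketch below is an illustration only, with hypothetical data: $N=2$, a plan `gamma` consisting of twelve equal atoms on the graph of $y=x^2$, and dyadic cells of level $n=2$ on $[0,1]$. Because each representative point lies in its own square, moving the mass of a square to that point does not change the mass that the first marginal assigns to any cell.

```python
import math
from collections import defaultdict
from fractions import Fraction as F


def cell(t, n):
    """Index of the dyadic cell [i/2^n, (i+1)/2^n) containing t in [0, 1]."""
    return min(math.floor(t * 2**n), 2**n - 1)


def first_marginal_on_cells(atoms, n):
    """Mass given by the first marginal of a discrete plan to each cell B_i."""
    out = defaultdict(F)
    for (x, _), mass in atoms:
        out[cell(x, n)] += mass
    return dict(out)


# Hypothetical plan gamma with rational atoms and masses (exact arithmetic).
m, n = 12, 2
gamma = [((F(2 * j + 1, 2 * m), F(2 * j + 1, 2 * m) ** 2), F(1, m)) for j in range(m)]

# alpha^n: the mass of each square is moved to one atom of gamma inside it.
squares = {}
for (x, y), mass in gamma:
    key = (cell(x, n), cell(y, n))
    rep, total = squares.get(key, ((x, y), F(0)))
    squares[key] = (rep, total + mass)
alpha_n = list(squares.values())

mu1_cells = first_marginal_on_cells(gamma, n)     # mu^1(B_i)
mu1n_cells = first_marginal_on_cells(alpha_n, n)  # mu^{1,n}(B_i)
```

Here `mu1_cells` and `mu1n_cells` coincide cell by cell, although `alpha_n` has fewer atoms than `gamma`.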

4. Variational approximations and conclusions

In this section, we prove the convergence results for the discrete approximations of the functionals that will be used in the optimality proofs. Recall that, given a transport plan $\gamma$, we introduced in the previous section the dyadic approximation $\{\alpha ^n\}_{n \in \mathbb{N}}$ of $\gamma$.

4.1 The $\sup$ case

We define the functionals $\mathcal{F}_n,\mathcal{F}\;:\;\mathcal{P}(X)\to{\mathbb{R}} \cup \{+\infty \}$ by

\begin{equation*} \mathcal {F}_n(\beta )= \begin {cases} C_\infty [\beta ]&\text { if }\beta \in \Pi (\mu ^{1,n},\ldots,\mu ^{N,n}),\\[5pt] +\infty &\text { otherwise; } \end {cases} \end{equation*}

and

\begin{equation*} \mathcal {F}(\beta )= \begin {cases} C_\infty [\beta ]&\text { if }\beta \in \Pi (\mu ^1,\ldots,\mu ^N),\\[5pt] +\infty &\text { otherwise.} \end {cases} \end{equation*}

For the rest of this subsection, we assume that $c$ is continuous and that $\mu ^i$ has compact support for $i=1,\ldots,N$. We prove the following:

Proposition 4.1. The functionals $\mathcal{F}_n$ are equi-coercive and

(10)

\begin{equation} \mathcal{F}_n \stackrel{\Gamma }{\rightarrow }\mathcal{F}. \end{equation}

Proof. Let $\beta \in \mathcal{P}(X)$ . We recall that we need to prove the following:

(I)

\begin{equation} \forall (\beta ^n)_n\stackrel {*}{\rightharpoonup }\beta \text { in }\mathcal {P}(X), \ \liminf _{n\to \infty }\mathcal {F}_n(\beta ^n)\ge \mathcal {F}(\beta ).\end{equation}

(II)

\begin{equation} \exists (\beta ^n)_n \stackrel {*}{\rightharpoonup } \beta \text { in } \mathcal {P}(X) \text { s.t. } \limsup _{n\to \infty }\mathcal {F}_n(\beta ^n)\le \mathcal {F}(\beta ). \end{equation}

If $\mathcal F[\beta ]\lt +\infty$, the $\Gamma$-$\liminf$ inequality (Condition (I)) follows from the lower semi-continuity of the functional $C_\infty$. If $\mathcal{F}[\beta ]=+\infty$, then either $\beta \notin \Pi (\mu ^1,\ldots,\mu ^N)$ or $C_\infty [\beta ]=+\infty$. In the first case, since $\beta ^n \stackrel{*}{\rightharpoonup } \beta$ and $\mu ^{i,n}\stackrel{*}{\rightharpoonup } \mu ^i$ for $i=1, \ldots, N$, there exists $n_0\in \mathbb{N}$ such that $\beta ^n\notin \Pi (\mu ^{1,n},\ldots,\mu ^{N,n})$ for all $n\ge n_0$. Hence, $\mathcal{F}_n[\beta ^n]=+\infty$ for all $n\ge n_0$. If $C_\infty [\beta ]=+\infty$, then fix $M\gt 0$ and $\varepsilon \gt 0$, and let ${\textbf{x}} \in \mathop{\textrm{spt}}\nolimits \beta$ and $r\gt 0$ be such that $B({{\textbf{x}}}, r) \subset \{c\gt M-\varepsilon \}$. Since the evaluation on open sets is lower semi-continuous with respect to tight convergence, we have that, for $n$ big enough, $\beta ^n (B({{\textbf{x}}}, r))\gt 0$, so that $C_\infty [\beta ^n]\gt M-\varepsilon$. Since $M$ is arbitrary, we conclude.

For the $\Gamma$-$\limsup$ inequality (Condition (II)), if $\mathcal{F}[\beta ]=+\infty$, then any sequence with the right marginals and tightly converging to $\beta$ will do. Therefore, we may assume that the measure $\beta$ satisfies $\beta \in \Pi (\mu ^1,\ldots,\mu ^N)$ and $C_\infty [\beta ]\lt +\infty$. To build the approximants, we use the Borel partitions $\{B_i^{k,n}\}_{i=1}^{m^{k,n}}$ and discrete measures introduced in Section 3. For all $n$, given a multi-index ${\underline{\textbf{i}}}=(i_1,\ldots,i_N)$ we use, again, the ‘cube’

\begin{equation*} Q_{\underline {\textbf {i}}}^n\;:\!=\;B_{i_1}^{1,n}\times \cdots \times B_{i_N}^{N,n} \end{equation*}

and set

\begin{equation*}J^n\;:\!=\;\{{\underline {\textbf {i}}} \ | \ \beta (Q^n_{\underline {\textbf {i}}}) \gt 0\}.\end{equation*}

We then define the measures

\begin{equation*} \beta ^n=\sum _{{\underline {\textbf {i}}} \in J^n} \beta (Q_{\underline {\textbf {i}}}^n) \frac {\mu ^{1,n}_{|B^{1,n}_{i_1}}}{\mu ^1 (B^{1,n}_{i_1})}\otimes \cdots \otimes \frac {\mu ^{N,n}_{|B^{N,n}_{i_N}}}{\mu ^N (B^{N,n}_{i_N})}. \end{equation*}

We show that $\beta ^n$ has marginals $\mu ^{1,n},\ldots,\mu ^{N,n}$ . For all Borel sets $A\subset X_1$ , we have

(11)

\begin{align} \beta ^n\left (A\times \prod _{k=2}^N X_k\right )&=\sum _{\textbf{j}\in J^n}\beta (Q_{\textbf{j}}^{n})\frac{\mu ^{1,n}_{|B_{j_1}^{1,n}}(A)}{\mu ^{1,n}(B_{j_1}^{1,n})}\nonumber \\[5pt] &=\sum _{j_1\in \pi ^1(J^n)}\frac{\mu ^{1,n}_{|B_{j_1}^{1,n}}(A)}{\mu ^{1}(B_{j_1}^{1,n})}\sum _{\{(j_2,\ldots,j_N)\,|\,\textbf{j}\in J^n\}}\beta (Q_{\textbf{j}}^n)\nonumber \\[5pt] &=\sum _{j_1\in \pi ^1(J^n)}\frac{\mu ^{1,n}_{|B_{j_1}^{1,n}}(A)}{\mu ^{1}(B_{j_1}^{1,n})}\mu ^1(B_{j_1}^{1,n})\nonumber \\[5pt] &=\sum _{j_1\in \pi ^1(J^n)}\mu ^{1,n}_{|B_{j_1}^{1,n}}(A)=\mu ^{1,n}(A). \end{align}

Here the third equality is due to Proposition 3.3. The computation is analogous for the other marginals.
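The construction of $\beta ^n$ and the marginal computation can be reproduced on a finite example. The Python sketch below is an illustration with hypothetical data, not the authors' implementation: $N=2$, two cells per marginal, discrete marginals `mu1n` and `mu2n` given as lists of (point, mass, cell index), and cube masses `beta_cubes` playing the role of $\beta (Q^n_{\underline{\textbf{i}}})$. The normalising constants are computed from the discrete marginals, which is legitimate because the cell masses of $\mu ^{k,n}$ and $\mu ^k$ agree by Proposition 3.3.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical discrete marginals mu^{k,n}: (point, mass, index of its cell).
mu1n = [(F(1, 10), F(1, 4), 0), (F(3, 10), F(1, 4), 0), (F(7, 10), F(1, 2), 1)]
mu2n = [(F(1, 5), F(1, 3), 0), (F(3, 5), F(1, 3), 1), (F(9, 10), F(1, 3), 1)]

# Hypothetical cube masses beta(Q_i); rows sum to the cell masses of mu^1
# (1/2, 1/2) and columns to those of mu^2 (1/3, 2/3).
beta_cubes = {(0, 0): F(1, 3), (0, 1): F(1, 6), (1, 0): F(0), (1, 1): F(1, 2)}


def cell_masses(mu):
    """Mass of each cell under a discrete measure."""
    out = defaultdict(F)
    for _, mass, c in mu:
        out[c] += mass
    return out


def build_beta_n(cubes, mu1, mu2):
    """Spread the mass of each cube in J^n as a normalised product measure."""
    m1, m2 = cell_masses(mu1), cell_masses(mu2)
    plan = defaultdict(F)
    for (i1, i2), q_mass in cubes.items():
        if q_mass == 0:  # the cube does not belong to J^n
            continue
        for p, mp, c1 in mu1:
            if c1 != i1:
                continue
            for q, mq, c2 in mu2:
                if c2 == i2:
                    plan[(p, q)] += q_mass * (mp / m1[i1]) * (mq / m2[i2])
    return dict(plan)


def marginal(plan, k):
    """k-th marginal (k = 0 or 1) of a discrete plan, as a dict point -> mass."""
    out = defaultdict(F)
    for point, mass in plan.items():
        out[point[k]] += mass
    return dict(out)


beta_n = build_beta_n(beta_cubes, mu1n, mu2n)
first = marginal(beta_n, 0)
second = marginal(beta_n, 1)
```

In this example `first` and `second` reproduce the masses listed in `mu1n` and `mu2n` exactly, which is the content of (11) in the two-marginal case.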

The sequence $(\beta ^n)$ converges tightly to $\beta$ which can be seen in a manner analogous to the convergence of the sequence $(\alpha ^n)$ to $\gamma$ . It remains to prove that the sequence satisfies the $\Gamma$ - $\limsup$ inequality. We fix $\varepsilon \gt 0$ . It suffices to show that

\begin{equation*}\limsup _{n\to \infty }C_\infty [\beta ^n]\le C_\infty [\beta ]+\varepsilon .\end{equation*}

Since for all $n$ the support of $\beta ^n$ is a finite set, we can fix $u^n\in \mathop{\textrm{supp}}\nolimits \beta ^n$ such that $C_\infty [\beta ^n]=c(u^n)$ . Moreover, for all $n$ there exists $z^n\in \mathop{\textrm{supp}}\nolimits \beta$ such that $d(u^n,z^n)\le \tfrac 12\delta _n$ . Now for all $n$ large enough to satisfy $\varepsilon _n\lt \varepsilon$ , we have

\begin{equation*}C_\infty [\beta ^n]=c(u^n)\le c(z^n)+\varepsilon _n\le C_\infty [\beta ]+\varepsilon _n\lt C_\infty [\beta ]+\varepsilon \end{equation*}

and we are done.

By Corollary 2.6, the set $\Pi (\mu ^1, \ldots, \mu ^N)\cup \bigcup _n \Pi (\mu ^{1,n}, \ldots, \mu ^{N,n})$ is compact and therefore the equi-coercivity follows.

4.2 The integral case

We define the functionals $\mathcal{G}_n,\mathcal{G}\;:\;\mathcal{P}(X)\to{\mathbb{R}} \cup \{+\infty \}$ by

\begin{equation*} \mathcal {G}_n(\beta )= \begin {cases} C[\beta ]&\text { if }\beta \in \Pi (\mu ^{1,n},\ldots,\mu ^{N,n}),\\[5pt] +\infty &\text { otherwise; } \end {cases} \end{equation*}

and

\begin{equation*} \mathcal {G}(\beta )= \begin {cases} C[\beta ]&\text { if }\beta \in \Pi (\mu ^1,\ldots,\mu ^N),\\[5pt] +\infty &\text { otherwise.} \end {cases} \end{equation*}

Also for the integral case, we assume that the measures $\mu ^1,\ldots,\mu ^N$ have compact supports and that the cost function $c\;:\;X\to{\mathbb{R}}$ is continuous. We prove the following:

Proposition 4.2. The functionals $\mathcal{G}_n$ are equi-coercive and

(12)

\begin{equation} \mathcal{G}_n \stackrel{\Gamma }{\rightarrow }\mathcal{G}. \end{equation}

Proof. The proof is analogous to that of Proposition 4.1. The only substantial difference is in the proof of the $\Gamma$-$\limsup$ inequality in the case that the measure $\beta$ belongs to the set $\Pi (\mu ^1,\ldots,\mu ^N)$. We have to find a sequence $(\beta ^n)$, weakly${}^\ast$ converging to $\beta$ and satisfying Condition (II). Let $(\beta ^n)$ be the discretisation defined in the proof of Proposition 4.1. Since the supports of the measures $\mu ^1,\ldots,\mu ^N$ are compact, the set $K\;:\!=\;\mathop{\textrm{spt}}\nolimits \mu ^1\times \cdots \times \mathop{\textrm{spt}}\nolimits \mu ^N$ is also compact. Note that for all $n\in \mathbb{N}$, we have $\mathop{\textrm{spt}}\nolimits \beta ^n\subset K$. We set $T=\max _{z\in K}c(z)$. Now the function $c_T\;:\!=\;\min \{c,T\}$ is continuous and bounded on $X$ and by the weak${}^\ast$-convergence

\begin{equation*}\mathcal {G}_n(\beta ^n)=\int _Xc{\,\mathrm {d}}\beta ^n=\int _X c_T{\,\mathrm {d}}\beta ^n\to \int _Xc_T{\,\mathrm {d}}\beta =\int _Xc{\,\mathrm {d}}\beta =\mathcal {G}(\beta ),\end{equation*}

from which the $\Gamma$ - $\limsup$ inequality follows.
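The behaviour of both costs along the recovery sequence can be followed on an explicit toy case. The sketch below rests on assumptions that are not in the text: $N=2$, both marginals equal to the Lebesgue measure on $[0,1]$, $\beta$ the law of $(U,1-U)$ with $U$ uniform, cost $c(x,y)=(x-y)^2$, dyadic cells of level $n$, and discrete marginals that put the mass of each cell at its midpoint (in the paper the atoms are points of $\mathop{\textrm{spt}}\nolimits \gamma$ instead). With these choices $\beta ^n$ charges only the anti-diagonal squares, $C[\beta ]=\int _0^1(2x-1)^2\,\mathrm{d}x=\frac 13$ and $C_\infty [\beta ]=1$.

```python
from fractions import Fraction as F


def cost(x, y):
    """Hypothetical continuous cost c(x, y) = (x - y)^2."""
    return (x - y) ** 2


def beta_n(n):
    """Atoms of beta^n for beta = law of (U, 1 - U), with midpoint marginals.

    Only the anti-diagonal squares have positive beta-mass (2^-n each), so
    beta^n puts mass 2^-n at (midpoint of cell i, midpoint of cell 2^n - 1 - i).
    """
    cells = 2**n
    h = F(1, cells)
    mids = [(i + F(1, 2)) * h for i in range(cells)]
    return [((mids[i], mids[cells - 1 - i]), h) for i in range(cells)]


def integral_cost(plan):
    """C[plan]: integral of the cost against a discrete plan."""
    return sum(mass * cost(x, y) for (x, y), mass in plan)


def sup_cost(plan):
    """C_infty[plan]: maximum of the cost over the (finite) support."""
    return max(cost(x, y) for (x, y), mass in plan if mass > 0)


integral_values = {n: integral_cost(beta_n(n)) for n in range(1, 8)}
sup_values = {n: sup_cost(beta_n(n)) for n in range(1, 8)}
```

In this example $C[\beta ^n]=\frac 13-\frac{1}{3\cdot 4^{n}}$ and $C_\infty [\beta ^n]=(1-2^{-n})^2$, so both sequences increase to the costs of $\beta$ and the two $\limsup$ inequalities hold with equality in the limit.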

4.3 Proof of the main theorems and a counterexample

Proof. (of Theorem 1.3) By Proposition 3.1 and Remark 3.2, we can find a sequence $(\alpha ^n)_n$ with finite supports such that $\mathop{\textrm{spt}}\nolimits \alpha ^n \subset \mathop{\textrm{spt}}\nolimits \gamma$ and $\alpha ^n \stackrel{*}{\rightharpoonup } \gamma$ . We define the functionals $\mathcal{F}$ and $\mathcal{F}_n$ of Subsection 4.1 using the marginals of $\gamma$ and $\alpha ^n$ . The plan $\gamma$ is ICM; therefore by Proposition 1.6, it is finitely optimal. This means that each plan $\alpha ^n$ is optimal between its marginals and thus a minimiser of the functional $\mathcal{F}_n$ .

The $\Gamma$ -convergence and equi-coercivity established in Proposition 4.1 imply, by Theorem 2.10, that the minimisers of the functionals $\mathcal{F}_n$ converge, up to subsequences, to a minimiser of $\mathcal F$ . Therefore, since $\alpha ^n\stackrel{*}{\rightharpoonup }\gamma$ , the plan $\gamma$ is optimal for the problem ( $P_\infty$ ).

Proof. (of Theorem 1.4) The proof is the same as that of Theorem 1.3. The $\Gamma$ -convergence is now given by Proposition 4.2.

In ref. [2], Ambrosio and Pratelli give, for the problem ($P$), an example of a lower semi-continuous cost function $c\;:\;X\times X\to [0,\infty ]$ ($c$ assumes also the value $+\infty$), for which there exists a $c$-cyclically monotone transport plan which is not optimal. After that it has been shown in refs. [4] and [5] that, for $N=2$, it is enough that $c$ is Borel measurable and that the set $\{c=+\infty \}$ has a special structure. Actually, the measure theoretical tools introduced in ref. [5] could be applied in even more general settings. We refer the reader to those papers for further details.

The next example, which is a slightly modified version of the example of [2], shows that also in the case of the problem ($P_\infty$) the continuity of the cost may be required, even when the cost assumes only finite values.

Example 4.3. Let us consider the two-marginal $L^\infty$ -optimal transportation problem with marginals $\mu =\nu =\mathcal{L}|_{[0,1]}$ and the cost function

\begin{equation*}c(x,y)= \begin {cases} 1&\text { if }x=y\\[5pt] 2&\text { otherwise} \end {cases}.\end{equation*}

We fix an irrational number $\alpha$ . We set $T_1=Id_{[0,1]}$ and $T_2\;:\;[0,1]\to [0,1]$ , $T_2(x)=x+\alpha \pmod 1$ . Now $T_1$ is an optimal transportation map for the problem ( $P_\infty$ ) with $C_\infty [T_1]=1$ . Since $C_\infty [T_2]=2$ , $T_2$ cannot be optimal. However, it is ICM.

In fact, if we assume that $T_2$ is not ICM, we can find a minimal $K\in \mathbb{N}$ and a $K$-tuple of pairs $\{(x_i,y_i)\}_{i=1}^K$, all belonging to the support of the plan given by $T_2$, such that

\begin{equation*} \max _{1\le i\le K}c(x_i,y_i)\gt \max _{1\le i\le K}c(x_{i+1},y_i), \end{equation*}

with the convention $x_{K+1}=x_1$. By the definition of the map $T_2$, we have $y_i=x_i+\alpha \pmod 1$ for all $i$. Given the form of $c$, the only way in which this inequality can hold is $2\gt 1$. The right-hand side being equal to $1$ tells us that $x_{i+1}=y_i$ for all $i$, that is, $x_{i+1}=x_i+\alpha \pmod 1$ for all $i$. Summing up now gives us (keeping in mind that $x_{K+1}=x_1$) that $x_1=x_1+K\alpha \pmod 1$, that is, $K\alpha \in \mathbb{Z}$, contradicting the irrationality of $\alpha$.
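The role of the irrationality of $\alpha$ can be seen in a small brute-force check. The Python sketch below is an illustration under a simplifying assumption: all points lie on one orbit, $x=x_0+k\alpha \pmod 1$, and are represented by the integer $k$. For irrational $\alpha$ two such points coincide only if their integers are equal; a rational rotation $\alpha =1/q$ is modelled by the hypothetical parameter `period`, for which points coincide when their integers agree modulo $q$. The function `is_icm_on` tests the inequality above for every permutation of the given family, which includes the cyclic shift $x_{i+1}$.

```python
from itertools import permutations


def cost(kx, ky, period=None):
    """c(x, y) = 1 if x = y and 2 otherwise, for points x0 + k*alpha (mod 1).

    period=None models an irrational alpha; period=q models alpha = 1/q.
    """
    same = (kx == ky) if period is None else ((kx - ky) % period == 0)
    return 1 if same else 2


def is_icm_on(ks, period=None):
    """Test the ICM inequality on the family (x_i, T_2(x_i)), x_i = x0 + k_i*alpha."""
    pairs = [(k, k + 1) for k in ks]  # y_i = T_2(x_i) = x_i + alpha
    lhs = max(cost(x, y, period) for x, y in pairs)
    for sigma in permutations(range(len(pairs))):
        rhs = max(
            cost(pairs[sigma[i]][0], pairs[i][1], period)
            for i in range(len(pairs))
        )
        if lhs > rhs:  # a rearrangement with strictly smaller sup cost
            return False
    return True


irrational_case = is_icm_on(range(6))          # no rearrangement closes the cycle
rational_case = is_icm_on(range(4), period=4)  # the cycle closes after 4 steps
```

With these inputs `irrational_case` is `True` and `rational_case` is `False`: only when $K\alpha$ is an integer can the points be rearranged so that every pair sits on the diagonal, which is exactly the contradiction used above.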

Acknowledgements

Both authors acknowledge the support of GNAMPA-INDAM and ‘Fondi di ricerca di ateneo, ex 60 $\%$ ’ of the University of Firenze.

Funding statement

The research of the first author is part of the project Metodologie innovative per l’analisi di dati a struttura complessa financed by the Fondazione Cassa di Risparmio di Firenze. The second author acknowledges the support of the PRIN (Progetto di ricerca di rilevante interesse nazionale) 2022J4FYNJ, Variational methods for stationary and evolution problems with singularities and interfaces.

Competing interests

There are no competing interests regarding this research.

References

[1] Abdellaoui, T. & Heinich, H. (1994) Sur la distance de deux lois dans le cas vectoriel. C. R. l’Acad. Sci. Sér 1, Math. 319(4), 397–400.

[2] Ambrosio, L. (2003) Lecture notes on optimal transportation. Mathematical aspects of evolving interfaces (Funchal, 2000), Lecture notes in mathematics, Springer, New York.

[3] Beiglböck, M., Léonard, C. & Schachermayer, W. (2012) A general duality theorem for the Monge–Kantorovich transport problem. Stud. Math. 209(2), 2–167.

[4] Beiglböck, M., Goldstern, M., Maresch, G. & Schachermayer, W. (2009) Optimal and better transport plans. J. Funct. Anal. 256(6), 1907–1927.

[5] Bianchini, S. & Caravenna, L. (2010) On optimality of c-cyclically monotone transference plans. C. R. Math. 348(11-12), 613–618.

[6] Braides, A. (2002) Gamma-convergence for beginners, vol. 22, Oxford University Press, Oxford.

[7] Brenier, Y. (1991) Polar factorization and monotone rearrangement of vector-valued functions. Commun. Pure Appl. Math. 44(4), 375–417.

[8] Buttazzo, G., De Pascale, L. & Gori-Giorgi, P. (2012) Optimal-transport formulation of electronic density-functional theory. Phys. Rev. A 85(6), 062502.

[9] Caffarelli, L. (1992) The regularity of mappings with a convex potential. J. Am. Math. Soc. 5(1), 99–104.

[10] Carlier, G. (2003) On a class of multidimensional optimal transportation problems. J. Convex Anal. 10(2), 517–530.

[11] Carlier, G. & Ekeland, I. (2010) Matching for teams. Econom. Theory 42(2), 397–418.

[12] Carlier, G. & Nazaret, B. (2008) Optimal transportation for the determinant. ESAIM Control Optim. Calc. Var. 14(04), 678–698.

[13] Chiappori, P.-A., McCann, R. & Nesheim, L. (2010) Hedonic price equilibria, stable matching and optimal transport; equivalence, topology and uniqueness. Econom. Theory 42(2), 317–354.

[14] Colombo, M., De Pascale, L. & Di Marino, S. (2015) Multimarginal optimal transport maps for 1-dimensional repulsive costs. Can. J. Math. 67(2), 350–368.

[15] Colombo, M. & Di Marino, S. (2013) Equality between Monge and Kantorovich multimarginal problems with Coulomb cost. Ann. Matem. Pura Appl., 1–14.

[16] Cotar, C., Friesecke, G. & Klüppelberg, C. (2013) Density functional theory and optimal transportation with Coulomb cost. Commun. Pure Appl. Math. 66(4), 548–599.

[17] Peyré, G. & Cuturi, M. (2019) Computational optimal transport: With applications to data science. Found. Trends® Mach. Learn. 11, 355–607.

[18] Dal Maso, G. (1993) An Introduction to Γ-Convergence, Birkhäuser, Boston, MA.

[19] De Pascale, L., Kausamo, A. & Wyczesany, K. 60 years of cyclic monotonicity: A survey. J. Convex Anal., ArXiv: 2308.07682v2.

[20] Di Marino, S., Gerolin, A. & Nenna, L. (2017) Optimal transportation theory for repulsive costs. In: Bergounioux, M., Oudet, E., Rumpf, M., Carlier, G., Champion, T. & Santambrogio, F. (editors), Topological Optimization and Optimal Transport: In the Applied Sciences, De Gruyter, Berlin, Boston, pp. 204–256.

[21] Friesecke, G., Mendl, C. B., Pass, B., Cotar, C. & Klüppelberg, C. (2013) N-density representability and the optimal transport limit of the Hohenberg-Kohn functional. J. Chem. Phys. 139(16), 164109.

[22] Gangbo, W. & McCann, R. J. (1996) The geometry of optimal transportation. Acta Math. 177(2), 113–161.

[23] Gangbo, W. & Swiech, A. (1998) Optimal maps for the multidimensional Monge-Kantorovich problem. Commun. Pure Appl. Math. 51(1), 23–45.

[24] Ghoussoub, N. & Moameni, A. (October 2012) A self-dual polar factorization for vector fields. Commun. Pure Appl. Math. 66(6), 905–933. DOI: 10.1002/cpa.21430.

[25] Griessler, C. (2018) $c$-cyclical monotonicity as a sufficient criterion for optimality in the multi-marginal Monge-Kantorovich problem. Proc. Am. Math. Soc. 145, 4735–4740.

[26] Jylhä, H. (2015) The $L^{\infty }$ optimal transport: Infinite cyclical monotonicity and the existence of optimal transport maps. Calc. Var. Partial Differ. Eq. 52, 303–326.

[27] Heinich, H. (2002) Problème de Monge pour n probabilités. C. R. Math. 334(9), 793–795.

[28] Kellerer, H. G. (1984) Duality theorems for marginal problems. Prob. Theory Relat. Fields 67(4), 399–432.

[29] Knott, M. & Smith, C. (1992) On Hoeffding-Fréchet bounds and cyclic monotone relations. J. Multiv. Anal. 40(2), 328–334.

[30] Pass, B. (2011) Uniqueness and Monge solutions in the multi-marginal optimal transportation problem. SIAM J. Math. Anal. 43(6), 2758–2775.

[31] Pass, B. (2012) On the local structure of optimal measures in the multi-marginal optimal transportation problem. Calc. Var. Partial Differ. Eq. 43(3-4), 529–536.

[32] Rachev, S. T. & Rüschendorf, L. (1998) Mass transportation problems. In: Vol. I. Probability and its Applications (New York). Theory, Springer-Verlag, New York.

[33] Seidl, M. (1999) Strong-interaction limit of density-functional theory. Phys. Rev. A 60(6), 4387–4395.

[34] Seidl, M., Gori-Giorgi, P. & Savin, A. (2007) Strictly correlated electrons in density-functional theory: A general formulation with applications to spherical densities. Phys. Rev. A 75(4), 042511.

[35] Seidl, M., Perdew, J. P. & Levy, M. (1999) Strictly correlated electrons in density-functional theory. Phys. Rev. A 59(1), 51–54.

[36] Villani, C. (2003) Topics in Optimal Transportation, vol. 58, American Mathematical Society, Providence, RI.

[37] Villani, C. (2009) Optimal transport. Old and new. vol. 338 of Grundlehren der Mathematischen Wissenschaften, Springer-Verlag, Berlin.