Hostname: page-component-cd9895bd7-gvvz8 Total loading time: 0 Render date: 2024-12-26T08:04:30.119Z Has data issue: false hasContentIssue false

Divergences on monads for relational program logics

Published online by Cambridge University Press:  31 July 2023

Tetsuya Sato
Affiliation:
Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, Japan
Shin-ya Katsumata*
Affiliation:
National Institute of Informatics, 2-1-2 Chiyoda-ku, Tokyo, Japan
*
Corresponding author: Shin-ya Katsumata; Email: s-katsumata@nii.ac.jp
Rights & Permissions [Opens in a new window]

Abstract

Several relational program logics have been introduced for integrating reasoning about relational properties of programs and measurement of quantitative difference between computational effects. Toward a general framework for such logics, in this paper, we formalize the concept of quantitative difference between computational effects as divergences on monads, then develop a relational program logic called approximate computational relational logic (acRL for short). It supports generic computational effects and divergences on them. The semantics of the acRL is given by graded strong relational liftings constructed from divergences on monads. We derive two instantiations of the acRL: (1) for the verification of various kinds of differential privacy of higher-order functional probabilistic programs and (2) the other for measuring difference of distributions of cost between higher-order functional probabilistic programs with a cost counting operator.

Type
Special Issue: Differences and Metrics in Programs Semantics: Advances in Quantitative Relational Reasoning
Copyright
© The Author(s), 2023. Published by Cambridge University Press

1. Introduction

Comparing behavior of programs is one of the fundamental activities in verification and analysis of programs, and many concepts, techniques, and formal systems have been proposed for this purpose, such as product program construction (Barthe et al., Reference Barthe, Crespo, Kunz, Butler and Schulte2011, Reference Barthe, D’Argenio and Rezk2004; Zaks and Pnueli, Reference Zaks, Pnueli, Cuellar, Maibaum and Sere2008), relational Hoare logic (Benton, Reference Benton2004), higher-order relational refinement types (Barthe et al., Reference Barthe, Gaboardi, Arias, Hsu, Roth and Strub2015), and so on.

Several recent relational program logics integrate compositional reasoning about relational properties of programs and over-approximating quantitative differences between computational effects of programs; the latter is done in the style of effect system (Lucassen and Gifford, Reference Lucassen and Gifford1988). One successful logic of this kind is Barthe et al.’s approximate probabilistic relational Hoare logic (apRHL for short) designed for verifying differential privacy of probabilistic programs (Barthe et al., Reference Barthe, Köpf, Olmedo and Béguelin2012). A judgment of apRHL is of the form $ c\sim_{\epsilon,\delta}c':\Phi\Rightarrow\Psi $ , and its intuitive meaning is that for any pair of initial states $ (\rho,\rho') $ related by $ \Phi $ , the $ \epsilon $ -distance between two probability distributions of final states $ [\![ c]\!]\rho $ and $ [\![ c']\!]{\rho'} $ is below $ \delta $ , and final states are related by $ \Psi $ . Another relational program logic that measures differences between computational effects of programs is Çiçek et al.’s RelCost (Çiçek et al., Reference Çiçek, Barthe, Gaboardi, Garg and Hoffmann2017). The target of the reasoning is a higher-order functional programming language equipped with cost counting effect. When we derive a judgment $ \Delta; \Psi; \Gamma \vdash M_1 \ominus M_2 \precsim n \colon \Phi $ in RelCost, its sound semantics ensures that the difference of cost counts by $ M_1 $ and $ M_2 $ is bound by n.

A high-level view on these relational program logics is that they integrate the feature of measuring quantitative differences between computational effects into relational program logic. We aim to mathematically formulate this perspective in this paper. The central concept in our formulation is a divergence on a monad. It specifies quantitative differences between computational effects modeled by a monad, and it may be seen as a generalization of various statistical divergences, such as the Kullback–Leibler divergence and the total variation distance, to general computational effects. We then present a construction of the semantics of a relational program logic that integrates the measurement of quantitative differences between computational effects from a divergence on a monad. This construction works with general monads and divergences on them and can be applied to derive the semantics of the approximate relational program logic for differential privacy as an instance. The detail of the development of our formulation is as follows:

  • We introduce a structure called the divergence on a monad for measuring quantitative differences between computational effects in Section 4. Divergences on monads are a mild generalization of Barthe and Olmedo’s composable divergences on probability distributions, such as the Kullback–Leibler divergence and the total variation distance on probability distributions (Barthe and Olmedo, Reference Barthe and Olmedo2013). We visit examples of divergences on monads in Section 5 and give a method to transfer divergences on a monad to those on another monad through monad opfunctors.

  • We study mathematical aspects of divergences on monads in Section 6. We first reformulate them as structures in the category of divergences. We show that divergences on monads can be lifted to the category of divergences as monads, and that these liftings yield relative monads corresponding to the original divergences on monads. We next introduce the concept of generatedness of divergences on monads. It is a direct generalization of the same concept studied for differential privacy by Balle et al. (Reference Balle, Barthe, Gaboardi, Hsu, Sato, Chiappa and Calandra2020). We then associate them to the concept of quantitative equational theories studied by Mardare et al. (Reference Mardare, Panangaden and Plotkin2016). We show that unconditional quantitative equational theories on a set X of variables bijectively correspond to X-generated pseudometrics on free monads.

  • The key structure to integrate divergences on monads and relational program logics is something called the graded strong relational liftings of a monad. We present a general construction of such liftings from divergences on monads in Section 7. This construction makes it possible to give semantics of relational program logics with quantitative measurement of differences between computational effects for various monads and divergences on them.

  • We introduce a generic relational program logic (called acRL) over Moggi’s computational metalanguage (the simply typed lambda calculus with monadic types) in Section 8. Inside acRL, we can use graded strong relational liftings constructed from divergences on a monad and reason about relational properties of programs together with quantitative differences of computational effects. To illustrate how the reasoning works in acRL, we instantiate it with the computational metalanguage having effectful operations for continuous random sampling (Section 9) and cost counting operation (Section 10).

2. Preliminaries

We assume basic knowledge about category theory (Mac Lane, Reference Mac Lane1998) and Moggi’s model of computational effects (Moggi, Reference Moggi1991). The definition of monad (Mac Lane, Reference Mac Lane1998, Chapter VI) and Kleisli category (Mac Lane, Reference Mac Lane1998, Section VI.5) are omitted.

In this paper, a Cartesian category (CC for short) is specified by a category $ {\mathbb{C}} $ with a designated terminal object 1 and a binary product functor $ (\times)\colon {\mathbb{C}}^2\to{\mathbb{C}} $ . The associated pairing operation and projection morphisms are denoted by $ \langle-,-\rangle,\pi_1, \pi_2 $ , respectively. The unique morphism to the terminal object is denoted by $ !_I:I\to 1 $ . A Cartesian closed category (CCC for short) is a CC $ ({\mathbb{C}},1,(\times)) $ with a specified exponential functor $ (\Rightarrow)\colon {\mathbb{C}}^{\mathrm{op}}\times{\mathbb{C}}\to{\mathbb{C}} $ . The associated evaluation morphism and currying operation is denoted by $ \mathrm{ev},\lambda(-) $ , respectively. There are plenty of examples of C(C)Cs. For the models of probabilistic computation, we will use the CC Meas of measurable spaces and the CCC QBS of quasi-Borel spaces (Heunen et al., Reference Heunen, Kammar, Staton and Yang2017), which we briefly recall in Section 2.1.

Let $ ({\mathbb{C}},1,(\times)) $ be a CC. A global element of $ I\in{\mathbb{C}} $ is a morphism of type $ 1\rightarrow I $ . For a category $ {\mathbb{C}} $ , we define the functor $ U^{\mathbb{C}}:{\mathbb{C}}\to{\bf Set} $ by $ U^{\mathbb{C}}={\mathbb{C}}(1,-) $ . When $ {\mathbb{C}} $ is obvious, $ U^{\mathbb{C}} $ is denoted by U. Morphisms in $ {\mathbb{C}} $ act on global elements by the composition. To emphasize this action, we introduce a dedicated notation $ (\mathbin{\bullet}) $ whose type is $ {\mathbb{C}}(I,J)\times UI\to UJ $ . Of course, $ f\mathbin{\bullet} x\triangleq f\circ x=(Uf)(x) $ . We also define the partial application of a binary morphism $ f \colon I \times J \rightarrow K $ to a global element $ i\in UI $ by $ f_i \triangleq f \circ \langle i \circ !_J, {\rm id}_J \rangle:J\to K $ . When $ {\mathbb{C}} $ is a CCC, there is an evident isomorphism $ \lfloor{-}\rfloor\colon U(I\Rightarrow J)\cong{\mathbb{C}}(I,J) $ . We write $ \lceil{-}\rceil $ for its inverse.

A monad $ (T,\eta,\mu) $ on a category $ {\mathbb{C}} $ determines the operation $ (-)^\sharp:{\mathbb{C}}(I,TJ)\to{\mathbb{C}}(TI,TJ) $ called Kleisli extension. It is defined by $ f^\sharp\triangleq\mu_J\circ Tf $ . A monad may be equivalently given by a Kleisli triple (Moggi, Reference Moggi1991, Definition 1.2) that axiomatizes the triple $ (T,\eta,(-)^\sharp) $ . A strong monad on a CC $ ({\mathbb{C}},1,(\times)) $ is a pair of a monad $ (T,\eta,\mu) $ and a natural transformation $ \theta_{I,J}\colon I\times TJ\to T(I\times J) $ called strength. It should satisfy four axioms; see (Moggi, Reference Moggi1991, Definition 3.2) for detail.

A Cartesian category with a strong monad (CC-SM for short) is a pair of a CC and a strong monad on it. A Cartesian closed category with a strong monad (CCC-SM for short) is similarly defined. In a CC-SM $ ({\mathbb{C}},1,(\times),T,\eta,\mu,\theta) $ , the application of the strength to a global element can be expressed by the unit and the Kleisli extension of T (Moggi, Reference Moggi1991, Proof of Proposition 3.4):

(1) \begin{equation} \label{eq:stun} \theta_{I, J} \mathbin{\bullet}\langle i,c\rangle = ((\eta_{I\times J})_i)^{\sharp}\mathbin{\bullet} c \quad (i\in UI,c\in U(TJ)). \end{equation}

We will use this fact in Proposition 7, Proposition 17, and Theorem 39.

2.1 Measurable spaces and quasi-Borel spaces

Measurable spaces. For the treatment of continuous probability distributions, we employ the category Meas of measurable spaces and measurable functions. For a measurable space I, we write $ |I| $ and $ \Sigma_I $ for the underlying set and $ \sigma $ -algebra of I, respectively. The category Meas is a (well-pointed) CC, and it has all small limits and small colimits that are strictly preserved by the forgetful functor $ |{-}| \colon {\bf Meas} \to {\bf Set} $ , which is naturally isomorphic to the global element functor $ {\bf Meas}(1,-) $ .

Standard Borel spaces. A standard Borel space is a special measurable space $ (|\Omega|, \Sigma_\Omega) $ whose $ \sigma $ -algebra $ \Sigma_\Omega $ is the coarsest one including the topology $ \sigma_\Omega $ of a Polish space $ (|\Omega|, \sigma_\Omega) $ . In particular, the real line $ {\mathbb{R}} $ forms a standard Borel space. In fact, a measurable space $ \Omega $ is standard Borel if and only if there are $ \gamma \colon \Omega \to {\mathbb{R}} $ and $ \gamma' \colon {\mathbb{R}} \to \Omega $ in Meas forming a section–retraction pair, that is, $ \gamma' \circ \gamma = {\rm id}_\Omega $ . For example, [0,1], $ [0,\infty] $ , $ {\mathbb{N}} $ , and $ {\mathbb{R}}^k $ ( $ k \in {\mathbb{N}} $ ) are standard Borel.

The Giry monad. We recall the Giry monad G (Giry, Reference Giry and Banaschewski1982). For every measurable space I, G I is the measurable space given by the following data: the underlying set $ |G I| $ is the set of all probability measures over I, and the $ \sigma $ -algebra is the coarsest one induced by functions $ \mathrm{ev}_A \colon |G I | \to [0,1] $ ( $ A \in \Sigma_X $ ) defined by $ \mathrm{ev}_A(\mu) = \mu(A) $ . The unit $ \eta_I \colon I \to G I $ assigns to each $ x \in I $ the Dirac measure $ \mathbf{d}_x $ centered at x. For every $ f \colon I \to G J $ , the Kleisli extension $ f^\sharp \colon G I \to G J $ is given by $ (f^\sharp(\mu))(A) = \int_x f(x)(A)~d\mu(x) $ for each $ \mu \in G I $ . By $ G_{s} $ , we mean the subprobabilistic variant of G (called sub-Giry monad), where the underlying set $ |G_{s} I| $ of $ G_{s} I $ is relaxed to the set of subprobaility measures over I.

The Giry monad G (resp. the sub-Giry monad $ G_{s} $ ) carries a (commutative) strength $ \theta_{I,J} \colon I \times G J \to G (I \times J) $ over the CC $ ({\bf Meas},1,(\times)) $ . It computes the product of measures ( $ (x, \mu) \mapsto \mathbf{d}_x \otimes \mu $ ). Therefore, $ ({\bf Meas},G) $ and $ ({\bf Meas},G_{s}) $ are (well-pointed) CC-SMs.

Quasi-Borel spaces. The category Meas is not suitable for the semantics of higher-order programming languages since it is not Cartesian closed (Aumann, Reference Aumann1961) Footnote 1. For the treatment of higher-order probabilistic programs with continuous distributions, we employ the Cartesian closed category QBS of quasi-Borel spaces and morphisms between them, together with the probability monad P on QBS (Heunen et al., Reference Heunen, Kammar, Staton and Yang2017). A quasi-Borel space is a pair $ I = (|I|,M_I) $ of a set $ |I| $ and a subset $ M_I $ of the function space $ {\mathbb{R}} \Rightarrow |I| $ satisfying

  1. (1) for $ \alpha \in M_I $ and a measurable function $ f \colon {\mathbb{R}} \to {\mathbb{R}} $ , $ \alpha \circ f \in M_I $ .

  2. (2) for any $ x \in I $ , $ (\lambda r \in {\mathbb{R}}.x) \in M_I $ .

  3. (3) for all $ P \colon {\mathbb{R}} \to {\mathbb{N}} $ and a family $ \{ \alpha_i \}_{ i \in {\mathbb{N}}} $ of functions $ \alpha_i \in M_I $ , $ (\lambda r \in {\mathbb{R}}. \alpha_{P(r)}(r) ) \in M_I $ .

A morphism $ f \colon (|I|,M_I) \to (|J|,M_J) $ is a function $ f \colon |I| \to |J| $ such that $ f \circ \alpha \in M_J $ holds for all $ \alpha \in M_I $ . The category QBS is a (well-pointed) CCC and has all countable products and coproducts that are strictly preserved by the forgetful functor $ |{-}| \colon {\bf QBS} \to {\bf Set} $ . It is naturally isomorphic to the global element functor $ {\bf QBS}(1,-) $ .

Connection to measurable spaces: An adjunction We can convert measurable spaces and quasi-Borel spaces using an adjunction $ L \dashv K \colon {\bf Meas} \to {\bf QBS} $ . They are given by:

\begin{align*} L I &\triangleq (|I|, \{ U \subseteq |I| \mid \forall \alpha \in M_X. \alpha^{-1}(I) \in \Sigma_{{\mathbb{R}}} \}) & L f &\triangleq f\\ K I &\triangleq (|I|, {\bf Meas}({\mathbb{R}},I)) & K f &\triangleq f \end{align*}

For any standard Borel space $ \Omega \in {\bf Meas} $ , we have $ LK\Omega = \Omega $ . The right adjoint K is full-faithful when restricted to the standard Borel spaces (Heunen et al., Reference Heunen, Kammar, Staton and Yang2017, Proposition 15-(2)]. The right adjoint K preserves countable coproducts and function spaces (if exists) of standard Borel spaces (Heunen et al., Reference Heunen, Kammar, Staton and Yang2017, Proposition 19].

Probability Measures and the Probability Monad. A probability measure on a quasi-Borel space I is a pair $ (\alpha,\mu) \in M_I \times G {\mathbb{R}} $ . We introduce an equivalence relation $ \sim_I $ over probability measures on I by:

\begin{equation*} (\alpha,\mu) \sim_I (\beta,\nu) \iff \mu(\alpha^{-1}(-)) = \nu(\beta^{-1}(-)). \end{equation*}

Using this, we introduce a probability monad P on QBS as follows:

  • On objects, we define $ P:{\bf Obj}({\bf QBS})\to{\bf Obj}({\bf QBS}) $ by

    \begin{align*} |P(I)| &\triangleq (M_I \times G {\mathbb{R}}) / \sim_I, & M_{P(I)} \triangleq \{ \lambda r. [(\alpha,g(r))]_{\sim_I} \mid \alpha \in M_I, g \in {\bf Meas}({\mathbb{R}},G {\mathbb{R}})\}. \end{align*}
  • The unit is defined by $ \eta_I (x) \triangleq [\lambda r.x,\mu]_{\sim_I} $ for an arbitrary $ \mu \in G {\mathbb{R}} $ .

  • The Kleisli extension of $ f \colon I \to P(J) $ is defined by $ f^\sharp [\alpha,\mu]_{\sim_I} \triangleq [\beta, g^\sharp \mu] $ where there are $ \beta \in M_J $ and $ g \in {\bf Meas}({\mathbb{R}},G{\mathbb{R}}) $ satisfying $ f \circ \alpha = \lambda r \in {\mathbb{R}}. [\beta,g(r)]_{\sim_J} $ by definition of $ M_{P(J)} $ .

The monad P is (commutative) strong with respect to the C(C)C $ ({\bf QBS},1,(\times)) $ .

2.2 Category of binary relations

We define the category $ {\bf BRel}({\mathbb{C}}) $ of binary relations over $ {\mathbb{C}} $ -objects. This category is equivalent to subscones of $ {\mathbb{C}}^2 $ (Mitchell and Scedrov, Reference Mitchell and Scedrov1992). It offers an underlying category for relational reasoning about programs interpreted in $ {\mathbb{C}} $ .

  • An object in $ {\bf BRel}({\mathbb{C}}) $ is a triple $ (I_1,I_2,R) $ where $ R\subseteq UI_1\times UI_2 $ .

  • A morphism from $ (I_1,I_2,R) $ to $ (J_1,J_2,S) $ in $ {\bf BRel}({\mathbb{C}}) $ is a pair of $ {\mathbb{C}} $ -morphisms $ f_1\colon I_1\rightarrow J_1 $ and $ f_2\colon I_2\rightarrow J_2 $ such that for any $ (i_1,i_2)\in R $ , we have $ (f_1\mathbin{\bullet} i_1,f_2\mathbin{\bullet} i_2)\in S $ .

When X is a name of a $ {\bf BRel}({\mathbb{C}}) $ -object, by $ X_1,X_2 $ we mean its first and second component, and by $ R_X $ we mean its third component, so $ X=(X_1,X_2,R_X) $ . By $ (x_1,x_2)\in X $ , we mean $ (x_1,x_2)\in R_X $ . For objects $ X,Y\in{\bf BRel}({\mathbb{C}}) $ and a morphism $ (f_1,f_2)\colon (X_1,X_2)\to (Y_1,Y_2) $ in $ {\mathbb{C}}^2 $ , by

\begin{equation*} (f_1,f_2)\colon X\mathbin{\dot\rightarrow} Y \end{equation*}

we mean that $ (f_1,f_2)\in{\bf BRel}({\mathbb{C}})(X,Y) $ , that is, for any $ (x_1,x_2)\in X $ , we have $ (f_1\mathbin{\bullet} x_1,f_2\mathbin{\bullet} x_2)\in Y $ . We say that $ X\in{\bf BRel}({\mathbb{C}}) $ is an endorelation (over I) if $ X_1=X_2(=I) $ .

We next define the forgetful functor $ p_{{\mathbb{C}}}:{\bf BRel}({\mathbb{C}})\to{{\mathbb{C}}}^2 $ by:

\begin{equation*} p_{{\mathbb{C}}} X\triangleq(X_1,X_2),\quad p_{{\mathbb{C}}}(f_1,f_2)\triangleq(f_1,f_2). \end{equation*}

For $ (I_1,I_2)\in{\mathbb{C}}^2 $ , by $ {\bf BRel}({\mathbb{C}})_{(I_1,I_2)} $ we mean the complete boolean algebra $ \{X\in{\bf BRel}({\mathbb{C}}) \mid X_1=I_1\wedge X_2=I_2\} $ with the order given by $ X\le Y\iff R_X\subseteq R_Y $ .

When $ {\mathbb{C}} $ is a C(C)C, so is $ {\bf BRel}({\mathbb{C}}) $ (Mitchell and Scedrov, Reference Mitchell and Scedrov1992, Proposition 4.3). We specify a terminal object, a binary product functor and an exponential functor (in case $ {\mathbb{C}} $ is a CCC) on $ {\bf BRel}({\mathbb{C}}) $ by:

\begin{align*} \dot 1 & \triangleq (1,1,\{({\rm id}_1,{\rm id}_1)\}) \\ X\mathbin{\dot\times} Y & \triangleq (X_1\times Y_1, X_2\times Y_2, \{(\langle x_1,y_1\rangle,\langle x_2,y_2\rangle) \mid (x_1,x_2)\in X,(y_1,y_2)\in Y \})\\ X\mathbin{\dot\Rightarrow} Y & \triangleq (X_1\Rightarrow Y_1,X_2\Rightarrow Y_2, \{(f_1,f_2)\mid\forall{(x_1,x_2)\in X}~.~(\mathrm{ev}\circ\langle f_1,x_1\rangle,\mathrm{ev}\circ\langle f_2,x_2\rangle)\in Y\} ). \end{align*}

3. Divergences on Objects

We introduce the concept of divergence on objects in a CC $ {\mathbb{C}} $ . Major differences between divergences and metrics are threefold: (1) divergences are defined over objects in $ {\mathbb{C}} $ , (2) no axiom is imposed on divergences, and (3) divergences take values in a partially ordered monoid called divergence domain, which we define below.

Definition 1. A divergence domain $ {\mathcal{Q}}=(Q,\le,0,(+)) $ is a partially ordered commutative monoid whose poset part is a complete lattice.

The monoid addition $ (+) $ is only required to be monotone; no interaction with the sup/inf is required. We reserve the letter $ {\mathcal{Q}} $ to denote a general divergence domain. Examples of divergence domains are as follows:

\begin{align*} {\mathcal{N}}&=(\mathbb{N}\cup\{\infty\},\leq,0,(+)),& {\mathcal{R}}^+&=([0,\infty],\le,0,(+)),\\ {\mathcal{R}}^\times&=([0,\infty],\leq,1,(\times)),& {\mathcal{R}}^+_{1}&=([0,\infty],\le,0,\lambda{(p,q)}~.~p+q+pq),\\ {\mathcal{Z}}&=(\mathbb{Z}\cup\{\infty,-\infty\},\leq,0,(\mathbin{\bar{+}})),& {\mathcal{R}}&=([-\infty,\infty],\le,0,(\mathbin{\bar{+}})) \end{align*}

Here, $ \mathbin{\bar{+}} $ is an extension of the addition by $ r \mathbin{\bar{+}} (-\infty) = (-\infty) \mathbin{\bar{+}} r=-\infty $ .

Definition 2. Let $ {\mathbb{C}} $ be a CC. A $ {\mathcal{Q}} $ -divergence on an object $ I\in{\mathbb{C}} $ is a function $ d\colon (UI)^2\to {\mathcal{Q}} $ . For two $ {\mathcal{Q}} $ -divergences d,d’ on I, by $ d\preceq_I d' $ , we mean $ \forall{x_1,x_2\in UI}~.~d'(x_1,x_2)\le d(x_1,x_2) $ .

The binary relation $ \preceq_I $ is a partial order on the set of $ {\mathcal{Q}} $ -divergences on I.

A suitable notion of morphismetween $ {\mathbb{C}} $ -objects with divergences is nonexpansive morphism.

Definition 3. Let $ {\mathbb{C}} $ be a CC. We define the category $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ of $ {\mathcal{Q}} $ -divergences on $ {\mathbb{C}} $ -objects and nonexpansive morphisms between them by the following data.

  • An object is a pair (I,d) of an object $ I\in{\mathbb{C}} $ and a $ {\mathcal{Q}} $ -divergence d on I.

  • A morphism from (I,d) to (J,e) is a $ {\mathbb{C}} $ -morphism $ f\colon I\to J $ such that for any $ x_1,x_2\in UI $ , $ e(f\mathbin{\bullet} x_1,f\mathbin{\bullet} x_2)\le d(x_1,x_2) $ holds.

For an object $ X\in{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ , by $ d_X $ we mean its $ {\mathcal{Q}} $ -divergence part. We also define the forgetful functor $ V_{{\mathcal{Q}},{\mathbb{C}}}:{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{{\mathbb{C}}} $ by $ V_{{\mathcal{Q}},{\mathbb{C}}}(I,d)\triangleq I $ and $ V_{{\mathcal{Q}},{\mathbb{C}}}(f)\triangleq f $ .

We remark that the forgetful functor $ V_{{\mathcal{Q}},{\bf Set}}:{\bf Div}_{{\mathcal{Q}}}({{\bf Set}})\to{{\bf Set}} $ is a (Grothendieck) fibration, and the functor $ \overline U\colon {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ defined by $ \overline U(I,d)\triangleq (UI,d) $ and $ \overline U(f)\triangleq f $ makes the following commutative square a pullback in CAT (the large category of categories and functors between them):

Therefore, this pullback diagram asserts that $ V_{{\mathcal{Q}},{\mathbb{C}}}:{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{{\mathbb{C}}} $ arises from the change-of-base (Jacobs, Reference Jacobs1999, Lemma 1.5.1) of the fibration $ V_{{\mathcal{Q}},{\bf Set}} $ along the global section functor $ U:{\mathbb{C}}\to{\bf Set} $ .

4. Divergences on Monads

We introduce the concept of divergence on monad as a quantitative measure of difference between computational effects. This mildly generalizes Barthe and Olmedo’s composable divergences on probability distributions (Barthe and Olmedo, Reference Barthe and Olmedo2013). Divergences on monads are defined upon two extra data called grading monoid and basic endorelation.

Definition 4. A grading monoid is a partially ordered monoid $ (M,\le,1,(\cdot)) $ .

Definition 5. A basic endorelation is a functor $ E\colon {\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ such that for each $ I\in{\mathbb{C}} $ , E I is an endorelation.

Grading monoids will be used when formulating $ (\varepsilon,\delta) $ -differential privacy as a divergence on a monad. Basic endorelations specify which global elements are regarded as identical. Any CC $ {\mathbb{C}} $ has at least two basic endorelations given by equality relations and total relations:

\begin{align*} \operatorname{Eq} I&\triangleq(I,I,\{(i,i) \mid i\in UI\}) & \operatorname{Top} I&\triangleq(I,I,UI\times UI). \end{align*}

Other examples of basic endorelations can be found in concrete categories.

  • The category $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ of $ {\mathcal{Q}} $ -divergences on $ {\mathbb{C}} $ -objects has a basic relation $ E_\delta $ parameterized by $ \delta\in {\mathcal{Q}} $ . It collects all pairs of global elements whose divergence is bound by $ \delta $ . That is, $ E_\delta(I,d)\triangleq ((I,d),(I,d),\{(x_1,x_2) \mid d(x_1,x_2)\le\delta\}) $ .

  • The category Pre of preorders and monotone functions has the basic endorelation $ E_{eq} $ collecting equivalent global elements: $ E_{eq}(I,\le)\triangleq ((I,\le),(I,\le),\{(x,y) \mid x\le y\wedge y\le x\}) $ .

Definition 6. Let $ ({\mathbb{C}},1,(\times),T,\eta,\mu,\theta) $ be a CC-SM, $ {\mathcal{Q}}=(Q,\le,0,(+)) $ be a divergence domain, $ (M,\le,1,(\cdot)) $ be a grading monoid and $ E\colon {\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ be a basic endorelation. An E-relative M-graded $ {\mathcal{Q}} $ -divergence (when $ M=1 $ , we drop “M-graded”) on the monad T is a doubly indexed family of $ {\mathcal{Q}} $ -divergences $ {\mathsf\Delta} = \{{\mathsf\Delta}^{m}_{I} \colon (U(TI))^2 \to {\mathcal{Q}}\}_{m\in M,I\in{\mathbb{C}}} $ satisfying the following conditions:

  • (Monotonicity) For any $ m\le m' $ in M and $ I\in{\mathbb{C}} $ ,

  • (E-unit reflexivity) For any $ I\in{\mathbb{C}} $ ,

    \begin{equation*} \sup_{(x_1,x_2) \in E I}{\mathsf\Delta}^{1}_{I}(\eta_I\mathbin{\bullet} x_1,\eta_I\mathbin{\bullet} x_2)\le 0. \end{equation*}
  • (E-composability) For any $ m_1,m_2\in M $ , $ I,J\in{\mathbb{C}} $ , $ c_1,c_2\in U(TI) $ and $ f_1,f_2\colon I\to TJ $ ,

    \begin{equation*} {\mathsf\Delta}^{m_1\cdot m_2}_{J}(f_1^\sharp\mathbin{\bullet} c_1,f_2^\sharp\mathbin{\bullet} {c_2}) \le {\mathsf\Delta}^{m_1}_{I}(c_1,c_2) + \sup_{(x_1,x_2) \in E I}{\mathsf\Delta}^{m_2}_{J}(f_1\mathbin{\bullet} x_1,f_2\mathbin{\bullet} x_2). \end{equation*}

We write $ {\bf Div}(T, E,M,{\mathcal{Q}}) $ for the collection of E-relative M-graded $ {\mathcal{Q}} $ -divergences on T. We introduce a partial order $ \preceq $ (reusing the notation for the partial order between divergences in Definition 2) on $ {\bf Div}(T, E,M,{\mathcal{Q}}) $ by:

\begin{equation*} {\mathsf\Delta}_1\preceq {\mathsf\Delta}_2 \iff \forall{m\in M,I\in{\mathbb{C}}}~.~ ({\mathsf\Delta}_1)^m_I \preceq_{TI} ({\mathsf\Delta}_2)^m_I. \end{equation*}

The E-composability condition is a generalization of the composability of differential privacy stated as Theorem 1 in Barthe and Olmedo (Reference Barthe and Olmedo2013). What is new in the present paper is that 1) we introduce a condition on the monad unit (E-unit reflexivity), and that 2) the sup computed in E-unit reflexivity and E-composability scans global elements related by E, while Barthe and Olmedo (Reference Barthe and Olmedo2013) only considers the case where $ E=\operatorname{Eq} $ . We will later show that both E-unit reflexivity and E-composability play an important role when connecting divergences, relational liftings of T, and the monad structure of T – these conditions are necessary and sufficient to construct strong graded relational liftings of T satisfying fundamental property with respect to divergences (Proposition 2).

We end this section by stating an interaction between the strength of T and divergences on T.

Proposition 7. Let $ ({\mathbb{C}},T) $ be a CC-SM, $ E:{\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ be a basic endorelation, $ (M, \leq, 1, (\cdot)) $ be a grading monoid, and $ {\mathcal{Q}} $ be a divergence domain. Suppose also that $ EI \mathbin{\dot\times} EJ \le E(I \times J) $ holds for all $ I,J \in {\mathbb{C}} $ . Then each divergence $ {\mathsf\Delta}\in{\bf Div}(T, E,M,{\mathcal{Q}}) $ satisfies: for all $ m\in M,(x_1,x_2) \in EI $ and $ c_1,c_2 \in U(TJ) $ ,

\[ {\mathsf\Delta}^{m}_{I \times J} (\theta_{I,J} \mathbin{\bullet} \langle x_1,c_1 \rangle ,\theta_{I,J} \mathbin{\bullet} \langle x_2,c_2 \rangle) \leq {\mathsf\Delta}^{m}_{J}(c_1,c_2). \]

5. Examples of Divergences on Monads

5.1 Cost difference for deterministic computations

We introduce examples of divergence on the cost count monad $ T \triangleq {\mathbb{N}} \times - $ on Set (which is isomorphic to the writer monad over a single alphabet $ \{*\} $ ). The divergence is inspired by the work on relational cost analysis (Çiçek et al., Reference Çiçek, Barthe, Gaboardi, Garg and Hoffmann2017; Radiček et al., Reference Radiček, Barthe, Gaboardi, Garg and Zuleger2017) measuring the difference of costs between two programs by subtraction. Through these examples we also discuss the roles of the E-unit reflexivity and E-composability conditions.

The unit and Kleisli extension of the cost count monad T are defined by:

\begin{align*} \eta_I (x) & \triangleq (0,x) & f^\sharp(i,x) & \triangleq (i + \pi_1(f(x)), \pi_2(f(x))) &(x\in I,i\in{\mathbb{N}},f\colon I\to TJ). \end{align*}

This monad T can be used to record the cost incurred by deterministic computations. For instance, consider the quicksort algorithm $ \mathsf{qsort} $ and the insertion sort algorithm $ \mathsf{isort} $ , both of which are modified so that they tick a count whenever they compare two elements to be sorted. These two modified sort programs are interpreted as functions $ [\![ \mathsf{qsort}]\!],[\![ \mathsf{isort}]\!]\colon {\mathbb{N}}^*\to T({\mathbb{N}}^*) $ so that the first component of $ [\![ \mathsf{qsort}]\!](x) $ and that of $ [\![ \mathsf{isort}]\!](x) $ report the number of comparisons performed during sorting x.

We first define an $ {\mathcal{N}} $ -divergence $ \mathsf{C}_I $ on TI, for each $ I\in{\bf Set} $ , by:

\begin{equation*} {\mathsf{C}_I} {((i,x),(j,y))} \triangleq |i - j|. \end{equation*}

This divergence $ \mathsf{C}_I $ computes the difference of costs between two computations $ (i,x),(j,y) \in TI $ , ignoring their return values. The family $ \mathsf C={\mathsf{C}_I}_{I \in {\bf Set}} $ forms a $ \operatorname{Top} $ -relative $ {\mathcal{N}} $ -divergence on T. The $ \operatorname{Top} $ -unit reflexivity of $ \mathsf{C} $ means that the difference of costs between pure computations is zero:

\[ \mathsf{C}_I (\eta_I (x),\eta_I (y)) = \mathsf{C}_I ((0,x),(0,y)) = 0. \]

The $ \operatorname{Top} $ -composability of $ \mathsf{C} $ says that we can limit the cost difference of two runs of programs $ f^\sharp(i,x) $ and $ g^\sharp(j,y) $ by the sum of cost difference of the preceding computations (i,x),(j,y) and that of two programs $ f,g\colon I\to TJ $ . The latter is measured by taking the sup of cost difference of f(x) and g(y), where (x,y) range over the basic correlation $ \operatorname{Top} I $ .

\begin{align*} \mathsf{C}_J (f^\sharp(i,x),g^\sharp(j,y))& = \mathsf{C}_J ( ( i + \pi_1(f(x)), \pi_2(f(x)) ), ( j + \pi_1(g(y)), \pi_2(g(y)) ) )\\ &\leq |i - j| + \sup_{x,y \in I} |\pi_1(f(x)) - \pi_1(g(y))|\\ &= \mathsf{C}_I((i,x),(j,y)) + \sup_{(x,y) \in \operatorname{Top} I} \mathsf{C}_J(f(x),g(y)). \end{align*}

We remark that $ \mathsf C $ is not an $ \operatorname{Eq} $ -relative $ {\mathcal{N}} $ -divergence on T because the $ \operatorname{Eq} $ -composability fails: If $ I = \{x,y,z\} $ , $ J = \{v,w\} $ and $ f \colon I \to \mathsf C J $ is defined by $ f(x)=(0,w) $ , $ f(y)=(1,w) $ and $ f(z) = (0,v) $ , then we have $ \mathsf{C}_I((0,x),(0,y)) = 0 $ and $ \sup_{(x,y) \in \operatorname{Eq} I} \mathsf{C}_J(f(x),f(y)) = 0 $ , but we have $ \mathsf{C}_J (f^\sharp(0,x),f^\sharp(0,y)) = \mathsf{C}_J((0,w),(1,w)) = 1 $ .

Alternatively, we may consider the following $ {\mathcal{N}} $ -divergence $ \mathsf{C}'_I $ on TI for each $ I\in{\bf Set} $ :

\[ \mathsf{C}'_I((i,x),(j,y)) \triangleq \begin{cases} |i - j| & x = y \\ \infty & x \neq y \end{cases}. \]

This divergence is sensitive on return values of computations. When return values of two computations agree, $ \mathsf C' $ measures the cost difference as done in $ \mathsf{C} $ , but when they do not agree, the cost difference is judged as $ \infty $ . This divergence is an $ \operatorname{Eq} $ -relative $ {\mathcal{N}} $ -divergence on T.

5.2 Cost difference for nondeterministic computations

We may model the cost counting effect and nondeterministic choice by the monad $ P(\mathbb{N} \times -) $ on Set, where P is the powerset monad. We extend the divergence on the cost count monad in the previous section to this combined monad as follows. For two results of nondeterministic computations $ A, B \in P(\mathbb{N} \times I) $ with cost counting effects, the least upper bound of the difference $ i - j $ for all possible choices of $ (i,x) \in A $ and $ (j,y) \in B $ forms the divergence on $ P(\mathbb{N} \times -) $ :

\[ \mathsf{NCI}_I(A,B) \triangleq \sup_{(i,x) \in A, (j,y) \in B} i - j. \]

If either A or B is empty, we fail to get an information of costs. We then have $ \mathsf{NCI}_I(A,B) = -\infty $ . Similary, we can define the divergence $ \mathsf{NC} $ in Table 1 measuring the absolute difference of costs.

Table 1. (1-graded) $ \operatorname{Top} $ -relative $ {\mathcal{Q}} $ -divergences for cost counting monads

5.3 Divergences for differential privacy

Differential privacy (DP for short) is a quantitative definition of privacy of randomized queries in databases. DP is based on the idea of noise-adding anonymization against background knowledge attacks. In the study of DP, a query is modeled by a measurable function $ c \colon I \to G J $ , where I and J are measurable spaces of inputs and outputs, respectively, and G J is the measurable space of all probability measures over J; here, G itself refers to the Giry monad (Giry, Reference Giry and Banaschewski1982); see also Section 2.1).

Definition 8. Differential privacy, (Dwork et al., Reference Dwork, McSherry, Nissim and Smith2006). Let $ c \colon I \to G J $ be a morphism in Meas, representing a randomized query. The query c satisfies $ (\varepsilon,\delta) $ -differential privacy ( $ \epsilon,\delta\ge 0 $ are reals) if for any adjacent datasets $ (d_1,d_2) \in R_{\mathrm{adj}} $ Footnote 2, the following holds

\begin{equation*} \forall S \subseteq_{\mathrm{measurable}}J.~ \Pr[ c(d_1) \in S] \leq \exp(\varepsilon) \Pr[ c(d_2) \in S] + \delta. \end{equation*}

To express this definition in terms of divergence on monad, we introduce a doubly indexed family of $ {\mathcal{R}}^+ $ -divergence $ \mathsf{DP}=\{\mathsf{DP}_{J}^{\varepsilon}\}_{\varepsilon\in[0,\infty],J\in{\bf Meas}} $ on G J by:

\begin{equation*} \mathsf{DP}_{J}^{\varepsilon} (\mu_1,\mu_2)\triangleq \sup_{S\in \Sigma_J } (\mu_1(S) - \exp(\varepsilon) \mu_2(S))\quad (\mu_1,\mu_2\in G J). \end{equation*}

Then the query $ c \colon I \to G J $ satisfies $ (\varepsilon,\delta) $ -DP if and only if

\begin{equation*} \forall{(d_1,d_2)\in R_{\mathrm{adj}}}~.~ \mathsf{DP}_{J}^{\varepsilon}(c(d_1),c(d_2))\le\delta. \end{equation*}

The pair $ (\varepsilon,\delta) $ indicates the difference between output probability distributions $ c(d_1) $ and $ c(d_2) $ of the query c for given datasets $ d_1 $ and $ d_2 $ . Intuitively, the parameter $ \varepsilon $ is an upper bound of the ratio $ \Pr[c(d_1) = s]/\Pr[c(d_2) = s] $ of probabilities which indicates the leakage of privacy. If $ \varepsilon $ is large, attackers can distinguish the datasets $ d_1 $ and $ d_2 $ from the outputs of the query c. The parameter $ \delta $ is the probability of failure of privacy protection.

The family $ \mathsf{DP} $ forms an $ \operatorname{Eq} $ -relative $ {\mathcal{R}}^+ $ -graded $ {\mathcal{R}}^+ $ -divergence on the Giry monad G (Sato et al., Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019, Lemma 6). This is proved by extending the composability of the divergence for DP on discrete probability distributions shown as Lemmas 3 and 6 in Barthe et al. (Reference Barthe, Köpf, Olmedo and Béguelin2012) and Proposition 5 in Barthe andOlmedo (Reference Barthe and Olmedo2013), based on the composition theorem of DP (Dwork and Roth, Reference Dwork and Roth2013, Section 3.5).

The conditions in Definition 6 on $ \mathsf{DP} $ corresponds to the following basic properties of DP:

  • (monotonicity) The monotonicity of $ \mathsf{DP} $ corresponds to weakening the differential privacy of queries: if c satisfies $ (\varepsilon,\delta) $ -DP and $ \varepsilon \leq \varepsilon' $ and $ \delta \leq \delta' $ holds, then c satisfies $ (\varepsilon',\delta') $ -DP.

  • ( $ \operatorname{Eq} $ -unit reflexivity) The $ \operatorname{Eq} $ -unit reflexivity of $ \mathsf{DP} $ implies $ \mathsf{DP}_{J}^{0}(\eta_J \circ h(x),\eta_J \circ h(x)) = 0 $ for any measurable function $ h \colon I \to J $ and $ x \in I $ . This, together with the composability below, ensures the robustness of DP of a query $ c\colon I\to G J $ with respect to deterministic postprocessing:

    (2) \begin{equation} \label{eq:dpostpro} \forall{h:J\to K}~.~ {\textit{c}}\,\text{is $(\epsilon,\delta)$-DP} \implies \text{$G h\circ c$ is $(\epsilon,\delta)$-DP}. \end{equation}
    In fact, the divergence $ \mathsf{DP} $ is reflexive: we have $ \mathsf{DP}_{J}^{0}(\mu,\mu) = 0 $ for every $ \mu \in G J $ . Therefore, $ h\colon J\to K $ and G h in (2) can be replaced by $ h\colon J\to G K $ and $ h^\sharp $ ; the replaced condition states the robustness of DP of a query with respect to probabilistic postprocessing.
  • ( $ \operatorname{Eq} $ -composability) The $ \operatorname{Eq} $ -composability of $ \mathsf{DP} $ corresponds to the known property of DP called the sequential composition theorem (Dwork and Roth, Reference Dwork and Roth2013). If $ c_1 \colon I \to G J' $ and $ c_2 \colon J' \to G J $ are $ (\varepsilon_1,\delta_1) $ -DP and $ (\varepsilon_2,\delta_2) $ -DP, respectively, then the sequential composition $ c_2^\sharp\circ c_1 \colon I \to G J $ of the queries $ c_1 $ and $ c_2 $ is $ (\varepsilon_1+\varepsilon_2,\delta_1+\delta_2) $ -DP.

A non-example: Pointwise differential privacy. We stated above that a parameter $ (\varepsilon, \delta) $ of DP intuitively gives an upper bound of the probability ratio $ \Pr[c(d_1) = s]/\Pr[c(d_2) = s] $ and the probability of failure of privacy protection. However, strictly speaking, there is a gap between the definition of $ (\varepsilon,\delta) $ -DP and this intuition of $ \varepsilon $ and $ \delta $ . Pointwise differential privacy in Prasad and Smith (Reference Prasad and Smith2014, Definition 3.2) and Hall (Reference Hall2012, Proposition 1.2.3) is a finer definition of DP that is faithful to this intuition.

Definition 9. A measurable function $ c \colon I \to G J $ (regarded as a query) is pointwise $ (\varepsilon,\delta) $ -differentially private if whenever $ d_1 $ and $ d_2 $ are adjacent, there exists a measurable subset $ A \in \Sigma_J $ satisfying $ \Pr[ c(d_1) \notin A] \leq \delta $ and the following inequality:

\[ \forall s \in A.~\Pr[c(d_1) = s] \leq \exp(\varepsilon) \Pr[c(d_2) = s], \]

which is equivalent toFootnote 3

\[ \forall S \subseteq_{\mathrm{measurable}} A.~\Pr[c(d_1) \in S] \leq \exp(\varepsilon) \Pr[c(d_2) \in S]. \]

To express this definition in terms of divergence on monad, we introduce a doubly indexed family of $ {\mathcal{R}}^+ $ -divergences $ \mathsf{pwDP}= \{\mathsf{pwDP}_{}^{\varepsilon}J\}_{\varepsilon\in {\mathcal{R}}^+,J\in{\bf Meas}} $ called pointwise indistinguishability:

\[ \mathsf{pwDP}_{J}^{\varepsilon}(\mu_1,\mu_2) \triangleq \inf \left\{\mu_1(J \setminus A) \mid A \in \Sigma_X \wedge (\forall{S \in \Sigma_J}. S \subseteq A \implies \mu_1(S) \leq \exp(\varepsilon) \mu_2(S)) \right\}. \]

Then, $ c \colon I \to G J $ is pointwise $ (\varepsilon,\delta) $ -differentially private if and only if

\begin{equation*} \forall{(d_1,d_2)\in R_{\mathrm{adj}}}~.~ \mathsf{pwDP}_{J}^{\varepsilon}(c(d_1),c(d_2)) \leq \delta. \end{equation*}

The family $ \mathsf{pwDP} $ is obviously reflexive: $ \mathsf{pwDP}_{J}^{\varepsilon}(\mu,\mu) = 0 $ holds for any $ \mu \in G J $ and $ \varepsilon \geq 0 $ . Hence, it is $ \operatorname{Eq} $ -unit reflexive. However, it is not $ \operatorname{Eq} $ -composable. We let $ I = \{0,1,2\} $ and $ J = \{0,1\} $ be discrete spaces, and let $ \alpha = \exp(\varepsilon) $ . We define two probability distributions $ \mu_1,\mu_2 \in G I $ by:

\[ \mu_1 \triangleq \frac{1}{10}\mathbf{d}_0 + \frac{9}{10}\mathbf{d}_1, \qquad \mu_2 \triangleq \frac{9}{10\alpha}\mathbf{d}_1 + \left(1 - \frac{9}{10\alpha}\right)\mathbf{d}_2. \]

We then have $ \mathsf{pwDP}_{I}^{\varepsilon}(\mu_1,\mu_2) = 1 / 10 $ with $ A = \{1,2\} $ because

\begin{alignat*}{3} \mu_1(\{0\}) ={}&~ \frac{1}{10} ~&>&~ \exp(\varepsilon) \cdot 0 ~&{}= \exp(\varepsilon) \mu_2(\{0\}),\\ \mu_1(\{1\}) ={}&~ \frac{9}{10} ~&\leq&~ \exp(\varepsilon) \cdot \frac{9}{10\alpha} ~&{}= \exp(\varepsilon) \mu_2(\{1\}),\\ \mu_1(\{2\}) ={}&~ 0 ~&\leq&~ \exp(\varepsilon) \cdot (1 - \frac{9}{10\alpha}) ~&{}= \exp(\varepsilon) \mu_2(\{2\}). \end{alignat*}

Next, we define $ f \colon I \to G J $ by:

\[ f(0) \triangleq \frac{1}{10}\mathbf{d}_0 + \frac{9}{10}\mathbf{d}_1,\quad f(1) \triangleq \frac{9}{10}\mathbf{d}_0 + \frac{1}{10}\mathbf{d}_1, \quad f(2) \triangleq \mathbf{d}_1. \]

We then have

\[ f^\sharp(\mu_1) = \frac{82}{100}\mathbf{d}_0 + \frac{18}{100}\mathbf{d}_1, \quad f^\sharp(\mu_2) = \frac{81}{100\alpha}\mathbf{d}_0 + \left(\frac{100\alpha- 90 + 9}{100\alpha} \right)\mathbf{d}_1. \]

Hence, we obtain $ \mathsf{pwDP}_{J}^{\varepsilon}(f^\sharp(\mu_1),f^\sharp(\mu_2)) = 82 / 100 $ with $ A = \{1\} $ because

\begin{alignat*}{3} f^\sharp(\mu_1) (\{0\}) ={}&~ \frac{82}{100} ~&>&~ \exp(\varepsilon) \cdot \frac{81}{100\alpha} ~&{}= \exp(\varepsilon) f^\sharp(\mu_2)(\{0\}),\\ f^\sharp(\mu_1) (\{1\}) ={}&~ \frac{18}{100} ~&\leq&~ \exp(\varepsilon) \cdot \left(\frac{100\alpha- 90 + 9}{100\alpha} \right) ~&{}= \exp(\varepsilon) f^\sharp(\mu_2)(\{1\}). \end{alignat*}

By the reflexivity of $ \mathsf{pwDP} $ , we have $ \sup_{(x,y) \in \operatorname{Eq} I} \mathsf{pwDP}_{J}^{0} (f(x),f(y)) = 0 $ . Therefore, we have obtained a counterexample to the $ \operatorname{Eq} $ -composability of $ \mathsf{pwDP} $ :

\[ \mathsf{pwDP}_{J}^{\varepsilon}(f^\sharp(\mu_1),f^\sharp(\mu_2)) =\frac{82}{100}> \frac{1}{10} = \mathsf{pwDP}_{I}^{\varepsilon}(\mu_1,\mu_2) + \sup_{(x,y) \in \operatorname{Eq} I} \mathsf{pwDP}_{J}^{0} (f(x),f(y)). \]

Various relaxations of differential privacy. Since the seminal work on DP by Dwork et al. (Reference Dwork, McSherry, Nissim and Smith2006), various relaxations of differential privacy have been proposed: Rényi DP (Mironov, Reference Mironov2017), zero-concentrated DP (Bun and Steinke, Reference Bun and Steinke2016), and truncated zero-concentrated DP (Bun et al., Reference Bun, Dwork, Rothblum and Steinke2018). They give tighter bounds of differential privacy. These relaxations of differential privacy can be expressed by suitable divergences on the Giry monad G and sub-Giry monad $ G_{s} $ ; see Table 2 for their definitions. Therefore, $ \alpha,w\in(1,\infty) $ are nongrading parameters for $ \mathsf{Re} $ and $ \mathsf{tCDP} $ . Each row of the table represents that $ {\mathsf\Delta} $ is an $ \operatorname{Eq}{} $ -relative $ {\mathcal{Q}} $ - (resp. $ {\mathcal{Q}}_s $ -) divergences on G (resp. $ G_{s} $ ), and the definition of $ {\mathsf\Delta}^{}_{I}(\mu_1,\mu_2) $ follows.

Table 2. $ \operatorname{Eq}{} $ -relative M-graded $ {\mathcal{Q}} $ - ( $ {\mathcal{Q}}_s $ -)divergences on G ( $ G_{s} $ )

5.4 Statistical divergences and composablity of f-divergences

Apart from differential privacy, various distances between (sub-)probability distributions are introduced in probability theory. They are called statistical divergences. Examples include total variation distance $ \mathsf{TV} $ , Hellinger distance $ \mathsf{HD} $ , Kullback–Leibler divergence $ \mathsf{KL} $ , and $ \chi^2 $ -divergence $ \mathsf{Chi} $ ; they are defined in Table 3. These statistical divergences are $ \operatorname{Eq} $ -relative divergences on the Giry monad G (and $ G_{s} $ for $ \mathsf{TV} $ ); see the same table for their divergence domains. Question marks in the column of $ {\mathcal{Q}}_s $ means that we do not know with which monoid structure the $ \operatorname{Eq} $ -composability holds. We remark that these divergences are also reflexive, that is, $ {\mathsf\Delta}(c,c)=0 $ . $ \operatorname{Eq} $ -composability of these divergences in discrete form are proved in Barthe and Olmedo (Reference Barthe and Olmedo2013) and Olmedo (Reference Olmedo2014). Later, Sato et al. (Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019) extends their results to the composability of divergences in continuous form.

Table 3. Statistical divergences that are $ \operatorname{Eq}{} $ -relative $ {\mathcal{Q}} $ - (resp. $ {\mathcal{Q}}_s $ -) divergences on G (resp. $ G_{s} $ )

Each of four divergences in Table 3 can be expressed as an f-divergence $ {}^{f}\mathsf{Div} $ (Csiszár, Reference Csiszár1963, Reference Csiszár1967; Morimoto, Reference Morimoto1963):

\[ {}^{f}\mathsf{Div}_{I}(\mu_1,\mu_2) \triangleq \int_I \mu_2(x)f\left(\frac{\mu_1(x)}{\mu_2(x)}\right)dx. \]

Here, f is a parameter called weight function and has to be a convex function $ f \colon [0,\infty) \to {\mathbb{R}} $ , continuous at 0 and satisfying $ \lim_{x\to +0}xf(x)=0 $ . To support general $ \mu_1,\mu_2 \in G_{s} I $ , we suppose $ a f (0/a) = a f^\ast(0) $ for $ a \in [0,\infty) $ where $ f^\ast(0) \triangleq \lim_{x \to \infty} f(x)/x $ (see also Liese and Vajda (Reference Liese and Vajda2006, Definition 2)). Weight functions for four divergences $ \mathsf{TV}, \mathsf{KL},\mathsf{HD},\mathsf{Chi} $ are in Table 4. In fact, $ \mathsf{DP}^\varepsilon $ is also an f-divergence with weight function $ f(t)=\max (0,t-\exp(\varepsilon)) $ ; see Barthe and Olmedo (Reference Barthe and Olmedo2013, Proposition 2). We also remark that Rényi divergence $ {}^{\alpha}\mathsf{Re}_{ }{} $ of order $ \alpha $ is the logarithm of the f-divergence with weight function $ f(t) = t^\alpha $ .

Table 4. Parameters for Proposition 3

f-divergences have several nice properties such as reflexivity, postprocessing inequality, joint-convexity, duality, and continuity (Csiszár, Reference Csiszár1967; Liese and Vajda, Reference Liese and Vajda2006). However, the $ \operatorname{Eq} $ -composability of f-divergences is not guaranteed in general. Here, we provide a sufficient condition for the $ \operatorname{Eq} $ -composability of $ {}^{f}\mathsf{Div} $ over a specific form of divergence domain.

Proposition 10. Let $ \gamma\geq 0 $ be a nonnegative real number, $ {\mathcal{R}}^+_{\gamma} =([0,\infty],\le,0,\lambda{(p,q)}~.~p+q+\gamma pq) $ be the divergence domain, and f be a weight function such that $ f \geq 0 $ and $ f(1) = 0 $ . If there exists $ \alpha, \beta, \beta' \in {\mathbb{R}} $ such that, for all $ x,y,z,w \in [0,1] $ , the following hold

\begin{align*} 0 & \leq (\beta' z + (1-\beta') x) + \gamma x f\left({z}/{x}\right) \\ xy f\left({zw}/{xy}\right) &\leq (\beta w + (1-\beta) y) x f\left({z}/{x}\right) \nonumber + (\beta' z + (1-\beta') x) y f\left({w}/{y}\right)\\ & \quad + \gamma xy f\left({z}/{x}\right)f\left({w}/{y}\right) \nonumber + \alpha (x-z)(w-y), \end{align*}

then $ {}^{f}\mathsf{Div} $ is an $ \operatorname{Eq}{} $ -relative $ {\mathcal{R}}^+_{\gamma} $ -divergence on the Giry monad G. When $ \alpha = 0 $ and $ \beta,\beta' \in [0,1] $ , G can be replaced with the sub-Giry monad $ G_{s} $ .

The proof of this proposition generalizes and integrates the proofs given in Olmedo (Reference Olmedo2014, Section 5.A.2). This proposition is applicable to prove the composability of divergences in Table 3 by choosing suitable parameters; see Table 4.

5.5 Divergences on the probability monad on QBS via monad opfunctors

We have seen various divergences on the Giry monad G. It would be nice if they are transferred to the probability monad P on QBS (Section 2.1). For this, we first develop a generic method for transferring divergences on monads.

Let $ ({\mathbb{C}},S) $ and $ (\mathbb D,T) $ be two CC-SMs. A monad opfunctor (Street, Reference Street1972, Section 4) is a functor $ p \colon {\mathbb{C}} \to \mathbb D $ together with a natural transformation $ \lambda\colon p\circ S \to T\circ p $ making the following diagrams commute:

Proposition 11. Let $ ({\mathbb{C}},S),(\mathbb D,T) $ be two CC-SMs, $ (p\colon {\mathbb{C}}\to\mathbb D, \lambda\colon p\circ S\to T\circ p) $ be a monad opfunctor and assume that $ U^\mathbb D \circ p = U^{\mathbb{C}} $ holds, and basic endorelations $ F \colon {\mathbb{C}} \to {\bf BRel}({\mathbb{C}}) $ and $ E \colon \mathbb D \to {\bf BRel}(\mathbb D) $ satisfy $ R_{FpI} = R_{EI} $ for all $ I \in {\mathbb{C}} $ (we here use $ U^\mathbb D\circ p = U^{\mathbb{C}} $ ). Then for any $ {\mathsf\Delta}\in{\bf Div}(T, E,M,{\mathcal{Q}}) $ , the following doubly indexed family of $ {\mathcal{Q}} $ -divergences $ \langle{p,\lambda}\rangle^*{\mathsf\Delta} =\{ (\langle{p,\lambda}\rangle^*{\mathsf\Delta}{ })^{m}_{I} \}_{m\in M,I\in{\mathbb{C}}} $ on SI is an F-relative M-graded $ {\mathcal{Q}} $ -divergence on S:

\begin{align*} (\langle{p,\lambda}\rangle^*{\mathsf\Delta}{ })^{m}_{I} (\nu_1,\nu_2) &\triangleq {\mathsf\Delta}^m_{pI} (\lambda_{I} \mathbin{\bullet} \nu_1,\lambda_{I} \mathbin{\bullet} \nu_2) = {\mathsf\Delta}^m_{pI} ((U^\mathbb D \lambda_{I})(\nu_1),(U^\mathbb D \lambda_{I})(\nu_2)). \end{align*}

The left adjoint $ L \colon {\bf QBS} \to {\bf Meas} $ of the adjunction $ L \dashv K \colon {\bf Meas}\rightarrow{\bf QBS} $ and the natural transformation $ l \colon LP \Rightarrow G L $ defined by $ l_X([\alpha,\mu]_{\sim_X})= \mu(\alpha^{-1}(-)) $ forms a monad opfunctor from the probability monad P on QBS to the Giry monad G on Meas (Heunen et al., Reference Heunen, Kammar, Staton and Yang2017, Prop. 22 (3)). Through this monad opfunctor (L,l), we can convert $ \operatorname{Eq}{} $ -divergences on G to those on P. This conversion can be applied to all the statistical divergences in Table 2 and 3.

In addition, for any standard Borel space, we can view such converted divergences $ \langle{L,l}\rangle^*{\mathsf\Delta} $ as the same thing as the original $ {\mathsf\Delta} $ . When $ \Omega \in {\bf Meas} $ is standard Borel, we have an equality $ L K \Omega=\Omega $ , and $ l_{K\Omega} $ is an isomorphism. Therefore, we obtain an isomorphism $ l_{K\Omega}\colon L P K \Omega \cong G L K \Omega= G\Omega $ (Heunen et al., Reference Heunen, Kammar, Staton and Yang2017, Prop. 22 (4)). A concrete description of its inverse is $ l_{K\Omega}^{-1}\mathbin{\bullet}\mu= [\gamma',\mu(\gamma^{-1}(-))]_{\sim_{K \Omega}} $ , where $ \gamma' \colon {\mathbb{R}} \to \Omega $ and $ \gamma \colon \Omega \to {\mathbb{R}} $ are a section–retraction pair (i.e. $ \gamma' \circ \gamma = \mathrm{id}_\Omega $ ) that exists for any standard Borel $ \Omega $ .

Theorem 12. For any $ {\mathsf\Delta}\in{\bf Div}(G,\operatorname{Eq},{\mathcal{Q}},M) $ and standard Borel $ \Omega \in {\bf Meas} $ ,

\[ (\langle{p,\lambda}\rangle^*{\mathsf\Delta}{ })^{L,l}_{m} {K \Omega} (l_{K\Omega}^{-1}\mathbin{\bullet}\mu_1,l_{K\Omega}^{-1}\mathbin{\bullet}\mu_2) = {\mathsf\Delta}^{m}_{\Omega}(\mu_1,\mu_2) \quad (\mu_1,\mu_2 \in U(G \Omega)). \]

5.6 Divergences on state monads

The state monad $ T_S\triangleq S \Rightarrow (- \times S) $ with a state space S is used to represent programs that update the state. We construct divergences on $ T_S $ using divergences $ d_S $ on the state space S in several ways.

5.6.1 Lipschitz constant on states

We first consider the state monad $ T_S $ on Set. We also consider a function $ d_S \colon S^2 \to [0,\infty] $ satisfying $ d_S(s,s) = 0 $ . The following $ {\mathcal{R}}^\times $ -divergence $ {\mathsf\Delta}^{\mathsf{lip},d_S}_I(f_1,f_2) $ on $ T_SI $ measures how much the function pair $ (\pi_2\circ f_1, \pi_2\circ f_2) $ extends the distance between two states before updated. In short, $ {\mathsf\Delta}^{\mathsf{lip},d_S} $ measures the Lipschitz constant on state transformers.

Proposition 13. The family $ {\mathsf\Delta}^{\mathsf{lip},d_S}=\{{\mathsf\Delta}^{\mathsf{lip},d_S}_I\}_{I\in{\bf Set}} $ of $ {\mathcal{R}}^\times $ -divergences on $ T_SI $ defined by:

\[ {\mathsf\Delta}^{\mathsf{lip},d_S}_I (f_1, f_2) \triangleq \sup_{s_1, s_2 \in S} \frac{d_S(\pi_2(f_1(s_1)), \pi_2(f_2(s_2)))}{d_S(s_1,s_2)} \quad (f_1,f_2 \in T_S I, \text{ we suppose } 0/0 = 1) \]

is a $ \operatorname{Top} $ -relative $ {\mathcal{R}}^\times $ -divergence on $ T_S $ .

For state transformers $ f_1,f_2 \in T_S I $ , their state-updating part is given as functions $ \pi_2\circ f_1, \pi_2\circ f_2 \in S \Rightarrow S $ . When $ f_1 = f_2 = g $ , $ {\mathsf\Delta}^{\mathsf{lip},d_S}_I(g,g) $ is exactly the Lipschitz constant of $ \pi_2\circ g $ .

5.6.2 Distance between state transformers with the same inputs

Suppose that the function $ d_S $ also satisfies the triangle inequality. The following $ {\mathcal{R}}^+ $ -divergence $ {\mathsf\Delta}^{\mathsf{met},d_S}_{I}(f_1,f_2) $ on $ T_SI $ estimates the distance between updated states after the state transformers $ f_1 $ and $ f_2 $ are applied to the same input.

Proposition 14. Suppose that the function $ d_S $ also satisfy the triangle inequality. The family $ {\mathsf\Delta}^{\mathsf{met},d_S}=\{{\mathsf\Delta}^{\mathsf{met},d_S}_I\}_{I\in{\bf Set}} $ of $ {\mathcal{R}}^+ $ -divergences on $ T_SI $ defined by:

\begin{align*} {\mathsf\Delta}^{\mathsf{met},d_S}_I (f_1, f_2) &\triangleq \begin{cases} \sup_{s \in S}\! d_S(\pi_2(f_1(s)), \pi_2(f_2(s))) & \pi_1\circ f_1 = \pi_1\circ f_2~\text{and}\\ &\pi_2\circ f_1,\pi_2\circ f_2 \colon \text{nonexpansive} \\ \infty & \text{otherwise} \end{cases} \end{align*}

is an $ \operatorname{Eq} {} $ -relative $ {\mathcal{R}}^+ $ -divergence on $ T_S $ .

5.6.3 Sup-metric on the state monad on the category of generalized ultrametric spaces

The category $ \mathbf{Gum} $ of generalized ([0,1]-valued) ultrametric spacesFootnote 4 and nonexpansive functions is Cartesian closed (Rutten, Reference Rutten1996, Section 2.2). We consider the state monad $ T_S = S \Rightarrow (- \times S) $ on $ \mathbf{Gum} $ for a fixed space $ (S,d_S)\in\mathbf{Gum} $ . From the definition of exponential objects in $ \mathbf{Gum} $ , $ T_S(I,d_I) $ consists of the set of nonexpansive state transformers with the sup-metric between them. In fact, the metric part of all $ T_S(I,d_I) $ forms a divergence on $ T_S $ .

Proposition 15. The family $ \{ d_{T_S I} \colon (T_S(I,d_I))^2 \to [0,1] \}_{(I,d_I) \in \mathbf{Gum}} $ consisting of the metric part of the spaces $ T_S(I,d_I) $ , given by:

\begin{align*} d_{T_S I} (f_1, f_2) \triangleq \sup_{s \in S} \max\left( d_I(\pi_1(f_1 (s)),\pi_1(f_2 (s))), d_S(\pi_2(f_1 (s)),\pi_2 (f_2 (s))) \right) \end{align*}

forms an $ \operatorname{Eq} {} $ -relative $ ([0,1],\leq,\max,0) $ -divergence on $ T_S $ .

In the category $ \mathbf{Gum} $ , instead of $ \operatorname{Eq} $ , there is another basic endorelation $ \mathsf{Dist}_0 $ :

\[ \mathsf{Dist}_0 (I,d_I) \triangleq \{ (x_1,x_2) \mid d_I(x_1,x_2) =0 \}. \]

By modifying the divergence $ d_{T_S (-)} $ , we obtain a $ \mathsf{Dist}_0 $ -relative $ ([0,1],\leq,\max,0) $ -divergence as below:

Proposition 16. The following forms a $ \mathsf{Dist}_0 {} $ -relative $ ([0,1],\leq,\max,0) $ -divergence on $ T_S $ :

\[ {\mathsf\Delta}^{\mathsf{Dist}_0}_{(I,d_I)} (f_1, f_2) \triangleq \sup_{d_S(s_1,s_2) = 0} \max( d_S(\pi_1 (f_1 (s_1)),\pi_1(f_2 (s_2))), d_I(\pi_2(f_1 (s_1)),\pi_2(f_2 (s_2)))). \]

5.7 Combining divergence with cost

In Section 5.2, we have introduced divergences on the monad $ P({\mathbb{N}}\times -) $ modeling nondeterministic choice and cost counting. These divergences are based on the distance/subtractions of costs represented by natural numbers. In this section, we provide an alternative divergences on the combination of a general computational effect T and cost counting. The basic idea is the following: given two computations $ c_1,c_2\in T(N\times I) $ , we discard the value part of $ c_i $ by $ T\pi_1:T(N\times I)\to T(N) $ and measure their difference by the divergence assumed on T.

Let $ ({\mathbb{C}},T) $ be a CC-SM and $ {\mathsf\Delta}\in{\bf Div}(T,\operatorname{Eq} ,1,{\mathcal{Q}}) $ be a divergence and $ (N,1_N\colon 1\to N,(\star)\colon N\times N\to N) $ be a monoid object in $ {\mathbb{C}} $ for cost counting. Then the composite $ T(N \times - ) $ of the monad T and the monoid action monad $ N \times (-) $ again carries a monad structure. We now define a family $ {\mathsf C}({\mathsf\Delta}, N) = \{ {\mathsf C}({\mathsf\Delta},N)_{I} \colon (U(T(N\times I)))^2 \to {\mathcal{Q}} \}_{I \in {\mathbb{C}}} $ of $ {\mathcal{Q}} $ -divergences by:

\begin{align*} {\mathsf C}({\mathsf\Delta},N )_{I} (c_1, c_2)\triangleq \begin{cases} {\mathsf\Delta}^{}_{N} (T \pi_1 \mathbin{\bullet} c_1, T \pi_1 \mathbin{\bullet} c_2) & {\mathsf\Delta}^{}_{N \times I} (c_1, c_2) \leq {\mathsf\Delta}^{}_{N} (T \pi_1 \mathbin{\bullet} c_1, T \pi_1 \mathbin{\bullet} c_2) \\ \top_{{\mathcal{Q}}} & \text{otherwise} \end{cases}. \end{align*}

Proposition 17. The family $ {\mathsf C}({\mathsf\Delta},N) $ is an $ \operatorname{Eq} $ -relative $ {\mathcal{Q}} $ -divergence on $ T(N\times -) $ .

For example, the divergence $ {\mathsf C}(\mathsf{KL}, {{\mathbb{R}}}) $ on the composite monad $ G({\mathbb{R}} \times -) $ on Meas describes Kullback–Leibler divergence between distributions of costs in the probabilistic computations with real-valued costs. Intuitively, the side condition $ \mathsf{KL}_{{\mathbb{R}} \times I} (\mu_1, \mu_2) \leq \mathsf{KL}_{{\mathbb{R}}} (G \pi_1 \mathbin{\bullet} \mu_1, G \pi_1 \mathbin{\bullet} \mu_2) $ in the definition of $ {\mathsf C}(\mathsf{KL}, {{\mathbb{R}}}) $ means that the difference between $ \mu_1 $ and $ \mu_2 $ lies only in the costs.

5.8 Preorders on monads

To explore the generality of our framework, we look at the case where the divergence domain is $ {\mathcal{B}}=(\{0\ge 1\},1,\times) $ ; here, $ \times $ is the numerical multiplication. We identify an indexed family $ {\mathsf\Delta} = \{{\mathsf\Delta}^{}_{I} \colon (U(TI))^2 \to {\mathcal{B}} \}_{I \in {\mathbb{C}}} $ of $ {\mathcal{B}} $ -divergences and a family of adjacency relations $ \tilde {\mathsf\Delta} (1) I \triangleq \{ (c_1, c_2) \mid {\mathsf\Delta}^{}_{I} (c_1, c_2) \leq 1 \}_{I\in{\bf Set}} $ .

We point out a connection between $ \operatorname{Eq} $ -relative $ {\mathcal{B}} $ -divergences and preorders on monads studied in Katsumata and Sato (Reference Katsumata, Sato and Pfenning2013) and Sato (Reference Sato, Jacobs, Silva and Staton2014). A preorder on a monad T on Set assigns a preorder $ \sqsubseteq_I $ on TI for each $ I\in{\bf Set} $ , and this assignment satisfies:

  • (Substitutivity) For any function $ f\colon I\to TJ $ and $ c_1,c_2\in TI $ , $ c_1\sqsubseteq _Ic_2 $ implies $ f^\sharp (c_1)\sqsubseteq_J f^\sharp (c_2) $ .

  • (Congruence) For any function $ f_1,f_2\colon I\to TJ $ , if $ f_1(x)\sqsubseteq_Jf_2(x) $ holds for any $ x\in I $ , then $ f_1^\sharp (c)\sqsubseteq_Jf_2^\sharp (c) $ holds for any $ c\in TI $ .

Proposition 18. A preorder on a monad T on Set bijectively corresponds to an $ \operatorname{Eq} $ -relative $ {\mathcal{B}} $ -divergence $ {\mathsf\Delta} $ on T such that each $ \tilde{\mathsf\Delta}(1)I $ is a preorder.

For a preorder $ \sqsubseteq $ on a monad T on Set, by $ {\mathsf\Delta}^{\sqsubseteq}_{ }{} $ we mean the divergence corresponding to $ \sqsubseteq $ by Proposition 18 (in fact, we have $ \widetilde{{\mathsf\Delta}^{\sqsubseteq}_{ }{}}(1)I = {\sqsubseteq_I} $ and $ \widetilde{{\mathsf\Delta}^{\sqsubseteq}_{ }{}}(0)I = TI \times TI $ for all set I).

6. Properties of Divergences on Monads

6.1 Divergences on monads as structures in $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $

In this section, we examine divergences on monads from the view point of the monoidal structure of $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ . For any CC $ {\mathbb{C}} $ , the category $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ has a symmetric monoidal structure, whose unit and tensor product are given by:

\begin{align*} {\bf I}&\triangleq(1,\lambda{(x_1,x_2)}~.~0),\\ (I,d)\otimes(J,e)&\triangleq(I\times J,\lambda{((x_1,y_1),(x_2,y_2))}~.~d(x_1,x_2)+e(y_1,y_2)). \end{align*}

The coherence isomorphisms of this symmetric monoidal structure are inherited from the Cartesian monoidal structure on $ {\mathbb{C}} $ . Moreover, $ V_{{\mathcal{Q}},{\mathbb{C}}}:{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{{\mathbb{C}}} $ becomes a symmetric strict monoidal functor of type $ ({\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}),{\bf I},(\otimes))\to ({\mathbb{C}},1,(\times)) $ .

6.1.1 Enrichments of Kleisli categories induced by divergences

Let $ ({\mathbb{C}},T) $ be a CC-SM. We first show that a nongraded divergence on a monad T attaches a $ {\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ - enrichment on the Kleisli category $ {\mathbb{C}}_T $ of T. Attaching an enrichment to an ordinary category is formulated as follows.

Definition 19. A $ {\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ -enrichment of a category $ \mathbb D $ is a family $ \{ d_{I, J}:\mathbb D(I,J)^2\to{\mathcal{Q}} \}_{I, J \in \mathbb D} $ of $ {\mathcal{Q}} $ -divergences on the homset $ \mathbb D (I, J) $ such that the following inequalities hold

(3) \begin{align} & d_{I, I} ({\rm id}_I, {\rm id}_I) \leq 0, \label{eq:unit} \end{align}
(4) \begin{align} & d_{I, K} (g_1 \circ f_1, g_2 \circ f_2) \leq d_{J, K} (g_1, g_2) + d_{I, J} (f_1, f_2). \label{eq:comp} \end{align}

Such an enrichment determines a $ {\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ -enriched category $ \mathbb D^d $ , whose object collection and homobjects are given by:

\begin{align*} {\bf Obj}(\mathbb D^d)\triangleq {\bf Obj}(\mathbb D),\quad \mathbb D^d(I,J)\triangleq(\mathbb D (I, J),d_{I,J}). \end{align*}

The identity and composition morphisms of $ \mathbb D^d $ :

\begin{align*} j_I:{\bf I}\to\mathbb D^d(I,I),\quad m_{I,J,K}:\mathbb D^d(J,K)\otimes\mathbb D^d(I,J)\to\mathbb D^d(I,K) \end{align*}

are inherited from $ \mathbb D $ ; they are guaranteed to be nonexpansive by conditions (3) and (4).

We characterize $ {\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ -enrichments of $ \mathbb D $ as $ {\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ -enriched categories whose change-of-enriching-category along $ V_{{\mathcal{Q}},{\bf Set}} $ coincides with $ \mathbb D $ .

Proposition 20. Let $ \mathbb D $ be a category. There is a bijective correspondence between 1) a $ {\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ -enrichment $ \{d_{I,J}\}_{I,J\in\mathbb D} $ of $ \mathbb D $ and 2) a $ {\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ -enriched category $ \mathbb E $ such that the change-of-enriching-category of $ \mathbb E $ by $ V_{{\mathcal{Q}},{\bf Set}}:{\bf Div}_{{\mathcal{Q}}}({{\bf Set}})\to{{\bf Set}} $ is equal to $ \mathbb D $ .

We relate conditions (3) and (4) with the unit reflexivity and composability conditions in the definition of divergence on monad (Definition 6).

Theorem 21. Let $ ({\mathbb{C}}, T) $ be a CC-SM, $ E:{\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ be a basic endorelation such that $ R_{E 1} \neq \emptyset $ Footnote 5, $ {\mathcal{Q}} $ be a divergence domain and $ {\mathsf\Delta}=\{{\mathsf\Delta}^{}_{I}:(U(TI))^2\to{\mathcal{Q}} \}_{I\in{\mathbb{C}}} $ be a family of $ {\mathcal{Q}} $ -divergences on TI. Define a family $ d^{\mathsf\Delta}=\{ d^{\mathsf\Delta}_{I, J}:{\mathbb{C}}_T(I,J)^2\to{\mathcal{Q}} \}_{I, J \in {\mathbb{C}}} $ of $ {\mathcal{Q}} $ -divergences on the homset $ {\mathbb{C}}_T (I, J) $ of the Kleisli category $ {\mathbb{C}}_T $ by:

(5) \begin{equation} \label{eq:enrich} d^{\mathsf\Delta}_{I, J} (f_1, f_2) \triangleq \sup_{(x_1, x_2) \in E I} {\mathsf\Delta}^{}_{J} (f_1 \mathbin{\bullet} x_1, f_2 \mathbin{\bullet} x_2) . \end{equation}

Then $ d^{\mathsf\Delta} $ is a $ {\bf Div}_{{\mathcal{Q}}}({{\bf Set}}) $ -enrichment of $ {\mathbb{C}}_T $ if and only if $ {\mathsf\Delta} $ is an E-relative $ {\mathcal{Q}} $ -divergence on T.

6.1.2 Internalizing divergences in $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $

We seek a further characterization of the $ {\mathcal{Q}} $ -divergence (5) given to each homset of $ {\mathbb{C}}_T $ . Under a strengthened assumption, we relate it with the closed structure with respect to the tensor product of $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ . This allows us to internalize divergences on monads as structures in $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ .

Let $ ({\mathbb{C}},T) $ be a CCC-SM and $ {\mathcal{Q}}=(Q,\le,0,(+)) $ be a divergence domain whose monoid operation $ (+) $ preserves the largest element $ \top\in{\mathcal{Q}} $ , that is, $ x+\top=\top $ . A consequence of this strengthened assumption is the following:

Lemma 22. Let $ X\in{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ be an object such that its $ {\mathcal{Q}} $ -divergence $ d_X $ takes values in $ \{0,\top\}\subseteq{\mathcal{Q}} $ . Define a functor $ X\multimap(-):{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $

(6) \begin{align} X\multimap Y & \triangleq(V_{{\mathcal{Q}},{\mathbb{C}}} X\Rightarrow V_{{\mathcal{Q}},{\mathbb{C}}} Y,d_{X\multimap Y}) \nonumber \\ d_{X\multimap Y}(f_1,f_2) & \triangleq \sup_{x_1,x_2\in U(V_{{\mathcal{Q}},{\mathbb{C}}} X),d_X(x_1,x_2)=0}d_Y(\lfloor{f_1}\rfloor\bullet x_1,\lfloor{f_2}\rfloor\bullet x_2); \label{eq:expdiv} \end{align}

here $ \lfloor{-}\rfloor:U(I\Rightarrow J)\to{\mathbb{C}}(I,J) $ is the bijection given in Section 2. Then it is a left adjoint to the functor $ (-)\otimes X:{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ tensoring with X. Moreover, $ V_{{\mathcal{Q}},{\mathbb{C}}}:{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{{\mathbb{C}}} $ is a map of adjunction (Mac Lane, Reference Mac Lane1998, Section IV.7) in the following sense:

The $ {\mathcal{Q}} $ -divergence $ d_{X\multimap Y} $ in (6) is similar to the sup part of the composability condition in Definition 6. We exploit this similarity to express the unit reflexivity and composability conditions of divergence on monad (Definition 6) using the internal hom functor $ X\multimap(-) $ . First, we define the uncurried bind morphism $ \mathrm{ub}_{I,J}:TI\times(I\Rightarrow TJ)\rightarrow TJ $ by:

(7)

Next, for a basic endorelation $ E:{\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ , we define the functor $ :{E}^{\mathcal{Q}}{\mathbb{C}}\to{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ by:

\begin{equation*} {E}^{\mathcal{Q}}I\triangleq(I,d_{{E}^{\mathcal{Q}}I}),\quad {E}^{\mathcal{Q}}f\triangleq f,\quad\text{where}\quad d_{{E}^{\mathcal{Q}}I}(x_1,x_2)\triangleq \left\{\begin{array}{ll} 0 & (x_1,x_2)\in E \\ \top & (x_1,x_2)\not\in E. \end{array}\right. \end{equation*}

Theorem 23. Let $ ({\mathbb{C}},T) $ be a CCC-SM, $ (M,\le,1,(\cdot)) $ be a grading monoid, $ {\mathcal{Q}}=(Q,\le,0,(+)) $ be a divergence domain whose monoid operation $ (+) $ satisfies $ x+\top=\top $ , and $ E:{\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ be a basic endorelation. Let $ {\mathsf\Delta}=\{{\mathsf\Delta}^{m}_{I}:U(TI)^2\to{\mathcal{Q}}\}_{m\in M,I\in{\mathbb{C}}} $ be a doubly indexed family of $ {\mathcal{Q}} $ -divergences on TI, regarded as a family of $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ -objects. Then

  1. (1) $ {\mathsf\Delta} $ satisfies the E-unit reflexivity condition if and only if for any $ I\in{\mathbb{C}} $ , the following nonexpansivity holds on the global element $ \lceil{\eta_I}\rceil:1\to I\Rightarrow TI $ corresponding to the monad unit:

    \begin{equation*} \lceil{\eta_I}\rceil\in{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) ({\bf I}, {E}^{\mathcal{Q}}I\multimap {\mathsf\Delta}^{1}_{I}). \end{equation*}
  2. (2) $ {\mathsf\Delta} $ satisfies the E-composablity condition if and only if for any $ I,J\in{\mathbb{C}} $ and $ m,n\in M $ , the following nonexpansivity holds on the uncurried bind morphism $ \mathrm{ub}_{I,J}:TI\times(I\Rightarrow TJ)\rightarrow TJ $ :

    \begin{equation*} \mathrm{ub}_{I,J}\in{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})({\mathsf\Delta}^{m}_{I}\otimes ( {E}^{\mathcal{Q}}I\multimap{\mathsf\Delta}^{n}_{J}), {\mathsf\Delta}^{m\cdot n}_{J}). \end{equation*}

Azevedo de Amorim et al. (Reference Azevedo de Amorim, Gaboardi, Hsu and Katsumata2019) formalized families of composable divergences as parameterized assignments in weakly closed monoidal refinements. Roughly speaking, they adopted the equivalence (2) of Theorem 23 as the definition of parameterized assignment. However, divergence on monads and parameterized assignments are built on slightly different categorical foundations, and their generalities are incomparable. Notable differences from parameterized assignments are: 1) divergences on monads are defined with respect to basic endorelations, and 2) the underlying category of divergences on monads is any CCs, while parameterized assignments requires a closed structure on their underlying category.

6.1.3 Divergences on monads and divergence liftings of monads

We next relate graded divergences on monads and monad-like structures on the category $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ of $ {\mathcal{Q}} $ -divergences on $ {\mathbb{C}} $ -objects. What we mean by monad-like structures is graded divergence liftings of monads on $ {\mathbb{C}} $ , which we introduce below. It is a graded monad on $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ (Katsumata, Reference Katsumata2014; Smirnov, Reference Smirnov2008) whose unit and multiplication are inherited from a monad on $ {\mathbb{C}} $ .

Definition 24. Let $ ({\mathbb{C}}, T) $ be a CC-SM, $ (M,\le,1,(\cdot)) $ be a grading monoid and $ {\mathcal{Q}} $ be a divergence domain. An M-graded $ {\mathcal{Q}} $ -divergence lifting of T is a mapping $ \dot{T} : M \times{\bf Obj}({\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) )\rightarrow{\bf Obj}({\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})) $ such that (below V stands for the forgetful functor $ V_{{\mathcal{Q}},{\mathbb{C}}}:{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{{\mathbb{C}}} $ )

  1. (1) $ V (\dot{T} m X) = T (V X) $

  2. (2) $ m \leq n $ implies $ \dot{T} m X \preceq_{T(VX)} \dot{T} n X $

  3. (3) $ \eta_{V X} \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) (X, \dot{T} 1 X) $

  4. (4) $ \mu_{V X} \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) (\dot{T} m (\dot{T} n X), \dot{T} (m \cdot n) X) $ .

Let $ E : {\mathbb{C}} \rightarrow {\bf BRel}({\mathbb{C}}) $ be a basic endorelation. We say that an M-graded $ {\mathcal{Q}} $ -divergence lifting $ \dot{T} $ of T is E-strong if the strength $ \theta $ of T satisfies

\[ \theta_{V X, J} \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) (X \otimes \dot{T} m ( {E}^{\mathcal{Q}} J), \dot{T} m (X \otimes {E}^{\mathcal{Q}}J)) . \]

We write $ {\bf SGDLift}(T, E,M,{\mathcal{Q}}) $ for the collection of E-strong M-graded $ {\mathcal{Q}} $ -divergence liftings of T. We introduce a partial order $ \preceq $ (reusing the notation for the partial order between divergences in Definition 2) on $ {\bf SGDLift}(T, E,M,{\mathcal{Q}}) $ by:

\[ \dot{T} \preceq \dot{S} \iff \forall{m \in M, X \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})}~.~ \dot{T} m X\preceq_{T(VX)} \dot{S} m X. \]

We will later see a similar concept of strong graded relational lifting of monad in Definition 37. Divergence liftings and relational liftings are actually instances of a common general definition of strong graded lifting of monad (Katsumata, Reference Katsumata2014), but in this paper we omit this general definition.

In the following theorem, we show that every divergence on a monad can be expressed as the composite of a graded divergence lifting and the divergence corresponding to a basic endorelation.

Theorem 25. Let $ ({\mathbb{C}}, T) $ be a CC-SM, $ (M,\le,1,(\cdot)) $ be a grading monoid, $ {\mathcal{Q}} $ be a divergence domain, and $ E : {\mathbb{C}} \rightarrow {\bf BRel}({\mathbb{C}}) $ be a basic endorelation. For any $ {\mathsf\Delta} \in {\bf Div}(T, E,M,{\mathcal{Q}}) $ , define a mapping $ [{\mathsf\Delta}]:M\times{\bf Obj}({\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}))\to{\bf Obj}({\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})) $ by:

\[ [{\mathsf\Delta}] m X \triangleq (T I, d_{[{\mathsf\Delta}] m X})\quad (X=(I, d)) \]

where

\[ d_{[{\mathsf\Delta}] m X} (c_1, c_2) \triangleq \sup_{J \in {\mathbb{C}}, n \in M, f \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) (X, {\mathsf\Delta}^n_J)} {\mathsf\Delta}^{m \cdot n}_J (f^\sharp \mathbin{\bullet} c_1, f^\sharp \mathbin{\bullet} c_2) . \]

Then $ [{\mathsf\Delta}] $ is an M-graded $ {\mathcal{Q}} $ -divergence lifting such that $ {\mathsf\Delta}^m_I = [{\mathsf\Delta}] m ( {E}^{\mathcal{Q}}I) $ .

When $ M = 1 $ , Theorem 25 implies that the assignment $ I\mapsto {\mathsf\Delta}_I $ extends to the $ {E}^{\mathcal{Q}} $ - relative monad $ [{\mathsf\Delta}]\circ :{E}^{\mathcal{Q}}{\mathbb{C}}\to{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ in the sense of Altenkirch et al. (Reference Altenkirch, Chapman and Uustalu2015).

When we strengthen the assumptions on $ ({\mathbb{C}}, T) $ and $ {\mathcal{Q}} $ as done in Section 6.1.2, we obtain a sharper correspondence between divergences on monads and strong graded divergence liftings of monads.

Theorem 26. Let $ ({\mathbb{C}}, T) $ be a CCC-SM, M be a grading monoid, $ {\mathcal{Q}} $ be a divergence domain such that $ (+) $ satisfies $ x+\top=\top $ and $ E : {\mathbb{C}} \rightarrow {\bf BRel}({\mathbb{C}}) $ be a basic endorelation. Then there exists an adjunction between partial orders:

where $ \langle \dot{T}\rangle m I \triangleq \dot{T} m ( {E}^{\mathcal{Q}}I) $ .

6.2 Generation of divergences

It has been shown that DP can be interpreted as hypothesis testing (Kairouz et al., Reference Kairouz, Oh and Viswanath2015; Wasserman and Zhou, Reference Wasserman and Zhou2010). Given a query $ c\colon I\to G J $ and adjacent datasets $ (d_1,d_2) \in R_{\mathrm{adj}}\subseteq I^2 $ , we consider the following hypothesis testing with the null and alternative hypotheses:

For any rejection region $ S \in \Sigma_J $ , the Type I and Type II errors are then represented by $ \Pr[c(d_1) \in S] $ and $ \Pr[c(d_2) \notin S] $ , respectively. Kairouz et al. (Reference Kairouz, Oh and Viswanath2015) showed that c is $ (\varepsilon,\delta) $ -DP if and only if for any adjacent datasets $ (d_1,d_2) \in R_{\mathrm{adj}}\subseteq I^2 $ , the pair of Type I error and Type II error lands in the privacy region $ R(\varepsilon,\delta) $ :

\begin{equation*} \forall{S \in \Sigma_J}~.~ (\Pr[c(d_1) \in S],\Pr[c(d_2) \notin S]) \in \underbrace{\{ (x,y) \in [0,1]^2 | (1-x) \leq \exp(\varepsilon) y + \delta \}}_{\triangleq R(\varepsilon,\delta)} . \end{equation*}

They also showed that this is equivalent to the testing using probabilistic decision rules (Kairouz et al., Reference Kairouz, Oh and Viswanath2015, Corollary 2.3):

\[ \forall{k \colon J \to G \{\mathsf{Acc},\mathsf{Rej}\}}~.~ (\Pr[ k^\sharp c(d_1) = \mathsf{Acc}],\Pr[k^\sharp c(d_2) = \mathsf{Rej}]) \in R(\varepsilon,\delta). \]

Later Balle et al. (Reference Balle, Barthe, Gaboardi, Hsu, Sato, Chiappa and Calandra2020) generalized this probabilistic variant of hypothesis testing to general statistical divergences and arrived at a notion of k-generatedness of statistical divergences ( $ k \in {\mathbb{N}} \cup \{\infty\} $ ). Following their generalization, we introduce the concept of $ \Omega $ -generatedness of divergences on monads.

Definition 27. Let $ \Omega \in {\mathbb{C}} $ . A divergence $ {\mathsf\Delta}\in{\bf Div}(T, E,M,{\mathcal{Q}}) $ is $ \Omega $ -generated if for any $ m \in M $ , $ I\in{\mathbb{C}} $ and $ c_1,c_2 \in U(TI) $ ,

\begin{equation*} {\mathsf\Delta}^{m}_{I} (c_1,c_2) = \sup_{k \colon I \to T\Omega} {\mathsf\Delta}^{m}_{\Omega} (k^\sharp \mathbin{\bullet} c_1,k^\sharp \mathbin{\bullet} c_2). \end{equation*}

An equivalent definition of $ {\mathsf\Delta} \in {\bf Div}(T, E,M,{\mathcal{Q}}) $ being $ \Omega $ -generated is: the following holds for any $ m\in M,I\in{\mathbb{C}},c_1,c_2\in U(TI),v\in{\mathcal{Q}} $ :

\[ {\mathsf\Delta}^{m}_{I} (c_1,c_2) \leq v \iff \forall{k \colon I \to T\Omega}~.~ (k^\sharp\mathbin{\bullet} c_1,k^\sharp\mathbin{\bullet} c_2) \in \tilde{\mathsf\Delta} (m,v) \Omega. \]

Here, $ \tilde{\mathsf\Delta}(m,v)\Omega $ is the binary relation $ \{ (c_1, c_2) \mid {\mathsf\Delta}^{m}_{\Omega} (c_1, c_2) \leq v \} $ ; see also (14). For an $ \Omega $ -generated divergence $ {\mathsf\Delta} $ , its component $ {\mathsf\Delta}^{m}_{\Omega} $ at $ \Omega $ is an essential part that determines all components $ {\mathsf\Delta}^{m}_{I} $ of $ {\mathsf\Delta} $ . When a divergence is shown to be $ \Omega $ -generated, the calculation of the codensity lifting $ T^{[ {{\mathsf\Delta}}]} $ given in Section 7 will be simplified (Section 7.1).

We illustrate $ \Omega $ -generatedness of various divergences. First, we show the $ \Omega $ -generatedness of divergences on the Giry monad G in Tables 2 and 3.

On the sub-Giry monad $ G_{s} $ , the divergence $ \mathsf{DP} $ is 1-generated, and the total variation distance $ \mathsf{TV} $ is 2-generated.

Proposition 28. The divergence $ \mathsf{DP} \in {\bf Div}(G_{s},\operatorname{Eq}, {\mathcal{R}}^+,{\mathcal{R}}^+) $ is 1-generated.

Proposition 29. The divergence $ \mathsf{TV} \in {\bf Div}(G_{s},\operatorname{Eq}, 1,{\mathcal{R}}^+) $ is not 1-generated but 2-generated.

$ \Omega $ -Generatedness of preorders on monads. We relate $ \Omega $ -generatedness of divergences and preorders on monads studied in Katsumata and Sato (Reference Katsumata, Sato and Pfenning2013). Let T be a monad on Set and $ \Omega $ be a set. Katsumata and Sato (Reference Katsumata, Sato and Pfenning2013) introduced the concept of congruent and substitutive preorders on $ T\Omega $ as those satisfying:

  • (Substitutivity) For any function $ f\colon \Omega\to T\Omega $ and $ c_1,c_2\in T\Omega $ , $ c_1 \leq c_2 $ implies $ f^\sharp(c_1) \leq f^\sharp(c_2) $ .

  • (Congruence) For any function $ f_1,f_2\colon J\to T\Omega $ , if $ f_1(x) \leq f_2(x) $ holds for any $ x\in J $ , then $ f_1^\sharp (c) \leq f_2^\sharp (c) $ holds for any $ c\in T\Omega $ .

For instance, any component of a preorder on T at $ \Omega $ forms a congruent and substitutive preorder on $ T\Omega $ . We write $ \mathbf{CSPre}(T,\Omega) $ for the set of all congruent and substitutive preorders on $ T\Omega $ , and $ \mathbf{Pre}(T) $ for the collection of all preorders on T. Katsumata and Sato (Reference Katsumata, Sato and Pfenning2013) gave a construction $ [-]^\Omega\colon \mathbf{CSPre}(T,\Omega)\to\mathbf{Pre}(T) $ of preorders on T from congruent and substitutive preorders on $ T\Omega $ :

\[ c_1 [\leq]^\Omega_J c_2 \iff \forall{g \colon J \to T\Omega}~.~g^\sharp (c_1) \leq g^\sharp (c_2) \]

The constructed preorders on T are $ \Omega $ -generated in the following sense:

Proposition 30. For any $ {\leq} \in \mathbf{CSPre}(T,\Omega) $ , the $ {\mathcal{B}} $ -divergence $ {\mathsf\Delta}^{[\leq]^\Omega}_{ }{} $ corresponding to the preorder $ {[\leq]}^\Omega $ on T is $ \Omega $ -generated (see Proposition 18 for the correspondence).

Applying this proposition, we can determine $ \Omega $ -generatedness of preorders on monads:

6.3 An adjunction between quantitative equational theories and divergences

Mardare et al. (Reference Mardare, Panangaden and Plotkin2016) introduced a concept of quantitative equational theory as an algebraic presentation of monads on the category of pseudometric spaces. A quantitative equational theory is an equational theory with indexed equations $ {t}=_{\varepsilon}{u} $ having the axioms of pseudometric spaces, plus suitable axioms reflecting properties of quantitative algebras. A quantitative equational theory determines a pseudometric on the set of $ \Sigma $ -terms.

Consider a set $ \Sigma $ of function symbols of finite arity. If n is the arity of a function $ f \in \Sigma $ , we write $ f \colon n \in \Sigma $ . Let $ \Omega $ be a set of variables, and let $ T_\Sigma \Omega $ be the $ \Sigma $ -term algebra over $ \Omega $ . For $ f \colon n \in \Sigma $ and $ t_1,\ldots,t_n \in T_\Sigma \Omega $ , we write $ f(t_1,\ldots,t_n) $ for the term obtained by applying f to $ t_1,\ldots,t_n $ . The construction $ \Omega \mapsto T_\Sigma \Omega $ forms a (strong) monad on Set whose unit is given by $ \eta_\Omega (x) = x $ , and whose Kleisli extension $ h^\sharp \colon T_\Sigma I \to T_\Sigma \Omega $ of function $ h \colon I \to T_\Sigma \Omega $ is given inductively by:

\[ h^\sharp (x) \triangleq h(x), \quad h^\sharp (f(t_1,\ldots,t_n)) \triangleq f(h^\sharp(t_1),\ldots,h^\sharp(t_n)). \]

A substitution of $ \Sigma $ -terms over $ \Omega $ is a function $ \sigma \colon \Omega \to T_\Sigma \Omega $ . For $ t \in T_\Sigma \Omega $ , we call $ \sigma^\sharp(t) $ the substitution of $ \sigma $ to t. We define the set of indexed equations of terms by:

\[ \mathbb V (T_\Sigma \Omega ) \triangleq \{ {t}=_{\varepsilon}{u} \mid t,u \in T_\Sigma \Omega , \varepsilon \in \mathbb Q^+ \}. \]

Here, the index $ \varepsilon $ runs over nonnegative rational numbers. A conditional quantitative equation is a judgment of the following form:

\[ \{{t_i}=_{\varepsilon_i}{u_i} \mid i \in I\} \vdash {t}=_{\varepsilon}{u} \qquad ( I \colon \text{countable}, {t_i}=_{\varepsilon_i}{u_i}, {t}=_{\varepsilon}{u} \in \mathbb V (T_\Sigma \Omega )); \]

the left-hand side of turnstile ( $ \vdash $ ) is called hypothesis and the right-hand side conclusion. By $ \mathbb E(T_\Sigma \Omega ) $ , we mean the set of conditional quantitative equations. For any countable subset $ \Gamma $ of $ \mathbb V (T_\Sigma \Omega ) $ and any substitution $ \sigma \colon \Omega \to T_\Sigma \Omega $ , we define $ \sigma(\Gamma) \triangleq \{ {\sigma^\sharp(t)}=_{\varepsilon} {\sigma^\sharp(u)}{ } \mid {t}=_{\varepsilon}{u} \in \Gamma \} $ .

Definition 31. Quantitative Equational Theory (Mardare et al., Reference Mardare, Panangaden and Plotkin2016, Definition 2.1). A quantitative equational theory (QET for short) of type $ \Sigma $ over $ \Omega $ is a set $ U \subseteq \mathbb E(T_\Sigma \Omega ) $ closed under the rules summarized as Figure 1. We write $ {\bf QET}(\Sigma ,\Omega ) $ for the set of QETs of type $ \Sigma $ over $ \Omega $ . We regard it as a poset $ ({\bf QET}(\Sigma ,\Omega ),\subseteq) $ by the set inclusion order. Given a set $ U_0 $ of conditional quantitative equations of type $ \Sigma $ over $ \Omega $ , by $ \overline{U_0}^{\mathrm{QET}({\Sigma},{ })} \Omega $ we mean the least QET of type $ \Sigma $ over $ \Omega $ including $ U_0 $ .

Figure 1. Quantitative equational theory rules.

The goal of this section is to establish an adjunction between the poset of quantitative equational theories and the poset of pseudometricsFootnote 6 on free-algebra monads on Set. More specifically, we construct the following adjunction and isomorphism between posets:

(8)

which are subsequently defined (Definition 32 for CSPMet, Definition 33 for PMet and equations (9)–(12) for morphisms). The poset in the middle is that of congruent and substitutive pseudometrics, which are a quantitative analog of congruent and substitutive preorders appeared in Section 6.2.

Definition 32. Let T be a monad on Set and $ \Omega \in {\bf Set} $ . A congruent and substitutive pseudometric (CS-PMet for short) on $ T\Omega $ is a pseudometric $ d \colon (T\Omega )^2 \to {\mathcal{R}}^+ $ on $ T\Omega $ satisfying

  • (Substitutivity) For all functions $ f\colon \Omega \to T\Omega $ and $ c_1,c_2\in T\Omega $ , $ d(f^\sharp(c_1),f^\sharp(c_2)) \leq d(c_1,c_2) $ .

  • (Congruence) For all sets I, functions $ f_1,f_2 \colon I\to T\Omega $ and $ c \in TI $ , $ d(f_1^\sharp(c),f_2^\sharp(c)) \leq \sup_{i \in I} d(f_1(i),f_2(i)) $ .

By $ {\bf CSPMet}(T, \Omega ) $ , we mean the set of CS-PMets on $ T\Omega $ . We then make it into a poset $ ({\bf CSPMet}(T, \Omega ),\preceq) $ where $ \preceq $ is the restriction of the partial order $ \preceq_{T\Omega} $ in Definition 2 to CS-PMets.

Definition 33. Let T be a monad on Set and $ \Omega \in {\bf Set} $ . By $ {\bf PMet}(T,\Omega ) $ we mean the collection of $ \Omega $ -generated $ \operatorname{Eq} $ -relative $ {\mathcal{R}}^+ $ -divergences $ {\mathsf\Delta} $ on T such that each component $ {\mathsf\Delta}^{}_{I} $ is a pseudometric. We restrict the partial order $ \preceq $ on $ {\bf Div}(T,\operatorname{Eq},1,{\mathcal{R}}^+) $ to $ {\bf PMet}(T,\Omega ) $ .

We next introduce monotone functions appearing in (8):

(9) \begin{align} D[U] (t,u) &\triangleq \inf \left\{ \varepsilon \in \mathbb Q^+ \mathrel{}\middle|\mathrel{} \emptyset \vdash {t}=_{\varepsilon}{u} \in U \right\} \label{eq:d} \end{align}
(10) \begin{align} U[d] &\triangleq \overline{\left\{\emptyset \vdash {t}=_{\varepsilon}{u } \mathrel{}\middle|\mathrel{} \varepsilon \in \mathbb Q^+, d (t,u) \leq \varepsilon \right\}}^{\mathrm{QET}({\Sigma},{ })} \Omega \end{align}
(11)
(12) \begin{align} ({\mathsf\Delta})_\Omega & \triangleq {\mathsf\Delta}_\Omega \label{eq:a} \end{align}

Proposition 34. The functions $ D[-],U[-],\mathrm{Gen},(-)_\Omega $ defined as (9)-(12) are all well-defined monotone functions having types given in (8).

That D[U] is a pseudometric is shown in the beginning of Mardare et al. (Reference Mardare, Panangaden and Plotkin2016, Section 5). Here, we additionally show that it enjoys congruence and substitutivity of Definition 32. The function Gen is taken from the right-hand side of the definition of $ \Omega $ -generatedness (Definition 27). The function $ (-)_\Omega $ simply extracts the $ \Omega $ component of a given divergence.

Theorem 35. For any set $ \Sigma $ of function symbols with finite arity and set $ \Omega $ , the following holds for the monotone functions in (8):

  1. (1) Gen is the inverse of $ (-)_\Omega $ .

  2. (2) We have an adjunction $ U[-]\dashv D[-] $ satisfying $ D[U[-]] = {\rm id} $ :

    (13)

Intuitively, the right adjoint $ D[-] $ extracts the pseudometric on $ T_\Sigma \Omega $ from a given QET. The left adjoint $ U[-] $ constructs the least QET containing all information of a given pseudometric on $ T_\Sigma \Omega $ . The range of $ U[-] $ is characterized as the set of unconditional QETs of type $ \Sigma $ over $ \Omega $ defined below (see also Mardare et al. (Reference Mardare, Panangaden and Plotkin2017, Section 3)):

\[ \mathbf{UQET}(\Sigma ,\Omega ) \triangleq \left\{ \overline{S}^{\mathrm{QET}({\Sigma},\Omega)} \mathrel{}\middle|\mathrel{} S \subseteq \{ \emptyset \vdash{t}=_{\varepsilon}{u} \mid t,u \in T_\Sigma \Omega , \varepsilon \in \mathbb Q^+ \} \right\}. \]

Therefore, the adjunction (13) cuts down to the following isomorphism between posets, stating that unconditional QETs of type $ \Sigma $ over $ \Omega $ are equivalent to $ \Omega $ -generated pseudometrics on $ T_\Sigma $ :

Theorem 36. $ (\mathbf{UQET}(\Sigma ,\Omega ),\subseteq) \cong({\bf CSPMet}(T_\Sigma , \Omega ),\preceq) \cong({\bf PMet}(T_\Sigma ,\Omega ),\preceq) $ .

7 Graded Strong Relational Liftings for Divergences

We have introduced the concept of divergence on monad for measuring quantitative difference between two computational effects. To integrate this concept with relational program logic, we employ a semantic structure called graded strong relational lifting of monad. It is introduced for the semantics of approximate probabilistic relational Hoare logic for the verification of differential privacy (Barthe et al., Reference Barthe, Köpf, Olmedo and Béguelin2012), then later used in various program logics (Barthe et al., Reference Barthe, Gaboardi, Arias, Hsu, Kunz and Strub2014, Reference Barthe, Gaboardi, Arias, Hsu, Roth and Strub2015; Barthe and Olmedo, Reference Barthe and Olmedo2013; Sato, Reference Sato2016; Sato et al., Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019). Independently, it is also introduced as a semantic structure for effect system (Katsumata, Reference Katsumata2014). Liftings introduced in the study of differential privacy are designed to satisfy a special property called fundamental property (Barthe et al., Reference Barthe, Köpf, Olmedo and Béguelin2012, Theorem 1): when we supply the equivalence relation to the lifting, it returns the adjacency relation of the divergence. This special property is the key to express the differential privacy of probabilistic programs in relational program logics.

In this paper, we present a general construction of graded strong relational liftings from divergences on monads. First, we recall its definition (Gaboardi et al., Reference Gaboardi, Katsumata, Orchard, Sato and Yoshida2021; Katsumata, Reference Katsumata2014).

Definition 37. Let $ ({\mathbb{C}},T) $ be a CC-SM and $ (M, \leq, 1, (\cdot)) $ be a grading monoid. An M-graded strong relational lifting $ \dot{T} $ of T is a mapping $ \dot{T} : M \times {\bf Obj}({\bf BRel}({\mathbb{C}})) \rightarrow {\bf Obj}({\bf BRel}({\mathbb{C}})) $ satisfying the following conditions:

  1. (1) $ p_{{\mathbb{C}}}(\dot{T} m X)=(T X_1, T X_2) $ , and $ m \leq m' $ implies $ \dot{T} m X \leq \dot{T} m' X $ .

  2. (2) $ (\eta_{X_1}, \eta_{X_2}) : X \mathbin{\dot\rightarrow} \dot{T} 1 (X) $ .

  3. (3) $ (f_1, f_2) : X \mathbin{\dot\rightarrow} \dot{T} m (Y) $ implies $ (f^\sharp_1, f^\sharp_2) : \dot{T} m' X \mathbin{\dot\rightarrow} \dot{T} (m \cdot m') Y $ .

  4. (4) $ (\theta_{X_1, Y_1}, \theta_{X_2, Y_2}) : X \mathbin{\dot\times} \dot{T} m Y \mathbin{\dot\rightarrow} \dot{T} m (X \mathbin{\dot\times} Y) $ .

Our interest is in the graded strong relational lifting that carries the information of a given divergence $ {\mathsf\Delta}\in{\bf Div}(T, E,M,{\mathcal{Q}}) $ . We formulate such liftings by the following fundamental property. First, we define the adjacency relation of $ {\mathsf\Delta} $ by:

(14) \begin{equation}\label{eq:divergence:adjacency} \tilde{{\mathsf\Delta}} (m, v) I \triangleq (T I, T I, \{ (c_1, c_2) \mid {\mathsf\Delta}^{m}_{I} (c_1, c_2) \leq v \}) \quad (m \in M, v \in {\mathcal{Q}} ,I \in {\mathbb{C}}). \end{equation}

Remark that $ \tilde{\mathsf\Delta} $ is monotone on m and v.

Definition 38. We say that an $ M\times {\mathcal{Q}} $ -graded strong relational lifting $ \dot T $ of T satisfies the fundamental property with respect to $ {\mathsf\Delta}\in{\bf Div}(T, E,M,{\mathcal{Q}}) $ if the following holds

\begin{equation*} \dot T (m, v) (E I) = \tilde{\mathsf\Delta} (m, v) I\quad (m\in M,v\in {\mathcal{Q}},I\in{\mathbb{C}}). \end{equation*}

Theorem 39. Let $ ({\mathbb{C}},T) $ be a CC-SM, $ (M, \leq, 1, (\cdot)) $ be a grading monoid, $ {\mathcal{Q}}=(Q,\le,0,(+)) $ be a divergence domain and $ {\mathsf\Delta} = \{{\mathsf\Delta}^{m}_{I} \colon (U(TI))^2\to {\mathcal{Q}} \}_{m\in M,I\in{\mathbb{C}}} $ be a doubly indexed family of $ {\mathcal{Q}} $ -divergences on TI satisfying monotonicity on m (Definition 6). Define the following mapping $ T^{[{\mathsf\Delta}]}:(M\times{\mathcal{Q}})\times{\bf Obj}({\bf BRel}({\mathbb{C}}))\to{\bf Obj}({\bf BRel}({\mathbb{C}})) $ :

\begin{align*} T^{[{\mathsf\Delta}]} (m,v) X\triangleq (TX_1,TX_2,\{(c_1,c_2) \mid &\forall{I\in{\mathbb{C}}, n\in M, w\in{\mathcal{Q}}, (k_1, k_2) : X\mathbin{\dot\rightarrow} \tilde{\mathsf\Delta} (n, w) I}~.~ \\ &\quad (k_1^\sharp \mathbin{\bullet} c_1, k_2^\sharp \mathbin{\bullet} c_2) \in \tilde{\mathsf\Delta} (m \cdot n, v + w) I\}) \end{align*}

  1. (1) The mapping $ T^{[{\mathsf\Delta}]} $ is an $ M \times {\mathcal{Q}} $ -graded strong relational lifting of T.

  2. (2) Let $ E:{\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ be a basic endorelation. Then,

    (S)
    (C)

Intuitively, $ T^{[ \mathsf\Delta]} $ is a graded version of the codensity lifting (Katsumata et al., Reference Katsumata, Sato and Uustalu2018) of T along the specific fibration $ p_{{\mathbb{C}}}:{\bf BRel}({\mathbb{C}})\to{{\mathbb{C}}}^2 $ . We extend the codensity lifting with the grading mechanism in the same way as the graded $ \top\top $ -lifting in Katsumata (Reference Katsumata2014). The graded codensity lifting is also a generalization of the graded relational lifting for DP given in Sato (Reference Sato2016).

Proof. (Proof of (1)) Proving conditions 1–3 of graded strong relational lifting (Definition 37) are routine generalization of Katsumata et al. (Reference Katsumata, Sato and Uustalu2018) and Katsumata (Reference Katsumata2014, Section 5); thus omitted here (see Lemma 50 in appendix).

Condition 4 of Definition 37 needs a special attention because in general codensity lifting does not automatically lift strength. The current setting works because of our particular choice of the category of binary relations over $ {\mathbb{C}} $ . We prove condition 4 as follows. Since $ f_i \mathbin{\bullet} j = f \mathbin{\bullet} \langle i, j \rangle $ holds for any $ j\in UJ $ , we have the equivalence:

\begin{align*} (f, g) : X \mathbin{\dot\times} Y\mathbin{\dot\rightarrow} Z & \iff \forall (x, x') \in X, (y, y') \in Y . (f\mathbin{\bullet}{\langle x, y\rangle}, g\mathbin{\bullet}{\langle x', y'\rangle}) \in Z\\ & \iff \forall (x, x') \in X, (y, y') \in Y . \left( \left( f_{x} \right)\mathbin{\bullet}{y}, \left( g_{x'} \right)\mathbin{\bullet}{y'} \right) \in Z\\ & \iff \forall (x, x') \in X. (f_{x}, g_{x'}) \colon Y\mathbin{\dot\rightarrow} Z. \end{align*}

From this, condition 3 (law of graded Kleisli extension), and the equation (1) on the strength of a CC-SM, we prove condition 4 from condition 2 (unit law): for all $ m \in M $ and $ v \in {\mathcal{Q}} $ , we have

\begin{align*} & (\eta_{X_1\times Y_1}, \eta_{X_2\times Y_2}) \colon X\mathbin{\dot\times} Y\mathbin{\dot\rightarrow} T^{[{\mathsf\Delta}]} (1,0) (X \mathbin{\dot\times} Y)\\ & \iff \forall{(x, x') \in X}~.~ ((\eta_{X_1\times Y_1})_x, (\eta_{X_2\times Y_2})_{x'}) \colon Y\mathbin{\dot\rightarrow} T^{[{\mathsf\Delta}]} (1,0) (X \mathbin{\dot\times} Y)\\ & \implies \forall{(x, x') \in X}~.~ (((\eta_{X_1\times Y_1})_x)^\sharp, ((\eta_{X_2\times Y_2})_{x'})^\sharp) : T^{[{\mathsf\Delta}]} (m,v) Y\mathbin{\dot\rightarrow} T^{[{\mathsf\Delta}]} (m,v) (X \mathbin{\dot\times} Y)\\ &\iff \left(\begin{aligned} &\forall{(x, x') \in X, (c_1,c_2) \in T^{[{\mathsf\Delta}]} (m,v) Y}~.~\\ &\qquad (((\eta_{X_1\times Y_1})_x)^\sharp\mathbin{\bullet} c_1, ((\eta_{X_2\times Y_2})_{x'})^\sharp \mathbin{\bullet} c_2) \in T^{[{\mathsf\Delta}]} (m,v) (X \mathbin{\dot\times} Y) \end{aligned}\right)\\ &\iff \left(\begin{aligned} &\forall{(x, x') \in X, (c_1,c_2) \in T^{[{\mathsf\Delta}]} (m,v) Y}~.~ \\ &\qquad (\theta_{X_1, Y_1}\mathbin{\bullet} \langle x, c_1\rangle, \theta_{X_2, Y_2}\mathbin{\bullet}\langle x', c_2\rangle)\in T^{[{\mathsf\Delta}]} (m,v) (X \mathbin{\dot\times} Y) \end{aligned}\right)\\ & \iff \forall{(x,x') \in X}~.~ ((\theta_{X_1, Y_1})_x, (\theta_{X_2, Y_2})_{x'})\colon T^{[{\mathsf\Delta}]} (m,v) Y \mathbin{\dot\rightarrow} T^{[{\mathsf\Delta}]} (m,v) (X \mathbin{\dot\times} Y)\\ & \iff (\theta_{X_1, Y_1}, \theta_{X_2, Y_2}) \colon X \mathbin{\dot\times} T^{[{\mathsf\Delta}]} (m,v) Y\mathbin{\dot\rightarrow} T^{[{\mathsf\Delta}]} (m,v) (X\mathbin{\dot\times} Y). \end{align*}

(Proof of (2)-(S)) We show the equivalence of $ {\mathsf\Delta} $ being E-unit-reflexive and the implication:

(15) \begin{align} & \forall{I \in {\mathbb{C}},m\in M,v\in {\mathcal{Q}} ,c,c'\in U(TI)}~.~ \nonumber \\ &\qquad( \forall{J \in {\mathbb{C}},m'\in M,v'\in {\mathcal{Q}} ,(k,l) : EI\mathbin{\dot\rightarrow}\tilde{\mathsf\Delta} (m',v') J}~.~ {\mathsf\Delta}^{m\cdot m'}_{J}(k^\sharp\mathbin{\bullet} c,l^\sharp\mathbin{\bullet}{c'})\le v+v') \label{eq:middle} \\ &\qquad\qquad \implies {\mathsf\Delta}^{m}_{I}(c,c')\le v. \nonumber \end{align}

We suppose that the above implication holds. We fix $ I \in {\mathbb{C}} $ . Let $ (i,j) \in EI $ . By instantiating the whole implication with $ m=1,v=0,c=\eta_I\mathbin{\bullet} i,c'=\eta_I\mathbin{\bullet} j $ , the middle part of (15) becomes

\begin{equation*} \forall{J \in {\mathbb{C}},m'\in M,v'\in {\mathcal{Q}} ,(k,l) : EI\mathbin{\dot\rightarrow}\tilde{\mathsf\Delta} (m',v')J}~.~ {\mathsf\Delta}^{m'}_{J}(k\mathbin{\bullet} i,l\mathbin{\bullet} j)\le v', \end{equation*}

which is trivially true. Therefore, we conclude $ {\mathsf\Delta}^{m}_{I}(\eta_I\mathbin{\bullet} i,\eta_I\mathbin{\bullet} j)\le 0 $ for any $ (i,j) \in EI $ , that is, E-unit reflexivity holds.

Conversely, we suppose that $ {\mathsf\Delta} $ satisfies the unit reflexivity. We take I,m,v,c,c’ of appropriate type and assume the middle part of (15). By instantiating it with $ J=I,m'=1,v'=0,k=l=\eta_I $ , we conclude $ {\mathsf\Delta}^{m}_{I}(c,c')\le v $ .

(Proof of (2)-(C)) We show the equivalence of $ {\mathsf\Delta} $ being E-composable and the implication $ \forall{I \in {\mathbb{C}},m\in M,v\in {\mathcal{Q}}}~.~\tilde{\mathsf\Delta} (m,v) I\le T^{[ {\mathsf\Delta}]} (m,v)(EI) $ as follows:

The first two equivalences are obtained by expanding the definitions of $ {\bf BRel}({\mathbb{C}}) $ , $ T^{[ {\mathsf\Delta}]} {} $ , and $ \tilde{\mathsf\Delta} $ , and the last two equivalences hold because $ {\mathcal{Q}} $ is a divergence domain.

The construction of $ T^{[ {\mathsf\Delta}]} {} $ gives the greatest relational lifting of T with the fundamental property.

Proposition 40. Let $ ({\mathbb{C}},T) $ be a CC-SM, $ E:{\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ be a basic endorelation, $ (M, \leq, 1, (\cdot)) $ be a grading monoid, $ {\mathcal{Q}} $ be a divergence domain, $ {\mathsf\Delta}\in{\bf Div}(T, E,M,{\mathcal{Q}}) $ be a divergence, and $ \dot T $ be an $ M \times {\mathcal{Q}} $ -graded relational lifting satisfying the fundamental property with respect to $ {\mathsf\Delta} $ . Then, for all $ m \in M $ , $ q \in {\mathcal{Q}} $ , and $ X \in {\bf BRel}({\mathbb{C}}) $ , the following inequality holds

\[ \dot T (m,v) X \leq T^{[ {\mathsf\Delta}]} {} (m,v) X. \]

7.1 Simplifying codensity liftings by $ \Omega $ -generatedness of divergences

We show that the calculation of the codensity lifting $ T^{[ {\mathsf\Delta}]} {} $ can be simplified when $ {\mathsf\Delta} $ is $ \Omega $ -generated. For an object $ I\in{\mathbb{C}} $ , we define $ T^{[ {\mathsf\Delta}],I} $ by:

\begin{align*} &(c_1,c_2) \in T^{[ {\mathsf\Delta}],I} (m,v) X\\ &\iff \forall{n, w, (k_1, k_2) \colon X \mathbin{\dot\rightarrow} \tilde{\mathsf\Delta} (n, w) I}~.~ (k_1^\sharp \mathbin{\bullet} c_1, k_2^\sharp \mathbin{\bullet} c_2) \in \tilde{\mathsf\Delta} (m \cdot n, v + w) I. \end{align*}

The original calculation of $ T^{[{\mathsf\Delta} ]} {} $ is a large intersection $ T^{[ {\mathsf\Delta}]} = \bigwedge_{I\in {\mathbb{C}}} T^{[{\mathsf\Delta} ], I} $ where I runs over all $ {\mathbb{C}} $ -objects, while if $ {\mathsf\Delta} $ is $ \Omega $ -generated, the parameter I can be fixed to $ \Omega $ .

Proposition 41. For any $ \Omega $ -generated divergence $ {\mathsf\Delta}\in{\bf Div}(T, E,M,{\mathcal{Q}}) $ , we have $ T^{[ {\mathsf\Delta}]} {} = T^{[{\mathsf\Delta}], \Omega} $ .

Proof. We show the equivalence $ T^{[{\mathsf\Delta} ]} X = T^{[{\mathsf\Delta}], \Omega} X $ for each $ X \in {\bf BRel}({\mathbb{C}}) $ .

( $ \leq $ ) Immediate from $ T^{[{\mathsf\Delta} ]} = \bigwedge_{I\in {\mathbb{C}}} T^{[ {\mathsf\Delta}], I} $ .

( $ \geq $ ) By the $ \Omega $ -generatedness of $ {\mathsf\Delta} $ , for all $ I \in {\mathbb{C}} $ and $ c'_1,c'_2 \in U(TI) $ , we have

\[ (c'_1,c'_2) \in \tilde{\mathsf\Delta} (m', v') I \iff \forall{k \colon I \to T\Omega}~.~(k^\sharp \mathbin{\bullet} c'_1,k^\sharp \mathbin{\bullet} c'_2) \in \tilde{\mathsf\Delta} (m', v') \Omega \]

Therefore, for any $ (c_2,c_2) \in U(TX_1) \times U(TX_2) $ , we have

This completes the proof.

For example, the generatedness of $ \mathsf{DP} $ shown in Section 6.2 implies that $ G^{[ {\mathsf{DP}}]} = G^{[\mathsf{DP}], 2} $ and $ G_{s}^{[ {\mathsf{DP}}]} = G_{s}^{[ \mathsf{DP}], 1} $ . In fact, the simplification $ G_{s}^{[ \mathsf{DP}, 1]} $ is equal to the $ ({\mathcal{R}}^+)^2 $ -graded relational lifting $ G_{s}^{\top\top} $ for DP given in Sato (Reference Sato2016, Section 2.2), which is defined by, for each $ (X_1,X_2,R_X) \in {\bf BRel}({\bf Meas}) $ :

\begin{align*} &G_{s}^{{\top\top}}(\varepsilon,\delta)(X_1,X_2,R_X)\\ &\triangleq (G_{s}(X_1),G_{s}(X_2),\{(\nu_1,\nu_2) \mid \forall{A \in \Sigma_{X_1}, B \in \Sigma_{X_2}}~.~ R_X(A) \subseteq B \implies \nu_1(A) \leq \exp(\varepsilon) \nu_2(B) +\delta\}). \end{align*}

For detail, see the proof of equalities (†) and (‡) in the proof of Theorem 2.2 (iv) in Sato (Reference Sato2016).

7.2 Two lifting approaches: Codensity and coupling

We briefly compare two lifting approaches: graded codensity lifting and coupling-based lifting, and the latter of which is employed in Barthe et al. (Reference Barthe, Köpf, Olmedo and Béguelin2012), Barthe and Olmedo (Reference Barthe and Olmedo2013), Barthe et al. (Reference Barthe, Gaboardi, Arias, Hsu, Kunz and Strub2014), Barthe et al. (Reference Barthe, Gaboardi, Arias, Hsu, Roth and Strub2015), and Sato et al. (Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019).

We compare the role of the unit reflexivity and composability in each lifting approaches. Consider the CCC-SM $ ({\bf Set},D) $ , where D is the probability distribution monad. Given an $ \operatorname{Eq} $ -relative M-graded $ {\mathcal{Q}} $ -divergence $ {\mathsf\Delta} $ on D, the coupling-based graded lifting is defined by:

(16) \begin{equation} \label{eq:coupling} \dot{D}^{{\mathsf\Delta}} (m,v) X \triangleq (D X_1,D X_2,\{ (D p_1 \mathbin{\bullet} \mu_1,D p_2 \mathbin{\bullet} \mu_2) \mid (\mu_1,\mu_2) \in (D R_X)^2, {\mathsf\Delta}^{m}_{R_X} (\mu_1,\mu_2) \leq v \} ) \end{equation}

where $ p_i\colon R_X\rightarrow X_i $ is the projection ( $ i=1,2 $ ) from the binary relation. The pair $ (\mu_1,\mu_2) $ of probability distributions collected in the right-hand side of (16) is called a coupling.

The fundamental property $ \dot D^{\mathsf\Delta}(\operatorname{Eq} I)=\tilde{\mathsf\Delta}(m,v)I $ immediately follows from the definition of $ \dot D^{\mathsf\Delta} $ , while the composability and unit reflexivity of $ {\mathsf\Delta} $ are used to make $ \dot D^{{\mathsf\Delta}} $ a strong $ M\times {\mathcal{Q}} $ -graded lifting (Barthe and Olmedo, Reference Barthe and Olmedo2013, Proposition 9). On the other hand, the codensity graded lifting $ D^{[ {{\mathsf\Delta}}]} $ is always an $ M\times {\mathcal{Q}} $ -graded lifting; this does not rely on the unit reflexivity and composability of $ {\mathsf\Delta} $ (Proposition 1). These properties are used to show that $ D^{[ {{\mathsf\Delta}}]} $ satisfies the fundamental property (Proposition 2).

The coupling-based lifting (16) can be naturally generalized to any Set-monad T. However, at this momen, we do not know how to generalize the coupling technique to any CC-SM $ ({\mathbb{C}},T) $ . As the prior study by Sato et al. (Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019) pointed out, there is already a difficulty in extending it to the CC-SM $ ({\bf Meas},G) $ , where G is the Giry monad.

We illustrate how the problem arises. Let $ X \in {\bf BRel}({\bf Meas}) $ . We would like to pick two probability measures over $ R_X $ as couplings, but $ R_X $ is merely a set. We therefore equip it with the subspace $ \sigma $ -algebra of $ X_1\times X_2 $ , and let $ H_X $ be the derived measurable space (hence $ |H_X|=R_X $ ). We write $ p_i:H_X \rightarrow X_i $ for measurable projections ( $ i=1,2 $ ). We then define a candidate $ M\times {\mathcal{Q}} $ -graded lifting of G by:

\[ \dot{G} (m,v) X = (G X_1,G X_2,\{ (G p_1 \mathbin{\bullet} \mu_1,G p_2 \mathbin{\bullet} \mu_2) \mid (\mu_1,\mu_2) \in (U(G H_X))^2, {\mathsf\Delta}^{m}_{H_X} (\mu_1,\mu_2) \leq v \}). \]

We now verify that $ \dot{G} $ also lifts the Kleisli extension of G, that is,

\begin{equation*} (f, g)\colon Y\mathbin{\dot\rightarrow} \dot{G} (m',v') X \implies (f^\sharp, g^\sharp) \colon \dot{G} (m,v) Y\to \dot{G} (mm', v+v') X. \end{equation*}

Let $ (f, g)\colon Y\mathbin{\dot\rightarrow} \dot{G} (m',v') X $ be pair of measurable functions. Then for each $ (x,y) \in R_Y $ , we have $ (f \mathbin{\bullet} x, g \mathbin{\bullet} y) \in R_{\dot G(m,v) X} $ . Therefore, there exists $ (\mu^{(x,y)}_1,\mu^{(x,y)}_2) \in (UG H_X)^2 $ such that $ G\pi_1 \mathbin{\bullet} \mu^{(x,y)}_1 = f \mathbin{\bullet} x $ and $ G\pi_2 \mathbin{\bullet} \mu^{(x,y)}_2 = g \mathbin{\bullet} y $ . Using the axiom of choice, we turn this relationship into functions $ \mu_1,\mu_2\colon R_Y \to UG H_X $ . If they were measurable functions of type $ H_Y \rightarrow G H_X $ , then from the composability of $ {\mathsf\Delta} $ , we would have $ {\mathsf\Delta}^{mm'}_{H_X}(\mu^\sharp_1 \mathbin{\bullet} w_1, \mu^\sharp_2 \mathbin{\bullet} w_2) \leq v + v' $ for $ w_1, w_2 \in U(G H_Y) $ such that $ {\mathsf\Delta}^{m'}_{H_Y}(w_1, w_2) \leq v' $ . This gives $ (f^\sharp, g^\sharp) \colon \dot{G} (m,v) Y\mathbin{\dot\rightarrow} \dot{G} (mm', v+v') X $ . However, in general, ensuring the measurability of $ \mu_1,\mu_2 $ is not possible, especially because they are picked up by the axiom of choice. A solution given in Sato et al. (Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019) is to use the category $ {\bf Span}({\bf Meas}) $ of spans that guarantees the existence of good measurable functions $ h_1, h_2 \colon H_Y \rightarrow G H_X $ .

8. Approximate Computational Relational Logic

We introduce a program logic called approximate computational relational logic (acRL for short). It is a combination of Moggi’s computational metalanguage and a relational refinement type system (Barthe et al., Reference Barthe, Gaboardi, Arias, Hsu, Roth and Strub2015). The strong graded relational lifting of a monad constructed from a divergence will be used to relationally interpret monadic types, and gradings give upper bounds of divergences between computational effects caused by two programs. acRL is similar to the relational refinement type system HOARe2 (Barthe et al., Reference Barthe, Gaboardi, Arias, Hsu, Roth and Strub2015), which is designed for verifying differential privacy of probabilistic programs. Compared to HOARe2, acRL supports general monads and divergences, while it does not support dependent products nor nontermination.

The relational logic acRL adopts the extensional approach (Nielson and Nielson, Reference Nielson and Nielson2007, Chapter 9.2):

  • Relational assertions between contexts $ \Gamma $ and $ \Delta $ are defined as binary relations between $ U[\![ \Gamma]\!] $ and $ U[\![ \Delta]\!] $ , or equivalently $ {\bf BRel}({\mathbb{C}}) $ -objects $ \phi $ such that $ p_{{\mathbb{C}}}(\phi)=([\![ \Gamma]\!],[\![ \Delta]\!]) $ . Logical connectives and quantifications are defined as operations on such $ {\bf BRel}({\mathbb{C}}) $ -objects. This is in contrast to the standard design of logic where assertions are defined by a BNF.

  • Let $ \Gamma \vdash M : \tau $ and $ \Delta \vdash N : \sigma $ be well-typed terms, $ \phi $ be a relational assertion between $ \Gamma,\Delta $ , and $ \psi $ be an assertion between $ \tau, \sigma $ . The main concern of acRL is the statement “ $ \forall (\gamma, \delta) \in \phi . ([\![ M]\!] \mathbin{\bullet} \gamma, [\![ N]\!] \mathbin{\bullet} \delta) \in \psi $ ” (equivalently $ ([\![ M]\!],[\![ N]\!]):\phi\mathbin{\dot\rightarrow}\psi $ ). In this section, this statement is denoted by $ \phi \vdash (M, M') : \psi $ .

  • Inference rules of the logic consists of the facts about the statement $ \phi \vdash (M, M') : \psi $ . We remark that in the standard logic, proving these facts corresponds to the soundness of inference rules.

8.1 Moggi’s computational metalanguage

8.1.1 Syntax of the computational metalanguage

For the higher-order programming language, we adopt Moggi’s computational metalanguage (Moggi, Reference Moggi1991).It is an extension of the simply typed lambda calculus with monadic types. For a set B, we define the set $ {\bf Typ}(B) $ of types over B by the first BNF in Figure 2. We then define the set $ {\bf Typ}_1(B) $ of first-order types to be the subset of $ {\bf Typ}(B) $ consisting only of $ b,1,\times,+ $ .

Figure 2. Syntax of types and raw terms of the computational metalanguage.

We next introduce computational signatures for specifying constants in the computational metalanguage. A computational signature is a tuple $ (B,\Sigma_v,\Sigma_e) $ where B is a set of base types, and $ \Sigma_v $ and $ \Sigma_e $ are functions whose range is $ {\bf Typ}_1(B)^2 $ . The domains of $ \Sigma_v,\Sigma_e $ are sets of value operation symbols and effectful operation symbols and are denoted by $ O_v $ and $ O_e $ , respectively. These functions assign input and output types to these operations.

Fix a countably infinite set V of variables. A context is a function from a finite subset of V to $ {\bf Typ}(B) $ ; contexts are often denoted by capital Greek letters $ \Gamma,\Delta $ . For contexts $ \Gamma,\Delta $ such that $ {\rm dom}(\Gamma)\cap{\rm dom}(\Delta)=\emptyset $ , by $ \Gamma,\Delta $ we mean the join of $ \Gamma $ and $ \Delta $ .

The set of raw terms is defined by the second BNF in Figure 2. The type system of the computational metalanguage has judgments of the form $ \Gamma\vdash M:\tau $ , where $ \Gamma $ is a context, M is a raw term, and $ \tau $ is a type. It adopts the standard rules for products, coproducts, implications, and monadic types; see for example, Moggi (Reference Moggi1991). The typing rules for value operations and effectful operations are given by:

A simultaneous substitution $ \theta $ from $ \Gamma $ to $ \Gamma' $ (written $ \Gamma \vdash \theta \colon \Gamma' $ ) is a function $ \theta $ from the set $ {\rm dom}(\Gamma') $ to raw terms that assigns to each variable $ x\in{\rm dom}(\Gamma') $ a well-typed raw term $ \Gamma\vdash\theta(x):\Gamma'(x) $ . The application of $ \theta $ to a term $ \Gamma'\vdash M:\tau $ is denoted by $ M\theta $ , which has a typing $ \Gamma\vdash M\theta:\tau $ . For disjoint contexts $ \Gamma_i $ ( $ i=1,2 $ ), we define the projection substitutions $ \Gamma_1,\Gamma_2\vdash\pi_{i}^{\Gamma_1,\Gamma_2}:\Gamma_i $ by $ \pi_{i}^{\Gamma_1,\Gamma_2}(x)=x $ .

8.1.2 Categorical semantics of the computational metalanguage

The interpretation of the computational metalanguage over a computational signature $ (B,\Sigma_v,\Sigma_e) $ is given by the data specified by Figure 3.

Figure 3. Data for the categorical semantics of metalanguage.

We first inductively extend the interpretation of base types to all types using the bi-Cartesian closed structure and the monad. Next, for each context $ \Gamma $ , we fix a product diagram $ ([\![ \Gamma]\!],\{\pi_x:[\![ \Gamma]\!]\to[\![ \Gamma(x)]\!]\}_{x\in{\rm dom}(\Gamma)}) $ ; when $ {\rm dom}(\Gamma)=\{x\} $ , we assume that $ [\![ \Gamma]\!]=[\![ \Gamma(x)]\!] $ with $ \pi_x={\rm id} $ . Lastly, we interpret a typing derivation of $ \Gamma\vdash M:\tau $ as a morphism $ [\![ M]\!]:[\![ \Gamma]\!]\to[\![ \tau]\!] $ in the standard way, using the interpretations of operations given in Figure 3. We further extend this to the interpretation of each simultaneous substitution $ \Gamma\vdash\theta:\Gamma' $ as a morphisms $ [\![ \theta]\!]:[\![ \Gamma]\!]\to[\![ \Gamma']\!] $ .

8.2 Approximate relational computational logic

8.2.1 Relational logic in external form

A relational assertion $ \phi $ between disjoint contexts $ \Gamma $ and $ \Delta $ is a binary relation between $ U[\![ \Gamma]\!] $ and $ U[\![ \Delta]\!] $ . Such a relational assertion is denoted by $ {}^{\Gamma}_{\Delta}\vdash \phi $ . We identify it as a $ {\bf BRel}({\mathbb{C}}) $ -object $ \phi $ such that $ p_{{\mathbb{C}}}(\phi)=([\![ \Gamma]\!],[\![ \Delta]\!]) $ . Similarly, a relational assertion between types $ \tau $ and $ \sigma $ is defined to be a relational assertion $ {}^{u:\tau}_{d:\sigma}\vdash \phi $ ; here u and d are reserved and fixed variables, respectively.

Relational assertions between contexts $ \Gamma $ and $ \Delta $ carry a boolean algebra structure $ \wedge,\vee,\neg $ given by the set-intersection, set-union, and set-complement (see the boolean algebra $ {\bf BRel}({\mathbb{C}})_{([\![ \Gamma]\!],[\![ \Delta]\!])} $ in Section 2.2). The pseudo-complement $ \phi\Rightarrow\psi $ is defined to be $ \neg \phi\vee\psi $ . For $ {}^{\Gamma,x:\tau}_{\Delta,y:\sigma}\vdash \phi $ , by $ {}^{\Gamma}_{\Delta}\vdash \forall^{x}_{y}~.~\phi $ and $ {}^{\Gamma}_{\Delta}\vdash \exists^{x}_{y}~.~\phi $ we mean the relational assertions defined by the following equivalence:

\begin{align*} (\gamma,\delta)\in \forall^{x}_{y}~.~\phi \iff& \forall{\gamma'\in U[\![ \Gamma,x:\tau]\!],\delta'\in U[\![ \Delta,y:\sigma]\!]}~.~ \\&\quad\quad ([\![ \pi_1^{\Gamma,x:\tau}]\!]\mathbin{\bullet}\gamma'=\gamma)\wedge ([\![ \pi_1^{\Delta,y:\sigma}]\!]\mathbin{\bullet}\delta'=\delta) \Rightarrow (\gamma',\delta')\in\phi\\ (\gamma,\delta)\in \exists^{x}_{y}~.~\phi \iff& \exists{\gamma'\in U[\![ \Gamma,x:\tau]\!],\delta'\in U[\![ \Delta,y:\sigma]\!]}~.~ \\&\quad\quad ([\![ \pi_1^{\Gamma,x:\tau}]\!]\mathbin{\bullet}\gamma'=\gamma)\wedge ([\![ \pi_1^{\Delta,y:\sigma}]\!]\mathbin{\bullet}\delta'=\delta)\wedge (\gamma',\delta')\in\phi \end{align*}

The boolean algebra structure and the above quantifier operations allow us to interpret first-order logical formulas as relational assertions; we omit its detail here. In addition to these standard logical connectives, we will use graded strong relational lifting $ T^{[{\mathsf\Delta}]} $ to form relational assertions. That is, for any basic endorelation $ E:{\mathbb{C}}\to{\bf BRel}({\mathbb{C}}) $ , grading monoid M, divergence domain $ {\mathcal{Q}} $ and divergence $ {\mathsf\Delta}\in{\bf Div}(T, E,M,{\mathcal{Q}}) $ , we obtain a relational assertion $ {}^{u:\mathtt{T} \tau}_{d:\mathtt{T} \sigma}\vdash T^{[{\mathsf\Delta}]}(m,v)\phi $ from any $ {}^{u:\tau}_{d:\sigma}\vdash \phi $ , $ m\in M $ and $ v\in {\mathcal{Q}} $ .

For substitutions $ \Gamma\vdash\theta:\Gamma',\Delta\vdash{\theta'}:\Delta' $ and an assertion $ {}^{\Gamma}_{\Delta}\vdash \phi $ , by $ {}^{\Gamma'}_{\Delta'}\vdash \phi[{\theta};{\theta'}] $ , we mean the relational assertion $ \{(\gamma,\delta) \mid ([\![ \theta]\!]\mathbin{\bullet}\gamma,[\![ \theta']\!]\mathbin{\bullet}\delta)\in\phi\} $ . For disjoint context pairs $ \Gamma,\Gamma' $ and $ \Delta,\Delta' $ and relational assertions $ {}^{\Gamma}_{\Delta}\vdash \phi $ and $ {}^{\Gamma'}_{\Delta'}\vdash \psi $ , by the juxtaposition $ {}^{\Gamma,\Gamma'}_{\Delta,\Delta'}\vdash \phi,\psi $ , we mean the relational assertion $ {}^{\Gamma,\Gamma'}_{\Delta,\Delta'}\vdash \phi[{\pi_1^{\Gamma,\Gamma'}};{\pi_1^{\Delta,\Delta'}}]\wedge \psi[{\pi_2^{\Gamma,\Gamma'}};{\pi_2^{\Delta,\Delta'}}] $ .

8.2.2 Inference rules for acRL

For well-typed computational metalanguage terms $ \Gamma\vdash M:\tau $ and $ \Delta\vdash N:\sigma $ , and relational assertions $ {}^{\Gamma}_{\Delta}\vdash \phi $ and $ {}^{u:\tau}_{d:\sigma}\vdash \psi $ , by the judgment:

\begin{equation*} \phi\vdash(M,N)\colon \psi \end{equation*}

we mean the inclusion $ \phi\subseteq \psi[{[M/u]};{[N/d]}] $ of binary relations. This is equivalent to $ ([\![ M]\!], [\![ N]\!]):\phi\mathbin{\dot\rightarrow}\psi $ . We show basic facts about judgments $ \phi\vdash(M,N)\colon \psi $ .

Proposition 42.

  1. (1) $ \phi\vdash(M,N)\colon {\psi} $ and $ [\![ M]\!]=[\![ M']\!] $ and $ [\![ N]\!]=[\![ N']\!] $ implies $ \phi\vdash{}(M',N')\colon \psi $ .

  2. (2) $ \phi\vdash(M,N)\colon \psi $ and $ \phi'\subseteq \phi $ and $ \psi\subseteq \psi' $ implies $ \phi'\vdash{}(M,N)\colon \psi' $ .

  3. (3) $ \phi\vdash(M,N)\colon {T^{[{\mathsf\Delta}]}(m,v)\psi} $ and $ m\le n $ and $ v\le w $ and $ \psi\le \psi' $

  4. (4) implies $ \phi\vdash(M,N)\colon {T^{[{\mathsf\Delta}]}(n,w){\psi'}} $ .

  5. (5) $ \phi\vdash{}( M,N)\colon \psi $ implies $ \phi\vdash{}(\mathop{\mathtt{ret}}(M),\mathop{\mathtt{ret}}(N))\colon T^{[{\mathsf\Delta}]}(1,0){\psi} $ .

  6. (6) $ \phi\vdash(M,N)\colon {T^{[{\mathsf\Delta}]}(m,v)\psi} $ and $ \phi,\psi[{[x/u]};{[x'/d]}]\vdash{}(M',N')\colon {T^{[{\mathsf\Delta}]}(n,w)\rho} $

  7. (7) implies $ \phi\vdash{}(\mathop{\mathtt{let}}{x=M}\mathbin{\mathtt{in}}M',\mathop{\mathtt{let}}{x'=N}\mathbin{\mathtt{in}}N')\colon T^{[{\mathsf\Delta}]}(m\cdot n,v\cdot w)\rho $ .

We next establish relational judgments on effectful operations. We present a convenient way to establish such judgments using the fundamental property of the graded relational lifting $ T^{[{\mathsf\Delta}]} $ .

Proposition 43. For any $ c\in O_e $ such that $ \Sigma_e(c)=(b,b') $ , relational assertion $ {}^{u:b}_{d:b}\vdash \phi $ and $ m\in M $ , putting $ v= \sup \{{\mathsf\Delta}^{m}_{[\![ b']\!]} ([\![ c]\!] \mathbin{\bullet} x, [\![ c]\!] \mathbin{\bullet} y) \mid (x, y) \in \phi \} $ , we have $ \phi\vdash{}(c(u),c(d))\colon T^{[{\mathsf\Delta}]}(m,v)(E[\![ b']\!]) $ .

Proof. Take an arbitrary pair $ (x, y) \in \phi $ . We have $ {\mathsf\Delta}^{m}_{[\![ b']\!]} ([\![ c]\!] \mathbin{\bullet} x, [\![ c]\!] \mathbin{\bullet} y) \leq v $ by definition of v. Thanks to the fundamental property of $ T^{[{\mathsf\Delta} ]} $ (Theorem 39), it is equivalent to $ ([\![ c]\!] \mathbin{\bullet} x, [\![ c]\!] \mathbin{\bullet} y) \in T^{[{\mathsf\Delta}]} (m,v) (E [\![ b']\!]) $ .

9. Case Study I: Higher-Order Probabilistic Programs

We represent a higher-order probabilistic programming language with sampling commands from continuous distributions as a computational metalanguage. For now, we assume that the language supports sampling from Gaussian distributions and Laplace distributions. This computational metalanguage is specified by the computational signature:

\[ \mathcal C= (\{\mathtt{R}\}, \Sigma_v, \{\mathtt{norm}:(\mathtt{R}\times\mathtt{R},\mathtt{R}),~\mathtt{lap}:(\mathtt{R}\times\mathtt{R},\mathtt{R})\}), \]

where $ \Sigma_v $ is some chosen signature for value operations over reals. We interpret this computational metalanguage by filling Figure 3 as follows:

  1. (1) for the CCC-SM, we take $ ({\mathbb{C}},T)=({\bf QBS},P) $ (see Section 2.1),

  2. (2) for the interpretation $ [\![ \mathtt{R}]\!] $ of $ \mathtt{R} $ , we take the quasi-Borel space $ K{\mathbb{R}} $ associated with the standard Borel space $ {\mathbb{R}} $ , where $ K \colon {\bf Meas} \to {\bf QBS} $ is defined in Section 2.1,

  3. (3) the interpretation of value operations is given as expected (we omit it here); for example when $ \Sigma_v $ contains the real number addition operator $ + $ as type $ (\mathtt{R} \times \mathtt{R},\mathtt{R}) $ , its interpretation is the QBS morphism $ [\![ +]\!](x,y)=x + y: [\![ \mathtt{R} \times \mathtt{R}]\!]\rightarrow[\![ \mathtt{R}]\!] $ ,

  4. (4) for the interpretation of effectful operations, we put

    \begin{align*} [\![ \mathtt{norm}]\!](x,\sigma) &=[\mathrm{id},\mathcal{N}(x,\sigma^2)]_{\sim_{K{\mathbb{R}}}}, & [\![ \mathtt{lap}]\!](x,\lambda) &=[\mathrm{id},\mathrm{Lap}(x,\lambda)]_{\sim_{K{\mathbb{R}}}}. \end{align*}

Here, $ \mathcal{N}(x,\sigma^2) \in G{\mathbb{R}} $ is the Gaussian distribution with mean x and variance $ \sigma^2 $ . $ \mathrm{Lap}(x,\lambda) \in G{\mathbb{R}} $ is the Laplacian distribution with mean x and variance $ 2\lambda^2 $ Footnote 7. Every probability (Borel-)measure $ \mu \in G {\mathbb{R}} $ on $ {\mathbb{R}} $ can be converted to the probability measure $ [\mathrm{id},\mu]_{\sim_{K{\mathbb{R}}}} \in P K {\mathbb{R}} $ on the quasi-Borel space $ K {\mathbb{R}} $ (see Section 5.5).

9.1 A relational logic for differential privacy

To formulate differential privacy and its relaxations in the quasi-Borel setting, we convert statistical divergences $ {\mathsf\Delta} $ on the Giry monad G in Table 2 to $ \operatorname{Eq}{} $ -relative divergences $ \langle{L,l}\rangle^*{\mathsf\Delta} $ on the probability monad P on QBS by the construction in Section 5.5. Then, we construct the graded relational lifting $ P^{[ {\langle{L,l}\rangle^*{\mathsf\Delta}}]} $ by Theorem 39. Using this, as an instantiation of acRL, we build a relational logic reasoning about differential privacy and its relaxations, supporting higher-order programs and continuous random samplings. Basic proof rules can be given by Proposition 42.

For effectful operations, we import basic proof rules on noise-adding mechanisms given in prior studies (Bun et al., Reference Bun, Dwork, Rothblum and Steinke2018; Dwork et al., Reference Dwork, McSherry, Nissim and Smith2006; Dwork and Roth, Reference Dwork and Roth2013; Mironov, Reference Mironov2017) via Theorem 12 and Proposition 43. For example, consider the $ \operatorname{Eq} $ -relative $ {\mathcal{R}}^+ $ -graded $ {\mathcal{R}}^+ $ -divergence $ {\mathsf\Delta} = \langle{L,l}\rangle^*\mathsf{DP} $ on P. Proposition 43 with an effectful operation $ c=\mathtt{lap} $ and a relational assertion (below we identify global elements in $ K{\mathbb{R}} $ and real numbers):

\begin{equation*} {}^{u:\mathtt{R}\times\mathtt{R}}_{d:\mathtt{R}\times\mathtt{R}}\vdash \phi = \{(\langle x,1/\varepsilon\rangle,\langle y,1/\varepsilon\rangle) \mid |x-y| \leq 1\}, \end{equation*}

together with Theorem 12 and the prior result (Dwork et al., Reference Dwork, McSherry, Nissim and Smith2006, Example 1) yields the following judgment:

\begin{equation*} \phi\vdash{}({\mathtt{lap}(u)}, {\mathtt{lap})\colon (d)} {P^{[ {\langle{L,l}\rangle^*\mathsf{DP}}]} (0,\epsilon) (\operatorname{Eq}{K {\mathbb{R}}})}. \end{equation*}

By letting $ \mathrm{diff}_r $ be the relational assertion $ {}^{u:\mathtt{R}}_{d:\mathtt{R}}\vdash \{(x,y) \mid |x-y|\le r\} $ , the above judgment is equivalent to:

(17) \begin{equation} \mathrm{diff}_1\vdash{}(\mathtt{lap}(u,1/\epsilon), {\mathtt{lap}(d,1/\epsilon))})\colon {P^{[{ \langle{L,l}\rangle^*\mathsf{DP}}]} (0,\epsilon) (\operatorname{Eq}{K {\mathbb{R}}})}. \label{eq:judgment:laplace} \end{equation}

This rule corresponds to the rule [LapGen] of the program logic apRHL+ (Barthe et al., Reference Barthe, Grégoire, Hsu and Strub2017) for differential privacy. For another example, by the reflexivity of $ \mathsf{DP} $ , $ \langle{L,l}\rangle^*\mathsf{DP} $ is also reflexive. Hence, by letting $ \mathrm{succ}_r $ be the relational assertion $ {}^{u:\mathtt{R}}_{d:\mathtt{R}}\vdash \{(x,y) \mid y=x+r\} $ , we obtain the following judgments:

(18) \begin{align} & \mathrm{succ}_1\vdash{}(\mathtt{lap}(u,\lambda), \mathtt{lap}(d,\lambda))\colon {P^{[ { \langle{L,l}\rangle^*\mathsf{DP}}]} (0,0) (\mathrm{succ}_1)} \label{eq:judgment:DP:laplace:slide} \end{align}
(19) \begin{align} & \mathrm{succ}_1\vdash{}(\mathtt{norm}(u,\sigma),\colon \mathtt{norm}(d,\sigma) ) {P^{[{ \langle{L,l}\rangle^*\mathsf{DP}}]} (0,0) (\mathrm{succ}_1)} \label{eq:judgment:DP:gauss:slide}. \end{align}

The judgment (18) corresponds to [LapNull] of apRHL+. Similarly, the following judgments about the DP, Rényi-DP, and zero-concentrated DP of the Gaussian mechanism can be derived as (20)–(22):

(20) \begin{align} & \mathrm{diff}_1\vdash{}(\mathtt{norm}(u,\sigma),\mathtt{norm}(d,\sigma))\colon P^{[ \langle{L,l}\rangle^*\mathsf{DP}]} {(\epsilon,\delta)} {(\operatorname{Eq}{K {\mathbb{R}}})} \label{eq:judgment:gauss:DP} \end{align}
(21) \begin{align} & \mathrm{diff}_r\vdash{}(\mathtt{norm}(u,\sigma),\mathtt{norm}(d,\sigma))\colon P^{[\langle{L,l}\rangle^*{}^{\alpha}\mathsf{Re}]} {(\alpha r^2 / 2\sigma^2)} {(\operatorname{Eq}{K {\mathbb{R}}})} \label{eq:judgment:gauss:RDP} \end{align}
(22) \begin{align} & \mathrm{diff}_r\vdash{}(\mathtt{norm}(u,\sigma),\mathtt{norm}(d,\sigma))\colon P^{[\langle{L,l}\rangle^*\mathsf{zCDP}]} {(0,r^2 / 2\sigma^2)} {(\operatorname{Eq}{K {\mathbb{R}}})} \label{eq:judgment:gauss:zCDP} \end{align}

In (20), we require $ \sigma \geq \max ((1+\sqrt{3})/2,~\sqrt{2 \log (0.66/\delta)}/\epsilon) $ . The derivation is done via Proposition 43, Theorem12, and prior studies (Bun and Steinke, Reference Bun and Steinke2016;Mironov, Reference Mironov2017; Sato, Reference Sato2016).

10 Case Study II: Probabilistic Programs with Costs

We further extend the computational signature $ \mathcal C $ in the previous section with an effectful operation $ \mathtt{tick} $ such that $ \Sigma_e(\mathtt{tick})=(\mathtt{R},1) $ . The intention of $ \mathtt{tick}(r) $ is to increase cost counter by r during executionFootnote 8. To interpret this extended metalanguage, we fill Figure 3 as follows:

  1. (1) for the CCC-SM, we take $ ({\mathbb{C}},T) = ({\bf QBS},P_{c}) $ where $ P_{c}\triangleq P(K{\mathbb{R}}\times-) $ is the monad for modeling probabilistic choice and cost counting (see Section 5.7).

  2. (2) interpretation of $ b \in B $ is the same as Section 9,

  3. (3) interpretation of value operations is also the same as Section 9,

  4. (4) for the interpretation of effectful operations, put

    \begin{align*} [\![ \mathtt{norm}]\!](x,\sigma) &= [(0,\mathrm{id}),\mathcal{N}(x,\sigma^2)]_{\sim_{K{\mathbb{R}} \times K{\mathbb{R}}}},\\ [\![ \mathtt{lap}]\!](x,\lambda) &= [(0,\mathrm{id}),\mathrm{Lap}(x,\lambda)]_{\sim_{K{\mathbb{R}} \times K{\mathbb{R}}}},\\ [\![ \mathtt{tick}]\!] (r) &= \eta^P_{K{\mathbb{R}} \times [\![ 1]\!]} (r,\ast) = [\mathrm{const}(r,\ast), \mu]_{\sim_{K{\mathbb{R}} \times 1}}. \end{align*}

We derive a closed term $ \mathtt{ntick} \colon \mathtt{R} \Rightarrow \mathtt{R} \Rightarrow \mathtt{T} 1 $ for ticking with a cost sampled from Gaussian distribution:

\begin{equation*} \mathtt{ntick} \triangleq (\lambda s. \lambda r.~\mathop{\mathtt{let}}{x= }\mathbin{\mathtt{in}}\mathtt{norm}(r,s){\mathtt{tick}(x)}). \end{equation*}

The term $ \mathtt{ntick}~s~r $ adds cost counter by a random value sampled from the Gaussian distribution $ \mathtt{norm}(r,s^2) $ .

10.1 Relational reasoning on probabilistic costs

We convert the total valuation distance $ \mathsf{TV}\in{\bf Div}(G,\operatorname{Eq},1,{\mathcal{R}}^+) $ to the divergence $ {\mathsf\Delta}_{c} \triangleq {\mathsf C}(\langle{L,l}\rangle^*\mathsf{TV} ,{K{\mathbb{R}}} ) \in {\bf Div}(P_{c},\operatorname{Eq},1,{\mathcal{R}}^+) $ on $ P_c $ by Propositions 11 and 17. We also prove basic facts on effectful operations. First, the following relational judgments on $ \mathtt{tick} $ can be easily given:

(23)

Remark that $ \operatorname{Eq} 1 = \top $ and $ [\![ \mathtt{tick}(0)]\!] = [\![ \mathop{\mathtt{ret}}(\ast)]\!] $ holds. Next, in the similar way as (18), by the reflexivity of $ \mathsf{TV} $ , we have the reflexivity of $ \langle {L,l}\rangle^*\mathsf{TV} $ , and we obtain, for each real number constant $ \sigma $ and $ \lambda $ :

(24) \begin{align} &\mathrm{succ}_r\vdash{}({\mathtt{norm}(u,\sigma)},{\mathtt{norm})\colon (d,\sigma)} {T^{[ {{\mathsf\Delta}_{c}}]} (0) (\mathrm{succ}_r)} \notag\\ \label{eq:costs:laplacian_sliding} &\mathrm{succ}_r\vdash{}( {\mathtt{lap}(u,\lambda)}, {\mathtt{lap}(d,\lambda)})\colon {T^{[ {{\mathsf\Delta}_{c}}]} (0) (\mathrm{succ}_r)} \end{align}

We also directly verify the following judgment on $ \mathtt{ntick} $ using Theorem 12 and Proposition 43:

(25) \begin{align} \label{eq:costs:let_gaussian} \mathrm{diff}_1\vdash{}({\mathtt{ntick}~\sigma~u}, {\mathtt{ntick}~\sigma~d})\colon {T^{[ {{\mathsf\Delta}_{c}}]}{({\textstyle \Pr_{r \sim \mathcal{N}(0,\sigma^2)}}[|r| < 0.5] )}{(\top)}}. \end{align}

10.1.1 An example of relational reasoning

We give examples of verification of difference (of distributions) of costs between two runs of a probabilistic program whose output and cost depend on the input. We consider the following program:

\begin{equation*} M \triangleq \lambda r \colon \mathtt{R}.~ \lambda t \colon \mathtt{R} \to T1.~\mathop{\mathtt{let}}{x= }\mathbin{\mathtt{in}}\mathtt{lap}(r,5) {\mathop{\mathtt{let}}{\_= }\mathbin{\mathtt{in}}t(r) {\mathop{\mathtt{ret}}(x - r)}}. \end{equation*}

It first samples a real number x from the Laplacian distribution centered at the input r, call the (possibly effectful) closure t with r and return $ x-r $ . Since the return type of t is T1, it can only probabilistically tick the counter. We show that the following two judgments in acRL:

(A) \begin{align} \label{proof:probcost:example1} &\vdash{}( M~0~(\lambda x.\mathtt{tick}(x)), M~1~(\lambda x.\mathtt{tick}(x)))\colon {T^{[ {{\mathsf\Delta}_{c}}]}{(1)} {(\operatorname{Eq}{[\![ \mathtt{R}]\!]}})}\tag{A}, \end{align}
\begin{align} \label{proof:probcost:example2} &\vdash{}(M~0~(\mathtt{ntick}(2)), M~1~(\mathtt{ntick}(2)))\colon {T^{[ {{\mathsf\Delta}_{c}}]}{(0.20)} {(\operatorname{Eq}{[\![ \mathtt{R}]\!]}})}\tag{B} \end{align}

In judgment (A), we pass the tick operation $ t = \lambda x.\mathtt{tick}(x) $ itself to $ M~0 $ and $ M~1 $ . By the fundamental property of $ T^{[ {{\mathsf\Delta}_{c}}]} $ , the difference of costs between two runs of $ M~0~t $ and $ M~1~t $ is 1, because each of these programs reports cost 0 and 1 deterministically. In contrast, in judgment (B), we pass to $ M~0 $ and $ M~1 $ the probabilistic tick function $ t' = \mathtt{ntick}(2) $ that ticks a real number sampled from the Gaussian distribution with variance $ 2^2 = 4 $ . Therefore, the cost reported by the runs of programs $ M~0~t' $ and $ M~1~t' $ follow the Gaussian distributions $ \mathcal{N}(0,4) $ and $ \mathcal{N}(1,4) $ , whose difference by $ \mathsf{TV} $ is bounded by $ 0.20 $ .

We first show (A). By (23) and 2 of Proposition 42, we have

(26) \begin{equation} \label{eq:example:1} \mathrm{succ}_1\vdash{}( \mathtt{tick} (u), {\mathtt{tick} (d)})\colon {T^{[ {{\mathsf\Delta}_{c}}]}{(1)}{(\top)}}. \end{equation}

By (26), and 4, 5 of Proposition 42, we obtain

(27) \begin{align} \mathrm{succ}_1 \vdash& (\mathop{\mathtt{let}}{\_= }\mathbin{\mathtt{in}}\mathtt{tick} (u) {\mathop{\mathtt{ret}}(u)},\notag\\ \label{eq:example:3} &\quad\mathop{\mathtt{let}}{\_= }\mathbin{\mathtt{in}}\mathtt{tick} (d) {\mathop{\mathtt{ret}}(d-1)}) \colon T^{[ {{\mathsf\Delta}_{c}}]}{(1)}{(\operatorname{Eq}[\![ \mathtt{R}]\!])}. \end{align}

By (24), (27), and 1 and 5 of Proposition 42 again, we conclude (A).

To show (B), it suffices to replace (26) by the following judgment proved by (25), the inequality $ \Pr_{r \sim \mathcal{N}(0,4)}[|r| < 0.5] \leq 0.20 $ and 2 of Proposition 42:

\[ \mathrm{succ}_1\vdash{}( \mathtt{ntick}~2~u, {\mathtt{ntick}~2~d})\colon {T^{[ {{\mathsf\Delta}_{c}}]}{(0.20)}{(\top)}}. \]

The rest of proof is the same as (A).

11. Related Work

This work is inspired by relational Hoare logics for verifying differential privacy of probabilistic programs, as summarized in Table 5. Composable divergences employed in these logics include the one for the standard DP, plus its recent relaxations, such as, Rényi DP, zero-concentrated DP, and truncated-concentrated DP (Bun et al., Reference Bun, Dwork, Rothblum and Steinke2018; Bun and Steinke, Reference Bun and Steinke2016; Mironov, Reference Mironov2017).

Table 5. Relational Hoare logics for verifying divergences

The key semantic structure in these logics is graded relational liftings of the probability distribution monad. Barthe et al. gave a graded relational lifting of the probability distribution monad based on couplings (Barthe et al., Reference Barthe, Köpf, Olmedo and Béguelin2012). Since then, coupling-based liftings have been refined and used in several works (Barthe et al., Reference Barthe, Gaboardi, Arias, Hsu, Kunz and Strub2014, Reference Barthe, Gaboardi, Grégoire, Hsu and Strub2016; Barthe and Olmedo, Reference Barthe and Olmedo2013; Sato et al., Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019). They can be systematically constructed from composable divergences on the probability distribution monad (Barthe and Olmedo, Reference Barthe and Olmedo2013). One advantage of coupling-based liftings is that, to relate two probability distributions, it suffices to exhibit a coupling; this is exploited in the mechanized verification of differential privacy of programs (Reference Albarghouthi and HsuAlbarghouthi andHsu, 2018a,b). These coupling-based liftings, however, are developed upon discrete probability distributions, and measure-theoretic probability distributions, such as Gaussian or Cauchy distributions, were not supported until the work by Sato et al. (Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019).

The relational Hoare logic for differential privacy that supports sampling from continuous probability measures is given in the study by Sato (Reference Sato2016). In his work, the graded relational lifting for $ (\epsilon,\delta) $ -DP is constructed by a technique called codensity lifting (Katsumata et al., Reference Katsumata, Sato and Uustalu2018), which does not rely on the existence of coupling. It has been an open question (Sato et al., Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019, Section VIII) how to extend the codensity lifting technique to support various relaxations of differential privacy. Theorem 39 of this paper answers to this question. Later, coupling-based liftings has also been extended to support sampling from continuous probability measures (Sato et al., Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019). This extension is achieved by redefining binary relations as spans of measurable functions. Comparison of these approaches is in Section 7.2.

Verification of differential privacy in functional programming languages has also been pursued (Azevedo de Amorim et al., Reference Azevedo de Amorim, Gaboardi, Hsu and Katsumata2019; Barthe et al., Reference Barthe, Gaboardi, Arias, Hsu, Roth and Strub2015; Gaboardi et al., Reference Gaboardi, Haeberlen, Hsu, Narayan and Pierce2013; Reed and Pierce, Reference Reed and Pierce2010). Reed and Pierce (Reference Reed and Pierce2010) introduced a linear functional programming language with graded comonadic types that supports reasoning about $ \epsilon $ -differential privacy of probabilistic programs. Later, Gaboardi et al. (Reference Gaboardi, Haeberlen, Hsu, Narayan and Pierce2013) strengthened the Reed–Pierce type system with dependent types. A category-theoretic account of the Reed–Pierce type system is given by Azevedo de Amorim et al.

(2019), where general $ (\epsilon,\delta) $ -differential privacy is also supported. These works basically regard types as metric spaces, allowing us to reason about sensitivity of programs with respect to inputs. A coupling-based lifting is also employed in a relational model of a higher-order probabilistic programming language (Barthe et al., Reference Barthe, Gaboardi, Arias, Hsu, Roth and Strub2015).

The study by Azevedo de Amorim et al. (Reference Azevedo de Amorim, Gaboardi, Hsu and Katsumata2019) gives a categorical definition of composable divergences in a general framework called weakly closed refinements of symmetric monoidal closed categories (Azevedo de Amorim et al., Reference Azevedo de Amorim, Gaboardi, Hsu and Katsumata2019, Definition 1). A comparison is given in Section 6.1.2.

Mardare et al. (Reference Mardare, Panangaden and Plotkin2016) introduced a quantitative refinement of algebraic theory called quantitative equational theory and studied variety theorem for quantitative algebras. Bacci et al. (Reference Bacci, Mardare, Panangaden, Plotkin, Gadducci and Silva2021) discussed tensor products of quantitative equational theories. QETs and divergences on monads share the common interest of measuring quantitative difference between computational effects. Divergences on monads are derived as a generalization of the composability condition of statistical divergences studied by Barthe and Olmedo (Reference Barthe and Olmedo2013). To make a precise connection between these two concepts, in Section 6.3, we have given an adjunction between QETs of type $ \Sigma $ over $ \Omega $ and $ \Omega $ -generated divergences on the free monad $ T_\Sigma $ . The adjunction cuts down to the isomorphism between unconditional QETs of type $ \Sigma $ over $ \Omega $ and $ \Omega $ -generated divergences on $ T_\Sigma $ .

In this paper, we have constructed strong graded relational liftings of monads from divergences on monads. The concept of relational lifting and its fibrational generalization are also key technical concepts in the categorical studies of bisimulation (Balan et al., Reference Balan, Kurz and Velebil2019; Baldan et al., Reference Baldan, Bonchi, Kerstan and König2018; Hermida and Jacobs, Reference Hermida and Jacobs1995; Katsumata et al., Reference Katsumata, Sato and Uustalu2018; Kurz and Velebil, Reference Kurz and Velebil2016; Sprunger et al., Reference Sprunger, Katsumata, Dubut and Hasuo2021). It remains to be seen if the relational liftings obtained in this paper, as well as the divergence liftings in Section 6.1.3, have applications in the coalgebraic study of bisimulations.

Metric-like spaces are used in several recent papers on semantics of programming languages and systems. Gavazzo (Reference Gavazzo2018) studied a quantitative refinement of Abramsky’s applicative bisimilarity for the Reed–Pierce type system. He introduced a monadic operational semantics of the type system and formalized the concept of quantitative applicative bisimilarity using monads that are lifted to the category of quantale-valued relations. Bonchi et al. (Reference Bonchi, König and Petrisan2018) also used metric-like spaces to study bisimulations and up-to techniques in the category of quantale-valued relations. In this paper, we are interested in relational program verification of effectful programs, and we carry it out in the relational category $ {\bf BRel}({\mathbb{C}}) $ rather than $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ . The quantitative difference of computational effects measured by a divergence $ {\mathsf\Delta} $ is represented by the binary relation $ \tilde{\mathsf\Delta}(v) $ that relates two computational effects whose distance is bound by v.

The $ \mathsf{RelCost} $ system by Çiçek et al. (Reference Çiçek, Barthe, Gaboardi, Garg and Hoffmann2017) is a formal system for reasoning about relational properties of higher-order functional programs and measuring cost difference of programs. It consists of two subsystems: the relational refinement type system that can measure cost difference of programs, and the unary logic that can estimate lower and upper bounds of cost (i.e. cost intervals) of programs. We expect a connection between the semantics of the former system and the graded relational lifting constructed from the divergence $ \mathsf{NCI} $ on $ P({\mathbb{N}} \times -) $ (or its variant) in Section 5.2. On the other hand, identifying a semantic structure behind the latter system is not the scope of this paper, and we leave it to future work to identify it and relate it with divergences on monads.

Acknowledgements

Tetsuya Sato carried out this research under the support by JST ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603) and JSPS KAKENHI Grant Number 20K19775, Japan. Shin-ya Katsumata carried out this research under the support by JST ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603) and JSPS KAKENHI Grant Number 18H03204, Japan. The authors are grateful to Ichiro Hasuo providing the opportunity of collaborating in that project. The authors are grateful to Satoshi Kura, Justin Hsu, Marco Gaboardi, Borja Balle, and Gilles Barthe for fruitful discussions and Kazuki Matsuoka for helpful comments on the manuscript.

Appendix A. Proofs for Section 4 (Divergences on Monads)

Proof. (Proof of Proposition 7) Let $ m \in M, (x_1, x_2) \in E I, c_1, c_2 \in U (T J) $ . Below $ \eta $ stands for the $ I \times J $ -component $ \eta_{I \times J} : I \times J \rightarrow T (I \times J) $ of the unit of T. From the composability of $ {\mathsf\Delta} $ and $ \eta_x \mathbin{\bullet} y = \eta \mathbin{\bullet} \langle x, y \rangle $ , we have

\begin{align*} {\mathsf\Delta}^{m}_{I \times J} ((\eta_{x_1})^\sharp \mathbin{\bullet} c_1, (\eta_{x_2})^\sharp \mathbin{\bullet} c_2) & \leq {\mathsf\Delta}^{m}_{I} (c_1, c_2) + \sup_{(y_1, y_2) \in E J} {\mathsf\Delta}^{1}_{I \times J} (\eta_{x_1} \mathbin{\bullet} y_1, \eta_{x_2} \mathbin{\bullet} y_2)\\ & = {\mathsf\Delta}^{m}_{I} (c_1, c_2) + \sup_{(y_1, y_2) \in E J} {\mathsf\Delta}^{1}_{I \times J} (\eta \mathbin{\bullet} \langle x_1, y_1 \rangle, \eta \mathbin{\bullet} \langle x_2, y_2 \rangle). \end{align*}

Since $ E I \mathbin{\dot\times} E J \leq E (I \times J) $ , we have $ (\langle x_1, y_1 \rangle, \langle x_2, y_2 \rangle) \in E (I \times J) $ for any $ (y_1, y_2) \in E J $ . Therefore,

\[ \forall{(y_1,y_2)\in EJ}~.~ {\mathsf\Delta}^{1}_{I \times J} (\eta \mathbin{\bullet} \langle x_1, y_1 \rangle, \eta \mathbin{\bullet} \langle x_2, y_2 \rangle) \leq 0 \]

holds by the E-unit reflexivity. From $ (\eta_x)^\sharp \mathbin{\bullet} c = \theta \mathbin{\bullet} \langle x, c \rangle $ by (1), we obtain

\[ {\mathsf\Delta}^{m}_{I \times J} (\theta_{I, J} \mathbin{\bullet} \langle x_1, c_1 \rangle, \theta_{I, J} \mathbin{\bullet} \langle x_2, c_2 \rangle) = {\mathsf\Delta}^{m}_{I \times J} ((\eta_{x_1})^\sharp \mathbin{\bullet} c_1, (\eta_{x_2})^\sharp \mathbin{\bullet} c_2) \leq {\mathsf\Delta}^{m}_{J} (c_1, c_2). \]

Appendix B. Proofs for Section 5 (Examples of Divergences on Monads)

Proposition 44. The family $ \mathsf{C}' = \{{\mathsf{C}'_I} \colon (\mathbb{N} \times I)^2 \to {\mathcal{N}} \}_{I \in {\bf Set}} $ of $ {\mathcal{N}} $ -divergences defined by:

\[ \mathsf{C}'_I((i,x),(j,y)) \triangleq \begin{cases} |i - j| & x = y \\ \infty & x \neq y \end{cases}. \]

is a $ \operatorname{Eq} $ -relative $ {\mathcal{N}} $ -divergence on the monad $ \mathbb{N} \times - $ .

Proof. The monotonicity of $ \mathsf{C}' $ is obvious.

We show the $ \operatorname{Eq} $ -unit reflexivity of $ \mathsf{C}' $ . For all $ (x,y) \in \operatorname{Eq} I $ (that is, $ x = y \in I $ ), we have

\[ \mathsf{C}'_I (\eta_I (x),\eta_I (y)) = {\mathsf{C}'_I} {((0,x),(0,y))} = 0. \]

We show the $ \operatorname{Eq} $ -composability of $ \mathsf{C}' $ . Let $ (i,x), (j,y) \in \mathbb{N} \times I $ and $ f,g \colon I \to \mathbb{N} \times J $ . We write $ f(z) = (i_z,f_z) $ and $ g(z) = (j_z,g_z) $ for each $ z \in Z $ .

  • If $ x = y $ and $ x_z = y_z $ for all $ z \in I $ , we have

    \begin{align*} \mathsf{C}'_J (f^\sharp(i,x),g^\sharp(j,y)) &= \mathsf{C}'_J((i + i_x,f_{x}),(j + j_x,g_{x}))\\ &= |(i + i_x) - (j + j_x)|\leq |i - j | + |i_x - j_x| \\ &\leq \mathsf{C}'_I((i,x),(j,y)) + \sup_{(x,y) \in \operatorname{Eq} I (\iff x = y \in I)}\mathsf{C}'_J(f(x),g(y)) \end{align*}
  • If $ x \neq y $ or $ f_z \neq g_z $ for some $ z \in I $ , we have

    \[ \mathsf{C}'_J (f^\sharp(i,x),g^\sharp(j,y)) \leq \infty = \mathsf{C}'_I((i,x),(j,y)) + \sup_{(x,y) \in \operatorname{Eq} I (\iff x = y \in I)}\mathsf{C}'_J(f(x),g(y)). \]

This completes the proof.

Proposition 45. The family $ \mathsf{NC} = \{\mathsf{NC}_I \colon (P(\mathbb{N} \times I))^2 \to {\mathcal{N}} \}_{I \in {\bf Set}} $ of $ {\mathcal{N}} $ -divergences defined by:

\[ \mathsf{NC}_I(A,B) \triangleq \sup_{(i,x) \in A, (j,x) \in B} |i - j|\]

is a $ \operatorname{Top} $ -relative $ {\mathcal{N}} $ -divergence on the monad $ P(\mathbb{N} \times - ) $ .

Proof. The monotonicity of $ \mathsf{NC} $ is obvious.

We show the $ \operatorname{Top} $ -unit reflexivity of $ \mathsf{NC} $ . For all $ (x,y) \in \operatorname{Top} I $ (that is, $ x,y \in I $ ), we have

\[ \mathsf{NC}_I (\eta_I (x),\eta_I (y)) = {\mathsf{NC}_I} {(\{ (0,x) \},\{ (0,y) \} )} = |0 - 0| = 0. \]

We show the $ \operatorname{Top} $ -composability of $ \mathsf{NC} $ . For all $ f,g \colon I \to P(\mathbb{N} \times J ) $ and $ A,B \in P(\mathbb{N} \times I ) $ , we have

\begin{align*} \mathsf{NC}_J (f^\sharp A,g^\sharp B) &= \sup \{ |i - j| \mid (i,x) \in f^\sharp(A), (j,y) \in g^\sharp(B) \}\\ &= \sup \left\{ |i_1 + i_2 - j_1 - j_2|~\middle| \begin{array}{l@{}} (i_1,x) \in A, (j_1,y) \in B, \\ (i_2,x') \in f(x), (j_2,y') \in g(y) \end{array} \right\}\\ &\leq \sup\{ |i_1 - j_1| \mid (i_1,x) \in A, (j_1,y) \in B \}\\ &\qquad + \sup_{(x,y) \in \operatorname{Top} I (\iff x,y \in I)}\{|i_2 - j_2| \mid (i_2,x') \in f(x), (j_2,y') \in g(y) \}\\ &= \mathsf{NC}_I (A, B) + \sup_{(x,y) \in \operatorname{Top} I (\iff x,y \in I)}\mathsf{NC}_J (f(x),g(y)). \end{align*}

This completes the proof.

Proposition 46. The family $ \mathsf{NCI} = \{\mathsf{NCI}_I \colon (P(\mathbb{N} \times I))^2 \to {\mathcal{N}} \}_{I \in {\bf Set}} $ of $ {\mathcal{Z}} $ -divergences defined by:

\[ \mathsf{NCI}_I(A,B) \triangleq \sup_{(i,x) \in A, (j,y) \in B} i - j \]

is a $ \operatorname{Top} $ -relative $ {\mathcal{Z}} $ -divergence on the monad $ P(\mathbb{N} \times - ) $ .

Proof. The monotonicity of $ \mathsf{NCI} $ is obvious.

We show the $ \operatorname{Top} $ -unit reflexivity of $ \mathsf{NCI} $ . For all $ (x,y) \in \operatorname{Top} I $ (that is, $ x,y \in I $ ), we have

\[ \mathsf{NCI}_I (\eta_I (x),\eta_I (y)) = {\mathsf{NCI}_I} {(\{ (0,x) \},\{ (0,y) \} )} = 0 - 0 = 0. \]

We show the $ \operatorname{Top} $ -composability of $ \mathsf{NCI} $ . For all $ f,g \colon I \to P(\mathbb{N} \times J ) $ and $ A,B \in P(\mathbb{N} \times I ) $ , we have

\begin{align*} \mathsf{NCI}_J (f^\sharp A,g^\sharp B) &= \sup \{ i - j \mid (i,x) \in f^\sharp(A) \land (j,y) \in g^\sharp(B) \}\\ &= \sup \left\{ i_1 + i_2 - j_1 - j_2~\middle| \begin{array}{l@{}} (i_1,x) \in A, (j_1,y) \in B, \\ (i_2,x') \in f(x), (j_2,y') \in g(y) \end{array} \right\}\\ &\leq \sup\{ i_1 - j_1 \mid (i_1,x) \in A, (j_1,y) \in B \}\\ &\qquad + \sup_{(x,y) \in \operatorname{Top} I (\iff x,y \in I)}\{ i_2 - j_2 \mid (i_2,x') \in f(x), (j_2,y') \in g(y) \}\\ &= \mathsf{NCI}_I (A, B) + \sup_{(x,y) \in \operatorname{Top} I (\iff x,y \in I)}\mathsf{NCI}_J (f(x),g(y)). \end{align*}

This completes the proof.

Proof. (Proof of Proposition 10) We recall the continuity of f-divergence $ {{}^{f}\mathsf{Div}} $ in Liese and Vajda (Reference Liese and Vajda2006, Theorem 16) and Sato et al. (Reference Sato, Barthe, Gaboardi, Hsu and Katsumata2019, Theorem 3):

\[ {{}^{f}\mathsf{Div}}_I(\mu_1,\mu_2) = \sup \left \{ \sum_{i = 0}^n \mu_2(B_i) f\left(\frac{\mu_1(B_i)}{\mu_2(B_i)}\right) \middle | \{B_j\}_{i = 0}^n \colon \text{measurable partition of } I \right\}. \]

Here, a measurable partition of I is a finite family $ \{B_i\}_{i = 0}^n $ of measurable subsets $ B_i \in \Sigma_I $ satisfying $ i \neq j \implies B_i \cap B_j = \emptyset $ and $ \bigcup_{i = 0}^n B_i = I $ .

We have the $ \operatorname{Eq}{} $ -unit reflexivity because the reflexivity $ {}^{f}\mathsf{Div}_{I} (\mu,\mu) = 0 $ is obtained from $ f(1) = 0 $ . We show the $ \operatorname{Eq}{} $ -composability. To show this, we prove a bit stronger statement.

Consider three positive weight functions $ f, f_1, f_2 \geq 0 $ with $ f(1) = f_1(1) = f_2(1) = 0 $ . Assume that there are some $ \alpha, \beta, \beta' \in {\mathbb{R}} $ satisfying the following conditions:

Let $ \mu_1, \mu_2 \in G_{s} I $ , and $ h, k \colon I \to G_{s} J $ . We suppose that at least one of the following two conditions holds

  1. (1) $ \mu_1(I) = \mu_2(I) = 1 $ and $ \forall x \in I.~h(x)(J) = k(x)(J) = 1 $ ,

  2. (2) $ \alpha = 0 $ and $ \beta,\beta \in [0,1] $ .

We then prove the composability in the sense of Olmedo (Reference Olmedo2014, Definition 5.2):

(28) \begin{equation}\label{inequality:composition:f-div:target0} \begin{split} &{{}^f\mathsf{Div}}_J(h^\sharp \mu_1, k^\sharp \mu_2)\\ & \leq {{}^{f_1}\mathsf{Div}}_I(\mu_1, \mu_2) + \sup_{x \in I} {{}^{f_2}\mathsf{Div}}_J(h(x), k(x)) + \gamma {{}^{f_1}\mathsf{Div}}_I(\mu_1, \mu_2)\cdot \sup_{x \in I} {{}^{f_2}\mathsf{Div}}_J(h(x), k(x)). \end{split} \end{equation}

We fix a measurable partition $ \{A_i\}_{i = 0}^n $ of J. For each $ 0 \leq i \leq n $ , by definition of the $ \sigma $ -algebra $ \Sigma_{G J} $ , the functions $ h(-)(A_i) \colon I \to [0,1] $ and $ k(-)(A_i) \colon I \to [0,1] $ are measurable, and hence we have two monotone increasing sequences $ \{h^i_l\}_{l=0}^\infty $ and $ \{k^i_l\}_{l=0}^\infty $ of (nonnegative) simple functionsFootnote 9 that converge uniformly to $ h(-)(A_i) $ and $ k(-)(A_i) $ , respectively. Since the sequences $ \{h^i_l\}_{l=0}^\infty $ and $ \{k^i_l\}_{l=0}^\infty $ converge uniformly and are bounded above, we obtain

\begin{align*} \sum_{i = 0}^n k^\sharp(\mu_2)(A_i) f\left( \frac{h^\sharp(\mu_1)(A_i)}{k^\sharp(\mu_2)(A_i)}\right) &= \sum_{i = 0}^n (\int_X k(x)(A_i)~d\mu_2(x) ) f\left( \frac{\int_X h(x)(A_i)~d\mu_1(x)}{\int_X k(x)(A_i)~d\mu_2(x)}\right) \\ &= \sum_{i = 0}^n (\int_X \lim_{l \to \infty} k^i_l~d\mu_2 ) f\left( \frac{\int_X \lim_{l \to \infty} h^i_l~d\mu_1}{\int_X \lim_{l \to \infty} k^i_l~d\mu_2}\right)\\ &= \sum_{i = 0}^n (\lim_{l \to \infty} \int_X k^i_l~d\mu_2 ) f\left( \frac{\lim_{l \to \infty} \int_X h^i_l~d\mu_1}{ \lim_{l \to \infty} \int_X k^i_l~d\mu_2}\right)\\ &= \lim_{l \to \infty} \sum_{i = 0}^n (\int_X k^i_l~d\mu_2 ) f\left( \frac{\int_X h^i_l~d\mu_1}{\int_X k^i_l~d\mu_2}\right). \end{align*}

We remark that the above computation is consistent even if $ k^\sharp(\mu_2)(A_i) = 0 $ for some $ 0 \leq i \leq n $ . We here recall $ 0f(a/0) = a f^\ast(0) $ for any $ a \in [0,\infty) $ . If $ k^\sharp(\mu_2)(A_i) = 0 $ , then $ \int_X k^i_l~d\mu_2 = 0 $ for all $ l \in {\mathbb{N}} $ because it is a nonnegative monotone increasing sequence whose limit is 0. Then $ (\int_X k^i_l~d\mu_2 ) f\left( (\int_X h^i_l~d\mu_1)/(\int_X k^i_l~d\mu_2)\right) = (\int_X h^i_l~d\mu_1)f^\ast(0) $ holds for all $ l \in {\mathbb{N}} $ . It is monotonically increasing, and converges to $ (h^\sharp(\mu_1)(A_i))f^\ast(0) $ .

We thus show the following inequality:

(29) \begin{equation}\label{inequality:composition:f-div:target} \begin{split} &\lim_{l \to \infty} \sum_{i = 0}^n (\int_X k^i_l~d\mu_2 ) f\left( \frac{\int_X h^i_l~d\mu_1}{\int_X k^i_l~d\mu_2}\right)\\ &\leq {{}^{f_1}\mathsf{Div}}_I(\mu_1,\mu_2) + \sup_{x \in I}{{}^{f_2}\mathsf{Div}}_J(h(x),k(x)) + \gamma {{}^{f_1}\mathsf{Div}}_I(\mu_1,\mu_2)\sup_{x \in I}{{}^{f_2}\mathsf{Div}}_J(h(x),k(x)). \end{split} \end{equation}

We fix $ l \in {\mathbb{N}} $ . We can write $ h^i_l = \sum_{j = 0}^m \alpha^i_{j}\chi_{B_j} $ and $ k^i_l = \sum_{j = 0}^m \beta^i_{j}\chi_{B_j} $ with some coefficients $ \alpha^i_{j},\beta^i_{j} \in [0,1] $ ( $ 0 \leq j \leq m $ ) and measurable partition $ \{B_j\}_{j = 0}^m $ of I.

By Jensen’s inequality and the second inequality of (A’), we calculate as follows:

We will evaluate the three expressions $ V_1,V_2,V_3 $ .

First, we obtain $ \lim_{l \to \infty} V_1 \leq {{}^{f_1}\mathsf{Div}}_I(\mu_1,\mu_2) $ as follows:

\begin{align*} V_1 & \leq \left(\sup_{0 \leq j \leq m} \sum_{i = 0}^n (\beta \alpha^i_j + (1-\beta) \beta^i_j)\right) \cdot \sum_{j = 0}^m \mu_2(B_j) f_1\left(\frac{\mu_1(B_j)}{\mu_2(B_j)}\right)\\ & = \sup_{x \in I} \left(\beta \sum_{i = 0}^n h^i_l(x) + (1-\beta) \sum_{i = 0}^n k^i_l(x)\right)\cdot \sum_{j = 0}^m \mu_2(B_j) f_1\left(\frac{\mu_1(B_j)}{\mu_2(B_j)}\right)\\ & \leq \sup_{x \in I} \left(\beta \sum_{i = 0}^n h^i_l(x) + (1-\beta) \sum_{i = 0}^n k^i_l(x)\right) \cdot {{}^{f_1}\mathsf{Div}}_I(\mu_1,\mu_2)\\ & \xrightarrow[]{~l \to \infty~} \sup_{x \in I} \left(\beta h(x)(J) + (1-\beta) k(x)(J) \right) \cdot{{}^{f_1}\mathsf{Div}}_I(\mu_1,\mu_2)\\ & \leq {{}^{f_1}\mathsf{Div}}_I(\mu_1,\mu_2) \end{align*}

Here, the first inequality is given from the nonnegativity of each $ \mu_2(B_j) f_1\left(\frac{\mu_1(B_j)}{\mu_2(B_j)}\right) $ derived from $ 0 \leq f_1 $ ; the equality is given by definition of $ \alpha_{j}^{i} $ and $ \beta_{j}^{i} $ ; the second inequality can be given by the continuity of $ {{}^{f_1}\mathsf{Div}} $ ; the last inequality is derived by $ \beta h(x)(J) + (1-\beta) k(x)(J) \in [0,1] $ from the assumption that either $ \beta \in [0,1] $ or $ h(x)(J) = k(x)(J) $ for all $ x \in I $ holds.

Second, we obtain $ \lim_{l \to \infty} V_2 \leq \gamma\cdot{{}^{f_1}\mathsf{Div}}_I(\mu_1,\mu_2) \cdot \sup_{x \in I}{{}^{f_2}\mathsf{Div}}_J(h(x),k(x)) $ as follows:

\begin{align*} V_2 & \leq \left(\sup_{0 \leq j \leq m} \sum_{i = 0}^n \beta^i_j f_2 \left(\frac{\alpha^i_j}{\beta^i_j}\right)\right) \sum_{j = 0}^m \left( \beta' \mu_1(B_j) + (1-\beta') \mu_2(B_j)) + \gamma \mu_2(B_j) f_1\left(\frac{\mu_1(B_j)}{\mu_2(B_j)}\right)\right)\\ & = \left(\sup_{x \in I} \sum_{i = 0}^n k^i_l(x) f_2 \left(\frac{h^i_l(x)}{k^i_l(x)}\right)\right) \left(\beta' \mu_1(I) + (1-\beta') \mu_2(I) +\gamma \sum_{j = 0}^m \mu_2(B_j) f_1\left(\frac{\mu_1(B_j)}{\mu_2(B_j)}\right) \right)\\ & \leq \left(\sup_{x \in I} \sum_{i = 0}^n k^i_l(x) f_2 \left(\frac{h^i_l(x)}{k^i_l(x)}\right)\right) \left(\beta' \mu_1(I) + (1-\beta') \mu_2(I) + \gamma\cdot{{}^{f_1}\mathsf{Div}}_I(\mu_1,\mu_2)\right)\\ & \xrightarrow[]{~l \to \infty~} \left(\sup_{x \in I} \sum_{i = 0}^n k(x)(A_i) f_2 \left(\frac{h(x)(A_i)}{k(x)(A_i)}\right)\right) \left(\beta' \mu_1(I) + (1-\beta') \mu_2(I) +\gamma\cdot{{}^{f_1}\mathsf{Div}}_I(\mu_1,\mu_2) \right)\\ & \leq \sup_{x \in I}{{}^{f_2}\mathsf{Div}}_J(h(x),k(x)) \left(\beta' \mu_1(I) + (1-\beta') \mu_2(I) + \gamma\cdot{{}^{f_1}\mathsf{Div}}_I(\mu_1,\mu_2)\right)\\ &\leq \sup_{x \in I}{{}^{f_2}\mathsf{Div}}_J(h(x),k(x)) \cdot \gamma\cdot{{}^{f_1}\mathsf{Div}}_I(\mu_1,\mu_2)\\ &=\gamma\cdot{{}^{f_1}\mathsf{Div}}_I(\mu_1,\mu_2) \cdot \sup_{x \in I}{{}^{f_2}\mathsf{Div}}_J(h(x),k(x)). \end{align*}

Here, the first inequality is proved by the nonnegativity of each

\[ (\beta' \mu_1(B_j) + (1-\beta') \mu_2(B_j)) + \gamma \mu_2(B_j) f_1\left(\frac{\mu_1(B_j)}{\mu_2(B_j)}\right), \]

which is derived from the first inequality of (A’); the first equality is given by definition of $ \alpha_{j}^{i} $ and $ \beta_{j}^{i} $ and the countable additivity of $ \mu_1 $ and $ \mu_2 $ ; the second inequality is given by the continuity of $ {{}^{f_1}\mathsf{Div}} $ and $ 0 \leq \gamma $ ; the last inequality is derived by $ \beta' \mu_1(I) + (1-\beta') \mu_2(I) \in [0,1] $ from the assumption that either $ \beta' \in [0,1] $ or $ \mu_1(I) = \mu_2(I) $ holds.

We prove the third inequality. We recall that the sequences $ \{h^i_l(x)\}_{l = 0}^\infty $ and $ \{k^i_l(x)\}_{l = 0}^\infty $ are monotone increasing at each $ x \in I $ . Since $ f_2 $ is convex, by Jensen’s inequality, the sequence $ \left\{\sum_{i = 0}^n k^i_l(x) f_2 \left({h^i_l(x)}/{k^i_l(x)}\right)\right\}_{l = 0}^\infty $ is monotone increasing at each $ x \in I $ . Then, the sequence $ \left\{ \sup_{x \in I} \sum_{i = 0}^n k^i_l(x) f_2 \left({h^i_l(x)}/{k^i_l(x)}\right)\right\}_{l = 0}^\infty $ is monotone increasing, because $ \sum_{i = 0}^n k^i_{l + 1}(x) f_2 \left({h^i_{l + 1}(x)}/{k^i_{l + 1}(x)}\right) $ is always greater than $ \sum_{i = 0}^n k^i_l(x) f_2 \left({h^i_l(x)}/{k^i_l(x)}\right) $ for all $ l \in {\mathbb{N}} $ and $ x \in I $ . Hence, from the continuity of $ {}^{f_2}\mathsf{Div} $ , we obtain

\begin{align*} \lim_{l \to \infty} \sup_{x \in I} \sum_{i = 0}^n k^i_l(x) f_2 \left(\frac{h^i_l(x)}{k^i_l(x)}\right) &= \sup_{l \in \mathbb{N}} \sup_{x \in I} \sum_{i = 0}^n k^i_l(x) f_2 \left(\frac{h^i_l(x)}{k^i_l(x)}\right)\\ &= \sup_{x \in I} \sup_{l \in \mathbb{N}} \sum_{i = 0}^n k^i_l(x) f_2 \left(\frac{h^i_l(x)}{k^i_l(x)}\right)\\ &= \sup_{x \in I} \sum_{i = 0}^n k(x)(A_i) f_2 \left(\frac{h(x)(A_i)}{k(x)(A_i)}\right)\\ &\leq \sup_{x \in I}{{}^{f_2}\mathsf{Div}}(h(x),k(x)). \end{align*}

Finally, we obtain $ \lim_{l \to \infty} V_3 = 0 $ as follows:

\begin{align*} V_3 & = \sum_{j = 0}^m \alpha (\mu_2(B_j) - \mu_1(B_j))(\sum_{i = 0}^n \alpha^i_j - \beta^i_j)\\ & = \alpha\left(\int_I h^i_l ~d\mu_2 - \int_I k^i_l ~d\mu_2 + \int_I k^i_l ~d\mu_1 - \int_I h^i_l ~d\mu_1\right)\\ & \xrightarrow[]{~l \to \infty~} \alpha\left(\int_I h(-)(J) ~d\mu_2 - \int_I k(-)(J) ~d\mu_2 + \int_I k(-)(J) ~d\mu_1 - \int_I h(-)(J) ~d\mu_1\right)\\ &= 0. \end{align*}

Here, the last equality is derived from the assumption that either $ \alpha = 0 $ or $ h(x)(J) = k(x)(J) $ for any $ x \in I $ holds.

From the above evaluations of $ V_1 $ , $ V_2 $ , and $ V_3 $ , we obtain the inequality (29). Since the measurable partition $ \{A_i\}_{i = 0}^n $ of J is arbitrary, we conclude (28) by the continuity of $ {}^{f}\mathsf{Div} $ . This completes the proof.

Parameters for Proposition 10 for for weight functions of $ \mathsf{TV} $ , $ \mathsf{KL} $ , $ \mathsf{HD} $ , and $ \mathsf{Chi} $ are shown in Table 4. Below, we check the conditions in Proposition 10.

  • For the weight function $ f(t) = |t - 1| / 2 $ of $ \mathsf{TV} $ , the tuple $ (\gamma,\alpha,\beta,\beta') = (0,0,1,0) $ satisfies for all $ x,y,z,w \in [0,1] $ , we have

    \begin{align*} 0 & \leq w + xf({z}/{x}),\\ xy f(zw/xy) & =| {zw} - xy | / 2 \leq (| {zw} - {wx}| + |{xw} - xy |) /2 = wxf({z}/{x}) + xf({w}/{y}). \end{align*}
  • For the weight function $ f(t) = t\log(t) - t + 1 $ of $ \mathsf{KL} $ , the tuple $ (\gamma,\alpha,\beta,\beta') = (0,-1,1,1) $ satisfies for all $ x,y,z,w \in [0,1] $ , we have

    \begin{align*} 0 & \leq z + x f(z/x), \\ xyf(zw/xy) &= xy((zw/xy)\log(zw/xy) - zw/xy + 1)\\ &= zw\log(w/y) + zw\log(z/x) - zw + xy \\ & = xw((z/x)\log(z/x) {-} z/x {+} 1) {+} zy((w/y)\log(w/y) {-} w/y + 1) {-} (x-z)(w-y). \end{align*}
  • For the weight function $ f(t) = (\sqrt{t} - 1)^2 / 2 $ of $ \mathsf{HD} $ , the tuple $ (\gamma,\alpha,\beta,\beta') = (0,-1/4,1/2,1/2) $ satisfies for all $ x,y,z,w \in [0,1] $ ,

    \begin{align*} 0 & \leq (z + x) / 2 + x f(z/x), \\ xyf(zw/xy) &= (zw + xy)/2 -\sqrt{xyzw} \\ & = (zw + xy)/2 - ((x + z) - (\sqrt{x} - \sqrt{z})^2)((y + w) - (\sqrt{y} - \sqrt{w})^2)/ 4 \\ & = (zw + xy)/2 - ((x + z) - 2 x f(z/x))((y + w) - 2 yf(w/y))/ 4\\ & = (zw + xy)/2 - ((x + z)/2 - x f(z/x))((y + w)/2 - yf(w/y))\\ & \leq (y + w)/ 2 \cdot x f(z/x) + (x + z)/2 \cdot y f(w/y) + (zw + xy)/2 - (x + z)(y + w)/4\\ & = (y + w)/ 2 \cdot x f(z/x) + (x + z)/2 \cdot y f(w/y) - (x - z)(w - y)/4. \end{align*}
  • For the weight function $ f(t) = ({t} - 1)^2 $ of $ \mathsf{Chi} $ , The tuple $ (\gamma,\alpha,\beta,\beta') = (1,-2,2,2) $ satisfies for all $ x,y,z,w \in [0,1] $ ,

    \begin{align*} 0 & \leq z^2/x = (2z - x) + ((z/x) - 1)(z - x) = (2z - x) + x f(z/x), \\ xy f(zw/xy) &= z^2w^2/xy - 2zw + xy \\ &= (xf(z/x) + 2z - x)(yf(w/y) + 2w - y) - 2zw + xy \\ &=(2w - y) xf(z/x) + (2z - x)yf(w/y) + xyf(z/x)f(w/y) -2(x-z)(w-y). \end{align*}

Proof. (Proof of Proposition 11)

We first show the monotonicity of $ \langle{p,\lambda}\rangle^*{\mathsf\Delta} $ . Assume $ m \leq m' $ . From the monotonicity of the original $ {\mathsf\Delta} $ , we obtain for each $ \nu_1,\nu_2 \in U^{\mathbb{C}} (SI)) $ ,

\begin{align*} (\langle{p,\lambda}\rangle^*{\mathsf\Delta}{ })^{m}_{I} (\nu_1,\nu_2) &= {\mathsf\Delta}^m_{pI} ((U^\mathbb D \lambda_{I})(\nu_1),(U^\mathbb D \lambda_{I})(\nu_2))\\ &\geq {\mathsf\Delta}^{m'}_{pI} ((U^\mathbb D \lambda_{I})(\nu_1),(U^\mathbb D \lambda_{I})(\nu_2))\\ &= (\langle{p,\lambda}\rangle^*{\mathsf\Delta}{ })^{m'}_{I} (\nu_1,\nu_2). \end{align*}

Second, we show the F-unit reflexivity of $ \langle{p,\lambda}\rangle^*{\mathsf\Delta} $ . For $ FI = (I,I,R_{FI}) $ , we have $ EpI = (p = (pI,pI,R_{FI}) $ for all $ I \in {\mathbb{C}} $ . We can calculate for all $ (x,y) \in R_F $ ,

\begin{align*} (\langle{p,\lambda}\rangle^*{\mathsf\Delta}{ })^{1_M}_{I} (\eta^S_{I} \mathbin{\bullet} x,\eta^S_{I} \mathbin{\bullet} y) &= {\mathsf\Delta}^{1_M}_{pI} ( (U^\mathbb D \lambda_{I}) (U^{{\mathbb{C}}} \eta^S_{I} \circ x),(U^\mathbb D \lambda_{I}) (U^{{\mathbb{C}}} \eta^S_{I} \circ y))\\ &= {\mathsf\Delta}^{1_M}_{pI} ( U^\mathbb D \lambda_{I} \circ U^{\mathbb D} p \eta^S_{I} \circ x), U^\mathbb D \lambda_{I} \circ U^{\mathbb D} p \eta^S_{I} \circ y)\\ &= {\mathsf\Delta}^{1_M}_{pI} ((\lambda_{I} \circ p \eta^S_{I}) \mathbin{\bullet} x,(\lambda_{I} \circ p \eta^S_{I}) \mathbin{\bullet} y)\\ &= {\mathsf\Delta}^{1_M}_{pI} (\eta^T_{pI} \mathbin{\bullet} x,\eta^T_{pI} \mathbin{\bullet} y) \leq 0. \end{align*}

Finally, we show the $ {F} $ -composability of $ \langle{p,\lambda}\rangle^*{\mathsf\Delta} $ . For all $ J \in {\mathbb{C}} $ , $ c_1,c_2 \in U^{\mathbb{C}} S I $ , and $ f_1,f_2 \colon I \to S J $ we can calculate

\begin{align*} (\langle{p,\lambda}\rangle^*{\mathsf\Delta}{ })^{mn}_{J} (f_1^\sharp \mathbin{\bullet} c_1, f_2^\sharp \mathbin{\bullet} c_2) &= {\mathsf\Delta}^{mn}_{pJ} ( (U^\mathbb D \lambda_{J}) (U^{\mathbb{C}} (f_1^\sharp) \circ c_1), (U^\mathbb D \lambda_{J}) (U^{\mathbb{C}} (f_2^\sharp) \circ c_2))\\ &= {\mathsf\Delta}^{mn}_{pJ} ( U^\mathbb D \lambda_{J} \circ U^\mathbb D p (f_1^\sharp) \circ c_1, U^\mathbb D \lambda_{J} \circ U^\mathbb D p (f_2^\sharp) \circ c_2)\\ (\ast)&= {\mathsf\Delta}^{mn}_{pJ} ( U^\mathbb D ((\lambda_J \circ p f_1 )^\sharp) \circ U^\mathbb D \lambda_I \circ c_1, U^\mathbb D ((\lambda_J \circ p f_2 )^\sharp) \circ U^\mathbb D \lambda_I \circ c_2)\\ &= {\mathsf\Delta}^{mn}_{pJ} ( (\lambda_J \circ p f_1 )^\sharp \mathbin{\bullet} (\lambda_I \mathbin{\bullet} c_1), (\lambda_J \circ p f_2 )^\sharp \mathbin{\bullet} (\lambda_I \mathbin{\bullet} c_2))\\ &\leq {\mathsf\Delta}^{m}_{pI} (\lambda_I \mathbin{\bullet} c_1,\lambda_I \mathbin{\bullet} c_2)+ \sup_{(x, y) \in EpI} {\mathsf\Delta}^{n}_{pJ} ( (\lambda_J \circ p f_1 ) \mathbin{\bullet} x, (\lambda_J \circ p f_2 ) \mathbin{\bullet} y) \\ &= (\langle{p,\lambda}\rangle^*{\mathsf\Delta}{ })^{m}_{I} (c_1,c_2) + \sup_{(x, y) \in FI} (\langle{p,\lambda}\rangle^*{\mathsf\Delta}{ })^{n}_{J} (f_1 \mathbin{\bullet} x,f_2 \mathbin{\bullet} y). \end{align*}

To prove the equality ( $ \ast $ ), we calculate

\begin{align*} U^\mathbb D \lambda_{J} \circ U^\mathbb D p (f_i^\sharp) &= U^\mathbb D (\lambda_{J} \circ p \mu^S_J \circ p S f_i) = U^\mathbb D (\mu^T_{pJ}\circ T\lambda_{J} \circ \lambda_{SJ} \circ p S f_i) \\ &= U^\mathbb D (\mu^T_{pJ}\circ T\lambda_{J} \circ T p f_i \circ \lambda_{I} ) = U^\mathbb D ((\lambda_{J} \circ p f_i)^\sharp \circ \lambda_{I} ). \end{align*}

This completes the proof.

Proof. (Proof of Proposition 13)

It suffices to show $ \operatorname{Top} $ -unit reflexivity and $ \operatorname{Top} $ -composability:

\begin{align*} &{\mathsf\Delta}^{\mathsf{lip},d_S}_I (\eta_I(x), \eta_I(y)) = \sup_{s', s \in S}\frac{d_S(\pi_2 (x,s), \pi_2 (y,s'))}{d_S(s,s')} = \frac{d_S(s,s')}{d_S(s,s')} = 1,\\ &{\mathsf\Delta}^{\mathsf{lip},d_S}_J (F_1^\sharp (f_1),F_2^\sharp (f_2))\\ &= \sup_{s', s \in S} \frac{d_S(\pi_2(F_1(\pi_1 f_1(s))(\pi_2 f_1(s))),\pi_2(F_2(\pi_1 f_2(s'))(\pi_2 f_2(s'))) ) }{d_S(s,s')}\\ &= \sup_{s', s \in S} \frac{d_S(\pi_2 f_1(s) ,\pi_2 f_2(s') )}{d_S(s,s')} \cdot \frac{d_S(\pi_2(F_1(\pi_1 f_1(s))(\pi_2 f_1(s))),\pi_2(F_2(\pi_1 f_2(s'))(\pi_2 f_2(s'))) ) }{d_S(\pi_2 f_1(s),\pi_2 f_2(s'))}\\ & \leq \sup_{s', s \in S} \frac{d_S(\pi_2 f_1(s),\pi_2 f_2(s'))}{d_S(s,s')} \cdot \sup_{t', t \in S} \frac{d_S(\pi_2 (F_1(\pi_1 f_1(s))(t)),\pi_2 (F_2(\pi_1 f_2(s'))(t')) ) }{d_S(t,t')}\\ &\leq {\mathsf\Delta}^{\mathsf{lip},d_S}_I(f_1,f_2) \cdot \sup_{x, y \in I} {\mathsf\Delta}^{\mathsf{lip},d_S}_J(F_1(x),F_2(y)) \end{align*}

Here, $ F_1,F_2 \colon I \to T_S J $ and $ f_1,f_2 \in T_S I $ .

Proof. (Proof of Proposition 14)

It suffices to show $ \operatorname{Eq} $ -unit reflexivity and $ \operatorname{Eq} $ -composability:

\begin{align*} {\mathsf\Delta}^{\mathsf{met},d_S}_I (\eta_I(x), \eta_I(x)) &= \sup_{s \in S} d_S(\pi_2 (x,s), \pi_2 (x,s)) = \sup_{s \in S} d_S(s,s) = 0.\\ {\mathsf\Delta}^{\mathsf{met},d_S}_J (F_1^\sharp (f_1),F_2^\sharp (f_2)) &= \sup_{s \in S} d_S(\pi_2 (F_1(\pi_1 f_1(s))(\pi_2 f_1(s))),\pi_1 (F_2(\pi_1 f_2(s))(\pi_2 f_2(s))) ) \\ &\leq \sup_{s \in S} d_S(\pi_2 (F_1(\pi_1 f_1(s))(\pi_2 f_1(s))),\pi_2 (F_2(\pi_1 f_1(s))(\pi_2 f_1(s))) ) \\ & \quad + \sup_{s \in S} d_S(\pi_2 (F_2(\pi_1 f_1(s))(\pi_2 f_1(s))),\pi_2 (F_2(\pi_1 f_1(s))(\pi_2 f_2(s))) )\\ & \leq \sup_{x \in I} {\mathsf\Delta}^{\mathsf{met},d_S}_J(F_1(x),F_2(x)) + {\mathsf\Delta}^{\mathsf{met},d_S}_I (f_1,f_2) \end{align*}

Here, $ F_1,F_2 \colon I \to T_S J $ and $ f_1,f_2 \in T_S I $ . Without loss of generality, we may assume $ \pi_1 f_1 = \pi_1 f_2 $ holds and $ \pi_2 f_1 $ and $ \pi_2 f_2 $ are nonexpansive, and for every $ x \in I $ , $ \pi_1 F_1(x) = \pi_1 F_2(x) $ holds and $ \pi_2 F_1(x) $ and $ \pi_2 F_2(x) $ are nonexpansive.

Proof. (Proof of Proposition 15) We first show the $ \operatorname{Eq} {} $ -unit reflexivity of $ d^{T_S (-)} $ . For any $ s \in S $ , we calculate

\begin{align*} d_{T_S I} (\eta_I(x), \eta_I(x)) &=\sup_{s \in S} \max\left( d_I(\pi_1 (x,s), \pi_1 (x,s)), d_S(\pi_2 (x,s), \pi_2 (x,s)\right) \\ & = \sup_{s \in S} \max(d_I(x,x),d_S(s,s)) = 0. \end{align*}

We next show the $ \operatorname{Eq} {} $ -composability of $ d^{T_S (-)} $ . For any $ f_1,f_2 \in T_S (I,d_I) $ and nonexpansive functions $ F_1,F_2 \colon (I,d_I) \to T_S (J,d_J) $ , we compute

We note here that the nonexpansivity of $ F_2 \colon (I,d_I) \to (S,d_S) \Rightarrow (S,d_S) \times (J,d_J) $ is equivalent to the one of its uncurrying $ \overline{F_2} \colon (S,d_S) \times (I,d_I) \to (S,d_S) \times (J,d_J) $ .

Proof. (Proof of Proposition 16) We first show the $ \mathsf{Dist}_0 {} $ -unit reflexivity of $ {\mathsf\Delta}^{\mathsf{Dist}_0} $ . For $ (x_1,x_2) \in \mathsf{Dist}_0 (I,d_I) $ (i.e. $ d_I (x_1,x_2) = 0 $ ), we calculate

\begin{align*} {\mathsf\Delta}^{\mathsf{Dist}_0}_{(I,d_I)} (\eta_I(x_1), \eta_I(x_2)) &= \sup_{d_S(s_1,s_2) = 0} \max\left( d_I(\pi_1 (x_1,s_2), \pi_1 (x_2,s_2)), d_S(\pi_2 (x_1,s_1), \pi_2 (x_2,s_2)\right) \\ & = \sup_{d_S(s_1,s_2) = 0} \max(d_I(x_1,x_2),d_S(s_1,s_2)) = 0. \end{align*}

Next, we show the $ \mathsf{Dist}_0 {} $ -composability of $ {\mathsf\Delta}^{\mathsf{Dist}_0} $ . For any $ f_1,f_2 \in T_S (I,d_I) $ and nonexpansive functions $ F_1,F_2 \colon (I,d_I) \to T_S (J,d_J) $ , we compute

This completes the proof.

Proof. (Proof of Proposition 17) Since $ M = 1 $ , the monotonicity of $ {\mathsf C}({\mathsf\Delta}, N) $ is obvious.

We first show the $ \operatorname{Eq} $ -unit reflexivity of $ {\mathsf C}({\mathsf\Delta}, N) $ . We recall that for all $ x \in UI $ , we have

\begin{alignat*}{2} T \pi_1 \mathbin{\bullet} (\eta^{T(N \times - )}_I \mathbin{\bullet} x) &=(T \pi_1 \circ \eta^{T(N \times - )}_I )\mathbin{\bullet} x &&=(T \pi_1 \circ \eta^{T}_{N \times I} \circ \eta^{(N \times -)}_I )\mathbin{\bullet} x\\ &=(\eta^{T}_{N} \circ \pi_1 \circ \langle 1_N \circ !_I,\mathrm{id}_I\rangle )\mathbin{\bullet} x &&=(\eta^{T}_{N} \circ 1_N \circ !_I)\mathbin{\bullet} x\\ &= \eta^{T} \mathbin{\bullet} ((1_N \circ !_I ) \mathbin{\bullet} x) &&= \eta^{T} \mathbin{\bullet} (1_N \mathbin{\bullet} (!_{I} \mathbin{\bullet} x))\\ &= \eta^{T} \mathbin{\bullet} (1_N \mathbin{\bullet} \mathrm{id}_{1}) &&= \eta^{T} \mathbin{\bullet} U1_N. \end{alignat*}

Hence,

\begin{alignat*}{2} {\mathsf C}({\mathsf\Delta},N)_{I} (\eta^{T(N \times - )}\mathbin{\bullet} x,\eta^{T(N \times - )}\mathbin{\bullet} x) &={\mathsf C}({\mathsf\Delta}, N)_{I} (\eta^{T(N \times - )}\mathbin{\bullet} x,\eta^{T(N \times - )}\mathbin{\bullet} x) \\ &={\mathsf\Delta}^{}_{N} (T \pi_1 \mathbin{\bullet} (\eta^{T(N \times - )}\mathbin{\bullet} x),T \pi_1 \mathbin{\bullet} (\eta^{T(N \times - )}\mathbin{\bullet} x)\\ %&=\asg {}N (T \pi_1 \circ \eta^{T} \circ \langle 1_N,x\rangle,T \pi_1 \circ \eta^{T} \circ \langle 1_N,x\rangle)\\ &={\mathsf\Delta}^{}_{N} (\eta^{T} \mathbin{\bullet} U1_N,\eta^{T} \mathbin{\bullet} U1_N) \\ &\leq 0_{\mathcal{Q}}. \end{alignat*}

We next show the $ \operatorname{Eq} $ -composability of $ {\mathsf C}({\mathsf\Delta}, N) $ . For any $ f \colon I \to T(N \times I) $ , we define $ h_f \colon N \times I \to T(N) $ by $ h_f = T(\star) \circ \theta_{N,N} \circ (\mathrm{id}_N \times (T \pi_1 \circ f)) $ . Then, we have $ T \pi_1 \mathbin{\bullet} f^{\sharp(T(N \times I))}\mathbin{\bullet} \nu = h_f^{\sharp T}\mathbin{\bullet} \nu $ for any $ \nu \in U(T(N \times I)) $ . First, for all $ m,n \in UN $ , we have

\begin{alignat*}{2} (T(\star) \circ (\eta_{N \times N})_n) \mathbin{\bullet} m &= T(\star) \mathbin{\bullet} (\eta_{N \times N} \mathbin{\bullet} \langle n,m\rangle) &&= (T(\star) \circ \eta_{N \times N}) \mathbin{\bullet} \langle n,m\rangle)\\ &= (\eta_{N}\circ \star) \mathbin{\bullet} \langle n,m\rangle) &&= \eta_{N} \mathbin{\bullet} (\star \mathbin{\bullet} \langle n,m\rangle)\\ &= \eta_{N} \mathbin{\bullet} (\star_n \mathbin{\bullet} m) &&= (\eta_{N} \circ \star_n ) \mathbin{\bullet} m. \end{alignat*}

From this and the equality (1), we can calculate as follows:

From the assumption $ {\mathsf\Delta}^{}_{N \times I} (c_1, c_2) \leq {\mathsf C}({\mathsf\Delta}, N)_{I} (c_1, c_2) $ , the $ \operatorname{Eq} $ -unit reflexivity and $ \operatorname{Eq} $ -composability of the original divergence $ {\mathsf\Delta} $ , we obtain the $ \operatorname{Eq} $ -composability of $ {\mathsf C}({\mathsf\Delta}, N) $ as follows:

This completes the proof.

Proof. (Proof of Proposition 18)

We consider a preorder $ \sqsubseteq $ on a monad T. We define the $ {\mathcal{B}} $ -divergence $ {\mathsf\Delta}^{\sqsubseteq}_{ }{} $ on TI by:

\[ {\mathsf\Delta}^{\sqsubseteq}_{I} (c_1,c_2) \triangleq \begin{cases} 0 & c_1 \not\sqsubseteq_I c_2 \\ 1 & c_1 \sqsubseteq_I c_2 \end{cases} \]

Each $ \tilde{\mathsf\Delta}(1)I $ is a preorder because $ \tilde{\mathsf\Delta}(1)I = {\sqsubseteq_I} $ holds for each I.

The $ \operatorname{Eq} $ -unit reflexivity of $ {\mathsf\Delta}^{\sqsubseteq}_{ }{} $ is derived from the reflexivity of $ \sqsubseteq $ . For all set I and $ c \in TI $ ,

\[ ({\mathsf\Delta}^{\sqsubseteq}_{I} (c,c) \leq 1) \iff (c \sqsubseteq_I c). \]

Since $ \sqsubseteq $ is a preorder on T, for any $ I,J\in{\bf Set} $ , $ c_1,c_2 \in TI $ and $ f,g \colon I \to TJ $ ,

\begin{align*} &({\mathsf\Delta}^{\sqsubseteq}_{I} (c_1, c_2) \times \sup_{x \in I}{\mathsf\Delta}^{\sqsubseteq}_{J} (f(x), g(x)) ) = 1\\ &\iff ({\mathsf\Delta}^{\sqsubseteq}_{I} (c_1, c_2) = 1) \land (\sup_{x \in I}{\mathsf\Delta}^{\sqsubseteq}_{J} (f(x), g(x)) = 1)\\ &\iff (c_1 \sqsubseteq_I c_2) \land (\forall{x \in I}.~ f(x) \sqsubseteq_J g(x))\\ &\implies (f^\sharp (c_1) \sqsubseteq_J f^\sharp (c_2)) \land (f^\sharp (c_2) \sqsubseteq_J g^\sharp (c_2))\\ &\implies (f^\sharp (c_1) \sqsubseteq_J g^\sharp (c_2))\\ &\iff({\mathsf\Delta}^{\sqsubseteq}_{J} (f^\sharp (c_1), g^\sharp (c_2)) = 1) \end{align*}

Hence, we have the $ \operatorname{Eq} $ -composability:

\begin{align*} {\mathsf\Delta}^{\sqsubseteq}_{J} (f^\sharp (c_1), g^\sharp (c_2)) \leq {\mathsf\Delta}^{\sqsubseteq}_{I} (c_1, c_2) \times \sup_{x \in I} {\mathsf\Delta}^{\sqsubseteq}_{J} (f(x), g(x)). \end{align*}

Conversely, we consider an $ \operatorname{Eq} $ -relative $ {\mathcal{B}} $ -divergence $ {\mathsf\Delta} $ on T such that each $ \tilde {\mathsf\Delta} (1) I $ is a preorder. We show that the family $ \sqsubseteq^{\mathsf\Delta} = \{ \sqsubseteq^{\mathsf\Delta}_I \}_{I \in {\bf Set}} $ defined by $ {\sqsubseteq^{\mathsf\Delta}_I} \triangleq \tilde {\mathsf\Delta} (1) I $ forms a preorder on monad T.

Each component $ \sqsubseteq^{\mathsf\Delta}_I $ of $ \sqsubseteq^{\mathsf\Delta} $ at set I is a preorder on the set TI. We here note that the divergence $ {\mathsf\Delta} $ must be reflexive (i.e. $ {\mathsf\Delta}^{}_{I}(c,c) \leq 1 $ for all $ I \in {\bf Set},c \in TI $ ) because of the reflexivity of $ \sqsubseteq^{\mathsf\Delta}_I $ :

\[ ({\mathsf\Delta}^{}_{I}(c,c) \leq 1) \iff (c \sqsubseteq^{\mathsf\Delta}_I c), \quad \text{ for all } I\in {\bf Set}, c \in TI. \]

From the reflexivity and $ \operatorname{Eq} $ -composability of $ {\mathsf\Delta} $ , we have for all $ c_1,c_2,c \in TI $ and $ f,g \colon I \to TJ $ ,

(30)
(31)

They are equivalent to the substitutivity and congruence of $ \sqsubseteq^{\mathsf\Delta} $ , respectively:

\begin{align*} (\ref{divergence_premonad_subst}) &\iff \forall{c_1,c_2 \in TI,f\colon I \to TJ}~.~ (c_1 \sqsubseteq^{\mathsf\Delta}_I c_2 \implies f^\sharp (c_1) \sqsubseteq^{\mathsf\Delta}_J f^\sharp (c_2)),\\ (\ref{divergence_premonad_congr}) &\iff \forall{c \in TI, f,g \colon I \to TJ}~.~ (\forall{x \in I}.~ f(x) \sqsubseteq^{\mathsf\Delta}_J g(x) \implies f^\sharp (c) \sqsubseteq^{\mathsf\Delta}_J g^\sharp (c)). \end{align*}

Finally, the above conversions $ {\mathsf\Delta}^{(-)} $ and $ \sqsubseteq^{(-)} $ are mutually inverse:

\begin{align*} {\mathsf\Delta}^{\sqsubseteq^{{\mathsf\Delta}'}}_{I} (c_1,c_2) \leq 1 \iff c_1 \sqsubseteq^{{\mathsf\Delta}'}_I c_2 \iff {\mathsf\Delta}'_I(c_1,c_2) \leq 1,\\ c_1 \sqsubseteq^{{\mathsf\Delta}^{\sqsubseteq'}}_I c_2 \iff {\mathsf\Delta}^{\sqsubseteq'}_I(c_1,c_2) \leq 1 \iff c_1 \sqsubseteq'_I c_2. \end{align*}

This completes the proof.

Appendix C. Proofs for Section 5 (Properties of Divergences on Monads)

Proof. (Proof of Theorem 21) First, it is easy to see that the inequality (3) is equivalent to $ {\mathsf\Delta} $ satisfying E-unit reflexivity.

We next show that the inequality (4) is equivalent to $ {\mathsf\Delta} $ satisfying E-composability.

(only if) Since $ U 1 = \{ {\rm id}_1 \} $ , we have $ R_{E 1} = \{ ({\rm id}_1, {\rm id}_1) \} $ . Therefore, it holds $ d^{\mathsf\Delta}_{1, J} (c_1, c_2) = {\mathsf\Delta}^{}_{J} (c_1, c_2) $ . By letting $ I = 1 $ in the inequality (4), we obtain the E-composability:

\begin{eqnarray*} & & d^{\mathsf\Delta}_{1, K} (f_1 \circ_{{\mathbb{C}}_T} c_1, f_2 \circ_{{\mathbb{C}}_T} c_2) \leq d^{\mathsf\Delta}_{J, K} (f_1, f_2) + d^{\mathsf\Delta}_{1, J} (c_1, c_2)\\ & \iff & {\mathsf\Delta}^{}_{K} (f_1^\sharp \circ c_1, f_2^\sharp \circ c_2) \leq \sup_{(x_1, x_2) \in E I} {\mathsf\Delta}^{}_{K} (f_1 \mathbin{\bullet} x_1, f_2 \mathbin{\bullet} x_2) + {\mathsf\Delta}^{}_{J} (c_1, c_2). \end{eqnarray*}

(if) From the E-composability, for any $ f_1, f_2 : I \to T J $ and $ g_1, g_2 : J \to T K $ and $ (x_1, x_2) \in E I $ , we have

\begin{equation*} {\mathsf\Delta}^{}_{K} (g_1^\sharp \mathbin{\bullet} (f_1 \mathbin{\bullet} x_1), g_2^\sharp \mathbin{\bullet} (f_2 \mathbin{\bullet} x_2)) \leq d^{\mathsf\Delta}_{J, K} (g_1, g_2) + {\mathsf\Delta}^{}_{J} (f_1 \mathbin{\bullet} x_1, f_2 \mathbin{\bullet} x_2) . \end{equation*}

Next, for any $ (x_1, x_2) \in E I $ , we have $ {\mathsf\Delta}^{}_{J} (f_1 \mathbin{\bullet} x_1, f_2 \mathbin{\bullet} x_2) \leq d^{\mathsf\Delta}_{I, J} (f_1, f_2) $ . Thus, by monotonicity of $ (+) $ we have

\begin{equation*} {\mathsf\Delta}^{}_{K} (g_1^\sharp \mathbin{\bullet} f_1 \mathbin{\bullet} x_1, g_2^\sharp \mathbin{\bullet} f_2 \mathbin{\bullet} x_2) \leq d^{\mathsf\Delta}_{J, K} (g_1, g_2) + d^{\mathsf\Delta}_{I, J} (f_1, f_2) . \end{equation*}

By discharging $ (x_1, x_2) \in E I $ , we conclude

\begin{equation*} d^{\mathsf\Delta}_{I, K} (g_1^\sharp \circ f_1, g_2^\sharp \circ f_2) \leq d^{\mathsf\Delta}_{J, K} (g_1, g_2) + d^{\mathsf\Delta}_{I, J} (f_1, f_2) . \end{equation*}

Proof. (Proof of Lemma 22) We write V for $ V_{{\mathcal{Q}},{\mathbb{C}}} $ . From $ x+\top=\top $ , we obtain the following equivalence (below $ z_1,z_2\in U(VZ) $ and $ x_1,x_2\in U(VX) $ ):

\begin{align*} & f\in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})(Z\otimes X,Y)\\ & \iff \forall{z_1,z_2,x_1,x_2}~.~d_Y(f\bullet\langle z_1,x_1\rangle,f\bullet\langle z_2,x_2\rangle)\le d_Z(z_1,z_2)+d_X(x_1,x_2)\\ & \iff \forall{z_1,z_2,x_1,x_2,d_X(x_1,x_2)=0}~.~d_Y(f\bullet\langle z_1,x_1\rangle,f\bullet\langle z_2,x_2\rangle)\le d_Z(z_1,z_2)\\ & \iff \forall{z_1,z_2}~.~\sup_{x_1,x_2,d_X(x_1,x_2)=0}d_Y(f\bullet\langle z_1,x_1\rangle,f\bullet\langle z_2,x_2\rangle)\le d_Z(z_1,z_2)\\ & \iff \forall{z_1,z_2}~.~d_{X\multimap Y}(\lambda(f)\bullet z_1,\lambda(f)\bullet z_2)\le d_Z(z_1,z_2)\\ & \iff \lambda(f)\in{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})(Z,X\multimap Y). \end{align*}

This shows that the currying bijection $ \lambda:{\mathbb{C}}(VZ\times VX,VY)\to {\mathbb{C}}(VZ,VX\Rightarrow VY) $ restricts to the one from $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})(Z\otimes X,Y) $ to $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})(Z,X\multimap Y) $ , showing that $ V_{{\mathcal{Q}},{\mathbb{C}}} $ is a map of adjunction.

Proof. (Proof of Theorem 25) $ [{\mathsf\Delta}] $ is a graded variant of the codensity lifting performed along the fibration $ V_{{\mathcal{Q}},{\mathbb{C}}}:{\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}})\to{{\mathbb{C}}} $ (Katsumata et al., Reference Katsumata, Sato and Uustalu2018, see also Definition 37). Proving that it is a graded lifting of T is routine. We show $ {\mathsf\Delta}^m_I = [{\mathsf\Delta}] m ( {E}^{\mathcal{Q}}I) $ . The direction $ [{\mathsf\Delta}] m ( {E}^{\mathcal{Q}}I) \preceq_{TI} {\mathsf\Delta}^m_I $ is easy. We show the converse. From the composability of $ {\mathsf\Delta} $ , for any $ c_1, c_2 \in U (T I) $ , $ J \in {\mathbb{C}} $ , $ n \in M $ and $ f \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) ( {E}^{\mathcal{Q}}I, {\mathsf\Delta}^n_J) $ , we have

\[ {\mathsf\Delta}^{m \cdot n}_J (f^\sharp \mathbin{\bullet} c_1, f^\sharp \mathbin{\bullet} c_2) \leq {\mathsf\Delta}^m_I (c_1, c_2) + \sup_{(x_1, x_2) \in E I} {\mathsf\Delta}^n_J (f \mathbin{\bullet} x_1, f \mathbin{\bullet} x_2) . \]

Next, the nonexpansivity of f is equivalent to

\[ \sup_{(x_1, x_2) \in E I} {\mathsf\Delta}^n_J (f \mathbin{\bullet} x_1, f \mathbin{\bullet} x_2) \le 0. \]

Therefore, we conclude $ {\mathsf\Delta}^{m \cdot n}_J (f^\sharp \mathbin{\bullet} c_1, f^\sharp \mathbin{\bullet} c_2) \leq {\mathsf\Delta}^m_I (c_1, c_2) $ . By discharging J, n, f, we conclude the inequality $ {\mathsf\Delta}^m_I\preceq_{TI} [{\mathsf\Delta}] m ( {E}^{\mathcal{Q}}I) $ .

Proof. (Proof of Theorem 26) Let $ {\mathsf\Delta} \in {\bf Div}(T, E,M,{\mathcal{Q}}) $ . We have already shown that $ [{\mathsf\Delta}] $ is an M-graded $ {\mathcal{Q}} $ -divergence lifting of T. We show that $ [{\mathsf\Delta}] $ is E-strong (this proof does not need the closedness of $ {\mathbb{C}} $ ). Let $ X \triangleq (I, d) \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ and $ J \in {\mathbb{C}} $ be objects. We first rewrite the goal:

In the step $\mathbin{\stackrel{\dagger}{\iff}}$ , we use the equality (1). To show this goal, we proceed as follows. Let $ x_1, x_2 \in U I, c_1, c_2 \in U (T J), K \in {\mathbb{C}}, n \in M $ and $ f \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) (X \otimes {E}^{\mathcal{Q}}J, {\mathsf\Delta}^n_K) $ . First, from the composability of $ {\mathsf\Delta} $ , we obtain

\[ {\mathsf\Delta}^{m \cdot n}_K ((f_{x_1})^\sharp \mathbin{\bullet} c_1, (f_{x_2})^\sharp \mathbin{\bullet} c_2) \leq {\mathsf\Delta}_J^m (c_1, c_2) + \sup_{(y_1, y_2) \in E J} {\mathsf\Delta}^n_K (f_{x_1} \mathbin{\bullet} y_1, f_{x_2} \mathbin{\bullet} y_2) . \]

We look at summands of the right-hand side. First, we have $ {\mathsf\Delta}^m_J (c_1, c_2) \leq d_{[{\mathsf\Delta}] m ( {E}^{\mathcal{Q}}J)} (c_1, c_2) $ by Theorem 25. Next, from the nonexpansivity of f, for any $ x_1,x_2\in UI,y_1, y_2 \in U J $ , we have

\[ {\mathsf\Delta}^n_K (f_{x_1} \mathbin{\bullet} y_1, f_{x_2} \mathbin{\bullet} y_2)= {\mathsf\Delta}^n_K (f \mathbin{\bullet} \langle x_1, y_1 \rangle, f \mathbin{\bullet} \langle x_2, y_2 \rangle) \leq d (x_1, x_2) + {E}^{\mathcal{Q}}J (y_1, y_2) . \]

Because $ x+\top=\top $ , we obtain

\[ \forall{x_1, x_2 \in U I}~.~ \sup_{(y_1, y_2) \in E J} {\mathsf\Delta}^n_K (f_{x_1} \mathbin{\bullet} y_1, f_{x_2} \mathbin{\bullet} y_2) \leq d (x_1, x_2) . \]

Therefore, we obtain the goal:

\[ {\mathsf\Delta}^{m \cdot n}_K ((f_{x_1})^\sharp \mathbin{\bullet} c_1, (f_{x_2})^\sharp \mathbin{\bullet} c_2) \leq d (x_1, x_2) + d_{[{\mathsf\Delta}] m ( {E}^{\mathcal{Q}}J)} (c_1, c_2). \]

Next, let $ \dot{T} \in {\bf SGDLift}(T, E,M,{\mathcal{Q}}) $ . We show $ \langle \dot{T}\rangle \in {\bf Div}(T, E,M,{\mathcal{Q}}) $ .

The unit law of $ \dot{T} $ immediately entails

\[ \eta_I \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) ( {E}^{\mathcal{Q}}I, \dot{T} 1( {E}^{\mathcal{Q}}I)) . \]

Next, under the assumption on $ ({\mathbb{C}}, T) $ and $ {\mathcal{Q}} $ , in $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ the functor $ (-) \otimes {E}^{\mathcal{Q}}I $ has a right adjoint $ {E}^{\mathcal{Q}}I \multimap (-) $ above the adjunction $ (-) \times I \dashv I \Rightarrow (-) $ (Lemma 22). Therefore, each component of the uncurried bind morphism ub given in (7) are nonexpansive morphisms in $ {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) $ :

Therefore, we conclude

\[ \mathrm{ub} \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) (\langle \dot{T}\rangle m ( {E}^{\mathcal{Q}}I) \otimes ( {E}^{\mathcal{Q}}I \multimap \langle \dot{T}\rangle n ( {E}^{\mathcal{Q}}J)), \langle \dot{T}\rangle (m \cdot n) ( {E}^{\mathcal{Q}}J)) . \]

We also easily have monotonicity: $ \langle \dot{T}\rangle m ( {E}^{\mathcal{Q}}I) \leq \langle \dot{T}\rangle n ( {E}^{\mathcal{Q}}I) $ for $ m \leq n $ by condition 1 of graded divergence lifting. We thus conclude that $ \langle \dot{T}\rangle m {E}^{\mathcal{Q}}I \in {\bf Div}(T, E,M,{\mathcal{Q}}) $ .

We finally show $ \dot{T} \preceq [\langle \dot{T}\rangle] $ . Let $ c_1, c_2 \in U (T I) $ . We show

(32) \begin{equation} \label{eq:goal} \sup_{n \in M, J \in {\mathbb{C}}, f \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) (X, \dot{T} n ( {E}^{\mathcal{Q}}J))} d_{\dot{T} (m \cdot n) ( {E}^{\mathcal{Q}}J)} (f^\sharp (c_1), f^\sharp (c_2)) \leq d_{\dot{T} m X} (c_1, c_2) . \end{equation}

Let $ n \in M, J \in {\mathbb{C}}, f \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) (X, \dot{T} n ( {E}^{\mathcal{Q}} J)) $ . Since $ \dot{T} $ is an M-graded $ {\mathcal{Q}} $ -divergence lifting of T, we obtain

\[ f^\sharp \in {\bf Div}_{{\mathcal{Q}}}({{\mathbb{C}}}) (\dot{T} m X, \dot{T} (m \cdot n) ( {E}^{\mathcal{Q}}J)) . \]

This implies the inequality $ d_{\dot{T} (m \cdot n) ( {E}^{\mathcal{Q}}J)} (f^\sharp (c_1), f^\sharp (c_2)) \leq d_{\dot{T} m X} (c_1, c_2) $ in $ {\mathcal{Q}} $ . By taking the sup for n, J, f, we obtain the inequality (32).

Proof. (Proof of Proposition 28) We write $ |1| = \{ \ast \} $ . We first check the measurable isomorphism $ G_{s} 1 \cong [0,1] $ . The measurable functions $ \mathrm{ev}_{\{\ast\}} \colon G_{s} 1 \to [0,1] $ ( $ \nu \mapsto \nu (\ast) $ ) and the function $ H \colon | [0,1]| \to |G_{s} 1| $ ( $ r \mapsto r \cdot \mathbf{d}_{\ast} $ ) are mutually inverse. For any (Borel-)measurable $ U \in \Sigma_{[0,1]} $ , we have $ H^{-1}(\mathrm{ev}_{\{\ast\}}^{-1} (U)) = U $ and $ H^{-1}(\mathrm{ev}_{\emptyset}^{-1} (U)) = [0,1] $ if $ 0 \in U $ and $ H^{-1}(\mathrm{ev}_{\emptyset}^{-1} (U)) = \emptyset $ otherwise. Since all generators of $ \Sigma_{G_{s} 1} $ are $ \mathrm{ev}_{\{\ast\}}^{-1}(U) $ and $ \mathrm{ev}_{\emptyset}^{-1} (U) $ where $ U \in \Sigma_{[0,1]} $ , we conclude the measurability of $ H \colon [0,1] \to G_{s} 1 $ . Thus, $ f \colon I \to [0,1] $ corresponds bijectively to $ H \circ f \colon I \to G_{s} 1 $ , and

\[ \int_I f d\nu_1 = \int_I \mathrm{ev}_{\{\ast\}} \circ H \circ f d\nu_1 = ((H \circ f)^\sharp \nu_1) (\{\ast \}). \]

We then obtain, for all $ I \in {\bf Meas} $ , $ \nu_1,\nu_2 \in G_{s} I $

\begin{align*} \mathsf{DP}_{I}^{\varepsilon} (\nu_1,\nu_2) &= \sup_{S \in \Sigma_I}(\nu_1(S) - \exp(\varepsilon)\nu_2(S))\\ &= \sup_{ S \in \Sigma_I }\left(\int_I \chi_S d\nu_1 - \exp(\varepsilon)\int_I \chi_S d\nu_2\right)\\ &\leq \sup_{f \colon I \to [0,1] }\left(\int_I f d\nu_1 - \exp(\varepsilon)\int_I f d\nu_2\right)\\ &= \sup_{f \colon I \to [0,1] }(((H \circ f)^\sharp\nu_1)(\ast) - \exp(\varepsilon)((H \circ f)^\sharp\nu_2)(\ast))\\ &\leq \sup_{f \colon I \to [0,1] }\sup_{S'\in \Sigma_1 (\iff S' = \{\ast\},\emptyset)}((H \circ f)^\sharp\nu_1)(S') - \exp(\varepsilon)((H \circ f)^\sharp\nu_2)(S'))\\ &= \sup_{f \colon I \to [0,1] } \mathsf{DP}_{1}^{\varepsilon} ((H \circ f)^\sharp\nu_1,(H \circ f)^\sharp\nu_2)\\ &= \sup_{g \colon I \to G_{s} 1 } \mathsf{DP}_{1}^{\varepsilon} (g^\sharp\nu_1,g^\sharp\nu_2)\\ &\leq \mathsf{DP}_{I}^{\varepsilon} (\nu_1,\nu_2). \end{align*}

The first inequality is given by $ \nu(S) = \int_I \chi_S d\nu $ where $ \chi_S \colon I \to [0,1] $ is the indicator function of S defined by $ \chi_S(x) = 1 $ when $ x \in S $ and $ \chi_S(x) = 0 $ otherwise. The last inequality is given by the data processing inequality which is given by the reflexivity and $ \operatorname{Eq}{} $ -composability of $ \mathsf{DP} $ .

Proof. (Proof of Proposition 29) We first prove that $ \mathsf{TV} $ is not 1-generated. We write $ |2| = \{0,1\} $ . We define $ \nu_1,\nu_2 \in G_{s} 2 $ by:

\[ \nu_1 = \frac{1}{2}\cdot\mathbf{d}_0 + \frac{1}{2}\cdot\mathbf{d}_1, \qquad \nu_2 = \frac{1}{3}\cdot\mathbf{d}_0 + \frac{2}{3}\cdot\mathbf{d}_1. \]

Then the total variation distance between them is calculated by:

\[ \mathsf{TV}_{2}(\nu_1,\nu_2) = \frac{1}{2}\left(\left|\frac{1}{2}-\frac{1}{3} \right| + \left|\frac{1}{2} - \frac{2}{3} \right|\right) = \frac{1}{6}. \]

On the other hand, for any $ f \colon 2 \to G_{s}1 $ , we have

\begin{align*} \mathsf{TV}_{1}(f^\sharp(\nu_1),f^\sharp(\nu_2)) &= \frac{1}{2} \left|\frac{1}{2}f(0) + \frac{1}{2}f(1) - \frac{1}{3}f(0) - \frac{2}{3}f(1)\right|\\ &= \frac{1}{2} \left|\frac{1}{6}f(0) - \frac{1}{6}f(1)\right|\\ &= \frac{1}{12} \left| f(0) - f(1) \right| \\ & \leq \frac{1}{12}. \end{align*}

This implies that $ \mathsf{TV} $ is not 1-generated.

Next, we prove that $ \mathsf{TV} $ is 2-generated. From the data processing inequality $ \mathsf{TV} $ which is given by the reflexivity and $ \operatorname{Eq}{} $ -composability of $ \mathsf{TV} $ , we obtain for any $ \nu_1,\nu_2 \in G_{s} I $ ,

\[ \mathsf{TV}_{I} (\nu_1,\nu_2) \geq \sup_{g \colon I \to G_{s} 2}\mathsf{TV}_{2} (g^\sharp \nu_1, g^\sharp \nu_2). \]

We show that the above inequality becomes the equality for some g.

We fix $ \nu_1,\nu_2 \in G_{s} I $ , a base measure $ \mu $ over I satisfying the absolute continuity $ \nu_1,\nu_2 \ll \mu $ and the Radon–Nikodym derivatives (density functions) $ \frac{d\nu_1}{d\mu}, \frac{d\nu_2}{d\mu} $ of $ \nu_1,\nu_2 $ with respect to $ \mu $ , respectively.

Let $ A = (\frac{d\nu_1}{d\mu} - \frac{d\nu_2}{d\mu})^{-1}([0,\infty)) $ and $ B = I \setminus A $ . We define $ g \colon I \to G_{s}2 $ by $ g(x) = \mathbf{d}_0 $ if $ x \in B $ and $ g(x) = \mathbf{d}_1 $ otherwise. Then for any $ \nu \in G_{s} I $ , we have

\[ (g^\sharp \nu)(\{0\}) = \int_I g(-)(\{0\}) d\nu = \int_A g(-)(\{0\}) d\nu + \int_B g(-)(\{0\}) d\nu = \int_A 1 d\nu + \int_B 0 d\nu = \nu(A). \]

Similarly, we have $ (g^\sharp \nu)(\{1\}) = \nu(B) $ . Therefore, we obtain

\begin{align*} \frac{1}{2}\mathsf{TV}_{I}(\mu_1,\mu_2) &=\frac{1}{2}\int_I \left| \frac{d\nu_1}{d\mu}(x) - \frac{d\nu_2}{d\mu}(x) \right|~d\mu(x)\\ &=\frac{1}{2}\int_{A} \frac{d\nu_1}{d\mu}(x) - \frac{d\nu_2}{d\mu}(x)~d\mu(x) + \frac{1}{2}\int_{B} \frac{d\nu_2}{d\mu}(x) - \frac{d\nu_1}{d\mu}(x) ~ d\mu(x)\\ &=\frac{1}{2}(\nu_1(A) - \nu_2(A) + \nu_2(B) - \nu_1(B))\\ &=\frac{1}{2}((g^\sharp\nu_1)(\{ 0 \}) - (g^\sharp\nu_2)(\{ 0 \}) + (g^\sharp\nu_2)(\{ 1 \}) - (g^\sharp\nu_2)(\{ 1 \}))\\ &=\frac{1}{2}(| (g^\sharp\nu_1)(\{ 0 \}) - (g^\sharp\nu_2)(\{ 0 \})| + |(g^\sharp\nu_2)(\{ 1 \}) - (g^\sharp\nu_2)(\{ 1 \}) |)\\ &= \mathsf{TV}_{2}(g^\sharp(\mu_1),g^\sharp(\mu_2)) \end{align*}

We then conclude that $ \Delta^{\mathrm{TV}} $ is 2-generated.

Proof. (Proof of Proposition 30) For all set J and $ c_1,c_2 \in TJ $ , we have

\begin{align*} {\mathsf\Delta}^{[\leq]^\Omega}_{J}(c_1,c_2) = 1 &\iff c_1 [\leq]^\Omega_J c_2\\ &\iff \bigwedge_{g \colon J \to T\Omega} g^\sharp (c_1) \leq g^\sharp (c_2)\\ &\iff \bigwedge_{g \colon J \to T\Omega} g^\sharp (c_1) \mathbin{[\leq]^\Omega_\Omega} g^\sharp (c_2)\\ &\iff \sup_{g \colon J \to T\Omega} {\mathsf\Delta}([\leq]^\Omega)_\Omega (g^\sharp (c_1),g^\sharp (c_2)) = 1. \end{align*}

This implies that $ {\mathsf\Delta}^{[\leq]^\Omega}_{ }{} $ is $ \Omega $ -generated.

Lemma 47. For any $ U\in{\bf QET}(\Sigma,\Omega),t,u\in T_\Sigma\Omega $ and $ \varepsilon\in\mathbb Q^+ $ , we have

\begin{equation*} D[U](t,u)\le\varepsilon\iff \emptyset\vdash {t}=_{\varepsilon}{u}{ } \in U. \end{equation*}

Proof. ( $ \implies $ ) Assume $ D[U](t,u) \leq \varepsilon $ . We first fix an arbitrary $ \varepsilon' \in \mathbb Q^+ $ such that $ \varepsilon < \varepsilon' $ . By the definition of D[U], there is $ \varepsilon^\ast \in \mathbb Q^+ $ such that $ D[U](t,u) \leq \varepsilon^\ast < \varepsilon' $ and $ \emptyset \vdash {t}=_{\varepsilon^\ast}{u} \in U $ . Here $ (\varepsilon - \varepsilon^\ast) \in \mathbb Q^+ $ and $ \varepsilon' = \varepsilon^\ast + (\varepsilon' - \varepsilon^\ast) $ . Therefore, by (Max) and (cut), we conclude $ \emptyset \vdash {t}=_{\varepsilon'}{u} \in U $ . Since $ \varepsilon' $ is arbitrary, we obtain $ \left\{\emptyset \vdash {t}=_{\varepsilon'}{u} \mathrel{}\middle|\mathrel{} \varepsilon' \in \mathbb Q^+, \varepsilon < \varepsilon' \right\} \subseteq U $ . Hence, by (Arch) and (cut), we have $ \emptyset \vdash {t}=_{\varepsilon}{u} \in U $ . ( $ \impliedby $ ) Obvious.

Lemma 48. For any $ U \in {\bf QET}(\Sigma ,\Omega ) $ , the function $ D[U] \colon (T_\Sigma \Omega )^2 \to {\mathcal{R}}^+ $ defined by (9) is a CS-PMet on $ T_\Sigma \Omega $ .

Proof. That D[U] is a pseudometric is shown in the beginning of (Mardare et al., Reference Mardare, Panangaden and Plotkin2016, Section 5).

We check the substitutivity. Let $ t,u \in T_\Sigma \Omega $ and $ h \colon \Omega \to T_\Sigma \Omega $ . By (Subst), we have

\[ \forall{\varepsilon\in\mathbb Q^+}~.~ \emptyset\vdash {t}=_{\varepsilon}{u} \in U \implies \emptyset \vdash{h^\sharp(t)}=_{\varepsilon}{h^\sharp(u)} \in U. \]

Since $ \varepsilon $ is arbitrary, we conclude the substitutivity as follows:

\begin{align*} D[U](h^\sharp(t),h^\sharp(u)) &= \inf \left\{ \varepsilon \in \mathbb Q^+ \mathrel{}\middle|\mathrel{} \emptyset\vdash {h^\sharp(t)}=_{\varepsilon}{h^\sharp(u)} \in U \right\} \\ &\leq \inf \left\{ \varepsilon \in \mathbb Q^+ \mathrel{}\middle|\mathrel{} \emptyset \vdash {t}=_{\varepsilon}{u} \in U \right\}\\ &= D[U](t,u). \end{align*}

We check the congruence. Let $ t \in T_\Sigma I $ and $ h_1,h_2 \colon I \to T_\Sigma \Omega $ . By unfolding the structure of t and applying (Nonexp) and (Cut) repeatedly, we have the implication:

(33) \begin{equation}\label{eq:QET=>CSPEMet:congr:1} \forall{i \in I,\varepsilon\in\mathbb Q^+}~.~ \emptyset\vdash {h_1(i)}=_{\varepsilon}{h_2(i)} \in U \implies \emptyset\vdash{h_1^\sharp(t)}=_{\varepsilon}{h_2^\sharp(t)} \in U. \end{equation}

From (33) and Lemma 47, we conclude the congruence as follows:

\begin{align*} D[U](h_1^\sharp(t),h_2^\sharp(t)) & = \inf \left\{ \varepsilon \in \mathbb Q^+ \mathrel{}\middle|\mathrel{} \emptyset\vdash{h_1^\sharp(t)}=_{\varepsilon}{h_2^\sharp(t)} \in U \right\}\\ & \leq \inf \left\{ \varepsilon \in \mathbb Q^+ \mathrel{}\middle|\mathrel{} \forall{i \in I}~.~ \emptyset \vdash {h_1(i)}=_{\varepsilon}{h_2(i)} \in U \right\}\\ & = \inf \left\{ \varepsilon \in \mathbb Q^+ \mathrel{}\middle|\mathrel{} \sup_{i \in I} D[U](h_1(i),h_2(i)) \leq \varepsilon \right\}\\ & = \sup_{i \in I} D[U](h_1(i),h_2(i)). \end{align*}

Here, the last equality follows from the density of $ \mathbb Q^+ $ .

The monotonicity of $ D[-] \colon ({\bf QET}(\Sigma ,\Omega ),\subseteq) \to ({\bf CSPMet}(T_\Sigma , \Omega ),\preceq) $ is shown as:

\begin{align*} U \subseteq V &\implies \forall{t,u \in T_\Sigma \Omega }~.~ \inf \left\{ \varepsilon \in \mathbb Q^+ \mathrel{}\middle|\mathrel{} \emptyset \vdash {t}=_{\varepsilon}{u} \in U \right\} \geq \inf \left\{ \varepsilon \in \mathbb Q^+ \mathrel{}\middle|\mathrel{} \emptyset \vdash {t}=_{\varepsilon}{u} \in V \right\}\\ & \iff \forall{t,u \in T_\Sigma \Omega }~.~ D[U](t,u) \geq D[V](t,u)\\ &\iff D[U] \preceq D[V]. \end{align*}

Lemma 49. Let T be a monad on Set and $ \Omega \in {\bf Set} $ . For any $ d \in {\bf CSPMet}(T, \Omega ) $ , the family $ \mathrm{Gen}(d) = \{\mathrm{Gen}(d)_I \colon (T I)^2 \to {\mathcal{R}}^+ \}_{I\in{\bf Set}} $ defined by (11) is an $ \Omega $ -generated $ \operatorname{Eq} $ -relative $ {\mathcal{R}}^+ $ -divergence on T where each $ \mathrm{Gen}(d)_I $ is a pseudometric.

Proof. From the reflexivity of d, we have the reflexivity of $ \mathrm{Gen}(d)_I $ : for each $ c \in TI $ ,

\[ \mathrm{Gen}(d)_I (c,c) = \sup_{k \colon I \to T\Omega } d(k^\sharp (c),k^\sharp (c)) = 0. \]

Hence, the $ \operatorname{Eq} $ -unit reflexivity of $ \mathrm{Gen}(d) $ follows. From the symmetry of d, we have the symmetry of $ \mathrm{Gen}(d)_I $ : for each $ c_1,c_2 \in TI $ ,

\begin{equation*} \mathrm{Gen}(d)_I (c_1,c_2) = \sup_{k \colon I \to T\Omega } d(k^\sharp (c_1),k^\sharp (c_2)) = \sup_{k \colon I \to T\Omega } d(k^\sharp (c_2),k^\sharp (c_1)) = \mathrm{Gen}(d)_I (c_2,c_1). \end{equation*}

From the triangle inequality of d, we have the triangle inequality of $ \mathrm{Gen}(d)_I $ : for all $ c_1,c_2,c_3 \in TI $ ,

\begin{align*} \mathrm{Gen}(d)_I (c_1,c_3) &= \sup_{k \colon I \to T\Omega } d(k^\sharp (c_1),k^\sharp (c_3))\\ &\leq \sup_{k \colon I \to T\Omega } d(k^\sharp (c_1),k^\sharp (c_2)) + d(k^\sharp (c_2),k^\sharp (c_3))\\ &\leq \sup_{k \colon I \to T\Omega } d(k^\sharp (c_1),k^\sharp (c_2)) + \sup_{k \colon I \to T\Omega } d(k^\sharp (c_2),k^\sharp (c_3))\\ &= \mathrm{Gen}(d)_I (c_1,c_2) +\mathrm{Gen}(d)_I (c_2,c_3) . \end{align*}

Using the reflexivity, congruence, and substitutivity of d and the triangle inequality of $ \mathrm{Gen}(d)_I $ , we next show the composability. Let $ c_1,c_2 \in TI $ and $ f_1,f_2 \colon I \to TJ $ . We obtain

We show the $ \Omega $ -generatedness of $ \mathrm{Gen}(d) $ :

\begin{align*} \mathrm{Gen}(d)_I(c_1,c_2) &= \sup_{l \colon I \to T\Omega } d (l ^\sharp (c_1), l ^\sharp (c_2))\\ (*) &= \sup_{h \colon \Omega \to T\Omega , k \colon I \to T\Omega } d ((h^\sharp \circ k)^\sharp (c_1), (h^\sharp \circ k)^\sharp (c_2))\\ &= \sup_{k \colon I \to T\Omega } \sup_{h \colon \Omega \to T\Omega } d (h^\sharp (k^\sharp (c_1)), h^\sharp (k^\sharp (c_2)))\\ &= \sup_{k \colon I \to T\Omega } \mathrm{Gen}(d)_\Omega (k^\sharp(c_1),k^\sharp(c_2)). \end{align*}

The step (*) uses the equality $ \left\{l \mathrel{}\middle|\mathrel{} l \colon I \to T\Omega \right\} = \left\{ h^\sharp \circ k \mathrel{}\middle|\mathrel{} h \colon \Omega \to T\Omega , k \colon I \to T\Omega \right\} $ .

Gen is indeed a monotone function of type $ ({\bf CSPMet}(T , \Omega ),\preceq) \to ({\bf PMet}(T , \Omega ),\preceq) $ because

\begin{align*} d \preceq d' & \implies \forall{I\in{\bf Set}}~.~\forall{c_1,c_2 \in TI}~.~ \sup_{k \colon I \to T\Omega } d(k^\sharp (c_1),k^\sharp (c_2)) \geq \sup_{k \colon I \to T\Omega } d'(k^\sharp (c_1),k^\sharp (c_2))\\ &\iff \forall{I\in{\bf Set}}~.~\forall{c_1,c_2 \in TI}~.~ \mathrm{Gen}(d)_I(c_1,c_2) \geq \mathrm{Gen}(d')_I(c_1,c_2)\\ &\iff \mathrm{Gen}(d) \preceq \mathrm{Gen}(d'). \end{align*}

Proof. (Proof of Theorem 34) This is proved by Lemma 47, 48, 49.

Proof. (Proof of Theorem 35) Let $ d \in {\bf CSPMet}(T_\Sigma , \Omega ) $ . We show $ (\mathrm{Gen}(d))_\Omega = d $ . Let $ t,u \in T_\Sigma \Omega $ . From the substitutivity of d, we have

\[ \sup_{k:\Omega\to T_\Sigma\Omega} d(k^\sharp(t),k^\sharp(u))\leq d(t,u). \]

On the other hand, $ d(t,u)=d(\eta_\Omega^\sharp(t),\eta_\Omega^\sharp(u)) $ . Therefore, we conclude

\[ (\mathrm{Gen}(d))_\Omega (t,u) = \sup_{k \colon \Omega \to T_\Sigma\Omega } d (k^\sharp (t), k^\sharp (u)) = d (t,u). \]

Let $ {\mathsf\Delta} \in {\bf PMet}(T_\Sigma , \Omega ) $ . We show $ \mathrm{Gen}({\mathsf\Delta}_\Omega ) = {\mathsf\Delta} $ . By the $ \Omega $ -generatedness of $ {\mathsf\Delta} $ , we have, for all sets I and $ t,u \in T_\Sigma I $ ,

\[ \mathrm{Gen}(({\mathsf\Delta})_\Omega )_I (t,u) = \sup_{k \colon I \to T_\Sigma\Omega } {\mathsf\Delta}_\Omega (k^\sharp (t),k^\sharp (u)) = {\mathsf\Delta}^{}_{I} (t,u). \]

We show the adjointness: $ U[d] \subseteq V \iff d \preceq D[V] $ for any $ V \in {\bf QET}(\Sigma ,\Omega ) $ and $ d \in {\bf CSPMet}(T_\Sigma , \Omega ) $ .

\begin{align*} U[d] \subseteq V &\iff \overline{\left\{\emptyset \vdash {t}=_{\varepsilon}{u } \mathrel{}\middle|\mathrel{} \varepsilon \in \mathbb Q^+, d (t,u) \leq \varepsilon \right\}}^{\mathrm{QET}({\Sigma},\Omega)} \subseteq V\\ &\iff \forall{t,u \in T_\Sigma\Omega , \varepsilon \in \mathbb Q^+}~.~ d (t,u) \leq \varepsilon \implies \emptyset \vdash {t}=_{\varepsilon}{u } \in V\\ (*) &\iff \forall{t,u \in T_\Sigma\Omega , \varepsilon \in \mathbb Q^+}~.~ d (t,u) \leq \varepsilon \implies \inf\left\{\varepsilon' \in \mathbb Q^+ \mathrel{}\middle|\mathrel{} \emptyset \vdash {t}=_{\varepsilon'}{u} \in V\right\} \leq \varepsilon\\ &\iff \forall{t,u \in T_\Sigma\Omega }~.~ \inf\left\{\varepsilon' \in \mathbb Q^+ \mathrel{}\middle|\mathrel{} \emptyset \vdash {t}=_{\varepsilon'}{u} \in V\right\} \leq d (t,u)\\ &\iff d \preceq D[V] \end{align*}

The step (*) uses Lemma 47.

Let $ d \in {\bf CSPMet}(T_\Sigma , \Omega ) $ . We finally show $ D[U[d]] = d $ . By the adjunction $ U[-]\dashv D[-] $ , we have $ d\preceq D[U[d]] $ . We thus show $ D[U[d]]\preceq d $ . We unfold this goal:

\begin{align*} D[U[d]]\preceq d &\iff \forall{t,u \in T_\Sigma\Omega }~.~ d(t,u) \leq D[U[d]](t,u)\\ &\iff \forall{t,u \in T_\Sigma\Omega }~.~ d(t,u) \leq \inf \left\{ \varepsilon \in \mathbb Q^+ \mathrel{}\middle|\mathrel{} \emptyset \vdash {t}=_{\varepsilon}{u} \in U[d] \right\}\\ &\iff \forall{t,u \in T_\Sigma\Omega ,\varepsilon \in \mathbb Q^+}~.~ {\emptyset \vdash {t}=_{\varepsilon}{u} \in U[d]} \implies d(t,u) \leq \varepsilon\\ &\iff \{\emptyset \vdash {t}=_{\varepsilon}{u} \in U[d] \} \subseteq \left\{ \emptyset \vdash {t}=_{\varepsilon}{u} \mathrel{}\middle|\mathrel{} d(t,u) \leq \varepsilon \right\}. \end{align*}

Since U[d] is the least QET including $ \left\{ \emptyset \vdash {t}=_{\varepsilon}{u} \mathrel{}\middle|\mathrel{} d(t,u) \leq \varepsilon \right\} $ , it suffices to have a QET $ V \in {\bf QET}(\Sigma ,\Omega ) $ such that

\[ \{\emptyset \vdash {t}=_{\varepsilon}{u} \in V \} = \left\{ \emptyset \vdash {t}=_{\varepsilon}{u} \mathrel{}\middle|\mathrel{} d(t,u) \leq \varepsilon \right\}. \]

Inspired from the definition of models of QET (Bacci et al., Reference Bacci, Mardare, Panangaden, Plotkin, Gadducci and Silva2021), we define V as follows:

\begin{align*} &{\Gamma \vdash {t}=_{\varepsilon}{u}} \in V\\ &\iff \forall{\sigma \colon \Omega \to T_\Sigma \Omega }~.~\left(\left(\forall{{t'}=_{\varepsilon'}{u'} \in\Gamma}~.~d(\sigma^\sharp(t'),\sigma^\sharp(u')) \leq \varepsilon' \right) \implies d(\sigma^\sharp(t),\sigma^\sharp(u)) \leq \varepsilon\right). \end{align*}

By the substitutivity of d and the definition of V, we obtain, for all $ t,u \in T_\Sigma \Omega $ and $ \varepsilon \in \mathbb Q^+ $ ,

\[ {\emptyset \vdash {t}=_{\varepsilon}{u}} \in V \iff (\forall{\sigma \colon \Omega \to T_\Sigma \Omega }~.~ d(\sigma^\sharp(t),\sigma^\sharp(u)) \leq \varepsilon) \iff d(t,u) \leq \varepsilon. \]

We check that V satisfies all rules of QET:

(Ref) From the reflexivity of d, we have $ {\emptyset \vdash {t}=_{0}{t}} \in V $ :

\[ \forall{\sigma \colon \Omega \to T_\Sigma \Omega }~.~ \left(\top \implies d(\sigma^\sharp(t),\sigma^\sharp(t)) \leq 0 \right). \]

(Sym) From the symmetry of d, we have $ {\{{t}=_{\varepsilon}{u} \} \vdash {u}=_{\varepsilon}{t}} \in V $ :

\[ \forall{\sigma \colon \Omega \to T_\Sigma \Omega }~.~ \left(\left( d(\sigma^\sharp(t),\sigma^\sharp(u)) \leq \varepsilon \right) \implies d(\sigma^\sharp(u),\sigma^\sharp(t)) \leq \varepsilon \right). \]

(Tri) From the triangle inequality of d, we have $ {\{{t}=_{\varepsilon}{u}, {u}=_{\varepsilon'}{v} \} \vdash {t}=_{\varepsilon+\varepsilon'}{v}} \in V $ :

\[ \forall{\sigma \colon \Omega \to T_\Sigma \Omega }~.~ \left(\left( d(\sigma^\sharp(t),\sigma^\sharp(u)) \leq \varepsilon \land d(\sigma^\sharp(u),\sigma^\sharp(v)) \leq \varepsilon' \right) \implies d(\sigma^\sharp(t),\sigma^\sharp(v)) \leq \varepsilon + \varepsilon' \right). \]

(Max) Since $ (\mathbb Q^+,\leq,0,+) $ is a preordered monoid where the unit 0 is the least element, we have $ {\{{t}=_{\varepsilon}{u} \} \vdash {t}=_{\varepsilon+\varepsilon'}{u}} \in V $ :

\[ \forall{\sigma \colon \Omega \to T_\Sigma \Omega }~.~ \left(\left( d(\sigma^\sharp(t),\sigma^\sharp(u)) \leq \varepsilon + 0 \right) \implies d(\sigma^\sharp(t),\sigma^\sharp(u)) \leq \varepsilon + \varepsilon' \right). \]

(Arch) From the density of $ \mathbb Q^+ $ , we have

\[ \forall{\sigma \colon \Omega \to T_\Sigma \Omega }~.~ \left( d(\sigma^\sharp(t),\sigma^\sharp(u)) > \varepsilon \implies \left( \exists{\varepsilon' \in \mathbb Q^+ \text{ s.t. }\varepsilon < \varepsilon'}~.~ d(\sigma^\sharp(t),\sigma^\sharp(u)) > \varepsilon'\right) \right). \]

It is equivalent to $ {\left \{{t}=_{\varepsilon'}{u} \mathrel{}\middle|\mathrel{} \varepsilon < \varepsilon' \right\} \vdash {t}=_{\varepsilon}{u}} \in V $ :

\[ \forall{\sigma \colon \Omega \to T_\Sigma \Omega }~.~ \left(\left( \forall{\varepsilon' \in \mathbb Q^+ \text{ s.t. }\varepsilon < \varepsilon'}~.~ d(\sigma^\sharp(t),\sigma^\sharp(u)) \leq \varepsilon' \right) \implies d(\sigma^\sharp(t),\sigma^\sharp(u)) \leq \varepsilon \right). \]

(Nonexp) Let $ f \colon n \in \Sigma $ be a function and $ \left\{ {t_i}=_{\varepsilon}{u_i}\mathrel{}\middle|\mathrel{} 1 \leq i \leq n \right\} \subseteq \mathbb V (T_\Sigma \Omega ) $ be a set of quantitiative equations. Let $ I = \{1,\ldots,n\} $ . We then take $ t_f = f(1,\ldots,n) \in T_{\Sigma } I $ . We define $ t, s : I \rightarrow T_{\Sigma } \Omega $ by $ t(i) = t_i $ and $ s(i) = s_i $ ( $ i \in I $ ). We now fix an arbitrary $ \sigma \colon \Omega \to T_{\Sigma } \Omega $ , and assume $ d (\sigma^\sharp(t_i), \sigma^\sharp(s_i)) \leq \varepsilon $ for all $ i \in I $ . Then this asserts $ \sup_{i \in I} d (\sigma^\sharp(t (i)), \sigma^\sharp(s (i))) \leq \varepsilon $ . From the congruence of d, we obtain

\begin{align*} d (\sigma^\sharp (f (t_1,\ldots,t_n)), \sigma^\sharp(f (s_1,\ldots,s_n) ) ) &= d (\sigma^\sharp(t^\sharp (t_f)), \sigma^\sharp(s^\sharp (t_f))) \\ &\leq \sup_{i \in I} d (\sigma^\sharp(t (i)), \sigma^\sharp(s (i))) \leq \varepsilon. \end{align*}

Since $ \sigma $ is arbitrary, we conclude $ \left\{ {t_i}=_{\varepsilon}{u_i}\mathrel{}\middle|\mathrel{} 1 \leq i \leq n \right\} \vdash {f (t_1,\ldots,t_n)}=_{\varepsilon}{f (s_1,\ldots,s_n)} \in V $ .

(Subst) Immediate by definition of V:

\begin{align*} &{\Gamma \vdash {t}=_{\varepsilon}{u}} \in V\\ &\iff \forall{\sigma \colon \Omega \to T_\Sigma \Omega }~.~\left(\left(\forall{{t'}=_{\varepsilon'}{u'} \in\Gamma}~.~d(\sigma^\sharp(t'),\sigma^\sharp(u')) \leq \varepsilon' \right) \implies d(\sigma^\sharp(t),\sigma^\sharp(u)) \leq \varepsilon\right)\\ &\implies \forall{\sigma', \sigma \colon \Omega \to T_\Sigma \Omega }~.~\left( \begin{aligned} &\left(\forall{{t'}=_{\varepsilon'}{u'} \in\Gamma}~.~d( ( \sigma^\sharp \circ {\sigma'})^\sharp(t'),( \sigma^\sharp \circ {\sigma'})^\sharp(u')) \leq \varepsilon' \right) \\ &\qquad \implies d((\sigma^\sharp \circ {\sigma'})^\sharp(t),( \sigma^\sharp \circ {\sigma'})^\sharp(u)) \leq \varepsilon \end{aligned} \right)\\ &\iff \forall{\sigma' \colon \Omega \to T_\Sigma \Omega }~.~\forall{\sigma \colon \Omega \to T_\Sigma \Omega }~.~\left( \begin{aligned} &\left(\forall{{t'}=_{\varepsilon'}{u'} \in\Gamma}~.~d(\sigma^\sharp ({\sigma'}^\sharp(t')),\sigma^\sharp ({\sigma'}^\sharp(u'))) \leq \varepsilon' \right) \\ &\qquad \implies d(\sigma^\sharp ({\sigma'}^\sharp(t)),\sigma^\sharp ({\sigma'}^\sharp(u))) \leq \varepsilon \end{aligned} \right)\\ &\implies \forall{\sigma' \colon \Omega \to T_\Sigma \Omega }~.~\forall{\sigma \colon \Omega \to T_\Sigma \Omega }~.~\left( \begin{aligned} &\left(\forall{{t''}=_{\varepsilon'}{u''} \in\sigma'(\Gamma)}~.~d(\sigma^\sharp(t''),\sigma^\sharp(u'')) \leq \varepsilon' \right) \\ &\qquad \implies d(\sigma^\sharp({\sigma'}^\sharp(t)),\sigma^\sharp({\sigma'}^\sharp(u))) \leq \varepsilon \end{aligned} \right)\\ &\iff \forall{\sigma'\colon \Omega \to T_\Sigma \Omega }~.~ {\sigma'}(\Gamma) \vdash {{\sigma'}^\sharp(t)}=_{\varepsilon}{{\sigma'}^\sharp(u)} \in V. \end{align*}

(Cut) Assume $ \Gamma' \vdash{t}=_{\varepsilon}{u} $ and $ \Gamma \vdash \psi \in V $ holds for all $ \psi \in \Gamma' $ . Fix an arbitrary $ \sigma \colon \Omega \to T_\Sigma \Omega $ , and assume $ \left(\forall{{t'}=_{\varepsilon''}{u''} \in\Gamma}~.~d(\sigma^\sharp(t''),\sigma^\sharp(u'')) \leq \varepsilon'' \right) $ . By definition of V, for any $ {t'}=_{\varepsilon'}{u'}\in \Gamma' $ , we obtain $ d(\sigma^\sharp(t'),\sigma^\sharp(u')) \leq \varepsilon' $ from $ \Gamma \vdash {t'}=_{\varepsilon'}{u'} \in V $ . Hence, by definition of V again, we obtain $ d(\sigma^\sharp(t),\sigma^\sharp(u)) \leq \varepsilon $ from $ \Gamma' \vdash{t}=_{\varepsilon}{u} \in V $ . Since $ \sigma $ is arbitrary, we conclude $ \Gamma \vdash{t}=_{\varepsilon}{u} \in V $ .

(Assumpt) Assume $ {t}=_{\varepsilon}{u} \in \Gamma $ . Fix an arbitrary $ \sigma \colon \Omega \to T_\Sigma \Omega $ . Regardless of the value of $ d(\sigma^\sharp(t),\sigma^\sharp(u)) $ , the following predicare is true:

\[\left(\forall{{t'}=_{\varepsilon'}{u'} \in\Gamma}~.~d(\sigma^\sharp(t'),\sigma^\sharp(u')) \leq \varepsilon' \right) \implies d(\sigma^\sharp(t),\sigma^\sharp(u)) \leq \varepsilon\]

because the premise $ (\forall{{t'}=_{\varepsilon'}{u'} \in\Gamma}~.~d(\sigma^\sharp(t'),\sigma^\sharp(u')) \leq \varepsilon') $ is false whenever $ d(\sigma^\sharp(t),\sigma^\sharp(u)) > \varepsilon $ . Since $ \sigma $ is arbitrary, we conclude $ \Gamma \vdash{t}=_{\varepsilon}{u} \in V $ .

Proof. (Proof of Theorem 36) It is clear that U[d] is an unconditional QET from its definition. Therefore, we take arbitrary $ V\in \mathbf{UQET}(\Sigma ,\Omega ) $ and show $ U[D[V]]=V $ . We assume $ V = \overline{S}^{\mathrm{QET}({\Sigma},\Omega)} $ for some $ S \subseteq \{ \emptyset \vdash{t}=_{\varepsilon}{u} \mid t,u \in T_\Sigma \Omega , \varepsilon \in \mathbb Q^+ \} $ . The adjunction $ U[-]\dashv D[-] $ implies $ U[D[V]] \subseteq V $ . We thus show $ V \subseteq U[D[V]] $ . For any $ t,u\in T_\Sigma\Omega $ and $ \epsilon\in\mathbb Q^+ $ , we have

Here, (*) uses Lemma 47. From the monotonicity of the closure $ \overline{(-)}^{\mathrm{QET}({\Sigma},\Omega)} $ , we conclude

\[ V = \overline{S}^{\mathrm{QET}({\Sigma},\Omega)} \subseteq \overline{\{ \emptyset \vdash{t}=_{\varepsilon}{u} \mid D'[V](t,u) \leq \varepsilon \}}^{\mathrm{QET}({\Sigma},\Omega)} = U[D[V]]. \]

Appendix D. Proofs for Section 7 (Graded Strong Relational Liftings for Divergences)

Lemma 50. Let $ ({\mathbb{C}},T) $ be a CC-SM and $ {\mathsf\Delta} = \{{\mathsf\Delta}^{m}_{I} \colon (U(TI))^2\to {\mathcal{Q}} \}_{m\in M,I\in{\mathbb{C}}} $ be a doubly indexed family of $ {\mathcal{Q}} $ -divergences on TI satisfying monotonicity on m (Definition 6). Then $ T^{[{\mathsf\Delta}]} $ is an $ M \times {\mathcal{Q}} $ -graded relational lifting of T (satisfies conditions 1–3 of Definition 37).

Proof. (Condition 1) We first show that $ (\mathrm{id}_{TX_1},\mathrm{id}_{TX_2}) \in {\bf BRel}({\mathbb{C}}) (T^{[{\mathsf\Delta}]} (m,v) X,T^{[{\mathsf\Delta}]} (n,w) X) $ for all X whenever $ m \leq n $ and $ v \leq w $ . From the monotonicity of $ {\mathsf\Delta} $ , for all $ I \in {\mathbb{C}} $ , $ c'_1, c'_2 \in U(TI) $ , $ n' \in M $ , $ w' \in {\mathcal{Q}} $ , we have

Therefore, for any $ (c_1,c_2) \in T^{[{\mathsf\Delta}]} (m,v) X $ , we obtain $ (c_1,c_2)\in T^{[{\mathsf\Delta}]} (n,w) X $ as follows:

(Condition 2) We next show $ (\eta_{X_1},\eta_{X_2}) :X\mathbin{\dot\rightarrow}T^{[{\mathsf\Delta}]} (1,0) X $ . From the definition of morphisms in $ {\bf BRel}({\mathbb{C}}) $ , for all $ (x_1,x_2) \in X $ , we have $ (\eta_{X_1} \mathbin{\bullet} x_1, \eta_{X_2} \mathbin{\bullet} x_2) \in T^{[{\mathsf\Delta}]} (1,0) X $ as follows:

(Condition 3) Finally, we show that $ (f_1^\sharp,f_2^\sharp) : T^{[{\mathsf\Delta}]} (n,w) X\mathbin{\dot\rightarrow}T^{[{\mathsf\Delta}]} (n\cdot m,w +v) Y $ holds for any $ (f_1,f_2) : X\mathbin{\dot\rightarrow}T^{[{\mathsf\Delta}]} (m,v) Y $ and $ (n,w)\in M \times {\mathcal{Q}} $ . For all $ (f_1,f_2) : X\mathbin{\dot\rightarrow}T^{[{\mathsf\Delta}]} (m,v) Y $ , we have

(a)

For all $ (c_1,c_2) \in T^{[{\mathsf\Delta}]} (n,w) X $ , we have

(b)

We here fix $ (f_1,f_2) : X\mathbin{\dot\rightarrow}T^{[{\mathsf\Delta}]} (m,v) Y $ . We show $ (f_1^\sharp,f_2^\sharp) \colon T^{[{\mathsf\Delta}]} (n,w) X \mathbin{\dot\rightarrow} T{\mathsf\Delta} (n \cdot m,w + v) Y $ . We also fix $ I\in{\mathbb{C}} $ , $ n''\in M $ , $ w''\in{\mathcal{Q}} $ and $ (k_1, k_2) :Y\mathbin{\dot\rightarrow} \tilde{\mathsf\Delta} (n'', w'') I $ . From (a), we obtain

\[ (k_1^\sharp \circ f_1,k_2^\sharp \circ f_2) : X\mathbin{\dot\rightarrow}\tilde{\mathsf\Delta} (m \cdot n'', v + w'') I. \]

Therefore, by instantiating (b) with $ (n',w') = (m \cdot n'',v + w'') $ and $ (l_1,l_2) = (k_1^\sharp \circ f_1,k_2^\sharp \circ f_2) $ , for all $ (c_1,c_2) \in T^{[{\mathsf\Delta}]} (n,w) X $ , we have

\[ ((k_1^\sharp \circ f_1)^\sharp \mathbin{\bullet} c_1, (k_2^\sharp \circ f_2)^\sharp \mathbin{\bullet} c_2) \in \tilde{\mathsf\Delta} (n \cdot m \cdot n'', w + v + w'') I. \]

Since $ (c_1,c_2) \in T^{[{\mathsf\Delta}]} (n,w) X $ , $ I\in{\mathbb{C}} $ , $ n''\in M $ , $ w''\in{\mathcal{Q}} $ and $ (k_1, k_2) :Y\mathbin{\dot\rightarrow} \tilde{\mathsf\Delta} (n'', w'') I $ are arbitrary, we conclude $ (f_1^\sharp,f_2^\sharp) \colon T^{[{\mathsf\Delta}]} (n,w) X \mathbin{\dot\rightarrow} T{\mathsf\Delta} (n \cdot m,w + v) $ as follows:

\begin{align*} &\left(\begin{aligned} &\forall{(c_1,c_2) \in T^{[{\mathsf\Delta}]} (n,w) X, I\in{\mathbb{C}}, m''\in M, v''\in{\mathcal{Q}}, (k_1, k_2) : Y \mathbin{\dot\rightarrow}\tilde{\mathsf\Delta} (m'', v'') I}~.~\\ &\qquad ((k_1^\sharp \circ f_1)^\sharp \mathbin{\bullet} c_1,(k_2^\sharp \circ f_2)^\sharp \mathbin{\bullet} c_2):X\mathbin{\dot\rightarrow} \tilde{\mathsf\Delta} (n \cdot m \cdot m'', w + v + v'') I \end{aligned}\right)\\ &\iff \left(\begin{aligned} &\forall{(c_1,c_2) \in T^{[{\mathsf\Delta}]} (n,w) X, I\in{\mathbb{C}}, m''\in M, v''\in{\mathcal{Q}}, (k_1, k_2) : Y \mathbin{\dot\rightarrow}\tilde{\mathsf\Delta} (m'', v'') I}~.~\\ &\qquad (k_1^\sharp \mathbin{\bullet} (f_1^\sharp \mathbin{\bullet} c_1),k_2^\sharp \mathbin{\bullet} (f_2^\sharp \mathbin{\bullet} c_2)):X\mathbin{\dot\rightarrow} \tilde{\mathsf\Delta} (n \cdot m \cdot m'', w + v + v'') I \end{aligned}\right)\\ &\iff \forall{(c_1,c_2) \in T^{[{\mathsf\Delta}]} (n,w) X}~.~(f_1^\sharp \mathbin{\bullet} c_1,f_2^\sharp \mathbin{\bullet} c_2) \in T^{[{\mathsf\Delta}]} (n \cdot m,w + v) Y\\ &\iff (f_1^\sharp,f_2^\sharp) \colon T^{[{\mathsf\Delta}]} (n,w) X \mathbin{\dot\rightarrow} T{\mathsf\Delta} (n \cdot m,w + v). \end{align*}

This completes the proof.

Proof. (Proof of Proposition 40) Since $ \dot T $ lifts the Kleisli extension (Condition 3 of Definition 37), and satisfy the fundamental property, we obtain

\begin{align*} &(c_1,c_2) \in \dot T (m,v) X\\ &\implies \forall{ I \in {\mathbb{C}}, n \in M, q \in {\mathcal{Q}},(k_1,k_2) \colon X \mathbin{\dot\rightarrow} \dot T (n,w)(EI) }~.~ (k_1^\sharp c_1, k_2^\sharp c_2) \in \dot T (mn,v+w) (EI)\\ &\iff \forall{ I \in {\mathbb{C}}, n \in M, q \in {\mathcal{Q}}, (k_1,k_2) \colon X \mathbin{\dot\rightarrow} \tilde{\mathsf\Delta}(n,w) I }~.~ (k_1^\sharp c_1, k_2^\sharp c_2) \in \tilde{\mathsf\Delta}(mn,v+w) I\\ &\iff (c_1, c_2) \in T^{[ {\mathsf\Delta}]} {} (m,v) X. \end{align*}

Appendix E. Proofs for Section 9 (Case Study I: Higher-Order Probabilistic Programs)

Lemma 51. The mapping

\[ (x,\sigma) \mapsto \begin{cases} \mathcal{N}(x,\sigma^2) & \sigma \neq 0\\ \mathbf{d}_{x} & \sigma = 0 \end{cases} \]

forms a measurable function of type $ {\mathbb{R}} \times {\mathbb{R}} \to G{\mathbb{R}} $ .

Proof. We show that for all $ A \in \Sigma_{\mathbb{R}} $ , the mapping $ f_A(x,\sigma) = \mathcal{N}(x,\sigma^2)(A) $ forms a measurable function of type $ {\mathbb{R}} \times {\mathbb{R}}_{\neq 0} \to [0,1] $ where $ {\mathbb{R}}_{\neq 0} $ is the subspace of $ {\mathbb{R}} $ whose underlying set is $ \{r\in{\mathbb{R}} \mid r \neq 0 \} $ . We have

\[ \mathcal{N}(x,\sigma^2)(A) = \sum_{k \in \mathbb{Z}}\mathcal{N}(x,\sigma^2)(A \cap [k,k+1]) = \sum_{k \in \mathbb{Z}} \int_{A \cap [k,k+1]} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - r)^2}{\sigma^2}\right)dr \]

The mapping $ h(x,\sigma,r) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - r)^2}{\sigma^2}\right) $ forms a continuous function of type $ {\mathbb{R}} \times {\mathbb{R}}_{\neq 0} \times {\mathbb{R}} \to {\mathbb{R}} $ ; hence, it is uniformly continuous on the compact set $ I_1 \times I_2 \times [k,k+1] $ where $ I_1 $ and $ I_2 $ are arbitrary closed intervals in $ {\mathbb{R}} $ and $ {\mathbb{R}}_{\neq 0} $ , respectively. Then, for all $ 0 < \varepsilon $ , there exists $ 0 < \delta $ such that $ |h(x,\sigma,r) - h(x',\sigma',r') |<\varepsilon $ holds wherever $ |x - x'| + |\sigma - \sigma'| + |r -r '| < \delta $ . Hence, for all $ 0 < \varepsilon $ , there is $ 0 < \delta $ such that whenever $ |x - x'| + |\sigma - \sigma'|< \delta $ ,

\[ \left| \int_{A \cap [k,k+1]} h(x,\sigma,r)dr - \int_{A \cap [k,k+1]} h(x',\sigma',r)dr \right| \leq |\int_{[k,k+1]} |h(x,\sigma,r) - h(x',\sigma',r')| dr \leq \varepsilon. \]

Since the closed intervals $ I_1 $ and $ I_2 $ are arbitrary, we conclude that the function $ f_{A \cap [k,k+1]} \colon {\mathbb{R}} \times {\mathbb{R}}_{\neq 0} \to [0,1] $ is continuous, hence measurable. Hence, the mapping $ f_A = \sum_{k \in \mathbb{Z}} f_{A \cap [k,k+1]} $ is measurable. Since A is arbitrary and $ f_A(x,\sigma^2) = \mathrm{ev}_A \circ \mathcal{N}(x,\sigma^2) $ , the mapping $ g(x,\sigma) \triangleq \mathcal{N}(x,\sigma^2) $ forms a measurable function of type $ {\mathbb{R}} \times {\mathbb{R}}_{\neq 0} \to G{\mathbb{R}} $ . Next, it is obvious that the mapping $ g'(x,\sigma) \triangleq \mathbf{d}_{x} $ forms a measurable function of type $ {\mathbb{R}} \times \{0\} \to G{\mathbb{R}} $ . Finally, the following mapping g” forms a measurable function of type $ {\mathbb{R}} \times {\mathbb{R}} \to ({\mathbb{R}} \times {\mathbb{R}}_{\neq 0}+ {\mathbb{R}} \times \{0\}) $ :

\[ g''(x,\sigma) \triangleq \begin{cases} \iota_1 (x,\sigma) & \sigma \neq 0\\ \iota_2 (x,0) & \sigma = 0 \end{cases}. \]

Let $ A \in \Sigma_{{\mathbb{R}} \times {\mathbb{R}}_{\neq 0}+ {\mathbb{R}} \times \{0\}} $ . By definition of g”, we have $ (g'')^{-1}(A) = \iota_1^{-1}(A) \cup \iota_2^{-1}(A) $ . Since the coprojections are measurable, we have $ \iota_1^{-1}(A) \in \Sigma_{{\mathbb{R}} \times {\mathbb{R}}_{\neq 0}} $ and $ \iota_2^{-1}(A) \in \Sigma_{{\mathbb{R}} \times \{0\}} $ . Since $ {\mathbb{R}}_{\neq 0} $ and $ \{0\} $ are measurable subsets of $ {\mathbb{R}} $ , we have $ \Sigma_{{\mathbb{R}} \times {\mathbb{R}}_{\neq 0}},\Sigma_{{\mathbb{R}} \times \{0\}} \subseteq \Sigma_{{\mathbb{R}}\times{\mathbb{R}}} $ . Thus, $ (g'')^{-1}(A) \in \Sigma_{{\mathbb{R}} \times {\mathbb{R}}} $ . Since A is arbitrary, g” is measurable. Therefore, we conclude the measurability of the composition $ [g,g'] \circ g'' \colon {\mathbb{R}} \times {\mathbb{R}} \to G {\mathbb{R}} $ , which is exactly the mapping in the statement.

Corollary 52. $ [\![ \mathtt{norm}]\!] \in {\bf QBS}(K{\mathbb{R}}\times K{\mathbb{R}},PK{\mathbb{R}}) $ .

Lemma 53. The mapping

\[ (x,\lambda) \mapsto \begin{cases} \mathrm{Lap}(x,\lambda) & \lambda > 0\\ \mathbf{d}_{x} & \lambda \leq 0 \end{cases} \]

forms a measurable function of type $ {\mathbb{R}} \times {\mathbb{R}} \to G{\mathbb{R}} $ .

Proof. We have, for all $ A \in \Sigma_{\mathbb{R}} $ ,

\[ \mathrm{Lap}(x,\lambda)(A) = \int_{A} \frac{1}{2\lambda} \exp\left(-\frac{|x - r|}{\lambda}\right)dr \]

The density function $ h(x,\lambda,r) = \frac{1}{2\lambda} \exp\left(-\frac{|x - r|}{\lambda}\right) $ is continuous function of type $ {\mathbb{R}} \times {\mathbb{R}}_{0\leq} \times {\mathbb{R}} \to {\mathbb{R}} $ where $ {\mathbb{R}}_{0\leq} $ is the subspace of $ {\mathbb{R}} $ whose underlying set is $ \{r\in {\mathbb{R}} \mid 0 \leq r\} $ . The measurability of $ \mathrm{Lap}(x,\lambda) $ is proved in the same way as $ \mathcal{N}(x,\sigma^2) $ . The rest of proof is the same routine as Lemma 51.

Corollary 54. $ [\![ \mathtt{lap}]\!] \in {\bf QBS}(K{\mathbb{R}}\times K{\mathbb{R}},PK{\mathbb{R}}) $ .

Footnotes

1 In particular, there is no $ \sigma $-algebra $ \Sigma $ over $ {\bf Meas}({\mathbb{R}},{\mathbb{R}}) $ that makes the evaluation mapping $ (x,f) \mapsto f(x) $ a measurable function of type $ {\mathbb{R}} \times ({\bf Meas}({\mathbb{R}},{\mathbb{R}}),\Sigma) \to {\mathbb{R}} $.

2 Strictly speaking, differential privacy depends on the definition of adjacency of datasets. The adjacency relation $ R_{\mathrm{adj}} $ is usually defined as $ \{(d_1,d_2) \mid \rho(d_1,d_2) \leq 1\} $ with a metric $ \rho $ over I.

3 Remark that $ \Pr[c(d_1) = s] $ and $ \Pr[c(d_2) = s] $ are Radon–Nikodym derivatives of $ c(d_1) $ and $ c(d_2) $ with respect to a measure $ \nu $ such that $ c(d_1), c(d_2) \ll \nu $. [$ \implies $] Obvious. [$ \impliedby $] By Radon–Nikodym theorem, we can take the Radon–Nikodym derivatives $ \Pr[c(d_1) = s] $ and $ \Pr[c(d_2) = s] $ with respect to $ \nu = c(d_1) + c(d_2) $. The inequality does not depend on the choice of $ \nu $.

4 Recall that an ultrametric space $ (I,d_I) $ is a set I together with a function $ d_I \colon I^2 \to [0,1] $ such that $ d_I(x,x) = 0 $ and $ d_I(x,z) \leq \max(d_I(x,y),d_I(y,z)) $.

5 $ R_{E 1} = \emptyset $ happens if and only if $ R_{E I} = \emptyset $ for any $ I \in {\mathbb{C}} $. Therefore, nontrivial basic endorelations always satisfy $ R_{E 1} \neq \emptyset $.

6 A pseudometric is a function $ d \colon A^2 \to {\mathcal{R}}^+ $ such that $ d(a,a) = 0 $ (reflexivity), $ d(b,a) = d(a,b) $ (symmetry), and $ d(a,c) \leq d(a,b) + d(b,c) $ (triangle inequality) hold for all $ a,b,c \in A $. Since d may return the positive infinity $ \infty $, it is sometimes called an extended pseudometric.

7 If $ \sigma = 0 $ (or $ \lambda \leq 0 $), $ \mathcal{N}(x,\sigma^2) $ (resp. $ \mathrm{Lap}(x,\lambda) $) is not defined; thus, we replace it by the Dirac distribution $ \mathbf{d}_x $ at x instead.

8 To make examples simpler, we allow negative costs.

9 For a measurable subset $ A \in \Sigma_I $, an indicator function $ \chi_A \colon I \to [0,1] $ of \textit{A} is defined by $ \chi_A(x) = 1 $ if $ x \in A $ and $ \chi_A = 0 $ otherwise. A simple function is a linear combination of finite number of indicator functions.

References

Albarghouthi, A. and Hsu, J. (2018a). Constraint-based synthesis of coupling proofs. In Computer Aided Verification – 30th International Conference, CAV 2018, Proceedings, Part I, vol. 10981. LNCS. Springer, 327–346.Google Scholar
Albarghouthi, A. and Hsu, J. (2018b). Synthesizing coupling proofs of differential privacy. PACMPL 2 (POPL) 58:158:30.Google Scholar
Altenkirch, T., Chapman, J., and Uustalu, T. (2015). Monads need not be endofunctors. Logical Methods in Computer Science 11 (1).Google Scholar
Aumann, R. J. (1961). Borel structures for function spaces. Illinois Journal of Mathematics 5 (4) 614630.CrossRefGoogle Scholar
Azevedo de Amorim, A., Gaboardi, M., Hsu, J., and Katsumata, S. (2019). Probabilistic relational reasoning via metrics. In 34th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2019. IEEE, 1–19.CrossRefGoogle Scholar
Bacci, G., Mardare, R., Panangaden, P., and Plotkin, G. (2021). Tensor of quantitative equational theories. In Gadducci, F. and Silva, A. (eds.), 9th Conference on Algebra and Coalgebra in Computer Science (CALCO 2021), volume 211 of Leibniz International Proceedings in Informatics (LIPIcs), Dagstuhl, Germany: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 7:1–7:17.Google Scholar
Balan, A., Kurz, A., and Velebil, J. (2019). Extending set functors to generalised metric spaces. Logical Methods in Computer Science 15 (1).Google Scholar
Baldan, P., Bonchi, F., Kerstan, H., and König, B. (2018). Coalgebraic behavioral metrics. Logical Methods in Computer Science 14 (3).Google Scholar
Balle, B., Barthe, G., Gaboardi, M., Hsu, J., and Sato, T. (2020). Hypothesis testing interpretations and renyi differential privacy. In Chiappa, S. and Calandra, R. (eds.), Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS 2020), vol. 108. Proceedings of Machine Learning Research, Online. PMLR, 2496–2506.Google Scholar
Barthe, G., Crespo, J. M., and Kunz, C. (2011). Relational verification using product programs. In Butler, M. and Schulte, W., editors, FM 2011: Formal Methods, Berlin, Heidelberg: Springer Berlin Heidelberg, 200214.CrossRefGoogle Scholar
Barthe, G., D’Argenio, P. R., and Rezk, T. (2004). Secure information flow by self-composition. In Proceedings of the 17th IEEE Workshop on Computer Security Foundations, CSFW ’04, vol. 100, USA: IEEE Computer Society.Google Scholar
Barthe, G., Gaboardi, M., Arias, E. J. G., Hsu, J., Kunz, C., and Strub, P. (2014). Proving differential privacy in Hoare logic. In IEEE 27th Computer Security Foundations Symposium, CSF 2014. IEEE Computer Society, 411424.CrossRefGoogle Scholar
Barthe, G., Gaboardi, M., Arias, E. J. G., Hsu, J., Roth, A., and Strub, P. (2015). Higher-order approximate relational refinement types for mechanism design and differential privacy. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, January 15-17, 2015. ACM, 5568.CrossRefGoogle Scholar
Barthe, G., Gaboardi, M., Grégoire, B., Hsu, J., and Strub, P. (2016). Proving differential privacy via probabilistic couplings. In Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’16. ACM, 749758.CrossRefGoogle Scholar
Barthe, G., Grégoire, B., Hsu, J., and Strub, P.-Y. (2017). Coupling proofs are probabilistic product programs. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, New York, NY, USA: Association for Computing Machinery, 161–174.CrossRefGoogle Scholar
Barthe, G., Köpf, B., Olmedo, F., and Béguelin, S. Z. (2012). Probabilistic relational reasoning for differential privacy. In Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2012. ACM, 97110.CrossRefGoogle Scholar
Barthe, G. and Olmedo, F. (2013). Beyond differential privacy: Composition theorems and relational logic for f-divergences between probabilistic programs. In Automata, Languages, and Programming - 40th International Colloquium, ICALP 2013, Proceedings, Part II, vol. 7966. LNCS. Springer, 49–60.CrossRefGoogle Scholar
Benton, N. (2004). Simple relational correctness proofs for static analyses and program transformations. SIGPLAN Notices 39 (1) 1425.CrossRefGoogle Scholar
Bonchi, F., König, B., and Petrisan, D. (2018). Up-To Techniques for Behavioural Metrics via Fibrations. In 29th International Conference on Concurrency Theory (CONCUR 2018), vol. 118. Leibniz International Proceedings in Informatics (LIPIcs), Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 17:1–17:17.Google Scholar
Bun, M., Dwork, C., Rothblum, G. N., and Steinke, T. (2018). Composable and versatile privacy via truncated CDP. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, New York, NY, USA: Association for Computing Machinery, 74–86.CrossRefGoogle Scholar
Bun, M. and Steinke, T. (2016). Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography, Berlin, Heidelberg: Springer Berlin Heidelberg, 635–658.CrossRefGoogle Scholar
Çiçek, E., Barthe, G., Gaboardi, M., Garg, D., and Hoffmann, J. (2017). Relational cost analysis. SIGPLAN Notices 52 (1) 316329.CrossRefGoogle Scholar
Csiszár, I. (1963). Eine informationstheoretische Ungleichung und ihre Anwendung auf den beweis der ergodizitat von markoffschen ketten. Magyar. Tud. Akad. Mat. Kutato Int. Kozl. 8 85108.Google Scholar
Csiszár, I. (1967). Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica 2 299318.Google Scholar
Dwork, C., McSherry, F., Nissim, K., and Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, vol. 3876. LNCS. Springer Berlin Heidelberg, 265–284.CrossRefGoogle Scholar
Dwork, C. and Roth, A. (2013). The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9 (3-4) 211407.CrossRefGoogle Scholar
Gaboardi, M., Haeberlen, A., Hsu, J., Narayan, A., and Pierce, B. C. (2013). Linear dependent types for differential privacy. In The 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’13. ACM, 357370.CrossRefGoogle Scholar
Gaboardi, M., Katsumata, S., Orchard, D., and Sato, T. (2021). Graded hoare logic and its categorical semantics. In Yoshida, N., editor, Programming Languages and Systems – 30th European Symposium on Programming, ESOP 2021, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021, Luxembourg City, Luxembourg, March 27 - April 1, 2021, Proceedings, vol. 12648. Lecture Notes in Computer Science. Springer, 234–263.CrossRefGoogle Scholar
Gavazzo, F. (2018). Quantitative behavioural reasoning for higher-order effectful programs: Applicative distances. In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’18, New York, NY, USA: Association for Computing Machinery, 452–461.CrossRefGoogle Scholar
Giry, M. (1982). A categorical approach to probability theory. In Banaschewski, B., editor, Categorical Aspects of Topology and Analysis, vol. 915. LNM. Springer, 68–85.CrossRefGoogle Scholar
Hall, R. (2012). New Statistical Applications for Differential Privacy. PhD thesis, Machine Learning Department School of Computer Science Carnegie Mellon University.Google Scholar
Hermida, C. and Jacobs, B. (1995). An algebraic view of structural induction. In Proceedings of CSL ’94, vol. 933. LNCS. Springer-Verlag, 412–426.CrossRefGoogle Scholar
Heunen, C., Kammar, O., Staton, S., and Yang, H. (2017). A convenient category for higher-order probability theory. In 32nd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2017, 1–12.CrossRefGoogle Scholar
Jacobs, B. (1999). Categorical Logic and Type Theory. Elsevier.Google Scholar
Kairouz, P., Oh, S., and Viswanath, P. (2015). The composition theorem for differential privacy. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, 1376–1385.Google Scholar
Katsumata, S. (2014). Parametric effect monads and semantics of effect systems. In The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’14. ACM, 633646.CrossRefGoogle Scholar
Katsumata, S. and Sato, T. (2013). Preorders on monads and coalgebraic simulations. In Pfenning, F., editor, Foundations of Software Science and Computation Structures, Berlin, Heidelberg: Springer Berlin Heidelberg, 145160.CrossRefGoogle Scholar
Katsumata, S., Sato, T., and Uustalu, T. (2018). Codensity lifting of monads and its dual. Logical Methods in Computer Science 14 (4).Google Scholar
Kurz, A. and Velebil, J. (2016). Relation lifting, a survey. Journal of Logical and Algebraic Methods in Programming 85 (4) 475–499. Relational and algebraic methods in computer science.CrossRefGoogle Scholar
Liese, F. and Vajda, I. (2006). On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory 52 (10) 43944412.CrossRefGoogle Scholar
Lucassen, J. M. and Gifford, D. K. (1988). Polymorphic effect systems. In Conference Record of the Fifteenth Annual ACM Symposium on Principles of Programming Languages. ACM Press, 4757.CrossRefGoogle Scholar
Mac Lane, S. (1998). Categories for the Working Mathematician (Second Edition), vol. 5. Graduate Texts in Mathematics. Springer.Google Scholar
Mardare, R., Panangaden, P., and Plotkin, G. (2016). Quantitative algebraic reasoning. In Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’16, New York, NY, USA: Association for Computing Machinery, 700–709.CrossRefGoogle Scholar
Mardare, R., Panangaden, P., and Plotkin, G. (2017). On the axiomatizability of quantitative algebras. In 2017 32nd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), Los Alamitos, CA, USA: IEEE Computer Society, 1–12.CrossRefGoogle Scholar
Mironov, I. (2017). Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), 263–275.CrossRefGoogle Scholar
Mitchell, J. C. and Scedrov, A. (1992). Notes on sconing and relators. In Computer Science Logic, 6th Workshop, CSL ’92, vol. 702. LNCS. Springer, 352–378.Google Scholar
Moggi, E. (1991). Notions of computation and monads. Information and Computation 93 (1) 5592.CrossRefGoogle Scholar
Morimoto, T. (1963). Markov processes and the H-theorem. Journal of the Physical Society of Japan 18 (3) 328331.CrossRefGoogle Scholar
Nielson, H. R. and Nielson, F. (2007). Semantics with Applications: An Appetizer. Springer-Verlag, Berlin, Heidelberg.CrossRefGoogle Scholar
Olmedo, F. (2014). Approximate Relational Reasoning for Probabilistic Programs. PhD thesis, Technical University of Madrid.Google Scholar
Prasad, S. and Smith, K. A. (2014). A note on differential privacy: Defining resistance to arbitrary side information. Journal of Privacy and Confidentiality 6 (1).Google Scholar
Radiček, I., Barthe, G., Gaboardi, M., Garg, D., and Zuleger, F. (2017). Monadic refinements for relational cost analysis. Proceedings of the ACM on Programming Languages 2 (POPL) 36:1–36:32.CrossRefGoogle Scholar
Reed, J. and Pierce, B. C. (2010). Distance makes the types grow stronger: A calculus for differential privacy. In Proceeding of the 15th ACM SIGPLAN International Conference on Functional Programming, ICFP 2010. ACM, 157168.CrossRefGoogle Scholar
Rutten, J. J. M. M. (1996). Elements of generalized ultrametric domain theory. Theoretical Computer Science 170 (1–2) 349381.CrossRefGoogle Scholar
Sato, T. (2014). Identifying all preorders on the subdistribution monad. In Jacobs, B., Silva, A. , and Staton, S., editors, Proceedings of the 30th Conference on the Mathematical Foundations of Programming Semantics, MFPS 2014, Ithaca, NY, USA, June 12-15, 2014, volume 308 of Electronic Notes in Theoretical Computer Science, pp. 309–327. Elsevier.CrossRefGoogle Scholar
Sato, T. (2016). Approximate relational hoare logic for continuous random samplings. In The Thirty-second Conference on the Mathematical Foundations of Programming Semantics, MFPS 2016, vol. 325. Electronic Notes in Theoretical Computer Science. Elsevier, 277–298.CrossRefGoogle Scholar
Sato, T., Barthe, G., Gaboardi, M., Hsu, J., and Katsumata, S. (2019). Approximate span liftings: Compositional semantics for relaxations of differential privacy. In 34th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2019, 1–14. IEEE.CrossRefGoogle Scholar
Smirnov, A. (2008). Graded monads and rings of polynomials. Journal of Mathematical Sciences, 151:30323051.CrossRefGoogle Scholar
Sprunger, D., Katsumata, S., Dubut, J., and Hasuo, I. (2021). Fibrational bisimulations and quantitative reasoning: Extended version. Journal of Logic and Computation 31 (6) 15261559.CrossRefGoogle Scholar
Street, R. (1972). The formal theory of monads. Journal of Pure and Applied Algebra 2 (2) 149168.CrossRefGoogle Scholar
Wasserman, L. and Zhou, S. (2010). A statistical framework for differential privacy. Journal of the American Statistical Association 105 (489) 375389.CrossRefGoogle Scholar
Zaks, A. and Pnueli, A. (2008). Covac: Compiler validation by program analysis of the cross-product. In Cuellar, J., Maibaum, T. , and Sere, K. (eds.), FM 2008: Formal Methods, Berlin, Heidelberg: Springer Berlin Heidelberg, 3551.CrossRefGoogle Scholar
Figure 0

Table 1. (1-graded) $ \operatorname{Top} $-relative $ {\mathcal{Q}} $-divergences for cost counting monads

Figure 1

Table 2. $ \operatorname{Eq}{} $-relative M-graded $ {\mathcal{Q}} $- ($ {\mathcal{Q}}_s $-)divergences on G ($ G_{s} $)

Figure 2

Table 3. Statistical divergences that are $ \operatorname{Eq}{} $-relative $ {\mathcal{Q}} $- (resp. $ {\mathcal{Q}}_s $-) divergences on G (resp. $ G_{s} $)

Figure 3

Table 4. Parameters for Proposition 3

Figure 4

Figure 1. Quantitative equational theory rules.

Figure 5

Figure 2. Syntax of types and raw terms of the computational metalanguage.

Figure 6

Figure 3. Data for the categorical semantics of metalanguage.

Figure 7

Table 5. Relational Hoare logics for verifying divergences