1. Introduction
Logical relations. Logical relations arguments (see, e.g., Mitchell and Scedrov 1992 for a survey) are proof techniques that can be used to demonstrate properties of typed programming languages (PL), ranging from strong normalization to canonicity and adequacy. The arguments are essentially type-guided forms of induction. They seem to have been reinvented several times by different research communities and are also known under various other names, including Tait’s method of computability, reducibility candidates, Artin gluing, Sierpinski cone, (sub)sconing, and Freyd cover.
Category theory gives a useful way to organize logical relations arguments: by viewing them as ways of building a new categorical semantics of a PL out of an existing one. The new semantics then equips objects with predicates of some form and restricts the morphisms to those that respect the predicates. By choosing the right notion of predicates, we can ensure that the existence of this new semantics gives us the property we are hoping to prove about our PL.
In this paper, we present novel logical relations methods for languages with recursive features, together with an application of these techniques to correctness proofs for automatic differentiation (AD).
AD and the PL community. Automatic differentiation (AD, see, e.g., Griewank and Walther 2008 for a survey) is a popular family of techniques for computing derivatives of functions implemented by a piece of code, particularly when efficiency, scaling to high dimensions, and numerical stability are important. It has been studied in the scientific computing community for many decades and has been heavily used in machine learning for the last decade. In recent years, the PL community has turned toward studying AD from a new perspective. Much progress has been made toward giving a formulation of (forward and) reverse-mode AD that
(1) is simple and purely functional;
(2) scales to the expressive ML-family functional languages that are popular in practice;
(3) admits a simple correctness proof that shows that AD computes the derivative;
(4) provably has the correct asymptotic complexity and is performant in practice;
(5) is parallelism preserving.
Our contributions. In this paper, we present a simple solution to problems (1)–(3), our first major contribution.
We give a proof of the correctness of the reverse and forward mode dual numbers style AD in a semantically unified way, making use only of the very simple concrete denotational model of $\omega$-cpos. In doing so, we simplify existing techniques that relied on sheaf-theoretic machinery (Vákár 2020; Huot et al. 2023).
A key challenge that we tackle to achieve the correctness proofs of this paper is to have sufficiently strong categorical logical relations techniques for reasoning about partially defined differentiable functions and term and type recursion. We believe that our novel methods can be simpler than existing alternatives such as Pitts (1996) and Ahmed (2006), while remaining widely applicable; these methods are our second major contribution.
We refer to the companion paper (Smeding and Vákár 2022) for a performant implementation of the dual numbers reverse-mode AD technique proved correct in the present paper. That paper shows that the technique efficiently differentiates most of Haskell98, contributing toward point (4). Parallelism preservation (point (5)) for this AD technique is discussed in Smeding and Vákár (2024).
In our work, we take care to keep all constructions sufficiently simple that they can easily be generalized to more advanced AD algorithms such as CHAD (Vákár 2021; Vákár and Smeding 2022; Lucatelli Nunes and Vákár 2023), which is one of our key motivations for this work.
Why care and why is this difficult? Given the central role that AD plays in modern scientific computing and machine learning, the ideal of differential programming has been emerging (Meijer 2018; Plotkin 2018): compilers for general-purpose PL should provide built-in support for AD of any programs written in the language. Such general-purpose PL tend to include many language features, however, which we then need to be able to differentiate. What a correct and efficient notion of derivative of such features is might not be straightforward, as they often go beyond what is studied in traditional calculus. In this paper, we focus on the challenge posed, in particular, by partial language features: partial primitive operations, lazy conditionals on real numbers, iteration, recursion, and recursive types.
Partial primitive operations are certainly key. Indeed, even the basic operations of division and logarithm are examples. (Lazy) conditionals on real numbers are useful in practice for pasting together various existing smooth functions, a basic example being the ReLU function:
$$ReLU(x)\stackrel{\mathrm{def}}{=}\begin{cases}0 & \text{if } x\leq 0\\ x & \text{otherwise,}\end{cases}$$
which is a key component of many neural networks. Conditionals are also frequently used in probabilistic programming to paste together density functions of different distributions (Betancourt 2019). People have long studied the subtle issue of how one should algorithmically differentiate such functions with “kinks” under the name of the if-problem in AD (Beck and Fischer 1994). Our solution is the one also employed by Abadi and Plotkin (2020): to treat the functions as semantically undefined at their kinks (at $x=0$ in the case of $ReLU(x)$). This is justified given how coarse the semantic treatment of floating-point numbers as real numbers is already. Our semantics based on partial functions defined on real numbers is sufficient to prove many high-level correctness properties. However, like any semantics based on real numbers, it fails to capture many of the low-level subtleties introduced by the floating-point implementation. Our key insight that we use to prove correctness of AD of partial programs is to construct a suitable lifting of the partiality monad to a variant of Huot et al. (2020)’s category of $\mathbb{R}^k$-indexed logical relations used to relate programs to their derivatives. This particular monad lifting for derivatives of partial functions can be seen as our solution to the if-problem in AD. In Section 11, we briefly discuss how the more ambitious solution to the if-problem in the style of Lee et al. (2020), Mazza and Pagani (2021), and Huot et al. (2023) can also be achieved with our methods. In that solution, we show that the set of non-differentiable points where AD does not compute a correct derivative is of measure zero, which we achieve by choosing a different monad lifting.
Similarly, iteration constructs, or while-loops, are necessary for implementing iterative algorithms with dynamic stopping criteria. Such algorithms are frequently used in programs that AD is applied to. For example, AD is applied to iterative differential equation solvers to perform Bayesian inference in SIR models (Footnote 1). This technique played a key role in modeling the Covid-19 pandemic (Flaxman et al. 2020). For similar reasons, AD through iterative differential equation solvers is important for probabilistic modeling of pharmacokinetics (Tsiros et al. 2019). Other common use cases of iterative algorithms that need to be differentiated with AD are eigen-decompositions and algebraic equation solvers, such as those employed in Stan (Carpenter et al. 2015). Finally, iteration gives a convenient way of achieving numerically stable approximations to complex functions (such as the Conway–Maxwell–Poisson density function (Goodrich 2017)). The idea is to construct, using iteration, a Taylor approximation that terminates once the next term in the series causes floating-point underflow. Indeed, for a function whose $i$-th term in the Taylor expansion can be represented by a program
we would define the underflow-truncated Taylor series by
where $\epsilon$ is a cutoff for floating-point underflow.
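As a minimal sketch of this idea, the following Python snippet sums the Taylor series of the exponential function until the next term falls below a cutoff. The choice of `exp` as the target function, the name `truncated_exp`, and the default value of `eps` are illustrative assumptions made here, not details taken from the paper.

```python
import math


def truncated_exp(x, eps=1e-300):
    """Underflow-truncated Taylor series of exp at x (illustrative sketch).

    The i-th Taylor term is x**i / i!. We keep adding terms and stop as
    soon as the next term is smaller in magnitude than the cutoff eps.
    """
    i, term, total = 0, 1.0, 0.0  # term holds x**i / i! for the current i
    while abs(term) >= eps:
        total += term
        i += 1
        term = term * x / i  # turn x**(i-1)/(i-1)! into x**i/i!
    return total
```

The number of loop iterations depends on the input `x`, which is exactly the kind of dynamic stopping criterion that a differentiable iteration construct has to support.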
Next, recursive neural networks (Tai et al. 2015) are often mentioned as a use case of AD applied to recursive programs. While basic Child-Sum Tree-LSTMs can also be implemented with primitive recursion (a fold) over an inductively defined tree (which can be defined as a recursive type), there are other related models such as Top-Down-Tree-LSTMs that require an iterative or general recursive approach (Zhang et al. 2016). In fact, Jeong et al. (2018) have shown that a recursive approach is preferable as it better exposes the available parallelism in the model. In Appendix D, we show some Haskell code for the recursive neural network of Socher et al. (2011), to give an idea of how iteration and recursive types (in the form of inductive types of labeled trees) naturally arise in a functional implementation of such neural net architectures. We imagine that more applications of AD to recursive programs will naturally emerge as the technique becomes available to machine learning researchers and engineers. Finally, we speculate that coinductive types like streams of real numbers, which can be encoded using recursive types as $\mu \alpha .\mathbf{1} \to (\mathbf{real}* \alpha )$, provide a useful API for online machine learning applications (Shalev-Shwartz et al. 2012), where data is processed in real time as it becomes available. Recursion and, more notably, recursive types introduce one final challenge into the correctness proof of AD of such expressive functional programs: the required logical relations arguments are notoriously technical, limiting the audience of any work using them and frustrating application to more complicated AD algorithms like CHAD.
To mend this problem, we introduce a novel, simple but powerful logical relations technique for open semantic logical relations for recursive types.
Prerequisites. We assume some familiarity with category theory (see, for instance, Mac Lane 2013): the concepts of and basic facts about categories, functors, natural transformations, (co)limits, adjunctions, and (co)monads. We also assume that the reader knows the most basic definitions in enriched category theory (see, for instance, Kelly 1982): the concepts of $\mathcal{V}$-categories, $\mathcal{V}$-functors, and $\mathcal{V}$-natural transformations. We recall the definitions and results we need for $\mathcal{V}$-monads and their Kleisli $\mathcal{V}$-categories (the interested reader can find more details in Dubuc 1970). Later in this paper, we will also consider $\mathcal{V}$-(co)limits, $\mathcal{V}$-adjunctions, and $\mathcal{V}$-(co)monadicity but only for the specific case of $\mathcal{V}=\boldsymbol{\omega } \mathbf{Cpo}$ with its cartesian structure. In these cases, we take care to spell out all details to make the paper as self-contained as possible.
Convention. Whenever we talk about strict preservation of some structure (like products, coproducts, or exponentials), we are assuming that we have chosen structures (chosen products, coproducts, or exponentials) and the preservation is on the nose, that is to say, the canonical comparison is the identity.
2. Key Ideas
In this paper, we consider how to perform forward and reverse-mode AD on a functional language with expressive partial features, using a dual numbers technique.
Language
We consider an idealized functional language with product types $\tau \,{\mathop{\times }}\,\sigma$, sum types $\tau \,{\mathop{\sqcup }}\,\sigma$, and function types $\tau \to \sigma$ generated by
• a primitive type $\mathbf{real}$ of real numbers (in practice, implemented as floating-point numbers);
• constants $\vdash \underline{c}:\mathbf{real}$ for $c\in \mathbb{R}$;
• sets $(\mathrm{Op}_n)_{n\in \mathbb{N}}$ of $n$-ary primitive operations $\mathrm{op}$, for which we include computations $x_1:\mathbf{real},\ldots, x_n:\mathbf{real}\vdash \mathrm{op} (x_1,\ldots, x_n):\mathbf{real}$; we think of these as implementing partial functions $\mathbb{R}^n\rightharpoonup \mathbb{R}$ with open domain of definition, on which they are differentiable; for example, we can include mathematical operations $\log, \exp \in \mathrm{Op}_1$ and $(+),(\!*\!),(/)\in \mathrm{Op}_2$;
• a construct $x:\mathbf{real}\vdash \mathbf{sign}\,(x):\mathbf{1}\,{\mathop{\sqcup }}\,\mathbf{1}$, where we write $\mathbf{1}$ for the empty product; $\mathbf{sign}\,{t}$ computes the sign of a real number $t$ and is undefined at ${t}=\underline{0}$; we can use it to define a lazy conditional on real numbers $\mathbf{if}\,r\,\mathbf{then}\,t\,\mathbf{else}\,s\, \stackrel{\mathrm{def}}{=} \mathbf{case}\,{\mathbf{sign}\,{r}}\;\mathbf{of}{\{\_\to{t}\mathrel{\big \lvert } \_\to{s}\}}$ of the kind that is often used in AD libraries like Stan (Carpenter et al. 2015).
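The partial `sign` construct and the lazy conditional derived from it can be sketched in Python as follows. This is an illustration only: the exception class `Undefined`, the tags `"inl"`/`"inr"` for the two copies of $\mathbf{1}$, and the helper names are assumptions introduced here, and partiality is modeled by raising an exception.

```python
class Undefined(Exception):
    """Signals that a partial operation was used outside its domain."""


def sign(r):
    """Send r < 0 to the left copy of 1 and r > 0 to the right copy."""
    if r < 0:
        return ("inl", ())
    if r > 0:
        return ("inr", ())
    raise Undefined("sign is undefined at 0")


def if_real(r, then_branch, else_branch):
    """Lazy conditional: only the selected branch (a thunk) is evaluated."""
    tag, _ = sign(r)
    return then_branch() if tag == "inl" else else_branch()


def relu(x):
    """ReLU written with the lazy conditional; undefined at the kink x = 0."""
    return if_real(x, lambda: 0.0, lambda: x)
```

For example, `relu(-3.0)` evaluates to `0.0` and `relu(2.0)` to `2.0`, while `relu(0.0)` raises `Undefined`, mirroring the treatment of kinks as points outside the domain of definition.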
Next, we include two more standard mechanisms for defining partial functions:
• (purely functional) iteration: given a computation $\Gamma, x \;:\; \tau \vdash{t} \;:\; \tau \,{\mathop{\sqcup }}\,\sigma$ to iterate and a starting value $\Gamma \vdash{s} \;:\; \tau$, we have a computation $\Gamma \vdash \mathbf{iterate}\,{t}\,\mathbf{from}\,{x}={s} \;:\; \sigma$, which repeatedly calls $t$, starting from the value of $s$, until the result lies in $\sigma$;
• recursion: given a computation $\Gamma, x:\tau \to \sigma \vdash{t}:\tau \to \sigma$, we have a program $\Gamma \vdash \mu{x}.{t}:\tau\,{\rightarrow }\,\sigma$ that recursively unfolds to $\mathbf{let}\,x=\,\mu x.t\,\mathbf{in}\,t$; note that we can define iteration with recursion.
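The two mechanisms above can be sketched in Python, with sum types encoded as tagged pairs. The tags `"inl"`/`"inr"`, the function names, and the halving example are hypothetical choices made for this illustration.

```python
def iterate(t, s):
    """`iterate t from x = s`: call t until its result lies in the right summand."""
    state = s
    while True:
        tag, value = t(state)
        if tag == "inr":   # a result of type sigma: stop and return it
            return value
        state = value      # a result of type tau: feed it back into t


def iterate_via_recursion(t, s):
    """The same construct defined with recursion instead of a loop."""
    tag, value = t(s)
    return value if tag == "inr" else iterate_via_recursion(t, value)


def halve_step(x):
    """One step of the example loop: halve x until it drops below 1."""
    return ("inr", x) if x < 1.0 else ("inl", x / 2.0)
```

Both `iterate(halve_step, 8.0)` and `iterate_via_recursion(halve_step, 8.0)` pass through 4, 2, 1 and 0.5 before stopping. If the step function never returns a right injection, the computation diverges, which is the source of partiality here.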
Dual numbers forward AD code transform
Let us assume that we have programs $\partial _i\mathrm{op} (x_1,\ldots, x_n)$ that compute the $i$-th partial derivative of each $n$-ary primitive operation $\mathrm{op}$. For example, we can define $\partial _1(\!*\!)(x_1,x_2)=x_2$ and $\partial _2(\!*\!)(x_1,x_2)=x_1$. Then, we can define a very straightforward forward mode AD code transformation $\mathcal{D}$ by replacing all primitive types $\mathbf{real}$ by a pair $\mathcal{D}\;({\mathbf{real}})\stackrel{\mathrm{def}}{=} \mathbf{real}\,{\mathop{\times }}\,\mathbf{real}$ of reals and by replacing all constants $\underline{c}$, $n$-ary primitive operations $\mathrm{op}$ and sign function $\mathbf{sign}\,$ in the program as follows (Footnote 2):
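For orientation, the standard dual numbers rules take the following shape. This is a sketch under the assumption that the transformation pairs each real with its tangent in the usual way; the bound variable names $x_i, x_i'$ and the projection $\mathbf{fst}$ are notation introduced here and may differ from the authors' presentation.

$$\begin{aligned}
\mathcal{D}(\underline{c}) &\stackrel{\mathrm{def}}{=} \langle \underline{c}, \underline{0}\rangle\\
\mathcal{D}(\mathrm{op}(t_1,\ldots,t_n)) &\stackrel{\mathrm{def}}{=} \mathbf{case}\,\mathcal{D}(t_1)\,\mathbf{of}\,\langle x_1,x_1'\rangle\to\cdots\to\mathbf{case}\,\mathcal{D}(t_n)\,\mathbf{of}\,\langle x_n,x_n'\rangle\to\\
&\qquad\langle \mathrm{op}(x_1,\ldots,x_n),\; x_1' * \partial_1\mathrm{op}(x_1,\ldots,x_n)+\cdots+x_n' * \partial_n\mathrm{op}(x_1,\ldots,x_n)\rangle\\
\mathcal{D}(\mathbf{sign}\,t) &\stackrel{\mathrm{def}}{=} \mathbf{sign}\,(\mathbf{fst}\,\mathcal{D}(t))
\end{aligned}$$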
We extend $\mathcal{D}$ to all other types and programs in the unique homomorphic (structure-preserving) way, by using structural recursion. So, for example, $\mathcal{D}\;{(\tau \to \sigma )}\stackrel{\mathrm{def}}{=} \mathcal{D}\;{(\tau )}\to \mathcal{D}\;{(\sigma )}$, $\mathcal{D}\;{(x)}\stackrel{\mathrm{def}}{=} x$, $\mathcal{D}\;{(\mathbf{let}\,x=\,t\,\mathbf{in}\,s)}={\mathbf{let}\,x} ={\mathcal{D}\;{(t)}} \,\mathbf{in}\,{\mathcal{D}\;{(s)}}$, and $\mathcal{D}\;{({t}\,{s})}=\mathcal{D}\;{(t)}\,\mathcal{D}\;{(s)}$. We like to think of $\mathcal{D}$ as a structure-preserving functor $\mathcal{D}:\mathbf{Syn}\,{\rightarrow }\,\mathbf{Syn}$ on the syntax.
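The effect of the transformation can be imitated in Python by running a program on pairs of a value and a tangent. The sketch below is not the paper's code transformation itself but a runtime analogue of it; the class `Dual`, the helper names, and the example program `f` are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    primal: float   # the value of the original program
    tangent: float  # the derivative carried alongside it


def const(c):
    """A constant has zero tangent."""
    return Dual(c, 0.0)


def add(a, b):
    # both partial derivatives of (+) are 1
    return Dual(a.primal + b.primal, a.tangent + b.tangent)


def mul(a, b):
    # partial derivatives of (*): d1 = x2 and d2 = x1
    return Dual(a.primal * b.primal,
                a.tangent * b.primal + b.tangent * a.primal)


def log(a):
    # log is a partial operation, defined on the open set (0, inf)
    if a.primal <= 0:
        raise ValueError("log is undefined outside (0, inf)")
    return Dual(math.log(a.primal), a.tangent * (1.0 / a.primal))


def f(x):
    """Example program x * x + log(x), written against the dual operations."""
    return add(mul(x, x), log(x))
```

Seeding the input with tangent 1, `f(Dual(2.0, 1.0))` carries the tangent `2 * 2.0 + 1 / 2.0 = 4.5`, which is the derivative of $x^2+\log x$ at $x=2$. Higher-order functions, loops, and recursion in the host language need no special treatment, reflecting the homomorphic extension of $\mathcal{D}$.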
Semantics
To formulate correctness of the AD transformation $\mathcal{D}$ , we need to assign a formal denotational semantics $[\![-]\!]$ to our language. We use the standard interpretation of types $\tau$ as $\omega$ -cpos $[\![\tau ]\!]$ (partially ordered sets with suprema of countable chains) and programs $x_1:\tau _1,\ldots, x_n:\tau _n\vdash{t}:\sigma$ as monotone $\omega$ -continuous partial functions $[\![t]\!] :[\![\tau _1]\!] \times \cdots \times [\![\tau _n]\!] \rightharpoonup [\![\sigma ]\!]$ . We interpret $\mathbf{real}$ as the discrete $\omega$ -cpo $[\![\mathbf{real}]\!] \stackrel{\mathrm{def}}{=} \mathbb{R}$ of real numbers, in which $r\leq r'$ if and only if $r=r'$ . We interpret $\underline{c}$ as the constant $[\![\underline{c}]\!] \stackrel{\mathrm{def}}{=} c\in \mathbb{R}$ . We interpret $\mathrm{op}$ as the partial differentiable function $[\![\mathrm{op} (x_1,\ldots, x_n)]\!] :\mathbb{R}^n\rightharpoonup \mathbb{R}$ that it is intended to implement. And, finally, we interpret $\mathbf{sign}\,$ as the partial function $[\![\mathbf{sign}\,(x)]\!] :\mathbb{R}\rightharpoonup \mathbf{1} \sqcup \mathbf{1}$ that sends $r\lt 0$ to the left copy of $\mathbf{1}$ and $r\gt 0$ to the right copy and is undefined for $r=0$ . Having fixed these definitions, the rest of the semantics is entirely compositional and standard. In particular, we interpret iteration and recursion using Kleene’s fixpoint theorem. We think of this semantics as a structure preserving functor $[\![-]\!] :\mathbf{Syn}\,{\rightarrow }\,\boldsymbol{\omega } \mathbf{Cpo}$ from the syntax to the category of $\omega$ -cpos and monotone $\omega$ -continuous functions.
Correctness statement
Having defined a semantics, we can phrase what it means for $\mathcal{D}$ to be correct. We prove the following, showing that $\mathcal{D}\;{(t)}$ implements the usual calculus derivative $D[\![t]\!]$ of $[\![t]\!]$ .
Theorem 2.1 (Forward AD correctness, Theorem 9.1 with $k=1$ in main text). For any program $x:\tau \vdash{t}:\sigma$ for $\tau =\mathbf{real}^{m},\sigma =\mathbf{real}^l$ (where we write $\mathbf{real}^n$ for the type $\mathbf{real}\,{\mathop{\times }}\,\cdots \,{\mathop{\times }}\,\mathbf{real}$ of length $n$ tuples of reals), we have that $[\![t]\!]$ is differentiable on its domain and
for any $(x_1,\ldots, x_m)$ in the domain of definition of $[\![t]\!]$ and any tangent vector $(v_1,\ldots, v_m)$ to $[\![\tau ]\!]$ at $x$ .
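As a small worked instance of the shape of this statement (an illustration chosen here, not an example from the paper), take $m=2$, $l=1$, and the program $t=x_1/x_2$, assuming the partial derivatives $\partial_1(/)(x_1,x_2)=1/x_2$ and $\partial_2(/)(x_1,x_2)=-x_1/x_2^2$. Then $[\![t]\!]$ is defined and differentiable on the open set where $x_2\neq 0$, and on that set

$$[\![\mathcal{D}(t)]\!]((x_1,v_1),(x_2,v_2)) = \left(\frac{x_1}{x_2},\; \frac{v_1}{x_2}-\frac{x_1 v_2}{x_2^2}\right) = \left([\![t]\!](x_1,x_2),\; D[\![t]\!](x_1,x_2)(v_1,v_2)\right).$$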
Importantly, the program $t$ might use higher-order functions, iteration, recursion, etc. In fact, we also establish the theorem above for general types $\tau$ and $\sigma$ not containing function types, but its phrasing requires slight bookkeeping that might distract from the simplicity of the theorem.
A proof via logical relations
The proof of the correctness theorem follows a logical relations argument that we found using categorical methods, but which can be phrased entirely in elementary terms. Let us fix some $n\in \mathbb{N}$ . We define for all types $\tau$ of our language, by induction, relations $T^{n}_{\tau }\subseteq (\mathbb{R}^n\to [\![\tau ]\!] )\times ((\mathbb{R}^n\times \mathbb{R}^n)\to [\![\mathcal{D}\;{(\tau )}]\!] )$ and $P^{n}_{\tau }\subseteq (\mathbb{R}^n\rightharpoonup{[\![\tau ]\!] })\times ((\mathbb{R}^n\times \mathbb{R}^n)\rightharpoonup{[\![\mathcal{D}\;{(\tau )}]\!] })$ that relate a (partial) $n$ -curve to its derivative $n$ -curve:
We then prove the following “fundamental lemma,” using induction on the typing derivation of $t$ :
If $x_1:\tau _1,\ldots, x_n:\tau _n\vdash{t}:\sigma$ and for $1\leq i\leq n$ , $(f_i, f^{\prime}_i)\in T^{n}_{\tau _i}$ , then
$(x\mapsto [\![t]\!] (f_1(x),\ldots, f_n(x)), (x,v)\mapsto [\![\mathcal{D}\;{(t)}]\!] (f^{\prime}_1(x,v),\ldots, f^{\prime}_n(x,v)))\in P^{n}_{\sigma }$ .
For example, we use that, by assumption, $[\![\partial _i\mathrm{op} (x_1,\ldots, x_n)]\!]$ equals the $i$ -th partial derivative of $[\![\mathrm{op} (x_1,\ldots, x_n)]\!]$ combined with the chain rule, to show that primitive operations $\mathrm{op}$ respect the logical relations. Crucial features to enable the inductive steps for iteration and recursion in the proof of the fundamental lemma are that $T^n_{\mathbf{real}}$ and $P^n_{\tau }$ are closed under suprema of countable chains and that $P^n_{\tau }$ contains the least element.
As $T^{k}_{\mathbf{real}^k}$ contains, in particular, $(\textrm{id}, ((x_1,\ldots, x_k), (v_1,\ldots, v_k))\mapsto ((x_1,v_1),\ldots, (x_k,v_k)))$, our correctness theorem follows.
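To illustrate the primitive operation case of the fundamental lemma, here is a sketch for multiplication. It assumes that $T^n_{\mathbf{real}}$ relates a differentiable $f$ to $f'=(x,v)\mapsto (f(x), Df(x)(v))$, which is our reading of "relating an $n$-curve to its derivative $n$-curve"; the names $f$ and $g$ are introduced for this example. For $(f,f'),(g,g')\in T^n_{\mathbf{real}}$, we compute

$$\begin{aligned}
[\![\mathcal{D}(x_1 * x_2)]\!](f'(x,v),g'(x,v)) &= \left(f(x)\,g(x),\; Df(x)(v)\,g(x)+Dg(x)(v)\,f(x)\right)\\
&= \left((f\cdot g)(x),\; D(f\cdot g)(x)(v)\right),
\end{aligned}$$

where the first step uses $\partial_1(\!*\!)(x_1,x_2)=x_2$ and $\partial_2(\!*\!)(x_1,x_2)=x_1$, and the second is the product rule (an instance of the chain rule). Hence the resulting pair again lies in $T^n_{\mathbf{real}}$.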
Extending to recursive types via a novel categorical logical relations technique
Next, we extend our language with ML-style polymorphism and recursive types. That is, we allow the formation of types $\tau$ with free type variables $\alpha$ , and we include a type variable binder $\boldsymbol{\mu }\alpha .\tau$ , which binds $\alpha$ in $\tau$ and computes a canonical fixpoint of $\alpha \mapsto \tau$ . We extend our AD transformation homomorphically on terms and types. For example, on types, we define
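Since the extension is homomorphic, the type-level clauses presumably take the following form (a sketch of what "homomorphically" means for the new type formers, stated as an assumption rather than as the authors' exact definition):

$$\mathcal{D}(\alpha)\stackrel{\mathrm{def}}{=}\alpha,\qquad \mathcal{D}(\boldsymbol{\mu}\alpha.\tau)\stackrel{\mathrm{def}}{=}\boldsymbol{\mu}\alpha.\mathcal{D}(\tau).$$

Under this reading, the type of lists of reals is sent to the type of lists of pairs of reals: $\mathcal{D}(\boldsymbol{\mu}\alpha.\mathbf{1}\,{\mathop{\sqcup}}\,\alpha\,{\mathop{\times}}\,\mathbf{real})=\boldsymbol{\mu}\alpha.\mathbf{1}\,{\mathop{\sqcup}}\,\alpha\,{\mathop{\times}}\,(\mathbf{real}\,{\mathop{\times}}\,\mathbf{real})$.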
A type $\tau$ with $n$ free type variables gets interpreted in our $\omega$ -cpo-semantics as an $n$ -ary mixed-variance endofunctor $[\![\tau ]\!]$ on the category of $\omega$ -cpos and partial morphisms that restricts to that of $\omega$ -cpos and total morphisms. Programs with types that have free variables get interpreted as (di)natural transformations. As the category of $\omega$ -cpos and partial morphisms has the structure to interpret recursive types, we have a canonical minimal invariant
for the mixed-variance endofunctors $[\![\tau ]\!]$ on $\boldsymbol{\omega } \mathbf{Cpo}$ that types $\tau$ denote (Levy 2012). We interpret $[\![\boldsymbol{\mu }\alpha .\tau ]\!] \stackrel{\mathrm{def}}{=} \mu [\![\tau ]\!]$.
To extend the correctness proof to this larger language, we would like to define the logical relation
That is, we would like to be able to define relations using type recursion. If we can do so, then extending the proof of the fundamental lemma is straightforward. We can then establish the correctness theorem also for $\tau$ and $\sigma$ that involve recursive types.
The traditional method is to follow the technical recipes of Pitts (1996). Instead, we develop a powerful new logical relations technique for recursive types, which we believe to be more conceptually clear and easier to use in situations like ours. To be precise, we prove a general result saying that under mild conditions, we can interpret recursive types in the category of logical relations over a category that models recursive types itself. For simplicity, we state an important special case that we need for our application here.
Given a right adjoint $\boldsymbol{\omega } \mathbf{Cpo}$-enriched functor $G:\boldsymbol{\omega } \mathbf{Cpo}^n\,{\rightarrow }\, \boldsymbol{\omega } \mathbf{Cpo}$ (such as $G(X)=\boldsymbol{\omega } \mathbf{Cpo}^n(Y, X)$ for some $Y\in \boldsymbol{\omega } \mathbf{Cpo}^n$), consider the category $\mathbf{SScone}$ of $n$-ary logical relations, which has objects $(T, X)$, where $X\in \boldsymbol{\omega } \mathbf{Cpo}^n$ and $T$ is a (full) sub-$\omega$-cpo of $GX$, and whose morphisms $(T,X)\,{\rightarrow }\, (T',X')$ are $\boldsymbol{\omega } \mathbf{Cpo}^n$-morphisms $f:X\,{\rightarrow }\, X'$ such that $y\in T$ implies $Gf(y)\in T'$.
Theorem 2.2 (Logical relations for recursive types, special case of Theorem 10.14 in main text). Let $L$ be a strong monad on $\mathbf{SScone}$ that lifts the usual partiality monad ${(-)}_{\bot }$ on $\boldsymbol{\omega } \mathbf{Cpo}^n$ along the projection functor $\mathbf{SScone}\,{\rightarrow }\, \boldsymbol{\omega } \mathbf{Cpo}^n$ . We assume that $L$ takes the initial object to the terminal one and that $G(\eta _X)^{-1}(L(T,X))=T$ , where we write $\eta _X:X\,{\rightarrow }\,{X}_{\bot }$ for the unit of the partiality monad on $\boldsymbol{\omega } \mathbf{Cpo}^n$ . Then, the Kleisli functor $\mathbf{SScone}\hookrightarrow \mathbf{SScone}_L$ gives a model for recursive types.
Spelled out in non-categorical terms, we are considering logical relations
and we require that the relation $P_{0}$ at the initial object (empty type) is precisely the singleton set $\{\bot \}$ (containing the least element) and that $G({[\![\tau ]\!] }\subseteq{{[\![\tau ]\!] }}_{\bot })^{-1}(P_{\tau })$ (which we think of as the total elements in $P_{\tau }$) coincides with $T_{\tau }$.
In particular, in our case, we work with binary logical relations ($n=2$) with
and the monad lifting
Consequently, we can define the logical relations $T_{\boldsymbol{\mu }\alpha .\tau }$ using type recursion, as desired.
Dual numbers reverse AD
Similarly to dual numbers forward AD $\mathcal{D}$ , we can define a reverse AD code transformation $\overleftarrow{\mathcal{D}}$ : we define $ \overleftarrow{\mathcal{D}}(\mathbf{real})\stackrel{\mathrm{def}}{=} \mathbf{real}\,{\mathop{\times }}\,\mathbf{vect}$ and
and extend homomorphically to all other type and term formers, as we did before. In fact, this algorithm is exactly the same as dual numbers forward AD in code with the only differences being that
(1) the type $\mathbf{real}$ of real numbers for tangents has been replaced with a new type $\mathbf{vect}$, which we think of as representing (dynamically sized) cotangent vectors to the global input of the program;
(2) the zero $\underline{0}$ and addition $(+)$ of type $\mathbf{real}$ have been replaced by the zero ${ 0}^{{\mathbf{v}}}$ and addition $({+}^{\mathbf{v}})$ of cotangents of type $\mathbf{vect}$;
(3) the multiplication $(\!*\!):\mathbf{real}\,{\mathop{\times }}\,\mathbf{real}\,{\rightarrow }\, \mathbf{real}$ has been replaced by the operation $(*^{{\mathbf{v}}}):\mathbf{vect} \,{\mathop{\times }}\, \mathbf{real}\,{\rightarrow }\,\mathbf{vect}$: $({v}*^{{\mathbf{v}}}{r})$ is the rescaling of a cotangent $v$ by the scalar $r$.
We write $\overline{e} _{i}$ for the program representing the $i$-th canonical basis vector $e_i$ of type $\mathbf{vect}$, and we write
We define $[\![\mathbf{vect} ]\!] \stackrel{\mathrm{def}}{=} \mathbb{R}^\infty \stackrel{\mathrm{def}}{=}\sum _{k=0}^\infty \mathbb{R}^k$ as the infinite (vector space) coproduct of $k$-dimensional real vector spaces. That is, we interpret $\mathbf{vect}$ as the type of dynamically sized real vectors (Footnote 3). We show that $\overleftarrow{\mathcal{D}}{(t)}$ implements the transposed derivative $D[\![t]\!] ^t$ of $[\![t]\!]$ in the following sense.
Theorem 2.3 (Reverse AD correctness, Theorem 9.1 with $k=\infty$ in main text). For any program $x:\tau \vdash{t}:\sigma$ for $\tau =\mathbf{real}^{s},\sigma =\mathbf{real}^l$ ,
for any $(x_1,\ldots, x_s)$ in the domain of definition of $[\![t]\!]$ .
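The three differences listed above can be made concrete with a naive Python sketch in which each real is paired with a dynamically sized cotangent vector. Everything here is an illustrative assumption: the class `RDual`, the tuple representation of $\mathbf{vect}$ (with missing trailing entries read as zero), and the helper names. It is not the efficient implementation discussed in the companion paper.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass(frozen=True)
class RDual:
    primal: float
    cotangent: tuple  # dynamically sized vector; missing entries count as 0


def vzero():
    """Zero cotangent (the empty vector)."""
    return ()


def vadd(u, v):
    """Addition of cotangents, padding the shorter vector with zeros."""
    return tuple(a + b for a, b in zip_longest(u, v, fillvalue=0.0))


def vscale(v, r):
    """Rescaling of a cotangent v by the scalar r."""
    return tuple(a * r for a in v)


def basis(i):
    """The i-th canonical basis vector (0-indexed), as short as possible."""
    return tuple(1.0 if j == i else 0.0 for j in range(i + 1))


def const(c):
    return RDual(c, vzero())


def add(a, b):
    return RDual(a.primal + b.primal, vadd(a.cotangent, b.cotangent))


def mul(a, b):
    # same shape as the forward rule, with (+) and (*) swapped for vadd and vscale
    return RDual(a.primal * b.primal,
                 vadd(vscale(a.cotangent, b.primal),
                      vscale(b.cotangent, a.primal)))


def gradient(f, xs):
    """Run f on inputs seeded with basis vectors; return (value, gradient)."""
    out = f(*[RDual(x, basis(i)) for i, x in enumerate(xs)])
    padded = vadd(out.cotangent, (0.0,) * len(xs))  # pad to the input length
    return out.primal, list(padded)
```

For the program $x\cdot y + x$ at $(3,4)$, `gradient(lambda x, y: add(mul(x, y), x), [3.0, 4.0])` yields the value `15.0` together with the gradient `[5.0, 3.0]`, that is, $(y+1, x)$.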
We prove this theorem again using a similar logical relations argument, defining $T^{n}_{\tau }\subseteq (\mathbb{R}^n\to [\![\tau ]\!] )\times ((\mathbb{R}^n\times{(\mathbb{R}^\infty )}^n)\to [\![\overleftarrow{\mathcal{D}}{(\tau )}]\!] )$ and $P^{n}_{\tau }\subseteq (\mathbb{R}^n\rightharpoonup{[\![\tau ]\!] })\times ((\mathbb{R}^n\times{(\mathbb{R}^\infty )}^n)\rightharpoonup{[\![\overleftarrow{\mathcal{D}}{(\tau )}]\!] })$ as before for all types $\tau$ of our language, setting
where we consider $(\mathbb{R}^\infty ){}^n$ as a type of linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^\infty$ and we write $e_i$ for the $i$ -th standard basis vector of $\mathbb{R}^n$ . We then prove the following “fundamental lemma,” using induction on the typing derivation of $t$ :
If $x_1:\tau _1,\ldots, x_n:\tau _n\vdash{t}:\sigma$ and, for $1\leq i\leq n$ , $(f_i, f^{\prime}_i)\in T^{n}_{\tau _i}$ , then $(x\mapsto [\![t]\!] (f_1(x),\ldots, f_n(x)), (x,L)\mapsto [\![\overleftarrow{\mathcal{D}}{(t)}]\!] (f^{\prime}_1(x,L),\ldots, f^{\prime}_n(x,L)))\in P^{n}_{\sigma }$ .
As $T^{s}_{\mathbf{real}^s}$ contains, in particular,
our theorem follows.
Extending to arrays
AD tends to be applied to programs that manipulate large arrays of reals. Seeing that such arrays are denotationally equivalent to lists $\boldsymbol{\mu }\alpha .\mathbf{1}\,{\mathop{\sqcup }}\,\alpha \,{\mathop{\times }}\,\mathbf{real}$, while only the computational complexity of operations differs, our correctness result also applies to functional languages with arrays. We thus differentiate array types ${\tau }[]$ with elements of type $\tau$ in the obvious structure-preserving way, for example
and similarly for dual numbers reverse AD.
Almost everywhere differentiability
Taking inspiration from Lee et al. (2020), Mazza and Pagani (2021), and Huot et al. (2023), we can increase our ambitions and show that, given sufficiently nice primitive operations, our AD methods compute correct derivatives almost everywhere for (almost everywhere) terminating programs in our language. In fact, a minor adaptation of our methods yields these results. Indeed, as long as we assume that all our (partial) primitive operations denote functions that are piecewise analytic under analytic partition (PAP) and are defined on a $c$-analytic subset (meaning: a countable union of analytic subsets) of $\mathbb{R}^n$, then we can simply redefine our logical relations
to conclude that
• any program $x:\tau \vdash{t}:\sigma$ for $\tau =\mathbf{real}^{m},\sigma =\mathbf{real}^l$ in our language denotes a partial PAP function $[\![t]\!]$ defined on a $c$-analytic subset;
• our AD transformation $\mathcal{D}\;{(t)}$ computes $\left ( [\![t]\!], g\right )$ for an intensional derivative $g$ of $[\![t]\!]$, which coincides almost everywhere in the domain with the (standard) derivative $D[\![t]\!]$ of $[\![t]\!]$.
Consequently, if our program terminates almost everywhere, the AD transformation computes the correct derivative almost everywhere.
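A standard toy example shows why "almost everywhere" cannot be strengthened to "everywhere" once conditionals are total. The Python sketch below assumes a total comparison `x > 0` in place of the partial $\mathbf{sign}$ construct (an assumption about the adapted setting made for this illustration), and the names `Dual`, `relu`, and `identity_via_relu` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    primal: float
    tangent: float


def sub(a, b):
    return Dual(a.primal - b.primal, a.tangent - b.tangent)


def neg(a):
    return Dual(-a.primal, -a.tangent)


def relu(a):
    # total conditional: the branch at x = 0 is the constant 0
    return a if a.primal > 0 else Dual(0.0, 0.0)


def identity_via_relu(a):
    """relu(x) - relu(-x), which denotes the identity function on the reals."""
    return sub(relu(a), relu(neg(a)))


def ad_derivative(x):
    """Derivative reported by dual numbers AD at the point x."""
    return identity_via_relu(Dual(x, 1.0)).tangent
```

The program denotes $x\mapsto x$, whose derivative is $1$ everywhere. Dual numbers AD reports `1.0` at every $x\neq 0$ but `0.0` at $x=0$, so it disagrees with the true derivative only on a set of measure zero.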
3. Overview
We briefly sketch the high-level plan of attack that we will follow in this paper. In this work, our guiding philosophy is to consider categorical models of functional PL as categories with a certain kind of structure:
• certain chosen types $\mathbf{real}$ and morphisms $\mathrm{op}$ for basic types of real numbers and primitive operations (such as $\cos$ and multiplication) between real numbers;
• finite products, to represent tuples;
• finite coproducts, to represent variants;
• exponentials, to build types of curried and higher-order functions;
• a (partiality) monad such that the Kleisli category supports certain morphism-level fixpoint operators to represent (call-by-value) iteration (while-loops) and recursion;
• certain object-level fixpoint operators to represent recursively defined types.
Due to its technical complexity, we isolate the discussion of the last feature (recursive types) in Section 10.
A crisp formulation for the last two bullet points above is hard to find in the literature. Therefore, we develop such a formulation in detail in Sections 4 and 10.2. We further, in Sections 5 and 10.6, show that there are particularly well-behaved models of these features if we have enrichment over $\omega$ -cpos ( $\omega$ -chain complete partial orders). All the models we consider, except for the syntax, will fall into this well-behaved class.
We will generally identify the syntax of a PL, up to $\beta \eta$-equality, with the freely generated (or initial) such category $\mathbf{Syn}$. We can then understand AD as a structure-preserving functor (preserving all the structure described above)
$$\mathcal{D}:\mathbf{Syn}\,{\rightarrow }\,\mathbf{Syn}$$
that sends $\mathbf{real}$ to a type of pairs $\mathbf{real}\times \mathbf{vect}$ (for storing both values and derivatives) and each primitive operation $\mathrm{op}$ to its derivative. We discuss this in Sections 6, 10.1, and 10.4.
In order to phrase the correctness of AD, we first need to fix the meaning of the programs in our language. That is sometimes done using an operational semantics that describes how programs are evaluated in time. Here, we work, instead, with a denotational semantics that systematically assigns spaces (in our case, $\omega$-cpos) to types and mathematical functions (in our case, $\omega$-continuous, monotone functions) to programs. We can understand such a denotational semantics as a structure-preserving functor to the category $\boldsymbol{\omega } \mathbf{Cpo}$ of $\omega$-cpos:
$$[\![-]\!] :\mathbf{Syn}\,{\rightarrow }\,\boldsymbol{\omega } \mathbf{Cpo}$$
which sends $\mathbf{real}$ to the real numbers $\mathbb{R}$ and each primitive operation $\mathrm{op}$ to the function $[\![\mathrm{op} ]\!]$ it intends to implement. Importantly, we are now in a position to phrase a correctness theorem for AD by relating the semantics of an AD-transformed program to the mathematical derivative of that program. We discuss this in Sections 7 and 10.7.
Our proof strategy for this correctness theorem is a logical relations proof, which we again phrase categorically. Given a functor $G:\boldsymbol{\omega } \mathbf{Cpo}^n\,{\rightarrow }\, \boldsymbol{\omega } \mathbf{Cpo}$ , we can consider the category $\mathbf{SScone}$ of $n$ -ary logical relations, whose objects are pairs $(X, T)$ , where $X\in \boldsymbol{\omega } \mathbf{Cpo}^n$ and $T$ is a (full) sub- $\omega$ -cpo of $GX$ , and whose morphisms $(X,T)\,{\rightarrow }\, (X',T')$ are $\boldsymbol{\omega } \mathbf{Cpo}^n$ -morphisms $f:X\,{\rightarrow }\, X'$ such that $y\in T$ implies $Gf(y)\in T'$ . Our proof proceeds by making a sensible choice of $G$ (Section 9.1) and giving a new categorical semantics $ \overline{[\![\hspace{-2.5pt}[-]\!] \hspace{-2.5pt}]}_{} :\mathbf{Syn}\,{\rightarrow }\, \mathbf{SScone}$ in the category of logical relations, such that the following diagram commutes and such that its commutativity immediately implies the correctness of AD (Sections 9.4, 9.5, 9.6, and 9.7):
How do we construct such a semantics though? For that, we need to show that the category of logical relations has all the structure needed to interpret our language. That is:
-
• we show that products, coproducts, exponentials, and morphism-level fixpoint operators for iteration and recursion exist in our category of logical relations (Sections 8 and 9.2);
-
• we show that object-level fixpoint operators for recursive types exist in our category of logical relations (Section 10.8);
-
• we choose a sensible logical relation $ \overline{[\![\hspace{-2.5pt}[\mathbf{real}]\!] \hspace{-2.5pt}]}_{}$ for $\mathbf{real}$ to precisely capture correct differentiation, and we demonstrate that each primitive operation respects the chosen logical relations (Section 9.3).
The only choices that need to be made to construct this interpretation are the choice of logical relations associated with $\mathbf{real}$ and with partial functions (in the form of a lifting of the partiality monad to logical relations). All other required structure is unique thanks to a universal property. Further, the commutativity of the diagram above follows automatically from the initiality of $\mathbf{Syn}$ among all categorical models.
We believe that our methods for interpreting morphism- and object-level fixpoint combinators in categories of logical relations can be a simplification compared to existing methods. We therefore aim to present them in a reusable way.
4. Categorical Models for CBV Languages: $CBV$ Pairs and Models
The aim of this section is to establish a class of categorical models for call-by-value (CBV) languages with free notions of recursion and iteration. This material will be of importance as we will later consider particular examples of such models constructed from (1) the syntax of PL, (2) a concrete denotational semantics for PL in terms of $\omega$ -cpos, and (3) that concrete semantics decorated with logical relations to enable correctness proofs of AD.
Given a cartesian closed category $\mathcal{V}$ , we can see it as a $\mathcal{V}$ -enriched category w.r.t. the cartesian structure. Recall that a strong monad $\mathcal{T}$ on a cartesian closed category $\mathcal{V}$ is the same as a $\mathcal{V}$ -monad on $\mathcal{V}$ . More precisely, it is a triple $\mathcal{T} = \left ( T, \mathrm{m}, \eta \right )$
where $T$ is a $\mathcal{V}$ -endofunctor and $\mathrm{m}, \eta$ are $\mathcal{V}$ -natural transformations, satisfying the usual associativity and identity equations, that is to say, $\mathrm{m}\cdot \left ( \mathrm{m} T\right ) = \mathrm{m} \cdot \left ( T\mathrm{m} \right )$ and $\mathrm{m}\cdot \left ( \eta T \right ) = \mathrm{id} _{T}= \mathrm{m}\cdot \left ( T\eta \right )$ .Footnote 4
Let $\mathcal{T} = \left ( T, \mathrm{m}, \eta \right )$ and $\mathcal{T}^{\;\,\prime} = \left ( T', \mathrm{m} ', \eta ' \right )$ be monads on $\mathcal{V}$ and $\mathcal{V}\,'$ , respectively. Recall that an oplax morphism (or a monad op-functor) between $\mathcal{T}$ and $\mathcal{T}^{\;\,\prime}$ is a pair
where $H$ is a functor and ${\phi}$ is a natural transformation, such that
By the universal property of Kleisli categories, denoting by $J\;:\; \mathcal{V}\,{\rightarrow }\, \mathcal{C}$ and $J '\;:\; \mathcal{V}\,'\,{\rightarrow }\, \mathcal{C} '$ the universal Kleisli functors, the oplax morphisms (4) correspond bijectively with pairs of functors $\left ( H \;:\; \mathcal{V}\,{\rightarrow }\, \mathcal{V}\,', \overline{H} \;:\; \mathcal{C}\,{\rightarrow }\, \mathcal{C} ' \right )$ such that the diagram (7) commutes.
Definition 4.1 ( $CBV$ pair). A $CBV$ pair is a pair $\left (\mathcal{V}, \mathcal{T}\, \right )$ where $\mathcal{V}$ is a bicartesian closed category (i.e., a cartesian closed category with finite coproducts) and $\mathcal{T}$ is a $\mathcal{V}$ -monad on $\mathcal{V}$ . We further require that $\mathcal{V}$ has chosen finite products, coproducts, and exponentials.
A $CBV$ pair morphism between the $CBV$ pairs $\left (\mathcal{V}, \mathcal{T}\, \right )$ and $\left (\mathcal{V}\,', \mathcal{T}^{\;\,\prime} \right )$ is a strictly bicartesian closed functor $H:\mathcal{V}\,{\rightarrow }\, \mathcal{V}\,'$ that is a strict morphism between $\mathcal{T}$ and $\mathcal{T}^{\;\,\prime}$ , that is, such that $HT = T'H$ and $\left ( H, \mathrm{id} \right )$ defines a monad op-functor (4). This defines a category of $CBV$ pairs and $CBV$ pair morphisms, denoted herein by $\mathfrak{C}_{\mathtt{p}}$ .
Remark 4.2. If $\left ( \mathcal{V}, \mathcal{T}\, \right )$ is a $CBV$ pair, since $\mathcal{T}$ is $\mathcal{V}$ -enriched, we get a $\mathcal{V}$ -enriched Kleisli category $\mathcal{C}$ . We denote by
the $\mathcal{V}$ -enriched hom functor. It should be noted that, if we denote by $\left ( X \Rightarrow Y \right ) = \mathcal{V}\left [{X},{Y}\right ]$ the exponential in $\mathcal{V}$ , we have that $\mathcal{C}\left [{X},{Y}\right ] = \left ( X \Rightarrow ^k Y \right ) = \left ( X \Rightarrow TY \right )$ , which is the so-called Kleisli exponential and corresponds to the function types for our language.
Denoting by $\mathcal{C}$ and $\mathcal{C} '$ the respective Kleisli categories, each morphism
of $CBV$ pairs gives rise to a commutative square
where $J$ and $J '$ are, respectively, the universal Kleisli functors of $\mathcal{T}$ and $\mathcal{T}^{\;\,\prime}$ . In this case, $\overline{H}$ strictly preserves Kleisli exponentials, finite coproducts, and the action of $\mathcal{V}$ on $\mathcal{C}$ . That is to say, $\left ( H, \overline{H}\right )$ strictly preserves the distributive closed Freyd-categorical structure.Footnote 5
4.1 $CBV$ models: term recursion and iteration
In order to interpret our language defined in Section 6, we need an additional support for term recursion and iteration. Since we do not impose further equations for the iteration or recursion constructs in our language, the following definitions establish our class of models for term recursion and iteration. In contrast with most other references we are aware of, we give an explicit discussion of the case of iteration, even though it is definable in terms of recursion. The reason is that there are interesting PL with iteration but without recursion, of which we might want to prove properties.
Definition 4.3 (Free recursion and iteration). Let $\left ( \mathcal{V}, \mathcal{T}\, \right )$ be a $CBV$ pair and $ \mathcal{C}$ the corresponding $\mathcal{V}$ -enriched Kleisli category.
-
• A free recursion for $\left ( \mathcal{V}, \mathcal{T}\, \right )$ is a family of morphisms
\begin{equation} \mu = \left ( \mu ^{W,Y} \;:\; \mathcal{V}\left[{\mathcal{C}\left [ {W},{Y} \right ] },{\mathcal{C}\left [ {W},{Y} \right ]}\right] \longrightarrow \mathcal{C}\left [{W},{Y}\right ]\right ) _{(W,Y)\in \mathcal{C}\times \mathcal{C}} \tag{8} \end{equation}
in $\mathcal{V}$ .
-
• A free iteration for $\left ( \mathcal{V}, \mathcal{T}\, \right )$ is a family of morphisms
\begin{equation} {\mathsf{itt}} = \left ({\mathsf{itt}}^{W,Y} \;:\; \mathcal{C}\left [{W},{W \sqcup Y}\right ] \longrightarrow \mathcal{C}\left [ { W },{ Y } \right ]\right ) _{(W,Y)\in \mathcal{C}\times \mathcal{C} } \tag{9} \end{equation}
in $\mathcal{V}$ .
Definition 4.4 ( $CBV$ model). A $CBV$ model is a quadruple $\left ( \mathcal{V}, \mathcal{T}, \mu, {\mathsf{itt}} \right )$ in which $\left ( \mathcal{V}, \mathcal{T}\right )$ is a $CBV$ pair, $\mu$ is a free recursion, and ${\mathsf{itt}}$ is a free iteration for $\left ( \mathcal{V}, \mathcal{T}\right )$ .
A $CBV$ model morphism between the $CBV$ models $\left ( \mathcal{V}, \mathcal{T}, \mu, {\mathsf{itt}} \right )$ and $\left ( \mathcal{V}\,', \mathcal{T}^{\;\,\prime}, \mu ', {\mathsf{itt}} ' \right )$ is a morphism $H$ between the underlying $CBV$ pairs such that $ H \left ( \mu ^{W,Y} \right ) = \mu '^{HW, HY}$ and $ H \left ({\mathsf{itt}} ^{W, Y} \right ) ={\mathsf{itt}} '^{HW, HY}$ , for any $\left ( W, Y\right )\in \mathcal{V}\times \mathcal{V}$ . This defines a category of $CBV$ models, denoted herein by $\mathfrak{C}_{\mathcal{BV}}$ .
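It should be noted that $\mathfrak{C}_{\mathcal{BV}}$ has finite products. Given two $CBV$ models $\left ( \mathcal{V} _0, \mathcal{T} _ 0, \mu _0, {\mathsf{itt}} _0 \right )$ and $\left ( \mathcal{V} _1, \mathcal{T} _ 1, \mu _1, {\mathsf{itt}} _1 \right )$ , the product is given by $\left ( \mathcal{V} _0\times \mathcal{V} _1, \mathcal{T} _0\times \mathcal{T} _1, \mu _0\times \mu _1, {\mathsf{itt}} _0\times{\mathsf{itt}} _1 \right )$ ,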
where $\left ( \mu _0 \times \mu _1\right )^{\left ( W, W'\right ), \left ( Y, Y' \right ) } = \mu _0 ^{W,Y}\times \mu _1 ^{W',Y'}$ and $\left ({\mathsf{itt}} _0 \times{\mathsf{itt}} _1\right )^{\left ( W, W'\right ), \left ( Y, Y' \right ) } = \left ({\mathsf{itt}} _0 ^{W,Y}\times{\mathsf{itt}} _1 ^{W',Y'} \right )$ .
5. Canonical Fixed Points from 2-Dimensional Structure
The aim of this section is to specialize to a class of particularly well-behaved $CBV$ pairs and models, as they possess canonical iteration and recursion constructs that arise from a universal property of being a least fixed point. To phrase this universal property (and thus obtain uniqueness), we need the homsets of our models to be categories (or, posets, if we do not care to distinguish different ways of comparing morphisms), leading us to consider $\mathcal{V}$ that are $\mathbf{Cat}$ - or $\mathbf{Pos}$ -enriched. To obtain existence of these fixed points, it is sufficient to have colimits of countable chains, leading us to specialize to $\mathcal{V}$ that are enriched over $\omega$ -cocomplete categories or posets.
We denote by $ \boldsymbol{\omega } \mathbf{Cpo}$ the usual category of $\omega$ -cpos. The objects of $ \boldsymbol{\omega } \mathbf{Cpo}$ are the partially ordered sets with colimits of $\omega$ -chains, while the morphisms are functors preserving these colimits. An $\omega$ -cpo is called pointed if it has a least element, denoted herein by $\bot$ . We say that $f\in \boldsymbol{\omega } \mathbf{Cpo}\left ( W, Y\right )$ is a pointed $\boldsymbol{\omega } \mathbf{Cpo}$ -morphism if $W$ is pointed and $f$ preserves the least element.
It is well known that $ \boldsymbol{\omega } \mathbf{Cpo}$ is bicartesian closed.Footnote 6 We consider $ \boldsymbol{\omega } \mathbf{Cpo}$ -enriched categories w.r.t. the cartesian structure. Henceforth, if $\mathcal{V}$ is an $\boldsymbol{\omega } \mathbf{Cpo}$ -enriched category and $W, Y$ are objects of $\mathcal{V}$ , we denote by $\mathcal{V}\left ( W, Y\right )$ the $ \boldsymbol{\omega } \mathbf{Cpo}$ -enriched hom, that is to say, the $\omega$ -cpo of morphisms between $W$ and $Y$ .
An $\boldsymbol{\omega } \mathbf{Cpo}$ -category $\mathcal{V}$ has finite $\boldsymbol{\omega } \mathbf{Cpo}$ -products if it has a terminal object and we have a natural isomorphism of $\omega$ -cpos $\mathcal{V}\left ( Z, W\times Y\right )\,{\cong}\, \mathcal{V}\left ( Z, W\right )\times \mathcal{V}\left ( Z, Y\right )$
or, equivalently, if it has finite products and tupling is an $\boldsymbol{\omega } \mathbf{Cpo}$ -morphism. Dually, an $\boldsymbol{\omega } \mathbf{Cpo}$ -category $\mathcal{V}$ has finite $\boldsymbol{\omega } \mathbf{Cpo}$ -coproducts if it has an initial object and we have a natural isomorphism of $\omega$ -cpos $\mathcal{V}\left ( W\sqcup Y, Z\right )\,{\cong}\, \mathcal{V}\left ( W, Z\right )\times \mathcal{V}\left ( Y, Z\right )$
or, equivalently, if it has finite coproducts and cotupling is an $\boldsymbol{\omega } \mathbf{Cpo}$ -morphism. We say that an $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $F:\mathcal{V}\,{\rightarrow }\, \mathcal{V}\,'$ has an $\boldsymbol{\omega } \mathbf{Cpo}$ -right adjoint $U:\mathcal{V}\,'\,{\rightarrow }\, \mathcal{V}$ if we have a natural isomorphism of $\omega$ -cpos $\mathcal{V}\,'\left ( FZ, W'\right )\,{\cong}\, \mathcal{V}\left ( Z, UW'\right )$
or, equivalently, if it has a right adjoint functor $U:\mathcal{V}\,'\,{\rightarrow }\, \mathcal{V}$ such that the homset bijection $\mathcal{V}\,'(FZ, W')\,{\rightarrow }\, \mathcal{V}(Z, UW')$ is an $\boldsymbol{\omega } \mathbf{Cpo}$ -morphism. An $\boldsymbol{\omega } \mathbf{Cpo}$ -category $\mathcal{V}$ is $\boldsymbol{\omega } \mathbf{Cpo}$ -cartesian closed if $\mathcal{V}$ has finite $\boldsymbol{\omega } \mathbf{Cpo}$ -products, and moreover, for each object $Z\in \mathcal{V}$ , the $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $\left (Z\times -\right ) \;:\; \mathcal{V}\,{\rightarrow }\,\mathcal{V}$ has a right $\boldsymbol{\omega } \mathbf{Cpo}$ -adjoint $\mathcal{V}\left [{Z},{-}\right ]$ , called, herein, the $\boldsymbol{\omega } \mathbf{Cpo}$ -exponential of $Z$ . An $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $H \;:\; \mathcal{V}\,{\rightarrow }\, \mathcal{V}\,'$ is strictly $\boldsymbol{\omega } \mathbf{Cpo}$ -cartesian closed if it strictly preserves the $\boldsymbol{\omega } \mathbf{Cpo}$ -products and the induced comparison $H\circ \mathcal{V}\left [{-},{-}\right ]\,{\rightarrow }\,{\mathcal{V}\,'}\left [{H(-)},{H(-)}\right ]$ is the identity.
Let $\mathcal{V}$ be $\boldsymbol{\omega } \mathbf{Cpo}$ -cartesian closed. For any $Z\in \mathcal{V}$ , since the hom functor $\mathcal{V}\left ({Z},{-}\right ) \;:\; \mathcal{V}\,{\rightarrow }\,\boldsymbol{\omega } \mathbf{Cpo}$ is cartesian, it induces the change of enriching base $2$ -functors
between the $2$ -categories of enriched categories w.r.t. the cartesian structures. Therefore, taking $Z = \mathsf{1}$ (the terminal object of $\mathcal{V}$ ), we get that every $\mathcal{V}$ -category ( $\mathcal{V}$ -functor/ $\mathcal{V}$ -monad) has a suitable underlying $\boldsymbol{\omega } \mathbf{Cpo}$ -category ( $\boldsymbol{\omega } \mathbf{Cpo}$ -functor/ $\boldsymbol{\omega } \mathbf{Cpo}$ -monad), given by its image by $\mathfrak{G}_{\boldsymbol{\omega } \mathbf{Cpo}} := \mathfrak{G}_{\mathcal{V}\left ({\mathsf{1}},{-}\right )}$ .
Definition 5.1 ( $CBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair). A $CBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair is a $CBV$ pair $\left ( \mathcal{V}, \mathcal{T}\, \right )$ in which $\mathcal{V}$ is an $ \boldsymbol{\omega } \mathbf{Cpo}$ -bicartesian closed category, such that $\mathcal{V}\left ({W},{TY}\right )$ is a pointed $\omega$ -cpo for any $(W,Y)\in \mathcal{V}\times \mathcal{V}$ .
A $CBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair morphism between $\left (\mathcal{V}, \mathcal{T}\, \right )$ and $\left (\mathcal{V}\,', \mathcal{T}^{\;\,\prime} \right )$ is an $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $H \;:\; \mathcal{V}\,{\rightarrow }\,\mathcal{V}\,'$ whose underlying functor yields a morphism between the $CBV$ pairs and such that $ H \;:\; \mathcal{V}\left ({W},{TY}\right )\,{\rightarrow }\, \mathcal{V}\,'\left ({HW},{HTY}\right )$ is a pointed $\boldsymbol{\omega } \mathbf{Cpo}$ -morphism for any $\left ( W, Y\right )\in{\mathsf{ob}\,\mathcal{V}} \times{\mathsf{ob}\,\mathcal{V}}$ . This defines a category of $CBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pairs, denoted herein by $\boldsymbol{\omega } \mathbf{CPO}\textrm{-}\mathfrak{C}_{\mathcal{BV}}$ .
There is, then, an obvious forgetful functor $\mathcal{U}_{\mathtt{p}} \;:\; \boldsymbol{\omega } \mathbf{CPO}\textrm{-}\mathfrak{C}_{\mathcal{BV}}\,{\rightarrow }\, \mathfrak{C}_{\mathtt{p}}$ .
5.1 Fixpoints: term recursion and iteration
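Recall that, if $A$ is a pointed $\omega$ -cpo and $q \;:\; A\,{\rightarrow }\, A$ is an endomorphism in $\boldsymbol{\omega } \mathbf{Cpo}$ , then $q$ has a least fixed point given by the colimit of the chain $\bot \leq q\left ( \bot \right ) \leq q^2\left ( \bot \right ) \leq \cdots \leq q^n\left ( \bot \right ) \leq \cdots$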
by Kleene’s fixpoint theorem. Given such an endomorphism, we denote by $\mathrm{lfp}\left ({q}\right )$ its least fixed point.
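As a minimal illustration of this construction, the following Python sketch computes a least fixed point by Kleene iteration. It assumes a pointed $\omega$ -cpo of finite height, namely partial functions on $\{0,\ldots,5\}$ represented as dictionaries and ordered by inclusion of graphs, so that the chain stabilizes after finitely many steps; in general the colimit is not reached in finite time. The names `lfp` and `factorial_step` are ours and purely illustrative.

```python
def lfp(q, bottom):
    """Follow the chain bottom <= q(bottom) <= q(q(bottom)) <= ... until it stabilises.

    Only terminates when the chain becomes constant after finitely many steps.
    """
    current = bottom
    while True:
        nxt = q(current)
        if nxt == current:
            return current
        current = nxt


def factorial_step(f):
    """Monotone endomorphism on partial functions {0..5} -> int (dicts ordered by inclusion)."""
    g = {0: 1}
    for n in range(1, 6):
        if n - 1 in f:
            g[n] = n * f[n - 1]
    return g


# The empty dictionary plays the role of the least element (the nowhere-defined function).
fact = lfp(factorial_step, {})
print(fact)
```

Each iteration extends the domain of definition by one argument, which mirrors how recursion is unfolded one step at a time along the chain.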
Henceforth, let $\left ( \mathcal{V}, \mathcal{T}\, \right )$ be a $CBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair and $J\;:\; \mathcal{V}\,{\rightarrow }\, \mathcal{C}$ the corresponding $\mathcal{V}$ -enriched universal Kleisli functor. We denote by $-\otimes - \;:\; \mathcal{V}\times \mathcal{C}\,{\rightarrow }\, \mathcal{C}$ the $\mathcal{V}$ -tensor product in $\mathcal{C}$ , also called the $\mathcal{V}$ -copower, which, in this case, corresponds to the usual action of $\mathcal{V}$ on $\mathcal{C}$ .
By hypothesis, for any $W,Y,Z\in \mathcal{V}$ , the $\omega$ -cpo $\mathcal{C} \left ( Z\otimes W, Y \right )\,{\cong}\, \mathcal{V}\left ( Z, \mathcal{C}\left [ W,Y \right ] \right )$ is pointed. Let us write $\Lambda _Z^{W,Y}$ for the isomorphism from left to right. Then, we can define
where $\partial _{Z} = \left (\mathrm{id}_ Z, \mathrm{id} _Z \right ) \;:\; Z\,{\rightarrow }\, Z\times Z$ is the diagonal morphism, $d_{Z,W,Y}:Z\otimes (W\sqcup Y)\,{\rightarrow }\, (Z\otimes W)\sqcup (Z\otimes Y)$ is the distributor, and $a_{Z,W,Y}:(Z\times W)\otimes Y\,{\rightarrow }\, Z\otimes (W\otimes Y)$ is the associator. Since the morphisms above are $\boldsymbol{\omega } \mathbf{Cpo}$ -natural in $Z\in \mathcal{V}$ , they give rise to the families of morphisms
by the Yoneda lemma, where $\mathsf{eval}_{ A, B } \;:\; \mathcal{V}\left [{A},{B}\right ] \times A\,{\rightarrow }\, B$ is the evaluation morphism given by the cartesian closed structure.
Lemma 5.2 (Underlying $CBV$ model). There is a forgetful functor $\mathcal{U}_{\mathcal{BV}} \;:\; \boldsymbol{\omega } \mathbf{CPO}\textrm{-}\mathfrak{C}_{\mathcal{BV}}\,{\rightarrow }\, \mathfrak{C}_{\mathcal{BV}}$ defined by $\mathcal{U}_{\mathcal{BV}} \left ( \mathcal{V}, \mathcal{T}\, \right ) = \left ( \mathcal{V}, \mathcal{T}, \mu _ \omega, \mathsf{it}_\omega \right )$ , taking every morphism $H$ to its underlying morphism of $CBV$ models.
Proof Since $H$ is a $\boldsymbol{\omega } \mathbf{Cpo}$ -functor and, for any $\left ( W, Y\right )\in{\mathsf{ob}\,\mathcal{V}}\times{\mathsf{ob}\,\mathcal{V}}$ ,
is a pointed $\boldsymbol{\omega } \mathbf{Cpo}$ -morphism, we get that, indeed, $H$ respects the free iteration and free recursion as defined in (15) and (16).
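It should be noted that, given $CBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pairs $\left ( \mathcal{V} _0, \mathcal{T} _0 \right )$ and $\left ( \mathcal{V} _1, \mathcal{T} _1\right )$ , the pair $\left ( \mathcal{V} _0\times \mathcal{V} _1, \mathcal{T} _0\times \mathcal{T} _1 \right )$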
is the product in $\boldsymbol{\omega } \mathbf{CPO}\textrm{-}\mathfrak{C}_{\mathcal{BV}}$ . Moreover, $\mathcal{U}_{\mathcal{BV}}$ preserves finite products.
6. Automatic Differentiation for Term Recursion and Iteration
For our purposes, we could define our macro in terms of total derivatives. However, we choose to present it in terms of partial derivatives, in order to keep our treatment as close as possible to the starting point of the efficient implementation of the reverse mode given in Smeding and Vákár (2023).
Following this choice of presentation, it is particularly convenient to establish our AD macro $\mathcal{D}$ as a program transformation between a source language and a target language (see Section 6.4). The source language contains the programs that we differentiate, while we use the target language to represent those derivatives.
6.1 Source language as a standard call-by-value language with iteration and recursion
We consider a standard (coarse-grain) CBV language over a ground type $\mathbf{real}$ , certain real constants $\underline{c}\in \mathrm{Op}_0$ , certain primitive operations $\mathrm{op} \in \mathrm{Op}_n$ for each nonzero natural number $n\in \mathbb{N} ^\ast$ , and $\mathbf{sign}$ . We denote $\displaystyle \mathrm{Op} := \bigcup _{n\in \mathbb{N} } \mathrm{Op} _ n$ .
As is clear from the semantics defined in Section 7.3, $\mathbf{real}$ is intended to implement the real numbers. Moreover, for each $n\in \mathbb{N}$ , the operations in $\mathrm{Op} _n$ are intended to implement partially defined functions $\mathbb{R} ^n \rightharpoonup \mathbb{R}$ . Finally, $\mathbf{sign}\,$ is intended to implement the partially defined function $\mathbb{R}\rightharpoonup \mathbf{1}\sqcup \mathbf{1}$ defined on $\mathbb{R} ^- \cup \mathbb{R} ^+$ , which takes $\mathbb{R} ^-$ to the left component and $\mathbb{R} ^+$ to the right component.
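The intended behavior of $\mathbf{sign}$ can be sketched in Python as follows. The encoding is an assumption made for illustration only: the two components of $\mathbf{1}\sqcup \mathbf{1}$ are represented by tagged pairs, and the undefined result at $0$ is represented by `None` rather than by actual divergence.

```python
BOTTOM = None  # stands for "undefined"; a real implementation would diverge instead


def sign(r):
    """Partial function R -> 1 + 1, undefined at 0."""
    if r < 0:
        return ("inl", ())  # negative reals go to the left component
    if r > 0:
        return ("inr", ())  # positive reals go to the right component
    return BOTTOM           # sign is not defined at 0


print(sign(-2.0), sign(3.0), sign(0.0))
```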
Although it is straightforward to consider more general settings, we also add the assumption that the primitive operations implement differentiable functions (see Assumption 7.6).
We treat these operations in a schematic way as this reflects the reality of practical AD libraries, which are constantly being expanded with new primitive operations.
The types $\tau,\sigma,\rho$ , values ${v},{w},{u}$ , and computations ${t},{s},{r}$ of our language are as follows.
We use the syntactic sugar $\mathbf{if}\,r\,\mathbf{then}\,t\,\mathbf{else}\,s\, \stackrel{\mathrm{def}}{=} \mathbf{case}\,{\mathbf{sign}\,{r}}\,\mathbf{of}\;{\{\_\to{t}\mathrel{\big \lvert } \_\to{s}\}}$ , $\mathbf{fst}\,{t}\stackrel{\mathrm{def}}{=} \mathbf{case}\,{t}\,\mathbf{of}\,\langle{x},{\_}\rangle \to{x}$ , $\mathbf{snd}\,{t}\stackrel{\mathrm{def}}{=} \mathbf{case}\,{t}\,\mathbf{of}\,\langle{\_},{x}\rangle \to{x}$ , and $\mathbf{let} \,\mathbf{rec}\, f(x) = t\,\mathbf{in}\,s \stackrel{\mathrm{def}}{=} \mathbf{let}\,f = \mu f.\lambda x. t\,\mathbf{in}\,s$ . In fact, we can consider iteration as syntactic sugar as well:
$ \mathbf{iterate}\,{t}\,\mathbf{from}\,{x}={s}\stackrel{\mathrm{def}}{=}(\mu z.\lambda x.{ \mathbf{case}\,{t}\,\mathbf{of}\,\{\mathbf{inl}\, x'\to z\,x'\mid \mathbf{inr}\, x''\to x''\} })\,{s}$ .
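The encoding of iteration through recursion can be mimicked in Python as a sketch. Here the body `t` is assumed to return a tagged pair, `("inl", x')` to continue with a new state or `("inr", y)` to stop with a result; the function name `iterate` and the example body are hypothetical. Since Python does not eliminate tail calls, this sketch is only suitable for short loops.

```python
def iterate(t, s):
    """iterate t from x = s, written as the recursive function  mu z. lambda x. case t of ..."""
    def z(x):
        tag, payload = t(x)
        if tag == "inl":
            return z(payload)  # inl x' -> z x' : run the body again from the new state
        return payload         # inr x'' -> x'' : leave the loop with the result
    return z(s)


def halve_until_small(x):
    """Example loop body: keep halving while x >= 1."""
    return ("inl", x / 2) if x >= 1 else ("inr", x)


print(iterate(halve_until_small, 8.0))
```

The recursive call sits in tail position, which is why the same behavior can equally be obtained with a while-loop.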
The computations are typed according to the rules of Figs. 1 and 2, where $\mathtt{R}\subset \mathbb{R}$ is a fixed set of real numbers containing $0$ . For now, the reader may ignore the kinding contexts $\Delta$ . They will serve to support our treatment of ML-style polymorphism later.
We consider the standard CBV $\beta \eta$ -equational theory of Moggi (1989) for our language, which we list in Fig. 3. We could impose further equations for the iteration construct as is done in Bloom and Ésik (1993) and Goncharov et al. (2015) as well as for the basic operations $\mathrm{op}$ and the sign function $\mathbf{sign}\,$ . However, such equations are unnecessary for our development.
6.2. Target language
We define our target language by extending the source language adding the following syntax, with the typing rules of Fig. 4.
The operational semantics of the target language depends on the intended behavior for the $AD$ macro $\mathcal{D}$ defined in Section 6.4. In our context, we want $\mathbf{vect}$ to implement a vector space (of (co)tangent vectors), with the respective operations and the usual laws between the operations, such as distributivity of scalar multiplication over vector addition, which is particularly useful for efficient implementations (Smeding and Vákár 2023).
The terms $\mathfrak{h} _{i} t$ are irrelevant for the definition and correctness of the macro $\mathcal{D}$ , but they are particularly useful to illustrate the expected types in Sections 9.6 and 9.7. Although this perspective is unimportant for our correctness statement, the reader might want to view $\mathbf{vect}$ as a computation type encompassing computational effects for the vector space operations $\overline{e}_i$ , $(\!*\!)$ , and $(+)$ with handlers given by the terms $\mathfrak{h} _{i} t$ .
We are particularly interested in the case that $\left ( \mathbf{vect}, +, \ast, \overline{0}\right )$ implements the vector space $\left ( \mathbb{R} ^k, +, \ast, 0\right )$ , for some $k\in \mathbb{N}\cup \left \{ \infty \right \}$ ,Footnote 7 where $\overline{e} _{i}$ implements the $i$ -th element $e^{k}_{i}\in \mathbb{R} ^k$ of the canonical basis if $k=\infty$ or if $i\leq k$ , and $0\in \mathbb{R}^k$ otherwise. In this case, $\mathfrak{h} _{i} t$ is supposed to implement
which denotes the canonical projection if $i\leq k$ and the coprojection otherwise.
For short, we say that $\mathbf{vect}$ implements the vector space $\mathbb{R} ^k$ to refer to the case above. It corresponds to the $k$ -semantics for the target language defined in Section 7.4.
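To make the case of finite $k$ concrete, the following Python sketch represents elements of $\mathbb{R} ^k$ as lists of $k$ floats. It covers only $k\in \mathbb{N}$ (not $k=\infty$ ), it does not model the handlers $\mathfrak{h} _{i} t$ , and the function names `basis`, `add`, and `scale` are illustrative assumptions rather than part of the target language.

```python
def basis(i, k):
    """Canonical basis vector e_i of R^k (1-indexed); the zero vector when i > k."""
    return [1.0 if j == i else 0.0 for j in range(1, k + 1)]


def add(v, w):
    """Vector addition (+)."""
    return [a + b for a, b in zip(v, w)]


def scale(r, v):
    """Scalar multiplication (*)."""
    return [r * a for a in v]


v = [1.0, 2.0, 3.0]
w = basis(2, 3)
# Distributivity of scalar multiplication over addition, checked on one instance.
print(scale(2.0, add(v, w)), add(scale(2.0, v), scale(2.0, w)))
```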
6.3 The $CBV$ models $( \mathbf{Syn}_V, \mathbf{Syn}_{\mathcal{S}},\mathbf{Syn}_\mu, \mathbf{Syn}_{\mathsf{it}} )$ and $( \mathbf{Syn}_V^{{\mathbf{tr}}}, \mathbf{Syn}_{\mathcal{S}}^{{\mathbf{tr}}}, \mathbf{Syn}_\mu ^{{\mathbf{tr}}}, \mathbf{Syn}_{\mathsf{it}}^{{\mathbf{tr}}} )$
As discussed in Appendix A, we can translate our coarse-grain languages to fine-grain CBV languages. The fine-grain languages corresponding to the source and target languages correspond to the $CBV$ models
with the following universal properties.
Proposition 6.1 (Universal property of $CBV$ models (19)). Let $\left ( \mathcal{V}, \mathcal{T}, \mu, {\mathsf{itt}} \right )$ be a $CBV$ model. Assume that Figs. 5 and 6 are given consistent assignments.
-
1. There is a unique $CBV$ model morphism $H\;:\; \left ( \mathbf{Syn}_V, \mathbf{Syn}_{\mathcal{S}},\mathbf{Syn}_\mu, \mathbf{Syn}_{\mathsf{it}} \right )\,{\rightarrow }\, \left ( \mathcal{V}, \mathcal{T}, \mu, {\mathsf{itt}} \right )$ respecting the assignment of Fig. 5 .
-
2. There is a unique $CBV$ model morphism $\mathcal{H}\;:\; \left ( \mathbf{Syn}_V^{{\mathbf{tr}}}, \mathbf{Syn}_{\mathcal{S}}^{{\mathbf{tr}}}, \mathbf{Syn}_\mu ^{{\mathbf{tr}}}, \mathbf{Syn}_{\mathsf{it}}^{{\mathbf{tr}}} \right )\,{\rightarrow }\, \left ( \mathcal{V}, \mathcal{T}, \mu, {\mathsf{itt}} \right )$ that extends $H$ and respects the assignment of Fig. 6.
6.4. Dual numbers AD transformation for term recursion and iteration
Let us fix, for all $n\in \mathbb{N}$ , for all $\mathrm{op} \in \mathrm{Op}_n$ , and for all $1\leq i \leq n$ , computations $x_1:\mathbf{real},\ldots, x_n:\mathbf{real}\vdash \partial _i\mathrm{op} (x_1,\ldots, x_n):\mathbf{real}$ , which represent the partial derivatives of $\mathrm{op}$ . Using these terms for representing partial derivatives, we define, in Fig. 7, a structure preserving macro $\mathcal{D}$ on the types and computations of our language for performing AD.
We extend $\mathcal{D}$ to contexts: $\mathcal{D}\;{(\{x_1\;{:}\;\tau _1,{.}{.}{.},x_n{:}\tau _n\})}\stackrel{\mathrm{def}}{=} \{x_1\;{:}\;\mathcal{D}\;{(\tau _1)},{.}{.}{.},x_n\;{:}\;\mathcal{D}\;{(\tau _n)}\}$ . This turns $\mathcal{D}$ into a well-typed, functorial macro in the following sense.
Lemma 6.2 (Functorial macro). Our macro respects typing, substitution, and $\beta \eta$ -equality:
-
• If $\Gamma \vdash t \;:\; \tau$ , then $\mathcal{D}\;{(\Gamma )}\vdash \mathcal{D}\;{(t)}:\mathcal{D}\;{(\tau )}$ .
-
• $\mathcal{D}\;{(\mathbf{let}\,x ={t}\,\mathbf{in}\,{s})} ={\mathbf{let}\,x = \mathcal{D}(t)\,\mathbf{in}\,\mathcal{D}(s)}$ .
-
• If ${t}{\equiv }{s}$ , then $\mathcal{D}\;{(t)}{\equiv } \mathcal{D}\;{(s)}$ .
Our macro $\mathcal{D}$ can be seen as a class of macros, since it depends on the target language. More precisely, it depends on what $\mathbf{vect}$ implements (see Section 6.2).
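The following Python sketch illustrates the idea for the case in which $\mathbf{vect}$ implements $\mathbb{R} ^k$ with $k$ finite. It is not a transcription of Fig. 7: it uses the standard dual numbers rule, in which the transformed primitive returns the primal value paired with the sum of the partial derivatives multiplied by the incoming tangents. The names `lift`, `d_mul`, `d_sin`, and `f` are hypothetical.

```python
import math


def lift(op, partials):
    """Dual-number version of an n-ary primitive `op`, given its n partial derivatives."""
    def d_op(*duals):
        xs = [x for x, _ in duals]   # primal arguments
        k = len(duals[0][1])         # dimension of the tangent vectors
        tangent = [0.0] * k
        for (_, dx), partial in zip(duals, partials):
            p = partial(*xs)         # value of the i-th partial derivative at xs
            tangent = [t + p * v for t, v in zip(tangent, dx)]
        return op(*xs), tangent
    return d_op


d_mul = lift(lambda x, y: x * y, [lambda x, y: y, lambda x, y: x])
d_sin = lift(math.sin, [math.cos])


def f(x, y):
    """Transformed version of the program sin(x * y)."""
    return d_sin(d_mul(x, y))


# Seeding the tangents with the canonical basis of R^2 yields both partial derivatives at once.
value, grad = f((2.0, [1.0, 0.0]), (3.0, [0.0, 1.0]))
print(value, grad)
```

With $k=1$ and a single seed this is basic forward mode; larger $k$ corresponds to propagating several tangents in one pass.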
As an example, for the program of Equation (1), $\mathcal{D}$ computes, modulo some $\beta \eta$ -equality to aid legibility, the following derivative (where we also define $\mathcal{D}\;{(\mathbf{int})}\stackrel{\mathrm{def}}{=}\mathbf{int}$ , $\mathcal{D}\;{({t}\lt{s}\lt{r})}\stackrel{\mathrm{def}}{=} \mathbf{fst}\,(\mathcal{D}\;{(t)})\lt \mathbf{fst}\,(\mathcal{D}\;{(s)})\lt \mathbf{fst}\,(\mathcal{D}\;{(r)})$ , and $\partial _i(+)(x,y)\stackrel{\mathrm{def}}{=} 1$ ):
6.5. AD transformation as a $CBV$ model morphism
By the universal property of $\left ( \mathbf{Syn}_V, \mathbf{Syn}_{\mathcal{S}},\mathbf{Syn}_\mu, \mathbf{Syn}_{\mathsf{it}} \right )$ established in Proposition 6.1, the assignment defined in Fig. 8 induces a unique $CBV$ model morphism
The macro $\mathcal{D}$ defined in Fig. 7 is encompassed by (20).
7. Concrete Semantics for the AD Transformation
We give a concrete denotational semantics for our source and target languages in terms of $\omega$ -cpos. In fact, our semantics for the target language will be parameterized by $k\in \mathbb{N}\cup \{\infty \}$ . This parameter allows us to give a uniform treatment of various variants of AD. For basic forward mode AD, we will use $k=1$ . Other $k\in \mathbb{N}$ correspond to vectorized forms of forward mode AD, and $k=\infty$ is primarily of interest for dual numbers reverse AD, which can be viewed as an optimized version of a vectorized forward AD with dynamically sized tangent vectors.
We will use these semantics to phrase and prove correctness of AD in the rest of this paper. We also recall some facts about and fix notation for derivatives, in order to phrase sufficient and necessary conditions on the semantics of primitive operations and their AD transformations.
7.1 Basic concrete model
The most fundamental example of a $CBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair is given by $\left ( \boldsymbol{\omega } \mathbf{Cpo}, \left ( -\right )_\bot \right )$ where $\left ( -\right )_\bot$ is the (lax idempotent) monad that freely adds a least element $\bot$ to each $\omega$ -cpo. Indeed, of course, $\boldsymbol{\omega } \mathbf{Cpo}\left ( W, \left ( Y\right )_\bot \right )$ is pointed for any pair $\left ( W,Y\right )\in{\mathsf{ob}\,\boldsymbol{\omega } \mathbf{Cpo}}\times{\mathsf{ob}\,\boldsymbol{\omega } \mathbf{Cpo}}$ .
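On the level of plain sets, ignoring the order structure, the lifting monad and its Kleisli composition can be sketched in Python as follows. The sentinel `BOTTOM` and the helper names `unit`, `bind`, and `kleisli` are assumptions made for this illustration; a lifted value is represented either by the value itself or by the sentinel.

```python
import math


class Bottom:
    """Sentinel for the freely added least element."""
    def __repr__(self):
        return "BOTTOM"


BOTTOM = Bottom()


def unit(x):
    """Unit of the monad: a total value is already a lifted value."""
    return x


def bind(m, f):
    """Kleisli extension: undefinedness propagates, otherwise apply f."""
    return BOTTOM if m is BOTTOM else f(m)


def kleisli(f, g):
    """Composition of partial functions f : W -> Y_bot and g : Y -> Z_bot."""
    return lambda x: bind(f(x), g)


def recip(x):
    return BOTTOM if x == 0 else 1.0 / x


def safe_sqrt(x):
    return BOTTOM if x < 0 else math.sqrt(x)


h = kleisli(recip, safe_sqrt)
print(h(4.0), h(0.0), h(-4.0))
```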
We consider the product $\left ( \boldsymbol{\omega } \mathbf{Cpo}, \left ( -\right )_\bot \right )\times \left ( \boldsymbol{\omega } \mathbf{Cpo}, \left ( -\right )_\bot \right ) = \left ( \boldsymbol{\omega } \mathbf{Cpo}\times \boldsymbol{\omega } \mathbf{Cpo}, \left ( -\right )_\bot \right )$ , where, by abuse of language, $\left ( (C, C^{\prime})\right )_\bot = \left ( \left (C\right )_\bot, \left (C^{\prime}\right )_\bot \right )$ . By Lemma 5.2, we obtain $CBV$ models
For example, the program from Equation (1) is interpreted as the function
where
-
• $\lceil z\rceil ^\epsilon = z$ if $|z|\gt \epsilon$ , $\lceil z\rceil ^\epsilon = 0$ if $|z|\lt \epsilon$ and $\lceil z\rceil ^\epsilon = \bot$ otherwise;
-
• $N_{[\![t]\!], x}$ is the smallest natural number $i$ such that $\lceil [\![t]\!] (i,x)\rceil ^\epsilon =0$ .
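The two auxiliary definitions above can be sketched in Python as follows. Several choices here are assumptions: the undefined value $\bot$ is represented by `None`, natural numbers start at $0$ , the search gives up with `None` as soon as an undefined value is met, and the bound `max_steps` is a hypothetical safeguard that replaces genuine non-termination.

```python
BOTTOM = None  # stands for the undefined value


def clip(z, eps):
    """z if |z| > eps, 0 if |z| < eps, and undefined on the boundary |z| == eps."""
    if abs(z) > eps:
        return z
    if abs(z) < eps:
        return 0.0
    return BOTTOM


def first_zero_index(t, x, eps, max_steps=10_000):
    """Smallest i with clip(t(i, x), eps) == 0, searched for i = 0, 1, 2, ..."""
    for i in range(max_steps):
        c = clip(t(i, x), eps)
        if c is BOTTOM:
            return BOTTOM
        if c == 0.0:
            return i
    return BOTTOM  # no index found within the bound


print(first_zero_index(lambda i, x: x / 2 ** i, 1.0, 0.1))
```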
7.2 Differentiable functions and interleaved derivatives
Henceforth, unless stated otherwise, the cartesian spaces $\mathbb{R} ^n$ and their subspaces are endowed with the respective discrete $\boldsymbol{\omega } \mathbf{Cpo}$ -structures, in which $r\leq r'$ if and only if $r=r'$ .
Definition 7.1 (Interleaving function). For each $(n,k)\in \mathbb{N} \times \left ( \mathbb{N}\cup \left \{ \infty \right \}\right )$ , denoting by $\mathbb{I}_{n}$ the set $\left \{ 1,\ldots, n\right \}$ , we define the isomorphism (in $\boldsymbol{\omega } \mathbf{Cpo}$ with the respective discrete $\boldsymbol{\omega } \mathbf{Cpo}$ -structures)
For each open subset $U\subset \mathbb{R}^n$ , we denote by ${\phi} _{n, k}^U \;:\; U\times \left ( \mathbb{R} ^k\right ) ^n\,{\rightarrow }\, {\phi} _{n,k}\left ( U\times \left ( \mathbb{R} ^k\right ) ^n \right )$ the isomorphism obtained from restricting ${\phi} _{n,k}$ .
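For finite $k$ , the interleaving can be pictured with the following Python sketch, which assumes that ${\phi} _{n,k}$ pairs the $j$ -th coordinate of a point in $\mathbb{R} ^n$ with the $j$ -th vector in $\left ( \mathbb{R} ^k\right ) ^n$ , as suggested by its domain and codomain. The names `interleave` and `deinterleave` are illustrative.

```python
def interleave(xs, ws):
    """Send (x_1..x_n, w_1..w_n) to ((x_1, w_1), ..., (x_n, w_n))."""
    if len(xs) != len(ws):
        raise ValueError("expected one tangent vector per coordinate")
    return list(zip(xs, ws))


def deinterleave(pairs):
    """Inverse map: split the pairs back into coordinates and tangent vectors."""
    xs = [x for x, _ in pairs]
    ws = [w for _, w in pairs]
    return xs, ws


point = [1.0, 2.0]
tangents = [[1.0, 0.0], [0.0, 1.0]]
print(interleave(point, tangents))
```

Restricting the first argument to points of an open subset $U$ gives the corresponding restricted isomorphism.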
In Definition 7.2, Remark 7.3, and Lemma 7.4, let $\displaystyle g \;:\; U\,{\rightarrow }\, \coprod _{j\in L} V_j$ be a map where $U$ is an open subset of $\mathbb{R} ^n$ , and, for each $i\in L$ , $V_i$ is an open subset of $\mathbb{R} ^{m_i}$ .
Definition 7.2 (Derivative). The map $g$ is differentiable if, for any $i\in L$ , $g^{-1}\left ( V_i \right ) =W _i$ is open in $\mathbb{R} ^n$ and the restriction $g|_{W_i} \;:\; W_ i\,{\rightarrow }\, V_i$ is differentiable w.r.t. the submanifold structures $W_i\subset \mathbb{R} ^{n}$ and $ V_i \subset \mathbb{R} ^{m_i}$ . In this case, for each $k\in \left ( \mathbb{N}\cup \left \{ \infty \right \}\right )$ , we define the function:
in which $\tilde{w}$ is the linear transformation $ \mathbb{R} ^n\,{\rightarrow }\, \mathbb{R} ^k$ corresponding to the vector $w$ , $\cdot$ is the composition of linear transformations, $\iota _{m_i}$ is the obvious $i$ -th coprojection of the coproduct (in the category $\boldsymbol{\omega } \mathbf{Cpo}$ ), and $g'(x)^t:\mathbb{R}^{m_i}\,{\rightarrow }\,\mathbb{R}^n$ is the transpose of the derivative $g'(x) :\mathbb{R} ^n\,{\rightarrow }\,\mathbb{R} ^{m_i}$ of $g|_{W_i} \;:\; W_ i\,{\rightarrow }\, V_i$ at $x\in U$ .
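The composite $\tilde{w}\cdot g'(x)^t$ can be computed concretely. The sketch below is only an illustration under simplifying assumptions: the codomain has a single component (so no coprojection is needed), $k$ is finite, and the Jacobian is supplied by hand. The function name `interleaved_derivative` and its signature are ours. For output coordinate $j$, the tangent is $\sum _i g'(x)_{ji}\, w_i \in \mathbb{R}^k$.

```python
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, ...]


def interleaved_derivative(
    g: Callable[[Sequence[float]], Sequence[float]],
    jac: Callable[[Sequence[float]], Sequence[Sequence[float]]],
    x: Sequence[float],
    w: Sequence[Vec],
) -> List[Tuple[float, Vec]]:
    """Pair each output g(x)_j with the tangent sum_i jac(x)[j][i] * w[i] in R^k."""
    y = g(x)
    J = jac(x)  # J[j][i] = partial derivative of output j w.r.t. input i
    k = len(w[0]) if w else 0
    tangents = [
        tuple(sum(J[j][i] * w[i][r] for i in range(len(x))) for r in range(k))
        for j in range(len(y))
    ]
    return list(zip(y, tangents))


# Example: g(x1, x2) = x1 * x2 with Jacobian [[x2, x1]].
product = lambda x: (x[0] * x[1],)
product_jac = lambda x: [[x[1], x[0]]]
```

Taking `w` to be the canonical basis of $\mathbb{R}^2$ recovers the full gradient of the product in a single pass.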
Remark 7.3. It should be noted that, in Definition 7.2, $W_i$ might be empty for some $i\in L$ . In this case, $g|_{W_i} \;:\; W_ i\,{\rightarrow }\, V_i$ is trivially differentiable. Analogously, $U$ might be empty. In this case, the function $g$ is differentiable, and $\mathfrak{D}^{k}{g}$ is the unique morphism with domain $\emptyset$ and codomain as in (6.4).
Lemma 7.4. Let $\dot{g}$ be a function with domain as in (6.4). The map $g$ is differentiable and $\dot{g} =\mathfrak{D}^{k}{g}$ if, and only if, $g\circ \alpha$ is differentiable and $\dot{g}\circ \mathfrak{D}^{k}{\alpha } = \mathfrak{D}^{k}{\left (g\circ \alpha \right )}$ for any differentiable map $\alpha \;:\; \mathbb{R} ^n\,{\rightarrow }\, U$ .
Definition 7.5 (Differentiable partial maps). Let $ \displaystyle h \;:\; \coprod _{r\in K } \mathbb{R} ^{n_r}\,{\rightarrow }\, \left (\coprod _{j\in L } \mathbb{R} ^{m_j} \right )_\bot$ be a morphism in $\boldsymbol{\omega } \mathbf{Cpo}$ . We say that $h$ is differentiable if, for each $i\in K$ , the component $\displaystyle h_i := h\circ \iota _{i} \;:\; \mathbb{R} ^{n_i }\,{\rightarrow }\, \left ( \coprod _{j\in L } \mathbb{R} ^{m_j}\right )_\bot$ satisfies the following two conditions:
-
• $\displaystyle h_i ^{-1}\left ( \coprod _{j\in L } \mathbb{R} ^{m_j} \right ) = U_i$ is open in $\mathbb{R} ^{n_i }$ ;
-
• the corresponding total function (25) is differentiable.
(25) \begin{equation} \underline{h_i} = h|_{{U_i}} \;:\; U_i\,{\rightarrow }\, \coprod _{j\in L } \mathbb{R} ^{m_j} \end{equation}
In this case, for each $k\in \mathbb{N}\cup \left \{\infty \right \}$ , we define (26) to be the morphism induced by $ [ \mathfrak{d}^{k}\left ( h_r \right ) ] _{r\in K}$ where, for each $i\in K$ , $\mathfrak{d}^{k}\left ( h_i \right )$ is defined by (27), which is just the corresponding canonical extension of the map $\mathfrak{D}^{k}{h_i}$ .
(26) \begin{equation} \displaystyle \mathfrak{d}^{k}\left ( h\right ) :\coprod _{r\in K } \left ( \mathbb{R} \times \mathbb{R} ^k \right )^{n_r}\,{\rightarrow }\, \left (\coprod _{j\in L } \left ( \mathbb{R} \times \mathbb{R} ^k \right ) ^{m_j}\right )_\bot \end{equation}
(27) \begin{eqnarray} \mathfrak{d}^{k}\left ( h_i\right ) \;:\; & \left ( \mathbb{R} \times \mathbb{R} ^k \right ) ^{n_i} &\,{\rightarrow }\, \left ( \coprod _{j\in L } \left ( \mathbb{R} \times \mathbb{R} ^k \right ) ^{m_j}\right )_\bot \\ &z &\mapsto \begin{cases} \mathfrak{D}^{k}{h_i} \left ( z\right ), & \text{if } z\in {\phi} _{{n_i}, k}\left ( U_i\times \left ( \mathbb{R} ^k\right ) ^{n_i}\right )\subset \left ( \mathbb{R} \times \mathbb{R} ^k \right ) ^{n_i}; \\ \bot, & \text{otherwise.} \end{cases}\nonumber \end{eqnarray}
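The canonical extension in (27) can be sketched in Python as follows, with $\bot$ modelled by `None`. This is an illustration under our own assumptions: the helper `extend_with_bottom` and the example map $x\mapsto 1/x$ on the open set $\{x\neq 0\}$ are hypothetical and not taken from the paper, and inputs are lists of pairs (primal, tangent) with finite $k$.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vec = Tuple[float, ...]
Interleaved = List[Tuple[float, Vec]]


def extend_with_bottom(
    in_domain: Callable[[Sequence[float]], bool],
    total_derivative: Callable[[Interleaved], Interleaved],
) -> Callable[[Interleaved], Optional[Interleaved]]:
    """Extend a derivative defined over an open domain U by bottom (None) elsewhere."""

    def extended(z: Interleaved) -> Optional[Interleaved]:
        primals = tuple(primal for primal, _ in z)
        if in_domain(primals):
            return total_derivative(z)
        return None  # the primal part lies outside U

    return extended


def reciprocal_derivative(z: Interleaved) -> Interleaved:
    """Interleaved derivative of x -> 1/x on U = {x != 0}."""
    (x, w), = z
    return [(1.0 / x, tuple(-wr / (x * x) for wr in w))]


d_reciprocal = extend_with_bottom(lambda x: x[0] != 0.0, reciprocal_derivative)
```

Only the primal part of the input decides whether the result is defined; the tangent part never affects definedness.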
For example, the partial function $h$ of Equation (21) has the following derivative $\mathfrak{D}^{k}(h)$ :
7.3 The semantics for the source language
We give a concrete semantics for our language, interpreting it in the $CBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair $\left ( \boldsymbol{\omega } \mathbf{Cpo}, \left (-\right )_\bot \right )$ .
We denote by $\mathbb{R}$ the discrete $\omega$ -cpo of real numbers, in which $r\leq r'$ if and only if $r=r'$ , and we define $ \mathsf{sign} \;:\; \mathbb{R}\,{\rightarrow }\, \left ( \mathsf{1} \sqcup \mathsf{1} \right )_\bot$ by (31), where $\iota _{1}, \iota _{2} \;:\; \mathsf{1}\,{\rightarrow }\, \mathsf{1}\sqcup \mathsf{1}$ are the two coprojections of the coproduct.
By the universal property of $\left ( \mathbf{Syn}_V, \mathbf{Syn}_{\mathcal{S}},\mathbf{Syn}_\mu, \mathbf{Syn}_{\mathsf{it}} \right )$ , there is only one $CBV$ model morphism (30) consistent with the assignment of Fig. 9 where $\mathsf{c}$ is the constant that $\underline{c}$ intends to implement, and, for each $\mathrm{op} \in \mathrm{Op} _n$ , ${f}_{\mathrm{op}}$ is the partial map that $\mathrm{op}$ intends to implement.
The $CBV$ model morphism (30) (or, more precisely, the underlying functor of the $CBV$ morphism $[\![-]\!]$ ) gives the semantics for the source language. Although our work holds for more general contexts, we make the following assumption about the semantics of our language.
Assumption 7.6. For each $n\in \mathbb{N}$ and $\mathrm{op} \in \mathrm{Op} _n$ , $[\![\mathrm{op} ]\!] ={f}_{\mathrm{op}} \;:\; \mathbb{R} ^n\,{\rightarrow }\, \left ( \mathbb{R}\right )_\bot$ is differentiable.
7.4 The $k$ -semantics for the target language
For each $k\in \mathbb{N} \cup \left \{ \infty \right \}$ , we define the $k$ -semantics for the target language by interpreting $\mathbf{vect}$ as the vector space $\mathbb{R}^k$ . Namely, we extend the semantics $[\![-]\!]$ of the source language into a $k$ -semantics of the target language. More precisely, by Proposition 6.1, there is a unique $CBV$ model morphism (32) that extends $[\![-]\!]$ and is consistent with the assignment given by the vector structure (33) together with the projection (coprojection) $[\![\mathfrak{h} _{i} ]\!] _{k} \;:\; \mathbb{R} ^k\,{\rightarrow }\, \mathbb{R} ^i$ if $i\leq k$ ( $i\geq k$ ), for each $i\in \mathbb{N} ^\ast$ .
7.5 Soundness of $\mathcal{D}$ for primitive operations
Definition 7.7 (Sound for primitives). A macro $\mathcal{D}$ as defined in Fig. 7 and its corresponding $CBV$ model morphism $\mathbb{D}$ as defined in (20) are sound for primitives if, for any primitive $\mathrm{op} \in \mathrm{Op}$ , $ [\![\mathcal{D}\;{(\mathrm{op} )}]\!] _{k} = \mathfrak{d}^{k}\left ( [\![\mathrm{op} ]\!] \right )$ for any $k$ .
For each $j\in \mathbb{I}_{n}$ , given a differentiable function $f \;:\; \mathbb{R} ^n\,{\rightarrow }\, \left ( \mathbb{R}\right )_\bot$ , we denote by $\mathfrak{d}_{j}\left ( f \right ) \;:\; \mathbb{R} ^n\,{\rightarrow }\, \left ( \mathbb{R}\times \mathbb{R}\right )_\bot$ the function defined by $\mathfrak{d}_{j}\left ( f \right ) \left ( x_1,\ldots, x_n\right ) = \mathfrak{d}^{1}\left ( f \right ) \circ {\phi} _{n,1} \left ( (x_1,\ldots, x_n), e^{n}_{j}\right )$ , where $e^{n}_{j}$ is the $j$ -th vector of the canonical basis of $\mathbb{R}^n$ .
Lemma 7.8. The macro $\mathcal{D}$ defined in Fig. 7 is sound for primitives provided that
for any primitive operation $\mathrm{op} \in \mathrm{Op} _ n$ of the source language.
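The operator $\mathfrak{d}_{j}$ introduced before Lemma 7.8 amounts to seeding the $j$-th input with tangent $1$ and all other inputs with tangent $0$, for $k=1$. A minimal Python sketch, assuming that $\mathfrak{d}^{1}(f)$ is given as a function on lists of pairs (primal, 1-tuple tangent) and that $\bot$ is modelled by `None`; the names `partial_j` and `d1_mul` are hypothetical, and indices are 0-based here whereas the paper counts from $1$.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pair1 = Tuple[float, Tuple[float]]


def partial_j(
    d1_f: Callable[[List[Pair1]], Optional[Pair1]],
    x: Sequence[float],
    j: int,
) -> Optional[Tuple[float, float]]:
    """Evaluate d1_f at x interleaved with the j-th canonical basis vector."""
    z = [(x[i], (1.0 if i == j else 0.0,)) for i in range(len(x))]
    out = d1_f(z)
    if out is None:  # f is undefined at x
        return None
    value, (tangent,) = out
    return value, tangent


def d1_mul(z: List[Pair1]) -> Optional[Pair1]:
    """Hand-written d^1 of multiplication, via the product rule."""
    (x1, (w1,)), (x2, (w2,)) = z
    return x1 * x2, (x2 * w1 + x1 * w2,)
```

Here `partial_j(d1_mul, x, j)` returns the product together with its partial derivative in the chosen input.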
8. Enriched Scone and Subscone
Here, we present general, reusable results about logical relations proofs for languages with recursive features. We phrase these in terms of category theory. Concretely, we discuss two categorical perspectives on logical relations, both of which are constructions that build a new categorical semantics out of two existing semantics $\mathcal{B}$ and $\mathcal{D}$ . The first perspective, called the scone, is as simple as a plain comma category $\mathcal{D}\downarrow G$ of the identity along a suitable functor $G:\mathcal{B}\,{\rightarrow }\,\mathcal{D}$ between the two existing semantics. It gives a proof-relevant perspective in which we may distinguish different witnesses demonstrating the truth of a predicate. The second perspective, called the subscone, arises as a suitable reflective subcategory of the scone. Its crucial property is that its objects are chosen such that they represent only proof-irrelevant predicates, meaning that we can think of its morphisms simply as $\mathcal{B}$ -morphisms that respect the predicates.
Here, we focus, in particular, on characterizing when the scone and subscone are $\boldsymbol{\omega } \mathbf{Cpo}$ -bicartesian closed categories, getting us most of the way to a CBV $\boldsymbol{\omega } \mathbf{Cpo}$ -pair. We discuss the remaining ingredient of lifting the (pointed) monad to the subscone in Section 9.2 (see Footnote 8).
8.1 Scone: proof-relevant categorical logical relations
Given an $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $G:\mathcal{B}\,{\rightarrow }\,\mathcal{D}$ , the comma $\boldsymbol{\omega } \mathbf{Cpo}$ -category $\mathcal{D}\downarrow G$ of the identity along $G$ in ${\boldsymbol{\omega } \mathbf{Cpo}}\textrm{-}\mathbf{Cat}$ is defined as follows.
-
– The objects of $\mathcal{D}\downarrow G$ are triples $(D\in \mathcal{D}, C\in \mathcal{B}, j:D\,{\rightarrow }\, G(C) )$ in which $j$ is a morphism of $\mathcal{D}$ ; we think of these as pairs of a $\mathcal{B}$ -object $C$ and a proof-relevant predicate $(D, j)$ on $G(C)$ ;
-
– a morphism $(D,C, j)\,{\rightarrow }\, (D', C^{\prime}, h)$ between objects of $\mathcal{D}\downarrow G$ is a pair (35) making (36) commutative in $\mathcal{D}$ ; we think of these as $\mathcal{B}$ -morphisms $\alpha _1$ that respect the predicates, as evidenced by $\alpha _0$ :
-
– if $\alpha = \left ( \alpha _0 \;:\; D\,{\rightarrow }\, D', \alpha _1 \;:\; C\,{\rightarrow }\, C^{\prime}\right ), \beta = \left ( \beta _0 \;:\; D\,{\rightarrow }\, D', \beta _1 \;:\; C\,{\rightarrow }\, C^{\prime}\right ) \;:\; \left ( D, C, j\right )\,{\rightarrow }\, \left ( D', C^{\prime}, h\right )$ are two morphisms of $\mathcal{D}\downarrow G$ , we have that $\alpha \leq \beta$ if $\alpha _0\leq \beta _0$ in $\mathcal{D}$ and $\alpha _1\leq \beta _1$ in $\mathcal{B}$ .
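The commutativity condition on morphisms of $\mathcal{D}\downarrow G$ can be checked mechanically in a toy setting. The sketch below is a deliberately simplified instance, not the enriched construction: it assumes that $\mathcal{B}$ and $\mathcal{D}$ are both finite sets with functions, that $G$ is the identity, and that all orders are discrete. Maps are Python dictionaries, and the name `is_scone_morphism` is ours.

```python
from typing import Dict, Hashable, Set, Tuple

Map = Dict[Hashable, Hashable]
SconeObject = Tuple[Set[Hashable], Set[Hashable], Map]  # (D, C, j : D -> C)


def is_scone_morphism(
    src: SconeObject,
    tgt: SconeObject,
    alpha0: Map,  # the witness part, D -> D'
    alpha1: Map,  # the underlying map, C -> C'
) -> bool:
    """Check that alpha1 . j == h . alpha0 pointwise on D (G is the identity here)."""
    D, _C, j = src
    _D2, _C2, h = tgt
    return all(alpha1[j[d]] == h[alpha0[d]] for d in D)


# Example objects: a predicate with two witnesses, and one with a single witness.
source = ({0, 1}, {"a", "b", "c"}, {0: "a", 1: "b"})
target = ({0}, {"x", "y"}, {0: "x"})
```

A pair `(alpha0, alpha1)` passes the check exactly when `alpha1` sends every element picked out by `j` to an element picked out by `h`, with `alpha0` recording which witness is used.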
Following the approach of Lucatelli Nunes and Vákár (Lucatelli Nunes 2022, Section 9), we have:
Theorem 8.1 (Monadic-comonadic scone). Let $G\;:\; \mathcal{B}\,{\rightarrow }\,\mathcal{D}$ be a right $\boldsymbol{\omega } \mathbf{Cpo}$ -adjoint functor. Assuming that $\mathcal{D}$ has finite $\boldsymbol{\omega } \mathbf{Cpo}$ -products and $\mathcal{B}$ has finite $\boldsymbol{\omega } \mathbf{Cpo}$ -coproducts, the $\boldsymbol{\omega } \mathbf{Cpo}$ -functor
defined by $\left ( D\in \mathcal{D}, C\in \mathcal{B}, j:D\,{\rightarrow }\, G(C) \right )\mapsto \left ( D, C \right )$ , is $\boldsymbol{\omega } \mathbf{Cpo}$ -comonadic and $\boldsymbol{\omega } \mathbf{Cpo}$ -monadic (Footnote 9). This implies, in particular, that $\mathcal{L}$ creates (and strictly preserves) $\boldsymbol{\omega } \mathbf{Cpo}$ -limits and colimits (Footnote 10).
By Theorem 8.1 and the enriched adjoint triangle theorem (Footnote 11), we have:
Corollary 8.2. Let $G\;:\; \mathcal{B}\,{\rightarrow }\,\mathcal{D}$ be a right $\boldsymbol{\omega } \mathbf{Cpo}$ -adjoint functor between $\boldsymbol{\omega } \mathbf{Cpo}$ -bicartesian closed categories. In this case, $\mathcal{D} \downarrow G$ is an $\boldsymbol{\omega } \mathbf{Cpo}$ -bicartesian closed category. Moreover, if $\mathcal{D}\times \mathcal{B}$ is $\boldsymbol{\omega } \mathbf{Cpo}$ -cocomplete, so is $\mathcal{D} \downarrow G$ .
Theorem 8.1 and Corollary 8.2 are $\boldsymbol{\omega } \mathbf{Cpo}$ -enriched versions of the fundamental results of Lucatelli Nunes and Vákár (Lucatelli Nunes 2022, Section 9). The details and proofs are presented in Appendix C.
8.2 Subscone: proof-irrelevant categorical logical relations
Henceforth, we assume that $\mathbf{Sub}\left ( \mathcal{D}\downarrow G \right )$ is a full (Footnote 12), reflective (Footnote 13), and replete (Footnote 14) $\boldsymbol{\omega } \mathbf{Cpo}$ -subcategory of $\mathcal{D}\downarrow G$ . We denote, herein, by $\mathfrak{T}_{sub}$ the idempotent $\boldsymbol{\omega } \mathbf{Cpo}$ -monad induced by the $\boldsymbol{\omega } \mathbf{Cpo}$ -adjunction.
Recall that a morphism $q$ in $\boldsymbol{\omega } \mathbf{Cpo}$ is full if its underlying functor is full. In this case, the underlying functor is also faithful and injective on objects. Moreover, a morphism $j$ in an $\boldsymbol{\omega } \mathbf{Cpo}$ -category $\mathcal{B}$ is full if $\mathcal{B}\left ( B, j \right )$ is full in $\boldsymbol{\omega } \mathbf{Cpo}$ for any $B\in \mathcal{B}$ .
Furthermore, recall that an $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $H:\mathcal{W}\,{\rightarrow }\,\mathcal{Z}$ is locally full if, for any $\left ( X,W \right )\in{\mathsf{ob}\,\mathcal{W}}\times{\mathsf{ob}\,\mathcal{W}}$ , the morphism $H \;:\; \mathcal{W}\left ( X, W \right )\,{\rightarrow }\, \mathcal{Z}\left ( HX, HW \right )$ is a full $\boldsymbol{\omega } \mathbf{Cpo}$ -morphism. It should be noted that the $2$ -functor underlying a locally full $\boldsymbol{\omega } \mathbf{Cpo}$ -functor is locally fully faithful. Moreover, since every full morphism in $\boldsymbol{\omega } \mathbf{Cpo}$ is injective on objects, every locally full $\boldsymbol{\omega } \mathbf{Cpo}$ -functor is faithful (locally injective on objects).
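For partial orders viewed as categories, a monotone map $q$ is full precisely when it reflects the order, that is, $q(a)\leq q(b)$ implies $a\leq b$. The following Python sketch checks this on finite posets and lets one observe that fullness forces injectivity. The function names and the encoding of orders as Boolean-valued functions are our own choices for illustration.

```python
from typing import Callable, Dict, Hashable, Sequence

Leq = Callable[[Hashable, Hashable], bool]


def is_monotone(q: Dict, elems: Sequence, leq_src: Leq, leq_tgt: Leq) -> bool:
    """a <= b in the source implies q(a) <= q(b) in the target."""
    return all(leq_tgt(q[a], q[b]) for a in elems for b in elems if leq_src(a, b))


def is_full(q: Dict, elems: Sequence, leq_src: Leq, leq_tgt: Leq) -> bool:
    """q(a) <= q(b) in the target implies a <= b in the source (order reflection)."""
    return all(leq_src(a, b) for a in elems for b in elems if leq_tgt(q[a], q[b]))


def is_injective(q: Dict, elems: Sequence) -> bool:
    """Distinct elements have distinct images."""
    return len({q[a] for a in elems}) == len(set(elems))
```

If $q(a)=q(b)$, order reflection gives both $a\leq b$ and $b\leq a$, so antisymmetry yields $a=b$; this is the injectivity claim used in the text.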
Assumption 8.3. We require that:
-
(Sub. 1) whenever $\left ( D\in \mathcal{D}, C\in \mathcal{B}, j\right )\in \mathbf{Sub}\left ( \mathcal{D}\downarrow G \right )$ , $j$ is a full morphism in $\mathcal{D}$ ;
-
(Sub. 2) $G\;:\; \mathcal{B}\,{\rightarrow }\,\mathcal{D}$ is a right $\boldsymbol{\omega } \mathbf{Cpo}$ -adjoint functor between $\boldsymbol{\omega } \mathbf{Cpo}$ -bicartesian closed categories;
-
(Sub. 3) $\mathfrak{T}_{sub}$ strictly preserves $\boldsymbol{\omega } \mathbf{Cpo}$ -products;
-
(Sub. 4) Diag. (39) commutes.
We denote by $\underline{\mathcal{L}} \;:\; \mathbf{Sub}\left ( \mathcal{D}\downarrow G \right )\,{\rightarrow }\, \mathcal{B}$ the $\boldsymbol{\omega } \mathbf{Cpo}$ -functor given by the composition (38) where the unlabeled arrow is the full inclusion.
Proposition 8.4. The full inclusion $\mathbf{Sub}\left ( \mathcal{D}\downarrow G \right )\,{\rightarrow }\, \mathcal{D}\downarrow G$ creates (and strictly preserves) $\boldsymbol{\omega } \mathbf{Cpo}$ -limits and $\boldsymbol{\omega } \mathbf{Cpo}$ -exponentials. Moreover, if $\mathcal{D}\downarrow G$ is $\boldsymbol{\omega } \mathbf{Cpo}$ -cocomplete, so is $\mathbf{Sub}\left ( \mathcal{D}\downarrow G \right )$ .
Proof $\mathbf{Sub}\left ( \mathcal{D}\downarrow G \right )\,{\rightarrow }\, \mathcal{D}\downarrow G$ is $\boldsymbol{\omega } \mathbf{Cpo}$ -monadic, and hence, it creates $\boldsymbol{\omega } \mathbf{Cpo}$ -limits.
By Condition (Sub. 3) of Assumption 8.3, $\mathfrak{T}_{sub}$ is commutative, and hence, $\mathbf{Sub}\left ( \mathcal{D}\downarrow G \right )\,{\rightarrow }\, \mathcal{D}\downarrow G$ creates $\boldsymbol{\omega } \mathbf{Cpo}$ -exponentials.
Since $\mathfrak{T}_{sub}$ is idempotent, $\mathbf{Sub}\left ( \mathcal{D}\downarrow G \right )$ is $\boldsymbol{\omega } \mathbf{Cpo}$ -cocomplete whenever $\mathcal{D}\downarrow G$ is.
Corollary 8.5. $\mathbf{Sub}\left ( \mathcal{D}\downarrow G \right )$ is an $\boldsymbol{\omega } \mathbf{Cpo}$ -bicartesian closed category. Moreover, if $\mathcal{D}\times \mathcal{B}$ is $\boldsymbol{\omega } \mathbf{Cpo}$ -cocomplete, so is $\mathbf{Sub}\left ( \mathcal{D}\downarrow G \right )$ .
Theorem 8.6. The $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $\underline{\mathcal{L}} \;:\; \mathbf{Sub}\left ( \mathcal{D}\downarrow G \right )\,{\rightarrow }\, \mathcal{B}$ is strictly (bi)cartesian closed and locally full (hence, faithful). Moreover, $\underline{\mathcal{L}}$ strictly preserves $\boldsymbol{\omega } \mathbf{Cpo}$ -colimits.
Proof The $\boldsymbol{\omega } \mathbf{Cpo}$ -functors $\mathcal{L} :\mathcal{D}\downarrow G\,{\rightarrow }\, \mathcal{D}\times \mathcal{B}$ and $\pi _{\mathcal{B}} \;:\; \mathcal{D} \times \mathcal{B}\,{\rightarrow }\, \mathcal{B}$ strictly preserve $\boldsymbol{\omega } \mathbf{Cpo}$ -weighted limits and colimits. Since $\mathfrak{T}_{sub}$ is idempotent and (39) commutes, this implies that $\underline{\mathcal{L}}$ strictly preserves $\boldsymbol{\omega } \mathbf{Cpo}$ -limits and colimits.
The composition $\pi _{\mathcal{B}} \circ \mathcal{L}$ has a left $\boldsymbol{\omega } \mathbf{Cpo}$ -adjoint given by $ C\mapsto \left ( \mathsf{0}, C, \iota _{\mathsf{0} } \right )$ . Since the counit of this $\boldsymbol{\omega } \mathbf{Cpo}$ -adjunction is the identity and $\pi _{\mathcal{B}} \circ \mathcal{L}$ strictly preserves $\boldsymbol{\omega } \mathbf{Cpo}$ -products, we get that this $\boldsymbol{\omega } \mathbf{Cpo}$ -adjunction strictly satisfies the Frobenius reciprocity condition. This implies that $\pi _{\mathcal{B}} \circ \mathcal{L}$ strictly preserves $\boldsymbol{\omega } \mathbf{Cpo}$ -exponentials.
Since $\mathfrak{T}_{sub}$ strictly preserves $\boldsymbol{\omega } \mathbf{Cpo}$ -products, we get that $\mathbf{Sub}\left ( \mathcal{D}\downarrow G \right )\,{\rightarrow }\, \mathcal{D}\downarrow G$ strictly preserves $\boldsymbol{\omega } \mathbf{Cpo}$ -exponentials as well. Therefore, $\underline{\mathcal{L}}$ strictly preserves $\boldsymbol{\omega } \mathbf{Cpo}$ -exponentials.
The local full faithfulness (and, hence, faithfulness) of $\underline{\mathcal{L}}$ follows from Condition (Sub. 1) of Assumption 8.3.
Remark 8.7 (Proof-irrelevance). Condition (sub. 1) of Assumption 8.3 ensures that our subscone indeed gives us a proof-irrelevant approach to logical relations: in particular, as stressed above, it implies that $\underline{\mathcal{L}}$ is faithful. Given objects $(D, C, j), (D', C^{\prime}, j')$ and a morphism $f\;:\; C\,{\rightarrow }\, C^{\prime}$ in $\mathcal{B}$ , if there is $\alpha \;:\; (D, C, j)\,{\rightarrow }\, (D', C^{\prime}, j')$ satisfying $\underline{\mathcal{L}} (\alpha ) = f$ , then $\alpha$ is unique with this property. In this case, we say that $f$ defines a morphism $(D, C, j)\,{\rightarrow }\, (D', C^{\prime}, j')$ in $\mathbf{Sub}\left ( \mathcal{D}\downarrow G \right )$ .
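Proof-irrelevance can be seen concretely in a toy model where predicates are subsets. In the sketch below we assume finite sets with discrete orders, $G$ the identity, and each $j$ a subset inclusion (hence injective); the names `defines_morphism` and `unique_witness` are hypothetical. A function `f` then defines a morphism exactly when it maps the source predicate into the target predicate, and the witness $\alpha _0$ is forced to be the restriction of `f`.

```python
from typing import Dict, Hashable, Optional, Set

Map = Dict[Hashable, Hashable]


def defines_morphism(f: Map, pred_src: Set[Hashable], pred_tgt: Set[Hashable]) -> bool:
    """f respects the predicates: every element of pred_src lands in pred_tgt."""
    return all(f[c] in pred_tgt for c in pred_src)


def unique_witness(
    f: Map, pred_src: Set[Hashable], pred_tgt: Set[Hashable]
) -> Optional[Map]:
    """The only possible alpha_0 (the restriction of f), or None if f is not a morphism."""
    if not defines_morphism(f, pred_src, pred_tgt):
        return None
    return {c: f[c] for c in pred_src}
```

Since the witness is determined by `f`, nothing is lost by recording only `f`, which is the sense in which the subscone is proof-irrelevant.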
Generally, we see a tradeoff between using proof-relevant logical relations proofs via an interpretation in the scone or proof-irrelevant ones via an interpretation in the subscone. The scone is generally better behaved as a category, as it tends to be both monadic and comonadic by Theorem 8.1, while the subscone tends to only be monadic. The objects and morphisms of the subscone, however, can be simpler to work with, as we do not need to track witnesses thanks to their uniqueness. In the rest of this paper, we work with the (proof-irrelevant) subscone, mostly to conform to conventions in the literature.
9. Correctness of Dual Numbers AD
In this section, we show that, as long as the macro $\mathcal{D}$ defined in Fig. 7 is sound for primitives and $\mathbf{vect}$ implements $\mathbb{R} ^k$ , $\mathcal{D}$ is correct according to the $k$ -specification below. More precisely, we prove that:
Theorem 9.1. Assume that $\mathbf{vect}$ implements the vector space $\mathbb{R} ^k$ , for some $k\in \mathbb{N}\cup \left \{ \infty \right \}$ . For any program $x:\tau \vdash{t}:\sigma$ where $\tau,\sigma$ are data types (i.e., types not containing function types), we have that $[\![t]\!]$ is differentiable and, moreover,
provided that $\mathcal{D}$ is sound for primitives.
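To convey what the statement of Theorem 9.1 asserts operationally, here is a small Python sketch of dual-numbers forward-mode differentiation with $k$ tangents. It is an analogy under our own assumptions and not the macro $\mathcal{D}$ of Fig. 7: the class `Dual`, the helpers `const`, `neg`, `power_loop`, and `abs_partial` are hypothetical, and $\bot$ is modelled by `None`. The loop stands in for iteration, and `abs_partial` is a partial program whose domain of definition $\mathbb{R}\setminus \{0\}$ is open.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Dual:
    primal: float
    tangent: Tuple[float, ...]  # an element of R^k

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(
            self.primal + other.primal,
            tuple(a + b for a, b in zip(self.tangent, other.tangent)),
        )

    def __mul__(self, other: "Dual") -> "Dual":
        # Product rule, applied componentwise to the k tangents.
        return Dual(
            self.primal * other.primal,
            tuple(
                self.primal * b + other.primal * a
                for a, b in zip(self.tangent, other.tangent)
            ),
        )


def const(c: float, k: int) -> Dual:
    """A constant has zero tangent."""
    return Dual(c, (0.0,) * k)


def neg(x: Dual) -> Dual:
    return Dual(-x.primal, tuple(-a for a in x.tangent))


def power_loop(x: Dual, n: int) -> Dual:
    """x ** n computed by iteration; the tangent is propagated through each step."""
    acc = const(1.0, len(x.tangent))
    for _ in range(n):
        acc = acc * x
    return acc


def abs_partial(x: Dual) -> Optional[Dual]:
    """Absolute value, undefined (None) at 0, where it is not differentiable."""
    if x.primal > 0:
        return x
    if x.primal < 0:
        return neg(x)
    return None
```

In this toy setting, the correctness claim reads: the dual-number result has primal part equal to the value of the program and tangent part equal to the derivative applied to the input tangents, and it is undefined exactly where the original program is.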
We take the following steps to achieve this result:
-
• In Section 9.1, we fix a particular functor $G:\mathcal{B}\,{\rightarrow }\,\mathcal{D}$ for which to consider the scone, as well as a particular reflective subscone of the scone. This sets the concrete stage in which our logical relations proof will take place.
-
• In Section 9.2, we choose a particular lifting of the partiality monad to this subscone, to establish a reasoning principle for derivatives of partial functions.
-
• In Section 9.3, we fix a lifting of the interpretation of the primitive type $\mathbf{real}$ to the subscone, establishing a reasoning principle for derivatives of real-valued functions. We further show that, for a macro $\mathcal{D}$ that is sound for primitives, $[\![\mathcal{D}\;{(-)}]\!] _k$ respects the logical relation and hence yields an interpretation of our full language in the subscone.
-
• In Section 9.4, we show that logical relations at data types (composite types not containing function types) also capture correct differentiation.
-
• In Section 9.5, we derive our fundamental AD correctness theorem from the interpretation of our language in the subscone, and in Sections 9.6 and 9.7, we spell out in more detail what this correctness theorem entails for the choice of semantics $[\![\mathbf{vect} ]\!] _k=\mathbb{R}^k$ .
9.1 Fixing a particular subscone $ \mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right )$
Henceforth, we follow the notation and definitions established in Section 7. In particular, unless stated otherwise, the cartesian spaces $\mathbb{R} ^n$ and their subspaces are endowed with the discrete $\boldsymbol{\omega } \mathbf{Cpo}$ -structure, in which $r\leq r'$ if and only if $r=r'$ .
For each $\left ( n, k\right ) \in \mathbb{N}\times \left ( \mathbb{N}\cup \left \{ \infty \right \} \right )$ , we define the $\boldsymbol{\omega } \mathbf{Cpo}$ -functor (41). We consider the full reflective $\boldsymbol{\omega } \mathbf{Cpo}$ -subcategory $ \mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right )$ of $\boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k}$ whose objects are triples (42) such that $j$ is full (and, hence, injective on objects).
That is, we are considering what Barthe et al. (2020) call open logical relations (where closed logical relations would correspond to the case of $G_{n, k}=\boldsymbol{\omega } \mathbf{Cpo}\times \boldsymbol{\omega } \mathbf{Cpo}\left ( (1,1), - \right )$ ).
The $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $G_{n, k}$ together with $ \mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right )$ satisfies Assumption 8.3. Therefore:
Proposition 9.2. $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right )$ is a cocomplete $\boldsymbol{\omega } \mathbf{Cpo}$ -cartesian closed category. Moreover, the forgetful $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $\underline{\mathcal{L}} _{n,k} \;:\; \mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right )\,{\rightarrow }\, \boldsymbol{\omega } \mathbf{Cpo}\times \boldsymbol{\omega } \mathbf{Cpo}$ is locally full and strictly cartesian closed. Furthermore, it strictly preserves $\boldsymbol{\omega } \mathbf{Cpo}$ -colimits.
9.2 Lifting the partiality monad to the subscone
Let $(n,k)\in \mathbb{N}\times \left ( \mathbb{N}\cup \left \{\infty \right \}\right )$ . In order to get a categorical model of our language, we need to define a partiality monad for $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right )$ .
We denote by $\mathfrak{O}_{n}$ the set of proper open non-empty subsets of the cartesian space $\mathbb{R}^n$ . For each $U\in \mathfrak{O}_{n}$ , we define
We define the $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n,k} \right )$ -monad $\mathcal{P}_{n, k}\left (-\right )_\bot$ on $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n,k} \right )$ by
where $\underline{\mathcal{P}_{n, k}\left (D, \left ( C, C^{\prime}\right ), j\right )_\bot }$ is the union
with the full $\boldsymbol{\omega } \mathbf{Cpo}$ -substructure of $ G_{n, k} \left ( \left ( C\right )_\bot, \left ( C^{\prime}\right )_\bot \right )$ induced by the inclusion $\mathtt{j}_{X}$ which is defined by the following components:
-
• the inclusion $\left \{ \bot \right \}\,{\rightarrow }\,G_{n, k} \left ( \left ( C\right )_\bot, \left ( C^{\prime}\right )_\bot \right )$ of the least morphism $\bot \;:\; \left ( \mathbb{R} ^n, \left ( \mathbb{R} \times \mathbb{R} ^k\right ) ^n \right )\,{\rightarrow }\, \left ( \left ( C\right )_\bot, \left ( C^{\prime}\right )_\bot \right )$ in $\boldsymbol{\omega } \mathbf{Cpo}\times \boldsymbol{\omega } \mathbf{Cpo}\left ( \left ( \mathbb{R} ^n, \left ( \mathbb{R} \times \mathbb{R} ^k \right ) ^n \right ), \left ( \left ( C\right )_\bot, \left ( C^{\prime}\right )_\bot \right ) \right )$ ;
-
• the inclusion of the total functions $G_{n, k}\left ( \eta _{C}, \eta _{C^{\prime}} \right )\circ j \;:\; D\,{\rightarrow }\, G_{n, k} \left ( C, C^{\prime} \right )\,{\rightarrow }\, G_{n, k} \left ( \left ( C\right )_\bot, \left ( C^{\prime}\right )_\bot \right )$ ;
-
• the injections $\displaystyle \mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right )\left ( \mathsf{Diff}_{\left (U, n, k\right )}, \left ( D, \left ( C, C^{\prime}\right ), j\right ) \right )\,{\rightarrow }\, G_{n, k} \left ( \left ( C\right )_\bot, \left ( C^{\prime}\right )_\bot \right )$ , for $U\in \mathfrak{O}_{n}$ , defined by
\begin{align*} \Big( \alpha _0, \alpha _1 & \left. = \left ( \beta _0 \;:\; U\,{\rightarrow }\, C, \beta _1 \;:\; {\phi} _{n, k}\left ( U\times \left ( \mathbb {R} ^k \right ) ^n\right ) \,{\rightarrow }\, C^{\prime} \right ) \right )\\& \mapsto \left( \overline {\beta _0} \;:\; \mathbb {R} ^n\,{\rightarrow }\, \left ( C\right )_\bot, \overline {\beta _1} :\left ( \mathbb {R} \times \mathbb {R} ^k\right ) ^n\,{\rightarrow }\, \left ( C^{\prime}\right )_\bot \right ), \end{align*}
where $\overline{\beta _0}$ and $\overline{\beta _1}$ are the respective corresponding canonical extensions. The image of $\mathtt{j}_{X}$ forms a sub- $\omega$ -cpo because the union $\bigcup _{n\in \mathbb{N}} U_n$ of open sets $U_n$ is open and because $D$ is an $\omega$ -cpo.
For each $(C,C^{\prime})\in \boldsymbol{\omega } \mathbf{Cpo}\times \boldsymbol{\omega } \mathbf{Cpo}$ , the components $\left ( \mathrm{m} _{C}, \mathrm{m} _{C^{\prime}} \right )$ and $\left ( \eta _{C}, \eta _{C^{\prime}} \right )$ of the multiplication and the unit of the monad $\left ( -\right )_\bot$ on $\boldsymbol{\omega } \mathbf{Cpo}\times \boldsymbol{\omega } \mathbf{Cpo}$ define morphisms
in $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right )$ . Therefore, $\overline{\mathrm{m}}$ and $\overline{\eta }$ define the multiplication and the unit for $\mathcal{P}_{n, k}\left (-\right )_\bot$ , completing the definition of our monad. Analogously, we lift, as morphisms of $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k}\right )$ , the strength of $\left ( -\right )_\bot$ , making $\mathcal{P}_{n, k}\left (-\right )_\bot$ into a strong monad (i.e., $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k}\right )$ -enriched monad).
In order to finish the proof that $\left ( \mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right ), \mathcal{P}_{n, k}\left (-\right )_\bot \right )$ is a $CBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair, it is enough to see that, for any pair of objects $\left ( D_0, \left ( C_0, C'_{\!\!0}\right ), j_0 \right )$ , $ \left ( D_1, \left ( C_1, C^{\prime}_1\right ), j_1 \right )$ of $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right )$ , the least morphism $\bot \;:\; \left ( C_0, C'_{\!\!0}\right )\,{\rightarrow }\, \left ( \left ( C_1\right )_\bot, \left ( C^{\prime}_1\right )_\bot \right ), $ of $\boldsymbol{\omega } \mathbf{Cpo}\left ( C_0, \left ( C_1\right )_\bot \right ) \times \boldsymbol{\omega } \mathbf{Cpo}\left ( C'_{\!\!0}, \left (C^{\prime}_1\right )_\bot \right )$ defines the least morphism $\left ( D_0, \left ( C_0, C'_{\!\!0}\right ), j_0 \right )\,{\rightarrow }\, \mathcal{P}_{n, k}\left (D_1, \left ( C_1, C^{\prime}_1\right ), j_1\right )_\bot$ in $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right )$ .
Finally, since the underlying endofunctor of the monad $\mathcal{P}_{n, k}\left (-\right )_\bot$ , the multiplication and the unit are clearly lifted from $\left ( -\right )_\bot$ through $\underline{\mathcal{L}} _{n,k}$ as defined above, we have:
Proposition 9.3. For each $(n,k)\in \mathbb{N}\times \left ( \mathbb{N} \cup \left \{ \infty \right \}\right )$ , $\left (\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right ), \mathcal{P}_{n, k}\left (-\right )_\bot \right )$ is a $CBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair. Moreover, $\underline{\mathcal{L}} _{n,k} \;:\; \mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n,k} \right )\,{\rightarrow }\, \boldsymbol{\omega } \mathbf{Cpo}\times \boldsymbol{\omega } \mathbf{Cpo}$ is a $CBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair morphism between $\left (\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right ), \mathcal{P}_{n, k}\left (-\right )_\bot \right )$ and $\left (\boldsymbol{\omega } \mathbf{Cpo}\times \boldsymbol{\omega } \mathbf{Cpo}, \left ( -\right )_\bot \right )$ .
Therefore, by Lemma 5.2, $\mathcal{U}_{\mathcal{BV}} \left ( \underline{\mathcal{L}} _{n, k} \right )$ is a $CBV$ model morphism between the underlying $CBV$ models of $\left (\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right ), \mathcal{P}_{n, k}\left (-\right )_\bot \right )$ and $\left (\boldsymbol{\omega } \mathbf{Cpo}\times \boldsymbol{\omega } \mathbf{Cpo}, \left ( -\right )_\bot \right )$ .
9.3 Logical relations for $\mathbf{real}$ and deriving a $CBV$ model morphism
Henceforth, we assume that the macro $\mathcal{D}$ is sound for primitives (see Definition 7.7). We establish the $CBV$ model morphism (55). We start by establishing the logical relations’ assignment.
Let $(n, k)\in \mathbb{N}\times \left ( \mathbb{N} \cup \left \{ \infty \right \}\right )$ . We define the object (47) in $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n,k} \right )$ .
For each $m\in \mathbb{N}$ , $\mathrm{op} \in \mathrm{Op} _m$ , and $c\in \mathtt{R}$ , we define the morphisms (48), (49), and (50) in $\boldsymbol{\omega } \mathbf{Cpo}\times \boldsymbol{\omega } \mathbf{Cpo}$ , in which $\mathbb{D}$ , $[\![-]\!]$ , and $[\![-]\!] _{k}$ are the functors underlying the $CBV$ model morphisms, respectively, defined in (20), (30), and (32).
By Proposition 8.4, we have that the product $ \overline{[\![\hspace{-2.5pt}[\mathbf{real} ]\!] \hspace{-2.5pt}]}_{n,k} ^m$ in $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right )$ is given by (51). Therefore, by the chain rule for derivatives, we have that (48), (49), and (50), respectively, define the morphisms (52), (53), and (54) in $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n,k} \right )$ , where $\overline{\mathsf{1}}\sqcup \overline{\mathsf{1}}$ denotes the coproduct of the terminal $\overline{\mathsf{1}} = \left ( \mathsf{1}, \left ( \mathsf{1}, \mathsf{1} \right ), \mathrm{id} \right )$ with itself.
By the universal property of the $CBV$ model $\left ( \mathbf{Syn}_V, \mathbf{Syn}_{\mathcal{S}},\mathbf{Syn}_\mu, \mathbf{Syn}_{\mathsf{it}} \right )$ , we get:
Proposition 9.4. For each $(n, k)\in \mathbb{N}\times \left ( \mathbb{N} \cup \left \{ \infty \right \}\right )$ , there is only one $CBV$ model morphism
that is consistent with the assignment given by (47), (52), (54), and (53). Moreover, Diag. (56) commutes.
Proof Both $\left ([\![-]\!] \times [\![-]\!] _{k} \right )\circ \left ( \mathrm{id}\times \mathbb{D}\right )$ and $\mathcal{U}_{\mathcal{BV}} \left ({\underline{\mathcal{L}}}_{n,k}\right )\circ \overline{[\![\hspace{-2.5pt}[-]\!] \hspace{-2.5pt}]}_{n,k}$ yield $CBV$ model morphisms that are consistent with the assignment given by the object $\left ( \mathbb{R}, \mathbb{R}\times \mathbb{R} ^k \right )$ together with (48), (49), and (50).
9.4 AD logical relations for data types
As a consequence of Proposition 9.4, we establish a fundamental result on the logical relations $ \overline{[\![\hspace{-2.5pt}[-]\!] \hspace{-2.5pt}]}_{n,k}$ for data types (i.e., types not containing function types) in our setting: namely, Proposition 9.6. Observe that, by distributivity of products over coproducts, any such data type is isomorphic to $\bigsqcup _{j\in L}\mathbf{real}^{l_j}$ for some finite set $L$ and $l_j\in \mathbb{N}$ . Therefore, we start by establishing Lemma 9.5 about our logical relations and the coproducts in $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n,k} \right )$ .
Lemma 9.5. Let $\left ( n, k\right ) \in \mathbb{N} \times \left ( \mathbb{N} \cup \left \{\infty \right \}\right )$ . If $\displaystyle \left ( g, \dot{g}\right ) \in \coprod _{j\in L} \overline{[\![\hspace{-2.5pt}[\mathbf{real} ]\!] \hspace{-2.5pt}]}_{n,k} ^{l_j}$ , then $\displaystyle g \;:\; \mathbb{R} ^n\,{\rightarrow }\,\coprod _{j\in L} \mathbb{R} ^{l_j}$ is differentiable and $\dot{g} = \mathfrak{D}^{k}{g}$ .
Proof By Proposition 9.2, $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n,k} \right )$ has coproducts. Moreover, we can conclude that $\displaystyle \left ( g, \dot{g}\right ) \in \coprod _{j\in L} \overline{[\![\hspace{-2.5pt}[\mathbf{real} ]\!] \hspace{-2.5pt}]}_{n,k} ^{l_j}$ implies that, for some $r\in L$ , we have a pair
such that $\left ( g, \dot{g}\right ) = \left ( \iota _{\mathbb{R} ^{l_r} }\circ \underline{g}, \iota _{\left (\mathbb{R}\times \mathbb{R} ^k\right ) ^{l_r} }\circ \mathfrak{D}^{k}{g} \right )$ . Following Definition 7.2, this completes our proof.
Proposition 9.6. Let $\left ( n, k\right ) \in \mathbb{N} \times \left ( \mathbb{N} \cup \left \{\infty \right \}\right )$ . If $\displaystyle \left ( g, \dot{g}\right ) \in \underline{\mathcal{P}_{n,k}\left (\coprod _{j\in L} \overline{[\![\hspace{-2.5pt}[\mathbf{real} ]\!] \hspace{-2.5pt}]}_{n,k} ^{l_j}\right )_\bot }$ , then $\displaystyle g \;:\; \mathbb{R} ^n\,{\rightarrow }\,\left ( \coprod _{j\in L} \mathbb{R} ^{l_j}\right )_\bot$ is differentiable and $\dot{g} = \mathfrak{d}^{k}\left ( g\right )$ .
Proof Indeed, by the definition of $\underline{\mathcal{P}_{n,k}\left (-\right )_\bot }$ , we have one of the following situations:
-
s1. $g$ and $\dot{g}$ are the least morphisms, that is to say, they are constantly equal to $\bot$ ;
-
s2. the pair $\left ( g, \dot{g}\right )$ comes from a pair of total functions $\left ( \underline{g}, \underline{\dot{g}} \right ) \in \displaystyle \coprod _{j\in L} \overline{[\![\hspace{-2.5pt}[\mathbf{real} ]\!] \hspace{-2.5pt}]}_{n,k} ^{l_j}$ ;
-
s3. $\displaystyle g^{-1}\left ( \coprod _{j\in L} \mathbb{R} ^{l _j} \right ) = W$ is open. Moreover, denoting by (58) the pair consisting of the corresponding total functions, we have that (59) holds for any differentiable map $ \alpha \;:\; \mathbb{R} ^n\,{\rightarrow }\, W$ .
\begin{equation} \displaystyle \left ( \underline{g} \;:\; W\,{\rightarrow }\,\left ( \coprod _{j\in L} \mathbb{R} ^{l _j} \right ), \, \underline{\dot{g}} \right ) \tag{58} \end{equation}
\begin{equation} \left ( \underline{g} \circ \alpha, \, \underline{\dot{g} } \circ \mathfrak{D}^{k}{\alpha } \right )\in \coprod _{j\in L} \overline{[\![\hspace{-2.5pt}[\mathbf{real} ]\!] \hspace{-2.5pt}]}_{n,k} ^{l_j} . \tag{59} \end{equation}
If (s1.) holds, following Definition 7.5, we get that $g$ is differentiable and $\dot{g} = \mathfrak{d}^{k}\left ( g\right )$ by Remark 7.3.
In case of (s2.), we get that $\underline{g}$ is differentiable and $\underline{\dot{g}} = \mathfrak{D}^{k}{\underline{g}}$ by Lemma 9.5. Hence $g$ is differentiable and $\dot{g} = \mathfrak{d}^{k}\left ( g\right )$ .
Finally, in case of (s3.), by Lemma 9.5, we get that, for any differentiable $ \alpha \;:\; \mathbb{R} ^n\,{\rightarrow }\, W$ , $\underline{g} \circ \alpha$ is differentiable and $\underline{\dot{g} } \circ \mathfrak{D}^{k}{\alpha }$ is well defined and equal to $\mathfrak{D}^{k}{\left ( \underline{{g} } \circ \alpha \right ) }$ . By Lemma 7.4, this implies that $\underline{g}$ is differentiable and $\mathfrak{D}^{k}{ \underline{g } } = \underline{\dot{g} }$ . Following Definition 7.5, this completes the proof that $g$ is differentiable and $\dot{g} = \mathfrak{d}^{k}\left ( g\right )$ .
Corollary 9.7. Let $k\in \mathbb{N}\cup \left \{\infty \right \}$ . If, for each $i\in \mathfrak{L}$ , the morphism $ \left (g, \dot{g} \right )$ in $\boldsymbol{\omega } \mathbf{Cpo}\times \boldsymbol{\omega } \mathbf{Cpo}$ defines the morphism (60) in $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{{s_i}, k} \right )$ , then $\displaystyle g \;:\; \coprod _{r\in \mathfrak{L} } \mathbb{R} ^{s_r}\,{\rightarrow }\,\left ( \coprod _{j\in L} \mathbb{R} ^{l_j}\right )_\bot$ is differentiable and $\dot{g} = \mathfrak{d}^{k}\left ( g\right )$ .
Proof From the hypothesis, for each $i\in \mathfrak{L}$ , we conclude that the pair (62) defines the morphism (64), since $\left ( \iota _{\mathbb{R}^{s_i}}, \iota _{\left ( \mathbb{R}\times \mathbb{R} ^k\right ) ^{s_i}} \right )$ defines the coprojection (61) in $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{{s_i}, k} \right )$ .
Since $\mathrm{id} _{\mathbb{R} ^{s_i}} \;:\; \mathbb{R} ^{s_i}\,{\rightarrow }\, \mathbb{R}^{s_i}$ is differentiable, and $\mathfrak{D}^{k}{\left ( \mathrm{id} _{\mathbb{R} ^{s_i}} \right ) }$ is given by the identity $\left ( \mathbb{R}\times \mathbb{R} ^k\right ) ^{s_i}\,{\rightarrow }\,\left ( \mathbb{R}\times \mathbb{R} ^k\right ) ^{s_i}$ , we conclude that
By Proposition 9.6, (64) proves that $g_i$ is differentiable and $\dot{g}_i = \mathfrak{d}^{k}\left ( g_i\right )$ . Since this result holds for any $i\in \mathfrak{L}$ , we conclude that $g$ is differentiable and $\dot{g} = \mathfrak{d}^{k}\left ( g \right )$ .
9.5 Fundamental AD correctness theorem
We prove Theorem 9.8, which completes the proof of Theorem 9.1.
Theorem 9.8. Let $\displaystyle t\;:\; \coprod _{r\in \mathfrak{L} } \mathbf{real} ^{s _r}\,{\rightarrow }\, \mathbf{Syn}_{\mathcal{S}}\left ( \coprod _{j\in L} \mathbf{real} ^{l _j} \right )$ be a morphism in $\mathbf{Syn}_V$ . We have that $\displaystyle [\![ t ]\!] \;:\; \coprod _{r\in \mathfrak{L} } \mathbb{R} ^{s _r}\,{\rightarrow }\,\left ( \coprod _{j\in L} \mathbb{R} ^{l _j}\right )_\bot$ is differentiable and, for any $k\in \left ( \mathbb{N}\cup \left \{ \infty \right \}\right )$ , $ [\![\mathbb{D}\left ( t \right ) ]\!] _{k} = \mathfrak{d}^{k}\left ( [\![ t ]\!] \right )$ .
Proof We assume that we have $t$ as above. For each $i\in \mathfrak{L}$ , the pair (65) is in the image of $\left ([\![-]\!] \times [\![-]\!] _{k} \right )\circ \left ( \mathrm{id}\times \mathbb{D}\right ) =\mathcal{U}_{\mathcal{BV}} \left ({\underline{\mathcal{L}}}_{{s_i},k}\right )\circ \overline{[\![\hspace{-2.5pt}[-]\!] \hspace{-2.5pt}]}_{{s_i},k}$ . This implies that (65) defines the morphism (66) in $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{{s_i}, k} \right )$ . Therefore, by Corollary 9.7, we conclude that $[\![t]\!]$ is differentiable and $[\![\mathbb{D}{\left ( t\right ) } ]\!] _{k} = \mathfrak{d}^{k}\left ( [\![t]\!] \right )$ .
9.6 Correctness of the dual numbers forward AD
We assume that $\mathbf{vect}$ implements the vector space $\mathbb{R}$ . It is straightforward to see that we get forward-mode AD out of our macro $\mathcal{D}$ : namely, for a program $x:\tau \vdash{t}:\sigma$ (where $\tau$ and $\sigma$ are data types) in the source language, we get a program $x:\mathcal{D}\;{(\tau )} \vdash \mathcal{D}\;{(t)}:\mathcal{D}\;{(\sigma )}$ in the target language, which, by Theorem 9.1, satisfies the following properties:
-
• $[\![{t} ]\!] \;:\; \coprod _{r\in K } \mathbb{R} ^{n_r}\,{\rightarrow }\,\left ( \coprod _{j\in L } \mathbb{R} ^{m_j} \right )_\bot$ is differentiable as in Definition 7.5;
-
• if $y\in \mathbb{R} ^{n_i}\cap [\![{t} ]\!] ^{-1}\left ( \mathbb{R} ^{m_j} \right ) = W_ j$ for some $i\in K$ and $j\in L$ , we have that, for any $w\in \mathbb{R} ^{n_i}$ , denoting $z := {\phi} _{{n_i},1}\left (y,w\right )$ ,
\begin{equation} \begin{aligned} {}[\![ \mathcal{D}\;{(t)} ]\!] _{1} \left ( {\phi} _{{n_i},1}\left (y,w\right ) \right ) &= \mathfrak{d}^{1}\left ( [\![t]\!] \right ) \left ( z\right ) = \mathfrak{D}^{1}{[\![t]\!] |_{W_j} } \left ( z\right ) = {\phi} _{{m_j},1}\left ( [\![{t} ]\!] \left ( y\right ), \tilde{w}\cdot [\![{t} ]\!] '(y) ^{t} \right ) \\ &= {\phi} _{{m_j},1}\left ( [\![{t} ]\!] \left ( y\right ), [\![{t} ]\!] '(y)(w) \right ), \end{aligned} \tag{67} \end{equation}
where $[\![{t} ]\!] '(y) \;:\; \mathbb{R} ^{n_i}\,{\rightarrow }\,\mathbb{R} ^{m_j}$ is the derivative of $[\![{t} ]\!] |_{W_j}\;:\; W_j\,{\rightarrow }\, \mathbb{R} ^{m_j}$ at $y$ .
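The statement (67) can be illustrated by a minimal Python sketch of dual-numbers forward AD for $k = 1$. Everything named here (`Dual`, `dual_sin`, the example program `t`, and `forward`) is a hypothetical stand-in chosen for illustration, not the macro $\mathcal{D}$ of the paper; the sketch only shows that running the program on pairs (primal, tangent) returns the value together with the directional derivative $[\![t]\!]'(y)(w)$.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value of type real x vect, with vect = R (the case k = 1)."""
    primal: float
    tangent: float

    def __add__(self, other):
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    def __mul__(self, other):
        # Product rule on the tangent component.
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)


def dual_sin(x):
    # Chain rule for a primitive operation.
    return Dual(math.sin(x.primal), math.cos(x.primal) * x.tangent)


def t(x1, x2, sin):
    """Hypothetical source program t(x1, x2) = x1 * x2 + sin(x1)."""
    return x1 * x2 + sin(x1)


def forward(y, w):
    """Run the dual-number version of t at the point y with tangent w."""
    return t(Dual(y[0], w[0]), Dual(y[1], w[1]), dual_sin)
```

For this example the Jacobian at $y = (y_1, y_2)$ is the row $(y_2 + \cos y_1, \; y_1)$, so `forward(y, w).tangent` should agree with $(y_2 + \cos y_1)\, w_1 + y_1 w_2$ up to floating-point error.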
9.7 Correctness of the dual numbers reverse AD
We assume that $\mathbf{vect}$ implements the vector space $\mathbb{R} ^k$ , for some fixed $k\in \mathbb{N}\cup \left \{\infty \right \}$ . We consider the respective (co)projections $\mathfrak{p}_{k \rightarrow s}$ for each $s\in \mathbb{N}\cup \left \{\infty \right \}$ , as defined in (18). The following shows how our macro encompasses reverse-mode AD.
For each $s\in \mathbb{N} ^\ast$ with $s \leq k$ , we can define the morphism $\mathbf{wrap}_{s}\stackrel{\mathrm{def}}{=} \left ( \pi _{j}, \overline{e} _{j} \right ) _{j\in \mathbb{I}_{s}} \;:\; \mathbf{real} ^s\,{\rightarrow }\, \left ( \mathbf{real}\times \mathbf{vect} \right ) ^s$ in $\mathbf{Syn}_V^{{\mathbf{tr}}}$ , which corresponds to the wrapper defined in (2) in the target language. We denote $\mathtt{wrap}_{s}\stackrel{\mathrm{def}}{=}[\![\mathbf{wrap}_{s}]\!] _{k}$ . By the definition of the $k$ -semantics, it is clear that $\mathtt{wrap}_{s} \left ( y \right ) = {\phi} _{s,k}\left ( y, e^{k}_{1}, \ldots, e^{k}_{s} \right )$ .
For a program $x:\mathbf{real}^{s} \vdash{t}:\mathbf{real} ^l$ (where $s, l\in \mathbb{N}^\ast$ ), we have that, for any $y\in [\![{t} ]\!] ^{-1}\left ( \mathbb{R} ^l \right )\subset \mathbb{R} ^s$ ,
by Theorem 9.1. This gives the transpose derivative $\mathfrak{p}_{s \rightarrow k} [\![{t} ]\!] '(y) ^t$ as something of the type $\mathbf{vect} ^l$ . This should be good enough whenever $k = s$ , since, in this case, $[\![\mathbf{vect} ^l]\!] _{k} = \left ( \mathbb{R}^s\right ) ^l$ and $\mathfrak{p}_{s \rightarrow k} =\mathfrak{p}_{k \rightarrow k} = \mathrm{id}$ .
In case of $s \lt k$ , if needed, the type can be fixed by using the handler $\mathfrak{h} _{s}$ . More precisely, we can define the morphism
and, by the definition of $k$ -semantics, we conclude that
since $ \mathfrak{p}_{k \rightarrow s}\circ \mathfrak{p}_{s \rightarrow k} = \mathrm{id}$ whenever $s\leq k$ .
Again, by Theorem 9.1, it is straightforward to generalize the correctness statements above to more general data types $\sigma$ . Furthermore, it should be noted that, for $k=\infty$ (representing the case of a type of dynamically sized array of cotangents), the above shows that our macro gives the reverse-mode AD for any program $x:\tau \vdash{t}:\sigma$ for data types $\tau$ and $\sigma$ . This choice of $k=\infty$ is the easiest route to take for a practical implementation of this form of dual numbers reverse AD, as it leads to a single type of cotangent vectors that works for any program.
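The role of the wrapper $\mathbf{wrap}_{s}$ can be sketched in Python for finite $k$. The names `DualK`, `wrap`, `t`, `transposed_derivative`, and `project` are hypothetical, and `project` assumes that $\mathfrak{p}_{k \rightarrow s}$ keeps the first $s$ coordinates, which is our reading of (18) rather than something restated in this section. Each input coordinate is paired with a standard basis vector of $\mathbb{R}^k$, and the vector attached to the $j$-th output is then the gradient of the $j$-th component of the program, padded with zeros when $s \lt k$.

```python
class DualK:
    """A value of type real x vect, with vect = R^k stored as a tuple."""

    def __init__(self, primal, vec):
        self.primal = primal
        self.vec = tuple(vec)

    def __add__(self, other):
        return DualK(self.primal + other.primal,
                     [a + b for a, b in zip(self.vec, other.vec)])

    def __mul__(self, other):
        # Product rule, applied coordinatewise to the attached vectors.
        return DualK(self.primal * other.primal,
                     [self.primal * b + a * other.primal
                      for a, b in zip(self.vec, other.vec)])


def wrap(y, k):
    """Pair the j-th input with the j-th standard basis vector of R^k."""
    s = len(y)
    if s > k:
        raise ValueError("wrap needs s <= k")
    return [DualK(y[j], [1.0 if i == j else 0.0 for i in range(k)])
            for j in range(s)]


def t(x):
    """Hypothetical program real^2 -> real^2."""
    return [x[0] * x[1], x[0] + x[1] * x[1]]


def transposed_derivative(y, k):
    """One vector in R^k per output: the gradient of that output."""
    return [out.vec for out in t(wrap(y, k))]


def project(vec, s):
    """Keep the first s coordinates (assumed stand-in for p_{k -> s})."""
    return vec[:s]
```

With $s = k = 2$ the result needs no post-processing; with $k = 4$ the same gradients appear padded with zeros, and `project(v, 2)` recovers them. The case $k = \infty$ is not modeled by this sketch.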
10. AD for Recursive Types and ML-Polymorphism
10.1 Syntax for recursive types
We extend both our source and target languages of Sections 6.1 and 6.2 with ML-style polymorphism and type recursion in the sense of FPC (Fiore and Plotkin Reference Fiore and Plotkin1994). That is, we extend types, values, and computations for each of the two languages as
The new values and computations are typed according to the rules in Fig. 10.
Here, kinding contexts $\Delta$ are lists of type variables $\alpha _1,\ldots, \alpha _n$ . We consider judgments $\Delta \mid \Gamma \vdash t \;:\; \tau$ , where the types in $\Gamma$ and $\tau$ may contain free type variables from $\Delta$ . They should be read as specifying that $t$ is a program of type $\tau$ , with free variables typed according to $\Gamma$ , that is polymorphic in the type variables of $\Delta$ .
We use the $\beta \eta$ -rules of Fig. 11.
Once a language has recursive types, it is already expressive enough to get term recursion and, hence, iteration. Indeed, we can treat term recursion at type $\tau =\sigma \to \rho$ as syntactic sugar: we first define $\chi \stackrel{\mathrm{def}}{=} \boldsymbol{\mu }\alpha .\left ( \alpha \to \tau \right )$ and then:
The semantics of the language is, of course, expected to be consistent – meaning that the interpretations of term recursion and recursive types should be compatible according to the definition above. Alternatively, we can consider that the source language is given by the basic language with the typing rules given by Fig. 1 with the corresponding grammar plus the recursive types established above, while the target language is the source language plus the extension given by the grammar and typing rules defined in Section 6.2.
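The idea of deriving term recursion from the recursive type $\chi = \boldsymbol{\mu }\alpha .\left ( \alpha \to \tau \right )$ can be sketched in Python using the standard self-application encoding of a call-by-value fixpoint. This is an illustration under our own assumptions, not a transcription of the paper's defining term: the class `Roll` (whose field `unroll` plays the role of unrolling) and the functions `fix` and `fact_step` are hypothetical names.

```python
class Roll:
    """An element of chi = mu a. (a -> tau): a rolled function chi -> tau."""

    def __init__(self, unroll):
        self.unroll = unroll


def fix(f):
    """Fixpoint of f : (sigma -> rho) -> (sigma -> rho) via self-application."""
    # body : chi, where unrolling gives a function chi -> (sigma -> rho).
    # The inner lambda eta-expands the result so that evaluation under
    # call-by-value does not loop before an argument is supplied.
    body = Roll(lambda x: lambda a: f(x.unroll(x))(a))
    return body.unroll(body)


def fact_step(rec):
    """One unfolding of the factorial function, parametric in the recursive call."""
    return lambda n: 1 if n == 0 else n * rec(n - 1)


factorial = fix(fact_step)
```

Here `factorial(5)` unfolds `fact_step` five times through `body.unroll(body)`; no built-in recursion or loop is used in `fix` itself.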
10.2 Categorical models for recursive types: $rCBV$ models
Here, we establish the basic categorical model for the syntax of CBV languages with recursive types. Let $\left (\mathcal{V}, \mathcal{T}\right )$ be a $CBV$ pair and $J:\mathcal{V}\,{\rightarrow }\,\mathcal{C}$ the corresponding universal Kleisli functor. Moreover, let $\mathbf{Cat}\left ( \mathsf{2}, {\mathcal{V}}\textrm{-}\mathbf{Cat} \right )$ be the category of morphisms of ${\mathcal{V}}\textrm{-}\mathbf{Cat}$ .
For each $n\in \mathbb{N}$ , an $n$ -variable $\left (\mathcal{V}, \mathcal{T}\, \right )$ -parametric type (or a $\left (\mathcal{V}, \mathcal{T}\, \right )$ -parametric type of degree $n$ ) is a morphism ${E} \;:\; \left ( J^{\mathrm{op}} \times J\right ) ^n \rightarrow J$ in $\mathbf{Cat}\left ( \mathsf{2}, {\mathcal{V}}\textrm{-}\mathbf{Cat} \right )$ . In other words, it consists of a pair ${E} = \left ({E}_{\mathcal{V}},{E}_{\mathcal{C}} \right )$ of $\mathcal{V}$ -enriched functors such that (69) commutes. A $\left (\mathcal{V}, \mathcal{T}\, \right )$ -parametric type of degree $0$ (71) can be identified with the corresponding object of $\mathcal{V}$ .
We denote by $\mathsf{Param}\left ( \mathcal{V}, \mathcal{T}\right )$ the collection of all $\left (\mathcal{V}, \mathcal{T}\, \right )$ -parametric types ${E} = \left ({E}_{\mathcal{V}},{E}_{\mathcal{C}}\right )$ of any degree $n\in \mathbb{N}$ . As the terminology indicates, the objects of $\mathsf{Param}\left ( \mathcal{V}, \mathcal{T}\right )$ play the role of the semantics of parametric types in our language. However, the parametric types in the actual language could be a bit more restrictive. They usually are those constructed out of the primitive type formers: namely, in our case, tupling (finite products), cotupling (finite coproducts), exponentiation (Kleisli exponential), and type recursion.
Definition 10.1 (Free type recursion). A free decreasing degree type operator (fddt operator) for $\left (\mathcal{V}, \mathcal{T}\right )$ is a function (70) identity on parametric types of degree $0$ which takes each $(n+1)$ -variable $\left (\mathcal{V}, \mathcal{T}\, \right )$ -parametric type ${E} = \left ({E}_{\mathcal{V}},{E}_{\mathcal{C}}\right )$ to a $\left (\mathcal{V}, \mathcal{T}\, \right )$ -parametric type ${\nu E} = \left ({\nu E}_{\mathcal{V}},{\nu E}_{\mathcal{C}} \right )$ of degree $n$ , provided that $n\in \mathbb{N}$ .
A rolling for (70) is a collection (72) of natural transformations such that (73) is invertible for any ${E} = \left ({E}_{\mathcal{V}},{E}_{\mathcal{C}} \right )$ , that is to say, $J\left (\mathsf{roll} ^{{E}} \right )$ is a natural isomorphism.
A free type recursion for $\left (\mathcal{V}, \mathcal{T}\right )$ is a pair $\underline{\nu } = \left ( \nu, \underline{\mathsf{roll} } \right )$ where $\nu$ is an fddt operator and $\underline{\mathsf{roll} }$ is a rolling for $\nu$ .
Definition 10.2 ( $H$ -compatible). Let $H$ be a $CBV$ pair morphism between $CBV$ pairs $\left ( \mathcal{V}, \mathcal{T}\, \right )$ and $\left ( \mathcal{V}\,', \mathcal{T}^{\;\,\prime} \right )$ . A pair $\left ({E},{E'}\right ) \in \mathsf{Param}\left ( \mathcal{V}, \mathcal{T}\right ) \times \mathsf{Param}\left ( \mathcal{V}\,', \mathcal{T}^{\;\,\prime}\right )$ of parametric types is $H$ -compatible if they have the same degree $n$ and the diagram (74) commutes. In particular, if $n = 0$ , the pair $\left ({E},{E'}\right )$ is $H$ -compatible if $H\left ({E}_{\mathcal{V}}\right ) ={E'}_{\mathcal{V}}$ .
Definition 10.3 ( $rCBV$ models). An $rCBV$ model is a triple $\left (\mathcal{V}, \mathcal{T}, \underline{\nu } \right )$ where $\left (\mathcal{V}, \mathcal{T}\right )$ is a $CBV$ pair and $\underline{\nu }$ is a free type recursion for $\left (\mathcal{V}, \mathcal{T}\right )$ .
An $rCBV$ model morphism between the $rCBV$ models $\left (\mathcal{V}, \mathcal{T}, \underline{\nu } \right )$ and $\left (\mathcal{V}\,', \mathcal{T}^{\;\,\prime}, \underline{\nu } '\right )$ consists of a $CBV$ pair morphism $H$ between $\left (\mathcal{V}, \mathcal{T}\, \right )$ and $\left (\mathcal{V}\,', \mathcal{T}^{\;\,\prime}\right )$ such that, for every $H$ -compatible pair $\left ({E},{E'}\right ) \in \mathsf{Param}\left ( \mathcal{V}, \mathcal{T}\right ) \times \mathsf{Param}\left ( \mathcal{V}\,', \mathcal{T}^{\;\,\prime}\right )$ of $n$ -variable parametric types, $\left ({\nu E},{\nu E'}\right )$ is $H$ -compatible and, if $n\gt 0$ , (75) holds, that is to say, $H\left (\mathsf{roll} ^{{E}} \right ) =\mathsf{roll} ^{{E}} _{\left ( H^{\mathrm{op}} \times H \right ) ^{n-1} }$ . The $rCBV$ models and $rCBV$ model morphisms define a category, denoted herein by $\mathfrak{C}_{\mathcal{RBV}}$ .
There is, then, an obvious forgetful functor $\mathcal{U}_{r\mathtt{p}} \;:\; \mathfrak{C}_{\mathcal{RBV}}\,{\rightarrow }\, \mathfrak{C}_{\mathtt{p}}$ .
Remark 10.4. We do not use this fact in our work, but every $rCBV$ model has an underlying $CBV$ model. More precisely, free term iteration can be defined out of the free term recursion, while the latter can be defined out of the free type recursion (see (68) ). This defines a forgetful functor
10.3 The $rCBV$ models $( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}, {\nu } _{\mathbf{Syn}} )$ and $( \mathbf{Syn}_V^{{\mathsf{R}}{\mathbf{tr}}}, \mathbf{Syn}_{\mathcal{S}}^{{\mathsf{R}}{\mathbf{tr}}}, {\nu } ^{{\mathbf{tr}}}_{\mathbf{Syn}} )$
We consider the $rCBV$ model generated by each syntax, that is to say, the free $rCBV$ models coming from the fine-grain CBV translations of the source and target languages. This provides us with the $rCBV$ models
with the universal property described in Proposition 10.5.
Proposition 10.5 (Universal property of the $rCBV$ models (75)). Let $\left ( \mathcal{V}, \mathcal{T}, \underline{\nu } \right )$ be an $rCBV$ model. Assume that Figs. 7 and 8 are given consistent assignments.
-
1. There is a unique $rCBV$ model morphism $H\;:\; \left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}, \underline{\nu } _{\mathbf{Syn}} \right )\,{\rightarrow }\, \left ( \mathcal{V}, \mathcal{T}, \underline{\nu } \right )$ respecting the assignment of Fig. 7 .
-
2. There is a unique $rCBV$ model morphism $\mathcal{H}\;:\; \left ( \mathbf{Syn}_V^{{\mathsf{R}}{\mathbf{tr}}}, \mathbf{Syn}_{\mathcal{S}}^{{\mathsf{R}}{\mathbf{tr}}}, \underline{\nu } ^{{\mathbf{tr}}}_{\mathbf{Syn}} \right )\,{\rightarrow }\, \left ( \mathcal{V}, \mathcal{T}, \underline{\nu } \right )$ that extends $H$ and respects the assignment of Fig. 8 .
Remark 10.6. By Proposition 6.1, we have (unique) $CBV$ model morphisms
and
that are identity on the primitive operations and types.
Proposition 10.5 states that $H\mapsto \mathcal{R} \left ( H \right )\circ \mathtt{s}$ and $\mathcal{H}\mapsto \mathcal{R} \left ( \mathcal{H} \right )\circ \mathtt{s}^{\mathsf{t}}$ give the bijections (78) and (79), respectively, showing that our syntax extension for recursive types gives a free $rCBV$ model on the syntax without recursive types.
10.4 Automatic differentiation for languages with recursive types
We extend our definition of AD to recursive types in Fig. 12. We note that our extension is compatible with our previous definitions if we view term recursion (and iteration) as syntactic sugar.
Lemma 10.7 (Type preservation). If $\Delta \mid \Gamma \vdash t \;:\; \tau$ , then $\Delta \mid \mathcal{D}\;{(\Gamma )}\vdash \mathcal{D}\;{(t)}:\mathcal{D}\;{(\tau )}$ .
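To make the effect of AD on a recursive type concrete, the following Python sketch runs dual numbers through a list type $\boldsymbol{\mu }\alpha .\left ( \mathsf{1} \sqcup \left ( \mathbf{real}\times \alpha \right ) \right )$. It assumes, as is usual for dual-numbers AD, that the macro acts homomorphically on type formers, so that a list of reals is sent to a list of pairs in $\mathbf{real}\times \mathbf{vect}$; Fig. 12 is not reproduced here, and `Dual`, `NIL`, `cons`, `to_list`, and `product` are hypothetical names, with $\mathbf{vect} = \mathbb{R}$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value of type real x vect, with vect = R."""
    primal: float
    tangent: float

    def __mul__(self, other):
        # Product rule on the tangent component.
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)


NIL = None  # the left injection of 1 + (real x list)


def cons(head, tail):
    """The right injection: a head paired with the rest of the list."""
    return (head, tail)


def to_list(items):
    """Build a cons-list from a Python sequence."""
    out = NIL
    for item in reversed(items):
        out = cons(item, out)
    return out


def product(xs, one):
    """Recursive program over the list type: multiply all entries."""
    if xs is NIL:
        return one
    head, tail = xs
    return head * product(tail, one)


# Differentiate the product of [2, 3, 4] with respect to its first entry.
dual_list = to_list([Dual(2.0, 1.0), Dual(3.0, 0.0), Dual(4.0, 0.0)])
result = product(dual_list, Dual(1.0, 0.0))
```

The same recursive definition of `product` serves both the source program (on floats, with `one = 1.0`) and its differentiated version (on `Dual` values), which mirrors the fact that the transformed program has the transformed type.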
10.5 AD transformation as an $rCBV$ model morphism
By Proposition 10.5, the assignment defined in Fig. 8 induces a unique $rCBV$ model morphism (80), which encompasses the macro $\mathcal{D}$ defined by Fig. 7 and extended in Fig. 12.
10.6 $\boldsymbol{\omega } \mathbf{Cpo}$ -enriched categorical models for recursive types: $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pairs
Although the setting of bilimit compact expansions is the usual reasonable basic framework for solving recursive domain equations, we do not need this level of generality. Instead, we consider a subclass of $\boldsymbol{\omega } \mathbf{Cpo}$ -enriched models, the $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pairs established in Definition 10.8 (see Footnote 15).
We are back again to the setting of $\boldsymbol{\omega } \mathbf{Cpo}$ -enriched categories. Recall that an embedding-projection-pair (ep-pair) $u \;:\; A\stackrel{\hookrightarrow }{\leftharpoondown } B$ in an $\boldsymbol{\omega } \mathbf{Cpo}$ -category $\mathcal{C}$ is a pair $u = \left ( u^e, u^p\right )$ consisting of a $\mathcal{C}$ -morphism $u^e:A\,{\rightarrow }\, B$ , the embedding, and a $\mathcal{C}$ -morphism $u^p:B\,{\rightarrow }\, A$ , the projection, such that $u^e \circ u^p \leq \textrm{id}$ and $u^p\circ u^e= \textrm{id}$ .
It should be noted that, when considering the underlying $2$ -category of the $\boldsymbol{\omega } \mathbf{Cpo}$ -category, an ep-pair consists of an adjunction (see Footnote 16) whose unit is the identity. In this context, it is also called a lari adjunction (left adjoint right-inverse) (see Clementino and Lucatelli Nunes Reference Clementino and Lucatelli Nunes2024, Section 1). In particular, as in the case of any adjunction, an embedding $u^e \;:\; A\,{\rightarrow }\, B$ uniquely determines the associated projection $u^p\;:\; B\,{\rightarrow }\, A$ and vice versa.
A zero object (see Footnote 17) $\mathfrak{O}$ in an $\boldsymbol{\omega } \mathbf{Cpo}$ -category $\mathcal{C}$ is an ep-zero object if, for any object $A$ , the pair $\iota _A = \left ( \iota ^e \;:\; \mathfrak{O}\,{\rightarrow }\, A, \iota ^p \;:\; A\,{\rightarrow }\,\mathfrak{O} \right )$ consisting of the unique morphisms is an ep-pair.
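As a small worked example of an ep-pair, and of how an embedding determines its projection, consider two finite chains regarded as objects of $\boldsymbol{\omega } \mathbf{Cpo}$ (finite posets are $\omega$-cpos and monotone maps between them are continuous). The Python sketch below uses hypothetical names (`projection_from`, `is_ep_pair`, `A`, `B`, `embed`) and computes the projection as the right adjoint of the embedding, that is, $u^p(b)$ is the greatest $a$ with $u^e(a) \leq b$.

```python
def projection_from(embed, A, B):
    """Right adjoint of a monotone embedding between finite chains."""
    # Assumes that, for every b, some a satisfies embed[a] <= b.
    return {b: max(a for a in A if embed[a] <= b) for b in B}


def is_ep_pair(embed, proj, A, B):
    """Check the two defining conditions of an ep-pair."""
    retract = all(proj[embed[a]] == a for a in A)      # u^p . u^e = id
    deflation = all(embed[proj[b]] <= b for b in B)    # u^e . u^p <= id
    return retract and deflation


A = [0, 1]            # the chain 0 <= 1
B = [0, 1, 2]         # the chain 0 <= 1 <= 2
embed = {0: 0, 1: 2}  # a monotone injection A -> B
proj = projection_from(embed, A, B)
```

In this example the element $1\in B$ is not in the image of the embedding, and the projection sends it to $0$, the best approximation from below.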
Definition 10.8 ( $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair). An $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair is a $CBV$ pair $\left ( \mathcal{V}, \mathcal{T}\, \right )$ such that, denoting by $J \;:\; \mathcal{V}\,{\rightarrow }\,\mathcal{C}$ the corresponding universal Kleisli $\mathcal{V}$ -functor,
-
[r $\omega$ .1] $\mathcal{V}$ is a cocomplete $ \boldsymbol{\omega } \mathbf{Cpo}$ -cartesian closed category (see Footnote 18);
-
[r $\omega$ .2] the unit of $\mathcal{T}$ is pointwise a full morphism (hence, $J$ is a locally full $\boldsymbol{\omega } \mathbf{Cpo}$ -functor);
-
[r $\omega$ .3] $\mathcal{C}$ has an ep-zero object $\mathfrak{O} = J\left ( \mathsf{0}\right )$ , where $\mathsf{0}$ is initial in $\mathcal{V}$ ;
-
[r $\omega$ .4] whenever $u \;:\; J(A)\stackrel{\hookrightarrow }{\leftharpoondown } J(B)$ is an ep-pair in $\mathcal{C}$ , there is one morphism $\hat{u} \;:\; A\,{\rightarrow }\, B$ in $\mathcal{V}$ such that $J\left ( \hat{u}\right ) =u^e$ .
An $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair morphism from $\left (\mathcal{V}, \mathcal{T}\, \right )$ into $\left (\mathcal{V}\,', \mathcal{T}^{\;\,\prime} \right )$ is an $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $H \;:\; \mathcal{V}\,{\rightarrow }\,\mathcal{V}\,'$ that strictly preserves $\boldsymbol{\omega } \mathbf{Cpo}$ -colimits and whose underlying functor is a morphism between the $CBV$ pairs. This defines a category of $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pairs, denoted herein by $\boldsymbol{\omega } \mathbf{CPO}\textrm{-}\mathfrak{C}_{r\mathcal{BV}}$ .
Every $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair $\left ( \mathcal{V}, \mathcal{T}\, \right )$ has an underlying $\boldsymbol{\omega } \mathbf{Cpo}$ -pair, and this extends to a forgetful functor $\boldsymbol{\omega } \mathbf{CPO}\textrm{-}\mathfrak{C}_{r\mathcal{BV}}\,{\rightarrow }\,\boldsymbol{\omega } \mathbf{CPO}\textrm{-}\mathfrak{C}_{\mathcal{BV}}$ . More importantly to our work, we have the following.
10.6.1 $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pairs are $rCBV$ models
Let $\left ( \mathcal{V}, \mathcal{T}\, \right )$ be an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair. It is clear that we have an underlying $CBV$ pair which, by abuse of language, we denote by $\left ( \mathcal{V}, \mathcal{T}\, \right )$ as well. Hence, we can consider $\left (\mathcal{V}, \mathcal{T}\, \right )$ -parametric types.
Let $n\in \mathbb{N}^\ast$ and (69) be an $n$ -variable $\left (\mathcal{V}, \mathcal{T}\, \right )$ -parametric type. For each $A\in \left ( \mathcal{V} ^{\mathrm{op}} \times \mathcal{V}\right ) ^{n-1}$ , we get a $1$ -variable $\left (\mathcal{V}, \mathcal{T}\, \right )$ -parametric type ${E}^A = \left ({E}^{A}_{\mathcal{V}},{E}^{A}_{\mathcal{C}} \right )$ , where ${E}^{A}_{\mathcal{V}}\left ( W,Y \right ) \stackrel{\mathrm{def}}{=}{E}_{\mathcal{V}}\left ( A, W,Y \right )$ and ${E}^{A}_{\mathcal{C}}\left ( W',Y' \right ) \stackrel{\mathrm{def}}{=}{E}_{\mathcal{C}}\left ( J(A), W',Y' \right )$ . Let $\mathcal{E}^{{E}}_{A}$ be the diagram (82) in $\mathcal{C}$ given by the chain of morphisms $\left ({a^e_{n}} \;:\; \mathfrak{A}_{n}\,{\rightarrow }\,\mathfrak{A}_{n+1} \right ) _{n\in \mathbb{N}}$ , where $\left ( a_{n} \right ) _{n\in \mathbb{N}}$ is the chain of ep-pairs inductively defined by (81).
There is a unique diagram $\hat{\mathcal{E}^{{E}}_{A} }$ such that $J\circ \hat{\mathcal{E}^{{E}}_{A} } = \mathcal{E}^{{E}}_{A}$ by (r $\omega$ .4) of Definition 10.8. Since $\mathcal{V}$ has $\boldsymbol{\omega } \mathbf{Cpo}$ -colimits, we conclude that the conical $\boldsymbol{\omega } \mathbf{Cpo}$ -colimit of $\hat{\mathcal{E}^{{E}}_{A} }$ exists and is preserved by $J$ (being an $\boldsymbol{\omega } \mathbf{Cpo}$ -left adjoint) – hence, $\mathcal{E}^{{E}}_{A}$ has a conical $\boldsymbol{\omega } \mathbf{Cpo}$ -colimit in $\mathcal{C}$ as well.
We recall the following variation on Smyth and Plotkin (Reference Smyth and Plotkin1982)’s celebrated limit-colimit coincidence result.
Lemma 10.9 (Limit-colimit coincidence, à la Smyth and Plotkin Reference Smyth and Plotkin1982). For any $\omega$ -chain $(a^e_n\dashv a^p_n)_{n\in \mathbb{N}}$ of ep-pairs in an $\boldsymbol{\omega } \mathbf{Cpo}$ -category $\mathcal{C}$ , any $\boldsymbol{\omega } \mathbf{Cpo}$ -colimiting cocone on $(a^e_n)_{n\in \mathbb{N}}$ consists of embeddings and the corresponding projections form an $\boldsymbol{\omega } \mathbf{Cpo}$ -limiting cone on $(a^p_n)_{n\in \mathbb{N}}$ .
Since (82) is the chain of embeddings of a chain of ep-pairs, the $\boldsymbol{\omega } \mathbf{Cpo}$ -colimit of these embeddings coincides with the $\boldsymbol{\omega } \mathbf{Cpo}$ -limit of the associated chain $\left ( a^p_{n} \right ) _{n\in \mathbb{N} }$ of projections (84), denoted herein by $\mathcal{P}^{{E}}_{A}$ . Such a bilimit of ep-pairs is absolute in the sense that any $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $H \;:\; \mathcal{C}\,{\rightarrow }\, \mathcal{C} '$ preserves the conical $\boldsymbol{\omega } \mathbf{Cpo}$ -colimit (and $\boldsymbol{\omega } \mathbf{Cpo}$ -limit) of $\mathcal{E}^{{E}}_{A}$ (respectively, $\mathcal{P}^{{E}}_{A}$ ).
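For intuition, the chain of ep-pairs can be computed explicitly in a simple case. The Python sketch below assumes the $CBV$ pair $\left ( \boldsymbol{\omega } \mathbf{Cpo}, \left ( -\right )_\bot \right )$ and the one-variable parametric type $E(X^-, X^+) = \mathsf{1}\sqcup X^+$ (natural numbers), with the chain started at the initial object and obtained by repeatedly applying $E$; since (81) is not restated here, this standard construction is an assumption. Under it, the $n$-th approximation is the discrete cpo of numerals $0, \ldots, n-1$, the embeddings are inclusions, and the projections are partial maps in the Kleisli category. The names `BOT`, `approx`, `embedding`, `projection`, `kleisli`, `leq`, and `is_ep_pair` are hypothetical.

```python
BOT = None  # stands for the bottom element added by lifting


def approx(n):
    """n-th approximation of mu a. (1 + a): the numerals 0, ..., n-1."""
    return list(range(n))


def embedding(n):
    """a_n^e : A_n -> A_{n+1}, a total Kleisli map (the inclusion)."""
    return lambda x: x


def projection(n):
    """a_n^p : A_{n+1} -> A_n, undefined (BOT) on the new numeral n."""
    return lambda x: x if x < n else BOT


def kleisli(g, f):
    """Kleisli composition g . f for the lifting monad: BOT propagates."""
    def composite(x):
        y = f(x)
        return BOT if y is BOT else g(y)
    return composite


def leq(u, v):
    """Order on a lifted discrete cpo: BOT is below everything."""
    return u is BOT or u == v


def is_ep_pair(n):
    """Check a_n^p . a_n^e = id and a_n^e . a_n^p <= id."""
    e, p = embedding(n), projection(n)
    retract = all(kleisli(p, e)(x) == x for x in approx(n))
    deflation = all(leq(kleisli(e, p)(y), y) for y in approx(n + 1))
    return retract and deflation
```

The colimit of the inclusions is the discrete cpo of all natural numbers, and the limit of the projections has the same carrier: a compatible family of truncations of a numeral determines that numeral. This is the coincidence stated in Lemma 10.9, seen in a case where the approximations are finite.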
Since the conical $\boldsymbol{\omega } \mathbf{Cpo}$ -colimit of $\mathcal{E}^{{E}}_{A}$ is absolute, the diagram (69) commutes, and $J$ strictly preserves $\boldsymbol{\omega } \mathbf{Cpo}$ -colimits, we have the invertible morphism (85) given by the composition of the respective canonical comparison morphisms.
It should be noted that, for each $f\;:\; \left ( J^{\mathrm{op}} \times J\right ) ^{n-1} (A)\,{\rightarrow }\, \left ( J^{\mathrm{op}} \times J\right )^{n-1} (B)$ in $\left ( \mathcal{C} ^{\mathrm{op}} \times \mathcal{C}\right ) ^{n-1}$ , we have an induced $\mathcal{V}$ -natural transformation $\mathcal{E}^{{E}}_{f} \;:\; \mathcal{E}^{{E}}_{A}\,{\rightarrow }\, \mathcal{E}^{{E}}_{B}$ . This association extends to a $\mathcal{V}$ -functor $\mathcal{E}^{{E}}$ from $\left ( \mathcal{C} ^{\mathrm{op}} \times \mathcal{C}\right ) ^{n-1}$ into the $\mathcal{V}$ -category of chains in $\mathcal{C}$ . The association $A\mapsto \hat{\mathcal{E}^{{E}}_{A} }$ also extends to a $\mathcal{V}$ -functor $\hat{\mathcal{E}^{{E}} }$ from $\left ( \mathcal{V} ^{\mathrm{op}} \times \mathcal{V}\right ) ^{n-1}$ into the $\mathcal{V}$ -category of chains by the $\mathcal{V}$ -faithfulness of $J$ .
We define the fddt operator $\nu _{\omega }$ as follows. For each $n\in \mathbb{N} ^\ast$ , given a $\left (\mathcal{V}, \mathcal{T}\, \right )$ -parametric type ${E} = \left ({E}_{\mathcal{V}},{E}_{\mathcal{C}}\right )$ , we define:
where, by abuse of language, $\mathrm{colim}$ is the $\mathcal{V}$ -functor from the $\mathcal{V}$ -category of chains in $\mathcal{V}$ (respectively, in $\mathcal{C}$ ) into the $\mathcal{V}$ -category $\mathcal{V}$ (respectively, $\mathcal{C}$ ).
Since every isomorphism is an embedding, there is only one $\omega \mathsf{roll}^{E} _A$ in $\mathcal{V}$ such that $J\left ( \omega \mathsf{roll}^{E} _A\right )$ is equal to (85). The morphisms $\omega \mathsf{roll} ^E = \left ( \omega \mathsf{roll}^{E} _A\right ) _{A\in \left ( \mathcal{V} ^{\mathrm{op}} \times \mathcal{V}\right ) ^{n-1} }$ give a $\mathcal{V}$ -natural transformation ${E}_{\mathcal{V}}\left ( \textrm{id}, \nu _{\omega }{E}_{\mathcal{V}} ^{\mathrm{op}}, \nu _{\omega }{E}_{\mathcal{V}} \right )\rightarrow \nu _{\omega }{E}_{\mathcal{V}}$ such that $J\left (\omega \mathsf{roll} ^E \right )$ is invertible. Therefore, $\underline{\mathsf{roll} }_{\omega } \stackrel{\mathrm{def}}{=} \left ( \omega \mathsf{roll} ^E \right ) _{E\in \mathsf{Param}\left ( \mathcal{V}, \mathcal{T}\right ) }$ is a rolling for $\nu _{\omega }$ , and we can define the (free) type recursion $\underline{\nu }_{\omega } \stackrel{\mathrm{def}}{=} \left ( \nu _{\omega }, \underline{\mathsf{roll} }_{\omega } \right )$ .
Lemma 10.10 (Underlying $rCBV$ model). There is a forgetful functor $\mathcal{U}_{r\mathcal{BV}} \;:\; \boldsymbol{\omega } \mathbf{CPO}\textrm{-}\mathfrak{C}_{r\mathcal{BV}}\,{\rightarrow }\, \mathfrak{C}_{\mathcal{RBV}}$ defined by $\mathcal{U}_{r\mathcal{BV}} \left ( \mathcal{V}, \mathcal{T}\, \right ) = \left ( \mathcal{V}, \mathcal{T}, \underline{\nu }_{\omega } \right )$ that takes every morphism $H$ to its underlying morphism of $CBV$ models.
Proof From the definition of $\underline{\nu }_{\omega }$ and the fact that $H$ strictly preserves $\mathcal{V}$ -colimits, we conclude that, indeed, $H$ respects the conditions of an $rCBV$ model morphism described in Definition 10.3.
Remark 10.11. The product of $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pairs is computed as expected: $\left ( \mathcal{V} _0, \mathcal{T} _0 \right ) \times \left ( \mathcal{V} _1, \mathcal{T} _1 \right ) \,{\cong}\, \left ( \mathcal{V} _0 \times \mathcal{V} _1, \mathcal{T} _0\times \mathcal{T} _1 \right )$ . Moreover, it is clear that $\mathcal{U}_{r\mathcal{BV}}$ preserves finite products.
10.7 Concrete semantics
The $CBV$ pair $\left ( \boldsymbol{\omega } \mathbf{Cpo}, \left ( -\right )_\bot \right )$ as in Section 7.1 clearly satisfies the conditions of Definition 10.8, and hence, it is also an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair. By Proposition 10.5, for each $k\in \mathbb{N} \cup \left \{ \infty \right \}$ , we have unique $rCBV$ model morphisms (87) and (88) respecting the assignments of Fig. 12 and (33). In other words, following Remark 10.6, we have only one extension of the semantics (30) and (32) to the respective languages with recursive types.
Moreover, by Remark 10.11, we have that the product $ \left ( \boldsymbol{\omega } \mathbf{Cpo}\times \boldsymbol{\omega } \mathbf{Cpo}, \left ( -\right )_\bot \right )$ as in Section 7.1 is an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair.
10.8 Subscone for $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pairs
The first step for our logical relations proof is to verify that, for each $(n,k)\in \mathbb{N}\times \left ( \mathbb{N} \cup \left \{ \infty \right \}\right )$ , the $CBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair $\left (\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right ), \mathcal{P}_{n, k}\left (-\right )_\bot \right )$ as in Proposition 9.3 yields an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair. In order to do that, we rely on Theorem 10.13 about lifting the $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair structure.
Definition 10.12 (Impurity preserving/purity reflecting). Let $\left ( \mathcal{V}, \mathcal{T}\, \right )$ and $\left ( \mathcal{V}\,', \mathcal{T}^{\;\,\prime} \right )$ be $CBV$ pairs. A $CBV$ pair morphism $H\;:\; \mathcal{V}\,{\rightarrow }\, \mathcal{V}\,'$ is impurity preserving (or, purity reflecting) if, whenever $H(f) = \eta ' _{\!\!Y} \circ g$ , there is $\hat{f}$ in $\mathcal{V}$ such that $\eta _Y \circ \hat{f} = f$ .
Theorem 10.13. Let $\left ( \mathcal{V}\,', \mathcal{T}^{\;\,\prime} \right )$ be an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair and $\left ( \mathcal{V}, \mathcal{T}\, \right )$ a $CBV$ pair such that $\mathcal{V}$ is a cocomplete $\boldsymbol{\omega } \mathbf{Cpo}$ -cartesian closed category and $T(\mathsf{0})$ is terminal.
If $H \;:\; \mathcal{V}\,{\rightarrow }\, \mathcal{V}\,'$ is a locally full $\boldsymbol{\omega } \mathbf{Cpo}$ -functor that yields an impurity preserving $CBV$ pair morphism $\left ( \mathcal{V}, \mathcal{T}\, \right )\,{\rightarrow }\, \mathcal{U}_{r\mathtt{p}} \left ( \mathcal{V}\,', \mathcal{T}^{\;\,\prime} \right )$ , then $\left ( \mathcal{V}, \mathcal{T}\, \right )$ is an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair. If, furthermore, $H$ strictly preserves $\boldsymbol{\omega } \mathbf{Cpo}$ -colimits, then $H$ yields an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair morphism.
Proof We prove that $\left ( \mathcal{V}, \mathcal{T}\, \right )$ yields an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair. By hypothesis, $\left ( \mathcal{V}, \mathcal{T}\, \right )$ satisfies (r $\omega$ .1). We prove the remaining conditions of Definition 10.8 below.
-
(r $\omega$ .2) Let $\eta$ and $\eta '$ be, respectively, the unit of $\mathcal{T}$ and $\mathcal{T}^{\;\,\prime}$ . Since $H$ is locally full, it reflects full morphisms. This implies that, for any $C\in \mathcal{V}$ , $\eta _ C$ is full since $\eta '_{H(C)}= H\left ( \eta _ C \right )$ is full.
-
(r $\omega$ .3) Since $T(\mathsf{0})$ is terminal, $J\left (\mathsf{0} \right )$ is a zero object. Thus, for each $A\in \mathcal{C}$ , we have the pair (89) of unique morphisms in $\mathcal{C}$ . Since $\overline{H}$ preserves initial objects and $\left ( \mathcal{V}\,', \mathcal{T}^{\;\,\prime} \right )$ is an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair, we have that (90) is the ep-pair of the unique morphisms. Finally, since $\overline{H}$ is a locally full $\boldsymbol{\omega } \mathbf{Cpo}$ -functor, it reflects ep-pairs, and hence, (89) is an ep-pair.
(89) \begin{equation} \left ( \iota _A \;:\; J\left (\mathsf{0} \right )\,{\rightarrow }\, A, {\iota ^A} \;:\; A\,{\rightarrow }\, J\left (\mathsf{0} \right ) \right ) \end{equation}
(90) \begin{equation} \left ( \overline{H}\left ( \iota _A\right ), \overline{H}\left ({\iota ^A}\right ) \;:\; \overline{H}\left ( A\right )\,{\rightarrow }\, \mathfrak{O} \right ) \end{equation}
-
(r $\omega$ .4) Given an ep-pair $u \;:\; J(A)\stackrel{\hookrightarrow }{\leftharpoondown } J(B)$ in $\mathcal{C}$ , the image $H(u) :\overline{H}J(A)\stackrel{\hookrightarrow }{\leftharpoondown } \overline{H}J(B)$ by $H$ is an ep-pair. Since $\left ( \mathcal{V}\,', \mathcal{T}^{\;\,\prime}\right )$ is an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair, there is one morphism $\hat{\overline{H}\left ( u\right )} \;:\; H(A)\,{\rightarrow }\, H(B)$ in $\mathcal{V}\,'$ such that $J'\left ( \hat{\overline{H}\left ( u\right )} \right ) = \overline{H}\left ( u^e \right )$ . Since the $CBV$ pair morphism $H\;:\; \left ( \mathcal{V}, \mathcal{T}\, \right )\,{\rightarrow }\, \mathcal{U}_{r\mathtt{p}} \left ( \mathcal{V}\,', \mathcal{T}^{\;\,\prime} \right )$ is impurity preserving, we conclude that there is $\hat{ u } \;:\; A\,{\rightarrow }\, B$ such that $J\left ( \hat{ u } \right ) = u^e$ .
As a consequence, in the setting of subscones satisfying Assumption 8.3, we get:
Theorem 10.14. Let $\left ( \mathcal{V}, \mathcal{T}\, \right )$ be an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair and (91) the forgetful $\boldsymbol{\omega } \mathbf{Cpo}$ -functor coming from a pair $\left ( G\;:\; \mathcal{V}\,{\rightarrow }\,\mathcal{D}, \mathfrak{T}_{sub}\right )$ satisfying Assumption 8.3 .
If $\mathcal{D}$ is cocomplete and $\overline{\mathcal{T}} = \left ( \overline{T}, \overline{\mathrm{m}}, \overline{\eta } \right )$ is a strong monad that is a lifting of the monad $\mathcal{T}$ along (91) such that (c.1) and (c.2) hold, then $\left ( \mathbf{Sub}\left ( \mathcal{D}\downarrow G \right ), \overline{\mathcal{T}} \right )$ is an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair and $\underline{\mathcal{L}}$ yields an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair morphism (92).
-
$\mathfrak{c}$ .1 $\overline{T}$ takes the initial to the terminal object;
-
$\mathfrak{c}$ .2 for any $\left ( D, C, j\right )\in \mathbf{Sub}\left ( \mathcal{D}\downarrow G \right )$ , denoting $\overline{T}\left ( D, C, j\right ) = \left ( \underline{\overline{\mathcal{T}}\left (D,C,j\right )}, T(D), \underline{\overline{\mathcal{T}}{j}}\right )$ , Diag (93) induced by the unit $\overline{\eta }$ is a pullback in $\mathcal{D}$ .
Proof By Corollary 8.5, $\mathbf{Sub}\left ( \mathcal{D}\downarrow G \right )$ is cocomplete $\boldsymbol{\omega } \mathbf{Cpo}$ -cartesian closed. Moreover, $\underline{\mathcal{L}}$ is locally full, strict $\boldsymbol{\omega } \mathbf{Cpo}$ -cartesian closed, and $\boldsymbol{\omega } \mathbf{Cpo}$ -colimit preserving by Theorem 8.6. Therefore, the fact that $\overline{\mathcal{T}}$ is a lifting of $\mathcal{T}$ through $\underline{\mathcal{L}}$ implies that it yields a $CBV$ pair morphism (92).
Condition (c.2) implies that the $CBV$ pair morphism (92) is purity reflecting. Assuming (c.1), Theorem 10.13 then implies that $\left ( \mathbf{Sub}\left ( \mathcal{D}\downarrow G \right ), \overline{\mathcal{T}} \right )$ is indeed an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair and that $\underline{\mathcal{L}}$ yields the $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair morphism (92).
In the particular case of interest, we conclude:
Proposition 10.15. For each $(n,k)\in \mathbb{N}\times \left ( \mathbb{N} \cup \left \{ \infty \right \}\right )$ , $\left (\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right ), \mathcal{P}_{n, k}\left (-\right )_\bot \right )$ is an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair. Moreover, $\underline{\mathcal{L}} _{n,k} \;:\; \mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n,k} \right )\,{\rightarrow }\, \boldsymbol{\omega } \mathbf{Cpo}\times \boldsymbol{\omega } \mathbf{Cpo}$ yields an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair morphism
Proof In fact, we already know that $\underline{\mathcal{L}} _{n,k}$ comes from a pair that satisfies Assumption 8.3. Moreover, $\left (\boldsymbol{\omega } \mathbf{Cpo}\times \boldsymbol{\omega } \mathbf{Cpo}, \left ( -\right )_\bot \right )$ is an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair and $\mathcal{P}_{n, k}\left (-\right )_\bot$ is a lifting of $\left ( -\right )_\bot$ along $\underline{\mathcal{L}} _{n,k}$ satisfying the conditions of Theorem 10.14.
By Proposition 10.15 and Lemma 10.10, we get:
Corollary 10.16. $\underline{\mathcal{L}} _{n,k}$ yields an $rCBV$ model morphism
10.9 Logical relations as an $rCBV$ model morphism
Let $(n, k)\in \mathbb{N}\times \left ( \mathbb{N} \cup \left \{ \infty \right \}\right )$ , and assume that $\mathcal{D}$ is sound for primitives (see Definition 7.7). By the universal property of the $rCBV$ model $\left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}, \underline{\nu } _{\mathbf{Syn}} \right )$ and the chain rule for derivatives, there is only one $rCBV$ model morphism
that is consistent with the assignment given by (47), (52), (54), and (53).
Lemma 10.17. For any $(n, k)\in \mathbb{N}\times \left ( \mathbb{N} \cup \left \{ \infty \right \}\right )$ , Diag. (96) commutes.
Proof Both $\left ([\![-]\!] \times [\![-]\!] _{k} \right )\circ \left ( \mathrm{id}\times \mathbb{ID}\right )$ and $\mathcal{U}_{r\mathcal{BV}} \left ({\underline{\mathcal{L}}}_{n,k}\right )\circ \overline{[\![\hspace{-2.5pt}[-]\!] \hspace{-2.5pt}]}_{n,k}$ yield $rCBV$ model morphisms that are consistent with the assignment given by the object $\left ( \mathbb{R}, \mathbb{R}\times \mathbb{R} ^k \right )$ and the morphisms (48), (49), and (50). Therefore, by the universal property of $\left (\mathbf{Syn}_V^{\mathsf{R}},\mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}{,}\underline{\nu } _{\mathbf{Syn}}\right )$ , we conclude that Diag. (96) indeed commutes.
10.10 AD correctness theorem for non-recursive data types
The correctness theorem for non-recursive data types (i.e., types formed from $\mathbf{real}$ , products, and coproducts) follows from Lemma 10.17 and Corollary 9.7. That is to say, we have:
Theorem 10.18. Let $\displaystyle t\;:\; \coprod _{r\in \mathfrak{L} } \mathbf{real} ^{s _r}\,{\rightarrow }\, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}\left ( \coprod _{j\in L} \mathbf{real} ^{l _j} \right )$ be a morphism in $\mathbf{Syn}_V^{\mathsf{R}}$ . We have that $\displaystyle [\![ t ]\!] \;:\; \coprod _{r\in \mathfrak{L} } \mathbb{R} ^{s _r}\,{\rightarrow }\, \left ( \coprod _{j\in L} \mathbb{R} ^{l _j} \right )_\bot$ is differentiable and, for any $k\in \left ( \mathbb{N}\cup \left \{ \infty \right \}\right )$ , $ [\![\mathbb{ID}\left ( t \right ) ]\!] _{k} = \mathfrak{d}^{k}\left ( [\![ t ]\!] \right )$ .
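To make the shape of this statement concrete, the following is a minimal Python sketch. It is not the transformation $\mathcal{D}$ of the paper nor its $k$-semantics: it is a hand-written forward-mode evaluator with dual numbers, applied to a hypothetical program on a coproduct of Euclidean spaces. The names `Dual`, `lift`, `program`, and `directional_derivative` are assumptions made for illustration; `None` stands for the bottom element of the lifting, and the tangent is one-dimensional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A real number paired with a one-dimensional tangent."""
    val: float
    tan: float = 0.0

    def __add__(self, other):
        other = lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = lift(other)
        # Product rule on the tangent component.
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

    __rmul__ = __mul__


def lift(x):
    """Treat a plain number as a dual number with zero tangent."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def program(tag, args):
    """Hypothetical program on R^2 + R, returning (tag, tuple) or None (bottom)."""
    if tag == "inl":
        x, y = args
        return ("pair", (x * y, x + y))
    if tag == "inr":
        (z,) = args
        if z.val <= 0.0:
            return None  # models divergence; the domain {z > 0} is open
        return ("single", (z * z * z,))
    return None


def directional_derivative(tag, point, direction):
    """Run `program` on dual numbers and read off the output tangents."""
    duals = tuple(Dual(p, d) for p, d in zip(point, direction))
    out = program(tag, duals)
    if out is None:
        return None
    out_tag, components = out
    return out_tag, tuple(c.tan for c in components)
```

On the left summand at $(2, 3)$ in direction $(1, 0)$, the sketch returns the tangents $(3, 1)$ of $(xy, x+y)$; on the right summand it returns $3z^2$ where the program is defined and `None` where it diverges.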
10.11 AD on recursive data types
The logical relations argument we presented provides us with an easy way to compute the logical relations of general recursive types: namely, since $\left (\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n, k} \right ), \mathcal{P}_{n, k}\left (-\right )_\bot \right )$ is an $rCBV$ $\boldsymbol{\omega } \mathbf{Cpo}$ -pair, the recursive types will be computed out of suitable colimits. This gives us useful information about the semantics of $\mathcal{D}\;{(t)}$ for a program $x:\tau \vdash{t}:\sigma$ , where $\tau$ and $\sigma$ are recursive types. In particular, we can extend the correctness result of Theorem 10.18 to any recursive data type. By that, we mean any type $\tau$ built from the grammar
that is, any type not involving function types.
We can define these (recursive) data types more formally as follows. We denote by $\mathbf{Syn}_C^{\mathsf{R}}$ the Kleisli $\mathbf{Syn}_V^{\mathsf{R}}$ -category associated with $\left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}} \right )$ . Moreover, we, respectively, denote by (97) and (98) the coproduct, product, and $n$ -diagonal functors.
Definition 10.19. Let ${R},{I},{O} \;:\; \left (\mathbf{Syn}_V^{\mathsf{R}}\right ) ^{\mathrm{op}} \times \mathbf{Syn}_V^{\mathsf{R}}\,{\rightarrow }\,\mathbf{Syn}_V^{\mathsf{R}}$ be the constant functors which are, respectively, equal to $\mathbf{real}$ , $\mathsf{1}$ and $\mathsf{0}$ . We define the set $\mathfrak{P}^{\mathfrak{d}}\left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}, \underline{\nu } _{\mathbf{Syn}} \right )$ inductively by (D1), (D2), and (D3).
-
(D1) The functors ${R},{I},{O}$ are in $\mathfrak{P}^{\mathfrak{d}}\left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}, \underline{\nu } _{\mathbf{Syn}} \right )$ . Moreover, the projection $\pi _{2}:\left (\mathbf{Syn}_V^{\mathsf{R}}\right ) ^{\mathrm{op}} \times \mathbf{Syn}_V^{\mathsf{R}}\,{\rightarrow }\, \mathbf{Syn}_V^{\mathsf{R}}$ belongs to $\mathfrak{P}^{\mathfrak{d}}\left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}, \underline{\nu } _{\mathbf{Syn}} \right )$ .
-
(D2) For each $n\in \mathbb{N} ^\ast$ , if the functors (99) belong to $\mathfrak{P}^{\mathfrak{d}}\left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}, \underline{\nu } _{\mathbf{Syn}} \right )$ , then the functors (100) and (101) are in $\mathfrak{P}^{\mathfrak{d}}\left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}, \underline{\nu } _{\mathbf{Syn}} \right )$ .
-
(D3) If ${E} = \left ({E}_{\mathbf{Syn}_V^{\mathsf{R}}},{E}_{\mathbf{Syn}_C^{\mathsf{R}}} \right )\in \mathsf{Param}\left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}\right )$ is such that ${E}_{\mathbf{Syn}_V^{\mathsf{R}}}\in \mathfrak{P}^{\mathfrak{d}}\left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}, \underline{\nu } _{\mathbf{Syn}} \right )$ , then $ \left ({\nu _{\mathbf{Syn}}{E}}_{\mathbf{Syn}_V^{\mathsf{R}} } \right )$ is in $\mathfrak{P}^{\mathfrak{d}}\left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}, \underline{\nu } _{\mathbf{Syn}} \right )$ .
We define the set $\mathsf{Param}^{\mathfrak{d}}\left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}, \underline{\nu } _{\mathbf{Syn}} \right )$ of parametric data types by (102).
All such (recursive) data types are, up to isomorphism, of a particularly simple form: a sum of products.
Proposition 10.20. Let $E$ be an $n$ -variable $\left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}, \underline{\nu } _{\mathbf{Syn}} \right )$ -parametric data type, where $n\in \mathbb{N} ^\ast$ . There is a countable family of natural numbers $\displaystyle \left ( \mathsf{m}_{\left ( j, \mathtt{T}\right ) } \right ) _{\left ( j, \mathtt{T}\right )\in \left ( \mathbb{I}_{n}\cup \left \{0\right \} \right )\times{\mathsf{Tree}}}$ such that, for any $rCBV$ model morphism $H \;:\; \left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}, \underline{\nu } _{\mathbf{Syn}} \right )\,{\rightarrow }\, \mathcal{U}_{r\mathcal{BV}} \left ( \mathcal{V}, \mathcal{T}\right )$ and any $H$ -compatible pair $\left ( E,F\right )$ , we have that (104) holds, where the isomorphism $\,{\cong}\,$ is induced by coprojections and projections Footnote 19 .
As a consequence, if ${\tau}\in \mathbf{Syn}_V^{\mathsf{R}}$ corresponds to a data type $\tau$ , then there is a countable family $\left ( l_j \right ) _{j\in L}\in \mathbb{N} ^L$ of natural numbers such that (103) holds for any $rCBV$ model morphism $H \;:\; \left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}, \underline{\nu } _{\mathbf{Syn}} \right )\,{\rightarrow }\, \mathcal{U}_{r\mathcal{BV}} \left ( \mathcal{V}, \mathcal{T}\right )$ .
Proof The result follows by induction. The nontrivial part is a consequence of the following.
Let $\left ({\tilde{E}},{\tilde{F}}\right ) \in \mathsf{Param}^{\mathfrak{d}}\left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}\right ) \times \mathsf{Param}{ \left (\mathcal{U}_{r\mathcal{BV}} \left ( \mathcal{V}, \mathcal{T}\, \right ) \right ) }$ be an $H$ -compatible pair of $\left ( n+1\right )$ -variable parametric types where ${\tilde{F}}_{ \mathcal{V} }$ is given by (105) for some countable family $\displaystyle \left ( \mathsf{s}_{\left ( i,r\right ) } \right ) _{{\left (i,r\right )\in \left ( \mathbb{I}_{n+1}\cup \left \{0\right \} \right )\times{\mathfrak{L} }} }$ of natural numbers. We prove below that $\left ({\nu _{\mathbf{Syn}}{\tilde{E}}},{F}\right )$ is $H$ -compatible for some $F$ such that ${F}_{\mathcal{V}}$ satisfies Equation (104). By the definition of an $rCBV$ model morphism, we have that $\left ({\nu _{\mathbf{Syn}}{\tilde{E}} },{\nu _{\omega }{\tilde{F}} }\right )$ is $H$ -compatible. Hence, we only need to prove that ${\nu _{\omega }{\tilde{F}} }_{\mathcal{V}}$ is given by (104).
-
(I) We inductively define the set $\mathsf{Tree}$ by the following. Let $r\in \mathfrak{L}$ : (a) if $\mathsf{s}_{\left ( n+1,r\right ) } = 0$ , then $r\in \mathsf{Tree}$ ; (b) if $\mathsf{s}_{\left ( n+1,r\right ) } \neq 0$ , then, for any $\mathtt{T}\in \mathsf{Tree} ^{\mathsf{s}_{\left ( n+1,r\right ) }}$ , the pair $\left ( \mathtt{T}, r\right )$ is in $\mathsf{Tree}$ .
-
(II) We inductively define the family $\left ( \mathsf{m}_{\left ( j, \mathtt{T} \right ) }\right ) _{\left ( j, \mathtt{T}\right ) \in \left ( \mathbb{I}_{n}\cup \left \{ 0\right \}\right ) \times \mathsf{Tree} }$ of indices by the following. Let $r\in \mathfrak{L}$ : (a) if $\mathsf{s}_{\left ( n+1,r\right ) } = 0$ , we define $\mathsf{m}_{\left ( j, r \right ) } := \mathsf{s}_{\left ( j, r\right ) }$ for each $j$ ; (b) if $\mathsf{s}_{\left ( n+1,r\right ) } \neq 0$ , given $\mathtt{T} = \left ( \mathtt{T}_i\right ) _{i\in \mathbb{I}_{\mathsf{s}_{\left ( n+1,r\right ) }} }\in \mathsf{Tree} ^{\mathsf{s}_{\left ( n+1,r\right ) }}$ , we define $\mathsf{m}_{\left ( j, \left ( \mathtt{T}, r \right ) \right ) }$ by (106) for each $j$ .
(105) \begin{equation} {\tilde{F}}_{\mathcal{V} }\left ( W_i, Y_i\right ) _{i\in \mathbb{I}_{n+1}} =\coprod _{r\in{\mathfrak{L} } }\left ({H\left ( \mathbf{real}\right ) }^{\mathsf{s}_{\left ( 0,r\right ) }}\times \prod _{i=1}^{n+1} Y_i ^{\mathsf{s}_{\left ( i,r\right ) } }\right ) \end{equation}
(106) \begin{equation} \mathsf{m}_{\left ( j, \left ( \mathtt{T}, r \right ) \right ) } = \mathsf{s}_{\left ( j,r\right ) } + \sum _{i=1}^{\mathsf{s}_{\left ( n+1,r\right ) }}\mathsf{m}_{\left ( j, \mathtt{T}_i \right ) } \end{equation}
Let $X=\left ( W_i, Y_i\right ) _{i\in \mathbb{I}_{n}}\in \left (\mathcal{V} ^{\mathrm{op}} \times \mathcal{V} \right ) ^n$ , $\mathfrak{F}_X :={\tilde{F}}^{X}_{\mathcal{V}}\left ( \mathsf{0}, - \right )$ and $\iota _{}$ the obvious unique morphism. The colimit of (107) is isomorphic to (108). Hence, by the definition of the fddt operator $\nu _{\omega }$ of $ \mathcal{U}_{r\mathcal{BV}} \left ( \mathcal{V}, \mathcal{T}\right ) = \left ( \mathcal{V}, \mathcal{T}, \underline{\nu }_{\omega } \right )$ , ${\nu _{\omega } \tilde{F}}_{\mathcal{V}}$ is given by the formula given in (104). This completes the proof.
Finally, if ${\tau} \in \mathbf{Syn}_V^{\mathsf{R}}$ corresponds to a data type $\tau$ , then the constant parametric type ${\underline{{\tau} }}$ equal to ${\tau}$ is an $\left ( \mathbf{Syn}_V^{\mathsf{R}}, \mathbf{Syn}_{\mathcal{S}}^{\mathsf{R}}\right )$ -parametric data type of degree $1$ . Hence, denoting by $\underline{H{\tau} }$ the constant parametric type equal to $H\left ({\tau} \right )$ , since $\left ({\underline{{\tau}} }, {\underline{H{\tau}} } \right )$ is $H$ -compatible, we conclude that (104) holds for some $\left ( l_j\right ) _{j\in L}$ where $L$ is countable.
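The inductive definitions (I) and (II) in the proof above can be explored with a small Python sketch. The label set, the exponents `s`, and the depth bound are assumptions chosen for illustration: the example encodes a body of the shape $\mathsf{1}\sqcup \mathbf{real}\times Y_1\times Y_2$ with $Y_2$ as the recursion variable (so $n=1$), which describes lists. Trees whose label has exponent zero in the recursion variable are represented by the label itself, and all other trees by a pair of a tuple of subtrees and a label.

```python
from itertools import product


def trees(s, labels, rec, depth):
    """Enumerate the elements of Tree that use at most `depth` nested labels.

    `s[(i, r)]` is the exponent of variable i in summand r, and `rec` is the
    index n+1 of the recursion variable.
    """
    if depth == 0:
        return []
    smaller = trees(s, labels, rec, depth - 1)
    out = []
    for r in labels:
        arity = s[(rec, r)]
        if arity == 0:
            out.append(r)  # case (a): the label itself is a tree
        else:
            for children in product(smaller, repeat=arity):
                out.append((children, r))  # case (b): subtrees plus a label
    return out


def m(s, j, tree):
    """Exponent of variable j (j = 0 counts copies of real) for a tree."""
    if not isinstance(tree, tuple):
        return s[(j, tree)]
    children, r = tree
    return s[(j, r)] + sum(m(s, j, child) for child in children)


# Hypothetical list-like body: summand "nil" is 1, summand "cons" is
# real x Y_1 x Y_2, with Y_2 (index 2) the recursion variable.
LABELS = ["nil", "cons"]
S = {
    (0, "nil"): 0, (1, "nil"): 0, (2, "nil"): 0,
    (0, "cons"): 1, (1, "cons"): 1, (2, "cons"): 1,
}
```

In this example each tree is a chain of `cons` labels ending in `nil`, and `m(S, 0, tree)` is the length of the chain, which matches the familiar description of lists of reals as a coproduct of the spaces $\mathbb{R}^\ell$ over all lengths $\ell$.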
In particular, for any nonparametric (meaning: $0$ -variable parametric) recursive data type $R$ , we have the following:
This lets us strengthen our correctness theorem to apply also to programs between recursive data types:
Proposition 10.21. Let $\displaystyle t:{\tau}\,{\rightarrow }\,\sigma$ be a morphism in $\mathbf{Syn}_V^{\mathsf{R}}$ . If ${\tau}$ and $\sigma$ are data types, then $\displaystyle [\![ t ]\!] \;:\; \coprod _{r\in \mathfrak{L} } \mathbb{R} ^{s _r}\,{\rightarrow }\, \left ( \coprod _{j\in L} \mathbb{R} ^{l _j}\right )_\bot$ is differentiable and, for any $k\in \left ( \mathbb{N}\cup \left \{ \infty \right \}\right )$ , $ [\![\mathbb{ID}\left ( t \right ) ]\!] _{k} = \mathfrak{d}^{k}\left ( [\![ t ]\!] \right )$ .
Proof First of all, indeed, by Proposition 10.20, we have that there are countable families $\left ( s _r\right ) _{r\in \mathfrak{L} }$ and $\left ( l_j\right ) _{j\in L}$ such that
is a morphism in $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{{s_i}, k} \right )$ , for each $i\in \mathfrak{L}$ and any $k\in \mathbb{N}\cup \left \{ \infty \right \}$ .
By the commutativity of (96) for any $(s_i, k)\in \mathbb{N}\times \left ( \mathbb{N} \cup \left \{ \infty \right \}\right )$ , we get that the pair $\left ( [\![t]\!], [\![\mathbb{ID}{\left ( t\right ) } ]\!] _{k} \right )$ defines the morphism (111) for each $i\in \mathfrak{L}$ . By Corollary 9.7, this implies that $[\![t]\!]$ is differentiable and $ [\![\mathbb{ID}\left ( t \right ) ]\!] _{k} = \mathfrak{d}^{k}\left ( [\![ t ]\!] \right )$ .
Finally, as a consequence, we get:
Theorem 10.22. Assume that $\mathbf{vect}$ implements the vector space $\mathbb{R} ^k$ , for some $k\in \mathbb{N}\cup \left \{ \infty \right \}$ . For any program $x:\tau \vdash{t}:\sigma$ where $\tau,\sigma$ are data types (including recursive data types), we have that $[\![t]\!]$ is differentiable and, moreover,
provided that $\mathcal{D}$ is sound for primitives.
By the considerations of Sections 9.6 and 9.7, Theorem 10.18 implies that $\mathcal{D}$ as defined in Section 10.4 correctly provides us with forward and reverse AD transformations for data types.
10.12 AD on arrays
Arrays are semantically the same as lists: in our language, if $\tau$ is a data type, an array of $\tau$ is given by $\boldsymbol{\mu }\alpha .\mathbf{1}\,{\mathop{\sqcup }}\, \tau \,{\mathop{\times }}\,\alpha$ . It should be noted that, if $x:\boldsymbol{\mu }\alpha .\mathbf{1}\,{\mathop{\sqcup }}\, \tau \,{\mathop{\times }}\,\alpha \vdash{t}:\boldsymbol{\mu }\alpha .\mathbf{1}\,{\mathop{\sqcup }}\, \sigma \,{\mathop{\times }}\,\alpha$ , we have that
By Theorem 10.22, if $\tau$ and $\sigma$ are data types, we get that $\mathfrak{d}^{k}\left ( [\![{t} ]\!] \right )$ (as defined in (27)) is equal to $[\![\mathcal{D}\;{(t)}]\!] _{k}$ . Therefore, Theorem 10.22 already encompasses the correctness for arrays (of data types).
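As a minimal illustration of arrays viewed as the recursive type $\boldsymbol{\mu }\alpha .\mathbf{1}\sqcup \tau \times \alpha$, the Python sketch below encodes lists as nested cons cells and propagates tangents through a hypothetical list-to-list program by hand. The functions `to_cells`, `from_cells`, and `running_product` are assumptions made for this example and are not part of the paper's language; elements are (value, tangent) pairs with a one-dimensional tangent, and the empty list is encoded as `None`.

```python
def to_cells(xs):
    """Encode a Python list as nested cons cells: None or (head, tail)."""
    cells = None
    for x in reversed(xs):
        cells = (x, cells)
    return cells


def from_cells(cells):
    """Decode nested cons cells back into a Python list."""
    out = []
    while cells is not None:
        head, cells = cells
        out.append(head)
    return out


def running_product(cells, acc=(1.0, 0.0)):
    """Map [x1, x2, ...] to [x1, x1*x2, ...] on (value, tangent) pairs."""
    if cells is None:
        return None  # the empty list
    (v, dv), tail = cells
    a, da = acc
    new = (a * v, da * v + a * dv)  # product rule on the tangent
    return (new, running_product(tail, new))
```

A list of length $3$ is a point of the summand $\mathbb{R}^3$; seeding the first entry with tangent $1$ yields the partial derivatives of each output entry with respect to that entry.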
11. Almost Everywhere Correct AD
Here, we show how some of the arguments of Huot et al. (Reference Huot, Lew, Mansinghka and Staton2023) about almost everywhere differentiability can be accommodated in our framework, by making use of a minor variation of our chosen logical relations over $\omega$ -cpos. The resulting arguments use plain logical relations over $\omega$ -cpos and do not rely on sheaf-structure. They are also a bit more general, as they apply to languages with coproduct and recursive types.
The central notion is Lee et al. (Reference Lee, Yu, Rival and Yang2020)’s concept of functions that are PAP. We recall some of the required notions to talk about PAP functions first.
Definition 11.1 (Analytic function). A function $f:U\,{\rightarrow }\, V$ , for $U\subseteq \mathbb{R}^n$ and $V\subseteq \mathbb{R}^m$ , is analytic if, for all $x\in U$ , its Taylor series converges pointwise to $f$ on an open neighborhood of $x$ .
Definition 11.2 (( $c$ )-Analytic set). A subset $A\subseteq \mathbb{R}^n$ is called analytic if there exist analytic functions $g_1,\ldots, g_m:U\,{\rightarrow }\, \mathbb{R}$ defined on an open neighborhood $U$ of $A$ , such that
A subset $A\subseteq \mathbb{R}^n$ is called $c$ -analytic if it is the countable union of analytic subsets.
As noted by Huot et al. (Reference Huot, Lew, Mansinghka and Staton2023), we can equivalently define a $c$ -analytic set as a countable disjoint union of analytic subsets.
Definition 11.3 (PAP function). A function $f:U\,{\rightarrow }\, V$ , for $U\subseteq \mathbb{R}^n$ and $V\subseteq \mathbb{R}^m$ , is called piecewise analytic under analytic partition (PAP) if it has a PAP representation in the sense of a countable family $\{(A_i, U_i, f_i)\}_{i\in I}$ such that:
-
• the sets $A_i$ are analytic and form a partition of $U$ ;
-
• each $f_i \;:\; U_i\,{\rightarrow }\, V$ is an analytic function defined on an open neighborhood $U_i$ of $A_i$ ;
-
• $f_i|_{A_i}=f|_{A_i}$ in the sense that $f_i(x) = f(x)$ for all $x \in A_i$ .
A crucial observation by Lee et al. (Reference Lee, Yu, Rival and Yang2020) is that PAP functions are closed under composition. As noted by Huot et al. (Reference Huot, Lew, Mansinghka and Staton2023), a subset $A\subseteq \mathbb{R}^n$ is $c$ -analytic if and only if the inclusion $A\hookrightarrow \mathbb{R}^n$ is a PAP function.
We consider the following notion of partial PAP function.
Definition 11.4 (Partial PAP function). We call a partial function $f:U\rightharpoonup V$ a partial PAP function if its domain of definition is $c$ -analytic and it restricts to a (total) PAP function on its domain.
As noted by Huot et al. (Reference Huot, Lew, Mansinghka and Staton2023), such partial PAP functions are closed under composition.
Definition 11.5 (Intensional derivative). Each particular PAP representation $\{(A_i, U_i, f_i)\}_{i\in I}$ of a PAP function $f$ gives rise to a unique intensional derivative $\{(A_i, U_i, Df_i)\}_{i\in I}$ , where we write $D f_i$ for the (standard) derivative of $f_i$ , such that $Df_i=Df$ on $A_i$ .
A given PAP function may therefore have several distinct intensional derivatives, arising from the different PAP representations. However, Lee et al. (Reference Lee, Yu, Rival and Yang2020) show that such PAP functions $f$ are differentiable almost everywhere and that each intensional derivative coincides almost everywhere with the (standard) derivative of $f$ .
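The dependence on the chosen representation can be seen in a small Python sketch for the rectified linear unit $x\mapsto \max (0,x)$ on $\mathbb{R}$. Everything in it is an assumption made for illustration: membership in each piece $A_i$ is modeled by a Python predicate (we assume that half-lines are admissible pieces), each $f_i$ and $Df_i$ is written by hand, and the names `REP_LEFT`, `REP_RIGHT`, `evaluate`, and `intensional_derivative` are hypothetical.

```python
# Each piece is (membership test for A_i, extension f_i, derivative Df_i).
REP_LEFT = [
    (lambda x: x <= 0.0, lambda x: 0.0, lambda x: 0.0),
    (lambda x: x > 0.0, lambda x: x, lambda x: 1.0),
]
REP_RIGHT = [
    (lambda x: x < 0.0, lambda x: 0.0, lambda x: 0.0),
    (lambda x: x >= 0.0, lambda x: x, lambda x: 1.0),
]


def evaluate(rep, x):
    """Value of the function described by the representation at x."""
    for member, f, _ in rep:
        if member(x):
            return f(x)
    raise ValueError("the pieces do not cover x")


def intensional_derivative(rep, x):
    """Derivative of the piece whose set A_i contains x."""
    for member, _, df in rep:
        if member(x):
            return df(x)
    raise ValueError("the pieces do not cover x")
```

Both representations describe the same function, and their intensional derivatives agree away from $0$. At $0$, where the function is not differentiable, they return $0$ and $1$, respectively, so they differ only on a set of measure zero.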
Next, we redefine our logical relations for $\mathbf{real}$ and monadic types from Sections 9.3 and 9.2. First, we redefine
Second, we denote by $\mathfrak{O}_{n}$ the set of countable families $\{(A_i, U_i)\}_{i\in I}$ of pairs of analytic subsets $A_i\subseteq \mathbb{R}^n$ and open neighborhoods $U_i$ of $A_i$ in $\mathbb{R}^n$ , such that all $A_i$ are pairwise disjoint and $\bigsqcup _{i\in I}A_i\neq \emptyset, \mathbb{R}^n$ . Then, for each $\{(A_i, U_i)\}_{i\in I}\in \mathfrak{O}_{n}$ , we redefine
We redefine the $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n,k} \right )$ -monad $\mathcal{P}_{n, k}\left (-\right )_\bot$ on $\mathbf{Sub}\left ( \boldsymbol{\omega } \mathbf{Cpo}\downarrow G_{n,k} \right )$ by
where $\underline{\mathcal{P}_{n, k}\left (D, \left ( C, C^{\prime}\right ), j\right )_\bot }\subseteq G_{n,k}({C}_{\bot },{C^{\prime}}_{\bot })$ is the union
where we identify $[(\gamma _{i}, \gamma '_{\!\!i})\mid i\in I]\sim [(\overline \gamma _{j}, \overline \gamma '_{\!\!j})\mid j\in J]$ if their domains of definition coincide ( $\bigsqcup _{i\in I}A_i=\bigsqcup _{j\in J}\overline A_j$ ) and they define the same function on this domain. To be more formal, we define the identification $[(\gamma _{i}, \gamma '_{\!\!i})\mid i\in I]\sim [(\overline \gamma _{j}, \overline \gamma '_{\!\!j})\mid j\in J]$ if $\bigsqcup _{i\in I}A_i=\bigsqcup _{j\in J}\overline A_j$ and $[\gamma _{i} \circ \iota _i\mid i\in I]=[\overline \gamma _{j} \circ \overline \iota _j\mid j\in J]$ and $[\gamma '_{\!\!i} \circ {\phi} _{n,k}\circ (\iota _i\times \textrm{id}_{(\mathbb{R}^k)^n})\circ {\phi} _{n,k}^{-1}\mid i\in I]=[\overline \gamma '_{\!\!j} \circ {\phi} _{n,k}\circ (\overline \iota _j\times \textrm{id}_{(\mathbb{R}^k)^n})\circ {\phi} _{n,k}^{-1}\mid j\in J]$ , where we use the inclusions $\iota _i:A_i\hookrightarrow U_i$ and $\overline \iota _j:\overline A_j\hookrightarrow \overline U_j$ . The structure of the monad is defined entirely analogously to that in Section 9.2. Closure under suprema of $\omega$ -chains follows from (Huot et al. Reference Huot, Lew, Mansinghka and Staton2023, Corollary B.9). It is easy to see that the conditions of Theorem 10.14 are satisfied as before.
The rest of the development remains essentially unchanged, except for the minor modification that we work with (1) PAP functions rather than differentiable functions and (2) countable families of analytic subsets with open neighborhoods rather than open subsets.
If we spell out the resulting definitions for the logical relations (focusing on the $k$ -semantics for $k=1$ ), the result is as follows:
We see that $P^{n}_{\mathbf{real}^m}$ precisely captures Huot et al. (Reference Huot, Lew, Mansinghka and Staton2023)’s notion of partial PAP functions and their intensional derivatives, if we note that we can use (analytic) $\delta$ to define for any point $y\in \mathbb{R}^n$ an arbitrarily small neighborhood: $x\mapsto \frac{x*\epsilon }{\sqrt{1+|\!|x|\!|^2}}+y$ is an analytic isomorphism between $\mathbb{R}^n$ and the open $\epsilon$ -ball centered at $y$ . We can show (by induction) that $P^n_{\tau }$ is closed under suprema of $\omega$ -chains using (Huot et al. Reference Huot, Lew, Mansinghka and Staton2023, Corollary B.9).
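The map $x\mapsto \frac{x*\epsilon }{\sqrt{1+|\!|x|\!|^2}}+y$ and its inverse can be checked numerically with the following Python sketch. The function names `to_ball` and `from_ball` are hypothetical, and the inverse formula $z\mapsto u/\sqrt{1-|\!|u|\!|^2}$ with $u=(z-y)/\epsilon$ is our own elementary computation rather than something stated in the text.

```python
import math


def to_ball(x, y, eps):
    """Send a point x of R^n into the open eps-ball centered at y."""
    norm = math.sqrt(1.0 + sum(c * c for c in x))
    return [eps * c / norm + yc for c, yc in zip(x, y)]


def from_ball(z, y, eps):
    """Inverse map, defined for points z with |z - y| < eps."""
    u = [(zc - yc) / eps for zc, yc in zip(z, y)]
    scale = math.sqrt(1.0 - sum(c * c for c in u))
    return [c / scale for c in u]
```

Since $|\!|x|\!|/\sqrt{1+|\!|x|\!|^2}<1$ for every $x$, the image always lies strictly inside the ball, and composing the two functions recovers the input up to rounding error.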
With these new definitions, our entire development goes through again. As long as we ensure that all our primitive operations denote partial PAP functions, we obtain versions of Theorem III.2 and Corollary III.3 of Huot et al. (Reference Huot, Lew, Mansinghka and Staton2023) for a language that additionally includes recursive types, by using a plain logical relations argument over $\omega$ -cpos:
Theorem 11.6 (Almost everywhere differentiability). Assume that $\mathbf{vect}$ implements the vector space $\mathbb{R} ^k$ , for some $k\in \mathbb{N}\cup \left \{ \infty \right \}$ . For any program $x:\tau \vdash{t}:\sigma$ where $\tau,\sigma$ are data types (including recursive data types), we have that $[\![t]\!]$ is differentiable almost everywhere on its domain and, moreover,
almost everywhere, provided that $\mathcal{D}$ is sound for primitives.
Consequently, we obtain the correct derivative almost everywhere for any program $t$ that terminates almost everywhere. Importantly, this result remains true if we change the semantics of $\mathbf{sign}\,{t}$ to be defined even for ${t}=0$ , as is done in Huot et al. (Reference Huot, Lew, Mansinghka and Staton2023) and Mazza and Pagani (Reference Mazza and Pagani2021):
Indeed, this semantics is still logical relation respecting, thanks to our choice of lifting of the partiality monad to logical relations.
12. Related Work
This is an improved version of the unpublished preprint Vákár (Reference Vákár2020). In particular, we have simplified the correctness argument to no longer depend on diffeological or sheaf-structure and to have it apply to arbitrary differentiable (rather than merely smooth) operations. We have further simplified the subsconing technique for recursive types.
There has recently been a flurry of work studying AD from a PL point of view, a lot of it focusing on functional formulations of AD and their correctness. Examples of such papers are Pearlmutter and Siskind (Reference Pearlmutter and Siskind2008), Elliott (Reference Elliott2018), Shaikhha et al. (Reference Shaikhha, Fitzgibbon, Vytiniotis and Peyton Jones2019), Brunel et al. (Reference Brunel, Mazza and Pagani2020), Abadi and Plotkin (Reference Abadi and Plotkin2020), Barthe et al. (2020), Lee et al. (Reference Lee, Yu, Rival and Yang2020), Huot et al. (Reference Huot, Staton and Vákár2020), Mazza and Pagani (Reference Mazza and Pagani2021), Vákár (Reference Vákár2021), Lucatelli Nunes (Reference Lucatelli Nunes2022), Huot et al. (Reference Huot, Staton and Vákár2021), Vákár and Smeding (Reference Vákár and Smeding2022), Krawiec et al. (Reference Krawiec, Jones, Krishnaswami, Ellis, Eisenberg and Fitzgibbon2022), Smeding and Vákár (Reference Smeding and Vákár2023), and Huot et al. (Reference Huot, Lew, Mansinghka and Staton2023). Of these papers, Pearlmutter and Siskind (Reference Pearlmutter and Siskind2008), Abadi and Plotkin (Reference Abadi and Plotkin2020), Lee et al. (Reference Lee, Yu, Rival and Yang2020), Mazza and Pagani (Reference Mazza and Pagani2021), Smeding and Vákár (Reference Smeding and Vákár2023), and Huot et al. (Reference Huot, Lew, Mansinghka and Staton2023) are particularly relevant as they also consider AD of languages with partial features.
Here, Pearlmutter and Siskind (Reference Pearlmutter and Siskind2008) consider an implementation that differentiates recursive programs and the implementation of Smeding and Vákár (Reference Vákár and Smeding2022) even differentiates code that uses recursive types. They do not give correctness proofs, however.
Abadi and Plotkin (Reference Abadi and Plotkin2020) pioneer a notion of correctness that we use for most of this paper, where points of non-differentiability are essentially ignored by making a function undefined at such points. They use it to give a denotational correctness proof of AD on a first-order functional language with (first-order) recursion. The first-orderness of the language allows the proof to proceed by plain induction rather than needing a logical relations technique.
Lee et al. (Reference Lee, Yu, Rival and Yang2020) introduce a more ambitious notion of correctness in the sense of almost everywhere correct AD. Mazza and Pagani (Reference Mazza and Pagani2021) prove the correctness of basically the same AD algorithms that we consider in this paper when restricted to PCF with a base type of real numbers and a real conditional. Importantly, they also take care to prove almost everywhere correct differentiation for a language that supports conditionals on real numbers and primitives that can have points of non-differentiability. Their proof relies on operational semantic techniques. Huot et al. (Reference Huot, Lew, Mansinghka and Staton2023) combine the ideas of Lee et al. (Reference Lee, Yu, Rival and Yang2020) with those of Vákár (Reference Vákár2020) to give a denotational proof of almost everywhere correct AD for PCF, by using sheaves of logical relations. Section 11 of the present paper shows how their arguments can be reproduced without any sheaf-theoretic machinery, essentially by choosing a different lifting of the partiality monad to logical relations.
Barthe et al. (2020) have previously used (open) logical relations over the syntax, rather than semantics, to prove correctness of AD on total languages. It would be interesting to see whether and how their techniques could be adapted to languages with partial features. We suspect that the choice between logical relations over the syntax or semantics is mostly a matter of taste but that the extra (co)completeness properties that the semantics has can help, particularly when proving things about recursion and recursive types.
There is an independent line of inquiry into differential $\lambda$ -calculus (Ehrhard and Regnier Reference Ehrhard and Regnier2003) and differential categories (Blute et al. Reference Blute, Cockett, Lemay and Seely2020; Cockett et al. Reference Cockett, Cruttwell, Gallagher, Lemay, MacAdam, Plotkin, Pronk, Fernández and Muscholl2020). A conceptual distinction with the work on AD is that differentiation tends to be a first-class construct (part of the language) in differential $\lambda$ -calculus, rather than a code transformation in a metalanguage. Further, there is a stronger emphasis on the axioms that derivatives need to satisfy and less of a focus on recipes for computing derivatives. In this setting, differential restriction categories (Cockett et al. Reference Cockett, Cruttwell and Gallagher2012) give a more abstract semantic study of the interaction between (forward) differentiation and partiality. We found that for our purposes, a concrete semantics in terms of $\omega$ -cpos sufficed, however.
Our contribution is to give an alternative denotational argument, which we believe is simple and systematic, and to extend it to languages that additionally have the complex features of recursively defined data structures found in realistic ML-family languages.
Such AD for languages with expressive features such as recursion and user-defined data types has been called for by the machine learning community (Jeong et al. Reference Jeong, Jeong, Kim, Yu and Chun2018; van Merrienboer et al. Reference van Merrienboer, Breuleux, Bergeron and Lamblin2018). The subtlety of the interaction of AD and real conditionals was first observed by Beck and Fischer (Reference Beck and Fischer1994).
Our work gives a relatively simple denotational semantics for recursive types, which can be considered as an important special case of bilimit compact categories (Levy Reference Levy2012). Bilimit compact categories are themselves, again, an important special case of the very general semantics of recursive types in terms of algebraically compact categories (Freyd Reference Freyd1991). We believe that working with this special case of the semantics significantly simplifies our presentation.
In particular, this simplified semantics of recursive types allows us to give a very simple but powerful (open, semantic) logical relations technique for recursive types. It is an alternative to the two existing techniques for logical relations for recursive types: relational properties of domains (Pitts Reference Pitts1996), which is quite general but, in our experience, very technical to use, and step-indexed logical relations (Ahmed Reference Ahmed and Sestoft2006), which are restricted to logical relations arguments about syntax and hence not applicable to our situation.
Finally, we hope that our work adds to the existing body of PL literature on AD and recursion (and recursive types). In particular, we believe that it provides a simple, principled denotational explanation of how AD and expressive partial language features should interact. We plan to use it to generalize and prove correct the more advanced AD technique CHAD (Elliott Reference Elliott2018; Vákár Reference Vákár2021; Vákár and Smeding Reference Vákár and Smeding2022; Lucatelli Nunes Reference Lucatelli Nunes2022; Kerjean and Pédrot Reference Kerjean and Pédrot2022) when applied to languages with partial features.
Acknowledgments
This project has received funding via NWO Veni grant number VI.Veni.202.124 as well as the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 895827.
This research was supported through the program “Oberwolfach Leibniz Fellows” by the Mathematisches Forschungsinstitut Oberwolfach in 2022. It was also partially supported by the CMUC, Centre for Mathematics of the University of Coimbra – UIDB/00324/2020, funded by the Portuguese Government through FCT/MCTES.
We are grateful to the anonymous reviewers for their helpful comments on this manuscript.
A. Fine grain call-by-value and AD
In Section 6, we have discussed a standard coarse-grain CBV language, also known as the $\lambda _C$ -calculus, computational $\lambda$ -calculus (Moggi Reference Moggi1989), or, plainly, CBV. In this appendix, we discuss an alternative presentation in terms of fine-grain CBV (Levy et al. Reference Levy, Power and Thielecke2003; Levy Reference Levy2012; see also footnote 20). While it is slightly more verbose, this presentation clarifies the precise universal property that is satisfied by the syntax of our language.
A.1 Fine grain call-by-value
We consider a standard fine-grain call-by-value language (with complex values) over a ground type $\mathbf{real}$ of real numbers, real constants $\underline{c}\in \mathrm{Op}_0$ for $c\in \mathbb{R}$ , and certain basic operations $\mathrm{op} \in \mathrm{Op}_n$ for each natural number $n\in \mathbb{N}$ .
The types $\tau,\sigma, \rho$ , (complex) values ${v},{w},{u}$ , and computations ${t},{s},{r}$ of our language are as follows.
We will use sugar
We could also define iteration as syntactic sugar: $\mathbf{iterate}\,{t}\,\mathbf{from}\,{x}={{v}}\stackrel{\mathrm{def}}{=} \left (\mu z.\lambda x.{t\,\mathbf{to}\,y.\,\mathbf{case}\,{y}\,\mathbf{of}\{\mathbf{inl}\, x' \to z\, x'\, \mid \mathbf{inr}\,{x''}\to{\mathbf{return}\,x''}\}}\right )\,{v}$ .
The typing rules are in Fig. A1.
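The encoding of iteration in terms of recursion given above can be mimicked in an ordinary programming language. The following is only an illustrative sketch in Python, not part of the formal development: the helper names `inl`, `inr`, `iterate`, and `collatz_step` are hypothetical, sum types are modelled as tagged pairs, and the recursive function `z` plays the role of $\mu z.\lambda x.\ldots$.

```python
from typing import Any, Callable, Tuple

Tagged = Tuple[str, Any]  # a value of a sum type: ("inl", x) or ("inr", r)


def inl(x: Any) -> Tagged:
    """Left injection: keep iterating with new state x."""
    return ("inl", x)


def inr(r: Any) -> Tagged:
    """Right injection: stop and return r."""
    return ("inr", r)


def iterate(body: Callable[[Any], Tagged], v: Any) -> Any:
    """Sketch of `iterate t from x = v`, defined via a recursive function z."""

    def z(x: Any) -> Any:
        tag, payload = body(x)      # t to y.
        if tag == "inl":            # case y of inl x' -> z x'
            return z(payload)
        return payload              #           inr x'' -> return x''

    return z(v)


def collatz_step(state: Tuple[int, int]) -> Tagged:
    """Hypothetical loop body: count Collatz steps until reaching 1."""
    n, steps = state
    if n == 1:
        return inr(steps)
    return inl((n // 2 if n % 2 == 0 else 3 * n + 1, steps + 1))


result = iterate(collatz_step, (6, 0))
```

Because Python does not eliminate tail calls, this sketch is limited by the interpreter's recursion depth; it is meant only to show how the loop is expressed through a single recursive definition.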
A.2 Equational theory
We consider our language up to the usual $\beta \eta$ -equational theory for fine-grain CBV, which is displayed in Fig. A2.
Under the translation of coarse-grain CBV into fine-grain CBV, this equational theory induces precisely that of Section 6.
A.3 The $CBV$ model $\left ( \mathbf{Syn}_V, \mathbf{Syn}_{\mathcal{S}},\mathbf{Syn}_\mu, \mathbf{Syn}_{\mathsf{it}} \right )$
Our fine-grain call-by-value language corresponds to a $CBV$ model (see Definition 4.4).
We define the category $\mathbf{Syn}_V$ of values, which has types as objects. $\mathbf{Syn}_V(\tau,\sigma )$ consists of $(\alpha )\beta \eta$ -equivalence classes of values $x:\tau \vdash ^v{v}:\sigma$ , where identities are $x:\tau \vdash ^v x:\tau$ and composition of $x:\tau \vdash ^v{v}:\sigma$ and $ y:\sigma \vdash ^v{w}:\rho$ is given by $x:\tau \vdash ^v{w}{}[^{{v}}\!/\!_{y}]:\rho$ .
Lemma A.1. $\mathbf{Syn}_V$ is bicartesian closed.
Similarly, we define the category $\mathbf{Syn}_C$ of computations, which also has types as objects. $\mathbf{Syn}_C(\tau,\sigma )$ consists of $(\alpha )\beta \eta$ -equivalence classes of computations $x:\tau \vdash ^c{t}:\sigma$ , where identities are $x:\tau \vdash ^c \mathbf{return}\,x:\tau$ and composition of $x:\tau \vdash ^c{t}:\sigma$ and $ y:\sigma \vdash ^c{s}:\rho$ is given by $x:\tau \vdash ^c t\,\mathbf{to}\,y.\,s:\rho$ .
Lemma A.2. $\mathbf{Syn}_C$ is a $\mathbf{Syn}_V$ -category.
We define the $\mathbf{Syn}_V$ -functors
We have that $\mathbf{Syn}_J \dashv \mathbf{Syn}_G$ is a (Kleisli) $\mathbf{Syn}_V$ -adjunction and, hence, denoting by $\mathbf{Syn}_{\mathcal{S}}$ the induced $\mathbf{Syn}_V$ -monad, $\left ( \mathbf{Syn}_V, \mathbf{Syn}_{\mathcal{S}} \right )$ is a $CBV$ pair, as defined in Definition 4.1. Moreover, considering the free recursion and free iteration
we get the $CBV$ model $\left ( \mathbf{Syn}_V, \mathbf{Syn}_{\mathcal{S}},\mathbf{Syn}_\mu, \mathbf{Syn}_{\mathsf{it}} \right )$ which has the following universal property.
Proposition A.3 (Universal property of the syntax). Let $\left ( \mathcal{V}, \mathcal{T}, \mu, \mathsf{it} \right )$ be a $CBV$ model with chosen finite products, coproducts and exponentials. For each consistent assignment
there is a unique $CBV$ model morphism $H$ between $\left ( \mathbf{Syn}_V, \mathbf{Syn}_{\mathcal{S}},\mathbf{Syn}_\mu, \mathbf{Syn}_{\mathsf{it}} \right )$ and $\left ( \mathcal{V}, \mathcal{T}, \mu, \mathsf{it} \right )$ respecting it.
Proposition A.4 (Universal property of the syntax). Let $\left ( \mathcal{V}, \mathcal{T}, \mu, \mathsf{it} \right )$ be a $CBV$ model with chosen finite products, coproducts and exponentials. For each consistent assignment
there is a unique $CBV$ model morphism $H$ between $\left ( \mathbf{Syn}_V, \mathbf{Syn}_{\mathcal{S}},\mathbf{Syn}_\mu, \mathbf{Syn}_{\mathsf{it}} \right )$ and $\left ( \mathcal{V}, \mathcal{T}, \mu, \mathsf{it} \right )$ respecting it.
A.4 A translation from coarse-grain CBV to fine-grain CBV
This translation $(-)^\dagger$ operates on types and contexts as the identity. It faithfully translates terms $\Gamma \vdash{t}:\tau$ of coarse-grain CBV into computations $\Gamma \vdash ^c{t}^\dagger \;:\; \tau$ of fine-grain CBV. This translation illustrates the main difference between coarse-grain and fine-grain CBV: in coarse-grain CBV, values are a subset of computations, while fine-grain CBV is more explicit in keeping values and computations separate. This makes it slightly cleaner to formulate an equational theory, denotational semantics, and logical relations arguments.
We list the translation $(-)^\dagger$ below where all newly introduced variables are chosen to be fresh.
A.5 Dual numbers forward AD transformation
As before, we fix, for all $n\in \mathbb{N}$ , for all $\mathrm{op} \in \mathrm{Op}_n$ , for all $1\leq i \leq n$ , computations $x_1:\mathbf{real},\ldots, x_n:\mathbf{real}\vdash ^c \partial _i\mathrm{op} (x_1,\ldots, x_n):\mathbf{real}$ , which represent the partial derivatives of $\mathrm{op}$ . Using these terms for representing partial derivatives, we define, in Fig. A3, a structure preserving macro $\mathcal{D}$ on the types, values, and computations of our language for performing forward-mode AD. We observe that this induces the following AD rule for our sugar: $\mathcal{D}_{\mathcal{C}}({{\mathbf{if}\,v\,\mathbf{then}\,t\,\mathbf{else}\,s\,}}) = \mathbf{case}\,{\mathcal{D}_{\mathcal{V}}(v)}\,\mathbf{of}\langle{x},{\_}\rangle \to{\mathbf{if}\,x\,\mathbf{then}\,\mathcal{D}_{\mathcal{C}}(t)\,\mathbf{else}\,\mathcal{D}_{\mathcal{C}}(s)\,}$ .
In fact, by the universal property of $\mathbf{Syn}_J$ , $\mathcal{D}$ is the unique structure-preserving functor on the syntax that has the right definition for constants, primitive operations, and $\mathbf{sign}\,$ . It automatically follows that $\mathcal{D}$ respects $\beta \eta$ -equality.
Under the translation of coarse-grain CBV into fine-grain CBV, this code transformation induces precisely that of Section 6.
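The AD rule for the conditional stated above can be illustrated with a small dual-numbers sketch in Python. This is a hedged illustration rather than the macro of Fig. A3: the names `Dual`, `const`, `mul`, `d_if`, and `f` are hypothetical, the real conditional is assumed to take the first branch for positive scrutinees and the second for negative ones, and undefinedness at $0$ is modelled by raising an exception. The point it shows is that the transformed conditional branches on the primal component only and discards the tangent of the scrutinee.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Dual:
    """A dual number: a primal value paired with a tangent."""
    primal: float
    tangent: float


def const(c: float) -> Dual:
    """Constants carry a zero tangent."""
    return Dual(c, 0.0)


def mul(a: Dual, b: Dual) -> Dual:
    """Product rule on dual numbers."""
    return Dual(a.primal * b.primal,
                a.primal * b.tangent + a.tangent * b.primal)


def d_if(v: Dual,
         then_branch: Callable[[], Dual],
         else_branch: Callable[[], Dual]) -> Dual:
    """Transformed conditional: case D(v) of <x, _> -> if x then ... else ..."""
    x = v.primal  # the tangent of the scrutinee is ignored
    if x > 0.0:
        return then_branch()
    if x < 0.0:
        return else_branch()
    # Assumption: the conditional is undefined (diverges) at 0.
    raise ArithmeticError("real conditional is undefined at 0")


def f(x: Dual) -> Dual:
    """Transformed version of f(x) = if x then x * x else -x."""
    return d_if(x, lambda: mul(x, x), lambda: mul(const(-1.0), x))


right = f(Dual(3.0, 1.0))   # derivative of x * x at 3
left = f(Dual(-2.0, 1.0))   # derivative of -x at -2
```

Seeding the tangent with `1.0` yields the derivative in the tangent component on each side of the branch point; at the branch point itself the sketch is undefined, matching the notion of correctness in which points of non-differentiability are ignored.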
B. A more efficient derivative for $\mathbf{sign}\,{}$
We can define by mutual induction (simultaneously for the forward transformation $\mathcal{D}$ and the reverse transformation $\overleftarrow{\mathcal{D}}_{k}$ )
and
Then, observe that, for any $x_1:\tau _1,\ldots, x_n:\tau _n\vdash{t}:\mathbf{real}$ , we have $[\![\mathbf{sign}\,{(\mathbf{fst}\,\mathcal{D}\;{(t)})}]\!] =[\![\mathbf{let}\,{x_1} ={\mathbf{p}_{\tau _1}(x_1)}\,\mathbf{in}\,{\cdots } \mathbf{let}\,{x_n} ={\mathbf{p}_{\tau _n}(x_n)}\,\mathbf{in}\,{\cdots }{\mathbf{sign}\,{t}}]\!]$ . Therefore, we can define, for $x_1:\tau _1,\ldots, x_n:\tau _n\vdash{t}:\mathbf{real}$ ,
This yields more efficient definitions of the forward and reverse derivatives of $\mathbf{sign}\,{}$ and $\mathbf{if}\,\mathbf{then}\,\mathbf{else}\,$ as we do not need to differentiate $t$ at all.
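The observation above can be checked on a tiny example. The following Python sketch is only illustrative and makes several assumptions: it treats inputs of ground type $\mathbf{real}$ only (so the projection $\mathbf{p}_{\mathbf{real}}$ is just the primal component), it models $\mathbf{sign}$ as a partial Boolean-valued function undefined at $0$, and the program `t` and all helper names are hypothetical. It compares taking the sign of the primal part of the transformed program with running the untransformed program on projected inputs.

```python
from typing import NamedTuple


class Dual(NamedTuple):
    """A dual number: primal value and tangent."""
    primal: float
    tangent: float


def primal_of(x: Dual) -> float:
    """Stands in for the projection p_real (ground type only)."""
    return x.primal


def sign(r: float) -> bool:
    """Assumed model of sign: True for positive, False for negative."""
    if r == 0.0:
        raise ArithmeticError("sign is undefined at 0")
    return r > 0.0


def t(x: float, y: float) -> float:
    """Hypothetical source program of type real."""
    return x * y - 1.0


def d_t(x: Dual, y: Dual) -> Dual:
    """Dual-numbers transformation of t, written out by hand."""
    prod = Dual(x.primal * y.primal,
                x.primal * y.tangent + x.tangent * y.primal)
    return Dual(prod.primal - 1.0, prod.tangent)


def sign_via_transformed(x: Dual, y: Dual) -> bool:
    """sign (fst D(t)): runs the transformed program, then drops the tangent."""
    return sign(d_t(x, y).primal)


def sign_via_projection(x: Dual, y: Dual) -> bool:
    """let x = p(x) in let y = p(y) in sign t: no tangents are computed."""
    return sign(t(primal_of(x), primal_of(y)))
```

Both functions agree wherever they are defined, but the second never computes a tangent, which is the source of the efficiency gain.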
C. Enriched scone
We present straightforward generalizations (enriched versions) of the results presented in Lucatelli Nunes and Vákár (Reference Lucatelli Nunes and Vákár2023, Section 9) below.
Considering the $\boldsymbol{\omega } \mathbf{Cpo}$ -category $\mathsf{2}$ with two objects and only one nontrivial morphism between them, the $\boldsymbol{\omega } \mathbf{Cpo}$ -category $\mathsf{2}\pitchfork \mathcal{D}$ of morphisms of $\mathcal{D}$ can be described as the $\boldsymbol{\omega } \mathbf{Cpo}$ -category $\boldsymbol{\omega } \mathbf{Cpo}\textrm{-}\mathbf{Cat}\left [\mathsf{2}, \mathcal{D} \right ]$ of $\boldsymbol{\omega } \mathbf{Cpo}$ -functors $\mathsf{2}\,{\rightarrow }\, \mathcal{D}$ .
Explicitly, the objects of $\mathsf{2}\pitchfork \mathcal{D}$ are morphisms $f\;:\; Y_0\,{\rightarrow }\, Y_1$ of $\mathcal{D}$ . A morphism between $f$ and $g$ is a pair $\alpha = \left ( \alpha _0, \alpha _1\right ) \;:\; f\,{\rightarrow }\, g$ such that $\alpha _1 f = g\alpha _0$ , that is to say, a ( $\boldsymbol{\omega } \mathbf{Cpo}$ -)natural transformation. Finally, the $\boldsymbol{\omega } \mathbf{Cpo}$ -structure is defined by $\left ( \alpha _0, \alpha _1\right ) \leq \left ( \beta _0, \beta _1\right )$ if $\alpha _0\leq \beta _ 0$ and $\alpha _1\leq \beta _ 1$ in $\mathcal{D}$ .
Given an $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $G:\mathcal{C}\,{\rightarrow }\,\mathcal{D}$ , the comma category $\mathcal{D}\downarrow G$ of the identity on $\mathcal{D}$ along $G$ in ${\boldsymbol{\omega } \mathbf{Cpo}}\textrm{-}\mathbf{Cat}$ is also known as the $\boldsymbol{\omega } \mathbf{Cpo}$ -scone or Artin glueing of $G$ . It can be described as the pullback (C1) in ${\boldsymbol{\omega } \mathbf{Cpo}}\textrm{-}\mathbf{Cat}$ , in which $\mathrm{codom} \;:\; \mathsf{2}\pitchfork \mathcal{D}\,{\rightarrow }\,\mathcal{D}$ , defined by $\left ( \alpha = \left ( \alpha _0, \alpha _1\right ) \;:\; f\,{\rightarrow }\, g\right )\mapsto \alpha _1$ , is the codomain $\boldsymbol{\omega } \mathbf{Cpo}$ -functor.
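Since $\mathrm{codom}$ is an isofibration, the pullback (C1) is equivalent to the pseudo-pullback of $\mathrm{codom}$ along $G$ , which is the $\boldsymbol{\omega } \mathbf{Cpo}$ -category defined as follows. The objects of the pseudo-pullback are triples $(f, C, \xi )$ consisting of an object $f$ of $\mathsf{2}\pitchfork \mathcal{D}$ , an object $C$ of $\mathcal{C}$ , and a morphism $\xi \;:\; \mathrm{codom}\left ( f\right )\,{\rightarrow }\, G\left ( C\right )$ ,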
where $\xi$ is an isomorphism in $\mathcal{D}$ . A morphism $(f,C,\xi )\,{\rightarrow }\, (f',C^{\prime},\xi ' )$ is a pair of morphisms $\left ( \alpha \;:\; f\rightarrow f', h\;:\; C\,{\rightarrow }\, C^{\prime} \right )$ such that $ G(h)\circ \xi = \xi '\circ \mathrm{codom} \left ( \alpha \right )$ . Finally, the $\boldsymbol{\omega } \mathbf{Cpo}$ -structure of the homs is given pointwise. That is to say, $\left ( \alpha, h \right )\leq \left ( \alpha ', h ' \right )$ if $\alpha \leq \alpha '$ in $\mathsf{2}\pitchfork \mathcal{D}$ and $h\leq h '$ in $\mathcal{C}$ .
Lemma C.1. The forgetful $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $\mathcal{L} \;:\; \mathcal{D} \downarrow G\,{\rightarrow }\, \mathcal{D}\times \mathcal{C}$ , defined in (37), creates all absolute (weighted) limits and colimits.
Proof. Clearly, the $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $\mathcal{L}$ reflects isomorphisms.
Let $D$ be a diagram in $\mathcal{D}\downarrow G$ such that the weighted (co)limit $(co)\mathsf{lim}\left (W, \mathcal{L} D \right )$ exists and is preserved by any $\boldsymbol{\omega } \mathbf{Cpo}$ -functor. Since $\mathcal{D}\downarrow G$ is the pullback (C1), there is a unique pair of diagrams $\left ( D_0, D_1 \right )$ such that
hold.
Since $\mathrm{dom}\circ D_0 = \pi _{\mathcal{D}}\circ \mathcal{L}\circ D$ and $\mathrm{codom}\circ D_0 = G\circ \pi _{\mathcal{C}}\circ \mathcal{L}\circ D$ , we get that $(co)\mathsf{lim}\left (W, \mathrm{dom}\circ D_0 \right )\,{\cong}\, \pi _{\mathcal{D}}\left ( (co)\mathsf{lim}\left (W, \mathcal{L} \circ D\right )\right )$ and $ (co)\mathsf{lim}\left (W, \mathrm{codom} \circ D_0\right )\,{\cong}\, G\circ \pi _{\mathcal{C}}\left ( (co)\mathsf{lim}\left (W, \mathcal{L}\circ D\right )\right )$ . Therefore, $(co)\mathsf{lim}\left (W, D_0\right )$ exists in $\mathsf{2}\pitchfork \mathcal{D}$ , pointwise constructed out of $(co)\mathsf{lim}\left (W, \mathrm{dom}\circ D_0\right )$ and $(co)\mathsf{lim}\left (W, \mathrm{codom}\circ D_0\right )$ .
Moreover, since $D_1 = \pi _{\mathcal{C}}\circ \mathcal{L}\circ D$ , we have that $(co)\mathsf{lim}\left (W, D_1\right )\,{\cong}\, \pi _{\mathcal{C}}\left ( (co)\mathsf{lim}\left (W, \mathcal{L}\circ D \right )\right )$ .
Therefore, the isomorphism $\xi$ given by
together with the pair $\left ( (co)\mathsf{lim}\left (W, D_0\right ), (co)\mathsf{lim}\left (W, D_1\right ) \right )$ defines, up to isomorphism, an object of $\mathcal{D}\downarrow G$ , which satisfies the universal property for $(co)\mathsf{lim}\left (W, D\right ) = (co)\mathsf{lim}\left (W, \left ( D_0, D_1\right )\right )$ .
Moreover, by the construction above, we conclude that $(co)\mathsf{lim}\left (W, D\right )$ is preserved by $\mathcal{L}$ . In particular:
The above completes the proof that the $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $\mathcal{L}$ creates $(co)\mathsf{lim}\left (W, D\right )$ .
The $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $\mathcal{L}$ has a right $\boldsymbol{\omega } \mathbf{Cpo}$ -adjoint provided that $\mathcal{D}$ has binary $\boldsymbol{\omega } \mathbf{Cpo}$ -products. It is given by $\left ( D\in \mathcal{D}, C\in \mathcal{C} \right )\mapsto \left ( D\times G\left ( C\right ), C, \pi _{G(C)} \right )$ . Therefore:
Theorem C.2. The forgetful $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $\mathcal{L} \;:\; \mathcal{D} \downarrow G\,{\rightarrow }\, \mathcal{D}\times \mathcal{C}$ is $\boldsymbol{\omega } \mathbf{Cpo}$ -comonadic provided that $\mathcal{D}$ has binary $\boldsymbol{\omega } \mathbf{Cpo}$ -products.
By duality, we get that the forgetful $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $F \downarrow \mathcal{C}\,{\rightarrow }\, \mathcal{D}\times \mathcal{C}$ is $\boldsymbol{\omega } \mathbf{Cpo}$ -monadic provided that $\mathcal{C}$ has finite $\boldsymbol{\omega } \mathbf{Cpo}$ -coproducts. Therefore:
Theorem C.3. The forgetful $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $\mathcal{L} \;:\; \mathcal{D} \downarrow G\,{\rightarrow }\, \mathcal{D}\times \mathcal{C}$ is $\boldsymbol{\omega } \mathbf{Cpo}$ -monadic whenever $G$ has a left $\boldsymbol{\omega } \mathbf{Cpo}$ -adjoint and $\mathcal{C}$ has finite $\boldsymbol{\omega } \mathbf{Cpo}$ -coproducts.
Proof. Indeed, by the $\boldsymbol{\omega } \mathbf{Cpo}$ -adjunction $F\dashv G$ , we get an isomorphism $\mathcal{D} \downarrow G\,{\cong}\, F \downarrow \mathcal{C}$ , which composed with the forgetful $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $ F \downarrow \mathcal{C}\,{\rightarrow }\, \mathcal{D}\times \mathcal{C}$ is equal to $\mathcal{L} \;:\; \mathcal{D} \downarrow G\,{\rightarrow }\, \mathcal{D}\times \mathcal{C}$ .
As a consequence, we conclude that:
Theorem C.4. Let $G\;:\; \mathcal{C}\,{\rightarrow }\,\mathcal{D}$ be a right $\boldsymbol{\omega } \mathbf{Cpo}$ -adjoint functor between $\boldsymbol{\omega } \mathbf{Cpo}$ -bicartesian closed categories. We have that the forgetful $\boldsymbol{\omega } \mathbf{Cpo}$ -functor $\mathcal{L}$ is $\boldsymbol{\omega } \mathbf{Cpo}$ -monadic and comonadic. In particular, $\mathcal{D} \downarrow G$ is an $\boldsymbol{\omega } \mathbf{Cpo}$ -bicartesian closed category.
D. Some Haskell Code for a Recursive Neural Network