1. Introduction
The analysis of mortality rates is fundamental for actuaries, as these rates are used to set premiums for life insurance products, estimate liabilities, and develop the corresponding risk management strategies.
It is widely acknowledged in practice that the development of future mortality rates is the outcome of a stochastic process (Cairns et al., 2006a). The seminal work of Lee & Carter (1992) has attracted significant attention over the last three decades towards the development and extension of stochastic mortality models which, to different degrees, are capable of capturing the different features of the mortality surface, such as the presence of the cohort effect (Willets, 2004). These models were originally developed for the analysis of data in discrete time, such as integer ages and integer calendar (or birth) years. Some examples include the Cairns–Blake–Dowd model (Cairns et al., 2006b), the Poisson log-bilinear approach of Brouhns et al. (2002), the Plat model (Plat, 2009), and the functional approach of Hyndman & Shahid Ullah (2007).
More recently, models developed using the tools of financial mathematics have gained attention in the literature, following the work of Milevsky & Promislow (2001). For example, Schrager (2006), Biffis (2005), Dahl (2004), Blackburn & Sherris (2013), Jevtić et al. (2013), Jevtić & Regis (2019), and Jevtić & Regis (2021) propose the use of the affine interest rate modeling framework developed by Duffie & Kan (1996) for the analysis of mortality dynamics. In this way, it is possible to obtain closed-form formulas for the survival curves, which can be used to price longevity-related securities and to devise risk management strategies for these products. These models assume that the mortality intensity process is driven by a set of latent variables, whose dynamics are characterized by a stochastic differential equation with mean reversion. This implies that the closed-form survival curve is an exponentially affine function of the latent variables.
An advantage of the continuous-time approach for mortality modeling is the use of financial pricing techniques, which are familiar to market practitioners. Specifically, a no-arbitrage valuation framework can be used for pricing life-contingent products and developing appropriate risk management strategies for these products using the analytical results for affine processes (Biffis, 2005).
This paper describes the R package AffineMortality, which supports an extensive analysis of continuous-time affine mortality models in the spirit of Blackburn & Sherris (2013), Huang et al. (2022), and Ungolo et al. (2023). More precisely, the package estimates the parameters of these models using mortality data collected at discrete time points and ages. In addition, the package facilitates the analysis of model fit and the simulation and projection of future mortality rates. These tasks require us to discretize the continuous-time models and to recast the inferential problem in state-space form.
Schrager (2006) and Ungolo et al. (2023) describe how the estimation of these models using the standard Kalman filter can be problematic due to the numerical issues which follow from the multiplication and inversion of large-dimensional matrices. Hence, they advocate the use of the univariate Kalman filter of Koopman & Durbin (2000). For this reason, the computational routines in AffineMortality are implemented using the univariate Kalman filtering approach, together with other numerical tricks described in Ungolo et al. (2023), which make the Kalman filtering procedure more stable.
Alternative methods for the parameter estimation of affine term structure models with latent factors are the simulated moments estimator and Markov chain Monte Carlo, whose details are described in Singleton (2006). These methods do not take advantage of the closed-form solution for the distribution of the latent variables, which is available for several affine specifications. Within the mortality modeling literature, Jevtić et al. (2013) use the differential evolution method to estimate the parameters by minimizing the root mean squared error based on the survival function, which, according to the authors' findings, can take a long time to converge.
The package allows the user to implement and assess, among others, the Blackburn–Sherris (BS) model with independent factors, the BS model with two and three dependent factors, the Arbitrage-Free Nelson-Siegel (AFNS) mortality model with independent and dependent factors, and the Cox–Ingersoll–Ross (CIR) mortality model. The plan is to further expand the library of models available for analysis, and to extend the existing ones by accounting for cohort- or period-specific factors.
To the best of our knowledge, currently available software packages focus on the analysis of mortality models in discrete time. For example, the R package StMoMo (Villegas et al., 2018) allows for the analysis, among others, of the Lee-Carter model (Lee & Carter, 1992), the CBD model (Cairns et al., 2006b), and the age-period-cohort model of Renshaw & Haberman (2006). In a similar fashion, StanMoMo (Barigou et al., 2023) performs a Bayesian analysis of stochastic mortality models using Stan. The R package demography (Hyndman et al., 2022) instead provides functions for demographic analysis, including lifetable calculations, Lee-Carter modeling, functional data analysis of mortality rates, fertility rates, net migration numbers, and stochastic population forecasting.
Furthermore, there are R packages for the analysis of state-space models, such as dse (Gilbert, 2009), sspir (Dethlefsen et al., 2022), dlm (Petris, 2010), FKF (Luethi et al., 2022), and KFAS (Helske, 2017) (see also Tusell, 2011 for a comprehensive review). However, these packages do not readily adapt to the state-space model used for the analysis of affine mortality models. This is because the system matrices of the affine mortality models follow from the solution of a system of ordinary differential equations, as we briefly illustrate in Section 3, and its subsequent discretization.
R and RStudio sessions may occasionally crash. Since the optimization process can take a long time, the time-consuming functions in AffineMortality (e.g. affine_fit() for estimating the model parameters) allow the user to stop the process without losing the computations performed thus far. The package provides fault tolerance by allowing the user to specify a directory where the work is saved as an .Rdata file.
AffineMortality can be installed by running the following commands:
library(devtools)
install_github("ungolof/AffineMortality")
library(AffineMortality)
The source code of AffineMortality is available through the GitHub repository https://github.com/ungolof/AffineMortality.
The paper is organized as follows: Section 2 introduces the data to be used as input for the analysis, Section 3 summarizes the affine mortality modeling framework and briefly describes the mortality models supported by the package, and Section 4 describes their parameter estimation procedure. Section 5 describes how the package can be used to perform the goodness-of-fit analysis and compare affine models, and Section 6 describes the function affine_project(), which can be called to project future cohort survival curves. The package provides two methods to analyze parameter uncertainty, described in Section 7: the first estimates the covariance of the parameter estimates using the bootstrap, while the other implements a multiple imputation-based method. A step-by-step illustration of the package is provided in Section 8. Section 9 describes other functions in AffineMortality, and Section 10 concludes.
2. Input data
Let $\mu _{x,t}$ denote the force of mortality for an individual aged $x$ last birthday in calendar year $t$. Assuming that the force of mortality is constant between integer ages $x$ and $x+1$ and between calendar years $t$ and $t+1$, $\mu _{x,t}$ can be approximated by the central death rate $m_{x,t}$. The central death rate $m_{x,t}$ is empirically estimated as the ratio between the observed number of deaths $d_{x,t}$ and the central exposure to risk $E^c_{x,t}$.
The period survival probability at time $t$ of an individual aged $x$ at time $t$ until age $x+T-t$ is given by:
The estimation of affine mortality models uses the average force of mortality denoted for each calendar year $t$ and each age as:
Here, $x$ is fixed and denotes the smallest age in the age-range of interest (for example, equal to 50 in Ungolo et al., 2023 and Huang et al., 2022). The dataset for the analysis is a matrix of dimension $N\times K$, where $N$ is the number of ages in the age-range of interest and $K$ is the number of calendar years for the analysis.
Conversely, given the average force of mortality, we can obtain $\mu$ by means of the following recursion, starting from $\mu _{x,t} = \bar{\mu }_{x}\left (t,t+1\right )$:
for $i=2,\ldots,N$. Furthermore, let us define the column vector $\bar{\mu }_{t} = [\bar{\mu }_{x}\!(t,t+1),\ldots,\bar{\mu }_{x}\!(t,t+N)]^\top$ (we drop the reference to $x$ for ease of notation).
When processing the force of mortality data $\mu _{x,t}$ , the function rates2avg() in AffineMortality can be used to transform the $N\times K$ -dimensional matrix of $\mu _{x,t}$ rates into the corresponding matrix of $\bar{\mu }$ rates. The inverse operation can be carried out using the function avg2rates():
mu_bar <- rates2avg(mu_xt_rates)
mu_xt <- avg2rates(mu_bar)
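As an illustration (and not the package's internal implementation), the following hedged sketch performs both transformations. It assumes that row $i$ of the input matrix holds the one-year rate $\mu _{x+i-1,t}$ and that $\bar{\mu }_{x}(t,t+i)$ is the arithmetic mean of the first $i$ one-year rates in the same column; the function names avg_sketch() and rates_sketch() are hypothetical.
avg_sketch <- function(mu_xt) {
  # cumulative mean down each column: row i holds the average of the first i one-year rates
  apply(mu_xt, 2, function(v) cumsum(v) / seq_along(v))
}
rates_sketch <- function(mu_bar) {
  # inverse recursion: mu_i = i * mubar_i - (i - 1) * mubar_{i-1}, starting from mu_1 = mubar_1
  apply(mu_bar, 2, function(v) {
    i <- seq_along(v)
    c(v[1], i[-1] * v[-1] - (i[-1] - 1) * v[-length(v)])
  })
}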
When analyzing age-period mortality rates, each cell of $\bar{\mu }$ contains the average mortality rate between the base age $x$ and the age $x+y$ to which the $y$ th row refers in calendar year $t$ .
The matrix of average forces of mortality is the key input for the estimation of affine mortality models. The use of the average forces of mortality yields smoother data, which renders the estimation process more stable. This is the approach adopted within the interest rate literature (see Christensen et al., 2011), as well as in the analysis of affine mortality models (Blackburn & Sherris, 2013; Huang et al., 2022; Jevtić & Regis, 2019).
A similar data set can be set up for the analysis of age-cohort mortality rates, as discussed in Huang et al. (2022) and Ungolo et al. (2023).
3. Affine mortality models
In this section, we provide a brief overview of the affine mortality modeling framework, and describe the different mortality models supported by AffineMortality. We first present the model setting in the real-world probability space $(\Omega,\mathcal{F},P)$ , where $P$ is the real-world probability measure. We then discuss the development of the mortality model under a risk-neutral pricing measure $Q$ that is equivalent to $P$ . The development of the model under $Q$ is important in view of applications in pricing and valuation of mortality-linked cash flows, since under $Q$ , the combined financial and actuarial market is assumed to be arbitrage-free. The mortality dynamics under $P$ and under $Q$ are linked by a careful choice of a Radon-Nikodým density process, which shall be discussed below.
Let $\tau$ denote the random time of death of an individual who is currently aged $x$. We equip the probability space with a right-continuous and $P$-complete filtration $\mathbb{F} = \{\mathcal{F}_t\}_{t\geq 0}$, which can be decomposed as $\mathbb{F} = \mathbb{G} \vee \mathbb{H}$, where $\mathbb{G}$ is a filtration containing all financial and actuarial information, except the actual time of death, and $\mathbb{H}$ is the smallest filtration such that $\tau$ is an $\mathbb{H}$-stopping time. Hence, $\tau$ is also an $\mathbb{F}$-stopping time. See, for example, Biffis (2005) and Blackburn & Sherris (2013) for further details.
We model $\tau$ as a doubly stochastic stopping time that admits an intensity process $\mu _x(t)$, which is a nonnegative, $\mathbb{G}$-predictable process. See Biffis et al. (2010, Section 2) for a detailed mathematical discussion. Furthermore, we assume that $\mu _x(t)$ is modeled as an affine function of an $M$-dimensional, $\mathbb{F}$-adapted, latent factor process $X(t)$. That is, there exist $\rho _0^{(x)} \in \mathbb{R}$ and $\rho _1^{(x)}\in \mathbb{R}^M$, possibly dependent on the base age $x$, such that
The process $X(t)$ is assumed to be a solution of the (vector) stochastic differential equation (SDE)
where $K\in \mathbb{R}^{M\times M}$ , $\theta ^P\in \mathbb{R}^M$ , and $\Sigma \in \mathbb{R}^{M\times M}$ . We assume that $D(X(t),t)$ is an $M$ -dimensional diagonal matrix with diagonal elements $d_{ii}(X(t),t)$ given by
where $\alpha ^i$ and $\beta ^i\;:\!=\;(\beta ^i_1,\dots,\beta ^i_M)^\top$ are bounded and continuous functions, and $W^P$ is a standard $M$-dimensional $P$-Brownian motion. The quantities $K$, $\theta ^{P}$, and $\Sigma$ represent the rate of mean reversion, the long-run mean, and the volatility of $X(t)$, respectively. We identify $\mathbb{G}$ as the natural filtration generated by $W^P$, so that $X(t)$ is a $\mathbb{G}$-adapted process. Since $X(t)$ is also a (right-)continuous process, it follows that $\mu _x(t)$ is a $\mathbb{G}$-predictable process (Cohen & Elliott, 2015, Theorem 7.2.4).
Let $S^P_x(t,T) \;:\!=\; \mathbb{E}^P[\exp \{-\int _t^T \mu _x(s) ds\} | \mathcal{G}_t]$ denote the (real-world) probability that an individual aged $x$ at time $t$, conditional on being alive at time $t$, survives up to time $T$. Hence, following the affine framework set out in Duffie & Kan (1996) and Duffie et al. (2000), $S^P_x(t,T)$ is an exponentially affine function of $X(t)$:
where $A_x^{ P}$ and $B_x^{ P}$ are solutions of the system of ordinary differential equations (ODEs)
with terminal condition $A_x^P(T,T) = 0$ and $B_x^P(T,T) = \mathbf{0}$ .
Therefore, the average force of mortality over the period $[t,T]$ , defined as
is an affine function of the latent state process $X\left (t\right )$ , i.e.
In correspondence with Equation (2.2), $\bar{\mu }_x^{P}(t,T)$ represents the average force of mortality for an individual aged $x$ at time $t$, from age $x$ to age $x+(T-t)$.
The affine representation of the average force of mortality allows us to cast the parameter estimation problem into that for a state-space model where $\bar{\mu }_x^{ P}(t,T)$ are the observations and $X(t)$ is the unknown state process.
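To make the mapping from latent factors to observables concrete, the following minimal sketch computes a survival probability and the corresponding average force of mortality from given factor loadings and a given value of the latent state. It assumes the sign convention $S_x(t,T) = \exp \{B_x(t,T)^\top X(t) + A_x(t,T)\}$ and is not the package's internal implementation; the function name surv_affine() is hypothetical.
surv_affine <- function(A, B, X, horizon) {
  # A: scalar loading, B: vector of loadings, X: latent state vector, horizon = T - t
  S <- exp(sum(B * X) + A)        # survival probability S_x(t, T)
  mubar <- -log(S) / horizon      # average force of mortality over [t, T]
  list(S = S, mubar = mubar)
}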
While the model is developed with age-period mortality data in mind (as in Section 2), the approach is naturally cohort-based and can also be applied to the analysis of age-cohort mortality data; see e.g. Ungolo et al. (2023).
In the following sections, we discuss how the affine mortality model can also be developed under a pricing measure $Q$, and then present the affine mortality models implemented in AffineMortality. For simplicity, we specify these models in terms of their dynamics under the pricing measure $Q$, since certain models call for a specific structure in the stochastic dynamics under $Q$ (see, for example, the Arbitrage-Free Nelson-Siegel (AFNS) model and its extensions).
3.1 Affine mortality models under the pricing measure
At this stage, we assume that the combined financial and actuarial market is arbitrage-free. Thus, there exists a probability measure $Q$ on $(\Omega,\mathcal{F})$ , equivalent to $P$ , under which the discounted financial security prices are $Q$ -martingales. We define $Q$ via its Radon-Nikodým density process with respect to $P$ ,
for some $\mathbb{G}$ -predictable, $\mathbb{R}^M$ -valued process $\Lambda _t$ satisfying suitable integrability conditions (e.g. the Novikov condition) such that $\mathcal{L}(t)$ is a $P$ -martingale. Here, $\|\cdot \|$ denotes the Euclidean norm on $\mathbb{R}^M$ . By the Girsanov Theorem, the process $W^Q(t)$ defined by
is a standard Brownian motion under $Q$. In particular, since $\Lambda _t$ is $\mathbb{G}$-predictable, by Biffis et al. (2010, Proposition 3.2), $\tau$ is still a doubly stochastic stopping time under $Q$ with intensity process $\mu _x(t)$ given by Equation (3.1). As such, we can define the probability that the individual survives up to time $T$, conditional on being alive at time $t$, as
To ensure the latent factor process $X(t)$ is affine under both $P$ and $Q$, we further assume that $\Lambda _t$ is given by the essentially affine specification proposed by Duffee (2002). Applied to our setting, we let
for some $\lambda _1 \in \mathbb{R}^{M\times M}$ and $\lambda _0 \in \mathbb{R}^M$ , which we refer to as the market price of risk parameters. Under this specification, the dynamics of $X(t)$ can be written as
where $\Delta \in \mathbb{R}^{M\times M}$ and $\theta ^Q \in \mathbb{R}^M$ , and $\Sigma$ and $D(X(t),t)$ are as in the $P$ -dynamics of $X(t)$ . Furthermore, under this construction, $\Lambda _t$ is indeed $\mathbb{G}$ -predictable.
Since $X(t)$ is also an affine process under $Q$ , the survival probability $S_x(t,T)$ has the exponential affine form
where $A_x$ and $B_x$ are solutions of the system of ODEs
with terminal condition $A_x(T,T) = 0$ and $B_x(T,T) = \mathbf{0}$ . As a result, the average force of mortality over the period $[t,T]$ , denoted by $\bar{\mu }_x(t,T)$ , is affine in the latent factor process
One key advantage of the essentially affine specification of $\Lambda _t$ is that, given $\lambda _1$ or $\lambda _0$, one can compute $\Delta$ and $\theta ^Q$ using the estimated values of $K$ and $\theta ^P$; see Ungolo et al. (2023, Section 2.3) for details. This is relevant if one derives the market price of risk parameters from other sources, e.g. empirical evidence on policyholder behavior or other existing calibration methods (see e.g. Bacinello et al., 2010, Section 4.5 and Biffis, 2005, Section 5.4). Furthermore, this also relates to the fact that the combined financial and actuarial market is incomplete, so there are infinitely many equivalent martingale measures $Q \sim P$. Thus, there exist multiple values of $\lambda _0$ or $\lambda _1$ which are consistent with the above (essentially affine) framework and such that the process $\Lambda _t$ remains $\mathbb{G}$-predictable. However, we do not discuss these points in detail as they fall outside the scope of the current paper. In the estimation method we employ in the package, the parameters under $P$ and $Q$ are simultaneously estimated from the data.
Another advantage is the flexibility of the choice of $\lambda _1$ or $\lambda _0$. This implies that we are free to choose any desired structure for $K$ and $\theta ^P$ while retaining the required special structures for $\Delta$ and $\theta ^Q$. This is relevant, especially for the AFNS model and its extensions, where the interpretation of the latent factors as level, slope, and curvature factors is directly related to the specification of $\Delta$. To this end, we assume that $\theta ^P = 0$ for all models considered, except the CIR model, and assume that $K$ is a diagonal matrix with $K = \text{diag}(\kappa _1,\dots,\kappa _M)$ for all models.
In the sections that follow, we state the dynamics of the latent factor process under the pricing measure $Q$ for each model we consider in the package. For each model, the factor loadings $A_x(t,T)$ and $B_x(t,T)$ are available in closed form and are functions of $t$ and $T$ only via $T-t$. For all models, except the Gompertz-Makeham model, the mortality intensity does not depend on the individual's current age $x$, so we drop this from the notation and write $\mu (t)$.
3.2 Blackburn-Sherris model
The BS model assumes that
i.e. $\rho _0 = 0$ and $\rho _1 = (1,\dots,1)^\top$ , where $X(t) = (X_1(t),\dots,X_M(t))^\top$ with dynamics
The components of $X(t)$ can be assumed to be independent by specifying $\Delta$ and $\Sigma$ as diagonal matrices, hence $\Delta = \text{diag}(\delta _1,\dots,\delta _M)$ and $\Sigma = \text{diag}(\sigma _1,\dots,\sigma _M)$ . Dependence among the components of $X(t)$ can be induced by setting
In AffineMortality, the user can specify as many latent factors as desired in the independent factor case. The factor loading expressions for the independent factor case can be found in Blackburn & Sherris (2013). While the factor loadings are still available in closed form in the dependent factor case, one cannot obtain a scalable nested expression for them. AffineMortality currently supports the two- and three-factor BS models with dependent factors. The factor loading expressions for the dependent case can be found in Huang et al. (2022, Appendix A).
3.3 Gompertz-Makeham model
The Gompertz-Makeham model (see Schrager, 2006) introduces age-dependence in the mortality intensity process. Specifically, the model assumes that $X(t)$ has the same dynamics outlined in Equation (3.5), and
for some $\gamma \gt 0$. The functional specification of $\mu _x(t)$ implies that the mortality intensity increases exponentially with the base age $x$, with $X_2(t)$ driving the stochasticity in the intensity process at older ages. In this case, we have $\rho _0^{(x)} = \rho _0 = 0$ and $\rho _1^{(x)} = (1, e^{\gamma x})^\top$. As before, dependence between the latent factors can be introduced by replacing $\Delta$ and $\Sigma$ by lower-triangular matrices. The factor loading expressions for both the independent and dependent factor cases can be found in Appendix B.1.
3.4 Arbitrage-free Nelson-Siegel model
The Arbitrage-Free Nelson-Siegel (AFNS) model, proposed by Christensen et al. (2011) for modeling the term structure of interest rates, assumes that there are three latent factors, identified as Level ($L$), Slope ($S$), and Curvature ($C$), with $Q$-dynamics:
The AFNS model assumes that the mortality intensity is the sum of the level and slope factors
The structure of $\Delta$ implies that the factor loadings $B_L$, $B_S$, and $B_C$ control the shape (i.e. level, slope, and curvature, respectively) of the average force of mortality, with randomness driven by the dynamics of $L$, $S$, and $C$; see Christensen et al. (2011, Proposition 1 and Section 2.3) for the closed-form expressions of the factor loadings.
A key feature of the AFNS model is the connection between the latent factors and the shape of the average force of mortality curve through the structure of $\Delta$. We can induce factor dependence through the diffusion matrix $\Sigma$; specifically, we replace $\Sigma$ with a lower-triangular matrix in the dependent factor case. As such, $B_L$, $B_S$, and $B_C$ remain the same in the dependent factor case; see Christensen et al. (2011, Appendix B) for the formula of $A(t,T)$.
3.4.1 Arbitrage-free generalized Nelson-Siegel model
The Arbitrage-Free Generalized Nelson-Siegel (AFGNS) model, proposed by Christensen et al. (2009), extends the AFNS model with an additional slope factor and an additional curvature factor. It was proposed as an arbitrage-free counterpart of the four-factor Nelson-Siegel-Svensson model, which extends the Nelson-Siegel model by adding a second curvature factor. The AFGNS model is thus a five-factor model whose latent factors, in the case of independence, satisfy the SDE
where $\delta _1 \neq \delta _2$. As in the AFNS model, the dependent factor case is obtained by replacing the diagonal diffusion matrix with a lower-triangular one. Under the AFGNS model, the mortality intensity is modeled as the sum of the level and the two slope factors
Factor loading expressions for the independent factor case can be found in Christensen et al. (2009, Proposition 3.1). In the dependent factor case, only the form of $A(t,T)$ changes; this can be found in Christensen et al. (2009, Appendix).
3.4.2 Arbitrage-free reduced Nelson-Siegel model
We introduce a version of the AFNS model without the curvature factor and call it the Arbitrage-Free Reduced Nelson-Siegel (AFRNS) model. This model is included in the package in order to assess the effect of the curvature factor on the resulting fit. Experiments we conducted on the mortality data of several countries showed that the presence of the curvature factor may in some cases produce negative mortality rates when projecting 25 years ahead, despite the better in-sample performance of the AFNS and AFGNS models.
We consider two latent factors, with dynamics
for the independent factor case. As before, we replace the volatility matrix with a lower-triangular matrix in the dependent factor case. The resulting factor loadings for both the independent and dependent factor cases can be found in Appendix B.2.
3.4.3 Arbitrage-free unrestricted Nelson-Siegel model
We also introduce a variation of the AFNS model where the elements of the drift coefficient matrix $\Delta$ have possibly unequal values. We call this model the Arbitrage-Free Unrestricted Nelson-Siegel (AFUNS) model. The latent factor dynamics under the AFUNS model are given by
As before, the mortality intensity is modeled as the sum of the level and slope factors
We recover the AFNS model by setting $\delta _2 = -\delta _1$ and in the limit as $\delta _3\to \delta _1$ . The dependent factor case is obtained by replacing the volatility matrix by a lower-triangular matrix.
3.5 Cox–Ingersoll–Ross model
Under the CIR model, the mortality intensity is modeled as the sum of the components of the latent factor process $X(t)$ ,
where each $X_i(t)$ is a square-root diffusion process given by
This implies that each component of $X(t)$ is nonnegative $Q$-almost surely and is strictly positive $Q$-almost surely if $X_i(0) \gt 0$ and $2\delta _i \theta _i^Q \geq \sigma _i^2$. Each $X_i(t)$ is asymptotically gamma distributed (as $t\to \infty$) (Cox et al., 1985); hence, the CIR mortality model can capture the heterogeneity of mortality rates at older ages (Pitacco, 2016). The factor loadings for the CIR model are given by
where $\vartheta _i = \sqrt{\delta _i^2 + 2\sigma _i^2}$. Applications of these models in the analysis of Human Mortality Database (HMD) national mortality data can be found in Huang et al. (2022) and Ungolo et al. (2023).
4. Parameter estimation
The model parameters are estimated using data collected in discrete time, as illustrated in Section 2. In the remainder of the paper, we use the real-world probability measure to characterize the dynamics of $X(t)$ . By discretizing the stochastic differential equation of each model, and using the affine representation of the average force of mortality, we obtain the following equally time-spaced state-space formulation:
where $A_{t} = \left [A \left (t,t+1\right ),\ldots, A \left (t,t+N\right )\right ]^\top$ and $B_{t} = \left [B \left (t,t+1\right ),\ldots, B \left (t,t+N\right )\right ]^\top$ . The age subscript has been omitted for notational convenience.
The state equation (4.1) describes the dynamics of the factors as an autoregressive process of order 1 with system matrix $\Phi _{t} =e^{-\boldsymbol{\kappa } j}$ and stochastic noise $\eta _{t}\sim N\left (0,R_t \right )$. The measurement equation (4.2) describes $\bar{\mu }_{t} \in \mathbb{R}^{N}$ as an affine function of the latent variable $X\left (t\right )$ with error term $\epsilon _{t} \sim N\left (0,H \right )$. We assume that $\eta _{t}$ and $\epsilon _{t}$ are independently distributed.
For the models so far implemented in AffineMortality, $A$, $B$, $H$, and $\Phi =e^{-\boldsymbol{\kappa } j}$ do not depend on $t$. For Gaussian models, such as the BS and the AFNS models, we have $R_t=R$:
For the CIR model, because of the independence between the factors, $R_t$ is a diagonal matrix with $k$th diagonal element $r_{t,k}$ equal to:
Here, $\Phi$, $A$, $B$, $R_t$, and $H$ depend on the parameters that we estimate when carrying out statistical inference for the dynamics of the mortality rates. Furthermore, $H$ is a diagonal matrix, whose diagonal elements $\omega ^2_i$ are equal to
for $i=1,\ldots,N$ . In this way, the measurement Equation (4.2) accounts for the increasing variation in the mortality rates at older ages.
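For concreteness, the following is a minimal sketch of the system matrices $\Phi$ and $R$ for a Gaussian model with independent factors and a unit time step; it is the standard discretization of an Ornstein-Uhlenbeck process with zero long-run mean, the parameter values are purely illustrative, and the package builds these matrices internally.
kappa <- c(0.012, 0.068, 0.005)        # illustrative mean-reversion rates under P
sigma <- c(0.0011, 0.0011, 0.0005)     # illustrative factor volatilities
Phi <- diag(exp(-kappa))                                    # X(t+1) = Phi X(t) + eta_t
R   <- diag(sigma^2 * (1 - exp(-2 * kappa)) / (2 * kappa))  # R = Var(eta_t)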
The parameters are estimated using maximum likelihood. Let $\psi$ denote the vector of parameters to be estimated. The likelihood function is readily obtained from the univariate Kalman filter recursion (see Koopman & Durbin, 2000 and Ungolo et al., 2023 for its implementation in the context of affine mortality models), given the observed average mortality rates $\bar{\mu }_{1:K}$:
(4.5) \begin{align} \log L\left (\psi \mid \bar{\mu }_{1:K}\right ) = -\frac{1}{2} \displaystyle \sum _{t=1}^{K} \displaystyle \sum _{i=1}^{N} \left [ \log \left (2\pi \right ) + \log F_{t,i} + \frac{\nu _{t,i}^2}{F_{t,i}} \right ] \end{align}
where $\nu _{t,i} = \bar{\mu }_{t,i} - a_i - b_{i} \widehat{x}_{t,i}$ is the measurement error, and $F_{t,i} = b_i \widehat{\Sigma }_{t,i} b_i^\top + \omega ^2_i$ is the variance of $\bar{\mu }_{t,i}$ (denoting $\bar{\mu }_{x}\left (t,t+i\right )$), taking into account the uncertainty about the latent state $X\left (t\right )$. Here, $a_i$ denotes the $i$th element of $A_t$, $b_i$ the corresponding row of the matrix $B_t$, and $\widehat{x}_{t,i}$ and $\widehat{\Sigma }_{t,i}$ are the univariate Kalman filter updates of the moments of $X\left (t\right )$ (see Appendix A for additional details).
For the CIR mortality model, the estimated parameter vector $\widehat{\psi }$ corresponds to the quasi-maximum likelihood estimator. See Chen & Scott (2003) and Jevtić & Regis (2021) for additional details.
4.1 Implementation
The parameter estimation task includes the initial state variable $X\left (0\right )$ among the set of unknown parameters. Other numerical devices used to obtain reasonable parameter estimates, e.g. ensuring the positive-definiteness of the covariance matrix, are described in Ungolo et al. (2023).
We recommend the use of multiple starting values due to the high non-linearity of the log-likelihood function, which may have multiple local maxima. Some initial values are provided in AffineMortality through the list object sv_default, which we briefly illustrate in Section 8; these are based on previous analyses of country mortality rates. When fitting dependent factor models, we recommend using the parameter estimates obtained from the corresponding independent factor models as starting values.
4.2 affine_fit()
The function affine_fit() of the package AffineMortality carries out the parameter estimation task. The log-likelihood function of Equation (4.5) is optimized sequentially by groups of parameters (coordinate ascent) through the gradient-free Nelder-Mead simplex method, as recommended by Christensen et al. (2011). This routine is readily available in R within the function optim().
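To fix ideas, the following toy sketch illustrates the coordinate ascent scheme (it is not the package's internal code): each block of parameters is optimized in turn with optim() and the Nelder-Mead method while the remaining blocks are held fixed, and the cycle is repeated. The objective toy_negobj is a hypothetical stand-in for the negative log-likelihood.
# hypothetical stand-in for the negative log-likelihood, with two parameter blocks
toy_negobj <- function(kappa, sigma) sum((kappa - 1)^2) + sum((sigma - 2)^2)
params <- list(kappa = c(0, 0), sigma = c(0, 0))   # starting values
for (iter in 1:5) {
  for (block in names(params)) {
    # optimize one block with Nelder-Mead, holding the other blocks fixed
    fn <- function(x) { p <- params; p[[block]] <- x; do.call(toy_negobj, p) }
    params[[block]] <- optim(params[[block]], fn, method = "Nelder-Mead")$par
  }
}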
The function affine_fit() takes the following arguments as input:
• model = c("BS", "AFNS", "AFGNS", "AFUNS", "AFRNS", "CIR", "GMk"), to select the model family to be fitted (default "BS");
• fact_dep = c(FALSE, TRUE), to select whether the model accounts for factor dependence (default FALSE);
• n_factors, to select the number of factors (only for the BS and CIR models; default set to 3);
• data, the rectangular data set of $\bar{\mu }$ rates used for the analysis;
• st_val, the set of starting values for the parameters. These must be supplied as a list of parameters, e.g.
st_val=list(x0=c(6.960591e-03, 9.017154e-03, 5.091784e-03),
delta=c(0.04268782, -0.03122758, -0.08573677),
kappa=c(1.162624e-02, 6.787268e-02, 5.061539e-03),
sigma=exp(c(-6.806310, -6.790270, -7.559145)),
r1=exp(-3.327060e+01), r2=exp(-6.086479e-01),
rc=exp(-1.553156e+01))
for the BS model with three independent factors. For dependent factor models, we instead supply sigma_dg, denoting the standard deviations (the diagonal elements of the volatility matrix), and Sigma_cov, denoting the off-diagonal elements of the covariance matrix (generally a vector of zeros, as suggested in Section 4.1);
• max_iter: the maximum number of iterations for the coordinate ascent algorithm (default 200);
• tolerance: the optimizer stops when the log-likelihood increase between iterations falls below this value (default 0.1);
• wd: the working directory where the intermediate values of the parameters are saved throughout the iterations.
This function returns a list with:
• model: same as input;
• fit: a list with:
– par_est: the list of parameter estimates;
– log_lik: the value of the log-likelihood function;
– CA_par: a table listing the value of the parameters throughout the coordinate ascent iterations;
• n.parameters: the total number of estimated parameters;
• AIC: the value of the Akaike information criterion (see Section 5);
• BIC: the value of the Bayesian information criterion.
Using the pre-loaded age-cohort US data set, we can run the function affine_fit() as follows:
data(mu_bar) # - Load the US data set
affine_fit(model="BS", fact_dep=FALSE, n_factors=3, data=mu_bar,
st_val=st_val, max_iter=200, tolerance=0.1)
During its execution, the console shows the value of the parameters, the log-likelihood function, and the iteration number:
[1] "X(0)_1 0.005" "X(0)_2 0.006" "X(0)_3 0.004"
[1] "delta_1 0.053" "delta_2 -0.018" "delta_3 -0.087"
[1] "kappa_1 0.032" "kappa_2 0.007" "kappa_3 -0.003"
[1] "sigma_1 0.001" "sigma_2 0.001" "sigma_3 0"
[1] "r1 0" "r2 0.548" "rc 0"
[1] "log_lik 10538.37"
[1] "Iteration 1"
[1] "———————————"
[1] "X(0)_1 0" "X(0)_2 0.007" "X(0)_3 0.007"
[1] "delta_1 0.043" "delta_2 -0.022" "delta_3 -0.086"
[1] "kappa_1 0.017" "kappa_2 0.002" "kappa_3 0.01"
[1] "sigma_1 0.001" "sigma_2 0.001" "sigma_3 0"
[1] "r1 0" "r2 0.556" "rc 0"
[1] "log_lik 10623.92"
[1] "Iteration 2"
5. Goodness of fit
The fitted rates, denoted by $\widehat{\bar{\mu }}_t$ ($N$-dimensional vectors, for $t=1,\ldots,K$), can be obtained using the function mubar_hat(), which takes the following arguments as input: model, fact_dep, n_factors, parameters, data. As for affine_fit(), the input parameters are supplied as a list. The resulting fitted rates can then be used to analyze the goodness of fit of each model with respect to historical data.
5.1 Numerical measures
AffineMortality considers four goodness-of-fit measures, following Blackburn & Sherris (2013) and Huang et al. (2022):
• Akaike information criterion (AIC, Akaike, 1974):
(5.1) \begin{align} \text{AIC} = - 2 \log L \left (\widehat{\psi } \mid \bar{\mu }_{1:K} \right ) + 2 k \end{align}
• Bayesian information criterion (BIC, Schwarz, 1978):
(5.2) \begin{align} \text{BIC} = - 2 \log L\left (\widehat{\psi } \mid \bar{\mu }_{1:K} \right ) + k \log \left ( KN \right ) \end{align}
• Root mean squared error (RMSE):
(5.3) \begin{align} \text{RMSE} = \sqrt{ \frac{1}{KN} \displaystyle \sum _x \displaystyle \sum _t \left ( \bar{\mu }_{x,t} - \widehat{\bar{\mu }}_{x,t} \right )^2 } \end{align}
• Mean absolute percentage error (MAPE, by age $x$):
(5.4) \begin{align} \text{MAPE}_x = \frac{1}{K} \displaystyle \sum _{t=1}^K \frac{|\bar{\mu }_{x,t} - \widehat{\bar{\mu }}_{x,t}|}{\bar{\mu }_{x,t}} \end{align}
where $k$ is the number of parameters, $K$ is the number of calendar years ($t=1,\ldots,K$), and $N$ is the number of ages considered in the analysis.
The AIC and BIC can be obtained as the output of affine_fit() (see Section 4.2). The RMSE is obtained by running RMSE(fitted, observed), where fitted is the $N \times K$ -dimensional matrix of the fitted rates obtained using the function mubar_hat(), and observed is the corresponding matrix of the $\bar{\mu }$ -rates used for parameter estimation. The function MAPE_row(fitted, observed) yields an $N$ -dimensional vector, which can be used to assess how the model fits at every age in the range of interest.
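The two error measures are straightforward to reproduce outside the package. The sketch below follows Equations (5.3) and (5.4), assuming that observed and fitted are $N \times K$ matrices; the helper names rmse_sketch() and mape_by_age() are hypothetical and do not replace the package functions RMSE() and MAPE_row().
rmse_sketch <- function(observed, fitted) sqrt(mean((observed - fitted)^2))   # Equation (5.3)
mape_by_age <- function(observed, fitted) rowMeans(abs(observed - fitted) / observed)   # Equation (5.4), one value per age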
A desirable characteristic of affine mortality models is that their parameters ensure that the probability of negative rates is negligible. This is a potential limitation of Gaussian models, since $X(t)$ can assume any real value.
For this purpose, the function prob_neg_mu() yields an $N$ -dimensional vector with the probability of negative mortality rates at each age (based on the input data set) for a specific $h$ -year ahead projection, based on the simulated values of $X(t)$ . prob_neg_mu() takes as input model, fact_dep, n_factors, parameters, data, years_proj and n_simulations (default value set to 100,000).
5.2 Residuals
The function std_res() returns an $N\times K$ -dimensional matrix of the standardized residuals for the model of interest. These are computed as $N$ -dimensional vectors for each year $t=1,\ldots,K$ as:
Further details about this formula can be found in Ungolo et al. (2023). A heatmap of the standardized residuals by age and year (Fig. 1) can be generated using the function heatmap_res():
std_resid <- std_res(model="BS", fact_dep=TRUE, n_factors=3,
parameters=pe_BSd_3F$fit$par_est, data=mu_bar)
heatmap_res(residuals=std_resid, color=FALSE)
In this way, we can visually detect the presence of period effects (if analyzing age-cohort data) or of cohort effects (if analyzing age-period data).
6. Projection
The function affine_project() returns the projected survival curve for $h$ time periods ahead (cohort or calendar year, depending on the analyzed data set), based on the optimal forecast of the average force of mortality under quadratic loss (Christensen et al., 2011):
The corresponding survival probability is given by
where the expectation is taken with respect to the real-world probability measure $P$ using equation (4.1). The function affine_project() is illustrated in Section 8 when analyzing the BS model with three dependent factors for the US dataset. Its input structure is similar to affine_fit(), with the additional argument years_proj corresponding to the $h$ time periods ahead for the projection.
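Under the state equation (4.1), the optimal $h$-step-ahead forecast of the latent factors is obtained by iterating the transition matrix. The following hedged sketch illustrates this idea, assuming a fitted transition matrix Phi and a final filtered state x_T (e.g. obtained from xfilter(), described in Section 9); affine_project() performs the corresponding computations internally.
h <- 10
x_proj <- x_T                                    # final filtered state of X
for (s in seq_len(h)) x_proj <- Phi %*% x_proj   # optimal forecast of X, h periods ahead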
7. Parameter uncertainty
A further source of risk when projecting cohort survival curves is the uncertainty inherited from the parameter estimation process.
The function par_cov() returns the variance-covariance matrix of the parameter estimates and their corresponding standard errors. It allows the user to choose between two methods for estimating parameter uncertainty, namely multiple imputation and bootstrap.
The first method, described in Ungolo et al. (2023), can be chosen by setting method="MI" within par_cov(). It is recommended for Gaussian affine mortality models. Briefly, it randomly imputes values of the latent state variable $X\left (t\right )$ sampled from the smoothing distribution evaluated at the parameter estimates. In this way, we obtain a set of “completed” data sets on which the parameters are then re-estimated. The number of completed data sets can be specified through the argument D_se. This method is useful because, on the one hand, it may not be possible to numerically compute the information matrix from the optimization of the likelihood function due to its very flat surface. On the other hand, the alternative bootstrap method (briefly described below) may be computationally expensive if carried out hundreds of times, as recommended in practice. The downside of multiple imputation is that, unlike the bootstrap, it may tend to underestimate the standard errors since it is a delta method. From a computational perspective, a potential downside is the need to invert a Hessian matrix of larger dimension, although this task is simpler than inverting the Hessian matrix from the estimation procedure (whose likelihood is marginalized with respect to the latent states). This method is neither recommended nor implemented for the CIR model, due to the lower truncation of the latent variable $X\left (t\right )$.
When parameter uncertainty is assessed by multiple imputation, the function par_cov() returns a list with two elements: Cov_par_est, the variance-covariance matrix of the parameters, and St_err, a list of the standard errors of the parameters.
The bootstrap method draws on the work of Stoffer & Wall (2009) and was used by Blackburn & Sherris (2013) in the context of affine mortality models. It can be implemented in AffineMortality by specifying the argument method="Bootstrap" in the function par_cov(). In brief, this method is an iterative procedure that first computes the standardized innovations from the measurement errors, in order to obtain a bootstrapped dataset of average mortality rates; this dataset is then used to obtain a new set of parameter estimates. Once the desired number of bootstrap parameter estimates (argument BS_s) is obtained, their variance-covariance matrix is computed. In this case, the function par_cov() also returns the parameter estimates over the bootstrapped samples. The argument t_excl (default value set to 4, as suggested in Stoffer & Wall, 2009) sets the number of initial residuals (in terms of $t$) to exclude in order to adjust for the Kalman filter startup irregularities.
Both multiple imputation and bootstrap algorithms are initialized at the parameter estimates in order to better explore the likelihood surface in the neighborhood of $\widehat{\psi }$ . The use of both methods is illustrated in Section 8.
We recommend using the bootstrap for models with a large number of parameters, such as the AFGNS, and for the CIR model with any number of factors.
8. Illustration
AffineMortality provides the data set of the average age-cohort mortality rates for the US males aged 50–99 born in the years 1883–1915 analyzed in Ungolo et al. (2023), which can be loaded as follows:
data(mu_bar)
This dataset will be used for illustrating the analysis of the BS model with three dependent factors.
For this age-cohort dataset, the cell at the intersection of the row corresponding to age $y$ and the column corresponding to birth year $t$ contains the average of the mortality rates for that cohort between the base age 50 and age $y$.
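A quick, hedged way to inspect this structure is shown below; we assume here that the matrix carries the ages as row names, as also used for plotting later in this section.
data(mu_bar)
dim(mu_bar)              # number of ages (rows) by number of birth cohorts (columns)
rownames(mu_bar)[1:5]    # row labels are the ages, starting from the base age 50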
Parameter estimation
A critical aspect of the analysis of affine mortality models is the specification of the starting values for the algorithm. As emphasized in Ungolo et al. (2023) and Blackburn & Sherris (2013), the likelihood function can have multiple local optima; hence, the fitting algorithm should be initialized several times. In AffineMortality, we provide a set of starting values (sv_default, in an R list format), which can be used by the researcher for a first exploration of the models.
For example, to get the default starting values for the BS model with dependent factors, we can run the following code:
starting_values <- sv_default$BSd
For the other supported models, we can use sv_default$ followed by:
• BSi for the BS model with up to four factors;
• AFNSi and AFNSd for the AFNS model with independent and dependent factors, respectively;
• GMki and GMkd for the Gompertz-Makeham model with independent and dependent factors, respectively;
• AFGNSi and AFGNSd for the AFGNS model with independent and dependent factors, respectively (similarly for the AFRNS and AFUNS models);
• CIR for the CIR model with up to four factors.
As highlighted in Section 4.1, a general recommendation when analyzing models with dependent factors is to initialize affine_fit() with the parameter estimates of the corresponding independent factor models, and to set the starting values of the additional parameters (such as the off-diagonal elements of the covariance matrix of $X(t)$ and of the mean reversion matrix $\Delta$ ) to zero.
We can thus estimate the parameters with affine_fit() as follows:
pe_BSd_3F <- affine_fit(model="BS", fact_dep=TRUE, n_factors = 3,
data=mu_bar, st_val=starting_values, max_iter = 5, tolerance = 0.1,
wd="working_folder_directory")
In practice, we run the fitting process for a larger number of iterations. For example, in Ungolo et al. (2023), the authors set max_iter = 200.
Goodness-of-fit analysis
Once we obtain the object pe_BSd_3F, we run the following lines to obtain the AIC and the BIC
>pe_BSd_3F$AIC
-21405.58
>pe_BSd_3F$BIC
-21292
This output can be used to make comparisons between different models. Suppose we want to compare it with the AFNS model with independent factors; we then run the following lines:
pe_AFNSi <- affine_fit(model="AFNS", fact_dep=FALSE, st_val=sv_default$AFNSi,
data=mu_bar, max_iter = 5, tolerance = 0.1)
>pe_AFNSi$AIC
-20842.76
>pe_AFNSi$BIC
-20772.45
Since the BS model with dependent factors has smaller AIC and BIC values than the AFNS model with independent factors, we conclude that the former provides a better in-sample fit.
The fitted average mortality rates can then be obtained using mubar_hat():
fitted_BSd <- mubar_hat(model="BS", fact_dep=TRUE, n_factors = 3,
parameters=pe_BSd_3F$fit$par_est, data=mu_bar)
The fitted average mortality rates obtained using mubar_hat() can be used to calculate the RMSE and the MAPE for each age in the range of interest:
>RMSE(mu_bar, fitted_BSd)
[1] 0.002011956
MAPE_age(mu_bar, fitted_BSd)
which can be similarly used for comparing the models.
Furthermore, we can obtain the fitted $\mu$ -rates by using the function avg2rates() described in Section 2:
avg2rates(fitted_BSd)
To verify that the probability of negative rates for the cohort projected 10 years ahead is negligible for the model under analysis, we run the function prob_neg_mu():
>prob_neg_mu(model="BS", fact_dep=TRUE, n_factors = 3,
parameters=pe_BSd_3F$fit$par_est, data=mu_bar, years_proj = 10,
n_simulations = 1000)
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
This vector of zeros indicates that the fitted model has a zero estimated probability of yielding negative rates for the cohort born in 1925.
Projection
The vector of the 1-year ahead (hence for the US males born in 1916) survival rates for the age-range of interest can be obtained by using the affine_project() function and then plotted (Fig. 2) as follows:
BSd_3F_proj <- affine_project(model="BS", fact_dep=TRUE, n_factors = 3,
parameters=pe_BSd_3F$fit$par_est, data=mu_bar, years_proj = 1)
plot(rownames(mu_bar), BSd_3F_proj, type="l", ylab = "S(t)", xlab = "Age")
Parameter uncertainty
The estimation of the parameter uncertainty by multiple imputation through the function par_cov() described in Section 7 can be carried out as follows:
par_unc_MI <- par_cov(method="MI", model="BS", fact_dep=TRUE, n_factors = 3,
parameters=pe_BSd_3F$fit$par_est, data=mu_bar, D_se = 5, max_iter = 10,
tolerance = 0.1, wd = 0)
We recommend setting the number of imputations D_se to at least 50 to obtain a more robust estimate of the parameter uncertainty. In this illustration, we obtain the following set of standard errors of the parameters as an R list object:
>par_unc_MI$St_err
$delta
[1] 0.0038258337 0.0738215285 0.0024289977 0.0529585515 0.0020585681 0.0004005845
$kappa
[1] 0.041057554 0.010218757 0.004028687
$sigma_dg
[1] 1.850683e-06 5.515050e-05 4.927546e-05
$Sigma_cov
[1] 0.000446849 0.015596337 0.005928911
$r1
[1] 8.016628e-16
$r2
[1] 0.005309632
$rc
[1] 1.330768e-09
Similarly, we can also estimate the parameter uncertainty by using the bootstrap method:
par_unc_Bts <- par_cov(method="Bootstrap", model="BS", fact_dep=TRUE,
n_factors = 3, parameters=pe_BSd_3F$fit$par_est, data=mu_bar, t_excl = 4,
BS_s = 5, max_iter = 3, tolerance = 10, wd = 0)
When using the bootstrap in practice, Blackburn & Sherris (2013) set the number of bootstrap samples BS_s equal to 500, while the default value of the number of residuals to be excluded, t_excl = 4, follows from Stoffer & Wall (2009). As for the function affine_fit(), the arguments max_iter and tolerance are usually set to 200 and 0.1, respectively, although the user can set any desired value.
9. Other functions
The function xfilter() has the same input structure as mubar_hat() and returns a list with the conditional means and covariances of the filtering distribution of the latent variable, as obtained from the application of the univariate Kalman filter of Koopman & Durbin (2000). More precisely, the matrix X_t contains the values of $\mathbb{E}\left [X\left (t\right ) \mid \bar{\mu }_{1:t}\right ]$ (time-update step), for $t=0,\ldots,K$, while the matrix X_t_c contains the values of $\mathbb{E}\left [X\left (t\right ) \mid \bar{\mu }_{1:t-1}\right ]$ (forecasting step). S_t and S_t_c are the corresponding covariance matrices of $X\left (t\right )$. For a brief illustration, this can be run as follows:
X_filtered <- xfilter(model="BS", fact_dep=TRUE, n_factors = 3,
parameters=pe_BSd_3F$fit$par_est, data=mu_bar)
The function xsmooth() implements the Rauch-Tung-Striebel (Rauch et al., 1965) smoothing procedure to obtain the conditional mean and covariance matrix of the distribution of $X\left (t\right )$ conditional on $\bar{\mu }_{1:T}$, that is, the entire time series of the observations. It takes as input the results from the xfilter() function and the value of the parameter $\mathbf{\kappa }$ driving the dynamics of the SDE illustrated in Section 3 under the real-world probability measure.
X_smoothed <- xsmooth(filterobject=X_filtered,
kappa=pe_BSd_3F$fit$par_est$kappa)
10. Conclusion and further developments
This paper describes the AffineMortality R package, which allows the user to estimate, compare, project, and assess the uncertainty of affine mortality models. These models can be analyzed from an age-period as well as from an age-cohort perspective. The package can support researchers in answering a wide range of questions involving stochastic mortality, including the pricing of mortality-contingent securities (Xu et al., 2020a, 2020b), the risk management of mortality-contingent products, the assessment of the natural hedging of life insurance policies and life annuities (Blackburn et al., 2017; Sherris et al., 2020), as well as the design of innovative mortality pooling products.
The authors plan to further expand the range of models that can be fitted, for example with the Squared Gaussian model used in interest rate modeling (Leippold & Wu, 2002), and to incorporate other features, such as cohort-specific factors and other age-dependent models. Further additions will encompass the possibility to account for incomplete cohort data and the inclusion of models whose mean-reversion parameter is nonzero.
Another strand of future development of AffineMortality is the possibility of using alternative optimization methods, such as the Subplex algorithm of Rowan (1990), which is available in R through the package nloptr (Johnson, 2020).
Acknowledgments
The authors are grateful to Dr. Patrick J. Laub for his valuable help and insights in making the package available.
Data availability statement
The data and code supporting the findings of this study are openly available on GitHub at ungolof/AffineMortality. The results contained in the manuscript are reproducible, excluding environment-specific numerical errors. The ungolof/AffineMortality repository is registered with the Zenodo DOI https://doi.org/10.5281/zenodo.10828419.
Funding statement
Francesco Ungolo acknowledges financial support from ERGO Center of Excellence in Insurance. Michael Sherris and Yuxin Zhou acknowledge financial support from the Society of Actuaries Center of Actuarial Excellence Research Grant 2017-2020: Longevity Risk: Actuarial and Predictive Models, Retirement Product Innovation, and Risk Management Strategies; and from the Australian Research Council Discovery Grant DP170102275 Retirement Product Innovation. MS, Len Patrick Garces, and YZ also acknowledge support provided by the Australian Research Council Centre of Excellence in Population Ageing Research project number CE170100005.
Appendix
A. Univariate Kalman Filtering
Following the outline in Ungolo et al. (2023), the $i$th element of the vector $\bar{\mu }_t$ can be written as:
Hence, the state equation corresponding to each observation $\bar{\mu }_{t,i}$ is:
for $i=1,\ldots,N-1$ and $t=1,\ldots,K$, given initial state $x_{0,N}=X\left (0\right )$. Let $\bar{\mu }_{1:t} = \left [\bar{\mu }_{1}, \ldots,\bar{\mu }_{t} \right ]$ and $\bar{\mu }_{t,1:i} = \left [\bar{\mu }_{t,1}, \ldots,\bar{\mu }_{t,i} \right ]$.
Given initial state $x_{0,N}\;:\!=\;X\left (0\right )$ and initial conditional covariance $\Sigma _{0,N} = \text{diag}\left (10^{-10},\ldots,10^{-10} \right )$:
1. Forecasting ($i=1$ only):
(A.3) \begin{align} \widehat{x}_{t,1} &=\mathbb{E}\left (x_{t,1} \mid \bar{\mu }_{1:t-1} \right ) = \Phi \widehat{x}_{t-1,N}, \\[5pt] \widehat{\Sigma }_{t,1} &=\mathbb{V}\left (x_{t,1} \mid \bar{\mu }_{1:t-1} \right ) = \Phi \widehat{\Sigma }_{t-1,N} \Phi ^\top + R;\nonumber \end{align}
2. Time-update ($i=1,\ldots,N-1$ on the left-hand side):
(A.4) \begin{align} \widehat{x}_{t,i+1} &= \mathbb{E}\left (x_{t,i+1} \mid \bar{\mu }_{1:t-1}, \bar{\mu }_{t,1:i} \right ) = \widehat{x}_{t,i} + K_{t,i} \nu _{t,i}, \\[5pt] \widehat{\Sigma }_{t,i+1} &=\mathbb{V}\left (x_{t,i+1} \mid \bar{\mu }_{1:t-1}, \bar{\mu }_{t,1:i} \right ) = \widehat{\Sigma }_{t,i} - K_{t,i} F_{t,i} K_{t,i}^\top \nonumber \\[5pt] &= \left (I - K_{t,i} b_i \right )\widehat{\Sigma }_{t,i}\left (I - K_{t,i} b_i \right )^\top + K_{t,i} \omega ^2_i K_{t,i}^\top,\nonumber \end{align}
where the scalar quantities $\nu _{t,i}$ and $F_{t,i}$, and the $M \times 1$-dimensional vector $K_{t,i}$, are given by
(A.5) \begin{align} \nu _{t,i} &= \bar{\mu }_{t,i} - a_i - b_{i} \widehat{x}_{t,i}, \\[5pt] F_{t,i} &= b_i \widehat{\Sigma }_{t,i} b_i^\top + \omega ^2_i, \nonumber \\[5pt] K_{t,i} &= \widehat{\Sigma }_{t,i} b_i^\top F^{-1}_{t,i}.\nonumber \end{align}
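For illustration, a simplified, self-contained R sketch of this recursion is given below. It takes the system matrices as inputs, returns the filtered states and the Gaussian log-likelihood of Equation (4.5), and, for simplicity, updates on every observation $i=1,\ldots,N$ within each year. It is a didactic sketch (the function name ukf_sketch() is hypothetical) and not the package's internal routine, which includes additional numerical safeguards.
ukf_sketch <- function(mu_bar, Phi, R, A, B, omega2, x0) {
  # mu_bar: N x K matrix of average mortality rates
  # Phi, R: state transition matrix and state noise covariance (M x M)
  # A, B:   measurement loadings (N-vector and N x M matrix); omega2: N-vector of measurement variances
  N <- nrow(mu_bar); K <- ncol(mu_bar); M <- length(x0)
  x <- matrix(x0, ncol = 1)
  S <- diag(1e-10, M)                                # initial conditional covariance
  loglik <- 0
  X_filt <- matrix(NA, M, K)
  for (t in seq_len(K)) {
    x <- Phi %*% x                                   # forecasting step
    S <- Phi %*% S %*% t(Phi) + R
    for (i in seq_len(N)) {                          # univariate time-update steps
      b  <- matrix(B[i, ], nrow = 1)
      nu <- mu_bar[i, t] - A[i] - drop(b %*% x)      # measurement error nu_{t,i}
      Fi <- drop(b %*% S %*% t(b)) + omega2[i]       # prediction variance F_{t,i}
      Kg <- S %*% t(b) / Fi                          # Kalman gain K_{t,i}
      loglik <- loglik - 0.5 * (log(2 * pi) + log(Fi) + nu^2 / Fi)
      x <- x + Kg * nu
      S <- S - Kg %*% t(Kg) * Fi
    }
    X_filt[, t] <- x
  }
  list(loglik = loglik, X_filtered = X_filt)
}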
B. Factor Loading Expressions
B.1 Gompertz-Makeham model
In the independent-factor Gompertz-Makeham model, the factor loadings are given by
In the dependent factor case, we have
where, for convenience, we define
B.2 Arbitrage-Free Reduced Nelson-Siegel (AFRNS) model
In the independent factor case, the factor loadings are given by
On the other hand, in the dependent factor case, we have
B.3 Arbitrage-Free Unrestricted Nelson-Siegel (AFUNS) model
In the independent factor case, we have the following expressions for the factor loadings
Meanwhile, the expression for $A(t,T)$ in the dependent factor case is given by
where we have