1. Introduction
Defined contribution (DC) pension plans constitute a key part of pension systems in many countries; examples include 401(k) plans in the United States and personal pensions in the United Kingdom. In a DC plan, each participant owns her account and regularly pays fixed contributions to it, and the accumulated contributions are invested in a financial market. The amount of retirement benefits is determined by the market value of the individual account at the retirement date. By design, DC plans have advantages over traditional pension schemes such as pay-as-you-go (PAYG) and defined benefit (DB) pension plans in terms of transparency, fairness, portability, and sustainability, which has encouraged their spread.
However, DC plans have major issues arising from the fact that each participant bears the investment risk by herself. One issue is that the participant may not choose a good investment strategy due to a lack of financial expertise. Indeed, a typical DC plan participant tends to perform naive “ $1/n$ diversification,” equally dividing her contributions among the default options provided by the plan (Benartzi and Thaler Reference Benartzi and Thaler2001). Optimal investment strategies studied in the literature (e.g., Merton Reference Merton1971; Cairns Reference Cairns1996; Cairns et al. Reference Cairns, Blake and Dowd2006; Boulier et al. Reference Boulier, Huang and Taillard2001; Vigna and Haberman Reference Vigna and Haberman2001; Chen and Delong 2015; Menoncin and Vigna Reference Menoncin and Vigna2017) thus cannot be easily employed by such participants.
Another major issue of DC plans is their incapability of intergenerational risk sharing (IRS), that is, the sharing across generations of risks that cannot be diversified within a single generation (e.g., those caused by economic shocks). For example, a DC plan participant whose accumulation period overlaps an economic depression faces a financial risk that cannot be diversified by herself; even if she chooses an optimal utility-maximizing investment strategy, she may not accumulate enough wealth for retirement. IRS makes it possible to diversify such non-diversifiable risks by sharing them among different generations. By definition, however, IRS requires the pension scheme to be collective, and thus individual DC plans cannot implement IRS.
It is well known that carefully designed collective schemes can implement IRS to improve the welfare of participants (e.g., Gordon and Varian Reference Gordon and Varian1988; Allen and Gale Reference Allen and Gale1997; Shiller Reference Shiller1999). Gollier (Reference Gollier2008) shows that a collective DC pension plan can improve the welfare of participants compared with an individual benchmark using the optimal life cycle investment strategy. Similarly, Cui et al. (Reference Cui, De Jong and Ponds2011) study a collective DB-based hybrid pension scheme where both contributions and pension benefits may be adjusted and show that it is welfare-improving compared with an optimal individual benchmark. Chen et al. (Reference Chen, Beetsma, Ponds and Romp2016) consider a three-pillar pension system in which the second pillar is a collective hybrid plan and show that there is welfare improvement compared with a corresponding individual DC benchmark. See Barr and Diamond (Reference Barr and Diamond2008, Chapter 7) and Beetsma and Romp (Reference Beetsma and Romp2016) for an overview of IRS and further references.
While the seminal work of Gollier (Reference Gollier2008) shows that IRS in a collective DC pension system is welfare-improving, both his first-best and second-best strategies rely on rather strong assumptions. The first-best strategy attains IRS by enhancing the risk-taking capacity of the pension fund: the net present value of the contributions from all future generations is treated as part of the fund’s total wealth. Consequently, the fund can invest more than its actual wealth, that is, the fund can borrow to invest, similar to the life cycle investment strategy for an individual investor (Merton Reference Merton1971). However, borrowing is not realistic for pension funds in practice. On the other hand, his second-best strategy does not allow borrowing but assumes the existence of an outside entity (shareholders) that helps finance the fund. Therefore, it is not clear whether the welfare improvement by the second-best strategy can be attained without the shareholders.
Given these limitations of Gollier’s analyses, one may ask: Can the welfare improvement by IRS be attained by a fully funded, collective DC pension system with a realistic investment strategy (e.g., no borrowing) and without an outside entity that helps finance the pension fund? Previous related works do not exactly answer this question, as discussed later in detail. For example, while Chen et al. (Reference Chen, Beetsma, Ponds and Romp2016) show that their collective scheme with IRS is welfare-improving compared with the corresponding individual DC scheme, it is assumed that both the collective and individual schemes use the same investment strategy; therefore, it is not clear whether their IRS is welfare-improving as compared with the optimal individual investment strategy.
Our main aim is to investigate the above question. To this end, we consider a stylized model for a fully funded, collective DC pension fund with multiple overlapping generations, which we call the IRS-DC model. As in Gollier (Reference Gollier2008), each participant pays a fixed annual contribution to the pension fund, and the fund invests on behalf of the participants. Unlike the first-best strategy of Gollier (Reference Gollier2008), however, the fund is not allowed to borrow. Each participant has her own account in the fund, which accumulates her contributions and is indexed to the fund’s investment performance; the account value at the retirement date determines her pension benefit. The indexation rate of individual accounts is automatically adjusted to the (notional) funding ratio of the pension fund by the adjustment rule of Goecke (Reference Goecke2013). This automatic adjustment rule is the device for implementing IRS in our model. In contrast to the second-best strategy of Gollier (Reference Gollier2008), the fund is fully funded and does not rely on any external entity to implement IRS.
We analyze how the automatic adjustment rule stabilizes the funding ratio and the benefits of participants to attain IRS. Analytic expressions are derived for the funding ratio and benefits. It is shown that there is a trade-off between the stability of the funding ratio and that of benefits, and that this trade-off is controlled by the strength of the automatic adjustment rule. That is, benefits can be made more stable by increasing the volatility of the funding ratio, and vice versa. IRS can be attained by balancing this trade-off.
Automatic adjustment rules in pension systems have not only been studied in the literature (e.g., Cui et al. Reference Cui, De Jong and Ponds2011; Chen et al. Reference Chen, Beetsma, Ponds and Romp2016; Bams et al. Reference Bams, Schotman and Tyagi2016; Donnelly Reference Donnelly2017) but also applied to real pension systems in countries such as Sweden and the Netherlands (OECD, 2021, Chapter 2). They are used for improving the sustainability of a pension fund and for providing stable benefits to participants (e.g., Settergren Reference Settergren2001; Barr and Diamond Reference Barr and Diamond2011). However, a formal analysis justifying such uses of automatic adjustment rules in collective pension systems is missing. Our analysis thus provides a first step in this regard.
The IRS-DC model has two parameters, one for the investment strategy and the other for the automatic adjustment rule. For optimizing these parameters, we define an expected utility maximization problem that involves the benefits of all the generations including those in the future, following Gollier (Reference Gollier2008). As this optimization problem cannot be solved analytically, we solve it numerically using Bayesian optimization (BO), a machine learning approach to optimizing a black-box function (e.g., Shahriari et al. Reference Shahriari, Swersky, Wang, Adams and De Freitas2016). As discussed later, the use of BO is our computational contribution, in line with the recent deployments of machine learning in the insurance literature (e.g., Hainaut Reference Hainaut2018; Gabrielli Reference Gabrielli2020; Wüthrich Reference Wüthrich2020; Scognamiglio Reference Scognamiglio2022; Schnürch and Korn Reference Schnürch and Korn2022).
Our main finding, which answers the question above, is that IRS can improve the welfare of participants without borrowing or shareholders if the financial market is volatile and the participants are risk-averse; IRS may not be welfare-improving otherwise. We compare the welfare of the IRS-DC plan participants with that of the corresponding individual DC plan participants, where the latter use the optimal life cycle strategy (Merton Reference Merton1971). We investigate several settings for the financial market and the risk aversion of participants and obtain the finding above.
The paper proceeds as follows. Section 2 introduces the IRS-DC pension model, which is analyzed in Section 3. Section 4 explains the expected utility maximization problem, how to solve it with BO, and the setup for simulations. Section 5 presents numerical analyses, including the funding ratio process, the individual benefit accounts of the IRS-DC fund, and the certainty equivalents of the participants. Section 6 concludes. The appendix contains a short tutorial on BO, the proofs of analytic results, and additional numerical analyses.
2. Pension model
This section describes the IRS-DC pension model. The pension fund contains multiple overlapping generations, where there are always incoming and outgoing generations. Each generation pays fixed contributions annually to the pension fund. Before explaining the details, we summarize below the key features of the pension fund:
(i) The pension fund collectively invests the contributions from different generations (participants) in a financial market.
(ii) Each participant maintains her account in the pension fund that records her accumulated pension rights.
(iii) The growth rate of individual accounts is automatically adjusted based on the fund’s investment performance and a notional funding ratio so that IRS is implemented.
Section 2.1 describes the IRS-DC pension model in detail. Section 2.2 compares it with related pension models in the literature.
2.1. Description of the IRS-DC pension model
Figure 1 provides a schematic illustration of our pension model. The pension fund covers N overlapping working generations in each operating year (e.g., $N = 40$ ). The pension fund is fully funded. For simplicity, we assume that each generation consists of one hypothetical participant. Let $t \geqslant 0$ denote the time, with the unit being 1 year. We assume that the fund starts at time $t = 0$ with N initial generations.
2.1.1. Generation identifier
We use an integer $i \in \mathbb{N}$ as the identifier of the generation that retires at time $t = i$ (see Figure 1). Namely, the generation i joins the fund at time $t = i- N$ and leaves the fund at time $t = i$; thus, this generation is in the fund for N years. Using this notation, we can define the set of all the working generations in the fund at any time point $t \geqslant 0$ as:
\begin{equation*} I_w(t) \;:\!=\; \{ [t]+1, [t]+2, \dots, [t]+N \}, \end{equation*}
where [t] denotes the integer part of t (e.g., if $t = 3.4$ then $[t] = 3$; if $t = 3$ then $[t] = 3$).
2.1.2. Financial market and pension asset dynamics
We consider a financial market where there exist two investment opportunities: a risky asset (e.g., a stock) and a risk-free asset (e.g., a bank account or a bond), denoted by S(t) and F(t), respectively. Specifically, we consider the Black–Scholes market, where S(t) is driven by a diffusion process with constant drift $\mu > 0$ and volatility $\sigma > 0$, while F(t) develops at a risk-free rate $r > 0$ such that $\mu > r$:
\begin{align} {\textrm{d}}S(t) & = \mu S(t)\, {\textrm{d}}t + \sigma S(t)\, {\textrm{d}}Z(t), \tag{2.1} \\ {\textrm{d}}F(t) & = r F(t)\, {\textrm{d}}t, \tag{2.2} \end{align}
where Z(t) is a standard Brownian motion under the real-world probability measure.
Let A(t) denote the asset of the pension fund at time t, with $A(0) > 0$ being the initial asset. For simplicity, we assume that the fund invests a constant fraction $\pi \in [0,1]$ of the pension asset A(t) in the stock S(t) and the rest $1-\pi$ in the risk-free asset F(t); thus, we can write the dynamics of the pension asset as:
\begin{equation} {\textrm{d}}A(t) = \pi A(t) \frac{{\textrm{d}}S(t)}{S(t)} + (1-\pi) A(t) \frac{{\textrm{d}}F(t)}{F(t)}. \tag{2.3} \end{equation}
We assume that the fund is prohibited from borrowing ($\pi>1$) and short selling ($\pi<0$).
At the beginning of each year $t = 0, 1, 2 \dots$ , each working generation pays a constant amount of contribution, $c > 0$ , to the pension fund. Since there are N working generations, the fund thus receives a total of Nc contributions at the beginning of each year. At the same time, the fund pays a lump-sum benefit to the generation t, who retires at time t.
The dynamics of the pension asset can thus be written as:
\begin{align} {\textrm{d}}A(t) & = \big( \pi \mu + (1-\pi) r \big) A(t)\, {\textrm{d}}t + \pi \sigma A(t)\, {\textrm{d}}Z(t), \quad t \in (0,\infty) \setminus \mathbb{N}, \tag{2.4} \\ A(t)_+ & = A(t) + Nc - B_t(t), \quad t = 0, 1, 2, \ldots, \tag{2.5} \end{align}
where (2.4) is obtained by substituting (2.1) and (2.2) into (2.3), $A(t)_+ \;:\!=\; \lim_{\varepsilon \to +0} A(t+\varepsilon)$ denotes the right-continuous limit, and $B_t(t)$ is the benefit paid to the generation t, who has just retired; it is defined through (2.6) and (2.7) below (the double-t notation of $B_t(t)$ is deliberate, and its meaning will become clear shortly).
Therefore, the pension asset develops continuously over time $t >0$, while there is a jump at each integer time $t = 0, 1, 2, \ldots$ (i.e., at the beginning of each year), when the incoming and outgoing cash flows of Nc and $B_t(t)$ occur, respectively.
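To make this timing concrete, the following minimal Python sketch evolves the pension asset over one year under a constant-mix strategy and then applies the annual cash flows. All variable names and parameter values are illustrative assumptions of ours, not quantities taken from the paper.

```python
import numpy as np

# Minimal sketch (our notation and parameter values): evolve the pension asset over one
# year under a constant-mix strategy pi in the Black-Scholes market, then apply the
# cash flows of the next integer time: N*c contributions in, retiring benefit out.

rng = np.random.default_rng(0)
mu, sigma, r = 0.06, 0.20, 0.02       # illustrative market parameters (assumed)
pi, N, c = 0.5, 40, 1.0               # investment fraction, working generations, contribution
A = 100.0                             # asset value right after this year's cash flows

# Continuous part: exact solution of the constant-mix dynamics (2.4) over one year.
log_return = (pi * mu + (1 - pi) * r - 0.5 * (pi * sigma) ** 2) + pi * sigma * rng.standard_normal()
A_year_end = A * np.exp(log_return)

# Jump part: contributions received and a hypothetical benefit paid to the retiring generation.
B_retiring = 55.0                     # placeholder value for the retiring benefit
A_after_cash_flows = A_year_end + N * c - B_retiring
print(A_year_end, A_after_cash_flows)
```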
2.1.3. Individual accounts and retirement benefits
Like pure DC and notional DC plans, each participant (generation) in our IRS-DC pension fund has her individual account, which keeps track of her pension rights. It records her annual contributions and grows according to the indexation rate (2.11) defined below. The terminal value of the account at the time of retirement becomes the lump-sum retirement benefit.
More formally, let $B_i(t)$ denote the individual account of generation $i \in \mathbb{N}$ at time $t \in [i-N, i]$, which starts from $B_i(i-N) = 0$ when this generation enters the fund at time $t = i-N$. Then, we define its dynamics as:
\begin{align} B_i(t)_+ & = B_i(t) + c, \quad t = i-N, i-N+1, \dots, i-1, \tag{2.6} \\ {\textrm{d}}B_i(t) & = g(t) B_i(t)\, {\textrm{d}}t, \quad t \in (i-N, i) \setminus \mathbb{Z}, \tag{2.7} \end{align}
where g(t) is the indexation rate at time t, defined in (2.11) below. Namely, the individual account $B_i(t)$ grows continuously from the entry time $t = i - N$ until retirement at $t = i$ according to the indexation rate g(t) as in (2.7), while the account accumulates the annual contribution c at the beginning of every year as in (2.6). The account value at the time of retirement $t = i$, that is, $B_i(i)$, is the retirement benefit of the generation i; see (2.5).
2.1.4. Indexation rate and notional liability
We now define the indexation rate g(t) that determines the growth rate of individual accounts as in (2.7), which in turn affects the pension’s asset dynamics in (2.5). To this end, following Bams et al. (Reference Bams, Schotman and Tyagi2016) and Donnelly (Reference Donnelly2017), we first define a notional liability of the fund as:
\begin{equation} L(t) \;:\!=\; \sum_{i \in I_w(t)} B_{i}(t). \tag{2.8} \end{equation}
That is, we define the notional liability L(t) of the fund at time t as the sum of the individual accounts $B_i(t)$ of the current working generations $i \in I_w(t)$.
If there is no cash flow in (2.5), the solution to the asset process (2.4) is given by a stochastic exponential, which is a classic result in financial mathematics (e.g., Karatzas and Shreve Reference Karatzas and Shreve1991, Chapter 5):
\begin{equation} A(t) = A(0) \exp\!\left( \tilde{\mu} t + \tilde{\sigma} Z(t) \right), \tag{2.9} \end{equation}
where $\tilde{\mu} > 0$ and $\tilde{\sigma} > 0 $ are constants defined as:
\begin{equation} \tilde{\mu} \;:\!=\; \pi \mu + (1-\pi) r - \frac{\pi^2 \sigma^2}{2}, \qquad \tilde{\sigma} \;:\!=\; \pi \sigma. \tag{2.10} \end{equation}
Following Goecke (Reference Goecke2013, Equation (5)),Footnote 1 we then define the indexation rate g(t) as:
\begin{equation} g(t) \;:\!=\; \tilde{\mu} + \theta \ln\! \left( \frac{A(t)}{L(t)} \right), \tag{2.11} \end{equation}
where $\theta > 0$ is a constant, $\tilde{\mu}$ is from (2.10), A(t) is the pension fund’s asset process (2.4)–(2.5), and L(t) is the notional liability (2.8).
Figure 2 provides a schematic illustration of the cash flows in our pension model for two time points $t = i$ and $t = i + 1$ . In the following, let us take a closer look at the role of various important factors.
2.1.5. The role of the indexation rate
While we will present a more formal analysis in Section 3, we provide here an intuitive discussion of how the indexation rate (2.11) works. As defined in (2.7), the indexation rate g(t) controls the growth rate of individual accounts $B_i(t)$ . The first term $\tilde{\mu}$ in (2.11) is the expected annual log return using the same investment strategy $\pi$ in the financial market without participating in the pension fund. In the second term, $A(t) / L(t)$ is the notional funding ratio that quantifies the balance between the asset A(t) and the notional liability L(t). The second term adjusts the growth rate of individual accounts $B_i(t)$ , and the parameter $\theta \geqslant 0$ specifies the strength of the adjustment.
If $\theta = 0$, then individual accounts grow deterministically at the rate $\tilde{\mu}$; in this case, the retirement benefits are determined ex ante, that is, the pension plan becomes a DB plan, which removes the investment risk of the pension participants. However, this choice risks the sustainability of the pension fund: the fund is of a DC type by design, so there is no way to adjust the contributions when the fund is underfunded.
On the other hand, if one sets a large value of $\theta > 0$, then pension participants bear more investment risk, which improves the sustainability of the pension fund. For example, suppose that the pension asset exceeds the notional liability, that is, $A(t) > L(t) = \sum_{i \in I_w(t)}B_{i}(t)$. One can interpret this situation as the fund having earned a high investment return, so that there is a “surplus.” Then the log notional funding ratio is positive, $\ln (A(t) / L(t)) > 0$, and the indexation rate g(t) is larger than $\tilde{\mu}$; therefore, individual accounts $B_i(t)$ grow faster, reflecting the high investment return. Conversely, suppose that $A(t) < L(t) = \sum_{i \in I_w(t)}B_{i}(t)$, which happens when the fund yields a low return, so that there is a “deficit.” In this case, we have $\ln (A(t) / L(t)) < 0$, and thus the indexation rate g(t) is smaller than $\tilde{\mu}$; therefore, individual accounts $B_i(t)$ grow more slowly, reflecting the low investment return.
This argument implies that the adjustment parameter $\theta$ should be neither too small nor too large. One should choose $\theta$ appropriately to achieve a good trade-off between the risks of individual participants and the pension fund. We present a more formal analysis in Section 3.
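As a complement to this discussion, the short Python sketch below evaluates the indexation rate for a deficit, a balanced, and a surplus funding ratio, using our reading of (2.11) (the deterministic term $\tilde{\mu}$ plus $\theta$ times the log notional funding ratio); all parameter values are illustrative.

```python
import numpy as np

# Sketch of the indexation rate (our reading of (2.11)): g(t) = mu_tilde + theta * ln(A/L),
# where mu_tilde is the expected annual log return of the constant-mix portfolio and
# theta controls how strongly individual accounts are adjusted towards the funding ratio.

def indexation_rate(A, L, mu_tilde, theta):
    return mu_tilde + theta * np.log(A / L)

mu, sigma, r, pi = 0.06, 0.20, 0.02, 0.5                       # illustrative parameters (assumed)
mu_tilde = pi * mu + (1 - pi) * r - 0.5 * (pi * sigma) ** 2    # expected annual log return
theta = 0.5

for funding_ratio in (0.9, 1.0, 1.1):                          # deficit, balanced, surplus
    g = indexation_rate(A=funding_ratio, L=1.0, mu_tilde=mu_tilde, theta=theta)
    print(f"A/L = {funding_ratio:.1f}:  g = {g:+.4f}  (mu_tilde = {mu_tilde:.4f})")
```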
2.2. Comparison with related pension models
We compare the IRS-DC model with related pension models in the literature. Goecke (Reference Goecke2013) studies the indexation rate (2.11) for return smoothing in a self-financing pension plan. Goecke (Reference Goecke2013)’s model consists of only one generation, and there are no cash flows of contributions or payments. The earlier work by Baumann and Müller (Reference Baumann and Müller2008) considers the indexation rate (2.11) with the risk-free rate r used instead of $\tilde{\mu}$. Our model is a continuous-time version of the discrete-time model of Bams et al. (Reference Bams, Schotman and Tyagi2016), which itself is an extension of the overlapping generations model of Gollier (Reference Gollier2008). Bams et al. (Reference Bams, Schotman and Tyagi2016) use the indexation rate in (2.11) but do not study its use analytically. Donnelly (Reference Donnelly2017) considers a funded collective DC pension plan. Donnelly (Reference Donnelly2017)’s model consists of a fixed set of multiple overlapping generations, with no new incoming generations. Donnelly (Reference Donnelly2017) uses an automatic adjustment rule similar to Goecke (Reference Goecke2013)’s (and thus ours) but different in its concrete form.
Cui et al. (Reference Cui, De Jong and Ponds2011) consider a funded DB-based hybrid pension system that can adjust both benefits and contributions. Their pension model does not have individual accounts. The present value of base benefits and contributions are made equivalent ex ante, but there is no direct link between one’s actual benefits and contributions. Their model is DB-based in this sense. They use automatic adjustment rules for contributions and retirement benefits based on the funding ratio. Chen et al. (Reference Chen, Beetsma, Ponds and Romp2016) consider a hybrid pension plan as their second-pillar pension system, in which there exist individual accounts. They also use automatic adjustment rules for contributions and individual accounts’ indexation rates. While the adjustment rules of Cui et al. (Reference Cui, De Jong and Ponds2011) and Chen et al. (Reference Chen, Beetsma, Ponds and Romp2016) are conceptually similar to ours, they are different in their forms. For example, Chen et al. (Reference Chen, Beetsma, Ponds and Romp2016) use the “tangent hyperbolic adjustment function,” while our adjustment rule is based on the log notional funding ratio. Moreover, Cui et al. (Reference Cui, De Jong and Ponds2011) and Chen et al. (Reference Chen, Beetsma, Ponds and Romp2016) define the liabilities in a DB manner, taking into account future retirement benefits, while we define our notional liability in a DC manner, that is, as the sum of current individual account values. Again, our liability is notional, since the fund does not provide any promise on retirement benefits.
Automatic adjustment mechanisms have been implemented in real pension systems; see OECD (2021, Chapter 2) for an overview. Notably, Sweden’s first-pillar notional DC pension system uses an automatic adjustment rule for the indexation rate of individual accounts (Settergren Reference Settergren2001). This adjustment mechanism is based on a notional funding ratioFootnote 2 and is conceptually similar to the other rules discussed here (see, e.g., Hagen Reference Hagen2013, Equations (6.2) and (6.3)). The Swedish first-pillar notional DC pension system defines its notional liability essentially in the same way as (2.8) (Settergren Reference Settergren2001, Equation (3)).Footnote 3
3. Analysis
We present an analysis of the IRS-DC pension model in Section 2, focusing on the role of the indexation rate (2.11) for achieving IRS. In particular, we study how the adjustment parameter $\theta$ in the indexation rate impacts the funding ratio, which measures the stability of the pension fund, and the retirement benefits of individual participants.
In Section 3.1, we first study the dynamics of the log funding ratio. Based on this, we analyze the effects of the adjustment parameter on the dynamics of the funding ratio in Section 3.2 and on the retirement benefit of an individual participant in Section 3.3. In the latter, we obtain an analytic expression of the retirement benefit in terms of the log funding ratio and the adjustment parameter. Based on this expression, we compare the retirement benefits of the IRS-DC plan and the corresponding pure DC plan in Section 3.4. This last analysis provides insights into how IRS works in the IRS-DC plan.
3.1. Dynamics of the log funding ratio
We start by analyzing the dynamics of the log funding ratio defined as:
\begin{equation} \rho(t) \;:\!=\; \ln\! \left( \frac{A(t)}{L(t)} \right). \tag{3.1} \end{equation}
Goecke (Reference Goecke2013, Proposition A.1) shows that $\rho(t)$ is an Ornstein–Uhlenbeck process under the assumption that there exists no cash flow. Baumann and Müller (Reference Baumann and Müller2008, Section 3.1) obtain a similar result, but again assuming no cash flow. Since our model involves explicit cash flows as in (2.5), these earlier results are not directly applicable. Nevertheless, we show here that $\rho(t)$ in our model is also an Ornstein–Uhlenbeck process when the time t is between integer time points. (Recall that cash flows in our model occur only at integer time points; see (2.5).) This result and the intermediate derivations are used later for deriving further results, so we present them here for completeness.
Let $t_0 \in \mathbb{N}$ be an arbitrary integer time point, which corresponds to the beginning of a year. Then by (2.4), with $A(t_0)_+$ being the initial value after the contributions, the asset process A(t) for $t_0 < t < t_0 + 1$ is written as:
\begin{equation} A(t) = A(t_0)_+ \exp\!\left( \int_{t_0}^t \tilde{\mu}\, {\textrm{d}}s + \tilde{\sigma} \int_{t_0}^t {\textrm{d}}Z(s) \right). \tag{3.2} \end{equation}
Notice the difference from the previous expression (2.9), which starts from $t = 0$ and holds only under the assumption that there exists no cash flow. Similarly, by (2.7), (2.8), and (2.11), the notional liability L(t) is given as:
\begin{equation*} L(t) = L(t_0)_+ \exp\!\left( \int_{t_0}^t g(s)\, {\textrm{d}}s \right) = L(t_0)_+ \exp\!\left( \int_{t_0}^t \tilde{\mu}\, {\textrm{d}}s + \theta \int_{t_0}^t \rho(s)\, {\textrm{d}}s \right). \end{equation*}
Then for $t_0 < t < t_0 + 1$, the log funding ratio $\rho(t)$ can be expanded as:
\begin{align} \rho(t) & = \ln A(t) - \ln L(t) = \rho(t_0)_+ + \int_{t_0}^t \tilde{\mu}\, {\textrm{d}}s + \tilde{\sigma} \int_{t_0}^t {\textrm{d}}Z(s) - \int_{t_0}^t \tilde{\mu}\, {\textrm{d}}s - \theta \int_{t_0}^t \rho(s)\, {\textrm{d}}s \tag{3.3} \\ & = \rho(t_0)_+ - \theta \int_{t_0}^t \rho(s)\, {\textrm{d}}s + \tilde{\sigma} \int_{t_0}^t {\textrm{d}}Z(s). \tag{3.4} \end{align}
The last expression is obtained because the two identical terms $\int_{t_0}^t \tilde{\mu}{\textrm{d}}s$ in (3.3) are canceled out. This is the result of the expected log return $\tilde{\mu}$ being used in defining the indexation rate (2.11), which in turn results in (3.2).
Equation (3.4) indicates that the log funding ratio $\rho(t)$ for $t_0 < t < t_0 + 1$ is an Ornstein–Uhlenbeck process with initial value $\rho(t_0)_+$ (e.g., Karatzas and Shreve Reference Karatzas and Shreve1998, p. 358), which can be written as:
\begin{equation} \rho(t) = e^{-\theta (t - t_0)} \rho(t_0)_+ + \tilde{\sigma} \int_{t_0}^t e^{-\theta (t - s)}\, {\textrm{d}}Z(s). \tag{3.5} \end{equation}
This expression shows that $\rho(t) = \ln (A(t) / L(t))$ is mean-reverting in the sense that, irrespective of the value of $\rho(t_0)_+$ , it tends to 0 (in expectation) as t increases. In other words, the funding ratio $A(t) / L(t)$ tends to 1 as t increases.
3.2. Effects of the adjustment parameter on the funding ratio
Based on the expression (3.5), we next study how the adjustment parameter $\theta$ affects the dynamics of the log funding ratio $\rho(t)$ . We summarize key observations in the following proposition, the proof of which can be found in Appendix A.1.
Proposition 1. Let $\rho(t) = \ln ( A(t) / L(t) )$ be the log funding ratio and $\theta > 0$ be the adjustment parameter of the indexation rate g(t). Let $t_0 \in \mathbb{N} \cup \{0 \}$ and $t_0 < t < t _0 + 1$ . Then we have the following:
(i) The conditional expectation and variance of $\rho(t)$ given $\rho(t_0)_+$ are given by:
\begin{align} \mathbb{E}[ \rho(t) \mid \rho(t_0)_+ ] & = e^{-\theta(t-t_0)} \rho(t_0)_+, \tag{3.6} \\ \mathbb{V}[\rho(t) \mid \rho(t_0)_+] & = {\tilde{\sigma}}^2 \int_{t_0}^{t} e^{- 2\theta (t - s)}\, {\textrm{d}}s, \tag{3.7} \end{align}
where $\tilde{\sigma} > 0$ is the standard deviation of the annual log return in (2.10).
(ii) As $\theta$ tends to zero, the conditional expectation of $\rho(t)$ given $\rho(t_0)_+$ tends to $\rho(t_0)_+$:
\begin{equation} \lim_{\theta\to +0}\mathbb{E}[ \rho(t) \mid \rho(t_0)_+ ]=\rho(t_0)_+, \tag{3.8} \end{equation}
and the conditional variance of $\rho(t)$ given $\rho(t_0)_+$ tends to ${\tilde{\sigma}}^2 (t-t_0)$:
\begin{equation} \lim_{\theta\to +0} \mathbb{V}[\rho(t) \mid \rho(t_0)_+]={\tilde{\sigma}}^2 (t-t_0). \tag{3.9} \end{equation}
(iii) As $\theta$ tends to infinity, the conditional expectation and variance of $\rho(t)$ given $\rho(t_0)_+$ tend to zero:
\begin{equation} \lim_{\theta\to \infty} \mathbb{E}[ \rho(t) \mid \rho(t_0)_+ ]=0, \quad \lim_{\theta\to \infty} \mathbb{V}[\rho(t) \mid \rho(t_0)_+]=0. \tag{3.10} \end{equation}
Proposition 1 shows how the adjustment parameter $\theta$ affects the notional funding ratio $A(t)/L(t)$ and thus the stability of the pension fund. Point (iii) shows that a larger $\theta$ lets $A(t)/L(t)$ approach 1 more quickly and thus makes the fund more stable, while point (ii) indicates that a smaller $\theta$ makes the fund more volatile. Recall that the value of $\theta$ determines how strong the adjustment in the indexation rate g(t) works for the individual accounts $B_i(t)$ ; see (2.11). Therefore, a larger $\theta$ results in a stronger adjustment of the individual accounts $B_i(t)$ so that the notional liability $L(t) = \sum_{i \in I_w(t)} B_i(t)$ is adjusted more quickly to match the fund’s asset A(t); this is an intuitive explanation of how a large $\theta$ improves the stability of the pension fund.
While a larger $\theta$ may be beneficial for the fund’s stability, it results in a stronger adjustment of the individual accounts $B_i(t)$ , which may make the retirement benefits volatile. Therefore, it is important to understand the effects of $\theta$ on the retirement benefits; we analyze this next.
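The conditional moments in Proposition 1 can also be checked numerically. The Python sketch below simulates the Ornstein–Uhlenbeck transition of the log funding ratio between two cash-flow dates and compares the Monte Carlo mean and variance with (3.6) and (3.7); the parameter values are illustrative.

```python
import numpy as np

# Numerical check of Proposition 1 (illustrative parameter values): simulate the exact
# Ornstein-Uhlenbeck transition of the log funding ratio over monthly sub-steps and
# compare the Monte Carlo moments after half a year with the analytic formulas (3.6)-(3.7).

rng = np.random.default_rng(1)
theta, sigma_tilde = 0.5, 0.10          # adjustment parameter and pi*sigma (assumed values)
rho0 = np.log(0.9)                      # log funding ratio right after the cash flows (10% deficit)
dt, n_steps, n_paths = 1.0 / 12, 6, 100_000

rho = np.full(n_paths, rho0)
for _ in range(n_steps):
    mean = rho * np.exp(-theta * dt)                                    # exact OU conditional mean
    var = sigma_tilde**2 * (1 - np.exp(-2 * theta * dt)) / (2 * theta)  # exact OU conditional variance
    rho = mean + np.sqrt(var) * rng.standard_normal(n_paths)

t = n_steps * dt  # elapsed time t - t_0 = 0.5, strictly within the year
print(rho.mean(), rho0 * np.exp(-theta * t))                                     # vs (3.6)
print(rho.var(), sigma_tilde**2 * (1 - np.exp(-2 * theta * t)) / (2 * theta))    # vs (3.7)
```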
3.3. Effects of the adjustment parameter on the pension benefits
We next study how the adjustment parameter $\theta$ affects the retirement benefit of each generation. To this end, we obtain an analytic expression of the retirement benefit in terms of the log funding ratio, as summarized in the following proposition. The proof can be found in Appendix A.2.
Proposition 2. Let $c > 0$ be the annual contribution, $\tilde{\mu} >0$ and $\tilde{\sigma} > 0$ be the mean and the standard deviation of the annual log return in (2.10), $\rho(t) = \ln (A(t) / L(t))$ be the log funding ratio, and $\theta > 0$ be the adjustment parameter of the indexation rate g(t) in (2.11). Then for generation $i \in \mathbb{N}$, the retirement benefit $B_i(i)$ is given by:
\begin{equation} B_i(i) = \sum_{n=1}^{N} c \exp\!\Bigg( \underbrace{n \tilde{\mu}}_{\textrm{(I)}} + \underbrace{(1 - e^{-\theta}) \sum_{\ell = 1}^{n} \rho(i-\ell)_+}_{\textrm{(II)}} + \underbrace{\tilde{\sigma} \int_{i-n}^{i} \big( 1 - e^{-\theta (\lceil s \rceil - s)} \big)\, {\textrm{d}}Z(s)}_{\textrm{(III)}} \Bigg), \tag{3.11} \end{equation}
where $\lceil s \rceil$ denotes the smallest integer that is greater than or equal to s.
Proposition 2 enables studying the effects of the adjustment parameter $\theta$ on the retirement benefit $B_i(i)$ of the i-th generation, who retires at time $t = i$ . The expression (3.11) consists of N terms, in which each term is indexed by $n = 1,\dots,N$ . (Recall that N is the total number of years each generation contributes to the fund). One can understand the n-th term in (3.11) as corresponding to the contribution c made at time $t = i - n$ , that is, n years before the retirement at time $t = i$ .
We can make the following observations for the exponent of the n-th term in (3.11):
- The term (I) corresponds to the deterministic growth term $\tilde{\mu}$ in the indexation rate g(t); see (2.11).
- The term (II) represents the effects of the fund’s “surplus” or “deficit” in the last n years before the retirement. One can understand that there is a “surplus” if $\sum_{\ell = 1}^n \rho(i-\ell)_+ > 0 $; in this case, the retirement benefit increases accordingly, as a redistribution of the surplus. On the other hand, there is a “deficit” if $\sum_{\ell = 1}^n \rho(i-\ell)_+ < 0 $, and the retirement benefit decreases accordingly; one can understand this as risk sharing to make the fund sustainable. The adjustment parameter $\theta$ determines the strength of the effects of this term, as we have $\lim_{\theta \to +0} (1 - e^{-\theta}) = 0$ and $\lim_{\theta \to \infty} (1 - e^{-\theta}) = 1$.
- The term (III) shows the effects of the volatility of the fund’s investment in the last n years before the retirement; recall the definition of ${\tilde{\sigma}} = \pi \sigma$ in (2.10). The adjustment parameter $\theta$ controls the influence of this volatility, as we have $\lim_{\theta \to +0} {\textrm{(III)}} = 0$ and $\lim_{\theta \to \infty} {\textrm{(III)}} = {\tilde{\sigma}} \int_{i-n}^i {\textrm{d}}Z(s)$.
From these observations, one can understand that the adjustment parameter $\theta$ determines how strongly the retirement benefit is linked to the fund’s actual investment performance. For a larger $\theta$ , the terms (II) and (III) become more significant, and the retirement benefit is more directly influenced by the fund’s investment performance. For a smaller $\theta$ , the terms (II) and (III) become less significant, and the retirement benefit is determined mainly by the deterministic growth term (I). This asymptotic analysis supports the informal discussion in Section 2.1.5 on the mechanism of the indexation rate.
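To give a sense of the magnitudes involved, the following snippet evaluates the pass-through factor $1 - e^{-\theta}$ appearing in term (II) for a few values of $\theta$ (the values are chosen by us purely for illustration).

```python
import numpy as np

# The factor (1 - e^{-theta}) in term (II) governs how much of the fund's surplus or
# deficit is passed through to a benefit; it moves from 0 (no pass-through, DB-like)
# to 1 (full pass-through, pure-DC-like) as theta grows.
for theta in (0.05, 0.2, 0.5, 1.0, 3.0):
    print(f"theta = {theta:4.2f}:  pass-through factor = {1 - np.exp(-theta):.3f}")
```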
One may conclude that a smaller $\theta$ may be more beneficial for individual participants, because it makes the retirement benefits less volatile. However, as discussed in Section 3.2, a smaller $\theta$ makes the fund’s operation more volatile, and thus a larger $\theta$ is more desirable for the fund’s sustainability. Therefore, $\theta$ should be neither too small nor too large. We will discuss how to select the adjustment parameter $\theta$ (and the investment strategy $\pi$ ) in Section 4.
3.4. Effects of intergenerational risk sharing
Lastly, we discuss the effects of IRS, by comparing the pension benefits of the IRS-DC plan and the corresponding pure DC plan. Because our focus is to understand how IRS works, we assume here that the pure DC plan uses the same investment strategy $\pi$ as the IRS-DC plan. (Note that, in our numerical analysis in Section 5, we consider this setting as well as the setting where the pure DC plan uses the optimal investment strategy.)
Consider two hypothetical individuals from generation $i \in \mathbb{N}$, who retire in the year i. One individual participates in the IRS-DC plan and receives the retirement benefit (3.11). The other participates in the pure DC plan using the investment strategy $\pi$ and receives the retirement benefit denoted by $A_i(i)$. It is easy to see that $A_i(i)$ is given by:
\begin{equation} A_i(i) = \sum_{n=1}^{N} c \exp\!\Bigg( \underbrace{n \tilde{\mu}}_{\textrm{(I')}} + \underbrace{\tilde{\sigma} \int_{i-n}^{i} {\textrm{d}}Z(s)}_{\textrm{(II')}} \Bigg). \tag{3.12} \end{equation}
By comparing (3.11) and (3.12), we can make the following observations:
- The term (I) in (3.11) and the term (I’) in (3.12) are the same.
- The term (II) in (3.11), which represents the effects of the fund’s surplus or deficit, does not exist in (3.12). This is reasonable, because there is no IRS in the pure DC plan.
- The term (III) in (3.11), which shows the influence of the volatility of the investment, corresponds to the term (II’) in (3.12). Indeed, the term (III) converges to the term (II’) as $\theta \to \infty$. However, the term (III) has a smaller volatility than the term (II’) for any finite value of $\theta$. This smaller volatility in (3.11) is the result of IRS and is controlled by the adjustment parameter $\theta$.
This comparison describes how IRS works in the IRS-DC plan: IRS reduces the volatility of investment returns (term (III) in (3.11)), by letting the individuals share the fund’s surplus or deficit (term (II) in (3.11)). This effect of IRS is particularly important for protecting individual participants when the market is turbulent. Our numerical analysis in Section 5 shows that IRS is beneficial in this way.
4. Optimizing the investment strategy and adjustment parameter
We describe how to optimize the parameters of the IRS-DC pension model, namely the investment strategy $\pi$ and the adjustment parameter $\theta$ , so as to maximize the welfare of pension participants. In Section 4.1, we first introduce an expected utility maximization problem that involves the welfare of all the generations including those from the future. Since there is no analytical solution for this maximization problem, we next explain how to solve it numerically using BO in Section 4.2. We then describe the setting of simulations in Section 4.3, which will be used later in our numerical analysis.
4.1. Expected utility maximization problem
We consider a hypothetical social planner (fund manager) who decides the investment strategy $\pi$ and the adjustment parameter $\theta$ for the welfare of all the generations. To define the utility of this social planner, let $U_\gamma: (0, \infty) \to (-\infty, \infty)$ be the constant relative risk aversion (CRRA) utility function:
\begin{equation} U_\gamma(x) \;:\!=\; \begin{cases} \dfrac{x^{1-\gamma}}{1-\gamma}, & \gamma \neq 1, \\[2mm] \ln x, & \gamma = 1, \end{cases} \tag{4.1} \end{equation}
where $\gamma > 0$ is the level of relative risk aversion. We then define the utility of the social planner as the sum of discounted utilities of the retirement benefits for all the generations:
\begin{equation} \sum_{i=1}^{\infty} \beta^{i}\, U_\gamma\big( B_i(i) \big), \tag{4.2} \end{equation}
where $\beta > 0$ is a discounting factor and $B_i(i)$ is the retirement benefit of the i-th generation who retires at time $t = i$ ; see Figure 2, (2.6) and (2.7).
Lastly, we define our expected utility maximization problem as:
\begin{equation} \max_{\pi \in [0,1],\ \theta > 0}\ \mathbb{E}\!\left[ \sum_{i=1}^{\infty} \beta^{i}\, U_\gamma\big( B_i(i) \big) \right], \tag{4.3} \end{equation}
where the expectation is with respect to the retirement benefits $B_i(i)$ for all generations $i \in \mathbb{N}$ . Recall that $B_i(i)$ are path-dependent and depend on the investment strategy $\pi$ and the adjustment parameter $\theta$ .
We numerically solve the maximization problem (4.3), since neither the expected utility in (4.3) nor the solution for $\pi$ and $\theta$ are available in closed form. We approximate the expected utility in (4.3) by Monte Carlo simulations and optimize $\pi$ and $\theta$ using BO, as explained next.
4.2. BO for expected utility maximization
We briefly explain here how we use BO for solving the expected utility maximization problem (4.3). For details, see Appendix B and references therein. BO is a modern machine learning approach for globally optimizing a black-box objective function and has been shown to be more efficient than traditional approaches such as grid search (Shahriari et al., Reference Shahriari, Swersky, Wang, Adams and De Freitas2016). It has been widely used in applications where the objective function is computationally expensive to evaluate, such as the optimization of hyperparameters of a large-scale AI model (Snoek et al. Reference Snoek, Larochelle and Adams2012). The current work is the first attempt to apply BO in optimizing a pension system.
The objective function in (4.3) takes $\pi$ and $\theta$ as inputs and outputs the expected utility:
\begin{equation*} f(\pi, \theta) \;:\!=\; \mathbb{E}\!\left[ \sum_{i=1}^{\infty} \beta^{i}\, U_\gamma\big( B_i(i) \big) \right], \end{equation*}
where we note again that $B_i(i)$ depends on $\pi$ and $\theta$. The key idea of BO is to “learn” the landscape of the objective function $f(\pi, \theta)$ while searching for the $\pi$ and $\theta$ that maximize the objective function. BO first evaluates the function values $f(\pi, \theta)$ for some initial candidates of $\pi$ and $\theta$ and obtains a rough estimate of the landscape of $f(\pi, \theta)$. In the next step, BO finds $\pi$ and $\theta$ for which the estimated function value $f(\pi,\theta)$ and its uncertainty are both high, so as to balance the so-called exploitation–exploration trade-off. BO then evaluates $f(\pi, \theta)$ for these $\pi$ and $\theta$ and updates the estimate of the landscape of $f(\pi, \theta)$. BO iterates this learning-and-optimization procedure. Estimates of the maximizers $(\pi^*, \theta^*) = \arg \max f(\pi, \theta)$ are obtained after a sufficient number of iterations (Bull Reference Bull2011).
The above procedure is called “Bayesian” because the learning of the objective function is done by a Bayesian nonparametric method (Rasmussen and Williams, Reference Rasmussen and Williams2006). The Bayesian method is used because it can yield both an estimate of the landscape and its uncertainties, which are crucial for the exploitation-exploration trade-off and for gaining the optimization efficiency. For implementation, we use the R package mlrMBO (Bischl et al., Reference Bischl, Richter, Bossek, Horn, Thomas and Lang2017) in our numerical analysis.
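To illustrate the workflow (not the paper’s actual implementation, which uses mlrMBO in R), the sketch below runs the same kind of BO loop with the Python package scikit-optimize on a placeholder objective standing in for $f(\pi, \theta)$.

```python
import numpy as np
from skopt import gp_minimize

# Sketch of the BO loop. `expected_utility` is a placeholder for the Monte Carlo
# estimate of the planner's objective f(pi, theta) in (4.3); in the actual study it
# would be the simulated, discounted sum of the generations' CRRA utilities.

def expected_utility(pi, theta):
    # Placeholder objective with a single interior maximum (not the paper's objective).
    return -((pi - 0.4) ** 2 + (theta - 0.2) ** 2)

# gp_minimize minimizes, so we pass the negative of the objective.
result = gp_minimize(
    lambda x: -expected_utility(x[0], x[1]),
    dimensions=[(0.0, 1.0), (1e-3, 1.0)],   # search ranges for pi and theta
    n_calls=40,                             # number of (expensive) objective evaluations
    random_state=0,
)
pi_star, theta_star = result.x
print(pi_star, theta_star)   # close to (0.4, 0.2) for the placeholder objective
```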
4.3. Simulation setting
We explain here how we approximate the expected utility in (4.3) by Monte Carlo simulations, which is necessary for applying BO. Moreover, we describe the problem setting for our numerical analysis in the next section. First of all, we set the number of working generations as $N = 40$ , the discounting factor in (4.3) as $\beta = 0.98$ , and the annual contribution as $c = 1$ .
4.3.1. Financial market
We consider three settings for the financial market that represent different market risks, to investigate when IRS in the IRS-DC model works most effectively. The market price of risk, a.k.a. the Sharpe ratio, is defined by:
\begin{equation} \lambda \;:\!=\; \frac{\mu - r}{\sigma}, \tag{4.4} \end{equation}
where $\mu$ and $\sigma$ are the drift rate and volatility of the stock, respectively, and r is the rate of the risk-free asset; see Section 2.1.2. The Sharpe ratio quantifies the performance of a risky investment relative to a risk-free investment. It is one of the most frequently used performance measures, and we use it to describe different financial markets in our experiments. Typically, a Sharpe ratio above 0.5 in the long run indicates excellent investment performance and is difficult to achieve, while a Sharpe ratio between $0.1$ and $0.3$ is often considered reasonable and can be achieved more easily (e.g., Sharpe Reference Sharpe1998). Table 1 shows the Sharpe ratios for different financial markets estimated from historical data.Footnote 4 It shows that high values of the Sharpe ratio are around 0.3, and low values can be below $0.05$.
For the simulation, we consider the following three settings for the financial market, with different levels of the Sharpe ratio:
We refer to Markets 1, 2, and 3 as M1, M2, and M3 for brevity, and call them the markets with high, intermediate, and low Sharpe ratios, respectively. The values calibrated from real-world data over the longer period (Table 1) justify the Sharpe ratios chosen for our experiments.
4.3.2. Risk aversion of the social planner
The relative risk aversion $\gamma$ in the CRRA utility function (4.1) represents the social planner’s risk attitude: the social planner becomes more risk-averse as $\gamma$ increases. To study the impacts of $\gamma$ on the optimal investment strategy $\pi$ and adjustment parameter $\theta$, we consider three settings: $\gamma = 3, 5, 10$.
4.3.3. Entry cohorts
The generations with identifiers $i = 1, \dots, 40$ are those who participate in the IRS-DC plan at time $t = 0$ and are called entry cohorts. For $t < 0$, that is, before participating in the IRS-DC plan, we assume that the entry cohorts participate in a pure DC plan that applies the optimal life cycle investment strategy, following Gollier (Reference Gollier2008). Namely, the pure DC plan invests a large fraction in the stock when the participant is young and gradually reduces the amount invested in the stock as the participant approaches retirement. To be more precise, for a generation i where $i=1,\dots, 40$, let $B_i(t)$ be the individual account of generation i and $Y_i(t)$ be the net present value at time t of all the future contributions of generation i; then the optimal fraction $\pi_i^{\textrm{ind}}(t)$ of generation i’s wealth to be invested in the stock is given by:
\begin{equation} \pi_i^{\textrm{ind}}(t) = \pi^c\, \frac{B_i(t) + Y_i(t)}{B_i(t)}. \tag{4.5} \end{equation}
See Merton (Reference Merton1971, Equation 71). Notice that $\pi^c$ is the so-called Merton constant defined as:
\begin{equation} \pi^c \;:\!=\; \frac{\mu - r}{\gamma \sigma^2} = \frac{\lambda}{\gamma \sigma}, \tag{4.6} \end{equation}
where $\lambda$ is the Sharpe ratio in (4.4). Note that $Y_i(t)$ can be calculated straightforwardly here, as interest rate risk is excluded.
For an individual with the CRRA utility function, the life cycle investment strategy (4.5) provides the highest expected utility (Merton Reference Merton1971; Gollier Reference Gollier2008). Hence, the life cycle investment strategy and its variants have been popular choices for pure DC plans (e.g., Booth and Yakoubov Reference Booth and Yakoubov2000; Haberman and Vigna Reference Haberman and Vigna2002). Note that, when an individual is young, the discounted future income $Y_i(t)$ is much higher than her current wealth in the account $B_i(t)$, and thus $\pi_i^{\textrm{ind}}(t)$ in (4.5) is much larger than 1. Therefore, the life cycle investment strategy (4.5) implies a high-leverage (i.e., borrowing) strategy when the individual is youngFootnote 5 (see Figure 3).
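A small Python sketch of the strategy (4.5)–(4.6) illustrates this leverage effect; the parameter values are illustrative, and we take $Y_i(t)$ to be the remaining contributions discounted at the risk-free rate r, which is one natural reading given that interest rate risk is excluded.

```python
import numpy as np

# Sketch of the life cycle strategy (4.5)-(4.6) under illustrative parameters.
# Y_i(t) is taken as the remaining contributions discounted at the risk-free rate r
# (one natural reading, since interest rate risk is excluded in the model).

mu, sigma, r, gamma = 0.06, 0.20, 0.02, 5.0     # illustrative parameters (assumed)
c, N = 1.0, 40

merton_constant = (mu - r) / (gamma * sigma**2)  # pi^c in (4.6)

def optimal_fraction(wealth, years_to_retirement):
    """pi_i^ind(t) = pi^c * (B_i(t) + Y_i(t)) / B_i(t), with Y_i(t) the NPV of future contributions."""
    future_contributions = c * sum(np.exp(-r * k) for k in range(years_to_retirement))
    return merton_constant * (wealth + future_contributions) / wealth

# Young participant: small account, many contributions ahead -> highly levered position.
print(optimal_fraction(wealth=2.0, years_to_retirement=38))
# Near retirement: large account, few contributions ahead -> fraction close to pi^c.
print(optimal_fraction(wealth=60.0, years_to_retirement=2))
```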
The life cycle investment strategy is also used in Section 5.5 to make a comparison between the IRS-DC and the optimal pure DC plans.
4.3.4. Euler–Maruyama approximation
For simulating the dynamics of the asset process (2.4), we use the Euler–Maruyama approximation. Given a finite time horizon $T > 0$, we divide the interval [0, T] into n equal time intervals:
\begin{equation*} 0 = t_0 < t_1 < \cdots < t_n = T, \qquad t_{k+1} - t_k = \Delta, \end{equation*}
where we set the step size as $\Delta = 1 / 12$, which corresponds to 1 month. Then we simulate the asset process as:
\begin{equation*} A(t_{k+1}) = A(t_k) \Big( 1 + \big( \pi \mu + (1-\pi) r \big) \Delta + \pi \sigma \sqrt{\Delta}\, Z \Big), \end{equation*}
where Z is a standard normal random variable (drawn independently at each step). The dynamics of the asset process in a pure DC plan is simulated in a similar way.
The step size $\Delta = 1/12$ implies that the indexation rate (2.11) is adjusted monthly according to the funding ratio. This monthly update is more frequent than the annual cash flows of the fund. This setting reflects the fact that the market values of individual accounts usually vary more frequently than cash flows in reality. The IRS-DC fund is assumed to be fully funded at $t=0$, implying that the initial value of the notional funding ratio is 1: $A(0)/L(0) = 1$. (The influence of the initial funding ratio is examined in the numerical analysis in Section 5.2.)
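The following compact Python sketch puts the pieces of Sections 2 and 4.3 together for one simulated path: monthly Euler–Maruyama steps for the asset, accounts indexed at the rate (2.11) between cash flows, and annual contributions and a lump-sum benefit at integer times. It reflects our reading of the model; the parameter values and the initialisation of the entry cohorts’ accounts are simplifying assumptions of ours, and the paper’s implementation may differ in details.

```python
import numpy as np

# One simulated path of the IRS-DC fund (a sketch under our reading of the model).
rng = np.random.default_rng(42)
mu, sigma, r = 0.06, 0.20, 0.02            # illustrative market parameters (assumed)
pi, theta = 0.3, 0.2                        # investment fraction and adjustment parameter
N, c, T = 40, 1.0, 80                       # working generations, contribution, horizon in years
dt = 1.0 / 12                               # monthly step, as in Section 4.3.4

mu_tilde = pi * mu + (1 - pi) * r - 0.5 * (pi * sigma) ** 2   # expected annual log return

# Entry cohorts at t = 0_+: accounts[0] retires next; for simplicity we set each entry
# cohort's account equal to its accumulated contributions (no prior investment growth).
accounts = c * np.arange(N, 0, -1, dtype=float)
A = accounts.sum()                          # fully funded at t = 0: A(0)/L(0) = 1

benefits = []
for year in range(1, T + 1):
    for _ in range(12):                     # continuous part of the year
        g = mu_tilde + theta * np.log(A / accounts.sum())     # indexation rate, our reading of (2.11)
        accounts *= np.exp(g * dt)                            # account growth between cash flows
        dZ = np.sqrt(dt) * rng.standard_normal()
        A *= 1 + (pi * mu + (1 - pi) * r) * dt + pi * sigma * dZ   # Euler-Maruyama step

    # Cash flows at the integer time `year`: pay the retiring generation, advance the
    # remaining cohorts, add a new entrant, and collect the N annual contributions.
    benefit = accounts[0]
    benefits.append(benefit)
    accounts = np.append(accounts[1:] + c, c)
    A += N * c - benefit

print("funding ratio at T:", A / accounts.sum())
print("benefit of generation 41:", round(benefits[40], 2))
```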
4.3.5. Time horizon T
While the expected utility in (4.3) involves an infinite horizon, this is intractable in simulations, so we use a finite horizon T. In our numerical analysis, we set the horizon as $T = 80$ years. For approximating the expected utility in (4.3), we then simulate 10,000 paths of the asset process up to the horizon T and compute the Monte Carlo average. Note that, in this setting, the generations $i = 41, \dots, 80$ are those who spend their entire working periods in the IRS-DC plan.
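Given such a path simulation, the planner’s objective can be estimated by a plain Monte Carlo average. The sketch below shows the structure; `simulate_benefits` is a placeholder standing in for a full fund simulation such as the one above, and the discounting by $\beta^i$ reflects our reading of (4.2).

```python
import numpy as np

# Monte Carlo approximation of the planner's objective (4.3). `simulate_benefits` is a
# placeholder standing in for an IRS-DC path simulation such as the sketch in Section 4.3.4.

def simulate_benefits(pi, theta, T, rng):
    # Placeholder: i.i.d. lognormal benefits; replace with a full fund simulation.
    return np.exp(0.03 * np.arange(1, T + 1) + 0.2 * rng.standard_normal(T))

def crra_utility(x, gamma):
    return np.log(x) if gamma == 1.0 else x ** (1.0 - gamma) / (1.0 - gamma)

def estimated_objective(pi, theta, gamma=5.0, beta=0.98, T=80, n_paths=10_000, seed=0):
    rng = np.random.default_rng(seed)
    discounts = beta ** np.arange(1, T + 1)
    totals = np.empty(n_paths)
    for p in range(n_paths):
        benefits = simulate_benefits(pi, theta, T, rng)
        totals[p] = np.sum(discounts * crra_utility(benefits, gamma))
    return totals.mean()

print(estimated_objective(pi=0.3, theta=0.2, n_paths=100))
```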
4.3.6. Upper bound of the adjustment parameter $\theta$
While the adjustment parameter $\theta$ can take an arbitrarily large value in theory, for numerical optimization of $\theta$ we need to set its upper bound. The range of $\theta$ is set as $0 < \theta \leqslant 1$ in our numerical analysis.
5. Numerical analysis
This section presents our numerical analysis of the IRS-DC pension model. In Section 5.1, we first discuss the optimal investment strategy and adjustment parameter obtained by BO. In Section 5.2, we then study the dynamics of the funding ratio and discuss how the adjustment parameter $\theta$ affects its stability. The stability of the funding ratio can be understood as the stability of the pension fund’s operation. In Sections 5.3, 5.4, and 5.5, we focus on the individual accounts in the IRS-DC fund. We first study the dynamics of individual accounts in Section 5.3, and then the distribution of retirement benefits in Section 5.4. Lastly, we study the welfare of pension participants in Section 5.5. Additional numerical analyses on a time-varying investment strategy and the influence of population structure are reported in the online appendix.
5.1. Optimal investment strategy and adjustment parameter
As explained in Section 4.3, we consider nine different settings for the numerical analysis, resulting from three different values for the relative risk aversion ( $\gamma=3, 5, 10$ ) of the social planner and three different values for the Sharpe ratio ( $\lambda=0.3, 0.22, 0.11$ ) of the financial market. In each setting, we find the optimal investment strategy $\pi$ and the adjustment parameter $\theta$ by BO, as described in Section 4. We report the resulting optimal values of $\pi$ and $\theta$ in Table 2. For comparison, we also report the Merton constant (4.6) for each setting in Table 2, which will be used in the experiment in Section 5.5.
For each value of the Sharpe ratio $\lambda$ , the optimal $\pi^*$ and $\theta^*$ tend to decrease as the risk aversion $\gamma$ increases (with the exception of the case $\lambda = 0.3$ , where $\theta^*$ remains 1). Regarding the optimal investment strategy $\pi^*$ , this tendency can be anticipated by the same tendency in the Merton constant (4.6), which is inversely proportional to the risk aversion $\gamma$ . Regarding the optimal adjustment parameter $\theta^*$ , this tendency can be expected from the analysis in Section 3.3, where it is shown that a smaller adjustment parameter $\theta$ lowers the volatility of the retirement benefits; thus, a higher risk aversion $\gamma$ leads to smaller $\theta^*$ .
For each value of the risk aversion $\gamma$, the optimal $\pi^*$ and $\theta^*$ tend to be smaller as the Sharpe ratio $\lambda$ becomes smaller (for the cases $\gamma = 3, 5$ and $\lambda = 0.22, 0.11$, the adjustment parameter $\theta^*$ is comparably small). One can understand this tendency in a similar way as the discussion in the paragraph above, since the Sharpe ratio represents the market price of risk. Note that $\theta^*$ is extremely small for $\gamma = 10$ and $\lambda = 0.11$, which implies that the IRS-DC plan becomes similar to a DB plan, as discussed in Section 3.3; in this case, $\pi^*$ is also very small, meaning that the asset is mainly invested in the risk-free asset.
5.2. Funding ratio process
We next study the dynamics of the funding ratio $A(t)/L(t)$ , investigating the influences of the adjustment parameter $\theta$ and the initial funding ratio $A(0) / L(0)$ .
5.2.1. Influence of the adjustment parameter
We first examine the influence of the adjustment parameter $\theta$ . We fix the investment strategy to $\pi = 0.131$ , which is optimal for Market 3 with $\gamma = 3$ (see Table 2). We consider three values for the adjustment parameter: $\theta_1 = 0.04$ , $\theta_2 = 0.0835$ , and $\theta_3 = 0.2$ , where $\theta_2$ is optimal for Market 3 with $\gamma = 3$ . For each value of the adjustment parameter, we simulate the IRS-DC fund 10,000 times in Market 3; the results are summarized in Figure 4.
Figure 4(a) shows the mean and standard deviation of the funding ratio $A(t)/L(t)$ over the 10,000 simulations as a function of time t, for each of the three values of $\theta$. The standard deviation is the smallest for $\theta_3 = 0.2$ and the largest for $\theta_1 = 0.04$; therefore, the larger the adjustment parameter, the smaller the standard deviation of the funding ratio. This observation validates our analysis in Section 3, which indicates that a larger adjustment parameter $\theta$ makes the funding ratio more stable. Moreover, the mean of the funding ratio is close to 1 for $\theta_2 = 0.0835$ and $\theta_3 = 0.2$, while the mean (and standard deviation) gradually increase for $\theta_1 = 0.04$. This observation is also consistent with the analysis in Section 3, which implies that a smaller adjustment parameter lets the funding ratio $A(t)/L(t)$ converge to 1 more slowly.
Figure 4(b), (c), and (d) show the paths of the funding ratio for three representative scenarios defined as follows. We pick the three scenarios from the 10,000 simulations that correspond to the top 10%, 50%, and 90% values of the social planner’s utility (see (4.2)) and plot the funding ratio processes in these scenarios; these three scenarios can be interpreted as representing “good,” “medium,” and “bad” realizations of the financial market. The discrepancy between the paths of the funding ratio in these scenarios shrinks as $\theta$ increases. This implies that the volatility of the funding ratio decreases as $\theta$ increases, which is consistent with our analysis in Section 3.
5.2.2. Influence of the initial funding ratio
We next examine the influence of the initial funding ratio $A(0)/L(0)$ on the dynamics of the funding ratio $A(t)/L(t)$ . We consider three cases for the initial funding ratio: (1) $A(0)/L(0) = 0.9$ , (2) $A(0)/L(0) = 1$ , and (3) $A(0)/L(0) = 1.1$ . We simulate the IRS-DC fund 10,000 times for each case and calculate the mean and standard deviation of $A(t)/L(t)$ . Figure 5 shows the results for (a) Market 3 with $\pi = 0.131$ and $\theta = 0.0835$ , which are optimal for $\gamma = 3$ in Market 3, and for (b) Market 1 with $\pi = 0.267$ and $\theta = 1$ , optimal for $\gamma = 10$ in Market 1. Regardless of the initial funding ratio $A(0)/L(0)$ , the mean of the funding ratio $A(t)/L(t)$ converges to 1 as t increases. For Market 3, where the adjustment parameter $\theta$ is small, the mean of the funding ratio converges to 1 slowly; for Market 1, where the adjustment parameter is larger, the mean converges to 1 immediately. Therefore, these results suggest that the IRS-DC fund can self-stabilize the funding ratio to 1, and a larger adjustment parameter $\theta$ leads to a quicker stabilization; again, this is consistent with the analysis in Section 3.
5.3. Dynamics of individual accounts
We next study the dynamics of individual accounts in the IRS-DC plan. As in the analysis of Section 3.4, to study the effects of IRS, each individual account in the IRS-DC plan is compared with the corresponding account in a pure DC plan that uses the same investment strategy $\pi^*$ in Table 2. (A pure DC plan using the optimal life cycle investment strategy is considered in Section 5.5.) Since IRS is absent, an individual account in the pure DC plan simply follows the asset process, yielding the benefit (3.12).
Figure 6 shows arbitrarily chosen paths of the individual accounts from the generation $i = 41$ in the IRS-DC and pure DC plans, for the three market settings and risk aversion $\gamma = 10$ . For Market 1, for which the adjustment parameter $\theta^*$ is large (see Table 2), the paths of the IRS-DC and DC accounts are similar. In contrast, for Markets 2 and 3, for which the adjustment parameter $\theta^*$ is smaller, the IRS-DC account accumulates more stably than the pure DC account. This observation is consistent with the discussions in Sections 2.1.5, 3.3, and 3.4 where it is argued that a smaller adjustment parameter $\theta$ reduces the volatility of an IRS-DC account as a result of IRS.
We next quantify the effects of IRS on stabilizing the accumulation of an IRS-DC account. To this end, we calculate the increment-ratio-based roughness (IR roughness) (Bardet and Surgailis Reference Bardet and Surgailis2011), a measure of the roughness/smoothness of a stochastic process, for each path of the IRS-DC and pure DC accounts. The IR roughness takes a value between 0 and 1, and a larger value indicates that the path is smoother; see Appendix C for details. Table 3 shows the average of the IR roughness over the 10,000 simulations for each of the IRS-DC and pure DC accounts of the generation $i= 41$. For all nine settings considered, the IRS-DC account has a larger IR roughness than the pure DC account, which implies that the IRS-DC account is smoother. Therefore, IRS makes the accumulation of the IRS-DC account more stable than that of the pure DC account. (Recall that the only difference between the IRS-DC and pure DC plans here is the existence of IRS.)
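For intuition, the sketch below computes one common form of the first-order increment-ratio statistic; the paper’s exact definition follows Bardet and Surgailis (Reference Bardet and Surgailis2011) and is given in Appendix C, so this is an illustration of the idea rather than a reimplementation.

```python
import numpy as np

# One common form of the increment-ratio (IR) statistic: the average of
# |d_k + d_{k+1}| / (|d_k| + |d_{k+1}|) over consecutive increments d_k of the path.
# Consecutive increments of a smooth path share the same sign, so each ratio is close
# to 1; a rough path flips sign often, which pushes the average down.

def ir_roughness(path):
    d = np.diff(np.asarray(path, dtype=float))
    num = np.abs(d[:-1] + d[1:])
    den = np.abs(d[:-1]) + np.abs(d[1:])
    ratios = np.where(den > 0, num / np.maximum(den, 1e-12), 1.0)  # flat pairs count as smooth
    return ratios.mean()

rng = np.random.default_rng(0)
smooth = np.cumsum(np.full(480, 0.1))                      # steadily growing account
rough = np.cumsum(0.1 + 0.5 * rng.standard_normal(480))    # volatile account
print(ir_roughness(smooth), ir_roughness(rough))           # the smooth path scores higher
```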
5.4. Distribution of retirement benefits
We next study how IRS affects the distribution of retirement benefits. Figure 7 shows the histograms of the retirement benefits (from the 10,000 simulations) of the IRS-DC and pure DC accounts for the generation $i = 41$, for the three market settings and risk aversion $\gamma = 10$. (Results for $\gamma = 3, 5$ are shown in the online appendix.) For Market 1, for which the adjustment parameter $\theta^*$ of the IRS-DC plan is large, the histograms of the IRS-DC and pure DC retirement benefits are almost identical. On the other hand, for Markets 2 and 3, for which the adjustment parameter is smaller, the volatility of the IRS-DC benefits is smaller than that of the pure DC benefits. In particular, for Market 3, for which the adjustment parameter is close to 0, the volatility of the IRS-DC benefits is very small. These observations support the analysis in Sections 3.3 and 3.4 that a smaller adjustment parameter $\theta$ makes the retirement benefits less volatile through IRS. Moreover, our result is consistent with similar observations by Bams et al. (Reference Bams, Schotman and Tyagi2016) and Donnelly (Reference Donnelly2017) that a collective DC scheme can reduce the volatility of retirement benefits.
5.5. Welfare of participants
Lastly, we study how IRS can improve the welfare of the IRS-DC plan participants in terms of their expected utilities. To this end, we make a comparison with a pure DC plan that uses the optimal life cycle investment strategy in (4.5), which yields the highest expected utility for an individual investor.
For simplicity, we assume that each participant in the IRS-DC plan has the same CRRA utility $U_\gamma$ in (4.1) as the social planner. Similarly, to make the comparison straightforward, we assume that each participant in the pure DC plan has the same CRRA utility. To measure the welfare, we calculate the certainty equivalent (CE) for each participant. That is, for an IRS-DC participant from the generation i with retirement benefit $B_i(i)$, the CE is defined as the quantity $CE_{i}^{(\text{IRS-DC})} > 0$ satisfying
\begin{equation*} U_\gamma\big( CE_{i}^{(\text{IRS-DC})} \big) = \mathbb{E}\big[ U_\gamma\big( B_i(i) \big) \big], \end{equation*}
where we approximate the expectation on the right-hand side by the empirical average over 10,000 realizations of $B_i(i)$. The CE of each participant in the pure DC plan is calculated similarly. Note that, since the utility function is strictly increasing, the expected utility is monotonically increasing with respect to the CE; a higher CE implies a higher expected utility. We calculate the CEs of the participants in the IRS-DC plan and the pure DC plan for the generations $i = 41, \dots, 80$.
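Given the simulated benefits, the CE follows by inverting the CRRA utility; the Python sketch below shows this computation on illustrative lognormal draws.

```python
import numpy as np

# Certainty equivalent under CRRA utility: CE solves U_gamma(CE) = E[U_gamma(B)], so
# CE = ((1 - gamma) * E[U_gamma(B)])**(1 / (1 - gamma)) for gamma != 1 and
# CE = exp(E[ln B]) for gamma = 1. The benefit draws below are illustrative only.

def certainty_equivalent(benefits, gamma):
    benefits = np.asarray(benefits, dtype=float)
    if gamma == 1.0:
        return np.exp(np.mean(np.log(benefits)))
    mean_utility = np.mean(benefits ** (1.0 - gamma) / (1.0 - gamma))
    return ((1.0 - gamma) * mean_utility) ** (1.0 / (1.0 - gamma))

rng = np.random.default_rng(0)
benefits = 60.0 * np.exp(0.2 * rng.standard_normal(10_000) - 0.02)   # illustrative draws
for gamma in (3, 5, 10):
    print(gamma, round(certainty_equivalent(benefits, gamma), 2))    # CE falls as risk aversion rises
```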
Figure 8 shows the CEs of the IRS-DC and pure DC participants for the generations $i = 41, \dots, 80$, for the three market settings and risk aversion $\gamma = 10$. (Results for $\gamma = 3, 5$ are shown in the online appendix.) For Market 1, where the Sharpe ratio is high, the pure DC participants obtain higher welfare than the IRS-DC participants. On the other hand, for Markets 2 and 3, where the Sharpe ratio is lower, the IRS-DC participants obtain higher welfare than the pure DC participants. This observation indicates that the IRS-DC plan can provide higher welfare than the optimal DC plan when the market is more volatile (in the sense of having a lower Sharpe ratio). Therefore, IRS is expected to be particularly advantageous in protecting individual participants when the market is turbulent (e.g., when there is an economic shock).
While it has been generally known in the literature that IRS is welfare-improving, there are a few key differences in our contribution. To explain this, we make a comparison with closely related works. Gollier (Reference Gollier2008) shows that IRS is welfare-improving over the optimal life cycle investment strategy, but his analysis is based on the assumption that the pension fund can perform borrowing for investment (i.e., the investment strategy $\pi$ can be larger than 1); this assumption is not realistic for pension funds in practice. Moreover, his second-best strategy assumes the existence of a “shareholder” that helps finance the pension fund. Our result above shows that IRS can be welfare-improving even when borrowing is prohibited for the pension fund (i.e., $0 < \pi < 1$ ) and no shareholder is present.
Cui et al. (Reference Cui, De Jong and Ponds2011) show that a hybrid pension plan with IRS can provide higher welfare than a pure DC plan with an “optimal” investment strategy. However, their “optimal” individual investment strategy is not allowed to perform borrowing, and it is therefore suboptimal compared with the optimal life cycle strategy (4.5), which does allow borrowing. Moreover, Cui et al. (Reference Cui, De Jong and Ponds2011) optimize the parameters of the pension fund so as to maximize the expected utility of one specific entry cohort rather than of all the generations; they then compare this entry cohort’s welfare with a pure DC plan participant’s welfare. This way of optimizing the pension system is not appropriate, as it ignores the other generations’ welfare. In contrast, we show that the IRS-DC plan, which optimizes all the generations’ utilities as in (4.3), can improve the welfare over the optimal life cycle investment strategy when the market is volatile.
Bams et al. (Reference Bams, Schotman and Tyagi2016) consider a pension model similar to ours, but they do not show that their model can provide higher welfare than individual DC plans. Similarly, Donnelly (Reference Donnelly2017) considers a related collective DC plan but does not compare it with individual DC plans. Chen et al. (Reference Chen, Beetsma, Ponds and Romp2016) study a three-pillar model in which the second pillar is a collective DC, DB, or hybrid pension plan and make a comparison with the corresponding three-pillar model whose second pillar is an individual DC plan. While they show that the former yields higher welfare than the latter, they assume that both plans use the same investment strategy, with the fraction invested in the stock being $\pi = 0.5$ ; therefore, their individual DC plan is not optimal. Different from these previous works, we make a comparison with the optimal life cycle investment strategy. By doing so, we show that the volatility of the financial market is a key factor that determines whether IRS is welfare-improving over the optimal life cycle investment strategy.
6. Concluding remarks
We have shown that a fully funded collective DC pension system with IRS can improve the welfare of individual participants, as compared with individual DC benchmarks using the optimal life cycle investment strategy, when the financial market is volatile. Key new findings relative to the literature are that (i) the welfare improvement can be achieved without relying on borrowing or shareholders, in contrast to, for example, Gollier (Reference Gollier2008), and that (ii) whether IRS improves the welfare depends on the volatility of the financial market, as measured by the Sharpe ratio. These observations suggest that a fully funded pension system with a realistic investment strategy (i.e., without borrowing) can implement IRS and protect individual participants from a turbulent market.
Our investigation has been based on a stylized model, which we call the IRS-DC pension model, that uses an indexation rate of individual accounts as a device for IRS. This indexation rate, originally introduced by Goecke (Reference Goecke2013), is automatically adjusted according to the notional funding ratio of the pension fund, so as to balance the welfare of different generations and the sustainability of the pension fund. We have analyzed the funding ratio process and retirement benefits in the IRS-DC model, and how their volatility is controlled by the adjustment parameter in the indexation rate. Moreover, we have shown how the adjustment parameter and the investment strategy can be optimized by using BO, a machine learning method for global optimization.
There are a number of possible future directions. First, as we have shown the effectiveness of the indexation rate of Goecke (Reference Goecke2013) as a means for IRS in a collective pension system, the same indexation rate may be applied to other collective schemes, such as hybrid pension systems (e.g., Cui et al. Reference Cui, De Jong and Ponds2011; Chen et al. Reference Chen, Beetsma, Ponds and Romp2016) and notional DC pension systems (e.g., Settergren Reference Settergren2001), where other forms of automatic adjustment rules are used for adjusting the individual accounts and/or contributions. This is worth investigating, as automatic adjustment rules have been used in real pension systems, such as the Dutch and Swedish pension systems (OECD, 2021, Chapter 2).
Second, as BO provides an efficient way of optimizing the parameters of a pension system, it enables researchers to study optimal pension systems under more realistic but complex setups. For example, BO may be applied to optimize, by expected utility maximization, the three-pillar pension system of Chen et al. (Reference Chen, Beetsma, Ponds and Romp2016), which involves a number of parameters; this may make it possible to show that their collective scheme is welfare-improving over the optimal individual benchmark using the life cycle investment strategy, as we have shown for our collective scheme.
Third, our finding that IRS is welfare-improving in a volatile market is worth further investigation in a more realistic setup of the financial market. While our setup of the Black–Scholes market (i.e., log-normally distributed stock returns) follows many related works (e.g., Cui et al. Reference Cui, De Jong and Ponds2011; Chen et al. Reference Chen, Beetsma, Ponds and Romp2016), it is known that this setup does not necessarily hold in reality (e.g., Cont Reference Cont2001). For example, the log returns of real stocks are known to have heavy tails, which implies that real financial markets are more volatile than the Black–Scholes market. Similarly, it is more realistic to assume that the interest rate is stochastic and time-varying, rather than constant. Extending the current work to these more realistic settings will enable a deeper understanding of the functionality of IRS.
Fourth, discontinuity risk could be discussed for the IRS-DC model. We have implicitly assumed mandatory participation of individuals by modeling the population of each generation as constant, as in related works (e.g., Gollier Reference Gollier2008; Chen et al. Reference Chen, Beetsma, Ponds and Romp2016). One could relax this assumption by making participation voluntary and studying how individuals’ preferences affect the sustainability of the pension fund and the welfare of different generations (e.g., Beetsma et al. Reference Beetsma, Romp and Vos2012). Because contributions are not adjusted in the IRS-DC plan by design, the IRS-DC plan may be expected to be less prone to discontinuity risk than DB-based pension plans. However, if voluntary participation changes the populations of different generations, the effectiveness of IRS may be affected (as suggested by the additional numerical analysis in the online appendix). It will be interesting to investigate whether mandatory participation is necessary for the IRS-DC plan to maintain effective IRS.
Acknowledgments
We would like to express our gratitude to the editor and the anonymous reviewers for their time and insightful comments, which helped improve the paper. This work has been supported in part by the French government, through the 3IA Côte d’Azur Investment in the Future Project managed by the National Research Agency (ANR) with the reference number ANR-19-P3IA-0002, and by the Deutsche Forschungsgemeinschaft under grant number 418318744 for the research project “Zielrente: die Lösung zur alternden Gesellschaft in Deutschland.”
Appendix A. Proofs
A.1. Proof of Proposition 1
Proof. The identity (3.6) follows from taking the conditional expectation of (3.5) and using that the Brownian motion Z(s) for $t_0 < s < t$ is independent of the conditioning variable $\rho(t_0)_+$ , so that the conditional expectation of Z(s) is zero. Equation (3.7) follows by applying the Itô isometry to (3.5):
Equations (3.8) and (3.9) follow by taking the limits in (3.6) and (3.7).
A.2. Proof of Proposition 2
Proof. Let $t_0 \in \mathbb{N}$ be such that $i-N \leqslant t_0 \leqslant i-1$ . Let $\rho(t) = \ln (A(t) / L(t))$ be the log notional funding ratio. By (2.6), (2.7), and (2.11), we have
where (a) follows from (3.4) and (b) follows from (3.5).
We proceed by induction. Suppose that, for $m \in \mathbb{N}$ with $0 < m \leqslant N-1$ , we have
Note that the identity (A3) holds for $m = N - 1$ , since we have by (A2) and $B_i(i-N) = 0$ :
By using (A2) with $t_0 = i - m$ , the assumption (A3) implies that
which is the same expression as (A3) with m being replaced by $m - 1$ . Therefore, by induction, (A3) holds with $m = 0$ , which is (3.11). This completes the proof.
Appendix B. Tutorial on Bayesian optimization
We provide here a short tutorial on Bayesian optimization (BO). For further details and references, see for example Shahriari et al. (Reference Shahriari, Swersky, Wang, Adams and De Freitas2016).
Let $\Omega$ be a parameter set and $f \;:\; \Omega \to \mathbb{R}$ be the objective function to be maximized. In our problem, this parameter set is $\Omega = [0,1] \times [0,1]$ and each $x \;:\!=\; (\pi, \theta) \in \Omega$ represents a pair of the investment strategy $\pi$ and adjustment parameter $\theta$ . We define the objective function as the CE of the expected utility in (4.3) with input parameters $x = (\pi, \theta)$ :
Note that the expected utility is a function of $x = (\pi, \theta)$ , as the payment $B_t(t)$ depends on $\pi$ and $\theta$ . Since the CRRA utility function $U_\gamma$ is strictly monotonically increasing with respect to its argument, the maximizer of the CE is the same as the maximizer of the expected utility:
Thus, the maximization of the expected utility can be equivalently formulated as the maximization of the objective function (B1).
In our study, the expected utility is approximated by the Monte Carlo average of 10,000 simulations of the asset process A(t) (and thus the resulting $B_t(t)$ ) for $t = 1, \dots, T \;:\!=\; 80$ . Therefore, each evaluation of f(x) for a given $x = (\pi, \theta)$ involves 10,000 simulations over 80 years on a monthly basis, which is computationally expensive. If $A(t) \leqslant 0$ happens at any $t > 0$ in any of the 10,000 simulations of the financial market, we set the objective function value to its minimum, that is, $f(x) = 0$ .
B.1. Procedure of Bayesian optimization
First, we generate initial design points $x_1, \dots, x_{ n_{\textrm{init}}}$ for some $n_{\textrm{init}} \in \mathbb{N}$ and evaluate the function values $f(x_1), \dots, f(x_{n_{\textrm{init}}} )$ on these points. One can generate these initial points randomly (e.g., uniform sampling on $\Omega$ ) or deterministically (e.g., grid points). In our study, we use the design given by Latin hypercube sampling (McKay et al. Reference McKay, Beckman and Conover2000) on $\Omega$ with $n_{\textrm{init}} = 10$ .
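For illustration, such an initial design on $\Omega = [0,1] \times [0,1]$ can be generated as in the following Python sketch using SciPy’s quasi-Monte Carlo module; this snippet is only indicative and is not the implementation used in our experiments.

```python
from scipy.stats import qmc

# Latin hypercube design of n_init = 10 points on Omega = [0, 1] x [0, 1];
# the first coordinate plays the role of pi, the second that of theta.
sampler = qmc.LatinHypercube(d=2, seed=123)
initial_points = sampler.random(n=10)  # shape (10, 2), values in [0, 1)

# Each row is a candidate x = (pi, theta) at which f would be simulated.
for pi, theta in initial_points:
    print(f"pi = {pi:.3f}, theta = {theta:.3f}")
```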
Below we use the notation $D_n \;:\!=\; \{ (x_i, f(x_i)) \}_{i=1}^n \subset \Omega \times \mathbb{R}$ to write the collection of points $x_1,\dots,x_n$ and the resulting function values $f(x_1), \dots, f(x_n)$ . $D_n$ can be understood as “data” or “observations” about f after n-time evaluations of the function. We also denote by $\alpha(x;\; D_n)$ the acquisition function, whose concrete form will be introduced later in Section B.3. The acquisition function $\alpha(x;\; D_n)$ is a function of $x \in \Omega$ and defined from $D_n$ .
BO iterates the following procedure for $n = n_{\textrm{init}}, n_{\textrm{init}} + 1, \dots, M - 1$, where M is the total number of function evaluations.
(i) Compute $x_{n + 1} \in \arg\max_{x \in \Omega} \alpha(x;\; D_n)$ ;
(ii) Simulate $f(x_{n+1})$ and augment the data $D_{n+1} \;:\!=\; D_n \cup \{ (x_{n+1}, f(x_{n+1})) \}$ .
An estimate of the optimal parameters is then given by the maximizer over the evaluated inputs $x_1, \dots, x_M$ :
$$x_M^* \;:\!=\; \arg\max \{ f(x) \mid x \in \{x_1, \dots, x_M\} \}.$$
The acquisition function $\alpha(x;\; D_n)$ determines the next point $x_{n+1}$ at which to evaluate the objective function f. Note that the computational cost of solving $\max_{x \in \Omega}\alpha(x;\; D_n)$ is negligible compared to that of evaluating $f(x_{n+1})$ , as $\alpha(x;\; D_n)$ can be evaluated cheaply.
The acquisition function is designed so as to balance exploitation and exploration. Exploitation searches in a region near the current maximizer $x_n^* \;:\!=\; \arg\max \{ f(x) \mid x \in \{x_1,\dots,x_n \} \}$ ; exploration searches in regions far from the evaluated points $x_1,\dots,x_n$ . This exploration–exploitation trade-off is enabled by learning and quantifying the uncertainty of the response surface of f from the data $D_n$ . This is done by Gaussian process regression, which we explain next.
B.2. Gaussian process regression
Gaussian process regression (Rasmussen and Williams Reference Rasmussen and Williams2006) is a Bayesian nonparametric method for learning (or approximating) an unknown function $f \;:\;\Omega \to \mathbb{R}$ from its finite observations (data) $D_n = \{ (x_i,\; f(x_i)) \}_{i=1}^n$ . Recall that Bayesian inference in general proceeds as follows: (a) define a prior distribution for the quantity of interest, (b) collect observations (data) related to that quantity, and (c) update the prior distribution to the posterior distribution using the observed data, applying Bayes’ rule. In Gaussian process regression, the quantity of interest is the unknown function f, and (a’) one defines a prior distribution of f as a Gaussian process (or Gaussian random field), (b’) collects data $D_n = \{ (x_i, f(x_i)) \}_{i=1}^n$ , and (c’) updates the prior Gaussian process to the posterior Gaussian process, applying Bayes’ rule. See Figure B1 for illustrations of Gaussian process regression.
B.2.1. Prior Gaussian process
A Gaussian process is completely specified by its mean function $m \;:\; \Omega \to \mathbb{R}$ and covariance function $k\;:\; \Omega \times \Omega \to \mathbb{R}$ . We write $f \sim \mathcal{GP}(m, k)$ to mean that f is a sample path of the Gaussian process with mean function m and covariance function k. Then we have $m(x) = \mathbb{E}[f(x)]$ , $x \in \Omega$ , and $k(x,x') = \mathbb{E}[ (f(x) - m(x)) (f(x') - m(x')) ]$ , $x, x' \in \Omega$ . By specifying m and k, we implicitly specify the corresponding Gaussian process.
For simplicity, we consider a Gaussian process with the zero-mean function (i.e., $m(x) = 0$ for all $x \in \Omega$ ) as our prior distribution of the objective function f:
$$f \sim \mathcal{GP}(0, k).$$
It remains to specify the covariance function k. Through this choice, we can express our assumptions or knowledge regarding key properties of the objective function f, such as its smoothness and structure.
Popular choices of covariance kernels include the squared exponential kernel $k(x,x') = \exp(- \| x - x' \|^2 / h)$ with $h > 0$ and the Matérn kernels. In our study, we use the so-called Matérn- $5/2$ kernel of the form:
where $h > 0$ is a scale parameter. In our simulation study, we use the default value for h of the mlrMBO package. Roughly, this kernel leads to sample paths of $f \sim \mathcal{GP}(0, k)$ that are almost surely twice differentiable (e.g., Kanagawa et al. Reference Kanagawa, Hennig, Sejdinovic and Sriperumbudur2018, Section 4.4). Thus, with this kernel, we essentially encode this degree of smoothness of the objective function as our prior assumption.
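For concreteness, the following Python sketch evaluates a Matérn-5/2 kernel in its standard parameterization; the length scale h = 0.5 is an arbitrary placeholder and may differ from the mlrMBO default used in our experiments.

```python
import numpy as np

def matern52(x, x_prime, h=0.5):
    # Standard Matérn-5/2 form: (1 + s + s^2/3) * exp(-s), s = sqrt(5) * r / h.
    r = np.linalg.norm(np.asarray(x) - np.asarray(x_prime))
    s = np.sqrt(5.0) * r / h
    return (1.0 + s + s ** 2 / 3.0) * np.exp(-s)

# Nearby inputs are strongly correlated; distant inputs much less so.
print(matern52((0.2, 0.3), (0.25, 0.3)))  # close to 1
print(matern52((0.2, 0.3), (0.9, 0.9)))   # much smaller
```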
B.2.2. Posterior Gaussian process
The use of a Gaussian process as a prior leads to an analytic expression of the resulting posterior distribution. Given data $D_n = \{ (x_i, f(x_i)) \}_{i=1}^n$ , the posterior distribution of f is also given as a Gaussian process:
where $m_n \;:\; \Omega \to \mathbb{R}$ is the posterior mean function and $k_n \;:\; \Omega \times \Omega \to \mathbb{R}$ is the posterior covariance function, given by:
where $\textbf{f}_n \;:\!=\; (f(x_1), \dots, f(x_n))^\top$ , $\textbf{k}_n(x) \;:\!=\; (k(x,x_1), \dots, k(x, x_n))^\top \in \mathbb{R}^n$ , and $\textbf{K}_n \;:\!=\; (k(x_i, x_j))_{i, j = 1}^n \in \mathbb{R}^{n \times n}$ . For the details of the above derivation, see Rasmussen and Williams (Reference Rasmussen and Williams2006).
The posterior mean function $m_n$ in (B5) is an approximation of the objective function f based on the data $D_n$ . It works as a computationally cheaper surrogate model of f. On the other hand, the posterior standard deviation
$$\sigma_n(x) \;:\!=\; \sqrt{k_n(x,x)}, \quad x \in \Omega,$$
quantifies the uncertainty about the unknown function value f(x). These $m_n$ and $\sigma_n$ are the building blocks of the acquisition function, as we will see next.
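The following Python sketch assembles $m_n$ and $\sigma_n$ from the standard noise-free Gaussian process regression formulas, using the Matérn-5/2 kernel sketched above; the data points and function values are made-up placeholders, and this is an illustration rather than the internals of the mlrMBO package.

```python
import numpy as np

def matern52(a, b, h=0.5):
    r = np.linalg.norm(np.asarray(a) - np.asarray(b))
    s = np.sqrt(5.0) * r / h
    return (1.0 + s + s ** 2 / 3.0) * np.exp(-s)

def gp_posterior(x, X, f, kernel=matern52, jitter=1e-10):
    # Posterior mean m_n(x) = k_n(x)^T K_n^{-1} f_n and posterior variance
    # k(x, x) - k_n(x)^T K_n^{-1} k_n(x) for noise-free observations (X, f).
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    K += jitter * np.eye(len(X))              # numerical stabilization
    k_vec = np.array([kernel(x, xi) for xi in X])
    alpha = np.linalg.solve(K, k_vec)
    mean = alpha @ np.asarray(f)
    var = kernel(x, x) - k_vec @ alpha
    return mean, np.sqrt(max(var, 0.0))

# Toy data: three evaluated inputs on [0, 1]^2 with made-up objective values.
X = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.8)]
f = [1.0, 2.5, 1.8]
print(gp_posterior((0.5, 0.6), X, f))
```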
B.3. Acquisition function
We now introduce the concrete form of the acquisition function $\alpha(x;\; D_n)$ . Many acquisition functions have been proposed in the literature; see Shahriari et al. (Reference Shahriari, Swersky, Wang, Adams and De Freitas2016, Section IV). The most popular ones include EI (Expected Improvement), GP-UCB (Gaussian Process Upper Confidence Bound), and ES (Entropy Search). In this paper, we use the EI acquisition function, which is standard and theoretically well studied (Bull Reference Bull2011). Let
$$f_n^* \;:\!=\; \max \{ f(x) \mid x \in \{x_1, \dots, x_n\} \}, \qquad x_n^* \;:\!=\; \arg\max \{ f(x) \mid x \in \{x_1, \dots, x_n\} \}$$
be the maximum and the maximizer of the objective function f(x) over the currently evaluated inputs $x_1, \dots, x_n$ . The EI acquisition function $\alpha(x;\; D_n)$ at x is defined as the expected improvement of the function value f(x) over the current maximum $f^*_n$ , where the expectation is with respect to the posterior Gaussian process (B4):
where $\phi\;:\;\mathbb{R} \to [0, \infty)$ is the probability density function of a standard Gaussian random variable, and $\Phi\;:\; \mathbb{R} \to [0, 1]$ is its cumulative distribution function: $\Phi(y) \;:\!=\; \int_{-\infty}^y \phi(s) ds$ , $y \in \mathbb{R}$ .
The first term in (B8) represents exploration, as it becomes large when $\sigma_n(x)$ , which represents the uncertainty about the function value f(x), is large. This is typically the case when x is far from the already evaluated locations $x_1,\dots,x_n$ . The second term in (B8) represents exploitation, as it becomes large when $m_n(x) - f_n^*$ is large and $\sigma_n(x)$ is small. This is typically the case when x is near the current maximizer $x_n^*$ . Thus, the EI acquisition function naturally balances the exploration–exploitation trade-off, and the next point $x_{n+1} \in \arg \max_{x \in \Omega} \alpha(x;\; D_n)$ achieves such a balance.
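As a sketch, the standard closed-form EI for a Gaussian posterior can be computed as follows; the ordering of the two terms matches the discussion above, and the posterior mean and standard deviation would be supplied by a routine such as the gp_posterior sketch in Section B.2.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mean, std, f_best):
    # Closed-form EI for a Gaussian posterior with mean m_n(x) = mean and
    # standard deviation sigma_n(x) = std, relative to the current maximum
    # f_best: sigma_n(x) * phi(z) + (m_n(x) - f_best) * Phi(z),
    # with z = (m_n(x) - f_best) / sigma_n(x).
    if std <= 0.0:
        return max(mean - f_best, 0.0)
    z = (mean - f_best) / std
    return std * norm.pdf(z) + (mean - f_best) * norm.cdf(z)

# High uncertainty far from evaluated points -> larger EI (exploration).
print(expected_improvement(mean=1.9, std=0.8, f_best=2.5))
# Slightly higher mean with low uncertainty near the incumbent (exploitation).
print(expected_improvement(mean=2.6, std=0.05, f_best=2.5))
```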
B.4. Demonstration
Figure B2 shows an example of points $x = (\pi, \theta)$ evaluated by BO for $\gamma = 10$ . The green points are the $n_{\textrm{init}} = 10$ initial design points $x_1, \dots, x_{n_{\textrm{init}}}$ generated by Latin hypercube sampling. The total number of evaluated points is $M = 100$ . The red point is the maximizer $x_M^*$ , and the blue points are the 10 next-best parameters (largely overlapping with the red point). For comparison, we also show $10 \times 10$ grid points.
Appendix C. IR roughness measure
To describe the IR roughness measure (Bardet and Surgailis Reference Bardet and Surgailis2011), we suppose that the path of each individual account is represented by a function $h \;:\; [0, \widetilde{T}] \to \mathbb{R}$ , where $\widetilde{T}=40$ is its terminal time. Note that $\widetilde{T}$ is the terminal time for one generation and differs from the terminal time T of the pension fund’s operation. Discretizing the domain into $n - 1 \in \mathbb{N}$ intervals, the first-order IR roughness is defined as:
By the triangle inequality, the numerator in the sum is less than or equal to the denominator, and thus $R^{1,n}(h)$ takes values between 0 and 1. When the signs of the two increments $h( \widetilde{T}(j+1)/n ) - h( \widetilde{T} j/n ) $ and $h( \widetilde{T}(j+2)/n ) - h( \widetilde{T}(j+1)/n )$ are the same, the numerator equals the denominator; when those signs are different, the numerator is smaller than the denominator. As such, $R^{1,n}(h)$ reflects the sign changes of the function h and thus quantifies its roughness. Intuitively, $R^{1,n}(h)$ is close to 0 when h is rough, and is close to 1 when h is smooth. In fact, Bardet and Surgailis (Reference Bardet and Surgailis2011) show that, for a sufficiently smooth h, $R^{1,n}(h)$ converges to 1 as $n \to \infty$ .
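For illustration, the following Python sketch computes this first-order IR roughness on a discretely sampled path by averaging the increment ratios described above; taking the plain average over the consecutive increment pairs is our reading of the normalization, and the example paths are synthetic.

```python
import numpy as np

def ir_roughness(path):
    # Average of |Delta_j + Delta_{j+1}| / (|Delta_j| + |Delta_{j+1}|) over
    # consecutive increments Delta_j of the sampled path; close to 1 for a
    # smooth (monotone-increment) path, smaller for a rough path with
    # frequent sign changes of the increments.
    increments = np.diff(np.asarray(path, dtype=float))
    num = np.abs(increments[:-1] + increments[1:])
    den = np.abs(increments[:-1]) + np.abs(increments[1:])
    ratios = num[den > 0] / den[den > 0]
    return float(np.mean(ratios))

t = np.linspace(0.0, 40.0, 481)                  # e.g., a monthly grid over 40 years
smooth_path = np.exp(0.03 * t)                   # smooth accumulation: IR close to 1
rng = np.random.default_rng(1)
rough_path = np.cumsum(rng.normal(size=t.size))  # random-walk path: smaller IR
print(ir_roughness(smooth_path), ir_roughness(rough_path))
```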