Conviction Narrative Theory (CNT; target article) is both a “theory of narratives” and a “narrative theory” that precludes mathematical or numerical analysis. This commentary reviews the commitments of CNT through the lens of active inference and self-evidencing (Hohwy, 2016), asking whether CNT could lend itself to a formal (Bayesian) treatment.
Box 1 summarises the fundaments of active inference, as it relates to decision-making under uncertainty. With these fundaments, one can simulate the kind of decision-making addressed by CNT. For example, active inference reproduces decision-making under unknowable circumstances (Friston et al., 2016); it dissolves the exploration–exploitation dilemma and provides a principled account of affordances (Schwartenbeck et al., 2019). It can model the spread of ideas (Albarracin, Demekas, Ramstead, & Heins, 2022) and has been applied to cultural niche construction and social norms (Veissiere, Constant, Ramstead, Friston, & Kirmayer, 2019).
Box 1. Active inference

Recent trends in theoretical neurobiology, machine learning and artificial intelligence converge on a single imperative that explains both sense-making and decision-making in self-organising systems, from cells (Friston, Levin, Sengupta, & Pezzulo, 2015) to cultures (Veissiere et al., 2019). This imperative is to maximise the evidence (a.k.a. marginal likelihood) for generative (a.k.a. world) models of how observations are caused. This imperative can be expressed as minimising an evidence bound called variational free energy (Winn & Bishop, 2005) that comprises complexity and accuracy (Ramstead et al., 2022):
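A standard form of this bound, assuming a generative model $p(o, s)$ over outcomes $o$ and hidden states $s$, and an approximate posterior $q(s)$, is:

$$
F \;=\; \underbrace{D_{\mathrm{KL}}\big[q(s)\,\|\,p(s)\big]}_{\text{complexity}} \;-\; \underbrace{\mathbb{E}_{q(s)}\big[\ln p(o \mid s)\big]}_{\text{accuracy}} \;\geq\; -\ln p(o)
$$

Minimising $F$ therefore maximises a lower bound on log model evidence, $\ln p(o)$.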
Accuracy corresponds to goodness of fit, while complexity scores the divergence between prior beliefs (before seeing outcomes) and posterior beliefs (afterwards). In short, complexity scores the information gain or cost of changing one's mind (in an information theoretic and thermodynamic sense, respectively). This means Bayesian belief updating is about finding an accurate explanation that is minimally complex (cf. Occam's principle). In an enactive setting – apt for explaining decision-making – beliefs about “which plan to commit to” are based on the free energy expected under a plausible plan. This implicit planning as inference can be expressed as minimising expected free energy (Friston, Daunizeau, Kilner, & Kiebel, 2010):
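In common notation, writing $\pi$ for a plan (policy) and $p(o)$ for prior preferences over outcomes, the expected free energy is usually decomposed as:

$$
G(\pi) \;=\; \underbrace{D_{\mathrm{KL}}\big[q(o \mid \pi)\,\|\,p(o)\big]}_{\text{risk}} \;+\; \underbrace{\mathbb{E}_{q(s \mid \pi)}\Big[\mathrm{H}\big[p(o \mid s)\big]\Big]}_{\text{ambiguity}}
$$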
Risk is the divergence of probabilistic predictions about outcomes, given a plan, from prior preferences. Ambiguity is the expected inaccuracy. An alternative decomposition is especially interesting from the perspective of CNT:
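In the same notation (and under the usual assumption that predicted outcomes are generated by the model's likelihood), this alternative decomposition can be written:

$$
G(\pi) \;=\; \underbrace{-\,\mathbb{E}_{q(o \mid \pi)}\big[\ln p(o)\big]}_{\text{expected cost}} \;-\; \underbrace{\mathbb{E}_{q(o \mid \pi)}\Big[D_{\mathrm{KL}}\big[q(s \mid o, \pi)\,\|\,q(s \mid \pi)\big]\Big]}_{\text{expected information gain}}
$$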
The expected information gain underwrites the principles of optimal Bayesian design (Lindley, 1956), while expected cost underwrites Bayesian decision theory (Berger, 2011). However, there is a twist that distinguishes active inference from expected utility theory. In active inference, there is no single, privileged outcome that furnishes a utility or cost function. Rather, utilities are replaced by preferences, quantified by the (log) likelihood of encountering every aspect of an observable outcome. In short, active inference appeals to two kinds of Bayes optimality and subsumes information-seeking and preference-seeking behaviour under a single objective.
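The following is a minimal numerical sketch of these two decompositions, assuming a small discrete (categorical) generative model; the likelihood A, log preferences and predictive state beliefs are purely illustrative, but the identity between the two decompositions, and the selection of a plan by a softmax of negative expected free energy, hold in general.

```python
import numpy as np

# Minimal sketch: expected free energy under a discrete (categorical) model.
# The likelihood A, log preferences log_C and predictive state beliefs are
# illustrative assumptions, not quantities from the target article.

A = np.array([[0.9, 0.5],            # p(o | s): columns are hidden states
              [0.1, 0.5]])           # state 0 is informative; state 1 is ambiguous
log_C = np.log(np.array([0.8, 0.2])) # log prior preferences over outcomes

qs_pi = {"narrative_A": np.array([0.9, 0.1]),   # predictive beliefs q(s | pi)
         "narrative_B": np.array([0.1, 0.9])}

def expected_free_energy(qs):
    qo = A @ qs                                        # predictive outcomes q(o | pi)
    risk = np.sum(qo * (np.log(qo) - log_C))           # KL[q(o | pi) || p(o)]
    ambiguity = -np.sum(qs * np.sum(A * np.log(A), axis=0))  # E_q[H[p(o | s)]]
    # Alternative decomposition: expected cost minus expected information gain
    post = (A * qs) / qo[:, None]                      # q(s | o, pi), one row per outcome
    info_gain = np.sum(qo * np.sum(post * (np.log(post) - np.log(qs)), axis=1))
    cost = -np.sum(qo * log_C)
    return risk + ambiguity, cost - info_gain

G = {name: expected_free_energy(qs) for name, qs in qs_pi.items()}
for name, (g1, g2) in G.items():
    print(f"{name}: risk + ambiguity = {g1:.3f}; cost - information gain = {g2:.3f}")

# Planning as inference: the posterior over plans is a softmax of -G
g = np.array([G[name][0] for name in qs_pi])
q_pi = np.exp(-g) / np.exp(-g).sum()
print("posterior over narratives:", dict(zip(qs_pi, np.round(q_pi, 3))))
```

Note that a single quantity (in nats) scores both the epistemic and pragmatic value of each plan.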
What prevents CNT from using active inference for simulation, scenario modelling or computational phenotyping (Parr, Rees, & Friston, 2018)? One answer is that the requisite generative models are too complex and difficult to specify. However, there may be some commitments of CNT that could be usefully dismantled, enabling its claims to be substantiated with simulations and implicit proof of principle.
In active inference, narratives feature as prior beliefs. Indeed, the plans that underwrite policy selection are often described as narratives (Friston, Rosch, Parr, Price, & Bowman, 2017b). So, could one cast CNT in terms of narrative (i.e., policy) selection and “planning as inference” (Attias, 2003; Botvinick & Toussaint, 2012; Matsumoto & Tani, 2020)? In what follows, five arguments against formalising CNT in this fashion are considered and countered.
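On this (standard active inference) reading, the posterior belief about which narrative or plan to commit to is a softmax function of expected free energy; the precision parameter $\gamma$ below is part of that formulation (not of the target article), controlling how deterministically the best narrative is selected:

$$
q(\pi) \;=\; \sigma\big(-\gamma\, G(\pi)\big) \;\propto\; \exp\big(-\gamma\, G(\pi)\big)
$$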
(1) Radical uncertainty does not admit any Bayesian mechanics because the requisite probabilities do not have a well-defined outcome space.
Radical uncertainty rests upon an unknowable outcome (e.g., John Kay's wheel example). However, outcomes are known quantities that are observed. Technically, radical uncertainty refers to unknowable (i.e., hidden) causes of outcomes. However, finding the right causal explanation just is the problem of Bayesian inference. So what is radical about radical uncertainty? The answer might lie in the hierarchical nature of belief updating and implicit generative models. Given the parameters of a generative model, I can be uncertain about hidden states generating my observations. However, I can also be uncertain about the parameters, given a model. Finally, I can have uncertainty about my model. Radical uncertainty seems to concern the model structure.
Resolving the three kinds of uncertainty above corresponds to inference, learning and model selection, respectively. All entail maximising marginal likelihood or minimising free energy, with respect to posteriors over states, parameters and models, respectively. Model selection is known as structure learning in radical constructivism (Salakhutdinov, Tenenbaum, & Torralba, 2013; Tenenbaum, Kemp, Griffiths, & Goodman, 2011; Tervo, Tenenbaum, & Gershman, 2016). Structure learning is a partly solved problem, through Bayesian model reduction (Friston, Parr, & Zeidman, 2018), where redundant components are removed from an overly expressive model to maximise model evidence (e.g., Smith, Schwartenbeck, Parr, & Friston, 2020). This reductive approach complements non-parametric Bayes, which formalises the inclusion of new narratives (Gershman & Blei, 2012). In light of Bayesian model selection, one could argue that radical uncertainty admits a Bayesian mechanics.
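As a toy illustration of the kind of model comparison at stake (a sketch under conjugate Dirichlet assumptions, not the scheme of the cited papers), one can score a full model against a reduced model in which an unused outcome category is pruned; because the comparison is analytic, pruning or adding components can be evaluated without refitting.

```python
import numpy as np
from scipy.special import gammaln

def log_evidence(counts, alpha):
    """Log marginal likelihood of categorical counts under a Dirichlet prior:
    ln B(alpha + counts) - ln B(alpha), where B is the multivariate Beta function."""
    def log_B(a):
        return np.sum(gammaln(a)) - gammaln(np.sum(a))
    return log_B(alpha + counts) - log_B(alpha)

# Observed outcome counts (illustrative); the third category was never observed
counts = np.array([40.0, 60.0, 0.0])

# Full model: all three outcome categories are plausible a priori
alpha_full = np.array([1.0, 1.0, 1.0])
# Reduced model: the third category is (almost) removed from the model
alpha_reduced = np.array([1.0, 1.0, 1e-3])

dF = log_evidence(counts, alpha_reduced) - log_evidence(counts, alpha_full)
print(f"log evidence in favour of the reduced model: {dF:.3f} nats")
```

Here the reduced model gains evidence because it pays less complexity for a category the data never call upon; adding a new category (a new narrative) would be scored in exactly the same currency.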
(2) The utilities of different kinds of outcomes cannot be compared in a meaningful way.
This is only a problem if one subscribes to Bayesian decision theory as a complete account. Active inference vitiates this objection because to be Bayes optimal is to resolve uncertainty in the context of securing preferred outcomes (i.e., minimise expected free energy; see Box 1). Crucially, expected utility and information gain (Howard, 1966; Kamar & Horvitz, 2013; Moulin & Souchay, 2015) share the same currency; namely, natural units or nats (when using natural logarithms of prior preferences). This lends a quantitative and comparable meaning to the value of information and preferences.
(3) Certain outcomes are so fuzzy they are impossible to predict and therefore one has to use heuristics.
Knowing something is unpredictable is itself an informative prior that can be installed into hierarchical generative models: cf. Jaynes' maximum entropy principle (Jaynes, 1957; Kass & Raftery, 1995; Sakthivadivel, 2022). So, how do “fast and frugal” heuristics fit into active inference? Heuristics are generally considered as priors that comply with complexity-minimising imperatives (Box 1), for example, habitisation (FitzGerald, Dolan, & Friston, 2014) or the minimisation of perceptual prediction errors (Mansell, 2011). In short, heuristics are exactly what active inference – under hierarchical generative models – is there to explain. On this reading of active inference, self-evidencing just is satisficing (Gerd Gigerenzer, personal communication).
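For reference, the maximum entropy principle invoked here can be stated as follows (a standard formulation, not specific to the target article): among all distributions consistent with what is known, encoded as constraints $\mathbb{E}_{p}[f_k(o)] = c_k$, the least committal prior is the one with maximum entropy,

$$
p^{\ast}(o) \;=\; \arg\max_{p}\, \mathrm{H}\big[p(o)\big] \;\;\text{s.t.}\;\; \mathbb{E}_{p}\big[f_k(o)\big] = c_k
\quad\Longrightarrow\quad p^{\ast}(o) \;\propto\; \exp\Big(\sum_k \lambda_k f_k(o)\Big),
$$

which reduces to a flat (uniform) prior when nothing beyond normalisation is known; that is, “knowing one cannot know” is itself a well-formed prior.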
(4) But people don't behave as if they were rational, or even with bounded rationality.
Many careful studies in cognitive neuroscience are concerned with how people deviate from Bayes optimality. However, this overlooks the complete class theorem (Brown, 1981; Wald, 1947). The complete class theorem says that, for any pair of choice behaviour and cost function, there exist some prior beliefs that render the decisions Bayes-optimal. This has the fundamental implication that Bayesian mechanics cannot prescribe optimal (i.e., rational) decision-making. It can only describe rationality in terms of the priors a subject brings to the table. This insight underwrites the emerging field of computational psychiatry, where the game is to estimate the prior beliefs of patients that best explain their decision-making (Schwartenbeck & Friston, 2016; Smith, Khalsa, & Paulus, 2021).
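A minimal sketch of this “estimate the priors” move is given below, assuming (purely for illustration) that choices are generated by a softmax of negative expected free energy in which a single prior preference parameter is unknown; the names, numbers and grid search are illustrative, not the schemes used in the cited work.

```python
import numpy as np

# Sketch of computational phenotyping: recover the prior preference that best
# explains a subject's choices. All names and numbers are illustrative.

rng = np.random.default_rng(0)

def choice_probs(log_pref, risk_penalty=np.array([0.0, 1.0])):
    # Two options; option 1 carries a higher (toy) risk term but can be preferred.
    # Expected free energy per option: risk term minus preference term.
    G = risk_penalty - np.array([0.0, log_pref])
    p = np.exp(-G)
    return p / p.sum()

# Simulate a "subject" with a true preference that is unknown to the experimenter
true_log_pref = 2.0
choices = rng.choice(2, size=200, p=choice_probs(true_log_pref))

# Estimate the prior preference by maximum likelihood over a grid of candidates
grid = np.linspace(-2, 4, 241)
log_lik = [np.sum(np.log(choice_probs(c)[choices])) for c in grid]
print("estimated log preference:", grid[int(np.argmax(log_lik))])
```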
(5) But the dimensionality and numerics of belief updating in realistic generative models are beyond the capacity of any computer, human or otherwise.
This argument rests on the use of sampling procedures to approximate posterior distributions, for example, likelihood-free methods or approximate Bayesian computation (Chatzilena, van Leeuwen, Ratmann, Baguelin, & Demiris, 2019; Cornish & Littenberg, 2007; Girolami & Calderhead, 2011; Ma, Chen, & Fox, 2015; Silver & Veness, 2010). However, active inference rests on variational schemes found in physics, high-end machine learning (Marino, 2021) and (probably) the brain (Friston, Parr, & de Vries, 2017a). Variational Bayes eschews sampling by committing to a functional form for posterior beliefs; thereby converting an impossible marginalisation problem into an optimisation problem; namely, minimising variational free energy (Feynman, 1972). In summary, some people may think that generative models with realistic narratives cannot be inverted; however, these people are themselves existence proofs that such models can be inverted.
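To make the “optimisation, not marginalisation” point concrete, here is a minimal mean-field sketch, assuming a toy model with two binary hidden factors and one observed outcome (all numbers illustrative): the posterior over each factor is obtained by coordinate ascent on variational free energy, with no sampling anywhere.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Toy mean-field variational inference over two hidden factors, given one outcome.
# Priors and likelihood are illustrative assumptions.
p_s1 = np.array([0.5, 0.5])
p_s2 = np.array([0.7, 0.3])
p_o1 = np.array([[0.9, 0.6],     # p(o = 1 | s1, s2), rows index s1, columns index s2
                 [0.4, 0.1]])
o = 1                             # observed outcome
log_lik = np.log(p_o1) if o == 1 else np.log(1 - p_o1)

# Mean-field posteriors q(s1) q(s2), updated by coordinate ascent on free energy
q1, q2 = p_s1.copy(), p_s2.copy()
for _ in range(16):
    q1 = softmax(log_lik @ q2 + np.log(p_s1))    # E_{q2}[ln p(o|s1,s2)] + ln p(s1)
    q2 = softmax(log_lik.T @ q1 + np.log(p_s2))  # E_{q1}[ln p(o|s1,s2)] + ln p(s2)

print("q(s1):", np.round(q1, 3), " q(s2):", np.round(q2, 3))
```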
Financial support
KF is supported by funding for the Wellcome Centre for Human Neuroimaging (Ref: 205103/Z/16/Z) and a Canada-UK Artificial Intelligence Initiative (Ref: ES/T01279X/1).
Competing interest
None.