Ivancovsky et al. propose fruitful connections between curiosity and creativity under an exploration–exploitation trade-off. The explore–exploit trade-off is the decision between a familiar option with known value and an unfamiliar option with unknown or uncertain value (Addicott, Pearson, Sweitzer, Barack, & Platt, 2017). Choosing an unfamiliar option risks time, energy, and foregone reward in return for information (Rubin, Shamir, & Tishby, 2012).
These ideas have a history in reinforcement learning. For example, novelty-seeking is important for preventing failures of learning in which subpar solutions are settled on prematurely (Fox, Pakman, & Tishby, 2015). Despite these benefits, seeking novel information can also carry a high cost when it means forgoing familiar opportunities and accruing a burdensome amount of information (Wilson, Bonawitz, Costa, & Ebitz, 2021). Thus, one must manage costs by taking “sensible risks” that balance exploring to learn novel information about the environment with accruing increasingly complex information for the different tasks at hand (Sternberg & Lubart, 1996). One way to encourage taking on these risks for exploration is to use heuristics that locally track what has and has not been seen (Tang et al., 2017; Wittmann, Bunzeck, Dolan, & Düzel, 2007; Wittmann, Daw, Seymour, & Dolan, 2008). By contrast, preferring familiarity can manifest as a form of perseverative information seeking that has been associated with deprivation curiosity (Lydon-Staley, Zhou, Blevins, Zurn, & Bassett, 2021), a drive to reduce uncertainty and acquire missing information (Kashdan et al., 2018; Litman, 2008).
This preference for familiarity has been observed to be prevalent in people with greater depressed mood and anxiety (Zhou et al., 2023), and may be an important heuristic strategy for reducing uncertainty and thereby improving the reliability of future-oriented decisions (Harhen & Bornstein, 2023; Jiang, Kulesza, Singh, & Lewis, 2015). However, in large environments, such local heuristics are impoverished, particularly when higher-order associations are needed for planning. This need for richer measurements motivates the use of network science tools to formalize both local and global relationships as internal representations of the environment (Yoo, Bornstein, & Chrastil, 2023; Zhou, Lydon-Staley, Zurn, & Bassett, 2020). Thus, we propose expansions of the novelty-seeking model using reinforcement learning approaches to exploration and network science perspectives on information complexity and compression.
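As an illustration of the local, count-based heuristics mentioned above, the following minimal sketch adds a novelty bonus to each option's known value. It is written in the spirit of count-based exploration rather than as a reproduction of any cited model: the function names, the bonus form `beta / sqrt(n + 1)`, and the numerical values are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def novelty_bonus(counts, state, beta=1.0):
    """Bonus that shrinks as a state is visited more often (assumed form)."""
    return beta / math.sqrt(counts[state] + 1)


def choose(values, counts, beta=1.0):
    """Pick the option with the highest known value plus novelty bonus."""
    return max(values, key=lambda s: values[s] + novelty_bonus(counts, s, beta))


# Hypothetical values: one familiar, rewarding option and one unvisited option.
values = {"familiar": 0.6, "unfamiliar": 0.0}
counts = Counter({"familiar": 20})  # "unfamiliar" has a count of zero

first_choice = choose(values, counts)  # the bonus favors the unseen option

counts["unfamiliar"] += 10             # after repeated visits the bonus decays
later_choice = choose(values, counts)  # the known value now dominates
```

Because the bonus depends only on how often each option has been seen, the heuristic carries no information about how options relate to one another, which is the sense in which it is local.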
Ivancovsky et al. rightly note that curiosity and creativity must involve a dynamic policy of behavior that adaptively alternates between modes of exploration and exploitation. Reinforcement learning approaches reveal what behavior pattern, or policy, is appropriate for a given task and environment, for instance adapted to the sparsity of rewarding solutions (Gershman & Niv, 2015). To this end, the reinforcement learning approach of Harada (2020) was described. Notably, however, this paper reported that divergent and convergent thinking measures of creativity and the personality trait of openness to experience (a proxy for being “inventive/curious”) were not robustly associated with exploration and exploitation behavior based on model-free reinforcement learning (Harada, 2020). This finding and other work (Jach et al., 2023; Molinaro et al., 2023) highlight the need for understanding creativity via more sophisticated models of the value of exploration.
The value of information is sometimes treated as a simple heuristic for predisposing choices toward exploration (Gottlieb, Oudeyer, Lopes, & Baranes, 2013), but it can also be formally expanded as the change in future expected value that results from increasing certainty over representations of the environment and sequence of choices (Kaelbling, Littman, & Cassandra, 1998). These planning and policy iteration approaches aim for more global knowledge about the environment, and thereby differ from local count-based reward functions that encourage exploration (Masís, Chapman, Rhee, Cox, & Saxe, 2023; Oudeyer & Kaplan, 2007; Tang et al., 2017; Wittmann et al., 2008). Here we focus on approaches that balance the increased long-run discounted expected value of knowledge with the cost of sampling (exploration) (Kaelbling et al., 1998). To this end, the focus of choices shifts from an explore-or-exploit distinction to the iterative improvement of knowledge of the environment by testing predictions and simulations of future outcomes according to a given action policy (Gruber & Ranganath, 2019; Kobayashi, Ravaioli, Baranès, Woodford, & Gottlieb, 2019; Wilson, Wang, Sadeghiyeh, & Cohen, 2020; Dubey & Griffiths, 2020; Liquin & Gopnik, 2022).
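The idea of weighing the value of knowledge against the cost of sampling can be made concrete with a deliberately simplified, one-step sketch. It computes the expected gain from learning an uncertain option's true value before choosing, and compares that gain with a sampling cost. This is a myopic simplification, not the full partially observable planning formulation cited above, and the function name, belief, and cost are hypothetical.

```python
def value_of_information(known_value, outcomes, probs):
    """Expected gain from learning the uncertain option's value before choosing."""
    prior_mean = sum(p * v for p, v in zip(probs, outcomes))
    # Without information, the agent commits to whichever option looks best on average.
    value_without_info = max(known_value, prior_mean)
    # With information, the agent picks the better option in each possible world.
    value_with_info = sum(p * max(known_value, v) for p, v in zip(probs, outcomes))
    return value_with_info - value_without_info


# Hypothetical belief: the unfamiliar option is worth 0 or 1 with equal probability.
voi = value_of_information(known_value=0.5, outcomes=[0.0, 1.0], probs=[0.5, 0.5])

sampling_cost = 0.1                   # assumed cost of one exploratory sample
worth_exploring = voi > sampling_cost
```

In this toy case the information is worth 0.25 units of expected reward, so exploration is justified whenever sampling costs less than that. When the belief is already certain, the same function returns zero and no sampling cost is worth paying.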
We describe two areas of future research. First, creative insights can emerge from expanded planning horizons. Planning is commonly implemented as a search over a decision tree, wherein expanded horizons entail a deeper search in the tree. When the internal representation of information about the causal structure of the environment is accurate, longer planning horizons are useful. However, when the representation is incomplete, a smaller planning horizon compresses the policy space and prevents overfitting to past observations (Jiang et al., 2015). Humans can search over more complex structures in knowledge representations (Yoo et al., 2023). That knowledge may be more modular and compressible, allowing for the grouped representation of a more diverse chain of actions (Lai & Gershman, 2021; Momennejad, 2020; Patankar et al., 2023; Schapiro, Rogers, Cordova, Turk-Browne, & Botvinick, 2013; Stachenfeld, Botvinick, & Gershman, 2017). The ability to use more complex knowledge structures may involve a spatial-like ability to navigate those structures (Rmus, Ritz, Hunter, Bornstein, & Shenhav, 2022), as well as a metacognitive ability to balance knowledge uncertainty with deeper planning (Schulz & Bonawitz, 2007; Wade & Kidd, 2019; Nussenbaum et al., 2023).
Indeed, a form of mental navigation that spans diverse spaces has been proposed to be linked with both creativity and curiosity (Aru, Drüke, Pikamäe, & Larkum, 2023; Eysenbach, Gupta, Ibarz, & Levine, 2018; Zhou et al., 2023). Although such diversity and depth can decrease knowledge uncertainty, they come at the cost of the time and computational resources needed to accrue and update information. This computational cost motivates the next direction of research.
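The relationship between planning horizon and search depth can be sketched with a depth-limited search over a small decision tree. The tree, the rewards, and the function name below are hypothetical, and the sketch assumes a deterministic environment whose structure is fully and accurately known, which is the case in which longer horizons are expected to help.

```python
def plan(tree, rewards, node, horizon):
    """Best cumulative reward reachable from `node` within `horizon` steps."""
    children = tree.get(node, [])
    if horizon == 0 or not children:
        return 0.0
    # Each extra step of horizon adds one more level of recursion, so the
    # number of nodes evaluated grows with the depth of the search.
    return max(rewards[c] + plan(tree, rewards, c, horizon - 1) for c in children)


# Hypothetical environment: branch "a" pays off immediately, while
# branch "b" pays off more, but only one step later.
tree = {"start": ["a", "b"], "a": ["a1"], "b": ["b1"]}
rewards = {"a": 1.0, "b": 0.0, "a1": 0.0, "b1": 5.0}

shallow = plan(tree, rewards, "start", horizon=1)  # sees only the immediate reward
deep = plan(tree, rewards, "start", horizon=2)     # finds the delayed, larger reward
```

With a horizon of one step the planner settles for the immediate reward; extending the horizon by a single step reveals the larger delayed reward, at the price of evaluating more of the tree.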
Second, creatively recombining knowledge benefits from unlearning or updating outdated knowledge. This form of creativity complements a type of curiosity that is characterized by deconstructing and rebuilding current structures (Zurn, 2021). When an agent seizes on a supposedly optimal choice that is actually suboptimal, future resources must be used to unlearn those experiences (Fox et al., 2015). This is precisely a problem that deprivation curiosity can exacerbate (Kruglanski & Webster, 2018; Zedelius, Gross, & Schooler, 2022). A solution to this problem involves aiming for simpler, compressed policies by chunking actions (Lai & Gershman, 2021). Compression involves smartly discarding some information in order to redescribe the remainder efficiently, such as by describing an elephant and a chicken with one joint description rather than describing each alone (Cover & Thomas, 1991; Mack, Preston, & Love, 2020). In order to modulate the planning horizon, policies could be compressed to increase certainty, albeit over an impoverished model. This idea is related to strategically decomposing, aggregating, and reducing sequences of actions into a hierarchy of “options” (Botvinick, Niv, & Barto, 2009; Sutton, Precup, & Singh, 1999) to balance the growing cost of planning (Botvinick, 2012; Correa, Ho, Callaway, Daw, & Griffiths, 2023). The idea also relates to a computational form of curiosity that involves improving prediction of expected long-term value (Gruber & Ranganath, 2019; Schmidhuber, 2008).
Prediction is related to compression because the best compression is the true data generating model, and the true data generating model is the most predictive (Shannon, 1948). Notably, neural activity has been measured to be most compressed in the default-mode network (Mack et al., 2020; Zhou et al., 2022), a network of regions central to the proposed novelty-seeking model. Default-mode activity is also associated with the simulation of hypothetical episodes (Schacter & Addis, 2007) and the replay of episodic memories (Schapiro, McDevitt, Rogers, Mednick, & Norman, 2018), which can help to plan or update actions from new experiences (Kauvar, Doyle, Zhou, & Haber, 2023; Wilson et al., 2020).
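The link between prediction and compression noted above can be illustrated numerically. The sketch below computes the average number of bits per symbol needed when data generated by a distribution `p` are encoded with a code designed for a model `q`, using idealized code lengths of `-log2(q)`. The distributions and the function name are hypothetical choices for illustration.

```python
import math


def expected_code_length(p, q):
    """Average bits per symbol when data drawn from p are coded with a code built for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.25, 0.25]        # assumed true data-generating distribution
uniform = [1 / 3, 1 / 3, 1 / 3]  # a less predictive model of the same data

bits_true = expected_code_length(p, p)           # equals the entropy of p
bits_uniform = expected_code_length(p, uniform)  # never smaller than the entropy
```

Coding with the true distribution attains the entropy of the source, whereas any mismatched model pays extra bits. In this sense, the model that predicts the data best is also the one that compresses them best.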
In conclusion, curiosity could be thought of computationally as actions taken to justify the expansion of one's planning horizon. The consequent cost of increased complexity can be managed by creatively compressing action policies, which further supports the pursuit of long-term goals.
Financial support
D. Z. acknowledges funding from the George E. Hewitt Foundation for Medical Research. A. M. B. acknowledges funding from NINDS R01NS119468 (PI: E.R. Chrastil) and NIMH R01MH128306 (PI: M.A. Yassa).
Competing interests
None.