Meta-learning as a model allows researchers to posit how human and other biological learning systems might learn from experience in a structured manner, including by relating experiences across timescales or latent causes non-uniformly. Meta-learning as a tool allows researchers to posit, as computational models of human learning, learning algorithms that are more flexible and data-driven than those readily expressed by machine learning algorithms such as gradient descent with canonical parameters, or inference in a Bayesian model in which exact inference is tractable. These senses of a “meta-learning” model and a “meta-learned” model align with the dichotomy employed in Binz et al.
Meta-learning in both senses, when using the implementation that Binz et al. focus on – a recurrent neural network – further inherits the characteristics of connectionism: Universal approximation, ease of specification, manipulability (including of complexity), and integration of neuroscientific findings, which Binz et al. rightly note as positives. However, this implementation of meta-learning also inherits the challenges of a connectionist approach: Lack of interpretability (the ease with which humans can understand the workings and outputs of a system) and controllability (the ability to modify a model's behavior or learning process to achieve specific outcomes).
These benefits and drawbacks of the bottom-up, emergentist approach of connectionism have been discussed at length, including in this journal (Smolensky, Reference Smolensky1988). As a result of these discussions, a common ground between these and top-down structured approaches such as Bayesian cognitive modeling has emerged: That models posed in different description languages may not be at odds simply because they are posed at different levels of analysis, and in fact should be tested for complementarity (Rogers & McClelland, Reference Rogers and McClelland2008; Griffiths, Vul, & Sanborn, Reference Griffiths, Vul and Sanborn2012).
It is this integrative approach that I view as the most fruitful in examining the validity of meta-learning and meta-learned cognitive models precisely because (1) it allows us to address the challenges of working within a single paradigm (say, the lack of interpretability of a connectionist approach) at the same time as (2) providing stronger grounds on which to refute a cognitive model (say, by its inconsistency with evidence from neural recordings, or its inability to account for how an ecological task is solved). Making use of the former benefit is especially critical, as the meta-learned models commonly employed, including by Binz et al., have the potential to be even more inscrutable than a connectionist model initialized in a data-agnostic way.
Binz et al. discuss two studies of meta-learning and meta-learned models that bridge levels of analysis in this manner: Firstly, a meta-learning algorithm has been tested against experimental neuroscience findings in prefrontal cortex (Wang et al., Reference Wang, Kurth-Nelson, Kumaran, Tirumala, Soyer, Leibo and Botvinick2018). Secondly, a meta-learned recurrent neural network can approximate the posterior predictive distribution picked out as optimal by a Bayesian approach (Ortega et al., Reference Ortega, Wang, Rowland, Genewein, Kurth-Nelson, Pascanu and Legg2019). Connecting neuroscientific findings with computational-level analysis via an algorithm is an exciting result. However, as Binz et al. note, the goodness of fit of the meta-learned approximation employed in both studies is not guaranteed, and has been empirically demonstrated to be poor.
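To make the approximation in the second study concrete, here is a minimal sketch – my own construction for illustration, not code from Ortega et al. (Reference Ortega, Wang, Rowland, Genewein, Kurth-Nelson, Pascanu and Legg2019) – in which a small recurrent network is trained on episodes from a Beta-Bernoulli coin-flipping task. Because the exact posterior predictive under a uniform prior is known in closed form (Laplace's rule of succession), the fit of the meta-learned approximation can be inspected directly; nothing in the training procedure guarantees that the printed gap is small, which is precisely the caveat above.

```python
# Minimal sketch (an illustration, not Ortega et al.'s code): train a GRU
# on Beta-Bernoulli episodes so that its next-step prediction approximates
# the exact posterior predictive, (heads + 1) / (trials + 2), under a
# uniform Beta(1, 1) prior.
import torch
import torch.nn as nn

torch.manual_seed(0)
T = 20  # flips per episode

class PredictiveRNN(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                    # x: (batch, T, 1) of 0/1 flips
        h, _ = self.rnn(x)
        return torch.sigmoid(self.out(h))    # P(next flip = 1) at each step

model = PredictiveRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    # Each episode draws its own latent coin bias theta ~ Uniform(0, 1).
    theta = torch.rand(64, 1, 1)
    flips = (torch.rand(64, T + 1, 1) < theta).float()
    pred = model(flips[:, :-1])              # predict flip t+1 from flips 1..t
    loss = nn.functional.binary_cross_entropy(pred, flips[:, 1:])
    opt.zero_grad(); loss.backward(); opt.step()

# Compare the meta-learned predictions to the exact posterior predictive.
with torch.no_grad():
    seq = (torch.rand(1, T, 1) < 0.7).float()
    heads = seq.cumsum(dim=1)
    trials = torch.arange(1, T + 1).view(1, T, 1).float()
    exact = (heads + 1) / (trials + 2)       # Laplace's rule of succession
    print((model(seq) - exact).abs().max())  # approximation gap; not guaranteed small
```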
In contrast to an approach that makes use of approximation, our work (Grant, Finn, Levine, Darrell, & Griffiths, 2018) draws a formal connection between a connectionist implementation of meta-learning and inference in a hierarchical Bayesian model by making precise the prior, likelihood, and parameter-estimation procedure implied by the use of that meta-learning implementation. Equivalently, this result describes a way to implement a rational solution to a problem of learning-to-learn in a connectionist architecture (though there are likely to be many equivalent implementations). A formal integration across levels like this is tighter than an approximation approach, and therefore provides a firmer footing for integrative constraints across levels of analysis.
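The linear-Gaussian case conveys the flavor of this equivalence. The sketch below is my own numerical illustration of the classical result underlying it, not code from Grant et al. (2018): k steps of gradient descent from an initialization theta0 on a least-squares loss land exactly on a shrinkage estimate that pulls the least-squares solution toward theta0 – the form a MAP estimate takes under a Gaussian prior centered at the initialization, with a covariance determined by the step size, step count, and design matrix (assuming a sufficiently small step size).

```python
# Numerical check (an illustration, not Grant et al.'s code): k gradient
# steps from theta0 on 0.5 * ||X theta - y||^2 equal shrinkage of the
# least-squares solution toward theta0, i.e., a Gaussian-prior MAP estimate
# centered at the initialization.
import numpy as np

rng = np.random.default_rng(0)
n, d, k, lr = 50, 3, 5, 0.01
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
theta0 = rng.normal(size=d)   # plays the role of the meta-learned initialization

# Path 1: k steps of plain gradient descent.
theta = theta0.copy()
for _ in range(k):
    theta -= lr * X.T @ (X @ theta - y)

# Path 2: closed form, theta_k = theta0 + (I - (I - lr X^T X)^k)(theta_ls - theta0).
A = np.eye(d) - lr * X.T @ X
theta_ls = np.linalg.solve(X.T @ X, X.T @ y)
theta_map = theta0 + (np.eye(d) - np.linalg.matrix_power(A, k)) @ (theta_ls - theta0)
print(np.max(np.abs(theta - theta_map)))    # agreement up to float precision
```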
Follow-up investigations have made use of this connection between computational-level and algorithmic-level approaches. For example, in McCoy, Grant, Smolensky, Griffiths, and Linzen (Reference McCoy, Grant, Smolensky, Griffiths and Linzen2020), we used a setup analogous to that of Grant et al. (2018) to meta-learn a syllable typology in a limited-data setting akin to an impoverished language learning environment. To better accommodate the complex dynamics of learning, we relaxed some constraints on the meta-learning algorithm, thus for the moment doing away with the tight connection between the algorithmic and computational levels. However, we stuck with methods – namely, tuning the gradient-based initialization for learning in a neural network – for which ongoing research in machine learning is formally characterizing how prior knowledge (Dominé, Braun, Fitzgerald, & Saxe, Reference Dominé, Braun, Fitzgerald and Saxe2023), including data-driven prior knowledge (Lindsey & Lippl, Reference Lindsey and Lippl2023), interacts with the learning algorithm and environment; my view is that such approaches will soon benefit from tighter connections between the algorithmic and computational levels, echoing the connection derived in Grant et al. (2018).
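For readers unfamiliar with this family of methods, the following is a generic sketch of tuning a gradient-based initialization – a first-order MAML-style loop on synthetic sine-wave regression tasks, of my own construction and not the McCoy et al. (Reference McCoy, Grant, Smolensky, Griffiths and Linzen2020) model: each task adapts a shared initialization with one inner gradient step, and the outer loop moves the initialization itself, which is the object that the analyses cited above relate to prior knowledge.

```python
# Generic sketch (not McCoy et al.'s model): first-order MAML-style tuning
# of a gradient-based initialization on sine-wave regression tasks.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(1, 40), nn.Tanh(), nn.Linear(40, 1))
meta_opt = torch.optim.Adam(net.parameters(), lr=1e-3)
inner_lr = 0.01

for step in range(1000):
    # Sample a task: a sine wave with random amplitude and phase.
    amp, phase = torch.rand(1) * 4 + 1, torch.rand(1) * 3.14
    x = torch.rand(10, 1) * 10 - 5
    y = amp * torch.sin(x + phase)

    # Inner loop: one gradient step away from the shared initialization.
    # (Full MAML would pass create_graph=True to differentiate through it.)
    loss = nn.functional.mse_loss(net(x), y)
    grads = torch.autograd.grad(loss, net.parameters())
    adapted = [p - inner_lr * g for p, g in zip(net.parameters(), grads)]

    # Outer loop: evaluate the adapted weights on fresh data from the same
    # task and update the initialization itself.
    xq = torch.rand(10, 1) * 10 - 5
    yq = amp * torch.sin(xq + phase)
    h = torch.tanh(xq @ adapted[0].t() + adapted[1])
    outer_loss = nn.functional.mse_loss(h @ adapted[2].t() + adapted[3], yq)
    meta_opt.zero_grad(); outer_loss.backward(); meta_opt.step()
```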
Absent these connections, because meta-learning and meta-learned models are underconstrained and data-driven, it is challenging to evaluate the validity and implications of these models for our understanding of how experience shapes learning. Thus, scientists interested in the place of meta-learning and meta-learned models in cognitive science should work to make precise the constraints that these models imply across levels of analysis, including by making use of analytical techniques from machine learning, while also looking to complementary constraints from experimental neuroscience and from ecologically relevant environments. Given that so many aspects remain open, it is an exciting time to be working with and on the meta-learning toolkit.
Acknowledgments
N/A.
Financial support
This research received no specific grant from any funding agency, commercial, or not-for-profit sectors.
Competing interest
None.