1. Introduction
What kinds of things have psychological capacities? Some cases readily inspire controversy. Mimosa plants, for example, fold their leaves in response to mechanical disturbance, and properties of their response depend on their relationship to the stimulus (Gagliano et al. 2014). Whereas novel disturbances elicit more folding, repeated exposure to a harmless stimulus such as water drops attenuates the response to the point that the leaves stop folding altogether. The researchers who produced this effect describe it as habituation, a type of learning that is well studied in psychology. They further suggest that the mechanisms by which plants learn and remember are similar to those used by animals, including calcium signaling and neurochemical transmission. Unsurprisingly, these claims are controversial. Some botanists argue that such application of neurobiological concepts is based on “superficial analogies and questionable extrapolations” that do not add to our understanding of plants (Alpi et al. 2007, 136). Others argue further that plants lack consciousness, memory, and other psychological capacities attributed to them in the so-called plant neurobiology literature (Taiz et al. 2019). Similar controversies surround bacteria and artificial intelligence (Adams and Garrison 2013; Adams 2018).
One strategy for resolving such debates is to ground claims about nonhuman cognition in evidence from mathematical models of cognitive capacities. For example, Joo et al. (2021) recently developed a formal model to help address whether rats have the capacity to evaluate confidence in their own memories and use it to guide decision making. The breadth and nature of metacognition across species is a source of spirited debate (Smith et al. 2014; Carruthers and Williams 2022), and previous experiments with rats delivered equivocal results. Joo et al. (2021) combine behavioral data from a spatial memory task with the success of their quantitative model to justify the claim that rats maximize reward by computing memory confidence. Given the availability of alternative interpretations that do not posit this metacognitive capacity, the researchers further argue that the case illustrates the importance of supplementing behavioral results with formal models. Whether or not this case is resolved in favor of the metacognitive interpretation, it exemplifies a general point, made by several other scientists and philosophers, that comparative cognition research would benefit from a shift toward more mathematical modeling and a focus on quantifiable similarities and differences (Allen 2014; Mikhalevich 2017; Colombo and Scarf 2020; Farrar and Ostojic 2019; Figdor 2018).
In this article, I highlight several obstacles to inferring the presence and nature of cognitive processing from mathematical models. Although a few more challenges arise along the way, I emphasize two in particular: demarcating phenomenological models from process models and overcoming underdetermination by model fit. Both are problems for any application of mathematical models in cognitive science, but they are especially acute in the comparative cognition case. My claim is not that mathematical models cannot help with discovering nonhuman cognition. Rather, these general issues in scientific modeling should not be neglected when evaluating evidence afforded by formal models. Neglecting them oversimplifies the epistemology of cognitive modeling and exaggerates the strength of evidence in favor of hypotheses tested by models (i.e., that some particular cognitive process underlies behavior). I will illustrate these points in application to Carrie Figdor’s (2018) appeal to formal models to guide judgments about the extension of psychological predicates beyond human cases. Her account helps bring into focus various difficulties with drawing inferences from mathematical models. She claims that quantitative similarity in behavior revealed by formal models constitutes strong evidence for shared psychological processes. Arguably, this claim or a qualified version of it is also implicit in the views of others optimistic about the epistemic benefits of formal modeling in comparative cognition. Unfortunately, things are not so simple. I argue instead that even if we solve the demarcation problem and consider only process models, fitting a process model to behavioral data is, on its own, not strong evidence for any cognitive process, let alone processes shared with humans.
The next section sketches Figdor’s position on psychological continuity across taxa and one motivation for concentrating on mathematical models in this context. Section 3 analyzes her argument for the claim that quantitative similarity in behavior counts as strong evidence for shared cognition. I show that the notion of “quantitative similarity” is ambiguous and recommend operationalizing it in terms of model fit. In section 4 I begin by pointing out that the argument from quantitative similarity is only plausible with respect to process models. I then argue that demarcating phenomenological models from process models is a nontrivial task, and current accounts in the philosophy of science literature are unsuccessful at doing so. Section 5 adopts a framework for process models from the cognitive science literature and demonstrates how fitting process models to behavioral data underdetermines what kind of underlying process generated the data. I conclude with a brief discussion of why the boundary between phenomenological models and explanatory models is blurrier in cognitive science than in other sciences. The upshot is that philosophical accounts of phenomenological modeling have been overly focused on cases in physics, rendering them ill-suited for cognitive science. I further suggest that background theories within specific sciences influence whether models are judged phenomenological or not.
2. Other minds and mathematical modeling
Figdor claims that due to various empirical discoveries, there is increasing pressure to reconsider the meaning and reference of psychological predicates. Traditional semantics of psychological predicates are anthropocentric and no longer scientifically respectable.Footnote 1 As she sees it, “[A]ll the relevant scientific evidence shows that psychological capacities are possessed by a far wider range of kinds of entities than often assumed” (Figdor 2018, 5). In support of this conclusion, she describes some empirically discovered behaviors of plants and bacteria that the researchers characterize psychologically (e.g., bacteria learning about their environments). However, her argument rests primarily on cases in which researchers also fit formal models to their behavioral data.
By Figdor’s lights, a key advantage of mathematical models is that they can powerfully challenge anthropocentric intuitions about what counts as cognition. In her view, “[A] mathematical model provides strong evidence that two domains have important similarities whether or not intuition agrees” (ibid., 135). Consider, for instance, the fact that neuroscientists frequently describe neurons as predicting stimuli. It may seem odd to think of neurons as formulating predictions like whole animals, but Figdor argues that such impressions are an unreliable guide. Mathematical models help guard against the bias of intuitions by revealing quantitative similarity in behavior. Such similarity is independent of qualitative similarity to humans and supports inferences to shared cognitive processes. Regardless of whether Figdor is right about anthropocentrism and formal modeling as a remedy for it, the claim that mathematical models provide strong evidence for psychological similarity is intriguing and worth exploring.
3. Inferring shared cognition from quantitative similarity
Figdor discusses only two types of formal models in depth: the temporal difference (TD) model and the drift-diffusion model (DDM) used in decision-making studies. I’ll describe the former case in more detail but summarize her conclusions regarding both. Making explicit how she reasons from specific modeling results will reveal some conceptual and metaphysical difficulties in interpreting mathematical models and their data. However, I will show in this section how these initial obstacles can be overcome.
The TD model was introduced by Sutton and Barto (1987) as an improvement over the well-known Rescorla–Wagner model used to explain results from classical conditioning experiments. Although both models are specified by equations that describe the strength of association between a conditioned stimulus (CS) and an unconditioned stimulus (US), the TD model also represents changes in associative strength within trials. By capturing these changes in real time, as opposed to between discrete trials, the TD model can predict animal behaviors in a wider range of experimental conditions.
Briefly, the model works as follows. Footnote 2 At each time step, the algorithm uses a representation of available stimuli to formulate a prediction about upcoming USs. It compares this US prediction with the US prediction formulated at the previous time step. The comparison yields the temporal difference that is then compared to any actual US received. The value of this second comparison is the prediction error at that time step. Prediction errors are then used to update the weight on each element of the stimulus representation and thereby drive learning.
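To make this update rule concrete, here is a minimal sketch in Python of TD-style learning with a complete serial compound stimulus representation. It is an illustration under simplifying assumptions rather than a reproduction of Sutton and Barto’s implementation; the trial structure, parameter values, and variable names are mine.

```python
import numpy as np

# Minimal sketch of a TD-style learning rule for a conditioning trial: a CS
# onset at t_cs, a US (reward) at t_us. Stimuli are coded as a "complete
# serial compound": one indicator feature per time step since CS onset.
# All parameter values are illustrative.

n_steps = 20                 # time steps per trial
t_cs, t_us = 5, 15           # CS onset and US delivery times
alpha, gamma = 0.1, 0.98     # learning rate and temporal discount
n_trials = 200

w = np.zeros(n_steps)        # one weight per serial-compound feature

def stimulus(t):
    """Indicator features: feature i is on if the CS started i steps ago."""
    x = np.zeros(n_steps)
    if t >= t_cs:
        x[t - t_cs] = 1.0
    return x

for trial in range(n_trials):
    errors = np.zeros(n_steps)
    for t in range(1, n_steps):
        r = 1.0 if t == t_us else 0.0         # actual US at this step
        v_prev = w @ stimulus(t - 1)          # US prediction made at the previous step
        v_now = w @ stimulus(t)               # US prediction at the current step
        delta = r + gamma * v_now - v_prev    # temporal-difference prediction error
        w += alpha * delta * stimulus(t - 1)  # errors update the weights behind v_prev
        errors[t] = delta

# After training, the prediction error is concentrated at CS onset rather than
# at the time of the US.
print(np.round(errors, 2))
```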
Although the TD model was originally developed for classical conditioning experiments with lab animals, it turns out that it can also model some neural activity. Based on a series of experiments in the latter half of the twentieth century, neuroscientists theorized that midbrain dopamine (DA) neurons process reward prediction errors (Schultz et al. 1997). Through electrophysiological recordings in monkeys, it was shown that unexpected rewards boost DA neuron activity. However, as animals learn that reward is associated with a prior CS, DA neurons gradually respond less to the reward. They instead fire selectively upon presentation of the CS. If a well-trained animal expects a reward that is then omitted, DA activity is suppressed. These results have since been replicated in rodents and humans, and the reward prediction error model is still widely accepted (Schultz 2016). Given the TD model’s reliance on prediction errors and success with classical conditioning, it wasn’t long before neuroscientists tried using it to understand DA signals.
Suri and Schultz (2001) trained an artificial neural network with the TD learning algorithm under conditions used in previous monkey experiments. They then compared the model’s performance with electrophysiological data collected from monkeys. They found that the model’s reward prediction error signal reproduced characteristics of midbrain DA neuron activity, while its reward prediction signal resembled cortical and striatal activity. For example, after the model was trained on one CS followed by reward, its reward prediction signal was boosted upon CS presentation compared to pretraining, and the signal progressively increased until reward onset. Putamen neurons show the same pattern of activity in monkeys trained on the task. The model also reproduced features of electrophysiological data from a more complicated task involving three different CSs and two types of reward.
What should we make of such findings? In this case, Figdor (2018, 53) concludes that “real neural populations appear to be adaptive elements that learn to predict future rewards in the quantitatively similar sense that humans, monkeys, rats, and other adaptive elements do.” She goes on to say, “[The TD model simulation] is finding structure in neural behavior that is quantitatively analogous to the structure of reinforcement learning in a behaving animal” (ibid., 53–54). In her response to a critical notice, she characterizes shared structure in behavior as a “criterion” for inferences to cognition (Figdor 2020). Regarding the DDM, she says, “[It] captures in formal terms the dynamics of the behavior from which we infer to [human] decision-making. We then use this formal structure as a criterion for inferences to decision-making in other cases” (ibid., 692). Based on a study in which researchers fit the DDM to behavioral data from fruit flies (DasGupta et al. 2014), Figdor infers that fruit flies make decisions. As she puts it, “[G]iven its fit to both human and fruit fly data, the model helps justify the ascription of decision-related component cognitive processes posited by the model (e.g., evidence accumulation) to the intended target populations of decision-makers” (Figdor 2018, 47). Such inferences are defeasible because justifying is weaker than proving: “Satisfying the DDM does not prove fruit flies make decisions (though it is an excellent source of confirmatory evidence)” (Figdor 2020, 692).
An immediate problem with the above proposal is that it’s unclear when behaviors from systems in different taxa count as quantitatively similar. Because no two datasets are identical, we need some way of deciding when they are similar enough to justify inferences to shared psychological capacities. Several of Figdor’s passages suggest that fitting the same formal model structure to different behavioral datasets is sufficient to count behaviors as quantitatively similar. However, she repeatedly characterizes scientific models as revealing structure in the world. For example, in addition to the preceding passage describing the TD model as “finding structure in neural behavior” (Figdor 2018, 53), she says, “indeed, social scientists are busy employing network modeling tools to explore the structure of human social relationships” (180). Behaviors may instead be considered sufficiently quantitatively similar when they have the same formal structure. Whichever interpretation Figdor intended to endorse, I will clarify the distinction in what follows and argue in favor of the first option.
Describing quantitative similarity in terms of shared formal structure trades one concept in need of operationalization for another. Without further explication of “formal structure,” it brings us no closer to determining when behaviors from systems in different taxa are quantitatively similar enough to infer shared cognition. Focusing on the formal structure of behavior also invites metaphysical worries. Taken literally, the idea that behaviors instantiate mathematical structure is an assumption about the way in which mathematical entities exist, and that is a topic of controversy in the philosophy of mathematics.Footnote 3 Mary Leng (2010), for instance, denies that mathematical entities exist at all, precluding their instantiation in the world. Another ontological possibility coming from the philosophy of science literature is that mathematical structure is a feature of models but not their targets. So-called discoveries of mathematical structure in behavior are just conceptual reifications “mistaking an aspect of a model—its structure, its construal, or the union of both—for an aspect of empirical data or the natural world; mistaking the math for the territory, so to speak” (Andrews 2021, 29). Several analyses from philosophers help make sense of how theoretical models can be useful in science despite containing descriptions and equations that are true of no object (e.g., Potochnik [2017]; Cartwright [1983]; Toon [2012]; Rice [2015]) or only partly true (Levy 2015).
The assumption that mathematical structures are instantiated in nature is also ill-defined. What does it mean for a mathematical structure to be “in” or “instantiated by” (the behavior of) a system? One possibility is that mathematical structures are abstract entities that exist independently of the physical world, and physical systems sometimes exemplify their structures (Shapiro 1997). An alternative is that there exists some structural relation (isomorphism, homomorphism, etc.) between a physical system and a mathematical system, though the latter need not exist as an abstract entity (Pincock 2012). Several other possibilities are defended in the philosophy of mathematics literature. Interpreting scientific models as revealing mathematical structure in the world shoulders these metaphysical burdens, but that is unnecessary for explaining the success of models and operationalizing quantitative similarity.
By contrast, model fit is well-defined given standard techniques for fitting models to data. There is nothing metaphysically mysterious about fitting formal models to data. Whether models fit data doesn’t hinge on the possibility that mathematical structures are instantiated by target systems or on an account of what that would mean. Of course, the choice of fitting method will influence how well a model fits experimental data, and there is always room for debating how much goodness of fit is good enough within a context. But the concept of fit is still rooted in modeling practice and therefore poised to operationalize quantitative similarity. Figdor more often describes formal models as applying to various domains, but fit is the more appropriate concept because fit is understood to be a matter of degree, whereas talk of models applying suggests an all-or-none relation. In sum, I recommend operationalizing quantitative similarity with model fit because, unlike Figdor’s current account, it makes precise the cases under consideration and circumvents orthogonal debates in the philosophy of mathematics.
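As a schematic illustration of this operationalization, the sketch below fits one model form to behavioral data from two hypothetical systems and reports a goodness-of-fit statistic for each. The data, the model form, and all parameter values are invented for the example; nothing here is drawn from the studies discussed in this article.

```python
import numpy as np
from scipy.optimize import curve_fit

# Schematic illustration only: "quantitative similarity" operationalized as the
# same model form fitting two behavioral datasets to some degree. The datasets
# below are invented; real cases would use measured responses.

def learning_curve(trial, a, b):
    """A two-parameter exponential learning curve (an arbitrary model form)."""
    return a * (1 - np.exp(-b * trial))

trials = np.arange(1, 21)
rng = np.random.default_rng(0)
data_a = learning_curve(trials, 0.9, 0.30) + rng.normal(0, 0.03, trials.size)
data_b = learning_curve(trials, 0.7, 0.25) + rng.normal(0, 0.03, trials.size)

for label, data in [("system A", data_a), ("system B", data_b)]:
    params, _ = curve_fit(learning_curve, trials, data, p0=[0.5, 0.1])
    resid = data - learning_curve(trials, *params)
    r2 = 1 - np.sum(resid**2) / np.sum((data - data.mean())**2)
    print(f"{label}: fitted parameters = {np.round(params, 2)}, R^2 = {r2:.3f}")

# Whether the two fits count as "similar enough" is a judgment made within a
# modeling context: fit is a matter of degree, not an all-or-none relation.
```

Notice that nothing in this exercise requires assuming that either system instantiates the mathematical structure of the fitted curve.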
We are now ready to evaluate the following claim:
(QS) Quantitative similarity in behavior constitutes strong evidence that different systems share specific psychological processes.
Behaviors from systems within or across taxa count as quantitatively similar when a mathematical model fits their data, and the psychological processes at stake are specified in the hypothesis tested by the model. We may further assume that (QS) applies only to cases in which the data under comparison are from the same kind of behavioral task. Importantly, Figdor’s account is not so much a target here as a launchpad for critically assessing when mathematical models provide compelling evidence about cognitive processes. Anyone interested in debates over nonhuman cognition might find (QS) appealing.
4. Phenomenological versus process models: a blurry boundary?
Model fit is not a reliable guide to cognition in general because it matters what kind of formal model is being fit. Some models, such as Snell’s law, are phenomenological in the sense that they have instrumental value (e.g., they aid in prediction) but reveal nothing about underlying processes or mechanisms (Bokulich 2011). Phenomenological models are usually constructed by fitting a model structure to data ad hoc. Although phenomenological models are most often discussed in the context of physics, psychologists and neuroscientists also use the concept (e.g., Luce 1995; Mauk 2000; Bassett et al. 2018). Despite their instrumental value, phenomenological models in cognitive science neither describe nor provide evidence for internal cognitive processes. Instead, they formally redescribe a target system’s behaviors or interactions. Thus, (QS) is false as a general claim about formal models in cognitive science. If fitting a phenomenological model isn’t evidence for any psychological process, then fitting one to behavioral data from another system isn’t evidence that the two systems share any specific psychological process.
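For concreteness, Snell’s law can be written as $n_1 \sin\theta_1 = n_2 \sin\theta_2$, where $n_1$ and $n_2$ are the refractive indices of the two media and $\theta_1$ and $\theta_2$ are the angles of incidence and refraction. The equation accurately predicts how light bends at an interface while saying nothing about why it does so, which is what makes it a paradigm phenomenological model.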
Once phenomenological models are taken into consideration, (QS) is only plausible as a claim about models that represent possible cognitive processes underlying behavior. Following the lead of cognitive scientists, let’s call these models “process models.” Unfortunately, beyond this basic characterization, there is no agreed-upon definition of the term or consensus on which models count as process models (Jarecki et al. 2020). In what follows, I will argue that standard accounts in the philosophy of science literature fail to demarcate phenomenological models from process models. I’ll demonstrate this with the example of linear models in mathematical psychology. I chose models fit to behavioral data from humans to underscore the problem faced by researchers in comparative cognition. If it is unclear whether models of human cognitive capacities are phenomenological, all the worse for models fit to data from plants, insects, and so forth, where the presence of cognition is in question. The upshot is that contrary to Figdor’s suggestion, quantitative similarity is not readily applicable as a criterion for inferences to cognition.
Linear models in mathematical psychology began proliferating in the 1960s. Hoffman (1960) first proposed that judgments in decision-making tasks could be modeled as linear functions of cues. Subjects in a typical task are given a set of cues either sequentially or simultaneously and asked to predict an outcome or the value of some property. In one of Hoffman’s tasks, for example, subjects used nine cues about 100 persons, such as high school rating and mother’s education level, to judge each person’s “intelligence.” The structure of a linear model is a weighted (usually multiple) linear regression equation in which variables represent cues, and weights represent the significance subjects assign to each variable with respect to what they’re judging. Hoffman showed that such models accurately predict the judgments of subjects, and the result was replicated many times over in various tasks (see Dawes and Corrigan [1974] for references and discussion). Linear models of judgment are also building blocks of the lens model equation, which is used extensively in learning studies (Karelaia and Hogarth 2008). The equation is a formalization of Brunswik’s (1952) lens model, and it is useful for quantifying how much different variables influence the accuracy of judgments. For instance, Luan et al. (2020) recently showed that people judge the monetary value of objects more accurately when cues are presented sequentially instead of simultaneously. They then performed a lens model analysis to demonstrate that the improvement was primarily due to increased consistency in judgments.
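To illustrate the modeling strategy, here is a minimal sketch of a “paramorphic” linear model fit by ordinary least squares. The cue values, judgments, and weights are invented for the example, not taken from Hoffman’s studies.

```python
import numpy as np

# Minimal sketch of a "paramorphic" linear model of judgment: judgments are
# regressed on cue values, and the fitted weights are read as the importance
# a subject attaches to each cue. Cue values and judgments here are invented.

rng = np.random.default_rng(1)
n_profiles, n_cues = 100, 4
cues = rng.normal(size=(n_profiles, n_cues))   # e.g., rating, education level, ...

# Hypothetical subject whose judgments happen to track a weighted combination
# of the cues plus noise; the true generating process is unknown to the modeler.
judgments = cues @ np.array([0.6, 0.3, 0.1, 0.0]) + rng.normal(0, 0.2, n_profiles)

# Fit the linear model (ordinary least squares with an intercept term).
X = np.column_stack([np.ones(n_profiles), cues])
weights, *_ = np.linalg.lstsq(X, judgments, rcond=None)
predicted = X @ weights

ss_res = np.sum((judgments - predicted) ** 2)
ss_tot = np.sum((judgments - judgments.mean()) ** 2)
print("fitted cue weights:", np.round(weights[1:], 2),
      "R^2:", round(1 - ss_res / ss_tot, 3))

# A good fit shows that the judgments can be described as a linear function of
# the cues; by itself it does not show that the subject computes such a function.
```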
Although it is possible that subjects solve linear functions in their minds to reach judgments in decision-making tasks, and some researchers have defended this sort of conclusion (Einhorn et al. 1979; Goldberg 1971; Payne et al. 1993), the evidence is hardly decisive. Indeed, when Hoffman (1960) proposed using linear models of judgment, he called them “paramorphic representations” to emphasize that it’s unclear whether they accurately represent underlying mental processes. In the more recent literature, psychologists typically say people behave as if they use a linear model in these tasks (Hogarth and Karelaia 2007).Footnote 4 Some take a firmer stance, claiming that linear models merely predict judgments without capturing the cognitive operations leading to them (Glöckner and Betsch 2011), or attempt to explain away the success of linear models (Dawes 2018; Dawes and Corrigan 1974).
Here I’ll just give a couple of reasons for remaining skeptical of the claim that linear models accurately represent internal cognitive processes. First, linear models aren’t based on any biological mechanisms. In the terminology of Marr’s (2010) framework, there is no evidence that any neural mechanisms implement an algorithm computing the hypothesized linear functions. Some degree of looseness between levels of description is both tolerable and expected (Allen 2014), but in this case an implementation story is completely absent. Second, heuristic models, which don’t consist of linear functions, can also fit the same data and in some cases fit even better (Gigerenzer and Goldstein 1999). They are also arguably more psychologically plausible with respect to properties such as computational tractability (Gigerenzer et al. 2008). For example, some heuristic models predict judgments based on fewer of the available cues compared to linear models (Hogarth and Karelaia 2007). Instead of solving linear functions in decision-making tasks, it is possible that people utilize various heuristics, though it should be stressed that the options aren’t exclusive. There is evidence that people switch strategies depending on the circumstances, including within an experimental task (Lee et al. 2019; Newell et al. 2003).
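To make the competition between process hypotheses concrete, here is a toy comparison in the spirit of (but not reproducing) the simulations in the fast-and-frugal heuristics literature: a weighted linear rule and a take-the-best heuristic are applied to the same paired-comparison choices. The cue environment, validities, and weights are all invented.

```python
import numpy as np

# Toy comparison, not a reproduction of published simulations: a weighted
# linear rule and a take-the-best heuristic choose between pairs of objects
# described by binary cues. The cue environment below is invented.

rng = np.random.default_rng(2)
n_objects, n_cues = 50, 5
cues = rng.integers(0, 2, size=(n_objects, n_cues))
validities = np.array([0.9, 0.8, 0.7, 0.6, 0.55])   # hypothetical cue validities
weights = validities - 0.5                          # one possible linear weighting

def linear_choice(a, b):
    """Choose the object with the higher weighted cue sum (0 = first, 1 = second)."""
    return int(cues[b] @ weights > cues[a] @ weights)

def take_the_best(a, b):
    """Search cues in order of validity; decide on the first cue that discriminates."""
    for c in np.argsort(-validities):
        if cues[a, c] != cues[b, c]:
            return int(cues[b, c] > cues[a, c])
    return 0   # no cue discriminates; default to the first object

pairs = [(a, b) for a in range(n_objects) for b in range(a + 1, n_objects)]
agreement = np.mean([linear_choice(a, b) == take_the_best(a, b) for a, b in pairs])
print(f"choice agreement between the two models: {agreement:.2f}")

# When agreement is high, choice data alone cannot tell us which (if either)
# process a subject is using, hence the appeal to process predictions below.
```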
The preceding discussion provides a test case for philosophical accounts of phenomenological models. Linear models of judgment fit behavioral data from humans engaged in cognitive tasks, but are they merely phenomenological models? Early writings in philosophy of science cast phenomenological models as independent of theory. However, Margaret Morrison (1999) has persuasively argued that this view is inadequate (see also Frigg and Hartmann 2020). In the more recent literature, philosophers typically characterize phenomenological models by appealing to what they describe and whether they count as explanatory. These features are complementary, but let’s consider them in turn.
According to Kaplan and Craver (2011), the signature of phenomenological models is that they describe behaviors of systems but not the mechanisms underlying their behavior. A problem with this criterion and any other based on description is that what a model represents arguably depends on the intentions of its user (Giere 2010; Callender and Cohen 2010; Weisberg 2013). In the case of linear models, many psychologists take them to redescribe behavior, whereas Einhorn et al. (1979) claim that linear models represent underlying cognitive processes, albeit at a higher level of abstraction than models specifying algorithms. Thus, the very same model might be classified as phenomenological or not depending on whom you ask. Even if agent-based accounts of scientific representation are wrong, the fact remains that scientists sometimes disagree about what models represent. Given the possibility of disagreement, appealing to what models describe fails to settle whether some are phenomenological or process models. The same reasoning extends to accounts that emphasize what phenomenological models aim at (e.g., Bokulich [2011]).
Explanatory status also supports opposing classifications because when models count as explanatory—in the sense of answering why certain things happen—is emphatically disputed. Proponents of the mechanistic framework argue that models across the mind-brain sciences have explanatory force only to the extent that they describe details of mechanisms. Their case studies include models in cognitive science (Kaplan and Bechtel 2011), psychology (Piccinini and Craver 2011), cognitive and systems neuroscience (Craver and Kaplan 2020; Kaplan and Craver 2011), and computational neuroscience (Kaplan 2011). Unsurprisingly, others have found their mechanistic demands on explanatory adequacy overly narrow. These critics highlight other explanatory patterns in the mind-brain sciences, including functional explanations (Weiskopf 2011), dynamical explanations (Ross 2015; Silberstein and Chemero 2013), computational explanations (both causal and noncausal) (Chirimuuta 2014; Serban 2015; Chirimuuta 2018), and topological explanations (Kostić 2018; Kostić and Khalifa 2023). Depending on one’s views about sources of explanatory force, linear models may be classified as phenomenological or process models. Even if it is granted that they are explanatory in some sense, there is room for disagreement about which kinds of explanation matter for being a process model.
In this section I’ve argued that (QS) is false because phenomenological models in cognitive science provide no evidence for psychological capacities. Rescuing (QS) depends on limiting its scope to process models, yet current philosophical accounts fail to demarcate phenomenological models from process models. This hampers the usefulness of quantitative similarity as a criterion for inferences to cognition. (QS) may be true of process models, but neither Figdor nor the modeling literature in philosophy of science makes clear which ones those are. The situation seems to fit Morrison’s (1999) view that the distinction between phenomenological and theoretical models is of dubious philosophical value because it is difficult to draw a sharp boundary between the two.
In response, one might argue that even if it’s unclear whether a mathematical model is phenomenological or not, the fact that it fits behavioral data from different systems supports the inference that they share some underlying cognitive process; it just doesn’t reveal which one. Such inferences are problematic, though, in light of what Taylor et al. (2022) call the “many-to-one mapping problem”: different possible cognitive processes can generate any particular behavior. A better strategy is to distinguish process models in a way that helps modelers cope with the many-to-one mapping problem. That might salvage (QS) by weeding out phenomenological models, and more generally, it doesn’t give up on the intuitively valuable distinction between phenomenological and explanatory models. Fortunately, more promising ways of distinguishing process models can be found in the cognitive science literature.
5. Fitting and comparing process models
I propose adopting the conceptual framework developed by Jarecki et al. (2020). In my view, it offers the most thorough characterization of process models in the current cognitive science literature and evades the problems outlined in the previous section. In line with previous proposals, Jarecki et al. claim that process models represent testable assumptions about how cognitive systems transform inputs. They call this the “intermediate stage” between stimulus input and behavioral output. So far, this is too flexible. Linear models arguably represent the assumption that people transform inputs by computing linear functions during decision-making tasks, and that is a testable hypothesis in the sense that linear models will either fit behavioral choice data (within some degree of goodness) or not. Or perhaps linear models are merely phenomenological because they formally represent overt choice behaviors and nothing more. The blurry boundary strikes again. However, Jarecki et al. (2020) add the further condition that process models must make separate predictions at two levels: behavioral output and the intermediate stage. The theoretical significance of the latter kind, called “process predictions,” has also been urged by other cognitive scientists (Pachur et al. 2013; Sun 2008; Johnson et al. 2008). Process predictions include predictions about attention, speed, error types, and so forth. They are specific to models (e.g., the DDM predicts reaction time distributions), but all are consequences of the cognitive process hypothesized by modelers. Note that process predictions, like behavioral output predictions, are typically about behavioral measures. Which behaviors correspond to each type of prediction is defined within a modeling context. One modeler’s behavioral output prediction may be another’s process prediction.
The preceding summary leaves out important details of the framework, but it is enough to demonstrate that process models are distinguishable in a principled way. To see this, consider the following points. If a hypothesis about cognitive processing successfully predicts only one kind of behavior, then it is no more plausible than any competing hypothesis that makes the same prediction. As mentioned earlier, both linear and heuristic models of judgment predict the same choices made by people in some decision-making tasks, but they are based on competing hypotheses about the causal processes underlying those choices. Process predictions help deal with the many-to-one mapping problem by providing further points of comparison between models, allowing modelers to test competing hypotheses when they explain other data equally well. Models that fail to make process predictions might still accurately represent cognition at some level of abstraction, but they are too underspecified to be rigorously tested.Footnote 5 Jarecki et al. (2020) apply their framework to demonstrate that at least one heuristic model of judgment qualifies as a process model, whereas equal weighting models (a species of linear models) do not. Because the details matter, every model type must be inspected individually to determine whether it meets the proposed criteria for process models.
The remainder of this section argues that (QS) is still false when charitably interpreted as a claim about process models. Fitting a process model to behavioral output data is, on its own, not strong evidence for any cognitive process, let alone shared processes. As examples of process models, I’ll use DDMs and Bayesian models of perceptual decision making. According to the framework just described, DDMs qualify as process models because they represent a specific process of evidence accumulation toward decision boundaries (see Ratcliff and McKoon [2008] for details) and make separate predictions about choices (behavioral output) and reaction time distributions (intermediate stage). The Bayesian model built by Bitzer et al. (2014) and further developed by Fard et al. (2017) is also a process model of decision making in two-alternative forced choice tasks. Instead of the diffusion process represented by DDMs, these Bayesian models assume that cognitive systems generate predictions about stimuli and compare them to noisy input. An inference mechanism calculates the likelihood of each stimulus alternative given the observations up to some timepoint. The calculated posterior beliefs are then compared to a decision policy that determines what choice will be made. The Bayesian models developed by Bitzer and colleagues also make independent predictions about choices and reaction times.
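For illustration, here is a minimal simulation of the evidence-accumulation process a DDM posits, with symmetric boundaries and illustrative parameter values (not values fitted to any of the datasets discussed in this article). A single parameter set yields both kinds of prediction at once: choice proportions and response-time distributions.

```python
import numpy as np

# Minimal Euler-Maruyama simulation of a drift-diffusion process with symmetric
# decision boundaries. One parameter set yields both behavioral-output
# predictions (choice proportions) and a process prediction (RT distributions).
# Parameter values are illustrative.

rng = np.random.default_rng(3)
drift, boundary, noise = 0.25, 1.0, 1.0   # drift rate, threshold, diffusion noise
non_decision, dt = 0.3, 0.002             # non-decision time (s), step size (s)
n_trials = 1000

choices, rts = [], []
for _ in range(n_trials):
    evidence, t = 0.0, 0.0
    while abs(evidence) < boundary:
        evidence += drift * dt + noise * np.sqrt(dt) * rng.normal()
        t += dt
    choices.append(1 if evidence >= boundary else 0)   # upper vs. lower boundary
    rts.append(t + non_decision)

choices, rts = np.array(choices), np.array(rts)
print(f"P(upper-boundary choice) = {choices.mean():.2f}")
print(f"mean RT = {rts.mean():.2f} s; 10th/90th RT percentiles = "
      f"{np.percentile(rts, 10):.2f}/{np.percentile(rts, 90):.2f} s")

# Fitting would compare these simulated choices and RT distributions to observed
# data; the argument below is that good fit alone leaves the underlying process
# underdetermined.
```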
It turns out that under certain assumptions about the parameters, it is possible to “translate” DDMs into Bayesian models that make the same predictions, and vice versa (Fard et al. 2017; Bitzer et al. 2014). More specifically, from parameters estimated from behavioral data in one kind of model, one can determine what parameters the other kind of model should take to predict the same choices and reaction time distributions. These models have distinct formal structures, including different numbers of parameters. Thus, they are not “exactly the same mathematical thing” or “simple rotations of each other,” as Smith et al. (2016, 1347) argue is the case in their example of competing two-parameter models. The DDMs and Bayesian models also make very different assumptions about the decision-making process, as summarized in the previous paragraph. In the terminology adopted by Figdor (2018), their model construals are completely different.
Despite good fit to both behavioral output and process data, the kind of cognitive process generating decisions in forced choice tasks is underdetermined. From fit alone, we do not have strong evidence either that target systems use a sequential sampling process represented by DDMs or that they formulate predictions and use Bayesian inference to calculate the likelihood of each stimulus. That is why Bitzer et al. (2014) emphasize a theoretical virtue of their Bayesian models: unlike DDMs, they explicitly model how sensory input is converted into evidence. Fard et al. (2017) further motivate translating DDMs into Bayesian models by showing that modeling input more precisely leads to improved fit. These dialectical moves reflect an understanding among researchers that epistemic considerations beyond good fit are necessary for motivating their theoretical accounts. It is worth recognizing, though, that historically many psychologists have supported theories primarily by demonstrating model fit (see Roberts and Pashler [2000] for a widely cited critique of the practice).
Cognitive scientists further acknowledge the epistemic limitations of model fit by emphasizing the importance of comparing models. Busemeyer and Stout (2002, 260) make the point sharply: “It is meaningless to evaluate a model in isolation, and the only way to build confidence in a model is to compare it with reasonable competitors.” I’ll conclude my rejection of (QS) by drawing attention to the fact that model comparison adds yet another layer of epistemic challenges. An initial problem is that the best-fitting model is not always the most accurate. If researchers simply pick the model that fits their data best, they will end up choosing overly complex models. (Relevant factors of model complexity include the number of parameters and functional form.) The point has been demonstrated repeatedly in simulation studies: as long as there is some error in the data, which is inevitable in experimentation, more complex models will fit better than the model that generated the data (Myung 2000; Pitt and Myung 2002). Such models are overfit. They fit a particular dataset well but are sensitive to random error in it, so they are unlikely to fit new data.
Because a good fit can mislead researchers into favoring the wrong hypothesis, model selection techniques are used to achieve a balance between goodness of fit and complexity (see Myung et al. [2016] for a recent review). However, there are many factors to consider when picking a model selection method. The reliability of some methods depends on sample size (Busemeyer and Wang 2000). Different classes of methods often disagree on which competing model is best because they seek out and punish different properties, and consistency between them further depends on circumstances such as effect size (Evans 2019). There are also broader methodological issues at play. Bayesians argue that their techniques for assessing the credibility of model parameters are better at deciding between competitors than model selection methods that attempt to balance goodness of fit and complexity (Kruschke 2011; Kruschke and Liddell 2018).
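The overfitting point and the role of selection criteria can be illustrated with a textbook-style example (not a reproduction of the cited simulation studies): data generated by a simple linear model are fit by polynomials of increasing order, and an information criterion penalizes the extra parameters. All values below are invented for the illustration.

```python
import numpy as np

# Textbook-style demonstration of overfitting: data generated by a simple
# linear model are fit by increasingly complex polynomials. In-sample error
# keeps shrinking, but an information criterion that penalizes parameters can
# still favor the simpler, generating form.

rng = np.random.default_rng(4)
x = np.linspace(-1, 1, 30)
y = 1.0 + 2.0 * x + rng.normal(0, 0.2, x.size)   # generating model is linear

n = x.size
for degree in (1, 2, 5, 8):
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    k = degree + 2                         # polynomial coefficients plus error variance
    aic = n * np.log(rss / n) + 2 * k      # AIC under Gaussian errors
    print(f"degree {degree}: RSS = {rss:.3f}, AIC = {aic:.1f}")

# RSS always decreases as the polynomial degree grows (better in-sample fit),
# whereas AIC typically bottoms out near the generating model, which is one
# reason raw fit is a poor basis for choosing among competing models.
```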
If fitting a process model is not strong evidence that humans use some cognitive process, then discovering that the same model fits the behavior of some nonhuman system is not strong evidence that it shares that cognitive process with humans. Further epistemic considerations are necessary for strong confirmation of hypotheses about what kind of underlying process generates behavior. Exactly what considerations and how they should be weighted are topics that deserve thorough analysis elsewhere. However, the preceding discussion indicates that model selection techniques have a key role to play, and circumstances matter (sample size, effect size, etc.). It should be stressed, though, that no model selection method is optimal in all cases (Evans 2019), and choices should arguably be guided by the variable goals of researchers (Kellen 2019; Navarro 2019).
In this section I’ve concentrated on how model fit underdetermines the nature of cognitive processing and on the many epistemic issues involved in selecting among multiple models that fit the same data. Importantly, this kind of underdetermination is no threat to scientific realism. Underdetermination arguments against scientific realism rest on all epistemic considerations failing to direct theory choice; model fit is just one of many considerations. At most, the epistemic limitations of model fit suggest cases of “practical” underdetermination (see Turnbull [2018] for a useful taxonomy of underdetermination). According to this relatively weak form of underdetermination, present evidence fails to direct theory choice, and that is fully compatible with scientific realism. Thus, my argument doesn’t rely on any antirealist maneuvers. Scientific theories may (eventually) track truths about cognitive processes underlying behavior, but formal modeling is no silver bullet.
6. Concluding remarks
Mathematical models may bolster evidence for cognitive capacities in nonhuman systems, but they also introduce complications of their own. Some are metaphysical, such as the question of whether mathematical structure is a property of models and their targets or of models alone. Failure to distinguish these possibilities and other modeling concepts (e.g., model structure vs. model construal) is a source of confusion among scientists and philosophers (Andrews 2021). Others are more methodological. What counts as “quantitative similarity,” and why is it a relevant kind of similarity for inferring shared psychological processes? These questions indicate a general burden on philosophers analyzing implications of modeling results: notions that aren’t well defined within the considered studies require explication and justification (cf. Bickle [2008] on metascience). Furthermore, drawing inferences about modeling results without attending to more general issues in scientific modeling is hasty. Here I’ve concentrated on phenomenological modeling and underdetermination, but the same conclusion is also defensible by considering issues regarding scientific representation (Drayson 2020).
The problem of underdetermination by model fit is one epistemic challenge that cognitive modelers have addressed by adopting model selection techniques. However, I hasten to emphasize that there is no simple story about how model selection takes place in cognitive science or any straightforward solution to the problem. Again, no currently available model selection method is optimal, and under some conditions there are no practical differences between them (Evans 2019). This highlights the need for further epistemic considerations (process predictions, mechanistic evidence, etc.) and perhaps nonepistemic values, but how they should be jointly assessed is an open question. My preliminary suggestion is that the weight of each type of evidence should be sensitive to the kind of system in question. For example, mechanistic similarity may be a useful criterion for inferring shared cognition in other mammals but misleading when applied to more distant relatives, where very different mechanisms might have evolved to achieve similar ends.
Elucidating what distinguishes phenomenological models from process models is perhaps a manageable problem. The fruitfulness of Jarecki et al.’s (2020) framework remains to be seen, and future proposals may improve upon theirs. But the underdetermination of cognitive processing by model fit is a hard barrier to directly inferring shared cognition from quantitative similarity in behavior. Both phenomenological modeling and underdetermination are classic topics in general philosophy of science with extensive literatures. However, general issues can play out in different and interesting ways across the sciences. I conclude by briefly reflecting on why current philosophical accounts deliver a blurry boundary between phenomenological and process models in cognitive science.
An uninspired remark is that philosophical attempts at demarcating kinds of models rest on intuitions about what models represent and when they count as explanatory. Such intuitions are bound to generate controversy. However, the case of linear models suggests a more interesting explanation of the hazy boundary. My suspicion is that any formalism that accurately describes behavior is a how-possibly model given the computational theory of mind. According to this theory, cognitive systems are the kinds of things whose behavior is governed by internal algorithms computing functions. Given this theoretical framework, any mathematical model fit to behavioral data is doubly interpretable as a representation of both behavior and internal processing that causes behavior.
By contrast, consider the case of light. No one thinks its behavior is determined by internal computations. Hence, Snell’s law is uncontroversially judged a phenomenological model (Kaplan and Craver 2011). Although it is useful as a formal representation of light’s behavior and aids in predicting how light will refract, no one is tempted to think that Snell’s law explains why light behaves as it does. Stephan Hartmann’s (1999) account of models and stories is insightful here. He argues that stories inspired by an underlying fundamental theory (but not deduced from it) play an important role in model acceptance. Stories told around the formalism fit a model into the broader framework of the fundamental theory, and there is no good model without such a story. In this terminology, what distinguishes mathematical models in cognitive science from Snell’s law and other phenomenological models in physics is that stories linking models of behavior to a dominant background theory (i.e., the computational theory of mind) are readily available. Consequently, any cognitive model that formally describes a system’s behavioral data is also a how-possibly model. It is a plausible possibility that the system computes the function specified by a well-fit model, and this would help explain its behavioral data, especially when supplemented with an algorithm by which the function is computed. (See Egan [2017] for more on function-theoretic characterization as an explanatory strategy in cognitive science.) Not so for formal models of light behavior. This allows a relatively clear boundary between phenomenological models and explanatory models in the case of light and perhaps physics more generally.
Given that philosophical thinking about phenomenological modeling has been so concentrated on models in physics, it is unsurprising that current accounts are ill-suited for cognitive science. Hopefully, this article encourages more philosophical work on phenomenological models, process models, and model selection, specifically in cognitive science.
Acknowledgments
An anonymous reviewer for this journal provided careful feedback that improved this article. I am grateful to Mazviita Chirimuuta, Arnon Levy, and especially Colin Allen for helpful discussions and comments on earlier versions. I also benefited from conversations with Mel Andrews, Gal Ben-Porath, Nuhu Osman Attah, and members of Colin’s writing group in the spring of 2023.