Clark and Fischer's (C&F's) central claim is that “people construe social robots … as depictions of social agents” (target article, sect. 1, para. 3). This process involves interacting with three “scenes”: “They perceive the raw machinery of a robot, construe it as a depiction of a character, and, using the depiction as a guide, engage in the pretense that they are interacting with the character depicted” (target article, Abstract). But as C&F note in section 4.5, “It is one thing to tacitly distinguish the three perspectives on a robot (a matter of cognition) and quite another to answer questions about them (a matter of meta-cognition)” (target article, sect. 4.5, para. 1). This distinction between cognition and meta-cognition is important, partly because it determines the usefulness of self-reports as measures of cognitive processes, but C&F are vague about the extent to which people reflect on this process of construing social robots as depictions, and whether they are able to put their reflections into words. In the same section they cite the study by Kahn et al. in which participants aged 9–15 “clearly struggled” to answer questions about the nature of a Robovie robot. Assuming C&F are correct that these participants construed the Robovie robot as a depiction of a character, these participants' responses – and difficulty responding – seem to suggest that they did not understand this clearly, or were unable to put it into words. C&F do not say this outright or explore its implications, instead highlighting that the questions in the study were not clear about which of the three scenes from their framework was being asked about.
There are reasons to suspect that meta-cognition (Dunlosky & Metcalfe, 2008) about construing social robots as depictions would be more difficult than C&F's discussion suggests, or even absent. First, there could be difficulties arising from the nature of the measurement. A survey item or interview question might be the first occasion on which the participant reflects on how they think about the robot. The amount of time and effort that participants give to this reflection could greatly affect their responses. Also, this meta-cognition is vulnerable to memory biases because participants must remember their experiences of the cognitive process. Finally, as C&F note in section 4.5, the survey item or interview question might be ambiguous about whether it refers to the robot's physical mechanism or, to use their terminology, the character it depicts. It would be similarly problematic if participants interpreted a question as inviting them to “play along” with imagining the robot to be a character (see target article, sect. 7.2), as some participants might indeed play along in their responses while others might not, instead answering about the robot as a mere mechanical artifact.
There could also be meta-cognitive difficulties arising from the process itself (i.e., of construing a social robot as a depiction). For one thing, robots do not fit neatly into our existing categories. For example, C&F mention in section 2.2 the study by Gray et al. in which robots were rated low in “‘experience’ (e.g., hunger, pain, fear)” but moderate in “‘agency’ (e.g., self-control, morality, memory)” (target article, sect. 2.2, para. 2). These two characteristics usually occur together in animals and not at all in inanimate objects. Also, the human origins of robots' actions can be difficult to keep in mind. First of all, robots often perform actions without any direct, visible indication that a human caused that action: There is not a puppeteer with their hand inside the robot or manipulating it via strings, and robots often lack signs of remote control such as wires leading around the corner or a nearby human holding a controller (Rueben et al., 2021). Second, as C&F argue in section 7.3, people in an interaction with a robot are under time pressure to process the robot's actions as they occur so that they can respond appropriately. In the language of section 6.3, this might require people to mostly do “engagement” to the exclusion of “appreciation,” perhaps making it difficult to produce an account of C&F's three scenes upon reflection.
Finally, the “social artifact puzzle” is puzzling: Even if someone can articulate that they have interacted with a robot as if it were a social agent while also knowing that it is a mechanical artifact, they might not be able to reconcile those two facts in a verbal description. Even human-robot interaction (HRI) theorists who think about this puzzle professionally find it difficult, and continue to disagree about whether the correct framework is depiction or image perception (Remmers, 2020), stance taking (Thellman, 2021), a dual process theory (Złotowski et al., 2018), or something else. The reflections of laypeople on this theoretical puzzle might therefore be fragmentary, self-contradictory, or vague. Many people might simply give up.
C&F's theory might prove to explain how people “know that the robots are mechanical artifacts” and yet “interact with them as if they were actual agents” (target article, Abstract), but the process and results of people's meta-cognition about this are not much described. Additional empirical and theoretical work is needed here, especially inasmuch as meta-cognitive accounts of these cognitive processes might tend to be incomplete or inaccurate, as this commentary has suggested. One reason this is important is that HRI researchers often use self-report measures such as surveys and interviews to study anthropomorphism (Złotowski, Proudfoot, Yogeeswaran, & Bartneck, 2015), mental state attribution (Thellman, de Graaf, & Ziemke, 2021), and related phenomena. Future work should study what valid inferences about cognitive processes can be made from self-reports, and when other types of measures should be used instead.
Acknowledgments
I am grateful to Sam Thellman for discussing the paper and commenting on my first draft, and to Peter Remmers also for discussing the paper.
Financial support
This research received no specific grant from any funding agency, commercial, or not-for-profit sectors.
Competing interest
None.