Introduction
Facial electromyography (fEMG) captures subtle activation of facial muscle areas and can be used to measure affective responses to stimuli (Bakker et al., Reference Bakker, Schumacher and Rooduijn2021; Russell, Reference Russell1980). In particular, activation of the so-called corrugator muscle above the eyebrow is indicative of a negative affective response (Hietanen et al., Reference Hietanen, Surakka and Linnankoski1998; Lang et al., Reference Lang, Greenwald, Bradley and Hamm1993; vanOyen Witvliet & Vrana, Reference vanOyen Witvliet and Vrana1995), and activation of the so-called zygomaticus muscle on the cheek is indicative of a positive affective response (Cacioppo et al., Reference Cacioppo, Petty, Losch and Kim1986). This technique has several advantages compared to other approaches that measure affective responses: (1) compared to studies using skin conductance, fEMG moves beyond indications of emotional arousal and captures the direction or valence of the affective response; (2) compared to electroencephalography (EEG) and functional magnetic resonance imaging (fMRI), fEMG is easier to capture, process, and analyze; and (3) compared to self-reports of emotion, fEMG captures unconscious affective processes while participants are exposed to a stimulus. While self-reports of emotion capture a cognitive evaluation—a feeling—after receiving a stimulus, fEMG registers a different, more unconscious aspect of affect that is not necessarily aligned with feelings (Evers et al., Reference Evers, Hopp, Gross, Fischer, Manstead and Mauss2014; LeDoux & Pine, Reference LeDoux and Pine2016). Despite these advantages, fEMG is rather unpopular in political science. Using Google Scholar, in the spring of 2023, we found only 18 studies published in peer-reviewed journals that used fEMG in the domain of politics. Only six of these were published in a political science journal (see Table A5 in the Supplementary Material).
We identify four reasons why fEMG is unpopular. First, political scientists lack (or perceive that they lack) the skills and equipment necessary to conduct a study including fEMG measures. Second, psychophysiological measures are criticized as invalid measures of emotions (Marcus et al., Reference Marcus, Neuman and MacKuen2017; Osmundsen et al., Reference Osmundsen, Hendry, Laustsen, Smith and Petersen2021). Third, there are too many researcher degrees of freedom in fEMG measurement and analysis, leading to a lack of robust findings. Fourth—and not necessarily unique to fEMG—there are concerns about the quality of experimental designs and the diversity of samples used in lab experiments.
We address these issues in three ways. (1) We establish fEMG as a supplementary measure, not an alternative, that captures part of the dynamic unconscious and conscious processes that constitute an emotion episode. (2) We review different preprocessing and analysis techniques and provide recommendations to produce more robust results. (3) We evaluate a series of data collections to make recommendations regarding experimental design and sampling. To this end, we analyze data from five experimental designs, seven data collection sites, 98 treatments, and 585 individuals, for a total of 540,883 seconds of fEMG activity. We build on existing fEMG work in the field of psychophysiology (Blascovich et al., Reference Blascovich, Vanman, Mendes and Dickerson2011; Bucy & Bradley, Reference Bucy, Bradley, Bucy and Holbert2011; Potter & Bolls, Reference Potter and Bolls2012; Tassinary et al., Reference Tassinary, Cacioppo, Vanman, Cacioppo, Tassinary and Berntson2007; van Boxtel, Reference van Boxtel, Spink, Grieco, Krips, Loijens, Noldus and Zimmerman2010), but we also rely on a series of recent developments regarding the modeling of such data (Hess et al., Reference Hess, Arslan, Mauersberger, Blaison, Dufner, Denissen and Ziegler2017; ‘t Hart et al., Reference ‘t Hart, Struiksma, van Boxtel and van Berkum2019).
Our article is aimed at (non-fEMG-specialist) researchers interested in measuring affective responses and theorizing about the position of emotions in politics more generally. We hope to make fEMG more popular by taking away concerns about validity and robustness and, at the same time, proposing better experimental designs. Before doing this, we first explain fEMG in more detail. For a specialized explanation of facial musculature, we refer readers to the relevant psychophysiological sources (Cacioppo et al., Reference Cacioppo, Tassinary and Berntson2007; Potter & Bolls, Reference Potter and Bolls2012; Tassinary et al., Reference Tassinary, Cacioppo, Vanman, Cacioppo, Tassinary and Berntson2007; van Boxtel, Reference van Boxtel, Spink, Grieco, Krips, Loijens, Noldus and Zimmerman2010).
What is fEMG?
Facial EMG can register rapid, automatic affective responses in the face while participants view stimulus material. Specifically, fEMG measures tiny bioelectric signals at the surface of the skin that are generated as a result of muscle contractions in a specific region (Blascovich et al., Reference Blascovich, Vanman, Mendes and Dickerson2011; Bucy & Bradley, Reference Bucy, Bradley, Bucy and Holbert2011; Potter & Bolls, Reference Potter and Bolls2012; Tassinary et al., Reference Tassinary, Cacioppo, Vanman, Cacioppo, Tassinary and Berntson2007; van Berkum et al., Reference van Berkum, Struiksma, ’t Hart, Grimaldi, Shtyrov and Brattico2023; van Boxtel, Reference van Boxtel, Spink, Grieco, Krips, Loijens, Noldus and Zimmerman2010). These muscle contractions are not visible to the eye. This technique has a high temporal resolution, as it detects rapid changes in the contractions of facial muscles in a specific region (van Boxtel, Reference van Boxtel, Spink, Grieco, Krips, Loijens, Noldus and Zimmerman2010). It is primarily used as a measure of emotional valence (Larsen et al., Reference Larsen, Berntson, Poehlmann, Ito, Cacioppo, Lewis, Haviland-Jones and Barrett2008; Tassinary et al., Reference Tassinary, Cacioppo, Vanman, Cacioppo, Tassinary and Berntson2007). The corrugator supercilii muscle region, the muscle above the eyebrow that draws the brow down and pulls the brows together (see Figure 1), registers negative affect. Indeed, increased corrugator activity has been recorded in response to negative images, negative words, and negative affective cues in language (Hietanen et al., Reference Hietanen, Surakka and Linnankoski1998; Lang et al., Reference Lang, Greenwald, Bradley and Hamm1993; van Berkum et al., Reference van Berkum, Struiksma, ’t Hart, Grimaldi, Shtyrov and Brattico2023). 
The zygomaticus major muscle region, which pulls the corners of the mouth up and back into a smile (Larsen et al., Reference Larsen, Norris and Cacioppo2003), registers positive affect. Indeed, zygomaticus activity increases in response to positively valenced images and videos (Cacioppo et al., Reference Cacioppo, Petty, Losch and Kim1986; vanOyen Witvliet & Vrana Reference vanOyen Witvliet and Vrana1995). For a full overview of experimental results with the zygomaticus and corrugator muscles, we recommend reading van Berkum et al. (Reference van Berkum, Struiksma, ’t Hart, Grimaldi, Shtyrov and Brattico2023), as they comprehensively review 55 studies.
Other muscles have been associated with specific discrete emotions, such as the levator labii, which is associated with disgust (Chapman et al., Reference Chapman, Kim, Susskind and Anderson2009), or the frontalis, which is associated with fear. While there are concerns about whether the levator labii and frontalis validly and reliably capture these distinct emotions (Tassinary et al., Reference Tassinary, Cacioppo, Vanman, Cacioppo, Tassinary and Berntson2007), recent advances, particularly in data science, hold the promise of detecting discrete emotions using fEMG (Sharma et al., Reference Sharma, Castellini, van den Broek, Albu-Schaeffer and Schwenker2019).
Facial muscles are controlled by the somatic nervous system, the part of the peripheral nervous system that guides voluntary muscular activity. This is unlike more commonly used psychophysiological measures in political science such as skin conductance, which reflects activity of the autonomic nervous system. The autonomic nervous system controls our glands, and skin conductance specifically monitors glands on the hand or fingertips (for an introduction, see Soroka, Reference Soroka and Foster2019). We have little control over our glands. In contrast, most of our facial muscles can be used voluntarily. That said, facial muscle activation can also be automatic and expressed in response to a stimulus before conscious awareness (Dimberg et al., Reference Dimberg, Thunberg and Elmehed2000; Mehu et al., Reference Mehu, Mortillaro, Bänziger and Scherer2012). In particular, automatic affective responses have been detected in evaluating the stimulus (Neumann et al., Reference Neumann, Hess, Schulz and Alpers2005) or by simulating the emotion presented in the stimulus (Niedenthal et al., Reference Niedenthal, Mermillod, Maringer and Hess2010; ‘t Hart et al., Reference ‘t Hart, Struiksma, van Boxtel and van Berkum2019). Facial EMG responses also relate to specific activity in the motor cortex and other brain structures associated with emotions (Achaibou et al., Reference Achaibou, Pourtois, Schwartz and Vuilleumier2008; Morecraft et al., Reference Morecraft, Stilwell-Morecraft and Rossing2004; Rymarczyk et al., Reference Rymarczyk, Żurawski, Jankowiak-Siuda and Szatkowska2018). Finally, the automaticity of fEMG responses is further underlined by the fact that they cannot be suppressed. If you ask people to not move their face, facial activity congruent with the valence of the presented stimuli is still present (Cacioppo et al., Reference Cacioppo, Bush and Tassinary1992; Dimberg et al., Reference Dimberg, Thunberg and Grunedal2002). 
Even if participants are prompted to smile, they still frown upon exposure to a negatively valenced stimulus. In all, fEMG is a valid tool to register automatic, affective responses to stimuli.
Recent work uses emotion recognition algorithms to capture emotional expressions (Boussalis et al., Reference Boussalis, Coan, Holman and Muller2021; Masch et al., Reference Masch, Gassner and Rosar2021). These algorithms are only capable of detecting visible muscle contractions, while fEMG can capture tiny temporal changes in muscle contraction invisible to both the eye and computer vision (Perusquía-Hernández et al., Reference Perusquía-Hernández, Ayabe-Kanamura, Suzuki and Kumano2019). In addition, there is little documentation of how these algorithms work and what training data they are based on. Therefore, while useful as a tool to measure expressions of political elites, emotion recognition algorithms cannot replace fEMG in the lab.
Facial EMG is a valid measure of affect and theoretically useful
The preceding section demonstrated the construct validity of fEMG, as evidenced by the expected fEMG responses to stimuli with positive or negative valence (Tassinary & Cacioppo, Reference Tassinary and Cacioppo1992; van Berkum et al., Reference van Berkum, Struiksma, ’t Hart, Grimaldi, Shtyrov and Brattico2023). In our own analyses, we find high correlations between the valence of a stimulus (negative-to-positive) and both corrugator activity (r = –.774) and zygomaticus activity (r = .501) (see the section on Robustness of fEMG measures and analysis). Yet in political science, the convergent validity of psychophysiological measures has faced scrutiny because of their low correlation with self-reports of emotion (Marcus et al., Reference Marcus, Neuman and MacKuen2017; Osmundsen et al., Reference Osmundsen, Hendry, Laustsen, Smith and Petersen2021). This misses the point. There are two reasons why we would not expect a substantive correlation between fEMG and self-reports. First, the lack of connection with self-reports emerges because conscious and unconscious processes are not necessarily connected (Evers et al., Reference Evers, Hopp, Gross, Fischer, Manstead and Mauss2014; LeDoux & Pine, Reference LeDoux and Pine2016). Different brain systems may even be responsible for the production of physiological fear responses versus cognitive fear responses (LeDoux & Pine, Reference LeDoux and Pine2016). Keltner and Gross (Reference Keltner and Gross1999) define emotions as “episodic, relatively short-term, biologically-based patterns of perception, experience, physiology, action, and communication that occur in response to specific physical and social challenges and opportunities” (p. 468). Emotions bring about rapid physiological changes in, for example, heart rate or blood pressure. 
But emotions also convey our feelings and evaluations of the situation to others by means of posture, voice, and, indeed, facial expression (van Berkum et al., Reference van Berkum, Struiksma, ’t Hart, Grimaldi, Shtyrov and Brattico2023). These different elements of an emotional episode do not take place at the same time, but precede or follow each other dynamically (Marcus, Reference Marcus2012). Longer, conscious responses may, in turn, affect new unconscious responses, further developing the emotion episode (Scherer & Moors, Reference Scherer and Moors2019). In sum, fEMG may not correlate with self-reports because the fEMG readings may not enter or may have already exited conscious awareness.
Second, the willingness of research participants to openly share their genuine feelings plays a role in the correlation between fEMG and self-reports. Some participants may feel compelled to express or suppress certain emotions based on societal expectations or appropriateness (Friesen et al., Reference Friesen, Smith and Hibbing2017). This point is underscored by some of the 18 political science studies using fEMG mentioned in the introduction (see Table A5 in the Supplementary Material). For instance, Ensari et al. (Reference Ensari, Kenworthy, Urban, Canales, Vasquez, Kim and Miller2004) demonstrated that participants rated politically sensitive out-groups favorably, while their fEMG activity indicated negative affect. However, when experimentally induced to justify hostility toward these out-groups, participants’ self-reported attitudes aligned with the negative fEMG activity. Similarly, Stewart et al. (Reference Stewart, Amoss, Weiner, Elliott, Parrott, Peacock and Vanman2003) found no correlation between fEMG activity in response to interactions involving gay men and participants’ self-reported attitudes toward gay men. Interestingly, levator labii activity (measuring disgust) predicted antidiscrimination behaviors, whereas self-reported attitudes did not.
These two studies demonstrate that fEMG is relevant in contexts in which participants may be reluctant to report their views. In other cases, participants are very motivated to express their feelings, but with no discernible fEMG activity. For example, in the experiment conducted by Homan et al. (Reference Homan, Schumacher and Bakker2022), participants reported feeling more anger when their in-party politician displayed anger instead of a happy or neutral display. But this difference between experimental conditions was not found in the corrugator and zygomaticus activity measures. Also, Bakker, Schumacher, and Homan (Reference Bakker, Schumacher and Homan2020) showed that stronger partisans reported feeling more disgust toward an out-party politician, while the strength of partisanship did not significantly impact the fEMG response in the labii (disgust) muscle.
To gain a comprehensive understanding of emotional or affective responses to political stimuli, we advocate collecting both fEMG measures and self-reports of feelings. Each measure captures different aspects of this complex process. Rather than rejecting one of the two measures, researchers should theorize when they should align and when not (Arceneaux et al. Reference Arceneaux, Bakker and Schumacher2024). This way, we can acquire a richer and more nuanced picture of the interplay of cognitive and affective processes in political decision-making.
Robustness of fEMG measures and analysis
Various approaches exist for preprocessing raw fEMG data, creating variables from fEMG measurements, and analyzing the effects of treatments on fEMG measures (van Berkum et al., Reference van Berkum, Struiksma, ’t Hart, Grimaldi, Shtyrov and Brattico2023). Experts in this field recognize the importance of adopting more standardized and transparent workflows, yet they also acknowledge how difficult this is given the sensitivity of fEMG readings (van Berkum et al., Reference van Berkum, Struiksma, ’t Hart, Grimaldi, Shtyrov and Brattico2023). Here we discuss the latest developments regarding preprocessing procedures, variable creation, and analysis. We use data that we previously collected to evaluate these developments and make specific recommendations. Before doing so, the next section first describes the data we collected.
Data collection
We combined data from five experimental designs gathered at seven data collection sites (note that several individual experimental designs have already been analyzed and published: Bakker, Schumacher, Gothreau et al., Reference Bakker, Schumacher, Gothreau and Arceneaux2020; Bakker, Schumacher, & Homan, Reference Bakker, Schumacher and Homan2020; Bakker et al., Reference Bakker, Schumacher and Rooduijn2021; Homan et al., Reference Homan, Schumacher and Bakker2022; Schumacher et al. Reference Schumacher, Rooduijn and Bakker2022). Yet the analyses conducted here are new. In total, this data set contains 98 treatments. Some of these treatments are images from the International Affective Picture System (IAPS) with positive, negative (or disgusting), or neutral valence. Other treatments are images of party leaders or videos in which different stances on political issues are presented. Table 1 provides an overview of these data collections. Table A1 in the Supplementary Material describes each experimental design.
Notes: In the header, p stands for participants, t for treatments. IAPS refers to the International Affective Picture System. ADFES is the Amsterdam Dynamical Facial Expression System. The treatments are described in detail in Table A1 in the Supplementary Material. Regarding the locations: Lowlands is a large cultural festival, the EU Youth Festival is an evangelical youth festival, TT Assen is a bikers event, the Fair Tilburg is a fair throughout the city of Tilburg in the Netherlands. The Media Museum is a museum about media in Hilversum in the Netherlands. The Lab refers to the university lab. The locations are diverse in terms of geographical spread.
In total, the data set contains 585 unique participants from different locations. Participants in the university lab signed up through an online portal and received a financial compensation (equivalent to 10 euros per hour or course credit). Participants in the labs-in-the-field were recruited on-site and received no financial compensation. In all cases, participants first signed an informed consent form and completed a survey on a desktop computer (laboratory) or iPad (lab-in-the-field). Next, participants were connected to physiological measurement equipment by trained research assistants. Participants (lab-in-the-field) were also given noise-canceling headphones (Bose). We recorded events during the experiment in a logbook. For example, when electrodes came off or when a participant’s mobile phone rang, we wrote this down to evaluate how it impacted our measurement. We return to this issue in the next section.
We recorded physiological responses per millisecond using the Versatile Stimulus Response Registration Program 1998 (Vsrrp98) software on laptops (lab-in-the-field data collection) or stationary computers running Windows 7 (laboratory data collection) at 1,000 Hz. The lab equipment we used was able to reliably and validly capture fEMG activity in earlier work in other domains (for references, see Bakker et al., Reference Bakker, Schumacher and Rooduijn2021).
To measure negative affect, we measured the activity of the corrugator supercilii (see also Figure 1). We did this using two 4-millimeter Ag/AgCl electrodes that were filled with electrolyte gel (Signa, Parker Laboratories). Using double-sided adhesive tape, we placed the two electrodes just above the eyebrow, directly where the muscle is located (Fridlund & Cacioppo, Reference Fridlund and Cacioppo1986). A third electrode was placed on the middle of the forehead (just below the hairline) and served as the ground measure. The corrugator has no overlapping muscle groups, has a very limited representation in the motor cortex, and “tends to be bilateral innervated” (Larsen et al., Reference Larsen, Norris and Cacioppo2003, pp. 776–777). The measurement of the corrugator is therefore less subject to disruptions from the (voluntary) movement of other muscles.
We also measured the activity of the zygomaticus major by placing electrodes on the cheek where the zygomaticus muscle region is located (Larsen et al., Reference Larsen, Norris and Cacioppo2003).Footnote 1 It is a difficult muscle to measure because it has greater contralateral innervation (Larsen et al., Reference Larsen, Norris and Cacioppo2003), and the cheek is a particularly crowded area of the face with lots of muscles (Tassinary et al., Reference Tassinary, Cacioppo, Vanman, Cacioppo, Tassinary and Berntson2007). This makes measures of the zygomaticus susceptible to “cross talk” (Larsen et al., Reference Larsen, Norris and Cacioppo2003, p. 777). Irrespective of the difficulties of measuring the zygomaticus, it is a unique measure to capture positive affect.Footnote 2
(Pre)processing raw fEMG data
Facial EMG data are highly sensitive. For one, because of technical errors, physiological measures or time markers of treatment onset may be (partially) missing and therefore cannot be analyzed. Second, fEMG measurement is susceptible to external noise. To reduce noise in the signal, raw fEMG data are typically band-pass filtered between 20 and 400 Hz, with an additional 50 Hz notch filter (van Boxtel, Reference van Boxtel, Spink, Grieco, Krips, Loijens, Noldus and Zimmerman2010). Recording software for fEMG typically integrates the raw signal. The output of this process is fEMG values expressed as microvolts per time unit of choice.
In addition to these standard procedures, we describe three more preprocessing steps that we have encountered in the literature. First, during the experiment, electrodes may come off or temporarily have problems transmitting the signal. These issues can be identified by visualizing fEMG activity and searching for flat lines or big temporary drops in the signal. We used two human coders—who were unaware of the experimental conditions—to visually inspect physiological activity and search for flat lines and temporary drops. On the basis of this, we identified 10% to 15% of the data as potentially problematic. Second, fEMG readings may suffer from the activity of other facial muscles. Zygomaticus readings may be suspect because of people clenching their jaw. Corrugator activity can be influenced by eye blinks (Bhowmik et al., Reference Bhowmik, Jelfs, Arjunan and Kumar2017). Because such cross talk follows a particular pattern, specific algorithms can be applied to reduce this influence. In particular, we use a hampel filtering algorithm to identify and replace outliers (Bhowmik et al., Reference Bhowmik, Jelfs, Arjunan and Kumar2017).Footnote 3 Third, it is possible to use different statistical transformations such as winsorizingFootnote 4 or removing statistical outliers.Footnote 5 To evaluate the consequences of these preprocessing steps, we compare estimates of treatment effects on dependent variables constructed using different preprocessing steps. Figure 3 summarizes these results, but before we discuss them, we first need to take a few more steps to completely explain our measurement and analysis strategy.
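The second and third steps can be illustrated with a minimal Python sketch. It is not the pipeline used in this article: the window size, the threshold of three scaled median absolute deviations, the winsorizing percentiles, and the function names `hampel_filter` and `winsorize` are all assumptions chosen for illustration.

```python
from statistics import median


def hampel_filter(signal, half_window=3, n_sigmas=3.0):
    """Replace local outliers with the median of a sliding window.

    half_window and n_sigmas are illustrative defaults, not the
    settings used in the article.
    """
    cleaned = list(signal)
    for i in range(len(signal)):
        lo = max(0, i - half_window)
        hi = min(len(signal), i + half_window + 1)
        window = signal[lo:hi]
        local_median = median(window)
        # 1.4826 rescales the median absolute deviation (MAD) so that it
        # is comparable to a standard deviation under normality.
        scaled_mad = 1.4826 * median(abs(x - local_median) for x in window)
        if abs(signal[i] - local_median) > n_sigmas * scaled_mad:
            cleaned[i] = local_median
    return cleaned


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values to simple nearest-rank percentiles (assumed cutoffs)."""
    ordered = sorted(values)
    n = len(ordered)
    low = ordered[n * lower_pct // 100]
    high = ordered[min(n - 1, n * upper_pct // 100)]
    return [min(max(v, low), high) for v in values]


# A short corrugator trace with one blink-like spike (made-up numbers).
trace = [10.0, 11.0, 10.0, 60.0, 11.0, 10.0, 11.0]
despiked = hampel_filter(trace)
clamped = winsorize(list(range(1, 21)), lower_pct=10, upper_pct=90)
```

In this toy trace only the spike is replaced by the local median; the other readings are left untouched. Winsorizing, by contrast, clamps extreme values to a percentile instead of replacing them with a local estimate.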
fEMG measure construction
It is a general recommendation in the psychophysiology literature (Blascovich et al., Reference Blascovich, Vanman, Mendes and Dickerson2011; Potter & Bolls, Reference Potter and Bolls2012) that the fEMG activity of a participant during a stimulus be contrasted with fEMG activity during a neutral baseline. Typically, prior to a stimulus, participants see a blank screen (with a cross in the middle for focus) and are asked to relax. Baseline activity thus measures the physiological activity of a specific muscle when in a resting state.Footnote 6 What is less clear in the psychophysiology literature is how long this baseline measurement should last. Participants are exposed to these blank screens for between 8 and 30 seconds, and part of the activity during this period is used to calculate baseline activity. Some researchers use the last 2, 5, or 7 seconds of the baseline, while others recommend finding those seconds before the treatment that are artifact-free. It is quite often not specified why a specific selection of the baseline is chosen. To test whether the choice of a specific baseline influences our results, we analyzed baseline activity in our data by taking five different centrality measures (e.g., the mean or the median) and selecting nine different periods from the baseline (e.g., all data, last 2 seconds). We compared these 9 * 5 = 45 baseline calculations to each other and to human coding to identify the general tendency in the baselines. Section 5 in the Supplementary Material provides the details of this analysis. On this basis, we give a weak recommendation for taking the median of the baseline (all observations). This measure produced among the highest correlations with human-coded data, and medians are generally more effective at reducing noise in signals than means. Also, taking all observations in the baseline avoids the need to select between rather arbitrary cutoff lines of 2, 3, or 5 seconds. 
That said, other choices here are unlikely to produce very different results.
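To make the baseline choices concrete, the following Python sketch computes a baseline level from either the whole baseline period or only its final seconds, with an interchangeable centrality measure. The function name `baseline_level`, its parameters, and the example readings are hypothetical; the sketch only illustrates why a median over all observations is less sensitive to a single artifact than a mean.

```python
from statistics import mean, median


def baseline_level(samples, sample_rate_hz, last_seconds=None, center=median):
    """Summarize resting activity over all, or the final part, of a baseline.

    last_seconds=None uses every observation; otherwise only the final
    last_seconds of the recording are kept.
    """
    if last_seconds is not None:
        n = max(1, int(last_seconds * sample_rate_hz))
        samples = samples[-n:]
    return center(samples)


# Ten made-up baseline readings at 1 Hz with one artifact (9.5).
baseline = [4.0, 4.2, 9.5, 4.1, 4.0, 3.9, 4.1, 4.0, 4.2, 4.1]

median_all = baseline_level(baseline, sample_rate_hz=1)
mean_all = baseline_level(baseline, sample_rate_hz=1, center=mean)
median_last_2s = baseline_level(baseline, sample_rate_hz=1, last_seconds=2)
```

Here the median of all observations stays close to the typical resting level, while the mean is pulled upward by the artifact.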
By subtracting the baseline value from fEMG activity during the stimulus, one can construct a measure of fEMG activity that allows for comparisons between participants. This measure expresses the difference in microvolts between the stimulus and the individual baseline. In the literature, there are two other ways to construct a measure of fEMG activity. The first is to divide the raw fEMG activity in the treatment by the baseline activity and multiply by 100. This way, all values can be expressed as a percentage increase or decrease. The second is to z-standardize the fEMG data. The microvolt and percentage measures correlate very highly in our data set (r = .9). Both measures correlate at approximately r = .7 with the z-standardized measure. In this article, we use the percentage measure, which indicates the percentage increase or decrease in fEMG activity relative to the baseline. We chose this measure because we find it easier to interpret. Given the high correlations with the alternatives, we expect this choice to matter little.
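The measures described in this paragraph can be written down in a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, and the z-standardization uses the mean and population standard deviation of whatever list of readings is passed in, whereas the exact reference set for standardization is a design choice not specified here.

```python
from statistics import mean, pstdev


def difference_score(stimulus, baseline):
    """Microvolt difference between each stimulus reading and the baseline."""
    return [x - baseline for x in stimulus]


def percent_of_baseline(stimulus, baseline):
    """Stimulus readings as a percentage of baseline (baseline = 100)."""
    return [100.0 * x / baseline for x in stimulus]


def z_standardize(values):
    """Center and scale readings (assumed: within one participant)."""
    m = mean(values)
    s = pstdev(values)
    return [(x - m) / s for x in values]


# Made-up readings: baseline of 4.0 microvolts, two stimulus seconds.
diff = difference_score([4.0, 5.0], baseline=4.0)
pct = percent_of_baseline([4.0, 5.0], baseline=4.0)
z = z_standardize([1.0, 2.0, 3.0])
```

With these numbers, a reading of 5.0 microvolts against a baseline of 4.0 is a difference of 1.0 microvolt, or 125 on the percentage scale, that is, 25% more activity than at rest.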
So far, we have discussed analyzing the activity of single muscle regions. But researchers have also calculated a facial activity index by subtracting zygomaticus activity from corrugator activity (Olszanowski et al., Reference Olszanowski, Wróbel and Hess2020). This is intuitively appealing, because it creates a single valence measure. Yet this measure becomes problematic if participants have counterempathetic responses (Bucy & Bradley, Reference Bucy and Bradley2004)—for example, when participants are amused by an angry-looking out-party politician (Homan et al., Reference Homan, Schumacher and Bakker2022). In this particular case, both corrugator and zygomaticus activity point in the same direction, and so subtraction produces zero facial activity even though there are clear physiological signals. In this case, capturing overall facial reactivity by adding zygomaticus activity to corrugator activity may be interesting.
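A small numeric illustration of the counterempathy problem follows. The function names and the numbers are hypothetical; activity is expressed as change from baseline in percentage points, and the index follows the subtraction described in this paragraph.

```python
def valence_index(corrugator_change, zygomaticus_change):
    """Single valence measure: corrugator minus zygomaticus activity."""
    return corrugator_change - zygomaticus_change


def overall_reactivity(corrugator_change, zygomaticus_change):
    """Total facial reactivity: corrugator plus zygomaticus activity."""
    return corrugator_change + zygomaticus_change


# A participant who frowns and smiles at the same time (made-up values):
# both muscle regions are 20 percentage points above baseline.
index = valence_index(20.0, 20.0)
reactivity = overall_reactivity(20.0, 20.0)
```

The index is zero although both muscle regions responded, while the sum registers the combined response.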
Analysis strategy
Now that we have constructed a measure of fEMG activity, what analysis strategy should be adopted to analyze the effects of treatments? Various political science applications using psychophysiology have compared mean fEMG activity in different treatments (e.g., Settle et al., Reference Settle, Hibbing, Anspach, Carlson, Coe, Hernandez, Peterson, Stuart and Arceneaux2020). Alternatively, one can calculate the maximum fEMG activity or the area under the curve (AUC) (Osmundsen et al., Reference Osmundsen, Hendry, Laustsen, Smith and Petersen2021). More recently, however, fEMG activity during a treatment has not been collapsed into a single number but modeled second by second in a multilevel model (Hess et al., Reference Hess, Arslan, Mauersberger, Blaison, Dufner, Denissen and Ziegler2017; Olszanowski et al., Reference Olszanowski, Wróbel and Hess2020; ‘t Hart et al., Reference ‘t Hart, Struiksma, van Boxtel and van Berkum2019). To see why this is useful, Figure 2 shows four corrugator responses to a negatively valenced image. All signals start at their own individual baseline set at 100, after which the treatment starts. The top (black) line shows strong increased corrugator activity. The bottom (blue) line shows a decrease in corrugator activity. The two middle lines show a response (a peak above 100) but also a decline. This decline is typical of fEMG responses and can be interpreted as activation-relaxation. Facial EMG activity typically reaches its full potential 750 milliseconds after stimulus onset, after which it returns in the direction of the baseline level (van Berkum et al., Reference van Berkum, Struiksma, ’t Hart, Grimaldi, Shtyrov and Brattico2023). The bottom and top lines also show this pattern. The problem is that if we take the mean of these responses, the green line will be around 100 and interpreted as a nonresponse, even though there is a clear peak of 10% more activity than the baseline. 
The problem with taking mean activity is that we may equate weak responses with no responses because the relaxation of the muscle is (close to) equal to the activation. The added advantage of a multilevel setup is that one can also model time dynamics.
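The point about activation-relaxation can be checked with a toy response. The Python sketch below uses made-up second-by-second values on the percentage scale (baseline = 100); the AUC is computed with the trapezoidal rule relative to the baseline level, which is one of several possible AUC definitions and an assumption here. The helper `to_long_rows` is a hypothetical illustration of the one-row-per-second layout that a multilevel model would take as input; it does not fit a model.

```python
from statistics import mean


def area_under_curve(values, dt=1.0, baseline=100.0):
    """Trapezoidal area between the response curve and the baseline level."""
    deviations = [v - baseline for v in values]
    return sum((a + b) / 2.0 * dt for a, b in zip(deviations, deviations[1:]))


def to_long_rows(participant, treatment, values):
    """One row per second, the layout used for second-by-second modeling."""
    return [
        {"participant": participant, "treatment": treatment,
         "second": second, "activity": value}
        for second, value in enumerate(values)
    ]


# A response that peaks 10% above baseline and then relaxes below it.
response = [100.0, 110.0, 104.0, 98.0, 95.0, 93.0]

mean_activity = mean(response)
peak_activity = max(response)
auc = area_under_curve(response)
rows = to_long_rows("p01", "negative_image", response)
```

The mean of this response equals the baseline level, so a mean-based analysis would treat it as a nonresponse, whereas the second-by-second rows retain the peak and the subsequent relaxation.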
To further substantiate the choice for a multilevel model, Figure A3 in the Supplementary Material presents simulations in which we compare different analysis strategies. Specifically, we contrast using the mean, the maximum, and the AUC activity measures with a multilevel model using second-by-second activity measures. In the latter case, we retrieve the significant treatment effect in 95% of the simulations, whereas in the other cases this is between 76% and 77%.
We recommend using a multilevel setup to analyze fEMG activity. This still gives some degrees of freedom regarding model specification, such as how to model time dynamics and what covariates to add. For this article, we ran a series of models including time dynamics and covariates—but no treatments—and selected the model with the best fit. Section 6 in the Supplementary Material provides more details regarding this. We preregistered the analysis strategy and preprocessing strategy discussed here (https://osf.io/2wd8g/). Our design is sufficiently powered to detect small effects (see Section 9 of the Supplementary Material). All subsequent analyses adopt these strategies.
Analysis of preprocessing steps
Figure 3 compares the preprocessing steps that we discussed earlier. We subset our data to two conditions: an IAPS picture with negative valence (a tumor) and an IAPS picture with positive valence (a baby)—taken from Design 5 (n = 102). Using the percentage of fEMG activity compared to the baseline—as discussed in the preceding section—we estimate the difference between the two conditions. We expected that the negative image of a tumor would produce more corrugator activity and less zygomaticus activity than the positive image of a baby. To this end, we ran two multilevel models estimating the difference in corrugator activity (left-hand panel of Figure 3) and zygomaticus activity (right-hand panel).
The first estimate (a dot above the “none” label; the bars denote 95% confidence intervals) includes all recorded data. The unstandardized coefficient is 42.933. This means that, as expected, participants in the tumor treatment have 42.9% more corrugator activity than in the baby treatment. Then we removed signals that at least one human coder identified as problematic (labeled 1 error). This reduces the effect size by a factor of 2 and the standard error by a factor of 6.5. The next set of preprocessing steps further reduces the standard error, respectively by a factor of 2 (the Hampel filter compared to the 1 error condition), 1.6 (winsorizing compared to the Hampel filter), and 1.2 (outlier removal compared to the Hampel filter). Effect sizes also gradually diminish, but not by the same degree as the standard error.
We took the same steps to analyze zygomaticus activity (right-hand panel of Figure 3). As expected, the direction of the coefficient is negative, meaning that the tumor treatment produces less zygomaticus activity than the baby treatment does. However, in steps 1 and 2, the confidence intervals are very large, and therefore the estimated difference is not statistically significant. Winsorization and the removal of remaining statistical outliers strongly reduce the standard errors in our estimates (to 9.3% and 34.1%, respectively), making the estimate statistically significant. Note that the Hampel filter (for eye blinks) is only applicable to corrugator data.
For both corrugator activity and zygomaticus activity, we identified that most preprocessing steps reduced the error in our estimates—as one would expect. There is no indication of flipping signs of treatment effects. Our recommendation is to ensure robustness of results by presenting treatment effects using different preprocessing steps.
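For readers unfamiliar with two of the preprocessing steps compared above, the following is a minimal sketch of a Hampel filter and of winsorizing. It is not our replication code (which is available in the OSF repository); the window size, the threshold of three scaled median absolute deviations, the quantile cutoffs, and the example series are all assumptions chosen for illustration.

```python
from statistics import median


def hampel_filter(values, half_window=3, n_sigmas=3.0):
    """Replace points that lie far from the rolling median with that median."""
    scale = 1.4826  # scales the MAD to a standard deviation under normality
    cleaned = list(values)
    for i in range(len(values)):
        window = values[max(0, i - half_window): i + half_window + 1]
        center = median(window)
        mad = median(abs(v - center) for v in window)
        if abs(values[i] - center) > n_sigmas * scale * mad:
            cleaned[i] = center
    return cleaned


def winsorize(values, lower=0.05, upper=0.95):
    """Clip values to empirical quantiles picked by rounded rank."""
    ordered = sorted(values)
    low = ordered[round(lower * (len(ordered) - 1))]
    high = ordered[round(upper * (len(ordered) - 1))]
    return [min(max(v, low), high) for v in values]


# Hypothetical corrugator series with one blink-like spike.
signal = [10, 11, 10, 12, 11, 80, 10, 11, 12, 10]
cleaned = hampel_filter(signal)

# Hypothetical activity scores with one extreme value.
clipped = winsorize([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], lower=0.1, upper=0.9)

print(cleaned)
print(clipped)
```

The Hampel filter acts locally on short spikes within a signal, whereas winsorizing caps extreme values across the whole distribution, which is why the two steps can reduce the standard error at different stages.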
Improving designs including fEMG measures
In this section, we deal with questions of research design: What treatment characteristics work best to elicit fEMG responses? And what about the (lack of) participant heterogeneity in typical lab samples?
Data collections at the University of Amsterdam’s lab were complemented with data collections in the field to reach a politically more diverse population. We also investigate differences between the lab and the lab-in-the-field and explore the role of participant characteristics and treatment characteristics to make recommendations regarding future experimental designs with fEMG measures.
What is the effect of treatment characteristics on fEMG?
The 98 treatments in our dataset differ in medium (video, image, or word), sound (no sound or sound), valence (positive, negative, or neutral), and political versus nonpolitical content. This variation allows us to evaluate the impact of treatment characteristics on fEMG activity. To analyze this, we coded each treatment as follows: (1) does the treatment contain positive or negative emotional content or is it neutral; (2) is the treatment a video (= 2), a word (= 1), or an image (= 0); (3) is the treatment with (= 1) or without audio (= 0); (4) does the treatment have political (= 1) or nonpolitical content (= 0); and (5) of those treatments with political content, does the treatment include a face of a politician (= 1) or does it describe an issue position (= 0). Table A1 in the Supplementary Material describes how each specific treatment was coded.
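The coding scheme can be written down as a small data structure. This is only a sketch: the class name `TreatmentCoding`, its field names, and the example treatment are hypothetical and do not come from Table A1. Note that the codes for medium are category labels rather than an ordered scale, so the sketch also derives dummy indicators with image as the reference category.

```python
from dataclasses import dataclass
from typing import Optional

VALENCES = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class TreatmentCoding:
    valence: str                 # "positive", "negative", or "neutral"
    medium: int                  # 0 = image, 1 = word, 2 = video
    audio: int                   # 0 = without audio, 1 = with audio
    political: int               # 0 = nonpolitical, 1 = political
    face: Optional[int] = None   # 1 = politician's face, 0 = issue position

    def __post_init__(self):
        if self.valence not in VALENCES:
            raise ValueError(f"unknown valence: {self.valence}")
        if self.medium not in (0, 1, 2):
            raise ValueError("medium must be 0, 1, or 2")
        if self.audio not in (0, 1) or self.political not in (0, 1):
            raise ValueError("audio and political must be 0 or 1")
        # The face/issue code is only defined for political treatments.
        if self.political == 0 and self.face is not None:
            raise ValueError("face applies to political treatments only")
        if self.face is not None and self.face not in (0, 1):
            raise ValueError("face must be 0 or 1")

    def medium_dummies(self):
        """Return (is_word, is_video); image is the reference category."""
        return (int(self.medium == 1), int(self.medium == 2))


# Hypothetical example: a negative political video with sound about an issue.
example = TreatmentCoding("negative", medium=2, audio=1, political=1, face=0)
print(example.medium_dummies())
```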
Figure 4 visualizes the results of our analyses.Footnote 7 Before discussing the results, it is important to note that the two dependent variables produce by and large the same conclusions. The effects on the winsorized dependent variable look a little bit smaller, but this is because the variance in these variables is also smaller.Footnote 8 For practical reasons, our presentation of the results concentrates on the first point estimate in each panel, which is from the analysis with erroneous signals and outliers removed and the Hampel filter applied.
First, we consider the effect of the valence of the nonpolitical treatments. We compare negative to neutral (first column) and positive to neutral (second column). Regarding corrugator activity, there is a small positive effect of negative versus neutral content (b = 5.454, standardized (b) = 0.176, SE = 0.266). Put differently, negative content produces about 5.5% more corrugator activity than neutral content does, a difference of 0.176 standard deviations. Positive content, as expected, produces less corrugator activity than neutral content (b = –6.186, standardized (b) = –0.199, SE = 0.352) and, by implication, than negative content. This means that the corrugator muscle relaxes during exposure to positive materials. Positive material also produces significantly more zygomaticus activity (second bottom panel) than neutral materials (b = 17.442, standardized (b) = 0.135, SE = 1.724). Surprisingly, negative material also produces more zygomaticus activity. In terms of effect size, this effect is a factor of 4 smaller than the effect of positive material and only explains a difference of 1/30th of a standard deviation in zygomaticus activity (b = 4.436, standardized (b) = 0.034, SE = 1.323). As such, it is mostly neutral material that does not produce much zygomaticus activity. Another explanation is that very negative stimuli also produce zygomaticus activity (van Berkum et al., Reference van Berkum, Struiksma, ’t Hart, Grimaldi, Shtyrov and Brattico2023).
Videos produce slightly more physiological activity than images do, both regarding corrugator activity (b = 1.466, standardized (b) = 0.046, SE = 0.242) and zygomaticus activity (b = 3.118, standardized (b) = 0.031, SE = 1.099). Words produce slightly less corrugator activity (b = –0.756, standardized (b) = –0.023, SE = 0.387) and zygomaticus activity (b = –6.658, standardized (b) = –0.066, SE = 1.314) than images do.
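The unstandardized and standardized coefficients reported in this section can be related through a simple conversion. The following is a back-of-the-envelope sketch, assuming that the standardized coefficient for a binary contrast is obtained by dividing b by the standard deviation of the outcome; the symbols \beta^{*} and s_y are notation introduced here, and the standardization formula itself is an assumption rather than something stated in this section.

```latex
\[
\beta^{*} = \frac{b}{s_y}
\quad\Longrightarrow\quad
s_y \approx \frac{b}{\beta^{*}} = \frac{5.454}{0.176} \approx 31.0
\]
```

Under that assumption, a standard deviation of corrugator activity would correspond to roughly 31 percentage points relative to baseline, and the same calculation for zygomaticus activity (17.442 / 0.135) gives roughly 129. These are implied values for orientation only, not quantities reported in the analyses.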
Treatments with audio produce slightly more corrugator activity (b = 1.331, standardized (b) = 0.041, SE = 0.218) than treatments without audio do. Regarding zygomaticus activity, the picture is more diffuse with one measure suggesting a null effect and the other measure a tiny positive effect.
Treatments with political content produce more corrugator activity than treatments with nonpolitical content (b = 3.185, standardized (b) = 0.099, SE = 0.237), while there seems to be no difference regarding zygomaticus activity. Political treatments including faces—as opposed to issues—produce less corrugator activity (b = –2.761, standardized (b) = –0.086, SE = 0.227) and more zygomaticus activity (b = 7.499, standardized (b) = 0.074, SE = 1.115). Overall, the treatment characteristics evoke corrugator and zygomaticus activity in predicted ways, further validating the measures.
What is the best design of a treatment to elicit fEMG activity? Prior to collecting the data for our first design, we reasoned that as most people do not care about politics, our experimental treatments needed to be very strong to elicit any bodily response. Therefore, we designed videos in which strongly worded political positions were defended by a voice actor. The fact that treatments with political content elicit stronger corrugator activity than treatments with nonpolitical content indicates that our initial reasoning was wrong. People do have strong physiological responses to politics. In one experiment, we measured fEMG activity in response to images of in-party leaders, out-party leaders, and ordinary citizens with different manipulated emotional expressions. These manipulations were identical. Even with such highly similar stimuli, participants have stronger responses to the political stimuli compared to the nonpolitical ones.
We do not need to make our stimuli dramatic to elicit an fEMG response. In fact, most fEMG research uses very simple stimuli such as words or sentences. In the latter case, the words in these sentences are interspersed with sufficient time to reliably capture an fEMG response that can be associated with the stimulus (‘t Hart et al., Reference ‘t Hart, Struiksma, van Boxtel and van Berkum2019; van Berkum et al., Reference van Berkum, Struiksma, ’t Hart, Grimaldi, Shtyrov and Brattico2023). Such a design has stronger internal validity than the videos we developed, because in the videos, a lot happens at the same time. We do not know whether people respond to the images or to what is said. The videos look like real campaign messages. They are therefore more ecologically valid, but this may also produce more noise in fEMG activity. In designing fEMG studies, it is important to consider this trade-off between internal validity and ecological validity.
What is the effect of sample characteristics on fEMG?
Facial EMG data are typically collected in a university lab. A concern with lab studies is that by using small samples of typically university students, we lack the heterogeneity in, for example, political attitudes to answer our research questions. Because we shared this concern, we ran several of our studies outside the lab at a field location. This clearly increased variation in ideology, age, and education in our sample (Bakker, Schumacher, Gothreau et al., Reference Bakker, Schumacher, Gothreau and Arceneaux2020), but it also brings some challenges regarding the comparability of the results because of, for example, differences in temperature and surroundings. Also, some of our field locations were festivals, and some of our participants had been drinking.Footnote 9
To assess differences between data collections in the lab and the field labs, we conduct the same analysis as in the preceding section but now add a dummy variable for lab versus field location. Regarding both measures of corrugator activity, we find no statistically significant difference between data collected in the lab and data collected in the field labs. For zygomaticus activity, however, we find systematically higher zygomaticus activity in the lab than in the field lab (b = 28.88, standardized (b) = 0.301, SE = 4.927). In this case, this only concerns the Lowlands field lab location, because we did not collect zygomaticus data in the other field labs. In contrast, corrugator activity at Lowlands did not differ significantly from that in the other locations. We ran a number of robustness checks that could not account for the difference.Footnote 10
The previous analyses contrast level differences in fEMG activity. They do not analyze whether treatment effects would differ depending on the location or on participant characteristics that are more common in one location than another. We turn to two such analyses.
First, would different locations lead us to different treatment effects? Figure 5 seeks to illustrate this problem. We list the four locations in which we conducted Design 3. There is considerable variation in the percentage of voters who reported voting for a right-wing party, as indicated by the different sizes of the points in Figure 5. At the EO Youth Festival (an evangelical youth festival), only 15.6% of the participants voted for a right-wing party.Footnote 11 Yet at TT Assen (a bikers’ event), 71.4% voted for a right-wing party. To evaluate the impact that such variation can have, we first regressed corrugator activity during exposure to an image of Geert Wilders, the leader of a radical-right party in the Netherlands, on the left/right ideology question (see left panel of Figure 5). The overall effect (see “All” on the y-axis) is statistically significant and negative. This means that left-wing individuals experience more corrugator activity to Wilders than right-wing people do. In none of the individual locations is the effect statistically significant, yet in three out of four cases, it is negative. Only at the EO Youth Festival, where we have relatively few right-wing people, is the effect precisely 0. The right panel shows a regression estimate from comparing two experimental conditions: Wilders versus his main critic Alexander Pechtold, the leader of D66, a social-liberal party. Here, we retrieve the general positive effect of Wilders producing more corrugator activity than Pechtold in three out of four locations. In the location with the most right-wing voters, we also find a positive but not significant effect. These two cases illustrate that imbalances in a variable like ideology can affect the results of a study. They show the importance of drawing participants from a pool that is not necessarily representative of the population, but at least minimally balanced.
Second, do participant characteristics moderate treatment effects? For reasons of space, we concentrate here on analyzing differences between participants in how they processed neutral, negative, and positive treatments. We consider the moderating effect of participant characteristics, including gender, age, left/right ideology, and education, as these are common sources of variation in political attitudes.Footnote 12 We asked a gender question with three categories (0 = male, 1 = female, 2 = nonbinary). We asked age in years. Left/right ideology was measured with a single question that asked participants to place themselves on a 0 (= left) to 10 (= right) scale. Finally, we asked participants to indicate the highest level of education that they finished. We identified four levels: secondary vocational, higher vocational, secondary, and university. In the Supplementary Material, Table A2 and Figures A1a and A1b provide the descriptive statistics of these measures.
Figure 6 illustrates the marginal effects of participant characteristics in negative, neutral, and positive treatments. Starting on the left, the higher the age of the participant, the more corrugator activity there is in the neutral and positive conditions, but not in the negative condition. This effect is very small: the effect of age in the positive condition is 1.646. This translates to an increase in standardized corrugator activity of 0.05 for every standard deviation increase in age.
Regarding gender differences (displayed in column 2 of Figure 6), female participants have weaker corrugator responses to positive and neutral treatments than male participants. In particular, the corrugator responses of female participants to positive treatments are substantively lower than those of male participants (difference (female – male) = –6.45, SE = 1.27). This is a substantial reduction of 0.21 standard deviations in corrugator activity. We are not familiar with an explanation of this difference from the psychophysiology literature (see also Soroka et al., Reference Soroka, Gidengil, Fournier and Nir2016). Regarding negative treatments, the sign is positive, yet there is no statistically significant difference between male and female participants (difference (female – male) = 1.51, SE = 1.16). We find no gender differences in zygomaticus activity.
Left/right ideology differences are displayed in column 3 of Figure 6. Left/right ideology does not have a statistically significant effect on corrugator activity in any of the treatments separately or when we compare the trend lines between the categories, although in some of the comparisons, the p-values are just above the mark for statistical significance. For zygomaticus activity, however, we find that for positive treatments right-wing participants have somewhat lower zygomaticus activity than left-wing participants. The effect is not substantive: a one standard deviation shift to the right in left/right ideology is associated with a 0.04 standard deviation reduction in zygomaticus activity.
The effects of the education categories are displayed in columns 4–6 of Figure 6. Although there are some effects whose confidence intervals do not cross the zero line, none of the effects is statistically significant when we adjust for multiple comparisons using the Bonferroni method.
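As a reminder of what the Bonferroni adjustment does, the sketch below multiplies each p-value by the number of comparisons (capped at 1) before comparing it with the significance threshold. The p-values are hypothetical and are not taken from our models; the function name `bonferroni` and the threshold of 0.05 are assumptions for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-value, significant?) pairs under Bonferroni."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]


# Hypothetical p-values for three comparisons.
results = bonferroni([0.01, 0.04, 0.20])
for p_adj, significant in results:
    print(round(p_adj, 3), significant)
```

In this hypothetical case, the second comparison would pass an unadjusted 0.05 threshold but not the adjusted one, which mirrors the situation of an interval that excludes zero yet does not survive the correction.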
In sum, we report some differences regarding age, ideology and gender. The first two effects are very small. In all, these results do not provide strong evidence that lack of heterogeneity may affect treatment results. There are of course other reasons—such as statistical power—for seeking more heterogeneous samples.
Conclusion
The goal of this article is to encourage the adoption of fEMG in political science. We identified four issues that block this wider adoption. We addressed three of these four issues.
First, we used old and new material to show the construct validity of fEMG. Next, we identified that concerns about the concurrent validity of fEMG (that is, the correlation with self-reports) are unfounded. Self-reports and fEMG measure different aspects of complex, dynamic, partially conscious and partially unconscious emotion episodes. We should not expect them to correlate. Rather, the next step in the literature should be to establish when to expect alignment or misalignment between self-reports and fEMG (or other psychophysiological or neurological measures) (Arceneaux et al., Reference Arceneaux, Bakker and Schumacher2024).
Second, there is a lack of robustness in fEMG measurement and analysis due to too many researcher degrees of freedom. Following recent developments and our own analyses, we recommend specific preprocessing steps, how to construct a baseline activity measure, and different ways of calculating and analyzing fEMG activity measures. Our analyses of different preprocessing steps show remarkable robustness across specifications. That said, we also recommend preregistering analysis and preprocessing plans with fEMG data (van Berkum et al., Reference van Berkum, Struiksma, ’t Hart, Grimaldi, Shtyrov and Brattico2023).
Third, we addressed concerns about the quality of experimental designs and samples used in lab experiments. Using the data we collected and reflecting on our own experiences, we identified a trade-off between internal validity and ecological validity. We analyzed treatment heterogeneity, for which we found limited evidence.
The problem we did not discuss is that most political scientists lack the skills and equipment necessary to conduct a study including fEMG measures. Most of the authors of this article have the luxury of working at a university with a faculty-level lab with the relevant equipment and expertise. We did not have to pay the start-up cost of approximately €30,000 to get the laptops, wires, signal amplifier, and relevant software to conduct an fEMG experiment. To run an experiment is fairly cheap, because you have few disposables: wires, adhesive gel or stickers, and cleaning utensils. To get expertise is, of course, more complicated. We hope that this article has helped make fEMG more accessible to political scientists by providing clear recommendations on various processing steps. Yet we recommend that scholars interested in using fEMG seek collaborations with teams already using fEMG protocols.
As a final note, we want to say that fEMG is not a miracle measure. However, we believe it is part of a larger effort to transform the study of political behavior. So far, the field has relied almost exclusively on self-reported measures of feelings, beliefs, and opinions. As a consequence, the processes that lead to such reports are ignored, and we fail to appreciate the mechanisms underlying feelings, beliefs and attitudes. On the positive side, political scientists are increasingly using techniques that tap into unconscious or preconscious processes, such as the implicit association task, automated emotion recognition algorithms based on computer vision techniques, EEG, fMRI, and psychophysiological measures such as skin conductance, heart rate, and, the focus of this article, fEMG. We hope to place this study into this growing body of work investigating the affective-cognitive mechanisms behind political attitudes, beliefs, and feelings.
Acknowledgements
Support for this research was provided by the Dutch National Science Foundation (B.N.B., G.S., & M.R.), the Royal Dutch Academy of Sciences (B.N.B., G.S., & M.R.), the broadcaster NTR (B.N.B., G.S., & M.R.), BKB: The Campaign Agency (B.N.B., G.S., & M.R.), the University of Amsterdam (B.N.B., G.S., & M.R.), the Amsterdam School of Communication Research (B.N.B.), the Research Priority Area Communication (B.N.B.), the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 750443 (B.N.B.), the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 759079 (G.S.), and an NWO-Veni grant No. 451-16-020 (M.R.). We want to thank the organizers of “Lowlands Science” and in particular Noortje Jakobs, Rewan Janssen, and the late Erik van Bruggen for the facilitation of the data collection during Lowlands. Gé Teunissen, Sjoukje Kerman, and Robin Meijers facilitated the creation of the stimulus material. We want to thank the team of the Dutch TV-show “De Kennis van Nu” and in particular Dirk de Bekker, Diederik Jekel, Ines Kaal, Susanne Linssen, Elisabeth van Nimwegen, Roland Vissers, and Marcia van Woensel for the inspiring conversations about our project and the facilitation of the data collection at Design 2. Throughout this project, Ming Boyer, Boris van den Berg, Cas Woudstra, Emke de Vries, Sander Kunst, Judith Meijer, Denise van de Wetering, and Myrthe Willems provided excellent assistance during the data collection, and we want to thank Bert Molenkamp, Marco Teunisse, and Jasper Wijnen for their technical assistance.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/pls.2023.26.
Data availability statement
This article earned the Open Data, Open Materials, and Preregistration badges for open science practices. The data, replication code, and preregistration for this study are available at https://osf.io/2wd8g/.
Author contributions
Conceptualization: G.S.; Data curation: G.S.; Formal analysis: G.S.; Funding acquisition: G.S., B.N.B. and M.R.; Investigation: G.S., M.D.H., I.R., N.F., B.N.B. and M.R.; Methodology: G.S., M.D.H., B.N.B. and M.R.; Project administration: G.S.; Resources: G.S.; Supervision: B.N.B. and M.R.; Validation: G.S.; Visualization: G.S.; Writing – original draft: G.S., M.D.H. and I.R.; Writing – review & editing: G.S., M.D.H., I.R., N.F., B.N.B. and M.R.