1. Introduction
A key concern of students of spatial voting and party competition is how valence, beyond policy considerations, affects party strategies (for recent reviews, see Adams, Merrill, and Zur 2020; Evrenk 2019; Magyar, Wagner, and Zur 2023). The following approach has been widely applied for decades to study the impact of valence: researchers estimate a vote choice model consisting of choice attributes (spatial proximities), chooser attributes (voter demographics), and intercepts; they define the intercepts as valences; and, based on the sign and size of the intercepts, they draw conclusions about how valence influences party vote shares and positional strategies (e.g., Schofield and Sened 2005a, 2006; Zur 2021a, 2021b). Section A of the Supplementary Material contains a non-exhaustive list of 32 widely cited references, published in top journals and by leading publishers, that adopt this approach.
Mauerer (2020) highlights several difficulties that arise when intercepts are treated as valences, such as their dependence on arbitrary coding decisions for chooser attributes, and recommends studying valence qualities through covariates. However, she does not provide the statistical foundations for doing so. The present contribution takes up this task and pursues three objectives to advance the empirical modeling of valence.
First, we clarify the interpretation of intercepts by investigating their link to chooser attributes and choice probabilities. We discuss the conventional identifiability restriction and present choice models with an alternative identifiability restriction that frees researchers from arbitrary coding decisions when interpreting intercepts as valences. Second, we propose an alternative to intercepts for studying the impact of valence: a modeling approach and parameterization that incorporate valence as an additional observable source of utility. Third, we discuss the difference between specifying valence qualities as choice attributes and as chooser attributes.
We pursue these objectives by examining identification issues and the resulting model properties, as well as strategies for covariate specification and effect parameterization, and we illustrate the implications for substantive interpretation using national election data.¹ We do not aim to provide a new definition of valence in terms of its substantive meaning or operationalization; instead, we investigate the statistical foundations of modeling it.
Next, we outline the background and the analytical challenges involved (see also Mauerer 2020) and lay out our objectives in detail. Section 2 briefly reviews the standard choice model and identification issues to set the methodological ground. Section 3 discusses the impact of coding schemes on interpreting intercepts as valences. Section 4 outlines how to model valence as an observable source of utility. Section 5 closes with concluding remarks.
1.1. Background, Analytical Challenges, and Objectives
The concept of valence goes back to Stokes (1963). His key argument is that, instead of competition on policy issues on which parties and voters can take different stands, empirical reality is characterized by competition over competence, performance, trust, handling abilities, and success, or the lack thereof. Since Stokes’ original conceptualization, the literature has taken many different paths to define and incorporate valence in theoretical and empirical models, resulting in what Green and Jennings (2017, 550–551), reviewing the immense literature that has since emerged, call the “Valence Soup”:
“Thus far authors have defined valence as a valence dimension, a party valence score, valence as a candidate’s character or strategic advantage, a leader advantage or disadvantage, valence as a strategic advantage, as candidate quality, candidate experience, education or the lack thereof, as party activism, the level of activist support, candidate spending, the reputation of candidates, scandals and corruption (or their absence) in political parties and corruption at the level of candidates.”
Many more definitions can be added to this list when inspecting the enormous formal literature (for a review, see Evrenk 2019). Within the spatial voting literature in the tradition of Downs (1957), where spatial proximity is the primary source of voter utility, valence is often only vaguely defined as a second dimension of competition: a quality of a party (or candidate) that is not policy-related, that distinguishes the parties from one another, and that benefits the party possessing more of it.
Our contribution operates within the influential literature on probabilistic spatial voting models, which incorporate a random component that is independent of spatial considerations (e.g., Adams 1999; Burden 1997; Coughlin 1992; Enelow and Hinich 1989). This prominent research strand settled on the discrete choice framework (see, e.g., Train 2009) to arrive at empirical spatial vote choice models because the framework has several attractive features. In particular, both observed and unobserved choice determinants affect voter utility, and random utility maximization is the typical underlying choice rule. The framework also allows researchers to explore how attributes of the choice alternatives (e.g., voter–party spatial proximities) and of the choosers (i.e., voters) determine vote choices (e.g., Alvarez and Nagler 1998). While choice models can be formulated in many different ways, the conditional logit model (McFadden 1974) has become the standard choice model² in the study of spatial voting (e.g., Adams, Merrill III, and Grofman 2005; Kedar 2005; Merrill III and Adams 2001; Stoetzer and Zittlau 2015).
A widely adopted approach to integrating valence into empirical spatial vote choice models relies on the intercepts (see Section A of the Supplementary Material). Intercepts as a measure of valence were introduced by the Spatial Valence Model of Politics (e.g., Schofield and Sened 2005b, 2006), where they are understood as parties’ (candidates’) average nonpolicy qualities. A fundamental property of the standard choice model is that the intercepts capture the unobserved utility sources and, consequently, reproduce the vote shares in the data, given a particular set of covariates (Mauerer 2020, 308–310). This property has major implications for the numerous works that rely on intercepts to understand how valence affects spatial competition. The typical result is that parties with large vote shares have a large valence, and parties with small vote shares have a small valence or are “valence disadvantaged.”
Let us consider recent work to demonstrate that such results are not substantive findings on how valence affects party strategies but are produced merely by a model property. In the vibrant debate on the “collapse of centrist parties” in Europe, Zur (2021a, 2021b), drawing on the size and direction of the intercepts, concludes that the decline in centrist party vote shares results from these parties’ loss of valence.³
Take, for example, the 2013 German election data that Zur (2021b) analyzes, where the party vote shares are: major-right CDU: .404, major-left SPD: .302, FDP: .026, Greens: .096, Left: .128, populist-right AfD: .044. The FDP, which has the smallest vote share, is classified as the centrist party and specified as the reference party. First, consider an intercept-only model, whose intercept estimates reproduce the vote shares: CDU: 2.73, SPD: 2.44, FDP: 0, Greens: 1.30, Left: 1.59, AfD: .51. In such a model, each intercept equals the log of the ratio of the party’s vote share to the FDP’s vote share. Since the FDP has the smallest vote share, all intercepts, which are relative to the FDP, must be positive. The intercept for the CDU is the largest because the CDU’s vote share exceeds the FDP’s by the largest factor. Thus, the direction and size of the intercepts are directly determined by the vote shares.
Next, consider the intercepts, given spatial proximity on the left-right dimension, from which Zur (2021b) derives his conclusions: CDU: 2.79, SPD: 2.22, FDP: 0, Greens: 1.10, Left: 1.78, AfD: 1.53. When the model accounts for spatial proximity, the intercepts change somewhat; however, they still mainly reflect the vote shares. Thus, the vote share decline of centrist parties is “explained” by their vote share decline, which is not an explanation but the result of transforming vote shares into intercepts through a statistical model. Model properties, not valence (dis)advantages or positional efforts, produce the result. Put differently, all factors determining vote choice that are not specified as covariates end up in the intercepts, and these factors may or may not be related to valence (Mauerer 2020).
Another crucial model feature affects the interpretation of intercepts as valences: the inclusion of chooser attributes, that is, attributes that characterize voters. The influential contribution A Unified Theory of Party Competition by Adams et al. (2005) introduced them as nonpolicy motivations, for example, socioeconomic factors or ties related to religion or class. These variables are of key theoretical importance because they integrate a behavioral perspective into the spatial modeling tradition to better understand centrifugal forces (see already Adams and Merrill III 1999a, 1999b, 2000; Merrill III and Adams 2001). Besides their theoretical importance, such variables add substantial explanatory power, as numerous works in this prominent research strand have demonstrated for several decades and across many polities (e.g., Adams and Merrill III 1999a, 771, 787; 2000, 741). Chooser attributes also enter the empirical applications of Schofield’s Spatial Valence Model, where they are sometimes referred to as socioeconomic valences (e.g., Schofield and Zakharov 2010, 179) or simply sociodemographics (e.g., Schofield and Sened 2006); they are part of the empirical model but not of the formal model. A key argument for including them in the empirical modeling is, again, the improvement in model fit.
Mauerer (2020) demonstrates that chooser attributes are directly linked to the intercepts, so their coding determines the intercept values and the information they contain. Our first objective is to further investigate the implications of interpreting intercepts as valences and to outline a parameter identification strategy that matches the definition of average valences in Schofield’s Spatial Valence Model, which the conventional identifiability restriction does not. We demonstrate the key points using the same German vote choice data as Mauerer (2020). However, as we will lay out, relying on intercepts as a measure of valence still comes with several other drawbacks, such as their relative nature or the implicit assumption that valence aspects are the only choice determinants that remain unobserved, which brings us to our second objective.
We outline a modeling approach that incorporates valence as an additional observable source of voter utility, which is consistent with existing studies in the spatial voting literature that treat valence as a measurable concept (e.g., Adams et al. 2011; Buttice and Stone 2012; Franchino and Zucchini 2015; Stone, Maisel, and Maestas 2004; Stone and Simas 2010). We propose to specify valence qualities as attributes that characterize parties (candidates) and present a parameterization that provides one valence effect for each party. We illustrate the benefits of this specification and parameterization strategy by drawing on survey questions on candidate character traits in the American National Election Study.
Our third objective is to demonstrate the difference between specifying valence qualities as choice or chooser attributes by revisiting another prominent framework, the Valence Politics Model of Party Choice (e.g., Clarke et al. 2004, 2009, 2011; Sanders et al. 2011; Whiteley et al. 2013). Empirical applications of this model take a completely different approach to quantifying the theoretical concept of valence than the approach dominant in the spatial literature. Here, valence is treated as an observable concept that can be measured by survey questions on party leader images or performance evaluations, and these variables are specified as chooser attributes. By replicating a typical vote choice model from this research strand using the British Election Study, we illustrate the implications of the chooser-attribute specification for interpretation and model complexity.
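The claim that an intercept-only model merely re-expresses vote shares can be checked numerically. The minimal Python sketch below maximizes the intercept-only logit log-likelihood by plain gradient ascent and reports the intercepts relative to the FDP; `fit_intercept_only_logit` is a hypothetical helper written for this illustration. It assumes an FDP share of .026, the value implied by the reported intercepts (e.g., .404/e^2.73 ≈ .026) and by the requirement that the six shares sum to one.

```python
import math

def fit_intercept_only_logit(shares, ref, iters=3000, lr=1.0):
    """Maximize the intercept-only logit log-likelihood sum_j y_j * log p_j.

    shares: dict alternative -> observed share (normalized to sum to one below)
    ref:    reference alternative; intercepts are reported relative to it
    """
    total = sum(shares.values())
    y = {k: v / total for k, v in shares.items()}
    beta = {k: 0.0 for k in y}
    for _ in range(iters):
        m = max(beta.values())  # subtract the maximum for numerical stability
        expb = {k: math.exp(b - m) for k, b in beta.items()}
        denom = sum(expb.values())
        # The gradient of the log-likelihood w.r.t. beta_j is y_j - p_j.
        beta = {k: beta[k] + lr * (y[k] - expb[k] / denom) for k in beta}
    # Intercepts are identified only up to a common constant;
    # the restriction beta_ref = 0 pins that constant down.
    return {k: b - beta[ref] for k, b in beta.items()}

# Sample vote shares for the 2013 German data (FDP share assumed to be .026).
shares = {"CDU": .404, "SPD": .302, "FDP": .026, "Greens": .096, "Left": .128, "AfD": .044}
beta = fit_intercept_only_logit(shares, ref="FDP")
for party, b in beta.items():
    ratio = math.log(shares[party] / shares["FDP"])
    print(f"{party:6s} fitted {b:5.2f}   log(share / FDP share) {ratio:5.2f}")
```

The fitted intercepts coincide with the log share ratios, which in turn match the reported intercepts up to the rounding of the shares to three decimals: nothing beyond the vote shares enters the estimates.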
2. Standard Choice Model and Identifiability Issues
We briefly review the standard choice model and identifiability issues to set the ground. Let $Y_i \in \{1, \ldots , J\}$ denote the choice that decision maker $i \in \{1, \ldots , n\}$ makes among J alternatives. The model incorporates chooser attributes $\boldsymbol {x}_i^T=(x_{i1}, \ldots , x_{iM})$, also referred to as chooser-specific variables, as well as choice attributes $\boldsymbol {z}_{ij}^T=(z_{ij1},\ldots ,z_{ijK})$, known as choice-specific variables, into the utility functions
$$u_{ij} = \beta_{j0} + \boldsymbol{x}_i^T\boldsymbol{\beta}_j + \boldsymbol{z}_{ij}^T\boldsymbol{\alpha}, \qquad j = 1, \ldots, J.$$
A logistic response function connects the choice probabilities to the utility functions:
$$P(Y_i = j \mid \boldsymbol{x}_i, \boldsymbol{z}_i) = \frac{\exp(\beta_{j0} + \boldsymbol{x}_i^T\boldsymbol{\beta}_j + \boldsymbol{z}_{ij}^T\boldsymbol{\alpha})}{\sum_{r=1}^{J}\exp(\beta_{r0} + \boldsymbol{x}_i^T\boldsymbol{\beta}_r + \boldsymbol{z}_{ir}^T\boldsymbol{\alpha})}, \tag{1}$$
where $\beta _{10},\ldots ,\beta _{J0}$ are alternative-specific intercepts, $\boldsymbol {\beta }^{T}_{j}=(\beta _{j1},\ldots ,\beta _{jM})$ are the parameters associated with chooser attributes $\boldsymbol {x}_i$, and $\boldsymbol {\alpha }^{T}=(\alpha _{1},\ldots ,\alpha _{K})$ is the coefficient vector related to choice attributes, summarized in $\boldsymbol {z}_{i}^T=(\boldsymbol {z}_{i1}^T,\ldots ,\boldsymbol {z}_{iJ}^T)$. Equation (1) gives the model in its general, unidentified version. Restrictions or side constraints are required to prevent linear dependence and thus ensure parameter identifiability.
One key restriction refers to the intercepts and the chooser-specific covariates $\boldsymbol {x}_i$, which vary across decision makers but not across alternatives. Because of this invariance, not all parameters $\boldsymbol {\beta }^{T}_{j}=(\beta _{j1},\ldots ,\beta _{jM})$ are identified: adding the same vector to every $\boldsymbol{\beta}_j$ shifts all utilities of decision maker $i$ by the same amount and leaves the choice probabilities unchanged. The same is true for the intercepts $\beta _{10},\ldots ,\beta _{J0}$. The standard side constraint is to define one alternative as the reference alternative. For example, when the first alternative $(Y_i=1)$ serves as the reference, one sets $\beta _{10}=0$ and $\boldsymbol {\beta }_{1}^T=(0,\dots ,0)$. We will come back to the invariance of chooser attributes across alternatives in Section 4 when we discuss the difference between specifying valence qualities as chooser or choice attributes.
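The identification problem can be seen directly in the logit probabilities. The minimal Python sketch below uses hypothetical parameter values (J = 3 alternatives, one chooser attribute, one choice attribute); `choice_probs` is an illustrative helper, not a library function. It shows that shifting every intercept and every chooser-attribute coefficient by a common constant leaves the probabilities unchanged, that the reference-alternative constraint selects one representative of these equivalent parameter sets, and that changing α, whose covariates vary across alternatives, does alter the probabilities.

```python
import math

def choice_probs(intercepts, betas, alpha, x, z):
    """Logit choice probabilities P(Y_i = j), j = 1..J.

    intercepts: J alternative-specific intercepts beta_j0
    betas:      J lists of M coefficients beta_j for the chooser attributes
    alpha:      K coefficients for the choice attributes (common to all alternatives)
    x:          M chooser attributes of decision maker i (same for every alternative)
    z:          J lists of K choice attributes z_ij (vary across alternatives)
    """
    utils = [
        b0 + sum(b * xm for b, xm in zip(bj, x)) + sum(a * zk for a, zk in zip(alpha, zj))
        for b0, bj, zj in zip(intercepts, betas, z)
    ]
    m = max(utils)  # subtract the maximum for numerical stability
    expu = [math.exp(u - m) for u in utils]
    total = sum(expu)
    return [e / total for e in expu]

# Hypothetical values: J = 3 alternatives, M = 1 chooser attribute, K = 1 choice attribute.
intercepts = [0.4, 1.0, -0.2]
betas = [[0.3], [1.1], [0.1]]
alpha = [-1.2]
x = [1.0]
z = [[0.5], [1.5], [2.0]]

p_base = choice_probs(intercepts, betas, alpha, x, z)

# Common shifts of all intercepts and all beta_j: probabilities unchanged (not identified).
p_shift = choice_probs([b + 3.0 for b in intercepts],
                       [[b + 1.5 for b in bj] for bj in betas], alpha, x, z)

# Side constraint: alternative 1 as reference (beta_10 = 0, beta_1 = 0).
p_ref = choice_probs([b - intercepts[0] for b in intercepts],
                     [[b - b1 for b, b1 in zip(bj, betas[0])] for bj in betas], alpha, x, z)

# A different alpha changes the probabilities, because z_ij varies across alternatives.
p_alpha = choice_probs(intercepts, betas, [-0.2], x, z)

print(p_base, p_shift, p_ref, p_alpha, sep="\n")
```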
2.1. The Identifiability of Categorical Chooser Attributes
A different kind of identifiability restriction must be imposed on categorical chooser attributes that define groups of decision makers. Let $L\in \left \{1, \ldots , S \right \}$ denote a chooser attribute with S categories that represent subpopulations, socioeconomic groups, or, more generally, attribute levels. How the analyst lets such attributes enter the utility functions is crucial for the resulting model properties and parameter interpretation. Since chooser attributes are directly linked to intercepts, the specific identifiability restriction has implications for the information the intercepts contain and, therefore, for interpreting intercepts as valences.
The conventional identifiability approach relies on (0–1) coding. We will clarify the consequences of the conventional (0–1) coding for interpreting intercepts as valences, which requires investigating the link between the identifiability restriction imposed here and the choice probabilities. Another way to handle categorical chooser attributes is effect coding. Under this approach, the dummy variables $x_{L(s)}$ for the $s\in \left \{1,\dots ,S\right \}$ subpopulations take the values $0,1,-1$. Table 1 compares (0–1) and effect coding for a chooser attribute L with four categories. Under both coding schemes, $S-1$ dummy variables are sufficient. The last subpopulation is redundant since $L=S$ is implicitly determined by the vector $(0,\dots ,0)$ under (0–1) coding or $(-1,\dots ,-1)$ under effect coding.
Even though the coding schemes yield equivalent models, the interpretation of intercepts strongly depends on the chosen coding. We will present choice models imposing effect coding and discuss its benefits for interpreting intercepts as average valences.
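The two coding schemes compared in Table 1 can be generated for any number of categories. The Python sketch below uses hypothetical helper names and assumes, as in the text, that the last category S is the one implied by the other S − 1 dummies. Under effect coding, the effect of that category is minus the sum of the others, which is the sum-to-zero restriction used in Section 3.2.

```python
def zero_one_coding(s, S):
    """(0-1) coding of category s in {1..S}: S-1 dummies, category S -> (0, ..., 0)."""
    return [1 if s == k else 0 for k in range(1, S)]

def effect_coding(s, S):
    """Effect coding of category s in {1..S}: S-1 dummies, category S -> (-1, ..., -1)."""
    if s == S:
        return [-1] * (S - 1)
    return [1 if s == k else 0 for k in range(1, S)]

def implied_effects(coefs):
    """Effect coding: the omitted category's effect is minus the sum of the S-1
    estimated effects, so all S effects sum to zero."""
    return list(coefs) + [-sum(coefs)]

S = 4  # four-category chooser attribute, as in Table 1
for s in range(1, S + 1):
    print(s, zero_one_coding(s, S), effect_coding(s, S))
```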
3. Coding Schemes and Intercepts as Valence
This section investigates the link between the coding of categorical chooser attributes and choice probabilities and how it affects the interpretation of intercepts as valences. We first discuss the difficulty with the conventional (0–1) coding, then present choice models imposing effect coding and outline the resulting model properties. For simplicity, we initially ignore the choice attributes $\boldsymbol {z}_{ij}$ because their coding does not affect the intercepts,⁴ and we illustrate the key points with simplified examples. The section closes by discussing the implications of relying on intercepts to measure valence based on a vote choice model containing choice attributes $\boldsymbol {z}_{ij}$ and chooser attributes $\boldsymbol {x}_{i}$.
3.1. The (0–1) Coding and the Reference Population
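The (0–1) coding approach imposes an identification restriction that requires defining a reference population, which is not to be confused with the reference alternative among the choice alternatives. The analyst selects one subpopulation as the reference to which the parameters are compared. Even though this selection is arbitrary and any subpopulation can serve as the reference, the specific choice directly affects the interpretation of intercepts.
The choice probabilities for subpopulation $s\in \left \{1,\dots ,S\right \}$ are given by
$$P(Y_i = j \mid L = s) = \frac{\exp(\beta_{j0} + \beta_{js})}{\sum_{r=1}^{J}\exp(\beta_{r0} + \beta_{rs})}, \qquad j = 1, \ldots, J. \tag{2}$$
The choice of a reference population $s_0$ imposes the restriction $\beta _{js_0} =0 \text { for all } j.$ Alternatively, the model can be written with dummy variables $x_{L(s)}$, where $x_{L(s)}=1$ if $L=s$ and $x_{L(s)}=0$ otherwise,
$$P(Y_i = j \mid L) = \frac{\exp\big(\beta_{j0} + \sum_{s \in S_0} x_{L(s)}\,\beta_{js}\big)}{\sum_{r=1}^{J}\exp\big(\beta_{r0} + \sum_{s \in S_0} x_{L(s)}\,\beta_{rs}\big)}, \tag{3}$$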
where $S_0=\{1,\dots ,s_0-1,s_0+1,\dots ,S\}$ .
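For parameter interpretation, it is helpful to consider the log odds between any two alternatives $j_1, j_2\in \{1, \ldots , J\}$:
$$\log\frac{P(Y_i = j_1 \mid L = s)}{P(Y_i = j_2 \mid L = s)} = \beta_{j_1 0} - \beta_{j_2 0} + \beta_{j_1 s} - \beta_{j_2 s}. \tag{4}$$
When selecting the first alternative as the reference alternative ($\beta _{10}=\beta _{1s}=0$), one obtains
$$\log\frac{P(Y_i = j \mid L = s)}{P(Y_i = 1 \mid L = s)} = \beta_{j0} + \beta_{js}. \tag{5}$$
For the reference population $s_0$, one obtains
$$\log\frac{P(Y_i = j \mid L = s_0)}{P(Y_i = 1 \mid L = s_0)} = \beta_{j0}. \tag{6}$$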
Thus, there is a direct link between the intercepts and the reference population. The intercepts represent the (log) odds of alternative j compared to alternative 1 in the reference population. The crucial point is that the definition of the reference population determines the interpretation of intercepts so that their meaning changes when the arbitrarily selected reference population changes.
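For the covariate effects, one obtains
$$\log\frac{P(Y_i = j \mid L = s)/P(Y_i = 1 \mid L = s)}{P(Y_i = j \mid L = s_0)/P(Y_i = 1 \mid L = s_0)} = \beta_{js}. \tag{7}$$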
Hence, $e^{\beta _{js}}$ give the relative odds (or odds ratios) that compare the odds in subpopulation s to the odds in reference population $s_0$ .
3.1.1. Example
We use the same survey data as Mauerer (2020) to demonstrate the implications for interpretation.⁵ Let $Y_i$ contain vote choices for the German political parties CDU ($Y_i=1$), SPD ($Y_i=2$), FDP ($Y_i=3$), Greens ($Y_i=4$), and Left ($Y_i=5$). We focus on the dichotomous variable gender $G \in \left \{1,2\right \}$ (1 female, 2 male) and estimate the model for different reference populations
with the restriction that one of the two gender-specific parameters is set to zero. We specify the party CDU as the reference alternative ( $\beta _{10}=\beta _{1s}=0$ ). Table 2 reports the estimates. The left part contains the parameters for males as the reference population and the right part for females as the reference population.
Table 2 note: CDU ($j=1$) is the reference alternative; SPD ($j=2$), FDP ($j=3$), Greens ($j=4$), Left ($j=5$). Source: 1998 German election study. $N=715$.
To demonstrate that the information in the intercepts depends on the chosen reference population, let us consider the SPD vote ($j=2$) and focus first on the estimates with males as the reference. The exponentiated intercept $e^{\beta _{20}}= 1.64$ gives the odds of males voting SPD rather than CDU (see Equation (6)). The product of the exponentiated intercept and the gender estimate, $e^{\beta _{20}} \: e^{\beta _{21}}= 1.64 \times .85= 1.39$ (see Equation (5)), gives the corresponding odds for females. The gender-specific estimate $e^{\beta _{21}}=.85$ gives the relative odds between the two subpopulations (see Equation (7)).⁶ In the reversed coding (females as reference), $e^{\beta _{20}}=1.39$ gives the odds for females, $e^{\beta _{20}} \: e^{\beta _{21}}= 1.39 \times 1.18= 1.64$ the odds for males, and $e^{\beta _{21}}=1.18=1/.85$ the relative odds between the two subpopulations. Thus, even though the arbitrary (0–1) coding leaves the behavioral implications of the model unchanged, the parameters differ when the reference population changes, which has major implications for interpreting intercepts as valences, as discussed in Section 3.3.
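Switching between the two reference populations involves only the relations in Equations (5) to (7). The minimal Python sketch below applies them to the SPD estimates reported above; `switch_reference_population` is a hypothetical helper for a binary chooser attribute under (0–1) coding.

```python
def switch_reference_population(exp_b0, exp_b1):
    """(0-1) coding, binary chooser attribute: re-express the estimates with the
    other subpopulation as reference.

    exp_b0: odds (j vs. reference alternative) in the current reference population (Eq. 6)
    exp_b1: odds ratio, other subpopulation vs. current reference population (Eq. 7)
    """
    odds_other = exp_b0 * exp_b1      # odds in the other subpopulation (Eq. 5)
    return odds_other, 1.0 / exp_b1   # new intercept, new odds ratio

b0_f, b1_f = switch_reference_population(1.64, 0.85)   # males -> females as reference
print(round(b0_f, 2), round(b1_f, 2))                   # 1.39 1.18
b0_m, b1_m = switch_reference_population(b0_f, b1_f)   # and back again
print(round(b0_m, 2), round(b1_m, 2))                   # 1.64 0.85
```

Applying the function twice returns the original estimates: both parameterizations describe the same odds, and only the meaning of the intercept changes.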
3.2. Effect Coding and Average Preferences
The benefit of effect coding is that it removes dependence on a pre-selected arbitrary reference population when interpreting intercepts as valences. It implies an identification restriction such that the resulting parameters relate to average preferences over subpopulations. We first discuss the choice model for S subpopulations and then add covariates.
3.2.1. Choice Model for S Subpopulations
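Effect coding imposes, for the $s\in \left \{1,\dots ,S\right \}$ subpopulations, the restriction
$$\sum_{s=1}^{S}\beta_{js} = 0 \quad \text{for all } j.$$
The restriction implicitly uses the geometric mean (GM) to average across all S subpopulations. The geometric mean, which is based on product formation and root extraction, is a natural choice for averaging positive numbers such as probabilities. Given the model, the geometric mean of the choice probabilities across subpopulations has the form
$$GM(j) = \Big(\prod_{s=1}^{S} P(Y_i = j \mid L = s)\Big)^{1/S} = \Big(\prod_{s=1}^{S} \frac{\exp(\beta_{j0} + \beta_{js})}{\sum_{r=1}^{J}\exp(\beta_{r0} + \beta_{rs})}\Big)^{1/S}.$$
Since $\prod _{s=1}^S e^{\beta _{js}}=1$, one obtains
$$GM(j) = e^{\beta_{j0}}\Big(\prod_{s=1}^{S} e^{\beta_{js}}\Big)^{1/S}\Big(\prod_{s=1}^{S}\sum_{r=1}^{J}\exp(\beta_{r0} + \beta_{rs})\Big)^{-1/S} = \gamma\, e^{\beta_{j0}},$$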
where $\gamma = (\prod _{s=1}^S \sum \limits ^{J}_{r=1}\exp (\beta _{r0} + \beta _{rs}))^{-1/S}$ . $GM(j)$ is the average preference (i.e., choice probability) for alternative j, with the geometric mean defining the average. It represents an average across subpopulations, not observations or alternatives. $\gamma $ is a constant that does not depend on the chooser attribute level s or alternative j.
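This allows for a simple interpretation of intercepts, which are given by
$$e^{\beta_{j0}} = \frac{GM(j)}{\gamma}.$$
Thus, $e^{\beta _{j0}}$ represents the preference for alternative j averaged over subpopulations, times a constant ($1/\gamma $). The intercepts indicate whether preferences vary across alternatives when accounting for possible variation across subpopulations. Even when preferences differ across subpopulations ($\beta _{js} \ne 0$), the average preferences for the alternatives are the same ($GM(1)=\dots =GM(J)=\gamma $) when the intercepts are zero ($\beta _{10}=\dots =\beta _{J0}=0$). Consequently, the intercepts represent average preferences not explained by subpopulations. When the first alternative is the reference ($\beta _{10}=\beta _{1s}=0$), one obtains
$$e^{\beta_{j0}} = \frac{GM(j)}{GM(1)}, \qquad \beta_{j0} = \frac{1}{S}\sum_{s=1}^{S}\log\frac{P(Y_i = j \mid L = s)}{P(Y_i = 1 \mid L = s)}. \tag{11}$$
The covariate parameters represent deviations from the average preferences. Compared to the reference alternative 1, the $\beta _{js}$ give the additive effects on the average log odds and the $e^{\beta _{js}}$ the multiplicative effects on the average odds,
$$\log\frac{P(Y_i = j \mid L = s)}{P(Y_i = 1 \mid L = s)} = \log\frac{GM(j)}{GM(1)} + \beta_{js}, \qquad \frac{P(Y_i = j \mid L = s)}{P(Y_i = 1 \mid L = s)} = \frac{GM(j)}{GM(1)}\, e^{\beta_{js}}. \tag{12}$$
The ratio of two exponentiated parameters gives the relative odds between any two subpopulations $s_1, s_2 \in \left \{1,\dots ,S\right \}$:
$$\frac{P(Y_i = j \mid L = s_1)/P(Y_i = 1 \mid L = s_1)}{P(Y_i = j \mid L = s_2)/P(Y_i = 1 \mid L = s_2)} = \frac{e^{\beta_{js_1}}}{e^{\beta_{js_2}}} = e^{\beta_{js_1} - \beta_{js_2}}.$$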
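The identities of this subsection can be verified numerically. The Python sketch below uses arbitrary, hypothetical parameters for J = 3 alternatives and S = 3 subpopulations that satisfy the effect-coding restriction. It computes the subpopulation choice probabilities and their geometric means and checks that GM(j) = γe^(β_j0), that e^(β_j0) = GM(j)/GM(1) when the first alternative is the reference, and that the odds in each subpopulation equal the average odds times e^(β_js). The helper names are our own.

```python
import math

def subpop_probs(b0, bs, s):
    """P(Y = j | L = s) = exp(b0_j + b_js) / sum_r exp(b0_r + b_rs), j = 0..J-1."""
    u = [b0[j] + bs[j][s] for j in range(len(b0))]
    total = sum(math.exp(v) for v in u)
    return [math.exp(v) / total for v in u]

def geometric_mean(values):
    """Geometric mean of positive numbers via the mean of their logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical parameters: J = 3 alternatives (index 0 is the reference alternative,
# so its intercept and effects are 0) and S = 3 subpopulations.
# Effect coding: each row of bs sums to zero.
b0 = [0.0, 0.7, -0.4]
bs = [[0.0, 0.0, 0.0],
      [0.5, -0.2, -0.3],
      [-0.6, 0.1, 0.5]]
J, S = len(b0), len(bs[0])

P = [subpop_probs(b0, bs, s) for s in range(S)]                       # P[s][j]
GM = [geometric_mean([P[s][j] for s in range(S)]) for j in range(J)]  # GM(j)
gamma = math.prod(sum(math.exp(b0[r] + bs[r][s]) for r in range(J))
                  for s in range(S)) ** (-1 / S)

for j in range(J):
    # GM(j) = gamma * exp(b0_j) and exp(b0_j) = GM(j) / GM(1)
    print(j, GM[j], gamma * math.exp(b0[j]), GM[j] / GM[0], math.exp(b0[j]))
    for s in range(S):
        # odds in subpopulation s = average odds times exp(b_js)
        print("  ", s, P[s][j] / P[s][0], GM[j] / GM[0] * math.exp(bs[j][s]))
```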
3.2.2. Example
We consider again the variable gender $G \in \left \{1,2\right \}$ (1 female, 2 male) to illustrate the interpretation under effect coding
with the restriction $\beta _{j1} + \beta _{j2}=0$ or $\beta _{j1}=- \beta _{j2}$ , respectively. Table 3 shows the estimates based on effect coding for the variable gender in two versions.
Table 3 note: CDU ($j=1$) is the reference alternative; SPD ($j=2$), FDP ($j=3$), Greens ($j=4$), Left ($j=5$). Source: 1998 German election study. $N=715$.
Compared to the reference party CDU ($\beta _{10}=\beta _{11}=0$), the intercepts $\beta _{j0}$ give the average preferences for party $j\in \{2,3,4,5\}$, averaged over the male and female populations. The intercepts are identical under the two coding versions because they are computed from the average of the log odds over the two subpopulations (see Equation (11)). One obtains
$$\beta_{j0} = \frac{1}{2}\left(\log\frac{P(Y_i = j \mid G = 1)}{P(Y_i = 1 \mid G = 1)} + \log\frac{P(Y_i = j \mid G = 2)}{P(Y_i = 1 \mid G = 2)}\right).$$
For example, the exponentiated SPD intercept $e^{\beta _{20}}=1.51$ indicates that, averaged over the two subpopulations, the odds of voting SPD rather than CDU are about 1.51. The gender-specific parameter $e^{\beta _{21}}$ shows how the subpopulations deviate from this average preference (see Equation (12)): $e^{\beta _{21}} = .92$ ($1/e^{\beta _{21}} =1.08$) indicates that the odds of females (males) voting SPD rather than CDU are $.92$ ($1.08$) times the average odds.⁷
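The effect-coded gender estimates can be recovered from the (0–1)-coded estimates, because both codings describe the same subpopulation odds. The minimal Python sketch below starts from the SPD estimates with males as the reference population; `zero_one_to_effect` is a hypothetical helper for a binary chooser attribute.

```python
import math

def zero_one_to_effect(exp_b0, exp_b1):
    """Convert (0-1)-coded estimates for a binary chooser attribute to effect coding.

    exp_b0: odds (j vs. reference alternative) in the reference population
    exp_b1: odds ratio, other population vs. reference population
    Returns (exponentiated effect-coded intercept, deviation factor of the
             reference population, deviation factor of the other population).
    """
    odds_ref = exp_b0
    odds_other = exp_b0 * exp_b1
    average = math.sqrt(odds_ref * odds_other)   # geometric mean over the two populations
    return average, odds_ref / average, odds_other / average

avg, dev_male, dev_female = zero_one_to_effect(1.64, 0.85)   # males as reference
print(round(avg, 2), round(dev_male, 2), round(dev_female, 2))  # 1.51 1.08 0.92

# Starting from the females-as-reference estimates gives the same average.
avg_from_females, _, _ = zero_one_to_effect(1.64 * 0.85, 1 / 0.85)
print(round(avg_from_females, 2))  # 1.51
```

Because the geometric mean treats both subpopulations symmetrically, the effect-coded intercept does not depend on which gender was the reference in the (0–1) coding.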
3.2.3. Choice Model for S Subpopulations and Additional Covariates
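Next, we add covariates $\boldsymbol {x}_i$ with coefficient vectors $\boldsymbol{\delta}_j$ to the utility functions. The identifiability restriction $\sum _{s=1}^S \beta _{js} =0 \text { for all } j$ yields the choice probabilities for S subpopulations
$$P(Y_i = j \mid L = s, \boldsymbol{x}_i) = \frac{\exp(\beta_{j0} + \beta_{js} + \boldsymbol{x}_i^T\boldsymbol{\delta}_j)}{\sum_{r=1}^{J}\exp(\beta_{r0} + \beta_{rs} + \boldsymbol{x}_i^T\boldsymbol{\delta}_r)}$$
and their geometric mean
$$GM(j,\boldsymbol{x}_i) = \Big(\prod_{s=1}^{S} P(Y_i = j \mid L = s, \boldsymbol{x}_i)\Big)^{1/S} = \gamma(\boldsymbol{x}_i)\, e^{\beta_{j0} + \boldsymbol{x}_i^T\boldsymbol{\delta}_j},$$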
where $\gamma (\boldsymbol {x}_i) = (\prod _{s=1}^S \sum \limits ^{J}_{r=1}\exp (\beta _{r0} + \beta _{rs}+\boldsymbol {x}_i^T\boldsymbol {\delta }_r))^{-1/S}$ . $GM(j,\boldsymbol {x}_i)$ is the average preference for alternative j given $\boldsymbol {x}_i$ , averaged over subpopulations and the average defined by the geometric mean.
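For the intercepts, one obtains
$$e^{\beta_{j0}} = \frac{GM(j,\boldsymbol{0})}{\gamma(\boldsymbol{0})}.$$
Thus, $e^{\beta _{j0}}$ is the average preference for alternative j, evaluated at $\boldsymbol{x}_i = \boldsymbol{0}$, times the constant $1/\gamma (\boldsymbol {0})$, where $\boldsymbol {0}^T=(0,\dots ,0)$. In particular, when the first alternative is the reference, $\beta _{10}=\beta _{1s}=0$, $\boldsymbol {\delta }_1^T=(0,\ldots ,0)$, the intercepts give the average odds
$$e^{\beta_{j0}} = \frac{GM(j,\boldsymbol{0})}{GM(1,\boldsymbol{0})}. \tag{16}$$
Compared to reference alternative 1, the covariate parameters $\beta _{js}$ give the additive effects on the average log odds and the $e^{\beta _{js}}$ the multiplicative effects on the average odds,
$$\frac{P(Y_i = j \mid L = s, \boldsymbol{x}_i)}{P(Y_i = 1 \mid L = s, \boldsymbol{x}_i)} = \frac{GM(j,\boldsymbol{x}_i)}{GM(1,\boldsymbol{x}_i)}\, e^{\beta_{js}},$$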
where again $\prod _{s=1}^S e^{\beta _{js}}=1$ holds. The model can also include the term $\boldsymbol {z}^{T}_{ij}\boldsymbol {\alpha }$ , yielding the average $GM(j,\boldsymbol {x}_i, \boldsymbol {z}_{ij})$ .
3.3. Intercepts as a Measure of Valence
Next, we demonstrate the benefits of effect coding for interpreting intercepts as average valences as defined in the widely recognized Spatial Valence Model of Politics (e.g., Schofield and Sened 2005b, 2006) and discuss the difficulties that remain when equating intercepts with valence.
3.3.1. Application: Quantities in Schofield’s Spatial Valence Approach
The central quantity is the valence ranking where the intercepts are ranked $\beta _{[J]0} \geq \beta _{[J-1]0} \geq \dots \geq \beta _{[2]0} \geq \beta _{[1]0}$ .Footnote 8 The lowest valence party $\beta _{[1]0}$ plays a crucial role in quantifying average valences. First, the average valence of parties without the lowest ranked is calculated, $\lambda _{av\,(1)} = [1/(J-1)] \sum _{j=2}^{J}\beta _{[j]0}$ . Then, the valence difference between this average and the lowest valence party, $\Lambda = \lambda _{av\,(1)} - \beta _{[1]0}$ . We stick to the German election data and now consider all covariates: six standard voter demographics $\boldsymbol {x}_{i}$ (dichotomous variables union membership, working class, Catholic denomination, gender, region, and the quantitative variable age centered around the sample mean) and four spatial proximities $\boldsymbol {z}_{ij}$ (voter–party proximities on the issues of immigration, nuclear energy, European integration, and the Left-Right dimension).Footnote 9
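Schofield’s quantities are simple functions of the intercepts. The Python sketch below, with the hypothetical helper `schofield_quantities`, computes the ranking, λ_av(1), and Λ. For illustration only, it reuses the 2013 intercepts quoted earlier for Zur’s spatial model (FDP as reference), not the 1998 estimates in Table 4, and then re-expresses the same intercepts relative to the CDU.

```python
def schofield_quantities(intercepts):
    """Valence ranking, lambda_av(1), and Lambda as defined in the text.

    intercepts: dict party -> intercept beta_j0 (the reference party has 0)
    Returns (ranking from highest to lowest, lambda_av1, Lambda).
    """
    ranking = sorted(intercepts, key=intercepts.get, reverse=True)
    lowest = ranking[-1]
    others = [intercepts[p] for p in ranking[:-1]]
    lambda_av1 = sum(others) / len(others)   # average valence without the lowest party
    return ranking, lambda_av1, lambda_av1 - intercepts[lowest]

# Intercepts quoted for the 2013 data (spatial model, FDP as reference).
fdp_ref = {"CDU": 2.79, "SPD": 2.22, "FDP": 0.0, "Greens": 1.10, "Left": 1.78, "AfD": 1.53}
# The same intercepts expressed with the CDU as the reference alternative.
cdu_ref = {p: b - fdp_ref["CDU"] for p, b in fdp_ref.items()}

for label, ints in [("FDP reference", fdp_ref), ("CDU reference", cdu_ref)]:
    ranking, lam, Lam = schofield_quantities(ints)
    print(label, ranking, round(lam, 3), round(Lam, 3))
```

Changing the reference alternative subtracts the same constant from every intercept, so λ_av(1) shifts while the ranking and Λ, being differences of intercepts, stay the same. Changing the coding of a chooser attribute under (0–1) coding, by contrast, alters the intercepts in a way that can reorder the parties.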
Table 4 compares Schofield’s valence quantities for three vote choice models. Model 1 includes all covariates. In Model 2, we reversed the coding for one variable only, gender. Model 3 omits the variable region to demonstrate the dependence of the intercepts on the included covariates. The left part uses (0–1) coding, and the right part effect coding, based on the average $GM(j,\boldsymbol {x}_i, \boldsymbol {z}_{ij})$ . The party CDU is the reference alternative. Both coding schemes use identical probabilities when modeling choice behavior but yield different valence quantities because the information in the intercepts depends on the coding.
Note: Vote choice models containing spatial proximities and voter demographics. Section B of the Supplementary Material reports estimation tables. Party numbers: 1 (CDU, reference alternative, vote share: .30), 2 (SPD, vote share: .46), 3 (FDP, vote share: .04), 4 (Greens, vote share: .11), 5 (Left, vote share: .08). $\lambda _{av\,(1)}$ is the average valence other than lowest ranked party; $\Lambda $ is the valence difference for lowest valence party (see text). Source: 1998 German election study. $N=715$ .
Under (0–1) coding, the intercepts give the relative preferences of the voter segment $\boldsymbol {x}_{i}^T =\boldsymbol {0}^T=(0,\dots ,0)$ . Here, arbitrary coding decisions determine the composition of a particular electorate for which valence effects are calculated. Thus, the valence quantities in Model 1 refer to the subpopulation where all voter demographics take the value of 0, that is, average-aged male voters that do not belong to the working class, are no union members, do not have a catholic denomination, and are based in former East Germany. When the coding of only one variable changes, a different electorate is considered, which changes the information in the intercepts and, consequently, all intercept values and valence quantities. Whereas under Model 1, the Greens ( $\beta _{40}$ ) result as the lowest ranked party, it is the FDP ( $\beta _{30}$ ) under the reversed coding of gender in Model 2, yielding different valence quantities; $\lambda _{av\,(1)}$ reduces from $-0.54$ to $-0.87$ , and $\Lambda $ increases from $1.90$ to $2.39$ .
As a result, depending on arbitrary coding decisions, many different subpopulations can be constructed so that the researcher can calculate and present many different valence effects under (0–1) coding. And, the range of possible compositions of particular electorates gets larger the more voter demographics the model contains, which increases the model fit, and the more values these covariates can take.
Under effect coding, the intercepts represent relative average preferences (see Equation (16)). Consequently, the specific coding does not affect the valence quantities, yielding stable results. The SPD ( $\beta _{20}$ ) is the highest valence party and the lowest ranked party is the FDP ( $\beta _{30}$ ), $\lambda _{av\,(1)}=-0.89$ and $\Lambda =2.21$ under different coding versions.
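The contrast between the two coding schemes can be reproduced in a toy setting. The Python sketch below starts from hypothetical choice probabilities for two subpopulations and three alternatives (index 0 is the reference alternative); with a single categorical chooser attribute, the model reproduces such probabilities exactly. Under (0–1) coding, the implied intercepts are the log odds in the chosen reference population; under effect coding, they are the log odds averaged over subpopulations. All names and numbers are illustrative assumptions.

```python
import math

# Hypothetical choice probabilities P[s][j] for S = 2 subpopulations and J = 3
# alternatives; alternative index 0 is the reference alternative.
P = [[0.5, 0.2, 0.3],
     [0.4, 0.4, 0.2]]

def zero_one_intercepts(P, ref_pop):
    """(0-1) coding: log odds vs. alternative 0 in the reference population."""
    return [math.log(P[ref_pop][j] / P[ref_pop][0]) for j in range(len(P[0]))]

def effect_intercepts(P):
    """Effect coding: log odds vs. alternative 0, averaged over subpopulations."""
    S = len(P)
    return [sum(math.log(P[s][j] / P[s][0]) for s in range(S)) / S
            for j in range(len(P[0]))]

def lowest_valence(intercepts):
    """Index of the alternative with the smallest intercept."""
    return min(range(len(intercepts)), key=intercepts.__getitem__)

for ref_pop in range(len(P)):
    b = zero_one_intercepts(P, ref_pop)
    print("(0-1), reference population", ref_pop, [round(v, 3) for v in b],
          "lowest:", lowest_valence(b))
e = effect_intercepts(P)
print("effect coding", [round(v, 3) for v in e], "lowest:", lowest_valence(e))
print("effect coding, categories reversed", [round(v, 3) for v in effect_intercepts(P[::-1])])
```

With these made-up probabilities, the lowest valence alternative under (0–1) coding depends on which subpopulation serves as the reference, whereas the effect-coded intercepts are unaffected by how the categories are ordered.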
Model 3 demonstrates how the ceteris paribus condition affects the valence quantities. As in any regression-based modeling and independent from the coding schemes, parameter interpretation depends on the included covariates. When omitting the variable region, a different electorate for which valences are calculated is defined, yielding different intercept values and, thus, valence effects. Whereas under effect coding, the ranking remains stable and the average valences and differences only slightly change, the quantities show a huge variation under (0–1) coding. The SPD ( $\beta _{20}$ ) instead of the CDU ( $\beta _{10}$ ) results as the highest valence party, the Greens ( $\beta _{40}$ ) as the lowest valence party, the FDP ( $\beta _{30}$ ) is ranked in the middle and $\Lambda $ changes from $2.39$ to $0.86$ .
If one wants to keep interpreting intercepts as valences, effect coding is the better option. It better matches the definition of valence as an “average perception, among the electorate” (Schofield Reference Schofield2005, 348, italics added). Whereas (0–1) coding does not involve average preferences, effect coding does, so the researcher avoids making inferences for one particular subpopulation only. However, two fundamental model properties make relying on intercepts to study valence challenging and questionable.
First, the interpretation is not reference-free. Since the intercept of the reference alternative is set to zero under the standard side constraint to ensure identifiability, intercepts can only be interpreted relative to the chosen reference party (candidate). Second, if covariates that contribute to explaining choices are not included, their information enters the intercepts and adds to the unobserved utility sources. Since the intercepts reflect all unobserved utility sources, treating intercepts as valence implies that only valence qualities remain unobserved and that the analyst has succeeded in measuring all other choice determinants. If intercepts are assumed to represent valence, what this implies should be made explicit: given a particular set of covariates, valence then comprises all the remaining unexplained (because unobserved or unobservable) factors that determine the choice.
4. Valence as an Observable Source of Utility
This section deals with specification and parameterization issues that arise when valence is considered an observable source of voter utility, that is, when researchers model valence by covariates suitable for measuring valence qualities. We first outline a modeling strategy that overcomes the drawbacks of the intercepts: a covariate specification that treats valence as a choice attribute and an effect parameterization that removes the dependence on a reference alternative and provides one valence effect for each party (candidate). We then discuss the difference between specifying valence as a choice or a chooser attribute.
4.1. Valence as a Choice Attribute
We propose to specify valence qualities as choice-specific variables $\boldsymbol {z}_{ij}$ measuring attributes that characterize parties (candidates), thereby incorporating valence as an observable source of voter utility. To obtain a valence effect for each party, we estimate parameters $\boldsymbol {\alpha }^{T}_{j}=(\alpha _{j1}, \ldots , \alpha _{jK})$ that are specific to each alternative j. In contrast to the standard generic specification in Equation (1), which constrains the parameters to be the same for all alternatives ( $\boldsymbol {\alpha }_1 = \cdots = \boldsymbol {\alpha }_J := \boldsymbol {\alpha }$ ), the alternative-wise specification relaxes the assumption that decision makers assign the same weight to a choice attribute regardless of which alternative they evaluate. The alternative-wise specification has been used in the study of spatial voting before for voter–party issue proximities (e.g., Mauerer Reference Mauerer2016; Mauerer, Thurner, and Debus Reference Mauerer, Thurner and Debus2015). Here, we apply it to the parameterization of valence attributes to study how such qualities contribute to the utility each party provides voters.
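The parameters $\boldsymbol {\alpha }_j$ do not depend on a reference alternative: holding all other variables fixed, a one-unit increase in the kth component of $\boldsymbol {z}_{ij}$ changes the log odds of alternative j against every other alternative by $\alpha _{jk}$ . Thus, the effect is the same on all odds, and the relative odds $e^{\alpha _{jk}}$ are the same regardless of which alternative serves as reference.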
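A small numerical check of this reference-free property, assuming a plain multinomial logit with one valence attribute and hypothetical values for three parties: raising $z_{ij}$ by one unit multiplies the odds of j against each other alternative by the same factor $e^{\alpha_j}$. The names `choice_probs` and `odds_ratios` are our own illustrative helpers.

```python
import math

# Hypothetical values for J = 3 parties: intercepts beta_j0 (party 0 is the
# reference), alternative-wise valence parameters alpha_j, and one voter's
# valence ratings z_ij of the three parties.
BETA0 = [0.0, 0.3, -0.4]
ALPHA = [0.5, 0.9, 0.2]
Z = [1.0, 2.0, 0.5]


def choice_probs(beta0, alpha, z):
    """Multinomial logit probabilities for utilities u_j = beta_j0 + z_j * alpha_j."""
    u = [b + zj * a for b, zj, a in zip(beta0, z, alpha)]
    m = max(u)  # subtracting the maximum avoids overflow in exp
    expu = [math.exp(v - m) for v in u]
    total = sum(expu)
    return [e / total for e in expu]


def odds_ratios(beta0, alpha, z, j):
    """For each k != j: (odds of j vs k after z_j rises by one unit) / (odds before)."""
    before = choice_probs(beta0, alpha, z)
    z_up = list(z)
    z_up[j] += 1.0
    after = choice_probs(beta0, alpha, z_up)
    return {
        k: (after[j] / after[k]) / (before[j] / before[k])
        for k in range(len(z))
        if k != j
    }


if __name__ == "__main__":
    for j in range(len(Z)):
        print(j, odds_ratios(BETA0, ALPHA, Z, j), "exp(alpha_j) =", math.exp(ALPHA[j]))
```

Adding the same constant to all intercepts, which is what changing the normalized reference party amounts to, leaves the probabilities and hence these odds ratios unchanged.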
4.1.1. Application: Valence Qualities as Candidate Character Traits
We draw on the 2016 U.S. presidential election study and model the choice between the Democratic nominee Hillary Clinton and the Republican opponent Donald Trump to demonstrate the benefits of the specification strategy in studying the impact of valence qualities. We operationalize valence qualities by survey questions on candidate personality traits, thereby tapping into the dimension of valence as a character quality (Adams et al. Reference Adams, Merrill, Simas and Stone2011; Stone et al. Reference Stone, Maisel and Maestas2004; Stone and Simas Reference Stone and Simas2010). Voters assessed the candidates on six traits (strong leadership, really cares, knowledgeable, honest, speaks mind, and even-tempered). We generated an additive index for each candidate as an overall character assessment.Footnote 10
Table 5 reports three models. Model 1 includes four spatial proximities (with generic parameters, for simplicity) and typical voter demographics. Model 2 adds character traits in generic specification. Model 3 specifies the character traits with alternative-wise parameters. Likelihood ratio tests ( $\chi ^2(1)= 256.77$ ) indicate that character traits are highly significant and considerably improve model fit. Model 3, with alternative-wise parameters for character traits, fits significantly better than Model 2, with one generic parameter ( $\chi ^2(1)= 4.29$ ). While all spatial proximities are highly significant in Model 1, they lose much of their explanatory power once character traits, whose effects dominate in Models 2 and 3, are included: their effects are much weaker, and only two proximities (Liberal-Conservative and Defense) remain significant. The alternative-wise estimates in Model 3 suggest that character traits have a larger impact on the preference for the Democratic than for the Republican candidate, ceteris paribus.
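The p-values behind the two reported $\chi^2(1)$ statistics can be checked with the Python standard library alone, because a $\chi^2(1)$ variable is a squared standard normal. This is our own sketch, not part of the replication material; the helper names are hypothetical, and `lr_statistic` only shows how such a statistic is formed from two maximized log-likelihoods, which are not reported here.

```python
import math


def lr_statistic(loglik_restricted, loglik_full):
    """Likelihood ratio statistic 2 * (logL_full - logL_restricted)."""
    return 2.0 * (loglik_full - loglik_restricted)


def chi2_pvalue_df1(stat):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom.

    A chi-square(1) variable is the square of a standard normal Z, so
    P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # The two chi-square(1) statistics reported in the text.
    for label, stat in [("character traits", 256.77), ("alternative-wise vs generic", 4.29)]:
        print(f"{label}: chi2(1) = {stat}, p = {chi2_pvalue_df1(stat):.2g}")
```

For 4.29 the p-value is roughly 0.04, so the alternative-wise specification improves the fit significantly at the 5% level but not at the 1% level; for 256.77 the p-value is vanishingly small.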
Note: Democratic Candidate is the reference alternative. Categorical voter attributes in effect coding, age centered around the sample mean. Source: 2016 American National Election Study. $N=1,847$ .
The empirical application demonstrates that operationalizing valence qualities by survey questions on candidate character traits and specifying them as choice attributes $\boldsymbol {z}_{ij}$ with alternative-wise parameters $\boldsymbol {\alpha }_j$ is a promising alternative to intercepts for studying how valence aspects affect vote choices. Including valence qualities as additional observable utility sources considerably improves model performance, so that less relevant information enters the intercepts. Note that the ceteris paribus condition also applies here: the parameters reflect the association between the character traits and the dependent variable, given spatial proximities and voter demographics.
4.2. Valence as Chooser Attributes
The Valence Politics Model of Party Choice is the main competing approach to the spatial voting framework. Its empirical applications also use vote choice models containing spatial proximities and voter demographics. Instead of defining the intercepts as valence, however, this research strand usually considers multiple valence qualities, measured by survey questions on party leader images, performance evaluations, or problem-solving capacities, and specifies them as chooser-specific variables (e.g., Clarke et al. Reference Clarke, Sanders, Stewart and Whiteley2009; Sanders et al. Reference Sanders, Clarke, Stewart and Whiteley2011; Whiteley et al. Reference Whiteley, Clarke, Sanders and Stewart2013). We revisit this modeling approach to demonstrate the difference between specifying valence qualities as chooser or choice attributes.
Specifying valence as chooser attributes $\boldsymbol {x}_i$ , which vary across choosers i but not across alternatives j, requires J variables $\boldsymbol {x}_i^{(j)}$ and $J\times (J-1)$ parameters $\boldsymbol {\beta }_j^{(j)}$ for a single ( $p=1$ ) continuous or binary valence quality. Considering multiple valence features quickly leads to complex models and parameter inflation. For example, when analyzing $J=3$ alternatives and three valence features, two continuous and one four-categorical, one has to deal with 15 variables and 30 parameters: $3 \times 2$ variables with $3 \times (3-1) \times 2$ parameters for the two continuous valence features and $3 \times 3$ variables with $3 \times (3-1) \times (4-1)$ parameters for the four-categorical valence feature. Moreover, the resulting parameters depend on a reference alternative, and only a subset of them is of direct interest in studying the impact of valence.
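For parameter interpretation, it is helpful to consider the log odds between any two alternatives $j_1, j_2\in \{1, \ldots , J\}$ , which for a single valence quality per party are $\log \frac{P(Y_i=j_1\mid \boldsymbol {x}_i)}{P(Y_i=j_2\mid \boldsymbol {x}_i)} = \beta _{j_1 0}-\beta _{j_2 0}+\sum _{l=1}^{J} x_i^{(l)}\big(\beta _{j_1}^{(l)}-\beta _{j_2}^{(l)}\big)$ .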
Relative to reference alternative 1, with $\beta _{10}=0$ and $\boldsymbol {\beta }_1^{(j)}=(0,\ldots ,0)$ , the parameters $e^{\boldsymbol {\beta }_j^{(j)}}$ give the factor by which the odds of alternative j against alternative 1 change when $\boldsymbol {x}_i^{(j)}$ increases by one unit, holding all other variables fixed.
Thus, the valence effects depend on the reference alternative, which makes their interpretation demanding.
Specifying valence as choice attributes $\boldsymbol {z}_{ij}$ , which take different values for each alternative j, is much more parsimonious, and one can estimate J parameters $\boldsymbol {\alpha }_j$ that are reference-free and of direct interest for studying valence effects. For a single ( $p=1$ ) continuous or binary valence quality, only one variable $z_{ij}$ is necessary, and J parameters $\alpha _j$ can be estimated. When analyzing $J=3$ alternatives and three valence features (two continuous and one four-categorical), one only has to deal with 5 variables and 15 parameters: $2$ variables with $3 \times 2$ parameters for the two continuous valence features and $3 $ variables with $3 \times (4-1)$ parameters for the four-categorical valence feature.
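The counting rules for the two specifications fit into two short functions. This is an illustrative sketch with function names of our own choosing; each valence feature is described by its number of design columns, 1 for a continuous or binary feature and $c-1$ dummies for a $c$-categorical one.

```python
def chooser_spec_size(J, feature_columns):
    """Valence as chooser attributes x_i^(j): every feature is measured once per
    party (J variable blocks), and each column gets J - 1 identified parameters."""
    n_variables = sum(J * c for c in feature_columns)
    n_parameters = sum(J * (J - 1) * c for c in feature_columns)
    return n_variables, n_parameters


def choice_spec_size(J, feature_columns):
    """Valence as a choice attribute z_ij with alternative-wise parameters: one
    variable block per feature, and J parameters alpha_j per column."""
    n_variables = sum(feature_columns)
    n_parameters = sum(J * c for c in feature_columns)
    return n_variables, n_parameters


if __name__ == "__main__":
    # Example from the text: J = 3, two continuous features (1 column each)
    # and one four-categorical feature (4 - 1 = 3 dummy columns).
    features = [1, 1, 3]
    print(chooser_spec_size(3, features))  # (15, 30)
    print(choice_spec_size(3, features))   # (5, 15)
```

The gap widens with J, because the chooser specification grows with $J(J-1)$ per column while the choice specification grows only with J.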
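The interpretation of valence effects is alternative-specific and independent of the reference alternative. With utilities $u_{ij}=\beta _{j0}+z_{ij}\alpha _j$ , the log odds between any two alternatives j and k are $\log \frac{P(Y_i=j)}{P(Y_i=k)} = \beta _{j0}-\beta _{k0}+z_{ij}\alpha _j-z_{ik}\alpha _k$ , so a one-unit increase in $z_{ij}$ shifts the log odds of j against every other alternative by the same amount $\alpha _j$ .
The parameters $e^{\alpha _j}$ therefore give the relative odds when $z_{ij}$ increases by one unit. Next, we provide an empirical example to demonstrate the difference between both specifications.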
4.2.1. Application: Valence Qualities as Party Leader Images
We draw on a simplified version of the model in Sanders et al. (Reference Sanders, Clarke, Stewart and Whiteley2011) and consider voting for the three major British parties Labour (Lab, $j=1$ ), Conservatives (Cons, $j=2$ ), and the Liberal Democrats (LD, $j=3$ ) in the 2010 British election. Valence is operationalized by several features, such as party leader images, assessments of party competence in different areas, or judgments about which party can best handle the most important issue facing Britain today. We focus on party leader imagesFootnote 11 and control for spatial proximities and voter demographics.Footnote 12 Table 6 reports only the parameters for party leader images (Section D of the Supplementary Material contains full estimation tables). The upper part gives the estimates for party leader images as chooser attributes based on different reference alternatives, and the lower part those for party leader images as a choice attribute with alternative-wise effects.
Note: Vote choice models containing spatial proximities and voter demographics. Section D of the Supplementary Material reports full estimation tables. Source: 2010 British Election Study. $N=1,262$ .
When party leader images are specified as chooser attributes, three variables, one for each party ( $x_i^{(1)}, x_i^{(2)}, x_i^{(3)}$ ), are necessary. Each variable $x_i^{(j)}$ is associated with two identified parameters $\beta ^{(j)}_{j}$ , yielding the utility functions $u_{ij} = \beta _{j0} + x_i^{(1)}\beta ^{(1)}_{j} + x_i^{(2)}\beta ^{(2)}_{j} + x_i^{(3)}\beta ^{(3)}_{j}$ . Thus, one obtains six parameters to present valence effects, which are interpreted relative to a reference alternative, for example, Labour, by setting $\beta _{10}=\beta ^{(1)}_{1}=\beta ^{(2)}_{1}=\beta ^{(3)}_{1}=0$ . Take the Labour leader image $x_i^{(1)}$ . When Labour is the reference, both identified parameters are of direct interest for evaluating the party’s valence effect. The parameter $\beta _2^{(1)}=-0.72$ gives the difference to the Conservatives and $\beta _3^{(1)}=-0.49$ the difference to the Liberal Democrats, suggesting that an increase in Labour’s valence harms the Conservative vote more than the Liberal Democrat vote. When one selects the Conservatives or the Liberal Democrats as the reference, only one parameter in each case is of direct interest ( $\beta _1^{(1)}=0.72$ with the Conservatives as reference and $\beta _1^{(1)}=0.49$ with the Liberal Democrats as reference), because the remaining parameter gives the difference between the Liberal Democrats and the Conservatives, which is only of indirect interest when evaluating Labour’s valence.
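Because only differences between the parties' coefficients are identified, the estimates for any other reference follow by subtracting the new reference party's coefficient. The sketch below applies this to the Labour leader image coefficients quoted above; up to rounding of the reported values it reproduces 0.72 and 0.49 and also recovers the Liberal Democrat versus Conservative difference (about 0.23) that is of only indirect interest. It assumes the models with different references are reparameterizations of the same fitted model; the function name is hypothetical.

```python
# Labour leader image x_i^(1): coefficients with Labour as reference,
# as quoted in the text (Table 6).
LAB_IMAGE = {"Lab": 0.0, "Cons": -0.72, "LD": -0.49}


def rereference(coefs, new_ref):
    """Re-express chooser-attribute coefficients relative to `new_ref`.

    Only differences between parties are identified, so subtracting the new
    reference's coefficient leaves every pairwise log-odds effect unchanged.
    """
    shift = coefs[new_ref]
    return {party: b - shift for party, b in coefs.items()}


if __name__ == "__main__":
    for ref in ("Cons", "LD"):
        shifted = rereference(LAB_IMAGE, ref)
        print(ref, {p: round(b, 2) for p, b in shifted.items()})
    # Cons {'Lab': 0.72, 'Cons': 0.0, 'LD': 0.23}
    # LD {'Lab': 0.49, 'Cons': -0.23, 'LD': 0.0}
```

Every reference yields the same pairwise differences, which is why several reference choices are needed to read off each party's own effect, but none of them yields a reference-free number.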
When we are interested in how the Conservative leader image $x_i^{(2)}$ affects voting, the model with the Conservatives as reference contains the relevant information: $\beta _1^{(2)}=-0.90$ and $\beta _3^{(2)}=-1.13$ , suggesting that an increase in the Conservatives' valence has a larger negative impact on the Liberal Democrats than on Labour. The same applies to the Liberal Democrat leader image $x_i^{(3)}$ , which requires the Liberal Democrats as reference. Thus, the researcher must estimate the model with different reference alternatives to detect all relevant valence effects, and even then these effects are not reference-free and allow only a relative interpretation.
Under the proposed approach, which specifies valence qualities as a choice attribute $z_{ij}$ with alternative-wise parameters $\alpha _j$ , the resulting utility functions are $u_{ij}= \beta _{j0} + z_{ij}\alpha _j$ . One obtains one parameter for each party ( $\alpha _1,\alpha _2,\alpha _3$ ) that contains the relevant information. The alternative-wise parameters indicate that party leader images have the largest impact on the preference for the Conservatives and the smallest on the preference for Labour, ceteris paribus, a pattern that is hard to see with the chooser-attribute approach.
5. Concluding Remarks
This contribution provides the statistical fundamentals to advance the empirical modeling of valence, a crucial concept in the study of public choice. We outline the effect coding scheme for chooser attributes, which facilitates the interpretation of intercepts as valence because it frees researchers from making inferences for a specific reference population only and therefore matches the definition of average valences introduced by Schofield’s widely applied Spatial Valence Model. However, relying on intercepts still comes with severe drawbacks that are independent of the coding scheme. The most critical is probably that this approach treats valence as an immeasurable concept: defining the intercepts as valences implies that all unobserved choice-determining factors are valence aspects. Consequently, researchers who want to keep interpreting intercepts as valences should capture as many non-valence-related factors as possible by covariates to keep unobserved utility sources low, and they should report model fit measures to evaluate this. Under these conditions, effect coding presents a solution when the data do not contain suitable variables on valence qualities.
We also propose a covariate specification and effect parameterization strategy that incorporates valence as an additional observable source of voter utility and thereby overcomes the drawbacks of treating intercepts as valences, and we discuss different specification strategies. Our proposed modeling approach requires variables that can measure the theoretical concept of valence. Our empirical applications, which operationalize valence by candidate character traits and party leader images (measured by like–dislike scores), are promising and yield insightful results. We hope this contribution inspires researchers to capture valence qualities through observable variables to keep the unobserved variable effect low, which is one major goal of empirical modeling.
Future research should examine which variables are best suited to operationalize valence qualities and consider them carefully already at the data collection stage. For example, the literature on affective polarization is divided on whether party leader like–dislike scores capture a general affect toward party leaders that may be unrelated to their qualities (e.g., Reiljan et al. Reference Reiljan, Garzia, Ferreira da Silva and Trechsel2023).
Supplementary Material
For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2023.43.
Data Availability Statement
Replication data and code for this paper are available in Mauerer and Tutz (Reference Mauerer and Tutz2023b) at https://doi.org/10.7910/DVN/SKWTGS.
Acknowledgments
We thank the four reviewers and the editors for their highly valuable comments.
Funding Statement
This work was supported by the program EMERGIA, Junta de Andalucia (EMC21-00256 to I.M.). Funding for open access charge: Universidad de Málaga/CBUA.