1 Introduction
How do customers choose from the abundance of products in modern retail outlets? How many attributes do they consider, and how do they process them to form a preference? These questions are of theoretical as well as practical interest. Gaining insights into the processes people follow while making purchase decisions will lead to better informed decision theories. At the same time, marketers are interested in more realistic decision models for predicting market shares and for optimizing marketing actions, for example, by adapting products and advertising materials to consumers’ choice processes.
In consumer research, decision models based on the idea of utility maximization predominate to date, as expressed in the prevalent use of weighted additive models derived from conjoint analysis to capture preferences (Elrod, Johnson & White, 2004). At the same time, judgment and decision making researchers propose alternative decision heuristics that are supposed to provide psychologically more valid accounts of human decision making, and gather evidence for their use (e.g., Bröder & Schiffer, 2003a; Gigerenzer, Todd & the ABC Research Group, 1999; Newell & Shanks, 2003). Recently, the field of judgment and decision making has been equipped with a new tool, a greedoid algorithm to deduce lexicographic decision processes from preference data, developed independently by Yee, Dahan, Hauser and Orlin (2007) and Kohli and Jedidi (2007).
We aim to bring together these two lines of research by comparing the predictive performance of lexicographic decision processes deduced by the new greedoid algorithm with that of weighted additive models estimated by full-profile regression-based conjoint analysis, a standard tool in consumer research. We derive hypotheses from the theoretical framework of adaptive decision making about when each approach should be the better-suited tool, and test them in an empirical study.
1.1 The standard approach to model preferences in consumer research
Conjoint analysis is based on seminal work by Luce and Tukey (1964). Green developed the method further and adapted it to marketing and product-development problems (e.g., Green & Rao, 1971; Green & Wind, 1975). Today, conjoint analysis is regarded as the most prevalent tool to measure consumer preferences (Wittink & Cattin, 1989; Wittink, Vriens & Burhenne, 1994). In a survey among market research institutes, 65% of the institutes indicated having used conjoint analysis within the last 12 months, and growing usage frequency was forecasted (Hartmann & Sattler, 2002). Conjoint analysis is used to analyze how different features of products contribute to consumers’ preferences for these products. This is accomplished by decomposing the preference for the whole product into partitions assigned to the product’s constituent features. The established way to collect preference data is the full-profile method: product profiles consisting of all relevant product features are presented to respondents. These profiles are evaluated either by rating or ranking or by discrete choice (i.e., buy or non-buy) decisions.
The assumption behind the decompositional nature of conjoint analysis is that people weigh and add all available pieces of product information, thus deriving a global utility value for each option as the sum of partworth utilities. Options with higher utility are preferred — either deterministically or probabilistically — over options with lower utility. Clearly, this assumption rests on traditional conceptions of what constitutes rational decision making. Homo economicus is assumed to carefully consider all pieces of information and to integrate them into some common currency, such as expected utility, following a complex weighting scheme.
For rating- and ranking-based conjoint methods, the basic weighted additive model (WADD) can be stated as follows:

$$r_k = \sum_{j=1}^{J} \sum_{m=1}^{M_j} \beta_{jm} x_{kjm} + \varepsilon_k \qquad (1)$$

with

r_k = response for option k;
β_jm = partworth utility of level m of attribute j;
x_kjm = 1 if option k has level m on attribute j, else x_kjm = 0; and
ε_k = error term for response for option k.
The partworth utilities are estimated, usually by applying multiple regression, such that the sum of squared deviations between empirically observed responses r_k (ratings or rankings) and estimated responses r̂_k is minimal.
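As a minimal sketch (our own illustration, not the estimation code used in any of the cited studies), the partworth estimation can be carried out with dummy-coded aspect indicators and ordinary least squares; the design matrix and ratings below are hypothetical:

```python
import numpy as np

# Hypothetical design: rows = product profiles, columns = aspect
# indicators x_kjm (one reference level per attribute is dropped
# to avoid collinearity with the intercept).
X = np.array([
    [1, 0, 1],   # attribute 1 level A, attribute 2 level A
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
], dtype=float)
r = np.array([9.0, 7.0, 6.0, 4.0, 3.0, 1.0])  # observed ratings r_k

# Add an intercept column and solve min ||r - X1 @ beta||^2.
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, r, rcond=None)

r_hat = X1 @ beta  # estimated responses
print(beta, r_hat)
```

The estimated partworths can then be summed per profile to obtain total utilities, as in the WADD model above.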
1.2 Simple decision heuristics
The traditional view of rational decision making as utility maximization has been challenged in the judgment and decision making literature. Many authors propose alternative accounts of human decision making processes and argue that people are equipped with a repertoire of decision strategies from which to select depending on the decision context (e.g., Beach & Mitchell, 1978; Einhorn, 1971; Gigerenzer, Todd, & the ABC Research Group, 1999; Payne, 1976, 1982; Payne, Bettman, & Johnson, 1988, 1993; Rieskamp & Otto, 2006; Svenson, 1979). According to Payne et al. (1988, 1993), decision makers choose strategies adaptively in response to different task demands, and often apply simplified shortcuts (heuristics) that allow fast decisions with acceptable losses in accuracy. Moreover, simple heuristics are often more accurate, or at least equally accurate, in predicting new data compared to more complex strategies (e.g., Czerlinski, Gigerenzer & Goldstein, 1999; Gigerenzer, Czerlinski & Martignon, 1999). The explanation is that simple heuristics are more robust, extracting only the most important and reliable information from the data, while complex strategies that weigh all pieces of evidence also extract much noise, resulting in large accuracy losses when making predictions for new data, a phenomenon called overfitting (Pitt & Myung, 2002).
Lexicographic strategies are a prominent category of simple heuristics. A well-known example is Take The Best (TTB; Gigerenzer & Goldstein, 1996), for inferring which of two alternatives has a higher criterion value by searching sequentially through cues in the order of their validity until one discriminating cue is found. The alternative with the positive cue value is selected. TTB is “noncompensatory” because a cue cannot be outweighed by any combination of less valid cues, in contrast to “compensatory” strategies, which integrate cue values (e.g., the WADD model). Applied to a consumer choice context, a lexicographic heuristic would prefer a product that is superior to another product on the most important aspect for which the two options have different values, regardless of the aspects that follow in the aspect hierarchy.
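The sequential stopping rule of TTB can be sketched in a few lines of code (the cue names and their validity order below are invented for illustration):

```python
def take_the_best(option_a, option_b, cue_order):
    """Return the option favored by the first discriminating cue,
    or None if no cue discriminates."""
    for cue in cue_order:  # cues in descending order of validity
        if option_a[cue] != option_b[cue]:
            # Stop search: the option with the positive cue value wins.
            return option_a if option_a[cue] > option_b[cue] else option_b
    return None

city_a = {"exposition_site": 1, "soccer_team": 0}
city_b = {"exposition_site": 1, "soccer_team": 1}
# The more valid 'exposition_site' cue ties, so the search continues
# and the less valid 'soccer_team' cue decides in favor of city_b.
winner = take_the_best(city_a, city_b, ["exposition_site", "soccer_team"])
print(winner)
```

Note the noncompensatory character: once a cue discriminates, no combination of later cues can reverse the decision.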
1.3 Inferring lexicographic decision processes
Choices can be forecast by linear compensatory models even though the underlying decision process has a different structure (e.g., Einhorn, Kleinmuntz & Kleinmuntz, 1979). A WADD model, for example, can theoretically reproduce a non-compensatory decision process if, in the ordered set of weights, each weight is larger than the sum of all weights to come (e.g., aspect weights of 2^{1-n} with n = 1, …, N and N = number of aspects; Martignon & Hoffrage, 1999, 2002). Despite its flexibility in assigning weights, however, Yee et al. (2007) showed in Monte Carlo simulations that WADD models fall short of capturing the non-compensatory preference structure and are outperformed by lexicographic models when the choice is made in a perfectly non-compensatory fashion. Moreover, the goal is not only to achieve high predictive performance but also to gain insight into the process steps of decision making. Although conclusions from consumer self-reports and process-tracing studies are limited (see below), several such studies suggest that only a minority of participants use a weighted additive rule, thus questioning the universal application of conjoint analysis (Denstadli & Lines, 2007; Ford, Schmitt, Schechtman, Hults & Doherty, 1989).
Most users of conjoint models are well aware of their status as “as if” models (Dawkins, 1976), and do not claim to describe the underlying cognitive process but only aim to predict the outcome. Consequently, many researchers call for psychologically more informed models (e.g., Bradlow, 2005; Louviere, Eagle & Cohen, 2005). However, these rightful claims suffer from the lack of data analysis tools that estimate heuristics based on preference data. Self-reports (e.g., Denstadli & Lines, 2007) are an obvious tool for tracking decision strategies but have questionable validity (Nisbett & Wilson, 1977). More widely accepted ways to deduce heuristics from people’s responses are process-tracing techniques, such as eye tracking and mouse tracking (e.g., Payne et al., 1988, 1993; see Ford et al., 1989, for a review), or response time analyses (e.g., Bröder & Gaissmaier, 2007). However, these techniques are very expensive for examining large samples, as often required in consumer research. Moreover, data collection methods such as information boards tend to interfere with the heuristics applied and might induce a certain kind of processing (Billings & Marcus, 1983). Finally, it is unclear how process measures can be integrated into mathematical prediction models, as the same processing steps can be indicative of several strategies (Svenson, 1979).
In inference problems where the task is to pick the correct option according to some objective external criterion, such as inferring which of two German cities is larger, heuristics can be deduced by using datasets with known structure — in this case, a data set of German cities including their description in terms of features such as existence of an exposition site or a soccer team in the major league (Gigerenzer & Goldstein, 1996). Based on the data set, one can compute the predictive validity of the different features, or cues, and thus derive cue weights. This way, competing inference strategies that process these cues in compensatory or noncompensatory fashions, including their predictions, can be specified a priori. These predictions can then be compared to the observed inferences that participants have made, and the strategy that predicts most of these responses can be determined (see, e.g., Bröder, 2000; Bröder & Schiffer, 2003b; Rieskamp & Hoffrage, 1999, 2008).
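The cue-validity computation mentioned above can be sketched as follows. Validity is the share of discriminating pairs in which the cue points to the option with the higher criterion value; the city data below are invented for illustration:

```python
from itertools import combinations

# Hypothetical data: (has_exposition_site, has_major_league_team, population)
cities = [
    (1, 1, 1_300_000),
    (1, 0, 600_000),
    (0, 1, 500_000),
    (0, 0, 200_000),
]

def validity(cue_index):
    """validity = correct inferences / discriminating pairs."""
    correct = discriminating = 0
    for a, b in combinations(cities, 2):
        if a[cue_index] == b[cue_index]:
            continue  # cue does not discriminate this pair
        discriminating += 1
        larger = a if a[2] > b[2] else b
        if larger[cue_index] == 1:
            correct += 1
    return correct / discriminating

print(validity(0), validity(1))
```

Ordering cues by these validities yields the search order that a lexicographic strategy such as TTB would use.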
In preferential choice, by contrast, the individual attribute weighting, or ordering structure, does not follow some objective outside criterion but depends on subjective preference and has to be deduced in addition to the decision strategy people use. Standard conjoint analyses estimate the individual weighting structure assuming a weighted additive model, as laid out above. Approaches assuming alternative models are rare. Gilbride and Allenby (2004) model choices as a two-stage process and allow for compensatory and noncompensatory screening rules in the first stage. Elrod et al. (2004) suggest a hybrid model that integrates compensatory decision strategies with noncompensatory conjunctive and disjunctive heuristics. However, these approaches are basically modifications of the standard WADD model, allowing for noncompensatory weighting and conjunctions and disjunctions of aspects. Such a model, however, is hardly a valid representation of human decision making; its flexibility not only makes psychologically implausible computational demands, but also technically requires huge processing capacity. In contrast, the greedoid algorithm we focus on is intriguingly simple. It incorporates the principles of lexicography and noncompensatoriness rather than just adapting weighting schemes to imitate the output of lexicographic heuristics.
Yee et al. (2007) developed the greedoid algorithm for deducing lexicographic processes from observed preference data, applicable to rating, ranking, and choice alike. The algorithm rests on the assumption that the aspects of different options are processed lexicographically. It discloses the aspect sorting order that best replicates the observed (partial) preference hierarchy of options. Generally, the algorithm can be used to estimate various lexicographic heuristics. By introducing specific restrictions, aspects can be treated as acceptance or elimination criteria to model the well-known elimination-by-aspects heuristic (Tversky, 1972). For our purposes, we will implement the most flexible lexicographic-by-aspects (LBA) process, which allows aspects from different attributes to be freely ranked as acceptance or elimination criteria.
The lexicographic-by-aspects process can be illustrated by a simple example. Given a choice between holiday options differing in travel location, with the three aspects Spain, Italy, and France, and means of transport, with the two aspects plane and car, a person may express the following preference order:
(1) Spain by plane;
(2) Spain by car;
(3) France by plane;
(4) Italy by plane;
(5) France by car;
(6) Italy by car.
The person’s preferences in terms of location and transport are quite obvious. She prefers Spain regardless of how to get there. Means of transport becomes decisive when considering other countries, with a preference for flying. A restrictive lexicographic-by-attributes process could not predict this preference order without mistake. Using country as the first sorting criterion, with the aspect order [Spain, France, Italy], would produce one mistake (i.e., the wrong order for options 4 and 5), and sorting by means of transport, with the aspect order [plane, car], would produce two mistakes (i.e., the wrong order for options 2 and 3 as well as 2 and 4). A lexicographic-by-aspects process, in contrast, by allowing aspects from different attributes to be ordered after each other, can predict the observed preference ranking perfectly. This way, the sorting order [Spain, plane, France] becomes possible, reproducing the observed order without mistakes. This is the result that would be produced by the lexicographic-by-aspects implementation of the greedoid algorithm.
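The holiday example can be made concrete in code. Sorting the options by whether they possess each acceptance aspect, in the order [Spain, plane, France], reproduces the stated preference ranking exactly (a sketch of the sorting idea only, not the greedoid algorithm itself):

```python
# Options listed in the person's observed preference order (1)-(6).
options = [
    ("Spain", "plane"), ("Spain", "car"), ("France", "plane"),
    ("Italy", "plane"), ("France", "car"), ("Italy", "car"),
]
aspect_order = ["Spain", "plane", "France"]  # most important aspect first

def lba_key(option):
    # 0 if the option has the acceptance aspect (sorted earlier), 1 otherwise;
    # the tuple implements lexicographic comparison across aspects.
    return tuple(0 if aspect in option else 1 for aspect in aspect_order)

ranked = sorted(options, key=lba_key)
print(ranked)
```

Because the sorted result equals the observed order, this aspect order produces zero violated pairs for this respondent.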
The algorithm is an instance of dynamic programming with a forward recursive structure. As goodness-of-fit criterion, a violated-pairs metric is used, counting the number of pairs of options that are ranked inconsistently in the observed and the predicted preference order. Basically, the algorithm creates optimal aspect orders for sorting alternatives (for example, various product profiles) by proceeding step-by-step from small sets of aspects to larger ones until the alternatives are completely sorted. First, the algorithm determines the inconsistencies that would be produced if the alternatives were ordered by one single aspect. This is repeated for each aspect. Then, starting from a set size of n = 2 aspects, the algorithm determines the best last aspect within each set, before moving forward to the next larger set size, and so forth until the set size comprises enough aspects to rank all options. Maximally, all 2^N possible sets of N aspects are created and searched through (only if the set of profiles cannot be fully sorted by fewer than N aspects). This sequential procedure exploits the fact that the number of inconsistencies induced by adding an aspect to an existing aspect order depends only on the given aspects within the set and is independent of the order of those aspects. Compared to exhaustive enumeration of all N! possible aspect orders, dimensionality is reduced to the number of possible unordered subsets, 2^N, decreasing running time by a factor on the order of 10^9 (Yee et al., 2007).
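To make the recursion concrete, here is a compact sketch of the subset dynamic program on the holiday example, restricted to acceptance aspects (our own simplified illustration under those assumptions, not the authors' implementation, which also handles elimination aspects and ties):

```python
from itertools import combinations

# Options as aspect sets, in observed preference order (most preferred first).
observed = [
    {"Spain", "plane"}, {"Spain", "car"}, {"France", "plane"},
    {"Italy", "plane"}, {"France", "car"}, {"Italy", "car"},
]
aspects = ["Spain", "Italy", "France", "plane", "car"]

def undiscriminated(i, j, subset):
    """True if options i and j are tied on every aspect in `subset`."""
    return all((a in observed[i]) == (a in observed[j]) for a in subset)

def added_violations(a, subset):
    """Violated pairs induced by appending aspect `a` after `subset`:
    pairs still tied on `subset` where only the less-preferred option has `a`."""
    n = len(observed)
    return sum(
        1 for i, j in combinations(range(n), 2)
        if undiscriminated(i, j, subset)
        and (a in observed[j]) and (a not in observed[i])
    )

# Forward recursion over unordered subsets: the cost of appending an aspect
# depends only on which aspects precede it, not on their order.
best = {frozenset(): (0, [])}  # subset -> (min violations, one optimal order)
for size in range(1, len(aspects) + 1):
    for combo in combinations(aspects, size):
        s = frozenset(combo)
        best[s] = min(
            (best[s - {a}][0] + added_violations(a, s - {a}),
             best[s - {a}][1] + [a])
            for a in s
        )

cost, order = best[frozenset(aspects)]
print(cost, order)
```

For this toy data set the minimum is zero violated pairs, consistent with the [Spain, plane, France] order derived in the example above; with N = 5 aspects, only 2^5 = 32 subsets are visited instead of 5! = 120 orders.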
In an empirical test of the new algorithm, participants had to indicate their ordinal preferences for 32 smartphones (Yee et al., 2007). The products were described by 7 attributes; 3 attributes had 4 levels and 4 attributes had 2 levels, resulting in 20 aspects. The greedoid approach was compared to two methods based on the weigh-and-add assumption: hierarchical Bayes ranked logit (e.g., Rossi & Allenby, 2003) and linear programming (Srinivasan, 1998). Overall, the greedoid approach proved superior to both benchmark models in predicting hold-out data, and thus seems to represent a viable alternative to standard estimation models. We aim to find out more about the conditions under which this methodology can be fruitfully applied, as well as about its limitations.
1.4 External and internal factors affecting strategy selection
According to the adaptive-strategy-selection view (Payne et al., 1993), people choose different strategies depending on characteristics of the decision task. For instance, simple lexicographic heuristics predict decisions well when retrieval of information on the available options is associated with costs, such as having to pay for information acquisition (Bröder, 2000; Newell & Shanks, 2003; Newell, Weston & Shanks, 2003; Rieskamp & Otto, 2006), deciding under time pressure (Rieskamp & Hoffrage, 1999, 2008), or having to retrieve information from memory (Bröder & Schiffer, 2003a). The exact same studies, however, show that, when participants are given more time and can explore information for free, weighted additive models usually outperform lexicographic strategies in predicting people’s decision making. Information integration seems to be a default applied by most people when faced with new decision tasks unless circumstances become unfavorable for extensive information search (Dieckmann & Rieskamp, 2007).
Additionally, the mode in which options are presented as well as the required response mode affect strategy selection (for a review, see Payne, 1982). Simultaneous display of options facilitates attribute-wise comparisons between alternatives. In contrast, sequential presentation promotes alternative-wise, and thus more holistic, additive processing, as attribute-wise comparisons between options become difficult and would require the retrieval of previously seen options from memory or the application of internal comparison standards per attribute (Dhar, 1996; Lindsay & Wells, 1985; Nowlis & Simonson, 1997; Schmalhofer & Gertzen, 1986; Tversky, 1969). Regarding response mode effects, Westenberg and Koele (1992) propose that the more differentiated the required response, the more differentiated and compensatory the evaluation of the alternatives. Following this proposition, ranking — which additionally is associated with simultaneous presentation of options — requires ordinal comparisons between options and is thus supposed to foster lexicographic processing, while rating requires evaluating one option at a time on a metric scale, which should promote compensatory processing. Indeed, there is empirical evidence that people use strategies that directly compare alternatives, such as elimination-by-aspects, more often in choice than in rating (e.g., Billings & Scherer, 1988; Schkade & Johnson, 1989). Note that ranking tasks are often posed as repeated choice tasks, requiring participants to sequentially choose the most preferred option from a set that gets smaller until only the least preferred option remains.
For such ranking tasks, we therefore expect differences from rating tasks similar to those found for choice tasks, and anticipate higher predictive accuracy of a lexicographic model relative to a compensatory model.
This prediction agrees with another result reported by Yee et al. (2007). Besides their own ranking data, they re-analyzed rating data by Lenk, DeSarbo, Green and Young (1996). Unlike in the ranking case, the greedoid approach produced slightly lower predictive accuracy for hold-out data than a hierarchical Bayes ranked logit model. However, the two studies differed in several respects, so the performance difference cannot be unambiguously attributed to the difference in respondents’ preference elicitation task.
Among the internal factors affecting the selection of decision strategies are prior knowledge and expertise (Payne et al., 1993). Experts tend to apply more selective information processing than non-experts (e.g., Bettman & Park, 1980; see Shanteau, 1992, for an overview). Shanteau (1992) reports results demonstrating that experts are more able than non-experts to ignore irrelevant information. Ettenson, Shanteau, and Krogstad (1987) found that professional auditors weighted cues far more unequally than students did, relying primarily on one cue. Similarly, in a study on rating mobile phones, participants who reported having used a weighted additive strategy had the lowest scores on subjective and objective product category knowledge compared to users of other strategies (Denstadli & Lines, 2007). In short, experts seem to be better able to prioritize attributes, thus possibly giving rise to a clear attribute hierarchy with noncompensatory alternative evaluation. In contrast, non-experts might be less sure about which attribute is most important and therefore apply a risk diffusion strategy by integrating different pieces of information.
To summarize, we compared two models of decision strategies — weighted additive and lexicographic — in terms of their predictive accuracy for ranking versus rating data. Our hypothesis was that, relative to compensatory strategies, lexicographic processes predict participants’ preferences better in ranking than in rating tasks. Our second hypothesis was that, regardless of the required response, predictive accuracy of the lexicographic model is higher for experts than for non-experts, because experts are better able to prioritize attributes (Shanteau, 1992).
2 Method
To test our hypotheses, we selected skiing jackets as the product category, a category that can be described by a few attributes, thus allowing for acceptable questionnaire length and complexity. The product can be assumed to be relevant for many people in the targeted student population at a southern German university.
2.1 Participants
A sample of 142 respondents, 56% male, with an average age of 23.9 years, was recruited from homepages mainly frequented by business students at the University of Regensburg as well as via personal invitations during marketing classes and emails. For participating, everyone received the chance to win one of ten 10 € gift certificates for an online bookstore.
2.2 Procedure
Participants filled out a web-based questionnaire with separate sections for rating and ranking a set of skiing jackets. Each product was described by 6 features: price and waterproofness, each with 3 levels, as well as 4 dichotomous variables indicating the presence of an adjustable hood, ventilation zippers, a transparent ski pass pocket, and heat-sealed seams. These 6 features had been identified as most relevant for skiing jackets during exploratory interviews with skiers. The 96 possible profiles of skiing jackets were reduced to a 16-profile fractional factorial design (calibration set) that is balanced and orthogonal. Each respondent, in each task, was shown the 16 profiles plus 2 hold-outs. Respondents were not aware of this distinction, as the 16 calibration profiles were interspersed with the hold-outs; both were presented and evaluated in the same way. For the ranking task, all 18 profiles were shown at once. The task was formulated as a sequential choice of the preferred product (“What is your favorite skiing jacket out of this selection of products?”). The chosen product was deleted from the set, and the selection process started all over again until only the least preferred product was left. During the rating task, one profile at a time was presented to respondents. They were asked to assign a value on a scale from 0 to 100 to each profile (“How much does this product conform to your ideal skiing jacket?”). Each participant saw a new random order of profiles; task order was randomized as well. Between these tasks, people completed a filler task in order to minimize the influence of the first task on the second one. The conjoint tasks were preceded and followed by demographic questions and questions on expertise (e.g., “Are you a trained skiing instructor?”). The survey took approximately 20 minutes.
2.3 Data analysis
As mentioned above, applications of conjoint analysis in consumer research usually define a priori two distinct sets of profiles: calibration profiles used for model fitting and hold-out profiles used for evaluating predictive performance. The set of calibration profiles is designed to ensure sufficient informative data points per attribute aspect by paying attention to balance and orthogonality in aspect presentation across the different choice options. It could be argued, however, that the hold-outs might be peculiar in some way and thus lead to distorted estimates of predictive accuracy. We therefore decided to conduct a full leave-two-out cross-validation. That is, we fitted the models to all 153 possible 16-profile subsets of the 18 profiles, and in each run used the remaining two profiles as hold-outs for computing predictive accuracy. This necessarily involves slight violations of orthogonality in many of the 153 calibration sets. We think that these violations are acceptable, for the sake of generality and because there is no reason to expect the two tested models to be differentially affected by a potential lack of information on some aspects.
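The leave-two-out scheme can be sketched as follows (profile indices are placeholders for the 18 skiing-jacket profiles):

```python
from itertools import combinations

profiles = list(range(18))  # 18 product profiles per respondent

# Every 2-element subset serves once as the hold-out pair; the
# remaining 16 profiles form the calibration set for that run.
splits = [
    ([p for p in profiles if p not in holdouts], holdouts)
    for holdouts in combinations(profiles, 2)
]

print(len(splits))  # C(18, 2) = 153 cross-validation runs
```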
Ranking and rating data were analyzed separately by the greedoid algorithm and by conjoint analysis. Ordinary least squares regression analysis was used to estimate individual-level conjoint models. The resulting partworth utilities were used to formulate individual WADD models for each cross-validation run for each participant (see Equation 1). For each pair, the option with higher total utility was predicted (deterministically) to be preferred over the option with lower total utility. The outcome of the greedoid algorithm was used to specify individual LBA processes for each cross-validation run for each participant to decide between all possible pair comparisons of options: Aspects were considered sequentially according to the individual aspect order the greedoid algorithm produced. As soon as one aspect discriminated between options, the comparison was stopped, the remaining aspects were ignored, and the option with the respective aspect was predicted to be preferred.
The models’ predictions were compared to empirical rankings or ratings, respectively. Each pair of products for which one model predicts the wrong option to be preferred was counted as one violated pair produced by that model. For each participant, results were averaged across the 153 cross-validation runs. The main focus was on the mean predictive accuracy for hold-outs, that is, pairs of options with at least one option not included in the data fitting process.
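The hold-out evaluation can be sketched as follows; the helper function and toy data below are our own illustration (predicted ties, which can occur for either model, are not counted as violations, in line with the metric described above):

```python
from itertools import combinations

def violated_pairs(observed_rank, predicted_pref, holdouts):
    """Count hold-out pairs where the model's predicted winner disagrees
    with the observed ranking. `observed_rank[k]` is option k's observed
    rank (lower = more preferred); `predicted_pref(a, b)` returns the
    option predicted to be preferred, or None for a predicted tie."""
    violations = 0
    for a, b in combinations(sorted(observed_rank), 2):
        if a not in holdouts and b not in holdouts:
            continue  # only pairs with at least one hold-out option count
        winner = predicted_pref(a, b)
        observed_winner = a if observed_rank[a] < observed_rank[b] else b
        if winner is not None and winner != observed_winner:
            violations += 1
    return violations

# Hypothetical respondent with 4 options; the toy model always prefers
# the option with the lower index.
ranks = {0: 1, 1: 2, 2: 4, 3: 3}  # option 3 observed above option 2
pred = lambda a, b: min(a, b)
print(violated_pairs(ranks, pred, holdouts={2, 3}))
```

Here the model mis-orders the single hold-out pair (2, 3), so one violated pair is counted out of the five pairs involving a hold-out.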
3 Results
3.1 Descriptive results
Summarized over all respondents, the aspect partworths resulting from conjoint analyses of the rating task were largely congruent with the partworths resulting from the ranking task (see Figure 1). For both data collection methods, two attributes (waterproofness and price) received much higher weights than the other four attributes. These results were comparable to the aspect orderings disclosed by the greedoid algorithm. Aspects of the same two attributes dominated decision making, and they did so in both rating and ranking (see Figure 2).
3.2 Model fit
The WADD model showed better data fit than the LBA model. For ranking, WADD produced 7.9% violated pairs on average across cross-validation runs and participants (SD = 6.4), while LBA produced 10.3% (SD = 6.0). For rating, WADD produced 6.5% violated pairs on average (SD = 5.4), compared to 8.7% produced by LBA (SD = 5.3). Given the high flexibility of the WADD model's weights, which can be adjusted to accommodate anything from highly similar to highly differentiated weighting schemes, this result came as no surprise. The crucial test was how the two models performed when predicting hold-out data.
3.3 Predictive accuracies
3.3.1 Ranking vs. rating
Mean predictive accuracies for hold-out data of the two decision models in terms of percentage of violated pairs, averaged across participants, are summarized in Table 1. Clearly, the WADD model is better than the LBA model at predicting the preferences for the hold-out profiles for both ranking and rating tasks. In line with these descriptive results, a repeated-measures ANOVA of the dependent variable of individual-level predictive accuracy for hold-out data (in terms of percentage of violated pairs), with the two within-subject factors Task (rating vs. ranking) and Model (WADD vs. LBA), revealed a significant main effect of Model, F(1,141) = 89.18, p < .001. The factor Task did not show a main effect, F(1,141) = 0.11, p = .746, but there was a significant Task × Model interaction, F(1,141) = 11.60, p = .001: While LBA produces fewer violated pairs for the ranking compared to the rating task, WADD performs slightly worse for the ranking than for the rating task (see Table 1; the interaction can also be seen within the groups of experts and non-experts in Figure 3). Post-hoc t-tests revealed that the predictive accuracy of LBA is marginally higher for ranking than for rating, t(141) = 1.41, p = .081. Thus, there is some support for the hypothesis that LBA is better at predicting ranking compared to rating data.
Note: Percentages refer to the proportion of pairs including at least one hold-out option that were wrongly predicted by the respective strategy, averaged across 153 cross-validation runs and across participants (n = 142). Pairs of options to which participants had assigned the same value were excluded from the rating data. Standard deviations are given in parentheses.
One could argue that the violated-pairs metric unfairly favors models that predict many ties, which are not counted as violated pairs. However, the percentages of pair comparisons for which ties are predicted are below 1% for all models, and LBA predicted more ties (0.7% on average for ranking, 0.6% for rating) than WADD (0.1% for ranking, 0.1% for rating), which further backs the general superiority of the compensatory model in our data set.
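The violated-pairs metric, as we describe it above, can be sketched as follows (an illustrative reconstruction with made-up numbers, not the study's analysis code): pairs tied in the observed data are excluded, and predicted ties are not counted as violations.

```python
# Sketch of the violated-pairs metric; option names and values are hypothetical.
from itertools import combinations

def pct_violated_pairs(observed, predicted):
    """observed, predicted: dicts mapping option -> value (higher = preferred).
    Returns the percentage of non-tied observed pairs whose predicted
    order contradicts the observed order."""
    pairs = [(a, b) for a, b in combinations(observed, 2)
             if observed[a] != observed[b]]  # drop observed ties
    violated = 0
    for a, b in pairs:
        obs = observed[a] - observed[b]
        pred = predicted[a] - predicted[b]
        if pred != 0 and (obs > 0) != (pred > 0):  # predicted ties don't count
            violated += 1
    return 100 * violated / len(pairs)

# Hypothetical example: 4 options, one mis-ordered pair out of 6
observed  = {"o1": 4, "o2": 3, "o3": 2, "o4": 1}
predicted = {"o1": 4, "o2": 2, "o3": 3, "o4": 1}
print(round(pct_violated_pairs(observed, predicted), 1))  # 16.7
```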
3.3.2 Experts vs. non-experts
We divided respondents into a non-expert subgroup and an expert subgroup of active skiing instructors. Specifically, only respondents who indicated that they had started or completed training as a skiing instructor and who were active for at least eight days per skiing season were considered experts. According to this criterion, we identified 27 experts. This sample size is sufficient for statistical group comparisons, so we did not have to rely on softer, more subjective self-ratings of expertise. Figure 3 shows that experts' stated preferences tended to be more predictable than those of non-experts, regardless of the model applied. However, when Expertise was added as a between-subjects factor to the repeated-measures ANOVA of individual-level predictive accuracy, with Task and Model as within-subjects factors, the main effect of Expertise was not significant, F(1,140) = 2.86, p = .093, nor were there significant two- or three-way interactions involving Expertise.
3.3.3 Descriptive analysis of individual differences
Nevertheless, WADD was not the best model for every participant. For the ranking task, LBA achieved higher mean predictive accuracy than WADD for 35% of participants (n = 50); for the rating task, LBA still achieved higher mean accuracy for 25% of participants (n = 36). Figure 4 plots, for each respondent, the difference in mean percentage of violated pairs between LBA and WADD. Higher values indicate more violated pairs produced by LBA, that is, superiority of the WADD model. Respondents are sorted in decreasing order of this difference for the ranking task (plotted as dots). The number of dots below zero is 50, corresponding to the respondents for whom the LBA model achieved higher accuracy in the ranking task.
The corresponding difference values for the rating task, for the same participants, are also shown in Figure 4 (plotted as crosses). There is no visible indication that participants strive for inter-task consistency in their strategy use. Indeed, for only 16 of the 50 participants for whom the LBA model achieved higher accuracy for ranking did it also achieve higher accuracy for rating. Note that under chance group assignment, the LBA model would be expected to be superior for rating for about 13 of these 50 participants (i.e., 50/142 × 36). Similarly, for 72 of the 92 participants for whom the WADD model achieved higher accuracy for ranking, it also achieved higher accuracy for rating, with 69 expected by chance assignment. In line with these findings, the correlation between the two vectors of difference values is very low, r = .13.
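The chance baselines quoted above follow from simple independence arithmetic: if the better-predicting model in one task were assigned independently of the other task, the expected overlap is the product of the two group proportions times the sample size. The snippet below reproduces the two figures from the numbers reported in the text.

```python
# Reproducing the chance-overlap baselines reported in the text.
n = 142
lba_better_ranking, lba_better_rating = 50, 36
wadd_better_ranking = n - lba_better_ranking   # 92
wadd_better_rating = n - lba_better_rating     # 106

# Expected overlap under independent (chance) assignment across tasks
exp_lba_both = lba_better_ranking / n * lba_better_rating
exp_wadd_both = wadd_better_ranking / n * wadd_better_rating
print(round(exp_lba_both), round(exp_wadd_both))  # 13 69, as reported
```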
4 Discussion
In line with our hypothesis, the lexicographic model was better at predicting ranking than rating data. As suggested by other authors, the simultaneous presentation mode and the required ranking response evidently promoted lexicographic processing (e.g., Dhar, 1996; Schkade & Johnson, 1989; Tversky, 1969). However, the compensatory model derived from conjoint analysis proved superior to the lexicographic model based on the aspect orders derived from Yee et al.'s (2007) greedoid algorithm, regardless of the response mode in which participants indicated their preferences. This result was achieved despite the use of a basic, conservative benchmark estimation procedure for the WADD model, namely ordinary least squares regression. Applying more powerful estimation procedures might have resulted in even greater superiority of the WADD model.
The relatively high predictive performance of the WADD model stands in stark contrast to the results reported by Yee et al. (2007). One possible reason for the WADD model's inferiority in their study is the larger number of options and aspects: Yee et al. used 32 options described on 7 attributes with 20 aspects in total, whereas we used 18 options described on 6 attributes with 14 aspects in total. Their task was thus more complex, increasing the need for simplifying heuristics. Payne (1976) as well as Billings and Marcus (1983) suggest that a relatively large number of options induces a shift from compensatory to noncompensatory processing, with the goal of reducing the number of relevant alternatives as quickly as possible. Some authors also report more use of noncompensatory strategies when the number of attributes increases (Biggs, Bedard, Gaber & Linsmeier, 1985; Sundström, 1987). Thus, the effects of task complexity on the relative performance of LBA and WADD models deserve exploration in future controlled experiments.
Moreover, two of our attributes seemed to be of utmost importance to many participants (see Figures 1 and 2). Given this importance structure, tradeoffs between the most important attributes seem to be within reach even under limited time and processing capacity. In sum, the circumstances under which the new greedoid approach can be fruitfully applied as a general tool require further exploration. Our research suggests that with few attributes to consider and relatively few options to evaluate, the standard approach will provide higher predictive accuracy on average, for both rating and ranking tasks. However, the WADD model does not outperform LBA for every individual participant: the LBA model is better at predicting the choices of a considerable proportion of people. It might therefore be useful to study these differences further in order to derive rules for assigning individual participants to particular decision strategies. However, there is little consistency across tasks in terms of which model is the more accurate: for a large proportion of people for whom the LBA model was better at predicting the ranking data, the WADD model was better at predicting the rating data, and vice versa. Thus, the results do not seem to be skewed by participants' striving for consistent answers across tasks. At the same time, the observed inconsistency rules out the assumption of habitual preferences for certain strategies; many people seem to apply different strategies depending on the preference elicitation method. This diversity in responses to task demands will complicate the assignment of participants to strategies.
We hypothesized that expertise would be one individual difference variable that affects strategy selection. However, the lexicographic model achieved only marginally higher accuracy for experts than for non-experts. Also, contrary to expectation, the WADD model still outperformed the lexicographic model in predicting experts' decisions. A reason could be that our product category relates to a leisure activity, for which expertise is likely to be highly correlated with personal interest and emotional involvement. There is empirical evidence that involvement with the decision subject is associated with thorough information examination and simultaneous, alternative-wise processing, whereas a lack of involvement leads to attribute-wise information processing (Gensch & Javalgi, 1987). Thus, in addition to the situational factors promoting compensatory decision making across all participants, emotional involvement might have led to relatively high levels of compensatory processing among experts. Future studies should aim to distinguish between the concepts of expertise and involvement in order to study their possibly opposing effects on strategy selection.
5 Conclusion
The development of the greedoid algorithm to deduce lexicographic processes offers great potential for the fields of judgment and decision making as well as consumer science. For the first time, a relatively simple and fast tool for deriving lexicographic processes is available, applicable to different kinds of preference data. However, it is doubtful that the new approach represents a universal tool that will replace established ones. For decision tasks of relatively low complexity — that is, with few aspects and options — the standard weighted additive model led to higher predictive accuracy for both ranking and rating data than the lexicographic model deduced with the greedoid algorithm. To advise practitioners on when the new analysis method might prove useful, we clearly need to learn more about the conditions under which, and the people for whom, lexicographic models lead to superior predictions. Based on previous research, situations with significant time pressure, complex decision tasks, or high costs of information gathering might represent favorable conditions for lexicographic processing, and thus for the application of the greedoid algorithm (Bröder, 2000; Payne, 1976; Payne et al., 1988).
People simplify choices in many ways. Verbal protocol studies have revealed many different cognitive processes and rules that decision makers apply, of which noncompensatory lexicographic decision rules are just one example (e.g., Einhorn et al., 1979). There is clearly demand for models that describe what goes on in decision makers' minds when they confront the abundant choices their environment has to offer. Combining lexicographic and compensatory processes in one model might be a promising route to follow. Several authors have argued that noncompensatory strategies are characteristic of the first stage of choice, when the available options are winnowed down to a consideration set of manageable size (e.g., Bettman & Park, 1980; Gilbride & Allenby, 2004; Payne, 1976). Once the choice problem has been simplified, people may be able to apply, or at least approximate, compensatory processes, which is in line with our results. The prevalence of combinations of lexicographic elimination and additive strategies is further supported by recent evidence from verbal protocol analyses (Reisen, Hoffrage & Mast, 2008). Preliminary work by Gaskin, Evgeniou, Bailiff and Hauser (2007) attempts to combine lexicographic and compensatory processes in a two-stage model, with lexicographic processing, estimated with the greedoid algorithm, in the first stage. We are curious how these approaches will fare in terms of predictive accuracy.
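The two-stage idea discussed above can be illustrated with a toy sketch (hypothetical screening aspects and part-worths; this is not Gaskin et al.'s estimation procedure): a lexicographic screen first winnows the options down to a consideration set, and a compensatory WADD rule then chooses among the survivors.

```python
# Toy two-stage choice model: lexicographic screening, then WADD selection.
# All aspects and part-worths are hypothetical illustrations.

def two_stage_choice(options, must_have_aspects, part_worths):
    """options: list of dicts mapping attribute -> aspect.
    Stage 1: keep only options possessing every must-have aspect.
    Stage 2: choose the survivor with the highest additive utility."""
    consider = [o for o in options
                if all(o.get(attr) == aspect for attr, aspect in must_have_aspects)]
    if not consider:
        consider = options  # fall back if screening eliminates everything
    return max(consider,
               key=lambda o: sum(part_worths[(a, v)] for a, v in o.items()))

# Hypothetical example
part_worths = {("price", "low"): 1.0, ("price", "high"): 0.0,
               ("quality", "high"): 3.0, ("quality", "low"): 0.0}
options = [{"price": "low", "quality": "low"},    # screened out (low quality)
           {"price": "high", "quality": "high"},  # survives, utility 3.0
           {"price": "low", "quality": "high"}]   # survives, utility 4.0
chosen = two_stage_choice(options, [("quality", "high")], part_worths)
print(chosen)  # {'price': 'low', 'quality': 'high'}
```

In this sketch the cheapest option is never evaluated compensatorily because the screen removes it first, which captures the qualitative behavior the two-stage accounts describe.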
Appendix: Table of product profiles
Calibration as well as hold-out profiles were generated with SPSS Orthoplan. Orthogonality of the fractional factorial design of the calibration profiles is ensured.
a Profiles originally included as the only two hold-out profiles (the distinction no longer applies due to the leave-two-out cross-validation procedure); the remaining profiles form a balanced and orthogonal design.