
Wine ratings and commercial reality

Published online by Cambridge University Press:  11 December 2024

Gianni De Nicoló*
Affiliation:
Carey Business School, Johns Hopkins University, Baltimore, MD, USA

Abstract

Is the quality of a 91-point wine significantly different from that of an 89-point wine? Which wines are underpriced relative to their assessed quality? This paper addresses these questions by constructing a novel wine rating system based on the scores assigned by a panel of wine experts to a set of wines. Wines are classified into ranked disjoint quality equivalence classes using measures of statistically significant and commercially relevant score differences. The rating system is applied to the “Judgment of Paris” wine competition, to Bordeaux en-primeur expert scores and prices, and to expert scores and price categories from a large database of Italian wines. The proposed wine rating system provides an informative assessment of wine quality for producers and consumers and a flexible rating methodology for commercial applications.

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© Johns Hopkins University, 2024. Published by Cambridge University Press.

“Count what is countable, measure what is measurable, and what is not measurable, make it measurable.” Galileo Galilei (1564–1642).

I. Introduction

Wine quality ratings based on numerical scores assigned by a panel of wine experts to a set of wines are ubiquitous in wine magazines and wine guides, are prominently displayed on wine store shelves, and are used to award quality “medals” in wine competitions. These ratings matter to producers, offering them market visibility and guiding their production decisions about wine style, and to consumers, orienting their purchasing decisions. The value and role of these ratings for producers and consumers, and their relationship with wine prices, are important ongoing areas of research in wine economics (Storchmann, 2012).

Wine ratings pose two important and commercially relevant questions. First, is a 90-point wine significantly different from an 89-point wine? As observed by Gergaud et al. (2021), the boundaries that separate wine quality ratings can be critical for wine marketing. Currently, a wine scoring at or above the 90-point threshold is marketed as a high-quality wine deserving special exposure in the press and on wine store shelves. Second, which wines are underpriced relative to their assessed quality? The classifications of wine “bargains” or “top performers” in the popular press reviewed by Miller et al. (2015), and the rating thresholds whose effects on prices are illustrated by Carlson et al. (2023), can significantly affect consumers’ perceptions, with relevant impact on sales. This paper addresses these questions by constructing a wine rating system that aims to provide a statistically grounded and commercially relevant basis for determining and comparing wine ratings and their relationship with prices.

Rating methodologies used in many areas, such as finance (see e.g. FitchRatings, 2022), environmental standards (see e.g. Morgan Stanley Capital International, 2022), and quality ratings of consumer goods (see e.g. www.consumerreports.org), map statistics of measurable characteristics of the rated objects (the default probabilities of debt securities, the environmental impact of specific gas emissions, the repair records of consumer goods) into ranked disjoint quality equivalence classes, which are typically labeled with alphabetically ordered letters or visual indicators of rank. Although rating methodologies may be very similar, ratings of specific items issued by different rating organizations may differ depending on the databases used and on the weights assigned to the quality factors whose aggregation determines an item’s overall rating. The proposed rating system adapts a standard rating methodology to the multifaceted dimensions of wine quality captured by the tasting protocols of professional evaluations, as reflected in the “wine scorecards” used in the various settings reviewed by Jackson (2017). In essence, an expert’s wine score maps his/her quantitatively ordered sensory evaluation and weighting of wine quality factors onto a numerical rating scale. As with ratings in other fields, the rating of a set of wines by a panel of experts will depend on the size of the wine sample, the number of experts, and how the experts’ scores are aggregated.

Reported wine evaluations differ depending on the implicit or explicit wine scorecard that experts use, which determines the design of a rating system. In an incomplete information rating system, a single wine score is recorded with no information on how the quality factors underlying that score were evaluated, so these factor-level evaluations are not observable.¹ In a complete information rating system, quality factors are scored individually and aggregated into a final score according to predetermined weights. The wine scorecard used by the International Organization of Vine and Wine (OIV, 2022) for sponsored wine competitions is an example of this format: the card requires experts to score 10 quality factors. A complete information rating system, based on data about “why” a particular score is assigned to a wine, is undoubtedly more informative than an incomplete information setup. Enologists generally regard a scoring template that requires an expert to assess quality factors explicitly and numerically as a standard of professional tasting. However, owing to data availability, most statistical evaluations of wine tastings in the literature have been carried out in an incomplete information setup. As the datasets in our applications all have incomplete information, in this paper we focus on an incomplete information rating system.

The proposed rating system builds on standard methods used in food science,² first applied to wine evaluation in the pioneering contribution of Amerine and Roessler (1983). Within this framework, we introduce the following novel assumptions regarding the standardization of expert scores, the definition and partitioning of data into ranked disjoint quality rating classes, and the identification of the price–quality rating component.

A. Score standardization

Experts deliver wine scores within a specific numerical range. For instance, Wine Spectator magazine uses a 70–100 range, Decanter magazine uses a 50–100 range, and www.jancisrobinson.com uses the 12–20 range. Wine scores may differ according to experts’ experience, sensory capacities and, most importantly, the different (unobservable) weights they assign to perceived wine quality factors. This is reflected in the different ways in which each expert uses the same rating scale, as observed by Cardebat and Paroissien (2015). The heterogeneity of views within a panel of experts who implicitly use the same wine scorecard is informative, since it evaluates wine quality from different perspectives. The often-cited differences between the wine evaluations of Robert Parker and Jancis Robinson illustrate this point.

To ensure the comparability of expert scores, their aggregation requires a standardization that accounts for this heterogeneity while preserving each expert’s ranking of the wines. In the proposed rating system, this rank-preserving standardization is implemented by expressing each expert’s scores as Z-scores, using the location and scale of that expert’s score distribution.

B. Rating classes

Wines are classified and ranked in disjoint quality equivalence classes using the mean of the experts’ standardized scores. Based on a standard analysis of variance (ANOVA), we compute a Minimum Significant Difference (MSD) at a standard confidence level for a set of rated wines. Indivisible MSD units provide a “numeraire,” or “currency,” that converts a wine’s standardized score into units of MSD, called the Quality Value (QV) of the wine. The QV is computed as the integer part of the ratio of a wine’s standardized score to the MSD. QVs automatically place wines in ranked disjoint quality equivalence classes, since wines with the same assessed quality have the same QV. The ratings of a set of wines are then delivered as a set of alphabetically ordered rating classes (e.g. A, B, C, …). As detailed in our applications, the design of the system allows users to calibrate the number of rating classes to desired commercial objectives.

Computing QVs is feasible when the individual experts’ scores of each wine are available. Yet many wine publications and wine guides report only a single score per wine, with no information about the scores of individual experts. In this case, rating classes can be constructed by estimating a finite mixture model (FMM) on the distribution of wine scores. Ranked disjoint quality equivalence classes are then identified from the estimated posterior probabilities that a wine score belongs to each rating class. These posterior probabilities are the counterparts of QVs derived from the entire distribution of wine scores.
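A minimal sketch of the FMM route follows. It assumes Gaussian mixture components (the text does not specify the component distribution), a number of classes k fixed by the analyst (in practice it might be chosen by comparing information criteria), and a plain expectation-maximization (EM) fit written with the Python standard library; the function names are hypothetical. Each score is assigned to the component with the highest posterior probability, and components are labeled A, B, C, … in decreasing order of their estimated means.

```python
import math
import statistics


def _log_normal_pdf(x, mean, sd):
    # Log density of a normal distribution, written out to avoid underflow.
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)


def posterior_probs(scores, weights, means, sds):
    """Posterior probability that each score belongs to each mixture component."""
    probs = []
    for x in scores:
        logs = [math.log(w) + _log_normal_pdf(x, m, s)
                for w, m, s in zip(weights, means, sds)]
        top = max(logs)  # log-sum-exp trick for numerical stability
        expd = [math.exp(v - top) for v in logs]
        total = sum(expd)
        probs.append([e / total for e in expd])
    return probs


def fit_mixture(scores, k, n_iter=300, min_sd=0.01):
    """Fit a k-component one-dimensional Gaussian mixture to the scores by EM."""
    xs = sorted(scores)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of scores")
    # Deterministic start: k equal-size blocks of the sorted scores.
    blocks = [xs[c * n // k:(c + 1) * n // k] for c in range(k)]
    means = [sum(b) / len(b) for b in blocks]
    sds = [max(statistics.pstdev(xs), min_sd)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        post = posterior_probs(scores, weights, means, sds)  # E-step
        for c in range(k):  # M-step
            r = [p[c] for p in post]
            total = sum(r)
            if total < 1e-12:  # component has (almost) no mass: leave it unchanged
                continue
            weights[c] = total / n
            means[c] = sum(ri * x for ri, x in zip(r, scores)) / total
            var = sum(ri * (x - means[c]) ** 2 for ri, x in zip(r, scores)) / total
            sds[c] = max(math.sqrt(var), min_sd)
    return weights, means, sds


def mixture_ratings(scores, k):
    """Assign each score to its most probable component; 'A' = highest mean."""
    weights, means, sds = fit_mixture(scores, k)
    order = sorted(range(k), key=lambda c: means[c], reverse=True)
    letter = {c: chr(ord("A") + rank) for rank, c in enumerate(order)}
    post = posterior_probs(scores, weights, means, sds)
    return [letter[max(range(k), key=lambda c: p[c])] for p in post]


scores = [84, 85, 85, 86, 86, 87, 92, 93, 93, 94, 95]  # hypothetical wine scores
print(mixture_ratings(scores, k=2))
```

With these hypothetical, clearly bimodal scores, the six lower scores fall in class B and the five higher ones in class A.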

C. Price–quality rating component

Given individual prices for a set of rated wines, the rating system can be used to identify underpriced wines within each rating class. The relationship between price and quality, conditional on a wine being rated in a given class, is obtained by estimating hedonic price quantile regressions (Koenker, 2005), in which price is a function of the identified rating classes and other controls. These quantile regressions are estimated at a quantile below the median, whose level is chosen according to the desired stringency of the criterion defining “underpricing.” An underpriced wine in a rating class is a wine whose price is lower than the predicted price at the chosen quantile.

Underpricing can also be included in the rating system when individual wine prices are not available but wines are classified in price ranges, or price “points”: in this case, underpriced wines are identified from the joint empirical distribution of price ranges and rating classes, using empirical quantiles within each rating class.

Therefore, the rating system can include underpricing information by expanding rating classes into subcategories. As with the rating classes, these subcategories are designed so that users can calibrate the degree of underpricing to desired commercial objectives, as detailed in our applications.
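The price-range variant can be sketched in a few lines. The sketch assumes that price ranges are coded as ordered integers (1 = cheapest range) and uses the inverse-CDF (lower) empirical quantile within each rating class; both the coding and the quantile definition are illustrative assumptions, since the text does not fix them. A wine receives a “+” when its price code is strictly below its class’s q-quantile.

```python
import math
from collections import defaultdict


def empirical_quantile(values, q):
    """Inverse-CDF quantile: smallest v such that a share >= q of values is <= v."""
    vals = sorted(values)
    idx = max(math.ceil(q * len(vals)) - 1, 0)
    return vals[idx]


def add_price_subratings(ratings, price_codes, q=0.25):
    """Append '+' to wines whose price code is below their class's q-quantile."""
    by_class = defaultdict(list)
    for r, p in zip(ratings, price_codes):
        by_class[r].append(p)
    cutoff = {r: empirical_quantile(ps, q) for r, ps in by_class.items()}
    return [r + "+" if p < cutoff[r] else r for r, p in zip(ratings, price_codes)]


ratings = ["A"] * 5 + ["B"] * 5            # hypothetical rating classes
price_codes = [5, 2, 6, 5, 6, 3, 1, 3, 2, 3]  # hypothetical price ranges
print(add_price_subratings(ratings, price_codes, q=0.25))
```

Here the 0.25-quantile is price code 5 in class A and 2 in class B, so only the second wine (A+) and the seventh wine (B+) are flagged. Lowering q makes the criterion more stringent.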

D. Applications and plan of the paper

The proposed rating system is applied to three examples of commercially important wine ratings: the 1976 “Judgment of Paris” wine competition, a sample of 2021 ratings of Bordeaux en-primeur wines, and a large database of ratings and price categories of Italian wines published online for subscribers by the National Association of Wine Tasters ONAV (Organizzazione Nazionale Assaggiatori Vino) in 2022. The dataset of the “Judgment of Paris” wine competition includes wine scores by the panel of experts, but prices are not available. The Bordeaux en-primeur dataset includes both expert scores and prices. The ONAV dataset includes aggregate scores by panels of experts, as well as price ranges of a large set of Italian wines, but scores of the experts composing the panels and individual wine prices are not available.

Three desirable properties characterize the proposed rating system, as illustrated in these applications. First, ratings are obtained by a standard statistical procedure that embeds an “economic” evaluation of the quality values (QVs) of wines, delivering ranked disjoint quality equivalent classes. Second, the system can easily incorporate sub-ratings related to a price–quality relationship that takes into account both wine characteristics and rating classes. Third, the flexibility of the system allows potential users to calibrate the parameters that define the set of rating classes and the incorporation of price–quality sub-ratings according to desired commercial objectives. Wine rating reports based on the proposed rating system may provide a more transparent and informative assessment of wine quality for producers and consumers than current methods.

The remainder of the paper consists of five sections and an Appendix. Section 2 details the rating methodology. Sections 3–5 implement the rating system using the three datasets described above. Section 6 concludes. The Appendix reports additional data tables referenced in the text.

II. Methods

The tasting panel is composed of N experts indexed by $i \in \left\{ {1,2,3, \ldots ,N} \right\}$ who evaluate M wines indexed by $j \in \left\{ {1,2,3, \ldots ,M} \right\}$ on a numerical rating scale defined on the positive real line. The score assigned by expert i to wine j is denoted by ${X_{ij}}$.

A. Score standardization

As noted, experts may use a rating scale differently according to the (unobservable) weights they assign to different wine quality factors. For example, experts may weight differently sensory quality factors such as concentration, balance, persistence, or harmony, which are common factors requiring a specific evaluation in most wine scorecards.

To make experts’ evaluations comparable, we standardize the raw scores of each expert with respect to the location and scale of his/her score distribution. To this end, we use a standard Z-score, given by ${Z_{ij}} = \left( {{X_{ij}} - {\mu _i}} \right)\sigma _i^{ - 1}$, where ${\mu _i} = {M^{ - 1}}\mathop \sum \limits_j {X_{ij}}$ is the mean and ${\sigma _i} = \sqrt {{M^{ - 1}}\mathop \sum \limits_j {{\left( {{X_{ij}} - {\mu _i}} \right)}^2}} $ is the standard deviation of expert i’s wine evaluations. Under this standardization, the distribution of each expert’s standardized scores has the same location and scale, i.e. zero mean and unit variance. Differences in standardized scores will then reflect differences in quality evaluations relative to each expert’s own set of weights assigned to the (unobservable) quality factors. Note that any standardization of experts’ scores must be rank-preserving to consistently reflect their preference ordering. This condition is automatically satisfied by ${Z_{ij}}$, since the Z-score is a strictly increasing linear function of the raw score ${X_{ij}}$ (as ${\sigma _i} > 0$).³ To work with positive standardized scores, without changing any of the results that follow, a second standardization is implemented with respect to the location and scale parameters of the overall distribution of standardized scores of wines of the panel of experts, denoted by ${\mu _P}$ and ${\sigma _P}$ respectively. This standardization can also be used, if desired, to compare differences in any expert’s scores with the overall distribution of the panel’s scores. Under this (double) standardization, expert i’s score of wine j, denoted by $Z{\left( P \right)_{ij}}$, satisfies

(1)\begin{align}\frac{{Z{{\left( P \right)}_{ij}} - {\mu _P}}}{{{\sigma _P}}} = \frac{{{X_{ij}} - {\mu _i}}}{{{\sigma _i}}} \equiv {Z_{ij}}\,\, \Rightarrow Z{\left( P \right)_{ij}} = {\mu _P} + {\sigma _P}{Z_{ij}}\end{align}
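Equation (1) can be sketched as follows, with `raw[i][j]` holding $X_{ij}$ (expert i, wine j) and population standard deviations ($M^{-1}$ normalization) as in the text. One assumption is labeled explicitly: $\mu_P$ and $\sigma_P$ are taken here as the mean and standard deviation of all pooled raw scores of the panel, since pooled Z-scores would by construction have mean 0 and unit variance and leave $Z(P)_{ij} = Z_{ij}$. The function names are hypothetical.

```python
import statistics


def zscores(expert_scores):
    """Z-scores of one expert's raw scores (population SD, i.e. M^-1 normalization)."""
    mu = statistics.fmean(expert_scores)
    sigma = statistics.pstdev(expert_scores)
    if sigma == 0:
        raise ValueError("an expert who gives every wine the same score cannot be standardized")
    return [(x - mu) / sigma for x in expert_scores]


def panel_standardize(raw):
    """Equation (1): Z(P)_ij = mu_P + sigma_P * Z_ij, where raw[i][j] = X_ij.

    Assumption: mu_P and sigma_P are the mean and population SD of all
    pooled raw scores of the panel.
    """
    pooled = [x for row in raw for x in row]
    mu_p = statistics.fmean(pooled)
    sigma_p = statistics.pstdev(pooled)
    return [[mu_p + sigma_p * z for z in zscores(row)] for row in raw]


raw = [[14, 16, 12, 18], [10, 15, 11, 19]]  # hypothetical: 2 experts x 4 wines
zp = panel_standardize(raw)
print(zp)
```

Because each transformation is a strictly increasing linear map, every expert’s ordering of the wines is unchanged, and after standardization all experts share the same mean ($\mu_P$) and spread ($\sigma_P$).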

B. ANOVA

The score of a wine by a panel of experts is the mean of experts’ standardized scores, given by

(2)\begin{align}Z{\left( P \right)_j} = {N^{ - 1}}\mathop \sum \limits_{i = 1}^N Z{\left( P \right)_{ij}}\end{align}

There has been a debate in the literature on whether ranks or averages are the most appropriate statistics for aggregating experts’ wine scores. Quandt (2006) advocated ranks, although he recognized that they entail a loss of information about perceived differences in wine quality, as ranks can be the same across experts while the underlying evaluations are very different. In their detailed analysis of voting and grading systems, Balinski and Laraki (2007, 2010) discussed how different rules for aggregating scores and rankings proposed in the wine literature may deliver different and often inconsistent results. They proposed using the median to reflect “majority judgment.” Specifically, they showed that the median is the correct statistic for an aggregation function of experts’ scores that is consistent with a basic set of preference axioms. A key assumption in their framework is that experts share a “common language,” which we associate with experts sharing a “wine scorecard.” Note that if the distribution of experts’ scores is approximately normal (hence approximately symmetric), then the mean and the median are approximately equal, implying that the use of average scores is consistent with “majority judgment.”

The reliability of tests of mean differences using ANOVA-based F-tests rests on the assumptions of equal variances across experts’ scores and approximate normality of the distribution of the relevant regression errors. Equality of variances across experts’ scores is guaranteed by the Z-score standardization of Equation 1. The normality assumption needs to be tested. We test the normality of the residuals of the ANOVA regression using the test of Bera et al. (2016), which exhibits good power in small samples and is detailed in our first application. If normality is not rejected, we compute the relevant F-test from the ANOVA at a 5% significance level. If normality is rejected, we use a robust ANOVA, implemented by computing a modified F-test using trimmed means and winsorized variances. In this case, significant mean differences are assessed using Yuen’s statistic, as detailed in Wilcox (2022).
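The ANOVA step can be sketched as below, under an assumption the text does not spell out: a two-way additive layout (wine and expert effects, one score per expert–wine cell), the classical design for tasting panels. The wine means are the panel scores of Equation (2). The two-sided 5% Student-t critical value for the error degrees of freedom is passed in as a parameter (it can be read from a table or, if SciPy is available, obtained with `scipy.stats.t.ppf(0.975, df)`), which keeps the sketch within the standard library. The normality test and Yuen’s robust alternative are not shown; the function names are hypothetical.

```python
import math


def two_way_anova(scores):
    """Two-way ANOVA without replication; scores[i][j] is expert i's score of wine j."""
    n_experts, n_wines = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_experts * n_wines)
    wine_means = [sum(row[j] for row in scores) / n_experts for j in range(n_wines)]
    expert_means = [sum(row) / n_wines for row in scores]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_wine = n_experts * sum((m - grand) ** 2 for m in wine_means)
    ss_expert = n_wines * sum((m - grand) ** 2 for m in expert_means)
    ss_error = ss_total - ss_wine - ss_expert
    df_wine = n_wines - 1
    df_error = (n_wines - 1) * (n_experts - 1)
    mse = ss_error / df_error
    f_wine = (ss_wine / df_wine) / mse  # F statistic for differences among wines
    return {"wine_means": wine_means, "mse": mse,
            "df_error": df_error, "f_wine": f_wine}


def fisher_lsd(mse, n_experts, t_crit):
    """Least significant difference between two wine means, each over n_experts scores."""
    return t_crit * math.sqrt(2 * mse / n_experts)


scores = [[3, 2, 1], [5, 4, 6]]  # hypothetical: 2 experts x 3 wines
res = two_way_anova(scores)
print(res["mse"], res["df_error"], res["f_wine"])  # MSE = 1.5, df = 2, F = 1/3
print(fisher_lsd(res["mse"], n_experts=2, t_crit=4.303))  # t(0.975, 2) is about 4.303
```

After the Z-score standardization every expert has the same mean, so the expert sum of squares is zero and only the degrees of freedom distinguish this layout from a one-way design.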

C. Rating classes as disjoint quality equivalent classes

A standard measure of the statistical difference between the mean scores of a pair of wines is the MSD at a given significance level, typically 5%. The MSD is determined by the distribution of the test statistic under the null hypothesis of no difference in means. A wine rating system partitions the set of wines evaluated by a set of experts into disjoint quality equivalence classes. If class A is labeled as superior to class B, all wines in class A have scores that are statistically significantly higher than the scores of all wines in class B.

We illustrate the role of the MSD in our rating system with simple examples. If wine 1’s score is significantly higher than wine 2’s score, then $Z{\left( P \right)_1} - Z{\left( P \right)_2} > MSD$. A wine equivalence class can be defined as the label assigned to a set of wines whose scores are not statistically significantly different. Yet computing and using the MSD is necessary, but not sufficient, to meet the commercial need to classify wines in ranked disjoint quality equivalence classes, since pairwise comparisons alone may fail to assign wines with close numerical scores to such classes.

The common practical solution is to classify wines into commercially significant categories based on partitions of raw scores into numerical quality categories treated as absolute ranks of quality.⁴ For example, 90-point wines are classified as strictly better than 89-point wines. Yet if the MSD is greater than 1, a 90-point wine is not significantly different from an 89-point wine. As long as certain scores, such as 90 points, are viewed as quality thresholds, whether a 90-point and an 89-point score represent significantly different quality levels may be highly relevant commercially.

More complications arise with multiple mean comparisons. For example, let the scores of wines A, B, and C be 91, 90, and 89, respectively. If the MSD is less than 1, these wines are in disjoint quality equivalence classes; using preference ordering notation, $A \succ B \succ C$. If the MSD is greater than 2, then these wines are in the same equivalence class, that is, $A \sim B \sim C$. If the MSD is 1.5, however, wine A is better than C ($A \succ C$) but equivalent to B ($A \sim B$), while wine B is equivalent to C ($B \sim C$). We can “separate” A from C, but we are unable to place B in one class or the other. Several examples of this situation are reported in Amerine and Roessler (1983), and such cases arise in all our datasets as well. From a statistical viewpoint these comparisons are perfectly reasonable and informative, but they are not useful commercially. Paraphrasing the rating categories of some wine competitions, any wine receiving a “gold” medal should be classified as strictly better than any wine receiving a “silver” medal. In other words, quality equivalence classes must be disjoint.
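The three-wine example can be checked mechanically. The sketch below (hypothetical function names, raw scores used directly for illustration) labels each ordered pair as significantly different (“>”) when the score gap exceeds the MSD and as equivalent (“~”) otherwise, then tests whether “~” is transitive, which is what disjoint classes would require.

```python
from itertools import combinations


def pairwise_relations(scores, msd):
    """Label each pair (better, worse) as '>' if the gap exceeds the MSD, else '~'."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {
        (a, b): ">" if scores[a] - scores[b] > msd else "~"
        for a, b in combinations(ranked, 2)
    }


def is_transitive_equivalence(rel):
    """True if '~' is transitive: x ~ y and y ~ z always imply x ~ z."""
    sim = {pair for pair, r in rel.items() if r == "~"}
    return all((x, z) in sim for (x, y1) in sim for (y2, z) in sim if y1 == y2)


scores = {"A": 91, "B": 90, "C": 89}
for msd in (0.5, 1.5, 2.5):
    rel = pairwise_relations(scores, msd)
    print(msd, rel, is_transitive_equivalence(rel))
```

With an MSD of 1.5 the line printed is `1.5 {('A', 'B'): '~', ('A', 'C'): '>', ('B', 'C'): '~'} False`: A ~ B and B ~ C but A > C, so pairwise tests alone cannot partition the three wines into disjoint classes.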

Statistical significance and commercial relevance can be reconciled as follows. Consider an estimated MSD as an indivisible unit of account of quality. The QV of a wine can be defined by

(3)\begin{align}QV\left( {Z{{\left( P \right)}_j}} \right) = int\left( {\frac{{Z{{\left( P \right)}_j}}}{{MSD}}} \right)\end{align}

where the $int$ operator truncates any fractional part, turning the value of a wine into an integer. In other words, $QV\left( {Z{{\left( P \right)}_j}} \right)$ simply converts the standardized score of wine j into indivisible $MSD$ units. These units can be viewed as the “currency” used to value a specific set of wines rated by a specific set of experts. Computing the QVs automatically delivers the ranking of wines in disjoint quality equivalence classes and the corresponding rating distribution. A standard measure of the MSD is obtained by computing the relevant F-tests from the ANOVA and the associated Fisher Least Significant Difference (FLSD) at a 5% significance level. We treat the FLSD as a benchmark, since it is the most liberal test of pairwise comparisons: it controls the Type I error rate of each individual pairwise comparison rather than that of the whole family of comparisons.⁵ If normality is rejected, we use a robust ANOVA, implemented on trimmed means and winsorized variances, with Yuen’s statistic used to obtain the corresponding robust FLSD. Having obtained the FLSD, we parameterize the MSD as $MSD\left( k \right) = k \cdot FLSD$, where $k \geqslant 1$. The MSD is thus calibrated as a multiple of the FLSD. By varying k, we can obtain a desired number of ranked disjoint quality equivalence classes, depending on commercial objectives. Therefore, a (calibrated) wine QV is computed as

(4)\begin{align}QV\left( {Z{{\left( P \right)}_j},k} \right) = int\left( {\frac{{Z{{\left( P \right)}_j}}}{{kFLSD}}} \right)\end{align}

The value $k = 1$ is the benchmark ($MSD\left( 1 \right) = FLSD$), since it delivers the maximum number of disjoint quality equivalence classes. If some $k > 1$ is chosen, the QV of a wine is expressed in indivisible multiples of FLSD units. In this case, the number of ranked disjoint quality equivalence classes typically decreases as k increases.

In sum, given the FLSD and a choice of k, the computation of wines’ $QV$s determines the rating classes of the wine sample, which can be denoted by alphabetically ordered capital letters or other descriptors conveying a scale of different quality levels.
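Equation (4) and the letter labels can be sketched as follows. The panel scores and the FLSD value are hypothetical. The letter mapping is one possible reading of the labeling rule, stated as an assumption: the highest QV is “A,” and each QV unit below it moves one letter down, so classes with no wines are kept and letters may be skipped. The check for positive scores reflects the reason for the double standardization: Python’s `int()` truncates toward zero, which equals flooring only for positive values.

```python
def quality_values(panel_scores, flsd, k=1.0):
    """Equation (4): QV_j = int(Z(P)_j / (k * FLSD)) for positive panel scores."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if any(s <= 0 for s in panel_scores):
        # int() truncates toward zero; only for positive scores does this equal flooring.
        raise ValueError("panel scores must be positive")
    return [int(s / (k * flsd)) for s in panel_scores]


def rating_letters(qvs):
    """Highest QV -> 'A'; each QV unit below the top moves one letter down."""
    top = max(qvs)
    if top - min(qvs) > 25:
        raise ValueError("more QV levels than letters A-Z")
    return [chr(ord("A") + top - qv) for qv in qvs]


scores = [12.4, 11.9, 10.2, 9.8, 7.1]  # hypothetical Z(P)_j values
flsd = 0.8                              # hypothetical FLSD
for k in (1, 2):
    qvs = quality_values(scores, flsd, k)
    print(k, qvs, rating_letters(qvs))
```

With k = 1 the QVs are [15, 14, 12, 12, 8], giving classes A, B, D, D, H (four occupied classes); with k = 2 they become [7, 7, 6, 6, 4], giving A, A, B, B, D (three occupied classes). Wines sharing a QV always share a class, so the classes are disjoint by construction.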

D. Rating underpriced wines

Núñez et al. (2024) review the extensive literature on hedonic linear regressions, which have been widely used to assess the wine price–quality relationship. A few applications, such as Amédée-Manesme et al. (2020) and Castriota et al. (2022), have also used hedonic quantile regressions to identify possible nonlinearities between wine prices, wine characteristics, and expert scores.

We use hedonic quantile regressions to identify underpriced wines within each rating class as follows. Suppose the rating classes of a set of wines are labeled in decreasing order of quality as $\left\{ {A,B,C,D, \ldots } \right\}$. Denote by ${P_j}$ the price of wine j, by ${Y_j}$ a vector of wine characteristics, and by ${I_R}$ a set of indicator functions classifying wines in each rating class $R \in \left\{ {A,B,C,D, \ldots } \right\}$. We estimate the following hedonic price quantile regression:

(5)\begin{align}{P_j} = {\alpha _q} + {Y_j}{\beta _q} + \mathop \sum \limits_R \gamma _q^R{I_R} + \varepsilon _j^q\end{align}

where q is set strictly below the median, i.e. $q < 0.5$. The quantile regression estimates the predicted quantile q of the price of wine j in rating class R, denoted by $\hat P_j^q\left( R \right)$. We identify an underpriced wine as a wine whose observed price is lower than the estimated price at quantile q, i.e. ${P_j} < \hat P_j^q\left( R \right)$. The rating of this wine is then set equal to $R+$. In other words, for each identified rating class, we define a sub-rating class marked by a “$+$,” which includes wines that are underpriced relative to their quality level.

To sum up, the choice of q determines the magnitude of estimated underpricing, with lower values of q ($q < 0.5$) indexing higher degrees of underpricing. For reporting purposes, the choice of q will ultimately be determined by commercial considerations.
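A small-sample sketch of Equation (5) and the “+” rule follows. Instead of a linear-programming solver, it uses the fact that an optimal linear quantile regression fit passes exactly through as many observations as there are coefficients (when the design matrix has full column rank), so for a handful of wines the optimum can be found by trying every such subset and keeping the fit with the smallest check (pinball) loss. This brute-force search is only practical for very small samples; larger datasets would call for a dedicated routine such as `QuantReg` in statsmodels. The single covariate standing in for $Y_j$, the data, and the function names are hypothetical.

```python
from itertools import combinations


def check_loss(residuals, q):
    """Koenker-Bassett check (pinball) loss: q*r for r >= 0, (q - 1)*r for r < 0."""
    return sum(r * (q - (r < 0)) for r in residuals)


def solve(a, b, tol=1e-12):
    """Solve the square system a x = b by Gauss-Jordan elimination; None if singular."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < tol:
            return None
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def quantile_regression(X, y, q):
    """Exact linear quantile regression for small samples.

    An optimal fit interpolates p observations (p = number of columns of X),
    so every p-subset is tried and the lowest check loss is kept.
    """
    p = len(X[0])
    best, best_loss = None, float("inf")
    for subset in combinations(range(len(y)), p):
        beta = solve([X[i] for i in subset], [y[i] for i in subset])
        if beta is None:
            continue  # singular subset: not a basic solution
        fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
        loss = check_loss([yi - fi for yi, fi in zip(y, fitted)], q)
        if loss < best_loss:
            best, best_loss = beta, loss
    return best


def underpriced_subratings(prices, covariate, ratings, q=0.25, tol=1e-9):
    """Fit Equation (5) at quantile q and append '+' to wines priced below the fit."""
    classes = sorted(set(ratings))  # first class (e.g. 'A') is the omitted baseline
    X = [[1.0, c] + [1.0 if r == k else 0.0 for k in classes[1:]]
         for c, r in zip(covariate, ratings)]
    beta = quantile_regression(X, prices, q)
    predicted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return [r + "+" if p < pred - tol else r
            for r, p, pred in zip(ratings, prices, predicted)]


prices = [40, 55, 30, 62, 48, 20, 28, 18, 35, 25]  # hypothetical prices
covariate = [5, 8, 3, 9, 6, 4, 6, 3, 8, 5]          # hypothetical characteristic Y_j
ratings = ["A"] * 5 + ["B"] * 5
print(underpriced_subratings(prices, covariate, ratings, q=0.25))
```

Because the regression includes an intercept, at most a share q of the wines can lie strictly below the fitted q-quantile, so lowering q shrinks the set of wines marked with “+.”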

III. The “Judgment of Paris” wine competition

The 1976 Paris Wine Tasting was organized by Steven Spurrier, owner of a wine shop, and Patricia Gallagher, a manager of a wine school. In this event, nine French experts evaluated, in blind tastings, a set of top-quality white and red wines from France and California. In each of the tastings, a California wine ranked first. Taber (2005) illustrates the mechanics of the tasting and vividly describes the competition’s significant marketing impact on the international wine world: for the first time, New World wines were ranked as superior to top-quality Bordeaux and Burgundy wines by French experts. Experts used a 0–20 numerical scale, assigning separate points to four quality factors: eye, nose, mouth, and harmony. Unfortunately, the scores for each quality factor have not been made available.

Several statistical analyses of the results of this competition have been carried out, although most studies have erroneously included the scores of Steven Spurrier and Patricia Gallagher, which were not included in the total count of wine scores (see Taber, 2005, p. 202). Ashenfelter and Quandt (1999) and Quandt (2006) analyzed the results of the red wine competition by converting scores into ranks, discussed various methods for aggregating expert scores, and evaluated the rank correlations among experts as a measure of “consensus.” Cicchetti (2006) focused on the degree of agreement among experts in both the red and white wine competitions, pointing out that outcomes could differ if the panel were split according to some measure of experts’ “consistency.” Hulkower (2009) applied the Borda ranking method, while Balinski and Laraki (2013) used their proposed “majority judgment” method: both studies reported results different from those originally publicized, since they found French red wines to rank first. More recently, Gergaud et al. (2021) reviewed how the rankings of red wines would have changed under different ranking procedures. To the best of our knowledge, virtually none of the analyses of this competition has systematically examined whether differences in total wine scores or ranks are statistically significant. What would the results of this competition have been if our proposed rating system had been applied to the scores of the nine French judges? And what would the event’s commercial impact have been?

The Appendix reports basic information and statistics of this wine competition. Table A.1 lists white and red wines, which include famous Burgundy and Bordeaux wines respectively. Table A.2 lists the experts, generally considered the apex of professional wine expertise in France at that time. Tables A.3 (white wines) and A.4 (red wines) report the original scores, the standardized scores according to Equation 1, statistics for each wine and each expert, and aggregate scores both including and excluding the scores of the experts whose tally was not included in the final scores.⁶

Table 1 summarizes the ranking of white and red wines by the standardized mean scores.

Table 1. Judgment of Paris: Wines ranked by standardized mean score.

Note that a U.S. wine ranks first in both the white and red wine groups. As measured by average scores and rank sums by country of origin, U.S. white wines performed similarly to French white wines: a notable feat, given the top quality of French Burgundy wines. In the red wine category, however, U.S. wines performed worse than French wines on average, with the exception of the first-ranked California red. Yet the difference between the standardized mean score of the first-ranked U.S. red wine and that of the second-ranked French red wine is minuscule.

Recall that experts arrived at their total wine score by rating wines on four quality factors. An indirect gauge of how differences in quality assessments among experts might have affected their final scores can be obtained by estimating a simple factor model. As shown in Table 2, a standard estimation of the common factors used by experts indicates that about 91% and 88% of the variation in scores for white and red wines, respectively, is spanned by three common factors. This suggests that experts likely assigned different weights to at least three of the four quality factors composing the rating scale. The results of this simple factor analysis appear consistent with a three-level partition of quality factors.⁷

Table 2. Judgment of Paris: Factor analysis.

As previously mentioned, the use of standard F-tests to compute MSDs rests on the assumption that the conditional distribution of scores across experts is normal. As a formal test, we use the procedure introduced by Bera et al. (2016). They show that normality can be assessed using the asymptotic Quantile-Covariance (QC) function, defined as the ratio of the expected quantile loss function to the density function evaluated at each quantile. Bera et al. (2016) show that the QC function is constant if and only if the underlying distribution is normal, and that this property can be tested using standard Kolmogorov-type statistics. Graphically, a QC plot indicates approximate normality if it is close to a horizontal line. The results of the test can also be represented by QQ plots with confidence bands. As shown in Figure 1, the 95% confidence bands of the QC function include the horizontal line, and the QQ plots indicate that all score observations lie inside the relevant 95% confidence band: thus, the null of normality is not rejected.

Figure 1. Judgment of Paris: QC and QQ plots.

Table 3 reports ANOVA tables for two estimates of the MSD: the benchmark FLSD and the more conservative Fisher-Hayter MSD measure (Hayter, 1986), denoted by FHLSD, which controls the family-wise Type I error rate across all 45 ($N\left( {N - 1} \right)/2$) pairwise comparisons of the $N = 10$ wines in the sample. As expected, the FHLSD is notably greater than the FLSD: a more stringent criterion for identifying differences among standardized mean scores will generally result in fewer rating classes. Table 4 reports the QVs and the ratings according to the FLSD and the FHLSD. The QV of each wine is computed according to Equation 2, using the FLSD and the FHLSD, respectively, as the denominator. The resulting rating classes are labeled with capital letters in alphabetical order.
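Both thresholds can be computed from the ANOVA mean squared error. The sketch below assumes a one-way layout with equal group sizes: the Fisher LSD uses the two-sided t critical value, $t_{\alpha/2,\nu}\sqrt{2\,MSE/n}$, while the Fisher-Hayter version uses the studentized range critical value for $N-1$ means, $q_{\alpha;N-1,\nu}\sqrt{MSE/n}$. It relies on SciPy (`scipy.stats.studentized_range` needs SciPy 1.7 or later); the inputs are hypothetical and the function names are our own.

```python
import math

from scipy import stats


def fisher_lsd(mse: float, n_per_group: int, df_error: int, alpha: float = 0.05) -> float:
    """Fisher's least significant difference for equal group sizes."""
    t_crit = stats.t.ppf(1 - alpha / 2, df_error)
    return t_crit * math.sqrt(2 * mse / n_per_group)


def fisher_hayter_lsd(mse: float, n_per_group: int, df_error: int,
                      n_groups: int, alpha: float = 0.05) -> float:
    """Fisher-Hayter threshold: studentized range critical value for n_groups - 1 means."""
    q_crit = stats.studentized_range.ppf(1 - alpha, n_groups - 1, df_error)
    return q_crit * math.sqrt(mse / n_per_group)


# Hypothetical inputs: 10 wines, 11 judges per wine, so 110 - 10 = 100 error df.
n_wines = 10
print(math.comb(n_wines, 2))  # 45 pairwise comparisons
flsd = fisher_lsd(mse=1.0, n_per_group=11, df_error=100)
fhlsd = fisher_hayter_lsd(mse=1.0, n_per_group=11, df_error=100, n_groups=n_wines)
```

With only three groups the Fisher-Hayter critical value refers to two means, and since $q_{\alpha;2,\nu}/\sqrt{2} = t_{\alpha/2,\nu}$ the two thresholds coincide; with more groups the FHLSD is larger.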

Table 3. Judgment of Paris: ANOVA.

Table 4. Judgment of Paris: QVs and ratings.

Let us compare the results under the FLSD and FHLSD “price systems.” Consider columns (1) and (2) of the white wines panel: the first two wines, one U.S. and one French, are worth six FLSDs and are placed in the A rating class; the third wine, a U.S. wine, is worth five FLSDs and is placed in the B rating class; all wines ranked fourth to ninth (three U.S. and three French wines) are worth three FLSDs and are placed in the C rating class; the 10th-ranked U.S. wine is worth only one FLSD and is placed in the F rating class. In the FLSD “currency,” white wines are therefore ranked in six classes, $\left( {A,B,C,D,E,F} \right)$, where the D and E classes are empty. With a more conservative criterion of significant difference, such as the FHLSD, the number of rating classes shrinks: as shown in columns (3) and (4) of the white wines panel, QVs are smaller, the rating classes are reduced to three, $\left( {A,B,C} \right)$, and within each class the number of U.S. and French wines is the same.
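The “currency” idea can be illustrated with a toy QV rule. Because Equation 2 is not reproduced in this section, the sketch assumes, purely for illustration, that a wine's QV is the number of whole MSD units separating its standardized mean score from the lowest one, plus one, and that the highest QV level maps to class A, the next level down to B, and so on, so that QV levels no wine attains leave empty classes. The function names and scores are hypothetical.

```python
import math
import string


def quality_values(mean_scores, msd):
    """Hypothetical QV rule: whole MSD units above the lowest mean score, plus one."""
    lowest = min(mean_scores)
    return [math.floor((m - lowest) / msd) + 1 for m in mean_scores]


def rating_classes(qvs):
    """Label the highest QV level 'A', the next level down 'B', and so on.
    Letters for QV levels that no wine attains are skipped (empty classes)."""
    top = max(qvs)
    return [string.ascii_uppercase[top - qv] for qv in qvs]


means = [6.2, 6.0, 5.1, 3.5, 1.0]   # hypothetical standardized mean scores
for msd in (1.0, 2.0):              # a liberal and a more conservative "currency"
    qvs = quality_values(means, msd)
    print(msd, qvs, rating_classes(qvs))
# 1.0 [6, 6, 5, 3, 1] ['A', 'A', 'B', 'D', 'F']
# 2.0 [3, 3, 3, 2, 1] ['A', 'A', 'A', 'B', 'C']
```

Doubling the MSD halves every wine's worth in “currency” units, which merges neighboring levels and reduces the number of classes, mirroring the FLSD versus FHLSD comparison.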

The results for red wines differ markedly from those for white wines. Under the FLSD, three classes are obtained, $\left( {A,B,C} \right)$: four wines are rated A, with only one U.S. wine in the A class; two wines are rated B, one U.S. and one French; the remaining four wines are rated C and are all U.S. wines. Under the FHLSD, rating classes shrink to two, $\left( {A,B} \right)$: five wines are rated A, two of them U.S. wines; the remaining five wines are rated B, only one of them French. Comparing different “price systems” helps assess how sensitive the placement of wines near the boundaries of rating classes is to changes in the “currency.” We explore the usefulness of variations in “currency” denominations for commercial purposes in the next application. Overall, French red wines performed better than U.S. red wines, although the two highest-ranked U.S. red wines performed as well as the French red wines in the two highest rating classes under the benchmark FLSD.

Summing up, applying our proposed rating system to this landmark wine competition would have delivered a more balanced assessment of the quality of the wines involved. California white wines were on average comparable to French white wines, and some of them were in a higher rating class than some French white wines. By contrast, French red wines were in higher rating classes than California red wines, except for the Stag's Leap Wine Cellars Cabernet Sauvignon, whose first-place ranking propelled it to fame and provided the main marketing punch of the competition. Interestingly, while the evidence that California white wines were better than or equivalent to French white wines was compelling, the marketing fanfare focused mostly on red wines.

IV. Bordeaux en-primeur

The commercial importance of pricing for Bordeaux wines in primary and secondary markets is stressed and analyzed by Masset et al. (2023), who review the literature on the “efficiency” of Bordeaux wine pricing. The Bordoverview.com website contains ratings and prices for a large set of Bordeaux en-primeur wines. The 2021 sample of Left Bank red wines includes 170 wines rated by five experts: William Kelley for Wine Advocate (WA), Jeff Leve (JL), Jane Anson for Decanter Magazine (JA), Chris Kissack (CK), and Jancis Robinson (JR). All experts rated wines on a 75–100 scale except JR, who used a 10–20 scale, which we converted to a 75–100 scale by a simple linear transformation. Table A.4 in the Appendix reports the distribution of the 170 rated wines by Appellation d'Origine Contrôlée (AOC) and classification.Footnote 8
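A simple linear transformation from the 10–20 scale onto the 75–100 scale can be written as follows. We assume the endpoints map onto each other (10 to 75 and 20 to 100); the text does not spell out the exact mapping used, so this is an illustrative choice, and the function name is ours.

```python
def to_75_100(score: float, low: float = 10.0, high: float = 20.0) -> float:
    """Map a score on the [low, high] scale linearly onto the 75-100 scale."""
    return 75.0 + (score - low) * (100.0 - 75.0) / (high - low)


print(to_75_100(17.0))  # 92.5
```

Under this mapping each point on the 20-point scale is worth 2.5 points on the 100-point scale.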

Table 5 reports summary statistics of the experts' scores, the Spearman rank correlation matrix of the experts' evaluations, and the results of a standard estimation of common factors and factor loadings. Note that not all experts rated every wine in the sample. The means and standard deviations of the experts' scores are very similar, except for JR, whose differences may partly reflect the smaller number of wines JR evaluated. The rank correlation of scores among experts is fairly high. The estimation of common factors indicates that about 94% of the variation in scores is spanned by three common factors. The loadings on the first factor are very similar in magnitude across experts, while those on the second and third factors differ, likely capturing the different weights experts assign to the latent quality factors used in their evaluations.
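Because not every expert rated every wine, rank correlations have to be computed pairwise on the wines both experts scored. pandas does this automatically when missing scores are stored as NaN. The sketch below uses hypothetical scores and an arbitrary minimum overlap of three wines; the function name is ours.

```python
import numpy as np
import pandas as pd


def expert_rank_correlations(scores: pd.DataFrame, min_wines: int = 3) -> pd.DataFrame:
    """Spearman rank correlations between experts (columns), computed pairwise
    on the wines both experts rated; NaN marks a wine an expert did not rate."""
    return scores.corr(method="spearman", min_periods=min_wines)


# Hypothetical scores for four wines; JR did not rate the first wine.
scores = pd.DataFrame({
    "WA": [90.0, 92.0, 94.0, 96.0],
    "JA": [91.0, 90.0, 95.0, 97.0],
    "JR": [np.nan, 91.25, 92.5, 95.0],
})
print(expert_rank_correlations(scores).round(2))
```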

Table 5. Left Bank wine scores, factors, and factor loadings.

The scores of each expert were standardized relative to the panel’s location and scale measures according to Equation 2. The Bera et al. (2016) test rejects the null of normality at the 5% significance level. After trimming the distribution of scores by excluding those below the 5th percentile and above the 95th percentile (a total of 35 scores), and winsorizing variances as required by the modified F-tests based on Yuen statistics, the null of normality is not rejected at the 5% significance level. This result is depicted in Figure 2, where both the QC and QQ plots exhibit confidence bands consistent with approximate normality.
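The trimming and winsorizing steps can be sketched with NumPy. The 5th and 95th percentile cut-offs follow the text; everything else (NumPy's default linear interpolation between order statistics, the function name, and returning the winsorized variance used by Yuen-type statistics) is our assumption rather than the exact procedure behind Figure 2.

```python
import numpy as np


def trim_and_winsorize(scores, lower_pct=5.0, upper_pct=95.0):
    """Return the trimmed sample and the winsorized variance.

    Trimming drops scores outside the [lower_pct, upper_pct] percentile range;
    winsorizing instead pulls those scores back to the cut-offs before
    computing the sample variance (ddof=1).
    """
    scores = np.asarray(scores, dtype=float)
    lo, hi = np.percentile(scores, [lower_pct, upper_pct])
    trimmed = scores[(scores >= lo) & (scores <= hi)]
    winsorized_var = np.clip(scores, lo, hi).var(ddof=1)
    return trimmed, winsorized_var


trimmed, wvar = trim_and_winsorize(np.arange(1, 101))
print(trimmed.size)  # 90
```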

Figure 2. Left Bank sample: Bera et al. (2016) QC and QQ plots.

The “robust” ANOVA table for this sample (not reported) delivers a “robust” FLSD of 2.63. Recall that we can calibrate the MSD as a multiple of the FLSD, that is, $MSD\left( k \right) = kFLSD$, since the FLSD is the most liberal statistical criterion. As k increases, the number of rating classes declines. If the number of rating classes in the benchmark case ($k = 1$) is deemed not optimal, the choice of $k \geqslant 1$ can thus be used to obtain the number of classes that best serves the desired commercial objectives. As shown next, evaluating how rating classes change with different values of k reveals how sensitive the ratings of wines at the boundaries of the rating classes are, as well as the corresponding class thresholds on the numerical scale used. As observed earlier, rating class boundaries expressed on the reference rating scale have important commercial implications.
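Under a toy QV rule (our assumption, since Equation 2 is not restated in this section: a wine's QV equals the whole MSD units separating its standardized score from the lowest one, plus one), the number of rating classes, counting empty ones, is a step function of k. The sketch uses the reported robust FLSD of 2.63 and a hypothetical 15-point spread between the highest and lowest standardized scores; the function name is ours.

```python
import math


def n_rating_classes(score_spread: float, flsd: float, k: float) -> int:
    """Classes implied by MSD(k) = k * FLSD under the assumed QV rule."""
    return math.floor(score_spread / (k * flsd)) + 1


for k in (1.0, 1.5, 2.0):
    print(k, n_rating_classes(score_spread=15.0, flsd=2.63, k=k))
# 1.0 6
# 1.5 4
# 2.0 3
```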

Table 6 reports statistics of standardized scores in each rating class for all $k \in \left\{ {1,1.5,2} \right\}$ obtained by computing the corresponding QVs. The minimum and maximum of each rating class are the rating class boundaries. Is a 90-point wine strictly better than an 89-point wine? Under the FLSD “currency” ($MSD\left( 1 \right)$), these two wines are rated D, since the D class includes 43 wines with scores in the interval (89.5–92.0). Therefore, their quality is rated equally. Under the $MSD\left( {1.5} \right)$ “currency,” a 90-point wine belongs to rating class B, which includes 26 wines with scores in the interval (90.7–94.4), whereas an 89-point wine belongs to rating class C, which includes 67 wines with scores in the interval (86.8–90.2). Thus, under the $MSD\left( {1.5} \right)$ “currency,” these two wines belong to different rating classes. This latter comparison also holds under the $MSD\left( 2 \right)$ “currency.”
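Class boundaries of the kind reported in Table 6 are simply the minimum and maximum score within each rating class, and a given score can then be located among those intervals. The sketch below uses hypothetical scores and class labels; the function names are ours. Note that a score can fall in a gap between two classes' observed ranges.

```python
import pandas as pd


def class_boundaries(scores, ratings) -> pd.DataFrame:
    """Count, minimum and maximum score within each rating class; the minimum
    and maximum are the class boundaries on the reference scale."""
    df = pd.DataFrame({"score": scores, "rating": ratings})
    return df.groupby("rating")["score"].agg(["count", "min", "max"])


def locate(score, bounds: pd.DataFrame):
    """Rating classes whose [min, max] interval contains the given score."""
    inside = (bounds["min"] <= score) & (score <= bounds["max"])
    return list(bounds.index[inside.to_numpy()])


bounds = class_boundaries([95.0, 93.0, 91.5, 90.0, 88.0], ["A", "A", "B", "B", "C"])
print(locate(90.0, bounds), locate(89.0, bounds))  # ['B'] []
```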

Table 6. Left Bank sample: Ratings with MSD(k) = kFLSD, for k = 1, 1.5, 2.

Summarizing, rating boundaries will crucially depend on the “evaluation style” of the experts who compose the tasting panel, since their evaluations result in different “price systems,” leading to different distributions of QVs that determine the rating classes. From a commercial perspective, any issuer of ratings from expert panels adopting this rating system (a wine publication or a rating website, for example) will have to determine the “price system” that fulfills its commercial objectives. Disclosing these choices may strengthen the reliability and reputation of the ratings, enhancing their desired marketing impact.

A. Price–quality rating

Bordoverview.com reports the “average initial consumer price ‘en-primeur’ in euros with tax included” for a subsample of wines, which the website suggests using as a general guideline. To illustrate how price information can be embedded in the rating system, we consider the rating classes corresponding to $MSD\left( {1.5} \right)$. We first estimate the missing prices with a standard hedonic model, and then derive the “+” rating subdivisions via quantile regressions.

To obtain the missing prices, we estimated the following hedonic regression:

(6)\begin{align}\log {P_j} = \alpha + {X_j}\beta + \mathop \sum \limits_R {\gamma ^R}{I_R} + {\varepsilon _j}\end{align}

where ${P_j}$ is the price of wine j, the vector ${X_j}$ includes indicator variables for AOC, classification, and size of production, ${I_R}$ is an indicator variable equal to 1 if wine j belongs to rating class $R \in \left\{ {A,B,C,D,E} \right\}$, and ${\varepsilon _j}$ is an error term. The estimated coefficients are used to predict the missing prices in the sample. As expected, the resulting distribution of wine prices is highly right-skewed, owing to the very expensive wines belonging to the historical 1855 Bordeaux classification.
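The imputation step amounts to fitting the log-price regression by ordinary least squares on wines with observed prices and exponentiating the fitted values for the others. The sketch below is a minimal NumPy version under our own assumptions: the design matrix already holds an intercept and the indicator variables (one category of each set omitted), and no retransformation-bias correction is applied, so imputed prices are median-type rather than mean predictions. The function name and data are hypothetical.

```python
import numpy as np


def impute_missing_prices(X: np.ndarray, price: np.ndarray) -> np.ndarray:
    """Fit log(price) = X @ beta by OLS on wines with observed prices and
    fill NaN prices with exp(X @ beta).

    X must already contain an intercept column and the indicator variables
    (one category of each indicator set omitted to avoid collinearity).
    """
    observed = ~np.isnan(price)
    beta, *_ = np.linalg.lstsq(X[observed], np.log(price[observed]), rcond=None)
    imputed = price.copy()
    imputed[~observed] = np.exp(X[~observed] @ beta)
    return imputed


# Hypothetical design: intercept plus one classification dummy; price of wine 6 missing.
X = np.column_stack([np.ones(6), [0, 1, 0, 1, 0, 1]])
price = np.exp(X @ np.array([3.0, 0.5]))
price[5] = np.nan
print(impute_missing_prices(X, price).round(2))
```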

The quantile regressions are specified as in Equation 5 and are estimated for $q = 0.10$ and $q = 0.25$. Table 7 reports the results for both the standard hedonic regression and the quantile regressions. The explanatory power of these regressions is fairly high, with an $R^2$ of 0.87 for the standard hedonic regression and pseudo-$R^2$ values of 0.67 and 0.71 for the $q = 0.10$ and $q = 0.25$ quantile regressions, respectively. The coefficients associated with the AOC and classification variables are all statistically significant. Moreover, the coefficients of the quality rating classes are significant and indicate a positive quality–price relationship.
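The quantile regressions can be estimated with any standard routine. To keep the sketch dependency-light, the version below solves the textbook linear-programming formulation of linear quantile regression (Koenker, 2005) with SciPy's HiGHS solver; it is not necessarily the software used for Table 7, and the function name is ours. In this application the design matrix would hold the intercept and the AOC, classification, production-size and rating-class indicators, and y the log prices.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_regression(X: np.ndarray, y: np.ndarray, q: float) -> np.ndarray:
    """Linear quantile regression (check-loss minimization) solved as a linear program.

    Minimizes sum(q * u_plus + (1 - q) * u_minus) subject to
    X @ beta + u_plus - u_minus = y, with u_plus, u_minus >= 0.
    """
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.full(n, q), np.full(n, 1.0 - q)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


# With only an intercept, the fitted coefficient is a sample q-quantile.
beta_median = quantile_regression(np.ones((5, 1)), np.array([1.0, 2.0, 3.0, 4.0, 5.0]), 0.5)
```

The dense constraint matrix has 2n + p columns, which is unproblematic for a sample of 170 wines; larger samples would call for a sparse formulation or a dedicated quantile-regression package.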

Table 7. Left Bank wines: Hedonic linear and quantile regressions.

The ratings incorporating price information are reported in Table 8, which shows the mean, minimum, and maximum prices in each rating class. Interestingly, price ranges overlap in the B and C rating classes, suggesting the existence of some overpricing in each of the two classes, likely due to the reputation effects of the classifications. As expected, the number of underpriced wines in each rating class that deserve a “$ + $” increases with the chosen q.
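One reading of the “+” rule, which we adopt here as an assumption, is that a wine earns a “+” when its price lies below the fitted q-th conditional quantile of prices for wines with its characteristics; a larger q raises that fitted quantile and therefore flags more wines. The sketch counts “+” wines per rating class; the function name, column names and data are hypothetical.

```python
import pandas as pd


def plus_counts(ratings, prices, fitted_quantile) -> pd.Series:
    """Number of wines per rating class priced below their fitted q-quantile price."""
    df = pd.DataFrame({"rating": ratings, "price": prices, "q_fit": fitted_quantile})
    df["plus"] = df["price"] < df["q_fit"]
    return df.groupby("rating")["plus"].sum()


ratings = ["B", "B", "C"]
prices = [40.0, 90.0, 25.0]
low_q = plus_counts(ratings, prices, [55.0, 70.0, 20.0])   # hypothetical q = 0.10 fits
high_q = plus_counts(ratings, prices, [75.0, 95.0, 30.0])  # hypothetical q = 0.25 fits
```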

Table 8. Left Bank wines: Prices, ratings, and +ratings, MSD(1.5).

To summarize, using a sample that includes price information, we have illustrated the usefulness of a rating system based on statistically significant and commercially relevant criteria in providing information about wine quality. This information is conditional on both the sample of rated wines and the number of experts involved in the evaluation. We have stressed the commercial importance of setting the MSD parameters and choosing the quantile levels. As in rating systems used in many other commercial fields, transparent disclosure of these choices may enhance the trustworthiness and reputation of the rating system as a source of information about wine quality for producers and consumers.

V. The ONAV wine sample

The ONAV database contains a large number of Italian wines rated on a 75–100 scale and classified by standard typology (still white wines, still red wines, sparkling wines, etc.), vintage year, and denomination of origin.Footnote 9 We selected from the database a sample of 2,485 wines, composed of 986 still white wines and 1,499 still red wines.

Table 9 shows the distribution of wine scores, vintage years, classifications, and price ranges in the selected sample. First, note that wine scores occupy only a subset of the reference rating scale, ranging from a minimum of 83 to a maximum of 96. Second, wine scores are assigned by teams of experts who follow a wine evaluation template similar to the one used by OIV (2022), but we have no information on the scores the teams assigned to each of the quality factors or on how those scores were aggregated. Hence, our rating system is applied directly to the distribution of reported wine scores. Moreover, prices are not reported for individual wines but only as six price ranges. About 95% of the wines are priced at no more than 40 euros, indicating that the bulk of the sample consists of wines at low to medium–high price points. Despite the unavailability of individual expert scores, we can construct a “star” rating system of ranked disjoint equivalence classes by using a simple version of a Finite Mixture Model (FMM). Such a model allows one to separate observations into subpopulations using estimates of the latent distributions that compose the mixture distribution of the wine scores.Footnote 10

Table 9. ONAV wine sample: Scores, vintages, classification and price ranges.

Denote with ${y_j}$ the score of wine j and with ${\mathbf{y}}$ the N-dimensional vector of wine scores. The density $f\left( {\mathbf{y}} \right)$ of ${\mathbf{y}}$ is assumed to come from R distinct classes of densities ${f_1},{f_2},\ldots,{f_R}$ in proportions ${\pi _1},{\pi _2},\ldots,{\pi _R}$. A general specification of an R-component FMM conditional on a linear index of a vector X of covariates is given by:

(7)\begin{align}f\left( {\mathbf{y}} \right) = \mathop \sum \limits_{i = 1}^R {\pi _i}{f_i}\left( {{\mathbf{y}}|{X^T}{\beta _i}} \right)\end{align}

where ${\pi _i} \in \left[ {0,1} \right]$ is the probability of the ith class, $\sum _{i = 1}^R{\pi _i} = 1$, and ${f_i}\left( {{\mathbf{y}}|{X^T}{\beta _i}} \right)$ is the probability density function of ${\mathbf{y}}$ in the ith class conditional on the vector X, where the superscript T denotes a transpose.

The estimated component probabilities and conditional densities are interpreted as arising from the different, unobservable weights that experts assign to the wine quality factors summarized by a wine score. Since all wines are evaluated blind, we estimate the model with no covariates (the vector X includes only a constant), under the assumption that a wine score issued by a panel of experts summarizes all the relevant information leading to a wine's rating, including denomination of origin, vintage, and other unreported features of the wines that experts used.

The probabilities of the latent classes ${\pi _i}$ are estimated using a multinomial logit, ${\pi _i} = \frac{{\exp \left( {{\theta _i}} \right)}}{{\sum\nolimits_{h = 1}^R {\exp \left( {{\theta _h}} \right)} }}$, where the ${\theta _i}$ are class-specific parameters. The choice of the family of densities for ${f_i}\left( . \right)$ depends on the structure of the data. As shown in Figure 3, the log scores appear to be well approximated by a (truncated) normal distribution. Hence, we use log-normal densities for ${f_i}\left( . \right)$ in estimating the parameters of Equation 7. In other words, the density of the vector of wine scores is approximated by a linear mixture of log-normal densities.
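Because the log-normal density of a score $y$ is the normal density of $\log y$ divided by $y$, and that Jacobian does not depend on the parameters, a log-normal mixture can be fitted by running the expectation-maximization (EM) algorithm for a normal mixture on the log scores. The sketch below is a minimal illustration under our own choices (quantile-based starting values, a small floor on the component standard deviations to stop a component collapsing onto a single integer score, and the function name); it is not the estimation code used for Table 10.

```python
import numpy as np


def fit_lognormal_mixture(scores, n_classes, n_iter=1000, tol=1e-10, min_sd=1e-3):
    """EM for an R-component log-normal mixture with no covariates.

    Returns mixing weights, means and standard deviations on the log scale,
    and the log-likelihood at convergence on the original score scale.
    """
    x = np.log(np.asarray(scores, dtype=float))
    means = np.quantile(x, (np.arange(n_classes) + 0.5) / n_classes)
    sds = np.full(n_classes, max(x.std() / n_classes, min_sd))
    weights = np.full(n_classes, 1.0 / n_classes)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log of weight * normal density, stabilized with log-sum-exp.
        log_dens = (np.log(weights) - np.log(sds) - 0.5 * np.log(2 * np.pi)
                    - 0.5 * ((x[:, None] - means) / sds) ** 2)
        top = log_dens.max(axis=1, keepdims=True)
        log_total = top + np.log(np.exp(log_dens - top).sum(axis=1, keepdims=True))
        resp = np.exp(log_dens - log_total)  # posterior class probabilities
        ll = log_total.sum()
        # M-step: weighted proportions, means and standard deviations.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        sds = np.maximum(np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk), min_sd)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    # Subtract sum(log y), the log-normal Jacobian, to get the score-scale likelihood.
    return weights, means, sds, ll - x.sum()
```

Working on the log scale keeps the E-step a standard normal-mixture computation; the Jacobian term only shifts the log-likelihood by a constant and so does not affect comparisons across numbers of classes fitted to the same scores.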

Figure 3. ONAV wine sample: log(score) distribution.

The likelihood function is computed as the sum of the probability-weighted conditional likelihoods from each latent class, and estimation is iterative. The maximum of the predicted posterior probabilities across classes determines the partition of scores into rating classes. A key choice in the estimation procedure is the number of rating classes. As a baseline, we chose the number of classes using the standard AIC and BIC criteria. We computed both statistics for 2 to 5 classes and found that both are minimized at $R = 4$.
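Given the maximized log-likelihood for each candidate number of classes, AIC and BIC follow from the usual formulas. For a univariate R-component mixture without covariates we count R means, R variances and R − 1 free mixing proportions, that is, 3R − 1 parameters; this parameter count and the function names are our assumptions about how the criteria were computed. The log-likelihood values would come from fitting the mixture for R = 2 to 5.

```python
import math


def information_criteria(log_likelihood: float, n_classes: int, n_obs: int):
    """AIC and BIC for a univariate mixture with 3R - 1 free parameters."""
    n_params = 3 * n_classes - 1
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    return aic, bic


def best_n_classes(log_likelihoods: dict, n_obs: int):
    """Return the number of classes minimizing AIC and the number minimizing BIC.

    log_likelihoods maps each candidate R to its maximized log-likelihood.
    """
    crit = {r: information_criteria(ll, r, n_obs) for r, ll in log_likelihoods.items()}
    best_aic = min(crit, key=lambda r: crit[r][0])
    best_bic = min(crit, key=lambda r: crit[r][1])
    return best_aic, best_bic
```

BIC penalizes each extra parameter by log(N) rather than 2, so with a sample in the thousands it favors fewer classes than AIC whenever the two disagree.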

Table 10 reports the results. As shown in the upper panel, wine scores fall into four rating classes in proportions 0.20, 0.21, 0.22, and 0.37, respectively. The mean scores are 83.47 in class 1, 85.29 in class 2, 87.27 in class 3, and 88.83 in class 4. The predicted posterior probabilities, denoted by pp1, pp2, pp3, and pp4, measure the probability that a wine with score x belongs to each of the four classes. As shown in the lower panel, the partition of ratings into score intervals is determined simply by the class with the maximum predicted probability for each score. One star is given to wines with scores in the 83–84 range, two stars to wines in the 85–86 range, three stars to wines in the 87–88 range, and four stars to wines with scores of 89 or higher.
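Once the mixture is fitted, the star rating of any score follows from the component with the largest posterior probability, with components numbered from one star upward in increasing order of their means. The sketch below assumes log-scale parameters of the kind a log-normal mixture fit produces; the parameter values in the example are hypothetical, not the Table 10 estimates (which report proportions and mean scores but not dispersions), and the function name is ours.

```python
import numpy as np


def star_ratings(scores, weights, log_means, log_sds):
    """Posterior class probabilities and star ratings for each score.

    Stars run from 1 for the component with the lowest mean up to R for the
    highest. The 1/score Jacobian of the log-normal density is common to all
    components and cancels from the posterior probabilities.
    """
    x = np.log(np.asarray(scores, dtype=float))[:, None]
    weights, log_means, log_sds = map(np.asarray, (weights, log_means, log_sds))
    log_num = np.log(weights) - np.log(log_sds) - 0.5 * ((x - log_means) / log_sds) ** 2
    post = np.exp(log_num - log_num.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    stars_of_component = np.empty(len(log_means), dtype=int)
    stars_of_component[np.argsort(log_means)] = np.arange(1, len(log_means) + 1)
    return post, stars_of_component[post.argmax(axis=1)]


# Hypothetical two-component fit; tabulate stars for each integer score 83-96.
post, stars = star_ratings(range(83, 97), [0.4, 0.6], np.log([92.0, 85.0]), [0.02, 0.015])
```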

Table 10. ONAV Wine Sample: FMM predicted posterior probabilities and ratings.

A. Price–quality rating

Turning to the relationship between ratings and prices, Table 11 reports the distribution of wines by rating and price range under the four-star rating system constructed with the FMM.

Table 11. Italian wines sample: Wine distribution by rating and price range.

Three main results emerge from this table. First, as shown in the upper panel, there are 93 four-star wines in the lowest price range, suggesting that these wines would be natural candidates for an extra “$ + $” mark indicating underpricing relative to quality. The number of four-star wines in the next higher price range is also substantial. Second, the fraction of wines in the lowest rating class declines only slowly as price ranges increase, suggesting that a substantial number of wines are overpriced relative to quality. Relative overpricing similarly occurs in the intermediate rating categories (two and three stars). For the highest rating class, the fraction of wines first increases and then decreases with the price range. Third, a significant fraction of three- and four-star wines are in the lowest and next-to-lowest price ranges, suggesting significant underpricing relative to quality. The evidence suggests that, for this sample, the positive relationship between price and wine quality might not be as strong as predicted by standard hedonic price equations, possibly because of pervasive underpricing and overpricing. That said, our rating methodology is fairly flexible: the number of rating classes can be increased or reduced in the estimation, and other conditioning covariates can be introduced in the X matrix to obtain finer partitions of rating classes consistent with specific commercial objectives.
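Counts and within-price-range shares of the kind shown in Table 11 can be produced with a cross-tabulation. In the sketch below the star ratings and price-range labels are hypothetical and the function name is ours; `normalize="columns"` makes each price-range column sum to one, which is the comparison used when tracking how the share of one-star wines changes across price ranges.

```python
import pandas as pd


def rating_by_price_range(stars, price_ranges):
    """Counts and within-price-range shares of wines by star rating."""
    stars = pd.Series(stars, name="stars")
    price_ranges = pd.Series(price_ranges, name="price_range")
    counts = pd.crosstab(stars, price_ranges)
    shares = pd.crosstab(stars, price_ranges, normalize="columns")
    return counts, shares


counts, shares = rating_by_price_range(
    [1, 4, 4, 2, 1, 3],
    ["up to 10", "up to 10", "10-20", "10-20", "20-40", "20-40"],
)
print(counts)
print(shares.round(2))
```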

The application of our rating system to this sample of Italian wines has shown that useful information about wine quality and its relationship with prices can be extracted from the data even without information about the scores assigned by the panels of experts involved in wine evaluations. The availability of this information, as well as the use of individual prices of wines, would undoubtedly improve the assessment of wine quality and the identification of wine underpricing.

VI. Conclusion

This paper has constructed a wine rating system based on scores assigned by a panel of wine experts to a set of wines. Standardized expert scores and a novel measure of wine QVs deliver ranked disjoint quality equivalence classes, which can be expanded into subcategories using a standard model of the price–quality relationship.

We have applied the system to the 1976 “Judgment of Paris” wine competition, using only wine score information; to a sample of Bordeaux en-primeur wines and prices, illustrating how price information can be incorporated into the ratings; and to a sample of Italian wines, where the ratings were constructed solely from the final scores of teams of experts. The application of the proposed rating system to these datasets shows that the system provides an informative assessment of wine quality for producers and consumers and a flexible template for wine rating reports.

All the datasets we have employed report the experts' overall score for each wine but not the separate scores for the wine quality factors customarily employed in professional wine tastings. An extension of the proposed rating system adapted to these more detailed data is part of our research agenda.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/jwe.2024.27.

Acknowledgements

I would like to thank an anonymous reviewer, Magalie Dubois, and participants in the 15th Annual Conference of the American Association of Wine Economists (AAWE) in Stellenbosch, South Africa, for their comments and suggestions.

Footnotes

1 Wine publications often report scores of experts that are supplemented by tasting notes. Typically, longer tasting notes are associated with higher scores. However, the specific information these notes convey about individual quality factors is difficult to compare across wines, as most notes focus on sensory descriptors. For a review of the information content of wine-tasting notes and an evaluation of the price impact of descriptors, see Capehart (2021). On the potential role of tasting notes in wine marketing aided by AI technology, see Carlson et al. (2023).

2 An overview of the methods used in food science is in Lawless and Heymann (2010). Methods applied to wine are reviewed in Jackson (2020) and Lesschaeve and Noble (2022).

3 The standardization proposed by Cardebat and Paroissien (2015), based on a transformation of the entire cumulative distribution of each expert's scores relative to that of one reference expert, and applied by Gergaud et al. (2021) and the Global Wine Score website www.globalwinescore.com, is not rank-preserving because of the pervasive presence of sets of wines with the same score (ties).

4 Typical examples of these classifications are the rating tables used by magazines such as Wine Spectator or Decanter, where quality categories are associated with number ranges of a given scale anchored by descriptions indicating progressively higher quality as number ranges increase.

5 Pairwise comparisons of means following statistically significant F-tests are used to detect which particular means in a group differ significantly. When multiple independent tests are conducted, each test has an inherent Type I error rate α, but the family-wise Type I error rate across m independent comparisons is $1 - {\left( {1 - \alpha } \right)^m}$; with n means there are $m = n\left( {n - 1} \right)/2$ pairwise comparisons. The Fisher LSD is the most liberal criterion, as it does not control the family-wise error rate. In applications, the Fisher LSD is considered “protected” against underestimation of the Type I error rate when the preceding ANOVA F-test yields a very small p-value. For a review of multiple pairwise mean comparisons following ANOVA, see Sauder and DeMars (2019).

6 Perhaps unsurprisingly, a comparison of the aggregate scores for both white and red wines reveals fairly different values of means and standard deviations for most wine scores when experts number 4 and 8 are not included in the total count.

7 The “eye” part of the evaluation might have been very similar across experts since this quality factor is typically assessed to identify potential faults and some features of the typology of a wine, which was known to the judges. Moreover, the “eye” part typically receives the lowest weight in most professional wine quality evaluation protocols. The remaining three quality factors are those where most of the evaluations might have differed.

8 Classifications are: the 1855 Médoc Grand Cru Classé Classification (1st–5th GCC), with a total of 61 wines in five subcategories; the Médoc Cru Bourgeois Classification (CB), with a total of 56 wines; and the Graves classification (CC Graves), with 11 wines.

9 We consider four denominations: IGT, IGP, DOC/DOP, and DOCG. IGT (Indicazione Geografica Tipica) and IGP (Indicazione Geografica Protetta) are both geographical classifications. IGT differs from IGP in imposing fewer restrictions on bottling, labeling, and grape production, and it encompasses a very large area of production. The IGP classification is more restrictive, since the wine must be made or transformed in the production area indicated by the specification. The DOC and DOP classifications are equivalent (DOC was first established in Italy and subsequently incorporated into DOP under the European-wide wine classification). DOCG (Denominazione di Origine Controllata e Garantita) wines observe further production restrictions, summarized by the term “guaranteed.”

10 For a review of FMMs, see McLachlan et al. (2019). An example of an application of finite mixture models to wine data is Cao (2014), who focused on a model designed to identify common versus random components of tasters' evaluations.

References

Amédée-Manesme, C.-O., Faye, B., and Le Fur, E. (2020). Heterogeneity and fine wine prices: Application of the quantile regression approach. Applied Economics, 52(26), 2821–2840.
Amerine, M. A., and Roessler, E. B. (1983). Wines: Their Sensory Evaluation. W.H. Freeman and Company.
Ashenfelter, O., and Quandt, R. (1999). Analyzing a wine tasting statistically. Chance, 12(3), 16–20.
Balinski, M., and Laraki, R. (2007). A theory of measuring, electing, and ranking. Proceedings of the National Academy of Sciences of the United States of America, 104(21), 8720–8725.
Balinski, M., and Laraki, R. (2010). Majority Judgment. The MIT Press.
Balinski, M., and Laraki, R. (2013). How best to rank wines: Majority judgment. In: Giraud-Héraud, E., and Pichery, M. C. (eds.), Wine Economics. Applied Econometrics Association Series (pp. 149–172). Palgrave Macmillan.
Bera, A. K., Galvao, A. F., Wang, L., and Xiao, Z. (2016). A new characterization of the normal distribution and test of normality. Econometric Theory, 32(5), 1216–1252.
Capehart, K. W. (2021). Willingness to pay for wine bullshit: Some new estimates. Journal of Wine Economics, 16(3), 260–282.
Cardebat, J.-M., and Paroissien, E. (2015). Standardizing expert wine scores: An application for Bordeaux en primeur. Journal of Wine Economics, 10(3), 329–348.
Carlson, K., Kopalle, P., Riddel, A., Rockwell, D., and Vana, P. (2023). Complementing human effort in on-line reviews: A deep learning approach to automatic content generation and review synthesis. International Journal of Research in Marketing, 40, 54–74.
Castriota, S., Corsi, S., Frumento, P., and Ruggeri, G. (2022). Does quality pay off? “Superstar” wines and the uncertain price premium across quality grades. Journal of Wine Economics, 141–158.
Cicchetti, D. V. (2006). The Paris 1976 tasting revisited once more: Comparing ratings of consistent and inconsistent tasters. Journal of Wine Economics, 1(2), 125–140.
FitchRatings. (2022). The rating process: How Fitch assigns credit ratings. February. www.fitchratings.com.
Gergaud, O., Ginsburgh, V., and Moreno-Ternero, J. D. (2021). Wine ratings: Seeking a consensus among tasters via normalization, approval, and aggregation. Journal of Wine Economics, 16(3), 321–342.
Hayter, A. (1986). The maximum familywise error rate of Fisher's least significant difference test. Journal of the American Statistical Association, 81(396), 1000–1004.
Hulkower, N. (2009). The Judgment of Paris according to Borda. Journal of Wine Research, 20(3), 171–182.
Jackson, R. S. (2017). Wine Tasting: A Professional Handbook (3rd ed.). Academic Press, Elsevier Ltd.
Jackson, R. S. (2020). Wine Science: Principles and Applications (5th ed.). Academic Press, Elsevier Ltd.
Koenker, R. (2005). Quantile Regression. Cambridge University Press.
Lawless, H. T., and Heymann, H. (2010). Sensory Evaluation of Food: Principles and Practice (2nd ed.). Springer.
Lesschaeve, I., and Noble, A. C. (2022). Sensory analysis of wine, Chapter 7. In: Andrew, G. R. (ed.), Managing Wine Quality. Volume I: Viticulture and Wine Quality (2nd ed., pp. 243–277). Woodhead Publishing, Elsevier.
Masset, P., Weiskopf, J.-P., and Cardebat, J.-M. (2023). Efficient pricing of Bordeaux en primeur wines. Journal of Wine Economics, 18, 39–65.
McLachlan, G. J., Lee, S. X., and Rathnayake, S. I. (2019). Finite mixture models. Annual Review of Statistics and Its Application, 6, 355–378.
Miller, J. R., Stone, R. W., and Stuen, E. T. (2015). When is a wine a bargain? A comparison of popular and regression-based approaches. Journal of Wine Research, 26(2), 153–168.
Morgan Stanley Capital International. (2022). MSCI ESG Ratings. www.msci.com.
Núñez, J., Martín-Barroso, D., and Velázquez, F. J. (2024). The hedonic price model for the wine market: A systematic and comparative review of the literature. Agricultural Economics, 55, 247–264.
OIV. (2022). Standard for international wine and spirituous beverages of vitivinicultural origin competitions. www.oiv.int/sites/default.
Quandt, R. E. (2006). Measurement and inference in wine tasting. Journal of Wine Economics, 1(1), 7–30.
Sauder, D. C., and DeMars, C. E. (2019). An updated recommendation for multiple comparisons. Advances in Methods and Practices in Psychological Science, 2(1), 26–44.
Storchmann, K. (2012). Wine economics. Journal of Wine Economics, 7(1), 1–33.
Taber, G. M. (2005). Judgment of Paris. Scribner.
Wilcox, R. R. (2022). Introduction to Robust Estimation and Hypothesis Testing (5th ed.). Academic Press.
Table 1. Judgment of Paris: Wines ranked by standardized mean score.
