Classification of Wines Using Principal Component Analysis

Jackson Barth; Duwani Katumullage; Chenyu Yang; Jing Cao

doi:10.1017/jwe.2020.35

Published online by Cambridge University Press: 22 March 2021

Jackson Barth: Affiliation:
Department of Statistical Science, Southern Methodist University, 3225 Daniel Ave, Dallas, Texas, 75275; e-mail: jbarth@smu.edu.
Duwani Katumullage: Affiliation:
Department of Statistical Science, Southern Methodist University, 3225 Daniel Ave, Dallas, Texas, 75275; e-mail: dkatumullage@smu.edu.
Chenyu Yang: Affiliation:
Department of Statistical Science, Southern Methodist University, 3225 Daniel Ave, Dallas, Texas, 75275; e-mail: chenyuy@smu.edu.
Jing Cao (corresponding author): Affiliation:
Department of Statistical Science, Southern Methodist University, 3225 Daniel Ave, Dallas, Texas, 75275; e-mail: jcao@smu.edu.

Abstract

Classification of wines with a large number of correlated covariates may lead to classification results that are difficult to interpret. In this study, we use a publicly available dataset on wines from three known cultivars, in which 13 highly correlated variables measure the chemical compounds of the wines. The goal is to produce an efficient classifier with a straightforward interpretation that sheds light on the features of wines that matter most in the classification. To achieve this goal, we incorporate principal component analysis (PCA) into k-nearest neighbor (kNN) classification to deal with the serious multicollinearity among the explanatory variables. PCA can identify the underlying dominant features and provide a more succinct and straightforward summary of the correlated covariates. The study shows that kNN combined with PCA yields a much simpler and more interpretable classifier whose performance is comparable to that of kNN based on all 13 variables. The number of principal components is chosen to strike a balance between predictive accuracy and simplicity of interpretation. Our final classifier is based on only two principal components, which can be interpreted as the strength of taste and the level of alcohol and fermentation in wines, respectively. (JEL Classifications: C10, C14, D83)
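The workflow summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes scikit-learn is installed and uses its bundled copy of the UCI wine data (`load_wine`). The choice of k = 5 neighbors, standardization before PCA, and 5-fold stratified cross-validation are assumptions made here for illustration, and `cv_accuracy` is a hypothetical helper name.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# scikit-learn's bundled copy of the UCI wine data:
# 178 wines, 13 chemical variables, 3 cultivars.
X, y = load_wine(return_X_y=True)

# Assumed settings (not taken from the article).
N_NEIGHBORS = 5
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def cv_accuracy(n_components=None):
    """Mean cross-validated accuracy of kNN, optionally preceded by PCA."""
    # Standardize first so that variables on large scales do not dominate
    # either the principal components or the kNN distances.
    steps = [StandardScaler()]
    if n_components is not None:
        steps.append(PCA(n_components=n_components))
    steps.append(KNeighborsClassifier(n_neighbors=N_NEIGHBORS))
    model = make_pipeline(*steps)
    # Scaling and PCA are refit inside each training fold by the pipeline.
    return cross_val_score(model, X, y, cv=cv).mean()


# Baseline: kNN on all 13 standardized variables.
acc_all = cv_accuracy()

# kNN on the first m principal components, for m = 1..5.
acc_by_pcs = {m: cv_accuracy(m) for m in range(1, 6)}

print(f"kNN on all 13 variables: {acc_all:.3f}")
for m, acc in acc_by_pcs.items():
    print(f"kNN on first {m} PC(s):   {acc:.3f}")
```

Comparing the printed accuracies across values of m is one way to look for the balance between predictive accuracy and simplicity that the abstract describes. Wrapping the scaler and PCA in a pipeline keeps the held-out fold from influencing the fitted components.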

Keywords

cross-validation; k-nearest neighbor classification; principal component analysis

Type: Articles
Information: Journal of Wine Economics, Volume 16, Issue 1, February 2021, pp. 56–67

DOI: https://doi.org/10.1017/jwe.2020.35
Copyright: © American Association of Wine Economists, 2021

Footnotes

The authors gratefully acknowledge helpful comments and advice from the editor, Karl Storchmann, and an anonymous reviewer.
