
Chapter 1 - Machine Learning Algorithms and Measurement

from Part I - Foundations

Published online by Cambridge University Press:  08 November 2023

Louis Tay (Purdue University, Indiana)
Sang Eun Woo (Purdue University, Indiana)
Tara Behrend (Purdue University, Indiana)

Summary

This chapter provides an overview of the common machine learning algorithms used in psychological measurement, that is, the measurement of human attributes. These include algorithms used to measure personality from interview videos, job satisfaction from open-ended text responses, and group-level emotions from social media posts and internet search trends. Such algorithms enable effective and scalable measures of human psychology and behavior, driving technological advancement in measurement. The chapter consists of three parts. We first discuss machine learning and its unique contribution to measurement. We then provide an overview of the common machine learning algorithms used in measurement, along with example applications. Finally, we offer recommendations and resources for using machine learning algorithms in measurement.
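To make the idea concrete before diving into the chapter, here is a minimal sketch of one such measurement pipeline: predicting a job-satisfaction rating from an open-ended text response with k-nearest-neighbors regression, one of the common algorithms the chapter surveys. Everything here (the vocabulary, the responses, the ratings, and the function names) is an illustrative invention, not material from the chapter itself.

```python
import math
import re
from collections import Counter

def featurize(text, vocab):
    """Bag-of-words counts over a small fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [counts[word] for word in vocab]

def knn_predict(train_X, train_y, x, k=3):
    """k-nearest-neighbors regression: average the ratings of the k
    training responses closest to x in Euclidean distance."""
    dists = sorted((math.dist(x, tx), y) for tx, y in zip(train_X, train_y))
    return sum(y for _, y in dists[:k]) / k

# Entirely made-up toy data: open-ended comments paired with
# hypothetical 1-5 satisfaction ratings supplied by raters.
vocab = ["great", "love", "boring", "stress", "team"]
responses = [
    ("I love my team, great people", 4.5),
    ("great benefits, love the work", 4.0),
    ("boring tasks and constant stress", 1.5),
    ("stress every day, boring meetings", 1.0),
]
train_X = [featurize(text, vocab) for text, _ in responses]
train_y = [rating for _, rating in responses]

new_response = featurize("love the team, great culture", vocab)
predicted = knn_predict(train_X, train_y, new_response, k=2)
```

In practice one would use richer text features (e.g., embeddings rather than word counts) and a proper train/validation split, but the shape of the task, mapping behavioral traces to a psychological attribute score, is the same.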

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2023


