
Principal Covariates Clusterwise Regression (PCCR): Accounting for Multicollinearity and Population Heterogeneity in Hierarchically Organized Data

Published online by Cambridge University Press:  01 January 2025

Tom Frans Wilderjans*
Affiliation:
Leiden University; KU Leuven
Eva Vande Gaer
Affiliation:
KU Leuven
Henk A. L. Kiers
Affiliation:
University of Groningen
Iven Van Mechelen
Affiliation:
KU Leuven
Eva Ceulemans
Affiliation:
KU Leuven
*Correspondence should be made to Tom Frans Wilderjans, Methodology and Statistics Unit, Institute of Psychology, Faculty of Social and Behavioral Sciences, Leiden University, Wassenaarseweg 52 (Pieter de la Court Building), 2333 AK Leiden, The Netherlands. Email: t.f.wilderjans@fsw.leidenuniv.nl

Abstract

In the behavioral sciences, many research questions pertain to a regression problem in that one wants to predict a criterion on the basis of a number of predictors. Although in many cases ordinary least squares regression will suffice, sometimes the prediction problem is more challenging, for three reasons. First, multiple highly collinear predictors can be available, making it difficult to grasp their mutual relations as well as their relations to the criterion. In that case, it may be very useful to reduce the predictors to a few summary variables, on which one regresses the criterion and which at the same time yield insight into the predictor structure. Second, the population under study may consist of a few unknown subgroups that are characterized by different regression models. Third, the obtained data are often hierarchically structured, with, for instance, observations nested within persons, or participants nested within groups or countries. Although some methods have been developed that partially meet these challenges (i.e., principal covariates regression (PCovR), clusterwise regression (CR), and structural equation models), none of these methods adequately deals with all of them simultaneously. To fill this gap, we propose the principal covariates clusterwise regression (PCCR) method, which combines the key ideas behind PCovR (de Jong & Kiers in Chemom Intell Lab Syst 14(1–3):155–164, 1992) and CR (Späth in Computing 22(4):367–373, 1979). The PCCR method is validated by means of a simulation study and by applying it to cross-cultural data regarding satisfaction with life.
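
To make the clusterwise regression idea concrete for hierarchically structured data, the following is a minimal sketch and not the PCCR algorithm itself: it omits the reduction of the predictors to summary variables and uses a simple alternating scheme rather than Späth's exchange algorithm. The function name clusterwise_regression, its parameters, and the synthetic data are all illustrative assumptions. Whole higher-level units (for instance, countries) are assigned to the cluster whose regression model fits their observations best.

```python
import numpy as np

def clusterwise_regression(X, y, unit, n_clusters, n_iter=50, seed=0):
    """Alternate between per-cluster OLS fits and reassignment of whole units.

    Hypothetical helper for illustration only; it is not the PCCR method.
    """
    rng = np.random.default_rng(seed)
    units, unit_idx = np.unique(unit, return_inverse=True)
    Xd = np.column_stack([np.ones(len(y)), X])          # add an intercept column
    labels = rng.integers(n_clusters, size=len(units))  # random initial partition
    coefs = np.zeros((n_clusters, Xd.shape[1]))
    losses = []
    for _ in range(n_iter):
        # Step 1: fit one least squares regression per non-empty cluster.
        obs_labels = labels[unit_idx]
        for k in range(n_clusters):
            mask = obs_labels == k
            if mask.any():
                coefs[k] = np.linalg.lstsq(Xd[mask], y[mask], rcond=None)[0]
        # Step 2: squared residual of every observation under every cluster model,
        # summed within each unit.
        sq_res = (y[:, None] - Xd @ coefs.T) ** 2
        unit_sse = np.zeros((len(units), n_clusters))
        np.add.at(unit_sse, unit_idx, sq_res)
        # Step 3: move each unit to its best-fitting cluster.
        new_labels = unit_sse.argmin(axis=1)
        losses.append(unit_sse.min(axis=1).sum())
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, coefs, losses

# Synthetic example (assumed data): 20 units with 30 observations each,
# generated from two different regression models.
rng = np.random.default_rng(1)
n_units, n_per_unit = 20, 30
unit = np.repeat(np.arange(n_units), n_per_unit)
true_labels = np.arange(n_units) % 2
X = rng.normal(size=(n_units * n_per_unit, 2))
true_coefs = np.array([[1.0, 2.0, -1.0], [-1.0, -2.0, 1.0]])  # intercept, two slopes
B = true_coefs[true_labels[unit]]
y = B[:, 0] + (X * B[:, 1:]).sum(axis=1) + 0.1 * rng.normal(size=len(unit))

labels, coefs, losses = clusterwise_regression(X, y, unit, n_clusters=2)
```

The recorded loss cannot increase from one iteration to the next, but an alternating procedure of this kind can end in a local minimum, so in practice one would rerun it from several random starts and keep the solution with the lowest loss.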

Type
Original Paper
Copyright
Copyright © 2016 The Psychometric Society


References

Arminger, G., & Stein, P. (1997). Finite mixtures of covariance structure models with regressors: Loglikelihood function, minimum distance estimation, fit indices, and a complex example. Sociological Methods & Research, 26(2), 148–182. doi:10.1177/0049124197026002002.
Brusco, M. J., & Cradit, J. D. (2001). A variable selection heuristic for K-means clustering. Psychometrika, 66, 249–270. doi:10.1007/BF02294838.
Brusco, M. J., Cradit, J. D., Steinley, D., & Fox, G. L. (2008). Cautionary remarks on the use of clusterwise regression. Multivariate Behavioral Research, 43(1), 29–49. doi:10.1080/00273170701836653.
Brusco, M. J., Cradit, J. D., & Tashchian, A. (2003). Multicriterion clusterwise regression for joint segmentation settings: An application to customer value. Journal of Marketing Research, 40(2), 225–234.
Ceulemans, E., & Kiers, H. A. L. (2009). Discriminating between strong and weak structures in three-mode principal component analysis. British Journal of Mathematical & Statistical Psychology, 62, 601–620. doi:10.1348/000711008X369474.
Ceulemans, E., Kuppens, P., & Van Mechelen, I. (2012). Capturing the structure of distinct types of individual differences in the situation-specific experience of emotions: The case of anger. European Journal of Personality, 26, 484–495. doi:10.1002/per.847.
Ceulemans, E., & Van Mechelen, I. (2008). CLASSI: A classification model for the study of sequential processes and individual differences therein. Psychometrika, 73, 107–124. doi:10.1007/s11336-007-9024-1.
Ceulemans, E., Van Mechelen, I., & Leenen, I. (2007). The local minima problem in hierarchical classes analysis: An evaluation of a simulated annealing algorithm and various multistart procedures. Psychometrika, 72, 377–391.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159. doi:10.1037/0033-2909-112-1-155.
Coxe, K. L. (1986). Principal components regression analysis. In S. Kotz, N. L. Johnson, & C. B. Read (Eds.), Encyclopedia of statistical sciences (pp. 181–184). New York: Wiley.
de Jong, S., & Kiers, H. A. L. (1992). Principal covariates regression: Part I. Theory. Chemometrics and Intelligent Laboratory Systems, 14(1–3), 155–164. doi:10.1016/0169-7439(92)80100-I.
De Roover, K., Ceulemans, E., Timmerman, M. E., Vansteelandt, K., Stouten, J., & Onghena, P. (2012). Clusterwise simultaneous component analysis for analyzing structural differences in multivariate multiblock data. Psychological Methods, 17, 100–119. doi:10.1037/a0025385.
DeSarbo, W. S., & Cron, W. L. (1988). A maximum likelihood methodology for clusterwise linear regression. Journal of Classification, 5, 249–282.
DeSarbo, W. S., & Edwards, E. A. (1996). Typologies of compulsive buying behavior: A constrained clusterwise regression approach. Journal of Consumer Psychology, 5(3), 231–262.
DeSarbo, W. S., Oliver, R. L., & Rangaswamy, A. (1989). A simulated annealing methodology for clusterwise linear regression. Psychometrika, 54(4), 707–736.
Hahn, C., Johnson, M. D., Herrmann, A., & Huber, F. (2002). Capturing customer heterogeneity using a finite mixture PLS approach. Schmalenbach Business Review, 54(3), 243–269.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.
Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187–200.
Kiers, H. A. L. (1989). Three-way methods for the analysis of qualitative and quantitative two-way data. Leiden: DSWO Press.
Kiers, H. A. L., & Smilde, A. (2007). A comparison of various methods for multivariate regression with highly collinear variables. Statistical Methods & Applications, 16(2), 193–228. doi:10.1007/s10260-006-0025-5.
Kiers, H. A. L., & ten Berge, J. M. F. (1992). Minimization of a class of matrix trace functions by means of refined majorization. Psychometrika, 57, 371–382.
Korth, B., & Tucker, L. R. (1975). The distribution of chance congruence coefficients from simulated data. Psychometrika, 40(3), 361–372.
Kroonenberg, P. M. (2008). Applied multiway data analysis. Hoboken, NJ: Wiley.
Kroonenberg, P. M. (1983). Three-mode principal component analysis: Theory and applications. Leiden: DSWO Press.
Kuppens, P., Ceulemans, E., Timmerman, M. E., Diener, E., & Kim-Prieto, C. (2006). Universal intracultural and intercultural dimensions of the recalled frequency of emotional experience. Journal of Cross-Cultural Psychology, 37, 491–515. doi:10.1177/0022022106290474.
Leisch, F. (2004). FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software, 11(8), 1–18.
Rao, C. R. (1964). The use and interpretation of principal component analysis in applied research. Sankhyā: The Indian Journal of Statistics, Series A, 26(4), 329–358.
Sarstedt, M., & Ringle, C. M. (2010). Treating unobserved heterogeneity in PLS path modeling: A comparison of FIMIX-PLS with different data analysis strategies. Journal of Applied Statistics, 37(8), 1299–1318. doi:10.1080/02664760903030213.
Schott, J. R. (2005). Matrix analysis for statistics (2nd ed.). Hoboken, NJ: Wiley.
Späth, H. (1979). Algorithm 39: Clusterwise linear regression. Computing, 22(4), 367–373. doi:10.1007/BF02265317.
Späth, H. (1981). Correction to algorithm 39: Clusterwise linear regression. Computing, 26(3), 275. doi:10.1007/BF02243486.
Steinley, D. (2003). Local optima in K-means clustering: What you don’t know may hurt you. Psychological Methods, 8, 294–304. doi:10.1037/1082-989X.8.3.294.
Steinley, D. (2004). Properties of the Hubert-Arabie adjusted rand index. Psychological Methods, 9(3), 386–396. doi:10.1037/1082-989X.9.3.386.
Stormshak, E. A., Bierman, K. L., Bruschi, C., Dodge, K. A., & Coie, J. D., The Conduct Problems Prevention Research Group. (1999). The relation between behavior problems and peer preference in different classroom contexts. Child Development, 70(1), 169–182.
ten Berge, J. M. F. (1977). Orthogonal procrustes rotation for two or more matrices. Psychometrika, 42(2), 267–276.
Tucker, L. R. (1951). A method for synthesis of factor analysis studies (Personnel Research Section Report No. 984). Washington, DC: Department of the Army.
van den Berg, R. A., Hoefsloot, H. C. J., Westerhuis, J. A., Smilde, A. K., & van der Werf, M. J. (2006). Centering, scaling and transformations: Improving the biological information content of metabolomics data. BMC Genomics, 7(1), 142–157. doi:10.1186/1471-2164-7-142.
van den Berg, R. A., Van Mechelen, I., Wilderjans, T. F., Van Deun, K., Kiers, H. A. L., & Smilde, A. K. (2009). Integrating functional genomics data using maximum likelihood based simultaneous component analysis. BMC Bioinformatics, 10, 340. doi:10.1186/1471-2105-10-340.
Van Deun, K., Smilde, A. K., van der Werf, M. J., Kiers, H. A. L., & Van Mechelen, I. (2009). A structured overview of simultaneous component based data integration. BMC Bioinformatics, 10, 246. doi:10.1186/1471-2105-10-246.
Vervloet, M., Van Deun, K., Van den Noortgate, W., & Ceulemans, E. (2013). On the selection of the weighting parameter value in principal covariates regression. Chemometrics and Intelligent Laboratory Systems, 123, 36–43. doi:10.1016/j.chemolab.2013.02.005.
Wedel, M., & DeSarbo, W. S. (1995). A mixture likelihood approach for generalized linear models. Journal of Classification, 12, 21–55.
Wilderjans, T. F., & Ceulemans, E. (2013). Clusterwise Parafac to identify heterogeneity in three-way data. Chemometrics and Intelligent Laboratory Systems, 129, 87–97. doi:10.1016/j.chemolab.2013.09.010.
Wilderjans, T. F., Ceulemans, E., & Kuppens, P. (2012). Clusterwise HICLAS: A generic modeling strategy to trace similarities and differences in multi-block binary data. Behavior Research Methods, 44, 532–545. doi:10.3758/s13428-011-0166-9.
Wilderjans, T. F., Ceulemans, E., & Van Mechelen, I. (2009). Simultaneous analysis of coupled data blocks differing in size: A comparison of two weighting schemes. Computational Statistics and Data Analysis, 53, 1086–1098. doi:10.1016/j.csda.2008.09.031.
Wilderjans, T. F., Ceulemans, E., Van Mechelen, I., & van den Berg, R. A. (2011). Simultaneous analysis of coupled data matrices subject to different amounts of noise. British Journal of Mathematical and Statistical Psychology, 64, 277–290. doi:10.1348/000711010X513263.
Wold, H. (1966). Estimation of principal component and related methods by iterative least squares. In P. R. Krishnaiah (Ed.), Multivariate analysis (pp. 391–420). New York: Academic Press.