Statistical Significance of the Contribution of Variables to the PCA Solution: An Alternative Permutation Strategy

Published online by Cambridge University Press: 01 January 2025

Mariëlle Linting*, Leiden University, The Netherlands
Bart Jan van Os, Leiden University, The Netherlands
Jacqueline J. Meulman, Leiden University, The Netherlands

* Requests for reprints should be sent to Mariëlle Linting, Department of Education and Child Studies, Leiden University, P.O. Box 9555, 2300 RB Leiden, The Netherlands. E-mail: linting@fsw.leidenuniv.nl

Abstract

In this paper, the statistical significance of the contribution of variables to the principal components in principal components analysis (PCA) is assessed nonparametrically by means of permutation tests. We compare a new strategy with one used in previous research, which permutes the columns (variables) of the data matrix independently and simultaneously, thereby destroying the entire correlational structure of the data. That strategy is considered appropriate for assessing the significance of the PCA solution as a whole, but it is not suitable for assessing the significance of the contribution of single variables. As an alternative, we propose a strategy that permutes one variable at a time while keeping the other variables fixed. We compare the two approaches in a simulation study in terms of the proportions of Type I and Type II errors. We use two corrections for multiple testing: the Bonferroni correction and control of the False Discovery Rate (FDR). For assessing the significance of the variance accounted for by the variables, permuting one variable at a time combined with FDR correction yields the most favorable results. This optimal strategy is applied to an empirical data set, and the results are compared with bootstrap confidence intervals.
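The contrast between the two permutation strategies can be illustrated with a minimal sketch. It assumes ordinary linear PCA on the correlation matrix (standardized variables) and takes the variance accounted for (VAF) by a variable to be the sum of its squared loadings over the retained components. The function names `variable_vaf` and `permutation_pvalues`, the add-one p-value rule, and the default of 999 permutations are illustrative choices of ours, not details taken from the paper.

```python
import numpy as np


def variable_vaf(data, n_components):
    """Variance accounted for (VAF) by each variable in linear PCA.

    Assumes PCA on the correlation matrix (standardized variables), so the
    loadings are eigenvectors scaled by the square roots of their eigenvalues.
    A variable's VAF is the sum of its squared loadings over the retained
    components.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]   # indices of the largest ones
    loadings = eigvecs[:, top] * np.sqrt(eigvals[top])
    return (loadings ** 2).sum(axis=1)


def permutation_pvalues(data, n_components, n_perm=999,
                        strategy="one_at_a_time", seed=None):
    """Permutation p-values for the VAF of every variable (hypothetical helper).

    strategy="all_columns":   permute all columns independently in each
                              replication (destroys all correlations).
    strategy="one_at_a_time": permute only column j, keep the others fixed,
                              and record the VAF of variable j.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n_vars = data.shape[1]
    observed = variable_vaf(data, n_components)
    exceed = np.zeros(n_vars)    # count of permuted VAFs >= observed VAF
    for _ in range(n_perm):
        if strategy == "all_columns":
            permuted = np.column_stack(
                [rng.permutation(data[:, j]) for j in range(n_vars)])
            exceed += variable_vaf(permuted, n_components) >= observed
        elif strategy == "one_at_a_time":
            for j in range(n_vars):
                permuted = data.copy()
                permuted[:, j] = rng.permutation(permuted[:, j])
                exceed[j] += variable_vaf(permuted, n_components)[j] >= observed[j]
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    # Count the observed statistic as one of the replications (add-one rule),
    # so that no p-value is exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

For instance, `permutation_pvalues(X, n_components=2, seed=0)` returns one p-value per column of `X`. Note the cost of the design choice: the one-at-a-time strategy refits the PCA once per variable in every replication, so it needs roughly `n_vars` times as many eigendecompositions as the all-columns strategy.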
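The two multiple-testing corrections named in the abstract can be sketched as follows. This assumes the FDR is controlled with the Benjamini-Hochberg step-up procedure, a standard choice; whether the paper uses exactly this variant is an assumption here. The function names and the six p-values in the example are hypothetical and are not results from the paper.

```python
import numpy as np


def bonferroni(pvalues, alpha=0.05):
    """Reject H0_j when p_j <= alpha / m; controls the familywise error rate."""
    p = np.asarray(pvalues, dtype=float)
    return p <= alpha / p.size


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; controls the FDR at level alpha.

    Sort the m p-values, find the largest rank k with p_(k) <= (k / m) * alpha,
    and reject the hypotheses with the k smallest p-values.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.flatnonzero(below).max()   # 0-based position of the largest such rank
        reject[order[: k + 1]] = True
    return reject


# Hypothetical p-values for six variables (illustration only).
p_values = [0.030, 0.001, 0.300, 0.019, 0.004, 0.012]
print(bonferroni(p_values).tolist())          # [False, True, False, False, True, False]
print(benjamini_hochberg(p_values).tolist())  # [True, True, False, True, True, True]
```

In this made-up example the Bonferroni correction rejects two hypotheses and the Benjamini-Hochberg procedure rejects five, which illustrates why FDR control tends to retain more power when many variables are tested at once.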

Type: Original Paper
Copyright © 2011 The Psychometric Society
