Statistical Significance of the Contribution of Variables to the PCA solution: An Alternative Permutation Strategy

Mariëlle Linting; Bart Jan van Os; Jacqueline J. Meulman

doi:10.1007/s11336-011-9216-6

Published online by Cambridge University Press: 01 January 2025

Mariëlle Linting*, Leiden University, The Netherlands
Bart Jan van Os, Leiden University, The Netherlands
Jacqueline J. Meulman, Leiden University, The Netherlands
*Requests for reprints should be sent to Mariëlle Linting, Department of Education and Child Studies, Leiden University, P.O. Box 9555, 2300 RB Leiden, The Netherlands. E-mail: linting@fsw.leidenuniv.nl

Abstract

In this paper, the statistical significance of the contribution of variables to the principal components in principal components analysis (PCA) is assessed nonparametrically with permutation tests. We compare a new strategy with one used in previous research, which permutes the columns (variables) of a data matrix independently and concurrently, thereby destroying the entire correlational structure of the data. That strategy is considered appropriate for assessing the significance of the PCA solution as a whole, but it is not suitable for assessing the significance of the contribution of single variables. As an alternative, we propose permuting one variable at a time while keeping the other variables fixed. We compare the two approaches in a simulation study in terms of the proportions of Type I and Type II errors, using two corrections for multiple testing: the Bonferroni correction and control of the False Discovery Rate (FDR). For assessing the significance of the variance accounted for by the variables, permuting one variable at a time combined with FDR correction yields the most favorable results. This optimal strategy is applied to an empirical data set, and the results are compared with bootstrap confidence intervals.
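A minimal sketch of the two permutation strategies and the two multiple-testing corrections described above, written in Python with NumPy. It assumes ordinary linear PCA on the correlation matrix (the paper's own analyses may use a different PCA variant), takes each variable's variance accounted for (VAF), i.e., its sum of squared loadings on the first k components, as the test statistic, and computes p-values as (exceedances + 1)/(n_perm + 1). The function names `vaf_per_variable`, `permutation_pvalues`, `bonferroni` and `fdr_bh` are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np


def vaf_per_variable(X, n_components):
    """VAF per variable: sum of squared loadings on the first n_components
    of a linear PCA of the correlation matrix of X (rows = objects)."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)              # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_components]    # indices of the largest ones
    loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
    return (loadings ** 2).sum(axis=1)


def permutation_pvalues(X, n_components, n_perm=999, strategy="one", seed=None):
    """Permutation p-value for each variable's VAF.

    strategy="all": permute every column independently and concurrently,
        destroying the whole correlational structure in each replicate.
    strategy="one": permute only column j and keep the other columns fixed.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    observed = vaf_per_variable(X, n_components)
    n_vars = X.shape[1]
    exceed = np.zeros(n_vars)
    if strategy == "all":
        for _ in range(n_perm):
            Xp = np.column_stack([rng.permutation(X[:, j]) for j in range(n_vars)])
            exceed += vaf_per_variable(Xp, n_components) >= observed
    elif strategy == "one":
        for j in range(n_vars):
            Xp = X.copy()
            for _ in range(n_perm):
                Xp[:, j] = rng.permutation(X[:, j])   # only variable j is shuffled
                exceed[j] += vaf_per_variable(Xp, n_components)[j] >= observed[j]
    else:
        raise ValueError("strategy must be 'one' or 'all'")
    # Assumed convention: count the observed statistic as one of the permutations.
    return (exceed + 1) / (n_perm + 1)


def bonferroni(pvals, alpha=0.05):
    """Reject H0_j when p_j <= alpha / m."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size


def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
    the largest rank i with p_(i) <= q * i / m."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[: k + 1]] = True
    return reject


# Usage on simulated data: three variables share one factor, three are pure noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=(200, 1))
X = np.hstack([factor + 0.5 * rng.normal(size=(200, 3)), rng.normal(size=(200, 3))])
p_one = permutation_pvalues(X, n_components=1, n_perm=199, strategy="one", seed=1)
print("p-values:", np.round(p_one, 3))
print("Bonferroni:", bonferroni(p_one))
print("FDR (BH):", fdr_bh(p_one))
```

With `strategy="one"`, each variable is judged against a null distribution in which only its own relations to the other variables are broken, so the remaining structure still shapes the components; with `strategy="all"`, the reference for every variable is a data set with no correlational structure at all, which matches testing the solution as a whole rather than single variables.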

Keywords

principal components analysis; permutation; statistical significance; component loadings; p-values

Information

Type: Original Paper
Published in: Psychometrika, Volume 76, Issue 3, July 2011, pp. 440–460

DOI: https://doi.org/10.1007/s11336-011-9216-6
Copyright: Copyright © 2011 The Psychometric Society
