
A Comparison of Three Predictor Selection Techniques in Multiple Regression

Published online by Cambridge University Press:  01 January 2025

Robert L. McCornack
Affiliation:
San Diego State College

Abstract

Three methods for selecting a few predictors from the many available are described and compared with respect to shrinkage in cross-validation. From 2 to 6 predictors were selected from the 15 available in 100 samples ranging in size from 25 to 200. An iterative method was found to select predictors with slightly, but consistently, higher cross-validities than the popularly used stepwise method. A gradient method was found to equal the performance of the stepwise method only in the larger samples and for the largest predictor subsets.
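The comparison described in the abstract can be illustrated with a minimal sketch, written under stated assumptions. It uses plain forward selection as a simplified stand-in for the stepwise method (there is no removal step), fits least-squares weights in a derivation sample, and then applies those weights to a second sample to obtain the cross-validity. The iterative and gradient methods are not shown, because the abstract does not give their details. The synthetic data model, the sample sizes, the seed, and every function name below are hypothetical choices made for illustration only.

```python
import random
from math import sqrt

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit(X, y, cols):
    """Least-squares weights (intercept first) for the chosen columns."""
    Z = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(cols) + 1
    A = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    b = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(A, b)

def predict(X, cols, w):
    """Apply intercept w[0] and weights w[1:] to the chosen columns."""
    return [w[0] + sum(wi * row[c] for wi, c in zip(w[1:], cols)) for row in X]

def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)

def sample_r(X, y, cols):
    """Multiple correlation of the fitted subset within the same sample."""
    return corr(predict(X, cols, fit(X, y, cols)), y)

def forward_select(X, y, k):
    """Greedy forward selection: add the predictor that most raises R."""
    chosen, remaining = [], list(range(len(X[0])))
    while len(chosen) < k:
        best = max(remaining, key=lambda c: sample_r(X, y, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

def make_sample(rng, n, n_pred=15):
    """Hypothetical data: 15 predictors, only the first three matter."""
    X = [[rng.gauss(0, 1) for _ in range(n_pred)] for _ in range(n)]
    y = [r[0] + 0.5 * r[1] + 0.25 * r[2] + rng.gauss(0, 1) for r in X]
    return X, y

rng = random.Random(0)
X_dev, y_dev = make_sample(rng, 50)   # derivation sample
X_val, y_val = make_sample(rng, 50)   # cross-validation sample

cols = forward_select(X_dev, y_dev, 4)
w = fit(X_dev, y_dev, cols)
r_dev = corr(predict(X_dev, cols, w), y_dev)   # sample multiple R
r_val = corr(predict(X_val, cols, w), y_val)   # cross-validity
print("selected predictors:", cols)
print("sample R:", round(r_dev, 3))
print("cross-validity:", round(r_val, 3))
print("shrinkage:", round(r_dev - r_val, 3))
```

Repeating this over many samples, sample sizes, and subset sizes, and substituting a different selection routine for `forward_select`, would give the kind of cross-validity comparison the abstract reports.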

Type
Original Paper
Copyright
Copyright © 1970 The Psychometric Society
