PREDICTION/ESTIMATION WITH SIMPLE LINEAR MODELS: IS IT REALLY THAT SIMPLE?

Published online by Cambridge University Press:  06 December 2006

Yuhong Yang
Affiliation:
University of Minnesota

Abstract

Consider the simple normal linear regression model for estimation/prediction at a new design point. When the slope parameter is not obviously nonzero, hypothesis testing and information criteria can be used for identifying the right model. We compare the performances of such methods both theoretically and empirically from different perspectives for more insight. The testing approach at the conventional size of 0.05, in spite of being the “standard approach,” performs poorly in estimation. We also found that the frequently told story “the Bayesian information criterion (BIC) is good when the true model is finite-dimensional, and the Akaike information criterion (AIC) is good when the true model is infinite-dimensional” is far from being accurate. In addition, despite some successes in the effort to go beyond the debate between AIC and BIC by adaptive model selection, it turns out that it is not possible to share the pointwise adaptation property of BIC and the minimax-rate adaptation property of AIC by any model selection method. When model selection methods have difficulty in selection, model combining is a better alternative in terms of estimation accuracy.

This work was completed when the author was on leave from Iowa State University and was a New Direction Visiting Professor at the Institute for Mathematics and its Applications (IMA) at the University of Minnesota. The funding from both IMA and ISU is greatly appreciated. The work was also partly supported by NSF CAREER grant DMS0094323. The author thanks Xiaotong Shen and Hannes Leeb for very helpful discussions. The paper also benefited from the questions and comments from the participants at the statistics seminars the author gave at the University of Minnesota and Duke University. The author is very grateful to the anonymous reviewers and the co-editor Benedikt Pötscher for carefully reading earlier versions of the paper, bringing to the author's attention several closely related previous and current results, and making many very valuable suggestions, which significantly improved the paper in both content and presentation.
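
The kind of comparison the abstract describes can be illustrated with a small Monte Carlo sketch. This is not the author's experiment: the design (50 equally spaced points on [-1, 1]), the new design point x0 = 2, the known error variance, and the function name `simulate_risks` are all assumptions made for illustration. With known variance, the size-0.05 test, AIC, and BIC each keep the slope when the squared standardized slope estimate exceeds a cutoff (about 3.84, 2, and log n, respectively). The "AIC_weights" entry is one simple combining scheme (weights proportional to exp(-AIC/2)) swapped in as a stand-in and should not be read as the combining method studied in the paper.

```python
import math
import random


def simulate_risks(b, n=50, x0=2.0, sigma=1.0, reps=20000, seed=0):
    """Monte Carlo squared-error risk for estimating the mean response at x0.

    Candidate models: intercept only, or intercept plus slope.
    All settings here are illustrative assumptions, not the paper's setup.
    """
    rng = random.Random(seed)
    # Assumed fixed design: n equally spaced points on [-1, 1], so x-bar is 0.
    x = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    sxx = sum(xi * xi for xi in x)
    a = 0.0
    target = a + b * x0  # true mean response at the new design point

    # Cutoffs on z2 = b_hat^2 * sxx / sigma^2 (known-variance simplification).
    cutoffs = {"test_0.05": 1.959964 ** 2, "AIC": 2.0, "BIC": math.log(n)}
    totals = {name: 0.0 for name in cutoffs}
    totals["always_full"] = 0.0
    totals["AIC_weights"] = 0.0

    for _ in range(reps):
        # With a centered design and normal errors, the least-squares
        # intercept and slope are independent normals, so draw them directly.
        a_hat = rng.gauss(a, sigma / math.sqrt(n))
        b_hat = rng.gauss(b, sigma / math.sqrt(sxx))
        z2 = b_hat ** 2 * sxx / sigma ** 2

        pred0 = a_hat               # intercept-only prediction
        pred1 = a_hat + b_hat * x0  # prediction with the slope included

        for name, c in cutoffs.items():
            pred = pred1 if z2 > c else pred0
            totals[name] += (pred - target) ** 2
        totals["always_full"] += (pred1 - target) ** 2

        # Combining weight on the larger model, proportional to exp(-AIC/2).
        w1 = 1.0 / (1.0 + math.exp(-(z2 - 2.0) / 2.0))
        combined = w1 * pred1 + (1.0 - w1) * pred0
        totals["AIC_weights"] += (combined - target) ** 2

    return {name: total / reps for name, total in totals.items()}


if __name__ == "__main__":
    # Sweep the true slope to see how the ranking of methods changes.
    for slope in (0.0, 0.2, 0.5, 1.0):
        print(slope, simulate_risks(slope))
```

Sweeping the slope from zero to clearly nonzero values shows how the risk of each rule depends on where the true slope lies, which is the regime the abstract singles out as difficult for selection.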

Type
Research Article
Copyright
© 2007 Cambridge University Press
