
AN IN-DEPTH LOOK AT HIGHEST POSTERIOR MODEL SELECTION

Published online by Cambridge University Press: 30 November 2007

Tanujit Dey
Affiliation:
Case Western Reserve University
Hemant Ishwaran
Affiliation:
Case Western Reserve University and Cleveland Clinic
J. Sunil Rao
Affiliation:
Case Western Reserve University

Abstract

We consider the properties of the highest posterior probability model in a linear regression setting. Under a spike and slab hierarchy we find that although highest posterior model selection is total risk consistent, it possesses hidden undesirable properties. One such property is a marked underfitting in finite samples, a phenomenon well noted for Bayesian information criterion (BIC) related procedures but not often associated with highest posterior model selection. Another concern is the substantial effect the prior has on model selection. We employ a rescaling of the hierarchy and show that the resulting rescaled spike and slab models mitigate the effects of underfitting because of a perfect cancellation of a BIC-like penalty term. Furthermore, by drawing upon an equivalence between the highest posterior model and the median model, we find that the effect of the prior is less influential on model selection, as long as the underlying true model is sparse. Nonsparse settings are, however, problematic. Using the posterior mean for variable selection instead of posterior inclusion probabilities avoids these issues.

Type
Research Article
Copyright
© 2008 Cambridge University Press
