COMBINING FORECASTING PROCEDURES: SOME THEORETICAL RESULTS

Published online by Cambridge University Press:  05 March 2004

Yuhong Yang
Affiliation:
Iowa State University

Abstract

We study some methods of combining procedures for forecasting a continuous random variable. Statistical risk bounds under the square error loss are obtained under distributional assumptions on the future given the current outside information and the past observations. The risk bounds show that the combined forecast automatically achieves the best performance among the candidate procedures up to a constant factor and an additive penalty term. In terms of the rate of convergence, the combined forecast performs as well as if the best candidate forecasting procedure were known in advance.
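As a rough illustration of what such a combined forecast can look like, the sketch below uses exponential weighting of past squared errors, a generic scheme from the aggregation literature. The abstract does not spell out the paper's algorithm, so this should not be read as the author's method. The function name `combine_forecasts`, the uniform starting weights, and the tuning parameter `eta` are assumptions made only for this example.

```python
import math


def _exp_weights(cum_loss, eta):
    """Turn cumulative losses into normalized exponential weights."""
    # Subtract the smallest loss before exponentiating to avoid underflow.
    base = min(cum_loss)
    raw = [math.exp(-eta * (loss - base)) for loss in cum_loss]
    total = sum(raw)
    return [r / total for r in raw]


def combine_forecasts(forecasts, outcomes, eta=1.0):
    """Sequentially combine candidate forecasts (illustrative sketch).

    forecasts[t][j] is candidate j's forecast for time t.
    outcomes[t] is the value realized at time t.
    eta is a hypothetical tuning parameter, not taken from the paper.

    Returns (combined, weights): the combined forecast at each time and
    the weight vector after the last observation.
    """
    n_candidates = len(forecasts[0])
    cum_loss = [0.0] * n_candidates  # cumulative squared error per candidate
    combined = []
    for preds, y in zip(forecasts, outcomes):
        # Weights depend only on losses observed before time t.
        weights = _exp_weights(cum_loss, eta)
        combined.append(sum(w * p for w, p in zip(weights, preds)))
        # Update each candidate's loss once the outcome is revealed.
        cum_loss = [loss + (p - y) ** 2 for loss, p in zip(cum_loss, preds)]
    return combined, _exp_weights(cum_loss, eta)


# Toy usage: candidate 0 is always right, candidate 1 is always off by 2.
toy_forecasts = [[1.0, 3.0] for _ in range(5)]
toy_outcomes = [1.0] * 5
toy_combined, toy_weights = combine_forecasts(toy_forecasts, toy_outcomes)
print(toy_combined)
print(toy_weights)
```

In the toy run the combined forecast starts at the plain average of the two candidates and moves toward the better one as its weight grows. This is the qualitative behavior the risk bounds describe: performance close to that of the best candidate, without knowing in advance which one it is.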

Empirical studies suggest that combining procedures can sometimes improve forecasting accuracy over the original procedures. Risk bounds are derived to theoretically quantify the potential gain and price of linearly combining forecasts for improvement. The result supports the empirical finding that it is not automatically a good idea to combine forecasts. Indiscriminate combining can degrade performance dramatically as a result of the large variability in estimating the best combining weights. An automated combining method is shown in theory to achieve a balance between the potential gain and the complexity penalty (the price of combining), to take advantage (if any) of sparse combining, and to maintain the best performance (in rate) among the candidate forecasting procedures if linear or sparse combining does not help.

This research was supported by U.S. National Security Agency Grant MDA9049910060 and U.S. National Science Foundation CAREER Grant DMS0094323. The author sincerely thanks three reviewers and Poti Giannakouros for their very valuable comments, which led to a substantial improvement of the paper.

Type
Research Article
Copyright
© 2004 Cambridge University Press
