
Globally convergent stochastic optimization with optimal asymptotic distribution

Published online by Cambridge University Press:  14 July 2016

Jürgen Dippon*
Affiliation:
Universität Stuttgart
* Postal address: Mathematisches Institut A, Universität Stuttgart, 70511 Stuttgart, Germany. E-mail address: dippon@mathematik.uni-stuttgart.de

Abstract

A stochastic gradient descent method is combined with a consistent auxiliary estimate to achieve global convergence of the recursion. Using step lengths that converge to zero more slowly than 1/n and averaging the trajectories yields the optimal convergence rate of 1/√n and the optimal variance of the asymptotic distribution. Possible applications include maximum likelihood estimation, regression analysis, training of artificial neural networks, and stochastic optimization.
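To illustrate the averaging idea described above, here is a minimal Python sketch of stochastic gradient descent with step lengths decaying more slowly than 1/n combined with trajectory averaging (Polyak–Ruppert averaging). The toy quadratic objective, the function names, and the constants are illustrative assumptions; the paper's consistent auxiliary estimate used to obtain global convergence is not reproduced here.

```python
import numpy as np

def averaged_sgd(noisy_grad, x0, n_steps=10_000, a=1.0, gamma=0.7, rng=None):
    """Averaged SGD sketch: step lengths a_n = a * n**(-gamma), 1/2 < gamma < 1.

    The steps decay more slowly than 1/n; the running average of the
    trajectory attains the 1/sqrt(n) rate with the optimal asymptotic
    variance under the usual local regularity assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    x_bar = np.zeros_like(x)
    for n in range(1, n_steps + 1):
        step = a * n ** (-gamma)           # slower than 1/n
        x = x - step * noisy_grad(x, rng)  # plain stochastic gradient step
        x_bar += (x - x_bar) / n           # running average of the trajectory
    return x, x_bar

# Toy example (assumed): minimize E[0.5 * (x - theta)^T H (x - theta)]
# from noisy gradient observations.
if __name__ == "__main__":
    theta = np.array([1.0, -2.0])
    H = np.array([[2.0, 0.3], [0.3, 1.0]])

    def noisy_grad(x, rng):
        return H @ (x - theta) + rng.normal(scale=1.0, size=x.shape)

    x_last, x_avg = averaged_sgd(noisy_grad, x0=np.zeros(2))
    print("last iterate:    ", x_last)
    print("averaged iterate:", x_avg)  # typically closer to theta
```

In this sketch the averaged iterate is usually markedly less noisy than the last iterate, reflecting the variance-optimality of averaging; the choice gamma = 0.7 is only one admissible value in (1/2, 1).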

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 1998 
