Importance Sampling for Failure Probabilities in Computing and Data Transmission

Published online by Cambridge University Press: 14 July 2016

Søren Asmussen
Affiliation: University of Aarhus
Postal address: Department of Mathematical Sciences, University of Aarhus, Ny Munkegade, DK-8000 Aarhus C, Denmark. Email address: asmus@imf.au.dk

Abstract

In this paper we study efficient simulation algorithms for estimating P(X > x), where X is the total time of a job with ideal time T that needs to be restarted after a failure. The main tool is importance sampling, where a good importance distribution is identified via an asymptotic description of the conditional distribution of T given X > x. If T ≡ t is constant, the problem reduces to the efficient simulation of geometric sums, and a standard algorithm involving a Cramér-type root, γ(t), is available. However, we also discuss an algorithm that avoids finding the root. If T is random, particular attention is given to T having either a gamma-like tail or a regularly varying tail, and to failures at Poisson times. Different types of conditional limit occur, in particular exponentially tilted Gumbel distributions and Pareto distributions. The algorithms based upon importance distributions for T using these asymptotic descriptions have bounded relative error as x→∞ when combined with the ideas used for a fixed t. Nevertheless, we give examples of algorithms carefully designed to enjoy bounded relative error that may provide little or no asymptotic improvement over crude Monte Carlo simulation when the computational effort is taken into account. To resolve this problem, an alternative algorithm using two-sided Lundberg bounds is suggested.

Type: Research Article
Copyright © Applied Probability Trust 2009
