Finite-size corrections to Poisson approximations of rare events in renewal processes

Published online by Cambridge University Press: 14 July 2016

John L. Spouge*
Affiliation:
National Library of Medicine, USA
* Postal address: National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA. Email address: spouge@nih.gov

Abstract

Consider a renewal process. The renewal events partition the process into i.i.d. renewal cycles. Assume that on each cycle, a rare event called ‘success’ can occur. Such successes lend themselves naturally to approximation by Poisson point processes. If each success occurs after a random delay, however, Poisson convergence may be relatively slow, because each success corresponds to a time interval, not a point. In 1996, Altschul and Gish proposed a finite-size correction to a particular approximation by a Poisson point process. Their correction is now used routinely (about once a second) when computers compare biological sequences, although it lacks a mathematical foundation. This paper generalizes their correction. For a single renewal process or several renewal processes operating in parallel, this paper gives an asymptotic expansion that contains in successive terms a Poisson point approximation, a generalization of the Altschul–Gish correction, and a correction term beyond that.
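
The effect described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's expansion and not the Altschul–Gish formula itself; it assumes a deliberately simple discrete model in which each of n unit-length cycles starts a success with probability p, and the success is counted only if it completes, after an integer delay D, within the n cycles. All names (`exact_no_success`, `poisson_point`, `poisson_corrected`, `delay_pmf`) and the parameter values are hypothetical. Under these assumptions the exact probability of no counted success can be compared with a plain Poisson point approximation and with one that shortens the window by the mean delay.

```python
import math


def delay_cdf(delay_pmf, k):
    """P(D <= k) for an integer-valued delay given as {delay: probability}."""
    return sum(prob for d, prob in delay_pmf.items() if d <= k)


def exact_no_success(n, p, delay_pmf):
    """Exact P(no success completes within n cycles) in the toy model.

    A success started with k cycles remaining (k = 0, ..., n - 1) is counted
    only if its delay D satisfies D <= k, which happens with probability
    p * P(D <= k). Cycles are independent, so the probabilities multiply.
    """
    prob = 1.0
    for k in range(n):
        prob *= 1.0 - p * delay_cdf(delay_pmf, k)
    return prob


def poisson_point(n, p):
    """Poisson point approximation that ignores the delay: exp(-p * n)."""
    return math.exp(-p * n)


def poisson_corrected(n, p, delay_pmf):
    """Poisson approximation with the window shortened by the mean delay.

    This is an assumption of the sketch: summing P(D <= k) over k gives
    n - E[D] whenever the delay is always shorter than n.
    """
    mean_delay = sum(d * prob for d, prob in delay_pmf.items())
    return math.exp(-p * (n - mean_delay))


# Hypothetical parameters: 200 cycles, rare successes, delays of 10 to 30.
n, p = 200, 0.005
delay_pmf = {10: 0.25, 20: 0.5, 30: 0.25}

exact = exact_no_success(n, p, delay_pmf)
naive = poisson_point(n, p)
corrected = poisson_corrected(n, p, delay_pmf)

print(f"exact     = {exact:.4f}")
print(f"naive     = {naive:.4f}")
print(f"corrected = {corrected:.4f}")
```

In this toy setting the shortened window tracks the exact value much more closely than the uncorrected approximation, which is the kind of finite-size effect the abstract refers to; the paper's own expansion is more general and is not reproduced here.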

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 2001 

References

Altschul, S. F. (1997). Sequence comparison and alignment. In DNA and Protein Sequence Analysis, eds. Bishop, M. J. and Rawlings, C. J. IRL Press, Oxford, pp. 137–167.
Altschul, S. (1999). Comments on ‘Gapped BLAST and PSI-BLAST: a new generation of protein database search programs’ by Altschul, S. F. et al. Scientist 13, 15.
Altschul, S. (1999). Private communication.
Altschul, S. F., and Gish, W. (1996). Local alignment statistics. In Computer Methods for Macromolecular Sequence Analysis (Methods in Enzymology 266), ed. Doolittle, R. F. Academic Press, London, pp. 460–480.
Altschul, S. F., and Koonin, E. V. (1998). Iterated profile searches with PSI-BLAST – a tool for discovery in protein databases. Trends Biochem. Sci. 23, 444–447.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. (1990). Basic Local Alignment Search Tool. J. Molec. Biol. 215, 403–410.
Altschul, S. F., Boguski, M. S., Gish, W. and Wootton, J. C. (1994). Issues in searching molecular sequence databases. Nature Genetics 6, 119–129.
Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–402.
Arratia, R., and Waterman, M. S. (1989). The Erdős–Rényi strong law for pattern matching with a given proportion of mismatches. Ann. Prob. 17, 1152–1169.
Arratia, R., Goldstein, L., and Gordon, L. (1989). Two moments suffice for Poisson approximations, the Chen–Stein method. Ann. Prob. 17, 9–25.
Barrett, C., Hughey, R., and Karplus, K. (1997). Scoring hidden Markov models. Comput. Appl. Biosci. 13, 191–199.
Bundschuh, R., and Hwa, T. (2000). An analytic study of the phase transition line in local sequence alignment with gaps. Discrete Appl. Math. 104, 113–142.
Chen, L. H. Y. (1975). Poisson approximation for dependent trials. Ann. Prob. 3, 534–545.
Çinlar, E. (1975). Introduction to Stochastic Processes. Prentice-Hall, Englewood, NJ.
Dayhoff, M. O., Schwartz, R. M., and Orcutt, B. C. (1978). A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure, Vol. 5, suppl. 3, ed. Dayhoff, M. O. National Biomedical Research Foundation, Silver Spring, MD, pp. 345–352.
De Bruijn, N. G. (1981). Asymptotic Methods in Analysis. Dover, New York.
Dembo, A., and Karlin, S. (1991). Strong limit-theorems of empirical distributions for large segmental exceedances of partial-sums of Markov variables. Ann. Prob. 19, 1756–1767.
Dembo, A., and Karlin, S. (1991). Strong limit-theorems of empirical functionals for large exceedances of partial-sums of i.i.d. variables. Ann. Prob. 19, 1737–1755.
Dembo, A., and Karlin, S. (1993). Central limit-theorems of partial-sums for large segmental values. Stoch. Proc. Appl. 45, 259–271.
Dembo, A., and Zeitouni, O. (1998). Large Deviations Techniques and Applications. Springer, New York.
Dembo, A., Karlin, S., and Zeitouni, O. (1994). Critical phenomena for sequence matching with scoring. Ann. Prob. 22, 1993–2021.
Dembo, A., Karlin, S., and Zeitouni, O. (1994). Limit distributions of maximal non-aligned two-sequence segmental score. Ann. Prob. 22, 2022–2039.
Doob, J. L. (1991). Measure Theory. Springer, New York.
Feller, W. (1971). An Introduction to Probability Theory and its Applications, Vol. 1. John Wiley, New York.
Feller, W. (1971). An Introduction to Probability Theory and its Applications, Vol. 2. John Wiley, New York.
Freedman, D. (1974). The Poisson approximation for dependent events. Ann. Prob. 2, 256–269.
Gribskov, M., McLachlan, A. D., and Eisenberg, D. (1987). Profile analysis: detection of distantly related proteins. Proc. Nat. Acad. Sci. USA 84, 4355–4358.
Grimmett, G. R., and Stirzaker, D. R. (1998). Probability and Random Processes. Oxford University Press.
Henikoff, S., and Henikoff, J. G. (1993). Performance evaluation of amino acid substitution matrices. Proteins 17, 49–61.
Hille, E. (1976). Analytic Function Theory, Vol. 1. Ginn, New York.
Iglehart, D. L. (1972). Extreme values in the GI/G/1 queue. Ann. Math. Statist. 43, 627–635.
Karlin, S., and Altschul, S. F. (1990). Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Nat. Acad. Sci. USA 87, 2264–2268.
Karlin, S., and Altschul, S. F. (1993). Applications and statistics for multiple high-scoring segments in molecular sequences. Proc. Nat. Acad. Sci. USA 90, 5873–5877.
Karlin, S., and Dembo, A. (1992). Limit distributions of maximal segmental score among Markov-dependent partial-sums. Adv. Appl. Prob. 24, 113–140.
Karlin, S., and Taylor, H. M. (1975). A First Course in Stochastic Processes. Academic Press, New York.
Karlin, S., Bucher, P., Brendel, V. and Altschul, S. F. (1991). Statistical-methods and insights for protein and DNA-sequences. Annual Rev. Biophys. Biophys. Chem. 20, 175–203.
Levinson, N., and Redheffer, R. M. (1970). Complex Variables. Holden-Day, San Francisco.
Mott, R. (2000). Accurate formula for p-values of gapped local sequence and profile alignments. J. Molec. Biol. 300, 649–659.
Needleman, S. B., and Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Molec. Biol. 48, 443–453.
Olsen, R., Bundschuh, R., and Hwa, T. (1999). Rapid assessment of extremal statistics for local alignment with gaps. In Proc. Int. Conf. Intelligent Systems for Molec. Biol., eds. Lengauer, T. et al. AAAI Press, Menlo Park, CA.
Pearson, W. R. (1995). Comparison of methods for searching protein sequence databases. Protein Sci. 4, 1145–1160.
Pearson, W. R. (1996). Effective protein sequence comparison. Meth. Enzymol. 266, 227–258.
Pólya, G. and Szegö, G. (1972). Problems and Theorems in Analysis, Vol. 1. Springer, New York.
Reinert, G., and Schbath, S. (1998). Compound Poisson and Poisson process approximations for occurrences of multiple words in Markov chains. J. Comput. Biol. 5, 223–253.
States, D. J., Gish, W., and Altschul, S. F. (1991). Improved sensitivity of nucleic acid database searches using application-specific scoring matrices. Methods 3, 66–70.
Stein, C. (1970). A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Prob., Vol. II, eds. Le Cam, L. M. et al. University of California Press, pp. 583–602.
Tanushev, M. S., and Arratia, R. (1997). Central limit theorem for renewal theory for several patterns. J. Comput. Biol. 4, 35–44.
Waterman, M. S., Gordon, L., and Arratia, R. (1987). Phase-transitions in sequence matches and nucleic-acid structure. Proc. Nat. Acad. Sci. USA 84, 1239–1243.
Williams, D. (1997). Probability with Martingales. Cambridge University Press.
Wolf, Y. (1999). Personal communication.
Wootton, J. C., and Federhen, S. (1993). Statistics of local complexity in amino-acid-sequences and sequence databases. Comput. Chem. 17, 149–163.