Asymptotics for local maximal stack scores with general loop penalty function

Published online by Cambridge University Press:  01 July 2016

Niels Richard Hansen*
Affiliation: University of Copenhagen
* Postal address: Department of Mathematical Sciences, Universitetsparken 5, DK-2100, Copenhagen Ø, Denmark. Email address: richard@math.ku.dk

Abstract

A stack is a structural unit in an RNA structure that is formed by pairs of hydrogen bonded nucleotides. Paired nucleotides are scored according to their ability to hydrogen bond. We consider stack/hairpin-loop structures for a sequence of independent and identically distributed random variables with values in a finite alphabet, and we show how to obtain an asymptotic Poisson distribution of the number of stack/hairpin-loop structures with a score exceeding a high threshold, given that we count in a proper, declumped way. From this result we obtain an asymptotic Gumbel distribution of the maximal stack score. We also provide examples focusing on the computation of constants that enter in the asymptotic distributions. Finally, we discuss the close relation to existing results for local alignment.
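
The step from the Poisson count to the Gumbel limit can be sketched as follows. This is a standard heuristic, not the paper's precise statement: the symbols $M_n$, $N_n(t)$, $\lambda_n(t)$, $K$, $c_n$ and $\theta$ are notation assumed here for illustration, and the exponential form of the mean is an assumption, not something given in the abstract.

```latex
% Assumed notation (not from the abstract):
%   M_n      maximal stack score for a sequence of length n
%   N_n(t)   declumped count of structures with score exceeding t
%   K, theta positive constants; c_n a normalizing sequence
\begin{align*}
\{M_n \le t\} &= \{N_n(t) = 0\}, \\
P(M_n \le t) &= P(N_n(t) = 0) \approx \exp\bigl(-\lambda_n(t)\bigr)
  \quad \text{if } N_n(t) \text{ is approximately Poisson with mean } \lambda_n(t), \\
\lambda_n(t) = K c_n e^{-\theta t}
  &\implies P\Bigl(M_n \le \tfrac{\log(K c_n)}{\theta} + x\Bigr) \approx \exp\bigl(-e^{-\theta x}\bigr).
\end{align*}
```

Under these assumptions, the constants $K$ and $\theta$ play the role of the constants that the abstract says enter the asymptotic distributions.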

Type: General Applied Probability
Copyright © Applied Probability Trust 2007
