The statistical analysis of direct repeats in nucleic acid sequences

Published online by Cambridge University Press:  14 July 2016

Rakesh Shukla*
Affiliation:
University of Cincinnati
R. C. Srivastava**
Affiliation:
The Ohio State University
*Postal address: Institute of Environmental Health, Division of Biostatistics, University of Cincinnati Medical Center, Wherry Hall (#183), Cincinnati, OH 45267, USA.
**Postal address: Department of Statistics, The Ohio State University, Columbus, OH 43210, USA.

Abstract

Sequence symmetries in DNA and RNA are being discovered at an increasing rate. Conjectures and hypotheses are being proposed for their possible structural and functional role in the nucleic acid. In this paper a probability model is studied which evaluates the probabilities of various repeats occurring by chance alone. Expressions are derived for the mean and variance of the statistics employed. The central limit theorem for dependent trials is used to obtain the asymptotic distributions. An indication is given of how to use the model to search for various gene amplification events in the evolutionary history of the sequences.
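
As a rough illustration of what "repeats occurring by chance alone" means, the sketch below counts direct repeats in simulated sequences and compares the average with a closed-form expectation. It is not the paper's model. It assumes independent, equally likely bases, and it takes the statistic to be the number of position pairs that carry the same word of length k. The function names are hypothetical. Under those assumptions every pair of positions matches with probability 4^(-k), so linearity of expectation gives the mean directly.

```python
import random
from collections import Counter
from math import comb


def count_repeat_pairs(seq, k):
    """Count pairs of start positions (i < j) whose length-k words are identical."""
    words = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # A word seen c times contributes c-choose-2 matching pairs.
    return sum(comb(c, 2) for c in words.values())


def expected_repeat_pairs(n, k, alphabet_size=4):
    """Expected pair count for a length-n sequence of independent, equally likely bases.

    Assumption: uniform base frequencies, so each of the C(n-k+1, 2) pairs
    (overlapping or not) matches with probability alphabet_size ** -k.
    """
    return comb(n - k + 1, 2) / alphabet_size ** k


def simulated_mean(n, k, trials, seed=0):
    """Average the observed pair count over randomly generated sequences."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seq = "".join(rng.choice("ACGT") for _ in range(n))
        total += count_repeat_pairs(seq, k)
    return total / trials


if __name__ == "__main__":
    n, k = 200, 6
    print("expected :", expected_repeat_pairs(n, k))
    print("simulated:", simulated_mean(n, k, trials=500))
```

An observed count far above the chance expectation for a real sequence would be the kind of signal the abstract refers to. Judging how far is "far" requires the variance and limiting distribution, which this sketch does not attempt.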

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 1985 
