Distribution of the number of words with a prescribed frequency and tests of randomness
Published online by Cambridge University Press: 01 July 2016
Abstract
This paper investigates properties of statistical procedures based on the numbers of different patterns, using generating functions for the probabilities of a prescribed number of occurrences of given patterns in a random text. Asymptotic formulae are derived for the expected value of the number of words occurring a given number of times and for the covariance matrix of these counts. The form of the optimal linear test based on these statistics is established. These problems arise in testing the randomness of a string of binary bits, DNA sequencing, source coding, synchronization, and quality control protocols, among other areas. In particular, the probabilities of repeated (overlapping) patterns are important in information theory (the second-order properties of relative frequencies) and in molecular biology (finding patterns with unexpectedly low or high frequencies).
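To make the central statistic concrete, the following Python sketch counts, for a given text, how many words of length `m` occur exactly `r` times when overlapping occurrences are counted. It is only an empirical illustration, not the generating-function method of the paper; the function names `pattern_counts` and `words_with_frequency` are hypothetical, and the text is assumed to contain only symbols from the stated alphabet.

```python
from collections import Counter


def pattern_counts(text, m):
    """Count overlapping occurrences of each length-m word seen in text."""
    return Counter(text[i:i + m] for i in range(len(text) - m + 1))


def words_with_frequency(text, m, r, alphabet="01"):
    """Number of length-m words over `alphabet` occurring exactly r times.

    Assumes every symbol of `text` belongs to `alphabet` (illustrative
    assumption, needed only for the r == 0 case).
    """
    counts = pattern_counts(text, m)
    if r == 0:
        # Words that never appear: all possible words minus the observed ones.
        return len(alphabet) ** m - len(counts)
    return sum(1 for c in counts.values() if c == r)


if __name__ == "__main__":
    text = "0101101"
    # Length-2 windows: 01, 10, 01, 11, 10, 01
    for r in range(4):
        print(r, words_with_frequency(text, 2, r))
```

For the binary string `0101101` and `m = 2`, the word `01` occurs three times, `10` twice, `11` once, and `00` never, so each frequency from 0 to 3 is attained by exactly one word. A randomness test of the kind the abstract describes would compare such counts with their expected values under the random-text model.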
- Type: General Applied Probability
- Copyright: Copyright © Applied Probability Trust 2002