Article contents
Exact distribution of word counts in shuffled sequences
Published online by Cambridge University Press: 01 July 2016
Abstract
In DNA sequences, specific words may take on biological functions as marker or signalling sequences. These may often be identified by frequent-word analyses as being particularly abundant. Accurate statistics is needed to assess the statistical significance of these word frequencies. The set of shuffled sequences - letter sequences having the same k-word composition, for some choice of k, as the sequence being analysed - is considered the most appropriate sample space for analysing word counts. However, little is known about these word counts. Here we present exact formulae for word counts in shuffled sequences.
Keywords
MSC classification
- Type
- General Applied Probability
- Information
- Copyright
- Copyright © Applied Probability Trust 2006
References
- 1
- Cited by