Published online by Cambridge University Press: 14 July 2016
We give an efficient method based on minimal deterministic finite automata for computing the exact distribution of the number of occurrences and coverage of clumps (maximal sets of overlapping words) of a collection of words. In addition, we compute probabilities for the number of h-clumps, word groupings in which gaps of length at most h between occurrences of words are allowed. The method facilitates the computation of p-values for testing procedures. A word is allowed to contain other words of the collection, which makes the computation more general but also more difficult. The underlying sequence is assumed to be Markovian of an arbitrary order.
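To make the notion of a clump count concrete, the following is a minimal sketch of an exact computation in a much simpler setting than the one treated here. It is not the minimal-automaton construction of the paper: it is a plain dynamic program over hypothetical states (recent suffix, gap since the last occurrence, clump count). It assumes a first-order Markov sequence and, unlike the general method, that no word of the collection contains another. The function name `clump_count_distribution` and its parameters are illustrative only.

```python
from collections import defaultdict


def clump_count_distribution(words, n, init, trans):
    """Exact distribution of the number of clumps in a sequence of length n.

    Sketch under simplifying assumptions (not the paper's general method):
      * first-order Markov chain: init[a] = P(X_1 = a),
        trans[a][b] = P(X_{t+1} = b | X_t = a);
      * no word in `words` is a substring of another word, so at most one
        word ends at any position and clumps can never merge.
    Returns a dict {number_of_clumps: probability}.
    """
    words = set(words)
    for u in words:
        for v in words:
            if u != v and u in v:
                raise ValueError("sketch requires that no word contains another")

    m = max(len(w) for w in words)   # longest word length
    keep = max(m - 1, 1)             # history needed to detect words and to
                                     # know the previous symbol

    # State: (suffix, gap, count) -> probability.
    # gap = positions since the last occurrence ended, capped at m;
    # gap == m means a new occurrence cannot overlap the previous one.
    dist = {("", m, 0): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for (suffix, gap, count), p in dist.items():
            probs = trans[suffix[-1]] if suffix else init
            for ch, q in probs.items():
                if q == 0.0:
                    continue
                text = suffix + ch
                new_gap = min(gap + 1, m)
                new_count = count
                for w in words:
                    if text.endswith(w):
                        # The occurrence starts after the previous one ended,
                        # so it opens a new clump; otherwise it extends the
                        # current clump.
                        if new_gap >= len(w):
                            new_count += 1
                        new_gap = 0
                        break
                nxt[(text[-keep:], new_gap, new_count)] += p * q
        dist = nxt

    result = defaultdict(float)
    for (_, _, count), p in dist.items():
        result[count] += p
    return dict(sorted(result.items()))


if __name__ == "__main__":
    # Toy example: fair coin over {a, b}, single word "aa", length 5.
    uniform = {"a": 0.5, "b": 0.5}
    print(clump_count_distribution({"aa"}, 5, uniform,
                                   {"a": uniform, "b": uniform}))
```

In the toy example, the only length-5 sequence with two clumps of "aa" is "aabaa", so the probability of two clumps is 1/32. A tail sum of the returned distribution gives the kind of p-value mentioned above. Allowing words that contain other words breaks the "at most one word ends here, and clumps never merge" shortcut used in this sketch.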