Fast circular dictionary-matching algorithm

Published online by Cambridge University Press:  11 May 2015

TANVER ATHAR
Affiliation:
Department of Informatics, King's College London, London, UK
CARL BARTON
Affiliation:
Department of Informatics, King's College London, London, UK
WIDMER BLAND
Affiliation:
Department of Computing and Software, McMaster University, Hamilton, Canada
JIA GAO
Affiliation:
Department of Informatics, King's College London, London, UK
COSTAS S. ILIOPOULOS
Affiliation:
Department of Informatics, King's College London, London, UK
CHANG LIU
Affiliation:
Department of Informatics, King's College London, London, UK
SOLON P. PISSIS
Affiliation:
Department of Informatics, King's College London, London, UK

Abstract

Circular string matching is a problem that arises naturally in many contexts. It consists of finding all occurrences of the rotations of a pattern of length $m$ in a text of length $n$. There exist optimal worst- and average-case algorithms for circular string matching. Here, we present a suboptimal average-case algorithm for circular string matching requiring time $\mathcal{O}(n)$ and space $\mathcal{O}(m)$. The importance of our contribution is underlined by the fact that the proposed algorithm can be easily adapted to deal with circular dictionary matching. In particular, we show how the circular dictionary-matching problem can be solved in average-case time $\mathcal{O}(n + M)$ and space $\mathcal{O}(M)$, where $M$ is the total length of the dictionary patterns, assuming that the shortest pattern is sufficiently long. The presented average-case algorithms and other worst-case approaches were also implemented. Experimental results, using real and synthetic data, demonstrate that the implementation of the presented algorithms can accelerate the computations by more than a factor of two compared to the corresponding implementations of other approaches.
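
To make the problem statement concrete, the following Python sketch gives a naive baseline for circular string matching and its dictionary variant. It is not the algorithm presented in the paper: it simply enumerates the rotations of each pattern from the doubled pattern and compares them against every text window, which takes $\mathcal{O}(nm)$ time and $\mathcal{O}(m^2)$ space per pattern. The function names `rotations`, `circular_match` and `circular_dictionary_match` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Set


def rotations(pattern: str) -> Set[str]:
    """Return the distinct rotations of `pattern`.

    Every rotation of a string x of length m is a length-m
    substring of the doubled string xx.
    """
    m = len(pattern)
    doubled = pattern + pattern
    return {doubled[i:i + m] for i in range(m)}


def circular_match(pattern: str, text: str) -> List[int]:
    """Return the start positions in `text` of all rotations of `pattern`.

    Naive baseline (an assumption for illustration, not the paper's method):
    each of the n - m + 1 text windows is looked up in the rotation set.
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    rots = rotations(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] in rots]


def circular_dictionary_match(patterns: Iterable[str], text: str) -> Dict[str, List[int]]:
    """Run circular matching for every pattern in a dictionary of patterns."""
    return {p: circular_match(p, text) for p in patterns}


if __name__ == "__main__":
    text = "xbcaabcy"
    print(circular_match("abc", text))
    print(circular_dictionary_match(["abc", "ax"], text))
```

For the text `xbcaabcy` and the pattern `abc`, the rotation `bca` occurs at position 1 and `abc` itself at position 4 (0-indexed). The pattern `ax` has no rotation occurring in the text.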

Type
Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © Cambridge University Press 2015
