Book contents
- Frontmatter
- Contents
- Preface
- Part I Formal Background
- Part II From Theory to Practice
- 7 The C(M) language
- 8 C(M) Implementation of Finite-State Devices
- 9 The Aho–Corasick Algorithm
- 10 The Minimal Deterministic Finite-State Automaton for a Finite Language
- 11 Constructing Finite-State Devices for Text Rewriting
- References
- Index
10 - The Minimal Deterministic Finite-State Automaton for a Finite Language
from Part II - From Theory to Practice
Published online by Cambridge University Press: 29 July 2019
Summary
A fundamental task in natural language processing is the efficient representation of lexica. Computationally, a lexicon should be stored in a form that supports fast access to its entries while keeping space requirements small. A standard method is to represent a lexicon as a minimal deterministic (classical) finite-state automaton. One way to obtain this representation is to first build the trie of the lexicon and then minimize it. However, the intermediate trie is in general much larger than the resulting minimal automaton. A much better strategy is therefore to use a specialized algorithm that computes the minimal deterministic automaton directly and incrementally. In this chapter we describe such a procedure.
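The chapter's own procedure is not reproduced on this page, so the following is only a minimal sketch of one well-known way to build such an automaton incrementally. It assumes the words arrive in strictly increasing lexicographic order, and it is written in Python rather than the book's C(M) language. All names (`State`, `build_minimal_dfa`, `accepts`, `count_states`) are hypothetical and chosen for illustration. The idea is that once a new word diverges from the previous one, the states on the previous word's unshared suffix can never change again, so they are immediately merged with equivalent states kept in a register. The full trie is therefore never held in memory.

```python
class State:
    """A DFA state: a finality flag plus labelled outgoing edges."""

    def __init__(self):
        self.final = False
        self.edges = {}  # character -> State

    def signature(self):
        # Two states are equivalent when they agree on finality and on
        # their (label, target) pairs. Targets are already canonical
        # (registered) states, so comparing their identities is enough.
        return (self.final,
                tuple(sorted((c, id(t)) for c, t in self.edges.items())))


def build_minimal_dfa(sorted_words):
    """Build a minimal acyclic DFA from strictly increasing words."""
    root = State()
    register = {}   # signature -> canonical state
    path = [root]   # path[i] is the state reached after i chars of prev
    prev = None

    def minimize_down_to(depth, word):
        # Process states below `depth` on the path of `word`, deepest
        # first, replacing each by its registered equivalent if any.
        for i in range(len(path) - 1, depth, -1):
            child = path[i]
            canon = register.setdefault(child.signature(), child)
            path[i - 1].edges[word[i - 1]] = canon
        del path[depth + 1:]

    for word in sorted_words:
        k = 0
        if prev is not None:
            if word <= prev:
                raise ValueError("words must be strictly increasing")
            # Length of the common prefix of prev and word.
            while k < len(prev) and k < len(word) and prev[k] == word[k]:
                k += 1
            # The suffix of prev beyond the common prefix is now frozen.
            minimize_down_to(k, prev)
        # Append fresh states for the remaining suffix of word.
        for ch in word[k:]:
            nxt = State()
            path[-1].edges[ch] = nxt
            path.append(nxt)
        path[-1].final = True
        prev = word

    if prev is not None:
        minimize_down_to(0, prev)
    return root


def accepts(root, word):
    """Return True if the automaton rooted at `root` accepts `word`."""
    state = root
    for ch in word:
        state = state.edges.get(ch)
        if state is None:
            return False
    return state.final


def count_states(root):
    """Count the states reachable from `root`."""
    seen, stack = {id(root)}, [root]
    while stack:
        for target in stack.pop().edges.values():
            if id(target) not in seen:
                seen.add(id(target))
                stack.append(target)
    return len(seen)
```

For the four words `tap`, `taps`, `top`, `tops`, the trie has eight states, whereas this construction yields five, because the suffixes after `ta` and `to` are shared. The register here is a plain dictionary keyed by state signature, which keeps the sketch short. A production implementation would likely use a more compact state encoding.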
- Type: Chapter
- Book: Finite-State Techniques: Automata, Transducers and Bimachines, pp. 253-278
- Publisher: Cambridge University Press
- Print publication year: 2019