Preparing words in speech production is normally a fast and
accurate process. We generate words at two or three per second in fluent
conversation, and overtly naming a clear picture of an object can
easily be initiated within 600 msec after picture onset. The
underlying process, however, is exceedingly complex. The theory
reviewed in this target article analyzes this process as staged and
feedforward. After a first stage of conceptual preparation, word
generation proceeds through lexical selection, morphological and
phonological encoding, phonetic encoding, and articulation itself. In
addition, the speaker exerts some degree of output control by
monitoring self-produced internal and overt speech. The core
of the theory, ranging from lexical selection to the initiation of
phonetic encoding, is captured in a computational model called
WEAVER++. Both the theory and the computational
model have been developed in interaction with reaction time
experiments, particularly in picture naming or related word production
paradigms, with the aim of accounting for the real-time processing in
normal word production. A comprehensive review of theory, model, and
experiments is presented. The model can handle some of the main
observations in the domain of speech errors (the major empirical
domain for most other theories of lexical access), and the theory
opens new ways of approaching the cerebral organization of speech
production by way of high-temporal-resolution imaging.