Published online by Cambridge University Press: 26 March 2004
The E-Z Reader model (Reichle et al. 1998; 1999) provides a theoretical framework for understanding how word identification, visual processing, attention, and oculomotor control jointly determine when and where the eyes move during reading. In this article, we first review what is known about eye movements during reading. Then we provide an updated version of the model (E-Z Reader 7) and describe how it accounts for basic findings about eye movement control in reading. We then review several alternative models of eye movement control in reading, discussing both their core assumptions and their theoretical scope. On the basis of this discussion, we conclude that E-Z Reader provides the most comprehensive account of eye movement control during reading. Finally, we provide a brief overview of what is known about the neural systems that support the various components of reading, and suggest how the cognitive constructs of our model might map onto this neural architecture.
1. Many models of word identification have been proposed (Brown 1991; Bullinaria 1997; McClelland & Rumelhart 1981; Paap et al. 1982; Plaut et al. 1996; Seidenberg 1989; Seidenberg & McClelland 1989) to explain how orthography maps onto phonology and/or meaning, and how this process is affected by lexical variables (e.g., normative frequency, grapheme-phoneme regularity, etc.). Unfortunately, these models are generally limited in two ways. First, the entry point into these models is usually some highly abstract orthographic representation that bears little resemblance to the features that one might expect to be encoded by the visual system (e.g., such representations ignore the fact that retinal acuity is not homogeneous but declines with distance from the fovea). Second, the models are generally fit to data from paradigms other than natural reading (e.g., lexical decision latencies). The models therefore say very little about the relationships among vision, eye movements, and word identification. Two interesting exceptions are McClelland's (1986) programmable blackboard model of reading and Shillcock et al.'s (2000) split processing model. The former was designed to examine how fixation locations and visual acuity restrictions affect the model's word recognition performance; similarly, the split processing model was designed to examine how bisection of the visual field (and hence of words) by the two cerebral hemispheres might explain why words are identified most rapidly when they are fixated near their centers.
2. We did not have a deep reason for choosing the name of our model. “E-Z Reader” was the name of a fictional character in The Electric Company, a U.S. children's educational television program, and was clearly a spoof on the title of the movie Easy Rider.
3. Our discussion of parafoveal preview effects pertains to the processing of English. In Hebrew, by contrast, there is some recent evidence (Deutsch et al. 2000; 2003) that morphological previews (in the form of the root morpheme, which is distributed throughout the word) do provide preview benefit effects.
4. There is currently some disagreement regarding the extent to which the duration of a fixation prior to a skip is inflated. Although some studies have reported such an effect (Pollatsek et al. 1986; Reichle et al. 1998), others have reported null effects (Engbert et al. 2002; Radach & Heller 2000). In a very recent examination, we found that fixations prior to a skip were inflated by about 23 msec.
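The inflation discussed in this note is a difference between two conditional means. The sketch below is only an illustration: it assumes a hypothetical record format of (fixation duration, whether the next word was skipped), and it operationalizes "inflation" as the mean duration of fixations followed by a skip minus the mean duration of fixations followed by a non-skip. The toy numbers are invented for the example and are not data from any of the cited studies.

```python
from statistics import mean

# Hypothetical record: (fixation duration in msec, whether the saccade that
# ended this fixation skipped the next word).
Fixation = tuple[float, bool]


def pre_skip_inflation(fixations: list[Fixation]) -> float:
    """Mean duration of fixations followed by a skip minus the mean duration
    of fixations followed by a non-skip (one simple operational definition)."""
    before_skip = [dur for dur, skipped in fixations if skipped]
    before_no_skip = [dur for dur, skipped in fixations if not skipped]
    if not before_skip or not before_no_skip:
        raise ValueError("need at least one fixation of each kind")
    return mean(before_skip) - mean(before_no_skip)


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from any study.
    toy = [(250.0, True), (270.0, True), (230.0, False), (240.0, False)]
    print(pre_skip_inflation(toy))  # 25.0
```

A real analysis would also have to control for properties of the fixated and skipped words (e.g., length and frequency), which can differ between the two kinds of fixation; the sketch shows only the basic contrast.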
5. There is some dispute concerning the influence of “higher order” variables on where readers fixate. For example, Lavigne et al. (2000) reported that the eyes moved farther into a word when that word was both high-frequency and predictable from the prior context. However, Rayner et al. (2001) and Vonk et al. (2000) found no such effect. In addition, Underwood et al. (1990; see also Hyönä et al. 1989) reported that the eyes moved farther into words when the informative part of the word was at the end of the word. But Rayner and Morris (1992) and Hyönä (1995b) were unable to replicate this finding. On the other hand, there appears to be general agreement that an orthographically irregular letter cluster at the beginning of a word causes the eyes’ initial landing position to deviate toward the beginning of the word (Beauvillain & Doré 1998; Beauvillain et al. 1996; Hyönä 1995b).
6. A single set of parameter values was used in all of the simulations reported in this paper. These values were estimated by completing multiple grid searches of the parameter space to find the set that yielded the best overall fit to the Schilling et al. (1998) sentence corpus. For a complete description of our grid-search procedure, see the Appendix of Reichle et al. (1998).
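The idea of a grid search can be sketched as follows. This is not the procedure described in the Appendix of Reichle et al. (1998). It is a generic exhaustive search under stated assumptions: the fit criterion is taken to be root-mean-square deviation (RMSD), and `toy_model` is a hypothetical stand-in for a model simulation that maps parameter values to predicted measures.

```python
import itertools
import math
from typing import Callable, Sequence


def rmsd(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square deviation between predicted and observed values."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )


def grid_search(
    simulate: Callable[..., list[float]],
    grid: dict[str, Sequence[float]],
    observed: Sequence[float],
) -> tuple[dict[str, float], float]:
    """Evaluate every combination of parameter values in `grid` and return
    the combination whose simulated output has the lowest RMSD."""
    names = list(grid)
    best_params: dict[str, float] = {}
    best_err = math.inf
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        err = rmsd(simulate(**params), observed)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


def toy_model(a: float, b: float) -> list[float]:
    # Hypothetical stand-in for a simulation: a straight line over five conditions.
    return [a + b * x for x in range(5)]


if __name__ == "__main__":
    observed = [1.0, 3.0, 5.0, 7.0, 9.0]
    coarse = {"a": [0, 1, 2], "b": [1, 2, 3]}
    print(grid_search(toy_model, coarse, observed))  # ({'a': 1, 'b': 2}, 0.0)
```

One way to carry out "multiple" grid searches is to repeat the search over a finer grid centered on the best values from the previous pass. Because the number of combinations grows multiplicatively with the number of parameters, coarse-to-fine passes keep the cost manageable.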
7. Strictly speaking, Equation 1 produces word length effects (holding constant the eccentricity of the center of the word) only if the word straddles the fixation point. We used the arithmetic mean of the absolute distances in these formulas for computational simplicity. However, if this were changed to some other combination rule (e.g., the geometric mean), then the equation would predict word length effects in all cases.
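The arithmetic behind this note can be checked directly. The sketch reproduces only the averaging step, not Equation 1 itself. It assumes that letters are one character space apart and that distances are measured in character spaces from the fixation point. When the whole word lies on one side of fixation, the arithmetic mean of the letter distances equals the distance of the word's center, whatever the word's length. When the word straddles fixation, the absolute values "fold" around zero and longer words get a larger mean. The geometric mean, by contrast, varies with length even when the word lies off fixation.

```python
import math


def letter_distances(center: float, length: int) -> list[float]:
    """Absolute distance (in character spaces) of each letter from the
    fixation point, for a word whose center lies `center` characters to the
    right of fixation. Letters are assumed to be one character apart."""
    first = center - (length - 1) / 2
    return [abs(first + i) for i in range(length)]


def arithmetic_mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)


def geometric_mean(xs: list[float]) -> float:
    return math.prod(xs) ** (1 / len(xs))


if __name__ == "__main__":
    # Word entirely to the right of fixation: length does not matter.
    print(arithmetic_mean(letter_distances(5, 3)))  # 5.0
    print(arithmetic_mean(letter_distances(5, 5)))  # 5.0
    # Word straddling fixation: the longer word has a larger mean distance.
    print(arithmetic_mean(letter_distances(1, 3)))  # 1.0
    print(arithmetic_mean(letter_distances(1, 5)))  # 1.4
    # Geometric mean: length matters even when the word lies off fixation.
    print(geometric_mean(letter_distances(5, 3)), geometric_mean(letter_distances(5, 5)))
```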
8. Frequency and predictability are not the only (nor necessarily the best) predictors of the time needed to identify a word in text. One problem with using frequency is that, even if the number of times a reader sees a given word in print were a perfect predictor of the time to identify the word, the Francis and Kučera (1982) norms (and other norms) are derived from corpora that are unlikely to be representative of the texts that most readers encounter. (Another limitation of the Francis & Kučera norms is that they are derived from a fairly small corpus: only one million words.) Likewise, the predictability norms are also very crude estimates of how sentence context affects “on-line” lexical processing; in contrast to what actually happens during natural reading, the readers in these cloze-task studies have no visual information about the target words, but unlimited time to use all of the words in the sentence prior to the targets to guess their identities. Finally, the time needed to identify a word is likely to be a function of many other variables, including its part of speech, its concreteness, and the frequency with which it is encountered in spoken language. In summary, then, our decision to use frequency and predictability was not based on any a priori belief that these variables provide a complete explanation of lexical processing during reading. Instead, we are using them because they are known to produce significant effects in reading, and because they are clearly important determinants of word identification speed (i.e., how often a reader has seen the word before and how much top-down influence there is on the word).
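The two norms discussed in this note are simple proportions, and a short sketch shows how coarse they are. A cloze predictability is the proportion of cloze-task responses that match the target word. A frequency norm is a corpus count, usually expressed per million words. Two details here are illustrative assumptions rather than a description of how any particular norms were built: responses are matched to the target ignoring case and surrounding whitespace, and the function names are hypothetical.

```python
def cloze_probability(target: str, responses: list[str]) -> float:
    """Proportion of cloze-task responses that match the target word.
    Matching ignores case and surrounding whitespace (an assumption)."""
    if not responses:
        raise ValueError("no responses")
    wanted = target.strip().lower()
    hits = sum(1 for r in responses if r.strip().lower() == wanted)
    return hits / len(responses)


def per_million(count: int, corpus_size: int) -> float:
    """Convert a raw corpus count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


if __name__ == "__main__":
    print(cloze_probability("dog", ["dog", "cat", "Dog ", "puppy"]))  # 0.5
    print(per_million(12, 1_000_000))  # 12.0
    print(per_million(12, 4_000_000))  # 3.0
```

In a one-million-word corpus the raw count and the per-million value coincide, so every word rarer than one occurrence per million is estimated as either 0 or 1. This is one concrete way in which a small corpus limits the resolution of the frequency estimates for low-frequency words.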
9. In the current version of the model, for simplicity, attentional processing of word n+1 (or of words in general) is assumed to begin only when early visual processing of the entire word is complete. We are currently exploring versions of the model in which this assumption is relaxed, and attentional processing can begin when the early visual processing of parts of words is complete.
10. In our model, both the early pre-attentive visual processing and the non-labile stage of saccadic programming are halted during actual saccades. The former assumption was made because there is evidence that virtually no visual information is extracted during eye movements (Ishida & Ikeda 1989; Wolverton & Zola 1983). The latter assumption was necessary to ensure that a saccade could not be initiated while the eyes were already in motion. Note, however, that lexical processing does continue during saccades (Irwin 1998).
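The bookkeeping implied by "halted during actual saccades" can be sketched as a clock that stops whenever the eyes are in motion. The function below is hypothetical and is not taken from the E-Z Reader code. It computes when a process needing a fixed amount of work (in msec) finishes if it makes no progress during any saccade interval, given as (onset, offset) pairs. It assumes the saccade intervals do not overlap. A process that continues during saccades, such as lexical processing, would simply finish at start + duration.

```python
def completion_time(start: float, duration: float,
                    saccades: list[tuple[float, float]]) -> float:
    """Time at which a process needing `duration` msec of work, begun at
    `start`, finishes if it is paused during every saccade interval
    (onset, offset). Saccade intervals are assumed not to overlap."""
    t, remaining = start, duration
    for onset, offset in sorted(saccades):
        if offset <= t:
            continue  # this saccade ended before the process was running
        if onset > t:
            # Work proceeds until the saccade begins (or the process finishes).
            work = min(remaining, onset - t)
            t += work
            remaining -= work
            if remaining == 0:
                return t
        # The process is paused for the rest of the saccade.
        t = max(t, offset)
    return t + remaining


if __name__ == "__main__":
    print(completion_time(0, 100, [(50, 80)]))    # 130: paused for 30 msec
    print(completion_time(0, 100, [(200, 230)]))  # 100: done before the saccade
    print(completion_time(60, 100, [(50, 80)]))   # 180: starts mid-saccade
```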
11. Figure 6 indicates that the model underestimates the durations of single fixations. This problem stems from our increased estimate of the time needed to complete the labile stage of saccadic programming (i.e., t(M1) = 187 msec). Because this “competitor” takes longer to complete the “race” that determines whether or not a word will be refixated (i.e., the race between L1 and M1), the predicted duration of the first of two or more fixations is slightly too long, as indicated by the fact that the predicted first-fixation durations are similar in length to the single-fixation durations. This also causes the single-fixation durations for lower-frequency words to be a bit too short. We don't think this is a major conceptual problem, as the primary goal of our simulations was to fit first-fixation durations and gaze durations rather than single-fixation durations. The problem seems fixable, however, by reducing t(M1) a bit and increasing the effect of frequency on the first stage of lexical access a bit. These changes shouldn't produce any catastrophic effects on other aspects of the fit, although the gaze durations may not fit quite as well as in the current simulation.
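The race described in this note can be illustrated with a small Monte Carlo sketch. The sketch rests on several simplifying assumptions, all illustrative. Both L1 and M1 durations are drawn from gamma distributions with a coefficient of variation of 0.22. A refixation occurs only when M1 finishes before L1. The L1 mean of 180 msec is hypothetical; only the 187 msec value of t(M1) comes from this note. Under these assumptions, a longer t(M1) makes refixations less likely, and when they do occur the winning M1 durations are longer.

```python
import random


def gamma_duration(rng: random.Random, mean: float, cv: float) -> float:
    """Sample a gamma-distributed duration with the given mean (msec) and
    coefficient of variation (SD / mean)."""
    shape = 1.0 / cv ** 2
    scale = mean / shape  # gamma mean = shape * scale
    return rng.gammavariate(shape, scale)


def refixation_race(mean_l1: float, mean_m1: float, cv: float = 0.22,
                    n: int = 20_000, seed: int = 1) -> tuple[float, float]:
    """Return (probability that M1 beats L1, mean M1 duration on those trials).
    Assumes a refixation occurs only when M1 finishes before L1."""
    rng = random.Random(seed)
    winning_m1 = []
    for _ in range(n):
        l1 = gamma_duration(rng, mean_l1, cv)
        m1 = gamma_duration(rng, mean_m1, cv)
        if m1 < l1:  # refixation program wins the race
            winning_m1.append(m1)
    p = len(winning_m1) / n
    mean_win = sum(winning_m1) / len(winning_m1) if winning_m1 else float("nan")
    return p, mean_win


if __name__ == "__main__":
    for t_m1 in (150.0, 187.0):
        p, mean_win = refixation_race(180.0, t_m1)
        print(f"t(M1)={t_m1:.0f}: P(refixation)={p:.2f}, mean winning M1={mean_win:.0f} msec")
```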
12. We did not actually examine the landing site distributions in the Schilling et al. (1998) data because there were too few observations and because the properties of the distributions that we wanted to simulate are quite robust and have been reported in several places (e.g., McConkie et al. 1988; 1991; Rayner et al. 1996).
13. Interestingly, Vitu et al. (2001) recently reported an inverted optimal viewing position effect in reading in which readers’ fixations were longer when they fixated near the center of a word than when they fixated away from the center of the word (when only one fixation was made on the word). Like Rayner et al. (1996), Vitu et al. also found frequency effects such that low-frequency words were fixated longer than high-frequency words.
14. In its current version, the model predicts that people will read about as effectively in a moving-window condition in which the word to the left of fixation (word n−1) and the fixated word (word n) are visible as when the word to the right of fixation (word n+1) and the fixated word (word n) are visible (assuming word-boundary information is preserved to guide eye movements). This conflicts markedly with the findings of moving-window studies (McConkie & Rayner 1975), in which information to the right of the fixated word facilitates reading far more than information to the left of the fixated word. Perhaps the model does not depend critically on this attentional assumption, and good predictions could also be obtained with better attentional assumptions.
15. The Glenmore model derives its name from Glenmore, Ireland, the place where much of the model was first developed (cf. Reilly & Radach 2003).
16. These results are open to alternative interpretations because the task was not natural reading and thus did not actually require eye movements. Instead, the subject was required to read text on a computer monitor that was displayed through a stationary nine-character “window.” The text was advanced manually by pressing keys that moved it forward (1–9 character spaces) or backward (1–3 character spaces), and a mask (covering 1, 3, or 5 character spaces) was placed over the center of the viewing window to occlude letters in the scotoma conditions.
17. For example, we previously argued that the last version of the model discussed in Reichle et al. (1998), E-Z Reader 5, is superior to an earlier version, E-Z Reader 3, even though the latter model provided a slightly better aggregate fit to the Schilling et al. (1998) data. This claim was based primarily on a qualitative argument: In E-Z Reader 5 (but not E-Z Reader 3), the rate of lexical processing decreases as the distance between the word being processed and the fovea increases. Although this feature of E-Z Reader 5 makes the model more psychologically plausible, one could counter that, without an improvement in the model's overall performance, the addition of two parameters is not warranted. However, Salvucci and Anderson (1998; 2001) recently found additional evidence supporting our claim. Briefly, Salvucci and Anderson first replicated the Schilling et al. experiment with a different subject population, and then used several different eye-movement protocol algorithms to determine how well E-Z Readers 3 and 5 could account for the eye-movement data of individual subjects. They also examined how well the models could account for two sequential measures: (1) the proportions of saccades of each given length; and (2) the proportions of saccades of each given length following saccades of various lengths. The results of these analyses indicated that E-Z Reader 5 fit all three measures (the individual-subject data and the two sequential measures) better than did E-Z Reader 3, and thus provided a better account of the finer-grained, sequential aspects of the observed eye-movement data. Moreover, these results suggest that E-Z Reader 7 (which also includes the visual acuity assumption) may also provide better quantitative fits than earlier, simpler versions of the model.
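The two sequential measures in this note can be computed from a sequence of saccade lengths. The first is the marginal proportion of each length. The second is the proportion of each length among saccades that follow a saccade of a given length, i.e., an estimate of P(next = j | previous = i). The sketch is not Salvucci and Anderson's code. The function names are hypothetical, and measuring length in words (negative for regressions) is an assumption made for the example.

```python
from collections import Counter, defaultdict


def length_proportions(lengths: list[int]) -> dict[int, float]:
    """Proportion of saccades of each length."""
    counts = Counter(lengths)
    return {k: c / len(lengths) for k, c in sorted(counts.items())}


def conditional_length_proportions(lengths: list[int]) -> dict[int, dict[int, float]]:
    """For each previous saccade length i, the proportion of following
    saccades of each length j, i.e., an estimate of P(next = j | previous = i)."""
    transitions = defaultdict(Counter)
    for prev, nxt in zip(lengths, lengths[1:]):
        transitions[prev][nxt] += 1
    result = {}
    for prev, counter in sorted(transitions.items()):
        total = sum(counter.values())
        result[prev] = {j: c / total for j, c in sorted(counter.items())}
    return result


if __name__ == "__main__":
    # Hypothetical saccade lengths in words (negative = regression).
    saccades = [1, 1, 2, 1, -1, 1, 2]
    print(length_proportions(saccades))
    print(conditional_length_proportions(saccades))
```

A model's simulated saccade sequences can be passed through the same functions, and the resulting proportions can be compared with the observed ones. This is what makes such measures sensitive to sequential structure that aggregate fixation-duration means do not capture.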
18. Furthermore, our simulations to date (Pollatsek et al. 2003) indicate that a simple race model (i.e., a race between two independent processes, a direct look-up process and a constructive process) is unlikely to account for the observed pattern of data in Hyönä and Pollatsek (1998) and Pollatsek et al. (2000). This is an illustration of how modeling can help sharpen one's thinking about such issues.
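A simple race model of the kind mentioned here has a property that is easy to demonstrate. Identification time is the minimum of the two routes' finishing times. Consequently, a variable that affects only one route has a reduced effect whenever the other route often wins. The sketch below is not the Pollatsek et al. (2003) simulation. The gamma distributions, the coefficient of variation of 0.3, and all of the mean durations are hypothetical choices made only to show the structure of a race between two independent processes.

```python
import random


def mean_race_time(mean_lookup: float, mean_construct: float,
                   cv: float = 0.3, n: int = 20_000, seed: int = 2) -> float:
    """Monte Carlo estimate of the mean finishing time of a race between two
    independent, gamma-distributed routes (the faster one wins on each trial)."""
    rng = random.Random(seed)
    shape = 1.0 / cv ** 2
    total = 0.0
    for _ in range(n):
        lookup = rng.gammavariate(shape, mean_lookup / shape)
        construct = rng.gammavariate(shape, mean_construct / shape)
        total += min(lookup, construct)
    return total / n


if __name__ == "__main__":
    # Effect of slowing only the look-up route by 50 msec (hypothetical values),
    # when the constructive route is fast versus slow.
    for construct in (200.0, 400.0):
        effect = mean_race_time(300.0, construct) - mean_race_time(250.0, construct)
        print(construct, round(effect, 1))
```

In this sketch, the 50 msec slowing of the look-up route shows up only partially in the mean race time, and much less so when the constructive route is fast. Patterns of this kind are what a simulation can check against the observed effects of whole-word and constituent variables.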
19. Because the effects of higher-order language processing are often delayed and/or apparent over a wider temporal window than are the effects of lower-order language processing, the former may actually be less difficult to simulate than the latter. Paradoxically, it may be more difficult to evaluate a model's capacity to simulate higher-order linguistic effects for these same reasons.