
SHARE: A STOCHASTIC, HIERARCHICAL ARCHITECTURE FOR READING EYE-MOVEMENT

BY
GANG FENG
B. Ed., Beijing Normal University, 1990
M.A., University of Illinois, 1998
M.S., University of Illinois, 1999

THESIS
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Psychology in the Graduate College of the University of Illinois at Urbana-Champaign, 2001

Urbana, Illinois

SHARE: A STOCHASTIC, HIERARCHICAL ARCHITECTURE FOR READING EYE-MOVEMENT
Gang Feng, Ph. D.
Department of Psychology
University of Illinois at Urbana-Champaign, 2001
Kevin F. Miller, Advisor

Advances in methods for capturing patterns of eye movements in reading have not yet been matched by corresponding methods for turning those data into a comprehensive quantitative model that is able to account for patterns of reading eye movements. The primary objective of the current research is to identify a set of mathematical tools that are able to describe reading eye movements, which are complex time-series data that covary with linguistic, perceptual, and other variables.

A survey of existing quantitative models of reading eye movements shows that many of the models are unable to account for the distributions of empirical eye movements. Nonetheless, the variety of modeling approaches also points to promising solutions to the problem. Based on analysis of modeling constraints, and inspired by previous efforts, a stochastic, hierarchical architecture for reading eye-movement, SHARE, is proposed. An advanced Markov model helps to capture the temporal dependency between reading eye movements, and the hierarchical structure concisely represents the logical relationships between covariate factors, eye-movement decisions, and observable eye-movement behaviors. Model parameter estimation is based on Bayesian theory, which provides a natural way to incorporate prior knowledge and to conduct probabilistic reasoning.

A simple model based on the SHARE architecture has been developed. Although it only takes into account a limited number of covariates and only models dependency between adjacent eye movements, it nevertheless is able to capture much of the dynamics of reading eye movements. A simulation study shows that with its simple structure, the model is able to reproduce the distributions of fixation durations and saccade length, as well as composite eye-movement variables. Because each reader is modeled individually, analyses of model parameters for readers of varying age and reading proficiency also shed light on the development of reading skills.

The SHARE architecture is shown to be flexible enough to characterize both beginning and fluent reading, which is particularly attractive for the study of reading development. Its ability to capture eye-movement patterns also opens a wide range of possibilities for real-world applications of the eye-movement technology.

ABSTRACT

Advances in methods for capturing patterns of eye movements in reading have not yet been matched by corresponding methods for turning those data into a comprehensive quantitative model that is able to account for patterns of reading eye movements. The primary objective of the current research is to identify a set of mathematical tools that are able to describe reading eye movements, which are complex time-series data that covary with linguistic, perceptual, and other variables.

A survey of existing quantitative models of reading eye movements shows that many of the existing models are unable to account for the distributions of empirical eye movements. Nonetheless, the variety of modeling approaches also points to promising solutions to the problem. Based on analysis of modeling constraints, and inspired by previous efforts, a stochastic, hierarchical architecture for reading eye-movement, SHARE, is proposed. An advanced Markov model helps to capture the temporal dependency between reading eye movements, and the hierarchical structure concisely represents the logical relationships between covariate factors, eye-movement decisions, and observable eye-movement behaviors. Model parameter estimation is based on Bayesian theory, which provides a natural way to incorporate prior knowledge and to conduct probabilistic reasoning.

A simple model based on the SHARE architecture has been developed. Although it only takes into account a limited number of covariates and only models dependency between adjacent eye movements, it nevertheless is able to capture much of the dynamics of reading eye movements. A simulation study shows that with its simple structure, the model is able to reproduce the distributions of fixation durations and saccade length and to predict eye-movement variables with reasonable accuracy. Because each reader is modeled individually, analyses of model parameters for readers of varying age and reading proficiency also shed light on the development of reading skills.

A distinctive strength of the SHARE architecture is that it makes minimal assumptions about psychological mechanisms but concentrates on mathematical descriptions of eye-movement patterns. To the extent that it separates objective descriptions from hypothetical mechanisms, it presents a way to implement and test a variety of theories of reading eye movement in a common platform. The SHARE architecture is shown to be flexible enough to characterize both beginning and fluent reading, which is particularly attractive for the study of reading development. Its ability to capture eye-movement patterns also opens a wide range of possibilities for real-world applications of the eye-movement technology.


DEDICATION

To My Family

ACKNOWLEDGEMENTS

I would like to recognize those people who have helped me meet this part of the Ph. D. requirement. I thank the members of my dissertation review committee, Richard C. Anderson, Cynthia Fisher, George W. McConkie, Kevin F. Miller, and Douglas Simpson for the distinct expertise that each person brought to the project.

I am greatly indebted to my academic and dissertation advisor, Kevin Miller, who has given me generous support, intellectually, financially, and emotionally, for the past seven years. I cannot think of any other labs where I could enjoy the total freedom to pursue my intellectual interests, the thoughtful and timely guidance, and the extraordinary research facility that Kevin offered me. His influence on me, both professionally and personally, will be felt in the years to come.

George McConkie showed me the way to eye-movement research. But more importantly, he provided me with an example of an extraordinary scholar, an enthusiastic advisor, and simply a good person. He has never refused a single request for help, no matter how big or small it was. I cherish every opportunity to work with him, and am grateful for all the help he gave me over the years.

The other members of my committee also made major contributions to my understanding of reading, language, and statistical issues in modeling, as well as helping me to clarify my thinking. Cynthia Fisher introduced me to many new concepts in linguistics and language acquisition, and has read and given thoughtful comments on many papers over the years. Doug Simpson made many incisive and constructive suggestions about the statistical aspects of this project; his patience and encouragement had a major impact on this project. Richard Anderson has been consistently supportive throughout my career at UIUC, and even made the supreme sacrifice of returning from his summer home in Wisconsin to a hot and humid Champaign-Urbana for my final orals meeting. Of course, none of my committee members can be held responsible for the errors that remain in this project.

The greatest support throughout my graduate program comes from my family. My parents, Sunqi Feng and Mei Chen, are always confident in me and forever encouraging. No word can express my thanks to my wife, Xiuhong Cao, and daughter, Jessie. During the dull moments of thesis writing, those joyful albeit brief after-dinner family moments were the only source of power that recharged me after the many hours of daily work and carried me through the long journey.

TABLE OF CONTENTS

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1. INTRODUCTION
  Describing a Single Reading Eye Movement
  Composite Eye-movement Variables: Measuring Local Dynamics
  Eye Movements as Stochastic Processes
  From Measurement to Modeling
CHAPTER 2. A SURVEY OF QUANTITATIVE MODELS
  “Direct Control” Model and the READER Simulation
  “Attentional Shift” Theory and Reilly’s Connectionist Model
  “E-Z Reader” Models
  “Strategy-tactics” Theory and the Reilly and O’Regan Simulations
  Mr. Chips: The Ideal Observer
  Stochastic Models by Stark and Suppes
  Normal Eye Movements: McConkie and colleagues' mathematical modeling
CHAPTER 3. DESIGN PRINCIPLES
  Theory-driven vs. Data-driven Modeling
  Deterministic vs. Probabilistic Modeling
  The WHEN and WHERE Decisions
  Linguistic vs. Low-level Variables
  Time-series vs. Independent Data
  Discrete vs. Continuous Control
  Group vs. Individual Models
  Descriptive vs. Predictive Applications
  Choosing the Mathematical Tools
CHAPTER 4. SHARE: STRUCTURE, DYNAMICS, AND MODEL FITTING
  Modeling Environment
  Modeling Data
  Structure of the SHARE Model
  Temporal Dynamics
  Model Fitting and Parameter Learning
  Model Adequacy and Comparison
CHAPTER 5. SIMULATION RESULTS
  Simulation Method
  Distributions of fixation durations
  Distributions of Saccade Length
  SHARE in Conventional Eye-movement Measures
  Summary
CHAPTER 6. DEVELOPMENTAL CHANGES OF READING EYE MOVEMENTS
  Previous Research on the Development of Reading Eye Movements
  Developmental Analyses Using SHARE
  Development of Reading Eye-movement Control
  Effects of Input Variables on Eye-movement Control
  Discussion
CHAPTER 7. DISCUSSION
  What is SHARE?
  What SHARE is Not
  Composite Variables Revisited: Implications to Psycholinguistic Research
  Applications in Reading Education
TABLES
FIGURES
APPENDIX A. PROBLEMS IN THE E-Z READER MODEL
  The Goodness-of-fit Index
  Correlations, Multicollinearity, and Parsimonious Modeling
APPENDIX B. FITTING MIXTURE MODELS TO EMPIRICAL FIXATION DURATION DISTRIBUTIONS
  Introduction
  Method
  Results
  Discussion
REFERENCES
CURRICULUM VITAE

LIST OF TABLES

Table 1. Developmental Characteristics of Reading Eye Movements
Table 2. Log Likelihood of Bayesian and MLE for Fixation Duration Fitting

LIST OF FIGURES

Figure 1. Architecture of Reilly’s Connectionist Model of Eye-movement Control
Figure 2. Illustration of Parafoveal Preview Effects in E-Z Reader 5.
Figure 3. Order-of-processing diagram for E-Z Reader 5
Figure 4. Illustration of components of the Mr. Chips model
Figures 5A and 5B. Landing Position of Fixations During Reading
Figure 6. Frequency of skipping four- and eight-letter words
Figure 7. Mean Landing Positions of Regressive Saccades as a Function of Launch Site
Figure 8. Fitting Fixation Duration Distribution with a Two-stage Mixture Model
Figure 9. Distributions of Fixation Durations in Yang and McConkie (in press)
Figure 10. Graphical representation of the SHARE model
Figures 11-1 through 76. Simulating Fixation Duration and Saccade Length Distributions
Figure 12. Simulated and Empirical First Fixation Duration by Word Frequency
Figure 13. Simulated and Empirical Single Fixation Duration by Word Frequency
Figure 14. Simulated and Empirical Gaze Duration by Word Frequency
Figure 15. Simulated and Empirical Skipping Probability by Word Frequency
Figure 16. Simulated and Empirical Probability of Making Single Fixation by Word Frequency
Figure 17. Simulated and Empirical Probability of Making Two Fixations by Word
Figure 18. Developmental Changes in Saccade Targeting Probabilities
Figure 19. Developmental Changes in Fixation Duration Control: Probabilities of Making Short, Medium, and Long Fixations
Figure 20. Developmental Changes in Fixation Duration Control: Modes of Short, Medium, and Long Fixation Durations
Figure 21. Developmental Changes in Fixation Duration Control: Variance of Short, Medium, and Long Fixation Durations
Figure 22. What Affects Saccade Targeting: Effects of Word Frequency, Length of the Next Word, Fixation Landing Position, and the Previous Saccade Move
Figure 23. What Affects Fixation Duration Control: Effects of Word Frequency, Length of the Next Word, Fixation Landing Position, and the Previous Saccade Move
Figure 24. BNT Mixture of Gaussian Model Diagram
Figure 25-1 through 15. Fitting 3rd-grade, 5th-grade, and Adult Fixation Duration with n-Component Lognormal Mixture Models


We are all working toward daylight in the matter, and many of the discrepancies of facts and theories are more apparent than real. (E. B. Huey, 1908, p. 102)

CHAPTER 1. INTRODUCTION

The fact that the eyes travel through a line of text with a series of stops and jumps was first documented over a century ago (Javel, 1878, cited in Huey, 1908). From the very beginning, eye movements held great promise for revealing the mental processes involved in silent reading:

These movements [of the eyes during reading] are not only subject to the influence of the direction of thought as words and phrases are read and assimilated, but they are also directly concerned in the sensory processes of perception. ... This two-fold relation of these movements with the control activities on the one hand, and on the other hand as the necessary accessory to a peripheral organ of sensation gives them an intermediary position between sensation and recognition and between thought and motor expressions which is of particular interest for the cues or indices which study of them may give of some of the workings of the mind. (Dearborn, 1906, quoted in Gray, 1922, pp. 173-174)

However, the road from eye movements to the understanding of mental processes has not been an easy one. What we can learn from reading eye movements depends on our ability to quantitatively describe them. More than 80 years after Dearborn, O'Regan (1990) outlined the basic logic for inferring the workings of the mind from eye movements:

The first step in making use of eye movements as a clue to cognitive and perceptual processes is to proceed backwards: manipulate processing in a known way, and try to understand the accompanying changes in eye movements. Later, when it is known how eye movements react to processing changes, one can use eye movements to understand the cognitive and perceptual processing that occurs in particular cases. (O'Regan, 1990, p. 400)

In other words, the ability to describe eye-movement patterns, particularly how they change in response to other factors, precedes and limits our ability to understand the psychological processes of interest.

The central concern of this research is how to quantitatively describe reading eye movements. The first two chapters briefly summarize some of the previous approaches and problems associated with them. A stochastic, hierarchical architecture for reading eye movements (SHARE) is developed and a simple model is implemented using this architecture. Model fitting and simulation results are also presented.

Describing a Single Reading Eye Movement

Reading eye movements are generally described as an alternating sequence of fixations (stops) and saccades (jumps). At this level of abstraction, the eye is assumed to be stationary during a fixation and to make a fast, ballistic movement during a saccade. Oculomotor details below this level of abstraction are rarely discussed. Two measures of eye movements – fixation duration and saccade length – are most widely used in the reading literature (Inhoff & Radach, 1998). The use of these two measures, however, is not without controversy.

First of all, the boundary between saccade and fixation is blurred. The transition between a saccade and a fixation is gradual, and micro-saccades, tremors, and drifts happen during a fixation (Carpenter, 1988; Inhoff & Radach, 1998). Thus, in practice the numerical values of fixation duration and saccade length depend on many factors, such as the temporal and spatial resolution of the eye-tracking device and the algorithm that detects fixations and saccades (McConkie, 1981).

Secondly, even at the above level of abstraction, there may be a need for additional measures. For example, Irwin (1998) showed that linguistic processing is not stopped during a saccade, and therefore its time should be included when measuring processing time.

Last but not least, eye-movement measures, particularly fixation duration, are often subject to censoring. It is a common practice to discard fixations, for example, shorter than 100 msec or longer than some threshold. The theoretical motivation for censoring seems to be the belief that these fixations are not produced by cognitive processes and are thus uninteresting or unrepresentative (see Inhoff & Radach, 1998, for a discussion). Because extreme scores can greatly affect means and standard deviations, censoring also has the effect of making these measures more representative of the data as well as “improving” the significance of statistical analyses. This is particularly a concern with models that try to fit group data such as averages rather than individual observations.

In the current study, we focus on the two traditional eye-movement measures – fixation duration and saccade length. No censoring of data is used in the current study, and individual fixations and saccades are used as the unit of analysis.

Composite Eye-movement Variables: Measuring Local Dynamics

Beyond measuring individual eye movements, reading researchers face the challenge of quantifying a series of eye movements. Psycholinguists are particularly interested in how eye-movement patterns change in response to experimental manipulations. This requires a way to summarize the dynamics of processing over multiple eye movements. This turns out to be a difficult undertaking. Reading eye movements are intrinsically dynamic. They occur in order, and the characteristics of one fixation depend in part on those of the previous ones (e.g., Henderson & Ferreira, 1993; McConkie, Kerr, Reddix, Zola, et al., 1989). Eye movements also respond in real time to the content under the current fixation, or even in the periphery (see Rayner, 1998, for a review). Finally, reading eye movements are extremely variable. In fact, Huey (1908) commented “…the variation [of fixation duration] is so very great that any average is misleading, and the pauses may really be of almost any length” (p. 33).

The use of composite eye-movement variables is an attempt to summarize eye-movement dynamics over a short period of time. A composite variable, such as gaze duration or skipping rate (see footnote 1), is essentially a sample statistic computed from a set of eye movements that satisfy certain criteria (for example, all fixations that landed on a particular word or word group). This effectively turns an eye-movement pattern into a single number, which then can be used in statistical analyses. For example, in a hypothetical psycholinguistic study, a researcher interested in how word frequency affects reading processes might manipulate the frequencies of some designated words in the reading materials, and calculate readers’ gaze duration on the experimental words. These data are then fed to an ANOVA to determine whether readers’ eye movements were affected by the frequency manipulation.

Footnote 1. Gaze duration is typically defined as the sum of the duration of all fixations on a word (or a predefined region) provided that the eye has not left the word (region). Skipping rate is the probability of a word (or region) not being fixated. A finer distinction may be made between cases where the word was later regressed to and those where the word was never fixated during the entire reading.
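The definitions of gaze duration and skipping rate given above can be turned into a short computation. The sketch below is illustrative only and is not the procedure used in this study. It assumes that fixations arrive in temporal order as (word index, duration in msec) pairs, treats gaze duration as the summed duration of the first uninterrupted run of fixations on a word, and counts a word as skipped when the eyes first move past it without fixating it. The function names and the toy fixation sequence are hypothetical.

```python
from itertools import groupby

def gaze_durations(fixations):
    """Gaze duration per word: total duration of the first uninterrupted
    run of fixations on that word. Later revisits are ignored."""
    gaze = {}
    for word, run in groupby(fixations, key=lambda fix: fix[0]):
        run_total = sum(duration for _, duration in run)
        if word not in gaze:  # keep only the first run on each word
            gaze[word] = run_total
    return gaze

def skipping_summary(fixations):
    """Return (words skipped on first pass, words never fixated, skipping rate).
    The rate is taken over the words the eyes have reached or passed."""
    furthest = -1
    skipped_first_pass = set()
    for word, _ in fixations:
        if word > furthest:
            # every word jumped over on the way forward is a first-pass skip
            skipped_first_pass.update(range(furthest + 1, word))
            furthest = word
    fixated = {word for word, _ in fixations}
    never_fixated = skipped_first_pass - fixated
    words_reached = furthest + 1
    rate = len(skipped_first_pass) / words_reached if words_reached else 0.0
    return skipped_first_pass, never_fixated, rate

# Hypothetical data: word 1 is refixated, word 2 is skipped and then
# regressed to, and word 3 is fixated a second time after the regression.
fixations = [(0, 210), (1, 180), (1, 150), (3, 240), (2, 190), (3, 200), (4, 260)]
gaze = gaze_durations(fixations)
skipped, never_fixated, skip_rate = skipping_summary(fixations)
print(gaze)
print(skipped, never_fixated, skip_rate)
```

In this toy sequence word 2 is skipped on first pass and later regressed to, so it counts toward the skipping rate but not toward the never-fixated set, which is the finer distinction the footnote mentions.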

This familiar scenario illustrates several problems with the use of composite eye-movement variables. First, no single statistic can completely summarize the eye-movement dynamics on a word. Therefore, multiple composite variables have to be computed with the hope that collectively they will give a full description of the eye-movement pattern. In a recent review, Inhoff and Radach (1998) enumerated at least seven time-related composite variables: single fixation duration, the duration of the first and second of two target fixations, first fixation duration, gaze duration, mean fixation duration, total time, and total repair time. Each of them is a different way of selecting from and summing over the set of fixations on a word (due to space limitations their definitions are not listed here). New measures have been introduced to select and sum fixations over time in order to capture additional eye-movement patterns (e.g., Liversedge, Paterson, & Pickering, 1998). In addition to reading time measures, a variety of variables have been used to describe saccade patterns, including the probability of skipping, refixating, or regressing to and from a word (or a region) and the length of saccades going into and out of a region, among others. Having too many options may not be advantageous. In practice, it is impossible to search through all of the composite variables for an effect. Researchers have to rely on rules of thumb to select a small set of “reasonable” variables, and hope they will capture the desired effects.

Second, the correlations between these measures make it difficult to interpret findings. One rationale for the multitude of composite variables is that each is sensitive to different aspects of reading (e.g., Liversedge et al., 1998) or taps into different processing stages (Murray, 2000; Rayner, 1998). In reality, however, very few of these variables are independent of each other, and some pairs are often highly correlated. This is not surprising given that the various time-related measures are just different and overlapping ways of selecting from the same pool of fixations and saccades. As a result, it is difficult to establish a direct link between a composite variable and a reading process. Similarly, because of the composite nature of the variables (see footnote 2), one cannot be certain that an effect found in one variable is not caused by others. When an effect in gaze duration is found, for example, it is impossible to conclude whether the difference is caused by prolonged individual fixation duration or elevated refixation probability, both of which are part of the definition of gaze duration. The complex relations between these variables create obstacles in attributing and interpreting empirical discoveries (see Inhoff & Radach, 1998).

Moreover, the composite variables give the appearance of measures of independent events, which may mislead researchers. It is easy to forget that the fixation duration is not only determined by the characteristics of the currently fixated word, but also affected by those of the neighboring words, for instance, through parafoveal previewing (e.g., Henderson & Ferreira, 1993). The probabilities of refixating and skipping a word are also strongly related to the location of the previous fixation. Such information is lost when the composite variables are calculated and entered in statistical procedures such as ANOVA, which are designed for testing independent samples. When these temporal correlates are excluded from data analysis, one runs the risk of overestimating the effects of factors related to the foveal words and overlooking potentially important temporal effects.

Footnote 2. Strictly speaking, the same problem also applies to measures such as first fixation duration and single fixation duration. Although they do not involve summation over multiple eye movements, they are in fact contingent on the fact that the word is being fixated (i.e., first fixation duration is defined as missing for skipped words), and thus are not statistically independent from other variables. They have to be interpreted in relation with, e.g., skipping rate.
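The point that refixation and skipping probabilities depend on what the eyes did immediately before can be illustrated with simple counts. The sketch below is hypothetical and uses made-up data. It assumes that each saccade has been coded as the number of words moved (0 for a refixation, negative values for regressions, +2 for a move that skips one word), and it compares the overall probability of each move with its probability conditional on the preceding move. The function names are illustrative.

```python
from collections import Counter, defaultdict

def marginal_probs(moves):
    """Overall relative frequency of each saccade move."""
    counts = Counter(moves)
    total = len(moves)
    return {move: count / total for move, count in counts.items()}

def conditional_probs(moves):
    """Relative frequency of each move given the move that preceded it."""
    pair_counts = defaultdict(Counter)
    for previous, current in zip(moves, moves[1:]):
        pair_counts[previous][current] += 1
    table = {}
    for previous, counts in pair_counts.items():
        row_total = sum(counts.values())
        table[previous] = {cur: n / row_total for cur, n in counts.items()}
    return table

# Hypothetical sequence of saccade moves, in words, from one reader.
moves = [1, 1, 2, 0, 1, 2, -1, 1, 1, 2, 0, 1]
marginal = marginal_probs(moves)
conditional = conditional_probs(moves)

print("P(move = +2):", marginal[2])
print("P(move = +2 | previous = +1):", conditional[1][2])
print("moves observed after +2:", sorted(conditional[2]))
```

In this toy sequence a two-word move accounts for a quarter of all saccades, but for 60% of the saccades that follow a one-word move, and it never directly follows another two-word move. A single marginal skipping rate hides that contrast, which is the information lost when composite variables are analyzed as independent samples.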

7 Finally, one’s choice of composite variables is often tied to a favored theory of eye movement control. For instance, researchers who believe that lexical processes drive reading eye movements tend to focus exclusively on measures related to fixation-duration, whereas proponents of oculomotor or perceptually-oriented theories pay more attention to saccade patterns. Some researchers believe that measurement and theory should be tightly bound. For example, Rayner (1995) complained that many psycholinguistic researchers “probably don't have a model of eye-movement control in mind. In fact, they probably feel that it's not necessary to specify a model dealing with where the eye lands. All they care about is that gaze durations are variable as a function of various linguistic variables” (p.12). Philosophically it may be impossible to separate measurement from theory. But this does not mean that one has to subscribe to a particular theory in order to describe eye movements. Underlying the problem of composite variables is the mismatch between the dynamic, stochastic nature of reading eye movements and the mathematical tools chosen to represent eyemovement patterns. As illustrated above, a small set of simple statistics cannot sufficiently summarize a series of eye movements. And the problem is exacerbated by simply adding additional composite variables, which causes confusions at both the conceptual and empirical levels. Eye Movements as Stochastic Processes The solution is to describe reading eye movements as stochastic processes rather than independent events. Reading eye movements may be conceptualized as a series of events (fixations), each of which may be measured by two continuous variables – fixation duration and saccade length (of the saccade that follows the fixation). Eye movements are stochastic because

the values of fixation duration and saccade length at fixation t are probabilistically determined by those of the previous fixation at t-1, or even t-2, etc. To further simplify the problem, we can code saccades in terms of a finite number of moves, corresponding to the number of words each saccade covers (e.g., +2 for moving forward 2 words, -1 for moving backward 1 word, etc.). We assume that at the time of planning a saccade, each move has a certain probability of being chosen. Reading saccades are now described as discrete events (different moves) happening at discrete times (when saccades are made, which are assumed to be instantaneous under the current level of abstraction).

Such a stochastic system may be well modeled by a classical Markov model. In a simple first-order Markov model, a system x is assumed to have a finite number of states, $X_i$, $i = 1, \ldots, k$ (in our case, there are k possible moves). The system may change from one state to another at a designated time (making different kinds of saccades), and the probability of being in state $X_i$ at time t depends only on the previous state but not any earlier state history. Mathematically, the probability of making move $X_i$ at fixation t is

$$P(x_t = X_i \mid x_{t-1}, x_{t-2}, x_{t-3}, \ldots, x_1) = P(x_t = X_i \mid x_{t-1})$$

In other words, the probability of the current state is independent of events prior to the last state.

The above model is referred to as the first-order Markov model because the conditional dependency extends one step back. In a zero-order Markov model, also known as the random-walk model, the current state is completely independent of any previous history. The system effectively describes a sample of independent events. It is also possible to derive higher-order Markov models, in which the current state depends on the previous n states, but computational cost becomes prohibitive as n increases. The first-order Markov model often offers a good

approximation of short-term temporal relations in data.

How would the Markov model help in conceptualizing and describing reading eye movements (in this case, saccades)? Assuming that a first-order Markov model as described above is applicable, all the dynamics of eye movements are summarized in the model’s transition probability matrix. Any saccade-related composite variable, such as skipping rate or average saccade length (in words), can be mathematically derived from the matrix. In fact, they can be computed from the marginal probabilities of the transition matrix. With the transition matrix one may answer much more detailed questions about saccade programming, such as “if the current saccade is a refixation rather than a regression, is the next saccade more likely to skip a word?”

Markov models have been used to summarize eye movement patterns in picture viewing and scene perception (e.g., Stark & Ellis, 1981). Describing reading eye movements, however, is a different matter. There are at least two major obstacles to using a simple Markov model for reading. First, the Markov model described above can only deal with discrete events, but fixation duration and saccade length are continuous measures. Fixation duration is highly informative in reading research, perhaps more so than in picture perception studies. Although one may be able to code saccade length as discrete values (number of words), fixation duration cannot be treated in the same way. How to model continuous data is a problem to be solved.

Second, the classical Markov framework assumes a constant transition matrix, i.e., the transition probabilities remain unchanged. This is unrealistic for reading, because it excludes the possibility of linguistic or other factors affecting eye-movement programming. One possible extension of the model is to allow relevant factors to change the transition probabilities. In other

words, the transition probabilities are probabilistically dependent on the values of linguistic and oculomotor variables. The current research is an exploration in this direction.

From Measurement to Modeling

This chapter started by identifying the problem of describing eye-movement patterns as a critical link for reading eye-movement research. It then pointed out problems associated with the use of composite variables and argued that reading eye movements should be treated as time-series data. Nonetheless, the chapter offers no simple solution for describing eye movements; instead, it calls for more sophisticated mathematical models. The conclusion may come as a surprise, but it sheds light on the nature of the problem. Describing eye-movement patterns is not a measurement problem. It is squarely in the domain of mathematical modeling because it deals with numbers – measures of eye movements, not eye movements themselves. Reading eye movements are complex, so their description requires more than basic mathematical tools.

In searching for the right tools to model reading eye movements, it is critical to understand their mathematical properties. There have been a number of quantitative models of reading eye movements. Although they come from different perspectives, each summarized constraints and regularities of reading eye movements. They present a natural starting point for the current exploration.
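As a rough illustration of the first-order Markov description of saccade moves introduced earlier in this chapter, the sketch below estimates a transition matrix from a coded sequence of moves and then asks the kind of conditional question the matrix makes possible. The move sequence, the coding scheme, and the helper names `transition_matrix` and `prob_after` are hypothetical and chosen only for illustration; this is not the SHARE model itself.

```python
from collections import Counter, defaultdict


def transition_matrix(moves):
    """Estimate first-order transition probabilities P(next move | current move)."""
    counts = defaultdict(Counter)
    for current, following in zip(moves, moves[1:]):
        counts[current][following] += 1
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


def prob_after(matrix, current, predicate):
    """Probability that the next move satisfies `predicate`, given `current`."""
    return sum(p for move, p in matrix.get(current, {}).items() if predicate(move))


# Hypothetical coded sequence (an assumption, not empirical data):
# +1 = move to the next word, +2 = skip one word,
#  0 = refixation, -1 = regression to the previous word.
moves = [1, 1, 2, 0, 1, -1, 2, 1, 0, 2, 1, 1]

P = transition_matrix(moves)

# "Is a skip more likely after a refixation or after a regression?"
skip_after_refixation = prob_after(P, 0, lambda m: m >= 2)
skip_after_regression = prob_after(P, -1, lambda m: m >= 2)
print(skip_after_refixation, skip_after_regression)
```

With real data the rows of `P` would be estimated from many fixations per reader, and a constant matrix of this kind is exactly the assumption that the extension discussed above relaxes.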

CHAPTER 2. A SURVEY OF QUANTITATIVE MODELS

Much of the history of reading eye-movement research can be characterized by debates over eye-movement control mechanisms (see Rayner, 1998, for a brief review of different theories). Until recently, reading eye-movement theories were largely verbal descriptions of hypothetical mechanisms with some supportive evidence. Testing these theories was difficult, if not impossible, because they were often too vague and flexible to be disconfirmed by empirical evidence. The past decade has seen a spurt of quantitative models that specify theories in the language of mathematics or computer algorithms.

The current chapter reviews previous attempts at quantitative modeling of reading eye movements, with emphasis on their modeling approaches, including mathematical models, assumptions about eye movements, and model fitting. The goal is to discover facts about reading eye movements, successful modeling approaches, and reasons for failure. The findings of this survey suggest principles for the design of the current model. The survey is not intended to be a review of eye movement theories, although a brief introduction to the theoretical background of a model is given when necessary. Comments after each review are only relevant to the current research and are not meant to be comprehensive discussions.

“Direct Control” Model and the READER Simulation

Just and Carpenter (1980) proposed two assumptions that linked eye movements and cognitive processes. The immediacy assumption states that a reader tries to interpret each content word of a text as it is encountered, making guesses if uncertain. The eye-mind assumption asserts that the eye remains fixated on a word as long as the word is being processed. Together,

these two assumptions formed the basis for a reading model in which eye movements, measured by gaze duration, are controlled entirely by cognitive processes. They supported the two assumptions with regression analyses of reading eye movements, which showed that gaze duration could be predicted from linguistic variables.

READER was a computer implementation of their theory of reading and eye movement control (Thibadeau, Just, & Carpenter, 1982). It was designed to be “a natural language understanding system that reads the text word by word, and whose processing time on each word corresponds to the human gaze duration on that word” (Thibadeau et al., 1982, p. 158). With respect to eye-movement control, the only eye-movement variable it attempted to model was gaze duration, which, according to Just and Carpenter, was equal to the mental processing time on words.

Model structure. READER was implemented as a LISP program. To give the flavor of the system, a partial representation of the word "are" in "Flywheels are …" would be in the following form:

    …
    (WORD2: HAS FEATURE1)
    (FEATURE1: IS 'A')
    (WORD2: HAS FEATURE2)
    (FEATURE2: IS 'R')
    …
    (WORD2: IS 'ARE')
    (WORD2: HAS SUBJECT2)
    (SUBJECT2: IS WORD1)
    …

As a complete comprehension system, READER included a variety of components, ranging from a lexicon to a schema-based knowledge representation. Reading started with encoding letters one by one, until the word was found in the lexicon. The ultimate goal was to produce a summary of the passage it “read.” At any moment lexical, syntactic, semantic, and

discourse-level analyses were being carried out concurrently and interactively. READER’s gaze duration was measured by a linear transformation of the machine cycles the model spent on processing a word.

Just and Carpenter (1980) were explicit about when the eyes should move: “When the perceptual and semantic stages have done all of the requisite processing on a particular word, the eye is directed to land in a new place where it continues to rest until the requisite processing is done” (p. 336). The “requisite processing” could be any (combination) of the reading processes, for example, lexical access or text integration. What is considered “required” depends on the goal of reading.

READER assumed a word-by-word reading strategy, targeting the next word in line after finishing processing the current word. The model, however, did allow word skipping when the comprehension processes were able to “predict” the next word – when the lexical activation of the next word was elevated beyond a threshold by other reading processes. The skipped words turned out to be short function words such as “of” and no content word was ever skipped in the model.

Parameter estimation. The empirical data for modeling were gaze duration results obtained from a study in which undergraduate students were asked to read some short scientific passages, including the “flywheel” passage that READER read. Gaze durations on each word were first averaged across participants, and then entered as the dependent variable in multiple regression analyses in order to determine the contributions of various textual factors, such as word length and syntactic role.

Although primarily a symbolic processing system, READER had quite a few activation weights, memory decay rates, and thresholds in the system that required parameterization. The

authors did not mention how values were assigned, nor did they perform any systematic optimization of the parameters.

Model fitting. READER’s “reading” performance was evaluated in several ways. READER did a fair job as a comprehension system because it was able to “recall” a reasonable amount of information after reading the passage. Thibadeau, Just, and Carpenter (1982) also compared the effects of various linguistic factors on human and model performance, and concluded that the effects were qualitatively, and sometimes quantitatively, similar. However, they did not perform formal statistical tests to support their conclusions. In fact, Carpenter (1984) argued against overall statistical goodness-of-fit tests and preferred examining mismatches between the model and data. The only quantitative index of model fit was the correlation coefficient between human and READER’s gaze durations over the 140 words, which was approximately r = 0.80.

Comments. READER might be a successful model of reading comprehension, but it is quite limited as an eye-movement control model. The most obvious problem is that it accounted for only gaze duration and offered no explanation for any other eye-movement phenomena. Equally problematic is the fact that the READER simulation was based on a single 140-word passage. The model was never extended to “read” other stories, and there was no evidence that it could be easily generalized to other reading materials.

Methodologically, Kliegl, Olson, and Davidson (1982) pointed out that, because the independent variables (linguistic factors) were correlated in their regression analyses, the regression coefficients might not reflect the effects of the factors in the presence of other factors. The validity of the model is consequently undermined because the READER model was tuned to

reflect the effects as shown in the regression coefficients.

“Attentional Shift” Theory and Reilly’s Connectionist Model

In contrast to Just and Carpenter's ambitious project, Morrison's (1984) model was designed to explain basic eye-movement patterns with minimal assumptions. Morrison suggested that eye movements were driven by word recognition. It was assumed that during a fixation, attention would focus on the foveally fixated word until it was recognized. At this moment a signal was sent to the oculomotor system to start programming a saccade to the next word, while in the meantime attention shifted to work on the next word based on peripheral visual information. If the peripheral word was recognized quickly, before the oculomotor system finished programming the saccade, this saccade command was cancelled and the oculomotor system was instructed to program a new saccade to the word after it. Even if the peripheral word was not completely recognized by the end of the current fixation, the partial processing would still improve word recognition in the next fixation.

Various modifications to the model have since been proposed (Henderson & Ferreira, 1993; Kennison & Clifton, 1995; Rayner & Pollatsek, 1989; Reilly, 1993). The most recent versions of the Morrison model are the E-Z Reader models (Reichle, Pollatsek, Fisher, & Rayner, 1998; Reichle, Rayner, & Pollatsek, 1999), discussed in the next section.

Reilly (1993) aimed to build a common platform, based on a connectionist framework, for testing different reading eye-movement control models. He chose a connectionist modeling approach because of its “ability to model the blending and merging of constraints in lexical encoding and in the production of saccadic shifts” (p. 210). The Morrison model, termed the “Attentional Shift Model (ASM),” is the only model implemented in the paper.

Model architecture. Reilly’s connectionist model was composed of three main components: (a) a visual input module, (b) a lexical module, and (c) a saccade programming module (see Figure 1).

The visual input module mimicked some interesting details of the human retina. It consisted of a matrix of 26x20 units, representing a horizontal visual field of 20 English letters. When the model “fixated” on a word, letters within the visual field would activate the corresponding units. The farther away a letter was from the center of the fovea³, the lower its overall activation level. In addition, the model implemented two blurring mechanisms – spatial blurring and category blurring – to simulate decreased acuity for eccentric letters. Reilly's model provides a fairly intuitive and physiologically plausible account for visual input during reading.

Visual attention was modeled as an inverted “spotlight” on the visual field, which functioned as a filter that severely suppressed the activation of unattended regions⁴. Attention could be shifted by moving the “spotlight,” which in turn would modify the visual input and trigger saccade programming.

The lexical module was a fully connected feed-forward network, which took input from the visual input module. The network represented 222 word types in the training corpus. During simulations, a word was considered “identified” if the output activation level became stable.

³ The center of the fovea was the 8th letter position from the left, not the geometric center of the visual field. This simulated the asymmetric perceptual span (McConkie & Rayner, 1975).

⁴ Reilly (1993) was unclear about the size of the spotlight, but suggested that it has to be small enough to provide a relatively noise-free target for saccade programming. He was also vague on how the movement of the spotlight was guided. Presumably it always jumped to the center of the next word in the periphery.

The saccadic control module was a feed-forward network that also took input from the visual module, and activation levels for each letter position were averaged to simulate low-level visual information. The two output units represented saccade directions (left and right); their activation values corresponded to the distance of the saccade, which was used to update visual input after each saccade was carried out.

Following Morrison (1984) and Henderson and Ferreira (1993), the saccadic control module was activated either when there was an attention shift or when the fixation “timed out.”⁵ An attention shift was only triggered when the current word was identified. This lexical access time, in turn, depended on the frequency of the word in the training corpus. Thus, the decision of when to move the eyes was primarily lexically based but was affected by the eccentricity of the word relative to the fovea.

Model training and testing. The connectionist model had approximately 65,000 modifiable weights, and the values of these parameters were set through back-propagation training. The lexical and saccadic modules were trained independently.

The lexical module was trained using a corpus of three short stories consisting of 222 word types and 863 word tokens. During training, the lexical module learned to identify words at random “retinal” positions (i.e., the word and the attention “spotlight” were randomly placed). The training stopped when the lexical network was able to identify 98.7% of the fixated words.

The saccade control module was trained to move to the location of the attention “spotlight.” Special care was taken in Reilly (1993) so that the proportions of progressions,

⁵ Henderson and Ferreira (1993) suggested that if during a fixation lexical access was not completed after a deadline, the fixation would be terminated automatically.

regressions, and refixations in the training samples closely matched those found in normal adult reading. The saccade module was trained to reach an 80% accuracy level so as to mimic the less-than-perfect performance of the human saccadic mechanism.

Reilly (1993) presented some example output from the simulation study, demonstrating that the model was able to reproduce a range of empirical eye-movement phenomena, including skipping, refixations, the word frequency effect, and the penalty of eccentric viewing. Reilly (1993) acknowledged that the model was preliminary, and needed fine-tuning to ensure a quantitative fit to empirical processing time and saccade length measures, particularly their distributional properties. Therefore, no formal goodness-of-fit testing was performed.

Comments. Reilly’s (1993) neural network implementation of the Morrison (1984) model is unique among the models reviewed here. The model’s connectionist framework and less-than-perfect training criteria imply that eye-movement control is probabilistic. In addition, consecutive eye movements are not independent because parafoveal processing would change the activations in the lexical unit and thus facilitate or hinder word recognition during the next fixation. In short, Reilly’s model strongly suggests a stochastic control mechanism of reading eye movements.

“E-Z Reader” Models

“E-Z Reader” (Rayner, Reichle, & Pollatsek, 1998; Reichle et al., 1998; Reichle et al., 1999), a series of six computer simulation models, is the latest incarnation of Morrison’s theory. One of the problems with the original Morrison model is that it predicted that the time to process

a parafoveal word, which was the time to execute the current saccade, was independent of the characteristics of the word under the current fixation. Experimental evidence suggests that parafoveal processing benefits diminish when the word under fixation is difficult to process (Henderson & Ferreira, 1993). To solve this problem, Reichle et al. (1998) proposed that the signal to shift attention and the signal to program a saccade should be decoupled. Saccade programming was moved to an earlier point, allowing variable time for parafoveal preview of the next word(s). This is arguably the most significant change from Morrison's original model. Other improvements included incorporating contextual predictability to capture effects of higher processes, adding a default refixation strategy in the oculomotor system, implementing penalties for processing noncentrally fixated words, and incorporating landing position effects (see McConkie, Kerr, Reddix, & Zola, 1988). The E-Z Reader model is probably the most ambitious modeling endeavor among all models and therefore deserves more detailed scrutiny.

One of the most impressive features of the E-Z Reader modeling effort is the way in which the models have evolved over time. E-Z Reader models were initially built on simplistic assumptions, and became progressively more complex as more assumptions were added to make them more psychologically plausible. The “E-Z Reader 1” model included the basic structure of the models, but did not utilize contextual predictability information and did not have the ability to simulate within-word refixations. Contextual predictability was incorporated into the “E-Z Reader 2” model. “E-Z Reader 3” added a mechanism for intra-word refixations. Penalties for

eccentric viewing positions were implemented in “E-Z Reader 4 and 5.” “E-Z Reader 6”⁶ (Reichle et al., 1999) is a recent attempt to improve Model 5 by adding the capability to model the effect of within-word landing positions (McConkie et al., 1988). Our discussion focuses on the E-Z Reader 5 and 6 models as they were considered the state-of-the-art models by the authors.

Model architecture of E-Z Reader 5. E-Z Reader 5 was composed of a lexical module and an oculomotor module. In order to decouple the signal for attention shift from that for saccade programming, lexical access was divided into two sequential processes. The first was the familiarity check (fc), which corresponded to “a rapid feeling of familiarity” or “matching on the basis of global similarity” (Reichle et al., 1998) to all entries in the mental lexicon. It was followed by a process called completion of lexical access (lc), which actually finished word identification. The signal to start programming the next saccade was triggered at the end of the fc stage, before the fixated word was completely identified. Attention shift, on the other hand, was triggered only after the lc stage, when lexical processing was finished.

The oculomotor module also included two sequential processes – (a) an early, labile stage (m) of saccade programming that could be cancelled by subsequent saccadic programming, and (b) a later, nonlabile stage (M) in which saccades could no longer be cancelled. The original Morrison model did not have a mechanism for refixations. To explain refixations, Reichle et al

⁶ Reichle, Rayner, and Pollatsek (1999) declined to call it “E-Z Reader 6” because they considered it an incremental improvement over E-Z Reader 5 rather than a qualitatively different model. However, the name “E-Z Reader 6” appeared in data tables. It is referred to as “E-Z Reader 6” in this paper, because the addition of landing position modeling significantly changed the basic architecture of E-Z Reader 5.

(1998) hypothesized a default refixation mechanism that was essentially the same as that of Reilly and O’Regan (1998, 1998): the oculomotor system was assumed to plan a refixation at the beginning of each fixation, which was subject to cancellation by a progressive saccade triggered by lexical processing.

As in all Morrison family models, reading phenomena in the E-Z Reader model result from variations in the mixture of processes that take different amounts of time to complete. With respect to the lexical processes, it assumed that the processing times for both fc and lc were linear functions of the logarithm of word frequency, albeit with different slopes, which allowed more parafoveal processing time for high-frequency words (see Figure 2). Additionally, the fc and lc processing times were also functions of contextual predictability and eccentricity of words relative to the retina. To avoid determinism, random variation was explicitly introduced. The lexical processing times were assumed to follow Gamma distributions, with standard deviations equal to one third of their means.

For the oculomotor system, the times to complete the labile and nonlabile programming processes were assumed to follow Gamma distributions with means of 150 msec and 50 msec, respectively, and standard deviations of 1/3 of their respective means⁷. The oculomotor processing times were independent of lexical processes.

The E-Z Reader model was able to generate fairly complex eye-movement behaviors. The computer simulations were implemented as stochastic finite state machines, as illustrated in

⁷ The Gamma distributions were chosen because they showed similar shapes to the empirical distributions. All Gamma distributions in the E-Z Reader series had standard deviations equal to 1/3 of their means. The ratio was picked for convenience by the authors.

Figure 3. Each of the square boxes represents a possible state of the whole system, which is a combination of the states of the lexical and the oculomotor modules. There were 14 states in E-Z Reader 5. The model moved from one state to another if one of the processes terminated and a new process started. The arrows on the diagram mark legal transitions from one state to another. For example, at State 1 the lexical system was doing a familiarity check on word n (f(n)) while the oculomotor system was planning a refixation on word n (r(n)). If after some time the labile programming stage (r(n)) of the refixation on word n ended and turned into nonlabile programming (R(n)), the system would move from State 1 (f(n) r(n)) to State 2 (f(n) R(n)).

It should be emphasized that although the lexical processes may appear to “drive” reading eye movements in the model, every decision was in fact a result of an interaction, or more precisely competition, between the lexical and oculomotor processing times. This is clearly illustrated in Figure 3.

Improvement of E-Z Reader 6. The primary motivation of the E-Z Reader 6 model (Reichle et al., 1999) was to extend the E-Z Reader 5 model to account for landing position effects (McConkie et al., 1988). McConkie et al. found that saccades tend to overshoot targets closer than approximately 7 letter spaces and undershoot those farther than 7 letter spaces. The magnitude of this systematic error was in the range of 0.5 letters per letter of planned saccade length (PSL). The landing positions were also subject to random error, which follows a Normal distribution. The longer the distance of a saccade, the greater the variance of the Normal distribution.

These effects were implemented in E-Z Reader 6 with a pair of linear regression formulas. For a given PSL (the distance between the current fixation position and the center of the intended word; the same as launch site in McConkie et al., 1988), the

actual saccade length was

$$\text{Saccade length} = \mathrm{PSL} + (\Psi_b - \mathrm{PSL}) \cdot \Psi_m + E,$$

where $\Psi_b = 7$ and $\Psi_m = 0.4$ were fixed parameters derived from McConkie et al.’s (1988) study, and $E$ was a normally distributed random error with a mean of zero and standard deviation given by⁸:

$$\sigma = \beta_b + \beta_m \cdot \mathrm{PSL}$$

where $\beta_b$ and $\beta_m$ were free parameters to be estimated.

Parameter estimation and model fitting. E-Z Reader 5 was modeled on a corpus of adult reading data (Schilling, Rayner, & Chumbley, 1998). Words in the corpus were classified into five categories based on their word frequency. Six eye-movement variables were calculated for each of the categories: (1) mean gaze duration, (2) mean first fixation duration, (3) mean single fixation duration, (4) the mean probability that the word was skipped, (5) the mean probability of making a single fixation, and (6) the mean probability of making two fixations. Model parameters were estimated based on these 30 means.

An E-Z Reader model was essentially a Monte Carlo simulation. It took texts, coded in terms of word frequency and contextual predictability, and traveled through the state transition diagram (Figure 3) by random sampling from the Gamma distributions. The simulations were run 1,000 times and the above six eye-movement measures were calculated from the simulated “eye-movement” data.
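To make the random-sampling step concrete, the following is a minimal sketch of how stage durations with a standard deviation equal to one third of the mean could be drawn. For a Gamma distribution with shape k and scale θ, the mean is kθ and the standard deviation is √k·θ, so a ratio of 1/3 implies k = 9. The function names and the sample size are illustrative assumptions; the original simulation code is not reproduced here.

```python
import random


def gamma_params(mean, cv=1 / 3):
    """Return (shape, scale) of a Gamma distribution with the given mean
    and coefficient of variation (sd / mean)."""
    shape = 1.0 / cv ** 2   # sd / mean = 1 / sqrt(shape)
    scale = mean / shape    # mean = shape * scale
    return shape, scale


def sample_duration(mean, rng, cv=1 / 3):
    """Draw one processing time (msec) from the corresponding Gamma distribution."""
    shape, scale = gamma_params(mean, cv)
    return rng.gammavariate(shape, scale)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Means of 150 and 50 msec for the labile and nonlabile stages, as stated in the text.
labile = [sample_duration(150.0, rng) for _ in range(1000)]
nonlabile = [sample_duration(50.0, rng) for _ in range(1000)]

shape, scale = gamma_params(150.0)
print(shape, scale, sum(labile) / len(labile), sum(nonlabile) / len(nonlabile))
```

A full simulation would additionally race these oculomotor durations against the lexical stage durations to decide which state transition occurs next; that logic depends on details of Figure 3 and is deliberately left out of this sketch.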

⁸ McConkie et al. (1988) estimated that the standard deviation was a cubic function of PSL (see discussions on Reilly & O’Regan’s model in the next section). Reichle et al. (1999) apparently simplified it to a linear function.
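The landing-position formulas above can be sketched as follows. The constants Ψb = 7 and Ψm = 0.4 are taken from the text; the values used for βb and βm are purely hypothetical placeholders, since they were free parameters and their fitted values are not given here. The function names are likewise illustrative.

```python
import random

PSI_B = 7.0  # planned saccade length (letters) at which there is no systematic error
PSI_M = 0.4  # slope of the systematic error


def expected_saccade_length(psl):
    """Systematic part: PSL + (PSI_B - PSL) * PSI_M."""
    return psl + (PSI_B - psl) * PSI_M


def sample_saccade_length(psl, beta_b, beta_m, rng):
    """Add Normal random error whose sd grows linearly with PSL."""
    sigma = beta_b + beta_m * psl
    return expected_saccade_length(psl) + rng.gauss(0.0, sigma)


# Short planned saccades overshoot, long ones undershoot.
short_target = expected_saccade_length(3.0)   # longer than the 3 letters planned
long_target = expected_saccade_length(12.0)   # shorter than the 12 letters planned

rng = random.Random(1)
# beta_b = 0.5 and beta_m = 0.1 are hypothetical values, not estimates from the model.
landing = sample_saccade_length(8.0, beta_b=0.5, beta_m=0.1, rng=rng)
print(short_target, long_target, landing)
```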

Model fitting was done using a “grid search” procedure, which involved repeated Monte Carlo simulations with different parameter values that covered the whole (or a reasonable part⁹ of the) parameter space. The parameter values that maximized the overall fit between the model and empirical data were reported.

E-Z Reader is clearly the most ambitious and systematic attempt to date to model control of eye movements in reading. At the same time, two serious shortcomings in E-Z Reader’s parameter estimation and model fitting led to problems in the model-fitting program. These problems are briefly summarized here; further discussion can be found in Appendix A.

First, the computation formula for the goodness-of-fit measure, as described in Reichle et al. (1998), contains two errors. Reichle et al. mistakenly squared one of the elements in the formula, which, instead of normalizing differences, scaled the differences by as much as 100 times. In addition, they used standard deviations when standard errors (of the means) should have been employed, which resulted in another unintended scaling by a factor of about 50. The resulting RMS values, measuring how much variation was left after model-fitting, were reported as statistically nonsignificant, but should have been highly significant.

This computational mistake can help to explain another puzzle in the evolution of the E-Z Reader models: the goodness-of-fit measure, RMS, did not improve much, and sometimes even dropped, when new structures and free parameters were introduced. Reichle et al. ignored this warning sign and based their model selection on theoretical arguments rather than on fit with data.
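The effect of normalizing by standard deviations rather than standard errors can be illustrated with a generic normalized root-mean-square deviation. The sketch below is not the formula from Reichle et al. (1998), which is discussed in Appendix A; the observed and predicted means, the standard deviations, and the sample size n are all hypothetical. It only shows that, because the standard error of a mean is SD/√n, dividing by SD instead understates each normalized deviation by a factor of √n.

```python
import math


def standard_error(sd, n):
    """Standard error of a mean computed from n observations."""
    return sd / math.sqrt(n)


def normalized_rms(observed, predicted, scale):
    """Root mean square of (observed - predicted) / scale."""
    z = [(o - p) / s for o, p, s in zip(observed, predicted, scale)]
    return math.sqrt(sum(v * v for v in z) / len(z))


# Hypothetical means (msec), standard deviations, and sample size.
observed = [250.0, 240.0, 230.0]
predicted = [245.0, 242.0, 226.0]
sds = [50.0, 50.0, 50.0]
n = 100

ses = [standard_error(sd, n) for sd in sds]

rms_using_sd = normalized_rms(observed, predicted, sds)
rms_using_se = normalized_rms(observed, predicted, ses)

# With equal n for every mean, the two measures differ by exactly sqrt(n).
print(rms_using_sd, rms_using_se, rms_using_se / rms_using_sd)
```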

⁹ Reichle et al. (1998, 1999) were vague on how they chose the range of parameter space.

Another problem with the modeling effort was severe multicollinearity in the measures being fit. I analyzed the basic data for the E-Z Reader modeling, which consisted of 30 means of eye-movement variables. As shown in Appendix A, all six eye-movement measures were so highly correlated in the empirical dataset that, after a principal component analysis, a single factor explained 94.6% of the variance, and three factors accounted for 99.999% of the total variance. In effect, the free parameters in E-Z Reader 1 through 6 were estimated on only 5 points. In addition, the first component was also a linear function of (log-transformed) word frequency. Thus, the only “correct” model based on this dataset of 30 means would be “any eye-movement measure is a linear function of log-transformed word frequency.” Given that this linearity had been built in since E-Z Reader 1, it is not surprising that the later models did not improve model fits.

Comments. At the conceptual level, the E-Z Reader model represents a substantial improvement over the original Morrison (1984) model. In particular, two new mechanisms proposed by Reichle et al. (1998) – the decoupling of attention-shift and saccade signals and the default refixation strategy – enabled the model to simulate more phenomena than the original Morrison model. On the other hand, there is as yet little empirical evidence to support the two new assumptions. Their psychological plausibility remains to be seen.

As a quantitative simulation endeavor, E-Z Reader has major limitations. Besides the mathematical errors, fitting the model on a small set of means proved to be very problematic. Even if there were no multicollinearity problem in the data and the modeling were carried out correctly, there would still be no guarantee that the model really described reading eye movements. In fact, it would almost certainly not capture the distributional characteristics of fixation duration and saccade length, given the arbitrary use of gamma distributions.
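The multicollinearity argument made earlier in this discussion can be illustrated with synthetic numbers. The sketch below builds six invented measures that are each nearly linear in log word frequency across five frequency classes, then estimates the share of variance carried by the first principal component of their correlation matrix. The slopes, intercepts, and noise level are assumptions chosen for illustration; they are not the Appendix A data, and power iteration stands in for a full principal component analysis.

```python
import math
import random

def standardize(values):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    z = [standardize(col) for col in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z]
            for zi in z]

def largest_eigenvalue(matrix, iterations=500):
    """Power iteration for the dominant eigenvalue of a symmetric PSD matrix."""
    # An asymmetric start vector avoids being orthogonal to a mixed-sign
    # first eigenvector.
    vector = [float(i + 1) for i in range(len(matrix))]
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
        eigenvalue = math.sqrt(sum(p * p for p in product))
        vector = [p / eigenvalue for p in product]
    return eigenvalue

rng = random.Random(0)
log_freq = [0.0, 1.0, 2.0, 3.0, 4.0]                    # five frequency classes
slopes = [-20.0, -25.0, -30.0, -15.0, 0.05, -0.03]      # invented values
intercepts = [290.0, 320.0, 400.0, 250.0, 0.10, 0.30]   # invented values
measures = [
    [b0 + b1 * lf + rng.gauss(0.0, 0.02 * abs(b1)) for lf in log_freq]
    for b0, b1 in zip(intercepts, slopes)
]

corr = correlation_matrix(measures)
# The trace of a correlation matrix equals the number of variables, so the
# dominant eigenvalue divided by that number is the first component's share.
share_first = largest_eigenvalue(corr) / len(corr)
print(f"variance explained by first component: {share_first:.3f}")
```

When every measure is driven by the same predictor, the first component absorbs almost all of the variance, which is the situation the text describes for the 30 means.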

“Strategy-tactics” Theory and the Reilly and O’Regan Simulations

O'Regan (1990) suggested that the oculomotor guidance system works according to the following two heuristics:

1. Between-word strategy. Readers fixate on a word until the completion of lexical access or some other significant stage of recognition. Then they pick a target word from the right periphery and attempt to move to the generally optimal viewing position (the word center) of that word. In other words, triggering of the between-word saccades is under the control of ongoing psycholinguistic processing, but word targeting is simply an oculomotor process.

2. Within-word tactics. If the landing position is too far from the generally optimal position, the system immediately makes a saccade to the other side of the word, and then returns to the between-word strategy. These tactics are purely oculomotor phenomena, and fixation duration and saccade length are independent of psycholinguistic factors.

Most models assume a word-by-word reading strategy, but word targeting in the Strategy-tactics model is flexible. O’Regan (1990) presented analyses based on a “careful, word-by-word” reading strategy, but also explored alternative scanning routines. An important challenge for the strategy-tactics theory is to find the word-targeting strategy used in normal reading. The Reilly and O’Regan (1998) simulation study was an attempt to answer this question.

The study was based on McConkie et al.’s (1988) finding that the distributions of landing sites on a word tend to follow a normal distribution. Reilly and O’Regan (1998), however, noticed that there were systematic mismatches between the observed distributions and the predicted normal curves. They argued that the mismatches resulted when the over/undershooting fixations

ended up landing on neighboring words. They further predicted that different word-aiming strategies (e.g., “jump to each successive word” or “skip high-frequency words”) would result in different patterns of over/undershooting, and therefore different patterns of deviation from the normal curves. By simulating different word-targeting strategies and comparing the simulated landing position distributions to empirical data, Reilly and O’Regan (1998) hoped to identify the most likely word-aiming strategy in reading.

Reilly and O’Regan (1998; 1998) hypothesized at least six potential word-targeting strategies, which fell into two categories – oculomotor strategies and linguistic strategies. The oculomotor strategies do not require any lexical processing in selecting the next word. They included (1) Random Control[10], (2) Word by Word (WBW), (3) Target long word (TLW), and (4) Skip short words (SSW). The linguistic strategies included (5) Skip high-frequency word (SHFW) and (6) Attention shift (AS). The first five strategies are self-explanatory based on their names. The AS strategy was the Rayner and Pollatsek (1989) version of the Morrison (1984) model without the Henderson and Ferreira (1993) deadline hypothesis.

Model architecture. All word-targeting strategies were simulated within the same basic framework and differed only in the strategy used for selection of the next target word. Like E-Z Reader, Reilly and O'Regan's model was implemented as a finite-state simulation program. There were three main modules in the model – a lexical system, an oculomotor system for generating refixations, and a saccade triggering system. Before going into details of the modules, let us first get a flavor of how the simulation worked.

[10] The Random Control strategy was not modeled because it was rejected outright as impossible.

At the onset of a fixation on a word, the lexical and the oculomotor systems worked in parallel. The latter would start to prepare a refixation by default. When the lexical process was completed, it would program a progressive saccade, the target of which was determined by the word-targeting strategy being modeled. When the refixation generation process finished, it would program a refixation. Eye-movement commands such as "move forward" or "stay" were taken by the saccade-triggering module, which handled the oculomotor details of saccade programming. Each programmed saccade took a random time to be triggered. Thus, during each fixation there was a competition between "move forward" and "stay," and the result depended probabilistically on the processing times of the three modules.

The above illustrates two interesting features of Reilly and O'Regan's (1998) simulation. First, although the goal was to simulate landing position distributions, processing times played the most significant role during the simulations. Thus, the Reilly and O'Regan simulations qualify as comprehensive eye-movement models. Second, the default-refixation mechanism is clearly reminiscent of the E-Z Reader model. In fact, despite the heated debates between the strategy-tactics theory and Morrison’s theory, they were remarkably similar when implemented as quantitative models, as will be seen in the following discussion of model details.

In the Reilly-O’Regan model, the average lexical identification time was a linear function of the logarithm of word frequency. It was also a function of the length of the currently fixated word and landing position eccentricity. Individual lexical access times followed a normal distribution, whose standard deviation was 1/10 of its mean (chosen for convenience).
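The competition between "move forward" and "stay" can be sketched as a race between two timed processes. This is a reading of the description above, not Reilly and O'Regan's program: the function names are hypothetical, the mean processing times passed in are arbitrary, each command is assumed to draw its own oculomotor delay, and the delay is assumed to be normal with the 150 msec mean and 50 msec standard deviation quoted in the model description, truncated at zero.

```python
import random

def simulate_fixation(mean_lexical, mean_refix_prep, rng,
                      delay_mean=150.0, delay_sd=50.0):
    """Return (outcome, fixation_duration_ms) for one simulated fixation."""
    # Processing times are normal with a standard deviation of 1/10 of the mean.
    lexical_done = rng.gauss(mean_lexical, mean_lexical / 10.0)
    refix_done = rng.gauss(mean_refix_prep, mean_refix_prep / 10.0)
    # Assumption: each programmed saccade draws an independent trigger delay.
    forward_at = lexical_done + max(0.0, rng.gauss(delay_mean, delay_sd))
    stay_at = refix_done + max(0.0, rng.gauss(delay_mean, delay_sd))
    if forward_at <= stay_at:
        return "forward", forward_at
    return "refixate", stay_at

def refixation_rate(mean_lexical, mean_refix_prep, n=10000, seed=0):
    """Proportion of simulated fixations that end in a refixation."""
    rng = random.Random(seed)
    outcomes = [simulate_fixation(mean_lexical, mean_refix_prep, rng)[0]
                for _ in range(n)]
    return outcomes.count("refixate") / n

# Hypothetical means (ms): fast lexical access versus slow lexical access.
easy_word = refixation_rate(mean_lexical=100.0, mean_refix_prep=300.0)
hard_word = refixation_rate(mean_lexical=300.0, mean_refix_prep=100.0)
print(easy_word, hard_word)
```

In this sketch the refixation rate rises as lexical processing slows relative to refixation preparation, which is the probabilistic dependence on processing times that the text describes.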

Refixations have a special importance in the Strategy-tactics theory[11]. The probability of refixation was a function of word length and eccentricity of landing position (McConkie et al., 1989). The time to prepare a refixation was a linear function of eccentricity (off-center fixations resulted in shorter refixation latencies) but was independent of word frequency. It was assumed to be normally distributed with a standard deviation of, again, 1/10 of its mean.

The time between programming and actually triggering a saccade – the oculomotor delay – was assumed to be a random variable[12] with a mean of 150 msec and a standard deviation of 50 msec, and was not affected by lexical or any other processes. The landing position of a saccade was a normally distributed random variable whose mean and standard deviation were determined according to the original McConkie et al. (1988) formulas:

$m = 3.3 + 0.49\,d, \quad \mathit{sd} = 1.318 + 0.000518\,d^{3},$

where d is the distance (in letters) between the launch site and the center of the intended word, which was effectively the PSL in the E-Z Reader 6 model.

Parameter estimation. Most parameters of the model were fixed. They were assigned

[11] Interestingly, Reilly and O'Regan (1998) did not specify where refixations are targeted. It is possible that, like inter-word saccades, they all aim at the center of words. However, O'Regan (1990) maintained that refixations tend to land on the opposite side of the launching site. There is no basis in Reilly and O'Regan to judge how this was implemented in their simulations.

[12] Reilly and O'Regan (1998) did not state the distributional form of the oculomotor delay. I assume it is a normally distributed random variable, just like all other random variables in the model.

either on the basis of previous findings or with convenient values. There were, however, a few free parameters, all of which were part of the word-targeting strategies. For example, in the Target Longest Word (TLW) strategy one had to determine the size of the visual field from which the "long" word would be picked. When there were one or more free parameters, Reilly and O'Regan (1998) picked some reasonable and convenient values and ran the simulation multiple times. There was little systematic parameter estimation.

Modeling results and model testing. Simulation materials were taken from the same text as in McConkie et al. (1988); only word length and frequency information were used. For each strategy, 20 trials were run with different random seeds. For each simulation, analyses similar to those of McConkie et al. were conducted. Simulated landing site distributions were subtracted from the hypothetical normal distributions for individual words. The authors looked at the patterns of discrepancies for each word-targeting strategy and searched for ones that were close to the empirical pattern.

Simulation results were reported mostly qualitatively. Reilly and O'Regan (1998) did not perform any statistical test to compare the fit of models based on different strategies because the strategies had different numbers of parameters and might not be readily comparable. The only quantitative measures of the models' goodness of fit with empirical patterns were correlation coefficients[13], along with statistical tests of whether each was significantly different from zero. Reilly and O'Regan relied heavily on the magnitude of the correlation coefficients to choose the most likely word-targeting strategy.
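Among the fixed components of the simulation were the McConkie et al. (1988) landing-position formulas quoted in the model description. A minimal sketch of how they generate landing sites, and of how often a saccade misses the intended word, is given below. Several things here are assumptions: the exponent on d is read as a cube, m is interpreted as the mean saccade amplitude measured from the launch site, and a landing is counted as off target when it falls more than half a word length from the word center. The function names are hypothetical.

```python
import random

def landing_parameters(d):
    """Mean and SD of saccade amplitude (letters) for launch distance d."""
    mean = 3.3 + 0.49 * d
    sd = 1.318 + 0.000518 * d ** 3  # assumption: the exponent is a cube
    return mean, sd

def off_target_rate(d, word_length, n=5000, seed=0):
    """Share of simulated saccades landing outside the intended word."""
    rng = random.Random(seed)
    mean, sd = landing_parameters(d)
    half_word = word_length / 2.0
    misses = 0
    for _ in range(n):
        amplitude = rng.gauss(mean, sd)
        offset = amplitude - d  # position relative to the word center
        if abs(offset) > half_word:
            misses += 1
    return misses / n

short_word = off_target_rate(d=7.0, word_length=3)
long_word = off_target_rate(d=7.0, word_length=9)
print(short_word, long_word)
```

Under these assumptions short words are missed far more often than long ones at the same launch distance, which is the kind of over/undershooting onto neighboring words that the simulations were designed to exploit.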

[13] The "concordance measure (rc)" in Reilly and O’Regan (1998) was a correlation coefficient. When there were free

Findings of the simulations were complicated and will not be reported here in detail. The Word-by-Word strategy was shown to fit the data poorly. As for Morrison’s Attention Shift model, Reilly and O'Regan concluded that there was not enough time to identify words in the parafovea with the attentional shift mechanism[14], and that the details of the AS model might need some revision[15]. Reilly and O'Regan (1998) favored the “Target the Longest Word” strategy. They concluded, “The results, therefore, suggest that the eye-movement guidance system does not generally use linguistic information, but exploits word-length information in the right parafovea to target the next saccade” (p. 316).

Comments. These conclusions, however, are highly suspect because of several methodological and conceptual problems. The first concern is whether Reilly and O'Regan's findings were robust. The effects they tried to model (deviations of fixation position distributions from normal distributions) were very small. Comparing models based on these statistics thus becomes very tricky. With an arbitrary simulation sample size of n=20, the statistical power of these tests is very questionable. In addition, the normal distribution hypothesis was a convenient

parameters and there was a "grid-search", rc's of all simulation trials were reported in a table.

[14] Reilly and O'Regan rejected an alternative explanation that the time estimates for word identification were too long. They argued that the lexical processing time estimates were based on those of Rayner & Pollatsek (1989, p. 176), which had been shown to be quite reliable and were supported by other sources. Without direct evidence, this argument does not seem strong. In fact, even if individual parameters of lexical processing time were accurately estimated, the overall time could still be an overestimate. See later discussion on the use of regression coefficients when independent variables are correlated.

[15] Reilly and O'Regan suggested adding contextual predictability to reduce lexical identification time, which, interestingly, was exactly one of the new features in Reichle et al.'s (1998) E-Z Reader models.

modeling choice[16] in McConkie et al. (1988). If the actual landing position distribution were slightly positively skewed (e.g., a lognormal distribution), it might well require a word-targeting strategy other than TLW to produce a pattern that would match the empirical data.

The second problem is the use of a correlation coefficient rc as the goodness-of-fit index. Given that Reilly and O'Regan were modeling a fairly small effect, all deviations would be close to zero and thus correlation coefficients would be expected to be low and variable. Choosing a model on the basis of absolute values of correlation coefficients, as Reilly and O'Regan did, is risky. There is no guarantee that a model with r = 0.34 is statistically better than one with r = 0.30. A better goodness-of-fit indicator is needed to evaluate Reilly and O'Regan's conclusions.

In addition, many modeling decisions were quite arbitrary. The assumption that processing times are normally distributed implies that fixation durations, the sum of the component times, would also be normally distributed. This contradicts the well-known fact that fixation durations, like reaction times, follow a positively skewed distribution that systematically differs from normal (McConkie, Kerr, & Dyre, 1994). Similarly, most of the parameters in the model were fixed to convenient values rather than being systematically estimated from data. A different set of values may yield a different conclusion.

At the conceptual level, it is unclear why readers would necessarily follow a single word-targeting strategy. It is conceivable that the eye may be attracted by a host of different features, such as word length, orthographic structure (Liversedge & Underwood, 1998), or the likelihood

[16] Reilly and O'Regan dismissed the choice of any distribution other than the normal as "unparsimonious."

of being identified parafoveally (Brysbaert & Vitu, 1998). There may also be individual differences in word-targeting strategies. If these possibilities hold, Reilly and O’Regan’s attempt to identify strategies is doomed to fail. A more fruitful approach seems to be to describe directly how readers actually target words in reading, instead of presupposing any fixed strategy.

Mr. Chips: The Ideal Observer

The ideal observer models take a different modeling approach from the previous ones. “An ideal observer is an algorithm that yields the best possible performance in a task that has a well-specified goal…” (Legge, Klitz, & Tjan, 1997, p. 525). In other words, an ideal observer model begins by specifying a goal and task constraints and tries to find an optimal solution. Its objective is not to describe human data but to compare human performance to that of the optimal algorithm. “The ideal observer provides an index of task-relevant information by showing the performance level that can be achieved when all of the information is used optimally. Comparison of human performance to ideal performance can establish whether human performance is limited by the information available in the stimulus or by information-processing limitations within the human” (p. 525).

Mr. Chips (Legge et al., 1997), a computer simulation program, attempted to identify the optimal strategy for saccade programming that minimizes uncertainty in word recognition. In the simple world Mr. Chips lived in, reading had one goal – to identify each and every word – and two constraints – the limited visual acuity of the retina and inaccurate control of eye movements. Mr. Chips attempted to “read” a word list with the minimum number of saccades and identify each word in order. This was achieved by carefully calculating the best landing position of the next saccade so as to minimize uncertainty in word identification. Its calculation was based on its

lexical knowledge, the (partial) information from its "retina," and characteristics of the oculomotor system. Note that Legge et al. did not try to simulate the temporal dimension of reading[17].

Model architecture. As shown in Figure 4, Mr. Chips had three main modules – the retina, the lexicon, and the oculomotor system. Mr. Chips' retina consisted of three regions: (a) high-resolution vision in which letters can be identified, (b) low-resolution vision (relative scotomas) in which spaces can be distinguished from letters but letters cannot be identified, and (c) blind spots (absolute scotomas) where there is no vision. Mr. Chips had a lexicon composed of the 542 most common words in written English, along with their relative frequencies. The reading materials (word lists) were randomly sampled from Mr. Chips' lexicon.

At the core of Mr. Chips was the algorithm for calculating and minimizing uncertainty about the current word. This was done in two steps. Based on the partial visual information from the retina (some identified letters and word length), Mr. Chips extracted from the lexicon a list of candidate words. If the list had more than one word (i.e., the word could not be uniquely identified), Mr. Chips would compute an entropy value, an index of the amount of uncertainty, based on the frequencies of the candidate words, for every possible landing position of the next saccade (most likely refixations) and select the movement that was most likely to identify the word. This is the "entropy-minimization principle" underlying the ideal-observer model.
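The entropy-minimization step can be sketched in a few lines. The sketch below is a strong simplification and not the Legge et al. (1997) implementation: it assumes a toy four-word lexicon with equal frequencies, a "retina" that reveals exactly one letter at the chosen position, and perfectly accurate saccades. The function names are hypothetical.

```python
import math
from collections import defaultdict

def entropy(weights):
    """Shannon entropy (bits) of a set of unnormalized frequencies."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)

def expected_entropy(candidates, position):
    """Expected remaining entropy if the letter at `position` becomes visible."""
    groups = defaultdict(list)
    for word, freq in candidates.items():
        groups[word[position]].append(freq)  # words sharing a letter stay confusable
    total = sum(candidates.values())
    return sum(sum(g) / total * entropy(g) for g in groups.values())

def best_position(candidates, positions):
    """Pick the position whose letter is expected to leave the least uncertainty."""
    return min(positions, key=lambda p: expected_entropy(candidates, p))

# Toy candidate list (invented); all words are consistent with a 3-letter string.
lexicon = {"cat": 1.0, "car": 1.0, "cot": 1.0, "cut": 1.0}
print(entropy(lexicon.values()))                 # uncertainty before looking
print([expected_entropy(lexicon, p) for p in range(3)])
print(best_position(lexicon, range(3)))
```

In this toy case the first letter is shared by every candidate and so carries no information, while the middle letter separates the candidates best, so the middle position is selected.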

[17] Legge, Klitz and Tjan (1997) did include a section discussing the "reading speed" of Mr. Chips, but this speed was

Like that of humans, Mr. Chips' saccade execution could be imperfect. In one version of the model, its saccade length followed a normal distribution. Mr. Chips had to incorporate this statistical information into saccade programming.

Parameter estimation. Because it is an ideal-observer model, Mr. Chips’ parameters were manipulated by the modeler rather than estimated from human data. For example, Legge et al. (1997) explored the effects of a smaller vocabulary size and an abnormal retina on reading saccade programming.

Modeling results. The virtue of an ideal-observer model is not how well it approximates behavioral data, but how it can help to understand human behavior. Several human eye-movement phenomena, such as refixations, regressions, and word skipping, emerged from following the simple entropy-minimization algorithm. Mr. Chips also showed an “optimal viewing position” – it tended to land on the third letter position of a word.

Interestingly, Legge et al. (1997) showed that the “eye-movement behaviors” of Mr. Chips could be characterized with a few simple heuristics, despite the complex internal mechanisms of the model. For example, Legge et al. (1997) demonstrated that almost identical performance could be obtained when only word length information was used. This is consistent with the finding in the reading literature that eye-movement guidance is primarily based on word boundary information (McConkie & Rayner, 1975; Rayner, 1986). Legge and colleagues also showed that Mr. Chips’ eye-movement strategies, such as the optimal viewing position effect, could be summarized by a set of simple if-then heuristics. Together these findings suggest that an

estimated from its saccade length by assuming an average 250 msec fixation duration.

eye-movement control system may achieve optimal reading performance without actually doing expensive entropy calculations or using high-level information.

Comments. The Mr. Chips model sheds light on some important issues in modeling eye movements. It demonstrated that eye movements could be described at a behavioral level separate from the underlying mechanisms. Another important insight is that simple discrete algorithms (“targeting word centers”) could achieve near-optimal performance compared to the costly “continuous” control (“minimizing entropy”). These became important design principles for my research.

Stochastic Models by Stark and Suppes

Two scholars, notably not mainstream reading researchers, have tried to describe reading eye movements with stochastic models (Stark, 1994; Suppes, 1990, 1994). Both of them chose to use Markov models (see the first chapter for a brief introduction) to capture the dynamics of eye movements.

Scanpath theory of reading. Based on his research on scanpaths (Hacisalihzade, Stark, & Allen, 1992; Stark, 1994; Stark & Ellis, 1981; Zangemeister, Sherman, & Stark, 1995), Stark (1994) proposed that the sequence of reading fixations could be modeled as a Markov process, or a “scanpath.” Stark proceeded by treating each word in a text as a possible state and describing reading as going through a series of states. The probabilities of jumping from one state (word) to another constituted a Markov transition matrix, and the transition matrix could fully describe the stochastic properties of reading fixation sequences. Furthermore, Stark introduced string-editing distance (Wagner & Fischer, 1974) as a measure of the similarity between two fixation sequences, which could be desirable for reading research.
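Both ingredients of the scanpath approach are easy to sketch. The code below estimates a first-order transition matrix from a fixation sequence by counting word-to-word transitions, and computes a string-editing distance between two sequences with the standard dynamic-programming recurrence. Estimating the matrix from relative frequencies and using unit costs for insertion, deletion, and substitution are assumptions of this sketch, not details confirmed for Stark (1994). The fixation sequences are invented.

```python
from collections import Counter, defaultdict

def transition_matrix(sequence):
    """Relative-frequency estimate of P(next word | current word)."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {state: {target: c / sum(row.values()) for target, c in row.items()}
            for state, row in counts.items()}

def edit_distance(a, b):
    """String-editing distance with unit costs, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if x == y else 1)
            current.append(min(previous[j] + 1,      # delete x
                               current[j - 1] + 1,   # insert y
                               substitution))
        previous = current
    return previous[-1]

# Invented fixation sequences, coded as the index of the fixated word.
reader_a = [0, 1, 2, 2, 3, 1, 2, 3, 4]
reader_b = [0, 1, 2, 3, 4]

matrix = transition_matrix(reader_a)
print(matrix[2])                        # where the eyes go after word 2
print(edit_distance(reader_a, reader_b))
```

With one state per word, the matrix grows with the square of the text length and most cells are empty, which makes concrete the concern raised about this formulation for reading.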

Comments on the scanpath model. Stark’s scanpath model has been largely overlooked in the reading research community. One of the reasons is that the way Stark formulated the Markov transition matrix originated from picture perception studies and might not be suitable for reading research. By setting each word as a state, Stark implied that the eye might jump from a word to any other word in reading. While this is possible, such wild saccades are very rare in reading. Compared to picture viewing, reading is a much more constrained task, in which the eyes almost always move to adjacent words. It is more intuitive to consider a more localized Markov process, in which the possible moves of the eye are limited to nearby words.

Suppes’ Stochastic model. Suppes' (1990, 1994) reading eye-movement control model provides a relatively comprehensive treatment of eye movements – modeling both fixation duration and saccade programming – and thus is discussed in more detail. The stochastic model was derived from Suppes’ earlier models of eye movements in doing multi-digit arithmetic (Suppes et al., 1983). The reading counterpart consisted of two increasingly complex models – the minimal-control model and the text-dependent probabilistic control (TDPC) model. In the minimal-control model, Suppes attempted to simulate fixation duration as a pure random variable that was not affected by on-going reading processes. In contrast, saccade direction and size were under complete cognitive control[18]. The minimal-

[18] Suppes (1990) was inconsistent about this. Despite the fact that (a) the axioms unequivocally showed that saccade targeting was determined by the underlying cognitive processing, and (b) he clearly stated that “direction and size of saccade are under cognitive control in this minimal model” (p. 466), Suppes maintained the following: “It was assumed that most of the process is an automatic low-level process, little disturbed by cognitive and linguistic aspects of reading. The two basic assumptions of the minimal control model were (a) durations of

control model did not cover many empirical findings; therefore a revised model, the TDPC model, was derived to “take into account the local variables that have the largest effects on eye movements” (p. 472). Because the revised model does not change the fundamental architecture of the “minimal control” model, the following discussion is primarily based on the initial model.

Model architecture. Suppes’ models were defined in terms of axioms, or fundamental hypotheses about the principles of eye-movement control. A system of axioms was then translated into mathematical functions, for instance, a distribution density function of fixation duration. Some of the axioms would undoubtedly surprise mainstream reading researchers. For example,

AXIOM F1. The execution time of each eye-control instruction is independent of past processing and the present stimulus context.

…

AXIOM D1. If processing is complete in a given region of regard, then move to the next word of text.

…

AXIOM D5. A saccade is independent of past motion and earlier stimuli.

With respect to fixation duration, the axioms implied that it should be a mixture of (a) an exponential random variable and (b) a convolution of two identical exponential distributions. For saccade programming, Suppes proposed a Markov model that was more intuitive than Stark’s scanpath formulation. He categorized saccade moves into five states: move forward, regress, refixate, skip the next word, and others. According to the axioms in the minimal-control

fixations are not affected by the content of the reading text, and (b) the length of saccades is not influenced by text context but only by the physical layout of the page” (p. 465).

model, saccade programming was a zero-order Markov process, also known as a “random walk.” At any point in time, the probabilities of making the five moves were constants, independent of previous states[19].

The revised TDPC model added only one change to the fixation duration axioms – the execution time of each eye-control instruction decreases monotonically along the line of text (Heller, 1982). Factors that have been central to other models, such as word frequency or syntactic effects, were dismissed as having “only relatively small effects” (Suppes, 1990, p. 473). More changes were made to the axioms for saccade control, incorporating the effects of the optimal viewing position, word length, and syntactic difficulty. However, these patches were added in such a haphazard fashion that it became impossible to evaluate the mathematical properties of the model.

Parameter estimation, model testing, and model comparison. The distribution of fixation durations was a fully parameterized mathematical model, which had been fitted to eye-movement data from Suppes’ arithmetic experiments. Models with the best-fitting parameters showed a “reasonably good” fit, but Suppes acknowledged that they would have been rejected by a formal goodness-of-fit test. He did not report the fitting of any reading data. There are reasons to believe that the fit would not be better than that of the arithmetic data[20].
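The two components of the minimal-control model described above can be sketched as a simple generator. All numerical values are invented, and two modeling details are assumptions of this sketch rather than statements about Suppes (1990): the exponential component and the two-stage component are allowed to have different rates (`rate_single`, `rate_double`), and the five moves are drawn independently on every step with fixed probabilities, which is how the zero-order process is read here.

```python
import random
import statistics

MOVES = ["forward", "regress", "refixate", "skip", "other"]

def sample_duration(rng, weight, rate_single, rate_double):
    """Mixture of one exponential and the sum of two identical exponentials."""
    if rng.random() < weight:
        return rng.expovariate(rate_single)
    return rng.expovariate(rate_double) + rng.expovariate(rate_double)

def sample_move(rng, probabilities):
    """Zero-order process: the move does not depend on any earlier move."""
    return rng.choices(MOVES, weights=probabilities, k=1)[0]

def simulate(n, seed=0, weight=0.3, rate_single=1 / 150, rate_double=1 / 120,
             probabilities=(0.60, 0.10, 0.15, 0.10, 0.05)):
    """Generate n (move, fixation duration in ms) pairs; all values invented."""
    rng = random.Random(seed)
    return [(sample_move(rng, probabilities),
             sample_duration(rng, weight, rate_single, rate_double))
            for _ in range(n)]

trace = simulate(20000)
mean_duration = statistics.fmean(duration for _, duration in trace)
forward_share = sum(move == "forward" for move, _ in trace) / len(trace)
# Theoretical mean under these settings: 0.3 * 150 + 0.7 * (2 * 120) = 213 ms.
print(round(mean_duration, 1), round(forward_share, 3))
```

Nothing in this generator refers to the text being read, which makes visible why the model could not accommodate covariates such as word frequency without structural changes.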

[19] Suppes (1990) was not consistent on the nature of the Markov process. While he clearly intended to promote a random-walk model (p. 467), a few axioms referred to an undefined concept of “processing.” Depending on the outcome of the processing, different saccadic moves might be taken. This violated the basic assumptions of a random-walk model.

[20] Suppes (1990) acknowledged that reading fixation durations were typically less variable than those in doing

Suppes did not develop the saccade control system in any depth beyond the five axioms. This part of the model was not explicitly expressed in a mathematical form. No quantitative test of the models was given in Suppes (1990; 1994). The choice of the TDPC model over the minimal control model was based solely on theoretical analyses.

Comments. Although it was an extremely limited attempt, Suppes’ (1990; 1994) work outlined the possibility of using Markovian models to describe reading eye movements, both fixation durations and saccades. An obvious problem with the Markov models of both Stark (1994) and Suppes is that they were not flexible enough to take into account other factors, such as word frequency. A Markov model with a hierarchical structure will be explored in the current research. In addition, Suppes’ model is one of the first attempts to explicitly model the distribution of fixation durations. Although it failed (McConkie & Dyre, 2000), it called much-needed attention to the importance of modeling not only the means but also their distributions.

Normal Eye Movements: McConkie and colleagues' mathematical modeling

The goal of McConkie and colleagues' research is best summarized by the title of McConkie, Kerr, and Dyre (1994) – “What are ‘normal’ eye movements during reading: toward a mathematical description.” Some of their representative studies include the modeling of landing position distributions (McConkie et al., 1988; Radach & McConkie, 1998), refixation frequencies (McConkie et al., 1989; Radach & McConkie, 1998), skipping rates (Kerr, 1992; McConkie et al., 1994), regressions (Vitu & McConkie, 2000; Vitu, McConkie, & Zola, 1998),

arithmetic; therefore an exponential-based model may not work well. Furthermore, the mixture distribution Suppes proposed typically shows two modes, but the distribution of reading fixation durations is usually unimodal.

and distributions of fixation durations (McConkie & Dyre, 2000; McConkie et al., 1994). Summarizing this line of research turns out to be difficult, because models for individual components are still evolving and pieces of the model have not been completely put together. Nevertheless, the central theme of this line of research is to mathematically describe regularities and constraints that are inherent in eye-movement data. Many of its findings have become the foundations of other modeling efforts (e.g., Reichle et al., 1998; Reilly & O'Regan, 1998).

McConkie and colleagues decomposed the problem of reading eye-movement control into two separate decisions: (a) where to move the eyes and (b) when to move them. With respect to the WHERE decision, a further distinction has been made between where the eyes are intended to go and where they actually land. Therefore there are three main components in McConkie and colleagues’ eye-movement control model: saccade target selection, saccade execution, and fixation duration control.

Saccade execution. McConkie et al. (1988) found that the distribution of landing positions of fixations relative to a word was a bell-shaped curve centered near the center of the word (see Figure 5A and 5B). The shape of the curve could be approximated with a normal distribution, whose mean and variance were functions of the launch site (planned saccade length, PSL, in Reichle et al., 1998) and word length, among other factors. McConkie et al. (1988) proposed that saccades were targeted at word centers but missed the targets because of two sources of error in the visuomotor system. A saccadic range error was responsible for the systematic overshooting of near targets and undershooting of faraway targets. A random placement error caused the random spread in landing positions. Together, the landing position distribution could be summarized with a linear regression function, as discussed in the E-Z Reader model and the Reilly and O’Regan

model. McConkie, Kerr, and Dyre (1994) concluded that landing position was not under the control of higher-level processes. McConkie et al. (1994) reported that the landing position distributions on pseudo-words or nonsense letter strings, embedded in continuous text, were essentially the same as those for normal words. This was further confirmed in Radach and McConkie (1998), who found that landing position distribution was affected by word length and word position in a line, but not by the duration of the previous fixation or the “informativeness” of the initial trigram of the next word. These findings suggested that saccade execution should be modeled independently from cognitive processes.

Saccade target selection. An essential assumption in McConkie and colleagues’ framework is that eye movements are targeted at the center of words when they are planned. Which words are selected to be the targets, then, becomes the key question. Three types of eye movements are particularly interesting – refixations, word skipping, and regressions.

1. Refixations. McConkie et al. (1989) examined the frequency of refixating a word immediately following the first fixation on it. Based on a large corpus of reading eye movements, they found that the frequency of refixation is a U-shaped function of the initial landing position on the word. The probability of making a refixation is higher if the eye lands near the ends of a word than at the word center. McConkie et al. concluded that the initial landing position is the primary determinant of refixations. In addition, Radach and McConkie (1998) analyzed landing positions as a function of launching site for both forward and regressive saccades and concluded that there is no evidence for different mechanisms, which questioned the basic hypothesis of the strategy-tactics theory (O'Regan, 1990).

2. Skipping. McConkie, Kerr, and Dyre (1994; see also Kerr, 1992) found that the frequency of skipping the next word could be expressed as a three-parameter function[21]:

$$p(\text{skip}) = 1 - \frac{\mathit{Max} - \mathit{Min}}{1 + e^{A \times \mathit{LaunchSite} - B}}$$

where Max is the maximum of the curve and equals 1, Min is the minimum value reached by the function, A controls how rapidly the function rises, and B is the inflection point of the curve. The parameter values depended on word length, as shown in Figure 6. McConkie, Kerr, and Dyre (1994) hypothesized a word-skipping mechanism based on the concept of a visual clarity threshold that must be met for a word to be skipped. The above equation could be interpreted as the proportion of words exceeding the threshold for a given distance (measured as launching site). Brysbaert and Vitu (1998) proposed a similar theory based on the “Extended Optimal Viewing Position (EOVP)” effect (Brysbaert, Vitu, & Schroyens, 1996), in which the eye guidance system constantly estimates the probability of recognizing a peripheral word within a typical fixation duration. The system would probabilistically skip words that were highly likely to be recognized by the end of the current fixation. Brysbaert and Vitu (1998) obtained a good fit to empirical skipping rate data with a one-parameter model. Determining whether or not to skip a word is only part of saccade programming. To complete the picture one needs to know how the saccade targeting system selects among many potential targets. Neither McConkie et al. (1994) nor Brysbaert and Vitu (1998)

21 McConkie, Kerr, and Dyre (1994) presented the equation in an equivalent but slightly confusing form:

$$p(\text{skip}) = 1 + \frac{\mathit{Min} - 1}{1 + e^{A \times \mathit{LaunchSite} - B}}$$
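The skipping function and its footnote form can be compared numerically. The following is only a minimal sketch: the function names and the parameter values are hypothetical, chosen for illustration, and are not the estimates reported by McConkie, Kerr, and Dyre (1994).

```python
import math


def skip_probability(launch_site, minimum, a, b, maximum=1.0):
    """Three-parameter skipping function from the main text.

    `maximum` defaults to 1, the value the text assigns to Max.
    """
    return 1.0 - (maximum - minimum) / (1.0 + math.exp(a * launch_site - b))


def skip_probability_footnote_form(launch_site, minimum, a, b):
    """The same function written in the form given in footnote 21."""
    return 1.0 + (minimum - 1.0) / (1.0 + math.exp(a * launch_site - b))


# Hypothetical parameter values, for illustration only.
example = skip_probability(launch_site=-3.0, minimum=0.05, a=0.8, b=-4.0)
```

With Max fixed at 1, the two forms differ only in how the numerator is written, so they return the same value for any launch site.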

addressed this issue.

3. Regressions. The phenomenon of regressions has been less well understood, in part because of the long-held belief that they were results of comprehension breakdown and thus should be excluded from analysis (e.g., Reichle et al., 1998). Most recently, McConkie and colleagues (Radach & McConkie, 1998; Vitu et al., 1998) have made some intriguing discoveries about regressions. Vitu et al. found that both low-level factors (e.g., the length of the previous saccades) and linguistic factors (e.g., word frequency of skipped words) affected the likelihood of regressing after a word is skipped. Their results indicated that the phenomenon is complex and is unlikely to have a single cause. Radach and McConkie (1998) looked at the question of whether regressions are generated by a different mechanism from that which produces other kinds of saccades. The analyses of launch site effects showed that there was little systematic range error in interword regressions (see Figure 7). Regressive refixations, on the other hand, showed the same range errors and random errors as forward saccades did. Their results indicated that the control of interword regressions was functionally different from that in making forward saccades or refixations.

Fixation duration. Early attempts to model the distribution of fixation durations have been incomplete and unsuccessful (Harris, Hainline, Abramov, Lemerise, et al., 1988; Suppes, 1990), in part because their model choices were mainly based on theoretical speculations (footnote 22). In

22 Suppes’ (1990) fixation duration model was derived from axioms that had no empirical support (at least in reading research). Harris et al. (1988) presumed that saccade latency involved two (independent) consecutive processes. This is logically possible, but there has been no experimental evidence to support it.

contrast, McConkie, Kerr, and Dyre (1994) and McConkie and Dyre (2000) emphasized the inherent constraints in the data. McConkie, Kerr, and Dyre (1994) studied the hazard function (footnote 23) of the first fixation duration distribution, and found it could be approximated by three piecewise linear functions – a slow-rising early piece, a fast-rising period, and a flat, constant tail. Their subsequent modeling effort capitalized on this characteristic form of a hazard function. Like Harris et al. (1988), McConkie, Kerr, and Dyre (1994) hypothesized a two-step process – ordering a saccade and executing a saccade. They further assumed that once a saccade was ordered, there was a random waiting time before the saccade was executed. The random waiting time was assumed to follow an exponential distribution (footnote 24). The time to order a saccade was modeled by a mixture of two Weibull components with linear, rising hazard functions (for

23 A hazard function, loosely speaking, characterizes the instantaneous probability of an event happening given that it has not yet happened. Formally, it can be defined as a function of the probability density function, f(t):

$$h(t) = \frac{f(t)}{1 - \int_0^t f(x)\,dx}$$

Luce (1986) demonstrates that, compared to the cumulative probability function or the probability density function, the hazard function is more readily interpretable and more sensitive in differentiating distributions.
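The definition in footnote 23 can be illustrated with the two distribution families mentioned in the main text. This is a sketch under stated assumptions: the helper names and all parameter values are hypothetical, and the Weibull shape of 2 is used only because it is the shape whose hazard rises linearly.

```python
import math


def hazard(pdf, cdf, t):
    """Hazard h(t) = f(t) / (1 - F(t)), following footnote 23."""
    return pdf(t) / (1.0 - cdf(t))


def exponential_hazard(t, rate):
    """Hazard of an exponential distribution, computed from its pdf and cdf."""
    return hazard(lambda x: rate * math.exp(-rate * x),
                  lambda x: 1.0 - math.exp(-rate * x),
                  t)


def weibull_hazard(t, shape, scale):
    """Hazard of a Weibull distribution, computed from its pdf and cdf."""
    def pdf(x):
        return (shape / scale) * (x / scale) ** (shape - 1) * math.exp(-(x / scale) ** shape)

    def cdf(x):
        return 1.0 - math.exp(-(x / scale) ** shape)

    return hazard(pdf, cdf, t)


# Hypothetical values (durations in msec), for illustration only.
flat = [exponential_hazard(t, rate=0.01) for t in (50.0, 100.0, 200.0)]
rising = [weibull_hazard(t, shape=2.0, scale=250.0) for t in (50.0, 100.0, 200.0)]
```

The exponential hazard is the same at every time point, whereas the shape-2 Weibull hazard doubles when the time doubles, which corresponds to the flat and rising pieces of a piecewise linear hazard function.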

24 Interestingly, in Harris et al.’s (1988) model, the exponential component, the “β-period,” corresponded to the wait-time for ordering the next saccade, not the executing time. McConkie et al.’s (1994; McConkie & Dyre, 2000) interpretation is problematic because a mechanism with exponential wait-time would be too unreliable to carry out saccadic movements, one of the most frequent movements in humans. In the reaction time literature there has been similar confusion, and the consensus now is that the exponential component corresponds to cognitive or signal processing rather than to the execution (see Luce, 1986).

discussion of the Weibull distribution, see Johnson, Kotz, & Balakrishnan, 1994). There was no theoretical reason to choose the Weibull distributions except that they characterized the empirical hazard functions. Putting the two steps together, the distribution of fixation durations (sum of ordering and executing times) was the convolution (footnote 25) of the two components. This “two-stage mixture” model fitted the empirical distribution very well, as seen in Figure 8, although no goodness-of-fit statistics were reported. Following this initial success, McConkie and Dyre (2000) explored two additional models – a “two-state transition” model and a “two-stage race” model. Although the three models, including the 1994 “two-stage mixture” model, have different assumptions about the underlying mechanisms that determine fixation duration, they were designed to closely mimic the piecewise linear hazard function of the empirical data. Consequently, they fit empirical data equally well. There was no evidence that one mechanism was more plausible than another.

Comments. While there has not been a unified model, this line of research has contributed much quantitative knowledge to our understanding of reading eye movements. The power of the data-driven modeling approach is self-evident as two competing models – the E-Z Reader 6 model (Reichle et al., 1999) and Reilly and O’Regan’s (1998) model – both implemented McConkie et al.’s (1988) formulas. With respect to saccade programming, McConkie et al.’s (1988) proposal of a two-level saccade control model has been widely accepted. In this hierarchical model, cognitive effects are

25 The distribution of the sum of two random variables is the convolution of the two distributions. Mathematically,

$$h_{g+f}(t) = \int_0^t f(x) \cdot g(t - x)\,dx$$
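The convolution in footnote 25 can be approximated on a time grid to show how a “two-stage mixture” density might be assembled. This is an illustrative sketch only: the grid step, the mixture weight, the Weibull scales, and the mean waiting time are hypothetical values, not the parameters fitted by McConkie, Kerr, and Dyre (1994), and the integral is replaced by a simple Riemann sum.

```python
import math


def weibull_pdf(t, shape, scale):
    """Weibull density; shape 2 gives a linearly rising hazard."""
    if t <= 0.0:
        return 0.0
    return (shape / scale) * (t / scale) ** (shape - 1) * math.exp(-(t / scale) ** shape)


def exponential_pdf(t, mean):
    """Exponential density for the random waiting time."""
    return math.exp(-t / mean) / mean


def convolve_densities(f, g, step):
    """Riemann-sum approximation of the integral of f(x) * g(t - x) over 0..t."""
    return [step * sum(f[j] * g[k - j] for j in range(k + 1)) for k in range(len(f))]


# Hypothetical settings: 1 msec grid up to 1500 msec.
STEP = 1.0
grid = [k * STEP for k in range(1500)]

# Time to order a saccade: a mixture of two Weibull components.
ordering = [0.3 * weibull_pdf(t, 2.0, 150.0) + 0.7 * weibull_pdf(t, 2.0, 300.0) for t in grid]
# Waiting time between ordering and executing the saccade.
waiting = [exponential_pdf(t, 40.0) for t in grid]

# Density of the sum of the two stages, i.e. the fixation duration.
fixation_density = convolve_densities(ordering, waiting, STEP)
```

Because the sum is a coarse approximation of the integral, the resulting density integrates to a value close to, but not exactly, 1; a finer grid reduces the discrepancy.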

confined to the level of selecting target words, and have only discrete control – selecting which word but not where in the word to land the eyes. The continuous nature of saccade length is a result of random and systematic errors, and saccade execution is conditionally independent of higher processes. This conceptualization greatly simplified the interpretation of saccade control in reading. The SHARE architecture is an extension of this probabilistic, hierarchical structure. McConkie et al.’s (1994; McConkie & Dyre, 2000) modeling of fixation duration distribution is also inspiring. The reason for their unprecedented successes is not a superior theory or mechanism, but their data-driven modeling approach – the choice of using piecewise linear models to estimate empirical hazard functions. This suggested that one might go a step further and question the only major a priori mechanism hypothesis in their models, the assumption of the saccade ordering and executing steps.

CHAPTER 3. DESIGN PRINCIPLES

The previous chapter surveyed some of the previous attempts to quantitatively account for reading eye movements. Their successes and failures illustrate some important issues that any quantitative model trying to describe reading eye movements has to address. A modeler has to make conscious decisions about them. The choices will constrain his or her modeling approaches. Eight such issues are presented below as dichotomies, although the choices are often neither mutually exclusive nor limited to two. They represent the decision process through which the current model has been shaped, and provide a framework for presenting the rationale for the basic modeling choices made in the research to follow.

Theory-driven vs. Data-driven Modeling

Rayner (1995, see Chapter 1) raised an important issue – do we need a theory of eye movements in order to measure and describe them? The question may be pursued in two senses: whether we should try to describe eye movements without subscribing to a particular theory, and whether we are able to do so. My response to the first question is that we should try to develop a theory-neutral descriptive framework for eye movements, to the extent we can. Current theories of reading eye-movement control – e.g., the strategy-tactics theory (O'Regan, 1990; O'Regan & Jacobs, 1992) and theories based on Morrison (1984; e.g., Rayner & Pollatsek, 1989) – are collections of hypotheses about the underlying mechanisms and processing. While these hypotheses are inspired by empirical findings, there is no evidence that any particular theory is indisputable. The field of reading eye movement research has not reached a stage where theories are well

established and few facts are left to be found. On the contrary, as some of the most recent studies suggest (e.g., McConkie & Dyre, 2000; Shillcock, Ellison, & Monaghan, 2000), we are just starting to discover some of the basic constraints and regularities of eye movements. At this point, our observations should not be limited and biased by existing theories and models. The extent to which we can describe reading eye movements without subscribing to a particular theory is an empirical question. The SHARE architecture is an attempt to model eye movements with a minimal number of assumptions about the underlying mechanisms and processes. The current research approaches the problem by analyzing the logical constraints for the modeling task, carefully selecting the mathematical model, and employing powerful algorithms to estimate model parameters. The goal of the model is to capture the “essence” of eye movement patterns so that it can reproduce eye movements with the same pattern, or predict the next fixation, among other things. What can we gain from an “atheoretical” model (footnote 26), assuming it does achieve its goal? First of all, such a data-driven modeling approach is just an extension of several lines of successful research looking for structures in the eye movement data. By using a more powerful

26 The term is used in contrast with a model based on a particular existing theory, in particular a theory that heavily emphasizes hypothetical mechanisms. There is no such thing as atheoretical modeling. Every mathematical operation imposes, explicitly or implicitly, structure and assumptions on the subject matter, and these assumptions are part of the theory. Consider, for example, why the model “1+1=2” fails to model the volume of a cup of sugar mixed with a cup of water, or what a better-fit model “1+1=1” (more correctly f(1,1)=1) reveals about the underlying mechanism of the above mixing process. The assumptions of the current model will be discussed in the rest of this chapter and the next chapter.

mathematical model (see discussion in Chapter 1) more should be learned about the inherent regularities in the data. Secondly, although the model does not hypothesize about the mechanisms, it tests whether a mathematical structure is adequate to describe some aspect of eye movements, which in turn constrains potential mechanisms. Last but certainly not least, the ability to faithfully describe eye-movement patterns will enable many applications of eye-movement methodology that were previously unavailable. In short, a data-driven modeling approach is a valuable way to contribute to our understanding of reading eye movements, and at the current state of knowledge it is a much-needed complement to the development of eye-movement mechanisms. The rest of the chapter discusses some of the important modeling decisions in choosing the modeling structures and tools.

Deterministic vs. Probabilistic Modeling

There is enormous variation in reading eye movements. One may try to account for every bit of the variation in a model, or assume at least part of the variation is due to random fluctuation. The models surveyed in the last chapter vary along this dimension. The READER model (Thibadeau, 1983; Thibadeau et al., 1982) exemplifies the deterministic approach, where variation in gaze duration was precisely determined by the intricate comprehension processes. At the other extreme, Suppes (1990; 1994) hypothesized that fixation duration was a pure random variable independent of any other factors. Most models took the middle ground, but the sources of random variance were introduced very differently. The noise in Reilly’s (1993) connectionist model was built into the neural network architecture and training. Both the E-Z Reader and the strategy-tactics models

introduced arbitrary (and different) random variance to lexical and oculomotor processes. This is particularly interesting for the E-Z Reader model, because Morrison’s original model was presented as a deterministic machine. Neither took the step of verifying that the model had probabilistic characteristics similar to the empirical data (footnote 27). In contrast, distributional properties of random components, such as means and standard deviation, were directly taken from McConkie and colleagues’ estimates (McConkie & Dyre, 2000; McConkie et al., 1988; McConkie et al., 1989). The most illuminating example on the issue of deterministic versus probabilistic modeling is Mr. Chips. The basic model was purely deterministic. Every move was carefully calculated to minimize lexical uncertainty. However, the outcome of the complex deterministic process could be modeled with surprisingly simple probabilistic heuristics. This suggests the strength of probabilistic modeling, even if there is a complex deterministic underlying mechanism. The current research employs a probabilistic framework.

The WHEN and WHERE Decisions

The WHEN and WHERE decisions refer to the mechanisms that determine fixation duration and saccade length, respectively. Not all models reviewed above considered both dimensions. Of those that did, the READER (Thibadeau et al., 1982) assumed a single mechanism – reading comprehension – determined both, whereas in Suppes (1990) the two

27 Reichle et al. (1998) showed figures of distributions of simulated and empirical fixation duration measures and claimed that they were similar without any quantitative support. The fittings were far from satisfactory compared to McConkie and Dyre’s (2000) work. The simulated distributions would almost certainly be rejected as appropriate models if any statistical analysis were performed.

decisions were completely independent. In both E-Z Reader and strategy-tactics models the two decisions were made through interactions between the lexical and the oculomotor systems. There is strong neurophysiological evidence that there exist two separate pathways, one carrying spatially coded information and the other conveying the triggering signal of saccades (e.g., van Gisbergen, Gielen, Cox, Brujins, & Schaars, 1981). Behavioral data also support the separation of the two pathways (Kingstone & Klein, 1993; Walker, Kentridge, & Findlay, 1995). These motivated Findlay and Walker (1999) to model the two pathways as a loosely coupled parallel system, in which cognitive factors may affect both pathways but via different mechanisms. Whether the WHERE and WHEN pathways are closely or loosely coupled systems has to be determined empirically. As a general architecture, the two pathways should be represented separately, while still allowing interdependencies between the two systems. On the other hand, a modular model, in which subsystems are only loosely connected, seems to be more desirable for model fitting and interpretation. Therefore, in the SHARE model the two pathways are implemented as separate subsystems that can be statistically dependent on each other. But the first model built on the basis of SHARE will assume they are conditionally independent subsystems. Whether or not they should be modeled as stochastically dependent processes is a question to be answered by the fit of the model to empirical data.

Linguistic vs. Low-level Variables

There is no doubt that eye-movement decisions are not independent of what is on the page. But whether eye movements are driven by high-level linguistic variables (e.g. word frequency and contextual predictability) or by low-level visual factors (e.g. word length and

landing position) is under theoretical debate. This is clearly reflected in the various quantitative models, each of which proposed some idiosyncratic set (including the empty set in the case of Suppes’ model) of variables that determine fixation duration and saccade targeting. The strategy for the SHARE architecture is to give all variables equal opportunities, and let data determine which variable is relevant to which eye-movement outcome. As a first step, the current implementation includes two relatively uncontroversial variables, namely the frequency of the currently fixated word and the length of the next word (see Rayner, 1998), which represent linguistic and low-level information, respectively. The model is not limited to these two variables, however. It is designed to make it easy to incorporate other variables without changing the fundamental structure of the model.

Time-series vs. Independent Data

Eye movements occur in order; therefore, they naturally constitute time-series data. Most eye-movement research tries to summarize eye movements using statistical models designed for independent samples, for example, by using composite variables and analysis of variance. However, unless one can prove eye movements are time-independent, they should be modeled as time-series data. In other words, the burden of proof is on those who treat eye movements as independent samples. There have been attempts to study the temporal relations of eye movements. Several studies calculated autocorrelations among eye movements and found them to be negligible (Andriessen & De Voogd, 1973; Hogaboam, 1983; Rayner & McConkie, 1976). However, a zero correlation coefficient does not guarantee statistical independence. There is empirical evidence that eye movements are not independent samples. For example, regressions are more likely to

occur after long forward saccades (Andriessen & De Voogd, 1973). McConkie et al. (1988; 1989) found that various aspects of an eye movement (e.g., probability of word skipping) depend on the characteristics of the previous eye movement (e.g., landing position and launch site). The survey of quantitative models leads to a similar conclusion. Although Suppes’ (1990) minimal-control model assumed that both fixation duration and saccade moves were independent, identically distributed random variables, all other models treated fixation duration and saccade length as time-dependent. In conclusion, there is no strong a priori reason to believe eye movements can be modeled as independent samples. Therefore, reading eye movements should be modeled as time-series data. On the other hand, most temporal connections proposed in the literature are relatively short term – in most cases between adjacent eye movements. This suggests a relatively simple stochastic model may be sufficient to capture these relations.

Discrete vs. Continuous Control

Eye-movement data – fixation duration and saccade length – are continuous, but that does not necessarily preclude the possibility that they were “intended” to be discrete. For example, Radach and McConkie (1998) argued that saccade programming is discrete. They suggested that saccades are targeted at word centers, and the spread of landing position is a result of errors in the oculomotor system (McConkie et al., 1988; O'Regan, 1990; Radach & McConkie, 1998; see also Rayner, 1998). The discrete-control model is in contrast with continuous-control theories (e.g., Liversedge & Underwood, 1998), in which eye movements are directly aimed at particular locations in words. Theoretical debates aside, the discrete-control conceptualization offers some advantages

from a modeling point of view. For example, it insulates the effects of cognitive factors from saccade execution details, so that the subsystems can be modeled separately. A discrete stochastic system is also easier to model than a continuous one, and is often more interpretable. One concern with the discrete-control approach is that the underlying mechanism may be truly continuous. The Mr. Chips model sheds some light on this issue. The Mr. Chips model was a strict continuous-control model, in which saccade length was meticulously calculated to maximize information. However, the saccadic “behaviors” could be well modeled as outcomes of a probabilistic, discrete control system in which eye movements were directed to the optimal viewing position of each word. Therefore, to the degree that descriptions of eye movements can be separated from the possible underlying mechanisms, a discrete-control model provides at least a good approximation of the eye movement outcome. Because of its relative simplicity and the likelihood that continuous data can be modeled via discrete underlying processes, it makes sense to begin with a discrete model of eye-movement control. While a discrete-control theory for saccade programming (McConkie et al., 1988) has been widely accepted, fixation duration has almost always been assumed to be under continuous control. Our survey shows that the most popular, unchallenged assumption is that fixation duration (e.g., first fixation duration or gaze duration) is a linear function of the logarithm of word frequency (Just & Carpenter, 1980; Reichle et al., 1998; Reilly & O'Regan, 1998; see also Rayner, 1998, for a review). In some quantitative models (Reichle et al., 1998; Reichle et al., 1999; Reilly, 1993; Reilly & O'Regan, 1998), it is also a continuous function of landing position (eccentricity), word length, and duration of the previous fixation.
In fact, there is empirical evidence hinting at a discrete control system in the WHEN pathway. Distributional analyses of fixation duration have shown that linguistic factors such as word frequency (McConkie, Reddix, & Zola, 1992) or semantics (Feng, Miller, Zhang, & Shu, 2001) tend to have strong effects on some fixations and little effect on others. These findings contradict traditional continuous-control models based on linear regressions (Reichle et al., 1998; Reilly & O'Regan, 1998; Thibadeau et al., 1982), which assume linguistic factors affect all fixations by changing the means of fixation durations. The clearest demonstration of the existence of different kinds of reading fixations is Yang and McConkie (in press), in which they experimentally manipulated the information readers could perceive at any given fixation using the eye-movement contingent display change technique (McConkie & Rayner, 1973). The manipulations to the text ranged from extreme (such as blanking the whole page or replacing a line of text with X’s) to modest (replacing text with non-words or filling all spaces with a symbol). Yang and McConkie found three categories of fixations (see Figure 9). The first group included short fixations (shorter than approximately 125 msec), which occurred even when all visual information was removed. The second group peaked at approximately 175 to 200 msec. These fixations did not require linguistic information but the content being fixated needed to be “text-like.” For instance, the positions of the peaks of these fixations were largely unaffected when a line of text was replaced with X’s but the spaces were preserved, but the distributions were severely suppressed when the spaces were removed. Lastly, there was a group of long fixations that peaked at around 350 msec and extended well beyond 700 msec in some cases. Corroborating evidence for the existence of three distinct types of fixations also came from oculomotor research. Gezeck, Fischer, and Timmer (1997) found, in simple saccadic

reaction time experiments, three distinct categories of fixations – “express” (90-120 msec), “fast regular” (135-170 msec), and “slow regular” (200-220 msec). Interestingly, the three peaks are at the same position for naive and trained subjects but the weights differ, with more express saccades for trained subjects. The positions of the peaks differed from those in Yang and McConkie (in press), not surprisingly given the task differences, but both strongly suggest the existence of different categories of fixations, each having distinct parameters and possibly responding to different information. To determine whether fixation duration in normal reading can be modeled with a discrete model, I fitted a mixture-of-lognormal model to fixation durations from a large dataset (details of the study are presented in Appendix B). The hypotheses are similar to the discrete-control framework for saccade programming. The mixture-of-lognormal model assumes a two-level fixation duration control system. At the “control” level, there are n discrete categories of fixations, each having different parameters (e.g., intended duration). For each fixation, the control system chooses the appropriate kind of fixation and sends the command to the “output” level. At the output level, the command is carried out but with random error added, which is assumed to follow lognormal distributions (the justifications are discussed in Appendix B). Thus, over the long run, the distribution of all fixation durations follows a mixture of lognormal distributions. To summarize the findings, the distributions of fixation durations can be very well fitted with a 3-component mixture-of-lognormal model. This model not only fits group data from children and adults, but also fits individual distributions (these results are presented in detail in Chapter 5). Most importantly, the parameters of the three classes of fixations are largely

consistent with the estimates from Yang and McConkie (in press). This suggests that the good fit achieved by the 3-component lognormal-mixture model is not coincidental. Based on McConkie et al. (1988) and the above fixation duration modeling study, both WHERE and WHEN pathways are modeled by a hierarchical probabilistic model, where eye-movement commands are discrete at the control level and random errors come into play at the output level.

Group vs. Individual Models

Individual differences (footnote 28) in reading eye movements are enormous, and they were probably the very reason why the eye movement method attracted early researchers (Buswell, 1937; Huey, 1908). The value of the eye-movement methodology, especially in reading education, largely depends on our ability to describe and understand these individual differences. Nonetheless, practically all models of eye movement control are designed to eliminate individual differences so as to model an “average skilled reader.” An understandable argument is that after the general mechanism is discovered, individual differences may be accounted for by simply adjusting some model parameters. Although this is not an unreasonable modeling approach, there is no sign that many of the existing models can be easily modified to accommodate individual differences. For example, in most of the models in the survey, the rules (e.g., the axioms in Suppes, 1990), mechanisms (e.g., familiarity check versus lexical completion in Reichle et al., 1998), and constraints (e.g., minimizing lexical uncertainty in Legge et al., 1997) are hard-coded. It is unlikely that the same rules, mechanisms, or constraints will apply to

28 The term “individual differences” is used loosely here to represent both inter-personal differences and intra-personal differences under different situations, e.g., reading for different purposes.
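The two-level mixture-of-lognormal model of fixation duration summarized under “Discrete vs. Continuous Control” can be sketched as follows. This is a minimal illustration, not the fitting procedure of Appendix B: the function names are hypothetical, and the weights, medians, and spreads of the three components are made-up values rather than the estimates obtained in this research.

```python
import math
import random


def lognormal_pdf(t, mu, sigma):
    """Lognormal density of a duration t > 0 (mu and sigma are on the log scale)."""
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(t, components):
    """Density of the mixture; each component is a (weight, mu, sigma) triple."""
    return sum(w * lognormal_pdf(t, mu, sigma) for w, mu, sigma in components)


def category_posterior(t, components):
    """Probability that a fixation of duration t came from each category."""
    joint = [w * lognormal_pdf(t, mu, sigma) for w, mu, sigma in components]
    total = sum(joint)
    return [value / total for value in joint]


def sample_fixation(components, rng):
    """Control level picks a category; output level adds lognormal error."""
    weights = [w for w, _, _ in components]
    _, mu, sigma = rng.choices(components, weights=weights, k=1)[0]
    return rng.lognormvariate(mu, sigma)


# Hypothetical components with medians of 100, 190, and 350 msec.
COMPONENTS = [
    (0.10, math.log(100.0), 0.25),
    (0.60, math.log(190.0), 0.25),
    (0.30, math.log(350.0), 0.30),
]

rng = random.Random(0)
simulated = [sample_fixation(COMPONENTS, rng) for _ in range(1000)]
```

Under these assumptions, `category_posterior` gives the kind of probabilistic classification of a single fixation that a discrete control level makes possible, while `sample_fixation` mirrors the control-then-output sequence described in the text.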

each individual under every circumstance. As a descriptive model, the current model is designed to be flexible – it can be used to describe group as well as individual eye-movement data. It imposes as few hard-coded constraints as possible so that it can be maximally flexible in accounting for variance in eye movements. In the meantime, its hierarchical framework helps to structure individual differences, captured in model parameters, in a meaningful way.

Descriptive vs. Predictive Applications

The original motivation for developing the descriptive model was to use it in a predictive application – detecting processing difficulties during reading. The idea was that if we could faithfully describe the different eye-movement patterns during normal reading versus reading difficulties, we would be able to predict whether the reader was experiencing processing difficulty based on a sample of his/her reading eye movements. Furthermore, if the diagnosis can be done accurately and quickly enough, it may be possible to provide real-time assistance to readers who experience difficulties in reading. There are several major obstacles in achieving this goal. Firstly, the eye-movement model has to be flexible enough to capture both normal and troubled reading. Most previous theories or models were unable to do this (e.g., E-Z Reader models excluded regressions). The current model is designed to be able to accommodate a wide range of eye-movement patterns. Secondly, prediction or diagnosis requires the model to be individualized; a set of predefined criteria will not fit all readers. This is especially critical because the application is intended for children, whose reading proficiency and eye movements vary substantially. In a real-world computer-assisted reading instruction setting, the system needs to quickly adapt to a

particular reader, preferably within a few practice trials. Learning a model from sparse and incomplete data is computationally challenging because parameter estimates become unstable and possibly biased. One of the most promising solutions to this problem is to incorporate prior domain knowledge to guide parameter estimation (Heckerman, 1998). For example, if the reader is a third-grade student, what we know about third-grade readers’ reading eye movements should be used to help estimate the parameters for this particular reader. Finally, a computer-assisted reading system needs to support probabilistic decision-making. Given a set of parameters and observed eye movements, it needs to probabilistically decide whether or not the reader was in trouble. Previous models do not have a mechanism to perform this task. The current model is designed to support such probabilistic classifications.

Choosing the Mathematical Tools

This chapter identified the goals and task constraints of the current model. The model attempts to summarize reading eye-movement patterns mathematically while being as neutral as possible about eye-movement control mechanisms. The eight design principles listed above have outlined its basic structure – a hierarchical, stochastic model that fully supports individualization and probabilistic decision-making. What mathematical tools will serve these needs? Markov models (see Chapter 1) are a natural choice for modeling stochastic processes (e.g., Bengio, 1999). Suppes (1990; 1994) used a zero-order Markov model (independent) for fixation duration and saccade targeting. Stark’s scanpath employed a first-order Markov transition matrix to describe reading fixation sequences. However, as discussed previously, classical Markov models have at least two limitations: they are only suitable for

61 modeling discrete events, and they do not allow the hierarchical structure necessary for modeling reading eye movements. In light of these problems, I chose to use the Hidden Markov decision tree (HMDT) model (Jordan, Ghahramani, & Saul, 1997; Jordan, Ghahramani, Jaakkola, & Saul, 1998). The HMDT is a marriage between a Hidden Markov model (HMM; Rabiner, 1989) and a Hierarchical Mixture of Experts (HME) model (Jordan & Jacobs, 1994). The HMM is a class of Markovian models known for its successful applications in automatic speech recognition. It is a two-layered (representing two random variables) probabilistic model that unfolds over time. The “state” variable is assumed to be unobservable and follows the classical Markov process (thus the term Hidden Markov); the “output” variable is observable and is conditionally independent of everything except for the concurrent value of the state variable. Temporal dynamics are captured by the discrete, unobservable state variable, whose value is probabilistically revealed by the observed output variable. For example, words are composed of phonemes; phonemes are discrete, abstract categories that are not directly observable, but they are probabilistically related to the observable acoustic waveforms. One way to do speech recognition is to model this relationship with an HMM, where phonemes correspond to the different states of the state variable, the output variable represents various acoustic features of the speech, and words are characterized by different state-transition probabilities, i.e., phoneme sequences. The goal of the HMM is often to probabilistically determine the most likely value or value sequence of the (unobservable) state variable from a given sequence of input, i.e., to “recognize” phonemes or words from the waveforms. In order to do this, the HMM has to be “trained” with training data to optimize model parameters – the

recognition accuracy clearly depends on the model’s ability to capture the statistical regularities in data.

The HME is a probabilistic decision tree model for classifying independent samples. Statistically it is closely related to the multinomial logit models, a special form of generalized linear models (GLIM; McCullagh & Nelder, 1983). In its simplest form (e.g., see the HME example in Murphy, 2001), the HME may be reduced to a piece-wise linear regression model. However, its power lies in the hierarchical structure, where there are multiple layers of “gating” variables, or “experts.” As the input goes down the hierarchy of “experts,” the data space is recursively divided, until at the end the final categorization is reached. Thus, HME outperforms pure linear models and other models in complex data clustering tasks (Jordan & Jacobs, 1994).

The HMDT architecture integrates the best features from both HMM and HME models. It may be viewed as an HMM with multiple “state” layers instead of one, which makes it possible to model more complex control mechanisms. Alternatively, it can also be seen as an HME with temporal structures, which allows it to model not only independent data but also time-series data.

The current model uses a three-layer HMDT model, which is also known as the Input-Output Hidden Markov model (IOHMM; Bengio, 1999; Bengio & Frasconi, 1996). In the IOHMM terminology, word frequency, word length, and landing position are “input” variables, and fixation duration and saccade length are “output” variables. Between the input and output layers is the eye-movement control layer, represented as the “state” variables in the IOHMM. Looking at the static structure, the generation of eye movement commands (word targeting and fixation categories) at the control layer is probabilistically affected by the linguistic and visual input variables, and the actual eye movements, the output variables, are probabilistically

controlled by the eye movement commands. In the temporal dimension, an eye-movement decision is probabilistically based not only on the current input variables but also on the previous eye movement.

It should be noted, however, that the SHARE architecture is not limited to one layer of eye movement control. For example, in order to model the fact that eye movement patterns are different when a reader experiences problems in reading, one may implement a four-layer HMDT model, in which a “cognitive state” node with two states – troubled and normal reading – is linked to the “control” layer described above. Such an implementation allows modeling of long-term changes in eye-movement patterns, in addition to the effects between adjacent eye movements. The modular, hierarchical structure of SHARE minimizes the effects of model extension on existing structures.

In addition to the HMDT structure, another important element of the SHARE architecture is the use of Bayesian methods for estimating model parameters and conducting statistical inferences. Unlike other commonly used methods such as maximum likelihood methods, Bayesian methods provide a natural way to combine prior knowledge and observed data during estimation (see Bernardo & Smith, 1994, for Bayesian theory in general, and Bengio, 1999, and Jordan et al., 1998, for an introduction to the use of Bayesian methods in stochastic modeling). At least two aspects of the Bayesian method are attractive for the current application. First, because the model will be fitted at the individual reader level, there may not be enough data to reliably estimate all parameters using traditional methods. By using prior knowledge (e.g., the distribution of parameters for third-grade readers), the Bayesian method is able to stabilize estimations and deal with missing data naturally. The other advantage of the Bayesian

method is that it provides a way to adapt a generic model to an individual. One may start with a model with parameter values based on the grade level, but as eye movements are collected, the model parameters may be updated using the Bayesian method and the model gradually and quickly becomes individualized. Few other methods provide flexibility like this.

To summarize, the objectives of the current research require a probabilistic description of reading eye movements, and the stochastic model based on IOHMM provides the mathematical tool for modeling. The architectural and computational details of the current model are discussed in the next chapter.
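The adaptation of a generic, grade-level model to an individual reader can be illustrated with a minimal sketch. It is written in Python rather than in the modeling environment used for this research, and it assumes a single Dirichlet-distributed probability vector over three fixation classes. The function name and all numbers are hypothetical and chosen only for illustration; the update rule itself (adding observed counts to prior pseudo-counts) is the standard conjugate result for a Dirichlet prior.

```python
def dirichlet_posterior_mean(prior_counts, observed_counts):
    """Posterior mean of category probabilities under a Dirichlet prior.

    prior_counts are pseudo-counts encoding prior knowledge;
    observed_counts are the counts seen for one individual reader.
    """
    posterior = [a + n for a, n in zip(prior_counts, observed_counts)]
    total = sum(posterior)
    return [p / total for p in posterior]


# Hypothetical grade-level prior over short/medium/long fixation classes,
# expressed as 20 pseudo-observations (assumed values, not thesis estimates).
group_prior = [6.0, 10.0, 4.0]

# No data yet: the best guess is the group average.
no_data = dirichlet_posterior_mean(group_prior, [0, 0, 0])

# A handful of fixations: the estimate moves only part of the way.
few_data = dirichlet_posterior_mean(group_prior, [1, 2, 7])

# Many fixations: the estimate is dominated by the individual's data.
many_data = dirichlet_posterior_mean(group_prior, [100, 200, 700])

print(no_data, few_data, many_data)
```

With no observations the estimate equals the group average; as more of the reader's fixations are observed, the estimate shifts toward that reader's own proportions.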

CHAPTER 4. SHARE: STRUCTURE, DYNAMICS, AND MODEL FITTING

SHARE, a stochastic, hierarchical architecture for reading eye-movement, is designed to mathematically describe reading eye movements. The rationales for choosing the IOHMM framework have been laid out in the previous chapter. The current chapter focuses on the specifications and the workings of the model.

Modeling Environment

The model was implemented using MATLAB, with the Bayes Net Toolbox (BNT; Murphy, 2001). BNT is an open source MATLAB package that supports graphical modeling (Jordan et al., 1998) and Bayesian inference (Bernardo & Smith, 1994; Heckerman, 1998), which are two crucial elements of the SHARE model. The source code for the SHARE model is available on request.

Modeling Data

The eye-movement data used for model fitting came from Miller & Feng (in prep.), in which English- and Chinese-speaking children (third- and fifth-graders) and adults (undergraduate students) were asked to read ordinary short stories on a computer screen. The current study focused only on the English data. There were 20 third-grade students, 26 fifth-grade students, and 30 adults, each reading 16, 18, and 27 pages of text, respectively. The stories were selected to be at the children’s age levels (third- and fifth-grade levels, respectively); adult readers read the children’s stories for comparison. Eye movements were recorded using the EyeLink system, a video-based system with a sampling rate of 250 Hz and a spatial resolution of 0.005°. Typical calibration-recalibration accuracy is approximately 0.5° to 1°. The default saccade detection algorithm in the system was

66 used. Eye-movement recording was binocular, but data from only the left eye were analyzed. Reading materials were presented on a 17-inch monitor in the standard VGA mode (640 x 480 pixels), 60-70 cm away from the reader. English materials were displayed in Espy Sans font, a font optimized for screen display. Each letter subtended an average of 0.31 visual degrees or 7.9 screen pixels. The whole dataset consisted of more than 140,000 fixations. Eye movement variables such as gaze location, fixation duration, and saccade length were recorded, along with relevant information such as word frequency (Francis & Kucera, 1982), word length (in letters and pixels), and landing position within words (in pixels). Structure of the SHARE Model A graphical representation of the SHARE model is shown in Figure 10. Each node in the graph represents a random variable. Nodes with rectangular boxes are discrete variables; nodes with oval boxes are continuous variables. Clear nodes represent observed variables; the shadowed box (FDC) represents a hidden variable. An arrow from one node to another shows that the latter variable is dependent on the former; the lack of an arrow between two nodes shows that the two nodes are conditionally independent. The circular arrows beside the ST and FDC nodes signify temporal dependency, i.e., the value of a node at time t depends on that at time t-1. There were eight nodes in the SHARE model, forming three layers. The top three nodes form the input layer. Three variables represented linguistic, low-level visual, and oculomotor input information to the eye-movement control layer. 1. FREQn is the word frequency (Francis & Kucera, 1982) of the currently fixated word. Numerous studies have shown that word frequency affects fixation durations and saccade

67 programming (see Rayner, 1998, for a review). For computational simplicity29, frequency was divided into three categories – less than 100 occurrences per million (L), between 100 and 1000 per million (M), and more than 1000 per million (H). The three categories had roughly equal sizes. Although the cut-off point for the low frequency category – 100 per million occurrences – was higher than that typically used for adult psycholinguistic studies (around 40 per million), it is more appropriate for third- and fifth-graders. 2. WLENn+1 is the word length of the word following the one currently fixated30. The length of the word in the right periphery has been shown to affect skipping rates (Kerr, 1992) and landing position (McConkie et al., 1988), among other eye-movement parameters. As with word frequency, word length was classified into three levels – less than 4 letters long (S), between 4 and 8 letters (M), and longer than 8 letters (L). By token or by type, there were more short words than long words in the reading materials. 3. ECCENn is the eccentricity of the current fixation relative to the fixated word. McConkie et al. (1988) and O’Regan (1990) have shown that refixation rate is a function of landing position. Fixations that land at or near word centers are less likely to result in refixations

29 In general, discrete variables are less computationally demanding in Bayesian network modeling. Although in the current study the cut-off points are more or less arbitrary and probably not optimal, the discrete variables should show qualitatively similar effects as the continuous ones. As the very first step it was more important to implement a simple but working model than to perfect all details. In the future continuous input variables may be used to avoid these arbitrary decisions.

30 In case the current word is the last word of a line, WLENn+1 is the length of the first word in the next line. Although psychologically return sweep planning may be different from that of normal saccades, no special

68 than are eccentric fixations. In the current model, ECCENn was a binary variable: eccentric (E) fixations were those that landed on the beginning or end quarter of a word; those that landed on the central two quarters were central (C) fixations. This served as a simplified measure for landing position effects. The middle layer is the eye-movement control layer, which includes the saccade targeting (ST), fixation duration class (FDC), and planned saccade length (PSL) nodes. The control layer receives information from the input variables and probabilistically determines the target of the next saccade and the category of the current fixation duration. These two eye-movement commands are passed to the output layer to generate actual eye movements. 4. STt is the saccade-targeting node. In the current model, it was assumed to be directly observable from data.31 It was modeled as a discrete variable with seven values, or “states,” representing seven different kinds of saccadic moves – (a) regress two or more words32, (b) regress one word, (c) refixate the current word, (d) move forward one word, (e) move two words forward, (f) move three words ahead, (g) move forward four or more words. Each state was associated with a probability, which was in turn conditioned on the values of the input variables

mechanism is implemented in the current model for simplicity.

31 It is a standard assumption that the word the eye lands on is the intended word. According to this assumption, the value of ST is directly observed. However, the assumption ignores the possibility that the eye missed the intended target because of oculomotor errors (McConkie et al., 1988). In the current model, I chose to ignore these cases during model fitting because it greatly simplified computation. These cases were dealt with in simulations.

32 Because only around 1-1.5% of saccades were regressions longer than 2 words, these were combined with 2-word regressions. For the same reason, forward saccades longer than 4 words (about 1% for children, 2% for adults)

69 and the previous value of ST (STt-1). In other words, the probability of each movement might go up or down depending on the current linguistic, visual, and oculomotor information, as well as the last saccadic move. The ST node achieved this by keeping track of all combinations of the input variables. Internally, it had a table of 3 (FREQ) x 3 (WLEN) x 2 (ECCEN) x 7 (STt-1) x 7 (STt) = 882 probabilities, 144 (2x2x1x6x6) of which were free parameters. How these parameters were adjusted during model fitting will be discussed in the next section. Modeling saccade targeting as a discrete, word-based process is consistent with McConkie et al. (1988; McConkie et al., 1994) and many other theories (e.g., O'Regan, 1990; Rayner & Pollatsek, 1989; Stark, 1994; Suppes, 1990; but see Legge et al., 1997, and Shillcock et al., 2000). Unlike models that assume a default word-by-word reading strategy (e.g., Morrison, 1984; O'Regan, 1990; Reichle et al., 1998), the current model assumes that each word within the window of ST node has a certain probability of being fixated, and the actual decision is made probabilistically. It also differs from the two previous Markov models (Stark, 1994; Suppes, 1990). In Suppes’ model WHERE and WHEN decisions were made independent of previous eye movements. The current model extends it to represent dependencies between consecutive eye movements. One problem with Stark’s model is that by making every word a potential target at any moment, the model has a necessarily large transition matrix that contains mostly near-zero probabilities, making probability estimation very difficult. In contrast, the current model uses a local representation – only words near the current fixation are considered, which allows more accurate estimation.

were combined with 4-word forward saccades.
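The table kept by the ST node can be pictured with a short sketch. It is written in Python rather than the BNT environment used for the model, and the state labels, the function names, and the uniform starting probabilities are assumptions made only for illustration. It shows how one row of seven move probabilities is stored for every combination of FREQ, WLEN, ECCEN, and the previous saccadic move, and how a move is drawn from the row selected by the current inputs.

```python
import itertools
import random

# Category labels follow the text; the move labels are hypothetical names.
FREQ = ("L", "M", "H")
WLEN = ("S", "M", "L")
ECCEN = ("E", "C")
MOVES = ("back2+", "back1", "refixate", "fwd1", "fwd2", "fwd3", "fwd4+")


def uniform_table():
    """One row of move probabilities per (FREQ, WLEN, ECCEN, previous move)."""
    row = [1.0 / len(MOVES)] * len(MOVES)  # placeholder values, not fitted ones
    keys = itertools.product(FREQ, WLEN, ECCEN, MOVES)
    return {key: list(row) for key in keys}


def sample_move(table, freq, wlen, eccen, prev_move, rng):
    """Draw the next saccadic move from the row selected by the inputs."""
    probs = table[(freq, wlen, eccen, prev_move)]
    return rng.choices(MOVES, weights=probs, k=1)[0]


table = uniform_table()
n_rows = len(table)                                  # 3 x 3 x 2 x 7 rows
n_probs = sum(len(row) for row in table.values())    # 7 probabilities per row

rng = random.Random(0)
move = sample_move(table, "L", "S", "C", "fwd1", rng)
print(n_rows, n_probs, move)
```

The row and entry counts reproduce the 126 input combinations and 882 probabilities described above; in a fitted model each row would hold reader-specific probabilities instead of the uniform placeholders.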

70 5. FDCt represents the fixation duration category of the current fixation. As shown in Appendix B, fixation duration could be modeled as a mixture of three lognormal distributions. The FDC node controlled the mixture rate. It was modeled as a discrete random variable with three states – short (S), medium (M), and long (L) fixation. The FDC was a hidden node because its state was not directly observable. Its value was probabilistically inferred (estimated) from observed fixation duration. Like the ST node, the probability of making a short, medium, or long fixation was conditioned on the input variables and the previous fixation duration category (FDCt-1). Internally, it kept a table of 3 (FREQ) x 3 (WLEN) x 2 (ECCEN) x 3 (FDCt-1) x 3 (FDCt) = 162 adjustable probabilities, 16 of which were free parameters. 6. PSLt is the planned saccade length, which is the distance (in pixels) from the current fixation location to the center of the intended word. It was modeled as a continuous random variable. It was an observed node during model fitting, because it was calculated from empirical eye-movement data. Therefore, the arrow between STt and PSLt should be ignored during model fitting. During simulations, it was computed based on the current fixation position and the coordinates of the intended word, which was determined by the value of the ST node. The arrow with a dotted line between STt and PSLt signifies this dependency during simulation. At the bottom of the figure is the output layer of the model, which includes SACCt and DURt nodes. They take commands from the eye-movement control nodes and “execute” eye movements. Both of the variables were continuous, corresponding directly to what would be measured by an eye-tracker. 7. SACCt is the saccade to be carried out at the end of the current fixation t. It is measured in pixels in the current model. A positive number corresponds to a saccade to the right

of the current fixation position. Normally this means a forward saccade, but under rare conditions it would also be a regressive saccade going from the beginning of a line to the end of the previous line. Conversely, a negative number typically means a regression, except for return sweeps, in which the eye goes from the end of a line to the beginning of the next. Following McConkie et al. (1988; see footnote 33), SACCt was assumed to follow a normal distribution, whose parameters – mean and variance – were determined by the STt and PSLt nodes. More specifically, mean(SACCt) = ai + bi * PSLt, and var(SACCt) = si, where i (i = 1..7) corresponds to the current state of the ST node, PSLt is the currently intended length of saccade, and ai, bi, and si are constants estimated during model fitting. In other words, the SACCt node kept a different set of parameters (ai, bi, and si) for each type of saccade move. Note that the current parameterization was a simplified version of McConkie et al.’s results (see footnote 34). In the current model no assumption about the variance for each saccade move (which determined the planned saccade length) was made; it was left for the model to learn from data.

33 Using the notations in E-Z Reader model (Reichle et al., 1998; see Chapter 2), McConkie et al.’s formula for landing position may be reformulated in terms of mean saccade length: Mean Saccade Length = PSL + (Ψb − PSL) ⋅ Ψm = Ψb ⋅ Ψm + (1 − Ψm) ⋅ PSL = a + b ⋅ PSL, where a = Ψb ⋅ Ψm and b = 1 − Ψm.

34 Some factors, for example word length, were not taken into account. In addition, McConkie et al. estimated that the variance of the landing position distribution was a cubic function of launch sites (PSLt). This cubic function is not implemented in the current model because the scatter plot in their paper (Figure 4) showed that the cubic trend was not strong.
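The linear-Gaussian form of the SACC node, and its relation to the reformulation in footnote 33, can be checked with a small sketch. It is written in Python for illustration only; the function names and every numeric value are hypothetical, not fitted parameters. The first function maps Ψb and Ψm to the intercept a and slope b, and the second draws a saccade length from a normal distribution with mean a + b * PSL and a given variance.

```python
import random


def mcconkie_to_linear(psi_b, psi_m):
    """Intercept a and slope b implied by PSL + (psi_b - PSL) * psi_m."""
    return psi_b * psi_m, 1.0 - psi_m


def sample_saccade(a, b, variance, psl, rng):
    """Draw a saccade length with mean a + b * PSL and the given variance."""
    return rng.gauss(a + b * psl, variance ** 0.5)


# Hypothetical values in pixels, chosen only to make the arithmetic easy.
psi_b, psi_m, psl = 60.0, 0.25, 100.0

a, b = mcconkie_to_linear(psi_b, psi_m)
mean_original_form = psl + (psi_b - psl) * psi_m
mean_linear_form = a + b * psl

rng = random.Random(1)
noiseless = sample_saccade(a, b, 0.0, psl, rng)   # zero variance returns the mean
noisy = sample_saccade(a, b, 25.0, psl, rng)      # one random draw around the mean
print(a, b, mean_original_form, mean_linear_form, noiseless, noisy)
```

In the model a separate triple (ai, bi, si) is kept for each of the seven states of the ST node; the sketch shows a single such triple.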

8. DURt represents the logarithm of the duration of the current fixation. DURt followed a normal distribution, with a different mean and variance for each state of the FDCt node. More specifically, mean(DURt) = ai, and var(DURt) = si, where i (i = 1..3) corresponds to the current state of the FDC node, and ai and si are constants estimated during model fitting. Over the long run the output of the DURt node would be a mixture of three normal distributions because of the three different sets of parameters. The exponent of the DUR variable, consequently, would follow a mixture-of-lognormal distribution, which has been shown to be a good model of the distribution of fixation durations (Appendix B). During model fitting the empirical fixation duration was first log-transformed. In the simulation the reverse transformation (exponential) was applied to the output values of the DUR node.

In addition to the nodes, the arrows in the figure were equally important to the structure of the model. They represented the direction of causality in the model (Heckerman, 1998; Pearl, 2000). In particular, the current model assumed that both WHEN and WHERE decisions were affected by the three input variables – FREQ, WLEN, and ECCEN. The strength of these factors was to be estimated from empirical data. From the control layer to the output layer, the current model assumed that the WHERE and WHEN pathways are (conditionally) independent. There was no arrow between ST and FDC nodes, ST and DUR nodes, or FDC and SACC nodes. The model also excluded any cross-pathway connections from fixation t to fixation t+1. These independence assumptions were

made to simplify model conception and computation. However, this did not imply that SACC and DUR nodes are independent. On the contrary, statistically and conceptually, saccade length and fixation duration in the current model were correlated because they both shared the same “parents” – the input nodes. If a close examination of the model shows that the empirical relation between saccade length and fixation duration cannot be captured by the current model structure, some of the independence assumptions may be relaxed.

Temporal Dynamics

SHARE modeled three kinds of variation in reading eye movements – (a) the inherent randomness of perceptual, cognitive, and oculomotor processes, (b) the variation of the current linguistic and other input, and (c) the time-dependency of the eye-movement process. The first two were captured by the hierarchical, probabilistic model structure. The time-series nature of eye movements was modeled with the temporal links (the two self-pointing arrows beside ST and FDC nodes) at the eye-movement control level. Like other arrows, the self-pointing arrows indicated that the state of the random variables (ST or FDC) at time t (see footnote 35) was dependent on that at time t-1. Conditioned on the input nodes, the ST and FDC nodes followed a first-order Markov model. The model used this short-term temporal dependency to approximate possibly complex time-series effects in eye-movement programming. Given that most temporal effects reported (e.g., the spill-over effect, optimal viewing position effects) are in fact confined to consecutive eye movements, it was expected that the first-order Markovian process should capture most of the temporal dynamics in reading eye

35 Eye movements were treated as discrete time events.

movements.

Model Fitting and Parameter Learning

Three features distinguished the fitting of the SHARE model from the modeling efforts reviewed previously. First, the model was completely individualized, which means that every parameter was adjusted so that the model best captured the reading eye movements of a particular reader. Wide ranges of individual differences in reading eye movements have been well documented for over a century. One of the goals of this research was to find a way to fully describe these differences. I did not attempt to construct age-group-average models because without an understanding of the differences between individuals, a group average model would be impossible to interpret.

In addition, the model parameters were not estimated from a set of statistics computed from eye movements, as all previous models have done. Instead, the present model was fitted to the raw data. In other words, every fixation and saccade a reader made was used to adjust, or “train,” model parameters. The goal of the model fitting process was to maximize the overall goodness of fit of the model. The goodness-of-fit index used here was the log-likelihood of the model, which is the logarithm of the probability of the data being produced by the model.

Finally, the Bayesian method was employed to achieve the above two goals. A critical challenge for fully individualized modeling is that there may not be enough data to reliably estimate all parameters. For example, the overall probability of making a 5-word forward saccade was often less than 0.05 for third-grade readers. If a child made 2,000 fixations, there would be fewer than 100 in this category. Further divide these 100 fixations by the number of combinations of FREQ, WLEN, ECCEN, and STt-1 nodes, which is 126, and some of the cells

75 were bound to be empty. Thus, estimating parameters of these cells would have been impossible. Conceptually, a sensible way to deal with this situation is to estimate them with group averages – when data from many readers are pooled together, hopefully these parameters become estimable. The Bayesian method is uniquely suited to implement this intuition. With the Bayesian method, we first impose a prior probability distribution, centered at the group average, over the parameter we want to estimate. The prior probability distribution represents our belief or knowledge about the value of the parameter. When there is no observed evidence regarding this parameter, the posterior probability distribution is simply the prior distribution, and our best guess in this case is the group mean. In addition to these trivial cases, the true power of the Bayesian method is its ability to estimate posterior probability distribution when there are limited observed data, in which case the combination of prior knowledge and empirical data narrows down the posterior distribution, resulting in accurate parameter estimation (see Bernardo & Smith, 1994, and Smyth, Heckerman, & Jordan, 1996, for the Bayesian methods). Therefore, in the current model priors were used in estimating all parameters. The fitting of an individual SHARE model involved two major steps – (a) specifying the prior distributions for each parameter and (b) looping through eye movements of a reader and adjusting the parameters according to the Bayes rule. Specifying prior distributions. Because the input variables FREQ, WLEN, ECCEN, and PSL (during model fitting) were observed, they were not estimated and therefore did not need priors. The prior distributions for parameters of the ST node were assumed to follow Dirichlet distributions (the most common prior distribution for discrete variables; see Bernardo & Smith,

1994, and Murphy, 2001). The parameters of the Dirichlet distributions were determined in the following way. First, the overall probabilities of the seven saccadic moves (see previous discussion on the ST node) were calculated over the whole age-group dataset (see footnote 36). This set of probabilities was replicated 126 times to fill all combinations of FREQ, WLEN, ECCEN, and STt-1, and these 882 probabilities were set as the parameters of the 126 Dirichlet distributions. The above steps defined our a priori knowledge about the individual reader – we assumed that the reader was an average reader of his/her age group, and that none of the input factors had any effects on his/her saccade programming.

The prior distributions for the FDC node were also Dirichlet distributions, but their parameters were estimated differently from those of the ST node because FDC was unobservable. The first step was to estimate the overall probabilities of making short, medium, or long fixations. This was done by fitting the reader’s fixation duration to a simple Gaussian-mixture model (McLachlan & Peel, 2000), as in Appendix B (see footnote 37). Once the personalized overall probabilities were estimated, they were copied 54 times to fill all combinations of FREQ, WLEN, ECCEN, and FDCt-1, and these 162 probabilities were set as the parameters of the 54 Dirichlet distributions. This was equivalent to the assumption that neither the input variables nor the previous state of the FDC node had any effect on the current state of FDC.

There were three parameters for the SACC node – the intercept (ai), the slope (bi), and

36 These simple probabilities would be all the information necessary for a zero-order Markovian minimal-control model (Suppes, 1990, 1994).

37 Note that the fitting of the Gaussian-mixture model itself involved Bayesian modeling, where its prior was set to a Dirichlet distribution with group averages as parameters.

the variance (si), all of which were conditioned on the state of STt. The SACC node itself followed a normal distribution whose mean was determined by ai, bi, and PSLt. The priors were assumed to follow normal-gamma distributions (the most common prior distribution for normal-distributed random variables; see Bernardo & Smith, 1994). The initial values of ai, bi, and si were estimated using regression analyses of all eye-movement data from the appropriate age group. For example, to obtain estimates of the intercept, slope, and variance for refixations, all refixations in the age group were entered into the regression model SACC = a + b * PSL, and the estimated parameters were used as the parameters for the prior distribution for refixations.

Finally, the DUR node also followed a normal distribution, but its parameters were assumed to be “clamped,” meaning that they were fixed and were not adjusted during model fitting. The reason to clamp the parameters was to be consistent with the 3-component lognormal-mixture model (Appendix B). If the mean and variance were allowed to change under different combinations of the input variables, the resulting distribution of the DUR node would be a mixture of many normal distributions rather than a 3-component normal mixture. The values of the parameters (means and variances) were estimated as a by-product of estimating the prior distribution for the FDC node. In fitting the (personalized) Gaussian-mixture model, the mixture rate was used as the prior for FDC, and the estimated mean and variance for each component normal distribution were set as fixed parameters for the DUR node. Therefore, although the DUR parameters did not change in model fitting, they were still fully individualized.

Bayesian parameter estimation. Once the priors were set, the model was ready to be

trained with empirical eye-movement data. An exact inference version of the Boyen-Koller inference algorithm for dynamic belief networks (Boyen & Koller, 1998a; Boyen & Koller, 1998b; see Murphy, 2001, for implementation details) was used. The technical details of the algorithm will not be discussed here. Conceptually it looked for the maximum posterior probability solution given the prior distribution and data (Cowell, 1998a; Cowell, 1998b; Heckerman, 1998). The iterative algorithm stopped when the improvement of the goodness-of-fit index – the log-likelihood – was under a threshold. The chance of stopping at a local maximum instead of the global optimum was minimized both by the use of reasonable prior distributions and by multiple (3) runs with different random seeds in estimating the Gaussian-mixture model.

Model Adequacy and Comparison

From the perspective of an empirical researcher, it is natural to ask the question of whether a model is adequate. However, the question is difficult to answer in the absolute sense. Statistically, it is more sensible to compare the relative goodness of fit of different models. The ultimate answer to the question depends on one’s goals. The adequacy of the SHARE model is addressed in two ways. First, compared to various reduced versions of the model, the complete and trained model gained significantly in likelihood ratio tests. The improvements were examined separately for the WHEN and WHERE pathways, because they were conditionally independent and the overall log-likelihood was the sum of the log-likelihood indices for the two channels. Likelihood ratio tests were performed for each individual and the following findings held for each individual reader. For the WHEN pathway, there was a statistically significant gain in goodness of fit of the

simple Gaussian-mixture model when the parameters were individualized. When the Gaussian-mixture model was further compared with the full SHARE model (WHEN pathway only) that took into account the input variables and temporal dynamics, there was a statistically significant gain by the latter. Similarly, the complete WHERE pathway was shown to be statistically superior to a model that assumed no individual differences, no effects from the input variables, and no temporal connections. Together, these results suggest that the more complex structure in SHARE is necessary to account for reading eye-movement data, and that its performance was better than that of some simple models of eye movements.

Because the emphasis of the present research is to establish the SHARE architecture, a comprehensive analysis of the model is beyond the scope of the current report. Future studies will address some important issues, such as the relative contribution of different input variables to the two pathways and whether some interaction between the two pathways would improve model fit. The next chapter will focus on simulation studies of the SHARE model and compare the eye-movement behaviors of the model to those of real readers.
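As an illustration of how the initial values of a_i, b_i, and s_i described earlier in this chapter could be obtained, the following is a minimal sketch of an ordinary least-squares fit of SACC = a + b * PSL for one saccade category. The function name `fit_sacc_prior` and the toy numbers are hypothetical and are not taken from the thesis; the sketch only shows the regression step, not the normal-gamma prior itself.

```python
def fit_sacc_prior(psl, sacc):
    """Least-squares fit of SACC = a + b * PSL for one saccade category.

    psl, sacc: equal-length lists of planned and actual saccade lengths
    (hypothetical inputs). Returns (intercept a, slope b, residual variance s).
    """
    n = len(psl)
    mean_x = sum(psl) / n
    mean_y = sum(sacc) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in psl)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(psl, sacc))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Unbiased residual variance (two regression parameters were estimated).
    residuals = [y - (a + b * x) for x, y in zip(psl, sacc)]
    s = sum(r * r for r in residuals) / (n - 2)
    return a, b, s


# Toy data lying exactly on the line SACC = 1 + 2 * PSL.
a, b, s = fit_sacc_prior([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b, s)
```

In this sketch one such fit would be run per saccade category (for example, once on all refixations in an age group), and the resulting triple would serve as the starting point for that category.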

CHAPTER 5. SIMULATION RESULTS

The Markovian structure of the SHARE model is well suited to running simulations. The model took a text, coded in terms of word frequency, word length, and the x-coordinates of the beginning and end of each word, “read” through it according to its parameters, and stopped reading when it reached or passed the last word of the text. In the simulation study, each individualized model read through the same texts that the corresponding human reader had read. Eye-movement characteristics of the reader and the model were compared.

Simulation Method

Materials. Preparing reading materials for the model was straightforward. Each word in the texts used in Miller and Feng (in prep.) was coded with four variables – FREQ, WLEN, x1, and x2. The latter two marked the horizontal position of the word in screen coordinates [38]. FREQ and WLEN were defined in the last chapter (see Figure 10).

Procedures. Model parameters of the particular reader were loaded. For each trial, the model was assumed to always start with a fixation on the first word. Other parameters were initialized as follows [39]: ST_{t=0} was set to “forward 1 word,” FDC_{t=0} was set to a medium fixation, and ECCEN_{t=1} was “central fixation.” With these initial values and the values of the input variables FREQ_{N=1} and WLEN_{N=2}, SHARE was able to find the appropriate ST_{t=1} and FDC_{t=1}. For example, the probability that ST_{t=1}=x was the

[38] In the Miller and Feng (in prep.) study there were multiple lines of text per screen. However, the y-coordinates are not interesting in reading except for distinguishing lines. They were trivial to model and were not included here.

conditional probability:

P(ST_t=x | ST_{t-1}=ST_{t=0}, FREQ=FREQ_{N=1}, WLEN_{N+1}=WLEN_{N=2}, ECCEN=ECCEN_{t=1})

All combinations of these probabilities were estimated and stored internally in a parameter table in the ST node. Therefore, finding P(ST_{t=1}=x) was simply a table lookup with the values of the input variables and the previous ST as indices. The procedure for finding P(FDC_{t=1}=x) was similar.

The next step was to generate eye-movement commands. The value of the ST node was randomly generated from the discrete distribution P(ST_{t=1}=x), where x was one of the seven possible saccadic moves. The resulting random sample indicated the target word for the next saccade. Similarly, the value of the FDC node was also randomly generated; it gave the category of the current fixation duration.

An additional step in the WHERE pathway was to calculate the value of the PSL node. The planned saccade length was the displacement between the current position of the fixation and the center of the targeted word, as indicated by the current value of ST. The calculation of PSL from ST was completely deterministic.

Next, the eye-movement commands were passed down to the WHEN and WHERE pathways for execution. For the WHEN pathway, the conditional mean and variance of the DUR node, given the current FDC value, were retrieved from the table of parameters stored in the DUR node. Then a random sample was drawn from the normal distribution specified by the conditional mean and variance. The exponential of this random sample was the duration of the current fixation.

The processes in the WHERE pathway were similar. The SACC node was also

[39] Here, t indexes the current fixation, and N represents the current word number.

assumed to be a normally distributed variable, whose mean and variance were determined by the current values of the ST and PSL nodes: mean(SACC_t) = a_i + b_i * PSL_t, and var(SACC_t) = s_i, where i is the current value of ST, and a_i, b_i, and s_i are parameters associated with i that were estimated during model training. The actual saccade length was a random sample from the normal distribution specified above.

At this point the first fixation on a page had terminated and the first saccade had been made, and some information needed to be updated. Now, t=2, and N = N + ST_{t=1} (i.e., the current word was set to the targeted word; see below for exceptions). ECCEN_{t=2} was computed as specified in the last chapter. The FREQ and WLEN_{N+1} values were also updated. With all values of the input nodes updated and the past values of the ST and FDC nodes available, the model was ready to repeat the above process and generate the next fixation duration and saccade move. The process would iterate until the targeted word in the ST node was beyond the last word in the text.

Problems arose when the difference between PSL and SACC was so large that the next fixation would land on a word other than the targeted word. In this case the model simply took the fixated word as if it were the targeted word, and calculated ECCEN, FREQ, etc. based on the actual fixated word. Other treatments were possible but not explored here. If, after a regression, the “eye” was sent to a word before the first word, it was simply redirected to the first word.

Ten simulation trials were run for each of the 76 individualized models, including both children and adults, with different random seeds. The “eye movements” and the corresponding word information were recorded for further analyses.
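The procedure above can be sketched as a single simulation step. This is a minimal illustration under stated assumptions, not the thesis implementation: the function and argument names are hypothetical, the ST and FDC distributions are assumed to have already been looked up from the parameter tables for the current inputs, and all numbers in the usage example are invented.

```python
import math
import random


def sample_discrete(probs, rng):
    """Draw one value from a {value: probability} table."""
    r = rng.random()
    cumulative = 0.0
    for value, p in probs.items():
        cumulative += p
        if r < cumulative:
            return value
    return value  # guard against floating-point rounding in the last bin


def simulate_step(x_now, n, words, st_probs, fdc_probs, sacc_params, dur_params, rng):
    """One fixation and one saccade.

    x_now: current horizontal fixation position; n: index of the current word;
    words: list of (x1, x2) word boundaries; st_probs, fdc_probs: distributions
    already retrieved by table lookup; sacc_params[st] = (a, b, s);
    dur_params[fdc] = (mean, variance) of log duration.
    Returns None when the target lies beyond the last word.
    """
    st = sample_discrete(st_probs, rng)    # saccade target, in words
    fdc = sample_discrete(fdc_probs, rng)  # fixation duration category
    target = max(n + st, 0)  # a regression past the first word is redirected to it
    if target >= len(words):
        return None
    x1, x2 = words[target]
    psl = (x1 + x2) / 2.0 - x_now  # planned saccade length (deterministic)
    a, b, s = sacc_params[st]
    sacc = rng.gauss(a + b * psl, math.sqrt(s))  # executed saccade length
    mu, var = dur_params[fdc]
    dur = math.exp(rng.gauss(mu, math.sqrt(var)))  # duration is lognormal
    return st, fdc, psl, sacc, dur


rng = random.Random(1)
words = [(0, 40), (50, 90), (100, 160)]
step = simulate_step(
    x_now=20.0, n=0, words=words,
    st_probs={0: 0.2, 1: 0.6, 2: 0.2},
    fdc_probs={"short": 0.1, "medium": 0.7, "long": 0.2},
    sacc_params={0: (0.0, 1.0, 4.0), 1: (0.0, 1.0, 9.0), 2: (0.0, 1.0, 16.0)},
    dur_params={"short": (4.2, 0.05), "medium": (5.3, 0.04), "long": (5.7, 0.06)},
    rng=rng,
)
print(step)
```

A full simulated trial would call such a step repeatedly, updating the current word, the fixation position, and the input variables after each saccade, and treating the actually fixated word as the target whenever the executed saccade lands elsewhere.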

Distributions of Fixation Durations

The upper left panels of Figures 11-1 through 11-76 (one for each participant) show the frequency distributions of empirical and simulated fixation durations. Note that the simulated frequencies were divided by the number of simulation trials (10) so as to be on the same scale as the empirical figures. In general, the simulated data appeared to follow the empirical distributions closely and were responsive to individual differences.

A formal statistical test of the hypothesis that two distributions are identical is the Kolmogorov-Smirnov (K-S) test (Birnbaum, 1952; Conover, 1999; Hall & Wellner, 1980). The K-S test involves calculating a critical value, w_{1-α}, which is a function of the significance level α. If, at any point along the distribution, the cumulative distribution function of another distribution is more than w_{1-α} away from that of the sample distribution, we reject, at significance level α, the hypothesis that the other distribution is the population distribution of the sample. For large n, Hollander & Wolfe (1999) introduced an approximation formula:

$$w_{1-\alpha} = \sqrt{\frac{-\ln(\alpha/2)}{2n}}.$$

For α=0.05 and n=1000 (most readers have between 1,000 and 2,000 fixations), w_{1-α} is approximately 0.043.

The K-S test can be carried out visually. The lower left panels of Figures 11-1 through 11-76 show the cumulative distribution functions of empirical and simulated fixation durations. A vertical bar at the top-left corner of each figure shows the magnitude of w_{1-α} for that particular reader. If the vertical difference between the two cumulative distribution functions exceeds the length of the bar, SHARE is not a statistically adequate model of fixation duration. In fact, none

of the 76 individual simulations differed statistically significantly from the empirical data.

Distributions of Saccade Length

The empirical and simulated distributions of saccade length were compared for each individual model with the same procedures as for fixation duration. Frequency distributions for saccade length are shown in the upper right panels of Figures 11-1 through 11-76. The simulated frequency distributions appear to fit the empirical data fairly well. The model was able to generate return sweeps as well as progressive and regressive saccades in approximately correct proportions.

Cumulative distribution functions are shown in the lower right panels of Figures 11-1 through 11-76. The small vertical bar at the center-top of each figure represents the magnitude of w_{1-α} for the reader. The simulated distributions (the smoother curves) also appear to follow the empirical distributions closely. However, the K-S tests showed that for each of the 76 readers the simulated distribution was statistically significantly different from the empirical one.

Three systematic discrepancies were apparent in the frequency distributions and cumulative distributions. First, the model sometimes failed to show the dual-peak structure near zero in some of the empirical data. The saddle around zero indicated that readers were unlikely to make very small saccades. This is consistent with O’Regan’s (1990) finding that a refixation tends to land on the opposite side of the word from the previous landing position. Interestingly, however, not every reader showed the fine structure (adult readers were more likely to show the saddle), and the model was able to demonstrate the saddle in some cases. This suggests that the model was able to capture the phenomenon, but some of the parameters were probably not optimized.

In addition, the model slightly but consistently overestimated the longest saccades, which

was evident in the lower right panels of Figures 11-1 through 11-76, where the simulated distribution function was consistently lower than the empirical curve near the top of the chart. The likely cause of the problem is that the variance parameter in the SACC node was overestimated for the “four words or longer forward saccade” category. This category had relatively few but very heterogeneous cases, which tended to lead to an unstable variance estimate. It is also possible that in these cases the landing position distributions no longer follow a normal distribution but instead a skewed distribution. This would also lead to elevated variance estimates under the normal distribution assumption. Future research is needed to explore ways to model this heterogeneous category using non-normal distributions.

Lastly, the model predicted a small but visible number of between-line regressions – extremely long saccades involving regressions from the beginning of a line to the end of the previous line. These saccades did occur in the data, but were not as frequent as the model suggested. Between-line regressive saccades did not require any special mechanism in the present model. The ST node generated a regression command without knowing in which line the word was located. If the target happened to be in the previous line, a long, between-line regression was generated. Thus, according to the current model, the frequency of this type of regression was no different from that of regular regressions. However, the empirical data suggest that the frequency of between-line saccades is lower than expected. In the future, information such as line number may be added to the model to suppress these between-line regressions.

SHARE in Conventional Eye-movement Measures

To relate the SHARE model to traditional eye-movement theories, and to demonstrate its ability to capture moment-by-moment processes, the following analysis compared simulated and

empirical eye movements in terms of some conventional eye-movement measures. The structure of the analysis was intentionally borrowed from the E-Z Reader modeling (Reichle et al., 1998). Reichle et al. classified words into five frequency categories and summarized eye-movement data using six word-based measures. The measures were: (a) first fixation duration, (b) single fixation duration, (c) gaze duration, (d) skipping rate, (e) the probability of making a single fixation on the word, and (f) the probability of making two fixations on the word.

In the current analysis, the same procedure was followed, except that word frequency was coded in three levels (as part of the model specification) instead of five. But instead of predicting one set of group means, the current model had to predict 76 sets of individual means. This was a more stringent test because the model needed to accommodate a wide range of individual differences – from beginning readers to adults. The added degrees of freedom also made the results more interpretable, in view of the collinearity problem in these measures (see Appendix A).

Figures 12 through 17 compare the simulated and empirical values of the six measures. Each point represents an individual mean. As seen in the figures, not only were the empirical and simulated values highly correlated, but the model also reproduced the absolute values with reasonable accuracy. It is worth noting that in Figure 17, the probability of making double fixations on a word had a fairly restricted range, and yet the model was still able to predict the values.

On the other hand, there were some systematic differences between the simulated and empirical data. For example, the model was able to reproduce fairly closely the probability of

making single fixations on a word (Figure 16), but consistently (although only slightly) underpredicted single fixation duration (Figure 15). The model did not have a special mechanism to program a single fixation, and therefore its single fixation duration means should be identical to first fixation duration. The simulation results suggested that average fixation duration increased when only one fixation was made on a word, compared to cases with multiple fixations. Future research is needed to examine whether the increase in mean fixation duration is a result of a change in the weights of the fixation duration categories or in the means of these components.

Overall, the analysis showed that, with few assumptions about mechanism, the SHARE model was able to reproduce eye-movement details, as assessed by conventional eye-movement measures. Furthermore, SHARE provided a set of terminology, such as saccade targeting probabilities and fixation duration categories, that can reproduce eye-movement distributional data for individuals. Because the parameters of this model are more tractable and accessible than the raw distributions, this can be an important step toward developing an empirical methodology for implementing and evaluating the claims of contending models of eye-movement control in reading.

Summary

Once a SHARE model was trained with an individual reader’s eye-movement data, it captured the essence of the data and encapsulated it in the model parameters. Given the same reading materials, the model could reproduce eye movements that were quantitatively similar to the original empirical data, as the above simulation study demonstrated.

The simulation also showed that the SHARE architecture was able to adapt to beginning and skilled readers. In addition, the Markovian structure at the control level of the model

naturally accounted for temporal dynamics in reading. When assessed using the conventional eye-movement measures, the model was able to quantitatively reproduce empirical values. Compared to many existing models, the graphical model is simple and its statistical characteristics are well understood. Therefore, the SHARE structure is suitable as a general platform of communication in the field of reading eye-movement research.

The simulation and the analyses illustrated only a small portion of the potential of the SHARE architecture. For example, it would be interesting to study its ability to predict the next eye movement on the basis of eye movements that a reader has already made. The simulation also suggested several aspects of the current implementation of the model that need refinement, including the handling of refixations in some readers and the issue of single fixation duration.

The next chapter will show how analysis of the parameters of individual SHARE models can shed light on what aspects of eye-movement control develop as children become more skilled readers.
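The visual K-S comparison used earlier in this chapter can be sketched numerically. The example below computes the large-sample critical value from the approximation attributed to Hollander & Wolfe (1999) and the largest vertical distance between two empirical cumulative distribution functions. The function names are hypothetical, and the quadratic-time distance calculation is chosen for clarity rather than speed.

```python
import math


def ks_critical_value(n, alpha=0.05):
    """Large-sample approximation: sqrt(-ln(alpha / 2) / (2 n))."""
    return math.sqrt(-math.log(alpha / 2.0) / (2.0 * n))


def ecdf(sample, x):
    """Proportion of observations in sample that are <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)


def max_cdf_distance(empirical, simulated):
    """Largest vertical gap between the two empirical CDFs."""
    points = sorted(set(empirical) | set(simulated))
    return max(abs(ecdf(empirical, x) - ecdf(simulated, x)) for x in points)


# Critical value for a reader with 1,000 fixations at alpha = 0.05.
print(round(ks_critical_value(1000), 3))  # 0.043
```

Under this sketch, a simulated distribution would be flagged as different from the empirical one when `max_cdf_distance` exceeds `ks_critical_value` for that reader's number of fixations.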

CHAPTER 6. DEVELOPMENTAL CHANGES OF READING EYE MOVEMENTS

The previous chapter has shown that the SHARE model is able to capture a wide range of individual differences in reading eye movements. It may also prove useful in concisely capturing developmental differences in reading eye movements, which will in turn provide the basis for theorizing about what cognitive processes change with the acquisition of reading skill.

Previous Research on the Development of Reading Eye Movements

How do eye movements change with age and reading proficiency? A few studies have investigated this question, and most of them have reported only global statistics. Table 1 (from Table 4 in Rayner et al., 1998) summarizes some global measures of reading eye movements from previous studies (Buswell, 1922; McConkie et al., 1991; Rayner, 1986; Taylor, 1965). Mean fixation duration declines with age, although the absolute values of the means and the range of developmental changes vary among studies. Developmental changes in saccade patterns are more difficult to describe. Based on the incomplete list of two variables in Table 1, skilled readers cover the same text with fewer fixations, although studies are not consistent on whether proficient readers make fewer regressions than beginning readers.

The only study that went significantly beyond global statistics is McConkie et al. (1991). McConkie et al. examined distributions of fixation durations for first- through fifth-grade students. Three findings were evident from the distributions. First, fixation duration distributions typically had a single mode at approximately 180 msec, regardless of age. Therefore, what drives the developmental changes in mean fixation duration appears to be the right tails of the distributions. In addition, there were substantial individual differences in the distributions of

fixation durations, especially among beginning readers. Lastly, McConkie et al. also showed that the means and higher moments of fixation duration distributions were strongly correlated with reading abilities.

With regard to saccade control, McConkie et al. (1991) found that first-grade students showed distributions of landing positions similar to those of adults (McConkie et al., 1988). Another eye-movement characteristic shared by beginning and skilled readers is within-word refixations. McConkie et al. (1988, 1991) demonstrated that the probability of making a refixation on a word is a U-shaped function of the landing site of the initial fixation on the word. McConkie et al. (1991) also showed that the probability of skipping a word as a function of saccade launching site increases with age, and the forms of the functions at different grades resemble adult data (McConkie et al., 1994). Thus, it appears that many of the basic mechanisms of eye-movement control in reading English are in place after a year of reading experience, possibly even before any formal reading instruction.

Developmental Analyses Using SHARE

To the extent that a SHARE model can simulate individual readers’ eye-movement patterns, developmental differences in reading eye movements can be studied by analyzing parameters of individual models. McConkie et al. (1991) showed that developmental changes are more complicated than what can be described by global measures such as mean fixation duration. The SHARE architecture is particularly suitable for studying these complex changes, because it provides a rich set of structures and parameters to describe these differences and is able to closely simulate readers’ eye movements, as shown in the previous chapter.

This chapter focuses on two developmental issues – the changes of eye-movement

control across age, and the changes of the effects of linguistic, perceptual, and oculomotor factors on eye-movement control. These correspond to two levels in the SHARE model, namely the control layer and the relationship between the input and the control layer. Age differences in individual parameters of these layers are analyzed.

Grouping by age risks overlooking meaningful within-group differences, as age is only a crude indication of reading skill. In the absence of an independent reading proficiency measure, reading speed (measured in words per minute, WPM) was used as an indicator of readers’ reading proficiency. Past research has shown that reading speed is highly correlated with standardized reading test scores.

Development of Reading Eye-movement Control

One of the core assumptions of the SHARE model is discrete control of eye movements in the control layer. The probabilities of making each eye-movement command – for example “forward 2 words” or “long fixation” – form the basis of individual readers’ eye-movement control strategy. The following analyses explore developmental differences in controlling saccades and fixation duration.

Saccade targeting. Of the seven potential saccade targets in the ST node, what is the probability of selecting a particular target? Figure 18 shows the probabilities [40] of making regressions (ST=-1 or -2*), refixations (ST=0), progressing one word (ST=1), and progressing more than one word (ST=2, 3, or 4*) as a function of age group and reading speed. Some categories were combined to simplify data presentation.

[40] These are the unconditional probabilities, i.e., ignoring the effects of word frequency and the like. They were computed by collapsing the multidimensional frequency tables in the ST node into a one-dimensional table.
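The collapsing step described in the footnote can be sketched as follows. This is a minimal illustration that assumes the ST table is stored as a dictionary keyed by (previous ST, FREQ, WLEN, ECCEN, ST); that layout, the function names, and the counts are hypothetical and only serve to show how a multidimensional frequency table reduces to unconditional probabilities and then to the combined categories plotted in Figure 18.

```python
def marginal_st_probabilities(counts):
    """Collapse a multidimensional frequency table onto the ST dimension.

    counts maps (prev_st, freq, wlen, eccen, st) -> frequency (assumed layout).
    Returns {st: unconditional probability}.
    """
    totals = {}
    for key, count in counts.items():
        st = key[-1]  # the last index is the current saccade target
        totals[st] = totals.get(st, 0) + count
    grand_total = sum(totals.values())
    return {st: count / grand_total for st, count in totals.items()}


def combine_categories(probs, groups):
    """Sum probabilities over groups of ST values, e.g. all regressions."""
    return {name: sum(probs.get(st, 0.0) for st in members)
            for name, members in groups.items()}


# Hypothetical counts for illustration only.
counts = {
    (1, "low", 1, 0, 0): 2,
    (1, "high", 1, 0, 1): 3,
    (0, "low", 2, 0, 1): 1,
    (1, "low", 1, 0, -1): 4,
}
groups = {
    "regression": (-2, -1),
    "refixation": (0,),
    "forward_one": (1,),
    "forward_two_or_more": (2, 3, 4),
}
print(combine_categories(marginal_st_probabilities(counts), groups))
```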

The probability of regressions did not differ across age, F(2, 73)=1.25, p=1.86, MSE=0.0018. Regression rates were around 15% for all age groups, which is remarkable given that the adult readers were reading simple, elementary-school-level stories. Some college students reading as fast as 600 words per minute made 25% regressions, more than any third-grade student did.

Refixation rates showed a significant decrease with age, F(2, 73)=105.3, p<0.001, MSE=0.0036. A post-hoc comparison with Bonferroni adjustments showed that each age group was significantly different from the others.

The probability of progressing one word showed a significant decrease with age, F(2, 73)=12.1, p<0.001, MSE=0.0039. A Bonferroni post-hoc comparison showed that while both the third- and fifth-grade groups differed significantly from adults, they did not differ significantly from each other. The magnitude of the difference was rather small – approximately 32% for children versus 25% for adults.

Finally, the largest developmental difference was an increase in the probability of progressing two or more words, F(2, 73)=134.4, p<0.001, MSE=0.0052. Each age group differed significantly from the others.

To summarize the results on the developmental patterns of saccade control, the largest differences between beginning and skilled readers lie in the tradeoff between making refixations and making long (2 or more words) forward saccades. By comparison, the differences in regression rate and in the probability of moving forward one word at a time are small.

Fixation duration. According to the present model, the distribution of fixation durations is a mixture of three components, each of which follows a lognormal distribution. Developmental

changes in the proportions, modes, and variances of the components are analyzed below.

Figure 19 shows the proportions of the three types of fixations. There was no significant age effect on the probability of making short fixations, F(2, 73)=2.157, p=0.123, MSE=0.0021. The probability of making long fixations showed a significant decrease with age, F(2, 73)=27.3, p<0.001, MSE=0.0056. A post-hoc test showed that third- and fifth-grade students did not differ significantly from each other, but both differed significantly from adults. Because the three probabilities add up to 1, the probability of making medium fixations also showed a significant age effect, increasing with age, with similar post-hoc results.

Figure 20 shows the modes (corresponding to the means of the logarithm of fixation durations in the model) of the three components of fixation durations as a function of age and reading speed. Overall, the largest change appears to be the decrease of long fixation modes with age and reading speed.

For short fixations, the average mode increased only slightly with age (from 62 msec to 67 msec to 73 msec), although the difference was statistically significant, F(2, 73)=8.591, p<0.001, MSE=0.0080. The differences between the children were not significant in a post-hoc test, but the child-adult difference was significant.

There was also a significant but small age effect in the modes of medium fixations, F(2, 73)=14.8, p<0.001, MSE=0.0034. The drop from 202 msec (3rd grade) to 198 msec (5th grade) was not significant, but the mode for adults, 179 msec, was significantly lower than that of either group of children.

A strong age effect was observed for long fixations, F(2, 73)=35.9, p<0.001, MSE=0.0017. Again, third-grade (319 msec) and fifth-grade (292 msec) values did not differ

significantly, but both differed significantly from that of adults (221 msec).

Lastly, variances of the three components as functions of age and reading speed are shown in Figure 21. There was no significant age effect for the variance of short fixations, F(2, 73)=1.52, p=0.225, MSE=0.0028. The variances of medium and long fixations decreased significantly with age, F(2, 73)=11.80, p<0.001, MSE=0.0004, and F(2, 73)=17.74, p<0.001, MSE=0.0014, respectively. In both cases, the two younger age groups did not differ significantly from each other but both differed significantly from adults. Again, the largest age-related difference is the decrease in the variance for long fixations.

The above analyses of the control layer parameters add some new pieces of information to the understanding of reading development. Regarding saccade programming, beginning and skilled readers differ neither in regression rate nor in the overall probability of making forward saccades. The developmental change is rather specific – skilled readers tend to make fewer refixations and more long forward saccades (two words or more).

With respect to fixations, findings from the present study concur with McConkie et al.’s (1991) observation that the modes of fixation duration distributions do not change much with age but the tails of the distributions become less heavy. In addition, the discrete FDC node in the SHARE model provides a quantitative description of these developmental changes. By decomposing the overall distributions into three components, it is shown that the characteristics of the briefest fixations do not change substantially with age. The medium fixation component, corresponding to the modes of the distribution, becomes slightly shorter and denser, but the changes are small compared to the third component. What really accounts for the developmental changes is the third, long fixation component – its proportion, mode, and variance decreased

substantially with age.

Effects of Input Variables on Eye-movement Control

The above analyses ignore the effects of input variables on the control layer. However, Chapter 4 has shown clearly that these input variables contribute significantly to the explanatory ability of the SHARE model. Their effects are investigated below.

Under the present implementation, the input variables were represented as ordinal discrete variables (e.g., low, medium, and high frequency; although in the future they may be continuous). Therefore their relations with the control nodes – ST and FDC, which are also discrete and ordinal – are represented in multidimensional contingency tables. The current report will focus only on the main effects of each individual input variable; that is, only the two-dimensional contingency table between an input variable and a control node will be analyzed. Interactions between these variables will be explored in future research.

The strength of association in a two-dimensional contingency table is summarized using Goodman and Kruskal’s Gamma (1954; 1963; see also Agresti, 1990). Gamma, a scalar ranging from -1 to 1, measures the association between two ordinal, discrete variables. It is defined as the difference between the numbers of concordant and discordant pairs divided by the sum of the two counts, where discordant pairs are cases where the two variables vary in opposite directions, and concordant pairs are cases where the two variables change in the same direction (ties are excluded; for the mathematical definition, see Agresti, 1990). Goodman (1963) showed that a Gamma computed from a sample follows an asymptotic normal distribution, whose mean is the population Gamma and whose variance is a complex function of the frequencies of concordant and discordant pairs. In the following analyses, the effect of an input variable on eye-movement

control is represented by the corresponding Gamma. The present report concentrates on two issues related to development: the proportion of readers in each age group showing statistically significant effects and the change of the absolute values across age. The input variables include word frequency, the length of the next word, the landing position of the current fixation, and the state of the previous eye movement (i.e., the previous values of the ST and FDC nodes).

Input variables and saccade programming. Figure 22 shows the effects of the four input variables on saccade targeting (the ST node). The two horizontal lines in each graph mark the 95% confidence interval of the Gamma [41]. In other words, data points that fall between the lines were not statistically significantly different from zero.

Word frequency has a significant effect on saccade programming for nearly all young readers but for fewer adults. The Gammas were significantly different from zero for 95% of third-grade and 93.3% of fifth-grade students, but only for 67.7% of adults. A Chi-square test showed a significant age effect, χ²(2)=9.63, p=0.008. The ANOVA of the Gammas by age group was significant, F(2, 73)=18.6, p<0.001, MSE=0.0018. A post-hoc analysis showed that both third- and fifth-grade students differed from adults but not from each other.

The length of the next word appears to have the opposite pattern. The Gammas were significantly different from zero for 40% of third-grade readers, 86.7% of fifth-grade students, and 100% of adults. A Chi-square test showed a significant age effect, χ²(2)=15.2, p<0.001. The

[41] The confidence interval of Gamma varies somewhat for each reader. In order to visually represent the interval, the average confidence interval is used here.

ANOVA of the Gammas by age group was significant, F(2, 73)=42.8, p<0.001, MSE=0.0033. A post-hoc analysis showed that every age group differed from the others.

The picture for landing position is different. The Gammas differed significantly from zero for 35% of third-grade students, 23.3% of fifth-grade students, and 7.7% of adults. A Chi-square test showed no significant age effect, χ²(2)=0.615, p>0.5. The ANOVA was also nonsignificant.

In contrast, for the effect of the previous saccade move, the Gamma for every reader was significantly different from zero. The ANOVA of the Gammas by age group was significant, F(2, 73)=10.45, p<0.001, MSE=0.0039. A post-hoc analysis showed that the third-grade students did not differ from adults but both differed from fifth-graders.

Input variables and fixation duration control. Similar analyses also examined the effects of input variables on the FDC node, and are presented in Figure 23.

Word frequency showed a significant effect on fixation duration control for 60% of third-grade students, 46.7% of fifth-grade students, and 50% of adults. The Chi-square test was not significant,

χ2(2)=0.314, p>0.50. The ANOVA of the Gammas by age group was also nonsignificant, F(2, 73)=1.184, p=0.312, MSE=0.0053.

The length of the next word showed a similar developmental pattern but overall weaker effects. The Gammas were significantly different from zero for 10% of third-grade readers, 30% of fifth-grade students, and 19.2% of adults. A Chi-square test was not significant, χ2(2)=0.3479, p>0.50. The ANOVA was nonsignificant, F(2, 73)=1.232, p=0.298, MSE=0.0035.

Landing position showed a developmental effect. The Gammas differed significantly from zero for 55% of third-grade students, 50% of fifth-grade students, and 7.7% of adults. The Chi-square test was not significant, χ2(2)=3.09, p=0.214. However, the ANOVA was significant, F(2, 73)=14.37, p<.001, MSE=0.0055.

Lastly, for the effect of the previous saccade move, the Gammas differed significantly from zero for 80% of third-grade students, 53.3% of fifth-grade students, and 30.8% of adults. The Chi-square test was not significant, χ2(2)=4.189, p=0.123. However, the ANOVA of the Gammas by age group was significant, F(2, 73)=6.89, p<0.001, MSE=0.0130. A post hoc analysis showed that the third-grade students did not differ from adults but both differed from fifth-graders.

Overall, the above results demonstrate that readers at different proficiency levels are sensitive to different information in programming reading eye movements. When programming the next saccade, beginning readers are more affected by the frequency of the currently fixated word but less affected by the length of the next word, compared to skilled readers. Landing position also seems to have a larger impact on young readers’ WHEN decision. Additionally, not all variables have equal effects on different parameters of eye movements. For example, the length of the next word has very little effect on the duration of the current fixation but significant effects on the programming of the next saccade, at least for more skilled readers.

Discussion

What develops in reading eye-movement control? Analyses of the parameters of individual SHARE models suggest that as readers become more proficient, their eye movements are less affected by features of the currently fixated word (e.g., word frequency and landing position) or by the state of the previous eye movements (e.g., previous values of the ST and FDC nodes). Skilled readers take into account the length of the next word in programming the next saccade, and they tend to move further into the unread text.

Results based on analyses of SHARE’s parameter space confirmed much previous knowledge about the development of reading eye movements. Furthermore, SHARE was able to explore important questions that were unanswered in prior research. For example, it was found that temporal dependency in eye-movement control decreases slightly with age, but the effects remain for most adult readers. A unique feature of the SHARE model is that it models temporal dependencies between consecutive eye movements. Evidently, these temporal dependencies were among the largest and most consistent effects on eye-movement control. More interestingly, temporal dependencies decrease in strength with reading proficiency, which suggests that skilled reading eye movements become more like a zero-order Markov (random-walk) process.
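The contrast between a first-order and a zero-order Markov process can be made concrete with a likelihood-ratio comparison on a sequence of discrete states. The following is only a minimal sketch, not the estimation procedure used for SHARE: the function names, the toy sequences, and the use of plain maximum-likelihood counts (rather than Bayesian estimates) are assumptions made for illustration.

```python
import math
from collections import Counter

def zero_order_loglik(seq):
    """Log likelihood of seq[1:] when each state is drawn independently."""
    counts = Counter(seq[1:])
    n = len(seq) - 1
    return sum(c * math.log(c / n) for c in counts.values())

def first_order_loglik(seq):
    """Log likelihood of seq[1:] when each state depends on the previous one."""
    pair_counts = Counter(zip(seq[:-1], seq[1:]))
    prev_counts = Counter(seq[:-1])
    return sum(c * math.log(c / prev_counts[prev])
               for (prev, _), c in pair_counts.items())

def dependency_statistic(seq):
    """G = 2 * (logL_first_order - logL_zero_order); never negative."""
    return 2.0 * (first_order_loglik(seq) - zero_order_loglik(seq))

# Hypothetical sequences of ST-like categories (invented for illustration).
strong = [0, 1] * 20        # the next state is fully determined by the previous one
weak = [0, 0, 1, 1] * 10    # the previous state says almost nothing about the next

print(dependency_statistic(strong), dependency_statistic(weak))
```

A large G indicates strong temporal dependency; a value near zero indicates that the sequence is close to a zero-order process. For k states, G is conventionally referred to a chi-square distribution with (k-1)^2 degrees of freedom, although that approximation needs far more data than these toy sequences provide.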

CHAPTER 7. DISCUSSION

The goal of the current research is to describe reading eye movements mathematically with minimal assumptions about the mechanisms of the processes. A stochastic, hierarchical architecture for reading eye-movement, or SHARE, is developed, and a simple model based on this architecture is tested.

What is SHARE?

SHARE is a mathematical model that is able to reproduce many essential characteristics of reading eye movements. It is, to my knowledge, the first model that simultaneously accounts for fixation duration and saccade length in their distributional details, as opposed to only group means. Its Markovian architecture also gives straightforward explanations of the moment-by-moment dynamics of eye movements with few a priori assumptions, compared with some existing models.

SHARE is also unique because of its completely individualized modeling approach, which contrasts strongly with the focus of most, if not all, previous models on “the average person.” Reading eye movements are as diverse as readers themselves. There is no reason to presume a common set of parameters, or even mechanisms, for all readers. Besides the bias in psychology to think in terms of “the average person,” a practical obstacle to individualized modeling is that there may not be enough data collected from an individual reader to obtain sound parameter estimates. The Bayesian method used in SHARE provides a promising way to get around the problem.

However, the most important contribution of SHARE is not the model in its current form. Rather, the hope of this research is to introduce a language for describing reading eye movements. I argued in Chapter 1 that researchers have struggled to depict reading eye movements since the discovery of the basic phenomena over a century ago (Javel, 1878). The solutions, ranging from early attempts to use verbal analogies and visual aids to the latest flourish of composite eye-movement measures and theories of mechanisms, are far from satisfactory. The direction of the current research is to separate description from mechanism, and to focus squarely on the former. As a result, the SHARE architecture was designed to satisfy three logical requirements for describing reading eye movements: that they are probabilistic in nature, that they are time-series data, and that they are affected by other factors. Of course, some of the details (e.g., the choice of input variables, the specifications of the nodes as discrete or continuous, or the independence assumptions) are specific to the current implementation. Nevertheless, the general hierarchical, stochastic architecture has been shown to be flexible enough to capture much of the essence of reading eye movements, and it has the potential to become a common language for talking about eye movements.

This brings up the fine but crucial distinction between an architecture and a specific theory implemented under the architecture. It can be argued that SHARE, as implemented in the current study, is a particular theory of reading eye movements, because it restricts linguistic effects on reading to word frequency only, and assumes conditional independence between the WHEN and WHERE pathways. However, the author has no intention of defending such a theory. Rather, it was implemented as an example of modeling in the new architecture. The fact that even a simple-minded “theory” like this could account for many facts of eye movements demonstrates the power of the architecture.
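The three requirements named above (probabilistic, time-series, and dependent on other factors) can be illustrated with a toy generative process in which the saccade-targeting state at each fixation is drawn from a probability table indexed by the previous state and an input covariate. This is a sketch under stated assumptions, not the SHARE implementation: the reduced state set, the functions `build_cpt` and `sample_st_sequence`, and every probability below are hypothetical.

```python
import random

ST_STATES = [-1, 0, 1, 2]   # hypothetical targets: regress, refixate, next word, skip
FREQ_LEVELS = ["L", "H"]    # hypothetical low/high word-frequency categories

def build_cpt():
    """Build P(ST_t | ST_{t-1}, FREQ) as a dict of probability rows (invented numbers)."""
    base = {
        "L": [0.15, 0.25, 0.45, 0.15],  # low frequency: more regressions and refixations
        "H": [0.05, 0.10, 0.55, 0.30],  # high frequency: more forward moves and skips
    }
    cpt = {}
    for prev in ST_STATES:
        for freq in FREQ_LEVELS:
            row = list(base[freq])
            if prev == -1:
                # Temporal dependency: after a regression, move half of the
                # regression probability to the "next word" state.
                moved = row[0] / 2
                row[0] -= moved
                row[2] += moved
            cpt[(prev, freq)] = row
    return cpt

def sample_st_sequence(freqs, cpt, rng, start_state=1):
    """Sample one ST state per fixation, given the frequency level at each fixation."""
    prev = start_state
    sequence = []
    for freq in freqs:
        prev = rng.choices(ST_STATES, weights=cpt[(prev, freq)], k=1)[0]
        sequence.append(prev)
    return sequence

cpt = build_cpt()
rng = random.Random(0)
freqs = [rng.choice(FREQ_LEVELS) for _ in range(20)]
print(sample_st_sequence(freqs, cpt, rng))
```

The table plays the role of a conditional probability distribution attached to one node; a fuller model would add further inputs and a separate pathway for fixation duration.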

What SHARE is Not

First and foremost, SHARE, in its current implementation, is not a theory of reading eye movements that the author wants to promote. As argued above, it is merely a demonstration of an architecture for mathematically describing patterns of reading eye movements.

Moreover, the SHARE architecture is not a theory of eye-movement control mechanisms. On the contrary, it is assumed that data and mechanism can be described separately, and SHARE is intended to be as independent of assumptions about mechanism as possible. For example, SHARE models what effect word frequency has on saccade targeting, but makes no assumption about how the effect comes about. It does not say anything about whether the effect of word frequency happens earlier or later than that of word length, or whether or not attention has been shifted. The arrows in the hierarchical architecture represent the direction of causality only; they do not imply serial processing or even temporal order. In short, eye-movement description is at the phenomenological level.

SHARE does not compete with existing theories of eye movements; it complements them. In a sense it provides a test-bed, where different theories may be implemented on a common ground and compete with each other. For example, it is conceivable that the E-Z Reader model could be implemented in a SHARE environment. It would add many processing assumptions to SHARE, and would make specific predictions about how, for example, word frequency would affect the control of saccades and fixation duration. In other words, the model would fix some of the free parameters. The model would then be fit to empirical data (a built-in feature of the SHARE architecture), and the result could be compared to a “full” model in which the corresponding parameters were not fixed. Standard statistical tests could be carried out to evaluate the power of the model.

Of course, it is arguable whether description and mechanism are truly separable. Our survey of existing eye-movement models suggests that, to a large extent, they are. McConkie and Dyre (2000) have shown that different mechanisms may result in almost identical fits to empirical data. Conversely, the Mr. Chips model (Legge et al., 1997) demonstrated that a complex deterministic process could be modeled successfully with simple probabilistic heuristics. On the other hand, the probabilistic nature of the SHARE architecture precludes implementing models such as READER, in which eye movements are deterministically decided. Nonetheless, the Mr. Chips model hints that SHARE may also be compatible with a deterministic model if the distributional properties of the model are well understood.

Composite Variables Revisited: Implications to Psycholinguistic Research

The proliferation of composite eye-movement measures may reflect researchers’ increasing frustration in describing complex eye-movement patterns. However, new measures have not solved the problem, and in many cases they only complicate matters. SHARE suggests a different approach. Instead of summing up fixation duration over time in idiosyncratic ways, SHARE captures temporal dynamics with its Markovian structure at the eye-movement control layer. Paired with the power of the hierarchical structure (input variables), SHARE’s probabilistic representation naturally summarizes endless combinations of eye-movement patterns. What is variable and elusive in the sample domain can be expressed as stable parameters in the Markov transition matrices. An analogy is the two representations of speech signals: what is difficult to perceive in the waveform may be obvious in the spectrogram, and vice versa. Eye-movement patterns may be difficult to capture in the sample domain, but much easier to deal with as a Markov transition matrix.

To psycholinguistic researchers, this points to a change in data analysis. For example, in a hypothetical reading experiment, the researcher manipulates a target word in a sentence so that in the experimental condition the word does not fit the sentence context whereas in the control condition it does. The researcher is interested in whether readers detect the improper word within the region of the next n words (or within x fixations, etc.). Instead of using gaze duration over the region (or the measures of Liversedge et al., 1998), s/he may define each of the n words as a state, feed the eye movements within the region into a simple SHARE model [42], and estimate the transition matrices of the ST node for the experimental and control conditions. If readers change saccade patterns when they see an inappropriate word, different transition matrices are expected for the two conditions. Fixation duration may be modeled similarly.

There are several potential benefits of using this approach. First, the results may be more interpretable. Instead of “mean gaze duration increased 15 msec,” one may report something like “the probability of regressing back to the target word increased from 0.1 to 0.5, and the probability of making long fixations increased from 0.3 to 0.4.” Furthermore, with enough data one may be able to estimate instantaneous transition probabilities, e.g., the probability of fixating the target word in the 2nd, 3rd, … fixation after the first fixation on the target word. This is valuable information that many researchers have tried to infer from traditional measures such as first fixation duration and gaze duration. Last but not least, individual differences in reading eye movements may be estimated, and experimental effects may be estimated for individual readers.

[42] A simple first-order Markov model may suffice in this hypothetical study.
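The analysis proposed for the hypothetical experiment can be sketched as follows: treat each word in the region as a state, count transitions between successively fixated words, and estimate one transition matrix per condition. The code is an illustration only. The function name, the fixation sequences, and the symmetric Dirichlet pseudo-count `alpha` (a simple stand-in for Bayesian estimation) are assumptions, not part of the SHARE implementation.

```python
def transition_matrix(sequences, n_states, alpha=1.0):
    """Posterior-mean transition probabilities under a symmetric Dirichlet(alpha) prior."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for seq in sequences:
        # Count every move from one fixated word to the next.
        for prev, nxt in zip(seq[:-1], seq[1:]):
            counts[prev][nxt] += 1.0
    matrix = []
    for row in counts:
        total = sum(row) + alpha * n_states
        matrix.append([(c + alpha) / total for c in row])
    return matrix

# Hypothetical data: indices of the words fixated within a 4-word region,
# one list per trial; word 0 is the target word.
control = [[0, 1, 2, 3], [0, 1, 3], [0, 2, 3], [0, 1, 2, 3]]
experimental = [[0, 1, 0, 1, 2, 3], [0, 1, 0, 2, 3], [0, 1, 2, 3], [0, 2, 0, 1, 3]]

p_control = transition_matrix(control, n_states=4)
p_experimental = transition_matrix(experimental, n_states=4)

# Estimated probability of regressing from word 1 back to the target word.
print(p_control[1][0], p_experimental[1][0])
```

Comparing single entries of the two matrices yields statements of the kind quoted above, such as a change in the probability of regressing to the target word. The pseudo-count keeps estimates away from zero when a transition is never observed, which matters with the small samples typical of a single reader.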

In sum, for psycholinguistic reading research, the SHARE architecture may provide a complementary or possibly alternative solution to the eye-movement measurement problem, although many details have to be worked out in the future.

Applications in Reading Education

One of the original motivations of this research was to use eye movements to detect processing difficulties in reading. In the early days of eye-movement research, the pioneers (Buswell, 1922; Buswell, 1937; Dearborn, 1906; Gray, 1922; Huey, 1908) did not hesitate to point to an “abnormal” eye-movement pattern and conclude that the reader was experiencing difficulties. Buswell (1937) also distinguished general reading deficiencies from trouble with specific words. The problem, of course, was that the inference process was qualitative and holistic. The “art” of detecting reading difficulties from eye movements disappeared soon after.

Logically, if readers have different eye-movement patterns when they are reading normally versus experiencing difficulties, one should be able to compare the patterns and detect the state of the reader. To carry out this process quantitatively, however, one has to be able to faithfully describe the eye-movement patterns associated with different states and probabilistically infer the state from observed eye-movement patterns. SHARE was developed exactly for this purpose. It is able to summarize a wide range of eye-movement patterns with its stochastic, hierarchical structure. The Bayesian method can be used to probabilistically infer the state of an unobserved node in the structure given observed data (e.g., the value of the FDC node was hidden and was estimated from data). In addition, its ability to adapt to an individual reader’s eye-movement parameters is essential in performing the detection task.

As an extension of the current research, a prototype of a reading difficulty detection model has been developed. In its simplest form, it consists of an input layer with FREQ and WLEN as input variables, a cognitive-state layer that contains a binary node (troubled versus normal reading), and an eye-movement layer that contains a discrete node similar to the ST node in the current model. The cognitive-state node is assumed to be unobserved, and the goal of the model is to estimate the probability of the states given the input variables and an observed sequence of saccade movements. Initial testing shows that the prototype model is able to distinguish different patterns of eye movements. Although the prototype model is far from complete, the initial results are promising. A next step for the current research is to explore the full potential of the SHARE architecture in describing and detecting reading difficulties.
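The inference step described for the prototype can be illustrated with a standard forward (filtering) recursion over a hidden binary state. This is a simplified sketch, not the prototype itself: it omits the FREQ and WLEN input layer, and the state names, saccade categories, and all probabilities are hypothetical.

```python
STATES = ["normal", "troubled"]

# Hypothetical parameters, invented for illustration.
PRIOR = {"normal": 0.9, "troubled": 0.1}
TRANS = {
    "normal":   {"normal": 0.9, "troubled": 0.1},
    "troubled": {"normal": 0.3, "troubled": 0.7},
}
# P(saccade category | cognitive state).
EMIT = {
    "normal":   {"regress": 0.10, "refixate": 0.15, "forward": 0.75},
    "troubled": {"regress": 0.40, "refixate": 0.35, "forward": 0.25},
}

def filter_states(saccades):
    """Return P(state_t | saccades[0..t]) for each t, using a normalized forward pass."""
    posteriors = []
    belief = dict(PRIOR)
    for t, obs in enumerate(saccades):
        if t > 0:
            # Predict: push the previous belief through the transition model.
            belief = {s: sum(belief[r] * TRANS[r][s] for r in STATES) for s in STATES}
        # Update: weight by the likelihood of the observed saccade, then normalize.
        unnorm = {s: belief[s] * EMIT[s][obs] for s in STATES}
        z = sum(unnorm.values())
        belief = {s: unnorm[s] / z for s in STATES}
        posteriors.append(belief)
    return posteriors

observed = ["forward", "forward", "regress", "regress", "refixate"]
for obs, post in zip(observed, filter_states(observed)):
    print(obs, round(post["troubled"], 3))
```

With these invented numbers, the estimated probability of the troubled state stays low during forward saccades and rises after a run of regressions and refixations, which is the qualitative behavior a detection model of this kind is meant to show.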

SHARE, the mathematical model developed in this research, grows out of the need to quantitatively account for reading eye movements in both theoretical research and educational applications. It demonstrates the feasibility and utility of modeling eye movements at a level other than mechanisms and processes. Although researchers may have different theories about mechanisms and processes, it is my hope that at least we can share a common description of eye movements.

TABLES

Table 1. Developmental Characteristics of Reading Eye Movements

                                               Grade level
Article and characteristic          1     2     3     4     5     6   Adult

Taylor (1965)
  Fixation duration (msec)        330   300   280   270   270   270    240
  Fixations per 100 words         224   174   155   139   129   120     90
  Frequency of regressions (%)     23    23    22    22    21    21     17
Buswell (1922)
  Fixation duration (msec)        432   364   316   268   252   236    252
  Fixations per 100 words         182   126   113    92    87    87     75
  Frequency of regressions (%)     26    21    20    19    20    21      8
Rayner (1985b)
  Fixation duration (msec)          -   290     -   276     -   242    239
  Fixations per 100 words           -   165     -   122     -   110     92
  Frequency of regressions (%)      -    27     -    25     -    24      9
McConkie et al. (1991)
  Fixation duration (msec)        304   268   262   248   243     -    200
  Fixations per 100 words         168   138   125   132   135     -    118
  Frequency of regressions (%)     34    33    34    36    36     -     21
Overall mean
  Fixation duration (msec)        355   306   286   266   255   249    233
  Fixations per 100 words         191   151   131   121   117   106     94
  Frequency of regressions (%)     28    26    25    26    26    22     14

A dash marks a grade level for which no value is listed.

Reproduced from: Table 4 in Rayner (1998)

Table 2. Log Likelihood of Bayesian and MLE for Fixation Duration Fitting

               Component modes       Variance of log(dur)      Weights
Component    BNT (sec)  GMM (sec)      BNT       GMM         BNT     GMM

3rd-grade: N=481, BNT log likelihood = -386.86, GMM log likelihood = -386.68
S              0.062      0.082       0.171     0.284       0.081   0.133
M              0.204      0.212       0.133     0.105       0.537   0.530
L              0.302      0.321       0.184     0.171       0.382   0.337

5th-grade: N=586, BNT log likelihood = -415.85, GMM log likelihood = -417.53
S              0.061      0.169       0.217     0.657       0.037   0.158
M              0.191      0.194       0.103     0.092       0.595   0.611
L              0.305      0.351       0.227     0.143       0.368   0.231

Adults: N=416, BNT log likelihood = -231.08, GMM log likelihood = -231.40
S              0.061      0.072       0.21      0.281       0.081   0.104
M              0.177      0.173       0.089     0.079       0.673   0.531
L              0.221      0.218       0.146     0.122       0.246   0.365

FIGURES

Figure 1. Architecture of Reilly’s Connectionist Model of Eye-movement Control (reproduced from Reilly, 1997, Figure 1). The circles represent connectionist modules and the rectangles non-connectionist control modules. Thick lines indicate a flow of activation, thin lines a flow of control. The asymptote detectors determine when the cascading outputs from the lexical and saccadic modules have reached asymptote.

Figure 2. Illustration of Parafoveal Preview Effects in E-Z Reader 5 (reproduced from Reichle et al., 1997, Figure 6). Preview benefit (gray area) increases as the frequency of the foveal word increases (x-axis). At time t(fn), familiarity check has completed and a saccade to the next word is ordered, which would take a constant time, t(mn+1)+t(Mn+1), to prepare and execute. During this time, if the lexical completion process is able to finish, t(lan), there will be some time for parafoveal processing, marked in gray. Because the slope for t(lan) is larger than that for t(fn), the gray area shrinks for low-frequency words.

Figure 3. An order-of-processing diagram for E-Z Reader 5 (reproduced from Reichle et al., 1998, Figure 7). The boxes are possible states that the model could be in, with the ongoing processes represented in the box. Each arrow is labeled by the process that has completed, and dotted arrows indicate that attention has shifted forward (indicated by n = n + 1 on the diagram). Note that n indexes the attended word, not the fixated word. (The numbers given to the boxes are essentially arbitrary.) f = familiarity check of the word; lc = completion of the lexical access of the word; m = a labile stage of saccade programming that can be canceled by a subsequent saccade; M = a subsequent nonlabile stage of saccade programming. The additional states added are for planning and executing intraword saccades.

Figure 4. Illustration of components of the Mr. Chips model (reproduced from Legge et al., 1997, Figure 1). See Chapter 2 (page 34) for details.

Figures 5A and 5B. Landing position of fixations during reading (reproduced from McConkie et al., 1994, Figures 1 and 2). Figure 5A shows empirical frequency distributions of fixation landing position as a function of launch site. The corresponding fitted Normal curves are plotted. Figure 5B shows the mean landing position as a function of launch site, for seven-letter words. It can be seen that the range error is zero when the launch site is 7 letter spaces away.

Figure 6. Frequency of skipping four- and eight-letter words (reproduced from McConkie et al., 1994, Figure 3). The probability of word skipping can be modeled with a logistic function (see Chapter 2, page 43 for more details).

Figure 7. Mean landing positions of regressive saccades as a function of launch site (reproduced from Radach & McConkie, 1998, Figure 3). The x-axis is numbered relative to the space following the target word, with negative numbers indicating launch sites from within the word, and positive numbers indicating launch positions to the right of the word boundary. The y-axis indicates mean landing position, and is numbered with respect to the center of the word. Interword regressions do not show systematic range errors.

Figure 8. Fitting fixation duration distribution with a two-stage mixture model (reproduced from McConkie et al., 1994, Figure 5). See Chapter 2 (page 46) for details.

[Plot: percentage of fixations (y-axis, 0 to 20) by fixation duration in 25-ms bins (x-axis, 25 to 725), one curve per condition: Normal+, Normal-, Nonword+, X's+, X's-, Dash+, Blank-.]

Figure 9. Distributions of fixation durations in Yang and McConkie (in press, reproduced from Figure 2). Normal+ is the control condition in which the original text was displayed. In the Normal- condition all spaces were replaced by the @ character. In the Nonword+ condition letters were replaced by randomly selected letters. In the X’s+ condition all characters except for spaces were replaced by X’s. In the X’s- condition all characters, including spaces, were replaced by X’s. In the Dash+ condition all characters except for spaces were replaced by dashes. All characters were replaced by spaces in the Blank- condition.

[Diagram: nodes and their state labels. FREQn: L | M | H. WLENn+1: S | M | L. ECCENn: C | E. STt: -2* | -1 | 0 | 1 | 2 | 3 | 4*. FDCt: S | M | L. PSLt, SACCt, and DURt are shown without state labels.]

Figure 10. Graphical representation of the SHARE model. Each node represents a random variable. FREQn is the frequency of the current word. WLENn+1 is the length of the next word. ECCENn is the eccentricity of the current landing position. STt is the saccade targeting node that plans the current saccade (the one following the current fixation t). FDCt is the fixation duration category of the current fixation. PSLt is the planned saccade length of the current saccade. SACCt is the actual length of the current saccade. DURt is the log-transformed duration of the current fixation. Nodes with rectangular boxes are discrete variables; nodes with oval boxes are continuous variables. Clear boxes represent observed variables; the shadowed box (FDC) represents a hidden variable. An arrow from one node to another shows that the latter variable is dependent on the former; the lack of an arrow between two nodes shows that the two nodes are conditionally independent. The circular arrows beside the ST and FDC nodes signify temporal dependency, i.e., the value of a node at fixation t depends on that at fixation t-1.

Figure 11-1. Simulating Fixation Duration and Saccade Length Distributions of Participant 1

[Plots: Distribution of Fixation Duration and Distribution of Saccade Length, empirical vs. simulated (y-axis: frequency, in empirical data scale), each with its Cumulative Distribution Function (y-axis: cumulative probability). X-axes: fixation duration (in sec) and saccade length.]

Figure 11-2. Simulating Fixation Duration and Saccade Length Distributions of Participant 2

[Plots: Distribution of Fixation Duration and Distribution of Saccade Length, empirical vs. simulated (y-axis: frequency, in empirical data scale), each with its Cumulative Distribution Function (y-axis: cumulative probability). X-axes: fixation duration (in sec) and saccade length.]

Figure 11-3. Simulating Fixation Duration and Saccade Length Distributions of Participant 3

[Plots: Distribution of Fixation Duration and Distribution of Saccade Length, empirical vs. simulated (y-axis: frequency, in empirical data scale), each with its Cumulative Distribution Function (y-axis: cumulative probability). X-axes: fixation duration (in sec) and saccade length.]

Figure 11-4. Simulating Fixation Duration and Saccade Length Distributions of Participant 4

[Plots: Distribution of Fixation Duration and Distribution of Saccade Length, empirical vs. simulated (y-axis: frequency, in empirical data scale), each with its Cumulative Distribution Function (y-axis: cumulative probability). X-axes: fixation duration (in sec) and saccade length.]

Figure 11-5. Simulating Fixation Duration and Saccade Length Distributions of Participant 5

[Plots: Distribution of Fixation Duration and Distribution of Saccade Length, empirical vs. simulated (y-axis: frequency, in empirical data scale), each with its Cumulative Distribution Function (y-axis: cumulative probability). X-axes: fixation duration (in sec) and saccade length.]

Figure 11-6. Simulating Fixation Duration and Saccade Length Distributions of Participant 6

[Plots: Distribution of Fixation Duration and Distribution of Saccade Length, empirical vs. simulated (y-axis: frequency, in empirical data scale), each with its Cumulative Distribution Function (y-axis: cumulative probability). X-axes: fixation duration (in sec) and saccade length.]

Figure 11-7. Simulating Fixation Duration and Saccade Length Distributions of Participant 7

[Plots: Distribution of Fixation Duration and Distribution of Saccade Length, empirical vs. simulated (y-axis: frequency, in empirical data scale), each with its Cumulative Distribution Function (y-axis: cumulative probability). X-axes: fixation duration (in sec) and saccade length.]

Figure 11-8. Simulating Fixation Duration and Saccade Length Distributions of Participant 8

[Plots: Distribution of Fixation Duration and Distribution of Saccade Length, empirical vs. simulated (y-axis: frequency, in empirical data scale), each with its Cumulative Distribution Function (y-axis: cumulative probability). X-axes: fixation duration (in sec) and saccade length.]

Figure 11-9. Simulating Fixation Duration and Saccade Length Distributions of Participant 9

[Plots: Distribution of Fixation Duration and Distribution of Saccade Length, empirical vs. simulated (y-axis: frequency, in empirical data scale), each with its Cumulative Distribution Function (y-axis: cumulative probability). X-axes: fixation duration (in sec) and saccade length.]

Figure 11-10. Simulating Fixation Duration and Saccade Length Distributions of Participant 10

[Plots: Distribution of Fixation Duration and Distribution of Saccade Length, empirical vs. simulated (y-axis: frequency, in empirical data scale), each with its Cumulative Distribution Function (y-axis: cumulative probability). X-axes: fixation duration (in sec) and saccade length.]

Figure 11-11. Simulating Fixation Duration and Saccade Length Distributions of Participant 11

[Plots: Distribution of Fixation Duration and Distribution of Saccade Length, empirical vs. simulated (y-axis: frequency, in empirical data scale), each with its Cumulative Distribution Function (y-axis: cumulative probability). X-axes: fixation duration (in sec) and saccade length.]

Figure 11-12. Simulating Fixation Duration and Saccade Length Distributions of Participant 12

[Plots: Distribution of Fixation Duration and Distribution of Saccade Length, empirical vs. simulated (y-axis: frequency, in empirical data scale), each with its Cumulative Distribution Function (y-axis: cumulative probability). X-axes: fixation duration (in sec) and saccade length.]

Figure 11-13. Simulating Fixation Duration and Saccade Length Distributions of Participant 13

[Plots: Distribution of Fixation Duration and Distribution of Saccade Length, empirical vs. simulated (y-axis: frequency, in empirical data scale), each with its Cumulative Distribution Function (y-axis: cumulative probability). X-axes: fixation duration (in sec) and saccade length.]

Figure 11-14. Simulating Fixation Duration and Saccade Length Distributions of Participant 14

[Plots: Distribution of Fixation Duration and Distribution of Saccade Length, empirical vs. simulated (y-axis: frequency, in empirical data scale), each with its Cumulative Distribution Function (y-axis: cumulative probability). X-axes: fixation duration (in sec) and saccade length.]

133

Figure 11-15. Simulating Fixation Duration and Saccade Length Distributions of Participant 15

[Plot omitted. Surviving labels: Distribution of Saccade Length; Cumulative Distribution Function (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length.]

Figure 11-16. Simulating Fixation Duration and Saccade Length Distributions of Participant 16

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-17. Simulating Fixation Duration and Saccade Length Distributions of Participant 17

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-18. Simulating Fixation Duration and Saccade Length Distributions of Participant 18

[Plot omitted. Surviving labels: Distribution of Saccade Length; Cumulative Distribution Function (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length.]

Figure 11-19. Simulating Fixation Duration and Saccade Length Distributions of Participant 19

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-20. Simulating Fixation Duration and Saccade Length Distributions of Participant 20

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-21. Simulating Fixation Duration and Saccade Length Distributions of Participant 21

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-22. Simulating Fixation Duration and Saccade Length Distributions of Participant 22

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-23. Simulating Fixation Duration and Saccade Length Distributions of Participant 23

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-24. Simulating Fixation Duration and Saccade Length Distributions of Participant 24

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-25. Simulating Fixation Duration and Saccade Length Distributions of Participant 25

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-26. Simulating Fixation Duration and Saccade Length Distributions of Participant 26

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-27. Simulating Fixation Duration and Saccade Length Distributions of Participant 27

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-28. Simulating Fixation Duration and Saccade Length Distributions of Participant 28

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-29. Simulating Fixation Duration and Saccade Length Distributions of Participant 29

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-30. Simulating Fixation Duration and Saccade Length Distributions of Participant 30

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-31. Simulating Fixation Duration and Saccade Length Distributions of Participant 31

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-32. Simulating Fixation Duration and Saccade Length Distributions of Participant 32

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-33. Simulating Fixation Duration and Saccade Length Distributions of Participant 33

[Plot omitted. Surviving labels: Distribution of Saccade Length; Cumulative Distribution Function (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length.]

Figure 11-34. Simulating Fixation Duration and Saccade Length Distributions of Participant 34

[Plot omitted. Surviving labels: Distribution of Saccade Length; Cumulative Distribution Function (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length.]

Figure 11-35. Simulating Fixation Duration and Saccade Length Distributions of Participant 35

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-36. Simulating Fixation Duration and Saccade Length Distributions of Participant 36

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-37. Simulating Fixation Duration and Saccade Length Distributions of Participant 37

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-38. Simulating Fixation Duration and Saccade Length Distributions of Participant 38

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-39. Simulating Fixation Duration and Saccade Length Distributions of Participant 39

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-40. Simulating Fixation Duration and Saccade Length Distributions of Participant 40

[Plot omitted. Surviving labels: Distribution of Saccade Length; Cumulative Distribution Function (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length.]

Figure 11-41. Simulating Fixation Duration and Saccade Length Distributions of Participant 41

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-42. Simulating Fixation Duration and Saccade Length Distributions of Participant 42

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-43. Simulating Fixation Duration and Saccade Length Distributions of Participant 43

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-44. Simulating Fixation Duration and Saccade Length Distributions of Participant 44

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-45. Simulating Fixation Duration and Saccade Length Distributions of Participant 45

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-46. Simulating Fixation Duration and Saccade Length Distributions of Participant 46

[Plot omitted. Surviving labels: Distribution of Saccade Length; Cumulative Distribution Function (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length.]

Figure 11-47. Simulating Fixation Duration and Saccade Length Distributions of Participant 47

[Plot omitted. Surviving labels: Distribution of Saccade Length; Cumulative Distribution Function (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length.]

Figure 11-48. Simulating Fixation Duration and Saccade Length Distributions of Participant 48

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-49. Simulating Fixation Duration and Saccade Length Distributions of Participant 49

[Plot omitted. Surviving labels: Distribution of Saccade Length; Cumulative Distribution Function (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length.]

Figure 11-50. Simulating Fixation Duration and Saccade Length Distributions of Participant 50

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-51. Simulating Fixation Duration and Saccade Length Distributions of Participant 51

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-52. Simulating Fixation Duration and Saccade Length Distributions of Participant 52

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-53. Simulating Fixation Duration and Saccade Length Distributions of Participant 53

[Plot omitted. Surviving labels: Distribution of Saccade Length; Cumulative Distribution Function (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length.]

Figure 11-54. Simulating Fixation Duration and Saccade Length Distributions of Participant 54

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-55. Simulating Fixation Duration and Saccade Length Distributions of Participant 55

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-56. Simulating Fixation Duration and Saccade Length Distributions of Participant 56

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-57. Simulating Fixation Duration and Saccade Length Distributions of Participant 57

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-58. Simulating Fixation Duration and Saccade Length Distributions of Participant 58

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-59. Simulating Fixation Duration and Saccade Length Distributions of Participant 59

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-60. Simulating Fixation Duration and Saccade Length Distributions of Participant 60

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-61. Simulating Fixation Duration and Saccade Length Distributions of Participant 61

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-62. Simulating Fixation Duration and Saccade Length Distributions of Participant 62

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-63. Simulating Fixation Duration and Saccade Length Distributions of Participant 63

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-64. Simulating Fixation Duration and Saccade Length Distributions of Participant 64

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-65. Simulating Fixation Duration and Saccade Length Distributions of Participant 65

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-66. Simulating Fixation Duration and Saccade Length Distributions of Participant 66

[Plot omitted. Surviving labels: Distribution of Saccade Length; Cumulative Distribution Function (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length.]

Figure 11-67. Simulating Fixation Duration and Saccade Length Distributions of Participant 67

[Plot omitted. Surviving labels: Distribution of Saccade Length; Cumulative Distribution Function (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length.]

Figure 11-68. Simulating Fixation Duration and Saccade Length Distributions of Participant 68

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-69. Simulating Fixation Duration and Saccade Length Distributions of Participant 69

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-70. Simulating Fixation Duration and Saccade Length Distributions of Participant 70

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-71. Simulating Fixation Duration and Saccade Length Distributions of Participant 71

[Plot omitted. Surviving labels: Distribution of Saccade Length; Cumulative Distribution Function (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length.]

Figure 11-72. Simulating Fixation Duration and Saccade Length Distributions of Participant 72

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-73. Simulating Fixation Duration and Saccade Length Distributions of Participant 73

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-74. Simulating Fixation Duration and Saccade Length Distributions of Participant 74

[Plot omitted. Surviving labels: Distribution of Saccade Length; Cumulative Distribution Function (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length.]

Figure 11-75. Simulating Fixation Duration and Saccade Length Distributions of Participant 75

[Plot omitted. Panels: Distribution of Fixation Duration and Distribution of Saccade Length (y-axis: Frequency, in empirical data scale), each with a Cumulative Distribution Function panel (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length. Series: Empirical and Simulated.]

Figure 11-76. Simulating Fixation Duration and Saccade Length Distributions of Participant 76

[Plot omitted. Surviving labels: Distribution of Saccade Length; Cumulative Distribution Function (y-axis: Cumulative Prob). X-axes: Fixation Duration (in sec) and Saccade Length.]

Figure 12. Simulated and Empirical First Fixation Duration by Word Frequency

[Figure: three panels (Low, Medium, and High Frequency Words) plotting Simulated (sec.) against Empirical (sec.), both axes 0 to 0.5. Legend: G3, G5, Adult.]

Figure 13. Simulated and Empirical Single Fixation Duration by Word Frequency

[Figure: three panels (Low, Medium, and High Frequency Words) plotting Simulated (sec.) against Empirical (sec.), both axes 0 to 0.5. Legend: G3, G5, Adult.]

Figure 14. Simulated and Empirical Gaze Duration by Word Frequency

[Figure: three panels plotting Simulated (sec.) against Empirical (sec.). Low and Medium Frequency Words: both axes 0 to 1.4; High Frequency Words: both axes 0 to 1. Legend: G3, G5, Adult.]

Figure 15. Simulated and Empirical Skipping Probability by Word Frequency

[Figure: three panels plotting Simulated prob. against Empirical prob. Low Frequency Words: both axes 0 to 0.1; Medium Frequency Words: 0 to 0.25; High Frequency Words: 0 to 0.5. Legend: G3, G5, Adult.]

Figure 16. Simulated and Empirical Probability of Making Single Fixation by Word Frequency

[Figure: three panels plotting Simulated prob. against Empirical prob. Low and Medium Frequency Words: both axes 0 to 0.25; High Frequency Words: 0 to 0.5. Legend: G3, G5, Adult.]

Figure 17. Simulated and Empirical Probability of Making Two Fixations by Word

[Figure: three panels plotting Simulated prob. against Empirical prob. Low and Medium Frequency Words: both axes 0 to 0.14; High Frequency Words: 0 to 0.2. Legend: G3, G5, Adult.]

Figure 18. Developmental Changes in Saccade Targeting Probabilities

[Figure: four panels (Probability of Regressions, Probability of Refixations, Prob. of Progressing 1 Word, Prob. of Progressing 2 or More Words) plotting Prob. (0 to 1) against Reading Speed (WPM), 0 to 800. Legend: G3, G5, Adult.]

Figure 19. Developmental Changes in Fixation Duration Control: Probabilities of Making Short, Medium, and Long Fixations

[Figure: three panels (Probability of Short Fixations, Probability of Medium Fixations, Probability of Long Fixations) plotting Prob. (0 to 1) against Reading Speed (WPM), 0 to 800. Legend: G3, G5, Adult.]

Figure 20. Developmental Changes in Fixation Duration Control: Modes of Short, Medium, and Long Fixation Durations

[Figure: three panels (Mode of Short Fixations, Mode of Medium Fixations, Mode of Long Fixations) plotting Time (sec.), 0 to 0.5, against Reading Speed (WPM), 0 to 800. Legend: G3, G5, Adult.]

Figure 21. Developmental Changes in Fixation Duration Control: Variance of Short, Medium, and Long Fixation Durations

[Figure: three panels (Variance of Short Fixations, Variance of Medium Fixations, Variance of Long Fixations) plotting Var. (sec.), 0 to 0.5, against Reading Speed (WPM), 0 to 800. Legend: G3, G5, Adult.]

Figure 22. What Affects Saccade Targeting: Effects of Word Frequency, Length of the Next Word, Fixation Landing Position, and the Previous Saccade Move

[Figure: four panels (Word Frequency, Length of Next Word, Landing Position, Last Saccade Move) plotting Gamma (−0.5 to 0.5) against Reading Speed (WPM), 0 to 800. Legend: G3, G5, Adult.]

Figure 23. What Affects Fixation Duration Control: Effects of Word Frequency, Length of the Next Word, Fixation Landing Position, and the Previous Saccade Move

[Figure: four panels (Word Frequency, Length of Next Word, Landing Position, Last Fixation) plotting Gamma (−0.5 to 0.5) against Reading Speed (WPM), 0 to 800. Legend: G3, G5, Adult.]

[Diagram: hidden node FDC (states S | M | L) with child node DUR.]

Figure 24. BNT Mixture of Gaussian Model Diagram. Graphical representation of the BNT Mixture of Gaussian model for fitting fixation duration distributions. FDC is a hidden node representing the fixation duration category. DUR is the log-transformed duration of the current fixation. FDC is a discrete variable with three states: S, M, and L, with prior probabilities of 0.10, 0.55, and 0.35, respectively. DUR is a continuous variable following normal (Gaussian) distributions. The priors for DUR conditioned on the value of FDC are set as follows: DUR_S ~ N(75, 80), DUR_M ~ N(180, 130), DUR_L ~ N(320, 320).

Figure 25-1. Fitting 3rd-grade Fixation Duration Distribution with 1-component Lognormal Mixture Model

[Figure: Prob. Density (0 to 4.5) against Time (in second), 0 to 1.]

N= 45995, mean= 0.27084, LogLikelihood= −40497
Component 1: Mode(linear)= 0.230, var(log)= 0.341, w= 1.000

Figure 25-2. Fitting 3rd-grade Fixation Duration Distribution with 2-component Lognormal Mixture Model

[Figure: Prob. Density (0 to 4.5) against Time (in second), 0 to 1.]

N= 45995, mean= 0.27084, LogLikelihood= −38832
Component 1: Mode(linear)= 0.218, var(log)= 0.636, w= 0.402
Component 2: Mode(linear)= 0.238, var(log)= 0.139, w= 0.598

Figure 25-3. Fitting 3rd-grade Fixation Duration Distribution with 3-component Lognormal Mixture Model

[Figure: Prob. Density (0 to 4.5) against Time (in second), 0 to 1.]

N= 45995, mean= 0.27084, LogLikelihood= −38498
Component 1: Mode(linear)= 0.081, var(log)= 0.353, w= 0.088
Component 2: Mode(linear)= 0.212, var(log)= 0.120, w= 0.608
Component 3: Mode(linear)= 0.362, var(log)= 0.246, w= 0.305

Figure 25-4. Fitting 3rd-grade Fixation Duration Distribution with 4-component Lognormal Mixture Model

[Figure: Prob. Density (0 to 4.5) against Time (in second), 0 to 1.]

N= 45995, mean= 0.27084, LogLikelihood= −38437
Component 1: Mode(linear)= 0.067, var(log)= 0.230, w= 0.071
Component 2: Mode(linear)= 0.170, var(log)= 0.066, w= 0.346
Component 3: Mode(linear)= 0.274, var(log)= 0.079, w= 0.399
Component 4: Mode(linear)= 0.444, var(log)= 0.210, w= 0.183

Figure 25-5. Fitting 3rd-grade Fixation Duration Distribution with 5-component Lognormal Mixture Model

[Figure: Prob. Density (0 to 4.5) against Time (in second), 0 to 1.]

N= 45995, mean= 0.27084, LogLikelihood= −38458
Component 1: Mode(linear)= 0.064, var(log)= 0.208, w= 0.066
Component 2: Mode(linear)= 0.155, var(log)= 0.057, w= 0.239
Component 3: Mode(linear)= 0.223, var(log)= 0.050, w= 0.339
Component 4: Mode(linear)= 0.338, var(log)= 0.061, w= 0.245
Component 5: Mode(linear)= 0.533, var(log)= 0.180, w= 0.111

Figure 25-6. Fitting 5th-grade Fixation Duration Distribution with 1-component Lognormal Mixture Model

[Figure: Prob. Density (0 to 7) against Time (in second), 0 to 1.]

N= 57015, mean= 0.24816, LogLikelihood= −43961
Component 1: Mode(linear)= 0.217, var(log)= 0.274, w= 1.000

Figure 25-7. Fitting 5th-grade Fixation Duration Distribution with 2-component Lognormal Mixture Model

[Figure: Prob. Density (0 to 7) against Time (in second), 0 to 1.]

N= 57015, mean= 0.24816, LogLikelihood= −41707
Component 1: Mode(linear)= 0.215, var(log)= 0.528, w= 0.393
Component 2: Mode(linear)= 0.218, var(log)= 0.109, w= 0.607

Figure 25-8. Fitting 5th-grade Fixation Duration Distribution with 3-component Lognormal Mixture Model

[Figure: Prob. Density (0 to 7) against Time (in second), 0 to 1.]

N= 57015, mean= 0.24816, LogLikelihood= −41175
Component 1: Mode(linear)= 0.086, var(log)= 0.341, w= 0.080
Component 2: Mode(linear)= 0.198, var(log)= 0.091, w= 0.599
Component 3: Mode(linear)= 0.326, var(log)= 0.201, w= 0.321

Figure 25-9. Fitting 5th-grade Fixation Duration Distribution with 4-component Lognormal Mixture Model

[Figure: Prob. Density (0 to 7) against Time (in second), 0 to 1.]

N= 57015, mean= 0.24816, LogLikelihood= −40986
Component 1: Mode(linear)= 0.077, var(log)= 0.248, w= 0.074
Component 2: Mode(linear)= 0.170, var(log)= 0.050, w= 0.396
Component 3: Mode(linear)= 0.266, var(log)= 0.069, w= 0.361
Component 4: Mode(linear)= 0.393, var(log)= 0.195, w= 0.169

Figure 25-10. Fitting 5th-grade Fixation Duration Distribution with 5-component Lognormal Mixture Model

[Figure: Prob. Density (0 to 7) against Time (in second), 0 to 1.]

N= 57015, mean= 0.24816, LogLikelihood= −40947
Component 1: Mode(linear)= 0.059, var(log)= 0.155, w= 0.046
Component 2: Mode(linear)= 0.133, var(log)= 0.055, w= 0.119
Component 3: Mode(linear)= 0.182, var(log)= 0.034, w= 0.364
Component 4: Mode(linear)= 0.281, var(log)= 0.052, w= 0.325
Component 5: Mode(linear)= 0.422, var(log)= 0.173, w= 0.147

Figure 25-11. Fitting Adult Fixation Duration Distribution with 1-component Lognormal Mixture Model

[Figure: Prob. Density (0 to 8) against Time (in second), 0 to 1.]

N= 40478, mean= 0.19254, LogLikelihood= −23607
Component 1: Mode(linear)= 0.176, var(log)= 0.188, w= 1.000

Figure 25-12. Fitting Adult Fixation Duration Distribution with 2-component Lognormal Mixture Model

[Figure: Prob. Density (0 to 8) against Time (in second), 0 to 1.]

N= 40478, mean= 0.19254, LogLikelihood= −21839
Component 1: Mode(linear)= 0.150, var(log)= 0.404, w= 0.276
Component 2: Mode(linear)= 0.187, var(log)= 0.093, w= 0.724

Figure 25-13. Fitting Adult Fixation Duration Distribution with 3-component Lognormal Mixture Model

[Figure: Prob. Density (0 to 8) against Time (in second), 0 to 1.]

N= 40478, mean= 0.19254, LogLikelihood= −21812
Component 1: Mode(linear)= 0.110, var(log)= 0.328, w= 0.156
Component 2: Mode(linear)= 0.173, var(log)= 0.072, w= 0.564
Component 3: Mode(linear)= 0.237, var(log)= 0.133, w= 0.279

Figure 25-14. Fitting Adult Fixation Duration Distribution with 4-component Lognormal Mixture Model

[Figure: Prob. Density (0 to 8) against Time (in second), 0 to 1.]

N= 40478, mean= 0.19254, LogLikelihood= −21814
Component 1: Mode(linear)= 0.096, var(log)= 0.252, w= 0.133
Component 2: Mode(linear)= 0.154, var(log)= 0.050, w= 0.391
Component 3: Mode(linear)= 0.216, var(log)= 0.052, w= 0.348
Component 4: Mode(linear)= 0.285, var(log)= 0.133, w= 0.128

Figure 25-15. Fitting Adult Fixation Duration Distribution with 5-component Lognormal Mixture Model

[Figure: Prob. Density (0 to 8) against Time (in second), 0 to 1.]

N= 40478, mean= 0.19254, LogLikelihood= −21764
Component 1: Mode(linear)= 0.075, var(log)= 0.193, w= 0.075
Component 2: Mode(linear)= 0.124, var(log)= 0.043, w= 0.181
Component 3: Mode(linear)= 0.166, var(log)= 0.021, w= 0.323
Component 4: Mode(linear)= 0.231, var(log)= 0.033, w= 0.295
Component 5: Mode(linear)= 0.298, var(log)= 0.115, w= 0.127

APPENDIX A. PROBLEMS IN THE E-Z READER MODEL

Reichle et al. (1998; Reichle et al., 1999) developed a series of “E-Z Reader” models of eye-movement control during reading. They concluded that the E-Z Reader models fit the data well. However, as will be shown below, evaluating the goodness of fit of the models turned out to be impossible because of serious problems in their goodness-of-fit index and limitations of the empirical data used for modeling.

The Goodness-of-fit Index. A goodness-of-fit index is arguably the most important part of a model. On the one hand, it is the criterion by which a model is "optimized" and its parameters are estimated. On the other hand, it is an important criterion for comparing and selecting models. It is the link between theory and data. However, the way goodness of fit was handled in Reichle et al. is questionable. According to Reichle et al. (1998):

The model's overall performance was measured by using the root mean square of the normalized difference scores (errors) between the observed and predicted means of the five frequency classes for each of the dependent measures. The normalization process allowed the errors to be evaluated on a common scale (i.e., milliseconds and probabilities were converted to unitless scores). The normalization process that we used was to square the difference between the observed and predicted values and then to divide this difference by the standard deviation of the observed values. (p. 157)

To facilitate further discussion, let us put the above into formulas. Let X, an eye-movement measure, be a random variable with expected value $\mu_x$ and standard deviation $\sigma_x$. Let $\{x_1, \ldots, x_N\}$ be a random sample of X, with a sample mean of $\bar{x}$ and a sample standard deviation of $sd$.

For large N, the distribution of the sample mean is approximately normal, with a standard deviation that is estimated by the sample standard error, $se = sd/\sqrt{N}$. Finally, let $\bar{x}_s$ be the mean of measure X from the E-Z Reader simulation. Because of the large N in the simulation (1,000 “statistical subjects”), $\bar{x}_s$ should be very stable and can be treated in practice as a constant.

With the above notation, Reichle et al.’s normalization algorithm and goodness-of-fit index (RMS) can be written as follows. For each of the M=30 measures, $X_i$, the normalized difference score, according to Reichle et al. (1998, cited above), is

$$y_i = \frac{(\bar{x}_{s,i} - \bar{x}_i)^2}{sd_i},$$

and the goodness-of-fit index, root mean square (RMS), of a model is calculated as

$$\mathrm{RMS} = \sqrt{\frac{\sum_{i=1}^{M} y_i^2}{M}}.$$

There are at least two serious errors in the above goodness-of-fit index, each of which will be shown to have a large impact on the evaluation and interpretation of the models. In addition, the use of RMS as a goodness-of-fit index is itself questionable. I will discuss each of them below.

The “normalization.” Reichle et al. claim that their normalization process "allowed the errors to be evaluated on a common scale," that is, rendered them unitless. The idea was probably to normalize using Z-scores. But their normalization formula does not serve this purpose:

$$y_i = \frac{(\bar{x}_{s,i} - \bar{x}_i)^2}{sd_i} = (\bar{x}_{s,i} - \bar{x}_i) \times \frac{\bar{x}_{s,i} - \bar{x}_i}{sd_i} = (\bar{x}_{s,i} - \bar{x}_i) \times Z_i$$
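The scale dependence of this score can be checked numerically. The following is a minimal Python sketch, not code from Reichle et al.; the function names and the three input values are hypothetical and chosen only for illustration. It computes the quoted score and the ordinary Z-score for the same difference expressed once in milliseconds and once in seconds.

```python
import math


def reichle_score(sim_mean, obs_mean, obs_sd):
    """Score as quoted above: squared difference divided by the observed sd."""
    return (sim_mean - obs_mean) ** 2 / obs_sd


def z_score(sim_mean, obs_mean, obs_sd):
    """Ordinary unitless Z-score: difference divided by the observed sd."""
    return (sim_mean - obs_mean) / obs_sd


def rms(scores):
    """Root mean square of a list of scores."""
    return math.sqrt(sum(s ** 2 for s in scores) / len(scores))


# Hypothetical values for one duration measure, in milliseconds.
sim_ms, obs_ms, sd_ms = 260.0, 250.0, 80.0

y_ms = reichle_score(sim_ms, obs_ms, sd_ms)
z_ms = z_score(sim_ms, obs_ms, sd_ms)

# The same quantities expressed in seconds.
y_sec = reichle_score(sim_ms / 1000, obs_ms / 1000, sd_ms / 1000)
z_sec = z_score(sim_ms / 1000, obs_ms / 1000, sd_ms / 1000)

print("y (ms) =", y_ms, " y (sec) =", y_sec)   # differ by the unit conversion factor
print("Z (ms) =", z_ms, " Z (sec) =", z_sec)   # identical up to rounding
print("RMS of the two y values =", rms([y_ms, y_sec]))
```

Changing the unit of measurement leaves Z unchanged but rescales y, which is why y cannot serve as a common scale across durations and probabilities.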

Clearly $Z_i$ is a unitless Z-score, but Reichle et al.'s "normalized difference score" scales $Z_i$ by the difference between the observed and estimated means of measure X. As a consequence, when the $y_i$ were used to calculate overall goodness of fit, different measures contributed differently to the loss function, and the weight of each measure depended on its scale. For the E-Z Reader models specifically, a rough estimate from Table 1 of Reichle et al. (1999) shows that $(\bar{x}_{s,i} - \bar{x}_i)$ for gaze duration, first fixation duration, and single-fixation duration ranges from 2 to 18 (not counting 0's), while $(\bar{x}_{s,i} - \bar{x}_i)$ for the probabilities of skipping, of making a single fixation, and of making two fixations is in the range of 0.01 to 0.1. The two groups of measures thus differ by a factor of about 100. Even without any mathematical analysis, it is obvious that the effects on the probabilities were grossly suppressed during model fitting and parameter estimation.

An immediate consequence of using this 100:1 "normalization" formula is that the E-Z Reader models were sensitive to fixation duration data but practically ignored effects on skipping and refixation probabilities. It is not surprising then, given this optimization criterion, that model fit did not improve in any real sense from E-Z Reader 2 to 6, and in many cases the fit was actually worse. It is interesting, though, that even under this extremely unfavorable treatment, the three probability measures were fit reasonably well, judging simply from the observed and estimated means. A possible explanation is that the different measures of eye movements may not be independent (as indeed they should not be if the E-Z Reader model is correct), and consequently fitting a subset of the variables would guarantee that the rest of the variables are also fit well. This hypothesis will be examined later.

Standard deviation versus standard error. In calculating

$$Z_i = \frac{\bar{x}_{s,i} - \bar{x}_i}{sd_i},$$

Reichle et al. (1998; Reichle et al., 1999) used the standard deviation of the observed sample as the denominator. Because the comparisons here were between means of observed versus simulated observations, the sample standard error should be used in the denominator (see Hayes, 1988). I suspect that the confusion might stem from a seemingly similar situation, model training in artificial neural networks, where after each cycle RMS is calculated on the basis of the sample standard deviation. That use of the sample standard deviation is legitimate because a single observation (the activation level after this cycle) is the center of concern, rather than a mean of some sort. However, the Monte Carlo simulation that Reichle et al. were doing is fundamentally based on the Law of Large Numbers and is only concerned with means.

What impact does this have on goodness-of-fit indices? The answer depends on the sample size. A rough guess at the N for each of the 30 means from Schilling et al. is approximately 3,000 (48 sentences, 12 words long on average, 30 subjects, divided by 5 frequency categories). If Reichle et al. had used the standard error instead of the standard deviation of each measure, the Z-scores, and hence the overall goodness-of-fit index, would have been roughly 50 times larger. The RMS for E-Z Reader 6, for example, would have been in the neighborhood of 10, instead of 0.218. The Z-scores (using the correct formula) follow a unit Normal distribution (for N=3,000). Therefore any |Zi| > 2 clearly indicates a poor fit at point i, at an α level of .05. If sd were used in place of se, as Reichle et al. did, the magnitude of Zi would be shrunk some 50-fold and would never be significant.

RMS and goodness-of-fit testing. Reichle et al. chose to use the root mean square of error

(RMS) as an index of the goodness of fit during grid-searches of optimal parameters. There is nothing wrong with the choice. However, RMS is rarely used in statistical modeling or Monte Carlo simulations as a goodness-of-fit index because (a) it is difficult to test the fit of a model to data or to compare different models on the basis of RMS, and (b) there are easier ways to do the job. One classical goodness-of-fit test statistic, Chi-square, is actually closely related to RMS. When each of the M error components is independently and identically distributed (i.i.d.) as a unit Normal distribution (Z), the sum of squared errors (SSE),

$$\mathrm{SSE} = \mathrm{RMS}^2 \times M = \sum_{i=1}^{M} Z_i^2,$$

is distributed as a Chi-square distribution with M degrees of freedom (df). Thus SSE can be tested against the appropriate Chi-square distribution to see whether the hypothesis that the model fits the data set should be rejected. Not only can the fit of a single model be tested this way, but a series of two or more hierarchically constructed models, with increasing numbers of free parameters, can also be compared using the Chi-square test in order to decide whether the improvement in fit with additional parameters is statistically justifiable. Reichle et al. did not formally test the fit of their models to the data or base model selection on clear empirical criteria, being primarily concerned with psychological validity. Well-developed statistical methods of model fitting exist, and can provide a more systematic means of developing and comparing models.

Correlations, Multicollinearity, and Parsimonious Modeling. A question raised previously is why E-Z Reader was able to model eye-movement

probability data fairly well even when these measures had little weight in model optimization and parameter estimation. One possibility is that the probability measures were highly correlated with the duration data. There was a hint in the report that this was true, as Reichle et al. (1998) stated that “the single-fixation duration and refixation means were not included in this [RMS] measure because their values are largely redundant with the other measures.”

To test this hypothesis, I computed pairwise correlations among the six means of eye-movement measures, mean category word frequency, and the logarithm of the frequencies given in Reichle et al. (1999) Table 1. All eye-movement measure means are highly correlated. The correlation coefficients range from .85 (between skipping rate and first fixation duration, p=.069, N.S. for n=5) to .998 (between first fixation duration and single fixation duration, highly significant). A Principal Component analysis on the six eye-movement measures showed that the first component accounts for 94.6% of the total variance, the first two components account for 98.6%, and the first three components account for 99.999% of the total variance. In addition, all eye-movement measures are highly correlated with the logarithm of word frequency (all p's<.05). In short, the six eye-movement variables can be effectively reduced to a single variable, with only about 5% loss of information. The model fitting on the 30-point empirical data was practically based on 5 points, which have an almost perfect linear relationship with log-transformed word frequency.

The multicollinearity explains another puzzling aspect of the E-Z Reader models. As E-Z Reader evolved from 1 to 5, its goodness of fit (measured by RMS) did not improve, and often got worse. This goes against common experience in modeling. Part of the reason lies in the errors in the loss function. On the other hand, it could also be that the E-Z

Reader 1 was almost perfect given such a simple structure in the data. Any additional mechanisms and parameters added in subsequent models could not possibly improve the fit. Obviously, the most parsimonious model, possibly the only model, for this data set is "any eye-movement measure is a linear function of log-transformed word frequency." Given the extremely high correlations among all variables, a good model for one variable is automatically a good one for another variable. The rest of the modeling process is to find the intercepts and slopes of the linear functions, an easy job for the grid-search algorithm.

The E-Z Reader modeling effort is one of the most ambitious attempts to model eye-movement control parameters in a psychologically plausible fashion, but important errors in the modeling approach severely limit the conclusions that can be drawn from this research.
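Two of the numerical points made in this appendix can be sketched in a few lines of Python. This is an illustrative calculation only: it follows the rough approximation used in the text (N of about 3,000 per mean, and rescaling the reported RMS of 0.218 by the square root of N), and the helper names are hypothetical. The upper-tail Chi-square probability uses the closed form that holds for even degrees of freedom, so no external library is assumed.

```python
import math


def se_inflation(n):
    """Factor by which a Z-score grows when sd is replaced by se = sd / sqrt(n)."""
    return math.sqrt(n)


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a Chi-square variable, valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)^j / j!
        total += term
    return math.exp(-half) * total


n_per_mean = 3000                  # rough guess given in the text
factor = se_inflation(n_per_mean)  # roughly 50
rms_reported = 0.218               # E-Z Reader 6, as cited in the text
rms_rescaled = rms_reported * factor

M = 30                             # number of fitted means
sse = rms_rescaled ** 2 * M        # SSE = RMS^2 * M
p_value = chi2_sf_even_df(sse, M)

print("inflation factor:", round(factor, 1))
print("rescaled RMS:", round(rms_rescaled, 1))
print("SSE:", round(sse, 1), " upper-tail p:", p_value)
```

Under these assumptions the rescaled SSE lies far beyond the .05 critical value for 30 degrees of freedom (about 43.77), which is the sense in which a correctly scaled index would have signaled a poor fit.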

APPENDIX B. FITTING MIXTURE MODELS TO EMPIRICAL FIXATION DURATION DISTRIBUTIONS

Introduction

There has been empirical evidence that fixation duration may not follow a single distribution but instead consists of a mixture of distributions (Gezeck et al., 1997; McConkie & Dyre, 2000; Yang & McConkie, in press; see Chapter 3 for discussions on these studies). Therefore, two critical modeling decisions are (a) the component distributions and (b) the number of components. To date, the most successful models of fixation duration distribution are the three models from McConkie and Dyre (2000): the two-state transition model, the two-stage race model, and the two-stage mixture model. All three models are essentially mixture models of an early, short component and a late, long component [43]. The choices of component distributions varied (Weibull, exponential, convolutions of Weibull and exponentials), but they were largely motivated by empirical hazard functions.

[43] For the two-state transition model, the short fixations are assumed to follow a Weibull distribution with a power (shape) parameter equal to 2 (which has a linearly rising hazard function). The long component is assumed to be exponential. The mixing rate, i.e., the proportion to switch from State 1 to State 2, increases over time. For the two-stage mixture and two-stage race models, McConkie and Dyre (2000) assumed that the duration of Stage 1 is a mixture of short and long components; the duration of Stage 1 is then convolved with that of Stage 2, which is an exponentially distributed random variable. This is mathematically equivalent to saying that the final distribution is composed of short and long fixations, each of which is a convolution of two random distributions: the corresponding State 1 distribution and the exponential. Therefore, all three models are essentially mixture models.
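The hazard-function properties mentioned in the footnote above can be verified directly. The sketch below is a hedged illustration in Python; the scale and rate values are hypothetical and are not parameters from McConkie and Dyre (2000). It computes the hazard as density divided by survival for a Weibull distribution with shape 2 and for an exponential distribution.

```python
import math


def weibull_hazard(t, shape, scale):
    """Hazard = pdf / survival for a Weibull(shape, scale) distribution."""
    pdf = (shape / scale) * (t / scale) ** (shape - 1) * math.exp(-((t / scale) ** shape))
    survival = math.exp(-((t / scale) ** shape))
    return pdf / survival


def exponential_hazard(t, rate):
    """Hazard = pdf / survival for an exponential distribution with the given rate."""
    pdf = rate * math.exp(-rate * t)
    survival = math.exp(-rate * t)
    return pdf / survival


times = [0.05, 0.10, 0.20, 0.40]   # seconds
scale = 0.25                       # hypothetical Weibull scale
rate = 4.0                         # hypothetical exponential rate

weibull_h = [weibull_hazard(t, 2.0, scale) for t in times]
exponential_h = [exponential_hazard(t, rate) for t in times]

for t, hw, he in zip(times, weibull_h, exponential_h):
    print(f"t={t:.2f}  Weibull(shape=2) hazard={hw:.3f}  exponential hazard={he:.3f}")
```

Doubling t doubles the Weibull hazard when the shape parameter is 2, whereas the exponential hazard stays constant at its rate.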

There is no unique way to fit an empirical distribution with mixture models (cf. McLachlan & Peel, 2000). The success of these models suggests that other mixture models with different component assumptions may also achieve good results. In addition, the components in McConkie and Dyre’s (2000) models are complex and difficult to handle mathematically. The current study was a search for a simpler solution. A simple distribution, the lognormal distribution, was chosen as the distribution of the mixture components [44]. There were two reasons for this choice. First, the hazard function of the lognormal distribution has the characteristics of the empirical hazard rates: an initially slow but accelerating rise to a peak, followed by a very slow, gradually decreasing tail (Johnson et al., 1994). Second, a mixture-of-lognormal distribution is easy to handle because on the log scale it becomes a mixture-of-normal distribution, which is the most extensively studied class of mixture models. Its mathematical properties are well understood, and many statistical algorithms are available for model estimation.

Method

Data and Apparatus. See Chapter 4.

Modeling procedure. Model fitting was done in MATLAB, a numeric computation software package. Fixation duration was first log-transformed, so that its logarithm could be fit with Gaussian mixture models. Two fitting methods were used. For maximum likelihood estimation, the Gaussian Mixture Model (GMM) Toolbox was

[44] The log-normal distribution is closely related to the Normal distribution in that if log(x), x>0, is Normally distributed, then x follows a log-normal distribution.
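Because the logarithm of a log-normal variable is Normal, a lognormal mixture can be fit by running a standard Gaussian-mixture EM algorithm on log-transformed durations. The following is a minimal Python sketch of that idea, not the toolbox code used in this study. The function name `fit_gmm_em`, the synthetic data, and the starting values are illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a Normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_em(data, means, variances, weights, n_iter=100):
    """Fit a 1-D Gaussian mixture by EM; returns parameters and log-likelihood history."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    history = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp, loglik = [], 0.0
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            loglik += math.log(total)
            resp.append([d / total for d in dens])
        history.append(loglik)
        # M-step: re-estimate each component from its weighted observations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            weights[j] = nj / len(data)
    return means, variances, weights, history


# Synthetic durations (in seconds) from a hypothetical 3-component lognormal mixture.
rng = random.Random(0)
true_components = [(math.log(0.08), 0.30, 0.10),   # (log-scale mean, log-scale sd, weight)
                   (math.log(0.20), 0.25, 0.60),
                   (math.log(0.35), 0.30, 0.30)]
durations = []
for _ in range(1000):
    u, acc = rng.random(), 0.0
    for mu, sigma, w in true_components:
        acc += w
        if u <= acc:
            break
    durations.append(rng.lognormvariate(mu, sigma))

log_data = [math.log(d) for d in durations]       # fit on the log scale
start_means = [math.log(0.075), math.log(0.18), math.log(0.32)]
fit_means, fit_vars, fit_weights, history = fit_gmm_em(
    log_data, start_means, [0.1, 0.1, 0.1], [1 / 3, 1 / 3, 1 / 3])

for m, v, w in zip(fit_means, fit_vars, fit_weights):
    print(f"exp(mean)={math.exp(m):.3f} s  var(log)={v:.3f}  w={w:.3f}")
```

In practice such a fit would be repeated from several random starting points and the best solution retained, since EM converges only to a local maximum of the likelihood.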

used (Cadez, Smyth, McLachlan, & McLaren, 2001). The GMM algorithm fits a mixture of n Gaussian components, where n is a pre-specified integer, to the data and iteratively changes the model parameters until it maximizes the likelihood of observing the data given the model. For more discussion of mixture models in general or of maximum likelihood estimation of mixture-of-Gaussian models, see McLachlan and colleagues (McLachlan & Basford, 1988; McLachlan & Peel, 2000) and Titterington, Smith, and Makov (1985). The logarithm of fixation durations was fitted with n=1..7 Gaussian mixture models, and the best-fitting parameters over 5 repetitions (with different random initial values) were used.

In addition to the maximum likelihood method, Bayesian estimation was done with the Bayes Net Toolbox (BNT) developed by Kevin Murphy (2001). A graphical representation of the BNT Gaussian mixture model is shown in Figure 24. The Bayesian method takes into account the prior probability distribution of a parameter, which represents prior knowledge, and combines it with the information in the data to maximize the posterior probability, that is, the probability of the parameter values given the observed data. A unique advantage of the Bayesian method over maximum likelihood estimation is that it incorporates prior knowledge about the likely values of parameters. In the current case, the prior knowledge came from the empirical results of Yang and McConkie (in press), i.e., the modes of the distributions in their Figure 9.

Results

Maximum likelihood estimates. Figures 25-1 through 25-15 show the empirical fixation duration distribution, the best-fit n-component Gaussian-mixture models, and the (weighted) component distributions for third-grade, fifth-grade, and adult data. A visual inspection suggests

that 3-component Gaussian-mixture models fit the empirical data very well. Most importantly, the three components in each age group correspond fairly well with the results from Yang and McConkie.

Formally determining the number of components, however, was difficult. The typical log-likelihood ratio test, a statistical procedure for comparing a “full” versus a “reduced” model by weighing the gain in goodness of fit against the number of additional parameters, cannot be applied directly in this case, because a 2-component Gaussian-mixture model is not strictly a “reduced” model of a 3-component Gaussian-mixture model (McLachlan & Basford, 1988; McLachlan & Peel, 2000; Titterington et al., 1985). Many alternative tests have been proposed (McLachlan & Peel, 2000). Here I adopted a modified log-likelihood ratio test by Wolfe (1971; see also Everitt, 1981), which has been shown to work well when the number of cases is at least five times larger than the number of components. Wolfe proposed that, under the null hypothesis that the data arise from a mixture of $g_1$ populations versus the alternative that they arise from $g_2$ ($g_1 < g_2$) populations, the usual log-likelihood ratio statistic $-2\log\lambda$ be modified so that, approximately,

$$-2c\log\lambda \sim \chi^2_d,$$

where the degrees of freedom, d, is taken to be twice the difference in the number of parameters in the two hypotheses, not including the mixing proportions, and the correction factor, c, is given by

$$c = \frac{n - 1 - p - \tfrac{1}{2} g_2}{n}.$$

Here n is the number of cases and p is the number of variables (p = 1 for fixation duration). In the current case n is sufficiently large, so c is practically 1.

Wolfe’s test was carried out in sequence to test the minimal number of mixture components

that provided a satisfactory fit to the empirical data [45]. Each additional Normal component added two new parameters, hence d=4, and the corresponding Chi-square critical value for α=0.005 is 14.8602. In other words, if the difference in log-likelihood between two consecutive models (in terms of the number of components) was larger than 14.86, the null hypothesis (the smaller number of components) was rejected and the hypothesis associated with the larger number of components was adopted.

In all age groups the 3-component Gaussian-mixture models provided a significantly better fit than the 2-component models, and seemed to capture the basic characteristics of the distributions. The statistical tests showed that one should prefer a 4-component model for third-grade data, a 5-component model for fifth-grade data, and a 3-component model for adults. The additional improvement in fit from moving beyond 3 components was relatively small (e.g., the log-likelihood for the 3rd-grade distribution increased by 334 when 3 components were used instead of 2, but it only increased by 61 and 21 for each additional component above 3), although significant. Because the differences were so small, and in order to facilitate comparison between age groups, 3-component models were used for all groups in the analysis of the parameters.

Although the maximum likelihood estimates of the 3-component means are generally consistent with Yang and McConkie’s (in press) findings, the estimates for the first component (the short fixations) were not numerically stable from run to run, and the estimated means and variances had a sizable effect on the estimates of the parameters of the third (the longest)

[45] Here the potential problem of correlation in sequential testing was simply dealt with by using a more stringent α level, α=0.005.

235 components. There was a need to “anchor” the first component so as to obtain more stable estimates of other components. Bayesian estimates. The Bayesian estimation method was used to achieve these goals. In these analyses, the number of components was fixed to three. Rather than having the maximum likelihood algorithm randomly guess the initial values of parameters, the Bayesian method allows imposing constraints of parameter values using prior distributions. Based on Yang and McConkie (in press), the prior distributions of the components were set to three normal distributions: N(log(75), 80), N(log(180), 130), and N(log(320), 320)46. The prior distribution of the mixture weights was set to a Dirichlet distribution, following Bayesian modeling conventions, with pi= {0.10, 0.55, 0.35}. These prior weights were based on the maximum likelihood estimates of the weights for 3-component models. Because Bayesian estimation is notoriously time consuming, random samples of 10% of the original data were used in Bayesian estimation. This procedure was repeated three times to ensure stability of estimates. In fact, the estimates were very stable even if only 1% of data (which correspond to approximately 200-500 cases in each age group) were used. For comparison the same random samples were subject to maximum likelihood estimation as well. Table 2 showed the parameters and log-likelihood indices of the Bayesian estimates and the corresponding maximum likelihood estimates. The results of the two methods were generally in agreement. The fittings of Bayesian estimates (log-likelihood) were at least as good as that of maximum likelihood ones, and the differences were often within the range of random

fluctuations caused by different random starting points in the maximum likelihood method. As expected, the Bayesian method provided a more consistent estimate of the mean of the first component, so that it was less likely to interfere with the parameters of the third component.

[46] The unit for the means is milliseconds. Note that fixation duration was log-transformed first and then fit with mixture-of-Gaussian models, which is equivalent to fitting fixation duration with mixture-of-lognormal models.

To summarize the fitting results of the lognormal-mixture models, 3-component models provided a very close fit to the fixation duration distributions of both children and adult readers. Although it is impossible to compare the goodness of fit of the lognormal-mixture model to that of McConkie and Dyre’s models, they appear to be largely comparable based on the distribution plots. In addition, the parameters of the three component distributions were reasonably close to the empirical findings in Yang and McConkie. This was encouraging support for the choice of the lognormal-mixture model.

Additional analyses showed that the 3-component lognormal-mixture model could also fit the distributions of individual readers. Fixation durations on low-frequency words had a higher proportion of the “long” component, and the mode of the component was larger. A further investigation of the frequency effect showed that the effect could be accounted for solely by the mixture weights, i.e., when the parameters of the three components were fixed and only the weights were allowed to vary, the model fit was not significantly different from when all parameters were allowed to vary.

Discussion

The current study showed that a 3-component mixture-of-lognormal model could successfully model empirical fixation duration distributions of beginning readers and adults. The

fit appeared to be as good as McConkie and Dyre’s (2000) models. The 3-component lognormal-mixture model provided a simple, straightforward interpretation for Yang and McConkie’s (in press) results. According to the current model, there are three classes of fixations, each with different distributional properties. In normal reading, the mixture rate of these fixation classes may change with linguistic or other factors, but is relatively stable. The resulting mixture shows the typical unimodal, long-tailed distribution. Under extreme experimental manipulations such as those in Yang and McConkie’s study, however, the proportions are knocked out of their normal balance and the individual components are therefore revealed. The current mixture model would hypothesize that each individual reader should have stable component parameters across normal reading and Yang and McConkie’s experimental conditions. It would be interesting to see this hypothesis tested. Interpreting Yang and McConkie’s (in press) findings in McConkie and Dyre’s (2000) modeling framework is difficult, because the latter assumed a two-component structure. In this sense, the current model seems to be more readily interpretable.

Unlike McConkie and Dyre (2000), no attempt was made to infer the underlying processing mechanism from the forms of the distributions. Reasoning about stochastic processes from their marginal distributions is often risky, as many mechanisms may result in similar distributions. The choice of lognormal components, which were no more arbitrary than the components in McConkie and Dyre’s models, may raise skepticism. There is no doubt that the lognormal distribution was chosen for modeling convenience, but the results suggested that the decision was not a particularly bad one. At the same time, there is nothing in the model that requires a lognormal distribution, and any other reasonable distribution may work just as well.
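The sequential model-selection procedure described in the maximum likelihood analysis above (Wolfe's modified likelihood ratio test, with d = 4 per added component and α = 0.005) can be illustrated with a minimal Python sketch. This is not the code used in the analyses. The function names, the sample size, and the log-likelihood values are hypothetical and chosen only for illustration. The sketch assumes univariate data (p = 1), two parameters per Normal component, and a test statistic computed as -2c log λ, that is, 2c times the difference in maximized log-likelihood. Because d is always even here, the chi-square tail probability has a simple closed form and no statistics library is needed.

```python
import math

P = 1                     # assumption: univariate data (log fixation duration)
PARAMS_PER_COMPONENT = 2  # assumption: one mean and one variance per component


def chi2_sf_even(x, d):
    """Upper-tail probability P(X > x) for a chi-square variable with even d."""
    if d <= 0 or d % 2 != 0:
        raise ValueError("d must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # running value of (x/2)^k / k!, starting at k = 0
    total = 1.0
    for k in range(1, d // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def wolfe_test(loglik_small, loglik_large, n, g_small, g_large, alpha=0.005):
    """Wolfe's modified likelihood ratio test of g_small vs. g_large components."""
    c = (n - 1 - P - g_large / 2.0) / n              # correction factor
    statistic = 2.0 * c * (loglik_large - loglik_small)
    d = 2 * PARAMS_PER_COMPONENT * (g_large - g_small)
    p_value = chi2_sf_even(statistic, d)
    return statistic, d, p_value, p_value < alpha


def select_components(logliks, n, alpha=0.005):
    """Return the smallest number of components that the next model fails to beat.

    logliks maps the number of components to the maximized log-likelihood.
    """
    gs = sorted(logliks)
    for g_small, g_large in zip(gs, gs[1:]):
        reject = wolfe_test(logliks[g_small], logliks[g_large],
                            n, g_small, g_large, alpha)[3]
        if not reject:
            return g_small
    return gs[-1]


# Hypothetical maximized log-likelihoods for 2-, 3-, and 4-component fits.
example_logliks = {2: -15000.0, 3: -14700.0, 4: -14695.0}
print(select_components(example_logliks, n=20000))  # prints 3
```

With these made-up values the 2-component model is rejected in favor of 3 components, while the gain from a fourth component falls below the critical value, so the procedure stops at 3. With n in the tens of thousands the correction factor c is within a fraction of a percent of 1, in line with the remark that c is practically 1.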

REFERENCES

Agresti, A. (1990). Categorical data analysis. New York: Wiley.

Andriessen, J. J., & De Voogd, A. H. (1973). Analysis of eye movement pattern in silent reading. IPO Annual Program Report, 30-35.

Bengio, Y. (1999). Markovian models for sequential data. Neural computing surveys, 2, 129-162.

Bengio, Y., & Frasconi, P. (1996). Input/output HMMs for sequence processing. IEEE Transactions on Neural Networks, 1231-1249.

Bernardo, J. M., & Smith, A. F. M. (1994). Bayesian theory. Chichester, England: John Wiley.

Birnbaum, Z. W. (1952). Numerical tabulation of the distribution of Kolmogorov's statistic for finite sample size. Journal of the American Statistical Association, 47, 425-441.

Boyen, X., & Koller, D. (1998a). Approximate learning of dynamic models. Paper presented at the Neural Information Processing Systems (NIPS-11).

Boyen, X., & Koller, D. (1998b). Tractable inference for complex stochastic processes. Paper presented at the 14th Annual Conference on Uncertainty in AI (UAI), San Francisco.

Brysbaert, M., & Vitu, F. (1998). Word skipping: Implications for theories of eye movement control in reading. In G. Underwood (Ed.), Eye guidance in reading and scene perception (pp. 125-147). Oxford, England UK: Anonima Romana.

Brysbaert, M., Vitu, F., & Schroyens, W. (1996). The right visual field advantage and the optimal viewing position effect: On the relation between foveal and parafoveal word recognition. Neuropsychology, 10, 385-395.

Buswell, G. T. (1922). Fundamental reading habits: A study of their development. Supplementary Educational Monographs, 21.

Buswell, G. T. (1937). How adults read. Chicago, Ill.: University of Chicago.

Cadez, I. V., Smyth, P., McLachlan, G. J., & McLaren, C. E. (2001). Maximum likelihood estimation of mixture densities for binned and truncated multivariate data. Machine learning journal, special edition on unsupervised learning, in press.

Carpenter, P. A. (1984). The influence of methodologies on psycholinguistic research: A regression to the Whorfian hypothesis. In D. E. Kieras & M. A. Just (Eds.), New methods in reading comprehension research (pp. 1-12). Hillsdale, NJ: Lawrence Erlbaum Asso.

Carpenter, R. H. S. (1988). Movements of the eyes (2nd rev. & enlarged ed.). London, England UK: Pion Limited.

Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). New York: Wiley.

Cowell, R. (1998a). Advanced inference in Bayesian networks. In M. Jordan (Ed.), Learning in graphical models (pp. 27-50). Cambridge, MA: MIT Press.

Cowell, R. (1998b). Introduction to inference for Bayesian networks. In M. Jordan (Ed.), Learning in graphical models (pp. 9-26). Cambridge, MA: MIT Press.

Dearborn, W. F. (1906). The Psychology of Reading (Vol. XIV). New York: The Science Press.

Everitt, B. S. (1981). A Monte Carlo investigation of the likelihood ratio test for the number of components in a mixture of normal distributions. Multivariate Behavioral Research, 16, 171-180.

Feng, G., Miller, K. F., Zhang, H., & Shu, H. (2001). Towed to recovery: the use of phonological and orthographic information in reading Chinese and English. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1079-1100.

Findlay, J. M., & Walker, R. (1999). A model of saccade generation based on parallel processing and competitive inhibition. Behavioral & Brain Sciences, 22, 661-721.

Francis, W. N., & Kucera, H. (1982). Frequency analysis of English usage: lexicon and grammar. Boston: Houghton Mifflin.

Gezeck, S., Fischer, B., & Timmer, J. (1997). Saccadic reaction times: A statistical analysis of multimodal distributions. Vision Research, 37, 2119-2131.

Goodman, L. A., & Kruskal, W. H. (1954). Measures of Association for Cross Classifications. Journal of the American Statistical Association, 49, 732-764.

Goodman, L. A., & Kruskal, W. H. (1963). Measures of Association for Cross Classifications III: Approximate sampling theory. Journal of the American Statistical Association, 58, 310-364.

Gray, C. T. (1922). Deficiencies in reading ability: Their diagnosis and remedies. Chicago, IL: Heath & Co.

Hacisalihzade, S. S., Stark, L. W., & Allen, J. S. (1992). Visual perception and sequences of eye movement fixations: A stochastic modeling approach. IEEE Transactions on Systems, Man & Cybernetics, 22, 474-481.

Hall, W. J., & Wellner, J. A. (1980). Confidence bands for a survival curve from censored data. Biometrika, 67, 133-143.

Harris, C. M., Hainline, L., Abramov, I., Lemerise, E., & et al. (1988). The distribution of fixation durations in infants and naive adults. Vision Research, 28, 419-432.

Heckerman, D. (1998). A tutorial on learning with Bayesian networks. In M. Jordan (Ed.), Learning in graphical models (pp. 301-354). Cambridge, MA: MIT Press.

Heller, D. (1982). Eye movements in reading. In R. Groner & P. Fraisse (Eds.), Cognition and eye movements (pp. 139-154). Amsterdam: North Holland.

Henderson, J. M., & Ferreira, F. (1993). Eye movement control during reading: Fixation measures reflect foveal but not parafoveal processing difficulty. Canadian Journal of Experimental Psychology, 47, 201-221.

Hogaboam, T. (1983). Reading patterns in eye movement data. In K. Rayner (Ed.), Eye movements in reading: Perceptual and language processes (pp. 309-332). New York: Academic Press.

Hollander, M., & Wolfe, D. A. (1999). Nonparametric statistical methods (2nd ed.). New York: Wiley.

Huey, E. B. (1908). The psychology and pedagogy of reading: with a review of the history of reading and writing and of methods, texts, and hygiene in reading. Cambridge, Mass.: MIT Press.

Inhoff, A. W., & Radach, R. (1998). Definition and computation of oculomotor measures in the study of cognitive processes. In G. Underwood (Ed.), Eye guidance in reading and scene perception (pp. 29-53). Oxford, England UK: Anonima Romana.

Irwin, D. E. (1998). Lexical processing during saccadic eye movements. Cognitive Psychology, 36, 1-27.

Javal, E. (1878). Essai sur la physiologie de la lecture. Ann. Oculist, 79, 97-117, 240-274.

Johnson, N. L., Kotz, S., & Balakrishnan, N. (1994). Continuous univariate distributions (2nd ed.). New York: Wiley & Sons.

Jordan, M., Ghahramani, Z., & Saul, L. K. (1997). Hidden Markov decision trees. In M. C. Mozer, M. I. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems (Vol. 9, pp. 501-507). Cambridge, MA: MIT Press.

Jordan, M., & Jacobs, R. A. (1994). Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6, 181-214.

Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K. (1998). An introduction to variational methods for graphical models. In M. Jordan (Ed.), Learning in graphical models (pp. 105-159). Cambridge, MA: MIT Press.

Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87, 329-354.

Kennison, S. M., & Clifton, C. (1995). Determinants of parafoveal preview benefit in high and low working memory capacity readers: Implications for eye movement control. Journal of Experimental Psychology: Learning, Memory, & Cognition, 21, 68-81.

Kerr, P. W. (1992). Eye movement control during reading: The selection of where to send the eyes. Unpublished Doctoral thesis, University of Illinois, Urbana-Champaign, IL.

Kingstone, A., & Klein, R. M. (1993). Visual offsets facilitate saccadic latency: Does predisengagement of visuospatial attention mediate this gap effect? Journal of Experimental Psychology: Human Perception & Performance, 19, 1251-1265.

Kliegl, R. M., Olson, R. K., & Davidson, B. J. (1982). Regression analyses as a tool for studying reading processes: Comment on Just and Carpenter's eye fixation theory. Memory & Cognition, 10, 287-296.

Legge, G. E., Klitz, T. S., & Tjan, B. S. (1997). Mr. Chips: An ideal-observer model of reading. Psychological Review, 104, 524-553.

Liversedge, S. P., Paterson, K. B., & Pickering, M. J. (1998). Eye movements and measures of reading time. In G. Underwood (Ed.), Eye guidance in reading and scene perception (pp. 55-75). Oxford, England UK: Anonima Romana.

Liversedge, S. P., & Underwood, G. (1998). Foveal processing load and landing position effects in reading. In G. Underwood (Ed.), Eye guidance in reading and scene perception (pp. 201-221). Oxford, England UK: Anonima Romana.

McConkie, G. W. (1981). Evaluating and reporting data quality in eye movement research. Behavior Research Methods & Instrumentation, 13, 97-106.

McConkie, G. W., & Dyre, B. P. (2000). Eye fixation durations in reading: Models of frequency distributions. In A. Kennedy, R. Radach, D. Heller, & J. Pynte (Eds.), Reading as a perceptual process. Amsterdam: Elsevier Science Ltd.

McConkie, G. W., Kerr, P. W., & Dyre, B. P. (1994). What are "normal" eye movements during reading: Toward a mathematical description. In J. Ygge & G. Lennerstrand (Eds.), Eye movements in reading. Tarrytown, NY: Pergamon.

McConkie, G. W., Kerr, P. W., Reddix, M. D., & Zola, D. (1988). Eye movement control during reading: I. The location of initial eye fixations on words. Vision Research, 28, 1107-1118.

McConkie, G. W., Kerr, P. W., Reddix, M. D., Zola, D., & et al. (1989). Eye movement control during reading: II. Frequency of refixating a word. Perception & Psychophysics, 46, 245-253.

McConkie, G. W., & Rayner, K. (1973). An on-line computer technique for studying reading: Identifying the perceptual span. In P. L. Nacke (Ed.), Diversity in mature reading: theory and research (Vol. 1, pp. 119-130): National Reading Conference, Inc.

McConkie, G. W., & Rayner, K. (1975). The span of the effective stimulus during a fixation in reading. Perception & Psychophysics, 17, 578-586.

McConkie, G. W., Reddix, M. D., & Zola, D. (1992). Perception and cognition in reading: Where is the meeting point. In K. Rayner (Ed.), Eye movements and visual cognition: Scene perception and reading (pp. 293-303). New York, NY: Springer.

McConkie, G. W., Zola, D., Grimes, J., Kerr, P. W., Bryant, N. R., & Wolff, P. M. (1991). Children's eye movements during reading. In J. F. Stein (Ed.), Vision and visual dyslexia (pp. 251-262). London: Macmillan Press.

McCullagh, P., & Nelder, J. A. (1983). Generalized linear models. London; New York: Chapman and Hall.

McLachlan, G. J., & Basford, K. E. (1988). Mixture models: inference and applications to clustering. New York, N.Y.: M. Dekker.

McLachlan, G. J., & Peel, D. (2000). Finite Mixture Models. NY: Wiley.

Miller, K., & Feng, G. (in prep.). Reading English and Chinese: A developmental eye-movement study.

Morrison, R. E. (1984). Manipulation of stimulus onset delay in reading: Evidence for parallel programming of saccades. Journal of Experimental Psychology: Human Perception & Performance, 10, 667-682.

Murphy, K. (2001). Bayes Net Toolbox for Matlab 5. Available: http://www.cs.berkeley.edu/~murphyk/Bayes/bnt.html.

Murray, W. S. (2000). Commentary on Section 4. Sentence processing: Issues and measures. In A. Kennedy & R. Radach (Eds.), Reading as a perceptual process (pp. 649-664). Amsterdam, Netherlands: North-Holland/Elsevier Science Publishers.

O'Regan, J. K. (1990). Eye-movements and reading. In E. Kowler (Ed.), Eye movements and their role in visual and cognitive processes (pp. 395-453). Amsterdam: Elsevier.

O'Regan, J. K., & Jacobs, A. M. (1992). Optimal viewing position effect in word recognition: A challenge to current theory. Journal of Experimental Psychology: Human Perception & Performance, 18, 185-197.

Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge.

Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77.

Radach, R., & McConkie, G. W. (1998). Determinants of fixation positions in words during reading. In G. Underwood (Ed.), Eye guidance in reading and scene perception (pp. 77-100). Oxford, England UK: Anonima Romana.

Rayner, K. (1986). Eye movements and the perceptual span in beginning and skilled readers. Journal of Experimental Child Psychology, 41, 211-236.

Rayner, K. (1995). Eye movements and cognitive processes in reading, visual search, and scene perception. In J. M. Findlay & R. Walker (Eds.), Eye movement research: Mechanisms, processes and applications. Studies in visual information processing, 6 (pp. 3-22). Amsterdam, Netherlands: Elsevier Science Publishing Co, Inc.

Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372-422.

Rayner, K., & McConkie, G. W. (1976). What guides a reader's eye movements? Vision Research, 16, 829-837.

Rayner, K., & Pollatsek, A. (1989). The psychology of reading. Englewood Cliffs, N.J.: Prentice Hall.

Rayner, K., Reichle, E. D., & Pollatsek, A. (1998). Eye movement control in reading: An overview and model. In G. Underwood (Ed.), Eye guidance in reading and scene perception (pp. 243-268). Oxford, England UK: Anonima Romana.

Reichle, E. D., Pollatsek, A., Fisher, D. L., & Rayner, K. (1998). Toward a model of eye movement control in reading. Psychological Review, 105, 125-157.

Reichle, E. D., Rayner, K., & Pollatsek, A. (1999). Eye movement control in reading: Accounting for initial fixation locations and refixations within the E-Z Reader model. Vision Research, 39, 4403-4411.

Reilly, R. (1993). A connectionist framework for modeling eye-movement control in reading. In G. d'Ydewalle & J. Van Rensbergen (Eds.), Perception and cognition: Advances in eye movement research. Studies in visual information processing (Vol. 4, pp. 193-212). Amsterdam, Netherlands: North-Holland/Elsevier Science Publishers.

Reilly, R. G., & O'Regan, J. K. (1998). Eye movement control during reading: A simulation of some word-targeting strategies. Vision Research, 38, 303-317.

Schilling, H. E. H., Rayner, K., & Chumbley, J. I. (1998). Comparing naming, lexical decision, and eye fixation times: Word frequency effects and individual differences. Memory & Cognition, 26, 1270-1281.

Shillcock, R., Ellison, T. M., & Monaghan, P. (2000). Eye-fixation behavior, lexical storage, and visual word recognition in a split processing model. Psychological Review, 107, US: American Psychological Assn.

Stark, L. (1994). Sequences of fixations and saccades in reading. In J. Ygge & G. Lennerstrand (Eds.), Eye Movements in Reading (pp. 135-161). Tarrytown, NY: Pergamon.

Stark, L., & Ellis, S. (1981). Scanpaths revisited: cognitive models direct active looking. In R. A. Monty & J. W. Senders (Eds.), Eye movements, cognition and visual perception (pp. 193-226). Hillsdale, NJ: Erlbaum.

Suppes, P. (1990). Eye-movement models for arithmetic and reading performance. In E. Kowler (Ed.), Eye movements and their role in visual and cognitive processes (Vol. 4, pp. 455-477). Amsterdam: Elsevier.

Suppes, P. (1994). Stochastic models of reading. In J. Ygge & G. Lennerstrand (Eds.), Eye movements in reading (pp. 349-364). Oxford, England: Pergamon Press.

Suppes, P., & et al. (1983). A procedural theory of eye movements in doing arithmetic. Journal of Mathematical Psychology, 27, 341-369.

Taylor, S. E. (1965). Eye movements in reading: Facts and fallacies. American Educational Research Journal, 2, 187-202.

Thibadeau, R. (1983). CAPS: A language for modeling highly skilled knowledge-intensive behavior. Behavior Research Methods, Instruments, & Computers, 15, 300-304.

Thibadeau, R., Just, M. A., & Carpenter, P. A. (1982). A model of the time course and content of human reading. Cognitive Science, 6, 101-155.

Titterington, D. M., Smith, A. F. M., & Makov, U. E. (1985). Statistical analysis of finite mixture distributions. Chichester; New York: Wiley.

van Gisbergen, J. A. M., Gielen, S., Cox, H., Brujins, J., & Schaars, K. H. (1981). Relation between metrics of saccades and stimulus trajectory in visual target tracking: implications for models of the saccadic system. In A. F. Fuchs & W. Becker (Eds.), Progress in oculomotor research. North Holland: Elsevier.

Vitu, F., & McConkie, G. W. (2000). Regressive saccades and word perception in adult reading. In A. Kennedy & R. Radach (Eds.), Reading as a perceptual process (pp. 301-326). Amsterdam, Netherlands: North-Holland/Elsevier Science Publishers.

Vitu, F., McConkie, G. W., & Zola, D. (1998). About regressive saccades in reading and their relation to word identification. In G. Underwood (Ed.), Eye guidance in reading and scene perception (pp. 101-124). Oxford, England UK: Anonima Romana.

Wagner, R. A., & Fischer, M. J. (1974). The string-to-string correction problem. Journal of Association of Computing Machinery, 21, 168-173.

Walker, R., Kentridge, R. W., & Findlay, J. M. (1995). Independent contributions of the orienting of attention, fixation offset and bilateral stimulation on human saccadic latencies. Experimental Brain Research, 103, 294-310.

Wolfe, J. H. (1971). A Monte Carlo study of sampling distribution of the likelihood ratio for mixtures of multinormal distributions (Technical Bulletin STB 72-2). San Diego, CA: U.S. Naval Personnel and Training Research Laboratory.

Yang, S.-N., & McConkie, G. W. (in press). Eye movements during reading: A theory of saccade initiation times.

Zangemeister, W. H., Sherman, K., & Stark, L. (1995). Evidence for a global scanpath strategy in viewing abstract compared with realistic images. Neuropsychologia, 33, 1009-1025.

CURRICULUM VITAE

Biographical Information

Name: Gang Feng
Date of Birth: March 16, 1968
Place of Birth: Beijing, China

Education

2001  Ph.D.  University of Illinois at Urbana-Champaign, Department of Psychology (Major area: Developmental Psychology; Minor area: Quantitative Psychology)
1999  M.S.  University of Illinois at Urbana-Champaign, Department of Statistics
1998  M.A.  University of Illinois at Urbana-Champaign, Department of Psychology
1990  B. Edu.  Beijing Normal University, Beijing, China, Department of Psychology

Awards and Honors

1999-2000  Beckman Institute Graduate Fellow
1999  Cognitive Science/AI Summer Fellowship, UIUC
1990  Honor Graduate, Beijing Normal University
1986-1990  Government fellowships, Beijing Normal University

Research Experience

1999-2000  Beckman Institute Graduate Fellow, Beckman Institute, UIUC
Summer, 1999  CogSci/AI Steering Committee Summer Fellowship, UIUC
Summer, 1998  Data Analyst, Center for Reading Research, UIUC
1994-2000  Research Assistant, Beckman Institute, UIUC
1990-1994  Assistant Researcher, Institute of Psychology, Chinese Academy of Sciences

Teaching Experience

1998-1999  Teaching Assistant, Child Psychology
1996-1997  Teaching Assistant, Research methods in developmental psychology

Publications

Feng, G., Miller, K. F., Zhang, H., & Shu, H. (2001). Towed to recovery: the use of phonological and orthographic information in reading Chinese and English. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 1079-1100.

Kelly, M., Miller, K., Fang, G., & Feng, G. (1999). When Days Are Numbered: Calendar Structure and the Development of Calendar Processing in English and Chinese. Journal of Experimental Child Psychology, 73, 289-314.

Feng, G. (1998). Homophone confusion in reading English and Chinese. Unpublished master’s thesis, University of Illinois at Urbana-Champaign.

Fang, G., Fang, F., & Feng, G. (1995). A comparative study of elementary school students’ mathematics achievement and motivations. Chinese University of Hong Kong Elementary Education, 2, 51-56.

Fang, G., Feng, G., Fang, F., & Jiang, T. (1994). Preschoolers' estimation of time duration and their cognitive strategies. Psychological Science (China), 17, 3-9.

Fang, G., Feng, G., Jiang, T., & Fang, F. (1993). Time duration estimated by preschoolers and their strategies. Acta Psychologica Sinica, 25, 346-352.
