Models for Behavior: Stochastic Processes in Psychology

Thomas D. Wickens
University of California, Los Angeles

W. H. Freeman and Company
San Francisco

A Series of Books in Psychology
Editors: Richard C. Atkinson, Jonathan Freedman, Gardner Lindzey, Richard F. Thompson

To Carl and Wick

Project Editor: Pearl C. Vapnek
Designer: Sharon Helen Smith
Production Coordinator: Linda Jupiter
Illustration Coordinator: Richard Quinones
Compositor: Bi-Comp, Inc.
Printer and Binder: The Maple-Vail Book Manufacturing Group

Library of Congress Cataloging in Publication Data
Wickens, Thomas D.
Models for behavior.
(A Series of books in psychology)
Bibliography: p.
Includes index.
1. Psychology--Mathematical models. 2. Markov processes. I. Title. II. Series.
ISBN 0-7167-1352-7
ISBN 0-7167-1353-5 (pbk.)

Copyright © 1982 by W. H. Freeman and Company

No part of this book may be reproduced by any mechanical, photographic, or electronic process, or in the form of a phonograph recording, nor may it be stored in a retrieval system, transmitted, or otherwise copied for public or private use, without written permission from the publisher.

Printed in the United States of America

1234567890 MP 0898765432

Contents

Glossary of Symbols
Preface

1  Theories and Markov Models  1
1.1  An Example: The All-or-None Model  1
1.2  Theories and Models  6
1.3  Markov Models  9
     The State Space / Transitions Between States / The Markov Property
1.4  Overview of This Book  16

2  Representation of Psychological Processes as Markov Models  18
2.1  Markov Models  19
2.2  Simple Models of Learning  24
     The All-or-None Association Model / All-or-None Learning on Errors / The Linear Model / The Random-Trials-Increment Model
2.3
Two-Stage Learning Models  32
     An Intermediate State / The Long-and-Short Model / Multiple Transition Operators
2.4  Choice Models  39
     Single Choices / Sequences of Choices / Probability Learning
2.5  Random Walks  46

3  Algebraic Analysis of Finite Markov Chains  51
3.1  State Probabilities  52
3.2  Response Probabilities  58
3.3  The Number of Errors  60
3.4  Sequential Response Probabilities  65
3.5  Item Variability  68

4  Matrix Methods  77
4.1  Representation and State Probabilities  78
4.2  Transitions Among Sets of States  82
     State Sets as Responses / Likelihoods of Response Strings / The Fundamental Matrix
4.3  Distributions of Some Summary Statistics  92
     The Number of Errors / The Trial of Last Error / Latency of Responses
4.4  Selection of Items Taking Particular Paths  98
     Likelihoods Again / The Backward Learning Curve / The Backward Latency Curve

5  Estimation of Parameters  108
5.1  The Estimation Problem  109
5.2  Method-of-Moments Estimators
5.3  Maximum-Likelihood Estimators
5.4  Minimum Chi-Square Estimators
5.5  Finding Numerical Extrema  126

6  Statistical Testing
6.1  Goodness-of-Fit Tests  137
6.2  Tests of Parameters and Comparison of Models  141
6.3  Examples of Likelihood-Ratio Testing  145

7  Identification of Models and Parameters  151
7.1  Unidentifiable Parameters and Equivalent Models  152
7.2
Vector-Space Representation  158
7.4  Treatment of Identifiability Problems
     Equivalence Classes of Models / Parametric Restrictions / Analysis in a Broader Domain

8  Markov Chains Without Absorbing States  171
8.1  Classification of Markov States and Chains  172
8.2  State Probabilities  176
     Transient States / Ergodic States and Chains / Periodic States
8.3  Renewal Properties of a Finite Markov Chain  185
     Recurrence Probabilities / Recurrence of Transient States / Expected Return Times / Matrix Formulation

9  Markov Chains with Unbounded State Spaces  199
9.1  Random Walks  200
     State Probabilities / Asymptotic Behavior / State Types and Recurrence Probabilities
9.2  A Queuing Example  206
9.3  Learning Models  208
9.4  Generating Functions  211
     Properties of Generating Functions / Difference Equations and State Probabilities / The RTI Model / Recurrence Probabilities

10  Continuous-Time Markov Processes  226
10.1  The Poisson Process  227
      Definition of a Poisson Process / The State Distribution / Waiting Times / Erlang Processes
10.2  Birth Processes with State-Dependent Rates  238
      Yule Processes / Death Processes
10.3  Processes with Arrivals and Departures  245
      A Simple Queue / State-Proportional Rates / Multiple-Step Transitions

A  Summary of Probability Theory  254
A.1  Fundamental Definitions
A.2  Random Variables  256
     Expected Values
A.3  Some Discrete Distributions  261
     The Geometric Distribution / Sums of Geometric Series / The Binomial Distribution / The Negative Binomial Distribution / The Poisson Distribution
A.4  Some Continuous Distributions  271
     Distributions from Statistics: The Chi-Square and the Normal / The Exponential Distribution / The Gamma Distribution / The Beta Distribution

B  Difference Equations  280
B.1  Homogeneous Difference Equations  282
B.2  Nonhomogeneous Difference Equations  285
B.3  Simultaneous Difference Equations  288

C  Introduction to Linear Algebra  291
C.1  Basic Definitions  292
C.2  Fundamental Operations  296
C.3  Matrix Inversion  299
C.4  Partitioned Matrices  303
C.5  Vector Spaces, Bases, and Transformations  304
C.6  Eigenvalues and Eigenvectors  308

D  Some Concepts
from Calculus  314
D.1  The Derivative  318
     Finding Extrema of Functions / Finding Derivatives
D.2  Integration  321
     Integrals as Weighted Combinations / The Relation of the Integral to the Derivative / Calculation of Integrals
D.3  Differential Equations  327

The Greek Alphabet  329
Chi-Square and Normal Distributions  330
Solutions to Problems  331
References
Index  347

Glossary of Symbols

This glossary defines some quantities and conventions that are used throughout the book. Page numbers indicate a definition; for further citation, refer to the index.

a, b, c:  Row vectors, 292
A, B, C:  Matrices, 293
Subvector of absorption probabilities, 85
Parameters, 26
Parameter estimates, 108
Probability of no errors following a response, 63, 90
B(x, y):  Beta function, 277
Correct responses, errors, 27
X²:  Chi-square test statistic, 124, 136
Small interval of time, 229
The event of an error on trial t, 58
Probability of first return to S_j on the nth trial, 187
Probability of eventual return to S_j, 191
Γ(x):  Gamma function, 275
L(θ):  Likelihood function, 117
Trial of last error, 75
Latency on trial t, 97
λ:  Eigenvalue, 308
Fundamental matrix, (I - Q)^-1
o(x):  Any function that goes to 0 faster than its argument, 281
1:  Column vector of 1's, 90
θ:  Vector of parameters, 109
θ̂:  Vector of parameter estimates, 109
Space of permissible parameter values, 109
Parameter estimates and parameter space for the restricted model, 143
Unrestricted and restricted models, 142
The series p0, p1, ...,
212
n-step transition probability, P(S_j at trial t + n given S_i at trial t), 187
Response probability vector on trial t, 79
Generating function for {p_t}, 212
Q:  Submatrix of transient states, 85
Responses (in general), 21
Response mapping and matrix, 21, 79
S:  States (in general), 13, 21
The event that a process is in state S_j at discrete trial t, 13, 22
The event that a process is in state S_j at continuous time t, 228
State-probability vector at time t, 78
Asymptotic state-probability distribution, 179
T:  Total number of errors, 61
Transition mapping and matrix, 21, 78
n-step transition matrix, 81
Interval from time t (inclusive) to t + h (exclusive), 229
W:  Response-selection matrix, 99

Preface

This book describes a variety of stochastic processes that can be used to construct models of psychological phenomena. It emphasizes the processes themselves and the mathematical methods used to analyze them. Thus, a reader may discover here something about how to represent a psychological theory as a Markov process, and, in more detail, how to make a particular calculation or to derive a statistic, but will find less about the psychological theories that make the models important.

Why this particular orientation? Fundamentally, I believe that there is an important, although often underdeveloped, role for quantitative models in psychology. Verbal statements are often hard to work with and are not readily subject to exact tests. Mathematical models are a valuable way to provide the more exact representation necessary for testing. Even when a psychologist's goal is not the construction of a general theory, a simple model process summarizes data better than do average performance statistics. As the field of psychology develops more rigor and exactness, I believe that quantitative models will play an increasingly central role.

However, these quantitative models cannot be written unless the relevant mathematical methods are familiar. This requires a good sourcebook. Over the years I have encountered a number of students who recognized the value of a mathematical model for their work, but who lacked the tools to create it. Unfortunately, there is no good place to send these students to learn the techniques. Research articles, quite appropriately, are devoted to the substance of the model they are presenting. Textbooks in mathematical psychology are better, but by necessity must choose between limiting coverage and presupposing too much mathematical skill. Mathematical sources, although often excellent, are not oriented to the psychologist reader and are frequently too advanced. A book for psychologists that treats the mathematical methods at an intermediate level of difficulty is needed.

Although this book contains mathematical techniques for psychologists, it is not a survey of mathematical psychology. I have made no attempt to review the field nor to do much more than present models because they are good examples of a particular stochastic process. There are two reasons for this. The first, and most simple, is space: to include the methods, their motivation within psychology, and a review of their uses would make a single book both difficult to use and expensive. At a deeper level, I have always felt that "mathematical psychology" is an awkward designation for a field. Content better defines an area of psychology than does methodology.
Mathematical methods may be useful to the study of human learning, of perception, or of clinical psychology, but the value of these models lies in their relationship to the complete field. The particular places where mathematical methods have proved valuable should not be the central focus of a survey, but should form part of a general review anchored in the content area. A book on modeling techniques is not the place for this.

Thus the present book. I hope that it presents a sufficiently wide range of techniques to give useful ideas, while remaining simple and complete enough so that a reader without extensive mathematical background can use it. Although the material is not of uniform difficulty (some of the harder sections are indicated by a vertical rule in the margin), a student with a good background in intermediate psychological statistics should be able to follow most of it. Of course, no one becomes proficient with mathematical modeling without having worked to apply the models in practice; so some things will probably remain obscure after a first or second reading. Working the problems provided here should help, but the reader must not become discouraged. I hope this book will also provide a reference for someone who is trying to read a technical article on some quantitative model, and perhaps help that person to develop and use a new model.

In writing this book, I have received much help and assistance. Many friends have discussed various points with me or have read chapters. In particular, Jill Larkin helped me to formulate a number of my ideas as I was starting to write, Eric Holman served as a source or sounding board for many of my thoughts, John Cotton taught a class from an early draft and returned many comments on it, and Richard Millward submitted that draft to a careful, critical, but encouraging reading.
I also thank Geoffrey Keppel and Florence Wong at the Institute of Human Learning of the University of California, Berkeley, for providing space at the Institute for me both during and after a sabbatical leave. Finally, many students, my own and others', at the University of California in Berkeley, Los Angeles, and Santa Barbara, have struggled with and commented on various drafts of the text. To all of these people, and to others who have encouraged me, my thanks. Without any of them, the book would be less than it is; where it falls short of what it might be, it reflects myself alone.

October 1981
Thomas D. Wickens

Chapter 1

Theories and Markov Models

Psychological theories can be expressed in many different languages. Conventional verbal statements best explain certain theories, while others gain from more formal representation. Some theories benefit from a physiological expression, and others are most clearly revealed as a computer program. Another possibility is a mathematical representation. This can be valuable not only because mathematical expressions give a clear and unambiguous formulation of the theory, but also because the mathematics allow the implications of the theory to be developed in a rigorous manner. If the theory describes changes in the state of the subject, it is often profitable to express it in the language of probability theory.
This book concerns representations of this type, and particularly the most useful of the probabilistic processes, the Markov process.

1.1 An Example: The All-or-None Model

A very simple all-or-none model of learning is presented in this section. It serves to introduce the principles by which psychological theories are written as probabilistic processes. Although this model is the simplest nontrivial model of learning, it embodies most of the important ideas used in more complex models. It will be used extensively as an example throughout this book.

Consider a subject learning a list of paired-associate items, constructed by pairing some members of a large set of potential stimuli (e.g., nonsense syllables) with responses drawn, with replacement, from a small set of alternatives (the digits 1 through 5, say). The subject's goal is to learn what response goes with each stimulus, so that, when presented with the stimulus, the appropriate response can be produced. On each trial the stimulus member of a pair is shown to the subject, who answers by choosing one of the responses. Following this response, the correct pair appears, providing both feedback and a chance to study the pair. This completes a trial. On the next trial another pair, possibly different from the first, is presented. The experiment continues in this way until all pairs are learned or until the time available for the experiment is exhausted. The data that are collected include the sequence of stimuli that are presented, the subject's responses, and possibly such additional information as latencies or confidence ratings.

The theorist's job is to construct an explanation, or model, for the subject's behavior in the learning task. Suppose that this model is approached from an association point of view. Consider learning as the formation of associations, and suppose that these associations
are quantum linkages, which are either there in full or completely absent. The model should describe how associations between the stimulus member of a pair and the response are formed. Obviously, the development of a complete, general theory of association learning is a major task and goes well beyond the data of any single experiment. There is no need for such a complete theory in order to construct the model here. A simplified picture is enough. The model must embody the basic idea that the learning of the item is the formation of an association, but the model can do this as part of a relatively restricted picture. The complications of the general theory of associative learning are ignored, while its essential aspects are preserved.

To create this model, suppose that three simplifications are made. First, the learning of each pair, or item, is treated as being independent of the learning of the other pairs. This independence lets each item be modeled in isolation. From the complete sequence of trials, those trials on which a given item is presented are extracted. The model describes what happens on these trials only, and another item would be represented by a separate application of the model. Second, each item is represented by a single quantum association. At any time, the stimulus member of a pair is associated with at most one response. This association has an all-or-none character, so that it is either completely present or completely absent. A pair is known or not, without intermediate possibility. Finally, responses other than the correct one are lumped together, so that each response is classified as either correct or in error. No attempt is made to describe which incorrect response is made, only that it is not correct.

With these simplifications, a model of association learning, known as the all-or-none model, can be written. This model is formalized as a series of very specific assumptions or axioms:

Assumption 1. The subject's knowledge about an item is represented by one of two states: either nothing whatsoever is known about the item and it is in the guessing state, or it is completely learned and is in the learned state.

Assumption 2. Initially, all items are in the guessing state.

Assumption 3. a. An item can change from the guessing state to the learned state whenever feedback is given. The probability of this transition is constant, depending neither on the trial number nor on the past history of the pair.
b. An item in the learned state stays there indefinitely.

Assumption 4. a. When presented with the stimulus member of an item that is in the guessing state, the subject responds by choosing randomly among the full set of potential response alternatives.
b. When presented with an item in the learned state, a correct response is always made.

Each of these four assumptions concerns one part of the subject's behavior. The first describes what the subject can know about a pair, that is, how the subject's knowledge is represented.
The second indicates the subject's state when the experiment begins. The third describes how learning occurs, that is, how the state changes. The final assumption relates the subject's internal state of knowledge to an overt response.

It is not enough for the model to describe a psychological process; it must do so with sufficient completeness to tie it to data. In fact, these four assumptions not only describe a specific mathematical process, but make precise predictions about how paired-associate data should look. Many of these predictions are treated in detail in Chapters 3 and 4. Although the mathematical derivation of these predictions is an essential part of working with the model, some of the predictions can be anticipated less formally. Consider the way in which errors on successive presentations are related. The fourth assumption states that errors are made only when the subject is in the guessing state. Thus, if an error is observed on the tth presentation of an item, that item cannot yet have been learned. There is only one guessing state, and this must be the state of the item regardless of whether the error occurs on the first, the tenth, or the fiftieth presentation. The information gained by observing this error can be applied to predict what happens on the next presentation. Because the item is known to be in the guessing state, the only way for the next presentation to also be an error is for the subject to fail to learn from the feedback on presentation t, then to make a wrong guess. In terms of probabilities,

P(error on presentation t + 1 given an error on presentation t)
    = P(error given the item is in the guessing state)
    × P(item not learned after feedback)                        (1.1)

None of these probabilities depends on t, so the conditional error probability in Equation 1.1 is a constant.
The constancy of this quantity over presentations is a strong prediction of the all-or-none model and can be tested by examining data from a paired-associate experiment.

At this point, or perhaps before, some objections to the model come to mind. The four assumptions are surely not altogether realistic. Many things may be wrong. Some objections center on the underlying theory. Associations may not be quantum, or at least associations of intermediate strength may occur, and so the all-or-none nature of Assumptions 1 and 4a is wrong. Other objections involve the way that the associations are applied. Even if associations are only there or not, it may still be impossible for a single association to completely represent the connection between stimulus and response (Assumption 1 again). Furthermore, the items probably are not independent of each other (Assumption 3a). A third group of objections concerns the description of the task. If easy items are learned first, then the pairs that remain after a number of trials are more difficult and harder to learn (a violation of Assumption 3a). Numerous other objections can be raised. Almost certainly, some of them are correct.

The goal in raising these objections is not to prove the model wrong. To do so is not very exciting. If one wanted only to disprove the model, one could easily reject Assumption 3b (no forgetting) by retesting the subject a few weeks after learning. Undoubtedly, most of the items would no longer be remembered. But this would be a trivial rejection. The value of the model to psychological theorizing comes in other ways.
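The four assumptions, together with the constancy prediction of Equation 1.1, are concrete enough to check by simulation. The sketch below is our own illustration, not the book's; the parameter names c (the transition probability of Assumption 3a) and g (the probability of a correct guess under Assumption 4a) are labels we have introduced.

```python
import random

def simulate_item(c=0.3, g=0.2, n_trials=25, rng=random):
    """Simulate one paired-associate item under the all-or-none model.

    c: probability of moving from the guessing state to the learned
       state when feedback is given (Assumption 3a).
    g: probability of a correct guess while in the guessing state
       (Assumption 4a; 1/5 with five response alternatives).
    Returns a list of 0/1 error indicators, one per presentation.
    """
    learned = False              # Assumption 2: start in the guessing state
    errors = []
    for _ in range(n_trials):
        if learned:
            errors.append(0)     # Assumption 4b: always correct once learned
        else:
            errors.append(0 if rng.random() < g else 1)
            if rng.random() < c:     # Assumption 3a: learn from feedback
                learned = True       # Assumption 3b: and stay learned
    return errors

def conditional_error_prob(items, t):
    """Estimate P(error on t + 1 | error on t) from 0/1 error sequences."""
    on_t = [e for e in items if e[t] == 1]
    return sum(e[t + 1] for e in on_t) / len(on_t) if on_t else float('nan')

rng = random.Random(1)
items = [simulate_item(rng=rng) for _ in range(20000)]
# Equation 1.1 predicts the same value on every trial:
# (1 - g)(1 - c) = 0.8 * 0.7 = 0.56; each estimate should be near that.
print([round(conditional_error_prob(items, t), 2) for t in (0, 2, 4)])
```

With the learning probability constant, the estimates hover near 0.56 at every presentation; it is exactly this flatness that the violations of Assumption 3 destroy.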
The extent to which the model is correct gives information about which assumptions are reasonable; and even when the model is wrong, the way in which it fails tells how the theory might be changed.

Consider the two objections to Assumption 3: the dependence of items and the learning of easy items first. Both imply that the probability of an error on presentation t + 1 given an error on presentation t is not a constant. However, they have opposite implications about the way in which this probability changes. Suppose that the items are not independent because one unlearned item tends to interfere with another. Then the items still unlearned at the end of the session would appear easier because most of the other items had been learned and could no longer interfere. This affects the second term of the product in Equation 1.1. The probability of being in the learned state on presentation t + 1 given the guessing state on presentation t would go up with t, and accordingly the conditional probability of an error would decrease. Alternatively, if easy items are learned at the beginning and harder items remain at the end, then the first term in Equation 1.1 is changed and the probability of an error given an error increases.

[Figure 1.1 The effect of two different violations of Assumption 3 on the conditional error probability in the all-or-none model.]
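Both failure modes can be mimicked in a small simulation (our own sketch, not the book's; the numerical settings are invented). Interference between items is represented by a learning probability that rises with the trial number, and unequal difficulty by a mixture of easy and hard items:

```python
import random

def cond_err(items, t):
    """Estimate P(error on t + 1 | error on t) from 0/1 error sequences."""
    on_t = [e for e in items if e[t] == 1]
    return sum(e[t + 1] for e in on_t) / len(on_t) if on_t else float('nan')

def item(c_of_t, g, n_trials, rng):
    """One all-or-none item; c_of_t(t) is the learning probability on trial t."""
    learned, errs = False, []
    for t in range(n_trials):
        errs.append(0 if learned or rng.random() < g else 1)
        if not learned and rng.random() < c_of_t(t):
            learned = True
    return errs

rng = random.Random(7)
N, T, g = 30000, 12, 0.2

# Violation of independence: as the other items are learned they interfere
# less, so the learning probability rises with the trial number and the
# conditional error probability falls.
interference = [item(lambda t: 0.1 + 0.05 * t, g, T, rng) for _ in range(N)]

# Violation of equal difficulty: easy items (high c) are learned early, so
# hard items dominate later trials and the conditional error probability rises.
mixed = [item(lambda t, c=rng.choice([0.05, 0.6]): c, g, T, rng)
         for _ in range(N)]

print([round(cond_err(interference, t), 2) for t in (0, 8)])  # decreasing
print([round(cond_err(mixed, t), 2) for t in (0, 8)])         # increasing
```

The two conditional error probabilities move in opposite directions with t, which is the diagnostic difference between the two violations.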
These implications are summarized in Figure 1.1. The lines representing the two violations are quite different. A look at data would help to discriminate between them. Thus, even if the original model fails to predict the conditional error probability correctly, more can be said about the process. This information would be difficult to obtain without the explicit analysis provided by the model.

If the model is observed to fail in one of the ways shown in Figure 1.1, it can be modified to take the new processes into consideration. This produces a more complex model, but one that may fit the data better and lead to further psychological insights. Modifications of these sorts are considered later in this book. However, changing the model is not without hazard. Much of the usefulness of the model lies in the fact that it is simple and easy to work with rather than in its generality. It is easier to add complexity to a model than to see how the processes can be represented simply. Parsimony in the description, while not always a virtue in itself, is important to keep in mind. If the all-or-none model were able to explain 70% of the behavior in an experiment, it would be of far more use than a model that explained 75% of the behavior but was twice as complex.

1.2 Theories and Models

The principles exemplified by the all-or-none model in the preceding section apply to many models in psychology and to all that are discussed in this book. For this reason, it is worth recapitulating more abstractly what is involved in constructing a model. Most research in experimental psychology has as its ultimate goal the construction of a general theory about psychological processes. These general theories are much larger than can be encompassed by any single experiment or even by a series of them. For example, the all-or-none model is based on association theory.
One may believe that the mind acts according to principles of a network of associations; yet no single experiment is going to establish or disprove this belief. However, some evidence is provided by how well the all-or-none model fares.

The development of a general theory is a slow process, which proceeds more often by accretion than by sudden decisive changes. Even in the rare cases where a new theoretical position is proposed, it is based on, often as a reaction to, a body of older theory that has been laboriously worked out. One experiment supports a theory, another suggests how the theory might be changed, and a third extends it to a new domain. Each step is tested by comparing the predictions of the theory (or better, predictions from several competing theories) to data. As a result of the comparison, the standing of one theory is advanced, while another is found in need of modification.

A serious problem in constructing a theory is finding a proper balance between generality and specificity. A truly general theory must say something about a wide range of situations and individuals. But it is nearly impossible for the theory to be this generally applicable, yet specific enough to make unambiguous predictions. A completely specified general theory would contain so much detail that it would be hopeless to try to clarify all of it at once. The breadth of the theory, which is one of its most important aspects, is incompatible with great specificity. At some point, a unified theory that contains too much detail founders in a mass of special cases and inconsistencies. Anyone
‘equally ‘sich as than the rartcul rodicte rade if The make e is impo: the gene road, ¢ clude go of the cexperim awe deri purely a shows fy Tm this ‘The imp bat mak ‘modell bt this ‘Amo describe akorno sion of | satisfact ng more earch in tion of a theories iment or is based ording to iment is idence is another dsittoa ns of the to data vanced, oper bal- ory must But itis able, yet mpletely vould be theory, ith great 20) much ‘Anyone “ “| 12 Theorles and Models 7 who has examined any theoretical controversy in psychofogy has ob- setved the dificultes involved in pinning down exactly what a gener theory does and does not predict "A complementary dificulty concerns making predictions with sufi- cient precision, Ifa theory only weakly specifies details, its predictions fe vague and dificult to test exactly. Its often unclear what supports ‘theory and what does not, This vagueness frequently appears in €x- Feriments designed to test theories by looking atthe mean scores for ‘several conditions. Suppose that fora two-group experiment, a theory rredicts that group has larger mean than group 8 Statistically, this theoretical prediction is texted as the hypothesis yi, = j», with rejec- tion of equality in the appropriate direction taken xs support for the theory. Yet surely not all values of jg and for Which 4 > su are ‘equally in accord with the theory. The results of an experiment may be ftch as fo reject sg = in, but be even less consistent with the theory than the rejected hypothesis. The evidence from such an experiment is particulary thin when a is {00 often the case, the event js = jay 8 ot ‘predicted by any interesting or plausible theory. A better test can be nade ifthe theory predicts more exact values of 14 and Hs, "The two needs are in confit. On the one hand, deta is necessary to make exact, testable, and useful predictions; on the other, this detail is impossible to achieve with consistency. One solution is to separate the generality fom the detail. 
There is a general theory that remains broad, and there are more precise representations of its parts that include greater levels of detail. Limiting the scope of the specific forms of the theory lets the exact predictions that are necessary for a good experimental test be made. At the same time, since the limited theories are derived from the more general one, they are saved from being purely ad hoc. The global theory ties the narrower ones together and shows how their implications link.

In this book the term model is used to denote these limited theories. The important characteristic of a model is that it is concrete enough and clear enough so that its properties are well defined and its predictions are exact. Hence, it is subject to exact testing. A model is smaller and more restricted in scope than the general theory from which it derives, but makes up for this in precision. Of course, the restricted nature of a model limits its applicability to the situation for which it was defined, but this is a fair price for its exactness.

A model is usually tied not only to a particular psychological phenomenon but also to a particular experimental task. It is intended to describe behavior in one specific situation, not in others. Thus, the all-or-none model presented above applies only to a rather simple version of paired-associate learning, not to other tasks. Yet, if it were satisfactory, it would lend support to the associationist theory. The link to other situations comes through the general theory, which generates other, related models for other tasks. If a model proves satisfactory for a given task, it provides support for the antecedent general theory.
On the other hand, if a model is unsatisfactory because data disagree with its predictions, then doubt is cast on the original theory. However, the general theory is not disproved thereby, for one can always question the way that the theory is represented in the model. No general theory is so narrow that it falls because a model that it suggests is wrong. The general theory is supported or rejected only as a collection of models derived from it are predominantly supported or rejected. The larger the collection of satisfactory models that arise from a theory, the more support the theory receives. If such a collection of models cannot be found, or if they do not form a consistent pattern from situation to situation, the theory is weakened and may need to be abandoned.†

Because of its precision, one must interpret the rejection of a model with caution. Models make exact predictions, so it seems most natural to test them as null hypotheses against less definite alternatives. In such a situation, rejecting the model is the most definitive conclusion that can be made. However, in order to derive a tractable model from a general theory, it is always necessary to simplify some of the assumptions; so the model is always wrong in some details. With a sufficiently large group of subjects, it is always possible to reject a particular model. Even if a model fit one set of data exactly, another set could be found where it fails.

Once a distinction is made between general theories and models, the failure of a model is not a problem. The value of the model is as the expression of a theory, not as a complete description of behavior. The model only represents the theory in a particular situation and to a particular degree of accuracy, so the fact that there are situations where it fails is a commonplace.
Hence, the simple statistical testing of a model at a conventional significance level is of little interest in itself. A more useful approach is to look at the way in which a model fits or fails to fit and to compare one model to other models derived from different general theories.

The logic of these tests is quite similar to that of conventional statistical testing. In conventional tests, the data are also represented by models from which consequences are derived and hypotheses tested. For example, consider a two-group t-test. For this test, a model of independent, normally distributed observations is adopted. However, this model is not directly tested; it would be no surprise if the normal-distribution assumption could be found to fail in some particular, such as because it says that the potential range of scores extends infinitely in either direction. Rather, two alternative models are considered: one in which the group means are equal, the other in which they differ. Comparison of these models is tested by the t-statistic.

†A good example of the use of probabilistic models in the service of a more general theory is the series of models developed by Greeno, James, DaPolito, and Polson.
In spite of this similarity, the conventional statistical models differ from the models in this book in that they are models of data and not of psychological process. The statistical models describe what the data look like, but not how they came about. The methods presented below attempt a further step, interpreting the data through a description of the mechanisms that underlie them. This more elaborate basis gives the models greater power to test psychological theory.

The process-oriented nature of these models makes them useful for data analysis, even when tests of the process itself are not of interest. The quantities on which the model depends may reflect characteristics of the behavior under study with more accuracy than does any overt response. One of the clearest examples of this is the use of signal detection theory (e.g., Green and Swets, 1966; McNicol, 1971) to separate the sensitivity of an observer to detect a stimulus from the bias of the observer to respond by saying "yes" or "no." A derived sensitivity measure (known as d′) better measures the ability to detect the stimulus than do more overt measures such as the probability of a correct response. In the context of learning models, the probability with which items pass from the guessing state to the learned state in the all-or-none model is a good measure of learning rate, while such quantities as the error probability are contaminated by guessing. It is better to base a comparison of whether two groups differ in learning rate on this transition probability, particularly when the groups also differ in their rate of guessing. Such a test has value, even when the details of the all-or-none model are not correct.

1.3 Markov Models

This book discusses models that are formulated as mathematical entities known as stochastic processes. These are processes based on probabilistic descriptions (stochastic means "governed by the laws of probability").
Not all stochastic processes are treated, only what are known as denumerable-state Markov processes. The denumerable-state Markov processes do not exhaust the mathematical forms that have been used for psychological models, but they encompass a large proportion of them.

As their name indicates, the denumerable-state Markov processes are characterized by three properties:

1. Denumerable states. The state of the process at any time is specified by one of a discrete set of alternatives, that is, by a member of a denumerable set of states.

2. Probabilistic transitions. The way in which the state changes is described by a probabilistic mechanism.

3. The Markov property. The state of the process at time t > T depends only on the state at time T. In particular, how the process gets to a particular state is not important; all information about the past is embodied in the current state.

These three assumptions are considered in more detail in the remainder of this section. A formal definition of a Markov model appears in Chapter 2.

The State Space

In the all-or-none paired-associate model, the state of each item is represented by one of two possibilities, the guessing state or the learned state, and the change of state from guessing to learned defines learning. This is one example of the representation common to all the models considered in this book. With respect to the process being modeled, the subject is characterized as being in exactly one of a set of states at any point of time. This set of states is called the state space of the model. Changes, such as learning, are represented by transitions from one state to another. States are usually denoted by letters, most abstractly S_1, S_2, S_3, and so forth, but more mnemonically where possible. The two states of the all-or-none model are indicated by G and L, for example. A large part of the work in defining a useful model lies in finding an appropriate state space.
The trick is to select a sufficiently large state space to allow interesting properties to appear, but not one that is so complex as to be unworkable. Small state spaces often make a good model, and it is rather surprising to see the complexity of behavior that is predicted by very simple processes. However, large state spaces are not necessarily bad, particularly when the relationships among the states are simple.

The most critical part of constructing a model is properly defining the state space. Quite frequently, models that initially appear messy and intractable become very simple when changes are made in the state space. Several examples appear later in this chapter and throughout Chapter 2.

An idea of the different representations that are possible can be obtained by looking at some alternate state spaces for the all-or-none model. Part of the simplicity of this model comes from the fact that it is formulated at the level of a single item rather than of the full list of items. Any interactions between items are neglected. A more complicated state space is needed if these interactions are to be considered.
For example, suppose that the list consists of n items and that the rate of learning for any particular item, call it item X, depends on the number of items that are yet to be learned. An appropriate state space for item X now requires n + 1 states:

Item X and all other items are yet unlearned.
Item X and n - 2 other items are yet unlearned.
Item X and n - 3 other items are yet unlearned.
...
Item X alone is yet unlearned.
Item X has been learned.

If the interactions among particular items are important, it might be necessary to work with the state space for the whole experiment, by keeping track of the state of all the n items at once. Each item has two states, so altogether 2^n states are needed, one for each possible combination of learned and unlearned items. Obviously, the complexity of the models increases greatly in the larger spaces, although less than it may seem at first if the relationships among states remain simple in the larger state spaces. Note that the model in all three cases still has a finite state space and learning is still all-or-none.

Which of these three state spaces one decides to use depends on the level of detail at which one wishes to work. The full space of states gives the most detail, allowing interactions between particular items to be modeled. For most purposes, this is excessively complex. In the sort of simplification that characterizes the construction of models, the differences between individual associations are ignored. If the number of other unlearned items determines the rate of learning, as in one of the models leading to Figure 1.1, the intermediate-sized state space is necessary. For other purposes, the two states suffice.

The state space of a model need not be finite. For example, suppose one postulates that the probability of learning an item (in an all-or-none manner) changes as a function of the number of trials the item has been studied. This model needs a separate state to represent an unlearned item on each trial of the experiment.
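The sizes of the two larger finite spaces just described are easy to check directly. The sketch below is our illustration, not part of the original text (the state labels and the value of n are invented); it builds the intermediate space of n + 1 states and counts the 2^n states of the full space:

```python
from itertools import product

n = 4  # number of items in the list (an illustrative value)

# Intermediate space: item X plus a count of the other unlearned items.
intermediate = [f"X unlearned, {k} other items unlearned"
                for k in range(n - 1, -1, -1)]
intermediate.append("X learned")
print(len(intermediate))  # n + 1 = 5 states

# Full space: a learned/unlearned flag for each of the n items.
full = list(product(("unlearned", "learned"), repeat=n))
print(len(full))  # 2**n = 16 states
```

The trial-indexed space mentioned last is different in kind: it needs a new state for every trial and so grows without bound, which is the case taken up next.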
Since the number of trials is potentially unbounded, the state space of the model is infinite, although, of course, only a finite number of states is needed to represent any finite body of data.

Although infinite in size, the state space in the last example is still countable and discrete (i.e., the states can still be numbered 1, 2, 3, ...). There is no mathematical reason why the state space could not be any set, including sets that are uncountably infinite. In particular, it is possible to construct models in which the state space is the set of points on a line. Processes of this sort are called diffusion processes, since they can be used as a model of the physical diffusion of a particle through space. However, recent psychological theory has tended to emphasize relatively sharp changes in the subject's state: the formation of an association, the change of view brought about by feedback, the occurrence of an insightful answer to a problem. These are easy to represent as changes within a discrete state set. Because of this, and because of their mathematical difficulty, diffusion processes have found little use in psychological models and so are not covered in this book.

Transitions Between States

Once the state space is defined, the next step is to decide how the process moves from state to state. First, one must define how time is to be measured. Although time is, of course, a continuous variable, it is often much simpler to think of the process as operating in discrete time. Frequently, this is a reflection of the experimental design. For example, many experiments are organized as a series of trials. With such experiments, it is usually more helpful to index events by the discrete trial number than by continuous clock time. For example, in the paired-associate experiment described above, only the state of an item at the start of the test phase of a trial is important. Both the response and the effect of feedback are completely determined by the state.
So successive tests of an item are counted as one unit in spite of the fact that the elapsed time between two presentations may be quite variable. The time measure takes only the integral values t = 1, 2, 3, .... Most models in this book use this sort of quantized time. Some continuous-time models appear in Chapter 10.

However time is measured, the course of the process is represented by the sequence of states it occupies. The point of a model is to describe changes, so the rules by which one state leads to another are fundamental. These rules give the next state of the process as a function of some combination of information about the history of the process. The past history of a process is entirely embodied in the sequence of states leading up to the current state, so the model's future behavior is determined by this sequence. The transition rules describe exactly how future states depend on this history. In stochastic models, no attempt is made to write exact, deterministic rules of transition. Instead, the rules are given in probabilistic form.
In other words, for a process now at time t_1, the model determines a probability distribution over the state space at a later time t_2 > t_1, but does not determine exactly which state occurs at t_2. For most models this probability distribution depends at least on the state at t_1. In the all-or-none learning model, the probability of being in the guessing state, say on trial 10, is different if on trial 9 the subject had been guessing than if the item had been learned. Thus, the probability distribution is in some way conditional on the history of the process.

At this point it is worth developing some notation. Suppose that the state space is the finite or countably infinite set {S_1, S_2, S_3, ...} and that the process operates in discrete time. Let S_{j,t} indicate the event that the process is in state S_j on trial t. In particular, suppose that the process starts in state S_i on trial 1, moves to S_j on trial 2, and so on. Then defining the transition mechanism of the model involves giving the probabilities of being in each state on the initial trial,

    P(S_{i,1})        i = 1, 2, ...

and the probabilities of being in each subsequent state, conditional on the past history of the process,

    P(S_{j,2} | S_{i,1})                  i, j = 1, 2, ...
    P(S_{k,3} | S_{j,2} ∩ S_{i,1})        i, j, k = 1, 2, ...        (1.1)

and so forth. More generally, for any sequence of states i_1, i_2, ..., i_t, i_{t+1}, the probabilities

    P(S_{i_{t+1}, t+1} | S_{i_t, t} ∩ S_{i_{t-1}, t-1} ∩ ... ∩ S_{i_1, 1})        t = 1, 2, ...        (1.2)

are required. These probabilities are provided by Assumption 3 of the all-or-none model, although without the formalism.

Obviously, the choice of state space is important to the way that the probabilities are defined. In order to get a consistent definition, the state space must be adequate. Things cannot go on that systematically modify the value of Equation 1.2 but are not included in the state space. From any state and with the same history, the probability of each future state should always be the same. When the transition probabilities are not constant, processes that should be included in the model must have been omitted.
When this happens, it is usually necessary to redefine the state space to resolve the difficulty, often by enlarging it. Thus, expanding the state space of the all-or-none model in the examples of the preceding section allows models to be written that are too complicated to have stable transition probabilities in the original state space.

The use of probabilistic rules to describe the transitions is important; in fact, it is the defining characteristic of a stochastic process. The decision to use a probabilistic model rather than a deterministic one is sometimes interpreted as a statement about the implied nature of psychological processes. The use of Equation 1.2, such an argument would say, indicates that psychological events must be, in the end, uncertain. This is not really true, for the probabilistic aspects are mainly a convenience. The stochastic properties of a model are closely related to its simplified nature and to the fact that it does not capture every part of a process in detail. Global understanding and a full analysis of details are the province of more general theories. The use of probabilities lets one construct a model that concentrates on certain parts of the behavior and ignores others. One may believe that the behavior in question is completely deterministic; yet one still may not be able to write a model that is precise for an individual subject, general enough to apply to any subject, and short enough to be tractable. Practical knowledge is not adequate to remove all uncertainty from behavior. Psychological theories are never sufficiently exhaustive for one to be able to write a precise deterministic model, nor is one's knowledge about a subject sufficiently detailed. The use of probability in the model is a way to get around this ignorance. Subsequent, more sophisticated models may remove some of the probabilistic aspects, but they will retain others.
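To make the probabilistic specification concrete, the transition mechanism of the simple all-or-none model can be written out in full, in the spirit of Equations 1.1 and 1.2. The sketch below is our illustration (the symbol c for the probability that an unlearned item is learned on a trial is our choice of name); it gives the initial distribution and the one-step conditional probabilities, and checks that every distribution over the state space sums to one:

```python
c = 0.3  # probability an unlearned item is learned on a trial (illustrative)

# Initial distribution over the two states (in the spirit of Eq. 1.1):
initial = {"G": 1.0, "L": 0.0}           # every item starts out guessing

# One-step conditional probabilities (in the spirit of Eq. 1.2, with no
# dependence on anything earlier than the current state):
transition = {
    "G": {"G": 1 - c, "L": c},           # the item may be learned on the trial
    "L": {"G": 0.0, "L": 1.0},           # a learned item stays learned
}

# Every distribution over the state space must sum to one.
assert abs(sum(initial.values()) - 1.0) < 1e-12
for row in transition.values():
    assert abs(sum(row.values()) - 1.0) < 1e-12
print("distributions are consistent")
```

Written this way, the adequacy requirement of the text is visible directly: every probability is a function of the current state alone, so nothing outside the state space can shift its value.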
The probabilistic character of a stochastic model also serves to describe parts of the task that are truly random. Generally these are aspects of the experimental design involving randomization. Material is often chosen for presentation according to some random schedule, and this is easily accommodated in the probabilistic part of the model.

One effect of using a stochastic model is that the predictions of the model apply to populations of subjects or of items, but not to individuals. Except for the exclusion of a few impossible transitions, it is not possible for a stochastic model to predict unequivocally the future of a particular subject. Only the state distribution for a population of subjects is determined. Hence, tests of models are based on large collections of data, not on what a particular individual does. In a sense, this imprecision at the individual level is the price one pays for being able to ignore fine details and keep the model simple. In fact, the practical restrictions caused by using a probabilistic model are little different from the restrictions imposed by normal statistical testing, in which a probabilistic model is placed on the data rather than on the process. In either case, a reasonably large collection of independent observations is needed before tests can be made.
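The population-level character of these predictions can be seen in a small simulation (our sketch, with arbitrary parameter values; c is again the learning probability). No single item's path is predictable, but the proportion of a large set of all-or-none items still in the guessing state on trial t tracks the theoretical value (1 - c)^(t-1):

```python
import random

random.seed(1)
c, n_items, n_trials = 0.25, 20000, 8

in_guessing = [True] * n_items          # every item starts in the guessing state
observed = []                           # proportion still guessing, trial by trial
for t in range(1, n_trials + 1):
    prop = sum(in_guessing) / n_items
    observed.append(prop)
    print(t, round(prop, 3), round((1 - c) ** (t - 1), 3))
    # Each item still in G is learned with probability c on this trial.
    in_guessing = [g and (random.random() >= c) for g in in_guessing]
```

The printed pairs of simulated and theoretical proportions agree closely, even though which particular items have been learned by any trial is entirely unpredictable.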
The Markov Property

A model constructed by explicitly specifying all the conditional probabilities called for by Equation 1.2 would contain so many parameters as to be all but impossible to write. As written, the probability of a particular state on trial t + 1 depends on the entire history of the process from trial 1 through t. This makes the model prohibitively complex; among other things, it would surely be impossible to get realistic estimates of all these probabilities from a reasonable-sized body of data. Some sort of simplification must be made. The usual solution is to construct the model in which the future of the process is independent of all past states except for the one the process currently occupies. Thus, for the discrete-time model, Equation 1.2 is simplified to

    P(S_{i_{t+1}, t+1} | S_{i_t, t} ∩ S_{i_{t-1}, t-1} ∩ ... ∩ S_{i_1, 1}) = P(S_{i_{t+1}, t+1} | S_{i_t, t})        (1.3)

This is known as the Markov property (after the Russian mathematician A. A. Markov, 1856-1922), and stochastic processes for which it holds are called Markov processes. Where the time is discrete, as in Equation 1.3, the process is known as a Markov chain, or, if the state space is finite, a finite Markov chain.

For a process in continuous time, Equation 1.3 must be changed somewhat. With continuous time, it is not possible to list the states at every past time. Nevertheless, the Markov property remains essentially the same. The independence of history holds for any choice of times when the process is observed. For any t_1 < t_2 < ... < t_k < t_{k+1}, and states S_{i_1}, S_{i_2}, ..., S_{i_{k+1}} at these times, the Markov property states that

    P(S_{i_{k+1}, t_{k+1}} | S_{i_k, t_k} ∩ ... ∩ S_{i_1, t_1}) = P(S_{i_{k+1}, t_{k+1}} | S_{i_k, t_k})        (1.4)

In words, no matter when it is examined, the future of the process depends on its past history only through the current state.
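The content of Equation 1.3 can be checked numerically. The all-or-none chain is too degenerate for this check (given that it is currently in G, its whole history is determined), so the sketch below uses a generic two-state chain with made-up transition probabilities. Conditioning on one extra step of history leaves the estimated transition probability essentially unchanged:

```python
import random

random.seed(7)
# A generic two-state Markov chain; the transition probabilities are made up.
p = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}

state, chain = 0, [0]
for _ in range(200000):
    state = 0 if random.random() < p[state][0] else 1
    chain.append(state)

def cond_prob(history):
    """Relative frequency of moving to state 1 right after the given history."""
    hits = tot = 0
    h = len(history)
    for t in range(h, len(chain)):
        if tuple(chain[t - h:t]) == history:
            tot += 1
            hits += chain[t]
    return hits / tot

print(round(cond_prob((0,)), 3))    # approximately p[0][1] = 0.3
print(round(cond_prob((1, 0)), 3))  # also about 0.3: the extra history is irrelevant
```

Both estimates agree because, as Equation 1.3 requires, only the most recent state carries information about the next one.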
Equation 1.4 is more general than Equation 1.3, and it includes Equation 1.3 as a special case.

Ignoring the events on trials prior to the most recent is less of a restriction on a model than it might seem. The idea that the current state of an organism determines what it does next is consistent with the type of deterministic thinking that lies behind most psychological theories. Although the past determines where the subject is now, it is the subject's current state that determines future activity. If it were possible to know the subject completely at a moment of time, that information would be sufficient to predict the subject's future behavior. If one is unable to make such a prediction, the fault lies in having inadequate information about the current state or in having an inadequate representation of that state, rather than in needing to refer to the past. The Markov property expresses this idea as a mathematical statement.

Where the Markov property seems to fail, the model can usually be reformulated to make it hold. Once again, the trick is to define the state space properly. Any information that is needed for the response must be represented in the state. An example shows this. Suppose that a model has been proposed with three states, A, B, and C, and that the transitions out of state B depend on whether the process has ever been in state C in the past. For example, in a learning model, suppose that state B represents items whose association is unknown, state A represents items that are permanently learned, and state C represents items learned only temporarily. It may be easier to relearn an unknown item (passing from B to A) if it has been temporarily learned in the past (has been in C) than if it has never been known. Knowing whether the process has been in C does not require access to the entire past history.
It is necessary only to expand the state space by splitting state B into two states that differ only in whether the process has passed through C. The state space now contains four states, A, B_C, B_C̄, and C. State B_C indicates that the process has been in state C at some time in the past, while B_C̄ indicates that it has never been there. The process starts in B_C̄; then, after reaching C, it moves into B_C or A and never returns to B_C̄. Different transition probabilities can exist leaving B_C and B_C̄, which solves the original problem. Doubling state B allows the process to "remember" the passage through C. Without violating the Markov principle, very elaborate characteristics of the history can be retained in the states.

1.4 Overview of This Book

This book discusses various types of Markov processes that can be used as models of psychological processes. As the discussion in the preceding sections shows, the way that the state space of the model and the transitions among the states are defined is of fundamental importance. In Chapter 2, a number of models are described, illustrating how state spaces and transition probabilities can capture a psychological theory.

The major emphasis of this book is on the way in which predictions about behavior are made from the Markov models. Examples drawn from the models in Chapter 2 are used to illustrate the analysis procedures. The next five chapters, 3-7, discuss one type of Markov process: finite Markov chains in discrete time containing states in which the process eventually comes to rest.
These models are characteristically finite: both their state spaces and the number of trials on which transitions take place are finite.

Within this series, Chapters 3 and 4 look at the basic derivation of predictions, first without, then with, the use of matrix algebra. Of course, these predictions are useful only when comparisons to data are possible, either to match the model to data or to test it. The statistical problems involved in these operations are discussed in the next two chapters. Chapter 5 considers how numerical quantities are estimated; and Chapter 6, how hypotheses about models are tested. Although these chapters take their examples only from the finite, absorbing Markov chains, their methods are more general and can apply to the processes discussed in the later chapters as well.

Unfortunately, it is not always the case that models that appear dissimilar make different predictions. When different models predict the same behavior, they do not differentiate between the general theories that lie behind them. For similar reasons, it is frequently not possible to estimate all the numerical quantities that seem to be present in a model. Questions of this sort, when models make the same predictions and what to do about it, are the topic of Chapter 7.

The restrictions on the variety of models imposed in Chapters 3-7 are removed in the last three chapters. Chapter 8 drops the need for a state in which the process comes to rest, so that the process can continue to change state forever. In Chapter 9, infinite state spaces are considered.
The methods of analysis in each of these cases are very similar to the methods of the earlier chapters, but expanded in one way or another. Finally, in Chapter 10, the discrete-time requirement is relaxed and Markov processes in continuous time are considered.

Throughout these chapters, results from probability theory, linear algebra, and calculus are needed. Some users of this book may find parts of these fields to be unfamiliar. For them, reviews of these topics, as well as of the solution to difference equations, are provided in a series of four appendices. These sections can be read before starting with Chapters 3 (Appendix A, probability theory, and Appendix B, difference equations), 4 (Appendix C, linear algebra), and 5 or 10 (Appendix D, calculus), although many readers will be anxious to get to the Markov models and so will prefer to turn to the appendices only as needed.

Certain sections of the text and certain problems have been marked with a vertical rule in the margin. These contain material that is either more technical, involves more difficult mathematics, is somewhat off the track of the chapter, or that for other reasons can be skipped on first reading. Unmarked material in later chapters does not require earlier marked material.

Chapter 2
Representation of Psychological Processes as Markov Models

The first step in creating a Markov model of a psychological theory is to define the state space of the Markov process and the way that the subject moves from state to state. In many respects, this step is the hardest one. Often many attempts are necessary to find a way to describe a psychological theory in a fairly simple set of states without losing its essential aspects. This chapter introduces the construction of models. Unfortunately, within the confines of the chapter, it is impossible to do more than illustrate how models are defined. Indeed, within a book of any length, it would not be possible to give explicit rules for how to write a successful model.
In the end, the creation of a good model lies in the skill of the researcher. Because of this, and because the technical methods of mathematical analysis are easier to define (but often more disturbing to the student), it is easy to ignore the construction of the model altogether and start immediately with the calculation. This would bypass an important part of the modeling process. So the examples here illustrate some of what lies behind the formal models.

In this chapter a number of psychological theories are translated into the state spaces and transition probabilities of Markov models. The models are chosen to be simple and relatively clear examples. They are reasonable approximations to complicated processes and can serve as building blocks in the construction of larger theories. Throughout the chapter, no calculations are made. The mathematical methods whereby the models are analyzed occupy the remainder of the book.

This chapter is not a survey of the applications of mathematical models to psychology, however. The principal goal of this book is to present the techniques for analyzing models, so the selection of examples is limited.
Although the models represent areas of psychology where appreciable use has been made of Markov models, they are more complete as examples than as descriptions of theorizing in any area. Simpler or clearer models have often been selected in preference to current ones. Other texts (e.g., Coombs, Dawes, and Tversky, 1970; Laming, 1973; Restle and Greeno, 1970) provide more extended surveys. Many of the models derive from a general theory known as stimulus sampling theory, largely formalized by W. K. Estes and his associates (see the collection of papers reprinted in Neimark and Estes, 1967).

2.1 Markov Models

Before starting with the examples, the ground rules for the models must be established. Accordingly, this section presents an abstract definition of a Markov model. In the remainder of the chapter, this mathematical definition is filled out into a variety of psychological models.

The natural application of finite Markov chains to psychological theory is as models of changes in the subject's cognitive state. The elements of the state space correspond to the subject's states of knowledge about the task, and the transition mechanism shows the way these knowledge states change. Application would be easy if it were possible to observe the subject's state directly. One would need only to tally the types of states, then count the transition frequencies to estimate the probabilities of going from one state to another. However, the cognitive states postulated by most contemporary theories cannot be observed directly (if they could, much of psychology would be made trivial). Even in a case as simple as the all-or-none paired-associate model of Chapter 1, the two states, guessing and learned, are not directly observable.

What can be observed is the subject's sequence of responses.
Figure 2.1 The sequence of states in a Markov model. The response process (dashed arrows) is induced from the Markov knowledge-state process by the response mapping.

If the model is to be tested against data, it is these that must ultimately be modeled. This suggests that it might make more sense to model the responses directly, instead of the unobservable internal states. However, this procedure fails with regard to both the mathematics and the theory. First, responses are usually not Markovian. For example, suppose that one treated the correct and error responses in a paired-associate task as the states of a stochastic process. If these states were Markovian, the conditional probability of a correct response on trial t + 1 given a correct response on trial t would be independent of the past history of the process. Yet surely the probability of a correct response depends on more than this. A run of five correct responses gives fairly good evidence of learning and is likely to be followed by another correct response, while the final correct response in the sequence "error, error, error, error, correct" could well be a lucky guess and is much more likely to be followed by an error. The Markov property (Equation 1.3) does not hold for the responses.

This might suggest that the Markov property should be abandoned, even at the cost of greater mathematical complexity. A model that cannot describe data is not of much interest even if it is simple. But a second difficulty warns that this is not the direction to proceed. Current cognitive theories do not deal only with responses, but also with the underlying knowledge states of the subject. For example, when one describes a piece of information as being in "short-term memory" or in "long-term memory," one is making a distinction that is not directly reflected in a response. Presumably, information remembered in either way yields the same correct response.
What differs (in the theory) is what happens to the information and how it interacts with other information. Thus, the representation of knowledge is more complex than the responses. Modeling only the responses precludes the expression of many important theories.

The solution to this minor dilemma is to use a Markov process as the model of the subject's internal states and to let the responses be functions of these states. The internal knowledge states can be Markovian, even though the response states are not. Two sets of states and two stochastic processes are involved. A space of unobservable states, which in this book is called the knowledge-state space (or often just the state space), describes the underlying process; a second space of observable states describes the responses. The states of the internal process are denoted by S1, S2, . . . , Sn, and the responses by R1, R2, . . . , Rm. In the all-or-none model of Chapter 1, for example, the knowledge-state space consists of the states G (guessing) and L (learned), and the response space of the states C (correct) and E (error). Suppose a subject starts in state S1 on trial t = 1, is in state S2 at trial t = 2, and so forth. This sequence of knowledge states is shown across the top of Figure 2.1. The transformation T, called the transition operator, takes the underlying knowledge state and changes it to the next. With the state space (S1, S2, . . . , Sn), it forms a Markov chain.
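This alternation can be sketched in a short simulation: the transition operator T moves the knowledge state forward on each trial, while a response mapping produces an observable response from the current state. A minimal sketch (the two-state space and all numerical values below are illustrative, not taken from the text):

```python
import random

random.seed(1)

# Illustrative two-state knowledge space (as in the all-or-none example).
# T is the transition operator; R is the response mapping. Each entry is a
# list of (next value, probability) pairs; all probabilities are made up.
T = {"G": [("L", 0.3), ("G", 0.7)],   # "learning rate" 0.3 (illustrative)
     "L": [("L", 1.0)]}               # L is absorbing
R = {"G": [("C", 0.25), ("E", 0.75)], # guessing: P(correct) = 1/4
     "L": [("C", 1.0)]}               # learned items are always correct

def draw(rules):
    """Sample one outcome from a list of (value, probability) pairs."""
    u, cum = random.random(), 0.0
    for value, p in rules:
        cum += p
        if u < cum:
            return value
    return rules[-1][0]

state = "G"                           # knowledge state on trial 1
states, responses = [], []
for trial in range(10):
    states.append(state)
    responses.append(draw(R[state]))  # the response is a function of the state
    state = draw(T[state])            # the transition operator moves the state

print("states:   ", " ".join(states))
print("responses:", " ".join(responses))
```

Only the second printed line would be visible to a researcher; the first line is the hidden knowledge-state sequence from which it is induced.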
The responses are produced in a similar manner. On each trial a second transformation, the response mapping R, takes the current knowledge state and produces one of the responses R1, R2, . . . , Rm. In Figure 2.1, the subject passes through a sequence of knowledge states on the first five trials; the information that the researcher obtains is the corresponding response sequence, induced from the knowledge-state sequence by the response mapping (the dashed arrows in the figure). The response sequence produced in this way need not satisfy the Markov property, so in general it cannot be described by a Markov process defined over the response space alone.

This collection of entities constitutes a Markov model, as the term is used here: a knowledge-state space, (S1, S2, . . . , Sn); a transition operator, T; a response-state space, (R1, R2, . . . , Rm); and a response operator that connects the two spaces, R.

The transition operator of the knowledge-state process is Markovian and so can be described by the probability that one state follows another. As in Chapter 1, let S(i,t) indicate that the process is in state Si on trial t. More specifically, for the all-or-none model, G3 indicates that the item is in the guessing state on trial 3. Then the transition probability of passage from Si on trial t to Sj on the next trial is P(S(j,t+1) | S(i,t)). These probabilities are conveniently written in a square array:

                               State on trial t + 1
                       S1                  S2           . . .        Sn

           S1   P(S(1,t+1)|S(1,t))  P(S(2,t+1)|S(1,t))  . . .  P(S(n,t+1)|S(1,t))
 State on  S2   P(S(1,t+1)|S(2,t))  P(S(2,t+1)|S(2,t))  . . .  P(S(n,t+1)|S(2,t))   (2.1)
 trial t    .           .                   .                          .
           Sn   P(S(1,t+1)|S(n,t))  P(S(2,t+1)|S(n,t))  . . .  P(S(n,t+1)|S(n,t))

The rows correspond to the source state on trial t; and the columns, to the resultant state on trial t + 1. This array is called the transition matrix of the chain. The mathematical properties of the transition matrix as a matrix become important in Chapter 4.

A similar response map lays out the array of probabilities for the response mapping:

                     Response
              R1         R2       . . .      Rm

       S1   P(R1|S1)   P(R2|S1)   . . .   P(Rm|S1)
       S2   P(R1|S2)   P(R2|S2)   . . .   P(Rm|S2)     (2.2)
        .       .          .                  .
       Sn   P(R1|Sn)   P(R2|Sn)   . . .   P(Rm|Sn)

Where only two responses are involved, as with correct responses and errors,

       P(E|Si) = 1 − P(C|Si)

and it suffices to report only one column of this matrix.

Most psychological models simplify this representation in one important respect. The transition probabilities describe the underlying psychological processes by showing how the knowledge states change. These processes themselves usually do not change from trial to trial. Thus, the transition matrix is independent of t and

       P(S(j,t+1) | S(i,t)) = P(S(j,s+1) | S(i,s))

for any trial numbers s and t. If this is the case, the time subscript can be omitted and the probabilities written as P(Sj|Si). A process in which the transition probabilities do not change with time is said to be homogeneous or stationary (there is some variation in terminology). Psychological models are so often stationary that the fact is seldom noted. On the contrary, it is more common to refer to models with changing transition probabilities as nonstationary or nonhomogeneous than for a homogeneous model to be identified as such. The same principles apply to the response mapping in Matrix 2.2. The homogeneity of a model is a property of the model, not of the theory it represents. One model of a particular theory may be homogeneous, while another model of the same theory is not. It is often possible to change a nonhomogeneous model to a homogeneous one by redefining the state space.

In order to completely define a Markov chain, more than just the transition probabilities must be given.
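Computationally, the arrays in 2.1 and 2.2 are just row-stochastic tables, and trial-by-trial predictions follow from repeated multiplication of a probability vector by them. A sketch with made-up numbers (three hypothetical knowledge states, two responses; the starting distribution used here anticipates the initial vector discussed next in the text):

```python
# Transition matrix (2.1) and response map (2.2) as row-stochastic tables.
# Three hypothetical states S1..S3 and responses (C, E); all values illustrative.
P = [[1.0, 0.0, 0.0],    # from S1 (absorbing, say)
     [0.4, 0.6, 0.0],    # from S2
     [0.1, 0.3, 0.6]]    # from S3
R = [[1.0, 0.0],         # P(C|S1), P(E|S1)
     [0.7, 0.3],         # P(C|S2), P(E|S2)
     [0.25, 0.75]]       # P(C|S3), P(E|S3)

# Each row is a probability distribution, so it must sum to 1.
for row in P + R:
    assert abs(sum(row) - 1.0) < 1e-12

def step(dist, matrix):
    """Apply a stochastic matrix to a distribution (a row vector)."""
    n = len(matrix[0])
    return [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
            for j in range(n)]

dist = [0.0, 0.0, 1.0]          # suppose the item starts in S3
for trial in range(1, 4):
    resp = step(dist, R)        # response probabilities on this trial
    print(f"trial {trial}: states = {dist}   P(C) = {resp[0]:.3f}")
    dist = step(dist, P)        # move the state distribution one trial ahead
```

The same `step` function serves for both arrays: applied to P it advances the state distribution; applied to R it converts a state distribution into a response distribution.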
The state where the process starts is also needed. A precise selection of one and only one starting state, as in the all-or-none example, is not necessary. In keeping with the probabilistic nature of Markov processes, it is enough to give an initial probability distribution across the knowledge-state space, that is, to give the probabilities

       P(S(1,1)), P(S(2,1)), . . . , P(S(n,1))     (2.3)

These probabilities are known as the initial-state vector (or simply the initial vector) of the model. As with the transition matrix, the vector properties of the initial vector are treated in Chapter 4.

Specifying the probabilities in Equations 2.1, 2.2, and 2.3 completes the definition of a Markov model. From the initial vector and the transition matrix, the future of the Markov process can be determined, at least at a probabilistic level. For any trial, the probability distribution across the states, or state vector, is found. Applying the response mapping to this produces the response probability distribution for that trial. Other quantities, such as the frequencies with which states are entered or returned to, can also be determined. These often represent psychological characteristics of the theory as important as the response probabilities.

2.2 Simple Models of Learning

One of the major applications of Markov models to psychology is in the area of learning. Of the many different learning paradigms to which models have been applied, probably the most common one is the paired-associate task. In part this is because the paired-associate task lends itself to a simple analysis. Accordingly, many of the models in this section apply initially to paired-associate data.†

In a paired-associate experiment, subjects are presented with a series of stimulus-response pairs to be learned.
The subject must learn to give the response member of the pair when presented with the stimulus member. For the purposes of exposition here, the stimulus member may be thought of as drawn from a set of nonsense syllables; the response member, from a small set of integers. Such a pair is NOF-3. On each trial of the anticipation form of the paired-associate procedure, the stimulus member of a pair (NOF) is presented alone and the subject attempts to recall the response (3). Following the response, feedback is given as to the correct pairing. This provides a new opportunity to learn the pair and completes the trial. On the next trial, a different pair is presented for test and study.

In most experiments, many different pairs (or items) are learned simultaneously, each pair being presented at various spacings with various numbers of other items between presentations. A typical sequence of trials is shown in Figure 2.3. Ultimately, a theory must deal with the interrelationships in the learning of all these items. However, as in the example of Chapter 1, it is a good first approximation to treat each pair in isolation. Thus, a sequence consisting of trials 2, 6, and so forth, in the experiment of Figure 2.3 can be extracted to look at the item HIV-5. The word trial is commonly used to refer both to the trial of the experiment and to the trial for a particular item, so that trial 6 of the experiment is also trial 2 for the item HIV. The models in this chapter largely deal with the sequence of trials for a particular item.

Figure 2.3 A sequence of trials from a paired-associate experiment.

†For a good discussion of the representation of the paired-associate task, see Greeno (1978).

The All-or-None Association Model

The all-or-none model introduced in Chapter 1 is the first and most basic example of a paired-associate model. Although it is undoubtedly oversimplified as a realistic model of learning, it nevertheless embodies the basic principle of learning. Thus, it can serve as an initial approximation to a more complicated model or as one component of such a model. As an example, it illustrates most of the principles of more involved models. It is simple enough to minimize the amount of algebraic manipulation, but still complex enough to show clearly how techniques are applied. It will also turn out that the algebraic methods used to analyze the all-or-none model in Chapter 3 closely parallel the general matrix methods of Chapter 4, which are applicable to any model.

The logic behind the all-or-none model has already been described in Chapter 1 and expressed as four assumptions. These assumptions are readily formalized in the notation of a Markov chain. The model describes the acquisition of a single item, and its states represent the knowledge of that item. Clearly, the guessing and learned states are the underlying knowledge states of the process; and correct responses and errors, the two response states. The first three assumptions define a homogeneous two-state Markov chain over the knowledge-state space:

Assumption 1. The knowledge-state process has two states, denoted by G and L.

Assumption 2. The initial state probabilities are

       P(G) = 1   and   P(L) = 0     (2.4)

Assumption 3.
The transition probabilities from state L are

       P(L(t+1) | L(t)) = 1   and   P(G(t+1) | L(t)) = 0

and those from G are

       P(L(t+1) | G(t)) = α   and   P(G(t+1) | G(t)) = 1 − α     (2.5)

for some parameter α, 0 ≤ α ≤ 1. More compactly, the transition matrix is

                    State on trial t + 1
                        L        G

       State on   L     1        0
       trial t    G     α      1 − α     (2.6)

The quantity α is an unknown parameter of the model, that is, a quantity that needs to be specified in order to be able to make numerical predictions. It represents the probability that the feedback on a trial is effective in causing an item to be learned, so it is called the learning rate. Even with a detailed analysis of the particular experiment that is being modeled, it is usually impossible to assign a value to α by a priori analysis. It must be estimated from data. Methods for making this estimate are treated in Chapter 5. In this book, Greek letters† are used to symbolize parameters that require estimation from data, keeping Latin letters for other quantities such as constants derived from the procedure or intermediate results.

A diagrammatic representation of the transitions is often quite helpful. For the all-or-none model, this diagram is fairly trivial (see Figure 2.4). Conventionally, in a state-transition diagram, the states are represented as nodes and potential transitions among them by arrows. Paths from a state to itself are indicated, except for states from which there is no exit. Each path is labeled with the probability that it is chosen. Note that this state diagram is not the same thing as the diagrams in Figures 2.1 and 2.2, where the sequence over a series of trials was illustrated.

State L has the property that, once an item has reached it, no further changes are possible. Such a state is known as an absorbing state. State G, on the other hand, is not permanent. Eventually, with probability 1, the process departs from it and (being caught in L) never returns.
Such states are called transient states. Most models of learning have at least one absorbing state and one or more transient states. The distinction between transient and absorbing states is amplified further in a later chapter.

†Many different Greek letters are used in this book, so it pays to learn their names. A table of the Greek alphabet is included as Appendix E.

Figure 2.4 State-transition diagram for the all-or-none model.

These three assumptions define a Markov chain, but they do not complete the learning model, for they do not describe the response process. The fourth assumption does this:

Assumption 4. The response-state space consists of the states C and E. The response mapping is

                  Response
                  C        E

       State  L   1        0
              G   g      1 − g

where g is the probability of a correct guess,

       g = 1 / (number of response alternatives)

Like α, the symbol g represents a numerical quantity, but it has a somewhat different status. It is easily given a value by an analysis of the experimental paradigm and need not be estimated from data. The word "parameter" is not generally used for quantities of this type.

All-or-none models of the type described here were developed in the Markov context by Bower (1961). A discussion of them can be found in any survey of mathematical psychology.

All-or-None Learning on Errors

The version of the all-or-none model just presented does not take into account any feedback about the subject's response. Learning is as likely to occur if the response is correct as if it is in error.
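Under Assumptions 1 through 4, the predicted probability of a correct response on each trial can be computed directly, whatever happened on earlier trials. A minimal sketch (the values of α and g are illustrative; in practice α would be estimated from data, as Chapter 5 describes):

```python
# All-or-none model: states (G, L), learning rate alpha, guessing rate g.
# Both numerical values below are illustrative.
alpha, g = 0.3, 0.25

p_G = 1.0                             # Assumption 2: the item starts in G
print("trial   P(G)     P(correct)")
for trial in range(1, 7):
    # An item in L is always correct; an item in G is correct w.p. g.
    p_correct = (1 - p_G) + p_G * g
    print(f"{trial:3d}    {p_G:.4f}   {p_correct:.4f}")
    p_G *= 1 - alpha                  # Assumption 3: remain in G w.p. 1 - alpha
```

With these values the predicted learning curve rises from g = 0.25 toward 1 as the probability of still being in the guessing state shrinks by a factor of 1 − α each trial.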
While this independence of feedback might be appropriate for a simple paired-associate task, there are other cases where it is clearly not realistic. Suppose that an experiment is run in which there is a rule by which the stimuli can be classified. For example, if the stimuli are geometric figures, red figures may have one response, blue ones another. The subject's task in this concept-identification experiment is to learn the correct response to a stimulus, but to do this by discovering the rule, so that novel stimuli can also be correctly classified. One way for the subject to discover the rule is by a win-stay/lose-shift hypothesis-testing strategy. At any given time, the subject holds a hypothesis about the rule and uses it to make responses. If that hypothesis, be it ultimately right or wrong, happens to yield a correct response, the subject sticks with the hypothesis. If it yields an error, the subject abandons it and tries a new hypothesis. If the correct rule is color, for example, a rule based on the size of the stimulus yields correct responses only by chance and is changed as soon as it causes an error. "Learning" is all-or-none in nature and takes place when the correct rule is chosen.

The all-or-none model, as embodied in the four assumptions above, does not properly represent this situation. In order to adequately describe the subject's actions, the feedback must be able to influence the transition probabilities, so that transitions to L take place only after errors. To let this happen, the knowledge of the outcome is incorporated into the knowledge-state space by breaking the guessing state G into two states, GC and GE. These represent the subject's knowledge following both the response and the feedback: in GE a guess has been made and called an error; in GC a guess has been made and called correct. The state L is unchanged.

Transitions among these states are shown in Figure 2.5.
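These transitions can also be written out numerically from the verbal description. In the sketch below the states are ordered (L, GC, GE); g is fixed by the task, and the value of the learning-rate parameter β is illustrative:

```python
# Win-stay/lose-shift chain: states ordered (L, GC, GE).
# g is determined by the number of response alternatives; beta is illustrative.
g, beta = 0.25, 0.4

P = [[1.0,   0.0,            0.0],                   # from L
     [0.0,   g,              1 - g],                 # from GC
     [beta,  (1 - beta) * g, (1 - beta) * (1 - g)]]  # from GE: learn w.p. beta
start = [0.0, g, 1 - g]       # the first guess is correct with probability g

for row in P:
    assert abs(sum(row) - 1.0) < 1e-12   # each row is a distribution

# Responses here are deterministic: an error occurs exactly when the process
# is in GE, so P(error on trial t) is simply the probability of state GE.
dist = start
for trial in range(1, 6):
    print(f"trial {trial}: P(error) = {dist[2]:.4f}")
    dist = [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]
```

Because transitions into L occur only from GE, the error probability in this sketch falls trial by trial even though no learning follows a correct guess.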
At first, the process passes between states GC and GE, going to GC with probability g and to GE with probability 1 − g. Transitions into L take place only from GE (i.e., following an error) and occur at a learning rate denoted by the parameter β. This defines the transition matrix

                  L         GC             GE

       L          1          0              0
       GC         0          g            1 − g          (2.7)
       GE         β      (1 − β)g    (1 − β)(1 − g)

Initially, the subject guesses, so starts in one of the G states. The chance of this initial guess being correct is g, so the initial state probabilities are

       P(L) = 0     P(GC) = g     P(GE) = 1 − g

Figure 2.5 State-transition diagram for an all-or-none model in which learning only follows errors.

The incorporation of the feedback into the knowledge states also simplifies the response probabilities and makes them nonprobabilistic. Correct responses result from states L and GC, while errors result from state GE, all responses taking place with probability 1. The nonprobabilistic aspect of this response mapping is useful in its own right, as will be seen in Chapter 4.

Hypothesis-testing models of this type were developed by Bower and Trabasso (1964) and Restle (1962). A more complete discussion, including several related models, is given by Millward and Wickens (1974).

The Linear Model

The learning process can be viewed as the gradual accumulation of information, rather than the formation of a discrete association.
In such a view, each successive study of a paired-associate item increases the chance that the item will be responded to correctly on the next trial, but there is no single sharp jump. In the simplest model of this type, known as the linear model, the item passes regularly through a series of states, moving one state further after each presentation, as the probability of a correct response increases.

Formally, let S(t) be the state of the item following t study opportunities. The process starts in S0 and advances to a new state on each trial (see Figure 2.6). The transition probabilities are

       P(S(j,t+1) | S(i,t)) = 1   if j = i + 1
                            = 0   otherwise          (2.8)

or, in tabular form,

              S1    S2    S3    S4   . . .

       S0      1     0     0     0   . . .
       S1      0     1     0     0   . . .
       S2      0     0     1     0   . . .          (2.9)
        .                     .
        .                        .

Figure 2.6 State-transition diagram for the linear model.

Although this process is not very interesting probabilistically, it introduces two important new properties. First, its state space is not finite, but infinite. Although in any real experiment an item has only a finite number of presentations, there is no good way to set a limit on this number. In such cases it is easier to define a potentially infinite state space. The second difference between this model and those presented earlier is the lack of an absorbing state. The process never comes to rest in a single state of Matrix 2.9, but continues to advance.

This knowledge-state process is not probabilistic and is only trivially a Markov chain. But responses must be attached to its states, and the probabilistic aspect is introduced in the response mapping. Suppose that the probability of an error while in S(t) is e(t). Clearly, all the e(t) cannot be free parameters, for there would never be enough observations to estimate them and to test the model.
Instead, let the error probability start at e1 and let each change of state shrink it to a proportion θ of its prior value:

       e(t+1) = θ e(t)          (2.10)

As is demonstrated in the next chapter, or by using the methods of Appendix B, this implies that

       e(t) = θ^(t−1) e1          (2.11)

The linear model has an important history in mathematical learning theory (see Bush and Mosteller, 1955; Bush and Sternberg, 1959; or any mathematical psychology text). The linear model has often been contrasted with the all-or-none model, for each represents something of an extreme of its type.

The infinite state space just described is not the only way to formalize the linear model. It can also be thought of as a one-state Markov chain with a nonhomogeneous response mapping. With only one state, there are no transitions in the Markov-chain portion of the model. To that one state is attached a response mapping that reflects the probabilities of Equation 2.11:

       P(C(t)) = 1 − θ^(t−1) e1          (2.12)

This version of the linear model is no different in its predictions from the infinite-state version. Which way one thinks of the model depends on whether one is more comfortable with a response process indexed by states, as in Equation 2.11, or by trials, as in Equation 2.12. In one case an infinite state space is necessary; in the other, homogeneity is abandoned. The larger theory from which the model is derived determines which view is simpler, for in the mathematics there is no difference between the choices.

The Random-Trial-Increment Model

In the linear model, every presentation of an item is effective, in the sense that every trial leads to some learning and some increase in the probability of a correct response. This is not necessarily a realistic assumption. Quite reasonably, some useless presentations occur, on which the subject learns nothing. This suggests combining the response mapping of the linear model with a probabilistic transition mechanism. One way to construct such a model is to use something like the all-or-none model as a building block to govern the transitions upward. Let α be the probability that an advancement through the state space takes place. Then the transition probabilities of the linear model (Equations 2.8 and 2.9) become

       P(S(j,t+1) | S(i,t)) = α        if j = i + 1
                            = 1 − α    if j = i          (2.13)
                            = 0        otherwise

or

               S0       S1       S2       S3    . . .

       S0    1 − α      α         0        0    . . .
       S1      0      1 − α       α        0    . . .
       S2      0        0       1 − α      α    . . .          (2.14)
        .                           .
        .                              .

The response mapping is the same as that of the linear model, given by Equation 2.11. The resulting model is known as the random-trial-increment (RTI) model (Norman, 1964) because the probability of a correct response is incremented on randomly occurring trials.

The RTI model contains the linear model as a special case when α = 1. It also subsumes the all-or-none model when θ = 0, for then the probability of an error falls from its initial value to 0 after the first transition. All transitions after the first are not reflected in the response probabilities, so are invisible in the data. The model can then be reformulated with S0 and S1 playing the role of the guessing and learned states of the all-or-none model. This position as a generalization of both the all-or-none model and the linear model is a useful property of the RTI model. For one thing, when the model is fit to a set of data, the sizes of α and θ indicate whether the all-or-none or gradual properties are emphasized in a set of data. For another, tests of the hypotheses α = 1 and θ = 0 let the linear and the all-or-none models be compared. In fact, the original motivation for construction of the RTI model was this generalization of the two simpler models.

2.3 Two-Stage Learning Models

Both the linear model and the RTI model are considerable retreats from the simplicity of the single association embodied in the all-or-none model. Some such retreat is necessary, for the all-or-none model is frequently not supported by data. Often there is evidence for performance that is better than chance guessing, yet not perfect. This suggests that partial learning states should be added to the model. However, it is not necessary to go as far as the linear model or the RTI model. A single state interposed between G and L is often sufficient. This section examines such models. In many of these models, associations remain all-or-none in nature, but with a more complex structure for the learning process.

An Intermediate State

When a second state of learning is inserted between the two states of the all-or-none model, the result is a model commonly known as the two-stage model (or sometimes the two-element model). There are three knowledge states: a guessing state G, an intermediate state I, and a learned state L. The process passes from G to L, perhaps via I. There are several psychological processes the state I could represent, but before discussing them, consider a general form of the two-stage model.
This ssociations tue forthe fo states of, own as the re ae three tel, anda ial, There resent, but two-stage 23. Two-Stage Learning Models 23 ~ Figure 2.7_Stteseansiton diagram for's two-stage mote. ‘As with the all-or-none model, tems start out in the no-knowledge state G, where only guessing takes place, Whenever an item in Gis pyesented, there is a chance that some learning occurs: let exit from G take place with probability « (see Figure 2.7). The difference between the two-stage model and the al-or-none model comes in what happens next. With probability 2, an item that leaves G is completely learned tnd goes into the learned state L. (Of course, neither « nor 6 is the ‘ime as it was inthe all-r-none model; parameter letters are reused in diferent models.) With probability 1 ~ B, an item leaving G does not tater L, but ends in the intermediate state . Thus, for example, PUIG) (go to Meave G)Peave G) = (1 ~ Ba Inthe version of the two-stage model treated here, once an item has entered, it never returns to G, but either slays there or progresses to ‘Sate. Ifthe probability of progression toZ is, then the probability of femaining in is 1 ~ 8, When the item reaches state Z, representing Final learning, itis absorbed and no further transitions are made. Sum Inarizing these probabilities, the transition matrix of the chain is eis) Next, consider the starting configuration. Rather than starting all items in one state, as was done above forthe all-or-none, linear, and RT models, more general initial vector can be used. Let PL) = Puy = (=o e219 and PIG) = (1 = lll - 9) ein 34 Represemation of Pechological Processes as Markov Models be intial conditions, where «and 7 are unknown parameters. In this ‘way, any configuration of starting states i possible, In many situations, it may be possible to parally determine the parameters ¢ and r from a priori information; for example, if al the items are unknown to start with, then o = r= 0. 
In other cases, one or both parameters must be estimated.

The response mapping remains to be specified. Clearly, in the guessing state, the probability of a correct response is g, as it was for the all-or-none model. The value to assign to P(C|I) depends on the interpretation of the intermediate state. In many interesting cases, the responses in I are not simple guesses. Hence, to be general, the probability of a correct response in I is left free here, denoted by r.

Many different theories give rise to the two-stage model. Three of these are described in the next few paragraphs, and some others in Problems 2.6 and 2.7. Although the general theories underlying the three models are different, they lead to the same formal model and so make the same predictions. This emphasizes a point already made in Chapter 1: the fact that a model fits the data is no proof that the theory behind the model is correct. Acceptance of a general theory must rest on more than the agreement of one of its predictions with data. Issues of consistency, simplicity, and breadth (beyond the scope of this book) are important. The situation is further complicated by the fact that the equivalence of two models is not always obvious. Models that appear different may be mathematically identical. This issue is quite important and is the topic of Chapter 7.

One interpretation of the intermediate state is as a state of partial learning. Suppose that the task is such that selection of the correct response requires information about two different aspects of the response. The set of response alternatives, for example, may be divided into several classes; in particular, the set {1, 2, 3, A, B, C} contains both numbers and letters. Entry into L means that both the class of the response and its particular value within the class have been learned, while state I indicates that the class of the response has been learned, but the value is still unknown.
Whether the response probability, r, in the intermediate state differs from g depends on whether knowledge of the class of the response lets the subject eliminate enough alternatives to increase the chance of a correct guess. Even when r = g, the two-stage model's predictions differ from those of the all-or-none model because the passage through I retards learning.

A second interpretation of the two-stage model is as a picture of short- and long-term memory. An item in state G is one that has not been learned yet in any sense, while an item in state L has been placed in some form of long-term memory store, where it will not be forgotten, at least for the duration of the experiment. Between these states, items are retained in a short-term store in which they may eventually be learned or may be forgotten again. This is state I. There is still a chance of forgetting in this state, so the probability of an error is not zero. However, because the item is retained for some time, the probability of a correct response is greater than g.
A version of this model is presented in greater detail in the next section.

Finally, the model can be interpreted in terms of the way in which the items are coded (Greeno, 1967, 1968, 1974). Suppose that there are many ways to encode an item: most totally inadequate, a few adequate to last for the duration of the experiment, and some that are adequate for a few trials but eventually fail. The model's states correspond to which of these encodings is currently employed. In state G the item is inadequately encoded and performance is no better than chance, in state L an adequate encoding has been adopted, and in state I one of intermediate worth is held. The intermediate encoding produces correct responses for a while, but eventually fails. Because of these correct responses, g < r ≤ 1. In some circumstances this interpretation is most natural if the changes of encoding are brought about by errors within a hypothesis-testing framework. The subject keeps an encoding in force until it fails. This requires a modification of the two-stage model to allow state changes only following errors (see Problem 2.5).

The Long-and-Short Model

The interpretations of the two-stage model in the last three paragraphs are quite brief. Obviously, the psychological bases are more complex than presented here. The models originally developed out of the theory, rather than the other way around. To show this development in more detail, a version of the two-stage model based on a more elaborate analysis of long-term/short-term memory stores is presented in this section. The model is closely based on a model proposed by Atkinson and Crothers (1964).

Once again suppose that items start out in a state of no knowledge, denoted G, in which only guessing takes place, and once again suppose that, whenever an item is presented, departure from the state takes place with probability α (see Figure 2.8).
Figure 2.8 State-transition diagram for the long-and-short model. The hexagonal nodes are intermediate steps in the analysis of the model, but are not knowledge states of the Markov chain.

This event is interpreted as the initial encoding of the item, which brings it into the subject's short-term store. This encoding is indicated by passage to the small hexagonal node in the state-transition diagram. However, before the next trial, several things can happen so that, when next observed, the item may not still be in this encoded state. One possibility is that the item is transferred into long-term or permanent memory. Represent this by the state L, and let β denote the probability of the transfer. With probability 1 − β, the item stays in short-term store (indicated by a second hexagonal node). If not transferred to L, an item in this state may be forgotten before the next test; let the probability of this be φ. Thus, two additional states are needed: F for items forgotten from short-term store and S for items still retained. In summary, the transition probabilities for an item starting in state G are

P(G|G) = 1 − α
P(F|G) = α(1 − β)φ
P(S|G) = α(1 − β)(1 − φ)

and

P(L|G) = αβ

These sum to 1, as they must.

Once an item has been encoded and has left state G, it will not return; so all transitions into G have probability 0. State L, which represents permanent storage, is absorbing, so is without outward transitions. If the item is in either state S or state F, it has been encoded and is subject to transitions very similar to those following departure from G. In effect, it starts each trial from the hexagonal encoding node in Figure 2.8. Presentation of a forgotten item (in state F) brings it back to the short-term encoded state, with a chance of transfer to L or of further forgetting. To allow
for the possibility that the initial encoding may interact with the transfer to long-term memory or the recovery from F, parameters γ and δ can be used to represent long-term encoding instead of β. Except for this change, and for the omission of the α term, the transition probabilities look like those of the transitions from G; for example,

P(S|S) = (1 − γ)(1 − φ)
P(S|F) = (1 − δ)(1 − φ)

and so forth.

The response mapping depends on what is retained in the various states. Suppose retention is all-or-none, so that items are either known or guessed. Then the response mapping carries states S and L to a correct response, but maps states F and G to a correct response only with probability g.

The model is summarized in the transition matrix and response mapping:

         L              S               F           G      P(correct|state)
   L     1              0               0           0              1
   S     γ       (1 − γ)(1 − φ)     (1 − γ)φ        0              1
   F     δ       (1 − δ)(1 − φ)     (1 − δ)φ        0              g      (2.18)
   G    αβ      α(1 − β)(1 − φ)    α(1 − β)φ      1 − α            g

This also illustrates how Markov states are used to provide "memory" within a model. Although the response properties of states G and F are the same (in both states the correct response is not known), their transition probabilities are different. The distinction between the states allows one element of the history of the process to be remembered: state F distinguishes forgotten items, which have been in state S, from items in state G, which have never been encoded.

The transition matrix of the long-and-short model in Matrix 2.18 does not look like the matrix of the two-stage model in Matrix 2.15.
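To guard against algebra slips, Matrix 2.18 can be assembled programmatically from its parameters and checked for consistency. In this sketch the parameter values are illustrative assumptions, not estimates; the check verifies that every row sums to 1 and traces the predicted proportion correct across trials under the response mapping just described.

```python
# Long-and-short model: build the transition matrix of Matrix 2.18 and
# verify it is row-stochastic. Parameter values are illustrative only.

def long_short_matrix(alpha, beta, gamma, delta, phi):
    """Rows and columns ordered L, S, F, G as in Matrix 2.18."""
    return [
        [1.0, 0.0, 0.0, 0.0],                                        # L: absorbing
        [gamma, (1 - gamma) * (1 - phi), (1 - gamma) * phi, 0.0],    # S
        [delta, (1 - delta) * (1 - phi), (1 - delta) * phi, 0.0],    # F
        [alpha * beta, alpha * (1 - beta) * (1 - phi),
         alpha * (1 - beta) * phi, 1 - alpha],                       # G
    ]

def correct_prob(p, g):
    """Response mapping: correct with probability 1 in L and S, g in F and G."""
    pL, pS, pF, pG = p
    return pL + pS + g * (pF + pG)

M = long_short_matrix(alpha=0.4, beta=0.2, gamma=0.3, delta=0.25, phi=0.3)
p = [0.0, 0.0, 0.0, 1.0]          # all items start in state G
curve = []
for _ in range(15):
    curve.append(correct_prob(p, g=0.25))
    p = [sum(p[i] * M[i][j] for i in range(4)) for j in range(4)]
```

The first point of the curve equals the guessing rate g, and the curve climbs as probability mass drains into the absorbing state L.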
However, the states can be grouped into three sets, {G}, {S, F}, and {L}, with the process passing from the first set to the last, but never backward. This grouping parallels the three states of the two-stage model. The parallel is quite fundamental: in fact (as will be discussed in Chapter 7), the two models make identical predictions. Models with either Matrix 2.15 or Matrix 2.18 fit data equally well. The long-and-short model, with more states and more parameters than the two-stage model, is more elaborate than computation alone requires. In fact, some of the parameters of Matrix 2.18 cannot be estimated. However, if the psychological theory that underlies the long-and-short model is of primary interest, it is the model of Matrix 2.18 that one wants to test. Even in this case, the model in the form of Matrix 2.15 is useful as a computational aid.

Multiple Transition Operators

In the construction of a Markov model, there is no necessity to apply the same transition operator at every instance when the state may change. If several different operations take place in the course of the experiment, it may be best to represent them by different transition operators. Depending on what happens, one or another operator applies. The composite process is the combination of several simpler operators.

This notion has been used to represent both acquisition and interference in a learning model (Calfee and Atkinson, 1965). The models for the paired-associate task that have been described so far have followed a single item and have neglected what happens to other items. Of course, subjects learn more than one pair in an experiment, and between any two presentations of one pair, other pairs are presented. The situation was illustrated in Figure 2.3, which showed three pairs presented between the two presentations of the pair JIV-5.
These presentations create interference and can lead to forgetting of what had been learned about the JIV-5 association on trial 2. A more complete model of the process requires learning operators that apply when an item is studied and forgetting operators that apply when any other item is presented.

Consider a two-stage model in which the intermediate state represents a less durable encoding of the response. When the item is presented for study, the normal transition operator applies:

         L         I          G
   L     1         0          0
   I     δ       1 − δ        0                    (2.19)
   G    αβ     α(1 − β)     1 − α

So far, this is nothing new. Now consider the presentation of an interfering item. The interference has no effect if the item is in L or G, for items in L are permanently learned and there is nothing to forget about an item in G. But if the item is in I, it can be forgotten, that is, transferred back to the guessing state. This operation is described by a second transition operator:

         L         I          G
   L     1         0          0
   I     0       1 − f        f                    (2.20)
   G     0         0          1

where f is the probability of forgetting.
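The alternation between the two operators can be made concrete with a small simulation: the learning operator applies to the item being studied, and the forgetting operator applies to every other item on that trial. The sketch below is an illustration under assumed parameter values, including an assumed forgetting probability f for the interference operator; none of these numbers come from the text.

```python
import random

# Multiple transition operators: a study operator for the presented item
# and a forgetting (interference) operator for all other items.
# All parameter values are illustrative assumptions.

def study_step(state, alpha, beta, delta, rng):
    """Learning operator: one presentation of the item itself."""
    if state == "G" and rng.random() < alpha:
        return "L" if rng.random() < beta else "I"   # exit G toward L or I
    if state == "I" and rng.random() < delta:
        return "L"                                   # progress from I to L
    return state

def interference_step(state, f, rng):
    """Forgetting operator: presentation of some other item."""
    if state == "I" and rng.random() < f:
        return "G"   # the intermediate encoding is lost
    return state     # L and G are unaffected by interference

rng = random.Random(1)
states = ["G"] * 8                    # eight paired-associate items
for trial in range(20):
    target = trial % len(states)      # items presented in a fixed cycle
    for i in range(len(states)):
        if i == target:
            states[i] = study_step(states[i], alpha=0.3, beta=0.4,
                                   delta=0.2, rng=rng)
        else:
            states[i] = interference_step(states[i], f=0.1, rng=rng)
```

Each presentation thus applies one operator to the studied pair and the other operator to the rest of the list, which is exactly the composite process the text describes.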
