8 Individual Decision Making

Colin Camerer

I. Introduction

I review recent experimental studies of individual decision making, with their implications for economics in mind. Decision making is increasingly important for economics for at least two reasons. First, in many economic settings individuals make decisions by and among themselves: consumers save, sell their labor, buy houses and durable goods, form economic and social relationships, and bargain. In these cases the institutional veils separating people from others are thin. A couple decides whether to buy a house from another couple; Mario hires a college student to work in the grocery store he owns; a daughter borrows money from her mother. (In other settings the institutional veil separating individuals doing business is thick: Monique lends money to the shareholders of General Motors through the concrete veil of Citibank, where she has a savings account and GM has a line of credit.)

The thickness of institutional veils is important because there is a strong intuition that institutional forces correct errors people make; the more directly people trade with each other, that intuition implies, the more likely their errors are to persist. Economic analysis has increasingly reached into settings with thin veils recently. Judges are presumed to make law as if they had economic efficiency in mind; there are models in which people optimize marriages, sleep, suicide, and extramarital affairs; the household is modeled as a unit of production; and so forth. In these settings systematic errors by individuals may not be corrected by institutional forces. Studies of individual decision making can help predict when market prices may be wrong and allocations inefficient, and suggest ways to improve efficiency.

Second, economic analysis has also reached into increasingly complicated domains recently. Until thirty years ago there were few formal models with any uncertainty. Weak assumptions about agent rationality were adequate to generate strong market-level results (e.g., Pareto optimality). Now many models presume agents can make choices under risk and uncertainty, over time, keeping in mind subtle game-theoretic effects. As the models grow more and more complicated, agents are assumed to have more and more rationality. Then it is more likely individual agents violate the models; studies like the ones I describe may tell us how and why.

A. Limited Rationality and Decision Research

For the last thirty years or so, most research on individual decision making has taken normative theories of judgment and choice (typically probability rules and utility theories) as null hypotheses about behavior, and tested those hypotheses in psychology experiments. Much of this work is called "behavioral decision research" (a term coined by Edwards [1961a]) or, sometimes, "cognitive illusions" or "cognitive misperceptions." The goal is to test whether normative rules are systematically violated and to propose alternative theories to explain any observed violations.

The most fruitful, popular alternative theories spring from the idea that limits on computational ability force people to use simplified procedures or "heuristics" that cause systematic mistakes (biases) in problem solving, judgment, and choice. The roots of this approach are in Simon's (1955) distinction between substantive rationality (the result of normative maximizing models) and procedural rationality: people behave coherently by following reasonable procedures, but sometimes make suboptimal decisions as a result.
Why Study Errors in Decision Making?

Cataloguing systematic violations of rational models was not always the theme of the psychologists' efforts. In 1967, Peterson and Beach wrote a review of research on intuitive statistical judgment and concluded that people obeyed normative laws rather well. Psychologists began focusing on judgment errors in the 1970s because they thought judgment errors might reveal how people generally make judgments, just as optical illusions tell us about perception and forgetting tells us about memory (Kahneman and Tversky 1982). The same scientific heuristic is used in other fields. The Great Depression, the stock market crash of 1987, and the savings-and-loan crisis are carefully studied for clues about the general behavior of economies and markets. Engineers study bridge collapses and airplane crashes to learn how to build sturdier bridges and planes.

Whether people make judgment errors frequently or not is difficult to judge and, to most psychologists, beside the point. Psychologists study errors because if people use simplified procedures to judge and choose, those procedures may be seen most clearly through the errors they cause. For economists, the frequency of errors is important because errors might affect economic efficiency, and methods for removing errors could be useful policy tools.

B. Two Controversies: Methods and Implications

Since many of the psychologists' studies can be seen as direct attacks on assumptions of individual rationality, the studies are sometimes hotly debated. There are two kinds of debates: methodology and implications.

The conventional methods used in psychological studies of decision making are a little different than the conventions of experimental economics described elsewhere in this handbook. Subjects in the psychology experiments typically are not paid according to their performance, or are paid small amounts; they rarely make repeated choices under stationary replication; and the stimuli are often natural word problems rather than abstract random devices. Replicating the psychologists' findings using the methods of experimental economics has therefore become more popular.

The debate over implications concerns whether errors observed in studies of individuals matter for market-level prediction. Whether they do is an empirical question that experiments can help answer (e.g., Plott 1986). Experiments that generate observations on both individual behavior and aggregate activity are the best raw material for judging whether individual errors persist and matter for aggregate behavior. There are also studies comparing individual and aggregate behavior; they are reviewed below.

C. A Map and Guidebook

My goals in this chapter are to describe the main findings in theoretical terms familiar to economists and to provoke replication, with numerous suggestions for further research. The methodological range of studies summarized in this chapter is perhaps as wide as in any chapter in the handbook. I sprinkled brief digressions about methodology throughout the chapter, at points where they illuminate debate and where the debate provides a context that adds flavor to an otherwise bland discussion.

Other sources include Thaler (1987), who presents much of the same evidence organized as a critique of economic tenets (and see his Journal of Economic Perspectives columns, collected in Thaler [1992]). Edited collections of important articles in behavioral decision theory are Kahneman, Slovic, and Tversky (1982), Arkes and Hammond (1987), and Bell, Raiffa, and Tversky (1988). There are graduate-level textbooks by Dawes (1988) and Hogarth (1987).
Texts by Bazerman (1990) and Russo and Schoemaker (1989) are easier, Yates (1990) harder. A series of articles in the Annual Review of Psychology (most recently Payne, Bettman, and Johnson [1992]) provides an authoritative chronicle of psychological decision research. The chapter by Abelson and Levi (1985) is a rough equivalent of this chapter, aimed at psychologists. New work on models of choice is reviewed by Machina (1987), Fishburn (1988), Weber and Camerer (1987), and Camerer and Weber (1992).

The psychology-economics nexus is covered by the book edited by Hogarth and Reder (1987) (reprinting a 1986 Journal of Business special issue), critically reviewed by Smith (1991). Cox and Isaac (1986) cover a small patch of similar ground.

II. Judgment

A. Calibration

Good probability judgments should match actual relative frequencies. The match is shown in a "calibration curve." For example, in 1965 the National Weather Service began requiring its meteorologists to announce numerical judgments of the probability of precipitation. Figure 8.1 shows a calibration curve for one forecaster, using actual forecasts from several days. On the Y axis is the relative frequency of events (proportion of days with precipitation) for each category of probability forecast shown on the X axis. The number of events in each forecast category is indicated by the size of each point (and written alongside it). The forecaster shown in Figure 8.1 said there was a 30 percent chance of rain on 160 days; it actually rained on about 38 percent of those days.

[Figure 8.1. Calibration graph for forecaster A: probabilistic forecasts of precipitation for the Chicago area. Numbers are class frequencies. Source: Murphy and Winkler 1977.]

Accuracy of probability judgments has two distinct components, calibration and resolution (sometimes called "calibration-in-the-small" and "calibration-in-the-large"). Calibration is how well the event forecast in a particular category (all events assigned a .3 probability) matches the actual relative frequency of those events (60 of the 160 occurred). In a calibration curve like the one shown in Figure 8.1, calibration is measured by how close points are to the identity line (adjusting for sampling error). Resolution (also called "discrimination") is how well probabilities enable one to discriminate between likely and unlikely events. A high-resolution forecaster will have many forecasts in the extreme categories near zero and one. When making predictions is difficult (in long-term economic forecasting, for example), resolution may only be achieved at the expense of calibration, by confidently making high and low guesses that are only partly right.

The judgments of the weather forecaster in Figure 8.1 show terrific calibration (the points are close to the line) and good resolution (most of the observations are between zero and .2). Calibration as good as the weather forecasters' seems to be rare (see Lichtenstein, Fischhoff, and Phillips 1982).
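A calibration curve like the one in Figure 8.1 is easy to compute from a record of forecasts and outcomes. The following minimal sketch (mine, not part of the chapter; the forecast record is hypothetical) groups forecasts by the announced probability and compares each category's announced probability with the realized relative frequency:

```python
from collections import defaultdict

def calibration_curve(forecasts, outcomes):
    """Group probability forecasts by category and compare each category's
    stated probability with the realized relative frequency."""
    by_category = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        by_category[p].append(happened)
    curve = {}
    for p, events in sorted(by_category.items()):
        curve[p] = (sum(events) / len(events), len(events))  # (relative frequency, n)
    return curve

# Hypothetical record: 0.3 was announced on 160 days, and it rained on 60 of them,
# like the forecaster in Figure 8.1.
forecasts = [0.1] * 50 + [0.3] * 160 + [0.8] * 40
outcomes = [1] * 5 + [0] * 45 + [1] * 60 + [0] * 100 + [1] * 34 + [0] * 6
for p, (freq, n) in calibration_curve(forecasts, outcomes).items():
    print(f"forecast {p:.1f}: rained {freq:.2f} of the time (n={n})")
```

Points that lie near the identity line indicate good calibration; a pile-up of forecasts in the categories near zero and one indicates high resolution.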
Some empirical calibration curves based on students' judgments are shown in Figure 8.2. These are "half-range" curves: from two possible answers to a general-interest question (did potatoes come from Ireland or Peru?) subjects pick the more likely answer and judge its probability (which must be at least .5). In general subjects are overconfident. They are insufficiently regressive in judging the likelihood of events. Events they say are certain happen only 80 percent of the time. "Full-range" curves, with subjective probabilities from zero to one, show overconfidence too. (Events judged to be impossible happen 20 percent of the time.) However, subjects are often underconfident when questions are easy (i.e., when the percentage of people answering the questions correctly is high).

[Figure 8.2. Calibration curves for half-range, general-knowledge items. Reprinted from Lichtenstein, Fischhoff, and Phillips, with permission of Cambridge University Press.]

1. Scoring Rules

In most of the studies above, subjects were not directly rewarded according to the accuracy of their probabilities. A "proper scoring rule" is a scheme that rewards probability judgments, depending on the judgment and the outcome of the event being forecasted, in a way that induces truthful revelation of probabilities. An example is the quadratic scoring rule: if the subject reports a probability p, pay her $(2p − p²) if the event occurs and $(1 − p²) if the event doesn't occur. (Or define I = 1 if the event occurs and I = 0 if not, and pay $(1 − (I − p)²).) For example, a subject who reports p = .10 earns either $.19 if the event occurs or $.99 if it does not; reporting p = .20 earns $.36 or $.96. If the subject thinks the true probability is .20, then the p = .20 report earns an expected value of .2($.36) + .8($.96) = $.84 and the p = .10 report earns $.83, so the subject should report p = .20.

Besides being incentive-compatible, scoring rules enable judgments of probability to be elicited without mentioning the word "probability" or defining it. Instead, a subject expresses a probability judgment implicitly, by choosing among various bets. (Scoring rules are sometimes used to grade students for probabilistic answers to multiple-choice questions.) However, scoring rules assume risk neutrality; if a subject is risk averse, her expressed probabilities will be biased toward .5. (Allen [1987] suggests paying subjects in lottery tickets; cf. the discussion of the binary lottery procedure in chapter 1.) And the payoff function has a flat maximum around the true subjective probability p, as the example above indicates, so subjects are not penalized much for misreporting.

The calibration studies described in this section did not use proper scoring rules. However, there seems to be little difference between judgments motivated by scoring rules and unmotivated judgments (Beach and Phillips 1967; Jensen and Peterson 1973). The main difference is that when subjects use extreme probabilities too frequently (and inappropriately), scoring rules punish their mistakes severely and reduce them (Fischer 1982). When rewarded with "improper" scoring rules that do not penalize misreports, subjects learn to misreport probabilities (Nelson and Bessler 1989). For example, suppose subjects who report an event probability p are paid p if the event occurs and 1 − p otherwise. Then subjects quickly learn to exaggerate their beliefs, reporting p = 1 if their true belief is above .5 and reporting p = 0 otherwise.

Scoring rules could be useful in a wide range of economics experiments to measure probabilistic beliefs in an incentive-compatible way. For example, game theories often make sharp predictions about beliefs that are difficult to test indirectly (e.g., beliefs after "out-of-equilibrium" events that should never occur and that actually occur only rarely). Some researchers have used scoring rules to elicit beliefs of subjects in games. The only published economics experiment that uses them, that I know of, is McKelvey and Page (1990).
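As a minimal sketch (mine, not from the chapter), the following code implements the quadratic scoring rule defined above and the improper linear rule discussed at the end of this subsection, and confirms that a risk-neutral subject maximizes expected payment under the quadratic rule by reporting her true belief:

```python
def quadratic_payment(report, occurred):
    """Proper quadratic rule from the text: pay 1 - (I - p)^2 dollars."""
    indicator = 1.0 if occurred else 0.0
    return 1.0 - (indicator - report) ** 2

def linear_payment(report, occurred):
    """Improper rule from the text: pay p if the event occurs, 1 - p if not."""
    return report if occurred else 1.0 - report

def expected_payment(rule, report, true_prob):
    return true_prob * rule(report, True) + (1 - true_prob) * rule(report, False)

true_prob = 0.20
reports = [i / 100 for i in range(101)]
best_quadratic = max(reports, key=lambda r: expected_payment(quadratic_payment, r, true_prob))
best_linear = max(reports, key=lambda r: expected_payment(linear_payment, r, true_prob))
print(best_quadratic)  # 0.2: truthful reporting is optimal under the proper rule
print(best_linear)     # 0.0: the improper rule rewards exaggerating toward 0 or 1
print(round(expected_payment(quadratic_payment, 0.20, true_prob), 2))  # 0.84, as in the text
print(round(expected_payment(quadratic_payment, 0.10, true_prob), 2))  # 0.83
```

The flat maximum mentioned above is visible here too: misreporting .10 when the true belief is .20 costs only about a cent in expected payment.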
2. Confidence Intervals

Other studies have elicited confidence intervals for quantities (the length of the Amazon river, next month's spot oil price) instead of probabilities for events. In these studies confidence intervals are typically too narrow; subjects seem to anchor on a point estimate, then adjust upward and downward by too little. Fifty-percent intervals included the true quantity only about 30 percent of the time; 98 percent intervals, only 60 percent of the time (Alpert and Raiffa 1982). Subjects can learn to spread their intervals out with intensive feedback and training, but they never make high-probability intervals quite wide enough.

Many studies have examined the effects of expertise on overconfidence. A lot of these studies were motivated by the impressive performance of weather forecasters, shown in Figure 8.1. Researchers then became curious whether other experts were equally well calibrated. Professional accountants' intervals around estimated account balances of client firms are good (Tomassini et al. 1982). Weather forecasters' intervals of high and low temperatures are precisely the right width (Murphy and Winkler 1974). But intervals around estimates of physical constants, published in physics journals, are systematically too narrow (Henrion and Fischhoff 1984).

There are mixed effects of expertise in studies of event-probability calibration too. Students and professionals forecasting outcomes of basketball and baseball games are poorly calibrated (Yates 1982; Ronis and Yates 1987). Novices, statistical experts, and blackjack dealers are equally well calibrated (Keren 1988), and expert bridge players are better calibrated than novices (Keren 1987) at judging the probability of winning a hand given certain cards. Physicians are accurate in some settings and poor in others, especially diagnosing rare diseases (Yates 1990, table 4.1; cf. the discussion of the base-rate fallacy in section C below). Betting odds at horseracing tracks are well calibrated, with a slight but persistent tendency to overestimate the chance that longshots will win and underestimate the chances of favorites (Ziemba and Hausch 1986). (Curiously, the opposite betting pattern occurs at Hong Kong racetracks; see Busche and Hall 1988.) Forecasts by professional economists of the chance of economic downturn are pretty well calibrated one quarter ahead, but the calibration gets much worse as the forecast horizon extends out to four quarters (Braun and Yaniv 1992).

There are a few cross-cultural studies of calibration. Asians seem to have high resolution (they use extreme probabilities a lot) but are very badly calibrated (Wright et al. 1978; Yates et al. 1989). Some psychologists think differences in the role of chance and bravado in Asian and Western philosophies and cultures might account for the differences.

Recent studies found an important difference between "local confidence," the appropriateness of a single confidence interval for a single quantity, and "global confidence," the fraction of several intervals that contain their true quantities. Subjects were university employees and students who were asked ten questions about local operations (e.g., what is the current value of the university's land holdings?). The subjects' 90 percent confidence intervals were too narrow (as usual), but their global confidence was not bad: they guessed that about five of ten intervals contained true quantities, when only three of ten actually did (Sniezek and Buckley 1991; see also May 1986). These results suggest an important difference between the psychological process of constructing a judgment about a single quantity (or event) and making a collective guess about several such judgments.
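The interval results just described are coverage calculations: the fraction of stated intervals that contain the true quantity should match the nominal confidence level. A small sketch of that check, with hypothetical intervals and truths rather than data from the studies cited:

```python
def coverage(intervals, truths):
    """Fraction of (low, high) intervals that contain the true quantity."""
    hits = sum(low <= truth <= high for (low, high), truth in zip(intervals, truths))
    return hits / len(intervals)

# Hypothetical 50 percent intervals from an overconfident judge: only 3 of 10
# contain the truth, the kind of gap Alpert and Raiffa (1982) report.
intervals = [(40, 60), (10, 20), (5, 8), (100, 120), (0, 1),
             (30, 35), (70, 80), (2, 4), (55, 65), (90, 95)]
truths = [50, 25, 9, 110, 3, 31, 66, 10, 70, 99]
print(coverage(intervals, truths))  # 0.3, well below the nominal 0.5
```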
Most of us are probably overconfident about the chance of publishing our next article in a leading journal or teaching a brilliant class tomorrow, but are more level-headed about how many of our next ten articles or ten classes will be similarly successful.

The pervasive finding that subjects are (locally) overconfident may have important economic implications. If people underestimate the width of distributions of future quantities, they will underinvest in flexibility and insurance, which might have implications for equilibrium models of rental and ownership of housing, choices of mortgage terms (adjustable vs. fixed-rate), marriage and divorce rates, managerial investments in manufacturing flexibility, and so on. Underestimation of variation might help explain why so many small businesses fail because of insufficient cash flow (stemming from overly narrow planning, perhaps; cf. Kahneman and Lovallo 1993).

Recent studies of calibration and confidence have rekindled debate along three lines. The first idea is that part of the apparent overconfidence could be caused by probability judgments that are correct on average but contain error (Erev, Wallsten, and Budescu 1992; Soll 1993). The second claim is that calibration researchers may have selected sample questions nonrandomly, oversampling "tricky" questions in which natural cues yield the wrong answer (such as the Peru-Ireland potato question), and hence producing more overconfidence than is present in natural settings (Gigerenzer, Hoffrage, and Kleinbolting 1991; Juslin, in press). Some new studies sample questions differently and reduce apparent overconfidence, but Griffin and Tversky (1992) and Soll (1993) sampled randomly and still observed overconfidence.

Third, Griffin and Tversky (1992) suggest a framework to organize many empirical results on confidence. They point out that evidence has both strength (or extremeness) and weight. In several studies they find that judgments of confidence overemphasize the strength of evidence (compared to a Bayesian probability benchmark) and underemphasize its weight. Their framework can explain the observed difference in calibration for hard and easy questions (people underweight the strong weight of evidence in easy questions) and conflicting results on expert calibration (experts will be highly overconfident in unpredictable environments, where they overweight weak evidence), and it predicts some other phenomena.

B. Perception and Memory Biases

Machines are natural metaphors and benchmarks for human perception and thinking. The metaphor of man as an information processor now dominates cognitive psychology (e.g., Lachman, Lachman, and Butterfield 1979). It has proved fruitful by suggesting coherent theory and many empirical tests. Can people record events as cameras do? Are memories stored like films in a library? Does information processing proceed in steps like a computer program?

However, much evidence suggests that human perception deviates systematically from the camera benchmark and memory deviates from the computer benchmark. (My goal in this very brief section is to inform readers about some shreds of evidence, to whet their appetites, and to suggest ways the data might matter for economics.)
For example, Bruner, Postman, and Rodrigues (1951) showed subjects glimpses of playing cards in which colors and shapes were deliberately mismatched: hearts were black instead of the familiar red. Subjects thought they saw the familiar cards (red hearts). Errors of this sort are systematic, not random: people more often err by mistaking unfamiliar patterns for familiar ones than vice versa. Put more formally, errors in absorbing information appear to be correlated with how unusual the information is. Misperception of surprising events implies that agents will misperceive outliers that signal regime switches or turning points in a time series. Their expectations will not be rational (in the sense of efficiently using available information) because the processing of new information depends on the stock of old information, or familiar images.

There are many biases in memory too. When guessing which cancer claims more lives or which journal to submit an article to, people sample their memories. Sampling memories is a natural and reasonable heuristic because our memories are a sample of life. But even if our life sample is random, the sample we retrieve from memory will not be random because memories are not equally retrievable or "available" (Tversky and Kahneman 1973). For example, the most pleasant and least pleasant memories are more easily remembered, which creates illusory nostalgia (Holmes 1970). Personal and concrete experiences are often overweighted (Nisbett et al. 1976). For example, Kunreuther et al. (1978) found that the purchase of earthquake insurance rose after a quake (though the probability of a subsequent large quake actually falls, because stress on the fault line is relieved). The availability of personal experiences is thought to create "egocentric" biases in judgments of fault (both spouses think they are responsible for more than half of their household chores, or arguments [Ross and Sicoly 1982]; and two sides in an experimental dispute both think a judge's settlement will favor them [see Babcock et al., in press]). Memorable media reports cause biases in judgments because media coverage is not random (e.g., Greenberg et al. 1989). For example, Combs and Slovic (1979) found that newspapers vastly overreport accidents compared to diseases, and people think deaths from disease and accidents are equally common. (In fact, deaths from diseases are 15 times more common.)

Availability can limit imagination and make theories, lists of words, or "fault trees" appear more complete than they really are. In a study by Fischhoff, Slovic, and Lichtenstein (1978), students and auto mechanics underestimated the probability of "other causes" in an incomplete fault tree listing reasons why a car would not start. Similar biases in imagining contract contingencies might lead contracts to appear overly incomplete.

C. Bayesian Updating and Representativeness

When the probabilities people judge are conditional, as in updating belief in X after learning M, they should follow the prescription of Bayes' rule:

P(X | M) = P(M | X) P(X) / P(M).

Computing probabilities using Bayes' rule is complicated. People seem to use simple heuristics instead: they anchor on P(X) and adjust it to reflect M, or they judge P(X | M) by how "representative" X is of M (Tversky and Kahneman 1982). Representativeness will be a useful heuristic because representative values are generally more common than unrepresentative ones. (Eagles are less representative of the set of birds than robins, and less common.)
But judging likelihoods according to representativeness neglects some features that are normatively important according to Bayes' rule, including the base rates P(X) and P(M), sampling properties, and regression effects. Other features that are not normatively important loom large in representativeness-based thinking. Representativeness therefore creates several systematic departures from Bayesian judgment, or biases.

1. Underweighting of Base Rates

A famous problem used to study Bayesian judgment was introduced by Kahneman and Tversky (1972):

A cab was involved in a hit and run accident at night. Two cab companies, the Green and the Blue, operate in the city. You are given the following data:
(a) 85 percent of the cabs in the city are Green and 15 percent are Blue.
(b) A witness identified the cab as Blue. The court tested the reliability of the witness under the same circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 80% of the time and failed 20% of the time.
What is the probability that the cab involved in the accident was Blue rather than Green?

In the experiments, subjects are given the problem exactly as written above, often as part of a package of problems. Their probability judgments are recorded, and they are paid a small sum for participating (or course credit, in some cases).

The median and modal response is .80. It appears that subjects think the witness's judgment is representative of the actual color of the cab, and its representativeness leads them to confuse P(identify Blue | Blue) (from the court's test) with the probability that is asked for, P(Blue | identify Blue). According to Bayes' rule, the posterior probability that is asked for, P(Blue | identify Blue), should reflect the base rate P(Blue) = .15 also; but the base rate plays no role in the logic of representativeness. When the base rate is included, the correct posterior probability is .41.

In these problems, and others like them, base rates are usually underweighted and often entirely neglected. Studies show that when attention is drawn to base rates, by varying the base rate in several versions of the problem or presenting base rates in causal form ("15 percent of the cab accidents in the city involve Blue cabs"), subjects take base rates into account but still underweight them (Ajzen 1977; Bar-Hillel 1980a; cf. Koehler 1989).
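For readers who want the arithmetic behind the .41 figure, here is a minimal sketch of the Bayesian calculation for the cab problem (illustrative code, not part of the chapter):

```python
def posterior_blue(base_rate_blue, hit_rate):
    """P(Blue | witness says Blue) by Bayes' rule, where hit_rate is the
    probability the witness identifies either color correctly."""
    p_say_blue_given_blue = hit_rate
    p_say_blue_given_green = 1 - hit_rate
    numerator = p_say_blue_given_blue * base_rate_blue
    denominator = numerator + p_say_blue_given_green * (1 - base_rate_blue)
    return numerator / denominator

print(round(posterior_blue(0.15, 0.80), 2))  # 0.41, not the modal response of 0.80
print(round(posterior_blue(0.50, 0.80), 2))  # 0.80: the modal answer would be right
                                             # only if the base rates were equal
```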
The cab question is typical of stimuli used by psychologists to study judgment. Word problems describing natural events are used to escape from the limits of earlier traditions that emphasized more abstract stimuli, on the sensible presumption that the psychological processes people use in everyday life can be better understood by asking people questions drawn from everyday life. The use of the word problems raises some methodological concerns for both economists and psychologists. For example, economists might wonder whether base rate neglect affects asset prices in markets; some studies answering this question are reported later in the chapter. We return to the methodological concerns after describing a replication of the base rate studies.

Grether (1980) studied base rate neglect in an abstract setting with three bingo cages. A draw from the first cage (whose contents were known) determined whether state A or state B had occurred. The state A cage had four N balls and two Gs; the B cage had three Ns and three Gs. Subjects observed a sample of six draws from whichever cage had been chosen (A or B) and were asked to decide which cage was more likely. For example, a subject might observe a sample of one N and five Gs, then choose whether to bet that the draws came from the A or the B cage. The process was repeated several times, with fresh samples each time. At the end of the experiment, one trial was picked and a subject earned $10 if they had picked the right cage on that trial.

Using logit estimation, Grether found that subjects weighted the base rates P(A) and P(B) less than the likelihoods P(sample | A) and P(sample | B), as representativeness predicts, but they did not ignore the base rates entirely. Subjects also thought P(A | sample) was especially high when the sample was four Ns and two Gs, exactly matching the contents of the A cage (and similarly for the sample matching the B cage). Previous experience with a particular sample, or experience combined with monetary incentives for accuracy, reduced representativeness bias slightly but did not eliminate it. Concerned that Grether's subjects were not properly motivated, Harrison (1989b) replicated Grether's experiment with a variety of financial incentives. He found little evidence of representativeness among subjects with experience or financial incentives. There is no obvious way to reconcile the disagreement between his results and Grether's.

Grether (1991) extended his earlier work in three ways. In one experiment he was able to bound the degrees of belief in random events by having subjects choose between a bet on the most-likely cage and a bet on a chance device (so that choices revealed whether beliefs were above .75 or below .25). In a second experiment he elicited probability judgments with a variant of the incentive-compatible Becker, DeGroot, and Marschak (1964) procedure (see chapter 1 for a description). Choices in both experiments were affected by representativeness. Probabilities elicited with the BDM procedure in the second experiment were often far too low or too high, but on average they were fairly close to Bayesian (within .05 to .10).

In a third experiment the A and B cages each had ten balls and samples of four balls were drawn. Assuming four-ball samples cannot be representative of ten-ball cages, representativeness should not affect judgments. In this experiment, judgments were quite different than in the first two experiments: sample information was underweighted rather than overweighted (see the next section on conservatism). Grether concluded: "This (difference) suggests that in making judgments under uncertainty individuals use different decision rules in different decision situations," a "contingent-judgment" hypothesis espoused by many psychologists (e.g., Payne, Bettman, and Johnson 1992).
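The Bayesian benchmark in Grether's task is a binomial posterior. The sketch below (mine; the equal prior is illustrative, since Grether varied priors across trials) computes P(A | sample) for the cages described above, with cage A holding four N and two G balls and cage B holding three of each:

```python
from math import comb

def posterior_A(n_N, n_G, prior_A=0.5, p_N_A=4/6, p_N_B=3/6):
    """P(cage A | sample) when draws are made with replacement."""
    n = n_N + n_G
    like_A = comb(n, n_N) * p_N_A ** n_N * (1 - p_N_A) ** n_G
    like_B = comb(n, n_N) * p_N_B ** n_N * (1 - p_N_B) ** n_G
    return like_A * prior_A / (like_A * prior_A + like_B * (1 - prior_A))

print(round(posterior_A(4, 2), 2))  # 0.58: sample exactly matches cage A's contents
print(round(posterior_A(1, 5), 2))  # 0.15: one N and five Gs, as in the example above
```

With an equal prior, even the sample that exactly matches cage A's contents yields a posterior of only about .58, so treating an exact match as near-conclusive evidence for A overshoots the Bayesian benchmark.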
A Digression on Methodology: Psychology and Economics

Grether's experiments are designed to address many criticisms some economists have of methods used by some psychologists. The Bayesian judgment problem was operationalized using physical devices (bingo cages) rather than a vignette like the cab problem. Subjects made choices rather than simply reporting probabilities; they were paid $10 if one of their choices, randomly selected, was correct. (In Grether [1991], a typical error cost 5 to 20 cents.) Subjects made repeated choices, with an opportunity to learn; in the psychology studies, subjects often answer each question once because the purpose of the experiment is to study initial intuitions, not learning. The existence of some errors was reasonably robust to all these changes in conditions in Grether's data, but not in Harrison's. Incentives reduced the number of incoherent and outlying responses (Grether 1981; cf. Smith and Walker 1993).

The difference between psychological and economic experiments should not be overstated. In the 1960s, long before Grether's work, psychologists and others used random devices to study judgment (Edwards 1968) and used the BDM procedure to study valuations (Lichtenstein and Slovic 1971). Even recently, there is substantial overlap across disciplines in methods, and substantial variation within disciplines. However, the typical differences in methods are worth analyzing because they usually follow from different background presumptions about human nature and different target domains investigators hope to generalize to. It is presumptuous to argue that either general method is superior.

For example, many psychologists are curious whether people can recognize and apply statistical rules in everyday situations, like the cab problem, in which the statistical structure is not transparent. They often use vignettes or problems drawn from natural settings (rather than problems based exclusively on random devices) because (1) they want to learn how people reason about natural events and (2) they think people may reason differently about natural events and about random devices. Given these interests and presumptions, word problems are well suited to doing their research and bingo cages are not. Economists are interested in different questions (not how people reason but whether people violate Bayes' rule) and are also more inclined to presume that reasoning about bingo cages and taxicabs is similar. For these purposes, cages and dice are better because they lay bare the statistical structure (making detection of a Bayesian error clear) and are presumed to be good substitutes for word problems.

Another area of typical difference is financial motivation of subjects. Psychologists do not always motivate subjects financially (though many do, and a few are adamant about doing so) because incentives usually complicate instructions and psychologists presume subjects are cooperative and intrinsically motivated to perform well. (Natural stimuli are also thought to keep subjects mentally involved and raise their intrinsic motivation, which substitutes for financial motivation.)

Repetition is another area of typical difference. The psychologists' tasks are often not repeated, with stationary replication, because psychologists are often most curious about initial behavior in a complicated environment. In addition, many psychologists think stationary replication overstates the frequency, speed, and clarity of feedback the world actually provides. Economists tend to think oppositely: they are mostly curious about equilibrium behavior (the last period, not the first), and they think extensive laboratory feedback is the best time-compressed imitation of the strong learning forces present in natural settings.

To reiterate, there is substantial overlap in the way psychologists and economists do experiments. When their methods do differ, very roughly speaking, psychologists use natural stimuli, do not pay subjects, and do not repeat tasks. Economists pay subjects, prefer blandly labeled random devices as stimuli, and insist on repeating tasks. My view is that these different methods are preferred by different investigators because they effectively produce answers to different questions. Broad-minded students of individual decision making should have a healthy tolerance for variety in methods.
(And variation in methods is essential to gathering data to determine whether different methods do affect behavior substantially.)

It is worth noting that judgment errors, like those revealed in the cab problem, have been a lively topic of research within psychology too (e.g., Cohen 1981). Many of the arguments made in that literature are like those economists have made about methods or interpretations of results. For example, Gigerenzer, Hell, and Blank (1988) used physical devices to operationalize base rates and found some reduction in base rate neglect (though Grether, and others mentioned later, also found substantial base rate neglect using physical devices).

A more interesting argument is that some apparent biases might occur because the specific words used, or linguistic conventions subjects assume the experimenter is following, convey more information than the experimenter intends. An example is the famous "Linda problem" (Tversky and Kahneman 1983). Subjects are told the following:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Then they are asked to rank several statements about Linda by their probability:

Linda is a teacher in elementary school.
Linda works in a bookstore and takes yoga classes.
Linda is active in the feminist movement. (F)
Linda is a psychiatric social worker.
Linda is a member of the League of Women Voters.
Linda is a bank teller. (T)
Linda is an insurance salesperson.
Linda is a bank teller and is active in the feminist movement. (F&T)

Any ranking of probability should satisfy the conjunction law: Linda is less likely to be a feminist bank teller (marked F&T) than to be a bank teller (T) or a feminist (F), since the event F&T is a conjunction of the events F and T. In fact, about 90 percent of subjects exhibit a "conjunction fallacy," ranking the event F&T as more likely than one (or both) of the events F and T, usually T. (In a sample of well-trained Stanford decision sciences doctoral students, 85 percent made the same mistake.) The standard psychological explanation is that the description of Linda is more representative of a feminist bank teller than of a bank teller; subjects mistakenly think it is therefore more likely that Linda is a feminist bank teller.

The potential linguistic problem is this: in the presence of the statement "Linda is a feminist bank teller," subjects might think that the statement "Linda is a bank teller" tacitly excludes feminists; they might think it actually means "Linda is a bank teller (and is not a feminist)." If subjects interpret the wording this way, none of the statements are conjunctions of others and no probability rankings are wrong.

The linguistic interpretation can be tested in several ways. For example, use a between-subjects design in which some subjects rate the T statement without seeing the F&T statement (and vice versa); or replace "Linda is a bank teller" with the clearly comprehensive "Linda is a bank teller, who may or may not be a feminist," or with the more specific "Linda is a bank teller (and is not a feminist)," and see whether conjunction errors persist.

In fact, the purely linguistic interpretation appears to be wrong. Tversky and Kahneman (1983) tried both the between-subjects and the clearly-comprehensive variations and still found persistent conjunction fallacies. Others manipulated subtle details of wording and found no substantial changes in some conjunction problems (Morier and Borgida 1984) and some error reduction in others (Krosnick, Fan, and Lehman 1990).
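The conjunction law itself is just a consequence of the fact that feminist bank tellers are a subset of bank tellers: in any joint distribution the conjunction cannot be more probable than either component event. A tiny illustrative sketch (the numbers are made up):

```python
# Any joint distribution over (bank teller, feminist) respects the conjunction law:
# P(T and F) can never exceed P(T) or P(F).
p_T_and_F = 0.02
p_T_and_notF = 0.01
p_notT_and_F = 0.60
p_notT_and_notF = 0.37

p_T = p_T_and_F + p_T_and_notF
p_F = p_T_and_F + p_notT_and_F
assert p_T_and_F <= min(p_T, p_F)
print(p_T_and_F, p_T, p_F)  # 0.02 <= 0.03 and 0.02 <= 0.62
```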
2. Underweighting of Likelihood Information (Conservatism)

A second bias is underweighting of likelihood information, or "conservatism." Conservatism has been observed in Bayesian updating tasks like the one Grether studied. Consider two bingo cages, A and B. Bingo cage A contains seven red and three blue balls; B contains three red and seven blue. Suppose each cage is equally likely. Suppose a sample of eight reds and four blues is drawn (with replacement, of course), which clearly favors the A cage. What is P(A | 8 red, 4 blue)? The typical response is between .7 and .8, but the Bayesian posterior is actually .97. Subjects are far too conservative in drawing conclusions from samples like these. One estimate derived from experimental data suggests that it takes two to five observations to produce a perceived diagnostic impact equal to the Bayesian impact of one observation (Edwards 1968).

McKelvey and Page (1990) ran a study in which subjects observed different parts of a full sample, then reported probability estimates to each other. After hearing the estimates of others, people reported new estimates (taking into account the estimates of others), and so on for several rounds. (This iterative process resembles the aggregation of information through polls and other processes; see McKelvey and Ordeshook [1985].) They observed some conservatism in updating of probabilities. Eger and Dickhaut (1982) found some conservatism when accounting students simply reported probabilities, but the effects were substantially reduced when stimuli were described in an accounting context. The conservatism also disappeared when subjects revealed probabilities by betting against an experimenter in a way that penalized Bayesian errors. By contrast, Sanders (1968) found conservatism using a proper scoring rule with no financial incentives.

At first glance, the conflict between evidence of base rate neglect (reviewed in the last section) and conservatism seems to indicate that people use Bayes' rule on average, but sometimes they weigh base rates too little and sometimes too much. This is a weak justification for adhering to Bayes' rule as a descriptive principle in all circumstances, if one can predict the situations in which the two errors occur. (By analogy, a light jacket is the wrong thing to wear on a trip with stops in Alaska and Taipei, even if it is appropriate for the average temperature of the two places.) Whether errors are predictable across situations is the crucial empirical question.

There seem to be several reasons why base rates are underweighted in some settings (the taxicab problem) but sample information is underweighted in others (the conservatism experiments). Base rates are incorporated when they are salient or interpreted causally, as they are likely to be in the conservatism experiments.
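For concreteness, here is a minimal sketch (mine, not from the chapter) of the Bayesian posterior for the two-cage example that opens this subsection, the .97 figure that subjects' typical reports of .7 to .8 fall so far short of:

```python
def posterior_cage_A(n_red, n_blue, p_red_A=0.7, p_red_B=0.3, prior_A=0.5):
    """P(cage A | sample) for draws with replacement; binomial coefficients cancel."""
    like_A = p_red_A ** n_red * (1 - p_red_A) ** n_blue
    like_B = p_red_B ** n_red * (1 - p_red_B) ** n_blue
    return like_A * prior_A / (like_A * prior_A + like_B * (1 - prior_A))

print(round(posterior_cage_A(8, 4), 2))  # 0.97, versus typical reported beliefs of .7 to .8
```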