SAGE UNIVERSITY PAPERS
Series: Quantitative Applications in the Social Sciences
Series Editor: Michael S. Lewis-Beck, University of Iowa
Publisher: Sara Miller McCune, Sage Publications, Inc.
Series/Number 07-103

DATA ANALYSIS
An Introduction

MICHAEL S. LEWIS-BECK
University of Iowa

SAGE PUBLICATIONS
International Educational and Professional Publisher
Thousand Oaks London New Delhi

Copyright © 1995 by Sage Publications, Inc.

CONTENTS

Series Editor's Introduction
Acknowledgments
1. Introduction
2. Data Gathering
   The Research Question
   The Sample
   The Measures
   Data Coding, Entry, and Checking
3. Univariate Statistics
   Central Tendency
   Dispersion
   Central Tendency, Dispersion, and Outliers
4. Measures of Association
   Correlation
   Ordinal Data: The Tau Measure of Association
   Nominal Data: Goodman and Kruskals' Lambda
   Dichotomous Variables: Flexibility of Choice
   Summary and Conclusion
5. Significance Testing
   The Logic: A Simple Example
   Applying the Logic: Bivariate Measures of Association
   Critical Issues
   Summary and Conclusion
6.
Simple Regression 41 YasaFunction of X 41 The Least Squares Principle 43 ercept and Slope 45 Prediction and Goodness-of-Fit 47 sivapnis afayjoo Jo ajdwes © uo wep wos qe onwapene vydxa 01 moy—aid rune uo pasea "woreatdde ay “noysnonqy A “pun e payoo| are -osse Jo somseaus wot] Pox Jo sainseau!) Ssoyyy toofoud ypueasot ‘Arona ve spenuawsepuny f ‘uorsuaqardiuos asour tp oy uopstioixe axa ‘pasiaboe st uoyssax8az aetseatg Uo 21 ‘vatgang “08 Ayaiagduioo tou Inq “UOHEYIP St YOIYA “UoIssasTax a1ewEA puryssapun 0} xs0mpun0s3 S9pIAOI xo] Juouasnsvaur ‘yswour “Bua * 12} nok ‘SayqeLEA oat Uaoiniog (4 S,uossead) 12 ne|2us00 ayp anoge ‘Kes “Furuswoy Ae Uuonepunos usyy © 221nboe 02 fi rooidde 0], sishyeue uosssaia1 afdninus ovr Ajorerpawuuy dunt pue iyfeud soa0 diys oF ‘siuapnis Zuruutfoq Xq woyo epeut ‘oyeIstw © st idaiog arous 303 $9019 Sumpring se warp Huss ft pgs Auous yo Bunuz09f 3 sn put sfoct 2604 azInb" OF 21109 2M OP MOH AS ou) sanjoaut sissyeue e1eq, ‘U0 25[8j 830 sang SuuE=U jEUE rep 30.488) 9H, geod 10t op size} ndog oraetyoq weusny Jo pLHom ay wos}—SuON .298q0 [votatduio—s}ae} ae U>dh EI¢p ZoudIos [eIo9s ,9e4 50 dno, se (,unuep,, Jo rend) ep, sougop Cowon tT 2D MONS OGRA NOLLDAGOINI SMOLIGT SLUTS Le toWyny au NOGY gL URI ZL suondunssy worssoxSoy aus, Ipuaddy “6 ZL suopepusumosy 8 TL uolsnjsues pure Aseusung 19 AUsBOLTTOON, 09 sayqurea A Lg soug worreayroeds s¢ _ ronwog jeonsteig Jo ys aid es uopssartoy adn “L € pin Arewuing V 6p sveAroruy 20% Tree ee ee ee Data (No. 85, B Tendency and Varia (Wo. 7, Reynolds), Analysis of Ordinal Data (No. 8, Hi Rosenthal), Measures of Associc metric Measures of Association (No. 91, Gibbons); on sig see Tests of Significance (No. 4, Henkel) and Understanding Si Testing (No. 73, Mohr); on regression sce Applied Regres in Multiple Regi Inseraction Effec Wan), Regression Diagnostics (No. 79, Fox), Understanding Regre: Assumptions (No 92, Berry), and Regression With Dummy Variables (No 93, Hardy). (No. 
32, Liebetrau), and Nonpara- te Regression in Practice (No 50, Berry & Feldman), m (No. 72, Jaccard, Tutt Michael §. Lewis-Beck Series ACKNOWLEDGMENTS well, [extend thanks to my Wi the equations in the text. 1 Jowa deserve recognition, for previous versions of this material; and the students of the TARKI Summer ‘School in Budapest, who were the first to be exposed tothe (almost) final version, Last, I want to thank C. Deborah Laughton, Editor at Sage, who has helped me vastly on this and several other projects 1 yenqeas pue s59}98 01 Siiliqe Apead ® ioyaseasar aauayas je190s TuwUtBoq ‘yy 2418 01 St 2194, ‘tou anbjut}oai jo Aro1seu sarensuowop Oy ‘ydesZouour sip Jo snd} 2x1 "yareasas uroyy sowi0d if u2y ‘uy pasqaraxa 9q Ayprewnyn saw wawiSpny A.9noo saunpanoid [eunioy Uo Aju 0} tay SuIOUy ‘Ss uy mou 1242102891 Poop POLI 109100, Hf JO ‘301 Buyai8 “feustojur ore S12XNO 515 ‘ap ‘sounseou pur sasayiod Ay pat ¥otssnosipquononasu jo 2pou a pjnoys “owes: Ixy soo ggof 241 UO ‘ay soIAeyaq uewny Hoge LoRsaNb e yaEM sU1B2q YozeDS91 dOUDE NOLLOMGOULNI'T moj fo kussoatupy ‘MOUE-SIMAT 'S TAVHOIA uononponuy uy SISXTVNV VIVG tests appropriate for the question at hand Chapter 2 considers sk of data gathering, offering a data set to be used throughout fe the various analytic techniques. Chapter 3 introduces univariate statistics, the mnship Chapter 6 focises on simple regression, in which a dependent vari fluenced by an independent variable. Chapter 7 explicates multiple regression, in which a dependent variable is influenced by several independent variables. Chapter 8, by way of conclusion, offers analysis recommendations 2. DATA GATHERING ides the data process, which typically goes through the following stages: 1 Sample Measures Coding IV. Entry V. Checking Each of these stages is discussed bel research example. 
Although data gathering is prelude to analysis, the importance of its proper execution cannot be overemphasized. (For an in-depth treatment of the data process, see Bourque & Clark, 1992.) [...] rings true. No amount of sophisticated statistical manipulation will make up for the deficiencies of bad data.

The Research Question

What causes something to happen? [...] for investigation. They ponder possible explanations for the [...] theories. To help sort [...] case of Y. In that [...] hypothesis [...]. Of course [...] case, we would expect data analysis to support the following [...] value, then Y tends to [...]. [...] overarching research question is “Why do some students do better [than] others?” In pursuit of an answer, we construct a hypothetical but realistic [...]. [The] Academic Affairs Office of Wintergreen College asks us to investigate the determinants of success on the [...] and has provided funds for a student survey. Unfortunately, as is virtually always the case in social research, the available budget does not permit the [...] (the incoming class) to be surveyed. Therefore, a sample must be drawn.

The Sample

[...] method here, which we employ, [is] simple random sampling (SRS). The population of the 1st-year class of Wintergreen [...] time and money allow us to [...], yielding a sampling fraction of [...]. [...] 1st-year student list provided by the [...] with a random start is referred to as [...]. Therefore, from the alphabetical [...] Registrar, we select every tenth name, beginning [...]
8 a t u ra é 9 te : 6 oe ; zw @ 4 ® @ 8 a tt sx i se w es 6 zw °* i s o 3 si rn a 96 “ e 3 « st a #1 rn a ss ui ss " 16 or a 6 «& & B “ 6 5 a s ¥s * & £ * z % 1 > 5 4 WS dW rnd eee Bie aoyion wsasBiaLULM “reaave® ‘Kem Suumoitoy oq ut ‘afaq]09 ut povcons oF ol 124 10 si avenyeAxs.o1 Hopmis om S4SE* (WS tttRTOD Tz 2LqRL) nS 91qUTEA a41 HO§ OF PaAsaS y>MGAS “WI 3uQ ‘saIgeREA iar0d 104} UO sosuodsas papjark Koxins wapmes yf, s .0uyej ay inoge Suryse ‘uonsonb jo pupy aures aya sasod wai} puosos y [52004 yo soquir lie ‘poasooax soyiout snok &es nok pina >sasue Uap], {204380} jooqss jo s1e9k KUEW MOH, {ky are souewssojsod 180 "UBA quapuadeput [e:DAas Buydrea ays 905 am "yy UUIRIO9 *| i uo 2102s wopms Kq pans 24 WUT 2 Sf JOqUIMG LEIS WopuEs YENI H9AID CTERGL] UOHTE 295 S018 ~ojoporpats Bay dares jo afesan00 Jog zumpavard uopd2qor anouaists Xp TABLE? Continved "Some students puta fot oftheir fre time into studies, even on the weekends ‘Others think their free timeis their own, What about you? In terms of studyi extra hours, would you say you are willing, not wi in in Table 2.1, columns G,'R, and C student survey data, we also have at able information from the student's college admission ap falue for our stud} 18 with the others we the variables measured have discussed. A\ (ot measurable) in ou analyze on subsequent pages. + obvious, consi Data Coding, Entry, and Checking The values of the different variables should be efficiently stored, for ready computer use. For certain variables, the values to be recorded are ofthe directly gathered rimerie scores themselve3. To takeanexample, the value to save isthe point score larly with Parents" Education schooling —is the one to enter However, (the symbol employed to indicate the v the number—average years of other variables, the value code response category. Cl vate a particu: ideed, Table 2.1 appears much like a typical data file from an inter- active personal computer s for coding errors ‘Thi rectly? 
A review of the figures in Table 21 reveals, encouragingly, no codes, mei ues outside ofthe possible range of scores on each ble (e-g., no one got a score of “| Student Motivation). Howeve recording of a response categor student may have responde: but was improperly coded as “un cach member of our research teat woneaniout jo vunowe syenbo 2109s ueau 340 NOW WapMS xp KLM ‘a[dureKa JO ‘uoneraudiowyy dieys si s9so aBeaane oun ‘siumoa ajqeiaidianuy wen runes .59P00,, P2!990 fifear are 3|qwseA [PU;pIO SI4) UO SarODs ayy 2sNeo9_, 01098 ut 3 Jo Uo} pareanow 5e 231 (le POxUE!) puoods aw WOK APMIS 0} PoEAHOU aHOWN yuapms 1s1y) 942 YeKN apa|s4O9 Om. ,/papro9pe sfes juapmis 2u0 31 (0 = P02) , = apo) 9s jeuipuo ue Suoje posnseou 10} ‘21qeueA wexa-douesua ayy, “worstoaid Jo Jax] YBty e aACy ‘ouIOOU 40 ae se yons ‘Sajgeuea aarsoniuond) “Kouspuoy [enus0 Jo storeorput soup axe ojqeyaudiojay auout aq) “ivowsnseau oy) astooid asou yt ‘aojaq pansind 24 Iya ota “S94 sin ‘samoue JO “(Ly puE “zp ‘ST SON ‘siuapmIs aaxi A{uo 405 anyea 2M Ajrenise Sx ysis) spout atp weMy s9se9 asour Kueus Uo posed st af “uoseo! uo 10x ‘URDU amp Yord Aiqeqosd pinom aM “a}geLTEA stun WO auapuar enuea aun azueuruins 0 seznseauu aap asou) Jo auo Isnt 199198 0} p2210} axa aa J] uoHopoudsowuy re|UNS © SPIOIK TZ Jo 9pOUE OUL (92 PUL Sz om Joa8esoqe ue suasoudas fipemve ueypaus aut “2404 STF se ‘roquimi uae we st 971s oxdumes oun uoqmN) “SZL St ‘BAOgE JreK PLE ‘0799 sree cnn ‘100s werpaur se, -ueaUEaqp Kq HAs ywopmys jeoidAL at 16 i 1 Sh wuLJuoD SamMseat BAGTEUAAITE ay squared 294 20 s1y pue wapras SuwOoUE ot 203 wonouusoyu ajqenyea st sup ‘283n09 jQ 2ouBULLO}ed ,POO8, q ‘uy Zuqusmy ‘urexo eoursqua aq) Uo [19m Atsy s90p aBajOD v3 majouad 1ysit aug ‘s}eadsoxd “ 6E-0L PUE "POO L494 ., 68-08 "49119" fam. 
01-06 Bus098 stuapets a sod QOT 211 wonbady yar 901 pout up ‘onyen aezane agp st (Woot 2 ‘pou pu ‘ueipaur ‘ueaur ae Aowoptat fouapusy (82919) 2 WOM “FaUO} 24D WEAN 20 ‘mojaqJ0u}0 a4 wou} 2Uo “oe SuONTEATDS fodsip jo sainseayy, Sutueaut Kseurwns euiown Susprrosd ‘suoreAsosqo ott a9 Jo sunseayy $0109 Jo pratds zit puodas ay ‘91095 fea at Sus30009 141 aff O}SLadsyp pus <2uapwa) 1911429 ren Aue jo stoadse oni aye sso] EP pIDtK saap95 yo saimaea} Koy 24) Jo Uo! SOLLSILV.LS ALVIAVAINA “€ Jo 2g pynom suaydeys uanbas -qns ut podopsaap sonbruyoa) sisKyeue oxi Jo woneaydde ayy “wojqord rep Buysiui snotago Ue tm pany d10m om pue ‘arRD atp UDG TOW SHUT tte pay Jonsue 0) esnyo4 Jo s9[ao4gytp aun seumuf> AyeUass pire SoHE ‘ofoqe9 asuodsau ajeudosdde aan dojaaap sn padjay Buyuuryd 2oueape smo 4 sreusoyfesostp 280up Jo 240 potoa|2s aqdures ay ut wopmas 2p Uo ep Suysstur ou are oxoyy ‘Ky Sursudins rea au 30} cep Burst ow re asoyp*KOAINS 3M et ceececreceer C coceece « 6 The other level of qual in terms of presence orabsence of an attribute. Further, that attribute cannot be ordered, or scaled. Usual examples are region, gende n. Our study records the Rel column R). The variable has three categories—Catho Protestant (code = "). The scores have no ‘meaning at all, other than to designa a1" says the student is Protestant, not Catholic or Je does not mean thatthe student possesses more of something than astudent withthe score of “0,” Iter words te sores do not imply any ordering of a property, from “more” Sete n of the mean or the m« cease, the mode i ics, compared to 20 Pi igion in this sample, we might re equals 4, indicating the proportion ofthe sa the average score for the Community Type variable is 40, whi that 40% of the students in the sample come froma. dichotomous variables can be rat bach as qu lows dichotomous variables a more flexible rariate analysis Dispersion How spread out arc the scores on a variable? In our particular study, how . 
For quantitative variables, range measures the distance st to the lowest score. With the quantitative variable of ys the range is 70 (.¢.,99 29). This wide range suggests, atthe students differ greatly ty. For ive variables, the range is better understood as the count of the choices, or categories, recorded. To take an instance, the ordinal variable n, resides between these extremes. It has 46% of we middle eategory ("I"), with 26% on one side (a decided” is the most selected something else tur to the situation of the nominal variable, Zero spread occurs he values are in one category However, concentration cannot be ipty middle,” as it was in the 07 is not ordered and therefore has no’ I variable, maximum spread occurs when peo at 0} sons asoip Bunstddy (soseo ogi Jo ggg Buywsousas ou 01 199d up9 nok “tsi “stoNPIA9p puepuers 96 | Aire nieingp puspueis Z 8049 puokog) suoney 4p6 "ueaus ay) Woss uoREIAap psEp 2 Ue $9500 ot Jo 289 MOgE IEW A yo ajdures © ua4a 104) 995 2m SOAIMOH yeyMOWOS B PIOYE plnod om YSIM IUTIUE IM OS = N caxdiaqu 24 wo roedun -pats09 juewodurt we nur ay 01 da ppe yyy “west dues ou Buy 1 sjuypay) paseiqun'aq a1 oes N69 Burpraip “uorsan0> siqt ‘mou yon sanjeA PoAsasqo W 2 wopoai{ fo 22182p | dn posn 2aey am ‘yn 20 Aressa90U st UeML FOND [= teoneuraiyeu ayy sn 52 nop pamptas 1 67 J0 1002 a2enbs @ dulexo ino u] -2oueties ay Jo 100s axenbs lap puopuots ap aye|nope9 9m “SeBeIUEAPE eu aye SounsTau aS HIOg JO F9RELUEAD Tuonejasdsay Asva ow sey 9 Zaquun a “a}LwrExS ‘oounssuy soy ulueans aaqmyUE ow sey AE YOUN St 98e 3s atqnsisap asoi 9484 SoOp “a>ueLEA (09 up BIn99x9 oF WEN AIeNIUE Ke us S52, SUE StF p amnjosqe a2eaa48 oe} 20) astenbo BS 2801240 21 PU032S pauidis agi asneaeg ‘0192 Jo Uo! 
ABaqens stun oxzmopy 9Hes9ne U = 9 ~ 2) weal 21p tuosy anpEA YoeD Jo LONELAAP ot OTe s{ asindut s1y y cueDut sig) punoze sex00$ jo peasds at sjenba sa1oos aso Jo ueow 94] Ot ‘L ‘SZ :Swol{o} Se ‘sareNpes8 [ooKHS {gary ug- Bao] 3n0} Jo uorreyndod jets € uo pasmseaut ‘wox Sossef> WOH oyquyiea au uiSeUy 2391 DOW {anyeA Fuuoyoue uorstadsip yesaua8 axp yo ‘a0 opnso v Kypornuupe ‘asus © $2 ‘4 aonoeid jeissaaonuoa stay. erep ({eurWOU 1949U 1m) nofta sauanowos st uoreINap pIEpUeTS ap Teonoend yoseasor ut (Jaap wuotuasnsesur Tunooe oMu Super “peozds pue ‘Aouapuat ena Jo samseous Krew Jo woueasi vsapiAord’Z661 3294512 ‘yderouou siqy 30 suorewmuy aoeds 217) puosoq ave pur pasn $59] yput Jatiag “wiaigosd saypino we Kons Uy ‘ ayqeuea wo 2108 aBe peordAs ay) yo aunseaut zeus 200d © 4 “Eye sjenba ucous at ‘opdurexa 2jdunss sit uy somseaut no vo 294: snousas @ aney we9 SJ2HTINg 980 UO St poremdos Suuseadde (2) 12uB4y 1x0 ays OS} IMO [19% SH “BE ‘AHIPA SEL “O alquutea uo a8e Jo s1604 205 Sanya Jo sOas poyse Bur2q st oxpads ap uy ‘2su9yop pood sazinbar us ‘130i ‘poyoau 34 fyjfaneuroine tout wt J &pmis z9pun ue up anoqe pauvay8 2q 0} VOLE, yeuts0u a Yeu o1voEpuUL Host Js Ssaumoys ax OYIOBOA WOHEL ( mojaq passnosip a8 suo eusou ss9y uorInguisip at 2peUt fj 2]qouea ayy 3u:38oj Jo poyiat Krousosno amp Aq ajduses 24 anoadurt 0} voy [oquauiuedxa we £q paosoyutex S| uotsnyouo> S141) PIL ‘uoig a1ous outst ssoumoys dunsaBSns Bur.oo yerp wos} 205 2940S (Cp JO) us anyea ainjosqe aq (69.4 ‘2661 "AID ¥ anbmnog) ‘sauoong fjfenioe uorngunsip aq 21029 anfeA eynjosge jes ssoumays ay) TBA “quuTyi-jo-o[Al © SB “PURO} 1d yeai8 & asod you sop Arepunog soda ue 0 aigenyea aun Se ySnompry (62 4‘086 *(gandeyp iva 998) 522005 pu taoisnjauos siqi searojutas me ith these transformations, and variable transforma do not always have a meaningful vvatiable Q, measured in years of meaning. However, transformed variable @ is measured of yéars of age, which has no intuitive sense. 
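A minimal sketch of the point about transformed units, assuming a square-root transformation of an age variable. The ages below are hypothetical (they are not the monograph's data), and the last value stands in for an outlier. The sketch shows that the transformed scores sit closer together, and that squaring them recovers the original years.

```python
import math

# Hypothetical ages in years (illustration only, not the study data);
# the final value plays the role of an outlier.
ages = [18, 19, 19, 20, 21, 64]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Transformed variable: measured in "square roots of years", not years.
sqrt_ages = [math.sqrt(a) for a in ages]

# Squaring the transformed scores retrieves the original values.
recovered = [s ** 2 for s in sqrt_ages]

print("mean age in years:", round(mean(ages), 2))
print("mean of square-root scores:", round(mean(sqrt_ages), 2))
print("recovered ages:", [round(r) for r in recovered])
```

The transformed mean is harder to interpret directly, which is the cost the text describes; the round trip shows that nothing is lost by transforming.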
For reasons of interpretation, analysts sometimes avoid transformations. Still, they should recognize that the original value can be retrieved, if so desired. For instance, here, if the square root score is squared, we are back to the original age in years. (Also, variable transformations are sometimes useful for other purposes, as shall be seen in the multiple regression chapter.)

The third strategy simply recognizes the outliers but makes no attempt to exclude or alter them. The argument is straightforward. Because the study is well designed, with a large random sample and good measures, the outliers are to be trusted as representing genuine values in the population. That they may affect the statistics on central tendency and dispersion is in the nature of things and should be duly reported, even if the task of [...] is made more difficult. [...] that holds regardless of whether [...]. The suggestion, then, [...] across both further data gathering and diagnosis.

4. MEASURES OF ASSOCIATION

The central question, for much social science research, is how one variable relates to another. Does social class have anything to do with political participation? If there is a relationship between class and political participation, is it strong? Although these are questions from a specific area of research in political sociology, they represent a broader, generic set. How does variable X relate to variable Y? Is that relationship strong? Below we offer different measures of bivariate association, in order to help answer these questions.

Correlation

When two variables are related, changes in one tend to accompany changes in the other. Suppose that as the values on X become higher, we observe that the values on Y generally become higher.
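As a rough sketch of how "values on X and Y rising together" can be put into numbers, the following computes a sample covariance (with the N - 1 divisor) and Pearson's r as the covariance divided by the product of the two standard deviations. The scores are hypothetical and chosen only for illustration; they are not the Wintergreen College data, and the function names are assumptions of this sketch.

```python
import math


def covariance(x, y):
    """Sample covariance of two equal-length lists, using the N - 1 divisor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def pearson_r(x, y):
    """Pearson's r: covariance scaled by both sample standard deviations."""
    sd_x = math.sqrt(covariance(x, x))  # covariance of x with itself = variance
    sd_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sd_x * sd_y)


# Hypothetical scores: Y generally rises as X rises.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print("covariance:", covariance(x, y))
print("Pearson's r:", round(pearson_r(x, y), 3))
```

A positive covariance gives the sign of the relationship; dividing by the standard deviations bounds the result between -1 and +1, which is what makes r comparable across variables measured in different units.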
We would conclude that there was a positive relationship between X and ¥, This seems to be and Academic Ability, according to the scatterplot of ovr sample data from the Winter- sgrcon College study in Figure 4 | scores are arrayed on the ¥ ax scores on the X: jepresent the stud cach is located by the intersection of perpendicular lines, respectively, from ‘Academic Abilty (AA) 191} 909 worTe|a1s09 at ayer 8 ‘64: sienb> jati09 ajdures aya ‘Xpnis a9q[09 380 Uy “pa 15 [Cues 81 PUB ajdues ayy oe 45 5 su ew ot, ales ‘sajqeyea ‘paziprepueys 30 ‘23008 piepusts uo peindwos aoupysmsos ojduies ayy sjenbo Kjarou prepueis uy passaidxa OU SF OTQELEA OM tye = (6 RI-) $86 I~ = (6 2/9) sss1008 prepares 1 Su oauos MOU UeD 9m ‘6 Z SEM UONEIARp prepuers atp BuNtero4 “> PUE “p= Jo ‘hiaauoadsas ‘wea omy wods suonetaap 3 Jo ueau aU. 01 pur *z ‘¢ Z Jo sal00s jeutSt10 pey “sassejo uorTeanp2 pe yo sreak “a[qetea uonejadod ayp ory ur ‘atdeyo Isey aun wo4s 195 exe ajduys Kian aun afer ‘ajduiexo ue sy ‘uonelaap propues 24) £q papiaip 226 (Gnoge ['puonenbg ur o1qeres soresounu w 998) weou sip wHO3s suOHEL Aap ‘2uy“paqjea are Aaqy se '62102s panpunss 2sayy 01 ]qeUA B JO 23098 a4 1298 “oo of, swqun woneraap pmepueis o14t parianitoa sojavuea yiem poreino|eo BOUELIBAGD ay2 SE IUDIOT]S00 UOREILOD oxp JO UML OF {A}AIey SE AT ‘onsHEs Tey si uaroifo0o won mjaxc03 ayy, ArepumnOM 2 1 feaau0ou jun quawosnseaw 0} snovsoduat diysuonejas jo onsnitis Grwwins ‘onypa 228] & uo 23H Ue ‘wer w soonpasd isnt eyo (0 ‘4) uBis up BuURUsO}9p 20} aygenyeA st aoUELIeA aydues sig ut 100 ‘Zg-c¢ syenba Ania otwapeay pue 199 douRURAD aif, (40}EWHSD PeseIquN TN Aq wor soul jo srouposd ogy ! 
oy) w0¥y Aap ay Te sasn Kyeoneuiaisks ansteIs aouEHEAOD a4] IN ___taoueueno = (A ax-0 K A puex so[qeisen taoastog Ms ‘oounseson ajduues ay) sayeinoje9 e[nws0] BULAO[LOS (Buss & ut (Jou 40) 3 odds pusy siuespenb sau0 91 ‘UHOg StU jespenb yseayvou ayp wos} 225 2M x Uo a8 asoup Tey st worReI2adx9 au) ‘aansod st diysuonja: 2 ft (BEL = X “PIL = 4) So1098 uBaut oadsas otf Mo[>q 0 2AOE Ue 10 paseg‘stuespenb anoy crus roid ‘q uaiussasse rensta astooud osour ‘Aingy aquiapeay wo sonjen sry ancy uo sanjea yain 3x6y os swuopms rey sieadde od sisoBns siuiod so iaieos sry, yder3 yt wo ods anbyun e 3 fe sey {1 ‘ON wWepuodssy mapmg ‘2}duiexa 10g “stxe y>ed Uo anteA sit ised in Chapter 7) Ifthe relationship is not linear, ent poorly estimates it and should not be applied ty, the scatterplot should Obyiously, 10 ai always be const ‘The term “correlation” is a common shorthand, and used without qu: lhe above coefficient. “Pearson's 7 who led its development at the end of the 19th century), has received more appl the ideal bivariate measure for quanti consideration Below we examine 1 tual ‘about the presence of a fed to demonstrate this conclusion ity resides with the level of measuremer ‘strength (Doubi themselves ) The diffi there are so few values on X and Y well-spread over the, le Instead, are dense bunches of points on a scarce number of interpr th ordinal data analysis, the contingency table scatterplot as a preliminary means of evaluating the tabulation of Stadent Motivation and Advisor ‘each of the cells, no student wit likely to “fai” by the advisor. customarily regarded as independent (X)_AL the side is the row va customarily regarded as dependent (Y). The theoretical argurnent is that TABLES Ia ‘The Observed Relationship Between Student Motivation and Advisor Evaluation, Total N'= $0 Advizor ‘Student Motivation Evaluation Ror Wilting Undecided Fail 46% 20% o o . 
Undecided 465% 48% 51% 6 ay Succeed 2% “) Tout 100% A first test of this hypothesis comes from evaluation of percentage difeences he ests stighforvar, provided the tables leny st, idependent variable at the top (A common mistake is to place it hen attempt to apply the estas if it were inthe column. Always ‘check to see that the independent variable is the column variable ) In each columr ion, summing to i the percentage who selected the different dependent variable given the particular independent variable For exam- the independent variable category of “not * on Student jon (score of 0, column likely to (Score of 1); and the ions to be grea is so? To answer the qu sor Evaluat usamrag diyjsuonejas ansod apetapour v “jes9K0 “Sateoxpur SIy) “gE = ge) ietp 2919890 9m "21 4 21g Jo 9jdwexs evep (enIse 94201 gener wag) ‘s2y10 Yoe. jo iwopuadapuy 19m 4 pu x J uh AW “0-1 = 4-1 Ou “QL'y ataUL Jo € x € BUI UE parotdap re se Yon ‘diysuoneras woapiad 40g ( apmnutew wasayp KyBys JO saquinU e spjax4 {Ayyeasn yous 3-nor “yuotouj909 parsnpe reysauuos e Lodas 01 124916 Kou ase seas 10u st ajqed at ayn AaepUOg [wOHaIOaKN on) 0 CerarMerato) (ex) ten “(69 49961 ‘aearqar7) eaus0y BupmOMO} YL UL UDds Se ‘SoH YOR I= 40) 0 1+ 40 punog toque sy yeods 01 os ‘stsaxpodAy ano , itm spoouoa,, stiapaus Jo aed yong ‘woneneag Jostapy uo Joyfiy sal0Ds oSfe pue ‘juopmis se|nonsed s2qioue UeW) uoneAnoWy IeapMig Wo JOU sry sa!00s wapnis auo ‘ojdusExXD og sese9 Jo ued sunpioouos v spraik (4 < "4 pie "y < Ix ‘Kuqiqissod 2U0 SW “A PUD X so[qoUeA uo saroDs Yat Yoes “Lf pul 1 siuepuodsor asoddiig “ou e 18 aed v “foams w 0} sqwopuodsor sip mnoqe UG, st you op saouaxaysip aeyuooiod ot ui st ret es—,202}20d 101, pu hi Luo AioBayeo 124814 w 01 asour e Aq parueduioooe sXeafe ‘Saou € ‘2I9}4) 1 b a1g8L UE ciysuon|ar reuoZo1p yee ut papzooes Aauanibasy at ajsod jou str ayn. {RAIN YOU SE pur X USAT 1sa8Sns saoussayip aBewoasod 25a ae" ue spfaiA Siu. 
Siuepass pareanyout siuapmis Buoure “sourDnx9 ay n9q atp S94 2sur94R ‘sejusaiod “popisapun, *2 0 © 10}) Z¢ 01 g wos} 90% KroRarE9 Josiape IsoyBq siyy ut oBeyTdoLId er oy fe oun ope 24 4 9194 'LON. 1 prsoons, popinepan PepR=P serinaop 075 05 =N [PO], woHTENTeag JOSIAPY PUE Uo ‘usomsiog dlysuone|ay JeUIpIO 19ajs0d jeaNeKIOAAH V ar yatavi 26 TABLES te : ‘A Hypothetical Nonmonotonic Relationship Between Student Motivation and Advisor Evaluation, Total N = 50 ‘Advicor Evaluation ot Witting Undec Wiling Pai 100% “> Undecided Secceed 100% oe Tout 1008 100% ‘nox NOTE: The erm nthe ibe ae defied aso Table Pearson's r, appropriate applic ship estimated was linear. With ordinal vari- xr precision. However, they are ‘or example, when a relations! tend to be accompanied by increases in ¥. (Itis not expected, though, that each inerease in X produces the same numerical increase in ¥ Thus the mi linearity condition ) Ob: ‘two ordinal vari- shows Here one observes, in her X values do not always tend to lead to hi second X rank to the The relationship between X and ¥, even though perfect 1 measure of association, su Nominal Data: Goodman and Kruskals’ Lambda satwo nominal variable: In assessing the relationship be measure of association such as tau-b. makes no sense to apply an ordinal 2 TABLE42 “The Observed Relationship Between Community Type and Religious Alfiliation, Total W = 50 i Community Type Religious Afliation Urban Rural cathotie 30R tat 3) © Protestant 30% er o ay Jewish a o Total Tot NOTE: The tem inthe tbl av defined ain Tbe 4 te ial variables lack the ordering property of “ ation ean be pre Therefore, in the table, Community Type independent variable (X) position, and Reli side Are the two variables associated? Look at perce {reading across). Fully 55% of the rural students arc Protestant, as com- pared to 30% of urban students, for a percentage difference of 25 points ink. 
However, the other comparable percentage ty 20 points for Catholics and 5 se percentage differences do not give a clear descrip- ip between community and religion What is needed as Goodman and Kruskals’ 983, pp. 16-24) ‘whose calculation and us how moch we can 4 34 Uo *SSa]DYROAI TPOADH YF sooUDHO}}Ip oH. swi0jdx9 ue y>qMo}ouoUIUON Jo aouasaid ata Se YONS “IED 2 tu sonapiqns yo 100, Sau sys woneo4tdde anssnyoxa stow pu“ 0 saunseaun ayeqreasg Jo wyeau a4 3sneyx9 10U Op sao!0yo asoq ‘951009 JO ‘epquie] 20} sje> wouroinseou jeujuou pur ‘ney 10 seo yuowasNseaW! [eUIpz0 “7 s,uosiea 10j ste uoK! web “esaUe8 Uy soqqeuea oq) Jo wwowaunseaut so foxot LUOREIOOSSE ay aiMSeaL! O} SOYSIM JOYDZERSAI 99U9195 [EI90S OM “Ue, uojsnjpuo9 pur Kinuuing saydoyo 1x0u oy) ut da woyer st Yor “Buns o0ueD uswares ® yans pusysidwos oy, {ueDt eqn Soop rey Ing, THEOYTUTES, OU sem NsoH OMT EMI MOYS PINOM 1501 & 38 uy dn paddod , yp, s9quunu axa ‘aoueyo £q 3s sou ajdures asorp Jo siseq axp Uo “soqqeLIEA omy a4 Ua94NI9G 4! 
ow ayp wsuyeBe andue 01 nay} Waas plnom 1 LEAL iuapais so uonejadod ay ut ‘paapuy 0197 01 25010 ne[au09 axp no SuIML E SY ‘sajqeuen oma asauy atejauioo ‘fyarepdosdde ‘Aews an Jo sigeuea eanzinuenb ayy oF ayeIa1 19pue8 s20q (9 = Kaiooyayp ® Sf Topp JO 9|GEEIEA 94, “2589 1 ‘spincad Apmis ano Jo eiep aULE ‘Pareyno|e9 29 pINOD.« UE LR ‘1am £ 51 (0) pu ayeudardde 2q pinom mea © woth fe 1 sajgeyea om) ot s{ aotoyp uowerDosse ay ueKp “x Aes “AUIOIO¥: yo avo pt saying -s,uosieag e Koyduto Aa ave 4 pu X Woy say aarrernuenb 0} feurps0 din 9xous am se popuoyse st ay snorfrjoy pur odAy Aagunuwioc uoaisiog oauro> passond ag 01 suor8ia1 ruapms a10u 2 vruguios sanopays aun Jo a8payMouy rein BAro5q0 9m “LT =UPqUE] OOK “pyduixe sno uy qe 1 diay ou st x Huumouy wom '9 = Bpqui jt ‘2tH2.1x9 raino ayn ry Jo uontpaxd ssaynoy swore x BuLmoury Hamp OT = ePaweT 31 ‘ausnx9 2u0 ap IY 0.01 OL WOH] aBue Epgtue] JO sonIeA 21RHsS0q x Supnouy jousiou tol Yup wouy{ss0159 wonsip>) LU = 6zrs = epauset ‘99H “ePUIN P>qAgEL euonod § Aq sou Buymouy “122240 G+ cam snd sowed 9 snd suesoiorg 6 200° poonpat x sau (smog ang “Rouanbasyaso8se] 2m) $04 UT ‘sv ABayens owes axp mo}[0} NOA ‘40119 voHoIpaxd 22 Xx Y900 UIA f $0 ‘ouanbays up Jo 98paqmouy SAOTTE YA ~Sonjea x 291 U0 uoniewHoyu mp paprrord osye axe nox ep 9UEBEUHY MON, ‘toutes tou are oye stu2prNS 67 9k AfautUt ‘70310 a1qBIEPISUOD fusoyut te yong, SIUOPTAS 17 HI soyesouad ABajens uonotpasd 100d ng ‘Koutanboxs 19830] 24) nu no "1049 aru Jo) 4 Jo wonngunsip Aouanbauy 1280 amp Ajuo SuymoUy, wapms Yoe9 rorpasd 0} ysta Nok asoddns IIL Zp AGEL JO ny 29p1su0D ‘Y MOUY 2m 29u0f Furoepazd wts0419 No sap 8% tf creeeccce c ccececccce ¢ Ccoecececcccrs 30 ger magnitude than tau. As a case in point, from ots report of the lationship between Student fon and Advisor Evaluation, tau-b = 38 but gamma = 58. 
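The gap between the two coefficients can be sketched with a small pair-counting routine. This is an illustration under stated assumptions: the ten paired ordinal scores are hypothetical (not the study's 50 cases), and the function names are invented for this sketch. Gamma divides the concordant-minus-discordant difference by untied pairs only, whereas tau-b also counts pairs tied on just one variable in its denominator.

```python
import math
from itertools import combinations


def pair_counts(x, y):
    """Count concordant, discordant, X-only-tied and Y-only-tied pairs."""
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: ignored by both measures
        elif dx == 0:
            tied_x += 1
        elif dy == 0:
            tied_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    return concordant, discordant, tied_x, tied_y


def gamma(x, y):
    """Goodman and Kruskal's gamma: ties are left out entirely."""
    c, d, _, _ = pair_counts(x, y)
    return (c - d) / (c + d)


def tau_b(x, y):
    """Kendall's tau-b: pairs tied on one variable enlarge the denominator."""
    c, d, tx, ty = pair_counts(x, y)
    return (c - d) / math.sqrt((c + d + tx) * (c + d + ty))


# Hypothetical ordinal codes (0, 1, 2) for ten respondents.
motivation = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
evaluation = [0, 0, 1, 0, 1, 1, 2, 1, 2, 2]

print("gamma:", round(gamma(motivation, evaluation), 2))
print("tau-b:", round(tau_b(motivation, evaluation), 2))
```

With few categories and many cases, ties are unavoidable, so gamma's smaller denominator yields the larger magnitude.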
This is not atypical and comes [...] neglect of “tied” pairs. [...] is evident [...] square [...] alternative to lambda (Liebetrau, [...]).

[...] confidence is [...] calculated on our student [...] us about the real relationship [...]. For instance, does the tau-b = .38 [...] and the Advisor Evaluation variables in the [...]? [...] a sample, we would expect [...] right. But how far off can we say it is? Could the true value [be], say, .17? Could it even be near .00, with the [...]? [...] relationship between variables X and Y in a [...] significant?” If a hypothesis [...] have anything to [...] estimation of that simplest of population parameters, the mean. Such a background makes it easy to grasp the interpretation of [...] for the more complex, bivariate measures of association. ([...] treatment of significance testing, see Mohr, 1990.)

The Logic: A Simple Example

[...], let us explore another variable that happens also [...] and returned books. The [...] interest is in the degree of delinquency [in] library use, as indicated by late returns. (There is also a [...] with some [...] abuse.) Every student surveyed was assigned a current Library Book Return score, equal to the number of days early (-) or late (+) they brought back their last book. (Thus, a student who was 3 days overdue would receive a score of +3, whereas a student who turned in the book 2 days early would receive a score of -2.) One hypothesis is that the typical 1st-year student is rule-abiding and returns books right on time, in which case the average Library Book Return score would equal zero. An alternative is that 1st-year students generally fail to return books on time, in which case the average Library Book Return score will be greater than zero. (An [...] alternative, even if remote, is that they actually tend to return the books [...] zero.) These compose two hypotheses, a null and an open-ended alternative, about [...] the mean of Library Book Return days (variable [...]): that mean is zero or it is not. We may [...]

$H_0\colon \mu = 0$
$H_1\colon \mu \neq 0$

[...] Library Book Return [...] large enough that [...]
(We have sampled off the mark, 10 do so), In our sample, suppose we estimate the mean of i019 $1 wwaur uowejndod 83 241 Jo 094-5] ay UL sKep LIMO, 9 80} MONTY 2m Kes “Dou SUL es tin) teu, : Cece ete sean eae annL ee 8 Aone toe ee ee mh 6 AIM. Coma ers fo Cuonbouy 9 spa ‘pokease are sarewtsa Ueaus ay) 30} S9OTEA WALA}IEP ay se Buoy “1'¢ 2intg 29s) onan poussoU e mofo} 43 ord vay “axos10yy %t“uvaur vorrendod 2 ayes ‘soioas omy ‘Kioseg ‘sonadoud 91qury7eu9 sydures 2 1 ‘ajqutzea tuo $21098 01 yunOUE S $ pimom soqewse SeUsH9WOS 30 ‘oge at Souatos ‘aures 4) 29 OF a1 (one toad ou pnom ani wan ood AzegrT wea oUt JO SIEL ND or 128 0s pue ‘suas 10 50 (9s = A sean) setdues HOPS aid jewsy ys0saqUt Jo snes amp Sumeusts> ou ‘xvsp 29m 204 SUBELIL OF 0 109 0} pao an “UO! noge sa wou! ued ys ajduves © moy soH99 puenszopun OL mova 1 souwoytudis v Kidde smu 9 supaur poreuinis> oyun am yiyan Jo stusuoduros ay ny toeope a jo uegusia Buns 1 2508s Ty overdo 34 3” directly to our rival hypotheses. Assuming the Let us apply 2" of the above Z-score formula: X= tL [53a] 2-3Dq) een 15.36) SD.%) ero. Therefore, we ree pall hypothesis and entertain the rival that, generally speaking, 1st-year students ae late library books The only difficulty with the above conclusion of statistical significance is that the test assumes the population standard deviation of X, Library Book Retums, is known (see again the numerator of Equation study, or almost any ol do have at hand - 154) 35 tion, rather than the kle, in terms of testing, is that 0 account, to select just the right grees of freedom need to be ta istribution For our case, the -st 49 degrees of freedom. To carry out a significance test at the 0S level, we may refer toa table (a the back of any statistics book) and find for Z, of 1.96. These critical values are wi ‘once the sample size passes 30 or so.) 
If our t […] and we know why we are […] same for the bivariate st[atistics …] complicated and the […] the measures of association […] which the null […] hypothesis of a [… relat]ionship […] More formally, […] distribution with N − 2 degrees of freedom. […] it has a samp[ling distribution] close to normal (see Gibbons, 1993, pp. 2-24). Most computer packages give an approximate standard error for tau-b, which can be used […]

The chi-square test

Suppose […] variables are unrelated or, more precisely, […] the population. Then, given a random sample of the variables, we have certain expectations about the observed cell frequencies in [the table]. In general, the observed cell frequencies would [reflect] the underlying independence of the variables. Take, as an example, Table 4.2, [on] Community Type and Religious Affiliation. […] Are the observed frequencies far enough from the expected frequencies that we must reject the hypothesis of independence between X and Y? The chi-square test answers this question. If the chi-squared value, as in the formula below, exceeds a critical value, we reject the independence hypothesis. […] is rejected. For the case at hand, $\chi^2 = 3.19$. This falls short [of the critical] value of 5.99 (at 2 degrees of freedom) necessary for significance […]. We cannot reject the […] hypothesis […] test results from the lambda, we […] conclude that Community Type and Religion are related.

Above, we have considered significance tests appropriate to different […] though the particulars vary, according to the […]: Is the relationship significant? If "yes," reje[ct the null hypothesis (there] is probably a relationship). If "no," do n[ot reject the null hypothesis (]there is probably not a relationship).
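To make the chi-square reasoning concrete, here is a minimal Python sketch. The function names and the 2 x 3 table of counts are hypothetical (the study's own table is not reproduced here). Only the critical value of 5.99 at 2 degrees of freedom and the reported statistic of 3.19 come from the text.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Cell count expected if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (count - expected) ** 2 / expected
    return stat


def degrees_of_freedom(observed):
    """(rows - 1) x (columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Hypothetical counts, NOT the study's data.
table = [[10, 20, 30],
         [20, 20, 20]]

CRITICAL_05_DF2 = 5.99  # .05 critical value at 2 degrees of freedom (from the text)

stat = chi_square(table)
df = degrees_of_freedom(table)
reject_independence = stat > CRITICAL_05_DF2
print(f"chi-square = {stat:.2f} on {df} df, reject independence: {reject_independence}")

# The value reported in the text, 3.19, also falls short of the cutoff.
text_value_rejects = 3.19 > CRITICAL_05_DF2
```

The comparison against a fixed cutoff mirrors the table lookup described above. A statistics package would instead report the exact probability level for the computed statistic.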
These, of course, are the bare bones, and no such set of rules, mindlessly […] always suffice. The application and interpretation of significance […] careful judgment […]. Below, we cover a number of central issues with which the careful researcher must cope.

Critical Issues

When a student is init[ially …] significance testing for measures of association, some conventions may seem arbitrary. Why do we focus on rejecting the null hypothesis of a zero association when different values could be posited? One reason is that it offers a benchmark other social scientists understand and accept. Another is that we seldom agree on what those different values would be. Researcher A might [argue] it be .2, Researcher [B might] argue for .4, and Researcher C might suggest .6. After [all, …] because we are uncertain about what the value is […] and […] undertaken in the […] important cumulative evidence of the nature of the relationship between an X and a Y, and certainly, in principle, they can lead to the specification and testing of rival nonzero hypotheses.

Another perplexing issue, at least on […] commonly chosen. Why not select […] risk of being wrong […] times out of […]? The change of level from .05 to .01 reduces our risk of Type I error: rejection of the null even though it is true. (Under .05 we are wrong 5 times out of 100, but under .01 we are wrong only 1 time out of 100.) However, […] that by this […] risk of Type II [error …] is false. This is […] can be tedious.
Fortunately, however, virtually [all computer] packages automatically provide for each measure the […] statistic and its particular (as opposed to preset) calculated probability level. Although this is reassuring, one must be careful not [to be] misled [by the] numbers. A common beginner's mistake is to read "prob[…]" […]

Summary and Conclusion

In the social sciences, we almost invariably study samples […] inference in mind. […] analysis [of the relation]ship between two variables is guided by two fundamental questions: […] Generally speaking, measures of association show to what degree, on a zero-to-one scale, two variables are […] (see Lewis-Beck, 1980).

Y as a Function of X

Suppose two variables, X and Y. […] Y as the dependent variable (the variable to be explained) […] X [as] the independent variable (the "cause"). [The vari]ables may be joined mathemati[cally …] elementary algebra, it is […] any physics text. Others are simple, as [in] Table 6.1, with observations on an X and a Y.

TABLE 6.1  Observations From a Perfect Relationship […]

    X Observations    Y Observations
          0                  4
          1                 10
          2                 16
          3                 22
          4                 28
          5                 34

    Y = 4 + 6X
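As a quick check on Table 6.1, the short Python sketch below confirms that the line Y = 4 + 6X reproduces every observation without error. The variable and function names are illustrative choices, not anything from the text.

```python
# Observations from Table 6.1 as (X, Y) pairs.
observations = [(0, 4), (1, 10), (2, 16), (3, 22), (4, 28), (5, 34)]


def predict(x, intercept=4, slope=6):
    """Value of Y on the line Y = intercept + slope * X."""
    return intercept + slope * x


# Prediction error for each case: observed Y minus predicted Y.
errors = [y - predict(x) for x, y in observations]

# The intercept is the value of Y when X = 0; the slope is the change in Y
# for each one-unit increase in X.
intercept = observations[0][1]
unit_changes = [observations[i + 1][1] - observations[i][1]
                for i in range(len(observations) - 1)]

print("errors:", errors)
print("intercept:", intercept, "unit changes in Y:", unit_changes)
```

Every error is zero and every unit change in Y equals the slope, which is what a perfect linear relationship means.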
The general formula expressing a line is

$$Y = a + bX$$

A particu[lar line …] (the slope). For the data in the table, the formula flawlessly predicts the […]. […] using the observations, we go on to draw the line and illustrate graphically the meanings of intercept and slope (see Figure 6.1).

[…] Because of the comple[xity …]. Thus it would be more accurate [to] accept some error in our prediction:

$$Y = a + bX + e$$

[…] contribute less than expected, others […] the variable, number of children. In Figure 6.2, we see […] with more observations [on X (number of] children) and Y (school fund contribution). Visually, the rela[tionship …] [l]ines are sketched besides: Y = 2 + 7X (labeled line 2), and Y = 5 + 5X (labeled line 3). Perhaps one of them is preferred. To decide which, of all possible lines, is "best," we resort to the least squares principle.

[Figure 6.2. Possible Linear […] Between X (number of children) and Y (School Fund Contribution, in dollars)]

The Least Squares Principle

The best line generates […] prediction error […] For an individual case, […] think of prediction error as [the difference] between the observed value ($Y_i$) and the value predicted by the line ($\hat{Y}_i$). (The "hat" over the value indicates a predicted value.) Take [a] case in Figure 6.2, with X = 4: line 1 yields $\hat{Y} = 28$, yet Y = 33, for an individual prediction error of +5. Now [take another] case, with X = 5: line 1 yields $\hat{Y} = 34$, yet Y = 31, for an individual prediction error of −3. Likewise, errors could be calculated on [the other cases]. […] (+5 − 3 = 2, only a low number). To solve the problem, […] Hence, the fit of a line comes to be defined as the sum of the squares of the errors:

$$SSE = \sum (Y_i - \hat{Y}_i)^2$$

The SSE could be calculated for line 1, and also calculated for [the other lines]. Comparing the three sums, we could then select [the line] having a smaller value […] would not necessarily be [the best of all possible lines …]. […] provide the unique values of a and b that minimize the SSE:

$$b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}, \qquad a = \bar{Y} - b\bar{X}$$

Intercept and Slope

[…] we now argue explicitly that [Academic Ability] is [a linear] function of […] takes into account […] regression. Because there are two variables, we execute a bivariate, or simple, regression. However, an essential preliminary is examination of […] variable X. Visual inspection of that plot suggests the relationship […]. The particular line is given by OLS. When we "regress" Y on X, we ob[tain]

$$\hat{Y} = 1.66 + 5.04X$$ [6.3]

where $\hat{Y}$ is the predicted Y […] and 5.04 is the least squ[ares …].

[Figure 6.3. OLS Regression Line of Parents' Education (X) on Academic Ability. Axes: Academic Ability (test score), Parents' Education (in years)]

[…] is highly plausible for a stude[nt to] perform very poorly in college […] [p]arent has ever been to school […] of experience. Specifically, […] score lower than 9 years. In general, prediction based on values outside the known range of X values in the data set should be avoided.

The slope indicates the expected change in Y for a unit change in X. In the case at hand, students will earn, on average, 5.04 more points on the exam for every additional year of parents' education (this is shown graphically in Figure 6.3). The slope, then, measures the effect of the independent variable. Subst[antively], the effect of parents' education seems rather important. For instance, when students have parents with a high score, say X = 20, they can expect to do much better than when parents have a low score, say X = 10. More precisely, the first group can expect to score about 50 points higher than the second group (i.e., [20 − 10] × 5.04 = 50.4).

Prediction and Goodness-of-Fit

A simple regression equation can be used to predict Y, for a given X value.
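The least squares idea and the use of a fitted equation for prediction can both be sketched in Python. Some assumptions to flag: the function names are hypothetical; the sum-of-squared-errors comparison is run on the Table 6.1 observations (the Figure 6.2 data are not available here); and the candidate lines Y = 2 + 7X and Y = 5 + 5X are borrowed from the text purely as rivals. The final lines use the reported equation 1.66 + 5.04X.

```python
def ols_fit(xs, ys):
    """Return (a, b), the least squares intercept and slope for Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared prediction errors for the line Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Table 6.1 observations (a perfect linear relationship).
xs = [0, 1, 2, 3, 4, 5]
ys = [4, 10, 16, 22, 28, 34]

a, b = ols_fit(xs, ys)
candidates = {
    "least squares line": (a, b),
    "Y = 2 + 7X": (2, 7),
    "Y = 5 + 5X": (5, 5),
}
for label, (a_i, b_i) in candidates.items():
    print(f"{label}: SSE = {sse(xs, ys, a_i, b_i):.2f}")


def predict_exam_score(parents_education_years):
    """Prediction from the fitted equation reported in the text."""
    return 1.66 + 5.04 * parents_education_years


score_at_13 = predict_exam_score(13)
gap_20_vs_10 = predict_exam_score(20) - predict_exam_score(10)
print(f"predicted score at 13 years: {score_at_13:.2f}")
print(f"difference between X = 20 and X = 10: {gap_20_vs_10:.1f}")
```

The least squares line has the smallest sum of squared errors of the three, and no other line could do better on these data. Keep the prediction function within the observed range of X, in line with the caution above about extrapolating.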
Suppose that we know a student in our study has parents with an average of 13 years of education, and we wish to predict exam performance. It is an easy thing to plug [in the] number and generate the prediction:

$$\hat{Y} = 1.66 + 5.04X = 1.66 + 5.04(13) = 1.66 + 65.52 = 67.18$$ [6.4]

Thus for 13 years of parents' education, we predict an exam score of about 67 items correct. Of course, it would be possible to use the equation to predict each student's performance. […] As can be observed […] summary measure of how well the prediction equation [performs is] the R-squared, $R^2$, also known […]

[…] this mean value […] [to avoid the] problem of signs canceling:

$$\text{Total sum of squared deviations (TSS)} = \sum (Y_i - \bar{Y})^2$$ [6.5]

[…]

$$\text{Regression sum of squared deviations (RSS)} = \sum (\hat{Y}_i - \bar{Y})^2$$ [6.6]

[…] If the regression [fails] to account for all the [deviation from] the mean, […] error […], summarized as [the] error sum of squared deviations (ESS), $\sum (Y_i - \hat{Y}_i)^2$. […] we see that the variation in the dependent variable, Y, [TSS, has] two components: [one] accounted for by the regression (RSS), [the] other not accounted for (ESS). The formula for the R-squared is simply

$$R^2 = RSS/TSS$$ [6.8]

[…] In our example, $R^2 = .63$ […] linear model. Depending on our theoretical […] may go on to assert […] researchers use the word "explanation" […]

Another measure of goodness-of-fit [is the] standard error of estimate (SEE) of Y, [defined] for a simple regression as follows:

$$SEE = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{N - 2}}$$ [6.9]

We see that the formulation provides something close to an average prediction error for a model. […] (it may be somewhat larger in magnitude than the average absolute prediction error, $\sum |Y_i - \hat{Y}_i| / N$) […] exam items, suggesting that on average the model generates a fair amount of prediction error. If the researcher's goal is merely to predict […] pass precise judgment, because the measure has no theoretical upper […]; as error increases, the magnitude of SEE always grows.

Significance Tests and Confidence Intervals

In regression analysis, we almost always apply least squares to sample data. […] the sample equation […] however, the […]

$$Y = \alpha + \beta X + \varepsilon$$

where $\alpha$ (alpha) is the population intercept, $\beta$ (beta) is the population slope, and $\varepsilon$ is the error term. […] [dis]cussed, easily extends to regression. Can the null hypothesis, no relationship in the population, be rejected? Consider rival hypotheses for the slope:

$$H_0\colon \beta = 0$$
$$H_1\colon \beta \neq 0$$

[…] know the […] from the […] [6.12] […] estimated slope; […] of the […]

$$S_b = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2 / (N - 2)}{\sum (X_i - \bar{X})^2}}$$

[…] slope equals zero, this […] [6.13]

Let us apply the test at the .05 level of statistical significance, for our particular data example from Wintergreen College above, where b = 5.04. […] statistically significant?
Consulting a t-table, we find [… gi]ven 48 degrees of freedom (N − 2 = 48), the […]. [Theref]ore, 9.02 (i.e., 5.04/.56), far exceeds […] [hy]pothesis and conclude […].

The above result conforms to a convenient rule-of-thumb: When the [t-ratio] ($b/S_b$) is greater than or equal to 2.00 in absolute value, the […] samples become very s[mall …] only 2.04, and wi[…] that busy researchers, when perusing pages of printout and many regression equations, employ this rule-of-thumb to arrive at preliminary judgments.

The above rules for significance testing also apply to the other parameter […] intercept. For these data, the intercept estimate has [a t]-ratio of $a/S_a = .21$. At a glance, we recognize that this number is far below the threshold critical value of 2.00. We immediately conclude that the intercept […] [Wi]ntergreen College example: […]

[…] 95% confidence […]

Presenting Regression Results: A Summary Guide

[…] no longer strange numbers. But now to move from the printout to the [page …] clearly and comprehensively […]:

$$\hat{Y} = 1.66 + 5.04X$$
(.21) (9.02)
$R^2 = .63$   N = 50   SEE = 10.72

where $\hat{Y}$ is the Academic Ability test score (in number of items correct), X is [Parents'] Education (in years), […] the error term; the figures in parentheses are […] the .05 level […] the sample size (number of […]); […] standard error of estimate of Y. […] when they run up regression findings.

7. MULTIPLE REGRESSION

With multiple regression, we include more than one independent variable […] the dependent variable, Y. Second, we want to increase our confidence that, indeed, a part[icular] X influences [Y] after other things are taken into account. The general model is now as follows: […]
[…] the variables in the model should be included […]

Dummy Variables

Regression analysis [assumes …]. However, the […] independent variable, […] an ordinal variable on Student Motivation, […] erroneous. A way to overcome this measurement level [problem …] generally applicable when the independent variables are [… is to use] dummy variables. To create them, one "dummies up" the categories […] the conceptual […] variable into […] dummy variables […] dichotomous variables, scored 0 [or 1 …] (the "dummy variable trap" is to be avoided). […] regression as follows: […]

N = 50   SEE = 9.16 […]

where X […] and the statistics are defined as before, $D_1$ and $D_2$ are [dummy] variables, and Adjusted $R^2$ is the coefficient of multiple determination, adjusted to take into account the degrees of freedom used up by adding more independent variables. […] Student Motivation […] (see the t-ratios) […] significant effect on Academic […]. (See the adjusted R-squared reported above.) […] every extra independent variable added to a regression equation, […] one more degree of freedom is used […]. That means that the unadjusted R-squared […] advantages.

[…] hence dichotomous, OLS can still perform as an unbiased estimator but loses in efficiency.
See further discussion in the Appendix.)

Collinearity

[…] unstable slope […]. Indeed, when the […] Coefficient of an X may […]. High collinear[ity …] the slope es[timates …]. These large standard errors make for wide confidence intervals […] over the precise slope value. Recall the formula for [the two-]tailed confidence interval for $b_i$:

$$b_i \pm (T_c)(S_{b_i})$$ [7.8]

[…] again about the formula for the […] estimate:

$$t = b/S_b$$

where t is the t-ratio, b is the slope estimate, and $S_b$ is the standard error of […]. Remembering our rule-of-thumb, as the t-ratio exceeds the absolute […] generally exceeds […]. [It b]ecomes more difficult for t to exceed 2.00 as the denominator […] [larg]er number. What makes $S_b$ bigger? In other words, […] [estima]ted variation of the slope? Look at this formula for the variance of a slope estimate, $b_i$:

$$\text{Variance } b_i = S_{b_i}^2 = \frac{S_e^2}{V_i^2}$$ [7.10]

where $S_{b_i}^2$ is the estimated standard error squared of slope $b_i$; $S_e^2$ is the variance of the regression error term; and $V_i^2 = \sum (X_i - \hat{X}_i)^2$, where $(X_i - \hat{X}_i)$ is the prediction error, [or re]sidual, from the regression of independent variable $X_i$ on the other independent variables in the model ($X_j$, $j \neq i$).

As can be seen, the variance of $b_i$, and thus its standard error, will […]. […] suggests a method for assessing the level of [collinearity], which we apply below.

The research worker should be alert to [the collineari]ty problem. Are the estimates of […]? [A]re the signs "right"? […] Is the
Is the jem appear in the purposes, reconsider the four-independent-vari just presented above (sec Equation The estimates for the effects of Parents’ Education (X,) and Communit Type (%) do not appear unusual in magnitude, nor are the signs unex- 96=HBS OS=N LA tard (66s) (698) (60) xX 'Xaby y+ op SA sec I+ s2qyoue ‘2{geHTea ywopuodapuy 240 Jory ;2atlo9 24 PIMO|LO} ZALY aM “IL] OS, sways vopoeraquy ey -uea Suipuayjo ayi snumw uonesyyioods emp a un A 4 atp ‘sjopourt sod asruordwoay soreusnsa paserg aaquerend fapous postaat e Twasasd pu Ino ayOUL 10 a0 de ‘uaiqoud Kyuvauiyo2 ou $3 244s “yqogord sou sarompuy aaupoyfiudss ooustaors fo aouasaad 2yp “kypsiaKuor “{ 01 <0 ‘yoyo tou st x “pe aay “ey yeanas fyosou Aout soueatsudis qeonsteIs {Jo Yoel Jo Axaa0asip ayy ‘AqsnoragO) wargoxd Ajzeaurqjoo B Sure} J05 punos8 wa unis 0 at -uou 240“ ‘vonenbg wos sayeurso a ne wos 4 se pamseout ue &q pure Iq ‘dk, isa laps ~"xer +90 tee== Fa Sra Tag Tx +90 +8r-='C Cra) zag fae + a0 + Keo +27 =x OU = MN “asyet lasec4 "K6L +P I= "x suoldurés ouostsi0% Jo audege are syosar au “uayy ‘3jou ann UG (Hq pu Iq ‘saygouea kumunp 66 ‘where variables and statistics are defined as before. of education can expect about 44 extra they are from the country or thé city. In ‘assiumed independent of the Value of Xp. Bu ethaps the effect of parents edu of the student’s home residence. One is ‘n an urban environment, because of ‘competing so ion forces. Given sich a circumstance, the slope ofthe Parents’ Education variable is really lower for the urban students, as ccomparéd tothe rural. 
That is to say, the slope of $X_1$ should diffe[r …]. [To test this] hypothesis, we incorporate the […] [int]o our regression model, as follows:

$$Y = a + b_1X_1 + b_2X_2 + b_3(X_1X_2) + e$$ [7.13]

and estimate with OLS,

$$\hat{Y} = -.53 + 4.90X_1 + 34.75X_2 - 1.64X_1X_2$$ [7.14]
(−.07) (8.23) (2.12) (−1.45)
[…] Adj. $R^2 = .72$   N = 50   SEE = 9.26

[For rural residents ($X_2 = 1$):]

$$\hat{Y} = a + b_1X_1 + b_2(1) + b_3(X_1)(1) = (a + b_2) + (b_1 + b_3)X_1$$ [7.15]
$$= (-.53 + 34.75) + (4.90 - 1.64)X_1 = 34.22 + 3.26X_1$$

And, for urban residents ($X_2 = 0$):

$$\hat{Y} = a + b_1X_1 + b_2(0) + b_3(X_1)(0) = a + b_1X_1$$ [7.16]
$$= -.53 + 4.90X_1$$

What we observe is that the slope estimate for urban residents is a bit greater than for rural residents (4.90 > 3.26), rather than the opposite, as [hypo]thesized. However, more important, the difference between the two […] [statist]ically […]

[…] function of the independent variables. This assumption is more than convenience. The accumulated experience of empirical social research i[…] [dif]ficult to improve on the linear assumption. Below, we demonstrate […]

In an earlier chapter, we began consideration of the [… relation]ship between Parents' […] [as]sumed it was linear. In that [si]tuation, a unit change in $X_1$ produces the same change in Y regardless of the value of $X_1$ (for example, [a change] from 14 years to 15 years has the same impact as from 10 years to 11 years). Theoretically, of course, there are other, nonlinear, possibilities. In Figure 7.1, we sketch these […] in addition to the baseline linear relationship […].

[Figure 7.1. Alternate Possible Forms of the Relationship Between $X_1$ and Y. Panels: a. Linear; b. Logarithmic; c. Hyperbolic; d. Parabolic]

[…] a unit change influences Y, but less and less as the value of $X_1$ increases. For example, an increase in education from 14 years to 15 years would have [an] impact on exam performance, but not as strong an impact as the change [from] 10 years to 11 years. […] because extra years of high school are more important than extra years of college. There are two other such theoretical examples, where the impact of $X_1$ depends on its own value. […] is a hyperbolic […] the impact of […] quickly, eventually approaching zero (for example, a unit change of education from 10 years to 11 years would have a very large influence, compared to a unit change from 14 years to 15 years. Moreover, a change from, say, 19 years to 20 years would have hardly any impact). […] possibility is a parabolic model for Y, such as that sketched in Figure 7.1, where increases in [$X_1$ …], after which further increases actually […]. In the following […] equations […]:

Linear: $Y = a + bX$
Logarithmic: $Y = a + b(\log X)$
Hyperbolic: $Y = a - b(1/X)$
Parabolic: $Y = a + bX - cX^2$

[… one researcher m]ight argue that $X_1$ […] argue that theory dictates a nonlinear specification. Resolution of the debate receives aid from estimation of rival specifications. Suppose the […] Type, also considered by each as part of the explanation. (It goes without saying that, in any model specifi[cation], some of the variables may enter transformed, some not.) Below are [the estimat]es for the four models.

Model I (includes no transformed $X_1$):

$$\hat{Y} = 5.46 + 4.44X_1 + 11.28X_2$$
(.79) (8.69) (3.99)
$R^2 = .72$   Adj. $R^2 = .71$   N = 50   SEE = 9.36

Model [II] (includes natural
log of $X_1$):

$$\hat{Y} = […](\ln X_1) + […]X_2$$
([…]) (8.65) (3.75)
$R^2 = .72$   Adj. $R^2 = .71$   N = 50   SEE = 9.39

Model III (includes reciprocal of $X_1$):

$$\hat{Y} = […] - 791.25(1/X_1) + 10.43X_2$$ [7.20]
(16.35) (−8.27) (3.53)
$R^2 = .71$   Adj. $R^2 = .69$   N = 50   SEE = 9.65

Model IV (includes $X_1^2$):

$$\hat{Y} = -8.29 + 6.42X_1 - .071X_1^2 + 10.98X_2$$ [7.21]
(−.25) (1.40) (−.44) (3.74)
$R^2$ = […]   Adj. $R^2 = .71$   N = 50   SEE = 9.44

where the variables and statistics are defined as before.

Review of these model results does not provide statistical support for a [… specificat]ion over the linear specification of Model I. The [t-]ratios […] only slightly so, than the t-ratios of the other […] are greater, and the SEE of Model I is […] [in]terpretation of the $X_1$, $X_1^2$ coe[fficients …] grounds, then, Model I seems favored. On theoretical grounds as well, it […]. Of course, for other social science research questions, proper transformation […]

Summary and Conclusion

[Wi]th multiple regression, we posit that a […] variable […] [l]abeled $X_1$ […]. [Estim]ates for the structural relationships are […] [ordina]ry least squares. The slope estimates […] [h]appens to Y, when $X_1$ or $X_2$ … or $X_k$ changes, under […] sound? The answer depends […] difficulties, we have offered counsel. As the reader has observed, OLS is an extremely flexible tool, capable of handling almost all data analysis issues.

8. RECOMMENDATIONS

Curiosity about how the world works motivates data analysis. How did something happen? What are the facts of the case? Are they undergoing change? Why? […] a research question […]: Is the popularity of the president explained by changes in unemployment, foreign policy, and […]? […] polling data […] unemployment […] significance […] more complicated and appears to tend […] simple or comp[lex].

9. APPENDIX: THE REGRESSION ASSUMPTIONS

A failure to meet the regression assumptions can mean the results we are […] are mere numbers on a computer [printout]. But when the regression assumptions are fulfilled, the estimates from our sample data inform us of the […] the least squares estimators are unbiased. (Suppose slopes estimated [over] repeated samples, and the results averaged. […]) […] estimates we are able to make strong inferences about the parameters of the population multiple regression equation: […]

[…] analysis may present the […] of regression [assumptions …] some […] cannot […]:

No specification error: Y is dependent (not independent) […] influence Y; the relationship between Y and […] is linear (not nonlinear) […]
No measurement error: […] quantitative (not ordinal or nominal) […] accurately […]
[…] independent […] predicted from […]
The error term: it has a zero mean (this assumption is […] least […] estimate); it is homoskedastic (error variance is constant across the values of an independent variable; this is mostly […] data, for example, data gathered on individuals at […]); […] (the errors are not correlated with each other) […]; [… uncorrelated with] any of the independent variables (this assumption is needed because we cannot experimentally control the independent variables); it is normally distributed.

When assumptions […] are met, least squares estimators are unbi[ased …]

ABOUT THE AUTHOR

MICHAEL S. LEWIS-BECK, Professor of Pol[itical Science at the Univer]sity of Iowa, received his Ph.D. from the University of Michigan. Cur[rently …] App[…]. Recently, he completed a term as […] [Politi]cal Science. Professor Lewis-Beck has authored or coauthored nu[merous …], [inc]luding Applied Regression: An Introduction, […] (Budapest, Hungary). Also, Professor Lewis-Beck has held visiting ap[pointments at the …] Institute for Development Administration (Gua[…]), the Cath[olic …] (Lima, Peru), and the [University] of Paris I (Sorbonne).
