
UNICEF Workshop on Global Study, 18th to 28th August 2008

Fitting Linear Regression in SPSS and Output Interpretation
Tuesday 26th August

The aim of this workshop is to introduce you to fitting linear regression in SPSS. It uses the DHS from Ghana, although the techniques shown are the same for all datasets. This worksheet and the data associated with the workshop are all available on the course website.

At the end of this session you should be able to:
• fit a simple linear regression model in SPSS
• understand how to create dummy variables for use in linear regression with categorical explanatory variables
• interpret output from linear regression analyses

1. Simple Linear Regression: Continuous Explanatory Variables


First of all, download the dataset from the course website at www.southampton.ac.uk/socsci/ghp3/course/material.html to your desktop. The dataset that will be used for this session is the same as for Computer Workshop 3. It is a reduced version of the Ghana DHS 2003, with a line for each child aged under 5 years old in the selected households.

Open SPSS in the usual way and open up the dataset.

In the first part of the workshop we will be looking at the relationship between birth weight and weight-for-age z-score. The hypothesis is that the lower the birth weight, the lower the weight-for-age z-score against the reference population. We will start with some data manipulation, followed by exploratory analyses, and then move on to the simple linear regression.

It is always extremely important to get a feel for the data before you rush headlong into some complicated statistical analysis.

Centre for Global Health, Population, Poverty and Policy (GHP3)

1. Select Analyze | Descriptive Statistics | Explore. The following dialogue box should appear.

2. Transfer the variable Wt/A Standard deviations to the Dependent List box by clicking the right arrow next to the box, and then click on the OK button.

3. The output will appear in the right-hand pane of the Output Viewer window. Scroll through this output carefully and note what SPSS has produced. The default output will include the mean and standard deviation for your data, a 95% confidence interval for the population mean, a stem-and-leaf plot and a boxplot. The stem-and-leaf plot is useful as it enables us to see whether the distribution of our response variable (weight for age) is highly skewed or not; in this case it is not! However, it is clear from the boxplot that there are some strange values, with a score of about 1000.
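As a rough illustration of the summary figures that Explore reports, the sketch below computes a mean, a standard deviation and an approximate 95% confidence interval in Python. The function name `describe` and the scores are hypothetical (they are not taken from the Ghana DHS), and the interval uses the large-sample multiplier 1.96, whereas SPSS uses the t distribution, so the two will differ slightly for small samples.

```python
import math
import statistics


def describe(values):
    """Return n, mean, sample SD and an approximate 95% CI for the mean."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    # Large-sample approximation; SPSS bases its interval on the t distribution.
    half_width = 1.96 * sd / math.sqrt(n)
    return {"n": n, "mean": mean, "sd": sd,
            "ci95": (mean - half_width, mean + half_width)}


# Hypothetical weight-for-age scores, for illustration only.
scores = [-150, -120, -90, -200, -60, -110]
summary = describe(scores)
print(summary)
```

A single wildly out-of-range value in `scores` would pull the mean and widen the interval considerably, which is why the boxplot check in this step matters.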

4. There are a number of children who have had their weight for age flagged. This is because the values for weight for age for those children are outside acceptable ranges; the measurement for height may have been incorrect. These are coded as 9998, but are included in the analysis at the moment. We need to change this (and while we are doing this we will change other variables like this as well).


5. Go to Transform | Recode into Same Variables and recode values 9996 and 9998 into System-missing for Height for age, Weight for age, Weight for height and Birth weight. If you have forgotten how to recode variables please ask.
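The effect of this recode can be sketched outside SPSS. The Python below is a minimal illustration, not the SPSS procedure itself: the helper name `recode_flags_to_missing` and the sample values are hypothetical, and `None` stands in for system-missing.

```python
# Flag codes named in the step above; treated here as "not a real measurement".
FLAG_CODES = {9996, 9998}


def recode_flags_to_missing(values, flag_codes=FLAG_CODES):
    """Replace flag codes with None (standing in for system-missing)."""
    return [None if v in flag_codes else v for v in values]


# Hypothetical weight-for-age values, two of them flagged.
raw = [-135, 9998, -68, 9996, -112]
clean = recode_flags_to_missing(raw)
valid = [v for v in clean if v is not None]

mean_with_flags = sum(raw) / len(raw)          # distorted by the flag codes
mean_without_flags = sum(valid) / len(valid)   # based on real measurements only
print(clean)
print(mean_with_flags, mean_without_flags)
```

Even two flagged values out of five are enough to turn a clearly negative average into a large positive one, which is why the recode has to come before any analysis.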

6. Rerun the Explore command and study the results again. The results have changed by a large amount.

7. We can investigate the relationship between weight for age and birth weight by looking at the correlation between the two variables. Correlation is usually calculated between two continuous variables. A correlation of 1 indicates perfect positive correlation: as one variable increases the other also increases at exactly the same rate. A correlation of -1 indicates perfect negative correlation: as one variable increases the other decreases at exactly the same rate. A correlation of 0 indicates no linear relationship between the two variables.

Go to Analyze | Correlate | Bivariate. The following box appears.


Place Wt/A Standard deviations and Birth weight in the right-hand Variables box, as shown above. Click OK. The following table is produced in the output.


Correlations
                                                       Wt/A Standard    Birth weight
                                                       deviations       (kilos - 3 dec.)
Wt/A Standard deviations         Pearson Correlation   1.000            .109**
                                 Sig. (2-tailed)                        .002
                                 N                     3094             837
Birth weight (kilos - 3 dec.)    Pearson Correlation   .109**           1.000
                                 Sig. (2-tailed)       .002
                                 N                     837              974
**. Correlation is significant at the 0.01 level (2-tailed).

The correlation between Weight for age Standard deviations and birth weight is 0.109. This is not that high, but the p-value (in the Sig. (2-tailed) row) is 0.002. This is below 0.05 (for a 5% test) and thus the correlation is significant at the 5% level. Thus there is evidence of a relationship between the two variables, although a weak one. Also note that the number of children included in this correlation is only 837. Many children do not have a recorded birth weight, and some do not have a weight for age (the children without a weight for age include those who have died between birth and the survey).
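To make the calculation concrete, here is a minimal Python sketch of a Pearson correlation that, like the table above, uses only the cases with a value on both variables. The function name `pearson_r` and the data are hypothetical; `None` marks a missing value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation using only pairs where both values are present.

    Returns (r, number_of_pairs_used).
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical birth weights (grams) and weight-for-age scores.
birth_weight = [2500, 3000, None, 3500, 4000, 2800]
weight_for_age = [-200, -150, -100, None, -50, -190]

r, n_pairs = pearson_r(birth_weight, weight_for_age)
print(r, n_pairs)
```

Of the six children here only four contribute to the correlation, in the same way that only 837 children contribute to the SPSS result.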

8. It is now time for the simple linear regression. Select Analyze | Regression | Linear. The linear regression dialogue box appears (see next page).

9. Our dependent variable is Wt/A Standard deviations, so place this into the Dependent box. We are predicting weight for age using Birth weight, so place Birth weight into the Independent(s) box.

10. Click OK.


11. The following output is produced:

Variables Entered/Removed(b)
Model    Variables Entered                   Variables Removed    Method
1        Birth weight (kilos - 3 dec.)(a)    .                    Enter
a. All requested variables entered.
b. Dependent Variable: Wt/A Standard deviations

This table simply states the variables in the model and the selection method chosen.

Model Summary
Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
1        .109(a)    .012        .011                 120.660
a. Predictors: (Constant), Birth weight (kilos - 3 dec.)

The results indicate the correlation (0.109, as seen before) and the R square. The R square indicates how much variation is explained: in this case not much!

ANOVA(b)
Model 1       Sum of Squares    df     Mean Square    F        Sig.
Regression    145568.173        1      145568.173     9.999    .002(a)
Residual      1.216E7           835    14558.868
Total         1.230E7           836
a. Predictors: (Constant), Birth weight (kilos - 3 dec.)
b. Dependent Variable: Wt/A Standard deviations

Do not worry about this box!
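Although the ANOVA box can safely be skipped, its figures tie back to the R square in the Model Summary, and the arithmetic is short. The sketch below uses only numbers printed in the output above; the variable names are illustrative, and the residual sum of squares is rebuilt from the mean square and its degrees of freedom because the table shows it only in rounded scientific notation.

```python
# Figures copied from the ANOVA table above.
ss_regression = 145568.173
ms_residual = 14558.868
df_regression = 1
df_residual = 835

# Residual sum of squares = residual mean square x residual df.
ss_residual = ms_residual * df_residual
ss_total = ss_regression + ss_residual

# R square: the share of total variation explained by the regression.
r_square = ss_regression / ss_total

# F: regression mean square divided by residual mean square.
f_stat = (ss_regression / df_regression) / ms_residual

print(round(r_square, 3), round(f_stat, 3))
```

The rounded results agree with the .012 and 9.999 reported by SPSS, so about 1.2% of the variation in weight for age is explained by birth weight.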



Coefficients(a)
                                 Unstandardized Coefficients    Standardized Coefficients
Model 1                          B            Std. Error        Beta         t         Sig.
(Constant)                       -146.216     18.112                         -8.073    .000
Birth weight (kilos - 3 dec.)    .017         .005              .109         3.162     .002
a. Dependent Variable: Wt/A Standard deviations

The final box, labelled Coefficients, gives the results of the analysis. Each of the columns is explained below:
• Unstandardized Coefficients B: This shows the values of the numbers in the linear regression equation.
  o The constant term is -146.2, indicating that a child who weighs 0 g at birth (impossible, but this is the theory) will be 146.2 standard deviations below the mean for their weight for age.
  o The relationship between birth weight and weight for age is 0.017. For every gram increase in birth weight, weight for age increases by 0.017.
• Unstandardized Coefficients Std. Error: This is the standard error for the coefficient; it is used in the calculation of significance.
• Standardized Coefficients Beta: Do not worry about this!
• t: This is the t-test to see if the coefficients are significantly different from 0. A value above 1.96 or below -1.96 indicates significance at the 5% level.
• Sig.: This is the p-value. If it is under 0.05 then the variable is significant. The value we have here is 0.002, which is highly significant. There is a significant relationship between birth weight and weight for age.
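The regression equation described above can be written out as a short Python sketch. The function names are hypothetical, and the constant is entered as negative on the assumption, consistent with the "below the mean" reading, that the printed value carries a minus sign.

```python
# Coefficients from the table above. The minus sign on the constant is an
# assumption based on the text ("below the mean").
CONSTANT = -146.216
SLOPE_PER_GRAM = 0.017


def predicted_weight_for_age(birth_weight_g):
    """Fitted value: constant + slope x birth weight in grams."""
    return CONSTANT + SLOPE_PER_GRAM * birth_weight_g


def t_statistic(coefficient, std_error):
    """t is the coefficient divided by its standard error."""
    return coefficient / std_error


score_3000 = predicted_weight_for_age(3000)
t_constant = t_statistic(-146.216, 18.112)
print(score_3000, round(t_constant, 3))
```

For a child of 3000 g the predicted score is about -95.2, and dividing the constant by its standard error gives the reported t value. Doing the same for birth weight with the displayed .017 and .005 gives only roughly the reported 3.162, because SPSS shows these two figures rounded to three decimals.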

2. Simple Linear Regression: Categorical Explanatory Variables

1. The procedure for conducting linear regression when there are categorical explanatory variables is slightly different, as you need to create dummy variables, as explained earlier. If you do not do this, the results that you obtain will not be valid. We will look at the relationship between wealth index and weight for age standard deviations.



2. Firstly, do some exploratory analysis. One way to do this with categorical variables is to calculate the mean weight-for-age standard deviation score for each wealth quintile. To do this:
• Go to Analyze | Compare Means | Means
• Place Wt/A Standard deviations in the Dependent List
• Put Wealth index into the Independent List box
• Click OK. The following results should be produced:

Report
Wt/A Standard deviations
Wealth index    Mean       N       Std. Deviation
Poorest         -135.55    1031    127.879
Poorer          -113.66    694     122.574
Middle          -110.86    556     117.847
Richer          -94.47     425     112.391
Richest         -68.28     388     117.536
Total           -112.12    3094    123.417

There are large differences in weight for age by wealth. The average for the poorest quintile is -135.55, while for the richest it is -68.28. As wealth increases, weight for age against the reference population also increases.
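What Compare Means does here amounts to averaging the response within each category. A minimal Python sketch, with a hypothetical function name and made-up data, skipping any case that is missing either value:

```python
from collections import defaultdict


def mean_by_group(groups, values):
    """Mean of `values` within each group, ignoring cases with a missing entry."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for group, value in zip(groups, values):
        if group is None or value is None:
            continue
        totals[group][0] += value
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Hypothetical wealth categories and weight-for-age scores.
wealth = ["Poorest", "Richest", "Poorest", "Richest", None]
weight_for_age = [-140, -60, -130, None, -100]

means = mean_by_group(wealth, weight_for_age)
print(means)
```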

3. We will now recreate this analysis by conducting linear regression. But first, we will need to create dummy variables for the wealth index.
• Four new variables need to be created, as wealth has five categories (remember that the number of dummy variables needed is one less than the number of categories!)
• Go to Transform | Recode into Different Variables
• Place Wealth index into the central box. On the right-hand side, under Output Variable, enter Poorest into the Name box and label this Dummy variable for Poorest Wealth Quintile. Click Change.


• Click on Old and New Values
• The poorest wealth quintile is labelled 1 in the original variable. Therefore we want to keep this value but set everything else to 0.
• In the Old Value: Value box type 1 and in the New Value: Value box also type 1. Click Add.
• Now make all other values zero. Click on All other values at the bottom of the left-hand box and enter 0 into the New Value: Value box. Click Add.
• Any missing value needs to stay missing, so click on System- or user-missing on the left-hand side, and click on System-missing on the right-hand side. Click Add.
• The box should look like:



• Click Continue and then OK. A new variable is created called Poorest.

4. You now need to create three more dummy variables for the other categories of wealth. To do this, go to Transform | Recode into Different Variables and follow the process above for Poorer, Middle and Richer. Each time you will need to recode a different value to be the dummy (for instance for Middle, all those with a 3 in the original dataset need to be recoded as a 1, and all other values as a 0). Please ask if you are confused!

Alternatively, use the syntax to do this automatically. A file is included on the website for you to use to create your dummy variables.
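The recoding logic is the same for every dummy, which is why it lends itself to automation. The Python sketch below illustrates that logic only; it is not the syntax file from the website. The helper `make_dummy` and the sample data are hypothetical, the codes 1 (Poorest) and 3 (Middle) come from the text, and the codes 2 (Poorer) and 4 (Richer) are assumed to follow the same ordering.

```python
def make_dummy(values, category):
    """1 if the case is in `category`, 0 otherwise; missing stays missing (None)."""
    return [None if v is None else int(v == category) for v in values]


# Codes 1 and 3 are stated in the worksheet; 2 and 4 are assumed.
DUMMY_CODES = {"poorest": 1, "poorer": 2, "middle": 3, "richer": 4}

# Hypothetical wealth index values (5 = richest, the category left without a dummy).
wealth = [1, 3, 5, None, 2, 4, 1]

dummies = {name: make_dummy(wealth, code) for name, code in DUMMY_CODES.items()}
for name, column in dummies.items():
    print(name, column)
```

A child in the richest quintile scores 0 on all four dummies, which is what makes that quintile the comparison group in the regression.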

5. Now the linear regression can be run. Go to Analyze | Regression | Linear. The regression from the previous analysis will still be there. The Dependent variable remains the same, Wt/A Standard deviations, but the Independent variables are now different.

Remove Birth weight from the Independent(s) box. Enter instead the four dummy variables: Poorest, Poorer, Middle and Richer.



Click OK.

6. Four boxes are produced, as before. Below is the final box, labelled Coefficients.
Coefficients(a)
                                              Unstandardized Coefficients    Standardized Coefficients
Model 1                                       B           Std. Error         Beta        t          Sig.
(Constant)                                    -68.284     6.173                          -11.062    .000
Dummy variable for poorest wealth quintile    -67.262     7.242              -.257       -9.288     .000
Dummy variable for poorer wealth quintile     -45.381     7.707              -.153       -5.888     .000
Dummy variable for middle wealth quintile     -42.576     8.043              -.132       -5.294     .000
Dummy variable for richer wealth quintile     -26.189     8.537              -.073       -3.068     .002
a. Dependent Variable: Wt/A Standard deviations

You will see that all of the variables are highly significant! This is seen in the final column, Sig., which shows the p-value. This indicates that each of the other wealth quintiles differs significantly from the Richest quintile, the reference category whose mean is given by the Constant.

The value for the constant is -68.284, which is the same as seen previously for the mean standard deviation for the Richest quintile!
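The link between the regression coefficients and the quintile means can be checked with a few lines of arithmetic. In the sketch below the figures come from the Coefficients table and the Compare Means report, entered as negative on the assumption that the scores lie below the reference mean; the variable names are illustrative.

```python
# Coefficients from the dummy-variable regression (signs assumed negative).
constant = -68.284  # mean for the reference category, Richest
coefficients = {
    "Poorest": -67.262,
    "Poorer": -45.381,
    "Middle": -42.576,
    "Richer": -26.189,
}

# Means from the Compare Means report, for comparison.
reported_means = {
    "Poorest": -135.55, "Poorer": -113.66, "Middle": -110.86,
    "Richer": -94.47, "Richest": -68.28,
}

# Each quintile's fitted mean is the constant plus that quintile's coefficient.
fitted_means = {"Richest": constant}
for quintile, b in coefficients.items():
    fitted_means[quintile] = constant + b

for quintile, fitted in fitted_means.items():
    print(quintile, round(fitted, 3), reported_means[quintile])
```

Each fitted mean agrees with the reported mean to within rounding, so with a single categorical explanatory variable the regression simply reproduces the group means.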

For the poorest quintile the average score is -68.284 - 67.262 = -135.546. The same as before! For all the wealth quintiles the results mirror the results seen before.

3. Multiple Linear Regression

You may be wondering why we bothered doing the regression on weight for age and wealth when we can get the results simply using the Compare Means command. The reason is to show the differences when more than one variable is added into the model at the same time.

We have seen that birth weight and wealth are related to weight for age when the simple bivariate analysis is conducted. But what happens if we analyse them together?



Birth weight is highly related to wealth: infants born to poorer households are likely to be lighter than infants born to richer households. So is the relationship between wealth and weight for age only due to the relationship with birth weight, with those of a lighter birth weight likely to remain below the norm throughout childhood? To test this we enter the variables into the model together.

1. Go to Analyze | Regression | Linear. The previous regression variables will still be contained in the different boxes.

2. Click on Birth weight and place it into the Independent(s) box, alongside the wealth quintile dummy variables.

3. Click OK. The final table in the output is copied below.
Coefficients(a)
                                              Unstandardized Coefficients    Standardized Coefficients
Model 1                                       B            Std. Error        Beta        t         Sig.
(Constant)                                    -119.658     18.412                        -6.499    .000
Dummy variable for poorest wealth quintile    -83.830      14.066            -.220       -5.960    .000
Dummy variable for poorer wealth quintile     -37.202      12.418            -.115       -2.996    .003
Dummy variable for middle wealth quintile     -42.243      12.494            -.130       -3.381    .001
Dummy variable for richer wealth quintile     -39.684      11.140            -.138       -3.562    .000
Birth weight (kilos - 3 dec.)                 .018         .005              .120        3.491     .001
a. Dependent Variable: Wt/A Standard deviations

The results have changed! Partly this is due to a different sample being used (only those with a birth weight AND a wealth quintile are included in the analysis), but it is also due to having both variables in the model at one time.

All the variables are still significant in the model, although after taking account of birth weight the difference between richest and poorest actually increases. This shows that even though birth weight is significantly related to weight for age, there is a very large effect of wealth after birth on weight for age.
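One way to read a model with several explanatory variables is to compare predicted scores for children who differ on one variable only. The Python sketch below does this with the coefficients from the table above. The function name `predict` is hypothetical, and the constant and the wealth coefficients are entered as negative, which is an assumption based on the poorer quintiles having lower weight for age.

```python
# Coefficients from the multiple regression (negative signs are assumed).
COEF = {
    "constant": -119.658,
    "poorest": -83.830,
    "poorer": -37.202,
    "middle": -42.243,
    "richer": -39.684,
    "birth_weight": 0.018,  # per gram
}


def predict(birth_weight_g, quintile):
    """Predicted weight-for-age score; 'richest' is the reference category."""
    score = COEF["constant"] + COEF["birth_weight"] * birth_weight_g
    if quintile != "richest":
        score += COEF[quintile]  # add the dummy coefficient for that quintile
    return score


# Two children with the same birth weight, from the two extreme quintiles.
richest_3000 = predict(3000, "richest")
poorest_3000 = predict(3000, "poorest")
gap = poorest_3000 - richest_3000
print(round(richest_3000, 3), round(poorest_3000, 3), round(gap, 3))
```

Whatever birth weight is chosen, the gap between the two children equals the Poorest coefficient, 83.830 in size compared with 67.262 in the wealth-only model, which is the increase described above.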


4. The analysis can be extended to include other variables, such as Type of Place of Residence, Educational Level and Place of Delivery. However, all of these are categorical variables, so remember to recode these as dummy variables first!

Exercises

1. Conduct multiple linear regression on Weight for age Standard deviations, including as explanatory variables birth weight, wealth index, urban/rural and highest educational level of the parent.

2. Conduct multiple linear regression on Weight for Height, using the same variables as in Exercise 1. Are there any obvious differences that you can see? What is the relationship between wealth and weight for height after controlling for the other variables?
