STATA


09/09/2011

Source: http://www.princeton.edu/~otorres/Stata/

Zeynep Ugur

Regression

Technically, linear regression estimates how much Y changes when X changes by one unit.
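To make "how much Y changes when X changes by one unit" concrete, here is a minimal sketch in Python (not Stata) of the slope that `regress y x` estimates when there is a single predictor: the sum of cross-products of x and y around their means divided by the sum of squares of x. The data are made up for illustration.

```python
def simple_ols(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and squares around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx                    # change in y per one-unit change in x
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return slope, intercept


# Made-up data lying exactly on y = 1 + 2x
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
slope, intercept = simple_ols(x, y)
print(slope, intercept)  # 2.0 1.0
```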

In Stata, use the command "regress"; type:

regress [dependent variable] [independent variable(s)]

regress y x

With several predictors (multiple regression), type:

regress y x1 x2 x3

Before running a regression, it is recommended to have a clear idea of what you are trying to estimate (i.e., which are your outcome and predictor variables).

A regression makes sense only if there is a sound hypothesis behind it.

Regression: example

Example: Do older people report lower life satisfaction controlling for other factors?*

Outcome (Y) variable: life satisfaction (cp08a011 in the sample dataset)

Predictor (X) variables:

Age of household member (leeftijd)

Nationality (cr08a043)

Gender (geslacht)

Level of education (oplcat)

Personal monthly income in categories (nettocat)

Civil status (burgstat)

Assuming that the sample dataset is saved on the desktop, type:

use "C:\Documents and Settings\Administrator\Desktop\sample dataset.dta"

Regression: variables

It is recommended to first examine the variables in the model to check for possible errors. Type:

describe lifesatisfaction age dutch female married nevermarried netincome educ

summarize lifesatisfaction age dutch female married nevermarried netincome educ

Regression: what to look for

Let's run the regression and look at the output.

The p-value of the model (Prob > F) tests whether R-squared is different from 0. Usually we need a p-value lower than 0.05 to show a statistically significant relationship between Y and the Xs.

R-squared shows the proportion of the variance of Y explained by the Xs. In this case, the model explains 4% of the variance in life satisfaction.

Adjusted R-squared (Adj R-squared) shows the same as R-squared, but adjusted for the number of cases and the number of variables.
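The adjustment follows the standard formula adj R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors (excluding the constant). A minimal Python sketch; the values of n and k below are hypothetical, not taken from the sample dataset:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (constant not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical: R-squared of 0.04 with 1,000 observations and 10 predictors
adj = adjusted_r2(0.04, n=1000, k=10)
print(round(adj, 4))  # 0.0303
```

Adding a predictor never lowers R-squared, but adjusted R-squared falls when the new predictor adds too little explanatory power to justify the lost degree of freedom.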

Two-tailed p-values test the null hypothesis that each coefficient is equal to 0. To reject it, the p-value has to be lower than 0.05 (you could also choose an alpha of 0.10). In this case, age is not statistically significant in explaining life satisfaction.

The t-values test the same null hypothesis, that the coefficient is equal to 0. To reject it, you need a t-value greater than 1.96 in absolute value (for 95% confidence in large samples). You can get the t-values by dividing the coefficient by its standard error. The size of the t-values also gives a rough indication of the importance of a variable in the model.
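The link between the coefficient, its standard error, the t-value and the two-tailed p-value can be sketched in Python as below. This is an approximation: it uses the standard normal distribution (adequate for large samples), whereas Stata's regress uses the t distribution with the residual degrees of freedom. The coefficient and standard error are hypothetical.

```python
import math


def t_value(coef, se):
    """t statistic for the null hypothesis coefficient = 0."""
    return coef / se


def two_sided_p_normal(t):
    """Two-sided p-value under the standard normal (large-sample) approximation."""
    # P(|Z| > |t|) = erfc(|t| / sqrt(2)) for a standard normal Z
    return math.erfc(abs(t) / math.sqrt(2))


# Hypothetical coefficient and standard error
t = t_value(0.5, 0.25)
print(t, round(two_sided_p_normal(t), 4))  # 2.0 0.0455
```

At |t| = 1.96 the approximate p-value is 0.05, which is why 1.96 is the cut-off for 95% confidence.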

[Figure: annotated regress output marking the outcome variable (Y), the predictor variables (X), and callouts 1 to 5]

Regression: with dummies

Region (sted) is entered here as a dummy variable. The easy way to add dummy variables to a regression is to use xi with the i. prefix (the interpretation is the same as before).

By default, the first category is the reference:

xi: regress lifesatisfaction age female netincome educ dutch married nevermarried i.sted

NOTE: By default, xi excludes (omits) the first category. To select a different reference category, type the following before running the regression:

char sted[omit] 4

xi: regress lifesatisfaction age female netincome educ dutch married nevermarried i.sted

This will select (4) as the reference category for the dummy variables.

NOTE: Another way to create dummy variables is to type:

tab sted, gen(urban)

This will create 5 new variables (urban1 to urban5), one for each region in this case; in general, one new variable is created for each category of the variable.
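What xi and `tab, gen()` do can be sketched in Python: one 0/1 column per category, with the reference category left out when the dummies go into a regression (its effect is absorbed by the constant). The region codes below are hypothetical, and the column names are illustrative; they differ from the names Stata generates.

```python
def make_dummies(values, prefix, omit=None):
    """Return {column_name: list of 0/1}, one column per distinct value except `omit`.

    Leaving one category out (the reference) mirrors what xi does;
    tab var, gen(stub) instead creates a column for every category.
    """
    columns = {}
    for category in sorted(set(values)):
        if category == omit:
            continue  # reference category: absorbed by the constant
        columns[f"{prefix}{category}"] = [1 if v == category else 0 for v in values]
    return columns


# Hypothetical region codes for six respondents; category 4 is the reference
sted = [1, 2, 4, 4, 5, 3]
dummies = make_dummies(sted, prefix="sted_", omit=4)
for name, column in dummies.items():
    print(name, column)
```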

Regression: ANOVA table

When you run the regression, you get the ANOVA table at the top of the output:

xi: regress csat expense percent income high college i.region

A = Model Sum of Squares (MSS). The closer it is to TSS, the better the fit.

B = Residual Sum of Squares (RSS)

C = Total Sum of Squares (TSS)

[Figure: ANOVA table at the top of the regress output, with A, B and C marked]
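For a least-squares fit with a constant, the three quantities satisfy TSS = MSS + RSS, and R-squared = MSS/TSS, which is why MSS close to TSS means a good fit. A minimal Python sketch with made-up data, where the fitted values come from the least-squares line for these points (y_hat = 2.2 + 0.6x):

```python
def sums_of_squares(y, y_hat):
    """Return (MSS, RSS, TSS) for observed values y and fitted values y_hat."""
    mean_y = sum(y) / len(y)
    mss = sum((f - mean_y) ** 2 for f in y_hat)        # A: explained by the model
    rss = sum((o - f) ** 2 for o, f in zip(y, y_hat))  # B: left in the residuals
    tss = sum((o - mean_y) ** 2 for o in y)            # C: total variation of Y
    return mss, rss, tss


# Made-up data; y_hat = 2.2 + 0.6x is the OLS line for these points
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]
mss, rss, tss = sums_of_squares(y, y_hat)
print(round(mss, 6), round(rss, 6), round(tss, 6), round(mss / tss, 6))  # 3.6 2.4 6.0 0.6
```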

Regression: eststo/esttab

To show the models side by side, you can use the commands eststo and esttab (part of the user-written estout package; install it with ssc install estout):

regress lifesatisfaction age female

eststo model1

regress lifesatisfaction age female netincome educ dutch married nevermarried

eststo model2

xi: regress lifesatisfaction age female netincome educ dutch married nevermarried i.sted

eststo model3

esttab, r2 ar2

Regression: exploring relationships

scatter lifesatisfaction age

There might be a curvilinear relationship between lifesatisfaction and age. In that case, we might want to add a squared version of the variable, in this case age:

gen age2=age*age

scatter lifesatisfaction age2
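Once both age and age2 are in the model, the effect of age is no longer a single number: it equals b_age + 2*b_age2*age, and it changes sign at age = -b_age/(2*b_age2). A minimal Python sketch; the coefficients are hypothetical (chosen to give a U-shape with its minimum at age 50) and are not estimates from the sample dataset:

```python
def age_effect(b_age, b_age2, age):
    """Marginal effect of age in a model with age and age2 = age*age."""
    return b_age + 2 * b_age2 * age


def turning_point(b_age, b_age2):
    """Age at which the effect of age changes sign."""
    return -b_age / (2 * b_age2)


ages = [20, 40, 60]
age2 = [a * a for a in ages]  # same idea as: gen age2 = age*age
print(age2)  # [400, 1600, 3600]

# Hypothetical coefficients, NOT estimates from the sample dataset
b_age, b_age2 = -0.05, 0.0005
print(turning_point(b_age, b_age2))
print([round(age_effect(b_age, b_age2, a), 3) for a in ages])
```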

Regression: getting predicted values

How good the model is will depend on how well it predicts Y, the linearity of the model, and the behavior of the residuals.

To generate the predicted values of Y (usually called Yhat) given the model, use predict immediately after running the regression:

xi: regress lifesatisfaction age female netincome educ dutch married nevermarried i.sted

predict lifesathat

label variable lifesathat "predicted life satisfaction"
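After regress, predict by default computes the linear prediction: the constant plus the sum of each coefficient times the value of its variable. A minimal Python sketch for one respondent; the coefficients and the respondent's values are hypothetical, not estimates from the sample dataset:

```python
def predict(coefs, obs):
    """Linear prediction: constant plus sum of coefficient * value over the predictors."""
    y_hat = coefs["_cons"]
    for name, b in coefs.items():
        if name != "_cons":
            y_hat += b * obs[name]
    return y_hat


# Hypothetical coefficients and respondent, NOT from the sample dataset
coefs = {"_cons": 6.0, "age": -0.01, "female": 0.2, "netincome": 0.05}
respondent = {"age": 40, "female": 1, "netincome": 4}
print(round(predict(coefs, respondent), 3))  # 6.0
```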

Regression: observed vs. predicted values

For a quick assessment of the model run a scatter plot

scatter lifesatisfaction lifesathat

We should expect a 45-degree pattern in the data. The y-axis shows the observed data and the x-axis the predicted data (Yhat).

In this case, the model does not seem to be doing a good job of predicting life satisfaction.
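A numeric companion to the scatter plot: for a least-squares fit with a constant, the squared correlation between observed and predicted values equals R-squared, so a weak 45-degree pattern goes with a low R-squared. A minimal Python sketch with made-up data (y_hat = 2.2 + 0.6x is the least-squares line for these points):

```python
import math


def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


# Made-up data; y_hat comes from the OLS line fitted to (x, y)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]
r = correlation(y, y_hat)
print(round(r ** 2, 6))  # 0.6, the R-squared of this fit
```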

Regression: joint test (F-test)

To test whether two coefficients are jointly different from 0, use the command test:

xi: quietly regress lifesatisfaction age female netincome educ dutch married nevermarried i.sted

Note: quietly suppresses the regression output.

To test the null hypothesis that both coefficients are zero (neither age nor female has any effect on life satisfaction), type:

test age female

The p-value is 0.0023, so we reject the null and conclude that the variables jointly have a significant effect on life satisfaction.

Some other possible tests are:

test netincome = 1

test netincome = educ
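For OLS with the default standard errors, the F statistic behind a joint test such as test age female can equivalently be obtained by comparing the residual sums of squares of the model with and without the tested variables: F = [(RSS_restricted - RSS_unrestricted)/q] / [RSS_unrestricted/(n - k - 1)], with q restrictions. A minimal Python sketch; all sums of squares and sample sizes below are hypothetical:

```python
def f_statistic(rss_restricted, rss_unrestricted, q, n, k):
    """F statistic for q linear restrictions (e.g. two coefficients jointly equal to 0).

    rss_restricted:   residual sum of squares of the model without the tested variables
    rss_unrestricted: residual sum of squares of the full model
    n: number of observations; k: predictors in the full model (constant not counted)
    """
    numerator = (rss_restricted - rss_unrestricted) / q
    denominator = rss_unrestricted / (n - k - 1)
    return numerator / denominator


# Hypothetical sums of squares, NOT from the sample dataset
F = f_statistic(rss_restricted=1010.0, rss_unrestricted=1000.0, q=2, n=1000, k=10)
print(round(F, 3))  # 4.945
```

The statistic is compared with an F distribution with q and n - k - 1 degrees of freedom; Stata's test reports the corresponding p-value.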

Regression: saving regression coefficients

Stata temporarily stores the coefficients (until the next estimation command) as _b[varname], so you can save them as variables by typing:

gen age_b = _b[age]

gen constant_b = _b[_cons]

You can also save the standard errors of the variables, stored as _se[varname]:

gen age_se = _se[age]

gen constant_se = _se[_cons]

Regression: saving regression coefficients/getting predicted values

Type help return for more details

Regression: interaction between dummies

Interaction terms are needed whenever there is reason to believe that the effect of one independent variable depends on the value of another independent variable. We will explore here the interaction between two dummy (binary) variables. In the example below, the effect of the type of dwelling on life satisfaction may depend on the gender of the respondent.

Dependent variable (Y): lifesatisfaction

Independent variables (X):

Binary selfowneddwelling is 1 if the type of dwelling (woning) is self-owned.

Binary rentaldwelling is 1 if the type of dwelling (woning) is rental.

Interaction term in Stata: gen selfownd_f = female*selfowneddwelling

xi: regress lifesatisfaction age female netincome educ dutch married nevermarried i.sted selfowneddwelling selfownd_f

The effect of female on life satisfaction is 0.8, but given the interaction term (and assuming all coefficients are significant), the net effect is 0.8 + 0.4*selfowneddwelling. If selfowneddwelling is 0, the effect is 0.8 (the coefficient on female), but if selfowneddwelling is 1, the effect is 0.8 + 0.4 = 1.2.

In this case, the effect of being female on life satisfaction is more positive for women who own their homes.
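The net-effect arithmetic for the two dummies can be written out as a tiny Python function. The defaults 0.8 (female) and 0.4 (female*selfowneddwelling) are the rounded coefficients quoted in the interpretation of this example:

```python
def female_effect(selfowneddwelling, b_female=0.8, b_interaction=0.4):
    """Net effect of female when the model includes female*selfowneddwelling."""
    return b_female + b_interaction * selfowneddwelling


for own in (0, 1):
    print(own, round(female_effect(own), 2))  # prints: 0 0.8, then 1 1.2
```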

Regression: interaction between a dummy and a continuous variable

Let's now explore an interaction between a binary and a continuous variable: we keep income (netincome) continuous and gender (female) binary. The question remains the same*.

Dependent variable (Y): lifesatisfaction

Independent variables (X):

Continuous netincome

Binary female

Interaction term in Stata: gen income_f = female*netincome

xi: regress lifesatisfaction age female netincome educ dutch married nevermarried i.sted income_f

The effect of income on life satisfaction is lower for females.

If female = 0, the effect of income is 0.06.

If female = 1, the effect of income is 0.06 - 0.03 = 0.03.

Increasing the income category by 1 unit will increase life satisfaction by 0.06 units for males, but it will have a smaller impact (0.03 units) for females.
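The same logic applies when one of the two variables is continuous: the slope of income depends on female. A minimal Python sketch using the rounded coefficients quoted in this example (0.06 for netincome, -0.03 for income_f), extended to a move of several income categories:

```python
def income_slope(female, b_income=0.06, b_income_f=-0.03):
    """Slope of netincome when the model includes income_f = female*netincome."""
    return b_income + b_income_f * female


def change_in_lifesat(delta_income, female):
    """Predicted change in life satisfaction for a change of delta_income categories."""
    return delta_income * income_slope(female)


for female in (0, 1):
    print(female, round(income_slope(female), 3), round(change_in_lifesat(3, female), 3))
```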

Regression: interaction between two continuous variables

Let's now keep both variables continuous. The question remains the same*.

Dependent variable (Y): lifesatisfaction

Independent variables (X):

Continuous netincome

Continuous age

Interaction term in Stata: gen inc_age = age*netincome

xi: regress lifesatisfaction age female netincome educ dutch married nevermarried i.sted inc_age

The effect of the interaction term is very small. The effect of a rise in the income category is 0.02 + 0.0003*age.

So, using these rounded coefficients: if age = 50, the slope of income is 0.02 + 0.0003*50 = 0.035.

If age = 70, the slope of income is 0.02 + 0.0003*70 = 0.041.

In the continuous case, there is a very small effect (and it is not significant).
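With two continuous variables, the slope of income changes smoothly with age: each extra year adds the interaction coefficient to it. A minimal Python sketch plugging in the rounded coefficients quoted above (0.02 for netincome, 0.0003 for the interaction); with unrounded output coefficients the numbers would differ somewhat:

```python
def income_slope_at_age(age, b_income=0.02, b_inc_age=0.0003):
    """Slope of netincome when the model includes the product age*netincome."""
    return b_income + b_inc_age * age


for age in (30, 50, 70):
    print(age, round(income_slope_at_age(age), 4))
```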
