
# Regression With Stata

Published by: api-3737025 on Oct 15, 2008
Copyright:Attribution Non-commercial

Chapter 1 - Simple and Multiple Regression

Chapter Outline
1.0 Introduction
1.1 A First Regression Analysis
1.2 Examining Data
1.3 Simple linear regression
1.4 Multiple regression
1.5 Transforming variables
1.6 Summary
1.7 Self assessment
1.8 For more information

1.0 Introduction
This book is composed of four chapters covering a variety of topics about using Stata for regression. We should emphasize that this book is about "data analysis" and that it demonstrates how Stata can be used for regression analysis, as opposed to a book that covers the statistical basis of multiple regression. We assume that you have had at least one statistics course covering regression analysis and that you have a regression book that you can use as a reference (see the Regression With Stata page and our Statistics Books for Loan page for recommended regression analysis books). This book is designed to help you apply your knowledge of regression, combined with instruction on Stata, to perform, understand and interpret regression analyses.

This first chapter will cover topics in simple and multiple regression, as well as the supporting tasks that are important in preparing to analyze your data, e.g., data checking, getting familiar with your data file, and examining the distribution of your variables. We will illustrate the basics of simple and multiple regression and demonstrate the importance of inspecting, checking and verifying your data before accepting the results of your analysis. In general, we hope to show that the results of your regression analysis can be misleading without further probing of your data, which could reveal relationships that a casual analysis could overlook.

In this chapter, and in subsequent chapters, we will be using a data file that was created by randomly sampling 400 elementary schools from the California Department of Education's API 2000 dataset. This data file contains a measure of school academic performance as well as other attributes of the elementary schools, such as class size, enrollment, poverty, etc.

You can access this data file over the web from within Stata with the Stata use command as shown below. Note: Do not type the leading dot in the command -- the dot is a convention to indicate that the statement is a Stata command.
. use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi

Once you have read the file, you probably want to store a copy of it on your computer (so you don't need to read it over the web every time). Let's say you are using Windows and want to store the file in a folder called c:\regstata (you can choose a different name if you like). First, you can make this folder within Stata using the mkdir command.
. mkdir c:\regstata

We can then change to that directory using the cd command.
. cd c:\regstata

And then if you save the file it will be saved in the c:\regstata folder. Let's save the file as elemapi.
. save elemapi

Now the data file is saved as c:\regstata\elemapi.dta and you could quit Stata and the data file would still be there. When you wish to use the file in the future, you would just use the cd command to change to the c:\regstata directory (or whatever you called it) and then use the elemapi file.
. cd c:\regstata
. use elemapi
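The steps above can be collected into a small do-file so that they are safe to run more than once. This is only a sketch: the folder name and URL are the ones used in the text, the do-file itself is hypothetical, and the `capture` prefix is used on the assumption that you want Stata to carry on if the folder already exists.

```stata
* Hypothetical do-file: fetch the data once, then reuse the local copy.
* The folder c:\regstata follows the text; change it to suit your machine.
capture mkdir c:\regstata      // capture: carry on if the folder already exists
cd c:\regstata

* Check whether a local copy has already been saved
capture confirm file elemapi.dta
if _rc {
    * No local copy yet: read the file over the web and save it
    use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi, clear
    save elemapi
}
else {
    * A local copy exists: load it instead of going back to the web
    use elemapi, clear
}
```

Note that the leading dot is omitted here, since these lines are meant to be saved in a do-file rather than typed at the Stata prompt.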

1.1 A First Regression Analysis

Let's dive right in and perform a regression analysis using the variables api00, acs_k3, meals and full. These measure the academic performance of the school (api00), the average class size in kindergarten through 3rd grade (acs_k3), the percentage of students receiving free meals (meals) - which is an indicator of poverty, and the percentage of teachers who have full teaching credentials (full). We expect that better academic performance would be associated with lower class size, fewer students receiving free meals, and a higher percentage of teachers having full teaching credentials. Below, we show the Stata command for testing this regression model followed by the Stata output.
. regress api00 acs_k3 meals full
      Source |       SS       df       MS              Number of obs =     313
-------------+------------------------------           F(  3,   309) =  213.41
       Model |  2634884.26     3  878294.754           Prob > F      =  0.0000
    Residual |  1271713.21   309  4115.57673           R-squared     =  0.6745
-------------+------------------------------           Adj R-squared =  0.6713
       Total |  3906597.47   312  12521.1457           Root MSE      =  64.153

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      acs_k3 |  -2.681508   1.393991    -1.92   0.055    -5.424424    .0614073
       meals |  -3.702419   .1540256   -24.04   0.000    -4.005491   -3.399348
        full |   .1086104    .090719     1.20   0.232    -.0698947    .2871154
       _cons |   906.7392   28.26505    32.08   0.000     851.1228    962.3555
------------------------------------------------------------------------------
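As a reading aid, the summary statistics in the output above can be reproduced by hand from the sums of squares and mean squares that Stata prints. The arithmetic below uses only numbers from that output; the subscripted symbols are our own shorthand for the rows of the table, not Stata notation.

```latex
\begin{align*}
SS_{\text{Total}} &= SS_{\text{Model}} + SS_{\text{Residual}}
                   = 2634884.26 + 1271713.21 = 3906597.47 \\
R^2 &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
     = \frac{2634884.26}{3906597.47} \approx 0.6745 \\
F(3,\,309) &= \frac{MS_{\text{Model}}}{MS_{\text{Residual}}}
            = \frac{878294.754}{4115.57673} \approx 213.41 \\
\text{Root MSE} &= \sqrt{MS_{\text{Residual}}}
                 = \sqrt{4115.57673} \approx 64.153 \\
t_{\texttt{acs\_k3}} &= \frac{\text{Coef.}}{\text{Std. Err.}}
                      = \frac{-2.681508}{1.393991} \approx -1.92
\end{align*}
```

The same ratio of coefficient to standard error gives the t values for meals, full and _cons.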
Let's focus on the three predictors, whether they are statistically significant and, if so, the direction of the relationship. The average class size (acs_k3, b=-2.68) is not significant (p=0.055), but only just so. The coefficient is negative, which would indicate that larger class size is related to lower academic performance -- which is what we would expect. Next, the effect of meals (b=-3.70, p<0.001) is significant and its coefficient is negative, indicating that the greater the proportion of students receiving free meals, the lower the academic performance. Please note that we are not saying that free meals are causing lower academic performance. The meals variable is highly related to income level and functions more as a proxy for poverty. Thus, higher levels of poverty are associated with lower academic performance. This result also makes sense. Finally, the percentage of teachers with full credentials (full, b=0.11, p=0.232) seems to be unrelated to academic performance. This would seem to indicate that the percentage of teachers with full credentials is not an important factor in predicting academic performance -- this result was somewhat unexpected.

Should we take these results and write them up for publication? From these results, we would conclude that lower class sizes are related to higher performance, that fewer students receiving free meals is associated with higher performance, and that the percentage of teachers with full credentials was not related to academic performance in the schools. Before we write this up for publication, we should do a number of checks to make sure we can firmly stand behind these results. We start by getting more familiar with the data file, doing preliminary data checking, and looking for errors in the data.
1.2 Examining data
First, let's use the describe command to learn more about this data file. We can verify how many observations it has and see the names of the variables it contains. To do this, we simply type
. describe
Contains data from http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi.dta
  obs:           400
 vars:            21                          25 Feb 2001 16:58
 size:        14,800 (92.3% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
snum            int    %9.0g                  school number
dnum            int    %7.0g       dname      district number
api00           int    %6.0g                  api 2000
api99           int    %6.0g                  api 1999
growth          int    %6.0g                  growth 1999 to 2000
meals           byte   %4.0f                  pct free meals
ell             byte   %4.0f                  english language learners
yr_rnd          byte   %4.0f       yr_rnd     year round school
mobility        byte   %4.0f                  pct 1st year in school
acs_k3          byte   %4.0f                  avg class size k-3
acs_46          byte   %4.0f                  avg class size 4-6
not_hsg         byte   %4.0f                  parent not hsg
hsg             byte   %4.0f                  parent hsg
some_col        byte   %4.0f                  parent some college
col_grad        byte   %4.0f                  parent college grad
grad_sch        byte   %4.0f                  parent grad school
avg_ed          float  %9.0g                  avg parent ed
full            float  %4.0f                  pct full credential
emer            byte   %4.0f                  pct emer credential
enroll          int    %9.0g                  number of students
mealcat         byte   %18.0g      mealcat    Percentage free meals in 3
                                                categories
-------------------------------------------------------------------------------
Sorted by:  dnum
We will not go into all of the details of this output. Note that there are 400 observations and 21 variables. We have variables about academic performance in 2000 and 1999 and the change in performance, api00, api99 and growth respectively. We also have various characteristics of the schools, e.g., class size, parents' education, percent of teachers with full and emergency credentials, and number of students. Note that when we did our original regression analysis it said that there were 313 observations, but the describe command indicates that we have 400 observations in the data file.

If you want to learn more about the data file, you could list all or some of the observations. For example, below we list the first five observations.
. list in 1/5
Observation 1

        snum         906        dnum          41       api00          693
       api99         600      growth          93       meals           67
         ell           9      yr_rnd          No    mobility           11
      acs_k3          16      acs_46          22     not_hsg            0
         hsg           0    some_col           0    col_grad            0
    grad_sch           0      avg_ed           .        full        76.00
        emer          24      enroll         247     mealcat  47-80% free

Observation 2

        snum         889        dnum          41       api00          570
       api99         501      growth          69       meals           92
         ell          21      yr_rnd          No    mobility           33
      acs_k3          15      acs_46          32     not_hsg            0
         hsg           0    some_col           0    col_grad            0
    grad_sch           0      avg_ed           .        full        79.00
        emer          19      enroll         463     mealcat 81-100% free

Observation 3

        snum         887        dnum          41       api00          546
       api99         472      growth          74       meals           97
         ell          29      yr_rnd          No    mobility           36
      acs_k3          17      acs_46          25     not_hsg            0
         hsg           0    some_col           0    col_grad            0
    grad_sch           0      avg_ed           .        full        68.00
        emer          29      enroll         395     mealcat 81-100% free

Observation 4

        snum         876        dnum          41       api00          571
       api99         487      growth          84       meals           90
         ell          27      yr_rnd          No    mobility           27
      acs_k3          20      acs_46          30     not_hsg           36
         hsg          45    some_col           9    col_grad            9
    grad_sch           0      avg_ed        1.91        full        87.00
        emer          11      enroll         418     mealcat 81-100% free

Observation 5
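The gap noted above, between the 313 observations used by regress and the 400 reported by describe, can be explored with a few commands. This is a sketch rather than the authors' own procedure. It assumes the local copy of elemapi saved earlier. It also relies on regress using only observations with no missing values on any model variable. The misstable command is available only in more recent versions of Stata.

```stata
* Sketch: why does regress use 313 of the 400 observations?
use elemapi, clear

* Count schools with a missing value on at least one model variable.
* If missing data are the whole explanation, this should be 400 - 313 = 87.
count if missing(api00, acs_k3, meals, full)

* Missing-value counts broken down by variable
misstable summarize api00 acs_k3 meals full

* List only the model variables (plus the school number) for the first ten schools
list snum api00 acs_k3 meals full in 1/10
```

Restricting list to a handful of variables keeps the output compact, compared with the full listing of all 21 variables shown above.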