
A brief tour of Stata

1. Opening the data set
. use "C:\Documents and Settings\tongyai\My Documents\Downloads\LAWSCH85_1.dta", clear

This command loads the data set from the specified directory; the clear option first removes any data already in memory.

This data set contains information about selected law schools in the U.S. A description of each variable is stored in its variable label, which the command describe lists.

2. Simple regression analysis

Suppose we would like to find the effect of GPA on salary, ceteris paribus. Let us first estimate the effect using a simple regression model:

$lsalary = \beta_0 + \beta_1 GPA + u$

The Stata command that computes the least squares estimators $\widehat{\beta}_0$ and $\widehat{\beta}_1$ and related statistics is as follows (reg is Stata's abbreviation of regress):

. reg lsalary GPA

This command produces the least squares estimates, with $\widehat{\beta}_0 = 7.08$ and $\widehat{\beta}_1 = 1.05$. The results show that 136 observations were used to compute the estimators.

3. Basic descriptive statistics in Stata

We can compute the mean, standard deviation and correlation coefficient of random variables in Stata as follows.

. sum lsalary GPA

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     lsalary |       136    10.54149    .2772397   10.12262   11.26862
         GPA |       136    3.309632    .1972305        2.8       3.82

. correlate lsalary GPA
(obs=136)

             |  lsalary      GPA
-------------+------------------
     lsalary |   1.0000
         GPA |   0.7439   1.0000

4. Calculation in Stata

We can perform basic calculations (addition, subtraction, multiplication and division) in Stata using the command display. For example, the R-squared of the simple regression is SSE/SST = 5.74/10.38 = 0.55. Here SSE = 5.74270257 is the explained (Model) sum of squares and SST = 10.3763518 is the total sum of squares, both taken from the output of reg lsalary GPA:

. display 5.74270257/10.3763518
.55344139

In a simple regression this also equals the squared correlation coefficient, $0.7439^2 \approx 0.5534$.
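The regress output can also be reproduced from first principles. The sketch below is written in Python rather than Stata so that it runs anywhere. It applies the textbook least squares formulas to a small made-up data set; the numbers are illustrative assumptions, not the law-school data. It computes R-squared as explained over total sum of squares, which is what this handout calls SSE/SST. It also checks the two residual properties examined later with predict. The helper names `simple_ols` and `r_squared` are hypothetical, not part of any library.

```python
# Simple OLS "by hand": slope, intercept, residuals and R-squared.
# The data below are hypothetical toy numbers, NOT the LAWSCH85 values.


def mean(values):
    return sum(values) / len(values)


def simple_ols(x, y):
    """Return (b0, b1) for y = b0 + b1*x + u from the textbook formulas."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def r_squared(x, y, b0, b1):
    """Explained SS / total SS (SSE/SST in this handout's notation)."""
    ybar = mean(y)
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((f - ybar) ** 2 for f in fitted)  # explained ("Model") SS
    sst = sum((yi - ybar) ** 2 for yi in y)     # total SS
    return sse / sst


gpa = [2.9, 3.1, 3.2, 3.4, 3.5, 3.7]          # hypothetical GPA-like values
lsal = [10.2, 10.4, 10.3, 10.6, 10.7, 10.9]   # hypothetical log-salary values

b0, b1 = simple_ols(gpa, lsal)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(gpa, lsal)]

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, R2 = {r_squared(gpa, lsal, b0, b1):.4f}")
# Both sums below are zero up to floating-point rounding.
print(f"sum(u_hat) = {sum(resid):.1e}, sum(x*u_hat) = {sum(x * u for x, u in zip(gpa, resid)):.1e}")
```

On data lying exactly on a line, such as y = 1 + 2x, `simple_ols` recovers an intercept of 1 and a slope of 2, and R-squared equals 1.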

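Now we can verify that the least squares estimator satisfies $\widehat{\beta}_1 = \widehat{\mathrm{Cov}}(GPA, lsalary)/\widehat{\mathrm{Var}}(GPA) = (\hat{\rho}\,\hat{\sigma}_{lsalary}\,\hat{\sigma}_{GPA})/\hat{\sigma}_{GPA}^2$. In this formula, $\hat{\rho} = 0.7439$ is the sample correlation, and $\hat{\sigma}_{lsalary} = .2772397$ and $\hat{\sigma}_{GPA} = .1972305$ are the sample standard deviations:

. display (0.7439*.2772397*.1972305)/(.1972305)^2
1.045673

This matches the slope reported by reg lsalary GPA (1.045724) up to the rounding of the correlation coefficient to four decimals.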
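As a cross-check outside Stata, the sketch below redoes the same display arithmetic in Python. It uses the rounded values printed by sum, correlate and reg lsalary GPA: the slope check above, the R-squared ratio, and the intercept check that follows. Only numbers that appear in the handout are used. The variable names are illustrative.

```python
# Reproduce the handout's `display` arithmetic from the rounded values
# printed by `sum`, `correlate` and `reg lsalary GPA`.
rho = 0.7439          # corr(lsalary, GPA), rounded to 4 decimals
sd_lsal = 0.2772397   # std. dev. of lsalary
sd_gpa = 0.1972305    # std. dev. of GPA
mean_lsal = 10.54149  # mean of lsalary
mean_gpa = 3.309632   # mean of GPA
b1_reg = 1.045724     # slope used in the handout's intercept calculation

cov = rho * sd_lsal * sd_gpa        # sample covariance implied by rho and the std. devs.
b1_cov = cov / sd_gpa ** 2          # Cov(x, y) / Var(x)
b1_corr = rho * sd_lsal / sd_gpa    # algebraically identical correlation form
b0 = mean_lsal - b1_reg * mean_gpa  # intercept = ybar - b1 * xbar
r2 = 5.74270257 / 10.3763518        # explained SS / total SS

print(round(b1_cov, 6), round(b1_corr, 6))  # 1.045673 1.045673
print(round(b0, 6))                         # 7.080528
print(round(r2, 4), round(rho ** 2, 4))     # 0.5534 0.5534
```

The last line shows that, in a simple regression, R-squared coincides with the squared correlation.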

The estimated intercept is $\widehat{\beta}_0 = \overline{lsalary} - \widehat{\beta}_1\,\overline{GPA}$:

. display 10.54149-1.045724*3.309632
7.0805284

5. Generating residuals

We can use the command predict after running the regression to generate the residuals $\hat{u}_i$.

. predict residual, resid

Note that residual is just the name given to $\hat{u}$; you may use another name if you want. In theory, we know that $\sum_{i=1}^{n} \hat{u}_i = 0$, so the residuals must also have a sample mean of zero:

. mean residual

Mean estimation                     Number of obs    =     136

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    residual |   6.39e-10   .0158864     -.0314184    .0314184
--------------------------------------------------------------

The mean, and hence the sum, of the computed $\hat{u}_i$ turns out to be "zero". The value 6.39e-10 is not exactly zero only because of rounding error in the computer's arithmetic.

Another property of the least squares residuals is that $\sum_{i=1}^{n} GPA_i\,\hat{u}_i = 0$. Together with $\sum_{i=1}^{n} \hat{u}_i = 0$, this means the residuals have zero sample correlation with GPA:

. correlate residual GPA
(obs=136)

             | residual      GPA
-------------+------------------
    residual |   1.0000
         GPA |  -0.0000   1.0000

The computed correlation coefficient is zero. This confirms the theoretical property that the residuals are uncorrelated with the explanatory variable GPA.

6. Multiple regression analysis

Suppose there are other variables that also affect lsalary and are correlated with GPA. In that case, the estimates produced by a simple regression model are biased. Let us now include another explanatory variable, LSAT, so that the model becomes

$lsalary = \beta_0 + \beta_1 GPA + \beta_2 LSAT + u$
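The bias from omitting a relevant, correlated variable can be illustrated with simulated data. The Python sketch below is a toy under stated assumptions. x1 stands in for GPA and x2 for LSAT. The true coefficients (0.5 and 0.8), the positive link between x1 and x2, the sample size and the seed are all invented. The helpers `centred_cross`, `slope` and `ols_two` are hypothetical names.

Because x2 raises y and is positively correlated with x1, the simple-regression slope should overstate the true 0.5. It also satisfies the exact in-sample identity $\tilde{\beta}_1 = \widehat{\beta}_1 + \widehat{\beta}_2\tilde{\delta}_1$, where $\tilde{\delta}_1$ is the slope from regressing the omitted x2 on x1.

```python
import random


def mean(v):
    return sum(v) / len(v)


def centred_cross(a, b):
    """Sum of (a_i - abar) * (b_i - bbar)."""
    abar, bbar = mean(a), mean(b)
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))


def slope(x, y):
    """Slope from the simple regression of y on a constant and x."""
    return centred_cross(x, y) / centred_cross(x, x)


def ols_two(y, x1, x2):
    """Slopes (b1, b2) from regressing y on a constant, x1 and x2.

    Solves the 2x2 normal equations in deviations from means (Cramer's rule).
    """
    s11, s22, s12 = centred_cross(x1, x1), centred_cross(x2, x2), centred_cross(x1, x2)
    s1y, s2y = centred_cross(x1, y), centred_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


rng = random.Random(42)   # fixed seed so the run is reproducible
n = 500
x2 = [rng.gauss(0, 1) for _ in range(n)]                    # LSAT-like variable
x1 = [0.5 * a + rng.gauss(0, 1) for a in x2]                # GPA-like, positively correlated with x2
y = [1 + 0.5 * a + 0.8 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]  # true beta1 = 0.5, beta2 = 0.8

b1_short = slope(x1, y)              # simple regression: x2 omitted
b1_long, b2_long = ols_two(y, x1, x2)
delta = slope(x1, x2)                # slope from regressing the omitted x2 on x1

print(f"simple: {b1_short:.3f}  multiple: {b1_long:.3f}")
# In-sample identity: short slope = long slope + b2 * delta
print(f"{b1_short:.6f} == {b1_long + b2_long * delta:.6f}")
```

The identity holds exactly because the first normal equation of the two-regressor model reads $s_{11}\widehat{\beta}_1 + s_{12}\widehat{\beta}_2 = s_{1y}$. Dividing through by $s_{11}$ gives the simple-regression slope on the left.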

The Stata command that computes the least squares estimators $\widehat{\beta}_0$, $\widehat{\beta}_1$ and $\widehat{\beta}_2$ is

. reg lsalary GPA LSAT

      Source |       SS       df       MS              Number of obs =     136
-------------+------------------------------           F(  2,   133) =  113.87
       Model |  6.55082423     2  3.27541212           Prob > F      =  0.0000
    Residual |  3.82552758   133  .028763365           R-squared     =  0.6313
-------------+------------------------------           Adj R-squared =  0.6258
       Total |  10.3763518   135  .076861865           Root MSE      =   .1696

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         GPA |   .5682672    .116581     4.87   0.000     .3376745    .7988599
        LSAT |   .0264977   .0049991     5.30   0.000     .0166097    .0363857
       _cons |   4.460653   .5518221     8.08   0.000     3.369171    5.552136
------------------------------------------------------------------------------

7. Partialling out interpretation of multiple regression analysis

We would like to confirm that the least squares estimator $\widehat{\beta}_1$ can alternatively be expressed as

$\widehat{\beta}_1 = \dfrac{\sum_{i=1}^{n} \hat{r}_{i1}\, lsalary_i}{\sum_{i=1}^{n} \hat{r}_{i1}^{2}}$,

where $\hat{r}_{i1}$ are the residuals from regressing GPA on LSAT.
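The partialling-out formula can also be checked on simulated data outside Stata. The Python sketch below makes several assumptions. x1 plays the role of GPA and x2 the role of LSAT, and the coefficients, sample size and seed are arbitrary. The helper names are hypothetical. The sketch computes $\widehat{\beta}_1$ from the two-regressor normal equations. It then compares that value with $\sum \hat{r}_{i1} y_i / \sum \hat{r}_{i1}^2$, built from the residuals of regressing x1 on x2.

```python
import random


def mean(v):
    return sum(v) / len(v)


def centred_cross(a, b):
    """Sum of (a_i - abar) * (b_i - bbar)."""
    abar, bbar = mean(a), mean(b)
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))


def ols_two(y, x1, x2):
    """Slopes (b1, b2) from regressing y on a constant, x1 and x2."""
    s11, s22, s12 = centred_cross(x1, x1), centred_cross(x2, x2), centred_cross(x1, x2)
    s1y, s2y = centred_cross(x1, y), centred_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(7)
n = 200
x2 = [rng.gauss(0, 1) for _ in range(n)]                    # LSAT-like variable
x1 = [0.6 * a + rng.gauss(0, 1) for a in x2]                # GPA-like variable
y = [2 + 0.5 * a + 0.3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

# Step 1: regress x1 on a constant and x2, keep the residuals r_hat.
d1 = centred_cross(x2, x1) / centred_cross(x2, x2)
d0 = mean(x1) - d1 * mean(x2)
r_hat = [a - (d0 + d1 * b) for a, b in zip(x1, x2)]

# Step 2: beta1_hat = sum(r_hat * y) / sum(r_hat ** 2).
b1_fwl = sum(r * yi for r, yi in zip(r_hat, y)) / sum(r * r for r in r_hat)

b1_full, _ = ols_two(y, x1, x2)
print(f"partialling out: {b1_fwl:.6f}  full regression: {b1_full:.6f}")
```

The two numbers agree up to floating-point rounding. This is the same equality the Stata steps below demonstrate with r_hat.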

That is, we first estimate the auxiliary regression

$GPA = \delta_0 + \delta_1 LSAT + v$

and save its residuals $\hat{r}_{i1}$ under the name r_hat. We then regress lsalary on r_hat.

. reg GPA LSAT

      Source |       SS       df       MS              Number of obs =     136
-------------+------------------------------           F(  1,   134) =  198.51
       Model |  3.13514804     1  3.13514804           Prob > F      =  0.0000
    Residual |  2.11633362   134  .015793534           R-squared     =  0.5970
-------------+------------------------------           Adj R-squared =  0.5940
       Total |  5.25148166   135  .038899864           Root MSE      =  .12567

------------------------------------------------------------------------------
         GPA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        LSAT |   .0331322   .0023516    14.09   0.000     .0284812    .0377832
------------------------------------------------------------------------------

. predict r_hat, resid

. reg lsalary r_hat

      Source |       SS       df       MS              Number of obs =     136
-------------+------------------------------           F(  1,   134) =    9.45
       Model |  .683422633     1  .683422633           Prob > F      =  0.0026
    Residual |  9.69292918   134  .072335292           R-squared     =  0.0659
-------------+------------------------------           Adj R-squared =  0.0589
       Total |  10.3763518   135  .076861865           Root MSE      =  .26895

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       r_hat |   .5682672   .1848771     3.07   0.003     .2026126    .9339219
       _cons |   10.54149   .0230625   457.08   0.000     10.49588    10.58711
------------------------------------------------------------------------------

Note that the estimated effect of $\hat{r}_1$ on lsalary (.5682672) is exactly equal to the estimated effect of GPA on lsalary in the multiple regression. In other words, $\widehat{\beta}_1 = 0.57$ measures the effect of GPA on predicted lsalary after removing the part of GPA that is explained by LSAT, i.e., holding the effect of LSAT constant.

Note also that GPA and LSAT are positively correlated (the slope on LSAT in the auxiliary regression is 0.033), and LSAT has a positive effect on lsalary ($\widehat{\beta}_2 = 0.026$). It follows that a simple regression model will produce an estimated effect of GPA with a positive bias. This is evident when we compare $\widehat{\beta}_1 = 1.05$ in the simple regression model with $\widehat{\beta}_1 = 0.57$ in the multiple regression model.

8. Graph

A simple scatter plot of the relationship between lsalary and GPA can be constructed as follows.

. scatter lsalary GPA

A scatter diagram will be produced in another window.

[Figure: scatter plot of log(salary) against median college GPA]

9. R-squared is always non-decreasing in the number of explanatory variables

We would like to show that the goodness-of-fit, as measured by R-squared, never decreases when explanatory variables are added to the model. Let us generate another variable, noise, drawn from a normal distribution with mean 0 and variance 1.

. gen noise = rnormal(0,1)

No random-number seed is set here. Rerunning the command will therefore draw different values of noise and give slightly different estimates from those shown below. Issuing set seed first makes the draw reproducible.

Because this variable is purely random, it has virtually no relationship with lsalary. Suppose we include it in our regression and estimate the following model:

$lsalary = \beta_0 + \beta_1 GPA + \beta_2 LSAT + \beta_3 noise + u$

. reg lsalary GPA LSAT noise

      Source |       SS       df       MS              Number of obs =     136
-------------+------------------------------           F(  3,   132) =   76.01
       Model |  6.57193647     3  2.19064549           Prob > F      =  0.0000
    Residual |  3.80441534   132  .028821328           R-squared     =  0.6334
-------------+------------------------------           Adj R-squared =  0.6250
       Total |  10.3763518   135  .076861865           Root MSE      =  .16977

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         GPA |   .5628206   .1168718     4.82   0.000     .3316367    .7940046
        LSAT |   .0267229    .005011     5.33   0.000     .0168106    .0366352
       noise |   .0122625   .0143275     0.86   0.394    -.0160786    .0406037
       _cons |   4.442071   .5528044     8.04   0.000     3.348569    5.535573
------------------------------------------------------------------------------

With GPA, LSAT and noise as the explanatory variables, R-squared increases to 0.6334. This is higher than in the previous model with only GPA and LSAT (R-squared = 0.6313), even though noise is statistically insignificant (t = 0.86, p = 0.394). The adjusted R-squared, which penalizes additional regressors, falls slightly from 0.6258 to 0.6250.
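The non-decreasing property holds because least squares with an extra regressor could always set that regressor's coefficient to zero and reproduce the smaller model's fit. The minimised sum of squared residuals therefore cannot rise. The Python sketch below illustrates this on simulated data. The sample size of 136 mirrors the handout, but the data, coefficients and seed are made-up assumptions, and the helper names are hypothetical. It compares R-squared with one regressor against R-squared after adding a pure-noise regressor.

```python
import random


def mean(v):
    return sum(v) / len(v)


def centred_cross(a, b):
    """Sum of (a_i - abar) * (b_i - bbar)."""
    abar, bbar = mean(a), mean(b)
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))


def r2_one(y, x):
    """R-squared from regressing y on a constant and x."""
    return centred_cross(x, y) ** 2 / (centred_cross(x, x) * centred_cross(y, y))


def r2_two(y, x1, x2):
    """R-squared from regressing y on a constant, x1 and x2."""
    s11, s22, s12 = centred_cross(x1, x1), centred_cross(x2, x2), centred_cross(x1, x2)
    s1y, s2y = centred_cross(x1, y), centred_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    # For centred data the explained sum of squares equals b1*s1y + b2*s2y.
    return (b1 * s1y + b2 * s2y) / centred_cross(y, y)


rng = random.Random(2024)
n = 136                                          # same n as the handout; data are simulated
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1 + 0.7 * xi + rng.gauss(0, 1) for xi in x]
noise = [rng.gauss(0, 1) for _ in range(n)]      # unrelated to y by construction

r2_small = r2_one(y, x)
r2_big = r2_two(y, x, noise)
print(f"R2 without noise: {r2_small:.4f}, with noise: {r2_big:.4f}")
```

Whatever seed is used, the second value is never smaller than the first, although the gain from pure noise is typically tiny.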