
Research Methods

Lecture 2
The dummies' guide to STATA

Wiji Arulampalam
18/10/2006

Econometrics Software
You can use any software that does what you
need
See Timberlake for details of what does what
well [www.timberlake.co.uk]
PC Give is hard to beat for time series analysis
Microfit, EViews are good alternatives

STATA does (just about) everything.


STATA (and everything else) is available as a
delivered application on the network.

WHY STATA

Need to know how to use STATA for


(i) Econometrics A [next term]
(ii) Econometrics B [this term]
(iii) Panel Data Econometrics [next term]

E-Views demo will be given by the Econometrics tutors!

The above two should be sufficient



STATA

Hopefully you will have access by next week

So full demo next week

Stata command file wages.do and data file wages.dta are on the
module web page for you to practise with

STATA
Use STATA for:
large survey datasets (merging them)
complex nonlinear models (e.g. LDVs; but see also LimDep)
nonparametric and evaluation methods
Use STATA if you:
want to continue studying economics
want to be a professional economist
want to learn something new
hate PC Give.



Some useful websites


Stata's own resources for learning STATA
Stata website, Stata Journal, Stata library, Statalist archive
http://www.stata.com/links/resources1.html

Michigan's web-based guide to STATA (for SA)


UCLA resources to help you learn and use
STATA:
http://www.ats.ucla.edu/stat/stata
including movies and web-books

Accessing STATA
Available from your Delivered Applications

wstata.exe
Double click on icon!

Buttons/Menu

Enter commands here

OR use the do editor to


create a .do file


Results window

Better to save the output (more on this later)


Click for Extensive Help


OR

Type help in command line


help

Type help in command line

help xxx

Exit, clear

Click and point in v9

Exit, clear

Menu/tabs


Important features (1)


NOTE
Always use lowercase in STATA (it is case-sensitive)
Otherwise you can get very confused

More
--more-- in your output window means there is more output to come.
[Press spacebar and the next page appears]
The command set more off turns this off

Not enough memory [so reset!]


. set mem XXXm (allocate XXX mb of data)
. set matsize XXX (max matrix size XXX square)
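
As a rough sketch, the settings above could sit at the top of a do-file like this. The sizes 50m and 200 are arbitrary illustrations, not recommendations, and the remark about newer releases is an assumption about your version of STATA.

```stata
* Sketch only: the sizes below are arbitrary illustrations
* memory can only be reset when no data are loaded, so clear first
clear
* stop the output pausing at --more--
set more off
* allocate 50 MB for data (later releases manage memory themselves and ignore this)
set mem 50m
* allow matrices of up to 200 x 200
set matsize 200
```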

Important features (2)


To Break
To stop anything hit the break (menu button with red
cross, or hit Ctrl and C simultaneously)


Using data on disk (1)


Opening a dataset
datasets need to be rectangular
[variables in columns; observations in rows ]
Stata datasets have a .dta extension
STATA will also read Excel (csv) or text files
Otherwise use Stat/Transfer to convert files in other
formats to STATA files


Using data on disk (2)


There are several ways of getting data into
STATA: eg: wages.dta

. use wages (or click: file/open on the menu bar)


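. use lwage ed exp in 1/1000 if fem==1 using wages
(loads only the listed variables, and only those of the first 1000 observations with fem equal to 1)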
. insheet using wages.csv (or .txt)
(imports an Excel csv file or a text file)
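
A minimal sketch of these ways of loading data, best run as a do-file. It assumes auto.dta (a practice dataset shipped with STATA) as a stand-in for wages.dta; the file name cars_copy.csv is hypothetical and is written to the current directory.

```stata
* Sketch: auto.dta stands in for wages.dta
sysuse auto, clear

* save a copy to a temporary file so there is a .dta file to read from
tempfile cars
save `cars'

* load three variables only, from rows 1 to 50, keeping domestic cars
use price mpg foreign in 1/50 if foreign==0 using `cars', clear

* write the data out as comma-separated text, then read it back in
* (cars_copy.csv is a hypothetical file name)
outsheet using cars_copy.csv, comma replace
insheet using cars_copy.csv, clear
describe
```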

Opens the file


List of variables


Basic data reporting (1)


.describe (or press F3 key)
Lists the variable names and labels
.describe using wages
Lists the variable names etc WITHOUT loading the
data into memory (useful if the data is too big to fit)
.codebook
Tells you about the means, labels, missing values etc
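
The three reporting commands can be tried out as follows. This is a sketch that assumes auto.dta (shipped with STATA) in place of wages.dta, and it should be run as a do-file so that the temporary file name stays defined.

```stata
* Sketch: auto.dta stands in for wages.dta
sysuse auto, clear
* variable names, storage types and labels for the data in memory
describe

* the same information for a file on disk, without loading it
tempfile cars
save `cars'
clear
describe using `cars'

* load the file and look at two variables in detail
use `cars', clear
codebook price rep78
```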

Basic data reporting (2)


sort and count
.sort personid
sorts data by personid
.count if personid~=personid[_n-1]
counts how many distinct personids there are (each observation
whose personid differs from the one before starts a new person)
_n-1 is the previous observation
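
A small sketch of the sort-and-count idea, assuming auto.dta (shipped with STATA) with the variable foreign playing the role of personid. The tabulate cross-check is an addition for illustration.

```stata
* Sketch: foreign stands in for personid
sysuse auto, clear
sort foreign

* a row starts a new group when its value differs from the row before;
* for the first row, foreign[0] is missing, so the first group is counted too
count if foreign~=foreign[_n-1]

* cross-check: r(r) is the number of rows in the one-way table
quietly tabulate foreign
display r(r)
```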


First look at the data (1)


.list lwage ed exp in 1/10 if fem>=0
Lists lwage, ed and exp for those of the first 10 rows where fem>=0

.tab fem union (or tabulate)


[variables should be integers]
gives a crosstab of fem vs union
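
For practice without the wages data, the same two commands might look like this. The sketch assumes auto.dta (shipped with STATA); its variables replace lwage, ed, exp, fem and union.

```stata
* Sketch: auto.dta stands in for wages.dta
sysuse auto, clear

* first 10 rows of three variables, subject to a condition
list price mpg weight in 1/10 if foreign>=0

* crosstab of two integer-valued variables
tabulate foreign rep78
```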


First look at the data (2)


.summ fem union (or summarize or sum)
means, std devs etc for fem and union

.corr ed exp in 1/100 if fem<1 (,cov)


correlation coeffs (or covariances) for selected data
.pwcorr ed exp lwage [does all pairwise corr coeffs]
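
A sketch of these summary commands, assuming auto.dta (shipped with STATA) in place of the wages data. Note that correlate drops any observation with a missing value in any listed variable, whereas pwcorr uses all available observations for each pair.

```stata
* Sketch: auto.dta stands in for wages.dta
sysuse auto, clear

* means, standard deviations, minima and maxima
summarize price mpg
* percentiles, skewness and kurtosis as well
summarize price, detail

* correlations, then covariances, for a selected subsample
correlate mpg weight in 1/50 if foreign<1
correlate mpg weight in 1/50 if foreign<1, covariance

* all pairwise correlations, showing the number of observations for each pair
pwcorr mpg weight price rep78, obs
```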


Tabulating (1)
tab x1 x2 if x4==0, sum(x3)
gives the means of x3 for each cell of the x1 vs x2
crosstabulation for observations where x4=0

tab x1 x2, missing


Includes the missing values

tab x1 x2, nolabel


Uses numeric codes instead of labels
Eg 1 instead of NorthWest etc

Tabulating (2)
tab x1 x2, col
Gives % of column instead of count
Can get row percentages by using row instead
Or both by using row col

table educ ethnic, c(mean wage) row col


Customises the table so it includes the mean (or
median, max, count, sd, ...) of wage by cells
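
The tab options from the two tabulating slides can be tried as below. This is a sketch assuming auto.dta (shipped with STATA), with foreign, rep78, mpg and price standing in for x1 to x4. The table command is left out here because its syntax may differ between STATA releases.

```stata
* Sketch: auto.dta variables stand in for x1 to x4
sysuse auto, clear

* mean of mpg in each cell, for cars costing more than 4000
tabulate foreign rep78 if price>4000, summarize(mpg)

* include missing values as their own category
tabulate foreign rep78, missing

* show numeric codes instead of value labels
tabulate foreign rep78, nolabel

* row and column percentages alongside the counts
tabulate foreign rep78, row col
```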


Labelling
Always have your data comprehensively labelled
.label data "This is pooled GHS 90-99"
.label variable reg "region"
.lab define reglab 0 "North" 1 "South" 2 "Middle"
.lab values region reglab

Tedious to do for lots of variables


but then your output will be intelligibly labelled
other people will be able to understand it in future
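
A sketch of the labelling steps, assuming auto.dta (shipped with STATA). The label texts and the name originlab are made up for illustration.

```stata
* Sketch: label names and texts below are hypothetical
sysuse auto, clear

* label the dataset and one variable
label data "Car data used to practise labelling"
label variable mpg "Fuel economy (miles per gallon)"

* define a set of value labels, then attach it to a variable
label define originlab 0 "Home" 1 "Abroad"
label values foreign originlab

* the table now shows Home/Abroad instead of 0/1
tabulate foreign
```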

Data manipulation (1)


Data can be renamed, recoded, and transformed:
Command .generate or gen for short
. gen logrw=log((earn/hours)/rpi)
. gen agesq=age^2 (squares)
. gen region1=(region==1) (1 if true, 0 if not)

. gen ylagged=y[ _n-1 ]


(_n is the obs # in STATA)
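
The same four kinds of transformation, sketched on auto.dta (shipped with STATA). The new variable names are hypothetical.

```stata
* Sketch: new variable names are made up for illustration
sysuse auto, clear

* log transformation
gen lprice = log(price)
* square
gen weightsq = weight^2
* dummy variable: a logical expression is 1 when true and 0 when false
gen cheap = (price<5000)
* value from the previous observation (missing for the first row)
gen pricelag = price[_n-1]

list price lprice cheap pricelag in 1/5
```

Lags built with [_n-1] depend on the current sort order, so sort the data first if the order matters.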

Data manipulation (2)


Command recode:
. recode x1 (.=0) (1/5=1) (. is the missing value (mv))

. replace rate=rate/100
. replace age=25 if age==250
. egen meaninc=mean(income), by(region)
(see help egen for details)
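
A sketch of recode, replace and egen on auto.dta (shipped with STATA). The recoding rules and the "correction" of mpg are invented purely to show the syntax.

```stata
* Sketch: the rules below are illustrative, not meaningful recodings
sysuse auto, clear

* missing becomes 0, values 1 to 3 become 1, values 4 and 5 become 2
recode rep78 (.=0) (1/3=1) (4/5=2)

* rescale a variable in place
replace price = price/1000

* change one specific value (a hypothetical correction)
replace mpg = 40 if mpg==41

* group mean attached to every observation in the group
egen meanprice = mean(price), by(foreign)
list foreign price meanprice in 1/5
```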


Data selection (1)


You can also organise your data set with various
commands:
. keep if _n<=1000 ( _n is the observation number)
. drop region
. drop if ethnic~=1

keeps only the first 1000 observations, drops
region, and drops all the observations where the
variable ethnic is not equal to 1 (~= means "not equal to")

Data selection (2)


Then save the smaller file for subsequent
analysis
. save newfile
. save, replace (take care it overwrites existing file)
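
The selection and saving steps can be rehearsed safely as follows. This sketch assumes auto.dta (shipped with STATA) and saves to a temporary file, so no real file is overwritten; run it as a do-file.

```stata
* Sketch: a temporary file is used so nothing on disk is overwritten
sysuse auto, clear

* keep the first 50 observations only
keep if _n<=50
* drop one variable
drop trunk
* drop the observations where foreign is not equal to 0
drop if foreign~=0

* save the smaller dataset under a new (temporary) name
tempfile smallcars
save `smallcars'

* save, replace with no file name overwrites the file last used or saved
save, replace
```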


Functions
Lots of functions are possible.
See . help functions
Obvious ones like
log(), abs(), int(), round(), sqrt(), min(), max(), sum()

And many very specialised ones.


Statistical functions
distributions

String functions
Converting strings to numbers and vice versa

Date functions
Converting dates to numbers and vice versa

And lots more
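
A few of these functions can be tried directly with display. This is a sketch only; the input values are arbitrary.

```stata
* obvious ones
display log(10)
display abs(-3)
display int(2.7)
display round(2.7)
display sqrt(16)
display min(1,2,3)

* statistical: standard normal cumulative distribution
display normal(0)

* strings to numbers and back
display real("3.5") + 1
display string(42)

* dates are stored as days since 1 January 1960
display mdy(10,18,2006)
display %td mdy(10,18,2006)
```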



Command files
Stata command files have a .do extension
It is ALWAYS good practice to use a .do file
you will know exactly what you have done.
It makes it easy to develop ideas.
And correct mistakes.

. do wages.do, nostop
(echoes to screen, and keeps going after error
encountered)

Or . run wages.do

(executes silently)
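
As an illustration, a very small do-file might look like the sketch below. It is a hypothetical skeleton, not the contents of wages.do, and it assumes auto.dta (shipped with STATA) as practice data.

```stata
* example.do: a hypothetical skeleton, not the module's wages.do
* run the file under version 9 rules even in a later release
version 9
* start from an empty dataset and switch off --more-- pauses
clear
set more off

* load the data and take a first look
sysuse auto
describe
summarize price mpg
tabulate foreign rep78
```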

Keeping track of output (1)


Can scroll back your screen (up to a point)
But better to open a log file at the beginning of
your session, and close it at the end.
Click on file, log, begin. Or type
. log using myoutput
. [your commands]
. log close
[log command allows the replace and append options.]
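
A sketch of a logged session, assuming auto.dta (shipped with STATA) as practice data. The text option is used so that the log is written as myoutput.log in the current directory, and replace overwrites any earlier log of that name.

```stata
* close any log left open by an earlier run;
* capture suppresses the error if no log is open
capture log close

* open a plain-text log, overwriting any previous myoutput.log
log using myoutput, text replace

sysuse auto, clear
summarize price mpg

* close the log at the end of the session
log close
```

Use append instead of replace to add to an existing log rather than overwrite it.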

Keeping track of output (2)


Default is .smcl file extension (that STATA can
read)
.log extension gives an ASCII file that anything
can edit
ALWAYS LOG your output
Logging is also a good way of developing a .do file, since it
saves the commands as well as the output


Next Lecture
Monday 23rd October F107 11:00-12:00
STATA demo
