
STATA WORKSHOP 1

Econometrics (ECN3311)
John Musantu
What is Stata?
❑ Stata is a powerful statistical package with:
▪ Smart data-management facilities
▪ A wide array of up-to-date statistical techniques
▪ An excellent system for producing publication-quality graphs
❑ Available on a variety of operating systems (Windows, Mac OS and
Linux distributions)
❑ Also available in different varieties:
▪ IC (standard)
▪ SE (extended)
▪ MP (multiprocessing)
Other Statistical Packages
There are alternative statistical software packages you can use (to name a few):
❑ R
❑ Matlab
❑ SAS
❑ SPSS
❑ Gauss
❑ Gretl
❑ EViews
Learning Curve
Stata 14 Front End Graphic User Interface (GUI)
Stata 14 Front End

❑ Stata has a menu bar on the top and 5 internal windows.

❑ The main window is the one in the middle (1 on the previous slide).
▪ It gives you all the output of your operations in Stata.
❑ The command window (2) executes commands.
▪ You can type commands directly in this window as an alternative to
using the menu system.
▪ Stata will show you what the written command is for each action
performed using the drop-down menus.
❑ The review window (3) lists all the operations performed since
opening Stata. If you click on one of your past commands, it is
displayed in the Command window and you can re-run it by hitting
the Enter key.
❑ The variables window (4) lists the variables in the current dataset
(and their descriptions). When you double-click on the variable, it
appears in the Command window.
❑ The properties window (5) gives information about your dataset and
your variables.
Stata Basic Commands
❑ General syntax for writing commands in Stata:

▪ command variable(s) qualifiers, options

❑ We will make use of qualifiers ALL the time. Qualifiers allow us to restrict the group of
observations we are conducting our analysis on. The syntax for the various qualifiers in Stata is as follows:
and                              &
or                               |
is equal to                      ==
is not equal to                  !=
is greater than                  >
is less than                     <
is greater than or equal to      >=
is less than or equal to         <=
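
The general syntax and the qualifiers above can be sketched as follows. This is only an illustration: it assumes the auto example dataset that ships with Stata (loaded with sysuse), which is not the dataset used in this workshop, and the variable names (price, mpg, foreign, rep78, make) belong to that example dataset.

```stata
* Load an example dataset shipped with Stata (assumption: not the course data)
sysuse auto, clear

* command  variable(s)  qualifier,  options
summarize price mpg if foreign == 1, detail

* Combine conditions with & (and) and | (or)
count if foreign == 1 & mpg >= 25
count if rep78 == 1 | rep78 == 5

* != excludes a value; > and <= set bounds
list make price if price > 10000 & foreign != 1
list make mpg if mpg <= 14
```

Note that Stata treats a missing numeric value as larger than any number, so a condition such as price > 10000 would also select observations where price is missing.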
❑ browse

▪ Displays a spreadsheet of the data or, in other words, the data in its raw form. Browse is useful
for allowing you to look at what is going on in the data, especially when you are generating
new variables.

❑ describe

▪ Displays some information about the dataset as a whole and then about each individual
variable, including the variable label. Note that this command can be shortened to ‘des’ or just ‘d’.

❑ set more off

▪ When a command produces more output than fits on the screen (for example, describing a dataset
with 2000 variables), Stata pauses after each screenful. You can stop whatever Stata is executing by
pressing the red circle with a cross in it on the menu bar at the top. It is quicker and more efficient
to use set more off, which stops Stata from pausing. Do this before you run a command that generates
a lot of output.
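
A minimal sketch of these three commands, assuming the auto example dataset that ships with Stata rather than the course data (the variable names are from that example dataset):

```stata
* Example dataset shipped with Stata (assumption: not the course data)
sysuse auto, clear

* Stop Stata from pausing on long output
set more off

* Describe the whole dataset, then selected variables only
describe
describe price mpg

* The abbreviated form works the same way
d make

* Open the data in the spreadsheet view (needs the graphical interface)
browse

* Browse only some variables and observations
browse make price if foreign == 1
```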
❑ lookfor

▪ Sometimes we want to find a specific variable instead of paging through all the variables. This
command allows us to search for variables that contain specific letters/words either in the
variable name or in the variable label. For example, if we type in “lookfor employment”,
Stata will list all variables that have ‘employment’ in the name or label.

❑ codebook

▪ This command can be used to find out more about a specific variable, e.g. codebook
Employment. Codebook tells you what is stored in the variable: its name, label and type,
the range of values, how many unique values there are and the number of missing values.
It also gives examples of the content.

❑ list
▪ Another command to look at data is the list command, e.g. list Employment. It lists the
content of the dataset (i.e. each observation) for this variable.
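
The three commands above could be tried as follows. This is a sketch on the auto example dataset that ships with Stata, used here as an assumption because the course dataset (with the Employment variable) is not available; rep78, make and price are variables from that example dataset.

```stata
* Example dataset shipped with Stata (assumption: not the course data)
sysuse auto, clear

* Search variable names and labels for a word (not case-sensitive)
lookfor price
lookfor repair

* Inspect one variable: type, range, unique values, missing values
codebook rep78

* List selected variables for the first 10 observations
list make rep78 in 1/10

* List only the observations where rep78 is missing
list make price if missing(rep78)
```

Restricting list with in or if keeps the output short; without a restriction, list prints every observation.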
❑ tab

▪ The tab command is very useful and we will use it often in this course. Tab stands for ‘tabulate’ and, as the name
indicates, it puts values in a table. For example, to run a tabulation of education: tab w1_r_b7. Tab is very helpful for looking
at the data. Each row is a unique value that the variable can take, and the table tells you how frequent each of these values is. It also
gives the percentage of observations that gave that response. “Cum.” gives the cumulative percentage: the share of observations
with that value or any lower value.
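
A short sketch of tab in use, again assuming the auto example dataset that ships with Stata in place of the course data (w1_r_b7 is not available there, so rep78 stands in for it):

```stata
* Example dataset shipped with Stata (assumption: not the course data)
sysuse auto, clear

* One-way table: frequency, percent and cumulative percent per value
tabulate rep78

* tab is the short form; the missing option adds a row for missing values
tab rep78, missing

* Qualifiers work here too: tabulate only foreign cars
tab rep78 if foreign == 1
```

By default tab leaves missing values out of the table, so the percentages are computed over non-missing observations only unless the missing option is added.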
