Introduction to STATA
Stata is an all-purpose statistics package. Stata 8.0 is installed on all the lab machines. It is documented in a Reference Manual, a User's Guide, and a Getting Started manual, each of which can be found in the lab. Stata also has a built-in help utility and several on-line tutorials.
To start Stata, click the Start button, then point to Programs, then Stata, then Intercooled Stata. There may also be a Stata icon on the desktop. Four windows will appear: Review, Variables, Stata Results, and Stata Command.
The cd command
The cd command tells you what the current working directory is. This is the directory where Stata will look for data, save data, or save a log file. When you first open Stata the working directory is C:\data. You can change the working directory by giving cd a path. For example, the command
    cd c:\
changes the working directory to c:\
The dir command
The dir command lists the contents of the current directory. The command
    dir c:\stata
lists the contents of the directory c:\stata
Comments
Stata treats a line preceded by an asterisk as a comment.
    * this is a comment
The Display Command
The command display (abbreviated di) allows you to use Stata as a calculator.
    . di 2/3 + 9^(1/2)
    3.6666667
Entering data
There are five commonly used ways to enter data into Stata. Which method you use depends upon the way the data is made available to you.
A. Entering data from a previously saved Stata dataset.
If the data has been saved in a Stata dataset, the simplest way to open it is to use the Open command from the File menu. If the dataset has been stored with the name filename.dta it can also be loaded with the command
    use filename
You should try this. There is a file named auto.dta in the directory c:\stata8. Try the command use auto. If you get the reply "file auto.dta not found", that means that the file is not in the current directory. You can change the command to use c:\stata8\auto and it will load properly.
The command use (or Open) replaces the data in memory, if there is any, with new data. This would involve a loss of data if you have data in memory which has not been saved. In order to avoid this loss of data Stata will return an error message and will not load the new data. To clear the existing data from memory and replace it with new data, enter the command
    use filename, clear
(Special note for econ209/210/310: If you are using a file which is saved on the Economics server Irving you can use the command
    use "\\Irving\Data\econ209\wages"
A simpler way in this case is to choose Open from the File menu, then go to Network, then Irving, then Data, then econ209, then the file you are looking for. Similar instructions apply for econ 210 and 310.)
B. Entering data from a file.
Suppose the data consists of 100 observations of variables a, b, and c stored in a file filename.raw in the current working directory. This data can be loaded into Stata using the command
    infile a b c using filename
The extension .raw is assumed if you do not specify otherwise. If the data file has another extension the full filename should be used. In that case a simpler way is to use Import from the File menu. The File menu Import command also contains options for other types of data.
C. Data from other programs.
If your data is stored as a special file from another program, Microsoft Excel for example, you can convert the data to a Stata dataset using the program StatTransfer, which is on each of the Econlab machines.
D. Entering data using Stata's spreadsheet editor.
Stata has a spreadsheet-like editor which can be accessed by using the command
    edit
Instructions for using the editor are found in chapter 4 of the Getting Started manual.
E. Entering data from the keyboard.
Data may be entered from the keyboard using the input command. Examples:
    input x1
    input x1 x2 x3
After the last observation is entered, type end.
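The loading methods above can be sketched as a short sequence of commands. This is only an illustration under stated assumptions: it assumes the example dataset auto.dta that ships with Stata can be reached with sysuse (rather than through the c:\stata8 path used above), and the file names mycars and mydata are hypothetical.

```stata
* Show the current working directory (Windows behaviour of cd with no argument)
cd

* A. Load a saved Stata dataset, discarding whatever is in memory
sysuse auto, clear
* mycars is a hypothetical name; the file is written to the working directory
save mycars, replace
use mycars, clear

* B. Write three variables to a raw text file, then read them back with infile
*    (mydata.raw is a hypothetical file; .raw is the default extension)
outfile price mpg weight using mydata, replace
infile a b c using mydata, clear
summarize a b c

* E. Enter a small dataset from the keyboard
clear
input x1 x2 x3
1 2 3
4 5 6
end
list
```

The clear option matters in each step: without it, use and infile refuse to overwrite unsaved data in memory.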
Stata Commands
Commands to report on data:
The commands list, describe, and summarize can be used to examine the data. The command list lists all the variables observation by observation. For example, the command list x1 x2 x5 lists the variables x1, x2, and x5. The command describe describes the nature of all the variables in memory. The command summarize lists, for each of the variables in memory, the number of observations, the mean, the standard deviation, the maximum and the minimum. Describe and summarize can be restricted to a specific list of variables just like the command list. A few other commands for data reporting include table and tabulate.
The commands drop and keep:
The commands drop and keep are used to alter the number of variables in the data set. For example, the command drop x1 x2 x3 eliminates the variables x1, x2, and x3. The command keep x1 x2 x3 keeps the variables x1, x2, and x3 and drops all the rest.
Commands to manipulate data:
The most basic Stata commands for manipulating data are generate and replace. Generate (abbreviated g) is used to create a new variable. Replace is used to replace the values of a variable which already exists. Generate and replace use the familiar arithmetic operators: +, -, *, /, ^. Examples:
    generate x3 = x2/x1
    generate x4 = x3^2
    replace x4 = x3^3
    g lny = ln(y)      (or g lny = log(y))
    g expy = exp(y)
    g sqrtx = sqrt(x)
    g y = x + z if t > 0
Dummy variables can be created in the same way. Each of the following is a separate example:
    g d = 0
    replace d = 1 if t >= 1978

    g d = 0
    replace d = 1 if age > 21 & age <= 35

    g d = 1 if region == "west"
In the last example d is set to missing for observations outside region "west".
Logical operators appear in many expressions. They are & (and), | (or) and ~ (not). Also, the symbol == (double equal sign) means a relational equality rather than an assignment equality.
Lagged variables.
Lagged values of variables are frequently used in economics. A lagged variable can be created in the following way. First define a time series index. For example, this can be done with the command g t = _n followed by the command tsset t. Then lagged values can be designated as L.varname. For example, L.y designates a lagged value of the variable y, and L2.invest designates the variable invest lagged twice. The "L" can be capitalized or not as you prefer.
The command egen (extensions to generate):
This command provides several built-in functions including mean, standard deviation, maximum, minimum, median, count, row mean, row standard deviation, moving average, etc. (If you need to use egen it is documented in the Reference Manual and in help egen.) Examples of egen follow. Note that egen always takes a function on the right-hand side, so the deviation from the mean is computed with generate:
    egen xbar = mean(x)
    generate xdev = x - xbar
    egen z = mean(2*x2 + x4)
    egen racesex = group(race sex)
Command Syntax
With few exceptions Stata commands have the following syntax:
    [by varlist:] command [varlist] [= exp] [if exp] [in range] [weight] [, options]
The elements enclosed by square brackets are optional.
The by varlist: prefix instructs Stata to perform the indicated operation for every value of the variables which appear in the varlist. For example, the command by y: summarize x1 x2 summarizes the variables x1 and x2 for each value which y takes on. The data must first be sorted by y, for example with the command sort y.
The if exp part of a command instructs Stata to perform the command if the expression is non zero or true. For example, the command generate z = x1 + x2 if y creates the new variable z for all observations with a non zero y. The command summarize x if region == "west" summarizes the variable x for all observations in region "west". (Note: the double equals sign means relational equality.)
In range instructs Stata to perform the indicated operation for observations within a specified range. For example, the command list x2 x3 in 34/41 lists the values of the variables x2 and x3 in observations 34 through 41.
Weight specifies the weights for the command. For example, the command summarize y [weight = x2] summarizes the variable y, calculating the mean and standard deviation using the variable x2 as weights. Notice that in the case of weights the square brackets must be included. (See help weights.)
Logging Results
The results of your work can be saved in a log file. The command log using filename, text opens a log file with the name filename.log. A log file created in this way is a text file and can be edited with a word processing program or a text editing program. If you simply use the command log using filename, the file will be saved as filename.smcl. This file is neither readable nor printable as a text file. However, it can be converted to a printable file with the following command:
    translate filename.smcl filename.log
The log can be turned off by using the command log off. The log can be resumed with the command log on. When you are done you can close the log file with the command log close.
Do Files
Do files are programs or macros in Stata. You can use a do file whenever there is a set of commands which you want to repeat. You can create a do file in any text editor such as WordPad or Notepad, or as text output from a word processor. Stata also has a built-in do file editor.
When you are working in Stata the commands which you have executed appear in the Review window. You can repeat a command by simply clicking on the command. You can save the commands in the Review window as a do file by clicking on the box at the upper left of the Review window and choosing the Save Review Contents option. You can then use the do file editor to edit the commands.
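As an illustration of how the pieces above fit together, here is a minimal sketch of a do file that opens a text log, manipulates data, and closes the log. The file name example.do, the log name example, and the choice of the shipped auto dataset (loaded with sysuse) are assumptions made for this sketch; the variable names price, mpg, foreign, and make belong to that dataset, not to the examples in the text.

```stata
* example.do (hypothetical file name)
* Close any log left open by an earlier run, then start a text log
capture log close
log using example, text replace

* Load the example dataset that ships with Stata
sysuse auto, clear

* generate and replace
generate lnprice = ln(price)
generate d = 0
replace d = 1 if price > 6000 & foreign == 1

* egen needs a function; the deviation itself is built with generate
egen pbar = mean(price)
generate pdev = price - pbar

* Time index and a lag. The auto data are not a time series;
* this only illustrates the tsset and L. syntax
generate t = _n
tsset t
generate lagprice = L.price

* by prefix (after sorting), if qualifier, in range
sort foreign
by foreign: summarize price mpg
summarize price if foreign == 1
list make price in 1/5

log close
```

If the file is saved in the working directory, typing do example in the Stata Command window runs every line and leaves the results in example.log.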
Regression
The command reg y x2 x3 x4 instructs Stata to run the regression of y on a constant and the explanatory variables x2, x3, and x4. If a regression through the origin is desired the command is reg y x2 x3 x4, noconstant. There are a variety of other options.
There are also a variety of post-regression commands.
The command test can be used to test a variety of hypotheses about the coefficients. For example, suppose I run the regression
    reg y x2 x3 x4
(The examples below that involve x5 assume that x5 is also included among the regressors.) Examples:
    test x2
    test x3 x4
    test x2 = x4
    test x2 = 3
    test 2*x3+x4 = 4
These commands test respectively the following hypotheses: coefficient of x2 = 0; coefficients of x3 and x4 are both = 0; coefficient of x2 = coefficient of x4; coefficient of x2 = 3; and 2 times the coefficient of x3 + coefficient of x4 = 4.
The test command can be used to test two or more hypotheses jointly. For example,
    test x2 = x3 = x4 = x5
This command instructs Stata to test the joint hypothesis that the coefficients of all these variables are equal.
More complex joint hypotheses can be tested using the accumulate option (abbreviated acc) after the test command. For example:
    test x2 = x3
    test x4 = 3, acc
    test x5 = 10*x2, acc
This set of commands tests the joint hypothesis that the coefficients of x2 and x3 are equal, the coefficient of x4 = 3, and the coefficient of x5 = 10 times the coefficient of x2.
Another useful post-regression command is predict. This command can be used to, among other things, find the fitted values of y, find the residuals, or make out of sample predictions. The command predict yhat puts the fitted values from the regression in the variable yhat. The command predict e, residuals puts the estimated residuals in a variable called e. The command predict se, stdp stores the estimated standard deviation (the standard error) of the fitted value as the variable se.
In addition to the command predict there are several diagnostic commands available after the regression command: vif, ovtest, hettest, dwstat, etc.
Accessing estimation results:
Every estimation command saves information which can be accessed. For example, the coefficients and standard errors are saved as _b[variable name] and _se[variable name]. For example, the coefficient of x3 is _b[x3].
Other estimation commands:
Stata has a variety of estimation commands which are similar to the regress command. They include probit, logit, oprobit, tobit, poisson, etc.
Graphing:
Stata has a wide array of graphic commands. The simplest command, twoway scatter y x, will produce a graph of y against x. The command twoway scatter y z x will produce a graph of y and z against x.
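The regression commands above can be tried end to end on the example dataset that ships with Stata. This is a hedged sketch, not part of the original handout: it assumes sysuse auto is available, and the regressors mpg, weight, and length stand in for the x2, x3, and x4 of the text. The hypotheses are chosen only to show the syntax and have no economic meaning.

```stata
* Load the shipped example data and run a regression with a constant
sysuse auto, clear
regress price mpg weight length

* Single and joint tests on the coefficients
test mpg
test weight length
test mpg = weight

* Build up a joint hypothesis one restriction at a time
test mpg = weight
test length = 0, accumulate

* Fitted values, residuals, and the standard error of the fitted value
predict phat
predict e, residuals
predict se, stdp

* Saved coefficient and standard error for one regressor
display _b[mpg]
display _se[mpg]

* Actual and fitted price plotted against mpg
twoway scatter price phat mpg
```

Because predict refers to the most recent estimation command, the predict lines must come before any other model is estimated.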