
STATA COMMANDS

Note: Brackets indicate a variable name (do not include the brackets). A vertical bar indicates a mandatory choice.

WILDCARDS
    var*    refers to all variables starting with "var"
    var?    refers to all variables starting with "var" and with one additional character

VARIABLE MANAGEMENT

CREATE A NEW VARIABLE
    generate [new variable name] = function

DELETE A VARIABLE
    drop [variable name]

CREATE A NORMALLY DISTRIBUTED VARIABLE
    generate [new variable name] = rnormal()

SHOW DATA
    list [variable name]

CONVERT STRING VARIABLE TO NUMERIC VARIABLE
    destring [string variable name], replace | generate([new variable name])

CREATE A SEQUENCE OF DUMMIES BASED ON A CATEGORICAL VARIABLE
    tabulate [catvar], generate(dumvar)
    Note: The sequence of dummy variables (in this example) will be called dumvar1, dumvar2, dumvar3, etc.

CONVERT LABELS TO NUMERIC IDENTIFIERS
    egen [new numeric identifier variable] = group([variable containing labels])

CHANGE NUMBER OF OBSERVATIONS
    set obs [number of observations]
    Note: set obs can only increase the number of observations in the data set.

DECLARE DATA SET TO BE TIME SERIES
    tsset [date variable]

DECLARE DATA SET TO BE 2-D PANEL
    tsset [cross-section variable] [date variable]

USE A SUBSET OF THE DATA
    regress ... if [variable] [condition]
    .     indicates a missing observation and is treated as larger than any number; hence, "if [variable] < ." omits missing observations
    &     indicates "and"
    ==    indicates "equality"
    |     indicates "or"
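As an illustration of the subset conditions and the dummy-variable command above, here is a minimal sketch. It assumes the auto example data set that ships with Stata; the data set and the names rep78, price, mpg, make, and repdum are choices made for this example only.

```stata
* Load an example data set shipped with Stata (assumption: any data set would do)
sysuse auto, clear

* rep78 contains missing values; "< ." keeps only the non-missing observations
regress price mpg if rep78 < .

* Combine conditions with & (and) and | (or)
list make price if price > 10000 & rep78 < .
list make rep78 if rep78 == 1 | rep78 == 5

* Create dummies repdum1, repdum2, ... from the categorical variable rep78
tabulate rep78, generate(repdum)
```

Because a missing value compares as larger than any number, a condition such as "price > 10000" alone would also keep observations where price is missing, which is why the "< ." check is worth adding on variables that have gaps.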

REGRESSION

OLS REGRESSION
    regress [dependent variable] [regressor 1] [regressor 2] ... [regressor N]

OLS REGRESSION WITH ARMA AND UNIT ROOT CORRECTION
    arima [dependent variable] [regressor 1] [regressor 2] ... [regressor N], arima([AR order],[integration order],[MA order])
    ar(n/m) specifies AR orders n through m; ar(n m) specifies AR orders n and m.
    Use the option condition after arima to specify conditional maximum likelihood instead of unconditional maximum likelihood; conditional ML is sometimes necessary when performing ARMA correction in a panel model.

OLS REGRESSION WITH HETEROSKEDASTICITY CORRECTION
    regress [dependent variable] [regressor 1] [regressor 2] ... [regressor N], vce(hc3)

PANEL REGRESSION (GLS when using random effects)
    xtreg [dependent variable] [regressor 1] [regressor 2] ... [regressor N], [option]
    For [option], use re for random effects, fe for cross-sectional fixed effects, and be for the between-effects estimator (a regression on group means, not time-specific fixed effects).

TWO STAGE LEAST SQUARES
    ivregress 2sls [dependent variable] ( [endogenous variable] = [list of instruments] ) [list of exogenous variables]
    regress [dependent variable] [list of endogenous and exogenous regressors] ( [list of exogenous regressors and instruments] )
    Both methods perform 2SLS, but the first method allows for the post-estimation endogeneity tests.

NON-LINEAR LEAST SQUARES
    nl ([equation])
    [equation] takes the form (for example) y={alpha=0}+{beta=.5}*x
    Note: nl does not like missing data in the regressors. Eliminate them with if, for example:
    nl (y={alpha}+{beta1}*x1+{beta2}*x2) if x1<. & x2<.
    Note: The missing value indicator (.) is treated as larger than any number.

CONSTRAINED REGRESSION
    constraint define [num] [variable] = [variable]
    [num] is the constraint number; [variable] refers to the coefficient attached to the variable
    cnsreg [dependent variable] [list of independent variables], constraints([num])

LOGIT REGRESSION
    logit [dependent variable name] [independent variable names]

ORDERED LOGIT REGRESSION
    ologit [dependent variable name] [independent variable names]

MULTINOMIAL LOGIT REGRESSION
    mlogit [dependent variable name] [independent variable names]

TRUNCATED REGRESSION
    truncreg [dependent variable name] [independent variable name], ll([lower truncation limit]) ul([upper truncation limit])
    Note: Specify ll(), ul(), or both.
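The regression commands above can be sketched on the auto example data set that ships with Stata. Everything specific here is an assumption made for illustration: the data set, the choice of displacement as an instrument for mpg (no claim is made that it is a valid instrument), the nl starting values, and the constraint itself.

```stata
* Example data shipped with Stata (assumption: chosen only for illustration)
sysuse auto, clear

* 2SLS: mpg treated as endogenous, displacement as a hypothetical instrument
ivregress 2sls price (mpg = displacement) weight

* Endogeneity tests are available after ivregress
estat endogenous

* Non-linear least squares with starting values, skipping missing data
nl (mpg = {alpha=40} + {beta=-0.005}*weight) if mpg < . & weight < .

* Constrained regression: force the coefficients on mpg and weight to be equal
constraint define 1 mpg = weight
cnsreg price mpg weight, constraints(1)
```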

CENSORED REGRESSION
    cnreg [dependent variable name] [independent variable name], censored([filter variable name])
    Note: The filter variable is a vector where -1 indicates that the observation on the dependent variable is left censored, 1 indicates that the observation is right censored, and 0 indicates that the observation is not censored.
    Note: Left censored means "true measure is less than or equal to recorded measure"; right censored means "true measure is greater than or equal to recorded measure".

FITTED MEASURES, RESIDUALS, FORECAST STANDARD ERRORS FROM LAST REGRESSION
    Note: These commands generate residuals and forecasts based on the last run regression.

    OLS:
        Prediction:                 predict [new variable name]
        Forecast Standard Error:    predict [new variable name], stdf
        Residuals:                  predict [new variable name], residuals

    Panel Data:
        Residual plus fixed effects (total residual):             predict [new variable name], ue
        Fixed effects (individual specific residual component):   predict [new variable name], u
        Non-specific residual:                                    predict [new variable name], e

    NLS:
        Prediction:                 predictnl [new variable name] = predict()
        Forecast Standard Error:    predictnl [new fitted variable name] = predict(), se([new se variable name])

GRANGER CAUSALITY TEST
    var [list of dependent variables], lags(A/B)
    [A and B are the lower and upper limits on the lags]
    vargranger
    Note: The null hypothesis is no Granger causality.

TESTS

TEST FOR NORMALITY
    sktest [variable name]
    Note: The null hypothesis is normality.

PORTMANTEAU (Q) TEST FOR SERIAL CORRELATION
    wntestq [variable name], lags(#)

CORRELOGRAM
    corrgram [variable name]
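The OLS prediction commands and the normality test listed earlier can be combined as in the following minimal sketch. It assumes the auto example data set that ships with Stata; the new variable names (price_hat, price_sef, price_res) are hypothetical.

```stata
* Example data shipped with Stata (assumption: chosen only for illustration)
sysuse auto, clear

* predict always refers to the most recently run regression
regress price mpg weight

* Fitted values (the default), forecast standard errors, and residuals
predict price_hat
predict price_sef, stdf
predict price_res, residuals

* Test the residuals for normality (null hypothesis: normality)
sktest price_res
```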

BREUSCH-PAGAN TEST FOR HETEROSKEDASTICITY
    estat hettest
    (run this after running a regression)

TESTS FOR ENDOGENEITY
    estat endogenous
    (run this after running ivregress)

DICKEY-FULLER TEST FOR NON-STATIONARITY
    dfuller [variable name], [option]
    [option] = {noconstant, trend, drift}
    Note: The null hypothesis is non-stationarity.

PHILLIPS-PERRON TEST FOR NON-STATIONARITY
    pperron [variable name], [option]
    [option] = {noconstant, trend}
    Note: The null hypothesis is non-stationarity.
    Note: Use this test when testing a residual for non-stationarity.

HAUSMAN TEST FOR RANDOM EFFECTS
    Note: The null hypothesis is that random effects are consistent and efficient.

SERIAL CORRELATION TEST FOR PANEL DATA
    xtserial
    Note: This is a module that must be installed. See MODULES section.

TRANSFORMATIONS
    D.[variable name]    First difference in the variable
    L.[variable name]    Variable lagged one period

GRAPHING
    twoway (scatter [y1 variable] [y2 variable] ... [x variable])
    plot [y variable] [x variable]

MODULES
    ssc new                      List new modules available.
    ssc install [module name]    Install an available module.
    findit [module name]         Locate a module by name.
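For the Hausman test listed under TESTS, one common sequence is to estimate the model with fixed and then random effects, store both sets of estimates, and compare them. The sketch below assumes the nlswork example data set, which webuse downloads from the Stata website (internet access required); the regressors and the stored names fixed and random are hypothetical choices for this example.

```stata
* Download an example panel data set (assumption: requires internet access)
webuse nlswork, clear

* Declare the panel structure: cross-section variable, then date variable
tsset idcode year

* Fixed-effects estimates, stored under the hypothetical name "fixed"
xtreg ln_wage age tenure, fe
estimates store fixed

* Random-effects estimates, stored under the hypothetical name "random"
xtreg ln_wage age tenure, re
estimates store random

* Compare the two: the consistent estimator is listed first
hausman fixed random
```

A small p-value is evidence against the null hypothesis, which would favor the fixed-effects specification.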

OTHER

SAVE COMMANDS AND OUTPUT
    log using [filename]            Writes all subsequent commands and output to a file.
    log using [filename], noproc    Writes all subsequent commands, but no output, to a file.
    log off                         Suspends logging.
    log on                          Resumes logging.
    log close                       Stops logging and closes the log file.

RESTRICT OPERATION TO A SUBSET OF THE DATA
    [command] in [starting observation]/[ending observation]

GENERATE CORRELATION MATRIX
    correlate [variable name] [variable name] ...

FOR NEXT LOOP
    forvalues i = [start]([step])[end] {
        generate x`i' = rnormal()
    }

GENERATE DATE VARIABLES FROM STRING VARIABLES
    generate [newvar] = date([var], mask)
    examples of mask:
        5/12/2008              --> "MDY"
        May 12, 2008           --> "MDY"
        2008-05-12 10:18:00    --> "YMDhms" (for date-and-time strings, use clock() in place of date())
        Wed May 12             --> "#MD"

DECLARE A VARIABLE TO BE A DATE
    format [var] [filter]
    examples of filter:
        %td    --> date
        %tw    --> week
        %tm    --> month

GETTING HELP
    findit [command]    Searches help and online databases for information on the command or statement.
    help [command]      Provides help on a specific command.
    search [terms]      Searches help text for the specified terms.
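The loop and date commands above can be sketched together on a small artificial data set. The number of observations, the seed, and the variable names (x1 to x3, datestr, mydate) are all assumptions made for this example.

```stata
* Start from an empty data set with 5 observations
clear
set obs 5
set seed 12345

* Loop from 1 to 3 in steps of 1, creating x1, x2, x3
forvalues i = 1(1)3 {
    generate x`i' = rnormal()
}

* Convert a string such as 5/12/2008 into a Stata date using the "MDY" mask
generate datestr = "5/12/2008"
generate mydate = date(datestr, "MDY")

* Display the numeric date in a readable daily format
format mydate %td
list datestr mydate
```

The opening brace has to sit on the same line as forvalues and the closing brace on a line of its own, which is why the loop is written across three lines.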