You are on page 1of 2

Stata to R :: CHEAT SHEET

Introduction Summarize Data example data: `wage1` Estimate Models, 1/2


This cheat sheet summarizes common Stata Where Stata only allows one to work with one data set
commands for econometric analysis and provides at a time, multiple data sets can be loaded into the R OLS example data: `wage1`

their equivalent expression in R. environment simultaneously, and hence must be reg wage educ // simple regression mod1 <- lm(wage ~ educ, data =
specified with each function call. Note: R does not have of `wage` by `educ` (Results wage1) # simple regression of
an equivalent to Stata’s `codebook` command. printed automatically). `wage` by `educ`, store results in
References for importing/cleaning data, manipulating `mod1`
variables, and other basic commands include Hanck summary(mod1) # print summary of
browse // open browser for loaded data
et al. (2019), Econometrics with R, and Wickham and reg wage educ if nonwhite==1 // `mod1` results
Grolemund (2017), R for Data Science. add condition with if statement
describe // describe structure of
loaded data mod2 <- lm(wage ~ educ, data =
Example data comes from Wooldridge Introductory summarize // display summary reg wage educ exper, robust // wage1[wage1$nonwhite==1, ]) # add
Econometrics: A Modern Approach. Download Stata statistics for all variables in multiple regression using HC1 condition with if statement`
data sets here. R data sets can be accessed by dataset robust standard errors mod3 <- estimatr::lm_robust(wage ~
installing the `wooldridge` package from CRAN. list in 1/6 // display first 6 rows reg wage educ exper, educ + exper, data = wage1, se_type
cluster(numdep) // use clustered = “stata”) # multiple regression
tabulate educ // tabulate `educ` standard errors with HC1 (Stata default) robust
All R commands written in base R, unless otherwise variable frequencies standard errors, use {estimatr}
noted. tabulate educ female // cross-tabulate package
`educ` and `female` frequencies Tip: An alternate way to compute robust mod4 <- estimatr::lm_robust(wage ~
standard errors in R for any models not
Setup covered by {estimatr} package is load the
educ + exper, data = wage1,
clusters = numdep) # use clustered
Note: While it is common to create a `log` file in {AER} package and run: standard errors.
Stata to store the commands and output of Stata View(wage1) # open browser for loaded
`wage1` data coeftest(mod1, vcov. = vcovHC,
sessions, the equivalent does not exist in R. A more
savvy version in R is to create a R-markdown file to type = "HC1")
str(wage1) # describe structure of mod_log <- glm(inlf~nwifeinc + educ
capture code and output. `wage1` data + family=binomial(link="logit"),
summary(wage1) # display summary data=mroz) # estimate logistic
ssc install outreg2 // install statistics for `wage1` variables
MLE (Logit/Probit/Tobit) example data:`mroz` regression
`outreg2` package. Note: unlike R head(wage1) # display first 6 (default)
packages, Stata packages do not have rows data
to be loaded each time once installed. tail(wage1) # display last 6 rows logit inlf nwifeinc educ // mod_pro <- glm(inlf~nwifeinc + educ
estimate logistic regression + family=binomial(link=“probit"),
table(wage1$educ) #tabulate `educ` data=mroz) # estimate logistic
install.packages(“wooldridge”) # install frequencies regression
table(“yrs_edu” = wage1$educ, “female” =
probit inlf nwifeinc educ //
`wooldridge` package
wage1$female) # tabulate `educ` estimate logistic regression
data(package = “wooldridge”) # list frequencies name table columns mod_tob <- AER::tobit(hours ~
datasets in `wooldridge` package nwifeinc + educ, left = 0, data =
tobit hours nwifeinc educ, ll(0) mroz) # estimate tobit regression,
Tip: The {AER} package will automatically // estimate tobit regression,
load(wage1) # load `wage1` dataset into lower-limit of y censored at zero,
load other useful dependent packages, lower-limit of y censored at zero use {AER} package
session including: {car}, {lmtest}, {sandwich} which
?wage1 # consult documentation on are used for many of the commands listed in
`wage1` dataset this cheat sheet. Postestimation, 1/2 example data:`wage1`

Note: Postestimation commands in Stata apply to the most recently run estimation commands.
Basic plots example data:`wage1`

hist(wage) // histogram of `wage` hist(wage1$wage) # histogram of `wage` reg wage educ // estimation used mod1 <- lm(wage ~ educ, data =
hist(wage), by(nonwhite) // for the following post-estimation wage1) # estimation used for the
scatter(wage educ) // scatter plot commands following post-estimation commands
plot(y = wage$1wage, x = wage1$educ) #
of `wage` by `educ` scatter plot predict yhat // get predicted yhat <- predict(mod1) # get
twoway (scatter wage educ) (lfit abline(lm(wage1$wage~wage1$educ), values from last estimation, store predicted values
wage educ) // scatter plot with col=“red”) # add fitted line to as `yhat`
fitted line scatterplot
predict e, res // get residuals e <- residuals(mod1) # get residual
graph box wage, by(nonwhite) //
boxplot(wage1$wage~wage1$nonwhite) # from last estimation, store as `e` values
boxplot of wage by `nonwhite`
boxplot of `wage` by `nonwhite`

CC BY SA Anthony Nguyen • @anguyen1210 • mentalbreaks.rbind.io • version 1.0.0 • Updated: 2019-10


Create/Edit Variables example data: `wage1` Estimate Models, 2/2
Note: where Stata only allows one to work with one data set at a time, multiple data sets can be loaded into
the R environment simultaneously, hence the data set must be specified for each command. Panel/Longitudinal example data: `murder`

xtset id year // set `id` as plm::is.pbalanced(murder$id,


gen exper2 = exper^2 // create wage1$exper2 <- wage1$exper^2 # murder$year) # check panel balance
`exper` squared variable create `exper` squared variable entities (panel) and `year` as
time variable with {plm} package
egen wage_avg = mean(wage) // create wage1$wage_avg <- mean(wage1$wage) #
create average wage variable xtdescribe // describe pattern of modfe <- plm::plm(mrdrte ~ unem,
average wage variable index = c("id", "year"),model =
xt data
"within", data = murder) # estimate
drop tenursq // drop `tenursq` xtsum // summarize xt data fixed effects (“within”) model
variable wage1$tenursq <- NULL #drop `tenursq`
xtreg mrdrte unem, fe // fixed summary(modfe) # display results
effects regression
keep wage educ exper nonwhite // keep wage1 <- wage1[ , c(“wage”, “educ”,
selected variables “exper”, “nonwhite”)] # keep selected
variables Instrumental Variables (2SLS) example data: `mroz`

tab numdep, gen(numdep) // create wage1 <- ivreg lwage (educ = fatheduc), modiv <-AER::ivreg(lwage ~ educ |
fastDummies::dummy_cols(wage1, first // show results of first fatheduc, data = mroz) # estimate
dummy variables for `numdep` 2SLS with {AER} package
select_columns = “numdep”) # create stage regression
recode exper (1/20 = 1 "1 to 20 dummy variables for `numdep`, use summary(modiv, diagnostics = TRUE)
{fastDummies} package etest first // test IV and
years") (21/40 = 2 "21 to 40 years") # get diagnostic tests of IV and
endogenous variable endogenous variable
(41/max = 3 "41+ years"), ivreg lwage(educ = fatheduc) //

{
gen(experlvl) // recode `exper` and wage1$experlvl <- 3 # recode `exper`
show results of 2SLS directly
gen new variable wage1$experlvl[wage1$exper < 41] <- 2
wage1$experlvl[wage1$exper < 21] <- 1

Post-estimation, 2/2 example data: `wage1`


Statistical tests / diagnostics example data: `wage1`
Note: Postestimation commands in Stata apply to the most recently run estimation commands.
reg lwage educ exper // estimation mod <-lm(lwage ~ educ exper, data =
used for examples below wage1) # estimate used for examples
reg lwage educ exper##exper // mod1 <- lm(lwage ~ educ + exper +
estat hettest // Breusch-Pagan / below
estimation used for following post- I(exper^2), data = wage1) # Note: in
Cook-Weisberg test for lmtest::bptest(mod) # Breusch-Pagan estimation commands R, mathematical expressions inside a
heteroskedasticity / Cook-Weisberg test for hetero- formula call must be isolated with
skedasticity using the {lmtest} estimates store mod1 // stores in
estat ovtest // Ramsey RESET test `I()`
package memory the last estimation results
for omitted variables to `mod1`
lmtest::resettest(mod) # Ramsey
ttest wage, by(nonwhite) // RESET test
independent group t-test, compare margins // get average predictive margins::prediction(mod1) # get
t.test(wage ~ nonwhite, data =
means of same variable between margins average predictive margins with
wage1) # independent group t-test
groups {margins} package
margins, dydx(*) // get average
m1 <- margins::margins(mod1) # get
marginal effects for all variables
Interactions, categorical/continuous variables example data: `wage1`
marginsplot // plot marginal
average marginal effects for all
variables
In Stata, it is common to use special operators to specify the treatment of variables as continuous (`c.`) or effects plot(m) # plot marginal effects
categorical (`i.`). Similarly, the `#` operator denotes different ways to return the interaction of those
variables. Here we show some common uses of these operators as well as their R equivalents. margins, dydx(exper) // average summary(m) # get detailed summary of
marginal effects of experience marginal effects
reg lwage i.numdep // treat lm(lwage ~ as.factor(numdep), data
margins, at(exper=(1(10)51)) //
= wage1) # treat `numdep` as factor margins::prediction(mod1, at =
`numdep` as a factor variable average predictive margins over list(exper = seq(1,51,10))) #
reg lwage c.educ#c.exper // return lm(lwage ~ educ:exper, data = `exper` range at 10-year increments predictive margins over `exper` range
interaction term only wage1) # return interaction term at 10-year increments
only
reg lwage c.educ##c.exper // return estimates use mod1 // loads `mod1`
full factorial specification lm(lwage ~ educ*exper, data = stargazer::stargazer(mod1, mod2, type
wage1) # return full factorial back into working memory
reg lwage c.exper##i.numdep // = “text”) # use {stargazer} package,
specification estimates table mod1 mod2 // with `type=text` to display results
return full, interact continuous display table with stored
and categorical lm(wage ~ exper*as.factor(numdep), within R. Note: `type= ` also can be
data = wage1) # return full, estimation results changed for LaTex and HTML output.
interact continuous and categorical

CC BY SA Anthony Nguyen • @anguyen1210 • mentalbreaks.rbind.io • version 1.0.0 • Updated: 2019-10

You might also like