Cheat Sheet: With Stata

Data Analysis with Stata Cheat Sheet
For more info, see Stata's reference manual (stata.com)
Results are stored as either r-class or e-class. See the Programming Cheat Sheet.

Declare Data
By declaring data type, you enable Stata to apply data munging and analysis functions specific to certain data types.

TIME SERIES (webuse sunspot, clear)
tsset time, yearly
    declare sunspot data to be yearly time series

PANEL / LONGITUDINAL (webuse nlswork, clear)
xtset id year
    declare national longitudinal data to be a panel
TIME SERIES, continued (sunspot data)
tsreport
    report time-series aspects of a dataset
generate lag_spot = L1.spot
    create a new variable of annual lags of sunspots
tsline spot
    plot time series of sunspots (y axis: number of sunspots)
arima spot, ar(1/2)
    estimate an autoregressive model with 2 lags

TIME-SERIES OPERATORS
L.   lag                    x_{t-1}
L2.  2-period lag           x_{t-2}
F.   lead                   x_{t+1}
F2.  2-period lead          x_{t+2}
D.   difference             x_t - x_{t-1}
D2.  difference of difference   (x_t - x_{t-1}) - (x_{t-1} - x_{t-2})
S.   seasonal difference    x_t - x_{t-1}
S2.  lag-2 (seasonal difference)   x_t - x_{t-2}

USEFUL ADD-INS
tscollap        compact time series into means, sums, and end-of-period values
carryforward    carry nonmissing values forward from one obs. to the next
tsspell         identify spells or runs in time series

PANEL / LONGITUDINAL, continued (nlswork data)
xtdescribe
    report panel aspects of a dataset
xtsum hours
    summarize hours worked, decomposing standard deviation into between and within components
xtline ln_wage if id <= 22, tlabel(#3)
    plot panel data as a line plot (one panel per id; wage relative to inflation)
xtreg ln_w c.age##c.age ttl_exp, fe vce(robust)
    estimate a fixed-effects model with robust standard errors

SURVEY DATA (webuse nhanes2b, clear)
svyset psuid [pweight = finalwgt], strata(stratid)
    declare survey design for a dataset
svydescribe
    report survey-data details
svy: mean age, over(sex)
    estimate a population mean for each subpopulation
svy, subpop(rural): mean age
    estimate a population mean for rural areas
svy: tabulate sex heartatk
    report two-way table with tests of independence
svy: reg zinc c.age##c.age female weight rural
    estimate a regression using survey weights

SURVIVAL ANALYSIS (webuse drugtr, clear)
stset studytime, failure(died)
    declare data to be survival-time data
stsum
    summarize survival-time data
stcox drug age
    estimate a Cox proportional hazard model

Summarize Data
Examples use auto.dta (sysuse auto, clear) unless otherwise noted.
univar price mpg, boxplot          (ssc install univar)
    calculate univariate summary with box-and-whiskers plot
stem mpg
    return stem-and-leaf display of mpg
summarize price mpg, detail
    calculate a variety of univariate summary statistics
ci mean mpg price, level(99)       (for Stata 13: ci mpg price, level(99))
    compute standard errors and confidence intervals
correlate mpg price
    return correlation or covariance matrix
pwcorr price mpg weight, star(0.05)
    return all pairwise correlation coefficients with sig. levels
mean price mpg
    estimates of means, including standard errors
proportion rep78 foreign
    estimates of proportions, including standard errors for categories identified in varlist
ratio
    estimates of ratio, including standard errors
total price
    estimates of totals, including standard errors
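
The time-series operators listed above can be checked directly against their definitions. The following is a minimal sketch, assuming the sunspot demo data can be fetched with webuse (internet access) and that it has no gaps in the time variable; the variable names lead_spot, d_spot, d2_spot, and s2_spot are illustrative and do not come from the cheat sheet.

```stata
* Sketch: build lags, leads, and differences with time-series operators
webuse sunspot, clear
tsset time, yearly

generate double lag_spot  = L1.spot    // x_{t-1}
generate double lead_spot = F1.spot    // x_{t+1}
generate double d_spot    = D1.spot    // x_t - x_{t-1}
generate double d2_spot   = D2.spot    // (x_t - x_{t-1}) - (x_{t-1} - x_{t-2})
generate double s2_spot   = S2.spot    // x_t - x_{t-2}

* the first rows are missing wherever the required lag does not exist
list time spot lag_spot d_spot d2_spot s2_spot in 1/5
```

Storing the results as double is a choice made here to keep the comparisons exact; the cheat sheet's own example uses the default storage type.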
Statistical Tests
tabulate foreign rep78, chi2 exact expected
    tabulate foreign and repair record and return chi2 and Fisher's exact statistic alongside the expected values
ttest mpg, by(foreign)
    estimate t test on equality of means for mpg by foreign
prtest foreign == 0.5
    one-sample test of proportions
ksmirnov mpg, by(foreign) exact
    Kolmogorov–Smirnov equality-of-distributions test
ranksum mpg, by(foreign)
    equality tests on unmatched data (independent samples)
anova systolic drug                (webuse systolic, clear)
    analysis of variance and covariance
pwmean mpg, over(rep78) pveffects mcompare(tukey)
    estimate pairwise comparisons of means with equal variances; include multiple comparison adjustment

1 Estimate Models (stores results as e-class)
regress price mpg weight, vce(robust)
    estimate ordinary least-squares (OLS) model of price on mpg and weight, apply robust standard errors
regress price mpg weight if foreign == 0, vce(cluster rep78)
    regress price only on domestic cars, cluster standard errors
rreg price mpg weight, genwt(reg_wt)
    estimate robust regression to eliminate outliers
probit foreign turn price, vce(robust)
    estimate probit regression with robust standard errors
logit foreign headroom mpg, or
    estimate logistic regression and report odds ratios
bootstrap, reps(100): regress mpg weight gear foreign
    estimate regression with bootstrapping
jackknife r(mean), double: sum mpg
    jackknife standard error of sample mean

ADDITIONAL MODELS (built-in Stata commands unless marked user-written; install user-written commands with, e.g., ssc install ivreg2)
pca                  principal components analysis
factor               factor analysis
poisson • nbreg      count outcomes
tobit                censored data
ivregress • ivreg2   instrumental variables (ivreg2 is user-written)
rd                   regression discontinuity (user-written)
diff                 difference-in-difference (user-written)
xtabond • xtdpdsys   dynamic panel estimator
teffects psmatch     propensity score matching
synth                synthetic control analysis (user-written)
oaxaca               Blinder–Oaxaca decomposition (user-written)

2 Diagnostics (some are inappropriate with robust SEs)
estat hettest
    test for heteroskedasticity
ovtest
    test for omitted variable bias
vif
    report variance inflation factor
dfbeta(length)
    calculate measure of influence
rvfplot, yline(0)
    plot residuals against fitted values
avplots
    plot all partial-regression leverage plots in one graph
Type help regress postestimation plots for additional diagnostic plots.

3 Postestimation (commands that use a fitted model)
regress price headroom length
    used in all postestimation examples
display _b[length]
display _se[length]
    return coefficient estimate or standard error for length from most recent regression model
margins, dydx(length)
    return the estimated marginal effect for length (returns e-class information when the post option is used)
margins, eyex(length)
    return the estimated elasticity for price
predict yhat if e(sample)
    create predictions for sample on which model was fit
predict double resid, residuals
    calculate residuals based on last fit model
test headroom = 0
    test linear hypothesis that the headroom estimate equals zero
lincom headroom - length
    test linear combination of estimates (headroom = length)

Estimation with Categorical & Factor Variables
More details at http://www.stata.com/manuals/u25.pdf

CONTINUOUS VARIABLES    measure something
CATEGORICAL VARIABLES   identify a group to which an observation belongs
INDICATOR VARIABLES     denote whether something is true or false (T/F)

OPERATOR   DESCRIPTION                      EXAMPLE
i.         specify indicators               regress price i.rep78
               specify rep78 variable to be an indicator variable
ib.        specify base indicator           regress price ib(3).rep78
               set the third category of rep78 to be the base category
fvset      command to change base           fvset base frequent rep78
               set the base to most frequently occurring category for rep78
c.         treat variable as continuous     regress price i.foreign#c.mpg i.foreign
               treat mpg as a continuous variable and specify an interaction between foreign and mpg
o.         omit a variable or indicator     regress price io(2).rep78
               set rep78 as an indicator; omit observations with rep78 == 2
#          specify interactions             regress price mpg c.mpg#c.mpg
               create a squared mpg term to be used in regression
##         specify factorial interactions   regress price c.mpg##c.mpg
               create all possible interactions with mpg (mpg and mpg^2)
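
As an illustration of the # and ## operators above, the sketch below fits the same quadratic model twice: once with a hand-built squared term and once with factor-variable notation. It assumes the auto dataset shipped with Stata; the names mpgSq, b_manual, and b_factor are hypothetical and used only for this example.

```stata
* Sketch: c.mpg##c.mpg expands to mpg plus its square
sysuse auto, clear

* manual version: create the squared term yourself
generate double mpgSq = mpg^2
regress price mpg mpgSq
scalar b_manual = _b[mpgSq]

* factor-variable version: no new variable needed
regress price c.mpg##c.mpg
scalar b_factor = _b[c.mpg#c.mpg]

display "manual: " b_manual "   factor notation: " b_factor

* marginal effect of mpg, accounting for the squared term
margins, dydx(mpg)
```

One reason to prefer the factor-variable form is visible in the last line: margins knows that the squared term depends on mpg, which it cannot know when the square is a separate variable.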
Tim Essam (tessam@usaid.gov) • Laura Hughes (lhughes@usaid.gov) inspired by RStudio’s awesome Cheat Sheets (rstudio.com/resources/cheatsheets) geocenter.github.io/StataTraining updated July 2019
follow us @StataRGIS and @flaneuseks Disclaimer: we are not affiliated with Stata. But we like it. CC BY 4.0
Programming with Stata Cheat Sheet

Building Blocks: basic components of programming

R- AND E-CLASS: Stata stores calculation results in two* main classes:
    r   return results from general commands such as summarize or tabulate
    e   return results from estimation commands such as regress or mean
    * there's also s- and n-class
To assign values to individual variables use:
    1 SCALARS    individual numbers or strings
    2 MATRICES   rectangular array of quantities or expressions
    3 MACROS     pointers that store text (global or local)

1 Scalars (both r- and e-class results contain scalars)
Scalars can hold numeric values or arbitrarily long strings.
scalar x1 = 3
    create a scalar x1 storing the number 3
scalar a1 = "I am a string scalar"
    create a scalar a1 storing a string

2 Matrices (e-class results are stored as matrices)
matrix a = (4\ 5\ 6)
    create a 3 x 1 matrix
matrix b = (7, 8, 9)
    create a 1 x 3 matrix
matrix d = b'
    transpose matrix b; store in d
matrix ad1 = a \ d
    row bind matrices
matrix ad2 = a , d
    column bind matrices
matselrc b x, c(1 3)               (findit matselrc)
    select columns 1 & 3 of matrix b & store in new matrix x
mat2txt, matrix(ad1) saving(textfile.txt) replace      (ssc install mat2txt)
    export a matrix to a text file

DISPLAYING & DELETING BUILDING BLOCKS
[scalar | matrix | macro | estimates] [list | drop] b
    list contents of object b or drop (delete) object b
[scalar | matrix | macro | estimates] dir
    list all defined objects for that class
matrix list b
    list contents of matrix b
matrix dir
    list all matrices
scalar drop x1
    delete scalar x1

3 Macros (public or private variables storing text)
GLOBALS: available through Stata sessions (PUBLIC)
global pathdata "C:/Users/SantasLittleHelper/Stata"
    define a global variable called pathdata
cd $pathdata
    change working directory by calling global macro (add a $ before calling a global macro)
global myGlobal price mpg length
summarize $myGlobal
    summarize price mpg length using global

LOCALS: available only in programs, loops, or do-files (PRIVATE)
local myLocal price mpg length
    create local variable called myLocal with the strings price mpg and length
summarize `myLocal'
    summarize contents of local myLocal (add a ` before and a ' after local macro name to call)
levelsof rep78, local(levels)
    create a sorted list of distinct values of rep78, store results in a local macro called levels
local varLab: variable label foreign
    store the variable label for foreign in the local varLab (can also do with value labels)

TEMPVARS & TEMPFILES: special locals for loops/programs
tempvar temp1
    initialize a new temporary variable called temp1
generate `temp1' = mpg^2
    save squared mpg values in temp1
summarize `temp1'
    summarize the temporary variable temp1
tempfile myAuto
save `myAuto'
    create a temporary file to be used within a program
see also tempname

4 Access & Save Stored r- and e-class Objects
Many Stata commands store results in types of lists. To access these, use return or ereturn commands. Stored results can be scalars, macros, matrices, or functions.

r-class:
summarize price, detail
return list
    returns a list of scalars
    scalars:
        r(N)    = 74
        r(mean) = 6165.25...
        r(Var)  = 86995225.97...
        r(sd)   = 2949.49...
        ...
e-class:
mean price
ereturn list
    returns list of scalars, macros, matrices, and functions
    scalars:
        e(df_r)   = 73
        e(N_over) = 1
        e(N)      = 74
        e(k_eq)   = 1
        e(rank)   = 1
Results are replaced each time an r-class / e-class command is called.

generate p_mean = r(mean)
    create a new variable equal to average of price
generate meanN = e(N)
    create a new variable equal to obs. in estimation command
preserve
    create a temporary copy of active dataframe
restore
    restore temporary copy to point last preserved
Set restore points to test code that changes data.

ACCESSING ESTIMATION RESULTS
After you run any estimation command, the results of the estimates are stored in a structure that you can save, view, compare, and export. Use estimates store to compile results.
regress price weight
estimates store est1
    store previous estimation results est1 in memory for later use
eststo est2: regress price weight mpg              (ssc install estout)
eststo est3: regress price weight mpg foreign
    estimate two regression models and store estimation results
estimates table est1 est2 est3
    print a table of the three estimation results est1, est2, and est3

EXPORTING RESULTS
The estout and outreg2 packages provide numerous flexible options for making tables after estimation commands. See also putexcel and putdocx commands.
esttab est1 est2, se star(* 0.10 ** 0.05 *** 0.01) label
    create summary table with standard errors and labels
esttab using "auto_reg.txt", replace plain se
    export summary table to a text file, include standard errors
outreg2 [est1 est2] using "auto_reg2.txt", see replace
    export summary table to a text file using outreg2 syntax

Additional Programming Resources
bit.ly/statacode
    download all examples from this cheat sheet in a do-file
ado update
    update user-written ado-files
adolist                            (ssc install adolist)
    list/copy user-written ado-files
net install package, from(https://raw.githubusercontent.com/username/repo/master)
    install a package from a Github repository
https://github.com/andrewheiss/SublimeStataEnhanced
    configure Sublime text for Stata 11–15

Loops: Automate Repetitive Tasks

ANATOMY OF A LOOP (see also while)
Stata has three options for repeating commands over lists or values: foreach, forvalues, and while. Though each has a different first line, the syntax is consistent:

foreach x of varlist var1 var2 var3 {    // objects to repeat over; open brace must appear on first line
    command `x', option                  // x is a temporary variable used only within the loop; requires local macro notation
    ...                                  // command(s) you want to repeat can be one line or many
}                                        // close brace must appear on final line by itself

FOREACH: REPEAT COMMANDS OVER STRINGS, LISTS, OR VARIABLES
foreach x in|of [ local, global, varlist, newlist, numlist ] {    // list types: objects over which the commands will be repeated
    Stata commands referring to `x'
}
Loops repeat the same command over different arguments.

STRINGS
foreach x in auto.dta auto2.dta {
    sysuse "`x'", clear
    tab rep78, missing
}
same as...
sysuse "auto.dta", clear
tab rep78, missing
sysuse "auto2.dta", clear
tab rep78, missing

LISTS
foreach x in "Dr. Nick" "Dr. Hibbert" {
    display length("`x'")
}
same as...
display length("Dr. Nick")
display length("Dr. Hibbert")
When calling a command that takes a string, surround the macro name with quotes.

VARIABLES
foreach x in mpg weight {
    summarize `x'
}
• foreach in takes any list as an argument with elements separated by spaces
foreach x of varlist mpg weight {        // must define list type
    summarize `x'
}
• foreach of requires you to state the list type, which makes it faster
same as...
summarize mpg
summarize weight

FORVALUES: REPEAT COMMANDS OVER LISTS OF NUMBERS
forvalues i = 10(10)50 {    // i is the iterator; 10(10)50 gives the numeric values over which the loop will run
    display `i'             // use display command to show the iterator value at each step in the loop
}
same as...
display 10
display 20
...

ITERATORS
i = 10/50          10, 11, 12, ...
i = 10(10)50       10, 20, 30, ...
i = 10 20 to 50    10, 20, 30, ...

DEBUGGING CODE
set trace on (off)
    trace the execution of programs for error checking
see also capture and scalar _rc

PUTTING IT ALL TOGETHER
sysuse auto, clear
generate car_make = word(make, 1)          // pull out the first word from the make variable
levelsof car_make, local(cmake)            // calculate unique groups of car_make and store in local cmake
local i = 1                                // define the local i to be an iterator
local cmake_len : word count `cmake'       // store the length of local cmake in local cmake_len
foreach x of local cmake {
    display in yellow "Make group `i' is `x'"
    if `i' == `cmake_len' {                // tests the position of the iterator, executes contents in brackets when the condition is true
        display "The total number of groups is `i'"
    }
    local i = `++i'                        // increment iterator by one
}
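
The loop syntax and the stored r-class results described above are often used together. The following is a minimal sketch, assuming the auto dataset shipped with Stata; the scalar names mean_price, mean_mpg, mean_weight, and n_vars are illustrative only.

```stata
* Sketch: loop over variables and keep each mean in a named scalar
sysuse auto, clear

local i = 0
foreach v of varlist price mpg weight {
    quietly summarize `v'          // r(mean), r(sd), ... are replaced on every pass
    local ++i
    display "`i'. `v': mean = " %9.2f r(mean) "   sd = " %9.2f r(sd)
    scalar mean_`v' = r(mean)      // copy the result before the next command overwrites it
}
scalar n_vars = `i'
display "looped over " n_vars " variables"
```

Copying r(mean) into a scalar inside the loop matters because, as noted above, stored results are replaced each time another r-class command runs.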
Data Processing with Stata Cheat Sheet

Useful Shortcuts
F2          describe data
Ctrl + 8    open the data editor
Ctrl + 9    open a new do-file
Ctrl + D    highlight text in do-file, then Ctrl + D executes it in the command line
clear       delete data in memory

AT COMMAND PROMPT
PgUp PgDn   scroll through previous commands
Tab         autocompletes variable name after typing part
cls         clear the console (where results are displayed)

Set up
pwd
    print current (working) directory
cd "C:\Program Files\Stata16"
    change working directory
dir
    display filenames in working directory
dir *.dta
    list all Stata data in working directory
capture log close
    close the log on any existing do-files (capture can be shortened to cap)
log using "myDoFile.txt", replace
    create a new log file to record your work and results
search mdesc
    find the package mdesc to install (packages contain extra commands that expand Stata's toolkit)
ssc install mdesc
    install the package mdesc; needs to be done once

Import Data
sysuse auto, clear
    load system data (auto data); for many examples, we use the auto dataset
use "yourStataFile.dta", clear
    load a dataset from the current directory
import excel "yourSpreadsheet.xlsx", /*
    */ sheet("Sheet1") cellrange(A2:H11) firstrow
import delimited "yourFile.csv", /*
    */ rowrange(2:11) colrange(1:8) varnames(2)
import sas "yourSASfile.sas7bdat", bcat("value labels file")
import spss "yourSPSSfile.sav"
    see help import for more options
webuse set "https://github.com/GeoCenter/StataTraining/raw/master/Day2/Data"
webuse "wb_indicators_long"
    set web-based directory and load data from the web

Basic Syntax
All Stata commands have the same format (syntax):
[by varlist1:] command [varlist2] [=exp] [if exp] [in range] [weight] [using filename] [,options]
    [by varlist1:]      apply the command across each unique combination of variables in varlist1
    command             function: what are you going to do to varlists?
    [varlist2]          column to apply command to
    [=exp]              save output as a new variable
    [if exp]            condition: only apply the function if something is true
    [in range]          apply to specific rows
    [weight]            apply weights
    [using filename]    pull data from a file (if not loaded)
    [,options]          special options for command
bysort rep78 : summarize price if foreign == 0 & price <= 9000, detail
    In this example, we want a detailed summary with stats like kurtosis, plus mean and median.
To find out more about any command (for example, what options it takes), type help command.

Basic Data Operations
Arithmetic
    +   add (numbers), combine (strings)
    −   subtract
    *   multiply
    /   divide
    ^   raise to a power
Logic
    &         and
    ! or ~    not
    |         or
    ==        equal
    != or ~=  not equal
    <         less than
    <=        less than or equal to
    >         greater than
    >=        greater or equal to
== tests if something is equal; = assigns a value to a variable.

Example data:
    make            foreign   price
    Chevy Colt      0          3,984
    Buick Riviera   0         10,372
    Honda Civic     1          4,499
    Volvo 260       1         11,995
if foreign != 1 & price >= 10000
    selects Buick Riviera only (both conditions must hold)
if foreign != 1 | price >= 10000
    selects Chevy Colt, Buick Riviera, and Volvo 260 (either condition may hold)

Explore Data
VIEW DATA ORGANIZATION
describe make price
    display variable type, format, and any value/variable labels
count
count if price > 5000
    number of rows (observations); can be combined with logic
ds, has(type string)
lookfor "in."
    search for variable types, variable name, or variable label
isid mpg
    check if mpg uniquely identifies the data

SEE DATA DISTRIBUTION
codebook make price
    overview of variable type, stats, number of missing/unique values
summarize make price mpg
    print summary statistics (mean, stdev, min, max) for variables
inspect mpg
    show histogram of data and number of missing or zero observations
histogram mpg, frequency
    plot a histogram of the distribution of a variable

BROWSE OBSERVATIONS WITHIN THE DATA
browse (or Ctrl + 8)
    open the data editor
list make price if price > 10000 & !missing(price)        (clist ... for compact form)
    list the make and price for observations with price > $10,000
    Missing values are treated as the largest positive number. To exclude missing values, ask whether the value is less than "."
display price[4]
    display the 4th observation in price; only works on single values
gsort price mpg        (ascending)
gsort -price -mpg      (descending)
    sort in order, first by price then miles per gallon
duplicates report
    finds all duplicate values in each variable
assert price!=.
    verify truth of claim
levelsof rep78
    display the unique values for rep78

Change Data Types
Stata has 6 data types, and data can also be missing:
    no data       missing
    true/false    byte
    words         string
    numbers       int, long, float, double
To convert between numbers & strings:
gen foreignString = string(foreign)              1 -> "1"
tostring foreign, gen(foreignString)             1 -> "1"
decode foreign , gen(foreignString)              1 -> "foreign"
gen foreignNumeric = real(foreignString)         "1" -> 1
destring foreignString, gen(foreignNumeric)      "1" -> 1
encode foreignString, gen(foreignNumeric)        "foreign" -> 1
recast double mpg
    generic way to convert between types

Summarize Data
tabulate rep78, mi gen(repairRecord)
    one-way table: number of rows with each value of rep78; mi includes missing values; gen() creates a binary variable for every rep78 value, named with the stem repairRecord
tabulate rep78 foreign, mi
    two-way table: cross-tabulate number of observations for each combination of rep78 and foreign
bysort rep78: tabulate foreign
    for each value of rep78, apply the command tabulate foreign
tabstat price weight mpg, by(foreign) stat(mean sd n)
    create compact table of summary statistics
table foreign, contents(mean price sd price) f(%9.2fc) row
    create a flexible table of summary statistics (f() formats numbers; row displays stats for all data)
collapse (mean) price (max) mpg, by(foreign)
    calculate mean price & max mpg by car type (foreign); replaces data

Create New Variables
generate mpgSq = mpg^2
gen byte lowPr = price < 4000
    create a new variable. Useful also for creating binary variables based on a condition (generate byte)
generate id = _n
bysort rep78: gen repairIdx = _n
    _n creates a running index of observations in a group
generate totRows = _N
bysort rep78: gen repairTot = _N
    _N creates a running count of the total observations per group
pctile mpgQuartile = mpg, nq(4)
    create quartiles of the mpg data
egen meanPrice = mean(price), by(foreign)
    calculate mean price for each group in foreign; see help egen for more options
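
Two points from this sheet are easy to get wrong in practice: missing values sort as larger than any number, and _n / _N restart within each by-group. The sketch below illustrates both, assuming the auto dataset shipped with Stata; the scalar names n_with_missing and n_nonmissing are hypothetical.

```stata
* Sketch: missing values in comparisons, and _n / _N within groups
sysuse auto, clear

* rep78 has missing values; "> 4" silently includes them
count if rep78 > 4
scalar n_with_missing = r(N)

* excluding missing values explicitly
count if rep78 > 4 & !missing(rep78)
scalar n_nonmissing = r(N)

display "with missing: " n_with_missing "   without: " n_nonmissing

* running index and group size within each value of rep78
bysort rep78: generate repairIdx = _n
bysort rep78: generate repairTot = _N
list rep78 repairIdx repairTot in 1/10, sepby(rep78)
```

The condition rep78 < . would work in place of !missing(rep78) for ordinary numeric data; missing() is used here because it also covers extended missing values such as .a.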
Data Transformation with Stata Cheat Sheet

Select Parts of Data (Subsetting)
SELECT SPECIFIC COLUMNS
drop make
    remove the 'make' variable
keep make price
    opposite of drop; keep only variables 'make' and 'price'

FILTER SPECIFIC ROWS
drop if mpg < 20
drop in 1/4
    drop observations based on a condition (first command) or rows 1–4 (second command)
keep in 1/30
    opposite of drop; keep only rows 1–30
keep if inrange(price, 5000, 10000)
    keep values of price between $5,000–$10,000 (inclusive)
keep if inlist(make, "Honda Accord", "Honda Civic", "Subaru")
    keep the specified values of make
sample 25
    sample 25% of the observations in the dataset (use set seed # command for reproducible sampling)

Replace Parts of Data
CHANGE COLUMN NAMES
rename (rep78 foreign) (repairRecord carType)
    rename one or multiple variables

CHANGE ROW VALUES
replace price = 5000 if price < 5000
    replace all values of price that are less than $5,000 with 5000
recode price (0 / 5000 = 5000)
    change all prices less than 5000 to be $5,000
recode foreign (0 = 2 "US")(1 = 1 "Not US"), gen(foreign2)
    change the values and value labels then store in a new variable, foreign2

REPLACE MISSING VALUES
mvdecode _all, mv(9999)
    replace the number 9999 with missing value in all variables (useful for cleaning survey datasets)
mvencode _all, mv(9999)
    replace missing values with the number 9999 for all variables (useful for exporting data)

Label Data
Value labels map string descriptions to numbers. They allow the underlying data to be numeric (making logical tests simpler) while also connecting the values to human-understandable text.
label define myLabel 0 "US" 1 "Not US"
label values foreign myLabel
    define a label and apply it to the values in foreign
label list
    list all labels within the dataset
note: data note here
    place note in dataset

Reshape Data
webuse set https://github.com/GeoCenter/StataTraining/raw/master/Day2/Data
webuse "coffeeMaize.dta"
    load demo dataset

TIDY DATASETS have each observation in its own row and each variable in its own column. When datasets are tidy, they have a consistent, standard format that is easier to manipulate and analyze.

    WIDE: one row per country, with columns coffee2011, coffee2012, maize2011, maize2012
    LONG (TIDY): one row per country and year, with columns country, year, coffee, maize

MELT DATA (WIDE → LONG)
reshape long coffee@ maize@, i(country) j(year)
    convert a wide dataset to long
    coffee@ maize@   reshape variables starting with coffee and maize
    i(country)       unique id variable (key)
    j(year)          create new variable that captures the info in the column names

CAST DATA (LONG → WIDE)
reshape wide coffee maize, i(country) j(year)
    convert a long dataset to wide
    coffee maize     what will be the new variables, named coffee2011, maize2012...
    i(country)       unique id variable (key)
    j(year)          create new variables with the year added to the column name

xpose, clear varname
    transpose rows and columns of data, clearing the data and saving old column names as a new variable called "_varname"

Combine Data
ADDING (APPENDING) NEW DATA (see help frames for using multiple datasets)
The two datasets should contain the same variables (columns).
webuse coffeeMaize2.dta, clear
save coffeeMaize2.dta, replace
webuse coffeeMaize.dta, clear
    load demo data
append using "coffeeMaize2.dta", gen(filenum)
    add observations from "coffeeMaize2.dta" to current data and create variable "filenum" to track the origin of each observation

MERGING TWO DATASETS TOGETHER
The two datasets must contain a common variable (id).

ONE-TO-ONE
webuse ind_age.dta, clear
save ind_age.dta, replace
webuse ind_ag.dta, clear
merge 1:1 id using "ind_age.dta"
    one-to-one merge of "ind_age.dta" into the loaded dataset and create variable "_merge" to track the origin

MANY-TO-ONE
webuse hh2.dta, clear
save hh2.dta, replace
webuse ind2.dta, clear
merge m:1 hid using "hh2.dta"
    many-to-one merge of "hh2.dta" into the loaded dataset and create variable "_merge" to track the origin

_merge code
    1 (master)   row only in ind2
    2 (using)    row only in hh2
    3 (match)    row in both

FUZZY MATCHING: COMBINING TWO DATASETS WITHOUT A COMMON ID
reclink                            (ssc install reclink)
    match records from different data sets using probabilistic matching
jarowinkler                        (ssc install jarowinkler)
    create distance measure for similarity between two strings

Manipulate Strings
GET STRING PROPERTIES
display length("This string has 29 characters")
    return the length of the string
charlist make                      (user-defined package)
    display the set of unique characters within a string
display strpos("Stata", "a")
    return the position in Stata where a is first found

FIND MATCHING STRINGS
display strmatch("123.89", "1??.?9")
    return true (1) or false (0) if string matches pattern
display substr("Stata", 3, 5)
    return string of up to 5 characters starting with position 3
list make if regexm(make, "[0-9]")
    list observations where make matches the regular expression (here, records that contain a number)
list if regexm(make, "(Cad.|Chev.|Datsun)")
    return all observations where make contains "Cad.", "Chev." or "Datsun"
list if inlist(word(make, 1), "Cad.", "Chev.", "Datsun")
    return all observations where the first word of the make variable is one of the listed words (compares the given list against the first word in make)

TRANSFORM STRINGS
display regexr("My string", "My", "Your")
    replace string1 ("My") with string2 ("Your")
replace make = subinstr(make, "Cad.", "Cadillac", 1)
    replace first occurrence of "Cad." with Cadillac in the make variable
display stritrim(" Too much Space")
    replace consecutive spaces with a single space
display trim(" leading / trailing spaces ")
    remove extra spaces before and after a string
display strlower("STATA should not be ALL-CAPS")
    change string case; see also strupper, strproper
display strtoname("1Var name")
    convert string to Stata-compatible variable name
display real("100")
    convert string to a numeric or missing value

Save & Export Data
compress
    compress data in memory
save "myData.dta", replace
saveold "myData.dta", replace version(12)
    save data in Stata format, replacing the data if a file with same name exists (saveold with version(12) writes a Stata 12-compatible file)
export excel "myData.xls", /*
    */ firstrow(variables) replace
    export data as an Excel file (.xls) with the variable names as the first row
export delimited "myData.csv", delimiter(",") replace
    export data as a comma-delimited file (.csv)
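
The wide-to-long and long-to-wide reshapes described in this sheet can be tried without downloading the demo file. The sketch below types in a tiny dataset with the same column layout; the numbers are made up for illustration and are not the values in coffeeMaize.dta, and the scalar n_long is a hypothetical name.

```stata
* Sketch: round trip between wide and long layouts (values are invented)
clear
input str6 country coffee2011 coffee2012 maize2011 maize2012
"Malawi" 1 2 10 20
"Rwanda" 3 4 30 40
"Uganda" 5 6 50 60
end

* melt: one row per country-year, year taken from the column suffix
reshape long coffee@ maize@, i(country) j(year)
scalar n_long = _N
list, sepby(country)

* cast: back to one row per country
reshape wide coffee maize, i(country) j(year)
list
```

Three countries and two years give six rows in the long layout, and the second reshape restores the original three rows and four measurement columns.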
Data Visualization BASIC PLOT SYNTAX: graph <plot type>
variables: y first
y1 y2 … yn x [in]
plot-specific options
[if], <plot options>
facet
by(var)
annotations
xline(xint) yline(yint) text(y x "annotation")
with Stata Cheat Sheet titles axes
For more info, see Stata’s reference manual (stata.com) title("title") subtitle("subtitle") xtitle("x-axis title") ytitle("y axis title") xscale(range(low high) log reverse off noline) yscale(<options>)
ONE VARIABLE sysuse auto, clear custom appearance plot size save
<marker, line, text, axis, legend, background options> scheme(s1mono) play(customTheme) xsize(5) ysize(4) saving("myPlot.gph", replace)
CONTINUOUS
histogram mpg, width(5) freq kdensity kdenopts(bwidth(5)) TWO+ CONTINUOUS VARIABLES
histogram
bin(#) • width(#) • density • fraction • frequency • percent • addlabels y1 graph matrix mpg price weight, half twoway pcspike wage68 ttl_exp68 wage88 ttl_exp88
addlabopts(<options>) • normal • normopts(<options>) • kdensity scatterplot of each combination of variables Parallel coordinates plot (sysuse nlswide1)
kdenopts(<options>) y2
half • jitter(#) • jitterseed(#) vertical, • horizontal
kdensity mpg, bwidth(3) y3 diagonal • [aweights(<variable>)]
smoothed histogram
bwidth • kernel(<options> main plot-specific options; twoway pccapsym wage68 ttl_exp68 wage88 ttl_exp88
twoway scatter mpg weight, jitter(7) Slope/bump plot (sysuse nlswide1)
normal • normopts(<line options>) see help for complete set scatterplot vertical • horizontal • headlabel
jitter(#) • jitterseed(#) • sort • cmissing(yes | no)
DISCRETE connect(<options>) • [aweight(<variable>)]
graph bar (count), over(foreign, gap(*0.5)) intensity(*0.5) THREE VARIABLES
bar plot graph hbar draws horizontal bar charts
(asis) • (percent) • (count) • over(<variable>, <options: gap(*#) • 23 twoway scatter mpg weight, mlabel(mpg) twoway contour mpg price weight, level(20) crule(intensity)
relabel • descending • reverse>) • cw •missing • nofill • allcategories • 20 scatterplot with labelled values 3D contour plot
percentages • stack • bargap(#) • intensity(*#) • yalternate • xalternate 17 jitter(#) • jitterseed(#) • sort • cmissing(yes | no) ccuts(#s) • levels(#) • minmax • crule(hue | chue | intensity | linear) •
graph bar (percent), over(rep78) over(foreign) 2 10 connect(<options>) • [aweight(<variable>)] scolor(<color>) • ecolor (<color>) • ccolors(<colorlist>) • heatmap
interp(thinplatespline | shepard | none)
grouped bar plot graph hbar ...
(asis) • (percent) • (count) • over(<variable>, <options: gap(*#) • twoway connected mpg price, sort(price)
relabel • descending • reverse>) • cw •missing • nofill • allcategories • regress price mpg trunk weight length turn, nocons
a b c percentages • stack • bargap(#) • intensity(*#) • yalternate • xalternate scatterplot with connected lines and symbols matrix regmat = e(V) ssc install plotmatrix
jitter(#) • jitterseed(#) • sort see also line plotmatrix, mat(regmat) color(green)
DISCRETE X, CONTINUOUS Y connect(<options>) • cmissing(yes | no) heatmap mat(<variable) • split(<options>) • color(<color>) • freq
graph bar (median) price, over(foreign) graph hbar ...
bar plot (asis) • (percent) • (count) • (stat: mean median sum min max ...) twoway area mpg price, sort(price) SUMMARY PLOTS
over(<variable>, <options: gap(*#) • relabel • descending • reverse line plot with area shading
sort(<variable>)>) • cw • missing • nofill • allcategories • percentages twoway mband mpg weight || scatter mpg weight
sort • cmissing(yes | no) • vertical, • horizontal plot median of the y values
stack • bargap(#) • intensity(*#) • yalternate • xalternate base(#)
bands(#)
graph dot (mean) length headroom, over(foreign) m(1, ms(S))
dot plot (asis) • (percent) • (count) • (stat: mean median sum min max ...) twoway bar price rep78
over(<variable>, <options: gap(*#) • relabel • descending • reverse binscatter weight mpg, line(none) ssc install binscatter
sort(<variable>)>) • cw • missing • nofill • allcategories • percentages bar plot plot a single value (mean or median) for each x value
linegap(#) • marker(#, <options>) • linetype(dot | line | rectangle) vertical • horizontal • base(#) • barwidth(#)
dots(<options>) • lines(<options>) • rectangles(<options>) • rwidth medians • nquantiles(#) • discrete • controls(<variables>) •
linetype(lfit | qfit | connect | none) • aweight[<variable>]
graph hbox mpg, over(rep78, descending) by(foreign) missing
box plot graph box draws vertical boxplots twoway dot mpg rep78 FITTING RESULTS
over(<variable>, <options: total • gap(*#) • relabel • descending • reverse dot plot vertical • horizontal • base(#) • ndots(#) twoway lfitci mpg weight || scatter mpg weight
sort(<variable>)>) • missing • allcategories • intensity(*#) • boxgap(#) dcolor(<color>) • dfcolor(<color>) • dlcolor(<color>)
medtype(cline | line | marker) • medline(<options>) • medmarker(<options>) calculate and plot linear fit to data with confidence intervals
dsize(<markersize>) • dsymbol(<marker type>) level(#) • stdp • stdf • nofit • fitplot(<plottype>) • ciplot(<plottype>) •
vioplot price, over(foreign) ssc install vioplot dlwidth(<strokesize>) • dotextend(yes | no) range(# #) • n(#) • atobs • estopts(<options>) • predopts(<options>)
violin plot over(<variable>, <options: total • missing>) • nofill •
vertical • horizontal • obs • kernel(<options>) • bwidth(#) •
twoway dropline mpg price in 1/5 twoway lowess mpg weight || scatter mpg weight
barwidth(#) • dscale(#) • ygap(#) • ogap(#) • density(<options>) calculate and plot lowess smoothing
bar(<options>) • median(<options>) • obsopts(<options>) dropped line plot
vertical • horizontal • base(#) bwidth(#) • mean • noweight • logit • adjust
Plot Placement twoway qfitci mpg weight, alwidth(none) || scatter mpg weight
JUXTAPOSE (FACET) twoway rcapsym length headroom price calculate and plot quadratic fit to data with confidence intervals
twoway scatter mpg price, by(foreign, norescale) range plot (y1 ÷ y2) with capped lines level(#) • stdp • stdf • nofit • fitplot(<plottype>) • ciplot(<plottype>) •
range(# #) • n(#) • atobs • estopts(<options>) • predopts(<options>)
total • missing • colfirst • rows(#) • cols(#) • holes(<numlist>) vertical • horizontal see also rcap
compact • [no]edgelabel • [no]rescale • [no]yrescale • [no]xrescale
[no]iyaxes • [no]ixaxes • [no]iytick • [no]ixtick • [no]iylabel
REGRESSION RESULTS
[no]ixlabel • [no]iytitle • [no]ixtitle • imargin(<options>) regress price mpg headroom trunk length turn
SUPERIMPOSE twoway rarea length headroom price, sort coefplot, drop(_cons) xline(0) ssc install coefplot
range plot (y1 ÷ y2) with area shading Plot regression coefficients
graph combine plot1.gph plot2.gph... vertical • horizontal • sort baselevels • b(<options>) • at(<options>) • noci • levels(#)
combine two or more saved graphs into a single plot cmissing(yes | no) keep(<variables>) • drop(<variables>) • rename(<list>)
horizontal • vertical • generate(<variable>)
scatter y3 y2 y1 x, msymbol(i o i) mlabel(var3 var2 var1) regress mpg weight length turn
plot several y values for a single x value margins, eyex(weight) at(weight = (1800(200)4800))
twoway rbar length headroom price
graph twoway scatter mpg price in 27/74 || scatter mpg price /* range plot (y1 ÷ y2) with bars marginsplot, noci
*/ if mpg < 15 & price > 12000 in 27/74, mlabel(make) m(i) vertical • horizontal • barwidth(#) • mwidth Plot marginal effects of regression
combine twoway plots using || msize(<marker size>) horizontal • noci
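A minimal sketch of the placement and fitting commands above, run end to end on the auto dataset. It only strings together commands already listed on this sheet; the extra msize(small) option is an illustrative choice, not part of the original examples.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Superimpose: linear fit with confidence band, raw points drawn on top
twoway lfitci mpg weight || scatter mpg weight, msize(small)

* Juxtapose: one panel per level of foreign, axes not rescaled per panel
twoway scatter mpg price, by(foreign, norescale)

* Regression results: elasticities of mpg with respect to weight,
* evaluated at weight = 1800, 2000, ..., 4800, then plotted without CIs
regress mpg weight length turn
margins, eyex(weight) at(weight = (1800(200)4800))
marginsplot, noci
```

Each twoway or marginsplot call replaces the graph in the current window, so run them one at a time to inspect each result.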
Laura Hughes (lhughes@usaid.gov) • Tim Essam (tessam@usaid.gov) inspired by RStudio’s awesome Cheat Sheets (rstudio.com/resources/cheatsheets) geocenter.github.io/StataTraining updated July 2019
follow us @flaneuseks and @StataRGIS Disclaimer: we are not affiliated with Stata. But we like it. CC BY 4.0
Plotting in Stata: Customizing Appearance
For more info, see Stata’s reference manual (stata.com)

Apply Themes
Schemes are sets of graphical parameters, so you don’t have to specify the look of the graphs every time.

USING A SAVED THEME
twoway scatter mpg price, scheme(customTheme)
Create custom themes by saving options in a .scheme file
help scheme entries
see all options for setting scheme properties
adopath ++ "~/<location>/StataThemes"
set path of the folder (StataThemes) where custom .scheme files are saved
set scheme customTheme, permanently
change the theme; permanently sets it as the default scheme

ANATOMY OF A PLOT
plots contain many features
[Figure: annotated scatterplot with fitted values. Labeled parts: title, subtitle, annotation, titles, graph region, inner graph region, plot region, inner plot region, outer region, inner region, y-axis, y-axis title, y-axis labels, y2, x-axis, x-axis title, tick marks, grid lines, y-line, line, marker, marker label, legend]
scatter price mpg, graphregion(fcolor("192 192 192") ifcolor("208 208 208"))
specify the fill of the background in RGB or with a Stata color
scatter price mpg, plotregion(fcolor("224 224 224") ifcolor("240 240 240"))
specify the fill of the plot background in RGB or with a Stata color
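As a rough illustration, the background fills above can be combined with marker options in a single call. This is a sketch on the auto dataset: the RGB triplets are the ones shown above, while the named color navy and the marker choices are assumptions made only for the example.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Gray outer background, lighter plot background, and navy markers
* at 20% opacity (the %# opacity suffix needs Stata 15 or later)
scatter price mpg,                          ///
    graphregion(fcolor("192 192 192"))      ///
    plotregion(fcolor("224 224 224"))       ///
    mcolor(navy%20) msize(medium) msymbol(O)
```

The /// continuation only works in a do-file or the Do-file Editor; in the Command window, type the command on one line.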
SYMBOLS LINES / BORDERS TEXT net inst brewscheme, from("https://wbuchanan.github.io/brewscheme/") replace
marker arguments for the plot line marker axes tick marks marker label titles axis labels
install William Buchanan’s package to generate custom
<marker objects (in green) go in the <line options> <marker xscale(...) grid lines <marker title(...) xlabel(...)
schemes and color palettes (including ColorBrewer)
SYNTAX
options> options portion of these xline(...) options> yscale(...) options> subtitle(...) ylabel(...) USING THE GRAPH EDITOR
commands (in orange) yline(...) xlabel(...) xtitle(...)
for example: legend ylabel(...) annotation legend
ytitle(...)
scatter price mpg, xline(20, lwidth(vthick)) legend(region(...)) text(...) legend(...) twoway scatter mpg price, play(graphEditorTheme)
mcolor("145 168 208") mcolor(none) lcolor("145 168 208") lcolor(none) color("145 168 208") color(none)
specify the fill and stroke of the marker specify the stroke color of the line or border specify the color of the text
in RGB or with a Stata color Select the
marker mlcolor("145 168 208") marker label mlabcolor("145 168 208")
Graph Editor
COLOR
mfcolor("145 168 208") mfcolor(none) tick marks tlcolor("145 168 208") axis labels labcolor("145 168 208")
specify the fill of the marker
adjust transparency by adding %#
grid lines glcolor("145 168 208") mcolor("145 168 208 %20")
msize(medium) specify the marker size: lwidth(medthick) marker mlwidth(thin) size(medsmall) specify the size of the text:
Click
specify the thickness tick marks tlwidth(thin) marker label mlabsize(medsmall)
Record
(stroke) of a line:
ehuge medlarge grid lines glwidth(thin) axis labels labsize(medsmall)
SIZE / THICKNESS
vhuge
medium vvvthick medthin
Text vhuge Text medsmall
Text small Double-click on
medsmall
small
vvthick
vthick
thin
vthin
Text huge Text vsmall symbols and areas
huge Text vlarge Text tiny on plot, or regions
vsmall thick vvthin Text half_tiny on sidebar to
vlarge Text large third_tiny customize
tiny medthick vvvthin
Text
Text medlarge Text quarter_tiny

large vtiny medium none Text medium Text minuscule Unclick
Record
msymbol(Dh) specify the marker symbol: line axes lpattern(dash) specify the marker label mlabel(foreign)
line pattern label the points with the values
Save theme
grid lines glpattern(dash) of the foreign variable as a .grec file
O D T S
solid longdash longdash_dot nolabels
Save Plots
axis labels
APPEARANCE
o d t s dash shortdash shortdash_dot no axis labels
Oh Th Sh
axis labels format(%12.2f) graph twoway scatter y x, saving("myPlot.gph", replace)
Dh dot dash_dot blank change the format of the axis labels save the graph when drawing
oh dh th sh axes noline axes off no axis/labels legend off graph save "myPlot.gph", replace
turn off legend
+ X p none i tick marks noticks tick marks tlength(2) save current graph to disk
legend label(# "label")
grid lines nogrid nogmin nogmax change legend label text graph combine plot1.gph plot2.gph...
combine two or more saved graphs into a single plot
POSITION
jitter(#) jitterseed(#) tick marks xlabel(#10, tposition(crossing)) marker label mlabposition(5)

graph export "myPlot.pdf", as(pdf) see options to set
randomly displace the markers set seed number of tick marks, position (outside | crossing | inside) label location relative to marker (clock position: 0 – 12) export the current graph as an image file size and resolution
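A short sketch of the save, combine, and export steps as one workflow. The file names plot1.gph, plot2.gph, combined.gph, and combined.pdf are placeholders chosen for this example; everything is written to the current working directory.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Save each graph to disk as it is drawn (replace overwrites old copies)
twoway scatter mpg price, saving("plot1.gph", replace)
twoway scatter mpg weight, saving("plot2.gph", replace)

* Combine the two saved graphs into a single plot
graph combine "plot1.gph" "plot2.gph"

* Save the combined graph, then export it as a PDF image file
graph save "combined.gph", replace
graph export "combined.pdf", as(pdf) replace
```

Note that replace is a suboption inside saving(), but a regular option of graph save and graph export.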