Agenda Day 3
• Importing datasets
• Transposing & reshaping datasets
• Collapsing & expanding datasets
• Combining datasets
• Generating & transforming variables
• Missing values
• Storage types & output format
• MATA
• Data checks
Stata can import a variety of file formats (see help import).
Datasets often come in one of the following (spreadsheet) formats:
• text (tab-separated or comma-separated) format: insheet [varlist] using filename [, options]
• Microsoft Excel (.xls and .xlsx) format: import excel [using] filename [, import_excel_options]
• Stata format: use filename [, clear]
• Stata format: use filename [, clear]
Remarks:
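• Stata can also export datasets to other file formats, e.g. using export excel or outsheet
• since Stata 13, insheet and outsheet have been superseded by import delimited and export delimited, but they continue to work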
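The import and export commands above can be tried as a round trip. The following is a minimal sketch, assuming Stata's shipped example dataset auto is available via sysuse and that the working directory is writable; the file names auto_demo.* are hypothetical.

```stata
* Start from a dataset that ships with Stata
sysuse auto, clear

* Export three variables to a comma-separated text file and to Excel
outsheet make price mpg using "auto_demo.csv", comma replace
export excel make price mpg using "auto_demo.xlsx", firstrow(variables) replace

* Import the text file; the first line holds the variable names
insheet using "auto_demo.csv", comma names clear
describe

* Import the Excel file; firstrow treats the first row as variable names
import excel using "auto_demo.xlsx", firstrow clear
describe

* Save in Stata format and load it again
save "auto_demo.dta", replace
use "auto_demo.dta", clear
```

The clear option is needed whenever a dataset is already in memory, otherwise Stata refuses to overwrite it.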
Data preparation | Day 3
Kohler & Wunderlich | ©Agroscope | Institute for Sustainability Sciences ISS | Tänikon 1, 8356 Ettenhausen
Transposing & reshaping datasets
Once in a blue moon, we would like to transpose our dataset, i.e.
interchange columns (variables) and rows (observations)
• xpose, clear [options]
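A small illustration of transposing, as a sketch with made-up numbers: the variable names x1 to x3 are hypothetical, and the varname option is used here only to keep track of the original variable names.

```stata
* Two observations, three variables (made-up values)
clear
input x1 x2 x3
1 2 3
4 5 6
end
list

* Transpose: observations become variables v1, v2 and
* variables become observations; varname stores the old
* variable names in the new variable _varname
xpose, clear varname
list
```

Note that xpose is meant for numeric data: string variables in the original dataset become missing values after transposing.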
Caution (when drawing random samples)
• observations not drawn are dropped from memory
• to reproduce results from analyzing a random sample, first set the
seed of the random-number generator to a specific but arbitrary
value using set seed #
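The two cautions can be seen in a short sketch. It assumes the sample is drawn with Stata's sample command (the slide text here only names set seed) and uses the shipped auto dataset; the seed value 12345 is arbitrary.

```stata
sysuse auto, clear

* Fix the seed so the draw can be reproduced
set seed 12345

* preserve keeps a copy, because sample drops all observations not drawn
preserve
sample 10, count          // keep 10 randomly drawn observations
list make price
restore                   // the full dataset is back in memory

* Same seed and same data order give the same 10 observations
set seed 12345
sample 10, count
list make price
```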
Use short and meaningful variable names and label them (see help
rename and help label var). Note that the following variable names
are not allowed:
Explicit subscripts
• varname[...]
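As a sketch of what explicit subscripts do, the example below refers to specific observations of a variable using the built-in _n (current observation number) and _N (total number of observations). The new variable names are hypothetical and the shipped auto dataset is assumed.

```stata
sysuse auto, clear
sort price

* Value of price in the previous observation (missing for the first one)
generate price_prev = price[_n-1]

* After sorting, the first and last observations hold the minimum and maximum
generate price_min = price[1]
generate price_max = price[_N]

list make price price_prev price_min price_max in 1/5
```

Because subscripts refer to positions, the result depends on the current sort order of the data.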
Missing values
Stata has 27 numeric missing values (see help missing)
• default (or system missing value) is denoted by . (dot)
• extended missing values are denoted by .a, .b, ..., .z
• functions: missing()
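A common pitfall is that Stata treats numeric missing values as larger than any number, so they satisfy conditions such as x > 2000. The sketch below uses made-up values in a hypothetical variable income to show why missing() belongs in such conditions.

```stata
* Four observations: two valid values, one system missing, one extended missing
clear
input income
1200
.
3500
.a
end

* Missing values count as "greater than 2000": this counts 3 observations
count if income > 2000

* Excluding missing values explicitly: this counts 1 observation
count if income > 2000 & !missing(income)

* == . matches only the system missing value: 1 observation
count if income == .

* missing() matches . as well as .a, ..., .z: 2 observations
count if missing(income)
```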
Datatypes
• differ in precision and in the memory they use
• Stata chooses the data type for us (the default for numeric
variables is float)
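The default type float matters in practice because it stores about 7 digits of precision, while Stata evaluates expressions in double precision. A minimal sketch with hypothetical variables x and y:

```stata
clear
set obs 1

* x is stored as float (the default), y as double
generate x = 0.1
generate double y = 0.1

* The float 0.1 differs from the double-precision 0.1: counts 0
count if x == 0.1

* Rounding the comparison value to float precision first: counts 1
count if x == float(0.1)

* The double variable compares as expected: counts 1
count if y == 0.1

* The display format changes only how values are shown, not what is stored
format y %9.2f
list

* describe shows the storage types; compress picks the smallest
* type that holds the data without loss
describe
compress
```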
Stata also offers MATA (see help mata), a matrix programming
language for matrix calculations. A very simple example is
shown below
mata
// "," separates columns and "\" separates rows
A = (1, 2 \ 3, 4)
B = (1, 2 \ 3, 4)
C = A*B
C
end
We will not cover MATA in this program (although we will use some
MATA features when we retrieve regression results stored by Stata).
Those interested can find more information here:
• http://www.stata.com/features/matrix-programming-mata/
Moreover, it is good practice to check the raw data you collect even
before you start manipulating it!
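Such checks can be scripted so that they run every time the raw data are loaded. The following is a sketch only: it uses the shipped auto dataset, and the plausibility bounds in the assert lines are assumptions chosen for illustration.

```stata
sysuse auto, clear

* Overview of variables, storage types and labels
describe
codebook price mpg, compact

* Does make uniquely identify the observations? isid stops with an error if not
isid make
duplicates report make

* Plausibility checks: assert stops with an error if any observation fails
assert price > 0 & !missing(price)
assert inrange(mpg, 5, 60)

* Which variables contain missing values, and how many?
misstable summarize
```

Placing assert statements at the top of a do-file makes the script stop as soon as the raw data violate an expectation, before any variable is transformed.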
Today, we have seen how to import and prepare data in Stata such
that it can be used in an empirical analysis.