Collapsing data
MPA/ID Stata training 2013, Session 3
TF: Teddy Svoronos, August 2013
Merges can be
    one-to-one    1:1
    many-to-one   m:1
    one-to-many   1:m
    many-to-many  m:m

Where the first m or 1 represents the master dataset, and the second represents the using
dataset.

As shown in the example below, many or 1 refers to the number of observations that share
each value of your grouping variable in each dataset.

The grouping variable is the variable that Stata uses to match one dataset with the other.
There must be a grouping variable with the exact same name in both datasets.

Make absolutely sure that your grouping variables are coded correctly before attempting a
merge!
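One way to check that a grouping variable is coded correctly is sketched below. It is a minimal illustration, not a complete checklist: it rebuilds the small Dataset2 from the example by hand and assumes that Group is stored as a string.

```stata
* Toy copy of Dataset2 from the example (Group is assumed to be a string).
clear
input str1 Group var3 var4
"A" 0.89 1200
"B" 0.24 1687
"C" 0.44 1147
end

* isid exits with an error if Group does not uniquely identify observations,
* which is what the "1" side of an m:1 or 1:m merge requires.
isid Group

* confirm exits with an error if Group is not stored as a string; the
* grouping variable must have the same storage type in both datasets.
confirm string variable Group

* Tabulate the values to spot typos, stray spaces, or different capitalization.
tab Group, missing
```

Running the same checks on the other dataset (without isid, if it sits on the "many" side) helps catch mismatches before the merge rather than after it.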
Dataset1.dta

Observation  var1  var2  Group
1            20    10    A
2            7     14    A
3            18    7     A
4            17    1     B
5            9     40    B
6            11    82    B
7            7     73    B
8            20    80    B
9            17    3     B
10           11    67    B
11           12    98    C
12           4     24    C
13           8     75    C
14           8     3     C
15           14    50    C

Dataset2.dta

Group  var3  var4
A      0.89  1200
B      0.24  1687
C      0.44  1147
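The two example datasets can be combined with a many-to-one merge, since Dataset1 has many observations per Group and Dataset2 has exactly one. The sketch below types both datasets in by hand so that it runs on its own; the tempfile name dataset2 is an assumption made for illustration, and in practice you would use the saved .dta files directly.

```stata
* Build Dataset2 (one observation per Group) and save it as the using dataset.
clear
input str1 Group var3 var4
"A" 0.89 1200
"B" 0.24 1687
"C" 0.44 1147
end
tempfile dataset2
save `dataset2'

* Build Dataset1 (many observations per Group); this is the master dataset.
clear
input Observation var1 var2 str1 Group
1  20 10 "A"
2  7  14 "A"
3  18 7  "A"
4  17 1  "B"
5  9  40 "B"
6  11 82 "B"
7  7  73 "B"
8  20 80 "B"
9  17 3  "B"
10 11 67 "B"
11 12 98 "C"
12 4  24 "C"
13 8  75 "C"
14 8  3  "C"
15 14 50 "C"
end

* m:1 = many master observations per Group, one using observation per Group.
merge m:1 Group using `dataset2'

* merge creates _merge: 1 = master only, 2 = using only, 3 = matched.
tab _merge
assert _merge == 3
```

After the merge, each of the 15 observations carries the var3 and var4 values of its Group. If the roles were swapped (Dataset2 as master, Dataset1 as using), the same match would be written as merge 1:m.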
III. Labels
Assigning a label to a variable
Syntax:
label var var1 "Variable 1 label"
Example:
label var gnipc "GNI per capita"
Syntax:
1. Create a "label definition" that associates numbers with label names
Example:
label def income_label 1 "Low Income" 2 "Lower Middle Income" 3 "Upper Middle Income" 4 "High Income"
Additional notes
• By default, the tab command lists a variable's entries using its labels, not using its
corresponding numbers. To have Stata display the actual numeric values of a
variable, use tab var1, nolabel.
• Only numeric variables can have value labels. To label the values of a string
variable, first create a numeric version of it, then apply the labels to that
numeric variable.
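The labeling steps and the notes above can be sketched end to end as follows. A label definition has no effect until it is attached to a variable with label values, so that step is included here. The variables income_level and region, and their contents, are hypothetical, and encode is only one way to create a numeric version of a string variable.

```stata
* Hypothetical data: a numeric income category and a string variable.
clear
input income_level str6 region
1 "North"
2 "South"
3 "North"
4 "East"
end

* Label the variable itself.
label var income_level "Income level"

* Step 1: create the label definition.
label def income_label 1 "Low Income" 2 "Lower Middle Income" 3 "Upper Middle Income" 4 "High Income"

* Step 2: attach the definition to the variable's values.
label values income_level income_label

tab income_level            // shows the labels
tab income_level, nolabel   // shows the underlying numbers 1-4

* String variables cannot take value labels directly. encode creates a new
* numeric variable, numbered in alphabetical order of the strings, and
* attaches a value label built from the original strings.
encode region, generate(region_num)
tab region_num
tab region_num, nolabel
```

Because encode assigns codes alphabetically, check the resulting numbers with the nolabel option before relying on them.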
IV. Assignment
1. Generate a variable called female_maj which equals 1 if the ratio of female to male
primary enrollment is 100% or greater and 0 if it is less than 100% (be sure to
account for missing observations!). Add a label to the variable female_maj itself,
and create labels for the two values that female_maj can take.
2. Create a new dataset that consists of one observation for each income level (4
observations total) and variables for the median and standard deviation of GNI per
capita, poverty headcount ratio at $1.25 a day, and poverty headcount ratio at $2 a
day (6 variables total).
3. Load the dataset that you just made as your master dataset, and merge it with your
original MDG.dta dataset (your original dataset will be the using dataset).