
MPA/ID Stata training 2013   Session 3
TF: Teddy Svoronos   August 2013

I. Collapsing data

collapse (mean) mean_var1 = var1 (median) med_var2 = var2, by(Group)

• (mean): the statistic that you want to calculate, in parentheses. Note: you can list
several variables after specifying your (statistic).
• mean_var1: the name of the new variable you want to create (leave blank if you want to
use the original name).
• var1: the name of the old variable you want to calculate the statistic for.
• (median) med_var2 = var2: repeat the previous syntax if you want to calculate a
different statistic, in this case the median.
• by(Group): the grouping variable that you want to collapse the dataset into.

Original dataset:

Observation  var1  var2  Group
1            20    10    A
2            7     14    A
3            18    7     A
4            17    1     B
5            9     40    B
6            11    82    B
7            7     73    B
8            20    80    B
9            17    3     B
10           11    67    B
11           12    98    C
12           4     24    C
13           8     75    C
14           8     3     C
15           14    50    C

Collapsed dataset:

Group  mean_var1  med_var2
A      15         10
B      13.142857  67
C      9.2        50
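The collapse example above can be reproduced end to end with the sketch below. It types the 15 observations from the handout in by hand with input (an assumption made only so that the block is self-contained; in practice you would load your own dataset with use) and then runs the same collapse command.

```stata
* Sketch: rebuild the 15-observation example by hand, then collapse it.
clear
input Observation var1 var2 str1 Group
1 20 10 A
2 7 14 A
3 18 7 A
4 17 1 B
5 9 40 B
6 11 82 B
7 7 73 B
8 20 80 B
9 17 3 B
10 11 67 B
11 12 98 C
12 4 24 C
13 8 75 C
14 8 3 C
15 14 50 C
end

* One observation per Group: the mean of var1 and the median of var2.
collapse (mean) mean_var1 = var1 (median) med_var2 = var2, by(Group)
list
```

Note that collapse replaces the dataset in memory, so save your original data first (or wrap the command in preserve and restore) if you still need the individual observations.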

II. Merging datasets


Overview and 1:1 merge

Merges can be

one-to-one 1:1
many-to-one m:1
one-to-many 1:m
many-to-many m:m

The first m or 1 represents the master dataset, and the second represents the using
dataset.

As shown in the example below, many or 1 refers to the number of observations for each
value of your grouping variable in each dataset.

The grouping variable is the variable that Stata uses to match one dataset with the
other. There must be a grouping variable with the exact same name in both datasets.

Make absolutely sure that your grouping variables are coded correctly before attempting
a merge!

merge 1:1 ID using "Data2.dta"

"Data2.dta" is your using dataset, or the name of the dataset that is not currently
loaded into Stata.

Data1.dta
"Master dataset"
(i.e., the dataset currently loaded in Stata)

ID  var1
1   20
2   7
3   18
4   17

Data2.dta
"Using dataset"
(i.e., the dataset that you want to merge with the current dataset)

ID  var2
1   10
4   1
2   14
3   7

"Merged dataset"

ID  var1  var2
1   20    10
2   7     14
3   18    7
4   17    1
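The 1:1 merge above can be tried out with the following sketch. It assumes that you are happy to have a small file called Data2.dta written to (and overwritten in) your current working directory; the values are the ones shown in the handout's tables.

```stata
* Sketch: build the using dataset and save it to disk.
* Assumption: writing Data2.dta into the working directory is acceptable.
clear
input ID var2
1 10
4 1
2 14
3 7
end
save "Data2.dta", replace

* Build the master dataset, which stays loaded in memory.
clear
input ID var1
1 20
2 7
3 18
4 17
end

* Each ID appears exactly once in both datasets, so this is a 1:1 merge.
merge 1:1 ID using "Data2.dta"
sort ID
list ID var1 var2 _merge
```

The rows of Data2.dta are deliberately out of order: merge matches on the values of ID, not on the position of each observation.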


Example of an m:1 merge

merge m:1 Group using "Dataset2.dta"

Dataset1.dta
"Master dataset"

Observation  var1  var2  Group
1            20    10    A
2            7     14    A
3            18    7     A
4            17    1     B
5            9     40    B
6            11    82    B
7            7     73    B
8            20    80    B
9            17    3     B
10           11    67    B
11           12    98    C
12           4     24    C
13           8     75    C
14           8     3     C
15           14    50    C

Dataset2.dta
"Using dataset"

Group  var3  var4
A      0.89  1200
B      0.24  1687
C      0.44  1147

"Merged dataset"

Observation  var1  var2  Group  var3  var4
1            20    10    A      0.89  1200
2            7     14    A      0.89  1200
3            18    7     A      0.89  1200
4            17    1     B      0.24  1687
5            9     40    B      0.24  1687
6            11    82    B      0.24  1687
7            7     73    B      0.24  1687
8            20    80    B      0.24  1687
9            17    3     B      0.24  1687
10           11    67    B      0.24  1687
11           12    98    C      0.44  1147
12           4     24    C      0.44  1147
13           8     75    C      0.44  1147
14           8     3     C      0.44  1147
15           14    50    C      0.44  1147
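The m:1 merge above can be sketched in the same way. As before, the block assumes that a small file called Dataset2.dta may be written to (and overwritten in) your working directory; the values are copied from the handout's tables.

```stata
* Sketch: the using dataset has ONE observation per Group.
* Assumption: writing Dataset2.dta into the working directory is acceptable.
clear
input str1 Group var3 var4
A 0.89 1200
B 0.24 1687
C 0.44 1147
end
save "Dataset2.dta", replace

* The master dataset has MANY observations per Group.
clear
input Observation var1 var2 str1 Group
1 20 10 A
2 7 14 A
3 18 7 A
4 17 1 B
5 9 40 B
6 11 82 B
7 7 73 B
8 20 80 B
9 17 3 B
10 11 67 B
11 12 98 C
12 4 24 C
13 8 75 C
14 8 3 C
15 14 50 C
end

* Many master observations per Group, one using observation per Group: m:1.
merge m:1 Group using "Dataset2.dta"
sort Observation
list
```

Every master observation picks up the var3 and var4 values of its Group, so the merged dataset still has 15 observations.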

Additional notes

• By default, executing a merge generates a variable _merge, which takes values:
    • _merge = 1 if the observation was only in the master data;
    • _merge = 2 if the observation was only in the using data;
    • _merge = 3 if the observation was successfully matched between the two datasets.
• Get in the habit of doing a tab _merge after executing a merge, in order to better
understand what took place.
• The merging variable that we use in a merge typically refers to some identifier, such
as person ID, household ID, village ID, etc.
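To see all three values of _merge at once, the sketch below merges two small datasets whose IDs only partly overlap. The data are hypothetical (they are not from the handout), and the using dataset is stored in a temporary file, which assumes that the whole block is run together from a do-file.

```stata
* Hypothetical using dataset: IDs 2, 3 and 4.
clear
input ID var2
2 14
3 7
4 1
end
tempfile using_data
save `using_data'

* Hypothetical master dataset: IDs 1, 2 and 3.
clear
input ID var1
1 20
2 7
3 18
end

merge 1:1 ID using `using_data'

* ID 1 is master only (1), ID 4 is using only (2), IDs 2 and 3 are matched (3).
tab _merge
sort ID
list ID var1 var2 _merge
```

Unmatched observations stay in the merged dataset with missing values for the variables they did not receive. If you plan to run another merge afterwards, drop or rename _merge first, because merge will stop if a variable with that name already exists.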


III. Labels
Assigning a label to a variable

Syntax:
label var var1 "Variable 1 label"

Example:
label var gnipc "GNI per capita"

Assigning a label to a variable's values

Syntax:
1. Create a "label definition" that associates numbers with label names

label define labelname # "label1" # "label2" # "label3"


2. Apply your new label definition to an existing variable, whose values
correspond to the ones in your label definition

label values var1 labelname

Example:
label def income_label 1 "Low Income" 2 "Lower Middle Income" 3 "Upper Middle Income" 4 "High Income"

label values income_level income_label
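The labeling steps above can be tried on a toy dataset, as in the sketch below. The four observations and their gnipc values are made up for illustration; only the variable names and label text come from the examples above.

```stata
* Hypothetical data: one observation per income level.
clear
input income_level gnipc
1 500
2 2500
3 8000
4 40000
end

* Label the variable itself.
label var gnipc "GNI per capita"

* Step 1: define the mapping from numbers to label text.
label define income_label 1 "Low Income" 2 "Lower Middle Income" 3 "Upper Middle Income" 4 "High Income"

* Step 2: attach that definition to the variable's values.
label values income_level income_label

* describe shows the variable label and which value label is attached.
describe
tab income_level
```

Keeping the definition (income_label) separate from the variable (income_level) means the same definition can be attached to several variables that share the same coding.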

Additional notes

• By default, the tab command lists a variable's entries using its labels, not using its
corresponding numbers. To have Stata display the actual numeric values of a
variable, use tab var1, nolabel.
• Note that only numeric variables can have value labels. If you want to add labels to
the values of a string variable, you must create a numeric version of it before
applying labels to its values.
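One way to create a labeled numeric version of a string variable is Stata's encode command, sketched below on hypothetical data (the variable names region and region_num are made up for illustration). This is only one option; you could also generate the numeric variable yourself and label it with label define and label values.

```stata
* Hypothetical string variable.
clear
input str10 region
"Africa"
"Asia"
"Europe"
"Asia"
end

* encode creates a numeric variable and a matching value label in one step.
* Codes are assigned in alphabetical order: Africa = 1, Asia = 2, Europe = 3.
encode region, generate(region_num)

* With labels (the default) and with the underlying numbers.
tab region_num
tab region_num, nolabel
```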


IV. Assignment
1. Generate a variable called female_maj which equals 1 if the ratio of female to male
primary enrollment is 100% or greater and 0 if it is less than 100% (be sure to
account for missing observations!). Add a label to the variable female_maj itself,
and create labels for the two values that female_maj can take.
2. Create a new dataset that consists of one observation for each income level (4
observations total) and variables for the median and standard deviation of GNI per
capita, poverty headcount ratio at $1.25 a day, and poverty headcount ratio at $2 a
day (6 variables total).
3. Load the dataset that you just made as your master dataset, and merge it with your
original MDG.dta dataset (your original dataset will be the using dataset).
