Stata Worksheet 3

MPA/ID Stata training 2013, Session 3
TF: Teddy Svoronos, August 2013

I. Collapsing data

collapse (mean) mean_var1 = var1 (median) med_var2 = var2, by(Group)

Reading the syntax from left to right:
• (mean): the statistic that you want to calculate, in parentheses.
• mean_var1 =: the name of the new variable you want to create (leave blank if you want to use the original name).
• var1: the name of the old variable you want to calculate the statistic for. Note: you can list several variables after specifying your (statistic).
• (median) med_var2 = var2: repeat the previous syntax if you want to calculate a different statistic, in this case the median.
• by(Group): the grouping variable that you want to collapse the dataset into.

Original dataset:

Observation  var1  var2  Group
1            20    10    A
2            7     14    A
3            18    7     A
4            17    1     B
5            9     40    B
6            11    82    B
7            7     73    B
8            20    80    B
9            17    3     B
10           11    67    B
11           12    98    C
12           4     24    C
13           8     75    C
14           8     3     C
15           14    50    C

Collapsed dataset:

Group  mean_var1  med_var2
A      15         10
B      13.142857  67
C      9.2        50
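The collapse example can be reproduced end to end with a short do-file. This is a minimal sketch that types in the worksheet's 15 observations with input rather than loading a file, so nothing here depends on a dataset being on disk.

```stata
* Rebuild the worksheet's 15-observation example dataset
clear
input Observation var1 var2 str1 Group
 1 20 10 "A"
 2  7 14 "A"
 3 18  7 "A"
 4 17  1 "B"
 5  9 40 "B"
 6 11 82 "B"
 7  7 73 "B"
 8 20 80 "B"
 9 17  3 "B"
10 11 67 "B"
11 12 98 "C"
12  4 24 "C"
13  8 75 "C"
14  8  3 "C"
15 14 50 "C"
end

* One row per Group: mean of var1 and median of var2
collapse (mean) mean_var1 = var1 (median) med_var2 = var2, by(Group)
list, clean noobs
```

Keep in mind that collapse replaces the dataset in memory. If you still need the original observations, save the data first or wrap the collapse in preserve and restore.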
II. Merging datasets

Overview and 1:1 merge

Merges can be
one-to-one    1:1
many-to-one   m:1
one-to-many   1:m
many-to-many  m:m

Where the first m or 1 represents the master dataset, and the second represents the using dataset.

As shown in the example below, many or 1 refers to the number of observations for your grouping variable in each dataset.

The grouping variable is the variable that Stata uses to match one dataset with the other. There must be a grouping variable with the exact same name in both datasets.

Make absolutely sure that your grouping variables are coded correctly before attempting a merge!

merge 1:1 ID using "Data2.dta"

Here ID is the grouping variable, and "Data2.dta" is your using dataset, or the name of the dataset that is not currently loaded into Stata.

Data1.dta: "Master dataset" (i.e., the dataset currently loaded in Stata)

ID  var1
1   20
2   7
3   18
4   17

Data2.dta: "Using dataset" (i.e., the dataset that you want to merge with the current dataset)

ID  var2
1   10
4   1
2   14
3   7

"Merged dataset"

ID  var1  var2
1   20    10
2   7     14
3   18    7
4   17    1
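The 1:1 example can be tried without creating any files by hand. The sketch below is one way to do it, under two assumptions that are not in the worksheet: the using data is saved to a temporary file (the macro name data2 is hypothetical) instead of "Data2.dta", and isid is used as an extra check that ID identifies each observation uniquely before merging.

```stata
* Build the using dataset (Data2 in the worksheet) and save it to a temporary file
clear
input ID var2
1 10
4 1
2 14
3 7
end
tempfile data2
save `data2'

* Build the master dataset (Data1 in the worksheet)
clear
input ID var1
1 20
2 7
3 18
4 17
end

* Check that ID uniquely identifies observations, then merge
isid ID
merge 1:1 ID using `data2'

sort ID
list, clean noobs
```

Note that the rows of the using dataset are in a different order from the master dataset. The merge matches on the values of ID, not on row position.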
Example of a m:1 merge
merge m:1 Group using "Dataset2.dta"
Dataset1.dta (master dataset)

Observation  var1  var2  Group
1            20    10    A
2            7     14    A
3            18    7     A
4            17    1     B
5            9     40    B
6            11    82    B
7            7     73    B
8            20    80    B
9            17    3     B
10           11    67    B
11           12    98    C
12           4     24    C
13           8     75    C
14           8     3     C
15           14    50    C

Dataset2.dta (using dataset)

Group  var3  var4
A      0.89  1200
B      0.24  1687
C      0.44  1147

Merged dataset

Observation  var1  var2  Group  var3  var4
1            20    10    A      0.89  1200
2            7     14    A      0.89  1200
3            18    7     A      0.89  1200
4            17    1     B      0.24  1687
5            9     40    B      0.24  1687
6            11    82    B      0.24  1687
7            7     73    B      0.24  1687
8            20    80    B      0.24  1687
9            17    3     B      0.24  1687
10           11    67    B      0.24  1687
11           12    98    C      0.44  1147
12           4     24    C      0.44  1147
13           8     75    C      0.44  1147
14           8     3     C      0.44  1147
15           14    50    C      0.44  1147
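A shorter version of the m:1 example is sketched below. To keep it brief it assumes a six-row subset of Dataset1 (observations 1, 2, 4, 5, 11 and 12) and, as before, saves the using data to a temporary file whose macro name (groups) is hypothetical. The isid check is an addition, not part of the worksheet.

```stata
* Using dataset: one row per Group (the "1" side of m:1)
clear
input str1 Group var3 var4
"A" 0.89 1200
"B" 0.24 1687
"C" 0.44 1147
end
isid Group              // stops with an error if Group is not unique here
tempfile groups
save `groups'

* Master dataset: several rows per Group (the "m" side of m:1)
clear
input Observation var1 var2 str1 Group
 1 20 10 "A"
 2  7 14 "A"
 4 17  1 "B"
 5  9 40 "B"
11 12 98 "C"
12  4 24 "C"
end

* Each master row picks up var3 and var4 from its Group's single row
merge m:1 Group using `groups'

sort Observation
list, clean noobs
```

Because Group repeats in the master data but is unique in the using data, the group-level values are copied onto every observation in that group, as in the merged table above.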
Additional notes
• By default, executing a merge generates a variable _merge, which takes values:
  • _merge = 1 if observation was only in the master data;
  • _merge = 2 if observation was only in the using data;
  • _merge = 3 if observation was successfully matched between the two datasets.
• Get in the habit of doing a tab _merge after executing a merge, in order to better
understand what took place.
• The merging variable that we use in a merge typically refers to some identifier, such
as person ID, household ID, village ID, etc.
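To see all three values of _merge, the two datasets need to disagree about which IDs they contain. The sketch below uses made-up data for that purpose (ID 1 appears only in the master data, ID 5 only in the using data); the values and the macro name using_data are hypothetical.

```stata
* Using data: IDs 2, 3 and 5
clear
input ID var2
2 14
3 7
5 99
end
tempfile using_data
save `using_data'

* Master data: IDs 1, 2 and 3
clear
input ID var1
1 20
2 7
3 18
end

merge 1:1 ID using `using_data'

* Inspect the result: one master-only row, one using-only row, two matched rows
tab _merge

* Keep only matched observations, then drop _merge so that a later merge
* can create it again
keep if _merge == 3
drop _merge
```

Whether to keep unmatched observations depends on the analysis. The point of tabulating _merge first is to make that decision knowingly.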
III. Labels
Assigning a label to a variable
Syntax:
label var var1 "Variable 1 label"
Example:
label var gnipc "GNI per capita"
Assigning a label to a variable's values
Syntax:
1. Create a "label definition" that associates numbers with label names
label define labelname # "label1" # "label2" # "label3"

2. Apply your new label definition to an existing variable, whose values correspond to the ones in your label
label values var1 labelname
Example:
label def income_label 1 "Low Income" 2 "Lower Middle Income" 3 "Upper Middle Income" 4 "High Income"
label values income_level income_label
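The two kinds of label can be tried together on a small dataset. In the sketch below the variable names follow the examples above, but the four observations and their gnipc values are made up for illustration.

```stata
* Toy data: one row per income level (gnipc values are hypothetical)
clear
input income_level gnipc
1 600
2 2500
3 8000
4 40000
end

* Label the variable itself
label var gnipc "GNI per capita"

* Define a set of value labels, then attach it to income_level
label define income_label 1 "Low Income" 2 "Lower Middle Income" 3 "Upper Middle Income" 4 "High Income"
label values income_level income_label

* Check the results
describe
label list income_label
tab income_level
```

The label definition exists independently of any variable, so the same income_label can be attached to other variables that use the same 1 to 4 coding.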
Additional notes
• By default, the tab command lists a variable's entries using its labels, not using its
corresponding numbers. To have Stata display the actual numeric values of a
variable, use tab var1, nolabel.
• Note that only numeric variables can have value labels. To label the values of a
string variable, you must first create a numeric version of it and then apply the
labels to that numeric variable.
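One common way to create a labelled numeric version of a string variable is the encode command, which builds the numeric codes and the value labels in a single step. The worksheet does not name a specific method, so treat this as one option; the variable region and its values are hypothetical.

```stata
* Toy string variable
clear
input str6 region
"Asia"
"Africa"
"Asia"
"Europe"
end

* encode creates a new numeric variable plus a value label built from the
* strings, numbering the categories in alphabetical order
encode region, gen(region_num)

* With labels (default) and with the underlying numbers
tab region_num
tab region_num, nolabel
```

The first tab shows the region names, and the second shows the numeric codes that encode assigned to them.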
IV. Assignment
1. Generate a variable called female_maj which equals 1 if the ratio of female to male
primary enrollment is 100% or greater and 0 if it is less than 100% (be sure to
account for missing observations!). Add a label to the variable female_maj itself,
and create labels for the two values that female_maj can take.
2. Create a new dataset that consists of one observation for each income level (4
observations total) and variables for the median and standard deviation of GNI per
capita, poverty headcount ratio at $1.25 a day, and poverty headcount ratio at $2 a
day (6 variables total).
3. Load the dataset that you just made as your master dataset, and merge it with your
original MDG.dta dataset (your original dataset will be the using dataset).
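
A note on the missing-observations reminder in question 1: Stata treats a missing numeric value as larger than any number, so a condition such as >= 100 is true for missing observations. The sketch below illustrates the issue on made-up data. The variable names ratio, naive and flag are hypothetical and are not the names used in MDG.dta.

```stata
* Toy data with one missing value
clear
input ratio
95
100
104
.
end

* Naive version: the missing row satisfies ratio >= 100 and is coded 1
gen naive = (ratio >= 100)

* Safer version: the indicator stays missing when ratio is missing
gen flag = (ratio >= 100) if !missing(ratio)

list, clean noobs
```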
