
Econ 251

Problem Set #2
SOLUTIONS

(38 points)

Part I: Data analysis (14 points in total)


This part of the problem set introduces you to using STATA for simple data
analysis.
Note: Refer to your section 2; you used all of the STATA commands needed in this
homework in the section.

Instructions:
Following each question, please handwrite or type your answers and copy/paste the
STATA output (please, use the ‘copy as picture’ option). Your STATA output should
be included as part of the homework submission.

Background: There is evidence in the economic literature that black workers earn less
than non-black workers, on average – a phenomenon which has become known as the
“racial earnings gap”. It is important to understand the reason for this gap; in particular, it
is important to find out whether there is any labour market discrimination against black
workers. By the end of this class we are going to see that there is evidence of such
discrimination (sometimes referred to as the “unexplained” part of the wage gap) and
we’ll also see that finding out the extent of discrimination is a very challenging task.
To address this research question we are going to use data on male workers in the U.S.
called WAGE2.dta (uploaded on Canvas). This data was used in a publication by
Blackburn and Neumark (1992). We are going to start the analyses by looking at sample
averages for black and non-black workers.
Download the STATA file WAGE2.dta from the CANVAS website. WAGE2.dta
contains information on monthly earnings, employment history, education, demographic
characteristics, and two test scores for 935 men in year 1980. In particular, it contains the
following variables:
wage      monthly earnings (in 1976 USD)
hours     average weekly hours of work
IQ        IQ (intelligence quotient) score
educ      years of education
age       age in years
married   =1 if the person is married
black     =1 if the person is black

1. (i) Find the average years of education for everyone in the sample (variable educ).
(ii) What are the lowest (minimum) and the highest (maximum) years of education
in the sample? (1 point)
Hint: Use the STATA command sum var1

Solution:
. sum educ

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        educ |        935    13.46845    2.196654          9         18

The mean years of education in the sample is 13.47 years. The lowest (minimum)
years of schooling is 9 years and the highest (maximum) is 18 years.

2. (i) How many black men are there in the sample?


(ii) How many non-black men are there in the sample?
(iii) What is the percentage of non-black men in the sample?
(2 points)
Hint: Use the STATA command tab var1

Solution:
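. tab black

      =1 if |
      black |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        815       87.17       87.17
          1 |        120       12.83      100.00
------------+-----------------------------------
      Total |        935      100.00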

There are 120 black men in the sample and 815 non-black men. The percentage of
non-black men in the sample is 87.17%.

3. (i) Find the average monthly wage for all men in the sample (variable wage).
(ii) Find the sample mean monthly wage for black and non-black workers
separately. Do black men in the sample earn more or less than non-black men, on
average? (2 points)

Hint: Use the STATA command tab var1, sum(var2)


In our particular case, var1 is black and var2 is wage; hence, what you would need to
type in the command window in STATA is: tab black, sum (wage)
(iii) Does the information on the sample mean wages in part (ii) above provide
compelling evidence that black workers are discriminated against on the labour market
(i.e. that they get lower wages only because they are black)? Why or why not?

(2 points)
Hint: This question draws your attention to the difference between sample statistics and
population parameters, and the difference between correlation and causality.

Solution to (iii):
No. Part (ii) compares the sample mean wages of black and non-black workers, but
we want to make inferences about the population; comparing sample means is not
enough and can only suggest that something is going on. The lower sample mean
earnings of black workers (part (ii)) are consistent with labour market
discrimination, but this information alone cannot establish it. For example, you can
check that their sample mean education is also lower, and education is positively
related to earnings (see question 5). To isolate discrimination we need to be able to
compare workers who are similar in all dimensions affecting earnings (such as
education or work experience) other than race. As a first step, we need to make
inferences about the population by testing a hypothesis about the population mean
wages, and later on use econometric tools (more about this in class).

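Solution to (i) and (ii):
. tab black, sum (wage)

            | Summary of monthly earnings
      =1 if |
      black |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          0 |   990.64785   408.00275         815
          1 |   735.84167   295.93089         120
------------+------------------------------------
      Total |   957.94545   404.36082         935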

The average monthly wage in the total sample is $957.95 (in 1976 USD). The sample
mean monthly wage for black men is $735.84, while the mean wage for non-black men is
$990.65. Black men in the sample earn less than non-black men, on average.
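
As a quick consistency check on the table above, the overall mean should equal the frequency-weighted average of the two group means. The short Python sketch below (Python rather than STATA, purely for illustration; the variable and function names are our own) redoes that arithmetic with the numbers copied from the output and reports the raw gap in sample means.

```python
# Group means and frequencies copied from the `tab black, sum(wage)` output.
groups = {
    "non-black": {"mean": 990.64785, "freq": 815},
    "black": {"mean": 735.84167, "freq": 120},
}


def weighted_mean(groups):
    """Frequency-weighted average of the group means."""
    total_n = sum(g["freq"] for g in groups.values())
    total_wage = sum(g["mean"] * g["freq"] for g in groups.values())
    return total_wage / total_n


overall = weighted_mean(groups)
gap = groups["non-black"]["mean"] - groups["black"]["mean"]

print(f"overall mean wage: {overall:.2f}")  # should match the Total row up to rounding
print(f"raw gap in sample means: {gap:.2f}")
```

The gap is a descriptive statistic only; as discussed in the solution to (iii), it does not by itself measure discrimination.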

4. Tabulate variable married. Category married==1 includes people who are married,
while category married==0 includes people who are single, separated, divorced or
widowed. How many people fall in category married==0? Now find the sample
average years of education for married and non-married separately. Which group has a
higher sample mean education? (2 points)
Hint: To answer the question use the STATA command tab var1, sum(var2) again, but
this time var1 is married, var2 is educ.

Solution:

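. tab married, sum (educ)

      =1 if | Summary of years of education
    married |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          0 |       13.84   2.2325417         100
          1 |   13.423952   2.1894452         835
------------+------------------------------------
      Total |   13.468449   2.1966539         935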

100 people fall in category married==0. The average years of education for married
men is 13.42 years, while the mean for non-married men is 13.84 years. Non-married
men have a higher sample mean education than married men.

5. (i) What is the sample correlation between monthly earnings (wage) and years
of education (educ)? What is the sign of this correlation and what does it mean?
Hint: Use the STATA command corr var1 var2 (2 points)

Solution:
. corr wage educ
(obs=935)

             |     wage     educ
-------------+------------------
        wage |   1.0000
        educ |   0.3271   1.0000

The sample correlation between wage and education is 0.33. Recall that the
correlation between two variables always takes on values in the interval [-1, 1]. A
correlation coefficient of positive 0.33 indicates a moderate positive linear
relationship between the variables. This means that these variables tend to move in
the same direction: when one of them increases, the other tends to increase too. Put
differently, persons (men in our sample) with high educational attainment tend to
have high wages as well.
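
For reference, the sample correlation reported by corr is the sample covariance divided by the product of the two sample standard deviations. A minimal Python sketch of that formula follows; the function name `pearson_r` and the toy numbers are hypothetical and are not taken from WAGE2.dta.

```python
import math


def pearson_r(x, y):
    """Sample (Pearson) correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data (NOT from WAGE2.dta): years of education and monthly wage.
educ_toy = [10, 12, 12, 14, 16, 18]
wage_toy = [800, 700, 1000, 900, 1300, 1100]

r = pearson_r(educ_toy, wage_toy)
print(f"r = {r:.2f}")
```

The result always lies in [-1, 1]; it equals 1 or -1 only when the points fall exactly on an upward- or downward-sloping line.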

(ii) Compute the average monthly wage for each year of education in the sample.
Do you find any relationship between the two variables? Does this make sense to
you? Also, produce a scatter plot of wages and education. Does the scatter plot
confirm this relationship? (3 points)
Hint: Use the STATA command tab var1, sum(var2)
In this case, var1 is educ and var2 is wage.

Solution:

. tab educ, sum (wage)

   years of | Summary of monthly earnings
  education |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          9 |       774.3   239.78928          10
         10 |   719.48571   265.86743          35
         11 |   818.81395   326.26627          43
         12 |   862.67176   324.38961         393
         13 |   946.12941   450.31473          85
         14 |   990.01299   411.41423          77
         15 |   1098.9111   485.67113          45
         16 |   1108.7133   390.49857         150
         17 |     1191.45   468.99221          40
         18 |   1200.8246    531.7439          57
------------+------------------------------------
      Total |   957.94545   404.36082         935

The average monthly wage for each year of education in the sample is presented in
the STATA output table above. We do find a relationship between the two
variables: men with more years of education have higher mean wages, on average
(the only step at which the mean falls is from 9 to 10 years of education).
This is one more way to illustrate the relationship between two variables.

Another way to illustrate the relationship between two variables is the so-called
scatter plot. The scatter plot displays the data as a collection of points, with
the value of one variable (education) determining each point’s position on the
horizontal axis and the value of the other variable (wage) determining its position on
the vertical axis. The scatter plot confirms the positive relationship between wages
and education, as well (see below).

[Scatter plot of wage (vertical axis) against educ (horizontal axis)]
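
To see how steadily the mean wage rises with education, one can difference the group means in the table of mean wages by years of education. The Python sketch below is an illustration only (the names `rows`, `steps` and `decreases` are our own), using the means and frequencies copied from the STATA output.

```python
# (years of education, mean monthly wage, frequency) copied from the
# `tab educ, sum(wage)` output.
rows = [
    (9, 774.3, 10), (10, 719.48571, 35), (11, 818.81395, 43),
    (12, 862.67176, 393), (13, 946.12941, 85), (14, 990.01299, 77),
    (15, 1098.9111, 45), (16, 1108.7133, 150), (17, 1191.45, 40),
    (18, 1200.8246, 57),
]

# Change in the mean wage for each one-year step in education.
steps = [(lo[0], hi[0], hi[1] - lo[1]) for lo, hi in zip(rows, rows[1:])]
decreases = [(a, b) for a, b, diff in steps if diff < 0]

for a, b, diff in steps:
    print(f"{a} -> {b} years: {diff:+.2f}")
print("steps where the mean wage falls:", decreases)
```

Note that the 9-year group contains only 10 men, so its mean is estimated much less precisely than, say, the mean for the 393 men with 12 years of education.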

Part II: Statistical theory (24 points in total)

Note: Please, refer to your lecture notes and notes on section 2 when solving Part II of
this homework. These problems are very similar to some examples you saw in class and
in your section 2.

Problem 1 (16 points)


You would like to know the average wage of all working women and men between the
ages of 18 and 54 in the Ann Arbor area. Suppose the population has mean µ and
variance $\sigma^2$. You find it too costly in terms of time and money to ask everyone about
their earnings, so you randomly select 500 people from this group, and ask them about
their wage.

(i) What are the population and the population parameter of interest in this
example? What is the sample, and what is the sample size, N? (1 point)
Solution:
The population of interest in this example is all working men and women in the Ann
Arbor area between the ages of 18 and 54. The population parameter of interest is
the population mean wage, µ. The 500 randomly drawn men and women from the
population of interest comprise our sample. The sample size N equals 500.

Now, comment on each of the proposed estimators of the population mean wage, µ:
(ii) using the sample mean $\bar{X}$ as an estimator of the population mean
• Is it unbiased? Show formally and explain what this means intuitively.
• Is it efficient? Why or why not? Explain intuitively what efficiency means.
• Show formally that its variance equals $\frac{\sigma^2}{N}$.
Hint: we did this in class.
(2 points)

Solution:
The sample mean is unbiased under random sampling because its expected value
equals the true population mean, i.e.
$$\mathbb{E}(\bar{X}) = \mu.$$
We proved this in class using the fact that the expected value of a sum of random
variables equals the sum of expected values:
$$\mathbb{E}(\bar{X}) = \mathbb{E}\left(\frac{X_1 + X_2 + \dots + X_N}{N}\right) = \frac{1}{N}\,\mathbb{E}(X_1 + X_2 + \dots + X_N) = \frac{1}{N}\left[\mathbb{E}(X_1) + \mathbb{E}(X_2) + \dots + \mathbb{E}(X_N)\right] = \frac{1}{N}(\mu + \mu + \dots + \mu) = \frac{1}{N}\cdot N\mu = \mu$$
(And we’ve also used the fact that with random sampling every random draw
$X_1, X_2, \dots, X_N$ has the same distribution as the population, so its expected value is µ,
and its variance is $\sigma^2$, the variance of the population).

Intuitively, unbiasedness means that if we were able to take lots of samples (say,
10,000 samples), calculate the mean in each sample and plot it on a graph, the
expected value (average) of the sampling distribution of the means would be the
population mean. Or, in simpler terms, on average, this estimator would give us the
right answer to the question: “What is the population mean?”.

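The sample mean is efficient because it has the smallest variance amongst all
linear unbiased estimators of the population mean. Intuitively, across repeated
samples its estimates are the most tightly clustered around µ. Recall its variance is
given by:
$$\mathrm{var}(\bar{X}) = \frac{\sigma^2}{N}.$$
To show this we use the fact that $X_1, X_2, \dots, X_N$ are all independent, so the
variance of the sum equals the sum of variances:
$$\mathrm{var}\left[\frac{X_1 + X_2 + \dots + X_N}{N}\right] = \left(\frac{1}{N}\right)^2\left[\mathrm{var}(X_1) + \mathrm{var}(X_2) + \dots + \mathrm{var}(X_N)\right] = \frac{1}{N^2}\cdot N\sigma^2 = \frac{\sigma^2}{N}.$$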

(iii) another estimator $\hat{\mu}_1$ using only three observations from your sample and
calculated as: $\hat{\mu}_1 = \frac{1}{3}(X_1 + X_2 + X_3)$ (the average of the first three
observations).
• Is it unbiased? Show formally.
• Is it efficient? Why or why not?
• Show formally that its variance equals $\frac{\sigma^2}{3} > \frac{\sigma^2}{N}$.
(3 points)

Solution:
It is unbiased because its expected value equals the true population mean µ, i.e.:
$\mathbb{E}(\hat{\mu}_1) = \mathbb{E}\left[\frac{1}{3}(X_1 + X_2 + X_3)\right] = \mu$. (Note that this estimator is calculated exactly the
same way we calculate the sample mean, so it has to be unbiased, but we are acting
as if we only had a sample of size 3, and we are ignoring all the remaining 497
observations).
Here’s the formal proof using the properties of expectation:
$$\mathbb{E}(\hat{\mu}_1) = \mathbb{E}\left[\frac{X_1 + X_2 + X_3}{3}\right] = \frac{1}{3}\left[\mathbb{E}(X_1) + \mathbb{E}(X_2) + \mathbb{E}(X_3)\right] = \frac{1}{3}(\mu + \mu + \mu) = \mu.$$
Intuitively, unbiasedness means that if we were able to take lots of samples (say,
10,000 samples), calculate the average of the first three observations in each sample
and plot it on a graph, the expected value (average) of the sampling distribution of
this estimator would be the population mean. Or, in simpler terms, on average, this
estimator would give us the right answer to the question: “What is the population
mean?”.
It is NOT efficient because its variance is larger than $\mathrm{var}(\bar{X})$ (recall the sample
mean is BLUE: it is the most efficient estimator amongst all unbiased estimators of
the population mean that are linear functions of the observations).
Any estimator which does not use all of the available data is inefficient.

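The variance of this estimator is given by:
$$\mathrm{var}(\hat{\mu}_1) = \mathrm{var}\left[\frac{1}{3}(X_1 + X_2 + X_3)\right] = \left(\frac{1}{3}\right)^2\left[\mathrm{var}(X_1) + \mathrm{var}(X_2) + \mathrm{var}(X_3)\right] = \frac{1}{9}\cdot 3\sigma^2 = \frac{\sigma^2}{3} > \frac{\sigma^2}{N}$$
for all N>3 (in our case N=500).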

IMPORTANT NOTE: Notice that even though the two estimators seem similar at
first, they are, in fact, VERY different:

Estimator:
Sample size | Sample mean $\bar{X}$                        | $\hat{\mu}_1 = \frac{1}{3}(X_1 + X_2 + X_3)$
            | (whatever the sample size, take the          | (whatever the sample size, take the
            | average wage of all observations             | average wage of the first 3
            | (persons) in the sample)                     | observations (persons) in the sample)
N=3         | $\frac{1}{3}(X_1 + X_2 + X_3)$               | $\frac{1}{3}(X_1 + X_2 + X_3)$
N=500       | $\frac{1}{500}(X_1 + X_2 + \dots + X_{500})$ | $\frac{1}{3}(X_1 + X_2 + X_3)$, i.e. we are ignoring all
            |                                              | the remaining 497 observations

(iv) another estimator $\hat{\mu}_2$ using only two observations from your sample and
calculated as: $\hat{\mu}_2 = \frac{1}{3}X_1 + \frac{2}{3}X_2$ (a weighted average of the first and second
observations with a higher weight placed on the second observation).

• Is it unbiased? Show formally.
• Is it efficient? Why or why not? Explain intuitively.
• Show formally that its variance equals $\frac{5}{9}\sigma^2 > \frac{\sigma^2}{N}$.
(3 points)

Solution:
It is unbiased because its expected value equals the true population mean µ, i.e.:
$\mathbb{E}(\hat{\mu}_2) = \mathbb{E}\left[\frac{1}{3}X_1 + \frac{2}{3}X_2\right] = \mu$. (Note that this estimator is calculated as a weighted
average of the first and second observations with a higher weight placed on the
second observation; all such weighted averages are unbiased because the weights
sum to one).
Here’s the formal proof using the properties of expectation:
$$\mathbb{E}(\hat{\mu}_2) = \mathbb{E}\left[\frac{1}{3}X_1 + \frac{2}{3}X_2\right] = \frac{1}{3}\mathbb{E}(X_1) + \frac{2}{3}\mathbb{E}(X_2) = \frac{1}{3}\mu + \frac{2}{3}\mu = \mu.$$
Intuitively, unbiasedness means that if we were able to take lots of samples (say,
10,000 samples), calculate the weighted average of the first two observations in each
sample using the formula for $\hat{\mu}_2$, and plot it on a graph, the expected value
(average) of the sampling distribution of this estimator would be the population
mean. Or, in simpler terms, on average, this estimator would give us the right
answer to the question: “What is the population mean?”.
It is NOT efficient because its variance is larger than $\mathrm{var}(\bar{X})$ (you could also say
that the sample mean is BLUE: it is the most efficient estimator amongst all
unbiased estimators of the population mean that are linear functions of the
observations).

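The variance of this estimator is given by:
$$\mathrm{var}(\hat{\mu}_2) = \mathrm{var}\left[\frac{1}{3}X_1 + \frac{2}{3}X_2\right] = \left(\frac{1}{3}\right)^2\mathrm{var}(X_1) + \left(\frac{2}{3}\right)^2\mathrm{var}(X_2) = \frac{1}{9}\sigma^2 + \frac{4}{9}\sigma^2 = \frac{5}{9}\sigma^2 > \frac{\sigma^2}{N}$$
for all N ≥ 2 (in our case N=500).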
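
The factor 5/9 is a special case of a more general pattern: for any weights w and 1 − w on two independent observations, the estimator w·X1 + (1 − w)·X2 is unbiased and its variance is (w² + (1 − w)²)·σ². The Python sketch below is an illustration rather than part of the original solution (`variance_factor` is a name chosen here); it evaluates that factor exactly on a small grid of weights.

```python
from fractions import Fraction


def variance_factor(w):
    """Multiple of sigma^2 in var(w*X1 + (1-w)*X2) for independent X1, X2."""
    return w**2 + (1 - w) ** 2


# Weights on X1 from 0 to 1 in steps of 1/6 (the grid includes 1/3 and 1/2).
grid = [Fraction(k, 6) for k in range(7)]
factors = {w: variance_factor(w) for w in grid}
best_w = min(factors, key=factors.get)

for w, f in factors.items():
    print(f"w = {w}: var = {f} * sigma^2")
print("smallest variance on the grid at w =", best_w)
```

Equal weights give σ²/2, the variance of the ordinary mean of two observations, which is below (5/9)σ²; weighting the two observations unequally therefore costs precision without reducing bias.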

(v) another estimator $\hat{\mu}_3$ using only the first and the last observation from your
sample and calculated as:
$$\hat{\mu}_3 = \frac{X_1 + N X_N}{2N + 2},$$
where N is the sample size.

• Show that this estimator is biased, i.e. show that $\mathbb{E}(\hat{\mu}_3) \neq \mu$.
• Find the bias of $\hat{\mu}_3$. Is the estimator biased up or down (i.e. does it
overestimate or underestimate $\mu$)?
• If it is biased, can it be efficient?
(3 points)

Solution:
This estimator is NOT unbiased because its expected value does NOT equal the true
population mean µ, i.e.:
$$\mathbb{E}(\hat{\mu}_3) = \mathbb{E}\left(\frac{X_1 + N X_N}{2N + 2}\right) = \frac{\mathbb{E}(X_1) + N\,\mathbb{E}(X_N)}{2(N + 1)} = \frac{\mu + N\mu}{2(N + 1)} = \frac{\mu(1 + N)}{2(N + 1)} = \frac{\mu}{2} < \mu$$
whenever µ > 0, as is the case for wages (so this estimator is biased down).
$$\mathrm{Bias}(\hat{\mu}_3) = \mathbb{E}(\hat{\mu}_3) - \mu = \frac{\mu}{2} - \mu = -\frac{\mu}{2}.$$
It is not unbiased and, therefore, it cannot be efficient.
For an estimator to be efficient, it needs to be unbiased. The smallest-variance
property is meaningless without unbiasedness.

Note: other examples of biased estimators of the population mean are: the sample
median when the population is not symmetric (we showed this with the online
applet), the sample minimum, the sample maximum.

(vi) Rank the estimators in parts (ii) to (v) in order of your preference starting from
the one you would prefer most to the one you would prefer least. Explain.
(3 points)
Hint: Both $\hat{\mu}_1$ and $\hat{\mu}_2$ are unbiased; in order to choose between these two unbiased
estimators, we need to compare their variances (we would prefer the one with a smaller
variance). Is $\mathrm{var}(\hat{\mu}_1)$ greater or smaller than $\mathrm{var}(\hat{\mu}_2)$?

Solution:
We would prefer the sample mean most as it is both unbiased and efficient. We
would prefer $\hat{\mu}_3$ least since it is not unbiased. Both $\hat{\mu}_1$ and $\hat{\mu}_2$ are unbiased, so to
choose between these two unbiased estimators, we need to compare their variances:
$\mathrm{var}(\hat{\mu}_1) = \frac{\sigma^2}{3} < \mathrm{var}(\hat{\mu}_2) = \frac{5}{9}\sigma^2$. We would prefer the one with a smaller variance (we
say it is more efficient).
Here’s the complete ranking of the estimators in order of our preference (from most
to least preferred): 1) the sample mean; 2) $\hat{\mu}_1$; 3) $\hat{\mu}_2$; 4) $\hat{\mu}_3$.

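All four estimators in parts (ii) to (v) are linear combinations of the observations, so with independent draws that share mean µ and variance σ² their expected value is µ times the sum of the weights and their variance is σ² times the sum of squared weights. The Python sketch below tabulates both quantities exactly for N = 500; it is an illustration only, and the names `estimators` and `summarize` are our own.

```python
from fractions import Fraction

N = 500  # sample size in Problem 1

# Each estimator is sum(w_i * X_i); list only its nonzero weights.
estimators = {
    "sample mean": [Fraction(1, N)] * N,
    "mu_hat_1": [Fraction(1, 3)] * 3,
    "mu_hat_2": [Fraction(1, 3), Fraction(2, 3)],
    "mu_hat_3": [Fraction(1, 2 * N + 2), Fraction(N, 2 * N + 2)],
}


def summarize(weights):
    """Return (E/mu, var/sigma^2) for sum(w_i * X_i) with i.i.d. X_i."""
    return sum(weights), sum(w * w for w in weights)


summary = {name: summarize(w) for name, w in estimators.items()}

for name, (mean_factor, var_factor) in summary.items():
    status = "unbiased" if mean_factor == 1 else "biased"
    print(f"{name}: E = {mean_factor} * mu ({status}), "
          f"var = {var_factor} * sigma^2 (about {float(var_factor):.4f})")
```

Under these assumptions the variance of the biased estimator comes out at roughly a quarter of σ², which is below σ²/3; this is a concrete reminder that a small variance is not a virtue when the estimator is centred on the wrong value.
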
(vii) Suppose now you take 5,000 samples of size 500 and calculate the sample mean
in each sample and plot it on a graph. Approximately what will be the sampling
distribution of the sample means? Be as precise as possible. Does your answer
depend on the distribution of the population? (1 point)

Solution:
We know from class that the sampling distribution of the sample mean will be
(approximately) Normal with mean equal to the population mean µ, and variance
equal to $\frac{\sigma^2}{N}$, regardless of the distribution of the population (e.g. the population may
be Normally distributed, or uniformly distributed, or follow any other distribution
with finite variance). In our case N=500, the sample size. In shorthand notation:
$$\bar{X} \sim N\left(\mu,\ \frac{\sigma^2}{500}\right) \text{ (approximately)}.$$

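The claim in part (vii) can be checked by simulation. The Python sketch below is a rough illustration under assumptions of our own: the population is taken to be exponential with mean 1 and variance 1 (strongly skewed, so clearly not Normal), and only 2,000 samples are drawn instead of 5,000 to keep the run short.

```python
import random
import statistics

rng = random.Random(251)  # fixed seed so the run is reproducible
N = 500                   # sample size, as in part (vii)
REPS = 2000               # number of samples (fewer than 5,000 to keep it fast)

# Assumed population: exponential with rate 1, so mu = 1 and sigma^2 = 1.
MU, SIGMA2 = 1.0, 1.0

# One sample mean per simulated sample of size N.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(N))
    for _ in range(REPS)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.variance(sample_means)
sd_theory = (SIGMA2 / N) ** 0.5
# Share of sample means within 1.96 theoretical standard deviations of mu;
# for a Normal sampling distribution this share is about 95%.
share_within = sum(abs(m - MU) <= 1.96 * sd_theory for m in sample_means) / REPS

print(f"mean of sample means:     {mean_of_means:.4f} (theory {MU})")
print(f"variance of sample means: {var_of_means:.5f} (theory {SIGMA2 / N:.5f})")
print(f"share within 1.96 sd:     {share_within:.3f} (Normal: about 0.95)")
```

Swapping `rng.expovariate(1.0)` for another population with finite variance (and updating `MU` and `SIGMA2` accordingly) should give the same qualitative picture.
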
Problem 2 (8 points)
Recall the die rolling example from class: you are interested in estimating the population
mean of a random variable X describing the outcomes of rolling a 6-sided die (i.e. X
follows a uniform distribution). We know that the true population mean of X is 3.5 but
suppose we didn’t know it and we would like to estimate it.
Consider now the following estimators of the population mean $\mu$:
$$\hat{\mu}_4 = \frac{9}{10}\bar{X}$$
$$\hat{\mu}_5 = \frac{N-1}{N}\bar{X}$$
where $\bar{X}$ is the sample mean and N is the sample size.

(i) Show that both $\hat{\mu}_4$ and $\hat{\mu}_5$ are biased and find their biases.

(ii) Now open Excel file HW2 Dice roll.xls. Tab N=100 shows the estimates
when using each estimator and a sample of size N=100. Press F9 to generate a
new sample and observe the value of the estimates – are the estimates always
close to the true population mean 3.5?
Note: If you have a Mac there is no F9 key. To generate a new sample do the
following: place the cursor in any cell, type in a number (any number, e.g. 5) and
press “Enter”. The worksheet will refresh and a new sample will be generated.

(iii) Do the same with Tab N=50,000, which shows the estimates when using
each estimator and a sample of size N=50,000. What happens to $\hat{\mu}_4$ as the sample
size gets large? What about $\hat{\mu}_5$? Which one of the estimators gets closer to the true
population mean 3.5?

(iv) What important property of estimators does this exercise illustrate?


(2 points each, 8 points in total)

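Solution:
(i) $$\mathbb{E}(\hat{\mu}_4) = \mathbb{E}\left(\frac{9}{10}\bar{X}\right) = \frac{9}{10}\mathbb{E}(\bar{X}) = \frac{9}{10}\mu < \mu$$
Also note that in our example the population mean $\mu = 3.5$, so the expected value of
the sampling distribution of our estimator will be $\frac{9}{10}\cdot 3.5 = 3.15$, a number lower
than the true population mean of 3.5.
$$\mathbb{E}(\hat{\mu}_5) = \mathbb{E}\left(\frac{N-1}{N}\bar{X}\right) = \frac{N-1}{N}\mathbb{E}(\bar{X}) = \frac{N-1}{N}\mu < \mu$$
Note that this estimator is also biased down but its expected value depends on the
sample size N.
$$\mathrm{Bias}(\hat{\mu}_4) = \mathbb{E}(\hat{\mu}_4) - \mu = \frac{9}{10}\mu - \mu = -\frac{1}{10}\mu$$
$$\mathrm{Bias}(\hat{\mu}_5) = \mathbb{E}(\hat{\mu}_5) - \mu = \frac{N-1}{N}\mu - \mu = -\frac{1}{N}\mu$$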

(ii) The estimates are not always close to the true population mean 3.5; in fact, they
are often very far off.

(iii) As the sample size gets large, $\hat{\mu}_5$ gets closer to the true population mean 3.5: in
most samples its value is almost exactly 3.5, while $\hat{\mu}_4$ does not (it settles near
3.15 instead).

(iv) This exercise illustrates consistency: $\hat{\mu}_5$ is a consistent estimator of the
population mean $\mu$: as the sample size gets large, $\hat{\mu}_5$ gets arbitrarily close to the true
population mean 3.5. Estimator $\hat{\mu}_4$ is not consistent: increasing the sample size does
not help, because its bias remains unchanged.
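
The pattern in (iii) and (iv) can also be checked without the spreadsheet by computing the expected value and variance of each estimator exactly for a fair six-sided die (µ = 3.5, σ² = 35/12). The Python sketch below is an illustration only; the function names `mu_hat_4` and `mu_hat_5` are our own.

```python
from fractions import Fraction

MU = Fraction(7, 2)        # population mean of a fair six-sided die (3.5)
SIGMA2 = Fraction(35, 12)  # population variance of a fair six-sided die


def mu_hat_4(n):
    """Expected value and variance of (9/10) * sample mean with n rolls."""
    c = Fraction(9, 10)
    return c * MU, c**2 * SIGMA2 / n


def mu_hat_5(n):
    """Expected value and variance of ((n-1)/n) * sample mean with n rolls."""
    c = Fraction(n - 1, n)
    return c * MU, c**2 * SIGMA2 / n


for n in (100, 50_000):
    e4, v4 = mu_hat_4(n)
    e5, v5 = mu_hat_5(n)
    print(f"N = {n}")
    print(f"  mu_hat_4: E = {float(e4):.5f}, bias = {float(e4 - MU):+.5f}, "
          f"sd = {float(v4) ** 0.5:.4f}")
    print(f"  mu_hat_5: E = {float(e5):.5f}, bias = {float(e5 - MU):+.5f}, "
          f"sd = {float(v5) ** 0.5:.4f}")
```

Both standard deviations shrink as N grows, but only the second estimator's bias shrinks with them; the first stays centred on 3.15 however large the sample.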

Just FYI:
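For those of you who are interested in a more formal treatment, here is the formal
proof that $\hat{\mu}_5$ is consistent:
$$\mathrm{Bias}(\hat{\mu}_5) = -\frac{1}{N}\mu \to 0 \text{ as } N \to \infty \quad \left(\text{Note: } \lim_{N\to\infty}\left(-\frac{1}{N}\mu\right) = 0\right).$$
$$\mathrm{var}(\hat{\mu}_5) = \mathrm{var}\left(\frac{N-1}{N}\bar{X}\right) = \left(\frac{N-1}{N}\right)^2\mathrm{var}(\bar{X}) = \left(\frac{N-1}{N}\right)^2\frac{\sigma^2}{N} \to 0 \text{ as } N \to \infty$$
$$\left(\text{Note: } \lim_{N\to\infty}\left[\left(\frac{N-1}{N}\right)^2\frac{\sigma^2}{N}\right] = \left[\lim_{N\to\infty}\frac{N-1}{N}\right]^2 \cdot \lim_{N\to\infty}\frac{\sigma^2}{N} = 1\cdot 0 = 0\right).$$
Because both the bias and the variance of $\hat{\mu}_5$ go to zero as $N \to \infty$, $\hat{\mu}_5$ is consistent.
$\hat{\mu}_4$ is not consistent, as $\mathrm{Bias}(\hat{\mu}_4) = -\frac{1}{10}\mu$ remains unchanged as $N \to \infty$.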
