
DATA ANALYSIS FOR ECONOMICS

PROBLEM SET 4: DUMMY VARIABLES

1 Using data on eight firms, a regression model was estimated to analyze the
relationship between investment in thousands of euros ($y_i$) and the production growth rate in %
($x_i$):

$$\hat{y}_i = \underset{(2.12)}{3.841} - \underset{(0.038)}{0.0812}\,x_i, \qquad R^2 = 0.466,\quad \mathrm{SSR} = 39.21,\quad n = 8$$

Additionally, two separate regressions are estimated. The first one uses only the
European firms in the original sample:

$$\hat{y}_i = -\underset{(0.782)}{0.372} + \underset{(0.012)}{0.108}\,x_i, \qquad R^2 = 0.976,\quad \mathrm{SSR} = 0.949,\quad n = 4$$

The second one uses only the American firms in the original sample:

$$\hat{y}_i = \underset{(1.43)}{1.259} + \underset{(0.032)}{0.171}\,x_i, \qquad R^2 = 0.933,\quad \mathrm{SSR} = 1.407,\quad n = 4$$

Determine whether distinguishing between European and American firms helps to better
explain the behaviour of investment, and interpret your results.
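Asking whether the split by region helps amounts to testing for a structural break, and a standard tool for this is the Chow test: it compares the SSR of the pooled regression with the sum of the SSRs of the separate group regressions. The sketch below assumes that "SSR" is the sum of squared residuals and that each regression has k = 2 parameters (intercept and slope); the helper name `chow_f` is hypothetical, and the resulting statistic still has to be compared with a critical value from an F table.

```python
def chow_f(ssr_pooled, ssr_groups, n, k):
    """Chow F-statistic for equality of all k coefficients across groups.

    ssr_pooled -- SSR of the single (restricted) regression on all observations
    ssr_groups -- SSRs of the separate (unrestricted) regressions, one per group
    n          -- total number of observations
    k          -- parameters per regression, intercept included
    """
    g = len(ssr_groups)
    ssr_unrestricted = sum(ssr_groups)
    df_num = (g - 1) * k  # number of equality restrictions imposed by pooling
    df_den = n - g * k    # residual degrees of freedom of the separate fits
    f_stat = ((ssr_pooled - ssr_unrestricted) / df_num) / (ssr_unrestricted / df_den)
    return f_stat, df_num, df_den


# Problem 1: pooled fit vs European and American fits (intercept + slope each)
f_stat, df_num, df_den = chow_f(39.21, [0.949, 1.407], n=8, k=2)
print(f"F = {f_stat:.2f}; compare with the F({df_num}, {df_den}) critical value")
```

A statistic well above the tabulated critical value would suggest that the European and American coefficients differ, so the separate regressions describe investment better than the pooled one.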

2 We have the following estimated regression model that explains the behaviour of
profits:

$$\widehat{\mathit{profit}}_i = 215 + 14\,\mathit{sector}_i - 22\,\mathit{home}_i - 50\,\mathit{south}_i + 45\,\mathit{urban}_i$$

where profit is monthly profit in thousands of dollars; sector is a sector dummy variable equal
to 1 if the sampled company belongs to the tertiary (services) sector; home is a nationality dummy
variable equal to 1 if the sampled company is a national company; south is a dummy variable
equal to 1 if the sampled company is located in the south of the country; and urban is
a dummy variable equal to 1 if the sampled company is located in an urban area.

a- Find the predicted average profit for a foreign manufacturing company located
in a rural area in the north of the country.
b- Taking two companies from our sample, find the estimated average difference in their
monthly profits if one of them is a national manufacturing company located in a
southern city of the country and the other is a foreign services company located in a
northern city of the country.
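With only dummy regressors, a fitted value is the intercept plus the coefficient of every dummy that is switched on. The sketch below encodes that rule, assuming that "manufacturing" means sector = 0, "services" means sector = 1, "foreign" means home = 0, "north" means south = 0 and "city" means urban = 1; the names `COEFS` and `predicted_profit` are hypothetical.

```python
# Coefficients of the estimated profit equation (thousand dollars per month)
COEFS = {"const": 215, "sector": 14, "home": -22, "south": -50, "urban": 45}


def predicted_profit(sector, home, south, urban):
    """Fitted monthly profit for a firm described by four 0/1 dummies."""
    return (COEFS["const"] + COEFS["sector"] * sector + COEFS["home"] * home
            + COEFS["south"] * south + COEFS["urban"] * urban)


# (a) foreign, manufacturing, rural firm in the north: every dummy is 0
profit_a = predicted_profit(sector=0, home=0, south=0, urban=0)

# (b) national manufacturing firm in a southern city
#     minus foreign services firm in a northern city
national_south = predicted_profit(sector=0, home=1, south=1, urban=1)
foreign_north = predicted_profit(sector=1, home=0, south=0, urban=1)
difference_b = national_south - foreign_north
print(profit_a, difference_b)
```

In part (b) the urban coefficient cancels because both firms are located in cities, so only the sector, home and south coefficients drive the difference.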


3 The Chinese Ministry of Education is analyzing recurrent expenditures in secondary
schools in the city of Shanghai. Using a sample of 74 secondary schools, the following
estimation results are obtained:

$$\hat{y}_i = \underset{(27{,}167.96)}{23{,}953.3} + \underset{(49.551)}{339.0432}\,x_i, \qquad R^2 = 0.394,\quad \mathrm{SSR} = 8.916,\quad n = 74$$

where the dependent variable is recurrent expenditure and the explanatory variable is the
number of students in each secondary school.

However, it is believed that the type of school affects the entire behaviour of recurrent
expenditures, so two separate regression models are estimated, distinguishing between
regular secondary schools (40 observations) and occupational secondary schools (34
observations):

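Occupational secondary schools ($n = 34$):

$$\hat{y}_i = \underset{(33{,}879.03)}{47{,}974.07} + \underset{(58.621)}{436.7769}\,x_i, \qquad R^2 = 0.634,\quad \mathrm{SSR} = 3.4895,\quad n = 34$$

Regular secondary schools ($n = 40$):

$$\hat{y}_i = \underset{(21{,}599.14)}{51{,}475.25} + \underset{(41.398)}{152.2982}\,x_i, \qquad R^2 = 0.263,\quad \mathrm{SSR} = 1.215,\quad n = 40$$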

Is there a significant difference in the behaviour of recurrent expenditures between the two
types of schools? Test at the 1% significance level and interpret your result.
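This question is again a Chow test for a break in both the intercept and the slope. A minimal two-group sketch follows, assuming the three SSR figures are reported on the same scale (any common scale factor cancels in the ratio) and k = 2 parameters per equation; the function name is hypothetical, and the 1% critical value of F(2, 70) has to be taken from a table.

```python
def chow_f(ssr_pooled, ssr_a, ssr_b, n, k=2):
    """Two-group Chow F-statistic; returns (F, numerator df, denominator df)."""
    ssr_u = ssr_a + ssr_b              # unrestricted SSR from the separate fits
    df_num, df_den = k, n - 2 * k      # k restrictions; n - 2k residual df
    f_stat = ((ssr_pooled - ssr_u) / df_num) / (ssr_u / df_den)
    return f_stat, df_num, df_den


# Pooled fit (n = 74) vs occupational (n = 34) and regular (n = 40) schools
f_stat, df_num, df_den = chow_f(8.916, 3.4895, 1.215, n=74)
print(f"F = {f_stat:.2f} ~ F({df_num}, {df_den}) under H0: no structural break")
```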

4 Male babies tend to weigh more than female babies. Defining a dummy variable
$M = 1$ for male babies and $M = 0$ for female babies, the estimated regression explaining a baby's
weight in grams ($Y$) as a function of the number of cigarettes smoked per day by the mother
($\mathit{cigs}$) and the dummy variable $M$ is the following (sample size $n = 964$):

$$\hat{y}_i = \underset{(20)}{3{,}354} + \underset{(26)}{119}\,M_i - \underset{(2.1)}{7}\,\mathit{cigs}_i, \qquad R^2 = 0.033$$

Interpret this estimated regression model.
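Interpreting the model means reading each coefficient as a ceteris paribus shift in predicted weight and checking whether it differs significantly from zero. The sketch below assumes the figures in parentheses are standard errors, so coefficient / SE is the usual t-ratio for H0: coefficient = 0; the names `coef`, `se` and `predicted_weight` are hypothetical.

```python
coef = {"const": 3354, "M": 119, "cigs": -7}
se = {"const": 20, "M": 26, "cigs": 2.1}

# t-ratio for H0: coefficient = 0 (assumes the parenthesized values are SEs)
t_ratios = {name: coef[name] / se[name] for name in coef}


def predicted_weight(male, cigs):
    """Fitted birth weight in grams for a given sex dummy and daily cigarettes."""
    return coef["const"] + coef["M"] * male + coef["cigs"] * cigs


# Male-female gap at any fixed cigs level equals the M coefficient
gap = predicted_weight(1, 0) - predicted_weight(0, 0)
# Girl whose mother smokes 10 cigarettes a day (illustrative value)
girl_10_cigs = predicted_weight(0, 10)
print({name: round(t, 2) for name, t in t_ratios.items()}, gap, girl_10_cigs)
```

Because the model is additive, the 119 g gap between boys and girls is the same at every smoking level, and each extra daily cigarette lowers predicted weight by 7 g for both sexes.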

5 Using the data from the previous exercise, a new regression model is estimated
(strategy 1):

$$\hat{y}_i = \underset{(143)}{3{,}418} - \underset{(2.1)}{7.2}\,\mathit{cigs}_i, \qquad R^2 = 0.012,\quad \mathrm{SSR} = 158.6,\quad n = 964$$


Strategy 2 consists of estimating two separate regressions. The first one only includes
babies who are first-born (their mothers have had no previous births):

$$\hat{y}_i = \underset{(18)}{3{,}363} - \underset{(2.8)}{4.0}\,\mathit{cigs}_i, \qquad R^2 = 0.004,\quad \mathrm{SSR} = 91.2,\quad n = 584$$

The second one only includes babies who are not first-born (their mothers
have had previous births):

$$\hat{y}_i = \underset{(23)}{3{,}506} - \underset{(3.1)}{12.1}\,\mathit{cigs}_i, \qquad R^2 = 0.039,\quad \mathrm{SSR} = 63.5,\quad n = 380$$

Determine which strategy better explains the behaviour of the dependent variable (is there a
structural break?) and interpret your results at the 1% significance level.
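Choosing between the strategies is a Chow test: strategy 1 is the restricted model and strategy 2 the unrestricted one. The sketch below computes the statistic directly from the reported SSRs, assuming "SSR" is the sum of squared residuals and each equation has k = 2 parameters; the variable names are illustrative, and the 1% critical value of F(2, 960) must come from a table.

```python
ssr_restricted = 158.6   # strategy 1: one pooled regression, n = 964
ssr_first_born = 91.2    # strategy 2: first-born babies, n = 584
ssr_later_born = 63.5    # strategy 2: babies with older siblings, n = 380
n, k = 964, 2            # total observations; parameters per equation

ssr_unrestricted = ssr_first_born + ssr_later_born
df_num, df_den = k, n - 2 * k  # equality restrictions; residual df of strategy 2
f_stat = ((ssr_restricted - ssr_unrestricted) / df_num) / (ssr_unrestricted / df_den)
print(f"Chow F = {f_stat:.2f} ~ F({df_num}, {df_den}) under H0: no structural break")
```

Rejecting H0 favours strategy 2, that is, letting both the intercept and the effect of cigs differ by birth order.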

6 We have the following estimated regression model that explains the behaviour of
salaries:

$$\widehat{\mathit{wage}}_i = 300 + 25\,\mathit{edu}_i + 37\,\mathit{exp}_i + 14\,\mathit{male}_i - 22\,\mathit{black}_i - 50\,\mathit{south}_i + 45\,\mathit{urban}_i$$

where wage is the weekly salary in dollars, edu is years of education, exp is years of
professional experience, male is a gender dummy variable equal to 1 if the sampled
individual is male, black is a race dummy variable equal to 1 if the sampled individual
is black, south is a dummy variable equal to 1 if the sampled individual lives in the
south of the country, and urban is a dummy variable equal to 1 if the sampled individual
lives in an urban area.

a- What is the predicted average salary for a black female who lives in a rural area
in the north of the country when edu and exp are both equal to zero?
b- Taking two males from our sample with the same years of education and the same
years of professional experience, what is the estimated average difference in their
weekly salaries if one of them is black and lives in a southern city of the
country and the other is white and lives in a northern city of the country?
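The same switch-on/switch-off logic applies here, with edu and exp entering linearly. The sketch below assumes "rural" means urban = 0, "north" means south = 0, "city" means urban = 1 and "white" means black = 0; the names `COEF` and `predicted_wage` are hypothetical, and the values 12 and 5 used for edu and exp in part (b) are arbitrary because they cancel in the difference.

```python
COEF = {"const": 300, "edu": 25, "exp": 37, "male": 14,
        "black": -22, "south": -50, "urban": 45}


def predicted_wage(edu, exp, male, black, south, urban):
    """Fitted weekly wage in dollars from the estimated equation."""
    regressors = {"const": 1, "edu": edu, "exp": exp, "male": male,
                  "black": black, "south": south, "urban": urban}
    return sum(COEF[name] * value for name, value in regressors.items())


# (a) black female in a rural area in the north, with edu = exp = 0
wage_a = predicted_wage(edu=0, exp=0, male=0, black=1, south=0, urban=0)

# (b) two males with identical edu and exp (12 and 5 are arbitrary)
black_south_city = predicted_wage(12, 5, male=1, black=1, south=1, urban=1)
white_north_city = predicted_wage(12, 5, male=1, black=0, south=0, urban=1)
diff_b = black_south_city - white_north_city
print(wage_a, diff_b)
```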

7 We have a housing price model with the following variables: price (house price), sqrft
(house size), bdrms (number of bedrooms) and colonial (a dummy variable equal to one if the
house is of colonial style). The estimation results are the following (sample size: 88
houses):

$$\widehat{\log(\mathit{price}_i)} = \underset{(0.65)}{5.56} + \underset{(0.093)}{0.707}\,\log(\mathit{sqrft}_i) + \underset{(0.029)}{0.027}\,\mathit{bdrms}_i + \underset{(0.045)}{0.054}\,\mathit{colonial}_i$$


a- Interpret this estimated regression model.
b- Is the effect of the variable bdrms statistically significant?
c- Is there significant evidence that colonial houses are more expensive than
other houses, holding the other factors constant?
d- Is the above model globally significant, given that $R^2 = 0.649$?
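Parts b to d rest on three quantities: individual t-ratios, the overall F-statistic computed from $R^2$, and the percentage effect of a dummy in a log model. The sketch below computes them, assuming the parenthesized figures are standard errors and that the model has k = 3 slope coefficients; critical values for t(84) and F(3, 84) still come from tables, and all variable names are illustrative.

```python
import math

n, k = 88, 3                 # observations; slope coefficients
df = n - k - 1               # residual degrees of freedom
coefs = {"log(sqrft)": 0.707, "bdrms": 0.027, "colonial": 0.054}
ses = {"log(sqrft)": 0.093, "bdrms": 0.029, "colonial": 0.045}

# t-ratio for H0: beta_j = 0 (assumes the parenthesized values are SEs)
t_stats = {name: coefs[name] / ses[name] for name in coefs}

# Overall significance, H0: all slopes are zero, using R-squared
r2 = 0.649
f_stat = (r2 / k) / ((1 - r2) / df)

# Exact ceteris paribus % price gap implied by the colonial dummy in a log model
colonial_pct = 100 * (math.exp(coefs["colonial"]) - 1)

print({name: round(t, 2) for name, t in t_stats.items()},
      round(f_stat, 2), round(colonial_pct, 2))
```

In this log-log/log-level setup, 0.707 reads as an elasticity of price with respect to size, while the colonial coefficient implies roughly a 5.4% (exactly about 5.5%) premium; part c is naturally a one-sided test on that coefficient.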

8 The following wage equations have been estimated using data on workers from
Vietnam:

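$$\widehat{\log(\mathit{salary})} = \underset{(0.35)}{1.25} + \underset{(0.03)}{0.15}\,\mathit{gender} + \underset{(0.004)}{0.02}\,\mathit{exp}$$

$$\widehat{\log(\mathit{salary})} = \underset{(0.48)}{1.55} + \underset{(0.05)}{0.10}\,\mathit{gender} + \underset{(0.005)}{0.015}\,\mathit{exp} - \underset{(0.002)}{0.0005}\,\mathit{gender}\cdot\mathit{exp}$$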

where salary is measured in US dollars, gender is a dummy variable equal to 1
if the worker is male and 0 if the worker is female, and exp measures years of work
experience.
a- According to the first model, what is the estimated average difference between the salary
of a man with 5 years of work experience and that of a woman with 10 years of work experience?
b- According to the second model, what is the estimated average difference between the salary
of a man with 5 years of work experience and that of a woman with 10 years of work experience?
c- Test the hypothesis that the salary difference between men and women does not depend on
experience.
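Because the dependent variable is in logs, the difference between two fitted values is a log-point gap, roughly a proportional salary difference. The sketch below evaluates both equations at the two profiles and forms the t-ratio for the interaction term, assuming (0.002) is its standard error; the function names are hypothetical, and `exp` follows the document's variable name (the exponential is called as `math.exp`).

```python
import math


def log_salary_m1(gender, exp):
    """First model: gender shifts the intercept only."""
    return 1.25 + 0.15 * gender + 0.02 * exp


def log_salary_m2(gender, exp):
    """Second model: gender also changes the return to experience."""
    return 1.55 + 0.10 * gender + 0.015 * exp - 0.0005 * gender * exp


# Man with 5 years of experience minus woman with 10 years, in log points
diff_m1 = log_salary_m1(1, 5) - log_salary_m1(0, 10)
diff_m2 = log_salary_m2(1, 5) - log_salary_m2(0, 10)

# Exact percentage gaps implied by the log differences
pct_m1 = 100 * (math.exp(diff_m1) - 1)
pct_m2 = 100 * (math.exp(diff_m2) - 1)

# (c) H0: the gender gap does not depend on exp, i.e. interaction coefficient = 0
t_interaction = -0.0005 / 0.002
print(round(diff_m1, 4), round(diff_m2, 4), round(pct_m1, 2), round(pct_m2, 2),
      t_interaction)
```

In the second model the gender gap is 0.10 - 0.0005 exp, so the test in part c only concerns the interaction coefficient.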

9 To examine whether people living in urban areas spend more on fish than people living
in rural areas, we obtain the following estimation results:

OLS estimation results
Dependent variable: log(expenditure on fish)

Explanatory variable | OLS coefficient | t-statistic | Degrees of freedom | Significance level | t-critical
Intercept            | 6.375           |             | 36                 | 0.01               | -
log(income)          | 1.313           | 5.328       | 36                 | 0.01               | 2.719
gender               | -0.055          | -1.378      | 36                 | 0.01               | -2.719
urban                | 0.143           | 10.311      | 36                 | 0.01               | 2.719

Sample size (n): 40      R-squared: 0.750

F-statistic | Degrees of freedom                | Significance level | F-critical
36          | 3 (numerator), 36 (denominator)  | 0.01               | 4.40


where the dependent variable is expenditure on fish (in logs), income is disposable income
(in logs), gender is a gender dummy equal to 1 if male and 0 if female, and urban is another
dummy equal to 1 if the person lives in an urban area. Please answer the
following three questions:

a- Interpret the above estimation results (only the value of the OLS coefficient for each
explanatory variable).
b- Is the variable gender individually significant in explaining fish expenditure
at the 1% significance level? Explain.
c- Is the model globally significant at the 1% significance level? Explain.
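The table already reports the test statistics, and they can be cross-checked: the overall F-statistic follows from $R^2$, n and the number of slopes, and each |t| is compared with the two-sided 1% critical value of 2.719 shown in the table. A minimal sketch, with illustrative variable names, also converts the urban dummy coefficient in this log model into an exact percentage difference.

```python
import math

n, k = 40, 3                 # observations; slope coefficients
r2 = 0.750
df = n - k - 1               # residual degrees of freedom, as in the table

# Overall F-statistic recomputed from R-squared (table reports 36)
f_stat = (r2 / k) / ((1 - r2) / df)

# Two-sided 1% individual tests with the tabulated critical value
t_crit = 2.719
t_stats = {"log(income)": 5.328, "gender": -1.378, "urban": 10.311}
significant = {name: abs(t) > t_crit for name, t in t_stats.items()}

# Exact % difference in fish spending, urban vs rural, other things equal
urban_pct = 100 * (math.exp(0.143) - 1)
print(round(f_stat, 2), significant, round(urban_pct, 2))
```

The recomputed F matches the reported value, which can then be compared with the F-critical value of 4.40; the coefficient on log(income) is an elasticity, while the gender and urban coefficients are approximate proportional differences.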
