
Tamzidul Tom Islam

Mariah Rippe
Erna Rismawaty
ECON 601 Module 5 Problem Set
Fall 2019
Your solutions should be typed and well organized. You need to explain/show all of the steps
you used to arrive at your answer. Submit your work through Blackboard as a Word or pdf file.
1. Indicate whether the following statements are true or false, along with a brief
explanation.
a. OLS can be used to study curvilinear relationships between X and Y.
True. OLS can estimate curvilinear relationships between X and Y as long as the model
stays linear in the parameters; the variables must first be transformed (for example
with logs, polynomial terms, or reciprocals) to capture the curvature.

b. A polynomial of order 3 can also be referred to as a quadratic specification.
False. A polynomial of order 3 is referred to as a cubic specification. A
quadratic specification means the equation uses 2 as the highest power of the
explanatory variable.

c. Demeaning an explanatory variable that is specified as a polynomial can help reduce
the problem of heteroscedasticity.
False. Demeaning, or centering, an explanatory variable helps reduce
multicollinearity among the terms of a polynomial regression model to some extent;
it does not address heteroscedasticity.
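
One way to see why centering helps, offered as a brief sketch rather than the textbook's own derivation: write the centered regressor as \(x_c = X - \bar{X}\) (notation introduced here for illustration, with \(\bar{X}\) treated as the mean of X). The covariance between \(x_c\) and its square reduces to the third central moment of X, which is zero when X is symmetrically distributed. The raw X and X^2, by contrast, are typically strongly correlated when X takes only positive values.

```latex
\[
\operatorname{Cov}(x_c, x_c^2)
  = E[x_c^3] - E[x_c]\,E[x_c^2]
  = E\big[(X - \bar{X})^3\big],
\qquad \text{since } E[x_c] = 0 .
\]
```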

d. The reciprocal of X is possibly a suitable transformation for a regression model if the
variable X contains negative values and positive values.
False. The X values have to be either all positive or all negative; X cannot
take both signs for the reciprocal transformation, because 1/X is undefined at
X = 0 and changes sign on either side of it.

e. A “linear-log model” refers to a model where the natural log of Y is regressed on X.
False. When Y is regressed on the natural log of X, it is referred to as a linear-log
model. If the dependent variable is logged and the independent variable is left
at its original scale, it is called a log-linear model. The natural log is defined
only for values greater than 0, so the logged variable must be strictly positive.

f. All other factors the same, a lower value for the standard error of the regression is
preferred to a higher value.
True. A lower standard error of the regression means a lower standard deviation of the
residuals, so the observations lie closer to the fitted line and the model is more
accurate than a regression equation with a higher standard error. With a
higher standard error, the residuals are more dispersed around the mean
residual.
g. The estimated slope coefficient in a log-log model can be interpreted as an elasticity.
True. The coefficient b1 measures the percent change in Y associated with a 1% change in X.
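
As a short supporting derivation (a sketch using b0 and b1 as generic intercept and slope symbols, not estimates from this problem set), the elasticity interpretation follows from differentiating the log-log equation:

```latex
\[
\ln Y = b_0 + b_1 \ln X
\quad\Longrightarrow\quad
b_1 = \frac{d \ln Y}{d \ln X} = \frac{dY / Y}{dX / X}
\approx \frac{\%\Delta Y}{\%\Delta X}.
\]
```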

2. Open the file “2018 Movie Gross.dta” on Blackboard. This contains data on movies
released in 2018 that I obtained from www.boxofficemojo.com. The two variables in this
file are:
Y = TG (total domestic gross revenue, $ mil.)
X = OG (opening weekend gross revenue, $ mil.)
Opening weekend gross revenue for a movie tends to be related to how much total gross
revenue that movie generates. Thus, the movie industry closely watches the opening
weekend gross revenues to help make decisions about advertising, distribution, etc.

a. Create a scatterplot showing the relationship between TG and OG. Paste this graph
into your solutions. Briefly describe what you observe in this scatterplot.

The scatterplot shows that the points become more spread out moving from left to
right along the x axis, and likewise moving up the y axis. Most of the data is
clustered at the left of the x axis and the bottom of the y axis. This pattern
suggests taking the natural logarithm of both X and Y, which compresses the large
values and pulls the data closer together.
b. Estimate each of the following models and fill in the table values:
Model | Coefficient of determination (R2) | Standard Error of the Regression (RMSE) | Is OG statistically significant at the 1% level? (Yes or No)
Model 1 (TG vs OG) | 0.9356 | 24.799 | Yes
Model 2 (TG vs log of OG) | 0.3895 | 76.373 | Yes
Model 3 (log of TG vs OG) | 0.3572 | 1.6605 | Yes
Model 4 (log of TG vs log of OG) | 0.7541 | 1.027 | Yes
c. Provide an interpretation for the marginal effect of OG for each model. Your answers
must specify the correct unit of measurement for the predicted change in TG (i.e.,
either % or millions of dollars):

i. Model 1: if OG increases by $1 million, then TG is predicted to increase by: 3.003 million dollars
ii. Model 2: if OG increases by 1%, then TG is predicted to increase by: 0.2970891 million dollars
iii. Model 3: if OG increases by $1 million, then TG is predicted to increase by: 3.93%
iv. Model 4: if OG increases by 1%, then TG is predicted to increase by: 0.88%
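
The four interpretations above follow the usual rules for level and log specifications. The sketch below is only an illustrative check, written in Python rather than the Stata used for the assignment. The slope values are the ones that appear in the prediction equations of part d, and the Model 3 slope of 0.0393164 is the value implied by the reported 3.93% effect. All variable names are hypothetical.

```python
# Slope estimates taken from the worked equations in this problem set.
b1_linear = 3.00334      # Model 1: TG on OG
b1_linlog = 29.70891     # Model 2: TG on ln(OG)
b1_loglin = 0.0393164    # Model 3: ln(TG) on OG (implied by the 3.93% answer)
b1_loglog = 0.8758519    # Model 4: ln(TG) on ln(OG)

effect_m1 = b1_linear        # $ million change in TG per $1 million of OG
effect_m2 = b1_linlog / 100  # $ million change in TG per 1% change in OG
effect_m3 = 100 * b1_loglin  # % change in TG per $1 million of OG
effect_m4 = b1_loglog        # % change in TG per 1% change in OG (elasticity)

print(f"Model 1: {effect_m1:.3f} million dollars")
print(f"Model 2: {effect_m2:.7f} million dollars")
print(f"Model 3: {effect_m3:.2f} %")
print(f"Model 4: {effect_m4:.2f} %")
```

The Model 3 and Model 4 percentages use the standard approximation that a small change in a natural log is close to a proportional change.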
d. For each model, predict the total gross revenue for a movie that has an opening
gross of $10 million:
i. Model 1: the predicted TG is 35.08 million
\[ TG = 5.048291 + 3.00334(10) \]
\[ TG = 35.081691 \]
ii. Model 2: the predicted TG is 84.87 million
\[ TG = 16.46498 + 29.70891\ln(10) \]
\[ TG = 84.8722733 \]
iii. Model 3: the predicted TG is 9.32 million
\[ \ln(TG) = 1.838508 + 0.0393164(10) \]
\[ \ln(TG) = 2.231672 \]
\[ TG = e^{2.231672} = 9.3154285 \]
iv. Model 4: the predicted TG is 30.70 million
\[ \ln(TG) = 1.40753 + 0.8758519\ln(10) \]
\[ \ln(TG) = 3.424253529 \]
\[ TG = e^{3.424253529} = 30.69971985 \]
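
The four predictions can be reproduced with a few lines of Python. This is an illustrative sketch, not the Stata workflow used for the assignment. The Model 3 slope is entered as 0.0393164, since 0.0393164 × 10 = 0.393164 is the increment that gives ln(TG) = 2.231672. Variable names are hypothetical.

```python
import math

og = 10.0  # opening weekend gross, $ million

# Model 1: TG on OG
pred_m1 = 5.048291 + 3.00334 * og
# Model 2: TG on ln(OG)
pred_m2 = 16.46498 + 29.70891 * math.log(og)
# Model 3: ln(TG) on OG, exponentiated back to $ million
pred_m3 = math.exp(1.838508 + 0.0393164 * og)
# Model 4: ln(TG) on ln(OG), exponentiated back to $ million
pred_m4 = math.exp(1.40753 + 0.8758519 * math.log(og))

for name, value in [("Model 1", pred_m1), ("Model 2", pred_m2),
                    ("Model 3", pred_m3), ("Model 4", pred_m4)]:
    print(f"{name}: predicted TG = {value:.2f} million")
```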

e. Which is your best model? Justify your choice.

Model 1: High R-squared, but a very high standard error. Although the regression fits a straight
line, the data become more dispersed moving from left to right along the x axis and moving up the
y axis.

Model 2: R-squared declines substantially, and the standard error is even higher than the Model 1
standard error.

Model 3: Since the dependent variable is logged, its R-squared and standard error cannot be
compared with those of Models 1 and 2. Judging from the regression line, it does not seem like
a good fit.

Model 4: Model 4 is better than Model 3, with a higher R-squared and a lower standard error.
Also, the residuals are scattered more evenly around 0 for Model 4 than for any other model. Thus
Model 4 is the best model for prediction.
3. Suppose you have been hired by a human resources department to build a model of
employee wages. The model specifies wages as a function of an employee’s age (Age)
and his/her years of education (Edu). The estimated model is:
\[ Wage_i = 11.4 + 0.25\,Age_i - 0.003\,Age_i^2 + 2.3\,Edu_i \]
a. Holding Edu fixed at a value of 10, calculate the marginal effect of Age on Wage for a
person whose age changes from 23 to 24. Repeat this calculation for a person whose
age changes from 54 to 55. Round all values to the nearest penny.

Let
Wage = w,
Age = a,
Edu = e.

\[ w = 11.4 + 0.25a - 0.003a^2 + 2.3(10) \]

Holding e = 10 constant:
\[ w = 11.4 + 0.25a - 0.003a^2 + 23 \]

Age, a = 23:
\[ w = 11.4 + 0.25(23) - 0.003(23)^2 + 23 \]
Wage, w = 38.563

Age, a = 24:
\[ w = 11.4 + 0.25(24) - 0.003(24)^2 + 23 \]
Wage, w = 38.672

Marginal effect = 38.672 – 38.563 = 0.109
The predicted wage increases by 0.109 (0.11 rounded to the nearest penny) when age increases from 23 to 24.

Age, a = 54 (e is constant at 10):
\[ w = 11.4 + 0.25(54) - 0.003(54)^2 + 23 \]
Wage, w = 39.152

Age, a = 55:
\[ w = 11.4 + 0.25(55) - 0.003(55)^2 + 23 \]
Wage, w = 39.075

Marginal effect = 39.075 – 39.152 = -0.077
The predicted wage decreases by 0.077 (0.08 rounded to the nearest penny) when age increases from 54 to 55.
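
The two marginal effects can be checked numerically. The following Python sketch evaluates the estimated wage equation at consecutive ages with Edu fixed at 10. The function names are hypothetical and chosen only for illustration.

```python
def predicted_wage(age, edu):
    """Estimated model: Wage = 11.4 + 0.25*Age - 0.003*Age^2 + 2.3*Edu."""
    return 11.4 + 0.25 * age - 0.003 * age**2 + 2.3 * edu


def marginal_effect(age_from, age_to, edu=10):
    """Change in predicted wage when age moves from age_from to age_to."""
    return predicted_wage(age_to, edu) - predicted_wage(age_from, edu)


me_young = marginal_effect(23, 24)
me_older = marginal_effect(54, 55)
print(f"Age 23 to 24: {me_young:.3f}")
print(f"Age 54 to 55: {me_older:.3f}")
```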
b. At what Age is an employee’s Wage predicted to be greatest? Round the value to the
nearest whole number. (Note: the equation you should use is shown in slide #10 of
the chapter 5 lecture PowerPoint.)

\[ w = 11.4 + 0.25a - 0.003a^2 + 2.3e \]

The coefficient on the second-order term is negative. That means the curve opens
downward and has a maximum point.
Rearranging the above equation into the form \( ax^2 + bx + c = y \):
\[ w = -0.003a^2 + 0.25a + (2.3e + 11.4) \]

For \( ax^2 + bx + c = y \), the maximum occurs at
\[ x = -\frac{b}{2a} \]
Here the quadratic coefficient is −0.003 and the linear coefficient is 0.25, so the wage-maximizing age is
\[ a = \frac{-0.25}{2(-0.003)} = 41.67 \]
The predicted wage is highest at age 41.67, or 42 when rounded to the nearest whole number.
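
The same turning point can be obtained from the first-order condition, shown here as a brief supporting derivation. Edu drops out because it enters the model additively, and \( a^{*} \) is notation introduced here for the wage-maximizing age.

```latex
\[
\frac{dw}{da} = 0.25 - 2(0.003)\,a = 0.25 - 0.006\,a = 0
\quad\Longrightarrow\quad
a^{*} = \frac{0.25}{0.006} \approx 41.67,
\qquad
\frac{d^{2}w}{da^{2}} = -0.006 < 0 .
\]
```

The negative second derivative confirms that the turning point is a maximum rather than a minimum.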

c. What type of relationship is there between employee Age and Wage according to
the estimated regression model? Your answer should be either (i) increasing at an
increasing rate; (ii) increasing at a decreasing rate; (iii) decreasing at an increasing
rate; or (iv) decreasing at a decreasing rate. Briefly explain how you arrived at your
answer.

It is increasing at a decreasing rate.

We can see whether the equation shows diminishing returns by looking at the marginal
effect from age 39 to 40, when education e = 10 (constant).

Age, a = 39
Wage, w = 39.587

Age, a = 40
Wage, w = 39.6

Marginal effect = 39.6 – 39.587 = 0.013

Marginal effect from 23 to 24 = 0.109 > marginal effect from 39 to 40 = 0.013 (both before the
predicted wage reaches its maximum at age 41.67).

It shows diminishing returns.

We can also conclude this from the equation itself: the coefficient on the second-order term is
negative, so as age increases, the predicted wage increases at a decreasing rate until it reaches
its maximum.
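
To check the pattern across the whole range rather than at two points, the sketch below computes every one-year change in predicted wage from age 23 up to the last year before the peak. It is an illustrative check with hypothetical names, holding Edu at 10.

```python
def wage(age, edu=10):
    """Predicted wage from the estimated model, with Edu held at 10 by default."""
    return 11.4 + 0.25 * age - 0.003 * age**2 + 2.3 * edu


# One-year changes in predicted wage for starting ages 23 through 41.
changes = [wage(a + 1) - wage(a) for a in range(23, 42)]

# "Increasing": every change is positive.
increasing = all(c > 0 for c in changes)
# "At a decreasing rate": each change is smaller than the one before it.
slowing = all(later < earlier for earlier, later in zip(changes, changes[1:]))

print("increasing:", increasing)
print("at a decreasing rate:", slowing)
print("change from 42 to 43:", round(wage(43) - wage(42), 3))
```

Past the peak the one-year change turns negative, so the "increasing at a decreasing rate" description applies to ages below roughly 41.67.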

4. Open the DERBY5 dataset that comes with the textbook. This data pertains to the
famous horse races at the Kentucky Derby in Louisville, Kentucky each year. The dataset
contains the amount of money bet each year (in millions of dollars) on horse races at the
Kentucky Derby in Louisville, Kentucky from 1927 to 1992. Once you open this dta file,
format the time series data via the following command: tsset date, yearly

a. Build an extrapolative model for the amount bet using linear or nonlinear trends.
Justify your choice of model. In addition, provide the following: (i) the regression
output from Stata of your chosen model, and (ii) a time series line graph showing the
actual amount bet and the predicted amount.

Commands:
summarize bets date
generate centeryear = (date-1959.5)
generate centeryearsqr = (date-1959.5)^2
generate centeryearcube = (date-1959.5)^3
Since we don’t know how to compare the log-linear model with the quadratic model, we did not
use the log-linear model, even though its regression line seems to fit the data very well.
We also chose the cubic model over the log-linear model because the log-linear
model shows some heteroskedasticity.
We chose the cubic model over the simple linear regression and the quadratic model. A comparison
of the simple, quadratic, and cubic models is below:

Equation:

\[ bets = 1.623857 + 0.0826248\,centeryear + 0.0022012\,centeryear^2 + 0.0000243\,centeryear^3 \]

Statistic | Simple Linear Regression | Quadratic Model (centered) | Cubic Model (centered)
R-Squared | 0.8520 | 0.9755 | 0.9797
Adjusted R-Squared | 0.8497 | 0.9747 | 0.9787
RMSE | 0.79406 | 0.32563 | 0.29875

From the comparison above, we believe that the cubic model is superior to the quadratic model,
since R-squared and adjusted R-squared increased and RMSE decreased.
We also centered the x variable. The variance inflation factors and the correlations among the
polynomial terms were very high when we did not center the data.

After centering the data, the collinearity is reduced.
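
The effect of centering on the polynomial terms can be illustrated without the betting data, because it depends only on the year values. The Python sketch below uses the years 1927 to 1992 and the centering constant 1959.5 from the commands above. The `pearson` helper is a hypothetical stand-in written for this example, not a Stata command, and it reports pairwise correlations rather than variance inflation factors.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


years = list(range(1927, 1993))          # 1927 through 1992 inclusive
centered = [y - 1959.5 for y in years]   # same centering as centeryear

raw_lin_sq = pearson(years, [y**2 for y in years])
cen_lin_sq = pearson(centered, [c**2 for c in centered])
cen_lin_cube = pearson(centered, [c**3 for c in centered])

print(f"raw year vs year^2:           {raw_lin_sq:.6f}")
print(f"centered year vs its square:  {cen_lin_sq:.6f}")
print(f"centered year vs its cube:    {cen_lin_cube:.6f}")
```

For these evenly spaced years, centering removes essentially all of the correlation between the linear and squared terms, while the linear and cubic terms remain strongly correlated. Collinearity is therefore reduced rather than eliminated.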


b. Use your model to forecast the amount bet in 1993 and 1994.

Using the cubic model:

\[ bets = 1.623857 + 0.0826248\,centeryear + 0.0022012\,centeryear^2 + 0.0000243\,centeryear^3 \]
\[ centeryear = year - mean\_year \]
For year = 1993, with mean_year = 1959.5:
\[ centeryear = 1993 - 1959.5 = 33.5 \]
\[ bets = 1.623857 + 0.0826248(33.5) + 0.0022012(33.5)^2 + 0.0000243(33.5)^3 \]
bets = 7.78 (rounded)
For year = 1994:
\[ centeryear = 1994 - 1959.5 = 34.5 \]
\[ bets = 1.623857 + 0.0826248(34.5) + 0.0022012(34.5)^2 + 0.0000243(34.5)^3 \]
bets = 8.09 (rounded)
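
Both forecasts can be reproduced by plugging the centered year into the fitted cubic. The Python sketch below is an illustrative check of that arithmetic. The function name `forecast_bets` is hypothetical; the coefficients and the centering constant 1959.5 come from the equation above.

```python
def forecast_bets(year, mean_year=1959.5):
    """Fitted cubic trend evaluated at a given year (bets in $ millions)."""
    c = year - mean_year  # centered year, matching centeryear
    return 1.623857 + 0.0826248 * c + 0.0022012 * c**2 + 0.0000243 * c**3


for year in (1993, 1994):
    print(f"{year}: forecast bets = {forecast_bets(year):.2f}")
```

Both years lie outside the 1927 to 1992 estimation sample, so these values are extrapolations of the fitted trend.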
