Mariah Rippe
Erna Rismawaty
ECON 601 Module 5 Problem Set
Fall 2019
Your solutions should be typed and well organized. You need to explain/show all of the steps
you used to arrive at your answer. Submit your work through Blackboard as a Word or pdf file.
1. Indicate whether the following statements are true or false, along with a brief
explanation.
a. OLS can be used to study curvilinear relationships between X and Y.
True. OLS can be used to estimate curvilinear relationships between X and Y;
however, the data need to be transformed first to capture this relationship.
f. All other factors the same, a lower value for the standard error of the regression is
preferred to a higher value.
True. A lower standard error of the regression means a lower standard deviation of the
residuals. A smaller standard error means more accurate predictions (the residuals
deviate less, on average, from their mean) than a regression equation with a higher
standard error. With a higher standard error, the residuals are more dispersed around
the mean residual.
g. The estimated slope coefficient in a log-log model can be interpreted as an elasticity.
True. In a log-log model, the coefficient b1 measures the percent change in y
associated with a 1 percent change in x.
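The elasticity interpretation can be checked numerically. The sketch below assumes a hypothetical log-log model ln(y) = b0 + b1*ln(x); the values b0 = 1.0 and b1 = 0.8 and the starting point x = 50 are made up for illustration and do not come from the problem set.

```python
import math

# Hypothetical log-log model: ln(y) = b0 + b1 * ln(x).
# b0 and b1 are illustrative values, not estimates from any dataset.
def predict_y(x, b0=1.0, b1=0.8):
    return math.exp(b0 + b1 * math.log(x))

def pct_change_in_y(x, pct_change_in_x, b0=1.0, b1=0.8):
    """Percent change in predicted y when x rises by pct_change_in_x percent."""
    y_before = predict_y(x, b0, b1)
    y_after = predict_y(x * (1 + pct_change_in_x / 100), b0, b1)
    return 100 * (y_after - y_before) / y_before

# A 1% increase in x should change y by roughly b1 percent.
elasticity_check = pct_change_in_y(x=50.0, pct_change_in_x=1.0)
print(round(elasticity_check, 3))
```

The printed value is close to 0.8, and it does not depend on the starting value of x, which is what makes the slope an elasticity.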
2. Open the file “2018 Movie Gross.dta” on Blackboard. This contains data on movies
released in 2018 that I obtained from www.boxofficemojo.com. The two variables in this
file are:
Y = TG (total domestic gross revenue, $ mil.)
X = OG (opening weekend gross revenue, $ mil.)
Opening weekend gross revenue for a movie tends to be related to how much total gross
revenue that movie generates. Thus, the movie industry closely watches the opening
weekend gross revenues to help make decisions about advertising, distribution, etc.
a. Create a scatterplot showing the relationship between TG and OG. Paste this graph
into your solutions. Briefly describe what you observe in this scatterplot.
Based on the scatterplot, the values become more spread out moving from left to right.
The data are also clustered toward the left of the x axis and the bottom of the y axis.
We observe the same pattern moving up the y axis: the data become more spread out.
This suggests taking the natural logarithm of both x and y, which pulls the data
closer together.
b. Estimate each of the following models and fill in the table values:

Model                              Coefficient of       Standard Error of    Is OG statistically
                                   determination (R2)   the Regression       significant at the 1%
                                                        (RMSE)               level? (Yes or No)
Model 1 (TG vs OG)                 0.9356               24.799               Yes
Model 2 (TG vs log of OG)          0.3895               76.373               Yes
Model 3 (log of TG vs OG)          0.3572               1.6605               Yes
Model 4 (log of TG vs log of OG)   0.7541               1.027                Yes
c. Provide an interpretation for the marginal effect of OG for each model. Your answers
must specify the correct unit of measurement for the predicted change in TG (i.e.,
either % or millions of dollars):
Model 1: High correlation, but a very high standard error. Although the regression fits a
straight line, the data on both axes become more dispersed moving from left to right on the
x axis and moving up the y axis.
Model 2: Correlation declines substantially, and the standard error is even higher than the
Model 1 standard error.
Model 3: Since the dependent variable was logged, R-squared and the standard error cannot be
compared with those of Models 1 and 2. Judging by the regression line, it does not seem
like a good fit.
Model 4: Model 4 is better than Model 3, with a higher R-squared and a lower standard error.
Also, the residual plot is more constant around 0 for Model 4 than for any other model. Thus
Model 4 is the best model for prediction.
3. Suppose you have been hired by a human resources department to build a model of
employee wages. The model specifies wages as a function of an employee's age (Age)
and his/her years of education (Edu). The estimated model is:
$Wage_i = 11.4 + 0.25\,Age_i - 0.003\,Age_i^2 + 2.3\,Edu_i$
a. Holding Edu fixed at a value of 10, calculate the marginal effect of Age on Wage for a
person whose age changes from 23 to 24. Repeat this calculation for a person whose
age changes from 54 to 55. Round all values to the nearest penny.
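Suppose
Wage = w,
Age = a,
Edu = e.
For a generic quadratic $y = \alpha x^2 + \beta x + c$, the turning point is at
$x = -\frac{\beta}{2\alpha}$.
Applying this to $w = 11.4 + 0.25a - 0.003a^2 + 2.3e$, where $\beta = 0.25$ and $\alpha = -0.003$:
$a = -\frac{0.25}{2(-0.003)} = 41.67$
At age 41.67, the average employee wage is the highest.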
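The turning-point calculation can be reproduced in a few lines. This is a minimal sketch using only the Age coefficients from the estimated wage model; the function and constant names are illustrative.

```python
# Age coefficients from the estimated wage model in the problem statement.
B_AGE = 0.25       # coefficient on Age
B_AGE_SQ = -0.003  # coefficient on Age squared

def turning_point_age(b_age=B_AGE, b_age_sq=B_AGE_SQ):
    """Age at which predicted wage peaks: the vertex of the quadratic in Age."""
    return -b_age / (2 * b_age_sq)

peak_age = turning_point_age()
print(round(peak_age, 2))  # 41.67
```

Because the coefficient on the squared term is negative, this vertex is a maximum rather than a minimum.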
c. What type of relationship is there between employee Age and Wage according to
the estimated regression model? Your answer should be either (i) increasing at an
increasing rate; (ii) increasing at a decreasing rate; (iii) decreasing at an increasing
rate; or (iv) decreasing at a decreasing rate. Briefly explain how you arrived at your
answer.
We can see whether the equation shows diminishing returns by looking at the marginal
effect from age 39 to 40, holding education constant at e = 10.
Age, a = 39
Wage, w = 39.587
Age, a = 40
Wage, w = 39.6
Marginal effect from 23 to 24 = 0.109 > marginal effect from 39 to 40 = 0.013 (just before
the maximum average wage is reached at age 41.67).
We can also conclude directly from the equation: because the coefficient on the second-order
(squared) Age term is negative, the average wage increases at a decreasing rate as age
increases.
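The marginal effects quoted in this problem follow directly from the estimated equation. The sketch below evaluates the model with Edu fixed at 10; the function names are illustrative, and the 54-to-55 case is included because part (a) asks for it.

```python
def predicted_wage(age, edu):
    """Predicted wage from the estimated model in the problem statement."""
    return 11.4 + 0.25 * age - 0.003 * age**2 + 2.3 * edu

def marginal_effect(age_from, age_to, edu=10):
    """Change in predicted wage when age moves from age_from to age_to."""
    return predicted_wage(age_to, edu) - predicted_wage(age_from, edu)

# One-year changes at three different starting ages, Edu held at 10.
for start in (23, 39, 54):
    print(start, start + 1, round(marginal_effect(start, start + 1), 3))
```

The effect is positive but shrinking below the turning point (0.109 at 23, 0.013 at 39) and negative beyond it (about -0.077 at 54).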
4. Open the DERBY5 dataset that comes with the textbook. This data pertains to the
famous horse races at the Kentucky Derby in Louisville, Kentucky each year. The dataset
contains the amount of money bet each year (in millions of dollars) on horse races at the
Kentucky Derby in Louisville, Kentucky from 1927 to 1992. Once you open this dta file,
format the time series data via the following command: tsset date, yearly
a. Build an extrapolative model for the amount bet using linear or nonlinear trends.
Justify your choice of model. In addition, provide the following: (i) the regression
output from Stata of your chosen model, and (ii) a time series line graph showing the
actual amount bet and the predicted amount.
Equation:
$bets = 1.623857 + 0.0826248\,centeryear + 0.0022012\,centeryear^2 + 0.0000243\,centeryear^3$
Based on the regression output, we believe the cubic model is superior to the quadratic
model: R-squared and adjusted R-squared increased, and RMSE decreased.
We also centered the x variable. Without centering, the variance inflation factors and the
correlation between the trend terms were very high.
$bets = 1.623857 + 0.0826248\,centeryear + 0.0022012\,centeryear^2 + 0.0000243\,centeryear^3$
$centeryear = year - mean\_year$
year = 1993, mean_year = 1959.5
$centeryear = year - mean\_year = 1993 - 1959.5 = 33.5$
$bets = 1.623857 + 0.0826248 \times 33.5 + 0.0022012 \times 33.5^2 + 0.0000243 \times 33.5^3$
bets = 7.78 (rounded)
Year = 1994
$centeryear = year - mean\_year = 1994 - 1959.5 = 34.5$
bets = 8.09
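The two forecasts can be reproduced by plugging the centered year into the fitted cubic. This is a minimal sketch using the coefficients and mean year reported above; the function and constant names are illustrative.

```python
# Coefficients of the fitted cubic trend and the sample mean year, as reported above.
COEFS = (1.623857, 0.0826248, 0.0022012, 0.0000243)
MEAN_YEAR = 1959.5

def predicted_bets(year):
    """Predicted amount bet (millions of dollars) from the centered cubic trend."""
    c = year - MEAN_YEAR
    b0, b1, b2, b3 = COEFS
    return b0 + b1 * c + b2 * c**2 + b3 * c**3

for year in (1993, 1994):
    print(year, round(predicted_bets(year), 2))
```

This prints 7.78 for 1993 and 8.09 for 1994, matching the hand calculations.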