
Dummy Variables

 Some potential explanatory variables are categorical
and cannot be measured on a quantitative scale.
 However, we often need to use these variables
because they are related to the response variable.
 The trick is to create dummy variables, also called
indicator or 0-1 variables.
 These are variables that indicate the category a given
observation is in.

13.2 | 13.1a | 13.2a | 13.2b | 13.3 | 13.3a | 13.4 | 13.3b | 13.5 | 13.6
Dummy Variables -- continued
 To create dummy variables we can use an IF
statement or we can use StatPro’s Dummy variable
procedure.
 The Dummy variable procedure is usually easier,
particularly when there are multiple categories.
 Once the dummy variables are created, we can
combine the variables if we like by simply adding the
columns to get the dummy for the new category.
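For readers working outside Excel, the idea can be sketched in a few lines of Python. This is only an illustration: the category labels and the helper name make_dummies are hypothetical, and the column names simply mimic the Ed_1-Ed_6 style used in these slides.

```python
# Hypothetical education levels for seven employees (not the bank's data).
educ_lev = [1, 3, 6, 2, 4, 5, 3]

def make_dummies(values, prefix):
    """Return {column_name: list of 0/1}, one column per category."""
    categories = sorted(set(values))
    return {
        f"{prefix}_{c}": [1 if v == c else 0 for v in values]
        for c in categories
    }

dummies = make_dummies(educ_lev, "Ed")

# Combine two categories by adding their dummy columns element-wise.
ed_5_or_6 = [a + b for a, b in zip(dummies["Ed_5"], dummies["Ed_6"])]

print(dummies["Ed_3"])   # [0, 1, 0, 0, 0, 0, 1]
print(ed_5_or_6)         # [0, 0, 1, 0, 0, 1, 0]
```

Because each observation belongs to exactly one category, the summed column is itself a valid 0-1 dummy for the merged category.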

Regression Analysis
 In this example we create dummy variables for Gender and
EducLev.
 Then we can run a regression analysis with Salary as the
response variable, using any combination of numerical and
dummy explanatory variables.
 We must follow two rules:
– We shouldn’t use any of the original categorical variables that the
dummies are based on.

– We should use one less dummy than the number of categories for
any categorical variable.

Regression Analysis -- continued
 This second rule is a technical one. If we violate it, the
software will give us an error message.
 For example, of the six education dummies Ed_1-Ed_6, any
five can be used. The omitted dummy then corresponds to
the reference category.
 As we will see, the interpretations of the dummy variable
coefficients are all relative to this reference category.
 To get used to dummy variables in regression analysis
we will proceed in several stages.
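The reason for the "one less dummy" rule can be seen in a small sketch. The job grades below are hypothetical; the point is only that a full set of dummies always adds up to the constant column, so the regression has no unique solution until one dummy is dropped.

```python
job_grade = [1, 2, 2, 3, 1, 3]   # hypothetical grades for six employees
categories = sorted(set(job_grade))

# One 0-1 column per category.
dummies = {c: [int(g == c) for g in job_grade] for c in categories}

# Summing every dummy across a row always gives 1, which duplicates the
# constant (intercept) column. That exact linear dependence is the problem.
n = len(job_grade)
row_sums = [sum(dummies[c][i] for c in categories) for i in range(n)]
intercept_column = [1] * n
print(row_sums == intercept_column)   # True

# Dropping one dummy (here the first category, which becomes the
# reference category) removes the dependence.
kept = {c: col for c, col in dummies.items() if c != categories[0]}
reduced_sums = [sum(col[i] for col in kept.values()) for i in range(n)]
print(reduced_sums)   # [0, 1, 1, 1, 0, 1]
```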

Regression Analysis -- continued
 We first estimate a regression equation with only one explanatory
variable, Female. The output is shown in this table. The resulting equation is
Predicted Salary = 45.505 - 8.296Female

Regression Analysis -- continued
 To interpret this equation, recall that Female has only
two possible values, 0 and 1. If we substitute 1, then
the predicted salary equals 37.209, and if we
substitute 0, the predicted salary is 45.505.
 These are the average salaries of females and
males. Therefore the interpretation of the -8.296
coefficient of the Female dummy variable is
straightforward: it is the difference between the average
female salary and the average male salary.
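A short sketch can confirm this interpretation. The salaries below are hypothetical (in $1000s), not the bank's data; the code fits a one-variable least-squares line by hand and compares it with the two group averages.

```python
from statistics import mean

# Hypothetical salaries (in $1000s), not the bank's data.
female = [1, 1, 1, 0, 0, 0, 0]
salary = [36.0, 38.0, 37.0, 44.0, 46.0, 45.0, 47.0]

# Least-squares slope and intercept for a single explanatory variable.
x_bar, y_bar = mean(female), mean(salary)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(female, salary))
sxx = sum((x - x_bar) ** 2 for x in female)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

male_mean = mean([y for x, y in zip(female, salary) if x == 0])
female_mean = mean([y for x, y in zip(female, salary) if x == 1])

# The intercept matches the male average, and the dummy's coefficient
# matches the female average minus the male average.
print(round(intercept, 3), round(male_mean, 3))
print(round(slope, 3), round(female_mean - male_mean, 3))
```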

Regression Analysis -- continued
 The above equation tells only part of the story; it
ignores all information except gender.
 We expand this equation by adding the experience
variables YrsExper and YrsPrior. The output is shown in this table.

Regression Analysis -- continued
 The corresponding equation is
Predicted Salary = 35.492 + 0.988YrsExper
+ 0.131YrsPrior - 8.080Female
 It is useful to write two separate equations, one for females
(Female = 1) and one for males (Female = 0):
Females: Predicted Salary = 27.412 + 0.988YrsExper + 0.131YrsPrior
Males: Predicted Salary = 35.492 + 0.988YrsExper + 0.131YrsPrior
 We interpret the coefficient -8.080 of the Female dummy
variable as the average salary disadvantage for females
relative to males after controlling for job experience. But
there is still more of the story to tell.

Regression Analysis -- continued
 We next add job grade to the equation by including
five of the six job grade dummies. Although any five
can be used, we use Job_2-Job_6. The resulting
output is shown in this table.

Regression Analysis -- continued
 The estimated regression equation is now
Predicted Salary = 30.230 + 0.408YrsExper + 0.149YrsPrior
- 1.962Female + 2.57Job_2 + 6.295Job_3 + 10.475Job_4
+ 16.011Job_5 + 27.647Job_6
 There are now two categorical variables involved,
gender and job grade.
 However, we can still write a separate equation for
any combination of categories by setting the
dummies to the appropriate values.

Regression Analysis -- continued
 For example, the equation for females at the fifth job
grade is found by setting Female=1 and Job_5=1 and
setting the other job dummies equal to 0. The
equation formed is
Predicted Salary = 44.279 + 0.408YrsExper + 0.149YrsPrior
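The same substitution works for any gender and job grade combination. The sketch below uses the coefficients quoted in these slides; the dictionary layout and the function name group_intercept are illustrative choices, not part of StatPro.

```python
# Coefficients copied from the fitted equation in the slides.
coef = {
    "const": 30.230, "YrsExper": 0.408, "YrsPrior": 0.149, "Female": -1.962,
    "Job_2": 2.57, "Job_3": 6.295, "Job_4": 10.475,
    "Job_5": 16.011, "Job_6": 27.647,
}

def group_intercept(female, job_grade):
    """Intercept for one gender/job-grade combination (grade 1 = reference)."""
    value = coef["const"] + coef["Female"] * female
    if job_grade > 1:
        value += coef[f"Job_{job_grade}"]
    return round(value, 3)

print(group_intercept(female=1, job_grade=5))   # 44.279
print(group_intercept(female=0, job_grade=1))   # 30.23
```

Only the intercept changes across groups; the YrsExper and YrsPrior slopes are shared by every combination in this model.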

 We interpret this equation as follows:
– For either gender and any job grade, the expected increase
in salary for one extra year of experience with Fifth National
is $408; the expected salary increase for one extra year of
experience with another bank is $149.

Regression Analysis -- continued
– The coefficients of the job dummies indicate the average increase in
salary an employee can expect relative to the reference (lowest) job
grade.

– The key coefficient, -$1962 for females, indicates the
average salary disadvantage for females relative to males, given that
they have the same experience levels and are in the same job grade.

 Although the “penalty” is still substantial, it is less than a
fourth of the penalty we saw before.
 It appears that females might be getting paid less on average
partly because they are in the lower job categories.

Regression Analysis -- continued
 We can check whether females are
disproportionately in the lower job categories by using
a pivot table with JobGrade in the row area, Gender
in the column area and the count (expressed as a
percentage) of any variable in the data area.
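Outside Excel, the same cross-tabulation can be sketched with a counter. The records below are hypothetical, and the choice to express counts as percentages of each gender's total is an assumption about how the pivot table percentages are set up.

```python
from collections import Counter

# Hypothetical (JobGrade, Gender) pairs, not the bank's records.
employees = [
    (1, "Female"), (1, "Female"), (1, "Male"),
    (2, "Female"), (2, "Male"),
    (3, "Male"), (3, "Male"), (3, "Female"),
]

counts = Counter(employees)
gender_totals = Counter(gender for _, gender in employees)

# Percentage of each gender found in each job grade (columns sum to 100).
pivot = {}
for grade in sorted({g for g, _ in employees}):
    pivot[grade] = {
        gender: round(100 * counts[(grade, gender)] / gender_totals[gender], 1)
        for gender in ("Female", "Male")
    }
    print(grade, pivot[grade])
```

If the Female percentages are largest in the lowest grades, females are disproportionately concentrated there.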

Regression Analysis -- continued
 Clearly, females tend to be concentrated at the lower
job grades.
 This certainly helps to explain why females get lower
salaries on average, but it doesn’t explain why
females are at the lower job grades in the first place.
 We won’t be able to provide a thorough analysis of
this issue but we can add one more piece to the
puzzle now by adding education level, age, and
PCJob to the equation.

Regression Analysis -- continued
 We don’t provide the whole equation but the resulting
output is shown here.

Regression Analysis -- continued
 The coefficients can be seen in the output.
 The new variables don’t appear to add much to the previous
equation. The “penalty” does, however, go up to
$2555, which is somewhat greater than the previous $1962.
 At face value we can interpret the coefficients of the
education dummies as a benefit (or loss if negative)
of extra education relative to a high school diploma,
the reference category.

Regression Analysis -- continued
 The coefficient of PCJob implies that an employee
with a computer-related job can expect an extra
$4923 in salary relative to an employee without a
computer-related job, provided the other variables
are the same for each employee.
 The age coefficient is quite small and has little effect
on salary.

Conclusion
 The main conclusion we can draw from the output is
that there is still a plausible case to be made for
discrimination against females, even after including
information on all the variables in the database in the
regression equation.

Modeling Possibilities
BANK.XLS
 The Fifth National Bank of Springfield is facing a
gender-discrimination suit. The charge is that its
female employees receive substantially smaller
salaries than its male employees.
 The bank’s employee database is listed in this file.
Here is a partial list of the data.

Question
 Earlier we estimated an equation for Salary using the
numerical explanatory variables YrsExper and YrsPrior
and the dummy variable Female.
 If we drop the YrsPrior variable from the equation (for
simplicity) and rerun the regression, we obtain the
equation
Predicted Salary = 35.824 + 0.981YrsExper - 8.012Female
 The R² value for this equation is 49.1%. If we decide to
include an interaction variable between YrsExper and
Female in this equation, what is the effect?

Interaction Terms
 An interaction variable algebraically is the product of
two variables. Its effect is to allow the effect of one of
the variables on Y to depend on the value of the
other variable.
 The interaction term allows the slope of the
regression line to differ between the two categories.

Solution
 We first need to form an interaction variable that is the
product of YrsExper and Female.
 This can be done in two ways in Excel:
– We can do it manually by introducing a new variable that contains
the product of the two variables involved, or
– We can use the StatPro/Data Utilities/Create Interaction Variable
menu item.
 With the latter approach, we select Female and YrsExper
as the variables, and we do not check either of the boxes in
the dialog box, because neither should be treated as a categorical variable.
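The manual route amounts to one multiplication per row. A minimal sketch, with hypothetical values and a column name that mirrors the YrsExper_Female variable in the slides:

```python
# Hypothetical rows; the variable names follow the slides.
yrs_exper = [2, 10, 5, 20]
female = [1, 0, 1, 0]

# The interaction variable is the element-wise product of the two columns.
yrs_exper_female = [x * f for x, f in zip(yrs_exper, female)]
print(yrs_exper_female)   # [2, 0, 5, 0]
```

The new column equals YrsExper for females and 0 for males, which is what lets the regression estimate a separate experience slope for females.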

Solution -- continued
 Once the interaction variable has been created, we
include it in the regression equation in addition to the
other variables. The multiple regression output is
shown here.

Solution -- continued
 The estimated regression equation is
Predicted Salary = 30.430 + 1.528YrsExper + 4.098Female
- 1.248YrsExper_Female
 As we discussed before, it is useful to write this equation
as two separate equations, one for females and one for
males. The female equation is
Predicted Salary = 34.528 + 0.280YrsExper
and the male equation is
Predicted Salary = 30.430 + 1.528YrsExper
 Next we can show these equations graphically.

Nonparallel Female and Male Salary Lines

Solution -- continued
 The Y-intercept for the female line is slightly higher - females
with no experience at Fifth National Bank tend to start out
slightly higher than males - but the slope of the female line is
much lower. That is, males tend to move up the salary ladder
much more quickly than females.
 Again, this provides another argument, although a somewhat
different one, for gender discrimination against females.
 The R² value increased from 49.1% to 63.9%. The interaction
variable has definitely added to the explanatory power of the
equation.

Modeling Possibilities
BANK.XLS
 The Fifth National Bank of Springfield is facing a
gender-discrimination suit. The charge is that its
female employees receive substantially smaller
salaries than its male employees.
 The bank’s employee database is listed in this file.
Here is a partial list of the data.

Question
 A glance at the distribution of salaries of the 208
employees shows some skewness to the right - a few
employees make substantially more than the majority
of employees.
 Therefore, it might make sense to use the natural
logarithm of Salary instead of Salary as the response
variable.
 If we do this, how do we interpret the results?

Solution
 All of the analyses we did earlier with this data set
could be repeated except with Log_Salary as the
response variable.
 For the sake of discussion we will look only at the
regression equation with Female and YrsExper as
explanatory variables.
 After we create the Log_Salary variable and run the
regression, we obtain the output shown here.

Regression Output with Log_Salary as Response Variable

Solution
 The estimated regression equation is
Predicted Log_Salary = 3.5829 + 0.0188YrsExper
- 0.1616Female
 The R² and s_e values are 42.4% and 0.1794. For comparison,
the values with Salary as the response variable were 49.1% and 8.070.
 We first note that neither of these values is directly
comparable to the corresponding Salary value.
 The two R² values are percentages explained of different
response variables, Log_Salary and Salary. The fact that one is
smaller does not mean a “worse” fit. They simply aren’t
comparable.

Solution -- continued
 The situation for s_e is even worse. Each s_e is a measure of a
typical residual, but the residuals in the Log_Salary equation
are in log dollars, whereas the residuals in the Salary equation
are in dollars.
 Therefore it is no surprise that the s_e for the Log_Salary
equation is much smaller than the s_e for the Salary equation.
 If we want comparable standard error measures for the two
equations, we should take antilogs of the fitted values from the
Log_Salary equation to convert them back to dollars, subtract
these from the original Salary values, and take the standard
deviation of these residuals.
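The conversion described above can be sketched as follows. The five employees are hypothetical, so the printed number is only illustrative; the fitted equation is the Log_Salary equation from these slides.

```python
import math
from statistics import stdev

# Hypothetical employees: (YrsExper, Female, Salary in $1000s).
data = [(2, 1, 31.0), (10, 0, 45.0), (5, 1, 33.5), (20, 0, 50.0), (15, 1, 41.0)]

def fitted_log_salary(yrs_exper, female):
    # Log_Salary equation reported in the slides.
    return 3.5829 + 0.0188 * yrs_exper - 0.1616 * female

# The antilog of each fitted value is a prediction in the original units.
fitted_salary = [math.exp(fitted_log_salary(x, f)) for x, f, _ in data]

# Residuals are now in the same units as Salary.
residuals = [s - fit for (_, _, s), fit in zip(data, fitted_salary)]

# Standard deviation of the converted residuals, comparable to the
# standard error of the Salary equation.
print(round(stdev(residuals), 2))
```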

Solution -- continued
 The resulting standard deviation is 7.74. This is
somewhat smaller than the s_e from the Salary equation,
an indication of a slightly better fit.
 Finally we interpret the equation itself.
 When the response variable is Log_Y and a term on
the right hand side of the equation is of the form bX,
then whenever X increases by one unit Y-hat changes
by a constant percentage, and this percentage is
approximately equal to b (written as a percentage).

Solution -- continued
 This means that for each additional year of experience with
Fifth National, an employee’s salary can be expected
to increase by about 1.88%.
 The coefficient of Female implies an expected decrease in salary
of about 16.16% for females.
 In other words, this equation implies that females can
expect to make about 16% less than men for
comparable years of experience.
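Reading b directly as a percentage is an approximation that works best when b is small. As a rough check, the exact percentage change implied by a coefficient b in a log-response equation is 100(e^b - 1). The sketch below applies that standard identity to the two coefficients from the slides; the function name is illustrative.

```python
import math

b_yrs_exper = 0.0188    # coefficient of YrsExper from the slides
b_female = -0.1616      # coefficient of Female from the slides

def exact_pct_change(b):
    """Exact percentage change in predicted Y when X rises by one unit."""
    return 100 * (math.exp(b) - 1)

print(round(exact_pct_change(b_yrs_exper), 2))   # 1.9
print(round(exact_pct_change(b_female), 2))      # -14.92
```

For the small YrsExper coefficient the approximation is nearly exact, while for the Female coefficient the exact figure is closer to 15% than 16%.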

Modeling Possibilities
POWER.XLS
 The Public Service Electric Company produces
different quantities of electricity each month, depending
on the demand.
 This file lists the number of units of electricity produced
(Units) and the total cost of producing these (Cost) for
a 36-month period.
 The data set appears on the next slide.
 How can regression be used to analyze the
relationship between Cost and Units?

Data for Electric Power

Solution
 A good place to start is with a scatterplot of Cost
versus Units.

Solution -- continued
 The scatterplot indicates a definite positive relationship and
one that is nearly linear.
 However, there is also some evidence of curvature in the
plot. The points increase slightly less rapidly as Units
increase from left to right.
 In economic terms, there may be economies of scale, where
the marginal cost of electricity decreases as more units of
electricity are produced.
 Nevertheless, we use regression to estimate a linear
relationship between Cost and Units.

Solution -- continued
 The resulting regression equation is
Predicted Cost = 23,651 + 30.53 Units

 The corresponding R² and s_e are 73.6% and $2734. We also
requested a scatterplot of the residuals versus the fitted values.
The scatterplot is on the next slide. Obtaining this scatterplot is
always a good idea if nonlinearity is suspected.
 The sign of nonlinearity in this plot is that the residuals to the far
left and the far right are all negative, whereas the majority of the
residuals in the middle are positive.

Residuals from a Straight-Line Fit

Solution -- continued
 Admittedly the pattern is far from perfect - there are a
few negatives in the middle - but the plot does hint at
nonlinear behavior.
 The negative-positive-negative behavior of the residuals
suggests a parabola; that is, a quadratic equation with
the square of Units included in the equation.
 We first create a new variable Sqr_Units in the data set.
This can be done manually or using StatPro’s Transform
Variables menu item.

Solution -- continued
 Then we use multiple regression to estimate the
equation for Cost with both explanatory variables,
Units and Sqr_Units, included.
 The resulting equation from the output on the next
slide is
Predicted Cost = 5793 + 98.3Units - 0.0600Sqr_Units
 Note that R² has increased to 82.2% and s_e has
decreased to $2281.

Regression Output with Squared Term Included

Solution -- continued
 One way to see how this regression equation fits the
scatterplot of Cost versus Units is to use Excel’s
trendline option.
 To do so, activate the scatterplot, click on any point, and
use the Chart/Add Trendline menu item. Then click the Type
tab and select the Polynomial type of order 2, that is, a
quadratic.
 A graph of the equation is superimposed on the
scatterplot on the following slide. It shows reasonably
good fit, plus an obvious curvature.

Quadratic Fit Scatterplot

Solution -- continued
 The main downside to a quadratic regression equation is that
there is no easy interpretation of the coefficients of Units and
Sqr_Units.
 All we can say is that the terms in the equation combine to
explain the nonlinear relationship between units produced and
total cost.
 A final note about the equation concerns the coefficient of
Sqr_Units.
– First, the fact that it is negative makes the parabola bend downward.
This produces the decreasing marginal cost behavior, where every
extra unit of electricity incurs a smaller additional cost.

Solution -- continued
– Second, we shouldn’t be fooled by the small magnitude of
this coefficient. Remember that it is the coefficient of Units
squared, which is a large quantity. Therefore, the effect of
the product -0.0600Sqr_Units is sizable.
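Both points about the Sqr_Units coefficient can be checked numerically. The sketch below uses the quadratic equation from the slides; the two production levels are hypothetical, and the marginal cost is taken as the derivative of the fitted quadratic with respect to Units.

```python
def predicted_cost(units):
    # Quadratic equation reported in the slides.
    return 5793 + 98.3 * units - 0.0600 * units ** 2

def marginal_cost(units):
    # Derivative of the quadratic with respect to Units.
    return 98.3 - 2 * 0.0600 * units

for units in (300, 600):   # hypothetical production levels
    squared_term = -0.0600 * units ** 2
    print(units, round(squared_term), round(marginal_cost(units), 1))
# 300 -5400 62.3
# 600 -21600 26.3
```

The squared term contributes thousands of dollars despite its small coefficient, and the marginal cost falls as Units grows.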

 One other possibility we might examine is a
logarithmic fit.
 In this case we create a new variable Log_Units, the
natural logarithm of Units, and then regress Cost
against the single variable Log_Units.

Solution -- continued
 To create the new variable we can again use
StatPro’s Transform Variables menu item. We can then
superimpose a logarithmic curve on the
scatterplot of Cost versus Units by using the trendline
feature.
 This curve appears in the scatterplot on the next
slide.
 To the naked eye, it appears to be similar, and about
as good a fit as the quadratic curve.

Logarithmic Fit Scatterplot

Solution -- continued
 The resulting regression equation is
Predicted Cost = -63,993 + 16,654Log_Units

 The values of R² and s_e are 79.8% and $2393.
 These latter values indicate that the logarithmic fit is
not quite as good as the quadratic fit.
 However, the advantage of the logarithmic equation
is that it is easier to interpret.

Solution -- continued
 In this case, where the log of the explanatory variable is
used, we can interpret its coefficient as follows.
– Suppose Units increases by 1%, for example from 600 to 606.
Then the equation implies that the expected Cost will increase by
approximately $166.54 (the coefficient 16,654 divided by 100).
– In words, every 1% increase in Units is accompanied by an
expected $166.54 increase in Cost.
– Note that for larger values of Units, a 1% increase represents a
larger absolute increase. But each such 1% increase entails the
same increase in Cost. This is another way of describing the
decreasing marginal cost property.
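A quick numerical check of this rule, using the logarithmic equation from the slides and two hypothetical production levels:

```python
import math

def predicted_cost(units):
    # Logarithmic equation reported in the slides.
    return -63993 + 16654 * math.log(units)

increases = {}
for units in (600, 900):   # hypothetical production levels
    increases[units] = predicted_cost(1.01 * units) - predicted_cost(units)
    print(units, round(increases[units], 2))
```

Both levels give the same increase, 16,654 × ln(1.01), which is about $165.7. The $166.54 figure is the usual approximation obtained by treating ln(1.01) as 0.01.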

Modeling Possibilities
CARDEMAND.XLS
 This file contains annual data (1970-1987) on domestic auto
sales in the United States. The data set is shown on the
next slide.
 The variables are defined as
– Quantity: annual domestic auto sales (in number of units)
– Price: real price index of new cars
– Income: real disposable income
– Interest: prime rate of interest
 Estimate and interpret a multiplicative (constant elasticity)
relationship between Quantity and Price, Income and Interest.

Car Demand Data

Constant Elasticity Relationships
 A particular type of nonlinear relationship that has
firm grounding in economic theory is called a
constant elasticity relationship. It is also called a
multiplicative relationship.
 One property of this type of relationship is that the
effect of a change in any explanatory variable Xi on
Y depends on the levels of the other X’s in the
equation.

Solution
 We first take the natural logs of all four variables.
– This can be done in one step using the Transform Variables
menu item or we can use Excel’s LN function.

 We then use multiple regression, with Log_Quantity as
the response variable and Log_Price, Log_Income, and
Log_Interest as the explanatory variables.
 The resulting output is shown on the next slide, and the
corresponding equation is
Predicted Log_Quantity = 4.675 - 1.185Log_Price
+ 2.183Log_Income - 0.191Log_Interest

Regression Output for Multiplicative Relationship

Solution -- continued
 If we like, we can convert this back to the original variables,
that is, back to multiplicative form, by taking antilogs. The
result is
Predicted Quantity = 107.198 × Price^(-1.185) × Income^(2.183) × Interest^(-0.191)
 In either form the equation implies that the elasticities are
approximately equal to -1.185, 2.183 and -0.191.
 When Price increases by 1%, Quantity tends to decrease by
about 1.185%; when Income increases by 1%, Quantity
tends to increase by about 2.183%; and when Interest
increases by 1%, Quantity tends to decrease by about
0.191%.
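The elasticity reading can be verified by plugging numbers into the multiplicative form. The starting levels below are hypothetical; because the relationship is multiplicative, the percentage change in Quantity does not depend on them.

```python
def predicted_quantity(price, income, interest):
    # Multiplicative form reported in the slides.
    return 107.198 * price ** (-1.185) * income ** 2.183 * interest ** (-0.191)

# Hypothetical levels of the three explanatory variables.
base = predicted_quantity(price=100.0, income=50.0, interest=8.0)
bumped = predicted_quantity(price=101.0, income=50.0, interest=8.0)

# Percentage change in predicted Quantity after a 1% increase in Price.
pct_change = 100 * (bumped / base - 1)
print(round(pct_change, 3))   # -1.172
```

The exact change, about -1.17%, is close to the elasticity of -1.185, which is why the coefficient is described as an approximate percentage effect.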

Conclusions
 Does this multiplicative equation provide a better fit to
the automobile data than does an additive relationship?
 Without doing considerably more work, it is difficult to
answer this question with certainty.
 As we discussed previously, it is not sufficient to
compare R² and s_e values for the two fits.
 We will simply state that the multiplicative relationship
provides a reasonably good fit, and it makes sense
economically.

Modeling Possibilities
LEARNING.XLS
 The Presario Company produces a variety of small
industrial products.
 It has just finished producing 22 batches of a new
product (new to Presario) for a customer.
 This file contains the times (in hours) to produce each
batch. These data are in the table on the next slide.
 Clearly, the times have tended to decrease as Presario
has gained more experience in making the product.

Data for Learning Curve

 Does the multiplicative learning model apply to these
data, and what does it imply about the learning rate?
Learning Curve Model
 A final example of a multiplicative relationship is the
learning curve model.
 A learning curve relates the unit production time (or
cost) to the cumulative volume of output since that
production process first began.
 Empirical studies indicate that production times tend to
decrease by a relatively constant percentage every
time cumulative output doubles.
 The constant percentage is called the learning rate.

Solution
 One way to check whether the multiplicative learning
model is reasonable is to create the log variables
Log_Time and Log_Batch in the usual way and then
see whether a scatterplot of Log_Time versus
Log_Batch is approximately linear.
 The multiplicative model implies that it should be.
 Such a scatterplot is shown on the next slide, along
with a superimposed linear trend line. The fit appears
to be quite good.

Scatterplot of Log Variables with Linear Trend Superimposed

Solution -- continued
 To estimate the relationship, we regress Log_Time
on Log_Batch. The resulting equation is
Predicted Log_Time = 4.834 - 0.155Log_Batch
 There are a couple of ways of interpreting this
equation.
– First, because it is based on a multiplicative relationship, we
can interpret the coefficient -0.155 as an elasticity. That is,
when Batch increases by 1%, Time tends to decrease by
approximately 0.155%. Although this is correct, it is not as
“useful” as the “doubling” interpretation.

Solution -- continued
– Second, we know that the estimated learning rate satisfies
-0.155 = ln(learning rate)/ln(2)
Solving for the learning rate (multiplying through by ln(2) and
then taking antilogs), we find that it is 0.898, or approximately
90%. In other words, whenever cumulative production
doubles, the time to produce a batch decreases by about
10%.
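The two algebra steps translate directly into code. A minimal sketch using the slope from the fitted equation:

```python
import math

slope = -0.155   # coefficient of Log_Batch from the slides

# slope = ln(learning rate) / ln(2), so multiply by ln(2) and take the antilog.
learning_rate = math.exp(slope * math.log(2))
print(round(learning_rate, 3))   # 0.898
```

Equivalently, the learning rate is 2 raised to the power of the slope.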

Predicting Future Production Times
 Presario could use this regression equation to predict
future production times.
 For example, suppose the customer places an order for
15 more batches of the same product. We can use the
equation to predict the log of production time for each
batch, then take their antilogs and sum them to obtain
the total production time.
 The calculations are shown in rows 26-42 of the
following table. The total predicted time to finish is about
1115 hours.
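The same prediction can be sketched outside the spreadsheet. The code assumes the 15 new batches are numbered 23 through 37, following the 22 batches already produced, and uses the fitted equation from these slides.

```python
import math

def predicted_time(batch):
    # Antilog of the fitted Log_Time equation from the slides.
    return math.exp(4.834 - 0.155 * math.log(batch))

# Assumption: the 15 new batches are numbered 23 through 37.
next_batches = range(23, 38)
total_hours = sum(predicted_time(b) for b in next_batches)
print(round(total_hours))
```

Under that numbering assumption the total comes out at roughly 1115 hours, in line with the figure quoted above.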

Using the Learning Curve Model for Predictions
