Understanding Regression Analysis
All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted under U.S. or applicable copyright law.
by Amy Gallo
EBSCO Publishing : eBook Community College Collection (EBSCOhost) - printed on 1/18/2023 5:06 PM via UNIVERSITY CANADA WEST
AN: 1798828 ; Harvard Business Review.; HBR Guide to Data Analytics Basics for Managers (HBR Guide Series)
EBSCOhost - printed on 1/18/2023 5:06 PM via UNIVERSITY CANADA WEST. All use subject to https://www.ebsco.com/terms-of-use
this case, let’s say you find out the average monthly rainfall
for the past three years as well. Then you plot all of
that information on a chart that looks like figure 10-1.
The y-axis is the amount of sales (the dependent variable,
the thing you’re interested in, is always on the y-axis)
and the x-axis is the total rainfall. Each dot represents
one month’s data: how much it rained that month
and how many sales you made that same month.
Glancing at this data, you probably notice that sales
are higher in months when it rains a lot. That’s interesting
to know, but by how much? If it rains three inches, do
you know how much you’ll sell? What about if it rains
four inches?
FIGURE 10-1
FIGURE 10-2
y = 200 + 5x
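One way to read this line is as a rule for turning rainfall into an expected sales figure. The sketch below assumes x is inches of rain and y is the number of sales, as in the rainfall example; the function name `predicted_sales` is hypothetical, and the calculation leaves out the error around the line, so the results are estimates rather than guarantees.

```python
def predicted_sales(rainfall_inches: float,
                    intercept: float = 200.0,
                    slope: float = 5.0) -> float:
    """Evaluate the fitted line y = intercept + slope * x.

    The defaults come from the line y = 200 + 5x. The result is the
    expected value on the line, not an exact forecast.
    """
    return intercept + slope * rainfall_inches


# Expected sales for three and four inches of rain.
for inches in (3, 4):
    print(f"{inches} inches of rain -> about {predicted_sales(inches):.0f} sales")
```

Under these assumptions, each additional inch of rain adds five sales to the estimate, which is what the slope of 5 means.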
get the line that fits best with your data.” While there can
be dangers in trying to include too many variables in a
regression analysis, skilled analysts can minimize those
risks. And considering the impact of multiple variables
at once is one of the biggest advantages of regression.
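To make the idea of several variables at once concrete, here is a minimal sketch of a regression with two predictors, written in plain Python. Everything in it is an assumption for illustration: the second variable (`promo`, a flag for a promotion running that month), the data values, and the helper name `fit_least_squares` are all invented, and the sales figures were constructed from 200 + 5 × rain − 10 × promo so that the fit has a known answer.

```python
def fit_least_squares(rows, targets):
    """Ordinary least squares with an intercept.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan
    elimination. `rows` holds one list of predictor values per
    observation; returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical monthly data: [inches of rain, promotion flag].
rows = [[1, 0], [2, 1], [3, 0], [4, 1], [2, 1], [5, 0]]
# Constructed as 200 + 5 * rain - 10 * promo (made-up numbers).
sales = [205, 200, 215, 210, 200, 225]

coefficients = fit_least_squares(rows, sales)
print([round(c, 3) for c in coefficients])
```

Because the made-up data contains no noise, the fit recovers roughly 200, 5, and -10. With real data the coefficients would carry uncertainty, and adding many loosely related variables is where the dangers mentioned above come in.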
lect it, and know whether you can trust it (as we learned
in chapter 8). “All the data doesn’t have to be correct or
perfect,” explains Redman, but consider what you will be
doing with the analysis. If the decisions you’ll make as a
result don’t have a huge impact on your business, then
it’s OK if the data is “kind of leaky.” But, “if you’re trying
to decide whether to build 8 or 10 of something and
each one costs $1 million to build, then it’s a bigger deal,”
he says.
Redman also says that some managers who are new
to understanding regression analysis make the mistake
of ignoring the error term. This is dangerous because
they’re making the relationship between two variables
more certain than it is. “Oftentimes the results spit out
of a computer and managers think, ‘That’s great, let’s use
this going forward.’” But remember that the results are
always uncertain. As Redman points out, “If the regression
explains 90% of the relationship, that’s great. But
if it explains 10%, and you act like it’s 90%, that’s not
good.” The point of the analysis is to quantify the certainty
that something will happen. “It’s not telling you
how rain will influence your sales, but it’s telling you the
probability that rain may influence your sales.”
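One common way to put a number on how much a regression explains is R-squared, the share of the variation in sales that the fitted line accounts for. Treating Redman's "90%" and "10%" as R-squared values is an interpretation, not something the text states. The sketch below fits a line to two made-up data sets with the same rainfall values, one that hugs the line and one that scatters widely; all data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variation in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


rain = [1, 2, 3, 4, 5]                    # inches (made up)
tight_sales = [206, 209, 216, 219, 225]   # points close to a line
noisy_sales = [230, 190, 215, 240, 200]   # points scattered widely

r2_tight = r_squared(rain, tight_sales, *fit_line(rain, tight_sales))
r2_noisy = r_squared(rain, noisy_sales, *fit_line(rain, noisy_sales))
print(f"tight data: R-squared = {r2_tight:.2f}")
print(f"noisy data: R-squared = {r2_noisy:.2f}")
```

Both fits produce an equation that looks equally authoritative on paper. Only the size of the error around the line shows that the second one says almost nothing about sales.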
The last mistake that Redman warns against is letting
data replace your intuition. “You always have to lay your
intuition on top of the data,” he explains. Ask yourself
whether the results fit with your understanding of the
situation. And if you see something that doesn’t make
sense, ask whether the data was right or whether there
is indeed a large error term. Redman suggests you look
to more experienced managers or other analyses if you’re
[Figure: chart with an “iPhone sales” series; the remaining labels did not survive extraction.]
[Figure: Per capita consumption of high-fructose corn syrup (U.S.), in lbs, and spending on admission to spectator sports (U.S.), in $ billions, 2000-09. Source: Tylervigen.com]
[Figure: Visitors to Universal Orlando’s “Islands of Adventure” and sales of new cars (U.S.), 2007-2009. Source: Tylervigen.com]
[Figure: eBay total gross merchandise volume plotted against a second series, 2008-2013, with axes in millions and $ millions; the second series label did not survive extraction.]
[Figure: Monthly values, January to December, for customers under 40 and customers over 40, shown first on two separate axes (up to $500K and up to $35K) and then on a single axis (up to $500K).]
[Figure: Musical works copyrighted (U.S.) and Pandora net losses, 2006-2009.]