
MBA PROGRAMME

INTERNAL USE ONLY

Uncertainty, Data & Judgement

Extra Exercises

05/2016-5715
This document contains material that has been prepared, revised, and edited by Michele Hibon who collected the
works done by the DS area for the MBA and EMBA core courses.
Copyright © 2010 INSEAD. Revised © 2016 INSEAD
COPIES MAY NOT BE MADE WITHOUT PERMISSION. NO PART OF THIS PUBLICATION MAY BE COPIED, STORED, TRANSMITTED, REPRODUCED OR DISTRIBUTED IN
ANY FORM OR MEDIUM WHATSOEVER WITHOUT THE PERMISSION OF THE COPYRIGHT OWNER.

EXTRA EXERCISES SET 1


Probability

1. Super Light Bulbs

The service life of Sultania super light bulbs is normally distributed with a mean of 1200 hours
and a standard deviation of 36 hours.

a. What percentage of the bulbs will last for longer than 1272 hours?
b. What percentage of the bulbs will last for less than 1146 hours?
c. The 10% of the bulbs with the longest service life will last for longer than how many
hours?
d. The 20% of the bulbs with the shortest service life will last for less than how many
hours?
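
Parts (a) to (d) each come down to a normal tail probability or an inverse lookup of the normal CDF. A minimal sketch using Python's standard-library `statistics.NormalDist`, offered as one possible setup rather than an official solution; the variable names are illustrative and not part of the exercise:

```python
from statistics import NormalDist

# Service life model from the exercise: mean 1200 h, standard deviation 36 h.
bulb_life = NormalDist(mu=1200, sigma=36)

# (a) share of bulbs lasting longer than 1272 h (upper tail)
share_longer_1272 = 1 - bulb_life.cdf(1272)

# (b) share of bulbs lasting less than 1146 h (lower tail)
share_shorter_1146 = bulb_life.cdf(1146)

# (c) threshold exceeded by the longest-lived 10% (90th percentile)
top_10_cutoff = bulb_life.inv_cdf(0.90)

# (d) threshold below which the shortest-lived 20% fall (20th percentile)
bottom_20_cutoff = bulb_life.inv_cdf(0.20)

print(f"{share_longer_1272:.4f} {share_shorter_1146:.4f}")
print(f"{top_10_cutoff:.1f} {bottom_20_cutoff:.1f}")
```

The same pattern (`cdf` for "what share", `inv_cdf` for "what threshold") applies to the Dow-Jones exercise that follows.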

2. Dow-Jones Stock Index

Monthly percentage changes in the Dow-Jones stock price index can be considered as normally
distributed with a mean of +0.65 and a standard deviation of 3.5.

Answer each question if a month is selected at random.

a. What is the probability that the index rose during the month?
b. What is the probability of an increase exceeding 5%?
c. What percentage increase is exceeded with probability 0.05?

3. E-mail Messages

An INSEAD professor receives on average 24 e-mail messages per working day. The rate of
e-mails per hour is constant. If his working day lasts 8 hours, what is the probability that

a. He receives no e-mails in half an hour?
b. He receives more than 5 e-mails in two hours?
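
A constant arrival rate is usually modelled with a Poisson distribution, whose mean scales with the length of the time window. The sketch below assumes that model (the exercise does not name it explicitly); the helper functions `poisson_pmf` and `poisson_cdf` are illustrative names:

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson count with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k), summing the pmf from 0 to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

rate_per_hour = 24 / 8  # 24 e-mails spread evenly over an 8-hour day

# (a) no e-mails in half an hour: the expected count is 0.5 * rate
p_none_half_hour = poisson_pmf(0, 0.5 * rate_per_hour)

# (b) more than 5 e-mails in two hours: complement of "5 or fewer"
p_more_than_5_two_hours = 1 - poisson_cdf(5, 2 * rate_per_hour)

print(f"{p_none_half_hour:.4f} {p_more_than_5_two_hours:.4f}")
```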


4. Golden M&Ms

The M&M company is considering a new marketing concept to reward addictive customers:
the golden M&Ms. These would be of a rare kind: the probability of finding a golden candy in
a regular bag will be only 10%. For every ten golden M&Ms found, one will get a special prize
(for example, a free CD).
Suppose Rosalind buys 10 bags of M&Ms. What is the probability that she will find at least
one golden candy?

5. Amex

American Express claims that its card is accepted in 60% of the hotels in Europe. If we consider
the group of 10 hotels in Fontainebleau, what is the probability that only 3 will accept American
Express cards?
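
Exercises 4 and 5 both count successes in a fixed number of independent trials, which is the binomial setting. A minimal sketch, assuming that "only 3" means exactly 3 and that the 10% in Exercise 4 is the chance that a given bag contains a golden candy; `binomial_pmf` is an illustrative helper name:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success chance p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Amex: exactly 3 of 10 hotels accept the card when each accepts with chance 0.6
p_exactly_3 = binomial_pmf(3, n=10, p=0.6)

# Golden M&Ms: "at least one in 10 bags" is the complement of "none in 10 bags"
p_at_least_one = 1 - binomial_pmf(0, n=10, p=0.1)

print(f"{p_exactly_3:.4f} {p_at_least_one:.4f}")
```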

6. Getting the Royal Treatment

Getting timely medical attention for serious health problems in the UK is becoming more and
more difficult. Suppose a patient, called Harry, who is diagnosed with a certain health problem,
must get surgery done within 15 days in order to have a reasonable chance of survival. The two
hospitals in the region have a waiting list for this surgery, and the patient can only choose one
of the hospitals because of the preliminary procedures for the surgery. Hospital A handles an
average of eight surgeries in 60 days and Hospital B handles an average of six surgeries in 60
days. There are three people on the waiting list for the surgery in Hospital A and there are two
people on the waiting list for the surgery in Hospital B. The rates for the two hospitals are
independent.

a. If Harry wants to maximize the probability of getting the surgery done within 15 days,
which hospital should he choose?

b. Suppose there is only one (combined) waiting list for the two hospitals (basically, the
two hospitals work as one unit with two operating theatres and all the preliminary
procedures are performed at one place) and Harry is now sixth on the waiting list. The
patient on top of the waiting list undergoes surgery at whichever hospital is ready first.

What is the probability that Harry will undergo surgery within 15 days?


7. Chip and Dale Up in the Air

While at INSEAD, Chip and Dale fly regularly between Paris and Singapore, and both
passionately collect frequent flier miles. For every flight (round trip business class) they earn
17,000 miles.

(a) Chip flies on average once a month, whereas Dale flies twice every month and a half, on
average. In the first nine months since the beginning of the year, Chip already flew 10 times
and Dale 9 times. With three more months to go until the end of the year, who is more likely to
collect 200,000 miles this year, Chip or Dale?

(b) Chip and Dale collect miles both on AF (Air France’s SkyBlue) and SQ (Singapore Airlines’
KrisFlyer). Chip is based in Singapore; every time he travels there is a 60% chance he will fly
SQ, and a 40% chance that he will fly AF. Dale is based in France, and is equally likely to fly
AF and SQ. Assume Chip will fly 12 times this year and Dale 14 times. Who is more likely to
fly SQ at least 5 times this year, Chip or Dale?


8. Executive Airlines

Executive Airlines operates small luxury jets on commuter flights from Orly airport to
Heathrow (London). Although each of its jets holds 10 passengers, Executive Airlines takes up
to 12 passenger reservations per flight, a practice known as “overbooking”. This is because they
have found that the probability is 20% that a person with a reservation fails to show up for the
flight. Experience has shown that most passengers are traveling alone, so you may assume that
any person(s) showing up doesn’t change the probability of others showing up.

a. If Executive Airlines has 12 reservations for its next flight, what is the probability that there
will not be enough seats on the jet for all the passengers with reservations who show up?

b. Mechanical breakdowns cause significant scheduling problems for small airlines – if one
plane has problems, there may not be another plane available to make a scheduled flight.
Executive Airlines’ planes are among the most reliable in service: they have found that, on
average, one plane experiences mechanical problems every 10 weeks. The breakdowns
appear to occur independently over time.

Suppose one plane has broken down and will take one week to repair. What is the
probability that at least one more plane will break down while this one is being fixed?

c. Executive Airlines has found that its flight time from Orly to Heathrow is normally
distributed with a mean of one hour and a standard deviation of 6 minutes.

To promote its service, the airline’s management is considering offering customers a
money-back guarantee: if any flight takes more than 70 minutes to get to London, each passenger
riding that plane will receive a full refund of the €380 (one-way) fare. Thus, sometimes the
airline will pay out the refunds, but usually it will not.
What is the expected loss of revenue per passenger because of this policy?
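
The three parts use three different models: binomial show-ups, Poisson breakdowns, and a normal flight time. The sketch below is one way to set them up and is not an official solution. It assumes that the breakdown rate of one per 10 weeks applies to the fleet as a whole and that the refund is the only revenue effect of the guarantee:

```python
from math import comb, exp
from statistics import NormalDist

# (a) Overbooking: the jet is short of seats if 11 or 12 of the 12 booked
# passengers show up, each independently with probability 0.8.
p_show = 0.8
p_overbooked = sum(
    comb(12, k) * p_show**k * (1 - p_show) ** (12 - k) for k in (11, 12)
)

# (b) Breakdowns: 1 per 10 weeks on average, so 0.1 expected in the
# one-week repair window (Poisson model, an assumption).
p_another_breakdown = 1 - exp(-0.1)

# (c) Guarantee: a refund of EUR 380 whenever the flight exceeds 70 minutes.
flight_minutes = NormalDist(mu=60, sigma=6)
p_late = 1 - flight_minutes.cdf(70)
expected_refund_per_passenger = 380 * p_late

print(f"{p_overbooked:.4f} {p_another_breakdown:.4f}")
print(f"{expected_refund_per_passenger:.2f}")
```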


9. Speculate.com

A sizable part of internet-based trading is due to day-trading (extremely short-term
speculations, usually in highly volatile stocks).

a. Bill, a veteran day-trader, believes that there is a 60% chance that he makes a profit at the
end of any given trading day. Since he closes his positions at the end of every trading day,
Bill believes that making a profit or losing money on any given trading day is independent
of what happened or will happen on any other trading day.

Based on Bill’s estimates, what is the probability that he will make a profit on at most 5
trading days in the next 3 weeks of trading? (Note that markets are open Mon.-Fri., so each
week of trading consists of 5 trading days; assume no holidays during the next three weeks.)

b. An Internet based trading service, Speculate.com, estimates that, at the end of the day, the
return on €1 invested in day-trading is normally distributed. Speculate.com’s trading
records suggest that, on average, €1 invested in day-trading is worth only 95 cents at the
end of the day (i.e., day-traders lose money as a group even though there might be some
individual day-traders who make a fortune). The records also suggest a 45% chance that a
€1 day-trading investment will turn out profitable at the end of the day. Using
Speculate.com’s records and estimates, find the standard deviation of the return at the end
of the day on €1 invested in day-trading.

c. Some analysts do not completely agree with the estimates provided by Speculate.com.
While they do agree that at the end of the day the return on €1 invested in day-trading is
normally distributed, they believe that, on average, it will be worth only 90 cents with a
standard deviation of 20 cents. Using the analysts’ estimates, find the probability that €1
invested in day-trading will be worth more than €1.25 at the end of the day.

d. Nancy is concerned about her Internet trading software crashing in critical moments. A
technical support expert told her that these crashes occur independently and at a steady
rate, but are not very likely. In fact, the probability of five or more crashes during the last
trading hour (when many day-traders try to close their positions) is one hundredth of 1%
(i.e., 0.0001). How many crashes are expected during the last trading hour? What is the
probability of having no crashes in the last trading hour?
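
Parts (b) and (d) run the usual calculation backwards: a probability is given and a parameter has to be recovered. A hedged sketch of both, assuming a Poisson model for the crashes in (d); the bisection bracket of 0 to 5 crashes per hour and the helper name `prob_five_or_more` are choices made here, not part of the exercise:

```python
from math import exp, factorial
from statistics import NormalDist

# (b) Mean value 0.95 and P(value > 1) = 0.45, so z = (1 - 0.95) / sigma must
# equal the 55th percentile of the standard normal. Solve that for sigma.
z_55 = NormalDist().inv_cdf(1 - 0.45)
sigma_return = (1.00 - 0.95) / z_55

def prob_five_or_more(lam: float) -> float:
    """P(X >= 5) for a Poisson count with mean lam."""
    return 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(5))

# (d) Find the crash rate lam with P(X >= 5) = 0.0001 by bisection.
# prob_five_or_more increases with lam, so the root is unique in the bracket.
low, high = 0.0, 5.0
for _ in range(60):
    mid = (low + high) / 2
    if prob_five_or_more(mid) < 0.0001:
        low = mid
    else:
        high = mid
expected_crashes = (low + high) / 2
p_no_crash = exp(-expected_crashes)

print(f"{sigma_return:.3f} {expected_crashes:.3f} {p_no_crash:.3f}")
```

Bisection is used only because it needs nothing beyond the standard library; a statistical table or a spreadsheet solver would serve equally well.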


10. Fundraising at METAKNOWLEDGE CAPITAL

Neil and Ilia have decided to start a hedge fund. They plan to capitalize on their years of
experience teaching MBAs and executives, and leverage what they have learned about
predictable human irrationalities. Having been influenced by thinkers ranging from Confucius
and Nietzsche to George Soros, they decided to name their firm METAKNOWLEDGE
CAPITAL, LLC. For short, they just call it M.C.
They both bought a tie (one each) to wear, and have been travelling around the world trying to
raise capital for the fund. Delightfully, they continue to find that the world is easily
characterized by beautiful statistical regularities. For example, they have observed that they
receive signed investment agreements in the mail from new investors at a rate of 5 per week
(on average), and that the rate is constant over the week (and, remarkably, the arrivals are
independent). Please note that a week means from Monday through Friday, as mail is not
delivered on weekends – that is, a week is five business days.

M.C. is looking to hire a team of young quants for the back office. The interview is on Saturday.
Here are some of their interview questions. Please answer them.

a. What is the probability that M.C. will receive at least three signed investment agreements
next week?
b. What is the probability that the first signed investment agreement next week will be received
on Monday?
c. What is the probability that M.C. will receive at least two signed investment agreements
next week if they receive no signed agreements on Monday and Tuesday?

Further, based on giving their pitch a very large number of times, they have found that it is
successful 25% of the time if they wear ties (i.e., they get funding on average from 1 out of
every 4 potential investors they pitch to if they wear ties), and the pitch is successful 35% of
the time if they do not wear ties. Naturally, whether a particular pitch is successful is
independent of what happened at other pitches. And they have found that it makes no difference
whether Neil or Ilia does the pitch, only whether the one doing so wears a tie.

d. Suppose Ilia pitches to ten new potential investors while wearing a tie, and Neil pitches to
six new potential investors while not wearing a tie. Who is more likely to secure funding
from at least three investors?


11. Analysing the Fundraising Investments

After six months on the road, Neil and Ilia have decided to take a break from their travels, and
look closely at how their fundraising has been going.

Based on research published in the Global Hedge Fund Report, a monthly magazine on the
hedge fund industry, they know that the typical (individual) investor puts in an average of
$465K (that is, $465,000) when investing in a hedge fund, and that the standard deviation of
the investment is $176K. These results are based on pooling together individual investments in
all hedge funds – they are the population parameters for the entire industry.

Having been taught well in their statistics course during their MBA, the authors of the article
also showed the histogram of the investments (in addition to reporting the mean and standard
deviation). Amazingly, the histogram of the investment amounts was a beautiful normal curve
– the investment amounts are normally distributed.

a. Using the data reported in the Global Hedge Fund Report, what is the probability that a
randomly selected hedge fund investor will invest between $350K and $650K in a particular
fund?
b. Using the data reported in the Global Hedge Fund Report, what is the probability that the
average investment of 20 randomly selected hedge fund investors will be greater than
$500K?
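
The difference between (a) and (b) is the standard deviation that applies: a single investment varies with σ, while the mean of n independent investments varies with σ/√n. A minimal sketch with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

# Industry-wide investment amounts in $K, as reported in the exercise.
investment = NormalDist(mu=465, sigma=176)

# (a) a single investor puts in between $350K and $650K
p_single_between = investment.cdf(650) - investment.cdf(350)

# (b) the mean of n = 20 independent investors is normal with the same mean
# and standard deviation sigma / sqrt(n) (the standard error)
n = 20
sample_mean = NormalDist(mu=465, sigma=176 / sqrt(n))
p_mean_above_500 = 1 - sample_mean.cdf(500)

print(f"{p_single_between:.4f} {p_mean_above_500:.4f}")
```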


EXTRA EXERCISES SET 2


Combination of Random Variables

1. Microsoft and Intel Stock

Suppose you are considering a portfolio of two securities: Microsoft stock and Intel stock.
Let r1 be the return on Microsoft stock per US$ invested,
and r2 be the return on Intel stock per US$ invested.

Using historical annual return, you find that

for r1 the mean is 1.1 and the standard deviation is 0.1,
for r2 the mean is 2.0 and the standard deviation is 0.5,
and r1 and r2 follow a Normal Distribution.

You decide to invest US$ 2500 in Microsoft stock and US$ 2500 in Intel stock assuming that
the correlation between r1 and r2 is 0.6 (the returns on the two stocks are positively correlated,
i.e., increases in the return on one stock tend to appear with increases in the other stock).

Note that the net profit here would be:

P = 2500 r1 + 2500 r2 - 5000
What is the probability of losing money?
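
The profit is a weighted sum of two correlated normal variables, so its mean and variance follow from the standard rules for linear combinations. The sketch below assumes the two returns are jointly normal, which is what makes the profit itself normal; the variable names are illustrative:

```python
from math import sqrt
from statistics import NormalDist

# Per-dollar returns from the exercise
mean_1, sd_1 = 1.1, 0.1   # Microsoft
mean_2, sd_2 = 2.0, 0.5   # Intel
rho = 0.6                 # correlation between the two returns
w_1 = w_2 = 2500          # dollars invested in each stock

# Mean of the profit: 2500 * mean_1 + 2500 * mean_2 - 5000
profit_mean = w_1 * mean_1 + w_2 * mean_2 - 5000

# Variance of a weighted sum of two correlated variables:
# (w1*sd1)^2 + (w2*sd2)^2 + 2*w1*w2*rho*sd1*sd2 (the constant adds nothing)
profit_var = (w_1 * sd_1) ** 2 + (w_2 * sd_2) ** 2 + 2 * w_1 * w_2 * rho * sd_1 * sd_2
profit_sd = sqrt(profit_var)

# Losing money means the profit is below zero.
p_loss = NormalDist(profit_mean, profit_sd).cdf(0)

print(f"{profit_mean:.0f} {profit_sd:.2f} {p_loss:.4f}")
```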


2. IKEA Stores

Monthly sales of a popular furniture item at an IKEA store tend to follow a normal distribution
with a mean of 3000 units and a standard deviation of 500 units. The same item is also sold at
another IKEA store in the same vicinity (about 30 km away), but the distribution of the sales is
slightly different: it is normal with a mean of 1970 units and a standard deviation of 320 units.
The correlation between the sales of the item in the two stores is –0.7.

a. Both IKEA stores maintain their own monthly inventories. How many units of the item
should each store stock if they want the probability of running out and turning away
customers who want the item to be only 5% (i.e., 95% of the demand is met from the
stock)?

b. The two IKEA stores are interested in reducing their inventory costs which have been
running very high. They want to consider the option of having a common warehouse half
way between them. They expect that pooling inventories would reduce the amounts
required in stock for almost all items (although, the transportation costs between the
common warehouse and the two stores and the waiting time for customers still have to be
considered). As an example, they want to consider the case of the popular furniture item
mentioned above. How many units should be stored in the common warehouse such that
95% of the total demand in the two stores is met from that stock (i.e., the probability of
running out in both stores combined is only 5%)?
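
A 5% stock-out probability corresponds to stocking the 95th percentile of demand. For the pooled warehouse the relevant demand is the sum of the two stores' sales, whose variance includes the (negative) correlation term. A minimal sketch assuming the two sales figures are jointly normal; the variable names are illustrative:

```python
from math import sqrt
from statistics import NormalDist

z_95 = NormalDist().inv_cdf(0.95)  # about 1.645

mean_a, sd_a = 3000, 500   # first store
mean_b, sd_b = 1970, 320   # second store
rho = -0.7                 # correlation between the two stores' sales

# (a) separate inventories: each store stocks its own 95th percentile of demand
stock_a = mean_a + z_95 * sd_a
stock_b = mean_b + z_95 * sd_b

# (b) pooled inventory: total demand has the summed mean and
# variance sd_a^2 + sd_b^2 + 2*rho*sd_a*sd_b
total_mean = mean_a + mean_b
total_sd = sqrt(sd_a**2 + sd_b**2 + 2 * rho * sd_a * sd_b)
stock_pooled = total_mean + z_95 * total_sd

print(round(stock_a), round(stock_b), round(stock_pooled))
```

Comparing `stock_pooled` with `stock_a + stock_b` shows how much safety stock the negative correlation saves.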


3. Annual Returns

An investment analyst has postulated that annual return of S&P500, a popular US market index,
follows a normal distribution with a mean of 10% and a standard deviation of 4%. The
investment analyst wants to outperform the market. She is considering two different ways to
constitute her mutual fund – both of which have a lower expected return but a higher upside
potential compared to the market index.
The annual return of the first mutual fund (FUND1) is believed to follow a normal distribution
with a mean of 9% and a standard deviation of 7%. The correlation between S&P500 and
FUND1 is expected to be 0.6.
The annual return of the second mutual fund (FUND2) is also assumed to follow a normal
distribution with a mean of 9% and a standard deviation of 7% but the correlation between S&P
500 and FUND2 is likely to be –0.3.

a. If the investment analyst adopts FUND1, what is the probability that she will outperform
the S&P500?

b. If the investment analyst adopts FUND2, what is the probability that she will outperform
the S&P500?

c. Suppose the investment analyst decides to follow a completely random strategy for five
years, i.e. in any given year the probability of outperforming the market (S&P500) is 0.5
and her performance in any given year is independent of her performance in any of the
other years.

What is the probability that she will outperform the market in at least three of the next five
years?


4. Theo’s Hot Stock Tips

An equity manager, Theo, decides to use a “blind and random rule” for investing. Consequently,
it is assumed that in any given year Theo will “beat” the market index with a probability of 0.5,
i.e., Theo’s return will be greater than the return on the market index with a probability of 0.5.
Furthermore the difference between the percentage return on Theo’s investment and the
percentage return on the market index (i.e., the percentage return on Theo’s investment minus
the percentage return on the market index) is judged to be normally distributed with a standard
deviation of 2%. Also, Theo’s return and the market return are expected to be independent of
each other and from one year to the next.

a. What is the probability that, in a given year, Theo’s return will be less than 3% different
(higher or lower) from the market return?

b. Suppose Theo charges his clients an annual management fee of 1.68 %.

What is the probability that, in a given year, the net annual return for Theo’s clients
(Theo’s annual return minus his management fees) will be higher than the market return?

c. What is the probability that for each of the next five years Theo’s clients will have a net
return higher than the market return?

5. Bowling

Jill’s bowling scores are approximately normally distributed with mean 170 and standard
deviation 20, while Jack’s scores are approximately normally distributed with mean 160 and
standard deviation 15. If Jack and Jill each bowl one game, then assuming that their scores are
independent random variables, approximate the probability that

a. Jack’s score is higher?

b. The total of their scores is above 350?

c. Now, Jack and Jill bowl two games each. What is the probability that Jack’s total score
(in these two games) is higher? Compare with (a). Does it make sense?


6. Renovation Project

The project manager for renovating the INSEAD buildings scheduled the following activities
to complete the project: (1) Installation of the heating unit; (2) Checking the electricity system.
The expected completion time and the standard deviation of each phase are given in the
following table.

Phase                              Expected Time    Standard Deviation
                                   (in weeks)       (in weeks)
Installing the heating unit        7                4
Checking the electricity system    5                2

In addition, the team also needs (exactly) 4 extra weeks to strengthen the ceiling panels.
a. The project manager outsourced the second phase of the project (checking the electricity
system) to Adrian, the best electrician in town. Adrian charges a fixed payment of €1000
in addition to a weekly amount of €200. Assuming that the time it takes for checking the
electricity system follows a normal distribution (with µ=5 and σ=2), what is the probability
that INSEAD ends up paying him more than €2500?

b. Suppose that the correlation between the completion time of the first two phases (installing
the heating unit and checking the electricity system) is –0.80.

• What is the expected completion time of the overall project?
• What is the standard deviation of the completion time?

c. Assuming that the completion time follows a normal distribution, what is the probability
that the project will be completed sometime between 12 and 20 weeks from its start date?


7. Chip and Dale Up in the Air

Five years after INSEAD, Chip and Dale are happiness consultants with 3R, an INSEAD
start-up. The world is a happier place, and as their business is flourishing, so are their miles accounts
(Frequent flyers on Singapore Airlines). Their next target is to hit one million miles a year on
SQ (this will entitle them to free life-time foot massages on all SQ flights).

In addition to collecting miles by flying, both Chip and Dale earn two SQ miles for each dollar
they spend on their credit card. Assume for simplicity that both the annual number of miles
accrued by flying SQ and annual amount spent on the credit card (in thousands of dollars) are
normally distributed, with parameters given in the table below.

        Annual miles on SQ            Annual spending (in thousands of dollars)
        average     std. deviation    average     std. deviation
Chip    425,000     85,000            160         40
Dale    612,000     102,000           140         20

Chip spends more money when he travels, whereas Dale spends more money when he stays at
home (basically they both spend more money in France, typically on red wine). The correlation
between the number of miles flown by Chip and his annual spending is 0.3, whereas for Dale
the correlation is -0.4. Calculate the means and standard deviations of the total number of miles
accrued (from flying and credit card) by Chip and by Dale in a year. Who is more likely
to collect 1M miles in a given year, Chip or Dale?


8. Managing Pizza Delivery

Rick is the managing director of the new PizzaYo! start-up, specializing in pizza delivery. Having
polished the unique flavour and taste, Rick turns to optimizing delivery operations. One thing
he noticed is that some of the rival companies, like Domino’s Pizza, offer a 30-minute delivery
guarantee – so that if a pizza is delivered in more than 30 minutes, the customer gets a voucher.
Rick contemplates introducing the same policy.

a. In PizzaYo!, the average profit per order is $20, and about 10% of incoming calls end up
not ordering because of the lack of a 30 minute delivery guarantee – which makes the
average profit per incoming call equal to $18.

Rick collects the data on delivery time in PizzaYo!, builds the histogram, and finds that delivery
times form a perfect bell-shaped normal curve with the mean of 23 minutes and standard
deviation of 6 minutes. Rick thinks that if PizzaYo! starts offering a $5 voucher for any order
that is delivered in more than 30 minutes, only 5% of incoming calls will end up not ordering.
With this 30 minute delivery guarantee and with the assumptions above, what would be the
expected profit per incoming call?

b. Another issue in the pizza delivery business is to ensure that the delivery bag is such that the
pizza is hot when delivered. After testing their thermal bags, PizzaYo! finds out that the
time that a pizza stays hot in the bag is normally distributed with the mean of 32 minutes
and standard deviation of 4 minutes. Rick is trying to estimate the probability that pizza is
delivered hot (recall that delivery time is normal, with the mean of 23 minutes and standard
deviation of 6 minutes), assuming that delivery time and time that pizza stays hot are
independent. Please help Rick to estimate this probability.
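
Part (a) is an expected-value calculation layered on a normal tail probability, and part (b) compares two independent normal variables through their difference. The sketch below is one possible setup, assuming that each voucher costs PizzaYo! its full $5 face value and that the voucher is the only cost of the guarantee:

```python
from math import sqrt
from statistics import NormalDist

delivery = NormalDist(mu=23, sigma=6)   # delivery time in minutes
stays_hot = NormalDist(mu=32, sigma=4)  # time the pizza stays hot in the bag

# (a) With the guarantee, 95% of calls become orders; each order earns $20
# minus a $5 voucher whenever delivery takes more than 30 minutes.
p_late = 1 - delivery.cdf(30)
profit_per_order = 20 - 5 * p_late
profit_per_call = 0.95 * profit_per_order

# (b) The pizza arrives hot when (stays_hot - delivery) > 0. For independent
# normals the difference is normal: means subtract, variances add.
margin = NormalDist(
    mu=stays_hot.mean - delivery.mean,
    sigma=sqrt(stays_hot.stdev**2 + delivery.stdev**2),
)
p_hot = 1 - margin.cdf(0)

print(f"{p_late:.4f} {profit_per_call:.2f} {p_hot:.4f}")
```

Comparing `profit_per_call` with the current $18 per incoming call indicates whether the guarantee pays off under these assumptions.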


9. Long Term Vodka Management


After a rough period in their relationship, Neil and Ilia decide to work together. They start a
hedge fund, and decide to invest heavily in vodka futures. The key for the success of their
enterprise is an accurate forecast of the future price of vodka.
The error in a particular prediction is the difference between the predicted price and the actual
price of vodka:
Error = Predicted Price – Actual Price.
The prices are in US$. The means and standard deviations of these errors for Ilia and Neil are
shown in the following table:

        Average Error    Standard Deviation
Ilia    -0.19            0.24
Neil    0.13             0.11

A standard way of combining forecasts from different sources is to average the predictions,
which sometimes is done with different weights. The Dean suggests that the weights assigned
to different opinions should be proportional to the number of sections of UDJ course taught by
Ilia and Neil. Ilia teaches 2 sections and Neil teaches 4 sections.
That leads to the following combined estimate:

Combined Estimate = (1/3) Ilia’s Estimate + (2/3) Neil’s Estimate.

Therefore, the errors will be combined in the same way (see footnote 1):

Error of Combined Estimate = (1/3) Ilia’s Error + (2/3) Neil’s Error.

a. On average, by how much will Neil and Ilia’s combined estimate be off? In other words,
what is the average error of the combined estimate?

b. Suppose that the correlation between Neil and Ilia’s errors is 0.37. What is the standard
deviation of the error of the combined estimate?

1 For example, if on a particular day Ilia overestimates by $1 and Neil underestimates by $1, the combined
error will be (1/3)($1)+(2/3)(-$1) ≈ -$0.33. In this case, their combined estimate will underestimate the actual
price of vodka by about 33 cents.


The number you have found in (a) is called the bias of the combined forecast. A standard way
of improving a forecast is to “un-bias” it. For example, if the average error of a forecast is –0.05,
adding 0.05 to all forecasts will make the average error zero. The un-biased combined forecast
is thus

Combined Unbiased Estimate = (1/3) Ilia’s Estimate + (2/3) Neil’s Estimate - B,

where B is such that the average error of “Combined Unbiased Estimate” is zero.

c. What proportion of Combined Unbiased Estimates is guaranteed to be within 20 cents of
the actual price if you know that the errors for both Ilia and Neil are normally distributed?


10. Is More Risk Always Bad?

In the last “real” stage of the 2003 Tour de France, with just the ceremonial ride into Paris to
follow, the second-place cyclist crashed as a result of taking chances on a wet surface in an
attempt to catch up to the leader. In many situations, both in sports and in business, the
competitor falling behind a leader prefers to pursue a more risky strategy. Is this a good idea?

Consider the case of sailboat racing. Based on data from practice races, the leader expects to
finish the race in 4 hours, with a standard deviation of 30 minutes. The follower expects to
finish in 4 hours 10 minutes, with a standard deviation of 25 minutes. Assume that actual time
to finish the race is normally distributed for both the leader and the follower, and the correlation
between them (since many factors, like wind and current, affect them similarly) is 0.5.

a. What is the probability that the leader finishes in less than 4 hours 5 minutes?

b. What is the probability that the follower finishes in less than 4 hours 5 minutes?

c. What is the probability that the average time to finish for both teams (i.e., average time of
the leader and the follower) will be below 4 hours?

d. What is the probability that the follower finishes first?

The follower can take a different tack, which would increase her expected time to finish the
race up to 4 hours 12 minutes, and will also lead to a greater standard deviation of 45 minutes.
It will also decrease the correlation with the leader to 0.2.

e. What is the probability that the follower will finish first if she takes the different tack?


Sampling Distribution of the Mean

1. An auditor takes a random sample of size n = 36 from a large population of accounts
receivable. The mean value of the accounts is µ = $260 with a population standard deviation
of σ = $45.

a. What is the probability that the sample mean will be less than $250?
b. What is the probability that the sample mean will be within $15 of the population
mean?

2. Ages of the MBA students of a recent promotion are normally distributed with a mean of
28.6 and a standard deviation of 2.32 years.

a. What is the probability that a randomly selected student is younger than 27?
b. What is the probability that the average age of a sample of 9 students is less than 27?

3. For a large population of normally distributed account balances, the mean balance is
µ = $150 with standard deviation σ = $35.

a. What is the probability that one account has a balance that exceeds $160?
b. What is the probability that the mean for a random sample of n = 40 accounts will exceed
$160?

4. Mary Bartel, an auditor for a large credit card company, knows that, on average, the
monthly balance of any given customer is $112, and that the standard deviation is $56. If
Mary audits 50 randomly selected accounts, what is the probability that the sample average
monthly balance is:

a. Below $100?
b. Between $95 and $120?


EXTRA EXERCISES SET 3


Confidence Intervals

1. The mean dollar amount of sales per customer at a toy store located at an airline
terminal from a sample of 36 customers is equal to $20 with a standard deviation of $3.20.

a. What is the range within which the mean dollar amount of sales of the store will fall
with a probability of 99%?
b. What is the minimum size of random sample that should be collected if you want to estimate
the mean sales amount within $1 with the same level of confidence?

2. A poll for the weekly Journal du Dimanche (February 19, 1995) reported that the
percentage of the eligible voters who were favourable to Edouard Balladur was 46% with
a margin of error (half the width of a 95% confidence interval) of 2.2%. What must have
been the sample size in the poll?
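
Exercises 1 and 2 both rest on the relation between margin of error, critical value, and sample size. A minimal sketch, assuming the normal (z) critical value is acceptable for n = 36 and using the sample standard deviation as if it were the population value; `z_for_confidence` is an illustrative helper name:

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_for_confidence(level: float) -> float:
    """Two-sided standard normal critical value, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(0.5 + level / 2)

# Exercise 1: n = 36, sample mean $20, standard deviation $3.20, 99% confidence
z_99 = z_for_confidence(0.99)
margin = z_99 * 3.20 / sqrt(36)
ci_low, ci_high = 20 - margin, 20 + margin

# Smallest n whose margin of error is at most $1: n >= (z * sd / margin)^2
n_for_one_dollar = ceil((z_99 * 3.20 / 1.0) ** 2)

# Exercise 2: a proportion's margin of error is z * sqrt(p*(1-p)/n); solve for n
z_95 = z_for_confidence(0.95)
n_poll = ceil(z_95**2 * 0.46 * (1 - 0.46) / 0.022**2)

print(f"({ci_low:.2f}, {ci_high:.2f})", n_for_one_dollar, n_poll)
```

Rounding up with `ceil` reflects that a sample size must be a whole number at least as large as the formula's value.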

3. Mark Semmes, owner of the Aurora Restaurant, is considering purchasing new
furniture. To help him decide on the amount he can afford to invest in tables and chairs, he
wishes to determine the average revenue per customer. The checks for 9 randomly sampled
customers had an average of £18.30 and a standard deviation of £6.30.

a. Construct a 95% confidence interval for the average check per customer.
b. What additional sample size will be required if Mr Semmes would like the maximum error
in his estimate not to exceed £1?


4. Morality of Foreign Policy

The world is full of nationalistic people certain that their country is morally superior to others,
right? Actually, a new WorldPublicOpinion.org poll of 21 nations around the world finds that
people can be remarkably modest.
In this survey, a typical question looked like this: “What about Russia’s foreign policy? Is its
morality “above average”, “about average”, or “below average”?” Our interest is in the
proportion of people responding “about average.”

a. How many people in each country had to be sampled to make sure that the margin of
error of a 95% confidence interval is no more than 4%?
b. In Russia, the sample size was 806, and 411 replied that morality of Russia’s foreign
policy is “about average.” Using this data, construct a 99% confidence interval for the
proportion of people in Russia that think that morality of Russia’s foreign policy is
“about average.”
c. Suppose that, in fact, 50% of French think that the morality of Russia’s foreign policy
is “about average.” If you ask 20 French, what is the chance that more than 12 of them
will respond that the morality of Russia’s foreign policy is “about average?”

5. In a recent Gallup Poll of 1300 likely voters in the USA, 672 claimed support for the
Democratic candidate.

a. Construct a 99% confidence interval for the proportion of all likely voters who support
the Democratic candidate.
b. What is the probability that the true proportion differs from the sample proportion by
more than 0.01?


6. Calling Statman?

Gillette recently announced plans to launch a new 3-edged razor, called the MACH3. The
company spent 6 years and more than $750 million developing this razor. The new razor will
sell for approximately $5 with replacement cartridges selling for $1.35 each, a 35% premium
over Gillette’s flagship two-blade Sensor razor. The Wall Street Journal (April 14, 1998) calls the
product a “calculated gamble” and notes that “Gillette’s master plan has little room for error”.

a. Gillette has tested the new razor on a sample of 100 members of the target population
(“men who shave”). In this sample, 65 indicated that they prefer the MACH3 to the
Sensor razor.
Use this sample data to construct a 90% confidence interval for the proportion of “men
who shave” who prefer the MACH3 to the Sensor.

b. Now that the razor has been announced, Gillette wants to gather more information to
improve the accuracy of its market share estimates. Suppose Gillette wants to survey
enough “men who shave” to reduce the 90% confidence interval of part (a) to a width
of ± 2%. How many “men who shave” must they sample?
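
Part (a) can be approached with the usual large-sample interval for a proportion, p̂ ± z·sqrt(p̂(1 - p̂)/n). The sketch below assumes that normal approximation is acceptable for n = 100; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# 65 of 100 sampled "men who shave" prefer the MACH3
low, high = proportion_ci(65, 100, 0.90)
print(f"{low:.3f} to {high:.3f}")
```

The same half-width expression, solved for n, gives the sample size asked for in part (b).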


7. AKBAR and JEFF Co. hires new consultants only from the top business schools in the world.
New recruits typically receive a signing bonus in addition to their salary.

a. In a survey of 25 new offers from AKBAR and JEFF, the sample mean was €24,500,
and the sample standard deviation was €5,200, for the signing bonus. Construct an 80%
confidence interval for the mean signing bonus at AKBAR and JEFF. What assumptions
do you need to make?

b. For the next year, one would like to obtain the same width of the confidence interval as
in Part (a) but with a confidence level of 99%. What is then the desired sample size for
the survey? (What assumptions are you making now?)

c. In a separate survey of 100 randomly selected graduates of top b-schools, 44 received a
signing bonus and 56 did not. Construct a 90% confidence interval for the proportion
receiving a signing bonus.

d. What should be the sample size to reduce the maximum error in the estimate of P to
0.05, at the same level of confidence?

8. The Dean of the MBA Program and the female vs. male obsession

The Dean would like to better understand the gender composition (male and female) among the
INSEAD MBA population. The Dean randomly samples 120 MBA participants: 28 are female.

a. Build a 95% confidence interval for the population proportion of female participants.
Last year, the 95% confidence interval for the same sample size had a width of 10%.
Please comment on this year’s confidence interval, and compare it with last year’s.
How can you explain the difference? Be brief and precise.

b. The Dean would like to find n (the sample size) needed to estimate the proportion of
female participants at INSEAD within a margin of error of no more than 0.04 at a 90% level
of confidence.


9. Monthly returns of the Leima stock

Using a sophisticated econometric model, and ISP, a financial analyst has concluded that the
monthly returns of the Leima stock can be reasonably well approximated by a normal
distribution with µ = 0.01 and σ = 0.03. Furthermore, the financial analyst is very confident
that the monthly returns will follow the same pattern in the foreseeable future, and they are
independent from month to month.

a. Given that you invest in the stock for a period of 9 months, find the range within which
the mean monthly return for these 9 months will fall with probability 0.95.

b. Suppose you have just become the top financial manager for Leima and you feel that
you must maintain an investment in the Leima stock. What is the probability that your
mean monthly return from Leima stock will be more than zero after 24 months?
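
Both parts rest on the fact that the mean of n independent normal returns is itself normal with mean µ and standard deviation σ/√n. A minimal sketch under the exercise's stated assumptions (the helper name is hypothetical):

```python
import math
from statistics import NormalDist

MU, SIGMA = 0.01, 0.03  # monthly mean and standard deviation given in the exercise


def mean_return_dist(months):
    """Distribution of the average of `months` independent normal monthly returns."""
    return NormalDist(MU, SIGMA / math.sqrt(months))


# (a) central range containing the 9-month mean return with probability 0.95
d9 = mean_return_dist(9)
low, high = d9.inv_cdf(0.025), d9.inv_cdf(0.975)

# (b) probability that the 24-month mean return is above zero
p_positive = 1 - mean_return_dist(24).cdf(0.0)

print(round(low, 4), round(high, 4), round(p_positive, 4))
```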

10. A market researcher for a large consumer electronics company wanted to study television
viewing habits of residents of a particular small city. A random sample of 40 respondents was
selected, and each respondent was instructed to keep a detailed record of all television viewing
in a particular week. The results were as follows:

Amount of viewing per week: X̄ = 15.3 hours, S = 3.8 hours;
27 respondents watched the Evening News on at least three weeknights.

Set up 95% confidence-interval estimates for each of the following:

a. The average amount of television watched per week in this city.
b. The proportion of respondents who watch the Evening News at least three nights per
week.

If the market researcher wanted to take another survey in a different city:

c. What sample size is required if you wish to be 95% confident of being correct to within
± 1 hour and assume the population standard deviation is equal to 5 hours?
d. What sample size is needed if you wish to be 95% confident of being within ± 0.035
of the true proportion who watch the Evening News on at least three weeknights and
no estimate is available?


Extra Exercises SET 4

Hypothesis Testing

1. It has been hypothesised that 5% of the parts being produced in a manufacturing process
are defective. For a random sample of 100 parts, 10 are found to be defective.
Is there enough evidence to reject this claim?
Find the P-value.
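
One way to approach this is a one-sample z-test for a proportion, with the standard error computed under the hypothesised value p0. The sketch below assumes the normal approximation is acceptable (n·p0 = 5 is borderline) and reports both a one-sided and a two-sided P-value, since the exercise does not say which alternative is intended. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """z statistic with upper-tail and two-sided P-values for H0: p = p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under the null hypothesis
    z = (p_hat - p0) / se
    upper_tail = 1 - NormalDist().cdf(z)
    two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, upper_tail, two_sided


z, p_upper, p_two_sided = proportion_z_test(10, 100, 0.05)
print(round(z, 3), round(p_upper, 4), round(p_two_sided, 4))
```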

2. An automatic soft ice-cream dispenser has been set to dispense 4.00 oz. per serving.
For a sample of 30 servings, the average amount of ice-cream is 4.05 oz. with a standard
deviation of 0.10 oz.
At what level of significance can you say that the process is “in control”?

3. A university librarian suspects that the average number of books checked out to each
student per visit has changed recently. In the past, an average of 3.4 books was checked
out. However, a recent sample of 32 students averaged 4.3 books per visit, with a standard
deviation of 1.5 books.

At α = 0.05 level of significance, has the average checkout changed?

4. On an average day, about 10% of the stocks on the New York Stock Exchange set a new
high for the year. On a particular day, a random sample of 120 stocks determined that 16
of them had set new annual highs that day.

Using a significance level of 0.01, should we conclude that the proportion of stocks which
set new highs on that day has changed?

5. A company is trying to improve distribution of its brand of frozen desserts. To accomplish
this, it has expanded its sales force to push the product into new outlets. Prior to expanding
the sales force, the company knew that 44% of grocery stores carried at least one of its
products. After hiring the new salespeople, a sample of 200 grocery stores found 52%
carrying the brand.

At α = 0.05, can the company conclude that distribution has changed?


6. AMD produces processors. One of the machines is suspected to generate too many defects
per processor. A quality control test measures defects on a sample of 36 processors. The
test finds that the average number of defects per processor in the sample is 110, with a
sample standard deviation of 20.

At a significance level of 5%, is there sufficient evidence to conclude that this machine
works differently from a regular machine, which produces on average 100 defects per
processor?

7. In a middle-size city the household incomes are normally distributed with a mean of
$25,000 and a standard deviation of $2,000. From a specific area of this city a sample of
n = 15 households has a mean income of $24,000.
At a level of significance of 5%, is there enough evidence to say that the income in this area
is different from the average? Would you change your decision at a level of significance of
10%?


Extra Exercises SET 5


General Overview

1. The Ministry of Environment in Singapore has decided to cut the gas emissions of cars
by encouraging its citizens to use electric cars.

a. Studies have shown that the electricity consumption of electric cars is normally distributed
with a mean use of 30 miles per unit of electricity (mpu) and a standard deviation of 4 mpu.
What percentage of electric cars can obtain 35 or more miles per unit of electricity?

b. What must be the mpu so that 5% of the electric cars in the market perform better than this
threshold?

c. The Ministry of Environment wants to test the claim of existing studies that µ = 30 mpu by
taking a sample of 100 electric cars. The mean of this sample is 28 mpu, while the sample
variance is 30.25. What can the ministry conclude? What is the P-value?

d. In estimating the real population mpu the Ministry does not want the error involved to
exceed 0.5 mpu. Explain what needs to be done and show whatever computations are
required, if any.

e. The most important component in an electric car is its electrical fuse. The reliability of an
electrical fuse is measured as the probability that a fuse, chosen at random from a production
line, will function under the conditions for which it has been designed.

A random sample of 20 fuses was tested. Calculate the probability of observing 3 or more
defectives, assuming that the fuse reliability is designed to be 0.95.
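
For part (e), if the 20 fuses are treated as independent trials, each defective with probability 1 - 0.95 = 0.05 (an assumption implied by the reliability definition), the number of defectives is binomial. A minimal sketch with a hypothetical helper name:

```python
from math import comb


def binomial_tail_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of P(X <= k - 1)."""
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


p_defective = 1 - 0.95  # a fuse fails with probability 1 - reliability
prob_three_or_more = binomial_tail_at_least(3, 20, p_defective)
print(round(prob_three_or_more, 4))
```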


2. Employees at some corporations can opt to work at their homes part of the work week
(this type of employee is referred to as a “guerrilla telecommuter” by management
consultants). At British Columbia Systems Corp. in Canada, the information technology
provider for the government, a sample of 20 home-based telecommuters was selected. It
was found in the sample that the average home telecommuter personally saves $1700 a
year in expenses. Suppose that the sample standard deviation was $300.

Construct a 99% confidence interval for the average savings by a home-based
telecommuter.

3. Footloose disco club

a. Cheby Chev, the barman of the Footloose disco club, gets a tip of $2 for each cocktail he
prepares.
If, on average, 5 cocktails are ordered per half-hour, what is the probability that he will
collect exactly $30 in tips in one hour?

b. The “best-seller” of Footloose is the tequila sunrise. The number of tequila sunrises sold,
Q, is normally distributed with mean 250. It is known that Prob(Q > 275) = 0.15. What is
the variance?

c. It is difficult to get good seats at the Footloose club. A group of 8 friends has paid Cheby
to reserve a corner of the pub just for them. Each of the 8 friends has a 50% chance of
coming to the pub on any given Saturday. Moreover, knowing that one friend is coming
does not help us to predict whether any other friend is coming (i.e., their attendance is
independent).

What is the minimum number (m) of seats that should be put in the corner so that there is,
at least, an 85% chance that everyone in the group gets a seat?
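
Parts (a) and (c) can be sketched as follows. The sketch assumes cocktail orders follow a Poisson process (the exercise gives only an average rate, so this is a modelling assumption) and that attendance is binomial with n = 8 and p = 0.5, as the independence statement suggests. All function names are hypothetical.

```python
from math import comb, exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def min_seats(n, p, target):
    """Smallest m such that P(number attending <= m) >= target."""
    for m in range(n + 1):
        if binomial_cdf(m, n, p) >= target:
            return m
    return n


# (a) $30 at $2 per cocktail means exactly 15 cocktails;
#     5 per half-hour corresponds to a rate of 10 per hour
p_30_dollars = poisson_pmf(15, 10)

# (c) seats needed so that everyone who comes is seated with probability >= 0.85
seats = min_seats(8, 0.5, 0.85)

print(round(p_30_dollars, 4), seats)
```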


4. A production superintendent for a pharmaceutical manufacturer must decide when to
adjust the mixing controls for the final stage of production of a pain killer. He wants to base
these everyday decisions on the mean weight of inert material per tablet. The desired mean
value is 0.5 mg for each tablet, although some fluctuations around this mean are allowed.

The standard deviation of the process is equal to 0.1 mg.

The superintendent wants above all to avoid the error of unnecessarily adjusting the controls
when the mean actually falls within the desired limits.

a. Given the information, and using an alpha level of 5%, set up the decision rule based on
the weight of a sample of 25 tablets.
b. If the mean of the process has become 0.518 mg, what is the probability that the sample
will detect that the process should be adjusted, using an alpha level of 5%?
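
A sketch of both parts, assuming a two-sided rule (adjust when the sample mean is too high or too low) and a normally distributed sample mean with known σ. The exercise does not state the sidedness, so that choice is an assumption; the function name is hypothetical.

```python
import math
from statistics import NormalDist

MU0, SIGMA, N, ALPHA = 0.5, 0.1, 25, 0.05  # values given in the exercise

se = SIGMA / math.sqrt(N)                 # standard error of the sample mean
z = NormalDist().inv_cdf(1 - ALPHA / 2)   # two-sided critical value

# (a) decision rule: adjust the controls if the sample mean falls outside [lower, upper]
lower, upper = MU0 - z * se, MU0 + z * se


def detection_probability(true_mean):
    """Probability that the sample mean falls outside the acceptance region."""
    d = NormalDist(true_mean, se)
    return d.cdf(lower) + (1 - d.cdf(upper))


# (b) chance of detecting a shift of the process mean to 0.518 mg
power = detection_probability(0.518)
print(round(lower, 4), round(upper, 4), round(power, 4))
```

Evaluating `detection_probability(0.5)` returns the alpha level itself, which is a quick check that the rule is set up consistently.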


6. Day-Trading

A sizable part of internet-based trading is due to day-trading (extremely short-term
speculations, usually in highly volatile stocks).

Bill, a novice in day-trading, has traded for 10 days. Since he closes his positions at the end
of every trading day, Bill believes that a profit or loss on any given trading day is
independent of what happened or will happen on any other trading day. For these 10 days,
average return was -10% with a (sample) standard deviation of 10%. Bill thinks that the
situation is not likely to change in the future, and wants to estimate his prospects of
continuing day-trading.

Assume that the distribution of daily returns is normal.

a. Construct an 80% confidence interval for Bill’s average daily return (if he continues
trading).

b. Suppose that indeed, if Bill continues trading, his daily return will be distributed with
mean = -10% and standard deviation = 10%.

- What is the probability of having a positive return on a given day?

- What is the probability that during a week (5 trading days), he will have at least 3
positive returns?
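
Part (b) chains two steps: a normal probability for a single day, then a binomial count over five independent days. A minimal sketch under the exercise's normality and independence assumptions (variable names are illustrative):

```python
from math import comb
from statistics import NormalDist

# Daily return: normal with mean -10% and standard deviation 10%
daily = NormalDist(mu=-0.10, sigma=0.10)

# Probability of a positive return on a given day
p_up = 1 - daily.cdf(0.0)

# Probability of at least 3 positive days out of 5 independent trading days
p_three_plus = sum(
    comb(5, k) * p_up**k * (1 - p_up) ** (5 - k) for k in range(3, 6)
)

print(round(p_up, 4), round(p_three_plus, 4))
```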


7. O’Duck’s Weight-Loss Burgers

O’Duck’s Fast Foods is considering launching a new diet program, based on a new line of
weight-loss burgers. As the head of marketing, you get eight randomly chosen people to go on
a test diet, where they eat nothing but your burgers for a month. Their weight change over this
period is as listed below.

The sample average is -1.35 Kg (a loss of 1.35 Kg), and the sample standard deviation is
2.57 Kg.

For each of the subsequent questions, state your assumptions, if any.

(a) Construct a 90% confidence interval for the average impact of the diet on weight.

(b) You choose a new sample of size 30. The sample average is now -1 kg, and the sample
standard deviation is 2.4 Kg.

You would like to test whether the diet leads to a weight change in either direction (either
gaining or losing weight). Can you reject the hypothesis that the burger diet has no impact on
weight, at a 0.05 significance level?

(c) Noticing that 6 out of 8 people in the test program lost weight, you’re considering basing
your advertising on this fact: “3 out of 4 people eating O’Duck’s burgers lose weight!”
Suppose the diet has no effect on weight, so that each person is as likely to lose as to gain weight
over the test period. For the alternative hypothesis that the diet, on average, leads to weight
change, what is the probability of seeing 6 or more people out of 8 who lost weight?

(d) Seeing that one of your trial participants lost 5.4 Kg, you’re trying to make sense of how
unusual an outcome this is. You look up some data and find that, for the adult population as a
whole, the standard deviation of month-to-month weight change is 1.5 Kg (you assume the
mean change to be zero).

If the distribution of weight changes is normal, what is the probability of seeing a change of 5.4
Kg or more, in either direction?


Extra Exercises SET 6


1. Food a la carte
Food a la carte, a leader in the French restaurant market, is investigating opportunities for
opening a new restaurant in town. Competition is very high and market shares are
shrinking. Before deciding whether or not to go into business, Ms Croquette, operations
manager for Food a la carte, would like to understand which factors make a new
restaurant successful. Hence, Ms Croquette decides to collect data on several relevant
variables that may have an impact on the profitability of a new restaurant in town:

1. Total profit from operations in Thousands of Euros. (PROFIT)
2. Total area of the store in m². (SIZE)
3. Number of employees employed by the store. (EMPL)
4. Total population in 3km radius around site. (TOTAL)
5. Average income in town in Thousands of Euros. (INC)
6. Number of competitors in a 1km radius around site. (COMP)
7. Number of restaurants that do not compete directly with Food a la carte.
(NCOMP)
8. Number of non-restaurant businesses in 1km radius around site. (NREST)
9. Cost of rent per square meter in Euros. (PRICE)
10. Cost of living index. (CLI)

To begin with, she collects 50 observations for the entire set of variables and starts building
a model to predict total profit (PROFIT).

a. What can you infer from the Matrix of Simple Correlation (Exhibit 1)?

b. What can you infer from the regression analysis in Exhibit 2?

c. Ms Croquette then prepares several different models. Which model would you select
among MODELS 1 to 6 (Runs 1 to 6 in Exhibit 3)?

Explain your reasoning. Please be precise and concise.


d. An external consultant, Mr Gourmet, has proposed his best model to predict PROFIT.
Exhibit 4 refers to his best model. From studying Exhibits 4(a) – 4(e), what can you
conclude about the assumptions for regression? How would you correct for problems,
if any? Do you need to make any assumptions? Motivate your answers by indicating
the appropriate exhibit. Please be precise and concise.

e. Based on MODEL 2, estimate the impact on PROFIT of one unit increase in SIZE.
Give a point estimate and a 99% confidence interval.

f. Based on MODEL 2, provide a 95% prediction interval for PROFIT.
The following values for the independent variables are given:
SIZE=100, EMPL=20, PRICE=50.
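
For part (f), a point prediction comes from plugging the given values into the fitted equation. The sketch below assumes that MODEL 2 corresponds to Run 2 of Exhibit 3 and copies its coefficients from there. The interval shown is only a rough approximation, ŷ ± z × (standard deviation of regression): it ignores the uncertainty in the estimated coefficients and uses a normal rather than a t critical value, so an exact prediction interval would be somewhat wider.

```python
from statistics import NormalDist

# Coefficients and standard deviation of regression copied from Exhibit 3, Run 2
# (assumed here to be "MODEL 2")
MODEL2 = {"const": 164.441, "SIZE": 4.523, "EMPL": -7.569, "PRICE": 22.178}
SD_REGRESSION = 94.659

site = {"SIZE": 100, "EMPL": 20, "PRICE": 50}  # values given in the exercise

# Point prediction of PROFIT (in thousands of euros)
y_hat = MODEL2["const"] + sum(MODEL2[name] * value for name, value in site.items())

# Rough 95% interval: ignores coefficient uncertainty (approximation only)
z = NormalDist().inv_cdf(0.975)
rough_interval = (y_hat - z * SD_REGRESSION, y_hat + z * SD_REGRESSION)

print(round(y_hat, 3), tuple(round(v, 1) for v in rough_interval))
```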


Exhibit 1
Matrix of Simple Correlation Coefficients

          SIZE    EMPL   TOTAL     INC    COMP   NCOMP   NREST   PRICE     CLI
SIZE     1.000   0.706  -0.065   0.323  -0.142   0.014  -0.153   0.102   0.219
EMPL     0.706   1.000  -0.106   0.048   0.087   0.142  -0.176   0.079   0.104
TOTAL   -0.065  -0.106   1.000   0.103  -0.116   0.061   0.046   0.027   0.240
INC      0.323   0.048   0.103   1.000  -0.102   0.753  -0.108   0.027   0.109
COMP    -0.142   0.087  -0.116  -0.102   1.000   0.199   0.060  -0.357  -0.011
NCOMP    0.014   0.142   0.061   0.753   0.199   1.000  -0.036  -0.205   0.031
NREST   -0.153  -0.176   0.046  -0.108   0.060  -0.036   1.000  -0.061  -0.264
PRICE    0.102   0.079   0.027   0.027  -0.357  -0.205  -0.061   1.000   0.269
CLI      0.219   0.104   0.240   0.109  -0.011   0.031  -0.264   0.269   1.000

PROFIT   0.824   0.025   0.126   0.453  -0.299  -0.043  -0.032   0.545   0.335

Exhibit 2
Regression analysis

Regression Statistics
Multiple R 0.987
R Square 0.974
Adjusted R Square 0.969
Standard Deviation of Regression 52.289
Observations 50
D.F.Numerator 9
D.F.Denominator 40

Dependent Variable PROFIT


Independent Variable Coefficient Standard Error t-stat P-value 0.05 Significance?
Constant: a -689.370 140.637 -4.902 0.000 Y
SIZE 4.247 0.161 26.360 0.000 Y
EMPL -5.740 2.085 -2.753 0.009 Y
TOTAL 0.005 0.001 4.320 0.000 Y
INC 17.402 2.396 7.262 0.000 Y
COMP 0.309 3.341 0.092 0.927 N
NCOMP 1.487 2.190 0.679 0.501 N
NREST 1.571 0.345 4.552 0.000 Y
PRICE 22.019 1.358 16.215 0.000 Y
CLI 1.303 1.024 1.272 0.211 N

Exhibit 3
Run 1
Regression Statistics
Multiple R 0.952
R Square 0.907
Adjusted R Square 0.898
Standard Deviation of Regression 93.997
Observations 50
D.F.Numerator 4
D.F.Denominator 45

Dependent Variable PROFIT


Independent Variable Coefficient Standard Error t-stat P-value 0.05 Significance?
Constant: a 134.381 62.568 2.148 0.037 Y
SIZE 4.517 0.265 17.065 0.000 Y
EMPL -8.317 3.655 -2.276 0.028 Y
NCOMP 4.924 3.834 1.284 0.206 N
PRICE 22.801 2.206 10.336 0.000 Y

Run 2
Regression Statistics
Multiple R 0.950
R Square 0.903
Adjusted R Square 0.897
Standard Deviation of Regression 94.659
Observations 50
D.F.Numerator 3
D.F.Denominator 46

Dependent Variable PROFIT


Independent Variable Coefficient Standard Error t-stat P-value 0.05 Significance?
Constant: a 164.441 58.434 2.814 0.007 Y
SIZE 4.523 0.266 16.975 0.000 Y
EMPL -7.569 3.634 -2.083 0.043 Y
PRICE 22.178 2.167 10.233 0.000 Y

Run 3
Regression Statistics
Multiple R 0.984
R Square 0.969
Adjusted R Square 0.960
Standard Deviation of Regression 55.222
Observations 50
D.F.Numerator 5
D.F.Denominator 44

Dependent Variable PROFIT


Independent Variable Coefficient Standard Error t-stat P-value 0.05 Significance?
Constant: a -594.684 79.624 -7.469 0.000 Y
SIZE 4.255 0.165 25.713 0.000 Y
TOTAL 0.006 0.001 4.985 0.000 Y
INC 17.494 2.496 7.010 0.000 Y
NREST 1.597 0.348 4.590 0.000 Y
PRICE 21.971 1.264 17.389 0.000 Y


Run 4
Regression Statistics
Multiple R 0.986
R Square 0.973
Adjusted R Square 0.971
Standard Deviation of Regression 51.860
Observations 50
D.F.Numerator 6
D.F.Denominator 43

Dependent Variable PROFIT


Independent Variable Coefficient Standard Error t-stat P-value 0.05 Significance?
Constant: a -535.928 78.053 -6.866 0.000 Y
SIZE 4.280 0.156 27.489 0.000 Y
EMPL -5.321 2.027 -2.625 0.012 Y
TOTAL 0.006 0.001 5.021 0.000 Y
INC 17.606 2.344 7.511 0.000 Y
NREST 1.462 0.330 4.420 0.000 Y
PRICE 22.180 1.189 18.650 0.000 Y

Run 5
Regression Statistics
Multiple R 0.984
R Square 0.971
Adjusted R Square 0.964
Standard Deviation of Regression 55.859
Observations 50
D.F.Numerator 6
D.F.Denominator 43

Dependent Variable PROFIT


Independent Variable Coefficient Standard Error t-stat P-value 0.05 Significance?
Constant: a -596.202 85.053 -7.010 0.000 Y
SIZE 4.256 0.168 25.310 0.000 Y
TOTAL 0.006 0.001 4.901 0.000 Y
INC 17.501 2.527 6.925 0.000 Y
COMP 0.192 3.455 0.056 0.956 N
NREST 1.600 0.352 4.535 0.000 Y
PRICE 21.997 1.362 16.149 0.000 Y

Run 6
Regression Statistics
Multiple R 0.550
R Square 0.302
Adjusted R Square 0.257
Standard Deviation of Regression 254.284
Observations 50
D.F.Numerator 3
D.F.Denominator 46

Dependent Variable PROFIT


Independent Variable Coefficient Standard Error t-stat P-value 0.05 Significance?
Constant: a 504.701 141.768 3.560 0.001 Y
SIZE 5.840 4.945 1.180 0.242 N
EMPL -4.468 6.600 -0.676 0.502 N
NCOMP -3.792 6.790 -0.558 0.579 N


Exhibit 4(a)
Plot of Residuals against Observation Number
[Plot not reproduced. Vertical axis: Residual (-15.0 to 25.0); horizontal axis: Observations (Ordered by Profit), 0 to 55.]

Exhibit 4(b)
Plot of Residuals against Predicted Values
[Plot not reproduced. Vertical axis: Residual (-600 to 800); horizontal axis: Predicted (800 to 1600).]


Exhibit 4(c)
Plot of Profit against Size
[Plot not reproduced. Vertical axis: Profit (400 to 2000); horizontal axis: Size (50 to 300).]

Exhibit 4(d)
Plot of Profit against NREST
[Plot not reproduced. Vertical axis: Profit (400 to 2000); horizontal axis: NREST (0 to 100).]


Exhibit 4(e)
Histogram of the Residuals with the Superimposed Normal Curve
[Histogram not reproduced. Horizontal axis runs from -175 to 175.]


2. Internet Users

A lot of business nowadays involves advertising and direct sales via internet. To predict the
number of internet users, the following data were collected for the year 2000:

Variable / Unit / Description

GDP per Capita
    Unit: One unit is one US $ per Capita
    Description: Gross Domestic Product per Capita, in constant US $
Personal Computers
    Unit: One unit is one Computer per 1,000 people
    Description: Number of Personal Computers per 1,000 people
Mobile Phones
    Unit: One unit is one Mobile Phone per 1,000 people
    Description: Number of Mobile Phones per 1,000 people
Television Sets
    Unit: One unit is one Television Set per 1,000 people
    Description: Number of Television Sets per 1,000 people
Electric Power per Capita
    Unit: One unit is Kwh per Capita
    Description: Electric Power Consumption per capita, in Kwh (kilowatt-hours)
Internet Users
    Unit: One unit is one Internet User per 1,000 people
    Description: Number of Internet Users per 1,000 people

The data were collected for all countries with GDP per Capita exceeding 1,000 US $, and
ordered by GDP per Capita. In all regression models, Internet Users is the dependent variable.

a. What can you infer from the correlation matrix of the variables (Exhibit 1)?

b. From Regression Models 1-5 (Exhibit 2), which model is the best? Please justify your
answer.

c. In Regression Model 4, three important statistics are missing for the intercept:
t-stat, p-value, and significance at the 0.05 level. Please compute them.

d. Exhibit 3 shows the Analysis of the Residuals (Residuals vs. Predicted values and
Histogram of the residuals) for Regression Model 4.

Are the regression assumptions satisfied? If not, what could be the reason and what
would you do to improve the model?

e. Interpret the regression coefficient corresponding to the independent variable “Personal
Computers” in Regression Model 4.

Compute a 95% confidence interval for this coefficient.


f. Use Regression Model 4 to compute a 95% prediction interval for the number of internet
users per 1,000 people in Singapore.
The data for Singapore is as follows:

GDP per Capita 22,767
Personal Computers 483
Mobile Phones 684
Television Sets 304
Electric Power per Capita 6,889
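
Parts (c) and (f) both use the Regression Model 4 output in Exhibit 2. The sketch below copies the coefficients from that exhibit, computes the intercept's t-stat as coefficient divided by standard error, and approximates the p-value with the standard normal distribution, which is close to the t distribution at 90 degrees of freedom but not identical. The interval for Singapore is a rough approximation that ignores uncertainty in the estimated coefficients; variable names are illustrative.

```python
from statistics import NormalDist

# Regression Model 4 figures copied from Exhibit 2
INTERCEPT, INTERCEPT_SE = -11.9771, 8.6771
COEF = {"pc": 0.3965, "mobile": 0.1562, "power": 0.0087}
SD_REGRESSION = 55.7506

# (c) t-stat for the intercept; p-value via a normal approximation (90 d.f.)
t_stat = INTERCEPT / INTERCEPT_SE
p_value_approx = 2 * NormalDist().cdf(-abs(t_stat))
significant_at_5pct = p_value_approx < 0.05

# (f) point prediction for Singapore (Internet users per 1,000 people)
singapore = {"pc": 483, "mobile": 684, "power": 6889}
y_hat = INTERCEPT + sum(COEF[name] * singapore[name] for name in COEF)

# Rough 95% interval: y_hat +/- z * (standard deviation of regression)
z = NormalDist().inv_cdf(0.975)
rough_interval = (y_hat - z * SD_REGRESSION, y_hat + z * SD_REGRESSION)

print(round(t_stat, 4), round(p_value_approx, 4), significant_at_5pct)
print(round(y_hat, 2), tuple(round(v, 1) for v in rough_interval))
```

GDP per Capita and Television Sets are not used because they are not among the independent variables of Regression Model 4.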

Exhibit 1
Matrix of Simple Correlation Coefficients

                           GDP per     Personal    Television  Mobile      Electric Power
                           Capita      Computers   Sets        Phones      per Capita
GDP per Capita             1.0000      0.8919      0.7036      0.8309      0.7902
Personal Computers         0.8919      1.0000      0.7056      0.7967      0.7290
Television Sets            0.7036      0.7056      1.0000      0.6687      0.6317
Mobile Phones              0.8309      0.7967      0.6687      1.0000      0.6767
Electric Power per Capita  0.7902      0.7290      0.6317      0.6767      1.0000

Internet Users             0.8203      0.8774      0.6898      0.8279      0.8011


Exhibit 2: Regression Models

Regression Model 1
Regression Statistics
Multiple R 0.8203
R Square 0.6729
Adjusted R Square 0.6693
Standard Deviation of Regression 81.9948
Observations 94
D.F.Numerator 1
D.F.Denominator 92

Analysis of Variance (ANOVA)

Source Sum of Squares d.f. Mean Square F P-value


Regression 1272321.434 1 1272321.434 189.244877 0
Residual 618529.6728 92 6723.148618
Total 1890851.106 93

Dependent Variable Internet Users

Independent Variable Coefficient Standard Error t-stat P-value 0.05 Significance?


Constant: a 17.9176 11.7766 1.521459326 0.1316 N
GDP per Capita 0.0111 0.0008 13.75663029 0.0000 Y

Regression Model 2
Regression Statistics
Multiple R 0.9030
R Square 0.8155
Adjusted R Square 0.8093
Standard Deviation of Regression 62.2634
Observations 94
D.F.Numerator 3
D.F.Denominator 90

Analysis of Variance (ANOVA)

Source Sum of Squares d.f. Mean Square F P-value


Regression 1541944.8 3 513981.5998 132.580991 0
Residual 348906.3068 90 3876.736743
Total 1890851.106 93

Dependent Variable Internet Users

Independent Variable Coefficient Standard Error t-stat P-value 0.05 Significance?


Constant: a -6.8389 9.6141 -0.711335632 0.4787 N
GDP per Capita -0.0004 0.0015 -0.258363598 0.7967 N
Personal Computers 0.5454 0.0911 5.98373515 0.0000 Y
Mobile Phones 0.2041 0.0470 4.343585869 0.0000 Y


Regression Model 3
Regression Statistics
Multiple R 0.9231
R Square 0.8522
Adjusted R Square 0.8455
Standard Deviation of Regression 56.0402
Observations 94
D.F.Numerator 4
D.F.Denominator 89

Analysis of Variance (ANOVA)

Source Sum of Squares d.f. Mean Square F P-value


Regression 1611345.783 4 402836.4457 128.271 0
Residual 279505.3235 89 3140.509253
Total 1890851.106 93

Dependent Variable Internet Users

Independent Variable Coefficient Standard Error t-stat P-value 0.05 Significance?


Constant: a -14.1843 11.9902 -1.182989055 0.2400 N
Personal Computers 0.3912 0.0691 5.662723671 0.0000 Y
Television Sets 0.0113 0.0421 0.268272458 0.7891 N
Mobile Phones 0.1540 0.0399 3.861960219 0.0002 Y
Electric Power per Capita 0.0086 0.0019 4.55894168 0.0000 Y

Regression Model 4
Regression Statistics
Multiple R 0.9231
R Square 0.8521
Adjusted R Square 0.8471
Standard Deviation of Regression 55.7506
Observations 94
D.F.Numerator 3
D.F.Denominator 90

Analysis of Variance (ANOVA)

Source Sum of Squares d.f. Mean Square F P-value


Regression 1611119.76 3 537039.92 172.785758 0
Residual 279731.3463 90 3108.12607
Total 1890851.106 93

Dependent Variable Internet Users

Independent Variable Coefficient Standard Error t-stat P-value 0.05 Significance?
Constant: a -11.9771 8.6771
Personal Computers 0.3965 0.0658 6.022392222 0.0000 Y
Mobile Phones 0.1562 0.0388 4.021884879 0.0001 Y
Electric Power per Capita 0.0087 0.0018 4.726459761 0.0000 Y


Regression Model 5
Regression Statistics
Multiple R 0.9112
R Square 0.8302
Adjusted R Square 0.8265
Standard Deviation of Regression 59.3902
Observations 94
D.F.Numerator 2
D.F.Denominator 91

Analysis of Variance (ANOVA)

Source Sum of Squares d.f. Mean Square F P-value


Regression 1569875.926 2 784937.9632 222.538561 0
Residual 320975.1799 91 3527.199779
Total 1890851.106 93

Dependent Variable Internet Users

Independent Variable Coefficient Standard Error t-stat P-value 0.05 Significance?


Constant: a 0.6676 8.6890 0.076834138 0.9389 N
Personal Computers 0.5364 0.0573 9.357745811 0.0000 Y
Electric Power per Capita 0.0113 0.0020 5.690454334 0.0000 Y


Exhibit 3: Analysis of Residuals for Regression Model 4

[Plot not reproduced: Residuals (vertical axis, -200 to 150) against Predicted values (horizontal axis, -100 to 600).]

[Plot not reproduced: Histogram of Residuals. Vertical axis: Absolute Frequency (0 to 30); horizontal axis: residuals from -200 to 225.]


3. TechProducts Sales

Bob Smart is the CEO of TechProducts, a manufacturer and distributor of high tech
products, before they become commodities. TechProducts has signed a number of
strategic alliances with two big high tech firms to license their products, as they are being
commoditized. Consequently, TechProducts is producing and distributing cheaper
versions of such products for the medium and low ends of the market. Bob is concerned
about the sales of external memory cards (used in cameras, PDAs, and hand held
computers) that his company is producing. These cards account for over 25% of its
revenues and about 1/3 of its profits. There are several questions that Bob is not sure
about. For instance, is it more beneficial to advertise his memory cards in trade magazines,
or spend more money on promotions? In addition, he is not sure about the effect of price
increases/decreases on sales, or the influence of advertising and promotions done by
competitors. To improve his insights concerning these and similar questions, he asked
his assistant, John Timber, to collect as much data as possible and run regressions
(remembering from his days at INSEAD that regression could provide useful
information). Bob hopes that this will clarify his concerns, and help him make more
intelligent decisions.

The monthly data John has collected consists of Sales, the dependent variable, and six
independent ones. These are described briefly below:

1. Total monthly sales of memory cards, minus returns, in Thousands of Boxes
(each box contains six memory cards). (SALES) The capacity of memory cards
varied from 128K to 1000K, and with it the price.

2. Total monthly budget in Thousands of Dollars spent on advertising, mostly in
trade journals. (ADV)

3. Total monthly budget in Thousands of Dollars spent on encouraging distributors
to promote TechProducts memory cards by displaying them in prominent places
in their stores, or by selling them cheaper. (PROMOT)

4. Average monthly price of the memory cards shipped during the month, in
Dollars. (PRICE)

5. Total monthly advertising budget spent by TechProducts’ competitors (also
mainly used in trade magazines). (COMP.ADV)


6. Total monthly promotional budget spent by TechProducts’ competitors. Unlike
for competitive advertising, the estimates of promotional spending are not very
reliable, which reduces the trustworthiness of these numbers.
(COMP.PROMOT)

7. Occasionally, TechProducts would find itself with a high inventory of memory
cards, or cards of lesser memory capacity than those demanded in the market. In
such cases, TechProducts provides the extra/unwanted cards to big discounters
that sell them at reduced prices ranging between 20% and 40%. The result is that
the cards are sold, but at reduced profit margins that cover costs and a small part
of the fixed expenses. During the months that such deals are provided to
Discounters, this independent variable takes the value 1; otherwise its value is
zero. (DISCOUNTERS)


Please answer the following questions precisely and concisely, consulting
Exhibits 1, 2 and 3.

Question 1 (please refer to Exhibit 1):

(a) After studying Exhibit 1, are there any potential problems that you should be aware of?
(b) What does the correlation coefficient of -0.8089 between PRICE and
DISCOUNTERS indicate?

Question 2 (please refer to Exhibit 2):

(a) In your view, which is the best of the six Regression Runs listed in Exhibit 2?
(b) Write down the regression equation you chose in part (a) above, and explain the precise
meaning of the regression coefficients a and b_i.
(c) Construct a 99% confidence interval for the values of a and b in Run 6.
(d) In Regression Run 3, test the hypothesis that the value of the regression coefficient
b_PRICE = -10, against the alternative that it is different from -10.
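
The arithmetic behind parts (c) and (d) can be sketched as follows. A confidence interval takes the form coefficient +/- t x standard error, and the test statistic is (estimate - hypothesised value) / standard error. The helper names are illustrative only, the critical values are hard-coded from a standard Student t table rather than taken from the exercise, and the 5% significance level in part (d) is an assumption because the question does not state one.

```python
# Sketch only: helper names and the hard-coded t-table values are assumptions,
# not part of the original exercise.

def confidence_interval(coef, std_err, t_crit):
    """Return (lower, upper) for coef +/- t_crit * std_err."""
    margin = t_crit * std_err
    return coef - margin, coef + margin


def t_statistic(coef, std_err, hypothesised):
    """t-statistic for H0: true coefficient equals `hypothesised`."""
    return (coef - hypothesised) / std_err


# Part (c): Run 6 reports 40 denominator degrees of freedom.
T_CRIT_99_DF40 = 2.704  # two-sided 99%, 40 d.f., from a standard t table
ci_a = confidence_interval(2416.92, 229.61, T_CRIT_99_DF40)
ci_b = confidence_interval(-9.39, 1.65, T_CRIT_99_DF40)

# Part (d): Run 3 reports 37 denominator degrees of freedom.
T_CRIT_95_DF37 = 2.026  # two-sided 95%, 37 d.f. (the 5% level is assumed)
t_price = t_statistic(-8.19, 1.09, -10.0)
reject_h0 = abs(t_price) > T_CRIT_95_DF37

print(f"99% CI for a: ({ci_a[0]:.2f}, {ci_a[1]:.2f})")
print(f"99% CI for b: ({ci_b[0]:.2f}, {ci_b[1]:.2f})")
print(f"t for H0: b_PRICE = -10 is {t_price:.2f}; reject at 5%? {reject_h0}")
```

Changing the assumed significance level only requires swapping in the matching critical value from the t table.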

Question 3 (please refer to Exhibit 3):

By relating each specific part of Exhibit 3 to the various assumptions of regression, explain
whether each assumption is satisfied. If necessary, specify what other information
you may want to seek to answer this question.

Question 4:

After having studied the various Regression Runs and answered the questions
above, what is your best advice for Bob? Is it more beneficial to advertise his memory
cards in trade magazines or to spend more money on promotions? Please be brief and
precise.

Exhibit 1
Matrix of Simple Correlations

                     ADV      PROMOT       PRICE    COMP.ADV COMP.PROMOT DISCOUNTERS
ADV               1.0000     -0.2704     -0.2184     -0.5327      0.1799      0.3354
PROMOT           -0.2704      1.0000      0.2358      0.0727      0.4150     -0.2938
PRICE            -0.2184      0.2358      1.0000      0.2746     -0.0599     -0.8089
COMP.ADV         -0.5327      0.0727      0.2746      1.0000     -0.1950     -0.1797
COMP.PROMOT       0.1799      0.4150     -0.0599     -0.1950      1.0000      0.2357
DISCOUNTERS       0.3354     -0.2938     -0.8089     -0.1797      0.2357      1.0000

SALES             0.5968      0.0433     -0.6686     -0.6062      0.3388      0.6530
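
One way to scan a correlation matrix like this for strongly related independent variables is sketched below. The 0.7 cut-off and all names are illustrative assumptions, not part of the exercise; the figures are copied from Exhibit 1.

```python
# Sketch only: the 0.7 cut-off and all names are illustrative assumptions.
VARIABLES = ["ADV", "PROMOT", "PRICE", "COMP.ADV", "COMP.PROMOT", "DISCOUNTERS"]

# Correlations among the independent variables, copied from Exhibit 1.
CORR = [
    [1.0000, -0.2704, -0.2184, -0.5327, 0.1799, 0.3354],
    [-0.2704, 1.0000, 0.2358, 0.0727, 0.4150, -0.2938],
    [-0.2184, 0.2358, 1.0000, 0.2746, -0.0599, -0.8089],
    [-0.5327, 0.0727, 0.2746, 1.0000, -0.1950, -0.1797],
    [0.1799, 0.4150, -0.0599, -0.1950, 1.0000, 0.2357],
    [0.3354, -0.2938, -0.8089, -0.1797, 0.2357, 1.0000],
]


def strongly_correlated_pairs(names, corr, threshold=0.7):
    """List (name_i, name_j, r) for each pair with |r| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only
            if abs(corr[i][j]) >= threshold:
                pairs.append((names[i], names[j], corr[i][j]))
    return pairs


flagged = strongly_correlated_pairs(VARIABLES, CORR)
for first, second, r in flagged:
    print(f"{first} and {second}: r = {r:.4f}")
```

A lower threshold flags more pairs; the choice of cut-off is a judgement call rather than a fixed rule.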

Exhibit 2

Run 1
Regression Statistics
Multiple R 0.911
R Square 0.830
Adjusted R Square 0.801
Standard Deviation of Regression 62.968
Observations 42
D.F.Numerator 6
D.F.Denominator 35

Dependent Variable SALES


Independent Variable Coefficient Standard Error t-stat P-value
Constant: a 624.55 356.30 1.75 0.088
ADV 3.03 0.78 3.90 0.000
PROMOT 4.83 1.29 3.76 0.001
PRICE -4.81 1.85 -2.60 0.014
COMP.ADV -0.17 0.05 -3.32 0.002
COMP.PROMOT -0.04 0.23 -0.18 0.860
DISCOUNTERS 108.80 50.63 2.15 0.039
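
As a reading aid for the run tables, the sketch below recomputes two of the reported quantities from Run 1: each t-stat as coefficient / standard error, and Adjusted R Square as 1 - (1 - R^2)(n - 1)/(n - k - 1). The names are illustrative assumptions. Because the printed coefficients and standard errors are rounded, the recomputed t-stats agree with the table only approximately.

```python
# Sketch only: names are illustrative; figures are copied from Run 1.
# Each entry is (coefficient, standard error, reported t-stat).
RUN1 = {
    "Constant": (624.55, 356.30, 1.75),
    "ADV": (3.03, 0.78, 3.90),
    "PROMOT": (4.83, 1.29, 3.76),
    "PRICE": (-4.81, 1.85, -2.60),
    "COMP.ADV": (-0.17, 0.05, -3.32),
    "COMP.PROMOT": (-0.04, 0.23, -0.18),
    "DISCOUNTERS": (108.80, 50.63, 2.15),
}


def adjusted_r_square(r_square, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)


# t-stat recomputed from the rounded table entries.
recomputed_t = {name: coef / se for name, (coef, se, _) in RUN1.items()}
adj_r2 = adjusted_r_square(0.830, 42, 6)

for name, (_, _, reported) in RUN1.items():
    print(f"{name:12s} recomputed t = {recomputed_t[name]:6.2f}, reported {reported:6.2f}")
print(f"Adjusted R Square recomputed: {adj_r2:.3f}")
```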

Run 2
Regression Statistics
Multiple R 0.911
R Square 0.830
Adjusted R Square 0.806
Standard Deviation of Regression 62.115
Observations 42
D.F.Numerator 5
D.F.Denominator 36

Dependent Variable SALES


Independent Variable Coefficient Standard Error t-stat P-value
Constant: a 639.04 342.09 1.87 0.070
ADV 3.01 0.76 3.96 0.000
PROMOT 4.70 1.05 4.48 0.000
PRICE -4.91 1.74 -2.82 0.008
COMP.ADV -0.17 0.05 -3.39 0.002
DISCOUNTERS 104.88 44.92 2.33 0.025

Run 3
Regression Statistics
Multiple R 0.897
R Square 0.804
Adjusted R Square 0.763
Standard Deviation of Regression 65.746
Observations 42
D.F.Numerator 4
D.F.Denominator 37

Dependent Variable SALES


Independent Variable Coefficient Standard Error t-stat P-value
Constant: a 984.43 326.48 3.02 0.005
ADV 3.61 0.76 4.75 0.000
PROMOT 4.47 1.11 4.04 0.000
PRICE -8.19 1.09 -7.50 0.000
COMP.ADV -0.14 0.05 -2.74 0.009

Run 4
Regression Statistics
Multiple R 0.877
R Square 0.769
Adjusted R Square 0.744
Standard Deviation of Regression 71.346
Observations 42
D.F.Numerator 4
D.F.Denominator 37

Dependent Variable SALES


Independent Variable Coefficient Standard Error t-stat P-value
Constant: a 572.59 311.54 1.84 0.074
ADV 4.47 0.74 6.02 0.000
PROMOT 4.23 1.38 3.07 0.004
PRICE -8.69 1.17 -7.44 0.000
COMP.PROMOT 0.21 0.24 0.90 0.373

Run 5
Regression Statistics
Multiple R 0.881
R Square 0.776
Adjusted R Square 0.745
Standard Deviation of Regression 71.196
Observations 42
D.F.Numerator 5
D.F.Denominator 36

Dependent Variable SALES


Independent Variable Coefficient Standard Error t-stat P-value
Constant: a 320.50 389.32 0.82 0.416
ADV 4.37 0.75 5.85 0.000
PROMOT 4.74 1.45 3.26 0.002
PRICE -7.00 1.96 -3.58 0.001
COMP.PROMOT 0.10 0.03 2.99 0.005
DISCOUNTERS 58.78 27.99 2.10 0.042

Run 6
Regression Statistics
Multiple R 0.669
R Square 0.447
Adjusted R Square 0.433
Standard Deviation of Regression 106.210
Observations 42
D.F.Numerator 1
D.F.Denominator 40

Source            Sum of Squares   d.f.   Mean Square      F    P-value
Regression              364796.2      1      364796.2   32.3   0.000001
Residual (e_i)          451249.7     40       11281.2
Total                   816045.9     41

Dependent Variable SALES


Independent Variable Coefficient Standard Error t-stat P-value
Constant: a 2416.92 229.61 10.53 0.00
PRICE -9.39 1.65 -5.69 0.00
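
The Run 6 output is the only one that includes the analysis-of-variance table, so it can be used to see how the summary statistics are linked. The sketch below, with illustrative variable names, recomputes R Square, F and the Standard Deviation of Regression from the sums of squares using the standard relationships R^2 = SSR/SST, F = MSR/MSE and s = sqrt(MSE).

```python
import math

# Sketch only: variable names are illustrative; figures are copied from Run 6.
ss_regression = 364796.2
ss_residual = 451249.7
df_regression = 1
df_residual = 40

ss_total = ss_regression + ss_residual          # total sum of squares
ms_regression = ss_regression / df_regression   # mean square, regression
ms_residual = ss_residual / df_residual         # mean square, residual

r_square = ss_regression / ss_total             # share of variation explained
f_stat = ms_regression / ms_residual            # overall F statistic
std_dev_regression = math.sqrt(ms_residual)     # standard deviation of regression

# With a single independent variable, F equals the squared t-stat of the slope
# (only approximately here, because the printed t-stat is rounded).
t_price_squared = (-5.69) ** 2

print(f"R Square = {r_square:.3f}")
print(f"F = {f_stat:.1f} (t squared = {t_price_squared:.1f})")
print(f"Standard Deviation of Regression = {std_dev_regression:.2f}")
```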

Exhibit 3(a)
Plot of Residuals against Predicted Values

Exhibit 3(b)
Histogram of the Residuals with the Superimposed Normal Curve

[Figure: histogram of the residuals. Horizontal axis: residuals, from -220 to 240 in steps
of 20. Vertical axis: absolute frequency, from 0 to 6.]