
COLOGNE-ROTTERDAM

EXECUTIVE MBA 2024

BUSINESS ANALYTICS

PROBLEM SET 4
Exercise 1

Data from a British government survey of household spending was used to examine the relationship between
household spending (£ per person per month) on tobacco products and on alcoholic beverages. A scatterplot of
spending on alcohol vs. spending on tobacco in the 11 regions of the United Kingdom shows an overall positive
linear relationship, with one region, Northern Ireland (number 11 on the graph), as an outlier.

a) Does alcohol spending appear to be a significant predictor of tobacco spending? Can we say that
alcohol spending causes tobacco spending?
b) If we calculate the correlation for all eleven regions, it is 0.223, whereas for the first ten regions
only it is 0.784. When building a model, should we discard the outlier or not? Why?
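One way to make "significant" concrete in a) and b): in simple linear regression, the t-test of the slope is equivalent to testing whether the correlation is zero, using t = r·√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The sketch below applies this to the two correlations quoted in part b). The function name `correlation_t_stat` is hypothetical, and the critical values are hard-coded from a standard t-table rather than computed.

```python
import math


def correlation_t_stat(r: float, n: int) -> float:
    """t statistic for H0: population correlation = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Two-sided 5% critical values of Student's t for 8 and 9 df (standard t-table values).
T_CRIT_5PCT = {8: 2.306, 9: 2.262}

# Correlations quoted in part b).
for label, r, n in [("all 11 regions", 0.223, 11), ("first 10 regions", 0.784, 10)]:
    t = correlation_t_stat(r, n)
    df = n - 2
    verdict = "significant" if abs(t) > T_CRIT_5PCT[df] else "not significant"
    print(f"{label}: r = {r}, t = {t:.2f} on {df} df, {verdict} at the 5% level")
```

With these figures, t is about 0.69 for all eleven regions and about 3.57 for the first ten, so the conclusion about significance hinges on a single observation. Whether that justifies dropping Northern Ireland is exactly the judgment part b) asks for.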

Exercise 2

It has been suggested that January is a good indicator of the behaviour of the stock market during the entire
year. We regressed the change in the S&P500 Index over the entire year on its change during January, for the
years 1964 to 2001; the results are attached.

a) Do you believe that January can serve as a predictor of the stock market’s performance? Justify your
answer statistically. Is it a good predictor?
b) In January 1999, the S&P500 Index increased by 4%. Predict the performance of the index for the
entire year and give an approximate 95% prediction interval.
c) As a fund manager, can you use this phenomenon to design a trading strategy? Do you need to run
other statistical tests?
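The arithmetic behind a) and b) can be sketched as follows, assuming you read the intercept b0, the slope b1, the slope's standard error and the standard error of the estimate s_e off the attached regression output (no values are filled in here). The "approximate" 95% prediction interval uses the common rule of thumb ŷ ± 2·s_e. The function names are hypothetical.

```python
def approx_prediction_interval(b0: float, b1: float, s_e: float, x: float):
    """Forecast b0 + b1 * x with a rough 95% band of +/- 2 standard errors of the estimate.

    This shortcut ignores the uncertainty in b0 and b1, so the exact interval is slightly wider.
    """
    y_hat = b0 + b1 * x
    return y_hat, y_hat - 2 * s_e, y_hat + 2 * s_e


def slope_t_stat(b1: float, se_b1: float) -> float:
    """t statistic for H0: slope = 0; |t| well above 2 suggests January carries information."""
    return b1 / se_b1


# Placeholders: read b0, b1, s_e and se_b1 off the attached regression output.
# y_hat, low, high = approx_prediction_interval(b0=..., b1=..., s_e=..., x=4.0)
```

Pass x in the same units as the regression: 4.0 if changes were recorded in percent, 0.04 if in decimals.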

Exercise 3
A popular model is the power trend line, which has the form $y = a x^{b}$ and the property that when $x$
increases by 1%, $y$ changes by a constant percentage. In fact, this constant percentage is approximately
equal to $b$ percent (where $b$ could be positive or negative). The power trend line is often used in economic
models where, for example, $x$ might be price and $y$ might be demand. The table below shows data on demand
and price of a commodity.

Price Demand
48.67 1465
37.70 1786
34.70 2067
51.68 1294
30.73 2345
41.55 1537
53.16 1385
34.25 2060
30.97 2312
54.88 1170
43.15 1608
48.35 1439

• Draw a chart of these data.¹ Using Excel’s Power Trend Line tool (right-click on one data point, select
“Add Trendline”, and then select “Power”), estimate and interpret a power trend line for the demand and
price data listed above. What is the $R^2$ value? If price increases by 1%, what do you expect
to happen to demand?
• Using Excel’s LOG() function, make two new columns by taking the LOG() of Price and Demand. Graph
the new values in a new chart and fit a straight line (i.e., use a linear trend). What is the $R^2$ value?
What is the estimated demand when the price is 50? In what range is demand likely to lie?

¹ Select all the data (including the headers) and click on “Insert → Recommended Charts” or “Insert → Scatter”,
then select the Scatter plot.
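Both bullets rest on the same idea: $y = a x^{b}$ becomes the straight line $\ln y = \ln a + b \ln x$ after taking logs, so a power fit is a linear regression on the log scale. The sketch below, using only the table above, fits that line once with natural logs and once with base-10 logs (Excel's LOG() defaults to base 10) to show that the slope $b$ and $R^2$ do not depend on the base. Our understanding is that Excel's power trendline is fitted the same way, but treat that as an assumption. The helper `fit_line` is hypothetical, and the "likely range" uses the rough ±2 standard-error rule on the log scale, back-transformed.

```python
import math

# Price/demand data from the table in Exercise 3.
price = [48.67, 37.70, 34.70, 51.68, 30.73, 41.55, 53.16, 34.25, 30.97, 54.88, 43.15, 48.35]
demand = [1465, 1786, 2067, 1294, 2345, 1537, 1385, 2060, 2312, 1170, 1608, 1439]


def fit_line(x, y):
    """Ordinary least squares for y = c0 + c1 * x; returns (c0, c1, R^2, standard error)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    c1 = sxy / sxx
    c0 = my - c1 * mx
    r2 = sxy ** 2 / (sxx * syy)
    sse = syy - c1 * sxy  # residual sum of squares
    return c0, c1, r2, math.sqrt(sse / (n - 2))


# Power trend: ln(demand) = ln(a) + b * ln(price).
c0, b, r2, s_e = fit_line([math.log(p) for p in price], [math.log(d) for d in demand])
a = math.exp(c0)
print(f"Power trend: demand = {a:.0f} * price^{b:.3f}, R^2 = {r2:.3f}")

# Same fit with base-10 logs: slope and R^2 are unchanged, only the intercept is rescaled.
c0_10, b_10, r2_10, s_e10 = fit_line([math.log10(p) for p in price],
                                     [math.log10(d) for d in demand])

x_new = 50
log_pred = c0_10 + b_10 * math.log10(x_new)
pred = 10 ** log_pred
# Rough 95% band: +/- 2 standard errors on the log10 scale, then back-transformed.
lo, hi = 10 ** (log_pred - 2 * s_e10), 10 ** (log_pred + 2 * s_e10)
print(f"Demand at price {x_new}: about {pred:.0f} (roughly {lo:.0f} to {hi:.0f})")
```

Because the band is symmetric on the log scale, it is asymmetric once converted back to units of demand.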

Exercise 4
In this exercise, we will walk through a practical example of how correlation can be used to construct a
minimum variance portfolio with a target expected return. To this end, we have collected data on five publicly
traded companies between 31 December 2019 and 29 October 2020, as shown in the file stocks.xlsx (see
Canvas). In sheet “Data”, alongside the daily adjusted closing price of each share (columns B to F),
we also report their daily changes (columns H to L).

• Use the Correlation option from the Data Analysis Toolbox to calculate a correlation table. In “Output
options”, make sure you select “Output Range” and then cell N1 (the cell painted yellow). Which
stock price changes are strongly correlated and which are not? Are there any insights worth
reporting?
• Use the Covariance option from the Data Analysis Toolbox to calculate the covariance table. In
“Output options”, make sure you select “Output Range” and then cell N8 (the cell painted red). This
is important because later calculations in the workbook refer to the covariance values in these cells. Do the
covariances correspond to the correlations you have calculated? How would you explain potential discrepancies?
• Calculate the average and standard deviation of each stock price change (columns H to L) in the table
shown in cells J11 to N17. For the standard deviation, use the formula STDEV.S(), since we are dealing
with a sample and not with the entire population. Which share appears to be the most profitable?
Which one is the least profitable? How about risk? You may calculate additional quantities in case
you find them useful to answer this part.
• Go to sheet MVP (Minimum Variance Portfolio). There, you should see the calculations of average
returns, standard deviations, and the correlation and covariance matrices, identical to what you had
in the previous tab. In row 23, you can change the portfolio weights to select the composition of your
portfolio. Assuming that you want to maximize the expected return of your portfolio, how would you
select the weights? What would be the corresponding portfolio variance (cell G26)?
• Add the Solver add-in (the procedure is the same as for the Data Analysis Toolbox). You can
find the Solver button under the Data ribbon. Click the Solver button and inspect the model. Then
click “Solve” and wait until a window titled “Solver Results” appears. Click OK and inspect the new
weights and total portfolio variance. How does this portfolio variance compare with the one you found
when maximizing the expected return? How about the expected return?
• Change the daily target return (cell G36) to 0.001 and re-solve the model. How has the portfolio
composition changed? How does the portfolio variance compare with the optimal one?
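The quantities in the first four bullets can be reproduced outside Excel as a cross-check. Covariance and correlation carry the same information up to scale, since cov(i, j) = corr(i, j)·sd(i)·sd(j), so two stocks can have a high correlation but a small covariance if both are calm. A second possible source of discrepancy is the denominator: as far as we know, Excel's Covariance analysis tool uses the population formula (dividing by the number of observations), whereas STDEV.S divides by n − 1; please verify this in your own output. The sketch assumes the daily changes are loaded into a T × 5 NumPy array (one column per stock, as in columns H to L). The function names are hypothetical, and `max_return_weights` assumes long-only weights that sum to 1.

```python
import numpy as np


def summarize_returns(returns):
    """returns: T x k array of daily changes, one column per stock (e.g. columns H to L)."""
    returns = np.asarray(returns, dtype=float)
    mean = returns.mean(axis=0)
    sd = returns.std(axis=0, ddof=1)                    # sample standard deviation, like STDEV.S
    corr = np.corrcoef(returns, rowvar=False)
    cov_sample = np.cov(returns, rowvar=False, ddof=1)  # divides by T - 1
    cov_pop = np.cov(returns, rowvar=False, ddof=0)     # divides by T
    return mean, sd, corr, cov_sample, cov_pop


def max_return_weights(mean):
    """Long-only, fully invested: all weight on the stock with the highest average change."""
    w = np.zeros(len(mean))
    w[int(np.argmax(mean))] = 1.0
    return w


def portfolio_mean_variance(w, mean, cov):
    """Expected return w'mu and variance w' Sigma w of a portfolio with weights w."""
    w = np.asarray(w, dtype=float)
    return float(w @ mean), float(w @ cov @ w)
```

Under the long-only assumption, maximizing expected return puts everything in one stock, so the portfolio variance is simply that stock's variance.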
A Python implementation of this problem, alongside explanations, can be found here.
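For the Solver bullets, a minimal sketch of the same optimization in closed form: minimize the portfolio variance w'Σw subject to w'μ = target and weights summing to 1. It solves the linear first-order (Lagrange) conditions directly. The key assumption is that short sales are allowed (weights may be negative); if the Solver model in the workbook keeps weights non-negative, its answer can differ from this one. The function name is hypothetical, and the system is singular if all average returns are equal.

```python
import numpy as np


def min_variance_weights(mean, cov, target):
    """Minimise w' cov w subject to w' mean = target and sum(w) = 1 (short sales allowed)."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    k = len(mean)
    # First-order conditions stacked with the two constraints:
    # [2*cov  mean  ones] [w   ]   [0     ]
    # [mean'  0     0   ] [lam1] = [target]
    # [ones'  0     0   ] [lam2]   [1     ]
    kkt = np.zeros((k + 2, k + 2))
    kkt[:k, :k] = 2 * cov
    kkt[:k, k] = mean
    kkt[:k, k + 1] = 1.0
    kkt[k, :k] = mean
    kkt[k + 1, :k] = 1.0
    rhs = np.concatenate([np.zeros(k), [target, 1.0]])
    return np.linalg.solve(kkt, rhs)[:k]
```

Calling it with the average daily changes, the sample covariance matrix and `target=0.001` gives a benchmark for the last bullet.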

Exercise 5
In this exercise, we will make use of scraped Airbnb data from Amsterdam. The dataset can be found on
Canvas (airbnb.xlsx). It contains the id of each host (a unique identifier), the size of each
apartment or room (in square feet) and the price per day. We would like to see whether these data can be used
to predict the price of each apartment or room.

• Make a histogram of price (select the price column, including the header, and go to Insert →
Recommended Charts → Histogram). Does the price appear to follow a normal distribution?
• Carry out a regression of price on square feet. Plot price against square feet and add the
regression trendline. Does this appear to be a good model? In particular, does it satisfy the
assumptions of the ordinary linear regression model?
• Calculate the logarithm of price (using the LN() function) and repeat the previous two steps. Does the
logarithm of price appear to follow a normal distribution? Does the regression deliver a better fit?
How can we interpret the coefficient of square feet? Can we improve the fit?
• Finally, regress the logarithm of price on square feet and the hasBathtub variable. Does
the model improve? Can we improve it further?
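To connect the last two bullets with the interpretation question: in a model of ln(price), a coefficient c means that a one-unit increase in that regressor multiplies the predicted price by e^c, which is roughly a 100·c percent change when c is small. The sketch below fits ln(price) on an intercept, square feet and a 0/1 bathtub dummy by least squares. It assumes you have already extracted three equal-length columns from airbnb.xlsx; the column contents and the function names are hypothetical.

```python
import numpy as np


def fit_log_price(sqft, has_bathtub, price):
    """OLS of ln(price) on an intercept, square feet and a 0/1 bathtub dummy.

    Returns (coefficients [b0, b_sqft, b_bathtub], R^2 on the log scale).
    """
    sqft = np.asarray(sqft, dtype=float)
    tub = np.asarray(has_bathtub, dtype=float)
    y = np.log(np.asarray(price, dtype=float))  # natural log, like Excel's LN()
    X = np.column_stack([np.ones_like(sqft), sqft, tub])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


def pct_effect(c: float) -> float:
    """Exact % change in price for a one-unit increase in a regressor with log-price coefficient c."""
    return 100 * (np.exp(c) - 1)
```

For example, `pct_effect(coef[2])` expresses the bathtub coefficient as a percentage price difference between listings with and without a bathtub, holding square feet fixed.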
