
Assignment 5 A

Part I (30 points)


Linear regression is a method of estimating the portion of a cost that is variable and the portion that is
fixed. This method models the relationship between an activity and the total cost by fitting a linear
equation to the data. Unlike the high-low method, which uses only two data points, linear regression
uses all of the data points in constructing the cost equation, which generally makes its estimates more reliable than those of the high-low method.
A linear regression generates information for a cost equation in the same form as the other methods of
estimating costs: Y = VCx + TFC, where 'x' is the independent variable (the activity) and 'Y' is the total
cost (dependent variable).
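To make the contrast with the high-low method concrete, here is a minimal Python sketch. Python is our choice here, since the assignment itself works in Excel. The sketch assumes the usual textbook high-low procedure: take the periods with the highest and lowest activity, divide the change in cost by the change in activity to get VC, then back out TFC from the high point. The sample data are the Wilson Company figures from the walk-through problem in this assignment; the function name `high_low` is hypothetical.

```python
def high_low(activity, cost):
    """Estimate (VC, TFC) from only the highest- and lowest-activity periods."""
    hi = max(range(len(activity)), key=lambda i: activity[i])
    lo = min(range(len(activity)), key=lambda i: activity[i])
    vc = (cost[hi] - cost[lo]) / (activity[hi] - activity[lo])  # change in cost / change in activity
    tfc = cost[hi] - vc * activity[hi]                          # fixed cost backed out of the high point
    return vc, tfc


# Wilson Company, January to December 2016
calls = [1040, 1200, 1260, 1100, 1220, 1010, 1190, 1050, 1210, 1250, 1060, 1280]
costs = [33600, 36300, 37800, 35500, 36600, 32900, 36200, 33400, 37700, 37400, 33800, 38100]

vc, tfc = high_low(calls, costs)
print(f"Y = {vc:,.2f}x + {tfc:,.0f}")  # Y = 19.26x + 13,448
```

Only two of the twelve months (June and December) influence this result, whereas the regression in the walk-through uses all twelve and arrives at Y = 19.08x + 13,720.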

While there are several software programs that generate linear regressions, Excel is relatively easy to use
and is a business tool frequently used by managers. It is also installed on the majority of home and
business computers. As such, you will learn how to run (generate) a regression
using Excel.

Interpreting the Regression Output

While a number of statistical items are generated in the regression output, your primary interest is the
components of the cost function found in the last section of the summary output. Linear regression
output for a home moving company that packs and moves residents to new homes appears below.

Intercept is the y-intercept, which is the estimate of total fixed costs for each period. X Variable 1
represents the estimated variable cost per unit, or the slope of the cost equation. The cost equation for
the moving company is:

Y = 3,815.69x + 828,814

We read this formula as 'Total cost equals variable cost of $3,815.69 times the number of residences
moved plus fixed costs of $828,814'. Always express unit costs (i.e., the unit variable cost) with two
decimals, and total costs (fixed costs) with no decimals.
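The interpretation and rounding rule above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the Excel procedure: the helper names are hypothetical, and the activity level of 100 residences is an assumed value chosen only to show the arithmetic.

```python
def format_cost_equation(vc, tfc):
    """Unit variable cost with two decimals, total fixed cost with no decimals."""
    return f"Y = {vc:,.2f}x + {tfc:,.0f}"


def total_cost(vc, tfc, x):
    """Y = VCx + TFC."""
    return vc * x + tfc


VC = 3815.69   # X Variable 1: variable cost per residence moved
TFC = 828814   # Intercept: fixed costs per period

print(format_cost_equation(VC, TFC))        # Y = 3,815.69x + 828,814
print(f"${total_cost(VC, TFC, 100):,.0f}")  # $1,210,383 (assumed 100 residences moved)
```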
Is the Data 'Good'?

A scatter graph is often prepared prior to running a regression to pre-assess the relationship between
two variables. A weak or nonexistent relationship between the activity and the total cost indicates that
the linear regression output will not provide a useful cost equation. Assessing the quality of the cost
equation (regardless of the cost estimation method used) is beyond the scope of this course. As such, you will
focus solely on generating a cost equation and using it to estimate future costs.

Data Analysis ToolPak

To use the linear regression tool in Excel, the Data Analysis ToolPak must be installed. To verify whether it is
installed, click Data on the Excel ribbon. If you see the Data Analysis command in the Analysis group
(far right), the Data Analysis ToolPak is already installed. If it is not installed, follow the directions
below to install it.

How to Install the Data Analysis ToolPak

Open Excel 2016. Choose the Data tab from the menu ribbon in Excel. If there is no Data Analysis item
on the ribbon (look to the far right), follow the steps below to install.

1. Click the File menu option, and then click Options.


2. Click Add-Ins, and then in the Manage box (near the bottom), select Excel Add-ins.
3. Click Go.
4. In the Add-Ins available section, select the Analysis ToolPak check box, and then click OK. If
Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it. If you get
prompted that the Analysis ToolPak is not currently installed on your computer, click Yes to
install it.
5. You will be immediately returned to your original worksheet. Click the Data tab on the ribbon and
you will see the Data Analysis command in the Analysis group at the far right.

Walk Through Problem

Wilson Company provided the following information concerning the number of monthly service calls
provided and the total cost incurred for its pest control operations for each month during 2016:

Month (2016)   Number of Service Calls   Total Cost
January        1,040                     $33,600
February       1,200                      36,300
March          1,260                      37,800
April          1,100                      35,500
May            1,220                      36,600
June           1,010                      32,900
July           1,190                      36,200
August         1,050                      33,400
September      1,210                      37,700
October        1,250                      37,400
November       1,060                      33,800
December       1,280                      38,100

Run a linear regression using the regression tool in Excel. Write the cost equation in standard form.
Determine the estimated cost of providing 1,140 service calls in January 2017.

Solution

Step 1: Open the Excel program. Copy and paste the data for Wilson Company into columns A, B, and C
beginning in row 1 of a blank worksheet.

Step 2: Because the regression output occupies 9 columns and 13 rows, to avoid placing the regression
output on top of your data, select a location in the worksheet in which there are enough empty rows
and columns to accommodate the output. Place your mouse pointer in cell A15 since that is an
acceptable location.

Step 3: Select the Data tab on the ribbon, then the Data Analysis command in the Analysis group. A pop-up box
will appear. Scroll down, select Regression, and click OK.

Step 4: The Regression wizard will be displayed. In the Input Y Range field, select all the values in the
total cost column of the data table (cells C1 through C13).

Step 5: In the Input X Range field, select all the activity values (cells B1 through B13) in the same manner as
Step 4.

Step 6: In the Output Range field, select cell A15 so that the regression output will begin in that cell.

Step 7: Check the Labels box so that Excel knows the first row contains the labels (Number of Service
Calls and Total Cost). Your wizard should be identical to the graphic below:

Step 8: Click OK to 'run' the regression. Verify that your output appears as follows, noting that the column
widths may differ based on your worksheet settings.

Step 9: Write the cost formula based on the regression output as:
Y = 19.08x + 13,720
The total cost is $13,720 plus $19.08 times the number of service calls. Always express unit costs (i.e.,
the variable cost) with two decimals, and total costs (fixed costs) with no decimals.

Step 10: The total cost expected if 1,140 service calls are made is:

Y = (19.0812465 x 1,140) + 13,720.2593 = $35,473
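For readers who want to check Excel's coefficients outside the worksheet, the closed-form least-squares formulas can be sketched in plain Python. This is a sketch under our own assumptions rather than Excel's internal implementation: it computes slope = Sxy / Sxx and intercept = mean(Y) - slope × mean(X), which is the ordinary least squares fit that the Regression tool reports for a single X variable. The function name `ols` is hypothetical.

```python
def ols(x, y):
    """Simple linear regression by ordinary least squares; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Wilson Company, 2016: number of service calls (X) and total cost (Y)
calls = [1040, 1200, 1260, 1100, 1220, 1010, 1190, 1050, 1210, 1250, 1060, 1280]
costs = [33600, 36300, 37800, 35500, 36600, 32900, 36200, 33400, 37700, 37400, 33800, 38100]

vc, tfc = ols(calls, costs)
print(f"Y = {vc:,.2f}x + {tfc:,.0f}")  # Y = 19.08x + 13,720

# Step 10: use the unrounded coefficients, then round the total
estimate = vc * 1140 + tfc
print(f"${estimate:,.0f}")             # $35,473
```

Keeping the unrounded coefficients until the final step mirrors Step 10, where the full-precision values from the output are used.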

Question:
1. What is the total cost expected if 1,500 service calls are made? (Provide the answer below the
chart and highlight the answer in yellow.)

Part II: Another Approach to Regression (30 points)

Regression analysis is still the most popular method used in predictive analytics. The main reason is that
it works: it is well known and well understood, and with its different flavors, regression analysis covers a
wide swath of problems. Another good reason to use it is that regression tools are easy to find.

Years Salary
1 56384
30 90464
28 91857
15 61109
10 59014
9 79631
27 96031
7 64211
15 52731
22 83080
14 62902
14 59707
8 38434
4 58892
7 86632
28 93245
24 95754
10 42202
27 79743
28 88192
13 60516
12 62367
19 60230
13 97568
14 72823
23 96045
26 76180
22 85220
9 58869
19 66026
2 49721
17 71530
2 35993
22 93938
17 69246
15 71655
26 83459
20 95578
1 82725

What is Linear Regression?

Linear regression is a method of statistical modeling where the value of a dependent variable can be
calculated based on the value of one or more independent variables. The general idea, as seen in the
picture below, is finding a line of best fit through the data. Using that line, you can then predict the
value of Y (salary) given X (years of service).

Let's Start by Looking at the Data

There are two columns, labeled Years and Salary. This example data set consists of the years of service (x)
and salary (y) of 39 employees at an imaginary company.
What we are going to do is develop a model using linear regression that will allow us to
predict the salary (y) of an employee given their years of service (x).

Step 1: Build a Scatter Plot

The first thing we want to do is build a scatter plot. Excel makes this simple enough. Just highlight all of
your data > select the Insert Tab from the Ribbon > Select Scatter from Charts:

What you will get should look something like this:


We have a scatter chart with Salary on the Y axis and Years on the X axis. Note that Excel scatter charts
assign the leftmost column of the selected data to the X axis by default.

Before we move on, I want to take a moment to look at the scatter plot. Do you see a pattern? Can you
see where you might be able to draw a line through the data?

I am not trying to just fill space here; I am asking a serious question, because sometimes you will not
see a pattern. Sometimes the data will be scattered so randomly that there is no need to go forward
with a linear regression. Learning to look for patterns in data visualizations is a skill worth
developing.
In this example there is a general pattern, or more accurately, we see what looks like a positive
correlation. We call it positive because it appears that as X increases, so does Y. Now that our scatter
chart has passed the visual test, it is time to perform our regression.
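The visual test can be backed by a number. Below is a minimal Python sketch of the Pearson correlation coefficient r: it is positive when Y tends to rise with X and negative when Y tends to fall. For a simple linear regression, r² equals the R² that Excel can display on the chart. The function name and the tiny inputs are hypothetical and for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive correlation)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative correlation)
```

Applied to the Years and Salary columns, r should come out positive. Because this walkthrough's chart reports R² = 0.4423, r should be about 0.67, the square root of 0.4423.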

Trend Line

Performing a simple linear regression in Excel is ridiculously easy. Simply click on your scatter plot > from
the Ribbon select Chart Tools – Design > Add Chart Element > Trendline > Linear

Your trendline appears on your chart. The line is a little hard to see as is, so we are going to format it a bit.
Start by double-clicking the trendline, and the Format Trendline window will open on the right.

Make the following changes:

Line: Color: Red; Width: 3 pt; Dash type: Solid Line

Trendline Options: select Display Equation on chart and Display R-squared value on chart

The trend line is now much easier to read.

Now let us talk about the numbers displayed on the chart (the equation and the R-squared value). I know I
said I was not going to get too deep into the math, but I feel I can't do this subject justice without at
least a cursory explanation of what is going on.
What exactly did Excel do when it added the trendline? Technically, it performed a statistical procedure
known as Ordinary Least Squares (OLS). What does that mean? If you wanted to attempt this by hand,
one approach would be to start by drawing a line that looked best to you. You would then measure the
residuals (the vertical distance between each actual data point and the line you drew), square each one,
and add them up.

You would then repeat the process (picking a new line and measuring residuals) until you found the line
with the smallest sum of squared residuals. Once you have it, you get the equation for your line:
y = 1357.9x + 50974. (Luckily for us, Excel makes the process a lot easier.)
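The "try a line, measure the residuals" idea can be sketched in Python to show exactly what least squares minimizes. In practice, OLS solves for the best line directly with formulas rather than by trial and error, and this sketch does the same. It then checks that nudging the slope or intercept only increases the sum of squared residuals. The data are a small hypothetical set made up for illustration, and `ssr` and `ols` are hypothetical helper names.

```python
def ssr(x, y, m, b):
    """Sum of squared residuals for the candidate line y = m*x + b."""
    return sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))


def ols(x, y):
    """Closed-form least-squares slope and intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    return m, mean_y - m * mean_x


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

m, b = ols(x, y)
best = ssr(x, y, m, b)
print(f"OLS line: y = {m:.2f}x + {b:.2f}, SSR = {best:.3f}")

# Any "hand-drawn" alternative line has a larger SSR
for dm, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.5), (0.0, -0.5)]:
    print(f"y = {m + dm:.2f}x + {b + db:.2f}: SSR = {ssr(x, y, m + dm, b + db):.3f}")
```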

Now a quick refresher on the line formula: Y = mX + b (where m = slope and b = Y-intercept). This
equation is what you would use to make predictions. In our equation, a person with 0 years of service
would have a salary of 50974: Y = 1357.9(0) + 50974 = 50974. Each additional year of service adds
1357.90 to the predicted salary.
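Using the trendline for predictions is plain arithmetic on Y = mX + b. The short Python sketch below takes its coefficients from the equation y = 1357.9x + 50974. The helper name is hypothetical, and the 10-year input is an arbitrary illustration.

```python
SLOPE = 1357.9     # m: added salary per year of service
INTERCEPT = 50974  # b: predicted salary at 0 years of service


def predict_salary(years):
    """Y = mX + b using the Excel trendline coefficients."""
    return SLOPE * years + INTERCEPT


print(f"{predict_salary(0):,.0f}")   # 50,974
print(f"{predict_salary(10):,.0f}")  # 64,553
```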

Before we start using the equation to make predictions, we still need to discuss the R² you
see below your line equation. I won't bore you with how R² is calculated. You don't really need to know
how it is calculated to use linear regression, but you do need to know how to read it.

The simplest explanation I can give you for R² is that a value of 1 means a perfect fit: every point in your
data lies on your line. A value of 0, on the other hand, means your line explains none of the variation in
salary; it does no better than predicting the average salary for everyone. Our R² is 0.4423, which really
is not that great. I generally prefer to aim for an R² value above 0.6.
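For readers curious about what sits behind the number, R² compares the squared residuals of the line with the squared deviations of Y from its own mean: R² = 1 - SS_res / SS_tot. Here is a minimal Python sketch of that standard definition. The function name and toy values are hypothetical. For a least-squares line with an intercept, the result falls between 0 and 1.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # variation the line leaves unexplained
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total variation around the mean
    return 1 - ss_res / ss_tot


y = [1, 2, 3]
print(r_squared(y, [1, 2, 3]))  # 1.0: every point lies on the line
print(r_squared(y, [2, 2, 2]))  # 0.0: no better than predicting the mean
```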

How can we improve our R² value? My preference would be to get more data. We currently have only
39 data points, and more data could improve our accuracy. If more data is not available, you can look at
your outliers, because linear regression can be greatly affected by outliers. Unfortunately, outliers are often
tricky to deal with. A person with 1 year of service making 100,000 a year would definitely be an outlier,
but it is not impossible. If this employee is a highly experienced individual who just transferred from
another company, it is entirely feasible that they could be earning 100,000.
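To see how much one unusual point can move the line, the Python sketch below refits the Years and Salary data with and without the hypothetical employee described above (1 year of service, 100,000 salary). This is a minimal illustration with a hypothetical `ols` helper. The added point is the what-if scenario from the text, not part of the data set.

```python
def ols(x, y):
    """Closed-form least-squares slope and intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    return m, mean_y - m * mean_x


years = [1, 30, 28, 15, 10, 9, 27, 7, 15, 22, 14, 14, 8, 4, 7, 28, 24, 10, 27, 28,
         13, 12, 19, 13, 14, 23, 26, 22, 9, 19, 2, 17, 2, 22, 17, 15, 26, 20, 1]
salary = [56384, 90464, 91857, 61109, 59014, 79631, 96031, 64211, 52731, 83080,
          62902, 59707, 38434, 58892, 86632, 93245, 95754, 42202, 79743, 88192,
          60516, 62367, 60230, 97568, 72823, 96045, 76180, 85220, 58869, 66026,
          49721, 71530, 35993, 93938, 69246, 71655, 83459, 95578, 82725]

m0, b0 = ols(years, salary)
print(f"Original 39 employees: y = {m0:.1f}x + {b0:.0f}")

# Hypothetical outlier from the text: 1 year of service, 100,000 salary
m1, b1 = ols(years + [1], salary + [100000])
print(f"With the outlier:      y = {m1:.1f}x + {b1:.0f}")
```

The first fit should match Excel's trendline, y = 1357.9x + 50974, to rounding. Adding the single hypothetical point flattens the slope and raises the intercept.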

The hard truth is, considering only the data we have, we cannot rightfully develop a reliable model. This
happens more often than you might think. That is okay though, we will chalk this up as a learning
experience and move on.

Questions:

1. What is the strength of the correlation between years of service (independent variable) and salary
(dependent variable)?
2. Explain what the trend line means.
3. Is there a significant relationship between your independent and dependent variables?
4. What is the probability that the value of the coefficient is obtained by chance?
5. What is your regression equation?
6. What is the Salary of an employee given 5 years’ experience?

What to Submit for This Assignment

The workbook, saved with the proper file naming convention. Your workbook should have three
worksheets, named Part I, Part II, and Part III. Any worksheet without a proper name will not be graded.
Answers to the questions must be on the Regression worksheet and highlighted in yellow.

Part III (40 points)


To develop a better understanding of consumers' price sensitivity, Johan wants to estimate the price
elasticity of durian, and he wants to focus on the best-selling variety of durian, namely, the Musang
King. Luckily for Johan, there is some price variation due to the frequent price promotions for this
variety of durian. More specifically, the price schedule of the Musang King is as follows:

Price of Musang King Durian


Day Price/kg Kg Sold
1 25.37 45
2 25.37 40
3 25.37 40
4 25.37 43
5 22.83 41
6 20.3 45
7 25.37 45
8 20.3 46
9 17.76 47
10 25.37 41
11 22.83 40
12 17.76 42
13 25.37 41
14 25.37 44
15 25.37 39
16 20.3 43
17 22.83 43
18 25.37 42
19 25.37 43
20 15.22 45

Use Excel to make a scatter diagram of quantity sold against price, add a trend line to your
scatter diagram (to check whether a linear relationship appears to exist), and develop a regression
equation to estimate this relationship.
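Part III's goal is a price elasticity, and the regression slope feeds into it. One common way to turn a linear demand estimate into an elasticity is the point elasticity at the sample means, E = b × (mean price / mean quantity), where b is the slope of Kg Sold regressed on Price/kg. The Python sketch below assumes that approach and treats price as the independent variable. The function names are hypothetical, and the sketch is a cross-check on the Excel work, not a substitute for it.

```python
def ols(x, y):
    """Closed-form least-squares slope and intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def elasticity_at_means(price, qty):
    """Point price elasticity of demand at the means: slope * mean(P) / mean(Q)."""
    slope, _ = ols(price, qty)  # qty = intercept + slope * price
    return slope * (sum(price) / len(price)) / (sum(qty) / len(qty))


# Musang King, days 1 to 20
price = [25.37, 25.37, 25.37, 25.37, 22.83, 20.3, 25.37, 20.3, 17.76, 25.37,
         22.83, 17.76, 25.37, 25.37, 25.37, 20.3, 22.83, 25.37, 25.37, 15.22]
kg_sold = [45, 40, 40, 43, 41, 45, 45, 46, 47, 41, 40, 42, 41, 44, 39, 43, 43, 42, 43, 45]

slope, intercept = ols(price, kg_sold)
print(f"Kg Sold = {intercept:.2f} + ({slope:.4f}) x Price/kg")
print(f"Elasticity at the means: {elasticity_at_means(price, kg_sold):.3f}")
```

An absolute elasticity below 1 indicates inelastic demand (quantity responds less than proportionally to a price change), and an absolute elasticity above 1 indicates elastic demand.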

Question:

1. What is the strength of the correlation between the price and number of Kg sold?
2. Explain what the trend line means.
3. Is there a significant relationship between your independent and dependent variables?
4. What is the probability that the value of the coefficient is obtained by chance?
5. What is your regression equation?
6. How would you advise Johan on his pricing strategy? (Create a memo to communicate your
rationale to Johan.)

Please see the following for a refresher on interpretation of regression results:

Explanation of Regression Analysis Results

Excel Regression Output - How You Can Quickly Read and Understand It
