A Step by Step Guide to Regression Analysis (For MKTG 475)

Professor ‘Mike’ Minhi Hahn

Example [1]. Coke Price over Time (Year)

◼ Are the price and time related (in any sense)?

◼ Can we predict the future Coca-Cola price based on the past data?

◼ Let’s observe a relationship: Coke Price = f ( Year )

More specifically, we assume a linear relationship:

Coke Price = a + b x Year

We will estimate what a and b should be.
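To make "estimating a and b" concrete, here is a minimal Python sketch of the ordinary least squares calculation that a regression tool performs for a single predictor. The helper name `fit_line` is an assumption made for illustration (it is not an Excel function), and the six data points are the ones listed in the Raw Data step of this guide.

```python
# Ordinary least squares for one predictor: Price = a + b x Year.
# fit_line is an illustrative helper name, not an Excel function.

def fit_line(xs, ys):
    """Return (a, b): intercept and slope of the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b

years = [1920, 1980, 1985, 1990, 2000, 2014]
prices = [0.05, 0.25, 0.65, 0.87, 1.25, 2.00]

a, b = fit_line(years, prices)
print(f"a = {a:.3f}, b = {b:.3f}")
```

The printed values should agree, to three decimals, with the coefficients that Excel reports in the regression output.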

I-1: Using Excel Data Analysis Add-In

1. Raw Data: 1920 $0.05, 1980 $0.25, 1985 $0.65, 1990 $0.87, 2000 $1.25, 2014 $2.00

2. Input Data

(1) Open Excel

(2) Open a blank file (workbook)

(3) Type in data.

[After entering the data, save the file under a name so that you can use it again later. (Click the “File”
menu and use the “Save” option to choose the file name and the folder you want to put the file in.)]

- Some operations with Excel

◼ Moving columns

◼ Inserting data of a new variable

◼ Selecting a range

◼ Calculating “average”

◼ Calculating Price_Estimate = a + b x Year

3. Visualizing the data

(a) Select the range of data you want to chart (use the mouse).

(b) Click “Insert”, then click “Recommended Charts”. Click OK (for the Scatter chart).

[Scatter chart of Price (vertical axis, 0 to 2.5) against Year (horizontal axis, 1900 to 2020)]

4. Calculating CORRELATION

(a) Click “Data” in the top menu.

(b) Click “Data Analysis” (Add-in) at the right end of the list of sub-menus.

(c) Box (of alternative data analysis methods) will pop up. Choose Correlation.

(d) Click OK.



(e) Then, select the range of relevant data for both the dependent variable (Price) and the
independent variable (Year). You may use the mouse, or you may type in $A$1:$B$7 to designate
the relevant range.

(f) Click “Labels in first row” if you included the label in your selected data range.

(g) Correlation value will appear in a separate sheet.

=> Correlation between Year and Price is 0.818
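As a cross-check on the Excel result, the correlation can be recomputed by hand. The following is a minimal sketch of the Pearson correlation formula in plain Python; the function name `pearson_r` is an assumption made for illustration, and the data are the six (Year, Price) pairs from the Raw Data step.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

years = [1920, 1980, 1985, 1990, 2000, 2014]
prices = [0.05, 0.25, 0.65, 0.87, 1.25, 2.00]

r = pearson_r(years, prices)
print(f"r = {r:.3f}")  # prints r = 0.818
```

The value is symmetric: swapping the two lists gives the same correlation, which is why Excel does not ask which variable is dependent at this step.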

5. Running a Regression Analysis (with Excel)

(a) Go back to the data (normally Sheet 1).

(b) Again select “Data” in the top menu bar. Then, select “Data Analysis” (the Analysis ToolPak add-in).

(c) The data analysis menu box will pop up. Scroll down to find the regression option. Choose
the regression option (OK).

(d) Select the range of your dependent variable (criterion, target, endogenous variable). This is
the variable you want to predict. In this case, it is “Price.”

(e) Select the range of your independent variable (explanatory, predictor, controlled,
exogenous variable). This is the variable you want to rely on for the prediction, i.e., the predictor. In
this case, it is “Year.”

6. Regression Output

(a) We got the output of regression analysis. However, it looks a little messy.

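(b) The output contains duplicate columns. (This looks like a bug in my Excel version, but the Regression tool normally prints the 95% bounds plus a second pair for the confidence level chosen in the dialog, which is also 95% by default.)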



- In the output, Lower 95% and Upper 95% appear twice (columns F and G, and columns H and I).

- I will delete one of them, i.e., columns H and I. Select columns H and I using your mouse.
Click the right button of your mouse. Select the “Delete” option to delete columns H and I.

(c) Showing numbers up to three decimal places.

- Select the whole sheet using your mouse. Drag your mouse from A to G on the label row with left mouse click.

- Right click your mouse. Select the “Format Cells” option.

- In the pop-up box, select “Number” among the Category options.

- Increase “Decimal places” from two to three. => OK

(d) Expanding cell size to see all the words.

- Again select the whole sheet.



- Expand column A to the size you want. All other columns will automatically be expanded to
the same size. => You get the following result.

7. Reading the Output

Assumption 1: When we determine whether a result is statistically significant or not, we will use the criterion (“alpha error”) of 0.05.

(1) Significance of F: 0.046 < 0.05

=> “The linear regression model is significant.”

=> It is a meaningful model. We may go further and interpret the result.

(2) R-square is .670.

=> The higher the value (closer to 1.0), the better the model fits the data.

=> How do we interpret .670? It depends on situations.

=> We may say, the linear model reasonably fits the data.

(3) p-value of the independent variable (Year) is .046 < .05. (t-value is 2.849)

=> Again, we conclude it is statistically significant.

=> The independent variable (Year) is significantly related to the dependent variable,
i.e., Price. Year is related to Coke Price. How?

(4) Estimated model: Price = -34.645 + 0.018 x Year

(5) What is the interpretation?

=> As “Year” (IV) increases by one unit, i.e., an increase of one year, “Price” (DV) increases by 0.018, i.e., $0.018.

=> Every year, the Coke price increases by $0.018 on average (during 1920~2014).

(6) Shall we predict the price using the estimated model?

Let’s calculate the “estimated price” for years 1920, 1980, and so on.

- We want to calculate the values and show them in column C.

- Click the cell C2.

- Type “=”

- Then type in the regression formula in the fx box as shown on the right.

- Press “Enter”

- I added Year 2020 to see what will be the Coke price in 2020.

- Copy C2 to C3~C8. (Select C2. Right click your mouse. Select the “copy” option. => Then
select C3 to C8 with your mouse. Then, hit “Enter”.)

- We see that the estimated prices do not match the real price very well.

- The prediction for the 2020 price is 1.715, even lower than the price in 2014.
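The column C calculation above can be mirrored in a few lines of Python. This is only a sketch: it assumes the coefficients are entered exactly as they are printed in the output (rounded to three decimals), and the names `A`, `B`, and `estimate_price` are chosen here for illustration.

```python
# Coefficients as printed (rounded to three decimals) in the regression output.
A = -34.645  # intercept
B = 0.018    # slope: estimated price change per year

def estimate_price(year):
    """Price_Estimate = a + b x Year."""
    return A + B * year

# Observed prices from the raw data; 2020 has no observed price.
actual = {1920: 0.05, 1980: 0.25, 1985: 0.65, 1990: 0.87, 2000: 1.25, 2014: 2.00}

rows = []
for year in list(actual) + [2020]:
    rows.append((year, actual.get(year), estimate_price(year)))

for year, observed, estimated in rows:
    shown = "n/a" if observed is None else f"{observed:.2f}"
    print(f"{year}  actual={shown:>4}  estimate={estimated:.3f}")
```

Because the slope is multiplied by a year value near 2000, rounding it to three decimals has a visible effect on the estimates. Referencing the coefficient cells of the output sheet directly, instead of retyping rounded numbers, avoids that rounding.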

8. Can we improve the prediction?

Let’s drop the 1920 data and see what happens.

(Maybe the 1920 data does not reflect the current trend of the Coke price. At that time,
the Coke price had been fixed since 1886.)

(a) Drop the 1920 data from our data file.

(b) Try to get correlation. What is the correlation score? Do you get 0.995?

(c) Run Regression.

(d) Make the messy output more readable.



(e) Interpret the output.

The estimated model: Price = -96.590 + 0.049 x Year

(f) What is the predicted price for 2020? The predicted price is $2.286 (using the unrounded coefficients).

=> If we use more data points, the prediction is likely to get better.
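The comparison in this section can also be reproduced outside Excel. The sketch below refits the line with and without the 1920 observation and prints the correlation, the coefficients, and the 2020 estimate for each case. The helper name `summarize` is an assumption made for illustration, and the estimates use full-precision coefficients rather than the rounded ones.

```python
import math

def summarize(points):
    """Return (r, a, b) for the least squares line y = a + b * x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx                    # slope
    a = mean_y - b * mean_x          # intercept
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation
    return r, a, b

all_points = [(1920, 0.05), (1980, 0.25), (1985, 0.65),
              (1990, 0.87), (2000, 1.25), (2014, 2.00)]
# Drop the 1920 observation, as in step (a).
recent_points = [p for p in all_points if p[0] != 1920]

for label, points in (("with 1920", all_points), ("without 1920", recent_points)):
    r, a, b = summarize(points)
    print(f"{label}: r={r:.3f}  a={a:.3f}  b={b:.3f}  2020 estimate={a + b * 2020:.3f}")
```

Printing both fits side by side makes it easy to see how much a single early observation changes the slope.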

Q&A

Exercise [2-1] Weights versus Heights (With 500 sample version)

(1) Read the data file (WTdata.xlsx)

(2) Analyze correlation between weight and height

(3) Regression analysis of weight with height

(4) Interpretation

Exercise [2-2]: Weight versus Height & Gender

- Interpretation of coefficients when independent variables are categorical

If we use the dummy variable as Male = 1, Female = 0:

Model: Weight = Constant + b1 x Height + b2 x Gender

As usual, the coefficient b1 is the effect on Weight when Height is increased by one unit.

It is the increase in Weight when the person becomes taller by one unit (one inch), with Gender held fixed.

What is b2 (coefficient of Gender)?

If the person is a MALE, Gender is 1.

Then, the model becomes Weight = Constant + b1 x Height + b2 x 1

That is, Weight = Constant + b1 x Height + b2

If the person is a FEMALE, Gender is 0.

Then, the model becomes Weight = Constant + b1 x Height + b2 x 0

That is, Weight = Constant + b1 x Height

Therefore, b2 is the average difference in weight between Male and Female of the same height.

(1) Run Regression with DV = Weight and IV = Height and Gender

(2) Interpret the output

(3) What will be the predicted weight for a woman who is 70 inches tall?
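The role of the dummy variable can be illustrated with a short Python sketch. The three coefficient values below are hypothetical placeholders chosen only to show the mechanics; they are not estimates from WTdata.xlsx and should be replaced with the values from your own regression output.

```python
# HYPOTHETICAL coefficients for illustration only; NOT estimated from WTdata.xlsx.
CONSTANT = -200.0   # intercept
B1_HEIGHT = 5.0     # change in weight per additional inch of height
B2_GENDER = 15.0    # shift in weight for Male (Gender = 1) versus Female (Gender = 0)

def predict_weight(height_in, gender):
    """Weight = Constant + b1 x Height + b2 x Gender (gender: 1 = male, 0 = female)."""
    return CONSTANT + B1_HEIGHT * height_in + B2_GENDER * gender

female_70 = predict_weight(70, 0)   # the b2 term drops out because Gender = 0
male_70 = predict_weight(70, 1)     # the b2 term is added because Gender = 1

print("Female, 70 in:", female_70)
print("Male, 70 in:  ", male_70)
print("Difference at the same height:", male_70 - female_70)
```

Whatever coefficients the regression produces, the difference between the two predictions at the same height equals b2, and the prediction for a woman uses only the constant and the height term.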
