
Guide to Regression in Excel for CAES9821 – Part 2 Linear Regression

In the correlation test there are two instances of multicollinearity (very high correlation between independent variables):

• SAT and ACT


• Est. annual price 2018-19 with avg. grant and Est. annual price 2018-19 without aid

For each of these pairs of independent variables we need to remove one of the variables from the model. For each pair we will delete the variable
which has the lowest correlation with the dependent variable (Early career earnings) – this means we should delete SAT and Est. annual price 2018-19
with avg. grant from the model.
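The selection rule above can be sketched in Python. This is an illustration only, not part of the Excel workflow: the function names `pearson` and `variable_to_drop` and the toy values are hypothetical, and comparing absolute correlations is an assumption (the guide simply says "lowest correlation").

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def variable_to_drop(name_a, col_a, name_b, col_b, y):
    """Of two collinear predictors, return the one less correlated with y.

    Assumption: strength is compared using the absolute correlation.
    """
    r_a = abs(pearson(col_a, y))
    r_b = abs(pearson(col_b, y))
    return name_a if r_a < r_b else name_b

# Hypothetical toy values, not taken from the course dataset.
earnings = [40, 45, 50, 55, 60]
act = [20, 22, 24, 26, 28]              # exactly linear in earnings
sat = [1000, 1150, 1100, 1300, 1250]    # noisier relationship

print(variable_to_drop("SAT", sat, "ACT", act, earnings))  # prints: SAT
```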

1. Create a new worksheet with the SAT and Est. annual price 2018-19 with avg. grant columns deleted from the data. Name this worksheet something like
“Cleaned data no SAT no av grant”.

When you have finished the dataset should look like this.

We will carry out linear regression with the remaining variables.

2. Click on the Data Tab on the menu bar at the top of the screen then click on Data Analysis (top right of screen)

1. Data Tab 2. Data Analysis Tab

3. The Data Analysis window will appear. Scroll down, select Regression and click OK.

4. In the regression window select Labels, Standardized Residuals, and Line Fit Plots

5. Select the Data for the Y Axis by clicking on the arrow

Click on the arrow for Input Y range

6. You will see the little box below.

7. Select all the data in the Early career earnings column. There are two ways to do this.

Option A

1. Click on cell F1
2. Press and hold the Shift key, then press the down arrow key and scroll down until you reach row 715 (the last row of data).

3. Click on the arrow after highlighting the data

Option B

1. Click on cell F1.

2. Press and hold the Shift key, then press the down arrow key and scroll down a few rows.

3. Click inside the regression box and edit the second row number in the range to be 715.

4. Click on the arrow after highlighting the data.

8. Click on the arrow to select the data for the X input range

Click on the arrow for Input X range

9. You will see the little box below.

10. Select all the data in columns A - E and then click on the arrow icon in the Regression box.

Click on the arrow after highlighting the data

11. The Regression window should look like this. We now need to select where to output the data. Choose New Worksheet Ply (if not selected as default),
then click on OK to complete the calculation and output to a new worksheet.

1. Select New Worksheet Ply (if not already selected)

2. Now click on OK to complete the calculation
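For readers who want to see what the calculation does, the sketch below fits an ordinary least squares model in plain Python by solving the normal equations. It only illustrates the least squares idea behind the Regression tool and is not Excel's own code; the helper names `solve` and `fit_ols` and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] for predictor rows and outcomes y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical toy data generated from y = 2 + 3*x1 - 1*x2 (no noise).
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [3, 7, 7, 11, 11]
coefficients = fit_ols(rows, y)
print([round(c, 6) for c in coefficients])
```

Because the toy data contain no noise, the fitted coefficients recover the intercept and slopes used to generate them, up to floating-point rounding.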

12. You will be directed to the new worksheet which should look like this:

13. First rename the new worksheet as shown below

14. Tidy up the data. Double click on the border between columns to widen the columns and view the data more easily.

Double click on the border between each column

15. The 5 charts will be stacked on top of each other so drag them apart.

1. Left click on the top chart and hold.

2. Drag each chart and place them one below the other on the page.

3. Leave space between the charts as we will be enlarging them.

16. We will now enlarge each chart

1. Hold the cursor over the bottom right-hand corner until a double arrow appears.

2. Left click and drag to increase the size of the chart.

17. Suggested size/proportion of chart shown below (do for each chart).

18. Add the fit line for predicted Early career earnings. Click on the plot area so the three icons are visible. Now click on the + icon so the options appear

19. Don’t click in the box marked trendline. Instead follow the instructions below.

1. Hover the mouse over Trendline until the right arrow appears.

2. Click on the right arrow and options will appear. Select More Options.

20. When the Add Trendline box appears select Predicted Early Career Earnings, then click OK.

21. We will now add some extra information to the chart.

1. Best fit line on chart.

2. Select Display Equation on chart.

3. Close the sidebar.

22. Move the equation to another part of the chart (hold and drag)

23. Repeat Steps 18 – 22 for the other 4 charts.

Your formatted worksheet should look something like the one below:

We now need to calculate the standardized coefficients, as these tell us whether each predictor variable is a strong, moderate, or weak predictor of Early
Career Earnings. Excel, unlike other software, cannot do this automatically, but we can create the formulae to calculate them. Follow the steps below:

24. In the tab Cleaned data no SAT no av grant you need to create an extra row for the Standard Deviation (see below)

1. Write STDEV here and align right

2. Select top and bottom border

25. Calculate the standard deviation for each variable (X and Y) by inputting the formula below:

=STDEV(first cell in column:last cell in column)

Do not put any spaces in the formula, e.g. for Rank this would be =STDEV(A2:A715)
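Excel's STDEV is the sample standard deviation, which divides by n - 1. As a cross-check outside Excel, a minimal Python sketch follows; the wrapper name `excel_stdev` and the toy values are hypothetical, while `statistics.stdev` and `statistics.pstdev` are standard library functions.

```python
from statistics import pstdev, stdev

def excel_stdev(values):
    """Sample standard deviation (n - 1 denominator), matching Excel's STDEV."""
    return stdev(values)

# Hypothetical toy column, not taken from the course dataset.
values = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = excel_stdev(values)     # divides the squared deviations by n - 1
population_sd = pstdev(values)      # divides by n instead, for comparison

print(round(sample_sd, 3), round(population_sd, 3))
```

The two results differ, so it matters that the same kind of standard deviation is used for every column.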

26. A quick way to do this is to calculate the standard deviation for Rank first, click on the cell with the answer, place the cursor on the bottom right corner
of the cell (will change from a white to a black cross), then left click and drag across the next 5 cells. This will copy the formula across the cells.

1. Place cursor on bottom right corner of cell.

2. Left click, hold and drag cursor across next 5 cells.

27. The completed standard deviation calculations should look like this:

28. Create a new table of standardized coefficients and their standard errors on your multiple regression worksheet or another worksheet (see below).
We suggest you create it on the MLR tab and start in cell D3.

29. Calculate the standardized coefficient for X1 (Rank) by inputting the following formula:

=non-standardized coefficient for X1 * standard deviation X1 / standard deviation Y

You will need to click on the relevant cells in the relevant worksheet. A worked example for X1 (Rank) is shown below.

1. Click in this cell (E4) and type =

2. Click on this cell. Notice how the original cell (E4) has been updated. There is also a formula bar near the top of the page.

3. Click in the original cell (E4) or the formula bar and type *

4. Click in the tab Cleaned data no SAT no av grant, then on cell A717.
Notice how the formula bar has been updated.

5. Stay on the same tab, go to the formula bar and type /

6. Click on cell F717.

7. The formula bar should now look like this. Click on the arrow to
complete the calculation.

8. Excel will jump back to the original worksheet, which should look like this.

30. Calculate the standardized coefficients for the other X variables (X2 = Median ACT, X3 = Est annual price 2018-19 without aid, X4 = % of students who
get grants, X5 = Average Student Debt). Use the same formula as at the beginning of Step 29 and substitute in each X variable.

31. Your completed table for the Standardized Coefficients should look like this:
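The formula from Step 29 can be checked with a short Python sketch. The function name `standardized_coefficient` and the numbers are hypothetical and are not values from the course data.

```python
def standardized_coefficient(coefficient, sd_x, sd_y):
    """Standardized coefficient = non-standardized coefficient * SD of X / SD of Y."""
    if sd_y == 0:
        raise ValueError("standard deviation of Y must be non-zero")
    return coefficient * sd_x / sd_y

# Hypothetical values: coefficient 2.0, SD of X = 3.0, SD of Y = 12.0.
beta = standardized_coefficient(2.0, 3.0, 12.0)
print(beta)  # prints: 0.5
```

A result of 0.5 would mean that a one standard deviation increase in X goes with a predicted change of half a standard deviation in Y, holding the other predictors constant.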

32. We now need to calculate the Standard Error for the Standardized Coefficients as we will be reporting this when we write up the data. We will use the
following formula and start with X1 (Rank):

=non-standardized standard error for X1 * standard deviation X1 / standard deviation Y

A worked example for X1 (Rank) is shown below:

1. Click in this cell (F4) and type =

2. Click on this cell. Notice how the original cell (F4) has been updated. There is also a formula bar near the top of the page.

3. Click in the original cell (F4) or the formula bar and type *

4. Click in the tab Cleaned data no SAT no av grant, then on cell A717.
Notice how the formula bar has been updated.

5. Stay on the same tab, go to the formula bar and type /

6. Click on cell F717.

7. The formula bar should now look like this. Click on the arrow to
complete the calculation.

8. Excel will jump back to the original worksheet, which should look like this.

33. Calculate the standard error for the other X variables (X2 = Median ACT, X3 = Est annual price 2018-19 without aid, X4 = % of students who get grants,
X5 = Average Student Debt). Use the same formula as at the beginning of Step 32 and substitute in each X variable.

34. Your completed table should look like this.
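One way to sanity-check the table: the coefficient and its standard error are both multiplied by the same factor (SD of X / SD of Y), so their ratio, the t statistic, should be the same before and after standardizing. A minimal Python sketch under that reasoning follows; the function name `standardize` and the numbers are hypothetical.

```python
def standardize(value, sd_x, sd_y):
    """Rescale a coefficient or its standard error by SD of X / SD of Y."""
    return value * sd_x / sd_y

# Hypothetical regression output for one predictor.
coefficient, std_error = 2.0, 0.5
sd_x, sd_y = 3.0, 12.0

beta = standardize(coefficient, sd_x, sd_y)      # standardized coefficient
beta_se = standardize(std_error, sd_x, sd_y)     # its standard error

t_before = coefficient / std_error
t_after = beta / beta_se
print(round(beta, 3), round(beta_se, 3), t_before == t_after)
```

If the two ratios differ for a row of your table, a wrong cell was probably clicked in Step 29 or Step 32.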

35. There is one final step. When we write up the data we only need three decimal places so we will format all the data to 3 decimal places.

1. Highlight these rows in the Variance Explained Table.

2. In the Home tab click on Decrease Decimal multiple times to reduce to 3 decimal places.

3. When you have finished the table should look like this.

36. Format all the data that is not a whole number to 3 decimal places.

37. The data is now ready for interpretation. Watch the video Part 2 Overview of Regression Data in Excel.

38. Interpret the data from your MLR model. Think about the questions below to help you interpret the data:

1. Regression Statistics Table


o What is the Coefficient of Determination (R Square)?
o How much of the variance in Early Career Earnings does the model explain?
o Is the model a good, moderate, or poor fit for the data?

2. ANOVA Table
o Is the model statistically significant? (p < 0.05)

3. Regression Coefficients Table
o Is each variable statistically significant (p < 0.05) or not statistically significant (p ≥ 0.05)?

4. Standardized coefficients
o Is each variable a strong, weak, or moderate predictor of Early Career Earnings?

5. Regression Equation
o What is the regression equation for this model?

6. Standardized Residuals
o Are there many outliers in the model?
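Some of the questions above can be expressed as small checks. The Python sketch below is an illustration only: the function names and toy numbers are hypothetical, and the outlier cutoff of 2.0 is an assumption, since this guide does not state one (adjust it to whatever your course specifies).

```python
def is_significant(p_value, alpha=0.05):
    """True when the p-value is below the significance level."""
    return p_value < alpha

def predict(intercept, coefficients, values):
    """Apply the regression equation: y = intercept + b1*x1 + b2*x2 + ..."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))

def count_outliers(standardized_residuals, threshold=2.0):
    """Count residuals beyond +/- threshold (the cutoff is an assumed convention)."""
    return sum(1 for z in standardized_residuals if abs(z) > threshold)

# Hypothetical numbers, not taken from the course output.
print(is_significant(0.03))                         # prints: True
print(predict(10.0, [2.0, -1.0], [3.0, 4.0]))       # prints: 12.0
print(count_outliers([0.5, -2.5, 1.9, 3.1]))        # prints: 2
```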

Note: You may wish to refer to Statistics in Plain English, Chapter 13 Regression (Urdan, 2016) to help you answer the above questions.

When you have interpreted the data, you can check your answers by watching the video Part 3 Interpreting the Results from the Regression Model.
