
Linear Regression in Excel

I. SIMPLE LINEAR REGRESSION EXAMPLE


Butlers Trucking Company is an independent trucking company in southern California. A major portion of the company's business involves deliveries throughout its local area. To develop better work schedules, the managers want to estimate the total daily travel time for their drivers. Initially, the managers believed that the total daily travel time would be closely related to the number of miles traveled in making the daily deliveries. A simple random sample of 10 driving assignments is provided in Table 1. Use Excel to make a scatter diagram of these deliveries (to verify that a linear relationship does exist) and develop a regression equation expressing this relationship.

Table 1

Driving Assignment    X1=Miles Traveled    Y=Travel Time (hrs.)
1                     100                  9.3
2                     50                   4.8
3                     100                  8.9
4                     100                  6.5
5                     50                   4.2
6                     80                   6.2
7                     75                   7.4
8                     65                   6.0
9                     90                   7.6
10                    90                   6.1

Excel Instructions for Drawing a Scatter Plot to Verify that a Linear Trend Exists

1. Enter the above information in the Excel spreadsheet as shown in Figure 1 below.
2. Highlight your numerical data in cells A2 through B11. Click on the Insert tab and then Scatter. Select the first option (Scatter with only markers). See Figure 2. An XY Scatter chart will appear for your data (Figure 3).
3. You can label your chart and its x and y axes by clicking on the Layout tab under Chart Tools. In the Labels section, you can specify where you want to place your chart title and edit the text. You can also define the labels for your horizontal axis (x) and vertical axis (y). See Figure 4.
4. After verifying that a linear trend does exist, determine the least squares regression equation.

Figure 1

Figure 2

Figure 3

Figure 4
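
As a numeric complement to the scatter plot, the strength of the linear trend can be checked with the sample correlation coefficient. The following is a minimal sketch in Python, not part of the Excel procedure; the function name pearson_r and the variable names are illustrative assumptions, and the data are the ten assignments from Table 1.

```python
import math

# Table 1 data: miles traveled (x) and travel time in hours (y).
miles = [100, 50, 100, 100, 50, 80, 75, 65, 90, 90]
hours = [9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6, 6.1]


def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squared deviations and cross-products about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(miles, hours)
print(f"r = {r:.4f}")  # prints "r = 0.8149"
```

A value of r close to +1 or -1 suggests a strong linear trend. The value here agrees with the Multiple R entry that Excel reports in the regression output for these data.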

Excel Instructions for Regression Analysis


1. The Regression macro (which is part of the Analysis ToolPak) is standard with Excel; however, it is not always active and available for use. Click on the Data tab. If the Analysis ToolPak is active, you should see Data Analysis under the Analysis section. If this item is present, skip to step 2. If this item is not there, you need to add the ToolPak: click on the Office Button in the top left corner of your Excel spreadsheet. Select Excel Options in the bottom right of the Office menu. Select the Add-Ins tab on the left pane. At the bottom of the Add-Ins menu, next to Manage: Excel Add-ins, click Go. Click the Analysis ToolPak checkbox, then OK. The Analysis ToolPak should now be present under the Analysis section of the Data tab in the future.
2. Select the Data Analysis option under the Analysis section and select the Regression option (as shown below).

Figure 5

3. Your dependent variable (y) data is in cells B1 through B11 (including the variable name or label), and your independent variable (x) data is in cells A1 through A11. Click the Labels box to indicate that the first row contains the variable names, and then click OK. See Figure 6.

Figure 6

4. A new worksheet will appear revealing the results of your regression analysis. The results from this analysis are shown below.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8149057    (Correlation Coefficient)
R Square             0.6640713    (Coefficient of Determination)
Adjusted R Square    0.6220802
Standard Error       1.0017919
Observations         10

ANOVA
              df    SS             MS           F            Significance F
Regression    1     15.87130435    15.871304    15.814578    0.004080177    (P value for Anova Test)
Residual      8     8.028695652    1.003587
Total         9     23.9

                     Coefficients    Standard Error    t Stat       P-value      Lower 95%      Upper 95%    Lower 95.0%    Upper 95.0%
Intercept            1.273913        1.400744525       0.9094542    0.3896874    -1.95621171    4.5040378    -1.9562117     4.5040378
X1=Miles Traveled    0.0678261       0.017055637       3.9767547    0.0040802    0.028495691    0.1071565    0.02849569     0.1071565

Callouts: the Intercept coefficient (1.273913) is b0, the X1 coefficient (0.0678261) is b1, and the P-value in the X1 row (0.0040802) is the P value for the t test for X1.

Interpreting Results

1. In the Regression Statistics table, you will find the Coefficient of Determination, R2 (R Square), and the Correlation Coefficient, R (Multiple R).
2. The ANOVA table gives the F statistic for testing the claim that there is no significant relationship between your independent and dependent variables. The Significance F value is your p value. Thus you should reject the claim that there is no significant relationship between your independent and dependent variables if p < α, where α is your chosen significance level.
3. The Coefficients column gives the b0 and b1 values for the regression equation. The Intercept value is always b0. The b1 value is next to your independent variable, x. For these data, the estimated regression equation is y = 1.273913 + 0.0678261x.
4. In the P-value column of the coefficient output, the p value for the individual t test for your independent variable is given (in the same row as your independent variable). Recall that this t test tests the claim that there is no relationship between the independent variable and your dependent variable. Thus you should reject the claim that there is no significant relationship between your independent variable and dependent variable if p < α.
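
The main entries of the Excel output can be cross-checked by hand with the usual least squares formulas. The sketch below is an illustration in Python rather than anything Excel runs internally; the function name simple_regression and the dictionary keys are assumptions made for this example. It computes b0, b1, R Square, the Standard Error, the F statistic, and the t Stat for b1. The p values are left out because the Python standard library has no t or F distribution.

```python
import math

# Table 1 data: miles traveled (x) and travel time in hours (y).
miles = [100, 50, 100, 100, 50, 80, 75, 65, 90, 90]
hours = [9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6, 6.1]


def simple_regression(x, y):
    """Least squares fit of y = b0 + b1*x with basic summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares

    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept

    # Residual (error) sum of squares and regression sum of squares.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = sst - sse

    mse = sse / (n - 2)        # residual mean square, n - 2 degrees of freedom
    std_err = math.sqrt(mse)   # "Standard Error" in Regression Statistics
    return {
        "b0": b0,
        "b1": b1,
        "r_square": ssr / sst,
        "standard_error": std_err,
        "f": ssr / mse,        # regression has 1 degree of freedom, so MSR = SSR
        "t_b1": b1 / (std_err / math.sqrt(s_xx)),
    }


fit = simple_regression(miles, hours)
for name, value in fit.items():
    print(f"{name}: {value:.6f}")
```

With one independent variable, the F statistic equals the square of the t Stat for b1, which is why Significance F and the P-value in the X1 row coincide in the output above.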

II. MULTIPLE REGRESSION EXAMPLE


In attempting to identify another independent variable, the managers felt that the number of deliveries could also contribute to the total travel time. Table 2 includes the number of deliveries for each of the random driving assignments provided in Table 1.

Table 2

      A                    B                          C
1     X1=Miles Traveled    X2=Number of Deliveries    Y=Travel Time (hrs.)
2     100                  4                          9.3
3     50                   3                          4.8
4     100                  4                          8.9
5     100                  2                          6.5
6     50                   2                          4.2
7     80                   2                          6.2
8     75                   3                          7.4
9     65                   4                          6.0
10    90                   3                          7.6
11    90                   2                          6.1

To determine the regression equation for this scenario, follow the same Excel steps provided for Simple Linear Regression with the following modifications: Enter your multiple regression data in Excel as shown above. In Step 3, specify that your dependent variable (y) data is in cells C1 through C11 (including the variable name or label), and your independent variable data (x1 and x2) is in cells A1 through B11. Click the Labels box to indicate that the first row contains the variable names, and then click OK. See Figure 7.

Your output for this multiple regression problem should be similar to the results shown below.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.950678166
R Square             0.903788975
Adjusted R Square    0.876300111
Standard Error       0.573142152
Observations         10

ANOVA
              df    SS             MS          F              Significance F
Regression    2     21.60055651    10.80028    32.87836743    0.00027624
Residual      7     2.299443486    0.328492
Total         9     23.9

                           Coefficients    Standard Error    t Stat      P-value        Lower 95%       Upper 95%      Lower 95.0%     Upper 95.0%
Intercept                  -0.868701467    0.951547725       -0.91294    0.391634304    -3.118752683    1.38134975     -3.118752683    1.38134975
X1=Miles Traveled          0.061134599     0.009888495       6.182397    0.000452961    0.037752041     0.084517156    0.037752041     0.084517156
X2=Number of Deliveries    0.923425367     0.221113461       4.176251    0.004156622    0.400575489     1.446275244    0.400575489     1.446275244

Interpreting Results
1. In the Regression Statistics table, you will find the Adjusted Coefficient of Determination, Adjusted R2 (Adjusted R Square), and the Correlation Coefficient, R (Multiple R).
2. The ANOVA table gives the F statistic for testing the claim that there is no significant relationship between all of your independent variables and your dependent variable. The Significance F value is your p value. Thus you should reject the claim that there is no significant relationship between your independent and dependent variables if p < α, where α is your chosen significance level.
3. The Coefficients column gives the b0, b1, and b2 values for the regression equation. The Intercept value is always b0. The b1 value is next to your x1 variable, and the b2 value is next to your x2 variable.
4. In the P-value column of the coefficient output, the p values for the individual t tests for your independent variables are given. Recall that this t test tests the claim that there is no relationship between the independent variable (in the corresponding row) and your dependent variable. Thus you should reject the claim that there is no significant relationship between your independent variable (in the corresponding row) and dependent variable if p < α.
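
The multiple regression coefficients can also be cross-checked outside Excel. The sketch below is a minimal Python illustration that solves the least squares normal equations for the special case of exactly two independent variables, using deviations from the means. The function names centered_sum and two_predictor_regression are assumptions made for this example, and this is not a description of how Excel computes its output. P values are again omitted because the standard library has no t or F distribution.

```python
# Table 2 data: miles (x1), number of deliveries (x2), travel time in hours (y).
miles = [100, 50, 100, 100, 50, 80, 75, 65, 90, 90]
deliveries = [4, 3, 4, 2, 2, 2, 3, 4, 3, 2]
hours = [9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6.0, 7.6, 6.1]


def centered_sum(u, v):
    """Sum of products of deviations from the means of u and v."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))


def two_predictor_regression(x1, x2, y):
    """Least squares fit of y = b0 + b1*x1 + b2*x2 (two predictors only)."""
    n = len(y)
    k = 2  # number of independent variables
    s11 = centered_sum(x1, x1)
    s22 = centered_sum(x2, x2)
    s12 = centered_sum(x1, x2)
    s1y = centered_sum(x1, y)
    s2y = centered_sum(x2, y)
    sst = centered_sum(y, y)  # total sum of squares

    # Solve the 2x2 normal equations with Cramer's rule:
    #   s11*b1 + s12*b2 = s1y
    #   s12*b1 + s22*b2 = s2y
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n

    ssr = b1 * s1y + b2 * s2y  # regression sum of squares
    sse = sst - ssr            # residual sum of squares
    r_square = ssr / sst
    return {
        "b0": b0,
        "b1": b1,
        "b2": b2,
        "r_square": r_square,
        "adj_r_square": 1 - (1 - r_square) * (n - 1) / (n - k - 1),
        "f": (ssr / k) / (sse / (n - k - 1)),
    }


fit2 = two_predictor_regression(miles, deliveries, hours)
for name, value in fit2.items():
    print(f"{name}: {value:.6f}")
```

The intercept from this calculation is negative, which is consistent with the negative t Stat reported for the Intercept row. For more than two independent variables, a general linear solver would be needed in place of the 2x2 Cramer's rule step.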