# Regression step-by-step using Microsoft Excel®

Notes prepared by Pamela Peterson Drake, James Madison University

Step 1:

Type the data into the spreadsheet

The example used throughout this “How to” is a regression model of home prices, explained by: square footage, number of bedrooms, number of bathrooms, number of garages, whether it has a pool, whether it is on a lake, and whether it is on a golf course.

The objective is to explain the variation in home prices, using the variation in the independent variables. In other words, we are asking the question of “Why do home prices vary from home to home?” It may be because the homes have different features. Hence, we are using the variation in the features to explain the variation in the home prices. You need to arrange the data in columns to use the built in regression function within Microsoft Excel. The first column contains the observations on the dependent variable and then the other, adjoining columns containing the observations on the independent variables. Include column headings to make it is easier to interpret your results.

if we want to view the correlation among the dependent variables. Regression In the Tools menu. If you include the variable names in the column headings and these column headings are part of the range of observations that you specified. you should then choose Regression: Step 3: Specify the regression data and output You will see a pop-up box for the regression specifications. For example. be sure to check the Labels box. This process is similar to the correlation specification. If you leave the default checked as New Worksheet Ply. . you will find a Data Analysis option.Step 2: Use Excel®’s Data Analysis program. a new worksheet is created that will contain the results. You can then specify where you would like to place the results. 1 Within Data Analysis. we would use a similar process using the Data Analysis function “Correlation”: 1 If you do not find this option. Using this screen. you can then specify the dependent variable [Input Y Range] and the columns of the independent variables [Input X Range]. you will want to click on Add-ins and then specify Data Analysis as an option.

This statistic is not generally interpreted because it is neither a percentage (like the R2). The multiple correlation coefficient is 0.99%.334. The standard error of the regression is \$419. a measure of explanatory power. The adjusted R-square. The coefficient of determination. . R .82795539 419334. is 84. This indicates that the correlation among the independent and dependent variables is positive.84985198 0. in dollar terms. nor a test of significance (such as the Fstatistic). is 0.82795539. which is an estimate of the variation of the observed home prices. 2. does not indicate statistical significance of this correlation. which ranges from -1 to +1. This statistic.92187417. This means that close to 85% of the variation in the dependent variable (home prices) is explained by the independent variables.The results of the regression analysis appear as follows: Step 4: Interpret the results Summary information 1. 2 Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 0. about the regression line. 2 Analysis of variance 2 The standard error is in the same unit of measurement as the dependent variable.615 56 3. 4.92187417 0.

The estimated regression line The results of the estimated regression line include the estimated coefficients.144.0% -30. Adding a bathroom.60 150. 4.755.63 Standard Error 277.28 572. The F-statistic is calculated using the ratio of the mean square regression (MS Regression) to the mean square residual (MS Residual).77 t Stat -1.22 639. the higher the home price.77 0. The other independent variables do not add not significantly in explaining the variation in home prices.493.08 -105. 1.211.40 14.65 491.40 184. the calculated t-statistic.7584E+11 F 38.92 816.44 88.59 -557.79 3. SS Residual is the variation of the dependent variable that is not explained. The number of bathrooms is positively related to home prices.822. apart from the effect on square footage. and the number of bathrooms.0% -960.15 289.51 229. on average. 2.766.8248E+12 1.635.908. The independent variables that statistically significant in explaining the variation in the home prices are the square footage.49 134.46 Lower 90.791.67 212.13 -183.044.30 1. Coefficients Intercept Square footage Number of bedrooms Number of bathrooms Number of car-garage Whether it has a pool Whether on a lake Whether on a golf course -495.451.88 7.812076 Significance F 1.027.95 118.50 . 2. Comparing this value with 5%.73 -166.835.539.44039E+12 5.18174E-17 1.33 818.09 -309.237.478.311.The analysis of variance information provides the breakdown of the total variation of the dependent variable in this case home prices) in to the explained and unexplained portions.25 -187.167.07 -389. HA: at least one βi not equal to 0 3.53 461.30 -345.432.41 501.77734E+13 8.00 0.00 0.15 -0. the number of bedrooms. 3.25 0.08 0. increases the home price. This is statistic can then be compared with the critical F value for 7 and 48 degrees of freedom (available from an F-table) to test the null hypothesis: H0: β1= β2= β3= β4= β5= β6= β7=0 v.80 95. the standard error of the coefficients.02 -239.116.503. and (2) the calculated p-values that are less than the significance level of 5%.26 -3.938.27 0.61 924.219. for example.55 -221.80 787.63 511. the corresponding p-value. but this may be due to an interaction with the square footage variable because larger homes tend to have more bedrooms.62138E+13 MS 6.475.28 315. The p-value associated with the calculated F-statistic is probability beyond the calculated value.303.88 0.244.460. ANOVA df Regression Residual Total 7 48 55 SS 4. The coefficient of 310.058.24 Upper 90.98 -264.894.07.23 203.053.36 Lower 95% -1.012.982.587.20 Upper 95% 62.27 310. and the bounds of both the 95% and the 90% confidence intervals.989.248.400.00 0.276.278. The relationship between square footage and home prices is positive: the larger the square footage. The SS Regression is the variation explained by the regression line.682.08 100.527. indicates rejection of the null hypothesis. an additional square foot increases the home price by \$310.92 Pvalue 0.691.174. The number of bedrooms is negatively related to the home price.11 0.71 -39.07 indicates.91 183.89 -591.04 98.29 -150.93 469.869. as indicated by (1) calculated tstatistics that exceed the critical values.51 180.

235 2.095 1.000 0.873 0. suggesting that multicollinearity may be a problem.083 1. we see that there is positive correlation among several variables: Correlation coefficients Home listing price Square footage Number of bedrooms Number of bathrooms Number of car-garage Whether it has a pool Whether on a lake Whether on a golf course Home listing price 1.282 0.004 1.179 1.Correlations When using multiple regression to estimate a relationship.495 4.280 0.644 -0.311 0.282 2.430 0.463 0.404 0.029 4. .038 1.527 0.381 0.349 0.130 -0.625 4.445 0.000 0. we see that there are many statistically significant correlations (indicated in green). we cannot conclude that any of these correlations are important until we test for significance. we could either: Increase the sample size (which will often reduce the correlation among the independent variables.499 0.023 3.000 0. calculating the test statistic for each of the pair-wise correlations above.003 1.144 1.156 9.000 Number of bedrooms Number of cargarage Whether on a golf course Square footage Number of bathrooms Whether it has a pool Whether on a lake However. generated by the Correlation function within Data Analysis.027 4.554 0.735 0.795 0.465 0.765 0.338 -0.555 0.534 -0.890 3. Looking at the correlation.000 0.702 Number of bedrooms Number of bathrooms Number of cargarage Whether it has a pool Whether on a lake Whether on a golf course In the case of multicollinearity. This correlation may be pair-wise or multiple correlation.613 3.067 1.840 2.598 0.936 2.345 0.000 0.861 3.000 -0. removing or restating the independent variables such that there is less correlation among them. there is always the possibility of correlation among the independent variables.000 0. Test statistics Square footage Square footage Number of bedrooms Number of bathrooms Number of car-garage Whether it has a pool Whether on a lake Whether on a golf course 6.371 0.962 -0. or Re-specify the regression model.642 0.