
# Introduction to Regression Analysis Using Excel

## Why and When to Use Regression

- Dependent variable (response) $Y$: the variable we wish to explain.
- Independent (regressor) variables $X_1, X_2, \dots, X_k$: the variables used to explain the dependent variable.

The relation between $Y$ and the independent variables is called a regression model.

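## Multiple Linear Regression Model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

Here $y$ is the dependent variable and $\beta_0$ is the intercept. The terms $\beta_1, \dots, \beta_k$ are the slope coefficients and $x_1, \dots, x_k$ are the independent variables. The term $\varepsilon$ is the random error term (or residual).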
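The following Python sketch generates responses from the model for hypothetical coefficients. It shows how each $y$ combines a deterministic part, $\beta_0 + \sum_j \beta_j x_j$, with a random error. Drawing $\varepsilon$ from a normal distribution with a fixed standard deviation `sigma` is an assumption. It matches the normality and constant-variance assumptions listed under Linear Regression Assumptions. The function name `simulate_regression` and all numbers are illustrative only.

```python
import random


def simulate_regression(betas, rows, sigma, seed=0):
    """Draw y = b0 + b1*x1 + ... + bk*xk + eps for each row of x values.

    betas = [b0, b1, ..., bk]; eps ~ Normal(0, sigma) is an assumed error model.
    """
    rng = random.Random(seed)
    b0, slopes = betas[0], betas[1:]
    ys = []
    for row in rows:
        if len(row) != len(slopes):
            raise ValueError("each row needs one value per slope coefficient")
        mean = b0 + sum(b * xj for b, xj in zip(slopes, row))  # deterministic part
        ys.append(mean + rng.gauss(0.0, sigma))                  # add random error
    return ys


# Hypothetical coefficients: b0 = 10, b1 = 2, b2 = -1 (k = 2 independent variables)
rows = [(1.0, 3.0), (2.0, 1.0), (4.0, 0.5)]
print(simulate_regression([10.0, 2.0, -1.0], rows, sigma=0.0))  # [9.0, 13.0, 17.5]
```

With `sigma=0.0` the error term vanishes and only the regression line remains. Any positive `sigma` scatters the observations around that line.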

## Linear Regression Assumptions

- The probability distribution of the residual $\varepsilon$ is normal (normality assumption).
- The probability distribution of the residual has constant variance (constant variance assumption).
- Residuals are statistically independent (independence assumption).

## Multiple Linear Regression Model (continued)

*Figure: for a given value $x_i$, the observed value of $y$ differs from the predicted value on the regression line (intercept $\beta_0$, slope $\beta_1$) by the random error $\varepsilon_i$ for that $x$ value.*

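## Least Squares Criterion

The coefficients $b_0$ and $b_1$ are the values that minimize the sum of the squared residuals:

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left(y_i - (b_0 + b_1 x_i)\right)^2$$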
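Setting the partial derivatives of the sum of squared residuals to zero gives the standard closed-form solution:

- $b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$
- $b_0 = \bar{y} - b_1 \bar{x}$

The following Python sketch applies these formulas, assuming a single independent variable. The function name `least_squares_fit` and the toy data are hypothetical. They are not the house-price sample.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals for y = b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_fit(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = 2.20, b1 = 0.60
```

In Excel, the `SLOPE` and `INTERCEPT` worksheet functions return $b_1$ and $b_0$ for the same data.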

## Scatter Plot Examples

*Figure: scatter plots of y versus x illustrating linear relationships, curvilinear relationships, strong relationships, weak relationships, and no relationship.*

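## Calculating the Correlation Coefficient

The correlation coefficient $r$ measures the strength of the linear association between two variables:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}, \qquad -1 \le r \le 1$$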
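The following Python sketch computes the Pearson correlation coefficient directly from its definition. The function name `correlation` and the toy data are hypothetical.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Cross-products and squared deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when x or y is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(f"r = {correlation(x, y):.4f}")  # r = 0.7746
```

You can cross-check the result in two ways:

- On Python 3.10 or later, `statistics.correlation(x, y)` computes the same coefficient.
- In Excel, the `CORREL` worksheet function does the same.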

## Examples of Approximate $r$ Values

*Figure: scatter plots of y versus x for $r = -1$, $r = -0.6$, $r = 0$, $r = +0.3$, and $r = +1$.*

## Example: House Price Model

- A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet).
- A random sample of 10 houses is selected:
  - Dependent variable ($y$) = house price in \$1000s
  - Independent variable ($x$) = square feet

## Sample Data for House Price Model

## Scatter Plot and Correlation Coefficient

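## Regression Using Excel

- Older versions of Excel: Tools / Data Analysis / Regression.
- Excel 2007 and later: Data tab / Data Analysis / Regression. The Data Analysis command requires the Analysis ToolPak add-in.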

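## Excel Output

The regression equation is:

$$\widehat{\text{house price}} = b_0 + b_1\,(\text{square feet})$$

Read $b_0$ and $b_1$ from the Coefficients column of the Excel output.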

## Graphical Presentation

## Analysis of Variance (ANOVA)

## Explained and Unexplained Variation (continued)
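For a least squares fit with an intercept, the total variation splits into explained and unexplained parts:

$$\text{SST} = \text{SSR} + \text{SSE}$$

The ANOVA $F$ statistic compares the mean squares:

$$F = \frac{\text{SSR}/k}{\text{SSE}/(n-k-1)}$$

The following Python sketch computes these quantities for a simple regression ($k = 1$) on hypothetical toy data. The function name `anova_simple_regression` is illustrative. Excel's ANOVA section should report the corresponding values in its SS, MS, and F columns.

```python
def anova_simple_regression(x, y):
    """Sums of squares and F statistic for y = b0 + b1*x fitted by least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
    k = 1                                                  # one independent variable
    msr = ssr / k
    mse = sse / (n - k - 1)
    return {"SST": sst, "SSR": ssr, "SSE": sse, "F": msr / mse}


# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
table = anova_simple_regression(x, y)
for name, value in table.items():
    print(f"{name} = {value:.2f}")  # prints SST = 6.00, SSR = 3.60, SSE = 2.40, F = 4.50
```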

## Coefficient of Determination, $R^2$

- The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable.
- The coefficient of determination is also called R-squared and is denoted as $R^2$, where $0 \le R^2 \le 1$.

$$R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{\text{regression (explained) sum of squares}}{\text{total sum of squares}}$$
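The following Python sketch computes $R^2$ as SSR / SST from fitted values. It assumes the fitted values come from a least squares line with an intercept. The function name `r_squared` and the toy data are hypothetical. The coefficients 2.2 and 0.6 are the least squares fit to that toy data.

```python
def r_squared(y, y_hat):
    """R^2 = SSR / SST for fitted values y_hat from a least squares line with intercept."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)  # regression (explained) sum of squares
    return ssr / sst


# Hypothetical toy data and the least squares line fitted to it (b0 = 2.2, b1 = 0.6)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]
print(f"R^2 = {r_squared(y, y_hat):.2f}")  # R^2 = 0.60
```

Interpret the result the same way as the house-price output. An $R^2$ of 0.60 means 60% of the variation in $y$ is explained by variation in $x$. With one independent variable, Excel's `RSQ` function gives the same value.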

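## Adjusted $R^2$

The adjusted $R^2$ is a statistic that is adjusted for the “size” of the model, that is, the number of factors (independent variables) $k$:

$$R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$$

In this formula, $n$ is the number of observations.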
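The following Python sketch evaluates the adjusted $R^2$ formula. It shows how the same $R^2$ is penalized as the number of factors grows. The function name `adjusted_r_squared` and the input values are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for a model with k independent variables fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical values: the same R^2 = 0.60 from n = 5 observations
print(f"{adjusted_r_squared(0.60, n=5, k=1):.4f}")  # 0.4667 with one regressor
print(f"{adjusted_r_squared(0.60, n=5, k=2):.4f}")  # 0.2000 with two regressors
```

Holding $R^2$ fixed, adding a second regressor lowers the adjusted value. That drop is the adjustment for model size.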

## Excel Output

58.08% of the variation in house prices is explained by variation in square feet ($R^2 \approx 0.5808$).

## Residual Plot

RESIDUAL OUTPUT

| Observation | Predicted House Price | Residuals |
|---|---|---|
| 1 | 251.92316 | -6.923162 |
| 2 | 273.87671 | 38.12329 |
| 3 | 284.85348 | -5.853484 |
| 4 | 304.06284 | 3.937162 |
| 5 | 218.99284 | -19.99284 |
| 6 | 268.38832 | -49.38832 |
| 7 | 356.20251 | 48.79749 |
| 8 | 367.17929 | -43.17929 |
| 9 | 254.6674 | 64.33264 |
| 10 | 284.85348 | -29.85348 |

*Figure: House Price Model Residual Plot, residuals (vertical axis) versus square feet (horizontal axis).*
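A residual is the difference between an observed and a predicted value, $e_i = y_i - \hat{y}_i$, so each observed value equals its predicted value plus its residual. A residual plot graphs $e_i$ against $x_i$. Under the assumptions listed earlier, the points should scatter around zero with no visible pattern or change in spread.

The following Python sketch computes residuals on hypothetical toy data. The function name `residuals` is illustrative. The coefficients are the least squares fit to the toy data.

```python
def residuals(x, y, b0, b1):
    """Residuals e_i = y_i - (b0 + b1 * x_i) for a fitted simple regression line."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical toy data and its least squares coefficients (b0 = 2.2, b1 = 0.6)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
e = residuals(x, y, b0=2.2, b1=0.6)
for xi, ei in zip(x, e):
    # Each (x, residual) pair is one point on a residual plot
    print(f"x = {xi}: residual = {ei:+.2f}")
# With an intercept in the model, least squares residuals sum to zero (up to rounding)
print("residuals sum to ~0:", abs(sum(e)) < 1e-9)
```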
