
Introduction to Regression Analysis Using Excel

Why and When to Use Regression

• Dependent variable or response Y: the variable we wish to explain.
• Independent or regressor variables: the variables used to explain the dependent variable.
• The relation between Y and the independent variables is called a regression model.

Multiple Linear Regression Model

y = β0 + β1 x + ε

where y is the dependent variable, β0 is the intercept, β1 is the slope coefficient, x is the independent variable, and ε is the random error term, or residual.

Linear Regression Assumptions
• The probability distribution of the residual ε is normal (normality assumption).
• The probability distribution of the residual has constant variance (constant variance assumption).
• Residuals are statistically independent.

Multiple Linear Regression Model (continued)

[Figure: plot of y against x showing the regression line with intercept β0 and slope β1. For a given xi, the observed value of y differs from the predicted value of y by the random error εi for that x value.]

Least Squares Criterion
• The estimates b0 and b1 are obtained by finding the values that minimize the sum of the squared residuals.
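For a single independent variable, the least squares criterion has a well-known closed-form solution. The sketch below is one minimal way to compute it; the function names and the toy data are invented for illustration and are not the deck's house price sample.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals of y = b0 + b1*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared differences between observed y and predicted b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data (assumption, for illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

b0, b1 = least_squares_fit(x, y)
best = sum_squared_residuals(x, y, b0, b1)
# Any other line should give a larger sum of squared residuals
worse = sum_squared_residuals(x, y, b0 + 0.5, b1)
print(b0, b1, best, worse)
```

Moving the intercept or slope away from the fitted values can only increase the sum of squared residuals, which is what the comparison between `best` and `worse` illustrates.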

Scatter Plot Examples

[Figure: example scatter plots of y against x illustrating linear relationships, curvilinear relationships, strong relationships, weak relationships, and no relationship.]

Calculating the Correlation Coefficient
• The correlation coefficient r measures the strength of the linear association between the variables.
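As a rough illustration, the sample correlation coefficient can be computed directly from its standard definition. The function name and the small data sets below are assumptions made up for this sketch.

```python
import math


def correlation(x, y):
    """Sample (Pearson) correlation coefficient r between x and y."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data (assumption): perfectly increasing, perfectly
# decreasing, and noisy relationships
r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])
r_noisy = correlation([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(r_up, r_down, r_noisy)
```

Values near +1 or -1 indicate a strong linear association, and values near 0 indicate a weak one or none.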

Examples of Approximate r Values

[Figure: five scatter plots of y against x with r = -1, r = -.6, r = 0, r = +.3, and r = +1.]

Example: House Price Model
• A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet).
• A random sample of 10 houses is selected:
  – Dependent variable (y) = house price in $1000s
  – Independent variable (x) = square feet

Sample Data for House Price Model

Scatter Plot and Correlation Coefficient

Regression Using Excel
• Tools / Data Analysis / Regression

Excel Output
The regression equation is:

Graphical Presentation

Analysis of Variance (ANOVA)

Explained and Unexplained Variation (continued)

Coefficient of Determination, R2
• The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable.
• The coefficient of determination is also called R-squared and is denoted as R2, where 0 ≤ R2 ≤ 1.
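The "portion of total variation explained" can be made concrete by splitting the total sum of squares (SST) into an explained part (SSR) and an unexplained part (SSE). The sketch below fits a least squares line and reports R2 = SSR / SST; the function name and the toy data are assumptions for illustration only.

```python
def variation_breakdown(x, y):
    """Fit y = b0 + b1*x by least squares and return (r2, sst, ssr, sse)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    b1 = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
          / sum((xi - x_mean) ** 2 for xi in x))
    b0 = y_mean - b1 * x_mean
    y_hat = [b0 + b1 * xi for xi in x]
    # Total variation of y around its mean
    sst = sum((yi - y_mean) ** 2 for yi in y)
    # Explained variation: predicted values around the mean
    ssr = sum((fi - y_mean) ** 2 for fi in y_hat)
    # Unexplained variation: observed values around the predictions
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return ssr / sst, sst, ssr, sse


# Hypothetical toy data (assumption, not the house price sample)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 2.0, 6.0, 5.0, 9.0]
r2, sst, ssr, sse = variation_breakdown(x, y)
print(r2, sst, ssr, sse)
```

For a least squares fit with an intercept, SST = SSR + SSE, so R2 always falls between 0 and 1.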

Adjusted R2
• The adjusted R2 is a statistic that is adjusted for the “size” of the model, that is, the number of factors.
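The deck does not spell out the adjustment, so the sketch below uses the commonly quoted formula, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), as an assumption. It is applied to the house price example's figures: an R2 of 58.08%, n = 10 houses, and k = 1 independent variable. The function name is hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R2 for n observations and k independent variables.

    Uses the standard formula 1 - (1 - r2) * (n - 1) / (n - k - 1),
    which is assumed here rather than taken from the deck.
    """
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# House price example: R2 = 0.5808, 10 houses, 1 regressor (square feet)
adj = adjusted_r_squared(0.5808, n=10, k=1)
print(round(adj, 4))
```

Because the penalty grows with k, adding a factor raises the adjusted value only if it improves the fit enough to offset the lost degree of freedom.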

Excel Output
• 58.08% of the variation in house prices is explained by variation in square feet.

Residual Plot

RESIDUAL OUTPUT

Observation   Predicted House Price   Residuals
 1            251.92316                -6.923162
 2            273.87671                38.12329
 3            284.85348                -5.853484
 4            304.06284                 3.937162
 5            218.99284               -19.99284
 6            268.38832               -49.38832
 7            356.20251                48.79749
 8            367.17929               -43.17929
 9            254.6674                 64.33264
10            284.85348               -29.85348

[Figure: House Price Model Residual Plot. Residuals (vertical axis, -60 to 80) plotted against Square Feet (horizontal axis, 0 to 3000).]
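The residual output can be sanity-checked with a few lines of arithmetic. The sketch below assumes one particular reading of the residual output (the pairing of each predicted value with its residual is an assumption), recovers the observed prices as predicted plus residual, and checks that the residuals sum to roughly zero and that the implied R2 agrees with the 58.08% quoted earlier.

```python
# Values as read from the residual output (pairings are an assumption)
predicted = [251.92316, 273.87671, 284.85348, 304.06284, 218.99284,
             268.38832, 356.20251, 367.17929, 254.6674, 284.85348]
residuals = [-6.923162, 38.12329, -5.853484, 3.937162, -19.99284,
             -49.38832, 48.79749, -43.17929, 64.33264, -29.85348]

# Observed price = predicted price + residual (in $1000s)
observed = [p + e for p, e in zip(predicted, residuals)]

# Least squares residuals with an intercept should sum to about zero
residual_sum = sum(residuals)

# Unexplained variation (SSE) and total variation (SST)
sse = sum(e * e for e in residuals)
y_mean = sum(observed) / len(observed)
sst = sum((yi - y_mean) ** 2 for yi in observed)
r2 = 1.0 - sse / sst

print([round(v) for v in observed])
print(round(residual_sum, 4), round(r2, 4))
```

If the residual sum were far from zero, or the implied R2 disagreed with the reported value, that would suggest a transcription error in the table rather than a property of the model.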

Thank you.