

Regression refers to a family of statistical methods for finding the line of best fit for one numerical response (dependent) variable based on one or more explanatory (independent) variables.


Regression: 3 Main Purposes

To describe (or model)
To predict (or estimate)
To control (or administer)


Regression analysis examines associative relationships between a metric dependent variable and one or more independent variables in the following ways:
Determine whether the independent variables explain a significant variation in the dependent variable.
Determine how much of the variation in the dependent variable can be explained by the independent variables (the strength of the relationship).
Predict the values of the dependent variable.

Regression Analysis


Plan an outdoor party.
Estimate number of soft drinks to buy per person, based on how hot the weather is. Use Temperature/Water data and regression.


Real Life Applications

Estimating Seasonal Sales for Department Stores (Periodic)


Predicting Student Grades Based on Time Spent Studying


Practice Problems
Can the number of points scored in a basketball game be predicted by the time a player plays in the game? By the player's height?


Types of Regression Models

Positive Linear Relationship
Negative Linear Relationship
Relationship NOT Linear
No Relationship


Least Squares Method

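The equation for the regression line assumed by the least squares method is

$$Y_i = a + bX_i + e_i, \qquad e_i = Y_i - \hat{Y}_i$$

where
Y is the dependent variable
X is the independent variable
a is the Y-intercept
b is the slope of the line
e_i is the residual (observed $Y_i$ minus fitted $\hat{Y}_i$)

For n observations, the constants are

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}, \qquad a = \bar{Y} - b\bar{X}$$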
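The formulas for a and b above can be traced back to minimizing the sum of squared residuals. The following is a brief sketch of that standard derivation; the symbol S for the sum of squared residuals is introduced here for illustration and does not appear in the original slides.

```latex
% S(a, b): sum of squared residuals (notation assumed for this sketch)
S(a, b) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( Y_i - a - bX_i \right)^2

% Setting both partial derivatives to zero gives the normal equations
\frac{\partial S}{\partial a} = 0 \;\Rightarrow\; \sum Y = na + b\sum X
\frac{\partial S}{\partial b} = 0 \;\Rightarrow\; \sum XY = a\sum X + b\sum X^2

% Solving the two equations for b and a
b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2},
\qquad
a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \bar{Y} - b\bar{X}
```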

Calculations for determining constants a and b

Man Hours (X)    Productivity in Units (Y)    XY        X^2
 3.6              9.3                          33.48     12.96
 4.8             10.2                          48.96     23.04
 2.4              9.7                          23.28      5.76
 7.2             11.5                          82.80     51.84
 6.9             12.0                          82.80     47.61
 8.4             14.2                         119.28     70.56
10.7             18.6                         199.02    114.49
11.2             28.4                         318.08    125.44
 6.1             13.2                          80.52     37.21
 7.9             10.8                          85.32     62.41
 9.5             22.7                         215.65     90.25
 5.4             12.3                          66.42     29.16

ΣX = 84.1        ΣY = 172.9                   ΣXY = 1355.61    ΣX^2 = 670.73    (n = 12)


b ≈ 1.769 and a ≈ 2.01, so the fitted line is Y = 2.01 + 1.769X.
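The calculation of a and b can be checked with a short script. This is a minimal sketch that applies the least squares formulas to the twelve (X, Y) pairs from the table; the names `man_hours`, `productivity`, and `fit_line` are hypothetical and chosen only for this illustration.

```python
# Man-hours (X) and productivity in units (Y), copied from the table.
man_hours = [3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4]
productivity = [9.3, 10.2, 9.7, 11.5, 12.0, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3]


def fit_line(xs, ys):
    """Return (a, b) for the least squares line Y = a + bX.

    Hypothetical helper: uses the summation formulas for b and a directly.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Slope: b = (n*ΣXY - ΣX*ΣY) / (n*ΣX² - (ΣX)²)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = mean(Y) - b * mean(X)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = fit_line(man_hours, productivity)
print(f"a = {a:.2f}, b = {b:.3f}")

# Using the fitted line to predict productivity for 8 man-hours.
predicted = a + b * 8.0
print(f"predicted productivity at X = 8: {predicted:.2f}")
```

Working from the raw sums mirrors the hand calculation in the table, which makes it easy to compare intermediate totals such as ΣX and ΣXY against the slide.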



Coefficient of Multiple Determination

The coefficient of multiple determination measures the magnitude of the association of the variables involved in multiple regression. It is denoted by $R^2$. In mathematical terms, it measures the proportion of the variation in variable Y that is explained by the independent variables:

$$R^2 = \frac{\text{Explained Variance}}{\text{Total Variance}}$$


The Strength of Association: $R^2$

$$R^2 = \frac{\text{Explained Variance}}{\text{Total Variance}}$$

Total Variance = Explained Variance + Unexplained Variance
Explained Variance = Total Variance − Unexplained Variance

$$R^2 = \frac{\text{Total Variance} - \text{Unexplained Variance}}{\text{Total Variance}} = 1 - \frac{\text{Unexplained Variance}}{\text{Total Variance}}$$
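The two equivalent forms of R2 can be compared numerically. The sketch below reuses the man-hours and productivity data from the least squares example and works with sums of squared deviations, which give the same ratio as variances because the common divisor cancels. The function name `r_squared_both_ways` is hypothetical.

```python
# Man-hours (X) and productivity in units (Y) from the least squares example.
man_hours = [3.6, 4.8, 2.4, 7.2, 6.9, 8.4, 10.7, 11.2, 6.1, 7.9, 9.5, 5.4]
productivity = [9.3, 10.2, 9.7, 11.5, 12.0, 14.2, 18.6, 28.4, 13.2, 10.8, 22.7, 12.3]


def r_squared_both_ways(xs, ys):
    """Return R^2 computed as explained/total and as 1 - unexplained/total."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Least squares slope and intercept (deviation form of the same formulas).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    fitted = [a + b * x for x in xs]

    # Variation around the mean of Y, split into explained and unexplained parts.
    total = sum((y - mean_y) ** 2 for y in ys)
    explained = sum((f - mean_y) ** 2 for f in fitted)
    unexplained = sum((y - f) ** 2 for y, f in zip(ys, fitted))

    return explained / total, 1.0 - unexplained / total


r2_explained, r2_residual = r_squared_both_ways(man_hours, productivity)
print(f"explained/total       = {r2_explained:.4f}")
print(f"1 - unexplained/total = {r2_residual:.4f}")
```

The two printed values agree up to floating-point rounding, which is the identity Total = Explained + Unexplained at work for a least squares fit with an intercept.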