
Chapter 9

Linear Regression and Correlation


Prepared by: Fariba Mirzaei, Reza Ahmadi, Romana Dolati

Revision
9.1 LINEAR RELATIONSHIP
One of the simplest relationships between two variables occurs when high values on one measure are associated with high values on another, and low values on one are associated with low values on the other. Such a relationship can be drawn on a graph, where it approximates a straight line. A linear relationship is an association between two variables that can be accurately represented on a graph by a straight line.

Revision Linear Function


The formula $y = \alpha + \beta x$ expresses observations on y as a linear function of observations on x. The formula has a straight-line graph with slope $\beta$ and y-intercept $\alpha$.


If $\beta$ is positive, then y increases as x increases, and the straight line goes upward. So when a relationship between two variables follows a straight line with $\beta > 0$, the relationship is said to be positive. If $\beta$ is negative, then y decreases as x increases; the straight line goes downward, and the relationship is said to be negative. A model is a simple approximation for the relationship between variables in the population. The linear function is the simplest mathematical function for such a model.
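To make the role of the slope concrete, the following minimal Python sketch evaluates a linear function once with a positive slope and once with a negative slope. The function name `linear` and the parameter values (intercept 1.0, slopes 0.5 and -0.5) are hypothetical, chosen only for illustration.

```python
def linear(x, intercept, slope):
    """Return y = intercept + slope * x."""
    return intercept + slope * x


xs = [0, 1, 2, 3]

# Hypothetical parameters, used only to illustrate the sign of the slope.
rising = [linear(x, intercept=1.0, slope=0.5) for x in xs]    # slope > 0
falling = [linear(x, intercept=1.0, slope=-0.5) for x in xs]  # slope < 0

print(rising)   # [1.0, 1.5, 2.0, 2.5]  -> y increases as x increases
print(falling)  # [1.0, 0.5, 0.0, -0.5] -> y decreases as x increases
```

With the positive slope each step in x raises y by 0.5; with the negative slope each step lowers y by the same amount.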

9.2 LEAST SQUARES PREDICTION EQUATION
Using sample data, we can estimate the linear model. The process treats $\alpha$ and $\beta$ in the equation $y = \alpha + \beta x$ as unknown parameters and estimates them. The estimated linear function then provides predicted y-values at fixed values of x.

Outline:
Scatter plot

Prediction Equation

Prediction Errors (Residuals)

Least Squares Estimates

Scatter Plot

The first step of model fitting is to plot the data, to reveal whether a model with a straight line trend makes sense. The data values (x,y) for any one subject form a point relative to the x- and y-axes. A plot of the n observations as n points is called a scatter plot.

Prediction Equation

When the scatter plot suggests that the model $y = \alpha + \beta x$ is realistic, we use the data to estimate this line. The notation $\hat{y} = a + bx$ represents a sample equation that estimates the linear model. In the sample equation, the y-intercept (a) estimates the y-intercept $\alpha$ of the model and the slope (b) estimates the slope $\beta$. Substituting a particular x-value into a + bx provides a value, denoted by $\hat{y}$, that predicts y at that value of x. The sample equation $\hat{y} = a + bx$ is called the prediction equation, because it provides a prediction for the response variable at any value of x.

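The prediction equation is the best straight line, falling closest to the points in the scatter plot, in a sense discussed later in this section. The formulas for a and b in the prediction equation $\hat{y} = a + bx$ are:

$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \qquad a = \bar{y} - b\bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the sample means of x and y.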

Example 1:
The following table shows the statistics marks and the grade point averages (GPA) of ten students. (a) Find the prediction equation. (b) Predict the GPA for a student who got 20 in statistics.

Statistics Mark (x)    10    15     8     5    18    12     6    16    17    12
GPA (y)                 2     3     2   1.5   3.5   2.5     2     3     4     3

Answer
(a) $\hat{y} = a + bx = 0.77 + 0.158x$
(b) For x = 20: $\hat{y} = 0.77 + 0.158(20) = 3.93$
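The estimates in Example 1 can be checked numerically. The sketch below is one possible way to do so, assuming the usual least squares formulas (slope = sum of cross-deviations divided by sum of squared x-deviations, intercept = mean of y minus slope times mean of x). The function name `least_squares` and the variable names are illustrative, not part of the original slides.

```python
# Data from Example 1.
marks = [10, 15, 8, 5, 18, 12, 6, 16, 17, 12]
gpa = [2, 3, 2, 1.5, 3.5, 2.5, 2, 3, 4, 3]


def least_squares(x, y):
    """Return (a, b), the intercept and slope of the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares(marks, gpa)
print(round(a, 2), round(b, 3))   # 0.77 0.158
print(round(a + b * 20, 2))       # 3.93, the predicted GPA for a mark of 20
```

The prediction here uses the unrounded estimates; after rounding it agrees with the hand calculation in the answer.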

Prediction Errors (Residuals)

For an observation, the difference between the observed value and the predicted value of the response variable, $y - \hat{y}$, is called the residual. A positive residual results when the observed value y is larger than the predicted value $\hat{y}$, so $y - \hat{y} > 0$. A negative residual results when the observed value is smaller than the predicted value, so $y - \hat{y} < 0$. The smaller the absolute value of the residual, the better the prediction, since the predicted value is closer to the observed value.

Example 2:
Find and interpret the residuals for (x = 15, y = 3) and (x = 6, y = 2) in Example 1.

Answer
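$\hat{y} = a + bx = 0.77 + 0.158x$
For x = 15: $\hat{y} = 0.77 + 0.158(15) = 3.14$
y = 3, $\hat{y} = 3.14$, so $y - \hat{y} = 3 - 3.14 = -0.14 < 0$
The residual is negative: the observed GPA is 0.14 below the predicted value.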

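Answer (cont.)
$\hat{y} = a + bx = 0.77 + 0.158x$
For x = 6: $\hat{y} = 0.77 + 0.158(6) = 1.718$
y = 2, $\hat{y} = 1.718$, so $y - \hat{y} = 2 - 1.718 = 0.282 > 0$
The residual is positive: the observed GPA is 0.282 above the predicted value.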
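The two residuals in Example 2 can be reproduced with a few lines of Python. This is only a sketch: it uses the rounded coefficients 0.77 and 0.158 from the Example 1 answer, and the helper names `predict` and `residual` are illustrative.

```python
# Rounded estimates taken from the answer to Example 1.
A, B = 0.77, 0.158


def predict(x):
    """Predicted GPA at statistics mark x."""
    return A + B * x


def residual(x, y):
    """Observed value minus predicted value."""
    return y - predict(x)


for x, y in [(15, 3), (6, 2)]:
    r = residual(x, y)
    # A positive residual means the observation lies above the line.
    side = "above" if r > 0 else "below"
    print(f"x={x}: predicted={predict(x):.3f}, residual={r:.3f} ({side} the line)")
```

The first observation falls below the prediction line and the second above it, matching the signs found by hand.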

Least Squares Estimates


The least squares estimates a and b are the values that provide the prediction equation $\hat{y} = a + bx$ for which the residual sum of squares, $SSE = \sum (y - \hat{y})^2$, is a minimum. The symbol SSE is an abbreviation for sum of squared errors. The prediction line $\hat{y} = a + bx$ is called the least squares line, because it is the one with the smallest sum of squared residuals.

Example 3:
(a) Find SSE for the data in Example 1. (b) Show the scatter plot and prediction line.

Answer (a):
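One way to obtain SSE for the Example 1 data is to compute it directly. The sketch below does this under the assumption that a and b come from the usual least squares formulas, and then checks that nudging either coefficient makes SSE larger. The function name `sse` and the size of the nudges are illustrative. Because the code uses unrounded estimates, its result may differ slightly from a hand calculation with a = 0.77 and b = 0.158.

```python
# Data from Example 1.
marks = [10, 15, 8, 5, 18, 12, 6, 16, 17, 12]
gpa = [2, 3, 2, 1.5, 3.5, 2.5, 2, 3, 4, 3]


def sse(a, b, x, y):
    """Sum of squared residuals for the line y_hat = a + b * x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Least squares estimates (unrounded).
n = len(marks)
x_bar = sum(marks) / n
y_bar = sum(gpa) / n
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(marks, gpa))
     / sum((xi - x_bar) ** 2 for xi in marks))
a = y_bar - b * x_bar

best = sse(a, b, marks, gpa)
print(round(best, 3))  # 0.763

# Moving either coefficient away from the least squares value increases SSE.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.01), (0, -0.01)]:
    print(da, db, sse(a + da, b + db, marks, gpa) > best)
```

Each perturbed line gives a larger SSE, which is what it means for the least squares line to have the smallest sum of squared residuals.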

[Figure: two panels, "Scatter Plot" and "Prediction Line" (legend: Observed, Linear), each plotting GPA on the vertical axis against Marks on the horizontal axis.]

References
1. Agresti, A. & Finlay, B. (2007). Statistical Methods for the Social Sciences. Pearson Education, Inc.
2. Freund, J. E. & Perles, B. M. (2004). Statistics: A First Course, Fifth Edition. Pearson: Prentice Hall, New Jersey.

Thank You for Your Attention
