
Lecture 13 Regression and Correlation Analysis

Regression and correlation analysis are two methods for studying relationships between two variables. They are very useful but often misunderstood or misused. The purpose of this section is to explain the basic concepts of these methods and how to interpret the results. Regression and correlation analysis is often used when two measurements or variables are made on the same unit, and it is desired to see whether any relationship exists between the two, or to predict one from the value of the other.

Examples:
• Weight and height of people.
• Hardness and tensile strength of metal. Hardness is easy to measure. Control tensile strength by controlling hardness.

In process control it is often desired to determine the relationship between a process parameter (temperature, pressure, speed) and a product characteristic (diameter, length, hardness). Once this relationship is established, the process parameter can be controlled so that satisfactory product is produced. Regression analysis is the primary method to determine whether a significant relationship is present and to establish an equation for prediction. Regression is used when one knows or suspects a cause-and-effect relationship between two variables: changing one will cause the other to change. Whether there is cause and effect depends on an understanding of basic physical principles, not statistics. Regression analysis does not test for cause and effect; it only expresses the relationship. Correlation only establishes whether a relationship exists, and is used when an equation is not the primary interest. When using these techniques, one may find a relationship that was not expected, or the lack of a relationship when one was expected. The discussion in this section is confined to establishing linear relations between two variables.

The two variables being studied are identified as independent and dependent. The independent variable is identified by X and is the one that will be controlled and used to predict the dependent variable, identified by Y.

Scatter Diagram: The first step in studying the relationship between two variables is to draw a scatter diagram.

Example: Plot the following data on a graph and draw a line that seems to best fit the data.

X   1   2   3   4   5
Y   2   1   3   5   4

It appears that a straight line would approximate the relationship between X and Y.
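As a rough stand-in for plotting by hand, the sketch below prints a text scatter diagram of the example data. It assumes integer-valued X and Y (true for this example), and the function name `text_scatter` is hypothetical; for real data a plotting library would normally be used instead.

```python
def text_scatter(xs, ys):
    """Return a crude text scatter diagram, highest Y on the top row.

    Assumes integer X and Y values; each data point is drawn as '*'.
    """
    points = set(zip(xs, ys))
    x_values = range(min(xs), max(xs) + 1)
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):
        # One cell per integer X position: '*' if a data point sits there.
        cells = ["*" if (x, y) in points else "." for x in x_values]
        rows.append(f"{y:>2} | " + " ".join(cells))
    # X-axis labels, aligned under the cells (the row prefix is 5 characters wide).
    rows.append("     " + " ".join(str(x) for x in x_values))
    return "\n".join(rows)


X = [1, 2, 3, 4, 5]
Y = [2, 1, 3, 5, 4]
print(text_scatter(X, Y))
```

For the example data this prints:

```
 5 | . . . * .
 4 | . . . . *
 3 | . . * . .
 2 | * . . . .
 1 | . * . . .
     1 2 3 4 5
```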

Regression analysis can be used to estimate a line that best fits the data and to determine whether the line is useful in predicting values of Y. The method of least squares provides the proper estimates of the constants $b_0$ and $b_1$ in the straight-line equation $Y = b_0 + b_1 X$. The formulas for $b_0$ and $b_1$ are as follows:

$$b_1 = \frac{A}{C}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$$

where

$$A = n\sum XY - \left(\sum X\right)\left(\sum Y\right), \qquad C = n\sum X^2 - \left(\sum X\right)^2, \qquad \bar{Y} = \frac{\sum Y}{n}, \qquad \bar{X} = \frac{\sum X}{n}$$

Applying this to the example (n = 5) gives the following results:

        X    Y   XY   X²   Y²
        1    2    2    1    4
        2    1    2    4    1
        3    3    9    9    9
        4    5   20   16   25
        5    4   20   25   16
Sum    15   15   53   55   55

$$A = 5(53) - (15)(15) = 265 - 225 = 40$$

$$C = 5(55) - (15)^2 = 275 - 225 = 50$$
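The same arithmetic can be checked with a short Python sketch that follows the A/C formulas directly. The function name `least_squares_line` is illustrative rather than taken from the lecture; in practice a statistics library would normally do this.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the line Y = b0 + b1*X using the A/C formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    a = n * sum_xy - sum_x * sum_y  # A = n(sum XY) - (sum X)(sum Y)
    c = n * sum_x2 - sum_x ** 2     # C = n(sum X^2) - (sum X)^2
    if c == 0:
        raise ValueError("all X values are equal; the slope is undefined")

    b1 = a / c                       # slope
    b0 = sum_y / n - b1 * sum_x / n  # intercept: Y bar - b1 * X bar
    return b0, b1


X = [1, 2, 3, 4, 5]
Y = [2, 1, 3, 5, 4]
b0, b1 = least_squares_line(X, Y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = 0.60, b1 = 0.80
```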

Completing the example:

$$b_1 = \frac{A}{C} = \frac{40}{50} = 0.8$$

$$b_0 = \bar{Y} - b_1\bar{X} = \frac{15}{5} - (0.8)\left(\frac{15}{5}\right) = 3 - 2.4 = 0.6$$

The least squares estimate of the slope $b_1$ is 0.8, and the least squares estimate of the intercept $b_0$ is 0.6, so the fitted line is $Y = 0.6 + 0.8X$.

Least squares analysis provides the best-fit line; however, it does not indicate whether this is a good equation to predict from. The best test is to test the hypothesis that the slope $b_1$ is zero. If the slope is significantly different from 0, then the line is significant. An easier test is to calculate the correlation coefficient, r, and test whether r is significantly different from 0. This is equivalent to testing whether the slope $b_1$ equals 0. The correlation coefficient is calculated by the following formula:

$$r = \frac{A}{\sqrt{B\,C}}, \qquad B = n\sum Y^2 - \left(\sum Y\right)^2$$

If the magnitude of r is greater than or equal to the value of r from Table 1 (see Table 1 on Page 6), then the line is significant, and the equation can be used to make predictions about the Y variable for given values of X.

From the earlier example, A = 40 and C = 50, and

$$B = n\sum Y^2 - \left(\sum Y\right)^2 = 5(55) - (15)^2 = 275 - 225 = 50$$

$$r = \frac{40}{\sqrt{(50)(50)}} = \frac{40}{50} = 0.80$$

For n = 5, r must be at least 0.88. The correlation coefficient (0.80) is less than 0.88, so the line is not significant.
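A minimal sketch of the correlation test: it computes B and r from the same sums and compares the magnitude of r with the Table 1 value for n = 5 (0.88). The function name is hypothetical, and the critical value is hard-coded for this single example rather than looked up.

```python
import math


def correlation_coefficient(xs, ys):
    """Return r = A / sqrt(B * C) using the lecture's sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    a = n * sum_xy - sum_x * sum_y  # A
    b = n * sum_y2 - sum_y ** 2     # B
    c = n * sum_x2 - sum_x ** 2     # C
    return a / math.sqrt(b * c)


X = [1, 2, 3, 4, 5]
Y = [2, 1, 3, 5, 4]
r = correlation_coefficient(X, Y)
critical_r = 0.88  # Table 1 value for sample size n = 5
significant = abs(r) >= critical_r
print(f"r = {r:.2f}, significant: {significant}")  # r = 0.80, significant: False
```

Comparing the magnitude of r (not r itself) makes the same check work for lines with negative slope.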

The correlation coefficient only indicates an association between the two variables; the value of r ranges from -1 to +1. The following graphs indicate the type of relation or association of X to Y for four values of r.

When a significant line is established, the equation can be used to predict values of Y as long as the values of X selected are within the range of X values used to establish the line. Prediction of Y for a particular value of X is obtained by substituting the value of X in the equation and solving for Y.

Example: Suppose the following equation for X and Y was found to be significant:

$$Y = 0.317 + 0.71X$$

Find the value of Y for X = 4:

$$Y = 0.317 + (0.71)(4) = 0.317 + 2.840 = 3.157$$
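Prediction is a single substitution, but the range restriction is easy to forget, so this sketch refuses X values outside the fitted range. The lecture does not state the X range behind Y = 0.317 + 0.71X, so the range below (1 to 10) is a hypothetical placeholder, as is the helper name `predict_y`.

```python
def predict_y(x, b0, b1, x_min, x_max):
    """Predict Y = b0 + b1*X, refusing X outside the range used to fit the line."""
    if not x_min <= x <= x_max:
        raise ValueError(f"X = {x} is outside the fitted range [{x_min}, {x_max}]")
    return b0 + b1 * x


# Hypothetical fitted range: the lecture does not give the X values behind this line.
X_MIN, X_MAX = 1, 10

y = predict_y(4, b0=0.317, b1=0.71, x_min=X_MIN, x_max=X_MAX)
print(f"Y = {y:.3f}")  # Y = 3.157
```

Raising an error, rather than silently extrapolating, mirrors the rule that predictions are only valid within the range of X used to establish the line.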

TABLE 1 Table of Significance of r

To be significant, the magnitude of r must be at least the value listed for the sample size n.

Sample size n    r
4                .95
5                .88
6                .81
7                .75
8                .71
9                .67
10               .63
11               .60
12               .58
13               .55
14               .53
15               .48
20               .42
25               .38
30               .35
35               .33
40               .30
45               .29
50               .27
70               .23
100              .20
150              .16
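For sample sizes between the tabulated rows, one conservative convention (an assumption, not stated in the lecture) is to use the row for the next smaller n, since the required r shrinks as n grows. The sketch below transcribes Table 1 into a dictionary; `critical_r` and `is_significant` are hypothetical helper names.

```python
# Table 1: minimum |r| for significance, keyed by sample size n.
TABLE_1 = {
    4: 0.95, 5: 0.88, 6: 0.81, 7: 0.75, 8: 0.71, 9: 0.67, 10: 0.63,
    11: 0.60, 12: 0.58, 13: 0.55, 14: 0.53, 15: 0.48, 20: 0.42,
    25: 0.38, 30: 0.35, 35: 0.33, 40: 0.30, 45: 0.29, 50: 0.27,
    70: 0.23, 100: 0.20, 150: 0.16,
}


def critical_r(n):
    """Return the Table 1 value for n, using the next smaller tabulated n if n is not listed.

    Falling back to a smaller n gives a larger (stricter) critical value; this
    fallback is an assumption, not part of the original table.
    """
    if n < min(TABLE_1):
        raise ValueError("Table 1 starts at n = 4")
    return TABLE_1[max(size for size in TABLE_1 if size <= n)]


def is_significant(r, n):
    """True if |r| meets or exceeds the Table 1 value for sample size n."""
    return abs(r) >= critical_r(n)


print(critical_r(5), is_significant(0.80, 5))  # 0.88 False
```

With the same r = 0.80, a sample of n = 10 would pass (0.80 is at least .63), which shows how strongly the required r depends on sample size.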