CHAP6 - 062 Linear Regression (2) - 210910 042627
Bivariate Data
Bivariate Data: Consists of the values of two different response variables that are obtained from the same population of interest
Three combinations of variable types:
1. Both variables are qualitative (attribute)
2. One variable is qualitative (attribute) and the other is quantitative (numerical)
3. Both variables are quantitative (both numerical)
[Figure: scatter diagram of Weight (vertical axis) against Height (horizontal axis)]
Example
Example: In a study involving children's fear related to being hospitalized, the age and the score each child made on the Child Medical Fear Scale (CMFS) are given in the table below:
Age (x)  CMFS (y)  Age (x)  CMFS (y)
8        31        7        28
9        25        6        47
9        40        6        42
10       27        8        37
11       35        9        35
9        29        12       16
8        25        15       12
9        34        13       23
8        44        10       26
11       19        10       36
Solution
age = input variable, CMFS = output variable
[Figure: scatter diagram titled "Child Medical Fear Scale", CMFS (vertical axis) against Age (horizontal axis)]
Linear Correlation
Measures the strength of a linear relationship between two variables
As x increases, no definite shift in y: no correlation
As x increases, a definite shift in y: correlation
Positive correlation: x increases, y increases
Negative correlation: x increases, y decreases
If the ordered pairs follow a straight-line path: linear correlation
Example: No Correlation
As x increases, there is no definite shift in y:
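[Figure: scatter diagram of Output (vertical axis) against Input (horizontal axis), showing no correlation]
[Figure: scatter diagram of Output (vertical axis) against Input (horizontal axis)]
[Figure: scatter diagram of Output (vertical axis) against Input (horizontal axis)]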
Please Note
Perfect positive correlation: all the points lie along a line with positive slope
Perfect negative correlation: all the points lie along a line with negative slope
If the points lie along a horizontal or vertical line: no correlation
If the points exhibit some other nonlinear pattern: no linear relationship, no correlation
Need some way to measure correlation
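$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$$

SS(x) = sum of squares for x:
$$SS(x) = \sum x^2 - \frac{(\sum x)^2}{n}$$

SS(y) = sum of squares for y:
$$SS(y) = \sum y^2 - \frac{(\sum y)^2}{n}$$

SS(xy) = sum of squares for xy:
$$SS(xy) = \sum xy - \frac{\sum x \sum y}{n}$$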
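The two ways of writing r describe the same quantity. A short sketch of why, assuming s_x and s_y denote the sample standard deviations computed with divisor n - 1 (an assumption, since the slides do not define them here):

```latex
\begin{align*}
\sum (x - \bar{x})^2 &= \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2
                      = \sum x^2 - \frac{(\sum x)^2}{n} = SS(x) \\
\sum (x - \bar{x})(y - \bar{y}) &= \sum xy - \bar{y}\sum x - \bar{x}\sum y + n\bar{x}\bar{y}
                      = \sum xy - \frac{\sum x \sum y}{n} = SS(xy) \\
(n - 1)\, s_x s_y &= (n - 1)\sqrt{\frac{SS(x)}{n - 1}}\sqrt{\frac{SS(y)}{n - 1}}
                      = \sqrt{SS(x)\, SS(y)} \\
r &= \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}
   = \frac{SS(xy)}{\sqrt{SS(x)\, SS(y)}}
\end{align*}
```

The sums-of-squares form is usually easier to compute by hand because it needs only the column totals of x, y, x^2, y^2 and xy.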
Example
Example: The table below presents the weight (in thousands of pounds) x and the gasoline mileage (miles per gallon) y for ten different automobiles. Find the linear correlation coefficient:

       x      y     x^2      y^2     xy
       2.5    40    6.25     1600    100.0
       3.0    43    9.00     1849    129.0
       4.0    30    16.00    900     120.0
       3.5    35    12.25    1225    122.5
       2.7    42    7.29     1764    113.4
       4.5    19    20.25    361     85.5
       3.8    32    14.44    1024    121.6
       2.9    39    8.41     1521    113.1
       5.0    15    25.00    225     75.0
       2.2    14    4.84     196     30.8
Sum    34.1   309   123.73   10665   1010.9
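$$SS(x) = \sum x^2 - \frac{(\sum x)^2}{n} = 123.73 - \frac{(34.1)^2}{10} = 7.449$$

$$SS(y) = \sum y^2 - \frac{(\sum y)^2}{n} = 10665 - \frac{(309)^2}{10} = 1116.9$$

$$SS(xy) = \sum xy - \frac{\sum x \sum y}{n} = 1010.9 - \frac{(34.1)(309)}{10} = -42.79$$

$$r = \frac{SS(xy)}{\sqrt{SS(x)\, SS(y)}} = \frac{-42.79}{\sqrt{(7.449)(1116.9)}} = -0.47$$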
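The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function names `sums_of_squares` and `linear_correlation` are illustrative choices, not part of the original slides.

```python
from math import sqrt

# Weight (thousands of pounds) and gasoline mileage (mpg) for the ten automobiles.
x = [2.5, 3.0, 4.0, 3.5, 2.7, 4.5, 3.8, 2.9, 5.0, 2.2]
y = [40, 43, 30, 35, 42, 19, 32, 39, 15, 14]


def sums_of_squares(x, y):
    """Return SS(x), SS(y) and SS(xy) using the column-total formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_x = sum(v * v for v in x) - sum_x ** 2 / n
    ss_y = sum(v * v for v in y) - sum_y ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    return ss_x, ss_y, ss_xy


def linear_correlation(x, y):
    """Return r = SS(xy) / sqrt(SS(x) * SS(y))."""
    ss_x, ss_y, ss_xy = sums_of_squares(x, y)
    return ss_xy / sqrt(ss_x * ss_y)


ss_x, ss_y, ss_xy = sums_of_squares(x, y)
r = linear_correlation(x, y)
print(round(ss_x, 3), round(ss_y, 1), round(ss_xy, 2), round(r, 2))
# prints: 7.449 1116.9 -42.79 -0.47
```

The negative sign of r matches what one would expect from the data: heavier automobiles tend to have lower gasoline mileage.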
Please Note
r is usually rounded to the nearest hundredth
r close to 0: little or no linear correlation
As the magnitude of r increases, towards -1 or +1, there is an increasingly stronger linear correlation between the two variables
Method of estimating r based on the scatter diagram. Window should be approximately square. Useful for checking calculations.
Linear Regression
Regression analysis finds the equation of the line that best describes the relationship between two variables
One use of this equation: to make predictions
Logarithmic: $\hat{y} = a \log_b x$
Note: What would a scatter diagram look like to suggest each relationship?
Least squares criterion: the line of best fit is the line for which $\sum (y - \hat{y})^2$ is as small as possible
Illustration
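[Figure: the line $\hat{y} = b_0 + b_1 x$ with y-intercept $b_0$. For an observed point $(x, y)$, the point $(x, \hat{y})$ lies on the line, and $y - \hat{y}$ is the vertical distance between them.]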
Example
Example: A recent article measured the job satisfaction of subjects with a 14-question survey. The data below represent the job satisfaction scores, y, and the salaries, x, for a sample of similar individuals:
x   31  33  22  24  35  29  23  37
y   17  20  13  15  18  17  12  21

1) Draw a scatter diagram for this data
2) Find the equation of the line of best fit
Finding b1 & b0
Preliminary calculations needed to find b1 and b0:
       x     y     x^2    xy
       23    12    529    276
       31    17    961    527
       33    20    1089   660
       22    13    484    286
       24    15    576    360
       35    18    1225   630
       29    17    841    493
       37    21    1369   777
Sum    234   133   7074   4009
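$$SS(x) = \sum x^2 - \frac{(\sum x)^2}{n} = 7074 - \frac{(234)^2}{8} = 229.5$$

$$SS(xy) = \sum xy - \frac{\sum x \sum y}{n} = 4009 - \frac{(234)(133)}{8} = 118.75$$

$$b_1 = \frac{SS(xy)}{SS(x)} = \frac{118.75}{229.5} = 0.5174$$

$$b_0 = \frac{\sum y - b_1 \sum x}{n} = \frac{133 - (0.517429)(234)}{8} = 1.4902$$

Equation of the line of best fit: $\hat{y} = 1.49 + 0.517x$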
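The preliminary calculations and the resulting coefficients can be reproduced in Python. This is a minimal sketch using only the standard library, assuming the usual least squares formulas b1 = SS(xy) / SS(x) and b0 = (sum of y - b1 * sum of x) / n. The function names are illustrative.

```python
# Salary (x) and job satisfaction score (y) from the example.
x = [23, 31, 33, 22, 24, 35, 29, 37]
y = [12, 17, 20, 13, 15, 18, 17, 21]


def line_of_best_fit(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_x = sum(v * v for v in x) - sum_x ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    b1 = ss_xy / ss_x
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


def sum_squared_errors(x, y, b0, b1):
    """Return the sum of (y - y-hat)^2 for the line y-hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = line_of_best_fit(x, y)
print(round(b0, 4), round(b1, 4))
# prints: 1.4902 0.5174

# Any other line gives a larger sum of squared errors than the fitted one.
best = sum_squared_errors(x, y, b0, b1)
print(best < sum_squared_errors(x, y, b0 + 0.1, b1))
print(best < sum_squared_errors(x, y, b0, b1 + 0.01))
```

The last two lines illustrate the least squares idea: nudging either the intercept or the slope away from the fitted values increases the sum of squared vertical distances.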
Scatter Diagram
Solution 2)
[Figure: scatter diagram of Job Satisfaction (vertical axis) against Salary (horizontal axis)]
Please Note
Keep at least three extra decimal places while doing the calculations to ensure an accurate answer
When rounding off the calculated values of b0 and b1, always keep at least two significant digits in the final answer
The slope b1 represents the predicted change in y per unit increase in x
The y-intercept is the value of y where the line of best fit intersects the y-axis
The line of best fit will always pass through the point $(\bar{x}, \bar{y})$
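The last point follows directly from the formula for the intercept. A short sketch, assuming b0 is computed as (sum of y - b1 * sum of x) / n:

```latex
\begin{align*}
b_0 &= \frac{\sum y - b_1 \sum x}{n} = \bar{y} - b_1 \bar{x} \\
\hat{y}\,\big|_{x = \bar{x}} &= b_0 + b_1 \bar{x} = (\bar{y} - b_1 \bar{x}) + b_1 \bar{x} = \bar{y}
\end{align*}
```

So substituting the mean of x into the regression equation always returns the mean of y, whatever the value of the slope.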
Making Predictions
1. One of the main purposes for obtaining a regression equation is for making predictions
2. For a given value of x, we can predict a value of $\hat{y}$
3. The regression equation should be used to make predictions only about the population from which the sample was drawn
4. The regression equation should be used only to cover the sample domain on the input variable. You can estimate values outside the domain interval, but use caution and use values close to the domain interval.
5. Use current data. A sample taken in 1987 should not be used to make predictions in 1999.
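Point 4 can be built into a small prediction helper. The sketch below is illustrative only: the coefficients are assumed values obtained by fitting a least squares line to the job satisfaction example data, the domain limits are the smallest and largest salary in that sample, and the function name `predict` is hypothetical.

```python
import warnings

# Assumed coefficients from a least squares fit to the job satisfaction data.
B0, B1 = 1.4902, 0.5174
# Sample domain of the input variable (smallest and largest salary observed).
X_MIN, X_MAX = 22, 37


def predict(x, b0=B0, b1=B1, x_min=X_MIN, x_max=X_MAX):
    """Return y-hat = b0 + b1 * x, warning when x lies outside the sample domain."""
    if not x_min <= x <= x_max:
        warnings.warn(
            f"x = {x} is outside the sample domain [{x_min}, {x_max}]; "
            "treat the prediction with caution"
        )
    return b0 + b1 * x


print(round(predict(30), 2))
# prints: 17.01
```

A warning rather than an error is used here because the slides allow cautious estimates just outside the domain interval. A stricter design could refuse to predict beyond some tolerance.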