Professional Documents
Culture Documents
Ba 04
Ba 04
Regression
Correlation
220
200
180
160
140
120
100
80 wt (kg)
60 70 80 90 100 110 120
negative relationship
no relationship
Positive relationship
18
16
14
12
Height in CM
10
0
0 10 20 30 40 50 60 70 80 90
Age in Weeks
Negative relationship
Reliability
Age of Car
No relation
Correlation Coefficient
If r = l = perfect correlation.
How to compute the simple correlation
coefficient (r)
xy x y
r n
( x) 2
( y)
2
x
2 . y
2
n n
:Example
A sample of 6 children was selected, data about their
age in years and weight in kilograms was recorded as
shown in the following table . It is required to find the
correlation between age and weight.
xy x y
r n
( x) 2
( y)
2
x
2 . y
2
n n
Age Weight
Serial
(years) (Kg) xy X2 Y2
.n
(x) (y)
1 7 12 84 49 144
2 6 8 48 36 64
3 8 12 96 64 144
4 5 10 50 25 100
5 6 11 66 36 121
6 9 13 117 81 169
Total =x∑ =y∑ xy=∑ =x2∑ =y2∑
41 66 461 291 742
41 66
461
r 6
(41)2 (66)2
291 .742
6 6
r = 0.759
Interpretation of r:
Strong direct correlation
EXAMPLE: Relationship between Anxiety and
Test Scores
Anxiety Test X2 Y2 XY
)X( score (Y)
10 2 100 4 20
8 3 64 9 24
2 9 4 81 18
1 7 1 49 7
5 6 25 36 30
6 5 36 25 30
X = 32∑ Y = 32∑ X2 = 230∑ Y2 = 204∑ XY=129∑
Calculating Correlation Coefficient
r = - 0.94
Interpretation of r:
Indirect strong correlation
Regression Analysis
220
200
180
160
140
120
100
80
Wt (kg)
60 70 80 90 100 110 120
Get the constants for the
regression line
Now we see how to calculate the constants a: y intercept, where the
regression line will meet Y axis and slope i.e. gradient of the straight
line
mathematically 160
140
120
Intercept 100
80
Wt (kg)
Slope 60 70 80 90 100 110 120
Linear Equations
Y
ŷY = bX
a +bX
a
Change
b = Slope in Y
Change in X
a = Y-intercept
X
Using the Least-Square Method
Problem I
The director of a sanitation department is interested in the
relationship between the age of a garbage truck and the annual
repair expense she could expect to incur. In order to determine
this relationship, the director has accumulated information
.concerning four of the trucks the city currently owns
If the city has a truck that is 4 years old, predict the annual repair
.expense for the same
Truck Number Age of the Truck (in Repair Expense during
years) (X) last year in Hundreds
of $ (Y)
A 5 7
B 3 7
C 3 6
D 1 4
b = 0.75
a = 3.75
Therefore the regression equation for y on
:x is given by
y = 3.75 + 0.75 x
Therefore, when x = 4
y = 675.00
Hours studying and grades
Regressing grades on hours
Linear Regression
90.00 Final grade in course = 59.95 + 3.17 * study
R-Square = 0.88
80.00
70.00
x n
2 41678
20
ŷ =112.13 + 0.4547 x
for age 25
B.P = 112.13 + 0.4547 * 25=123.49 = 123.5 mm hg
Checking the Estimating Equation
A crude way to verify the accuracy of the estimating equation
is to determine the graph of the sample points
ŷ
Calculating the upper and the lower
limits of prediction
Y^ +1Se = 675 + 1 (86.60)
$761.60 =
Y^ -1Se = 675 - 1 (86.60)
$588.40 =
i.e., we are roughly 68% confident that the actual repair
expense will be within $588.40 and $761.60
Overhead 191 170 272 155 280 173 234 116 153 178
Units 40 42 53 35 56 39 48 30 37 40
1 80,000 $1,200
2 29,000 $150
3 53,000 $650
4 13,000 $200
5 45,000 $325
:The regression equation is computed as
Y = 50 + .03 X
For example, if X=50,000 then Y = 50 + .03 (50,000) = $1,550
a=50 or the cost of maintenance when X=0; if there is no mileage on
the car, then the yearly cost of maintenance=$50
b=.03 the value that Y increases for each unit increase in X; for each
extra mile driven (X), the cost of yearly maintenance increases by
$.03
s.e.b = .0005; the value of b divided by s.e.b=60.0; the t-table
indicates that the b coefficient of X is statistically significant (it is
related to Y)
r2=.90 we can explain 90% of the variance in repair costs for
different vehicles if we know the vehicle mileage for each car