$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$
[Figure: example scatter plots of Y against X for r = -1, r = -.6, r = 0, r = +1, r = +.3, and r = 0.]
Guide to interpreting ρ (r), using the magnitude |ρ|
|ρ| Interpretation
0 No linear association
0<|ρ|<0.2 Very weak linear association
0.2≤|ρ|<0.4 Weak linear association
0.4≤|ρ|<0.6 Moderate linear association
0.6≤|ρ|<0.8 Strong linear association
0.8≤|ρ|<1.0 Very strong linear association
1.0 Perfect linear association
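The guide can be sketched as a small lookup function. This is only an illustration: the name `interpret_rho` is hypothetical, and applying the guide to the absolute value of the coefficient (so that negative correlations get the same labels) is an assumption.

```python
def interpret_rho(rho: float) -> str:
    """Return the guide's verbal label for a correlation coefficient.

    Assumption: the guide is applied to |rho|, so -0.6 and +0.6
    receive the same label.
    """
    a = abs(rho)
    if a > 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if a == 0.0:
        return "No linear association"
    if a < 0.2:
        return "Very weak linear association"
    if a < 0.4:
        return "Weak linear association"
    if a < 0.6:
        return "Moderate linear association"
    if a < 0.8:
        return "Strong linear association"
    if a < 1.0:
        return "Very strong linear association"
    return "Perfect linear association"


print(interpret_rho(0.3))
print(interpret_rho(-0.6))
```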
Test of hypothesis about ρ
Ho: ρ=0 (There is no linear association between X and Y)
Reject Ho, where tc is the computed t statistic, if:
i. Ha: ρ≠0: |tc|≥tα/2, n-2
ii. Ha: ρ>0: tc≥tα, n-2
iii. Ha: ρ<0: tc≤-tα, n-2
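The three rejection regions can be written as one decision function. This is a minimal sketch: the name `reject_h0` and the labels "two-sided", "greater", and "less" are hypothetical. The critical value is assumed to be read from a t table by the caller (tα/2, n-2 for case i; tα, n-2 for cases ii and iii).

```python
def reject_h0(t_c: float, t_crit: float, alternative: str = "two-sided") -> bool:
    """Apply the rejection rule for Ho: rho = 0.

    t_crit is the positive tabulated critical value with n - 2 d.f.
    """
    if t_crit <= 0:
        raise ValueError("t_crit must be the positive tabulated value")
    if alternative == "two-sided":   # case i:   Ha: rho != 0
        return abs(t_c) >= t_crit
    if alternative == "greater":     # case ii:  Ha: rho > 0
        return t_c >= t_crit
    if alternative == "less":        # case iii: Ha: rho < 0
        return t_c <= -t_crit
    raise ValueError(f"unknown alternative: {alternative!r}")


print(reject_h0(8.42, 2.306, "two-sided"))
```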
Calculation Example
Supply        Current without
Voltage (X)   Electronics (Y)   X²        Y²        XY
0.66          7.32              0.4356    53.5824   4.8312
1.32          12.22             1.7424    149.328   16.1304
1.98          16.34             3.9204    266.996   32.3532
2.64          23.66             6.9696    559.796   62.4624
3.3           28.06             10.89     787.364   92.598
3.96          33.39             15.6816   1114.89   132.224
4.62          34.12             21.3444   1164.17   157.634
3.28          39.21             10.7584   1537.42   128.609
5.94          44.21             35.2836   1954.52   262.607
6.6           47.48             43.56     2254.35   313.368
ΣX=34.3       ΣY=286.01         ΣX²=150.6 ΣY²=9842  ΣXY=1203
Calculation Example
(continued)
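$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}$
$= \dfrac{10(1203) - (34.3)(286.01)}{\sqrt{\left[10(150.6) - (34.3)^2\right]\left[10(9842) - (286.01)^2\right]}} = 0.9479$
$t_c = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \dfrac{0.9479\sqrt{10-2}}{\sqrt{1-(0.9479)^2}} = 8.42$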
Decision rule: Reject Ho if |tc|≥t0.025,8=2.306; otherwise, do not
reject Ho.
Decision: Reject Ho since 8.42>2.306.
Conclusion: At α=0.05, there is sufficient evidence to conclude
that Voltage and Current are positively correlated.
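The calculation can be checked with a short script. This is a sketch using only the standard library. The data are the X and Y columns of the example table, and the critical value 2.306 is taken from the decision rule rather than computed. With unrounded sums it should give r near 0.9479 and tc near 8.42.

```python
import math

# X (supply voltage) and Y (current without electronics) from the example table.
x = [0.66, 1.32, 1.98, 2.64, 3.3, 3.96, 4.62, 3.28, 5.94, 6.6]
y = [7.32, 12.22, 16.34, 23.66, 28.06, 33.39, 34.12, 39.21, 44.21, 47.48]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Computational formula for the sample correlation coefficient.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Test statistic for Ho: rho = 0, with n - 2 degrees of freedom.
t_c = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

T_CRIT = 2.306  # t(0.025, 8) as quoted in the decision rule (from a t table)
reject_h0 = abs(t_c) >= T_CRIT

print(f"r = {r:.4f}, t_c = {t_c:.2f}, reject Ho: {reject_h0}")
```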
Introduction to Regression Analysis
Regression analysis is used to:
– Predict the value of a dependent variable based
on the value of at least one independent variable
– Explain the impact of changes in an independent
variable on the dependent variable
Dependent variable: the variable we wish to
explain
Independent variable: the variable used to
explain the dependent variable
Simple Linear Regression Model
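$y = \beta_0 + \beta_1 x + \varepsilon$
where:
y = Dependent variable
β0 = Population y intercept
β1 = Population slope coefficient
x = Independent variable
ε = Random error term, or residual
[Figure: plot of y against x showing the line y = β0 + β1x + ε, with Intercept = β0 and Slope = β1. For a given xi, the observed value of y, the predicted value of y, and the random error εi for this x value are marked.]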
Estimated Regression Model
The sample regression line provides an estimate of the
population regression line
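$\hat{y}_i = b_0 + b_1 x$
where ŷi is the estimated (predicted) y value, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and x is the independent variable.
b0 and b1 are chosen to minimize the sum of squared residuals:
$\sum e^2 = \sum (y - \hat{y})^2 = \sum (y - (b_0 + b_1 x))^2$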
The Least Squares Equation
• The formulas for b1 and b0 are:
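$b_1 = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
algebraic equivalent:
$b_1 = \dfrac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}$
and
$b_0 = \bar{y} - b_1 \bar{x}$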
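Both forms of the slope formula can be compared numerically. The sketch below assumes the voltage and current data from the earlier calculation example. The variable names are illustrative. The results should be close to a slope of 6.734 and an intercept of 5.503.

```python
# Data assumed from the earlier calculation example (voltage X, current Y).
x = [0.66, 1.32, 1.98, 2.64, 3.3, 3.96, 4.62, 3.28, 5.94, 6.6]
y = [7.32, 12.22, 16.34, 23.66, 28.06, 33.39, 34.12, 39.21, 44.21, 47.48]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Deviation form of the slope.
b1_dev = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
    (a - x_bar) ** 2 for a in x
)

# Algebraic equivalent, using raw sums only.
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
b1 = (sum_xy - sum(x) * sum(y) / n) / (sum_x2 - sum(x) ** 2 / n)

# Intercept from the means and the slope.
b0 = y_bar - b1 * x_bar

print(f"b1 = {b1:.3f} (deviation form: {b1_dev:.3f}), b0 = {b0:.3f}")
```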
Interpretation of the
Slope and the Intercept
Example
$\hat{y}_i = 5.503 + 6.734 x_i$
• b0 is the estimated average value of Y when
the value of X is zero (if x = 0 is in the range of
observed x values)
• b1 is the estimated change in the value of Y
per unit change in X
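A small sketch of how the fitted equation is read. The coefficients 5.503 and 6.734 are the ones shown in the example. The observed range 0.66 to 6.6 is assumed from the voltage column of the calculation example, and the helper names are hypothetical.

```python
B0, B1 = 5.503, 6.734        # fitted intercept and slope from the example
X_MIN, X_MAX = 0.66, 6.6     # assumed observed range of x (voltage column)


def predict(x: float) -> float:
    """Estimated average value of Y at a given x."""
    return B0 + B1 * x


def in_observed_range(x: float) -> bool:
    """True when x lies inside the range of the observed x values."""
    return X_MIN <= x <= X_MAX


# Slope: a one-unit increase in x changes the estimate by b1.
print(predict(4.0) - predict(3.0))

# Intercept: only meaningful as an estimate if x = 0 was observed.
print(in_observed_range(0.0))
```

Under this assumption x = 0 falls outside the observed range, so b0 = 5.503 would be read as the line's offset rather than as an estimate of Y at zero voltage.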
Least Squares Regression Properties
• The sum of the residuals from the least squares
regression line is 0: $\sum (y - \hat{y}) = 0$
• The sum of the squared residuals, $\sum (y - \hat{y})^2$, is a minimum
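Both properties can be checked numerically. The sketch below assumes the voltage and current data from the earlier calculation example. The perturbation sizes of ±0.5 are arbitrary illustrative choices, and checking a few nearby lines is only a spot check, not a proof of the minimum.

```python
# Data assumed from the earlier calculation example.
x = [0.66, 1.32, 1.98, 2.64, 3.3, 3.96, 4.62, 3.28, 5.94, 6.6]
y = [7.32, 12.22, 16.34, 23.66, 28.06, 33.39, 34.12, 39.21, 44.21, 47.48]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
    (a - x_bar) ** 2 for a in x
)
b0 = y_bar - b1 * x_bar


def sse(c0: float, c1: float) -> float:
    """Sum of squared residuals for the line y = c0 + c1 * x."""
    return sum((b - (c0 + c1 * a)) ** 2 for a, b in zip(x, y))


# Property 1: the residuals of the least squares line sum to zero.
residual_sum = sum(b - (b0 + b1 * a) for a, b in zip(x, y))

# Property 2: nearby lines have a larger sum of squared residuals.
sse_fit = sse(b0, b1)
perturbed = [
    sse(b0 + d0, b1 + d1)
    for d0 in (-0.5, 0.0, 0.5)
    for d1 in (-0.5, 0.0, 0.5)
    if (d0, d1) != (0.0, 0.0)
]

print(f"sum of residuals = {residual_sum:.2e}")
print(f"SSE at fit = {sse_fit:.3f}, smallest perturbed SSE = {min(perturbed):.3f}")
```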
Coefficient of Determination, R2
• The coefficient of determination is the
proportion of the total variation in the
dependent variable that is explained by
variation in the independent variable
Coefficient of Determination, R²
(continued)
Coefficient of determination:
$R^2 = \dfrac{SSR}{TSS} = \dfrac{\text{sum of squares explained by regression}}{\text{total sum of squares}}$
In simple linear regression:
$R^2 = r^2$
where:
R² = Coefficient of determination
r = Simple correlation coefficient
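As a rough check of the identity between R² and r² in simple linear regression, the sketch below computes both from the voltage and current data of the earlier calculation example (an assumption about which data to use). The variable names are illustrative.

```python
import math

# Data assumed from the earlier calculation example.
x = [0.66, 1.32, 1.98, 2.64, 3.3, 3.96, 4.62, 3.28, 5.94, 6.6]
y = [7.32, 12.22, 16.34, 23.66, 28.06, 33.39, 34.12, 39.21, 44.21, 47.48]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
ss_xx = sum((a - x_bar) ** 2 for a in x)
tss = sum((b - y_bar) ** 2 for b in y)          # total sum of squares

b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Sum of squares explained by the regression line.
ssr = sum(((b0 + b1 * a) - y_bar) ** 2 for a in x)

r_squared = ssr / tss
r = ss_xy / math.sqrt(ss_xx * tss)              # simple correlation coefficient

print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```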
Assessing the overall goodness-of-fit
of the model
• ANOVA F test: if the F test of the ANOVA is
significant at level α, the estimated equation explains
a significant share of the variation in the data
• R²: the higher the value of R², the better the
fit of the model
• Adjusted R²: more reliable than R²; use this
if there is more than one predictor in the
model
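The slides do not give the formula for adjusted R². The sketch below uses the standard definition, 1 − (1 − R²)(n − 1)/(n − k − 1), with k the number of predictors. The function name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted to n observations.

    Uses the standard definition (not taken from the slides):
    1 - (1 - R^2) * (n - 1) / (n - k - 1).
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Example: R^2 = 0.8986 (the square of r = 0.9479) with n = 10 and k = 1.
print(round(adjusted_r2(0.8986, 10, 1), 4))
```

The penalty grows with k, which is why the adjusted value is preferred when comparing models with different numbers of predictors.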
Standard Error of Estimate
• The standard deviation of the variation of
observations around the regression line is
estimated by
$s_\varepsilon = \sqrt{\dfrac{SSE}{n - k - 1}}$
Where
SSE = Sum of squares error
n = Sample size
k = number of independent variables in the model
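A minimal sketch of the computation, assuming the voltage and current data from the earlier calculation example and a single predictor (k = 1). For these data the estimate should come out near 4.59.

```python
import math

# Data assumed from the earlier calculation example.
x = [0.66, 1.32, 1.98, 2.64, 3.3, 3.96, 4.62, 3.28, 5.94, 6.6]
y = [7.32, 12.22, 16.34, 23.66, 28.06, 33.39, 34.12, 39.21, 44.21, 47.48]
n, k = len(x), 1  # k = number of independent variables in the model

x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
    (a - x_bar) ** 2 for a in x
)
b0 = y_bar - b1 * x_bar

# Sum of squares error: squared distances of observations from the line.
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

# Standard error of estimate.
s_eps = math.sqrt(sse / (n - k - 1))

print(f"SSE = {sse:.2f}, s = {s_eps:.3f}")
```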
The Standard Deviation of the
Regression Slope
• The standard error of the regression slope
coefficient (b1) is estimated by
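$s_{b_1} = \dfrac{s_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}} = \dfrac{s_\varepsilon}{\sqrt{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}}$
where:
sb1 = Estimate of the standard error of the least squares slope
$s_\varepsilon = \sqrt{\dfrac{SSE}{n - 2}}$ = Sample standard error of the estimate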
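The two denominators can be compared in code. This sketch assumes the voltage and current data from the earlier calculation example. For these data the standard error of the slope should be roughly 0.80.

```python
import math

# Data assumed from the earlier calculation example.
x = [0.66, 1.32, 1.98, 2.64, 3.3, 3.96, 4.62, 3.28, 5.94, 6.6]
y = [7.32, 12.22, 16.34, 23.66, 28.06, 33.39, 34.12, 39.21, 44.21, 47.48]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx_dev = sum((a - x_bar) ** 2 for a in x)             # deviation form
ss_xx_raw = sum(a * a for a in x) - sum(x) ** 2 / n      # raw-sum form

b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / ss_xx_dev
b0 = y_bar - b1 * x_bar

# Sample standard error of the estimate (one predictor, so n - 2 d.f.).
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
s_eps = math.sqrt(sse / (n - 2))

# Standard error of the slope.
s_b1 = s_eps / math.sqrt(ss_xx_dev)

print(f"s_b1 = {s_b1:.4f}")
```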
Inference about the Slope:
t Test
• t test for a population slope
– Is there a linear relationship between x and y?
• Null and alternative hypotheses
– H0: β1 = 0 (no linear relationship)
– H1: β1 ≠ 0 (linear relationship does exist)
• Test statistic
– $t = \dfrac{b_1}{s_{b_1}}$
– d.f. = n − 2
where:
b1 = Sample regression slope coefficient
sb1 = Estimator of the standard error of the slope
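A sketch of the slope test, assuming the voltage and current data from the earlier calculation example and α = 0.05. The critical value t0.025,8 = 2.306 is taken from a t table, not computed. In simple linear regression this statistic equals the one used to test ρ = 0, so it should come out near 8.42.

```python
import math

# Data assumed from the earlier calculation example.
x = [0.66, 1.32, 1.98, 2.64, 3.3, 3.96, 4.62, 3.28, 5.94, 6.6]
y = [7.32, 12.22, 16.34, 23.66, 28.06, 33.39, 34.12, 39.21, 44.21, 47.48]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((a - x_bar) ** 2 for a in x)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(ss_xx)

# Test statistic for H0: beta1 = 0, with n - 2 degrees of freedom.
t_stat = b1 / s_b1
df = n - 2

T_CRIT = 2.306  # t(0.025, 8) from a t table; assumes alpha = 0.05, two-sided
reject_h0 = abs(t_stat) >= T_CRIT

print(f"t = {t_stat:.2f}, d.f. = {df}, reject H0: {reject_h0}")
```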
SLR using Data Analysis Toolpak (MS Excel)
Residual Analysis
• Purposes
– Check the linearity assumption
– Check for constant variance at all levels
of x
– Evaluate the normal distribution assumption
• Graphical Analysis of Residuals
– Can plot residuals vs. x
– Can create a histogram of residuals to check
for normality
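Before plotting, the residuals have to be computed and ordered by x. The sketch below does only that step, assuming the voltage and current data from the earlier calculation example. It prints a text listing instead of drawing a plot, and flags the largest residual as a point worth inspecting.

```python
# Data assumed from the earlier calculation example.
x = [0.66, 1.32, 1.98, 2.64, 3.3, 3.96, 4.62, 3.28, 5.94, 6.6]
y = [7.32, 12.22, 16.34, 23.66, 28.06, 33.39, 34.12, 39.21, 44.21, 47.48]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
    (a - x_bar) ** 2 for a in x
)
b0 = y_bar - b1 * x_bar

# (x, residual) pairs sorted by x, ready for a residuals-vs-x plot.
pairs = sorted((a, b - (b0 + b1 * a)) for a, b in zip(x, y))

for xi, e in pairs:
    print(f"x = {xi:5.2f}   residual = {e:+8.3f}")

# The observation farthest from the fitted line.
largest = max(pairs, key=lambda p: abs(p[1]))
print(f"largest residual at x = {largest[0]}: {largest[1]:+.3f}")
```

The residual column can be passed to any plotting or histogram tool to carry out the graphical checks listed above.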
Residual Analysis for Linearity
[Figure: two panels, each showing a plot of y vs. x above a plot of residuals vs. x. Panel labels: "Not Linear" and "✓ Linear".]
Residual Analysis for
Constant Variance
[Figure: two panels, each showing a plot of y vs. x above a plot of residuals vs. x.]