Regression Analysis
Least-Squares Linear Regression
Regression analysis fits a linear (or, after transformation, exponential) function to data. The goal is to develop a statistical model that can be used to predict the values of a dependent (response) variable from the values of the independent variable(s).
Linear fits are most common; for exponential functions, the data must first be transformed.
Method of Least Squares
If we have N pairs of data (x_i, y_i), we seek to fit a straight line through the data of the form:

  y = a0 + a1·x

Determine the constants a0 and a1 such that the distance between the actual y data and the fitted/predicted line is minimized.
Each x_i is assumed to be error free; all of the error is assumed to be in the y values.
Manual Calculation Method
Raw data and working sums:

      y_i    x_i    x_i·y_i   x_i²
      1.2    1.0      1.20     1.00
      2.0    1.6      3.20     2.56
      2.4    3.4      8.16    11.56
      3.5    4.0     14.00    16.00
      3.5    5.2     18.20    27.04
  Sum 12.6   15.2    44.76    58.16
Seeking an equation of the form y = a0 + a1·x, the sums above give:

  y = 0.879 + 0.540x
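The manual calculation above can be checked with a short script. Python is used here purely for illustration (the slides use Excel); this is a minimal sketch of the closed-form least-squares formulas for a0 and a1.

```python
# Least-squares slope and intercept from the raw-data sums (sketch).
x = [1.0, 1.6, 3.4, 4.0, 5.2]
y = [1.2, 2.0, 2.4, 3.5, 3.5]
N = len(x)

Sx = sum(x)                                   # sum of x_i  (15.2)
Sy = sum(y)                                   # sum of y_i  (12.6)
Sxy = sum(xi * yi for xi, yi in zip(x, y))    # sum of x_i*y_i  (44.76)
Sxx = sum(xi * xi for xi in x)                # sum of x_i^2  (58.16)

a1 = (N * Sxy - Sx * Sy) / (N * Sxx - Sx**2)  # slope, ~0.540
a0 = (Sy - a1 * Sx) / N                       # intercept, ~0.878 (slide rounds to 0.879)
```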
How Good Is the Fit?
The coefficient of determination (R²) measures the goodness of fit: it is the proportion of the variation in the y values that is associated with the variation in the x variable in the regression, i.e., the ratio of the explained variation to the total variation.
R² = 1: perfect fit (good prediction). R² = 0: no correlation between x and y. For engineering data, R² will normally be quite high (0.80 to 0.90 or higher). A low value might indicate that some important variable was not considered but is affecting the results.
  R² = 1 - Σ(a1·x_i + a0 - y_i)² / Σ(y_i - ȳ)²   = Excel function RSQ(y_i's, x_i's)

where ȳ = the average of the y_i's.
Standard Error of Estimate (SEE)
The standard error of estimate (SEE or S_yx) is a statistical measure of how well the best-fit line represents the data. It is, effectively, the standard deviation of the differences between the data points and the best-fit line.
It provides an estimate of the scatter/random error in the data about the fitted line. This is analogous to the standard deviation for sample data, and it has the same units as y. Two degrees of freedom are lost in calculating the coefficients a0 and a1.

  sey = SEE = S_yx = [ Σ(y_i - ŷ_i)² / (N - 2) ]^(1/2)   = Excel function STEYX(y_i's, x_i's)

where y_i = actual value of y for a given x_i, and ŷ_i = predicted value of y for a given x_i.
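Both measures can be computed directly from their definitions for the worked example above. A minimal Python sketch (illustrative only; the slides use RSQ and STEYX):

```python
import math

# Coefficient of determination and standard error of estimate (sketch).
x = [1.0, 1.6, 3.4, 4.0, 5.2]
y = [1.2, 2.0, 2.4, 3.5, 3.5]
a0, a1 = 0.8779, 0.5402               # fit from the manual-calculation slide

y_pred = [a0 + a1 * xi for xi in x]   # predicted y at each x_i
ybar = sum(y) / len(y)                # average of the y_i's

ss_res = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))  # unexplained variation
ss_tot = sum((yi - ybar) ** 2 for yi in y)                 # total variation

r2 = 1.0 - ss_res / ss_tot                 # analogous to Excel RSQ()
see = math.sqrt(ss_res / (len(y) - 2))     # analogous to Excel STEYX(); 2 dof lost
```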
Linear Regression Assumptions
Variation in the data is assumed to be normally distributed and due to random causes. Random variation is assumed to exist in the y values, while the x values are error free. Since error has been minimized in the y direction, an erroneous conclusion may be reached if x is estimated from a value of y. For power-law or exponential relationships, the data must be transformed before carrying out a linear regression analysis. (As we will discuss later, the method of least squares can also be applied to nonlinear functional relationships.)
Linear Regression Example
Use Excel's Chart >> Add Trendline to obtain the coefficients, and the functions RSQ() and STEYX() to determine R² and SEE.
[Figure: data with fitted trendline; x-axis: Length, cm]
Regression Analysis Using Excel Analysis Tools
Linear regression is a standard feature of statistical programs and most spreadsheet programs. It is only necessary to input the x and y data; the remaining calculations are performed immediately.
The Excel "Regression Analysis" macro:
- performs linear regression only (nonlinear relationships must be transformed first)
- calculates the slope, intercept, SEE, and the upper and lower confidence intervals for the slope and intercept
- does not produce any graphical output on the user's plot and does not update automatically
- leaves interpretation of the results to the user
Linear Regression in Excel 2008
Model: Y = m1·X + b, with Torque (N·m) as Y and RPM as X. Residual = Y predicted - Y actual; the last column is the standardized residual, Residual/SEE = Residual/sey.

  Torque (Y)  RPM (X)  Y Predicted    Residual       Residual/SEE
  4.89         100     4.998433207     0.108433207    0.17558474
  4.77         201     4.559896053    -0.210103947   -0.340219088
  3.79         298     4.138726707     0.348726707    0.564689451
  3.76         402     3.687163697    -0.072836303   -0.117943051
  2.84         500     3.261652399     0.421652399    0.682777249
  4.12         601     2.823115245    -1.296884755   -2.100031702   <-- Outlier
  2.05         699     2.397603947     0.347603947    0.562871377
  1.61         799     1.963408745     0.353408745    0.572271025

=LINEST(A2:A9,B2:B9,TRUE,TRUE) output:

  m1  = -0.004341952    b   = 5.432628409
  se1 =  0.000954031    seb = 0.481645161
  r^2 =  0.775391233    sey = 0.617554846
  F   =  20.71311576    df  = 6
Linear Regression Example: Omit Outlier

  Torque (Y)  RPM (X)  Y Predicted    Residual       Residual/SEE
  4.89         100     5.000219168     0.110219168    0.504559919
  4.77         201     4.504157858    -0.265842142   -1.21696881
  3.79         298     4.02774254      0.23774254     1.088334807
  3.76         402     3.516946736    -0.243053264   -1.112646171
  2.84         500     3.03561992      0.19561992     0.895506407
  2.05         699     2.058231795     0.008231795    0.037683406
  1.61         799     1.567081983    -0.042918017   -0.196469559

LINEST output:

  m1  = -0.004911498    b   = 5.49136898
  se1 =  0.000348477    seb = 0.170606738
  r^2 =  0.975447633    sey = 0.218446143
  F   =  198.6463557    df  = 5
  ss_reg = 9.479149271  ss_resid = 0.238593586
Uncertainties on Regression
Using the outlier-free fit (SEE = sey = 0.218446143) and t = TINV(α = 0.05, ν = 5) = 2.570581835:

Confidence interval for the regression line:
  95% C.I. = TINV(0.05, 5) · SEE / SQRT(7) = 0.212239784

Prediction band for the regression line:
  95% P.I. = TINV(0.05, 5) · SEE = 0.561533687

Uncertainty in slope:
  Δm1 = TINV(0.05, 5) · se1 = 0.000895789

Uncertainty in intercept:
  Δb = TINV(0.05, 5) · seb = 0.438558582
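These four numbers can be reproduced directly from the slide's t-value and the LINEST standard errors. A minimal Python sketch (Python only for illustration; the slides do this with Excel's TINV):

```python
# Uncertainty calculations for the outlier-free torque fit (sketch).
t = 2.570581835          # two-tailed t for alpha = 0.05, nu = n - 2 = 5 (Excel TINV)
se1 = 0.000348477        # standard error of the slope (LINEST)
seb = 0.170606738        # standard error of the intercept (LINEST)
sey = 0.218446143        # standard error of estimate, SEE (LINEST)
n = 7                    # data points after omitting the outlier

ci_line = t * sey / n ** 0.5    # 95% C.I. on the regression line, ~0.2122
pi_line = t * sey               # 95% prediction band, ~0.5615
d_slope = t * se1               # uncertainty in slope, ~0.000896
d_intercept = t * seb           # uncertainty in intercept, ~0.4386
```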
Regression Line Confidence Intervals & Prediction Band
Not only do you want to obtain a curve-fit relationship, you also want to establish a confidence interval for the equation: a measure of the random uncertainty in the curve fit. Use ν = N - 2 in determining the t-value; two degrees of freedom are lost because m1 and b are determined from the data.
[Figure: regression line with confidence and prediction bands vs. RPM]
Regression Line Confidence Interval & Prediction Band
CI! in ! Curve! Fit! = ± t _{!} _{2} _{,} _{n} _{"} _{2} # sey ^{1}
More accurate
^{$} ^{±} ^{t} ! 2 , n " 2 ^{#} ^{s}^{e}^{y}
$ ± t " 2 , n # 2 sey
Approximate
minimum at mean flares out at low & high extremes
Summations Used in Statistics & Regression

Sample standard deviation:

  S_x = [ (1/(N - 1)) · Σ(x_i - x̄)² ]^(1/2)

Expressions used in regression analysis:

Sum of squares for evaluating CI & PI:

  Sxx = Σ(x_i - x̄)²

Standard error of estimate:

  sey = SEE = S_yx = [ Σ(y_i - ŷ_i)² / (N - 2) ]^(1/2)

where ŷ_i = the predicted value of y at x = x_i.
CI in Slope and Intercept

Slope, m1:     CI in slope = ± t_{α/2, ν} · se1
Intercept, b:  CI in intercept = ± t_{α/2, ν} · seb

Note 1: ν = n - 2. Note 2: m1 and b are not independent variables; therefore, do not apply the RSS method to y = m1·x + b to determine Δy. Instead, use the CI for the curve fit.
Outliers in x-y Data Sets
The method involves computing the ratio of the residuals (predicted - actual) to the standard error of estimate (sey = SEE):
1. Compute the residuals, y_predicted - y_actual, at each x_i.
2. Plot the ratio residual/SEE for each x_i. These are the "standardized residuals".
3. Standardized residuals exceeding ±2 may be considered outliers: assuming the residuals are normally distributed, you can expect 95% of them to lie within ±2 (that is, within two standard deviations of the best-fit line).
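Applied to the torque-vs-RPM data from the Excel 2008 slide, this screen flags only the 601-RPM point. A minimal Python sketch (illustrative):

```python
# Standardized-residual outlier screen for the torque data (sketch).
torque = [4.89, 4.77, 3.79, 3.76, 2.84, 4.12, 2.05, 1.61]
rpm = [100, 201, 298, 402, 500, 601, 699, 799]
m1, b = -0.004341952, 5.432628409     # LINEST slope/intercept, all 8 points
sey = 0.617554846                     # SEE from the same fit

outliers = []
for x, y in zip(rpm, torque):
    resid = (m1 * x + b) - y          # residual: predicted minus actual
    if abs(resid / sey) > 2:          # standardized residual beyond +/-2
        outliers.append(x)            # only the 601-RPM point exceeds the limit
```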
Linear Regression with Data Transformation
Data Transformation
Commonly, test data do not show an approximately linear relationship between the dependent (Y) and independent (X) variables, and a direct linear regression is not useful.
The form of the relationship expected between the dependent and independent variables is often known. The data then need to be transformed prior to performing a linear regression. Transformations can often be accomplished by taking logarithms (base-10 or natural) of one or both sides of the equation.
Common Transformations

  Relationship                      Plot Method                      Transformed      Transformed
                                                                     Intercept, b     Slope, m1
  y = α·x^γ
    Log(y) = Log(α) + γ·Log(x)      Log y vs. Log x (log-log plot)   Log(α)           γ
    Ln(y)  = Ln(α) + γ·Ln(x)        Ln y vs. Ln x (log-log plot)     Ln(α)            γ
  y = α·e^(γx)
    Log(y) = Log(α) + γ·Log(e)·x    Log y vs. x (semilog plot)       Log(α)           γ·Log(e)
    Ln(y)  = Ln(α) + γ·x            Ln y vs. x (semilog plot)        Ln(α)            γ
Regression with Transformation: Example
A velocity probe provides a voltage output E that is related to the velocity U by the form E = δ + ε·U^ρ, where δ, ε, and ρ are constants.

  U (ft/s)   Ei (V)
   0         3.19
  10         3.99
  20         4.30
  30         4.48
  40         4.65

[Figure: Output Voltage, VDC (3 to 5 V) vs. Velocity, ft/s (0 to 50)]
Data Relationship Transformation
E = δ + ε·U^ρ. Since E = δ = 3.19 at U = 0, subtract δ and take logarithms:

  Log(E - 3.19) = Log(ε·U^ρ) = Log(ε) + ρ·Log(U)

Let's transform the data, with X = Log(U) and Y = Log(E - 3.19):

  U (ft/s)   Ei (V)   X       Y
   0         3.19     --      --
  10         3.99     1.00    -0.097
  20         4.30     1.30     0.045
  30         4.48     1.48     0.111
  40         4.65     1.60     0.164

Now perform the regression on the transformed data.
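The transformation and regression can be sketched in a few lines of Python (illustrative; the slides do this in Excel). δ = 3.19 is taken from the measured E at U = 0:

```python
import math

# Log-transform the probe data, then ordinary least squares on (X, Y) (sketch).
U = [10, 20, 30, 40]                  # U = 0 is excluded (log undefined there)
E = [3.99, 4.30, 4.48, 4.65]
X = [math.log10(u) for u in U]        # X = Log(U)
Y = [math.log10(e - 3.19) for e in E] # Y = Log(E - delta), delta = 3.19

n = len(X)
m1 = (n * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)) / \
     (n * sum(x * x for x in X) - sum(X) ** 2)   # transformed slope = rho
b = (sum(Y) - m1 * sum(X)) / n                   # transformed intercept = Log(eps)

rho = m1             # exponent, ~0.432
eps = 10 ** b        # back-transformed coefficient, ~0.298
```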
Solution (Excel 2004 Output)

SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.998723855
  R Square            0.997449339
  Adjusted R Square   0.996174009
  Standard Error      0.0070          (SEE = 0.0070)
  Observations        4

t-value: t_{α/2,ν} = TINV(0.05, 2) = 4.3026

ANOVA
              df   SS            MS         F          Significance F
  Regression   1   0.038118269   0.038118   782.1106   0.00127614
  Residual     2   9.74754E-05   4.87E-05
  Total        3   0.038215745

                 Coefficients   Standard Error   t Stat     P-value    Lower 95%     Upper 95%
  Intercept      -0.525         0.021056315      -24.9274   0.001605   -0.61547736   -0.4342812
  X Variable 1    0.432         0.015438034       27.96624  0.001276    0.36531922    0.49816831

Y = -0.525 + 0.432X
Regression with Transformation & Uncertainty (Example 4.10)
Apply the curve-fit uncertainty to the transformed fit, then transform back with E = 3.19 + 10^Y:

  U (ft/s)   Y predicted   Y+        Y-        E      E+     E-
   0         --            --        --        3.19   3.19   3.19
  10         -0.0931       -0.0781   -0.1082   4.00   4.03   3.97
  20          0.0368        0.0519    0.0218   4.28   4.32   4.24
  30          0.1129        0.1279    0.0978   4.49   4.53   4.44
  40          0.1668        0.1818    0.1518   4.66   4.71   4.61

Transforming the intercept back: the transformed intercept is Log(ε), so

  -0.525 = Log(ε)  →  ε = 0.298

and the final calibration equation is

  E = 3.19 + 0.298·U^0.432

[Figure: E vs. U, ft/s, with the fitted curve and uncertainty band]
Multiple and Polynomial Regression
Regression analysis can also be performed when there is more than one independent variable (multiple regression) or for polynomials of an independent variable (polynomial regression).

Polynomial regression seeks a function of the form:

  Y = b + m1·x + m2·x² + … + mk·x^k

Multiple regression seeks a function of the form:

  Y = b + m1·x̂1 + m2·x̂2 + m3·x̂3 + … + mk·x̂k

where the x̂ may represent several independent variables; for example:

  x̂1 = x1,   x̂2 = x2,   x̂3 = x1·x2
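A minimal sketch of both forms using NumPy (illustrative; the data below is synthetic, generated from known coefficients, so each fit should recover them):

```python
import numpy as np

# Polynomial regression: fit Y = p0 + p1*x + p2*x^2 (sketch, synthetic data).
x = np.linspace(0.0, 5.0, 8)
y = 2.0 + 1.5 * x - 0.5 * x**2            # exact quadratic, no noise
p2, p1, p0 = np.polyfit(x, y, 2)          # polyfit returns highest power first

# Multiple regression with an interaction term, matching the slide's example:
# Y = b + m1*x1 + m2*x2 + m3*(x1*x2), solved via a least-squares design matrix.
rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 5.0, 20)
x2 = rng.uniform(0.0, 5.0, 20)
Y = 1.0 + 2.0 * x1 - 0.5 * x2 + 0.25 * x1 * x2

A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])   # columns: 1, x1, x2, x1*x2
b, m1, m2, m3 = np.linalg.lstsq(A, Y, rcond=None)[0]
```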
Linear Regression in Excel 2004
[Screenshot of the Regression dialog]
- Input the result (y) values
- Input the independent variable (x)
- Input the desired confidence level
Excel 2004 Linear Regression Output

SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.99964308
  R Square            0.99928628      (R²)
  Adjusted R Square   0.99910785
  Standard Error      0.02788582      (SEE = sey)
  Observations        6               (N)

ANOVA
              df   SS           MS           F            Significance F
  Regression   1   4.35502286   4.35502286   5600.45805   1.9107E-07
  Residual     4   0.00311048   0.00077762
  Total        5   4.35813333

                   Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
  Intercept "b"
  Slope "m1"

The Lower 95% and Upper 95% columns bound each coefficient. To obtain the ± bound, simply subtract the lower from the upper and divide by two.