Linear regression
- is used to study a set of variables when it is known that some inherent relationship exists among them
Examples:
It is highly likely that for many example runs in which the inlet temperature is the same, 130°C, the outlet tar content will not be the same.
Automobiles with the same engine volume will not all have the same gas mileage.
Dependent variables - responses in the scenarios (tar content, gas mileage)
Independent variables (regressors) - inlet temperature and engine volume (cubic feet)
• The simplest relationship between the response Y and the regressor x is the linear relationship
$Y = \beta_0 + \beta_1 x$
where $\beta_0$ is the intercept and $\beta_1$ is the slope.
• If the relationship is exact, it is a deterministic relationship between two scientific variables, and there is no random or probabilistic component to it.
A linear relationship can also involve more than one regressor, for example
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Simple Linear Regression Model
In regression analysis, which deals with nondeterministic relationships among variables, there must be a random component in the equation that relates the variables. Indeed, in most applications of regression, the linear equation, say, $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, is an approximation that simplifies something unknown and much more complicated.
• More often than not, models such as $Y = \beta_0 + \beta_1 x$ that simplify complicated and unknown structures are linear in nature. These linear structures are simple and empirical in nature and are thus called empirical models.
Solids Reduction   Oxygen Demand      Solids Reduction   Oxygen Demand
x (%)              Reduction y (%)    x (%)              Reduction y (%)
3                  5                  36                 34
7                  11                 37                 36
11                 21                 38                 38
15                 16                 39                 37
18                 16                 39                 36
27                 28                 39                 45
29                 27                 40                 39
30                 25                 41                 41
30                 35                 42                 40
31                 30                 42                 44
31                 40                 43                 37
32                 32                 44                 44
33                 34                 45                 46
33                 32                 46                 46
34                 34                 47                 49
36                 37                 50                 51
36                 38
The data in the table are plotted in a scatter diagram, as shown in the figure. The points closely follow a straight line, indicating that the assumption of linearity between the two variables appears to be reasonable.
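To work with the pollution table numerically, the 33 pairs can be entered as two Python lists and the column sums computed. This is a sketch; the variable names are our own. The last pair is entered as (50, 51), the value that reproduces the totals Σy = 1124 and Σxy = 41,355 used in the least-squares example for the pollution data (with (50, 50) those totals would be 1123 and 41,305).

```python
# Pollution data: x = solids reduction (%), y = oxygen demand reduction (%).
x = [3, 7, 11, 15, 18, 27, 29, 30, 30, 31, 31, 32, 33, 33, 34, 36, 36,
     36, 37, 38, 39, 39, 39, 40, 41, 42, 42, 43, 44, 45, 46, 47, 50]
y = [5, 11, 21, 16, 16, 28, 27, 25, 35, 30, 40, 32, 34, 32, 34, 37, 38,
     34, 36, 38, 37, 36, 45, 39, 41, 40, 44, 37, 44, 46, 46, 49, 51]

# Sums needed by the closed-form least-squares formulas.
n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)

print(n, sum_x, sum_y, sum_xy, sum_x2)  # 33 1104 1124 41355 41086
```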
Least Squares and the Fitted Model
Fitting an estimated regression line to the data requires determining estimates $b_0$ for $\beta_0$ and $b_1$ for $\beta_1$, and then computing the predicted values from the fitted equation
$\hat{y} = b_0 + b_1 x$
A residual is essentially an error in the fit of the model
$\hat{y} = b_0 + b_1 x$.
Definition
Residual (Error in Fit): Given a set of regression data $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$ and a fitted model $\hat{y}_i = b_0 + b_1 x_i$, the $i$th residual $e_i$ is given by
$e_i = y_i - \hat{y}_i, \quad i = 1, 2, \ldots, n.$
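The residual definition translates directly into code. In this sketch the helper names `fitted` and `residuals` are hypothetical; the coefficients are those of the estimated line ŷ = 3.8296 + 0.9036x from the pollution example, applied to the first three pairs of the table.

```python
def fitted(x, b0, b1):
    """Predicted value y_hat = b0 + b1 * x from the fitted line."""
    return b0 + b1 * x


def residuals(xs, ys, b0, b1):
    """Residuals e_i = y_i - y_hat_i, one per observation."""
    return [yi - fitted(xi, b0, b1) for xi, yi in zip(xs, ys)]


# First three (x, y) pairs of the pollution table.
xs = [3, 7, 11]
ys = [5, 11, 21]
e = residuals(xs, ys, b0=3.8296, b1=0.9036)

print([round(ei, 4) for ei in e])  # [-1.5404, 0.8452, 7.2308]
```

A negative residual means the line over-predicts that observation; a positive one means it under-predicts.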
If the n residuals are large in magnitude, the fit of the model is not good; small residuals are a sign of a good fit.
Equation
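$y_i = b_0 + b_1 x_i + e_i$
Bear in mind that the random errors $\epsilon_i$ in the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ are not observed, whereas the residuals $e_i$ are not only observed but also play an important role in the total analysis.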
The Method of Least Squares
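This method chooses $b_0$ and $b_1$ to minimize the sum of squared residuals
$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2,$
which gives
$b_1 = \dfrac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}, \qquad b_0 = \dfrac{\sum_{i=1}^{n} y_i - b_1 \sum_{i=1}^{n} x_i}{n} = \bar{y} - b_1 \bar{x}.$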
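The closed-form least-squares formulas can be written as a small Python function. It is a sketch: the name `least_squares` is hypothetical, and the checks for mismatched lengths and for all-equal x values (which make the denominator zero) are added safeguards not mentioned in the notes.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the line y_hat = b0 + b1 * x using the closed-form formulas."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_x2 = sum(xi * xi for xi in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = (n * sum_xy - sum_x * sum_y) / denom
    b0 = (sum_y - b1 * sum_x) / n  # equivalently y_bar - b1 * x_bar
    return b0, b1


# Points lying exactly on y = 1 + 2x are recovered exactly.
b0, b1 = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```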
Estimate the regression line for the pollution data
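From the table, $n = 33$ and
$\sum_{i=1}^{33} x_i = 1104, \quad \sum_{i=1}^{33} y_i = 1124, \quad \sum_{i=1}^{33} x_i y_i = 41{,}355, \quad \sum_{i=1}^{33} x_i^2 = 41{,}086.$
Therefore:
$b_1 = \dfrac{(33)(41{,}355) - (1104)(1124)}{(33)(41{,}086) - (1104)^2} = 0.903643$
$b_0 = \dfrac{1124 - (0.903643)(1104)}{33} = 3.829633$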
The estimated regression line is given by
$\hat{y} = 3.8296 + 0.9036x$
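Because the formulas need only n and four sums, the worked example can be checked from its totals alone. The function name `coefficients_from_sums` is hypothetical; the inputs are the sums listed for the pollution data.

```python
def coefficients_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least-squares (b0, b1) computed from n and the four column sums."""
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


b0, b1 = coefficients_from_sums(33, 1104, 1124, 41355, 41086)
print(round(b1, 6), round(b0, 6))  # 0.903643 3.829633
```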
Table of values
n   Stride Length (x)   Speed (y)   xy   x²
1 2.5 3.4 8.5 6.25
2 3 4.9 14.7 9
3 3.3 5.5 18.15 10.89
4 3.5 6.6 23.1 12.25
5 3.8 7.0 26.6 14.44
6 4 7.7 30.8 16
7 4.2 8.3 34.86 17.64
8 4.5 8.7 39.15 20.25
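The notes do not show how the line in the solution was obtained; a reasonable assumption is that it comes from the same least-squares formulas applied to the table's columns. The sketch below does that. It gives b1 ≈ 2.7303 and b0 ≈ -3.3164, which agree with the coefficients used in the solution (2.7302 and -3.3163) up to rounding in the last digit.

```python
# Columns of the table of values.
stride = [2.5, 3.0, 3.3, 3.5, 3.8, 4.0, 4.2, 4.5]  # x
speed = [3.4, 4.9, 5.5, 6.6, 7.0, 7.7, 8.3, 8.7]   # y

n = len(stride)
sum_x = sum(stride)                                  # about 28.8
sum_y = sum(speed)                                   # about 52.1
sum_xy = sum(x * y for x, y in zip(stride, speed))   # about 195.86 (the xy column total)
sum_x2 = sum(x * x for x in stride)                  # about 106.72 (the x² column total)

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = (sum_y - b1 * sum_x) / n

print(round(b1, 4), round(b0, 4))  # 2.7303 -3.3164
```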
Solution:
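$\hat{y} = -3.3163 + 2.7302(3.7) = 6.785$ (interpolation, since x = 3.7 lies inside the observed stride-length range of 2.5 to 4.5)
$\hat{y} = -3.3163 + 2.7302(4.7) = 9.516$ (extrapolation, since x = 4.7 lies outside that range)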
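A prediction helper can also flag whether a new x falls inside the observed range (interpolation) or outside it (extrapolation). In this sketch the function name `predict` and its defaults are hypothetical: the coefficients are those used in the solution, and the range 2.5 to 4.5 is taken from the stride-length column of the table. Evaluating -3.3163 + 2.7302(3.7) gives about 6.785.

```python
def predict(x, b0=-3.3163, b1=2.7302, x_min=2.5, x_max=4.5):
    """Return the predicted speed and whether x lies inside the observed x range."""
    y_hat = b0 + b1 * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y_hat, kind


new_x = (3.7, 4.7)
predictions = [predict(x) for x in new_x]
for x, (y_hat, kind) in zip(new_x, predictions):
    print(x, round(y_hat, 3), kind)
# 3.7 6.785 interpolation
# 4.7 9.516 extrapolation
```

Extrapolated predictions deserve more caution, since the data give no evidence that the linear trend continues beyond the observed range.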