
Regression Equation

Regression Equation:
A mathematical equation that allows us to predict values of one dependent variable from
known values of one or more independent variables is called regression equation.

Types of Regression Equations:

There are the following four types of regression equations:
1. Simple linear regression equation
2. Exponential regression equation
3. Multiple regression equation
4. General linear model

What is Simple Linear Regression Equation (SLR)?

A mathematical equation that allows us to predict the value of one dependent variable from a known value of one independent variable is called a simple linear regression equation.
The prediction equation is Ŷ = a + bx, by which we can predict the value of y on the basis of the predictor x.

The parametric simple linear regression equation is represented by μ_{Y|x} = α + βx, where α and β are the parameters, or regression coefficients. If a and b are the point estimators for α and β, respectively, we can then estimate μ_{Y|x} by Ŷ from the sample regression line Ŷ = a + bx.
If we let eᵢ represent the vertical deviation from the point yᵢ to the regression-line value ŷᵢ, the method of least squares yields formulas for calculating a and b so that the sum of the squares of these deviations is a minimum. This sum of the squares of the deviations is called the sum of squares of the errors about the regression line and is denoted by SSE.

\[ e_i = y_i - \hat{y}_i = y_i - (a + b x_i) \]

\[ SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2 \]

Given the sample {(xᵢ, yᵢ); i = 1, 2, 3, …, n}, the least-squares estimates of the parameters in the regression line are obtained from the formulas:

\[ b = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)} = \frac{S_{xy}}{S_x^2} = \frac{n \sum_{i=1}^{n} x_i y_i - \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} \]

\[ a = \bar{y} - b \bar{x} \]
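The two formulas above can be sketched directly in Python; the data points below are illustrative, not the textbook's.

```python
# Least-squares estimates for the simple linear regression y-hat = a + b*x.
def fit_slr(x, y):
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # b = [n*sum(x*y) - sum(x)*sum(y)] / [n*sum(x^2) - (sum(x))^2]
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n  # a = y-bar - b * x-bar
    return a, b

# Illustrative data (four points with a roughly linear trend).
a, b = fit_slr([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
```

For these points the fitted line is ŷ = 0.15 + 1.94x, which can be checked by hand with the same formulas.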
Now, for any fixed value of x, each observation (xᵢ, yᵢ) in our sample satisfies the relation

\[ y_i = \mu_{Y|x_i} + \varepsilon_i, \]

where εᵢ is a random error representing the vertical deviation of the point from the population regression line (the parametric regression equation). From the previous assumptions on yᵢ, εᵢ must necessarily be a value of a random variable having a mean of zero and variance σ². In terms of the sample regression line, we can also write:

\[ y_i = \hat{y}_i + e_i \]
An essential part of regression analysis involves the construction of confidence intervals for α and β and testing hypotheses concerning these regression coefficients. The hypotheses for testing the coefficients are α = 0 and β = 0. However, the unknown variance σ² must first be estimated from the data. An unbiased estimate of σ² with n − 2 degrees of freedom, denoted by Sₑ², is given by the formula:

\[ S_e^2 = \frac{SSE}{n-2} = \frac{\sum e_i^2}{n-2} = \frac{\sum (y_i - \hat{y}_i)^2}{n-2} \]
In the usual sample variance formula we give up one degree of freedom to obtain an unbiased estimate of the population variance, since only µ is replaced by the sample mean x̄ in the calculation. Here it is necessary to divide by n − 2 in the formula for Sₑ², because two degrees of freedom are lost by replacing α and β with a and b in the calculation of the ŷᵢ's. A simpler formula for the calculation of SSE is:

\[ SSE = (n-1)\left( S_y^2 - b^2 S_x^2 \right), \]

where

\[ S_x^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n-1)} \quad \text{and} \quad S_y^2 = \frac{n \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2}{n(n-1)} \]
Test for Linearity of Regression Equation
OR
Validity of the Regression Model
We define the regression to be linear when all the means of y corresponding to each xᵢ fall on a straight line. A linear regression model is generally preferred over a nonlinear one. We can test the linearity of the regression equation using the ANOVA test. If linearity is confirmed, we can say that the regression model is valid, and then we develop the model.

Calculation of ANOVA:

Values of x                      50         55         65         70       Total
ŷ values corresponding       74.893     79.378     88.348     92.833
to each x                    74.893     79.378     88.348     92.833
                                        79.378     88.348     92.833
                                        79.378
Sum                         149.786    317.512    265.044    278.499    1010.841
Square of sum              22435.85   100813.9   70248.32   77561.69     1021800
Square of sum / nᵢ         11217.92   25203.47   23416.11    25853.9    85149.96

(In the Total column, the grand sum 1010.841 is squared and divided by n = 12.)

Regression sum of squares = (11217.92 + 25203.47 + 23416.11 + 25853.9) − 85149.96

Regression sum of squares = 541.69


Residual sum of squares = SSE = 186.557 and Sₑ² = SSE/(n − 2) = 18.656

ANOVA(b)

Model          Sum of Squares   df   Mean Square        F     Sig.
1  Regression         541.693    1       541.693   29.036   .000(a)
   Residual           186.557   10        18.656
   Total              728.250   11

a. Predictors: (Constant), TestScore

b. Dependent Variable: CheScore

Here the significance value 0.000 < 0.05, which means that the ANOVA test is significant and H₀ is rejected.
Inference: H₀: all means are equal. Its rejection means the regression line is not horizontal; the line has some nonzero slope, and that slope reflects the correlation between the predictor and the response.
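The F statistic in the table can be recovered from the two sums of squares reported above; a small check:

```python
# Regression ANOVA: F = MS_regression / MS_residual.
ss_reg, sse, n = 541.693, 186.557, 12
msr = ss_reg / 1          # regression df = 1 (one predictor)
mse = sse / (n - 2)       # residual df = n - 2 = 10
f_stat = msr / mse        # ~29.04, matching the SPSS table
```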

Inferences Concerning the regression coefficients:


Confidence interval for α

A (1 − α)100% confidence interval for the parameter α in the regression line μ_{Y|x} = α + βx is

\[ a - \frac{t_{\alpha/2}\, S_e \sqrt{\sum_{i=1}^{n} x_i^2}}{S_x \sqrt{n(n-1)}} < \alpha < a + \frac{t_{\alpha/2}\, S_e \sqrt{\sum_{i=1}^{n} x_i^2}}{S_x \sqrt{n(n-1)}} \]
Note that the symbol α is being used here in two totally unrelated ways, first as the level of
significance and then as the intercept of the regression line.

Confidence interval for β

A (1 − α)100% confidence interval for the parameter β in the regression line μ_{Y|x} = α + βx is

\[ b - \frac{t_{\alpha/2}\, S_e}{S_x \sqrt{n-1}} < \beta < b + \frac{t_{\alpha/2}\, S_e}{S_x \sqrt{n-1}} \]
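As a hedged sketch, both intervals can be reproduced from the summary statistics of this chapter's worked IQ-vs-chemistry example (n = 12, a = 30.043, b = 0.897, Sₑ² = 18.656, S_x = 7.82140, x̄ = 60.4167), taking t₀.₀₂₅,₁₀ = 2.228 from the t table; Σxᵢ² is recovered here from S_x² and x̄.

```python
import math

# Summary figures from the worked example; t_{0.025,10} = 2.228.
n, a, b = 12, 30.043, 0.897
se = math.sqrt(18.656)                     # S_e
sx, xbar, t_crit = 7.82140, 60.4167, 2.228

sum_x2 = (n - 1) * sx ** 2 + n * xbar ** 2          # sum of x_i^2
half_a = t_crit * se * math.sqrt(sum_x2) / (sx * math.sqrt(n * (n - 1)))
half_b = t_crit * se / (sx * math.sqrt(n - 1))

ci_alpha = (a - half_a, a + half_a)   # ~ (7.46, 52.63)
ci_beta = (b - half_b, b + half_b)    # ~ (0.526, 1.268)
```

Up to rounding, these match the 95% bounds reported in the SPSS Coefficients table for this example.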

Predictions

The equation Ŷ = a + bx may be used to predict the mean response μ_{Y|x₀} at x = x₀, where x₀ is not necessarily one of the pre-chosen values, or it may be used to predict a single value y₀ of the variable Y₀ when x = x₀. We would expect the error of the prediction to be higher when a single value is predicted than when a mean is predicted. This, then, affects the width of our confidence intervals for the values being predicted.

Predictions for y xo
A (1-α) 100% confidence interval for the mean Y x o is given by:

1 ( xo  x ) 2 1 ( xo  x ) 2
yˆ o  t 2 Se   Y xo  yˆ o  t 2 Se 
n (n  1) S x2 n (n  1) S x2
Predictions for y₀
A (1 − α)100% prediction interval for the single value y₀ when x = x₀ is given by:

\[ \hat{y}_0 - t_{\alpha/2}\, S_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{(n-1) S_x^2}} < y_0 < \hat{y}_0 + t_{\alpha/2}\, S_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{(n-1) S_x^2}} \]

What is the necessary condition for Simple linear regression?

The necessary condition for simple linear regression is that the test must be run between two scale variables.
The variables must be correlated with each other.

How to run the test?

To aid understanding, we take the example from Walpole, page 347.
In this example the two variables are IQ test score and chemistry test score. Both are scale measurements, and theoretically they are correlated with each other.
Interpretation of output:

Descriptive Statistics

                          Mean   Std. Deviation    N
Chemistry test score   84.2500          8.13662   12
IQ test score          60.4167          7.82140   12

The descriptive statistics of the variables.

Correlations

                                          Chemistry test score   IQ test score
Pearson Correlation  Chemistry test score              1.000            .862
                     IQ test score                      .862           1.000
Sig. (1-tailed)      Chemistry test score                  .            .000
                     IQ test score                      .000               .
N                    Chemistry test score                 12              12
                     IQ test score                        12              12

The independent and dependent variables are correlated with each other; therefore the test can be run.

Model Summary(b)

Model        R   R Square   Adjusted R Square   Std. Error of the Estimate
1      .862(a)       .744                .718                      4.31923

a. Predictors: (Constant), IQ test score

b. Dependent Variable: Chemistry test score

- R = 0.862 implies that the dependent and independent variables have a strong, direct relation with each other.
- R Square = 0.744 means that 74.4% of the variation in the dependent variable is explained by the predictor (independent variable) of the model.
- 1 − R Square = 0.256, or 25.6%, is the unexplained variation in the dependent variable, due to all those independent variables that are not in our study or in our model.
ANOVA(b)

Model          Sum of Squares   df   Mean Square        F     Sig.
1  Regression         541.693    1       541.693   29.036   .000(a)
   Residual           186.557   10        18.656
   Total              728.250   11

a. Predictors: (Constant), IQ test score

b. Dependent Variable: Chemistry test score

The value of the F statistic is 29.036, which is very high, and the p-value (the Sig. value) is 0.000, which is less than 0.05 (the level of significance). This implies that the ANOVA test is significant and the model is valid for the given predictors. (See page 365 for further study.)
Coefficients(a)

                  Unstandardized   Standardized                  95.0% Confidence    Collinearity
                   Coefficients    Coefficients                   Interval for B      Statistics
Model               B  Std. Error     Beta         t    Sig.     Lower     Upper   Tolerance    VIF
1  (Constant)  30.043      10.137               2.964   .014     7.458    52.629
   IQ test score .897        .167     .862      5.389   .000      .526     1.268      1.000   1.000

a. Dependent Variable: Chemistry test score

1. t-values are calculated by taking the ratio of the coefficient B to its standard error (e.g., 30.043/10.137 = 2.964).
2. As the standard error increases, the t-value decreases; as the t-value decreases, the significance value (p-value) increases; and if the p-value becomes greater than the level of significance (usually 0.05), the predictor becomes insignificant, or less important, for the model.
3. Here the significance values (p-values) are 0.014 and 0.000; both are less than 0.05, which means that the constant term as well as the coefficient of x are significant for the model.
4. Standardized coefficients (Beta): these can be obtained by standardizing the values of all the predictors and then running the regression analysis again; whatever value of the coefficient is then calculated is the standardized coefficient Beta.
5. If there is more than one predictor, the standardized coefficients (Beta) rank the importance of the predictors: a larger value indicates a more important predictor than a smaller one.
6. The 95% confidence interval for the constant falls between 7.458 and 52.629. This means the researcher is 95% confident that the true value of the constant (estimated as 30.043) may be as small as 7.458 and as large as 52.629. (See pages 358–360 of Walpole for further study.)
7. Since this is a simple linear regression model with only one predictor, tolerance and VIF cannot be explained well here; they will be discussed with the multiple regression model.

The variable saved during the run of the test is RES_1, which shows the residual value; we can check it by taking the difference between the predicted value (PRE_1) and the actual value (chemistry score). Variables 7, 8 and 9, 10 show the 95% confidence intervals for the predicted value ŷ at a specific value of x, on the basis of the sample mean and on an individual basis. See pages 361–363 of Walpole for further understanding. It is important to note here that the interval for an individual value is wider than that for the mean, because a single observation varies about the mean in addition to the uncertainty in estimating the mean itself.

Final Regression model


Yˆ  30.043  0.897 x
The predictor explaining 74.4% (R-Square = 0744) of the model.
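As a final check, the fitted model reproduces the ŷ values tabulated in the ANOVA calculation earlier, and R Square can be recovered from the ANOVA sums of squares; a minimal sketch:

```python
def predict(x):
    # Fitted model: y-hat = 30.043 + 0.897 * x
    return 30.043 + 0.897 * x

# predict(50) gives 74.893 and predict(55) gives 79.378, the y-hat values
# listed for those x values in the ANOVA calculation table.
r_squared = 541.693 / 728.250   # SS_regression / SS_total ~ 0.744
```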
