y = a + b1x1 + b2x2 + b3x3 + ... + bkxk
Even in the case of the population regression plane, not all data points will lie on it. Why? Consider our IRS problem: not all payments to informants will be equally effective, and some of the computer hours may be used for organizing. Therefore, instead of satisfying the above equation exactly, the individual data points will satisfy:
y = a + b1x1 + b2x2 + b3x3 + ... + bkxk + e

This is the population regression plane plus a random disturbance term. The term e is a random disturbance that equals zero on average; its standard deviation is σe. The standard error of the regression, se, which we discussed in the earlier section, is an estimate of σe. Our sample regression equation is:

ŷ = a + b1x1 + b2x2 + b3x3 + ... + bkxk

This equation estimates the unknown population regression plane:

E(y) = A + B1x1 + B2x2 + B3x3 + ... + Bkxk

As we can see, the estimation of a regression plane can also be thought of as a problem of statistical inference, in which we make inferences about an unknown population relationship on the basis of a relationship estimated from sample data. Much as in hypothesis testing for a mean, we can set up confidence intervals for the parameters of the estimated equation. We can also make inferences about the true slope parameters (B1, B2, ..., Bk) on the basis of the estimated slope coefficients (b1, b2, ..., bk).

Tests of Inference for an Individual Slope Parameter Bi

As explained earlier, we can use the individual bi, the estimated slope for the ith variable, to test a hypothesis about Bi, the true population slope for the ith variable. The process of hypothesis testing is the same as that delineated for testing a mean. The test statistic is:

t = (bi − Bi0) / sbi

where
bi = slope of the fitted regression
Bi0 = hypothesized value of the population slope
sbi = standard error of the regression coefficient bi
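As a quick numeric sketch, the t ratio can be computed directly from a coefficient and its standard error. The coefficient 1.36460 and standard error 0.37627 below are taken from the sample regression output reproduced later in this chapter; the hypothesized value Bi0 = 0 corresponds to testing whether the variable is a significant explanatory variable.

```python
def t_ratio(b_i, b_i0, s_bi):
    """t statistic for testing H0: Bi = Bi0 against a two-tailed alternative."""
    return (b_i - b_i0) / s_bi

# Coefficient and standard error taken from the sample output in this
# chapter; we test H0: Bi = 0 (is x_i a significant explanatory variable?).
t = t_ratio(1.36460, 0.0, 0.37627)
print(round(t, 2))  # 3.63, matching the printed t-ratio
```

A statistics package would then convert this t value into a two-tailed p-value for comparison against the chosen significance level.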
RESEARCH METHODOLOGY 11.556
If p > α, xi is not a significant explanatory variable; if p < α, xi is a significant explanatory variable. This test of significance of an explanatory variable is always a two-tailed test: the independent variable xi is a significant explanatory variable if bi is significantly different from zero, which requires that the t ratio be large in magnitude, whether positive or negative. In our IRS example, p is less than .01 for each of the three explanatory variables, so we conclude that each one is a significant explanatory variable.

Test of Significance of the Regression as a Whole

It is quite possible to get a high value of R2 purely by chance. After all, if we threw darts at a board to generate a scatter plot, we could fit a regression to it that might conceivably have a high R2. Therefore we need to ask: does a high value of R2 necessarily mean that the independent variables explain a large proportion of the variation in y, or could it be a fluke? In statistical terms, we ask the following question: is the regression as a whole significant? In the last section we looked at whether the individual xi were significant; now we ask whether all the xi (i = 1, ..., k) together significantly explain the variability in y. Our hypotheses are:

H0: B1 = B2 = ... = Bk = 0 (null hypothesis: y does not depend on the xi)
Ha: at least one Bi ≠ 0 (alternative hypothesis: at least one Bi is not zero)

To explain this concept we go back to our initial diagram, which shows the two-variable case. When we look at the variation in y, we look at three different terms, each of which is a sum of squares. These are denoted as follows:

Total variation:       SST = Σ(y − ȳ)²
Explained variation:   SSR = Σ(ŷ − ȳ)²
Unexplained variation: SSE = Σ(y − ŷ)²
Total variation in y can be broken into two parts, the explained and the unexplained: SST = SSR + SSE. Each of these has an associated number of degrees of freedom. SST has n − 1 degrees of freedom. SSR has k degrees of freedom, because there are k independent variables. SSE has n − k − 1 degrees of freedom, because we used n observations to estimate the k + 1 parameters a, b1, b2, ..., bk. If the null hypothesis is true, we get the following F ratio:
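The decomposition SST = SSR + SSE can be verified numerically. The sketch below uses made-up toy data (not from the text) and the one-predictor case for simplicity: it fits a least-squares line, then checks that total variation splits exactly into explained plus unexplained variation.

```python
# Toy data (hypothetical): fit y = a + b*x by least squares and verify
# that total variation splits into explained plus unexplained variation.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept for the one-predictor case.
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

y_hat = [a + b * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by the regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained (residual)

assert abs(sst - (ssr + sse)) < 1e-9  # SST = SSR + SSE
print(round(ssr / sst, 3))  # R-squared: the explained share of total variation
```

The degrees of freedom split the same way: n − 1 for SST into k for SSR plus n − k − 1 for SSE (here 4 = 1 + 3).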
F = (SSR / k) / (SSE / (n − k − 1))
This statistic has an F distribution with k degrees of freedom in the numerator and n − k − 1 degrees of freedom in the denominator. If the null hypothesis is false, i.e. the explanatory variables have a significant effect on y, then the F ratio tends to be higher than if the null hypothesis is true. So if the F ratio is large, we reject the null hypothesis that the explanatory variables have no effect on the variation of y; that is, we reject H0 and conclude that the regression is significant. Going back to our IRS example, we now look at the computer output. A typical regression output also includes the computed F ratio for the regression. This is sometimes called the ANOVA for the regression, because we break up the analysis of variation in y into explained variance, the variance explained by the regression (between-column variance), and unexplained variance (within-column variance). This is shown in Table 3.
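Using the figures reported for the IRS output later in the chapter (SSR = 29.109 with k = 3, and SSE = 0.491 with n − k − 1 = 6, so n = 10), the F ratio can be reproduced by hand; this is a sketch of the arithmetic, not package output.

```python
def f_ratio(ssr, sse, k, n):
    """F statistic for the overall significance of a regression:
    (SSR / k) / (SSE / (n - k - 1))."""
    return (ssr / k) / (sse / (n - k - 1))

# Figures from the IRS example: SSR = 29.109, SSE = 0.491, k = 3, n = 10
# (giving n - k - 1 = 6 denominator degrees of freedom).
f = f_ratio(29.109, 0.491, 3, 10)
print(round(f, 1))  # close to the 118.52 printed in the output (the SS values are rounded)
```

Such a large F, with 3 and 6 degrees of freedom, corresponds to the near-zero p printed in the output, so the regression as a whole is significant.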
[Figure 3: the total deviation (y − ȳ) split into the explained deviation (ŷ − ȳ) and the unexplained deviation (y − ŷ)]
This is shown in Figure 3 for the one-variable case, for simplicity; for the multiple-variable case the same applies conceptually.
Copy Right: Rai University
Table 3
a. What is the best-fitting regression equation for these data?
b. What percentage of the variation in grades is explained by this equation?
c. What grade would you expect for a 21-year-old student with an IQ of 113 who studied 5 hours and used three different books?

Notes
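Once the regression has been fitted, question (c) amounts to substituting the student's values into the estimated equation. The intercept and slopes below are placeholders, not the answer to the exercise; the sketch only shows the mechanics of prediction.

```python
def predict(a, coefs, values):
    """Evaluate y-hat = a + b1*x1 + ... + bk*xk."""
    return a + sum(b * x for b, x in zip(coefs, values))

# Hypothetical intercept and slopes (for age, IQ, hours studied, books used);
# replace them with the coefficients from your own fitted equation.
a = 10.0
coefs = [0.5, 0.4, 2.0, 1.5]

# The student in part (c): age 21, IQ 113, 5 hours, 3 books.
grade = predict(a, coefs, [21, 113, 5, 3])
```

The same function works for any number of predictors, since it just forms the linear combination a + Σ bi·xi.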
The ANOVA portion of the sample output for the IRS problem (Table 3):

SOURCE        DF    SS        MS       F         p
Regression     3    29.109    9.703    118.52    0.00
Error          6    0.491     0.082
Total          9    29.600

Here SSR = 29.109 with k = 3 degrees of freedom, and SSE = 0.491 with n − k − 1 = 6 degrees of freedom.
Coefficient portion of the sample output (the predictor names and the constant's coefficient were not legible in the original):

Predictor    Coef       Stdev      t-ratio
Constant                41.55      -1.20
x1           1.06931    0.98163     1.09
x2           1.36460    0.37627     3.63
x3           2.03982    1.50799     1.35
x4           1.78990    0.67332    -2.67

s = 11.657    R-sq = 76.7%