Here Y, the mortality rate, is our dependent variable. We hypothesize a relationship between X1 and Y and between X2 and Y, where a positive (+) sign denotes a positive relationship. For example, we can assume the coefficient on X1 is greater than 0, which gives a positive sign. But we do not yet know what the actual sign will be: this is our hypothesis, not an estimate. Once we estimate the model in Excel or other software, we will know the estimated sign.
An error term in statistics is a value that represents how observed data differ from actual population data. It can also be a variable that represents how a given statistical model differs from reality. An error term essentially means that the model is not completely accurate and produces differing results in real-world applications. For example, assume there is a multiple linear regression function that takes the following form:
Y=αX+βρ+ϵ
X, ρ=Independent variables
ϵ=Error term
When the actual Y differs from the expected or predicted Y in the model during an empirical test, then
the error term does not equal 0, which means there are other factors that influence Y.
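As a small illustration of this idea, the sketch below computes the error for a few observations as the observed Y minus the value predicted by αX + βρ. The coefficients and data points are hypothetical and chosen only for illustration; they do not come from our data set.

```python
# Hypothetical coefficients for Y = alpha*X + beta*rho + error (illustration only).
alpha, beta = 2.0, 0.5


def predicted_y(x, rho):
    """Model prediction without the error term: alpha*X + beta*rho."""
    return alpha * x + beta * rho


# Hypothetical observations: (X, rho, observed Y).
observations = [
    (1.0, 4.0, 4.3),
    (2.0, 2.0, 4.9),
    (3.0, 6.0, 9.0),
]

# Error = observed Y minus predicted Y; a nonzero value means
# something other than X and rho influenced Y.
errors = [y - predicted_y(x, rho) for x, rho, y in observations]

for (x, rho, y), e in zip(observations, errors):
    print(f"X={x}, rho={rho}, observed Y={y}, error={e:.2f}")
```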
Showing these helps us understand what we are doing when we run our regression equation. In our data set the mortality rate, GNP, and female literacy rate are quantitative data, because they are expressed as numbers. The total fertility rate is also recorded as a number in our data, so it is quantitative as well.
Statistic            Observation   Child mortality rate   Female literacy rate, percent   Per capita GNP   Total fertility rate
Mean                 32.5          141.5                  51.1875                         1401.25          5.549688
Standard Error       2.327373      9.497258               3.250982                        340.712          0.188624
Median               32.5          138.5                  48                              620              6.04
Mode                 #N/A          142                    22                              300              6.5
Standard Deviation   18.61899      75.97807               26.00786                        2725.696         1.508993
Sample Variance      346.6667      5772.667               676.4087                        7429417          2.27706
Kurtosis             -1.2          -0.57317               -1.35364                        34.30127         -0.0983
Skewness             -2.9E-17      0.286098               0.079812                        5.411062         -0.63557
Range                63            300                    86                              19710            6.8
Minimum              1             12                     9                               120              1.69
Maximum              64            312                    95                              19830            8.49
Sum                  2080          9056                   3276                            89680            355.18
Count                64            64                     64                              64               64
This table summarizes our data. The first column describes only the observation number (1 to 64), so the variables start from the second column. The mean child mortality rate is 141.5, the mean female literacy rate is 51.19 percent, the mean per capita GNP is 1401.25, and the mean total fertility rate is 5.55. The standard deviation is 75.98 for child mortality, 26.01 for female literacy rate, 2725.70 for per capita GNP, and 1.51 for total fertility rate.
Skewness is the degree of distortion from the symmetrical bell curve of the normal distribution. A symmetrical distribution has a skewness of 0. In the summary of our data, the skewness is 0.29 for child mortality rate, 0.08 for female literacy rate, 5.41 for per capita GNP, and -0.64 for total fertility rate. So the data show both positive and negative skewness, with per capita GNP strongly skewed to the right.
Finally, the last row is Count, which is the sample size (n); here n = 64 for every variable.
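The summary statistics can be reproduced outside a spreadsheet. The sketch below is a minimal version using only Python's standard library; the function name describe is our own, and the skewness line assumes the adjusted sample skewness formula that spreadsheets commonly report. As input it uses the observation numbers 1 to 64, the only column whose raw values can be recovered from the table.

```python
import math
import statistics


def describe(values):
    """Summary statistics in the style of a spreadsheet's descriptive output."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    # Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / sd)^3)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),  # standard error of the mean
        "Median": statistics.median(values),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(values),
        "Skewness": skew,
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


# Observation numbers 1..64, as implied by Minimum 1, Maximum 64, Count 64.
observation = list(range(1, 65))
summary = describe(observation)

for name, value in summary.items():
    print(f"{name}: {value}")
```

The output for this column should agree with the Observation column of the table up to rounding, which also confirms that the standard error is the standard deviation divided by the square root of n.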
4. Methodology
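Our model is Y = Bo + B1X1 + B2X2 + B3X3 + E, where Y is the child mortality rate, X1 is the female literacy rate, X2 is per capita GNP, X3 is the total fertility rate, and E is the error term. We estimate this model using the OLS method. In statistics, OLS means ordinary least squares, a type of linear least squares method for estimating the unknown parameters in a linear regression model.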
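To make the OLS idea concrete, the sketch below estimates the coefficients by solving the normal equations (X'X)b = X'y with plain Gaussian elimination. This is only an illustration of the technique, not the procedure the spreadsheet uses internally; the function name ols and the small data set are hypothetical, and the data are built from known coefficients so the result can be checked.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X: list of rows of regressors (without the constant term).
    y: list of responses.
    Returns [b0, b1, ..., bk], where b0 is the intercept.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 + 0.5*x3 (no error term).
X = [(1, 2, 3), (2, 1, 0), (0, 1, 5), (3, 3, 1), (4, 0, 2), (1, 5, 2)]
y = [1 + 2 * a - 3 * b + 0.5 * c for a, b, c in X]

coefficients = ols(X, y)
print([round(b, 6) for b in coefficients])
```

Because the hypothetical data contain no error term, the estimated coefficients should recover the intercept and slopes used to generate them.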
The estimated equation is:
Y = 168.3067 - 1.768X1 - 0.00551X2 + 12.86864X3
Here, Bo = 168.3067 is the vertical intercept of the model. B1 = -1.768 indicates that when the female literacy rate increases by 1 percentage point, child mortality decreases by 1.768 per thousand on average, holding the other factors (per capita GNP, total fertility rate) constant. B2 = -0.00551 indicates that when per capita GNP increases by 1 unit, child mortality decreases by 0.00551 per thousand on average, holding the other factors (female literacy, total fertility rate) constant. B3 = 12.86864 indicates that when the total fertility rate increases by 1 unit, child mortality increases by 12.86864 per thousand on average, holding the other factors (female literacy, per capita GNP) constant. So, from this equation we can say that only the total fertility rate has a positive relationship with the child mortality rate.
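The interpretation above can be checked by plugging values into the estimated equation. The sketch below uses the coefficients reported in this section; the function name predict_child_mortality is our own. Evaluating it at the sample means from the summary table should give a value close to 141.5, the mean of child mortality, because an OLS line with an intercept passes through the point of means.

```python
# Estimated coefficients reported in this section.
B0, B1, B2, B3 = 168.3067, -1.768, -0.00551, 12.86864


def predict_child_mortality(female_literacy, per_capita_gnp, total_fertility):
    """Predicted child mortality from the estimated regression equation."""
    return (
        B0
        + B1 * female_literacy
        + B2 * per_capita_gnp
        + B3 * total_fertility
    )


# Sample means taken from the descriptive statistics table.
at_means = predict_child_mortality(51.1875, 1401.25, 5.549688)
print(f"Prediction at the sample means: {at_means:.2f}")

# Marginal effect of one more percentage point of female literacy,
# holding per capita GNP and total fertility constant.
effect = (
    predict_child_mortality(52.1875, 1401.25, 5.549688) - at_means
)
print(f"Change for +1 point of female literacy: {effect:.3f}")
```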
Regression Statistics
Multiple R           0.864507
R Square             0.747372
Adjusted R Square    0.73474
Standard Error       39.13127
Observations         64
In this model the R-squared value is 0.747372, which means that 74.74% of the variance in the child mortality rate is explained by the three independent variables: female literacy rate, per capita GNP, and total fertility rate. As the R-squared value is above 70%, we can say that the model is a good fit.
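The R-squared and adjusted R-squared values can be recomputed from the sums of squares in the ANOVA output of this report. The sketch below assumes the usual definitions, R² = SS regression / SS total and adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), with n = 64 observations and k = 3 independent variables.

```python
# Sums of squares as reported in the ANOVA output of this report.
ss_regression = 271802.6
ss_total = 363678.0

n = 64  # number of observations
k = 3   # number of independent variables

# Share of the total variation explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R-squared penalizes the number of regressors.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"R Square: {r_squared:.6f}")
print(f"Adjusted R Square: {adjusted_r_squared:.5f}")
```

Both values should agree with the regression statistics table up to rounding.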
The t-test is the partial (individual) significance test. If the absolute value of the t statistic is greater than the critical t value, we reject the null hypothesis in favor of the alternative hypothesis. Equivalently, we can compare the probability value (p-value) with the level of significance to decide whether a coefficient is significant.
Hypothesis Building:
For B1, Ho: B1 = 0 [There is no relationship between child mortality rate and female literacy]
Ha: B1 ≠ 0 α = 5%
The t statistic for B1 is -7.12866, which is negative. The p-value for B1 is 0.00000000151, which is less than the 0.05 level of significance. So we reject the null hypothesis at the 5% level and accept the alternative hypothesis: there is a statistically significant relationship, at the 5% level, between child mortality rate and female literacy rate.
For B2, Ho: B2 = 0 [There is no relationship between child mortality and per capita GNP]
Ha: B2 ≠ 0 α = 5%
The t statistic for B2 is -2.93428, which is negative. The p-value for B2 is 0.004731, which is less than the 0.05 level of significance. So we reject the null hypothesis at the 5% level and accept the alternative hypothesis: there is a statistically significant relationship, at the 5% level, between child mortality rate and per capita GNP.
For B3, Ho: B3 = 0 [There is no relationship between child mortality and total fertility rate]
Ha: B3 ≠ 0 α = 5%
The t statistic for B3 is 3.070883, which is positive. The p-value for B3 is 0.003205, which is less than the 0.05 level of significance. So we reject the null hypothesis at the 5% level and accept the alternative hypothesis: there is a statistically significant relationship, at the 5% level, between child mortality rate and total fertility rate.
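The three decisions above follow the same rule, which can be written down once. The sketch below applies both the p-value rule and the critical-value rule to the t statistics and p-values reported in this section. The critical value 2.000 is an assumption taken from a standard t table (two-tailed, 5%, 60 residual degrees of freedom); it does not appear in the regression output.

```python
ALPHA = 0.05
# Assumed two-tailed 5% critical value for 60 degrees of freedom,
# read from a standard t table (not part of the regression output).
T_CRITICAL = 2.000

# (t statistic, p-value) for each coefficient, as reported in this section.
t_tests = {
    "B1 (female literacy rate)": (-7.12866, 0.00000000151),
    "B2 (per capita GNP)": (-2.93428, 0.004731),
    "B3 (total fertility rate)": (3.070883, 0.003205),
}

decisions = {}
for name, (t_stat, p_value) in t_tests.items():
    reject_by_p = p_value < ALPHA            # p-value approach
    reject_by_t = abs(t_stat) > T_CRITICAL   # critical-value approach
    decisions[name] = reject_by_p and reject_by_t
    verdict = "reject Ho" if decisions[name] else "do not reject Ho"
    print(f"{name}: t = {t_stat}, p = {p_value} -> {verdict}")
```

The two rules should always agree when the critical value matches the degrees of freedom, so checking both is only a safeguard against reading the wrong row of the t table.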
ANOVA
             df    SS          MS          F          Significance F
Regression   3     271802.6    90600.87    59.16767   6.46E-18
Residual     60    91875.38    1531.256
Total        63    363678
The F-test is an overall test. The F-test of overall significance indicates whether the linear regression model provides a better fit to the data than a model that contains no independent variables, so it tests the joint effect of the independent variables. If the F statistic is greater than the critical value, we reject the null hypothesis in favor of the alternative hypothesis. Equivalently, if the probability value is less than α, the F-test is significant at the α level.
Hypothesis Building:
Ho: B1 = B2 = B3 = 0 [There is no joint relationship between the dependent and independent variables]
Ha: At least one of B1, B2, B3 ≠ 0 α = 5%
The F statistic is 59.16767. The p-value is 6.46E-18, which is less than the 0.05 level of significance. That means our sample data provide sufficient evidence to conclude that our regression model fits the data better than a model with no independent variables. So we reject the null hypothesis at the 5% level and accept the alternative hypothesis: female literacy rate, per capita GNP, and total fertility rate are jointly significant, at the 5% level, in explaining the child mortality rate.
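The F statistic can be recomputed from the ANOVA table as the ratio of the regression mean square to the residual mean square. The sketch below uses the sums of squares and degrees of freedom reported in the table; it also shows that the square root of the residual mean square reproduces the standard error of the regression.

```python
import math

# Values reported in the ANOVA table.
ss_regression, df_regression = 271802.6, 3
ss_residual, df_residual = 91875.38, 60

# Mean squares: each sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# Overall F statistic.
f_statistic = ms_regression / ms_residual

# Standard error of the regression is the root of the residual mean square.
standard_error = math.sqrt(ms_residual)

print(f"MS regression: {ms_regression:.2f}")
print(f"MS residual: {ms_residual:.3f}")
print(f"F: {f_statistic:.5f}")
print(f"Standard error of regression: {standard_error:.5f}")
```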
In this case all pairwise correlation coefficients among the independent variables are below 0.70 and above -0.70, so the model shows no sign of a serious multicollinearity problem.
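The pairwise check described above can be sketched as follows. The function names and the three short columns x1, x2, x3 are hypothetical and serve only to show the mechanics; they are not our data. The 0.70 cutoff is the rule of thumb used in this report.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def high_correlation_pairs(columns, threshold=0.70):
    """Return the pairs of variables whose |correlation| reaches the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical columns (illustration only): x2 is an exact multiple of x1.
columns = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10],
    "x3": [5, 1, 4, 2, 3],
}

flagged = high_correlation_pairs(columns)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b}: r = {r:.2f} (possible multicollinearity)")
```

An empty result for the real independent variables would correspond to the conclusion stated above.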
Heteroskedasticity:
Based on the residual scatter plot for female literacy rate, the points are diffused and do not form a clear, specific pattern. So it can be concluded that the regression model, with respect to female literacy rate, does not have a heteroscedasticity problem and satisfies the constant variance assumption.
Based on the residual scatter plot for total fertility rate, the points are diffused and do not form a clear, specific pattern. So it can be concluded that the regression model, with respect to total fertility rate, does not have a heteroscedasticity problem and satisfies the constant variance assumption.
Normality Assumption:
[Figure: normal probability plot; horizontal axis: Sample Percentile; series: Series1]
As more than 80% of the standardized residuals of this model lie between -2 and +2, we can say that the model approximately satisfies the normality assumption.
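The share of standardized residuals inside the range from -2 to +2 can be counted directly. The sketch below assumes the residuals are standardized by dividing by their sample standard deviation, which may differ slightly from how a given software package standardizes them. The function name and the residual values are hypothetical and only illustrate the calculation.

```python
import statistics


def share_within(residuals, limit=2.0):
    """Fraction of standardized residuals lying in [-limit, +limit]."""
    sd = statistics.stdev(residuals)  # sample standard deviation
    standardized = [r / sd for r in residuals]
    inside = sum(1 for z in standardized if abs(z) <= limit)
    return inside / len(residuals)


# Hypothetical residuals (illustration only); they sum to zero,
# as OLS residuals do when the model has an intercept.
residuals = [-10, -1, -1, 0, 0, 0, 0, 1, 1, 10]

share = share_within(residuals)
print(f"Share of standardized residuals within +/-2: {share:.0%}")
```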
[Figures: line fit plots of child mortality (the number of deaths of children under age 5 in a year…) against per capita GNP and against female literacy rate, percent]
A good line fit refers to a line through a scatter plot of data points that best expresses the relationship between those points. If the actual and predicted values are close, we can say the plot shows a good line fit. In this case the predicted and actual values of child mortality plotted against per capita GNP are close to each other. So we can conclude that the plot of child mortality against per capita GNP shows a good line fit.
The plot of child mortality against female literacy also shows a good line fit.
Conclusion: In this research we tested our hypotheses using the t-test, the F-test, R-squared, and the p-value approach. We then carried out post-estimation analysis: a heteroskedasticity check for constant variance, the normality assumption, a multicollinearity check, and finally the line fit. Both the individual tests and the overall test are significant at the 5% level. Our R-squared value is more than 70%, which indicates a good fit. We also found no excessive correlation among the independent variables and no heteroskedasticity problem. So our results support the accuracy of the model and the choice of model.
Thank You