
Term Paper On “Child Mortality Rate”

Course title: Statistics for Business and Economics II


Course Code: Eco 204
Section: 02
Group name: Horizon

PREPARED FOR

BIPLOB KUMAR NANDI


Lecturer, Department of Economics
East West University

PREPARED BY

NAME ID

Sultan Ahmed Niloy 2017-1-30-008


Imam Mehedi Hasan 2018-1-10-092
Kh. Bushra Rawnak 2019-2-30-052
Monisha Bhattacharjee 2018-1-10-256

DATE OF SUBMISSION: 26th September, 2020


1. Introduction:
The Excel data set covers four variables: child mortality (the number of deaths of children under
age 5 in a year per 1000 live births), the female literacy rate (percent), per capita GNP and the
total fertility rate. We want to find out whether the female literacy rate, GNP and the fertility
rate matter for the child mortality rate. The data are secondary data: we were not able to carry
out a field survey, so we collected an existing data set. We chose these variables because we
expect them to be related. For example, lower child mortality tends to go together with higher
GNP. Likewise, a lower female literacy rate is likely to worsen both child mortality and GNP:
women with less education are less likely to know about the harmful consequences of child
marriage and of becoming a mother at an early age, which raises the mortality of both children
and mothers. In our data, Y = child mortality rate is the dependent variable, while X1 = female
literacy rate, X2 = GNP and X3 = fertility rate are the independent variables.
Objectives of the study:
The model is built on a research hypothesis drawn from the Excel data set on the child mortality
rate. The objective of this study is to determine whether the female literacy rate, GNP and the
fertility rate matter for the child mortality rate. The model is
Y = Bo + B1X1 + B2X2 + B3X3 + E
Here Y = Child Mortality Rate, X1 = Female Literacy Rate, X2 = GNP, X3 = Total Fertility Rate.

2. Population Regression Model

A regression model is normally used to establish whether there is a relationship between
variables, and more specifically whether that relationship is statistically significant. We assume
that Y = Bo + B1X1 + B2X2 + B3X3 + E

Here Y = mortality rate is our dependent variable. As written, every term enters with a positive
(+) sign, which would suggest a positive relationship between X1 and Y or between X2 and Y. We
could assume, for example, that B1 > 0, which is a positive sign. But these signs belong only to
the hypothesized model; nothing has been estimated yet, so we do not know what the actual signs
are. Once we estimate the coefficients with Excel or other software, we will know the sign of each
coefficient.
An error term in statistics is a value that represents how the observed data differ from the actual
population data. It can also be seen as a variable that represents how a given statistical model
differs from reality. An error term essentially means that the model is not completely accurate and
gives differing results in real-world applications. For example, assume there is a multiple linear
regression function that takes the following form:

Y=αX+βρ+ϵ

where: α, β=Constant parameters

X, ρ=Independent variables

ϵ=Error term

When the actual Y differs from the expected or predicted Y in the model during an empirical test, then
the error term does not equal 0, which means there are other factors that influence Y.

3. Summary of the Data:


A summary of the data means that, before estimating the regression equation and its coefficients
by OLS, we show the basic properties of the variables included in the model. In this part we want
to understand the pattern of the data, so the summary reports the mean, standard deviation,
skewness and so on.

With these statistics in hand, it is easier to understand what to expect when we run the
regression. In our data set the child mortality rate, per capita GNP, the female literacy rate and
the total fertility rate are all quantitative data, because they are expressed as numbers.

Statistic           Observation  Child mortality rate  Female literacy rate, percent  Per capita GNP  Total fertility rate
Mean                32.5         141.5                 51.1875                        1401.25         5.549688
Standard Error      2.327373     9.497258              3.250982                       340.712         0.188624
Median              32.5         138.5                 48                             620             6.04
Mode                #N/A         142                   22                             300             6.5
Standard Deviation  18.61899     75.97807              26.00786                       2725.696        1.508993
Sample Variance     346.6667     5772.667              676.4087                       7429417         2.27706
Kurtosis            -1.2         -0.57317              -1.35364                       34.30127        -0.0983
Skewness            -2.9E-17     0.286098              0.079812                       5.411062        -0.63557
Range               63           300                   86                             19710           6.8
Minimum             1            12                    9                              120             1.69
Maximum             64           312                   95                             19830           8.49
Sum                 2080         9056                  3276                           89680           355.18
Count               64           64                    64                             64              64

This table shows the summary of our data. The Observation column appears to be only the index of
the observations (1 to 64). The mean of the child mortality rate is 141.5, of the female literacy
rate 51.19, of per capita GNP 1401.25 and of the total fertility rate 5.55.

The standard deviation is 75.98 for child mortality, 26.01 for the female literacy rate, 2725.70
for per capita GNP and 1.51 for the total fertility rate.

Skewness is the degree of distortion from the symmetrical bell curve of the normal distribution.
A symmetrical distribution has a skewness of 0. In the summary, the skewness is 0.29 for the
child mortality rate, 0.08 for the female literacy rate, 5.41 for per capita GNP and -0.64 for the
total fertility rate. So the data contain both positively and negatively skewed variables, with
per capita GNP strongly skewed to the right.

Finally, the last row, Count, is the sample size (n = 64).
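
The summary statistics can be reproduced outside Excel. The sketch below is only an illustration: the function name describe is our own, and it assumes that the Observation column is simply the index 1 to 64, which is consistent with its minimum, maximum, sum and count in the table. It uses the sample (n - 1) variance and the adjusted skewness formula that Excel's SKEW function applies.

```python
import math

def describe(values):
    """Return count, mean, sample variance, standard deviation, standard error and skewness."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance divides by n - 1, matching the "Sample Variance" row.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    sd = math.sqrt(variance)
    # Adjusted sample skewness: n / ((n-1)(n-2)) * sum of cubed standardized deviations.
    skew = n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)
    return {
        "count": n,
        "mean": mean,
        "variance": variance,
        "sd": sd,
        "se": sd / math.sqrt(n),  # standard error of the mean
        "skewness": skew,
    }

# Assumption: the Observation column is the index 1, 2, ..., 64.
observation = list(range(1, 65))
stats = describe(observation)
print(stats)
```

Under that assumption the function returns the mean 32.5, sample variance about 346.67 and a skewness of essentially zero, in line with the Observation column of the table. The same function could be applied to the other columns if the raw data were available.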

4. Methodology
Our model is Y = Bo + B1X1 + B2X2 + B3X3 + E, and we estimate it by the OLS method. In
statistics, OLS stands for ordinary least squares, a type of linear least squares method for
estimating the unknown parameters in a linear regression model. It chooses the coefficients that
minimize the sum of squared residuals.
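
To make the OLS idea concrete, here is a minimal sketch that estimates the coefficients by solving the normal equations (X'X)b = X'y. It is not the procedure used in this paper (the estimates were produced in Excel); the function names solve and ols and the toy data are hypothetical and serve only as an illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the input lists are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Estimate [b0, b1, ..., bk] by solving the normal equations (X'X) b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical toy data (not the paper's data set), generated exactly from
# y = 10 - 2*x1 + 0.5*x2 + 3*x3, so OLS should recover these coefficients.
rows = [(1, 2, 1), (2, 1, 3), (3, 5, 2), (4, 3, 5), (5, 8, 4), (6, 4, 7)]
y = [10 - 2 * a + 0.5 * b + 3 * c for a, b, c in rows]
coefficients = ols(rows, y)
print([round(b, 6) for b in coefficients])
```

Because the toy data contain no error term, the estimated coefficients coincide (up to rounding) with the ones used to generate y. With real data the residuals are not zero and the estimates are the values that make their sum of squares as small as possible.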

5. Analysis of our Regression Estimation Result:

The estimated linear regression equation is

Y = 168.3067 - 1.768X1 - 0.00551X2 + 12.86864X3

Here Bo = 168.3067 is the vertical intercept of the model. B1 = -1.768 indicates that when the
female literacy rate rises by 1 percentage point, child mortality falls by 1.768 deaths per 1000
live births on average, holding the other factors (per capita GNP, total fertility rate) constant.
B2 = -0.00551 indicates that when per capita GNP rises by 1 unit, child mortality falls by
0.00551 deaths per 1000 live births on average, holding the other factors (female literacy, total
fertility rate) constant. B3 = 12.86864 indicates that when the total fertility rate rises by 1
unit, child mortality rises by 12.86864 deaths per 1000 live births on average, holding the other
factors (female literacy, per capita GNP) constant. So, from this equation, only the total
fertility rate has a positive relationship with the child mortality rate.
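
As a worked illustration of how the estimated equation is read, the short sketch below plugs values into it. The function name predict is our own; the coefficients are the reported estimates and the inputs are the sample means from the summary table.

```python
# Coefficients as reported in the estimated regression equation.
B0, B1, B2, B3 = 168.3067, -1.768, -0.00551, 12.86864

def predict(female_literacy, gnp, fertility):
    """Predicted child mortality (deaths per 1000 live births)."""
    return B0 + B1 * female_literacy + B2 * gnp + B3 * fertility

# Sample means of the three regressors, taken from the summary table.
at_means = predict(51.1875, 1401.25, 5.549688)
# Effect of one extra percentage point of female literacy, other things fixed.
effect = predict(52.1875, 1401.25, 5.549688) - at_means
print(round(at_means, 2), round(effect, 3))
```

The prediction at the sample means comes out close to 141.5, the sample mean of child mortality, which is what we expect because an OLS line with an intercept passes through the point of means. The second number is simply B1, the change in predicted mortality for a one-point rise in female literacy.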

Regression Statistics
Multiple R           0.864507
R Square             0.747372
Adjusted R Square    0.73474
Standard Error       39.13127
Observations         64

In this model the R-squared value is 0.747372, which means that 74.74% of the variance of the
child mortality rate is explained by the three independent variables: female literacy rate, per
capita GNP and total fertility rate. As the R-squared value is above 70%, we can say that the
model is a good fit.
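
The adjusted R-squared in the table penalizes R-squared for the number of regressors. As a small check, the sketch below recomputes it from the reported R-squared with n = 64 observations and k = 3 independent variables; the function name is our own.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# R-squared, sample size and number of regressors as reported in this paper.
adj = adjusted_r_squared(0.747372, n=64, k=3)
print(round(adj, 5))
```

The result agrees with the Adjusted R Square value of 0.73474 reported by Excel.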

                               Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept                      168.3067      32.89165        5.117003  3.44E-06  102.5136   234.0998   102.5136     234.0998
Female literacy rate, percent  -1.768        0.248017        -7.12866  1.51E-09  -2.26414   -1.27192   -2.26414     -1.27192
Per capita GNP                 -0.00551      0.001878        -2.93428  0.004731  -0.00927   -0.00175   -0.00927     -0.00175
Total fertility rate           12.86864      4.190533        3.070883  0.003205  4.486323   21.25095   4.486323     21.25095

The t-test is the partial significance test, or individual test. If the absolute value of the t
statistic is greater than the critical t value, we reject the null hypothesis in favour of the
alternative hypothesis. Equivalently, we can compare the probability value (P-value) with the level
of significance to decide whether a coefficient is significant.

Hypothesis Building:

For B1, Ho: B1 = 0 [There is no relationship between child mortality rate and female literacy]
Ha: B1 ≠ 0 α = 5%

The t statistic for B1 is -7.12866, a negative value. The P-value for B1 is 0.00000000151, which
is less than the .05 level of significance. So we reject the null hypothesis at the 5% level and
accept the alternative hypothesis. There is a statistically significant relationship between the
child mortality rate and the female literacy rate at the 5% level.

For B2, Ho: B2 = 0 [There is no relationship between child mortality and per capita GNP]

Ha: B2 ≠ 0 α = 5%

The t statistic for B2 is -2.93428, a negative value. The P-value for B2 is 0.004731, which is
less than the .05 level of significance. So we reject the null hypothesis at the 5% level and
accept the alternative hypothesis. There is a statistically significant relationship between the
child mortality rate and per capita GNP at the 5% level.

For B3, Ho: B3 = 0 [There is no relationship between child mortality and total fertility rate]

Ha: B3 ≠ 0 α = 5%

The t statistic for B3 is 3.070883, a positive value. The P-value for B3 is 0.003205, which is
less than the .05 level of significance. So we reject the null hypothesis at the 5% level and
accept the alternative hypothesis. There is a statistically significant relationship between the
child mortality rate and the total fertility rate at the 5% level.
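
The t statistics and 95% intervals in the coefficient table can be recomputed from the coefficients and standard errors alone. The sketch below is an illustration under one stated assumption: the critical value 2.0003 is the two-sided 5% value of Student's t with 60 degrees of freedom (n - k - 1 = 64 - 3 - 1), taken from a standard t table. The function name t_test is our own.

```python
# Coefficient estimates and standard errors from the coefficient table.
estimates = {
    "Intercept": (168.3067, 32.89165),
    "Female literacy rate": (-1.768, 0.248017),
    "Per capita GNP": (-0.00551, 0.001878),
    "Total fertility rate": (12.86864, 4.190533),
}

# Assumption: two-sided 5% critical value of Student's t with 60 degrees of freedom.
T_CRIT = 2.0003

def t_test(coef, se, t_crit=T_CRIT):
    """Return the t statistic, the 95% interval and the reject/keep decision for H0: B = 0."""
    t_stat = coef / se
    lower, upper = coef - t_crit * se, coef + t_crit * se
    return t_stat, lower, upper, abs(t_stat) > t_crit

results = {name: t_test(c, s) for name, (c, s) in estimates.items()}
for name, (t_stat, lower, upper, reject) in results.items():
    print(f"{name}: t = {t_stat:.3f}, 95% CI = ({lower:.4f}, {upper:.4f}), reject H0: {reject}")
```

Small differences from the Excel output in the last decimals come from the rounding of the reported coefficients. None of the three slope intervals contains zero, which is the same conclusion as the P-value comparison.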

ANOVA

            df  SS        MS        F         Significance F
Regression  3   271802.6  90600.87  59.16767  6.46E-18
Residual    60  91875.38  1531.256
Total       63  363678

The F-test is an overall test. The F-test of overall significance indicates whether the linear
regression model provides a better fit to the data than a model that contains no independent
variables; it tests the joint effect of the regressors. If the F statistic is greater than the
critical value, we reject the null hypothesis in favour of the alternative hypothesis.
Equivalently, if the probability value (Significance F) is less than α, the F-test is significant
at the α level.
Model Building:

Ho: B1 = B2 = B3 = 0 [There is no joint relationship between the dependent and independent variables]

Ha: At least one of B1, B2, B3 ≠ 0 α = 5%

The F statistic is 59.16767. The P-value is 6.46E-18, which is less than the .05 level of
significance. That means our sample data provide sufficient evidence to conclude that our
regression model fits the data better than a model with no independent variables. So we reject the
null hypothesis at the 5% level and accept the alternative hypothesis. There is a jointly
significant relationship at the 5% level between the child mortality rate and the female literacy
rate, per capita GNP and total fertility rate.
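
The F statistic, R-squared and the standard error of the regression all follow from the sums of squares in the ANOVA table. The following sketch recomputes them as a consistency check; the variable names are our own.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 271802.6, 3
ss_residual, df_residual = 91875.38, 60

ms_regression = ss_regression / df_regression   # mean square, regression
ms_residual = ss_residual / df_residual         # mean square, residual
f_stat = ms_regression / ms_residual
r_squared = ss_regression / (ss_regression + ss_residual)
standard_error = ms_residual ** 0.5             # standard error of the regression

print(round(f_stat, 3), round(r_squared, 4), round(standard_error, 3))
```

The three values reproduce, up to rounding, the F of 59.16767, the R Square of 0.747372 and the Standard Error of 39.13127 reported by Excel.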

Multicollinearity by using the correlation matrix:

Child mortality = the number of deaths of children under age 5 in a year per 1000 live births.

                               Child mortality  Female literacy rate, percent  Per capita GNP  Total fertility rate
Child mortality                1
Female literacy rate, percent  -0.81828         1
Per capita GNP                 -0.4077          0.26853                        1
Total fertility rate           0.671135         -0.62595                       -0.18572        1

Considering Correlation Matrix:

R(x1, x2) = .27 < .70

R(x1, x3) = -.63 > -.70

R(x2, x3) = -.19 > -.70

In this case every pairwise correlation coefficient among the independent variables is less than
0.70 and greater than -0.70.

So, we can say there is no multicollinearity problem.

But if we had a multicollinearity problem, we could:

• Drop an insignificant variable. (Not a good solution.)
• Increase the size of the sample.
• Use a proxy variable. (Best solution.)
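
The pairwise screening used in this section can be sketched in a few lines. The correlations are those from the correlation matrix; the 0.70 cut-off is the rule of thumb used in the text, and the function name flag_collinear is our own.

```python
# Pairwise correlations among the regressors, from the correlation matrix.
correlations = {
    ("Female literacy rate", "Per capita GNP"): 0.26853,
    ("Female literacy rate", "Total fertility rate"): -0.62595,
    ("Per capita GNP", "Total fertility rate"): -0.18572,
}

THRESHOLD = 0.70  # rule-of-thumb cut-off used in the text

def flag_collinear(pairs, threshold=THRESHOLD):
    """Return the pairs whose absolute correlation reaches the threshold."""
    return [names for names, r in pairs.items() if abs(r) >= threshold]

flagged = flag_collinear(correlations)
print(flagged)  # an empty list means no pair reaches the cut-off
```

Comparing the absolute value with the threshold handles positive and negative correlations in one step, so a correlation of -0.63 is treated the same way as +0.63.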

Heteroskedasticity:

Figure 1: Residual plot (residuals against female literacy rate)

Based on the scatter plot above, the points appear diffused and do not form a clear, specific
pattern. So it can be concluded that the regression model, with respect to the female literacy
rate, does not show a heteroskedasticity problem and satisfies the constant-variance assumption.

Figure 2: Residual plot (residuals against per capita GNP)

Based on the scatter plot above, the points appear diffused and do not form a clear, specific
pattern. So it can be concluded that the regression model, with respect to GNP, does not show a
heteroskedasticity problem and satisfies the constant-variance assumption.

Figure 3: Residual plot (residuals against total fertility rate)

Based on the scatter plot above, the points appear diffused and do not form a clear, specific
pattern. So it can be concluded that the regression model, with respect to the total fertility
rate, does not show a heteroskedasticity problem and satisfies the constant-variance assumption.
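
A visual reading of residual plots can be complemented by a rough numeric comparison. The sketch below is not a method used in this paper: it is an informal split-sample comparison in the spirit of the Goldfeld-Quandt test (not the formal test, which re-estimates the regression on each subsample). The function name variance_ratio and all numbers are hypothetical and only illustrate the idea that a ratio far from 1 suggests the residual spread changes with the regressor.

```python
def variance_ratio(regressor, residuals):
    """Ratio of residual sums of squares: upper half over lower half of the regressor."""
    # Sort the residuals by the regressor value they belong to.
    ordered = [e for _, e in sorted(zip(regressor, residuals))]
    half = len(ordered) // 2
    lower = sum(e ** 2 for e in ordered[:half])
    upper = sum(e ** 2 for e in ordered[-half:])
    return upper / lower

# Hypothetical values for illustration only (not the paper's residuals).
x = [1, 2, 3, 4, 5, 6, 7, 8]
even_spread = [2, -2, 2, -2, 2, -2, 2, -2]   # spread does not change with x
fanning_out = [1, -1, 1, -1, 4, -4, 4, -4]   # spread grows with x

ratio_even = variance_ratio(x, even_spread)
ratio_fan = variance_ratio(x, fanning_out)
print(ratio_even, ratio_fan)
```

In this toy example the evenly spread residuals give a ratio of 1, while the fanning-out residuals give a much larger ratio, which is the kind of pattern a residual plot would show as a funnel shape.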

Normality Assumption:

Figure: Normal Probability Plot (child mortality, the number of deaths of children under age 5 in
a year per 1000 live births, against sample percentile)

As more than 80% of the standardized residuals of this model lie between -2 and +2, we can say
that this model satisfies the normality assumption.
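
The share of standardized residuals between -2 and +2 can be computed directly once the residuals are available. The sketch below uses hypothetical residuals, not the paper's, and our own function name; it standardizes each residual by the standard error of the regression. As a benchmark, under a normal distribution roughly 95% of values are expected to fall within two standard deviations.

```python
import math

def share_within_two(residuals, n_params):
    """Share of standardized residuals that fall between -2 and +2."""
    dof = len(residuals) - n_params
    # Standard error of the regression: sqrt(SSE / degrees of freedom).
    s = math.sqrt(sum(e ** 2 for e in residuals) / dof)
    standardized = [e / s for e in residuals]
    return sum(1 for z in standardized if -2 <= z <= 2) / len(standardized)

# Hypothetical residuals for illustration only (not the paper's residuals).
residuals = [-3, 1, 2, -1, 0, 1, -2, 12, -1, 0, -4, 2, -3, 1, -5]
# n_params = 4 mirrors a model with an intercept and three slopes.
share = share_within_two(residuals, n_params=4)
print(round(share, 3))
```

In the toy data only the residual of 12 lies outside the band, so 14 of the 15 standardized residuals fall within it.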

Line Fit Model:

Figure: Per capita GNP Line Fit Plot (child mortality against per capita GNP)

Figure: Female literacy rate, percent Line Fit Plot (child mortality against female literacy rate,
percent)

A good line fit refers to a line through a scatter plot of data points that best expresses the
relationship between those points. If the actual and predicted values are close, we can say the
plot shows a good line fit. In this case the actual and predicted values of the child mortality
rate plotted against per capita GNP are close to each other. So we can conclude that the plot of
child mortality against per capita GNP shows a good line fit.

The plot of child mortality against female literacy also shows a good line fit.

Conclusion: In this study we tested several things, including the t-test, the F-test, R-squared
and the P-value approach. We then carried out post-estimation analysis: a heteroskedasticity check
for constant variance, the normality assumption, a multicollinearity check and, finally, the line
fit. Both the individual tests and the overall test are significant at the 5% level. Our R-squared
value is more than 70%, which indicates a good fit. We also found no excessive correlation among
the independent variables and no heteroskedasticity problem. So these results support the accuracy
of the model and the choice of its variables.

Thank You
