Data extract (first and last rows; columns: year, y, x1, x2, x3, x4, x5):
2015 82.8 0 93 74 93 93
2006 81 10.31 95 0 85 83
https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who
Answer:
Command given in R-Studio
1.data=read.csv(choose.files())
Output results:
> data=read.csv(choose.files())
> data
y x1 x2 x3 x4 x5
1 82.8 0.00 93 74 93 93
2 82.7 9.71 91 340 92 91
3 82.3 9.87 91 158 91 92
4 82.0 10.03 91 199 92 92
5 81.9 10.30 92 190 92 92
6 81.8 10.52 92 70 89 86
7 81.7 10.62 94 104 86 83
8 81.3 10.76 94 65 83 83
9 81.2 10.56 94 11 83 85
10 81.0 10.31 95 0 85 83
2.y=data$y
Output results
> y=data$y
> data
y x1 x2 x3 x4 x5
1 82.8 0.00 93 74 93 93
2 82.7 9.71 91 340 92 91
3 82.3 9.87 91 158 91 92
4 82.0 10.03 91 199 92 92
5 81.9 10.30 92 190 92 92
6 81.8 10.52 92 70 89 86
7 81.7 10.62 94 104 86 83
8 81.3 10.76 94 65 83 83
9 81.2 10.56 94 11 83 85
10 81.0 10.31 95 0 85 83
Commands:
3.lifexpectancy=data$y;alcohol=data$x1;hep.b=data$x2;measels=data$x3;polio=data$x4;diphtheria=data$x5
4.reg=lm(lifexpectancy~alcohol+hep.b+measels+polio+diphtheria)
5.summary(reg)
Output results:
> lifexpectancy=data$y;alcohol=data$x1;hep.b=data$x2;measels=data$x3;polio=data$x4;diphtheria=data$x5
> reg=lm(lifexpectancy~alcohol+hep.b+measels+polio+diphtheria)
> summary(reg)
Call:
lm(formula = lifexpectancy ~ alcohol + hep.b + measels + polio +
diphtheria)
Residuals:
1 2 3 4 5 6 7 8 9 10
-0.0026923 0.0005238 0.1958168 -0.2024294 0.0114413 -0.0270730 0.1471499 -0.1007616 0.0302466 -0.0522222
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.101e+02 1.360e+01 8.097 0.00126 **
alcohol -1.490e-01 3.262e-02 -4.568 0.01028 *
hep.b -2.485e-01 1.074e-01 -2.313 0.08178 .
measels 2.798e-03 9.937e-04 2.816 0.04804 *
polio 7.370e-03 4.476e-02 0.165 0.87719
diphtheria -5.486e-02 4.046e-02 -1.356 0.24666
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.17 on 4 degrees of freedom
Multiple R-squared: 0.9652, Adjusted R-squared: 0.9217
F-statistic: 22.17 on 5 and 4 DF, p-value: 0.005121
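The regression above depends on an interactive file chooser, so it cannot be rerun from the transcript alone. A minimal sketch that rebuilds the ten printed rows by hand and refits the same model is given here; it assumes the CSV contains exactly the values shown, and the names `life` and `reg2` are hypothetical (not from the original commands).

```r
# Rebuild the ten rows printed above (assumption: the CSV holds exactly these values).
life <- data.frame(
  y  = c(82.8, 82.7, 82.3, 82.0, 81.9, 81.8, 81.7, 81.3, 81.2, 81.0),
  x1 = c(0.00, 9.71, 9.87, 10.03, 10.30, 10.52, 10.62, 10.76, 10.56, 10.31),
  x2 = c(93, 91, 91, 91, 92, 92, 94, 94, 94, 95),
  x3 = c(74, 340, 158, 199, 190, 70, 104, 65, 11, 0),
  x4 = c(93, 92, 91, 92, 92, 89, 86, 83, 83, 85),
  x5 = c(93, 91, 92, 92, 92, 86, 83, 83, 85, 83)
)

# Same model as reg, written with a data argument instead of separate vectors.
reg2 <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = life)
s <- summary(reg2)

print(s$coefficients)   # estimates, standard errors, t values, p-values
print(s$r.squared)      # multiple R-squared
```

Passing `data = life` keeps the variables together in one data frame, which avoids the long line of separate assignments used in command 3.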
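1.Interpretation:
According to the results, the Pr(>|t|) values of x1 (alcohol, 0.01028) and x3 (measles, 0.04804) are both below
the 0.05 level of significance, so these two variables are the significant predictors in the fitted model: they are
associated with y, the life expectancy of the Australian population from 2006 to 2015. The p-values of hep.b
(0.08178), polio (0.87719) and diphtheria (0.24666) are above 0.05, so these variables are not significant at the
5% level.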
2.Significance of model:
Hypothesis:
Ho:β1=β2=β3=β4=β5=0
H1: At least one regression coefficient is not equal to zero
P-value: 0.005121, which is less than the 0.05 level of significance, so we reject Ho and accept H1: at least
one regression coefficient is significant, and the model as a whole is significant.
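The overall p-value can be recovered from the F-statistic and its degrees of freedom. The following is a small sketch using the rounded values reported by summary(reg), so the result agrees with the printed p-value only up to rounding; the variable names are hypothetical.

```r
# Values copied from the summary(reg) output above (rounded).
f_stat <- 22.17
df1 <- 5   # number of predictors
df2 <- 4   # residual degrees of freedom: 10 observations - 6 coefficients

# Upper-tail probability of the F distribution = p-value of the overall F-test.
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)
print(p_value)

# Decision at the 5% level: TRUE means Ho (all slopes equal to zero) is rejected.
reject_h0 <- p_value < 0.05
print(reject_h0)
```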
3.BEST FITTED MODEL VALUES ACCORDING TO Pr(>|t|)
Pr(>|t|) : x1 alcohol =0.01028, x3 measles = 0.04804
Hypothesis (tested separately for each coefficient):
Ho:β1=0 and Ho:β3=0
H1:β1≠0 and H1:β3≠0
The p-values of β1 and β3 are less than 0.05, so we reject Ho and accept H1 for both, which means alcohol and
measles are significantly associated with the life expectancy of the Australian population between 2006 and 2015.
These two variables are the significant terms of the multiple regression model, because their p-values are below
the level of significance.
4.Goodness/fitted model:
Multiple R-squared: 0.9652, Adjusted R-squared: 0.9217
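The R-squared value is close to 1: the model explains about 96.5% of the variation in life expectancy (92.2% after
adjusting for the number of predictors), so the model fits the data well.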
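The adjusted R-squared follows from the multiple R-squared, the number of observations and the number of predictors. A short sketch of that arithmetic, using the rounded R-squared from the output (variable names are hypothetical):

```r
r2 <- 0.9652   # Multiple R-squared from summary(reg)
n  <- 10       # observations
p  <- 5        # predictors (intercept not counted)

# Adjusted R-squared penalises the fit for the number of predictors used.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))   # 0.9217, matching the reported value
```

With only 10 observations and 5 predictors the penalty is noticeable, which is why the adjusted value is several points lower than the unadjusted one.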
Commands:
6.reg$residuals
7.sum(reg$residuals)
8.reg$fitted.values
9.reg=lm(y~x1+x2+x3+x4,data=data)
Output of commands:
> reg$residuals
1 2 3 4 5 6 7 8
-0.002692313 0.000523759 0.195816814 -0.202429400 0.011441349 -0.027073038 0.147149935 -0.100761586
9 10
0.030246648 -0.052222170
> sum(reg$residuals)
[1] -3.361027e-18
> reg$fitted.values
1 2 3 4 5 6 7 8 9 10
82.80269 82.69948 82.10418 82.20243 81.88856 81.82707 81.55285 81.40076 81.16975 81.05222
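5.Interpretation:
1. > reg$residuals: the residuals, i.e. the difference between the observed and the fitted life expectancy for
each of the 10 observations.
2. > sum(reg$residuals): the sum of the residuals; it is practically zero (-3.361027e-18), as expected for a
least-squares fit with an intercept.
3. > reg$fitted.values : the fitted values, i.e. the life expectancy predicted by the model for each observation.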
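The relation between these three outputs can be checked by hand: each residual is the observed value minus the fitted value. The sketch below copies the observed y and the printed fitted values (rounded to five decimals, so results agree only up to rounding); `fitted_vals`, `res` and `rse` are hypothetical names.

```r
# Observed life expectancy (column y) and fitted values copied from the output above.
y <- c(82.8, 82.7, 82.3, 82.0, 81.9, 81.8, 81.7, 81.3, 81.2, 81.0)
fitted_vals <- c(82.80269, 82.69948, 82.10418, 82.20243, 81.88856,
                 81.82707, 81.55285, 81.40076, 81.16975, 81.05222)

# Residual = observed - fitted, one value per observation.
res <- y - fitted_vals
print(round(res, 4))

# The residuals of a least-squares fit with an intercept sum to (almost) zero.
print(sum(res))

# Residual standard error: sqrt(SSE / residual df), with df = 10 - 6 = 4.
rse <- sqrt(sum(res^2) / (length(y) - 6))
print(round(rse, 2))   # about 0.17, as reported by summary(reg)
```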
commands
10.plot(alcohol,measels)
11.cor(alcohol,measels)
Output of commands:
> plot(alcohol,measels)
> cor(alcohol,measels)
[1] 0.08143148
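Interpretation of commands:
Strength of correlation: -1 ≤ r ≤ 1
r = 0.08143148 is close to 0, so there is only a very weak positive linear correlation between alcohol and measles.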
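Whether such a small r differs from zero can be checked with cor.test(), which is not used in the original commands. A sketch, assuming the x1 and x3 columns printed earlier are the alcohol and measels vectors:

```r
# Columns x1 (alcohol) and x3 (measels) copied from the data printed earlier.
alcohol <- c(0.00, 9.71, 9.87, 10.03, 10.30, 10.52, 10.62, 10.76, 10.56, 10.31)
measels <- c(74, 340, 158, 199, 190, 70, 104, 65, 11, 0)

# Pearson correlation coefficient, as in command 11.
r <- cor(alcohol, measels)
print(r)

# Test of Ho: true correlation = 0 against H1: true correlation != 0.
ct <- cor.test(alcohol, measels)
print(ct$p.value)
```

A p-value above 0.05 here would mean the weak correlation is not statistically distinguishable from zero with only 10 observations.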
Answer:
Commands:
1.aus_life_expectancy=c(82.8,82.7,82.5,82.3,82,81.9,81.7,81.3,81.3)
2.life_expectancy=c(82.8,82.7,82.5,82.3,82,81.9,81.7,81.3,81.3)
3.hist(life_expectancy)
Results of commands:
> aus_life_expectancy=c(82.8,82.7,82.5,82.3,82,81.9,81.7,81.3,81.3)
> life_expectancy=c(82.8,82.7,82.5,82.3,82,81.9,81.7,81.3,81.3)
> hist(life_expectancy)
Commands :
4.shapiro.test(life_expectancy)
5.t.test(life_expectancy,mu=0)
Results of commands:
> shapiro.test(life_expectancy)
Shapiro-Wilk normality test
data: life_expectancy
W = 0.93244, p-value = 0.505
> t.test(life_expectancy,mu=0)
data: life_expectancy
t = 438.41, df = 8, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
81.62395 82.48716
sample estimates:
mean of x
82.05556
Interpretation:
2. Normality of data:
Ho: the data are normally distributed
H1: the data are not normally distributed
p-value = 0.505, which is greater than the 0.05 level of significance, so we do not reject Ho: the data can be
treated as normally distributed.
3. T-test: The command above tests Ho: mu=0. Its p-value < 2.2e-16 is far below 0.05, so the mean life expectancy is
clearly different from 0. For the hypothesis about 82 years:
Ho: mu=82
H1: mu≠82
The 95 percent confidence interval (81.62395, 82.48716) contains 82, so at the 0.05 level we do not reject Ho: the
mean life expectancy of the Australian population between 2007 and 2015 is not significantly different from 82 years.
Degree of freedom: v= n-1=9-1=8.
Lower confidence interval=81.62395 upper confidence interval=82.48716
The sample mean (82.05556) lies between the LCI and the UCI.
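The hypothesis Ho: mu=82 can also be tested directly by passing mu=82 to t.test() instead of mu=0. A sketch, assuming a two-sided test on the same nine values (`tt` is a hypothetical name):

```r
life_expectancy <- c(82.8, 82.7, 82.5, 82.3, 82, 81.9, 81.7, 81.3, 81.3)

# One-sample t-test of Ho: mu = 82 against H1: mu != 82.
tt <- t.test(life_expectancy, mu = 82)

print(tt$statistic)   # t value for the difference between the sample mean and 82
print(tt$p.value)     # compare with the 0.05 level of significance
print(tt$conf.int)    # the interval does not depend on mu, so it matches the output above
```

Only the t value and p-value change with mu; the confidence interval and the sample mean stay the same as in the mu=0 run.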
commands:
#training program before ICT knowledge#
1.pre<-c(12,14,13,11,12,10,15,13,9,14)
#training program after ICT knowledge#
2.post<-c(13,15,13,12,13,11,16,13,8,14)
3.d=(pre-post)
4.shapiro.test(d)
data: d
W = 0.73087, p-value = 0.002088
Interpretations:
Assumptions:
1. To check normality of the mean difference:
Hypothesis :
Ho: the differences d are normally distributed
H1: the differences d are not normally distributed
P=0.002, which is less than the 0.05 level of significance, so we reject Ho and accept H1: the differences (d)
between the pre and post scores are not normally distributed.
Hypothesis (mean difference m):
Ho: m=0
H1: m≠0
p-value = 0.05218, which is slightly greater than the 0.05 level of significance, so we do not reject Ho: at the 5%
level the ICT training program shows no statistically significant difference in knowledge between the pre and post
sessions.
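The command that produced p-value = 0.05218 is not listed among the commands. A plausible reconstruction, assuming a two-sided paired t-test on pre and post, is sketched here (`pt` is a hypothetical name):

```r
# training program before ICT knowledge
pre  <- c(12, 14, 13, 11, 12, 10, 15, 13, 9, 14)
# training program after ICT knowledge
post <- c(13, 15, 13, 12, 13, 11, 16, 13, 8, 14)

# Paired t-test of Ho: mean difference = 0 against H1: mean difference != 0.
pt <- t.test(pre, post, paired = TRUE)

print(pt$estimate)    # mean of the differences pre - post
print(pt$statistic)   # t value with n - 1 = 9 degrees of freedom
print(pt$p.value)     # about 0.0522, in line with the p-value quoted above
```

This is equivalent to a one-sample t-test on d=(pre-post) with mu=0, which is why the normality check is applied to d rather than to pre and post separately.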
ANSWER:
Interpretation of above data by using R-studio commands:
1.aus_life_expectancy<-c(82.8,82.7,82.5,82.3,82,81.9,81.7,81.3,81.3)
2.pak_life_expectancy<-c(66.4,66.2,66,65.7,65.5,65.1,64.8,64.6,64.4)
3.life_expectancy=c(aus_life_expectancy,pak_life_expectancy)
4.group=rep(c("aus_life_expectancy","pak_life_expectancy"),each=9)
5.data.frame(aus_life_expectancy,pak_life_expectancy)
RESULTS:
> aus_life_expectancy<-c(82.8,82.7,82.5,82.3,82,81.9,81.7,81.3,81.3)
> pak_life_expectancy<-c(66.4,66.2,66,65.7,65.5,65.1,64.8,64.6,64.4)
> life_expectancy=c(aus_life_expectancy,pak_life_expectancy)
> group=rep(c("aus_life_expectancy","pak_life_expectancy"),each=9)
> data.frame(aus_life_expectancy,pak_life_expectancy)
aus_life_expectancy pak_life_expectancy
1 82.8 66.4
2 82.7 66.2
3 82.5 66.0
4 82.3 65.7
5 82.0 65.5
6 81.9 65.1
7 81.7 64.8
8 81.3 64.6
9 81.3 64.4
1. First assumption: check the normality of the data with the Shapiro-Wilk test
COMMANDS:-
1.my_data=data.frame(group,life_expectancy)
2.shapiro.test(aus_life_expectancy)
3.shapiro.test(pak_life_expectancy)
RESULTS:
> my_data=data.frame(group,life_expectancy)
> shapiro.test(aus_life_expectancy)
data: aus_life_expectancy
W = 0.93244, p-value = 0.505
> shapiro.test(pak_life_expectancy)
data: pak_life_expectancy
W = 0.94637, p-value = 0.6503
Command:
1.var.test(life_expectancy~group,data=my_data)
Results:
> var.test(life_expectancy~group,data=my_data)
According to the two assumption checks above, the null hypotheses of normality and of equal variances are not
rejected, so the data are treated as normally distributed with equal variances. We now go to the final test, the
t-test for independent samples.
3.T-test of the independent sample:
Command:
t.test(aus_life_expectancy,pak_life_expectancy,var.equal=TRUE)
Results:
> t.test(aus_life_expectancy,pak_life_expectancy,var.equal=TRUE)
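The printed output of var.test() and of the independent-samples t-test is not included above. The sketch below reruns both on the two vectors so the numbers can be inspected; it calls var.test() on the two vectors directly rather than through the formula interface, and `vt` and `tt` are hypothetical names.

```r
aus_life_expectancy <- c(82.8, 82.7, 82.5, 82.3, 82, 81.9, 81.7, 81.3, 81.3)
pak_life_expectancy <- c(66.4, 66.2, 66, 65.7, 65.5, 65.1, 64.8, 64.6, 64.4)

# Second assumption: F-test of Ho: the two population variances are equal.
vt <- var.test(aus_life_expectancy, pak_life_expectancy)
print(vt$p.value)    # a value above 0.05 supports var.equal = TRUE below

# Independent-samples t-test of Ho: mu_aus = mu_pak, assuming equal variances.
tt <- t.test(aus_life_expectancy, pak_life_expectancy, var.equal = TRUE)
print(tt$estimate)   # the two sample means
print(tt$p.value)    # a value below 0.05 means the means differ significantly
```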
Commands:
1.a=c(74.8,74.6,74.4,74.4,73.9)
2.b=c(82.8,82.7,82.5,82.3,82)
3.c=c(81.5,81.4,81.1,88,88)
4.countaries=c(74.8,74.6,74.4,74.4,73.9,82.8,82.7,82.5,82.3,82,81.5,81.4,81.1,88,88)
5.group=rep(c("armania","australia","austria"),each=5)
6.dat=data.frame(countaries,group)
7.anova=aov(countaries~group,data=dat)
8.summary(anova)
9.TukeyHSD(anova)
10.library(car)
11.leveneTest(countaries~group,data=dat)
12. shapiro.test(anova$residuals)
Commands result:
> a=c(74.8,74.6,74.4,74.4,73.9)
> b=c(82.8,82.7,82.5,82.3,82)
> c=c(81.5,81.4,81.1,88,88)
> countaries=c(74.8,74.6,74.4,74.4,73.9,82.8,82.7,82.5,82.3,82,81.5,81.4,81.1,88,88)
> group=rep(c("armania","australia","austria"),each=5)
> dat=data.frame(countaries,group)
> dat
countaries group
1 74.8 armania
2 74.6 armania
3 74.4 armania
4 74.4 armania
5 73.9 armania
6 82.8 australia
7 82.7 australia
8 82.5 australia
9 82.3 australia
10 82.0 australia
11 81.5 austria
12 81.4 austria
13 81.1 austria
14 88.0 austria
15 88.0 austria
> anova=aov(countaries~group,data=dat)
> anova
Call:
aov(formula = countaries ~ group, data = dat)
Terms:
group Residuals
Sum of Squares 264.6493 54.2800
Deg. of Freedom 2 12
$group
diff lwr upr p adj
australia-armania 8.04 4.451418 11.628582 0.0001761
austria-armania 9.58 5.991418 13.168582 0.0000334
austria-australia 1.54 -2.048582 5.128582 0.5063531
> library(car)
> leveneTest(countaries~group,data=dat)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 2 2.5129 0.1226
12
> shapiro.test(anova$residuals)
data: anova$residuals
W = 0.83775, p-value = 0.0117
Interpretations:
1.HYPOTHESIS:
Ho:µ1=µ2=µ3
H1: at least one mean is different.
The significance value is 2.43e-05 (0.0000243), which is less than 0.05, so we reject Ho and accept H1.
It means that the difference between the population means is statistically significant.
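The F value and the significance value come from the ANOVA table returned by summary(). A sketch that refits the model and extracts both numbers is shown here; the object is called `fit` (a hypothetical name) and `group` is made an explicit factor, which are small departures from the original commands.

```r
countaries <- c(74.8, 74.6, 74.4, 74.4, 73.9,
                82.8, 82.7, 82.5, 82.3, 82,
                81.5, 81.4, 81.1, 88, 88)
group <- factor(rep(c("armania", "australia", "austria"), each = 5))
dat <- data.frame(countaries, group)

# One-way ANOVA of Ho: mu1 = mu2 = mu3.
fit <- aov(countaries ~ group, data = dat)

# The first element of summary() is the ANOVA table; row 1 is the group term.
tab <- summary(fit)[[1]]
f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]

print(f_value)   # (264.6493 / 2) / (54.28 / 12), about 29.25
print(p_value)   # about 2.43e-05
```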
3. Between the samples:
TukeyHSD(anova)
1.australia-armania : adj-p= 0.0001761; its p-value is less than 0.05, so we reject Ho and accept H1.
µ1≠µ2
2.austria-armania : adj-p= 0.0000334; its p-value is less than 0.05, so we reject Ho and accept H1.
µ1≠µ3
3.austria-australia: adj-p=0.5063531; its p-value is greater than 0.05, so we do not reject Ho.
µ3=µ2
The mean of armania differs significantly from the means of australia and austria, while the means of australia
and austria are not significantly different from each other.
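The pairwise comparisons can be read programmatically from the matrix that TukeyHSD() returns. A sketch, using the same data with `group` as an explicit factor and hypothetical names `fit` and `tk`:

```r
countaries <- c(74.8, 74.6, 74.4, 74.4, 73.9,
                82.8, 82.7, 82.5, 82.3, 82,
                81.5, 81.4, 81.1, 88, 88)
group <- factor(rep(c("armania", "australia", "austria"), each = 5))
fit <- aov(countaries ~ group)

# Matrix with one row per pair and columns diff, lwr, upr, "p adj".
tk <- TukeyHSD(fit)$group
print(tk)

# Pairs whose adjusted p-value is below 0.05 have significantly different means.
significant <- tk[, "p adj"] < 0.05
print(significant)
```

The `diff` column is simply the difference between the two group means, so it can be cross-checked against tapply(countaries, group, mean).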
4. Homogeneity of variance (Levene's test):
p-value is 0.1226, which is greater than the 0.05 level of significance, so we do not reject Ho: the variances of
the three groups can be treated as equal.
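Levene's test with center = median amounts to a one-way ANOVA on the absolute deviations of each value from its group median. The sketch below reproduces the test in base R without the car package; this is an equivalent hand computation, not the leveneTest() implementation itself, and `med`, `abs_dev`, `lev`, `lev_f` and `lev_p` are hypothetical names.

```r
countaries <- c(74.8, 74.6, 74.4, 74.4, 73.9,
                82.8, 82.7, 82.5, 82.3, 82,
                81.5, 81.4, 81.1, 88, 88)
group <- factor(rep(c("armania", "australia", "austria"), each = 5))

# Median of each group, repeated for every observation in that group.
med <- ave(countaries, group, FUN = median)

# Absolute deviation of each observation from its own group median.
abs_dev <- abs(countaries - med)

# One-way ANOVA on the absolute deviations gives the Levene F value and p-value.
lev <- summary(aov(abs_dev ~ group))[[1]]
lev_f <- lev[["F value"]][1]
lev_p <- lev[["Pr(>F)"]][1]

print(round(lev_f, 4))
print(round(lev_p, 4))   # about 0.1226, as in the leveneTest output
```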
5. Normality of data:
p-value = 0.0117<0.05, so we reject Ho and accept H1: the residuals of the ANOVA model are not normally distributed.
REFERENCES:
https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who
https://statisticstechs.weebly.com/inferential-statistics/paired-sample-t-test-or-repeated-measures