
Madhuresh Kumar | Sushil Kumar A C | Naveen Kumar Singh | Karthik Cheboli | Yashni Nagarajan | Mintu Kumar Singh | Jitendra Sokal

STATISTICS PROJECT-GROUP 10
Dataset: program
Hypotheses:
1. Test scores differ by participation
2. Grades increased after program participation

Summary of Data:
Number of Observations                 32
Median Test Score                      22
Average Test Score                     21.94
Number of participants in program      14
Number of grade increments             11
Of which, number of participants       8

Data Source: Spector and Mazzeo (1980).

Hypothesis 1: Test Scores differ by participation


Definitions:
1. Population parameters:
a. $\mu_1$ is the mean test score of students who have not participated in the program
b. $\mu_2$ is the mean test score of students who have participated in the program
2. Sample statistics:
a. Sample 1: Sample of students who have not participated in the program
$\bar{x}_1$ is the mean test score, $s_1$ is the sample standard deviation, $n_1$ is the sample size
b. Sample 2: Sample of students who have participated in the program
$\bar{x}_2$ is the mean test score, $s_2$ is the sample standard deviation, $n_2$ is the sample size
Then,
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \neq 0$
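Test Statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \sim t_{n_1 + n_2 - 2}$$
where,
Pooled variance, $$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$
Calculated Values:
Sample 1: $\bar{x}_1$ = 21.56, $s_1$ = 4.00, $n_1$ = 18
Sample 2: $\bar{x}_2$ = 22.43, $s_2$ = 3.86, $n_2$ = 14
Pooled standard deviation: $s_p$ = 3.94
Test statistic: $t$ = -0.62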
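The pooled statistic can be reproduced from the group summary statistics alone. The following is a minimal Python sketch, not part of the original Stata analysis. The function name `pooled_t` is hypothetical, and the inputs are the group means, standard deviations and sample sizes that Stata reports for this dataset.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, pooled_sd, df) for the equal-variance two-sample t-test."""
    df = n1 + n2 - 2
    # Pooled variance: degrees-of-freedom-weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, math.sqrt(pooled_var), df

# Group 1 = non-participants, group 2 = participants (values from the Stata output).
t_stat, pooled_sd, df = pooled_t(21.55556, 4.003267, 18, 22.42857, 3.857346, 14)
print(f"t = {t_stat:.4f}, pooled sd = {pooled_sd:.2f}, df = {df}")
```

Working from summary statistics keeps the sketch independent of the raw data file, which is not reproduced in this report.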


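Here, we used the t-test with equal population variances, based on the result of the following hypothesis test:
$H_0: \sigma_1^2 / \sigma_2^2 = 1$
$H_1: \sigma_1^2 / \sigma_2^2 \neq 1$
Test Statistic:
$$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} \sim F(n_1 - 1,\; n_2 - 1)$$
$F = s_1^2 / s_2^2 = 1.0771$ (when $\sigma_1^2 = \sigma_2^2$)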
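As a worked check of the variance ratio, the arithmetic below uses the unrounded sample standard deviations from the Stata output for this dataset (4.003267 for non-participants and 3.857346 for participants). It is shown only to make the calculation explicit.

```latex
F = \frac{s_1^2}{s_2^2}
  = \frac{4.003267^2}{3.857346^2}
  \approx \frac{16.026}{14.879}
  \approx 1.0771,
\qquad (n_1 - 1,\; n_2 - 1) = (17,\; 13) \text{ degrees of freedom.}
```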

Test Results and Interpretation

F-test:
p-value (two-tailed test): p = 0.9072
The p-value is greater than the significance level at each level considered (0.01, 0.05, 0.1).
Therefore, we cannot reject the null hypothesis that the ratio of the population variances is 1.
We find no evidence that the variance of test scores differs between students who have and have not
participated in the program, so we proceed with the t-test with equal variances.
t-test:
p-value (two-tailed test): p = 0.5388
The p-value is greater than the significance level at each level considered (0.01, 0.05, 0.1).
Therefore, we cannot reject the null hypothesis: the data provide no evidence that test scores differ by
participation.
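The two-tailed p-value of the t-test can be checked without a statistics package by integrating the Student's t density numerically. This is an illustrative Python sketch under stated assumptions: the helper names `t_pdf` and `two_sided_p` are hypothetical, Simpson's rule stands in for whatever exact routine Stata uses, and the inputs (t = -0.6217 with 30 degrees of freedom) are taken from the Stata output.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the density over [0, |t|] (Simpson's rule)."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3

p_value = two_sided_p(-0.6217, 30)
# Compare the p-value against each significance level used in the report.
for alpha in (0.01, 0.05, 0.10):
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: p = {p_value:.4f} -> {decision}")
```

The loop makes the decision rule explicit: the null hypothesis is rejected only when the p-value falls below the chosen significance level.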


Challenges
Here, we have assumed that the population follows a normal distribution. From the above kernel density plot,
we can see that the sample is also approximately normal.

Hypothesis 2: Grades increased after program participation


Definitions:
1. Population parameters:
a. $p_1$ is the proportion of students whose grades have increased despite not having participated in the
program
b. $p_2$ is the proportion of students whose grades have increased after participating in the program
2. Sample statistics:
a. Sample 1: Sample of students who have not participated in the program
$\hat{p}_1$ is the proportion whose grades increased, $n_1$ is the sample size
b. Sample 2: Sample of students who have participated in the program
$\hat{p}_2$ is the proportion whose grades increased, $n_2$ is the sample size
Then,
$H_0: p_1 - p_2 = 0$
$H_1: p_1 - p_2 < 0$
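Test Statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \sim N(0, 1)$$

Since the null hypothesis is that the population proportions are equal, the best estimate of the overall population
proportion is the combined proportion of successes, given by
$$\hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$

Calculated Values
Sample 1: $\hat{p}_1$ = 0.167, $n_1$ = 18
Sample 2: $\hat{p}_2$ = 0.571, $n_2$ = 14
Combined proportion: $\hat{p}$ = 0.344
Test statistic: $z$ = -2.391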
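The pooled two-proportion z-test can be sketched in a few lines of Python using only the standard library. This is an illustration, not the original Stata computation. The function name `two_proportion_z` is hypothetical, and the success counts (3 of 18 non-participants, 8 of 14 participants) are derived from the data summary, which reports 11 grade increments of which 8 were participants.

```python
import math
from statistics import NormalDist

def two_proportion_z(successes1, n1, successes2, n2):
    """Return (z, pooled proportion) for H0: p1 - p2 = 0 using the pooled standard error."""
    p1_hat, p2_hat = successes1 / n1, successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_err, pooled

# Sample 1 = non-participants, sample 2 = participants (counts derived from the summary).
z_stat, pooled = two_proportion_z(3, 18, 8, 14)
left_tail_p = NormalDist().cdf(z_stat)  # left-tail p-value for H1: p1 - p2 < 0
print(f"pooled = {pooled:.3f}, z = {z_stat:.4f}, p = {left_tail_p:.4f}")
```

The left tail is used because the alternative hypothesis states that the non-participant proportion is smaller than the participant proportion.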


Test Results and Interpretation

z-test
p-value (left-tailed test): p = 0.0084
The p-value is lower than each of the significance levels considered (0.01, 0.05, 0.1).
We therefore reject the null hypothesis at each of these levels. The result is significant even at the 1% level
(99% confidence): the proportion of students whose grades increased is higher among program participants.

Challenges
Here we have assumed that the test statistic is normally distributed even though the sample sizes are not very large (18 and 14).
This may distort the standard error used in the z-value, so the p-value may also be inaccurate to
some degree. However, the conclusion remains the same.
As a check, we can instead assume that the population follows a normal distribution rather than a Bernoulli distribution.
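We can use the t-test here, with test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \sim t_{n_1 + n_2 - 2}$$
where, Pooled variance, $$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$

Calculated Values:
Sample 1: $\bar{x}_1$ = 0.167, $s_1$ = 0.383, $n_1$ = 18
Sample 2: $\bar{x}_2$ = 0.571, $s_2$ = 0.513, $n_2$ = 14
Pooled standard deviation: $s_p$ = 0.444
Test statistic: $t$ = -2.55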
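The standard deviations used in this t-test follow directly from the counts, because the sample variance of a 0/1 variable is n/(n - 1) times p(1 - p). The sketch below is an assumption-laden illustration in Python: the helper name `indicator_sd` is hypothetical, and the counts (3 of 18 and 8 of 14) are derived from the data summary rather than read from the raw data.

```python
import math

def indicator_sd(successes, n):
    """Sample standard deviation (n - 1 divisor) of a 0/1 variable with `successes` ones."""
    p_hat = successes / n
    return math.sqrt(n * p_hat * (1 - p_hat) / (n - 1))

n1, k1 = 18, 3   # non-participants: sample size, number of grade increases
n2, k2 = 14, 8   # participants: sample size, number of grade increases

s1, s2 = indicator_sd(k1, n1), indicator_sd(k2, n2)
# Pooled standard deviation and equal-variance t statistic on the 0/1 indicators.
pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t_stat = (k1 / n1 - k2 / n2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
print(f"s1 = {s1:.3f}, s2 = {s2:.3f}, pooled sd = {pooled_sd:.3f}, t = {t_stat:.2f}")
```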

t-test
p-value (left-tailed test): p = 0.00796
The p-value is lower than each of the significance levels considered (0.01, 0.05, 0.1), so the null hypothesis
is again rejected, even at the 1% level.
As we can observe, the t-test and the z-test give very similar p-values. Therefore, we
conclude at the 90%, 95% and 99% confidence levels that grades have increased after program participation.

RBI stats project 1

Sunday January 8 15:48:30 2017

Stata 13.1 (Small Stata), Statistics/Data Analysis
User: Naveen Kumar Singh
Project: Team 10
Licensed to: Naveen Kumar Singh, Reserve Bank of India

1 . use "C:\Users\Naveen\Desktop\4thweek\stats assignment\team 10.dta"


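2 . sdtest nonpart_tuce == part_tuce

Variance ratio test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
nonpar~e |      18    21.55556     .943579    4.003267    19.56478    23.54633
part_t~e |      14    22.42857    1.030919    3.857346    20.20141    24.65574
---------+--------------------------------------------------------------------
combined |      32     21.9375    .6896959    3.901509    20.53086    23.34414
------------------------------------------------------------------------------
    ratio = sd(nonpart_tuce) / sd(part_tuce)                      f =   1.0771
Ho: ratio = 1                                    degrees of freedom =   17, 13

    Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
  Pr(F < f) = 0.5464         2*Pr(F > f) = 0.9072           Pr(F > f) = 0.4536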

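3 . ttest nonpart_tuce == part_tuce, unpaired

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
nonpar~e |      18    21.55556     .943579    4.003267    19.56478    23.54633
part_t~e |      14    22.42857    1.030919    3.857346    20.20141    24.65574
---------+--------------------------------------------------------------------
combined |      32     21.9375    .6896959    3.901509    20.53086    23.34414
---------+--------------------------------------------------------------------
    diff |           -.8730159    1.404261                 -3.7409    1.994868
------------------------------------------------------------------------------
    diff = mean(nonpart_tuce) - mean(part_tuce)                   t =  -0.6217
Ho: diff = 0                                     degrees of freedom =       30

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.2694         Pr(|T| > |t|) = 0.5388          Pr(T > t) = 0.7306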

4 . sdtest nonpart_tuce == part_tuce, level(90)

Variance ratio test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [90% Conf. Interval]
---------+--------------------------------------------------------------------
nonpar~e |      18    21.55556     .943579    4.003267     19.9141    23.19701
part_t~e |      14    22.42857    1.030919    3.857346    20.60288    24.25426
---------+--------------------------------------------------------------------
combined |      32     21.9375    .6896959    3.901509    20.76811    23.10689
------------------------------------------------------------------------------
    ratio = sd(nonpart_tuce) / sd(part_tuce)                      f =   1.0771
Ho: ratio = 1                                    degrees of freedom =   17, 13

    Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
  Pr(F < f) = 0.5464         2*Pr(F > f) = 0.9072           Pr(F > f) = 0.4536

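5 . ttest nonpart_tuce == part_tuce, unpaired level(90)

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [90% Conf. Interval]
---------+--------------------------------------------------------------------
nonpar~e |      18    21.55556     .943579    4.003267     19.9141    23.19701
part_t~e |      14    22.42857    1.030919    3.857346    20.60288    24.25426
---------+--------------------------------------------------------------------
combined |      32     21.9375    .6896959    3.901509    20.76811    23.10689
---------+--------------------------------------------------------------------
    diff |           -.8730159    1.404261               -3.256413    1.510382
------------------------------------------------------------------------------
    diff = mean(nonpart_tuce) - mean(part_tuce)                   t =  -0.6217
Ho: diff = 0                                     degrees of freedom =       30

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.2694         Pr(|T| > |t|) = 0.5388          Pr(T > t) = 0.7306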

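6 . sdtest nonpart_tuce == part_tuce, level(99)

Variance ratio test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
nonpar~e |      18    21.55556     .943579    4.003267    18.82085    24.29026
part_t~e |      14    22.42857    1.030919    3.857346    19.32316    25.53398
---------+--------------------------------------------------------------------
combined |      32     21.9375    .6896959    3.901509    20.04495    23.83005
------------------------------------------------------------------------------
    ratio = sd(nonpart_tuce) / sd(part_tuce)                      f =   1.0771
Ho: ratio = 1                                    degrees of freedom =   17, 13

    Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
  Pr(F < f) = 0.5464         2*Pr(F > f) = 0.9072           Pr(F > f) = 0.4536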

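7 . ttest nonpart_tuce == part_tuce, unpaired level(99)

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
nonpar~e |      18    21.55556     .943579    4.003267    18.82085    24.29026
part_t~e |      14    22.42857    1.030919    3.857346    19.32316    25.53398
---------+--------------------------------------------------------------------
combined |      32     21.9375    .6896959    3.901509    20.04495    23.83005
---------+--------------------------------------------------------------------
    diff |           -.8730159    1.404261               -4.734728    2.988696
------------------------------------------------------------------------------
    diff = mean(nonpart_tuce) - mean(part_tuce)                   t =  -0.6217
Ho: diff = 0                                     degrees of freedom =       30

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.2694         Pr(|T| > |t|) = 0.5388          Pr(T > t) = 0.7306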

8 . prtest nonpart_inc == part_inc

Two-sample test of proportions           nonpart_inc: Number of obs =       18
                                            part_inc: Number of obs =       14
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 nonpart_inc |   .1666667    .087841                     -.0054986     .338832
    part_inc |   .5714286     .13226                      .3122037    .8306534
-------------+----------------------------------------------------------------
        diff |  -.4047619   .1587727                     -.7159506   -.0935732
             |  under Ho:   .1692508    -2.39   0.017
------------------------------------------------------------------------------
        diff = prop(nonpart_inc) - prop(part_inc)                 z =  -2.3915
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.0084         Pr(|Z| > |z|) = 0.0168          Pr(Z > z) = 0.9916

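9 . prtest nonpart_inc == part_inc, level(90)

Two-sample test of proportions           nonpart_inc: Number of obs =       18
                                            part_inc: Number of obs =       14
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [90% Conf. Interval]
-------------+----------------------------------------------------------------
 nonpart_inc |   .1666667    .087841                       .022181    .3111523
    part_inc |   .5714286     .13226                      .3538802    .7889769
-------------+----------------------------------------------------------------
        diff |  -.4047619   .1587727                     -.6659197   -.1436041
             |  under Ho:   .1692508    -2.39   0.017
------------------------------------------------------------------------------
        diff = prop(nonpart_inc) - prop(part_inc)                 z =  -2.3915
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.0084         Pr(|Z| > |z|) = 0.0168          Pr(Z > z) = 0.9916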

10 . prtest nonpart_inc == part_inc, level(99)

Two-sample test of proportions           nonpart_inc: Number of obs =       18
                                            part_inc: Number of obs =       14
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [99% Conf. Interval]
-------------+----------------------------------------------------------------
 nonpart_inc |   .1666667    .087841                     -.0595969    .3929302
    part_inc |   .5714286     .13226                      .2307494    .9121078
-------------+----------------------------------------------------------------
        diff |  -.4047619   .1587727                     -.8137332    .0042094
             |  under Ho:   .1692508    -2.39   0.017
------------------------------------------------------------------------------
        diff = prop(nonpart_inc) - prop(part_inc)                 z =  -2.3915
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.0084         Pr(|Z| > |z|) = 0.0168          Pr(Z > z) = 0.9916

11 . twoway (function y=tden(31,x), range(-5 -0.6247) color(ltblue) recast(area)) (function y=tden
> olor(ltblue) recast(area)) (function y=tden(31,x), range(-5 5)), legend(off) plotregion(margi
> title("t") text(0 -0.6247 "-0.6247", place(s)) text(0 0.6247 "0.6247", place(s)) title("Two-t
> 31), alpha=0.05")

12 . twoway (function y=Fden(13,17,x), range(0.9287 3) color(ltblue) recast(area)) (function y=Fde
> legend(off) plotregion(margin(zero)) ytitle("f(t)") xtitle("t") text(0 0.9287 "0.9287", plac
> ection region" "F(13,17), alpha=0.05")

13 . twoway (function y=normalden(x), range(-5 -2.3915) color(ltblue) recast(area)) (function y=no
> legend(off) plotregion(margin(zero)) ytitle("f(z)") xtitle("z") text(0 -2.3915 "-2.3915", pl
> ejection region" "z, alpha=0.05")
14 . kdensity tuce, kernel(epanechnikov) normal
(n() set to 32)
15 . graph save Graph "C:\Users\Naveen\Desktop\4thweek\stats assignment\Graph5.gph", replace
(file C:\Users\Naveen\Desktop\4thweek\stats assignment\Graph5.gph saved)
save "C:\Users\Naveen\Desktop\4thweek\stats assignment\team 10.dta", replace
file C:\Users\Naveen\Desktop\4thweek\stats assignment\team 10.dta saved
16 .