STATISTICS PROJECT-GROUP 10

Dataset: program

Hypotheses:

1. Test scores differ by participation

2. Grades increased after program participation

Summary of Data:

Number of observations: 32
Median test score: 22
Average test score: 21.94
Number of participants in program: 14
Number of grade increments: 11
Of which, number of participants: 8

Definitions:

1. Population parameters:

a. $\mu_1$ is the mean test score of students who have not participated in the program

b. $\mu_2$ is the mean test score of students who have participated in the program

2. Sample statistics:

a. Sample 1: Sample of students who have not participated in the program

$\bar{x}_1$ is the mean test score, $s_1$ is the sample standard deviation, $n_1$ is the sample size

b. Sample 2: Sample of students who have participated in the program

$\bar{x}_2$ is the mean test score, $s_2$ is the sample standard deviation, $n_2$ is the sample size

Then,

$H_0: \mu_1 - \mu_2 = 0$

$H_1: \mu_1 - \mu_2 \neq 0$

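Test Statistic:

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \sim t_{n_1 + n_2 - 2}$

where,

Pooled variance, $s_p^2 = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$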

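Calculated Values:

Sample 1 (non-participants): $\bar{x}_1 = 21.56$, $s_1 = 4.00$, $n_1 = 18$
Sample 2 (participants): $\bar{x}_2 = 22.43$, $s_2 = 3.86$, $n_2 = 14$
Pooled standard deviation: $s_p = 3.94$
Test statistic: $t_{n_1 + n_2 - 2} = -0.62$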
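The calculated values above can be reproduced from the summary statistics alone. The following is a minimal Python sketch, not the Stata workflow used in this report; the function name `pooled_t` is hypothetical, and the inputs are the means, standard deviations and sample sizes printed in the Stata output in the Notes.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t-test from summary statistics.

    Returns (pooled_sd, standard_error, t, degrees_of_freedom).
    """
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference in sample means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    return math.sqrt(pooled_var), se, t, df


# Sample 1 = non-participants, Sample 2 = participants (values from the Stata log).
sp, se, t, df = pooled_t(21.55556, 4.003267, 18, 22.42857, 3.857346, 14)
print(f"s_p = {sp:.2f}, SE = {se:.4f}, t = {t:.4f}, df = {df}")
```

The sign of t depends only on the order of the samples; with non-participants first, a negative value means participants scored higher on average in the sample.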

Madhuresh Kumar | Sushil Kumar A C | Naveen Kumar Singh | Karthik Cheboli | Yashni Nagarajan | Mintu Kumar Singh | Jitendra Sokal

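Here, we have done the t-test with equal population variance due to the result of the following hypothesis testing, where $\sigma_1^2$ and $\sigma_2^2$ are the population variances of test scores for non-participants and participants:

$H_0: \dfrac{\sigma_1^2}{\sigma_2^2} = 1$

$H_1: \dfrac{\sigma_1^2}{\sigma_2^2} \neq 1$

Test Statistic:

$F = \dfrac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} \sim F(n_1 - 1,\ n_2 - 1)$

$F = \dfrac{s_1^2}{s_2^2} = 1.0771$ (when $\sigma_1^2 = \sigma_2^2$)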
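As a quick check of the variance ratio, the F statistic under the null hypothesis is just the ratio of the two sample variances. This is a small Python sketch under that assumption; `variance_ratio` is a hypothetical helper name, and the standard deviations are those reported in the Stata output.

```python
def variance_ratio(sd1, n1, sd2, n2):
    """Return (F, numerator df, denominator df) for a variance ratio test,
    assuming equal population variances under the null hypothesis."""
    return sd1 ** 2 / sd2 ** 2, n1 - 1, n2 - 1


# Sample 1 = non-participants, Sample 2 = participants.
f, df_num, df_den = variance_ratio(4.003267, 18, 3.857346, 14)
print(f"F = {f:.4f} on ({df_num}, {df_den}) degrees of freedom")
```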

F-test:

p-value (two-tailed test): p = 0.9072

The p-value is greater than each significance level considered (0.01, 0.05, 0.10).

Therefore, we cannot reject the null hypothesis that the ratio of the population variances is 1.

The data give no evidence that the variance of test scores differs between those who have and have not participated in the program. So, we can do the t-test with equal variances.

t-test:

p-value (two-tailed test): p = 0.5388

The p-value is greater than each significance level considered (0.01, 0.05, 0.10).

Therefore, we cannot reject the null hypothesis: the data give no evidence that test scores differ by participation.

Challenges

Here, we have assumed that the population follows a normal distribution. From the kernel density plot of the test scores, we can see that the sample is also approximately normal.

Definitions:

1. Population parameters:

a. $p_1$ is the proportion of students whose grades have increased despite not having participated in the program

b. $p_2$ is the proportion of students whose grades have increased after participating in the program

2. Sample statistics:

a. Sample 1: Sample of students who have not participated in the program

$\hat{p}_1$ is the proportion whose grades increased, $n_1$ is the sample size

b. Sample 2: Sample of students who have participated in the program

$\hat{p}_2$ is the proportion whose grades increased, $n_2$ is the sample size

Then,

$H_0: p_1 - p_2 = 0$

$H_1: p_1 - p_2 < 0$

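Test Statistic:

$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p}) \dfrac{1}{n_1} + \hat{p}(1 - \hat{p}) \dfrac{1}{n_2}}} \sim N(0, 1)$

Since the hypothesis is that the population proportions are equal, the best estimate of overall population proportion is a combined proportion of success, given by

$\hat{p} = \dfrac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$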

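Calculated Values:

Sample 1 (non-participants): $\hat{p}_1 = 0.167$, $n_1 = 18$
Sample 2 (participants): $\hat{p}_2 = 0.571$, $n_2 = 14$
Combined proportion: $\hat{p} = 0.344$
Test statistic: $z = -2.391$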
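These values follow directly from the counts in the data summary: 11 grade increments in total, 8 of them among the 14 participants, so 3 among the 18 non-participants. The sketch below, in Python rather than the Stata used for the report, recomputes the pooled proportion, the z statistic and the left-tail p-value; `two_proportion_z` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z-test for proportions.

    x1, x2 are success counts; returns (pooled proportion, standard error, z).
    """
    p1, p2 = x1 / n1, x2 / n2
    # Combined proportion of success, the estimate used under H0: p1 = p2.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return pooled, se, (p1 - p2) / se


# Sample 1 = non-participants (3 of 18), Sample 2 = participants (8 of 14).
pooled, se, z = two_proportion_z(3, 18, 8, 14)
p_left = NormalDist().cdf(z)  # left-tail p-value for H1: p1 - p2 < 0
print(f"p_hat = {pooled:.3f}, SE = {se:.4f}, z = {z:.4f}, p = {p_left:.4f}")
```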

z-test

p-value (left-tailed test): p = 0.0084

The p-value is lower than each significance level considered (0.01, 0.05, 0.10).

The null hypothesis is therefore rejected at each of these levels, including the 1% level (99% confidence). This supports the hypothesis that grades have increased after program participation.

Challenges

Here we have assumed normality even though the sample sizes are not very large (18 and 14).

This may make the standard error used in the z-value inaccurate. Accordingly, the p-value may also be inaccurate to some degree. However, the conclusion will remain the same.

To be sure, we can assume that the population follows a Normal distribution instead of a Bernoulli distribution.

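We can use the t-test here, test statistic:

$t = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \sim t_{n_1 + n_2 - 2}$

where $s_1$, $s_2$ are the sample standard deviations of the grade-increase indicator and $s_p$ is the pooled standard deviation.

Calculated Values:

Sample 1 (non-participants): $\hat{p}_1 = 0.167$, $s_1 = 0.383$, $n_1 = 18$
Sample 2 (participants): $\hat{p}_2 = 0.571$, $s_2 = 0.513$, $n_2 = 14$
Pooled standard deviation: $s_p = 0.444$
Test statistic: $t_{n_1 + n_2 - 2} = -2.55$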

t-test

p-value (left-tailed test): p = 0.00796

The p-value is lower than each significance level considered (0.01, 0.05, 0.10), so the null hypothesis is again rejected even at the 1% level (99% confidence).

As we can observe, the t-test and the z-test give very similar p-values. Therefore, we can conclude at the 90%, 95% and 99% confidence levels that grades have increased after program participation.
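The t-statistic for the grade-increase indicator can be recomputed by treating each student's outcome as a 0/1 value. The Python sketch below is illustrative only (the report's figures come from Stata); it assumes the counts 3 of 18 and 8 of 14 from the data summary, and `binary_sample_sd` is a hypothetical helper name.

```python
import math


def binary_sample_sd(successes, n):
    """Sample standard deviation (n - 1 denominator) of a 0/1 variable."""
    p = successes / n
    return math.sqrt(n * p * (1 - p) / (n - 1))


n1, n2 = 18, 14          # non-participants, participants
x1, x2 = 3, 8            # students whose grades increased in each sample
s1 = binary_sample_sd(x1, n1)
s2 = binary_sample_sd(x2, n2)

df = n1 + n2 - 2
pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t = (x1 / n1 - x2 / n2) / se
print(f"s1 = {s1:.3f}, s2 = {s2:.3f}, s_p = {math.sqrt(pooled_var):.3f}, t = {t:.2f}")
```

The standard error here uses the pooled sample variance with an n - 1 denominator, which is why t (about -2.55) differs slightly from the z value (about -2.39) computed with the pooled proportion.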

Notes:

1.

. sdtest nonpart_tuce == part_tuce

Variance ratio test

Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
nonpar~e |      18    21.55556     .943579    4.003267    19.56478    23.54633
part_t~e |      14    22.42857    1.030919    3.857346    20.20141    24.65574
---------+--------------------------------------------------------------------
combined |      32     21.9375    .6896959    3.901509    20.53086    23.34414

                                                                  f =   1.0771
Ho: ratio = 1                                    degrees of freedom =   17, 13

    Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
  Pr(F < f) = 0.5464         2*Pr(F > f) = 0.9072           Pr(F > f) = 0.4536

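Two-sample t test with equal variances

Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
nonpar~e |      18    21.55556     .943579    4.003267    19.56478    23.54633
part_t~e |      14    22.42857    1.030919    3.857346    20.20141    24.65574
---------+--------------------------------------------------------------------
combined |      32     21.9375    .6896959    3.901509    20.53086    23.34414
---------+--------------------------------------------------------------------
    diff |            -.8730159    1.404261                -3.7409    1.994868

                                                                  t =  -0.6217
Ho: diff = 0                                     degrees of freedom =       30

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.2694         Pr(|T| > |t|) = 0.5388          Pr(T > t) = 0.7306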

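Variance ratio test

Variable |     Obs        Mean    Std. Err.   Std. Dev.   [90% Conf. Interval]
---------+--------------------------------------------------------------------
nonpar~e |      18    21.55556     .943579    4.003267     19.9141    23.19701
part_t~e |      14    22.42857    1.030919    3.857346    20.60288    24.25426
---------+--------------------------------------------------------------------
combined |      32     21.9375    .6896959    3.901509    20.76811    23.10689

                                                                  f =   1.0771
Ho: ratio = 1                                    degrees of freedom =   17, 13

    Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
  Pr(F < f) = 0.5464         2*Pr(F > f) = 0.9072           Pr(F > f) = 0.4536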

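Two-sample t test with equal variances

Variable |     Obs        Mean    Std. Err.   Std. Dev.   [90% Conf. Interval]
---------+--------------------------------------------------------------------
nonpar~e |      18    21.55556     .943579    4.003267     19.9141    23.19701
part_t~e |      14    22.42857    1.030919    3.857346    20.60288    24.25426
---------+--------------------------------------------------------------------
combined |      32     21.9375    .6896959    3.901509    20.76811    23.10689
---------+--------------------------------------------------------------------
    diff |            -.8730159    1.404261              -3.256413    1.510382

                                                                  t =  -0.6217
Ho: diff = 0                                     degrees of freedom =       30

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.2694         Pr(|T| > |t|) = 0.5388          Pr(T > t) = 0.7306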

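Variance ratio test

Variable |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
nonpar~e |      18    21.55556     .943579    4.003267    18.82085    24.29026
part_t~e |      14    22.42857    1.030919    3.857346    19.32316    25.53398
---------+--------------------------------------------------------------------
combined |      32     21.9375    .6896959    3.901509    20.04495    23.83005

                                                                  f =   1.0771
Ho: ratio = 1                                    degrees of freedom =   17, 13

    Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
  Pr(F < f) = 0.5464         2*Pr(F > f) = 0.9072           Pr(F > f) = 0.4536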

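Two-sample t test with equal variances

Variable |     Obs        Mean    Std. Err.   Std. Dev.   [99% Conf. Interval]
---------+--------------------------------------------------------------------
nonpar~e |      18    21.55556     .943579    4.003267    18.82085    24.29026
part_t~e |      14    22.42857    1.030919    3.857346    19.32316    25.53398
---------+--------------------------------------------------------------------
combined |      32     21.9375    .6896959    3.901509    20.04495    23.83005
---------+--------------------------------------------------------------------
    diff |            -.8730159    1.404261              -4.734728    2.988696

                                                                  t =  -0.6217
Ho: diff = 0                                     degrees of freedom =       30

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.2694         Pr(|T| > |t|) = 0.5388          Pr(T > t) = 0.7306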

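Two-sample test of proportions           nonpart_inc: Number of obs =       18
                                            part_inc: Number of obs =       14

   Variable |       Mean   Std. Err.       z    P>|z|     [95% Conf. Interval]
------------+-----------------------------------------------------------------
nonpart_inc |   .1666667     .087841                      -.0054986     .338832
   part_inc |   .5714286      .13226                       .3122037    .8306534
------------+-----------------------------------------------------------------
       diff |  -.4047619    .1587727                      -.7159506   -.0935732
            |  under Ho:    .1692508   -2.39    0.017

                                                                  z =  -2.3915
Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.0084         Pr(|Z| > |z|) = 0.0168          Pr(Z > z) = 0.9916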

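Two-sample test of proportions           nonpart_inc: Number of obs =       18
                                            part_inc: Number of obs =       14

   Variable |       Mean   Std. Err.       z    P>|z|     [90% Conf. Interval]
------------+-----------------------------------------------------------------
nonpart_inc |   .1666667     .087841                        .022181    .3111523
   part_inc |   .5714286      .13226                       .3538802    .7889769
------------+-----------------------------------------------------------------
       diff |  -.4047619    .1587727                      -.6659197   -.1436041
            |  under Ho:    .1692508   -2.39    0.017

                                                                  z =  -2.3915
Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.0084         Pr(|Z| > |z|) = 0.0168          Pr(Z > z) = 0.9916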

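Two-sample test of proportions           nonpart_inc: Number of obs =       18
                                            part_inc: Number of obs =       14

   Variable |       Mean   Std. Err.       z    P>|z|     [99% Conf. Interval]
------------+-----------------------------------------------------------------
nonpart_inc |   .1666667     .087841                      -.0595969    .3929302
   part_inc |   .5714286      .13226                       .2307494    .9121078
------------+-----------------------------------------------------------------
       diff |  -.4047619    .1587727                      -.8137332    .0042094
            |  under Ho:    .1692508   -2.39    0.017

                                                                  z =  -2.3915
Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.0084         Pr(|Z| > |z|) = 0.0168          Pr(Z > z) = 0.9916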

olor(ltblue) recast(area)) (function y=tden(31,x), range(-5 5)), legend(off) plotregion(margi

title("t") text(0 -0.6247 "-0.6247", place(s)) text(0 0.6247 "0.6247", place(s)) title("Two-t

31), alpha=0.05")

> legend(off) plotregion(margin(zero)) ytitle("f(t)") xtitle("t") text(0 0.9287 "0.9287", plac

> ection region" "F(13,17), alpha=0.05")

> legend(off) plotregion(margin(zero)) ytitle("f(z)") xtitle("z") text(0 -2.3915 "-2.3915", pl

> ejection region" "z, alpha=0.05")

. kdensity tuce, kernel(epanechnikov) normal
(n() set to 32)

. graph save Graph "C:\Users\Naveen\Desktop\4thweek\stats assignment\Graph5.gph", replace
(file C:\Users\Naveen\Desktop\4thweek\stats assignment\Graph5.gph saved)

. save "C:\Users\Naveen\Desktop\4thweek\stats assignment\team 10.dta", replace
file C:\Users\Naveen\Desktop\4thweek\stats assignment\team 10.dta saved

