
Altynai Kubatova Fin-18

Homework Assignment 2

1. T-test:
First, Cumulative GPA is chosen as the dependent variable, because it is an important grade in university/college and shows students’ study performance.
[Figure: histogram of CumGPA; x-axis: CumGPA (0 to 4), y-axis: Density (0 to 1)]

As the histogram shows, Cum GPA is approximately normally distributed. Universities and colleges around the world usually require GPAs higher than 3.0. Therefore, the null hypothesis is that the mean is greater than or equal to 3.0.
H0: μ ≥ 3.0 (null hypothesis)
H1: μ<3.0 (alternative hypothesis)

a. One-Sample Test for the Mean (σ Unknown)

ttest CumGPA ==3.0

One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
CumGPA | 732 2.080861 .0365773 .9896168 2.009052 2.15267
------------------------------------------------------------------------------
mean = mean(CumGPA) t = -25.1287
Ho: mean = 3.0 degrees of freedom = 731

Ha: mean < 3.0 Ha: mean != 3.0 Ha: mean > 3.0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
Our null hypothesis is that the mean is greater than or equal to 3.0, so the test is one-tailed (lower tail). The sample mean is 2.08, which is less than 3.0.

Decision making: for a one-tailed (lower-tail) test at α = 0.05 with 731 degrees of freedom, the critical value is approximately -1.65.

Reject H0 if Tstat < -1.65. Our t value, -25.1287, is far below -1.65 and falls in the rejection region.
So we reject our null hypothesis (μ≥3.0): the data give strong evidence that the mean Cum GPA is below 3.0.

P-value approach:
If P-value≥α, fail to reject H0; if P-value<α, reject H0.
The one-tailed p-value, Pr(T < t), is 0.0000, and our alpha is 0.05 (95% confidence level). So we reject the null hypothesis (μ≥3.0): the mean Cum GPA is significantly below 3.0.
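The t statistic reported by Stata can be cross-checked by hand from the summary statistics. The following is a minimal Python sketch, not part of the original Stata workflow. It assumes the mean, standard deviation, and N shown in the output above, and it approximates the lower-tail critical value with a standard normal quantile, which is reasonable only because df = 731 is large.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics copied from the Stata one-sample t test output
n, mean, sd = 732, 2.080861, 0.9896168
mu0 = 3.0       # hypothesized mean under H0
alpha = 0.05

se = sd / sqrt(n)             # standard error of the mean
t_stat = (mean - mu0) / se    # one-sample t statistic

# Assumption: with df = 731 the t distribution is close to the standard
# normal, so the lower-tail critical value is approximated by a z quantile.
crit = NormalDist().inv_cdf(alpha)

reject_h0 = t_stat < crit
print(f"se = {se:.7f}, t = {t_stat:.4f}, critical value = {crit:.3f}")
print("reject H0:", reject_h0)
```

The standard error and t statistic should agree with the Std. Err. and t columns of the Stata table up to rounding.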
b. Comparing the Means of Two Independent Populations
H0: μ1-μ2=0
H1: μ1-μ2≠0
HS rank (HSRank) is compared between two independent groups defined by the football player indicator (FootballPlayer1).
sdtest HSRank , by( FootballPlayer1 )

Variance ratio test


------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
0 | 494 100.5951 4.784699 106.3453 91.19422 109.9961
1 | 238 123.0756 7.453738 114.9907 108.3916 137.7597
---------+--------------------------------------------------------------------
combined | 732 107.9044 4.053144 109.6598 99.94718 115.8616
------------------------------------------------------------------------------
ratio = sd(0) / sd(1) f = 0.8553
Ho: ratio = 1 degrees of freedom = 493, 237

Ha: ratio < 1 Ha: ratio != 1 Ha: ratio > 1


Pr(F < f) = 0.0774 2*Pr(F < f) = 0.1548 Pr(F > f) = 0.9226

The variance ratio test compares the two groups' standard deviations to decide between equal and unequal variances. The null hypothesis is that the ratio equals 1, which means equal variances. Dividing sd(0) by sd(1) gives 0.9248, and its square is the reported f = 0.8553. The two-sided p-value is greater than alpha, 2*Pr(F < f) = 0.1548 > 0.05, so we fail to reject H0. The ratio is close to 1, so we treat the two variances as equal.
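The relationship between the two standard deviations and the reported F statistic can be checked with a few lines of arithmetic. This is a minimal Python sketch, assuming only the group standard deviations and group sizes printed in the sdtest output; it does not compute the F p-value.

```python
# Group standard deviations and sizes copied from the sdtest output
sd0, n0 = 106.3453, 494   # FootballPlayer1 == 0
sd1, n1 = 114.9907, 238   # FootballPlayer1 == 1

sd_ratio = sd0 / sd1      # ratio of standard deviations
f_stat = sd_ratio ** 2    # F statistic = ratio of variances
df = (n0 - 1, n1 - 1)     # numerator and denominator degrees of freedom

print(f"sd ratio = {sd_ratio:.4f}, f = {f_stat:.4f}, df = {df}")
```

The F statistic is the ratio of variances, so it is the square of the standard deviation ratio rather than the ratio itself.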
ttest HSRank , by( FootballPlayer1 )

Two-sample t test with equal variances


------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
0 | 494 100.5951 4.784699 106.3453 91.19422 109.9961
1 | 238 123.0756 7.453738 114.9907 108.3916 137.7597
---------+--------------------------------------------------------------------
combined | 732 107.9044 4.053144 109.6598 99.94718 115.8616
---------+--------------------------------------------------------------------
diff | -22.48049 8.618545 -39.40058 -5.560397
------------------------------------------------------------------------------
diff = mean(0) - mean(1) t = -2.6084
Ho: diff = 0 degrees of freedom = 730

Ha: diff < 0 Ha: diff != 0 Ha: diff > 0


Pr(T < t) = 0.0046 Pr(|T| > |t|) = 0.0093 Pr(T > t) = 0.9954

The t statistic is -2.6084. Reject H0 if Tstat > +1.96 or Tstat < -1.96. Since -2.6084 is less than -1.96, we reject H0.
The two-sided p-value is 0.0093, which is less than alpha (0.05). The difference between the two means is -22.48049, which is significantly different from zero (the null hypothesis value). So we reject H0: the mean HS rank differs significantly between the two groups.
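The equal-variance (pooled) t statistic can be reproduced from the group summaries. The sketch below is a minimal Python illustration under the assumption that the group sizes, means, and standard deviations are exactly those printed in the Stata table; the variable names are chosen here for illustration.

```python
from math import sqrt

# Group summaries copied from the two-sample t test output
n0, mean0, sd0 = 494, 100.5951, 106.3453
n1, mean1, sd1 = 238, 123.0756, 114.9907

# Pooled variance: weighted average of the two group variances
df = n0 + n1 - 2
pooled_var = ((n0 - 1) * sd0**2 + (n1 - 1) * sd1**2) / df

# Standard error of the difference in means under equal variances
se_diff = sqrt(pooled_var * (1 / n0 + 1 / n1))

diff = mean0 - mean1
t_stat = diff / se_diff

print(f"diff = {diff:.4f}, se = {se_diff:.4f}, t = {t_stat:.4f}, df = {df}")
```

The values should match the diff row of the Stata output (difference, standard error, t, and degrees of freedom) up to rounding.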
c. Comparing the Means of Two Related/Matched Populations
Cumulative GPA and Term GPA are chosen as the two related variables, because both are collected from the same source (the same students).
ttest CumGPA = TermGPA

Paired t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
CumGPA | 732 2.080861 .0365773 .9896168 2.009052 2.15267
TermGPA | 732 2.330246 .0280262 .7582622 2.275225 2.385267
---------+--------------------------------------------------------------------
diff | 732 -.2493852 .0381588 1.032405 -.3242991 -.1744714
------------------------------------------------------------------------------
mean(diff) = mean(CumGPA - TermGPA) t = -6.5355
Ho: mean(diff) = 0 degrees of freedom = 731

Ha: mean(diff) < 0 Ha: mean(diff) != 0 Ha: mean(diff) > 0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000

By the critical value approach, H0 is rejected, because t = -6.5355 is less than the critical value -1.96.
The p-value is 0.0000, which is less than alpha (0.05), so it also rejects H0. The mean difference is -0.2494, which is significantly different from the null hypothesis value (mean(diff) = 0).
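A paired t test is a one-sample t test on the per-student differences. The following minimal Python sketch recomputes the statistic, assuming the standard deviation of the differences is the value shown in the diff row of the output (it cannot be derived from the two separate standard deviations without the correlation between the variables).

```python
from math import sqrt

n = 732
mean_cum, mean_term = 2.080861, 2.330246   # means from the paired output
sd_diff = 1.032405                         # Std. Dev. of the differences (diff row)

mean_diff = mean_cum - mean_term           # mean of CumGPA - TermGPA
se_diff = sd_diff / sqrt(n)                # standard error of the mean difference
t_stat = mean_diff / se_diff               # paired t statistic, df = n - 1

print(f"mean diff = {mean_diff:.4f}, se = {se_diff:.7f}, t = {t_stat:.4f}")
```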

2. Z-test:
[Figure: histogram of SAT Score (y-axis: Density) and box plot of SAT Score; SAT Score axis from 400 to 1400]

SAT score was chosen as the dependent variable, because nowadays the SAT is an important and valued part of admissions.
According to the histogram and box plot, the data are approximately normally distributed: the box plot gives an easy way to see the median, quartiles, and outliers. Therefore, we take the data set to follow a roughly bell-shaped, symmetrical curve.
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
SATScore | 732 898.9071 168.1912 450 1430
To set the null hypothesis, we look at the average SAT score worldwide, which is around 1000. (Source: blog.prepscholar.com › what-is-the-average-sat-score)

H0: μ ≥1000 (null hypothesis)


H1: μ <1000 (alternative hypothesis)
a. One-Sample Test for the Mean (σ known, use S given the N is large)
sdtest SATScore, by( SemesterF1S2 )

Variance ratio test
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
1 | 366 898.9071 8.797513 168.3063 881.6069 916.2073
2 | 366 898.9071 8.797513 168.3063 881.6069 916.2073
---------+--------------------------------------------------------------------
combined | 732 898.9071 6.216525 168.1912 886.7027 911.1115
------------------------------------------------------------------------------
ratio = sd(1) / sd(2) f = 1.0000
Ho: ratio = 1 degrees of freedom = 365, 365

Ha: ratio < 1 Ha: ratio != 1 Ha: ratio > 1


Pr(F < f) = 0.5000 2*Pr(F > f) = 1.0000 Pr(F > f) = 0.5000

To obtain the standard deviation, SAT score is grouped by semester, which shows the SAT scores of students in the fall and spring semesters. The combined standard deviation is 168.1912.
ztest SATScore==1000, sd(168.1912)

One-sample z test
------------------------------------------------------------------------------
Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
SATScore | 732 898.9071 6.216526 168.1912 886.7229 911.0913
------------------------------------------------------------------------------
mean = mean(SATScore) z = -16.2620
Ho: mean = 1000

Ha: mean < 1000 Ha: mean != 1000 Ha: mean > 1000
Pr(Z < z) = 0.0000 Pr(|Z| > |z|) = 0.0000 Pr(Z > z) = 1.0000

Our null hypothesis is that the mean is greater than or equal to 1000, so the test is one-tailed. The relevant alternative in the output above is Ha: mean < 1000. According to the Stata results, the sample mean is 898.9071, which is below the null hypothesis value. The z statistic is -16.2620.
H0: μ ≥1000 (null hypothesis)
H1: μ <1000 (alternative hypothesis)
Decision making: for a one-tailed (lower-tail) test at α = 0.05, the critical value is -1.645.
Reject H0 if Zstat < -1.645. Our z value, -16.2620, is far below -1.645 and falls in the rejection region. So we reject our null hypothesis (μ ≥ 1000): the mean SAT score is significantly below 1000, with a sample mean of 898.9071.

P-value approach:
If P-value≥α, fail to reject H0; if P-value<α, reject H0.
The one-tailed p-value, Pr(Z < z), is 0.0000, and our alpha is 0.05 (95% confidence level). So we reject the null hypothesis (μ ≥ 1000): the mean SAT score is significantly below 1000.
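The one-sample z test can be reproduced from the summary statistics, including the one-tailed p-value and the lower-tail critical value. This is a minimal Python sketch using only the standard library; it assumes the sample standard deviation 168.1912 stands in for the known σ, as in the ztest command above.

```python
from math import sqrt
from statistics import NormalDist

n, mean = 732, 898.9071
sigma = 168.1912      # sample SD used in place of the known sigma
mu0 = 1000            # hypothesized mean under H0
alpha = 0.05

std_normal = NormalDist()          # standard normal, mean 0 and SD 1

se = sigma / sqrt(n)
z_stat = (mean - mu0) / se

p_lower = std_normal.cdf(z_stat)   # Pr(Z < z), the one-tailed p-value
crit = std_normal.inv_cdf(alpha)   # lower-tail critical value at alpha

reject_h0 = z_stat < crit
print(f"se = {se:.6f}, z = {z_stat:.4f}, critical value = {crit:.4f}")
print(f"Pr(Z < z) = {p_lower:.4f}, reject H0: {reject_h0}")
```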

b. Comparing the Means of Two Independent Populations


H0: μ1-μ2=0
H1: μ1-μ2≠0
VerdivMath is compared between two independent groups defined by race (White1); the two groups are not related.
sdtest VerdivMath , by( White1 )

Variance ratio test


------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
0 | 178 .852694 .0120807 .1611768 .8288533 .8765348
1 | 554 .855089 .006456 .1519552 .8424078 .8677701
---------+--------------------------------------------------------------------
combined | 732 .8545066 .0056972 .1541396 .8433218 .8656913
------------------------------------------------------------------------------
ratio = sd(0) / sd(1) f = 1.1251
Ho: ratio = 1 degrees of freedom = 177, 553

Ha: ratio < 1 Ha: ratio != 1 Ha: ratio > 1


Pr(F < f) = 0.8400 2*Pr(F > f) = 0.3199 Pr(F > f) = 0.1600
The standard deviation ratio is 1.06 (0.1612/0.1520), and its square is the reported f = 1.1251; statistically it is not different from 1. So we fail to reject H0, because the p-value (0.3199) is greater than alpha. This means the variances are equal.
For the analysis, we compute a two-sample z test.
ztest VerdivMath , by( White1 ) sd1(0.1611) sd2(0.1519)

Two-sample z test
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
0 | 178 .852694 .012075 .1611 .8290276 .8763605
1 | 554 .855089 .0064536 .1519 .8424401 .8677378
---------+--------------------------------------------------------------------
diff | -.0023949 .0136914 -.0292295 .0244397
------------------------------------------------------------------------------
diff = mean(0) - mean(1) z = -0.1749
Ho: diff = 0

Ha: diff < 0 Ha: diff != 0 Ha: diff > 0


Pr(Z < z) = 0.4306 Pr(|Z| > |z|) = 0.8611 Pr(Z > z) = 0.5694
The z statistic, -0.1749, lies between -1.96 and +1.96: it is close to zero and far from -1.96. By the critical value approach, we fail to reject H0.
The p-value (0.8611) is greater than alpha (0.05), so we again fail to reject H0.
This means there is no statistically significant difference between the two means. Although the difference is negative (-0.0024), it is not statistically or practically important.
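The two-sample z statistic and its two-sided p-value can be recomputed from the group means and the standard deviations passed to sd1() and sd2(). The following minimal Python sketch assumes exactly the values shown in the output above.

```python
from math import sqrt
from statistics import NormalDist

# Group summaries copied from the two-sample z test output
n0, mean0, sd0 = 178, 0.852694, 0.1611   # White1 == 0
n1, mean1, sd1 = 554, 0.855089, 0.1519   # White1 == 1

# Standard error of the difference with separate (treated as known) SDs
se_diff = sqrt(sd0**2 / n0 + sd1**2 / n1)

diff = mean0 - mean1
z_stat = diff / se_diff

# Two-sided p-value: probability of |Z| at least as large as |z|
p_two_sided = 2 * NormalDist().cdf(-abs(z_stat))

print(f"diff = {diff:.6f}, se = {se_diff:.7f}, z = {z_stat:.4f}, p = {p_two_sided:.4f}")
```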
c. Comparing the Means of Two Related/Matched Populations
SAT score and Season are treated as related: each SAT score is taken and scored in one season.
H0: μ1-μ2=0
H1: μ1-μ2≠0
. sdtest SATScore , by( Season1 )

Variance ratio test


------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
0 | 241 885.3942 10.75647 166.9854 864.205 906.5833
1 | 491 905.5397 7.606684 168.5529 890.594 920.4855
---------+--------------------------------------------------------------------
combined | 732 898.9071 6.216525 168.1912 886.7027 911.1115
------------------------------------------------------------------------------
ratio = sd(0) / sd(1) f = 0.9815
Ho: ratio = 1 degrees of freedom = 240, 490

Ha: ratio < 1 Ha: ratio != 1 Ha: ratio > 1


Pr(F < f) = 0.4385 2*Pr(F < f) = 0.8771 Pr(F > f) = 0.5615
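The p-value (0.8771) is greater than alpha, so we fail to reject the null hypothesis of equal variances; the variance ratio is 0.9815≈1. The difference between the two standard deviations is 168.5529 - 166.9854 = 1.5675.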
ztest SATScore = Season1 , sddiff(1.5675)

Paired z test
------------------------------------------------------------------------------
Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
diff | 250 20.1455 .0579365 1.5675 898.1228 898.3499
------------------------------------------------------------------------------
mean(diff) = mean(SATScore - Season1) z = 1.6e+04
Ho: mean(diff) = 0

Ha: mean(diff) < 0 Ha: mean(diff) != 0 Ha: mean(diff) > 0


Pr(Z < z) = 1.0000 Pr(|Z| > |z|) = 0.0000 Pr(Z > z) = 0.0000

Here the difference is 20.1455, which is statistically large, so we cannot say that the two means are equal.
Reject H0 if z-stat < lower-tail critical value or z-stat > upper-tail critical value.
The z statistic is greater than 1.96 (rejection region), so we reject the null hypothesis (H0: mean(diff) is zero).
The p-value (0.0000) is less than alpha (0.05), therefore we reject H0. There is a significant difference between SAT score and Season.
3. Confidence interval (σ unknown) - SAT score
$\bar{x} \pm z_{\alpha/2} \cdot \sigma / \sqrt{n}$
N=732
$z_{\alpha/2} = 1.96$
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
SATScore | 732 898.9071 168.1912 450 1430

$898.9071 \pm 1.96 \cdot 168.1912 / \sqrt{732}$
886.7227 ≤ μ ≤ 911.0915
(886.7227; 911.0915) Thus, we are 95% confident that the mean SAT score is somewhere between 886.72 and 911.09 points.
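The interval can be recomputed directly from the mean, standard deviation, and N in the summary table above. This is a minimal Python sketch using the standard library; it takes the exact normal quantile (about 1.95996) rather than the rounded 1.96, so the bounds can differ from a hand calculation in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Values copied from the summary table for SATScore
n, mean, sd = 732, 898.9071, 168.1912
confidence = 0.95

# Two-sided critical value z_{alpha/2}
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z_crit * sd / sqrt(n)   # margin of error
lower, upper = mean - margin, mean + margin

print(f"z = {z_crit:.4f}, margin = {margin:.4f}")
print(f"{confidence:.0%} CI: ({lower:.4f}; {upper:.4f})")
```

The bounds should agree with the [95% Conf. Interval] columns that Stata prints for the one-sample z test on SATScore.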
4. Statistical Power
Above, the z test comparing the means of related populations rejected the null hypothesis. Here, statistical power gives the probability that the test rejects a false null hypothesis.
. power twomeans (885.3942) (905.5397), sd1(166.9854) sd2(168.5529) n(732)

Estimated power for a two-sample means test


Satterthwaite's t test assuming unequal variances
Ho: m2 = m1 versus Ha: m2 != m1

Study parameters:

alpha = 0.0500
N = 732
N per group = 366
delta = 20.1455
m1 = 885.3942
m2 = 905.5397
sd1 = 166.9854
sd2 = 168.5529

Estimated power:

power = 0.3680

Statistical power is the complement of the probability of making a Type II error (power = 1 - β). A power of 0.80 or higher is generally considered adequate.
Statistical power here is 36.80%, which means the risk of a Type II error is high (about 63%). Our sample size is not large enough to detect this difference reliably.
Now let’s assume that the sample size is 2500. According to the Stata analysis, with a sample size of 2500 we get a statistical power of 0.85. This means the probability of a Type II error falls to about 15%.
power twomeans (885.3942) (905.5397), sd1(166.9854) sd2(168.5529) n(2500)

Estimated power for a two-sample means test

Satterthwaite's t test assuming unequal variances
Ho: m2 = m1 versus Ha: m2 != m1

Study parameters:

alpha = 0.0500
N = 2500
N per group = 1250
delta = 20.1455
m1 = 885.3942
m2 = 905.5397
sd1 = 166.9854
sd2 = 168.5529

Estimated power:

power = 0.8510
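The two power figures can be approximated by hand to see how sample size drives power. The sketch below is a minimal Python illustration that uses a normal approximation, whereas Stata's power twomeans uses Satterthwaite's t test, so small differences in the third decimal place are expected. The function name approx_power is hypothetical, and equal group sizes (N / 2 per group) are assumed, as in the Stata output.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(m1, m2, sd1, sd2, n_total, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample means test,
    assuming equal group sizes of n_total / 2."""
    n_group = n_total / 2
    se = sqrt(sd1**2 / n_group + sd2**2 / n_group)   # SE of the mean difference
    shift = abs(m2 - m1) / se                        # standardized true difference
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # two-sided critical value
    phi = NormalDist().cdf
    # Probability of landing in either rejection region when H0 is false
    return phi(shift - z_crit) + phi(-shift - z_crit)

# Means and standard deviations copied from the power twomeans commands
power_732 = approx_power(885.3942, 905.5397, 166.9854, 168.5529, 732)
power_2500 = approx_power(885.3942, 905.5397, 166.9854, 168.5529, 2500)

print(f"approximate power, N = 732:  {power_732:.4f}")
print(f"approximate power, N = 2500: {power_2500:.4f}")
```

With the same effect size and standard deviations, only the standard error changes between the two calls, which is why the larger sample yields higher power.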


Stata Commands:
ttest CumGPA ==3.0
sdtest HSRank , by( FootballPlayer1 )
ttest HSRank , by( FootballPlayer1 )
ttest CumGPA = TermGPA
sdtest SATScore, by( SemesterF1S2 )
ztest SATScore==1000, sd(168.1912)
sdtest VerdivMath , by( White1 )
ztest VerdivMath , by( White1 ) sd1(0.1611) sd2(0.1519)
sdtest SATScore , by( Season1 )
ztest SATScore = Season1 , sddiff(1.5675)
power twomeans (885.3942) (905.5397), sd1(166.9854) sd2(168.5529) n(732)
power twomeans (885.3942) (905.5397), sd1(166.9854) sd2(168.5529) n(2500)
