Homework Assignment 2
1. T-test:
Cumulative GPA (CumGPA), an important measure at universities and colleges, is chosen as the dependent variable because it reflects students' academic performance.
[Figure: Histogram of CumGPA; x-axis: CumGPA (0 to 4), y-axis: Density]
As the histogram shows, CumGPA is approximately normally distributed. Universities and colleges commonly require a cumulative GPA of 3.0 or higher, so the null hypothesis is that the mean cumulative GPA is greater than or equal to 3.0.
H0: μ ≥ 3.0 (null hypothesis)
H1: μ<3.0 (alternative hypothesis)
One-sample t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
CumGPA | 732 2.080861 .0365773 .9896168 2.009052 2.15267
------------------------------------------------------------------------------
mean = mean(CumGPA) t = -25.1287
Ho: mean = 3.0 degrees of freedom = 731
Ha: mean < 3.0 Ha: mean != 3.0 Ha: mean > 3.0
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
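Our null hypothesis is μ ≥ 3.0, so the test is one-tailed and the relevant alternative in the output is Ha: mean < 3.0. The sample mean is 2.08, which is less than 3.0, and t = -25.1287 with 731 degrees of freedom.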
P-value approach:
If p-value ≥ α, fail to reject H0; if p-value < α, reject H0.
The one-tailed p-value, Pr(T < t), is 0.0000, and α = 0.05 (corresponding to a 95% confidence level). Because the p-value is less than α, we reject the null hypothesis (μ ≥ 3.0): the mean cumulative GPA is significantly below 3.0.
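As a cross-check, the t statistic above can be reproduced from the summary statistics in the Stata table. This is a minimal Python sketch using only the standard library; the function name one_sample_t is hypothetical, and the p-value uses the standard normal distribution as an approximation to the t distribution with 731 degrees of freedom, so it is not Stata's exact computation.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_t(mean, sd, n, mu0):
    """Return (standard error, t statistic) for H0: mu = mu0."""
    se = sd / sqrt(n)
    return se, (mean - mu0) / se

# Summary statistics from the Stata output above
se, t = one_sample_t(mean=2.080861, sd=0.9896168, n=732, mu0=3.0)

# With 731 degrees of freedom the t distribution is very close to N(0, 1),
# so the normal CDF approximates the one-tailed p-value Pr(T < t).
p_lower = NormalDist().cdf(t)
print(f"SE = {se:.7f}, t = {t:.4f}, approx. Pr(T < t) = {p_lower:.4g}")
```

The printed SE and t should agree with Stata's .0365773 and -25.1287 up to rounding; the approximate p-value is effectively zero, matching Stata's 0.0000.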
b. Comparing the Means of Two Independent Populations
H0: μ1-μ2=0
H1: μ1-μ2≠0
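For the comparison of two independent populations, high school rank (HSRank) is the outcome variable, and football player status (FootballPlayer1) splits the students into two groups.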
sdtest HSRank , by( FootballPlayer1 )
The variance-ratio test (sdtest) checks whether the two groups have equal standard deviations, which determines whether the equal-variance or unequal-variance t test should be used. Its null hypothesis is that the ratio sd(0)/sd(1) equals 1, i.e., the variances are equal. The test statistic is f = 0.9161, and the two-sided p-value is 2*Pr(F < f) = 0.1548 > 0.05, so we fail to reject H0. The ratio is close to 1, and the data give no evidence that the two variances differ, so the equal-variance t test below is appropriate.
ttest HSRank , by( FootballPlayer1 )
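The t statistic is -2.6084. For a two-tailed test at α = 0.05 with this many degrees of freedom, the critical values are approximately ±1.96, so we reject H0 if t > +1.96 or t < -1.96. Since -2.6084 < -1.96, we reject H0.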
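The two-tailed p-value is 0.0093, which is less than α = 0.05, so the p-value approach also rejects H0. The estimated difference between the two group means is -22.48049, which is significantly different from zero (the value under the null hypothesis). Mean high school rank therefore differs between the two groups (football players and non-players).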
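The critical-value and p-value rules used above can be written out explicitly. This is a sketch under stated assumptions: it uses the standard normal as a large-sample approximation to the t distribution (hence the 1.96 cutoff), so its p-value is close to, but not identical with, Stata's 0.0093. The helper name two_tailed_decision is hypothetical. The sketch also backs out the standard error implied by the reported difference and t statistic.

```python
from statistics import NormalDist

def two_tailed_decision(t_stat, alpha=0.05):
    """Two-tailed test decision using the normal approximation to t."""
    std_normal = NormalDist()
    crit = std_normal.inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    p_value = 2 * std_normal.cdf(-abs(t_stat))   # two-tailed p-value
    return crit, p_value, abs(t_stat) > crit

diff, t_stat = -22.48049, -2.6084                # values reported by ttest above
se_diff = diff / t_stat                          # implied standard error of the difference
crit, p_value, reject = two_tailed_decision(t_stat)
print(f"SE(diff) = {se_diff:.4f}, critical value = {crit:.2f}, "
      f"approx. p = {p_value:.4f}, reject H0: {reject}")
```

Both rules lead to the same decision: |t| exceeds the critical value exactly when the two-tailed p-value falls below α.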
c. Comparing the Means of Two Related/Matched Populations
Cumulative GPA (CumGPA) and term GPA (TermGPA) are chosen as the two related variables because both are measured on the same students, so each observation forms a matched pair.
ttest CumGPA = TermGPA
Paired t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
---------+--------------------------------------------------------------------
------------------------------------------------------------------------------
Pr(T < t) = 0.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 1.0000
Using the critical-value approach, we reject H0 because t = -6.5355 is less than the lower critical value of -1.96 (it lies in the rejection region). The two-tailed p-value is 0.0000, which is less than α = 0.05, so the p-value approach also rejects H0. The mean difference (CumGPA - TermGPA) is -0.2494, which differs significantly from the null value of 0: on average, cumulative GPA is about 0.25 points lower than term GPA.
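The paired t test works on the within-student differences rather than on the two variables separately. The sketch below shows that computation in Python; the five GPA pairs are hypothetical numbers chosen only for illustration (they are not the assignment data), and paired_t is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic for H0: mean(x - y) = 0.

    Returns (mean difference, t statistic, degrees of freedom)."""
    d = [a - b for a, b in zip(x, y)]   # within-student differences
    n = len(d)
    d_bar = mean(d)
    se = stdev(d) / sqrt(n)             # standard error of the mean difference
    return d_bar, d_bar / se, n - 1

# Hypothetical GPAs for five students (illustration only, not the assignment data)
cum_gpa = [2.5, 3.0, 1.8, 3.4, 2.2]
term_gpa = [2.9, 3.1, 2.3, 3.5, 2.6]
d_bar, t, df = paired_t(cum_gpa, term_gpa)
print(f"mean diff = {d_bar:.4f}, t = {t:.4f}, df = {df}")
```

Applied to the actual CumGPA and TermGPA columns, the same calculation should correspond to the mean difference (-0.2494) and t statistic (-6.5355) that Stata reports.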
2. Z-test:
[Figure: Histogram and box plot of SAT Score; x-axis: SAT Score (400 to 1,400), histogram y-axis: Density]
SAT score is chosen as the dependent variable because the SAT is an important and widely used factor in college admissions.
According to the histogram and box plot, the SAT scores are approximately normally distributed. The box plot makes it easy to see the median, quartiles, and outliers, and together with the histogram it suggests that the data follow a roughly bell-shaped, symmetric curve.
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
SATScore | 732 898.9071 168.1912 450 1430
To set the null hypothesis, we use the average SAT score, which is around 1000 (source: blog.prepscholar.com › what-is-the-average-sat-score).
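The z test requires a known standard deviation. To obtain it, SAT score is combined with Semester, which shows the SAT scores of students in the fall and spring semesters. The resulting standard deviation, σ = 168.1912, matches the sample standard deviation in the summary table above and is treated as known.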
ztest SATScore==1000, sd(168.1912)
One-sample z test
------------------------------------------------------------------------------
Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
SATScore | 732 898.9071 6.216526 168.1912 886.7229 911.0913
------------------------------------------------------------------------------
mean = mean(SATScore) z = -16.2620
Ho: mean = 1000
Ha: mean < 1000 Ha: mean != 1000 Ha: mean > 1000
Pr(Z < z) = 0.0000 Pr(|Z| > |z|) = 0.0000 Pr(Z > z) = 1.0000
Our null hypothesis is μ ≥ 1000, so the test is one-tailed and the relevant alternative in the Stata output above is Ha: mean < 1000. According to the Stata results, the sample mean is 898.9071, which is below the hypothesized value of 1000, and the z statistic is -16.2620.
H0: μ ≥1000 (null hypothesis)
H1: μ <1000 (alternative hypothesis)
Decision making: because the test is one-tailed (lower tail) at α = 0.05, the critical value is -1.645, and we reject H0 if z < -1.645. (For a two-tailed test, the critical values would be ±1.96.) Our z value of -16.2620 is far below -1.645, in the rejection region, so we reject the null hypothesis (μ ≥ 1000): the mean SAT score, 898.9071, is significantly lower than 1000.
P-value approach:
If p-value ≥ α, fail to reject H0; if p-value < α, reject H0.
The one-tailed p-value, Pr(Z < z), is 0.0000, and α = 0.05 (corresponding to a 95% confidence level). Because the p-value is less than α, we reject the null hypothesis (μ ≥ 1000): the mean SAT score is significantly below 1000.
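The z statistic, the one-tailed p-value, and the lower-tail critical value for H1: μ < 1000 at α = 0.05 can be reproduced from the reported mean and the σ = 168.1912 treated as known. This is a minimal Python sketch using only the standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, sigma, mu0 = 732, 898.9071, 168.1912, 1000
alpha = 0.05

se = sigma / sqrt(n)                 # standard error of the mean
z = (x_bar - mu0) / se               # one-sample z statistic
std_normal = NormalDist()
z_crit = std_normal.inv_cdf(alpha)   # lower-tail critical value, about -1.645
p_lower = std_normal.cdf(z)          # one-tailed p-value, Pr(Z < z)
print(f"SE = {se:.6f}, z = {z:.4f}, critical value = {z_crit:.3f}, p = {p_lower:.4g}")
print("Reject H0" if z < z_crit else "Fail to reject H0")
```

The SE and z should match Stata's 6.216526 and -16.2620 up to rounding.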
Two-sample z test
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
0 | 178 .852694 .012075 .1611 .8290276 .8763605
1 | 554 .855089 .0064536 .1519 .8424401 .8677378
---------+--------------------------------------------------------------------
diff | -.0023949 .0136914 -.0292295 .0244397
------------------------------------------------------------------------------
diff = mean(0) - mean(1) z = -0.1749
Ho: diff = 0
Paired z test
------------------------------------------------------------------------------
Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
diff | 250 20.1455 .0579365 1.5675 898.1228 898.3499
------------------------------------------------------------------------------
mean(diff) = mean(SATScore - Season1) z = 1.6e+04
Ho: mean(diff) = 0
Here, the mean difference is 20.1455, which is large relative to its standard error, so we cannot conclude that the two means are equal.
Reject H0 if z < -1.96 (the lower-tail critical value) or z > +1.96 (the upper-tail critical value).
The z statistic (1.6e+04) is greater than 1.96 and lies in the rejection region, so we reject the null hypothesis (H0: mean(diff) = 0).
The p-value (0.0000) is less than α = 0.05, so we again reject H0. There is a significant difference between SAT score and Semester.
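3. Confidence interval (σ unknown): SAT score
$\bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$
where s is the sample standard deviation, used in place of the unknown σ.
n = 732
$z_{\alpha/2} = 1.96$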
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
SATScore | 732 898.9071 168.1912 450 1430
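$898.9071 \pm 1.96 \cdot \frac{168.1912}{\sqrt{732}} = 898.9071 \pm 12.1844$
$886.7227 \le \mu \le 911.0915$
This agrees, up to rounding of the critical value, with the 95% interval reported by ztest above. Thus, we are 95% confident that the mean SAT score lies between 886.72 and 911.09 points.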
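The interval arithmetic can be checked with a short Python sketch (standard library only; mean_ci is a hypothetical helper name). With z = 1.96 it gives approximately (886.7227, 911.0915). Replacing 1.96 with the exact quantile NormalDist().inv_cdf(0.975), about 1.959964, gives Stata's ztest interval (886.7229, 911.0913) up to rounding.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(x_bar, s, n, z):
    """Large-sample confidence interval x_bar +/- z * s / sqrt(n)."""
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin

lo, hi = mean_ci(898.9071, 168.1912, 732, 1.96)
lo_exact, hi_exact = mean_ci(898.9071, 168.1912, 732, NormalDist().inv_cdf(0.975))
print(f"z = 1.96:      ({lo:.4f}, {hi:.4f})")
print(f"exact z_0.975: ({lo_exact:.4f}, {hi_exact:.4f})")
```

Because σ is unknown here, a t-based interval would be very slightly wider. With n = 732 the difference only shows up in the second decimal place or later.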
4. Statistical Power
The paired z test above, which compared the means of related populations, rejected the null hypothesis. Statistical power is the probability that a test rejects the null hypothesis when the null hypothesis is false.
. power twomeans (885.3942) (905.5397), sd1(166.9854) sd2(168.5529) n(732)
Study parameters:
alpha = 0.0500
N = 732
N per group = 366
delta = 20.1455
m1 = 885.3942
m2 = 905.5397
sd1 = 166.9854
sd2 = 168.5529
Estimated power:
power = 0.3680
Statistical power equals 1 - β, where β is the probability of making a Type II error, so higher power means a lower chance of a Type II error. A well-designed study usually aims for a power of at least 0.80.
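The statistical power is 0.3680 (36.80%). If the true difference in means is 20.1455, the test would fail to detect it with probability β = 1 - 0.3680 = 0.6320, so the risk of a Type II error is high. Our sample size is too small to achieve high statistical power.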
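The following minimal Python sketch shows how the power for comparing two means depends on the sample size. It rests on stated assumptions: equal group sizes, a two-sided test at α = 0.05, and the normal (z) approximation. Stata's power twomeans may base its result on the t distribution, so small differences from its output are expected. The helper two_means_power is hypothetical. The means and standard deviations are the ones passed to power twomeans above.

```python
from math import sqrt
from statistics import NormalDist

def two_means_power(m1, m2, sd1, sd2, n_total, alpha=0.05):
    """Approximate two-sided power for comparing two independent means
    with equal group sizes, using the normal (z) approximation."""
    n_per_group = n_total / 2
    se = sqrt(sd1**2 / n_per_group + sd2**2 / n_per_group)
    shift = abs(m2 - m1) / se                 # standardized true difference
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic lands in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

power_732 = two_means_power(885.3942, 905.5397, 166.9854, 168.5529, 732)
power_2500 = two_means_power(885.3942, 905.5397, 166.9854, 168.5529, 2500)
print(f"n = 732: {power_732:.4f}, n = 2500: {power_2500:.4f}")
```

With these inputs the approximation gives about 0.369 for n = 732, close to Stata's 0.3680. For n = 2500 it gives about 0.85, which shows that enlarging the sample shrinks the standard error and raises the power.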
Now assume that the total sample size is 2500. According to the Stata analysis, a sample size of 2500 gives a statistical power of about 0.85. This meets the 0.80 benchmark: the probability of a Type II error drops to about 0.15, although it is never eliminated completely.
power twomeans (885.3942) (905.5397), sd1(166.9854) sd2(168.5529) n(2500)
Study parameters:
alpha = 0.0500
N = 2500
delta = 20.1455
m1 = 885.3942
m2 = 905.5397
sd1 = 166.9854
sd2 = 168.5529