
Da Nang University of Science and Technology

Statistics and Probability

Da Nang, 25th December 2021

I. Introduction
1. Problem
- Descriptive characteristics of the height of DUT students, based on freshman students of FAST.
- A two-sample test on the mean math entrance score of two groups of students.
2. Parameters
Population parameters of interest (from 21ES/ECE students):

- The mean height of DUT students
- The mean math entrance score

Initial conjectures:
+ We expect the mean height of DUT students to be 167 cm.
+ There is no difference in math entrance score between group 1 and group 2.

3. Team members and tasks

- Trần Ngọc Nguyên (Leader): R code and report writing
- Nguyễn Bá Vương: Descriptive statistics
- Nguyễn Hoàng Hải: Inferential statistics
- Lưu Đinh Đại Đức: Hypothesis testing
- Phạm Xuân Sang: Two-sided sample

II. Data collection


While collecting the data, we ran into some serious problems:

1/ The reliability of the data.
2/ Inconsistent formatting.
3/ Typing mistakes.

Solution: We converted the data to a uniform format and assumed that wrong values came from
students omitting or adding digits. Therefore, we corrected them based on intuition.
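
As an illustration only, the clean-up step described above could be sketched in R as follows. The helper name `clean_height` and the thresholds (values between 1 and 2.5 treated as metres, a plausible range of 120 to 220 cm) are assumptions made for this sketch, not the rules the team actually applied.

```r
# Hypothetical helper: normalise raw height entries to centimetres.
# All thresholds below are illustrative assumptions.
clean_height <- function(raw) {
  # Accept "1,70" as well as "1.70", then drop unit suffixes and spaces.
  txt <- gsub(",", ".", trimws(as.character(raw)), fixed = TRUE)
  txt <- gsub("[^0-9.]", "", txt)
  x <- suppressWarnings(as.numeric(txt))
  # Values such as 1.70 were probably entered in metres.
  in_metres <- !is.na(x) & x > 1 & x < 2.5
  x[in_metres] <- x[in_metres] * 100
  # Anything still outside a plausible range is set to NA for manual review.
  x[!is.na(x) & (x < 120 | x > 220)] <- NA
  x
}

clean_height(c("170", "1.68", "1,75m", "165 cm", "17"))
```

Entries that cannot be repaired automatically come back as NA, so the intuition-based correction stays a separate manual step.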

III. Analysis of results


1. Descriptive statistics
Plots produced in R:
a. Height in two groups.

#Comment:
By histogram 1, the heights concentrate most densely in the interval [160, 170] cm, and the
interval [170, 175] cm has the fewest observations.
By histogram 2, the distribution appears approximately normal. Note that the Central Limit
Theorem concerns the sampling distribution of the mean rather than the data themselves: when
the sample size is larger than 40, the sample mean is approximately normally distributed.
Here n = 38, just below that threshold.
b. Weight in two groups.

#Comment:
In the weight plot for group 1, the lower half of the sample is clearly narrower than the
upper half, and there are no outliers.
In the weight plot for group 2, the two sides of the distribution around the median are
roughly equal. Again, no outliers appear.
c. Total Score.
#Comment:
- The sample values lie between slightly below 21 and 27.
- The interval from the median to the 75th percentile is the narrowest.
- There is one mild outlier in this sample.
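
The notion of a mild outlier used in the comment can be made concrete with the usual boxplot fences: more than 1.5 IQR beyond the nearest quartile is mild, more than 3 IQR is extreme. The sketch below assumes that convention; the function name `classify_outliers` and the toy vector are hypothetical, since the report does not list the raw scores.

```r
# Classify observations by their distance beyond the quartiles.
# Assumes the IQR is non-zero; quantile() uses R's default (type 7) rule,
# which can differ slightly from the hinges drawn by boxplot().
classify_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  # Distance outside the box; 0 for points between the quartiles.
  dist <- pmax(q[1] - x, x - q[2], 0)
  cut(dist, breaks = c(-Inf, 1.5 * iqr, 3 * iqr, Inf),
      labels = c("regular", "mild", "extreme"))
}

# Toy data, not the team's scores: the value 20 is flagged as "mild".
classify_outliers(c(1:9, 20))
```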

2. Inferential statistics.
Problem: estimate the mean height of students in FAST from group 1 and group 2.
a. Confidence Interval.
Group1.
Let the confidence level be 95%. Given n = 22, x̄ = 170.0818 and s = 8.015143.
Because the sample size is only 22, the large-sample (CLT) normal approximation is not
justified, so we must use the t distribution.
-Use R code:
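
The team's original code is not reproduced in this copy. A minimal sketch that rebuilds the interval from the summary statistics alone might look like this; the function name `t_ci` is an assumption of the sketch.

```r
# One-sample t confidence interval from summary statistics.
t_ci <- function(xbar, s, n, conf = 0.95) {
  se <- s / sqrt(n)                              # standard error of the mean
  t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)   # upper alpha/2 critical value
  c(lower = xbar - t_crit * se, upper = xbar + t_crit * se)
}

t_ci(xbar = 170.0818, s = 8.015143, n = 22)
# approximately 166.5281 and 173.6355
```

With the raw height vector available, `t.test(x, conf.level = 0.95)$conf.int` would return the same interval.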

-Result:

Group2.
Similarly, the confidence level is 95%. Given n = 38, x̄ = 170.6184 and s = 6.20531. Because
the sample size n = 38 is only slightly less than 40, we can reasonably rely on the CLT and
use the normal (z) interval.
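
A minimal sketch of the large-sample interval for group 2, computed from the summary statistics with the sample standard deviation standing in for the unknown population value; the function name `z_ci` is an assumption of the sketch.

```r
# Large-sample (z) confidence interval from summary statistics.
z_ci <- function(xbar, s, n, conf = 0.95) {
  se <- s / sqrt(n)                       # standard error of the mean
  z_crit <- qnorm(1 - (1 - conf) / 2)     # about 1.96 for 95%
  c(lower = xbar - z_crit * se, upper = xbar + z_crit * se)
}

z_ci(xbar = 170.6184, s = 6.20531, n = 38)
# approximately 168.645 and 172.591
```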
Result:

b. Hypothesis Testing for Height.


Assume the mean height of students in FAST is 167 cm.

Problem: Ho: µ = 167 vs Ha: µ ≠ 167

With the significance level α = 0.05

Group1
Because the sample size is 22, the large-sample (CLT) approximation does not apply, so we
must use the t distribution with test statistic t = (x̄ − 167)/(s/√n) and n − 1 = 21 degrees
of freedom.
Moreover, this is a two-tailed test, so we reject Ho if either t >= t_alpha/2 or t <= -t_alpha/2.


Using R.
- This agrees with the confidence interval found earlier: the 95% confidence interval for
group 1 is [166.5281, 173.6355], and the hypothesized value of 167 cm lies inside this
interval. Therefore, we do not reject Ho.
- We conclude that the data in Group 1 give no evidence that the mean height of DUT students
differs from 167 cm.
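
The group 1 decision can be checked numerically from the summary statistics given earlier (n = 22, x̄ = 170.0818, s = 8.015143). This is a sketch, not the team's script; `one_sample_t` is a hypothetical helper name.

```r
# Two-tailed one-sample t test from summary statistics.
one_sample_t <- function(xbar, s, n, mu0, alpha = 0.05) {
  t_stat <- (xbar - mu0) / (s / sqrt(n))        # test statistic
  t_crit <- qt(1 - alpha / 2, df = n - 1)       # two-tailed critical value
  p_value <- 2 * pt(-abs(t_stat), df = n - 1)   # two-tailed p-value
  list(t = t_stat, t_crit = t_crit, p_value = p_value,
       reject = abs(t_stat) >= t_crit)
}

one_sample_t(xbar = 170.0818, s = 8.015143, n = 22, mu0 = 167)
# t is about 1.80 against a critical value of about 2.08, so Ho is not rejected
```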
Group2.
Because the sample size is 38, we can accept the normal approximation.
As before, this is a two-tailed test.

Because this is a large-sample test with unknown population standard deviation, the test
statistic is slightly different: z = (x̄ − 167)/(s/√n).
Then we reject Ho if either z >= z_alpha/2 or z <= -z_alpha/2.
Using R-studio.

- This agrees with the confidence interval found earlier: the 95% confidence interval for
group 2 is [168.6455, 172.5914], and the hypothesized value of 167 cm lies outside this
interval. Therefore, we reject Ho.
- We conclude that, based on the data in Group 2, the mean height of DUT students differs
from 167 cm.
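
The group 2 decision can be checked in the same way from its summary statistics (n = 38, x̄ = 170.6184, s = 6.20531). Again a sketch under the large-sample assumption; `one_sample_z` is a hypothetical helper name.

```r
# Two-tailed large-sample z test from summary statistics,
# with s used in place of the unknown population standard deviation.
one_sample_z <- function(xbar, s, n, mu0, alpha = 0.05) {
  z_stat <- (xbar - mu0) / (s / sqrt(n))   # test statistic
  z_crit <- qnorm(1 - alpha / 2)           # about 1.96 for alpha = 0.05
  p_value <- 2 * pnorm(-abs(z_stat))       # two-tailed p-value
  list(z = z_stat, z_crit = z_crit, p_value = p_value,
       reject = abs(z_stat) >= z_crit)
}

one_sample_z(xbar = 170.6184, s = 6.20531, n = 38, mu0 = 167)
# z is about 3.59, beyond the critical value of about 1.96, so Ho is rejected
```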

c. CI for two-sample test.


Problem: Determine the confidence interval for the difference between the mean height of
group 1 and the mean height of group 2.
Assume α = 0.05, with sample sizes n1 = 22 and n2 = 38. Given x̄1 = 170.0818,
x̄2 = 170.6184, s1 = 8.015143 and s2 = 6.20531.
With these sample sizes, we use the two-sample t confidence interval. This may seem a little
strange: earlier we treated group 2 with the normal distribution, but here we use the t
interval for both samples.
The reason is that when the sample size is large, the t distribution is well approximated by
the normal distribution. Therefore, using the t distribution for group 2 is also appropriate.
+) Obtaining df.

+) Formula for the confidence interval.
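
The formulas themselves are missing from this copy. Assuming the unpooled (Welch) two-sample t procedure was the one used, the two quantities named above take the following standard form; the symbol ν for the degrees of freedom is notation introduced here.

```latex
% Degrees of freedom (Welch-Satterthwaite approximation)
\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
           {\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}

% 100(1 - \alpha)\% confidence interval for \mu_1 - \mu_2
\bar{x}_1 - \bar{x}_2 \;\pm\; t_{\alpha/2,\,\nu}\,\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
```

Some textbooks round ν down to the nearest integer before looking up the critical value, while R keeps the fractional value.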


Result:
Comment: With a high degree of confidence, we can say that the true average height of
freshman students in group 1 exceeds that in group 2 by between −0.303 and 0.266.

Hypothesis for two sample t test

Result:
Comment: At a significance level of 0.05, we cannot reject the null hypothesis in favor of
the alternative hypothesis.
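
For readers who want to rerun a two-sample comparison from summary statistics, the sketch below assumes the unpooled (Welch) t procedure with Ho: µ1 = µ2 against a two-sided alternative. The function name `welch_from_summary` is hypothetical, and the call uses the height summaries quoted in section c purely as sample inputs.

```r
# Welch two-sample t test and confidence interval from summary statistics.
welch_from_summary <- function(x1, s1, n1, x2, s2, n2, alpha = 0.05) {
  v1 <- s1^2 / n1                       # squared standard error, sample 1
  v2 <- s2^2 / n2                       # squared standard error, sample 2
  se <- sqrt(v1 + v2)
  # Welch-Satterthwaite degrees of freedom (kept fractional, as t.test does).
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  t_stat <- (x1 - x2) / se
  t_crit <- qt(1 - alpha / 2, df)
  list(df = df, t = t_stat,
       p_value = 2 * pt(-abs(t_stat), df),
       ci = (x1 - x2) + c(-1, 1) * t_crit * se)
}

welch_from_summary(170.0818, 8.015143, 22, 170.6184, 6.20531, 38)
```

With the raw vectors available, `t.test(x1, x2)` performs the same Welch procedure by default.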

IV. Conclusion:
1. At the 95% confidence level, the confidence interval for group 1 is [166.5281, 173.6355]
and the interval for group 2 is [168.6455, 172.5914]. Therefore, the data in group 1 are
consistent with the null hypothesis that the average height of DUT students is 167 cm,
whereas the data in group 2 lead us to reject it.
2. There is no difference in the true average math entrance score of FAST students between
the two groups.

∞∞∞ END ∞∞∞
