
Computer Project

Students Performance

Subject: MAS291
Semester: Spring 2023
Class: IA1701
Group: 1
Instructor: Nguyễn Văn Thiện

Roll Number   Full Name          Task

HE161634      Nguyễn Trí Vương   - Assigning tasks to group members
                                 - Test a hypothesis and construct a confidence interval for the mean of a population (Excel)
HE160687      Đoàn Văn Thắng     - Test a hypothesis and construct a confidence interval for the proportion of a population (Excel)
HE161757      Quách Thành Nam    - Test a hypothesis and construct a confidence interval for the difference in means of two populations (Excel)
HE160753      Đỗ Nhật Linh
HE160753      Hà Thanh Tùng      - Test a hypothesis and construct a confidence interval for the difference in proportions of two populations (Excel)
HE161376      Nguyễn Đăng Đại    - Regression analysis (Excel)

Table of Contents

I. Abstract
II. Data
III. Test a hypothesis and construct a confidence interval for the mean of a population
IV. Test a hypothesis and construct a confidence interval for the proportion of a population
V. Test a hypothesis and construct a confidence interval for the difference in means of two populations
VI. Test a hypothesis and construct a confidence interval for the difference in proportions of two populations
VII. Regression Analysis
I. Abstract
Learning is the process of acquiring basic knowledge and skills and developing an
understanding of them. There are many ways to measure the quality of learning, but the
most commonly used measure is the average score. A number of factors may influence this
value, including family background, the physical environment, and the mental environment.
This topic interested our group, so we discussed and researched it to answer the following
questions:
1. How can students' performance on each test be improved?
2. What are the major factors influencing test scores?
3. How effective is the test preparation course?

II. Data
When searching the many data sources available on the internet, our team prioritized data
that are complete, accurate, and reliable. The dataset we selected meets these criteria and
is used for the analysis and evaluation in this report.

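The dataset contains the following variables:

● gender: sex of the student
● race: race of the student
● parental_edu_level: highest education level of the student's parents
● lunch: whether the student had lunch before the test (normal or abnormal)
● prepare_test: whether the student completed the test preparation course (complete or none)
● math_score: the student's score on the math test
● reading_score: the student's score on the reading test
● writing_score: the student's score on the writing test

The average score used throughout this report is the mean of math_score, reading_score,
and writing_score.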

III. Test a hypothesis and construct a confidence interval for the mean of a
population
1. Confidence Intervals
From the survey data, we compute confidence intervals for the students' mean average score
by gender, using the formula
$\bar{x} \pm E$, where $E = t_{\alpha/2,\,n-1} \times \dfrac{s}{\sqrt{n}}$
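The interval above needs the critical value $t_{\alpha/2,\,n-1}$. As a minimal sketch using only the Python standard library, the code below computes it by integrating the Student t density with Simpson's rule and inverting the CDF by bisection, then forms $\bar{x} \pm E$. The helper names `t_pdf`, `t_cdf`, `t_critical`, and `mean_ci` are our own illustrative choices, not part of any library; in Excel the same critical value comes from `T.INV.2T(0.05, n-1)`.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): integrate the density from 0 to |x| with Simpson's rule (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(alpha: float, df: int) -> float:
    """Two-sided critical value t_{alpha/2, df}, found by bisection on the CDF."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_ci(xbar: float, s: float, n: int, alpha: float = 0.05):
    """Return (t critical value, margin E, (lower, upper)) for xbar +/- t * s / sqrt(n)."""
    t = t_critical(alpha, n - 1)
    e = t * s / math.sqrt(n)
    return t, e, (xbar - e, xbar + e)


t, e, (low, high) = mean_ci(62.56, 14.64, 100)  # male summary statistics
print(f"t = {t:.3f}, E = {e:.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Applied to the male summary statistics, this gives t ≈ 1.984, E ≈ 2.905 and an interval of about (59.66, 65.46). A library routine such as SciPy's `scipy.stats.t.ppf(0.975, df)` should return the same critical value.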
a. Male

Mean: $\bar{x} = 62.56$

Standard deviation: $s = 14.64$

Number of observations: $n = 100$

Critical value: $t_{0.025,\,99} = 1.984$

$E = 1.984 \times \dfrac{14.64}{\sqrt{100}} = 2.905$

→ The 95% confidence interval for the mean average score of male students is (59.66, 65.46).

b. Female

Mean: $\bar{x} = 68.96$

Standard deviation: $s = 15.49$

Number of observations: $n = 99$

Critical value: $t_{0.025,\,98} = 1.984$

$E = 1.984 \times \dfrac{15.49}{\sqrt{99}} = 3.089$

→ The 95% confidence interval for the mean average score of female students is (65.87, 72.05).

2. Hypothesis Testing
a. Problem
We conjecture that “if a student's score is higher than the sample mean, then all such
students tend to get grade B”. In other words, the question is whether the population mean
of the average scores equals 65.7 (the sample mean of the average scores).
b. Solution
We test this hypothesis separately for male and female students.

Test at α = 0.05 whether the mean score equals 65.7:

Null hypothesis: H0 : µ = 65.7


Alternative hypothesis: H1 : µ ≠ 65.7

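Male

Fail to reject H0 if −1.984 < t < 1.984 (df = 99)

Test statistic:
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{62.56 - 65.7}{14.64/\sqrt{100}} = -2.145$
→ Reject H0

Female

Fail to reject H0 if −1.984 < t < 1.984 (df = 98)

Test statistic:
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{68.96 - 65.7}{15.49/\sqrt{99}} = 2.094$
→ Reject H0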
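The one-sample t test above can be sketched in a few lines of Python, using only the summary statistics reported in this section. The function name `one_sample_t` is illustrative, and the critical values are typed in from a t-table rather than computed.

```python
import math


def one_sample_t(xbar: float, s: float, n: int, mu0: float) -> float:
    """Test statistic t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / math.sqrt(n))


MU0 = 65.7  # hypothesized population mean of the average score

# (mean, standard deviation, n, critical value t_{0.025, n-1} from a t-table)
groups = {
    "male": (62.56, 14.64, 100, 1.984),   # df = 99
    "female": (68.96, 15.49, 99, 1.984),  # df = 98, t_{0.025,98} is about 1.9845
}

for name, (xbar, s, n, t_crit) in groups.items():
    t = one_sample_t(xbar, s, n, MU0)
    decision = "reject H0" if abs(t) >= t_crit else "fail to reject H0"
    print(f"{name}: t = {t:.3f}, critical value = ±{t_crit}, {decision}")
```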

3. Conclusion
At the 5% significance level, the population mean average score differs from 65.7 for both
genders: it is lower for male students and higher for female students. Based on these
results, we conclude that not all students whose score is above the average get a B grade.

IV. Test a hypothesis and construct a confidence interval for the proportion
of a population.
1. Hypothesis Testing and Confidence Interval:
a. Problem
“Do the data support the claim that 70% of the student population has an average score
under 80?”
b. Solution
In this problem, we test a hypothesis about the proportion of students with an average
score under 80 and construct a confidence interval for this population proportion.

Test at α = 0.05 whether the proportion equals 70%.

Null hypothesis: H0: p = 0.7


Alternative hypothesis: H1: p ≠ 0.7

Fail to reject H0 if -1.96 <= z <= 1.96
Test Statistic: z = 3.36
→ Reject H0

95% confidence interval for the population proportion p: 0.75 ≤ p ≤ 0.78
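The report does not list how many students have an average under 80, so the sketch below uses hypothetical counts (x = 150 out of n = 200) purely to show the arithmetic of the one-proportion z test and the Wald confidence interval. The function name is illustrative. Note that the test statistic uses the hypothesized p0 in its standard error, while the interval uses the sample proportion.

```python
import math


def one_proportion_test(x: int, n: int, p0: float, z_crit: float = 1.96):
    """One-proportion z test of H0: p = p0, plus a Wald confidence interval."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    e = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)  # standard error from the sample
    return p_hat, z, (p_hat - e, p_hat + e)


# Hypothetical counts for illustration only; they are NOT the report's data.
p_hat, z, (low, high) = one_proportion_test(x=150, n=200, p0=0.7)
decision = "reject H0" if abs(z) > 1.96 else "fail to reject H0"
print(f"p_hat = {p_hat:.3f}, z = {z:.3f} ({decision}), 95% CI = ({low:.3f}, {high:.3f})")
```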

2. Conclusion
According to the above results, we conclude at the 5% significance level that the proportion
of students with an average score under 80 is NOT 70% of the student population.

V. Test a hypothesis and construct a confidence interval for the difference
in means of two populations
Problem: Assess how the overall grade depends on the student's gender and test preparation.
The figures used are real data taken from 199 students of different genders and
preparation statuses.

1. The Gender
a. Confidence Intervals
To study the grade by gender: the 100 male students have an average grade of approximately
62.56, with a sample standard deviation of 14.64000331. The 99 female students have an
average grade of approximately 68.96, with a sample standard deviation of 15.48792785.

The point estimate of μ1 − μ2 is:


$\bar{x}_1 - \bar{x}_2 = 62.56 - 68.96 = -6.4$

For a 95% confidence interval, α = 1 − 0.95 = 0.05. With df = n1 + n2 − 2 = 197 degrees of
freedom, the critical value is $t_{\alpha/2} = t_{0.025,\,197} = 1.972$.

From the formula for the pooled sample variance we compute:

$S_p^2 = \dfrac{S_1^2(n_1 - 1) + S_2^2(n_2 - 1)}{n_1 + n_2 - 2} = 227.0379614$

Thus:

$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} = -6.4 \pm 4.213$

→ We are 95% confident that the difference in the population means lies in the
interval [−10.613, −2.187], in the sense that in repeated sampling 95% of all intervals
constructed from the sample data in this manner will contain μ1 − μ2. Since the whole
interval is negative, female students score higher on average.
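A minimal Python sketch of the pooled-variance interval, using the gender summary statistics above and the t-table critical value for df = 197 (about 1.972). Like the pooled formula itself, it assumes equal population variances; the function names are illustrative.

```python
import math


def pooled_variance(s1: float, n1: int, s2: float, n2: int) -> float:
    """Pooled sample variance Sp^2 = [(n1-1)s1^2 + (n2-1)s2^2] / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)


def pooled_ci(x1, s1, n1, x2, s2, n2, t_crit):
    """Confidence interval for mu1 - mu2, assuming equal population variances."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = x1 - x2
    e = t_crit * se
    return sp2, diff, e, (diff - e, diff + e)


# Gender: group 1 = male, group 2 = female; t_{0.025,197} taken from a t-table.
sp2, diff, e, (low, high) = pooled_ci(
    62.56, 14.64000331, 100,
    68.96, 15.48792785, 99,
    t_crit=1.972079034,
)
print(f"Sp^2 = {sp2:.4f}, diff = {diff:.2f}, E = {e:.3f}, CI = ({low:.3f}, {high:.3f})")
```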

b. Hypothesis Testing

Male                      Female
n1 = 100                  n2 = 99
x̄1 = 62.56                x̄2 = 68.96
S1 = 14.64000331          S2 = 15.48792785

Let μ1 and μ2 denote the population mean grades of male and female students. We want to
know whether male or female students have higher grades, so we test at the 5% level of
significance whether the data provide sufficient evidence that the two population means
differ.

The relevant test is:

H0: μ1 = μ2

vs.

H1: μ1 ≠ μ2, α = 0.05

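Since the samples are independent, the test statistic is:

$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{-6.4}{\sqrt{227.0379614\left(\dfrac{1}{100} + \dfrac{1}{99}\right)}} = -2.996$

which has a t-distribution with df = 100 + 99 − 2 = 197 degrees of freedom.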

Since H1 uses “≠”, this is a two-tailed test with two critical values ±t(α/2) = ±t(0.025).
From the row df = 197 we read off t(0.025) = 1.972079034, so the rejection region is
(−∞, −1.972079034] ∪ [1.972079034, ∞).

→ The test statistic t = −2.996 lies in the rejection region, so the decision is “reject H0”:
the mean grades of male and female students differ.
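The same pooled standard error gives the two-sample t statistic. In this sketch the function name `pooled_t` is illustrative, and `delta0` is the hypothesized difference μ1 − μ2, which is 0 under H0.

```python
import math


def pooled_t(x1, s1, n1, x2, s2, n2, delta0=0.0):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = delta0, and its degrees of freedom."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return ((x1 - x2) - delta0) / se, df


T_CRIT = 1.972079034  # t_{0.025,197}, as read off in the text

t_gender, df = pooled_t(62.56, 14.64000331, 100, 68.96, 15.48792785, 99)
print(f"gender: t = {t_gender:.3f}, df = {df}, reject H0: {abs(t_gender) >= T_CRIT}")

# Same function on the rounded preparation statistics (Complete vs None).
t_prep, _ = pooled_t(71.5, 13.74886186, 73, 62.4, 15.32123747, 126)
print(f"preparation: t = {t_prep:.3f}, reject H0: {abs(t_prep) >= T_CRIT}")
```

For the gender data this gives t ≈ −2.996 with df = 197. On the rounded preparation means it gives t ≈ 4.19, close to the 4.1809 reported for that comparison; the small gap is presumably due to rounding of the group means.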
c. Using Excel

2. Did Students Prepare For The Test?


a. Confidence Intervals
To study the overall grade by completion of the test preparation course: the 73 students
who completed the course have an average grade of approximately 71.5, with a sample
standard deviation of 13.74886186, and the 126 students who did not prepare for the test
have an average grade of 62.4, with a sample standard deviation of 15.32123747.
The point estimate of μ1 − μ2 is:
$\bar{x}_1 - \bar{x}_2 = 71.5 - 62.4 = 9.1$

For a 95% confidence interval, α = 1 − 0.95 = 0.05, so with df = 197 the critical value is
$t_{\alpha/2} = t_{0.025,\,197} \approx 1.97$.

From the formula for the pooled sample variance we compute:

$S_p^2 = \dfrac{S_1^2(n_1 - 1) + S_2^2(n_2 - 1)}{n_1 + n_2 - 2} = 218.034448$

Thus:


$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} = 9.1 \pm 4.278668296$

→ We are 95% confident that the difference in the population means lies in the
interval [4.821331704, 13.3786683], in the sense that in repeated sampling 95% of all
intervals constructed from the sample data in this manner will contain μ1 − μ2.
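Written out step by step with the summary statistics above (intermediate values rounded), the arithmetic behind this interval is:

```latex
\begin{aligned}
S_p^2 &= \frac{72 \times 13.74886186^2 + 125 \times 15.32123747^2}{197}
       = \frac{13610.25 + 29342.54}{197} \approx 218.034 \\
SE &= \sqrt{S_p^2\left(\tfrac{1}{73} + \tfrac{1}{126}\right)}
    = \sqrt{218.034 \times 0.021635} \approx 2.1719 \\
E &= t_{0.025,\,197} \times SE \approx 1.97 \times 2.1719 \approx 4.2787 \\
(\bar{x}_1 - \bar{x}_2) \pm E &= 9.1 \pm 4.2787 = [4.821,\ 13.379]
\end{aligned}
```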

b. Hypothesis Testing:

Complete                  None
n1 = 73                   n2 = 126
x̄1 = 71.5                 x̄2 = 62.4
S1 = 13.74886186          S2 = 15.32123747

Let μ1 and μ2 denote the population mean grades of students who completed the preparation
course (“Complete”) and of those who did not (“None”). We want to know which group has
higher grades, so we test at the 5% level of significance whether the data provide
sufficient evidence that the two population means differ.

The relevant test is:

H0: μ1 = μ2

vs.

H1: μ1 ≠ μ2, α = 0.05

Since the samples are independent, the test statistic is:

$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = 4.180854822$

which has a t-distribution with df = 73 + 126 − 2 = 197 degrees of freedom.

Since H1 uses “≠”, this is a two-tailed test with two critical values ±t(α/2) = ±t(0.025).
From the row df = 197 we read off t(0.025) = 1.972079034, so the rejection region is
(−∞, −1.972079034] ∪ [1.972079034, ∞).

→ The test statistic t = 4.1809 lies in the rejection region, so the decision is “reject H0”:
the mean grades of students who completed the preparation course and of those who did not
differ.
c. Using Excel

VI. Test a hypothesis and construct a confidence interval for the difference
in proportions of two populations.

1. Confidence Intervals
From the survey data, we compute confidence intervals for the proportion of students who
receive each grade (A, B, C, D, F), separately by gender, using the formula
$\hat{p} \pm E$, where $E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
a) Male

n = 100, α = 0.05, z(α/2) = 1.96

Grade   Count x   p̂ = x/n   E        95% CI
A       6         6%         0.0466   (0.0134, 0.1066)
B       27        27%        0.087    (0.183, 0.357)
C       33        33%        0.0921   (0.2379, 0.4221)
D       27        27%        0.087    (0.183, 0.357)
F       7         7%         0.05     (0.02, 0.12)
b) Female

n = 99, α = 0.05, z(α/2) = 1.96

Grade   Count x   p̂ = x/n   E        95% CI
A       13        13.1%      0.0665   (0.0648, 0.1978)
B       32        32.3%      0.092    (0.231, 0.415)
C       37        37.4%      0.095    (0.278, 0.469)
D       13        13.1%      0.0665   (0.0648, 0.1978)
F       4         4.0%       0.0388   (0.0016, 0.0792)
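A minimal sketch that recomputes the female per-grade intervals from the counts in the table with the same formula $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. The function name is illustrative. With only 4 students graded F, the normal approximation behind this interval is rough, so that row should be read with caution.

```python
import math


def proportion_ci(x: int, n: int, z: float = 1.96):
    """Wald interval p_hat +/- z * sqrt(p_hat (1 - p_hat) / n); returns (p_hat, E, (low, high))."""
    p = x / n
    e = z * math.sqrt(p * (1 - p) / n)
    return p, e, (p - e, p + e)


female_counts = {"A": 13, "B": 32, "C": 37, "D": 13, "F": 4}
n = sum(female_counts.values())  # 99 female students

for grade, x in female_counts.items():
    p, e, (low, high) = proportion_ci(x, n)
    print(f"{grade}: p = {p:.1%}, E = {e:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```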

VII. Regression Analysis


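Excel test of the relationship between reading_score (x) and the average score (y)

● $\bar{x} = 67.62814$ (mean reading_score)
● $\bar{y} = 65.74372$ (mean average score)
● $S_{xy} = \sum x_i y_i - n\bar{x}\bar{y} = (72 \cdot 72 + 90 \cdot 82 + \dots + 53 \cdot 51) - 199 \cdot 67.62814 \cdot 65.74372 = 46900.03$
● $S_{xx} = \sum x_i^2 - n\bar{x}^2 = (72^2 + 90^2 + \dots + 53^2) - 199 \cdot 67.62814^2 = 49332.27$
● $\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}} = \dfrac{46900.03}{49332.27} = 0.9507$; $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 65.74372 - 0.9507 \cdot 67.62814 = 1.449$
● Regression line: $\hat{y} = 1.449 + 0.9507x$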

The result is also visible in Excel's Regression Statistics table: Multiple R is
approximately 0.97 and R Square is approximately 0.95, which indicates a strong positive
linear relationship between reading_score and the average score. This is expected, because
the average score is computed by adding the three scores (math, reading, and writing) and
dividing by three, so reading_score contributes directly to it.
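The least-squares coefficients follow directly from the summary quantities listed above. The sketch below (illustrative function names) computes $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$, both from the reported summaries and from raw paired data; the raw-data example uses made-up numbers, since the full dataset is not reproduced in this report.

```python
def fit_from_summaries(x_bar: float, y_bar: float, sxy: float, sxx: float):
    """Least-squares intercept and slope: b1 = Sxy / Sxx, b0 = y_bar - b1 * x_bar."""
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def least_squares(xs, ys):
    """Compute Sxy and Sxx from raw paired data, then fit the line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    return fit_from_summaries(x_bar, y_bar, sxy, sxx)


# Summaries reported for reading_score (x) vs average score (y):
b0, b1 = fit_from_summaries(67.62814, 65.74372, 46900.03, 49332.27)
print(f"y_hat = {b0:.3f} + {b1:.4f} x")

# Made-up toy data (not from the dataset) lying exactly on y = 2x:
toy_b0, toy_b1 = least_squares([1, 2, 3, 4], [2, 4, 6, 8])
print(f"toy fit: y_hat = {toy_b0} + {toy_b1} x")
```

On Python 3.10 or later, `statistics.linear_regression(xs, ys)` returns the same slope and intercept for raw data.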
