
PART 1

TASK 1:

-Do not pick just 3 columns to explain; explain every column

-But comment on the MODE only; no need to comment on mean vs. median

E.g.: Dataset 1: SEX: Mode = 0. There are more female students than male students.

(Do the same for all the other datasets. You can use Khương's group's file as a sample because
Task 1 uses the same data, but you MUST WRITE DIFFERENT SENTENCES and may reorder the datasets,
otherwise the two reports would look oddly identical :D)
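The mode for each column can be checked with a short script. This is a minimal sketch: the `sex` list is made-up data, and the coding 0 = female, 1 = male is assumed from the example above.

```python
from collections import Counter

def column_mode(values):
    """Return the most frequent value in a column (ties: first one encountered)."""
    counts = Counter(values)
    return counts.most_common(1)[0][0]

# Hypothetical SEX column; 0 = female, 1 = male (coding assumed from the example)
sex = [0, 1, 0, 0, 1, 0, 1, 0]
sex_mode = column_mode(sex)
print("SEX mode =", sex_mode)
```

A mode of 0 here would support the sentence "there are more female students than male students".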

TASK 2:

-Notation: GM is Grade for Math; GP is Grade for Portuguese

-Write out all the hypotheses exactly as on the handwritten sheet (insert the table)

E.g.: Portuguese grade: (the symbols are in the Equation tool)

Assume that the populations of the two samples are normally distributed.

Ho: µ1 - µ2 = 0

H1: µ1 - µ2 ≠ 0

The test statistic value: Using t-Test: Two-Sample Assuming Unequal Variances
(Put the table here)

According to the table, t-stat lies within ±t-critical (write out the numbers), so we cannot
reject Ho: µ1 - µ2 = 0.

Therefore, there is no evidence that the average Portuguese grades of students from the two
schools differ.

(Do the same for Math Grade).
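For reference, the statistic that the "t-Test: Two-Sample Assuming Unequal Variances" table reports can be reproduced by hand. The sketch below is an illustration only: the two samples are made up, and 2.306 is the standard two-tailed 5% critical value for 8 degrees of freedom, not a number from our table.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample1, sample2):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = variance(sample1), variance(sample2)  # sample variances (n - 1)
    se2 = v1 / n1 + v2 / n2                        # squared standard error
    t_stat = (mean(sample1) - mean(sample2)) / sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t_stat, df

def cannot_reject_h0(t_stat, t_critical):
    """Two-tailed rule: Ho stands when t_stat lies within +/- t_critical."""
    return abs(t_stat) <= t_critical

# Hypothetical GP samples for the two schools (not the real data)
school_a = [10, 12, 11, 13, 14]
school_b = [12, 14, 13, 15, 16]
t_stat, df = welch_t(school_a, school_b)
print(t_stat, df, cannot_reject_h0(t_stat, 2.306))
```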

TASK 3:

-Restate the problem

-Notation: GM is Grade for Math; GP is Grade for Portuguese

-Do it the same way as our previous version

E.g.: Math Grade

We run the regression on the variables from Medu to reason for both GM and GP, which means the
output is Math Grade and the inputs are Medu, Fedu, Mjob, Fjob, and reason.

(Insert the table)

Looking at the table, we obtain the following regression model:

Model Y = 38.99010149 + (27.02042351*Medu) + (6.493547175*Fedu) + (13.67341185*Mjob)
+ (2.226851272*Fjob) + (0.070793134*reason)

In order to test whether the variables of the regression model are significant, we conduct the
test of individual regression parameters:

Hypothesis:

Determine the null hypothesis and the alternative hypothesis for each coefficient

Ho: βi = 0 (i = 1,2,3,4,5)

H1: βi ≠ 0

According to the data, we compare the p-value:

P-value (Medu) = 0.003873515

P-value (Mjob) = 0.036803036

So, at the 0.05 level of significance, we can reject H0 for Medu and Mjob, as their p-values
are < 0.05. It means that, based on the hypothesis testing, we have enough evidence that
these two variables are significant.

Therefore, the regression model is

Y = 38.99010149 + (27.02042351*Medu) + (13.67341185*Mjob).
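To see what keeping only the significant variables does to a prediction, the two Math Grade models can be evaluated side by side. A rough sketch: the coefficients are copied from the model above, while the `student` values are hypothetical.

```python
# Coefficients copied from the Math Grade regression model above
INTERCEPT = 38.99010149
COEFFS = {
    "Medu": 27.02042351,
    "Fedu": 6.493547175,
    "Mjob": 13.67341185,
    "Fjob": 2.226851272,
    "reason": 0.070793134,
}
SIGNIFICANT = ("Medu", "Mjob")  # variables with p-value < 0.05

def predict(student, variables):
    """Evaluate Y = intercept + sum(coefficient * value) over the chosen variables."""
    return INTERCEPT + sum(COEFFS[name] * student[name] for name in variables)

# Hypothetical student (values are made up for illustration)
student = {"Medu": 1, "Fedu": 1, "Mjob": 0, "Fjob": 0, "reason": 0}
full = predict(student, COEFFS)
reduced = predict(student, SIGNIFICANT)
print(full, reduced)
```

Strictly speaking, re-running the regression with only Medu and Mjob would give slightly different coefficients; the sketch keeps the coefficients as written in these notes.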

(Do the same for Portuguese Grade. Since all the p-values for GP are > 0.05, in the part
"According to the data, we compare the p-value" we write out the p-values of Medu, Fjob,
Mjob, Fedu, and reason, and the conclusion is different:

So, at the 0.05 level of significance, we cannot reject H0 for any variable, as all p-values
are > 0.05. It means that, based on the hypothesis testing, we have no evidence that any of
the variables is significant.

Model Y = 49.92548461)

TASK 4:

-Same presentation as Task 3, but with different variables (Schoolup, famup, paid,
activities, nursery, higher, internet, romantic)
-Also write "We run the regression on the variables from Schoolup to romantic for both GM and
GP, which means the output is Math/Port Grade and the inputs are Schoolup, famup, paid,
activities, nursery, higher, internet, romantic:"

-Exactly like Task 3: for any variable whose p-value is < 0.05, write the p-value down and
then conclude as in the Math Grade example above. Model Y = 39.14105 + (-10.9199*schoolup) +
(13.85493*higher)

-Do the same for Portuguese Grade (Model Y = 53.18579 + (-7.39219*schoolup))

TASK 5: (Tân has already redone this one, just insert the table)

-Restate the problem

Assume that the population is normally distributed

Using t-Test: Paired Two Sample for Means

H0 : μ1-μ2 = 0

H1: μ1-μ2 ≠ 0

(Insert the table)

According to the table, we have:

|t-stat| = 6.435024955

t Critical two-tail= ± 1.966018615

p-value= 3.5967E-10 <0.05

Since t-stat lies outside ±t-Critical two-tail and p-value < 0.05, we can reject Ho.

Hence, the mean Math grade and the mean Portuguese grade differ significantly; on that basis,
there is no evidence that students who perform well in Math also perform well in Portuguese.
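The paired statistic behind "t-Test: Paired Two Sample for Means" is the mean of the per-student differences divided by its standard error. A small sketch under stated assumptions: the `gm` and `gp` lists are invented grades, and only the last line uses the numbers reported above.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1

# Hypothetical Math (GM) and Portuguese (GP) grades for the same five students
gm = [10, 12, 9, 14, 11]
gp = [12, 13, 12, 15, 14]
t_demo, df_demo = paired_t(gm, gp)
print(t_demo, df_demo)

# Decision with the values reported in the table above
reject_h0 = abs(6.435024955) > 1.966018615 and 3.5967e-10 < 0.05
print("Reject Ho:", reject_h0)
```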

PART 2:
TASK 1: (Tân and Thảo, please write this task as below, because explaining it in words is
hard to follow :D)

Based on the descriptive statistics, we conclude that students with a higher Activities level
tend to have a higher English level. To be specific: at Activities levels 0.0 and 0.3, the mean
English level is approximately 1 and 2, respectively. At Activities levels 1.0 and 2.0, the mean
English level is 2.410 and 3. In particular, the mode of English level increases with each
Activities level (1 at Activities level 0.0, 2 at Activities level 1.0, and 3 at Activities
level 2.0). It means that students with a higher Activities level tend to study at a higher
English level.

As can be seen from the data, students seem to receive nearly the same GPA whether they
participate in university activities or not. The average GPA of students at Activities levels
0.0, 1.0, and 2.0 is approximately 74 (we do not consider Activities level 0.3 because it
appears rarely). It means that spending more or less time on university activities does not
appear to affect students' GPA.

From the data shown, the number of students increases steadily with the level of university
activities. In detail, there are 558 students at Activities level 0.0; 970 students at
Activities level 0.3; and 1104 and 1316 students at Activities levels 1.0 and 2.0, respectively.
This suggests that, when attending university, students tend to participate in university
activities rather than only studying the whole time.

TASK 2: (for this task, just re-insert the table and it is done)

(Re-insert the table)

HYPOTHESIS:

Ho: μ1 = μ2 = μ3

H1: not all the μi (i =1,2,3) are equal.

Based on the ANOVA table, we obtain:

F-ratio = 117.0293592

F-critical = 3.002170142

At the 5% level of significance, we can reject the null hypothesis since F-ratio > F-critical.

Thus, based on the ANOVA table and the hypothesis testing, we have sufficient evidence that
not all three groups have the same average salary after graduation. Looking at the table,
group 3 has the highest salary after graduation ($1481.6905) and group 1 has the lowest
income of the three groups ($552.3509).
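The F-ratio in a single-factor ANOVA table is the between-group mean square divided by the within-group mean square. The following sketch shows that computation on invented salary samples; only the final comparison uses the F-ratio and F-critical reported above.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F ratio, df between, df within) for a list of samples."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, len(all_values) - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical salaries after graduation for the three groups (not the real data)
salaries = [[500, 550, 600], [900, 1000, 1100], [1400, 1500, 1600]]
f_ratio, df_b, df_w = one_way_anova_f(salaries)
print(f_ratio, df_b, df_w)

# Decision with the values reported in the ANOVA table above
reject_h0 = 117.0293592 > 3.002170142
print("Reject Ho:", reject_h0)
```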

TASK 3:

-Restate the problem

-Reproduce all 3 tables.

-We did this task correctly but wrote it too tersely. Rewrite it in full sentences, with a
clear subject and predicate, so it is easier to follow

Part a: add a sentence "According to the chart…" (the gist: the number of males at English
level (EL) 1.0 is the largest compared with males at the other ELs (2.0 and 3.0). Conversely,
the number of females at EL 2.0 and 3.0 is greater than the number of males at those two ELs.
The higher the EL, the better the English proficiency, so EL 2.0 and 3.0 are higher than
EL 1.0, i.e. EL 1.0 is the lowest. It follows that female students are better at English than
male students.)
Part b: Looking at the chart: (this part is fine as written :D)

Group 3 has the highest income (Family Income/Month is from $800 to $1,600). Besides, in
group 3, the number of students with English level 3 is greater than in the other groups.
Therefore, the higher the income, the better the students' English level.

TASK 4:

-We did this task correctly; copy it over and tidy up the formatting

-Remember to add "At the 0.05 level of significance" at the start of each conclusion sentence
(before the word "because")

E.g. part a:

At the significance level α = 0.05, since F > FC, we can reject the null hypothesis. It
means that, based on the ANOVA table and hypothesis testing, we have sufficient evidence that
students who participated in extra activities at college tend to have a higher Salary after
Graduation.

(same for b, c)

TASK 5 (Tân has already done this one too; just re-insert the table)

-Copy the table over

Part a:

From the table, we have the regression model as below:

Model Y = 377.6670 + 0.3887(Family Income/Month) + 94.9902(Gender)

In order to test whether the variables of the regression model are significant, we have to
conduct the test of individual regression parameters.

HYPOTHESIS:

Ho: βi = 0 (i = 1,2)

H1: βi ≠ 0

Based on the table of coefficients, we can compare the p-values:

P-value(Family Income/Month) = 0

P-value(Gender) = 0.000460374

At the 0.05 level of significance, we reject H0 as P-value(Family Income/Month) < 0.05 and
P-value(Gender) < 0.05. Therefore, we have enough evidence that the variables Family
Income/Month and Gender are significant.

The regression model:

Model Y = 377.6670877 + 0.388795125(Family Income/Month) + 94.99024052(Gender)

Part b:

Since the variables Family Income/Month and Gender above are significant, we have statistical
evidence that these variables have a linear relationship with Y and explanatory power with
respect to the dependent variable. Thus, it is reasonable to say that there is inequality in
Salary after Graduation based on Gender.
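The size of the gender gap implied by this model can be read off directly: at any fixed family income, the predicted salaries for the two genders differ by the Gender coefficient. A quick sketch, assuming Gender is coded 0/1 (the notes do not say which gender is 1) and using a made-up income value; decimal commas are written as points.

```python
# Coefficients copied from the Task 5 regression model above
INTERCEPT = 377.6670877
B_INCOME = 0.388795125
B_GENDER = 94.99024052

def predicted_salary(family_income, gender):
    """Predicted Salary after Graduation; gender is assumed to be coded 0 or 1."""
    return INTERCEPT + B_INCOME * family_income + B_GENDER * gender

income = 1000.0  # hypothetical Family Income/Month
gap = predicted_salary(income, 1) - predicted_salary(income, 0)
print("Predicted gap between genders:", gap)
```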

TASK 6: (Tân has already done this one too; just re-insert the table)

-Reproduce the table

a) From the table, we can build the regression model as follows:

Model Y = -1173.432241 + 0.358873069(Family Income/Month) + (-40.12626986)(Gender) +
228.8605952(English level) + 382.1151522(Activities Level) + 11.54236023(GPA) +
147.1168692(University Ranking)

In order to test whether the variables of the regression model are significant, we have to
conduct the test of individual regression parameters.

HYPOTHESIS:

Ho: βi = 0 (i = 1,2,3,4,5,6)

H1: βi ≠ 0

Based on the table of coefficients, we can compare the p-values:

P-value(Family Income/Month) = 0

P-value(Gender) = 0.04284

P-value(English level) = 8.30733E-52

P-value(Activities level) = 2.6717E-107

P-value(GPA) = 4.27055E-58

P-value(University Ranking) = 4.21945E-30

At the 0.05 level of significance, we reject H0 as all of the above p-values are < 0.05. Based
on the hypothesis testing, we have enough evidence that all the variables are significant.

The regression model:

Model Y = -1173.432241 + 0.358873069(Family Income/Month) + (-40.12626986)(Gender) +
228.8605952(English level) + 382.1151522(Activities Level) + 11.54236023(GPA) +
147.1168692(University Ranking)
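The p-value comparison for the six variables can be double-checked mechanically. A short sketch using the p-values listed above, with decimal commas written as points; the dictionary layout is just one convenient way to hold them.

```python
ALPHA = 0.05

# P-values copied from the table of coefficients above
p_values = {
    "Family Income/Month": 0.0,
    "Gender": 0.04284,
    "English level": 8.30733e-52,
    "Activities level": 2.6717e-107,
    "GPA": 4.27055e-58,
    "University Ranking": 4.21945e-30,
}

# Keep the variables whose p-value is below the significance level
significant = [name for name, p in p_values.items() if p < ALPHA]
print(significant)
```

Gender is the only variable that is close to the 0.05 threshold; the others are far below it.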

b) Compared with the regression model obtained in Task 5 (which has only 2
explanatory variables for the dependent variable Y), the dependent variable Y (Salary
after Graduation) now has a linear relationship with 6 variables (4 new variables:
English level, Activities level, GPA, University Ranking; 2 variables in common:
Family Income/Month, Gender).
