
HANOI UNIVERSITY

FACULTY OF MANAGEMENT AND TOURISM

STATISTICS FOR ECONOMICS


(FALL 2019)

Case study analysis: Academic Performance of University Students
by TWO-WAY ANOVA Test

Instructor: Ms. Lai Hoai Phuong


Tutorial 5 - Group 6

Group members:
1. Nguyễn Thị Lan Anh 1604010005
2. Nguyễn Tuấn Phong 1704040093
3. Nguyễn Gia Phương Anh 1704040005
4. Lê Thị Bảo Ngọc 1704040084
5. Nguyễn Việt Hoa 1704040043
6. Nguyễn Thị Nhung 1704040093
7. Đặng Ngọc Quỳnh 1704040100
TABLE OF CONTENTS

I. Introduction
II. Answering questions
    • Question 1
    • Question 2
    • Question 3
    • Question 4
    • Question 5
    • Question 6
I. INTRODUCTION

Analysis of variance (ANOVA) is a statistical technique that assesses potential differences in a
scale-level dependent variable across the categories of a nominal-level variable with two or more
categories. The method divides the aggregate variability in a dataset into two parts: variability
due to systematic factors and variability due to random factors. Two types of ANOVA are commonly
used to determine whether differences exist among population means: one-way and two-way. A
one-way ANOVA has just one independent variable and estimates the effect of a single factor on a
response variable. A two-way ANOVA uses two independent variables. In this case study, we use
the two-way ANOVA method to examine the relationship, if any, between classroom seating
position and academic performance (GPA) for female and male students at a large university in
the United States. The aim of our project is to show how the main features of the two-way
ANOVA model apply to a real case study.

II. ANSWERING THE QUESTIONS

1. What inference technique should be considered for this study? Explain.

The objective of the survey is to test for a significant interaction between classroom seating
position and gender, and to test for significant differences in academic performance (GPA) due
to seat preference and gender. The suitable inference technique for this study is therefore the
two-way ANOVA model. Two-way ANOVA compares the mean differences among groups that have
been split by two independent factors, each with several levels. Here, respondents were asked to
specify one of three levels of seat preference: “front”, “middle” and “back”, so seating position
is the first factor, with 3 levels. The second factor is gender, with 2 levels: male and female.
Because it uses both factors, two-way ANOVA can also reveal the interaction between them. Each
combination of factor levels is called a cell, so the combinations of seat and gender give
3 × 2 = 6 cells.
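
The 3 × 2 layout of cells can be enumerated explicitly. The following is a minimal sketch in base R; the object names (seat_levels, gender_levels, cells) are illustrative choices and are not part of the original analysis.

```r
# Enumerate every combination (cell) of the two factors in the study.
seat_levels   <- c("Back", "Front", "Middle")
gender_levels <- c("Male", "Female")

# expand.grid() builds one row per combination of the supplied levels.
cells <- expand.grid(Seat = seat_levels, Gender = gender_levels)
print(cells)

# Number of cells: 3 seat levels x 2 gender levels.
n_cells <- nrow(cells)
print(n_cells)
```

Each row of cells corresponds to one of the groups whose mean GPA the two-way ANOVA compares.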

2. Produce descriptive statistics for the dataset. You are expected to generate as many
relevant descriptive statistics as possible using ALL the relevant tools introduced in the
labs of this course. Remember to provide appropriate interpretations for the descriptive
statistics. Try not to include unnecessary or irrelevant descriptive statistics.

2.1 Sample size

The sample consists of 300 respondents, which is large enough for the analysis. The observations
are assumed to be approximately normally distributed and independent, because the respondents
are taken to be randomly selected (these assumptions are checked in Question 3). There are three
variables: GPA, Gender (male, female) and Seat (front, middle and back).

2.2 Mean and Standard deviation

We compute the mean and standard deviation of GPA for each group defined by the two other
variables. To do so, we first convert the variables Gender and Seat into factors with the
factor() function, and then use the by() function to get the statistic for every combination of
the two factors at the same time.

❖ Convert the variables Gender and Seat into factors, and produce a crosstabulation table of
the Gender and Seat variables:
❖ StudentSurvey$Gender <- factor(StudentSurvey$Gender, levels=c("Male","Female"))
❖ StudentSurvey$Seat <- factor(StudentSurvey$Seat, levels=c("Back","Front","Middle"))
❖ table(StudentSurvey$Seat,StudentSurvey$Gender)

❖ Mean of GPA for each combination of Seat and Gender :


❖ Standard deviation of GPA for each combination of Seat and Gender:

From this output, the highest standard deviation belongs to the combination of back seat and
male gender, at 0.4958685, and the lowest, 0.3795011, belongs to the group of front seat and
female gender.
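
The per-cell computation can be sketched as follows. This is only an illustration under stated assumptions: StudentSurvey.csv is not reproduced here, so the data frame sim holds simulated values (not the survey data), and tapply() is used in place of by() only because it returns a compact Seat × Gender matrix.

```r
# Simulated stand-in for StudentSurvey.csv (hypothetical values, NOT the real data).
set.seed(1)
sim <- data.frame(
  Gender = factor(rep(c("Male", "Female"), each = 150),
                  levels = c("Male", "Female")),
  Seat   = factor(rep(rep(c("Back", "Front", "Middle"), each = 50), times = 2),
                  levels = c("Back", "Front", "Middle")),
  GPA    = round(rnorm(300, mean = 3.1, sd = 0.45), 2)
)

# tapply() returns a Seat x Gender matrix with one statistic per cell.
cell_mean <- tapply(sim$GPA, list(sim$Seat, sim$Gender), mean)
cell_sd   <- tapply(sim$GPA, list(sim$Seat, sim$Gender), sd)

print(round(cell_mean, 3))
print(round(cell_sd, 3))
```

With the real data, replacing sim by StudentSurvey would give the six cell means and standard deviations discussed in this section.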

2.3 Boxplot and mean plot

❖ Graphical description

boxplot(GPA ~ interaction(Seat,Gender), data = StudentSurvey, xlab = "Seat and Gender",
ylab = "expected GPA", col = c("red", "blue", "yellow","grey","pink","green"))
Judging from the boxplots above, the GPAs of female students appear more stable across seating
positions than those of male students. Among male students, the lowest GPA appears in the group
who preferred the back, while among female students it appears in the middle group. The black
line, which represents the median of each group, is highest in the group “Front.Female” and
lowest in “Back.Male”. Furthermore, there are seven outliers in total in the boxplots.

➢ install.packages("gplots")
➢ library(gplots)
➢ plotmeans(GPA ~ interaction(Seat,Gender), data = StudentSurvey, xlab = "Seat and
Gender", ylab = "expected GPA", main="Mean Plot with 95% CI")
The mean plot shows the mean GPA of each combination together with its 95% confidence interval.
The combination of front seat and female gender has the highest mean GPA, at more than 3.3,
followed by “Back.Female” at nearly 3.2; the lowest is “Back.Male”, at only 3.0.

3. Check all the assumptions of the inference technique you suggest in Question 1. Are the
assumptions satisfied? Explain.
There are 3 assumptions required to use two-way ANOVA:

• Samples are independent, simple random samples.
• All populations are normal distributions.
• All populations have the same standard deviation: $\sigma_1 = \sigma_2 = \dots = \sigma_k$

3.1 Samples are independent, simple random samples


By definition, an independent sample is one that has no connection to any other sample: the
occurrence of one observation does not influence the probability of another.

❖ table(StudentSurvey$Seat,StudentSurvey$Gender)

         Male Female
Back       50     50
Front      50     50
Middle     50     50

As can be seen, the total sample size of this survey is 300 observations, provided in the
accompanying file StudentSurvey.csv, and consists of six groups of 50: Back-Male, Back-Female,
Front-Male, Front-Female, Middle-Male, Middle-Female. Since there is no information on how
respondents were selected, we assume that they were chosen randomly. Each response came from a
different person, and one person's answer is not affected by another's. Therefore, we treat the
samples as independent and randomly selected.

3.2 All populations have the same standard deviation


To check whether all populations have the same standard deviation, we compute the ratio of the
largest sample standard deviation to the smallest one. If this ratio is smaller than 2, we can
treat the population standard deviations as equal.
From the output of the by() function used in Question 2, the largest SD is 0.4958685, while the
smallest SD is 0.3795011. The ratio of the two is about 1.3, which is smaller than 2. Therefore,
we can conclude that the assumption of equal standard deviations is satisfied.
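
The ratio can be reproduced directly from the two standard deviations reported in the text. This is a small arithmetic sketch; the variable names are illustrative.

```r
# Largest and smallest cell standard deviations reported in the text.
sd_max <- 0.4958685   # back seat, male
sd_min <- 0.3795011   # front seat, female

sd_ratio <- sd_max / sd_min
print(round(sd_ratio, 2))

# Rule of thumb used in this report: treat the SDs as equal if the ratio is below 2.
equal_sd_ok <- sd_ratio < 2
print(equal_sd_ok)
```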
Another technique for checking this assumption is the Levene test, which tests the homogeneity
of variance; its null hypothesis is that all the variances are equal. We compare the P-value of
the Levene test with our significance level (α = 0.05). The rejection rule is to reject Ho if
the P-value is smaller than α.
The Levene test is in the “car” package, so the “car” package must be installed first.
R code:
➢ install.packages("car")
➢ library(car)
➢ leveneTest(StudentSurvey$GPA, interaction(StudentSurvey$Seat,StudentSurvey$Gender), center=mean)
The outcome:
Levene's Test for Homogeneity of Variance (center = mean)
       Df F value Pr(>F)
group   5  1.1739  0.322
      294
The P-value of the test is 0.322 while our α is 0.05; therefore we do not reject the null
hypothesis and cannot conclude that the standard deviations are different.
However, since the ratio is smaller than 2, conducting the Levene test is not strictly necessary
in this case. If the ratio were larger than 3, we should choose another test instead of the
two-way ANOVA.
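
To show what the Levene test computes, the sketch below reproduces it in base R: with center = mean, the test is a one-way ANOVA on the absolute deviations of each observation from its own group mean. The data frame sim is simulated (an assumption, since the survey file is not reproduced here), so its F value and P-value will not match the output reported above.

```r
# Simulated stand-in for StudentSurvey.csv (hypothetical values, NOT the real data).
set.seed(2)
sim <- data.frame(
  Gender = factor(rep(c("Male", "Female"), each = 150)),
  Seat   = factor(rep(rep(c("Back", "Front", "Middle"), each = 50), times = 2)),
  GPA    = rnorm(300, mean = 3.1, sd = 0.45)
)

# One grouping factor with six levels, one per Seat x Gender cell.
sim$cell <- interaction(sim$Seat, sim$Gender)

# Absolute deviation of each GPA from its own cell mean (ave() defaults to the mean).
sim$abs_dev <- abs(sim$GPA - ave(sim$GPA, sim$cell))

# Levene's test (center = mean) is a one-way ANOVA on these deviations.
levene_tab <- anova(lm(abs_dev ~ cell, data = sim))
print(levene_tab)

levene_df <- levene_tab$Df              # degrees of freedom: groups, residuals
levene_p  <- levene_tab[["Pr(>F)"]][1]  # P-value for the group effect
```

The degrees of freedom (5 and 294) match those in the reported output because they depend only on the number of cells and observations.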
3.3 All populations are normal distributions.
We can check normality with a Q-Q plot of the residuals (the Q-Q plot was made in RStudio)
using this code and output:
R code:

➢ library(car)
➢ qqPlot(lm(GPA ~ Gender + Seat + Gender*Seat, data=StudentSurvey), simulate=T,
main="Q-Q Plot", labels=F)

The outcome:

The Q-Q plot shows that the points, including the outliers, lie within the confidence envelope,
which suggests that the residuals are approximately normally distributed.
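
The idea behind the Q-Q plot can also be expressed numerically. The sketch below uses base R (not the car package) and simulated data, both of which are assumptions made for illustration: it extracts the residuals of the two-way model, pairs them with theoretical normal quantiles, and reports how closely the pairs follow a straight line.

```r
# Simulated stand-in for StudentSurvey.csv (hypothetical values, NOT the real data).
set.seed(3)
sim <- data.frame(
  Gender = factor(rep(c("Male", "Female"), each = 150)),
  Seat   = factor(rep(rep(c("Back", "Front", "Middle"), each = 50), times = 2)),
  GPA    = rnorm(300, mean = 3.1, sd = 0.45)
)

# Gender * Seat expands to Gender + Seat + Gender:Seat, the same model as in the text.
fit <- lm(GPA ~ Gender * Seat, data = sim)
res <- residuals(fit)

# qqnorm() with plot.it = FALSE returns the Q-Q coordinates without drawing them:
# x holds the theoretical normal quantiles, y the sample residuals.
qq <- qqnorm(res, plot.it = FALSE)

# A correlation close to 1 means the points lie close to a straight line.
qq_cor <- cor(qq$x, qq$y)
print(round(qq_cor, 4))
```

This correlation is only a rough summary; the confidence envelope drawn by qqPlot() remains the check used in this report.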

4. Perform the inference technique you suggest in Question 1. Remember to provide all the
necessary steps. What are your interpretations and conclusions? Explain.
Two-way ANOVA test:
Step 1: Identify the null and alternative hypotheses:
Ho: There is no significant interaction between seat preference and gender in their effect on GPA.
Ha: There is a significant interaction between seat preference and gender in their effect on GPA.
Step 2: Test statistic and p-value:
We used RStudio for the calculation and obtained the following output:
> StudentSurvey.result<-aov(GPA ~ Gender*Seat, data = StudentSurvey)
> summary(StudentSurvey.result)
             Df Sum Sq Mean Sq F value Pr(>F)
Gender        1   1.40  1.4008   7.108 0.0081 **
Seat          2   0.93  0.4673   2.371 0.0951 .
Gender:Seat   2   1.35  0.6745   3.423 0.0339 *
Residuals   294  57.94  0.1971
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

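Step 3: Level of significance: α = 0.05

Step 4: Decision rule:
Reject Ho if p-value < α
From the R output, the interaction between seat preference and gender has P-value 0.0339 < α.

→ Reject Ho. The interaction effect between seat preference and gender is significant.

Step 5: Conclusion:
We have enough statistical evidence to conclude that there is a significant interaction between
seat preference and gender in their effect on GPA; that is, the difference in GPA across seating
positions depends on gender. The output also shows a significant main effect of gender (P-value
0.0081) but not of seat preference (P-value 0.0951) at α = 0.05; because the interaction is
significant, these main effects should be interpreted with caution.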
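
The F values and P-values in the ANOVA table can be checked by hand from the mean squares. The sketch below copies the rounded figures from the output above, so the results may differ from the table in the last digit; the variable names are illustrative.

```r
# Mean squares and degrees of freedom copied from the summary(aov) output.
mean_sq <- c(Gender = 1.4008, Seat = 0.4673, "Gender:Seat" = 0.6745)
df_num  <- c(Gender = 1,      Seat = 2,      "Gender:Seat" = 2)
ms_res  <- 0.1971   # residual mean square
df_res  <- 294      # residual degrees of freedom

# Each F statistic is the effect mean square divided by the residual mean square.
f_value <- mean_sq / ms_res

# The P-value is the upper-tail area of the F distribution.
p_value <- pf(f_value, df_num, df_res, lower.tail = FALSE)

print(round(f_value, 3))
print(round(p_value, 4))

# Decision at the 5% level: TRUE means "reject Ho" for that effect.
reject <- p_value < 0.05
print(reject)
```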

5. Draw an interaction plot and interpret the plot.

Since there is a significant interaction between gender and seat in GPA, we draw the
interaction plot with the following code:
R code:
> interaction.plot(StudentSurvey$Gender,StudentSurvey$Seat,
StudentSurvey$GPA,type="b",col=c("red", "blue"),pch=c(16,18),main="Interaction between
Gender and Seat")
Figure 7: Interaction Plot between Gender and Seat

As the interaction plot shows, the male and female student groups differ noticeably across the
front, middle and back seating positions. In detail, the female group sitting in the front has
the highest GPA, at over 3.3, while the male group sitting in the same position has 3.1. The
female group sitting in the middle has approximately 3.1, and the male group has a slightly
higher GPA. The female group sitting in the back has a GPA similar to that of the female middle
group, while the male group has the lowest GPA (less than 3.0). From this plot, we can conclude
that students who sit from the middle to the front tend to have a higher GPA, although the
female group sitting in the back also performs well.

The seat lines intersect in the interaction plot above. This indicates an interaction between
gender and seat position: female students sitting in the front and the back of the class perform
better than male students, while the opposite holds for the middle seat group.
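
The reading of the plot can be made concrete with a small numerical sketch. The exact cell means are not reported in the text, so the values below are rough assumptions taken from the descriptions (for example, the male middle group is set to 3.15 to stand for "slightly higher" than 3.1); they illustrate the logic only and are not the survey results.

```r
# Rough cell means taken from the descriptions in the text; the exact values
# are not reported, so these numbers are illustrative assumptions only.
approx_mean <- matrix(
  c(3.00, 3.20,    # Back:   Male, Female
    3.10, 3.30,    # Front:  Male, Female
    3.15, 3.10),   # Middle: Male, Female
  nrow = 3, byrow = TRUE,
  dimnames = list(Seat = c("Back", "Front", "Middle"),
                  Gender = c("Male", "Female"))
)

# Gender gap (Female minus Male) within each seating position.
gap <- approx_mean[, "Female"] - approx_mean[, "Male"]
print(round(gap, 2))

# Seat lines cross when the ranking of the seats differs between genders.
seat_order_male   <- order(approx_mean[, "Male"])
seat_order_female <- order(approx_mean[, "Female"])
seat_lines_cross  <- !identical(seat_order_male, seat_order_female)
print(seat_lines_cross)
```

Without interaction, the gender gap would be roughly the same in every seating position and the seat lines would be parallel.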

6. Discuss the credibility of the interpretations and conclusions of question 4. Is there
anything we should be concerned about? Explain.

a. Credibility of the interpretations


To compare population means when the population is categorized by two categorical factors, this
case study uses an appropriate tool: the two-way ANOVA test. A significance level of 0.05 is
used, which is a conventional choice that limits the probability of wrongly rejecting a true
null hypothesis to 5%. The p-value for the interaction (0.0339) is below this level, which
supports rejecting the null hypothesis. In addition, all the assumptions of the test are
satisfied, with evidence and explanation for each given in the third part of the report. In
particular, although the by() output already gives a ratio of largest to smallest standard
deviation of about 1.3 (< 2), we still applied the Levene test to confirm the result of this
assumption check. Finally, the interaction plot and its interpretation form an important part
of the case study.

b. Limitations of the case


First of all, one of the assumptions is that the sample is a simple random sample. However,
nothing in the case guarantees that the sample was chosen randomly from its population.
Moreover, the ANOVA test assumes that the data are normally distributed, and a violation of
this assumption can greatly affect the results. Since any violation in this case is at most
moderate, the assumption can still be regarded as satisfied even though there are some outliers
in the Q-Q plot.
Another limitation concerns the condition of equal variances: the greater the difference in
variances between groups, the greater the chance that the conclusion of the test is inaccurate.
Finally, when ANOVA is run to test the difference in GPA due to Gender and Seat position, the
result only tells whether there is a difference; it does not indicate which groups differ or by
how much.

III. CONCLUSION
The two-way ANOVA is appropriate for this case, and its assumptions are satisfied. It leads to
the conclusion that there is a significant interaction between classroom seating position and
gender in their effect on academic performance (GPA).
APPENDIX

1. Read R code with file “StudentSurvey.csv”

2. Mean and standard deviation


3. Check assumption
4. Interaction plot
STATISTICS FOR ECONOMICS - PEER EVALUATION FORM

Please fill out this form to perform evaluation of your group members. Discuss with all members
and agree on the final evaluations.

Please evaluate each member out of a scale of 100%. Allocation should be based upon group
opinions regarding how satisfactorily the member fulfilled his/her assigned tasks within the
group’s case study. For example, a 100% rating should be given to members who fulfilled
satisfactorily the tasks assigned by the group.

Group members should ask themselves the following questions before assigning the percentages
to others.

1. Did he/she do his/her fair share of the work on schedule and to the group’s satisfaction?
2. Did he/she cooperate with other group members?
3. Did he/she participate in, contribute to and share ideas in all relevant discussions?
4. Did he/she attend group meetings when required?
5. Did he/she relate and communicate to other group members?

Team members              Contribution (100%)    Signature (all members)

Nguyễn Thị Lan Anh        100%
Nguyễn Tuấn Phong         100%
Nguyễn Gia Phương Anh     100%
Lê Thị Bảo Ngọc           100%
Nguyễn Việt Hoa           100%
Nguyễn Thị Nhung          100%
Đặng Ngọc Quỳnh           100%

Guidelines for peer evaluation:

• Disregard your general impression and concentrate on group members’ performance in
the case study within this course only.
• Make a fair, objective and impartial evaluation of group members.
• Sign the evaluation form to indicate group consensus.
• Attach the evaluation form at the end of the report.

Note: Your final mark for the case study will be equal to Your group result * Your peer
rating.
