Professional Documents
Culture Documents
Introduction
Analysis need to be made know which factor may affect the student's performance, we classify
the
score into couple of ranks, and figure out which feature affects the score more significant.
localhost:8888/nbconvert/html/Downloads/students-performance-in-exams.ipynb?download=false 1/12
12/23/21, 9:05 PM students-performance-in-exams
Out[2]: test
parental level of math reading writing
gender race/ethnicity lunch preparation
education score score score
course
associate's
3 male group A free/reduced none 47 57 44
degree
Insights:
gender object
Out[4]:
ethnicity object
parent_education object
lunch object
pre object
math int64
reading int64
writing int64
dtype: object
localhost:8888/nbconvert/html/Downloads/students-performance-in-exams.ipynb?download=false 2/12
12/23/21, 9:05 PM students-performance-in-exams
plt.show()
Insights:
We can see that male has better performance on math field, but worse on reading and writing.
In [6]:
fig, ax = plt.subplots()
fig.subplots_adjust(hspace=0.8, wspace=0.8, left = 0.2, right = 1.5)
for idx in range(3):
plt.subplot(1,3, idx+1)
ethn_df = score_df.groupby("ethnicity")[list(score_df.columns[-3:])[idx]].mean()
sns.barplot(x=ethn_df.index, y = ethn_df.values, palette = "Greens")
plt.xlabel("Group")
plt.ylabel("mean score")
plt.xticks(rotation=90)
plt.title(list(score_df.columns[-3:])[idx])
plt.show()
localhost:8888/nbconvert/html/Downloads/students-performance-in-exams.ipynb?download=false 3/12
12/23/21, 9:05 PM students-performance-in-exams
Insights:
Obviously, group E has best performance for all the fields, and group A is the worst.
localhost:8888/nbconvert/html/Downloads/students-performance-in-exams.ipynb?download=false 4/12
12/23/21, 9:05 PM students-performance-in-exams
Insights:
The score distribution got narrower if students complete the preparation before test,
and also we can see that the average of the score is better.
In [8]:
for item in score_df.columns[-3:]:
sns.boxplot(x=score_df["lunch"], y=score_df[item])
plt.title(item+" vs lunch", loc="left")
plt.show()
localhost:8888/nbconvert/html/Downloads/students-performance-in-exams.ipynb?download=false 5/12
12/23/21, 9:05 PM students-performance-in-exams
Insights:
Students are easier to get better score once they eat standardly.
localhost:8888/nbconvert/html/Downloads/students-performance-in-exams.ipynb?download=false 6/12
12/23/21, 9:05 PM students-performance-in-exams
train_df = score_df.copy()
train_df["parent_education"] = labelencoder.fit_transform(train_df["parent_education
train_df["pre"] = labelencoder.fit_transform(train_df["pre"])
train_df["lunch"] = labelencoder.fit_transform(train_df["lunch"])
train_df.head()
0 female group B 1 1 1 72 72 74
1 female group C 4 1 0 69 90 88
2 female group B 3 1 1 90 95 93
3 male group A 0 0 1 47 57 44
4 male group C 4 1 1 76 78 75
Insights:
Insights:
localhost:8888/nbconvert/html/Downloads/students-performance-in-exams.ipynb?download=false 7/12
12/23/21, 9:05 PM students-performance-in-exams
score_df["classification"] = kmeans_label
score_df.head(10)
Out[11]: gender ethnicity parent_education lunch pre math reading writing classificatio
classification
ax.set_xlabel('Classiffication')
localhost:8888/nbconvert/html/Downloads/students-performance-in-exams.ipynb?download=false 8/12
12/23/21, 9:05 PM students-performance-in-exams
ax.set_ylabel('Scores')
ax.set_xticks(ind)
ax.legend()
plt.show()
In [17]:
class_df["total_ave_score"] = (class_df.math + class_df.reading + class_df.writing)/
rank = class_df["total_ave_score"].sort_values(ascending = False)
rank.index
In [18]:
rank
classification
Out[18]:
3 91.787879
7 81.981735
1 75.039735
4 70.258454
0 64.402151
6 59.264103
5 50.143791
2 34.324786
Name: total_ave_score, dtype: float64
For top5 rank, the average score all passed, Rank0 is the best cluster, Rank1 is second one and
so on.
From now on, we can find out the correlation between the performance of students and
features. Let's plot pie chart to see whether parents education level can affect the performance
or not.
In [ ]:
def plot_pie_chart(column):
fig, ax = plt.subplots(figsize=(20,16))
color = ["orange","lightblue","green","yellow","red","pink","brown","gray"]
for idx in range(8):
plt.subplot(3, 3, idx+1)
num = "class"+ str(idx)
num = score_df[score_df["classification"]==rank.index[idx]]
percentage_of_parent_edu = num[column].value_counts()
percentage_of_parent_edu.sort_index()
label = percentage_of_parent_edu.index
value = percentage_of_parent_edu.values
plt.pie(value, labels = label, autopct = "%1.1f%%",
localhost:8888/nbconvert/html/Downloads/students-performance-in-exams.ipynb?download=false 9/12
12/23/21, 9:05 PM students-performance-in-exams
Insights:
Let's define the high degree of education. Parents having bachelor or master degree are
high-level educated. So we focus on these two terms.
As pie chart were shown above, we can easily understand the ratio of high-degree
education. For the rank0, its ratio is around 32%. In addition, there are no differences
between rank1 to rank3, and the ratio are around 15~17%. Finally, the ratio is only 8% in
rank7.
We calculated the average score of each rank before, so we can say that parent's education
affect the score but not obviously, because there are still 70%~80% parents without high
education degree.
percentage_of_column = score_df[score_df["classification"]==rank.index[4]][colum
for i in range(len(percentage_of_column.index)):
rects = ax.bar(ind - width/(i+1),
index_dict[percentage_of_column.index[i]],
width, label=percentage_of_column.index[i])
ax.set_xlabel('Rank')
ax.set_ylabel('# of students')
ax.set_title("Percentage of " + column)
ax.set_xticks(ind)
ax.legend()
plt.show()
plot_bar_chart("pre")
localhost:8888/nbconvert/html/Downloads/students-performance-in-exams.ipynb?download=false 10/12
12/23/21, 9:05 PM students-performance-in-exams
Insights:
Over 50% of students in rank0 completed the test preparation course, and normally
It is say that preparation course can help students get better score.
In [22]:
plot_bar_chart("lunch")
Insights:
Also the same trend as "pre". Students who had lunch before test got better score.
localhost:8888/nbconvert/html/Downloads/students-performance-in-exams.ipynb?download=false 11/12
12/23/21, 9:05 PM students-performance-in-exams
In [23]:
plot_bar_chart("gender")
Insights:
Conclusion
There are few conclusions below:
1. Parents' education level may affect the performance of students, but not the important one.
2. Finishing preparation course is benefitial.
3. Having lunch is important to students, and it is also the most significant one.
4. Gender has no correlation with the score.
In summary, if students want to have good performance, they should have enough
nutrient and make effort to prepare the test.
In [ ]:
localhost:8888/nbconvert/html/Downloads/students-performance-in-exams.ipynb?download=false 12/12