
Using Logistic Regression to predict

Secondary School Student Performance


WRITTEN BY: SHAHNOOR, HASAN SARWAR, HALA YASMIN, UMAR HAYYAT, ZAIN-UL-ABIDIN, MUHAMMAD REHMAN SHAHID

PRESENTED BY:
SHAHNOOR ALI
DEPARTMENT OF COMPUTER SCIENCE
NATIONAL TEXTILE UNIVERSITY, FAISALABAD
Overview
Introduction
Literature Survey
Material and Methods
Result
Conclusion

Introduction
Failure in Mathematics
Factors involved in student failure
To keep pace with the progress of regions such as Europe, our education
system must encourage new techniques, proficiency in a particular field,
and craftsmanship, and must discourage cramming.
Smoking, drinking, and drug abuse can be among the factors behind student
failure, with the consequence that a student loses self-confidence,
becomes discouraged, and puts less effort into their work.

Other factors are also involved, such as truancy, dropping out, repeating a grade, or moving to
a lower level of education.
It has been observed that a significant number of students (about 20%) are academically weak
and fail to achieve good marks.
Moreover, to keep pace with the progress of these countries, new techniques, proficiency in a
particular field, and craftsmanship must be encouraged, and cramming must be discouraged in our
education system.

Literature Survey
Over the last few years, several significant studies have developed models for assessing
students' performance by considering factors such as family income, parental guidance, the
teacher-student relationship, school distance, and the sex of the students.

Author | Context | Variables | Sample Size | Statistical Analysis | Analysis Tools
P. Cortez (2008) | Failure of students | 32 | 650 | Classification, Regression | RMiner
Madeeha (2009) | Child's failure in school and grade retention | 64 | 699 | Simple random sampling method | Excel
C. Gbollie (2017) | Causes and reasons for the failure of students | 13 | 323 | Correlation, Mean, Standard Deviation | Statistical Package for the Social Sciences (SPSS 17.0)
Irfan Mushtaq (2012) | Factors contributing to failures of students | 5 | 155 | Mean, Standard Deviation, Correlation | Appropriate statistical package
L. Kalagbor (2012) | Factors positively influencing students' academic performance | 10 | 650 | Frequency, Percentage, Mean | Excel

Problem Statement
Academic failure is a serious problem in many countries, and
several methods have been proposed to measure it.
Logistic Regression is well suited to identifying
student failure because its final classification is binary, i.e.
0 or 1, rather than a value such as 0.1 or 1.2.
The results are therefore clear and easy to interpret:
whether a student failed or not, and which factors
influence failure the most.
Moreover, it becomes easier to address the factors that influence failure the most.

Material and Methods
The dataset collected by Paulo Cortez is used. Thirteen variables were tested: age, quality of
family relationships, going out with friends, current health status, number of school absences,
gender of the student, weekly study time, internet access at home, extra-curricular activities,
number of past class failures, cohabitation status of parents, desire to pursue higher education,
and being in a romantic relationship.
Logistic Regression is used in research that analyzes the relationship between a dependent
variable (an outcome) and one or more independent variables (predictors) when the dependent
variable is dichotomous, i.e. has only two classes, for instance whether a student has failed
(yes or no).

Attribute | Description | For Logistic Regression
Age | Age of student (numeric: from 15 to 22) | -
Famrel | Quality of family relationships (numeric: from 1 – very bad to 5 – excellent) | -
Gout | Going out with friends (numeric: from 1 – very low to 5 – very high) | -
Health | Current health status (numeric: from 1 – very bad to 5 – very good) | -
Absences | Number of school absences (numeric: from 0 to 93) | -
Sex | Gender of student (binary: female or male) | 1 = female; 0 = male
StudyTime | Weekly study time (numeric: 1 – < 2 hours, 2 – 2 to 5 hours, 3 – 5 to 10 hours, or 4 – > 10 hours) | 1 = 2 to 5 hours; 0 = 5 to 10 hours
Internet | Internet access at home (binary: yes or no) | 1 = no; 0 = yes
Activities | Extra-curricular activities (binary: yes or no) | 1 = no extra-curricular activities; 0 = yes to extra-curricular activities
Failures | Number of past class failures (numeric: n if 1 ≤ n < 3, else 4) | 1 = no failure; 0 = one or more than one failure
PStatus | Cohabitation status of parents (binary: apart or living together) | 1 = parents are apart; 0 = parents are living together
Higher | Wants to take higher education (binary: yes or no) | 1 = yes to higher education; 0 = no to higher education
Romantic | In a romantic relationship (binary: yes or no) | 1 = in a romantic relationship; 0 = not in a romantic relationship

The failure status of a student was categorized as never failing in any subject (1) and failing in
at least one subject (0).

To apply Logistic Regression, data wrangling was performed on the dataset. Some of the
nominal variables were converted into numeric form, and some into dichotomous form, i.e. 1 or
0.
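A minimal sketch of such a conversion, following the 0/1 codes listed in the attribute table. The raw column names and value spellings ("F"/"M", "A"/"T", "yes"/"no") are assumptions about how the source file stores each column, and the sample record is hypothetical.

```python
# Mapping of raw nominal values to 0/1, following the attribute table.
# Column names and raw spellings are assumptions about the source file.
RECODE = {
    "sex": {"F": 1, "M": 0},
    "internet": {"no": 1, "yes": 0},
    "activities": {"no": 1, "yes": 0},
    "Pstatus": {"A": 1, "T": 0},
    "higher": {"yes": 1, "no": 0},
    "romantic": {"yes": 1, "no": 0},
}


def recode_row(row: dict) -> dict:
    """Return a copy of row with the nominal columns replaced by 0/1 codes."""
    out = dict(row)
    for column, mapping in RECODE.items():
        if column in out:
            out[column] = mapping[out[column]]
    return out


# Hypothetical student record, for illustration only.
student = {"age": 17, "sex": "F", "internet": "yes", "activities": "no",
           "Pstatus": "T", "higher": "yes", "romantic": "no"}
encoded = recode_row(student)
print(encoded)
```

Numeric columns such as age pass through unchanged, and the original record is not modified.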

Logistic Regression is derived from the straight-line equation (1) by restricting its output to
the range 0 to 1, which gives equation (2). In this way, the predictions of Logistic Regression
take the form of probabilities of an event occurring, i.e. the likelihood that y = 1
given specific values of the input variables x.
\[
Y = C + B_1 X_1 + B_2 X_2 + \cdots, \qquad -\infty < Y < \infty \tag{1}
\]
By reducing the range:
\[
Y = \frac{1}{1 + e^{-x}} = \frac{e^{x}}{1 + e^{x}}, \qquad x = C + B_1 X_1 + B_2 X_2 + \cdots, \qquad 0 < Y < 1 \tag{2}
\]
\[
\log\frac{Y}{1 - Y} = C + B_1 X_1 + B_2 X_2 + \cdots
\]
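The relationship between the linear equation, the logistic function, and the log-odds can be checked numerically. The sketch below uses only the Python standard library; the intercept, coefficients, inputs, and the 0.5 cut-off are hypothetical values chosen for illustration, not estimates from the study.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real x into the range 0 to 1 using the logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form e^x / (1 + e^x); avoids overflow for large negative x.
    ex = math.exp(x)
    return ex / (1.0 + ex)


def logit(y: float) -> float:
    """Inverse of sigmoid: the log-odds log(Y / (1 - Y))."""
    return math.log(y / (1.0 - y))


def predict_probability(c: float, b: list, x: list) -> float:
    """Probability that y = 1 given inputs x, intercept c, coefficients b."""
    linear = c + sum(bi * xi for bi, xi in zip(b, x))  # straight-line part
    return sigmoid(linear)


# Hypothetical intercept, coefficients, and inputs (not fitted values).
C = -1.0
B = [0.8, -0.5]
X = [2.0, 1.0]

p = predict_probability(C, B, X)
label = 1 if p >= 0.5 else 0  # 0.5 cut-off is an assumed convention
print(round(p, 4), label)
```

Applying `logit` to the predicted probability recovers the linear combination, which is what the last equation states.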

Result
The accuracy of the predictions was estimated through a classification report comprising
precision, recall, f1-score, and support.
The confusion matrix shows the quality of the predictions.

Confusion Matrix

     PN   PY
AN    5   16
AY    1   97

[Figure: Classification Report]
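The quantities in a classification report can be recomputed from the confusion matrix counts. The sketch below assumes that rows are actual classes (AN, AY) and columns are predicted classes (PN, PY); the helper name `prf` is hypothetical.

```python
# Confusion matrix counts from the slide.
# Assumption: rows are actual classes, columns are predicted classes.
AN_PN, AN_PY = 5, 16   # actual N predicted as N / predicted as Y
AY_PN, AY_PY = 1, 97   # actual Y predicted as N / predicted as Y


def prf(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, f1) for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = AN_PN + AN_PY + AY_PN + AY_PY
accuracy = (AN_PN + AY_PY) / total

# Class Y: true positives AY_PY, false positives AN_PY, false negatives AY_PN.
metrics_y = prf(tp=AY_PY, fp=AN_PY, fn=AY_PN)
# Class N: true positives AN_PN, false positives AY_PN, false negatives AN_PY.
metrics_n = prf(tp=AN_PN, fp=AY_PN, fn=AN_PY)

print(f"accuracy = {accuracy:.2f}")
print("class Y (precision, recall, f1):", [round(m, 2) for m in metrics_y],
      "support:", AY_PN + AY_PY)
print("class N (precision, recall, f1):", [round(m, 2) for m in metrics_n],
      "support:", AN_PN + AN_PY)
```

With these counts the accuracy is 102/119, about 86%.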

Conclusion
In this paper, we addressed the prediction of the grades of secondary school students in a
core class, Mathematics, using previous school grades together with demographic, social, and
other school-related data.
Past academic performance, extra-curricular activities, and going out with friends were the
social factors found to contribute to academic failure.
The applied algorithm predicted the outcomes with an accuracy of 86%.

References
https://www.academia.edu/15907212/AN_ANALYSIS_OF_THE_FACTORS_INFLUENCING_STUDENT_S_ACADEMIC_PERFORMANCE_IN_HO_POLYTECHNIC
http://www3.dsi.uminho.pt/pcortez/student.pdf
https://www.hindawi.com/journals/edri/2017/1789084/
https://bmcpediatr.biomedcentral.com/articles/10.1186/1471-2431-11-114
https://www.researchgate.net/publication/320317776_IoT-based_students_interaction_framework_using_attention-scoring_assessment_in_eLearning
https://www.researchgate.net/publication/269317478_Tap_into_visual_analysis_of_customization_of_grouping_of_activities_in_eLearning
http://ijpe.penpublishing.net/makale/471

https://www.researchgate.net/publication/312200014_Real-time_imaging-based_assessment_model_for_improving_teaching_performance_and_student_experience_in_e-learning
https://www.researchgate.net/publication/329785670_Students'_Attention_Assessment_in_eLearning_based_on_Machine_Learning
http://www.tojet.net/volumes/v8i2.pdf
https://ieeexplore.ieee.org/abstract/document/7474213

Email at: shahnoorali9716@gmail.com

