Akhilesh Patil
M.S. Ramaiah Institute of Technology
Abstract— The main objective of this paper is to provide an overview of the deep learning techniques that can be used to predict a student's performance, as compared with other traditionally used machine learning techniques. In our research, we developed feed-forward neural networks and recurrent neural networks to build a model that effectively predicts student GPA. The recurrent neural networks gave greater accuracy than the feed-forward neural networks, as they have memory and take into consideration the consistency of a student's performance. The main contribution of the paper is a comparison of various recurrent neural architectures: a long short-term memory network with a single hidden layer, a long short-term memory network with multiple hidden layers, and a bi-directional long short-term memory network with multiple hidden layers. We compared these techniques using root mean square error as the parameter of comparison and found the bi-directional long short-term memory network to have the least error, at 8.2%. A comparison of the results of the proposed technique against other deep learning models and machine learning techniques is provided in Section VIII, and a visualisation of the results is provided in Section IX. The novelty of the proposed method is that it has memory to differentiate tuples with different orders of scores, and it learns to assign the weights of the relationships between nodes by scanning the sequence in both directions, unlike the decision tree, SVM and feed-forward neural network based algorithms that have previously been used to solve this problem of predicting student scores.

Keywords— Artificial Neural Networks, Recurrent Neural Networks, Long Short-Term Memory Networks, Bidirectional Long Short-Term Memory Networks, Root Mean Square Error.

I. INTRODUCTION

Predicting the outcome of any task undertaken by an individual is an interesting topic that has caught the attention of many people today. The technique of analysing past events to predict future outcomes is known as Predictive Analysis. Predictive Analysis incorporates techniques such as data mining, machine learning, statistical modelling and artificial intelligence to analyse current data and make predictions about the future. Using these techniques, we can predict a student's grade quite accurately. Most colleges and schools in India today follow the CIE (Continuous Internal Evaluation) system to assess and grade their students. In this type of grading system, students receive a final grade based on how they perform in their respective internal examinations as well as in their final semester-end examination. After clearing all the internal examinations, students can predict the outcome of their final grade.

The huge amount of data available to us may contain certain trends and patterns. These trends play a key role in identifying future prospects. The task at hand is to identify the relationships between these prospects and assign a score to each of them. This process of predicting the score can be done accurately by computers using modern machine learning techniques. There are various ways to approach the problem of predicting the final grade of students; using deep learning techniques, one can come up with a model to solve it. This paper gives a brief description of the use of LSTM (Long Short-Term Memory) Networks, a type of Recurrent Neural Network, to build such a model.

II. OBTAINING THE DATASET AND DATA PREPROCESSING

The dataset on which we train our machine learning model is essentially a register of marks containing the marks of 2000 students in five subjects and the expected grade point averages. The marks are randomly generated according to a certain logic, and the grades are assigned according to a certain grading methodology. The marks are stored in a CSV file, which can be read using the pandas library in Python. The pandas library is used to make a data frame out of the CSV file, making it easier to manipulate. For pre-processing of the data, the MinMaxScaler method of the preprocessing module of the sklearn library was suitable. The training set and the test set were split on an 80:20 basis. The input variables are the marks in the various subjects, and the output target variables are the semester grade point averages. A snapshot of the simulated dataset is depicted in the table below:

TABLE I. SNIPPET OF DATASET

Sub1  Sub2  Sub3  Sub4  Sub5  CGPA  SGPA
28    26    27    32    33    6.4   6.43
33    36    39    42    38    7.4   7.06
38    38    43    37    44    7.8   7.17
34    33    38    40    44    7.4   7.14
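As a sketch of this pre-processing pipeline, the following uses pandas and sklearn as described above; the four rows of Table I stand in for the full 2000-student register, which the paper reads from a CSV file:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Rows from Table I stand in for the full register of marks,
# which in the paper is loaded with pd.read_csv from a CSV file.
df = pd.DataFrame(
    [[28, 26, 27, 32, 33, 6.4, 6.43],
     [33, 36, 39, 42, 38, 7.4, 7.06],
     [38, 38, 43, 37, 44, 7.8, 7.17],
     [34, 33, 38, 40, 44, 7.4, 7.14]],
    columns=["Sub1", "Sub2", "Sub3", "Sub4", "Sub5", "CGPA", "SGPA"],
)

# Inputs are the subject marks; the target is the semester GPA.
X = df[["Sub1", "Sub2", "Sub3", "Sub4", "Sub5"]].values
y = df["SGPA"].values

# 80:20 train/test split, as in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Scale marks to [0, 1]; fit on the training data only to avoid leakage.
scaler = MinMaxScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
```

Fitting the scaler on the training set alone, then reusing it for the test set, keeps the evaluation honest: the test marks are scaled with statistics the model could actually have seen at training time.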
B. How can Recurrent Neural Networks be used to detect abnormalities?

As we know, a recurrent neural network has a looping mechanism between its hidden layers which keeps track of previous results before coming up with a final predicted score. This way we can keep track of the consistency maintained by a student in a particular semester as well as his or her consistency throughout the semester history. By comparing the test values of the dataset corresponding to a student's grade point averages with the predicted values, we can detect abnormalities such as discrepancies in marks entry, underperformance of a student and some critical cases. A deviation of grade points beyond a certain threshold helps identify these abnormalities.

VI. LONG SHORT-TERM MEMORY NETWORKS

Humans do not start to think from scratch every time they read a new article. As a person reads an article, he or she interprets every word based on the understanding of the previous words. Our thoughts can retain things from the past; a simple neural network does not have the ability to do so. Recurrent neural networks overcome this shortcoming thanks to the looping mechanism in their hidden layers. Long Short-Term Memory networks are a special kind of Recurrent Neural Network with the capability of learning from previously computed output results. In our problem of predicting the grades of students, the whole idea is that the grades in our data frame form a sequence of marks which we may consider as a vector. The RNN obtains the vectors as input and considers the order of the vectors to generate predictions. From the embedding layer, the new representations are passed to the LSTM cells. These add recurrent connections to the network, so we can include information about the sequence of marks collected. Finally, the output of the LSTM cells goes to a sigmoid output layer; we use a sigmoid because we are trying to predict the final grade. Unlike artificial neural networks, Long Short-Term Memory Networks do not have a single neural network layer. Instead, they have four interacting and repeating modules in a chain-like structure. The basic structure of the LSTM architecture is shown in the figure below.

Fig 2. Long Short-Term Memory Network.

In the first step, the LSTM model decides what to discard from the cell state. A sigmoid layer looks at the previously computed hidden state h_{t-1} and the present input x_t, and outputs a number between 0 and 1 for each number in the cell state C_{t-1}:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)    (1)

Secondly, we need to determine what is going to be stored in the new cell state. First, a sigmoid layer decides which values to update. Then, a tanh layer produces a vector of candidate values C̃_t. The next step combines the two:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)    (2)

C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)    (3)

In the next step, we update the old cell state C_{t-1} to the new state C_t. In the case of our model, this is where we drop the information of the old state and add new information, as decided earlier. Here, C_{t-1} is the previous old cell state.

C_t = f_t ∗ C_{t-1} + i_t ∗ C̃_t    (4)

Finally, we come up with the output, a filtered version of the cell state. For our model, this is the information relevant to the outcome of the result.

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)    (5)

h_t = o_t ∗ tanh(C_t)    (6)

VII. ROOT MEAN SQUARE ERROR

We can check the accuracy of the recurrent neural network using a statistical measure called the root mean squared error. In this methodology the differences between the observed values and the predicted values are determined. The individual differences can be considered as residuals, and the RMSE serves to aggregate them into a single measure of predictive power. The RMSE of a model prediction with respect to the estimated variable X_model is defined as the square root of the mean squared error. For our Long Short-Term Memory Network model, the residuals were found from the test values and the predicted grade values. The following equation is used to find the root mean square error:

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (X_{obs,i} − X_{model,i})² )    (7)
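One LSTM time step, following the standard update equations (1)–(6), can be sketched in NumPy as below. The weight shapes and the input sequence here are illustrative, not taken from the paper's trained model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W and b hold the parameters of the four gates,
    keyed 'f' (forget), 'i' (input), 'c' (candidate) and 'o' (output)."""
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])           # (1) forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])           # (2) input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])       # (3) candidate values
    c_t = f_t * c_prev + i_t * c_tilde           # (4) new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])           # (5) output gate
    h_t = o_t * np.tanh(c_t)                     # (6) new hidden state
    return h_t, c_t

# Illustrative sizes: one input feature (a scaled mark), four hidden units.
rng = np.random.default_rng(0)
n_in, n_hid = 1, 4
W = {k: rng.standard_normal((n_hid, n_hid + n_in)) for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
for mark in [0.28, 0.33, 0.38, 0.34]:            # a scaled mark sequence
    h, c = lstm_step(np.array([mark]), h, c, W, b)
```

Because the cell state c_t is carried across steps, the final hidden state h depends on the order of the marks, which is exactly the property that lets the model reward (or penalise) consistency across a student's semesters.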
2017 IEEE International Conference on Computational Intelligence and Computing Research
where X_{obs,i} is the observed value and X_{model,i} is the modelled value at time/place i.

VIII. IMPROVING ACCURACY USING BIDIRECTIONAL LSTMs

Bidirectional Long Short-Term Memory Networks (LSTMs) can extract the most out of an input sequence by running over it in both the forward and the backward direction. The architecture is developed by creating another copy of the first recurrent layer in the neural network so that there are two layers side-by-side, then providing the input sequence as it is to the first layer and an inverted copy to the second, duplicated layer. This approach is effectively used alongside Long Short-Term Memory Networks.

Using multiple layers, five to be precise, the root mean squared error was tuned to 15 percent, indicating that the model was 85 percent accurate. Incorporating bidirectionality in the Long Short-Term Memory Networks decreased the root mean squared error to 7.4 percent; hence, an accuracy of 92.6 percent was obtained. Since the bidirectional Long Short-Term Memory Networks were found to be more accurate, we chose this model to produce the statistics of the student data.

We obtained the number of students whose marks were predicted with 5 percent, 10 percent and 20 percent error. Out of the 2000 students, 1357 students' marks were predicted with at most 5 percent error, 238 with at most 10 percent error, and 5 with at most 20 percent error. We also compared the Root Mean Square Error of machine learning techniques such as Decision Tree, Support Vector Machine, Random Forest and Linear Regression. Through our comparisons we were able to show that the Bidirectional Long Short-Term Memory networks used in our research gave the best result.
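The RMSE of equation (7) and the percentage-error bands used in this comparison can be computed as follows; the observed and predicted grades here are small illustrative stand-ins for the real test set:

```python
import numpy as np

def rmse(x_obs, x_model):
    """Root mean square error, equation (7)."""
    return np.sqrt(np.mean((x_obs - x_model) ** 2))

# Illustrative observed vs. predicted grade point averages.
actual = np.array([6.43, 7.06, 7.17, 7.14])
predicted = np.array([6.50, 7.00, 7.30, 7.10])

error = rmse(actual, predicted)

# Per-student percentage error, used for the 5/10/20 percent bands.
pct_err = 100.0 * np.abs(predicted - actual) / actual
within_5 = int(np.sum(pct_err <= 5.0))
within_10 = int(np.sum(pct_err <= 10.0))
within_20 = int(np.sum(pct_err <= 20.0))
```

Running the same band counts over the full 2000-student test split is how figures such as "1357 students within 5 percent error" are obtained.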
We can compare the results by observing the values of the part of the dataset used as test data and the predicted final scores of the respective students. The test and the predicted values for both of the above-mentioned cases are depicted in the figure below:
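The bidirectional scheme of Section VIII, one recurrent layer reading the sequence as-is and a duplicate reading an inverted copy, can be sketched in NumPy as below. For brevity a plain tanh recurrent cell stands in for the LSTM cell; the weights and the mark sequence are illustrative:

```python
import numpy as np

def rnn_scan(xs, W_x, W_h, b):
    """Run a simple recurrent layer over a sequence and return the
    hidden state at every step (a plain tanh cell stands in for LSTM)."""
    h = np.zeros(W_h.shape[0])
    hs = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h + b)
        hs.append(h)
    return np.array(hs)

rng = np.random.default_rng(1)
n_in, n_hid = 1, 3
W_x = rng.standard_normal((n_hid, n_in))
W_h = rng.standard_normal((n_hid, n_hid))
b = np.zeros(n_hid)

marks = [np.array([m]) for m in (0.28, 0.33, 0.38, 0.34)]

# Forward layer sees the sequence as-is; the duplicated layer sees an
# inverted copy, and its outputs are re-reversed before merging so that
# each time step lines up with the forward pass.
h_fwd = rnn_scan(marks, W_x, W_h, b)
h_bwd = rnn_scan(marks[::-1], W_x, W_h, b)[::-1]

# Per-step bidirectional representation: both directions concatenated.
h_bi = np.concatenate([h_fwd, h_bwd], axis=1)
```

Each time step thus carries context from both earlier and later marks, which is the property the paper credits for the lower RMSE of the bidirectional model.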
X. VISUALISATION

A visualisation was also produced to compare the predicted and the actual values. The X axis depicts the student ID and the Y axis represents the grades. Visualisations were made for the three models used in the comparative analysis. The blue shade indicates the predicted values and the orange shade indicates the actual values. The visualisations were carried out for the more accurate deep learning algorithms, that is, the Bi-directional Long Short-Term Memory Networks and the Multi-Layered Long Short-Term Memory Networks.

Fig 6. Visualisation of Artificial Neural Network.
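A plot of the kind described above can be produced with matplotlib; the student IDs and grade values here are illustrative stand-ins for the real test set:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Illustrative stand-ins for actual and predicted grades of a few students.
student_id = np.arange(1, 6)
actual = np.array([6.43, 7.06, 7.17, 7.14, 6.90])
predicted = np.array([6.50, 7.00, 7.30, 7.10, 6.80])

fig, ax = plt.subplots()
ax.plot(student_id, predicted, color="tab:blue", label="Predicted")
ax.plot(student_id, actual, color="tab:orange", label="Actual")
ax.set_xlabel("Student ID")
ax.set_ylabel("Grade")
ax.legend()
fig.savefig("comparison.png")
```

The blue/orange colour convention matches the description of the figures in this section.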
Fig 7. Visualisation of Bi-directional Long Short-Term Networks.