Analysis of Student Performance Using Classification and MapReduce

International Journal of Pure and Applied Mathematics
Volume 118 No. 14 2018, 141-148

ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version)
url: http://www.ijpam.eu
Special Issue
ANALYSIS OF STUDENT PERFORMANCE BASED ON CLASSIFICATION AND MAPREDUCE APPROACH IN BIGDATA

Dr.R Senthil Kumar1, Jithin Kumar.K.P2
1,2 Department of Computer Science, Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham, Mysuru Campus, Mysuru, India
sen07mca@gmail.com1, jithinkumarkp18@gmail.com2
Abstract: In recent years, the large amount of data stored in educational databases has mainly consisted of student information such as marks and personal details. Using that information, we can analyze students' performance, which helps both teachers and students: the teacher can identify a student's performance and counsel the student to perform better, and the student can likewise improve their performance in examinations. In this paper, a predictive (forecasting) modeling method is used to remove erroneous data. We use different data sets to develop a prediction model. A MapReduce model on the MongoDB framework analyzes the performance. The results will help teachers analyze students' performance in the classroom, and the examination department can then choose appropriate faculty and teaching interventions to improve students' educational outcomes, using MapReduce on the MongoDB framework to implement the prediction of students' results.

Keywords: Big Data; MongoDB; MapReduce; Learning analytics; Clustering

1. Introduction

Educational institutions hold large amounts of data, including students' personal information and academic details. An educational institution needs to know the case history of its registered students in order to predict their performance and to help analyze student performance in future academic years. This helps teachers identify the students who need extra attention in class. The main intention is to identify and support students so that they score better marks.

Data from different organizations, such as primary, secondary and advanced levels of education, are considered as input data sets, in order to test whether different data mining methods achieve the same performance results. Big data consists of a large amount of data. Existing MapReduce algorithms use Hadoop; the proposed MapReduce algorithm uses MongoDB for efficient data analysis on big data. For clustering, an advanced K-means algorithm is used. It mainly helps to categorize students by their marks. K denotes the number of clusters: given N observations, the algorithm produces K clusters. For prediction of student performance, multiple linear regression is used; it is expressed as a MapReduce algorithm and run on MongoDB. The students' previous data is applied to the linear regression algorithm to build the predictive model, which consists of two parts, a Map function and a Reduce function. The Map function performs filtering and sorting, such as sorting students' first and last names into columns, and the Reduce function performs the summary operation, such as counting the number of students in each column.

The proposed model predicts students' performance and helps parents and teachers understand students' improvement in the exams. The main advantage of the system is that the time complexity is reduced and large data is supported, which makes it applicable to several practical applications. It also supports online data and redundant data, the classification accuracy is increased even for large data, and the computational complexity is lower and the efficiency of the system is improved compared with the existing system.

2. Literature survey

Some related works on the analysis of student performance based on classification and the MapReduce approach in big data are summarized here. Alcala et al. [1] suggested that educational data mining offers an increasing number of ways to extract evidence from data, mainly from unknown data sources, driven by the model to highlight strengths and weaknesses. McElroy [2] includes online data of students, which helps to identify at-risk and well-performing students rapidly; the data increases regularly. Dyckhoff et al. [3] introduced a learning analytics toolkit that helps teachers discover and compare learning object usage, user attributes, user behavior, and graph-based results. Lepouras et al. [4] presented an open-source learning analytics platform that generates data from different sources and provides the necessary functionality for all clients to make decisions for the learning process.
Chatti et al. [5] reviewed recent publications on learning analytics (LA) and its related fields and mapped them to the four dimensions of a reference model. In addition, they identified various challenges and opportunities for LA in all aspects. Petropoulou et al. [6] give a complete view of the Hadoop MapReduce scheduling algorithms, which helps researchers. MapReduce has three important scheduling issues, namely region, management, and equality. The most common goal of scheduling algorithms is to minimize the completion time of parallel applications. Ferreira et al. [7] note that academic analytics arises from the need to aggregate a variety of data sources, and that easily mobilizing selected information related to the difficulty of treatment helps to identify management behavior.

Mohan et al. [8] presented a study of results on Indian secondary-school student data to determine which students are at risk. Alshammari et al. [9] assessed the possibility of using a cloud platform to analyze genomic big data, and discuss big data in detail. Zhang et al. [10] proposed a student performance prediction model associated with the entire learning process. The model consists of four parts: data collection and preprocessing, learning behavior analysis, algorithm model building, and forecasting. An enhanced logistic regression algorithm is applied to analyze the behavior of students and predict their performance. Oyelade et al. [11] proposed a statistical algorithm for arranging students' scores according to their performance.

Widyahastuti et al. [12] aim to provide a predictor of student performance in the final exam by applying linear regression and multilayer perceptron in WEKA, comparing their feasibility in terms of accuracy, performance and error rates. Wang et al. [13] proposed a hierarchical prediction model that automatically predicts student achievement based on student performance. They used a feature-based regression back-propagation neural network method, carefully considering educational theory. Ramos et al. [14] apply data mining techniques to determine students' profiles and participation patterns in a higher-education distance course in order to predict each student's chances of approval. Al-Shehri et al. [15] used two models, the Support Vector Machine algorithm and the K-Nearest Neighbor algorithm, to predict students' results in the final exam on the data set, and then compared their accuracy.

Prachuabsupakij et al. [16] studied the effectiveness and efficiency of rule-based learning for predicting student graduation, and help improve the quality of education by matching the two preprocessing methods SMOTE and Relief. Devasia et al. [17] describe a web-based application that uses the naive Bayesian technique to extract useful information from the data of 700 students at Amrita University. The results show that the naive Bayesian algorithm is more accurate than other prediction algorithms. Archanaa et al. [18] give a comparative analysis of different supervised learning algorithms, such as ensemble learning methods, trees, and Bayesian methods, together with result checking and reliable features, and wrapper methods (classifier subset evaluation) for feature selection. Thangavel et al. [19] proposed a recommendation system that predicts which of five placement statuses a student will have, such as dream, core, mass, and not eligible. It also indicates the placement that each individual student is most likely to achieve. Dinesh et al. [20] proposed an automated analysis system that monitors the students who attend online lectures from a remote location and provides feedback to the teacher. It records and analyzes classroom video to detect student movement that may not be noticed by the teacher during class.

As seen in the paragraphs above, there are few papers that discuss the prediction of results, they work over limited datasets, and the effectiveness of the algorithms is not up to the mark. The prediction of the result is based on various information, yet very few variables are taken into consideration. The algorithms used have many constraints and limitations, including limited efficiency and reliability.

3. Proposed System

The aim is to identify students with educational risks and to develop predictive models that predict students' performance and help identify students' final outcomes. Students' academic performance is related to many features. The scope of this study is limited to the study of progress in learning. The analysis of student performance consists of two functions:

a) Students who are at the academic risk
b) Predict the student performance.
Fig 1. Architecture Diagram

a) Students who are at the academic risk

Data is collected from different educational sectors and the lowest six subject marks are selected. This requires a proper method of extracting knowledge from large repositories for better decision making, which presents an important challenge to organizations that use data management mechanisms to analyze, process and store large data sets. Therefore, a new model called "big data analytics" needs to be defined to reevaluate the existing system and to manage and process large amounts of student data. Here we implement a new component of big data analytics called "learning analytics". It refers to processing the various data produced by students in order to evaluate learning progress, predict future performance, and identify probable problems.

The first step of learning analytics is to collect data from different educational institutions. This step is very difficult, because real information about students' details has to be fetched from the respective institutions. Since the data collected from the various institutes is large, those data sets are considered big data. Here the MongoDB framework can be used. It is very cost-effective and provides quicker data processing.

MongoDB is a free and open-source, cross-platform, document-oriented database program, classified as a NoSQL database program. MongoDB can be used as a file system, with load balancing and data replication features over multiple machines for storing files. JavaScript can be used in queries and aggregation functions (such as MapReduce), which are sent directly to the database to be executed.

b) Predict Student Performance

For educational institutes, the analysis and prediction of students' results are very important because they reflect the quality of education. The main concern is the ability to meet the needs of students. Analyzing the past performance of students gives a better understanding, which can help improve their future results. This can be done well using the concepts of predictive analytics and performance analytics. Predictive analytics includes various statistical techniques from modeling, machine learning, and data mining that analyze current and past data to predict the future.

In the prediction method, the main step is collecting data. Here the data collected from different educational institutes includes students' information and is split into sample data and test data. The predictive model is developed from the sample data with the help of statistical methods, and the model is then applied to the test data to predict the result.

Fig 2. Steps for Prediction

Advantages of proposed system
• The time complexity of the system is reduced.
• It supports big data, so it is applicable in several real applications. It also supports online data.
• The classification accuracy is increased even for big data.
• It supports redundant data.
• The computational complexity is low compared with the existing work.
• The system efficiency is improved.

a) K-mean Algorithm Based on MapReduce

MongoDB stores the input dataset as sequence files of <key, value> pairs, in which each entry denotes a record of the given input dataset. The key is the starting point
of the data and the value is the string content. The map step is independent for each input key, and the reduce step runs in parallel for each input key. Here an array holds the center points of the clusters and their information. The mapper function computes the nearest cluster center point for every record in the dataset.

Algorithm MAP (key, value)
Input: key and array of values
Output: <key, value> pair

1. Initialize an array to hold the record (the samples in the array);
2. counter = 0;
3. while (v.next() != NULL)
   {
       Create instance from v.next();
       Add the value of instance to array;
       counter += num;
   }
4. New center = array / counter;
5. Take key as key;
6. Build the value as a string comprising the values (generating new values);
7. Output is in the form of a <key, value> pair;
8. End

b) Algorithm Reduce (key, V)
Input: key, value, mean counter
Output: <key, value> pair

1. Create instance from value;
2. Assign minDist as Double.MAX_VALUE;
3. Assign index as -1;
4. For each center in array do
   i. Compute the distance between center and instance;
   ii. If distance is less than minDist
       {
           Reassign minDist as distance;
           Reassign index as the array index of center;
       }
5. End for
6. key = index;
7. Build the value as a string;
8. Output is in the form of a <key, value> pair;
9. End

The final stage of the project is visualization. Here the visualization is done with the Robomongo software. We can view the analysis of student results and their performance as tables and graphs, that is, in a human-readable format.

4. Experimental Results

Fig 3 shows the sample dataset used for building the predictive model. The sample dataset contains each student's name (f name), district, gender, marks for six subjects, and result.

Fig 3. Sample dataset

Linear regression is used to verify the relationship between the dependent and independent variables in the sample dataset. Once the relationship is identified, the multiple linear regression model is built. When the model has been created, we can apply it to the test data set. Here the marks in the respective subjects are taken as the independent variables. After a relationship between the dependent and independent variables has been found, the forecasting model is generated to predict the result for each student as pass or fail.
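The multiple linear regression step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it runs in plain Python rather than as a MongoDB MapReduce job, it uses three subject marks instead of six for brevity, the sample marks, the numeric final score, and the pass mark of 40 are all hypothetical, and the function names are invented for the example. The model is fitted on sample data by ordinary least squares and then applied to held-out test data.

```python
def fit_linear_regression(rows, targets):
    """Fit y = w0 + w1*x1 + ... by ordinary least squares (normal equations)."""
    design = [[1.0] + [float(v) for v in row] for row in rows]  # intercept column
    n = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(design, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("marks are linearly dependent; cannot fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for k in range(n):
            if k != col:
                factor = aug[k][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] for i in range(n)]


def predict_score(weights, marks):
    """Apply the fitted model to one student's marks."""
    return weights[0] + sum(w * m for w, m in zip(weights[1:], marks))


def predict_result(weights, marks, pass_mark=40.0):
    """Turn the predicted score into pass/fail (pass_mark is an assumption)."""
    return "pass" if predict_score(weights, marks) >= pass_mark else "fail"


# Hypothetical sample data: three subject marks per student and a final score.
# Here the final score is simply the mean mark, so the fit is easy to check.
sample_marks = [(35, 40, 30), (80, 75, 90), (55, 60, 50),
                (20, 45, 25), (90, 85, 70), (45, 30, 60)]
sample_scores = [sum(m) / 3 for m in sample_marks]

weights = fit_linear_regression(sample_marks, sample_scores)

# Hypothetical test data, held out from fitting.
test_marks = [(70, 65, 60), (30, 25, 35)]
predictions = [predict_result(weights, m) for m in test_marks]
```

Because the toy final score is the mean of the three marks, the fitted weights come out close to an intercept of 0 and a coefficient of 1/3 per subject, and the two test students are classified as pass and fail respectively.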
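The K-means clustering described for the proposed system can be sketched as one map step and one reduce step per iteration. The sketch below follows the prose description, in which the mapper finds the nearest cluster center for each record and the reducer averages the records in each group. It is a plain Python stand-in for a MapReduce job, not MongoDB code; the function names, the two-subject marks, and the starting centers are hypothetical.

```python
from collections import defaultdict
from math import dist


def kmeans_map(record, centers):
    """Map step: emit (index of the nearest center, record)."""
    index = min(range(len(centers)), key=lambda i: dist(record, centers[i]))
    return index, record


def kmeans_reduce(index, records):
    """Reduce step: the new center is the mean of the records in one group."""
    n = len(records)
    return index, tuple(sum(column) / n for column in zip(*records))


def kmeans_iteration(records, centers):
    """Run one map / group-by-key / reduce pass and return updated centers."""
    groups = defaultdict(list)
    for record in records:
        key, value = kmeans_map(record, centers)
        groups[key].append(value)
    new_centers = list(centers)  # a center with no records keeps its position
    for key, values in groups.items():
        _, new_centers[key] = kmeans_reduce(key, values)
    return new_centers


def kmeans(records, centers, max_iter=100):
    """Repeat iterations until the centers stop changing."""
    for _ in range(max_iter):
        updated = kmeans_iteration(records, centers)
        if updated == centers:
            break
        centers = updated
    return centers


# Hypothetical marks in two subjects for four students, and two start centers.
marks = [(10, 12), (12, 10), (80, 85), (85, 80)]
final_centers = kmeans(marks, [(0, 0), (100, 100)])
```

With this toy input the two low-scoring students and the two high-scoring students form separate clusters, and the centers settle after the first update.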
Fig 4. Clustered Output.

The cluster output for the sample data set is given in Fig 4. The figure contains six clusters that together hold all the sample data. The given data set includes a large amount of information, which is separated into different clusters.

The final forecast results are shown in Fig 5. Test data sets are applied to the forecast model built from the sample dataset. The predicted result dataset contains each student's name, district, gender, and result. The MapReduce function is used to process the given data set. The map() function performs filtering and sorting, such as sorting the students by first name into queues. The reduce() function counts the number of students in each queue, which is called the summary operation. The framework manages the communication and data transfer between the various parts of the system.

Fig 5. Predicted Result.

The MapReduce function also handles the data analysis. The students' marks and results are shown in the figures.

Fig 6 shows the output of the MapReduce function listing the students who passed. Only the list of passed students is shown there; we can also get the list of failed students, lists by gender (male or female), and so on. We can also get the students' marks in ascending or descending order. This helps educational institutes identify the at-risk students and those who are getting higher marks in the exams.

Fig 6. Result of Passed Students (MapReduce)

Fig 7 shows the district-wise analysis of student marks. Here the table shows the students who come under the district Wayanad; we can also get the details of students from other districts using MapReduce.

Fig 7. Student Mark Analysis With District Wise.

Fig 8. District Wise Mark Analysis on Bar chart.

The bar chart above shows the district-wise student performance analysis. The X-axis is the document name
(district) and the Y-axis is the number of students who passed the exams. Using the MapReduce technique, we can get the district-wise data that is represented in the bar chart above.

5. Conclusion

Here we propose a method based on "learning analytics and predictive analytics" to recognize students who are getting poor marks in exams and to predict what students will learn from the class. This analysis model helps teachers identify how, or to what extent, their students perform differently, so that teachers can choose appropriate teaching and pedagogical interventions to improve their teaching strategy. It also helps teachers analyse student success and failure; the students who are getting higher marks in the exam are also available here. This study will help to improve students' academic performance, to identify weak students and give them special attention to improve their marks, and to minimize the failure ratio in the exams. The details of students include their personal details and marks. The MapReduce technique helps in sorting and ordering the student data in the dataset.

References

[1] Alcalá-Fdez, J., Sanchez, L., Garcia, S., del Jesus, M. J., Ventura, S., Garrell, J. M., ... & Fernández, J. C. (2009). KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Computing-A Fusion of Foundations, Methodologies, and Applications, 13(3), 307-318.
[2] McElroy, C. (2011). The online student profile learning system: A learner-centered approach to learning analytics.
[3] Dyckhoff, A. L., Zielke, D., Bültmann, M., Chatti, M. A., & Schroeder, U. (2012). Design and implementation of a learning analytics toolkit for teachers. Journal of Educational Technology & Society, 15(3), 58.
[4] Lepouras, G., Katifori, A., Vassilakis, C., Antoniou, A., & Platis, N. (2014, July). Towards a learning analytics platform for supporting the educational process. In Information, Intelligence, Systems and Applications, IISA 2014, The 5th International Conference on (pp. 246-251). IEEE.
[5] Chatti, M. A., Dyckhoff, A. L., Schroeder, U., & Thüs, H. (2012). A reference model for learning analytics. International Journal of Technology Enhanced Learning, 4(5-6), 318-331.
[6] Petropoulou, O., Kasimatis, K., Dimopoulos, I., & Retalis, S. (2014, January). LAe-R: A new learning analytics tool in Moodle for assessing students' performance. Bulletin of the IEEE Technical Committee on Learning Technology, 16(1).
[7] Ferreira, S. A., & Andrade, A. (2014). Academic analytics: mapping the genome of the University. IEEE Revista Iberoamericana de Tecnologias del Aprendizaje, 9(3), 98-105.
[8] Mohan, M. M., Augustin, S. K., & Roshni, V. K. (2015, December). A BigData approach for classification and prediction of student result using MapReduce. In Intelligent Computational Systems (RAICS), 2015 IEEE Recent Advances in (pp. 145-150). IEEE.
[9] Alshammari, H., Bajwa, H., & Lee, J. (2014, May). Hadoop based enhanced cloud architecture for bioinformatic algorithms. In Systems, Applications, and Technology Conference (LISAT), 2014 IEEE Long Island (pp. 1-5). IEEE.
[10] Zhang, W., Huang, X., Wang, S., Shu, J., Liu, H., & Chen, H. (2017, June). Student Performance Prediction via Online Learning Behavior Analytics. In Educational Technology (ISET), 2017 International Symposium on (pp. 153-157). IEEE.
[11] Oyelade, O. J., Oladipupo, O. O., & Obagbuwa, I. C. (2010). Application of k Means Clustering algorithm for prediction of Students Academic Performance. arXiv preprint arXiv:1002.2425.
[12] Widyahastuti, F., & Tjhin, V. U. (2017, July). Predicting students performance in final examination using linear regression and multilayer perceptron. In Human System Interactions (HSI), 2017 10th International Conference on (pp. 188-192). IEEE.
[13] Li, X., Xie, L., & Wang, H. (2016, August). Grade Prediction in MOOCs. In Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES), 2016 IEEE Intl Conference on (pp. 386-392). IEEE.
[14] Ramos, T., Gomes, A., Lucena, M., Nunes, I., Valentim, R., & Nóbrega, G. (2017, June). Use of educational data mining to identify distance learning students' profiles and patterns of participation. In Information Systems and Technologies (CISTI), 2017 12th Iberian
[15] Al-Shehri, H., Al-Qarni, A., Al-Saati, L., Batoaq, A., Badukhen, H., Alrashed, S., ... & Olatunji, S. O. (2017, April). Student performance prediction using Support Vector Machine and K-Nearest Neighbor. In Electrical and Computer Engineering (CCECE), 2017 IEEE 30th Canadian Conference on (pp. 1-4). IEEE.
[16] Prachuabsupakij, W., & Doungpaisan, P. (2016, October). Matching preprocessing methods for improving the prediction of student's graduation. In Computer and Communications (ICCC), 2016 2nd IEEE International Conference on (pp. 33-37). IEEE.
[17] Devasia, T., Vinushree, T. P., & Hegde, V. (2016, March). Prediction of students performance using Educational Data Mining. In Data Mining and Advanced Computing (SAPIENCE), International
[18] Archanaa, R., Athulya, V., Rajasundari, T., & Kiran, M. V. K. (2017). A comparative performance analysis on network traffic classification using supervised learning algorithms. In 2017 4th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore.
[19] Thangavel, S. K., Bharathi, P. D., & Sankar, A. (2017, January). Student placement analyzer: A recommendation system using machine learning. In Advanced Computing and Communication Systems (ICACCS), 2017 4th International
[20] Dinesh, D., & Bijlani, K. (2016, August). Student analytics for productive teaching/learning. In Information Science (ICIS), International Conference on (pp. 97-102). IEEE.