0 views

Uploaded by IJETA - EighthSenseGroup

The academic leaders encourage in promoting the education of students who can lead the nation one day. Therefore, monitoring the progress of student's academic performance is an important issue to the academic community. To achieve this issue, there is a system that consists of two parts. Firstly, the student's results are analysed based on cluster analysis and second the standard statistical algorithm is used to arrange their results data according to the level of their performance. In this paper, efficient k-means clustering algorithm is implemented to analyse the students' data. The efficient k-means clustering can be used to monitor the progress of the students' performance and to make the decisions by the teachers/academic planner in each semester for the improving of the future academic results of the students.

- V2I3-IJERTV2IS3115
- IJETTCS-2014-08-08-89
- Digital Booklet - Nothing Was the Same
- A Robust Initialization Algorithm for K-Means Clustering in Power Distribution Networks With PMU-based Adaptive Protection System
- [IJCST-V6I6P9]:Faten Alkrdy, Jabr Hanna
- 3 Mahout Clustering
- Presentation 1
- Cluster.docx
- A Heuristic Approach for Network Data Clustering
- UNCOVERING FEATURES NOISE WITH K-MEANS ALGORITHM USING ENTROPY MEASURE
- Journal of Computer Applications - www.jcaksrce.org - Volume 4 Issue 2
- DWM_Experiment5_E059
- Clustering on Boston Dataset
- 5A324B51.pdf
- cs707_022812
- Analyzing the Food Quality Using K-Means Clustering
- A survey of Network Intrusion Detection using soft computing Technique
- 06991841
- Differential Evolution With Local Information for Neuro-Fuzzy Systems Optimisation
- John a. Hartigan-Clustering Algorithms-John Wiley & Sons (1975)

You are on page 1of 4

Students’ Academic Performance

Ei Ei Phyo [1], Ei Ei Myat [2]

Department of Information Technology

Technological University (Thanlyin)

Myanmar

ABSTRACT

The academic leaders encourage in promoting the education of students who can lead the nation one day. Therefore, monitoring

the progress of student’s academic performance is an important issue to the academic community. To achieve this issue, there is

a system that consists of two parts. Firstly, the student’s results are analysed based on cluster analysis and second the standard

statistical algorithm is used to arrange their results data according to the level of their performance. In this paper, efficient k-

means clustering algorithm is implemented to analyse the students’ data. The efficient k-means clustering can be used to

monitor the progress of the students’ performance and to make the decisions by the teachers/academic planner in each semester

for the improving of the future academic results of the students.

Keywords :— clustering, k-means clustering, initial centroid, min-max normalization, gain ratio.

I. INTRODUCTION researchers have put forward various methods to improve the

Every university set commonly used indicators of academic efficiency and time of K-means algorithm. K-means uses the

performance is Grade Point Average (GPA). But grade system concept of Euclidean distance to calculate the centroids of the

is also used some of the public schools/university. This grade clusters. This method is less effective when new data sets are

system and GPA scales may vary significant. But it is added and have no effect on the measured distance between

important to determine specifically how grade and GPA are various data objects. The computational complexity of k

calculated to measure academic performance. It is difficult to means algorithm is also very high [2] [9]. Also, K-means is

obtain a comprehensive view of the state of the students’ unable to handle noisy data and missing values. Data pre-

performance and simultaneously discover important details processing techniques are often applied to the datasets to make

from their performance with grouping of students based on them cleaner, consistent and noise free.

their average scores [1]. In this paper presents k-means Normalization is used to eliminate redundant data

clustering as a simple and efficient tool to monitor the and ensures that good quality clusters are generated which can

progression of students’ performance in higher institution. improve the efficiency of clustering algorithms. So it becomes

Data mining tools and techniques are used to generate an an essential step before clustering as Euclidean distance is

effective result which was earlier difficult and time consuming. very sensitive to the changes in the differences [10].

Data mining is widely used in various areas like financial data A feature weight algorithm can be seen as the

analysis, retail and telecommunication industry, biological combination of a search technique for proposing new feature

data analysis, fraud detection, spatial data analysis and other subsets, along with an evaluation measure which scores the

scientific applications. different feature subsets. The simplest algorithm is to test each

Clustering is a technique of data mining in which similar possible subset of features finding the one which minimizes

objects are grouped into clusters. Clustering techniques are the error rate. This is an exhaustive search of the space, and is

widely used in various domains like information retrieval, computationally intractable for all but the smallest of feature

image processing, etc. [2] [3]. There are two types of sets [11].

approaches in clustering: hierarchical and partitioning. In

hierarchical clustering, the clusters are combined based on II. K-MEANS CLUSTERING ALGORITHM

their proximity or how close they are. This combination is The basic idea of K-means algorithm is to classify the

prevented when further process leads to undesirable clusters. dataset D into k different clusters where D is the dataset of n

In partition clustering approach, one dataset is separated into data; k is the number of desired clusters. The algorithm

definite number of small sets in a single iteration [4]. consists of two basic phases [12]. The first phase is to select

The most widely used clustering algorithm is the K- the initial centroids for each cluster randomly. The second and

means algorithm. This algorithm is used in many practical final phase is to take each point in dataset and assign it to the

applications. It works by selecting the initial number of nearest centroids [12]. To measure the distance between points

clusters and initial centroids [5] [6]. Moreover initial centroids Euclidean Distance method is used. When a new point is

are chosen randomly due to which clusters produced vary assigned to a cluster the cluster mean is immediately updated

from one run to another. Also various data points exist on by calculating the average of all the points in that cluster [10].

International Journal of Engineering Trends and Applications (IJETA) – Volume 5 Issue 6, Nov-Dec 2018

After all the points are included in some clusters the early GainRatio (A) =

grouping is done. Now each data object is assigned to a cluster

based on closeness with cluster centre where closeness is

The attribute with the maximum gain ratio is selected as

measured by Euclidean distance. This process of assigning a

the splitting attribute. The split information approaches 0, the

data points to a cluster and updating cluster centroids

ratio becomes unstable. A constraint is added to avoid this,

continues until the convergence criteria is met or the centroids

whereby the information gain of the test selected must be

don’t differ between two consecutive iterations. Once, a

large at least as great as the average gain over all tests

situation is met where centroids don’t move any more the

examined.

algorithm ends. The k-means clustering algorithm is given

below. C. Proposed Methodology

Step 1: Begin with a decision on the value of k = number of The proposed efficient k-means clustering is upgraded the

clusters. origin k-means clustering to reduce the computational

Step 2: Put any initial partition that classifies the data into k complexity. In the proposed efficient k-means clustering

clusters. You may assign the training samples method, the normalization and feature weight are applied.

randomly, or systematically as the following: Firstly, the methodology employs normalized dataset by using

1. Take the first k training sample as single-element min-max normalization to improve the efficiency of clustering

clusters algorithm. After that gain ratio method compute feature

2. Assign each of the remaining (N-k) training weights for each attributes of the data to minimize the error

samples to the cluster with the nearest centroid. rate. It the centroids are then posted to the traditional

After that each assignment, recomputed the centroid clustering algorithms for being executed in the way it

of the gaining cluster. normally does. The results of the proposed work are validated

Step 3: Take each sample in sequence and compute its against number of iterations and accuracy obtained and

distance from the centroid of each of the clusters. If a compared with the randomly selected initial centroids.

sample is not currently in the cluster with the closest Step 1: Accept the dataset to cluster as input values

centroid, switch this sample to that cluster and update Step 2: Perform a linear transformation on the original dataset

the centroid of the cluster gaining the new sample and using mix-max normalization

the cluster losing the sample. Step 3: Compute the feature weight for each attribute and

Step 4: Repeat step 3 until convergence is achieved, that is update the dataset.

until a pass through the training sample causes no new Step 4: Initialize the first K cluster

assignments. Step 5: Calculate centroid point of each cluster formed in the

dataset.

III. METHODOLOGY Step 6: Assign each record in the dataset for only one of the

A. Min-Max Normalization initial cluster using a measure Euclidean distance.

Step 7: Repeat step 4 until convergence is achieved, that is

Min-max normalization [11] performs a linear

until a pass through the training sample causes no new

transformation on the original data. Min-max is a technique

assignments.

that helps to normalize a dataset. It will scale the dataset

between the 0 and 1. Suppose that minA and maxA are the

IV. RESULTS

minimum and maximum values of an attribute, A. Min-max

normalization maps a value, v, of A to v' in the range We tested the dataset (academic result of one semester) of

[new_minA, new_maxA] by computing a Technological University (Thanlyin). The performance

index of the grade system is shown in Table 1.

= Table 1. Performance Index

Min-max normalization preserves the relationships among

the original data values. It will encounter an “out-of-bounds” 61 – 80 Grade B

error if a future input case for normalization falls outside of

the original data range for A. 41 - 60 Grade C

Information gain applied to attributes that can take on a 0 - 20 Grade E

large number of distinct values might learn the training set too

well. The information gain measure is biased toward tests with

many outcomes. That is, it prefers to select attributes having a In Table 2, all marks of 7 courses of 66 students in

large number of values. The gain ratio [13] is defined as Technological University (Thanlyin).

Table 2. Statistics of the Data used

International Journal of Engineering Trends and Applications (IJETA) – Volume 5 Issue 6, Nov-Dec 2018

Student’s Scores

Students courses

Data 66 7

For K = 3, the result is generated as shown in Table 3. In

cluster 1, the cluster size is 29 and the overall performance is

39.57. The cluster numbers 2 and 3 are 18, 19 and the overall

performance are 56.43 and 69.93, respectfully.

Table 3. K= 3

Overall

Cluster# Number of Students

Performance

1 29 39.57

students fall in the region of “Grade D” performance index in

table 1 above (32.57%), while 16 students has performance in

In Table 4, there are 4 clusters. The cluster sizes are 1, 2, 3 the region of “Grade C” performance (54.14%). 16 and 17

and 4 are 17, 16, 16, 17 and the overall performance are 32.57, students has a “Grade B” performance (67.79%) and (66.43%).

54.14, 67.79 and 66.43, respectively.

Table 4. K= 4

Overall

Cluster# Number of Students

Performance

1 17 32.57

2 16 54.14

3 16 67.79

4 17 66.43

cluster sizes are 1, 2, 3, 4 and 5 are 11, 13, 17, 10, 15 and the

overall performance are 30.00, 59.00, 52.71, 67.36 and 70.50,

respectively. Figure 2. Overall performance of students k = 4

Overall

that, 11 students crossed over to “Grade B” performance

Cluster# Number of Students

Performance region (39%), while 13 students had “Grade C” performance

1 11 30.00 results (59%). 17 students fall in the region of “Grade C”

performance index (52.71%), 10 students were in the region of

2 13 59.00 “Grade B” performance (67.36%) and the remaining 15

3 17 52.71 students had “Grade B” performance (77.50%).

4 10 67.36

5 15 77.50

students had a “Grade D” performance (39.57%). The cluster

2 analyses showed that, 18 out of 66 students had performance

in the region of very “Grade C” performance (56.43%) and the

remaining 19 students had a “Grade B” performance (69.9%).

International Journal of Engineering Trends and Applications (IJETA) – Volume 5 Issue 6, Nov-Dec 2018

Figure 3. Overall performance of students k = 5 [10] Margaret H. Dunham, Data Mining-Introductory and

Advanced Concepts, Pearson Education, 2006

V. CONCLUSIONS [11] Navdeep Kaur and Krishan Kumar, “Normalization

In this paper, the methodology to compare k-means Based K-means Data Analysis Algorithm”, International

clustering and efficient k-means clustering. The proposed Journal of Advanced Research in Computer Science and

algorithm using k-means clustering and combine with the Software Engineering, ISSN: 2277 128X, Volume 6,

min-max normalization and gain ratio. A data set of results Issue, June 2016, pp. 455-457

with seven courses offered for that semester for each student [12] A.K. Jain, "Data clustering: 50 years beyond K-means",

for total number of 66 students, and produces the numerical Pattern Recognition Letters 31 (2010) 651–666.

interpretation of the results for the performance evaluation. [13] Ester Martin, Kriegel Hans-Peter introduced the Idea of

The efficient k-means algorithm serves as a good mark to “Clustering for Mining in Large Spatial Database”.

monitor the progression of students’ performance in higher

institution. The decision making by academic planners to

monitor the candidates’ performance semester by semester by

improving on the future academic results in the subsequence

academic session.

REFERENCES

[1] Oyelade, O. J, Oladipupo, O. O, Obagbuwa, I. C,

“Application of k-Means Clustering Algorithm for

prediction of Students’ Academic Performance”, IJCSIS

International Journal of Computer Science and

Information Security, Vol. 7, No 1, 2010, pp. 292-295.

[2] Md Sohrab Mahmud, Md. Mostafizer Rahman, Md.

Nasim Akhtar, “Improvement of K-means Clustering

algorithm with better initial centroids based on weighted

average”, 7th International Conference on Electrical and

Computer Engineering, 2012, pp. 647-650.

[3] Madhu Yedla, Srinivasa Rao Pathakota, TM Srinivasa,

“Enhancing K-means Clustering Algorithm with

Improved Initial Center”, International Journal of

Computer Science and Information Technologies

(IJCSIT),Vol.1 (2), 2010, pp. 121-125.

[4] Margaret H. Dunham, “Data Mining Introductory and

Advanced Concepts”, Pearson Education, 2006.

[5] Vaishali R. Patel, Rupa G. Mehta, “Impact of Outlier

Removal and Normalization Approach in Modified k-

Means Clustering Algorithm”, IJCSI International

Journal of Computer Science Issues, Vol. 8, Issue 5, No

2, 2011, pp. 331-336.

[6] McQueen J, “Some methods for classification and

analysis of multivariate observations,” Proc. 5th

Berkeley Symp. Math. Statist. Prob., Vol. 1, 1967, pp.

281–297.

[7] Zhang Chen, Xia Shixiong, “K-means Clustering

Algorithm with improved Initial Center”, Second

International Workshop on Knowledge Discovery and

Data Mining, 2009, pp. 790-792.

[8] David Arthur & Sergei Vassilvitskii, "How Slow is the k

means Method?", Proceedings of the 22nd Symposium

on Computational Geometry (SoCG), 2006, pp. 144-153.

[9] K. A. Abdul Nazeer, M. P. Sebastian, “Improving the

Accuracy and Efficiency of the k-means Clustering

Algorithm”, Proceedings of the World Congress On

Engineering 2009 Vol I, WCE 2009, pp. 308-312.

- V2I3-IJERTV2IS3115Uploaded byPrianca Kayastha
- IJETTCS-2014-08-08-89Uploaded byAnonymous vQrJlEN
- Digital Booklet - Nothing Was the SameUploaded byKarim Sowley Delgado
- A Robust Initialization Algorithm for K-Means Clustering in Power Distribution Networks With PMU-based Adaptive Protection SystemUploaded byMiguel Quispe
- [IJCST-V6I6P9]:Faten Alkrdy, Jabr HannaUploaded byEighthSenseGroup
- 3 Mahout ClusteringUploaded byFycm Achemlal
- Presentation 1Uploaded byANUBHA
- Cluster.docxUploaded byshankar_mission
- A Heuristic Approach for Network Data ClusteringUploaded byidescitation
- UNCOVERING FEATURES NOISE WITH K-MEANS ALGORITHM USING ENTROPY MEASUREUploaded byijsret
- Journal of Computer Applications - www.jcaksrce.org - Volume 4 Issue 2Uploaded byJournal of Computer Applications
- DWM_Experiment5_E059Uploaded byShubham Gupta
- Clustering on Boston DatasetUploaded byanubhav582
- 5A324B51.pdfUploaded bygi
- cs707_022812Uploaded bycecsdistancelab
- Analyzing the Food Quality Using K-Means ClusteringUploaded byIJARTET
- A survey of Network Intrusion Detection using soft computing TechniqueUploaded byInternational Journal for Scientific Research and Development - IJSRD
- 06991841Uploaded byFlorin Nastasa
- Differential Evolution With Local Information for Neuro-Fuzzy Systems OptimisationUploaded byCristian Klen
- John a. Hartigan-Clustering Algorithms-John Wiley & Sons (1975)Uploaded byCarolina Salas
- Introduction to Five Data ClusteringUploaded byerkanbesdok
- Clustering in AIUploaded byRam Kushwaha
- Cluster AnalysisUploaded bysogekingu88
- CAINE_109Uploaded bynobeen666
- Sde625 PaperUploaded byTriệuAnhTuân
- Knime Social Media Data Clustering WhitepaperUploaded byJavier Vicho Soto
- k-meansUploaded by박수혁
- COLOUR BASED IMAGE SEGMENTATION USING HYBRID KMEANS WITH WATERSHED SEGMENTATIONUploaded byIAEME Publication
- ThoughtsUploaded bykarteekak
- Hybrid Model Based on User Tags and Textual Passwords and Pearsonian Type III Mixture ModelUploaded byIJAERS JOURNAL

- [IJETA-V5I6P5]:K.Arumuganainar, P.Ramesh, R.RamaswamyUploaded byIJETA - EighthSenseGroup
- [IJCST-V6I6P2]:Savita.A.Harkude, Dr. G.N.Kodanda RamaiahUploaded byEighthSenseGroup
- [IJETA-V6I1P1]: Mohammad Omar FarooqueUploaded byIJETA - EighthSenseGroup
- [IJCST-V6I6P1]: Gandrapu Gideon, Prof. K. Venkata RaoUploaded byEighthSenseGroup
- [IJETA-V5I4P7]:V.Gokulradhai, R.PushpalathaUploaded byIJETA - EighthSenseGroup
- IJETA-V5I5P2.pdfUploaded byIJETA - EighthSenseGroup
- IJETA-V5I5P2.pdfUploaded byIJETA - EighthSenseGroup
- [IJETA-V6I2P2]:D. Yogesh, Dr. Azha. Periasamy, T. KaruppiahUploaded byIJETA - EighthSenseGroup
- [IJETA-V6I2P1]:Dr. M. Karthik, P. Shanmathi, R. Shankar, P. SivadharshiniUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I5P3]:Nyome Tin, Khin Thanda HtunUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I5P4]:Kyaw Naing SoeUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I4P6]: Thu Zar Win, Aye Wai OoUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I6P4]:Dinesh Bhardwaj, RishiUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I5P1]: Saritha.T.J, P.MetildaUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I4P5]: Subarani.R, P.MetildaUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I4P1]:Mr.Shreenath Waramballi, Mr.Aravinda MUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I3P18]:T. Karuppiah, Dr. Azha PeriasamyUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I3P17]:Sammeta Lalitha , Mr.D.Shekar GoudUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I3P16]:D.Kavitha, K.V.N MallikarjunraoUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I3P10]:Mayura Yeole, Gayatri Chaskar, Mohini Koli, Jyoti Mishra, Sumeet SonwaneUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I4P4]:Prachi Goyal, Dr. Rajesh K shuklaUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I4P3]:Yu Yu Wai, Ei Ei MyatUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I3P14]: Bhavkanwal Kaur, Puspendra Kumar PateriyaUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I3P15]:DR. A.Singaravelan, M. Praveen KumarUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I3P9]:G. Kameswari, B. Himaja, N. TirumalaUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I3P11]:N.Janakiram, P.Murahari KrishnaUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I4P2]:AbdulAhad, Dr.Suresh Babu Yalavarthi, Dr.Ali Hussain MdUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I3P13]:V.Rushendramani, G.Ganesh NaiduUploaded byIJETA - EighthSenseGroup
- [IJETA-V5I3P12]:P.Ramanaidu, P.Murahari KrishnaUploaded byIJETA - EighthSenseGroup

- Board of Advanced Studies and Research Cecos University of It and Emerging Sciences(1)Uploaded byWajahat Ullah Khan Tareen
- ECE 252 Lec01 - Introduction to Computer Engineering - Wisconsin MadisonUploaded bybks
- VLSI Implementation of High Resolution High Speed Low Latency Pipeline Floating Point Adder Subtractor for FFT ApplicationsUploaded byDr. Rozita teymourzadeh, CEng.
- Capacitance LabUploaded byAditya P Banerjee
- Ejercicios para certificación JAVA SE 6Uploaded byGabriel Miranda
- Intermediate R - Analysis of Count and Proportion DataUploaded byVivay Salazar
- math fact practice websites sheetUploaded byapi-292427319
- Distributions of Residence Times (rtd) for.pptxUploaded byHamza Khezazna
- Latihan Matematik Tingkatan 4 03 - SetsUploaded by郑成龙
- Python for Data ScienceUploaded bySujeet Omar
- Vibrational Behaviour of Timber FloorsUploaded byjuanrius
- 204247419-Pressure-Swing-Adsorption.pdfUploaded byAnitaEriraAza
- The Opportunity Cost of Economics EducationUploaded byWilliam Huynh
- Permutation and Combination Solved ProblemsUploaded byvasu
- Bitonic SortUploaded byblahman123
- COMANDI ANSYSUploaded byAndrea Dott Nitti
- lect10.pdfUploaded byAmit Anand
- MSc Report FinalUploaded byAshik Rehmath Parambil
- Scene from "Proof" by David AubrunUploaded bygenewng
- ;;;;;;MAA BOOK;;;;;;Uploaded bySrijan Verma
- Sentiment Analysis Based on Interpretive Structure Modelling For Feature Selection and ExtractionUploaded byIJASRET
- ch10.pdfUploaded byTom
- DmcUploaded byravindrapadsala
- White Paper - Understanding DiversityUploaded byChantelle_Jon_FVlJB3
- NPTEL SortedCourses July 2018Uploaded byvipulkondekar
- How to Draw a Volute, Or Acanthus ScrollUploaded byPemiamos Hua
- SH-01Uploaded byiceman180688
- Unit 9Uploaded byTaufik Rhome
- M.tech. - Production Engineering & Engineering DesignUploaded byssss
- Generative Drafting IN CATIA V5Uploaded byspsharmagn