
Educational Data Mining

A Practical Aspect of Data Mining


Inderjeet Singh

Guru Nanak Dev Engineering College


inderjeetsinghit@gndec.ac.in

MPIT201226

Submitted To
Prof. Gurdeep Singh
August 30, 2014

Inderjeet Singh ( M.Tech IT (PT))

Educational Data Mining

August 30, 2014

1 / 50

Knowledge Discovery in Databases (KDD)

Figure 1: Pictorial Representation of KDD


Data Mining Defined

Figure 2: Animation Representing Data Mining Process



Data Mining Defined

"Data mining," writes Joseph P. Bigus in his book Data Mining with
Neural Networks, "is the efficient discovery of valuable, non-obvious
information from a large collection of data."

Non-trivial extraction of implicit, previously unknown and potentially useful
information from data.

Exploration and analysis, by automatic or semi-automatic means, of large
quantities of data in order to discover meaningful patterns.


Why Data Mining?

Helpful in market analysis
Helpful in analysing customer patterns for banking systems
Helpful in law enforcement
Helpful in analysing student performance
Helpful to scientists in classifying and segmenting data


Architecture of Data Mining System

Figure 3: Architecture of Data Mining Process



Data Mining Techniques


Classification
Predicts categorical class labels for new tuples/samples, using a model
built from a training set in which each tuple carries a class label.

Association Techniques
Find itemsets whose support exceeds a minimum support and derive rules
whose confidence exceeds a minimum confidence.

Clustering
Divides a dataset into groups whose members are as similar as possible to one
another and as dissimilar as possible from the members of other groups.


Classification

Aim
Predict categorical class labels for new tuples/samples.

Input
A training set of tuples/samples, each with a class label.

Output
A model (classifier) based on the training data set and class labels.


Classification (contd.)
Building Classifier (Model)

Figure 4: Building Classifier: An example



Classification (contd.)
Using Classifier (Model)

Figure 5: Using Classifier on test data: An example



Classification Techniques
ZeroR
ZeroR is the simplest classification method: it relies only on the target and
ignores all predictors, so it always predicts the majority class.

Figure 6: Data Set on which ZeroR Classification Technique is applied
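A minimal sketch of ZeroR in Python, assuming the target column is available as a plain list of labels. The function names and the toy labels are hypothetical and are not the data set of Figure 6.

```python
from collections import Counter

def zero_r_fit(labels):
    """Return the majority class of the training labels."""
    if not labels:
        raise ValueError("ZeroR needs at least one training label")
    return Counter(labels).most_common(1)[0][0]

def zero_r_predict(majority_class, rows):
    """Predict the stored majority class for every row, ignoring its predictors."""
    return [majority_class for _ in rows]

# Hypothetical toy target column (illustration only).
train_labels = ["Yes", "Yes", "No", "Yes", "No"]
model = zero_r_fit(train_labels)
predictions = zero_r_predict(model, [{"outlook": "sunny"}, {"outlook": "rainy"}])

# Training accuracy of ZeroR: the share of the majority class.
baseline_accuracy = train_labels.count(model) / len(train_labels)
print(model, predictions, baseline_accuracy)
```

ZeroR has no predictive power of its own, but its accuracy is a useful baseline against which the other classifiers can be compared.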



Classification Techniques (contd.)


Naive Bayesian
The Naive Bayesian classifier is based on Bayes' theorem with independence
assumptions between predictors.

Algorithm
Bayes' theorem provides a way of calculating the posterior probability,
P(c|x), from P(c), P(x), and P(x|c).
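For reference, the relation between these four quantities is the standard statement of Bayes' theorem. The second line writes out the independence assumption for predictors x_1, ..., x_m; the index m is notation introduced here and does not appear on the slide.

```latex
P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}

P(x \mid c) = \prod_{i=1}^{m} P(x_i \mid c)
```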


Classification Techniques (contd.)


Example of Naive Bayesian Classification
The posterior probability can be calculated by first constructing a frequency
table for each attribute against the target. The class with the highest posterior
probability is the outcome of the prediction.
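The frequency-table procedure can be sketched as follows. This is an illustration under stated assumptions: the attribute names and the six rows are hypothetical, and no smoothing is applied, so an unseen attribute value gives a class zero probability.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    """Build class counts and one frequency table per attribute against the target."""
    class_counts = Counter(row[target] for row in rows)
    # freq[attribute][class][value] = number of training rows
    freq = defaultdict(lambda: defaultdict(Counter))
    for row in rows:
        for attr, value in row.items():
            if attr != target:
                freq[attr][row[target]][value] += 1
    return class_counts, freq

def posterior(class_counts, freq, case):
    """Return the normalized posterior P(c|x) for each class (no smoothing)."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total                        # prior P(c)
        for attr, value in case.items():
            score *= freq[attr][c][value] / n_c    # likelihood P(x_i|c)
        scores[c] = score
    evidence = sum(scores.values())                # P(x), assumed non-zero here
    return {c: s / evidence for c, s in scores.items()}

# Hypothetical toy data (illustration only).
rows = [
    {"attendance": "Good", "assignment": "Yes", "result": "Pass"},
    {"attendance": "Good", "assignment": "No",  "result": "Pass"},
    {"attendance": "Poor", "assignment": "Yes", "result": "Pass"},
    {"attendance": "Poor", "assignment": "Yes", "result": "Fail"},
    {"attendance": "Poor", "assignment": "No",  "result": "Fail"},
    {"attendance": "Good", "assignment": "No",  "result": "Fail"},
]
class_counts, freq = train_naive_bayes(rows, "result")
probs = posterior(class_counts, freq, {"attendance": "Good", "assignment": "Yes"})
prediction = max(probs, key=probs.get)  # class with the highest posterior
print(probs, prediction)
```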


Classification Techniques (contd.)


J48 (A Java Implementation of C4.5, Release 8) (Decision Tree Based
Classification)
A decision tree builds classification or regression models in the form of a
tree structure.
It breaks down a dataset into smaller and smaller subsets while at the
same time an associated decision tree is incrementally developed.
The final result is a tree with decision nodes and leaf nodes.


Classification Techniques (contd.)


Core Algorithm of J48
The core algorithm for building decision trees is ID3, by J. R. Quinlan,
which employs a top-down, greedy search through the space of possible
branches with no backtracking. ID3 uses Entropy and Information Gain to
construct a decision tree.

Entropy
A decision tree is built top-down from a root node and involves partitioning
the data into subsets that contain instances with similar values (homogeneous).
Entropy measures the homogeneity of a subset: it is 0 when the subset is
completely homogeneous.

Information Gain
Constructing a decision tree is all about finding the attribute that returns the
highest information gain (i.e., the most homogeneous branches).


Classification Techniques (contd.)


Types of Entropy
Entropy using the frequency table of one attribute

Figure 7: Example of Entropy calculation for single attribute


Classification Techniques (contd.)


Types of Entropy (contd.)
Entropy using the frequency table of two attributes

Figure 8: Example of Entropy calculation for two attributes



Classification Techniques (contd.)


Calculating Information Gain
Calculate the entropy of the target.
The dataset is then split on the different attributes. The entropy for
each branch is calculated and then added proportionally, to get the total
entropy for the split.

Figure 9: Entropy Attributes

$$ \mathrm{Gain}(T, X) = \mathrm{Entropy}(T) - \mathrm{Entropy}(T, X) $$
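The two steps can be sketched in Python as below. The helper names and the four-row data set are hypothetical; Entropy(T, X) is computed as the size-weighted sum of the branch entropies.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_entropy(rows, attribute, target):
    """Entropy(T, X): branch entropies added in proportion to branch size."""
    branches = {}
    for row in rows:
        branches.setdefault(row[attribute], []).append(row[target])
    n = len(rows)
    return sum(len(b) / n * entropy(b) for b in branches.values())

def information_gain(rows, attribute, target):
    """Gain(T, X) = Entropy(T) - Entropy(T, X)."""
    target_entropy = entropy([row[target] for row in rows])
    return target_entropy - split_entropy(rows, attribute, target)

# Hypothetical toy data (illustration only).
rows = [
    {"attendance": "Good", "result": "Pass"},
    {"attendance": "Good", "result": "Pass"},
    {"attendance": "Poor", "result": "Fail"},
    {"attendance": "Poor", "result": "Pass"},
]
gain = information_gain(rows, "attendance", "result")
print(round(gain, 4))
```

Here the "Good" branch is pure (entropy 0) and the "Poor" branch is evenly divided (entropy 1), so the split removes part, but not all, of the uncertainty in the target.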



Classification Techniques (contd.)


Calculating Information Gain (contd.)
Choose the attribute with the largest information gain as the decision node.
A branch with entropy of 0 is a leaf node. A branch with entropy more
than 0 needs further splitting.
The ID3 algorithm is run recursively on the non-leaf branches.

Figure 10: Decision Rules



Classification Techniques (contd.)


K Nearest Neighbors (KNN)
K nearest neighbors is a simple algorithm that stores all available cases and
classifies new cases based on a similarity measure (e.g., distance functions).

Figure 11: Various Distance Functions



Classification Techniques (contd.)


K Nearest Neighbors (KNN) in action
Consider the following data concerning credit default. Age and Loan are two
numerical variables (predictors) and Default is the target.

We can now use the training set to classify an unknown case using Euclidean
distance.
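A minimal sketch of this classification step, assuming k = 3 and a majority vote. The (Age, Loan) cases below are hypothetical stand-ins, not the table from the slide.

```python
from collections import Counter
from math import dist

def knn_classify(train, query, k=3):
    """Majority vote among the k training cases nearest to `query` (Euclidean)."""
    nearest = sorted(train, key=lambda case: dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical ((age, loan), default) cases (illustration only).
train = [
    ((25, 40000), "N"),
    ((35, 60000), "N"),
    ((45, 80000), "N"),
    ((20, 20000), "N"),
    ((33, 150000), "Y"),
    ((48, 220000), "Y"),
    ((52, 180000), "Y"),
]
prediction = knn_classify(train, (48, 142000), k=3)
print(prediction)
```

With raw values the Loan variable dominates the distance because of its much larger scale, so in practice the predictors are usually standardized before distances are computed.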

Classification Techniques (contd.)


Logistic Regression
Logistic regression predicts the probability of an outcome that can
only have two values (i.e. a dichotomy).
The prediction is based on the use of one or several predictors
(numerical and categorical).
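The prediction step can be sketched as a weighted sum of the predictors passed through the logistic (sigmoid) function. The intercept and coefficients below are hypothetical values chosen for illustration; in practice they would be estimated from training data.

```python
from math import exp

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def predict_probability(intercept, coefficients, features):
    """P(outcome = 1 | features) under a logistic regression model."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(z)

# Hypothetical model: one numerical predictor (marks out of 100) and one
# categorical predictor encoded as 0/1 (assignment submitted).
intercept = -4.0
coefficients = [0.05, 1.2]

p = predict_probability(intercept, coefficients, [60.0, 1.0])
label = 1 if p >= 0.5 else 0  # threshold the probability to get the class
print(round(p, 4), label)
```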


Clustering
Clustering (Defined)
A cluster is a subset of data which are similar.
Clustering (also called unsupervised learning) is the process of dividing a
dataset into groups such that the members of each group are as similar
(close) as possible to one another, and different groups are as dissimilar
(far) as possible from one another.
Two main groups of clustering algorithms are Hierarchical and Partitive.


Clustering Types
Hierarchical Clustering
Hierarchical clustering involves creating clusters that have a
predetermined ordering from top to bottom.
For example, all files and folders on the hard disk are organized in a
hierarchy.
There are two types of hierarchical clustering, Divisive and
Agglomerative.


Clustering Types (contd.)


Partitive Clustering
Partitive clustering is categorized as a prototype-based model, i.e., each
cluster can be represented by a prototype, leading to a concise description
of the original data set. Two types of Partitive clustering are:
K Means Clustering
Self Organizing Maps

K Means Clustering
K-Means clustering intends to partition n objects into k clusters in which
each object belongs to the cluster with the nearest mean.

$$ L = \frac{1}{2n} \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2 \tag{1} $$
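A minimal sketch of K-Means (Lloyd's algorithm) together with the loss of equation (1). As a simplifying assumption, the first k points serve as initial centroids so that the run is deterministic; the six 2-D points are hypothetical.

```python
from math import dist

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, iterations=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(p) for p in points[:k]]  # assumed initialization
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point goes to the cluster with the nearest mean.
        assignment = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignment

def loss(points, centroids, assignment):
    """L = 1/(2n) * sum of squared distances to the assigned centroid."""
    n = len(points)
    return sum(dist(p, centroids[a]) ** 2
               for p, a in zip(points, assignment)) / (2 * n)

# Hypothetical toy data (illustration only).
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, assignment = k_means(points, k=2)
print(centroids, assignment, loss(points, centroids, assignment))
```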


Clustering Types (contd.)


Self Organizing Maps (SOM)
SOM is used for visualization and analysis of high-dimensional datasets.
SOM facilitates the presentation of high-dimensional datasets in a lower-dimensional form.
It is an unsupervised learning algorithm.


Clustering Types (contd.)


Algorithm (SOM)
Initialization of each node's weights with a number between 0 and 1.
Choosing a random input vector from the training dataset.
Calculating the Best Matching Unit (BMU). The distance between the
input vector and the weights of each node is calculated in order to find the
BMU.

$$ \mathrm{Dist} = \sqrt{\sum_{i=0}^{n} (V_i - W_i)^2} \tag{2} $$

V = the current input vector and W = the node's weight vector.
Calculating the size of the neighborhood around the BMU with an exponential decay function.

$$ \sigma(t) = \sigma_0 \exp\left(-\frac{t}{\lambda}\right) $$

Modification of the weights of the BMU and its neighboring nodes.
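One training step of these stages can be sketched as below. The slide does not give the weight-update rule, so the sketch assumes a commonly used form: each node inside the neighborhood moves toward the input by a learning rate times a Gaussian influence of its grid distance to the BMU. The 4x4 grid, the decay of the learning rate, and all parameter values are hypothetical.

```python
import random
from math import dist, exp

def best_matching_unit(weights, v):
    """Grid position of the node whose weight vector is closest to input v."""
    return min(weights, key=lambda pos: dist(weights[pos], v))

def neighbourhood_radius(t, sigma0, lam):
    """Exponential decay of the neighborhood size with iteration t."""
    return sigma0 * exp(-t / lam)

def train_step(weights, v, t, sigma0, lam, lr0):
    """Move the BMU and its grid neighbors toward v (assumed update rule)."""
    bmu = best_matching_unit(weights, v)
    sigma = neighbourhood_radius(t, sigma0, lam)
    lr = lr0 * exp(-t / lam)  # assumed: learning rate decays like the radius
    for pos, w in weights.items():
        grid_d = dist(pos, bmu)
        if grid_d <= sigma:
            # Assumed Gaussian influence: 1 at the BMU, smaller further away.
            influence = exp(-(grid_d ** 2) / (2 * sigma ** 2))
            weights[pos] = [wi + lr * influence * (vi - wi)
                            for wi, vi in zip(w, v)]
    return bmu

random.seed(0)
# Initialization: each weight is a number between 0 and 1.
weights = {(r, c): [random.random() for _ in range(3)]
           for r in range(4) for c in range(4)}
v = [1.0, 0.0, 0.0]  # hypothetical input vector
before = dist(weights[best_matching_unit(weights, v)], v)
bmu = train_step(weights, v, t=0, sigma0=2.0, lam=10.0, lr0=0.5)
after = dist(weights[bmu], v)
print(bmu, before, after)
```

With a learning rate of 0.5 at t = 0, the BMU's distance to the input is halved by the step, and nodes farther away on the grid move less.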


Association Rules
Association Rule (Defined)
Association rule mining finds all sets of items (itemsets) that have support
greater than the minimum support and then uses these large itemsets
to generate the desired rules that have confidence greater than the
minimum confidence.
The lift of a rule is the ratio of the observed support to that expected
if X and Y were independent.
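The three measures can be computed directly from a list of transactions. The sketch below uses a hypothetical five-basket data set and evaluates the rule {bread} -> {milk}.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    """support(X and Y together) / support(X) for the rule X -> Y."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)

def lift(transactions, x, y):
    """Observed support of X and Y together over the support expected under independence."""
    expected = support(transactions, x) * support(transactions, y)
    return support(transactions, set(x) | set(y)) / expected

# Hypothetical market baskets (illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
s = support(transactions, {"bread", "milk"})
c = confidence(transactions, {"bread"}, {"milk"})
l = lift(transactions, {"bread"}, {"milk"})
print(s, c, l)
```

A lift below 1, as in this toy example, indicates that the two items occur together slightly less often than independence would predict.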


Association Rules
Example of Association Rule

Figure 12: Association rules application in market basket analysis



Association Rules
Apriori Algorithm
Candidate itemsets are generated using only the large itemsets of the previous pass.
The large itemset of the previous pass is joined with itself.
Each generated itemset that has a small (i.e., not large) subset is deleted.

Figure 13: Association rules application in market basket analysis
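The join and prune steps of Apriori candidate generation can be sketched as below. As a simplification, the join is written as a set union of any two large itemsets of the previous pass rather than the usual prefix-based join; after pruning, both give the same candidates. The function name and the item labels are hypothetical.

```python
from itertools import combinations

def apriori_gen(large_prev):
    """Candidate k-itemsets from the large (k-1)-itemsets: join, then prune."""
    large_prev = {frozenset(s) for s in large_prev}
    if not large_prev:
        return set()
    k = len(next(iter(large_prev))) + 1
    # Join step: unite pairs of large (k-1)-itemsets that yield a k-itemset.
    joined = {a | b for a in large_prev for b in large_prev if len(a | b) == k}
    # Prune step: delete any candidate with a (k-1)-subset that is not large.
    return {c for c in joined
            if all(frozenset(sub) in large_prev
                   for sub in combinations(c, k - 1))}

# Hypothetical large 2-itemsets (illustration only).
large_2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]
candidates = apriori_gen(large_2)
print(candidates)
```

Here {A, B, D} and {B, C, D} are produced by the join but deleted by the prune step, because {A, D} and {C, D} are not large.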



Visualization
Receiver Operating Characteristic curve (ROC) Analysis
ROC curve is a plot of the true positive rate against the false positive
rate for the different possible cutpoints of a diagnostic test.
The closer the curve follows the left-hand border and then the top
border of the ROC space, the more accurate the test.
The closer the curve comes to the 45-degree diagonal of the ROC
space, the less accurate the test.
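The points of an ROC curve can be obtained by sweeping the cutpoint over the classifier's scores, as sketched below. The scores and labels are hypothetical. The area under the curve (AUC) is added here as a common one-number summary and is not part of the slide.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) at each distinct cutpoint."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical classifier scores and true labels (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
area = auc(points)
print(points, area)
```

An area close to 1 corresponds to a curve hugging the left and top borders, while an area near 0.5 corresponds to the 45-degree diagonal.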

Other Visualization Techniques


Tree Visualization
Graph Visualization


Educational Data Mining (EDM)

EDM (Defined)
Educational data mining can be defined as "An emerging discipline concerned
with developing methods for exploring the unique types of data that come
from educational settings and using those methods to better understand
students, educational system and the settings which they learn in and grow." [1]
Educational Data Mining (EDM) can be defined as an emerging discipline,
concerned with developing methods for exploring the unique types of data
that come from educational settings, and using those methods to better
understand students and associated educational parameters. [2]


EDM- Literature Survey (contd.)

Suman and Pooja Mittal (2014)
Suman and Pooja Mittal (2014) [3] described the various approaches and
techniques of data mining which can be applied to educational data to build
up a new environment, improve the performance of existing data and help
create new predictions on the data. They also performed a comparative study
of classification techniques (Bayes net, naive net, decision tree, etc.) and
clustering techniques (k-means, hierarchical, OPTICS, DBSCAN, etc.).


EDM- Literature Survey (contd.)


Smita Bhanap and Rasika Kulkarni (2013)
Smita Bhanap and Rasika Kulkarni (2013) [4] discussed the capabilities of data
mining in the context of education by presenting a conceptual model for students
as well as teachers.

S. Anupama Kumar and M. N. Vijayalakshmi (2013)
S. Anupama Kumar and M. N. Vijayalakshmi (2013) [1] explored the various
approaches and techniques of data mining which can be applied to educational
data to build up a new environment so as to give new predictions on the data.

Mohd Maqsood Ali (2013)
Mohd Maqsood Ali (2013) [5] examined the role of data mining in the education
sector and laid emphasis on applications of data mining that help institutions
to offer competitive courses and improve their business.

EDM- Literature Survey (contd.)


Manpreet Singh Bhullar (2012)
Manpreet Singh Bhullar (2012) [6] discussed data mining, its different
phases and its advantages, and used the J48 algorithm to predict students' results.

Surjeet Kumar Yadav and Saurabh Pal (2012)
Surjeet Kumar Yadav and Saurabh Pal (2012) [7] discussed prediction
techniques that help to identify weak students and help them to
score better marks, using the C4.5, ID3 and CART decision tree algorithms.

A. Tadiparthi, S. Prasad.R, T. Rao S.N (2011)
Anuradha Tadiparthi, Satya Prasad R. and Tirumala Rao S. N. (2011) [8] proposed
that the data mining technique called association rule mining can be applied to
identify the subjects in which students are weak in the current semester
using the previous semester's results.

EDM- Literature Survey (contd.)


J. Ranjan and R. Ranjan (2010)
Jayanthi Ranjan and Raju Ranjan (2010) [9] discussed the prediction
of higher education system pathways that can result in improved student
performance, retention rate and grant/fund management of an institution
in India.

Delavari et al (2004)
Delavari et al (2004) discussed a new model for using data mining in higher
educational system.

Waiyamai (2003)
Waiyamai (2003) suggested that the use of data mining in education can
help improve the quality of graduate students.


EDM- Literature Survey


Luan (2002)
Luan (2002) studied the impact of data mining on higher education. This
study helped to gain insights about the existing higher education worldwide
and its improvement from data mining perspective.

Ma et al (2000)
Ma et al (2000) visualized that the education domain offers many interesting
and challenging applications for data mining. First, an educational institution
often has many diverse and varied sources of information. There are
the traditional databases (e.g. students' information, teachers' information,
class and schedule information, alumni information), online information
(online web pages and course content pages) and, more recently, multimedia
databases.


Educational Data Mining- EDM


EDM in Action
Brijesh Kumar Baradwaj and Saurabh Pal (2011) [10] demonstrated how, in
an educational system, a student's performance can be determined by applying
data mining techniques to the internal assessment and end semester
examination data, resulting in a model that predicts the weaker students.

Variable   Description               Possible Values
PSM        Previous Semester Marks   First > 60%, Second > 45% & < 60%, Third > 36% & < 45%, Fail < 36%
CTG        Class Test Grade          Poor, Average, Good
SEM        Seminar Performance       Poor, Average, Good
ASS        Assignment                Yes, No
GP         General Proficiency       Yes, No
ATT        Attendance                Poor, Average, Good
LW         Lab Work                  Yes, No
ESM        End Semester Marks        First > 60%, Second > 45% & < 60%, Third > 36% & < 45%, Fail < 36%


EDM in Action (contd.)


Steps for Performing Analysis
1. Data Preparations
2. Data selection and transformation
3. Decision Tree
4. The ID3 Decision Tree (J48 an equivalent in WEKA)
5. Measuring Impurity

$$ \mathrm{Entropy} = -\sum_{j} p_j \log_2 p_j \tag{3} $$

6. Splitting Criteria

$$ \mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, \mathrm{Entropy}(S_v) \tag{4} $$

EDM in Action (contd.)

Split Information

$$ \mathrm{SplitInformation}(S, A) = -\sum_{i=1}^{n} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|} \tag{5} $$

Gain Ratio

$$ \mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInformation}(S, A)} \tag{6} $$
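Equations (4) to (6) can be sketched in Python as follows. The function names are hypothetical, and the six CTG/ESM values are toy data, not the training set used in the study. Split information is computed as the entropy of the attribute's own value distribution, which is what equation (5) expresses.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Equation (3): -sum of p_j * log2(p_j) over the distinct values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(values, labels):
    """Equation (4): Entropy(S) minus the weighted entropy of each subset S_v."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def split_information(values):
    """Equation (5): entropy of the partition induced by the attribute."""
    return entropy(values)

def gain_ratio(values, labels):
    """Equation (6); returns 0.0 when the attribute has a single value."""
    si = split_information(values)
    return gain(values, labels) / si if si > 0 else 0.0

# Hypothetical toy columns (illustration only).
ctg = ["Good", "Good", "Average", "Average", "Poor", "Poor"]
esm = ["First", "First", "First", "Second", "Second", "Second"]
print(gain(ctg, esm), split_information(ctg), gain_ratio(ctg, esm))
```

Dividing by the split information penalizes attributes that fragment the data into many small branches, which plain information gain tends to favor.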

EDM in Action (contd.)


Data Set for Training
S. No.   PSM     CTG       SEM       ASS   GP    ATT       LW    ESM
1.       First   Good      Good      Yes   Yes   Good      Yes   First
2.       First   Good      Average   Yes   No    Good      Yes   First
3.       First   Good      Average   No    No    Average   No    First
4.       First   Average   Good      No    No    Good      Yes   First
...      ...     ...       ...       ...   ...   ...       ...   ...

Entropy Gain Calculation

$$ \mathrm{Entropy}(S) = -p_{first} \log_2(p_{first}) - p_{second} \log_2(p_{second}) - p_{third} \log_2(p_{third}) - p_{fail} \log_2(p_{fail}) = 1.964 $$

EDM in Action (contd.)


Information Gain Calculation

$$ \mathrm{Gain}(S, \mathrm{PSM}) = \mathrm{Entropy}(S) - \frac{|S_{First}|}{|S|} \mathrm{Entropy}(S_{First}) - \frac{|S_{Second}|}{|S|} \mathrm{Entropy}(S_{Second}) - \frac{|S_{Third}|}{|S|} \mathrm{Entropy}(S_{Third}) - \frac{|S_{Fail}|}{|S|} \mathrm{Entropy}(S_{Fail}) $$

Gain Values
Gain           Value
Gain(S,PSM)    0.577036
Gain(S,CTG)    0.515173
Gain(S,SEM)    0.365881
Gain(S,ASS)    0.218628
...            ...

EDM in Action (contd.)


Root Node Selection

Figure 14: Attribute with High Gain as Root Node

Split Information
Split           Value
Split(S,PSM)    1.386579
Split(S,CTG)    1.448442
Split(S,SEM)    1.597734
...             ...

EDM in Action (contd.)


Gain Ratio
Gain Ratio           Value
Gain Ratio(S,PSM)    0.416158
Gain Ratio(S,CTG)    0.355674
Gain Ratio(S,SEM)    0.229
...                  ...
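As a quick arithmetic check, the gain ratios follow from dividing each gain value by the corresponding split information value reported on the earlier slides. The dictionary names below are illustrative only.

```python
# Values copied from the Gain Values and Split Information tables.
gains = {"PSM": 0.577036, "CTG": 0.515173, "SEM": 0.365881}
splits = {"PSM": 1.386579, "CTG": 1.448442, "SEM": 1.597734}

# GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)
ratios = {attr: gains[attr] / splits[attr] for attr in gains}
best = max(ratios, key=ratios.get)

for attr, ratio in ratios.items():
    print(attr, round(ratio, 6))
print("highest gain ratio:", best)
```

Among the three attributes listed, PSM has the highest gain ratio as well as the highest gain, which is consistent with its selection as the root node.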

Decision Rules
IF PSM = 'First' AND ATT = 'Good' AND CTG = 'Good' OR
'Average' THEN ESM = 'First'
IF PSM = 'First' AND CTG = 'Good' AND ATT = 'Good' OR
'Average' THEN ESM = 'First'
IF PSM = 'Second' AND ATT = 'Good' AND ASS = 'Yes' THEN
ESM = 'First' ...


WEKA- Data Mining Tool


Weka (pronounced Weh-Kuh) workbench contains a collection of visualization tools and algorithms for data analysis and predictive modeling together
with graphical user interfaces for easy access to this functionality.

Figure 15: Weka Explorer



WEKA- Data Mining Tool (contd.)


Features of WEKA
Free availability under the GNU General Public License
A comprehensive collection of data preprocessing and modeling
techniques
Ease of use due to its graphical user interfaces
The various panels in WEKA are:

The Preprocess panel has facilities for importing data from a database,
a CSV file, etc.
The Classify panel enables the user to apply classification and regression
algorithms
The Associate panel provides access to association rule learners
The Cluster panel gives access to the clustering techniques in Weka
The Select attributes panel provides algorithms for identifying the most
predictive attributes in a dataset
The Visualize panel allows the user to visualize data with various techniques

References I
[1] S. A. Kumar and M. N. Vijayalakshmi, "Relevance of data mining
techniques in edification sector," International Journal of Machine
Learning and Computing, vol. 3, no. 3, pp. 4-6, 2013.
[2] M. Goyal and R. Vohra, "Applications of data mining in higher
education," International Journal of Computer Science, vol. 9, no. 1,
pp. 113-120, 2012.
[3] Suman and P. Mittal, "A comparative study on role of data mining
techniques in education: A review," International Journal of Emerging
Trends & Technology in Computer Science (IJETTCS), vol. 3, pp.
65-69, 2014.
[4] S. Bhanap and R. Kulkarni, "Student - teacher model for higher
education system," Current Trends in Technology and Science, vol. II,
pp. 258-261, 2013.


References II
[5] M. M. Ali, "Role of data mining in education sector," International
Journal of Computer Science and Mobile Computing, vol. 2, pp.
374-383, 2013.
[6] M. S. Bhullar and A. Kaur, "Use of data mining in education sector,"
Proceedings of the World Congress on Engineering and Computer
Science, vol. I, pp. 6-9, 2012.
[7] S. K. Yadav and S. Pal, "Data mining: A prediction for performance
improvement of engineering students using classification," World of
Computer Science and Information Technology Journal (WCSIT),
vol. 2, pp. 51-56, 2012.
[8] Anuradha.Tadiparthi, S. Prasad.R, and T. R. S.N, "Identifying weak
subjects using association rule mining," International Journal of
Scientific & Engineering Research, vol. 2, pp. 1-3, 2011.


References III

[9] J. Ranjan and R. Ranjan, "Application of data mining techniques in
higher education in India," International Conference On Innovation In
Redefining Business Horizons, Institute of Management Technology,
Ghaziabad, 2008.
[10] B. K. Baradwaj and S. Pal, "Mining educational data to analyze
students' performance," International Journal of Advanced Computer
Science and Applications, vol. 2, no. 6, pp. 63-69, 2011.


Thank You for Listening!


This presentation was made using LaTeX
