Professional Documents
Culture Documents
Classification
Classification of Student’s data Using Data Mining
Techniques for Training & Placement Department in
Technical Education
Page | 121
1 2
Samrat Singh, Dr. Vikesh Kumar
1
Ph.d Research Scholar, School of Computer Science & IT, Singhania University,Rajshtan, India
2
Professor & Director, Neelkant Institute of Technology, Meerut (India)
collected from the student’s database, to predict the pre-classified examples to determine the
performance grade. This paper also investigates the set of parameters required for proper discrimination. The
accuracy of Decision tree techniques for predicting student algorithm then encodes these parameters into a model called
performance. a classifier.
B. Clustering
2. Data Mining Definition & Techniques
Clustering can be said as identification of similar classes of Page |
Data mining, also popularly known as Knowledge objects. By using clustering techniques we can further 122
Discovery in Database, refers to extracting or identify dense and sparse regions in object space and can
“mining" knowledge from large amounts of data. Data discover overall distribution pattern and correlations
mining techniques are used to operate on large volumes of among data attributes. Classification approach can also
data to discover hidden patterns and relationships helpful in be used for effective means of distinguishing groups or
decision making. While data mining and knowledge classes of object but it becomes costly so clustering
discovery in database are frequently treated as synonyms, can be used as preprocessing approach for attribute
data mining is actually part of the knowledge discovery subset selection and classification.
process. The sequences of steps identified in extracting
knowledge from data are shown in Figure 1 C. Predication
Regression technique can be adapted for predication.
Knowledge Regression analysis can be used to model the relationship
between one or more independent variables and
dependent variables. In data mining independent variables
are attributes already known and response variables are
what we want to predict. Unfortunately, many real-world
Pattern
Evaluation problems are not simply prediction. Therefore, more
complex techniques (e.g., logistic regression, decision
trees, or neural nets)maybe necessary to forecast future
values. The same model types can often be used for both
regression and classification. For example, the CART
Data Mining
(Classification and Regression Trees) decision tree
algorithm can be used to build both classification trees (to
classify categorical response variables) and regression trees
(to forecast continuous response variables). Neural
Data Selection & networks too can create both classification and regression
Transformation models.
D. Association rule
Association and correlation is usually to find frequent item
Data Cleaning &
Integration set findings among large data sets. This type of finding
helps businesses to make certain decisions, such as
Figure -1 The steps of extracting knowledge from data catalogue design, cross marketing and customer shopping
behavior analysis. Association Rule algorithms need to be
A. Classification able to generate rules with confidence values less than one.
However the number of possible Association Rules for a
Classification is the most commonly applied data mining given dataset is generally very large and a high proportion
technique, which employs a set of pre-classified examples of the rules are usually of little (if any) value.
to develop a model that can classify the population of
records at large. This approach frequently employs decision E. Neural networks
tree or neural network-based classification algorithms. The
data classification process involves learning and Neural network is a set of connected input/output units and
classification .In learning the training data are analyzed by each connection has a weight present with it. During the
classification algorithm. In classification test data are used learning phase, network learns by adjusting weights so as
to estimate the accuracy of the classification rules. If the to be able to predict the correct class labels of the input
accuracy is acceptable the rules can be applied to the new tuples. Neural networks have the remarkable ability to
data tuples. The classifier-training algorithm uses these derive meaning from complicated or imprecise data and
International Journal of Computer Science and Network (IJCSN)
Volume 1, Issue 4, August 2012 www.ijcsn.org ISSN 2277-5420
can be used to extract patterns and detect trends that are too ID3, C4.5 and the Naïve Bayes were used. The outcome
complex to be noticed by either humans or other computer of their results indicated that Decision Tree model had
techniques. These are well suited for continuous valued better prediction than other models. Varsha, Anuj,
inputs and outputs. Neural networks are best at identifying Divakar, R.C Jain [13] applied four classification methods
patterns or trends in data and well suited for prediction or on student academic data i.e Decision tree (ID3),
forecasting needs. Multilayers perceptron, Decision table & Naïve Bayes
classification method. Brijesh kumar & Saurabh Pal [14] Page |
F. Decision Trees study the data set of 50 students from VBS Purvanchal 123
University, Jaunpur (U.P). As there are many approaches
Decision tree is tree-shaped structures that represent sets of that are used for data classification, the decision tree
decisions. These decisions generate rules for the method is used here. Information‟s like Attendance,
classification of a dataset. Specific decision tree methods Class test, Seminar and Assignment marks were collected
include Classification and Regression Trees (CART) and from the student‟s previous database, to predict the
Chi Square Automatic Interaction Detection (CHAID). performance at the end of the semester.
Branch Student’s branch in {CS, IT, EC, EN} users split each node recursively according to decision
B.Tech course. tree learning algorithm. The final result is a decision tree
in which each branch represents a possible scenario of
10th % Percentage of marks { First > 60%
decision and its outcome. The three widely used decision tree
obtained in 10th class Second > 45 & < 60 %
examination. learning algorithms are: ID3, ASSISTANT and C4.5.
Third > 35 & < 45 % }
12th % Percentage of marks { First > 60% C. The ID3 Decision Tree
Page |
obtained in 12th class Second > 50 & < 60 %
examination. ID3 is a simple decision tree learning algorithm developed 124
}
by Ross Quinlan [12]. The basic idea of ID3 algorithm is
B.Tech Percentage of marks { First > 60% to construct the decision tree by employing a top-down,
% obtained in B.Tech Second > 50 & < 60 %
course.
greedy search through the given sets to test each attribute at
} every tree node. In order to select the attribute that is
Final_G Final Grade obtained { Excellent, Good, most useful for classifying a given sets, we introduce a
rade after analysis the Average } metric - information gain.
passing percentage of
10th ,12th ,B.Tech .
5. Result and Discussion
The domain values for some of the variables were defined
for the present investigation as follows: The data set of 40 students used in this study was
obtained from Bansal Institute of Engineering &
Branch – Student’s branch in they are enrolled in B.Tech Technology, Meerut (India) of B.Tech course (Bachelor
Course. Branch split in four classes: CS IT, of Technology).
EC, EN.
TABLE –III
10th % -- Student’s passing percentage (%) in 10th class. RULE SET GENRATED BY DECIESION TREE
10th % is split into three classes: First- >60%
IF 10th % =“First” AND 12th % =“First” AND B.Tech % =
Second - >45% and <60%, Third - >35% “First” THEN Final_Grade = “Excellent”
and < 45%.
IF 10th % =“Second” AND 12th % =“First” AND B.Tech % =
12th % --Student’s passing percentage (%) in 12th class. “First” THEN Final_Grade = “Good”
For admission in B.Tech course minimum
50% marks is compulsory in 12th class. So IF 10th % =“Third” AND 12th % =“First” AND B.Tech % =
12th % is split into two classes: First- >60% “First” THEN Final_Grade = “Average”
Second - >50% and <60%. IF 10th % =“First” AND 12th % =“Second” AND B.Tech % =
“First” THEN Final_Grade = “Good”
B.Tech% --Student’s passing percentage (%) in B.Tech
Course. In B.Tech course Minimum 50% IF 10th % =“Second” AND 12th % =“Second”AND B.Tech %
marks is compulsory for passing. So B.Tech = “First” THEN Final_Grade = “Average”
% is split into two classes: First- >60%
Second - >50% and <60%. IF 10th % =“Third” AND 12th % =“Second” AND B.Tech %
= “First” THEN Final_Grade = “Average”
Final_Grade –The value of final grade (X) will be finding
IF 10th % =“First” AND 12th % =“First” AND B.Tech % =
after analysis of rule sets of Student’s passing
“Second” THEN Final_Grade = “Average”
percentage (%) in 10th (x0), 12th (x1), B.Tech
(x2). The final grade is divided into three IF 10th % =“Second” AND 12th % =“First” AND B.Tech % =
categories: Excellent, Good, Average “Second” THEN Final_Grade = “Average”