E. Cantú-Paz et al. (Eds.): GECCO 2003, LNCS 2724, pp. 2252–2263, 2003. © Springer-Verlag Berlin Heidelberg 2003
Using Genetic Algorithms for Data Mining Optimization in an Educational Web-Based System
Behrouz Minaei-Bidgoli and William F. Punch
Genetic Algorithms Research and Applications Group (GARAGe)
Department of Computer Science & Engineering
Michigan State University
2340 Engineering Building
East Lansing, MI 48824
{minaeibi,punch}@cse.msu.edu
http://garage.cse.msu.edu
Abstract.
This paper presents an approach for classifying students in order to predict their final grade based on features extracted from logged data in an education web-based system. A combination of multiple classifiers leads to a significant improvement in classification performance. Through weighting the feature vectors using a Genetic Algorithm we can optimize the prediction accuracy and get a marked improvement over raw classification. It further shows that when the number of features is few, feature weighting works better than just feature selection.
1 Statement of Problem
Many leading educational institutions are working to establish an online teaching and learning presence. Several systems with different capabilities and approaches have been developed to deliver online education in an academic setting. In particular, Michigan State University (MSU) has pioneered some of these systems to provide an infrastructure for online instruction. The research presented here was performed on a part of the latest online educational system developed at MSU, the Learning Online Network with Computer-Assisted Personalized Approach (LON-CAPA).
In LON-CAPA (see http://www.lon-capa.org), we are involved with two kinds of large data sets: 1) educational resources such as web pages, demonstrations, simulations, and individualized problems designed for use on homework assignments, quizzes, and examinations; and 2) information about users who create, modify, assess, or use these resources. In other words, we have two ever-growing pools of data.
We have been studying data mining methods for extracting useful knowledge from these large databases of students using online educational resources and their recorded paths through the web of educational resources. In this study, we aim to answer the following research questions: Can we find classes of students? In other words, do there exist groups of students who use these online resources in a similar way? If so, can we identify that class for any individual student? With this information, can we help a student use the resources better, based on the usage of the resource by other students in their groups?
We hope to find similar patterns of use in the data gathered from LON-CAPA, and eventually be able to make predictions as to the most-beneficial course of studies for each learner based on their present usage. The system could then make suggestions to the learner as to how to best proceed.
2 Map the Problem to Genetic Algorithm
Genetic Algorithms have been shown to be an effective tool to use in data mining and pattern recognition [7], [10], [6], [16], [15], [13], [4]. An important aspect of GAs in a learning context is their use in pattern recognition. There are two different approaches to applying GA in pattern recognition:
1. Apply a GA directly as a classifier. Bandyopadhyay and Murthy in [3] applied GA to find the decision boundary in N dimensional feature space.
2. Use a GA as an optimization tool for resetting the parameters in other classifiers. Most applications of GAs in pattern recognition optimize some parameters in the classification process. Many researchers have used GAs in feature selection [2], [9], [12], [18]. GAs have been applied to find an optimal set of feature weights that improve classification accuracy. First, a traditional feature extraction method such as Principal Component Analysis (PCA) is applied, and then a classifier such as k-NN is used to calculate the fitness function for GA [17], [19]. Combination of classifiers is another area that GAs have been used to optimize. Kuncheva and Jain in [11] used a GA for selecting the features as well as selecting the types of individual classifiers in their design of a Classifier Fusion System. GA is also used in selecting the prototypes in the case-based classification [20].
In this paper we will focus on the second approach and use a GA to optimize a combination of classifiers. Our objective is to predict the students’ final grades based on their web-use features, which are extracted from the homework data. We design, implement, and evaluate a series of pattern classifiers with various parameters in order to compare their performance on a dataset from LON-CAPA. Error rates for the individual classifiers, their combination and the GA optimized combination are presented.
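To make the second approach concrete, the following is a minimal sketch of a GA that searches for feature weights, using the leave-one-out accuracy of a k-NN classifier as the fitness function. It is not the authors' implementation: the function names, the GA operators (truncation selection with elitism, uniform crossover, Gaussian mutation), and all parameter values are illustrative assumptions.

```python
import random

def weighted_knn_accuracy(weights, X, y, k=3):
    """Leave-one-out accuracy of k-NN under a feature-weighted squared distance."""
    correct = 0
    for i, xi in enumerate(X):
        # Weighted squared distance from sample i to every other sample.
        neighbours = sorted(
            (sum(w * (a - b) ** 2 for w, a, b in zip(weights, xi, xj)), y[j])
            for j, xj in enumerate(X) if j != i
        )
        votes = [label for _, label in neighbours[:k]]
        if max(set(votes), key=votes.count) == y[i]:
            correct += 1
    return correct / len(X)

def evolve_weights(X, y, pop_size=12, generations=10, mutation_rate=0.2, seed=0):
    """Search for feature weights in [0, 1] that maximise k-NN accuracy (hypothetical helper)."""
    rng = random.Random(seed)
    n_features = len(X[0])

    def fitness(w):
        return weighted_knn_accuracy(w, X, y)

    # Assumption: seed the population with the unweighted vector, so the
    # result can never score below plain (unweighted) k-NN on this data.
    population = [[1.0] * n_features]
    population += [[rng.random() for _ in range(n_features)]
                   for _ in range(pop_size - 1)]

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from either parent.
            child = [rng.choice(genes) for genes in zip(p1, p2)]
            # Gaussian mutation, clamped to [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.2)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children

    return max(population, key=fitness)
```

A weight near zero effectively removes a feature, so feature selection is the special case in which weights are restricted to 0 or 1. Note that this sketch measures fitness on the same samples it optimizes over; an honest error estimate would need held-out data.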
 
2.1 Dataset and Class Labels
As test data we selected the student and course data of a LON-CAPA course, PHY183 (Physics for Scientists and Engineers I), which was held at MSU in spring semester 2002. This course integrated 12 homework sets including 184 problems, all of which are online. About 261 students used LON-CAPA for this course. Some of the students dropped the course after doing a couple of homework sets, so they do not have any final grades. After removing those students, there remained 227 valid samples. The grade distribution of the students is shown in Fig. 1.
[Figure: bar chart titled "Grade Distribution"; axes: grade (0.0 to 4.0) and # of students (0 to 60)]
Fig. 1. Graph of distribution of grades in course PHY183 SS02
We can group the students regarding their final grades in several ways, 3 of which are:
1. Let the 9 possible class labels be the same as students’ grades, as shown in table 1.
2. We can label the students in relation to their grades and group them into three classes, “high” representing grades from 3.5 to 4.0, “middle” representing grades from 2.5 to 3, and “low” representing grades less than 2.5.
3. We can also categorize the students with one of two class labels: “Passed” for grades higher than 2.0, and “Failed” for grades less than or equal to 2.0, as shown in table 3.
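The three groupings listed above can be written as simple mapping functions. This is an illustrative sketch only: the function names are hypothetical, and it assumes grades fall on the 0.5-step scale from 0.0 to 4.0 used in Table 1.

```python
def nine_class_label(grade):
    """Class 1..9 for grades 0.0, 0.5, ..., 4.0 (class 1 = 0.0, class 9 = 4.0)."""
    return int(round(grade * 2)) + 1

def three_class_label(grade):
    """'high' for 3.5 to 4.0, 'middle' for 2.5 to 3.0, 'low' below 2.5."""
    if grade >= 3.5:
        return "high"
    if grade >= 2.5:
        return "middle"
    return "low"

def two_class_label(grade):
    """'Passed' for grades above 2.0, 'Failed' for grades of 2.0 or less."""
    return "Passed" if grade > 2.0 else "Failed"
```

For example, a grade of 2.0 maps to class 5 in the 9-class scheme, to "low" in the 3-class scheme, and to "Failed" in the 2-class scheme.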
Table 1. Selecting 9 class labels regarding students’ grades in course PHY183 SS02
Class   Grade   Student #   Percentage
1       0.0     2           0.9%
2       0.5     0           0.0%
3       1.0     10          4.4%
4       1.5     28          12.4%
5       2.0     23          10.1%
6       2.5     43          18.9%
7       3.0     52          22.9%
8       3.5     41          18.0%
9       4.0     28          12.4%
