(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 11, November 2011
Visualization of MUSTAS Model using ECHAID
G. Paul Suthan
Head, Department of Computer Science, CSI Bishop Appasamy College, Race Course, Coimbatore, Tamil Nadu 641018, India
gpsuthan@hotmail.com
Lt. Dr. Santosh Baboo
Reader, PG and Research Department of Computer Application, DG Vaishnav College, Arumbakkam, Chennai 600106, Tamil Nadu, India
Santos2001@sify.com
Abstract
Educational assessment provides important insight into a student. In recent years there has been increasing interest in Educational Data Mining (EDM), which helps to explore student data from different perspectives. In this context, we introduced a new model called MUSTAS to assess a student's attitude in three dimensions: self assessment, institutional assessment, and external assessment. The model expresses student performance in three grades: poor, fair, and good. The final visualization is generated by the ECHAID algorithm. In this paper, we present the model and its performance on a private student dataset that we collected. Our model yields interesting insights about students and can be used to identify their performance grade.
Keywords: Educational Data Mining, MUSTAS, CHAID prediction, Latent Class Analysis, Hybrid CHAID, ECHAID
I. INTRODUCTION
In recent years, researchers from a variety of disciplines (including computer science, statistics, data mining, and education) have started to investigate how education can be improved using data mining concepts. As a result, Educational Data Mining (EDM) has emerged. EDM is concerned with developing methods for exploring the unique types of data that come from educational settings. Data mining, also called Knowledge Discovery in Databases (KDD), is the field of discovering novel and potentially useful information from large amounts of data (Witten and Frank [19]). It has been proposed that educational data mining methods often differ from standard data mining methods because they must explicitly account for educational data (Baker [3]). For this reason, the use of models in such studies is increasingly common, as suggested by Barnes [4] and Pavlik et al. [14]. Traditional data mining methods are constructed in a generic pattern that suits any kind of application that fits the specified method. Hence, existing techniques may be useful for exploring the data, but they do not fulfill specific or customized requirements.

Education-specific mining techniques can help to improve instructional design, the understanding of students' attitudes, academic performance appraisal, and so on. In this scenario, traditional mining algorithms need to be adapted to the educational context.

II. RESEARCH BACKGROUND

Modern educational and psychological assessment is dominated by two mathematical models, Factor Analysis (FA) and Item Response Theory (IRT). FA operates at the level of a test, i.e., a collection of questions (items). The basic assumption of FA is that the test score of individual $i$ on test $j$, written here as $x_{ij}$, is determined by

$$x_{ij} = \sum_{k} w_{kj} f_{ik} + e_{ij} \qquad (1)$$

where the $f_{ik}$ terms represent the extent to which individual $i$ has underlying ability $k$, and the $w_{kj}$ terms represent the extent to which ability $k$ is required for test $j$. The $e_{ij}$ term is a residual, which is to be minimized. The weights of the abilities required for the test, i.e. the $\{w_{kj}\}$, are constant across individuals. This amounts to an assumption that all individuals deploy their abilities in the same way on each test. Assessments are made in an attempt to determine students' $f_{ik}$ values, i.e. a student's place or position on the underlying ability scales.

IRT operates at the item level within a test. Consider the $i$th item on a test. This item is assumed to have a characteristic difficulty level, $B_i$. Each examinee is assumed to have a skill level $\theta$ on the same scale. In the basic three-parameter IRT model, the probability that a person with ability $\theta$ will get item $i$ correct is

$$P_i(\theta) = c_i + (1 - c_i)\,\frac{e^{D a_i (\theta - B_i)}}{1 + e^{D a_i (\theta - B_i)}} \qquad (2)$$

where $D$ is a constant scaling factor, $a_i$ is an item discrimination parameter, and $c_i$ is a “correction for guessing” parameter. A consequence of this model is that the relative
73http://sites.google.com/site/ijcsis/ISSN 1947-5500
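As an illustration of the two models described in the Research Background, the following minimal Python sketch evaluates a factor-analytic test score of the form sum_k w_kj * f_ik + e_ij and a three-parameter logistic probability of the form c + (1 - c) / (1 + exp(-D * a * (theta - B))). The function names, the example numbers, and the default D = 1.7 (a conventional scaling value) are assumptions made here for illustration; they are not taken from the paper or from the MUSTAS model.

```python
import math
from typing import Sequence


def fa_score(abilities: Sequence[float],
             weights: Sequence[float],
             residual: float = 0.0) -> float:
    """Factor-analytic score for one individual i on one test j.

    abilities: the f_ik values (how much of each ability k the person has)
    weights:   the w_kj values (how much test j requires each ability k)
    residual:  the e_ij term
    """
    if len(abilities) != len(weights):
        raise ValueError("abilities and weights must have the same length")
    return sum(w * f for w, f in zip(weights, abilities)) + residual


def irt_3pl(theta: float, a: float, b: float, c: float, D: float = 1.7) -> float:
    """Three-parameter logistic IRT probability of a correct response.

    theta: examinee skill level
    a:     item discrimination
    b:     item difficulty (B_i in the text)
    c:     guessing parameter (lower asymptote)
    D:     scaling constant; 1.7 is an assumed conventional default
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


if __name__ == "__main__":
    # Hypothetical person with two abilities taking a hypothetical test.
    print(fa_score(abilities=[1.0, 2.0], weights=[0.5, 0.25], residual=0.1))
    # When skill equals difficulty, the probability is halfway between c and 1.
    print(irt_3pl(theta=0.0, a=1.0, b=0.0, c=0.2))
```

In this sketch the weights are passed per test rather than per person, which mirrors the assumption that the $w_{kj}$ are constant across individuals; the guessing parameter sets the floor that the probability approaches for very low skill levels.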