Visualization of MUSTAS Model using ECHAID

Published by: ijcsis on Feb 19, 2012
Copyright:Attribution Non-commercial

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 11, November 2011

Visualization of MUSTAS Model using ECHAID

G. Paul Suthan
Head, Department of Computer Science
CSI Bishop Appasamy College
Race Course, Coimbatore, Tamil Nadu 641018, India
gpsuthan@hotmail.com

Lt. Dr. Santosh Baboo
Reader, PG and Research Department of Computer Application
DG Vishnav College, Arumbakkam
Chennai 600106, Tamil Nadu, India
Santos2001@sify.com
Abstract: Educational assessment provides important insight into the student. In recent years there has been increasing interest in Educational Data Mining (EDM), which helps to explore student data from different perspectives. In this context, we introduce a new model called MUSTAS to assess the student's attitude in three dimensions: self assessment, institutional assessment and external assessment. The model expresses student performance in three grades: poor, fair, and good. The final visualization is generated through the ECHAID algorithm. In this paper, we present the model and its performance on a private student dataset that we collected. Our model shows interesting insights about the student and can be used to identify their performance grade.

Keywords: Educational Data Mining, MUSTAS, CHAID prediction, Latent Class Analysis, Hybrid CHAID, ECHAID
I. INTRODUCTION

In the past years, researchers from a variety of disciplines (including computer science, statistics, data mining, and education) have started to investigate how education can be improved using data mining concepts. As a result, Educational Data Mining (EDM) has emerged. EDM is concerned with developing methods for exploring the unique types of data that come from educational settings. Data mining, also called Knowledge Discovery in Databases (KDD), is the field of discovering novel and potentially useful information from large amounts of data, as described by Witten and Frank [19]. Baker [3] has proposed that educational data mining methods often differ from standard data mining methods because of the need to explicitly account for educational data. For this reason, it is increasingly common to see models tailored to educational data, as suggested by Barnes [4] and Pavlik et al. [14]. Traditional data mining methods follow a generic pattern intended to suit any kind of application. Existing techniques may therefore be useful for exploring the data, but they do not fulfill specific or customized requirements.

Education-specific mining techniques can help to improve instructional design, the understanding of students' attitudes, academic performance appraisal and so on. In this scenario, traditional mining algorithms need to be adapted to the educational context.

II. RESEARCH BACKGROUND

Modern educational and psychological assessment is dominated by two mathematical models, Factor Analysis (FA) and Item Response Theory (IRT). FA operates at the level of a test, i.e., a collection of questions (items). The basic assumption of FA is that the test score of individual $i$ on test $j$, written here as $x_{ij}$, is determined by

$$x_{ij} = \sum_{k} w_{kj} f_{ik} + e_{ij} \qquad (1)$$

where the $f_{ik}$ terms represent the extent to which individual $i$ has underlying ability $k$, and the $w_{kj}$ terms represent the extent to which ability $k$ is required for test $j$. The $e_{ij}$ term is a residual which is to be minimized. The weights of the abilities required for the test, i.e. the $\{w_{kj}\}$, are constant across individuals. This amounts to an assumption that all individuals deploy their abilities in the same way on each test. Assessments are made in an attempt to determine students' $f_{ik}$ values, i.e. a student's place or position on the underlying ability scales.

IRT operates at the item level within a test. Consider the $i$th item on a test. This item is assumed to have a characteristic difficulty level, $B_i$. Each examinee is assumed to have skill level $\theta$ on the same scale. In the basic three-parameter IRT model, the probability that a person with ability $\theta$ will get item $i$ correct is

$$P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-D a_i (\theta - B_i)}} \qquad (2)$$

where $D$ is a constant scaling factor, $a_i$ is an item discrimination parameter and $c_i$ is a "correction for guessing" parameter. A consequence of this model is that the relative order of difficulty for any pair of items on a test must be the same for all individuals.

Neither of these commonly used models allows for idiosyncratic patterns of thought, where different people attack problems in different ways. More specialized models can describe mixtures of strategies, as mentioned by Huang [9]. However, many educational theories do not easily fit the assumptions of factor analytic or IRT models. Much of the motivation behind diagnostic assessment is to identify the different strategies that might change the relative order of difficulty of items.

The problem of how best to mathematically model a knowledge space is open, and the answer may be domain-dependent. There is evidence suggesting that facets (fine-grained correct, partially correct, and incorrect understandings) may have a structure in some domains that can be modeled using a partial credit model, as described by Wright and Masters [22]. Using this model, multiple choice responses are ordered in difficulty on a linear scale, allowing one to rank students by ability based on their responses. This implies that the relative difficulty of items in some interesting domains may indeed be the same for all students, as stated by Scalise et al. [17]. Thus, modeling can be improved by identifying this linear structure of concepts. Wilson [20] and Wilson and Sloane [21] mention that each item response would have its own difficulty on a linear scale, providing a clear measure of student and classroom progress, e.g., a learning progression where content is mapped to an underlying continuum. But building this knowledge representation is an extremely large endeavor, especially in subject areas where little research has been done into the ideas students have before instruction that affect their understanding, or into what dimensional structure is appropriate to represent them.
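As a minimal sketch of the two models introduced at the start of this section, the Python snippet below evaluates a factor-analytic test score and a three-parameter IRT response probability. It assumes the standard forms of both models (a weighted sum of abilities plus a residual, and the three-parameter logistic curve). The function names, the default D = 1.7 and all example numbers are illustrative assumptions, not values taken from this paper.

```python
import math


def fa_test_score(abilities, weights, residual=0.0):
    """Factor-analytic score: sum over abilities k of w_kj * f_ik, plus residual e_ij."""
    if len(abilities) != len(weights):
        raise ValueError("abilities and weights must have the same length")
    return sum(w * f for w, f in zip(weights, abilities)) + residual


def irt_3pl_probability(theta, a, b, c, D=1.7):
    """Probability of a correct answer under the three-parameter logistic model.

    theta: examinee skill, a: item discrimination, b: item difficulty,
    c: guessing parameter, D: scaling constant (1.7 is a common convention,
    assumed here).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical student with two underlying abilities, and a test that weights them.
score = fa_test_score(abilities=[0.8, 0.3], weights=[0.6, 0.4])
print("FA score:", score)

# Hypothetical item: discrimination 1.0, difficulty 0.0, guessing 0.2.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(irt_3pl_probability(theta, a=1.0, b=0.0, c=0.2), 3))
```

When the skill level equals the item difficulty, the probability lies halfway between the guessing parameter and 1, which is a quick sanity check on any implementation of this curve.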
A correction for guessing assumes that all answer options are equally plausible. If one option made no sense, even the lowest-ability person would be able to discard it, so IRT parameter estimation methods take this into account and estimate $c_i$ from the observed data. In contrast, automatically constructed knowledge spaces may lead to overestimation of knowledge states. We have therefore focused on creating a unique framework based on an exploratory data-mining approach.

III. EDUCATIONAL DATA MINING (EDM)

In recent years, advances in computing and information technologies have radically expanded the data available to researchers and professionals in a wide variety of domains. EDM has emerged over the past few years, and its community has actively engaged in creating large repositories. The increase in instrumented educational software and in databases of student test scores has created large data repositories reflecting how students learn. EDM focuses on computational approaches for using those data to address important educational questions.

Erdogan and Timor [5] used educational data mining to identify and enhance educational processes, which can improve decision making. Henrik [8] concluded that clustering was effective in finding hidden relationships and associations between different categories of students. Walters and Soyibo [23] conducted a study to determine Jamaican high school students' (n = 305) level of performance on five integrated science process skills, with performance linked to gender, grade level, school location, school type, student type, and socioeconomic background (SEB). The results revealed a significant positive relationship between the academic performance of the student and the nature of the school.

Khan [24] conducted a performance study on 400 students, comprising 200 boys and 200 girls, selected from the senior secondary school of Aligarh Muslim University, Aligarh, India. The main objective was to establish the prognostic value of different measures of cognition, personality and demographic variables for success at higher secondary level in the science stream. The selection was based on a cluster sampling technique in which the entire population of interest was divided into groups, or clusters, and a random sample of these clusters was selected for further analyses.
It was found that girls with high socio-economic status had relatively higher academic achievement in the science stream, and boys with low socio-economic status had relatively higher academic achievement in general.

Hijazi and Naqvi [18] conducted a study on student performance by selecting a sample of 300 students (225 males, 75 females) from a group of colleges affiliated to Punjab University of Pakistan. The hypothesis framed was: "Student's attitude towards attendance in class, hours spent in study on daily basis after college, students' family income, student mother's age and mother's education are significantly related with student performance". By means of simple linear regression analysis, it was found that factors such as mother's education and student's family income were highly correlated with student academic performance.

A. L. Kristjansson, Sigfusdottir and Allegrante [2] made a study to estimate the relationship between health behaviors, body mass index (BMI), self-esteem and the academic achievement of adolescents. The authors analyzed survey data related to 6,346 adolescents in Iceland and found that factors such as lower BMI, physical activity, and good dietary habits were well associated with higher academic achievement. Therefore the identified students were recommended a diet to suit their needs.

Cortez and Silva [15] attempted to predict failure in the two core classes (Mathematics and Portuguese) of students from two secondary schools in the Alentejo region of Portugal by utilizing 29 predictive variables. Four data mining algorithms, Decision Tree (DT), Random Forest (RF), Neural Network (NN) and Support Vector Machine (SVM), were applied to a data set of 788 students who appeared in the 2006 examination. It was reported that the DT and NN algorithms had predictive accuracies of 93% and 91% respectively for the two-class dataset (pass/fail). It was also reported that both DT and NN algorithms had a predictive accuracy of 72% for a four-class dataset.

IV. MUSTAS MODEL

The Multidimensional Students Assessment (MUSTAS) framework is a novel model, which consists of demographic factors, academic performance of the student and dimensional factors. The dimensional factors are further subdivided into three dimensions: self assessment, institutional assessment and external assessment. The main objective of this framework is to identify the contribution of the selected dimensions to the academic performance of the student, which helps teachers, parents and management to understand the student's pattern. Understanding the pattern may help to redefine the education method, give additional care to weaknesses, and promote students' abilities.

A general form of the Multidimensional Random Coefficient Multinomial Logit Model was fitted, with between-item dimensionality as described by Adams, Wilson & Wang [1]. This means each item was loaded on a single latent dimension only, so that different dimensions contained different items. A three-dimensional model, a two-dimensional model and a one-dimensional model were fitted in sequence. The three-dimensional model assigned items into three groups. Group 1 consisted of items that had a heavy reading and extracting information component. Group 2 consisted of items that were essentially common-sense mathematics, or non-school mathematics. Group 3 consisted of the rest of the item pool, mostly items that were typically school mathematics, as well as logical reasoning items. In this item response theory (IRT) model, Dimensions 3 and 4 of the framework, mathematics concepts and computation skills, had been combined to form one IRT dimension.

The MUSTAS model was built with the backbone of CHAID and LCM. Chi-squared Automatic Interaction Detection (CHAID) analysis, first proposed by Kass in 1980 [6], is one of the post-hoc predictive segmentation methods. CHAID, which uses decision tree algorithms, is an exploratory method for segmenting a population into two or more exclusive and exhaustive subgroups by maximizing the significance of the chi-square, based on categories of the best predictor of the dependent variable. Segments obtained from CHAID analysis differ from cluster-type models because the CHAID method, which is derived to be predictive of a criterion variable, is defined by combinations of predictor variables, as noted by Magidson [12].

Latent Class (LC) modeling was initially introduced by Lazarsfeld and Henry [10] as a way of formulating latent attitudinal variables from dichotomous survey items. In contrast to factor analysis, which posits continuous latent variables, LC models assume that the latent variable is categorical, and their areas of application are more wide ranging. In recent years, LC models have been extended to include observable variables of mixed scale type (nominal, ordinal, continuous and counts) and covariates, and to deal with sparse data, boundary solutions, and other problem areas.
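The predictor-selection step at the heart of CHAID, as described above, can be sketched in plain Python. This is an illustrative simplification and not the authors' ECHAID implementation: it only picks the predictor whose chi-square test against the dependent variable is most significant, and it omits the category merging and Bonferroni adjustment of full CHAID. The toy records, column names and helper names are hypothetical.

```python
import math
from collections import Counter


def chi_square_stat(xs, ys):
    """Pearson chi-square statistic and degrees of freedom for two categorical lists."""
    n = len(xs)
    row, col = Counter(xs), Counter(ys)
    observed = Counter(zip(xs, ys))
    stat = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row) - 1) * (len(col) - 1)
    return stat, dof


def chi_square_p_value(stat, dof):
    """Upper-tail chi-square probability via the series for the
    regularized lower incomplete gamma function."""
    if dof <= 0 or stat <= 0:
        return 1.0
    a, x = dof / 2.0, stat / 2.0
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1))
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total)


def best_predictor(records, predictors, target):
    """Return (name, p_value) of the predictor most significantly associated with target."""
    ys = [r[target] for r in records]
    p_values = {}
    for name in predictors:
        xs = [r[name] for r in records]
        stat, dof = chi_square_stat(xs, ys)
        p_values[name] = chi_square_p_value(stat, dof)
    best = min(p_values, key=p_values.get)
    return best, p_values[best]


# Hypothetical toy data: grade is the dependent variable.
records = [
    {"self_assessment": "low", "gender": "m", "grade": "poor"},
    {"self_assessment": "low", "gender": "f", "grade": "poor"},
    {"self_assessment": "medium", "gender": "m", "grade": "fair"},
    {"self_assessment": "medium", "gender": "f", "grade": "fair"},
    {"self_assessment": "high", "gender": "m", "grade": "good"},
    {"self_assessment": "high", "gender": "f", "grade": "good"},
]

name, p_value = best_predictor(records, ["self_assessment", "gender"], "grade")
print(name, p_value)
```

In a full CHAID tree, the selected predictor would split the population into subgroups and the same selection would be repeated inside each subgroup until no predictor is significant.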
Figure 1: MUSTAS Model
Figure 1 exhibits the proposed model of the student assessment strategy. Academic performance and assessment factors are combined as the General Assessment Classification (GAC), which is visualized through the demographic factors of the students. The GAC can be treated as a parameter that acts as a rule-based classification.

AMOS is an application for structural equation modeling, multi-level structural equation modeling, non-linear modeling and generalized linear modeling, and can be used to fit measurement models to data. In the subsequent sections, we illustrate this feature by fitting a measurement model to an SPSS data set using a path diagram.
[Path diagram: latent factors Self Assessment, Institutional Assessment and External Assessment, with indicators SELF1 to SELF5, INST1 to INST5 and EXT1 to EXT5]
Figure 2. Path Diagram of MUSTAS
The path diagram shown in Figure 2 exhibits the pattern of the MUSTAS model, which yields $R^2 = 0.802$. The LC analysis used to identify segments based on academic performance