
International Journal of Advance Foundation and Research in Computer (IJAFRC)

Volume 2, Issue 7, July - 2015. ISSN 2348 4853

A Short Study On Effective Predictors Of Academic Performance Based On Course Evaluation
1Mudasir Ashraf, 2Majid Zaman and 3Dr. Muheet Ahmad

1M.Phil Scholar, Department of Computer Science, University of Kashmir, J&K, India
2Scientist B, Directorate of IT&SS, University of Kashmir, J&K, India
3Scientist D, Department of Computer Science, University of Kashmir, J&K, India
1mudasir04@gmail.com, 2zamanmajid@gmail.com, 3ermuheet@kashmiruniversity.ac.in
ABSTRACT
One of the most contemporary debates in higher education centers on academic success, performance
and quality. The aim of this research work is to identify the key factors within a specific course, that is,
the variables or predictors that are likely to affect academic performance and could be used to optimize
it, using statistical techniques such as discriminant analysis and ANOVA. This study, conducted at the
University of Kashmir, examines the relationship between students' overall performance in the final year
of a bachelor's course (Arts) and the individual predictors (subjects) of the course. The data comprise
one hundred and twenty-nine (129) students.
Index Terms: KDD, Data mining, EDM, Classification, Prediction, Discriminant.

I. INTRODUCTION

Educational data mining (EDM) is a budding and promising application of data mining. It is concerned
with developing methods that can explore the distinctive types of data that come from educational
settings, and with using these methods to better understand students and the ways in which they
learn [1]. Educational data mining is also concerned with investigating, developing and applying
automated methods on large collections of educational data to extract patterns that would otherwise be
difficult to comprehend and infeasible to analyze because of the massive size of the data in which they
reside [2].
Data mining is sometimes called knowledge discovery in databases (KDD); it can be used to find the
hidden knowledge and relationships that exist in large amounts of educational data. The knowledge
discovery process that extracts knowledge from databases is also known as data mining [3]. A number of
techniques and algorithms, such as classification, prediction, clustering, outlier detection and
association rules, are used for this purpose. Mining techniques are used to predict which students are
likely to drop out, so that stakeholders or decision makers can intervene at the earliest and retain such
students, thereby improving their performance and reducing the dropout ratio or the likelihood of
failure. For this purpose, discriminant analysis is used, and the central focus of this paper is the
relationship among the variables as revealed by discriminant analysis. Discriminant analysis is a
statistical technique parallel to regression, but it requires the dependent variable to be categorical
rather than continuous [4]. The method is based on a linear combination of predictor variables:
$D = c + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$

22 | 2015, IJAFRC All Rights Reserved

www.ijafrc.org

Where D is the discriminant score, c is a constant, b1, ..., bk are the discriminant weights (coefficients),
and X1, ..., Xk are the independent (predictor) variables.
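
As a minimal sketch of how such a score is computed, the Python snippet below evaluates D for a single student. The constant, the weights and the marks are hypothetical values chosen only for illustration; they are not estimates from this study.

```python
def discriminant_score(constant, weights, predictors):
    """Return D = c + b1*X1 + ... + bk*Xk for one observation."""
    if len(weights) != len(predictors):
        raise ValueError("weights and predictors must have the same length")
    return constant + sum(b * x for b, x in zip(weights, predictors))


# Hypothetical constant, weights and marks (illustration only, not from the study).
c = -5.0
b = [0.01, 0.06, 0.005, 0.02]
x = [60.0, 70.0, 50.0, 65.0]

score = discriminant_score(c, b, x)
print(round(score, 3))
```

In a two-group problem, the score of each case is typically compared with a cut-off value to assign the case to one of the groups.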
II. RELATED WORK
Educational data mining is an emerging area in which various data mining techniques are applied to
discover knowledge from data in educational environments. It provides insight into the learning and
teaching process, which is needed for successful educational planning. A number of researchers have
contributed to the field by applying different data mining methods to educational data.
Sheikh et al. (2004) applied association rules in educational data mining to evaluate learning data [5].
The researchers put forward a case study of a learning management system. Romero and Ventura (2007)
showed the importance of data mining in education and carried out a comprehensive survey covering
1995-2005 [6]. They highlighted areas that should be considered for the development and growth of
educational data mining, such as ease of use of mining tools, e-learning and generalized tools.
El-Halees (2009) studied how data mining can be beneficial in higher education for improving student
performance, using association rules and classification rules derived from a decision tree [7]. The
knowledge obtained from these techniques was used for performance enhancement.
Kifaya (2009) studied associative classification and concluded that clustering is effective in finding
associations and relationships between the various factors used to assess the progress of students [8].
Ayesha et al. (2010) applied the k-means method to improve student performance by analyzing learning
behavior and reporting the results of this assessment to the class tutor in advance, so that the dropout
ratio can be reduced significantly [9].
Sunil Kumar et al. (2013) applied association rule mining, using measures such as support and
confidence to determine the significant factors that should be considered when planning student
retention and, in turn, reducing the dropout ratio [10].
III. RESULTS AND DISCUSSIONS
Discriminant analysis was run with the overall result (pass or fail) as the dependent variable and the
subjects General English (GE), Economics (EC), Education (ED) and Political Science (PS) as predictor
variables. The statistics associated with the data are shown in the tables below.
Table 1: Group Statistics

                                                  Valid N (listwise)
VARRESULT     Variable   Mean    Std. Deviation   Unweighted   Weighted
1             VARGE      65.69   20.675           100          100.000
              VAREC      66.98   11.870           100          100.000
              VARED      53.79   34.598           100          100.000
              VARPS      72.00   15.708           100          100.000
(unlabeled)   VARGE      60.14   18.466           29           29.000
              VAREC      43.90   14.652           29           29.000
              VARED      47.55   35.900           29           29.000
              VARPS      62.10   19.927           29           29.000
Total         VARGE      64.44   20.263           129          129.000
              VAREC      61.79   15.796           129          129.000
              VARED      52.39   34.851           129          129.000
              VARPS      69.78   17.173           129          129.000


It is clear from the group means (Table 1) that the two groups, pass and fail, are widely separated in
Economics and Political Science. There are smaller differences in Education and General English.
Further, the results in Education are highly scattered, as the standard deviation is on the higher side.
There is also reasonable scatter in the results for General English.
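
The separation described above can be checked directly from Table 1. The Python sketch below takes the group means reported in Table 1 and computes, for each subject, the difference between the first group (N = 100) and the second group (N = 29). The variable names are illustrative only.

```python
# Group means reported in Table 1 (first group N = 100, second group N = 29).
means_group_1 = {"VARGE": 65.69, "VAREC": 66.98, "VARED": 53.79, "VARPS": 72.00}
means_group_2 = {"VARGE": 60.14, "VAREC": 43.90, "VARED": 47.55, "VARPS": 62.10}

# Difference in mean marks between the two groups, per subject.
mean_gap = {
    name: round(means_group_1[name] - means_group_2[name], 2)
    for name in means_group_1
}

# Subjects ordered from the largest to the smallest gap.
ranked = sorted(mean_gap, key=mean_gap.get, reverse=True)
for name in ranked:
    print(name, mean_gap[name])
```

Raw gaps ignore the spread within each group, so they are only a first indication; the tests of equality of group means take the within-group variation into account.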
The pooled within-groups correlation matrix indicates low correlation between the predictors. Therefore,
multicollinearity is unlikely to be a problem (see Table 2).
Table 2: Pooled Within-Groups Matrices

                      VARGE    VAREC    VARED    VARPS
Correlation   VARGE   1.000     .045    -.004     .097
              VAREC    .045    1.000    -.101    -.057
              VARED   -.004    -.101    1.000    -.053
              VARPS    .097    -.057    -.053    1.000
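
For readers unfamiliar with the statistic, a pooled within-groups correlation is computed from deviations around each group's own mean rather than around the overall mean. The Python sketch below illustrates this on a small made-up data set; the function name and the toy values are assumptions for illustration and are not the study's data.

```python
from math import sqrt


def pooled_within_correlation(groups, i, j):
    """Pooled within-groups correlation between variables i and j.

    `groups` is a list of groups; each group is a list of observations,
    and each observation is a sequence of predictor values.
    """
    sxx = syy = sxy = 0.0
    for rows in groups:
        n = len(rows)
        mean_i = sum(r[i] for r in rows) / n
        mean_j = sum(r[j] for r in rows) / n
        for r in rows:
            # Deviations are taken from the group's own means.
            di, dj = r[i] - mean_i, r[j] - mean_j
            sxx += di * di
            syy += dj * dj
            sxy += di * dj
    # The common divisor (N - number of groups) cancels in the ratio.
    return sxy / sqrt(sxx * syy)


# Toy data: two groups, two variables (illustration only).
group_a = [(1, 2), (2, 1), (3, 3)]
group_b = [(10, 5), (11, 7), (12, 6)]

r = pooled_within_correlation([group_a, group_b], 0, 1)
print(round(r, 3))
```

Because group means are removed first, a large difference between the groups does not by itself inflate the correlation between two predictors.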

The significance of the univariate F-ratios (Table 3) indicates that, when the predictors are considered
individually, only Economics and Political Science significantly differentiate between those who passed
and those who failed the examination, as the significance value associated with these two predictors is
less than the accepted level of significance (i.e. 0.05).
Table 3: Tests of Equality of Group Means

         Wilks' Lambda   F        df1   df2   Sig.
VARGE    .987            1.697    1     127   .195
VAREC    .625            76.215   1     127   .000
VARED    .994            .719     1     127   .398
VARPS    .942            7.866    1     127   .006
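
For a single predictor, Wilks' lambda and the univariate F-ratio carry the same information: F = ((1 - lambda) / lambda) * (df2 / df1). The Python sketch below recomputes F from the lambda values in Table 3, assuming df1 = 1 (two groups) and df2 = 127. Because the lambdas are rounded to three decimals, the recomputed values agree with the reported ones only approximately.

```python
def f_from_wilks(wilks_lambda, df1, df2):
    """Univariate F-ratio implied by Wilks' lambda for one predictor."""
    return (1.0 - wilks_lambda) / wilks_lambda * (df2 / df1)


# Wilks' lambda and reported F per predictor, as given in Table 3.
table_3 = {
    "VARGE": (0.987, 1.697),
    "VAREC": (0.625, 76.215),
    "VARED": (0.994, 0.719),
    "VARPS": (0.942, 7.866),
}

# Assumed degrees of freedom: df1 = groups - 1 = 1, df2 = 129 - 2 = 127.
recomputed = {
    name: f_from_wilks(lam, df1=1, df2=127) for name, (lam, _) in table_3.items()
}

for name, (lam, reported_f) in table_3.items():
    print(name, round(recomputed[name], 3), reported_f)
```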

Thus, competence in Political Science and Economics largely determines the overall result. To obtain
better results, the curriculum should therefore be administered in a way that leads to superior
performance in Economics and Political Science.

Table 4: Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .719         100.0           100.0          .647

The eigenvalue associated with the function is 0.719 (Table 4), and it accounts for 100 percent of the
explained variance. The canonical correlation associated with the function is 0.647. The square of this
correlation, (0.647)^2, is approximately 0.42, which indicates that about 42 percent of the variance in the
dependent variable (result: pass/fail) is explained or accounted for by this model.
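
The figures in Table 4 are linked by a standard identity: for a discriminant function with eigenvalue e, the canonical correlation is sqrt(e / (1 + e)). The short Python sketch below applies this identity to the reported eigenvalue; it is a consistency check on the published numbers, not a re-analysis of the data.

```python
from math import sqrt

eigenvalue = 0.719  # reported in Table 4

# Canonical correlation implied by the eigenvalue.
canonical_r = sqrt(eigenvalue / (1.0 + eigenvalue))

# Share of variance in the grouping variable explained by the function.
explained = canonical_r ** 2

print(round(canonical_r, 3), round(explained, 2))
```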
Determine the significance of the discriminant function: It would not be meaningful to interpret the
analysis if the estimated discriminant functions were not statistically significant. The null hypothesis
that, in the population, the means of all discriminant functions in all groups are equal can be tested
statistically, and the test is based on Wilks' lambda. The significance level is estimated from a chi-square
transformation of the statistic. In testing the significance of the overall results, the Wilks' lambda
associated with the function is 0.582, which transforms to a chi-square of 67.693 with 4 degrees of
freedom. This is significant beyond the 0.05 level (Table 5).
Table 5: Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .582            67.693       4    .000
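
The paper does not state which chi-square transformation was used. A common choice is Bartlett's approximation, chi-square = -(N - 1 - (p + g) / 2) * ln(lambda), with N cases, p predictors and g groups, and p * (g - 1) degrees of freedom. The Python sketch below applies it, as an assumption, to the reported lambda; the result is close to the published value, with the small difference attributable to rounding of lambda. The helper names are illustrative.

```python
from math import exp, factorial, log


def bartlett_chi_square(wilks_lambda, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for Wilks' lambda."""
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2.0
    return -multiplier * log(wilks_lambda)


def chi_square_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


# Reported lambda (Table 5), 129 students, 4 predictors, 2 groups.
chi_square = bartlett_chi_square(0.582, n_cases=129, n_predictors=4, n_groups=2)
df = 4 * (2 - 1)  # predictors * (groups - 1)
p_value = chi_square_sf_even_df(chi_square, df)

print(round(chi_square, 2), df, p_value < 0.05)
```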

Interpret the results: An examination of the standardized discriminant function coefficients for the
overall results (Bjb College) is instructive. Given the low intercorrelations between the predictors, one
might use the magnitude of the standardized coefficients to suggest that Economics is the predominant
factor in determining performance in the overall result. The standardized canonical discriminant
function coefficients (Table 6) support this: Economics has the largest coefficient (.952), followed by
Political Science (.353), Education (.204) and General English (.060).
Table 6: Standardized Canonical Discriminant Function Coefficients

         Function 1
VARGE    .060
VAREC    .952
VARED    .204
VARPS    .353

Also, from the structure matrix it is clear that the foremost factor (subject, in this case) is Economics,
followed by Political Science, then General English and finally Education. Thus, competence in
Economics as a subject largely determines the overall result of a candidate (Table 7).

Table 7: Pooled within-groups correlations between discriminating variables and standardized
canonical discriminant functions. Variables ordered by absolute size of correlation within
function.

Structure Matrix
         Function 1
VAREC    .914
VARPS    .294
VARGE    .136
VARED    .089
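
The two ways of ranking the predictors can be compared side by side. The Python sketch below orders the subjects by the absolute size of the standardized coefficients (Table 6) and of the structure correlations (Table 7), using the published values; the helper name is illustrative.

```python
# Standardized canonical discriminant function coefficients (Table 6).
standardized = {"VARGE": 0.060, "VAREC": 0.952, "VARED": 0.204, "VARPS": 0.353}

# Structure matrix correlations (Table 7).
structure = {"VAREC": 0.914, "VARPS": 0.294, "VARGE": 0.136, "VARED": 0.089}


def rank_by_magnitude(loadings):
    """Return variable names ordered from largest to smallest absolute value."""
    return sorted(loadings, key=lambda name: abs(loadings[name]), reverse=True)


rank_standardized = rank_by_magnitude(standardized)
rank_structure = rank_by_magnitude(structure)

print(rank_standardized)
print(rank_structure)
```

Both orderings place Economics first and Political Science second; they differ only in the order of the two weakest predictors, Education and General English.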

IV. CONCLUSION
In this research study, the variables associated with a specific course (B.A.) were analyzed using the
discriminant method with the intention of enhancing student performance. After studying these
variables, the researchers concluded that the foremost predictor variable (Economics) plays a pivotal
role in the overall success of a student. The pooled within-groups correlations between the
discriminating variables and the standardized canonical discriminant function also showed the
importance of the subjects Political Science and General English in determining the overall result, which
has a great bearing on the performance of the student. Thus, stakeholders or decision makers can
strengthen these subjects to reduce the overall dropout ratio considerably.
V. REFERENCES
[1] Baker RSJd, Yacef K. The state of educational data mining in 2009: A review and future visions. J Edu Data Min 2009.

[2] Romero C, Ventura S, Pechenizky M, Baker R. Handbook of educational data mining. Data Mining and Knowledge Discovery Series. Boca Raton, FL: Chapman and Hall/CRC Press; 2010.

[3] J. Han and M. Kamber, Data mining: Concepts and Techniques. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000.

[4] Klecka, W.R. 1980. Discriminant analysis. Sage, Beverly Hills.

[5] Sheikh, L., Tanveer, B. and Hamdani, S. 2004. Interesting measures for mining association rules. IEEE-NMIC Conference, held at Lahore (Pakistan), 24-26 Dec. 2004.

[6] Romero, C. and Ventura, S. (2007) Educational data Mining: A Survey from 1995 to 2005, Expert Systems with Applications (33), pp. 135-146.

[7] El-Halees, A. 2009. Mining students data to analyze learning behavior: a case study. https://uqu.edu.sa/files2/tiny_mce/plugins/filemanager/files/30/papers/f158.pdf.

[8] Kifaya. 2009. Mining student evaluation using associative classification and clustering. Communications of the IBIMA. 11, ISSN 1943-7765.

[9] Ayesha, S., Mustafa, T., Sattar, A.R. and Khan, M.I. 2010. Data mining model for higher education system. European Journal of Scientific Research. 43(1): pp. 24-29.

[10] Sunil Kumar, P., Panda, A.K. and Jena, D. 2013. Mining the factors affecting the high school dropouts in rural areas, International Journal of Advance Computer Engineering and Communication Technology (IJACECT), 2(1); pp. 1-6.
