Chapter 1: Introduction
Education is the platform that enables individuals and society to contribute to the
development of the country. It equips each individual with knowledge and skills.
Providing the right guidance and skills to the youth has a significant effect on
the overall progress and economic growth of the nation. In the last decade, as the number
of enrolled students has increased, the amount of student-related data in the education
domain has also grown tremendously, making it difficult to analyze the academic
performance of the students. Therefore, it has become essential to develop powerful
means of analyzing students' academic data to extract useful knowledge.
This discovered knowledge will assist in improving the performance of students. The
academic achievement of students depends on various factors, such as demographic,
examination, and socio-economic factors. In the education domain, these factors
correspond to different entities, viz. student, faculty, infrastructure, and learning
environment, that are multi-dimensional in nature. The student entity describes the basic,
personal, academic, and examination details of each student. Similarly, the faculty and
the infrastructure entities represent the teacher profile and the facilities provided in a
specific course. There can be direct or indirect relationships among various attributes of
students that ought to be identified. Moreover, the rapid growth of databases necessitates
developing new technologies that use information and knowledge intelligently. Hence,
there is a need for an automated decision support system that discovers educational
patterns to analyze the performance of the students.
Data mining is the technique of extracting knowledge from heterogeneous data sources. It
explores large amounts of data in search of consistent patterns and
systematic relationships between variables. Once the underlying relations among different
entities are identified, the discovered patterns are validated by applying them to new
subsets of data [1]. Data mining is an iterative process that starts with problem definition
in phase 1, followed in phase 2 by the collection of data through sampling and its
preparation through transformation. In phase 3, a model is constructed and evaluated, and
the process ends with knowledge deployment, as shown in Figure 1.1.
Figure 1.1: Data mining process, from problem definition through accessing, sampling, and
transformation of data, and model construction and assessment (generate, test, assess, and
understand the model), to knowledge deployment (model apply, custom reports, external
applications).
The actual task of data mining is the semi-automatic or automatic analysis of huge
amounts of data to extract previously unidentified, useful patterns, such as clusters of data
records, unusual records, and dependencies. To gather valuable facts and retrieve data
more quickly, numerous fields have adopted data mining techniques. A few
trends [2] in data mining that reflect this quest for knowledge, such as the construction of
integrated and interactive data mining environments, are:
Educational data mining is a research area that uses a set of computational approaches
to explore educational datasets. It results from applying data mining techniques in the
education sector.
Distributed data mining is the approach that distributes the workload among several
sites to make the system scalable.
Real-time data mining involves intrusion detection systems, a technology that
protects network systems operating in real time.
Multi-database mining is a distributed approach to data mining in which data is
distributed across multiple databases. These databases can be aggregated using
several techniques to create a single dataset that can be mined using standard
approaches.
Intelligent query answering helps in semantic query optimization and querying
database knowledge. Data mining tools can be applied to database systems for
intelligent query answering.
Privacy protection and information security can be achieved with the help of data
mining.
Apart from these, data mining techniques can also be applied to the areas of corporate
analysis, production control, risk management, science exploration, and customer
retention. Data mining specialized for the educational domain is called
Educational Data Mining (EDM). It deals with the approaches, tools, and research plans
used to obtain information from educational records, viz. online logs and
examination results, and then evaluates this information to support decisions.
EDM is the research area that analyzes educational data to identify meaningful
information about different types of learners, their learning behavior, and the effect of
educational policies implemented in various learning environments. The process of EDM
takes raw data from various educational settings as input and transforms it into valuable
facts and information. This information assists educational policymakers, school
administrators, teachers, and students in making informed decisions about how to manage
and interact with educational resources. It enables data-driven decision-making for
improving current educational practices and learning materials. With the growing
use of technology in educational information systems, a large amount of student data
has become available. This makes it important to use EDM to analyze students’
learning behavior. EDM also helps in the effective assessment of educational institutions
for the optimal usage of learning resources.
Figure 1.2: Information mining from an educational information system (learners,
educators, performance predictions, recommendations).
Figure 1.2 depicts information mining from an educational information system that
collects and uses educational data, available in the form of learning objects, event logs,
and student profiles. From the educational data, EDM tasks are formulated in the form of
student profiling and knowledge modeling. This educational data helps students
use learning resources effectively, and helps educators use the
discovered knowledge, i.e. learning patterns, for performance prediction. EDM focuses on
various difficulties and challenges associated with diverse segments of the learning
phenomenon [4]. These challenges are as follows:
Applying various practices to different educational datasets reveals
several problems. It is therefore essential to select the problem formulation technique
that matches the research objectives in order to get the desired results.
The selection of the optimal technique depends on the structure of the educational
dataset. However, the decision to apply these techniques is always based on the research
objectives. So, the researcher has to select a specific technique keeping in mind the
value set of the available datasets. Table 1.1 describes the most prominent problem
formulation techniques related to educational data mining [3]. Each technique is
described along with its specific objective.
Table 1.1: Problem formulation techniques related to educational data mining [3]
(columns: Techniques, Purpose).
All the techniques listed in Table 1.1 can produce results that match
the desired objectives of the researchers. However, prominent researchers in the field of
education propose that association rule mining, classification, regression, and clustering
are the most common and useful techniques in the domain of educational data
mining.
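As an illustration of association rule mining, one of the techniques named above, the following minimal sketch computes the two standard measures of a rule, support and confidence, over a handful of student records. The records, attribute names, and values are hypothetical and are not taken from the dataset used in this research; the functions are an assumption-level illustration, not the system proposed in this thesis.

```python
# Hypothetical student records: each record is a set of "attribute=value" items.
records = [
    {"attendance=high", "parent_edu=graduate", "result=pass"},
    {"attendance=high", "parent_edu=school", "result=pass"},
    {"attendance=low", "parent_edu=school", "result=fail"},
    {"attendance=high", "parent_edu=graduate", "result=pass"},
    {"attendance=low", "parent_edu=graduate", "result=fail"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= record for record in records) / len(records)


def confidence(antecedent, consequent, records):
    """support(antecedent and consequent together) / support(antecedent)."""
    antecedent = set(antecedent)
    antecedent_support = support(antecedent, records)
    if antecedent_support == 0:
        # The rule never fires, so its confidence is reported as 0.
        return 0.0
    return support(antecedent | set(consequent), records) / antecedent_support


# Rule under test: {attendance=high} -> {result=pass}
rule_support = support({"attendance=high", "result=pass"}, records)
rule_confidence = confidence({"attendance=high"}, {"result=pass"}, records)
print(rule_support, rule_confidence)
```

In this toy data, three of the five records contain both items, so the rule has a support of 0.6, and every record with high attendance is a pass, so its confidence is 1.0. A rule miner such as Apriori searches for all rules whose support and confidence exceed chosen thresholds.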
Several studies [6, 7, 17, 18, 19, 20] have proposed different factors that have a
direct or indirect impact on students' academic performance, although very few of
them have produced prediction models. In particular, an accurate prediction of
students’ academic performance at the higher secondary level is required to offer
students the necessary support in the learning process. Academic performance modeling
can be categorized into four segments, namely ‘student modeling’, ‘decision support
system’, ‘adaptive system’, and ‘scientific evaluation’, that are centered on the target
audience [5], as labeled in Figure 1.3. The first two categories can be further divided
into subcategories that define the specifications of their respective
domains. These subcategories may have little in common, and they differ
in their objectives. These differences need to be identified to
distinguish the subcategories clearly. Details of all these categories and their
corresponding subcategories are given below.
Figure 1.3: Segments of academic performance modeling centered on the target audience:
student modeling, decision support system, adaptive system, and scientific evaluation [5].
1.4.1 Student Modeling (SM): It is used to estimate the values of student attributes
in order to analyze student performance. This segment accounts for the cognitive aspects
of student activities, such as analyzing the student’s performance or behavior,
identifying the student’s goals and plans, discerning his/her previously acquired
knowledge, maintaining an episodic memory, and describing the characteristics of
learners. To highlight the similarities and differences between its parts, student modeling
is further sub-classified as ‘Students Performance and Characteristics’, ‘Detecting
Students Learning Behavior’, ‘Students Profiling and Grouping’, and ‘Social Network
Analysis’. The details of each sub-segment are given below:
because, while clustering the students, one seeks the greatest dissimilarity between
clusters, but this is not the case in grouping tasks. When using a grouping task to form
teams in a specific course, one prefers groups that are similar to one another and that
comprise dissimilar students who can complement each other. As in other applications of
EDM, different data mining methods such as feature selection and clustering
techniques can be used to profile and group the students.
Social Network Analysis: The objective of this segment is to recognize the
relations between individuals and illustrate them using a graph. The other segments of
student performance modeling focus on the individuals, whereas social network
analysis focuses only on the properties of the relationships
among individuals. Reyes and Tchounikine [9] applied social network analysis
techniques to discover structural characteristics of learning groups. Consequently,
social network analysis can be used to model the structure of educational communities
and to measure cohesion in the learning environment.
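To make the idea of measuring cohesion concrete, the sketch below treats a learning group as an undirected graph and computes its density, i.e. the share of possible ties between members that actually exist, together with each member's number of ties. Density is only one simple cohesion measure and is an assumption chosen for illustration; it is not necessarily the measure used in [9]. The member names and ties are hypothetical, and every tie is assumed to link two listed members.

```python
def unique_ties(ties):
    """Normalise ties to unordered pairs, dropping self-ties and duplicates."""
    return {frozenset(tie) for tie in ties if tie[0] != tie[1]}


def density(members, ties):
    """Share of possible undirected ties between members that are present."""
    n = len(members)
    if n < 2:
        return 0.0
    possible = n * (n - 1) / 2
    return len(unique_ties(ties)) / possible


def degree(members, ties):
    """Number of distinct ties each member takes part in."""
    counts = {member: 0 for member in members}
    for pair in unique_ties(ties):
        for member in pair:
            counts[member] += 1
    return counts


# Hypothetical learning group: who exchanged messages with whom.
members = ["Asha", "Bilal", "Chen", "Dev"]
ties = [
    ("Asha", "Bilal"),
    ("Bilal", "Chen"),
    ("Chen", "Asha"),
    ("Bilal", "Asha"),  # same tie as the first one, counted once
]

print(density(members, ties))
print(degree(members, ties))
```

Here three of the six possible ties exist, giving a density of 0.5, and the member with no ties stands out as isolated from the group.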
courses based on the students’ preferences. This segment can also help educators
with the allocation of resources, counseling processes, or other tasks involved in
planning and scheduling.
Alerts for Administration: The alerts for administration serve as an online tool for
updating the administration in real time. This segment also works on the detection of
unwanted behavior of the students. Knowles [12] introduced a dropout early warning
system using statistical models and regression techniques. However, in the case
of an offline learning environment, this segment can act as an early performance
prediction warning system for the administration.
Concept Maps: This segment is a domain model for educators that acts as a
graphical tool for organizing and representing knowledge. Chun-hsiung et al. [13]
used the Apriori algorithm to automatically construct the concept maps of learners.
Such concept maps assist the educator in enhancing the academic performance of the
student. Examples of concept maps are the hierarchy of topics in the
course material, relationships of skills and test items, correlation of test items, and
knowledge components.
Generating Recommendation System: This segment focuses on performance
enhancement in the student learning management system. The recommendation
system serves recommendations to the students, but it can be targeted at any
stakeholder. Examples of this category are course recommendations to
students and test item recommendations to educators. Collaborative filtering and
association-rule-based techniques can also prove useful in this system. The
discovery-with-models method can also be used in the development of the
recommendation system. Vialardi et al. [14] used a performance predictor model
for generating recommendations. This predictor model predicts the success of each
student in each course and thereby recommends the courses in which the student is
most likely to be successful.
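The last step of such a predictor-based recommender, ranking courses by predicted success, can be sketched as follows. The sketch assumes that some predictor has already produced a success probability for each student and course; the probabilities, student identifiers, course names, and the `top_k` and `min_probability` parameters are all hypothetical and do not reproduce the model of Vialardi et al. [14].

```python
def recommend_courses(predicted_success, student, top_k=2, min_probability=0.5):
    """Return up to `top_k` courses with the highest predicted success.

    Courses below `min_probability` are never recommended. Ties are broken
    alphabetically so that the result is deterministic.
    """
    scores = predicted_success.get(student, {})
    eligible = [
        (course, probability)
        for course, probability in scores.items()
        if probability >= min_probability
    ]
    eligible.sort(key=lambda item: (-item[1], item[0]))
    return [course for course, _ in eligible[:top_k]]


# Hypothetical output of a performance predictor: P(success) per student and course.
predicted_success = {
    "s01": {"Mathematics": 0.82, "Physics": 0.64, "History": 0.41, "Biology": 0.77},
    "s02": {"Mathematics": 0.35, "Physics": 0.48, "History": 0.90, "Biology": 0.55},
}

print(recommend_courses(predicted_success, "s01"))
print(recommend_courses(predicted_success, "s02", top_k=1))
```

For student `s01` the sketch recommends Mathematics and Biology, the two courses with the highest predicted success above the threshold. Separating the predictor from the ranking step means the same ranking logic can be reused with any model that outputs per-course success estimates.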
1.4.3 Adaptive System: This segment of application is related to the use of intelligent
systems in the online learning environment. It assists educators in adapting
to the learning characteristics of students. Online learning systems involve numerous
learners with different needs, and as the number of
learners grows, it becomes difficult to meet the specific needs of all of them. Adaptive
systems can help educators meet the needs of every individual learner. This
adaptation can take the form of adapting the course material or the instruction pace,
providing hints, ordering and generating tests, etc. For example, Alaofi et al. [15]
explored the personalization of a digital library using students' profile information to
improve search results about learning content.
1.4.5 Target Audience: One important goal of EDM is improving the quality of learning,
and in this process only two groups of users come to mind, i.e. learners and
educators. We also look into the target users corresponding to each formulated segment.
The end-users in educational environments are students, educators, administrators, and
researchers, and together they cover all the modeling segments of academic performance
[5]. Any research in EDM may address one or more of the segments at the same
time. Table 1.2 maps the possible relationships between the target users and
the application domains of academic performance modeling.
Table 1.2: Target users of various segments of academic performance modeling [5].
Table 1.2 describes the possible target user corresponding to each application under
EDM. It also reveals that many applications can target more than one user. Learners have
been the target of EDM in various applications, such as grouping students, generating
recommendations, and adaptive systems. Most of the applications in categories of student
modeling and decision support systems target educators as their end-users. Student
modeling provides a better understanding of the students’ state of learning, and decision
support systems can directly help educators improve the learning
process. Student modeling also assists the administrators of educational
institutions in making higher-level decisions. Researchers also represent a group of end-
users, as the objective of the research is to understand the learning process, develop
theories and test them. For example, researchers can use social network analysis (SNA) to
pinpoint the properties that are valuable in the prediction of the performance of the
student. Therefore, more than one application can collaborate to serve multiple users.
1.5 Motivation
In the last decade, the school education system in India has grown in size, but the
quality of the education system in terms of student success rate and retention rate is still
unsatisfactory. The MHRD education census from 2011 to 2014 [21] brings several facts
about the student dropout rate to light. Figure 1.4
presents statistics about students’ dropout rates from 2011 to 2014. According to the
education survey of 2011, the dropout rate for classes I to V was 28.9%; for I to VIII, it
was 42.4%; and for I to X, it was 52.8% [Appendix-A]. In 2012, the dropout rate for
classes I to V was 27.0%; for I to VIII, it was 40.6%; and for I to X, it was 49.3%
[Appendix-B]. Similarly, in 2013, the dropout rate for classes I to V was 27.0%; for I to
VIII, it was 40.6%; and for I to X, it was 49.3% [Appendix-C]. Moreover, in 2014, there
was no change in the situation: the dropout rate up to class V was 19.8%; up to class
VIII, it was 36.3%; and up to class X, it was 47.4% [Appendix-D].
Figure 1.4: Students’ dropout rate from 2011 to 2014, a census by MHRD
in ‘Education Statistics at a Glance.’
The size of the dataset plays a crucial role in discovering valuable facts and
information. However, in the national context, bigger datasets have rarely been used for
the analytical process [22]. For better understanding and implementation of the educational
system at the national level, there is a need for large standardized databases. Beyond
dataset size, the versatility of the available dataset is another challenge. A
dataset that covers only one academic session may fail to reflect the versatility of
the educational system. Moreover, the data acquisition process also needs to be
streamlined so that it reflects the correct picture of the learning phenomenon. For
an effective analytical process, the dataset must be in the requisite format and must be
free from anomalies. Thus, working with a bigger, versatile dataset poses the
challenge of data transformation and integration in the data warehouse, as it may limit the
analytical mining techniques of a generic system, and the results will also not be up to
the mark [173].
and preprocessing of a larger dataset as data consistency always plays a crucial role in the
effective analytical process. With the availability of larger attribute sets, the generic
mining techniques do not contribute as much as the collaborative mining techniques.
Similarly, the academic performance also depends on various factors like demographics,
examination, and socio-economic parameters of students. These factors necessitate the
development of an explicit tool for the analysis of educational data. It will also be useful
for framing new policies for better utilization of current resources. Such a system also
helps in getting a better insight into the causes of unexpectedly high failure rates.
Therefore, considering the social as well as research issues, we need explicit research to
analyze the educational pattern that assists in the improvement of the learning
phenomena.
The core objective of this research work is to propose an approach in the educational area
to analyze the academic performance of students. In line with the above-mentioned
proposition, we have conducted specific studies of the problem to meet the following
objectives:
This research is an attempt to apply data mining techniques in the education area for
the analysis and evaluation of students’ academic data so as to enhance the quality of
the school education system. Appropriate methodologies and datasets have to be used to
meet the objectives formulated in this study.
The core objective of the proposed research is to discover the learning patterns of students
for the betterment of the educational scenario. This study puts forward the need for an
automated analytical tool that discovers the similarities among students' attributes.
The development of an automated association rule-based classification system helps in
attaining the formulated research objectives. The proposed study therefore makes
the following key contributions to discovering frequent patterns in the
multidimensional educational dataset.
Analysis of EDM Systems: Numerous studies have been carried out on EDM systems
involved in the discovery of educational patterns. Most of these systems target
web-based learning platforms. But in the national context, there is only an offline
dataset, which is also non-volatile in nature. The size and
versatility of the datasets chosen were also not enough to describe the complete
result of the target source. Furthermore, the metrics involved in existing EDM
systems are not enough to discover the multi-hidden dependencies among learners’
attributes. All these factors point to the domain of multidimensional
frequent pattern discovery in educational datasets. We therefore chose it as the
research objective, to fulfill the requirement of the education department for
exploring the huge amount of students’ data and answering educational queries.
Structural Design: The architecture was designed specifically to handle huge
amounts of educational data and analyze the learning behavior of students. This
architecture comprises various sub-sections, namely ‘multidimensional dataset
extraction’, ‘association rule classification system’, ‘data visualization’, and
‘objective & subjective validation’. We have adopted this architecture as our guiding
model to design an EDM-based analytical tool for analyzing the educational patterns of
students.
Multidimensional Dataset Construction: The initial step of the proposed
methodology is the production of a required dataset. This process starts with the
acquisition of a dataset from the aforesaid source. After getting the required dataset,
the data cleaning is performed to remove all the anomalies in the collected dataset.
The setting up of working attributes also needs to be done for choosing suitable
attributes for the analytical purpose. The transformation of the filtered dataset is
performed to convert the dataset into the required format. The array-based
Chapter 1: This chapter presents a brief introduction to data mining and educational
data mining. The motivation for the proposed research and academic performance
modeling are explained, along with the objectives of the research in detail.
Chapter 2: This chapter discusses the literature selected for this
research work. A timeline is fixed to review the progress
of research in the area of educational data mining. The reviewed literature is
categorized to arrange it according to the research work it corresponds to.
Chapter 4: This chapter describes the process of constructing the required dataset
about learners. The construction of the required dataset comprises various
steps, viz. acquisition of relevant data from educational sources, removal of
anomalies, handling of missing values, setting up of working attributes, and
transformation of the data into the requisite format. The transformed dataset is
loaded into the warehouse with all attribute sets of students, along with the metadata
of the loaded dataset.
Chapter 5: This chapter describes the core phase of the proposed research, the
association rule-based classification system. Associative and classification
mining are combined to discover the multi-hidden dependencies among learners’
attributes. This frequent pattern discovery has been devised to discover
educational patterns from a dataset that is multidimensional in nature. This
module also allows domain experts to input their self-derived educational patterns
that may not be returned by the mining algorithms. The integrated pattern set is further
classified according to the target attribute to illustrate the decision tree of the
association rules. The discovered pattern set assists in discovering the learning
patterns of students and deriving an optimal solution for the betterment of the
educational scenario.
Chapter 6: This chapter describes the data visualization module of the EDM
analytical tool. The visual analytics feature is embedded within this module to
illustrate the graphical picture of the discovered rule set. This chapter also
demonstrates the visualization of the discovered rule set that was taken into
consideration for the experimental study.
Chapter 7: This chapter describes the validation and analysis of the proposed
methodology. The validation has to be performed in an objective as well as
subjective manner. The objective validation brings out the correctness of the
discovered results set against the desired parameters. The subjective validation
empirically proves the proposed methodology corresponding to the responses of
domain experts.
Chapter 8: The final chapter summarizes the research and provides the
conclusion corresponding to the formulated objectives. It also outlines the future
scope for researchers to continue with this novel methodology and derive
more efficient results from educational datasets.