Performance Prediction of Engineering Students using Data Mining Tools


ABSTRACT:

This paper presents a data mining tool for tutoring support of engineering
students that requires no data science background to use. This tool is
focused on the analysis of students’ performance, in terms of the observable
scores and of the completion of their studies. For that purpose, it uses a data set
that only contains features typically gathered by university administrations
about the students, degrees and subjects. The web-based tool provides access to
results from different analyses. Preliminary experiments
on data of engineering students from the 6 institutions associated with this
project were used to define the final implementation of the web-based tool. The
usefulness of the tool is discussed with respect to the stated goals, showing its
potential for the support of early profiling of students. The study has focused on
Engineering Bachelor degree programs currently running at higher education
institutions from 5 different countries of the European Union with 7 different
languages. Our educational data mining (EDM) approach addresses problems such as student
behavior modeling, drop-out prediction and placement prediction. Preliminary
results for classification and drop-out prediction were acceptable, with accuracies
higher than 90% in some cases. Real data from engineering degrees of EU higher
education institutions show the potential of the tool for managing higher
education and validate its applicability in real scenarios.

KEYWORDS: Drop-out prediction, educational data mining, performance
prediction, visual analytics, data mining tool.

INTRODUCTION:

The availability of data is a relevant asset for institutions, because data analysis
can be used to help in decision making both in day-to-day operations and
in strategic planning. In the educational domain, higher education institutions
generate vast amounts of data from different sources. In particular,
universities collect data from their students every year, including demographic
details (e.g., age, address or socioeconomic status) and information about their
admission and academic performance (school, degree, course path, and even
examination results). Sometimes this information is augmented with data
obtained from questionnaires and field observations or with information about
their career after graduation. Knowledge can be extracted from those data to
optimize the education management tasks and improve the students’ success
rate. Indeed, the European Commission states that ‘‘monitoring students creates
a foundation for institutional action’’. Nowadays, it is quite common that any
interaction between students and the computer-based educational information
systems leaves a digital footprint that can be seen as complementary data [1].
Learning management systems, apart from providing access to the course
contents, might include support for management and evaluation of tasks, student
tracking and reporting that allow instructors to assess learning performance and to
predict the risk of dropping out. Intelligent tutoring systems are computer-assisted
instruction systems which record all student-educator interaction and
consequently customize the teaching process. Data stored by all these systems
will have higher granularity, at the course level, because it is related to specific
activities or events, such as the results to exercises and quizzes. In this sense,
two fields are receiving increasing attention: Learning Analytics (LA) and
Educational Data Mining (EDM). Both fields are multidisciplinary and cover a
common ground, but are focused on different targets [5],[6]. The focus of
learning analytics is the collection, analysis and knowledge extraction from
learning-related data to better understand and optimize learning results and
environments. The expected advantages of learning analytics can include
customized learning and course offerings, curriculum adjustments and
improvement of faculty performance or research. On the other hand,
educational data mining stresses the research and development of automated and
data-driven methods to discover patterns in large volumes of educational data.
Methods in educational data mining can be classified in terms of their aim, i.e.,
prediction, clustering, relationship mining, distillation of data for human
judgment and discovery with models. Nevertheless, there is in any case an
overlap with regard to the problems LA and EDM are trying to solve, such as
student behavior modeling or drop-out prediction. Although there have
already been many studies applying data analysis to learning in higher education, it is
still an emerging field that requires more attention from university
administration, instructors and other stakeholders. The prediction of drop-out
risk would be useful to identify tutoring needs and define early instructional and
counseling actions, which are agreed to be beneficial for students’ retention.
The number of tutors or counselors is usually small compared to the number of
students, so support systems will be needed to help these tutors in their
diagnostic activities, reducing the effort needed to carry out personalized
retention actions. However, tutoring staff usually do not have a data science
background and are unaware of the potential of data analysis. This is one of the major
difficulties that prevent the adoption of those approaches. Furthermore, it also
needs to be recognized that tracking, collection and evaluation of data is
challenging. For that reason, previous works are usually constrained to, at most,
data from one institution. However, the joint analysis of students’ behavior at
different institutions could lead to interesting insight about their common
aspects and their differences that might be rooted in the institutional
characteristics, even more so if those institutions are heterogeneous enough, with
different sizes, demographic circumstances or countries of origin. There are
currently few reports in the learning analytics literature of deployment at scale.
For the previous reasons, the aim of the work reflected in this paper is the
development of a web-based software tool to be used for support of the
predictive modeling activities of tutoring staff without a data scientist
background. This work has been developed in the context of a joint educational
project, Student Profile for Enhancing Tutoring Engineering, with the
participation of 6 European institutions of higher education. The proposed web
tool (SPEET tool) is focused on the analysis of students’ performance in
Engineering Bachelor degree programs, because the problem of dropout is
common in this stage and disciplines. Performance, for that purpose, would be
defined in terms of observable scores and completion of studies. It is also
necessary that the data on which the tool is based are easily acquired and
processed, so that any faculty or school could collect and organize their own
data in a format that matches the one proposed here, gaining meaningful
benefits from the resulting analysis, with a remarkable benefit arising from
inter-institution comparison. Finally, the proposed approach also needs to have
a transnational nature, since obtaining similar student profiles among different
EU institutions might help to identify common characteristics of European
engineering students and the differences on a country/institution basis could
also be exposed and lead to deeper analysis. However, this transnational context
imposes some constraints on the targets that are studied. For that reason, the
work focuses on a global and transnational degree-wide view of performance,
rather than focusing on a course-wise analysis. Higher granularity is impractical
due to multiple reasons: courses from different institutions would hardly be
comparable unless they were specifically designed for that purpose, the usage
and particular implementation of course tracking software would differ among
institutions and findings would not be easily generalizable. Moreover, the need
for a simple and easily available data set brings further constraints. For these
reasons, the proposed common data set and representation, accounting for the
national and institutional differences in degree organization, uses only variables
obtained from the administrative records of the students, such as demographic
data, courses taken, or academic performance.

The study on dropout and completion in higher education in Europe is a large-scale
comparative study that provides
insight into the policies that European countries and higher education
institutions employ to explicitly address study success, how these policies are
being monitored and whether they are effective. Pulling together evidence from
existing research, surveying national and institutional experts and stakeholders
across 35 European countries, and exploring national definitions and data
on various aspects of study success make this groundbreaking research. From the
perspective of the Europe 2020 Strategy, including the ambition to have at least
40% of 30-34 year olds holding a tertiary education qualification by 2020,
the issue of increasing educational attainment is gaining importance in the
national and international debates on higher education. Reducing dropout and
increasing completion are regarded as prime strategies to achieve higher
attainment levels. A key concern is that too many students in Europe drop out
before obtaining a higher education diploma or degree. This is a problem across
the EU, as success in higher education is vital for jobs, social justice and
economic growth. Particularly in times of economic austerity, there is pressure for
effective and efficient use of resources from governmental,
institutional as well as student perspectives. The 2011 Modernisation Agenda
rightfully states that it takes a joint effort of all Member States, higher education
institutions and the European Commission to take a pro-active approach in
working towards the objectives and increasing participation and attainment in
higher education. Widening access and improving completion rates accordingly
have been on the Bologna Process agenda since the Prague Communiqué (2001)
and became a priority for 2012-2015 (cf. Bucharest Communiqué, 2012) as well
as the Yerevan Communiqué (2015-2018). In the Yerevan Communiqué, the EHEA
objectives put an even greater emphasis on the quality and relevance of learning
and teaching and making higher education more inclusive to widen
opportunities for access and completion (European Commission, 2015). A
number of governments have taken initiatives to increase the attractiveness,
quality, efficiency and diversity of higher education. For example, various
countries – such as Denmark, Germany, the Netherlands and Scotland – have
implemented profiling and performance orientation policies to better align
higher education institutions and programmes with the demands and needs of
students and the labour market (De Boer et al., 2015; Vossensteyn et al., 2011).
Obviously, there is tension between the policy aims of increasing participation
rates and maintaining high completion or low dropout rates: higher education
has to accommodate larger enrolments and more diversity among learners, yet
keep more students in the system and assure they can achieve the learning
outcomes needed for completing a degree. This calls for a stronger
knowledge base on what countries and higher education institutions can do in
order to effectively achieve the objectives of reducing dropout and increasing
completion [4].

This research work is focused on technologies aimed at e-learning
based scenarios such as Massive Open Online Courses (MOOC) or other
e-learning platforms, Intelligent Tutoring Systems (ITS) and Learning Analytics
(LA), and its main objective is to enhance the learners’ experience and reduce
dropout rates in these e-learning based scenarios. To this end, it was established, as
the first objective, to study the background and state of the art, mainly with the
analysis, exploration and comparison among existing interactive platforms and
technologies, their pros, cons and specifications. As MOOCs and other
e-learning scenarios grow in popularity, the relatively low completion rates of
students have been a dominant criticism [1]. Therefore, this study aimed also to
identify, in a first phase, through a survey within a student community, the key
reasons for dropouts when using video lectures as a primary e-learning resource.
In a second phase, further insight was obtained by interviewing teachers,
counselors and platform administrators. This initial research and analysis was
used to detect motives and behavior patterns of students with dropout thoughts.
A very important part of a Learning Management System (LMS) is the tracking
and recording of student progress with the use of learning analytics [2]. This
measurement, analysis and reporting of data about learners and their contexts is
essential in determining dropout patterns (in a similar way to the process of
finding patterns in banking or insurance clients). Therefore, another objective of
this work is to use learning analytics to determine different stages,
levels and patterns in dropout students and suggest appropriate interventions in
order to prevent these dropouts in advance. Based on Intelligent
Tutoring Systems, and with knowledge of abandonment patterns, intelligent
computer services may be generated in reaction to a dropout profile:
automatically sending motivational messages in an early phase, alerting
the guidance counselor and/or teacher in a middle stage, and using
e-tutoring technologies (one-to-one) in an advanced stage (i.e., only within a final
phase and not by student demand). Though the work has a technological
approach, its main objective is to prevent dropouts and raise completion rates
within these scenarios. It also seeks proposals that conform, to the extent possible,
to existing educational models and concepts. It is important to note that the
methods can vary substantially with regard to the technologies used and the
target audience [5].

As an interdisciplinary field of study, Educational Data
Mining (EDM) applies machine learning, statistics, Data Mining (DM),
psycho-pedagogy, information retrieval, cognitive psychology, and recommender
systems methods and techniques to various educational data sets so as to resolve
educational issues [1]. The International Educational Data Mining Society [2]
defines EDM as ‘‘an emerging discipline, concerned with developing methods
for exploring the unique types of data that come from educational settings, and
using those methods to better understand students, and the settings which they
learn in’’ (p. 601). EDM is concerned with analyzing data generated in an
educational setup using disparate systems. Its aim is to develop models to
improve learning experience and institutional effectiveness. While DM, also
referred to as Knowledge Discovery in Databases (KDD), is a known field of
study in life sciences and commerce, the application of DM to the educational
context is limited [3]. One of the pre-processing algorithms of EDM is known
as clustering. It is an unsupervised approach for analyzing data in statistics,
machine learning, pattern recognition, DM, and bioinformatics. It refers to
collecting similar objects together to form a group or cluster. Each cluster
contains objects that are similar to each other but dissimilar to the objects
of other clusters [9], [10].

EDM applies Data Mining (DM) techniques to educational data, and so its objective is
to analyze this type of data in order to resolve educational research issues [27].
DM can be defined as the process involved in extracting interesting,
interpretable, useful and novel information from data [7]. It has been used for
many years by businesses, scientists and governments to sift through volumes of
data like airline passenger records, census data and the supermarket scanner
data that produces market research reports [10]. EDM is concerned with
developing methods to explore the unique types of data in educational settings
and, using these methods, to better understand students and the settings in which
they learn [21]. On one hand, the increase in both instrumental educational
software as well as state databases of student information has created large
repositories of data reflecting how students learn [14]. On the other hand, the
use of the Internet in education has created a new context known as e-learning or
web-based education in which large amounts of information about teaching-
learning interaction are endlessly generated and ubiquitously available [16]. All
this information provides a gold mine of educational data [18]. The EDM
process converts raw data coming from educational systems into useful
information that could potentially have a great impact on educational research
and practice. This process does not differ much from other application areas of
data mining like business, genetics, medicine, etc. because it follows the same
steps as the general data mining process [21]: pre-processing, data mining and
post-processing. However, it is important to notice that in this paper the term
data mining is used in a larger sense than the original/traditional DM definition.
That is, we are going to describe not only EDM studies that use typical DM
techniques such as classification, clustering, association rule mining, sequential
mining, text mining, etc. but also other approaches such as regression,
correlation, visualization, etc. that are not considered to be DM in a strict sense.
Furthermore, some methodological innovations and trends in EDM such as
discovery with models and the integration of psychometric modeling
frameworks are unusual DM categories or not necessarily universally seen as
being DM [20].
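
The clustering idea described in this section (collecting similar objects into groups, so that members of one cluster resemble each other more than they resemble members of other clusters) can be illustrated with a minimal sketch in Python. This is not the implementation used by the tool. The choice of k-means, the function name k_means, the two features and all student values are assumptions made only for this example.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def k_means(points, k, iterations=20):
    """Plain k-means clustering; returns (centroids, labels)."""
    # Deterministic initialisation for the sketch: the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical administrative records (assumed values):
# (mean score on a 0-10 scale, fraction of enrolled credits passed).
# Features are left unscaled to keep the sketch short.
students = [
    (8.5, 0.95), (3.2, 0.40), (7.9, 0.90),
    (2.8, 0.35), (9.1, 1.00), (4.0, 0.50),
]

centroids, labels = k_means(students, k=2)
print(labels)  # students with similar records share a label
```

In practice the features would normally be rescaled before clustering, and the number of clusters would be chosen by inspecting the data rather than fixed in advance.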

LITERATURE SURVEY:

1. Topic: Dropout and completion in higher education in Europe

Author: J. J. Vossensteyn, A. Kottmann, B. W. Jongbloed, F. Kaiser, L.
Cremonini, B. Stensaker, E. Hovdhaugen, and S. Wollscheid

Improving completion and reducing dropout in higher education are key
concerns for higher education in Europe. This study on dropout and completion
in higher education in Europe demonstrates that national governments and
higher education institutions use three different study success objectives:
completion, time-to-degree and retention. To address these objectives policy
makers at national and institutional level apply various policy instruments.
These can be categorized under three main policy headings: financial
incentives; information and support for students; and organizational issues. The
evidence indicates that countries that have more explicit study success
objectives, targets and policies are likely to be more successful, particularly if
the policy approach is comprehensive and consistent. As such, it is important
that study success is an issue in the information provision to (prospective)
students, in financial incentives for students and institutions, in quality
assurance, and in the education pathways offered to students. Furthermore,
increasing the responsibility of higher education institutions for study success,
for example in the area of selecting, matching, tracking, counselling, mentoring
and integrating students in academic life is clearly effective. Finally, to support
the policy debate and monitoring of study success evidence, there is a need for
more systematic international comparative data and thorough analysis of the
effectiveness of study success policies [1].

2. Topic: Learning analytics: A glance of evolution, status, and trends according
to a proposed taxonomy

Author: A. Peña-Ayala

With the emergence of computer-based educational systems (CBES), which
aim to provide teaching and learning experiences to hundreds or even
thousands of users, an explosion of information (e.g., students' log data)
demands sophisticated methods to gather, analyze, and interpret learners' traces
to regulate and enhance education. Thus, learning analytics (LA) arises as a
knowledge discovery paradigm that provides valuable findings and facilitates
stakeholders to understand the learning process and its implications. Therefore,
a landscape of the LA nature, its underlying factors, and applications achieved
is outlined in this paper according to a suggested LA Taxonomy that classifies
the LA duty from a functional perspective. The aim is to provide an idea of the
LA toil, its research lines, and trends to inspire the development of novel
approaches for improving teaching and learning practices. Furthermore, the
scope of this review covers recently published papers in prestigious journals and
conferences, where the works dated from 2016 are summarized and those
corresponding to 2014–2015 are cited according to the proposed LA taxonomy.
A glimpse is sketched of LA, where underlying elements frame the field
foundations to ground the approaches. Moreover, LA strengths, weaknesses,
challenges, and risks are highlighted to advise how the LA arena could be
enhanced and empowered. In addition, this review offers insight into the recent
LA labor, as well as motivates readers to enrich the LA achievements. This
work promotes the LA practice giving an account of the job being achieved and
reported in literature, as well as a reflection of the state-of-the-art and an
acumen vision to inspire future labor [3].

3. Topic: Enhancing learners’ experience in e-learning based scenarios using
intelligent tutoring systems and learning analytics: First results from a
perception survey

Author: R. M. M. F. Luis, M. Llamas-Nistal, and M. J. F. Iglesias

E-learning students tend to get jaded and easily drop out of online courses.
Enhancing the learners' experience and reducing dropout rates in these
e-learning based scenarios is the main purpose of this study. This paper presents
the results obtained so far and preliminary conclusions. In a first stage, the
objective was to study the background and state of the art of these educational
scenarios. In a second phase, the aim was to identify key reasons for dropouts, through a
survey and interviews, in order to understand and detect motives and
behavior patterns of students with dropout thoughts. Finally, developing, testing
and validating a functional prototype of an Intelligent Tutoring System will
make it possible to evaluate concepts, collect statistical information on its effectiveness,
and analyze whether course completion rates are improved [4].

4. Topic: Educational data mining and learning analytics for 21st century higher
education: A review and synthesis

Author: H. Aldowah, H. Al-Samarraie, and W. M. Fauzy


The potential influence of data mining analytics on the students’ learning
processes and outcomes has been realized in higher education. Hence, a
comprehensive review of educational data mining (EDM) and learning analytics
(LA) in higher education was conducted. This review covered the most relevant
studies related to four main dimensions: computer-supported learning analytics
(CSLA), computer-supported predictive analytics (CSPA), computer-supported
behavioral analytics (CSBA), and computer-supported visualization analytics
(CSVA) from 2000 till 2017. The relevant EDM and LA techniques were
identified and compared across these dimensions. Based on the results of 402
studies, it was found that specific EDM and LA techniques could offer the best
means of solving certain learning problems. Applying EDM and LA in higher
education can be useful in developing a student-focused strategy and providing
the required tools that institutions will be able to use for the purposes of
continuous improvement [5].

5. Topic: A systematic review on educational data mining

Author: A. Dutt, M. A. Ismail, and T. Herawan

Presently educational institutions compile and store huge volumes of data such
as student enrolment and attendance records, as well as their examination
results. Mining such data yields stimulating information that serves its handlers
well. Rapid growth in educational data points to the fact that distilling massive
amounts of data requires a more sophisticated set of algorithms. This issue led
to the emergence of the field of Educational Data Mining (EDM). Traditional
data mining algorithms cannot be directly applied to educational problems, as
they may have a specific objective and function. This implies that a
preprocessing algorithm has to be enforced first and only then some specific
data mining methods can be applied to the problems. One such preprocessing
algorithm in EDM is Clustering. Many studies on EDM have focused on the
application of various data mining algorithms to educational attributes.
Therefore, this paper provides a systematic literature review spanning over three
decades (1983-2016) on clustering algorithms and their applicability and usability in the
context of EDM. Future insights are outlined based on the literature reviewed,
and avenues for further research are identified [6].

6. Topic: Educational data mining: A review of the state of the art

Author: C. Romero and S. Ventura

Educational data mining (EDM) is an emerging interdisciplinary research area
that deals with the development of methods to explore data originating in an
educational context. EDM uses computational approaches to analyze
educational data in order to study educational questions. This paper surveys the
most relevant studies carried out in this field to date. First, it introduces EDM
and describes the different groups of users, types of educational environments,
and the data they provide. It then goes on to list the most typical/common tasks
in the educational environment that have been resolved through data-mining
techniques, and finally, some of the most promising future lines of research are
discussed [7].

7. Topic: The current landscape of learning analytics in higher education

Author: O. Viberg, M. Hatakka, O. Bälter, and A. Mavroudi

Learning analytics can improve learning practice by transforming the ways we
support learning processes. This study is based on the analysis of 252 papers on
learning analytics in higher education published between 2012 and 2018. The
main research question is: What is the current scientific knowledge about the
application of learning analytics in higher education? The focus is on research
approaches, methods and the evidence for learning analytics. The evidence was
examined in relation to four earlier validated propositions: whether learning
analytics i) improve learning outcomes, ii) support learning and teaching, iii)
are deployed widely, and iv) are used ethically. The results demonstrate that
overall there is little evidence that shows improvements in students' learning
outcomes (9%) as well as learning support and teaching (35%). Similarly, little
evidence was found for the third (6%) and the fourth (18%) propositions. Despite
the fact that the identified potential for improving learner practice is high, we
cannot currently see much transfer of the suggested potential into higher
educational practice over the years. However, the analysis of the existing
evidence for learning analytics indicates that there is a shift towards a deeper
understanding of students’ learning experiences in recent years [8].

8. Topic: Learning analytics: Drivers, developments and challenges

Author: R. Ferguson

Learning analytics is a significant area of technology-enhanced learning that has
emerged during the last decade. This review of the field begins with an
examination of the technological, educational and political factors that have
driven the development of analytics in educational settings. It goes on to chart
the emergence of learning analytics, including their origins in the 20th century,
the development of data-driven analytics, the rise of learning-focused
perspectives and the influence of national economic concerns. It next focuses on
the relationships between learning analytics, educational data mining and
academic analytics. Finally, it examines developing areas of learning analytics
research, and identifies a series of future challenges [9].

9. Topic: Penetrating the fog: Analytics in learning and education

Author: G. Siemens and P. Long

Attempts to imagine the future of education often emphasize new technologies:
ubiquitous computing devices, flexible classroom designs, and innovative
visual displays. But the most dramatic factor shaping the future of higher
education is something that we can’t actually touch or see: big data and
analytics. Basing decisions on data and evidence seems stunningly obvious, and
indeed, research indicates that data-driven decision-making improves
organizational output and productivity. For many leaders in higher education,
however, experience and “gut instinct” have a stronger pull.

Meanwhile, the move toward using data and evidence to make decisions is
transforming other fields. Notable is the shift from clinical practice to
evidence-based medicine in health care. The former relies on individual physicians basing
their treatment decisions on their personal experience with earlier patient
cases. The latter is about carefully designed data collection that builds up
evidence on which clinical decisions are based. Medicine is looking even
further toward computational modeling by using analytics to answer the simple
question “who will get sick?” and then acting on those predictions to assist
individuals in making lifestyle or health changes. Insurance companies also are
turning to predictive modeling to determine high-risk customers. Effective data
analysis can produce insight into how lifestyle choices and personal health
habits affect long-term risks. Business and governments too are jumping on the
analytics and data-driven decision-making trends, in the form of “business
intelligence” [10].

10. Topic: Guest editorial–learning and knowledge analytics

Author: G. Siemens and D. Gasevic

The early stages of the internet and world wide web drew attention to the
communication and connective capacities of global networks. The ability to
collaborate and interact with colleagues from around the world provided
academics with new models of teaching and learning. Today, online education
is a fast growing segment of the education sector. A side effect, to date not well
explored, of digital learning is the collection of data and analytics in order to
understand and inform teaching and learning. Learning analytics currently
sits at a crossroads between technical and social learning theory fields. On
the one hand, the algorithms that form recommender systems, personalization
models, and network analysis require deep technical expertise. The impact
of these algorithms, however, is felt in the social system of learning. As
a consequence, researchers in learning analytics have devoted
significant attention to bridging these gaps and bringing these communities in
contact with each other through conversations and conferences. The LAK12
conference in Vancouver, for example, included invited panels and presentations
from the educational data mining community.
The SoLAR steering committee also includes representation from the International
Educational Data Mining Society [11].

EXISTING SYSTEM:

One of the most interesting uses of data analysis in the educational field
is the exploration of data to discover patterns and derive knowledge. For this
purpose, it is useful to involve the human analyst in the process. Therefore,
interactive visual analytics, which blends information visualization and
advanced computational methods to provide a semi-automated analytical
process driven by interaction, is an interesting option. The ability of visual
analytics to augment data analysis with human perceptual and cognitive abilities
is valuable as a tool to manage educational data, because these techniques
allow people to discover trends, gaps or groups. Most applications of visual
analytics in education have been constrained to the analysis of data obtained
from the interaction of students with learning management systems and other
learning support platforms. For instance, interactive visualizations were used for
the analysis of the correlations between activity patterns in MOOCs (massive
open online courses) and dropout. Nevertheless, most previous works face
educational data analysis from a predictive perspective, aiming to forecast
future academic outcomes and to obtain a better understanding of the factors
that play a part in academic success. The factors related to students’
performance are still the subject of debate among educators, academics, and
policy makers. Some authors found that academic achievement is related to the
student’s ability and adaptation (also described in relation to motivation and
perseverance). The challenge is to acquire quantitative data for those factors,
because questionnaires could be used for that matter but students’ responses
might not reflect faithfully their latent abilities or attitudes. Other studies
examining this problem also point out that environmental factors such as
previous schooling, parents’ education or family income have a significant
effect on the students behavior. The institutional factors can also influence
academic success, specifically the degree of adaptation and support that the
institution provides, its structure, as well as the clarity on the communication of
expectations and requirements, such as the admissions criteria. In this sense, the
joint analysis of data from multiple sources of the university, such as academic
records, the activity on a LMS, the prior academic history or demographic
variables, has been used to predict the likelihood of being unsuccessful and the
retention rate. In any case, a non-trivial stage of data preprocessing is necessary,
where aspects such as the hierarchical structure, context, granularity and time
range of data must be considered . The goal behind the prediction of students’
performance is generally explanatory, i.e., to obtain a better understanding that
guides educational actions that would hopefully result in enhanced outcomes.
For that reason, sometimes performance prediction is rather posed as a
classification problem, either binary or with multiple classes, ranging from low
to high performance . That is also the case in the approach presented in , which
is also aimed at finding courses that are good predictors of students’
performance and their progression. In this application, it is necessary to find a
trade-off between classification performance and interpretability. Widely-used
classification techniques have been used for this purpose, including decision
trees , Bayesian networks, k-nearest neighbors , naïve Bayes and random
forests. The prediction of dropout, which pertains to the fact or risk of not
completing the degree due to academic failure, voluntary withdrawal or transfer
to other institution, is useful not only to help faculty in understanding its causes
but also to provide an early alert that might lead to corrective interventions.
Student retention is an important aim, because dropout has undesirable
consequences both for individuals and society. For that reason, dropout has
been extensively studied in the literature, trying to analyze its predicting
variables . Several factors are assumed to have an impact on the drop-out rate.
Among the external ones, one is the socio-economic environment, which
includes variables such as family income, fees, availability of financial support,
need for a supporting job, parents’ previous education, cultural differences or
social disadvantages . Apart from that, low performance in previous studies,
poor results at the first year or simultaneous enrollment in multiple programs
can be relevant factors of dropout. On the other hand, there are additional
internal factors, related to the student’s personality and development, including
at least the students’ general attitude towards studying, their confidence and
beliefs about themselves as learners, the anxiety with certain subjects, the
perception of value, the interest in a subject, and the enjoyment. Loss of
motivation is usually linked to situations where a student cannot master
fundamental concepts and skills, due to alienation or disengagement from
learning[4].
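To illustrate how performance prediction can be posed as a classification
problem with low, medium and high classes, the following is a minimal sketch of
a k-nearest neighbors classifier in plain Python. The two features (entry score
and first-year average), the toy records and the function name knn_predict are
hypothetical and are not taken from the studies discussed above.

```python
import math
from collections import Counter

# Hypothetical training records: (entry_score, first_year_average) -> class label.
TRAINING_DATA = [
    ((5.0, 4.5), "low"),
    ((5.5, 5.0), "low"),
    ((6.0, 4.8), "low"),
    ((6.5, 6.5), "medium"),
    ((7.0, 6.8), "medium"),
    ((7.2, 7.1), "medium"),
    ((8.5, 8.8), "high"),
    ((9.0, 9.2), "high"),
    ((9.3, 8.9), "high"),
]


def knn_predict(features, training_data, k=3):
    """Return the majority label among the k training records closest to features."""
    # Sort the training records by Euclidean distance to the query point.
    by_distance = sorted(training_data, key=lambda rec: math.dist(features, rec[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_predict((8.8, 9.0), TRAINING_DATA))
    print(knn_predict((5.2, 4.9), TRAINING_DATA))
```

A nearest-neighbor rule is easy to interpret, since each prediction can be
explained by pointing at the similar students it was derived from. That makes it
a reasonable starting point when interpretability matters as much as accuracy.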

EXISTING ARCHITECTURE:

SOFTWARE ENVIRONMENT:

Python is a general-purpose, interpreted, interactive, object-oriented,
high-level programming language. It was created by Guido van Rossum, who began
developing it in the late 1980s; the first public release followed in 1991.
Python source code is freely available under the Python Software Foundation
License, an open-source license that is compatible with the GNU General Public
License (GPL). This section gives a working overview of the Python programming
language.

Why Learn Python?

Python is a high-level, interpreted, interactive and object-oriented scripting
language. Python is designed to be highly readable. It uses English keywords
frequently whereas other languages use punctuation, and it has fewer
syntactical constructions than other languages.

Python is valuable for students and working professionals who want to become
capable software engineers, especially when they work in the web development
domain. Some of the key advantages of learning Python are:

• Python is Interpreted − Python is processed at runtime by the
  interpreter. You do not need to compile your program before executing
  it. This is similar to PERL and PHP.

• Python is Interactive − You can actually sit at a Python prompt and
  interact with the interpreter directly to write your programs.

• Python is Object-Oriented − Python supports Object-Oriented style or
  technique of programming that encapsulates code within objects.

• Python is a Beginner's Language − Python is a great language for
  beginner-level programmers and supports the development of a wide
  range of applications from simple text processing to WWW browsers to
  games.

Characteristics of Python

The following are important characteristics of Python programming −

• It supports functional and structured programming methods as well as
  OOP.

• It can be used as a scripting language or can be compiled to byte-code for
  building large applications.

• It provides very high-level dynamic data types and supports dynamic
  type checking.

• It supports automatic garbage collection.

• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and
  Java.
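As a small illustration of the first and third characteristics, the sketch
below computes the same average in a structured, a functional and an
object-oriented style, and then rebinds one name to a value of a different type.
The scores and all names in it are hypothetical examples.

```python
from functools import reduce

scores = [7.5, 8.0, 6.5, 9.0]  # hypothetical subject scores


# Structured style: an explicit loop with an accumulator.
def mean_structured(values):
    total = 0.0
    for value in values:
        total += value
    return total / len(values)


# Functional style: fold the list with reduce and a lambda.
def mean_functional(values):
    return reduce(lambda acc, value: acc + value, values, 0.0) / len(values)


# Object-oriented style: data and behaviour live together in a class.
class ScoreSheet:
    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


# Dynamic typing: the same name can be rebound to objects of different types,
# and types are checked at run time rather than at compile time.
result = mean_structured(scores)   # result refers to a float
result = f"{result:.2f}"           # result now refers to a str
print(result)
```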

Hello World using Python

As a first taste of Python, here is the conventional Hello World program:

print("Hello, Python!")

Applications of Python

Python is one of the most widely used languages on the web. The following
features make it suitable for a broad range of applications:

• Easy-to-learn − Python has few keywords, simple structure, and a
  clearly defined syntax. This allows the student to pick up the language
  quickly.

• Easy-to-read − Python code is more clearly defined and visible to the
  eyes.

• Easy-to-maintain − Python's source code is fairly easy to maintain.

• A broad standard library − The bulk of Python's library is very portable
  and cross-platform compatible on UNIX, Windows, and Macintosh.

• Interactive Mode − Python has support for an interactive mode which
  allows interactive testing and debugging of snippets of code.

• Portable − Python can run on a wide variety of hardware platforms and
  has the same interface on all platforms.

• Extendable − You can add low-level modules to the Python interpreter.
  These modules enable programmers to add to or customize their tools to
  be more efficient.

• Databases − Python provides interfaces to all major commercial
  databases.

• GUI Programming − Python supports GUI applications that can be
  created and ported to many system calls, libraries and window systems,
  such as Windows MFC, Macintosh, and the X Window system of Unix.

• Scalable − Python provides a better structure and support for large
  programs than shell scripting.

Audience

This section is intended for readers who need to learn the Python programming
language from scratch.

Prerequisites

You should have a basic understanding of computer programming terminology. A
basic understanding of any programming language is a plus.

History of Python

Python was developed by Guido van Rossum in the late eighties and early
nineties at the National Research Institute for Mathematics and Computer
Science in the Netherlands.

Python is derived from many other languages, including ABC, Modula-3, C,
C++, Algol-68, SmallTalk, and Unix shell and other scripting languages.

Python is copyrighted. Its source code is available under the Python Software
Foundation License, an open-source license that is compatible with the GNU
General Public License (GPL).

Python is now maintained by a core development team, and its intellectual
property is held by the Python Software Foundation. Guido van Rossum directed
its progress for many years.

Python is available on a wide variety of platforms including Linux and Mac OS
X. Let's understand how to set up our Python environment.

Local Environment Setup

Open a terminal window and type "python" to find out if it is already installed
and which version is installed. Python is available on the following platforms:

• Unix (Solaris, Linux, FreeBSD, AIX, HP/UX, SunOS, IRIX, etc.)
• Win 9x/NT/2000
• Macintosh (Intel, PPC, 68K)
• OS/2
• DOS (multiple versions)
• PalmOS
• Nokia mobile phones
• Windows CE
• Acorn/RISC OS
• BeOS
• Amiga
• VMS/OpenVMS
• QNX
• VxWorks
• Psion
• Python has also been ported to the Java and .NET virtual machines

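Besides typing "python" in a terminal, the installed version and platform can
be checked from inside Python itself. The following short sketch uses only the
standard sys and platform modules; the exact text printed depends on the
machine it runs on.

```python
import platform
import sys

# sys.version_info is a named tuple: (major, minor, micro, releaselevel, serial).
version = sys.version_info
print(f"Python {version.major}.{version.minor}.{version.micro}")

# platform.system() returns a name such as "Linux", "Windows" or "Darwin" (macOS).
print("Running on:", platform.system())

# Full path of the interpreter executable that is running this script.
print("Interpreter:", sys.executable)
```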
Getting Python

The most up-to-date and current source code, binaries, documentation, news,
etc., is available on the official website of Python https://www.python.org/

You can download Python documentation from https://www.python.org/doc/.
The documentation is available in HTML, PDF, and PostScript formats.

Installing Python

Python distributions are available for a wide variety of platforms. You need to
download only the binary code applicable for your platform and install Python.

If the binary code for your platform is not available, you need a C compiler to
compile the source code manually. Compiling the source code offers more
flexibility in terms of choice of features that you require in your installation.

Here is a quick overview of installing Python on various platforms −

Unix and Linux Installation

Here are the simple steps to install Python on a Unix/Linux machine.

• Open a Web browser and go to https://www.python.org/downloads/.
• Follow the link to download the zipped source code available for
  Unix/Linux.
• Download and extract the files.
• Edit the Modules/Setup file if you want to customize some options.
• Run the ./configure script.
• Run make.
• Run make install.

This installs Python at the standard location /usr/local/bin and its libraries
at /usr/local/lib/pythonXX, where XX is the version of Python.

Windows Installation

Here are the steps to install Python on a Windows machine.

• Open a Web browser and go to https://www.python.org/downloads/.
• Follow the link for the Windows installer python-XYZ.msi file, where
  XYZ is the version you need to install.
• To use this installer python-XYZ.msi, the Windows system must support
  Microsoft Installer 2.0. Save the installer file to your local machine and
  then run it to find out if your machine supports MSI.
• Run the downloaded file. This brings up the Python install wizard, which
  is really easy to use. Just accept the default settings, wait until the install
  is finished, and you are done.

Macintosh Installation

Recent Macs come with Python installed, but it may be several years out of
date. See http://www.python.org/download/mac/ for instructions on getting the
current version along with extra tools to support development on the Mac. For
older Mac OS's before Mac OS X 10.3 (released in 2003), MacPython is
available.

Jack Jansen maintains it and you can have full access to the entire
documentation at his website − http://www.cwi.nl/~jack/macpython.html. You
can find complete installation details for Mac OS there.

Setting up PATH

Programs and other executable files can be in many directories, so operating
systems provide a search path that lists the directories that the OS searches for
executables.

The path is stored in an environment variable, which is a named string
maintained by the operating system. This variable contains information
available to the command shell and other programs.

The path variable is named PATH in Unix or Path in Windows (Unix is
case sensitive; Windows is not).

In Mac OS, the installer handles the path details. To invoke the Python
interpreter from any particular directory, you must add the Python directory to
your path.

Setting path at Unix/Linux

To add the Python directory to the path for a particular session in Unix −

• In the csh shell − type setenv PATH "$PATH:/usr/local/bin/python" and
  press Enter.
• In the bash shell (Linux) − type export
  PATH="$PATH:/usr/local/bin/python" and press Enter.
• In the sh or ksh shell − type PATH="$PATH:/usr/local/bin/python" and
  press Enter.

Note − /usr/local/bin/python is the path of the Python directory.

Setting path at Windows

To add the Python directory to the path for a particular session in Windows −

At the command prompt − type path %path%;C:\Python and press Enter.

Note − C:\Python is the path of the Python directory
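To check the result of these settings, the search path can be inspected from
Python. The sketch below splits the PATH variable into its directories and asks
the standard shutil.which function where a command would be found. The command
names "python" and "python3" are only examples; either may be absent on a given
machine, in which case None is reported.

```python
import os
import shutil

# PATH is a single string; os.pathsep is ":" on Unix and ";" on Windows.
path_value = os.environ.get("PATH", "")
directories = [d for d in path_value.split(os.pathsep) if d]

print("Directories searched for executables:")
for directory in directories:
    print("  ", directory)

# shutil.which searches the PATH directories and returns the full path of the
# first match, or None if the command is not found.
for command in ("python", "python3"):
    print(command, "->", shutil.which(command))
```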

Python Environment Variables

Here are important environment variables, which can be recognized by Python −

Sr.No.  Variable & Description

1  PYTHONPATH
   It has a role similar to PATH. This variable tells the Python interpreter
   where to locate the module files imported into a program. It should include
   the Python source library directory and the directories containing Python
   source code. PYTHONPATH is sometimes preset by the Python installer.

2  PYTHONSTARTUP
   It contains the path of an initialization file containing Python source
   code. It is executed every time you start the interpreter in interactive
   mode. It is named .pythonrc.py in Unix and it contains commands that load
   utilities or modify PYTHONPATH.

3  PYTHONCASEOK
   It is used in Windows to instruct Python to find the first case-insensitive
   match in an import statement. Set this variable to any value to activate it.

4  PYTHONHOME
   It sets an alternative location for the standard Python libraries. It is
   usually embedded in the PYTHONSTARTUP or PYTHONPATH directories to make
   switching module libraries easy.
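The effect of PYTHONPATH can be observed directly. The following sketch starts
a child interpreter with PYTHONPATH pointing at a temporary directory (a
hypothetical stand-in for a folder of project modules) and checks that the
directory shows up in the child's module search path, sys.path.

```python
import os
import subprocess
import sys
import tempfile

# A temporary directory stands in for a folder of project modules (hypothetical).
with tempfile.TemporaryDirectory() as module_dir:
    # Copy the current environment and set PYTHONPATH for the child process only.
    env = dict(os.environ, PYTHONPATH=module_dir)

    # Ask a child interpreter to print its module search path, one entry per line.
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env, capture_output=True, text=True, check=True,
    )
    search_path = completed.stdout.splitlines()

    # Compare resolved paths, because temporary directories may be symlinked.
    target = os.path.realpath(module_dir)
    found = any(os.path.realpath(p) == target for p in search_path if p)
    print("PYTHONPATH entry present in sys.path:", found)
```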

Running Python

There are three different ways to start Python −

Interactive Interpreter

You can start Python from Unix, DOS, or any other system that provides you a
command-line interpreter or shell window.

Enter python at the command line.

Start coding right away in the interactive interpreter.

$ python # Unix/Linux (sh, bash)
or
% python # Unix/Linux (csh)
or
C:\> python # Windows/DOS

Here is the list of all the available command line options −

Sr.No.  Option & Description

1  -d
   It provides debug output.

2  -O
   It generates optimized bytecode by removing assert statements (this
   resulted in .pyo files in Python 2 and up to Python 3.4; later versions
   write .opt-1.pyc files).

3  -S
   Do not run import site to look for Python paths on startup.

4  -v
   Verbose output (detailed trace on import statements).

5  -X
   Disable class-based built-in exceptions (just use strings); obsolete
   starting with version 1.6. In Python 3, -X instead sets
   implementation-specific options.

6  -c cmd
   Run Python script sent in as cmd string.

7  file
   Run Python script from given file.
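Two of these options can be tried out from Python itself. The sketch below
launches the current interpreter as a child process, once with -c alone and
once with -O added, and prints what the child reports. The helper name
run_python is a hypothetical convenience wrapper around the standard
subprocess.run function.

```python
import subprocess
import sys


def run_python(*args):
    """Run the current interpreter with the given arguments and return its stdout."""
    completed = subprocess.run(
        [sys.executable, *args], capture_output=True, text=True, check=True
    )
    return completed.stdout.strip()


# -c runs the program passed as a string.
print(run_python("-c", "print(2 + 3)"))

# By default the built-in constant __debug__ is True and assert statements run.
print(run_python("-c", "print(__debug__)"))

# With -O, __debug__ is False and assert statements are removed.
print(run_python("-O", "-c", "print(__debug__)"))
```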

Script from the Command-line


A Python script can be executed at the command line by invoking the interpreter
on your application, as in the following −

$ python script.py # Unix/Linux (sh, bash)
or
% python script.py # Unix/Linux (csh)
or
C:\> python script.py # Windows/DOS

Note − Be sure the file permission mode allows execution.

Integrated Development Environment


You can run Python from a Graphical User Interface (GUI) environment as
well, if you have a GUI application on your system that supports Python.

• Unix − IDLE is the very first Unix IDE for Python.
• Windows − PythonWin is the first Windows interface for Python and is
  an IDE with a GUI.
• Macintosh − The Macintosh version of Python along with the IDLE IDE
  is available from the main website, downloadable as either MacBinary or
  BinHex'd files.

If you are not able to set up the environment properly, then you can take help
from your system admin. Make sure the Python environment is properly set up
and working perfectly fine.

Python has been an object-oriented language since its creation. Because of
this, creating and using classes and objects is straightforward. This section
introduces Python's object-oriented programming support. If you do not have any
previous experience with object-oriented (OO) programming, you may want to
consult an introductory course or tutorial on it so that you have a grasp of the
basic concepts.

However, here is a small introduction to Object-Oriented Programming (OOP) to
bring you up to speed −

Overview of OOP Terminology

• Class − A user-defined prototype for an object that defines a set of
  attributes that characterize any object of the class. The attributes are data
  members (class variables and instance variables) and methods, accessed
  via dot notation.
• Class variable − A variable that is shared by all instances of a class.
  Class variables are defined within a class but outside any of the class's
  methods. Class variables are not used as frequently as instance variables
  are.
• Data member − A class variable or instance variable that holds data
  associated with a class and its objects.
• Function overloading − The assignment of more than one behavior to a
  particular function. The operation performed varies by the types of
  objects or arguments involved.
• Instance variable − A variable that is defined inside a method and
  belongs only to the current instance of a class.
• Inheritance − The transfer of the characteristics of a class to other
  classes that are derived from it.
• Instance − An individual object of a certain class. An object obj that
  belongs to a class Circle, for example, is an instance of the class Circle.
• Instantiation − The creation of an instance of a class.
• Method − A special kind of function that is defined in a class definition.
• Object − A unique instance of a data structure that's defined by its class.
  An object comprises both data members (class variables and instance
  variables) and methods.
• Operator overloading − The assignment of more than one function to a
  particular operator.
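Several of the terms above can be seen together in one short example. The
classes Student and EngineeringStudent, and the names and marks used, are
hypothetical and serve only to show a class variable, instance variables, a
method, inheritance, instantiation and operator overloading.

```python
class Student:
    """A student with a name and a list of marks (hypothetical example class)."""

    count = 0  # class variable: shared by all instances

    def __init__(self, name, marks):
        # Instance variables: belong only to the object being created.
        self.name = name
        self.marks = list(marks)
        Student.count += 1

    def average(self):
        """Method: a function defined inside the class."""
        return sum(self.marks) / len(self.marks)

    def __lt__(self, other):
        """Operator overloading: gives '<' a meaning for Student objects."""
        return self.average() < other.average()


class EngineeringStudent(Student):
    """Inheritance: EngineeringStudent reuses everything defined in Student."""

    def __init__(self, name, marks, branch):
        super().__init__(name, marks)
        self.branch = branch


# Instantiation: each call creates a new instance (object) of the class.
alice = Student("Alice", [80, 90])
bob = EngineeringStudent("Bob", [70, 60], "Mechanical")

print(alice.average())   # attributes and methods are accessed via dot notation
print(bob < alice)       # uses the overloaded '<' operator
print(Student.count)     # the class variable counts both instances
```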

Creating Classes

The class statement creates a new class definition. The name of the class
immediately follows the keyword class followed by a colon as follows −

class ClassName:
    'Optional class documentation string'
    class_suite

• The class has a documentation string, which can be accessed
  via ClassName.__doc__.
• The class_suite consists of all the component statements defining class
  members, data attributes and functions.
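A concrete instance of this template might look as follows. The class name
Course and its members are hypothetical; the point is only to show where the
documentation string and the class suite sit, and how __doc__ is read.

```python
class Course:
    """Represents one subject of a degree programme (hypothetical example)."""

    # Class suite: the statements below define the members of the class.
    credits = 6  # data attribute shared by all instances

    def describe(self):
        """Return a short description of the course."""
        return f"Course worth {self.credits} credits"


# The documentation string is available through the __doc__ attribute.
print(Course.__doc__)
print(Course().describe())
```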

PROPOSED ARCHITECTURE:

[Figure: proposed architecture, divided into an analytic part and a
visualization part. Block labels: Student Input Dataset; Exploratory Analysis;
Pre-Process; Reinforcement Algorithm; Variable; Segmentation; Classification
Data; Regression Data; Random Forest Algorithm; Filtration; Trained Accuracy;
Predictive Model; Result Graph; Testing; Predictive Model Accuracy; Placement
Prediction; Drop-Out Prediction.]

PROPOSED SYSTEM:

Here, we propose student placement prediction using machine learning. The
existing work combined analytic prediction of student performance with
visualization. Our proposed work deals with two parts:

1) Analysis and visualization

2) Prediction

The analysis and visualization part analyses the students' marks and produces
graphs to aid understanding. The prediction part forecasts drop-out and
placement. Drop-out prediction, for graduates and other students, already
exists in the base paper, so here we focus only on placement prediction.
The dataset is collected from a particular institute and contains student
performance attributes (marks, physics, fitness, other activities, etc.).
First, the information is extracted from the dataset and pre-processed
(cleaned). The random forest algorithm is then used to train and test on the
dataset, because random forests give good results on classification datasets.
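The steps above (cleaning, training and testing) can be sketched in plain
Python as follows. This is not the project's implementation and not a full
random forest: it is a simplified stand-in that keeps the two core ideas,
bootstrap samples and random feature subsets, but uses one-level decision trees
(decision stumps) with a majority vote. The records, the column meanings and
all function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical records: (average mark, physics mark, fitness score, placed?).
# None marks a missing value that the cleaning step has to deal with.
RAW_RECORDS = [
    (82, 75, 7, 1), (45, 50, 5, 0), (91, 88, 8, 1), (38, None, 4, 0),
    (67, 70, 6, 1), (52, 48, 6, 0), (77, 80, 5, 1), (40, 42, 7, 0),
    (88, 79, 9, 1), (49, 55, 3, 0), (73, 68, 6, 1), (35, 40, 5, 0),
    (None, 72, 6, 1), (58, 60, 4, 0), (85, 90, 7, 1), (44, 39, 6, 0),
]


def clean(records):
    """Pre-processing: drop every record that contains a missing value."""
    return [r for r in records if None not in r]


def fit_stump(rows, feature_indices):
    """Find the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_indices:
        for threshold in sorted({r[f] for r in rows}):
            left = [r[-1] for r in rows if r[f] <= threshold]
            right = [r[-1] for r in rows if r[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def fit_ensemble(rows, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n_features = len(rows[0]) - 1
    stumps = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]      # bootstrap sample
        features = rng.sample(range(n_features), 2)    # random feature subset
        stumps.append(fit_stump(sample, features))
    return stumps


def predict(stumps, features):
    """Majority vote over all stumps."""
    votes = [left if features[f] <= t else right for f, t, left, right in stumps]
    return Counter(votes).most_common(1)[0][0]


rows = clean(RAW_RECORDS)
random.Random(42).shuffle(rows)
split = int(0.7 * len(rows))
train, test = rows[:split], rows[split:]

model = fit_ensemble(train)
correct = sum(predict(model, r[:-1]) == r[-1] for r in test)
print(f"Test accuracy: {correct}/{len(test)}")
```

With a real dataset, a library implementation of random forests would normally
replace the hand-written ensemble, and the accuracy would be estimated on a
much larger held-out set than the handful of toy records used here.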

SYSTEM SPECIFICATION:

HARDWARE REQUIREMENTS:

• System : i3 Processor
• Hard Disk : 500 GB
• Monitor : 15 inch VGA Color
• Mouse : Logitech Mouse
• RAM : 4 GB
• Keyboard : Standard Keyboard

SOFTWARE REQUIREMENTS:

• Operating System : Windows
• Platform : Python
• Tool : Jupyter Notebook
• Back End : Python (Anaconda) scripts

CONCLUSION:

This paper presents a Data Mining software tool for student profiling, providing
support to tutoring staff without a data scientist background. The presented tool
is focused on the analysis and forecasting of students’ performance, in terms of
the observable scores and of the completion of studies. The study has focused on
Engineering Bachelor degree programs currently running at higher education
institutions from 5 different countries of the European Union, with different
sizes and degrees taught in 7 languages. For those reasons, the considered
variables are those commonly found in the administrative records of the
students (student’s explanatory variables, student’s performance and
information about subjects and degrees) and analyses are aimed to provide a
global, degree-wide view of performance, instead of a course-wise one. The data
structure has been kept simple enough to be applicable to diverse institutions. It
would also be useful to add further information about classroom attendance and
results at the course level, obtained from learning management systems, to the
analysis. Including these variables is a prerequisite to study, for instance, the
effects of teaching methodologies.

REFERENCES:

[1] J. J. Vossensteyn, A. Kottmann, B. W. Jongbloed, F. Kaiser, L. Cremonini,
B. Stensaker, E. Hovdhaugen, and S. Wollscheid, “Dropout and completion in
higher education in Europe,” Eur. Union, Brussels, Belgium, Tech. Rep.
NC-04-15-779-EN-N, 2015.
Website: https://research.utwente.nl/en/publications/dropout-and-completion-in-higher-education-in-europe-main-report

[2] C. Lang, G. Siemens, A. Wise, and D. Gasevic, Handbook of Learning
Analytics. Beaumont, AB, Canada: SOLAR, Society Learning Analytics
Research, 2017, doi: 10.18608/hla17.

[3] A. Peña-Ayala, “Learning analytics: A glance of evolution, status, and
trends according to a proposed taxonomy,” Wiley Interdiscipl. Rev., Data
Mining Knowl. Discovery, vol. 8, no. 3, p. e1243, May 2018.
Website: https://www.researchgate.net/publication/322815971_Learning_analytics_A_glance_of_evolution_status_and_trends_according_to_a_proposed_taxonomy

[4] R. M. M. F. Luis, M. Llamas-Nistal, and M. J. F. Iglesias, “Enhancing
learners’ experience in e-learning based scenarios using intelligent tutoring
systems and learning analytics: First results from a perception survey,” in Proc.
12th Iberian Conf. Inf. Syst. Technol. (CISTI), Jun. 2017, pp. 1–4.
Website: https://www.researchgate.net/publication/318416411_Enhancing_learners'_experience_in_e-learning_based_scenarios_using_Intelligent_tutoring_systems_and_learning_analytics_First_results_from_a_perception_survey/link/60586de492851cd8ce5ab389/download

[5] H. Aldowah, H. Al-Samarraie, and W. M. Fauzy, “Educational data mining
and learning analytics for 21st century higher education: A review and
synthesis,” Telematics Informat., vol. 37, pp. 13–49, Apr. 2019.
Website: https://www.semanticscholar.org/paper/Educational-data-mining-and-learning-analytics-for-Aldowah-Al-Samarraie/6f715e8bbdc69840eb6fe40357b092739da02f12

[6] A. Dutt, M. A. Ismail, and T. Herawan, “A systematic review on
educational data mining,” IEEE Access, vol. 5, pp. 15991–16005, 2017.
Website: https://www.researchgate.net/publication/312509093_A_Systematic_Review_on_Educational_Data_Mining

[7] C. Romero and S. Ventura, “Educational data mining: A review of the state
of the art,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 40, no. 6, pp.
601–618, Nov. 2010.
Website: https://www.researchgate.net/publication/224160756_Educational_Data_Mining_A_Review_of_the_State_of_the_Art

[8] O. Viberg, M. Hatakka, O. Bälter, and A. Mavroudi, “The current
landscape of learning analytics in higher education,” Comput. Hum. Behav.,
vol. 89, pp. 98–110, Dec. 2018.

[9] R. Ferguson, “Learning analytics: Drivers, developments and challenges,”
Int. J. Technol. Enhanced Learn., vol. 4, nos. 5–6, pp. 304–317, 2012.

[10] G. Siemens and P. Long, “Penetrating the fog: Analytics in learning and
education,” EDUCAUSE Rev., vol. 46, no. 5, pp. 31–40, 2011.

[11] G. Siemens and D. Gasevic, “Guest editorial–learning and knowledge
analytics,” Educ. Technol. Soc., vol. 15, no. 3, pp. 1–3, 2012.

[12] J. T. Avella, M. Kebritchi, S. G. Nunn, and T. Kanai, “Learning analytics
methods, benefits, and challenges in higher education: A systematic literature
review,” Online Learn., vol. 20, no. 2, pp. 13–29, 2016.

[13] C. Romero and S. Ventura, “Data mining in education,” Wiley
Interdiscipl. Rev., Data Mining Knowl. Discovery, vol. 3, no. 1, pp. 12–27,
2013.

[14] R. S. J. D. Baker and K. Yacef, “The state of educational data mining in
2009: A review and future visions,” J. Edu. Data Mining, vol. 1, no. 1,
pp. 3–17, 2009.

[15] C. Vieira, P. Parsons, and V. Byrd, “Visual learning analytics of
educational data: A systematic literature review and research agenda,” Comput.
Edu., vol. 122, pp. 119–135, Jul. 2018.

[16] C. C. Gray and D. Perkins, “Utilizing early engagement and machine
learning to predict student outcomes,” Comput. Edu., vol. 131, pp. 22–32, Apr.
2019.

[17] A. Ortigosa, R. M. Carro, J. Bravo-Agapito, D. Lizcano, J. J. Alcolea, and
O. Blanco, “From lab to production: Lessons learnt and real-life challenges of
an early student-dropout prevention system,” IEEE Trans. Learn. Technol., vol.
12, no. 2, pp. 264–277, Apr. 2019.

[18] R. Ferguson, L. P. Macfadyen, D. Clow, B. Tynan, S. Alexander, and S.
Dawson, “Setting learning analytics in context: Overcoming the barriers to
large-scale adoption,” J. Learn. Anal., vol. 1, no. 3, pp. 120–144, Sep. 2014.

[19] R. Vilanova, M. Dominguez, J. Vicario, M. Prada, M. Barbu, M. Varanda,
P. Alves, M. Podpora, U. Spagnolini, and A. Paganoni, “Data-driven tool for
monitoring of students performance,” IFAC-PapersOnLine, vol. 52, no. 9, pp.
190–195, 2019.

[20] M. Barbu, R. Vilanova, J. Lopez Vicario, M. J. Varanda, P. Alves, M.
Podpora, M. A. Prada, A. Morán, A. Torreburno, S. Marin, and R. Tocu,
“Data mining tool for academic data exploitation: Literature review and first
architecture proposal,” Erasmus+ KA2 / KA203 project SPEET (Student
Profile for Enhancing Engineering Tutoring), Instituto Politécnico de Bragança,
Bragança, Portugal, Tech. Rep. SPEET-IO1, 2017.

[21] D. Keim, G. Andrienko, J.-D. Fekete, C. Görg, J. Kohlhammer, and G.
Melançon, “Visual analytics: Definition, process, and challenges,” in
Information Visualization. Berlin, Germany: Springer, 2008, pp. 154–175.

[22] Y. Chen, Q. Chen, M. Zhao, S. Boyer, K. Veeramachaneni, and H. Qu,
“DropoutSeer: Visualizing learning patterns in massive open online courses for
dropout reasoning and prediction,” in Proc. IEEE Conf. Vis. Anal. Sci.
Technol. (VAST), Oct. 2016, pp. 111–120.

[23] J. Zimmerman, K. H. Brodersen, H. R. Heinimann, and J. M. Buhmann,
“A model-based approach to predicting graduate-level performance using
indicators of undergraduate-level performance,” J. Edu. Data Mining, vol. 7,
no. 3, pp. 151–176, 2015.

[24] V. Tinto, ‘‘Student attrition and retention,’’ in The Encyclopedia of


Higher Education, vol. 3. Oxford, U.K.: Pergamon Press, 1992, pp. 1697–1709.

[25] K. E. Arnold and M. D. Pistilli, ‘‘Course signals at purdue: Using learning


analytics to increase student success,’’ in Proc. 2nd Int. Conf. Learn. Analytics
Knowl. (LAK), 2012, pp. 267–270.

[26] P. Strecht, L. Cruz, C. Soares, J. Mendes-Moreira, and R. Abreu, ‘‘A


comparative study of classification and regression algorithms for modelling
students’ academic performance,’’ in Proc. 8th Int. Conf. Educ. Data Mining,
Madrid, Spain, Jun. 2015, pp. 392–395.

[27] N. T. Nghe, P. Janecek, and P. Haddawy, ‘‘A comparative analysis of


techniques for predicting academic performance,’’ in Proc. 37th Annu. Frontiers
Edu. Conf.-Global Eng., Knowl. Borders, Opportunities Passports, Oct. 2007,
pp. 7–12.

[28] E. Pathros Ibarra Garcia and P. Medina Mora, ‘‘Model prediction of


academic performance for first year students,’’ in Proc. 10th Mex. Int. Conf.
Artif. Intell., Dec. 2011, pp. 169–174.

[29] D. Kabakchieva, ‘‘Predicting student performance by using data mining
methods for classification,’’ Cybern. Inf. Technol., vol. 13, no. 1, pp. 61–72,
2013.

[30] R. Asif, A. Merceron, S. A. Ali, and N. G. Haider, ‘‘Analyzing
undergraduate students’ performance using educational data mining,’’ Comput.
Edu., vol. 113, pp. 177–194, Oct. 2017.
[31] A. U. Khasanah, ‘‘A comparative study to predict student’s performance
using educational data mining techniques,’’ in Proc. IOP Conf. Ser., Mater. Sci.
Eng., 2017, vol. 215, no. 1, Art. no. 012036.

[32] G. Lassibille and L. N. Gómez, ‘‘Why do higher education students drop
out? Evidence from Spain,’’ Edu. Econ., vol. 16, no. 1, pp. 89–105, 2008.

[33] C. Aina, ‘‘Parental background and university dropout in Italy,’’ Higher
Edu., vol. 65, no. 4, pp. 437–456, Apr. 2013.

[34] Regulation (EU) 2016/679 of 27 April 2016 on the Protection of Natural
Persons With Regard to the Processing of Personal Data and on the Free
Movement of Such Data, and Repealing Directive 95/46/EC (General Data
Protection Regulation), Eur. Parliament Council Eur. Union, Brussels, Belgium,
2016.

[35] D. Ifenthaler and C. Widanapathirana, ‘‘Development and validation of a
learning analytics framework: Two case studies using support vector
machines,’’ Technol., Knowl. Learn., vol. 19, nos. 1–2, pp. 221–240, Jul. 2014.

[36] J. Kuzilek, M. Hlosta, and Z. Zdrahal, ‘‘Open university learning analytics
dataset,’’ Sci. Data, vol. 4, 2017, Art. no. 170171, doi: 10.1038/sdata.2017.171.

[37] S. Rovira, E. Puertas, and L. Igual, ‘‘Data-driven system to predict
academic grades and dropout,’’ PLoS ONE, vol. 12, no. 2, Feb. 2017, Art. no.
e0171207.

[38] R. Xu and D. Wunsch, ‘‘Survey of clustering algorithms,’’ IEEE Trans.
Neural Netw., vol. 16, no. 3, pp. 645–678, May 2005.

[39] J. MacQueen, ‘‘Some methods for classification and analysis of
multivariate observations,’’ in Proc. 5th Berkeley Symp. Math. Statist. Probab.,
Statist., L. LeCam and J. Neyman, Eds. Berkeley, CA, USA: Univ. California
Press, 1967, pp. 281–297.
[40] R. Vilanova, J. Vicario, M. Prada, M. Barbu, M. Dominguez, M. J. Pereira,
M. Podpora, U. Spagnolini, P. Alves, and A. Paganoni, ‘‘SPEET: Software
tools for academic data analysis,’’ in Proc. EDULEARN Int. Conf. Edu. New
Learn. Technol., 2018, pp. 1–10.

[41] L. Van Der Maaten, E. Postma, and J. Van den Herik, ‘‘Dimensionality
reduction: A comparative review,’’ Tilburg Centre Creative Comput., Tilburg
Univ., Tilburg, The Netherlands, Tech. Rep. TiCC TR 2009-005, 2009.

[42] S. Wold, K. Esbensen, and P. Geladi, ‘‘Principal component analysis,’’
Chemometrics Intell. Lab. Syst., vol. 2, nos. 1–3, pp. 37–52, 1987.

[43] L. van der Maaten and G. Hinton, ‘‘Visualizing data using t-SNE,’’ J.
Mach. Learn. Res., vol. 9, pp. 2579–2605, Nov. 2008.

[44] M. Prada, A. Domínguez, A. Morán, R. Vilanova, J. L. Vicario, M. J.
Varanda, P. Alves, M. Podpora, M. Barbu, A. Torrebruno, U. Spagnolini, and
A. Paganoni, ‘‘Data mining tool for academic data exploitation: Graphical data
analysis and visualization,’’ Erasmus+ KA2 / KA203 project SPEET—Student
Profile for Enhancing Engineering Tutoring, Instituto Politécnico de Bragança,
Bragança, Portugal, Tech. Rep. SPEET-IO3, 2018.

[45] M. Domínguez, R. Vilanova, M. Prada, J. Vicario, M. Barbu, M. J.
Pereira, M. Podpora, U. Spagnolini, P. Alves, and A. Paganoni, ‘‘SPEET:
Visual data analysis of engineering students performance from academic data,’’
in Proc. Learn. Anal. Summer Inst. Spain, 2018, pp. 50–61.

[46] I. Díaz Blanco, A. A. Cuadrado Vega, D. Pérez López, M. Domínguez
González, S. Alonso Castro, and M. Á. Prada Medrano, ‘‘Energy analytics in
public buildings using interactive histograms,’’ Energy Buildings, vol. 134, pp.
94–104, Jan. 2017.
[47] J. Lopez Vicario, R. Vilanova, M. Bazzarelli, A. Paganoni, U. Spagnolini,
A. Torrebruno, M. A. Prada, A. Morán, M. Domínguez, M. J. Varanda, P.
Alves, M. Podpora, and M. Barbu, ‘‘Data mining tool for academic data
exploitation: Selection of most suitable algorithms,’’ Erasmus+ KA2 / KA203
project SPEET—Student Profile for Enhancing Engineering Tutoring, Instituto
Politécnico de Bragança, Bragança, Portugal, Tech. Rep. SPEET-IO2, 2018.

[48] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector
Machines and Other Kernel-Based Learning Methods. Cambridge, U.K.:
Cambridge Univ. Press, 2000.

[49] C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY,
USA: Springer, 2006.

[50] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical
Learning: Data Mining, Inference, and Prediction. New York, NY, USA:
Springer-Verlag, 2001.

[51] L. Fontana and A. M. Paganoni, ‘‘Analysis of dropout in engineering BSc
using logistic mixed-effect models,’’ in XLIX Scientific Meeting of the Italian
Statistical Society. London, U.K.: Pearson, 2018.

[52] C. E. McCulloch and J. M. Neuhaus, ‘‘Generalized linear mixed models,’’
in Encyclopedia of Biostatistics, vol. 4. Hoboken, NJ, USA: Wiley, 2005.

[53] M. Barbu, R. Vilanova, J. Lopez Vicario, M. J. Varanda, P. Alves, M.
Podpora, A. Kawala-Janik, M. A. Prada, M. Domínguez, U. Spagnolini, and L.
Fontana, ‘‘Data mining tool for academic data exploitation: Publication report
on engineering students profiles,’’ Erasmus+ KA2 / KA203 project SPEET—
Student Profile for Enhancing Engineering Tutoring, Instituto Politécnico de Bragança,
Bragança, Portugal, Tech. Rep. SPEET-IO4, 2019.
[54] W. McKinney, Python for Data Analysis: Data Wrangling with Pandas,
NumPy, and IPython. Newton, MA, USA: O’Reilly Media, Inc, 2012.

[55] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O.
Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, and J. Vanderplas,
‘‘Scikit-learn: Machine learning in Python,’’ J. Mach. Learn. Res., vol. 12, pp.
2825–2830, Oct. 2011.

[56] M. Bostock, V. Ogievetsky, and J. Heer, ‘‘D3 data-driven documents,’’
IEEE Trans. Vis. Comput. Graphics, vol. 17, no. 12, pp. 2301–2309, Dec. 2011.

[57] G. W. Dekker, M. Pechenizkiy, and J. M. Vleeshouwers, ‘‘Predicting
students drop out: A case study,’’ in Proc. Int. Work. Group Educ. Data Mining,
2009, pp. 1–10.

[58] U. Spagnolini, L. Fontana, A. Paganoni, A. Torrebruno, M. A. Prada, M.
Domínguez, A. Morán, R. Vilanova, J. Lopez Vicario, M. J. Varanda, P. Alves,
M. Podpora, and M. Barbu, ‘‘Data mining tool for academic data exploitation:
Webtool description and usage,’’ Erasmus+ KA2 / KA203 project SPEET—
Student Profile for Enhancing Engineering Tutoring, Instituto Politécnico de
Bragança, Bragança, Portugal, Tech. Rep. SPEET-IO5, 2019.
