Article
Sustainable e-Learning by
Data Mining—Successful
Results in a Chilean University
https://doi.org/10.3390/su15020895
sustainability
Aurora Sánchez 1 , Cristian Vidal-Silva 2, *, Gabriela Mancilla 1 , Miguel Tupac-Yupanqui 3 and José M. Rubio 4
1 Department of Administration, Universidad Católica del Norte, Angamos 0610, Antofagasta 1270709, Chile
2 Faculty of Engineering, School of Videogame Development and Virtual Reality Engineering,
University of Talca, Talca 3460000, Chile
3 EAP, Ingeniería de Sistemas e Informática, Universidad Continental, Huancayo 12000, Peru
4 Escuela de Computación e Informática, Facultad de Ingeniería, Ciencia y Tecnología, Universidad Bernardo
O’Higgins, Santiago 8370993, Chile
* Correspondence: cvidal@utalca.cl; Tel.: +56-9-62002702
Abstract: People are increasingly open to online education, mainly because it breaks the distance and
time barriers of face-to-face education. This type of education is sustainable at all levels, and its
relevance has increased even more during the pandemic. Consequently, educational institutions are
saving large volumes of data containing relevant information about their operations, but they do not
know why students succeed or fail. The Knowledge Discovery in Databases (KDD) process could
support this challenge by extracting innovative models to identify the main patterns and factors
that could affect the success of their students in online education programs. This work uses the
CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology to analyze data from
the Distance Education Center of the Universidad Católica del Norte (DEC-UCN) from 2000 to 2018.
CRISP-DM was chosen because it represents a proven process that integrates multiple methodologies
to provide an effective meta-process for data knowledge projects. DEC-UCN is one of the first centers
to implement online learning in Chile, and this study analyses 18,610 records in this period. The
study applies data mining, the most critical KDD phase, to find hidden data patterns to identify
the variables associated with students’ success in online learning (e-learning) programs. This study
found that the main variables explaining student success in e-learning programs are age, gender,
degree study, educational level, and locality.
Keywords: CRISP-DM; e-learning; data mining; KDD; DEC-UCN; students’ success
Citation: Sánchez, A.; Vidal-Silva, C.; Mancilla, G.; Tupac-Yupanqui, M.; Rubio, J.M. Sustainable
e-Learning by Data Mining—Successful Results in a Chilean University. Sustainability 2023, 15, 895.
https://doi.org/10.3390/su15020895
Academic Editor: Tarah Wright
Received: 11 October 2022
Revised: 10 December 2022
Accepted: 20 December 2022
Published: 4 January 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
Current advances in education and technology make it easier for people to develop competencies
in defined areas at home [1]. As [2] highlight, online learning is a model that has revolutionized
education thanks to the inclusion of Information and Communication Technologies (ICTs) and the
growth of Educational Data Mining (EDM). Educational institutions pay attention to this revolution
in order to use new methodologies in the educational process [3]. Multiple studies evaluate the
success of online learning technology platforms, mainly based on the DeLone and McLean information
systems success model, to measure and assess the success and sustainability of electronic learning
systems [4,5]. However, despite the rapid growth of online learning and EDM, institutions that offer
online courses face many problems, and the variables that affect student success in distance
education are still unknown.
Tools for identifying behavioral data patterns and factors for the success of online learning
already exist [6]. This study addresses the gap regarding the variables for student success in
e-learning programs by adapting the data mining methodology CRISP-DM (Cross-Industry Standard
Process for Data Mining) to discover the variables of success [7]. E-learning could readily meet the
needs, features, and requirements of potential students who select this modality of study [8–10],
even more so during the pandemic [11]. No previous research exists in South American countries that
identifies determinants of student success in e-learning programs.
Knowledge Discovery in Databases (KDD), commonly known as data mining [12,13],
is a process for pattern discovery and predictive modeling in large databases [14].
KDD makes extensive use of data mining methods, automated techniques, and algorithms
for pattern recognition and identifying hidden patterns in e-learning environments [15].
Characteristically, data mining uses machine learning methods developed in the domain of
artificial intelligence [16]. Data mining uses statistical, mathematical, artificial intelligence,
and machine learning techniques to extract and identify pertinent information and related
knowledge hidden in large volumes of raw data [17]. Data mining is technically the process
of finding correlations or patterns among thousands of fields in large databases [15]. Data
mining finds these patterns and relationships by using data analysis tools, model-building
techniques, and machine learning [18,19].
Data mining comprises various techniques for pre-processing, analyzing, and interpreting data.
Most researchers in the area agree that these techniques can be organized into pattern
recognition and machine learning. Pattern recognition aims to identify implicit objects
and relations, and machine learning techniques are mainly applied to extract generalized
knowledge from data for use in prediction tasks. Researchers can use classification techniques
in data mining to predict group membership for data occurrences. Consequently,
data mining involves more than collecting and managing data because it includes analysis
and prediction. Classification techniques allow the processing of a wider variety of data
than regression, and they are growing in popularity. There is a great variety of algorithms
for classification purposes. Scheuer and McLaren [20] propose a model to identify the
most influential factors that predict student academic performance. They predict students’
passing or failing status by considering and defining their academic performance (high,
medium, or low).
Educational Data Mining (EDM) is concerned with developing, investigating, and applying
machine learning, data mining, and statistical methods to detect patterns in extensive
collections of data from educational institutions that would otherwise be impossible to
analyze using traditional computing techniques [2,10,21]. In this sense, the use of deep
learning techniques has emerged in EDM in recent years. Hence, developing data mining
competencies is itself a research area. For example, the work of [22] presents positive
experiences in enhancing knowledge acquisition about data mining through a game-based
approach. Regarding the implementation of EDM systems, the work of Almaiah et al. [23]
discusses traditional issues and the success of using modern programming languages.
• First, this article identifies potential variables for success or failure in e-learning
programs, not only academic factors, through a systematic literature review.
• Second, this article defines a repeatable data mining application for identifying students’
success patterns in e-learning environments using a large set of data. This
analysis was not feasible with other methods.
• Third, this article provides an example of the use of multi-year historical data, starting
when e-learning programs became a phenomenon in Chile and other countries
(the year 2000). Other institutions in the region could repeat this application.
The remainder of this paper is organized as follows. Section 2 defines the main
concepts of e-learning, CRISP-DM, the data mining process, and previous data mining in
education experiences. Section 3 describes the applied methodology and case study data:
we define the main steps of the data mining process, the data source, concepts, and expected
results. After that, Section 4 highlights the results obtained to validate our hypothesis. Section 5
describes the usefulness of this work for similar contexts, above all for online educational
institutions and programs, concerning which variables are relevant to consider. The paper
concludes with a discussion of future work in Section 6.
The application of data mining techniques has two primary purposes: building models
and detecting patterns [30]. Model building seeks to produce a summary of the data set
to identify and describe its main characteristics. Pattern detection seeks to identify small
deviations from the norm to detect unusual behavior patterns, through the discovery of
patterns and rules and through content searches. When it is not possible to build models for
the data set, one can look for behavior patterns. Pattern and rule discovery seeks frequent
combinations and associations of attributes found in database transactions (for example,
products purchased together). Techniques based on association rules usually address that issue.
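As a minimal illustration of the pattern and rule discovery described above, the following Python sketch computes the support and confidence of a candidate association rule over a toy set of transactions. The transactions and the helper names (`support`, `confidence`) are hypothetical; they are not data or code from this study.

```python
from typing import FrozenSet, List

# Hypothetical toy transactions, for illustration only (not data from this study).
transactions: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(itemset: FrozenSet[str], db: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               db: List[FrozenSet[str]]) -> float:
    """Share of transactions containing the antecedent that also contain the consequent."""
    return support(antecedent | consequent, db) / support(antecedent, db)


# Rule "bread -> milk": bread appears in 3 of 4 transactions, bread and milk together in 2 of 4.
rule_support = support(frozenset({"bread", "milk"}), transactions)
rule_confidence = confidence(frozenset({"bread"}), frozenset({"milk"}), transactions)
```

An association rule miner would typically keep only the rules whose support and confidence exceed chosen thresholds; the thresholds themselves are a modeling decision and are not specified here.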
2.1. CRISP-DM
The CRISP-DM method is one of the most efficient methodologies for developing projects
that apply data mining [31,32]. The objective of CRISP-DM is to allow the use of
a common vocabulary, methodology, and tools in data mining activities. CRISP-DM
is organized into six phases, from general to specific tasks:
1. Business Understanding Phase: The first phase, the analysis of the problem, includes
understanding the project’s objectives and requirements from a business or institutional
perspective.
2. Data Comprehension Phase: The second phase, data analysis, includes the initial
data collection and the identification of the quality of the data.
3. Data Preparation Phase: This phase includes general data selection tasks for applying
modeling techniques (variables and samples), data cleaning, generation of additional
variables, integration of different data sources, and format changes.
4. Modeling Phase: In this phase, selecting the most appropriate modeling techniques
takes place to generate and evaluate the model. The parameters used in the model
generation depend on the characteristics of the data.
5. Evaluation Phase: In the evaluation phase, the model is evaluated, not from the data
point of view, but for fulfilling the problem’s success criteria. If the generated model
is valid based on the success criteria established in the first phase, the model is exploited.
6. Implementation Phase: At this stage, in addition to the implementation of the model,
the results must be presented and documented understandably, to achieve an increase
in knowledge.
Figure 2 [33] illustrates CRISP-DM stages.
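The six phases listed above can be sketched as an ordered sequence with a single decision point at evaluation. This is only an illustrative Python sketch: the `Phase` enum and the `next_phase` helper are hypothetical names, and the return to the first phase when the model misses the success criteria is an assumption of the sketch, since the text above only states what happens when the model is valid.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six CRISP-DM phases, named as in the list above."""
    BUSINESS_UNDERSTANDING = 1
    DATA_COMPREHENSION = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    IMPLEMENTATION = 6


def next_phase(current: Phase, model_meets_criteria: bool = True) -> Phase:
    """Return the phase that follows `current`.

    Assumption of this sketch: a model that fails the success criteria at
    evaluation sends the project back to business understanding.
    """
    if current is Phase.EVALUATION and not model_meets_criteria:
        return Phase.BUSINESS_UNDERSTANDING
    if current is Phase.IMPLEMENTATION:
        return Phase.IMPLEMENTATION  # last phase; nothing follows it
    return Phase(current + 1)
```

For example, `next_phase(Phase.EVALUATION, model_meets_criteria=False)` returns `Phase.BUSINESS_UNDERSTANDING`, while a valid model moves on to `Phase.IMPLEMENTATION`.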
Figure 3. Proposed model to predict the most influential factors of students at risk.
3. Methodology
This study seeks to determine the success of the online learning modality provided by
the Distance Education Center of the Universidad Católica del Norte (DEC-UCN), using
data mining to support the case analysis methodology and to learn about the initial conditions
of students in educational programs.
This study worked with data on the admission and final results of DEC-UCN students
between 2000 and 2018. The total number of students was 12,264. The study stages were
developed from the CRISP-DM model to analyze the database information and apply the
corresponding tools. We used particular data mining techniques and algorithms, such as
decision trees, descriptive statistics, and neural networks. The computational tool used was
SPSS Statistics 22 [57]. The benefit of this technique is that it provides an easy understanding
of data mining decision making.
they did not contain reliable information for the investigation. Figure 4 presents an extract
of the relational model of the ANTEC system.
Figure 4. Excerpt from the relational model of the ANTEC database system.
data selection and construction process was complete, the changes were saved to the
files for their use later in the modeling stage. The new files are in SPSS format.
3.4. Modeling
We carried out the data modeling for the DEC-UCN at a global level. In this study,
the classification model predicts the student profile associated with success in programs
with an online learning modality. Considering the research and results of [58–60], we
applied the AdaBoostM1 and J.48 decision tree algorithms along with the naive Bayes and
random forest algorithms for classification. The classification model takes as its dependent
variable “state”, which is a categorical variable, and treats the category “Graduated” as the
highest level of success of a student in the model. From this perspective, academic success is
measured by the students who reach the category “Graduated” in a program they started.
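The role of the dependent variable can be sketched in a few lines of Python. The snippet below marks each student record as successful only when its final state is “Graduated” and reports the share of each state. The function names and the example records are hypothetical, and the binary encoding is an assumption made for illustration; the study itself models “state” as a multi-category variable.

```python
from collections import Counter
from typing import Dict, List

# Category treated in the text as the highest level of success.
SUCCESS_STATE = "Graduated"


def encode_success(states: List[str]) -> List[int]:
    """Map each final state to 1 (Graduated) or 0 (any other state)."""
    return [1 if state == SUCCESS_STATE else 0 for state in states]


def state_shares(states: List[str]) -> Dict[str, float]:
    """Relative frequency of each final state."""
    counts = Counter(states)
    return {state: n / len(states) for state, n in counts.items()}


# Hypothetical records, for illustration only.
example_states = ["Graduated", "Removed", "Certified", "Graduated"]
example_labels = encode_success(example_states)
```

Checking the state shares before modeling is a simple way to see how unbalanced the target classes are, which matters when interpreting accuracy later on.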
For the formulation of the model, we applied neural networks to identify relationships
between the variables and determine their importance concerning the target variable.
For constructing the decision trees, we initially used two algorithms: (i) the C5.0 algorithm,
which presents rules that allow a clearer understanding of the generative partitioning; (ii) the
CHAID algorithm, which, from a statistical point of view (based on the significance of the
chi-square test), constructs the trees by comparing the categories and merging those that do
not present differences in their results. Subsequently, a decision tree algorithm is selected
based on the results obtained (case prediction) and the analysis of the construction of the
tree itself.
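To make the chi-square criterion behind CHAID concrete, the following Python sketch computes the Pearson chi-square statistic and the degrees of freedom for a contingency table of predictor categories against outcomes. It is an illustrative sketch only: it is not the SPSS implementation used in this study, it does not compute p-values or perform the category merging, and the example counts are hypothetical. It also assumes that no row or column total is zero.

```python
from typing import List


def chi_square(table: List[List[float]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def degrees_of_freedom(table: List[List[float]]) -> int:
    """(rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical counts: rows are two predictor categories,
# columns are (graduated, not graduated).
example_table = [[30, 10], [20, 40]]
example_statistic = chi_square(example_table)
```

Two categories whose rows have the same outcome proportions give a statistic of zero, which is the situation in which a CHAID-style procedure would treat them as indistinguishable.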
To assess the predictive performance of the models, the study established a confusion
matrix for each algorithm, which was necessary to calculate the metrics of Precision, Recall,
F1, Accuracy, and the Matthews correlation coefficient. Table 1 defines the procedure and
characteristics of those measures [61].
Metric: Definition
Precision: It measures the positive patterns that are correctly predicted out of the total
predicted patterns in a positive class [62].
Recall: It measures the fraction of positive patterns that are correctly classified [62].
Accuracy: It measures the ratio of correct predictions over the total number of instances
evaluated [63].
F1: Metric that represents the harmonic mean between recall and precision values [41].
Matthews correlation coefficient (MCC): Measure that is not affected by the problem of an
unbalanced dataset. MCC is a correlation coefficient between observed and predicted binary
rankings; it returns a value between −1 and +1. A coefficient of +1 represents a perfect
prediction, 0 is no better than a random prediction, and −1 indicates complete disagreement
between prediction and observation [64].
ROC curve: It is a graphic representation of the relationship between the true-positive and
false-positive ratios of the classifier. The area under the ROC curve (AUC) provides an approach
to evaluate which model is better on average. A model will be considered to discriminate better
than chance if the curve lies above the diagonal of no discrimination, i.e., if the AUC is
higher than 0.5 [65].
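The measures defined above can be computed directly from the four cells of a binary confusion matrix. The following Python sketch shows one way to do so; the function name `classification_metrics` and the example counts are hypothetical and do not come from this study. The sketch assumes that at least one positive prediction and one positive instance exist, so that precision and recall are defined.

```python
import math
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Precision, recall, accuracy, F1, and MCC from binary confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when its denominator is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
example_metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```

With these example counts, accuracy is 0.70 while MCC is about 0.41, which illustrates why MCC is a useful complement to accuracy when the classes are unbalanced.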
4. Results
The statistical results reveal the behavior patterns that influence students’ success and
failure in the online learning modality. The programs with the largest number of students
are Human Resources Administration, Environmental Management, Family Medication,
Psychopedagogy, Total Quality Management, Integrated Management, and Educational
Orientation (see Table 2).
Initially, we present the analysis of the data using decision trees. This analysis shows
that the first level of the tree identified the variable “type of programs” as the main predictor
of student success at the DEC-UCN, from left to right, from nodes 20 to 22. The type of
program with the highest percentage of graduates is Continuous Improvement. With a
p-value of 0.001, a chi-square of 66.4, and 8 degrees of freedom (df), we can observe that
students belonging to the Metropolitan, Magallanes, Tarapacá, and Bio Bio areas obtained
the highest percentage of graduated students with 76.8%, followed by the Aysén and Los
Lagos areas with 67.9%. Figure 5 illustrates the mentioned results.
The second type of program with the highest percentage of graduate students is
Training and Technical Courses, with 53.8%. In this program, students with non-university
professions obtained a larger portion of qualifications than students with university
professions from the art and health science areas. Students prefer this program due to its high
degree percentage compared to professional and technical courses (see Figure 6).
Figure 6. Decision Tree II: Second program with the highest percentage of graduates.
Figure 7. Decision Tree III: Program with the highest percentage of students eliminated.
The analysis of the data using neural networks gave additional information about
the main variables that could predict the success of students in e-learning programs. Table 3
shows that, when using neural networks to identify the variables, the model classifies
60.8% of the cases correctly. The model in Figure 8 showed that the determinant factors for
academic success for all programs, from the highest to the lowest, are age, program code,
profession, scholarity, type of program, region, and finally sex, according to the student’s
final academic situation in the most successful programs, which gives a reasonable first
approximation regarding the topic. The study compared the performance of
the classification algorithms used, as defined in the research model. These results indicated
that AdaBoostM1 and naive Bayes were the algorithms with the lowest performance.
Table 4 shows that their precision, recall, and F-measure indicators were comparatively low.
The AdaBoostM1 algorithm achieved a correct classification of 62.15% compared to naive
Bayes, with 61.7%. Their MCC values are also close to zero (0.118 and 0.007), so their
prediction is not much better than chance. The ROC values are also quite close to 0.5,
which is not an indication of good prediction. The tree J.48 and random forest algorithms
had the best results. The random forest algorithm stands out as the one with the best
result, with 64.5% of the instances correctly classified, achieving the best prediction of
graduate students. In addition, this algorithm obtains the best MCC value (0.222), indicating a
better relationship between the observed data and the prediction. Its ROC value of 0.652,
well above the rest of the algorithms, is also a sign of its good performance.
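The reading of ROC values relative to 0.5 can be illustrated with a small Python sketch that computes the area under the ROC curve as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `auc` and the example labels and scores are hypothetical; the values reported in Table 4 were not produced with this code.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = positive class) and classifier scores.
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

A classifier that gives every instance the same score obtains an AUC of exactly 0.5 under this definition, which is the chance level used as the reference point in the text.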
Table 3. Importance grade of variables in the global program type using neural networks.
Sample    Observed              Predicted
                                Act    Rem     Abn    Trans  Grad   Cert    Correct %
Training  Active (Act)          0      2       0      0      0      34      0.0%
Training  Removed (Rem)         0      6102    0      0      0      1140    84.3%
Training  Abnegated (Abn)       0      426     0      0      0      20      0.0%
Training  Transferred (Trans)   0      17      0      0      0      3       0.0%
Training  Graduated (Grad)      0      302     0      0      0      201     0.0%
Training  Certified (Cert)      0      2906    0      0      0      1720    37.2%
Training  Overall %             0.0%   75.8%   0.0%   0.0%   0.0%   24.2%   60.8%
Testing   Active (Act)          0      2       0      0      0      16      0.0%
Testing   Removed (Rem)         0      2565    0      0      0      479     84.3%
Testing   Abnegated (Abn)       0      181     0      0      0      8       0.0%
Testing   Transferred (Trans)   0      13      0      0      0      0       0.0%
Testing   Graduated (Grad)      0      103     0      0      0      97      0.0%
Testing   Certified (Cert)      0      1300    0      0      0      753     36.7%
Testing   Overall %             0.0%   75.0%   0.0%   0.0%   0.0%   24.5%   60.2%
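The percentages in the training block of Table 3 can be recomputed from its counts. The Python sketch below copies the training-sample rows as printed in the table and derives the per-state share of correct predictions and the overall share; the variable and function names are hypothetical and the code is only a reading aid, not part of the original analysis.

```python
from typing import List

# Training-sample counts as printed in Table 3
# (rows: observed state; columns: predicted state, in the same order).
STATES = ["Active", "Removed", "Abnegated", "Transferred", "Graduated", "Certified"]
TRAINING = [
    [0, 2, 0, 0, 0, 34],
    [0, 6102, 0, 0, 0, 1140],
    [0, 426, 0, 0, 0, 20],
    [0, 17, 0, 0, 0, 3],
    [0, 302, 0, 0, 0, 201],
    [0, 2906, 0, 0, 0, 1720],
]


def per_state_correct(matrix: List[List[int]]) -> List[float]:
    """Share of each observed state that was predicted correctly (diagonal / row total)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


def overall_correct(matrix: List[List[int]]) -> float:
    """Share of all cases on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


shares = dict(zip(STATES, per_state_correct(TRAINING)))
overall = overall_correct(TRAINING)
```

The recomputation also makes visible that, in this table, the network only ever predicts the states Removed or Certified, so no Graduated case is classified correctly even though the overall share of correct predictions is about 60.8%.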
Algorithm       TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area
AdaBoostM1      0.622    0.532    0.593      0.622   0.574      0.118  0.543     0.559
Naïve Bayes     0.617    0.616    0.548      0.617   0.475      0.007  0.535     0.554
Random Forest   0.645    0.435    0.634      0.645   0.636      0.222  0.652     0.658
TREE J.48       0.643    0.494    0.625      0.643   0.607      0.184  0.604     0.609
5. Discussion
Advances in technology now permit the massive application of data mining.
As Soria-Barreto et al. [66] remark, computing tools and technologies permit more effective
e-learning. In this research, we aimed to identify variables for student success or
failure in the e-learning programs at DEC-UCN by applying the CRISP-DM methodology,
which is one of the most widely used in this research field. We identified factors
that determined student success in online programs through the decision tree
and neural network techniques. Those results contribute to a greater understanding of
the factors involved in the contingent issue of distance education in Chile. Our study identified
the types of programs with the greatest success in terms of the student’s final academic
situation and the programs with the greatest failure. The programs with the greatest failure are
undergraduate and bachelor degrees, which require more time and dedication for their
completion. The number of programs without a degree continues to increase due to their
short-term characteristics.
This study is highly relevant for e-learning programs because it draws on data from the
database of the oldest online program in Chile. The database contained student records from
2000 to 2018 inclusive; that is, 18,610 records in nineteen years. We highlight that our
results found variables that determine the success and failure of students. Our study
established that student success and failure largely depend on age, sex, previous education,
job, and region. Understanding each program’s academic success factors is decisive for
the selection of students and the dissemination of the programs. These results support the
organization’s know-how to establish policies for disseminating programs and retaining students
in online learning modalities. The variables found are relevant for online education in
Chile and other neighboring countries because educational institutions can consider those
variables to organize their programs.
dropping out. We are currently working on applying big data techniques and data mining
for pattern discovery to compare their results and know the best approach.
Author Contributions: Formal analysis, M.T.-Y. and J.M.R.; Investigation, A.S., C.V.-S. and G.M.;
Data curation, M.T.-Y. and J.M.R.; Project administration, A.S. All authors have read and agreed to
the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data is part of the UCN database.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Coman, C.; Țîru, L.G.; Meseșan-Schmitz, L.; Stanciu, C.; Bularca, M.C. Online teaching and learning in higher education during
the coronavirus pandemic: Students’ perspective. Sustainability 2020, 12, 10367. [CrossRef]
2. Koedinger, K.R.; D’Mello, S.; McLaughlin, E.A.; Pardos, Z.A.; Rosé, C.P. Data mining and education. WIREs Cogn. Sci. 2015, 6,
333–353.
3. Asín, A.; Peinado, J.; Jurado, P. La sociedad del conocimiento y las TICs: Una inmejorable oportunidad para el cambio docente.
In Pixel-Bit: Revista de Medios y Educación Nº 34; Universidad de Sevilla: Seville, Spain, 2009; pp. 179–204, ISSN 1133-8482.
4. Delone, W.H.; McLean, E.R. The DeLone and McLean Model of Information Systems Success: A Ten-Year Update. J. Manag. Inf.
Syst. 2003, 19, 9–30.
5. Alsabawy, A.; Cater-Steel, A.; Soar, J. A Model to Measure E-Learning Systems Success. Meas. Organ. Inf. Syst. Success New
Technol. Pract. 2012, 39, 293–317. [CrossRef]
6. Herrera, M.; Ruiz, S.; Romagnano, M.R.; Ganga, L.; Lund, M.I.; Torres, E. Aplicando métodos y técnicas de la ciencia de los
datos a datos universitarios. In Proceedings of the XXI Workshop de Investigadores en Ciencias de la Computación (WICC 2019),
Universidad Nacional de San Juan, San Jose, Argentina, 21 October 2019.
7. Martínez-Plumed, F.; Contreras-Ochando, L.; Ferri, C.; Hernández Orallo, J.; Kull, M.; Lachiche, N.; Ramírez Quintana, M.J.;
Flach, P.A. CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories. IEEE Trans. Knowl. Data
Eng. 2019, 33, 3048–3061. [CrossRef]
8. Hussin, W.N.T.W.; Harun, J.; Shukor, N.A. A Review on the Classification of Students’ Interaction in Online Social Collaborative
Problem-based Learning Environment: How Can We Enhance the Students’ Online Interaction? Univ. J. Educ. Res. 2019,
7, 125–134. [CrossRef]
9. Fukuzawa, S.; Cahn, J. Technology in problem-based learning: Helpful or hindrance? Int. J. Inf. Learn. Technol. 2019, 36, 66–76.
[CrossRef]
10. Valverde-Berrocoso, J.; Garrido-Arroyo, M.d.C.; Burgos-Videla, C.; Morales-Cevallos, M.B. Trends in educational research about
e-learning: A systematic literature review (2009–2018). Sustainability 2020, 12, 5153. [CrossRef]
11. Ocaña, J.M.; Morales-Urrutia, E.K.; Pérez-Marín, D.; Pizarro, C. Can a learning companion be used to continue teaching
programming to children even during the COVID-19 pandemic? IEEE Access 2020, 8, 157840–157861. [CrossRef]
12. Palacios, C.A.; Reyes-Suárez, J.A.; Bearzotti, L.A.; Leiva, V.; Marchant, C. Knowledge Discovery for Higher Education Student
Retention Based on Data Mining: Machine Learning Algorithms and Case Study in Chile. Entropy 2021, 23, 485. [CrossRef]
13. Gao, P.; Wu, W.; Yang, Y. Discovering Themes and Trends in Digital Transformation and Innovation Research. J. Theor. Appl.
Electron. Commer. Res. 2022, 17, 1162–1184. [CrossRef]
14. Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P. From data mining to knowledge discovery in databases. AI Mag. 1996, 17, 37–54.
15. Nájera, A.B.U.; de la Calleja Mora, J. Brief review of educational applications using data mining and machine learning. Redie. Rev.
Electrón. De Investig. Educ. 2017, 19, 84–96.
16. Cummins, M.R. Nonhypothesis-driven research: Data mining and knowledge discovery. In Clinical Research Informatics; Springer:
Berlin/Heidelberg, Germany, 2019; pp. 341–356.
17. Sugiyarti, E.; Jasmi, K.A.; Basiron, B.; Huda, M.; Shankar, K.; Maseleno, A. Decision support system of scholarship grantee
selection using data mining. Int. J. Pure Appl. Math. 2018, 119, 2239–2249.
18. Witten, I.H.; Frank, E. Data mining: Practical machine learning tools and techniques with Java implementations. Acm Sigmod Rec.
2002, 31, 76–77. [CrossRef]
19. Ngo, T. Data mining: Practical machine learning tools and technique, by Ian H. Witten, Eibe Frank, Mark A. Hall. ACM SIGSOFT
Softw. Eng. Notes 2011, 36, 51–52. [CrossRef]
20. Scheuer, O.; McLaren, B.M. Educational data mining. Encycl. Sci. Learn. 2012, 1075–1079.
21. Hernández-Blanco, A.; Herrera-Flores, B.; Tomás, D.; Navarro-Colorado, B. A systematic review of deep learning approaches to
educational data mining. Complexity 2019, 2019, 1306039. [CrossRef]
22. Cengiz, M.; Birant, K.U.; Yildirim, P.; Birant, D. Development of an interactive game-based learning environment to teach data
mining. Int. J. Eng. Educ. 2017, 33, 1598–1617.
23. Almaiah, M.A.; Almulhem, A. A conceptual framework for determining the success factors of e-learning system implementation
using Delphi technique. J. Theor. Appl. Inf. Technol. 2018, 96, 5962–5976.
24. Almaiah, M.A.; Alyoussef, I.Y. Analysis of the effect of course design, course content support, course assessment and instructor
characteristics on the actual use of E-learning system. IEEE Access 2019, 7, 171907–171922. [CrossRef]
25. Almaiah, M.A.; Alismaiel, O.A. Examination of factors influencing the use of mobile learning system: An empirical study. Educ.
Inf. Technol. 2019, 24, 885–909. [CrossRef]
26. Almaiah, M.A.; Al-Khasawneh, A.; Althunibat, A. Exploring the critical challenges and factors influencing the E-learning system
usage during COVID-19 pandemic. Educ. Inf. Technol. 2020, 25, 5261–5280. [CrossRef] [PubMed]
27. Hendrickx, T.; Cule, B.; Meysman, P.; Naulaerts, S.; Laukens, K.; Goethals, B. Mining Association Rules in Graphs Based on
Frequent Cohesive Itemsets. In Proceedings of the Advances in Knowledge Discovery and Data Mining; Cao, T., Lim, E.P., Zhou, Z.H.,
Ho, T.B., Cheung, D., Motoda, H., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 637–648.
28. Moro, S.; Cortez, P.; Laureano, R. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology;
EUROSIS-ETI: Ostend, Belgium, 2011.
29. Ghazal, M.M.; Hammad, A. Application of knowledge discovery in database (KDD) techniques in cost overrun of construction
projects. Int. J. Constr. Manag. 2022, 22, 1632–1646. [CrossRef]
30. Hand, D.J.; Smyth, P.; Mannila, H. Principles of Data Mining; MIT Press: Cambridge, MA, USA, 2001.
31. Dåderman, A.; Rosander, S. Evaluating Frameworks for Implementing Machine Learning in Signal Processing: A Comparative Study of
CRISP-DM, SEMMA and KDD; KTH, School of Electrical Engineering and Computer Science (EECS): Stockholm, Sweden, 2018.
32. Wiemer, H.; Drowatzky, L.; Ihlenfeldt, S. Data Mining Methodology for Engineering Applications (DMME)—A Holistic Extension
to the CRISP-DM Model. Appl. Sci. 2019, 9, 2407. [CrossRef]
33. Wirth, R.; Hipp, J. CRISP-DM: Towards a standard process model for data mining. In Proceedings of the 4th International
Conference on the Practical Applications of Knowledge Discovery and Data Mining, Manchester, UK, 11–13 April 2000; Volume 1,
pp. 29–39.
34. Phyu, T.N. Survey of classification techniques in data mining. In Proceedings of the International Multiconference of Engineers
and Computer Scientists, London, UK, 1–3 July 2009; Volume 1.
35. Soofi, A.A.; Awan, A. Classification techniques in machine learning: Applications and issues. J. Basic Appl. Sci. 2017, 13, 459–465.
[CrossRef]
36. Mahesh, B. Machine learning algorithms-a review. Int. J. Sci. Res. (IJSR) 2020, 9, 381–386.
37. Phan, T.N.; Kuch, V.; Lehnert, L.W. Land Cover Classification using Google Earth Engine and Random Forest Classifier—The
Role of Image Composition. Remote Sens. 2020, 12, 2411. [CrossRef]
38. Hameed, K.; Chai, D.; Rassau, A. A sample weight and adaboost cnn-based coarse to fine classification of fruit and vegetables at
a supermarket self-checkout. Appl. Sci. 2020, 10, 8667. [CrossRef]
39. Quinlan, J. C4.5: Programs for Machine Learning; Ebrary online; Elsevier Science: Amsterdam, The Netherlands, 2014.
40. Badawi, S.A.Q.; Takruri, M.; Albadawi, Y.; Khattak, M.A.K.; Nileshwar, A.K.; Mosalam, E. Four Severity Levels for Grading the
Tortuosity of a Retinal Fundus Image. J. Imaging 2022, 8, 258. [CrossRef]
41. Chaves, L.; Marques, G. Data mining techniques for early diagnosis of diabetes: A comparative study. Appl. Sci. 2021, 11, 2218.
[CrossRef]
42. Martínez-Cerdá, J.F.; Torrent-Sellens, J.; González-González, I. Socio-technical e-learning innovation and ways of learning in the
ICT-space-time continuum to improve the employability skills of adults. Comput. Hum. Behav. 2020, 107, 105753. [CrossRef]
43. Pozón-López, I.; Kalinic, Z.; Higueras-Castillo, E.; Liébana-Cabanillas, F. A multi-analytical approach to modeling of customer
satisfaction and intention to use in Massive Open Online Courses (MOOC). Interact. Learn. Environ. 2020, 28, 1003–1021.
[CrossRef]
44. Gilar-Corbi, R.; Pozo-Rico, T.; Castejón, J.L. Desarrollando la Inteligencia Emocional en Educación Superior: Evaluación de la Efectividad
de un Programa en tres Países; Universidad Nacional de Educación a Distancia (España): Madrid, Spain, 2019.
45. Wani, H.A. The relevance of e-learning in higher education. ATIKAN 2013, 3.
46. Meskhi, B.; Ponomareva, S.; Ugnich, E. E-learning in higher inclusive education: Needs, opportunities and limitations. Int. J.
Educ. Manag. 2019, 33, 424–437. [CrossRef]
47. Saqr, M.; Alamro, A. The role of social network analysis as a learning analytics tool in online problem based learning. BMC Med.
Educ. 2019, 19, 160. [CrossRef] [PubMed]
48. Al-Fraihat, D.; Joy, M.; Sinclair, J. Evaluating E-learning systems success: An empirical study. Comput. Hum. Behav. 2020,
102, 67–86. [CrossRef]
49. Romi, I.M. A Model for e-Learning Systems Success: Systems, Determinants, and Performance; Palestine Polytechnic University:
Hebron, Palestinian, 2017.
50. Hayashi, A.; Chen, C.; Ryan, T.; Wu, J. The role of social presence and moderating role of computer self efficacy in predicting the
continuance usage of e-learning systems. J. Inf. Syst. Educ. 2020, 15, 5.
51. Damabi, M.; Firoozbakht, M.; Ahmadyan, A. A Model for Customers Satisfaction and Trust for Mobile Banking Using DeLone
and McLean Model of Information Systems Success. J. Soft Comput. Decis. Support Syst. 2018, 5, 21–28.
52. Donovan, E.; Guzman, I.R.; Adya, M.; Wang, W. A Cloud Update of the DeLone and McLean Model of Information Systems
Success. J. Inf. Technol. Manag. 2018, 29, 23–34.
53. Németh, T. How to back up Modules with blended learning The e-Learning platform of FAME. Prosperitas 2019, 6, 102–141.
[CrossRef]
54. Radha, S.; Michael Mariadhas, J.; Subramani, A.; Akbar Jan, N. Role of e-learning and digital media resources in employability of
management students. Online J. Distance Educ. e-Learn. 2019, 7, 116–123.
55. Cidral, W.A.; Oliveira, T.; Di Felice, M.; Aparicio, M. E-learning success determinants: Brazilian empirical study. Comput. Educ.
2018, 122, 273–290. [CrossRef]
56. García Aretio, L. El problema del abandono en estudios a distancia. Respuestas desde el Diálogo Didáctico Mediado. RIED. Rev.
Iberoam. De Educ. Distancia 2019, 22, 245–270. [CrossRef]
57. Weinberg, S.L.; Abramowitz, S.K. Statistics Using IBM SPSS: An Integrative Approach, 3rd ed.; Cambridge University Press:
Cambridge, CA, USA, 2016.
58. Li, M.; Xu, H.; Deng, Y. Evidential Decision Tree Based on Belief Entropy. Entropy 2019, 21, 897. [CrossRef]
59. Zhao, L.; Lee, S.; Jeong, S.P. Decision Tree Application to Classification Problems with Boosting Algorithm. Electronics 2021,
10, 1903. [CrossRef]
60. Chiu, Y.P. Social Recommendations for Facebook Brand Pages. J. Theor. Appl. Electron. Commer. Res. 2021, 16, 71–84. [CrossRef]
61. Hossin, M.; Sulaiman, M.N. A review on evaluation metrics for data classification evaluations. Int. J. Data Min. Knowl. Manag.
Process 2015, 5, 1.
62. Nhu, V.H.; Janizadeh, S.; Avand, M.; Chen, W.; Farzin, M.; Omidvar, E.; Shirzadi, A.; Shahabi, H.; Clague, J.; Jaafari, A.; et al.
Gis-based gully erosion susceptibility mapping: A comparison of computational ensemble data mining models. Appl. Sci. 2020,
10, 2039. [CrossRef]
63. Tsiakmaki, M.; Kostopoulos, G.; Kotsiantis, S.; Ragos, O. Implementing AutoML in educational data mining for prediction tasks.
Appl. Sci. 2019, 10, 90. [CrossRef]
64. Chicco, D.; Jurman, G. Machine learning can predict survival of patients with heart failure from serum creatinine and ejection
fraction alone. BMC Med. Inform. Decis. Mak. 2020, 20, 1–16. [CrossRef] [PubMed]
65. Jiménez-Valverde, A. Insights into the area under the receiver operating characteristic curve (AUC) as a discrimination measure
in species distribution modelling. Glob. Ecol. Biogeogr. 2012, 21, 498–507. [CrossRef]
66. Soria-Barreto, K.; Ruiz-Campo, S.; Al-Adwan, A.S.; Zuniga-Jara, S. University students intention to continue using online learning
tools and technologies: An international comparison. Sustainability 2021, 13, 13813. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.