CSC 832 GROUP 1 Assignment

SCHOOL OF POSTGRADUATE STUDIES
FEDERAL UNIVERSITY LOKOJA
KOGI STATE
GROUP ONE (1)
DEPARTMENT: COMPUTER SCIENCE
FACULTY: SCIENCE
COURSE CODE: CSC 832
COURSE TITLE: MACHINE LEARNING
OUTLINE
Review at least 25 of the most recent papers (2020-2022) on the state of the art in tree-based model solutions, and write a 17-page review article.
GROUP MEMBERS
S/N | NAME | MATRIC NUMBER
1 | ADEBAYO, NANA ONYINOYI | PG/MS/20/CSC/001
2 | ADEJO, ALOYSIUS OKPANACHI | PG/MS/20/CSC/002
3 | AKAGWU, JOHN OSMAN ATTADOGA | PG/MS/20/CSC/003
4 | ALI, MONDAY MUHAMMED | PG/MS/20/CSC/004
5 | ALIYU, ILIYASU EGENE | PG/MS/20/CSC/005
LECTURER: DR EMEKA OGBUJU

INTRODUCTION
The Merriam-Webster dictionary defines a model as a system of postulates, data, and inferences presented as a mathematical description of an entity or state of affairs, or, more broadly, as an example for imitation or emulation.
This work examines tree-based models, and it is useful to begin by adopting the following definition: tree-based classification models are a type of supervised machine learning algorithm that uses a series of conditional statements to partition training data into subsets. Each successive split adds some complexity to the model, which can be used to make predictions. The resulting model can be visualized as a roadmap of logical tests that describes the data set. Decision trees are popular for small-to-medium-sized data sets because they are easy to implement and even easier to interpret (K.C. Lee, 2020).
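To make the idea of a chain of conditional tests concrete, the sketch below grows a small classification tree in plain Python. It is a minimal CART-style illustration, assuming Gini impurity as the split criterion and tests of the form x[feature] <= threshold; the toy data and the names `gini`, `best_split`, `build_tree` and `predict` are hypothetical and are not taken from any of the reviewed papers.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold, weighted_gini) for the best test
    x[feature] <= threshold, or None if no test separates the rows."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively partition the data; a leaf stores the majority class,
    an internal node is the tuple (feature, threshold, left, right)."""
    split = best_split(rows, labels)
    if depth == max_depth or gini(labels) == 0.0 or split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )


def predict(tree, x):
    """Follow the chain of conditional tests from the root down to a leaf."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


# Hypothetical toy data: [age, income] -> "yes"/"no".
X = [[25, 30], [35, 60], [45, 80], [20, 20], [50, 90], [30, 40]]
y = ["no", "yes", "yes", "no", "yes", "no"]
tree = build_tree(X, y)
print(tree)                     # (0, 30, 'no', 'yes')
print(predict(tree, [40, 70]))  # yes
```

On this toy data the tree learns a single readable test (age <= 30) that separates the two classes, which is the kind of interpretability the definition above refers to.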
Tree-based models were developed in response to specific questions and needs. How well have these models evolved, and how well have they been utilized? This work addresses these questions and more by reviewing a number of articles on tree-based models, which are summarized below.
Privacy Preserving Vertical Federated Learning for Tree-based
Models by Yuncheng Wu, Shaofeng Cai, Xiaokui Xiao, Gang Chen,
Beng Chin Ooi, 2020.
Over time, concerns about data theft have grown as data becomes easier to access, and it is in this light that the authors of this article identified the inadequacies of the popular horizontal federated learning setting. Existing work on federated learning has mainly focused on the horizontal setting, which assumes that each client’s data have the same schema, but no tuple is shared by multiple clients. In practice, however, there is often a need for vertical federated learning, where all clients hold the same set of records, while each client only has a disjoint subset of features (Yuncheng Wu et al., 2020:1).
This article identified two possible privacy leakages regarding a target client’s training dataset: training label leakage and feature value leakage. The intuition behind the leakages is that colluding clients are able to split the sample set based on the split information in the model and their own datasets (Yuncheng Wu et al., 2020:7).
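The label-leakage intuition can be illustrated with a deliberately simplified example. The sketch below assumes a vertically partitioned dataset (same records, disjoint feature columns), a one-split tree released in plaintext, and leaves that reveal their predicted labels; all data, feature names and the `infer_labels` helper are hypothetical. It is not Pivot’s protocol, only a demonstration of the kind of leakage the article describes.

```python
# Hypothetical vertically partitioned data: both passive clients hold the
# same six records (aligned by position) but disjoint feature columns.
client_a = {"income": [20, 35, 50, 65, 80, 95]}   # colluding passive client
client_b = {"age": [22, 40, 31, 55, 28, 60]}      # another passive client
labels = [0, 0, 1, 1, 1, 1]                        # private to the label owner

# Assume a trained one-split tree is released in plaintext: the root tests
# client A's feature and each leaf reveals its predicted label.
model = {"feature": "income", "threshold": 40, "left_leaf": 0, "right_leaf": 1}


def infer_labels(model, own_features):
    """Re-apply the released split to the colluding client's own column and
    give every record the label of the leaf it falls into."""
    column = own_features[model["feature"]]
    return [
        model["left_leaf"] if value <= model["threshold"] else model["right_leaf"]
        for value in column
    ]


guessed = infer_labels(model, client_a)
matches = sum(g == y for g, y in zip(guessed, labels))
print(guessed, f"{matches}/{len(labels)} labels recovered")
```

Because the colluding client can re-apply the split to its own column, every record falls into a known leaf, and the leaf labels expose the private training labels.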
In the article, the authors proposed Pivot as a privacy-preserving solution, with financial risk management cited as a motivating application, and applied it to the training of decision tree-based models.
The solution overview is presented in terms of a system model and a threat model. In conclusion, the experimental results demonstrated that Pivot achieves accuracy comparable to non-private algorithms and is highly efficient (Yuncheng Wu et al., 2020:16).
A Tree-based Machine Learning Model for Go-around Detection and
Prediction by Imen Dhief, Sameer Alam, Chan Chea Mean and Nimrod
Lilith, 2021.
The authors of this paper examined the risks associated with aircraft take-off and landing and sought ways to reduce them. They discussed the important role played by air traffic controllers (ATC) in go-around operations and proposed a data-driven, machine-learned safety metric to assist tower ATC with increased situational awareness. In particular, the paper developed a machine learning model that predicts go-around events while an aircraft is in its final approach phase (Imen et al., 2021:1).
The paper reports that the developed go-around labeling algorithm correctly identified 731 go-arounds, 93 of which involved a runway change following the go-around. This shows that runway changes are relatively frequent when performing a go-around, representing around 13% of all go-arounds (Imen et al., 2021:6).
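The 13% figure follows directly from the two reported counts:

```latex
\frac{93}{731} \approx 0.127 \approx 13\%
```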
Two types of experiments were conducted: the first used down-sampling techniques, and the second used the full data set.
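Down-sampling is typically used when one class (here, go-arounds) is much rarer than the other. The sketch below shows plain random down-sampling of the majority class; it assumes a binary label and a fixed random seed, uses hypothetical data, and is not necessarily the specific down-sampling technique used by the authors.

```python
import random


def downsample(samples, labels, majority, seed=0):
    """Randomly drop majority-class samples until the two classes are the
    same size (assumes a binary labelling)."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y != majority]
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    kept = rng.sample(majority_idx, k=min(len(minority_idx), len(majority_idx)))
    keep = sorted(minority_idx + kept)   # preserve the original order
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Hypothetical imbalanced set: 2 go-arounds (label 1) among 10 approaches.
X = list(range(10))
y = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
X_bal, y_bal = downsample(X, y, majority=0)
print(y_bal.count(0), y_bal.count(1))   # 2 2
```

The trade-off is that most majority-class examples are discarded, which is one reason to compare down-sampled experiments against experiments on the full data set.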
Despite the paper’s useful findings and proposals, it is worth noting that the authors did not address the limitations of digitalization: data can be corrupted, for example by power failures or viruses.
forgeNet: a graph deep neural network model using tree-based
ensemble classifiers for feature graph construction by Yunchuan Kong
and Tianwei Yu, 2020.
This paper begins by stating the challenges involved in bioinformatics. One important problem is the prediction of clinical outcomes using profiling datasets with a large number of variables, such as gene expression data, proteomics data and metabolomics data. In such datasets, the major challenge lies in the relatively small number of samples compared to the large number of predictors (genes/proteins/metabolites), namely the ‘n ≪ p’ issue (Yunchuan and Tianwei, 2020:3507).
Having identified the problem, the authors sought to address these issues by developing a method that does not rely on a given feature network, yet can still benefit from the idea of building a model with a sparse and informative flow of information. Instead of using known feature graphs, they constructed a feature graph within the feature space, using a supervised feature graph construction framework based on a tree-based ensemble model (Yunchuan and Tianwei, 2020:3508).
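A simplified view of how a tree ensemble can yield a feature graph is to treat every parent-child pair of splitting features in every tree as an edge and take the union over the forest. The sketch below assumes that construction, treats edges as undirected for simplicity, and represents each fitted tree as a nested tuple (feature, left, right) with None for leaves; the trees and gene names are hypothetical, and the code is an illustration rather than forgeNet’s implementation.

```python
def tree_edges(node, edges):
    """Add an undirected edge between each internal node's splitting feature
    and the splitting feature of each internal child (leaves are None)."""
    if node is None:
        return
    feature, left, right = node
    for child in (left, right):
        if child is not None:
            edges.add(frozenset((feature, child[0])))
            tree_edges(child, edges)


def feature_graph(forest):
    """Union of the parent-child feature edges over all trees in the ensemble."""
    edges = set()
    for tree in forest:
        tree_edges(tree, edges)
    return edges


# Two hypothetical fitted trees over genes g1..g4, as (feature, left, right).
forest = [
    ("g1", ("g2", None, None), ("g3", None, None)),
    ("g2", ("g4", None, None), None),
]
graph = feature_graph(forest)
print(sorted(tuple(sorted(edge)) for edge in graph))
# [('g1', 'g2'), ('g1', 'g3'), ('g2', 'g4')]
```

The resulting sparse graph can then constrain which input features are connected in the first layer of a neural network, which is the role a known feature network would otherwise play.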
The simulation study in the paper showed forgeNet to be a powerful classifier with reasonably good feature selection ability. The experimental results suggest that the novelty of forgeNets is that, by borrowing the neural net architecture of the original GEDFN, forgeNets utilize feature information more effectively in classification tasks than regular tree-based ensemble methods (Yunchuan and Tianwei, 2020:3511).
FIST: A Feature-Importance Sampling and Tree-Based Method for
Automatic Design Flow Parameter Tuning by Zhiyao Xie, Guan-Qi
Fang, Yu-Hung Huang, Haoxing Ren, Yanqing Zhang, Brucek Khailany,
Shao-Yun Fang, Jiang Hu, Yiran Chen, Erick Carvajal Barboza, 2020.
The paper notes that modern industrial chip design flows are immensely complex. A design flow might have multiple steps, each step might have multiple functions, and each function can be configured with many parameters. Consequently, industrial flows may have hundreds of thousands of lines of scripts and are configured with thousands of parameters (Zhiyao et al., 2020:1).

The authors also noted that changing logic synthesis parameters can result in a 3X difference in power and more than one clock cycle difference in slack. Industrial design teams tune flow parameters as best as they can, usually manually and based on designers’ experience. Because industrial design flows can take several hours or days to run on large designs, manual parameter tuning can be very time-consuming, especially for novice designers. Consequently, either the design turnaround time is prolonged or design quality is compromised by an inadequate exploration of parameters.
In this paper, Zhiyao et al. proposed a Feature-Importance Sampling and Tree-Based (FIST) method for design flow parameter tuning. FIST learns the impact of parameters from previously well-explored designs and fully utilizes this information in its sampling process.
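The idea of letting learned parameter importance steer sampling can be sketched as follows. This is a hypothetical simplification, not the FIST algorithm: it assumes importance scores are already available (for example from a tree model trained on prior designs), enumerates every value combination of the `top_k` most important parameters while holding the rest at their defaults, and adds a few fully random configurations for exploration. The parameter names, values and scores are invented.

```python
import itertools
import random


def importance_guided_samples(param_space, importance, defaults,
                              top_k=2, n_random=2, seed=0):
    """Hypothetical simplification of importance-guided sampling: enumerate
    every value combination of the top_k most important parameters, keep the
    remaining parameters at their defaults, then add a few fully random
    configurations for exploration."""
    rng = random.Random(seed)
    ranked = sorted(param_space, key=lambda p: importance[p], reverse=True)
    important = ranked[:top_k]
    samples = []
    for combo in itertools.product(*(param_space[p] for p in important)):
        config = dict(defaults)
        config.update(zip(important, combo))
        samples.append(config)
    for _ in range(n_random):
        samples.append({p: rng.choice(values) for p, values in param_space.items()})
    return samples


# Invented flow parameters, importance scores and defaults.
param_space = {
    "synth_effort": ["low", "medium", "high"],
    "max_fanout": [20, 40],
    "flatten": [True, False],
}
importance = {"synth_effort": 0.5, "max_fanout": 0.3, "flatten": 0.2}
defaults = {"synth_effort": "medium", "max_fanout": 20, "flatten": True}

samples = importance_guided_samples(param_space, importance, defaults)
for config in samples:
    print(config)
```

Spending the sampling budget on the parameters that matter most is what lets such a method reach good configurations with far fewer expensive flow runs than exhaustive search.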
In the course of this study, Zhiyao et al. built a large dataset and developed a clustering-based method that leverages prior data to improve sampling efficiency during exploration. They also introduced approximate sampling and dynamic modeling based on semi-supervised learning and bias-variance trade-off principles. Compared with prior exploration methods, this approach significantly improves design quality, or requires much less sampling cost to achieve a given design performance.
Having examined the approach and experiments in this work, we rate its contributions highly and consider the method well suited for practical use.
Functionalization of Remote Sensing and On-site Data for Simulating
Surface Water Dissolved Oxygen: Development of Hybrid Tree-Based
Artificial Intelligence Models by Tiyasha Tiyasha, Tran Minh Tung,
Suraj Kumar Bhagat, Mou Leong Tan, Ali H. Jawad, Wan Hanna Melini
Wan Mohtar, Zaher Mundher Yaseen, 2021.
Tiyasha et al. begin this paper by examining water quality (WQ): its maintenance, the effects of poor WQ, and ways of solving this problem. In their words, the challenges of water quality drew researchers’ attention to the need to develop systems for WQ monitoring at the global and regional levels, to ensure proper prediction of sudden changes and so support the efficient management of water resources. However, conventional techniques such as laboratory evaluation of surface WQ are time-consuming and costly. Hence, mathematical models have been developed as an alternative solution for WQ parameter estimation. However, mathematical models have several limitations, such as inadequate modeling performance, methodologies that do not generalize effectively, and difficulty in addressing high data uncertainty and stochasticity. As such, advanced soft computing machine learning (ML) algorithms have been developed to enhance the prediction of WQ parameters. ML algorithms are developed from statistical methods that can automatically learn from data and build a detection/classification/estimation model that reduces the variation between the training and prediction datasets without the need for explicit programming (Tiyasha et al., 2021).
The authors believe that WQ monitoring using advanced techniques such as artificial intelligence (AI) and remote sensing can help in taking appropriate measures to mitigate the harmful effects of water pollution.
The XGBoost model was used in this work because, according to the authors, it is widely adopted for its flexibility in hyperparameter tuning by soft computing techniques.
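XGBoost belongs to the gradient boosting family, in which trees are added one at a time, each fitted to the errors left by the current ensemble. The sketch below shows only the core idea, assuming squared loss, one-split regression trees (stumps) on a single feature, and a shrinkage factor; XGBoost itself adds regularization, second-order gradient information and many engineering optimizations that are not shown. The data are hypothetical.

```python
def fit_stump(x, residuals):
    """One-split regression tree on a 1-D feature: pick the threshold that
    minimises squared error and predict the mean residual on each side."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda v: left_mean if v <= t else right_mean


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Squared-loss gradient boosting: each stump is fitted to the current
    residuals (the negative gradient) and added with shrinkage."""
    base = sum(y) / len(y)
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda v: base + learning_rate * sum(s(v) for s in stumps)


# Hypothetical data: one predictor, two regimes of the target.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 5.1, 4.8, 5.0]
model = gradient_boost(x, y)
print(round(model(1.5), 2), round(model(5.5), 2))
```

Arguments such as `n_rounds` and `learning_rate` are examples of the hyperparameters that soft computing tuning techniques would search over.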
Commendably, the research ends by acknowledging what it did not cover: the use of the data for trend and statistical analyses was not addressed in this paper. In particular, remote sensing data have the advantage of providing more knowledge about the selected site characteristics, climate change, and meteorological variation events, although these variables may not be useful during modeling. Future research can explore more remote sensing data with the aim of reducing the use of monitoring site data without compromising model efficiency and accuracy, thus reducing the cost of data collection and experiments.
Tree-based nonlinear ensemble technique to predict energy
dissipation in stepped spillways by Ömer Ekmekcioğlu, Eyyup Ensar
Başakın & Mehmet Özger, 2020.
The authors of this paper begin by stating that the main purpose of spillways is to ensure the transmission of flood flow without causing major damage in the upstream and downstream parts and, in doing so, to meet the hydraulic criteria while keeping the cost to a minimum. They note that stepped spillways were widely used in the past; although their flow regimes and turbulence make analysis complex (Chanson & Gonzalez, 2005 in Ömer et al., 2020), they have been applied over the last few decades because they considerably reduce cost (Rajaratnam, 1990 in Ömer et al., 2020). Stepped spillways maximise energy dissipation and reduce the length of the stilling basin (Chanson, 1993 in Ömer et al., 2020).
The support vector regression method and the K-star algorithm were used; the authors describe the support vector machine as an optimisation-based ML algorithm that works on the principle of minimising structural risk (Ömer et al., 2020:4). Using an ensemble model, they found that stepped spillways not only provide gradual energy dissipation but also reduce the size of the stilling basin. Although stepped spillways are difficult to examine hydraulically, they are commonly preferred structures because of their high energy dissipation.
An Evaluation of Preprocessing Steps and Tree-based Ensemble
Machine Learning for Analysing Sentiment on Indonesian YouTube
Comments by A. S. Aribowo, H. Basiron, N. S. Herman, S. Khomsah.
As the title suggests, the research analysed comments posted on YouTube using tree-based ensemble models and examined preprocessing steps for correcting the incorrect use of words and emoticons. In the end, the authors found that sentiment analysis of the comments can be improved through proper handling of the words used.
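Preprocessing steps of this kind (converting emoticons and handling unstructured words) can be sketched as a small normalisation function. The emoticon and slang dictionaries below are hypothetical examples in Indonesian, not the mappings used by the authors, and the repeated-letter rule is an assumed heuristic.

```python
import re

# Hypothetical emoticon lexicon (emoticon -> Indonesian word).
EMOTICONS = {":)": "senang", ":(": "sedih", ":D": "tertawa"}
# Hypothetical normalisation dictionary for informal or abbreviated words.
SLANG = {"gk": "tidak", "bgt": "banget", "mantul": "mantap"}


def preprocess(comment):
    """Map emoticons to words, lower-case, collapse letters repeated three or
    more times, tokenise, and normalise slang tokens."""
    for emoticon, word in EMOTICONS.items():   # before lower-casing (":D")
        comment = comment.replace(emoticon, f" {word} ")
    comment = comment.lower()
    comment = re.sub(r"(\w)\1{2,}", r"\1", comment)   # "bagusss" -> "bagus"
    tokens = re.findall(r"[a-z]+", comment)
    return [SLANG.get(token, token) for token in tokens]


print(preprocess("Videonya bagusss bgt :D"))
# ['videonya', 'bagus', 'banget', 'tertawa']
```

Converting emoticons into words keeps their sentiment signal available to a bag-of-words classifier, which would otherwise discard them as punctuation.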
Novel Ensemble Tree Solution for Rockburst Prediction Using Deep
Forest by Diyuan Li, Zida Liu, Danial Jahed Armaghani, Peng Xiao and
Jian Zhou, 2022.
The frequent occurrence of rockbursts motivated these authors to study this topic. A rockburst is the sudden release of strain energy during or after the excavation of underground engineering works. The authors used an ensemble tree model in their study and examined the extent to which the model can be utilized in engineering works.
The article acknowledged that the relatively small database limits its findings. The authors concluded as follows: Deep Forest (DF), a novel tree-based ensemble model, was proposed to build the rockburst prediction model based on 329 collected real rockburst cases. Bayesian optimization was used to tune the hyperparameters of the DF. The DF achieved 100% accuracy on the training set and 92.4% accuracy on the testing set, performed better than other ML models, and can forecast massive rockburst disasters (Diyuan et al., 2022).
Long-Term Wind Power Forecasting Using Tree-Based Learning
Algorithms by AMIRHOSSEIN AHMADI, MOJTABA NABIPOUR,
BEHNAM MOHAMMADI-IVATLOO, ALI MORADI AMANI,
SEUNGMIN RHO AND MD. JALIL PIRAN, 2020.
This article examines the feasibility of long-term wind power forecasting, as opposed to the more conventional short-term wind power forecasting. Since wind is one of the green energy sources the world is turning to, this research is very important. The article notes that the forecasting accuracy of the proposed models was investigated using observations measured at various heights and time intervals.
Amirhossein et al. found that the quantity and quality of the dataset have a profound impact on the performance of the wind power forecasting model. They noted that the uncertain nature of wind makes it difficult to collect sufficiently representative datasets.
The generalized minimum spanning tree problem: An overview of
formulations, solution procedures and latest advances by Petrică C. Pop
This paper begins by reminding the reader that the minimum spanning tree problem is not new: it was first studied by Borůvka in 1926. The author notes that the minimum spanning tree (MST) problem is well known because efficient algorithms make it practical to solve even for large graphs.
The Generalized Minimum Spanning Tree Problem (GMSTP) is the focus of the research, and two integer programming formulations are presented, the local-global formulation and the multigraph formulation, which are said to be specially tailored to the investigated problem. The article concludes that the best exact algorithms can only solve relatively small instances, and that in practice heuristic algorithms are the preferred solution.
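For readers unfamiliar with the GMSTP: the nodes of a graph are partitioned into clusters, and the task is to choose exactly one node from each cluster so that the minimum spanning tree connecting the chosen nodes has the least total weight. The brute-force sketch below assumes a complete graph with symmetric edge weights and uses hypothetical data; because it enumerates every possible choice, its running time grows with the product of the cluster sizes, which illustrates why exact methods only scale to small instances. It is not one of the integer programming formulations discussed in the paper.

```python
import itertools


def mst_weight(nodes, w):
    """Prim's algorithm on the complete graph induced by `nodes`."""
    nodes = list(nodes)
    in_tree = {nodes[0]}
    total = 0
    while len(in_tree) < len(nodes):
        cost, v = min((w[frozenset((a, b))], b)
                      for a in in_tree for b in nodes if b not in in_tree)
        in_tree.add(v)
        total += cost
    return total


def gmst_brute_force(clusters, w):
    """Try every way of picking one node per cluster; keep the lightest MST."""
    best = None
    for choice in itertools.product(*clusters):
        cost = mst_weight(choice, w)
        if best is None or cost < best[0]:
            best = (cost, choice)
    return best


# Hypothetical nodes on a line; edge weight = distance between positions.
positions = {"a1": 0, "a2": 5, "b1": 6, "c1": 20, "c2": 8}
w = {frozenset((u, v)): abs(positions[u] - positions[v])
     for u, v in itertools.combinations(positions, 2)}
clusters = [["a1", "a2"], ["b1"], ["c1", "c2"]]
print(gmst_brute_force(clusters, w))   # (3, ('a2', 'b1', 'c2'))
```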
New M5P model tree-based control for doubly fed induction
generator in wind energy conversion system by Mounira Ali, Abdelaziz
Talha, El madjid Berkouk, 2020.
Wind energy has been popularized by widespread discussion of green energy. This article states that, among the different available wind energy conversion systems (WECS), variable-speed wind turbine generators have attracted much attention because of their high energy production efficiency and reduced friction and mechanical stress. According to the authors, the main objective of the paper was to design a new M5 control algorithm, built from a fuzzy logic dataset, to improve the control performance of the doubly fed induction generator (DFIG) rotor side converter (RSC). The resulting algorithm is expressed as simple <if/then> rules. The M5 controller could reduce the complexity of fuzzy logic computation in the RSC, thereby enhancing control dynamics and robustness and minimizing harmonic distortion. The simulation results in this article show that M5P controllers have a smoother start-up and faster dynamics than fuzzy logic.
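M5-style model trees differ from ordinary regression trees in that each leaf holds a linear model, so the learned tree reads as "if condition then linear formula" rules. The sketch below is a simplified illustration assuming a single input, a single split and least-squares lines at the leaves; real M5/M5P also grows deeper trees, prunes and smooths. The data are hypothetical and unrelated to the DFIG controller.

```python
def fit_line(xs, ys):
    """Ordinary least squares y = a + b*x (slope 0 if all x are equal)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def sse(xs, ys, line):
    """Sum of squared errors of the line (a, b) on the points."""
    a, b = line
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def fit_model_tree(xs, ys, min_leaf=2):
    """One-split model tree: choose the threshold whose two leaf-level linear
    models give the lowest total squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [(x, y) for x, y in zip(xs, ys) if x <= t]
        right = [(x, y) for x, y in zip(xs, ys) if x > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        lx, ly = zip(*left)
        rx, ry = zip(*right)
        left_line, right_line = fit_line(lx, ly), fit_line(rx, ry)
        err = sse(lx, ly, left_line) + sse(rx, ry, right_line)
        if best is None or err < best[0]:
            best = (err, t, left_line, right_line)
    _, t, left_line, right_line = best
    return t, left_line, right_line


# Hypothetical piecewise-linear data with a change of slope after x = 3.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1, 3, 5, 7, 8, 8.5, 9, 9.5]
t, (a1, b1), (a2, b2) = fit_model_tree(xs, ys)
print(f"if x <= {t}: y = {a1} + {b1}*x  else: y = {a2} + {b2}*x")
# if x <= 3: y = 1.0 + 2.0*x  else: y = 6.0 + 0.5*x
```

Evaluating such a rule needs only one comparison and one linear formula, which is consistent with the article's point that M5 rules can be cheaper to compute than fuzzy inference.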
A Continuous Cuffless Blood Pressure Estimation Using Tree-Based
Pipeline Optimization Tool by Suliman Mohamed Fati, Amgad Muneer,
Nur Arifin Akbar and Shakirah Mohd Taib, 2021.
The article establishes that high blood pressure is a serious health challenge with delicate health implications, which makes careful monitoring important. Two types of blood pressure monitoring are considered, aneroid blood pressure measurement and the invasive arterial BP measurement procedure; the former uses a cuff, while the latter is cuffless. The authors used a tree-based pipeline optimization tool (TPOT) to estimate blood pressure from the photoplethysmogram (PPG). The paper focused on extracting PPG signals to derive key features of invasive arterial BP.
The article rightly points out the gap in its work, stating that further work can be conducted to explore critical care unit artifacts, such as line access interruptions in the BP waveform, to assess the risk of central line-associated bloodstream infection (CLABSI) in both pediatric and adult ICU settings.
TreeCaps: Tree-Based Capsule Networks for Source Code Processing
by Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang, 2021.
The article notes that programmers rarely start from scratch: they study existing code to understand it better before implementing new features or fixing bugs. TreeCaps was proposed in the article as a novel neural network architecture that incorporates tree-based convolutional neural networks (TBCNN) into capsule networks for better learning of code represented as abstract syntax trees.
The paper claims to be the first to re-purpose capsule networks over syntax trees to learn code without the need for explicit semantic analysis.
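Tree-based models of code operate on the abstract syntax tree (AST) rather than on raw text. TreeCaps itself targets Java and C/C++ with its own parsing pipeline; the sketch below uses Python's standard `ast` module purely for illustration, listing the parent-child node-type pairs that form the local structure a tree-based convolution slides over. The `parent_child_pairs` helper and the snippet are hypothetical.

```python
import ast


def parent_child_pairs(source):
    """Parse source code into an abstract syntax tree (AST) and list the
    (parent node type, child node type) pairs, i.e. the local tree structure
    that a tree-based convolution slides over."""
    tree = ast.parse(source)
    pairs = []
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            pairs.append((type(parent).__name__, type(child).__name__))
    return pairs


snippet = "def add(a, b):\n    return a + b\n"
pairs = parent_child_pairs(snippet)
for pair in pairs:
    print(pair)
```

Pairs such as ("Return", "BinOp") capture program structure that is independent of variable names and formatting, which is what makes syntax trees attractive inputs for learning code.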
Robust Counterfactual Explanations for Tree-Based Ensembles by
Sanghamitra Dutta, Jason Long, Saumitra Mishra, Cecilia Tilli, Daniele
Magazzeni, 2022.
According to this article, the goal of counterfactual explanations is to guide an applicant on how to change the outcome of a model by providing suggestions for improvement. The work addresses the problem of finding robust counterfactuals for tree-based ensembles. It provides a novel metric to compute the stability of a counterfactual, which can be representative of its robustness to possible model changes, as well as a novel algorithm to find robust counterfactuals (Sanghamitra et al., 2022).
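The intuition behind a stability measure can be sketched as follows, modeled on the idea (as we understand it) of scoring a counterfactual by the ensemble's mean output over random perturbations of that point, minus the spread of those outputs. The tiny stump ensemble, the Gaussian neighbourhood, `k`, `sigma` and all names below are hypothetical choices for illustration, not the paper's exact metric or its RobX algorithm.

```python
import random
import statistics

# Hypothetical tree ensemble: the average of three decision stumps, each a
# (feature index, threshold) pair voting 1 when x[feature] > threshold.
STUMPS = [(0, 0.5), (1, 0.4), (0, 0.6)]


def ensemble_score(x):
    """Ensemble output in [0, 1]; >= 0.5 is read as the favourable class."""
    return sum(1.0 if x[f] > t else 0.0 for f, t in STUMPS) / len(STUMPS)


def stability(x, k=1000, sigma=0.05, seed=0):
    """Mean ensemble output over k Gaussian perturbations of x minus the
    standard deviation of those outputs; higher means more robust."""
    rng = random.Random(seed)
    scores = [ensemble_score([xi + rng.gauss(0.0, sigma) for xi in x])
              for _ in range(k)]
    return statistics.fmean(scores) - statistics.pstdev(scores)


fragile = [0.61, 0.41]   # favourable, but barely past two thresholds
robust = [0.9, 0.8]      # favourable and far from every threshold
print(round(stability(fragile), 3), round(stability(robust), 3))
```

Both points receive the same ensemble score of 1.0, yet the first sits just past two split thresholds, so small changes flip individual trees and its stability is clearly lower; this is the distinction a robust counterfactual search tries to exploit.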
The article also acknowledges a limitation of its comparison: although the results are not exactly comparable, the authors state that their cost and validity are in the same ballpark as those observed for these datasets in existing works.
RDTIDS: Rules and Decision Tree-Based Intrusion Detection System
for Internet-of-Things Networks by Mohamed Amine Ferrag, Leandros
Maglaras, Ahmed Ahmim, Makhlouf Derdour and Helge Janicke, 2020.
This paper points out that the spread of cyberattacks has led to the formation of teams to counter attackers. Countermeasures are taken according to the information about detected attacks obtained from detection systems.
This paper proposed a hierarchical intrusion detection system based on
the combination of three different classifiers, namely REP Tree, JRip
algorithm and Forest PA. The proposed model consists of three
classifiers, where two of them operate in parallel and feed the third one.
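The three-classifier arrangement described above can be sketched structurally as follows. The sketch assumes, as a simplification, that the third classifier receives the original record together with the outputs of the two parallel classifiers; all three classifiers here are hypothetical hand-written rules with invented feature names and thresholds, standing in for trained REP Tree, JRip and Forest PA models.

```python
# Hypothetical stand-ins for three trained classifiers. Feature names and
# thresholds are invented for illustration.
def first_classifier(flow):
    """Parallel stage 1a: flags unusually high packet rates."""
    return 1 if flow["packets_per_s"] > 1000 else 0


def second_classifier(flow):
    """Parallel stage 1b: flags repeated failed logins."""
    return 1 if flow["failed_logins"] > 5 else 0


def final_classifier(features):
    """Stage 2: receives the original record plus both stage-1 outputs
    (this toy rule only looks at the stage-1 outputs)."""
    votes = features["stage1_a"] + features["stage1_b"]
    return "attack" if votes >= 1 else "normal"


def hierarchical_detect(flow):
    """Run the two stage-1 classifiers in parallel on the same record and
    feed their outputs, appended to the record, into the final classifier."""
    stage1 = {"stage1_a": first_classifier(flow),
              "stage1_b": second_classifier(flow)}
    return final_classifier({**flow, **stage1})


print(hierarchical_detect({"packets_per_s": 5000, "failed_logins": 0}))  # attack
print(hierarchical_detect({"packets_per_s": 10, "failed_logins": 0}))    # normal
```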
Ranking top-k trees in tree-based phylogenetic networks by Momoko
Hayamizu and Kazuhisa Makino, 2020.
The paper states that phylogenetic networks have become popular among biologists as a tool to depict conflicting signals and to model reticulate evolution. The authors consider the top-k support tree ranking problem and provide a linear-delay (and hence theoretically optimal) algorithm for solving it.
A gradient boosted decision tree-based sentiment classification of
twitter data by S. Neelakandan and D. Paulraj, 2020.
This paper begins by stating that the Internet has become the most important source of information for decision-making, and that Twitter is one of the most popular and widely used platforms in this respect. The proposed system performs efficient sentiment analysis (SA) of Twitter data using a gradient boosted decision tree (GBDT) classifier, and it is implemented in Java.
The authors point out a gap that future work can address: the approach could be extended to test a general thesaurus built on a common corpus.
Artificial Flora Algorithm-Based Feature Selection with Gradient
Boosted Tree Model for Diabetes Classification by Nagaraj P,
Deepalakshmi P, Romany F Mansour, Ahmed Almazroa, 2021.
This paper points out that diabetes is on the increase all over the world and identifies three types: type 1 diabetes mellitus, type 2 diabetes mellitus and gestational diabetes mellitus (GDM). The paper uses an artificial flora algorithm with gradient boosted tree (AFA-GBT) model and an AFA-based feature selection (AFA-FS) model: the AFA is used for feature selection, and classification is performed with the GBT model, which was superior to other models because it is highly flexible, offers better classification accuracy and operates on both categorical and numerical values. The gap in the paper is that the numbers of samples of the three types were still imbalanced.
Performance Evaluation of Deep Learning-Based Gated Recurrent
Units (GRUs) and Tree-Based Models for Estimating ETo by Using
Limited Meteorological Variables by Mohammad Taghi Sattari, Halit
Apaydin and Shahaboddin Shamshirband, 2020.
This paper discusses the use of water for irrigation, noting that there are various methods to determine plant water requirements, but the Penman-Monteith (ETo-PM) method presented by the Food and Agriculture Organization of the United Nations (FAO) has been accepted as the standard, since other methods give different results. This method calculates reference evapotranspiration values from different meteorological variables. Recently, artificial intelligence, specifically machine learning and data mining, has been used to calculate reference evapotranspiration (ETo).
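For reference, the standard FAO-56 form of the Penman-Monteith equation for daily reference evapotranspiration is reproduced below. The symbols follow the FAO-56 conventions and are not taken from the reviewed paper.

```latex
ET_o = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}
            {\Delta + \gamma\,(1 + 0.34\,u_2)}
```

Here $ET_o$ is the reference evapotranspiration (mm day$^{-1}$), $R_n$ the net radiation at the crop surface and $G$ the soil heat flux density (both MJ m$^{-2}$ day$^{-1}$), $T$ the mean daily air temperature at 2 m height (°C), $u_2$ the wind speed at 2 m height (m s$^{-1}$), $e_s - e_a$ the vapour pressure deficit (kPa), $\Delta$ the slope of the vapour pressure curve and $\gamma$ the psychrometric constant (both kPa °C$^{-1}$). The number of inputs this equation requires is why estimating ETo from limited meteorological variables is attractive.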
The gap in the study, as acknowledged by the authors, is that the findings of this research are similar to those of other works.

A novel tree-based dynamic heterogeneous ensemble method for
credit scoring by Yufei Xia, Mengyi Niu, 2020.
This paper states that extending credit to customers is a core business of financial institutions because it brings substantial profits to stakeholders. The long-lasting effects of the subprime crisis highlight the importance of credit risk assessment tools in developed and developing countries. The article uses an ensemble model; the rationale behind ensemble learning is to integrate the decisions of different algorithms to obtain a better result than relying on a single algorithm. Due to its superior performance, the ensemble method has attracted much attention from researchers in the credit scoring domain.
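A "dynamic" ensemble chooses which base model to trust separately for each applicant. The sketch below shows one generic form of dynamic selection, assuming local accuracy on the k nearest validation points as the selection criterion; it is not the specific method of Xia and Niu, and the two base classifiers, the validation set and the feature layout [income, debt] are hypothetical.

```python
def knn_indices(x, pool, k):
    """Indices of the k points in pool closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, p)), i) for i, p in enumerate(pool)
    )
    return [i for _, i in dists[:k]]


def dynamic_select_predict(x, classifiers, X_val, y_val, k=3):
    """Dynamic selection by local accuracy: for each query, use the classifier
    that is most accurate on the query's k nearest validation points."""
    region = knn_indices(x, X_val, k)

    def local_accuracy(clf):
        return sum(clf(X_val[i]) == y_val[i] for i in region)

    best = max(classifiers, key=local_accuracy)
    return best(x)


# Hypothetical base classifiers over [income, debt]; 1 = creditworthy.
def approve_by_income(a):
    return 1 if a[0] > 50 else 0


def approve_by_debt(a):
    return 1 if a[1] < 30 else 0


classifiers = [approve_by_income, approve_by_debt]
X_val = [[20, 10], [25, 15], [30, 12], [80, 70], [85, 60], [90, 65]]
y_val = [1, 1, 1, 1, 1, 1]   # hypothetical: all these applicants repaid

print(dynamic_select_predict([22, 12], classifiers, X_val, y_val))  # 1
print(dynamic_select_predict([88, 66], classifiers, X_val, y_val))  # 1
```

Either base classifier used statically would misjudge one of the two regions (for example, `approve_by_income([22, 12])` returns 0), whereas per-query selection picks the model that is reliable near each applicant.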
Decision tree-based user-centric security solution for critical IoT
infrastructure by Deepak Puthal, Mukesh Prasad, 2022.
The world today revolves around the internet; important messages are shared and discovered far and wide through it. The paper states that proper processing and correct interpretation of the data provided by the internet may provide better insights for solving real-world problems. It further states that traditional IT infrastructures are inherently hybrid and diversified, although IT infrastructures are shifting from a hardware-centric to a service-oriented model.
The authors propose DecisionTSec, a user-centric adaptive security mechanism for IoT-based critical infrastructure. A decision tree mechanism is adopted and integrated with a cryptosystem to raise the bar for attackers.
Piracema.io: A rules-based tree model for phishing prediction by
Carlo Marcelo Revoredo da Silva, Vinicius Cardoso Garcia, 2022.
The article observes that phishing enables more than half of all credit card scams, and that malicious email scams are another form of phishing.
The methodology of this paper is based on a rules tree processed by gradual analysis, using a structure organized by semantics and similarity and prioritizing the relevance of the features. The paper proposes a solution that aims to minimize fraud incidents while end users browse the Web; the targeted phishing is closed-scope fraud, such as spear phishing and smishing.
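A rules tree that checks the most relevant features first can be sketched as an ordered chain of tests over a URL. The rules, the reasons and the brand domain below are hypothetical and far simpler than Piracema.io; they only illustrate gradual, prioritized rule evaluation.

```python
from urllib.parse import urlparse


def phishing_rules(url, trusted_domains):
    """Hypothetical rules tree evaluated from the most to the least relevant
    test; returns (verdict, reason). Rules are illustrative only."""
    host = urlparse(url).hostname or ""
    if host in trusted_domains:
        return "legitimate", "exact match with a trusted domain"
    if any(domain.split(".")[0] in host for domain in trusted_domains):
        return "phishing", "imitates a trusted brand on an untrusted host"
    if host.replace(".", "").isdigit():
        return "phishing", "raw IP address used as host"
    if "@" in url or host.count("-") >= 3:
        return "suspicious", "obfuscation pattern in the URL"
    return "unknown", "no rule fired"


trusted = {"mybank.com"}   # hypothetical brand domain
for link in ["https://mybank.com/login",
             "http://mybank-secure-login.example.net/",
             "http://192.168.10.5/verify",
             "http://example.org/"]:
    print(link, phishing_rules(link, trusted))
```

Returning the reason alongside the verdict keeps the decision explainable to the end user, one of the practical attractions of rule-based trees.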
Slope Stability Classification under Seismic Conditions Using Several
Tree-Based Intelligent Techniques by Panagiotis G. Asteris, Fariz
Iskandar Mohd Rizal, Mohammadreza Koopialipoor, Panayiotis C.
Roussis, Maria Ferentinou, Danial Jahed Armaghani and Behrouz
Gordan, 2022.
Geotechnical engineers face many challenges in fully assessing a job site, which is why they usually apply analytical methods to check site locations before starting work.
In this paper, the authors state that slope stability analysis is a standard practice in geotechnical engineering, employed to estimate the stability of natural or man-made slopes such as embankments of highways, railways, earth dams and tailings. The analysis of slope stability mainly involves calculating the factor of safety (FOS), which is defined as the ratio between the shear strength and the acting shear stress (Panagiotis et al., 2022:2).
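Written as an equation, the definition above is:

```latex
\mathrm{FOS} = \frac{\tau_f}{\tau}
```

where $\tau_f$ is the shear strength and $\tau$ the acting shear stress along the potential failure surface. A value of FOS greater than 1 means the available strength exceeds the driving stress, while a value below 1 indicates that failure is expected, which is what turns FOS into a natural label for stability classification.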
The article observes that artificial intelligence (AI) and machine learning (ML) techniques have been successfully implemented in engineering and the sciences. The paper also describes a series of models constructed to calculate the FOS using standard geotechnical software.
The authors found that the proposed AdaBoost technique delivered the best performance and the highest classification capability. Therefore, it can be introduced as a new technique for slope stability classification, achieving the largest number of matched cases.
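AdaBoost builds a strong classifier from weak ones by re-weighting the training cases so that each new weak learner concentrates on the cases the previous ones got wrong. The sketch below is a minimal discrete AdaBoost with decision stumps on hypothetical one-dimensional data (+1 for stable, -1 for unstable); it is not the authors' model, features or data.

```python
import math


def fit_weighted_stump(x, y, w):
    """Best stump on a 1-D feature under sample weights w. A stump predicts
    `polarity` when x <= t and `-polarity` otherwise; labels are +1/-1.
    Returns (weighted error, threshold, polarity)."""
    best = None
    for t in sorted(set(x)):
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(x, y, w)
                      if (polarity if xi <= t else -polarity) != yi)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost(x, y, rounds=3):
    """Discrete AdaBoost with decision stumps as weak learners."""
    n = len(x)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, pol = fit_weighted_stump(x, y, w)
        err = max(err, 1e-10)                     # guard against log(1/0)
        alpha = 0.5 * math.log((1 - err) / err)   # vote weight of this stump
        ensemble.append((alpha, t, pol))
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * (pol if xi <= t else -pol))
             for xi, yi, wi in zip(x, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]

    def predict(v):
        score = sum(a * (p if v <= t else -p) for a, t, p in ensemble)
        return 1 if score >= 0 else -1

    return predict


# Hypothetical 1-D data: +1 = stable, -1 = unstable, in a pattern that no
# single stump can fit.
x = [1, 2, 3, 4, 5, 6]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(x, y, rounds=3)
print([model(v) for v in x])   # [1, 1, -1, -1, 1, 1]
```

No single stump classifies this data correctly, but the weighted vote of three stumps does, which is the mechanism behind AdaBoost's stronger classification performance.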
The paper also establishes that extensive investigation is required before proposing a new method for classifying slope stability cases using AI techniques. Therefore, to develop a model for classifying slope stability, a comprehensive database comprising real cases must be gathered and utilized (Panagiotis et al., 2022:15).
Ensemble Tree-Based Approach towards Flexural Strength
Prediction of FRP Reinforced Concrete Beams by Muhammad Nasir
Amin, Mudassir Iqbal, Kaffayatullah Khan, Muhammad Ghulam Qadir,
Faisal I. Shalabi and Arshad Jamal, 2022.
The paper observes that the rapid increase in the world’s population creates a huge demand for infrastructure development, and the production of concrete has therefore increased considerably. The work begins by noting that cement-based materials and concrete are used globally because of their low porosity and high mechanical strength.
This article presents an estimation of the bending capacity of FRP-reinforced concrete beams. It cites Murad et al. as having developed a gene expression programming (GEP) tree-based model for the flexural capacity of concrete beams reinforced with FRP rebars.
INTELLIGENT TREE-BASED ENSEMBLE APPROACHES FOR PHISHING WEBSITE
DETECTION by YAZAN A. ALSARIERA, ABDULLATEEF O. BALOGUN, VICTOR E.
ADEYEMO, OMAR H. TARAWNEH, HAMMED A. MOJEED, 2022.
Humans now live much of their lives online, as virtually everything is done on the internet, and as a result unsuspecting users have been defrauded of their valuables. The authors of this article state that, due to the absence of a standard internet protocol, the unregulated access and availability of these IT infrastructures create possibilities for internet threats and attacks.
They note that a phishing website involves the use of illegitimate websites and their resources to wrongfully acquire sensitive information from end users. Biometric data, bank account details, and other sensitive information are taken from innocent users. To cope with the evolving complexity of phishing websites, machine learning (ML)-based approaches are used to analyse features retrieved from websites to evaluate their legitimacy (Yazan et al., 2022:564).
This article proposed intelligent tree-based ensemble approaches for phishing website detection. BFTree and NBTree were augmented with ensemble methods to build efficient phishing website detection models (Yazan et al., 2022:576).
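Bagging is one common way of augmenting a base tree learner with an ensemble: each model is trained on a bootstrap sample and the models vote. The sketch below uses simple decision stumps as a stand-in for BFTree and NBTree, which are not implemented here, and a single hypothetical feature (URL length); it illustrates the general technique and does not claim to reproduce the authors' configuration.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Classification stump on one feature: the threshold with the highest
    training accuracy, predicting the majority label on each side."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        correct = sum(y == (left_label if x <= t else right_label)
                      for x, y in zip(xs, ys))
        if best is None or correct > best[0]:
            best = (correct, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label


def bagging(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement) and
    combine the stumps by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]


# Hypothetical single feature (URL length) with two labels.
xs = [10, 12, 15, 18, 40, 45, 50, 60]
ys = ["legit"] * 4 + ["phish"] * 4
model = bagging(xs, ys)
print(model(5), model(70))
```

Because each stump sees a slightly different bootstrap sample, their thresholds vary, and the majority vote smooths out the instability of any single tree.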
Title of Article methods Tools techniques

Tree-based ensemble Synthetic and real Forest graph-
forgeNet: a graph datasets embedded deep
feedforward
network
deep neural network
model using tree-
based ensemble
classifiers for feature
graph construction
INTELLIGENT TREE-BASED Naïve Bayes Tree Tree-based
ENSEMBLE APPROACHES (NBTree), Best First Tree ensemble
FOR PHISHING WEBSITE (NBTree)
DETECTION
Dynamic Fault Tree Analysis: Petri Nets for Application of Machine Classical fault trees
state-of-the-Art in Modeling, quantifying DFTs, Learning
Analysis and Tools Markov Models for
quantifying DFTs,
Bayesian Networks for
quantifying DFTs,
Simulation Approaches
for quantifying DFTs,
Modularizations
Approaches for
quantifying DFTs
A Tree-Based Intelligent Bishop simplified Decision Tree (DT)
Slope Stability Technique
Classification under
Seismic Conditions
Using Several Tree-
Based Intelligent
Techniques
A Tree-based Machine ADS-B data Data-driven model
A Tree-based Learning Model
Machine Learning
Model for Go-around

Detection and
Prediction
Tree-baesd non-linear Vector machine (SVM), K- Random Forest (RF-
Tree-based nonlinear ensemble star (K) algorithm and E)
artificial neural networks
ensemble technique (ANN)
to predict energy
dissipation in
stepped spillways
Tree-based learning MW Wind Turbines Decision Tree,
Long-Term Wind Algorithms Bagging, Random
Forest, Boosting,
Power Forecasting Gradient Boosting,
XGBoost
Using Tree-Based
Learning Algorithms
Tree-based Tree-based
Ranking top-k trees phylogenetic networks phylogenetic
in tree-based
phylogenetic
networks
Ensemble tree-based 60% of the database Artificial intelligence
Ensemble Tree- approach were validated first-hand (AI), decision tree
on the remaining 40% (DT) and gradient
Based Approach boosting tree (GBT)
approaches.
towards Flexural
Strength Prediction
of FRP Reinforced
Concrete Beams
M5P model tree-based Matlab/Simulink M5 model tree
New M5P model control software. Simplified
tree-based control
for doubly fed
induction generator
in wind energy
conversion system
Ensemble Tree Solution Bayesian optimization Deep forest
Novel Ensemble Tree
Solution for
Rockburst Prediction
Using Deep Forest

Tree-based sentiment HDFS MapReduce Improved Elephant
A gradient boosted classification Herd Optimization
(I-EHO) technique.
decision tree-based
sentiment
classification of
twitter data
Robust Counterfactual Explanations for Tree-Based Ensembles | Tree-based Ensemble | Counterfactual Stability (metric) | RobX
Performance Evaluation of Deep Learning-Based Gated Recurrent Units (GRUs) and Tree-Based Models for Estimating ETo by Using Limited Meteorological Variables | Limited meteorological variables | 15 input scenarios consisting of meteorological variables, including maximum and minimum temperature, wind speed, maximum and minimum relative humidity, dew point temperature, and sunshine duration | Random forest model, M5 tree model, random tree and regression tree (simplified)
FIST: A Feature-Importance Sampling and Tree-Based Method for Automatic Design Flow Parameter Tuning | Tree-based method | ITC99, 45nm NanGate Library | XGBoost model, dynamic tree technique
RDTIDS: Rules and Decision Tree-Based Intrusion Detection System for Internet-of-Things Networks | Tree-Based Intrusion Detection System | CICIDS2017 dataset and BoT-IoT dataset | REP Tree, JRip algorithm and Forest PA
TreeCaps: Tree-Based Capsule Networks for Source Code Processing | Tree-Based Capsule Networks | Java and C/C++ programs | TreeCaps
An Evaluation of Preprocessing Steps and Tree-based Ensemble Machine Learning for Analysing Sentiment on Indonesian YouTube Comments | Tree-based Ensemble | Converting emoticons and handling unstructured words | Naïve Bayes (NB), Support vector machine (SVM), Decision Tree, Random Forest, and Extra Tree classifier
A Continuous Cuffless Blood Pressure Estimation Using Tree-Based Pipeline Optimization Tool | Tree-Based Pipeline | PhysioNet global dataset | Random forest (RF) and K-nearest neighbours (KNN)
Artificial Flora Algorithm-Based Feature Selection with Gradient Boosted Tree Model for Diabetes Classification | Gradient Boosted tree Model | Three diabetes datasets | Artificial flora algorithm (AFA)-based feature selection, and gradient boosted tree (GBT)-based classification
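Several of the studies summarised above compare a single decision tree with boosted tree ensembles on a hold-out split; the FRP beam study, for instance, trains on 60% of its database and validates on the remaining 40%. The sketch below illustrates that protocol in plain Python. It is not the code of any reviewed paper: the synthetic sine data, the single numeric feature and all helper names (`best_split`, `fit_tree`, `fit_boosted`, `predict_boosted`, `mse`) are hypothetical, chosen only for illustration, and the boosting variant shown is basic least-squares gradient boosting with shrinkage.

```python
import math
import random

# Hypothetical helpers for illustration only: one numeric feature, squared-error loss.


def best_split(xs, ys):
    """Return (threshold, sse) for the best binary split, or None if no split exists."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total_sum = sum(y for _, y in pairs)
    total_sq = sum(y * y for _, y in pairs)
    left_sum = left_sq = 0.0
    best = None
    for i in range(1, n):
        x_prev, y_prev = pairs[i - 1]
        left_sum += y_prev
        left_sq += y_prev * y_prev
        if pairs[i][0] == x_prev:
            continue  # no threshold can separate identical feature values
        right_sum, right_sq = total_sum - left_sum, total_sq - left_sq
        # SSE of a node = sum(y^2) - (sum(y))^2 / count
        sse = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if best is None or sse < best[1]:
            best = ((x_prev + pairs[i][0]) / 2, sse)
    return best


def fit_tree(xs, ys, depth):
    """Grow a regression tree: a leaf is the mean target, a node is (threshold, left, right)."""
    split = best_split(xs, ys) if depth > 0 else None
    if split is None:
        return sum(ys) / len(ys)
    t = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (t,
            fit_tree([x for x, _ in left], [y for _, y in left], depth - 1),
            fit_tree([x for x, _ in right], [y for _, y in right], depth - 1))


def predict_tree(node, x):
    """Follow the conditional tests from the root down to a leaf value."""
    while isinstance(node, tuple):
        t, left, right = node
        node = left if x <= t else right
    return node


def fit_boosted(xs, ys, rounds=100, lr=0.1, depth=2):
    """Least-squares gradient boosting: each shallow tree is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    trees = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of squared error
        tree = fit_tree(xs, residuals, depth)
        trees.append(tree)
        preds = [p + lr * predict_tree(tree, x) for p, x in zip(preds, xs)]
    return base, lr, trees


def predict_boosted(model, x):
    base, lr, trees = model
    return base + lr * sum(predict_tree(tree, x) for tree in trees)


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Synthetic data (an assumption, not from any reviewed study): y = sin(x) + noise.
rng = random.Random(0)
xs_all = [rng.uniform(0, 6) for _ in range(200)]
data = [(x, math.sin(x) + rng.gauss(0, 0.1)) for x in xs_all]
cut = len(data) * 60 // 100  # samples are i.i.d., so slicing gives a random 60/40 split
train, test = data[:cut], data[cut:]
train_x, train_y = [x for x, _ in train], [y for _, y in train]
test_x, test_y = [x for x, _ in test], [y for _, y in test]

single_tree = fit_tree(train_x, train_y, depth=2)
boosted = fit_boosted(train_x, train_y)
tree_mse = mse(lambda x: predict_tree(single_tree, x), test_x, test_y)
gbt_mse = mse(lambda x: predict_boosted(boosted, x), test_x, test_y)
print(f"single tree (depth 2) test MSE: {tree_mse:.4f}")
print(f"boosted trees (100 x depth 2) test MSE: {gbt_mse:.4f}")
```

Because each boosting round fits a shallow tree to what the previous rounds still get wrong, the ensemble can build a much finer piecewise-constant curve than one depth-2 tree, so it should typically give the lower hold-out error on data like this. Libraries such as scikit-learn and XGBoost provide full implementations with multi-feature splits, regularisation and far better speed.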

CSC 832 GROUP 1 Assignment

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

CSC 832 GROUP 1 Assignment

Uploaded by

Copyright:

Available Formats
