Department of Computer Science & Engineering
Krishna School of Emerging Technology & Applied Research – Drs. Kiran & Pallavi Patel Global University (KPGU)
Vadodara, Gujarat, India
milindshahcomputer@gmail.com

Department of Computer Science & Engineering
Krishna School of Emerging Technology & Applied Research – Drs. Kiran & Pallavi Patel Global University (KPGU)
Vadodara, Gujarat, India
kinjal445@gmail.com

Faculty of Computer Applications & Information Technology
Gujarat Law Society University
Ahmedabad, Gujarat, India
kinjal5721@gmail.com

Harsh Kantawala
Assistant Professor
Department of Computer Engineering
G H Patel College of Engineering & Technology – Charutar Vidhya Mandal University (CVM)
Vallabh Vidhyanagar, Gujarat, India
harshkantawala@gcet.ac.in

Ankita Kothari
Assistant Professor
Department of Computer Science & Engineering
Krishna School of Emerging Technology & Applied Research – Drs. Kiran & Pallavi Patel Global University (KPGU)
Vadodara, Gujarat, India
ankita.cor@gmail.com

Rohini Patel
Assistant Professor
Department of Computer Science & Engineering
Krishna School of Emerging Technology & Applied Research – Drs. Kiran & Pallavi Patel Global University (KPGU)
Vadodara, Gujarat, India
rohini.d.patel@gmail.com
Abstract— The use of ensemble techniques is widely recognized as one of the most advanced approaches to solving a variety of problems in machine learning. These strategies train many models and combine their results in order to enhance predictive performance over that of a single model. Over the last several years, the disciplines of artificial intelligence, pattern recognition, machine learning, neural networks, and data mining have all given considerable attention to the concept of ensemble learning. Ensemble learning has shown both effectiveness and usefulness across a broad range of problem domains and in significant real-world applications. Ensemble learning is a technique that involves constructing many classifiers, or a group of base learners, and merging their respective outputs in order to decrease the total variance. Compared with using only one classifier or one base learner at a time, the accuracy achieved by combining numerous classifiers or base learners is greatly improved. The use of ensemble methods has been shown to increase the predictive accuracy of machine learning models on a range of tasks, including classification, regression, and the identification of outliers. This study discusses ensemble machine learning techniques and their main methods, namely bagging, boosting, and stacking. Finally, the factors involved in bagging, boosting, and stacking are compared.

Keywords— Ensemble Learning, Bagging, Boosting, Stacking, Machine Learning, Multiple Classifier, Blend of Experts.

I. INTRODUCTION

In the field of computing, machine learning is where art and science come together. Understanding the data and selecting the appropriate algorithm are two crucial steps in developing an effective machine learning tool. Why settle for just one algorithm when you can choose several different ones and have them all contribute to achieving the same goal: enhanced outcomes?

Ensemble methods are a set of procedures that involve developing numerous models and then combining them in order to obtain superior outcomes. The results that ensemble techniques give are often more accurate than the solutions a single model would produce. Ensemble learning can effectively address a broad range of machine learning issues, including estimation, confidence, error correction, continuous learning, missing features, and many more. When using ensemble learning, a number of base-level classifiers or learners are generated and combined during training on the dataset. Base learners, sometimes called weak learners, are generated by applying the base learning algorithm to the training data. In most cases, ensemble learning applies a single type of learning algorithm to produce what is known as a homogeneous ensemble; in certain cases, however, it constructs what is known as a heterogeneous ensemble by applying several types of learning algorithms. The benefit of using ensemble learning is that it improves the performance of less capable learners, which in turn improves the overall accuracy of the learning algorithm on the training data. Many applications use ensemble learning approaches, and they have generally concluded that the ensemble method works better than the traditional way (a single learner) [5].

The objective of every machine learning problem is to find a single model that provides the most accurate prediction of the intended outcome. Rather than applying one of the traditional machine learning algorithms and expecting that this model will be the most accurate predictor, ensemble methods take numerous models into consideration and then generate one final model by averaging all of those models together.
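The averaging idea described above can be made concrete with a small sketch. Everything below is illustrative only: a toy one-dimensional "model" whose output is the true signal plus synthetic Gaussian noise, not any of the systems surveyed in this paper. Averaging the outputs of several independently noisy models visibly shrinks the spread of the final prediction.

```python
import random
import statistics

random.seed(0)

def noisy_model(x):
    """One 'model': the true signal 2*x plus independent Gaussian noise."""
    return 2 * x + random.gauss(0, 1.0)

def ensemble_predict(x, n_models=25):
    """Average the outputs of n_models independently noisy models."""
    return statistics.mean(noisy_model(x) for _ in range(n_models))

# Spread of single-model predictions vs. ensemble predictions at x = 3
singles = [noisy_model(3) for _ in range(200)]
ensembles = [ensemble_predict(3) for _ in range(200)]

print(round(statistics.stdev(singles), 2))    # spread of one model alone
print(round(statistics.stdev(ensembles), 2))  # much smaller spread after averaging
```

Because the 25 noise terms are independent, the standard deviation of the averaged prediction is roughly one fifth of a single model's, which is the variance-reduction effect the abstract refers to.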
When trying to increase the reliability of one's predictions, it is common practice to use ensemble methods of machine learning. They are often used in situations in which a single machine learning algorithm would be unable to effectively capture the underlying relationships because the dataset is either too large or too complex. The main reasons to choose ensemble machine learning techniques are that they can help to avoid overfitting, they can be less susceptible to bias, and they are more robust to changes in the data.

Let's compare ensemble learning with a real-world case. Imagine that a company's management decides to call a meeting to discuss a measure that has just been approved. Letting the Managing Director or HR take the decision alone may not be the best idea, because that amounts to dictatorship, and the outcome may not favor the employees as much as it could. This is comparable to a single model in machine learning, such as a Decision Tree or Logistic Regression, which will usually perform slightly worse than ensemble approaches. If, on the other hand, a meeting is held between a number of different employees, say four of them, each will offer an opinion along with the benefits and drawbacks of the situation, and finally the most compelling viewpoint will be selected by majority vote. The ensemble method works in precisely the same way: several models are utilized, and depending on the result of each, a majority vote is used to make the final decision.

[Figure: training data is fed to several base models (Model A, Model B, Model C), whose outputs are combined by a generalizer.]
Fig 1. Summary of Ensemble Learning

This paper consists of six sections. Section I is the Introduction, Section II explains why we need ensemble techniques, Section III is Related Work, Section IV is Methodology, Section V is Comparative Analysis, and finally the Conclusion concludes the paper.

II. WHY WE NEED ENSEMBLE TECHNIQUES?

It is well known that bias, variance, and noise all have a severe impact on the errors and predictions produced by machine learning models. Ensemble techniques are used to counteract these drawbacks. There are two primary, connected reasons to adopt an ensemble rather than a single model. Performance: an ensemble of models may produce more accurate predictions and achieve higher performance than any one of the individual models that contribute to it. Robustness: an ensemble narrows the spread or dispersion of the predictions and of model performance. When solving a predictive modeling problem, ensembles are therefore used to predict better than any single predictive model could.

III. RELATED WORK

In [1] Emine Yaman et al, the objective of the research is to discover whether bagging and boosting ensemble classifiers are capable of properly detecting neuromuscular disorders using EMG signals as the data source. The method can be divided into three distinct steps. The first step is to calculate the wavelet packet coefficients (WPC) for each type of EMG signal. Next, WPC statistics are calculated to represent the distribution of the wavelet coefficients; the wavelet transform is used to achieve this. In the final step, ensemble classifiers were applied to diagnose neuromuscular conditions, with the resulting features used as input variables in the classification process. The research compares the bagging ensemble learning strategy with the boosting ensemble learning approach for automatic recognition of electromyography (EMG) data. The performance of ensemble classifiers on real-world problems has been presented in many research studies, but very few have examined the feasibility of bagging and boosting ensemble classifiers for the diagnosis of neuromuscular disorders. The results of this research showed that ensemble classification performs better than individual classifiers in the diagnosis of neuromuscular disorders. The results are encouraging: the AdaBoost algorithm combined with the random forest ensemble method produces an accuracy of 99.08%, an F-measure of 0.99, an AUC of 1, and a kappa statistic of 0.99.

In [2] Xianwei Gao et al, the NSL-KDD data collection is used as the research object. The paper reviews the most recent advancements in intrusion detection technology as well as the obstacles the field faces, and proposes an adaptive ensemble learning model. Altering the quantity of data used during the training phase of the MultiTree method and creating several decision trees are both required steps in constructing this technique. The authors conclude that the most effective way to improve the overall detection performance is to use a variety of fundamental classifiers, such as decision trees, random forests, kNN, and DNN, and to develop an adaptive voting mechanism for the ensemble. Evaluated on NSL-KDD Test+, the MultiTree algorithm achieves an accuracy of 84.2%, and the final accuracy of the adaptive voting process is 85.2%. A series of tests showed this ensemble model to be more effective at boosting detection accuracy than models from previous research papers. Furthermore, the analysis indicates that the quality of the data characteristics is an essential component in determining detection efficacy; to obtain better results in the future, the feature selection and preprocessing of the data used for intrusion detection need to be optimized.
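Both the committee analogy in the introduction and the adaptive-voting ensemble of [2] reduce, at their core, to voting among base classifiers. The sketch below is a minimal illustration, not the cited implementation: the labels and weights are made up, and [2] derives its vote weights adaptively from each classifier's measured accuracy.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine one predicted label per base classifier into a single decision."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Weight each classifier's vote, e.g. by its validation accuracy."""
    scores = {}
    for label, w in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# Hypothetical outputs of four base classifiers for one network-traffic sample
base_outputs = ["attack", "normal", "attack", "attack"]
print(majority_vote(base_outputs))               # -> attack

# One strong classifier can outvote two weak ones under weighted voting
print(weighted_vote(["a", "b", "b"], [0.9, 0.4, 0.4]))  # -> a
```

The weighted variant shows why adaptive voting can beat plain majority voting: a classifier that is known to be more reliable contributes proportionally more to the final decision.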
In [3] Thomas G Dietterich, an overview of various ensemble approaches is provided, along with an explanation of why ensembles can often do better than any single classifier. The paper discusses recent research comparing ensemble approaches and presents experiments aimed at determining the factors behind AdaBoost's resistance to quick overfitting. It offers a brief analysis of the techniques for generating ensembles, examines the three key reasons why ensemble approaches are able to outperform any single classifier contained within the ensemble, and presents experimental data that reveal one of the secrets behind AdaBoost's impressive level of performance.

In [4] Omer Sagi et al, the objective of the research is to analyze conventional, new, and state-of-the-art ensemble approaches and to explore current difficulties and developments in the area of ensemble learning. The purpose of the article is to explain the notion of ensemble learning; the main goal of such approaches is to increase prediction accuracy by combining the results of a number of separate models with weighted averages. The survey reviews a number of current trends as well as potential future research areas. One current direction is to improve widely used algorithms in order to make them more effective and suited for "Big Data"; these initiatives often involve making better use of computing resources and distributing algorithms across several workstations. The translation of ensemble models into simpler, more comprehensible models, while maintaining the predictive accuracy of the ensemble they were generated from, is another promising research path. Finally, in line with the accelerated pace of research into deep neural networks, the survey discusses very recent work aimed at merging ensemble models with deep neural networks.

In [5] Thomas Rincy N et al, the objective of the research is to conduct a comprehensive analysis of the many ensemble learning techniques that are widely used in machine learning. The main contribution is a discussion of ensemble learning, one of the most common types of learning methods in machine learning, and of its methods: boosting, which constructs a robust classifier from a number of base learners; bagging, which combines bootstrap sampling and aggregation and is presented as a parallel ensemble technique; and stacking, in which independent learners are merged by a further learner, together with a mixture-of-experts scheme that trains an ensemble of classifiers using sampling. In future work, the authors aim to implement a machine learning model that makes use of ensemble learning strategies.

In [6] Mohammad Zounemat-Kermani et al, the objective of the research is to analyze the development of ensemble techniques in several applied areas of hydrology. These techniques include ensemble resampling methods (bagging, boosting, dagging, etc.), model averaging, and stacked generalization. The review mainly focuses on hydrological topics such as surface hydrology, river water quality, rainfall-runoff, debris flow, river freezing, sediment transport, groundwater, and flood modeling and forecasting. The findings show that the use of ensemble methods in hydrology is clearly superior to the traditional methods of model learning (done on an individual basis). In addition, boosting techniques such as AdaBoost and gradient boosting are more often successfully applied to hydrological problems than bagging, stacking, and dagging methods, which the review attributes to the way boosting progressively reduces the errors of the ensemble along the gradient. One of the main conclusions drawn from this analysis is that during the three years from 2018 to 2020, most researchers used AdaBoost, XGB, rotation forests, random subspaces, and dagging. Although the AdaBoost and XGB boosting methods have been exploited in many studies, more work is needed to evaluate the performance of rotation forest, random subspace, and dagging methods in different hydrological modeling tasks. Integrating multiple ensemble techniques into learning algorithms to improve performance is an interesting and potentially fruitful research avenue; for example, combining stacking with other methods can further improve the bagging and boosting approaches.

In [7] Debachudamani Prusti et al, the starting point is that researchers and industry professionals have come up with a number of different methods for detecting fraud, all of which apply different algorithms to identify fraudulent patterns. This research applies several classification models built with machine learning techniques to measure accuracy and other performance characteristics for detecting fraudulent transactions. Classification algorithms such as K-Nearest Neighbor (K-NN), Extreme Learning Machine (ELM), Random Forest (RF), Multilayer Perceptron (MLP), and other classifiers are critically examined for the performance enhancement they provide. To improve prediction performance, a classification model that is an ensemble of five different algorithms is presented, and its performance evaluation was established using 20% of the collected data as test data. Applying a set of machine learning algorithms in combination is one of the distinctive methods developed for credit card fraud detection. Although there are only minor differences in accuracy between the different models, the prediction accuracy of the proposed classification model is 83.83%, a significant improvement over previous percentages. The suggested approach reduces the fraud detection error while simultaneously improving the fraud prediction rate.
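Bagging, as [5] describes it, is just bootstrap sampling plus aggregation. The few lines below sketch that loop on toy one-dimensional data with a deliberately trivial threshold "learner"; they are not the REPTree or random forest setups used in the surveyed papers.

```python
import random
from collections import Counter

random.seed(1)

def bootstrap(data):
    """The 'bootstrap' step: sample len(data) points with replacement."""
    return [random.choice(data) for _ in data]

def train_stump(sample):
    """Deliberately trivial base learner: threshold at the mean x of the sample."""
    t = sum(x for x, _ in sample) / len(sample)
    return lambda x: 1 if x >= t else 0

def bagged_predict(stumps, x):
    """The 'aggregation' step: majority vote over all bootstrap-trained learners."""
    return Counter(s(x) for s in stumps).most_common(1)[0][0]

# Toy 1-D data: the label is 1 when x >= 5
data = [(x, 1 if x >= 5 else 0) for x in range(10)]
stumps = [train_stump(bootstrap(data)) for _ in range(11)]
print(bagged_predict(stumps, 9), bagged_predict(stumps, -1))  # -> 1 0
```

Because each base learner sees a different bootstrap sample, the learners disagree slightly near the decision boundary, and the majority vote smooths those disagreements out, which is exactly the variance-reduction argument made throughout this survey.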
In [8] Arshad Jamal et al, the objective of the research is to investigate the possibility of using extreme gradient boosting (XGBoost) models to assess crash injury severity in comparison with several traditional machine learning methods (logistic regression, random forest, decision tree). The data collected for this study were obtained from the Ministry of Transportation (MOT) in Riyadh, Saudi Arabia, specifically from the Traffic Safety Administration, and cover 13,546 traffic accidents reported on 15 local highways between January 2017 and December 2019. Experimental results obtained using k-fold cross-validation (k = 10) for different performance measures show that the XGBoost method outperforms the other models in terms of both collective prediction performance and the accuracy of the individual injury-severity classes. Based on the XGBoost feature importance analysis, the important variables for accurately predicting the consequences of crash damage severity are crash type, weather conditions, road surface conditions, type of damage at the location, lighting conditions, and vehicle type. A comparative study of XGBoost using different performance data showed that the model outperforms most previous studies.

In [9] D.P. Gaikwad et al, the objective of the research is to offer a unique approach to intrusion detection based on the ensemble method of machine learning. The implementation of the intrusion detection system uses the Bagging ensemble technique with REPTree as the base classifier. To raise classification accuracy and lower the number of false positives, the relevant features were selected from the NSL-KDD dataset. The suggested ensemble method's performance is analyzed in terms of classification accuracy, model building time, and the number of false positives. According to the experiments, the Bagging ensemble combined with the REPTree base classifier achieves the greatest classification accuracy, and building a model via the Bagging approach reduces the time needed to complete the task. Compared with other machine learning approaches, the proposed ensemble method also produces significantly lower false positive rates. The approach is tested on a test dataset and validated using 10-fold cross-validation. Based on the findings, the ensemble bagging approach achieves a classification accuracy of 99.67% under 10-fold cross-validation and 81.29% on the test dataset. Compared with the AdaBoost algorithm using a decision-based classifier and with other machine learning strategies, the model building time and the number of false positives of this method are much lower. Overall, the developed ensemble method is more efficient and provides the highest classification accuracy with the lowest number of false positives.

In [10] K.M. Zubair Hasan et al, the objective of the research is to provide a classifier based on an ensemble technique in order to effectively enhance the decision-making of existing classifiers and diagnose kidney disease. Ensemble techniques improve predicted performance beyond what could be gained from any of the component learning algorithms by combining a number of different learning algorithms into one solution. The data are examined using 10-fold cross-validation, and system performance is evaluated using receiver operating characteristic curves. Extensive testing on the CKD dataset taken from the University of California, Irvine Machine Learning Repository shows that the ensemble-based approach provides the best performance: unlike many conventional prediction algorithms, the classification accuracy of the proposed ensemble learning method reaches 99%.

In [11] Sadi Evren Seker et al, a total of 333 data sets including uniaxial compressive strength and shear strength, 103 data sets including RQD, and 125 data sets including machine weight were collected from the relevant research literature. The purpose of this study is to predict roadheader performance using six different machine learning algorithms and combinations of them obtained through ensemble techniques. The following algorithms are considered: ZeroR, random forest (RF), Gaussian process, linear regression, logistic regression, and multilayer perceptron (MLP). As a result, MLP and RF provide better results compared with the other methods. Many performance prediction models have been published in the relevant academic literature, including expert models, empirical models, and artificial neural network models; however, these models are rarely used, because they mostly show little or only moderate correlation with the actual field performance of the machines.

In [12] Nouf Rahimi et al, the objective of the research is to further improve the accuracy and expand the availability of FR (functional requirement) representations by introducing a novel ML classification approach. This strategy uses each model's precision as its weight in a weighted group voting method; it is a technique that combines different machine learning models. Naive Bayes, support vector machine (SVM), decision tree, logistic regression, and support vector classification (SVC) models make up the five combined models. The collected data sets served as the basis for implementing, training, and testing the method. FR classification took 0.7 seconds, and the classification accuracy was 99.45%. A proposed ensemble using only the most accurate machine-learning-based classifiers (SVM, SVC, and logistic regression) showed the same accuracy (99.45%) as using all five classifiers; the only difference is the time taken, and the decrease represents the improvement gained from using fewer classifiers.
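Several of the studies above ([8], [9], [10]) evaluate their ensembles with 10-fold cross-validation. The splitting logic itself is simple; the sketch below shows only the index bookkeeping (a stride-based assignment of samples to folds, no model training), so it is a generic illustration rather than any cited paper's protocol.

```python
def k_fold_indices(n, k):
    """Yield (test_idx, train_idx) pairs; fold i takes every k-th index."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield folds[i], train

# 10 samples, 5 folds: each sample lands in exactly one test fold
splits = list(k_fold_indices(10, 5))
print(len(splits))                                    # -> 5
print(sorted(i for test, _ in splits for i in test))  # every index, exactly once
```

Each model is trained k times, once per train/test split, and the k test scores are averaged, which is why the cross-validated accuracies quoted above are more trustworthy than a single random split.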
In [13] Xibin Dong et al, the objective of the research is to evaluate the current state of research concerning the conventional methods of ensemble learning and to categorize these methods based on the many qualities they exhibit. In addition, the authors present challenges and potential research directions for each mainstream approach of ensemble learning, and they provide an additional introduction to the combination of ensemble learning with various other hotspots in machine learning, such as deep learning and reinforcement learning. However, there is still work to be done to further enhance the performance of ensemble models, particularly in situations where the data includes intricate patterns. The authors believe that, following this work, readers will have a fundamental comprehension of the current ensemble learning methodologies described above and will be able to use ensemble learning from a variety of perspectives, and they hope that their proposals for future ensemble learning paths provide some insight into this emerging topic.

In [14] Xibin Dong et al, the objective of the research is likewise to analyze the current state of research concerning the conventional methods of ensemble learning and to categorize such methods according to their many defining features. The authors present a range of concerns as well as potential research opportunities regarding ensemble learning, and they introduce its combination with other critical areas in machine learning, such as deep learning and reinforcement learning. As in [13], they note that work remains to be done to enhance the performance of ensemble models, particularly when the data includes intricate patterns, and they hope their proposals for future ensemble learning paths provide insight into this emerging topic.

In [15] Chia Hsiu Chen et al, the objective of the research is to investigate the levels of predictability and interpretability provided by four standard, well-established ensemble learning methods (random forest, extremely randomized trees, adaptive boosting, and gradient boosting) when applied to binary classification and regression modeling tasks. Blending approaches were then established by assembling the main components of the four ways of learning in an ensemble. The blending strategy brought about an increase in performance as well as a unity of interpretation, since it compiled the individual predictions given by the several learning models. The research presents an in-depth assessment of the major parts of two case studies that supplied some valuable information on compound properties. QSPR modeling with interpretable machine learning algorithms might help enhance chemical design so that it functions more efficiently, generate knowledge, and test ideas for better results. According to the findings, the combination of the QC descriptors and the fluorescence dataset produced a model that not only had good predictability and interpretability but also showed good agreement with the experimental facts regarding the fluorescence wavelengths; this was shown during the regressions carried out on the fluorescence dataset. Blending also had a stronger predictive performance for the classification of liquid crystal behavior and provided insight into the behaviors of liquid crystals. The increase in the accuracy of the blending strategy is evidence that the linked processes in the data have been accurately captured. Although DT-based ensemble learning models were powerful enough to consistently predict features, different DT-based ensemble learning models produced contradictory predictions for the same compounds; this inconsistency of the level-0 model predictions may be overcome by blending techniques, so long as more than two of the level-0 models made correct forecasts.

After this literature survey, it has been observed that bagging and boosting are both techniques that have the benefit of enhancing model stability while simultaneously reducing variance. In addition, these strategies may be utilized to deal with imbalanced datasets and to provide more accurate predictions. By training numerous models on different subsets of the data, bagging and boosting can significantly reduce overfitting. Both bagging and boosting can also help make models more interpretable by showing which attributes are most essential for the predictions.

Some drawbacks are also observed. Real-time applications are not a good fit for bagging and boosting algorithms, since these techniques are computationally costly and require a large number of training data points. Bagging and boosting are two different types of algorithms, but both involve training multiple models, which can be computationally expensive, and if the models are not tuned appropriately, there is a possibility that they may overfit the data. Moreover, given that bagging and boosting algorithms use many models, the results they provide can be difficult to interpret.

TABLE 1. EXISTING APPROACH LIMITATIONS

Author Name | Publication with Year | Method / Algorithm Used | Limitations
Emine Yaman et al [1] | Hindawi, 2019 | Ensemble classifiers | -
Omer Sagi et al [4] | Wiley, 2018 | Ensemble methods | Improve standard algorithms to make them more efficient and appropriate for "Big Data"
Thomas Rincy N et al [5] | IEEE, 2020 | - | One of our goals is to implement a model based on machine learning that
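Boosting methods recur throughout the surveyed studies (AdaBoost in [1] and [6], XGBoost in [6] and [8]) and share one core idea: each new learner is fitted to correct what the current ensemble still gets wrong. The toy sketch below uses the residual-fitting flavour of boosting (as in gradient boosting, not the AdaBoost reweighting variant), with brute-force one-split stumps on four hand-made points; it is illustrative only, not any of the cited implementations.

```python
def fit_stump(xs, residuals):
    """Best single split minimising squared error, found by brute force."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x < t]
        right = [r for x, r in zip(xs, residuals) if x >= t]
        lv = sum(left) / len(left) if left else 0.0
        rv = sum(right) / len(right) if right else 0.0
        err = sum((r - (lv if x < t else rv)) ** 2 for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x < t else rv

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.0, 1.0, 1.0]
ensemble = []
pred = lambda x: sum(s(x) for s in ensemble)

for _ in range(3):  # each round fits a new stump to the current residuals
    residuals = [y - pred(x) for x, y in zip(xs, ys)]
    ensemble.append(fit_stump(xs, residuals))

print([round(pred(x), 3) for x in xs])  # -> [0.0, 0.0, 1.0, 1.0]
```

The first stump already finds the split at x = 2, so later rounds fit near-zero residuals; on harder data each round would keep shaving down the remaining error, which is the sequential-correction behaviour the comparison below attributes to boosting.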
2) Boosting

TABLE 2. COMPARISON OF BAGGING, BOOSTING, AND STACKING

Factor | Bagging | Boosting | Stacking
Subset selection method | During the bagging process, various training data subsets are selected at random from the original dataset, with replacement. | In the boosting procedure, sub-datasets are selected at random from the weighted dataset, and reweighting then takes place. | It trains each individual model by applying the knowledge gained from the overall dataset.
Dependencies | In bagging, each model is developed independently of the others. | In boosting, the performance of subsequent models depends on the performance of the model that came before. | A generalized model is created by integrating a number of different models into one.
Performance | If the issue is overfitting, bagging is the best option. | If the issue is underfitting, boosting might provide better results. | Addressing overfitting in the models used for stacking seems to be the superior strategy.
Takeaways | Bagging fits multiple decision trees to separate samples of the same dataset and averages the resulting predictions. | Boosting sequentially adds ensemble members that correct the predicted output of the preceding models and produces a weighted average of the predictions. | Stacking fits many kinds of models to the same data and uses a second model to learn the optimal way to combine the predictions.
CONCLUSION & FUTURE WORK

The ability to combine the results of many different models is one of the primary reasons why ensemble techniques are so significant in the field of machine learning. Because of this, the performance of an ensemble of models can often be better than that of a single model, since the ensemble can learn from the strengths and weaknesses of each individual model. Because the mixture of models can produce a more accurate model than any single model, ensemble approaches can also help lower the likelihood of overfitting. The most common sources of error in learning models are noise, variance, and bias. The ensemble approaches used in machine learning help reduce the impact of these error-causing factors, which in turn helps ensure that machine learning (ML) algorithms are accurate and reliable.

The main objective of this research is to determine why ensemble machine learning techniques should be used instead of single machine learning models. We covered ensemble machine learning techniques and compared bagging, boosting, and stacking across several factors; this paper discusses only these three main ensemble techniques, and other methods are out of its scope.

Future research opportunities include improving the accuracy of the predictions produced by ensemble models, reducing the computational costs of training them, creating new ensemble methods that are more efficient than existing approaches, and applying ensemble methods to non-traditional settings of machine learning, such as unsupervised learning and reinforcement learning.

REFERENCES

[1] M. A. Yaman, F. Rattay, and A. Subasi, “Comparison of Bagging and Boosting Ensemble Machine Learning Methods for Face Recognition,” Procedia Comput. Sci., vol. 194, pp. 202–209, 2021, doi: 10.1016/j.procs.2021.10.074.
[2] X. Gao, C. Shan, C. Hu, Z. Niu, and Z. Liu, “An Adaptive Ensemble Machine Learning Model for Intrusion Detection,” IEEE Access, vol. 7, pp. 82512–82521, 2019, doi: 10.1109/ACCESS.2019.2923640.
[3] T. G. Dietterich, “Ensemble methods in machine learning,” Lect. Notes Comput. Sci., vol. 1857 LNCS, pp. 1–15, 2000, doi: 10.1007/3-540-45014-9_1.
[4] O. Sagi and L. Rokach, “Ensemble learning: A survey,” Wiley Interdiscip. Rev. Data Min. Knowl. Discov., vol. 8, no. 4, pp. 1–18, 2018, doi: 10.1002/widm.1249.
[5] N. Thomas Rincy and R. Gupta, “Ensemble learning techniques and its efficiency in machine learning: A survey,” 2nd Int. Conf. Data, Eng. Appl. (IDEA), 2020, doi: 10.1109/IDEA49133.2020.9170675.
[6] M. Zounemat-Kermani, O. Batelaan, M. Fadaee, and R. Hinkelmann, “Ensemble machine learning paradigms in hydrology: A review,” J. Hydrol., vol. 598, 2021, doi: 10.1016/j.jhydrol.2021.126266.
[7] D. Prusti and S. K. Rath, “Fraudulent Transaction Detection in Credit Card by Applying Ensemble Machine Learning Techniques,” 10th Int. Conf. Comput. Commun. Netw. Technol. (ICCCNT), pp. 6–11, 2019, doi: 10.1109/ICCCNT45670.2019.8944867.
[8] A. Jamal et al., “Injury severity prediction of traffic crashes with ensemble machine learning techniques: a comparative study,” Int. J. Inj. Contr. Saf. Promot., vol. 28, no. 4, pp. 408–427, 2021, doi: 10.1080/17457300.2021.1928233.
[9] D. P. Gaikwad and R. C. Thool, “Intrusion detection system using bagging ensemble method of machine learning,” Proc. 1st Int. Conf. Comput. Commun. Control Autom. (ICCUBEA), pp. 291–295, 2015, doi: 10.1109/ICCUBEA.2015.61.
[10] K. M. Z. Hasan, “Performance Evaluation of Ensemble-Based Machine Learning Kidney Disease,” [Online]. Available: http://dx.doi.org/10.1007/978-981-13-5953-8_34.
[11] S. E. Seker and I. Ocak, “Performance prediction of roadheaders using ensemble machine learning techniques,” Neural Comput. Appl., vol. 31, no. 4, pp. 1103–1116, 2019, doi: 10.1007/s00521-017-3141-2.
[12] N. Rahimi, F. Eassa, and L. Elrefaei, “An ensemble machine learning technique for functional requirement classification,” Symmetry (Basel), vol. 12, no. 10, pp. 1–26, 2020, doi: 10.3390/sym12101601.
[13] X. Dong, Z. Yu, W. Cao, Y. Shi, and Q. Ma, “A survey on ensemble learning,” Front. Comput. Sci., vol. 14, no. 2, pp. 241–258, 2020, doi: 10.1007/s11704-019-8208-z.
[14] X. Dong, Z. Yu, W. Cao, Y. Shi, and Q. Ma, “A survey on ensemble learning,” Front. Comput. Sci., vol. 14, no. 2, pp. 241–258, 2020, doi: 10.1007/s11704-019-8208-z.
[15] C. H. Chen, K. Tanaka, M. Kotera, and K. Funatsu, “Comparison and improvement of the predictability and interpretability with ensemble learning models in QSPR applications,” J. Cheminform., vol. 12, no. 1, pp. 1–16, 2020, doi: 10.1186/s13321-020-0417-9.
[16] D. Sasikala, D. Chandrakanth, C. Sai Pranathi Reddy, and J. Jitendra Teja, “Inhibiting Webshell Attacks by Random Forest Ensembles with XGBoost,” J. Inf. Technol. Digit. World, vol. 4, no. 3, pp. 153–166, 2022, doi: 10.36548/jitdw.2022.3.003.