Professional Documents
Culture Documents
lOMoARcPSD|369 802 53
UNIT IV
Combining multiple learners: Model combination schemes, Voting, Ensemble Learning - bagging,
boosting, stacking, Unsupervised learning: K-means, Instance Based Learning: KNN, Gaussian
mixture models and Expectation maximization
ENSEMBLE TECHNIQUES
Ensemble learning is a supervised learning technique used in machine learning to improve overall
performance by combining the predictions from multiple models. Ensemble techniques are the methods
that use multiple learning algorithms or models to produce one optimal predictive model. The model
produced has better performance than the base learners. Ensemble methods are techniques that
improving the accuracy of results in models by combining multiple models instead of using a single model.
The combined models increase the accuracy of the results.
• The most popular ensemble methods are Voting, Boosting, Bagging, and Stacking.
• Ensemble methods are suitable for regression and classification to boost the accuracy of models.
lOMoARcPSD|369 802 53
Sequential
ensemble techniques generate base learners in a sequence, e.g., Adaptive Boosting (AdaBoost). Sequential
generation of base learners promotes dependence between base learners.
lOMoARcPSD|369 802 53
In parallel ensemble techniques, base learners are generated in a parallel format, e.g.,Bagging. Parallel
methods utilize the parallel generation of base learners to encourage independence between the base
learners, which reduces the error due to the application of averages.
• Voting
• Bagging or Bootstrap aggregation
• Boosting
• Stacking or Stacked Generalization (Blending)
• Random Forests
Voting
Voting is an ensemble machine learning algorithm that involves making a prediction that is the
average (regression) or the sum (classification) of multiple machine learning models.
Bagging
• Bagging, or bootstrap aggregation, is an ensemble method increases the accuracy of models through
decision trees, which reduces variance of individual models. The reduction of variance increases
accuracy, eliminating overfitting.
• Bagging is mainly applied in supervised learning problems such as classification and regression.
o Bootstrapping is a random sampling technique in which samples are derived from the data
using the replacement procedure.
o Aggregation in bagging is done to incorporate all possible outcomes of the prediction and
randomize the outcome. Aggregation is based on the all outcomes of predictive models.
Eg.- RandomForest where the predictions from decision trees(base learners) are taken parallelly.
• Bagging mechanism is a parallel stream of processing occurs. The main aim of the bagging method is to
reduce variance in the ensemble predictions.
• Different training sets and Same algorithm
lOMoARcPSD|369 802 53
Advantages of Bagging:
▪ Weak base learners are combined to form a single strong learner that is more stable than single
learners.
Disadvantages of Bagging:
Random Forests
Random forest is both a supervised learning algorithm and an ensemble algorithm. Random
forests use decision trees as the base estimator for the predictions and improve the performance of
models by calculating the majority voting / average prediction of multiple decision trees.
Boosting
Boosting is an ensemble method that converts weak learners into strong learners in which each
predictor learns from preceding predictor mistakes to make better predictions in the future. This technique
combines several weak base learners to form one strong learner, thus significantly improving the
predictability of models
• Gradient Boosting
• Adaptive Boosting (AdaBoost)
• XGBoost (Extreme Gradient Boosting).
lOMoARcPSD|369 802 53
• Gradient boosting adds predictors sequentially to the ensemble, where preceding predictors correct their
successors, thereby increasing the model’s accuracy.
• XGBoost makes use of decision trees with boosted gradient, providing improved speed and performance
Here, sequential processing of the dataset occurs. The first classifier is fed with the entire dataset, and the
predictions are analyzed. The instances where Classifier-1 fails to produce correct predictions are fed to the second
classifier. This is done so that Classifier-2 can specifically focus on the problematic areas of feature space and
learn an appropriate decision boundary. Similarly, further steps of the same idea are employed, and then the
ensemble of all these previous classifiers is computed to make the final prediction on the test data.
Stacking (Blending)
Stacking, also known as Stacked Generalization, is an ensemble technique that improves the accuracy
of the models by combining predictions of multiple classification or regression machine learning models.
This technique works by allowing a training algorithm to ensemble several other similar learning algorithm
predictions. Stacking has been successfully implemented in regression and classifications. It can also be
used to measure the error rate involved during bagging.
lOMoARcPSD|369 802 53
While bagging and boosting used homogenous weak learners for ensemble, Stacking often considers
heterogeneous weak learners, learns them in parallel, and combines them by training a meta-learner to
output a prediction based on the different weak learner’s predictions.
In stacking, an algorithm takes the outputs of sub-models as input and attempts to learn how to best
combine the input predictions to make a better output prediction.
lOMoARcPSD|369 802 53
The stacked model with meta learner = Logistic Regression and the weak learners = Decision Tree,
Random Forest, K Neighbors Classifier and XGBoost
The stacking ensemble method also involves creating bootstrapped data subsets, like the bagging ensemble
mechanism for training multiple models.
However, here, the outputs of all such models are used as an input to another classifier, called meta-classifier,
which finally predicts the samples.
For example, in cat/dog/wolf classifier, Classifier-1 can distinguish between cats and dogs, but not between dogs
and wolves, the meta-classifier present in the second layer will be able to capture this behavior from classifier-1.
The meta classifier can then correct this behavior before making the final prediction.
▪ Fraud Detection
lOMoARcPSD|369 802 53
Unsupervised Learning
Supervised machine learning in which models are trained using labeled data under the supervision
of training data.
Unsupervised learning is a type of machine learning in which models are trained using
unlabeled dataset without any supervision of training data to find the hidden patterns from the given
dataset
Unsupervised learning is a learning method in which a machine learns without any supervision. The
training is provided to the machine with the set of data that has not been labeled, classified, or categorized,
and the algorithm needs to act on that data without any supervision. The goal of unsupervised learning is to
restructure the input data into new features or a group of objects with similar patterns.
Unsupervised learning is a machine learning technique in which models are not supervised using
training dataset.
Unsupervised learning cannot be directly applied to a regression or classification problem because
unlike supervised learning, It has the input data but no corresponding output data.
The goal of unsupervised learning is to find the underlying structure of dataset, group that data
according to similarities, and represent that dataset in a compressed format.
Here, input data is an unlabeled, which means it is not categorized and corresponding outputs are also not
given. Now, this unlabeled input data is fed to the machine learning model in order to train it. Firstly, it will
interpret the raw data to find the hidden patterns from the data and then will apply suitable algorithms such
as k-means clustering, Decision tree, etc. Algorithm divides the data objects into groups according to the
similarities and difference between the objects.
lOMoARcPSD|369 802 53
o Clustering: Clustering is an unsupervised method of grouping objects into clusters such that objects with
most similarities remains into a group and has less or no similarities of another group.
o Association: An association rule is an unsupervised learning method which finds the relationships
between variables in the large database. It determines set of items that occurs together in dataset.
lOMoARcPSD|369 802 53
o A typical example of Association rule is Market Basket Analysis i.e., People who buy X
item (suppose a bread) are also tend to purchase Y (Butter/Jam) item.
List of Unsupervised learning algorithms:
o Clustering:
▪ K-means clustering
▪ Expectation Maximization
▪ Hierarchal clustering
▪ Apriori algorithm
▪ FP growth Algorithm
▪ ECLAT Algorithm
o Dimensionality Reduction:
2. Medical Diagnosis
3. Marketing
4. Insurance
o The result of the unsupervised learning algorithm might be less accurate as input data is not labeled,
and algorithms do not know the exact output in advance.
Clustering
Clustering or cluster analysis is a machine learning technique, which groups the unlabelled dataset.
It can be defined as "A way of grouping the data points into different clusters, consisting of similar data
points. The objects with the possible similarities remain in a group that has less or no similarities with
another group."
Clustering is similar to the classification algorithm, but the only difference is the type of dataset used. In
classification, the labeled data set is used, whereas in clustering, unlabelled dataset is used.
K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabeled dataset
into different clusters. Here K defines the number of pre-defined clusters that need to be created in the
process, as if K=2, there will be two clusters, and for K=3, there will be three clusters, and so on.
It is an Unsupervised Learning algorithm algorithm that divides the unlabeled dataset into k
different clusters in such a way that each dataset belongs only one group that has similar properties.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of
this algorithm is to minimize the sum of distances between the data point and their corresponding clusters.
The algorithm takes the unlabeled dataset as input, divides the dataset into k-number of clusters, and
repeats the process until it does not find the best clusters.
o Determines the best value for K center points or centroids by an iterative process.
o Assigns each data point to its closest k-center. Those data points which are near to the particular k-
center, create a cluster.
The below diagram explains the working of the K-means Clustering Algorithm:
Step-5: Repeat the third steps, which means reassign each datapoint to new closest centroid of each
cluster.
Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.
Step-7: The model is ready.
Considering the visual plots: Two variables M1 and M2. The x-y axis scatter plot of these two variables is
given below:
o Take number k of clusters, i.e., K=2, to identify the dataset and to put them into two different
clusters. It means group these datasets into two different clusters.
o Choose some random k points or centroid to form the cluster. These points can be either the points
from the dataset or any other point. So, here we are selecting the below two points as k points,
which are not the part of our dataset. Consider the below image:
lOMoARcPSD|369 802 53
o Now we will assign each data point of the scatter plot to its closest K-point or centroid. We will
compute it by applying some mathematics that we have studied to calculate the distance between
two points. So, we will draw a median between both the centroids. Consider the below image:
From the above image, it is clear that points left side of the line is near to the K1 or blue centroid, and
points to the right of the line are close to the yellow centroid. Let's color them as blue and yellow for clear
visualization.
o As we need to find the closest cluster, so we will repeat the process by choosing a new centroid.
To choose the new centroids, we will compute the center of gravity of these centroids, and will find
new centroids as below:
lOMoARcPSD|369 802 53
o Next, we will reassign each datapoint to the new centroid. For this, we will repeat the same process
of finding a median line. The median will be like below image:
From the above image, we can see, one yellow point is on the left side of the line, and two blue points are
right to the line. So, these three points will be assigned to new centroids.
As reassignment has taken place, so we will again go to the step-4, which is finding new centroids or K-
points. Repeat the process by finding the center of gravity of centroids, so the new centroids will be as
lOMoARcPSD|369 802 53
o As we got the new centroids so again will draw the median line and reassign the data points. So, the
image will be:
o We can see in the above image; there are no dissimilar data points on either side of the line, which
means our model is formed. Consider the below image:
As our model is ready, so we can now remove the assumed centroids, and the two final clusters will be as
shown in the below image:
lOMoARcPSD|369 802 53
Example: Two categories, i.e., Category A and Category B, and a new data point x1, so this data point
will lie in which of these categories. To solve this type of problem, K-NN algorithm is used. With the help
of K-NN, we can easily identify category or class of a particular dataset. Consider the below diagram:
Suppose, an image that looks similar to cat and dog, but we want to know either it is a cat or dog.
KNN algorithm is used for this identification, it works on a similarity measure. KNN model will find the
lOMoARcPSD|369 802 53
similar features of the new data set and based on most similar features it will put either in cat or dog
category.
o By calculating the Euclidean distance we got the nearest neighbors, as three nearest neighbors in
category A and two nearest neighbors in category B.
o Consider the below image:
o The 3 nearest neighbors are from category A, hence this new data point must belong to category A.
o Computation cost is high because of calculating the distance between the data points for all the
training samples.
ml
https://www.edureka.co/blog/arti昀椀cial-intelligence-with-python/
https://prutor.ai/regression-algorithms-linear-regression/
https://www.analyticsvidhya.com/blog/2021/10/implementing-arti昀椀cial-neural-networkclassi昀椀cation-in-python- from-
scratch/
https://www.vtupulse.com/arti昀椀cial-intelligence/implementation-of-a-star-search-algorithm-in-python/
lOMoARcPSD|369 802 53