
UNIT IV

ENSEMBLE TECHNIQUES AND UNSUPERVISED LEARNING

Combining multiple learners: Model combination schemes, Voting, Ensemble Learning - bagging,
boosting, stacking, Unsupervised learning: K-means, Instance Based Learning: KNN, Gaussian
mixture models and Expectation maximization

ENSEMBLE TECHNIQUES

Ensemble learning is a supervised learning technique that improves overall performance by combining the predictions of multiple models. Ensemble techniques use several learning algorithms or models to produce one optimal predictive model, and the combined model generally performs better than any of the individual base learners. In other words, ensemble methods improve the accuracy of results by combining multiple models instead of relying on a single model.

Each model can be a different classifier:

• K-Nearest Neighbor (KNN)


• Decision Tree
• Logistic Regression, etc.

• The most popular ensemble methods are Voting, Boosting, Bagging, and Stacking.
• Ensemble methods are suitable for regression and classification to boost the accuracy of models.

Types of Ensemble Methods

Ensemble methods fall into two broad categories, i.e.,


• Sequential ensemble techniques, and
• Parallel ensemble techniques.

Sequential ensemble techniques generate base learners in a sequence, e.g., Adaptive Boosting (AdaBoost). Sequential generation of base learners promotes dependence between the base learners.

In parallel ensemble techniques, base learners are generated in parallel, e.g., Bagging. Parallel methods exploit the independent generation of base learners: averaging over many independent learners reduces the error, as illustrated below.
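A tiny numerical illustration of why averaging independent learners reduces error. This is a minimal sketch assuming simulated, independent, noisy predictions (the noise level of 0.5 and the 25 learners are arbitrary choices for illustration); the variance of the average of m independent predictions shrinks roughly by a factor of m.

import numpy as np

rng = np.random.default_rng(0)
true_value = 1.0
m = 25  # number of independent base learners

# Simulate noisy predictions from each learner (std = 0.5 around the truth).
predictions = true_value + 0.5 * rng.standard_normal(size=(10000, m))

single_var = np.var(predictions[:, 0])            # variance of one learner
ensemble_var = np.var(predictions.mean(axis=1))   # variance of the average
print("Variance of a single learner:", round(single_var, 4))
print("Variance of the averaged ensemble:", round(ensemble_var, 4))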

Various Types of Ensemble Methods

• Voting
• Bagging or Bootstrap aggregation
• Boosting
• Stacking or Stacked Generalization (Blending)
• Random Forests

Homogeneous base learners are base learners of the same type, with similar qualities. Heterogeneous base learners are learners of distinct types.

Voting

Voting is an ensemble machine learning algorithm that makes a prediction by averaging the predictions of multiple models (regression) or by taking the majority vote of their predicted classes (classification).

• Same training sets


• Different algorithms
• sklearn.ensemble: VotingRegressor, VotingClassifier (see the sketch below)
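A minimal sketch of voting with scikit-learn, assuming the Iris dataset and three base classifiers (logistic regression, decision tree, KNN) purely for illustration: the same training set is given to different algorithms and their class predictions are combined by a majority ("hard") vote.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Same training set, different algorithms; final class = majority vote.
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier()),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)
print("Voting accuracy:", voter.score(X_test, y_test))

For regression, VotingRegressor combines the base models by averaging their numeric predictions instead of voting.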

Bagging

• Bagging, or bootstrap aggregation, is an ensemble method that increases the accuracy of models (typically built from decision trees) by reducing the variance of the individual models. The reduction in variance increases accuracy and reduces overfitting.

• Bagging is mainly applied in supervised learning problems such as classification and regression.

• It involves two steps, i.e., bootstrapping and aggregation.

o Bootstrapping is a random sampling technique in which samples are drawn from the data with replacement.

o Aggregation in bagging combines the outcomes of all the individual predictive models, for example by averaging or majority voting over their predictions.

The first step in bagging is bootstrapping, where random data samples (bootstrap samples) are fed to each base learner and the base learning algorithm is run on each sample. In the aggregation step, the outputs from the base learners are combined. The goal is to increase accuracy while reducing variance to a large extent.

E.g., Random Forest, where the predictions from the decision trees (base learners) are obtained in parallel.

• In bagging, the processing occurs as parallel streams. The main aim of the bagging method is to reduce variance in the ensemble predictions.
• Different training sets and the same algorithm

• Two models from sklearn.ensemble: BaggingClassifier, BaggingRegressor (see the sketch below)
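A minimal sketch of bagging with scikit-learn, assuming a synthetic dataset from make_classification purely for illustration; by default BaggingClassifier uses a decision tree as the base estimator, trains each tree on a bootstrap sample (sampling with replacement), and aggregates the trees' predictions by voting.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 base learners, each fitted on a different bootstrap sample of the
# training set; their outputs are aggregated to give the final prediction.
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
bag.fit(X_train, y_train)
print("Bagging accuracy:", bag.score(X_test, y_test))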

Advantages of Bagging:

▪ Weak base learners are combined to form a single strong learner that is more stable than single
learners.

▪ It reduces variance, thereby reducing the overfitting of models.

Disadvantages of Bagging:

▪ One limitation of bagging is that it is computationally expensive.

Random Forests

Random forest is both a supervised learning algorithm and an ensemble algorithm. Random forests use decision trees as the base estimators and improve the performance of models by taking the majority vote (classification) or the average prediction (regression) of multiple decision trees.

• Base estimator is a decision tree



• Each estimator uses a different bootstrap sample of the training set


• Two models from sklearn.ensemble: RandomForestClassifier, RandomForestRegressor (see the sketch below)
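A minimal sketch of a random forest with scikit-learn, again assuming a synthetic dataset purely for illustration; each decision tree (the base estimator) is fitted on a different bootstrap sample, and the forest predicts by majority voting over the trees (averaging, for RandomForestRegressor).

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# 100 decision trees, each trained on its own bootstrap sample;
# the forest's prediction is the majority vote of the trees.
forest = RandomForestClassifier(n_estimators=100, random_state=1)
forest.fit(X_train, y_train)
print("Random forest accuracy:", forest.score(X_test, y_test))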

Boosting

Boosting is an ensemble method that converts weak learners into strong learners: each predictor learns from the mistakes of the preceding predictors in order to make better predictions. The technique combines several weak base learners to form one strong learner, significantly improving the predictive performance of models.

▪ Boosting can be used in classification and regression problems.

E.g., XGBoost, AdaBoost.

Types of boosting Algorithms:

• Gradient Boosting
• Adaptive Boosting (AdaBoost)
• XGBoost (Extreme Gradient Boosting).

• Stochastic Gradient Boosting


• AdaBoost uses weak learners in the form of decision trees, which usually consist of a single split and are popularly known as decision stumps.

• Gradient boosting adds predictors sequentially to the ensemble, where each new predictor corrects the errors of its predecessors, thereby increasing the model's accuracy.

• XGBoost makes use of gradient-boosted decision trees, providing improved speed and performance.

Fig: Boosting ensemble mechanism



Here, sequential processing of the dataset occurs. The first classifier is fed with the entire dataset, and the
predictions are analyzed. The instances where Classifier-1 fails to produce correct predictions are fed to the second
classifier. This is done so that Classifier-2 can specifically focus on the problematic areas of feature space and
learn an appropriate decision boundary. Similarly, further steps of the same idea are employed, and then the
ensemble of all these previous classifiers is computed to make the final prediction on the test data.
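A minimal sketch of this sequential boosting idea using scikit-learn's AdaBoostClassifier, assuming a synthetic dataset purely for illustration; by default the weak learners are decision stumps, and each new stump is fitted to a reweighted training set that emphasises the examples the previous stumps misclassified.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=1000, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

# Stumps are added one after another (sequential processing); misclassified
# instances receive higher weights so the next stump focuses on them.
boost = AdaBoostClassifier(n_estimators=100, random_state=2)
boost.fit(X_train, y_train)
print("AdaBoost accuracy:", boost.score(X_test, y_test))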

Stacking (Blending)

Stacking, also known as Stacked Generalization, is an ensemble technique that improves the accuracy
of the models by combining predictions of multiple classification or regression machine learning models.

This technique works by training a learning algorithm (the meta-learner) to combine the predictions of several other learning algorithms. Stacking has been successfully applied to regression and classification problems. It can also be used to estimate the error rate involved in bagging.

While bagging and boosting usually use homogeneous weak learners, stacking often considers heterogeneous weak learners, trains them in parallel, and combines them by training a meta-learner to output a prediction based on the different weak learners' predictions.

In stacking, an algorithm takes the outputs of sub-models as input and attempts to learn how to best
combine the input predictions to make a better output prediction.

Stacking for Machine Learning

For example, a stacked model may use Logistic Regression as the meta-learner and Decision Tree, Random Forest, K Neighbors Classifier and XGBoost as the weak (base) learners.

Like the bagging mechanism, the stacking ensemble method also involves training multiple models, usually on subsets of the training data.

However, here, the outputs of all such models are used as the input to another classifier, called the meta-classifier, which makes the final prediction for the samples.

For example, in a cat/dog/wolf classifier, Classifier-1 may be able to distinguish between cats and dogs but not between dogs and wolves; the meta-classifier in the second layer can capture this behaviour of Classifier-1 and correct it before making the final prediction.

A pictorial representation of the stacking mechanism is shown below.



Stacking machine learning algorithms work by:

• Using multiple first level models to predict on a training set


• Combining (stacking) the predictions to generate a new training set
• Fitting and predicting a second level model on the generated training set

• From sklearn.ensemble: StackingClassifier (see the sketch below)
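A minimal sketch of stacking with scikit-learn, assuming the Iris dataset purely for illustration; the base (first-level) learners are a decision tree, a random forest and KNN, and their predictions become the input features of the logistic-regression meta-learner, mirroring the example above (XGBoost is omitted here since it is not part of scikit-learn).

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, StackingClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# First-level models generate predictions (via internal cross-validation);
# the meta-classifier (logistic regression) learns to combine them.
stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier()),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=3)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))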

Applications of Ensemble Learning

▪ Disease detection

▪ Remote sensing

▪ Speech emotion recognition

▪ Fraud Detection

Unsupervised Learning
Supervised machine learning is a technique in which models are trained using labeled data, i.e., under the supervision of the training data.
Unsupervised learning is a type of machine learning in which models are trained on an unlabeled dataset, without any supervision, to find the hidden patterns in the given data.

Unsupervised learning is a learning method in which a machine learns without any supervision. The machine is trained on data that has not been labeled, classified, or categorized, and the algorithm must act on that data without guidance. The goal of unsupervised learning is to restructure the input data into new features or into groups of objects with similar patterns.
In other words, unsupervised learning is a machine learning technique in which models are not supervised using a training dataset.
Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, it has input data but no corresponding output data.
The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.

Working of Unsupervised Learning

Here, the input data is unlabeled, which means it is not categorized and the corresponding outputs are not given. This unlabeled input data is fed to the machine learning model in order to train it. The model first interprets the raw data to find hidden patterns and then applies a suitable algorithm such as k-means clustering, hierarchical clustering, etc. The algorithm divides the data objects into groups according to the similarities and differences between the objects.

Types of Unsupervised Learning Algorithm:

o Clustering: Clustering is an unsupervised method of grouping objects into clusters such that objects with the most similarities remain in the same group and have few or no similarities with objects in other groups.
o Association: An association rule is an unsupervised learning method that finds relationships between variables in a large database. It determines the sets of items that occur together in the dataset.

o A typical example of an association rule is Market Basket Analysis, i.e., people who buy item X (say, bread) also tend to purchase item Y (butter/jam).
List of Unsupervised learning algorithms:

o Clustering:

▪ K-means clustering

▪ KNN (k-nearest neighbors) – an instance-based method covered in this unit (note that KNN itself is a supervised algorithm)

▪ Gaussian Mixture Model

▪ Expectation Maximization

▪ Hierarchical clustering

o Association rule Mining:

▪ Apriori algorithm

▪ FP growth Algorithm

▪ ECLAT Algorithm

o Dimensionality Reduction:

▪ Principal Component Analysis

▪ Independent Component Analysis

Applications of Unsupervised Learning:

1. Market Basket Analysis

2. Medical Diagnosis

3. Marketing

4. Insurance

Advantages of Unsupervised Learning


o Unsupervised learning is used for more complex tasks compared to supervised learning, because it works with unlabeled input data.
o It is preferable because it is easier to obtain unlabeled data than labeled data.
Disadvantages of Unsupervised Learning
o It is more difficult than supervised learning as it does not have corresponding output.

o The result of the unsupervised learning algorithm might be less accurate as input data is not labeled,
and algorithms do not know the exact output in advance.
Clustering

Clustering or cluster analysis is a machine learning technique that groups an unlabelled dataset. It can be defined as "a way of grouping the data points into different clusters, each consisting of similar data points; the objects with possible similarities remain in a group that has few or no similarities with another group."

Clustering is similar to the classification algorithm, but the only difference is the type of dataset used. In
classification, the labeled data set is used, whereas in clustering, unlabelled dataset is used.

K-Means Clustering Algorithm



K-Means Clustering is an unsupervised learning algorithm which groups the unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process; e.g., if K=2 there will be two clusters, for K=3 there will be three clusters, and so on.

It is an unsupervised learning algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group of points with similar properties.

It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of the algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids. The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process until it finds the best clusters.

The k-means clustering algorithm mainly performs two tasks:

o Determines the best value for K center points or centroids by an iterative process.

o Assigns each data point to its closest k-center. The data points that are nearest to a particular k-center form a cluster.

The below diagram explains the working of the K-means Clustering Algorithm:

Working of K-Means Algorithm

Step-1: Select the number K to decide the number of clusters.


Step-2: Select K random points or centroids.
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster (the mean of the points assigned to it).

Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid of its cluster.
Step-6: If any reassignment occurs, then go to step-4; else go to FINISH.
Step-7: The model is ready. (A small scikit-learn sketch of these steps is given below.)
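A minimal sketch of K-means with scikit-learn, assuming a small synthetic two-variable dataset from make_blobs purely for illustration; fit() internally repeats the assign-to-nearest-centroid and recompute-centroid steps listed above until the assignments stop changing.

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Two features (like M1 and M2 in the walkthrough below) and two groups.
X, _ = make_blobs(n_samples=200, centers=2, n_features=2, random_state=4)

# n_clusters = K; each iteration assigns points to the nearest centroid
# and then moves each centroid to the mean of its assigned points.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=4)
labels = kmeans.fit_predict(X)
print("Cluster centroids:\n", kmeans.cluster_centers_)
print("First 10 cluster labels:", labels[:10])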

Consider a visual example with two variables, M1 and M2. The x-y scatter plot of these two variables is given below:

o Take the number k of clusters, i.e., K=2, so that the dataset is grouped into two different clusters.

o Choose k random points or centroids to form the clusters. These points can either be points from the dataset or any other points. Here, we select two points as the k centroids, which are not part of the dataset. Consider the image below:

o Now we will assign each data point of the scatter plot to its closest K-point or centroid. We compute this by calculating the distance between each point and the two centroids, and we draw a median line (perpendicular bisector) between the two centroids. Consider the image below:

From the above image, it is clear that the points on the left side of the line are nearer to the K1 (blue) centroid, and the points to the right of the line are closer to the yellow centroid. Let's colour them blue and yellow for clear visualization.

o As we need to find the closest clusters, we repeat the process by choosing new centroids. To choose the new centroids, we compute the center of gravity (mean) of the points in each cluster and place the new centroids there, as shown below:

o Next, we reassign each data point to the nearest new centroid. For this, we repeat the same process of finding a median line. The new median line will be like the image below:

From the above image, we can see that one yellow point is now on the left side of the line and two blue points are to the right of the line, so these three points will be assigned to the new centroids.

As reassignment has taken place, we again go to step-4, which is finding new centroids or K-points. We repeat the process by finding the center of gravity of the points in each cluster, so the new centroids will be as shown in the image below:

o As we have the new centroids, we again draw the median line and reassign the data points. The resulting image is:

o We can see in the above image that there are no data points on the wrong side of the line, which means the assignments no longer change and our model is formed. Consider the image below:

As our model is ready, we can now remove the assumed centroids, and the two final clusters will be as shown in the image below:

K-Nearest Neighbor (KNN) Algorithm

o K-Nearest Neighbour is one of the simplest machine learning algorithms, based on the supervised learning technique. The K-NN algorithm can be used for regression as well as classification, but it is mostly used for classification problems.
o K-NN is a non-parametric algorithm; it does not make any assumption about the underlying data.
o It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, performs an action on the dataset.
o K-NN assumes similarity between the new data and the available cases and puts the new case into the category that is most similar to the available categories.
o K-NN algorithm stores all the available data and classifies a new data point based on the similarity.

Example: Suppose there are two categories, Category A and Category B, and a new data point x1; we need to decide in which of these categories this point lies. To solve this type of problem, the K-NN algorithm is used. With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the diagram below:

Suppose we have an image that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. The KNN algorithm can be used for this identification, since it works on a similarity measure. The KNN model will find the features of the new image that are most similar to those of cat and dog images and, based on the most similar features, place the image in either the cat or the dog category.

Working of K-NN Algorithm


The K-NN working can be explained on the basis of the below algorithm:
o Step-1: Select the number K of the neighbors
o Step-2: Calculate the Euclidean distance of K number of neighbors
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these k neighbors, count the number of the data points in each category.
o Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
o Step-6: Our model is ready. (A small scikit-learn sketch of these steps is given below.)
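A minimal sketch of K-NN classification with scikit-learn, assuming the Iris dataset and k = 5 purely for illustration; the model simply stores the training data (lazy learning) and classifies a new point by a majority vote among its 5 nearest neighbours under the Euclidean distance.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=5)

# k = 5 neighbours; the default Minkowski metric with p = 2 is the
# Euclidean distance used in the steps above.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)  # lazy learner: this mainly stores the data
print("KNN accuracy:", knn.score(X_test, y_test))
print("Predicted class of the first test point:", knn.predict(X_test[:1]))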
Suppose there is a new data point and we need to put it in the required category. Consider the image below:

o Firstly, we choose the number of neighbors; here we take k = 5.


o Next, we calculate the Euclidean distance between the data points. The Euclidean distance between two points (x1, y1) and (x2, y2) is d = √((x2 − x1)² + (y2 − y1)²).

o By calculating the Euclidean distances we find the nearest neighbors: three nearest neighbors in category A and two nearest neighbors in category B. Consider the image below:

o Since 3 of the 5 nearest neighbors are from category A, the new data point must belong to category A.

Advantages of KNN Algorithm:


o It is simple to implement.
o It is robust to noisy training data.
o It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:


o The value of K always needs to be determined, which may sometimes be complex.

o The computation cost is high because the distance between the new data point and all the training samples must be calculated.

