
What is Machine Learning?

Features of Machine Learning:

1) Machine learning uses data to detect various patterns in a given dataset

2) It can learn from past data and improve automatically

3) It is a data driven technology

4) Machine learning is quite similar to data mining, as both deal with huge amounts of data

Classification of Machine Learning:


Supervised Learning:

In supervised learning, you train the machine using data that is well "labelled," meaning the data is already tagged with the correct answer. A supervised learning algorithm learns from labelled training data and helps you predict outcomes for unforeseen data.

Types of Supervised Learning – 1) Classification:

• In machine learning, classification refers to a predictive modelling problem where a class label is predicted for a given example of input data.

• Examples of classification problems include:

• Given an example, classify whether it is spam or not.

• Prediction of disease

• Win-loss prediction of games

• Prediction of natural calamities such as earthquakes and floods

Types of Supervised Learning – 2) Regression:

The goal of a regression model is to build a mathematical equation that defines y as a function of the x variables. Examples:

• Demand forecasting in retail

• Sales prediction for managers

• Price prediction in real estate

• Weather forecast

• Skill demand forecast in job market


Unsupervised Learning:

Unsupervised learning is a machine learning technique where you do not need to supervise the model. Instead, you allow the model to work on its own to discover information. It mainly deals with unlabelled data.

In other words, unsupervised learning is a type of machine learning in which models are trained on an unlabelled dataset and allowed to act on that data without any supervision.

Why use Unsupervised Learning?

• Unsupervised learning is helpful for finding useful insights from the data.

• Unsupervised learning is much like how a human learns to think from their own experience, which makes it closer to real AI.

• Unsupervised learning works on unlabelled and uncategorized data, which makes unsupervised learning more important.

• In the real world, we do not always have input data with corresponding output, so to solve such cases we need unsupervised learning.

Types of Unsupervised Learning – Clustering:

Clustering is an unsupervised machine learning task. It involves automatically discovering natural groupings in data. Unlike supervised learning (like predictive modelling), clustering algorithms only interpret the input data and find natural groups or clusters in feature space.

Unsupervised Learning algorithms:

o K-means clustering

o KNN (k-nearest neighbours)

o Hierarchical clustering

o Anomaly detection

o Neural Networks

o Principal Component Analysis

o Independent Component Analysis

o Apriori algorithm

o Singular value decomposition

Advantages of Unsupervised Learning:

o Unsupervised learning is used for more complex tasks as compared to supervised learning
because, in unsupervised learning, we don't have labelled input data.

o Unsupervised learning is preferable, as it is easier to get unlabelled data than labelled data.

Disadvantages of Unsupervised Learning

o Unsupervised learning is intrinsically more difficult than supervised learning as it does not
have corresponding output.

o The result of the unsupervised learning algorithm might be less accurate as input data is not
labelled, and algorithms do not know the exact output in advance.

Regression Analysis:

Regression analysis is used in statistics to find trends in data.

Some examples of regression are:

Prediction of rain using temperature and other factors

Determining Market trends

Prediction of road accidents due to rash driving

Terminologies Related to the Regression Analysis:

Dependent Variable

Independent Variable

Outliers

Multicollinearity

Underfitting and Overfitting

Why do we use Regression Analysis?

• It is used to find trends in data.

• It helps to predict real/continuous values.

• By performing regression, we can confidently determine the most important factor, the least important factor, and how the factors affect each other.

Linear Regression:

Some popular applications of linear regression are:

• Analysing trends and sales estimates

• Salary forecasting

• Real estate prediction

• Arriving at ETAs in traffic


Finding the best fit line:

When working with linear regression, our main goal is to find the best-fit line, which means the error between the predicted values and the actual values should be minimized. The best-fit line is the one with the least error.

Residuals: The distance between the actual value yᵢ and the corresponding predicted value is called the residual.

Gradient Descent: A regression model uses gradient descent to update the coefficients of the line by
reducing the cost function.

Model Performance: The goodness of fit determines how well the regression line fits the set of observations. The process of finding the best model out of various models is called optimization. It can be achieved by the method below:

R-squared method:

o R-squared is a statistical method that determines the goodness of fit.

o It measures the strength of the relationship between the dependent and independent variables on a scale of 0-100%.

o A high value of R-squared indicates a small difference between the predicted values and the actual values, and hence represents a good model.

o It is also called the coefficient of determination, or the coefficient of multiple determination for multiple regression.

o It can be calculated from the formula below:
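Assuming the standard definition, R-squared is

R^2 = 1 - \frac{SS_{res}}{SS_{tot}}

where SS_res is the sum of squared residuals and SS_tot is the total sum of squares of the observations around their mean.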

Regression Analysis (continued):
Simple Linear Regression:

In simple linear regression, we predict scores on one variable from the scores on a second variable.

Algorithm for Simple Linear Regression (SLR), with a code sketch after the steps:

1. Import the libraries

2. Import the dataset

3. Splitting the data set into training set and testing set

4. Fitting Simple Linear Regression to the Training set

5. Predicting the Test set results

6. Visualising the Training set results

7. Visualising the Test set results
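The steps above can be sketched in Python with scikit-learn. This is a minimal illustration rather than the source's own code; the file name salary_data.csv and the columns YearsExperience and Salary are hypothetical placeholders.

# A minimal SLR sketch; file and column names are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Step 2: import the dataset
data = pd.read_csv("salary_data.csv")      # hypothetical file
X = data[["YearsExperience"]].values       # hypothetical feature column
y = data["Salary"].values                  # hypothetical target column

# Step 3: split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Step 4: fit Simple Linear Regression to the training set
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Step 5: predict the test set results
y_pred = regressor.predict(X_test)

# Steps 6-7: visualise the training set (repeat with X_test/y_test)
plt.scatter(X_train, y_train, color="red")
plt.plot(X_train, regressor.predict(X_train), color="blue")
plt.title("Training set: best-fit line")
plt.show()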


Multiple Linear Regression:

• Multiple Linear Regression is one of the important regression algorithms. It models the linear relationship between a single dependent continuous variable and more than one independent variable. Example: prediction of CO2 emission based on engine size and the number of cylinders in a car.

Assumptions for Multiple Linear Regression:

o A linear relationship should exist between the Target and predictor variables.

o The regression residuals must be normally distributed.

o MLR assumes little or no multicollinearity (correlation between the independent variables) in the data.

How is Multiple Linear Regression (MLR) useful?

1. To identify the strength of the effect that the independent variables have on a dependent variable

2. To forecast effects or impacts of changes (GPA change w.r.t. IQ)

3. To guess the precise values / trends (price of gold after 6 months from now)

Significance of 'Backward Elimination' and 'p-value':

• The p-value informs us about the significance of our results.

• The p-value tells us how likely it is to get a result like this if the null hypothesis is true.

• The p-value is a number between 0 and 1.

• A low p-value suggests that your sample provides enough evidence that you can reject the null hypothesis for the entire population.

• The smaller the p-value, the higher the significance.


Note: A null hypothesis is a type of hypothesis used in statistics that proposes that no
statistical significance exists in a set of given observations. The null hypothesis attempts to
show that no variation exists between variables or that a single variable is no different than
its mean.

Steps of Backward Elimination:

Step-1: Firstly, we need to select a significance level to stay in the model. (SL=0.05)

Step-2: Fit the complete model with all possible predictors/independent variables.

Step-3: Choose the predictor with the highest p-value:

• If p-value > SL, go to Step 4.

• Else finish; our model is ready.

Step-4: Remove that predictor.

Step-5: Rebuild and fit the model with the remaining variables, then return to Step 3. A code sketch of this loop follows.
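A minimal sketch of this loop, assuming the statsmodels library and that X (feature matrix) and y (target) are NumPy arrays; the notes do not prescribe a specific implementation.

# Backward elimination via repeated OLS fits (statsmodels).
import numpy as np
import statsmodels.api as sm

def backward_elimination(X, y, sl=0.05):
    X = sm.add_constant(X)            # add the intercept column OLS expects
    while True:
        model = sm.OLS(y, X).fit()    # Step 2: fit with all remaining predictors
        max_p = model.pvalues.max()   # Step 3: highest p-value
        if max_p <= sl:               # every predictor significant: finish
            return model
        worst = int(model.pvalues.argmax())
        X = np.delete(X, worst, axis=1)   # Steps 4-5: drop predictor, refit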

Main Problems in Regression Analysis:


1) Multicollinearity

2) Heteroskedasticity

Improving the Accuracy of the Linear Regression Model:


High bias = low accuracy (predictions are not close to the real values)

High variance = low precision (predicted values are scattered)

Low bias = high accuracy (predictions are close to the real values)

Low variance = high precision (predicted values are close to each other)

Need for Backward Elimination: to obtain an optimal Multiple Linear Regression model.


Algorithm for Multiple Linear Regression (MLR), with a code sketch after the steps:

1. Import the libraries

2. Import the dataset

3. Encoding the categorical data

4. Splitting the dataset into the Training set and Test set

5. Fitting Multiple Linear Regression to the Training set

6. Predicting the Test set results

7. Building the optimal model using backward elimination

8. End
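A minimal sketch of steps 2-6, assuming a hypothetical CSV (startups.csv) with a categorical column State and a numeric target Profit; backward elimination can then be applied as sketched earlier.

# MLR with categorical encoding (scikit-learn); names are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

data = pd.read_csv("startups.csv")
X = data.drop(columns=["Profit"])
y = data["Profit"]

# Step 3: encode the categorical column (drop one dummy to avoid
# the dummy-variable trap)
ct = ColumnTransformer(
    [("state", OneHotEncoder(drop="first"), ["State"])],
    remainder="passthrough")
X = ct.fit_transform(X)

# Steps 4-6: split, fit, predict
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
regressor = LinearRegression().fit(X_train, y_train)
y_pred = regressor.predict(X_test)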
What is Model Selection?

Model selection is the process of selecting one final machine learning model from among a
collection of candidate machine learning models for a training dataset.

• Model selection is a process that can be applied both across different types of models (e.g. logistic regression, SVM, KNN, etc.) and across models of the same type configured with different hyperparameters.

Model Selection vs Model Assessment:

• The process of evaluating a model's performance is known as model assessment, whereas the process of selecting the proper level of flexibility for a model is known as model selection.

Polynomial Linear Regression (PLR):

1. The relationship between the independent variable x and the dependent variable y is modelled as an nth-degree polynomial in x.

2. It is used to describe nonlinear phenomena.

3. PLR is considered a special case of multiple linear regression (MLR).

Algorithm for Polynomial Linear Regression (PLR), with a code sketch after the steps:

1. Import the libraries

2. Import the dataset

3. Fitting the Linear Regression to the dataset

4. Fitting the Polynomial Regression to the dataset

5. Visualizing the Linear Regression results

6. Visualizing the Polynomial Regression results

7. End
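A minimal sketch of steps 3-6 on toy data, assuming scikit-learn; the degree-2 polynomial is an illustrative choice.

# Polynomial vs linear regression on toy data (y = x squared).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(1, 11).reshape(-1, 1)
y = (X ** 2).ravel().astype(float)

# Step 3: fit plain linear regression for comparison
lin_reg = LinearRegression().fit(X, y)

# Step 4: fit polynomial regression (polynomial terms added as features)
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
poly_reg = LinearRegression().fit(X_poly, y)

# Steps 5-6: visualise both fits
plt.scatter(X, y, color="red")
plt.plot(X, lin_reg.predict(X), label="linear")
plt.plot(X, poly_reg.predict(X_poly), label="polynomial")
plt.legend()
plt.show()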

Logistic Regression:

Logistic regression may be used to predict the risk of developing a given disease (e.g. diabetes or coronary heart disease), based on observed characteristics of the patient (age, sex, body mass index, results of various blood tests, etc.). Another example might be to predict whether an Indian voter will vote BJP, Trinamool Congress, or Congress, based on age, income, sex, state of residence, votes in previous elections, etc.

Logistic Function (Sigmoid Function):


The output of logistic regression must be between 0 and 1 and cannot go beyond this limit, so it forms an "S"-shaped curve. The S-form curve is called the sigmoid function or the logistic function. In logistic regression, we use the concept of a threshold value, which defines the probability of either 0 or 1.

• The sigmoid function is a mathematical function used to map the predicted values to probabilities.
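In its standard form, the sigmoid function is

f(x) = \frac{1}{1 + e^{-x}}

which maps any real-valued input x to an output strictly between 0 and 1.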
Assumptions for Logistic Regression:

o The dependent variable must be categorical in nature.

o The independent variables should not have multicollinearity.

Logistic Regression Equation:

The logistic regression equation is obtained from the linear regression equation by taking the logarithm of the odds; in its standard logit form it is

\log\left[\frac{y}{1-y}\right] = b_0 + b_1x_1 + b_2x_2 + \dots + b_nx_n

The above equation is the final equation for Logistic Regression.

Types of Logistic Regression:


On the basis of the categories, Logistic Regression can be classified into three types:

• Binomial: In binomial Logistic regression, there can be only two possible types of the
dependent variables, such as 0 or 1, Pass or Fail, etc.

• Multinomial: In multinomial Logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dogs", or "sheep".

• Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of
dependent variables, such as "low", "Medium", or "High".

Algorithm for Logistic Regression (LR), with a code sketch after the steps:


1. Import the libraries

2. Import the dataset

3. Splitting the dataset into the Training set and Test set

4. Perform Feature Scaling

5. Fitting Logistic Regression to the Training set

6. Predicting the Test set results

7. Making the Confusion Matrix

8. Visualising the Training set results

9. Visualising the Test set results

10. End
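A minimal sketch of these steps, using scikit-learn's built-in breast-cancer dataset so the example is self-contained; the notes do not specify a dataset.

# Logistic regression with scaling and a confusion matrix.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

X, y = load_breast_cancer(return_X_y=True)

# Step 3: split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Step 4: feature scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Steps 5-6: fit and predict
clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Step 7: confusion matrix
print(confusion_matrix(y_test, y_pred))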

Logistic Regression (LR) with some mathematical background and a case study:

The odds are defined as the probability that the event will occur divided by the probability that the
event will not occur. Unlike probability, the odds are not constrained to lie between 0 and 1, but can
take any value from zero to infinity.
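In symbols, with p the probability of the event,

\text{odds} = \frac{p}{1 - p}

For example, if p = 0.8, the odds are 0.8 / 0.2 = 4.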

Confusion Matrix:

A confusion matrix is a table that is often used to describe the performance of a classification model
(or "classifier") on a set of test data for which the true values are known.

• There are four possibilities with regard to the cricket match win/loss prediction; the standard metrics derived from these counts follow the list:

1) The model predicted a win and the team won - TP (True Positive)

2) The model predicted a win and the team lost - FP (False Positive)

3) The model predicted a loss and the team won - FN (False Negative)

4) The model predicted a loss and the team lost - TN (True Negative)
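From these four counts, the usual performance measures (standard definitions, added for completeness) are

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}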
Decision Tree Classification Algorithm:

• Decision Tree is a Supervised learning technique

• In a Decision tree, there are two types of nodes: the Decision Node and the Leaf Node

• The decisions or tests are performed on the basis of features of the given dataset.

• It gives a graphical representation of all the possible solutions to a problem/decision based on given conditions

• In order to build a tree, we use the CART algorithm

• A decision tree can contain categorical data (YES/NO) as well as numeric data.

Decision Tree Terminologies:


• Root Node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.

• Leaf Node: Leaf nodes are the final output nodes; the tree cannot be segregated further after reaching a leaf node.

• Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given conditions.

• Branch/Sub-Tree: A tree formed by splitting the tree.

• Pruning: Pruning is the process of removing unwanted branches from the tree.

• Parent/Child node: The root node of the tree is called the parent node, and the other nodes are called the child nodes.

How does the Decision Tree algorithm Work?

o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.

o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).

o Step-3: Divide S into subsets that contain the possible values for the best attribute.

o Step-4: Generate the decision tree node which contains the best attribute.

o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot further classify the nodes; the final nodes are then called leaf nodes.

Attribute Selection Measures:

o Information Gain

o Gini Index

1. Information Gain:

o Information gain is the measurement of the change in entropy (entropy is a metric to measure the impurity in a given attribute; it specifies randomness in data) after the segmentation of a dataset based on an attribute.

o It calculates how much information a feature provides us about a class.

o According to the value of information gain, we split the node and build the decision tree.

o A decision tree algorithm always tries to maximize the value of information gain, and the node/attribute having the highest information gain is split first. The standard formulas are given below.
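Assuming the usual definitions, entropy and information gain are computed as

\text{Entropy}(S) = -\sum_{i} p_i \log_2 p_i

\text{IG}(S, A) = \text{Entropy}(S) - \sum_{v \in \text{values}(A)} \frac{|S_v|}{|S|}\, \text{Entropy}(S_v)

where p_i is the proportion of examples in class i and S_v is the subset of S for which attribute A takes value v.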

2. Gini Index:

o The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.

o An attribute with a low Gini index should be preferred over one with a high Gini index.

o CART only creates binary splits, and it uses the Gini index to create those binary splits. The usual formula is shown below.
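Assuming the standard definition, the Gini index of a node is

\text{Gini} = 1 - \sum_{j} p_j^2

where p_j is the proportion of samples belonging to class j at that node.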


K-Nearest Neighbor (K-NN) Algorithm:

o K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.

o The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories.

o The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.

o The K-NN algorithm can be used for Regression as well as for Classification, but mostly it is used for Classification problems.

o It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, performs an action on the dataset.

o The KNN algorithm at the training phase just stores the dataset, and when it gets new data it classifies that data into the category that is most similar to the new data.

How does K-NN work?


The working of K-NN can be explained on the basis of the algorithm below:

o Step-1: Select the number K of neighbors

o Step-2: Calculate the Euclidean distance between the new data point and the available data points

o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.

o Step-4: Among these k neighbors, count the number of the data points in each category.

o Step-5: Assign the new data point to the category for which the number of neighbors is maximum.

o Step-6: The model is ready. A code sketch of these steps follows.
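A minimal sketch, assuming scikit-learn and its built-in iris dataset; k=5 is an illustrative choice.

# K-NN classification on the iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Steps 1-5: choose K, use Euclidean distance, vote among neighbours
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)        # lazy learner: fit just stores the data

print(knn.score(X_test, y_test))   # Step 6: the model is ready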

Advantages of KNN Algorithm:

o It is simple to implement.

o It is robust to noisy training data.

o It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:

o It always needs to determine the value of K, which may be complex at times.

o The computation cost is high, because the distance between the new data point and all the training samples must be calculated.

Applications of Naïve Bayes Classifier:

• Real-time Prediction: Being a fast learning algorithm, it can be used to make predictions in real time. It suits real-time prediction because the Naïve Bayes Classifier is an eager learner.

• Multi-class Classification: It can also be used for multi-class classification problems.

• It is used for Credit Scoring.

• It is used in medical data classification.

Working of Naïve Bayes' Classifier:


The working of the Naïve Bayes' Classifier can be understood with the help of the following example:

Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we need to follow the steps below:

1. Convert the given dataset into frequency tables.

2. Generate a likelihood table by finding the probabilities of the given features.

3. Now, use Bayes' theorem to calculate the posterior probability.

Problem: If the weather is sunny, should the player play or not? Solution: Apply the three steps above to the weather dataset, using Bayes' theorem for step 3.
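For reference, Bayes' theorem states

P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

where P(A|B) is the posterior probability, P(B|A) the likelihood, P(A) the prior probability, and P(B) the evidence.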

Clustering in Machine Learning:
• Clustering or cluster analysis is a machine learning technique

• It is an unsupervised learning method

• The clustering technique is commonly used for statistical data analysis

The clustering technique can be widely used in various tasks. Some of the most common uses of this technique are:

o Market Segmentation

o Statistical data analysis

o Social network analysis

o Image segmentation

o Anomaly detection, etc.

Types of Clustering Methods:


1. Partitioning Clustering

2. Density-Based Clustering

3. Distribution Model-Based Clustering

4. Hierarchical Clustering

5. Fuzzy Clustering

Clustering Algorithms:
1. K-Means algorithm: The k-means algorithm is one of the most popular clustering algorithms. It classifies the dataset by dividing the samples into different clusters of equal variances. The number of clusters must be specified in this algorithm. It is fast, with fewer computations required, and has linear complexity of O(n). (A code sketch appears after this list.)

2. Mean-shift algorithm: The mean-shift algorithm tries to find dense areas in the smooth density of data points. It is an example of a centroid-based model that works by updating the candidates for centroids to be the center of the points within a given region.

3. DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of Applications with Noise. It is an example of a density-based model, similar to mean-shift but with some remarkable advantages. In this algorithm, the areas of high density are separated by areas of low density. Because of this, clusters can be found in any arbitrary shape.

4. Expectation-Maximization Clustering using GMM: This algorithm can be used as an alternative to the k-means algorithm, or for cases where k-means may fail. In GMM, it is assumed that the data points are Gaussian distributed.

5. Agglomerative Hierarchical algorithm: The agglomerative hierarchical algorithm performs bottom-up hierarchical clustering. In this, each data point is treated as a single cluster at the outset, and clusters are then successively merged. The cluster hierarchy can be represented as a tree structure.

6. Affinity Propagation: It is different from other clustering algorithms in that it does not require the number of clusters to be specified. In this, pairs of data points exchange messages until convergence. Its O(N²T) time complexity is the main drawback of this algorithm.
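A minimal k-means sketch on synthetic data, assuming scikit-learn; choosing 3 clusters matches how the toy blobs are generated.

# K-means clustering on synthetic blobs.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# The number of clusters must be specified in advance
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)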

Applications of Clustering:
o In Identification of Cancer Cells

o In Search Engines

o Customer Segmentation

o In Biology: It is used in the biology stream to classify different species of plants and animals using image recognition techniques.

o In Land Use: The clustering technique is used in identifying areas of similar land use in the GIS database. This is very useful for determining the purpose for which a particular piece of land is most suitable.

o Fraud Detection: Anomaly or fraud detection in the banking sector by identifying the patterns of loan defaulters.
