
BDA PROJECT
Presented by: GROUP 3

Surabhi
Shubham
Shinjan
Saket
Ujjwal
Lohith
WHAT IS MACHINE LEARNING?

Machine learning is an application of artificial intelligence (AI) that
provides systems the ability to automatically learn and improve from
experience without being explicitly programmed. Machine learning focuses
on the development of computer programs that can access data and use it
to learn for themselves.
TYPES OF MACHINE LEARNING

Supervised Learning
• A supervised learning algorithm learns from labeled training data and
  helps you predict outcomes for unforeseen data.
• E.g.: predicting the time you will reach home depends on weather
  conditions, time of day, route chosen, etc.

Unsupervised Learning
• Unsupervised learning is a machine learning technique where you do not
  need to supervise the model. Instead, you allow the model to work on
  its own to discover information. It mainly deals with unlabeled data.

Reinforcement Learning
• Reinforcement learning is a type of learning based on interaction with
  the environment.
MODELS IN MACHINE LEARNING

• Support Vector Machine
• Random Forest
• Gaussian Naïve Bayes
• Multinomial Naïve Bayes
• Logistic Regression
• Decision Tree
WHAT IS THE PROJECT ABOUT?
• Classify the dataset using various models
• Compare the classification scores of the various models
• Extract the model that gives the best score
SUPPORT VECTOR MACHINE
Key concept: training data enters the optimization problem in the form of
dot products of pairs of points.

• Support vectors: the weights associated with data points are zero except
  for those points nearest the separator (i.e., the support vectors).

• Kernel function K(xi, xj): a function that can be applied to pairs of
  points to evaluate dot products in the corresponding (higher-dimensional)
  feature space F, without having to compute F(x) directly first.

Efficient training and complex functions!
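A minimal sketch of these ideas, assuming scikit-learn is available; the RBF kernel here plays the role of K(xi, xj), and only the support vectors carry non-zero weight in the fitted decision function.

```python
# Kernel SVM sketch on the iris data (assumes scikit-learn).
# The RBF kernel K(xi, xj) = exp(-gamma * ||xi - xj||^2) evaluates dot
# products in the implicit feature space without computing F(x) directly.
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="rbf", C=1).fit(X, y)

print(clf.n_support_)    # number of support vectors per class
print(clf.score(X, y))   # training accuracy
```

Only the points stored in `clf.support_vectors_` enter the decision function; all other training points get zero weight.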


SUPPORT VECTOR MACHINES: DECISION BOUNDARY
BAGGING

The basic idea:
• Randomly draw datasets with replacement from the training data, each
  sample the same size as the original training set.
RANDOM FOREST CLASSIFIER

• The random forest classifier is an extension of bagging that uses
  de-correlated trees: each split considers only a random subset of the
  features.
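A short sketch contrasting the two ideas, assuming scikit-learn; the estimator counts are illustrative, not tuned.

```python
# Bagging vs. random forest sketch (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Bagging: bootstrap samples of the training set, one full tree per sample.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                        random_state=0).fit(X, y)

# Random forest: bagging plus a random feature subset at each split,
# which de-correlates the individual trees.
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

print(bag.score(X, y), rf.score(X, y))
```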
NAÏVE BAYES CLASSIFIERS

• Core assumption: every pair of features being classified is independent
  of each other.
• Predicts the likelihood that an event will occur given the evidence
  present in your data.
• Assumes that the presence (or absence) of a particular feature of a
  class is unrelated to the presence (or absence) of any other feature.
• Types of Naïve Bayes models: Multinomial and Gaussian.


MULTINOMIAL

• Used when features describe discrete frequency counts (e.g. word counts
  in a document).
• Where simple Naive Bayes would model a document only as the presence or
  absence of particular words, Multinomial Naive Bayes explicitly models
  the word counts and adjusts the underlying calculations accordingly.
GAUSSIAN

• Used for making predictions from normally distributed features.
• Used when the features have continuous values; all features are assumed
  to follow a Gaussian distribution.
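Both variants can be sketched with scikit-learn; note that applying the multinomial model to the continuous iris measurements is purely illustrative (it treats them as if they were counts).

```python
# Gaussian vs. multinomial Naive Bayes sketch (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB, MultinomialNB

X, y = load_iris(return_X_y=True)

# GaussianNB: continuous features, assumed normally distributed per class.
gnb = GaussianNB().fit(X, y)

# MultinomialNB: intended for non-negative counts/frequencies; the iris
# measurements are non-negative, so it runs here for illustration.
mnb = MultinomialNB().fit(X, y)

print(gnb.score(X, y), mnb.score(X, y))
```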
LOGISTIC REGRESSION

• A statistical approach to classification of categorical outcome
  variables.
• Similar to multiple linear regression, but used when the outcome is
  categorical rather than continuous; extensions handle outcomes with
  more than two values.
• Uses data to produce a probability that a given case will fall into one
  of two classes (e.g., flights that leave on time/delayed, companies that
  will/will not default on bonds, employees who will/will not be promoted).
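A two-class sketch assuming scikit-learn; the "on time vs. delayed" data below is simulated for illustration, not a real flight dataset.

```python
# Binary logistic regression sketch (assumes scikit-learn, numpy).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
delay_minutes = rng.normal(10, 5, size=(200, 1))   # hypothetical feature
on_time = (delay_minutes[:, 0] < 12).astype(int)   # hypothetical label

clf = LogisticRegression().fit(delay_minutes, on_time)

# predict_proba returns P[each class] for every case; rows sum to 1.
proba = clf.predict_proba(delay_minutes[:2])
print(proba)
```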

MULTIPLE LINEAR REGRESSION

• One of the most widely used tools from classical statistics.
• Used widely in the natural and social sciences, more often for
  explanatory than predictive modeling.
• Used to determine whether specific variables influence the outcome
  variable.
• Answers questions like:
  • Do the data support the claim that women are paid less than men in
    comparable jobs?
  • Is there evidence that price discounts and rebates lead to higher
    long-term sales?
  • Do data support the idea that firms that outsource manufacturing
    overseas have higher profits?
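A minimal fitting sketch, assuming scikit-learn; the data is synthetic, generated so the true coefficients are known and recoverable.

```python
# Multiple linear regression sketch (assumes scikit-learn, numpy).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))   # two hypothetical explanatory variables
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)
print(model.coef_)        # fitted coefficients, close to [3.0, 1.5]
print(model.score(X, y))  # R^2, close to 1 for this low-noise data
```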

LINEAR REGRESSION – MACHINE LEARNING
MULTIPLE LINEAR REGRESSION – MACHINE LEARNING
THE LOGISTIC REGRESSION MODEL

Let p denote P[y = 1] = P[Success]. This quantity will increase with the
value of x.

The ratio p / (1 - p) is called the odds ratio. It also increases with
the value of x, ranging from zero to infinity.

The quantity ln( p / (1 - p) ) is called the log odds ratio.
EXAMPLE: ODDS RATIO, LOG ODDS RATIO

Suppose a die is rolled. Success = "roll a six", so p = 1/6.

The odds ratio = p / (1 - p) = (1/6) / (5/6) = 1/5

The log odds ratio = ln( p / (1 - p) ) = ln(1/5) = ln(0.2) ≈ -1.6094
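The arithmetic above can be checked in a few lines of standard-library Python:

```python
# Odds ratio and log odds ratio for "roll a six" on a fair die.
import math

p = 1 / 6                   # P[roll a six]
odds = p / (1 - p)          # (1/6) / (5/6) = 1/5
log_odds = math.log(odds)   # ln(0.2), about -1.6094

print(odds, log_odds)
```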
THE LOGISTIC REGRESSION MODEL

Assumes the log odds ratio is linearly related to x, i.e.:

    ln( p / (1 - p) ) = β0 + β1 x

In terms of the odds ratio:

    p / (1 - p) = e^(β0 + β1 x)
THE LOGISTIC REGRESSION MODEL

Solving for p in terms of x:

    p / (1 - p) = e^(β0 + β1 x)

    p = (1 - p) e^(β0 + β1 x)

    p + p e^(β0 + β1 x) = e^(β0 + β1 x)

or  p = e^(β0 + β1 x) / (1 + e^(β0 + β1 x))
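The closed form above is the logistic (sigmoid) function applied to the linear predictor β0 + β1 x. A quick check with hypothetical coefficients, using only the standard library:

```python
# p = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)) is sigmoid(b0 + b1*x).
import math

def sigmoid(z):
    # Algebraically equivalent to e^z / (1 + e^z).
    return 1.0 / (1.0 + math.exp(-z))

b0, b1 = -2.0, 0.5    # hypothetical coefficients, for illustration
x = 4.0
z = b0 + b1 * x       # log odds ratio = 0 at this x
p = sigmoid(z)
print(p)              # 0.5: log odds of zero means even odds
```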
LOGISTIC REGRESSION – MACHINE LEARNING
LOGISTIC FUNCTION – SIGMOID FUNCTION
DECISION TREE
DECISION TREE INDUCTION: DECISION BOUNDARY
THE IRIS FLOWER DATASET

[Images: sepal/petal anatomy of Iris setosa, Iris versicolor, and
Iris virginica]
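The dataset ships with scikit-learn, so it can be loaded directly: 150 samples, four sepal/petal measurements each, across the three species shown above.

```python
# Loading the iris flower dataset (assumes scikit-learn).
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.feature_names)   # sepal/petal length and width (cm)
print(iris.target_names)    # the three species
print(iris.data.shape)      # 150 samples x 4 features
```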
OUTPUT

    model                    best_score  best_params
  0 svm                      0.980000    {'C': 1, 'kernel': 'rbf'}
  1 random_forest            0.966667    {'n_estimators': 10}
  2 logistic_regression      0.966667    {'C': 5}
  3 naive_bayes_gaussian     0.953333    {}
  4 naive_bayes_multinomial  0.953333    {}
  5 decision_tree            0.960000    {'criterion': 'gini'}
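A table of this shape can be produced with GridSearchCV; the sketch below (assuming scikit-learn and pandas) covers two of the six models, with parameter grids chosen to mirror the reported best parameters.

```python
# Model-comparison sketch via grid search (assumes scikit-learn, pandas).
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
models = {
    "svm": (SVC(), {"C": [1, 5], "kernel": ["rbf", "linear"]}),
    "random_forest": (RandomForestClassifier(random_state=0),
                      {"n_estimators": [5, 10]}),
}

rows = []
for name, (est, grid) in models.items():
    # 5-fold cross-validated search over the parameter grid.
    search = GridSearchCV(est, grid, cv=5).fit(X, y)
    rows.append({"model": name,
                 "best_score": search.best_score_,
                 "best_params": search.best_params_})

print(pd.DataFrame(rows))
```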


Use of Radial Basis Function in the Kernel of SVM
DECISION TREE: GINI INDEX
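The Gini index measures node impurity: for class proportions p_k it is G = 1 - Σ p_k², which is 0 for a pure node and grows as the classes mix. A standard-library sketch:

```python
# Gini impurity of a node, given per-class sample counts.
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([50, 0]))    # 0.0  (pure node)
print(gini([25, 25]))   # 0.5  (maximally mixed, two classes)
```

A decision tree grown with criterion='gini' (as in the output above) picks, at each node, the split that most reduces this impurity.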
