AIB Class Assignment PGP 25 160

The document describes using a naive Bayes classifier on a dataset containing information about 32561 people to predict whether they earn more than $50,000 per year. It discusses importing necessary packages, reading in the CSV data, exploring missing values, cleaning and encoding the data, splitting it into training and test sets, training a Gaussian naive Bayes classifier model on the data, and evaluating the model's accuracy, true/false positives and negatives, and area under the ROC curve.

Uploaded by

Piyush Sonawane

Artificial Intelligence for Business

Naïve Bayes Classifier

Submitted by:

Paras Nath Munda

PGP/25/160

Naive Bayes Classifier - Predict if a person earns more than $50,000.

● The Naive Bayes Classifier method is used in Jupyter Notebook to estimate if a person
earns more than $50,000 per year.
● Imported the Pandas data analytics package, NumPy for numerics, and matplotlib for
visualization.
● pd.read_csv reads the data from the CSV file, and head() shows the first rows.
● There are 32561 instances and 15 attributes in the data set.

● Displays the number of rows and columns
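
The loading steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the three-row sample, the variable names `sample_csv` and `df`, and the choice of `header=None` are assumptions made so the example runs without the original file. On the full data set described above, `df.shape` would report 32561 rows and 15 columns.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for the real CSV file (no header row).
sample_csv = io.StringIO(
    "39, State-gov, 77516, Bachelors, <=50K\n"
    "50, Self-emp-not-inc, 83311, Bachelors, <=50K\n"
    "52, Private, 209642, HS-grad, >50K\n"
)

# header=None tells pandas there is no header row, so columns are labelled 0, 1, 2, ...
# skipinitialspace=True drops the blank that follows each comma.
df = pd.read_csv(sample_csv, header=None, skipinitialspace=True)

print(df.head())   # first rows of the data frame
print(df.shape)    # (number of rows, number of columns)
```

With a real file, the `io.StringIO` object would simply be replaced by the path to the CSV.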

● Since the attributes were labelled with numbers (0, 1, 2, …), they were renamed to
descriptive names such as “age”, “workclass”, etc.
● The code was run to find categorical variables fundamental to the classifier. It
identified 9 categorical variables and viewed the data frame for those variables.
● Checked for missing values by looking for nulls.
● Examined the frequency of values in the categorical variables, in integer and floating-point
numeric format, to identify the missing values.
● Since missing values in the data set were coded as ? rather than NaN, each ? was replaced with NaN so that
Python could identify the missing values in each categorical variable.
● Data was split into training and test. The size of test data set is 30% while that of training
data is 70%.
● After exploring the missing values, the data was further cleaned.
● Missing values were replaced with new values in order to
remove the null values from the set.
● Data was encoded in numerical format. Initially we had 14 columns but now we have
113 columns.
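
The cleaning steps above can be sketched end to end on a tiny example. Everything specific here is an assumption for illustration: the ten-row data frame, the four column names, filling missing values with the most frequent training value, and one-hot encoding with `pd.get_dummies` (the write-up does not say which replacement values or which encoder were used).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical miniature data set; columns start out numbered 0, 1, 2, 3.
df = pd.DataFrame([
    [39, "State-gov", "Bachelors", "<=50K"],
    [50, "?",         "Bachelors", "<=50K"],
    [38, "Private",   "HS-grad",   "<=50K"],
    [53, "Private",   "?",         ">50K"],
    [28, "Private",   "Bachelors", ">50K"],
    [37, "Private",   "Masters",   "<=50K"],
    [49, "State-gov", "HS-grad",   ">50K"],
    [52, "?",         "HS-grad",   ">50K"],
    [31, "Private",   "Masters",   ">50K"],
    [42, "Private",   "Bachelors", "<=50K"],
])

# Rename the numbered columns to descriptive names.
df.columns = ["age", "workclass", "education", "income"]

# Categorical variables are the non-numeric columns.
categorical = df.select_dtypes(exclude="number").columns.tolist()
print(categorical)

# Value frequencies reveal that missing entries are coded as "?".
print(df["workclass"].value_counts())

# Replace "?" with NaN so that pandas recognises the entries as missing.
df = df.replace("?", np.nan)
print(df.isnull().sum())

# Separate features from the target, then hold out 30% of the rows for testing.
X = df.drop(columns="income")
y = df["income"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fill missing categorical values with the most frequent value seen in training.
cat_features = [c for c in categorical if c != "income"]
modes = X_train[cat_features].mode().iloc[0]
X_train = X_train.fillna(modes)
X_test = X_test.fillna(modes)

# One-hot encode; give the test set exactly the same columns as the training set.
X_train = pd.get_dummies(X_train, columns=cat_features, dtype=int)
X_test = pd.get_dummies(X_test, columns=cat_features, dtype=int).reindex(
    columns=X_train.columns, fill_value=0
)
print(X_train.shape, X_test.shape)
```

Taking the fill values and the dummy columns from the training set only keeps information from the test rows out of the preprocessing. One-hot encoding is also why the column count grows: each category of each categorical variable becomes its own 0/1 column.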
● Data set is fed to the Naïve Bayes Classifier to train the model. The type utilized here
is the Gaussian classifier.

● Predicted results help in identifying the accuracy of the model.
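
A minimal sketch of the training and prediction steps, assuming scikit-learn's `GaussianNB`. The feature matrix is synthetic (two well-separated clusters generated with NumPy), because the encoded income data is not reproduced here; the names `X`, `y`, `gnb` and `accuracy` are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical, already-numeric feature matrix: two well-separated classes.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 3)),   # class 0
    rng.normal(3.0, 1.0, size=(100, 3)),   # class 1
])
y = np.array([0] * 100 + [1] * 100)

# 70% of the rows train the model, 30% are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Gaussian naive Bayes treats each feature as normally distributed within each class.
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict the class of each held-out row and compare with the true labels.
y_pred = gnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.4f}")
```

Accuracy is the share of test rows whose predicted class matches the true class, so it describes the fitted model rather than the data set itself.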

● The predictions of the Naïve Bayes Classifier are used to calculate the accuracy score and the total True Positives
(TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
● The same data could be visualised in the form of visual_confusion_matrix with seaborn
heatmap.
● The values of the performance parameters, that is, accuracy, classification error, precision,
and recall (or sensitivity), are also calculated using their formulas.
● The curve of True Positive Rate vs False Positive Rate (the ROC curve) generated from the classifier
shows an AUC greater than 0.7, which indicates that the model performs well.
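
The evaluation metrics listed above can be worked through on a small example. The labels and probabilities below are made up for illustration and are not results from the assignment; the sketch assumes scikit-learn's `confusion_matrix` and `roc_auc_score`.

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical true labels, predicted labels, and predicted probabilities of class 1.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_prob = [0.1, 0.2, 0.2, 0.3, 0.65, 0.7, 0.9, 0.8, 0.6, 0.4]

# For binary labels 0/1, ravel() returns the counts in the order TN, FP, FN, TP.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")

total = tp + tn + fp + fn
accuracy = (tp + tn) / total              # share of correct predictions
classification_error = (fp + fn) / total  # share of wrong predictions
precision = tp / (tp + fp)                # predicted positives that are truly positive
recall = tp / (tp + fn)                   # actual positives that were found (sensitivity)

# Area under the ROC curve (True Positive Rate vs False Positive Rate).
auc = roc_auc_score(y_true, y_prob)

print(f"accuracy={accuracy:.2f} error={classification_error:.2f}")
print(f"precision={precision:.2f} recall={recall:.2f} auc={auc:.4f}")
```

The `cm` array is what would typically be passed to a seaborn heatmap, for example `sns.heatmap(cm, annot=True, fmt="d")`, to draw the confusion matrix. An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of the two classes.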
