
GROUP ASSIGNMENT: MACHINE LEARNING

TOPIC: Predicting Income from Census Data Using Machine Learning Techniques

Group Members:
Simran Saha
Srinidhi Narsimhan
Kalai Anbumani
Indushree AnandRaj
Problem Statement:
A census dataset is provided. The task is to evaluate various machine learning methods, determine
which works best for the given data, and predict whether a person earns more than 50K.
The following points are to be considered.
1. How to tackle missing values? Should these be tackled at all?
2. Is there any need to normalize data?
3. What can be said about class imbalance?
4. Which machine learning algorithm works the best without tuning?
5. How to determine the tuning parameters?

Solution:

Exploratory Data Analysis (EDA)


&#xF0B7; The dependent variable of this dataset is “Income”. This is a classification problem since
there are two classes, “<=50K” and “>50K”.
 Dataset Split

&#xF0B7; Split the dataset into train (70%) and test (30%) data.
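As an illustrative sketch outside Azure ML, the same 70/30 split could be done with scikit-learn. The target column name "income" is an assumption, not taken from the actual dataset schema.

```python
# Hypothetical sketch of the 70/30 train/test split described above.
# Assumes the label column is named "income"; stratifying preserves the
# class ratio in both partitions.
import pandas as pd
from sklearn.model_selection import train_test_split

def split_dataset(df, target="income", test_size=0.30, seed=42):
    """Split a dataframe into stratified train (70%) and test (30%) parts."""
    train, test = train_test_split(
        df, test_size=test_size, random_state=seed, stratify=df[target]
    )
    return train, test
```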

Creating the model using Azure ML


PART 1
1. Clean Missing Data

&#xF0B7; We can see that the categorical variables below have a few missing values in the dataset.
o Workclass
o Occupation
o Native.country
&#xF0B7; Since the dataset is small, we cannot simply omit rows with missing data. The better
option is therefore to replace the missing values with the mode. We replace with the
mode because all the affected variables are categorical in nature.
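A pandas equivalent of the Clean Missing Data step could look like the sketch below; the column names match the variables listed above, but the exact dataframe layout is an assumption.

```python
# Hypothetical sketch: mode imputation for categorical columns,
# mirroring Azure ML's "Clean Missing Data" with a custom substitution.
import pandas as pd

def impute_with_mode(df, columns=("workclass", "occupation", "native.country")):
    """Replace missing values in categorical columns with the column mode."""
    df = df.copy()
    for col in columns:
        # mode() returns a Series; [0] picks the most frequent value
        df[col] = df[col].fillna(df[col].mode()[0])
    return df
```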
2. Visualizing imbalance in the Train dataset
&#xF0B7; The percentage of class imbalance in the train dataset has been visualized:
class 1 “<=50K” has 76% of the data
class 2 “>50K” has around 24% of the data in the train set
&#xF0B7; We see a risk from this modest imbalance in the data. However, we can proceed
with our validation, later apply SMOTE, and check whether the results
differ.
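The balance check, and a simple rebalancing step, can be sketched in pandas. Note this sketch uses plain random oversampling as a stand-in; SMOTE proper (which synthesizes new minority samples rather than duplicating rows) would come from a library such as imbalanced-learn.

```python
# Hypothetical sketch: measure class balance and oversample the minority
# class. Random duplication is a simpler stand-in for SMOTE.
import pandas as pd

def class_balance(df, target="income"):
    """Return the proportion of each class in the target column."""
    return df[target].value_counts(normalize=True)

def oversample_minority(df, target="income", seed=42):
    """Duplicate random minority-class rows until both classes are equal."""
    counts = df[target].value_counts()
    minority = counts.idxmin()
    extra = df[df[target] == minority].sample(
        counts.max() - counts.min(), replace=True, random_state=seed
    )
    return pd.concat([df, extra], ignore_index=True)
```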

3. Clip Values
&#xF0B7; Performed clipping at a 99th-percentile upper threshold for the variable below to
handle outliers. We can see an outlier: a sudden jump from 22000 to 99999,
which looks suspicious. Therefore, to avoid incorrect results in the model, we
clip the values.
o Capital.gain
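An equivalent of the Clip Values module in pandas might look like this sketch; the column name follows the variable listed above.

```python
# Hypothetical sketch: clip a numeric column at its 99th percentile,
# mirroring Azure ML's "Clip Values" with an upper-threshold percentile.
import pandas as pd

def clip_upper(df, column="capital.gain", quantile=0.99):
    """Cap a numeric column at its given upper quantile to tame outliers."""
    df = df.copy()
    upper = df[column].quantile(quantile)
    df[column] = df[column].clip(upper=upper)
    return df
```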
4. Feature Engineering
&#xF0B7; Instead of keeping two columns for capital gain and capital loss, we can take their
difference to get the net capital gain/loss, which reduces the number of
features in the dataset.
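This feature-engineering step is a one-liner in pandas; the new column name "net.capital" is an assumption for illustration.

```python
# Hypothetical sketch: replace capital.gain and capital.loss with one
# net column, reducing the feature count by one.
import pandas as pd

def add_net_capital(df):
    """Derive net capital gain/loss and drop the two source columns."""
    df = df.copy()
    df["net.capital"] = df["capital.gain"] - df["capital.loss"]
    return df.drop(columns=["capital.gain", "capital.loss"])
```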

5. Normalize Data
&#xF0B7; We need to scale all the numeric data in the dataset to get a more reliable model.
Hence, scaling is performed on the newly added net capital gain/loss column.
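Azure ML's Normalize Data module offers several methods; a min-max scaling sketch with scikit-learn is shown below as one plausible choice (the source does not state which method was used).

```python
# Hypothetical sketch: min-max scale numeric columns into [0, 1],
# one of the transformations offered by "Normalize Data".
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def scale_columns(df, columns):
    """Min-max scale the given numeric columns in place on a copy."""
    df = df.copy()
    df[columns] = MinMaxScaler().fit_transform(df[columns])
    return df
```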
PART 2
1. The normalized dataset from Part 1 is used as input to Part 2.
2. While selecting the columns from the dataset, we can eliminate the “education.num” variable,
as it is strongly correlated with the “education” variable. Eliminating this
variable should have little impact on the result, since it is just a numeric
recoding of “education”.
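The claim that "education.num" is just a recoding of "education" can be verified before dropping the column, as in this sketch:

```python
# Hypothetical sketch: confirm "education.num" maps one-to-one onto
# "education", then drop it as redundant.
import pandas as pd

def drop_redundant(df, keep="education", drop="education.num"):
    """Drop a column that is a one-to-one recoding of another column."""
    # Each education level should correspond to exactly one numeric code.
    codes_per_level = df.groupby(keep)[drop].nunique()
    if not (codes_per_level == 1).all():
        raise ValueError(f"{drop} is not a one-to-one recoding of {keep}")
    return df.drop(columns=[drop])
```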

3. A plain, untuned model is created using the normalized dataset.


Comparison between different Model Performances

a) Two-Class Support Vector Model

Train Dataset
&#xF0B7; The SVM model gives the ROC curve below, with an accuracy of 83.8%.
Test Dataset
&#xF0B7; The same model evaluated on the test dataset gives its ROC curve with an accuracy of
82.8%.
b) Two-Class Logistic Regression Model

Train Dataset

&#xF0B7; This model gives a better ROC curve, with an accuracy of 84.7%.

Test Dataset

&#xF0B7; The same model on the test dataset gives an accuracy of 84.1%.
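The two-model comparison above can be sketched with scikit-learn equivalents of the Azure ML modules. The synthetic data here is only a placeholder, so the accuracies will not match the 83-84% figures reported for the census data.

```python
# Hypothetical sketch: fit a linear SVM and a logistic regression on the
# same split and compare test accuracies, mirroring the comparison above.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def compare_models(X, y, seed=42):
    """Return test-set accuracy for an SVM and a logistic regression."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, random_state=seed, stratify=y
    )
    scores = {}
    for name, model in [
        ("svm", LinearSVC(random_state=seed)),
        ("logreg", LogisticRegression(max_iter=1000)),
    ]:
        model.fit(X_tr, y_tr)
        scores[name] = accuracy_score(y_te, model.predict(X_te))
    return scores
```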
4. Tune the Model

&#xF0B7; We need to tune the model to bring down the sum of squared errors (SSE).
<Yet to do>
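One common way to determine the tuning parameters (question 5 of the problem statement) is a cross-validated grid search, as in this sketch; the parameter grid and the choice of logistic regression are illustrative assumptions, since the tuning step is still to be done.

```python
# Hypothetical sketch: pick the regularization strength C by 5-fold
# cross-validated grid search, one way to answer "how to determine the
# tuning parameters".
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

def tune_logreg(X, y):
    """Grid-search C for logistic regression; return best params and score."""
    grid = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
        cv=5,
        scoring="accuracy",
    )
    grid.fit(X, y)
    return grid.best_params_, grid.best_score_
```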
