Census Income Project

Uploaded by Amit Phulwani

support@intellipaat.com

+91-7022374614

US: 1-800-216-8930 (Toll-Free)

In this project, you will work with the "Census Income" data set from the UCI
Machine Learning Repository, which contains income information for over 48,000
individuals taken from the 1994 US census.

For more details about this dataset, you can refer to the following
link: https://archive.ics.uci.edu/ml/datasets/census+income

Problem Statement:
In this project, you first need to preprocess the data and then develop an understanding
of the different features of the data by performing exploratory analysis and creating
visualizations. Then, with sufficient knowledge of the attributes, you will perform
a classification task to predict whether an individual makes over 50K a year or
less, using different machine learning algorithms.

Tasks to be done:

1. Data Preprocessing:
a) Replace all the missing values with NA.
b) Remove all the rows that contain NA values.
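The two preprocessing steps can be sketched in plain Python as follows. This is a minimal illustration, assuming the raw file marks missing entries with "?" (as the UCI distribution of this data set does) and that the rows have already been read into dictionaries. The sample rows and the helper name `clean_rows` are hypothetical, and `None` stands in for NA.

```python
MISSING_MARKER = "?"  # assumption: the raw file encodes missing values as "?"


def clean_rows(rows):
    """Replace missing markers with None (NA), then drop incomplete rows."""
    with_na = [
        {key: (None if str(value).strip() == MISSING_MARKER else value)
         for key, value in row.items()}
        for row in rows
    ]
    # Keep only the rows in which no field is NA.
    return [row for row in with_na if None not in row.values()]


# Hypothetical sample rows, for illustration only.
sample = [
    {"age": 39, "workclass": "State-gov", "occupation": "Adm-clerical"},
    {"age": 54, "workclass": "?", "occupation": "?"},
    {"age": 28, "workclass": "Private", "occupation": " ?"},
]
cleaned = clean_rows(sample)
print(len(cleaned))  # 1
```

Stripping whitespace before the comparison guards against markers stored with a leading space, such as " ?".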

2. Data Manipulation:
a) Extract the “education” column and store it in “census_ed”.
b) Extract all the columns from “age” to “relationship” and store them in “census_seq”.
c) Extract columns 5, 8, and 11 and store them in “census_col”.
d) Extract all the male employees who work in state-gov and store them in “male_gov”.
e) Extract all the 39-year-olds who either have a bachelor's degree or are
natives of the United States and store the result in “census_us”.
f) Extract 200 random rows from the “census” data frame and store them in “census_200”.
g) Get the count of each level of the “workclass” column.
h) Calculate the mean of the “capital.gain” column grouped by “workclass”.
i) Create a separate data frame with the details of males and females from the census
data whose income is more than 50,000.
j) Calculate the percentage of people from the United States who are private employees
and earn less than 50,000 annually.
k) Calculate the percentage of married people in the census data.
l) Calculate the percentage of high school graduates earning more than 50,000
annually.
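A few of these manipulations are sketched below in plain Python on a list of dictionaries. The rows are hypothetical, the column name "sex" and the label values (">50K", "<=50K", "State-gov") are assumptions based on the UCI data set, and "X" is taken to be the income column as in the later tasks. A data-frame library would express the same operations more compactly.

```python
import random
from collections import Counter, defaultdict

# Hypothetical rows for illustration; real rows come from the cleaned census data.
census = [
    {"age": 39, "workclass": "State-gov", "sex": "Male", "capital.gain": 2174, "X": "<=50K"},
    {"age": 50, "workclass": "Private", "sex": "Male", "capital.gain": 0, "X": ">50K"},
    {"age": 38, "workclass": "Private", "sex": "Female", "capital.gain": 0, "X": "<=50K"},
    {"age": 45, "workclass": "State-gov", "sex": "Female", "capital.gain": 1000, "X": ">50K"},
]

# (d) Filter on two conditions at once.
male_gov = [r for r in census if r["sex"] == "Male" and r["workclass"] == "State-gov"]

# (f) Random sample without replacement (2 rows here; the task asks for 200).
random.seed(0)
census_sample = random.sample(census, k=2)

# (g) Count of each level of "workclass".
workclass_counts = Counter(r["workclass"] for r in census)

# (h) Mean of "capital.gain" grouped by "workclass".
gains = defaultdict(list)
for r in census:
    gains[r["workclass"]].append(r["capital.gain"])
mean_gain = {wc: sum(values) / len(values) for wc, values in gains.items()}


# (j)-(l) A percentage is the matching row count over the total row count.
def percentage(rows, predicate):
    return 100.0 * sum(1 for r in rows if predicate(r)) / len(rows)


pct_high_income = percentage(census, lambda r: r["X"] == ">50K")

print(len(male_gov), workclass_counts["Private"], mean_gain["State-gov"], pct_high_income)
# 1 2 1587.0 50.0
```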

3. Linear Regression:
a) Build a simple linear regression model as follows:
● Divide the dataset into training and test sets in a 70:30 ratio.
● Build a linear model on the training set where the dependent variable is
“hours.per.week” and the independent variable is “education.num”.
● Predict the values on the test set and find the error in prediction.
● Find the root-mean-square error (RMSE).
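The steps above can be sketched without any modelling library: shuffle and split the rows, fit a least-squares line on one portion, and measure the RMSE on the other. The data pairs and the helper names (`train_test_split`, `fit_simple_linear`, `rmse`) are hypothetical, and the fixed seed is only there to make the split repeatable.

```python
import math
import random


def train_test_split(rows, train_fraction, seed=42):
    """Shuffle a copy of the rows and cut it at the requested fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical (education.num, hours.per.week) pairs, for illustration only.
data = [(9, 40), (13, 45), (10, 38), (14, 50), (7, 35),
        (12, 42), (16, 55), (11, 40), (9, 36), (13, 44)]

train, test = train_test_split(data, 0.7)
intercept, slope = fit_simple_linear([x for x, _ in train], [y for _, y in train])
predictions = [intercept + slope * x for x, _ in test]
errors = [y - p for (_, y), p in zip(test, predictions)]  # per-row prediction error
print(rmse([y for _, y in test], predictions))
```

Evaluating on rows the model has not seen gives an error estimate that is not flattered by the fit itself.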

4. Logistic Regression:
a) Build a simple logistic regression model as follows:

● Divide the dataset into training and test sets in 65:35 ratio.
● Build a logistic regression model where the dependent variable is
“X” (yearly income) and the independent variable is “occupation”.
● Predict the values on the test set.
● Build a confusion matrix and find the accuracy.

b) Build a multiple logistic regression model as follows:

● Divide the dataset into training and test sets in 80:20 ratio.
● Build a logistic regression model where the dependent variable is
“X” (yearly income) and the independent variables are “age”, “workclass”, and
“education”.
● Predict the values on the test set.
● Build a confusion matrix and find the accuracy.
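The evaluation step shared by both models can be sketched as follows. A logistic regression outputs probabilities, so the sketch first turns them into class labels and then tallies the confusion matrix. The 0.5 cut-off, the label strings, the sample values, and the helper names are all assumptions made for illustration.

```python
from collections import Counter


def labels_from_probabilities(probabilities, threshold=0.5, positive=">50K", negative="<=50K"):
    """Convert predicted probabilities of the positive class into labels."""
    return [positive if p >= threshold else negative for p in probabilities]


def confusion_matrix(actual, predicted, positive=">50K"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["tp" if p == positive else "fn"] += 1
        else:
            counts["fp" if p == positive else "tn"] += 1
    return {key: counts[key] for key in ("tp", "fn", "fp", "tn")}


def accuracy(matrix):
    """Share of all predictions that were correct."""
    return (matrix["tp"] + matrix["tn"]) / sum(matrix.values())


# Hypothetical test-set labels and model probabilities, for illustration only.
actual = [">50K", "<=50K", "<=50K", ">50K", "<=50K", "<=50K"]
probabilities = [0.81, 0.10, 0.64, 0.35, 0.22, 0.05]

predicted = labels_from_probabilities(probabilities)
matrix = confusion_matrix(actual, predicted)
print(matrix)  # {'tp': 1, 'fn': 1, 'fp': 1, 'tn': 3}
print(accuracy(matrix))
```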

5. Decision Tree:
a) Build a decision tree model as follows:

● Divide the dataset into training and test sets in 70:30 ratio.
● Build a decision tree model where the dependent variable is “X” (yearly income)
and the rest of the variables are independent variables.
● Predict the values on the test set.
● Build a confusion matrix and calculate the accuracy.
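To make the model less of a black box, the sketch below shows how a single split of a decision tree can be chosen: try every threshold on one numeric feature and keep the one with the lowest weighted Gini impurity. Gini is one common criterion, not necessarily the one a given library uses by default, and the ages, labels, and helper names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'.

    Returns (None, gini(labels)) when no split improves on the unsplit node.
    """
    best = (None, gini(labels))
    n = len(values)
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical ages and income labels, for illustration only.
ages = [22, 25, 30, 41, 47, 52]
income = ["<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K"]

threshold, impurity = best_threshold(ages, income)
print(threshold, impurity)  # 30 0.0
```

A full tree repeats this search over every feature and then recurses into each side of the chosen split.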

6. Random Forest:
a) Build a random forest model as follows:

● Divide the dataset into training and test sets in 80:20 ratio.
● Build a random forest model where the dependent variable is “X” (yearly income),
the rest of the variables are independent variables, and the number of trees is
300.
● Predict the values on the test set.
● Build a confusion matrix and calculate the accuracy.
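A random forest combines many trees by training each one on a bootstrap sample and letting the trees vote. The sketch below shows only those two ingredients, not the tree learner itself. The row indices, the vote counts, and the helper names are hypothetical.

```python
import random
from collections import Counter

N_TREES = 300  # number of trees requested by the task


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement; each tree trains on one such sample."""
    return [rng.choice(rows) for _ in range(len(rows))]


def majority_vote(votes):
    """Return the class predicted by the most trees."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)
training_rows = list(range(10))  # stand-in for the training row indices
samples = [bootstrap_sample(training_rows, rng) for _ in range(N_TREES)]

# Hypothetical votes from the 300 trees for a single test row.
votes = [">50K"] * 180 + ["<=50K"] * 120
print(len(samples), majority_vote(votes))  # 300 >50K
```

Because each sample is drawn with replacement, some rows repeat and others are left out, which is what makes the individual trees differ from one another.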

7. Time Series:
For this problem, use the population dataset and perform the following:
a) Perform EDA on the time series to find trends and seasonality.
b) Forecast the population for the next 6 months.
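As a starting point for both steps, the sketch below smooths a series with a moving average to expose the trend and then extends a least-squares trend line six steps ahead. It assumes monthly observations, uses made-up values, and ignores seasonality entirely. The handout does not name a forecasting method, so a seasonal model may be the better final choice.

```python
def moving_average(series, window):
    """Trailing moving average; smooths short-term variation to expose the trend."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


def linear_trend_forecast(series, steps):
    """Fit a least-squares line over time index 0..n-1 and extend it by `steps`."""
    n = len(series)
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
             / sum((t - mean_t) ** 2 for t in range(n)))
    intercept = mean_y - slope * mean_t
    return [intercept + slope * t for t in range(n, n + steps)]


# Hypothetical monthly population values, for illustration only.
population = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0]

trend = moving_average(population, window=3)
forecast = linear_trend_forecast(population, steps=6)  # next 6 months
print(trend)
print(forecast)
```

Comparing the original series with its moving average is a quick check for seasonality: a repeating pattern in the differences suggests a seasonal component that a straight line cannot capture.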
