
Data Preparation

The data set was checked for duplicate rows and missing values; none were found.

> sum(duplicated(banks))
[1] 0
> sum(!complete.cases(banks))
[1] 0

Although there are no missing values in the data set, many entries are recorded as 'unknown'. All 'unknown' entries were therefore converted to NA.
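This conversion can be sketched in base R as follows (the file name "bank.csv" is an assumption; use the actual source file):

```r
# Read the data; stringsAsFactors = FALSE keeps text columns as characters
banks <- read.csv("bank.csv", stringsAsFactors = FALSE)

# Replace every "unknown" entry with NA across the whole data frame
banks[banks == "unknown"] <- NA

# Count the resulting NA values per column
colSums(is.na(banks))
```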

From the above result, it can be observed that the contact and poutcome columns contain nearly 96% of the NA values. Deleting the rows with NA in these two columns would cause a significant loss of data, so the two columns were dropped from the analysis instead.

banks1 <- subset(banks, select = -c(contact, poutcome))

After dropping these two columns, all rows that still contained missing values were deleted, and the cleaned data set was rechecked.

The cleaned data were saved in CSV format, and the "banks.clean" data set was then renamed back to "banks" for convenience.
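These two steps can be sketched in base R as follows (the output file name is an assumption):

```r
# Drop the remaining rows that contain NA values and recheck
banks.clean <- na.omit(banks1)
sum(!complete.cases(banks.clean))   # expected to be 0 after cleaning

# Save the cleaned data, then rename it back to "banks" for convenience
write.csv(banks.clean, "banks_clean.csv", row.names = FALSE)
banks <- banks.clean
```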

DATA STRUCTURE OF CLEANED DATA SET

The cleaned data set (banks) has 43193 observations of 15 variables, the variables "contact" and "poutcome" having been dropped.

The response variable "y" is categorical with two levels, "yes" and "no". It needs to be converted into a binary variable ("yes" = 1, "no" = 0) for further analysis.
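One way to recode the response in base R (a minimal sketch using the column name from the data set):

```r
# Recode "yes"/"no" as 1/0
banks$y <- ifelse(banks$y == "yes", 1, 0)

# Confirm the recoding
table(banks$y)
```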

Splitting the Data Set

Before proceeding to the analysis, the data set was split into two parts: training data and testing data. The training data will be used to build the model, and the model will then be tested on the testing data for verification.
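A split can be sketched in base R as follows (the 70/30 ratio and the seed are assumptions; the report does not state the split used):

```r
# Split into training and testing sets
set.seed(123)                                   # for reproducibility
train_idx <- sample(seq_len(nrow(banks)), size = floor(0.7 * nrow(banks)))
train <- banks[train_idx, ]
test  <- banks[-train_idx, ]
```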
Testing for Imbalance

When tested for imbalance, the data set was found to be skewed towards 'no', which accounts for almost 88% of the observations.
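The class proportions can be checked with a one-liner (assuming the training set is named `train` and the recoded response is `y`):

```r
# Proportion of each class in the response variable
prop.table(table(train$y))
```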

To balance the data set for better predictive accuracy, the model should be trained on roughly equal proportions of 'yes' and 'no'. To achieve this, the data set must be under-sampled or over-sampled.

Under-sampling was chosen in the present case because there are sufficient data for the analysis and a smaller data set is easier to work with (total rows with over-sampling = 61096; total rows with under-sampling = 8014).
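Random under-sampling can be sketched in base R as follows (packages such as ROSE or caret provide equivalent helpers; the object names are assumptions):

```r
# Random under-sampling of the majority class
set.seed(123)
minority <- train[train$y == 1, ]
majority <- train[train$y == 0, ]

# Keep only as many majority rows as there are minority rows
majority_down <- majority[sample(nrow(majority), nrow(minority)), ]
train_bal <- rbind(minority, majority_down)

# The balanced set should now be roughly 50/50
prop.table(table(train_bal$y))
```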
