Problem 2: Logistic Regression and LDA: Head of The Dataframe

Original title: Predictive_Modelling_Ques_2_Solution.docx

Uploaded by saarang K

Problem 2: Logistic Regression and LDA

You are hired by a tour and travel agency which deals in selling holiday packages. You are provided
details of 872 employees of a company. Among these employees, some opted for the package and some
didn't. You have to help the company in predicting whether an employee will opt for the package or not
on the basis of the information given in the data set. Also, find out the important factors on the basis of
which the company will focus on particular employees to sell their packages.

Solution –

First, the required packages are loaded and the working directory is set. The CSV file is then loaded into a
dataframe. To validate that the data has loaded correctly, the top 5 or 10 rows of the data set are displayed.
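A minimal sketch of this loading step, using only Python's standard library. The column names (`holiday_package`, `salary`, `age`, `educ`, `foreign`) and the rows in `SAMPLE_CSV` are hypothetical placeholders rather than the report's data, and an in-memory string stands in for the CSV file. With pandas, the same check is usually `pd.read_csv(path)` followed by `df.head(5)`.

```python
import csv
import io

# Hypothetical stand-in for the CSV file; column names and values are made up.
SAMPLE_CSV = """holiday_package,salary,age,educ,foreign
no,45000,30,8,no
yes,38000,45,8,no
no,58000,46,9,no
no,66000,31,11,no
no,67000,44,12,no
yes,61000,42,12,no
"""


def load_rows(text):
    """Parse CSV text into a list of dicts (one per row); values stay as strings."""
    return list(csv.DictReader(io.StringIO(text)))


def head(rows, n=5):
    """Return the first n rows, similar to what pandas' DataFrame.head(n) displays."""
    return rows[:n]


rows = load_rows(SAMPLE_CSV)
for row in head(rows):
    print(row)
```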

Head of the Dataframe

Shape of the Data Set - There are 872 rows and 8 attributes. There are integer and object data types in
the data. No null values are present in the data.
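The shape, data-type and null checks can be sketched the same way. This is a simplified standard-library illustration on a hypothetical three-row sample with assumed column names; in pandas the corresponding calls are `df.shape`, `df.dtypes` and `df.isnull().sum()`.

```python
import csv
import io

# Hypothetical sample; the report's file has 872 rows and 8 columns.
SAMPLE_CSV = """holiday_package,salary,age,educ,foreign
no,45000,30,8,no
yes,38000,45,8,no
no,58000,46,9,no
"""

MISSING = ("", None)  # csv yields "" for a blank field and None for a short row


def shape(rows):
    """(number of rows, number of columns), like pandas' DataFrame.shape."""
    return len(rows), (len(rows[0]) if rows else 0)


def infer_dtype(values):
    """'int64' if every non-missing value parses as an integer, otherwise 'object'.

    Simplified: pandas would store an integer column that has missing values as float64.
    """
    for value in values:
        if value in MISSING:
            continue
        try:
            int(value)
        except ValueError:
            return "object"
    return "int64"


def dtypes(rows):
    """Inferred type label for every column."""
    return {col: infer_dtype([r[col] for r in rows]) for col in rows[0]}


def null_counts(rows):
    """Missing cells per column, like df.isnull().sum() in pandas."""
    return {col: sum(1 for r in rows if r[col] in MISSING) for col in rows[0]}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print("shape:", shape(rows))
print("dtypes:", dtypes(rows))
print("nulls:", null_counts(rows))
```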
Description/Summary of all attributes

Except for the ‘salary’ column, no large variation can be seen in the data set. The high variation in salary
indicates the presence of outliers, which we will resolve before building any models.
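A summary like the one described can be computed per column. This sketch mimics the main rows of pandas' `Series.describe()` for one numeric column, using `statistics.quantiles(..., method="inclusive")`, which as far as I know uses the same linear interpolation as pandas' default percentiles. The salary figures are hypothetical and include one extreme value to show how a single outlier inflates the standard deviation and the maximum.

```python
import statistics


def describe(values):
    """Count, mean, sample std, min, quartiles and max of one numeric column."""
    # 'inclusive' interpolates linearly between order statistics (assumed to match pandas).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


# Hypothetical salaries with one extreme value.
salaries = [32000, 41000, 45000, 47000, 50000, 52000, 55000, 60000, 250000]
for stat, value in describe(salaries).items():
    print(f"{stat:>5}: {value:,.1f}")
```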

We will now check how the data is distributed across the various attributes, and check for duplicates (if any).
From the above we see that about 54% of the employees did not opt for the holiday package, while about 45%
did.
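Class balance and duplicate rows can be checked with a few lines of standard-library Python. The labels and records below are hypothetical; in pandas the equivalents would be `df[target].value_counts(normalize=True)` and `df.duplicated().sum()`.

```python
from collections import Counter


def class_proportions(labels):
    """Share of each class label among all rows."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def count_duplicates(rows):
    """Number of rows (tuples of cell values) that repeat an earlier row."""
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


# Hypothetical target column and records (not the report's data).
labels = ["no", "yes", "no", "no", "yes", "no", "yes", "no"]
print(class_proportions(labels))
records = [("no", 45000, 30), ("yes", 38000, 45), ("no", 45000, 30)]
print(count_duplicates(records))
```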

Data pattern across various attributes

Taking the salary attribute into consideration, we can see from the univariate analysis below that it is
right-skewed –
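Right skew can also be confirmed numerically rather than only from the plot. This sketch computes the adjusted Fisher-Pearson skewness coefficient G1 (to my understanding the same bias-adjusted estimator that pandas' `Series.skew()` reports); a positive value means a longer right tail. The salaries are hypothetical.

```python
import math


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness G1; positive means a longer right tail."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("all values are identical")
    g1 = m3 / m2 ** 1.5
    # Small-sample adjustment that turns g1 into G1.
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


# Hypothetical salaries with a long right tail.
salaries = [32000, 41000, 45000, 47000, 50000, 52000, 55000, 60000, 250000]
print(round(skewness(salaries), 2))
```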
We then perform bivariate analysis on attributes such as age and salary to see how the pattern varies -
There is a cluster of people whose salary is around 50k and who have opted for a package, while the density
decreases as salary goes up. When salary is viewed against age, opting for a package is seen more in the
middle age groups than among people in their 20s or 60s.
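The patterns read off the bivariate plots can be cross-checked by grouping employees into bands and computing the share who opted in per band. The ages, salaries, 0/1 opt-in flags and band widths below are hypothetical choices for illustration, and `opt_in_rate_by_band` is a hypothetical helper name.

```python
from collections import defaultdict


def opt_in_rate_by_band(values, labels, width):
    """Fraction of employees with label 1 (opted in) inside each band of the given width."""
    totals = defaultdict(int)
    opted = defaultdict(int)
    for value, label in zip(values, labels):
        band = (value // width) * width  # lower edge of the band, e.g. 40 for ages 40-49
        totals[band] += 1
        opted[band] += label
    return {band: opted[band] / totals[band] for band in sorted(totals)}


# Hypothetical employees: ages, salaries and 0/1 opt-in flags.
ages = [24, 27, 35, 41, 44, 48, 52, 58, 63, 66]
salaries = [30000, 35000, 42000, 48000, 51000, 50000, 55000, 62000, 70000, 90000]
opted_in = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
print(opt_in_rate_by_band(ages, opted_in, width=10))
print(opt_in_rate_by_band(salaries, opted_in, width=20000))
```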

Pairplot for all the data columns

From the above two plots, we can see that there is no strong correlation among the variables.
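The absence of strong correlation can also be verified numerically by computing Pearson's r for every pair of numeric columns (pandas' `DataFrame.corr()` does this by default). The 0.7 cutoff is an assumed rule of thumb, not a value from the report, and the columns below are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


def strong_pairs(columns, threshold=0.7):
    """Column pairs whose absolute correlation is at least `threshold` (assumed cutoff)."""
    found = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found


# Hypothetical numeric columns.
columns = {
    "salary": [30000, 42000, 48000, 51000, 55000, 62000],
    "age": [24, 35, 41, 44, 52, 58],
    "educ": [12, 8, 11, 9, 12, 10],
}
print(strong_pairs(columns))
```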

Before proceeding with the train-test split, we first treat the outliers so that the data is more
uniform –
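The report does not say how the outliers were resolved. One common approach, assumed here, is to cap values outside the 1.5 × IQR fences (the boxplot whisker rule) at the fence values rather than dropping rows, which keeps every observation in a data set of only 872 rows. The salaries and helper names are hypothetical.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Lower and upper fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap_outliers(values, k=1.5):
    """Clip every value into the fences; values already inside are left unchanged."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]


# Hypothetical salaries with one extreme value.
salaries = [32000, 41000, 45000, 47000, 50000, 52000, 55000, 60000, 250000]
print(iqr_bounds(salaries))
print(cap_outliers(salaries))
```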

After encoding the categorical variables, the data set looks as below –
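The report does not list which columns were encoded or how. Assuming the object columns are two-level text flags such as yes/no, a minimal sketch maps them to 0/1; a general one-hot helper is included for columns with more levels. Both function names are hypothetical, and `drop_first` imitates the parameter of the same name in pandas' `get_dummies`.

```python
def encode_binary(values, positive="yes"):
    """Map a two-level text column to 1 (the positive level) and 0 (anything else)."""
    return [1 if v == positive else 0 for v in values]


def one_hot(values, drop_first=False):
    """One 0/1 indicator column per category, with categories in sorted order."""
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]  # the first category becomes the all-zeros baseline
    return categories, [[int(v == c) for c in categories] for v in values]


# Hypothetical columns.
print(encode_binary(["no", "yes", "no"]))
print(one_hot(["no", "yes", "no"], drop_first=True))
```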

Since the variables are now encoded, we can split the data into training and test sets (70:30), fit the
logistic regression model on the training set and evaluate it on both sets.
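A 70:30 split can be sketched as a seeded shuffle of row indices. The seed is arbitrary (the report does not give one), and `holdout_split` is a hypothetical helper; with scikit-learn the usual call is `train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)`.

```python
import random


def holdout_split(X, y, test_size=0.3, seed=1):
    """Shuffle row indices with a fixed seed and hold out `test_size` of them for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )


# Hypothetical feature rows and labels.
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = holdout_split(X, y)
print(len(X_train), len(X_test))
```

Passing `stratify=y` in scikit-learn keeps the roughly 54:45 class ratio in both parts; this sketch does not stratify, so on small data the two parts can drift from that ratio.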

Logistic Regression -

Test Data Predictions –

Training Data Predictions -
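The report does not state the solver or settings behind these predictions. To show what logistic regression fits, here is a minimal, unregularised sketch trained by batch gradient descent on the log-loss, assuming the features are already numeric and standardised. The learning rate, epoch count and data are arbitrary; scikit-learn's `LogisticRegression` applies L2 regularisation by default, so its coefficients would differ somewhat.

```python
import math


def sigmoid(z):
    """Logistic function, split by sign so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Unregularised logistic regression fitted by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss for one row is (p - y) * x.
            error = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, X):
    """Estimated probability of class 1 (opting for the package) for each row."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def predict(w, b, X, threshold=0.5):
    """Class labels from probabilities using the given threshold."""
    return [int(p >= threshold) for p in predict_proba(w, b, X)]


def accuracy(y_true, y_pred):
    """Fraction of rows where the prediction matches the true label."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, already standardised features (e.g. scaled salary and age).
X_train = [[-1.2, -0.8], [-0.9, 0.3], [-0.4, -1.1], [0.5, 0.9], [1.1, 0.2], [0.8, 1.3]]
y_train = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X_train, y_train)
print("train accuracy:", accuracy(y_train, predict(w, b, X_train)))
```

Applying `predict` to held-out rows and passing the result to `accuracy` gives the test-set figure in the same way.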
Modelling with Linear Discriminant Analysis –
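For a two-class target, LDA assumes both classes share one covariance matrix Σ and assigns class 1 when wᵀx + c > 0, with w = Σ⁻¹(μ₁ − μ₀) and c = −wᵀ(μ₀ + μ₁)/2 + ln(π₁/π₀), where μₖ are the class means and πₖ the class priors. The sketch below implements that rule from scratch on hypothetical data, assuming numeric features and both classes present; scikit-learn's `LinearDiscriminantAnalysis` is the usual library route, and the report does not say which implementation it used.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_lda(X, y):
    """Two-class LDA: class means, pooled covariance and class priors give (w, c)."""
    p = len(X[0])
    groups = {k: [x for x, t in zip(X, y) if t == k] for k in (0, 1)}
    means = {k: [sum(col) / len(g) for col in zip(*g)] for k, g in groups.items()}
    pooled = [[0.0] * p for _ in range(p)]
    for k, g in groups.items():
        for x in g:
            d = [xi - mi for xi, mi in zip(x, means[k])]
            for i in range(p):
                for j in range(p):
                    pooled[i][j] += d[i] * d[j]
    pooled = [[v / (len(X) - 2) for v in row] for row in pooled]  # unbiased pooled estimate
    # Discriminant direction w = pooled^-1 (mean_1 - mean_0).
    w = solve(pooled, [m1 - m0 for m1, m0 in zip(means[1], means[0])])
    midpoint = [(m1 + m0) / 2 for m1, m0 in zip(means[1], means[0])]
    log_prior_ratio = math.log(len(groups[1]) / len(groups[0]))
    c = -sum(wi * mi for wi, mi in zip(w, midpoint)) + log_prior_ratio
    return w, c


def predict_lda(w, c, X):
    """Class 1 when w.x + c > 0, otherwise class 0."""
    return [int(sum(wi * xi for wi, xi in zip(w, x)) + c > 0) for x in X]


# Hypothetical two-feature data.
X = [[1, 2], [2, 1], [2, 3], [3, 2], [6, 7], [7, 6], [7, 8], [8, 7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, c = fit_lda(X, y)
print(w, c, predict_lda(w, c, X))
```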
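The accuracy of both models on the training as well as the test set is just above 55%, which is roughly
the same as the proportion of class 0 observations in the data set. In other words, neither model does much
better than simply predicting that no employee opts for the package.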
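The comparison with the class 0 proportion is a majority-class baseline: a classifier that always predicts "does not opt" is already right about 54% of the time on this data, so an accuracy just above 55% adds only about one percentage point. A small sketch of that arithmetic, with hypothetical labels chosen to mirror the report's roughly 54% share of class 0 and hypothetical helper names:

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a 'model' that always predicts the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def lift_over_baseline(model_accuracy, labels):
    """Accuracy points a model adds over always guessing the majority class."""
    return model_accuracy - majority_baseline_accuracy(labels)


# Hypothetical labels: 0 = did not opt for the package, 1 = opted.
labels = [0] * 54 + [1] * 46
print(majority_baseline_accuracy(labels))
print(round(lift_over_baseline(0.55, labels), 3))
```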

Inference –

Logistic regression and linear discriminant analysis are multivariate statistical methods that can be used
to evaluate the associations between various covariates and a categorical outcome.

LDA can also be applied to small data sets; hence, since both models yield the same accuracy in this case,
we can use the LDA model.

Per the problem, we need to identify whether an employee would opt for a holiday package or not. Using
attributes such as salary, age and education (which show considerable variation), we have observed that
employees in the middle age groups whose salary is around 50k tend to opt for the package.

As a recommendation, the holiday packages should also be made accessible to the older age group. For this
group, additional comfort features can be added, or the packages can be simplified instead of being made
more sport- or adventure-oriented.

For the higher-paid salary group, private stays or more sophisticated vacation options can be provided.
Keeping the packages customizable to some extent while making any of these alterations can be beneficial.
