
Telemarketing Dataset Analysis

Group 7
Abhishek Jagdale Nilay N
Sonal Mittal Swapnil B
Swapnil T Vishal Sinha
PROBLEM STATEMENT
Analyze the effect of the ongoing telemarketing campaign of a Portuguese banking
institution to predict whether or not the bank's customers will subscribe to the term deposit
services provided by the bank

IMPORTANCE
Predict the sales conversion outcome for both new and existing customers

Insights about the customer segment that needs more focus

Understand the important factors for conversion

Next steps can be aligned with the insights gained


ANALYSIS OF DATA SET
CATEGORICAL DATA DETAILS

JOB
Admin, unknown, unemployed, management, housemaid, entrepreneur, student, blue-collar, self-employed, retired, technician, services

POUTCOME
Unknown, failure, success

CONTACT
Unknown, telephone, cellular

MARITAL
Married, divorced, single

MONTH
Jan to Dec

EDUCATION
Unknown, secondary, primary, tertiary
DESCRIPTIVE ANALYSIS DATA CLEANING
There are no missing values in our data

APPROACH OF SOLVING PROBLEM

RELATIONSHIP
Evaluating the relationship of each input variable with output

IDENTIFICATION
Identification of critical input variables

ML ALGORITHM
Deciding which ML algorithm to use

RESULT
Combining all analysis to prepare the future plan
DATA ANALYSIS

[Figure: Distribution of numeric data]

The data set is imbalanced, as the negative class is about 8 times larger than the positive class. Therefore,
univariate analysis would be more feasible
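
The imbalance ratio mentioned above can be checked by counting the target labels. The sketch below is only illustrative: the label list is made-up toy data (not the campaign data set), and `imbalance_ratio` is a hypothetical helper name.

```python
from collections import Counter


def imbalance_ratio(labels, positive="yes", negative="no"):
    """Return how many negative examples exist per positive example."""
    counts = Counter(labels)
    if counts[positive] == 0:
        raise ValueError("no positive examples in labels")
    return counts[negative] / counts[positive]


# Toy labels (assumption): 8 "no" for every "yes", mirroring the ratio on the slide.
toy_labels = ["no"] * 80 + ["yes"] * 10
ratio = imbalance_ratio(toy_labels)
print(f"negative:positive = {ratio:.1f}:1")
```

With a ratio this skewed, a model that always predicts "no" would still look accurate, so the ratio is worth checking before choosing evaluation metrics.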
FEATURE - JOB

Occupation – Management is more prevalent, followed by blue-collar

Customers in management jobs and retirees are the ones who have the highest balances in their accounts
FEATURE - JOB

Management customers subscribe the most, but they are also the second-highest group not
subscribing (after blue-collar), because more customers work in management than in any other
profession.
FEATURE – HOUSING LOAN

The majority of the customers have a housing loan.
So, call people who opted for a loan
EDUCATION AND MARITAL STATUS

[Figure: Clustered marital status and education]

Level of education has a significant impact on account balance
Whether they have a previous loan is also significant for the balance
Divorced people have low balances in their accounts
FEATURE - EDUCATION
Market success is based on education level or profile.

Target Secondary and Tertiary people
FEATURE - CONTACT
More than one contact required

Success is more likely when contacted frequently.

Customers contacted over the cellular medium are more inclined towards subscription
FEATURE - MONTH
The success rate varies based on the month of contact.

More for May, June, July, August

Day of the week seems to be irrelevant
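
A month-wise comparison like the one above comes down to a success rate per group. The following is a minimal sketch under stated assumptions: the rows are invented toy values, the keys `month` and `y` follow the variable names used in this deck, and `success_rate_by_group` is a hypothetical helper.

```python
from collections import defaultdict


def success_rate_by_group(records, group_key, target_key="y"):
    """Return {group: share of records in that group whose target is 'yes'}."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[target_key] == "yes":
            successes[group] += 1
    return {group: successes[group] / totals[group] for group in totals}


# Toy rows (assumption): invented values, not taken from the campaign data.
toy_rows = [
    {"month": "may", "y": "yes"},
    {"month": "may", "y": "no"},
    {"month": "may", "y": "no"},
    {"month": "may", "y": "no"},
    {"month": "jun", "y": "yes"},
    {"month": "jun", "y": "no"},
]
rates = success_rate_by_group(toy_rows, "month")
print(rates)
```

The same helper could be pointed at any other categorical field (for example job or contact) by changing `group_key`.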
FEATURE - POUTCOME
Most of the customers are new customers, as the previous campaign outcome is not available

Customers who had a successful outcome from the previous campaign mostly subscribe for a term deposit.
AGE & DURATION

Median age of both types of customers is around 38-40.
Age isn't necessarily a good indicator

Duration (last contact duration) of a customer can be useful
As already mentioned in the data overview, this field highly affects the target variable
MACHINE LEARNING TECHNIQUES USED
The output variable is discrete (binary); therefore, we will use
classification models for prediction: Decision Tree, Random Forest,
Logistic Regression and XGBoost

We will execute an AutoML (H2O) model. The best model will be chosen
according to its performance.

There are 4 binary variables: housing, loan, default and y. They will
be coded as 0 and 1, where 0 = No and 1 = Yes

Month is coded as 1 to 12, and the other 5 categorical variables (job, marital,
education, poutcome, contact) are coded with one-hot
encoding (using get_dummies)
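
The encoding steps described above could look roughly like this in pandas. This is a sketch, not the group's actual code: the toy frame and its values are invented, and the lowercase "yes"/"no" labels and three-letter month abbreviations are assumptions about how the raw data is stored.

```python
import pandas as pd

# Toy frame (assumption): column names follow the slide; values are invented.
df = pd.DataFrame({
    "housing": ["yes", "no", "yes"],
    "loan": ["no", "no", "yes"],
    "default": ["no", "no", "no"],
    "month": ["may", "jun", "dec"],
    "job": ["management", "blue-collar", "retired"],
    "marital": ["married", "single", "divorced"],
    "education": ["tertiary", "secondary", "primary"],
    "poutcome": ["unknown", "success", "failure"],
    "contact": ["cellular", "telephone", "unknown"],
    "y": ["no", "yes", "no"],
})

# Binary variables: 0 = No, 1 = Yes.
binary_cols = ["housing", "loan", "default", "y"]
for col in binary_cols:
    df[col] = df[col].map({"no": 0, "yes": 1})

# Month coded as 1 to 12 (assumes three-letter lowercase abbreviations).
month_names = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]
month_to_number = {name: i for i, name in enumerate(month_names, start=1)}
df["month"] = df["month"].map(month_to_number)

# Remaining categorical variables: one-hot encoding via get_dummies.
one_hot_cols = ["job", "marital", "education", "poutcome", "contact"]
encoded = pd.get_dummies(df, columns=one_hot_cols, dtype=int)
print(encoded.columns.tolist())
```

Coding month as an ordinal 1 to 12 keeps it in a single column, while one-hot encoding avoids imposing an order on fields such as job or marital status.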
COMPARISON OF DIFFERENT MODELS

1. Recall and F1 scores are good for all the models except Decision Tree, where the F1 score is considerably lower
2. False negatives are fewer for Random Forest and XGBoost
3. The accuracy on unseen data is highest for Random Forest
4. So, according to the comparison, Random Forest seems to perform better than the other models
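
For reference, the metrics compared above can all be derived from a confusion matrix. The sketch below uses hypothetical counts (not results from this project), and `classification_metrics` is an illustrative helper name.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts (assumption), chosen only to illustrate the formulas.
metrics = classification_metrics(tp=60, fp=20, fn=40, tn=880)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On an imbalanced data set, accuracy alone can look high even when many subscribers are missed, which is why recall, F1 and the false-negative count are compared alongside it.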
INSIGHTS FROM MODELS
Duration is important and should be greater than 645.5 seconds

Cellular contact is preferred more

Other important factors are: number of days since the customer was previously contacted,
month of the year and their yearly average balance
RECOMMENDATION
1. Engage more with customers to increase likelihood of conversion
2. Can leverage the good reach among managers & blue-collar customers
3. Put more effort into successful months – May, June, July, August
4. Avoid people with default credit, as they don’t go for subscription

APPLICATIONS
1. Better conversion rate by precision-based targeting campaigns
2. Identification of trends in customer segments & sub-segments
3. Maximise the marketing outputs and minimise the wasted efforts
4. Helps in focusing on improvement of the customer experience
LEARNINGS & LIMITATIONS

LEARNINGS

Analyzing the impact of each input variable on the output variable

Choosing the correct Machine Learning algorithms to improve the performance

Predicting the outcome for unseen data

LIMITATIONS

Output can also be dependent on factors not mentioned

This learning may not be applicable to other marketing campaigns. Ex - duration might not be important in another

Collected data may not be absolutely correct, or can have unseen errors
Thank You!

Any Questions?
