Machine Learning Boosts Bank Marketing

Uploaded by

jfk9074


Machine Learning in Business

MIS710 – A2

Part A. Case Study Report

Table of Contents
1. Introduction
1.1 Objective
2. Methodology
2.1 Overview of the machine learning approach
3. Data preparation and Exploratory Data Analysis (EDA)
3.1 Data sources
4. Model development and evaluation
4.1 Supervised Machine Learning
4.2 Unsupervised Machine Learning
5. Solution recommendation
5.1 Supervised Machine Learning
5.2 Unsupervised Machine Learning
6. Technical recommendations
References

Executive Summary

This report provides a strategic analysis and machine learning solution for Great Ocean Bank aimed at
enhancing marketing campaign effectiveness and customer understanding. Using the GoBank dataset,
which includes customer demographics, banking relationships, and economic indicators, the study
predicts 'Sale' or 'No Sale' outcomes and identifies key influencing factors.

Objectives and Approach: The primary objective was to develop predictive and clustering models to
optimize marketing strategies. The approach included:

● Data preparation and exploratory data analysis (EDA) to uncover key patterns.

● Development and evaluation of two predictive machine learning models.

● Implementation of DBSCAN clustering analytics to segment customers based on similarities.

Key Findings:

● Demographic factors such as age and qualification have less predictive power for sales outcomes
than the other variables.

● Existing account types affect customers' likelihood of engaging with new banking products.

● The method of last contact and past campaign results are crucial predictors of future sales.

● Economic indicators significantly impact sales outcomes.

Recommendations:

● Model Deployment: The Logistic Regression model was chosen for its simplicity, explainability,
and reasonable accuracy.

● Customer Segmentation: DBSCAN clustering showed the most promise, allowing for more
targeted marketing strategies.

The proposed solutions are expected to significantly improve marketing efficiency and customer
satisfaction for Great Ocean Bank.

1. Introduction
1.1 Objective

Great Ocean Banking Group, serving over 1 million customers in Victoria, Australia, seeks to enhance
the effectiveness of its marketing campaigns by understanding the factors influencing campaign
outcomes. The business problem is to predict potential 'Sale' or 'No Sale' outcomes and segment
customers for targeted marketing efforts. This project leverages data analytics and machine learning
to provide insights aiming to optimize marketing strategies, improve customer engagement, and drive
higher returns on investment. The value proposition lies in more efficient resource allocation,
personalized customer interactions, and data-driven decision-making, ultimately fostering stronger
customer relationships and increasing overall satisfaction.

2. Methodology
2.1 Overview of the machine learning approach

Data Preprocessing: Clean and preprocess the dataset (GoBank.csv) by handling missing values,
encoding categorical variables, and scaling numerical features.

Feature Engineering: Select and engineer relevant features that could influence the prediction, such
as customer demographics, previous interactions, and economic indicators.

Model Selection: Experiment with classification algorithms, namely Logistic Regression and a
Decision Tree classifier, to identify the best-performing model.

Model Evaluation: Evaluate the models using cross-validation techniques and metrics such as
accuracy, precision, recall, and F1 score to ensure robustness and reliability.

Hyper-parameter Tuning: Optimize the selected model's hyper-parameters using Grid Search with
Cross-Validation to achieve the best possible performance.
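The steps above can be sketched end to end as a single scikit-learn pipeline. This is a minimal illustration rather than the report's actual code: the data is synthetic, the column names (`age`, `contact_method`, `sale`) are hypothetical stand-ins for GoBank.csv fields, and the grid of `C` values is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for GoBank.csv; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "contact_method": rng.choice(["mobile", "landline"], n),
})
# Target loosely tied to the features so the model has something to learn.
df["sale"] = ((df["age"] > 50) & (df["contact_method"] == "mobile")).astype(int)

X, y = df.drop(columns="sale"), df["sale"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Preprocessing: scale numeric columns, one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["contact_method"]),
])
pipeline = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grid search with 5-fold cross-validation over the regularization strength.
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=5, scoring="f1")
search.fit(X_train, y_train)
print(search.best_params_, round(search.score(X_test, y_test), 3))
```

Keeping the preprocessing inside the pipeline means the scaler and encoder are refit on each cross-validation fold, which avoids leaking test-fold statistics into training.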

Image D

3. Data preparation and Exploratory Data Analysis (EDA)
3.1 Data sources

Image 1A (Source: Self-Created)

The dataset from Great Ocean Bank contains 22,940 entries and 19 columns, capturing customer
demographics, banking relationships, last contact details from marketing campaigns, and economic
indicators. Key columns, such as 'Qualification' and 'Previous Campaign Outcome', contain some null
values, which need to be addressed. Overall, the dataset is relatively clean but requires preprocessing.

Several preprocessing steps were undertaken to enhance the dataset's quality and suitability.

3.2. Handling Missing Values:

Given that the proportion of missing values was less than 1%, the decision was made to drop these
rows. This approach was chosen to avoid tampering with the data or introducing any bias that
imputation methods might cause. By removing these few instances, the dataset's integrity and
reliability were maintained.
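A minimal sketch of this step in pandas follows. The two column names come from the report; the toy values and category labels are hypothetical, and the toy frame has far more missing rows than the under-1% described above.

```python
import numpy as np
import pandas as pd

# Toy frame; the values and category labels are made up for illustration.
df = pd.DataFrame({
    "Age": [34, 51, 46, 29],
    "Qualification": ["Bachelor", np.nan, "Diploma", "Masters"],
    "Previous Campaign Outcome": ["success", "failure", np.nan, "failure"],
})

# Share of rows with at least one missing value.
missing_share = df.isna().any(axis=1).mean()
print(f"rows with missing values: {missing_share:.1%}")

# Drop incomplete rows and renumber the index.
clean = df.dropna().reset_index(drop=True)
print(len(df), "->", len(clean))
```

Checking the missing share first makes the decision to drop rather than impute explicit and easy to revisit if the share grows in later data.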

Image 1B

3.3. Encoding Categorical Variables:

Machine learning models require numerical input, necessitating the conversion of categorical
variables into numerical format. Two primary methods were used:

● Label Encoding: For ordinal categorical variables, label encoding was used to maintain the
inherent order.

● One-Hot Encoding: For nominal categorical variables, one-hot encoding was applied to create
binary columns for each category, preventing any ordinal relationships from being inferred
where none exist.
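The two encodings can be illustrated as below. This sketch swaps in scikit-learn's OrdinalEncoder with an explicit category order, because LabelEncoder assigns codes alphabetically and would not by itself preserve a meaningful order. The category values and the 'Contact Method' column name are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "Qualification": ["Diploma", "Masters", "Bachelor", "Diploma"],
    "Contact Method": ["mobile", "landline", "mobile", "mobile"],
})

# Ordinal variable: spell out the order so the integer codes respect it.
order = [["Diploma", "Bachelor", "Masters"]]
df["Qualification_code"] = OrdinalEncoder(categories=order).fit_transform(
    df[["Qualification"]]).ravel().astype(int)

# Nominal variable: one binary column per category.
encoded = pd.get_dummies(df, columns=["Contact Method"], dtype=int)
print(encoded)
```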

3.4. Scaling Numerical Features:

Feature scaling is essential to ensure that numerical features contribute equally to the model,
especially when using algorithms sensitive to feature magnitudes.
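As a small illustration, standardization with StandardScaler (listed among the libraries used in this report) could look like the following. The two features and their values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two hypothetical features on very different scales: age and account balance.
X_train = np.array([[25, 1_000.0], [40, 25_000.0], [60, 250_000.0]])
X_test = np.array([[35, 10_000.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```

Fitting the scaler on the training split only keeps information from the test split out of the model.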

Before encoding the categorical variables, a univariate analysis of the 18 variables was conducted to
understand the distribution and characteristics of each feature. This analysis helped identify the
nature of the variables, such as their central tendency, variability, and the presence of any outliers.

Image 1C

Image 1D

The initial analysis revealed a class imbalance in the target variable, which is critical to address for
accurate model performance.

3.5. Feature selection:

Feature selection uses the chi-square (χ²) statistical test to identify the top k features most
relevant to the target variable. This is achieved using the SelectKBest class from
sklearn.feature_selection, which ranks features based on their chi-square scores and selects the k
highest-scoring ones. After fitting this selector to the scaled training data (X_train_scaled) and
applying it to both training and test datasets, the method retrieves the indices of the selected
features, which are then used to print their names. This process aids in dimensionality reduction by
retaining only the features most strongly associated with the target, potentially enhancing model
performance and interpretability.
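The selection step described above might be sketched as follows. scikit-learn's chi2 rejects negative inputs, so this sketch assumes min-max scaling to [0, 1], which may differ from the scaler used in the report. The feature names, the synthetic data, and k=1 are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
y_train = rng.integers(0, 2, 200)
# Hypothetical features: 'signal' follows the target, the other two are noise.
X_train = pd.DataFrame({
    "signal": y_train * 5 + rng.random(200),
    "noise_a": rng.random(200),
    "noise_b": rng.random(200),
})

# chi2 requires non-negative values, so scale to [0, 1] (an assumption of this sketch).
X_train_scaled = MinMaxScaler().fit_transform(X_train)

selector = SelectKBest(score_func=chi2, k=1)
X_train_selected = selector.fit_transform(X_train_scaled, y_train)

# Map the selected column indices back to feature names.
selected_idx = selector.get_support(indices=True)
print(X_train.columns[selected_idx].tolist())
```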

Image 2A

Image 2B

4. Model development and evaluation

4.1 Supervised Machine Learning
LOGISTIC REGRESSION

Logistic regression is a statistical method used for binary classification tasks, where the goal is to
predict the probability of an instance belonging to one of two classes. It's called "logistic" because it's
based on the logistic function, also known as the sigmoid function, which maps any real-valued
number to a value between 0 and 1.
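The sigmoid mapping can be shown in a few lines. The input scores and the 0.5 decision threshold are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([-4.0, 0.0, 4.0])
probs = sigmoid(scores)
print(probs.round(3))              # estimated probability of the positive class
print((probs >= 0.5).astype(int))  # class labels at a 0.5 threshold
```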

Important parameters:

· penalty: Type of regularization.

· C: Inverse of regularization strength.

· solver: Optimization algorithm.

· max_iter: Maximum number of iterations.

· multi_class: Handling of multiple classes.
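For illustration, these parameters could be set as follows on a synthetic, imbalanced dataset. The values are assumptions, not the settings used in this report. multi_class is left at its default here because the task is binary and recent scikit-learn releases deprecate the argument.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data with roughly 80% negatives, standing in for 'No Sale'/'Sale'.
X, y = make_classification(n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

log_reg = LogisticRegression(
    penalty="l2",     # type of regularization
    C=1.0,            # inverse of regularization strength (smaller = stronger)
    solver="lbfgs",   # optimization algorithm
    max_iter=1000,    # iteration cap for the solver
)
log_reg.fit(X_train, y_train)
print(round(log_reg.score(X_test, y_test), 3))
```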

Image 4A

Image 4B

The accuracy of the Logistic Regression model is 86.89%. In addition, from Table 3A provided by the
software, it can be observed that its precision was 86%, meaning that most of the positive predictions
are correct. Similarly, the recall score is 87%, meaning that some positive instances were not
identified. The F1 score is 87%. In the confusion matrix, it can be observed that 3,627 instances of
class 0 were correctly classified, while 846 instances of class 1 were correctly classified.
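To make the relationship between the confusion matrix and these metrics concrete, here is a small sketch on toy labels. The labels are invented and do not reproduce the figures reported above.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Toy labels: 0 = 'No Sale', 1 = 'Sale'.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()  # row = true class, column = predicted class
print(cm)
print("accuracy:", accuracy_score(y_true, y_pred))
print("precision (Sale):", tp / (tp + fp))  # share of predicted sales that are real
print("recall (Sale):", tp / (tp + fn))     # share of real sales that were found
print(classification_report(y_true, y_pred, target_names=["No Sale", "Sale"]))
```

On imbalanced data the per-class rows of the classification report are usually more informative than a single averaged figure, since a model can score well overall while missing many 'Sale' cases.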

DECISION TREE CLASSIFIER

A decision tree classifier is like a flowchart that makes decisions based on the features of data. It
starts at the root and asks questions about the features, splitting the data into smaller groups at each
node. These questions are based on the most informative features for predicting the target variable.
Eventually, it reaches leaf nodes where no more questions are needed, and a prediction is made. To
classify new data, you follow the path in the tree based on its features until you reach a leaf node,
which gives the predicted class. Decision trees are easy to understand and interpret, making them
useful for various classification tasks.

Important parameters:

· criterion: Impurity measure for split quality.

· splitter: Strategy for choosing splits.

· max_depth: Maximum depth of the tree.

· min_samples_split: Minimum samples required to split a node.

· min_samples_leaf: Minimum samples required at a leaf node.

· max_features: Number of features considered for the best split.

· random_state: Seed for random number generation.
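A sketch of how these parameters map onto scikit-learn's DecisionTreeClassifier is given below. The data is synthetic and the parameter values are illustrative, not the report's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

tree = DecisionTreeClassifier(
    criterion="gini",       # impurity measure for split quality
    splitter="best",        # strategy for choosing splits
    max_depth=4,            # cap on tree depth, limits overfitting
    min_samples_split=10,   # minimum samples required to split a node
    min_samples_leaf=5,     # minimum samples required at a leaf
    max_features=None,      # consider all features at each split
    random_state=42,        # seed for reproducibility
)
tree.fit(X_train, y_train)
print(tree.get_depth(), round(tree.score(X_test, y_test), 3))
```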

Image 4C

Image 4D

The accuracy of the Decision Tree classifier model is 87.96%. In addition, from Table 4A provided by
the software, it can be observed that its precision was 87%, meaning that most of the positive
predictions are correct. Similarly, the recall score is 88%, meaning that some positive instances were
not identified. The F1 score is 88%. In the confusion matrix, it can be observed that 5,437 instances
of class 0 were correctly classified, while 1,272 instances of class 1 were correctly classified.

4.2 Unsupervised Machine learning:

Clustering models

K-MEANS

KMeans is a widely used centroid-based clustering algorithm that iteratively partitions data into K
clusters. It achieves this by assigning each data point to the nearest cluster centroid and updating
each centroid to the mean of the data points in its cluster.
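A minimal KMeans sketch on synthetic two-dimensional blobs follows. The blob centers and the choice of K=3 are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Three well-separated synthetic blobs in two dimensions.
centers = np.array([[0, 0], [6, 6], [0, 6]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# n_init=10 reruns the algorithm from 10 random starts and keeps the best result.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_.round(2))
print(round(silhouette_score(X, labels), 3))
```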

HIERARCHICAL CLUSTERING

Agglomerative Clustering, in contrast, is a hierarchical clustering algorithm that begins with each data
point as a separate cluster. It then iteratively merges the closest pairs of clusters until only one cluster
remains.
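In practice the merging is usually stopped at a chosen number of clusters rather than run down to one. A small sketch with scikit-learn's AgglomerativeClustering, using made-up one-dimensional points:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Six points on a line forming two obvious groups.
X = np.array([[0.0], [0.2], [0.4], [5.0], [5.2], [5.4]])

# Merging stops once two clusters remain; Ward linkage is scikit-learn's default.
agg = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = agg.fit_predict(X)
print(labels)
```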

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

DBSCAN is a density-based clustering algorithm that groups together closely packed points and labels
points in sparse regions as noise. It does not require the user to specify the number of clusters
beforehand. It relies on two parameters: epsilon (eps), which defines the radius of the neighborhood
around a point, and minPts, the minimum number of points within that radius for the point to count as
part of a dense region.
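The effect of the two parameters can be seen on a toy example. The points and the values eps=0.5 and min_samples=3 are illustrative. In scikit-learn, minPts is called min_samples and counts the point itself.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups plus one isolated point.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [20.0, 20.0],
])

# eps is the neighbourhood radius; min_samples plays the role of minPts.
db = DBSCAN(eps=0.5, min_samples=3).fit(X)
labels = db.labels_  # noise points receive the label -1

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(labels, n_clusters)
```

When scoring a DBSCAN result, it is worth checking how noise points are handled, since silhouette_score treats the label -1 as just another cluster.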

5. Solution recommendation

Supervised Machine learning

MODEL                 CLASS   F1-SCORE   PRECISION   RECALL   ACCURACY (%)
LOGISTIC REGRESSION   0       0.92       0.89        0.96     86.8
                      1       0.60       0.73        0.50
DECISION TREE         0       0.93       0.91        0.94     87.9
                      1       0.66       0.72        0.62

Logistic Regression emerges as the preferred choice over the Decision Tree for the following reasons:

Precision for the 'Sale' class: Logistic Regression exhibits slightly better precision (0.73) for the
'Sale' class compared to the Decision Tree (0.72). This indicates that a marginally larger share of the
customers it flags as 'Sale' are actual sales, although its recall for this class (0.50) is lower than
the Decision Tree's (0.62).

Model Robustness and Interpretability: Logistic Regression models are known for their simplicity and
interpretability. They offer clear insights into the impact of each feature through coefficients. This
transparency aids in understanding the driving factors behind predictions.
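As an illustration of reading feature effects from coefficients, consider the sketch below. The feature names and the synthetic target are hypothetical. Features are standardized first so that coefficient magnitudes are comparable.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical features; only 'prior_success' drives the synthetic target.
X = pd.DataFrame({
    "prior_success": rng.integers(0, 2, 300),
    "age": rng.integers(18, 80, 300),
})
# Target copies 'prior_success' with about 10% of labels flipped at random.
flip = rng.random(300) < 0.1
y = np.where(flip, 1 - X["prior_success"], X["prior_success"])

X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression().fit(X_scaled, y)

coefs = pd.Series(model.coef_[0], index=X.columns)
print(coefs.round(2))          # sign gives the direction of the effect on the log-odds
print(np.exp(coefs).round(2))  # odds ratio per one standard deviation of the feature
```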

Lower risk of overfitting: Logistic Regression tends to be less prone to overfitting compared to
Decision Trees. With fewer hyperparameters to tune and a simpler model structure, Logistic
Regression offers more robust generalization to unseen data.

Interpretability and Ease of Deployment: Logistic Regression's simplicity and interpretability make it
an attractive choice for deployment in real-world scenarios. Clients can easily grasp and trust the
model's predictions, facilitating smoother integration into decision-making processes.

Unsupervised Machine learning:

MODEL                     SILHOUETTE SCORE
K-MEANS                   0.2959911036256011
HIERARCHICAL CLUSTERING   0.1787855930595059
DBSCAN                    0.9894963986146712

Table 1B

4.5 Clustering analytics results and justification of the number of clusters:

Clustering analysis provided insights into customer segmentation using K-means and hierarchical
clustering. The K-means model defined four clusters based on scaled numerical customer features,
such as age and the consumer confidence index. The choice of four clusters was justified by the elbow
plot, in which the decrease in the within-cluster sum of squared distances levelled off beyond four
clusters. This outcome was further supported by the hierarchical clustering method, where the number
of clusters was read from the dendrogram.
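The elbow check described above can be sketched as follows. The data is synthetic with four built-in groups, so it only illustrates the procedure and does not reproduce the report's plot.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic data with four true groups, standing in for the scaled customer features.
centers = np.array([[0, 0], [8, 0], [0, 8], [8, 8]])
X = np.vstack([c + rng.normal(scale=0.6, size=(60, 2)) for c in centers])

inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_  # within-cluster sum of squared distances

# Reduction in inertia gained by moving from k-1 to k clusters.
for k in range(2, 9):
    print(k, round(inertias[k - 1] - inertias[k], 1))
```

The elbow is the value of k after which these reductions become small. Plotting `inertias` against k with Matplotlib gives the usual elbow plot.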

5. Solution recommendation
The analysis conducted shows that the Great Ocean Banking Group can benefit from utilizing both
supervised and unsupervised machine learning models to improve marketing strategies. Both the
logistic regression and decision tree models reach roughly 87-88% accuracy, supporting the
identification of prospective customers for marketing campaigns. Furthermore, the clustering
analysis shows that several customer segments demonstrate distinct behaviors, allowing them to be
targeted separately. The bank can improve its customer engagement by focusing on these segments and
customer behaviors.

6. Technical recommendations

Summary of Development and Testing Environment:

● Programming Language: Python

● Computing Environment: Jupyter Notebook running on the Kaggle platform, VS Code

● Software Libraries:

o Pandas: Data manipulation and analysis (pd)

o NumPy: Numerical computations (np)
o Scikit-learn: Machine learning library (StandardScaler, LabelEncoder, KMeans,
AgglomerativeClustering, DBSCAN, silhouette_score)
o Matplotlib: Plotting and visualization (plt)
o Seaborn: Statistical data visualization (sns)

Suggestions for Maintenance of Accuracy and Relevance Over Time:

Regular Data Updates:

1. Periodically update the dataset to capture new trends and customer behaviors.
2. Automate the data ingestion process to ensure fresh data is always available.

Model Retraining:

1. Regularly retrain clustering models to adjust to new data patterns.

2. Implement a retraining schedule (e.g., quarterly) based on data volume and business
needs.

Parameter Tuning:

1. Continuously monitor clustering performance metrics like silhouette score.

2. Perform periodic hyperparameter tuning for algorithms like DBSCAN to adapt to data
changes.

Monitoring and Evaluation:

1. Set up a monitoring system to track clustering performance over time.

2. Evaluate clusters against business metrics to ensure they remain meaningful and
actionable.

Data Preprocessing Enhancements:

1. Refine preprocessing steps to handle new data anomalies or emerging patterns.

2. Update encoding schemes and standardization techniques as necessary.

Documentation and Knowledge Sharing:

1. Maintain comprehensive documentation of preprocessing steps, model parameters,
and evaluation metrics.

2. Foster a collaborative environment where insights and improvements are shared
among team members.

Scalability and Performance:

1. Ensure the computational environment can scale with data growth.

2. Optimize code for performance, particularly for large datasets and complex algorithms.

By implementing these recommendations, the clustering models will maintain accuracy and
relevance, adapting to the evolving nature of the data and providing valuable insights for decision-
making.
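One way the monitoring suggestion could be put into practice is a simple silhouette-based check, sketched below. The helper `needs_retraining`, the tolerance of 0.05, and the synthetic drift are all hypothetical. A real setup would choose its own thresholds and business metrics.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def needs_retraining(X_new, labels_new, baseline_score, tolerance=0.05):
    """Flag retraining when the silhouette score falls more than `tolerance` below baseline."""
    current = silhouette_score(X_new, labels_new)
    return bool(current < baseline_score - tolerance), current

rng = np.random.default_rng(0)
centers = np.array([[0, 0], [6, 6]])

# Baseline: two tight groups at training time.
X_base = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_base)
baseline = silhouette_score(X_base, model.labels_)

# New batch where the groups have blurred together (simulated drift).
X_new = np.vstack([c + rng.normal(scale=3.0, size=(50, 2)) for c in centers])
flag, current = needs_retraining(X_new, model.predict(X_new), baseline)
print(flag, round(baseline, 3), round(current, 3))
```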

References
International Institute of Business Analysis. (2022). Business Analysis Core Concept Model (BACCM).
IIBA. https://www.iiba.org/business-analysis-blogs/6-steps-to-applying-the-baccm/

Zakrzewska, D., & Murlewski, J. (2005). Clustering algorithms for bank customer segmentation. In
5th International Conference on Intelligent Systems Design and Applications (ISDA '05)
(pp. 197-202). IEEE. DOI:10.1109/ISDA.2005.33
